[Beowulf] MPI performance on clusters of SMP
Philippe Blaise philippe.blaise at cea.fr
Thu Aug 26 09:18:40 PDT 2004
Hi Igor,

The situation is rather complex. You compare an N nodes x 2 CPUs machine with a 2*N nodes x 1 CPU machine, but you forget the number of network interfaces. In the first case the 2 CPUs share the network interface, and they share the memory too.

And of course, in the first case you save money because you have fewer network cards to buy... that's why clusters of 2-CPU boxes are so common. The 2-CPU boxes can be SMP (Intel) or ccNUMA (Opteron).

So it is difficult to predict whether an N nodes x 2 CPUs machine will perform better than the 2*N nodes x 1 CPU solution for a given program. The best way is to run some tests!

For example, an MPI_Alltoall communication pattern should be more efficient on a 2*N nodes x 1 CPU machine, but it could be the opposite for an intensive MPI_Isend / MPI_Irecv pattern...

For your Tiger box problem: first, you should know that the Intel chipset is not very good; second, are you sure that no other program (such as system activity) interfered with your measurements?

Regards,

Philippe Blaise

Kozin, I (Igor) wrote:

>Nowadays clusters are typically built from SMP boxes.
>Dual cpu nodes are common but quad and more available too.
>Nevertheless I never saw that a parallel program runs quicker
>on N nodes x 2 cpus than on 2*N nodes x 1 cpu
>even if local memory bandwidth requirements are very modest.
>The appearance is such that shared memory communication always
>comes at an extra cost rather than as an advantage although
>both MPICH and LAM-MPI have support for shared memory.
>
>Any comments? Is this MPICH/LAM or Linux issue?
>
>At least in one case I observed a hint towards Linux.
>I run several instances of a small program on a 4-way Itanium2 Tiger box
>with 2.4 kernel. The program is basically
>a loop over an array which fits into L1 cache.
>Up to 3 instances finish virtually simultaneously.
>If 4 instances are launched then 3 finish first and the 4th later
>the overall time being about 40% longer.
>
>Igor
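The multi-instance measurement described in the quoted message can be repeated with a short script, which also helps to see whether other system activity is disturbing the timings. What follows is only a rough sketch, not the program Igor actually ran: the worker loop, the array size of 1024, the `repeats` parameter and the name `run_instances` are all assumptions made for illustration, and interpreter start-up time is included in each measurement.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the "small program": a loop over a small array.
# The array size (1024 elements) is an assumption, not taken from the thread.
WORKER = """
import sys
data = list(range(1024))
total = 0
for _ in range(int(sys.argv[1])):
    for x in data:
        total += x
"""


def run_instances(n_instances, repeats=200):
    """Start n_instances copies of WORKER together and return, for each
    copy, the wall-clock seconds from the common start until it exited."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(repeats)])
        for _ in range(n_instances)
    ]
    finish = {}
    pending = set(range(n_instances))
    while pending:
        # Poll every child so that each completion time is recorded
        # independently of the order in which the children exit.
        for i in list(pending):
            if procs[i].poll() is not None:
                finish[i] = time.perf_counter() - start
                pending.discard(i)
        time.sleep(0.001)
    return [finish[i] for i in range(n_instances)]


if __name__ == "__main__":
    # Compare 1 to 4 simultaneous instances, as on a 4-way box.
    for n in (1, 2, 3, 4):
        times = sorted(run_instances(n))
        print(n, "instances:", " ".join(f"{t:.3f}" for t in times))
```

Running it several times on an otherwise idle machine, and again with a known background load, gives an idea of how much of a straggling instance is due to interference rather than to the hardware or the kernel.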
