[Beowulf] MPI performance on clusters of SMP
Kozin, I (Igor) I.Kozin at dl.ac.uk
Thu Aug 26 08:15:06 PDT 2004
- Previous message: [Beowulf] mpich on OS X
- Next message: [Beowulf] MPI performance on clusters of SMP
Nowadays clusters are typically built from SMP boxes. Dual-CPU nodes are common, but quad-CPU and larger nodes are available too. Nevertheless, I have never seen a parallel program run quicker on N nodes x 2 CPUs than on 2*N nodes x 1 CPU, even when local memory bandwidth requirements are very modest. It appears that shared-memory communication always comes at an extra cost rather than as an advantage, although both MPICH and LAM-MPI have support for shared memory. Any comments? Is this an MPICH/LAM issue or a Linux issue?

In at least one case I observed a hint pointing towards Linux. I ran several instances of a small program on a 4-way Itanium2 Tiger box with a 2.4 kernel. The program is basically a loop over an array which fits into L1 cache. Up to 3 instances finish virtually simultaneously. If 4 instances are launched, then 3 finish first and the 4th finishes later, the overall time being about 40% longer.

Igor
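The single-node experiment described above (several simultaneous instances of a small CPU-bound loop) could be reproduced roughly along these lines. This is only a sketch under stated assumptions: the original program is not shown in the post, so the child loop, the array length, the pass count and the helper name `run_instances` are all hypothetical stand-ins. Interpreter overhead dominates in Python, so this illustrates the measurement method and does not model true L1-cache behaviour.

```python
import subprocess
import sys
import time

# Assumption: a stand-in for the "small program" in the post, i.e. a
# repeated loop over a small array. Sizes are illustrative only.
CHILD_SOURCE = """
data = [1.0] * 1024
total = 0.0
for _ in range({passes}):
    for x in data:
        total += x
"""


def run_instances(n_instances, passes=2000):
    """Launch n_instances copies at once.

    Returns the wall-clock finish time of each copy, measured from the
    common start, sorted from first to last to finish.
    """
    cmd = [sys.executable, "-c", CHILD_SOURCE.format(passes=passes)]
    start = time.perf_counter()
    pending = {i: subprocess.Popen(cmd) for i in range(n_instances)}
    finish_times = []
    # Poll so that each copy's finish time is recorded when it actually
    # exits, not in launch order.
    while pending:
        for idx, proc in list(pending.items()):
            code = proc.poll()
            if code is None:
                continue
            if code != 0:
                raise RuntimeError("instance %d exited with code %d" % (idx, code))
            finish_times.append(time.perf_counter() - start)
            del pending[idx]
        time.sleep(0.001)
    return sorted(finish_times)


if __name__ == "__main__":
    # 1 to 4 instances, matching the 4-way box mentioned in the post.
    for n in range(1, 5):
        times = run_instances(n)
        print("%d instances: first %.3fs, last %.3fs" % (n, times[0], times[-1]))
```

If the last instance finishes markedly later than the others only when the instance count equals the CPU count, that would be consistent with the scheduling effect suspected in the post. Note that the polling parent itself uses a little CPU time, which can disturb the result on a fully loaded machine.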
