LAM SMP performance
josip at icase.edu
Fri Dec 8 14:26:53 PST 2000
Patrick Geoffray wrote:
> On Fri, 8 Dec 2000, Josip Loncaric wrote:
> > I believe that you are thinking of sysv (semaphores). LAM compiled with
> > usysv uses spinlocks, and the peak 266 Mbyte/s bandwidth is reached for
> > 8KB cache-to-cache copies. Memory gets involved only for larger message
> > sizes, and then the bandwidth drops to 127 Mbyte/s. See my raw data
> For the bandwidth measurement, this is a good occasion to talk about a good
> way to measure SMP bandwidth: some people do not accept cache-to-cache
> performance values because they do not show the real memory bus capacity;
> others do.
> I believe that it gives a more accurate result, as an application usually
> writes the message just before sending it, so the data is in the
> sender's cache. On the other hand, the message can be asynchronous and the
> cache can be trashed on the receiving side before the user application
> uses the payload.
> What do you think?
The performance figures which NPmpi reports are what an application
sees, and therefore should be accepted. These cache effects are due to
the computer architecture, not clever coding. (I believe that at the
lowest level, LAM invokes plain memcpy() to move shared memory data, but
management of data movements is actually done by hardware, which
exploits the L2 caches as much as it can.)
We all know that RAM bandwidth is a bottleneck that should be avoided
whenever possible. Therefore, the whole idea of having caches is to get
performance boosts at small but still reasonable data sizes.
Applications which manage their work in a cache friendly way will see
significant benefits, whether doing matrix multiplies or exchanging data
via shared memory. In that light, peak performance is definitely
interesting, even when we measure the entire curve from 1 byte to 1
Mbyte message size.
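
To make the idea of "measuring the entire curve" concrete, here is a
minimal sketch that times buffer copies at power-of-two sizes from 1 byte
to 1 Mbyte. It is not NPmpi and does not involve LAM or MPI at all: it
copies within a single process, so it only illustrates how copy bandwidth
tends to vary with buffer size. The names message_sizes, copy_bandwidth
and bandwidth_curve are made up for this example, 1 Mbyte/s is assumed to
mean 10**6 bytes per second, and at small sizes the Python interpreter
overhead dominates the result.

```python
import time


def message_sizes(lo=1, hi=1 << 20):
    """Power-of-two sizes from lo to hi bytes inclusive (1 byte to 1 Mbyte)."""
    sizes = []
    n = lo
    while n <= hi:
        sizes.append(n)
        n *= 2
    return sizes


def copy_bandwidth(nbytes, min_seconds=0.02):
    """Return copy bandwidth in Mbyte/s (assumed 10**6 bytes/s) for one size."""
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    reps = 1
    while True:
        t0 = time.perf_counter()
        for _ in range(reps):
            # Same-length slice assignment copies src into the existing dst buffer.
            dst[:] = src
        elapsed = time.perf_counter() - t0
        if elapsed >= min_seconds:
            return nbytes * reps / elapsed / 1e6
        # Too short to time reliably: double the repetition count and retry.
        reps *= 2


def bandwidth_curve(sizes=None):
    """Return a list of (size in bytes, Mbyte/s) pairs for the given sizes."""
    if sizes is None:
        sizes = message_sizes()
    return [(n, copy_bandwidth(n)) for n in sizes]


if __name__ == "__main__":
    curve = bandwidth_curve()
    for n, mbps in curve:
        print(f"{n:>8d} bytes  {mbps:10.1f} Mbyte/s")
    peak_size, peak = max(curve, key=lambda pair: pair[1])
    print(f"peak: {peak:.1f} Mbyte/s at {peak_size} bytes")
```

On a machine with caches one would expect the largest sizes to run slower
than the mid-sized ones, but the actual numbers depend entirely on the
hardware and on the interpreter, so treat the output as qualitative.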
BTW, if you are interested in sustained (out-of-cache) shared memory
performance, LAM-6.3.2-usysv still works very nicely (127.3 Mbyte/s).
MPICH-1.2.0 is almost as good at 120.0 Mbyte/s.
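
For a quick sense of scale, the figures quoted in this thread can be put
side by side. The sketch below only does arithmetic on the numbers already
given above (266, 127.3 and 120.0 Mbyte/s); the helper name percent_of is
made up for this example.

```python
# Figures quoted in this thread, in Mbyte/s.
PEAK_IN_CACHE = 266.0     # LAM with usysv, 8KB cache-to-cache copies
LAM_SUSTAINED = 127.3     # LAM-6.3.2-usysv, out of cache
MPICH_SUSTAINED = 120.0   # MPICH-1.2.0, out of cache


def percent_of(value, reference):
    """Return value as a percentage of reference."""
    return 100.0 * value / reference


if __name__ == "__main__":
    print(f"LAM sustained vs. peak:  {percent_of(LAM_SUSTAINED, PEAK_IN_CACHE):.1f}%")
    print(f"MPICH vs. LAM sustained: {percent_of(MPICH_SUSTAINED, LAM_SUSTAINED):.1f}%")
```

In other words, the out-of-cache rate is a little under half of the
in-cache peak, and MPICH comes within about 6% of LAM once memory is
involved.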
Dr. Josip Loncaric, Senior Staff Scientist mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134