LAM SMP performance
Josip Loncaric josip at icase.edu
Fri Dec 8 14:26:53 PST 2000
- Previous message: LAM SMP performance
- Next message: LAM SMP performance
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Patrick Geoffray wrote:
>
> On Fri, 8 Dec 2000, Josip Loncaric wrote:
>
> > I believe that you are thinking of sysv (semaphores). LAM compiled with
> > usysv uses spinlocks, and the peak 266 Mbyte/s bandwidth is reached for
> > 8KB cache-to-cache copies. Memory gets involved only for larger message
> > sizes, and then the bandwidth drops to 127 Mbyte/s. See my raw data
>
> For the bandwidth measurement, it's a good occasion to talk about a good
> way to measure SMP bandwidth : some people do not accept cache-to-cache
> performance values because they do not show the real memory bus capacity,
> some others do.
>
> I believe that it gives a more accurate result as an application usually
> write the message to send just before to send it, so the data is in the
> sender cache. On another hand, the message can be asynchronous and the
> cache can be trashed on the receiving side before the user application
> uses the payload.
>
> What do you think ?

The performance figures which NPmpi reports are what an application
sees, and therefore should be accepted. These cache effects are due to
the computer architecture, not clever coding. (I believe that at the
lowest level, LAM invokes plain memcpy() to move shared memory data, but
management of data movements is actually done by hardware, which
exploits the L2 caches as much as it can.)

We all know that RAM bandwidth is a bottleneck that should be avoided
whenever possible. Therefore, the whole idea of having caches is to get
performance boosts at small but still reasonable data sizes.
Applications which manage their work in a cache-friendly way will see
significant benefits, whether doing matrix multiplies or exchanging data
via shared memory. In that light, peak performance is definitely
interesting, even when we measure the entire curve from 1 byte to
1 Mbyte message size.

BTW, if you are interested in sustained (out-of-cache) shared memory
performance, LAM-6.3.2-usysv still works very nicely (127.3 Mbyte/s).
MPICH-1.2.0 is almost as good at 120.0 Mbyte/s.

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
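
The "entire curve from 1 byte to 1 Mbyte" idea discussed in this message can
be illustrated with a rough sketch. This is not NPmpi and it does not use LAM
or MPICH: it only times in-process buffer copies at doubling sizes, so the
numbers are not comparable to the figures quoted in the message. The names
copy_bandwidth and sweep are hypothetical, and the batch counts are arbitrary
assumptions. At small sizes Python interpreter overhead dominates, so only the
general shape of the curve (a peak while the buffers fit in cache, a lower
rate once they do not) is meaningful, and even that depends on the machine.

```python
import time


def copy_bandwidth(size_bytes, batches=5):
    """Best observed copy rate in Mbyte/s (10**6 bytes/s) for one buffer size.

    Illustrative only: a single-process copy, not an MPI shared-memory
    transfer between two processes.
    """
    src = bytearray(b"\x01" * size_bytes)
    dst = bytearray(size_bytes)
    # Repeat small copies many times per timing so the timer resolution
    # does not swamp the measurement (assumed, arbitrary batch sizing).
    inner = max(1, min(1000, (1 << 20) // size_bytes))
    best = float("inf")
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(inner):
            dst[:] = src  # same-length slice assignment copies in place
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = min(best, elapsed / inner)
    assert dst == src  # sanity check that the copy really happened
    return size_bytes / best / 1e6


def sweep(max_exp=20):
    """Measure sizes 1, 2, 4, ... 2**max_exp bytes (default: up to 1 Mbyte)."""
    return [(1 << k, copy_bandwidth(1 << k)) for k in range(max_exp + 1)]


if __name__ == "__main__":
    for size, rate in sweep():
        print(f"{size:>8d} bytes  {rate:10.1f} Mbyte/s")
```

Taking the best of several batches, rather than the mean, is a common choice
for this kind of measurement because it reduces the influence of scheduling
noise; whether peak or sustained figures matter more is exactly the question
debated above.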
More information about the Beowulf mailing list
