
[Beowulf] Again about NUMA (numactl and taskset)

Håkon Bugge Hakon.Bugge at scali.com
Thu Jun 26 02:05:50 PDT 2008


At 01:23 25.06.2008, Chris Samuel wrote:
> > IMHO, the MPI should virtualize these resources
> > and relieve the end-user/application programmer
> > from the burden.
>
>IMHO the resource manager (Torque, SGE, LSF, etc) should
>be setting up cpusets for the jobs based on what the
>scheduler has told it to use and the MPI shouldn't
>get a choice in the matter. :-)

I am inclined to agree with you in a perfect
world. But, from my understanding, the resource
managers do not know the relationship between
the cores. E.g., do core 3 and core 5 share a
cache? Do they share a north-bridge bus, or are
they located on different sockets?
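On Linux, at least the cache and socket questions above can be answered from sysfs (the north-bridge bus layout is not covered here). A minimal sketch, assuming the standard `/sys/devices/system/cpu/cpuN/cache/indexK/{level,shared_cpu_list}` and `cpuN/topology/physical_package_id` files are present; `parse_cpu_list`, `shared_cache_levels` and `same_socket` are hypothetical helpers for illustration, not the API of any MPI or resource manager:

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpu_list(text):
    """Parse a Linux CPU list string such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def _read(path):
    with open(path) as f:
        return f.read().strip()


def shared_cache_levels(a, b, root=SYSFS_CPU):
    """Return the sorted cache levels (e.g. [2, 3]) that cores a and b share."""
    levels = set()
    cache_dir = os.path.join(root, f"cpu{a}", "cache")
    for entry in sorted(os.listdir(cache_dir)):
        if not entry.startswith("index"):
            continue
        idx = os.path.join(cache_dir, entry)
        # shared_cpu_list names every core that uses this cache instance.
        if b in parse_cpu_list(_read(os.path.join(idx, "shared_cpu_list"))):
            levels.add(int(_read(os.path.join(idx, "level"))))
    return sorted(levels)


def _package(core, root):
    return _read(os.path.join(root, f"cpu{core}", "topology", "physical_package_id"))


def same_socket(a, b, root=SYSFS_CPU):
    """True if cores a and b sit in the same physical package (socket)."""
    return _package(a, root) == _package(b, root)
```

On a machine that has those cores, `shared_cache_levels(3, 5)` and `same_socket(3, 5)` answer the two example questions directly; the `root` parameter exists only so the lookup can be pointed at a copy of the sysfs tree.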

This is information we use to optimize how
point-to-point communication is implemented. The
code base involved is fairly complicated, and I do
not expect resource management systems to cope with it.

Some time ago I posted measurements of the benefit
of this method, and I include them here for
reference:
http://www.scali.com/info/SHM-perf-8bytes-2007-12-20.htm
If you look at the ping-ping numbers, you will see
a nearly constant message rate, independent of
process placement. This is in contrast to
other MPIs, which (apparently) do not use this technique.

So, in a practical world I go for performance, not perfect layering ;-)
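Whichever layer does the placement (the MPI or the resource manager), on Linux it ultimately comes down to restricting a process's CPU affinity, which is what `taskset -c` does from the command line. A minimal sketch using Python's `os.sched_getaffinity`/`os.sched_setaffinity` (Linux only); `pin_to_core` is a hypothetical helper, and this is not how any particular MPI implements it:

```python
import os


def pin_to_core(core):
    """Restrict the calling process to a single core, roughly like `taskset -c CORE`.

    Returns the affinity mask actually in effect afterwards.
    """
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if core not in allowed:
        # Outside the current mask (e.g. outside the job's cpuset): refuse.
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)
```

Checking against the current mask first reflects the cpuset point above: if the resource manager has already confined the job, a process can only narrow its placement within that set, not escape it.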

>Also helps when newbies run OpenMP codes thinking they're
>single CPU codes and get 3 or 4 on the same 8 CPU node.

Not sure I follow you here. Do you mean pure OpenMP or hybrid (MPI + OpenMP) models?



Thanks, Håkon




