[Beowulf] numactl and load balancing

mathog mathog at caltech.edu
Thu Jul 23 12:03:17 PDT 2015


Dell with 2 CPUs x 12 cores x 2 threads, which shows up in procinfo as 
48 CPUs.

I am trying to run 30 processes, one on each of 30 different "CPU"s, by 
starting them one at a time with

numactl -C 1-30 /$PATH/program #args...

When 30 have started, the script spins waiting for one to exit, and then 
another is started.  "top" shows that some of these are running at 50% 
CPU, so they are being started on a CPU which already has a job going.  
I can see how that would happen, since there doesn't seem to be 
anything in numactl about load balancing.  The thing is, these processes 
are _staying_ on the same CPU, never migrating to another.  That I don't 
understand.  I would have thought numactl sets a mask on the process 
restricting the CPUs it can move to, but does not otherwise affect it, 
so the OS should migrate it when it sees this situation.  In practice it 
seems to leave the process running on whichever CPU it starts on.  Or 
does Linux not migrate processes when they are heavily loading a single 
CPU, only when they run out of memory?
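
One way to check which mask a job actually ended up with, and which CPU 
it is sitting on, is sketched below.  This assumes Linux with /proc 
mounted and the procps version of ps; the function names are made up 
for illustration and are not part of numactl.

```shell
#!/bin/sh
# Sketch only; function names are illustrative, not part of numactl.

# Print the list of CPUs the kernel allows the given PID to run on,
# as recorded in /proc/PID/status (Linux only).
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Print the CPU the given PID last ran on, plus its CPU usage.
# "psr" is the procps column for the processor a process is assigned to.
where_is() {
    ps -o pid,psr,pcpu,comm -p "$1"
}
```

For a job started with -C 1-30, "allowed_cpus PID" should print 1-30 if 
the mask was applied as expected, and running "where_is PID" a few 
times would show whether the job ever changes CPU.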

Also "perf top" shows 81% for the program and 13% for numactl.

The goal here is to carefully divvy up the load so that exactly 15 jobs 
run on each NUMA node, since then the data in all the inner loops will 
fit within the 30 MB of L3 cache on each CPU.  If it puts 17 on one and 
13 on the other, the inner loop data won't fit on the first and 
performance slows down dramatically.  Looks like I need to keep track of 
which job is running where and use numactl to lock it to that node.  
(I don't think there is a queue system on this machine at present.)
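
A minimal sketch of one way to do that bookkeeping without a queue 
system: use 30 fixed worker slots, 15 bound to each node, and let each 
slot run its share of the jobs one after another.  The assumptions here 
are mine: the two nodes are numbered 0 and 1 (check with 
"numactl --hardware"), each line of a jobs file holds the arguments for 
one run, and all function and variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: 30 worker slots, 15 pinned to each NUMA node.
# Assumed: nodes are numbered 0 and 1, and each line of the jobs file
# holds the arguments for one run. Names are illustrative.

SLOTS_PER_NODE=15
NODES=2

# Map a worker slot (0..29) to its node: slots 0-14 -> 0, 15-29 -> 1.
node_for_slot() {
    echo $(( $1 / SLOTS_PER_NODE ))
}

# Print the lines of file $2 that belong to slot $1 (round-robin split).
jobs_for_slot() {
    awk -v slot="$1" -v total=$(( SLOTS_PER_NODE * NODES )) \
        '(NR - 1) % total == slot' "$2"
}

# Start all workers and wait for them. $1 = program, $2 = jobs file.
# Each worker binds both CPUs and memory to its slot's node.
run_all() {
    total=$(( SLOTS_PER_NODE * NODES ))
    slot=0
    while [ "$slot" -lt "$total" ]; do
        node=$(node_for_slot "$slot")
        jobs_for_slot "$slot" "$2" | while IFS= read -r args; do
            # $args is left unquoted on purpose so it splits into words.
            numactl --cpunodebind="$node" --membind="$node" "$1" $args </dev/null
        done &
        slot=$(( slot + 1 ))
    done
    wait
}
```

It would be called as "run_all /path/to/program jobs.txt" (both names 
hypothetical).  The split is static, so if run times vary a lot some 
slots will finish early, but there are never more than 15 jobs on a 
node at once.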

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

