[Beowulf] numactl and load balancing

Gus Correa gus at ldeo.columbia.edu
Thu Jul 23 13:45:22 PDT 2015


On 07/23/2015 04:25 PM, Gus Correa wrote:
> On 07/23/2015 03:03 PM, mathog wrote:
>> Dell with 2 CPUs x 12 cores x 2 threads; shows up in procinfo as 48 CPUs.
>>
>> Trying to run 30 processes, one each on a different "CPU", by starting
>> them one at a time with
>>
>> numactl -C 1-30 /$PATH/program #args...
>>
>> Once 30 have started, the script spins waiting for one to exit, then
>> another is started.  "top" shows some of these running at 50%
>> CPU, so they are being started on a CPU which already has a job going. I
>> can see where that would happen, since there doesn't seem to be anything
>> in numactl about load balancing. The thing is, these processes are
>> _staying_ on the same CPU, never migrating to another.  That I don't
>> understand.  I would have thought numactl sets some mask on the process
>> restricting the CPUs it can move to, but would not otherwise affect it,
>> so the OS should migrate it when it sees this situation.  In practice it
>> seems to leave it running on whichever CPU it starts on.  Or does Linux
>> not migrate processes when they are heavily loading a single CPU, only
>> when they run out of memory?
>>
>> Also "perf top" shows 81% for the program and 13% for numactl.
>>
>> The goal here is to carefully divvy up the load so that exactly 15 jobs
>> run on each Numa zone, since then the data in all the inner loops will
>> fit within the 30 MB of L3 cache on each CPU.  If it puts 17 on one and
>> 13 on the other, the inner loop data won't fit and performance slows down
>> dramatically.
>> Looks like I need to keep track of which job is running where and use
>> numactl to lock it to that node.  (I don't think there is a queue system on
>> this machine at present.)
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>
> Hi David
>
> Two alternatives to numactl that you could try are
> taskset and hwloc/hwloc-bind.
> The latter, along with the lstopo utility,
> may allow finer control over NUMA placement, cache, etc.
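>
> For example, each job could be bound to one core of its own.
> (Note that "numactl -C 1-30" binds a process to the whole set of
> CPUs 1-30, not to a single CPU.)  A rough, untested sketch, assuming
> the launching script keeps a counter $core of the next free core
> ($core and the program path are just placeholders):
>
>    # any one of these lines; they are alternatives
>    taskset -c $core /$PATH/program &           # bind to one logical CPU
>    numactl -C $core /$PATH/program &           # same idea, with numactl
>    hwloc-bind core:$core -- /$PATH/program &   # bind to a physical core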
>
> I hope this helps,
> Gus Correa
>

PS - There is good information about hwloc here:

http://www.open-mpi.org/projects/hwloc/
http://www.open-mpi.org/projects/hwloc/tutorials/

[It is used by Open MPI, but is independent of it.]
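
For the goal of exactly 15 jobs per NUMA node, binding both the CPU
and the memory of each job to a node may help.  A rough, untested
sketch (node numbers come from whatever lstopo or "numactl --hardware"
reports on that machine, and /$PATH/program stands for the real
program, as in the original post), assuming all 30 jobs can simply be
started at once:

   lstopo --no-io          # or: numactl --hardware   (inspect the topology)

   # 15 jobs on NUMA node 0, 15 on node 1,
   # with each job's memory kept on its own node
   for i in $(seq 0 29); do
       node=$(( i % 2 ))
       numactl --cpunodebind=$node --membind=$node /$PATH/program &
   done
   wait

Binding by node rather than by individual core leaves the kernel free
to balance the 15 jobs among that node's cores, while still keeping
their data in that socket's L3 cache.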

Gus Correa

