[Beowulf] Again about NUMA (numactl and taskset)

Vincent Diepeveen diep at xs4all.nl
Mon Jun 23 16:27:54 PDT 2008


On Jun 23, 2008, at 9:12 PM, Mark Hahn wrote:

>> "how sure are we that a process (or thread) that allocated and  
>> initialized and writes to memory at a single specific memory node,
>> also keeps getting scheduled at a core on that memory node?"
>
> numactl --cpubind=0 --membind=0
>
>> It seems to me that sometimes (like every second or so) threads  
>> jump from 1 memory node to another. I could be wrong,
>> but i certainly have that impression with the linux kernels.
>
> you can always tie a thread to a core.  for non-bound threads,
> the question is really how long the kernel should leave a runnable  
> thread "on" a busy cpu before running it on another (idle) cpu.   
> the kernel
> does try to avoid this, but how hard has in the past depended on  
> the kernel's guess about the cache footprint of the thread and its  
> "natural"
> timeslice (how long it typically runs before yielding.)

Mark, thanks for your input. I've tried that numactl command several
times, to no avail. It kept going wrong. That was a few years ago though,
the last time i toyed with numactl for a few days; it may have been
improved by now.

It is in itself a very relevant topic that Michael Kuzminsky
addresses; when a thread allocates a lot of memory, it really
matters.

Now i assume what i'm doing is, on paper, the ideal situation. If an
AMD machine with 2 to 4 memory controllers
has, say, 4 GB of RAM, i give each process (memory - 500MB) / 4.

So that is quite a bit of RAM. This RAM gets hammered nonstop, storing
better ways to achieve finding the holy grail.
Each core writes about 150k entries a second to RAM through its
memory controller on my dual Opteron dual-core 2.4 GHz AMD machine
(probably considered old energy-wasting junk by most of you nowadays,
but well). If a cpu has, say, 750MB of RAM, then the time
to reach a load factor alpha of 0.5 in memory (ignoring the
chaining, which happens a lot by the way, as taking the
efficiency decrease from
chaining is faster thanks to how latency to RAM works) is roughly
0.5 * 750 MB / (20 bytes * 0.150M/s) = 0.5 * 750 / 3 = 125 seconds
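To make the arithmetic above easy to redo for other machines, here is a small sketch in Python. The function names are made up for illustration, and taking 4 GB as 4096 MB is an assumption; the 500MB reserve, 20 bytes per entry, 150k entries a second and alpha of 0.5 are the figures from the text.

```python
def per_process_share_mb(total_mb, reserved_mb=500, processes=4):
    """Memory per search process: (memory - 500MB) / 4."""
    return (total_mb - reserved_mb) / processes


def fill_time_seconds(table_bytes, entry_bytes, entries_per_second,
                      load_factor=0.5):
    """Seconds until the table reaches the given load factor.

    Ignores chaining, as the estimate in the text does.
    """
    bytes_per_second = entry_bytes * entries_per_second
    return load_factor * table_bytes / bytes_per_second


if __name__ == "__main__":
    # 4 GB taken as 4096 MB (assumption).
    print(per_process_share_mb(4096))                     # 899.0
    # 750 MB table, 20-byte entries, 150k entries a second.
    print(fill_time_seconds(750_000_000, 20, 150_000))    # 125.0
```

So after roughly two minutes the table is half full, and everything written after that only has value if the process keeps running next to that memory.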

A game can last about 6 hours.

So even a single switch to another memory node within those 6 hours is
very bad.

I tried locking each process to a different core with commands (4
processes, 4 cores). I still saw a flip sometimes.
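For anyone who wants to check this from inside the process rather than from the command line, here is a minimal sketch, assuming Linux and Python 3. pin_to_core and last_cpu are hypothetical helper names; os.sched_setaffinity and the "processor" field (field 39) of /proc/<pid>/stat are the standard Linux interfaces. Note this only sets CPU affinity; it does nothing about where the memory lives, which is what the --membind part of the numactl line quoted above is for.

```python
import os


def pin_to_core(core):
    """Restrict the calling process to one core (Linux only)."""
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


def last_cpu(pid="self"):
    """Return the CPU a process last ran on, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name (field 2) may contain spaces, so split after
    # its closing parenthesis. fields[0] is then field 3 (state).
    fields = data.rsplit(")", 1)[1].split()
    # "processor" is field 39, so it sits at index 39 - 3 = 36.
    return int(fields[36])


if __name__ == "__main__":
    # Pick the lowest core we are currently allowed to run on.
    core = min(os.sched_getaffinity(0))
    print("affinity now:", pin_to_core(core))
    print("last ran on cpu:", last_cpu())
```

Polling last_cpu() once in a while from each search process would show whether a flip really happens. It is a coarse check: a short migration between two polls goes unnoticed.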

Now of course on big clusters/supers some software support from
manufacturers allows automigration of nodes and memory, for good
reasons.
So i guess for this type of scheduling we are talking about a different
level. Avoiding RAM latency over the network by scheduling
nodes closer to each other is really important.

Yet within 1 node it is a different story.

Suppose we've got 4 search processes P0..P3 and we have 4 cores C0..C3

I am guessing this happens, please tell me it is wrong:
    some OS service gets a timeslice at C1, search process P1 gets
    pushed back in the queue.
    C2's timeslice at memory node 1 finishes. P2 gets pushed back in
    the FIFO queue. P1 is before P2 in the queue,
    so P1 runs on C2.

What i want is in fact that P2 starts to run on C2 and P1 stays
in the queue until the service timeslice has finished.
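The two behaviours can be put side by side in a toy model. This is only a sketch of the guess described above, assuming one shared FIFO queue versus one queue per core; it is not how the Linux scheduler is implemented, and the function names are made up for illustration.

```python
from collections import deque


def shared_fifo():
    """The guessed behaviour: one FIFO run queue shared by all cores."""
    running = {"C0": "P0", "C1": "P1", "C2": "P2", "C3": "P3"}
    queue = deque()
    # An OS service takes C1; P1 goes to the back of the shared queue.
    queue.append(running["C1"])
    running["C1"] = "service"
    # P2's timeslice on C2 ends; P2 is queued behind P1.
    queue.append(running["C2"])
    # C2 takes the head of the queue, which is P1.
    running["C2"] = queue.popleft()
    return running, list(queue)


def per_core_queues():
    """The wanted behaviour: every core has its own queue."""
    running = {"C0": "P0", "C1": "P1", "C2": "P2", "C3": "P3"}
    queues = {core: deque() for core in running}
    # P1 waits on C1's own queue until the service is done.
    queues["C1"].append(running["C1"])
    running["C1"] = "service"
    # P2's timeslice ends; it is requeued on C2 and picked again.
    queues["C2"].append(running["C2"])
    running["C2"] = queues["C2"].popleft()
    return running, {core: list(q) for core, q in queues.items()}


if __name__ == "__main__":
    print(shared_fifo()[0]["C2"])      # P1
    print(per_core_queues()[0]["C2"])  # P2
```

In the shared-queue model P1 ends up on C2, away from its memory; with per-core queues P1 simply waits for C1.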

Note the above is pure guesswork, based on a hundred
assumptions drawn from something i *thought* i saw at one time or another;
i didn't look in the kernel code for it. Compared to that, Perry is not
paranoid at all.

Taking it a step further, you only consider someone paranoid
when that person is paranoid with respect to future occasions:
seeing some sort of ghost, or assuming a bottleneck will be there as
"it has to suck".
RGB, don't give up yet! Do your public duty and take care that Perry
speaks out on those subjects,
as we all have to deal with it, some more than others! Maybe tempt
him to post on how he deals with potential nuke builders?

Best Regards,
Vincent

> _________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
>



