[Beowulf] Definition of HPC

Mark Hahn hahn at mcmaster.ca
Wed Apr 24 08:18:33 PDT 2013


> Sure, it's important - WITHIN a given job.  Why should a
> new job's performance depend on what ran before?  (And in

I can't see why anyone would want to throw away a performance improvement.

> most cases, the impact is negative, because the cached
> pages are not the ones needed by the new job.)

prove it.  scavenging clean pages is one of the most common
kernel paths, constantly being used in normal operation.
I see no reason to expect that scavenging overhead is noticeable,
and even less reason to expect that bulk reclamation is significantly
faster than incremental.

I guess that's a useful point to make here: drop_caches is not doing
anything different than the kernel normally does.  it's just doing 
the same thing, but in bulk and blindly, including pages that really
will be used again and shouldn't be dropped.

if the claim is that drop_caches creates less cpu cache pollution than
the normal incremental scavenging, well, the numbers for that would be
interesting to see.  certainly possible, but it would be surprising
given the overhead of system calls, and given that the user-space
codepath is, after all, *IO*, which tends to be fairly
cpu-cache-unfriendly in the first place.
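
for anyone who wants to produce such numbers, here is a rough sketch of a
starting point, not a finished benchmark.  it times a sequential read of a
file right after a bulk drop and again with the pagecache warm.  it measures
wall time only, not cpu cache pollution, which would need hardware counters.
the function names (drop_page_cache, timed_read, compare) are made up for
illustration; the only kernel interface assumed is /proc/sys/vm/drop_caches,
which needs root to write.

```python
import os
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux-only; writing requires root


def drop_page_cache():
    """Bulk-drop clean pagecache pages. Returns False if not possible here."""
    if not os.path.exists(DROP_CACHES) or os.geteuid() != 0:
        return False
    try:
        os.sync()  # only clean pages are dropped, so write back dirty ones first
        with open(DROP_CACHES, "w") as f:
            f.write("1\n")  # 1 = pagecache only
    except OSError:
        return False  # e.g. read-only /proc inside a container
    return True


def timed_read(path, chunk=1 << 20):
    """Read a file sequentially; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total, time.perf_counter() - start


def compare(path):
    """Time one read after a bulk drop (if permitted), then one warm read."""
    dropped = drop_page_cache()
    _, first = timed_read(path)
    _, second = timed_read(path)
    return {"dropped": dropped, "first_s": first, "second_s": second}
```

run compare() as root on a file larger than a few cache lines but smaller
than RAM, and repeat it many times: a single pair of timings says very little.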


>> for sites where a single job is rolled onto all nodes and runs for a long
>> time, then is entirely removed, sure, it may make sense.  rebooting entirely
>> might even work better.  I'm mainly concerned with clusters which run a
>> wide mixture of jobs, probably with multiple jobs sharing a node at times.
>
> I would advise any user never to do that.

don't be silly: nodes are fat, and there is waste for many workloads
if you only allocate full nodes.  this is a decision that must be made
based on your workload mixture.  my organization (like MANY others)
handles very disparate workloads, and cannot easily switch to unshared
nodes.

nodes are also not getting thinner.

>> who says determinism is a good thing?  I assume, for instance, you turn off
>> your CPU caches to obtain determinism, right?  I'm not claiming that variance
>> is good, but why do you assume that the normal functioning of the pagecache
>> will cause it?
>
> Try it and see.

are you just heckling, or do you have some measurements to contribute?
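
as a sketch of what a contributed measurement could look like: run the same
job several times under each policy (pagecache left alone vs. dropped before
each run) and compare the spread, not just the mean.  the helper names below
(time_runs, summarize) are hypothetical, and the command you time is whatever
your real job is.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times; return the list of wall-clock times in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return count, mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    cv = stdev / mean if mean else 0.0
    return {"n": len(times), "mean_s": mean, "stdev_s": stdev, "cv": cv}


if __name__ == "__main__":
    # Placeholder job: replace with the real workload's command line.
    print(summarize(time_runs([sys.executable, "-c", "pass"], repeats=5)))
```

if the coefficient of variation is no larger with the pagecache left alone,
the determinism argument has nothing to stand on.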

