[Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script.

Rahul Nabar rpnabar at gmail.com
Wed Sep 1 08:18:06 PDT 2010


On Wed, Sep 1, 2010 at 3:47 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> My impression was always (as there is a similar load_threshold setting in OGE) that it is meant to limit the number of jobs on a big SMP machine when you oversubscribe intentionally, since not all parallel jobs really use all the CPU power over their lifetime (maybe such a machine was even operated without any NFS). Allowing e.g. 72 slots for jobs on a 60-core machine might then get the most out of it, with a load near 100%.

Our scheduler is currently set to never allow over-subscription.
Also, we don't allocate shared nodes. Users get resources in 8-core
increments.
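
With no over-subscription and whole-node allocation, one plausible rule
for the warning script is to compare the load average against the node's
core count. The following is only a minimal sketch under that assumption:
the function name load_status and the 25% tolerance are hypothetical
choices for illustration, not values taken from this thread or from any
scheduler.

```python
import os


def load_status(load1, cores, tolerance=0.25):
    """Classify a 1-minute load average against the core count.

    `tolerance` is an assumed headroom (25% here) above one runnable
    process per core; tune it for the actual workload.
    """
    limit = cores * (1.0 + tolerance)
    return "high" if load1 > limit else "ok"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    load1, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load1={load1:.2f} cores={cores} status={load_status(load1, cores)}")
```

On an 8-core node this flags a 1-minute load above 10.0. I/O-bound jobs
can push the load above the core count without the node being
over-committed, so the tolerance is a judgment call.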

-- 
Rahul



