[Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script.
Marian Marinov mm at yuhu.biz
Wed Sep 1 03:15:55 PDT 2010
On Wednesday 01 September 2010 11:47:29 Reuti wrote:
> On 01.09.2010 at 09:34, Christopher Samuel wrote:
> > On 01/09/10 01:58, Reuti wrote:
> >> With recent kernels also (kernel) processes in D state
> >> count as running.
> >
> > I wouldn't say recent, that goes back as far as I can
> > remember.
> >
> > For instance I've seen RHEL3 (2.4.x - sort of) NFS servers
> > with load averages in the 80's where they were run with a lot
> > of nfsd's that were blocked waiting for I/O due to ext3.
>
> My impression was always (as there is a similar setting for the
> load_threshold in OGE) that it should limit the number of jobs on a big
> SMP machine when you oversubscribe intentionally, as not all parallel jobs
> really use all the CPU power over their lifetime (maybe such a
> machine was even operated w/o any NFS). Then allowing e.g. 72 slots for
> jobs on a 60-core machine might get the most out of it, with a load near 100%.
>
> Well, now that newer CPUs have 12 cores and are assembled into 24- or
> 48-core machines, such a setting would be useful again. Maybe the load
> sensor should honor only the scheduled jobs' load.
>
> -- Reuti
>
> > cheers!
> > Chris

I believe that the load threshold should be set depending on the type of
jobs you run on your compute nodes. In some cases the load is not linked
only to disk/network I/O and CPU; sometimes the jobs do a lot of in-memory
changes, which carry more weight than the actual CPU or disk/network I/O.

So, for example, a load average of 15 can also be considered normal, as
long as the system is still responsive and the jobs' run times don't degrade.

--
Best regards,
Marian Marinov
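For the warning script the thread is about, a minimal sketch in Python might look like the following, assuming a Linux node where /proc/loadavg and /proc/<pid>/stat are readable. It compares the 1-minute load against a per-core limit (PER_CORE_LIMIT and all function names here are hypothetical, and the 1.0 default is an assumption to be tuned per workload, as suggested above) and separately counts D-state tasks, since, as noted in the thread, those count toward the load even when they are only blocked on I/O.

```python
import os
from collections import Counter

# Hypothetical warning-script sketch for Linux compute nodes.
# PER_CORE_LIMIT is an assumed, site-tunable value, not a recommended one.
PER_CORE_LIMIT = 1.0


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def stat_state(text):
    """Return the one-letter state field (R, S, D, ...) from /proc/<pid>/stat text.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so we split after the *last* closing parenthesis.
    """
    return text[text.rfind(")") + 1:].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state by scanning /proc/<pid>/stat.

    This counts processes (thread-group leaders), not individual threads,
    so it only approximates what the kernel's load average sees.
    """
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[stat_state(f.read())] += 1
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is unreadable) mid-scan
    return counts


def check_load(load1, ncores, d_state, per_core_limit=PER_CORE_LIMIT):
    """Return a warning string if load1 exceeds per_core_limit * ncores, else None."""
    limit = per_core_limit * ncores
    if load1 <= limit:
        return None
    msg = f"1-min load {load1:.2f} > limit {limit:.2f}"
    if d_state:
        # D-state (uninterruptible, usually I/O) tasks also count toward load.
        msg += f"; {d_state} tasks in D state, so part of the load may be I/O wait"
    return msg


def main():
    with open("/proc/loadavg") as f:
        load1, _, _ = parse_loadavg(f.read())
    # os.cpu_count() reports logical CPUs (including SMT siblings).
    ncores = os.cpu_count() or 1
    msg = check_load(load1, ncores, count_states().get("D", 0))
    if msg:
        print("WARNING:", msg)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    main()
```

The per-core limit is a parameter rather than a fixed number because, as argued above, what counts as "high" depends on the jobs. An intentionally oversubscribed node like the 72-slots-on-60-cores example would correspond to per_core_limit=1.2. Reporting the D-state count alongside the load is meant to help tell I/O stalls (e.g. blocked nfsd's) apart from genuine CPU saturation; it is a snapshot, not an average, so treat it as a hint.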
More information about the Beowulf mailing list
