another radical concept...Re: [Beowulf] Cooling vs HW replacement

Chris Samuel csamuel at vpac.org
Wed Jan 19 20:14:07 PST 2005


On Wed, 19 Jan 2005 09:17 am, Jim Lux wrote:

> I was thinking about doing the load scheduling at a higher "cluster" level,
> rather than at the micro "single processor" level... That way you could
> manage thermal issues in a more "holistic" way.. (like also watching disk
> drive temps, etc.)

Hmm, well Moab (the next generation of the Maui scheduler from SuperCluster /
ClusterResources) supports Ganglia as a source of information, so
technically, if you could get that temperature data into Ganglia (which
shouldn't be that hard with lm_sensors and gmetric), you could persuade Moab
to include it in its decisions (such as marking a node that's too hot as
'busy' so as not to put any more jobs there).

Using Moab with Ganglia is documented here:

 http://www.clusterresources.com/products/mwm/docs/13.5nativerm.shtml
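From memory of that page, the Moab side boils down to declaring Ganglia as a
native resource manager in moab.cfg, along these lines (treat the exact
parameter values as an assumption and check them against the docs above):

  RMCFG[ganglia] TYPE=NATIVE CLUSTERQUERYURL=ganglia://

Moab should then see whatever gmetric has published as generic node metrics
it can test in its policies.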

There's some information about other possible ways to get that data into Moab 
here:

 http://www.clusterresources.com/products/mwm/docs/13.6multirm.shtml

The only problem we've found (and it has stopped us using this so far) is
that if you're running Ganglia on systems that aren't part of PBS's view of
the cluster (such as your login nodes), then Moab starts to put job
reservations onto them that can never be fulfilled.

cheers!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
