[Beowulf] New member, upgrading our existing Beowulf cluster

Joshua Baker-LePain jlb17 at duke.edu
Thu Dec 3 11:35:45 PST 2009


On Thu, 3 Dec 2009 at 2:29pm, Mark Hahn wrote

>>> if a single node goes down, you need to take down all the
>>> nodes in the chassis before you can remove the dead node. Not very
>>> practical.
>> 
>> Eh? What's so hard about marking the other nodes as unusable in your
>> batch system, and waiting for them to become free?
>
> depends on your max job length.  but yeah, idling three nodes for a week
> is not going to be noticeable in anything but a quite small cluster...

But doesn't the engineer in you just bristle at the (admittedly, rather 
slight) inefficiency?  Call me OCD (you wouldn't be the first), but it 
just bugs me.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


