[Beowulf] anyone using SALT on your clusters?
deadline at eadline.org
Fri Jun 28 12:56:00 PDT 2013
> On Fri, Jun 28, 2013 at 09:45:50AM +0100, Jonathan Barber wrote:
>> The problem with SSH based approaches is when you have failed nodes -
>> normally they cause the entire command to hang until the attempted
>> connection times out.
> Normally what people do is ping the node before trying ssh on it. And
> have reasonable timeouts around both the ssh connect and the command
> execution. There's no fundamental reason why this is any different
> from messaging or subscription-plus-messaging.
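The ping-then-ssh pattern described above can be sketched roughly as follows. This is a minimal illustration, not anyone's production tool: the function names (`is_up`, `run_ssh`, `run_on_cluster`), the timeout values, and the worker count are all hypothetical, and the `ping` flags assume Linux iputils.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def is_up(host, timeout_s=1):
    """Return True if host answers one ICMP echo within timeout_s seconds."""
    # Assumes Linux iputils ping: -c is the packet count, -W the reply timeout.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_ssh(host, command, connect_timeout_s=5, command_timeout_s=30):
    """Run command on host via ssh, bounding connect and execution time."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes",
             "-o", f"ConnectTimeout={connect_timeout_s}", host, command],
            capture_output=True,
            text=True,
            # Overall wall-clock limit covering connect plus command execution.
            timeout=connect_timeout_s + command_timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # the node accepted the probe but the command hung
    return result.returncode, result.stdout


def run_on_cluster(hosts, command, probe=is_up, runner=run_ssh, workers=32):
    """Probe all hosts in parallel, then run command only on responsive ones."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(probe, hosts))
        up = [h for h, ok in zip(hosts, flags) if ok]
        down = [h for h, ok in zip(hosts, flags) if not ok]
        outputs = list(pool.map(lambda h: runner(h, command), up))
    return dict(zip(up, outputs)), down
```

Calling `run_on_cluster(["n001", "n002"], "uptime")` (hypothetical node names) would return a dict of per-node results plus a list of nodes that failed the probe, so a dead node costs at most one short ping timeout instead of stalling the whole run.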
I have found that running whatsup-pingd
(https://computing.llnl.gov/linux/whatsup.html) once every minute or so
to create lists of "up nodes" and "down nodes" is very handy. You can
even point pdsh WCOLL to the up
> -- greg
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing