Better rsh with timeout option?
alan at dasher.wustl.edu
Fri Feb 8 06:32:12 PST 2002
:(where lilith is a laptop that happens to be entirely off, so it doesn't
:even ping). In cases where the host pings but sshd happens to be turned
:off it tries four times and then quits -- about four seconds. I haven't
:tried it when a host is hung in exactly the state you describe (pingable
:so there is a route, but sshd nonresponsive but not quite dead because
:of thrashing) but it seems likely that it would be handled somewhere
:between these two extremes.
Not always -- there are times when this sort of assumption will fail,
and ssh will hang forever. We used to run into this trouble routinely
on our cluster when running large G98 jobs, and although the problems
diminished when we upgraded to the 2.4 kernels, they never entirely
disappeared.
I handled this by surrounding the ssh calls in my perl script with
alarms, such that the perl code enforced the timeout since ssh
wouldn't. This approach wasn't foolproof -- for reasons I don't
entirely understand, every once in a while the perl code would dump
core. However, since it was only a status monitor, I just set up a
cron job to restart it if it stopped.
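
For illustration only, here is a minimal sketch of the same idea: the calling script enforces the timeout because ssh won't. It is not the code from my status monitor. It is written in Python rather than perl, and it uses the timeout argument of subprocess.run in place of an alarm. The function names, the "uptime" probe command, and the 10-second default are all assumptions made for the example.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv and return (status, output).

    status is "ok" (exit code 0), "failed" (nonzero exit code),
    or "timeout" (the command did not finish within timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,   # never wait for keyboard input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # fold error text into the output
            timeout=timeout_s,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this,
        # so a hung ssh does not linger after the timeout.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout


def node_status(host, timeout_s=10.0):
    """Probe one node over ssh (hypothetical helper for this example).

    BatchMode=yes makes ssh fail instead of prompting for a password;
    "uptime" is just a cheap command chosen for the illustration.
    """
    argv = ["ssh", "-o", "BatchMode=yes", host, "uptime"]
    return run_with_timeout(argv, timeout_s)
```

A monitor loop would call node_status(host) for each node and treat "timeout" the same as "failed", so one thrashing node cannot stall the whole pass.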
If you're interested, you can grab the perl code for my status monitor
at http://dasher.wustl.edu/~alan/software/. psh2, which runs commands
on each node in turn, does not have this timeout code built in to it,
but that's because the user tells it which nodes to run on, and can
manually skip nonresponsive nodes.
Hope this helps,
Alan Grossfield
----------------------------------------------------------
| Programming: a pastime similar to banging one's head |
| against a wall, but with fewer opportunities for |
| reward. The Jargon File |
|----------------------------------------------------------
More information about the Beowulf mailing list
