[Beowulf] running out of rsh ports
strombrg at dcs.nac.uci.edu
Wed May 3 13:07:02 PDT 2006
On Wed, 2006-05-03 at 15:21 -0400, Joe Landman wrote:
> David Simas wrote:
> > Except that it probably won't help with the problem, which I'm
> > guessing is caused by a given host attempting more than 1024
> > RSH connections to a given server in less than TCP TIME WAIT
> > seconds (minutes, whatever). If the original correspondent
> Actually it handles exactly these cases. The FANOUT variable lets you
> indicate the appropriate parallelism for rsh. I believe pdsh is in use
> on the big clusters ( > 1024 nodes at the national labs )
Nod. I was pleased to learn of pdsh. FWIW, loop doesn't try to run all
n at once either; its degree of parallelism is controlled with a
command line option.
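
For anyone who hasn't seen the fan-out idea before, here is a rough
Python sketch of it. It is only an illustration of bounded parallelism,
not how pdsh or loop is actually implemented. The names run_on_host and
fan_out, the default of 32, and the host names in the usage note are all
made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def run_on_host(host, command, remote_shell="rsh"):
    """Run `command` on `host` via `remote_shell`; return (host, exit status).

    Hypothetical helper: assumes an rsh (or ssh) client is on PATH.
    """
    result = subprocess.run([remote_shell, host, command],
                            capture_output=True, text=True)
    return host, result.returncode


def fan_out(hosts, command, fanout=32, runner=run_on_host):
    """Run `command` on every host with at most `fanout` sessions in flight.

    The thread pool size is the cap: a new remote session starts only
    when an earlier one has finished. Results come back in host order.
    """
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        return list(pool.map(lambda h: runner(h, command), hosts))
```

Something like fan_out(["node001", "node002"], "uptime", fanout=16)
would then keep no more than 16 rsh sessions open at a time, however
long the host list is. Passing remote_shell="ssh" to run_on_host avoids
the reserved-port question altogether.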
> > doesn't want to use SSH for RSH, which would fix things
> True, and you can use ssh with pdsh. Or rsh. With no syntax change to
> the end user.
> > SSH isn't restricted to low-numbered ports, he could try to
> > re-implement his application in MPI.
> The basic question a few of us have is exactly what is Bruce and team
> doing that is causing them to run out of ports. Once we see this, we
> can stop guessing and make better/targeted suggestions.
Yup, and strace/truss/whatever is your friend for that:
...though based on the message, I'm guessing they are trying to run too
many rsh's in parallel, and hence running out of reserved ports.
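
A back-of-the-envelope version of that guess, as a Python sketch. Every
number in it is an assumption rather than something measured on their
cluster: that the client picks its source ports from 512-1023, that
each rsh session ties up two of them (one for the command, one for
stderr), and that a used port stays unavailable for a 60 second TIME
WAIT. The function name is made up for the example.

```python
def reserved_port_budget(first_port=512, last_port=1023,
                         ports_per_session=2, time_wait_s=60.0):
    """Rough limits on rsh sessions from one client host.

    All defaults are assumptions (see the text above), and the rate is
    a pessimistic bound: it treats every used port as blocked for the
    whole TIME WAIT interval.
    """
    usable = last_port - first_port + 1            # ports in the range
    max_concurrent = usable // ports_per_session   # sessions open at once
    max_rate = usable / (ports_per_session * time_wait_s)  # sessions/second
    return usable, max_concurrent, max_rate


usable, max_concurrent, max_rate = reserved_port_budget()
print(usable, max_concurrent, round(max_rate, 2))
```

Under those assumptions that is 512 ports, 256 sessions at once, and
only about 4 new sessions per second sustained, so a tight loop of
rsh's could plausibly hit the limit.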