RSH scaling problems...

Mon Dec 16 08:37:41 PST 2002

Hi Mike:

Look in your log files Luke...  

You might find the relevant error message at the tail end of
/var/log/messages.  Look for rshd or in.rshd errors.
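
A quick way to pull those lines out, as a rough sketch.  It assumes
syslog writes to /var/log/messages (this varies by distribution), and
the helper name rshd_errors is made up for illustration:

```shell
# Show the most recent rshd / in.rshd lines from a syslog file.
# Hypothetical helper; the default log path is an assumption.
rshd_errors() {
    logfile="${1:-/var/log/messages}"   # syslog file to search
    count="${2:-20}"                    # how many recent matches to show
    grep -E '(in\.)?rshd' "$logfile" | tail -n "$count"
}
```

Run it as "rshd_errors" right after a failed job, or pass another file,
e.g. "rshd_errors /var/log/secure 50", if your syslog sends daemon
messages elsewhere.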

  Some thoughts that might help if RSH is really the issue:

In later linuxes (linicies?) rsh spawning is done by xinetd.  You want
to make sure xinetd can spawn enough processes.  Look at the xinetd man
page, and the -limit option.  Adjust the /etc/xinetd.conf file to
reflect the limit (the instances and cps attributes are the usual
caps).  On one system I had to bump this pretty high to allow all the
connections to daemons.
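
To see what limits are currently in effect, something like the sketch
below may help.  instances, per_source and cps are xinetd.conf
attributes; the default paths and the helper name xinetd_limits are
assumptions about a typical layout, not something taken from your
system:

```shell
# Print the xinetd settings that cap how many servers may be spawned.
# With no arguments it searches the usual locations (an assumption).
xinetd_limits() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/xinetd.conf /etc/xinetd.d/*
    fi
    # -H prefixes each match with the file it came from
    grep -H -E '^[[:space:]]*(instances|per_source|cps)[[:space:]]*=' "$@"
}
```

If one of the values it prints is smaller than the number of
simultaneous rsh connections your job needs, raise it and restart
xinetd before retrying.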

If you are still using /etc/inetd.conf, you can tell it how many servers
it may spawn by including a .nservers at the appropriate part of the
line (though my memory is unclear as to which part).
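
For the inetd case, a small sketch for looking at the rsh entry.  On
the inetd variants I am aware of, the limit is written as a suffix on
the wait/nowait field, but check "man inetd.conf" on your system before
relying on that.  The helper name and the sample line in the comment
are illustrative only:

```shell
# Show the rsh ("shell") entry from an inetd.conf-style file.
# Assumption: the spawn limit is a suffix on the wait/nowait field,
# e.g. (illustrative line, paths and numbers vary):
#   shell stream tcp nowait.200 root /usr/sbin/tcpd in.rshd
inetd_shell_entry() {
    conf="${1:-/etc/inetd.conf}"
    # Commented-out entries start with '#', so they do not match.
    grep -E '^shell[[:space:]]' "$conf"
}
```

After editing the file, inetd needs to reread it (usually a HUP signal)
before the new limit takes effect.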

You might also be running out of network bandwidth.  Try running 

	vmstat 1
	netstat -cav

and see what your machine is doing network-wise.  Try grabbing the atop
program from freshmeat, and using that to summarize the net utilization
(or use ntop, or any of the others).
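
The netstat output gets long with a lot of nodes, so here is a sketch
that boils it down to a count per TCP state.  It assumes the Linux
net-tools column layout (state in the sixth column); the helper name
tcp_state_counts is made up:

```shell
# Summarize "netstat -ant" style output as a count per TCP state.
# Reads from stdin; assumes the state is in column 6 (Linux net-tools).
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' |
        sort -rn
}
```

Use it as "netstat -ant | tcp_state_counts" on the headnode while the
job is starting up.  A large pile of connections stuck in one state is
a hint about where the launches are stalling.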

Joe

On Sun, 2002-12-15 at 20:05, Mike S Galicki wrote:
> Can't seem to get rsh to scale past like 63 nodes with mpi jobs.  SSH
> scales much higher, but the performance is a lot worse in customer
> benchmark tests.  I'm guessing that I'm running out of pty's or tty's
> or something on the headnode. Anyone have some ideas?  I believe the
> default pty's in 2.4.20 is 1024, but when I list /dev/pty I only see
> 256 entries. MAKEDEV -m 1024 didn't seem to do anything past 256.
> 
> Mike Galicki
> Technical Consultant
> Linux Services Team
> San Francisco, CA
> Internet ID: mgalicki at us.ibm.com
-- 
Joseph Landman, Ph.D.
Scalable Informatics LLC
email:   landman at scalableinformatics.com
  web:   http://scalableinformatics.com
voice:  +1 734 612 4615
  fax:  +1 734 398 5774