RSH scaling problems
David Mathog mathog at mendel.bio.caltech.edu
Wed Dec 18 09:26:19 PST 2002
- Previous message: Implementing a Beowulf into the Computer Forensics Process?
- Next message: RSH scaling problems
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greg Lindahl wrote:

> Low ports can't be reused until TIME_WAIT time has passed.

True. Let's see what kind of a limit that imposes on rsh command rates on a typical system - RedHat 7.3 with a few servers and SGE running, over 100baseT.

I put 100 copies of a target node's name in a file and then did:

    time rsh -zf manycopies.txt hostname

That blew up. But initially it was because of the cps setting in xinetd.d for rsh, which picked up the default

    cps = 25 30

So I added

    cps = 250 10

to /etc/xinetd.d/rsh, restarted xinetd, and tried it again, whereupon it completed in 2.196 seconds real time. Running this 3 times in quick succession failed on the third run, and netstat on the target showed all the ports used up. On the node running rsh, netstat showed no TIME_WAIT connections. I think that means the target was closing the connection before the source. After a while (TIME_WAIT, presumably) these all dropped out of netstat and rsh to the target started working again.

Then I changed the target file so that it listed 50 copies of target1 and 50 copies of target2. That variation failed in the 6th iteration, further supporting the conjecture that the limit is on the target end.

So the rate for outgoing rsh from a given node seems not to be limited (at least by this effect) but the incoming rate to a node is limited. It jams up when about 290 ports are stuck in TIME_WAIT. TIME_WAIT on Linux is 60 seconds (I think). So the average sustainable rate of incoming rsh (or rlogin, or rcp) commands is about 290/60, or just less than 5 per second. cps set to 250 is overly optimistic as well, if all the rsh commands come from one source, since the fastest that rsh can send them (my modified version, which basically runs rcmd() in a loop) is only about 50/second. This was over 100baseT; maybe you can go higher with Myrinet.

Which means, I suppose, that if you want to fire a lot of commands from one machine to another, putting rsh inside a loop is a bad idea.
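
The arithmetic behind those observations can be checked with a short sketch. It uses only the figures reported above (about 290 ports stuck in TIME_WAIT, a 60 second TIME_WAIT that the post itself marks as uncertain) and assumes the test runs finish well inside that window, so that no port is released between runs. The function names are made up for illustration.

```python
# Back-of-envelope model of the TIME_WAIT limit described above.
# The figures (290 ports, 60 s) are the ones reported in this post;
# the function names are hypothetical.

def sustainable_rate(port_limit: int, time_wait_s: float) -> float:
    """Average incoming connections per second the target can sustain."""
    return port_limit / time_wait_s


def first_failing_run(port_limit: int, connections_per_run: int) -> int:
    """1-based index of the first back-to-back run that exhausts the ports.

    Assumes every run finishes well inside the TIME_WAIT window, so no
    port on the target is released between runs.
    """
    return port_limit // connections_per_run + 1


if __name__ == "__main__":
    print(f"{sustainable_rate(290, 60):.2f} rsh/second")
    print("100 connections per run:", first_failing_run(290, 100))
    print(" 50 connections per run:", first_failing_run(290, 50))
```

Under these assumptions the model gives about 4.83 commands per second, a failure on run 3 with 100 connections per target per run, and a failure on run 6 with 50 per target, which is consistent with a limit on the target side.
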
Better to start up one rsh, leave it running, and pipe the commands through it to some target process which runs them on the other end without dropping the connection between commands.

ANYWAY, going back to the original post by Mike Galicki, he should check that the xinetd cps value (or equivalent, if it isn't Linux) isn't setting the upper limit. Possibly he can get more throughput by raising it. Failing that, perhaps one of the other MPI devices keeps a line open all the time and so bypasses this limit entirely?

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
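
One possible way to follow the "start one rsh and pipe the commands through it" suggestion is sketched below. This is not the modified rsh used in the tests above. It assumes a stock rsh that forwards standard input to the remote command, and it assumes that a plain `sh` on the far end is an acceptable target process. The function name and the host name `target1` in the usage comment are placeholders.

```python
import subprocess


def run_over_one_connection(commands, remote_shell_argv):
    """Feed many commands to a single remote shell over one connection.

    remote_shell_argv is the command that opens the connection, for
    example ["rsh", "target1", "sh"] (an assumption, not taken from the
    post). The commands are written to its standard input one per line,
    so only one connection is opened however many commands are sent.
    """
    script = "".join(cmd + "\n" for cmd in commands)
    result = subprocess.run(
        remote_shell_argv,
        input=script,          # all commands go down the same pipe
        capture_output=True,
        text=True,
        check=True,            # raise if the remote shell exits non-zero
    )
    return result.stdout


# Hypothetical usage, replacing 100 separate rsh invocations:
#   output = run_over_one_connection(["hostname"] * 100,
#                                    ["rsh", "target1", "sh"])
```

Because only one connection is made, only one port on the target would end up in TIME_WAIT, regardless of how many commands are sent. The trade-off is that the output of all commands arrives as a single stream, so the caller has to separate it if per-command results are needed.
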
More information about the Beowulf mailing list
