tim.carlson at pnl.gov
Thu Oct 4 21:24:18 PDT 2001
On Thu, 4 Oct 2001, Donald Becker wrote:
> > If you were running
> > 1000 small jobs in a couple of minutes I could imagine having problems
> > authenticating against any non-local mechanism.
> Hmmm, a reasonable goal is running a small cluster-wide job every
> second. I suspect the NIS delays alone take longer than one second with
> just a few nodes.
So I ran the following test on one of our small clusters: six NIS client
nodes with one NIS master (the front end node) and no NIS slave servers.
The nodes are dual 800 MHz Pentium IIIs connected by a Fast Ethernet switch.
Forgive my sloppy C shell programming :)
The "script" is basically 100 rsh calls plus some NIS work to look up
the ownership of a file.
I am doing an ls -l on /tmp, which contains only 3 or 4 files, but I own
two of them, so NIS is consulted for file ownership. I took NFS delays out
by going to /tmp.
#!/bin/csh -f
set i = 0
while ($i < 100)
    rsh $1 ls -l /tmp > /dev/null
    set i=`expr $i + 1`
end
[tim at frontend-0 tim]$ time ./script compute-0-0
So if the job takes zero time and connecting to a machine takes zero time,
then the NIS overhead is about 1/8 of a second per rsh call. I ran this
half a dozen times and the runs varied between 10 and 13 seconds.
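
As a quick check of that estimate, here is a minimal Python sketch of the
arithmetic. It assumes, as above, that the whole wall-clock time of a run
is charged to NIS; the function and constant names are only illustrative.

```python
# Upper bound on per-call overhead, charging the whole run to NIS.
CALLS_PER_RUN = 100  # iterations of the rsh loop in the script


def per_call_seconds(run_seconds, calls=CALLS_PER_RUN):
    """Wall-clock seconds per rsh call for one run of the script."""
    return run_seconds / calls


fastest = per_call_seconds(10)  # fastest observed run
slowest = per_call_seconds(13)  # slowest observed run
print(f"{fastest:.2f} to {slowest:.2f} s per call")
```

The 1/8 of a second figure (0.125 s) falls inside that range.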
Now I point this script at 6 nodes at the same time (or at least as fast
as I can type a return in 6 xterms) and the mean time per run is about 31
seconds. That puts my potential NIS delay at a maximum of about 1/3 of a
second per call. But I have also launched 600 jobs in 31 seconds.
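
The same numbers can be turned into an aggregate launch rate. This is only
a sketch of the arithmetic; it treats the six runs as fully overlapping,
which is an approximation given that they were started by hand.

```python
# Aggregate launch rate when six copies of the script run concurrently.
NODES = 6                # one copy of the script per compute node
CALLS_PER_RUN = 100      # rsh calls per copy
MEAN_RUN_SECONDS = 31.0  # approximate mean wall-clock time per run

total_jobs = NODES * CALLS_PER_RUN
jobs_per_second = total_jobs / MEAN_RUN_SECONDS
per_call_seconds = MEAN_RUN_SECONDS / CALLS_PER_RUN  # as seen by one client

print(f"{total_jobs} jobs, {jobs_per_second:.1f} jobs/s, "
      f"{per_call_seconds:.2f} s per call")
```

That works out to roughly 19 job launches per second across the cluster.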
Two examples from the larger test:
[tim at frontend-0 tim]$ date; time ./script compute-0-0
Thu Oct 4 21:06:53 PDT 2001
[tim at frontend-0 tim]$ date; time ./script compute-0-2
Thu Oct 4 21:06:52 PDT 2001
Here is "ps -ax | grep ypserv" on the master node, before and after:
639 ? S 73:08 ypserv
639 ? S 73:10 ypserv
So I used 2 seconds of CPU time with ypserv.
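
For reference, a small Python sketch of that subtraction. The per-call
figure rests on an assumption not stated above: that the two ps snapshots
bracket the six-node run of 600 rsh calls. ps also reports whole seconds
only, so treat the result as a rough order of magnitude.

```python
def cpu_seconds(ps_time):
    """Convert a ps TIME field of the form MM:SS into seconds."""
    minutes, seconds = ps_time.split(":")
    return int(minutes) * 60 + int(seconds)


before = cpu_seconds("73:08")
after = cpu_seconds("73:10")
used = after - before

# ASSUMPTION: the snapshots bracket the six-node run (600 rsh calls).
ASSUMED_CALLS = 600
ms_per_call = 1000 * used / ASSUMED_CALLS
print(f"ypserv CPU: {used} s total, about {ms_per_call:.1f} ms per call")
```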
My first version of the script was a "touch /tmp/testfile" and produced
similar results. My /etc/nsswitch.conf files say "files nis" and the only
entry in /etc/passwd on the compute nodes is root.
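
To illustrate why the ls -l in the test touches NIS at all: the owner of
each file is stored as a numeric uid, and turning it into a login name
goes through the C library's passwd lookup, which follows the order in
/etc/nsswitch.conf. With "files nis" and only root in /etc/passwd, any
non-root owner falls through to NIS. A minimal Python sketch of the same
lookup (the function name is hypothetical, and this is not what ls itself
runs, only the equivalent library path):

```python
import os
import pwd


def owner_name(path):
    """Return the owner's login name for path, or the uid as a string."""
    uid = os.stat(path).st_uid
    try:
        # Goes through the C library, so it follows /etc/nsswitch.conf.
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry from any source; fall back to the number.
        return str(uid)


print(owner_name("/tmp"))
```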
I am willing to be enlightened as to how my test is flawed. I'll run
different tests if asked. Is my test too trivial?
Voice: (509) 376-0300
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support