Robert G. Brown
rgb at phy.duke.edu
Fri Oct 5 04:48:27 PDT 2001
On Thu, 4 Oct 2001, Tim Carlson wrote:
> On Thu, 4 Oct 2001, Greg Lindahl wrote:
> > Don Becker gave some good reasons why NIS isn't used by most big
> > clusters. The investment bank I used to work for (5 years ago) made
> > every machine a slave in order to avoid NIS's braindamage.
> Before bashing NIS completely, maybe a qualifier of "sucks on Linux"
> would be appropriate? I ran NIS with one master and 50 slaves in a Solaris
> environment (2.5 -> 2.8) for 5 years and can't think of one problem
> (outside the time we tried to add a slave and did it incorrectly).
> Ran NIS on a 32 node Linux cluster based on RedHat 6.2 for six
> months and never had a problem with NIS.
We actually run NIS in the physics department, partly because we've been
doing so continuously since sometime in the mid-to-late eighties. We
started on SunOS, migrated to Solaris (linux servers just didn't work at
the time) and some years ago migrated to linux only -- we have pretty
much a straight linux operation at this point with the usual exception
of the half dozen WinXX systems that we don't take care of and a
selection of lab and classroom NT systems that we have to run to be able
to use certain physics/teaching software. NIS certainly works "well
enough" in a linux-only departmental environment (up to several hundred
users and machines).
However, I agree that NIS sucks on a true beowulfish cluster, for the
reasons Don gave: it adds a lot of overhead and is a resource bottleneck.
Remember, if a system is stat'ing a lot of files it continuously has to
go back and check access permissions, which typically involves an NIS
hit. On the other hand, it does make managing users in a midsized
organization easier and generally works pretty well. So if you are
building a cluster to do embarrassingly parallel, long-running compute
jobs (so you are unlikely to encounter the contention problems Don
described) NIS is a perfectly reasonable way to manage user accounts.
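To make the lookup cost concrete, here is a minimal sketch (not anything we
actually run) of the pattern that generates the traffic: one name-service
query per file while walking a directory. It assumes passwd lookups are routed
through the name service switch to NIS, so each pwd.getpwuid() call can turn
into a network round trip. The helper names owner_name and list_owners are
made up for illustration. Memoizing the lookup in the client, roughly what a
caching daemon does system-wide, collapses the queries to one per distinct uid.

```python
import os
import pwd
from functools import lru_cache


@lru_cache(maxsize=None)
def owner_name(uid: int) -> str:
    """Map a uid to a login name, asking the name service only once per uid.

    Assumption: on an NIS client, pwd.getpwuid() may end up querying the
    NIS server, so every uncached call is a potential network hit.
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Unknown uid: fall back to the number, as "ls -l" does.
        return str(uid)


def list_owners(directory: str) -> dict:
    """Return {filename: owner name} for every entry in a directory."""
    owners = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            owners[entry.name] = owner_name(st.st_uid)
    return owners
```

After a call such as list_owners("/tmp"), owner_name.cache_info() shows how
many lookups actually left the process (misses) versus how many were served
from the cache (hits). The tradeoff is the usual one: a cached answer can be
stale if an account changes while the process is running.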
Unfortunately, we are pretty darned certain that NIS leaks memory; if it
runs a "long time" it tends to crash, and can easily drag down clients
in mid-query with it. So it also sucks because it is somewhat broken,
at least compared to the tremendous stability of most of the core OS
services. I believe Seth (our systems guy) has NIS servers restart once
a day to clear their memory allocation to avoid this problem.
NIS is old and should almost certainly be thoroughly redesigned. We may
one day transition to LDAP for directory services but in the meantime
NIS or SOME sort of distributed database service is in my opinion
necessary in order to manage a midsize organization (although it is
generally NOT necessary to manage a cluster with only a few, slowly
varying users). E.g. rsyncing /etc/passwd doesn't scale very well when
you have hundreds of users moving freely from machine to machine among
hundreds of machines, although of course one can work very hard and build
decent enough scripts to avoid some of the problems. In fact, kludging
together one-of-a-kind (nonportable) solutions to problems like this is
the very definition of things that don't scale well and are generally
"bad ideas" in systems administration. It is heartening that the Scyld
folks are working on a truly lightweight replacement -- one can only
hope that it is portable back to mainstream linux.
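As an illustration of the kind of problem such scripts have to handle, here is
a minimal sketch, assuming standard seven-field passwd lines, of a check for
accounts that have drifted apart between machines: the same login with
different uids, or the same uid handed to different logins. The function names
and the host-to-text mapping are hypothetical; how the passwd files get
collected is left out.

```python
from collections import defaultdict


def parse_passwd(text):
    """Yield (login, uid) pairs from passwd-format text."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and NIS compat entries such as "+::::::".
        if not line or line[0] in "#+-":
            continue
        fields = line.split(":")
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        yield fields[0], int(fields[2])


def find_conflicts(passwd_by_host):
    """passwd_by_host maps a host name to the text of its passwd file.

    Returns (login_conflicts, uid_conflicts):
      login_conflicts: {login: [uids]} for logins with more than one uid
      uid_conflicts:   {uid: [logins]} for uids shared by several logins
    """
    uids_by_login = defaultdict(set)
    logins_by_uid = defaultdict(set)
    for text in passwd_by_host.values():
        for login, uid in parse_passwd(text):
            uids_by_login[login].add(uid)
            logins_by_uid[uid].add(login)
    login_conflicts = {name: sorted(uids)
                       for name, uids in uids_by_login.items() if len(uids) > 1}
    uid_conflicts = {uid: sorted(names)
                     for uid, names in logins_by_uid.items() if len(names) > 1}
    return login_conflicts, uid_conflicts
```

Detecting the conflict is the easy part. Deciding which machine is right, and
fixing file ownership afterwards, is where the one-of-a-kind scripts start to
pile up.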
> Tim Carlson
> Voice: (509) 376-0300
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu