diskless clients? beowulf-newbie seeks advice

Fri Jun 22 11:56:46 PDT 2001

couple of follow-up questions....

********************
I personally try to advocate diskless clients whenever I get a chance. 
There are several reasons for this:
1.	Administration is much easier if you don't have local data or OS or
anything that won't be fixed by a simple power cycle.  If a problem
persists through that, you know it's hardware (I guess it could be BIOS
setup, but I'd consider that hardware).
********************
this is easy enough to automate though...a single shell script that rcp's
the changed files to all the nodes (for configuration files).  Then I was
thinking of nfs exporting any applications that are needed...simply install
them into /usr2, and share out /usr2...wouldn't that handle that?
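
For what it's worth, here is a rough sketch of that "push the changed files to every node" idea, written in Python rather than shell. The node names and file list are made up for illustration, and the function only builds the rcp command lines (it doesn't run anything), so treat it as a starting point and not a finished tool.

```python
import shlex


def build_push_commands(nodes, files, copy_cmd="rcp"):
    """Return one argv list per (node, file) pair. Nothing is executed."""
    commands = []
    for node in nodes:
        for path in files:
            # rcp form: rcp <local path> <host>:<remote path>
            commands.append([copy_cmd, path, f"{node}:{path}"])
    return commands


if __name__ == "__main__":
    # Hypothetical node names and config files, for illustration only.
    nodes = [f"node{i:02d}" for i in range(1, 4)]
    files = ["/etc/hosts", "/etc/passwd"]
    for argv in build_push_commands(nodes, files):
        # Print the commands as a dry run instead of running them.
        print(shlex.join(argv))
```

Printing first and running later makes it easy to eyeball what would be copied where before touching a hundred nodes.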

********************
2.	You do save money on hard disks, but have to spend a little more on
NICs.  What this does mean though, is that you can put a more advanced
storage system on the server (or network attached storage).  Putting
SCSI RAID or such on the server will increase performance and/or
reliability (I think Gig E is faster than most IDE drives and I know
Myrinet is, so you should improve total system performance).
********************

well, we do use 15k rpm scsi drives, but yeah....ya want to do as few reads
as possible even then.  Just seems with a potential of there being 100 of
these nodes, I very well may have to boot one every few days...and that
would be lots of traffic.  At that point, is it advisable to have a worldly
node set aside for doing the boot-up?  Hell, I could have everything
bootstrap off their 100bt ports, then actually work off a gig-ethernet
port...hmmm...
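
A back-of-envelope way to size the boot traffic, as a sketch only: the 50 MB boot image below is an assumed number (not measured on any real node), and the calculation ignores NFS/TFTP protocol overhead and server disk limits, so real numbers will be worse.

```python
def transfer_seconds(size_mb, link_mbit_per_s, efficiency=1.0):
    """Seconds to move size_mb megabytes over a link rated in megabits/s."""
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    # 8 bits per byte; efficiency scales down the usable link rate.
    return size_mb * 8 / (link_mbit_per_s * efficiency)


def per_node_mbit(link_mbit_per_s, active_nodes):
    """Even share of one server link when several nodes pull at once."""
    if active_nodes <= 0:
        raise ValueError("active_nodes must be > 0")
    return link_mbit_per_s / active_nodes


if __name__ == "__main__":
    image_mb = 50  # assumed boot image size, purely illustrative
    for name, mbit in (("100bt", 100), ("gig-e", 1000)):
        print(name, transfer_seconds(image_mb, mbit), "seconds per boot")
    # If all 100 nodes hit one gig-e server link at the same time:
    print(per_node_mbit(1000, 100), "Mbit/s per node")
```

One reboot every few days is small next to the data traffic; the case to watch is every node pulling from the server at the same time.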

**************************
3.	In theory, you shouldn't be writing to the local disks in a program
anyway.  This will slow your computations waaaay down.  Same thing for
swapping to disk.
4.	You only have one copy of data, so not only do you save in total
storage costs (assuming cost/byte is fixed), but you also only have one
copy of the data so you don't have to worry about inconsistencies
between nodes (there are other ways to get the hard drives consistent,
but they are a bit of a pain).  Also, as a sys-admin it is nice to only
have one place to look if there are problems.
*************************
the data would obviously be shared out, yeah.  With a possibility of a
multi-terabyte database that the cluster is querying in a year, there's no
way I'd put a terabyte on each node..hehe.  I mean, IBM can say all they
want about low-end drives being 400gigs in a year or so, but...  point
being, whether they are diskless or not they'll be pulling in all their data
locally.  It's just the OS that would be local, and perhaps any applications
I'd want on each node.

**************************
5.	You can call it intuition if you like (although it's based on
facts), but I really think this is the way clusters are going.  This is
the way that big systems like the Cray T3E work.  It's a lot simpler for
programmers (It took me a while to explain how to write to local disks
vs. server disks).  It's also just a lot more elegant, which may sound
like a cop out, but most good solutions are elegant.
*************************
Elegance is certainly not a cop-out in my book.  The more elegant something
is, the better it works...this is almost always true.

**********************************
Having said all that, it is important to note that diskless nodes are
not for everyone.  In fact our cluster is not diskless, and we aren't
looking at getting diskless nodes any time soon (give me a few years). 
Right now it doesn't meet our needs since we need the local disk and we
have our disked cluster working fine.

Hope this helps a little.
Jared
*************************************

Any tips are helpful.  I'm just sittin here trying to decide which would be
better for -our- particular application.  When is it better for there to be
disked-clients?  Is diskless pretty much something I should obviously do
considering the fact that the cluster will be querying a huge database
anyway?

Brian LaMere
Diversa

Brian LaMere wrote:
> 
> why does every guide around talk about diskless clients?  I mean...disks
> are stinkin cheap nowadays...
> 
> I have ~$150,000 to make a test cluster (with WAY more if the test cluster
> shows worth) but the boss-man wants to go with nodes which aren't exactly
> "commodity" in my book.  dual p3-1000 with 1.25GB ram, 15krpm 18GB drives.
> The things cost $8k+ each...tried to explain that 148 $1k machines would
> way outperform 16 $8k machines, but...oh well.  These boxes take up 1u,
> which seems to be their main selling point (HP's lp1000r).  Fortunately,
> these boxes are down to $6.5k now in cost (dropped a bit since we bought
> them a couple months back), but still...
> 
> on to my point.  Getting PVM to see everyone as one happy little family
> was easy enough.  Got the network guys to isolate the little guys, so that
> only the worldly node could see them, since I wasn't happy with opening up
> everything and simply putting a little all:all in hosts.deny, and having
> that be all the security I had.  But every guide that I've found has been
> all about diskless nodes for a beowulf.  And this isn't really a beowulf
> with just pvm (and soon lam-mpi and mpich), right?  I personally thought
> that the network nfs/tftp traffic would be horrible if they were all
> diskless clients...
> 
> so the real question:  I can put gig-e cards in the boxes instead of hard
> drives...right now they just have 2 100bt enet connections.  I'm only
> using one of the enet ports at the moment, too.  Would I be better with no
> disks, and gig-e instead?  Some of the concerns I have here: though we're
> only starting with a hundred gigs or such of data, we'll be at
> multi-terabyte within a year.  To be throwing around data that large,
> while nfs'ing the OS filesystems (on the clients) just seems like a lot
> for the boxes to do.  Am I looking at it wrong?  Also, for cost reasons we
> may be doing our data storage on something as tacky as network attached
> storage; we were looking at some NetApp boxes, but went with some EMC
> boxes instead.  Note I'm not talking about a symmetrix box or something (I
> already have one of those housing my oracle data), but instead an EMC
> product called an "ip4700."  Not all that impressed with it.
> 
> Just a little genetics research firm, needing some serious horsepower to
> start running big hammer and blast jobs.  The data we have now is just the
> bare minimum we need to get by, but if we had things like a working
> beowulf the scientists upstairs would start making much more data, since
> they'd be able to use it.  They hired me on as the unix guy here knowing I
> don't know squat about beowulfs, but that I really want to learn :)  Got
> "how to build a beowulf" <grin> and I've read the manuals for pvm, mpich,
> lam-mpi, etc, and several other beowulf how-to guides.  All are about
> diskless.  Is diskless better?  Is it just better because it's cheaper?
> Are there other reasons it's better?  Would having gig-ethernet in the
> boxes instead of hard drives be far better performance-wise?
> 
> Brian LaMere
> Diversa
> 

-- 
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759

Phone: 512-232-4460
Fax: 512-471-9096
Email: Jared_Hodge at iat.utexas.edu