diskless clients? beowulf-newbie seeks advice

Brian LaMere blamere at diversa.com
Fri Jun 22 10:55:08 PDT 2001

why does every guide around talk about diskless clients?  I mean...disks are
stinkin cheap nowadays...

I have ~$150,000 to make a test cluster (with WAY more if the test cluster
proves its worth) but the boss-man wants to go with nodes which aren't exactly
"commodity" in my book: dual P3-1000 with 1.25GB RAM and 15k rpm 18GB drives.
The things cost $8k+ each...tried to explain that 148 $1k machines would way
outperform 16 $8k machines, but...oh well.  These boxes take up 1U, which
seems to be their main selling point (HP's lp1000r).  Fortunately, these
boxes are down to $6.5k now (dropped a bit since we bought them a
couple months back), but still...
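
For what it's worth, here is a rough sketch of the sticker-price arithmetic
behind that comparison, in Python.  It only divides the ~$150,000 budget by
the per-node prices quoted above.  The function name is made up for
illustration, and switches, racks, and power are ignored, so the counts come
out a little higher than the 148 and 16 quoted above.

```python
def nodes_for_budget(budget_usd, price_per_node_usd):
    """Whole nodes that fit in the budget at sticker price only."""
    return budget_usd // price_per_node_usd


BUDGET_USD = 150_000  # the ~$150,000 test-cluster budget

# (label, price per node in USD), using the prices quoted in this post
OPTIONS = [
    ("$1k commodity box", 1_000),
    ("lp1000r at $8k", 8_000),
    ("lp1000r at $6.5k", 6_500),
]

for label, price in OPTIONS:
    print(f"{label}: {nodes_for_budget(BUDGET_USD, price)} nodes")
```

Node count alone says nothing about per-node speed, so this is only the
starting point for the argument, not the whole of it.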

on to my point.  Getting PVM to see everyone as one happy little family was
easy enough.  I got the network guys to isolate the little guys so that only
the worldly node can see them, since I wasn't happy with opening everything
up and having a little all:all in hosts.deny be all the security I had.
But every guide that I've found has been all about diskless nodes for a
beowulf.  And this isn't really a beowulf with just PVM (and soon LAM-MPI
and MPICH), right?  I personally thought that the network NFS/TFTP traffic
would be horrible if they were all diskless clients...

so the real question:  I can put gig-e cards in the boxes instead of hard
drives...right now they just have two 100BT ethernet ports, and I'm only
using one of them at the moment.  Would I be better off with no disks
and gig-e instead?  Some of the concerns I have here: though we're only
starting with a hundred gigs or so of data, we'll be at multi-terabyte
within a year.  Throwing around data that large while also NFS-mounting the
OS filesystems (on the clients) just seems like a lot for the boxes to do.
Am I looking at it wrong?  Also, for cost reasons we may be doing our data
storage on something as tacky as network attached storage; we were looking
at some NetApp boxes, but went with some EMC boxes instead.  Note I'm not
talking about a Symmetrix box or something (I already have one of those
housing my Oracle data), but instead an EMC product called an "ip4700."  Not
all that impressed with it.
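
A back-of-envelope sketch of what those data sizes mean on the wire, assuming
the link itself is the only bottleneck.  The 70% efficiency figure and the
function name are assumptions for illustration, not measurements; real NFS
throughput also depends on the server, its disks, and the switch.

```python
def transfer_hours(data_gb, link_mbit_per_s, efficiency=0.7):
    """Hours to move data_gb (decimal gigabytes) over one link.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    bits = data_gb * 8e9  # gigabytes -> bits
    usable_bits_per_s = link_mbit_per_s * 1e6 * efficiency
    return bits / usable_bits_per_s / 3600


# 100 GB today, roughly 1 TB (1000 GB) as a stand-in for "multi-terabyte"
for data_gb in (100, 1000):
    for link_mbit in (100, 1000):  # 100BT vs gig-e
        hours = transfer_hours(data_gb, link_mbit)
        print(f"{data_gb} GB over {link_mbit} Mbit/s: {hours:.1f} h")
```

Whatever the real efficiency turns out to be, the ratio between the two links
stays at 10x, and this counts only the data itself, not the OS traffic a
diskless client would add on the same wire.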

Just a little genetics research firm, needing some serious horsepower to
start running big HMMER and BLAST jobs.  The data we have now is just the
bare minimum we need to get by, but if we had something like a working
beowulf, the scientists upstairs would start generating much more data,
since they'd be able to use it.  They hired me on as the unix guy here
knowing I don't know squat about beowulfs, but that I really want to
learn :)  Got "How to Build a Beowulf" <grin> and I've read the manuals for
PVM, MPICH, LAM-MPI, etc, and several other beowulf how-to guides.  All are
about diskless.  Is diskless better?  Is it just better because it's
cheaper?  Are there other reasons it's better?  Would having gig-ethernet
in the boxes instead of hard drives be far better performance-wise?

Brian LaMere
