[Beowulf] 512 nodes Myrinet cluster Challenges

Mark Hahn hahn at physics.mcmaster.ca
Sun Apr 30 09:42:57 PDT 2006

> > By the way, the idea of rolling-your-own hardware on a large cluster, and
> > planning on having a small technical team, makes me shiver in horror.  If
> > you go that route, you better have *lots* of experience in clusters, and
> > make very good decisions about cluster components and management methods.
> > If you don't, your users will suffer mightily, which means you will suffer
> > mightily too.

I believe that overstates the case significantly.

some clusters are just plain easy.  it's entirely possible to buy a
significant number of conservative compute nodes, toss them onto a generic
switch or two, and run the whole thing for a couple years without any real
effort.  I did it, and while I have a lot of experience, I didn't apply any
deep voodoo for the cluster I'm thinking of.  it started out with a good 
solid login/file/boot server (4U, 6x scsi, dual-xeon 2.4, 1G ram), a single
48pt 100bt (1G up) switch, and 48 dual-xeon nodes (diskful but not
disk-booting).  it was a delight to install, maintain and manage.
I originally built it with APC controllable PDUs, but in the process of 
moving it, stripped them out as I didn't need them.  (I _do_ always require
net-IPMI on anything newly purchased.)  I've added more nodes to the cluster
since then - dual-opteron nodes and a couple GE switches.

> For clusters with more than perhaps 16 nodes, or EVEN 32 if you're
> feeling masochistic and inclined to heartache:

with all respect to rgb, I don't think size is a primary factor in cluster 
building/maintaining/etc effort.  certainly it does eventually become a
concern, but that's primarily a statistical result of MTBF/nnodes.  it's
quite possible to choose hardware to maximize MTBF and minimize configuration risk.
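
as a rough illustration of the MTBF/nnodes point, here is a minimal sketch. it assumes nodes fail independently at a constant rate, and the per-node MTBF of 50,000 hours is a made-up placeholder, not a measured figure for any hardware mentioned here. the function name is likewise just for illustration.

```python
def cluster_mtbf_hours(node_mtbf_hours: float, nnodes: int) -> float:
    """Mean time between node failures anywhere in the cluster.

    Assumes independent nodes with constant failure rates, so the
    cluster-wide failure rate is simply nnodes times the per-node rate.
    """
    if nnodes <= 0:
        raise ValueError("nnodes must be positive")
    return node_mtbf_hours / nnodes


# Hypothetical per-node MTBF; substitute a figure for your own hardware.
NODE_MTBF_HOURS = 50_000.0

for n in (32, 48, 512, 768):
    days = cluster_mtbf_hours(NODE_MTBF_HOURS, n) / 24
    print(f"{n:4d} nodes: one node failure every {days:.1f} days on average")
```

the per-node reliability is the same in every row; only the frequency of "something broke today" changes with node count, which is why hardware with a higher MTBF pays off more as the cluster grows.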

in the cluster above, I chose a chassis (AIC) which has a large centrifugal
blower, rather than a bunch of 40mm axial/muffin fans.  a much larger cluster
I'm working on now (768 nodes) has 14 40mm muffin fans in each node!  while
I know I can rely on the vendor (HP) to replace failures promptly and without
complaint, there's an interesting side-effect: power dissipation.  12 of the
fans pointing at the CPUs are actually paired inline, and each pair is rated to
dissipate up to 20W.  so a node that idles at 210W and draws 265W under full
load can easily consume 340W if the fans are ramped up.  ouch!
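
the per-node arithmetic can be sketched as follows, using only the figures quoted above (12 CPU-facing fans paired inline, 20W rated per pair, 210W idle, 265W full load, 340W with fans ramped). the variable names are mine, and how much fan power is already included in the 265W figure is not stated, so the rated total and the observed increase are computed separately.

```python
# Figures quoted in the message, all per node, in watts.
IDLE_W = 210
FULL_LOAD_W = 265
FANS_RAMPED_W = 340

CPU_FANS = 12        # fans pointing at the CPUs, paired inline
PAIR_RATED_W = 20    # rated dissipation per inline pair

pairs = CPU_FANS // 2
rated_cpu_fan_w = pairs * PAIR_RATED_W          # upper bound for the CPU fans
extra_when_ramped_w = FANS_RAMPED_W - FULL_LOAD_W

print(f"{pairs} fan pairs, rated up to {rated_cpu_fan_w} W in total")
print(f"fans ramped adds {extra_when_ramped_w} W over full load")
print(f"that is {extra_when_ramped_w / FULL_LOAD_W:.0%} more than full load")
```

the observed increase stays below the rated maximum for the CPU fans, which is consistent with the fans already drawing some power at normal speed.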

this is probably the most significant size-dependent factor for me.  if
you're doing your own 32-node cluster, it's pretty easy to manage the
cooling.  the difference between dissipating 300 and 400W per node is less
than a ton of chiller capacity.  scraping up 10-20 additional tons of capacity
is quite a different proposition.
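
to put numbers on that, here is a small sketch converting extra per-node dissipation into tons of cooling. it uses the standard definition of a ton of refrigeration (12,000 BTU/h, about 3.517 kW) and assumes every watt drawn ends up as heat the chiller must remove. the 100W delta comes from the 300W vs 400W comparison above; the function name and the choice of node counts to print are illustrative.

```python
# One ton of refrigeration is 12,000 BTU/h, roughly 3516.85 W.
WATTS_PER_TON = 3516.85


def tons_of_cooling(nnodes: int, watts_per_node: float) -> float:
    """Chiller capacity (tons) needed to remove nnodes * watts_per_node of heat."""
    return nnodes * watts_per_node / WATTS_PER_TON


EXTRA_W_PER_NODE = 100  # the 300W vs 400W difference discussed above

for n in (32, 512, 768):
    tons = tons_of_cooling(n, EXTRA_W_PER_NODE)
    print(f"{n:4d} nodes x {EXTRA_W_PER_NODE} W extra = {tons:.1f} tons")
```

at 32 nodes the extra load stays under one ton; at several hundred nodes the same per-node difference lands in the 10-20 ton range or above.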

regards, mark hahn.
