Robert G. Brown
rgb at phy.duke.edu
Fri Mar 28 12:07:30 PST 2003
On Fri, 28 Mar 2003 jbbernard at eng.uab.edu wrote:
> Hi all,
> We're renovating a classroom for the use of a cluster that does not yet
> exist. We're hoping it will reach 1000 nodes in a couple of years, but it's
> likely that it will have to grow piecemeal as funding allows.
> Notwithstanding that, we hope to administer the machines as a single
> My question is: how much room (in racks, or rack-units) would you allocate
> for the non-compute-node hardware, i.e., the networking equipment, file
> servers, backup systems, management nodes, etc., for 1000 nodes? I'm sure
> this greatly depends on how we set things up, and I'm happy to be reminded
> of the many variables that need to be considered. (In our case, the UPSs and
> air-conditioning units will be in an adjacent room, so luckily they don't factor
> in.) Data for existing clusters would of course also be helpful.
This is tough to answer, because networking equipment, file servers,
backup systems and so forth can almost always be cased for racks (or in
the worst case, installed on rack shelves) and the number of them depends
entirely on your cluster's particular requirements. Even things like
the density of switch ports depends a lot on what kind of switching you
select, which also depends on what your primary "parallel" network is.
If you use SCI interconnects, for example, you may not need a switch at
all except perhaps for 100BT connections to the nodes for installation
and maintenance purposes, but you may also have some physical
constraints on rack layout to keep the internode connects topologically
For 100BT ports alone for 1000 nodes you'll likely need a full 45U rack,
twice that if you use patch panels, guesstimating something like 24 ports
per U (although there may well be switches with higher per U density).
OTOH, if you have a lot of higher speed ports, they may have lower port
density. If you have both 100BT and Myrinet switches, you'll have to
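
The port arithmetic above can be sketched in a few lines of Python. This is
only a rough illustration: the function name and parameters are hypothetical,
the 24 ports per U is the guess quoted above, and the sketch ignores uplink
ports and any second (Myrinet, SCI) network entirely.

```python
import math

def switch_rack_units(nodes, ports_per_u=24, patch_panels=False):
    """Rack units needed to give every node one 100BT switch port.

    Assumptions (illustrative only): a uniform port density per U,
    no ports reserved for uplinks, and patch panels simply doubling
    the space, as in the estimate in the text.
    """
    units = math.ceil(nodes / ports_per_u)
    return units * 2 if patch_panels else units

# 1000 nodes at 24 ports per U, without and with patch panels
print(switch_rack_units(1000))
print(switch_rack_units(1000, patch_panels=True))
```

Changing ports_per_u to match a real switch's density is the obvious first
adjustment; lower-density high-speed ports push the number up quickly.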
You also have to worry a LOT about how you're going to partition the
switches -- I don't think there are any switches with 1000 ports with
full bisection bandwidth at any price, and the bigger the full-BB
switches that you DO select, with uplinks of one sort or another to
connect the switches, the more expensive. Unless, of course, you choose
another network (Myrinet, SCI) as the IPC channel or will only be doing
embarrassingly parallel tasks and don't care about the 100BT network.
Servers/backup are equally difficult to estimate, as it will depend very
much on what the cluster is doing. I could imagine running the entire
cluster with a single server and very little disk either on the server
or the nodes. For some tasks that just compute and don't manage lots of
data (such as mine), that would be fine. I could also imagine a cluster
with terabytes of dataspace in several RAIDs, lots of node-local disk,
with backup to match -- if your task is rendering a movie, for example.
In the first case it would take a handful of U -- maybe order 10U being
generous (room for a small RAID, a tape, a 2U server). In the second,
well, the sky's the limit -- a full rack? More?
So to make your estimate, you're going to have to start by thinking
about what KIND of network(s) you're planning to use, how MUCH
server-based disk resource you'll need to provide (and whether all,
some, or none of it will be backed up locally) and how much redundancy
you require -- 1000 nodes is a lot to have down for long because your
server crashes. At a very wild guess, four 45U racks at a minimum, up
There are other things you also need to think about space for -- a
workbench/tool area for node assembly, installation, removal, repair is
very useful (better than having to haul the nodes out and far away and
back again). Stereo headphones (noise cancellation is good:-). A warm
jacket. Remote monitoring devices (thermal, video, whatever).
That is aside from thermal kill on the power, professional wiring for
switched power supplies, harmonic mitigating transformers. Enough room
to be able to easily insert and remove rack units and get at the front or
back accordingly. Enough light.
And don't forget, those 1000 nodes will cost you on the order of
$150,000 to $250,000 PER YEAR to leave turned on all the time, JUST for
the electricity and AC (assuming 150-250 Watts per node, $0.08/kW-hour,
and a COP of ballpark two or three on the AC). Plus amortized costs for
the room renovation. Obviously you have deep pockets, but make sure
they are deep enough! 250 kW is rather a lot of power to be moving into
and out of a limited space.
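
For anyone who wants to redo that estimate with their own numbers, here is a
minimal sketch of the arithmetic. The function name and parameters are
hypothetical, and it assumes the nodes draw their full power around the clock
and that the AC spends (node power / COP) removing the heat. It leaves out
renovation, UPS losses and everything else.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

def annual_power_cost(nodes, watts_per_node, dollars_per_kwh=0.08, cop=2.5):
    """Yearly cost of electricity plus cooling for an always-on cluster.

    Assumptions (illustrative only): constant draw per node, and an air
    conditioner that uses node_power / COP to remove node_power of heat.
    """
    node_kw = nodes * watts_per_node / 1000.0
    cooling_kw = node_kw / cop
    return (node_kw + cooling_kw) * HOURS_PER_YEAR * dollars_per_kwh

# Optimistic end: 150 W nodes, COP of 3. Pessimistic end: 250 W nodes, COP of 2.
low = annual_power_cost(1000, 150, cop=3.0)
high = annual_power_cost(1000, 250, cop=2.0)
print(f"roughly ${low:,.0f} to ${high:,.0f} per year")
```

With the figures quoted above this lands at about $140,000 on the low end and
a bit over $260,000 on the high end, consistent with the ballpark range given.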
BTW, I'm >>trying<< to get more of this sort of infrastructure thing
explicitly into my online book, and (to plug a bit ahead of time) an
upcoming issue of Linux Magazine is going to cover many aspects of
cluster computing, including infrastructure. The book is linked to my
personal website below and to the brahma website as well.
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu