[Beowulf] Opteron cooling specifications?

David Kewley kewley at gps.caltech.edu
Sat Apr 29 18:08:49 PDT 2006


On Saturday 29 April 2006 10:02, Michael Will wrote:
> David,
>
> 32 compute nodes per rack is just good practice, but of course not the
> maximum if you are willing to cut corners and do somewhat nonstandard
> and potentially riskier power (i.e. L21-30 208V 3-phase).
>
> Where do your power cords go, and how many PDUs and cords do you have
> going to each rack?

What does it mean to say that 32 is "standard"?  Why shouldn't 40 be 
standard, other than perhaps 32 is more typically done?

How do you conclude that putting 40 in a rack rather than 32 necessarily 
means you cut corners?  Just because we had our sights set on fitting all 
those machines into the room doesn't mean that the rack setup is poor (it's 
actually excellent, come by and see it some time), nor that we cut corners 
to make things fit.  We didn't.  Many people who have seen our room remark 
that it's an outstandingly good room, so it's quite possible to do what we 
did and do it well -- 40 per rack is no problem if you design it well.

We use L21-20 208 V 3-phase (note -20 not -30).  How is that riskier?

We have three L21-20-based power strips per rack, APC units each rated at 
5.7 kW after the 80% continuous-load derating.  The 3' cords go under the 
raised floor to outlets that 
are fastened to a rail that runs directly under the raised floor gridwork.  
The outlets are supplied by flexible, waterproof conduit that runs across 
the cement subfloor to the raised-floor-mounted PDUs.  The cords are 
essentially invisible when the rack doors are closed, and are completely 
out of the way.
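
If anyone wants to sanity-check that 5.7 kW figure, here is a rough 
back-of-the-envelope sketch (Python, assuming nothing beyond the L21-20 
nameplate: 208 V line-to-line, 20 A breakers, the 80% continuous-load 
derating, and unity power factor):

  import math

  V_LL = 208.0      # line-to-line voltage on an L21-20 drop
  BREAKER_A = 20.0  # breaker rating per phase
  DERATE = 0.80     # continuous loads limited to 80% of breaker rating

  amps = BREAKER_A * DERATE                 # 16 A per phase
  kw = math.sqrt(3) * V_LL * amps / 1000.0  # 3-phase power
  print("per-strip continuous capacity: %.2f kW" % kw)  # ~5.76 kW

which is where the rated 5.7 kW per strip comes from.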

> Our 32-node racks have only two L21-20 cords for the nodes and comply
> with national fire and electrical codes that limit continuous loads on
> a circuit to 80% of the breaker rating.

Same here, but with 3 rack PDUs.  It's not a problem for us.
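
To put numbers on the headroom, here is the same sort of sketch for the 
whole rack, using the roughly 13 kW full-power figure from my earlier 
message quoted below (about 325 W per node); the per-node wattage is our 
measurement and may not match other hardware:

  PER_STRIP_KW = 5.76   # derated L21-20 strip capacity, from above
  NODE_KW = 13.0 / 40   # ~0.325 kW per node at full power

  for nodes, strips in [(32, 2), (40, 2), (40, 3)]:
      load = nodes * NODE_KW
      cap = strips * PER_STRIP_KW
      ok = "ok" if load <= cap else "over the derated capacity"
      print("%d nodes on %d strips: %.1f kW load vs %.1f kW capacity (%s)"
            % (nodes, strips, load, cap, ok))

Two strips cover 32 nodes with room to spare; 40 nodes at full power would 
not fit on two derated strips, which is exactly why we run three.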

> The extra space in the rack is dedicated to the remaining cluster
> infrastructure, i.e. 2U headnode, switches, LCD-keyboard-tray, storage,
> and room to run cables between racks.  The additional infrastructure is
> not powered by the two L21-20 cords but fits on a separate single-phase
> 120V circuit, typically a UPS to protect headnode and storage, or just
> a plain 1U PDU.  That means 32 compute nodes plus headnode, storage,
> and LCD/keyboard together are fed by a total of 7 120V 20A phases of
> your breaker panel, utilizing three cords that go to your rack.

Right, that works in your setup, especially for smaller clusters.  For our 
large cluster, 40 per rack (for most but not all racks) works great.
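
For completeness, the branch-circuit tally for both layouts (each L21-20 
cord is three 120 V / 20 A legs off the panel; this is just bookkeeping, 
nothing vendor-specific):

  racks = {
      "32 nodes + head/storage": {"l21_20": 2, "single_120": 1},
      "40 compute nodes":        {"l21_20": 3, "single_120": 0},
  }
  for name, r in racks.items():
      cords = r["l21_20"] + r["single_120"]
      legs = r["l21_20"] * 3 + r["single_120"]
      print("%s: %d cords, %d panel legs" % (name, cords, legs))

That reproduces Michael's 7 legs on 3 cords for a 32-node rack, and works 
out to 9 legs on 3 cords for one of our 40-node compute racks.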

David

> Michael Will
>
> David Kewley wrote:
> > I totally agree with Michael.
> >
> > Except I'd assert that 32 nodes per rack is not a fully populated rack.
> > :) Twenty-four of our 42U racks have 40 nodes apiece, plus 1U for a
> > network switch and 1U blank.  It works fine for us and allows us to
> > comfortably fit 1024 nodes in our pre-existing room.
> >
> > At full power, one 40-node rack burns about 13kW.  Heat has not been a
> > problem -- the only anomaly is that the topmost node in a 40-node rack
> > tends to experience ambient temps a few degrees C higher than the
> > others. That's presumably because some of the rear-vented hot air is
> > recirculating within the rack back around to the front, and rising as
> > it goes.  There may well be some other subtle airflow issues involved.
> >
> > But yeah, initially Dell tried to tell me that 32 nodes is a full rack.
> > Pfft. :)
> >
> > In case it's of interest (and because I'm proud of our room :), our
> > arrangement is:
> >
> > A single 3-rack group for infrastructure (SAN, fileservers, master
> > nodes, work nodes, tape backup, central Myrinet switch), placed in the
> > middle of the room.
> >
> > Four groups of 7 racks apiece, each holding 256 compute nodes and
> > associated network equipment.  Racks 1-3 and 5-7 are 40-node racks (20
> > nodes at the bottom, then 2U of GigE switch & blank space, then 20
> > nodes at the top). Rack 4 is 16 nodes at the bottom, a GigE switch in
> > the middle, a Myrinet edge switch at the top, with quite a bit of blank
> > space left over.
> >
> > In the room, these are arranged in a long row:
> >
> > [walkway][7racks][7racks][3racks][walkway][7racks][7racks][walkway]
> >
> > And that *just* barely fits in the room. :)
> >
> > One interesting element: Our switches are 48-port Nortel BayStack
> > switches, so we have a natural arrangement: The 7-rack and 3-rack
> > groups each have one switch stack.  The stacking cables go rack-to-rack
> > horizontally between racks (only the end racks have side panels).
> >
> > David
> >
> > On Friday 28 April 2006 14:22, Michael Will wrote:
> >> A good set of specs according to our engineers could be:
> >>
> >> 1. No side venting of hot air from the case. The systems will be
> >> placed into 19" racks and there is no place for the air to go if it's
> >> blown into the side of the rack. Even if you take the sides off, you
> >> will still have racks placed next to each other. Airflow should be
> >> 100% front to back.
> >>
> >> 2. Along with that, there should be no "cheat holes" in the top,
> >> bottom or sides of the case. All "fresh" air should be drawn in from
> >> the front of the chassis. Again, the system will be racked in a 19"
> >> rack and there is no "fresh air" to be drawn in from the sides of the
> >> case (see 1 above) nor will the holes be open when nodes are stacked
> >> one on top of the other in a fully populated rack (32 nodes per rack).
> >>
> >> 3. There should be a mechanical separation between the hot and cold
> >> sections of the chassis to prevent the internal fans from sucking in
> >> hot air from the rear of the chassis.
> >>
> >> 4. The power supply *must* vent directly to the outside of the case
> >> and not inside the chassis. The power supply produces approximately
> >> 20% of the heat in the system. That hot air must be vented directly
> >> out of the chassis to prevent it from heating other components in the
> >> system.
> >>
> >> 5. The system should employ fan speed control. Running high speed fans
> >> at less than rated speed prolongs their life and reduces power usage
> >> for the platform as a whole. Fan speed should be controlled by either
> >> ambient temperature or preferably by CPU temperature.
> >>
> >> 6. The system must have a way of measuring fan speed and reporting a
> >> fan failure so that failed fans can be replaced quickly.
> >>
> >> Michael Will / SE Technical Lead / Penguin Computing
> >>
> >> -----Original Message-----
> >> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
> >> On Behalf Of Bill Broadley
> >> Sent: Thursday, April 27, 2006 2:55 PM
> >> To: beowulf at beowulf.org
> >> Subject: [Beowulf] Opteron cooling specifications?
> >>
> >>
> >> I'm writing a spec for future opteron cluster purchases.  The issue of
> >> airflow came up.
> >>
> >> I've seen a surprising variety of configurations: some with a giant
> >> rotating cylinder (think paddle wheel), most with a variety of 40 mm
> >> fans (28 or 56 mm deep), or horizontal blowers.
> >>
> >> Anyone have a fan vendor they prefer?  Ideally well known for making
> >> fans that last 3+ years when in use 24/7.
> >>
> >> A target node CFM for a dual socket dual core opteron?
> >>
> >> A target maximum CPU temp?  I assume it's wise to stay well below the
> >> 70C or so thermal max on most of the dual core Opterons.
> >>
> >> Seems like there is a huge variation in the number of fans and total
> >> CFM from various chassis/node manufacturers.  A single core single
> >> socket 1U Opteron I got from Sun has two 40mm x 56mm fans and four
> >> 40mm x 28mm fans.  Not bad for a node starting at $750.
> >>
> >> Additionally some chassis designs form a fairly decent wall across the
> >> node for the fans to ensure good front-to-back airflow.  Others seem
> >> to place fans willy-nilly; I've even seen some that suck air sideways
> >> across the rear opteron.
> >>
> >> In any case, the nature of the campus purchasing process is that we
> >> can put in any specification, but can't buy from a single vendor, or
> >> award bids for better engineering.  So basically the lowest bid that
> >> meets the spec wins.  Thus the need for a better spec.
> >>
> >> Any feedback appreciated.
> >>
> >> --
> >> Bill Broadley
> >> Computational Science and Engineering
> >> UC Davis
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org To change your subscription
> >> (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf


