[Beowulf] power usage, Intel 5160 vs. AMD 2216

Robert G. Brown rgb at phy.duke.edu
Fri Jul 13 07:31:44 PDT 2007


On Fri, 13 Jul 2007, Bruce Allen wrote:

> I have had power-related problems in the past, and have found that the lower 
> maintenance needs and higher reliability of UPS backed systems are worth it.

There are actually two "kinds" of power distribution issues that one
needs to think about for really large sites -- what goes on on the
outside of the room and what goes on on the inside.  On the outside it
is smart to get harmonic distortion correcting transformers that can do
things like protect your primary transformers from overload caused by
non-PFC power supplies should any of your nodes use them, and which can
often provide capacitive buffering of short outages, surge protection,
and so on at the same time.

On the inside there is power distribution and control -- avoidance of
ground loops within racks, 120V vs 240V vs 208V supply to the racks,
and balancing the load across circuits on multiphase power (to minimize
the current on the shared neutral, with the common ground ideally tied
to building steel).
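
To make the balancing point concrete, here is a rough sketch of the
neutral current on a three-phase wye feed.  It assumes linear loads with
the same power factor on every leg, so the three phase currents sit 120
degrees apart; the function name and the 16 A figures are made up for
illustration.  Harmonic currents from non-PFC supplies are not modelled,
and the triplen harmonics they produce add in the neutral rather than
cancel, so treat the balanced result as a best case.

```python
import cmath
import math


def neutral_current(ia, ib, ic):
    """Return the neutral current magnitude (amps) for a wye feed.

    Assumption: linear loads with equal power factor on each phase, so
    the phase currents are phasors 120 degrees apart.  Harmonics from
    non-PFC supplies are NOT modelled here.
    """
    angles_deg = (0.0, -120.0, 120.0)
    total = sum(
        cmath.rect(amps, math.radians(angle))
        for amps, angle in zip((ia, ib, ic), angles_deg)
    )
    return abs(total)


if __name__ == "__main__":
    # Hypothetical per-phase loads in amps.
    for loads in ((16.0, 16.0, 16.0), (16.0, 16.0, 0.0), (16.0, 0.0, 0.0)):
        print(loads, "->", round(neutral_current(*loads), 3), "A on neutral")
```

With all three legs equally loaded the phasors cancel and the neutral
carries essentially nothing; load only one or two legs and the neutral
carries a full leg's worth of current.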

UPS can be either inside or outside, but in either case should be
configured with a room kill switch (both thermal and manual) to avoid
cooking systems in the event of an AC failure or electrocuting firemen
in the event of a fire.

Truthfully, in the case of a really professional large scale cluster
room I'd be inclined to put the UPS on the outside in the primary
distribution system and make it easy to kill the room power.  I also
agree with Mark -- it is "better" to avoid UPS on compute nodes unless
you really really need it and can't find any other way of smoothing over
short power glitches.  UPS batteries are toxic and wear out quickly.  I
find that it is actually quite rare for a UPS to last three whole years
before you have a really significant probability that the battery won't
actually work if there is a power glitch.  They're expensive, and have
to be regularly tested and maintained and refitted with batteries to be
reliable. They're dangerous unless you spend time and money on kill
switches (see discussion of same in the list archives).  I think they're
a good idea in a server room with a HA cluster, where failover is key
and money is on the line.  I think that they are rarely worth it when
one is just cranking away on compute nodes and all that one "risks" by
not having them is the hassle of a reboot and a bit of lost computation
time in the event of a power glitch.  This DOES depend on the quality of
local power, of course, but there are a variety of ways to deal with
power quality issues short of a full UPS.

> That's interesting.  Where does the PDU store 1 second of power?

I don't know about "PDU" per se -- but units that do significant power
conditioning (up to the extreme of dual isolation transformers) usually
have big capacitors to buffer surges and load variations.  I wouldn't
have expected them to make it through a whole second unless they were
designed to do so, but at this point they may be so designed.

For small/personal clusters I change my mind again.  I tend to buy cheap
UPSs for my house because our power bobbles for 1-5 seconds nearly
every heavy rain/windstorm we have, which is why I know from direct
experience that the batteries in these UPSs are lucky to last two whole
years.  I've got something like three of them where I'm plugged into the
surge side because the UPS side is dead (no power at all) or goes down
anyway when power bobbles, to the tune of system crash AND the maddening
beeping.  I'd love to find a 10 second PDU/conditioner that used a
really large capacitor instead of a battery to buffer short outages,
especially at mass market prices.

Does anybody know of such a beast?  No battery, no toxins, just a big
cap and small price?
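
For a sense of scale on the "big cap" idea, here is a back-of-the-envelope
sketch.  It assumes the capacitor sits on a DC bus feeding a
constant-power load and can be discharged from v_start down to v_min, so
the usable energy is 1/2 C (v_start^2 - v_min^2).  The function names,
the 300 W load, and the 400 V / 300 V window are illustrative
assumptions, not the specs of any real product, and conversion losses
are ignored.

```python
def required_capacitance(power_w, seconds, v_start, v_min):
    """Capacitance (farads) needed to carry power_w for the given time.

    Assumption: ideal capacitor, constant-power load, no converter
    losses; only the energy between v_start and v_min is usable.
    """
    if not v_start > v_min >= 0:
        raise ValueError("need v_start > v_min >= 0")
    energy_j = power_w * seconds
    return 2.0 * energy_j / (v_start**2 - v_min**2)


def holdup_seconds(capacitance_f, power_w, v_start, v_min):
    """Ride-through time (seconds) under the same idealized assumptions."""
    usable_j = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return usable_j / power_w


if __name__ == "__main__":
    # Hypothetical node: 300 W, bus allowed to sag from 400 V to 300 V.
    cap = required_capacitance(300.0, 10.0, 400.0, 300.0)
    print(f"needed for 10 s: {cap:.3f} F")
    print(f"check: {holdup_seconds(cap, 300.0, 400.0, 300.0):.1f} s")
```

Under those assumed numbers, ten seconds of ride-through works out to
roughly 0.086 F at 400 V for a single 300 W node, which is a lot of
capacitor; one second needs a tenth of that.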

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




