Quick survey -- UPSs on slave nodes?

Maurice Hilarius maurice at harddata.com
Mon Feb 10 22:54:56 PST 2003


With regards to your message:
>At the University of Idaho we are preparing to order a new beowulf cluster
>and a vendor seemed to be shocked that we wanted UPSs attached to ALL of
>the nodes.  He stated that most people only use UPSs on the Master Node,
>unless the cluster was used for some kind of mission critical purpose.
>So the question, do you use UPSs on your slave nodes?  Arguments for not
>using UPSs on slave nodes (and vice versa)?


In general it depends on what your code is designed to do.
If it can tolerate being stopped and restarted, then a UPS brings little 
benefit from a computing point of view.
However, some people cannot afford that kind of interruption.

The amount of UPS facility needed to keep any reasonable size of cluster 
going for anything more than a few minutes is prohibitively large and 
expensive. Beyond that you have to look at substantial backup generators, 
to power the cluster AND to power the air conditioning.

So, given that the answer to the uptime question is "No", we are left 
with two kinds of electrical protection that a UPS might offer:

1) The concept that UPS will protect your equipment from spikes, noise and 
so on.

2) Protection from brown-outs (temporary low-voltage conditions).

A UPS that can give real protection MUST fully isolate all lines, hot, 
neutral, and most importantly ground. To effectively provide some useful 
protection against bad power conditions it must protect against a number of 
different conditions, many of which occur simultaneously.

It is expensive to do this. There are essentially 3 major classes of UPS:

Level 3 - Standby or Off-line design:
Protects from:
Power failure
Power sag
Power spike or surge on hot or neutral lines.

Does NOT protect from:
Undervoltage (Brownout)
Overvoltage
Line Noise
Frequency variation
Switching transients
Harmonic distortion

Approximate cost for a typical 650VA/400W unit, enough to run 2 or 3 nodes: 
$160
Rather useless for protecting computers, I might add. All it really 
protects you from is short power failures.

Level 5 - Line Interactive:
Protects from:
Power failure
Power sag
Power spike or surge on hot or neutral lines.
Undervoltage (Brownout)
Overvoltage

Does NOT protect from:
Line Noise
Frequency variation
Switching transients
Harmonic distortion

Approximate cost for a typical 750VA/500W unit, enough to run 3 typical 
nodes: $270
Fairly good for this application, but not for absolutely mission critical 
equipment.

Level 9 - Online UPS:
Protects from:
Power failure
Power sag
Power spike or surge on hot or neutral lines.
Undervoltage (Brownout)
Overvoltage
Line Noise
Frequency variation
Switching transients
Harmonic distortion

Approximate cost for a typical 700VA/490W unit, enough to run 3 typical 
nodes: $425
Probably overkill for cluster nodes. Maybe a good idea for a master node 
and mass storage/arrays.
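
To make the comparison easier to query, here is a minimal Python sketch 
that encodes the three classes above as data and picks the cheapest one 
covering a given set of conditions. The condition names, the grouping 
into BASE/VOLTAGE/QUALITY sets and the function cheapest_class are my own 
illustration; the coverage and prices are simply the ones listed above.

```python
# Conditions each UPS class covers, transcribed from the lists above.
BASE = {"power failure", "power sag", "spike or surge"}
VOLTAGE = {"undervoltage", "overvoltage"}
QUALITY = {
    "line noise",
    "frequency variation",
    "switching transients",
    "harmonic distortion",
}

# (name, approximate unit cost in USD, conditions covered)
UPS_CLASSES = [
    ("Level 3 standby", 160, BASE),
    ("Level 5 line interactive", 270, BASE | VOLTAGE),
    ("Level 9 online", 425, BASE | VOLTAGE | QUALITY),
]


def cheapest_class(required):
    """Return (name, cost) of the cheapest class covering all required
    conditions, or None if no listed class covers them."""
    required = set(required)
    candidates = [
        (cost, name)
        for name, cost, covered in UPS_CLASSES
        if required <= covered
    ]
    if not candidates:
        return None
    cost, name = min(candidates)
    return name, cost


print(cheapest_class({"power failure", "undervoltage"}))
# -> ('Level 5 line interactive', 270)
```

Asking for a condition none of the classes lists (a hypothetical 
"lightning strike", say) returns None.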

See:

NO UPS can effectively protect you from really big surges and spikes, such 
as a transformer failure in the power distribution, or a local lightning 
strike.
There is equipment designed to do that to a great degree, but it is not 
generally included in UPS equipment. The best methods of doing that kind of 
line spike protection that I have seen usually involve large, centre-tapped 
toroidal transformers.
See:
http://www.oneac.com/powercon.html
http://www.powervar.com/english/solutions/prod_spc_na_gg.asp

Few UPSs give any real isolation on the ground line.
As most spikes and noise occur on this line, I question the usefulness 
of a UPS as a filtering device in many cases.

Batteries are expensive, and MUST be maintained and tested.
Typically a UPS has lost at least 40% of its capacity after 1 year.
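
As a rough illustration of what that loss means for sizing, here is a 
small Python sketch. It assumes the loss is exactly 40% and that runtime 
scales linearly with remaining battery capacity; both are simplifications 
of mine (real batteries are not that linear), and the function names are 
made up for the example.

```python
# "At least 40%" from the text, treated here as exactly 40% (assumption).
CAPACITY_LOSS_FIRST_YEAR = 0.40


def runtime_after_one_year(new_runtime_min):
    """Runtime left after a year, assuming runtime scales linearly
    with remaining battery capacity (a simplification)."""
    return new_runtime_min * (1 - CAPACITY_LOSS_FIRST_YEAR)


def new_runtime_needed(target_min):
    """Runtime to buy when new so that target_min is still available
    after a year, under the same linear assumption."""
    return target_min / (1 - CAPACITY_LOSS_FIRST_YEAR)


print(round(runtime_after_one_year(5.0), 2))  # minutes left from 5 when new
print(round(new_runtime_needed(5.0), 2))      # minutes to buy to keep 5
```

Under these assumptions a unit that holds 5 minutes when new holds about 
3 after a year, and you would need roughly 8.3 minutes when new to still 
have 5 minutes a year later.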

Ultimately it all depends on your budget.
I do not consider anything less than level 5 worth considering.
Assuming you DO want to add level 5 protection, you are looking at roughly 
$200+ per node for UPS protection.
As a typical dual processor node goes for "around" $2000 nowadays, this 
means a 10% immediate cost overhead. On top of that, a typical cluster of 
1U machines and switches fills about one rack, and the UPS equipment 
needed to keep it up for 5 minutes of reliable shutdown time takes about 
6U, so you consume about 15% of your rack space with UPS gear.
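
The arithmetic behind those two percentages can be sketched in a few 
lines of Python. The node cost, per-node UPS cost and 6U figure are the 
ones above; the 42U rack height is my assumption (the message does not 
give one), and ups_overhead is just an illustrative name.

```python
def ups_overhead(node_cost=2000.0, ups_cost_per_node=200.0,
                 ups_rack_units=6, rack_units=42):
    """Return (cost_fraction, space_fraction) of UPS overhead.

    rack_units=42 is an assumed rack height, not a figure from the text.
    """
    cost_fraction = ups_cost_per_node / node_cost
    space_fraction = ups_rack_units / rack_units
    return cost_fraction, space_fraction


cost, space = ups_overhead()
print(f"cost overhead: {cost:.0%}, rack space: {space:.0%}")
# -> cost overhead: 10%, rack space: 14%
```

With a 42U rack the space figure comes out at about 14%, which is in line 
with the "about 15%" estimate; a 40U rack gives exactly 15%.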

If you are more comfortable with UPS protection, and can afford to give up 
10% of your hardware budget and 15% of your space for UPS equipment, then 
it is not money wasted.

In our experience, the presence or absence of a UPS on the clusters we 
build and support rarely has any significant effect on hardware failure 
rates. If you use decent quality power supplies with good design and 
components, and sufficient capacity, they can run in a fairly bad brownout 
without issues. Good power supplies are worth the money, and personally I 
would recommend you look seriously at ensuring that criterion is 
fulfilled first.

I am sure lots of people on this list have their own experiences and 
beliefs, but in quite a few years of building hardware, as a designer, 
manufacturer and supplier of clusters and servers we are much more 
concerned with power supply quality than line conditions.
The cases where we feel a UPS is more vital are industrial sites where the 
building current is subject to severe use and conditions, and some areas, 
especially rural ones, where lightning storms and power failures are a 
regular occurrence.

We have built many machines for use in Japan, and there they have to deal 
with lower voltage than North America, often below 100V, and a 50 Hz 
power grid. That reduces a typical 400W supply to effectively 300W. By 
using good power supplies we do not see problems with that.
In general, power supplies rated for European Power Factor Correction (PFC) 
specifications are able to handle much worse conditions without issues.
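
A small Python sketch of the headroom check this implies. The 0.75 factor 
is simply the ratio of the 300W and 400W figures above; applying it to 
other supply ratings, and the 280W node load in the example, are my 
assumptions for illustration only.

```python
# Ratio implied by the "400W supply is effectively 300W" figure above.
DERATING_LOW_LINE = 300 / 400


def effective_watts(rated_watts, derating=DERATING_LOW_LINE):
    """Usable output of a supply on a low-voltage line, assuming the
    same derating ratio applies to any rating (an assumption)."""
    return rated_watts * derating


def has_headroom(rated_watts, node_load_watts, derating=DERATING_LOW_LINE):
    """True if the derated supply still covers the node's load."""
    return node_load_watts <= effective_watts(rated_watts, derating)


print(effective_watts(400))    # -> 300.0
print(has_headroom(400, 280))  # hypothetical 280W node -> True
print(has_headroom(350, 280))  # 350W derates to 262.5W -> False
```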

Sorry to be so wordy, but it is not a simple question to answer.



With our best regards,

Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.               FAX:       01-780-456-9772
11060 - 166 Avenue        mailto:maurice at harddata.com
Edmonton, AB, Canada      http://www.harddata.com/
    T5X 1Y3



