[Beowulf] UPS & power supply instability

David Kewley kewley at gps.caltech.edu
Tue Sep 27 17:32:18 PDT 2005


Hi all,

I wonder whether anyone has seen the problem that we're seeing with our 
cluster's electrical supply.

We have a Liebert 600 Series 500kVA UPS feeding two Liebert PDUs.  The 
PDUs then have a fanout of whips to the computer racks.

The UPS voltage in/out and the PDU voltage in is 480V 3ph.  The PDU out 
is 208V 3ph +neutral, 120V wrt neutral.  The whips are 5-conductor 
(3ph, ground, neutral), and they feed APC AP7960 switchable rack PDUs.  
The computers are fed 120V from the AP7960s.
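
As a quick check on the voltages above: in a balanced wye (3ph + neutral) system the line-to-neutral voltage is the line-to-line voltage divided by sqrt(3). The sketch below only illustrates that arithmetic; the function name is made up for this example.

```python
import math


def line_to_neutral(v_line_to_line: float) -> float:
    """Line-to-neutral voltage of a balanced wye system (hypothetical helper)."""
    return v_line_to_line / math.sqrt(3)


# 208V line-to-line corresponds to the nominal 120V fed to the computers.
print(round(line_to_neutral(208.0), 1))  # 120.1
```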

Our compute nodes, the main power load, are Dell PowerEdge 1850s with a 
single power supply per node.  This power supply is 
power-factor-corrected, so the Liebert PDUs see a power factor of 0.99 
or 1.00.

I've balanced the loads on the three phases about as well as possible.  
We still have neutral current, about 1/3 to 1/2 the magnitude of any of 
the per-phase currents.
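
For reference, here is a minimal sketch of the neutral current one would expect from phase imbalance alone. It assumes purely sinusoidal currents at the fundamental frequency, unity power factor, and phases exactly 120 degrees apart; the function name and the example currents are hypothetical, not measurements from this cluster.

```python
import cmath
import math


def neutral_current(i_a: float, i_b: float, i_c: float) -> float:
    """Magnitude of the phasor sum of three phase currents (amps).

    Assumes fundamental-frequency sinusoids only, 120 degrees apart,
    all at the same power factor. Harmonics are NOT modeled.
    """
    shift = 2 * math.pi / 3
    total = (
        cmath.rect(i_a, 0.0)
        + cmath.rect(i_b, -shift)
        + cmath.rect(i_c, shift)
    )
    return abs(total)


# Perfectly balanced phases cancel in the neutral (result is ~0).
print(neutral_current(100.0, 100.0, 100.0))

# Made-up unbalanced example: 100A, 80A, 60A.
print(round(neutral_current(100.0, 80.0, 60.0), 2))  # 34.64
```

Under these assumptions, well-balanced phases would give a neutral current much smaller than 1/3 to 1/2 of a phase current. A neutral that large may therefore point to harmonic content (triplen harmonics add rather than cancel in the neutral) or to residual imbalance. That is something to check with measurements, not a diagnosis.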

The problem is this: We can fire up our cluster to about 40% of maximum 
load and everything is fine.  But if we go over some threshold right 
around 40% of max, the output currents from the PDUs go unstable.  It's 
a fairly sharp edge: Approximately speaking, if I stay below the 
threshold, the current variation is <1%.  But if I go to the top end of 
the stable range, then add another ~2% load, the output currents vary 
over something like 30%.  The instability gets worse with increasing 
load above the threshold.  Reducing the load below the threshold 
restores stability (with perhaps a slight bit of hysteresis).
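
One way to put a number on "current variation" is the peak-to-peak swing as a percentage of the mean over a window of readings. The sketch below assumes that definition, which may not match how the figures above were obtained; the function name and the sample values are invented for illustration only.

```python
def peak_to_peak_percent(samples: list[float]) -> float:
    """Peak-to-peak variation of current readings, as a percent of their mean."""
    mean = sum(samples) / len(samples)
    return 100.0 * (max(samples) - min(samples)) / mean


# Invented readings (amps), NOT measured data.
below_threshold = [200.0, 200.5, 199.8, 200.2]
above_threshold = [170.0, 230.0, 185.0, 215.0]

print(round(peak_to_peak_percent(below_threshold), 2))  # 0.35
print(round(peak_to_peak_percent(above_threshold), 2))  # 30.0
```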

This instability only happens when the UPS is online.  If we put the UPS 
in bypass, we can go up to around 70% of max load with no instability 
(all computers on but idling in the OS; we haven't tested all nodes at 
100% CPU yet).

We suspect the problem is due to some interaction between the computer 
power supplies and the output stage of the UPS.  Perhaps the UPS isn't 
regulating correctly with this load.  Or perhaps it's regulating *too 
well*, and the rock-solid voltages allow the oscillations to grow 
instead of damping out.  I don't know.

Liebert has been on this case for something like 4 weeks now.  Mind 
you, the "blame" may be shared by the Liebert UPS and the Dell power 
supplies, but I'm relying on Liebert to figure out why things go 
unstable *when their UPS is online, supplying a load that should be 
quite normal*, and so far they have no solution for me.  We can't just 
wait on Liebert; this problem is hamstringing our use of our new 
1024-node cluster.  So now I turn to this list.

Can anyone here offer ideas, or better yet, experience?

David


