[Beowulf] UPS & power supply instability
Robert G. Brown rgb at phy.duke.edu
Wed Sep 28 06:08:01 PDT 2005
David Kewley writes:

> Hi all,
>
> Our compute nodes, the main power load, are Dell PowerEdge 1850s with a
> single power supply per node. This power supply is
> power-factor-corrected, so the Liebert PDUs see a power factor of 0.99
> or 1.00.
>
> I've balanced the loads on the three phases about as well as possible.
> We still have neutral current, about 1/3 to 1/2 the magnitude of any of
> the per-phase currents.

I know this song all too well -- not for Lieberts/Dells per se but power woes in general. From what I learned messing with this before:

If the power supplies on the load side are really power factor corrected, you shouldn't have a neutral current when the load is balanced. Certainly not one 1/2 the magnitude of the per phase current. That is a fairly clear signature of switching power supplies on 3 phase power, where the fact that the supplies draw current pretty much only in the middle third of each half cycle prevents the three phase cancellation.

Also, you shouldn't be sharing a neutral line anywhere between the transformer/wall and the load -- each outgoing phase should have its own neutral back to the UPS. That is, there shouldn't BE a "neutral current" for you to measure in a shared neutral line of three phase wye, although if you are running off a UPS I suppose they could float the neutral line (something that strikes me as cosmically boneheaded to do and, if done, a very likely source of your woes and then some).

Assume nothing. That is, don't assume that your wiring was done sanely or safely until/unless YOU'VE traced it back on through, and beware ground loops and worse. Our wiring was done by area contractors who supposedly knew what they were doing, and obviously passed inspection. Our wiring was FUBAR anyway -- the contractors were clueless and didn't even follow the architect's spec; the inspector inspected it as if it was household wiring and not server room wiring.
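
To put rough numbers on the neutral-current argument above, here is a minimal Python sketch. It assumes an idealized, perfectly balanced wye load in which every phase current is a 60 Hz fundamental plus a third harmonic of relative amplitude h3. The function name and the h3 values are made up for illustration; they are not measurements from anybody's cluster.

```python
import math


def neutral_to_phase_ratio(h3, samples=3600):
    """RMS neutral current divided by RMS per-phase current.

    Toy model (an assumption, not a measurement): three balanced phases,
    each carrying sin(x) + h3 * sin(3x), shifted by 120 degrees.
    """
    phase_sq = 0.0
    neutral_sq = 0.0
    for n in range(samples):
        theta = 2.0 * math.pi * n / samples
        currents = []
        for k in range(3):
            shifted = theta - k * 2.0 * math.pi / 3.0
            currents.append(math.sin(shifted) + h3 * math.sin(3.0 * shifted))
        phase_sq += currents[0] ** 2       # one phase conductor
        neutral_sq += sum(currents) ** 2   # shared neutral carries the sum
    return math.sqrt(neutral_sq / phase_sq)


# Hypothetical third-harmonic fractions, chosen only to show the trend.
for h3 in (0.0, 0.1, 0.17, 0.5):
    print(f"h3 = {h3:4.2f}  neutral/phase = {neutral_to_phase_ratio(h3):.2f}")
```

In this toy model the fundamentals cancel in the neutral and the third harmonics add, so the ratio works out to 3*h3/sqrt(1 + h3^2). Real switching supplies draw a messier waveform with higher harmonics too, so treat this as an illustration of the mechanism and not as a prediction.
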
It is entirely possible for your wiring to have been done by well-meaning electricians who haven't the faintest idea how to correctly and safely wire a server room environment with its (usually) highly nonlinear loads.

Also, don't assume that Dell's power supplies are really PFC just because they say that they are. Believe it when you put a dual input scope on it and measure the current and voltage simultaneously as a function of time on a triggered scope and see two perfectly sinusoidal waves, in phase.

> The problem is this: We can fire up our cluster to about 40% of maximum
> load and everything is fine. But if we go over some threshold right
> around 40% of max, the output currents from the PDUs go unstable. It's
> a fairly sharp edge: Approximately speaking, if I stay below the
> threshold, the current variation is <1%. But if I go to the top end of
> the stable range, then add another ~2% load, the output currents vary
> over something like 30%. The instability gets worse with increasing
> load above the threshold. Reducing the load below the threshold
> restores stability (with perhaps a slight bit of hystereticity).
>
> This instability only happens when the UPS is online. If we put the UPS
> in bypass, we can go up to around 70% of max load with no instability
> (all computers on but idling in the OS; we haven't tested all nodes at
> 100% CPU yet).
>
> We suspect the problem is due to some interaction between the computer
> power supplies and the output stage of the UPS. Perhaps the UPS isn't
> regulating correctly with this load. Or perhaps it's regulating *too
> well*, and the rock-solid voltages allow the oscillations to grow
> instead of damp. I don't know.

Ummm, yes, something like this is possible, especially if the UPS is also being fed by a switching power supply in its own right. You could end up with some odd ripple on the line from the 180 Hz harmonics.
It sounds somewhat like the UPS's primary capacitors are being driven to where they are undercharged (and can no longer effectively filter the ripple, which then is bleeding through). Additionally, every transformer in the supply system is an inductor chained to capacitance, and if your load has harmonics, it can drive resonance-like behaviors.

A secondary problem is that with three-phase wye transformers in particular, switching power supply loads with odd harmonics (e.g. 180 Hz) can drive loop/eddy currents within the transformer itself, causing it to overheat (wasting power and costing you money), which will shorten its lifetime. The mid-phase overloads also brown out the computer power supplies during the draw part of the cycle.

By far the best (and in fact nearly the only:-) decent explanation of harmonics and harmonic mitigation is to be found here:

  http://www.mirusinternational.com/pages/faq.html

I would recommend reading ALL of this -- in fact, print it out and just keep it handy to use in testing and discussions with Liebert and/or Dell. In particular see #7 "Why do 3rd harmonic currents overload neutral conductors".

I would have THOUGHT that Liebert would be all over this stuff as well, but from the sound of it that might not be the case. I would expect Dell to know none of it, and to not really know what a power factor correcting power supply is or what it does and why you need it.

I don't know what you can do to positively diagnose the situation, but I expect that it will involve a dual trace oscilloscope rigged so it can function as a line voltmeter and ammeter at the same time, in a test circuit you'll probably have to hand wire so you can insert the one (ammeter) and run the other (voltmeter) across all three wire pair combos, and a handful of nodes to load the test circuit with.
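
As a rough sketch of what to compute once the scope traces are captured, the Python below takes sampled voltage and current over a whole number of cycles and returns the true power factor (real power over apparent power) and the size of a chosen harmonic. The function names are hypothetical, and the waveform at the bottom is synthetic (a clean sine voltage and a current with a made-up 17% third harmonic), standing in for real samples.

```python
import math


def rms(x):
    """Root-mean-square of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def true_power_factor(v, i):
    """Real power / apparent power; samples must span whole cycles."""
    real_power = sum(a * b for a, b in zip(v, i)) / len(v)
    return real_power / (rms(v) * rms(i))


def harmonic_amplitude(x, harmonic, cycles=1):
    """Peak amplitude of one harmonic via a single-bin DFT.

    Assumes x covers exactly `cycles` periods of the fundamental.
    """
    n = len(x)
    k = harmonic * cycles
    re = sum(s * math.cos(2 * math.pi * k * j / n) for j, s in enumerate(x))
    im = sum(s * math.sin(2 * math.pi * k * j / n) for j, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / n


# Synthetic stand-in for scope data: one cycle, 1000 samples (assumed values).
N = 1000
angles = [2 * math.pi * j / N for j in range(N)]
v_samples = [math.sin(a) for a in angles]
i_samples = [math.sin(a) + 0.17 * math.sin(3 * a) for a in angles]

pf = true_power_factor(v_samples, i_samples)
ratio = harmonic_amplitude(i_samples, 3) / harmonic_amplitude(i_samples, 1)
print(f"true power factor = {pf:.3f}")
print(f"3rd harmonic / fundamental = {ratio:.3f}")
```

With this synthetic current the power factor still comes out at about 0.986 even though the third harmonic is 17% of the fundamental, so a reading near 0.99 doesn't by itself show that the harmonics are gone; the harmonic ratio is the number to look at.
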
If you aren't comfortable with wiring, and don't know why you NEVER put an ammeter across two voltage lines and ALWAYS put a voltmeter across the voltage lines, and which wire is hot and which is not and which is neutral, DON'T TRY THIS YOURSELF. Dying is such a drag.

Be sure to rig a scope to measure current (safely, without starting fires or injuring living things including yourself), which isn't horribly easy but can be done. Look at the shape of the neutral line current, compared to the line voltage, when a single Dell is on the system, and compare it to the pictures in Mirus FAQ #7. This should give you a quick-and-dirty picture of whether the Dell power supplies are really PFC or if they're just ordinary switching power supplies that are supposedly more efficient or something so Dell claims that they are "PFC". Dell may in ignorance interpret "PFC" as having current and voltage "in phase" for the primary draws but ignore the presence of third harmonics. Honestly, from your reported neutral current I expect that this is the case (and I'm assuming since you REPORT the neutral current that you do indeed know how to measure it -- but looking at it is better). Look for voltage distortion on the supply lines as well.

If you discover that -- surprise -- the "PFC" supplies aren't, you can either:

a) Bug Dell for "real" PFC supplies, directing them to the Mirus FAQ in case they are clueless about what that means and telling them that when you slap the aforementioned scope on them under load you'd better see nearly perfect 60 Hz sinusoids, in phase, in both voltage and current with "no" odd harmonics -- once you can find an engineer somewhere who understands a word of this;

b) Live with it (this is what we did, reasonably successfully). Rewire the shared neutral so each phase has its own neutral back to a solid ground (e.g. building steel, depending on how your setup is wired).
Try to ensure that the runs from the primary circuit panels are as short as possible and use as heavy gauge wire as possible/practical (minimally 12/2, but 10/2 would be even better although it is a PITA to work with in conduits) to keep the overcurrents in the middle third of each half-cycle from browning out the supply. Also watch the circuit breakers -- when we shared a neutral we would pop the breaker whenever load went above about 60% of theoretical line capacity because of breaker overheating caused by the extra non-cancelled current. Sounding a lot like your current problem, that is;

c) Give Mirus a call and get a harmonic correcting primary transformer for the space. Then forget about the problem and use whatever kind of power supplies you like (but still avoid sharing a neutral and all that). Or get Liebert to work on this for/with you.

If the Dells ARE OK and HAVE PFC supplies when you test them independently on otherwise quiet lines, then I suspect that you have a bigger problem. At least you'll know it isn't in the Dells, which limits the number of people you have to yammer at. Consequently it must be in the Lieberts, the UPS, or in the wiring itself. I'd suspect something wired egregiously incorrectly -- a floating neutral on the UPS, for example -- that causes the neutral line to accumulate a voltage bias relative to true ground and undercharge power supply capacitors, create a significant ground loop risk, and all sorts of other things.

Or something else, maybe something worse. Maybe something dangerous. Take it pretty seriously -- people have been known to melt down racks of equipment (as in "melt the metal and burn the epoxy and insulation", not as in "cause equipment to momentarily smoke and break") from incorrect multiphase wiring. People have also been known to have been killed by faulty wiring.
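
For the wire gauge point in (b), here is a back-of-the-envelope Python sketch of the resistive drop over a run. The resistances are the usual handbook figures for copper at 20 C (about 1.588 ohm per 1000 ft for 12 AWG and 0.9989 ohm per 1000 ft for 10 AWG); the 50 ft run and the 16 A draw are invented for the example, and the function name is hypothetical.

```python
# Handbook resistance of solid copper wire at 20 C, in ohms per 1000 feet.
OHMS_PER_1000_FT = {12: 1.588, 10: 0.9989}


def round_trip_drop(awg, one_way_feet, amps):
    """Resistive voltage drop out on the hot and back on the neutral."""
    ohms_per_foot = OHMS_PER_1000_FT[awg] / 1000.0
    return amps * ohms_per_foot * 2.0 * one_way_feet


# Assumed example values: a 50 ft run carrying 16 A.
for awg in (12, 10):
    drop = round_trip_drop(awg, one_way_feet=50, amps=16)
    print(f"{awg} AWG: {drop:.2f} V dropped over the run")
```

Because the supplies pull most of their current in short peaks, the instantaneous drop follows the peak current and not the RMS value, so the sag at the top of the cycle is worse than an RMS calculation like this one suggests.
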
USUALLY you can detect egregious problems with an ordinary voltmeter or scope or maybe even a kill-a-watt -- if there is a significant voltage between the neutral line and the (unloaded) ground wire on any circuit, this is a problem. (I'm not certain what "significant" is -- greater than the 1-3 volts that might represent the resistive voltage drop on the driven neutral line from load to wall, at any rate.) If for any reason the neutral is far away from the local ground spatially (long runs of wire in between them increase the voltage disparity), ground loops can be quite dangerous and can cause system malfunction.

Also see Mirus FAQ #9. I'm GUESSING that similar things to the pictures on this page can happen to the UPS under harmonic loads -- decrease in ride-through capacity as the caps are incorrectly charged.

An interesting possibility is that the UPS switching power supplies are NOT harmonic corrected and share a neutral back to the transformers, so the fact that the Dells are PFC is completely erased by having the UPS inline. This is an appealing possibility, really -- you have spent much money to ensure that you don't have a harmonic distortion problem, but in fact moved the harmonic distortion problem one step upstream and if anything exacerbated it (since the UPS has its own inefficiencies and ADDS those to the inefficiencies in the node power supplies, so it draws EVEN MORE peak current in the middle third of each half-cycle than the aggregate nodes would have done:-). SO, you might want to put your dual scope on the UPS supply lines themselves under various loads, looking for ripple and harmonics on both sides.

Some of this stuff you can check for on your own, but really you may need to find a COMPETENT electrician -- one that specializes in server room wiring and is e.g. union trained -- to help you out. My brother-in-law is a journeyman electrician in the Detroit area, and I know what he went through in his journeyman training -- serious physics, actually.
I also know what the local electricians who wired our server room had as training -- think "How to Wire Your Own Home" from Home Depot (well, maybe a BIT more than this, but you get the idea...:-). There exist competent people but you'll have to look for them and probably pay for their knowledge.

> Liebert has been on this case for something like 4 weeks now. So far
> they have no solution. Mind you, the "blame" may be shared by the
> Liebert UPS and the Dell power supplies, but I'm relying on Liebert to
> figure out why things go unstable *when their UPS is online, supplying
> a load that should be quite normal*, and so far they have no solution
> for me. We can't just wait on Liebert; this problem is hamstringing
> our use of our new 1024-node cluster. So now I turn to this list.
>
> Can anyone here offer ideas, or better yet, experience?

I've done my best above. I'm sorry you're having this problem, but you are certainly not the first person to get bitten by it and probably won't be the last, even though it SOUNDS like you did everything right (from your end) during the server room design phase.

Good luck.

   rgb

>
> David
