[Beowulf] Re: UPS & power supply instability
David Kewley kewley at gps.caltech.edu
Thu Sep 29 09:20:07 PDT 2005
- Previous message: [Beowulf] Re:UPS & power supply instability
- Next message: [Beowulf] Re: UPS & power supply instability - ongoing discussions
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wednesday 28 September 2005 23:04, Maurice Hilarius wrote:
> David Kewley wrote:
> > ..
> >
> >One quick update: We finally had a high-level engineering conference
> >call with Liebert this morning, at their instigation. It was a very
> >good call, and they're being very helpful. I hope there'll be a
> >workaround soon, but we may have to live with this problem for a while
> >yet...
> >
> >David
>
> That sounds like positive progress..
> Did they name a reason this is happening, or are they taking steps to
> send someone down with a scope to see what is happening?

Yes, they're sending people out. Liebert engineers say they have basic understandings of what's going wrong, and have some possible ways to work around it or solve it. I'll not say more at this time, to give them time to work it out.

> >P.S. I suppose you can guarantee that Hard Data power supplies would not
> >induce current oscillations in the room? :) Didn't think so.
>
> Actually, if the room is wired correctly we can.
> We have been building servers and installations in server rooms since
> 1992, about 2 times as long as Dell have.
> And, we DO claim PFC compliance on our power supplies, and can produce
> test results and compliancy test reports to back that up.
>
> However, instead of trying to deflect from the real issue, let's look at
> what I replied to, word for word:

I apologize for making a snide comment.

> >Mind you, the "blame" may be shared by the
> >Liebert UPS and the Dell power supplies, but I'm relying on Liebert to
> >figure out why things go unstable *when their UPS is online, supplying
> >a load that should be quite normal*, and so far they have no solution
> >for me. We can't just wait on Liebert; this problem is hamstringing
> >our use of our new 1024-node cluster. So now I turn to this list.
>
> You have obviously decided, in advance, that the problem is with the
> Liebert equipment.

Maurice, in your two replies to this thread you've made lots of incorrect inferences and assumptions, including this one. Please show me the respect of *not* assuming what I think, what I've done, or what others have done at our site for this problem. If I've not *stated* some fact that you think is important, simply ask me rather than assuming.

> You mention absolutely nothing about testing the power supplies.
> That step should be the first, and fortunately is the easiest.
> Almost any modern scope will do the job.
> As it is low frequency it does not have to be an expensive or
> specialized scope.
> Instead of trusting "Kill-a-Watt clones" why not check the actual power
> supply response, on a standard 115V single phase power input circuit?

That's an excellent suggestion, and is in accord with my usual troubleshooting & experimental inclinations. But because I have the responsibility for *all* the aspects of commissioning this brand-new, large cluster, I've had to leave lots of details to others, Liebert in this case. To the best of my knowledge, Liebert has not studied these exact power supplies, but they say they understand PSes that are similar enough that they can work out a model of our specific problem. Until I have time to run experiments myself, I am going to trust them to cover these bases.

> I have seen power regulation equipment fail in a similar fashion before,
> where the power supplies are pulling down too much current to the
> neutral phase,
> and making the power feed overload on one phase, driving it into
> instability.
> This is a classic symptom of cheap, poorly designed and made power
> supplies. Or bad room wiring, with undersized neutral lines.

The PDUs have a front panel that displays lots of diagnostic measurements, and they sound a rather piercing alarm when any measurement goes over its Liebert-defined limit (they are the only alarms I've heard in that room that can reliably be heard over the room noise, from any part of the room :).
The PDUs also have suitably sized breakers and suitably sized conductors on each of the 93 branch circuits. The three output phase currents all stay well under their limits, even when they begin to become unstable (at the low-power end of the instability, and well into the instability domain). Toward the high-power end of the instability domain that we've tested, the current oscillations become large enough, and sit on top of a large enough average current, that the PDUs *do* give overcurrent alarms (plus other alarms due to the wild oscillations). Unless something is going on that is not alarmed for, the PDUs and the Liebert techs who've been onsite don't indicate any problem with the neutral wiring or the power supplies per se.

> Liebert make big UPS and power units, and those are their "bread &
> butter"
>
> Frankly I am surprised they have not yet dispatched a tech down to your
> site with test equipment by now..

When did I say they haven't dispatched a tech to our site? In fact they have, multiple times; I just hadn't mentioned that up to this point in this thread. My concern was not that they aren't sending techs, but that they have no solution yet, and that I wasn't getting a warm-fuzzy feeling that they really were treating this problem as critically as we need them to. After yesterday's conference call, I feel better about their efforts. Even so, the proof is still in the outcome, and the outcome is far from certain.

> When you say "Liebert has been on this case for something like 4 weeks
> now." what does that mean?

That's when we first demonstrated this problem to their onsite tech & engaged their help in solving it.

> >Can anyone here offer ideas, or better yet, experience?
>
> I was trying to.
> Apparently you do not appreciate suggestions, except ones that support
> your distrust of Liebert.

I appreciate all constructive suggestions. My appreciation does not extend to insinuations. Thanks for trying to offer ideas & experience.
I *do* appreciate some of what you've written in this email. I appreciate *none* of what you wrote in your first reply to this thread -- if you like, go back and read it and see if you can understand why.

> Why not test the power supplies?
> If doing it yourself is not something you are comfortable with, there
> are many electrical inspection labs in your region that provide this
> service, usually for under $150.
> Look in the yellow pages under "testing" or similar.
>
> Many will allow you to stand there and watch and ask questions as they
> do it.

Now *that* is a very good suggestion. Thank you. I did not know testing could be this easy. (By the way, I'm comfortable with testing / measuring the power supplies, although I don't have the equipment on hand to do it properly, and I don't have the full range of knowledge to interpret all of what I measure.)

For now, I'm going to continue to let Liebert run with this problem; we've offered to get them a power supply to take apart and/or measure, but so far they seem to believe they understand it well enough. I'm also going to trust Dell, that their power supplies are of good quality, just of poor interaction with the rest of our power infrastructure.

Meanwhile, I have several other things to take care of on the cluster, before users can get more than minimal use out of it, so I'm not yet going to get into detailed measurements myself.

David
