[Beowulf] Re: UPS & power supply instability - ongoing discussions
David Kewley kewley at gps.caltech.edu
Thu Sep 29 15:57:55 PDT 2005
- Previous message: [Beowulf] Re: UPS & power supply instability - ongoing discussions
- Next message: [Beowulf] Re: UPS & power supply instability
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thursday 29 September 2005 10:54, Maurice Hilarius wrote:
> David Kewley wrote:
> OTOH, I am appalled by the fact that it has been 4 weeks since you
> reported the problem to Dell and Liebert and apparently they have
> done close to nothing about it.
> I know that on a job of this size, if you had bought from us, and
> reported it, we would be all over it.
> Even Dell can not afford this kind of bad PR..

I am disappointed that Liebert has known for four weeks that we have
this problem, and until yesterday did essentially nothing that was
visible to us, aside from sending field techs out to take
measurements, and giving us assurances that they're working very hard
on the problem.  We haven't gotten the feeling that they've been
treating the problem with enough urgency.  If in fact they have, then
they need to work on their communications with us.

After talking to Liebert engineers yesterday, I feel better about
their response.  All the same, this is the only issue in the room that
I'm actually *worried* about.  The other, non-power issues clearly
will get addressed when I have the resources to do so.

Dell has been very responsive, proactively so, on the problems that
seem to be in their domain.  It has seemed to us and to Dell, however,
that the power problems are between Liebert and Caltech, so Dell has
kept informed but has not attempted to address the perceived room
problems.  In light of discussions about power supplies, however, I
will keep an eye out for things I need to alert Dell to.  If it ends
up that their power supplies should have different electrical
characteristics, I will make sure the right people at Dell get the
message.  They've been very welcoming of my constructive criticism to
date, and I've already seen them improve their processes.

> So, DID you get any useful response from either Dell or Liebert 4
> weeks ago, and in the interim.

The response has been to send techs out a handful of times & have the
District Manager talk with us on the phone.
I'd like to think that lots more is going on in the background, but I
see few direct results so far from such activity.

"Useful" response?  Not useful as in solving the problem, no.  Just
"working on it".  That's really not good enough for us, even if it is
all they can do.  At least keeping us in the loop with their engineers
increases my comfort level while we wait for a solution.

> So, delays are partially because the staffing at your site is short
> and you simply do not have enough time to do what it takes to make it
> run? If so, I offer sympathy.
> I see this far too often.
> A budget of a million dollars for a cluster, but no cash to implement
> it or maintain it.
> That must be very frustrating!

Thanks for your sympathy.  I was very frustrated about understaffing
three months ago or so, but it has been stated very clearly to me that
I will be the only staff member.  In addition to my best efforts, we
will continue to expect a lot from our vendors and from Caltech staff.
My local colleagues and our vendors have on the whole been excellent
-- we've gotten many times more work done in the past months than I
could have done myself.  If I can automate enough pieces, we might be
OK in the coming years. :)

I've accepted that "me plus staff and vendors" is the way it's going
to be, and I do my best within those constraints, letting management
know exactly where we are and exactly what is too much to expect.
It's worked OK so far.

> >To the best of my knowledge, Liebert has not studied these exact
> > power supplies, but they say they understand PSes that are similar
> > enough that they can work out a model of our specific problem.
> > Until I have time to run experiments myself, I am going to trust
> > them to cover these bases.
>
> I would, in my experience they have a heck of a good rep.

Ditto for me.  Glad to hear you've had excellent experience with them.

> >>I have seen power regulation equipment fail in a similar fashion
> >> before, where the power supplies are pulling down too much current
> >> to the neutral phase,
> >>and making the power feed overload on one phase, driving it into
> >>instability.
> >>This is a classic symptom of cheap, poorly designed and made power
> >>supplies. Or bad room wiring, with undersized neutral lines.
> >
> >The PDUs have a front panel that displays lots of diagnostic
> > measurements, and they sound a rather piercing alarm when any
> > measurement goes over its Liebert-defined limit (they are the only
> > alarms I've heard in that room that can reliably be heard over the
> > room noise, from any part of the room :). The PDUs also have
> > suitably sized breakers and suitably sized conductors on each of
> > the 93 branch circuits.
> >
> >The three output phase currents all stay well under their limits,
> > even when they begin to become unstable (at the low-power end of
> > the instability, and well into the instability domain). Toward the
> > high-power end of the instability domain that we've tested, the
> > current oscillations become large enough, and sit on top of a large
> > enough average current, so the PDUs *do* give overcurrent alarms
> > (plus other alarms due to the wild oscillations).
> >
> >Unless something is going on that is not alarmed for, the PDUs and
> > the Liebert techs who've been onsite don't indicate any problem with
> > the neutral wiring or the power supplies per se.
>
> So, what DO they think is causing this? I am really curious..

I believe at the moment they're looking at it as a control theory
problem, with multiple poles in the system, including the UPS, the
PDUs, and the computer power supplies.  I suppose it's possible that
the APC strips and wiring play significant roles as well, but that
seems less likely.  Liebert's speculative workarounds involve reducing
the magnitude of the poles.  Ask me again for details when the problem
is solved.
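
For readers unfamiliar with the control-theory framing, here is a minimal sketch of how poles decide stability. It is not Liebert's model: the circuit (a series R-L source feeding a bus capacitance and a constant-power load), the function names, and every component value are hypothetical and chosen only for illustration. A constant-power load draws less current as voltage rises, so near its operating point it behaves like a negative resistance, which can push the poles of the combined system into the right half-plane.

```python
import cmath

def bus_poles(R, L, C, V, P):
    """Poles of a toy source + constant-power-load model.

    R, L: series resistance (ohm) and inductance (H) on the source side.
    C:    capacitance (F) at the load bus.
    V, P: operating voltage (V) and load power (W).
    All values are hypothetical, not measurements from any real room.
    """
    # A constant-power load draws I = P/V, so dI/dV = -P/V**2:
    # a negative incremental conductance.
    g_load = -P / V**2
    # KCL at the bus gives the characteristic polynomial
    #   L*C*s^2 + (R*C + L*g_load)*s + (1 + R*g_load) = 0
    a = L * C
    b = R * C + L * g_load
    c = 1 + R * g_load
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

def is_stable(poles):
    # Stable only if every pole lies strictly in the left half-plane.
    return all(p.real < 0 for p in poles)

if __name__ == "__main__":
    # Made-up component values; only the load power changes.
    for power_w in (10e3, 40e3):
        poles = bus_poles(R=0.05, L=100e-6, C=1e-3, V=208.0, P=power_w)
        print(power_w, is_stable(poles))
```

With these made-up values the 10 kW case is stable and the 40 kW case is not; in this toy model the boundary sits where R*C equals L*P/V**2. A real UPS, PDU, and power-supply chain has more poles than this, so the sketch only shows the kind of question being asked, not the answer.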

> >>Liebert make big UPS and power units, and those are their "bread &
> >>butter"
> >>
> >>Frankly I am surprised they have not yet dispatched a tech down to
> >> your site with test equipment by now..
> >
> >When did I say they haven't dispatched a tech to our site? In fact
> > they have, multiple times; I just hadn't mentioned that up to this
> > point in this thread.
>
> Ah.. that paints one very different picture.
> So basically Liebert are on it, you have not mentioned what, if
> anything Dell have done, but you are coming to this list because
> after some weeks you still are not seeing a solution happening?

Liebert is on it, but results (anything more than assurances) have
been much too slow.  So I came to this list to get ideas.

> If you had mentioned things like what you say about Liebert's actions
> to date, as you have in this message, it would have painted a
> different story entirely.

Yes.  I could not realistically include in my first emails all the
details that you or anyone else might possibly consider important.  I
waited for your questions before I elaborated further.  Hopefully all
the important elements are on the table now.

> To meet CSA, CE, and UL one gets what is called a "site inspection"
> Often the best and cheapest way is to take the piece to a certified
> test labs and they do the test, provide a short report, and a sticker
> certifying it is electrically safe and acceptable
> It is not an FCC radio emissions test and certification, but you can
> ask for that too, albeit at a higher cost.
> They measure power characteristics, PF, current leakage, consumption,
> stability, load maximum, etc.

Interesting.  I'll consider doing this, thanks.

David
