custom hardware (was: Xbox clusters?)
Bob Drzyzgula bob at drzyzgula.org
Thu Nov 29 07:02:00 PST 2001
- Previous message: Strange hardware (was Re: custom hardware (was: Xbox clusters?))
- Next message: custom hardware (was: Xbox clusters?)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote:
>
> David Vos wrote:
>
> > ....
> > There is one computer in our cluster that would make me think twice before
> > doing a custom build. I prefer to call it the node from heck. It only
> > has one problem: it won't boot. If you press the power button, the
> > powerlight flashes while the cpu and case fans turn a quarter turn, then
> > nothing. You have to wait a minute before you even get that reaction
> > again. (Sounds like a short somewhere). The problem only surfaces if the
> > computer has been off for a little while, and nearly every time at that.
>
> I have seen similar strange behavior of some boxes in a set of 66's, and the
> way to restart is also rather odd.
> Basically, and this has been repeatedly observed on several boxes of the same
> composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic
> shelf, the ATX box would stop after months of activity, and the simplest found
> way to restart it is to unplug everything (power and ethernet), touch it for
> a few seconds with hands, replug and voila. No need to open the box!
> My guess is that some condensator needs to be unloaded, but exactly why
> one needs to unplug every cable appears curious.

One thing to understand is that, unless there is a physical switch on the power supply itself, ATX systems are never *really* turned off as long as they are plugged in -- they only go to a "standby" state, wherein +5V power is still being applied to a single pin (the purple wire). When you press the power button on the front of the chassis, it merely shorts a header that ultimately causes the motherboard to short the green wire in the ATX cable to ground -- this is a signal to the power supply to leave standby and start generating power for all the other outputs.

Another thing to observe is that generally, ATX power supplies are switching supplies, which means that (to simplify things somewhat) they generate the correct voltage by charging and discharging a capacitor at a high rate. The switching controller constantly monitors the voltage on the capacitor and connects or disconnects the capacitor to the incoming supply, depending on whether the charge is above or below the desired level (the detailed truth behind this is fairly complex and typically involves multiple stages and inductors as well as capacitors, but this model is probably good enough for this discussion...). Thus, even when an ATX system is "off", the power supply is chugging along, keeping a capacitor charged to provide +5V at a low current.

BTW, if you have the resources to do this, put a current sensor on the incoming AC line for a running system and feed the output to an oscilloscope. You should see a series of alternating positive and negative spikes -- those are the capacitors charging at the peaks and troughs of the AC voltage.

Now, if the ATX board were simply to run the green-wire contact straight through to the power on/off header, you wouldn't need much oomph at all on the +5V standby line, and older ATX power supplies in fact didn't. However, newer boards have things like Wake-on-LAN, Wake-on-Modem, and other various and sundry goodies that have to run off the +5V standby. It has gotten to the point that, in order to do all the processing that is required to leave standby, the standby current draw is greater than what some older supplies can provide.

So in the case of a power supply that either by design or fault cannot provide sufficient current under standby, what (I think) happens is that while the motherboard is waiting for the main supply voltages to come up to full power, the standby processing bleeds off the capacitor to the point that the standby voltage sags below the minimum required for operation.
At that point, the standby processing halts, the motherboard stops holding the green wire to ground, and the power supply stops trying to power up. It then returns to standby mode, re-charges the standby capacitor, and the cycle begins again.

If you have a system that is behaving like this, try putting a voltmeter on the standby pin of the ATX header (you can usually jab a probe down into the back of the connector). You should see it at +5V when the system is "off". Then press the system's "on" button and watch the voltage. You'll most likely see it sag down to a couple of volts or so. If this doesn't happen, you've probably got some other problem, perhaps a POST failure of some sort.

Also, this may not be the end of the diagnosis -- it is possible that the failure to provide enough current on standby may not be the fault of the power supply itself. It could be a faulty component (e.g. the SCSI drive we heard about) sucking down too much current on power-up, or an overburdened AC supply circuit that sags just a bit when your system starts up -- in the latter case I imagine that you could wind up with a seemingly jinxed spot in the equipment rack. :-)

BTW, if the power supply has too little oomph on standby by *design*, the system will probably *never* power up. If the supply's design meets the new spec only marginally, or if it is malfunctioning, say, because of a damaged or weakened capacitor, then it might behave differently when cold than it does when it is fully warmed up. In this event, unplugging the supply for a while and reconnecting it can create a short window in which the supply can get the system over the hump to leave standby. I in fact have a supply at home that has this problem, and I just sort of live with it because it's not my main system. Someday perhaps I'll replace the supply.

As to why you have to disconnect the Ethernet as well, I really don't have a clue.

HTH,
--Bob Drzyzgula
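
For anyone who wants to play with the standby-sag idea numerically, here is a rough Python sketch of the simplified model described above: one capacitor on the +5V standby rail, a supply that can only deliver a limited current, and a motherboard that draws more than that while waiting for the main rails. Every name and number in it (`simulate_power_on`, the 1000 uF capacitor, the 4.0 V cutoff, the 100 ms startup time) is a made-up illustration, not a value from the ATX spec or from any real supply.

```python
def simulate_power_on(supply_limit_a, wake_load_a, cap_f=0.001,
                      v_nominal=5.0, v_min=4.0, startup_s=0.1, dt=0.0001):
    """Toy model of the +5V standby rail during a power-on attempt.

    supply_limit_a: most current the standby supply can deliver (A), assumed.
    wake_load_a:    current the board draws while leaving standby (A), assumed.
    Returns (booted, lowest_voltage_seen).
    """
    v = v_nominal
    lowest = v
    steps = round(startup_s / dt)
    for _ in range(steps):
        # Net current into the capacitor; negative means it is being drained.
        net_a = supply_limit_a - wake_load_a
        # dV = I * dt / C, and the regulator never charges above nominal.
        v = min(v_nominal, v + net_a * dt / cap_f)
        lowest = min(lowest, v)
        if v < v_min:
            # Standby logic browns out: the board lets go of the green
            # wire and the supply drops back to standby.
            return False, lowest
    # The rail held up until the main outputs were ready.
    return True, lowest


if __name__ == "__main__":
    for limit in (1.0, 0.1):
        booted, lowest = simulate_power_on(supply_limit_a=limit, wake_load_a=0.5)
        print(f"supply limit {limit} A: booted={booted}, lowest={lowest:.2f} V")
```

In this toy model a supply that can source more than the wake-up load never sags, while one that cannot will drop below the cutoff before the startup window ends, which corresponds to the "flash, quarter turn, then nothing" symptom. It ignores temperature, the recharge delay, and everything else about a real switching supply.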
More information about the Beowulf mailing list
