[Beowulf] delayed savings time crashes
David Kewley kewley at gps.caltech.edu
Wed Apr 12 12:57:30 PDT 2006
On Wednesday 12 April 2006 11:34, David Mathog wrote:
> Hmm, now that we know the cause of it that might explain
> why all those that did reboot were plugged into just 2 surge
> suppressors, where the loss was 9/10 machines, whereas the
> other 2 surge suppressors lost 0/10 machines. Each surge
> suppressor is on its own circuit which is 1/3rd of a 3 phase line.
> Maybe only one phase had the glitch and by good luck the
> two circuits which lost no machines were wired between the
> two good phases?

I do not know how this worked, but I did see something similar but even
stranger. Our UPS feeds two PDUs, each responsible for about 1/2 the
computers. One PDU saw all computers on phases 1 & 2 fail, and the other
saw all computers on phases 1 & 3 fail. On both PDUs, the third,
unaffected phase saw all its computers stay up. I have no idea how to
explain this.

> > This info comes from the responsible EE at Caltech. As for its
> > effects, believe me, I know about it the hard way, as it took down 2/3
> > of our compute nodes, 1/3 of our disk shelves, and 3/4 of our
> > fileservers.
>
> That's a lot of machines in your case. Did any sustain permanent
> damage?

It was a voltage drop rather than a spike, and that probably explains why
we had no hardware damage. Just quite a bit of filesystem corruption to
clean up (which leaves lost files & corrupted file data for some small
subset of user files).

David
