custom hardware (was: Xbox clusters?)
David Vos dvos12 at calvin.edu
Wed Nov 28 14:11:09 PST 2001
On Wed, 28 Nov 2001, John Burton wrote:

> Ummmm....speak for yourself. I've been putting together these "self
> assembled beige box" for many years and currently have about 5%
> component DOA rate, and about another 1% infant mortality rate (crap
> out within 30 days). Takes on average 4 hours to determine what the
> bad component is and 24-48 hours to replace it. I've never spent more
> than 1 week "figuring out" which part is broken. The time I spent 1
> week was due to a flakey memory chip that was causing filesystem
> errors in a 90GB RAID 5 array. Flakey memory is difficult to track
> down because it can masquerade as virtually anything else...

There is one computer in our cluster that would make me think twice before doing a custom build. I prefer to call it the node from heck. It only has one problem: it won't boot. If you press the power button, the power light flashes while the CPU and case fans turn a quarter turn, then nothing. You have to wait a minute before you even get that reaction again. (Sounds like a short somewhere.) The problem only surfaces if the computer has been off for a little while, and then it happens nearly every time.

1st Occurrence (several months ago). Try new power supply. No go. Remove drives, cards, etc. from motherboard until only the (new) PS (power supply), motherboard, mem, and CPU are left. Nope. Swap mem. Nope. Swap CPU. Nope. Sounds like the motherboard (I replaced everything else). I return the original parts (and drop a screwdriver on the motherboard by accident), and it suddenly starts working. I put the computer back in and it runs fine with everything the way it was before.

2nd Occurrence (a month or so later). I knew it was a bad motherboard last time, so I replaced the motherboard. Worked great.

3rd Occurrence (a month or so later). I take things apart and put them back together. Starts working. Now I'm starting to get confused.

4th Occurrence (a month or so later). I remove drives and cards, put in spare PS. Nothing. Remove motherboard and put it on a piece of wood with nothing attached but spare PS, CPU, and mem (using a screwdriver to short the pins instead of the power switch). Used a new power cable plugged into a different circuit. Nothing. Try new mem. Get another system and individually check mem, motherboard, CPU. They are all good. Try both PS's in the other system and the problem follows them. Two bad power supplies -- not too unusual. I replace them, and things run great.

5th Occurrence (recently). I removed all cards and drives from the motherboard. Nothing. Tried spare PS. That worked. Unplugged the current PS from the case, HD, and FD, and it started working. Put everything back together and it was still working.

Since there is not a single piece of hardware that was present in each case, I feel forced to conclude that there must be something (power cord?) that is breaking the power supplies. I have not seen this problem on any other computers.

This is the point at which I would love to put the whole computer back in a box and send it to the reseller.

Luckily we never sent back the "bad" motherboard and keep it around as a spare, since it works fine on other systems now.

David
