How do you keep clusters running....
SGaudet at turbotekcomputer.com
Wed Apr 3 13:26:28 PST 2002
> What are folks doing about keeping hardware running on large clusters?
> Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
> Sure seems like every week or two, I notice dead fans (each RS-1200
> has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).
> My last fan failure was a CPU fan that toasted the CPU and
> How are folks with significantly more nodes than mine dealing with constant
> maintenance on their nodes? Do you have whole spare nodes sitting around,
> ready to be installed if something fails, or do you have a pile of
> spare parts? Did you get the vendor (if you purchased prebuilt systems)
> to supply a stockpile of warranty parts?
> One of the problems I'm facing is that every time something croaks,
> Racksaver is very good about replacing it under warranty, but getting
> the new parts delivered usually takes several days.
> For some things like fans, they sent extras for me to keep on-hand.
> For my last fan/CPU/motherboard failure, the node pair will be
> down ~5 days waiting for parts.
> Comments? Thoughts? Ideas?
The vendor of choice should be using quality parts. We don't see these
Linux Solutions Engineer
| Turbotek Computer Corp. tel:603-666-3062 ext. 21 |
| 8025 South Willow St. fax:603-666-4519 |
| Building 2, Unit 105 toll free:800-573-5393 |
| Manchester, NH 03103 e-mail:sgaudet at turbotekcomputer.com |
| web: http://www.turbotekcomputer.com |