[Beowulf] Dolphin PCI adapters suddenly invisible
cfa22 at drexel.edu
Thu Sep 15 17:30:37 PDT 2005
I have a 16-node dual-Athlon cluster with Dolphin/SCI that has run
beautifully for two years. After a recent power cycle, 13 of 15 Dolphin
adapters were simply not recognized by the motherboards. Each compute
node has a Tyan Tiger MPX 2446N-4M, BIOS 4.03, dual Athlon MP 2200+, and
a Dolphin d334-xxx adapter (64-bit/66 MHz) with 2 ins and 2 outs. I opened
no cases and changed no cables; the adapters just became invisible, from
one day to the next.
I have picked one node at random to diagnose in detail. So far, I have
flashed the motherboard BIOS (both upgrading and downgrading), tried
different PCI slots, and tried other PCI cards in the same slots. Other
cards are recognized on the bus just fine; only the Dolphin adapter is
not. I swapped an adapter from a bad node into a good node, and the
good node found it. The BIOS settings on the good node are identical to
those on all the bad nodes.
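A check for whether an adapter is visible on the PCI bus might be scripted per node roughly as follows. This is only a minimal sketch: it assumes `lspci` from pciutils is installed on each node, and that 11c8 is the PCI vendor ID listed for Dolphin Interconnect Solutions in the public pci.ids database (worth confirming against a node that still sees its adapter). The function names are hypothetical.

```python
import re
import subprocess

# Assumption: vendor ID for Dolphin Interconnect Solutions per pci.ids.
# Confirm it with `lspci -n` on a node where the adapter is still detected.
DOLPHIN_VENDOR_ID = "11c8"

# Matches a "vendor:device" pair such as "11c8:d667".
_ID_PAIR = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{4})$")


def find_vendor_devices(lspci_n_output, vendor_id=DOLPHIN_VENDOR_ID):
    """Return bus addresses of devices in `lspci -n` output with this vendor ID."""
    matches = []
    for line in lspci_n_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The first field is the bus address; look for the vendor:device pair
        # among the rest, which tolerates older "Class 0600:" style output.
        for field in fields[1:]:
            pair = _ID_PAIR.match(field.lower())
            if pair and pair.group(1) == vendor_id.lower():
                matches.append(fields[0])
                break
    return matches


def scan_local_node():
    """Run `lspci -n` on this node and return matching bus addresses."""
    result = subprocess.run(
        ["lspci", "-n"], capture_output=True, text=True, check=True
    )
    return find_vendor_devices(result.stdout)
```

An empty list from `scan_local_node()` on a node would mean no device with that vendor ID was enumerated on the bus, which matches the symptom described above. Running it over ssh on every node would give a quick good/bad map of the cluster.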
I am stumped. Has anyone ever experienced anything like this?
Cameron F Abrams, PhD
Department of Chemical and Biological Engineering
Philadelphia, Pennsylvania USA
(v) 215-895-2231 (f) 215-895-5837