[Beowulf] HP 2848 switch woes
bill at Princeton.EDU
Mon Jan 10 09:12:49 PST 2005
Trying to install a new cluster of Tyan 2881 mothers with CentOS 3.3,
kernel 2.4.21-27.0.1.ELsmp (Opteron).
When running through this switch (Firmware:I.08.55, ROM:I.08.04), the
system is forced to do a manual install as a failure occurs in what I
believe is the initial discovery phase after the kernel boots.
When a direct connection is made to the head node, everything proceeds
During the initial booting, after PXE, the system sends a request out to
the network asking for it's MAC address. Right before this time, the
network card appears to be reset by the OS. This appears to be the
normal progression from within the kernel.
On a direct cable, the rarp is seen and the compute node receives the
info via the head node, right after the network card is reset. Through
the switch though, the rarp is never seen by the head node.
At first I thought it was something with autodetection and so set the
switch up for just Gig. It certainly isn't the case that rarps don't
work as the initial tftp boot works fine, the vmlinuz is downloaded and
booting proceeds. It only is when during the boot phase when the
network card is reset does communication somehow fail.
I've set the timeout in the switch for 15 minutes, made sure spanning
tree was off, connected the cables to adjacent ports, all to no avail.
If anyone has any suggestions I am all ears as I have run out of ideas
at this point. HP just suggests updating the firmware, which I have
done to no avail.
More information about the Beowulf