[Beowulf] HP 2848 switch woes

Bill Wichser bill at Princeton.EDU
Mon Jan 10 09:12:49 PST 2005


I'm trying to install a new cluster of Tyan 2881 motherboards (Opteron)
with CentOS 3.3, kernel 2.4.21-27.0.1.ELsmp.

When the nodes go through this switch (Firmware: I.08.55, ROM: I.08.04),
a failure occurs in what I believe is the initial discovery phase after
the kernel boots, and I am forced to do a manual install.

When a node is cabled directly to the head node, everything proceeds
normally.

During the initial boot, after PXE, the system broadcasts a request on
the network, keyed on its MAC address, asking for its IP address.  Just
before this, the OS appears to reset the network card.  This seems to
be the normal sequence within the kernel.
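For readers less familiar with the request described above, here is a
minimal Python sketch of what a RARP request frame (RFC 903) looks like
on the wire, assuming the node really is sending RARP as described. The
helper names and the MAC address in the usage note are hypothetical;
the sketch only builds the bytes and does not send anything.

```python
import struct

ETH_P_RARP = 0x8035    # EtherType for RARP (RFC 903)
ARPHRD_ETHER = 1       # hardware type: Ethernet
ETH_P_IP = 0x0800      # protocol type: IPv4
RARP_REQUEST = 3       # opcode "request reverse"
BROADCAST = b"\xff" * 6


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def build_rarp_request(mac: str) -> bytes:
    """Build a broadcast RARP request in which a node asks for its own IP.

    The node's MAC goes in both the sender and target hardware fields;
    the protocol (IP) fields are zero because the node does not know its
    address yet.
    """
    hw = mac_to_bytes(mac)
    # Ethernet header: broadcast destination, node's MAC as source, RARP type
    eth_header = BROADCAST + hw + struct.pack("!H", ETH_P_RARP)
    # 28-byte ARP-format body with the RARP opcode
    rarp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        ARPHRD_ETHER, ETH_P_IP, 6, 4, RARP_REQUEST,
        hw, b"\x00" * 4,   # sender hardware / protocol address
        hw, b"\x00" * 4,   # target hardware / protocol address
    )
    return eth_header + rarp_body
```

For example, build_rarp_request("00:11:22:33:44:55") returns a 42-byte
frame addressed to ff:ff:ff:ff:ff:ff with EtherType 0x8035. Because it
is a broadcast, a switch should flood it to every port in the VLAN, so
the head node ought to see it even before the switch has learned the
node's MAC.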

On a direct cable, the head node sees the RARP request and the compute
node receives its information from the head node right after the
network card is reset.  Through the switch, however, the head node
never sees the RARP request.
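A matching sketch for the head node side: given a raw Ethernet frame
captured there (for example with tcpdump or a packet socket; the
capture step is not shown and is an assumption), this hypothetical
helper reports whether the frame is a RARP request and which MAC sent
it. It is one way to check, frame by frame, whether the request ever
makes it through the switch.

```python
import struct

ETH_P_RARP = 0x8035   # EtherType for RARP (RFC 903)
ETH_P_IP = 0x0800     # protocol type: IPv4
RARP_REQUEST = 3      # opcode "request reverse"


def rarp_request_sender(frame: bytes):
    """Return the requesting MAC as 'aa:bb:...' if frame is a RARP
    request over Ethernet/IPv4, otherwise None."""
    # 14-byte Ethernet header + 28-byte RARP body
    if len(frame) < 42:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_RARP:
        return None
    htype, ptype, hlen, plen, op = struct.unpack_from("!HHBBH", frame, 14)
    if htype != 1 or ptype != ETH_P_IP or hlen != 6 or plen != 4:
        return None
    if op != RARP_REQUEST:
        return None
    # Sender hardware address starts right after the 8-byte fixed fields
    sender = frame[22:28]
    return ":".join(f"{b:02x}" for b in sender)
```

Frames on the wire are usually padded to at least 60 bytes, which is
why the length check only requires a minimum rather than an exact size.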

At first I thought it was an autonegotiation problem, so I set the
switch to gigabit only.  It certainly isn't the case that RARP doesn't
work at all: the initial TFTP boot works fine, vmlinuz is downloaded,
and booting proceeds.  Communication fails only after the network card
is reset during the boot phase.

I've set the timeout in the switch to 15 minutes, made sure spanning
tree was off, and connected the cables to adjacent ports, all to no
avail.

If anyone has any suggestions, I am all ears, as I have run out of
ideas at this point.  HP just suggests updating the firmware, which I
have done without success.

Thanks,

Bill



