Scyld - slave node boot failure

Andrew Shewmaker shewa at inel.gov
Mon Jun 25 11:03:53 PDT 2001


I have installed a Scyld Beowulf master node and I am having problems
with the slave nodes. The addresses pop up as unknown in beosetup; I
move an address to the middle column and click Apply. The slave nodes
then fail in the third phase of their boot, after the bpslave daemon is
started, with a message like "short read - lost connection to master".
The slave then waits 30 seconds and reboots.

All of the hardware is identical: Slot A Athlons with one network card
apiece, including the master node. I am using the Scyld prerelease CDs
with the update RPMs from the website.

Here are the contents of /var/log/beowulf/node.0:

node_up: Setting system clock.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking / (type=fs_size=65536)...
setup_fs: Mounting / on /rootfs/ext2... (type=fs_size=65536; options=0)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking 134.20.8.76:/home (type=nfs)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting 134.20.8.76:/home on /rootfs//home... (type=nfs; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=proc)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//proc... (type=proc; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=devpts)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//dev/pts... (type=devpts; options=gid=5,mode=620)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
rfork: Invalid argument
Failed to create /etc/mtab.
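Most of the log above is the same "bpsh: Node 0 is down. (ignoring)"
line repeated, which buries the handful of distinct errors. The script
below is a minimal sketch, not part of Scyld: the function name
`summarize` is hypothetical, and the default path is simply the one
quoted in this post. It prints each distinct log line once, in the
order it first appeared, with a repeat count.

```python
import sys
from collections import Counter


def summarize(log_text):
    """Count each distinct non-blank line, keeping first-seen order.

    Counter is a dict subclass, so on Python 3.7+ it iterates keys
    in the order they were first inserted.
    """
    lines = (ln.strip() for ln in log_text.splitlines())
    return list(Counter(ln for ln in lines if ln).items())


def main():
    # Default path is the node log quoted in the post; pass another
    # node's log file as the first argument to inspect it instead.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/beowulf/node.0"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as err:
        print(f"cannot read {path}: {err}", file=sys.stderr)
        return
    for line, count in summarize(text):
        # Mark repeated lines so the one-off errors stand out.
        prefix = f"[x{count}] " if count > 1 else ""
        print(prefix + line)


if __name__ == "__main__":
    main()
```

Run against the log above, this leaves only the distinct messages
(clock setup, the missing modules.dep, each setup_fs mount, and the
final rfork and /etc/mtab errors) with the bpsh line collapsed into one.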


I have successfully installed both the prerelease and the final release
on a different cluster, and I did not see this problem there. I did
update the master node before I tried to boot a slave node; could my
difficulties be the result of a botched update? I have tried booting
the slaves with the prerelease CD as well as with a floppy, so I don't
think this is a problem with mismatched versions.

Thanks for any help,

Andrew Shewmaker




