Help!! Scyld slave boot problem!!

Donald Becker becker at scyld.com
Mon May 6 21:08:00 PDT 2002


On Mon, 6 May 2002 sungyen at cch.org.tw wrote:

>     I installed Scyld on the master node (kernel 2.2.19-12.beo), which
>    is equipped with an IBM SCSI HDD and a DFE-530TX Ethernet card. The
>    slave node is booted from a floppy made on the master node, but it
>    runs into a problem in the phase II boot process.
>    In phase II booting, the slave node prints the following:
> ......
> ......
> Sending RARP requests
> RSRP: 00:50:BA:24:DE:4F -> 192.168.1.100
> boot : boot : Server IP address : 192.168.1.1
> boot : boot : My IP address : 192.168.1.100
> boot : started sendstats daemon ; pid=16
> bpslave : IO daemon started : pid=17
> node_up : This is node 0
> node_up : boot log available in /var/log/beowulf/node.0 on the master
> kmod : failed to exec /sbin/modprob -s -k block-major-3. error=2

Did this report "/sbin/modprob" or "/sbin/modprobe"?

Are you able to boot in diskless mode, without mounting any disk?

The Scyld system enhances the 'modprobe', 'insmod', and 'rmmod' programs
to provide device driver modules from the cluster master.  The programs
now migrate back and forth between the master and compute node to
load the driver file and resolve symbols.

The kernel can request device drivers internally.  The kernel does this
by calling the '/sbin/modprobe' program when there is a request for a
missing capability, driver or device.  The cluster-enhanced 'modprobe'
knows to migrate to the master to read the dependency file and driver,
and return to the original node to load the module into the kernel.
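
To make the kmod message quoted above easier to read, here is a minimal
sketch that pulls it apart. It assumes the "error=" value is a standard
errno number; the regular expression and the function name are
illustrative only and are fitted to that single quoted line, not to any
documented kernel log format.

```python
import errno
import os
import re

# The kmod failure line exactly as quoted in the report above.
LINE = "kmod : failed to exec /sbin/modprob -s -k block-major-3. error=2"

# Hypothetical pattern, fitted only to the one line quoted above.
PATTERN = re.compile(
    r"kmod\s*:\s*failed to exec (?P<path>\S+) .*?(?P<alias>\S+)\. error=(?P<err>\d+)"
)


def parse_kmod_failure(line):
    """Split a kmod failure line into path, requested alias, and errno."""
    m = PATTERN.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "path": m.group("path"),      # program the kernel tried to exec
        "alias": m.group("alias"),    # capability the kernel asked for
        "errno": err,
        # Assumption: the number is a plain errno value.
        "errno_name": errno.errorcode.get(err, "unknown"),
        "message": os.strerror(err),
    }


if __name__ == "__main__":
    print(parse_kmod_failure(LINE))
```

If that assumption holds, error 2 is ENOENT, meaning the exec failed
because the named program could not be found on the node, which is why
the exact spelling of the path matters.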

In the case above the kernel appears to be trying to load a module for
block-major-3, and something is going wrong.  Since "block-major-3"
usually refers to IDE disks and you have a SCSI disk, you might check
/etc/beowulf/fstab to see that you are not trying to mount a /dev/hda
partition when you mean /dev/sda.
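
The check suggested above can be sketched as a small script that flags
IDE-style device names in fstab-format text. This assumes
/etc/beowulf/fstab uses the usual whitespace-separated fstab columns
with the device in the first field; the sample content and the function
name are hypothetical.

```python
import re

# IDE disks and partitions are conventionally named /dev/hda, /dev/hdb1, ...
IDE_DEVICE = re.compile(r"^/dev/hd[a-z]\d*$")


def find_ide_entries(fstab_text):
    """Return (line number, device, mount point) for each IDE entry."""
    hits = []
    for lineno, raw in enumerate(fstab_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if IDE_DEVICE.match(fields[0]):
            mount_point = fields[1] if len(fields) > 1 else ""
            hits.append((lineno, fields[0], mount_point))
    return hits


# Hypothetical example content, not taken from a real cluster.
SAMPLE = """\
# example only
/dev/hda1   /scratch   ext2   defaults   0 0
/dev/sda2   /data      ext2   defaults   0 0
none        /proc      proc   defaults   0 0
"""

if __name__ == "__main__":
    for lineno, device, mount_point in find_ide_entries(SAMPLE):
        print(f"line {lineno}: {device} mounted on {mount_point}")
```

To run it against the real file on the master, pass the contents of
/etc/beowulf/fstab to find_ide_entries() in place of SAMPLE. Any hit on
a machine with only SCSI disks is a likely source of the block-major-3
request.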

A related aspect of the Scyld system is a new option for 'modprobe',
'insmod' and 'rmmod' that is used for system administration. The
"--node <nodenum>" option specifies which compute node the module
should be loaded onto (or removed from, with 'rmmod'). This is
typically used in the 'node_up' script to load driver modules into the
slave nodes' kernels.
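
As a rough illustration of the "--node" option described above, the
sketch below only assembles the argument list for such a command and
does not run anything. The placement of "--node" before the module
name, the helper name, and the module name used in the example are all
assumptions, not taken from the Scyld documentation.

```python
# Tools that, per the description above, accept the "--node" option.
TOOLS = ("modprobe", "insmod", "rmmod")


def node_module_command(tool, node, module):
    """Build an argument list such as: modprobe --node 0 <module>.

    Assumption: "--node <nodenum>" comes before the module name.
    """
    if tool not in TOOLS:
        raise ValueError(f"unsupported tool: {tool}")
    if node < 0:
        raise ValueError("node number must be non-negative")
    return [tool, "--node", str(node), module]


if __name__ == "__main__":
    # "via-rhine" is only an example module name.
    print(" ".join(node_module_command("modprobe", 0, "via-rhine")))
```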


-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993



