Slave node problem

ericf at whispers.org ericf at whispers.org
Tue Aug 21 21:29:34 PDT 2001


On Tue, 21 Aug 2001, Sean Dilda wrote:

> On Mon, 20 Aug 2001, ericf at whispers.org wrote:
> 
> > Hi, I just recently installed the Scyld Beowulf software (27bz-7) and have
> > run into some difficulty with slave nodes.  The front end setup went
> > smoothly and I made my floppy disks for the slaves.  When I boot up the
> > slaves, they go through the boot process then start the RARP sequence, so
> > I dragged their MAC addresses from unknown to configured nodes and hit
> > apply.  The slave nodes appear to get an IP address, then (and this happens
> > pretty fast) the machine reboots.  I noted it said neighbour table
> > overflow right after the IP assignment.  I've waded through the mailing list
> > archives and the only remotely relevant thing I found to my problem didn't
> > really have an answer.  Any help would be greatly appreciated.
> 
> What happened after the machine rebooted?
> 
> In a normal run of events, once you hit the apply button, the slave
> node will get an IP, then immediately request a phase2 beoboot image from
> the master and download that.  This can happen pretty quickly over a
> fast network.  Once it has the phase2 beoboot image, it will do what we
> call the 2-kernel monte.  It will boot the kernel that is on the phase2
> floppy.  You will see the normal stuff for a kernel boot, however this
> isn't quite a normal reboot as it never goes back to BIOS or anything
> like that.  Once it has booted the kernel from the phase2 image, it will
> RARP again, and once it has its IP it will start up sendstats and
> bpslave, and once bpslave is up, the status of the slave node should
> change in beosetup.
> 

Well, when the machine reboots, it reboots completely (as if I had hit the
reset switch or just turned it on).  It then boots back off the floppy,
goes through the boot process, grabs its IP after the RARP, says
"neighbour table overflow", spits out some more info about taking down
interfaces, and reboots.  Basically, it's an endless loop.  If it reboots
and I take the floppy out, it boots the Linux install that's on the hard
drive (which was there before I even started messing with Beowulf).  As
for the configuration file mentioned in the other email, I've checked it
and made some manual adjustments as an afterthought, but that led to
nothing but the same problem.  And yes, I restarted the daemons and made a
new floppy to be safe...three times.  So, at this point, I'm back to a
default configuration file and the same problem.  I'm not quite sure what
the problem may be, because it seems to like all of the hardware in that
machine up until the point it gets the IP address.  If there is a way to
log what is happening, that may help, but staring at the screen for three
brief seconds each time it boots and gets the IP is kind of tough. ;)
Hope this helps.
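
To make the comparison concrete, here is a rough sketch that lines up the
boot sequence described in the quoted reply against what shows up on the
slave's console, and reports the first point where they part ways.  The
step names are made-up labels for illustration only, not Scyld identifiers
or real log messages, and the observed list is transcribed by eye.

```python
# Expected order of events, paraphrased from the quoted description.
# All step names are hypothetical labels, not Scyld identifiers.
EXPECTED = [
    "rarp",
    "ip_assigned",
    "download_phase2_image",
    "two_kernel_monte",           # boots the phase2 kernel without going back to BIOS
    "rarp",
    "ip_assigned",
    "start_sendstats_bpslave",
    "status_change_in_beosetup",
]

# What the slave appears to do, as read off the console.
OBSERVED = [
    "rarp",
    "ip_assigned",
    "neighbour_table_overflow",
    "interfaces_taken_down",
    "full_reboot_via_bios",
]


def first_divergence(expected, observed):
    """Return (index, expected_step, observed_step) at the first mismatch.

    A missing step on either side is reported as None.  Returns None when
    the two sequences are identical.
    """
    for i, (want, got) in enumerate(zip(expected, observed)):
        if want != got:
            return i, want, got
    shorter = min(len(expected), len(observed))
    if len(expected) != len(observed):
        want = expected[shorter] if shorter < len(expected) else None
        got = observed[shorter] if shorter < len(observed) else None
        return shorter, want, got
    return None


if __name__ == "__main__":
    print(first_divergence(EXPECTED, OBSERVED))
```

On these two lists the mismatch falls right after the first IP assignment,
where the phase2 image download would normally start.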

TIA,
Eric




