Problem Booting the Slaves

Erik Arjan Hendriks erik at hendriks.cx
Fri Dec 15 07:47:20 PST 2000


On Fri, Dec 15, 2000 at 06:39:35AM -0600, Dave Leimbach wrote:
> There is a parameter in that crash output about RAMDISK size.  If your
> machine only has 16MB I think you may be out of luck.  You need to get the
> kernel, some kernel related processes, and bpslave running on those slave
> nodes.  Since the kernel crashed on a paging request I would assume that
> this is either the root or near the root of your problem.  That would be
> my best guess anyway.

"Unable to handle kernel paging request" just means that there was a
page fault that the kernel didn't know what to do with.  That's
usually caused by dereferencing a bad pointer.  You can also get it by
branching to oblivion, i.e. jumping to a garbage address.  Judging by
the addresses, this looks like the bad pointer case.
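
The "Oops: 0000" line in the crash output quoted further down can be
read the same way.  The sketch below assumes that the number printed
after "Oops:" is the i386 page-fault error code, whose low three bits
say why the fault happened.  The function name is made up for
illustration and is not part of any Scyld or kernel tool.

```python
def decode_page_fault_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code.

    Assumption: `code` is the value printed after "Oops:" for a page fault.
    """
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault happened in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# "Oops: 0000" from the quoted crash output, parsed as hexadecimal
oops = decode_page_fault_code(int("0000", 16))
for field, meaning in oops.items():
    print(f"{field}: {meaning}")
```

Under that assumption, 0000 is a read from a page that is not mapped,
made while running in kernel mode.  That fits a kernel dereferencing a
bad pointer, and the "*pde=00000000" line shows no mapping at that
address either.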

That being said... 16MB is not enough to boot a Scyld Beowulf slave.
I believe the minimum is somewhere around 64MB.  There are some rather
large RAM disks involved.  You need this much RAM to get things off
the ground whether or not you're trying to build a diskless
system.
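
To put a number on "rather large": the monte command line quoted
further down passes ramdisk_size=131072.  The kernel's ramdisk_size
parameter is given in 1 KiB blocks, so a rough sketch of the arithmetic
looks like this.  The helper name is hypothetical, and the value is
only the ceiling for a single RAM disk, not the amount of memory that
actually gets used.

```python
def ramdisk_limit_mib(cmdline: str):
    """Return the ramdisk_size= ceiling in MiB, or None if it is not set.

    ramdisk_size is expressed in 1 KiB blocks on the kernel command line.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "ramdisk_size":
            return int(value) / 1024
    return None


# Command line as printed by monte in the quoted crash output
cmdline = "panic=30 ramdisk_size=131072 apm=power-off"
limit = ramdisk_limit_mib(cmdline)
print(f"RAM disk ceiling: {limit:.0f} MiB")

# Compare the ceiling with the RAM sizes mentioned in this thread
for ram_mib in (16, 64, 256):
    relation = "below" if ram_mib < limit else "at or above"
    print(f"{ram_mib} MiB of RAM is {relation} the ceiling")
```

The ceiling works out to 128 MiB, eight times the RAM in the 16MB
nodes.  The RAM disks do not have to fill that ceiling, which is why a
minimum around 64MB is still plausible.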

In a sense, all systems start diskless.  Most of the stuff that's in
these RAM disks gets freed when node setup is finished.  The setup
scripts free it right before setting the node state to "up".

Scyld's next release will be a lot better for low-memory systems.

> The one that comes up unavailable has 256MB of RAM which is plenty of RAM
> for the ramdisks.  Scyld has an option in the beowulf distribution to run
> diskless.  I assume that this requires a large ramdisk that you can't get
> on the 16MB node.
> 
> Dave
> 
> On Fri, 15 Dec 2000, David Leunen wrote:
> 
> > Hello,
> > 
> > With Scyld's BeoSetup, I assigned the nodes to the cluster by dragging
> > the ether-addresses, and clicked apply. Then I did
> > 
> > "bpctl -S all -s reboot"
> > 
> > but nothing happened: the states of the nodes stayed "down". On the
> > slaves, the output was:
> > 
> > ...
> > monte: command line: panic=30 ramdisk_size=131072 apm=power-off
> > monte: Loading 7 sectors of setup code
> > monte: Loading kernel data at0x100000
> > monte: Loaded 642144 bytes of kernel data
> > Unable to handle kernel paging request at virtual address 9f4d4dcb
> > current->tss.cr3=00195000, %cr3=00195000
> > *pde=00000000
> > Oops: 0000
> > CPU:  0
> > EIP:  0010:[<c1800060>]
> > EFLAGS: 00010282
> > ...
> > 
> > After that come all the register and stack dumps, and the machine is
> > frozen. The slaves are diskless P200s with 16MB of RAM. I tried a
> > diskless dual PII with 256MB of RAM and it turns 'unavailable' after
> > reboot. What is wrong with the other ones?
> > 
> > Thanks.
> > 
> > David
> > 
> > _______________________________________________
> > Beowulf mailing list
> > Beowulf at beowulf.org
> > http://www.beowulf.org/mailman/listinfo/beowulf
> > 
> 
> 

-- 
Erik Arjan Hendriks          Printed On 100 Percent Recycled Electrons
erik at hendriks.cx                   Contents may settle during shipment



