Scyld 27Z-8 Gig Net - HELP!
calvert at scyld.com
Thu Sep 26 08:35:50 PDT 2002
I know you said you modified all of the files, but just to review, under
27z-8, you need to modify the file /etc/beowulf/config.boot to add the
device and vendor information for the newer e1000 card. So you'll need
to add the following line:
pci 0x8086 0x100E e1000
In addition, make sure you have a 'bootmodule' entry for "e1000" near
the beginning of the file. Then rebuild your node boot floppy and
beoboot images, and try rebooting.
If you've already done all of that (which it sounds like you have), then
attached are some directions for building an e1000 driver under Scyld.
Hopefully, this solves your problem.
Stanley, Matthew D. wrote:
>I have several clusters running the public release of 27Z-8. Up until now they have been exclusively via-rhine and 3c59x based 100 Mbit clusters. We wanted to upgrade to gigabit Ethernet and decided to upgrade our 4-machine cluster using Dlink DGE-500T cards (ns820/ns83820 based). I compiled the latest netdrivers.tgz file, and the ns820 driver appeared to work fine as a link to the outside world, but it did not function on the beoboot floppy, even though I compiled for that kernel and even did a full kernel set rebuild (rpm -bb) including the new netdrivers.tgz file. Right after it found the card, found the master server, and assigned the IP address, it would just sit at the line where it requests /var/beowulf/boot.img.
>OK, so I gave up on the Dlink cards and purchased 4 Intel PRO/1000MT cards, the new version that requires the new release of drivers since its PCI ID is 8086:100E and not 8086:1000. I again compiled the drivers and tested the card on the Internet side with no problems. I then created my boot images and tried to boot. It gets a little farther than the Dlink: it actually starts to boot the net boot image, then locks up and never completes.
>Am I missing something here? I've modified all of the files, it finds the cards, and it even works for days on the Internet if I switch my card to eth0 instead of eth1. It appears to be a driver issue, yet I have similar problems with two completely different sets of cards. I have even tried using a 100 Mbit hub instead of a gigabit switch, with identical results. I can also just take out the cards and put in 3c59x cards, and the problem is fixed!
>We use our clusters for NAMD only. Is there a way to just install full versions of Scyld and then execute bpslave? If so, what modifications need to be made to node_up and the other scripts to make that work? I realize this means more administration, but at this point I have spent weeks trying to make this work, and I can install and update 4 machines in a couple of hours.
>Are there settings in beoboot that change the way it gets the information from the master node, perhaps making it more reliable, like broadcast/multicast, etc.?
>Any help would be appreciated,
>Structural Biology Core
>University of Missouri - Columbia
-------------- next part --------------
HOW TO ADD DRIVERS - Example shown for Intel Pro/1000 series gigabit adapters
=> If available, get the prebuilt modules for the appropriate kernel from:
For example, for the 2.2.19-12 kernel:
=> If not available, download source code for driver. The Intel Pro/1000
series driver can be found at ftp://www.intel.com/df-support/2897/eng or
NOTE: If the kernel source rpm was not installed, you'll have to do that
first. It is installed by default under 27cz-9, but not under
28cz-8-beta2. The kernel source is available on the distribution
CD under Scyld/RPMS/kernel-source-2.4.9-21.1.i386.rpm
=> Add this line to the beginning of the Makefile
CFLAGS = $(KCFLAGS)
=> Make the beoboot, SMP, and UP modules for the version of the Scyld
kernel that you are running under (27cz-9 shown here):
> make KCFLAGS="-D__BOOT_KERNEL_H_ -D__module__beoboot"
> mv e1000.o /lib/modules/2.2.19-14.beobeoboot/net
> make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_SMP=1"
> mv e1000.o /lib/modules/2.2.19-14.beosmp/net
> make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_UP=1"
> mv e1000.o /lib/modules/2.2.19-14.beo/net
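The three build-and-install steps above can also be wrapped in a loop.
The following is a minimal sketch, not part of the Scyld tools:
build_e1000_modules and the DRY_RUN switch are names made up here, and
the flags and paths are simply copied from the commands above. Run it
from the driver source directory.

```shell
#!/bin/sh
# Sketch: build e1000.o once per kernel flavour and install each copy.
# Set DRY_RUN=1 to print the commands instead of running them.
build_e1000_modules() {
    kver="${1:-2.2.19-14}"
    # Each entry is "<kernel suffix>:<KCFLAGS>", taken from the steps above.
    for entry in \
        'beobeoboot:-D__BOOT_KERNEL_H_ -D__module__beoboot' \
        'beosmp:-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_SMP=1' \
        'beo:-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_UP=1'
    do
        suffix=${entry%%:*}   # text before the first colon
        flags=${entry#*:}     # text after the first colon
        dest="/lib/modules/$kver.$suffix/net"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "make KCFLAGS=\"$flags\""
            echo "mv e1000.o $dest"
        else
            make KCFLAGS="$flags" || return 1
            mv e1000.o "$dest" || return 1
        fi
    done
}
```

Try "DRY_RUN=1; build_e1000_modules 2.2.19-14" first and compare the
printed commands with the ones listed above before running it for real.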
=> Add new entries for this module to the PCI table
1. Add, if necessary, the following bootmodule entry to the configuration
file (in /etc/beowulf/config.boot for 27cz-9 and /etc/beowulf/config for
2. Add entries to the device list for each device supported by this driver
(in /etc/beowulf/config.boot for 27cz-9 and /usr/share/kudzu/pcitable for
pci 0x8086 0x1000 e1000
pci 0x8086 0x1001 e1000
pci 0x8086 0x1004 e1000
pci 0x8086 0x1008 e1000
pci 0x8086 0x1009 e1000
pci 0x8086 0x100c e1000
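If your card reports a device ID that is not in the list above (for
example 0x100E on the newer PRO/1000 MT), it needs its own entry. A
small sketch for appending only the entries that are missing follows;
add_pci_entries is a hypothetical helper name, and it assumes the
"pci <vendor> <device> <module>" line format shown above.

```shell
#!/bin/sh
# Sketch: append "pci <vendor> <device> <module>" lines that are not
# already in the config file. add_pci_entries is a made-up helper.
add_pci_entries() {
    cfg=$1; vendor=$2; module=$3
    shift 3
    for dev in "$@"; do
        line="pci $vendor $dev $module"
        # -F fixed string, -x whole-line match, -i ignore hex case
        grep -Fxiq "$line" "$cfg" 2>/dev/null || echo "$line" >> "$cfg"
    done
}
```

For example: "add_pci_entries /etc/beowulf/config.boot 0x8086 e1000
0x1000 0x100E". Back up the file before letting a script write to it.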
=> Build the dependency file (for each kernel) used by modprobe to load the
For single processor kernel:
depmod -a -e -F /boot/System.map-2.2.19-14.beo 2.2.19-14.beo
For SMP (more than one processor machine) kernel:
depmod -a -e -F /boot/System.map-2.2.19-14.beosmp 2.2.19-14.beosmp
For beoboot kernel (Stage 1 image):
depmod -a -e -F /boot/System.map-2.2.19-14.beobeoboot 2.2.19-14.beobeoboot
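Since the three depmod calls differ only in the kernel suffix, they can
be generated rather than typed. This is a sketch under the assumption
that your kernels follow the same naming as above; depmod_cmds is a
made-up name, and it only prints the commands.

```shell
#!/bin/sh
# Sketch: print one depmod command per kernel flavour (UP, SMP, beoboot).
# Nothing is executed here; review the output before running it.
depmod_cmds() {
    kver="${1:-2.2.19-14}"
    for suffix in beo beosmp beobeoboot; do
        echo "depmod -a -e -F /boot/System.map-$kver.$suffix $kver.$suffix"
    done
}
```

Once the output looks right, "depmod_cmds 2.2.19-14 | sh" runs all
three in order.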
=> Rebuild the Phase 1 and Phase 2 kernel images:
/usr/bin/beoboot -1 -f -o /dev/fd0 -c "apm=power-off"
/usr/bin/beoboot -2 -n -k /boot/vmlinuz-`uname -r` -o /var/beowulf/boot.img -c "apm=power-off"
If your master node is single processor and your compute node is SMP,
and you don't have an SMP kernel installed, you'll have to get the RPM
from the distribution CD and install it (using rpm -U). This happens
when you install on a single processor machine because the installer
selects the kernel based on the machine it is installing on. You must
run the same kernel on all of the machines in the cluster. The SMP
kernel can run on both single processor and SMP machines.
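
To see whether a given machine is SMP, you can count the "processor"
entries in /proc/cpuinfo. A minimal sketch follows; kernel_flavour is a
made-up helper name, and the beo/beosmp suffixes it prints are the ones
used in the paths above.

```shell
#!/bin/sh
# Sketch: report which kernel flavour a machine would need, based on
# the number of "processor" entries in a cpuinfo file.
kernel_flavour() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    n=$(grep -c '^processor' "$cpuinfo" 2>/dev/null || true)
    if [ "${n:-0}" -gt 1 ]; then
        echo beosmp
    else
        echo beo
    fi
}
```

If any machine in the cluster reports beosmp, the SMP kernel is the one
to run everywhere.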