[Beowulf] Anybody here still use SystemImager?

Joe Landman joe.landman at gmail.com
Thu Feb 28 14:19:16 PST 2019


On 2/27/19 9:08 PM, David Mathog wrote:
> Joe Landman wrote:
[...]
>
> I'm about 98% of the way there now, with a mashup of parts from BOEL 
> and CentOS 7.  The initrd is pretty large, though.
>
> I wasted most of a day on a mysterious issue with "sh" (busybox) not 
> responding to the keyboard with a 3.10.108 kernel built starting from 
> the BOEL config, although it would respond using the same initrd and a 
> stock CentOS 7 kernel.  So 3.10.108 was recompiled with the CentOS 7 
> config (which makes WAY too many modules for an initrd) and with the 
> network drivers built into the kernel.  That fixes the problem, but I 
> could not tell you why.

This is a driver issue.  Likely you aren't including the HID components 
in your initramfs or building them into the kernel.

lsmod | grep hid
mac_hid                16384  0
hid_generic            16384  0
usbhid                 49152  0
hid                   118784  2 usbhid,hid_generic

You should make sure hid, usbhid, and hid_generic are all included/loaded.
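
If they were built as modules, a minimal sketch of loading them from the 
initramfs init script (skip this entirely if they are built into the 
kernel, i.e. CONFIG_HID=y, CONFIG_HID_GENERIC=y, CONFIG_USB_HID=y):

# a USB keyboard also needs a host controller driver under the HID stack
modprobe xhci-hcd 2>/dev/null   # or ehci-hcd / ohci-hcd, per the hardware
modprobe hid
modprobe usbhid
modprobe hid_generic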

>
> The last thing to overcome is that in this environment the SATA disk 
> is not seen/mounted, even though tty* and numerous other things are.
>
>   modprobe sd_mod
>
> puts sd_mod in lsmod, but no /dev/sd* show up.  Hardware detection in 
> Linux has been done and redone so many times I have no idea what to 
> use in a 3.*.* kernel, and the web is littered with descriptions of 
> methods which no longer work.  The lspci in busybox doesn't print 
> human-readable device names, which isn't helping.  BOEL used
> modules.pcimap for this, and that is one of the things which no longer 
> exist.
> The init script tries to set things up with mdev (not udev) this way:

This is, again, a driver issue.  You need to know which SATA/SAS 
controller you have (including the onboard controller, which varies 
with the motherboard version).

For example, for the system I am on now:

lsmod | grep sas
mpt3sas               241664  16
raid_class             16384  1 mpt3sas
scsi_transport_sas     40960  2 ses,mpt3sas

and another pure SATA system, looking at dmesg output,

[    2.133951] ahci 0000:00:11.0: version 3.0
[    2.134248] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[    2.134250] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part

This is the ahci driver.  Most motherboards I've run into use it for 
basic SATA.
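
If you don't know which driver the controller needs, the modalias files 
under sysfs are the modern replacement for BOEL's modules.pcimap: 
modprobe resolves each device's modalias string against the 
modules.alias table that depmod generates.  A minimal sketch for a 
busybox initramfs (assuming modules.alias and modules.dep were copied 
into the image alongside the modules, and busybox's modprobe was built 
with alias support):

# let modprobe match every PCI device against modules.alias
for dev in /sys/bus/pci/devices/*; do
    modprobe "$(cat "$dev/modalias")" 2>/dev/null
done
modprobe sd_mod     # sd_mod registers disks only after a host driver
                    # (e.g. ahci, loaded by the loop above) binds to the HBA
/sbin/mdev -s       # re-scan so mdev creates the new /dev/sd* nodes
ls -l /dev/sd*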


>
>     echo /sbin/mdev > /proc/sys/kernel/hotplug || shellout
>     /sbin/mdev -s || shellout
>
> which puts a lot of things in /dev, just not the SATA disk.  This is 
> on a Dell PowerEdge T110; maybe there is some driver for the SATA 
> controller which isn't loading.

In both cases, it is a driver issue.  As for size: an 
everything-and-the-kitchen-sink initramfs runs from about 710MB for 
Debian 9 to about 1.5GB for CentOS 7.

root at zoidberg:/data/tiburon/diskless/images/nyble# ls -alF centos7/
total 2736520
drwxr-xr-x 2 root root        138 Jun 15  2018 ./
drwxr-xr-x 4 root root         36 Apr 25  2018 ../
-rw-r--r-- 1 root root 1436202727 Jun  5  2018 initramfs-4.16.13.nlytiq.img
-rw-r--r-- 1 root root 1356007691 Jun 15  2018 initramfs-4.16.15.nlytiq.img
-rw-r--r-- 1 root root    5023504 Jun  5  2018 vmlinuz-4.16.13.nlytiq
-rw-r--r-- 1 root root    4953872 Jun 15  2018 vmlinuz-4.16.15.nlytiq

root at zoidberg:/data/tiburon/diskless/images/nyble# ls -alF debian9/
total 2607756
drwxr-xr-x 2 root root        212 Sep 15 14:53 ./
drwxr-xr-x 4 root root         36 Apr 25  2018 ../
-rw-r--r-- 1 root root 1002775823 Jun  5  2018 initramfs-ramboot-4.16.13.nlytiq
-rw-r--r-- 1 root root  908767337 Sep 15 14:53 initramfs-ramboot-4.18.5.nlytiq
-rw-r--r-- 1 root root  744269030 May 29  2018 initramfs-ramboot-4.9.0-6-amd64
-rw-r--r-- 1 root root    5019408 Jun  5  2018 vmlinuz-4.16.13.nlytiq
-rw-r--r-- 1 root root    5269360 Sep 15 14:53 vmlinuz-4.18.5.nlytiq
-rw-r--r-- 1 root root    4224800 May 29  2018 vmlinuz-4.9.0-6-amd64
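
For reference, these full-filesystem initramfs images are just 
compressed cpio archives; a generic sketch of packing one up (not 
necessarily how the images above were built):

# from a staged root filesystem tree (the path here is hypothetical)
cd /path/to/staged/rootfs
find . | cpio -o -H newc | gzip -9 > ../initramfs-custom.img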


I PXE boot all of these.  It takes ~10s over 1GbE (1GbE moves roughly 
117 MB/s after protocol overhead, so a ~1GB image transfers in under 
10s), and much less over faster networks.  You should see these boot 
over 100GbE.  Sadly, I don't have 100GbE at home.
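
For completeness, a minimal pxelinux.cfg entry for one of the images 
above (assuming pxelinux as the PXE loader; any loader that can pass a 
kernel plus initrd works):

DEFAULT ramboot
LABEL ramboot
    KERNEL vmlinuz-4.18.5.nlytiq
    APPEND initrd=initramfs-ramboot-4.18.5.nlytiq ip=dhcp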

-- 
Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman


