[Beowulf] USB flash drive bootable distro to check cluster health.

Joe Landman joe.landman at gmail.com
Fri Jan 11 08:06:00 PST 2019


On 1/11/19 7:59 AM, Richard Chang wrote:
> Hi,
> I would like to know if we have, or can make (or prepare), a USB 
> bootable OS that we can boot on a cluster and its nodes to test all 
> its functionality.
>
> The purpose of this is to boot a new or existing cluster to check its 
> health, including the Infiniband network, any cards, local hard disks, 
> memory, etc., so that I don't have to disturb the existing OS and its 
> configuration.
>
> If possible, it would be nice to boot the compute nodes from the 
> master node.
>
> Does anyone know of a pre-existing distribution that will do the job? 
> Or know how to do it with CentOS or Ubuntu?

FWIW: this is one of the use cases of 
https://github.com/joelandman/nyble .  It works with CentOS, Debian, and 
Ubuntu (though I've not pushed the 18.04.1 changes yet).

I have a rudimentary USB target I was going to clean up soon, and the 
images can be centrally booted from a PXE server, and pull and run 
scripts after boot.
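
For the post-boot scripts, here is a minimal sketch of the kind of check 
one could pull and run on each node. It is an illustration only, not code 
from nyble or tiburon; the function names are hypothetical, and it assumes 
a Linux kernel that exposes /proc/meminfo, /sys/block, and 
/sys/class/infiniband (the last only appears once the IB drivers are 
loaded).

```python
#!/usr/bin/env python3
"""Hypothetical post-boot health check: memory, local disks, InfiniBand ports."""
import os


def mem_total_kb(meminfo_text):
    """Return MemTotal in kB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


def block_devices(sys_block="/sys/block"):
    """Map block device name to size in bytes (sysfs reports 512-byte sectors)."""
    sizes = {}
    if not os.path.isdir(sys_block):
        return sizes
    for dev in sorted(os.listdir(sys_block)):
        try:
            with open(os.path.join(sys_block, dev, "size")) as fh:
                sizes[dev] = int(fh.read().strip()) * 512
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return sizes


def ib_port_states(ib_root="/sys/class/infiniband"):
    """Map 'device/port' to its sysfs state string, e.g. '4: ACTIVE'."""
    states = {}
    if not os.path.isdir(ib_root):
        return states
    for dev in sorted(os.listdir(ib_root)):
        ports_dir = os.path.join(ib_root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            try:
                with open(os.path.join(ports_dir, port, "state")) as fh:
                    states[f"{dev}/{port}"] = fh.read().strip()
            except OSError:
                continue
    return states


def main():
    try:
        with open("/proc/meminfo") as fh:
            print("MemTotal kB:", mem_total_kb(fh.read()))
    except OSError:
        print("MemTotal kB: unavailable")
    for dev, size in block_devices().items():
        print(f"disk {dev}: {size / 1e9:.1f} GB")
    ib = ib_port_states()
    if not ib:
        print("no InfiniBand devices visible (driver not loaded?)")
    for port, state in ib.items():
        flag = "ok" if "ACTIVE" in state else "CHECK"
        print(f"ib {port}: {state} [{flag}]")


if __name__ == "__main__":
    main()
```

This only reads what the kernel already reports, so it touches nothing on 
the local disks. It does not stress-test memory or run fabric diagnostics; 
those would need separate tools.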

It runs in RAM, and you can modify the distributions to your heart's 
content.  I have a few private repos here which have NVidia + MLNX + 
other drivers and related bits already built in.

I've set up many systems with this, tying it together with 
https://github.com/joelandman/tiburon for boot control.   This was 
originally used at Scalable Informatics when we were alive, and has 
evolved significantly since then.

If you want a simple pure USB distro for this, try SystemRescueCD, 
though I don't think it does Infiniband or most of these drivers.


-- 

Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman


