[Beowulf] Fault tolerance & scaling up clusters (was Re: Bright Cluster Manager)

Michael Di Domenico mdidomenico4 at gmail.com
Mon May 14 04:53:02 PDT 2018

On Sat, May 12, 2018 at 3:33 AM, Chris Samuel <chris at csamuel.org> wrote:
> On Wednesday, 9 May 2018 2:34:11 AM AEST Lux, Jim (337K) wrote:
> Where I am now we're pretty much the same, except instead of booting a pure
> RAM disk we boot an initrd that pivots onto an image stored on our Lustre
> filesystem instead.  These nodes do have local SSDs for local scratch, but
> again no real local state.

Can you expand on the "image stored on Lustre" part?  I'm pretty sure I
understand the gist, but I'd like to know more.
