[Beowulf] Checkpointing MPI applications

Scott Atchley e.scott.atchley at gmail.com
Mon Mar 27 16:27:59 UTC 2023


On Thu, Mar 23, 2023 at 3:46 PM Christopher Samuel <chris at csamuel.org>
wrote:

> On 2/19/23 10:26 am, Scott Atchley wrote:
>
> > We are looking at SCR for Frontier with the idea that users can store
> > checkpoints on the node-local drives with replication to a buddy node.
> > SCR will manage migrating non-defensive checkpoints to Lustre.
>
> Interesting, does it really need local storage or can it be used with
> diskless systems via tricks with loopback filesystems, etc?


It can be used on diskless systems; it only needs a mount path. That path
can be ramfs/tmpfs, xfs (or another local file system), etc.
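
To illustrate what "only needs a mount path" means in practice, here is a
minimal sketch. It is not SCR's API; the function name, directory layout,
and file naming are all hypothetical. It assumes only that the cache
location is a writable directory, so the same code works whether that
directory sits on tmpfs, ramfs, or a local xfs file system.

```python
import os
import tempfile


def write_checkpoint(cache_base: str, rank: int, step: int, payload: bytes) -> str:
    """Write one rank's checkpoint under cache_base and return its path.

    Hypothetical helper for illustration only. cache_base may be any
    writable directory (tmpfs, ramfs, xfs, ...).
    """
    # Hypothetical layout: one directory per checkpoint step.
    ckpt_dir = os.path.join(cache_base, f"ckpt.{step}")
    os.makedirs(ckpt_dir, exist_ok=True)
    final_path = os.path.join(ckpt_dir, f"rank_{rank}.ckpt")

    # Write to a temporary file in the same directory and then rename it,
    # so a crash mid-write never leaves a partial file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    return final_path
```

On a diskless node one might call it as
write_checkpoint("/dev/shm", rank, step, data), assuming /dev/shm is a
tmpfs mount on that system. Replication to a buddy node and migration to
Lustre are what SCR adds on top and are not shown here.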

Scott