[Beowulf] Checkpointing using flash

Fri Sep 21 12:13:09 PDT 2012

On Fri, Sep 21, 2012 at 01:09:41PM -0400, Ellis H. Wilson III wrote:
> On 09/21/12 12:58, Lux, Jim (337C) wrote:
> > Yes, if that's the frequency of checkpoints.  I was thinking more like 1
> > checkpoint per second or every 10 seconds.
> 
> While I suppose checkpoints that frequent might exist somewhere in the wild, I've 
> never heard of checkpointing at such a short interval.  These huge 
> cluster checkpoints are close to the size of the entire memory, so even today we're 
> talking about 64 or 128 GB of RAM per node.  In ten years we're 

Exascale will likely be built from ARM-like SoCs with stacked memories, including
nonvolatile ones (phase change, spintronics, whatever). At >100 GByte/s
memory bandwidth you can snapshot at ~1 Hz without much penalty.
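A back-of-envelope check of the ~1 Hz claim, assuming the snapshot is a plain memory copy limited only by memory bandwidth (no compression, no incremental or dirty-page tricks): time per snapshot is state size divided by bandwidth. The 100 GByte/s and 128 GB/node figures come from this thread; the function names and the 1 GB and 16 GB cases are hypothetical illustrations.

```python
# Back-of-envelope checkpoint cost, ASSUMING the snapshot is a straight
# memory copy bounded only by memory bandwidth. Figures are illustrative.

GB = 1e9  # decimal gigabyte, matching how bandwidth is usually quoted


def snapshot_seconds(state_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to copy one node's checkpoint state at the given bandwidth."""
    return state_bytes / bandwidth_bytes_per_s


def overhead_fraction(snapshot_s: float, interval_s: float) -> float:
    """Fraction of each checkpoint period spent snapshotting, assuming the
    snapshot blocks computation and a new one starts every interval_s.
    A value above 1.0 means the schedule is infeasible."""
    return snapshot_s / interval_s


if __name__ == "__main__":
    bandwidth = 100 * GB  # >100 GByte/s stacked-memory bandwidth
    interval = 1.0        # ~1 Hz checkpointing
    for state_gb in (1, 16, 128):
        t = snapshot_seconds(state_gb * GB, bandwidth)
        f = overhead_fraction(t, interval)
        print(f"{state_gb:4d} GB/node: {t:.2f} s per snapshot, "
              f"{f:.0%} of each 1 s period")
```

Under these assumptions 1 GB/node costs about 1% of each second, while 128 GB/node needs 1.28 s per snapshot and cannot fit a 1 Hz schedule at all, so ~Hz snapshots only work if the state per node stays small.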

> talking what, close to if not above a TB of RAM per node?  Moreover, they 

I'd rather have MB/node or less.

> all tend to write their checkpoint at the same time and the SSDs aren't 
> on the compute nodes -- they're on some intermediate I/O storage nodes 

The forthcoming ARM SoCs typically have an mSATA SSD at each node.

> (akin to BlueGene's intermediate layer).  So we're talking about huge 
> cluster-wide dumps of data to the intermediate flash layer, which then 
> takes some hours to drain that data down to the more persistent HDDs. 
> This takes at the very least many minutes, and in the normal case hours. 
>   I would not be surprised if the best they could do at exascale was one 

Exascale won't look like today's clusters. Can't look like today's
clusters.

> checkpoint a day.  Again, I don't think these are used as the front line 
> of defense against failures.  That would really suck :D.
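A rough sketch of the two data paths argued about above: a shared intermediate flash tier, whose checkpoint time grows with the node count, versus an SSD on every node, whose time does not. The node count, the aggregate bandwidths of the flash and HDD tiers, and the 0.5 GByte/s mSATA-class figure are all hypothetical placeholders, not measurements of any real machine; only the 128 GB/node figure comes from the thread.

```python
# Illustrative comparison of two checkpoint data paths.
# Every number below is a HYPOTHETICAL placeholder, not a measurement.

GB = 1e9
TB = 1e12


def shared_tier_seconds(nodes: int, bytes_per_node: float, aggregate_bw: float) -> float:
    """All nodes dump at once through a shared tier (e.g. flash on dedicated
    I/O nodes, or that flash draining to HDD); the tier's aggregate
    bandwidth is the bottleneck, so time grows linearly with node count."""
    return nodes * bytes_per_node / aggregate_bw


def node_local_seconds(bytes_per_node: float, per_node_bw: float) -> float:
    """Each node writes to its own local SSD in parallel, so the time is
    independent of how many nodes there are."""
    return bytes_per_node / per_node_bw


if __name__ == "__main__":
    nodes = 10_000
    mem = 128 * GB
    burst = shared_tier_seconds(nodes, mem, 1 * TB)    # compute -> flash tier
    drain = shared_tier_seconds(nodes, mem, 100 * GB)  # flash tier -> HDD
    local = node_local_seconds(mem, 0.5 * GB)          # mSATA-class SSD
    print(f"shared flash burst: {burst / 60:.0f} min")
    print(f"drain to HDD:       {drain / 3600:.1f} h")
    print(f"node-local SSD:     {local / 60:.1f} min")
```

With these placeholder inputs the shared flash burst takes about 21 minutes, the drain to HDD about 3.6 hours, and the node-local write about 4.3 minutes; doubling the node count doubles the first two and leaves the third unchanged.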