[Beowulf] Checkpointing using flash

Eugen Leitl eugen at leitl.org
Mon Sep 24 02:10:56 PDT 2012


On Sat, Sep 22, 2012 at 09:29:25PM +0000, Lux, Jim (337C) wrote:

> I think the future is in explicitly recognizing that you have to pass
> messages serially and designing algorithms that are tolerant of things
> like missing messages, variable (but bounded) latency (or heck, latency at
> all).

Computational physics pretty much demands this. 
 
> Once you've got a generalized fast approach using message passing, it's
> very scalable.

But human programming doesn't scale to 10^6 to 10^9
asynchronous nodes, each with less than a GByte of memory,
where determinism is computationally more expensive
than a stochastic, good-enough result.

Of course the physical modelers won't bat an eyelash,
but the average programmer who is still trying to figure out
this multithreading thing will be out to lunch.
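Here is a minimal sketch of the loss-tolerant, good-enough style of computation discussed above. It assumes a hypothetical randomized pairwise gossip-averaging scheme, which is an illustration only and not something proposed in this thread. Each exchange either completes or is silently dropped. There are no retries and no global barriers, and the answer is only approximate after a finite number of rounds. The function name `gossip_average` and its `drop_prob` and `seed` parameters are illustrative.

```python
import random


def gossip_average(values, rounds, drop_prob=0.1, seed=0):
    """Hypothetical randomized pairwise gossip averaging.

    Each round, a random pair of nodes attempts to exchange values.
    With probability drop_prob the message is lost and neither node
    updates, so the global sum (and hence the mean) is preserved.
    """
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        if rng.random() < drop_prob:
            continue  # lost message: no update, no retry, no barrier
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg
    return x


if __name__ == "__main__":
    data = [float(k) for k in range(100)]
    est = gossip_average(data, rounds=20000, drop_prob=0.2)
    true_mean = sum(data) / len(data)
    # Largest per-node deviation from the exact mean after the run
    print(max(abs(v - true_mean) for v in est))
```

A dropped exchange updates neither node, so the global sum is conserved. Lost messages therefore only slow convergence rather than corrupting the result. Each node ends up with an estimate that is good enough rather than bit-for-bit deterministic.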
