[Beowulf] Checkpointing using flash

Justin YUAN SHI shi at temple.edu
Sat Sep 29 02:29:43 PDT 2012


I missed this thread. Got busy with classes. Sorry.

Going back to Jim's comments on InfiniBand and OSI and MPI. I see that
exascale computing requires us to rethink MPI's insistence on sending
messages directly. Even with the group communicators, the implementation
insists on the same.

The problem with direct communication is that you leave the
application without recourse when the transmission fails. As we have
discussed, any transient fault can cause that to happen. It is
practically impossible to provide redundancy for every transmission
unless we change our API design to eliminate the reliable
communication assumption. Application-level re-transmission would
allow the application to survive NOT only communication failures
but also node failures (when you lose a chunk of memory). But the MPI
semantics do not allow this to happen, even if the implementation
tries to re-transmit a failed message.
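
A minimal sketch of what application-level re-transmission could look
like, in plain Python rather than MPI. Everything here is hypothetical
and for illustration only: the Worker class, the `alive` flag that
stands in for a node losing its memory, and the round-robin choice of
worker are all assumptions, not part of any MPI implementation. The
point is only that the application keeps each work item until it holds
the result, so a lost message or a lost node costs a re-send rather
than the whole run.

```python
class Worker:
    """Hypothetical compute node; alive=False models a node that lost its memory."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def process(self, item):
        if not self.alive:
            raise ConnectionError(f"{self.name} never answered for item {item}")
        return item * item  # stand-in for real work


def run_with_retransmission(items, workers, max_attempts=5):
    """Keep every work item at the application level until its result is in hand;
    on failure, queue the item so it is sent again, possibly to another worker."""
    results = {}
    pending = [(item, 0) for item in items]  # (work item, failed attempts so far)
    turn = 0
    while pending:
        item, attempts = pending.pop(0)
        if attempts >= max_attempts:
            raise RuntimeError(f"no worker could complete item {item}")
        worker = workers[turn % len(workers)]  # simple round-robin choice
        turn += 1
        try:
            results[item] = worker.process(item)
        except ConnectionError:
            pending.append((item, attempts + 1))  # re-transmit later
    return results


# One dead node and one healthy node: every item still completes.
workers = [Worker("node0", alive=False), Worker("node1")]
print(sorted(run_with_retransmission([1, 2, 3], workers).items()))
# prints [(1, 1), (2, 4), (3, 9)]
```

Note that the sender never addresses a specific receiver as the only
possible destination. That is the design choice that lets the run
survive node0 being gone for good.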

Justin

On Tue, Sep 25, 2012 at 8:19 AM, Ellis H. Wilson III <ellis at cse.psu.edu> wrote:
> On 09/24/2012 12:57 PM, Andrew Holway wrote:
>>> Haha, I doubt it -- probably the opposite in terms of development cost.
>>>    Which is why I question the original statement on the grounds that
>>> "cost" isn't well defined.  Maybe the cost is just performance-wise, but
>>> that's not even clear to me when we consider things at huge scales.
>>
>> 40 years ago an army of cheap software developers was needed to
>> service a single very expensive box. Now the boxes are super cheap and
>> the price for decent software developers is very high.
>
> 40 years ago the demand for this type of job was...what?  Incredibly
> limited, I'd bet, if not a downright niche (supercomputing,
> defense-related calculations, business apps, maybe a handful of other
> purposes).  And the boxes aren't super cheap because things have been
> "solved" in hardware rather than software -- the fabs for modern
> processors are much, much more expensive than they used to be, but the
> laws of sales at scale, if you will, kick in to make things cheap since
> so many want PCs.
>
>> With hardware, you just have to solve the problem once. With this
>
> I am totally unconvinced about this...if I solve something in software,
> don't I only need to solve it once as well, opensource my code, and
> share it?  While I agree certain things are downright destined for
> hardware (computer vision problems, arithmetic, etc), it is completely
> unclear to me that something as unsolved and as high-level as parallel
> programming for exascale computing should even be attempted to be dealt
> with in hardware.  What are you expecting the developers to code like
> then, if they cannot understand parallel programming?  Serial codes?
> Good luck finding or writing a compiler (also software) that will turn a
> serial code into a parallel code perfectly.  That's many decades down
> the line.
>
>> Checkpointing to some kind of non-volatile disk might work for some
>> codes but it's not a universal solution. Some MPI tricks might work for
>
> Uhh...I think it's the opposite.  We've been discussing Checkpointing in
> this thread as a general solution that almost always works (I mean
> you're literally snapshotting your memory, I cannot think of an instance
> where that would not work), but it's not a solution that we'd like to
> continue using for most of our codes in the future.  It's just inefficient.
>
>> another code. What about QCD codes that are almost completely I/O
>> bound....I can't wrap my head around how either solution would work in
>> that circumstance but then again I am not a computer scientist and
>> have a moderately weak grasp on the mechanics.
>
> What does I/O-bound or CPU-bound have to do with correctness of a
> checkpoint?  Do you mean data continues to be streamed in real-time like
> from a collider so we have to deal with that during the checkpoint?  Or
> are you referring to something else entirely?
>
>> It's easy to underestimate the golden rule of HPC! "Never underestimate
>> the crappiness of the code!". It is our task to provide a safe and
>> elegant playground for our users so that this crappiness matters a bit
>> less :)
>
> On a related note (I assume a majority of your users are scientists),
> regarding your or somebody else's post a bit back about how poor
> scientists are at coding -- I've witnessed the exact opposite.  Now,
> this is going on limited experience and all, but when I interned at
> Argonne National Labs by Chicago I saw some absolutely amazing code
> written by people without a computer science background that ran on what
> was then one of the top supers in the country (Intrepid).  The point is,
> they need to get their work done, and they know just how painful poor
> code will be and how long it will take to run.  Moreover, their careers
> rest on the premise that their calculations and resultant code are
> correct, and they have deadlines like the rest of us, which means their
> code has to complete by then.  My golden rule of HPC is
> therefore quite the opposite: "Never underestimate the cleverness of
> your users."  Their code might do "weird" things, but it's simply
> because your framework wasn't adaptive enough.  I have supreme respect
> for most of the "users" I've dealt with, but as I said before, this is
> admittedly going on limited experience and I could be an exceptional case.
>
> Best,
>
> ellis
