[Beowulf] Exascale by the end of the year?

Justin Y. Shi shi at temple.edu
Wed Mar 12 07:37:42 PDT 2014


Thanks, Chris, for the links. I took a quick look at the ULFM work. It is
really encouraging to see this type of effort instead of hiding the
transient failures under the "carpet".

The real issues are hidden inside the transient failures, though. For
exascale apps, one must demonstrate an architecture that gains application
performance and reliability at the same time as processors and
interconnects are added.

The fix is easier than it seems on the surface. We call it Statistic
Multiplexed Computing (SMC). I gave a talk on it last year at NCAR. Here
is the link to the slides and video:
https://sea.ucar.edu/event/statistic-multiplexed-computing-smc-neglected-path-unlimited-application-scalability

I am also helping with InterCloud HPC 2014 in Italy this year. Please
submit your work if interested.

Many thanks in advance!

Justin Y. Shi
shi at temple.edu


On Wed, Mar 5, 2014 at 9:49 PM, Christopher Samuel <samuel at unimelb.edu.au> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 06/03/14 03:07, Joe Landman wrote:
>
> > I've not done much with MPI in a few years, have they extended it
> > beyond MPI_Init yet?  Can MPI procs just join a "borgified"
> > collective, preserve state so restarts/moves/reschedules of ranks
> > are cheap?  If not, what is the replacement for MPI that will do
> > this?
>
> Oops, forgot this in my previous email - I stumbled across the Uni of
> Tennessee's ULFM (User Level Failure Mitigation) project which has a
> Wordpress blog here:
>
> http://fault-tolerance.org/
>
> There is the PDF for a two page flyer from SC13 on the site which
> gives an overview and describes it thus:
>
> http://fault-tolerance.org/wp-content/uploads/2013/12/SC13-ULFM.pdf
>
> # User Level Failure Mitigation is a set of MPI interface extensions
> # enabling Message Passing programs to restore MPI communication
> # capabilities affected by process failures. It supports rebuilding
> # communicators, RMA windows and I/O Files
>
> All the best,
> Chris
> - --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlMX4kMACgkQO2KABBYQAh9iXgCffxwP07z91by2FCHxVRwtTl4Q
> yTUAni3Xn0C+Nla0rS4HwW2dfF4Czb0Q
> =yWTJ
> -----END PGP SIGNATURE-----
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>