[Beowulf] Checkpointing using flash
Justin YUAN SHI
shi at temple.edu
Mon Oct 1 15:59:24 PDT 2012
On Mon, Oct 1, 2012 at 2:22 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>> My idea is to use data parallel API. This is nothing new. In theory,
> right, it's not new. so why would it succeed this time around?
This is because the transformation of the application architecture
from static to statistically multiplexed for both computing and
>> can still be elegant looking. For example, you can have multiple
>> Infiniband interfaces (some machines already have) to help counter the
>> speed disparity between computing and communication.
> you lost me there. MPI has no problem using multiple interfaces...
That only helps with communication failures and bandwidth. We need to
hedge against computing failures and power as well.