updating the Linux kernel
David Lombard david.lombard at mscsoftware.com
Mon Jun 12 08:11:08 PDT 2000
Crutcher Dunnavant wrote:
>
> Now, I might completely miss something here, but shouldn't all *distributed*
> parallel programs assume that a node may not return? After all, what do you
> assume about hardware failures? ...

Um, no. It all depends upon the software. PVM does provide the ability to recover from a node failure, while an MPI program will just tank.

> ... So, while it may not be a *good* way to do it,
> in a properly parallelized application, shouldn't you be able to take down any
> random node other than the job allocation node, AT ANY TIME, and have that job
> reallocated (Yeah, you lose the local work, but those tasks should be
> checkpointed frequently)...

As for checkpointing, that too is an "it depends" answer. Application-level checkpointing may be available to varying degrees; it can be a non-trivial task. System-level checkpointing generally can't handle sockets, and that rules out both PVM and MPI.

--
David N. Lombard
MSC.Software
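To make the "application-level checkpointing" point concrete, here is a minimal sketch in Python. It is not PVM or MPI code and does not show how either library handles failures; it only illustrates the general idea that the application itself decides what state to save and when. The names `run`, `save_checkpoint`, `load_checkpoint`, the file `checkpoint.json`, and the toy workload (a running sum of squares) are all hypothetical. Even this toy case shows part of why the task is non-trivial: the saved state must be self-consistent (loop index and partial result together), and the file should be replaced atomically so a crash mid-write does not corrupt the last good checkpoint.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file name


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def run(n, path=CHECKPOINT, every=100, stop_at=None):
    """Sum i*i for i in [0, n), checkpointing every `every` iterations.

    `stop_at` is a hypothetical hook that simulates a node failure by
    abandoning the run before iteration `stop_at` is processed.
    """
    state = load_checkpoint(path)  # resume from the last good checkpoint
    i = state["next_i"]
    while i < n:
        if stop_at is not None and i == stop_at:
            return None  # simulated crash: work since the last checkpoint is lost
        state["total"] += i * i
        i += 1
        if i % every == 0:
            state["next_i"] = i  # index and partial sum saved together
            save_checkpoint(path, state)
    state["next_i"] = i
    save_checkpoint(path, state)
    return state["total"]
```

For example, `run(1000, stop_at=250)` "crashes" after checkpoints at iterations 100 and 200; a second call to `run(1000)` resumes from iteration 200, redoing only the 50 iterations lost since that checkpoint. Note that this covers only the in-process state of a single program; it says nothing about open sockets or message-passing connections, which is the limitation raised above for system-level checkpointing.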
More information about the Beowulf mailing list
