updating the Linux kernel
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduMon Jun 12 10:40:57 PDT 2000
- Previous message: updating the Linux kernel
- Next message: updating the Linux kernel
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, 12 Jun 2000, David Lombard wrote: > Crutcher Dunnavant wrote: > > > > Now, I might completly miss something here, but shouldn't all *distibuted* > > parallel programs assume that a node may not return. After all, what do you > > assume about hardware failures? ... > > Um, no. > > It all depends upon the software. PVM does provide the ability to > recover from a node failure, while an MPI program will just tank. > > ... So, while it may not be a *good* way to do it, > > In a properlly paralized application, shouldn't you be able to take down any > > random node other than the job allocation node, AT ANY TIME, and have that job > > reallocated.> reallocated (Yeah, you lose the local work, but those tasks should be > > checkpointed frequently)... > > As for checkpointing, that too is an "it depends" answer. > Application-level checkpointing may be available to varying degrees -- > it can be a non-trivial task. System-level checkpointing generally > can't handle sockets, and that rules out both PVM and MPI. Depends indeed, and on hardware as much (or even more) than the software per se. In many, if not most cases, checkpointing in particular (or any other means of ensuring a fully robust capability of a parallel program to recover from a node failure) can be mathematically shown to >>increase<< on the average, the time to completion, possibly considerably. This is therefore a decision that should be left up to the developer after considering their own particular task(s), parallel environment, and associated scaling properties. This is purely a statistical cost-benefit problem. In many/most cases the additional (expectation value of the) costs of developing and running a fully node-robust parallel application will outweigh the (expectation value of the) benefits and it should NOT be done; checkpointing code (parallel or otherwise) is not something done either lightly or as standard practice because "usually" the probability of a failure that might be "rescued" by the checkpoint is small and the cost of the checkpoint(s) large. In others, either nonlineariites in the perceived benefits (for example, in a distributed database supporting 911 operations for a police/fire/emergency group where the "cost" of failure might be a human life) or a statistical analysis of the expected time of completion make checkpointing desirable, or even essential if you with to complete the calculation at all (have an expected time of completion much less than infinity). To do the decision analysis (checkpoint/don't checkpoint -- make code node robust or not) correctly, one needs a set of parameters. From these parameters one can see roughly how to make the decision. One parameter -- the additional time required to develop a robust application -- is semi-heuristic. Making an application robust is silly if it takes you weeks to develop it and you plan to use it only one time and it will complete in at most a few days even if you have to deal with a few node failures. The others are (as a function of number of nodes N): a) The time required for the calculation to complete normally on N nodes T_calc(N). b) The time required to checkpoint the calculation (e.g. store all active memory onto a RAID server) T_save(N) c) The time interval selected between checkpoints T_check (which is a parameter chosen optimally as a function of N, but which doesn't depend directly on N). d) The probability density (probability per unit time) of node failure p. With luck one can assume that node failures are independent events and that all nodes have the same probability density (presuming that node design is homogeneous and that nodes have independent UPS and so forth). In a nutshell, if one massages these numbers (several of which are themselves highly nonlinear functions of e.g. the number of nodes N and other hardware design parameters) and guestimates that the probability of overall node-based failure of a calculation is very small in the time T_calc(N), the estimated time of completion is unlikely to be significantly reduced by checkpointing and indeed may be very significantly increased if the time cost of checkpointing itself, T_save(N)*T_calc(N)/T_check, is at all comparable to T_calc(N). If the overall probability of a node-based failure is "high" (above some threshold clearly related to the additional time cost of checkpointing) checkpointing is worth it if the additional time required to develop the code isn't unreasonable compared to the amount of time one expects to use the program. In some cases (for example, on a cluster based on an unstable and unnamed operating system that might have one node crash per day in a thirty-node cluster, while the parallel calculation takes at least a week to complete) code that is NOT checkpointed will simply "never" complete. I would therefore NOT make auto-migration, robustness under node failure or checkpointing a "required feature of well-written parallel programs", or even the much weaker "advised/nice/suggested feature". Sometimes this sort of bet-hedging is appropriate. Sometimes it is essential. Sometimes it would be idiotic. Only the programmer/designer, with a clear picture of the scaling properties and probabilities of failure for their own application and cluster hardware, is in a position to determine which category >>their<< particular application/cluster falls into. BTW, this sort of tradeoff occurs in a lot of other areas in computational optimization design -- e.g. TCP (robust against packet loss) vs UDP (not so robust against packet loss but somewhat faster). There are also plenty of analogs in quality control in manufacturing and so forth. Statistical cost/benefit analysis rules, and good programmers should do (and hence should be able to do) at least seat-of-the-pants estimates as standard practice in code design. IMHO, anyway...;-) rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
- Previous message: updating the Linux kernel
- Next message: updating the Linux kernel
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
