updating the Linux kernel
glindahl at hpti.com
Mon Jun 12 09:47:10 PDT 2000
> Do I save all results at every program statement to a file?
> Do I have the slaves send the results back to the master after every step
> in the calculation?
There are two solutions.
The traditional one is to have the program save its state to disk every Nth
timestep. Since failures are rare, losing the work done since the last save
and waiting for the disk I/O is acceptable.
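
For what it's worth, here is a minimal Python sketch of that periodic save, not a definitive implementation. All the names in it (run, advance, save_checkpoint, load_checkpoint, CHECKPOINT_EVERY) are made up for illustration, and advance() is just a stand-in for one timestep of the real calculation. It writes to a temporary file and renames it, so a crash in the middle of a save should not damage the last good checkpoint.

```python
import os
import pickle

CHECKPOINT_EVERY = 10  # hypothetical N: save once every N timesteps


def save_checkpoint(path, step, state):
    """Write (step, state) to disk, replacing the previous checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((step, state), f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before renaming
    os.replace(tmp, path)     # swap the new file in under the old name


def load_checkpoint(path):
    """Return (step, state) from disk, or (0, None) if nothing was saved."""
    if not os.path.exists(path):
        return 0, None
    with open(path, "rb") as f:
        return pickle.load(f)


def advance(state):
    """Placeholder for one timestep of the real calculation."""
    return [x + 1.0 for x in state]


def run(path, total_steps, initial_state, every=CHECKPOINT_EVERY):
    """Run to total_steps, resuming from the last checkpoint if one exists."""
    step, state = load_checkpoint(path)
    if state is None:
        state = list(initial_state)
    while step < total_steps:
        state = advance(state)
        step += 1
        if step % every == 0:
            save_checkpoint(path, step, state)
    return step, state
```

After a failure you just start the program again with the same path. It picks up at the last saved step, so you lose at most N-1 timesteps of work.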
A more clever but expensive solution is to use only half of your RAM for
the calculation, and save a copy of all the state in the memory of a
different node. You can do this more frequently than saving to disk since
you have a lot more network bandwidth than disk bandwidth. The bonus is that
you lose less work when you do have a failure, and you never wait for
(wasted) disk I/O to save the state in the first place.
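
To make the second idea concrete, here is a rough single-process Python simulation, offered as a sketch under assumptions and not as how any particular cluster does it. The Node class, the ring-style buddy_of() pairing, and the function names are all hypothetical. On a real cluster the copy would travel over the network (for example with MPI sends) where this code just copies an object. The sketch also assumes each node's work is independent. A tightly coupled calculation would need the surviving nodes to roll back to the same step as well, which is not shown.

```python
import copy


class Node:
    """One simulated cluster node: its own state plus a copy held for a peer."""

    def __init__(self, rank, state):
        self.rank = rank
        self.state = state       # live working state of this node
        self.buddy_copy = None   # (step, state) snapshot kept for another node


def buddy_of(rank, n):
    """Hypothetical pairing: each node's copy lives on the next node in a ring."""
    return (rank + 1) % n


def checkpoint_to_buddies(nodes, step):
    """Every node stores a snapshot of its state in its buddy's memory."""
    n = len(nodes)
    for node in nodes:
        target = nodes[buddy_of(node.rank, n)]
        # deepcopy stands in for sending the state across the network
        target.buddy_copy = (step, copy.deepcopy(node.state))


def recover(nodes, failed_rank):
    """Rebuild a failed node from the snapshot held by its buddy."""
    holder = nodes[buddy_of(failed_rank, len(nodes))]
    step, saved = holder.buddy_copy
    nodes[failed_rank] = Node(failed_rank, copy.deepcopy(saved))
    return step  # the timestep the replacement node resumes from
```

Note that the replacement node starts with no buddy_copy of its own, so the node that was storing its snapshot there has no remote copy again until the next call to checkpoint_to_buddies().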