Greg Lindahl lindahl at keyresearch.com
Tue Jan 7 11:42:45 PST 2003


On Tue, Jan 07, 2003 at 12:22:12PM -0600, Randall Jouett wrote:

> With all kidding aside, I can see how (in some applications)
> check-point files are an absolute necessity. My only beef
> with the situation is that a large amount of time is being
> spent doing IO on a "maybe." I do, however, see how they
> can be useful.

Most people don't waste a large amount of time. What they do is compare
the average loss of computation due to a failure with the loss of
computation due to the extra I/O.

Example: My machine fails on average every 24 hours. It takes me 1
hour to checkpoint. So if I checkpoint every 8 hours, the average
loss from a failure is 4 hours (half the interval), and I spend 3
hours out of every 24 doing I/O.

That's an ASCI-class example; most small clusters only need a few
minutes to checkpoint and have a failure every month.
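
The arithmetic in the example above can be written out as a rough
sketch. The function names and the cost model are assumptions for
illustration, not something from the original post: the model counts
checkpoints per mean-time-between-failures window, assumes a failure
lands on average halfway between two checkpoints, and ignores restart
time and the fact that checkpointing itself stretches the run.

```python
import math


def checkpoint_overhead(mtbf_hours, checkpoint_hours, interval_hours):
    """Return (io_hours, expected_loss_hours) for one MTBF window.

    Hypothetical helper: a simplified model of the tradeoff, not a
    measurement of any real cluster.
    """
    # Checkpoints written during one mean-time-between-failures window.
    checkpoints = mtbf_hours / interval_hours
    io_hours = checkpoints * checkpoint_hours
    # On average a failure hits halfway between two checkpoints.
    expected_loss_hours = interval_hours / 2
    return io_hours, expected_loss_hours


def best_interval(mtbf_hours, checkpoint_hours):
    """Interval that minimizes io_hours + expected_loss_hours.

    Setting the derivative of M*C/T + T/2 to zero gives
    T = sqrt(2*M*C) under this simplified model.
    """
    return math.sqrt(2 * mtbf_hours * checkpoint_hours)


io_hours, loss_hours = checkpoint_overhead(24, 1, 8)
print(io_hours, loss_hours)  # 3.0 4.0
```

With the numbers from the example (24-hour failure interval, 1-hour
checkpoint), best_interval(24, 1) comes out a little under 7 hours, so
under these assumptions the 8-hour interval is already close to the
break-even point.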

-- greg




More information about the Beowulf mailing list