Reliability analysis was RE: Windows HPC (@ Cornell)

Tim Wait waitt at saic.com
Thu Nov 7 15:34:47 PST 2002


One aspect I haven't seen mentioned in this thread, except for
Greg's oblique reference to Mosix, is that many (most?)
of our clusters run parallel apps. Regardless of HA, if you have
a node fail while running a parallel job, you have just blown your
(supposed) 5 nines away; in my experience, it takes the user O(12+ hours)
to restart the job. Is this deteriorating to HA vice beowulf?
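To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a 365-day year and treats the ~12 hour restart as a single outage per year; the function names are illustrative only, not from any tool.

```python
# Rough availability arithmetic; assumes a 365-day year (no leap days).
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return HOURS_PER_YEAR * unavailability


def availability_after_outage(outage_hours: float) -> float:
    """Availability over one year if the only downtime is a single outage."""
    return 1.0 - outage_hours / HOURS_PER_YEAR


# Five nines leaves a budget of only a few minutes per year.
budget_minutes = allowed_downtime_hours(5) * 60

# One 12 hour job restart, counted as downtime, and nothing else all year.
achieved = availability_after_outage(12.0)

print(f"five-nines budget: {budget_minutes:.2f} minutes/year")
print(f"availability after one 12 h restart: {achieved:.5f}")
```

Under those assumptions the five-nines budget comes to about five minutes a year, so a single 12 hour restart already puts the year below three nines.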

5 nines? Yeah, right ;)

Even those $50k hand built Cray disks die.

Tim

-- 
Tim Wait       waitt at saic.com
SAIC - Advanced Systems Group
PO Box 41, Sumerduck VA 22742
Phone: 540-439-0193



