Reliability analysis was RE: Windows HPC (@ Cornell)
Tim Wait waitt at saic.com
Thu Nov 7 15:34:47 PST 2002
One aspect I haven't seen mentioned in this thread, except for Greg's oblique reference to Mosix, is that many (most?) of our clusters run parallel apps. Regardless of HA, if you have a node fail while running a parallel job, you have just blown your (supposed) 5 nines away; in my experience, it takes the user O(12+ hours) to restart the job.

Is this deteriorating to HA vice beowulf?

5 nines? Yeah, right ;) Even those $50k hand built Cray disks die.

Tim
--
Tim Wait waitt at saic.com
SAIC - Advanced Systems Group
PO Box 41, Sumerduck VA 22742
Phone: 540-439-0193
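
For a sense of scale, the arithmetic behind the "5 nines" remark can be sketched as follows. This is a rough illustration, not part of the original post: it assumes a 365-day year, a single node failure in that year, and that the whole 12-hour restart counts as downtime from the user's point of view (the post gives 12 hours only as an order of magnitude). The function names are made up for the example.

```python
# Rough availability arithmetic under the assumptions stated above.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_minutes(availability):
    """Minutes of downtime per year allowed at a given availability."""
    return HOURS_PER_YEAR * 60 * (1.0 - availability)


def availability_after_outage(outage_hours):
    """Yearly availability if the only downtime is one outage of this length."""
    return 1.0 - outage_hours / HOURS_PER_YEAR


budget = downtime_budget_minutes(0.99999)   # five nines
after = availability_after_outage(12)       # one 12-hour job restart
years_of_budget = (12 * 60) / budget        # budget-years used by that restart

print(f"five-nines budget: {budget:.2f} minutes per year")
print(f"availability after one 12 h restart: {after:.5%}")
print(f"five-nines budget consumed: about {years_of_budget:.0f} years' worth")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, while a single 12-hour restart drops the year to roughly two nines.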
More information about the Beowulf mailing list
