[Beowulf] Re: cheap PCs this christmas
David Mathog mathog at mendel.bio.caltech.edu
Mon Nov 14 08:46:51 PST 2005
- Previous message: [Beowulf] memory bandwidth estimation
- Next message: [Beowulf] Re: cheap PCs this christmas
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Tony Travis <ajt at rri.sari.ac.uk> wrote
>
> That sounds great BUT what about the reliability of COTS memory?
<SNIP>
> The economics [of] DIY Beowulf seem a lot less attractive if
> you have to use PC's with ECC memory.

If you expect the memory to stay error-free for any length of time, it must be resistant to various memory failures, including the soft errors caused by cosmic rays and other background radiation. ECC gives you that (standard ECC memory corrects single-bit errors and detects double-bit errors); regular memory does not. There is a reason servers use ECC memory!

That said, most consumer-grade PCs remain usable despite their less reliable memory because they are reset periodically, typically at least once a day, and each reset clears out any accumulated memory errors.

If random errors occur at a rate "Erate" (errors per unit time per machine, with all machines identical), a machine accumulates on average Erate × t errors in time t. You could therefore keep the average number of errors per machine below MaxE by rebooting the entire cluster every MaxE/Erate units of time.

However, OpenMosix is going to make a hash of that simple model since, as you described, it replicates memory errors across the nodes.

Another option is to run each job in duplicate and, if the two results differ, run a third instance to break the tie.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
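The two mitigations discussed in this message, a fixed cluster-wide reboot interval of MaxE/Erate and duplicate runs with a tie-breaking third run, can be sketched in Python. This is a minimal illustration under stated assumptions, not a tested cluster tool: it assumes errors strike each machine independently at a constant rate (a Poisson process) and that jobs are deterministic, so any mismatch between runs is blamed on a hardware error. The names reboot_interval, simulate_mean_errors and run_with_tiebreak are hypothetical and do not belong to OpenMosix or any other package.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def reboot_interval(max_e: float, e_rate: float) -> float:
    """Reboot period that keeps the *expected* error count per machine at max_e.

    Assumption (hypothetical model): errors strike each machine independently at
    a constant rate e_rate (errors per unit time), so a machine expects
    e_rate * t errors after time t.
    """
    if e_rate <= 0:
        raise ValueError("e_rate must be positive")
    return max_e / e_rate


def simulate_mean_errors(e_rate: float, period: float,
                         trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo check of the model: mean errors one machine sees between reboots.

    Inter-arrival times of a constant-rate (Poisson) process are exponential.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        t = rng.expovariate(e_rate)  # time of the first error after a reboot
        while t < period:            # count errors that land before the next reboot
            total += 1
            t += rng.expovariate(e_rate)
    return total / trials


def run_with_tiebreak(job: Callable[[], T]) -> T:
    """Run job twice; if the results disagree, a third run breaks the tie.

    Assumption: job is deterministic, so a mismatch indicates a hardware error.
    """
    first, second = job(), job()
    if first == second:
        return first
    third = job()
    # With first != second, a 2-of-3 majority exists only if third matches one of them.
    if third in (first, second):
        return third
    raise RuntimeError("all three runs disagree")
```

For example, with a hypothetical rate of 0.5 errors per machine per day and MaxE = 2, reboot_interval(2, 0.5) returns 4.0, i.e. reboot every four days, and simulate_mean_errors(0.5, 4.0) comes out close to 2. Note that this bounds only the average: individual machines will sometimes exceed MaxE, and OpenMosix spreading memory errors across nodes breaks the per-machine independence the model assumes.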
More information about the Beowulf mailing list
