[Beowulf] cheap PCs this christmas
Mark Hahn hahn at physics.mcmaster.ca
Tue Nov 22 20:58:09 PST 2005
> > I'm interested to know about other people's views and experiences of
> > the reliability of COTS (i.e. non-ECC) memory?

reliability is always a gamble; reducing risk always means increasing cost and/or decreasing performance. the amount you decrease risk through techniques like ECC can be large or small, depending on your configuration.

> My view has always been to use ECC memory.

the comfort factor of ECC always has to be balanced against the missed opportunity cost of paying more.

> Aside from non-ECC memory being cheaper, I see no benefits of using it
> when one accounts for downtime, troubleshooting, paying for replacement
> RAM, and worse getting wrong results.

this implies that you see enough ECC detections to produce a significant sample. that implies that you probably have both a high-altitude facility and have very large amounts of ram in use.

> Honestly, I never knew that not using ECC RAM on anything besides a
> nonessential system like a standard desktop configuration was ever an
> option.

I find that the use of "nonessential" often indicates rather poor reasoning about the risks (and costs) involved. a statistically-grounded approach would weigh memory size and perhaps activity more than whether something is "desktop" or "server".

that said, our servers all have ECC. on our current ~500 cpus and ~800GB, I'd guess we see O(10) corruptions/year. going to 7500 cores and >14TB, (all with ECC) I'm pretty happy not to be risking undetected corruptions.

still, for some workloads, especially for leaner facilities (lower memory, less budget spent on network and storage), I'd certainly want to consider non-ECC. I only wish vendors would publish their FIT figures, so we could crunch the numbers properly.

more to the point, if you're going to network $300 PCs, ECC should almost certainly not be on your xmas list...
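
as a rough illustration of the number-crunching mentioned above, here is a minimal sketch that turns the guessed figure of ~10 corruptions/year on ~800GB into an implied FIT rate (failures per 10^9 hours) per GB, then scales it to other memory sizes. it assumes errors scale linearly with installed memory and are independent of altitude and activity, which is a simplification. the function names and the 16-node, 1GB-per-node cluster are hypothetical, and the input is an order-of-magnitude guess, not vendor data.

```python
# Back-of-envelope memory error estimate.
# Assumption: error count scales linearly with installed memory.

HOURS_PER_YEAR = 24 * 365.25


def implied_fit_per_gb(errors_per_year, memory_gb):
    """FIT (failures per 1e9 hours) per GB implied by an observed rate."""
    return errors_per_year / (memory_gb * HOURS_PER_YEAR) * 1e9


def expected_errors_per_year(fit_per_gb, memory_gb):
    """Expected errors per year for a given FIT/GB and memory size."""
    return fit_per_gb * memory_gb * HOURS_PER_YEAR / 1e9


# Figures from the post: ~10 corruptions/year on ~800 GB (a guess).
fit = implied_fit_per_gb(errors_per_year=10, memory_gb=800)

# The planned >14 TB facility, taken here as 14000 GB.
big_cluster = expected_errors_per_year(fit, memory_gb=14_000)

# Hypothetical cheap cluster: 16 PCs with 1 GB each.
small_cluster = expected_errors_per_year(fit, memory_gb=16 * 1)

print(f"implied rate: {fit:.0f} FIT per GB")
print(f"14 TB facility: {big_cluster:.0f} errors/year")
print(f"16 x 1 GB PCs: {small_cluster:.2f} errors/year")
```

under these assumptions the large facility would see roughly 175 events a year, while the small cluster would see about one every five years, which is the gap the argument about memory size rests on.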
