[Beowulf] cheap PCs this christmas

Robert G. Brown rgb at phy.duke.edu
Wed Nov 23 06:26:43 PST 2005


On Tue, 22 Nov 2005, Mark Hahn wrote:

> the comfort factor of ECC always has to be balanced against the missed
> opportunity cost of paying more.

What he said, what he said.  Also, YMMV as far as real-world reliability
of non-ECC memory and its effect on the stability of either nodes or
applications.  For many cluster applications that are time-granular at
the level of hours to a day, there may be low sensitivity to errors even
if they occur.  Higher quality (but still non-ECC) memory may throw
fewer errors.  Higher quality motherboards and power supplies may
throw fewer errors even with non-ECC memory.  Clusters near sea level or
down in building basements may throw fewer errors, except where they are
built in old radiation facilities with a bit of leaked radioactive
source dust still trapped in the ceiling;-).  I use non-ECC memory
exclusively on e.g. my home cluster, and it still manages to stay up
for months at a time and to get the right answers from applications
with known answers over that kind of timeframe.  OTOH I have a system in
my office at Duke that is throwing errors on non-ECC memory right now.
I view this as MUCH more likely to be a problem with its PARTICULAR
DIMMs or motherboard than an ECC vs. non-ECC issue per se.
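
To put the time-granularity point in rough numbers, here is a minimal
sketch, assuming memory errors arrive as a Poisson process at a constant
rate per node-hour.  The rate used below is a made-up placeholder, NOT a
measurement, and p_run_hit is just a name invented for this illustration:

```python
import math

def p_run_hit(errors_per_node_hour, nodes, run_hours):
    """Probability that at least one memory error occurs somewhere in the
    cluster during one run, under the Poisson assumption stated above."""
    expected_errors = errors_per_node_hour * nodes * run_hours
    return 1.0 - math.exp(-expected_errors)

# HYPOTHETICAL error rate, chosen only to make the arithmetic visible.
HYPOTHETICAL_RATE = 1e-5   # errors per node-hour (assumption)
NODES = 16                 # cluster size (assumption)

for run_hours in (1, 24, 24 * 30):
    p = p_run_hit(HYPOTHETICAL_RATE, NODES, run_hours)
    print(f"run of {run_hours:4d} h: P(at least one error) = {p:.4f}")
```

Only the shape matters here: the chance that any one run gets hit grows
with run length, and a hit costs at most one run's worth of work, so
jobs that restart every few hours to a day lose little even when an
error does occur.  Not every error corrupts an answer, either, so this
overstates the damage.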

There are some interesting remarks (IIRC) from Don Becker on this very
point back in the list archives, if anybody feels like googling for
them....

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




