Cluster-wide overclocking...
Bill Broadley
bill@math.ucdavis.edu
Wed, 23 Sep 1998 18:00:02 -0400
> Yes. I've been debugging some fluid dynamics computational code some
> physicist here wrote and I eventually got to the conclusion that it's not
> the code that's faulty, but rather the kernel (due to overclocking).
> However, sanity checks in this particular case show that 1 in approx. 1.2
> billion calculations fail, and then computing it again covers up for the
> glitch, and we still benefit from overclocking.
So in a cluster you could get a few errors per second. Few enough that
they might get missed, but enough to corrupt any long run, especially since
errors often propagate...
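
To put rough numbers on "a few errors per second", here is a quick
back-of-the-envelope sketch. Only the 1-in-1.2-billion failure rate comes
from the quoted message. The node count (16) and the per-node rate (100
million calculations per second) are assumptions I picked for illustration,
and the function names are made up for this example.

```python
# Back-of-the-envelope estimate of silent errors in an overclocked cluster.
# Only the failure rate comes from the quoted message; the node count and
# per-node calculation rate used below are illustrative assumptions.

CALCS_PER_FAILURE = 1.2e9  # "1 in approx. 1.2 billion calculations fail"


def errors_per_second(nodes, calcs_per_sec_per_node,
                      calcs_per_failure=CALCS_PER_FAILURE):
    """Expected number of failed calculations per second, cluster-wide."""
    return nodes * calcs_per_sec_per_node / calcs_per_failure


def expected_errors(nodes, calcs_per_sec_per_node, run_hours):
    """Expected number of failed calculations over a run of run_hours."""
    return errors_per_second(nodes, calcs_per_sec_per_node) * run_hours * 3600


if __name__ == "__main__":
    # Assumed cluster: 16 nodes, each doing 100 million calculations/second.
    rate = errors_per_second(16, 100e6)
    total = expected_errors(16, 100e6, 24)
    print(f"{rate:.2f} errors per second")
    print(f"{total:.0f} errors in a 24 hour run")
```

With those assumed figures you get roughly 1.3 errors per second, or about
115,000 over a 24 hour run. Plug in your own node count and rate; the point
is only that a tiny per-calculation failure rate adds up quickly once you
multiply by a whole cluster and a long run.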
Sounds like an excellent reason to not overclock...