[Beowulf] reboot without passing through BIOS?
kilian at stanford.edu
Thu Jul 31 13:00:45 PDT 2008
On Wednesday 30 July 2008 09:13:56 am David Mathog wrote:
> If one were to build nodes without ECC memory it would probably be a
> good idea to reboot them from time to time to clean out whatever bad
> bits might have accumulated. It then occurred to me that doing so
> would require a trip through the BIOS on every reboot, at least on
> every x86 based computer I'm familiar with. That is not a terrible
> thing, but it made me wonder if it is really necessary.
I may be totally missing the point, but doesn't the memory need to be
physically (as in electrically) reset in order to clean out those bad
bits? And doesn't this require a hard reboot, for the machine to be
power cycled, so that memory cells are reinitialized?
I mean, if the BIOS stage is skipped, as when kexec'ing a new kernel,
electrical initialization doesn't occur, and the bad bits will probably
stay where they are. Unless the kernel does this kind of scrubbing in its
initialization phase (I don't know whether it does), I don't see any
reason why the memory would be cleaned of errors.
Another thing I wonder about is whether a reboot would do any
good for non-ECC memory anyway. As far as I understand it, a memory
error is either a repeatable, hard one, like a bad chip, in which case a
reboot won't change anything since the hardware is faulty, or a
transient, soft one, where a bad value is read once but subsequent
reads are fine. So unless there's some sort of accumulation somewhere in
the soft case, I don't really understand what a reboot could do about it.
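To make the question concrete, here is a toy model of the two cases. It is
only a sketch under one explicit assumption that I have not verified for
real DRAM: that a soft error flips the stored bit and the flip stays until
the cell is rewritten. The names (ToyMemory, reload, count_errors, demo) are
made up for illustration, and "reload" simply stands in for whatever a
reboot rewrites. Errors that affect a single read without changing the
stored value are not modeled.

```python
class ToyMemory:
    """Toy model of non-ECC memory; not how real DRAM is implemented."""

    def __init__(self, size, stuck_bits=None):
        self.cells = [0] * size
        # Hard faults: address -> value the cell always reads back as.
        self.stuck = dict(stuck_bits or {})

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # A stuck cell ignores what was written to it.
        return self.stuck.get(addr, self.cells[addr])

    def flip_stored_bit(self, addr):
        # ASSUMPTION: a soft error changes the stored value and persists
        # until the cell is written again.
        self.cells[addr] ^= 1


def count_errors(mem, expected):
    """Number of addresses whose read value differs from the expected one."""
    return sum(1 for addr, value in enumerate(expected) if mem.read(addr) != value)


def reload(mem, expected):
    """Stand-in for a reboot: rewrite every cell with its intended value."""
    for addr, value in enumerate(expected):
        mem.write(addr, value)


def demo():
    expected = [0] * 8
    mem = ToyMemory(8, stuck_bits={3: 1})  # one hard fault at address 3
    reload(mem, expected)
    at_start = count_errors(mem, expected)

    mem.flip_stored_bit(5)  # two soft errors accumulate over time
    mem.flip_stored_bit(6)
    after_flips = count_errors(mem, expected)

    reload(mem, expected)  # rewriting clears the soft errors only
    after_reload = count_errors(mem, expected)
    return at_start, after_flips, after_reload
```

In this model demo() returns (1, 3, 1): the flipped bits do accumulate and
are cleared by rewriting the cells, while the stuck cell survives. Note that
it is the rewrite that clears them, not a power cycle, and whether real
memory behaves like this is exactly what I am asking.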
If you have any light to shed on this, I'd be interested.