[Beowulf] reboot without passing through BIOS?
David Mathog mathog at caltech.edu
Fri Aug 1 09:11:25 PDT 2008
Kilian CAVALOTTI <kilian at stanford.edu> wrote:
> I may be totally missing the point, but doesn't the memory need to be
> physically (as in electrically) reset in order to clean out those bad
> bits? And doesn't this require a hard reboot, for the machine to be
> power cycled, so that memory cells are reinitialized?

The errors I am talking about are random bit flips, for instance
from ambient radiation. When the OS reboots it will overwrite memory
and so remove those errors. The affected cells were not damaged, they
were just in the wrong state. This should work so long as none of the
flipped bits prevents kexec from doing its job. Presumably the OS will
also reinitialize all memory structures stored elsewhere in hardware
(as in storage controllers and NICs), since it should not trust the
BIOS to have done this.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
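
To make the distinction between a flipped bit and a damaged cell concrete, here is a toy model in Python. It is only a sketch: the bytearray stands in for RAM, and the names flip_bit and reinitialize, the 16-byte image, and the chosen byte and bit are all made up for illustration. It also assumes that every byte gets rewritten, which a real kernel booted through kexec is not guaranteed to do for all of physical memory.

```python
def flip_bit(memory: bytearray, byte_index: int, bit: int) -> None:
    """Invert one bit in place, modelling a transient (soft) error."""
    memory[byte_index] ^= 1 << bit


def reinitialize(memory: bytearray, image: bytes) -> None:
    """Overwrite every byte with fresh contents (assumption: nothing is skipped)."""
    memory[:] = image


# Hypothetical "known good" contents and the memory that holds them.
image = bytes(range(16))
memory = bytearray(image)

# A soft error hits byte 5, bit 3. The storage itself is still functional.
flip_bit(memory, 5, 3)
corrupted = bytes(memory) != image

# Rewriting the memory puts the cell back in the right state,
# because the cell was never broken, only holding the wrong value.
reinitialize(memory, image)
restored = bytes(memory) == image
```

A genuinely damaged cell would behave differently: the write in reinitialize would not stick, and only replacing the hardware would help. That case is outside what this sketch models.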
