[Beowulf] Logging MCE information on next warm boot?
Eric W. Biederman ebiederm at xmission.com
Mon Jan 25 16:17:07 PST 2010
- Previous message: [Beowulf] Logging MCE information on next warm boot?
- Next message: [Beowulf] WhisperingWulf: A Silent Personal Cluster
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greg Keller <Greg at keller.net> writes:

>> Date: Mon, 25 Jan 2010 10:46:31 -0800
>> From: "David Mathog" <mathog at caltech.edu>
>> Subject: [Beowulf] Logging MCE information on next warm boot?
>> To: beowulf at beowulf.org
>> Message-ID: <E1NZTxH-00035U-1F at mendel.bio.caltech.edu>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> Is it possible to have the Machine Check Exception (MCE) information
>> saved to disk automatically on the next warm boot?
>
> David,
>
> I believe the utility you are looking for is mcelog. We usually run it with
> the following arguments:
> /usr/sbin/mcelog -h --ignorenodev --filter
>
> I think it clears the info after it reports it, so make sure to tee it to a
> file. I don't understand the command or the flags, just a copy / paste script
> kiddy in these regards, but I hope it helps.

In the case of a panic this won't work. You would need to set up kdump
or something like that to capture the panic.

This sounds like L1 or L2 cache corruption but I haven't ever had any
machine checks on anything before the k8 core.

Wow. You are talking about old machines.

If machine check registers are kept across reboot there is a reasonable
chance that the firmware clears them.

Eric
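
Greg's suggestion to tee the mcelog output to a file could be scripted roughly as below. This is only a sketch under stated assumptions: the names MCELOG_CMD, MCE_LOG and save_mce_records and the log path are made up for illustration, and the command uses just the --ignorenodev and --filter flags from the quoted command line (the -h flag is left out because nothing in the thread explains it). As noted above, this does not help after a panic, and the firmware may already have cleared the registers by the time it runs.

```shell
#!/bin/sh
# Hypothetical boot-time helper: append any pending machine check records
# to a log file. MCELOG_CMD and MCE_LOG are illustrative names and are not
# part of mcelog itself; the log path is an assumption.
MCELOG_CMD="${MCELOG_CMD:-/usr/sbin/mcelog --ignorenodev --filter}"
MCE_LOG="${MCE_LOG:-/var/log/mcelog-boot.log}"

save_mce_records() {
    {
        # Timestamp each run so entries from different boots can be told apart.
        echo "=== mcelog run at $(date) ==="
        # Word splitting of $MCELOG_CMD is intentional (command plus flags).
        $MCELOG_CMD 2>&1
    } | tee -a "$MCE_LOG"
}
```

Calling save_mce_records from a late boot script (rc.local or similar) would append whatever mcelog reports on each boot, so the records survive even if mcelog clears them after reporting.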
