[Beowulf] Logging MCE information on next warm boot?

Eric W. Biederman ebiederm at xmission.com
Mon Jan 25 16:17:07 PST 2010


Greg Keller <Greg at keller.net> writes:

>> Date: Mon, 25 Jan 2010 10:46:31 -0800
>> From: "David Mathog" <mathog at caltech.edu>
>> Subject: [Beowulf] Logging MCE information on next warm boot?
>> To: beowulf at beowulf.org
>> Message-ID: <E1NZTxH-00035U-1F at mendel.bio.caltech.edu>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> Is it possible to have the Machine Check Exception (MCE) information
>> saved to disk automatically on the next warm boot?
>
> David,
>
> I believe the utility you are looking for is mcelog. We usually run it with
> the following arguments:
> /usr/sbin/mcelog -h --ignorenodev --filter
>
> I think it clears the info after it reports it, so make sure to tee it to a
> file. I don't understand the command or the flags, just a copy/paste script
> kiddy in these regards, but I hope it helps.

In the case of a panic this won't work.  You would need to set up kdump or
something like that to capture the panic.

This sounds like L1 or L2 cache corruption, but I haven't ever had any
machine checks on anything before the K8 core.  Wow.  You are talking about
old machines.

Even if the machine check registers are kept across a reboot, there is a
reasonable chance that the firmware clears them before the OS can read them.

Eric
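
The mcelog suggestion quoted above (run it once and tee the output to a file)
could be sketched roughly as follows. This is only an illustration: the helper
name save_mce_report, the MCELOG_BIN variable, and the log path are made up
here and do not come from the thread, and any flags are simply passed through
to mcelog unchanged. As noted in the reply, it does not help when the machine
panics.

```shell
#!/bin/sh
# Hypothetical helper: run mcelog once and append whatever it reports to a file.
# MCELOG_BIN and the log path are assumptions, not part of the original thread.
MCELOG_BIN="${MCELOG_BIN:-/usr/sbin/mcelog}"

save_mce_report() {
    logfile="$1"
    shift
    # Stamp each run so entries from different boots can be told apart.
    printf '=== mcelog run at %s ===\n' "$(date)" >> "$logfile"
    # tee -a appends, so earlier reports stay in the file across runs.
    "$MCELOG_BIN" "$@" 2>&1 | tee -a "$logfile"
}

# Example call, e.g. from a boot script (two of the flags from the quoted message):
#   save_mce_report /var/log/mce-boot.log --ignorenodev --filter
```

Appending rather than overwriting matters if mcelog really does clear the
records after reporting them, since each run may be the only chance to see them.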


