[Beowulf] Intel microcode updates for "erratum" - "an incorrect instruction stream may be executed"

Tim Small tim at buttersideup.com
Mon Jul 26 05:15:18 PDT 2004


Hi all,

I see that Intel has noted an "erratum" which, it says, means that "an
incorrect instruction stream may be executed".  The erratum has shown up
in the latest specification update documents, and seems to affect recent
P4s (stepping D1/M0 - versions 0x0f25, 0x0f29), P4 Celerons (stepping D1 
- 0x0f29), P4 Xeons (all 400/533MHz bus CPUs, I think), and some recent 
XeonMPs (stepping B0/C0 - 0x0f25, 0x0f26).

ref:

http://developer.intel.com/design/xeon/documentation.htm#updates
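
For anyone wanting to check a batch of nodes against the version numbers
above, here is a minimal Python sketch of how a CPUID signature such as
0x0f25 breaks down into family/model/stepping. The bit layout is the
standard one for the CPUID signature; the function name and the
AFFECTED_SIGNATURES set are my own, and the set only holds the values
quoted in this message, so treat it as illustrative rather than as
Intel's definitive list (the 400/533MHz Xeons are not covered, as I don't
have their signatures to hand).

```python
# Signatures quoted in this message; NOT an authoritative list from Intel.
AFFECTED_SIGNATURES = {0x0F25, 0x0F26, 0x0F29}


def decode_signature(sig):
    """Split a CPUID signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model only applies to base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = base_model + (ext_model << 4)
    else:
        model = base_model
    return family, model, stepping


def is_listed(sig):
    """True if the signature is one of those quoted in this message."""
    return sig in AFFECTED_SIGNATURES


if __name__ == "__main__":
    for sig in sorted(AFFECTED_SIGNATURES):
        family, model, stepping = decode_signature(sig)
        print(f"0x{sig:04x}: family {family}, model {model}, stepping {stepping}")
```

The signature itself still has to come from somewhere, e.g. the output of
the cpuid utility or the family/model/stepping fields in /proc/cpuinfo.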

I was wondering if anyone running large clusters had seen any problems 
attributable to this bug - and whether people with heavy vendor support 
had been given any advice about it - particularly how often the problem 
is likely to occur?


I've noticed that IBM and HP have released BIOSes for some machines
which include new microcode to fix this erratum.

The erratum text follows:

> Problem: A Timing Marginality in the Instruction Decoder Unit May 
> Cause an Unpredictable Application Behavior and/or System Hang
>
> A timing marginality may exist in the clocking of the instruction 
> decoder unit which leads to a circuit slowdown in the read path from 
> the Instruction Decode PLA circuit. This timing marginality may not be 
> visible for some period of time.
>
> Implication: When this erratum occurs, an incorrect instruction stream 
> may be executed resulting in an unpredictable application behavior 
> and/or system hang
>
> Workaround: It is possible for the BIOS to contain a workaround for 
> this erratum. BIOS must load the microcode update during the BIOS POST 
> time prior to memory initialization.
>
> Status: For the steppings affected, see the Summary Table of Changes.


The "cpuid" utility - http://www.ka9q.net/code/cpuid/ - will tell you 
which version your CPU is, and there is an Intel Microcode update 
utility for Linux:

http://www.urbanmyth.org/microcode/

which can be used to load new microcode from user space.  It doesn't
appear to have the latest microcode yet, although that is difficult to
tell given Intel's lack of a changelog or release notes for the microcode
files.  Running the latest microcode.ctl on an IBM machine which has the
fixed BIOS says:

microcode: CPU0 not 'upgrading' to earlier revision 0x17 (current=0x21)
microcode: No suitable data for cpu 0
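
The refusal above reads as a plain revision comparison: the file offers
0x17 and the CPU already runs 0x21, so nothing is loaded. Below is a small
Python sketch for pulling those two numbers out of such a log line, which
may help when collecting results across many nodes. The regular expression
is written against the exact message shown here, and would_load() is only
my assumption about the rule the message implies, not the actual logic of
the kernel driver or of microcode.ctl.

```python
import re

# Matches the refusal message exactly as it appears in this message.
REFUSAL_RE = re.compile(
    r"CPU(\d+) not 'upgrading' to earlier revision "
    r"(0x[0-9a-fA-F]+) \(current=(0x[0-9a-fA-F]+)\)"
)


def parse_refusal(line):
    """Return (cpu, offered_rev, current_rev), or None if no match."""
    match = REFUSAL_RE.search(line)
    if match is None:
        return None
    cpu, offered, current = match.groups()
    return int(cpu), int(offered, 16), int(current, 16)


def would_load(offered_rev, current_rev):
    """Assumed rule: only load microcode newer than what is running."""
    return offered_rev > current_rev


if __name__ == "__main__":
    line = ("microcode: CPU0 not 'upgrading' to earlier revision 0x17 "
            "(current=0x21)")
    print(parse_refusal(line))
```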

Intel's erratum text says "BIOS must load the microcode update during
the BIOS POST time prior to memory initialization" - but it is not clear
whether that means "this is the general policy for microcode updates" or
"this is necessary to fix this particular erratum".  Hopefully it is the
former...

Tim.



