[Beowulf] NMI (Non maskable interrupts)

Steven Truong midair77 at gmail.com
Tue Mar 18 11:55:21 PDT 2008


On Mon, Mar 17, 2008 at 3:02 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>  > From my understanding, NMIs are not good, since the processors really
>  > have to handle these interrupts right away, and that might degrade the
>  > performance of the nodes.
>
>  I think you're mistaken - NMIs of the sort you're talking about would
>  result in a panic.  the NMIs you're seeing are probably just low-level
>  kernel synchronization, such as one CPU needing the others to immediately
>  do something like change the status of a page in their MMUs.
>
>  for instance, I notice that more recent kernels classify interrupts
>  more finely:
>
>  [root@experiment ~]# cat /proc/interrupts
>             CPU0       CPU1       CPU2       CPU3
>    0:         68          0          0          0   IO-APIC-edge      timer
>    1:          0          0          0         10   IO-APIC-edge      i8042
>    4:          0          0          0          2   IO-APIC-edge
>    8:          0          0          0          0   IO-APIC-edge      rtc
>    9:          0          0          0          0   IO-APIC-fasteoi   acpi
>   12:          0          0          0          4   IO-APIC-edge      i8042
>   14:          0          0          0          0   IO-APIC-edge      ide0
>   17:          0          0          0          0   IO-APIC-fasteoi   sata_nv
>   18:          0          0          0          0   IO-APIC-fasteoi   sata_nv
>   19:     123229        148        514       4698   IO-APIC-fasteoi   sata_nv
>  362:  127524168    5281605     236961     121506   PCI-MSI-edge      eth1
>  377:     519748   12731137     607115   42573852   PCI-MSI-edge      eth0:MSI-X-2-RX
>  378:     109154      80191  302109913    6487104   PCI-MSI-edge      eth0:MSI-X-1-TX
>  NMI:          0          0          0          0   Non-maskable interrupts
>  LOC:  300446104  300446082  300446060  300446038   Local timer interrupts
>  RES:    2698262      44102    2234502    3677120   Rescheduling interrupts
>  CAL:       4135       4379       4460        415   function call interrupts
>  TLB:      14018      15088       4079       7251   TLB shootdowns
>  TRM:          0          0          0          0   Thermal event interrupts
>  THR:          0          0          0          0   Threshold APIC interrupts
>  SPU:          0          0          0          0   Spurious interrupts
>  ERR:          0
>
>  I suspect that all the counts listed after RES are, in earlier kernels,
>  lumped into NMI.  obviously, rescheduling, function call and TLB shootdowns
>  are perfectly normal, not indicating any error (though you might want to
>  minimize them as well...)
>
>  how about trying a new kernel?  the above is 2.6.24.3.  note that there are
>  important security fixes that you might be missing if you're running certain
>  ranges of old kernels...
>

Hi, Mark.  Yes, I was wrong.  I also found a very informative discussion of NMI.

http://x86vmm.blogspot.com/2005/10/linux-nmis-on-intel-64-bit-hardware.html
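
For anyone who wants to compare these counters across nodes or kernels, here is a minimal sketch (not from the thread) that reads /proc/interrupts and sums the per-CPU counts for the named rows. The function name parse_interrupts and the choice of rows to print are my own assumptions, and which labels exist (NMI, LOC, RES, CAL, TLB, ...) depends on the kernel version, as Mark points out.

```python
import os


def parse_interrupts(text):
    """Parse /proc/interrupts-style text.

    Returns {label: (per_cpu_counts, description)}.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        # Split on the first colon only; device names may contain colons.
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) have fewer columns than CPUs
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        table = parse_interrupts(f.read())
    # Rows of interest; any that this kernel does not report are skipped.
    for label in ("NMI", "LOC", "RES", "CAL", "TLB"):
        if label in table:
            counts, desc = table[label]
            print(f"{label}: total={sum(counts)} per-cpu={counts} ({desc})")
```

Running it twice a few seconds apart and comparing the totals gives a rough idea of whether the NMI count is actually growing on a node.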

Thank you.


