intermittent crashing of programs

Patrick Geoffray patrick at myri.com
Thu Feb 21 09:09:11 PST 2002


Donald Becker wrote:

> Could you elaborate?  What PCI problems cause an NMI, and on which
> motherboards?  You obviously have some first-hand experience with the
> problem.  I'm guessing that you have helped many customers debug their
> hardware problems.

I have seen it on the x330 and the Supermicro DLE: the SCSI board would
assert SERR on the PCI bus, and the system would translate it into an
NMI. NMIs are very hard to debug because it is difficult to identify
their source.
For this specific SCSI problem, we used a PCI analyser and observed the
SERR. I am not 100% sure why the SCSI board was failing with a SERR, but
it happened after the board had requested the bus and was waiting for
another PCI device to finish a long DMA transfer. Replacing the SCSI
card solved the problem in this case.
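Without a bus analyser, one rough way to narrow down which device raised the SERR is to check each device's PCI Status register (offset 0x06), where bit 14 ("Signaled System Error") records that the device asserted SERR. The sketch below is a hypothetical helper, not something used in this thread. It assumes a Linux host that exposes configuration space through sysfs (/sys/bus/pci/devices/<addr>/config) and the standard PCI Status bit layout. The bits are write-1-to-clear, so firmware or the kernel may already have cleared them by the time you look, and a device only drives SERR if SERR Enable is set in its Command register.

```python
"""Hypothetical helper: list PCI devices whose Status register shows error bits.

Assumes Linux sysfs exposes config space at <root>/<device>/config and that
the Status register sits at offset 0x06 (16 bits, little-endian), as in the
standard PCI configuration header.
"""
import glob
import os
import struct

STATUS_OFFSET = 0x06

# Error-reporting bits of the PCI Status register.
STATUS_BITS = {
    15: "detected_parity_error",
    14: "signaled_system_error",  # the device asserted SERR
    13: "received_master_abort",
    12: "received_target_abort",
    11: "signaled_target_abort",
    8: "master_data_parity_error",
}


def decode_status(status):
    """Return the names of error bits set in a 16-bit status value, highest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items(), reverse=True)
            if status & (1 << bit)]


def read_status(config_path):
    """Read the 16-bit Status register from a sysfs config file."""
    with open(config_path, "rb") as f:
        f.seek(STATUS_OFFSET)
        data = f.read(2)
    if len(data) != 2:
        raise ValueError("config space too short: " + config_path)
    return struct.unpack("<H", data)[0]


def scan(root="/sys/bus/pci/devices"):
    """Map device name -> list of error flags, skipping devices with none set."""
    results = {}
    for dev in sorted(glob.glob(os.path.join(root, "*"))):
        try:
            flags = decode_status(read_status(os.path.join(dev, "config")))
        except (OSError, ValueError):
            continue  # unreadable or missing config file
        if flags:
            results[os.path.basename(dev)] = flags
    return results


if __name__ == "__main__":
    for addr, flags in scan().items():
        print(addr, ", ".join(flags))
```

A device reporting "signaled_system_error" after an NMI is a candidate source, though correlating it with the crash still requires care, since the bit may be stale from an earlier event.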

Patrick

----------------------------------------------------------
|   Patrick Geoffray, Ph.D.      patrick at myri.com
|   Myricom, Inc.                http://www.myri.com
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
----------------------------------------------------------
