[Beowulf] Errors on IBM e325


Jeff Layton jeffrey.b.layton at lmco.com
Fri Jun 25 08:21:11 PDT 2004


Good morning,

   We've got a shiny new IBM cluster with e325 nodes (Opteron).
However, we're having some trouble with a number of nodes.
We keep getting 'GART' errors showing up in the logs. Here is
an example:

Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel:     extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel:     corrected ecc error


   Does anybody have any ideas what the cause might be?

Thanks!

Jeff

-- 
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta
