[Beowulf] Errors on IBM e325
Jeff Layton jeffrey.b.layton at lmco.com
Fri Jun 25 08:21:11 PDT 2004
- Previous message: [Beowulf] Clusters for Computing in Physics, 14th Summer School on Computing Techniques in Physics (fwd from rabenseifner@hlrs.de)
- Next message: [Beowulf] Errors on IBM e325
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Good morning,

We've got a shiny new IBM cluster with e325 nodes (Opteron). However, we're having some trouble with a number of nodes. We keep getting 'GART' errors showing up in the logs. Here is an example,

Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel: extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel: corrected ecc error

Does anybody have any ideas what the cause might be?

Thanks!

Jeff

--
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta
