[Beowulf] Errors on IBM e325


Michael Will mwill at penguincomputing.com
Mon Jun 28 09:39:12 PDT 2004


Was this not tested before it was deployed? Or is it a problem that only
recently developed?

It sounds similar to http://lists.suse.com/archive/suse-amd64/2003-Sep/0063.html
which suggests making sure that you are running the latest kernel. If the problem
persists, it is a case for your service contract (i.e. the hardware is broken).

Also see http://www.cs.caltech.edu/~weixl/research/fast-mon/arch/x86_64/kernel/bluesmoke.c
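
To judge whether the errors cluster on a few nodes (which would point toward hardware) or appear everywhere (which would point toward the kernel), it can help to tally the messages per node. The following is a minimal sketch and not part of the original thread: the function name tally_by_host, the regular expression, and the assumed "Mon DD HH:MM:SS host kernel: message" layout are illustrative assumptions, and the sample lines are the ones quoted below.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt quoted in this thread.
SAMPLE_LOG = """\
Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel:     extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel:     corrected ecc error
"""

# Assumed syslog layout: "Mon DD HH:MM:SS host kernel: message".
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s+kernel:\s+(?P<msg>.+)$"
)


def tally_by_host(text):
    """Return {host: Counter({message: count})} for kernel lines in text."""
    tally = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a kernel log line
        host = match.group("host")
        msg = match.group("msg").strip()
        tally.setdefault(host, Counter())[msg] += 1
    return tally


if __name__ == "__main__":
    for host, messages in sorted(tally_by_host(SAMPLE_LOG).items()):
        for msg, count in messages.most_common():
            print(f"{host}: {count} x {msg}")
```

On a real cluster the same function could be fed the collected syslog text from each node instead of SAMPLE_LOG; where those logs live depends on the site's syslog setup.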

Michael Will
On Friday 25 June 2004 08:21 am, Jeff Layton wrote:
> Good morning,
> 
>    We've got a shiny new IBM cluster with e325 nodes (Opteron).
> However, we're having some trouble with a number of nodes.
> We keep getting 'GART' errors showing up in the logs. Here is
> an example,
> 
> Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
> Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
> Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
> Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
> Jun 21 14:03:49 c1n2.cluster kernel:     extended error chipkill ecc error
> Jun 21 14:03:50 c1n2.cluster kernel:     corrected ecc error
> 
> 
>    Does anybody have any ideas what the cause might be?
> 
> Thanks!
> 
> Jeff
> 

-- 
Michael Will, Linux Sales Engineer
NEWS: We have moved to a larger iceberg :-)
NEWS: 300 California St., San Francisco, CA.
Tel:  415-954-2822  Toll Free:  888-PENGUIN
Fax:  415-954-2899 
www.penguincomputing.com



