[Beowulf] Errors on IBM e325
Michael Will mwill at penguincomputing.com
Mon Jun 28 09:39:12 PDT 2004
Was this not tested before it was deployed? Or is it a problem that only
recently developed?

It sounds similar to
http://lists.suse.com/archive/suse-amd64/2003-Sep/0063.html
which suggests that you should make sure you are running the latest kernel.
If the problem persists, it is a case for your service contract (i.e.
broken hardware).

Also see
http://www.cs.caltech.edu/~weixl/research/fast-mon/arch/x86_64/kernel/bluesmoke.c

Michael Will

On Friday 25 June 2004 08:21 am, Jeff Layton wrote:
> Good morning,
>
> We've got a shiny new IBM cluster with e325 nodes (Opteron).
> However, we're having some trouble with a number of nodes.
> We keep getting 'GART' errors showing up in the logs. Here is
> an example,
>
> Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
> Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
> Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
> Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
> Jun 21 14:03:49 c1n2.cluster kernel: extended error chipkill ecc error
> Jun 21 14:03:50 c1n2.cluster kernel: corrected ecc error
>
>
> Does anybody have any ideas what the cause might be?
>
> Thanks!
>
> Jeff
>

-- 
Michael Will, Linux Sales Engineer
NEWS: We have moved to a larger iceberg :-)
NEWS: 300 California St., San Francisco, CA.
Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899
www.penguincomputing.com
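
If the errors do turn out to be a hardware matter, it may help to know which nodes report which messages before opening a service call. The following is only a minimal sketch, assuming the collected syslog lines follow the layout of the quoted excerpt (month, day, time, host, `kernel:`, message). The function names `tally_kernel_errors` and `errors_per_host` and the regular expression are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Assumed line layout, taken from the quoted log excerpt:
#   "<Mon> <day> <HH:MM:SS> <host> kernel: <message>"
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+kernel:\s+(.*)$")


def tally_kernel_errors(lines):
    """Count occurrences of each (host, kernel message) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            host, message = match.groups()
            counts[(host, message.strip())] += 1
    return counts


def errors_per_host(counts):
    """Collapse the (host, message) tally into a total per host."""
    totals = Counter()
    for (host, _message), n in counts.items():
        totals[host] += n
    return totals


# The six lines from the original report, used here as sample input.
SAMPLE = """\
Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel: extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel: corrected ecc error
""".splitlines()

if __name__ == "__main__":
    counts = tally_kernel_errors(SAMPLE)
    for (host, message), n in sorted(counts.items()):
        print(f"{host:16s} {n:3d}  {message}")
```

Run over a longer log, a tally like this shows whether the GART and ECC messages are concentrated on a few nodes or spread across the whole cluster.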
