Riser card -mainboard conflicts?
Donald Becker becker at scyld.com
Wed Jan 8 08:19:45 PST 2003
On Wed, 8 Jan 2003 tegner at nada.kth.se wrote:

> We have a cluster consisting of 30 athlon 2000+ nodes on a KT3 Ultra
> MS-6380E mainboard (using ide discs) connected by a fast Ethernet
> network.
>
> For the nodes we use 2U chassis, and the NIC and the graphic card sit on a
> PCI-301 riser card.
..
> On one of the nodes we can never get the network to function, there
> are messages about bus-master dirty, PCI bus error, etc, and we never
> get any contact with the rest of the cluster.

PCI bus errors are a pretty clear indication that the riser cards are a
problem.

> The other nodes "seem" to work OK, but for some parallel applications
> one or more of the nodes just "give up" after some time, and in those
> cases we get similar messages as above - but it has also happened
> that a node just died in which case we have to use the reset button to
> get it back.
...
> We start to suspect that mainboard and the riser card are in some way
> incompatible, but we would greatly appreciate any hints of other
> reasons for these problems.

OK, here is an alternative: you have _both_ memory errors and PCI errors.
Track down the PCI errors first.

Not all drivers report PCI bus errors. Especially with vendor-written
drivers, there is a reason to ignore or silently recover from errors --
the driver and hardware _appear_ more robust when there are no messages.

The scary thing is that you might have silent data corruption from other
devices. Any driver that goes to the extra effort of reporting a bus
error is doing you a big favor by pointing out the problem!

-- 
Donald Becker                           becker at scyld.com
Scyld Computing Corporation             http://www.scyld.com
410 Severn Ave. Suite 210               Scyld Beowulf cluster system
Annapolis MD 21403                      410-990-9993
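
Since not every driver reports bus errors, one way to look for them on every device is to read the error bits of the PCI Status register directly. The following is only a rough sketch and not something from the message above: it assumes a Linux system that exposes each device's configuration space as /sys/bus/pci/devices/<address>/config, and the names decode_pci_status and scan_devices are made up for illustration. The bit positions are those of the standard 16-bit Status register at configuration offset 0x06.

```python
import glob
import os
import struct

# Offset of the 16-bit Status register in the standard PCI config header.
PCI_STATUS_OFFSET = 0x06

# Error-related bits of the Status register (standard PCI header layout).
PCI_STATUS_ERROR_BITS = {
    0x0100: "master data parity error",
    0x0800: "signaled target abort",
    0x1000: "received target abort",
    0x2000: "received master abort",
    0x4000: "signaled system error",
    0x8000: "detected parity error",
}


def decode_pci_status(config):
    """Return the names of the error bits set in a raw config-space header.

    Hypothetical helper: `config` is the first bytes of a device's
    configuration space (at least 8 bytes, little-endian as on the bus).
    """
    if len(config) < PCI_STATUS_OFFSET + 2:
        raise ValueError("config space header too short")
    (status,) = struct.unpack_from("<H", config, PCI_STATUS_OFFSET)
    return [name
            for bit, name in sorted(PCI_STATUS_ERROR_BITS.items())
            if status & bit]


def scan_devices(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> list of error flags, for devices with any set.

    Assumes the Linux sysfs layout; devices whose config file cannot be
    read are skipped rather than reported.
    """
    report = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "*", "config"))):
        try:
            with open(path, "rb") as f:
                header = f.read(64)
            errors = decode_pci_status(header)
        except (OSError, ValueError):
            continue
        if errors:
            report[os.path.basename(os.path.dirname(path))] = errors
    return report
```

Calling scan_devices() on each node and comparing the results between the failing node and the others would show which devices have logged errors. The bits stay set until something clears them, so a flag may reflect an old event, and "received master abort" on a bridge is often left over from bus probing at boot rather than a sign of a fault.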
More information about the Beowulf mailing list
