[Beowulf] Solved: SATA(?) errors locks up node
Gebhardt Thomas gebhardt at hrz.uni-marburg.de
Mon Jul 2 07:05:34 PDT 2007
Hello,

thank you all for your advice! After a firmware upgrade (-> 20.06C06) of the
SATA disks we have had no further incidents so far. So I'm pretty sure that we
have caught the bug.

Thanks again,
Th. Gebhardt

On Wednesday 23 May 2007 11:13, Gebhardt Thomas wrote:
> we are running a cluster of 57 dual opteron nodes. Once or twice a week
> one of these nodes gets in an error state and can't connect to the
> I/O-subsystem anymore. I need to reboot that node. As far as I can see,
> the problem occurs randomly at any of our nodes, i.e., the MTBF of a single
> node is about 6-12 months.
>
> I still don't know whether this is a problem of the linux kernel sata
> driver, a hardware problem, a flaw of the disk firmware or something else.
> I'm looking for a possibility to track down the problem without
> substantially interfering with the jobs on the cluster.
>
> This is our environment:
> TYAN S3992 motherboard with Serverworks HT1000+2000 chipset.
> 2 DualCore Opteron 2216 HE 2.4GHz, 16GByte Mem
> Western Digital 250GByte SATA disk, WDC WD2500YS-01SHB0, firmware rev. 20.06C03
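
The per-node MTBF figure in the quoted message can be checked from the cluster-wide failure rate. What follows is only a rough sketch of that arithmetic, assuming failures are spread evenly over all 57 nodes and taking a month as 52/12 weeks; the function name is made up for illustration and is not from the original post.

```python
# Rough per-node MTBF estimate from a cluster-wide failure rate.
# Assumption: failures hit nodes uniformly at random, so each node
# sees 1/nodes of the cluster-wide failure rate.

WEEKS_PER_MONTH = 52.0 / 12.0  # assumed average month length in weeks


def per_node_mtbf_months(nodes: int, failures_per_week: float) -> float:
    """Return the estimated mean time between failures of one node, in months."""
    weeks_between_failures = nodes / failures_per_week
    return weeks_between_failures / WEEKS_PER_MONTH


# "Once or twice a week" on 57 nodes gives the two ends of the range.
mtbf_low = per_node_mtbf_months(57, 2.0)   # two failures per week
mtbf_high = per_node_mtbf_months(57, 1.0)  # one failure per week

print(f"per-node MTBF: {mtbf_low:.1f} to {mtbf_high:.1f} months")
```

Under these assumptions the estimate comes out at roughly 6.6 to 13.2 months, which agrees with the "about 6-12 months" stated in the message.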
