[Beowulf] Tyan S2882
Mark Hahn hahn at physics.mcmaster.ca
Tue Sep 26 08:14:45 PDT 2006
> We are currently deploying Tyan S2882 Dual Opteron Boards, and we have
> found the system to be quite unstable.

these are older, well-known, widely installed and certainly _can_ run stably.

> After BIOS updates and kernel
> changes we still get random kernel panics when under load.

have you run memtest86? are you monitoring temperatures? (and perhaps voltages)

> So far we have solved the
> - broken BIOS problem with an update to the most recent BIOS.

due to a newer cpu? the cluster I have with S2882's (mixed with S2881's, I think) hasn't needed any updates, but it's not using dual-core or anything exotic.

> - Discovered that some power supplies can produce problems
> http://www.anandtech.com/mb/showdoc.aspx?i=2608

I have a hard time believing this is specific to antec+tyan. yes, certainly, PS's are a sensitive point, especially if you've got heavily-configured systems.

> - FS corruption due to a firmware problem in a RAID hardware board

therefore not related to the MB, right?

> - MCE chipkill errors (non-fatal) due to apparent bad RAM

also not related to the MB, right? also, you really should expect some small rate of corrected ECC's on any system; it's only a high rate that's a problem (or uncorrectable ones, of course...)

> To be solved:
> - random kernel panics that take out the logging even when all debug
> flags are set in the kernel, as it fails to sync the disc during the
> kernel panic.

but kernel panics never sync - after all, a panic is specifically an event from which you can't continue in any way. or am I misunderstanding what you're saying?

it sounds like you've done a lot of debugging already, but I'd recommend going back to basics. remove all the io devices, disks, etc and see whether the board+cpu+memory can run stably, etc.
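The point above about corrected ECC errors (a small rate is normal; a high rate, or any uncorrectable error, is the real problem) comes down to measuring a rate rather than looking at a single count. The following is a minimal sketch, assuming a Linux kernel that exposes the EDAC memory-controller counters `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mc*/`. On kernels of this thread's era, Opteron memory errors may instead show up only through machine-check logging (e.g. mcelog), so treat the sysfs path as an assumption. The function names and the `max_ce_per_day` threshold are hypothetical, for illustration only.

```python
import glob
import os

# Default location of the Linux EDAC memory-controller counters. Assumes an
# EDAC driver for this chipset is loaded; otherwise the glob finds nothing.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_ROOT):
    """Return {"mc0": (corrected, uncorrected), ...} from EDAC sysfs."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters can't be read
        counts[os.path.basename(mc)] = (ce, ue)
    return counts


def assess(before, after, hours, max_ce_per_day=10.0):
    """Classify each controller from two snapshots taken `hours` apart.

    max_ce_per_day is a hypothetical threshold for illustration only;
    derive a real one from the baseline of known-good nodes.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    verdicts = {}
    for name, (ce1, ue1) in after.items():
        # A controller missing from the first snapshot counts as no change.
        ce0, ue0 = before.get(name, (ce1, ue1))
        ce_per_day = (ce1 - ce0) * 24.0 / hours
        if ue1 > ue0:
            verdicts[name] = "uncorrectable errors seen"
        elif ce_per_day > max_ce_per_day:
            verdicts[name] = "high corrected rate: %.1f/day" % ce_per_day
        else:
            verdicts[name] = "ok: %.1f corrected/day" % ce_per_day
    return verdicts
```

Because the counters are cumulative, the sketch compares two snapshots: call `read_edac_counts()` before a load run, again after a few hours under load, and pass both to `assess(before, after, hours=...)`. Comparing nodes against each other this way makes it easier to tell the "small rate on any system" case from a DIMM that really is failing.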
