Problems with dual Athlons
Robert G. Brown
rgb at phy.duke.edu
Wed Jul 31 11:35:41 PDT 2002
On Wed, 31 Jul 2002, Steven Timm wrote:
> Has anyone managed to successfully configure a Tyan 2466 board
> so that it can have a boot partition that's bigger than 1024 cylinders
> on its system drive? Drive in question is WD200-BB
Are you using grub? I thought that was no longer an issue with grub.
> Steve Timm
> Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
> On Wed, 31 Jul 2002, Robert G. Brown wrote:
> > On Wed, 31 Jul 2002, Ray Schwamberger wrote:
> > > You might try the noapic option. I'm thinking there may be some kind of
> > > issues with APIC, AMD and 2.4.18.
> > We don't have ASUS systems but instead a mix of Tyan 2460 and 2466
> > systems and see very similar things, including the bizarreness of the
> > blind crash problems appearing on one system (consistently and
> > repeatedly) but not another IDENTICAL system sitting right next to it.
> > We have found that power supplies (both the power line itself and the
> > switching power supply in the chassis) can make a difference on the
> > 2466's -- a marginal power supply is an invitation to problems for sure
> > on these beasties. This is reflected in the completely outrageous
> > observation that I have some nodes that will boot and run stably when
> > plugged into certain receptacles on the power pole, but not other
> > receptacles. If I put a polarity/circuit tester on the receptacles,
> > they pass. If I check the line voltages, they are nominal (120+ VAC).
> > If I plug any 2466 into them (I tried 3), it fails to POST. If I move
> > the plug two receptacles up on the same pole and same circuit, it POSTS,
> > installs, and works fine. I haven't put an oscilloscope on the line
> > when plugging it in, but I'm sure it would be fascinating to do so.
> > We're also in the process of investigating kernel snapshot dependencies
> > and the SMP issues aforementioned as we continue to try to stabilize our
> > 2460's, which seem even more sensitive than the 2466's (which so far
> > seem to run stably and give decent performance overall).
> > Unfortunately, our crashes occur with a mean time of days to a week or
> > two under load in between (consistent with a rare interrupt conflict or
> > SMP issue) so it takes a long time to test a potential fix. We did
> > avoid a crash for about 9 days on a 2460 running 2.4.18-5 (Red Hat's
> > build id) after experiencing crashes on the node every 5-10 days, but
> > are only just now accumulating better statistics on a group of nodes
> > instead of just the one.
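To put a rough number on why one 9-day run says so little, here is a minimal sketch in Python. It assumes (an assumption for illustration, not something measured above) that crashes arrive as a Poisson process with a constant mean time between crashes, so the chance of a node surviving t days is exp(-t/mean). The function names are hypothetical.

```python
import math


def survival_probability(days, mean_days_between_crashes):
    """Chance that one node runs `days` with no crash, ASSUMING
    exponentially distributed times between crashes."""
    return math.exp(-days / mean_days_between_crashes)


def node_days_needed(mean_days_between_crashes, significance=0.05):
    """Crash-free node-days required before an UNFIXED node would have
    survived that long with probability below `significance`."""
    return -mean_days_between_crashes * math.log(significance)


if __name__ == "__main__":
    # The 5-10 day range is taken from the crash interval quoted above.
    for mean in (5.0, 7.5, 10.0):
        p = survival_probability(9, mean)
        need = node_days_needed(mean)
        print(f"mean {mean:4.1f} d: P(9 crash-free days, no fix) = {p:.2f}, "
              f"need about {need:.0f} crash-free node-days")
```

Under that assumption, a node with no fix at all would still get through 9 days somewhere between roughly 17% and 41% of the time, so the single run is weak evidence. If the nodes crash independently, crash-free node-days add up across nodes, which is why watching a group of them gets an answer sooner than watching one.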
> > So overall, I concur -- try different smp kernel releases and snapshots,
> > try rearranging the cards (order often seems to matter) and bios
> > settings, try noapic (which we should probably also do -- we haven't
> > so far) and yes, try rearranging the way the nodes are plugged in.
> > Notice that this is evil and insidious -- you can pull a node from a
> > rack and bench it and it will run fine forever, but if you plug it back
> > in to the same receptacle when you put it back, it has problems.
> > Maddening.
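When testing noapic across many nodes it is easy to lose track of which ones were actually booted with it. A minimal sketch in Python of a check, assuming a Linux node where the kernel command line is readable from /proc/cmdline; has_boot_param and read_cmdline are hypothetical helper names, not part of any existing tool.

```python
def has_boot_param(cmdline, name):
    """True if `name` appears on the kernel command line, either as a
    bare flag (noapic) or in name=value form (root=/dev/hda1)."""
    return any(tok == name or tok.startswith(name + "=")
               for tok in cmdline.split())


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    cmdline = read_cmdline()
    state = "set" if has_boot_param(cmdline, "noapic") else "NOT set"
    print(f"noapic is {state}: {cmdline}")
```

Run on each node after a reboot, this confirms that the option really reached the kernel before any uptime is credited to it.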
> > rgb
> > Robert G. Brown http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu