Problems with dual Athlons

Robert G. Brown rgb at phy.duke.edu
Wed Jul 31 16:09:39 PDT 2002


On Wed, 31 Jul 2002, Steven Timm wrote:

> 
> 
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525  timm at fnal.gov  http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
> 
> On Wed, 31 Jul 2002, Robert G. Brown wrote:
> 
> > On Wed, 31 Jul 2002, Steven Timm wrote:
> >
> > > Has anyone managed to successfully configure a Tyan 2466 board
> > > so that it can have a boot partition that's bigger than 1024 cylinders
> > > on its system drive?  Drive in question is WD200-BB
> >
> > Are you using grub?  I thought that was no longer an issue with grub.
> >
> >    rgb
> 
> Haven't ported to 7.3 yet, so can't use grub.. besides, "Grub and Stitch"
> doesn't quite sound the same.  But we'll keep it in mind
> for when we do.

Well goodness, use kickstart and upgrade to 7.3.  Much better kernels --
the early 2.4 kernels had lots of issues -- and grub is infinitely
better than lilo.  With a little practice, one can use a single grub
floppy to boot any kernel from any partition, for example, and, unlike
with lilo, the system doesn't become unbootable if you change the boot
configuration and reboot without rerunning the boot loader installer.
grub has many other advantages as well.

If you insist on staying with lilo for the time being, the only solution
I know of is to create a boot partition that lies entirely below
cylinder 1024.  Of course, to do this you probably have to reinstall
(well, you don't HAVE to if you are comfortable repartitioning a
functioning drive) and you may as well reinstall into 7.3.  This limit
is inherited from decades ago, BTW, and it is high time it was consigned
to the hell of ancient ideas that weren't so bright in the long run.
(Nobody will EVER be able to make a disk with more than 2^10 cylinders,
so this is plenty...;-)
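
For the record, the arithmetic behind that limit is easy to sketch.  The
Python below is illustration only: it assumes the legacy BIOS INT 13h
interface (10-bit cylinder number, 512-byte sectors), and the two
geometries shown (16 and 255 heads, 63 sectors per track) are just
common examples, not necessarily what the BIOS on a 2466 reports for
this drive.  The function names are made up for the example.

```python
SECTOR_BYTES = 512     # assumed sector size
MAX_CYLINDERS = 1024   # 2**10: the cylinder field is 10 bits wide


def chs_limit_bytes(heads, sectors_per_track):
    """Bytes reachable below cylinder 1024 for a given CHS geometry."""
    return MAX_CYLINDERS * heads * sectors_per_track * SECTOR_BYTES


def boot_partition_ok(end_cylinder):
    """True if a partition whose last cylinder (0-based) is end_cylinder
    lies entirely below the 1024-cylinder boundary."""
    return end_cylinder < MAX_CYLINDERS


if __name__ == "__main__":
    # Example geometries only; check what your own BIOS/fdisk reports.
    for heads in (16, 255):
        limit = chs_limit_bytes(heads, 63)
        print("heads=%d sectors=63: limit is %d bytes" % (heads, limit))
```

With a translated geometry of 255 heads and 63 sectors per track this
comes to roughly 8.4 GB, so a small /boot partition at the very start of
the disk stays safely under the boundary.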

You should also check out yum -- yum allows one to upgrade a running 7.1
or 7.2 system to e.g. 7.3 "in place".  Alas, it cannot manage a
lilo->grub transition for you -- the best it can do is reinstall lilo --
so you have to figure out how to install grub by hand if you don't just
do a kickstart reinstall to 7.3.  Once you are running 7.3, yum can save
you so much time that you won't mind the time spent doing the upgrade...

   rgb

> 
> Steve
> 
> 
> >
> > >
> > > Steve Timm
> > >
> > >
> > > ------------------------------------------------------------------
> > > Steven C. Timm (630) 840-8525  timm at fnal.gov  http://home.fnal.gov/~timm/
> > > Fermilab Computing Division/Operating Systems Support
> > > Scientific Computing Support Group--Computing Farms Operations
> > >
> > > On Wed, 31 Jul 2002, Robert G. Brown wrote:
> > >
> > > > On Wed, 31 Jul 2002, Ray Schwamberger wrote:
> > > >
> > > > > You might try the noapic option. I'm thinking there may be some kind of
> > > > > issues with APIC, AMD and 2.4.18.
> > > >
> > > > We don't have ASUS systems but instead a mix of Tyan 2460 and 2466
> > > > systems and see very similar things, including the bizarreness of the
> > > > > blind crash problems appearing on one system (consistently and
> > > > > repeatedly) but not another IDENTICAL system sitting right next to it.
> > > >
> > > > We have found that power supplies (both the power line itself and the
> > > > switching power supply in the chassis) can make a difference on the
> > > > 2466's -- a marginal power supply is an invitation to problems for sure
> > > > on these beasties.  This is reflected in the completely outrageous
> > > > observation that I have some nodes that will boot and run stably when
> > > > plugged into certain receptacles on the power pole, but not other
> > > > receptacles.  If I put a polarity/circuit tester on the receptacles,
> > > > they pass.  If I check the line voltages, they are nominal (120+ VAC).
> > > > If I plug any 2466 into them (I tried 3), it fails to POST.  If I move
> > > > the plug two receptacles up on the same pole and same circuit, it POSTS,
> > > > installs, and works fine.  I haven't put an oscilloscope on the line
> > > > when plugging it in, but I'm sure it would be fascinating to do so.
> > > >
> > > > > We're also in the process of investigating kernel snapshot dependencies
> > > > > and the SMP issues aforementioned as we continue to try to stabilize our
> > > > > 2460's, which seem even more sensitive than the 2466's (which so far
> > > > > seem to run stably and give decent performance overall).
> > > > Unfortunately, our crashes occur with a mean time of days to a week or
> > > > two under load in between (consistent with a rare interrupt conflict or
> > > > SMP issue) so it takes a long time to test a potential fix.  We did
> > > > avoid a crash for about 9 days on a 2460 running 2.4.18-5 (Red Hat's
> > > > build id) after experiencing crashes on the node every 5-10 days, but
> > > > are only just now accumulating better statistics on a group of nodes
> > > > instead of just the one.
> > > >
> > > > So overall, I concur -- try different smp kernel releases and snapshots,
> > > > try rearranging the cards (order often seems to matter) and bios
> > > > > settings, try noapic (which we should probably also do -- we haven't
> > > > so far) and yes, try rearranging the way the nodes are plugged in.
> > > > Notice that this is evil and insidious -- you can pull a node from a
> > > > rack and bench it and it will run fine forever, but if you plug it back
> > > > in to the same receptacle when you put it back, it has problems.
> > > > Maddening.
> > > >
> > > >    rgb
> > > >
> > > > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > > > Duke University Dept. of Physics, Box 90305
> > > > Durham, N.C. 27708-0305
> > > > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Beowulf mailing list, Beowulf at beowulf.org
> > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> > > >
> > >
> >
> >
> >
> >
> >
> 
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





