Problems with dual Athlons

Velocet math at velocet.ca
Wed Jul 31 12:25:22 PDT 2002


On Wed, Jul 31, 2002 at 02:34:54PM -0400, Robert G. Brown's all...
> The memory is all AMD/Tyan approved, straight off their list, and we've
> swapped around DIMMs from several approved brands.  Of course this
> proves nothing -- too small an N, for all that.  About the only thing
> one can conclude is that there is SOMETHING marginal about the
> engineering of this system -- I've never run a system more temperamental
> about environment and configuration, in nearly 20 years of PC and
> workstation operations.  Identical configurations -- case, motherboard,
> cpu, power supply, memory, network, BIOS version and setup.  One works,
> the other just doesn't.  One works when plugged in HERE, but not when
> plugged in THERE.  Almost certainly hardware, but something very subtle
> -- a timing issue, some sort of noise bleeding through the power supply.

We've had no problems on the systems we have running the Tbirds on the
2460s and 2466s we set up as an experiment around here. Then again, we
use all 400W Enermax power supplies here, and some really nice clean
power coming out of our $150K UPS :)

However, in playing with a couple of MP2000+s and two different 2466s with
the 4.01 BIOS, we've had no end of trouble.

Is there some problem with MP2000+s on these boards? 2466N-4M is the
exact model, with BIOS v4.01.

> Whatever it is, we've done a lot of hardware exercise testing and gotten
> nothing, for the most part.  It isn't even clear that the crashes are
> load dependent.  Sometimes an idle system just decides to die.

We had these crashes happen only when running CPUBURN, MEMTEST or compiling
things like Perl.

> Oh, it always helps.  Actually, I'm doing decently with my 2466 cluster
> (much better system than the 2460 in terms of stability) although it
> still has a bit more "character" than I'd like (he grumbles as he goes
> in to reboot a few cranky nodes that are having PXE issues).  The bad
> 2460's we've reinstalled, rebuilt, and now are RMA'ing the motherboards
> to get 2466's in exchange as they crash the sixth or seventh or eleventh
> time.  It may be something as simple as quality control or design issues
> on the motherboard -- a few traces with inductions that can resonate
> enough to create a spurious signal in the logic depending on things like
> precisely where nearby metal sits and just how much HF noise there is on
> the power, a couple of traces that run too far and too close so a signal
> on one can be picked up on the other ditto.  A different layout and the
> problems go away.

Again, wow, we've had just no problems at all. Perhaps you got
ahold of a bad batch.

We had this happen with ECS K7S-5A boards. One bad batch made it out
and was floating around Ontario/Toronto. The boards we used here,
as well as those of friends of mine and my brothers and other colleagues,
all had a 50% DOA rate right out of the box. We found one store
that had an older shipment (perhaps a slightly older revision) that did
not have this problem, so we bought our next batches there. Otherwise
a great board.

As always, I never put full trust in a system till it's been stable
for a month anyway. If you only have a month to test, though, it can be
a major problem if you need to go back to the manufacturer for RMAs.
Tyan usually takes 3-4 weeks, for example.

/kc

> 
>    rgb
> 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 

-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 


