[Fwd: Tyan Tiger 2460]

Robert G. Brown rgb at phy.duke.edu
Fri Apr 26 08:55:56 PDT 2002


On Fri, 26 Apr 2002, Maurice Hilarius wrote:

> >   a) Only the video card would work in slot 1.  Period.  If we put the
> >3c905 in slot one all by itself (using the BIOS console), the system
> >would behave erratically, actually mistaking the number and speed of
> >processors during boot and crashing under heavy network loads if and
> >when it booted.
> 
> That is basically correct, with SOME video cards.
> In general the BIOS and bus setup seem to prefer the first slot be used by 
> video, but it really seems to matter what card it is more than anything 
> else. In general the ATI RageXL cards are not happy, but the RAGE Pro are, 
> and many TNT2 cards work well over all slots.

You misunderstand.  The video card works fine in all slots.  The system
locks with a 3c905C-TX-M in slot 1 even with the system stripped so that
2 processors, a stick of certified registered ECC DDR, and the 3C905C
are it (NOTHING else plugged into the mobo).  Tyan is now refusing to
own the problem, and we're on the phone with 3Com to see if we can get
some help at that end.  They, at least, are being constructive.

I was mistaken that only video works in slot 1.  A Netgear in slot 1
still appears to work, but it doesn't support PXE or WOL and we need PXE
for the nodes.

Or maybe I misunderstand, if you are pointing out that the 2460 does
have a history of problems with certain video cards in slot 1 AS WELL AS
the 3c905.

> >   b) If slot one had video or was empty, the system would work fine for
> >all other vertical configurations.  That is, video in 1, net in 6, video
> >in 2, net in 3 or vice versa, video in 5, NIC in 2, etc.  I don't know
> >that we tested every combination but we didn't find another that failed
> >in all our tests.  Slot 1 alone seems to be the ringer.
> 
> If you are using a riser the other slots are mainly irrelevant.
> In some risers they use extension boards to derive addressing from the next 
> two slots, and in others they use some logic on the riser. It is advisable 
> to use the Tyan M2039 riser as it seems to behave well with this, although, 
> depending on cards used sometimes we see the ability to only support two 
> out of three cards on the riser.

We've just verified that the system works fine with a riser that plugs
into the data lines in slot 2, not slot 1 (via a ribbon cable).  It
really is a slot 1/3C905 issue.  Interestingly, the system does work if
the FSB is set back to 200 MHz.

I've now gone through dozens of threads on the motherboard on amdmb.com,
and the 246X motherboards appear to be extremely "finicky", often
requiring immense amounts of energy to find a companion configuration
that will work.  Tyan refused to acknowledge that there were any
problems with the motherboard at all when we were on the phone with
them, which is odd given the dozens of threads reporting them in the
forum.  I consider it a "problem" when a motherboard is so bleeding edge
sensitive to timing/configuration issues that moving a well-known stable
PCI card from a major manufacturer over a slot makes the system break.
Obviously Tyan doesn't;-)

> >It is not a 64 vs 32 bit slot question or a power question per se, as
> >far as we can tell.  Slots 1-4 are all apparently identical 32 bit, five
> >volt slots, slots 5+ are 32 bit five volt slots, and both the 3c905 and
> >ATI are slotted for 3.3/32 bit slots with the extra notch near the
> >back.  There is no reason that we can see for the 3c905 to work in slot
> >2, 3, 4, 5, 6, 7 but not in slot 1.
> >
> >This is further verified by the fact that we had a 2466 to play with as
> >well, which has two 64/66 3.3 volt slots, and the cards worked perfectly
> >in them in any order.
> 
> In the case of the 2466 the only drawback with what you describe is that 
> generally to get 33MHz cards running off a riser in slot1 or 2 usually 
> requires the motherboard to be jumpered to 33MHz on the 64 bit PCI. There 
> ARE however NICs and video cards that will run on a 66MHz bus successfully, 
> but it does require some testing to find the right choices.
> 
> >   c) Our real torment comes from the riser.  Most riser cards are
> >designed so they HAVE to plug into slot 1 so that their physical
> >framework can hold the cards sideways in the remaining room over the PCI
> >bus.  Plugged into slot 2, there isn't generally room to fit a full
> >height card (or the support frame) into the remaining space to the side.
> >With the riser in slot 1, no combination of cards in the riser that
> >included the NIC would work, and even the video alone in the slot that
> >should have been a "straight through" connection appeared to have
> >problems, although a system without a NIC is useless to us so the issue
> >is moot.  Again, the most common symptom was that the system wouldn't
> >even get the CPU info correct at the bios level before any boot is even
> >initiated, and if the boot/install succeeded at all the system was
> >highly unstable under any kind of load.
> 
> Again, I think you are mostly seeing a riser card issue. We have used 
> different risers with 3COM, Intel, and DLink NICs successfully, with the 
> riser plugged into slot 1.
> These have included some 32 bit, and a few 64 bit risers. In general we 
> have the best results, supporting 64 bit, on the Tyan riser. But with 32 
> bit only cards we are successful with more generic models.

It's not a riser issue.  The system locks, as noted above, if the 3c905
is the ONLY card in the system and is plugged vertically into slot 1 (no
riser in the system at all).

The only riser-related observation is that the problem seems tied to the
slot 1 data lines rather than the power rails: the other riser slots draw
power from other PCI slots through little extension cables, and a 3c905
in any slot-1 mounted riser still causes the lockup.

> 
> >Of course the RIGHT solution would be to keep our perfectly good cards
> >and risers and get Tyan to replace the 2460's (if there isn't a bios
> >upgrade that fixes the ones we have).  Given the frustration and
> >downtime and lost productivity we have suffered, giving us 2466
> >replacements seems reasonable to me:-).
> While I am sure that this would be a possible solution, I feel that the 
> right solution is to use a different (better) riser card.
> 
> >Anyway, this explains to at least some extent why such a wide range of
> >experiences has been reported for these motherboards on the list.
> Most of the problems I see are caused by:
> 1) Obsolete BIOS versions
> 2) Poor RAM
> 3) problems with cooling
> 4) Inappropriate BIOS setup choices
> 5) Riser cards with issues
> 
> >BTW, so far the 2466 runs fine, as noted by many listvolken.
> 
> 
> 2466 is actually MUCH more difficult to deal with, especially if you want 
> to use a 64 bit/66MHz card, as the bus is very particular about what cards 
> you use. 5 volt cards are definitely going to make problems on most risers, 
> in our testing.

The good thing about the 2466 is that it has onboard 100BT in addition
to the serial console, so that one doesn't necessarily need any cards at
all to run as a simple node.  If one wants to use it as a gigabit-linked
node, then one probably wants a 64/66 card anyway.  We've only been
playing with one since yesterday, but it does seem a bit better (with
what we've tested) than the 2460, but then, our 2460's do not work at
all in the configuration we're trying to run.

> 
> Still as you mention, people have had success, but you can not just throw 
> ANY riser or NIC or (especially) video card in and have it work.

Overall, the Tyans seem a bit on the maddening side.  Marginal hardware
is Evil.  I'm sure we'll eventually get things worked out (we're trying
to microconfigure the 3c905 in ITS bios on the phone with 3Com now) but
it costs a lot in time, energy, and lost productivity.  (Well, looks
like configuring the 3c905 bios by hand didn't do it).

So far, the only solutions we've found appear to be displaced risers or
(possibly) different NICs.  Someone suggested that EEpro's work in a
slot 1 riser, and they do PXE and perform well.  Setting the FSB back
isn't an option.

I do appreciate your help and the remarks/suggestions above.  If I sound
abrupt, it is due to two nights running up till 3 websurfing on this
issue, and a meeting pending this afternoon on why our cluster nodes
still aren't in production.

   rgb

> 
> 
> 
> With our best regards,
> 
> Maurice W. Hilarius       Telephone: 01-780-456-9771
> Hard Data Ltd.               FAX:       01-780-456-9772
> 11060 - 166 Avenue        mailto:maurice at harddata.com
> Edmonton, AB, Canada      http://www.harddata.com/
>     T5X 1Y3
> 
> Ask me about the UP1500 Alpha - Full systems from $3,500!
> 
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






More information about the Beowulf mailing list