Problems with 3Com and Intel 100 Mbit cards

Marcin Kaczmarski mkaczm at us.edu.pl
Wed Oct 9 08:37:31 PDT 2002


	Hello,

I am running materials-science calculations with parallel code built on
the LAM library, on a six-node Athlon cluster connected with Fast
Ethernet cards. I use 3Com 3c905C-TX cards with dual-channel bonding,
under Red Hat Linux with a 2.4.x kernel and the 3c59x driver. Everything
works quite well, but unfortunately I have observed that my program
(CPMD) simply dies without any information in the logs (only a
confirmation from LAM that the process died, and that's all). I also
tried Intel EEPro100 cards, without channel bonding, and the results
are the same.

I discussed this problem with a man from Germany who administers a
60-PC cluster and also runs the same program, CPMD. He told me that the
3Com and Intel cards are very unreliable and cannot bear extremely high
network load. So are these cards only fit for throwing in the basket?
Do you know how to solve such a problem?

I know about Dolphin SCI cards, but they cost $1500 each. I also heard
from the same man that the only reliable 100 Mbit cards were those with
the DEC Tulip chip. However, it seems that they are off the market,
especially since Intel bought the Tulip chipset and did something wrong
with it. I have heard that many clusters run with SMC EtherPower cards
(with the Tulip chipset), but I checked and SMC currently has no Tulip
card on the market.

I was also thinking about gigabit, but first, my channel bonding
already gives me almost gigabit performance; second, I use motherboards
with the VIA KT266A chipset, which I have heard is a very bad chipset,
and I do not know whether it can really handle a gigabit card properly;
third, Dolphin SCI cards have very attractive prices compared with
reliable gigabit switches, but are still very expensive.
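
To isolate the problem from CPMD itself, I am thinking of running a
plain MPI ping-pong test that just hammers the link between two nodes
for a long time; if the cards really cannot bear high load, it should
die or stall the same way CPMD does. A minimal sketch of what I have
in mind (the message size and iteration count are arbitrary choices of
mine, and I assume LAM's mpicc and mpirun):

/* pingpong.c - bounce a large buffer between ranks 0 and 1 to put
 * sustained load on the network; a stress test, not a benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define MSG_BYTES (1 << 20)   /* 1 MB per message */
#define ITERATIONS 10000      /* long enough to stress the link */

int main(int argc, char **argv)
{
    int rank, size, i;
    char *buf;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    buf = malloc(MSG_BYTES);
    memset(buf, rank, MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    for (i = 0; i < ITERATIONS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("%d round trips of %d bytes in %.1f s\n",
               ITERATIONS, MSG_BYTES, t1 - t0);

    free(buf);
    MPI_Finalize();
    return 0;
}

I would compile it with mpicc pingpong.c -o pingpong and, after
lamboot, run it across two nodes with mpirun -np 2.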

I have also heard about Linux Alpha clusters that fail to operate when
running with 3Com cards.

Please give me some hint on how to rebuild the cluster in order to make
it usable. Although I am using EPoX motherboards with the VIA KT266A
chipset, I have confirmation that the problem is primarily related to
these network cards.
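
One way I plan to confirm this is to watch the per-interface error and
drop counters in /proc/net/dev on each node while the job runs; if the
counters climb just before CPMD dies, that points at the cards or the
driver rather than the motherboard. A rough sketch in C (assuming the
2.4-kernel /proc/net/dev layout, where the receive fields come right
after the interface name):

/* neterr.c - print the rx errs/drop counters from /proc/net/dev
 * every few seconds; stop it with Ctrl-C. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[512];

    for (;;) {
        FILE *f = fopen("/proc/net/dev", "r");
        if (!f) {
            perror("/proc/net/dev");
            return 1;
        }
        /* skip the two header lines */
        fgets(line, sizeof line, f);
        fgets(line, sizeof line, f);

        while (fgets(line, sizeof line, f)) {
            char name[32];
            unsigned long rbytes, rpkts, rerrs, rdrop;
            char *colon = strchr(line, ':');
            if (!colon)
                continue;
            *colon = ' ';   /* "eth0:123" may have no space */
            if (sscanf(line, "%31s %lu %lu %lu %lu",
                       name, &rbytes, &rpkts, &rerrs, &rdrop) == 5)
                printf("%-6s rx_errs=%lu rx_drop=%lu\n",
                       name, rerrs, rdrop);
        }
        fclose(f);
        printf("----\n");
        sleep(5);
    }
}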


Kind regards
Marcin Kaczmarski



