problems with 3com and intel 100MB cards
mkaczm at us.edu.pl
Thu Oct 10 02:47:08 PDT 2002
Maybe all these problems with 3Com and Intel cards happen only when
running the CPMD package, an ab initio code developed at IBM.
But those people do not have to worry about network connections; they have
IBMs, Crays, etc.
I ran several tests, for instance with NetPIPE, and nothing bad happened
during them. Indeed, the results were fine: 160-180 Mbit/s
for dual channel and 51 us latency (kernel 2.4.x). I think I have tried
almost everything, and I have no other idea how to fight this.
Perhaps it's time to make Myricom or Dolphinics happy.
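
For anyone who wants to repeat this kind of measurement without NetPIPE at hand, here is a minimal ping-pong sketch in Python. It is not NetPIPE and will not reproduce its numbers. It only illustrates the idea of timing round trips for a fixed message size and deriving latency and bandwidth from them. The names `ping_pong`, `_echo_server` and `_recv_exact` are made up for this example, and for simplicity both ends run in one process over loopback. For a real test the echo side would have to run on the other node.

```python
import socket
import threading
import time


def _recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def _echo_server(listener, msg_size, repeats):
    """Accept one connection and echo back `repeats` messages of msg_size bytes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(repeats):
            conn.sendall(_recv_exact(conn, msg_size))


def ping_pong(msg_size, repeats=200, host="127.0.0.1"):
    """Return (one_way_latency_seconds, throughput_mbit_per_s) for one message size.

    Hypothetical helper: the echo side runs in a thread of this process,
    so the result describes the loopback path, not a real NIC.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))          # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(
        target=_echo_server, args=(listener, msg_size, repeats)
    )
    server.start()

    payload = b"x" * msg_size
    with socket.create_connection((host, port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(repeats):
            client.sendall(payload)
            _recv_exact(client, msg_size)
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()

    one_way = elapsed / (2 * repeats)             # half of the mean round trip
    mbit_per_s = (msg_size * 8) / one_way / 1e6   # payload bits per one-way time
    return one_way, mbit_per_s


if __name__ == "__main__":
    # Small messages show latency, large ones show bandwidth.
    for size in (64, 1024, 65536, 1048576):
        latency, rate = ping_pong(size, repeats=50)
        print(f"{size:8d} bytes  {latency * 1e6:10.1f} us  {rate:10.1f} Mbit/s")
```

A short synthetic run like this can pass cleanly, as my NetPIPE runs did, while a long production job still hangs, so it is a sanity check and not a proof that the cards are fine.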
> On 9 Oct 2002, Marcin Kaczmarski wrote:
> > It is a proven fact that, at a university in Germany, a newly bought
> > dual-Alpha Linux cluster with 3Com NICs (I do not know the model of
> > the cards in this case) simply failed to operate when running very
> > demanding scientific calculations in materials science, just because
> > of the cards. After they were replaced with 3-year-old DEC Tulip cards,
> > everything went fantastically. I am highly convinced that a
> When was this? Cards and drivers are constantly in (r)evolution. Four
> or five years ago I think that this experience was common -- real
> digital tulip cards were one of the best NICs there were and amazingly
> cheap besides, and I personally had endless trouble with 3coms, even on
> Intel. However, Digital became Compaq, the tulip was cloned (two or
> three times) and sold to Intel besides, every vendor known to man
> started adding their own proprietary crap on top of the basic tulip (or
> clone, and the clones add their own intermediate layer) AND 3com cleaned
> up its design and Don's drivers started to work quite well indeed with
> the cards.
> Finally, there is the alpha issue -- don't assume that just because
> hardware works on Intel with the Intel (or AMD) kernels that it or its
> drivers will work on alphas or anything else. I imagine that companies
> like e.g. Scyld spend a LOT of time making sure that their kernels and
> drivers do indeed work across hardware architectures for the simple
> reason that a lot of the time they don't, initially.
> These days, I see 3coms consistently outperform tulip clones (and don't
> even want to talk about RTLs), and agree that 3com or eepro (with PXE)
> are the NICs of choice for clusters and workstations alike, for at least
> Intel and AMD based systems at 100BT. Gigabit cards add yet another
> layer of driver and hardware compatibility questions -- you really have
> to start looking at the gigabit chip being used to build the NIC and who
> actually makes it.
> > server NIC which runs excellently in servers may be entirely
> > unsuitable for a cluster that runs calculations, because you cannot
> > compare the network load on servers with the network load that
> > appears when running on a cluster; on a cluster it is very much
> > higher. I'm sure of that. We had other reports on the CPMD mailing
> > list in September about a Linux cluster of 10 dual Alphas with 3Com
> > cards that hangs during calculations. I do not believe that they have
> > low-priced 3Com cards in such a cluster.
> This is the sort of conclusion that is very dangerous, as it is based on
> a fairly small sample (N of one? two?) and hence is pretty much
> anecdotal and not necessarily reflective of everybody's general
> experience. It may well be that 3com cards have problems in alpha
> clusters. It might also be that SOME 3com cards have had problems in
> SOME alpha clusters using SOME kernels -- in the past -- and are now
> fueling anecdotal reports of failure that might or might not be in the
> process of being fixed or have already been fixed in current kernels.
> There is, after all, a kernel mailing list and device specific mailing
> lists for all the major NIC drivers (I'm still on the driver lists for
> some of the primary cards like eepro, 3com and tulip) and if someone
> DOES have trouble with a given card on a given architecture, they should
> by all means communicate with these lists and hence with the primary
> kernel/driver maintainers. Sometimes that is still Don Becker (revered
> by all for his work over years on network drivers, beowulfery and more),
> sometimes not.
> You might find that the "fix" is just a matter of changing a line in e.g.
> /etc/modules.conf to ensure that the right driver is being loaded
> instead of the wrong one, or upgrading the kernel to a more current one
> because of a bug in the particular kernel snapshot you are using. I
> personally don't think that it is likely to be because of any
> fundamental flaw in 3com design, as they work pretty well on tens to
> hundreds of machines here (stable under all loads, some of the best
> bandwidth/latency numbers when netperf or netpipe or lmbenched). On
> Intel/AMD, of course, and a variety of kernels from 2.2 on (not so much
> under 2.0 kernels).
> > kind regards
> > Marcin Kaczmarski
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
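
The /etc/modules.conf check suggested in the quoted reply can be sketched as a small Python helper that lists which driver module each Ethernet interface is aliased to. This assumes the usual `alias ethN <module>` line form. The function name `nic_driver_aliases` and the sample contents below are made up for illustration. The module names 3c59x, eepro100 and tulip are only examples of common drivers, not a statement about what any particular cluster should load.

```python
def nic_driver_aliases(conf_text):
    """Map interface name -> driver module from `alias ethN <module>` lines.

    Hypothetical helper: only handles the simple three-field alias form
    and ignores everything else in the file.
    """
    aliases = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith("eth"):
            aliases[fields[1]] = fields[2]
    return aliases


if __name__ == "__main__":
    # Illustrative contents only; read the real /etc/modules.conf on a node instead.
    sample = """
    # network
    alias eth0 3c59x
    alias eth1 eepro100   # second channel
    alias scsi_hostadapter aic7xxx
    """
    for iface, module in sorted(nic_driver_aliases(sample).items()):
        print(iface, module)
```

If the module listed for an interface is not the one intended for the card actually installed, that is the line to change before suspecting the hardware.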