problems with 3com and intel 100MB cards
Robert G. Brown rgb at phy.duke.edu
Wed Oct 9 10:24:15 PDT 2002
- Previous message: problems with 3com and intel 100MB cards
- Next message: problems with 3com and intel 100MB cards
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 9 Oct 2002, Marcin Kaczmarski wrote:

> It is a proven fact that happened at some University in Germany that the
> newly bought super linux alpha dual cluster with 3com NIC ( I do not
> know the model of these cards in this case) simply failed to operate
> while trying to run very demanding scientifical calculations in material
> science just because of cards. After replacing them with 3 year old dec
> tulip cards everything gone fantastic. I am highly convinced that a

When was this? Cards and drivers are constantly in (r)evolution. Four or five years ago I think that this experience was common -- real digital tulip cards were one of the best NICs there were and amazingly cheap besides, and I personally had endless trouble with 3coms, even on Intel.

However, Digital became Compaq, the tulip was cloned (two or three times) and sold to Intel besides, every vendor known to man started adding their own proprietary crap on top of the basic tulip (or clone, and the clones add their own intermediate layer) AND 3com cleaned up its design and Don's drivers started to work quite well indeed with the cards.

Finally, there is the alpha issue -- don't assume that just because hardware works on Intel with the Intel (or AMD) kernels that it or its drivers will work on alphas or anything else. I imagine that companies like e.g. Scyld spend a LOT of time making sure that their kernels and drivers do indeed work across hardware architectures for the simple reason that a lot of the time they don't, initially.

These days, I see 3coms consistently outperform tulip clones (and don't even want to talk about RTLs), and agree that 3com or eepro (with PXE) are the NICs of choice for clusters and workstations alike, for at least Intel and AMD based systems at 100BT. Gigabit cards add yet another layer of driver and hardware compatibility questions -- you really have to start looking at the gigabit chip being used to build the NIC and who actually makes it.

> server NIC which runs excellently in servers may be really absolutely
> not suitable for cluster that runs calculations, because you cannot
> compare the network load that you have on servers with the network load
> that appears while running in cluster, in case of cluster it is very
> very bigger. I`m sure of that. We had another reports in cpmd mailing
> lists in September about linux 10 dual alpha cluster with 3com cards
> that hangs calculations. I do not believe that they have low price 3com
> cards in such a cluster.

This is the sort of conclusion that is very dangerous, as it is based on a fairly small sample (N of one? two?) and hence is pretty much anecdotal and not necessarily reflective of everybody's general experience. It may well be that 3com cards have problems in alpha clusters. It might also be that SOME 3com cards have had problems in SOME alpha clusters using SOME kernels -- in the past -- and are now fueling anecdotal reports of failure that might or might not be in the process of being fixed or have already been fixed in current kernels.

There is, after all, a kernel mailing list and device specific mailing lists for all the major NIC drivers (I'm still on the driver lists for some of the primary cards like eepro, 3com and tulip) and if someone DOES have trouble with a given card on a given architecture, they should by all means communicate with these lists and hence with the primary kernel/driver maintainers. Sometimes that is still Don Becker (revered by all for his work over years on network drivers, beowulfery and more), sometimes not. You might find that the "fix" is just matter of changing a line in e.g. /etc/modules.conf to ensure that the right driver is being loaded instead of the wrong one, or upgrading the kernel to a more current one because of a bug in the particular kernel snapshot you are using.

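As a rough illustration of the modules.conf point: the line in question is usually an `alias <interface> <module>` entry. The sketch below reads such text and reports which driver module is aliased to an interface. The helper name `aliased_driver` and the sample file contents are hypothetical, and the choice of `3c59x` assumes a 3c905-family card; check the driver documentation for the card actually installed.

```python
# Minimal sketch: find which driver module a modules.conf-style file
# aliases to a network interface. The helper name and the sample text
# below are illustrative assumptions, not taken from any real node.

def aliased_driver(conf_text, interface="eth0"):
    """Return the module aliased to `interface`, or None if there is no alias line."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        fields = line.split()
        # An alias entry has the form: alias <name> <module>
        if len(fields) == 3 and fields[0] == "alias" and fields[1] == interface:
            return fields[2]
    return None


# Hypothetical contents of /etc/modules.conf on a node with two NICs.
SAMPLE = """\
# network drivers
alias eth0 3c59x      # assumed: 3c905-family card
alias eth1 eepro100
"""

if __name__ == "__main__":
    for nic in ("eth0", "eth1", "eth2"):
        print(nic, aliased_driver(SAMPLE, nic))
```

On a real node one would pass in the contents of /etc/modules.conf and compare the result against the card that is physically present, for example as reported by lspci.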
I personally don't think that it is likely to be because of any fundamental flaw in 3com design, as they work pretty well on tens to hundreds of machines here (stable under all loads, some of the best bandwidth/latency numbers when netperf or netpipe or lmbenched). On Intel/AMD, of course, and a variety of kernels from 2.2 on (not so much under 2.0 kernels).

   rgb

>
> kind regards
> Marcin Kaczmarski
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
