Planned Cluster.
Bogdan Costescu Bogdan.Costescu at IWR.Uni-Heidelberg.De
Thu Oct 26 13:48:19 PDT 2000
On Wed, 25 Oct 2000, J. G. LaBounty wrote:

> > The head node works fine, but people have mentioned problems with the 3com
> > network card. testing has shown no problems but the vendor has informed us
> > they have had problems with the 3com cards, "some batches don't seem to
> > work", they have offered intel EtherExpress PRO 10/100+ TX - PCI cards for
> > the same cost.
>
> We were using the 3com 905b cards on 2 16 node clusters. Our application
> keeps the network pegged most of the time. We were getting network
> hangs about once every two weeks running RH6.1. We moved to RH6.2
> and switched to the 3c90x driver and problem happened about once per
> day. We have since changed out the 3com cards for the EtherExpress PRO
> 10/100 and have not seen the problem but we only have about 3 weeks of
> runtime on this configuration.

Sorry guys, but I don't quite get it! The network is maybe the most
important part of a cluster setup. And what do you do about it ? "I heard
that this card doesn't work right" or "It seems that this card works
better". While there is nothing wrong in asking about card/driver
combinations on this list, do you ALSO take a look at archives of mailing
list devoted to development of these drivers ? And if you have a problem,
do you report it on such a list ? Or you just say: "OK, this card/driver
combination is just crap, let's change it." ? What if you still have
problems after the change - will you make another change ? I encountered
the same way of thinking on the NFS list...

For reference:
http://www.scyld.com/network/index.html
has links for drivers (and more) while mailing list archives start at:
http://www.scyld.com/mailman/listinfo

Going back to the 3Com problem: the driver that was present in kernels up
to around 2.2.15 was an old driver, based on Don's 0.99H and modified by
different people. It had a race which was only possible to happen in a
very narrow window; but 2-3 weeks of uptime under load give this window
the opportunity to happen (I know because I had exactly the same problem).
Now I have 3C905 B and C cards in UP and SMP nodes which have uptimes of
more than 2 months (we do upgrade kernels from time to time).

RH 6.1 had the "bad" driver; the original kernel from RH 6.2 also had it,
but the updated 2.2.16-3 has the new one; and this is the 3c59x driver,
not the 3c90x driver (which is written by 3Com). If you trust Don's
drivers more, his 3c59x is available from:
http://www.scyld.com/network/vortex.html
and it includes (AFAIK) a fix for this problem and much more.

Best regards,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
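
Since the message turns on which driver is actually in use (3c59x versus
the 3Com-written 3c90x), a node can be checked before any cards are
swapped. The following is only a minimal sketch, assuming the driver is
built as a loadable module and so appears by name in /proc/modules; a
driver compiled into the kernel will not be listed there. The function
name and the sample text are made up for illustration.

```python
# Minimal sketch: report which of the two 3Com-related driver modules
# named in the message (3c59x, 3c90x) appear in /proc/modules-style text.
# Assumption: the driver is a loadable module; built-in drivers are not listed.

KNOWN_DRIVERS = ("3c59x", "3c90x")


def loaded_3com_drivers(proc_modules_text):
    """Return a sorted list of known 3Com driver module names found in the text."""
    found = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The module name is the first whitespace-separated field on each line.
        if fields and fields[0] in KNOWN_DRIVERS:
            found.add(fields[0])
    return sorted(found)


# Hypothetical sample, not taken from a real node.
SAMPLE = "3c59x 22912 1 (autoclean)\nnfs 29944 1 (autoclean)\n"

if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        # Fall back to the made-up sample on systems without /proc/modules.
        text = SAMPLE
    print(loaded_3com_drivers(text))
```

Run on each node, this only shows the module name; the driver version
still has to be read from the boot messages or from the driver source.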
