Planned Cluster.
Kim Branson bra369 at pp.molsci.csiro.au
Thu Oct 26 17:09:13 PDT 2000
- Previous message: Planned Cluster.
- Next message: Cluster Monitoring software?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have tried both cards, and yes, before even buying a single card I
checked out all the archives. Extensive testing (read: 4 weeks of run time
on 2 nodes) has not revealed any faults. It's when the supplier told us they
had had problems that I wondered if it was poor quality control in the
manufacture or some software problem which had not surfaced due to the
nature of my test application, which is not a heavy network style of
calculation. I have been using the 3c59x driver; however, I do consider that
this is the appropriate place to ask such questions. I merely wondered if
others had seen this fault or whether it is a recent thing. As a PhD student
with a limited budget to spend on building equipment, I'd like to ensure,
before spending money on 65 cards, that they work and are reliable.

kim branson

______________________________________________________________________
Mr Kim Branson
Phd Student
Diffraction and Theory
Biomolecular Research Institute
343 Royal Parade, Melbourne Victoria
Ph 61 03 9662 7300
Email kim.branson at bioresi.com.au
______________________________________________________________________

On Thu, 26 Oct 2000, Bogdan Costescu wrote:

> On Wed, 25 Oct 2000, J. G. LaBounty wrote:
>
> > > The head node works fine, but people have mentioned problems with the 3com
> > > network card. testing has shown no problems but the vendor has informed us
> > > they have had problems with the 3com cards, "some batches don't seem to
> > > work", they have offered intel EtherExpress PRO 10/100+ TX - PCI cards for
> > > the same cost.
> >
> > We were using the 3com 905b cards on 2 16 node clusters. Our application
> > keeps the network pegged most of the time. We were getting network
> > hangs about once every two weeks running RH6.1. We moved to RH6.2
> > and switched to the 3c90x driver and problem happened about once per
> > day. We have since changed out the 3com cards for the EtherExpress PRO
> > 10/100 and have not seen the problem but we only have about 3 weeks of
> > runtime on this configuration.
>
> Sorry guys, but I don't quite get it!
> The network is maybe the most important part of a cluster setup. And what
> do you do about it ? "I heard that this card doesn't work right" or "It
> seems that this card works better". While there is nothing wrong in asking
> about card/driver combinations on this list, do you ALSO take a look at
> archives of mailing list devoted to development of these drivers ?
> And if you have a problem, do you report it on such a list ?
> Or you just say: "OK, this card/driver combination is just crap, let's
> change it." ? What if you still have problems after the change - will you
> make another change ?
> I encountered the same way of thinking on the NFS list...
>
> For reference: http://www.scyld.com/network/index.html has links for
> drivers (and more) while mailing list archives start at:
> http://www.scyld.com/mailman/listinfo
>
> Going back to the 3Com problem: the driver that was present in kernels up
> to around 2.2.15 was an old driver, based on Don's 0.99H and modified by
> different people. It had a race which was only possible to happen in a
> very narrow window; but 2-3 weeks of uptime under load give this window
> the opportunity to happen (I know because I had exactly the same
> problem). Now I have 3C905 B and C cards in UP and SMP nodes which have
> uptimes of more than 2 months (we do upgrade kernels from time to time).
> RH 6.1 had the "bad" driver; the original kernel from RH 6.2 also had it,
> but the updated 2.2.16-3 has the new one; and this is the 3c59x driver,
> not the 3c90x driver (which is written by 3Com).
> If you trust Don's drivers more, his 3c59x is available from:
> http://www.scyld.com/network/vortex.html and it includes (AFAIK) a
> fix for this problem and much more.
>
> Best regards,
>
> Bogdan Costescu
>
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
>
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
