bra369 at pp.molsci.csiro.au
Thu Oct 26 17:09:13 PDT 2000
I have tried both cards, and yes, before even buying a single card I
checked all the archives. Extensive testing (read: 4 weeks of run time on 2
nodes) has not revealed any faults. It was only when the supplier told us they
had had problems that I wondered whether it was poor quality control in
manufacturing or a software problem that had not surfaced because of the
nature of my test application, which is not a network-heavy calculation. I
have been using the 3c59x driver; however, I do consider that this is the
appropriate place to ask such questions.
I merely wondered whether others had seen this fault, or whether it is a
recent thing. As a PhD student with a limited budget to spend on building
equipment, I'd like to ensure, before spending money on 65 cards, that they
work, and
Mr Kim Branson
Diffraction and Theory
Biomolecular Research Institute
343 Royal Parade, Melbourne Victoria
Ph 61 03 9662 7300
Email kim.branson at bioresi.com.au
On Thu, 26 Oct 2000, Bogdan Costescu wrote:
> On Wed, 25 Oct 2000, J. G. LaBounty wrote:
> > >
> > > The head node works fine, but people have mentioned problems with the 3Com
> > > network card. Testing has shown no problems, but the vendor has informed us
> > > they have had problems with the 3Com cards, "some batches don't seem to
> > > work"; they have offered Intel EtherExpress PRO 10/100+ TX - PCI cards for
> > > the same cost.
> > We were using the 3Com 905b cards on two 16-node clusters. Our application
> > keeps the network pegged most of the time. We were getting network
> > hangs about once every two weeks running RH6.1. We moved to RH6.2
> > and switched to the 3c90x driver, and the problem happened about once per
> > day. We have since changed out the 3Com cards for the EtherExpress PRO
> > 10/100 and have not seen the problem, but we only have about 3 weeks of
> > runtime on this configuration.
> Sorry guys, but I don't quite get it!
> The network is maybe the most important part of a cluster setup. And what
> do you do about it ? "I heard that this card doesn't work right" or "It
> seems that this card works better". While there is nothing wrong in asking
> about card/driver combinations on this list, do you ALSO take a look at
> the archives of the mailing lists devoted to development of these drivers?
> And if you have a problem, do you report it on such a list?
> Or do you just say: "OK, this card/driver combination is just crap, let's
> change it." ? What if you still have problems after the change - will you
> make another change ?
> I encountered the same way of thinking on the NFS list...
> For reference: http://www.scyld.com/network/index.html has links for
> drivers (and more) while mailing list archives start at:
> Going back to the 3Com problem: the driver that was present in kernels up
> to around 2.2.15 was an old driver, based on Don's 0.99H and modified by
> different people. It had a race condition that could only be triggered in a
> very narrow window, but 2-3 weeks of uptime under load give that window
> the opportunity to be hit (I know because I had exactly the same
> problem). Now I have 3C905 B and C cards in UP and SMP nodes which have
> uptimes of more than 2 months (we do upgrade kernels from time to time).
> RH 6.1 had the "bad" driver; the original kernel from RH 6.2 also had it,
> but the updated 2.2.16-3 has the new one; and this is the 3c59x driver,
> not the 3c90x driver (which is written by 3Com).
> If you trust Don's drivers more, his 3c59x is available from:
> http://www.scyld.com/network/vortex.html and it includes (AFAIK) a
> fix for this problem and much more.
> Best regards,
> Bogdan Costescu
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De