Problems with Tulip ethernet on diskless cluster
Thomas Lovie tlovie at cr434095-a.glph1.on.wave.home.com
Thu Sep 7 09:15:30 PDT 2000
I had a similar problem with a similar configuration. The net card I was using is the ACER ALN-315 with an Intel (DEC) 21143 chip. Something strange was causing the cards to produce 'garbled' packets when they were operated in full duplex mode. By 'garbled' I mean that the packets were dropped by the switch since they failed the checksum tests that the switch was performing.

The problem stopped when I operated the cards in half duplex mode, and I never got to checking it further. Something was suggested by R.G. Brown (I think) about a 'udelay driver' which had to be manually tuned to your specific setup.

I can't remember the kernel errors that I was getting, but what are your netperf numbers for full and half duplex TCP? My netperf numbers (through the switch) for half duplex were good (but half duplex), however for full duplex, I was getting about 1/100 wire speed, which is undoubtedly due to the switch dropping lots of bad packets.

Regards,
Tom Lovie.

-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On Behalf Of Franz Marini
Sent: Thursday, September 07, 2000 10:38 AM
To: beowulf at beowulf.org
Subject: Problems with Tulip ethernet on diskless cluster

Hi all,

we have a 16 diskless-nodes cluster working since Dec 1999. Trying fftw (library for FFT) I noticed a strange behaviour (that is, running the benchmark in mpi, I get a lower performance than running it on a single machine, even using 8 nodes).

The configuration is :

1 server w/ DLink 4 port fast ethernet card, using 4 tulip chips, 2 UW Scsi2 IBM hard drive in software RAID 1, p III 500, 128 Mb
16 diskless nodes w/ 3com fast ethernet card, p III 500, 128 Mb
1 3com Superstack II 3300 XM switch.

The ethernet drivers are all updated to the latest version, we're using RedHat 6.1 as Linux distro and LAM-Mpi 6.3 for parallel comms (but we tried with mpich with the same results).
The only strange thing I noticed is the output from "cat /proc/net/dev" on the server :

  eth0:199467368 1848969 0 0 0 0 0 0 268603237 550016 405696 0 0 0 405696 0
  eth1:1015790525 10879129 0 0 0 0 0 0 4262257137 9926748 419750 0 0 0 419750 0
  eth2:56208013 216130 0 0 0 0 0 0 73798 312 317023 0 0 0 317023 0
  eth3:28227811 118504 0 0 0 0 0 0 99299 1036 254667 0 0 0 254667 0

that is, in Rx we have no problem at all, in Tx almost every packet gets a carrier error.

Note : the switch management software doesn't report any error on the ports connected to eth1, 2 and 3. All ports are configured as 100baseTx-FD (as reported by mii-diag) and so is the switch.

I have no clue about what is happening, especially considering the fact that the network apparently is working correctly.

Any idea ?

Thank you all in advance,

Franz.

---------------------------------------------
Franz Marini
Sys Admin and Software Analyst,
Dept. of Physics, University of Milan, Italy.
email : marini at pcmenelao.mi.infn.it
---------------------------------------------

_______________________________________________
Beowulf mailing list
Beowulf at beowulf.org
http://www.beowulf.org/mailman/listinfo/beowulf
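
Editor's sketch: the /proc/net/dev counters quoted in the original message are easier to compare once each value is labelled. The Python below is a minimal sketch, not part of either message. It assumes the usual Linux column order (eight receive fields followed by eight transmit fields, with the carrier counter as the seventh transmit field); the names RX_FIELDS, TX_FIELDS, parse_net_dev_line and summarize are hypothetical helpers introduced only for illustration. The sample data is copied from the message.

```python
# Assumed column order of a /proc/net/dev data line (an assumption, not
# stated in the message): 8 receive counters, then 8 transmit counters.
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]

# Counters exactly as quoted in the original message.
SAMPLE = """\
eth0:199467368 1848969 0 0 0 0 0 0 268603237 550016 405696 0 0 0 405696 0
eth1:1015790525 10879129 0 0 0 0 0 0 4262257137 9926748 419750 0 0 0 419750 0
eth2:56208013 216130 0 0 0 0 0 0 73798 312 317023 0 0 0 317023 0
eth3:28227811 118504 0 0 0 0 0 0 99299 1036 254667 0 0 0 254667 0
"""


def parse_net_dev_line(line):
    """Split one 'ethN: ...' line into (name, rx_dict, tx_dict)."""
    name, _, rest = line.partition(":")
    values = [int(v) for v in rest.split()]
    if len(values) != len(RX_FIELDS) + len(TX_FIELDS):
        raise ValueError("unexpected number of counters: %r" % line)
    rx = dict(zip(RX_FIELDS, values[:len(RX_FIELDS)]))
    tx = dict(zip(TX_FIELDS, values[len(RX_FIELDS):]))
    return name.strip(), rx, tx


def summarize(text):
    """Return the error-related counters per interface."""
    rows = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips blank lines and the two header lines
        name, rx, tx = parse_net_dev_line(line)
        rows[name] = {
            "rx_errs": rx["errs"],
            "tx_packets": tx["packets"],
            "tx_errs": tx["errs"],
            "tx_carrier": tx["carrier"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(SAMPLE).items():
        print(name, row)
```

Under that assumed layout, every interface shows zero receive errors, and the transmit error count equals the carrier count on each port, which matches the description in the message. To check a live machine, the contents of /proc/net/dev could be read and passed to summarize() in place of SAMPLE.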
