Problems with Tulip ethernet on diskless cluster

Don Holmgren djholm at fnal.gov
Thu Sep 7 08:07:49 PDT 2000


Whenever I've had this problem - carrier errors on a full-duplex
connection - it's always been because the tulip driver was in half-duplex
mode while the card was in full duplex.  I believe (but may be wrong) that
mii-diag reports only the setting of the hardware, not the driver's.  Try adding

  full_duplex=1

to your tulip insmod command.
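
To see whether the change has any effect, the transmit carrier counter in
/proc/net/dev can be compared before and after reloading the driver.  The
following is only a rough Python sketch.  It assumes the usual 16-column
layout (8 receive counters followed by 8 transmit counters, with carrier
errors seventh in the transmit group), which is not stated anywhere in this
thread; parse_dev_line and report are hypothetical helper names, and the
sample lines are copied from the original message quoted below.

```python
# Rough sketch: pull TX carrier errors out of /proc/net/dev style lines.
# Assumed column order (not stated in the thread): 8 RX counters, then 8 TX.
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")

# Counter lines as posted for the server (eth1 rejoined onto one line).
SAMPLE = """\
eth0:199467368 1848969 0 0 0 0 0 0 268603237 550016 405696 0 0 0 405696 0
eth1:1015790525 10879129 0 0 0 0 0 0 4262257137 9926748 419750 0 0 0 419750 0
eth2:56208013 216130 0 0 0 0 0 0 73798 312 317023 0 0 0 317023 0
eth3:28227811 118504 0 0 0 0 0 0 99299 1036 254667 0 0 0 254667 0
"""


def parse_dev_line(line):
    """Hypothetical helper: split one 'ethN: ...' line into (name, rx, tx)."""
    name, sep, rest = line.partition(":")
    values = [int(v) for v in rest.split()]
    if not sep or len(values) != 16:
        raise ValueError("unexpected /proc/net/dev line: %r" % line)
    rx = dict(zip(RX_FIELDS, values[:8]))
    tx = dict(zip(TX_FIELDS, values[8:]))
    return name.strip(), rx, tx


def report(text):
    """Print RX errors, TX packets and TX carrier errors per interface."""
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines of a real /proc/net/dev
        name, rx, tx = parse_dev_line(line)
        print("%-5s rx_errs=%d tx_packets=%d tx_carrier=%d"
              % (name, rx["errs"], tx["packets"], tx["carrier"]))


if __name__ == "__main__":
    report(SAMPLE)
```

On a live machine, report(open("/proc/net/dev").read()) should give the same
kind of summary; if the duplex mismatch was the cause, tx_carrier should stop
growing once the driver is reloaded.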


Don Holmgren
Fermilab
djholm at fnal.gov
630-840-2745




On Thu, 7 Sep 2000, Franz Marini wrote:

> Hi all, 
> 
>   we have a cluster of 16 diskless nodes that has been working since Dec 1999.
> While trying fftw (a library for FFT) I noticed a strange behaviour: running
> the benchmark under MPI, I get lower performance than running it on a single
> machine, even when using 8 nodes.
>  The configuration is :
> 
>    1 server w/ DLink 4-port fast ethernet card, using 4 tulip chips, 2
> UW SCSI-2 IBM hard drives in software RAID 1, P III 500, 128 MB
> 
>   16 diskless nodes w/ 3Com fast ethernet card, P III 500, 128 MB
> 
>    1 3com Superstack II 3300 XM switch.
> 
>  The ethernet drivers are all updated to the latest version; we're using
> RedHat 6.1 as Linux distro and LAM/MPI 6.3 for parallel comms (but we
> tried MPICH with the same results).
> 
>  The only strange thing I noticed is the output from "cat
> /proc/net/dev" on the server :
> 
> eth0:199467368 1848969 0 0 0 0 0 0 268603237 550016 405696 0 0 0 405696 0
> eth1:1015790525 10879129 0 0 0 0 0 0 4262257137 9926748 419750 0 0 0 419750 0
> eth2:56208013 216130 0 0 0 0 0 0 73798 312 317023 0 0 0 317023 0
> eth3:28227811 118504 0 0 0 0 0 0 99299 1036 254667 0 0 0 254667 0
> 
>  that is, on Rx we have no problem at all, but on Tx almost every packet gets a
> carrier error. Note: the switch management software doesn't report any errors
> on the ports connected to eth1, 2 and 3.
> 
>  All ports are configured as 100baseTx-FD (as reported by mii-diag), and
> so is the switch.
> 
>  I have no clue as to what is happening, especially considering that
> the network apparently is working correctly.
> 
>  Any ideas?
> 
>  Thank you all in advance,  
> 
> Franz.
> 
> 
> ---------------------------------------------
> Franz Marini
> Sys Admin and Software Analyst,
> Dept. of Physics, University of Milan, Italy.
> email : marini at pcmenelao.mi.infn.it
> --------------------------------------------- 
> 
> 
> 
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
> 




