[eepro100] Odd IEE Pro/100+ problem on Linux 2.2.12-20 (fwd)

Duncan Napier napier@napiersys.bc.ca
Mon, 24 Jul 2000 21:27:58 -0700 (PDT)


Hello,

I've just joined your mailing list with the purpose of finding a solution
to my problem:

I'm using RedHat 6.1 with a 2.2.12 kernel recompiled with FreeSWAN IPSec,
and I have an odd problem. I have 2 identical boxes, Dell Dimension Pentium
133 MHz machines with 32 MB of RAM. Each has 2 Intel EtherExpress Pro/100+
cards in it (4 cards in total, dual NICs in each machine). Both use
the kernel and the eepro100 module sources that came with the RedHat 6.1
distribution.

Each machine has eth0 on a DHCP-assigned IP address, while eth1 has a
static, internal (RFC1918) IP address. The first machine works flawlessly;
the second one freezes during bootup when it gets to eth1 (eth0 comes up
fine). Oddly enough, the second one will boot just fine if the network
connection to eth1 is unplugged! (I.e., you just yank out the RJ45
connector, and all is well. I have tested it with a 3Com TP800 100 Mbps hub
and a Linksys EFAH05W 10/100 Mbps hub.) The problem machine will then carry
on working just fine when the network is plugged in again after the boot
has passed eth1. After that, it too runs flawlessly! The machines are
firewalling VPN gateways and, once booted, work just fine.

I set up and tested the machines offsite with 2 static IP addresses and
everything worked fine. Once I shipped the second one to a site with
DHCP-assigned IP addresses on eth0, it would lock up on the boot pass
through eth1.

This almost seems like a hardware problem to me, but can anyone explain
this? It is a real pain troubleshooting the machine now, especially when
it refuses to boot. It has already been installed and is running some
distance away (I have to fly or take a boat to get there :-), and there
is no onsite tech help). 

I read somewhere on the 'Net that for machines with identical NICs, the
ordering of the MAC addresses of the cards assigned to eth0 and eth1 can
cause problems. It stated that if eth0 had a larger MAC address than eth1,
there could be problems. Now this sounds like a load of baloney to me, but
I have noticed that the machine that works has a lower MAC address on eth0
than eth1, but the one that doesn't has the reverse case .... of course
the probability of this coincidence is 50%.

Both machines appear completely identical in all other respects, eg:

/etc/conf.modules:
alias eth0 eepro100

/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
IPADDR=""
NETMASK=""
ONBOOT="yes"
IPXNETNUM_802_2="" .....

/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
BOOTPROTO="none"
IPADDR="192.168.2.154"
NETMASK="255.255.255.0"
ONBOOT="yes"
IPXNETNUM_802_2="" ....

/sbin/insmod eepro100
dmesg :

eth0: Intel EtherExpress Pro 10/100 at 0xff00, 00:D0:B7:73:3E:CA, IRQ 11.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 721383-008, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
eth1: Intel EtherExpress Pro 10/100 at 0xfe80, 00:D0:B7:73:09:51, IRQ 11.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 721383-008, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
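
For what it's worth, the MAC ordering mentioned above is easy to check
mechanically. The following is only a minimal Python sketch, not anything
taken from the eepro100 driver: it pulls the interface names and MAC
addresses out of the two "ethN:" lines quoted from dmesg above and compares
the addresses as 48-bit integers. The helper names (parse_macs, mac_to_int,
eth0_mac_is_larger) are made up for illustration.

```python
import re

# The two driver banner lines quoted from dmesg in this message.
DMESG = """\
eth0: Intel EtherExpress Pro 10/100 at 0xff00, 00:D0:B7:73:3E:CA, IRQ 11.
eth1: Intel EtherExpress Pro 10/100 at 0xfe80, 00:D0:B7:73:09:51, IRQ 11.
"""

# Interface name at the start of a line, then the first six-octet,
# colon-separated hex string that appears on the same line.
LINE_RE = re.compile(
    r"^(eth\d+): .*?((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})", re.M
)


def parse_macs(text):
    """Return a dict mapping interface name -> MAC string."""
    return {name: mac for name, mac in LINE_RE.findall(text)}


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer for numeric comparison."""
    return int(mac.replace(":", ""), 16)


def eth0_mac_is_larger(macs):
    """True if eth0's MAC is numerically larger than eth1's."""
    return mac_to_int(macs["eth0"]) > mac_to_int(macs["eth1"])


if __name__ == "__main__":
    macs = parse_macs(DMESG)
    for name in sorted(macs):
        print(name, macs[name])
    print("eth0 MAC larger than eth1 MAC:", eth0_mac_is_larger(macs))
```

For the two addresses in this dmesg output, eth0 (00:D0:B7:73:3E:CA) is
numerically larger than eth1 (00:D0:B7:73:09:51). This only checks the
ordering; it says nothing about whether the ordering actually matters.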

				Best Regards,

				Duncan Napier.
----------------------------------------------------------------------------
Duncan Napier 					
Napier Systems Research