[eepro100] Quad port Compaq NC3134/35 i82559 = IRQ 23 is physically blocked

Claude LeFrancois (LMC) Claude.LeFrancois@ericsson.ca
Thu Feb 21 10:14:01 2002



Hi Donald,

Donald Becker wrote:

> On Wed, 20 Feb 2002, Claude LeFrancois (LMC) wrote:
>
> > I am trying to install and configure a quad-port Compaq NC3134 equipped
> > with the NC3135 module in a server system. The NC3134 is a dual-port
> > board; the NC3135 is a module installed on top of the NC3134 that
> > provides 2 extra 10/100 ports, for a total of 4 ports. The board is a
> > 64-bit PCI card. All four ports use i82559 chips (eepro100).
>
> If I'm thinking of the same board, the primary board contains a 21152 bus
> bridge.  The daughterboard has only the two '559 chips on the PCI bus.

Exactly!

> > ... This system is also equipped with two on-board i82559s, for a
> > total of 6 i82559s. The server runs RedHat 6.2 with a 2.2.17 kernel.
> >
> > The problem is that 2 of the NICs are not working well. I get this
> > message:
> >
> >     eth0: IRQ 23 is physically blocked! Failing back to low-rate polling.
> >
> > It looks like an IRQ/IO-APIC problem. The faulty ports (both module
> > ports on the NC3135) share IRQs with their parents (the main ports on
> > the NC3134):
>
> As you guessed, this indicates an IRQ mapping problem.  And the APIC
> table is usually to blame.
>
> The quick work-around -- and one that Scyld always ships by default for
> 2.2 kernels -- is to use the "noapic" kernel option.  This results in
> unbalanced interrupts, but this can actually be good in some SMP
> environments.

When I boot the system with the "noapic" option, the machine freezes when
eepro100 is loaded. The same thing happens when I boot the machine with
the UP (uniprocessor) kernel.

> It is possible that the IRQ isn't really blocked, just that there is a
> race condition where the other CPU is currently handling the interrupt.
> You can check this by starting up only eth0 and checking the interrupt
> count. But I'm guessing from the low interrupt count that we really do
> have a problem here.
>
> >  22:          4          3   IO-APIC-level  eth1, eth3
> >  23:          4          4   IO-APIC-level  eth0, eth2
>
> ...
>
> >  28:        277        516   IO-APIC-level  eth5
> >  31:        279         92   IO-APIC-level  eth4
>
> Yup, not many interrupts are getting through.  Does the count ever go
> up?

Not really...



> It is curious that there are two IRQs assigned (I'm guessing INTA and
> INTB pins) rather than one or four.
>
> > The board finally works, but gives a really slow rate:
> >
> >     [root@lmcx2 /root]# ping 192.166.0.1
> >     PING 192.166.0.1 (192.166.0.1) from 192.166.50.1 : 56(84) bytes of data.
> >     eth0: IRQ 23 is physically blocked! Failing back to low-rate polling.
> >     64 bytes from 192.166.0.1: icmp_seq=0 ttl=255 time=13.367 sec
> >     64 bytes from 192.166.0.1: icmp_seq=1 ttl=255 time=12.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=2 ttl=255 time=11.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=3 ttl=255 time=10.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=4 ttl=255 time=9.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=5 ttl=255 time=8.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=6 ttl=255 time=7.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=7 ttl=255 time=6.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=8 ttl=255 time=5.370 sec
> >     64 bytes from 192.166.0.1: icmp_seq=9 ttl=255 time=4.370 sec

> This is exactly what is expected when the interrupt isn't getting
> through.  The driver eventually decides to give up and processes all of
> the packets in the Rx ring.

> The low-rate polling isn't intended to work well.  Instead it's a
> fall-back so that you can ssh to the server and figure out what is
> broken.  To do high-throughput polling the driver would need many more
> Rx buffers and access to 1000+ Hz polling rather than the kernel's
> standard 100 Hz timer ticks.


> > IO APIC #5......
>
> ...
>
> > IRQ22 -> 6
> > IRQ23 -> 7
> > IRQ26 -> 10
> > IRQ27 -> 11
> > IRQ28 -> 12
> > IRQ30 -> 14
> > IRQ31 -> 15
>
> ...
>
> > PCI->APIC IRQ transform: (B0,I4,P0) -> 28
> > PCI->APIC IRQ transform: (B0,I5,P0) -> 26
> > PCI->APIC IRQ transform: (B0,I5,P1) -> 27
> > PCI->APIC IRQ transform: (B0,I6,P0) -> 31
> > PCI->APIC IRQ transform: (B0,I15,P0) -> 10
> > PCI->APIC IRQ transform: (B1,I0,P0) -> 30
> > PCI->APIC IRQ transform: (B3,I4,P0) -> 22
> > PCI->APIC IRQ transform: (B3,I5,P0) -> 23
> > PCI->APIC IRQ transform: (B3,I6,P0) -> 22
> > PCI->APIC IRQ transform: (B3,I7,P0) -> 23

> ...
>
> > eepro100.c:v1.19 12/19/2001 Donald Becker <becker@scyld.com>
> >   http://www.scyld.com/network/eepro100.html
> > eth0: OEM Intel i82559 rev 8 at 0xe0843000, 00:02:A5:DA:80:75, IRQ 23.
> > eth1: OEM Intel i82559 rev 8 at 0xe0845000, 00:02:A5:DA:80:74, IRQ 22.
>
> These are the problem interfaces on the daughtercard, correct?

Yes.

> (I expected the daughtercard interfaces to be eth2 & 3.)
>
> > eth2: OEM Intel i82559 rev 8 at 0xe0847000, 00:02:A5:D6:4A:C3, IRQ 23.
> > eth3: OEM Intel i82559 rev 8 at 0xe0849000, 00:02:A5:D6:4A:C2, IRQ 22.
>
> And these are on the base PCI card and work fine.

Yes.

> > eth4: OEM Intel i82559 rev 8 at 0xe084b000, 00:30:48:11:FE:68, IRQ 31.
> > eth5: OEM Intel i82559 rev 8 at 0xe084d000, 00:30:48:11:F7:62, IRQ 28.
>
> And these are on the motherboard.  (On-motherboard devices are always
> last, designed so that a plug-in card overrides a potentially broken
> on-board device.)

You're right.

> Donald Becker				becker@scyld.com
> Scyld Computing Corporation		http://www.scyld.com
> 410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
> Annapolis MD 21403			410-990-9993


-- 


Claude LeFrancois
Packet Core Network (LMC/XP/DG)
Ericsson Canada Inc.
Tel: +1 (888) 345-7900 x7579
Fax: +1 (514) 345-5837
Mailto:Claude.Lefrancois@ericsson.ca 


