[Beowulf] ssh connection problem

Ruhollah Moussavi Baygi ruhollah.mb at gmail.com
Fri Jun 1 13:41:05 PDT 2007


Hi,

Thank you for your answers.

Please ignore the 'links' in my earlier post; I did not mean to send them.
I had googled the error message 'Disconnecting:…' to look for a solution to
our cluster's problem. When I could not find a proper one, I posted to
Beowulf and copy-pasted the sentence 'Disconnecting:…' into Gmail, which is
why the links appear in my email.

Returning to our problem, the output of 'netstat -i' and 'netstat -s'
follows, in that order.

Please note that:

a)    I use Cat 6 cables.

b)    Electrical noise is very unlikely.

c)    The head node has two NICs. eth0 serves the internal zone, i.e. the
computing nodes, and runs with no problem. eth1 serves the external zone,
i.e. the one our users connect to via ssh. This is the one with the
disconnection problem.

d)    There does not seem to be any switch/router problem, because another
machine on the same network is reached by users via ssh with no problem.

___________________________________________________________________

[root at node01 ~]# netstat -i

Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0 586745989      0      0      0 598858710      0      0      0 BMRU
eth1       1500   0   701868      0      0      0   325542      0      0      0 BMRU
lo        16436   0     1959      0      0      0     1959      0      0      0 LRU
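
As a quick way to read the interface table above, here is a minimal Python
sketch that parses 'netstat -i' text and lists any interface with a nonzero
error, drop, or overrun counter. The function names and the SAMPLE string are
illustrative only (they are not part of the original post), and the column
layout is assumed to match the header shown above.

```python
# Sample text copied from the netstat -i table above, one row per line.
SAMPLE = """\
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0 586745989      0      0      0 598858710      0      0      0 BMRU
eth1       1500   0   701868      0      0      0   325542      0      0      0 BMRU
lo        16436   0     1959      0      0      0     1959      0      0      0 LRU
"""

# Columns that should stay at zero on a healthy link.
ERROR_COLUMNS = ("RX-ERR", "RX-DRP", "RX-OVR", "TX-ERR", "TX-DRP", "TX-OVR")


def parse_interface_table(text):
    """Return {interface name: {column name: value string}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the assumed layout
        row = dict(zip(header, fields))
        rows[row["Iface"]] = row
    return rows


def interfaces_with_errors(rows):
    """Return {interface name: {column: count}} for nonzero error counters."""
    bad = {}
    for name, row in rows.items():
        nonzero = {c: int(row[c]) for c in ERROR_COLUMNS if int(row[c]) != 0}
        if nonzero:
            bad[name] = nonzero
    return bad


table = parse_interface_table(SAMPLE)
print(interfaces_with_errors(table))  # prints {} for the table above
```

For the numbers posted here every error column is zero on all three
interfaces, so at this level neither eth0 nor eth1 reports link errors.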



[root at node01 ~]# netstat -s

Ip:
    585891011 total packets received
    0 forwarded
    0 incoming packets discarded
    585887228 incoming packets delivered
    597668214 requests sent out
Icmp:
    34 ICMP messages received
    21 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 25
        timeout in transit: 5
        echo requests: 4
    601 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 597
        echo replies: 4
Tcp:
    78 active connections openings
    360 passive connection openings
    0 failed connection attempts
    18 connection resets received
    8 connections established
    585798178 segments received
    597666644 segments send out
    16197 segments retransmited
    94 bad segments received.
    1682 resets sent
Udp:
    1005 packets received
    596 packets to unknown port received.
    0 packet receive errors
    1019 packets sent
TcpExt:
    2 resets received for embryonic SYN_RECV sockets
    26 packets pruned from receive queue because of socket buffer overrun
    ArpFilter: 0
    60 TCP sockets finished time wait in fast timer
    1 packets rejects in established connections because of timestamp
    734435 delayed acks sent
    127 delayed acks further delayed because of locked socket
    Quick ack mode was activated 7963 times
    724 packets directly queued to recvmsg prequeue.
    6030 packets directly received from backlog
    164431 packets directly received from prequeue
    571897537 packets header predicted
    138 packets header predicted and directly queued to user
    TCPPureAcks: 44870
    TCPHPAcks: 458279645
    TCPRenoRecovery: 0
    TCPSackRecovery: 2875
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 1
    TCPLossUndo: 7099
    TCPLoss: 626
    TCPLostRetransmit: 0
    TCPRenoFailures: 0
    TCPSackFailures: 1635
    TCPLossFailures: 169
    TCPFastRetrans: 4294
    TCPForwardRetrans: 23
    TCPSlowStartRetrans: 1130
    TCPTimeouts: 8329
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 279
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 2731
    TCPDSACKOldSent: 8194
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 7125
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 28
    TCPAbortOnClose: 8
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 12
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0

___________________________________________________________________
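
To put the Tcp counters above in proportion, here is a small Python sketch
that turns them into percentages. The dictionary keys and the helper name are
made up for illustration; the numbers are copied from the 'netstat -s' output
above. Note that 'netstat -s' reports host-wide totals, so these figures mix
eth0 and eth1 traffic and cannot by themselves isolate the eth1 problem.

```python
# Counters copied from the Tcp: section of the netstat -s output above.
COUNTERS = {
    "segments_received": 585798178,
    "segments_sent": 597666644,
    "segments_retransmitted": 16197,
    "bad_segments_received": 94,
}


def ratio_percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Share of outgoing segments that had to be retransmitted.
retrans_pct = ratio_percent(
    COUNTERS["segments_retransmitted"], COUNTERS["segments_sent"]
)

# Share of incoming segments that the TCP layer counted as bad.
bad_pct = ratio_percent(
    COUNTERS["bad_segments_received"], COUNTERS["segments_received"]
)

print(f"retransmitted: {retrans_pct:.4f}% of segments sent")
print(f"bad segments:  {bad_pct:.6f}% of segments received")
```

Both ratios come out far below 0.01%. Since eth0 carries hundreds of millions
of packets and eth1 well under a million (see the 'netstat -i' table), any
trouble confined to eth1 could easily be hidden in these totals.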
-- 
Best,
Ruhollah Moussavi Baygi

On 5/29/07, Robert G. Brown <rgb at phy.duke.edu> wrote:
>
> On Sun, 27 May 2007, Ruhollah Moussavi Baygi  wrote:
>
> > Hi everybody at Beowulf,
> >
> > I have a serious problem with ssh connection to our cluster. Every
> > hint/help/suggestion, which can help me to solve it, is highly
> > appreciated.
> >
> > Most of the time, when users want to connect and run their programs from
> > their own PCs, the ssh connection failed, especially during transfer
> > files from/to head-node. Our user's PCs are mainly WindowsXP, so they
> > use packages like SSH Secure Shell for connection and file transfer, or
> > Putty for connection and WinSCP for file transfer.
> >
> >
> > The error message is as follows:
> >
> > 'Disconnecting: Corrupted MAC on input'
>
> This sounds to me like hardware problems.  What does your physical
> network look like?  Is it built with the right cables, within spec, with
> decent switches?  Do you see other evidence of network packet
> corruption?
>
> > <http://www.google.com/history/url?url=http://ubuntuforums.org/showthread.php%3Ft%3D202076&ei=wkJZRsGfHZf-0gTehKXrDQ&sig2=lIzQGYq3zN0Tz2EC8b4dAw&zx=JGkABbsjtaA&ct=w>
> >
> > or
> >
> > 'Disconnecting: bad packet
>
> Yes, sounds like bad hardware.  Perhaps your cables aren't cat 5?
> Perhaps your electrical power has noise?  Perhaps your switch(es) are
> broken or have been taken over by trolls?  This sounds like you're
> failing packet checksum tests or experiencing pretty serious TCP
> collision problems.
>
> What do the network statistics look like on the interfaces in question?
>
>     rgb
>
> > length...<http://www.google.com/search?q=disconnecting:+bad+packet+length+from+windows+to+linux+machine&hl=en>',
> > followed by a long integer.
> >
> >
> > This problem has practically made our cluster unusable. So, I would be
> > thankful for any coming advice.
> >
>
> --
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>
>

