[Beowulf] tcp error: Need ideas!
Greg Lindahl lindahl at pbm.com
Wed Jan 21 14:49:33 PST 2009
- Previous message: [Beowulf] tcp error: Need ideas!
- Next message: [Beowulf] tcp error: Need ideas!
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Jan 21, 2009 at 04:40:26PM -0600, Gerry Creager wrote:

> We're seeing the following error in WRF compiled with openMPI and the
> PGI 7.2 compiler:
> mca_btl_tcp_frag_send:writev failed with errno=104

It's unfortunate that OpenMPI is following in the footsteps of MPICH
and doesn't print out that 104 = "Connection reset by peer". The
OpenMPI FAQ has some info about that:

http://open-mpi.basemirror.de/faq/?category=tcp

> While all nodes were accessible prior to the run and returned
> appropriate "stuff" when queried with, eg., ssh and a command, two nodes
> now return something like this:
> [gerry at brazos SCOOP12km]$ ssh c0522
> Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.

That's kinda interesting. Perhaps the network chip got into a really
funny state, and is corrupting packets? Power off for a while.

-- greg
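
For anyone decoding these two numbers by hand, here is a minimal Python sketch. The helper names `describe_errno` and `length_as_bytes` are made up for illustration and are not part of OpenMPI or OpenSSH. It assumes a Linux host, where errno 104 is ECONNRESET (errno numbering differs on other platforms), and it assumes the ssh packet length is a 32-bit big-endian field, as in the SSH transport protocol.

```python
import errno
import os
import struct


def describe_errno(code: int) -> str:
    """Return 'number = SYMBOL: message' using the local platform's errno table."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name}: {os.strerror(code)}"


def length_as_bytes(length: int) -> bytes:
    """Pack a 32-bit length big-endian to show the raw bytes it was read from."""
    return struct.pack(">I", length)


if __name__ == "__main__":
    # The errno from the mca_btl_tcp_frag_send message (Linux numbering assumed).
    print(describe_errno(104))
    # The "Bad packet length" value reported by ssh.
    print(length_as_bytes(808464432))  # b'0000'
```

The length 808464432 is 0x30303030, i.e. the four ASCII characters "0000". That would be consistent with ssh reading text-like or repeated bytes where it expected a binary length field, but it does not by itself say where those bytes came from.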
