[Beowulf] tcp error: Need ideas!
Gerry Creager gerry.creager at tamu.edu
Wed Jan 21 14:40:26 PST 2009
History/background/description of the cluster:

* 126-node Dell 1950 cluster with dual quad-core Xeons
* HP 5412zl switch for the gigabit cluster backplane and a 10GbE interconnect to selected services (file server, etc.)
* Gigabit interconnect
* Hand-compiled 2.6.26 kernel
* bnx2 module loaded for the Broadcom onboard NICs
* Switch, compute nodes, and head node set to a 9000-byte MTU

We're seeing the following error in WRF compiled with Open MPI and the PGI 7.2 compiler:

mca_btl_tcp_frag_send:writev failed with errno=104

All nodes were accessible before the run and returned appropriate "stuff" when queried with, e.g., ssh and a command. Two nodes now return something like this:

[gerry at brazos SCOOP12km]$ ssh c0522
Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.

I'm looking for causes and solutions. Yeah, WRF as compiled did run before the change to jumbo frames. Should I reduce the frame size to something smaller, like 8800 bytes? 7500? 1500? I'm not completely out of ideas, but I'm stumped.

Thanks,
gerry
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
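For anyone picking this up, here is a small sketch for decoding the numbers in the message. It is not from the original poster, and it assumes a Linux host (errno values are platform-specific) and plain IPv4 with no IP or TCP options. On Linux, errno 104 is ECONNRESET ("connection reset by peer"). SSH sends its packet length as a big-endian 32-bit integer (RFC 4253). Read as bytes, 808464432 is 0x30303030, i.e. the ASCII text "0000". One possible reading is that the length field was overwritten by payload data rather than being a real length. The helper names (`describe_errno`, `length_as_bytes`, `ping_payload`, `tcp_mss`) are hypothetical. The last part prints the largest unfragmented ping payload for each MTU the post asks about, for use with iputils `ping -M do -s SIZE`.

```python
import errno
import os
import struct

IPV4_HEADER = 20  # bytes; assumes no IP options
ICMP_HEADER = 8   # bytes
TCP_HEADER = 20   # bytes; assumes no TCP options (timestamps etc. would shrink the usable MSS)


def describe_errno(code: int) -> str:
    """Symbolic errno name on *this* platform (104 -> ECONNRESET on Linux)."""
    return errno.errorcode.get(code, "unknown")


def length_as_bytes(value: int) -> bytes:
    """Reinterpret an SSH packet_length (big-endian uint32, RFC 4253) as raw bytes."""
    return struct.pack(">I", value)


def ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss(mtu: int) -> int:
    """TCP maximum segment size implied by an MTU, ignoring TCP options."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(104, describe_errno(104), os.strerror(104))
    print(length_as_bytes(808464432))  # b'0000'
    for mtu in (9000, 8800, 7500, 1500):
        print(f"MTU {mtu}: ping -M do -s {ping_payload(mtu)} <node>  (TCP MSS {tcp_mss(mtu)})")
```

Running the 9000-byte case (payload 8972) between the head node and each compute node is a quick way to see whether every hop really passes jumbo frames without fragmentation.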
