MPICH-1.2.2.3 Problem
Gabriel J. Weinstock gabriel.weinstock at dnamerican.com
Wed Oct 24 12:12:21 PDT 2001
I'm trying to get MPICH 1.2.2.3 running on a 4-node cluster of PIII 1 GHz machines. The tstmachines program runs without error, and the rsh mechanism is set up and functioning properly. LAM-MPI works out of the box, so we decided to use that for a while, but we're going to need a production environment and MPICH seemed more suitable.

Anyway, I compile the example `cpi.c' program and run `mpirun -v -np 4 cpi'. Nothing happens for a few minutes, then I get a flurry of `Connection failed for reason: : Connection timed out' messages, followed by:

    p1_10899: p4_error: Timeout in establishing connection to remote process: 0
    p3_15707: p4_error: net_recv read: probable EOF on socket: 1
    bm_list_4303: (378.120857) Listener: Unable to interrupt client pid=4302.

We had a similar problem about 2 months ago, which led us to abandon this implementation. There seem to be a number of people having this problem, but no one, and I mean no one, seems to know the answer.

Any help would be greatly appreciated.

Thanks,
Gabe
More information about the Beowulf mailing list
