[Beowulf] Timeout in making connection to remote process...
Reuti reuti at staff.uni-marburg.de
Sat Jun 18 04:11:27 PDT 2005
Jorge,

are you using a hardcoded /etc/hosts on the machines, or are you using DNS (which might sometimes be unavailable)? Do the machines have enough memory for your job, or did they start to swap? Which MPI library and version?

- Reuti

Quoting jorgegg at sas.upenn.edu:

> Hi,
> I'm running a Fortran 90 code on a Linux cluster with 7 nodes (I
> actually only use 6) using the MPI library. I can change the "size" of
> the program (meaning the number of operations to be performed, although
> all operations are the same).
> The problem is that when I try to run the program using mpirun,
> sometimes --most of the time but not always-- the program won't start
> running and I'll get the following message (the name of the cluster is
> max and it's not always node number 2):
>
> p0_20621: p4_error: Timeout in making connection to remote process on maxsl2-d: 0
> bm_list_20622: p4_error: interrupt SIGINT: 2
>
> Some other times it would run fine even with the same number of
> operations! It's not the number of people using the cluster, because
> most of the time it's only me. This problem also sometimes arises after
> 3 or 4 hours of running the program.
> Do you have any idea why this happens? I estimate that with this number
> of nodes my code should take around 3 weeks to finish, so I really need
> to rely on the computers to keep communicating.
> Thank you very much and please let me know if I didn't explain myself
> clearly.
> Jorge
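
One rough way to follow up on the /etc/hosts versus DNS question is to resolve each node name a few times and see whether lookups ever fail or take unusually long. The sketch below is only an illustration, not something from this thread: the function name `check_resolution` and the `nodes` list are made up for the example, and `localhost` is a placeholder to be replaced with the real node names (for instance the `maxsl2-d` that appears in the error message).

```python
import socket
import time


def check_resolution(hostnames, attempts=3):
    """Resolve each hostname several times.

    Returns a dict mapping each hostname to a list of
    (address_or_None, elapsed_seconds) tuples, one per attempt.
    """
    results = {}
    for host in hostnames:
        samples = []
        for _ in range(attempts):
            start = time.perf_counter()
            try:
                # Uses the system resolver, so /etc/hosts and DNS are
                # consulted in whatever order the machine is configured for.
                address = socket.gethostbyname(host)
            except OSError:
                # socket.gaierror is a subclass of OSError.
                address = None
            samples.append((address, time.perf_counter() - start))
        results[host] = samples
    return results


if __name__ == "__main__":
    # Placeholder list (an assumption): put the cluster's node names here.
    nodes = ["localhost"]
    for host, samples in check_resolution(nodes).items():
        for address, elapsed in samples:
            status = address if address is not None else "FAILED"
            print(f"{host}: {status} ({elapsed:.3f} s)")
```

Running this from the node that launches mpirun, ideally at a time when the timeout has just occurred, would show whether name lookups are a plausible culprit. It says nothing about memory or swapping, which would need to be checked separately.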
