[Beowulf] running MPICH on AMD Opteron Dual Core Processor Cluster( 72 Cpu's)
Mark Hahn hahn at physics.mcmaster.ca
Wed Jan 3 07:53:35 PST 2007
- Previous message: [Beowulf] picking out a job scheduler
- Next message: [Beowulf] running MPICH on AMD Opteron Dual Core Processor Cluster( 72 Cpu's)
> " p1_8544: p4_error: Timeout in Establishing connection to remote process:
> 0 "
> rm_l_1_8667: (359.417969) net_send: could not write to fd=5, errno=104
>
> We have been trying the same for the past two days and we didnt get any
> solution for the above.

but what have you tried? I would guess that this is a simple rsh config
problem, nothing to do with mpich.

> Also we downloaded the Latest MPICH 1.2.7p1 and configured the same. now for

but why do you think the problem lies with mpich?

> The same testing with LAM/MPI and OPENMPI are working fine.

openmpi being mostly just a successor to lam, and I think inheriting lam's
agent-based process-starting, no? personally, I'm pretty convinced that MPI
implementations should stay out of the job-starter business and go with
straight agentless (ssh-based) job spawning.
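The reply guesses at an rsh configuration problem rather than an MPICH bug. One way to test that guess, sketched here under assumptions: run a trivial command non-interactively on every host in the machines file, using the same remote shell the launcher uses, and flag any host that refuses, prompts, or hangs. The function names and the assumed `host:ncpus` line format are illustrative only, not part of MPICH; `BatchMode=yes` is the OpenSSH option that makes ssh fail instead of prompting for a password.

```python
import subprocess
from typing import Iterable, List, Tuple


def parse_machines(lines: Iterable[str]) -> List[str]:
    """Return host names from machines-file lines, skipping blanks and '#' comments."""
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumed "host:ncpus" entries (e.g. "node01:2"); keep only the host part.
        hosts.append(line.split(":", 1)[0])
    return hosts


def build_check_command(host: str, rsh: str = "ssh") -> List[str]:
    """Build the argv that runs `echo ok` on `host` via the given remote shell."""
    cmd = [rsh, host, "echo", "ok"]
    if rsh == "ssh":
        # OpenSSH option: fail immediately instead of prompting for a password.
        cmd[1:1] = ["-o", "BatchMode=yes"]
    return cmd


def check_hosts(hosts: Iterable[str], rsh: str = "ssh",
                timeout: float = 10.0) -> List[Tuple[str, bool, str]]:
    """Return (host, ok, message) for each host."""
    results = []
    for host in hosts:
        try:
            proc = subprocess.run(build_check_command(host, rsh),
                                  stdin=subprocess.DEVNULL, capture_output=True,
                                  text=True, timeout=timeout)
        except FileNotFoundError:
            results.append((host, False, f"{rsh} not found"))
            continue
        except subprocess.TimeoutExpired:
            results.append((host, False, "timed out"))
            continue
        # Check stdout rather than the exit code: many rsh implementations
        # do not pass back the remote command's exit status.
        ok = proc.stdout.strip() == "ok"
        results.append((host, ok, "" if ok else proc.stderr.strip()))
    return results
```

For example, `with open("machines") as f: report = check_hosts(parse_machines(f), rsh="rsh")`, where the file name is hypothetical. A failing host would point at remote-shell setup (host equivalence, `.rhosts`, keys) rather than at MPICH itself. This only exercises the outbound remote-shell step, so a clean report does not rule out every cause of the p4 timeout.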
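The post states a preference for agentless, ssh-based job spawning without describing a launcher. As a minimal sketch of what "agentless" means, assuming passwordless ssh to every node: the launcher starts one ssh per rank and waits for them, with no resident daemons on the compute nodes. The environment variable names (`RANK`, `WORLD_SIZE`, `RENDEZVOUS_ADDR`) and the rendezvous address are hypothetical; a real MPI library defines its own bootstrap protocol.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_launch_commands(hosts: Sequence[str], program: Sequence[str],
                          rendezvous: str) -> List[List[str]]:
    """Build one ssh argv per rank; rank i runs on hosts[i]."""
    size = len(hosts)
    cmds = []
    for rank, host in enumerate(hosts):
        # Hypothetical variable names: the started program would read these
        # to learn its rank and where to connect to its peers.
        env = {"RANK": str(rank), "WORLD_SIZE": str(size),
               "RENDEZVOUS_ADDR": rendezvous}
        words = ["env"] + [f"{k}={v}" for k, v in env.items()] + list(program)
        # ssh hands the remote side a single command string for the shell,
        # so quote every word to survive that second round of parsing.
        remote = " ".join(shlex.quote(w) for w in words)
        cmds.append(["ssh", "-o", "BatchMode=yes", host, remote])
    return cmds


def launch(hosts: Sequence[str], program: Sequence[str],
           rendezvous: str) -> List[int]:
    """Start every rank with plain ssh (no resident daemons) and wait for all."""
    procs = [subprocess.Popen(cmd, stdin=subprocess.DEVNULL)
             for cmd in build_launch_commands(hosts, program, rendezvous)]
    return [p.wait() for p in procs]
```

The rank information goes on the remote command line via `env` because ssh does not forward arbitrary environment variables unless both ends are configured for it (`SendEnv`/`AcceptEnv`).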
More information about the Beowulf mailing list
