
[Beowulf] running MPICH on AMD Opteron Dual Core Processor Cluster (72 CPUs)


Mark Hahn hahn at physics.mcmaster.ca
Wed Jan 3 07:53:35 PST 2007


> "  p1_8544: p4_error: Timeout in Establishing connection to remote process:
> 0  "
> rm_l_1_8667: (359.417969) net_send: could not write to fd=5, errno=104
>
> We have been trying the same for the past two days and we didn't get any
> solution for the above.

but what have you tried?  I would guess that this is a simple rsh config
problem, nothing to do with mpich.
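One quick way to test the rsh-config guess before touching MPICH again is to check that every node accepts a non-interactive remote command. Below is a minimal Python sketch, assuming the node names come from your machinefile and that the launcher uses plain `rsh`. If you have set it up for ssh, pass `("ssh", "-o", "BatchMode=yes")` instead so a password prompt shows up as a failure rather than a hang. The helper names are hypothetical. For what it's worth, on Linux errno=104 is ECONNRESET ("connection reset by peer").

```python
import subprocess
from typing import Callable, List, Sequence

def remote_shell_ok(host: str, shell: Sequence[str] = ("rsh",), timeout: float = 10.0,
                    run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if `<shell> <host> true` exits 0 within `timeout` seconds.

    `shell` is an assumption: plain rsh by default, or e.g. ssh with BatchMode.
    `run` is injectable so the check can be exercised without a real cluster.
    """
    cmd = [*shell, host, "true"]
    try:
        # stdin is closed so the remote shell cannot sit waiting for input.
        result = run(cmd, stdin=subprocess.DEVNULL, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Hung connection, or the remote-shell binary is missing locally.
        return False
    return result.returncode == 0

def failing_nodes(hosts: Sequence[str], shell: Sequence[str] = ("rsh",), timeout: float = 10.0,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """List the hosts (in input order) that fail the remote-shell check."""
    return [h for h in hosts if not remote_shell_ok(h, shell, timeout, run)]
```

If `failing_nodes` comes back non-empty for hosts in the machinefile, the remote shell is broken for those nodes regardless of which MPI you run on top of it.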

> Also we downloaded the latest MPICH 1.2.7p1 and configured it. Now for

but why do you think the problem lies with mpich?

> The same testing with LAM/MPI and OPENMPI are working fine.

openmpi being mostly just a successor to lam, and I think inheriting
lam's agent-based process-starting, no?

personally, I'm pretty convinced that MPI implementations should stay
out of the jobstarter business, and go with straight agentless (ssh-based)
job spawning.
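To make "agentless" concrete: the launcher keeps no daemon on the nodes, it just runs one remote-shell command per rank and waits for them all. Here is a minimal Python sketch of that idea, not how any particular MPI actually does it. The `RANK` variable and the helper names are hypothetical, and a real launcher would also have to pass along whatever the MPI library needs to wire ranks together.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

def spawn_commands(hosts: Sequence[str], argv: Sequence[str],
                   shell: Sequence[str] = ("ssh", "-o", "BatchMode=yes")) -> List[List[str]]:
    """Build one remote-shell command line per rank, rank i running on hosts[i]."""
    # Quote each argument so the remote shell sees the same argv we were given.
    remote = " ".join(shlex.quote(a) for a in argv)
    # RANK is a hypothetical variable name, set via the POSIX `env` utility.
    return [[*shell, host, f"env RANK={rank} {remote}"] for rank, host in enumerate(hosts)]

def spawn_all(hosts: Sequence[str], argv: Sequence[str],
              shell: Sequence[str] = ("ssh", "-o", "BatchMode=yes"),
              popen: Callable[..., subprocess.Popen] = subprocess.Popen) -> int:
    """Start every rank in parallel, wait for all, return the first nonzero exit code (else 0)."""
    procs = [popen(cmd) for cmd in spawn_commands(hosts, argv, shell)]
    codes = [p.wait() for p in procs]
    return next((c for c in codes if c != 0), 0)
```

The point of the design is that nothing beyond a working remote shell has to be running on the nodes, so there is no agent to get out of sync or leave behind.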


