broken pipe at MPI startup

Qian Peng peng at apollo.gat.com
Wed Jan 10 18:52:37 PST 2001


I had a small cluster of 6 dual-processor nodes and recently doubled its
size.  A program that ran fine on the original 12 processors now
occasionally fails with a "Command terminated on signal 13" error at the
mpirun level when I run it on all 24 processors.  The broken pipe occurs
while mpirun is trying to start the executables.  If I use only 12
processors, whether both processors on 6 nodes or one processor on each of
the 12 nodes, I cannot reproduce the error.  I'm using mpirun with ssh.
Which node fails, and when, appears to be random.  Any insights into what
the possible causes may be?  Thanks,

-Qian Peng
General Atomics






