[Beowulf] p4_error: net_recv read: probable EOF on socket: 1
hahn at physics.mcmaster.ca
Mon May 8 11:04:34 PDT 2006
> p4_error:interrupt SIGSEGV: 11
well, some process touched memory it had no right to access.
note that this _can_ be due to hardware problems (overheating,
bad memory, etc), not only to a bug in the code.
> p4_error: net_recv read: probable EOF on socket: 1
afaik, this comes from a different node, and just means that node
noticed its socket to the peer that SEGV'ed had been closed.
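
to illustrate why the second message is only a symptom, here is a minimal
sketch in plain python (not mpich/p4 code; the function name and the
socketpair setup are made up for illustration, and it assumes a unix-like
system with fork). a child process stands in for the rank that SEGVs, and
the parent stands in for the surviving rank. when the child dies, the kernel
closes its end of the socket, and the parent's read returns zero bytes,
which is the end-of-file condition that net_recv reports.

```python
import os
import resource
import signal
import socket


def peer_death_looks_like_eof():
    """Fork a peer that dies of SIGSEGV; return (bytes_read, peer_signal)."""
    ours, theirs = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: hypothetical stand-in for the rank that crashes.
        ours.close()
        # Avoid leaving a core file behind from this demonstration.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        # Make sure the default action (terminate) applies, then crash.
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGSEGV)
        os._exit(1)  # only reached if the signal somehow did not kill us

    # Parent: stand-in for the surviving rank.
    # Close our copy of the child's end, otherwise EOF never arrives.
    theirs.close()
    data = ours.recv(1024)  # zero bytes here means the peer's end is closed
    _, status = os.waitpid(pid, 0)
    ours.close()
    sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return data, sig


if __name__ == "__main__":
    data, sig = peer_death_looks_like_eof()
    print("bytes read from dead peer:", data)
    print("peer was killed by signal:", sig)
```

the point of the sketch: the survivor sees nothing but a closed socket, so
the node to look at is the one that reported the SIGSEGV, not the one that
reported the EOF.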
> This error occurs after running the code for several hours using all
> processors in my cluster. I have seen several postings similar to this
> on the web, however, I have not seen any posted solutions. My
for a good reason - the problem is probably particular to the cluster,
not general to the software...
> Mpich_1.2.1 compiled w/ Portland compilers
that said, it seems unwise to be running such an old version.
wow, that actually dates from 09/05/2000, at least according to the
timestamps on the mpich ftp server...