[Beowulf] MPICH2: Handle Limit?

Rob Ross rross at mcs.anl.gov
Sat Feb 5 08:05:04 PST 2005


Hi Ron,

Well there *is* a limit, because the handles are represented by an 
integer, but from a practical perspective you should never have to worry 
about it.

I have never encountered this before.  I wrote most of that code, so I
would very much like to figure out what is happening in your case.  I tend
to agree that it is probably some sort of buffer overrun.  We test on IA32
with gcc as our primary environment.

What exactly is happening when it "bombs"?  Are you getting a segfault?  
Is this something where you could capture a core file and get a stack 
trace?  Are there any errors reported?

Will the problem manifest itself with a single-process run?  If so, you 
could try valgrind.

Actually, while we're discussing it, why do you need "lots" of datatypes 
to exchange ghost cells?  There might be a way to simplify that too.

Regards,

Rob

On Fri, 4 Feb 2005, R Hamann wrote:

> I thought any limit would be weird, let alone something like 84 (7 X 
> 12?)  Anyway, I thought it was based on the number of MPI variables 
> declared (data_types, windows, requests) because every time I added 
> new declarations, it would hang on Fedora Core 2, but run to 
> completion on Scyld (but with erroneous results). If I deleted unused 
> MPI declarations, it would start to work again.  I counted all my 
> handles and came up with 84.
> 
> However, after deleting two 26-element arrays of handles, I thought it 
> would work.  When I added more handles, it bombed again.  I started to 
> try other things.  I added 4 junk ints.  I didn't use the variables I 
> declared, but it still bombed.  When I converted them to chars, it 
> started working again.  Very strange.
> 
> Have you ever encountered this before?   I'm doing a 3D cellular 
> automaton, so I need a lot of datatypes for exchange of ghost cells. 
> It's obviously some strange error I've made that's manifesting itself 
> in MPI instead of as a runtime or syntax error.  I'm gonna try looking for 
> any buffer overruns now, but other than that I'm stumped.
> 
> GCC on Fedora Core 2 and on Scyld Beowulf
> MPICH2 1.0
> 
> Thanks,
> 
> R


