[Beowulf] MPICH2: Handle Limit?
Rob Ross rross at mcs.anl.gov
Sat Feb 5 08:05:04 PST 2005
- Previous message: [Beowulf] MPICH2: Handle Limit?
- Next message: [Beowulf] Home beowulf - NIC latencies
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Ron,

Well, there *is* a limit, because the handles are represented by an integer, but from a practical perspective you should never have to worry about it. I have never encountered this before. I wrote most of that code, so I would very much like to figure out what is happening in your case. I tend to agree that it is probably some sort of buffer overrun. We test on IA32 with gcc as our primary environment.

What exactly is happening when it "bombs"? Are you getting a segfault? Is this something where you could capture a core file and get a stack trace? Are any errors reported? Does the problem manifest itself in a single-process run? If so, you could try valgrind.

Actually, while we're discussing it, why do you need "lots" of datatypes to exchange ghost cells? There might be a way to simplify that too.

Regards,

Rob

On Fri, 4 Feb 2005, R Hamann wrote:

> I thought any limit would be weird, let alone something like 84 (7 X
> 12?). Anyway, I thought it was based on the number of MPI variables
> declared (data_types, windows, requests) because every time I added
> new declarations, it would hang on Fedora Core 2, but run to
> completion on Scyld (but with erroneous results). If I deleted unused
> MPI declarations, it would start to work again. I counted all my
> handles and came up with 84.
>
> However, after deleting two 26-element arrays of handles, I thought it
> would work. When I added more handles, it bombed again. I started to
> try other things. I added 4 junk ints. I didn't use the variables I
> declared, but it still bombed. When I converted them to chars, it
> started working again. Very strange.
>
> Have you ever encountered this before? I'm doing a 3D cellular
> automaton, so I need a lot of datatypes for exchange of ghost cells.
> It's obviously some strange error I've made that's manifesting itself
> in MPI instead of a runtime or syntax error. I'm going to try looking for
> any buffer overruns now, but other than that I'm stumped.
>
> GCC on Fedora Core 2 and on Scyld Beowulf
> MPICH 2 1.0
>
> Thanks,
>
> R
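Rob's question about why so many datatypes are needed can be made concrete with a minimal sketch. It assumes (R's actual data layout is not described in the thread) that each process stores its block of the automaton as one contiguous, row-major 3D array whose extents already include the ghost layers. Under that assumption every face, meaning every plane with a fixed index along one axis, is a single strided vector, so its (count, blocklength, stride) triple could be handed to `MPI_Type_vector`, and the choice of plane comes from offsetting the buffer pointer. Three datatypes, one per axis, would then cover sending and receiving on all six faces. The sketch is in Python only to check the index arithmetic; the helper names (`face_layout`, `face_start`, `expand`, `brute_force_face`) are hypothetical and not part of MPICH2.

```python
from typing import List, NamedTuple, Tuple

Shape = Tuple[int, int, int]  # (nx, ny, nz), ghost layers included (assumption)


class VectorLayout(NamedTuple):
    """The (count, blocklength, stride) triple, in elements, that would be
    passed to MPI_Type_vector to describe one face."""
    count: int
    blocklength: int
    stride: int


def face_layout(dim: int, shape: Shape) -> VectorLayout:
    """Layout of a plane with a fixed index along axis `dim` in a
    row-major (C-ordered) nx x ny x nz array."""
    nx, ny, nz = shape
    if dim == 0:  # fixed i: the whole ny*nz plane is one contiguous block
        return VectorLayout(1, ny * nz, ny * nz)
    if dim == 1:  # fixed j: one run of nz elements per i, ny*nz apart
        return VectorLayout(nx, nz, ny * nz)
    if dim == 2:  # fixed k: nx*ny single elements, nz apart
        return VectorLayout(nx * ny, 1, nz)
    raise ValueError("dim must be 0, 1 or 2")


def face_start(dim: int, index: int, shape: Shape) -> int:
    """Linear offset (in elements) of the first element of plane `index`."""
    _, ny, nz = shape
    return index * (ny * nz, nz, 1)[dim]


def expand(layout: VectorLayout, start: int) -> List[int]:
    """Linear offsets a vector type with this layout would touch from `start`."""
    return [start + b * layout.stride + e
            for b in range(layout.count)
            for e in range(layout.blocklength)]


def brute_force_face(dim: int, index: int, shape: Shape) -> List[int]:
    """Reference answer: walk every cell in memory order, keep the plane."""
    nx, ny, nz = shape
    return [(i * ny + j) * nz + k
            for i in range(nx) for j in range(ny) for k in range(nz)
            if (i, j, k)[dim] == index]


if __name__ == "__main__":
    shape = (6, 5, 4)  # e.g. a 4 x 3 x 2 interior plus one ghost layer per side
    for dim in range(3):
        for index in range(shape[dim]):
            # the strided description must hit exactly the cells of the plane
            assert (expand(face_layout(dim, shape), face_start(dim, index, shape))
                    == brute_force_face(dim, index, shape))
    print([face_layout(d, shape) for d in range(3)])
```

For example, with `shape = (6, 5, 4)` and one ghost layer on each side, the first interior x-plane (`face_start(0, 1, shape)`) and the low ghost x-plane (`face_start(0, 0, shape)`) share the same `face_layout(0, shape)`; only the starting offset differs. Exchanging the axes one after another, with each face spanning the full extent including ghosts, is one common way to fill edge and corner ghost cells as well.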
More information about the Beowulf mailing list
