[Beowulf] debugging

Naoya Maruyama naoya.maruyama at is.titech.ac.jp
Thu Apr 12 09:14:22 PDT 2007

On 4/12/07, Ashley Pittman <ashley at quadrics.com> wrote:
> On Mon, 2007-04-09 at 11:30 -0600, Matt Funk wrote:
> > The reason I want to run on 32 processors, though, is that it takes (on
> > 32 procs) several hours until my program crashes. Also, I would like to
> > keep the conditions under which it crashes intact as much as possible
> > (i.e. run on 32 procs rather than 1).
> >
> > Does anyone have any advice? I am open to trying out other things as well
> > if possible. I am just starting to learn debugging techniques for
> > parallel programs.
> What you are trying to do isn't uncommon; some of us do it most days.
> Having a job which exhibits the problem with only 32 procs and several
> hours isn't a bad reproducer; I've certainly seen much worse.  Debugging
> at this scale isn't exactly interactive, but it's small enough to be able
> to make timely progress.
> My advice would be first and foremost to look at the core file. I assume
> your program is receiving a SEGV and exiting?  Core files can be
> problematic, partly because they aren't always enabled and partly
> because to extract anything useful out of them you need to run the
> debugger with the same environment as the application had. This isn't
> always as easy as it sounds if you are using modules or something like
> that.

One question. When the debuggee app is a 32-PE MPI job, you end up
with 32 core files. Would you check each of them manually? Or do you
have any trick to parallelize the checking process? Say, using a
parallel debugger?
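
To make the question concrete, here is a minimal sketch of one possible
way to automate the check, assuming gdb is installed where the core
files live. It runs gdb in batch mode on each core and groups the cores
by the function names in their backtraces, so identical crashes only
need to be inspected once. The helper names, the simple frame parsing,
and the file names in the usage comment are illustrative assumptions,
not something taken from this thread.

```python
import subprocess
from collections import defaultdict


def backtrace(exe, core):
    """Run gdb non-interactively on one core file and return its output."""
    result = subprocess.run(
        ["gdb", "-batch", "-ex", "bt", exe, core],
        capture_output=True,
        text=True,
    )
    return result.stdout


def frame_signature(bt_text):
    """Reduce a gdb backtrace to a tuple of function names, one per frame.

    Assumes the usual frame layouts:
      "#0  func (args) at file:line"
      "#1  0xADDR in func (args) at file:line"
    """
    names = []
    for line in bt_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].startswith("#"):
            continue  # not a frame line
        if len(tokens) >= 4 and tokens[2] == "in":
            names.append(tokens[3])
        else:
            names.append(tokens[1])
    return tuple(names)


def group_by_signature(backtraces):
    """Map each distinct call-stack signature to the sorted cores showing it."""
    groups = defaultdict(list)
    for core, text in backtraces.items():
        groups[frame_signature(text)].append(core)
    return {sig: sorted(cores) for sig, cores in groups.items()}


# Hypothetical usage; "./a.out" and the "core.*" pattern are placeholders:
#   import glob
#   bts = {c: backtrace("./a.out", c) for c in glob.glob("core.*")}
#   for sig, cores in group_by_signature(bts).items():
#       print(len(cores), "core(s):", " <- ".join(sig))
```

With 32 cores this would typically leave only a handful of distinct
stacks to look at by hand. It does not address the environment issue
mentioned above; gdb would still have to be started with the same
environment the application had.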

Naoya Maruyama
Tokyo Institute of Technology
