AGAIN: mpi-prog from lam -> scyld beompi DIES
Greg Lindahl lindahl at conservativecomputer.com
Mon Dec 10 13:01:27 PST 2001
- Previous message: AGAIN: mpi-prog from lam -> scyld beompi DIES
- Next message: Mass Storage and Parallel I/O -- IEEE & Wiley Press's new book
On Sat, Dec 08, 2001 at 12:36:25PM -0800, Peter Beerli wrote:

> Some time ago I asked about some problem with my mpi program and a scyld
> beowulf cluster and got no real response to it.
> - did nobody ever port a lam-mpi program onto a scyld-beowulf cluster?
> - did I miss the right keywords or what information is missing??

Well, here are a couple of clues.

1) If you really want to use gdb against the processes, and you can
convince your program to run with little enough memory that they all
fit on one CPU, you can:

    export ALL_LOCAL=1
    mpirun foo bar baz

and all the processes will run on the master. Attach gdb, enjoy.

bproc isn't a full enough emulation of /proc to run gdb remotely. If
you REALLY need to do that, you can bpsh gdb to a remote node after
bpsh ps to find out the remote PID, etc etc. If you figure this out,
do write a script for it so everyone else doesn't have to deal with
such nasty details.

2) You can always add printfs to the program. In your case I would
suggest printing out the sent and received value of buffsize, and then
another printf after the actual data arrives. My guess is that you're
somehow stomping memory in a way that's different in LAM and MPICH,
and so the free() causes a coredump.

greg
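
The printf suggestion in 2) could look roughly like the sketch below. It is
only an illustration: trace_value and alloc_recv_buffer are made-up helper
names, not part of MPI, LAM, MPICH or the original program, and the size
limit is an arbitrary assumption. The idea is to flush each trace line
right away so it survives a later crash, and to refuse an obviously bad
buffsize before it is used to allocate the receive buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print one tagged trace line to stderr and flush
   it immediately, so the line is not lost if the process dies later.
   Returns the number of characters written (negative on error). */
int trace_value(const char *stage, int rank, long value)
{
    int written = fprintf(stderr, "[rank %d] %s: %ld\n", rank, stage, value);
    fflush(stderr);
    return written;
}

/* Hypothetical helper: allocate a zeroed receive buffer of `buffsize`
   bytes. Returns NULL if buffsize is not positive, exceeds `limit`
   (an assumed sanity bound chosen by the caller), or calloc fails. */
char *alloc_recv_buffer(long buffsize, long limit)
{
    if (buffsize <= 0 || buffsize > limit) {
        fprintf(stderr, "suspicious buffsize %ld (limit %ld)\n",
                buffsize, limit);
        fflush(stderr);
        return NULL;
    }
    return calloc((size_t)buffsize, 1);
}
```

On the sending side one would call trace_value("sent buffsize", rank,
buffsize) just before the MPI_Send of the size; on the receiving side,
trace_value("received buffsize", rank, buffsize) right after the matching
MPI_Recv, then alloc_recv_buffer, and one more trace_value call once the
actual data has arrived. If the two buffsize values differ between the
LAM and MPICH runs, that points at the message exchange rather than at
free() itself.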
