[Beowulf] Kill zombies after a parallel run

David Kewley kewley at gps.caltech.edu
Tue May 2 13:02:46 PDT 2006


I don't have a solution for your case, but here's an idea: MPICH-GM (MPICH 
for the Myrinet GM protocol) has an option to mpirun.ch_gm that would do 
what you want, if you were running Myrinet/GM:

     --gm-kill <n>   Kill all processes <n> seconds after the first exits.

Other than that, a resource manager may do what you want -- our resource 
manager, LSF, does this for us.  It even mostly works. :)
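
Without Myrinet/GM or a resource manager, the same "kill everything a few
seconds after the first exit" behavior can be approximated with a small
wrapper around the processes. The following is only a minimal sketch in
Python, not what mpirun.ch_gm or LSF actually do: the function name
run_and_reap and its parameters are made up for illustration, and it only
supervises processes started on the local machine, so ranks that mpirun
launches on remote nodes would need a per-node equivalent.

```python
import subprocess
import sys
import time


def run_and_reap(commands, grace_seconds=5.0, poll_interval=0.1):
    """Start every command, wait for the first one to exit, then stop the
    rest once grace_seconds have passed. Returns exit codes in launch order.

    Hypothetical helper for illustration; not part of MPICH or LSF.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    if not procs:
        return []

    # Block until at least one process has exited (cleanly or by crashing).
    while all(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    # Give the survivors a grace period to finish on their own.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    # Ask whatever is still running to terminate ...
    for p in procs:
        if p.poll() is None:
            p.terminate()

    # ... and force-kill anything that ignores the request.
    for p in procs:
        try:
            p.wait(timeout=2.0)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()

    return [p.returncode for p in procs]


if __name__ == "__main__":
    # One stand-in "rank" that crashes at once and two that would hang.
    crash = [sys.executable, "-c", "import sys; sys.exit(3)"]
    hang = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_and_reap([crash, hang, hang], grace_seconds=1.0))
```

The grace period plays the role of <n> in --gm-kill: long enough for
healthy processes to finish a normal shutdown, short enough that a crashed
run does not leave stragglers holding the nodes.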

David

On Tuesday 02 May 2006 00:49, mg wrote:
> Hi all,
>
> I use MPICH-1.2.5.2 to generate and run an FEM parallel application.
>
> During a parallel run, one process can crash, leaving the other
> processes running, and OS commands have to be used to kill these zombies.
> So, does anyone have a way to avoid zombies after a failed
> parallel run: can the crashed process kill the other processes?
>
> Thanks,
> Mathieu


