[Beowulf] Kill zombies after a parallel run
David Kewley kewley at gps.caltech.edu
Tue May 2 13:02:46 PDT 2006
I don't have a solution for your case, but here's an idea: MPICH-GM (MPICH
for the Myrinet GM protocol) has an option to mpirun.ch_gm that would do
what you want, if you were running Myrinet/GM:
--gm-kill <n> Kill all processes <n> seconds after the first exits.
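The --gm-kill behaviour is simple enough to sketch. What follows is a hypothetical single-host illustration in Python, not how mpirun.ch_gm actually implements it, and the function name run_with_kill_timeout is made up. It starts every process, waits for the first one to exit, gives the rest up to n seconds, then kills any survivors. Real MPI ranks on other nodes would also need a remote kill (via rsh/ssh or the MPI daemons), which this sketch does not attempt.

```python
import subprocess
import sys
import time


def run_with_kill_timeout(commands, kill_after, poll_interval=0.05):
    """Hypothetical helper mimicking '--gm-kill <n>' on one host: start every
    command, wait for the first one to exit, give the rest up to kill_after
    seconds, then kill whatever is still running.
    Returns the exit codes in the order the commands were given."""
    procs = [subprocess.Popen(cmd) for cmd in commands]

    # Phase 1: block until at least one process has exited.
    while all(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    # Phase 2: grace period for the survivors to finish on their own.
    deadline = time.monotonic() + kill_after
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    # Phase 3: kill stragglers (SIGKILL on POSIX) and reap every child.
    for p in procs:
        if p.poll() is None:
            p.kill()
    return [p.wait() for p in procs]


if __name__ == "__main__":
    py = sys.executable
    codes = run_with_kill_timeout(
        [[py, "-c", "import sys; sys.exit(1)"],       # "crashes" immediately
         [py, "-c", "import time; time.sleep(60)"]],  # would hang for a minute
        kill_after=2.0)
    print(codes)  # on POSIX a negative code means "killed by a signal"
```

Note that, like the option it imitates, this also fires after a normal run: once the first process finishes cleanly, the others still get only n seconds.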
Other than that, a resource manager may do what you want -- our resource
manager, LSF, does this for us. It even mostly works. :)
David
On Tuesday 02 May 2006 00:49, mg wrote:
> Hi all,
>
> I use MPICH-1.2.5.2 to generate and run an FEM parallel application.
>
> During a parallel run, one process can crash, leaving the other
> processes running, and OS commands have to be used to kill these zombies.
> So, does anyone have a solution to avoid zombies after a failed
> parallel run? Can the crashed process kill the other processes?
>
> Thanks,
> Mathieu
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
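On the question of whether the crashed process can kill the others: within MPI, the standard call for a rank to take down the whole job is MPI_Abort(MPI_COMM_WORLD, errorcode), which the standard describes as a best attempt, so how thoroughly remote ranks are cleaned up depends on the implementation. It only helps when the failing rank still gets to run its error path; a rank killed outright by a signal such as SIGSEGV never calls it, which is why launcher-side or resource-manager cleanup is still useful. The same "failing process takes its siblings down" idea can be sketched on a single POSIX host with a shared process group. This is a hypothetical Python illustration: WORKER, launch_group, and the simulated crash are made up for the example.

```python
import os
import signal
import subprocess
import sys

# Hypothetical worker: the last rank fails at once, the others would
# otherwise sit "computing" for a minute. On failure a rank sends SIGTERM
# to its whole process group, which includes itself and all sibling ranks.
WORKER = r"""
import os, signal, sys, time
rank, nranks = int(sys.argv[1]), int(sys.argv[2])
try:
    if rank == nranks - 1:
        raise RuntimeError("simulated crash")
    time.sleep(60)
except Exception as exc:
    print(f"rank {rank}: {exc}; killing process group", file=sys.stderr)
    os.killpg(os.getpgrp(), signal.SIGTERM)
"""


def launch_group(nranks):
    """Start nranks workers in one new process group (POSIX only) and
    return their exit codes. Rank 0 becomes the group leader; later ranks
    join its group. The launcher stays outside the group, so it survives."""
    procs = []
    for rank in range(nranks):
        pgid = procs[0].pid if procs else 0  # 0 => "create a new group"
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER, str(rank), str(nranks)],
            preexec_fn=lambda pgid=pgid: os.setpgid(0, pgid)))
    return [p.wait() for p in procs]


if __name__ == "__main__":
    print(launch_group(3))  # negative codes mean "killed by a signal"
```

The failing rank is deliberately the last one started, so every sibling has already joined the group before anyone calls killpg.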
