[Beowulf] Kill zombies after a parallel run

Toon Knapen toon.knapen at fft.be
Mon May 8 05:31:09 PDT 2006


Peter Jakobi kindly just gave me the following reply:


There's

- zap: the kill example in Larry Wall's Perl book
   # interactive verification; selects processes by
   # any regular expression matching any
   # string in the output of ps -ef, ...

   # I tend to keep hacking my ancient copy of this,
   # so my current copy can be run non-interactively,
   # kill children, kill per tty (carefully craft your
   # regex; otherwise DO NOT use it with -y, or you may
   # randomly kill the wrong processes!!!), or list/nice
   # processes instead of killing them.

   # for a short while, I've put a copy here:
   http://www.oa.shuttle.de/kefk/tmp/zap
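For anyone without the Perl book at hand, the idea behind zap as described above can be sketched in Python. This is not zap itself, just a minimal sketch assuming a ps that accepts -ef and prints the PID in the second column; select_pids and zap are hypothetical names. Each line of ps -ef output is matched against the regex, the user confirms each hit, and the PID is signalled.

```python
import os
import re
import signal
import subprocess


def select_pids(ps_output, pattern, exclude=()):
    """Return (pid, line) pairs for `ps -ef` lines that match `pattern`.

    The regex is tried against the whole output line, so it can match
    the user, tty, command name, arguments, ...
    """
    regex = re.compile(pattern)
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # UID, PID, rest of the line
        # Skips the header ("UID PID ...") and anything unparsable.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        pid = int(fields[1])
        if pid not in exclude and regex.search(line):
            matches.append((pid, line))
    return matches


def zap(pattern, sig=signal.SIGTERM, interactive=True):
    """Hypothetical zap-like helper: confirm each match, then signal it."""
    ps_output = subprocess.run(["ps", "-ef"], capture_output=True,
                               text=True, check=True).stdout
    # Never offer to kill this script or the shell that started it.
    for pid, line in select_pids(ps_output, pattern,
                                 exclude={os.getpid(), os.getppid()}):
        if interactive and input(f"{line} ? [y/N] ").strip().lower() != "y":
            continue
        try:
            os.kill(pid, sig)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to kill
```

Calling zap(r"mpi_solver", interactive=False) would be the -y style use warned about above, so the regex had better be tight.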



Non-interactive and a bit heavy-handed:
- killall: by name; can also kill according to PGID (process group)
- killproc: by name of executable; -G also includes children in
   the current process group or session (check whether these
   are identical?); -g also kills the other processes
   in the group.
   # you can also get the list of processes
   # that use a specific file via lsof, then pass the
   # PIDs to kill. This is quick, and PID reuse hopefully
   # doesn't occur within a few seconds. You'd need to
   # check the kernel to be certain that this is
   # the case (any other kernel behaviour I'd consider
   # a bug).
- skill/snice
   # add selection by tty, command, ...; but command
   # selection still uses only the binary name, as with killproc.
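The lsof route mentioned under killproc could look roughly like the Python sketch below. It relies on lsof's -t option, which prints only PIDs, one per line; parse_pids, pids_using and kill_users_of are hypothetical helper names. The gap between running lsof and calling kill is exactly the window where the PID-reuse concern above applies.

```python
import os
import signal
import subprocess


def parse_pids(text):
    """Turn `lsof -t` output (one PID per line) into a sorted, unique list."""
    return sorted({int(tok) for tok in text.split() if tok.isdigit()})


def pids_using(path):
    """PIDs of processes that have `path` open, according to lsof.

    lsof exits with status 1 when nothing matches, so check=True is not used.
    Raises FileNotFoundError if lsof is not installed.
    """
    result = subprocess.run(["lsof", "-t", path], capture_output=True, text=True)
    return parse_pids(result.stdout)


def kill_users_of(path, sig=signal.SIGTERM):
    """Signal every process using `path`; return the PIDs actually signalled."""
    signalled = []
    for pid in pids_using(path):
        if pid == os.getpid():
            continue  # never signal ourselves
        try:
            # Race: the PID may have exited (or been reused) since lsof ran.
            os.kill(pid, sig)
            signalled.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or owned by another user
    return signalled
```

For example, kill_users_of("/scratch/run42/output.dat") (a made-up path) would signal every process still holding that output file of a dead run.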


 > I think what the OP is asking is how to kill (automagically) all
 > processes in a parallel run once one process has crashed (due to a
 > segmentation fault or something).

 > Generally, if one process (in the whole bunch of processes) crashes,
 > all other processes will wait forever from the moment they try to
 > communicate with the crashed process, or at MPI_Finalize. So how can
 > one kill all remaining processes?
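For processes on a single node, the behaviour asked about can be sketched in Python with a hypothetical local launcher, run_all_or_none (not part of any MPI implementation). It starts every process, polls them, and as soon as one exits abnormally (non-zero status, or killed by a signal such as SIGSEGV) sends SIGTERM, then SIGKILL after a grace period, to the rest. Each child gets its own session via start_new_session=True, so os.killpg also reaches descendants that stay in its process group. Ranks started on remote nodes by mpirun over rsh/ssh are not covered; there you would depend on the MPI launcher or a per-node equivalent.

```python
import os
import signal
import subprocess
import sys
import time


def _signal_live_groups(procs, sig):
    """Send sig to the process group of every child not yet reaped."""
    for p in procs:
        if p.poll() is None:  # unreaped, so p.pid cannot have been reused
            try:
                os.killpg(p.pid, sig)
            except ProcessLookupError:
                pass


def run_all_or_none(commands, poll_interval=0.1, grace=5.0):
    """Run commands concurrently; if any fails, terminate the others.

    Returns exit codes in the order of `commands`; a negative value -N
    means the process was killed by signal N (e.g. -11 for SIGSEGV).
    """
    # start_new_session=True makes each child a session and process-group
    # leader (PGID == PID), so killpg(p.pid) also hits its descendants.
    procs = [subprocess.Popen(cmd, start_new_session=True) for cmd in commands]
    while True:
        codes = [p.poll() for p in procs]
        if all(c is not None for c in codes):
            return codes  # everything finished on its own
        if any(c not in (None, 0) for c in codes):
            break  # at least one process crashed or failed
        time.sleep(poll_interval)

    _signal_live_groups(procs, signal.SIGTERM)  # polite request first
    deadline = time.monotonic() + grace
    for p in procs:
        try:
            p.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            pass
    _signal_live_groups(procs, signal.SIGKILL)  # then no more waiting
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Demo: the second "rank" fails, so the sleeping first one is terminated.
    print(run_all_or_none([
        [sys.executable, "-c", "import time; time.sleep(30)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]))
```

Polling is used instead of os.wait() so that the Popen objects stay responsible for reaping their own children; signalling only unreaped children also sidesteps the PID-reuse worry raised above.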
