how to tell when jobs are finished
ron_chen_123 at yahoo.com
Sat Aug 4 13:44:43 PDT 2001
Why don't you try catching SIGCHLD?
When a child exits, the parent receives SIGCHLD:
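void sh(int sig) /* signal handler */
{
    int status; pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) /* one signal may cover several exits */
        printf("pid = %d status = %d\n", (int)pid, WEXITSTATUS(status));
}
signal(SIGCHLD, sh); /* install handler (in main) */
for (i = 0; i < 10; i++) /* parent works on other stuff */
    sleep(10);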
The parent (main function) doesn't have to call sleep(10); I
used it as a hack. In your case, you should call select() to
listen for new execution requests from the cluster
master (like qmaster in SGE).
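One way to combine the select() suggestion with SIGCHLD is the "self-pipe trick" (a standard technique, not something proposed in this thread): the handler only writes a byte into a pipe, and the pipe's read end sits in the same fd set as the connection from the master. A minimal sketch, assuming a hypothetical master_fd (pass -1 to leave it out) and hypothetical helpers setup_self_pipe(), event_loop() and demo(); handling of actual execution requests is left as a stub.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int self_pipe[2] = {-1, -1};    /* [0] = read end, [1] = write end */

/* SIGCHLD handler: just write one byte; write() is async-signal-safe. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    char c = 'x';
    ssize_t r = write(self_pipe[1], &c, 1); /* if the pipe is full, a wakeup is pending anyway */
    (void)r;
    (void)sig;
    errno = saved_errno;
}

/* Create a non-blocking, close-on-exec pipe and install the handler. */
static int setup_self_pipe(void)
{
    struct sigaction sa;
    if (pipe(self_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(self_pipe[i], F_GETFL);
        fcntl(self_pipe[i], F_SETFL, fl | O_NONBLOCK);
        fcntl(self_pipe[i], F_SETFD, FD_CLOEXEC);
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;        /* report exits, not stops */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Block in select() on the self-pipe and, if master_fd >= 0, on the
   (hypothetical) master connection. Returns the number of children reaped. */
static int event_loop(int master_fd, int expected)
{
    int reaped = 0;
    assert(expected >= 0);
    while (reaped < expected) {
        fd_set rfds;
        int maxfd = self_pipe[0];
        FD_ZERO(&rfds);
        FD_SET(self_pipe[0], &rfds);
        if (master_fd >= 0) {
            FD_SET(master_fd, &rfds);
            if (master_fd > maxfd)
                maxfd = master_fd;
        }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;              /* the handler interrupted select(): retry */
            return -1;
        }
        if (FD_ISSET(self_pipe[0], &rfds)) {
            char buf[64];
            int status;
            while (read(self_pipe[0], buf, sizeof buf) > 0)
                ;                      /* drain the wakeup bytes */
            while (waitpid(-1, &status, WNOHANG) > 0)
                reaped++;              /* one wakeup may cover several exits */
        }
        if (master_fd >= 0 && FD_ISSET(master_fd, &rfds)) {
            /* read and start a new execution request here (not shown) */
        }
    }
    return reaped;
}

/* Demo: fork n children that exit at once, then collect them via select(). */
static int demo(int n)
{
    int launched = 0;
    if (setup_self_pipe() == -1)
        return -1;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                  /* child: nothing to do */
        if (pid > 0)
            launched++;
    }
    return event_loop(-1, launched);
}
```

demo(4) forks four children that exit immediately and returns 4 once select() has reported all of them. Because the byte stays in the pipe until it is read, an exit that happens before select() is called is not lost.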
The SIGCHLD program prints the pid and exit status of each child:
Hello pid = 19443 status = 0
Hello pid = 19444 status = 1
Hello pid = 19445 status = 2
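The fragment above leaves out main() and the children. A complete version might look like the sketch below, assuming (since those lines are missing) that each child prints "Hello " and exits with its loop index, which matches the output shown. It swaps signal() for sigaction(), whose semantics are portable, and moves printf() out of the handler because printf() is not async-signal-safe; run_children() and the reaped_* arrays are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

/* Filled in by the handler; main code reads them only with SIGCHLD blocked. */
static pid_t reaped_pid[MAX_CHILDREN];
static int reaped_status[MAX_CHILDREN];
static volatile sig_atomic_t nreaped = 0;

/* SIGCHLD handler: one signal may stand for several exits, so reap in a loop. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;           /* waitpid() may change errno */
    int status;
    pid_t pid;
    (void)sig;
    while (nreaped < MAX_CHILDREN && (pid = waitpid(-1, &status, WNOHANG)) > 0) {
        reaped_pid[nreaped] = pid;
        reaped_status[nreaped] = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        nreaped++;
    }
    errno = saved_errno;
}

/* Fork n children; child i prints "Hello " and exits with status i.
   Returns the number of children reaped. */
static int run_children(int n)
{
    struct sigaction sa;
    sigset_t block, old, waitmask;
    int launched = 0;

    assert(n >= 0 && n <= MAX_CHILDREN);
    nreaped = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD until we are ready to wait, so no exit slips past us. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);

    fflush(stdout);                    /* don't duplicate buffered output in children */
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {                /* child */
            printf("Hello ");
            fflush(stdout);
            _exit(i);
        }
        launched++;
    }

    /* The parent could do other work here; this demo just waits. */
    while (nreaped < launched)
        sigsuspend(&waitmask);         /* unblock SIGCHLD and sleep atomically */
    sigprocmask(SIG_SETMASK, &old, NULL);

    for (int i = 0; i < nreaped; i++)
        printf("pid = %d status = %d\n", (int)reaped_pid[i], reaped_status[i]);
    return nreaped;
}
```

Calling run_children(3) reaps three children with statuses 0, 1 and 2, though possibly in a different order. Since all children are forked before the parent waits, the "Hello " strings come out together before the pid lines instead of interleaved as in the output above.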
Just wanted to point out one thing here:
we are working at the process level. We need more work
to trace the process tree to make sure that we can get
the information about the job (which consists of one
or more processes). Currently, if the child forks
another child, we won't know when that grandchild exits.
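For the process-tree problem, one option (not from this thread, and Linux-specific: kernel 3.4 or later) is prctl(PR_SET_CHILD_SUBREAPER). It makes a process the adoptive parent of its orphaned descendants, so a per-job monitor can keep calling waitpid(-1, ...) until it gets ECHILD, at which point every process in the job's tree has exited. A sketch under those assumptions; run_job(), run_job_tree() and the two-process demo job are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job: the leader forks a helper and exits without waiting for
   it, exactly the case a plain waitpid() on the leader cannot see. */
static void run_job(void)
{
    pid_t helper = fork();
    if (helper == 0) {                 /* grandchild: outlives the leader */
        sleep(1);
        _exit(0);
    }
    _exit(0);                          /* leader exits; helper is orphaned */
}

/* Fork a monitor that adopts the job's orphans and reaps until ECHILD.
   Returns how many processes of the job tree were reaped, or -1 on error. */
static int run_job_tree(void)
{
    int status;
    pid_t monitor = fork();
    if (monitor == -1)
        return -1;
    if (monitor == 0) {
        int count = 0;
        /* Orphaned descendants are reparented to us instead of to init. */
        if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) == -1)
            _exit(255);
        pid_t leader = fork();
        if (leader == -1)
            _exit(255);
        if (leader == 0)
            run_job();                 /* never returns */
        for (;;) {
            pid_t p = waitpid(-1, &status, 0);
            if (p > 0)
                count++;
            else if (errno != EINTR)
                break;                 /* no children left at all */
        }
        assert(errno == ECHILD);       /* the whole job tree has exited */
        _exit(count);
    }
    if (waitpid(monitor, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

run_job_tree() returns 2: the job leader plus the grandchild it left behind, which the monitor reaps about a second later. On systems without this feature, putting each job in its own process group with setpgid() at least lets the master signal the whole tree at once.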
--- Nicholas Henke <henken at seas.upenn.edu> wrote:
> Thanks guys for the input--
> I think the signal handler with a polling mechanism
> should work very nicely.
> > Hi,
> > I agree with Sean about the use of waitpid(). About the daemon, well,
> > I think it is not necessary. If I have not misunderstood you, what you
> > want to do is execute a certain number of programs and know when any
> > one of those programs has exited.
> > Here is my proposal, step by step:
> > 1. Read the list of programs to execute somehow.
> > 2. For each program to run, create a child using fork() (the master
> >    creates all the children).
> > 3. (Optional) Redirect each child's output to some file.
> > 4. Get each child's pid via some IPC mechanism (a pipe will do) and
> >    store the pids in an array or something (I would use a linked
> >    list, or a search tree if you will have lots of pids).
> > 5. Finally, each child calls exec*() and replaces its memory image
> >    with the desired program, that is, executes the program.
> > 6. Now you have to know when each of the programs you executed has
> >    exited. For simplicity, let's assume you want to printf something
> >    like "Hey! PID XXXX finished!". You can do this in two ways (to my
> >    knowledge):
> >    a) Loop until all the programs have exited, using waitpid() with
> >       WNOHANG to poll each pid. Advantage: simple. Disadvantage: you
> >       can't do other productive things while waiting.
> >    b) Set a signal handler for the alarm signal (SIGALRM) and test,
> >       say, each second for completion of one of the pids in your
> >       list. On completion, print a message and remove that pid from
> >       the list. Disable the signal callback when the list is empty,
> >       and re-enable it when the list has at least one element.
> >       Advantages: you can do other productive things, like launching
> >       more processes. Disadvantage: a little more complicated. If you
> >       use this option, see sigaction(2) and signal(2).
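Steps 2 and 5 plus option (a) could be sketched roughly as follows. One simplification: fork() already returns the child's pid to the parent, so step 4's pipe isn't needed just to learn the pid. struct job, launch_all(), poll_once() and wait_all() are hypothetical names, and the 0.1 s pause between polling passes is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_JOBS 64

struct job {
    pid_t pid;       /* fork() hands this to the parent directly */
    int running;     /* 1 until waitpid() reports the exit */
    int status;      /* exit status, or -1 if killed by a signal or lost */
};

/* Steps 2 and 5: fork one child per command and exec the program in it. */
static int launch_all(char *const *cmds[], int n, struct job jobs[])
{
    int launched = 0;
    assert(n >= 0 && n <= MAX_JOBS);
    fflush(stdout);                    /* avoid duplicating buffered output */
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                     /* out of processes: stop launching */
        if (pid == 0) {
            execvp(cmds[i][0], cmds[i]);
            _exit(127);                /* exec failed (the shell uses 127 too) */
        }
        jobs[launched].pid = pid;
        jobs[launched].running = 1;
        jobs[launched].status = -1;
        launched++;
    }
    return launched;
}

/* Option (a), one pass: check each running pid with WNOHANG. */
static int poll_once(struct job jobs[], int n)
{
    int finished = 0;
    for (int i = 0; i < n; i++) {
        int status;
        pid_t r;
        if (!jobs[i].running)
            continue;
        r = waitpid(jobs[i].pid, &status, WNOHANG);
        if (r == jobs[i].pid)
            jobs[i].status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        else if (r == -1 && errno != EINTR)
            jobs[i].status = -1;       /* e.g. ECHILD: reaped elsewhere */
        else
            continue;                  /* still running (or interrupted) */
        jobs[i].running = 0;
        printf("Hey! PID %d finished!\n", (int)jobs[i].pid);
        finished++;
    }
    return finished;
}

/* Option (a): loop until every job has exited, pausing between passes. */
static void wait_all(struct job jobs[], int n)
{
    const struct timespec gap = { .tv_sec = 0, .tv_nsec = 100000000L };
    int left = n;
    while (left > 0) {
        left -= poll_once(jobs, n);
        if (left > 0)
            nanosleep(&gap, NULL);
    }
}
```

With the commands `sh -c 'exit 0'` and `sh -c 'exit 3'`, launch_all() returns 2 and, after wait_all(), the two jobs hold statuses 0 and 3. The wait_all() loop is exactly where the "can't do other productive things" disadvantage shows up.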
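Option (b) might look like the sketch below (a hypothetical design, not the poster's code). The handler only calls waitpid() and alarm(), which are async-signal-safe; it re-arms the one-second alarm only while the list is non-empty, and track() blocks SIGALRM while it adds a pid, so the handler never sees a half-updated list. The names on_alarm(), track() and demo() are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PIDS 64

/* Option (b)'s list of running pids. Once the alarm is armed, main code
   touches it only with SIGALRM blocked. */
static pid_t pids[MAX_PIDS];
static volatile sig_atomic_t npids = 0;
static volatile sig_atomic_t nfinished = 0;

/* SIGALRM handler: test each pid once, drop finished ones, re-arm if needed. */
static void on_alarm(int sig)
{
    int saved_errno = errno;
    (void)sig;
    for (int i = 0; i < npids; ) {
        int status;
        pid_t r = waitpid(pids[i], &status, WNOHANG);
        if (r == pids[i] || (r == -1 && errno == ECHILD)) {
            pids[i] = pids[npids - 1]; /* remove by moving the last entry here */
            npids--;
            nfinished++;               /* just count; printf() is unsafe here */
        } else {
            i++;
        }
    }
    if (npids > 0)
        alarm(1);                      /* list not empty: poll again in a second */
    errno = saved_errno;
}

static int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          /* restart slow calls the alarm interrupts */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Add a pid with SIGALRM blocked; arm the alarm if the list was empty. */
static void track(pid_t pid)
{
    sigset_t s, old;
    sigemptyset(&s);
    sigaddset(&s, SIGALRM);
    sigprocmask(SIG_BLOCK, &s, &old);
    assert(npids < MAX_PIDS);
    pids[npids] = pid;
    npids++;
    if (npids == 1)
        alarm(1);                      /* re-enable the polling */
    sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Demo: launch n children that exit at once, then wait while the alarm polls. */
static int demo(int n)
{
    sigset_t s, old, waitmask;
    if (install_alarm_handler() == -1)
        return -1;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            track(pid);
    }
    /* Real code would do other work here (e.g. launch more jobs). */
    sigemptyset(&s);
    sigaddset(&s, SIGALRM);
    sigprocmask(SIG_BLOCK, &s, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGALRM);
    while (npids > 0)
        sigsuspend(&waitmask);         /* sleep until the next alarm */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return nfinished;
}
```

demo(3) returns 3 once the alarm-driven polling has seen all three children exit, and the list is empty again afterwards, so the alarm is no longer re-armed.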
> Beowulf mailing list, Beowulf at beowulf.org