[Beowulf] [EXTERNAL] Re: PBS question

Gus Correa gus at ldeo.columbia.edu
Tue Oct 29 14:42:33 PDT 2019


On Tue, Oct 29, 2019 at 4:49 PM Lux, Jim (US 337K) <james.p.lux at jpl.nasa.gov>
wrote:

> True, there’s tons of info in qstat -f, however, doesn’t qstat stop
> showing my job after it completes, though?
>

Unless the system administrator sets a grace period during which completed
jobs stay visible in the queue.
You could ask your sysadmin to set that up:

qmgr -c 'set server keep_completed=7200'

will keep completed-job information available for two hours (the value is
in seconds).
I think the default is zero seconds, or close to it.
Keeping completed jobs around increases pbs_server memory use, hence there
may be some reluctance to do this,
but for a modest amount of time on a server that is not extremely busy it
should not be too bad.
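If you are not sure what your server currently uses, qmgr can show the
setting (assuming your site allows regular users to run read-only qmgr
queries):

```shell
# Show the current value of the keep_completed server attribute.
qmgr -c 'list server keep_completed'
```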

However, most (maybe all) of this information is stored in the accounting
logs after the job completes.
Read access to those logs is, I think, blocked to regular users by default,
but maybe you can try to mollify the sysadmin policy. :)
There are three records there for each job: one when the job is submitted,
another when it starts, and one when it ends, tagged by Q, S, and E
respectively.
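If you do get access, each accounting record is a single semicolon-delimited
line that standard tools can pick apart. A fabricated example record (the
job id and values are made up for illustration):

```shell
# A made-up E (job end) record in Torque accounting-log format:
#   timestamp;record-type;job-id;key=value ...
rec='10/29/2019 14:42:33;E;95649.master;user=gus queue=production resources_used.walltime=09:36:08'

# Extract the record type (2nd ;-field) and the job id (3rd field).
type=$(printf '%s\n' "$rec" | cut -d';' -f2)
jobid=$(printf '%s\n' "$rec" | cut -d';' -f3)
echo "$type $jobid"
# -> E 95649.master
```

On a typical Torque installation the day's log would live somewhere like
server_priv/accounting/YYYYMMDD, so a grep for the job id there returns the
Q, S, and E records for that job.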


> Maybe there’s a switch that retrieves “last data”?
>
I don't know, but a longer "keep_completed" (above) may produce a similar
effect.

You can also ask the system administrator to write a PBS epilogue script
that prints the job's requested and used resources to the job's STDOUT log.
I did this here. It is useful information for every user.
Sample output:

job_id=95649.master
job_user=gus
job_group=gus
job_name=FJXndg
job_session_id=10623
job_requested_resources=neednodes=3:ppn=32,nodes=3:ppn=32,walltime=24:00:00
job_used_resources=cput=875:53:15,mem=105991292kb,vmem=152856944kb,walltime=09:36:08
job_queue=production
job_account=
job_exit_code=0

This is just printing variables that are available from PBS at the job's
end.

Prologue/epilogue scripts are documented here:
http://docs.adaptivecomputing.com/torque/3-0-5/a.gprologueepilogue.php
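As a rough sketch (not our exact script), an epilogue along these lines
would produce output like the sample above. It assumes Torque's documented
epilogue argument order; check the docs linked above for your version:

```shell
#!/bin/sh
# Epilogue sketch. Torque passes job information as positional arguments:
#   $1=job id, $2=user, $3=group, $4=job name, $5=session id,
#   $6=requested resource limits, $7=resources used, $8=queue,
#   $9=account, $10=exit code.
# The epilogue's stdout is appended to the job's output file.
echo "job_id=$1"
echo "job_user=$2"
echo "job_group=$3"
echo "job_name=$4"
echo "job_session_id=$5"
echo "job_requested_resources=$6"
echo "job_used_resources=$7"
echo "job_queue=$8"
echo "job_account=$9"
echo "job_exit_code=${10}"
```

The script must be installed by the administrator (e.g. as mom_priv/epilogue
on the compute nodes) with the permissions the Torque docs require.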




> I suppose I could put a qstat -f in as the last thing in the sub job file.
>
>
>
>     resources_used.cpupercent = 99
>
>     resources_used.cput = 00:34:26
>
>     resources_used.mem = 11764kb
>
>     resources_used.ncpus = 4
>
>     resources_used.vmem = 1135264kb
>
>     resources_used.walltime = 00:34:28
>

qstat -f $PBS_JOBID (I think that is the environment variable.)
It may work.
Also, wouldn't statistics extracted with qstat -f from previous jobs give
you a reasonable guess for the resource requests of future jobs?
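As a sketch, a job script that records its own usage as its last step might
look like this (the program name is hypothetical; the resource requests are
just examples):

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00

cd "$PBS_O_WORKDIR"   # PBS starts jobs in $HOME; move to the submit dir
./my_program          # hypothetical workload

# Last step: append this job's resource usage to the STDOUT log.
# PBS_JOBID is set by PBS/Torque in the job's environment.
qstat -f "$PBS_JOBID"
```

Since the job is still running when this executes, the walltime and memory
figures will be close to, but slightly under, the final values.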

Gus Correa


>
> and then there’s /usr/bin/time which might also help.
>
>
>
>
>
>
> *From: *Beowulf <beowulf-bounces at beowulf.org> on behalf of Gus Correa <
> gus at ldeo.columbia.edu>
> *Date: *Tuesday, October 29, 2019 at 11:34 AM
> *To: *"beowulf at beowulf.org" <beowulf at beowulf.org>
> *Subject: *[EXTERNAL] Re: [Beowulf] PBS question
>
>
>
>
>
>
>
> On Tue, Oct 29, 2019 at 2:00 PM Lux, Jim (US 337K) via Beowulf <
> beowulf at beowulf.org> wrote:
>
> I’m doing some EP job arrays using PBS, and I was wondering if there’s a
> way to find out how much resources the job actually consumed.
>
> For instance, if I do a select 1:ncpus=4 does it actually use 4 CPUs?
>
> Likewise, is there a “memory high water mark”.
>
>
>
> The idea here is that since my job competes with all the other jobs, the
> better I can describe my resource needs (i.e. not over-require) the faster
> my job gets through the system.
>
>
>
> Or are these things I need to instrument in some other way.
>
>
>
> --
>
>
>
> ncpus=4 requests 4 cpus/cores, which PBS/Torque will allocate to your job
> alone (unless the nodes are tagged as work shared).
>
> It is up to you to use these cpus (e.g. MPI: mpirun -np 4; OpenMP:
> OMP_NUM_THREADS=4) or not (a serial program).
>
> "qstat -f" produces a fair amount of information about the job resources.
>
> The accounting logs, in server_priv/accounting/YYYYMMDD, also do, if you
> have access to them.
>
>
>
> You can request a certain amount of memory with the -l (lowercase L)
> switch of qsub.
>
> See 'man qsub' for syntax, and 'man pbs_resources' for which resources can
> be requested (particularly pmem, pvmem, vmem).
>
>
>
> I hope this helps,
>
> Gus Correa
>
>
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>