How to tell when a job is swapping?

Tue Feb 19 07:21:49 PST 2002

On Tue, 19 Feb 2002, Jeff Layton wrote:

> Good morning,
> 
>    For a while now I've been checking if a job is swapping
> on our clusters using bWatch. The nodes are dual CPU
> boxes and we run two MPI processes per node. I usually
> look at the load on the nodes to see if it is above 3.0
> (sometimes our code will peak out at about 2.3) and I
> look at the free swap space number (bWatch just cats the
> /proc/meminfo file).
>    I usually assume that if the free swap space falls below
> the maximum and load starts climbing that the node is
> swapping. However, when I talk to the user, he states that
> the code is running fine and the timing numbers are where
> they should be. So, I'm obviously interpreting something
> incorrectly (unless the job is really swapping but for some
> reason performance is unaffected).
>    Could someone give me a good way to check if a job
> is swapping? Maybe a URL?

I'm ALMOST finished with a rewrite of procstatd that will be called
xmlsysd.  I've converted the entire API to xml, so that the daemon's
output is xml whose nesting echoes the /proc structure and the system
calls as closely as possible.  You're welcome to a pre-alpha copy (which
works just fine on a RH 7.1 or 7.2 box) but it has no GUI clients yet
(next project).  It should be fairly easy to write, e.g., a web client
for it, though, xml being what it is.  procstatd also delivers the
information you are interested in.

In terms of whether or not a particular job is swapping, it's pretty
hard to tell, because swapping is driven by total SYSTEM memory demand,
not just that of the job.  However, in /proc/stat there is a field
called "swap" (and another one called "page" that you should also keep
your eye on).  It counts swaps in and swaps out since the system was
last booted.  You can thus get the RATE at which a system swaps by
sampling swap at two different times, subtracting, and dividing by the
time difference.
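
A minimal sketch of that rate calculation in Python, assuming /proc/stat
contains a line of the form "swap <in> <out>" as described above (newer
kernels report these counters as pswpin/pswpout in /proc/vmstat
instead).  The function names and the 5 second default interval are
illustrative choices, not part of procstatd or xmlsysd.

```python
import time


def read_swap_counters(stat_text):
    """Return (swapped_in, swapped_out) from text in /proc/stat format.

    Assumes a line of the form 'swap <in> <out>'.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "swap":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'swap' line found")


def swap_rates(first, second, seconds):
    """Return (in_per_sec, out_per_sec) between two counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return ((second[0] - first[0]) / seconds,
            (second[1] - first[1]) / seconds)


def sample_swap_rates(interval=5.0, path="/proc/stat"):
    """Sample the counters twice, 'interval' seconds apart."""
    with open(path) as f:
        first = read_swap_counters(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        second = read_swap_counters(f.read())
    return swap_rates(first, second, time.monotonic() - start)
```

Calling sample_swap_rates() on a node while the job runs gives the two
rates directly; anything persistently above zero deserves a closer look.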

In general, this rate should be zero.  If it is nonzero when your
application is running, that's "bad".

You can also get the information with all sorts of system tools (top
and friends).  Perhaps my favorite (and the most useful) is "vmstat".
Use "vmstat 5" on the system in question and watch the output.  This
should let you literally watch the system march into oblivion, as it
shows you number of running processes, memory consumption, swap RATES,
i/o (block in/out rates), system loads (interrupt and context switch
RATES), and cpu percentages consumed broken down into user, system and
idle.
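
If you would rather have a script watch the numbers, here is a rough
Python sketch that picks the swap-in/swap-out columns out of captured
vmstat output.  It assumes the usual header row containing "si" and "so"
column names and purely numeric data rows; the function names are made
up for illustration.

```python
def parse_vmstat(text):
    """Turn captured vmstat output into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column-name row; vmstat repeats it periodically.
            header = fields
        elif (header and len(fields) == len(header)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def swapping_rows(rows):
    """Return the samples in which the system swapped in or out."""
    return [r for r in rows if r["si"] > 0 or r["so"] > 0]
```

Feed it the text saved from something like "vmstat 5 12" and check
whether swapping_rows() comes back empty.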

If you use this and e.g. top to look at what your job is doing you
should be able to pick up a memory leak (RSS/VSS steadily growing) and
at least some other resource bottlenecks.
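
The "RSS steadily growing" check can also be scripted.  A hedged Python
sketch, assuming the job's resident size is read from the "VmRSS:" line
of /proc/<pid>/status (reported in kB) and sampled at regular intervals;
the helper names and the "at least three samples" rule are arbitrary
choices for illustration.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value (kB) from text in /proc/<pid>/status format."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no 'VmRSS:' line found")


def steadily_growing(samples_kb):
    """True if every sample is larger than the one before it.

    Requires at least three samples so a single jump does not count.
    """
    if len(samples_kb) < 3:
        return False
    return all(b > a for a, b in zip(samples_kb, samples_kb[1:]))


def read_rss_kb(pid):
    """Read the current resident set size of process 'pid' in kB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())
```

Collect read_rss_kb(pid) every minute or so; a job whose samples keep
satisfying steadily_growing() long after startup is a leak candidate.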

Eventually, I hope for xmlsysd to be able to provide lightweight access
to this sort of data for entire clusters both via GUI and tty interfaces
(as procstatd has in the past) but xmlification has taken some time to
work out so I haven't really begun work on the clients.

   rgb

> 
> TIA,
> 
> Jeff
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu