[Beowulf] confused about high values of "used" memory under "top" even without running jobs

Mark Hahn hahn at mcmaster.ca
Wed Aug 12 12:07:57 PDT 2009


> I am a bit confused about the high "used" memory that top is showing on one
> of my machines. Is this "leaky" memory caused by codes that did not return
> all their memory? Can I identify who is hogging the memory? Are there any
> other ways to "release" this memory?

free memory is WASTED memory.  linux tries hard to keep only a smallish,
limited amount of memory wasted.  if you add up the rss of all processes,
the difference between that sum and 'used' is normally dominated by the
kernel page cache.  see /proc/sys/vm/drop_caches for how to force the kernel
to throw away FS-related caches.
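
for example, assuming you have root on the node (1 drops the page cache,
2 drops dentries/inodes, 3 drops both; sync first so dirty pages get
written back):

sync
echo 3 > /proc/sys/vm/drop_caches

'used' in top/free should fall immediately, but the cache simply refills
as soon as anything touches the filesystem again.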

also, I often do this:
awk '{print $3*$4,$0}' /proc/slabinfo|sort -rn|head
to get a quick snapshot of kinds of memory use.
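
($3*$4 there is num_objs times objsize from the slabinfo 2.x layout, i.e.
roughly how many bytes each slab cache is holding.  a slightly prettier
variant, with the same assumption about the field layout:

awk 'NR>2 {printf "%12.1f MB  %s\n", $3*$4/1048576, $1}' /proc/slabinfo | sort -rn | head

slabtop gives a similar live view if your procps includes it.)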

> Linux is also supposed to use as much memory as you give it, right? I'm just
> confused about whether this is something I need to worry about or not.

you should never worry about paging (swapping, thrashing) until you see
nontrivial swapin (NOT out) traffic.  (i.e. the 'si' column in "vmstat 1").
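
if you want something you can leave running, this one-liner flags any
swap-in; column 7 is 'si' in the procps vmstat layout, and the message
format is just my choice:

vmstat 1 | awk 'NR>2 && $7+0 > 0 {print "swapin:", $7, "kB/s"}'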

> Incidentally the way I discovered this was because users reported that their
> codes were running ~30% faster right after a machine reboot as opposed to
> after a few days running.

isn't this one of the anomalous nehalem machines we've been talking about?
if so, it's become clear that the kernel isn't managing memory in a
numa-aware way, so the problem is probably just poor numa layout/balance
of allocations.
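
a quick way to see whether that's what's happening, assuming numactl and
numastat are installed on the node:

numactl --hardware    # free memory per NUMA node
numastat              # numa_hit / numa_miss / numa_foreign counters

if one node's free memory is near zero (eaten by page cache) while the
other node has plenty, new allocations spill onto the remote node, and
you get exactly this kind of gradual slowdown that a reboot "fixes".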

> [is there a way to ensure] that in a scheduler-based environment (say PBS)
> the last job releases all its memory resources before the new one starts
> running?

you could drop_caches from the scheduler (e.g. in a job epilogue), but this
would also hurt you sometimes, since the next job may have reused some of
that cached data.
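
a minimal sketch of that, as a torque/PBS epilogue fragment (whether you
want '3' or just '1' is a site-specific choice, not a recommendation):

#!/bin/sh
# runs as root after each job: write out dirty pages, then drop caches
sync
echo 3 > /proc/sys/vm/drop_caches

the tradeoff is the same as above: every job then starts with cold FS caches.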


