[Beowulf] disabling swap on cluster nodes?

Bogdan Costescu bcostescu at gmail.com
Wed Feb 11 05:41:19 PST 2015

On Tue, Feb 10, 2015 at 6:54 AM, Mark Hahn <hahn at mcmaster.ca> wrote:
> swapout is good.  it's how the kernel keeps ram warm, rather than
> letting cold pages *waste* your ram.  swap IN can be bad (thrashing),
> but the main point is to toss cold pages into the attic that you're not
> going to use soon or ever.

I like your distinction between swap usage and thrashing, but...
please allow me to disagree with what you're saying in an HPC context. It
keeps amazing me how people pay lots of money for the latest
generation of fast interconnects which just improved latency by a
"fabulous" 0.1ns compared to the previous generation, but happily
enable swap on the same nodes. While swapping (both in and out) the
latest generation CPU gets to perform probably about the same FLOPS as
an 8086 with software emulated FP - is this really HPC? It's already
bad enough that the RAM is mostly NUMA these days and the various
levels of cache only have a limited efficiency when the access is as
random as the R in the name implies, such that the CPU almost never
reaches its full potential. So I'd rather have the job killed and
teach the user to move it to a different node or set of nodes with
more RAM - this increases the cluster usage (the previously swapping node
is now free for others) and the user satisfaction (no swap = faster
finishing) simultaneously :)

Your description fits *light* swap usage. However, in quite a few years
of combined experience on both sides (user and sysadmin) I have never
seen such swap usage on a compute node: either the memory is not
exhausted, so swap is not touched and could just as well have been absent, or
the swap is heavily used, in which case there's no reason to call it
HPC any longer but rather HSC or HTC (Heavy Swapping/Thrashing
Computing). Of course, I base this on my experience alone; do you or
others often see such light swap usage on the compute nodes?
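
The distinction between light swap usage and thrashing can be measured on a node. A minimal sketch in Python, assuming two snapshots of the Linux /proc/vmstat counters pswpin and pswpout taken some seconds apart; the helper names and the 100 pages/s cutoff are hypothetical choices for illustration, not kernel constants:

```python
def parse_counters(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two snapshots."""
    return {
        key: (after[key] - before[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


def classify(rates, busy_threshold=100.0):
    """Label swap activity; busy_threshold (pages/s) is an arbitrary cutoff."""
    if rates["pswpin"] == 0 and rates["pswpout"] == 0:
        return "idle"
    # Sustained swap-in is the symptom of thrashing; swap-out alone is not.
    if rates["pswpin"] >= busy_threshold:
        return "thrashing"
    return "light"


def read_vmstat(path="/proc/vmstat"):
    """Read the live counters on a Linux node."""
    with open(path) as handle:
        return parse_counters(handle.read())
```

Calling read_vmstat() twice, a known number of seconds apart, and passing both results to swap_rates() gives the rates for a live node. Classifying on swap-in alone follows the quoted point that swap-out by itself is harmless.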

For many years now I have not configured swap on my workstations or
laptops either (using mostly Fedora), as they have enough RAM for a
full-blown graphical desktop environment and all assorted
applications. And when I do start to see problems, it is typically one
application (mostly the browser) that gets killed by the OOM-killer
rather than all of them (as happened with swap, when I ended up pressing
the reset or power button out of pity for the storage device).

I put swap in the same category as WiFi - they are very useful
technologies, but don't fit in HPC. I do see swap and WiFi (or other
slow/lossy networking technology) as very useful teaching tools for
parallel programming, though :)

> we also run with overcommit=2.

Overcommitting is unfortunately still needed for the older Fortran
applications which statically allocate large arrays. Yes, some of
those pesky things are still around...
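
For reference, a rough sketch of why strict accounting bites such codes, assuming that "overcommit=2" above means vm.overcommit_memory=2 on Linux. In that mode the kernel refuses allocations beyond a commit limit of swap plus overcommit_ratio percent of RAM (huge pages are ignored here). The function names are hypothetical:

```python
KIB_PER_GIB = 1024 * 1024


def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50):
    """Approximate commit limit under strict accounting (mode 2)."""
    return swap_total_kib + mem_total_kib * overcommit_ratio // 100


def fits(request_kib, committed_kib, limit_kib):
    """True if a new reservation stays within the commit limit."""
    return committed_kib + request_kib <= limit_kib


# A 64 GiB node with no swap and the default ratio of 50.
limit = commit_limit_kib(64 * KIB_PER_GIB, 0)
print(limit // KIB_PER_GIB)              # 32
print(fits(40 * KIB_PER_GIB, 0, limit))  # False
```

With swap removed and the default ratio, a program declaring 40 GiB of static arrays would be expected to fail on a 64 GiB node even if it only ever touches a fraction of them; raising vm.overcommit_ratio or allowing overcommit avoids that.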

