[Beowulf] size of swap partition

Mark Hahn hahn at mcmaster.ca
Mon Jun 9 21:58:12 PDT 2008

> We have the potential to have to swap whole jobs out of memory on a complete 
> node.

that was our intent as well.  among other things, this scheme enables
running the cluster "split-personality" - mostly shorter/smaller even
interactive jobs during the day, with big/long jobs running at night.
unfortunately, you need a smart scheduler to do this, and ours is dumb.

>> believe, it is 2 or more GB per core; we have 16 GB per dual-socket 
>> quad-core Opteron node). What is typical modern swap size today?

are you willing to use a node which is actually occupying 16 GB of swap?

it is possible to tune how the kernel responds to memory crunches - 
for instance, you can always avoid OOM with the vm.overcommit_memory=2
sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap
to get the desired limits.)  in this mode, the kernel tracks how much VM
has been committed (worst-case, reflected in Committed_AS in /proc/meminfo)
and compares that to a commit limit (CommitLimit in /proc/meminfo) of
swap plus overcommit_ratio percent of ram.
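
as a rough illustration of that bookkeeping, here is a minimal python sketch
that compares Committed_AS against the commit limit.  it assumes the usual
formula (swap plus overcommit_ratio percent of ram) and ignores hugepages;
the function names are made up for this example.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Commit limit under overcommit_memory=2, ignoring hugepages (assumption)."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def commit_headroom_kb(meminfo_text, overcommit_ratio):
    """How much more VM could be committed before allocations start failing."""
    info = parse_meminfo(meminfo_text)
    limit = commit_limit_kb(info["MemTotal"], info["SwapTotal"], overcommit_ratio)
    return limit - info["Committed_AS"]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    with open("/proc/sys/vm/overcommit_ratio") as f:
        ratio = int(f.read())
    print("headroom (kB):", commit_headroom_kb(meminfo, ratio))
```

a negative headroom just means the node is currently committed beyond what
mode 2 would have allowed.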

if you don't use overcommit_memory=2, you are basically borrowing VM
space in hopes of not needing it.  that can still be reasonable, considering
how often processes have a lot of shared VM, and how many processes 
allocate but never touch lots of pages.  but you have to ask yourself:
would I like a system that was actually _using_ 16 GB of swap?  if you
have 16x disks, perhaps, but 16G will suck if you only have 1 disk.
at least for overcommit_memory != 2, I don't see the point of configuring
a lot of swap, since the only time you'd use it is if you were thrashing.
sort of a "quality of life" argument.

>> But what are the recommendations of modern praxis?

it depends a lot on the size variance of your jobs, as well as 
their real/virtual ratio.  the kernel only enforces RLIMIT_AS
(vsz in ps), assuming a 2.6 kernel - I forget whether 2.4 did 
RLIMIT_RSS or not.
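
for what it's worth, RLIMIT_AS is easy to try per job.  the sketch below
uses python's resource module to cap the address space of a child process;
it assumes linux, and run_with_as_limit is a hypothetical helper name, not
something a scheduler provides.

```python
import resource
import subprocess
import sys


def run_with_as_limit(argv, max_bytes):
    """Run argv with its soft RLIMIT_AS capped at max_bytes; return the exit code."""

    def set_limit():
        # Runs in the child between fork and exec. Only the soft limit is
        # lowered, and it is clamped so it never exceeds the hard limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = max_bytes
        if hard != resource.RLIM_INFINITY:
            new_soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(argv, preexec_fn=set_limit).returncode


if __name__ == "__main__":
    gib = 1024 ** 3
    # Asking for 4 GiB of address space under a 2 GiB cap should fail
    # inside the child (MemoryError), giving a nonzero exit code.
    hog = [sys.executable, "-c", "bytearray(4 * 1024**3)"]
    print("exit code:", run_with_as_limit(hog, 2 * gib))
```

note that the limit is on virtual size, so a job that maps a lot but touches
little will hit it long before it uses that much ram.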

if you use overcommit_memory=2, your desired max VM size determines 
the amount of swap.  otherwise, go with something modest - memory size
or so.  but given that the smallest reasonable single disk these days
is probably about 320GB, it's hard to justify being _too_ tight.
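
to put numbers on the overcommit_memory=2 case, here is a small python
sketch that inverts the commit limit (swap plus overcommit_ratio percent of
ram, hugepages ignored) to get a swap size.  swap_needed_kb is a made-up
name, and the 32 GB target is only an example figure.

```python
def swap_needed_kb(desired_commit_kb, ram_kb, overcommit_ratio=50):
    """Swap (kB) needed so the commit limit reaches desired_commit_kb.

    overcommit_ratio defaults to 50, the kernel's default value.
    """
    ram_share_kb = ram_kb * overcommit_ratio // 100
    return max(0, desired_commit_kb - ram_share_kb)


if __name__ == "__main__":
    gb = 1024 * 1024  # kB per GB (binary)
    # 16 GB node, want to allow 32 GB of committed VM, default ratio
    print(swap_needed_kb(32 * gb, 16 * gb) // gb, "GB of swap")
```

raising overcommit_ratio shrinks the swap needed for the same limit, at the
cost of letting more of the commitment land on ram.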
