partitioning HD for use of swap & for booting

Robert G. Brown rgb at phy.duke.edu
Fri Jun 1 07:40:55 PDT 2001


On Thu, 31 May 2001, Kent Heyward wrote:

> 1.    What is the correct partitioning of the hard drive? Currently, I
> am booting from the floppy and am not able to use a swap file.

The rule for aeons has been to create swap at least 2x the size of main
memory.  For 2.2 kernels, one could often get away with a swap 1x
memory, no swap, or anything in between BUT remember that if you run out
of virtual memory (real memory plus swap) your systems WILL die an ugly
death in all probability as critical systems tasks cannot get memory to
run and die, leaving the system in an indeterminate and unstable state
at best.  I've heard that 2x memory is the MINIMUM recommended for the
2.4 kernels for performance reasons.

Another performance tip is to spread swap out over all the disks you
have -- swap will happily stripe and parallelize for a small edge in
performance if there is swapspace available on multiple devices that are
not bottlenecked at a shared controller.

You can read a lot about swap in "man mkswap".  You will also notice
that one can swap to/from a special file (instead of a physical device)
by creating a big file with no holes, running mkswap on it with
appropriate arguments, and then telling the system to swap on it.  This
does indeed work, although some years ago I was curious about the
performance penalty associated with it and wrote a short program (using
code fragments and suggestions from Mark Hahn) to test it.  A swapfile
as of the 2.0-2.2 kernel series was significantly slower than swapping
on a block device, although since you "lose" in systems performance if
you swap at all this may not matter to you.  At that time the swapfile
couldn't be on an NFS mount, although that may have changed by now.
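
For what it's worth, the "big file with no holes" step can be sketched
in a few lines of Python.  This is only a sketch under stated
assumptions: the function name, path and size are made up for
illustration, and the mkswap and swapon steps (which need root) appear
only as comments.  Seeking to the end and writing a single byte would
leave a sparse file, so the sketch writes real zeros instead.

```python
import os
import tempfile

def make_solid_file(path, size_mb, chunk_mb=1):
    """Create a file of size_mb megabytes by writing real zero bytes.

    Writing every block (rather than seeking past the end) is what keeps
    the file free of holes on ordinary filesystems.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    remaining = size_mb * 1024 * 1024
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    os.chmod(path, 0o600)  # swap contents should not be world-readable
    return os.path.getsize(path)

if __name__ == "__main__":
    # Hypothetical demo: a small 4 MB file in a temporary directory.
    demo = os.path.join(tempfile.mkdtemp(), "swapfile.demo")
    print(make_solid_file(demo, 4), "bytes written to", demo)
    # As root, one would then (on a real, suitably large file) run:
    #   mkswap /path/to/swapfile
    #   swapon /path/to/swapfile
```

A real swapfile would of course be far larger than the 4 MB demo, and
should live on a local filesystem.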

A swapfile isn't really a useless concept -- it can save you from having
to repartition a hard disk to meet a new swap requirement, whether
imposed by the demands of a new kernel or by the temporary demands of a
large application.  In all cases, though, significant utilization equals
a performance drain and with memory as unbelievably cheap as it is
(something like 25 cents a MB) there isn't a lot of motivation to run a
small-memory system anymore.  In many cases you're going to be much
better off (performance wise) filling your systems with memory than you
will be using huge swapspaces.  I'd even speculate that a 2.4 kernel
with lots of memory but small swapspace is a better performer than a 2.4
kernel with barely enough memory but plenty of swapspace.

> 2.    Does having a swap file contribute to performance?

Yes it does or can, depending on how much memory you have and how much
of that memory you fill.  For one thing, running out of (virtual) memory
is VERY BAD for performance, as a system crash can cost you days of
non-checkpointed work, hours of downtime, and a lot of angry users or
missed deadlines.  If you have marginal real memory installed, you'd
better have a nice big swapspace so that any peak utilization associated
with e.g. accidentally viewing a PDF file in acroread as spawned by the
latest versions of netscape (which seems to be capable of sucking up all
the memory you have times ten, no matter how large a virtual memory you
have) doesn't immediately crash the system but instead gives you the
chance to killall -9 netscape or killall -9 acroread before your xterms
all freeze.  A virtual memory size 2-3x your typical high water mark
utilization is a very sane idea as "mean performance" is really just
crushed by downtime.
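
The 2-3x rule of thumb is easy to turn into arithmetic.  A minimal
sketch, assuming that "virtual memory" means real memory plus swap and
that you already know your typical high water mark; the function name
and the example node sizes are hypothetical:

```python
def swap_needed_kb(ram_kb, high_water_kb, factor=3):
    """Swap (in KB) needed so that RAM + swap >= factor * high-water mark."""
    target_vm_kb = factor * high_water_kb
    return max(0, target_vm_kb - ram_kb)

# Hypothetical node: 128 MB of RAM whose jobs peak at about 100 MB.
print(swap_needed_kb(128 * 1024, 100 * 1024, factor=3))
# A roomier node: 768 MB of RAM, same 100 MB peak.  This rule asks for
# no swap at all, though some swap is still a cheap safety margin.
print(swap_needed_kb(768 * 1024, 100 * 1024, factor=3))
```

The result is only a floor for crash protection; it says nothing about
how fast the system will be once it actually starts swapping.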

It can also improve running performance (still depending on memory size,
VM size, typical highwater usage, demands of peak memory consuming
applications and so forth).  Linux has a very decent memory management
system, and generally caches and buffers everything it handles, sucking
down free memory as long as it finds it for this purpose down to a
relatively small "pool" that it keeps free to handle the urgent demands
of applications. For example:

rgb at ganesh|T:112>free
             total       used       free     shared    buffers     cached
Mem:        771108     755560      15548          0      93724     588804
-/+ buffers/cache:      73032     698076
Swap:      2104432        388    2104044

(on a running 768 MB DDR 2.4.2-2 system).

Note that I have LOTS of memory and am running relatively little stuff
-- only 73 MB of active space are being used.  However, the system is
USING all but 15 MB or so of memory (the amount kept free for typical
malloc and stack/heap requirements in the various applications that are
running and disappearing)!  Almost 700 MB of system memory are occupied
by buffers and cached files (for example, all the dynamic libraries are
almost certainly in cache and run out of memory instead of off of the
disk images).
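
The "-/+ buffers/cache" figures in that listing are just sums and
differences of the "Mem:" row.  A small sketch that redoes the
arithmetic from the numbers above; the parser is hypothetical and
assumes the old-style column layout shown here (other versions of free
may print different columns):

```python
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:        771108     755560      15548          0      93724     588804
-/+ buffers/cache:      73032     698076
Swap:      2104432        388    2104044
"""

def parse_free(text):
    """Return {'Mem': {...}, 'Swap': {...}} from old-style `free` output (KB)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        fields = line.split()
        label = fields[0].rstrip(":")
        if label in ("Mem", "Swap"):  # the "-/+" row is derived, so skip it
            values = [int(v) for v in fields[1:]]
            table[label] = dict(zip(header, values))
    return table

mem = parse_free(FREE_OUTPUT)["Mem"]
in_use = mem["used"] - mem["buffers"] - mem["cached"]       # applications
reclaimable = mem["free"] + mem["buffers"] + mem["cached"]  # free on demand
print(in_use, reclaimable)
```

The two printed values should match the 73032 and 698076 that free
itself reports on the "-/+ buffers/cache" line.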

At the moment the system is hardly using swap, mostly because this
system hasn't been up very long and has so much physical memory.  As the
system is up for more time and as its memory utilization fluctuates, the
kernel may start using some of the swap to hold pages that are "very"
infrequently used in favor of pages that are popping in and out a lot,
if it ever gets loaded enough that there is a conflict or performance
advantage to doing so.  It may already be doing this on a tiny scale, as
I've certainly never had the system loaded enough to actually swap at
the application level with 768 MB of available real memory, but 388 KB
are used just the same.

If I ever DO run a really large application, I've got all 698076 KB of
space that can be recovered by dropping cached files and buffers (saving
anything that is in use in swap if the system thinks it best, or just
dropping it to reload on demand).  A lot of an OS's apparent speed comes
from how efficiently and cleverly it handles things like cached dynamic
link libraries and operational memory pages, and swap is a key component
of the overall VM system.

> 3.    When I run linpack, I see in the beostatus monitor, the cpu usage
> go to 100% and the memory utilization on my 256k node move from 24% to
> 32%(59m to 80m)and never get any higher.  On another node with 128m, it
> will utilize 58 to 90 m.  It appears that adding more memory does not
> improve performance on the benchmark.

As long as the system has "enough" memory to buffer and cache the stuff
that is actually in use, you won't see much difference.  On a scyld node
that is pared-to-the-bone-thin in terms of running processes and
required libraries, you probably have a very small footprint in terms of
kernel, dynamic libraries and running applications.

Run the benchmark at sizes that use up all the memory and you'll start
seeing a major performance difference.  You'll begin to see it as you
closely approach using all the memory, but Linux is >>good<< at this and
will likely yield remarkably flat performance until it just plain runs
out of room and has to start paging or swapping to keep the application
going.
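
As a rough guide to what "sizes that use up all the memory" means for a
dense solver: an N x N double precision matrix takes 8*N*N bytes, so
one can back out N from the memory on the node.  A sketch only; the
function name is made up, and the 80% fraction is an arbitrary
assumption that leaves some room for the kernel, libraries and the
benchmark's own workspace:

```python
import math

def max_matrix_order(ram_bytes, fraction=0.8, bytes_per_element=8):
    """Largest N such that an N x N matrix fits in `fraction` of RAM."""
    usable = int(ram_bytes * fraction)
    return math.isqrt(usable // bytes_per_element)

for ram_mb in (128, 256):
    n = max_matrix_order(ram_mb * 2**20)
    print(f"{ram_mb} MB node: N is roughly {n}")
```

Stepping N up toward (and then past) this value is where one would
expect the flat performance curve to fall off as paging begins.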

> 4.    Another cluster uses scalapack as one of their benchmarking tools.
> How does it compare with linpack?

Got me.  I haven't used linpack as a benchmark for many years (for a
while there a decade or so ago vendors were apparently actually tuning
their system design to return a good linpack float rate and it stopped
being as good an indicator of overall systems performance :-( ).  I also
stopped using fortran.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list