[Beowulf] mem consumption strategy for HPC apps?

Wed Apr 13 23:59:34 PDT 2005

What is the ideal way to manage memory consumption in HPC applications?

For HPC applications, performance is everything. We all know the famous
performance-memory tradeoff, which says that performance can be improved
by consuming more memory and vice versa. Therefore HPC applications want
to consume all available memory.

But the performance-memory tradeoff as stated above assumes infinite
memory and infinite memory bandwidth. Because memory is finite,
consuming more memory than is physically available will result in
swapping by the OS and therefore a big performance hit. And since
bandwidth is also finite and latency is not zero, we have caches. But
now we also need to be careful not to lose time to cache thrashing.

Knowing this, we could say that HPC applications generally want to eat
all available memory but not more. All available memory here means all
physical memory minus the physical memory consumed by the system and its
basic services, because we assume that an HPC application does not share
its processor with other applications (so that it has the whole cache to
itself). Well, this is true for single-processor machines. On
multi-processor machines (SMP, NUMA) each application can consume only a
part of the physical memory.
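
To make the arithmetic above concrete, here is a minimal Python sketch
of "physical memory minus the system's share". The names
memory_budget, system_reserve_bytes and jobs_per_node are hypothetical,
and splitting the remainder evenly between jobs on a multi-processor
node is only an assumption; a real NUMA machine may call for a
per-node figure instead. The os.sysconf names used are available on
Linux and may be missing elsewhere.

```python
import os

MiB = 1024 ** 2
GiB = 1024 ** 3

def physical_memory_bytes():
    """Total physical RAM in bytes, as reported by sysconf (Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")

def memory_budget(total_bytes, system_reserve_bytes, jobs_per_node=1):
    """Bytes one job may use.

    Assumption: the OS and its services need system_reserve_bytes, and
    the rest is split evenly between jobs_per_node jobs.
    """
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    usable = max(total_bytes - system_reserve_bytes, 0)
    return usable // jobs_per_node
```

On a Linux node, memory_budget(physical_memory_bytes(), 512 * MiB, 2)
would give the share for each of two jobs on that node.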

Since the application does not know how much physical memory it is
allowed to eat, it might be best for the user to specify it when
launching the application.

But suppose now that a single-processor machine has 8GB of physical
memory. Taking into account that the OS and its services will never take
more than 500MB, the user might tell the HPC application that it can eat
up to 7.5GB of the physical memory.
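
A user-supplied limit like this could be read at launch roughly as
follows. This is only a sketch: the --mem-limit option, parse_size and
build_parser are hypothetical names, and treating K/M/G/T as powers of
1024 is an assumption.

```python
import argparse

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

def parse_size(text):
    """Convert '7.5G', '7.5GB', '500M' or a plain byte count to bytes."""
    text = text.strip().upper()
    if text.endswith("B"):
        text = text[:-1]              # accept '7.5GB' as well as '7.5G'
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)                  # no unit suffix: plain bytes

def build_parser():
    """Command line for a hypothetical solver with a memory limit."""
    parser = argparse.ArgumentParser(description="hypothetical HPC solver")
    parser.add_argument(
        "--mem-limit",
        type=parse_size,
        default=None,
        help="maximum memory the application may use, e.g. 7.5G",
    )
    return parser
```

The application would then call build_parser().parse_args() and find
the limit in bytes in args.mem_limit, or None if the user gave no limit.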

But what does this number mean to the HPC application that is trying to
optimise its performance? Should it try never to consume more than 7.5GB
of memory, or should it only try to stay below 7.5GB in intensive loops
(e.g. in the solver)? In the latter case, can we rely on the OS swapping
out the inactive parts of our application to make space for the solver,
or would it be better for the application to put all data structures
that are not used in the solver on disk, to make sure? OTOH, if we want
to limit the total memory consumption to 7.5GB, would it be best to
allocate a memory pool of 7.5GB and, if the pool is full, abort the
application (after running for days)?
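
To make the last two options concrete, here is a rough Python sketch of
a pool with a fixed budget that raises an error when full instead of
aborting, so the caller can still choose to put inactive data on disk.
It does not answer which strategy is best. MemoryBudget, spill_to_disk
and reload_from_disk are hypothetical names, and the pool only counts
the bytes that are requested through it, not the real footprint of the
process.

```python
import tempfile

class MemoryBudget:
    """Counts bytes handed out against a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def allocate(self, nbytes):
        """Return a new buffer, or raise MemoryError if over budget."""
        if self.used + nbytes > self.limit:
            raise MemoryError("memory budget exceeded")
        self.used += nbytes
        return bytearray(nbytes)

    def release(self, buf):
        """Give the bytes back; the caller must drop its reference to
        buf before the memory can really be freed."""
        self.used -= len(buf)

def spill_to_disk(pool, buf):
    """Write buf to a temporary file and release it from the pool."""
    f = tempfile.TemporaryFile()
    f.write(buf)
    pool.release(buf)
    return f

def reload_from_disk(pool, f):
    """Read a spilled buffer back into memory, charging the pool."""
    f.seek(0, 2)                      # move to end to find the size
    size = f.tell()
    f.seek(0)
    buf = pool.allocate(size)
    view = memoryview(buf)
    done = 0
    while done < size:
        n = f.readinto(view[done:])
        if not n:
            break
        done += n
    f.close()
    return buf
```

The application could catch MemoryError before entering the solver,
spill the data structures the solver does not need, and reload them
afterwards, rather than aborting after days of running.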

toon