[Beowulf] [OOM killer/scheduler] disabling swap on cluster nodes?
prentice.bisbal at rutgers.edu
Mon Feb 9 10:56:01 PST 2015
On 02/09/2015 03:43 AM, Remy Dernat wrote:
> On 09/02/2015 03:56, Christopher Samuel wrote:
>> On 07/02/15 14:57, Alan Louis Scheinine wrote:
>>> Only problem I've seen is that if a user allocates too much memory,
>>> OOM killer can kill maintenance processes such as a scheduler daemon.
>> This is why we disable overcommit. :-)
> I have already seen that problem on our master. The scheduler, SGE, ran
> out of memory and the OOM killer decided to kill it:
> Dec 1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963
> (sge_qmaster) score 948 or sacrifice child
> I resolved that issue by disabling "schedd_job_info" in SGE with
> "qconf -msconf".
> However, this setting gives significant information about our jobs.
> How should I adjust the OOM killer? Should I set
> = 2
To be clear, setting vm.overcommit_memory doesn't directly affect the
behavior of the OOM killer. Turning off overcommit prevents the Linux
virtual memory system from making promises it can't always keep, which
reduces or eliminates the need for the OOM killer.
Setting vm.overcommit_memory = 2 turns off overcommitting and is the
best choice if you want to avoid the OOM killer.
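
If it helps to see where a node stands before changing anything, here is a
rough Python sketch that reads the current overcommit mode and how much
commit headroom is left. It assumes a Linux node exposing
/proc/sys/vm/overcommit_memory and the CommitLimit and Committed_AS fields
in /proc/meminfo. The function names (parse_meminfo, commit_headroom_kb,
describe_mode, report) are made up for this illustration and are not part
of any tool mentioned in this thread.

```python
# Sketch only: helper names are hypothetical, paths assume a Linux /proc.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(meminfo):
    """kB that can still be committed before CommitLimit is reached."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def describe_mode(mode):
    """Map a vm.overcommit_memory value to a short description."""
    names = {
        0: "heuristic overcommit",
        1: "always overcommit",
        2: "no overcommit (strict accounting)",
    }
    return names.get(mode, "unknown mode")


def report(proc_root="/proc"):
    """Print the overcommit mode and commit headroom of this node."""
    with open(proc_root + "/sys/vm/overcommit_memory") as f:
        mode = int(f.read().strip())
    with open(proc_root + "/meminfo") as f:
        info = parse_meminfo(f.read())
    print("vm.overcommit_memory =", mode, "->", describe_mode(mode))
    print("commit headroom (kB):", commit_headroom_kb(info))
```

Calling report() on a node prints both values. As far as I know, the
kernel only enforces CommitLimit when vm.overcommit_memory = 2, so in
modes 0 and 1 the headroom figure is informational.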