[Beowulf] [OOM killer/scheduler] disabling swap on cluster nodes?

Prentice Bisbal prentice.bisbal at rutgers.edu
Mon Feb 9 10:56:01 PST 2015


On 02/09/2015 03:43 AM, Remy Dernat wrote:
>
> On 09/02/2015 03:56, Christopher Samuel wrote:
>> On 07/02/15 14:57, Alan Louis Scheinine wrote:
>>
>>> Only problem I've seen is that if a user allocates too much memory,
>>> OOM killer can kill maintenance processes such as a scheduler daemon.
>> This is why we disable overcommit. :-)
>>
> Hi,
>
> I have already seen that problem on our master. The scheduler, SGE, ran out
> of memory and the OOM killer decided to kill it:
>
> Dec  1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963 
> (sge_qmaster) score 948 or sacrifice child
>
> I resolved that issue by disabling "schedd_job_info" in SGE with 
> "qconf -msconf".
>
> However, this setting provides significant information about our jobs.
>
> How should I adjust the OOM killer? Should I set
> vm.overcommit_memory = 2 ?
>
>

To be clear, setting vm.overcommit_memory doesn't directly change the
behavior of the OOM killer. Turning off overcommit prevents the Linux
virtual memory system from making promises it can't always keep, which
reduces or eliminates the need for the OOM killer.

Setting vm.overcommit_memory = 2 turns off overcommitting and is the 
best choice if you want to avoid the OOM Killer.
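
One thing to check before flipping that switch on nodes with little or no
swap: in mode 2 the kernel caps total committed memory at roughly
swap + RAM * vm.overcommit_ratio / 100, and vm.overcommit_ratio defaults
to 50. A rough sketch of that arithmetic follows. The function name and
the example sizes are made up for illustration, and it ignores details
such as vm.overcommit_kbytes, so compare the result with CommitLimit in
/proc/meminfo on a real node.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit (in kB) when vm.overcommit_memory = 2.

    Hypothetical helper for illustration only. It follows the documented
    rule: swap plus overcommit_ratio percent of RAM, where RAM excludes
    memory reserved for huge pages.
    """
    usable_ram_kb = mem_total_kb - hugetlb_kb
    return swap_total_kb + usable_ram_kb * overcommit_ratio // 100


GIB_KB = 1024 * 1024  # kB per GiB

# Assumed example node: 64 GiB of RAM, swap disabled.
default_limit = commit_limit_kb(64 * GIB_KB, 0)                        # ratio 50
raised_limit = commit_limit_kb(64 * GIB_KB, 0, overcommit_ratio=100)   # ratio 100

print(default_limit // GIB_KB, "GiB committable at the default ratio")
print(raised_limit // GIB_KB, "GiB committable with vm.overcommit_ratio = 100")
```

So on a swapless node with the default ratio, jobs would start seeing
allocation failures at about half of physical RAM. If that is not what
you want, raise vm.overcommit_ratio along with setting
vm.overcommit_memory = 2.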

--
Prentice
