[Beowulf] [OOM killer/scheduler] disabling swap on cluster nodes?

Remy Dernat remy.dernat at univ-montp2.fr
Mon Feb 9 00:43:02 PST 2015


On 09/02/2015 03:56, Christopher Samuel wrote:
> On 07/02/15 14:57, Alan Louis Scheinine wrote:
>
>> Only problem I've seen is that if a user allocates too much memory,
>> OOM killer can kill maintenance processes such as a scheduler daemon.
> This is why we disable overcommit. :-)
>
Hi,

I have already seen that problem on our master. The scheduler, SGE, ran out 
of memory and the OOM killer decided to kill it:

Dec  1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963 
(sge_qmaster) score 948 or sacrifice child

I resolved that issue by disabling "schedd_job_info" in SGE with "qconf 
-msconf".

However, this setting provides significant information about our jobs.

How should I adjust the OOM killer? Should I set

vm.overcommit_memory = 2

?
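
For context on what that setting would imply on a node without swap, here is a minimal sketch of the commit limit the kernel enforces in strict mode (vm.overcommit_memory = 2), following the formula in the kernel's overcommit accounting documentation. The function name commit_limit_kb and the 64 GiB node are hypothetical, chosen only for illustration. The default vm.overcommit_ratio of 50 is assumed, and vm.overcommit_kbytes is assumed to be 0 (a non-zero value would replace the ratio).

```python
# Sketch only: estimates CommitLimit under vm.overcommit_memory = 2.
# Kernel formula (overcommit accounting documentation):
#   CommitLimit = (MemTotal - huge pages) * overcommit_ratio / 100 + SwapTotal
# Assumes vm.overcommit_kbytes is 0, so the ratio is what applies.

def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Return the approximate commit limit in kB (integer arithmetic)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


if __name__ == "__main__":
    # Hypothetical node: 64 GiB of RAM, swap disabled, default ratio of 50.
    mem_kb = 64 * 1024 * 1024
    limit_kb = commit_limit_kb(mem_kb, swap_total_kb=0)
    print(f"commit limit: {limit_kb / 1024**2:.1f} GiB of {mem_kb / 1024**2:.1f} GiB RAM")
```

Under these assumptions, only about half of the RAM could be committed on a swapless node, so vm.overcommit_ratio would probably need raising along with the mode change. The real inputs can be read from MemTotal and SwapTotal in /proc/meminfo.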

Best regards,

Rémy

-- 
Rémy Dernat
MBB/ISE-M
