<div dir="ltr"><div><div><div>Most of my WRF users run their jobs up at NCAR for that reason alone.  WRF is terribly inefficient and complicated to set up correctly.  Let the WRF pros deal with it...<br><br></div>Mahmood Sayed<br></div>HPC Admin<br></div>US National Institute for Environmental Health Sciences<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 30, 2015 at 2:51 PM, Tom Harvill <span dir="ltr">&lt;<a href="mailto:unl@harvill.net" target="_blank">unl@harvill.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

Hi Prentice,<br>

<br>

Thank you for your reply.  Yes, it's 'bad' code.  It's mostly WRF.  If you have suggestions for that app, I'm<br>

all ears.  We don't control the code base.  We're also not allowed to update it except between projects,<br>

which happens very infrequently.<br>

<br>

It would be ideal if we could discretely control memory allocations to individual processes within<br>

a job but I don't expect it's possible.  I wanted to reach out to this list of experts in case we might be<br>

missing something.<br>

<br>

The resistance comes from increased wait times: staggered serial jobs keep a node from<br>

being allocated exclusively.  Yes, the users would probably get better aggregate turnaround<br>

time if they waited for node exclusivity...<span class="HOEnZb"><font color="#888888"><br>

<br>

...Tom</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On 7/30/2015 1:37 PM, Prentice Bisbal wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Tom,<br>

<br>

I don't want to be 'that guy', but it sounds like the root cause of this problem is the programs themselves. A well-written parallel program should balance the workload and data pretty evenly across the nodes. Is this software written by your own researchers, open-source, or a commercial program? In my opinion, your efforts would be better spent fixing the program(s), if possible, than finding a scheduler with the feature you request, which I don't think exists.<br>

<br>

If you can't fix the software, I think you're out of luck.<br>

<br>

I was going to suggest requesting exclusive use of nodes (whole-node assignment) as the easiest solution. What is the basis for the resistance?<br>

<br>

Prentice<br>

<br>

On 07/30/2015 11:34 AM, Tom Harvill wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

Hi,<br>

<br>

We run SLURM with cgroups for memory containment of jobs.  When users request<br>

resources on our cluster, they often specify the number of (MPI) tasks and<br>

memory per task.  In reality, much of the software that runs uses most of its<br>

memory in MPI rank 0 and much less in the slave processes.  This is wasteful<br>

and sometimes causes bad outcomes (OOMs and worse) during job runs.<br>

<br>

AFAIK SLURM does not let users request a different amount of memory<br>

for different processes in their MPI pool.  We used to run Maui/Torque and I'm fairly<br>

certain that feature is not present in that scheduler either.<br>

<br>

Does anyone know if any scheduler allows the user to request different amounts of<br>

memory per process?  We know we can move to whole-node assignment to remedy<br>

this problem but there is resistance to that...<br>

<br>

Thank you!<br>

Tom<br>

<br>

Tom Harvill<br>

Holland Computing Center<br>

<a href="http://hcc.unl.edu" rel="noreferrer" target="_blank">hcc.unl.edu</a><br>

_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

</blockquote>

<br>


</blockquote>

<br>


</div></div></blockquote></div><br></div>