<div dir="ltr">On Thu, Apr 18, 2013 at 7:21 PM, Mark Hahn <span dir="ltr"><<a href="mailto:hahn@mcmaster.ca" target="_blank">hahn@mcmaster.ca</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Only for benchmarking?  We have done this for years on our production<br>

clusters (and SGI provides a tool for this and more to clean up nodes).  We<br>

have this in our epilogue so that we can clean out memory on our diskless<br>

nodes so there is nothing stale sitting around that can impact the next<br>

user's job.<br>

</blockquote>

<br></div>

understood, but how did you decide that was actually a good thing?<br>

<br></blockquote><div><br></div><div style>Mark,</div><div style><br></div><div style>Because it stopped the random out of memory conditions that we were having.  </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


if two jobs with similar file reference patterns run, for instance, drop_caches will cause quite a bit of additional IO delay.<br>

<br></blockquote><div><br></div><div style>For our workloads, this is a highly unlikely scenario: nodes are not shared and the workload is very diverse, so the chance that the next job has any connection to the previous job is negligible.</div>
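<div style><br></div><div style>For concreteness, the kind of epilogue step being discussed could look roughly like the sketch below. This is not the actual epilogue script or the SGI tool, only a minimal illustration that assumes a Linux node and root privileges. The function name and its parameters are made up for this example; the only real interface used is the kernel's /proc/sys/vm/drop_caches file, where writing 1 drops the page cache, 2 drops dentries and inodes, and 3 drops both.</div>

```python
import os

# Standard Linux procfs knob; writing to it requires root.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES_PATH, flush_first=True):
    """Ask the kernel to drop clean caches (hypothetical helper).

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if flush_first:
        # Dirty pages are not freed by drop_caches, so write them back first.
        os.sync()
    with open(path, "w") as knob:
        knob.write(f"{mode}\n")


if __name__ == "__main__":
    # In a job epilogue this would run once after the job exits.
    drop_caches()
```

<div style>The path is a parameter only so the helper can be exercised against an ordinary file without touching a live node.</div>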

<div style><br></div><div style>Craig</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I guess the rationale would also be much clearer for certain workloads, such as big-data reduction jobs, where things like executables would have to be re-fetched, but presumably much larger input data might never be re-referenced by following jobs.  it would have to be jobs that have a lot of intra- but not inter-job readonly file re-reference,<br>


and where clean-page scavenging is a noticeable cost.<br>

<br>

I'm guessing this may have been a much bigger deal on strongly NUMA<br>

machines of a certain era (high-memory ia64 SGI, older kernels).<br>

<br>

regards, mark.<br>

</blockquote></div><br></div></div>