<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Good question. I just checked using vmstat. When running xhpl on
      both systems, vmstat shows only zeros for si and so, even long
      after the performance degrades on the nfsroot instance. Just to be
      sure, I double-checked with top, which shows 0k of swap being
      used. <br>
    </p>
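    <p>For reference, the checks were along these lines (a minimal
      sketch; the exact vmstat column layout varies by version, and
      free is an extra cross-check beyond what I described above):<br>
    </p>
    <pre>$ vmstat 5    # si/so columns show swap-in/swap-out per interval; all zeros here
$ top         # the Swap: line in the header shows 0k used
$ free -m     # additional cross-check of swap usage</pre>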
    <pre class="moz-signature" cols="72">Prentice</pre>
    <div class="moz-cite-prefix">On 09/13/2017 02:15 PM, Scott Atchley
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAL8g0j+fBjXBRHmYj5kjyXbh+_D_wGjgmDnG_uOBJM=YR+EUww@mail.gmail.com">
      <div dir="ltr">Are you swapping?</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Sep 13, 2017 at 2:14 PM, Andrew
          Latham <span dir="ltr"><<a href="mailto:lathama@gmail.com"
              target="_blank" moz-do-not-send="true">lathama@gmail.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">ack, so maybe validate you can reproduce with
              another nfs root. Maybe a lab setup where a single server
              is serving nfs root to the node. If you could reproduce in
              that way then it would give some direction. Beyond that it
              sounds like an interesting problem.</div>
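            <div dir="ltr"><br>
              A minimal sketch of such a lab setup (the path, subnet,
              and server address below are placeholders):<br>
              <pre>## on the lab NFS server: export a read-only root image
# /etc/exports
/export/nfsroot   192.168.1.0/24(ro,no_root_squash,sync)
$ exportfs -ra

## on the test node, via the PXE append / kernel command line
root=/dev/nfs nfsroot=192.168.1.10:/export/nfsroot ro ip=dhcp</pre>
            </div>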
            <div class="gmail_extra">
              <div>
                <div class="h5"><br>
                  <div class="gmail_quote">On Wed, Sep 13, 2017 at 12:48
                    PM, Prentice Bisbal <span dir="ltr"><<a
                        href="mailto:pbisbal@pppl.gov" target="_blank"
                        moz-do-not-send="true">pbisbal@pppl.gov</a>></span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">Okay,
                      based on the various responses I've gotten here
                      and on other lists, I feel I need to clarify
                      things:<br>
                      <br>
                      This problem only occurs when I'm running our
                      NFSroot-based version of the OS (CentOS 6). When
                      I run the same OS installed on a local disk on
                      the exact same server(s), I do not have this
                      problem. For testing purposes, I'm using
                      LINPACK, running the same executable with the
                      same HPL.dat file in both instances.<br>
                      <br>
                      Because I'm testing the same hardware with
                      different OS installations, this should rule out
                      the BIOS and faulty hardware, which leads me to
                      believe it's most likely a software
                      configuration issue, such as a kernel tuning
                      parameter or some other setting.<br>
                      <br>
                      These are Supermicro servers, and it seems they
                      do not report CPU temps: I see a chassis temp,
                      but not the temps of the individual CPUs. While
                      I agree that should be the first thing to look
                      at, it's not an option for me, and other tools
                      like FLIR cameras and infrared thermometers
                      aren't really an option, either.<br>
                      <br>
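                      For what it's worth, two ways such temps are
                      sometimes exposed (a sketch; whether either
                      sensor is actually present on these boards is an
                      assumption):<br>
                      <pre>$ ipmitool sdr type Temperature   # IPMI sensor dump; may show only a chassis temp
$ modprobe k10temp                # AMD die-temp driver (covers family 15h Opterons)
$ sensors                         # from lm_sensors, after running sensors-detect</pre>
                      <br>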
                      What software configuration, whether a kernel
                      parameter, the configuration of numad or
                      cpuspeed, or
                      some other setting, could affect this?<span
                        class="m_5099190104119760613HOEnZb"><font
                          color="#888888"><br>
                          <br>
                          Prentice</font></span><span
                        class="m_5099190104119760613im
                        m_5099190104119760613HOEnZb"><br>
                        <br>
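                        For concreteness, quick checks of that sort
                        might look like this (a sketch; the paths
                        assume the CentOS 6 cpufreq sysfs interface):<br>
                        <pre>$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ grep "cpu MHz" /proc/cpuinfo    # current per-core clock speeds
$ service cpuspeed status         # frequency-scaling daemon
$ service numad status            # NUMA balancing daemon</pre>
                        <br>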
                        On 09/08/2017 02:41 PM, Prentice Bisbal wrote:<br>
                      </span><span class="m_5099190104119760613im
                        m_5099190104119760613HOEnZb">
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          Beowulfers,<br>
                          <br>
                          I need your assistance debugging a problem:<br>
                          <br>
                          I have a dozen servers that are all identical
                          hardware: SuperMicro servers with AMD Opteron
                          6320 processors. Ever since we upgraded to
                          CentOS 6, the users have been complaining of
                          wildly inconsistent performance across these
                          12 nodes. I ran LINPACK on these nodes, and
                          was able to duplicate the problem, with
                          performance varying from ~14 GFLOPS to 64
                          GFLOPS.<br>
                          <br>
                          I've identified that performance on the slower
                          nodes starts off fine, and then slowly
                          degrades throughout the LINPACK run. For
                          example, on a node with this problem, during
                          the first LINPACK test, I can see the
                          performance drop from 115 GFLOPS down to
                          11.3 GFLOPS. That constant downward trend
                          continues throughout the remaining tests. At
                          the start of subsequent tests, performance
                          will jump up to about 9-10 GFLOPS, but then
                          drop to 5-6 GFLOPS by the end of the test.<br>
                          <br>
                          Because of the nature of this problem, I
                          suspect this might be a thermal issue. My
                          guess is that the processor speed is being
                          throttled to prevent overheating on the "bad"
                          nodes.<br>
                          <br>
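                          One way to watch for that during a run (a
                          sketch; assumes per-core clocks are visible
                          in /proc/cpuinfo):<br>
                          <pre>$ watch -n 1 'grep "cpu MHz" /proc/cpuinfo | sort | uniq -c'</pre>
                          <br>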
                          But here's the thing: this wasn't a problem
                          until we upgraded to CentOS 6. Where I work,
                          we use a read-only NFSroot filesystem for
                          our cluster nodes, so all nodes mount and
                          use the exact same read-only image of the
                          operating system. This only happens with
                          these SuperMicro nodes, and only with CentOS
                          6 on NFSroot. RHEL 5 on NFSroot worked fine,
                          and when I installed CentOS 6 on a local
                          disk, the nodes worked fine.<br>
                          <br>
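                          If the NFS client side is suspected, the
                          usual starting points are the effective
                          mount options and the RPC retransmit counts
                          (a sketch):<br>
                          <pre>$ grep nfs /proc/mounts   # effective mount options of the root filesystem
$ nfsstat -c              # client RPC call and retransmission counts</pre>
                          <br>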
                          Any ideas where to look or what to tweak to
                          fix this? Any idea why this is only
                          occurring with CentOS 6 on an NFS root?<br>
                          <br>
                        </blockquote>
                        <br>
                      </span>
                      <div class="m_5099190104119760613HOEnZb">
                        <div class="m_5099190104119760613h5">
                          ______________________________<wbr>_________________<br>
                          Beowulf mailing list, <a
                            href="mailto:Beowulf@beowulf.org"
                            target="_blank" moz-do-not-send="true">Beowulf@beowulf.org</a>
                          sponsored by Penguin Computing<br>
                          To change your subscription (digest mode or
                          unsubscribe) visit <a
                            href="http://www.beowulf.org/mailman/listinfo/beowulf"
                            rel="noreferrer" target="_blank"
                            moz-do-not-send="true">http://www.beowulf.org/mailman<wbr>/listinfo/beowulf</a><br>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                  <br clear="all">
                  <div><br>
                  </div>
                </div>
              </div>
              <span class="">-- <br>
                <div class="m_5099190104119760613gmail_signature"
                  data-smartmail="gmail_signature">
                  <div dir="ltr">
                    <div>
                      <div dir="ltr">- Andrew "lathama" Latham <a
                          href="mailto:lathama@gmail.com"
                          target="_blank" moz-do-not-send="true">lathama@gmail.com</a>
                        <a href="http://lathama.org" target="_blank"
                          moz-do-not-send="true">http://lathama.com</a> -</div>
                    </div>
                  </div>
                </div>
              </span></div>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>