<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Another good question. The systems with the NFSroot OS still have
      a local disk. That local disk has a /var partition where logs are
      written. Both systems also send some logs to a remote log server.
      The /etc/rsyslog.conf files were almost identical, but I copied the
      one from the NFSroot system to the local-OS system to make sure
      they matched exactly. This has had no impact on the performance of
      xhpl. <br>
    </p>
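    <p>The same compare-both-systems check could be applied to the
      kernel-parameter angle raised in the quoted thread, using the
      output of <code>sysctl -a</code>. This is only a minimal sketch,
      assuming Python 3 and that each system's output has been saved as
      text. The function names are hypothetical, and the sample values
      in the usage note are made up for illustration, not taken from
      these nodes.</p>
<pre>
```python
def parse_sysctl(text):
    """Parse 'key = value' lines, as printed by `sysctl -a`, into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(text_a, text_b):
    """Return {key: (value_a, value_b)} for every key whose value differs.

    A key present on only one side is reported with None for the other.
    """
    a, b = parse_sysctl(text_a), parse_sysctl(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```
</pre>
    <p>For example, feeding it the saved output from the local-disk
      install and from the NFSroot install would list only the settings
      that differ between the two, which is a much shorter list to
      review than the full dump.</p>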
    <pre class="moz-signature" cols="72">Prentice</pre>
    <div class="moz-cite-prefix">On 09/13/2017 02:16 PM, Scott Atchley
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAL8g0j+_zgCQnVmz3Cxn=nzv5mniLdZ9mWhcD3L8e_JrDqYctQ@mail.gmail.com">
      <div dir="ltr">Are you logging something that goes to the disk in
        the local case, but that is competing for network bandwidth when
        NFS mounting?</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Sep 13, 2017 at 2:15 PM, Scott
          Atchley <span dir="ltr">&lt;<a
              href="mailto:e.scott.atchley@gmail.com" target="_blank"
              moz-do-not-send="true">e.scott.atchley@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Are you swapping?</div>
            <div class="HOEnZb">
              <div class="h5">
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">On Wed, Sep 13, 2017 at 2:14
                    PM, Andrew Latham <span dir="ltr">&lt;<a
                        href="mailto:lathama@gmail.com" target="_blank"
                        moz-do-not-send="true">lathama@gmail.com</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div dir="ltr">ack, so maybe validate you can
                        reproduce with another nfs root. Maybe a lab
                        setup where a single server is serving nfs root
                        to the node. If you could reproduce in that way
                        then it would give some direction. Beyond that
                        it sounds like an interesting problem.</div>
                      <div class="gmail_extra">
                        <div>
                          <div class="m_7604217799998711846h5"><br>
                            <div class="gmail_quote">On Wed, Sep 13,
                              2017 at 12:48 PM, Prentice Bisbal <span
                                dir="ltr">&lt;<a
                                  href="mailto:pbisbal@pppl.gov"
                                  target="_blank" moz-do-not-send="true">pbisbal@pppl.gov</a>&gt;</span>
                              wrote:<br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">Okay, based
                                on the various responses I've gotten
                                here and on other lists, I feel I need
                                to clarify things:<br>
                                <br>
                                This problem only occurs when I'm
                                running our NFSroot based version of the
                                OS (CentOS 6). When I run the same OS
                                installed on a local disk, I do not have
                                this problem, using the same exact
                                server(s).  For testing purposes, I'm
                                using LINPACK, and running the same
                                executable  with the same HPL.dat file
                                in both instances.<br>
                                <br>
                                Because I'm testing the same hardware
                                with different OS installations, this
                                should rule out the BIOS and faulty
                                hardware. This leads me to believe it's
                                most likely a software configuration
                                issue, such as a kernel tuning parameter
                                or some other software setting.<br>
                                <br>
                                These are Supermicro servers, and it
                                seems they do not provide CPU temps. I
                                do see a chassis temp, but not the temps
                                of the individual CPUs. While I agree
                                that should be the first thing I look
                                at, it's not an option for me. Other
                                tools like FLIR and Infrared
                                thermometers aren't really an option for
                                me, either.<br>
                                <br>
                                What software configuration, either a
                                kernel parameter, configuration of
                                numad or cpuspeed, or some other
                                setting, could affect this?<span
                                  class="m_7604217799998711846m_5099190104119760613HOEnZb"><font
                                    color="#888888"><br>
                                    <br>
                                    Prentice</font></span><span
                                  class="m_7604217799998711846m_5099190104119760613im
m_7604217799998711846m_5099190104119760613HOEnZb"><br>
                                  <br>
                                  On 09/08/2017 02:41 PM, Prentice
                                  Bisbal wrote:<br>
                                </span><span
                                  class="m_7604217799998711846m_5099190104119760613im
m_7604217799998711846m_5099190104119760613HOEnZb">
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    Beowulfers,<br>
                                    <br>
                                    I need your assistance debugging a
                                    problem:<br>
                                    <br>
                                    I have a dozen servers that are all
                                    identical hardware: SuperMicro
                                    servers with AMD Opteron 6320
                                    processors. Ever since we upgraded
                                    to CentOS 6, the users have been
                                    complaining of wildly inconsistent
                                    performance across these 12 nodes. I
                                    ran LINPACK on these nodes, and was
                                    able to duplicate the problem, with
                                    performance varying from ~14 GFLOPS
                                    to 64 GFLOPS.<br>
                                    <br>
                                    I've identified that performance on
                                    the slower nodes starts off fine,
                                    and then slowly degrades throughout
                                    the LINPACK run. For example, on a
                                    node with this problem, during the first
                                    LINPACK test, I can see the
                                    performance drop from 115 GFLOPS
                                    down to 11.3 GFLOPS. That constant,
                                    downward trend continues throughout
                                    the remaining tests. At the start of
                                    subsequent tests, performance will
                                    jump up to about 9-10 GFLOPS, but
                                    then drop to 5-6 GFLOPS at the end of
                                    the test.<br>
                                    <br>
                                    Because of the nature of this
                                    problem, I suspect this might be a
                                    thermal issue. My guess is that the
                                    processor speed is being throttled
                                    to prevent overheating on the "bad"
                                    nodes.<br>
                                    <br>
                                    But here's the thing: this wasn't a
                                    problem until we upgraded to CentOS
                                    6. Where I work, we use a read-only
                                    NFSroot filesystem for our cluster
                                    nodes, so all nodes are mounting and
                                    using the same exact read-only image
                                    of the operating system. This only
                                    happens with these SuperMicro nodes,
                                    and only with the CentOS 6 on
                                    NFSroot. RHEL5 on NFSroot worked
                                    fine, and when I installed CentOS 6
                                    on a local disk, the nodes worked
                                    fine.<br>
                                    <br>
                                    Any ideas where to look or what to
                                    tweak to fix this? Any idea why this
                                    is only occurring with RHEL 6 w/ NFS
                                    root OS?<br>
                                    <br>
                                  </blockquote>
                                  <br>
                                </span>
                                <div
                                  class="m_7604217799998711846m_5099190104119760613HOEnZb">
                                  <div
                                    class="m_7604217799998711846m_5099190104119760613h5">
                                    ______________________________<wbr>_________________<br>
                                    Beowulf mailing list, <a
                                      href="mailto:Beowulf@beowulf.org"
                                      target="_blank"
                                      moz-do-not-send="true">Beowulf@beowulf.org</a>
                                    sponsored by Penguin Computing<br>
                                    To change your subscription (digest
                                    mode or unsubscribe) visit <a
                                      href="http://www.beowulf.org/mailman/listinfo/beowulf"
                                      rel="noreferrer" target="_blank"
                                      moz-do-not-send="true">http://www.beowulf.org/mailman<wbr>/listinfo/beowulf</a><br>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                            <br>
                            <br clear="all">
                            <div><br>
                            </div>
                          </div>
                        </div>
                        <span>-- <br>
                          <div
                            class="m_7604217799998711846m_5099190104119760613gmail_signature"
                            data-smartmail="gmail_signature">
                            <div dir="ltr">
                              <div>
                                <div dir="ltr">- Andrew "lathama" Latham
                                  <a href="mailto:lathama@gmail.com"
                                    target="_blank"
                                    moz-do-not-send="true">lathama@gmail.com</a>
                                  <a href="http://lathama.org"
                                    target="_blank"
                                    moz-do-not-send="true">http://lathama.com</a> -</div>
                              </div>
                            </div>
                          </div>
                        </span></div>
                      <br>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
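    <p>The throttling hypothesis in the original message could be
      checked without temperature sensors by watching core clock speeds
      while xhpl runs. This is a rough sketch, assuming Linux and that
      <code>/proc/cpuinfo</code> exposes a <code>cpu MHz</code> line per
      core. Whether throttling actually shows up there depends on the
      kernel and the cpufreq driver, so treat a flat reading as
      inconclusive. The function names and defaults are hypothetical.</p>
<pre>
```python
import time


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU, in file order."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def sample_cpu_mhz(samples=5, interval=1.0, path="/proc/cpuinfo"):
    """Print min/mean/max core clock once per interval (Linux only)."""
    for _ in range(samples):
        with open(path) as f:
            speeds = parse_cpu_mhz(f.read())
        if speeds:
            mean = sum(speeds) / len(speeds)
            print("{0} min={1:.0f} mean={2:.0f} max={3:.0f} MHz".format(
                time.strftime("%H:%M:%S"), min(speeds), mean, max(speeds)))
        time.sleep(interval)
```
</pre>
    <p>Running <code>sample_cpu_mhz</code> in a second shell on a "good"
      node and a "bad" node during the same LINPACK test would show
      whether the reported clocks fall as the GFLOPS numbers do.</p>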
    <br>
  </body>
</html>