[Beowulf] Weird blade performs worse as more cpus are used?

Faraz Hussain info at feacluster.com
Thu Sep 14 08:34:59 PDT 2017


Earlier I posted about one of our blades running 30-50% slower than
the others despite having identical hardware and OS. I followed the
suggestions and compared CPU temperature, memory, dmesg and sysctl
output. Everything looks the same.

I then used "perf stat" to compare the speed of pigz (parallel gzip).
The results are quite interesting. Using one cpu, the slow blade is
about as fast as the rest! But as I use more cpus, the clock speed
reported by perf decreases linearly from 3.1 GHz to 0.4 GHz. See
snippets from the "perf stat" output below. All tests were run on /tmp
to rule out any NFS issue, and the same behavior is observed with any
multi-threaded program.

Healthy blade 1 cpu:

Performance counter stats for './pigz -p 1 some200MBfile':

        6441.560969 task-clock                #    1.001 CPUs utilized
     21,230,248,729 cycles                    #    3.296 GHz
        6.435670580 seconds time elapsed

Slow blade 1 cpu:

  Performance counter stats for './pigz -p 1 some200MBfile':

        6857.933315 task-clock                #    1.001 CPUs utilized
     21,412,281,401 cycles                    #    3.122 GHz
        6.851644289 seconds time elapsed

Healthy blade 20 cpus:

Performance counter stats for './pigz -p 1 some200MBfile':

        7570.967306 task-clock                #   16.367 CPUs utilized
     21,913,797,346 cycles                    #    2.894 GHz
        0.462575439 seconds time elapsed

Slow blade 20 cpus:

  Performance counter stats for './pigz -p 1 some200MBfile':

       63404.802003 task-clock                #   19.524 CPUs utilized
     24,834,879,081 cycles                    #    0.392 GHz
        3.247597619 seconds time elapsed
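
For anyone who wants to check the numbers: the GHz column that perf prints is just cycles divided by task-clock, and "CPUs utilized" is task-clock divided by elapsed time. Below is a small Python sketch that recomputes both from the four runs above. The function names and the layout of the list are my own and purely illustrative; only the figures come from the perf output.

```python
# Figures copied from the "perf stat" snippets above:
# (label, task-clock in ms, cycles, elapsed seconds)
runs = [
    ("healthy, 1 cpu", 6441.560969, 21_230_248_729, 6.435670580),
    ("slow, 1 cpu", 6857.933315, 21_412_281_401, 6.851644289),
    ("healthy, 20 cpus", 7570.967306, 21_913_797_346, 0.462575439),
    ("slow, 20 cpus", 63404.802003, 24_834_879_081, 3.247597619),
]


def effective_ghz(task_clock_ms, cycles):
    """Cycles per nanosecond of CPU time, which is numerically GHz."""
    return cycles / (task_clock_ms * 1e6)


def cpus_utilized(task_clock_ms, elapsed_s):
    """CPU time divided by wall-clock time."""
    return (task_clock_ms / 1000.0) / elapsed_s


for label, task_clock_ms, cycles, elapsed_s in runs:
    ghz = effective_ghz(task_clock_ms, cycles)
    cpus = cpus_utilized(task_clock_ms, elapsed_s)
    print(f"{label:18s} {ghz:6.3f} GHz  {cpus:7.3f} CPUs utilized")
```

Worth noting from the same figures: the total cycle count barely changes between runs (roughly 21 to 25 billion), while the slow blade's task-clock at 20 cpus is about 8 times that of the healthy blade. So the amount of work looks the same; it is the time per cycle that grows.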




