[Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release

Rayson Ho raysonlogin at gmail.com
Wed Apr 13 09:21:21 PDT 2011


Carlos,

I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
arch string, so I believe you are running the loadcheck from standard
Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
the one from the Open Grid Scheduler page.

The existing Grid Engine releases (including the latest Open Grid
Scheduler releases, SGE 6.2u5p1 & SGE 6.2u5p2, and Univa's fork) use
PLPA, which is known to report the wrong topology on Magny-Cours.

(i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from:
http://sourceforge.net/projects/gridscheduler/files/ )


Chansup on the Grid Engine mailing list (it's the general purpose Grid
Engine mailing list for now) tested the version I uploaded last night,
and it seems to work on a dual-socket Magny-Cours AMD machine. It prints:

m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC
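
To make the difference between the two m_topology strings concrete, here
is a minimal Python sketch (not part of loadcheck or Grid Engine) that
counts what such a string encodes. It assumes the usual reading of the
string: S starts a socket, C is a core in that socket, and T is a
hardware thread in that core. The name summarize_topology and the two
constants are made up for illustration.

```python
def summarize_topology(topology):
    """Count cores and threads per socket in an m_topology string.

    Assumed encoding: 'S' starts a socket, 'C' is a core in the current
    socket, 'T' is a hardware thread in the current core.
    """
    sockets = []  # one dict per socket, in the order they appear
    for ch in topology:
        if ch == "S":
            sockets.append({"cores": 0, "threads": 0})
        elif ch in ("C", "T"):
            if not sockets:
                raise ValueError("core or thread listed before any socket")
            key = "cores" if ch == "C" else "threads"
            sockets[-1][key] += 1
        else:
            raise ValueError("unexpected character in topology: %r" % ch)
    return sockets


# String printed by the hwloc-enabled loadcheck: 2 x (S + 12 C)
HWLOC_TOPOLOGY = ("S" + "C" * 12) * 2
# String from the PLPA-based loadcheck output quoted below: 2 x (S + 6 CTT)
PLPA_TOPOLOGY = ("S" + "CTT" * 6) * 2

if __name__ == "__main__":
    for name, topo in (("hwloc", HWLOC_TOPOLOGY), ("PLPA", PLPA_TOPOLOGY)):
        print(name, summarize_topology(topo))
```

Read this way, the hwloc string describes 12 cores and no threads on
each of the 2 sockets, while the PLPA string describes 6 cores with 2
threads each per socket, which does not match the 12 cores/socket that
Carlos describes for his machine.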

However, I am still fixing the processor and core ID mapping code:

http://gridengine.org/pipermail/users/2011-April/000629.html
http://gridengine.org/pipermail/users/2011-April/000628.html

I compiled the hwloc-enabled loadcheck on kernel 2.6.34 & glibc 2.12,
so it may not work on machines running older kernel or glibc versions.
You can download it from:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
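
As a rough pre-check before downloading, the sketch below compares a
host's kernel and glibc versions against the build environment mentioned
above (kernel 2.6.34, glibc 2.12). It is only an assumption that these
act as minimums; the binary may or may not run on older systems. The
helper names version_tuple and meets_build_environment are hypothetical.

```python
import platform
import re


def version_tuple(text):
    """Turn the leading dotted number of a string into a tuple of ints.

    For example '2.6.34-12-desktop' becomes (2, 6, 34). Returns an empty
    tuple when the string does not start with a number.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if not match:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def meets_build_environment(kernel, glibc,
                            min_kernel=(2, 6, 34), min_glibc=(2, 12)):
    """Return True if both versions are at least the build versions."""
    return (version_tuple(kernel) >= min_kernel
            and version_tuple(glibc) >= min_glibc)


if __name__ == "__main__":
    kernel = platform.release()    # kernel release string of this host
    glibc = platform.libc_ver()[1]  # empty string if glibc is not detected
    print("kernel:", kernel, "glibc:", glibc)
    print("matches build environment:",
          meets_build_environment(kernel, glibc))
```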

Rayson



On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
<carlosf at cesga.es> wrote:
> This is the output of a 2 sockets, 12 cores/socket (magny-cours) AMD system
> (and seems to be wrong!):
>
> arch            lx24-amd64
> num_proc        24
> m_socket        2
> m_core          12
> m_topology      SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
> load_short      0.29
> load_medium     0.13
> load_long       0.04
> mem_free        26257.382812M
> swap_free       8191.992188M
> virtual_free    34449.375000M
> mem_total       32238.328125M
> swap_total      8191.992188M
> virtual_total   40430.320312M
> mem_used        5980.945312M
> swap_used       0.000000M
> virtual_used    5980.945312M
> cpu             0.0%
>
>
> Carlos Fernandez Sanchez
> Systems Manager
> CESGA
> Avda. de Vigo s/n. Campus Vida
> Tel.: (+34) 981569810, ext. 232
> 15705 - Santiago de Compostela
> SPAIN
>
> --------------------------------------------------
> From: "Rayson Ho" <raysonlogin at gmail.com>
> Sent: Tuesday, April 12, 2011 10:31 PM
> To: "Beowulf List" <Beowulf at beowulf.org>
> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
> -pre-alpha release
>
>> If you are using the "Job to Core Binding" feature in SGE and running
>> SGE on newer hardware, then please give the new hwloc enabled
>> loadcheck a try.
>>
>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>
>> The current hardware topology discovery library (Portable Linux
>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>> hardware topology may not be detected correctly by PLPA.
>>
>> If you are running SGE on AMD Magny-Cours servers, please post your
>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>
>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>> support in later releases of Grid Engine / Grid Scheduler.
>>
>> http://gridscheduler.sourceforge.net/
>>
>> Thanks!!
>> Rayson
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>


