[Beowulf] itanium vs. x86-64

kyron at neuralbs.com
Tue Feb 10 07:09:59 PST 2009


>> Next, Caliper lets you get a lot of diagnostics from the CPU (also
>> because ia64 supports all that while x86-64 does not AFAICT), like the
>> number of bubbles in the pipeline, L2-cache misses, clock cycles per
>> line of C code, etc.
>
> These are just the performance-counting MSRs, which are available
> on Opterons and Xeons as well.

They go back even to the PIII processors (and perhaps earlier). Check out
PAPI (http://icl.cs.utk.edu/papi/) for more details; as an example, here
is the output from an old cluster node:

eric at thinkbig1 ~ $ papi_avail -a
Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K7 (9)
CPU Revision             : 0.000000
CPU Megahertz            : 2083.157959
CPU's in this Node       : 1
Nodes in this System     : 1
Total CPU's              : 1
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Name            Derived Description (Mgr. Note)
PAPI_L1_DCM     Yes     Level 1 data cache misses
PAPI_L1_ICM     No      Level 1 instruction cache misses
PAPI_L2_DCM     No      Level 2 data cache misses
PAPI_L2_ICM     No      Level 2 instruction cache misses
PAPI_L1_TCM     Yes     Level 1 cache misses
PAPI_L2_TCM     Yes     Level 2 cache misses
PAPI_TLB_DM     No      Data translation lookaside buffer misses
PAPI_TLB_IM     No      Instruction translation lookaside buffer misses
PAPI_TLB_TL     Yes     Total translation lookaside buffer misses
PAPI_L1_LDM     No      Level 1 load misses
PAPI_L1_STM     No      Level 1 store misses
PAPI_L2_LDM     No      Level 2 load misses
PAPI_L2_STM     No      Level 2 store misses
PAPI_HW_INT     No      Hardware interrupts
PAPI_BR_UCN     No      Unconditional branch instructions
PAPI_BR_CN      No      Conditional branch instructions
PAPI_BR_TKN     No      Conditional branch instructions taken
PAPI_BR_NTK     Yes     Conditional branch instructions not taken
PAPI_BR_MSP     No      Conditional branch instructions mispredicted
PAPI_BR_PRC     Yes     Conditional branch instructions correctly predicted
PAPI_TOT_INS    No      Instructions completed
PAPI_BR_INS     No      Branch instructions
PAPI_RES_STL    No      Cycles stalled on any resource
PAPI_TOT_CYC    No      Total cycles
PAPI_L1_DCH     Yes     Level 1 data cache hits
PAPI_L2_DCH     No      Level 2 data cache hits
PAPI_L1_DCA     No      Level 1 data cache accesses
PAPI_L2_DCA     Yes     Level 2 data cache accesses
PAPI_L2_DCR     No      Level 2 data cache reads
PAPI_L2_DCW     No      Level 2 data cache writes
PAPI_L1_ICA     No      Level 1 instruction cache accesses
PAPI_L2_ICA     No      Level 2 instruction cache accesses
PAPI_L1_ICR     No      Level 1 instruction cache reads
PAPI_L1_TCA     Yes     Level 1 total cache accesses
-------------------------------------------------------------------------
avail.c                                  PASSED
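The "Derived" column marks presets that PAPI builds from more than one native event instead of reading a single counter, so a derived preset typically uses up more than one of this node's 4 hardware counters. As a minimal sketch in C, assuming the common textbook definitions (hits = accesses - misses, total misses = data + instruction misses, not taken = conditional - taken, correctly predicted = conditional - mispredicted), the arithmetic looks like this. The `raw_counts` struct and the function names are hypothetical, and PAPI's real preset-to-native mapping varies from CPU to CPU:

```c
/* Sketch: synthesizing "derived" presets from raw counter values.
 * ASSUMPTION: these formulas are common examples only; PAPI's actual
 * preset definitions are platform specific. All names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for raw counts read from hardware counters. */
typedef struct {
    uint64_t l1_dca; /* level 1 data cache accesses */
    uint64_t l1_dcm; /* level 1 data cache misses */
    uint64_t l1_icm; /* level 1 instruction cache misses */
    uint64_t br_cn;  /* conditional branch instructions */
    uint64_t br_tkn; /* conditional branches taken */
    uint64_t br_msp; /* conditional branches mispredicted */
} raw_counts;

/* Hits = accesses - misses (cf. PAPI_L1_DCH). */
uint64_t l1_data_hits(const raw_counts *r) {
    assert(r->l1_dcm <= r->l1_dca);
    return r->l1_dca - r->l1_dcm;
}

/* Total L1 misses = data misses + instruction misses (cf. PAPI_L1_TCM). */
uint64_t l1_total_misses(const raw_counts *r) {
    return r->l1_dcm + r->l1_icm;
}

/* Not taken = conditional - taken (cf. PAPI_BR_NTK). */
uint64_t branches_not_taken(const raw_counts *r) {
    assert(r->br_tkn <= r->br_cn);
    return r->br_cn - r->br_tkn;
}

/* Correctly predicted = conditional - mispredicted (cf. PAPI_BR_PRC). */
uint64_t branches_predicted(const raw_counts *r) {
    assert(r->br_msp <= r->br_cn);
    return r->br_cn - r->br_msp;
}
```

Each of these needs two native events live at the same time, which is why a handful of derived presets can exhaust a 4-counter CPU quickly.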

And here is the output from a newer cluster node. Note the floating-point
metrics that are now available:

eric at h2 ~ $ papi_avail -a
Available events and hardware information.
--------------------------------------------------------------------------------
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel Core 2 (18)
CPU Revision             : 11.000000
CPU Megahertz            : 2394.000000
CPU Clock Megahertz      : 2394
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 5
Max Multiplex Counters   : 32
--------------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

    Name        Code    Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  No   Level 2 instruction cache misses
PAPI_L1_TCM  0x80000006  No   Level 1 cache misses
PAPI_L2_TCM  0x80000007  No   Level 2 cache misses
PAPI_CA_SHR  0x8000000a  No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No   Requests for exclusive access to clean cache line
PAPI_CA_ITV  0x8000000d  No   Requests for cache line intervention
PAPI_TLB_DM  0x80000014  No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  No   Instruction translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  No   Level 1 load misses
PAPI_L1_STM  0x80000018  No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes  Level 2 load misses
PAPI_L2_STM  0x8000001a  No   Level 2 store misses
PAPI_HW_INT  0x80000029  No   Hardware interrupts
PAPI_BR_CN   0x8000002b  No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes  Conditional branch instructions correctly predicted
PAPI_TOT_IIS 0x80000031  No   Instructions issued
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_FP_INS  0x80000034  No   Floating point instructions
PAPI_BR_INS  0x80000037  No   Branch instructions
PAPI_VEC_INS 0x80000038  No   Vector/SIMD instructions
PAPI_RES_STL 0x80000039  No   Cycles stalled on any resource
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_L1_DCH  0x8000003e  Yes  Level 1 data cache hits
PAPI_L1_DCA  0x80000040  No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes  Level 2 data cache accesses
PAPI_L2_DCR  0x80000044  No   Level 2 data cache reads
PAPI_L2_DCW  0x80000047  No   Level 2 data cache writes
PAPI_L1_ICH  0x80000049  Yes  Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes  Level 2 instruction cache hits
PAPI_L1_ICA  0x8000004c  No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  No   Level 2 instruction cache accesses
PAPI_L2_TCH  0x80000056  Yes  Level 2 total cache hits
PAPI_L1_TCA  0x80000058  Yes  Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  No   Level 2 total cache accesses
PAPI_L2_TCR  0x8000005c  Yes  Level 2 total cache reads
PAPI_L2_TCW  0x8000005f  No   Level 2 total cache writes
PAPI_FML_INS 0x80000061  No   Floating point multiply instructions
PAPI_FDV_INS 0x80000063  No   Floating point divide instructions
PAPI_FP_OPS  0x80000066  No   Floating point operations
-------------------------------------------------------------------------
Of 45 available events, 10 are derived.
avail.c                                  PASSED
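Raw counts become more useful as ratios, e.g. the "clock cycles per line of code" style of metric from the quoted message. A minimal sketch in C, assuming you already have the preset values from a run (the function names are hypothetical, not part of PAPI): cycles per instruction from PAPI_TOT_CYC and PAPI_TOT_INS, the L1 data miss rate, the branch misprediction rate, and MFLOP/s from PAPI_FP_OPS. The MFLOP/s conversion assumes the cycle counter ticks at the nominal "CPU Megahertz" clock, which frequency scaling would break:

```c
/* Sketch: turning raw preset counts into rate metrics.
 * All function names are hypothetical; inputs are whatever counter
 * values a run produced (e.g. read via PAPI). */
#include <assert.h>
#include <stdint.h>

/* Cycles per instruction: PAPI_TOT_CYC / PAPI_TOT_INS. */
double cycles_per_instruction(uint64_t tot_cyc, uint64_t tot_ins) {
    assert(tot_ins > 0);
    return (double)tot_cyc / (double)tot_ins;
}

/* Fraction of L1 data accesses that miss: PAPI_L1_DCM / PAPI_L1_DCA. */
double l1_data_miss_rate(uint64_t l1_dcm, uint64_t l1_dca) {
    assert(l1_dca > 0);
    return (double)l1_dcm / (double)l1_dca;
}

/* Fraction of conditional branches mispredicted: PAPI_BR_MSP / PAPI_BR_CN. */
double branch_mispredict_rate(uint64_t br_msp, uint64_t br_cn) {
    assert(br_cn > 0);
    return (double)br_msp / (double)br_cn;
}

/* MFLOP/s from PAPI_FP_OPS and PAPI_TOT_CYC at a clock of mhz.
 * seconds = cycles / (mhz * 1e6), so
 * MFLOP/s = fp_ops / seconds / 1e6 = fp_ops * mhz / cycles.
 * ASSUMPTION: cycles tick at the nominal clock. */
double mflops(uint64_t fp_ops, uint64_t tot_cyc, double mhz) {
    assert(tot_cyc > 0);
    return (double)fp_ops * mhz / (double)tot_cyc;
}
```

For example, 1,200,000 floating-point operations over 2,394,000 cycles at 2394 MHz works out to 1200 MFLOP/s.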

The limiting factor here is the number of available hardware counters
(i.e., 5 on the Q6600), which caps how many events can be counted at the
same time. For comparison, check out Blue Gene's table ;) :
http://www.nic.uoregon.edu/mediawiki-tau/index.php?title=Guide:BlueGene_PAPI_Counter_Analysis&printable=yes#PAPI_Events_Available_on_Blue_Gene
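Multiplexing is the usual way around that limit (the "Max Multiplex Counters : 32" line in both listings): more events are registered than there are counters, they take turns on the hardware, and each count is extrapolated to the whole run. The sketch below simulates that idea in C under deliberately simplified assumptions: events are packed into fixed groups of `n_counters`, one group is live per time slice in round-robin order, and every event fires at a constant rate. It illustrates the scaling only; it is not PAPI's implementation, and all names are hypothetical:

```c
/* Sketch of counter multiplexing with linear extrapolation.
 * ASSUMPTIONS: fixed groups, round-robin one group per slice,
 * constant event rates. Not PAPI's actual implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of round-robin groups needed to cover n_events with n_counters. */
size_t n_groups(size_t n_events, size_t n_counters) {
    assert(n_counters > 0);
    return (n_events + n_counters - 1) / n_counters;
}

/* Simulate n_slices time slices. Event i belongs to group i / n_counters,
 * and only group (s % groups) is live during slice s. rate[i] is how many
 * events of type i occur per slice. On return, raw[i] holds what the
 * counter saw while live, and est[i] holds raw[i] scaled by
 * (total slices / live slices). */
void multiplex(const uint64_t *rate, size_t n_events, size_t n_counters,
               size_t n_slices, uint64_t *raw, double *est) {
    size_t groups = n_groups(n_events, n_counters);
    for (size_t i = 0; i < n_events; i++) {
        size_t group = i / n_counters;
        size_t live = 0;
        raw[i] = 0;
        for (size_t s = 0; s < n_slices; s++) {
            if (s % groups == group) { /* event i is on a counter */
                raw[i] += rate[i];
                live++;
            }
        }
        assert(live > 0); /* requires n_slices >= groups */
        est[i] = (double)raw[i] * (double)n_slices / (double)live;
    }
}
```

With 7 events, 5 counters and 9 slices, the first group is live for 5 slices and the second for 4; because the rates are constant, scaling by 9/5 and 9/4 recovers the true totals exactly. Real code has phases, so multiplexed counts on real workloads are estimates, not exact totals.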

Eric
