[Beowulf] itanium vs. x86-64

kyron at neuralbs.com
Tue Feb 10 07:09:59 PST 2009


>> Next caliper allows to get a lot of diagnostics from the cpu (also because
>> ia64 supports all that while x86-64 does not AFAICT) like number of bubbles
>> in the pipeline, L2-cache misses, clock-cycles per line of C-code etc.
>
> these are just the performance-counting MSR's, which are available
> on Opterons as well as Xeons too.

They go back at least as far as the PIII processors (and perhaps further).
See PAPI (http://icl.cs.utk.edu/papi/) for details; as an example, here is
the output from an old cluster node:

eric at thinkbig1 ~ $ papi_avail -a
Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K7 (9)
CPU Revision             : 0.000000
CPU Megahertz            : 2083.157959
CPU's in this Node       : 1
Nodes in this System     : 1
Total CPU's              : 1
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Name            Derived Description (Mgr. Note)
PAPI_L1_DCM     Yes     Level 1 data cache misses
PAPI_L1_ICM     No      Level 1 instruction cache misses
PAPI_L2_DCM     No      Level 2 data cache misses
PAPI_L2_ICM     No      Level 2 instruction cache misses
PAPI_L1_TCM     Yes     Level 1 cache misses
PAPI_L2_TCM     Yes     Level 2 cache misses
PAPI_TLB_DM     No      Data translation lookaside buffer misses
PAPI_TLB_IM     No      Instruction translation lookaside buffer misses
PAPI_TLB_TL     Yes     Total translation lookaside buffer misses
PAPI_L1_LDM     No      Level 1 load misses
PAPI_L1_STM     No      Level 1 store misses
PAPI_L2_LDM     No      Level 2 load misses
PAPI_L2_STM     No      Level 2 store misses
PAPI_HW_INT     No      Hardware interrupts
PAPI_BR_UCN     No      Unconditional branch instructions
PAPI_BR_CN      No      Conditional branch instructions
PAPI_BR_TKN     No      Conditional branch instructions taken
PAPI_BR_NTK     Yes     Conditional branch instructions not taken
PAPI_BR_MSP     No      Conditional branch instructions mispredicted
PAPI_BR_PRC     Yes     Conditional branch instructions correctly predicted
PAPI_TOT_INS    No      Instructions completed
PAPI_BR_INS     No      Branch instructions
PAPI_RES_STL    No      Cycles stalled on any resource
PAPI_TOT_CYC    No      Total cycles
PAPI_L1_DCH     Yes     Level 1 data cache hits
PAPI_L2_DCH     No      Level 2 data cache hits
PAPI_L1_DCA     No      Level 1 data cache accesses
PAPI_L2_DCA     Yes     Level 2 data cache accesses
PAPI_L2_DCR     No      Level 2 data cache reads
PAPI_L2_DCW     No      Level 2 data cache writes
PAPI_L1_ICA     No      Level 1 instruction cache accesses
PAPI_L2_ICA     No      Level 2 instruction cache accesses
PAPI_L1_ICR     No      Level 1 instruction cache reads
PAPI_L1_TCA     Yes     Level 1 total cache accesses
-------------------------------------------------------------------------
avail.c                                  PASSED

And here is the output from a newer cluster node. Note the floating-point
metrics that are now available:

eric at h2 ~ $ papi_avail -a
Available events and hardware information.
--------------------------------------------------------------------------------
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel Core 2 (18)
CPU Revision             : 11.000000
CPU Megahertz            : 2394.000000
CPU Clock Megahertz      : 2394
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 5
Max Multiplex Counters   : 32
--------------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

    Name        Code    Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  No   Level 2 instruction cache misses
PAPI_L1_TCM  0x80000006  No   Level 1 cache misses
PAPI_L2_TCM  0x80000007  No   Level 2 cache misses
PAPI_CA_SHR  0x8000000a  No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No   Requests for exclusive access to clean cache line
PAPI_CA_ITV  0x8000000d  No   Requests for cache line intervention
PAPI_TLB_DM  0x80000014  No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  No   Instruction translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  No   Level 1 load misses
PAPI_L1_STM  0x80000018  No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes  Level 2 load misses
PAPI_L2_STM  0x8000001a  No   Level 2 store misses
PAPI_HW_INT  0x80000029  No   Hardware interrupts
PAPI_BR_CN   0x8000002b  No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes  Conditional branch instructions correctly predicted
PAPI_TOT_IIS 0x80000031  No   Instructions issued
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_FP_INS  0x80000034  No   Floating point instructions
PAPI_BR_INS  0x80000037  No   Branch instructions
PAPI_VEC_INS 0x80000038  No   Vector/SIMD instructions
PAPI_RES_STL 0x80000039  No   Cycles stalled on any resource
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_L1_DCH  0x8000003e  Yes  Level 1 data cache hits
PAPI_L1_DCA  0x80000040  No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes  Level 2 data cache accesses
PAPI_L2_DCR  0x80000044  No   Level 2 data cache reads
PAPI_L2_DCW  0x80000047  No   Level 2 data cache writes
PAPI_L1_ICH  0x80000049  Yes  Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes  Level 2 instruction cache hits
PAPI_L1_ICA  0x8000004c  No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  No   Level 2 instruction cache accesses
PAPI_L2_TCH  0x80000056  Yes  Level 2 total cache hits
PAPI_L1_TCA  0x80000058  Yes  Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  No   Level 2 total cache accesses
PAPI_L2_TCR  0x8000005c  Yes  Level 2 total cache reads
PAPI_L2_TCW  0x8000005f  No   Level 2 total cache writes
PAPI_FML_INS 0x80000061  No   Floating point multiply instructions
PAPI_FDV_INS 0x80000063  No   Floating point divide instructions
PAPI_FP_OPS  0x80000066  No   Floating point operations
-------------------------------------------------------------------------
Of 45 available events, 10 are derived.
avail.c                                  PASSED
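
If you want to tally listings like the two above yourself, something along
these lines might do. It is only a rough sketch: parse_events and the ROW
pattern are names I made up for illustration (they are not part of PAPI), and
the three sample rows are copied from the listings above. It assumes each
event sits on a single line, so rows wrapped by a mail client would need to
be rejoined first.

```python
import re

# One row of `papi_avail -a` output: event name, an optional hex code
# (present only in the newer listing), the Derived flag, and a description.
ROW = re.compile(r"^(PAPI_\w+)\s+(?:(0x[0-9a-fA-F]+)\s+)?(Yes|No)\s+(.*)$")


def parse_events(text):
    """Return one dict per event row found in the listing text."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, code, derived, description = match.groups()
            events.append({
                "name": name,
                "code": code,  # None for the older, code-less format
                "derived": derived == "Yes",
                "description": description,
            })
    return events


# Sample rows taken from the two listings (one old-format, two new-format).
SAMPLE = """\
PAPI_L1_DCM     Yes     Level 1 data cache misses
PAPI_L2_DCM  0x80000002  Yes  Level 2 data cache misses
PAPI_FP_OPS  0x80000066  No   Floating point operations
"""

events = parse_events(SAMPLE)
derived_count = sum(1 for event in events if event["derived"])
print(f"Of {len(events)} parsed events, {derived_count} are derived.")
```

Header, separator and summary lines do not match the pattern, so they are
skipped silently.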

The limiting factor here is the number of available hardware counters (i.e.,
5 for the Q6600), since only that many events can be counted at once. Check
out Blue Gene's table ;) :
http://www.nic.uoregon.edu/mediawiki-tau/index.php?title=Guide:BlueGene_PAPI_Counter_Analysis&printable=yes#PAPI_Events_Available_on_Blue_Gene
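
To put a rough number on that limit, here is a back-of-the-envelope sketch of
how many separate runs it would take to cover all 45 events with 5 counters.
The function plan_runs and the placeholder event names are hypothetical, and
the estimate assumes that every event occupies exactly one counter and that
any events can share a run. Neither holds in general (derived events may need
more than one counter, and hardware restricts some combinations), so treat
the result as a lower bound.

```python
def plan_runs(events, num_counters):
    """Split events into consecutive batches of at most num_counters each."""
    if num_counters < 1:
        raise ValueError("num_counters must be at least 1")
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


# Placeholder names standing in for the 45 events reported on the Q6600 node.
events = [f"EVENT_{i}" for i in range(45)]
batches = plan_runs(events, 5)

print(len(batches))  # 9 runs of 5 events each
```

The alternative to separate runs is multiplexing, which time-slices the
counters within a single run at the cost of some accuracy.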

Eric



