[Beowulf] itanium vs. x86-64
kyron at neuralbs.com Tue Feb 10 07:09:59 PST 2009
>> Next caliper allows to get a lot of diagnostics from the cpu (also because
>> ia64 supports all that while x86-64 does not AFAICT) like number of bubbles
>> in the pipeline, L2-cache misses, clock-cycles per line of C-code etc.
>
> these are just the performance-counting MSR's, which are available
> on Opterons as well as Xeons too.

Even back to the PIII processors (and more?). Check out PAPI
(http://icl.cs.utk.edu/papi/) for more details but, as an example, here is
the output from an old cluster node:

eric@thinkbig1 ~ $ papi_avail -a
Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K7 (9)
CPU Revision             : 0.000000
CPU Megahertz            : 2083.157959
CPU's in this Node       : 1
Nodes in this System     : 1
Total CPU's              : 1
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Name          Derived  Description (Mgr. Note)
PAPI_L1_DCM   Yes      Level 1 data cache misses
PAPI_L1_ICM   No       Level 1 instruction cache misses
PAPI_L2_DCM   No       Level 2 data cache misses
PAPI_L2_ICM   No       Level 2 instruction cache misses
PAPI_L1_TCM   Yes      Level 1 cache misses
PAPI_L2_TCM   Yes      Level 2 cache misses
PAPI_TLB_DM   No       Data translation lookaside buffer misses
PAPI_TLB_IM   No       Instruction translation lookaside buffer misses
PAPI_TLB_TL   Yes      Total translation lookaside buffer misses
PAPI_L1_LDM   No       Level 1 load misses
PAPI_L1_STM   No       Level 1 store misses
PAPI_L2_LDM   No       Level 2 load misses
PAPI_L2_STM   No       Level 2 store misses
PAPI_HW_INT   No       Hardware interrupts
PAPI_BR_UCN   No       Unconditional branch instructions
PAPI_BR_CN    No       Conditional branch instructions
PAPI_BR_TKN   No       Conditional branch instructions taken
PAPI_BR_NTK   Yes      Conditional branch instructions not taken
PAPI_BR_MSP   No       Conditional branch instructions mispredicted
PAPI_BR_PRC   Yes      Conditional branch instructions correctly predicted
PAPI_TOT_INS  No       Instructions completed
PAPI_BR_INS   No       Branch instructions
PAPI_RES_STL  No       Cycles stalled on any resource
PAPI_TOT_CYC  No       Total cycles
PAPI_L1_DCH   Yes      Level 1 data cache hits
PAPI_L2_DCH   No       Level 2 data cache hits
PAPI_L1_DCA   No       Level 1 data cache accesses
PAPI_L2_DCA   Yes      Level 2 data cache accesses
PAPI_L2_DCR   No       Level 2 data cache reads
PAPI_L2_DCW   No       Level 2 data cache writes
PAPI_L1_ICA   No       Level 1 instruction cache accesses
PAPI_L2_ICA   No       Level 2 instruction cache accesses
PAPI_L1_ICR   No       Level 1 instruction cache reads
PAPI_L1_TCA   Yes      Level 1 total cache accesses
-------------------------------------------------------------------------
avail.c                               PASSED

And from a newer cluster node. Note the addition of floating point metrics
now available:

eric@h2 ~ $ papi_avail -a
Available events and hardware information.
--------------------------------------------------------------------------------
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel Core 2 (18)
CPU Revision             : 11.000000
CPU Megahertz            : 2394.000000
CPU Clock Megahertz      : 2394
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 5
Max Multiplex Counters   : 32
--------------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Name          Code        Deriv  Description (Note)
PAPI_L1_DCM   0x80000000  No     Level 1 data cache misses
PAPI_L1_ICM   0x80000001  No     Level 1 instruction cache misses
PAPI_L2_DCM   0x80000002  Yes    Level 2 data cache misses
PAPI_L2_ICM   0x80000003  No     Level 2 instruction cache misses
PAPI_L1_TCM   0x80000006  No     Level 1 cache misses
PAPI_L2_TCM   0x80000007  No     Level 2 cache misses
PAPI_CA_SHR   0x8000000a  No     Requests for exclusive access to shared cache line
PAPI_CA_CLN   0x8000000b  No     Requests for exclusive access to clean cache line
PAPI_CA_ITV   0x8000000d  No     Requests for cache line intervention
PAPI_TLB_DM   0x80000014  No     Data translation lookaside buffer misses
PAPI_TLB_IM   0x80000015  No     Instruction translation lookaside buffer misses
PAPI_L1_LDM   0x80000017  No     Level 1 load misses
PAPI_L1_STM   0x80000018  No     Level 1 store misses
PAPI_L2_LDM   0x80000019  Yes    Level 2 load misses
PAPI_L2_STM   0x8000001a  No     Level 2 store misses
PAPI_HW_INT   0x80000029  No     Hardware interrupts
PAPI_BR_CN    0x8000002b  No     Conditional branch instructions
PAPI_BR_TKN   0x8000002c  No     Conditional branch instructions taken
PAPI_BR_NTK   0x8000002d  No     Conditional branch instructions not taken
PAPI_BR_MSP   0x8000002e  No     Conditional branch instructions mispredicted
PAPI_BR_PRC   0x8000002f  Yes    Conditional branch instructions correctly predicted
PAPI_TOT_IIS  0x80000031  No     Instructions issued
PAPI_TOT_INS  0x80000032  No     Instructions completed
PAPI_FP_INS   0x80000034  No     Floating point instructions
PAPI_BR_INS   0x80000037  No     Branch instructions
PAPI_VEC_INS  0x80000038  No     Vector/SIMD instructions
PAPI_RES_STL  0x80000039  No     Cycles stalled on any resource
PAPI_TOT_CYC  0x8000003b  No     Total cycles
PAPI_L1_DCH   0x8000003e  Yes    Level 1 data cache hits
PAPI_L1_DCA   0x80000040  No     Level 1 data cache accesses
PAPI_L2_DCA   0x80000041  Yes    Level 2 data cache accesses
PAPI_L2_DCR   0x80000044  No     Level 2 data cache reads
PAPI_L2_DCW   0x80000047  No     Level 2 data cache writes
PAPI_L1_ICH   0x80000049  Yes    Level 1 instruction cache hits
PAPI_L2_ICH   0x8000004a  Yes    Level 2 instruction cache hits
PAPI_L1_ICA   0x8000004c  No     Level 1 instruction cache accesses
PAPI_L2_ICA   0x8000004d  No     Level 2 instruction cache accesses
PAPI_L2_TCH   0x80000056  Yes    Level 2 total cache hits
PAPI_L1_TCA   0x80000058  Yes    Level 1 total cache accesses
PAPI_L2_TCA   0x80000059  No     Level 2 total cache accesses
PAPI_L2_TCR   0x8000005c  Yes    Level 2 total cache reads
PAPI_L2_TCW   0x8000005f  No     Level 2 total cache writes
PAPI_FML_INS  0x80000061  No     Floating point multiply instructions
PAPI_FDV_INS  0x80000063  No     Floating point divide instructions
PAPI_FP_OPS   0x80000066  No     Floating point operations
-------------------------------------------------------------------------
Of 45 available events, 10 are derived.
avail.c                               PASSED

The limiting factor here is the number of available hardware counters
(ie: 5 for the Q6600)... check out Blue Gene's table ;) :
http://www.nic.uoregon.edu/mediawiki-tau/index.php?title=Guide:BlueGene_PAPI_Counter_Analysis&printable=yes#PAPI_Events_Available_on_Blue_Gene

Eric
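
For anyone who wants to compare event lists between nodes, here is a minimal sketch (not part of PAPI itself) that parses event rows in the two layouts shown in the listings above and counts how many are derived. The names `parse_events`, `summarize`, and `SAMPLE` are made up for illustration, and the regular expression only assumes rows of the form "name, optional hex code, Yes/No, description"; other PAPI versions may print something different.

```python
import re

# A few rows copied from the Core 2 listing above:
# name, event code, derived flag, description.
SAMPLE = """\
PAPI_L1_DCM   0x80000000  No     Level 1 data cache misses
PAPI_L2_DCM   0x80000002  Yes    Level 2 data cache misses
PAPI_TOT_CYC  0x8000003b  No     Total cycles
PAPI_L1_DCH   0x8000003e  Yes    Level 1 data cache hits
PAPI_FP_OPS   0x80000066  No     Floating point operations
"""

# The hex code column is optional so that the older layout
# (name, derived flag, description) is accepted as well.
ROW = re.compile(
    r"^(PAPI_\w+)\s+(?:(0x[0-9a-fA-F]+)\s+)?(Yes|No)\s+(.*\S)\s*$"
)


def parse_events(text):
    """Return one dict per event row; lines that are not rows are skipped."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, code, derived, description = match.groups()
            events.append({
                "name": name,
                "code": code,  # None for the older layout
                "derived": derived == "Yes",
                "description": description,
            })
    return events


def summarize(events):
    """Return (total event count, list of derived event names)."""
    derived = [e["name"] for e in events if e["derived"]]
    return len(events), derived


if __name__ == "__main__":
    total, derived = summarize(parse_events(SAMPLE))
    print(f"Of {total} parsed events, {len(derived)} are derived: {derived}")
```

In practice the text would come from saving the `papi_avail -a` output of each node to a file and passing the file contents to `parse_events`.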
