Legal Disclaimer
Intel may make changes to specifications and product descriptions at any time, without notice.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations.
Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel Virtualization Technology requires a computer system with a processor, chipset, BIOS, virtual machine monitor (VMM) and applications enabled for virtualization technology. Functionality, performance or other virtualization technology benefits will vary depending on hardware and software configurations. Virtualization technology-enabled BIOS and VMM applications are currently in development.
64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information.
Lead-free: 45nm product is manufactured on a lead-free process. Lead is below 1000 PPM per EU RoHS directive (2002/95/EC, Annex A). Some EU RoHS exemptions for lead may apply to other components used in the product package.
Halogen-free: Applies only to halogenated flame retardants and PVC in components. Halogens are below 900 PPM bromine and 900 PPM chlorine.
Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
2009 Standard Performance Evaluation Corporation (SPEC) logo is reprinted with permission.
Agenda
CPU Utilization Monitoring
Performance Monitoring Units (PMU) in Processors
Offline analysis with PMU: Intel VTune Performance Analyzer
Online Dynamic Processor Monitoring (NEW!)
How do I find out what keeps the processor busy? Or is my software just wasting compute cycles?
Existing OS CPU meters cannot predict the remaining capacity of modern hardware
Software & Services Group 4
[Figure: hierarchy of monitored units: SYSTEM, SOCKET (CPU), CORE]
Programming PMUs
PMUs are programmed by reading and writing Model Specific Registers (MSRs). Much of the hardware and many of the events are platform specific. The core PMU is enumerated in CPUID leaf 0xA:
Number of fully programmable counters (4 per logical core); each counter is assigned to count a certain event
Number of fixed-function counters (3 per logical core): core clock counter, reference clock counter, instruction counter
Some uncore and core programmable counters can only be programmed with certain types of events. Other tricky restrictions apply; they are documented in the event list.
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2. http://www.intel.com/products/processor/manuals/
Intel Xeon Processor 7500 Series Uncore Programming Guide
http://www.intel.com/Assets/en_US/PDF/designguide/323535.pdf
http://software.intel.com/file/20476
Peggy Irelan and Shihjong Kuo, Performance Monitoring Unit Sharing Guide
Gillespie, Drysdale, Intel Hyper-Threading Technology: Analysis of the HT Effects on a Server Transactional Workload. http://software.intel.com/en-us/articles/intel-hyper-threadingtechnology-analysis-of-the-ht-effects-on-a-server-transactional-workload/
Besides sampling, PMU-based analysis can also produce a call graph (not covered here).
DEMO
Intel VTune Performance Analyzer in action!
Hotspot view of one module for all OS processes and threads grouped by function (or method).
Sampling Source View Displays Source Code Annotated with Performance Data
Online Performance Counter Monitoring: Access Intel CPU Counters* in Your Program
Terminology: a system consists of several sockets (= CPUs); each socket has a number of (logical) cores.
Usage pattern:
1. Save the counter state for {core, socket, system} into state object 1.
2. Run the user code or experiment.
3. Save the counter state for {core, socket, system} into state object 2.
4. Compute performance/utilization metrics from state objects 1 and 2.
Caution: the OS may schedule different user threads on the same core (context switches), so per-core counts can include work from other threads.
Access not only core counters (clock ticks, L2 cache misses, etc.) but also uncore counters* (Intel memory controllers, Intel QPI, etc.)
* Implemented for Intel Core i7, Xeon 5500, 5600 and 7500 Processor Series (based on the microarchitectures codenamed Nehalem/Westmere)
Example 1
Compare traversal/search in an STL list vs. an STL vector (4-byte records). C++ code to measure:
std::find( ds.begin(), ds.end(), ds.size());
Advanced Examples
Self-tuning software!
Advanced Use-Cases I
Extend the problem (to be closer to reality):
Schedule jobs to all Hyper-Threaded cores in the system.
The remaining capacities are not known a priori, because the exact resource utilization of the jobs is not predictable.
Is there room to put another job on this HT core?
Should it be compute intensive or rather memory intensive job?
CPU Performance Monitoring can provide more insights and help to answer these questions
Advanced Use-Cases II
Depending on the remaining resource capacities, choose the best algorithm to compute the result:
memory-intensive or compute-intensive
and so on
Processor performance counters are heavily used in established performance tools like Intel VTune Performance Analyzer
New advanced use cases for PMUs make dynamic online optimization possible
A new kind of intelligent, CPU-monitoring-aware software