Академический Документы
Профессиональный Документы
Культура Документы
Special considerations for MPI on Intel Xeon Phi and using the Intel Trace Analyzer and Collector
Gergana Slavova
gergana.s.slavova@intel.com Technical Consulting Engineer Intel Software and Services Group
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
2
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
3
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Code
Multicore
Many-core
Cluster
Multicore CPU
Multicore CPU
Multicore Cluster
Use One Software Develop Architecture Today. Scale Forward & Parallelize Today for Maximum Tomorrow. Performance
4
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Automatic tuning
Interconnect Independence & Runtime Selection Multi-vendor interoperability Performance optimized support for the latest OFED capabilities through DAPL 2.0
More robust MPI applications Seamless interoperability with Intel Trace Analyzer and Collector
5
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
9
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Installation
Download latest Intel MPI Library plus a license from Intel Registration Center (included in Intel Cluster Studio XE)
l_mpi_p_4.1.0.030.tgz l_itac_p_8.1.0.024.tgz
Root or user installation possible! Resulting directory structure has intel64 and coprocessor sub-directories:
/opt/intel/impi/4.1.0.030/intel64/{bin,etc,include,lib} /opt/intel/impi/4.1.0.030/mic/{bin,etc,include,lib}
Only one user environment setup required, serves both architectures!
10
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Prerequisites
Make sure the Intel Xeon Phi card is accesible through the network (meaning, unique IP address)
11
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
mpirun needs ssh access to Intel Xeon Phi coprocessor. Copy the ssh key for the card:
# sudo cp /opt/intel/mic/id_rsa* ~/.ssh/ # sudo chown myuser.mygroup ~/.ssh/id_rsa* # ssh root@node0-mic0 hostname
12
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
13
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Outline
Overview Installation of Intel MPI
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
14
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
MPI ranks on Intel Xeon Phi coprocessor (only) All messages into/out of the coprocessors Intel Cilk Plus, OpenMP*, Intel Threading Building Blocks, Pthreads used directly within MPI processes
MPI
Data
CPU
Data
CPU
Build Intel Xeon Phi coprocessor binary using the Intel compiler Upload the binary to the Intel Xeon Phi coprocessor Run instances of the MPI application on Intel Xeon Phi coprocessor nodes
15
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Build the application for the Intel Xeon Phi Architecture # mpiicc -mmic -o test_hello.MIC test.c Upload the coprocessor executable # scp ./test_hello.MIC node0-mic0:/tmp
Remark: If NFS available no explicit uploads required!
Launch the application on the coprocessor from host # export I_MPI_MIC=enable # cat mpi_hosts node0-mic0 # mpirun f mpi_hosts -n 2 /tmp/test_hello.MIC Alternatively: login to the coprocessor and execute the already uploaded mpirun there!
16
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
MPI
Network
Data
CPU
Data
CPU
Data
C
Build binaries by using the resp. compilers targeting Intel 64 and Intel Xeon Phi Architecture
Upload the binary to the Intel Xeon Phi coprocessor Run instances of the MPI application on different mixed nodes
17
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Build the application for Intel64 and the Intel Xeon Phi Architecture separately # mpiicc -o test_hello test.c # mpiicc mmic -o test_hello.MIC test.c Upload the Intel Xeon Phi coprocessor executable # scp ./test_hello.MIC node0-mic0:~/test_hello
Remark: If NFS available no explicit uploads required (look for tips on next slide)!
Launch the application on the host and the coprocessor from the host # export I_MPI_MIC=enable # cat mpi_hosts node0 node0-mic0 # mpirun f mpi_hosts -n 2 ~/test_hello
18
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Compile your code for the Intel Xeon processor node And for Intel Xeon Phi Coprocessor
# mpiicc mmic o test.mic test.c
Instead of copying the executable, simply tell the Intel MPI Library to add a postfix to the executable when running on Intel Xeon Phi coprocessor
# export I_MPI_MIC_POSTFIX=.mic
Compile your code for the Intel Xeon Architecture And for Intel Xeon Phi coprocessor (place the executable in a separate directory)
# mpiicc mmic o ./MIC/test test.c
Tell the Intel MPI Library that the Intel Xeon Phi coprocessor executable are located in a separate path
# export I_MPI_MIC_PREFIX=./MIC/
20
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
All messages into/out of host CPUs Offload models used to accelerate MPI ranks
MPI
CPU
Offload
Data
CPU
Intel Cilk Plus, OpenMP*, Intel Threading Building Blocks, Pthreads* within Intel Xeon Phi coprocessor
Build Intel 64 executable with included offload by using the Intel compiler Run instances of the MPI application on the host, offloading code onto coprocessor Advantages of more cores and wider SIMD for certain applications
21
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Compile for MPI and internal offload # mpiicc o test_hello test.c Latest compiler compiles by default for offloading if offload construct is detected! Switch off by -no-offload flag Execute on host(s) as usual # cat mpi_hosts node0 # mpirun f mpi_hosts -n 2 ~/test_hello MPI processes will offload code for acceleration
22
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
23
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
24
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Potential MPI limitations Memory consumption per MPI process, sum exceeds the node memory Limited scalability due to exhausted interconnects (e.g. MPI collectives) Load balancing is often challenging in MPI
25
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Hybrid Computing
Combine MPI programming model with threading model Overcome MPI limitations by adding threading:
26
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
OpenMP*
Programmer control
Choice of unified programming to target Intel Xeon and Intel Xeon Phi Architecture!
27
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
28
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Pin OpenMP threads inside the domain with KMP_AFFINITY (or in the code)
29
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Host, number of nodes, but also environment can be set independently in each argument set # mpirun env I_MPI_PIN_DOMAIN 4 host myXEON ... : -env I_MPI_PIN_DOMAIN 16 host myMIC
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
31
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
32
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Features
Event-based approach Low overhead Excellent scalability Comparison of multiple profiles Powerful aggregation and filtering functions Fail-safe MPI tracing Provides API to instrument user code MPI correctness checking Idealizer
33
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
34
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Set the Intel Trace Analyzer and Collector environment (per user)
# source /opt/intel/itac/8.1.0.024/bin/itacvars.sh
35
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel Trace Analyzer and Collector Usage with Intel Xeon Phi coprocessor Recommended
Run with trace flag (without linkage) to create a trace file MPI+Offload # mpirun trace -n 2 ./test Coprocessor only and Symmetric # export I_MPI_MIC=enable
36
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel Trace Analyzer and Collector Usage with Intel Xeon Phi coprocessor Compilation Support
# mpiicc trace mmic -o test_hello.MIC test.c
Linkage of tracing library
Select Charts->Event Timeline Select Charts->Message Profile Zoom into the Event Timeline
Click into it, keep pressed, move to the right, and release the mouse See menu Navigate to get back
Outline
Overview Installation of Intel MPI
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
39
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Load Balancing
Situation
Host and coprocessor computation performance are different
MPI in symmetric mode is like running on a heterogenous cluster Load balanced codes (on homogeneous cluster) may get imbalanced!
Approach 3: ...
Analyze load balance of application with Intel Trace Analyzer and Collector
Ideal Interconnect Simulator
41
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
42
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
43
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
44
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
45
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Idealized Trace
46
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
P1 P2
MPI_Isend
MPI_Recv
P1
MPI_Isend
P2
MPI_Recv
47
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
P1 P2
MPI_Isend MPI_Isend
MPI_Recv
P1
MPI_Isend
P2
MPI_Recv
48
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
P1 P2
MPI_Isend
MPI_Recv
zero duration
P1
MPI_Isend
P2
MPI_Recv
49
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
P1 P2
MPI_Isend
MPI_Recv
P1
MPI_Isend MPI_Isend
P2
MPI_Recv
50
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
P1 P2
MPI_Isend
MPI_Recv
P1
MPI_Isend MPI_Isend
P2
MPI_Recv
Load imbalance
51
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
52
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
53
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
54
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
55
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Change algorithm
"calculation"
56
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Outline
Overview Installation of the Intel MPI Library
Programming Models
Hybrid Computing Intel Trace Analyzer and Collector Load Balancing Debugging
57
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
59
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Online Resources
Intel MPI Library product page
www.intel.com/go/mpi
60
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Summary
The ease of use of the Intel MPI Library and related tools like the Intel Trace Analyzer and Collector extends from the Intel Xeon architecture to the Intel Xeon Phi architecture
61
Copyright 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
62
Optimization Notice
Intels compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
64
64
Copyright 2013, 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.