
Multi-Core Architectures and Programming

Subject: Multi-Core Architectures and Programming


An Introduction to Parallel Programming by Peter S. Pacheco
Chapter 1 Why Parallel Computing
1. Why Parallel Computing?
2. Why We Need Ever-Increasing Performance
3. Why We’re Building Parallel Systems
4. Why We Need to Write Parallel Programs
5. How Do We Write Parallel Programs?
6. Concurrent, Parallel, Distributed

Chapter 2 Parallel Hardware and Parallel Software


7. Parallel Hardware and Parallel Software
8. Some Background: von Neumann Architecture, Processes, Multitasking, and Threads
9. Modifications to the von Neumann Model
10. Parallel Hardware
11. Parallel Software
12. Input and Output
13. Performance of Parallel Programming (see the sketch after this list)
14. Parallel Program Design with an Example
15. Writing and Running Parallel Programs
16. Assumptions - Parallel Programming
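
As a quick illustration of topic 13, here is a minimal sketch (not taken from the book, and using made-up timing numbers rather than measurements) of the two standard metrics for a parallel program: speedup S = T_serial / T_parallel and efficiency E = S / p for p cores. Linear speedup corresponds to S = p and E = 1; in practice synchronization and communication overhead keep E below 1.

    /* Minimal sketch of speedup and efficiency; the timings are illustrative. */
    #include <stdio.h>

    int main(void) {
        double t_serial = 12.0;    /* hypothetical serial runtime, seconds      */
        double t_parallel = 2.0;   /* hypothetical runtime on p = 8 cores       */
        int p = 8;

        double speedup = t_serial / t_parallel;   /* 6.0  */
        double efficiency = speedup / p;          /* 0.75 */

        printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
        return 0;
    }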

Chapter 3 Distributed Memory Programming with MPI


17. Distributed-Memory Programming with MPI
18. The Trapezoidal Rule in MPI (see the sketch after this list)
19. Dealing with I/O
20. Collective Communication
21. MPI Derived Datatypes
22. Performance Evaluation of MPI Programs
23. A Parallel Sorting Algorithm
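
Topic 18 is usually introduced with a small numerical integration program. The following is a minimal sketch in C, not Pacheco's exact code: each MPI process integrates f(x) = x^2 over its own slice of [0, 1] with the trapezoidal rule, and MPI_Reduce sums the partial results on rank 0. The value n = 1024 and the assumption that n divides evenly among the processes are illustrative choices. A typical build-and-run line would be: mpicc trap_mpi.c -o trap && mpiexec -n 4 ./trap.

    #include <mpi.h>
    #include <stdio.h>

    static double f(double x) { return x * x; }   /* integrand: f(x) = x^2 */

    /* Serial trapezoidal rule on [left, right] with n subintervals of width h. */
    static double trap(double left, double right, int n, double h) {
        double sum = (f(left) + f(right)) / 2.0;
        for (int i = 1; i < n; i++)
            sum += f(left + i * h);
        return sum * h;
    }

    int main(void) {
        int rank, size;
        const double a = 0.0, b = 1.0;
        const int n = 1024;                 /* assumed divisible by the process count */

        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double h = (b - a) / n;
        int local_n = n / size;             /* trapezoids handled by this process */
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;
        double local_int = trap(local_a, local_b, local_n, h);

        double total = 0.0;
        MPI_Reduce(&local_int, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("integral of x^2 on [0,1] ~= %f\n", total);   /* ~0.333333 */

        MPI_Finalize();
        return 0;
    }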

Chapter 4 Shared Memory Programming with Pthreads


24. Shared-Memory Programming with Pthreads
25. Processes, Threads, and Pthreads
26. Pthreads - Hello, World Program (see the sketch after this list)
27. Matrix-Vector Multiplication
28. Critical Sections
29. Busy-Waiting
30. Mutexes
31. Producer-Consumer Synchronization and Semaphores
32. Barriers and Condition Variables
33. Read-Write Locks
34. Caches, Cache Coherence, and False Sharing
35. Thread-Safety
36. Shared-Memory Programming with OpenMP
37. The Trapezoidal Rule
38. Scope of Variables
39. The Reduction Clause
40. The parallel for Directive
41. More About Loops in OpenMP: Sorting
42. Scheduling Loops
43. Producers and Consumers
44. Caches, Cache Coherence, and False Sharing
45. Thread-Safety
46. Parallel Program Development
47. Two n-Body Solvers
48. Parallelizing the basic solver using OpenMP
49. Parallelizing the reduced solver using OpenMP
50. Evaluating the OpenMP codes
51. Parallelizing the solvers using Pthreads
52. Parallelizing the basic solver using MPI
53. Parallelizing the reduced solver using MPI
54. Performance of the MPI solvers
55. Tree Search
56. Recursive depth-first search
57. Nonrecursive depth-first search
58. Data structures for the serial implementations
59. Performance of the serial implementations
60. Parallelizing tree search
61. A static parallelization of tree search using Pthreads
62. A dynamic parallelization of tree search using Pthreads
63. Evaluating the Pthreads tree-search programs
64. Parallelizing the tree-search programs using OpenMP
65. Performance of the OpenMP implementations
66. Implementation of tree search using MPI and static partitioning
67. Implementation of tree search using MPI and dynamic partitioning
68. Which API?
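
Topic 26 refers to the Pthreads "hello, world" style program. Below is a minimal sketch, close in spirit to the book's example but not its exact code: the main thread starts a number of worker threads, each prints its rank, and main joins them all. The default of 4 threads and the file name pth_hello.c are assumptions for illustration; it builds with gcc pth_hello.c -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *hello(void *rank) {
        long my_rank = (long)rank;        /* rank passed through the void* argument */
        printf("Hello from thread %ld\n", my_rank);
        return NULL;
    }

    int main(int argc, char *argv[]) {
        long thread_count = (argc > 1) ? strtol(argv[1], NULL, 10) : 4;
        pthread_t *handles = malloc(thread_count * sizeof(pthread_t));

        for (long t = 0; t < thread_count; t++)
            pthread_create(&handles[t], NULL, hello, (void *)t);

        printf("Hello from the main thread\n");

        for (long t = 0; t < thread_count; t++)
            pthread_join(handles[t], NULL);   /* wait for every thread to finish */

        free(handles);
        return 0;
    }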

Multicore Application Programming: For Windows, Linux, and Oracle Solaris by Darryl Gove
Chapter 1 Hardware, Processes, and Threads
69. Hardware, Processes, and Threads
70. Examining the Insides of a Computer
71. The Motivation for Multicore Processors
72. Supporting Multiple Threads on a Single Chip
73. Increasing Instruction Issue Rate with Pipelined Processor Cores
74. Using Caches to Hold Recently Used Data
75. Using Virtual Memory to Store Data
76. Translating from Virtual Addresses to Physical Addresses
77. The Characteristics of Multiprocessor Systems
78. How Latency and Bandwidth Impact Performance
79. The Translation of Source Code to Assembly Language
80. The Performance of 32-Bit versus 64-Bit Code
81. Ensuring the Correct Order of Memory Operations
82. The Differences Between Processes and Threads (see the sketch after this list)
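
Topic 82 contrasts processes and threads. The sketch below (not from the book) shows the key difference on a POSIX system: a thread created with pthread_create shares its creator's address space, while a child created with fork gets its own copy, so the child's update to the counter is invisible to the parent. Build with gcc proc_vs_thread.c -pthread (file name assumed).

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int counter = 0;        /* shared with threads, copied into a forked child */

    static void *thread_body(void *arg) {
        counter++;                 /* threads share the parent's memory */
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, thread_body, NULL);
        pthread_join(tid, NULL);
        printf("after thread: counter = %d\n", counter);   /* prints 1 */

        pid_t pid = fork();        /* child gets a copy-on-write copy of memory */
        if (pid == 0) {
            counter++;             /* only changes the child's copy */
            exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after fork:   counter = %d\n", counter);   /* still 1 in the parent */
        return 0;
    }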

Chapter 2 Coding for Performance


83. Coding for Performance
84. Defining Performance
85. Understanding Algorithmic Complexity
86. Why Algorithmic Complexity Is Important
87. Using Algorithmic Complexity with Care
88. How Structure Impacts Performance
89. Performance and Convenience Trade-Offs in Source Code and Build Structures
90. Using Libraries to Structure Applications
91. The Impact of Data Structures on Performance
92. The Role of the Compiler
93. The Two Types of Compiler Optimization
94. Selecting Appropriate Compiler Options
95. How Cross-File Optimization Can Be Used to Improve Performance
96. Using Profile Feedback
97. How Potential Pointer Aliasing Can Inhibit Compiler Optimizations (see the sketch after this list)
98. Identifying Where Time Is Spent Using Profiling
99. Commonly Available Profiling Tools
100. How Not to Optimize
101. Performance by Design
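
Topic 97 concerns pointer aliasing. The following C99 sketch (not from the book; the function names are hypothetical) shows why aliasing inhibits optimization: if a and b might overlap, the compiler must reload *b on every iteration, whereas the restrict qualifier promises they do not, allowing *b to be hoisted out of the loop and the loop to be vectorized.

    /* Without restrict, the compiler must assume a write to a[i] may change *b. */
    void scale_may_alias(double *a, double *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] = a[i] * (*b);      /* *b reloaded each iteration */
    }

    /* With restrict, a and b are promised not to overlap. */
    void scale_no_alias(double *restrict a, double *restrict b, int n) {
        for (int i = 0; i < n; i++)
            a[i] = a[i] * (*b);      /* *b can be kept in a register */
    }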

Chapter 3 Identifying Opportunities for Parallelism


102. Identifying Opportunities for Parallelism
103. Using Multiple Processes to Improve System Productivity
104. Multiple Users Utilizing a Single System
105. Improving Machine Efficiency Through Consolidation
106. Using Containers to Isolate Applications Sharing a Single System
107. Hosting Multiple Operating Systems Using Hypervisors
108. Using Parallelism to Improve the Performance of a Single Task
109. One Approach to Visualizing Parallel Applications
110. How Parallelism Can Change the Choice of Algorithms
111. Amdahl’s Law (see the sketch after this list)
112. Determining the Maximum Practical Threads
113. How Synchronization Costs Reduce Scaling
114. Parallelization Patterns
115. Data Parallelism Using SIMD Instructions
116. Parallelization Using Processes or Threads
117. Multiple Independent Tasks
118. Multiple Loosely Coupled Tasks
119. Multiple Copies of the Same Task
120. Single Task Split Over Multiple Threads
121. Using a Pipeline of Tasks to Work on a Single Item
122. Division of Work into a Client and a Server
123. Splitting Responsibility into a Producer and a Consumer
124. Combining Parallelization Strategies
125. How Dependencies Influence the Ability to Run Code in Parallel
126. Antidependencies and Output Dependencies
127. Using Speculation to Break Dependencies
128. Critical Paths
129. Identifying Parallelization Opportunities
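
Topic 111, Amdahl's Law, bounds the speedup of a program in which only a fraction p of the runtime can be parallelized: speedup <= 1 / ((1 - p) + p / n) on n threads. A minimal sketch (not from the book) with a worked example follows; note that even with unlimited threads the speedup is capped at 1 / (1 - p).

    #include <stdio.h>

    /* Amdahl's bound: p is the parallel fraction, n the number of threads. */
    static double amdahl_speedup(double p, double n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        printf("p=0.95, n=8   -> %.2fx\n", amdahl_speedup(0.95, 8));    /* ~5.93x */
        printf("p=0.95, n=1e9 -> %.2fx\n", amdahl_speedup(0.95, 1e9));  /* -> 20x */
        return 0;
    }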

Chapter 4 Synchronization and Data Sharing


130. Synchronization and Data Sharing
131. Data Races (see the sketch after this list)
132. Using Tools to Detect Data Races
133. Avoiding Data Races
134. Synchronization Primitives
135. Mutexes and Critical Regions
136. Spin Locks
137. Semaphores
138. Readers-Writer Locks
139. Barriers
140. Atomic Operations and Lock-Free Code
141. Deadlocks and Livelocks
142. Communication Between Threads and Processes
143. Storing Thread-Private Data
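
Topics 131-135 deal with data races and the mutexes that prevent them. The sketch below (not from the book) makes the race concrete: two threads increment a shared counter one million times each; remove the lock/unlock pair and the final count usually falls short of two million because the read-modify-write sequences interleave. Build with gcc race.c -pthread (file name assumed).

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 1000000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);    /* protects the read-modify-write */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected %d)\n", counter, 2 * ITERS);
        return 0;
    }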

Chapter 5 Using POSIX Threads


144. Using POSIX Threads
145. Creating Threads (see the sketch after this list)
146. Compiling Multithreaded Code
147. Process Termination
148. Sharing Data Between Threads
149. Variables and Memory
150. Multiprocess Programming
151. Sockets
152. Reentrant Code and Compiler Flags
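
Topics 145 and 148 cover creating POSIX threads and sharing data between them. Here is a minimal sketch (not from the book): each thread receives its own argument struct, computes a partial sum with no shared writes, and the main thread combines the results after joining, so no lock is needed. The usual build line is gcc threads.c -pthread (file name assumed).

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    typedef struct {
        int id;          /* which thread this is               */
        long partial;    /* result written back by the thread  */
    } work_t;

    static void *worker(void *arg) {
        work_t *w = (work_t *)arg;          /* each thread gets its own struct */
        w->partial = 0;
        for (int i = w->id; i < 1000; i += NTHREADS)
            w->partial += i;                /* private accumulation, no sharing */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        work_t work[NTHREADS];
        long total = 0;

        for (int i = 0; i < NTHREADS; i++) {
            work[i].id = i;
            pthread_create(&tid[i], NULL, worker, &work[i]);
        }
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(tid[i], NULL);     /* wait, then combine the results */
            total += work[i].partial;
        }
        printf("sum 0..999 = %ld\n", total);   /* 499500 */
        return 0;
    }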

Chapter 6 Windows Threading


153. Windows Threading
154. Creating Native Windows Threads (see the sketch after this list)
155. Terminating Threads
156. Creating and Resuming Suspended Threads
157. Using Handles to Kernel Resources
158. Methods of Synchronization and Resource Sharing
159. An Example of Requiring Synchronization Between Threads
160. Protecting Access to Code with Critical Sections
161. Protecting Regions of Code with Mutexes
162. Slim Reader/Writer Locks
163. Signaling Event Completion to Other Threads or Processes
164. Wide String Handling in Windows
165. Creating Processes
166. Sharing Memory Between Processes
167. Inheriting Handles in Child Processes
168. Naming Mutexes and Sharing Them Between Processes
169. Communicating with Pipes
170. Communicating Using Sockets
171. Atomic Updates of Variables
172. Allocating Thread-Local Storage
173. Setting Thread Priority
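
Topic 154 introduces native Windows thread creation. The sketch below (not from the book) creates one thread with CreateThread, waits for it with WaitForSingleObject, and releases the kernel handle with CloseHandle. Production code that makes heavy use of the C runtime often prefers _beginthreadex; this minimal example sticks to the raw Win32 call.

    #include <windows.h>
    #include <stdio.h>

    static DWORD WINAPI thread_body(LPVOID param) {
        int id = *(int *)param;              /* argument passed through CreateThread */
        printf("hello from thread %d\n", id);
        return 0;
    }

    int main(void) {
        int id = 1;
        DWORD thread_id;
        HANDLE h = CreateThread(NULL,        /* default security attributes  */
                                0,           /* default stack size           */
                                thread_body, /* thread entry point           */
                                &id,         /* argument for the entry point */
                                0,           /* run immediately              */
                                &thread_id);
        if (h == NULL)
            return 1;
        WaitForSingleObject(h, INFINITE);    /* the Win32 equivalent of a join */
        CloseHandle(h);                      /* release the kernel handle      */
        return 0;
    }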

Chapter 7 Using Automatic Parallelization and OpenMP
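
This chapter covers compiler auto-parallelization and OpenMP. As a minimal OpenMP sketch (not taken from either book), the loop below performs the same trapezoidal-rule integration as the MPI example earlier, using a parallel for directive with a reduction clause; gcc builds it with the -fopenmp flag (file name trap_omp.c assumed).

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        const int n = 1000000;
        const double a = 0.0, b = 1.0, h = (b - a) / n;
        double sum = (a * a + b * b) / 2.0;   /* endpoint terms of the trapezoidal rule */

        /* Each thread accumulates a private copy of 'sum'; the reduction clause
           combines the copies when the loop ends. */
        #pragma omp parallel for reduction(+: sum)
        for (int i = 1; i < n; i++) {
            double x = a + i * h;
            sum += x * x;                     /* integrating f(x) = x^2 */
        }
        printf("integral of x^2 on [0,1] ~= %f\n", sum * h);   /* ~0.333333 */
        return 0;
    }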


174. Using Automatic Parallelization and OpenMP - Answer (click here)
175. Using Automatic Parallelization to Produce a Parallel Application - Answer (click here)
176. Identifying and Parallelizing Reductions - Answer (click here)
177. Automatic Parallelization of Codes Containing Calls - Answer (click here)
178. Assisting Compiler in Automatically Parallelizing Code - Answer (click here)
179. Using OpenMP to Produce a Parallel Application - Answer (click here)
180. Using OpenMP to Parallelize Loops - Answer (click here)
181. Runtime Behavior of an OpenMP Application - Answer (click here)
182. Variable Scoping Inside OpenMP Parallel Regions - Answer (click here)
183. Parallelizing Reductions Using OpenMP - Answer (click here)
184. Accessing Private Data Outside the Parallel Region - Answer (click here)
185. Improving Work Distribution Using Scheduling - Answer (click here)
186. Using Parallel Sections to Perform Independent Work - Answer (click here)
187. Nested Parallelism - Answer (click here)
188. Using OpenMP for Dynamically Defined Parallel Tasks - Answer (click here)
189. Keeping Data Private to Threads - Answer (click here)
190. Controlling the OpenMP Runtime Environment - Answer (click here)
191. Waiting for Work to Complete - Answer (click here)
192. Restricting the Threads That Execute a Region of Code - Answer (click here)
193. Ensuring That Code in a Parallel Region Is Executed in Order - Answer (click here)
194. Collapsing Loops to Improve Workload Balance - Answer (click here)
195. Enforcing Memory Consistency - Answer (click here)
196. An Example of Parallelization - Answer (click here)

Chapter 8 Hand-Coded Synchronization and Sharing


197. Hand-Coded Synchronization and Sharing - Answer (click here)
198. Atomic Operations - Answer (click here)
199. Using Compare and Swap Instructions to Form More Complex Atomic Operations - Answer (click here)
200. Enforcing Memory Ordering to Ensure Correct Operation - Answer (click here)
201. Compiler Support of Memory-Ordering Directives - Answer (click here)
202. Reordering of Operations by the Compiler - Answer (click here)
203. Volatile Variables - Answer (click here)
204. Operating System–Provided Atomics - Answer (click here)
205. Lockless Algorithms - Answer (click here)
206. Dekker’s Algorithm - Answer (click here)
207. Producer-Consumer with a Circular Buffer - Answer (click here)
208. Scaling to Multiple Consumers or Producers - Answer (click here)
209. Scaling the Producer-Consumer to Multiple Threads - Answer (click here)
210. Modifying the Producer-Consumer Code to Use Atomics - Answer (click here)
211. The ABA Problem - Answer (click here)

Chapter 9 Scaling with Multicore Processors


212. Scaling with Multicore Processors - Answer (click here)
213. Constraints to Application Scaling - Answer (click here)
214. Hardware Constraints to Scaling - Answer (click here)
215. Bandwidth Sharing Between Cores - Answer (click here)
216. False Sharing - Answer (click here)
217. Cache Conflict and Capacity - Answer (click here)
218. Pipeline Resource Starvation - Answer (click here)
219. Operating System Constraints to Scaling - Answer (click here)
220. Multicore Processors and Scaling - Answer (click here)

Chapter 10 Other Parallelization Technologies


221. Other Parallelization Technologies - Answer (click here)
222. GPU-Based Computing - Answer (click here)
223. Language Extensions - Answer (click here)
224. Alternative Languages - Answer (click here)
225. Clustering Technologies - Answer (click here)
226. Transactional Memory - Answer (click here)
227. Vectorization - Answer (click here)
