Question 1: MCQs
1.1 The term shared memory, applied to both SMP and DSM, refers to the fact that the address
space is
1.2 Server clusters that grow to tens of thousands of servers and beyond are called
1.3 To achieve a speedup of 80 with 100 processors, what fraction of the original computation
can be sequential?
1.7 A two-way set-associative cache with 64-byte blocks and a single clock-cycle hit time is a
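For question 1.3, the answer follows from Amdahl's law. A minimal sketch (not the official answer key; the function name is illustrative) that solves for the sequential fraction:

```python
# Worked check for question 1.3 using Amdahl's law:
# speedup = 1 / (s + (1 - s) / n), where s is the sequential
# fraction and n is the number of processors.

def max_sequential_fraction(target_speedup: float, processors: int) -> float:
    """Solve speedup = 1 / (s + (1 - s) / n) for s."""
    n = processors
    # 1/speedup = s + (1 - s)/n  =>  s = (n/speedup - 1) / (n - 1)
    return (n / target_speedup - 1) / (n - 1)

s = max_sequential_fraction(80, 100)
print(f"Maximum sequential fraction: {s:.4%}")  # about 0.25%
```

So at most roughly 0.25% of the original computation can be sequential to still reach a speedup of 80 on 100 processors.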
Question II
Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has
a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and
load/store instructions, meaning that each non-diverged SIMD instruction can produce 32
results every 4 cycles. Assume a kernel that has divergent branches that cause, on average,
80% of threads to be active. Assume that 70% of all SIMD instructions executed are
single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered,
assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock
rate of 1.5 GHz.
1. Compute the throughput, in GFLOP/sec, for this kernel on this GPU.
2. What is the speedup from each of the following changes?
a. Increasing the number of single-precision lanes per SIMD processor to 16
b. Increasing the number of SIMD processors to 15 (assume this change doesn't affect any other
performance metrics and that the code scales to the additional processors)
c. Adding a cache that will effectively reduce memory latency by 40%, which will increase
the instruction issue rate to 0.95
Solution:
1. Throughput = clock rate × fraction of active threads × instruction issue rate ×
single-precision instruction fraction × number of SIMD processors × number of results per
SIMD instruction per cycle
= 1.5 GHz × 0.8 × 0.85 × 0.7 × 10 × 32/4 = 57.12 GFLOP/sec
2. a. Simply replace 32/4 by 64/4; the new throughput is 1.5 GHz × 0.8 × 0.85 × 0.7 ×
10 × 64/4 = 114.24 GFLOP/sec, speedup = 2
b. Replace 10 with 15: 1.5 GHz × 0.8 × 0.85 × 0.7 × 15 × 32/4 = 85.68 GFLOP/sec, speedup
= 1.5
c. Replace 0.85 with 0.95: 1.5 GHz × 0.8 × 0.95 × 0.7 × 10 × 32/4 = 63.84 GFLOP/sec,
speedup = 1.12
Question III: With reference to the figure below, write a detailed analysis.
The figure shows the importance of MIMD versus SIMD. It plots the number of cores for MIMD
against the number of 32-bit and 64-bit operations per clock cycle in SIMD mode for x86
computers over time. For x86 computers we note that two cores per chip are added every two
years, while the SIMD width doubles every four years. Based on these observations and the
subsequent assumptions, we may infer that the potential speedup from SIMD parallelism grows
to twice that from MIMD parallelism. It is therefore essential to understand both forms of
parallelism (SIMD and MIMD) in the same frame, even though MIMD has historically received
more attention than SIMD. For applications that have both thread-level and data-level
parallelism, the potential speedup in the year 2020 will be much higher than it is today.

There are several advantages and disadvantages to both forms of parallelism. For SIMD
parallelism, despite its broad usage, there are certain drawbacks: only a few applications
expose enough data-level parallelism to be exploited by SIMD; a programmer working on
SIMD-based systems must group the data carefully and needs a deep understanding of SIMD,
i.e., systems that utilize SIMD are not trivial to program; SIMD needs larger register files,
which consume more power and area; and SIMD instructions often require complicated
operations. MIMD parallelism, on the other hand, has certain disadvantages too. For instance,
scaling MIMD beyond thirty-two processors is very complicated and difficult to handle, and
the shared-memory model used in MIMD parallelism is much less scalable and flexible than the