
Advanced Computer Architecture

Question I: MCQs

1.1 The term shared memory, used with both SMP and DSM, refers to the fact that the address space is

Answer: Option D (shared memory)

1.2 Clusters that grow to tens of thousands of servers and beyond are called

Answer: Option A (Warehouse-Scale Computer)

1.3 To achieve a speedup of 80 with 100 processors, what fraction of the original computation can be sequential? (An Amdahl's Law check is sketched after this question list.)

Answer: Option B (0.995)

1.4 Multiple applications running independently are typically called

Answer: Option B (Multiprogramming)

1.5 Symmetric multiprocessor architectures are sometimes known as

Answer: Option A (uniform memory access (UMA) multiprocessors)

1.6 Misses that arise from inter-processor communication are often called

Answer: Option A (coherence misses)

1.7 A two-way set-associative cache with a 64-byte block and a single clock-cycle hit time is a

Answer: Option A (Level 1 instruction cache)


1.8 A tightly coupled set of threads executing together on a single task is called

Answer: Option C (parallel processing)
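For reference, question 1.3 is an application of Amdahl's Law. A minimal Python sketch of the check, assuming the standard formulation speedup = 1 / ((1 - f) + f/n), where f is the parallel fraction and n the processor count, is:

# Amdahl's Law: solve for the parallel fraction f given the target speedup
# and processor count (values taken from question 1.3; formulation assumed).
speedup, n = 80, 100
# speedup = 1 / ((1 - f) + f / n)  =>  f = (1 - 1/speedup) / (1 - 1/n)
f_parallel = (1 - 1 / speedup) / (1 - 1 / n)
print(f"parallel fraction:   {f_parallel:.4f}")      # ~0.9975
print(f"sequential fraction: {1 - f_parallel:.4%}")  # ~0.25%

Under this formulation the parallel fraction works out to roughly 0.9975, i.e., at most about 0.25% of the computation can be sequential.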

Question II

Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel that has divergent branches that cause, on average, 80% of the threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.

1. Compute the throughput, in GFLOP/sec, for this kernel on this GPU.

2. Assume that you have the following choices:

a. Increasing the number of single-precision lanes to 16.

b. Increasing the number of SIMD processors to 15 (assume this change doesn't affect any other performance metrics and that the code scales to the additional processors).

c. Adding a cache that will effectively reduce memory latency by 40%, which will increase the instruction issue rate to 0.95.

What is the speedup in throughput for each of these improvements?

Solution:
1. Throughput = clock speed × fraction of active threads × instruction issue rate × single-precision instruction fraction × number of SIMD processors × results per cycle
= 1.5 GHz × 0.8 × 0.85 × 0.7 × 10 × 32/4 = 57.12 GFLOP/sec

2. a. Replace 32/4 with 64/4: 1.5 GHz × 0.8 × 0.85 × 0.7 × 10 × 64/4 = 114.24 GFLOP/sec, speedup = 2

b. Replace 10 with 15: 1.5 GHz × 0.8 × 0.85 × 0.7 × 15 × 32/4 = 85.68 GFLOP/sec, speedup = 1.5

c. Replace 0.85 with 0.95: 1.5 GHz × 0.8 × 0.95 × 0.7 × 10 × 32/4 = 63.84 GFLOP/sec, speedup ≈ 1.12
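The arithmetic above can be reproduced with a short Python sketch; the variable names below are illustrative, and the factors follow the formula from part 1:

# GPU throughput estimate for the kernel in Question II (values from the problem).
clock_ghz   = 1.5        # clock speed in GHz
active_frac = 0.80       # average fraction of active threads (branch divergence)
issue_rate  = 0.85       # average SIMD instruction issue rate
sp_frac     = 0.70       # fraction of SIMD instructions that are single-precision FP
simd_procs  = 10         # number of SIMD processors
results_per_cycle = 32 / 4   # 32 results every 4 cycles per SIMD processor

def throughput(clock=clock_ghz, active=active_frac, issue=issue_rate,
               sp=sp_frac, procs=simd_procs, rpc=results_per_cycle):
    # GFLOP/sec, following the formula in part 1 of the solution
    return clock * active * issue * sp * procs * rpc

base = throughput()                  # 57.12 GFLOP/sec
a = throughput(rpc=64 / 4)           # 16 single-precision lanes -> 114.24, speedup 2
b = throughput(procs=15)             # 15 SIMD processors        -> 85.68,  speedup 1.5
c = throughput(issue=0.95)           # issue rate 0.95           -> 63.84,  speedup ~1.12
print(base, a / base, b / base, c / base)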

Question III: With reference to the figure below, write a detailed description of each approach, including its pros and cons.


Solution:

The figure illustrates the relative importance of MIMD versus SIMD parallelism. It plots, over time, the number of cores for MIMD against the number of 32-bit and 64-bit operations per clock cycle in SIMD mode for x86 computers. For x86, roughly two additional cores are added per chip every two years, while the SIMD width doubles about every four years. Based on these observations and the underlying assumptions, the potential speedup from SIMD parallelism is roughly twice that from MIMD parallelism, so it is useful to understand both forms of parallelism together, even though MIMD has historically received more attention than SIMD. For applications that exhibit both thread-level and data-level parallelism, the potential speedup by 2020 will be much higher than it is today.

Both forms of parallelism have advantages and disadvantages. For SIMD, despite its broad usage, only a few applications contain enough data-level parallelism to be exploited. A programmer working on a SIMD-based system must carefully group the data and needs a deep understanding of SIMD; that is, such systems are not trivial to program. SIMD also requires larger register files, which consume additional power and area, and SIMD instructions often require complicated operations.

MIMD parallelism has certain disadvantages as well. Scaling beyond thirty-two processors is complicated and difficult to handle, and the shared-memory model commonly used with MIMD, while more flexible than a rigid distributed-memory model, scales less well.
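As a rough illustration of the growth assumptions described above (two additional MIMD cores per chip every two years, SIMD width doubling every four years), the following Python sketch projects the potential speedups over a decade; the starting core count, SIMD width, and base year are assumptions for the sketch, not values taken from the figure:

# Illustrative projection of the growth assumptions stated above:
# two additional MIMD cores per chip every two years, and SIMD width
# doubling every four years. Starting values are assumptions for this sketch.
base_year, end_year = 2011, 2021
mimd_cores = 2      # assumed starting core count per chip
simd_width = 4      # assumed starting SIMD operations per clock

for year in range(base_year, end_year + 1):
    elapsed = year - base_year
    cores = mimd_cores + 2 * (elapsed // 2)    # +2 cores every two years
    width = simd_width * 2 ** (elapsed // 4)   # width doubles every four years
    print(f"{year}: MIMD x{cores}, SIMD x{width}, combined x{cores * width}")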
