
Advanced Computer Architecture

Question I: MCQs

1.1 The term shared memory, used with both SMP and DSM, refers to the fact that the address space is

Answer: Option D (shared memory)

1.2 Clusters that grow to tens of thousands of servers and beyond are called

Answer: Option A (Warehouse-Scale Computer)

1.3 To achieve a speedup of 80 with 100 processors, what fraction of the original computation can be sequential? (An Amdahl's Law check is sketched after this question list.)

Answer: Option B (0.995)

1.4 Multiple applications running independently are typically called

Answer: Option B (Multiprogramming)

1.5 Symmetric multiprocessor architectures are sometimes known as

Answer: Option A (uniform memory access (UMA) multiprocessors)

1.6 Misses that arise from inter-processor communication are often called

Answer: Option A (coherence misses)

1.7 A two-way set-associative cache with a 64-byte block and a single clock-cycle hit time is a

Answer: Option A (Level 1 instruction cache)


1.8 A tightly coupled set of threads executing together on a single task is called

Answer: Option C (parallel processing)
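For reference, question 1.3 is an application of Amdahl's Law. A minimal Python sketch of the check, assuming the standard formulation speedup = 1 / ((1 - f) + f/n), where f is the parallel fraction and n the processor count, is:

# Amdahl's Law: solve for the parallel fraction f given the target speedup
# and processor count (values taken from question 1.3; formulation assumed).
speedup, n = 80, 100
# speedup = 1 / ((1 - f) + f / n)  =>  f = (1 - 1/speedup) / (1 - 1/n)
f_parallel = (1 - 1 / speedup) / (1 - 1 / n)
print(f"parallel fraction:   {f_parallel:.4f}")      # ~0.9975
print(f"sequential fraction: {1 - f_parallel:.4%}")  # ~0.25%

Under this formulation the parallel fraction works out to roughly 0.9975, i.e., at most about 0.25% of the computation can be sequential.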

Question II

Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel that has divergent branches that cause, on average, 80% of the threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.

1. Compute the throughput, in GFLOP/sec, for this kernel on this GPU.

2. Assume that you have the following choices:

a. Increasing the number of single-precision lanes to 16.

b. Increasing the number of SIMD processors to 15 (assume this change doesn't affect any other performance metrics and that the code scales to the additional processors).

c. Adding a cache that will effectively reduce memory latency by 40%, which will increase the instruction issue rate to 0.95.

What is the speedup in throughput for each of these improvements?

Solution:
1. Throughput = clock speed × fraction of active threads × instruction issue rate × single-precision instruction fraction × number of SIMD processors × results per cycle
= 1.5 GHz × 0.8 × 0.85 × 0.7 × 10 × 32/4 = 57.12 GFLOP/sec

2. a. Replace 32/4 with 64/4: 1.5 GHz × 0.8 × 0.85 × 0.7 × 10 × 64/4 = 114.24 GFLOP/sec, speedup = 2

b. Replace 10 with 15: 1.5 GHz × 0.8 × 0.85 × 0.7 × 15 × 32/4 = 85.68 GFLOP/sec, speedup = 1.5

c. Replace 0.85 with 0.95: 1.5 GHz × 0.8 × 0.95 × 0.7 × 10 × 32/4 = 63.84 GFLOP/sec, speedup ≈ 1.12
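The arithmetic above can be reproduced with a short Python sketch; the variable names below are illustrative, and the factors follow the formula from part 1:

# GPU throughput estimate for the kernel in Question II (values from the problem).
clock_ghz   = 1.5        # clock speed in GHz
active_frac = 0.80       # average fraction of active threads (branch divergence)
issue_rate  = 0.85       # average SIMD instruction issue rate
sp_frac     = 0.70       # fraction of SIMD instructions that are single-precision FP
simd_procs  = 10         # number of SIMD processors
results_per_cycle = 32 / 4   # 32 results every 4 cycles per SIMD processor

def throughput(clock=clock_ghz, active=active_frac, issue=issue_rate,
               sp=sp_frac, procs=simd_procs, rpc=results_per_cycle):
    # GFLOP/sec, following the formula in part 1 of the solution
    return clock * active * issue * sp * procs * rpc

base = throughput()                  # 57.12 GFLOP/sec
a = throughput(rpc=64 / 4)           # 16 single-precision lanes -> 114.24, speedup 2
b = throughput(procs=15)             # 15 SIMD processors        -> 85.68,  speedup 1.5
c = throughput(issue=0.95)           # issue rate 0.95           -> 63.84,  speedup ~1.12
print(base, a / base, b / base, c / base)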

Question III: With reference to the figure below, write a detailed description of each approach, including its pros and cons.


Solution:

The figure illustrates the relative importance of MIMD versus SIMD parallelism. It plots, over time, the number of cores for MIMD against the number of 32-bit and 64-bit operations per clock cycle in SIMD mode for x86 computers. For x86, roughly two additional cores are added per chip every two years, while the SIMD width doubles about every four years. Based on these observations and the underlying assumptions, the potential speedup from SIMD parallelism is roughly twice that from MIMD parallelism, so it is useful to understand both forms of parallelism together, even though MIMD has historically received more attention than SIMD. For applications that exhibit both thread-level and data-level parallelism, the potential speedup by 2020 will be much higher than it is today.

Both forms of parallelism have advantages and disadvantages. For SIMD, despite its broad usage, only a few applications contain enough data-level parallelism to be exploited. A programmer working on a SIMD-based system must carefully group the data and needs a deep understanding of SIMD; that is, such systems are not trivial to program. SIMD also requires larger register files, which consume additional power and area, and SIMD instructions often require complicated operations.

MIMD parallelism has certain disadvantages as well. Scaling beyond thirty-two processors is complicated and difficult to handle, and the shared-memory model commonly used with MIMD, while more flexible than a rigid distributed-memory model, scales less well.
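As a rough illustration of the growth assumptions described above (two additional MIMD cores per chip every two years, SIMD width doubling every four years), the following Python sketch projects the potential speedups over a decade; the starting core count, SIMD width, and base year are assumptions for the sketch, not values taken from the figure:

# Illustrative projection of the growth assumptions stated above:
# two additional MIMD cores per chip every two years, and SIMD width
# doubling every four years. Starting values are assumptions for this sketch.
base_year, end_year = 2011, 2021
mimd_cores = 2      # assumed starting core count per chip
simd_width = 4      # assumed starting SIMD operations per clock

for year in range(base_year, end_year + 1):
    elapsed = year - base_year
    cores = mimd_cores + 2 * (elapsed // 2)    # +2 cores every two years
    width = simd_width * 2 ** (elapsed // 4)   # width doubles every four years
    print(f"{year}: MIMD x{cores}, SIMD x{width}, combined x{cores * width}")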
