
UNIVERSITI SAINS MALAYSIA

First Semester Examination Academic Session 2010/2011 November 2010

CCS524 Parallel Computing Architectures and Algorithms


[Seni Bina dan Algoritma Perkomputeran Selari]
Duration : 2 hours
[Masa : 2 jam]

INSTRUCTIONS TO CANDIDATE:
[ARAHAN KEPADA CALON:]

Please ensure that this examination paper contains THREE questions in SEVEN printed pages before you begin the examination.
[Sila pastikan bahawa kertas peperiksaan ini mengandungi TIGA soalan di dalam TUJUH muka surat yang bercetak sebelum anda memulakan peperiksaan ini.]

Answer ALL questions.


[Jawab SEMUA soalan.]

You may answer the questions either in English or in Bahasa Malaysia.


[Anda dibenarkan menjawab soalan sama ada dalam bahasa Inggeris atau bahasa Malaysia.]

In the event of any discrepancies, the English version shall be used.


[Sekiranya terdapat sebarang percanggahan pada soalan peperiksaan, versi bahasa Inggeris hendaklah diguna pakai.]


[CCS524]

1. (a) (i) What is a thread? (2/100)

A thread is a sequence of instructions that can be executed in parallel with other threads. It is a lightweight process and a basic unit of CPU utilization.
(ii) What are the four (4) advantages of threads and justify your answers? (8/100)

1.) Software portability: threaded applications can be developed on serial machines and run on parallel machines without change.
2.) Latency hiding: memory, I/O, and communication latency are a major overhead. When one thread is blocked, another can proceed; while one thread is waiting on communication, another can operate.
3.) Scheduling and load balancing: threaded APIs allow the programmer to specify a large number of concurrent tasks, which minimizes idling overhead. The programmer is freed from explicit scheduling and load balancing, which are done by the OS.
4.) Ease of programming: threaded programs are easier to write than message passing programs using tools like MPI and PVM, although some efficiency may be lost. POSIX threads have become a standard and are widely used.
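The latency-hiding point can be illustrated with a minimal Python sketch. It is not part of the model answer: the function names `fetch` and `run_threads` are hypothetical, and `time.sleep` merely stands in for a blocking I/O or communication call. Because every thread blocks at the same time, the total elapsed time should stay close to one delay rather than the sum of all delays.

```python
import threading
import time


def fetch(task_id, delay, results):
    """Simulate a blocking operation (I/O or communication) with sleep."""
    time.sleep(delay)  # this thread is blocked here; other threads can proceed
    results[task_id] = task_id * task_id


def run_threads(num_tasks=4, delay=0.2):
    """Start one thread per task and return (results, elapsed seconds)."""
    results = {}
    threads = [
        threading.Thread(target=fetch, args=(i, delay, results))
        for i in range(num_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_threads()
    print(results)
    print(f"elapsed: {elapsed:.2f} s (running the tasks one by one needs about 0.80 s)")
```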

(b) Elaborate on the computer architecture classifications based on Flynn's Taxonomy. Provide an example for each classification. (8/100)

- SISD: Single Instruction, Single Data. Example: von Neumann model.
- MISD: Multiple Instruction, Single Data. Example: signal processing.
- SIMD: Single Instruction, Multiple Data. Example: GPGPU.
- MIMD: Multiple Instruction, Multiple Data. Example: multicore processor.

(c) OpenMP directives are based on #pragma compiler directives and can be written as:

    #pragma omp directive [clause list]

(i) List four (4) OpenMP directives. (4/100)

- parallel
- section
- for
- barrier

(ii) Elaborate on how and when to use the listed directives. (4/100)

- parallel: creates a group of threads.
- section: non-iterative task assignment; each section block (inside a sections construct) is executed by one thread.
- for: splits the iteration space of a loop across threads.
- barrier: synchronization; each thread waits until all threads in the group have reached the barrier.

(iii) Describe the use and function of the directives. (4/100)

OpenMP directives provide support for concurrency, synchronization, and data handling while obviating the need for explicitly setting up mutexes, condition variables, data scope, and initialization.

2. (a) (i) Describe the architecture on which message passing library can be used to develop parallel programs. (2/100)

Message passing architecture: memory is physically distributed among processors, and each local memory is directly accessible only by its own processor.

(ii) How is data shared among processors in MPI? (2/100)

Data is shared among processors in MPI by explicitly sending and receiving messages using message passing routines.

(iii) Differentiate between blocking and non-blocking communications. (4/100)

o Synchronous/blocking message passing: routines that return only when the message transfer has been completed.
o Non-blocking/asynchronous message passing: routines that return immediately, regardless of whether the message has been transferred; completion has to be checked later.

(iv) The data message which is sent or received in MPI is described by a triplet (address, count, data type). Why is there a need to define the data type when sending a message? (2/100)

o Specifying an application-oriented layout of data in memory:
  - reduces memory-to-memory copies in the implementation
  - allows the use of special hardware (scatter/gather) when available


(b) Given the following results for the execution of a parallel program, answer the following questions:

Number of Processors    Execution Time (Sec)
1                       99.230769
2                       60.274725
3                       49.285714
4                       43.681319
5                       40.714286
6                       41.978022

(i) Calculate the speedup for the cases when the number of processors are 2, 3, 4, 5 and 6. (5/100)

Number of Processors    Execution Time (Sec)    Speedup
2                       60.274725               1.646308
3                       49.285714               2.013378
4                       43.681319               2.271698
5                       40.714286               2.437247
6                       41.978022               2.363874
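The speedup figures for 2(b)(i) follow from S(p) = T(1)/T(p). A short Python sketch that reproduces them from the execution times in the question is given here; the names `TIMES`, `speedups` and `efficiencies` are illustrative, and the efficiency S(p)/p is an extra quantity that the question does not ask for.

```python
# Execution times from the question, keyed by number of processors.
TIMES = {
    1: 99.230769,
    2: 60.274725,
    3: 49.285714,
    4: 43.681319,
    5: 40.714286,
    6: 41.978022,
}


def speedups(times):
    """Return {p: T(1) / T(p)} for every processor count p other than 1."""
    t1 = times[1]
    return {p: t1 / t for p, t in times.items() if p != 1}


def efficiencies(times):
    """Return {p: S(p) / p}; not asked for in the question, shown for context."""
    return {p: s / p for p, s in speedups(times).items()}


if __name__ == "__main__":
    eff = efficiencies(TIMES)
    for p, s in speedups(TIMES).items():
        print(f"p={p}  speedup={s:.6f}  efficiency={eff[p]:.3f}")
```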

(ii) What can you deduce from the speedup results found in 2(b)(i)? (4/100)

From 2 to 5 processors, the speedup improves as the number of processors increases, with the largest single step between 2 and 3 processors (1.65 to 2.01). More processors are available to execute more tasks, which results in better performance. With 6 processors the speedup drops (2.44 to 2.36). One possible cause is that the number of threads/processes exceeds the number of logical processors, which results in excess computation and overhead.

(iii) Will adding more processors help provide better performance? Explain your answer. (3/100)

Adding more processors will not help provide better performance, because the communication overhead between processors degrades the speedup. The drop in speedup from 5 to 6 processors already shows this.

(c) You and a friend have decided to test your programming and algorithmic skills by writing a parallel program to solve a given problem. Each of you wrote separate programs and compared the speedups for 10 processors. Your program has a speedup of 5.5, while your friend obtained a nicer speedup figure of 6.5. When both of you started to discuss the different approaches you both took as well as the details of the experiment, you realize both you and your friend computed the speedup as T(1)/T(10), and used your own programs to generate T(1). You also find out that the execution times for your friend's program were 65 seconds and 10 seconds for T(1) and T(10), respectively. Your execution times were 50 and 9.09 seconds for T(1) and T(10), respectively.

(i) Your friend's speedup figure of 6.5 is not an accurate measure for comparison. Why? (4/100)

For the purpose of computing speedup, we always consider the best sequential program as the baseline. Your friend's T(1) of 65 seconds is slower than your T(1) of 50 seconds, so the slower baseline inflates the speedup figure.

(ii) What should be done to ensure an accurate comparison? (4/100)

Both speedups should be computed with respect to the best serial algorithm. Of the two programs, the best sequential time is 50 seconds, so your speedup remains 50/9.09, about 5.5, while your friend's becomes 50/10 = 5.0.
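The comparison in 2(c) can be checked with a few lines of Python. This is a sketch under one assumption: the faster of the two measured T(1) values (50 seconds) is taken as the best sequential baseline, although a better sequential algorithm could exist. The names `speedup` and `compare` are illustrative.

```python
# Execution times (seconds) from the question.
YOUR_T1, YOUR_T10 = 50.0, 9.09
FRIEND_T1, FRIEND_T10 = 65.0, 10.0


def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to a given serial time."""
    return t_serial / t_parallel


def compare():
    """Return speedups against each program's own T(1) and against the best T(1)."""
    # Assumption: the best sequential time is the faster of the two measured ones.
    best_serial = min(YOUR_T1, FRIEND_T1)
    return {
        "you_own_baseline": speedup(YOUR_T1, YOUR_T10),
        "friend_own_baseline": speedup(FRIEND_T1, FRIEND_T10),
        "you_best_baseline": speedup(best_serial, YOUR_T10),
        "friend_best_baseline": speedup(best_serial, FRIEND_T10),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.2f}")
```

Against the common baseline the ranking reverses, which matches the raw T(10) values: 9.09 seconds is faster than 10 seconds.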


3. Given a list of integers {1,2,3, ...,n}. Design a pseudocode in OpenMP and MPI.

(a) Consider the following matters for pseudocode in OpenMP:

(i) What are the design choices considered? (3/100)

o Single Program Multiple Data (SPMD)
o Embarrassingly Parallel
o Master / Slave
o Work Pool
o Divide and Conquer
o Pipeline
o Competition

(ii) Describe the directives used in the pseudocode. (10/100)

- parallel: creates a group of threads.
- for: splits the iteration space of a loop across threads.
- barrier: synchronization.

(iii) What synchronization construct is used in the design? (7/100)

#pragma omp barrier
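The role of the barrier can be sketched in Python, with `threading.Barrier` standing in for `#pragma omp barrier`. This is an analogy rather than OpenMP code. The question does not state what is to be computed on the list, so summing {1, ..., n} is an assumption, and the name `parallel_sum` is hypothetical.

```python
import threading


def parallel_sum(n, num_threads=4):
    """Sum 1..n with num_threads threads; a barrier separates the two phases."""
    partial = [0] * num_threads          # one slot per thread, no sharing conflicts
    total = [0]
    barrier = threading.Barrier(num_threads)

    def work(tid):
        # Phase 1: cyclic split of the iteration space 1..n across the threads.
        partial[tid] = sum(range(tid + 1, n + 1, num_threads))
        barrier.wait()                   # no thread continues until all have arrived
        # Phase 2: every partial result is now complete and safe to read.
        if tid == 0:
            total[0] = sum(partial)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(parallel_sum(100))
```

Without the barrier, thread 0 could read `partial` before the other threads have written their entries.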

(b) Consider the following matters for pseudocode in MPI:

(i) Indicate whether you will use function partitioning or data partitioning for this task. (3/100)

Data partitioning: the data associated with the problem is decomposed, and each parallel task then works on a portion of the data.

(ii) Depending on your choice in 3(b)(i), illustrate how the partitioning will be done. (7/100)
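No answer is recorded for 3(b)(ii). One possible illustration, assuming a contiguous block partitioning of {1, ..., n} in which any remainder is spread over the first workers, is sketched below in Python. The helper names `block_bounds` and `partition` are hypothetical.

```python
def block_bounds(n, num_workers, rank):
    """Return the inclusive, 1-based (first, last) indices owned by worker `rank`."""
    base, extra = divmod(n, num_workers)
    # The first `extra` workers take one additional element each.
    size = base + (1 if rank < extra else 0)
    first = rank * base + min(rank, extra) + 1
    return first, first + size - 1


def partition(n, num_workers):
    """Return the (first, last) block of every worker, in rank order."""
    return [block_bounds(n, num_workers, r) for r in range(num_workers)]


if __name__ == "__main__":
    for rank, (first, last) in enumerate(partition(10, 3)):
        print(f"worker {rank}: elements {first}..{last}")
```

With n = 10 and 3 workers, the blocks are 1..4, 5..7 and 8..10, so no worker holds more than one element above any other.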


(iii) Using a master-slave programming model, write the pseudocode (no need to be concerned with MPI syntax, you can use generic send/receive commands) to solve this problem based on your choice of approach in 3(b)(i). (10/100)

    find out if I am MASTER or WORKER

    if I am MASTER
        initialize the array
        send each WORKER info on part of array it owns
        send each WORKER its portion of initial array
        receive from each WORKER results

    else if I am WORKER
        receive from MASTER info on part of array I own
        receive from MASTER my portion of initial array

        # calculate my portion of array
        do j = my first column, my last column
            do i = 1, n
                a(i,j) = fcn(i,j)
            end do
        end do

        send MASTER results
    endif
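The master-worker pattern above can be simulated in a single Python process. This is only a sketch: threads stand in for MPI processes, `queue.Queue` objects stand in for generic send/receive, and the work done on each portion (summing it) is an assumption, since the question does not name the operation. The function names `master` and `worker` are illustrative.

```python
import queue
import threading


def worker(inbox, outbox):
    """WORKER: receive a portion of the array, compute on it, send the result."""
    rank, chunk = inbox.get()             # receive from MASTER
    outbox.put((rank, sum(chunk)))        # send MASTER results


def master(n, num_workers):
    """MASTER: initialize the array, distribute portions, collect the results."""
    data = list(range(1, n + 1))          # initialize the array {1, ..., n}
    outbox = queue.Queue()                # channel from workers to master
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(inboxes[r], outbox))
        for r in range(num_workers)
    ]
    for t in threads:
        t.start()

    size = -(-n // num_workers)           # ceiling division gives the block size
    for r in range(num_workers):
        # send each WORKER its portion of the initial array
        inboxes[r].put((r, data[r * size:(r + 1) * size]))

    # receive one result from each WORKER
    partial = dict(outbox.get() for _ in range(num_workers))
    for t in threads:
        t.join()
    return sum(partial.values())


if __name__ == "__main__":
    print(master(100, 4))
```

In a real MPI program each rank would run the same executable and branch on its rank, as in the pseudocode; the queues here only mimic that message flow.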


KERTAS SOALAN DALAM VERSI BAHASA MALAYSIA


1. (a) (i) Apakah itu bebenang? (2/100)

(ii) Apakah empat (4) kebaikan bebenang dan jelaskan jawapan anda? (8/100)

(b) Jelaskan penklasifikasian reka bentuk komputeran berdasarkan Flynn's Taxonomy. Berikan contoh bagi setiap klasifikasi tersebut. (8/100)

(c) Direktif OpenMP adalah berdasarkan pengkompil #pragma yang ditulis sebagai:

    #pragma omp direktif [senarai klausa]

(i) Senaraikan empat (4) direktif OpenMP. (4/100)

(ii) Terangkan bagaimana dan bila direktif yang tersenarai digunakan. (4/100)

(iii) Jelaskan kegunaan dan fungsi direktif tersebut. (4/100)

2. (a) (i) Jelaskan seni bina yang membolehkan perpustakaan penghantaran mesej boleh digunakan untuk menghasilkan atur cara selari. (2/100)

(ii) Bagaimana data dikongsi sesama pemproses dalam MPI? (2/100)

(iii) Bezakan antara komunikasi menahan dan bukan-menahan. (4/100)

(iv) Mesej data yang dihantar atau terima dalam MPI digambarkan oleh suatu tigaan (alamat, bilangan, jenis data). Mengapakah ia perlu untuk menakrif jenis data apabila menghantar mesej? (2/100)


(b) Diberi keputusan berikut bagi pelaksanaan satu atur cara selari. Jawab soalan-soalan berikut:

Bilangan Pemproses    Masa Larian (Saat)
1                     99.230769
2                     60.274725
3                     49.285714
4                     43.681319
5                     40.714286
6                     41.978022

(i) Hitungkan speedup bagi kes-kes melibatkan bilangan pemproses 2, 3, 4, 5 dan 6. (5/100)

(ii) Apakah yang boleh dirumus daripada keputusan speedup yang diperolehi di 2(b)(i)? (4/100)

(iii) Adakah dengan menambah bilangan pemproses akan membantu dalam meningkatkan prestasi? Jelaskan jawapan anda. (3/100)

(c) Anda dan rakan anda mengambil keputusan untuk menguji kemahiran pengaturcaraan dan algoritma dengan menulis satu atur cara selari bagi menyelesaikan satu masalah. Anda dan rakan anda masing-masing telah menulis atur cara berlainan dan membandingkan speedup bagi 10 pemproses. Atur cara anda mempunyai speedup 5.5, manakala rakan anda memperolehi speedup yang lebih baik iaitu 6.5. Apabila anda berdua mula berbincang tentang perbezaan pendekatan yang telah diambil serta perincian eksperimen, anda menyedari bahawa anda berdua menghitung speedup sebagai T(1)/T(10), dan menggunakan atur cara anda untuk menjana T(1). Anda juga mendapati masa larian bagi atur cara rakan anda ialah 65 saat dan 10 saat bagi T(1) dan T(10) masing-masing. Masa larian anda ialah 50 saat dan 9.09 saat bagi T(1) dan T(10) masing-masing.

(i) Bacaan speedup rakan anda ialah 6.5 bukanlah suatu ukuran yang tepat untuk perbandingan. Mengapa? (4/100)

(ii) Apakah yang harus dilakukan untuk mempastikan perbandingan yang tepat? (4/100)


3. Diberi senarai integer {1,2,3, ...,n}. Rekakan pseudokod dalam OpenMP dan MPI.

(a) Pertimbangkan perkara-perkara berikut untuk pseudokod dalam OpenMP.

(i) Apakah pilihan-pilihan rekaan yang dipertimbangkan? (3/100)

(ii) Jelaskan direktif-direktif yang digunakan dalam pseudokod. (10/100)

(iii) Apakah binaan segerak yang digunakan dalam rekaan tersebut? (7/100)

(b) Pertimbangkan perkara-perkara berikut untuk pseudokod dalam MPI.

(i) Nyatakan sama ada anda akan menggunakan pembahagian fungsi atau pembahagian data bagi masalah ini. (3/100)

(ii) Berdasarkan pilihan anda dalam 3(b)(i), terangkan bagaimana pembahagian tersebut akan dilakukan. (7/100)

(iii) Menggunakan model pengaturcaraan master-slave, tuliskan pseudokod (tidak perlu menimbangkan nahu MPI, anda boleh menggunakan arahan hantar/terima yang generik) untuk menyelesaikan masalah ini berdasarkan pilihan pendekatan anda dalam 3(b)(i). (10/100)

- oooOooo -
