
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI)
Instituto Superior Técnico

2011-11-14


Outline

- Simple Example: Opportunities for Parallelism
- Speedup and Overheads
- Application Areas
- Parallel Systems


    x = initX(A, B);
    y = initY(A, B);
    z = initZ(A, B);

    for (i = 0; i < N_ENTRIES; i++)
        x[i] = compX(y[i], z[i]);

    for (i = 1; i < N_ENTRIES; i++) {
        x[i] = solveX(x[i-1]);
        z[i] = x[i] + y[i];
    }

    finalize1(&x, &y, &z);
    finalize2(&x, &y, &z);
    finalize3(&x, &y, &z);

No good?
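One way to read the two loops in the example: in the first loop each x[i] depends only on y[i] and z[i], so its iterations are independent, while the second loop carries a dependence from x[i-1] to x[i]. The sketch below shows this in C. It rests on assumptions: `compX` and `solveX` are hypothetical stand-ins (the slides do not define them), the helper names `first_loop` and `second_loop` are invented for illustration, and the OpenMP directive is one possible way to express the parallel loop, not something the slides prescribe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the slides do not define compX or solveX. */
static double compX(double y, double z) { return y * z; }
static double solveX(double x_prev)     { return 0.5 * x_prev + 1.0; }

/* First loop: x[i] depends only on y[i] and z[i], so the iterations can
 * run in any order. The pragma is ignored when OpenMP is not enabled. */
void first_loop(double *x, const double *y, const double *z, size_t n)
{
    assert(x && y && z);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] = compX(y[i], z[i]);
}

/* Second loop: x[i] needs x[i-1] (a loop-carried dependence), so the
 * iterations must run in order as written. */
void second_loop(double *x, const double *y, double *z, size_t n)
{
    assert(x && y && z);
    for (size_t i = 1; i < n; i++) {
        x[i] = solveX(x[i - 1]);
        z[i] = x[i] + y[i];
    }
}
```

Whether the init and finalize calls can also overlap depends on what they read and write, which the slides leave open.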


Speedup

$$S = \frac{t_{serial}}{t_{parallel}}$$

$$S = p \qquad \left(t_{parallel} = \frac{t_{serial}}{p}\right)$$

Yes!

Overheads that limit parallel speedup?

- data transfers (or more generally, communication among tasks)
- task startup / finalize
- load balancing
- inherent sequential portions of computation

$$t_{parallel} = f \, t_{serial} + \frac{(1 - f) \, t_{serial}}{p}$$

$$S(p, f) = \frac{1}{f + \frac{1 - f}{p}}$$

$$\lim_{p \to \infty} S(p, f) = \frac{1}{f}$$

[Plot: speedup curves for f = 0%, f = 5%, f = 10%, f = 20%]
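The formula above is commonly known as Amdahl's law. A minimal C sketch of it follows, reading f as the fraction of the serial run time that cannot be parallelized (which is how the formula uses it). The function names `amdahl_speedup` and `amdahl_limit` are invented for illustration.

```c
#include <assert.h>

/* Speedup predicted by the formula above for p processors when a
 * fraction f (0 <= f <= 1) of the serial time stays sequential. */
double amdahl_speedup(double p, double f)
{
    assert(p >= 1.0 && f >= 0.0 && f <= 1.0);
    return 1.0 / (f + (1.0 - f) / p);
}

/* Limit of the speedup as p grows without bound; requires f > 0. */
double amdahl_limit(double f)
{
    assert(f > 0.0 && f <= 1.0);
    return 1.0 / f;
}
```

For example, with f = 5% the limit is 20 no matter how many processors are added, and `amdahl_speedup(100.0, 0.05)` already gives only about 16.8 on 100 processors.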

- define and coordinate concurrent tasks
- low level parallel directives
- debug significantly more difficult
- lack of programming models and environments
- parallel algorithm may not be efficient for next generation of parallel computers

Application Areas

Why bother with parallel computation?

Continued demand for greater computational power from many different domains!

Two major classes of problems in parallel computation:

- Problems that cannot be solved in a reasonable amount of time with today's computers.
- Problems whose workload can be easily divided into (almost) independent tasks.

José Monteiro (DEI / IST), Parallel and Distributed Computing 2

- Global Environmental/Ecosystem Modeling
- Biomechanics and biomedical imaging
- Fluid dynamics
- Molecular nanotechnology
- Nuclear power and weapons simulations

- Numerical weather forecasting
- Computer graphics / animation
- Basic Local Alignment Search Tool (BLAST) in bioinformatics
- Monte-Carlo methods
- Genetic algorithms

Atmospheric conditions (temperature, pressure, humidity, etc.) for each cell are computed as a function of the neighboring cells' conditions in this and previous time intervals.

For the forecast of continental Portugal, take an area of $1000\,\text{km} \times 500\,\text{km} = 5 \times 10^5\,\text{km}^2$.

Assuming the atmosphere height of 50 km, there are $25 \times 10^6$ cells (one cell per km³).

If each cell takes 200 floating point operations, we require a total of $5 \times 10^9$ operations for each time interval.

If the second is the time interval, and we want to compute the forecast for tomorrow (almost $10^5$ seconds in a day), there are a total of $5 \times 10^{14}$ operations.

An Intel Pentium IV 3.2 GHz performs at 3 GFLOPS, hence taking about 40 hours...
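The arithmetic of this estimate can be sketched in C as below. The function names `forecast_flops` and `hours_needed` are invented for illustration, and the code assumes cells of 1 km³, which is what the slide's cell count implies.

```c
#include <assert.h>

/* Total floating point operations for the forecast, assuming one cell
 * per km^3 and a fixed number of operations per cell per time step. */
double forecast_flops(double area_km2, double height_km,
                      double flops_per_cell, double steps)
{
    assert(area_km2 > 0.0 && height_km > 0.0);
    assert(flops_per_cell > 0.0 && steps > 0.0);
    double cells = area_km2 * height_km;
    return cells * flops_per_cell * steps;
}

/* Hours needed to execute total_flops at a sustained rate given in
 * floating point operations per second. */
double hours_needed(double total_flops, double flops_per_second)
{
    assert(flops_per_second > 0.0);
    return total_flops / flops_per_second / 3600.0;
}
```

With the exact 86,400 one-second steps in a day, `forecast_flops(5e5, 50.0, 200.0, 86400.0)` is 4.32e14 operations and `hours_needed(4.32e14, 3e9)` is 40 hours, matching the slide. Using the rounded 10^5 steps instead gives 5e14 operations and a little over 46 hours.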

Each body has a given position, velocity, acceleration, that needs to be computed for every time interval.

Each body attracts (and/or repels) every other body. For $n$ bodies, there are a total of $n^2$ interactions that need to be accounted for.

Example: a galaxy has more than $10^{11}$ stars, leading to more than $10^{22}$ floating point operations for each time interval!
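The quadratic cost comes from the all-pairs double loop, sketched below in C. This is a deliberately simplified illustration, not a real simulator: it works in one dimension, uses units in which the gravitational constant is 1, and handles attraction only. The name `accumulate_accelerations` is invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* All-pairs accumulation in one dimension (a simplification), in units
 * where the gravitational constant is 1 (an assumption). Bodies must be
 * at distinct positions. Returns the number of ordered pair
 * interactions evaluated, n * (n - 1), which grows as n^2. */
size_t accumulate_accelerations(size_t n, const double *mass,
                                const double *pos, double *acc)
{
    assert(mass && pos && acc);
    size_t interactions = 0;
    for (size_t i = 0; i < n; i++) {
        acc[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            if (j == i)
                continue;                 /* a body does not act on itself */
            double dx = pos[j] - pos[i];
            assert(dx != 0.0);
            double sign = (dx > 0.0) ? 1.0 : -1.0;
            /* inverse-square pull of body j on body i, directed toward j */
            acc[i] += sign * mass[j] / (dx * dx);
            interactions++;
        }
    }
    return interactions;
}
```

Each outer iteration writes only its own acc[i], so the outer loop is a natural candidate for splitting among processors; the shared counter is there only to show the count and would need a reduction in a parallel version.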

Processor Evolution


Supercomputer Evolution


First Peta system available in 2009!

Estimate of the human brain's computational power: $10^{14}$ neural connections at 200 calculations per second $\Rightarrow$ 20 PFLOPS
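Spelled out, the brain estimate is a single multiplication, taking each calculation as one floating point operation (an assumption the slide makes implicitly):

$$10^{14} \ \text{connections} \times 200 \ \frac{\text{calculations}}{\text{s}} = 2 \times 10^{16} \ \text{FLOPS} = 20 \ \text{PFLOPS}$$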

Types of Supercomputers

Processor Arrays (SIMD)

Name associated with vector processing, very popular in early supercomputers.

Multicore (SMP)

Set of processors sharing a common main memory.

Massively Parallel Processors (MPP)

Processors with individual main memory with tightly coupled interconnections.

Clusters

Processors with individual main memory linked together using InfiniBand, Quadrics, Myrinet, or Gigabit Ethernet connections.

- COW / NOW: Cluster / Network Of Workstations
- Beowulf: cluster made of PCs running Linux using TCP/IP (COTS: Commodity-Off-The-Shelf)

Constellation

MPP / cluster where each node is a multicore.


Warehouse-size Computers


Multicores

Sample of today's multicore processors:

AMD

- Opteron: dual, quad, hex, 8-, 12-cores
- Phenom: dual, quad, hex cores

Intel

- Core i7: six hyperthreaded cores
- Dunnington (Xeon): six cores

Sun

- Niagara: 8 cores; 8-way fine-grain multithreading per core

IBM

- Power 7: dual, quad, hex, 8-core
- Cell: 1 PPC core; 8 SPEs w/ SIMD parallelism

Next Class

levels of parallelism

