
What is Parallel Computing?

• Traditionally, software has been written for serial computation:


o To be run on a single computer having a single Central Processing Unit
(CPU);
o A problem is broken into a discrete series of instructions.
o Instructions are executed one after another.
o Only one instruction may execute at any moment in time.

• In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem:
o To be run using multiple CPUs
o A problem is broken into discrete parts that can be solved concurrently
o Each part is further broken down to a series of instructions
o Instructions from each part execute simultaneously on different CPUs
• The compute resources can include:
o A single computer with multiple processors;
o An arbitrary number of computers connected by a network;
o A combination of both.
• The computational problem usually demonstrates characteristics such as the
ability to:
o Be broken apart into discrete pieces of work that can be solved
simultaneously;
o Execute multiple program instructions at any moment in time;
o Be solved in less time with multiple compute resources than with a single
compute resource.

Uses for Parallel Computing:

• Historically, parallel computing has been considered to be "the high end of
computing", and has been used to model difficult scientific and engineering
problems found in the real world. Some examples:
o Atmosphere, Earth, Environment
o Physics - applied, nuclear, particle, condensed matter, high pressure,
fusion, photonics
o Bioscience, Biotechnology, Genetics
o Chemistry, Molecular Sciences
o Geology, Seismology
o Mechanical Engineering - from prosthetics to spacecraft
o Electrical Engineering, Circuit Design, Microelectronics
o Computer Science, Mathematics

• Today, commercial applications provide an equal or greater driving force in the
development of faster computers. These applications require the processing of
large amounts of data in sophisticated ways. For example:
o Databases, data mining
o Oil exploration
o Web search engines, web-based business services
o Medical imaging and diagnosis
o Pharmaceutical design
o Management of national and multi-national corporations
o Financial and economic modeling
o Advanced graphics and virtual reality, particularly in the entertainment
industry
o Networked video and multi-media technologies
o Collaborative work environments

Why Use Parallel Computing?


Main Reasons:

• Save time and/or money: In theory, throwing more resources at a task will
shorten its time to completion, with potential cost savings. Parallel clusters can be
built from cheap, commodity components.

• Solve larger problems: Many problems are so large and/or complex that it is
impractical or impossible to solve them on a single computer, especially given
limited computer memory. For example:
o "Grand Challenge" (en.wikipedia.org/wiki/Grand_Challenge) problems
requiring PetaFLOPS and PetaBytes of computing resources.
o Web search engines/databases processing millions of transactions per
second

• Provide concurrency: A single compute resource can only do one thing at a time.
Multiple computing resources can be doing many things simultaneously. For
example, the Access Grid (www.accessgrid.org) provides a global collaboration
network where people from around the world can meet and conduct work
"virtually".
• Use of non-local resources: Use compute resources on a wide area network, or
even the Internet, when local compute resources are scarce. For example:
o SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a
compute power of over 528 TeraFLOPS (as of August 04, 2008)
o Folding@home (folding.stanford.edu) uses over 340,000 computers for a
compute power of 4.2 PetaFLOPS (as of November 4, 2008)

• Limits to serial computing: Both physical and practical reasons pose significant
constraints to simply building ever faster serial computers:
o Transmission speeds - the speed of a serial computer is directly dependent
upon how fast data can move through hardware. Absolute limits are the
speed of light (30 cm/nanosecond) and the transmission limit of copper
wire (9 cm/nanosecond). Increasing speeds necessitate increasing
proximity of processing elements.
o Limits to miniaturization - processor technology is allowing an increasing
number of transistors to be placed on a chip. However, even with
molecular or atomic-level components, a limit will be reached on how
small components can be.
o Economic limitations - it is increasingly expensive to make a single
processor faster. Using a larger number of moderately fast commodity
processors to achieve the same (or better) performance is less expensive.
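As a rough worked example of the transmission-speed limit, assume (for illustration only) a 3 GHz clock, so that one clock cycle lasts 1/3 ns. The distance d a signal can cover in one cycle at propagation speed v and clock frequency f, using the two speeds quoted above, is:

```latex
d = \frac{v}{f}, \qquad
d_{\text{light}} = \frac{30\ \text{cm/ns}}{3\ \text{ns}^{-1}} = 10\ \text{cm}, \qquad
d_{\text{copper}} = \frac{9\ \text{cm/ns}}{3\ \text{ns}^{-1}} = 3\ \text{cm}
```

Doubling the clock rate halves these distances, which is why faster serial processors require their processing elements to be packed ever closer together.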

Current computer architectures are increasingly relying upon hardware-level
parallelism to improve performance:

o Multiple execution units
o Pipelined instructions
o Multi-core

Concepts and Terminology


von Neumann Architecture

• Named after the Hungarian mathematician John von Neumann who first authored
the general requirements for an electronic computer in his 1945 papers.
• Since then, virtually all computers have followed this basic design, which differed
from earlier computers programmed through "hard wiring".
o Composed of four main components:
▪ Memory
▪ Control Unit
▪ Arithmetic Logic Unit
▪ Input/Output
o Read/write, random access memory is used to store both program
instructions and data
▪ Program instructions are coded data which tell the computer to do
something
▪ Data is simply information to be used by the program
o Control unit fetches instructions/data from memory, decodes the
instructions and then sequentially coordinates operations to accomplish
the programmed task.
o Arithmetic Logic Unit performs basic arithmetic and logical operations

o Input/Output is the interface to the human operator

Flynn's Classical Taxonomy

• There are different ways to classify parallel computers. One of the more widely
used classifications, in use since 1966, is called Flynn's Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer architectures according
to how they can be classified along the two independent dimensions of
Instruction and Data. Each of these dimensions can have only one of two
possible states: Single or Multiple.
• The matrix below defines the 4 possible classifications according to Flynn:
SISD                                 SIMD
Single Instruction, Single Data      Single Instruction, Multiple Data

MISD                                 MIMD
Multiple Instruction, Single Data    Multiple Instruction, Multiple Data

Single Instruction, Single Data (SISD):

• A serial (non-parallel) computer


• Single instruction: only one instruction stream is
being acted on by the CPU during any one clock
cycle
• Single data: only one data stream is being used as
input during any one clock cycle
• Deterministic execution
• This is the oldest and, even today, the most common
type of computer

• Examples: older generation mainframes,
minicomputers and workstations; most modern day
PCs.

Single Instruction, Multiple Data (SIMD):

• A type of parallel computer


• Single instruction: All processing units execute the same instruction at any given
clock cycle
• Multiple data: Each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of regularity,
such as graphics/image processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
Multiple Instruction, Single Data (MISD):

• A single data stream is fed into multiple processing units.


• Each processing unit operates on the data independently via independent
instruction streams.
• Few actual examples of this class of parallel computer have ever existed. One is
the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
o multiple frequency filters operating on a single signal stream
o multiple cryptography algorithms attempting to crack a single coded
message.

Multiple Instruction, Multiple Data (MIMD):


• Currently, the most common type of parallel computer. Most modern computers
fall into this category.
• Multiple Instruction: every processor may be executing a different instruction
stream
• Multiple Data: every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Examples: most current supercomputers, networked parallel computer clusters
and "grids", multi-processor SMP computers, multi-core PCs.

• Note: many MIMD architectures also include SIMD execution sub-components

Handler classification:

In computer programming, an event handler is an asynchronous callback subroutine that
handles inputs received in a program. Each event is a piece of application-level
information from the underlying framework, typically the GUI toolkit. GUI events
include key presses, mouse movement, action selections, and timers expiring. On a lower
level, events can represent availability of new data for reading a file or network stream.
Event handlers are a central concept in event-driven programming.

The events are created by the framework based on interpreting lower-level inputs, which
may be lower-level events themselves. For example, mouse movements and clicks are
interpreted as menu selections. The events initially originate from actions on the
operating system level, such as interrupts generated by hardware devices, software
interrupt instructions, or state changes in polling. On this level, interrupt handlers and
signal handlers correspond to event handlers.

Created events are first processed by an event dispatcher within the framework. It
typically manages the associations between events and event handlers, and may queue
event handlers or events for later processing. Event dispatchers may call event handlers
directly, or wait for events to be dequeued with information about the handler to be
executed.

Handling signals

Signal handlers can be installed with the signal() system call. If a signal handler is not
installed for a particular signal, the default handler is used. Otherwise the signal is
intercepted and the signal handler is invoked. The process can also specify two default
behaviors, without creating a handler: ignore the signal (SIG_IGN) and use the default
signal handler (SIG_DFL). There are two signals which cannot be intercepted and
handled: SIGKILL and SIGSTOP.

Risks

Signal handling is vulnerable to race conditions. Because signals are asynchronous,
another signal (even of the same type) can be delivered to the process during execution of
the signal handling routine. The sigprocmask() call can be used to block and unblock
delivery of signals.

Signals can interrupt a system call in progress; the call then typically fails with errno
set to EINTR, leaving it to the application to manage a non-transparent restart.

Signal handlers should be written so that they do not cause unwanted side effects, e.g.
altering errno, the signal mask, signal dispositions, or other global process attributes.
Calling non-reentrant functions, e.g. malloc or printf, inside signal handlers is also
unsafe.

Relationship with Hardware Exceptions

A process's execution may result in the generation of a hardware exception, for instance,
if the process attempts to divide by zero or incurs a TLB miss. In Unix-like operating
systems, this event automatically changes the processor context to start executing a
kernel exception handler. With some exceptions, such as a page fault, the kernel has
sufficient information to fully handle the event and resume the process's execution. In
other exceptions, however, the kernel cannot proceed intelligently and must instead defer
the exception handling operation to the faulting process. This deferral is achieved via the
signal mechanism, wherein the kernel sends to the process a signal corresponding to the
current exception. For example, if a process attempted to divide by zero on an x86 CPU,
a divide error exception would be generated and cause the kernel to send the SIGFPE
signal to the process. Similarly, if the process attempted to access a memory address
outside of its virtual address space, the kernel would notify the process of this violation
via a SIGSEGV signal. The exact mapping between signal names and exceptions is
obviously dependent upon the CPU, since exception types differ between architectures.
Amdahl's law and Gustafson's law

[Figure: A graphical representation of Amdahl's law.]

The speed-up of a program from parallelization is limited by how much of the program can
be parallelized. For example, if 90% of the program can be parallelized, the theoretical
maximum speed-up using parallel computing would be 10× no matter how many processors are
used.

Optimally, the speed-up from parallelization would be linear: doubling the number of
processing elements should halve the runtime, and doubling it a second time should again
halve the runtime. However, very few parallel algorithms achieve optimal speed-up. Most
of them have a near-linear speed-up for small numbers of processing elements, which
flattens out into a constant value for large numbers of processing elements.

Grid computing is the most distributed form of parallel computing. It makes use of computers
communicating over the Internet to work on a given problem. Because of the low
bandwidth and extremely high latency of the Internet, grid computing typically
deals only with embarrassingly parallel problems. Many grid computing applications
have been created, of which SETI@home and Folding@home are the best-known
examples.[31]

Most grid computing applications use middleware, software that sits between the
operating system and the application to manage network resources and standardize the
software interface. The most common grid computing middleware is the Berkeley Open
Infrastructure for Network Computing (BOINC). Often, grid computing software makes
use of "spare cycles", performing computations at times when a computer is idling.

The potential speed-up of an algorithm on a parallel computing platform is given by
Amdahl's law, originally formulated by Gene Amdahl in the 1960s.[11] It states that a
small portion of the program which cannot be parallelized will limit the overall speed-up
available from parallelization. Any large mathematical or engineering problem will
typically consist of several parallelizable parts and several non-parallelizable (sequential)
parts. With an unlimited number of processors, this relationship is given by the equation:

$$S = \frac{1}{1 - P}$$

where S is the speed-up of the program (as a factor of its original sequential runtime), and
P is the fraction that is parallelizable. If the sequential portion of a program is 10% of the
runtime, we can get no more than a 10× speed-up, regardless of how many processors are
added. This puts an upper limit on the usefulness of adding more parallel execution units.
"When a task cannot be partitioned because of sequential constraints, the application of
more effort has no effect on the schedule. The bearing of a child takes nine months, no
matter how many women are assigned."[12]

As an illustration of Amdahl's law, assume that a task has two independent parts, A and B,
where B takes roughly 25% of the time of the whole computation (and A the remaining 75%).
With effort, a programmer may be able to make B five times faster, but this only reduces the
time for the whole computation by a little: the runtime becomes 0.75 + 0.25/5 = 0.80 of the
original, a 1.25× speed-up. In contrast, one may need to perform less work to make part A
twice as fast, giving 0.75/2 + 0.25 = 0.625 of the original runtime, a 1.6× speed-up. This
makes the computation much faster than by optimizing part B, even though B got a greater
speed-up (5× versus 2×).

Gustafson's law is another law in computer engineering, closely related to Amdahl's law.
It can be formulated as:

$$S(P) = P - \alpha (P - 1)$$

where P is the number of processors (not the parallelizable fraction used in Amdahl's law
above), S is the speed-up, and α the non-parallelizable part of the process.[13] Amdahl's law
assumes a fixed problem size and that the size of the sequential section is independent of
the number of processors, whereas Gustafson's law does not make these assumptions.
