
* Parallel computers are those systems that emphasize parallel processing.
* A parallel computer structure can be divided into three architectural configurations: pipeline computers, array processors, and multiprocessor systems. A pipeline computer performs overlapped computations to exploit temporal parallelism. An array processor uses multiple synchronized arithmetic logic units to achieve spatial parallelism. A multiprocessor system achieves asynchronous parallelism through a set of interactive processors with shared resources.
* Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

Why use parallel computing?

* Save time and/or money: Throwing more resources at a task will shorten its time to completion, with potential cost savings.

* Solve larger problems: Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. For example: web search engines/databases processing millions of transactions per second.

* Provide concurrency: Multiple computing resources can be doing many things simultaneously.

* Transmission speeds: Parallel computers have high transmission speeds as compared to serial computers.

* Use of non-local resources: Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce.

Shared Memory:

* Shared memory parallel computers vary widely, but in general multiple processors can operate independently while sharing the same memory resources. (A small code sketch of this model follows the UMA list below.)

* Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.

Uniform Memory Access (UMA):

* Most commonly represented today by Symmetric Multiprocessor (SMP) machines.
* Identical processors.
* Equal access and access times to memory.
* Data sharing is fast.
* Lack of scalability between memory and CPUs.
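The shared-memory model described above can be illustrated with a few lines of C. The slides do not name a programming interface, so the use of OpenMP, the array size, and the compile command are assumptions made only for illustration; the point is simply that all threads read and write one common address space.

    /* Shared-memory sketch: several threads sum one array that lives in a
     * single, common address space.  Compile with: gcc -fopenmp sum.c    */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];          /* one array, visible to every thread */
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Every thread works on a different slice of the SAME memory;
         * the reduction clause merges the partial sums safely.         */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);   /* expected output: 1000000.000000 */
        return 0;
    }

Each thread here operates independently on its own slice of the array while sharing the same memory resources, which is exactly the behaviour described in the bullets above.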

Non-Uniform Memory Access (NUMA):

* Often made by physically linking two or more SMPs.
* One SMP can directly access the memory of another SMP.
* Not all processors have equal access time to all memories.
* Memory access across the link is slower.

Distributed Memory:

* Each processor has its own memory. Memory addresses in one processor do not map to another processor, so there is no concept of a global address space across all processors.
* Distributed memory is scalable, with no overhead for cache coherency.
* The programmer is responsible for many details of communication between processors.
* Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have no effect on the memory of other processors; hence, the concept of cache coherency does not apply.
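Because there is no global address space, data must be moved between processors with explicit messages. Below is a minimal sketch using MPI point-to-point calls; the slides do not name a particular message-passing library, so MPI, the two-process layout, and the value being sent are assumptions made only for illustration.

    /* Distributed-memory sketch: each process owns its own variable, and
     * data moves only through explicit messages.
     * Compile with mpicc, run with: mpirun -np 2 ./a.out                */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                              /* exists only in rank 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);   /* now a private copy */
        }

        MPI_Finalize();
        return 0;
    }

The programmer decides what to send, when to send it, and who receives it, which is the "many details of communication" responsibility mentioned above.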

* Pipeline computers exploit temporal parallelism; array processors exploit spatial parallelism; multiprocessor systems exploit asynchronous parallelism.

PIPELINE COMPUTERS:

* The process of executing an instruction in a digital computer involves four major steps: instruction fetch from the main memory; instruction decoding, identifying the operation to be performed; operand fetch, if needed in the execution; and then execution of the decoded arithmetic logic operation. In a pipelined computer, successive instructions are executed in an overlapped fashion.

* Pipelining is a technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed.
* A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
* With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can be performed.

* The pipeline is divided into segments, and each segment can execute its operation concurrently with the other segments. Once a segment completes an operation, it passes the result to the next segment in the pipeline and fetches the next operation from the preceding segment. (A short C sketch of this overlap follows the figure below.)
[Figure: Four Pipelined Instructions - successive instructions overlap across the IF, ID, EX, M, and W stages of the pipeline]
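The overlap shown in the figure can be reproduced with a short C sketch. It assumes an ideal pipeline with no stalls, in which instruction i simply occupies stage s during clock cycle i + s; the stage names and the count of four instructions come from the figure, everything else is illustrative.

    /* Prints which stage each of four instructions occupies in every clock
     * cycle of an ideal 5-stage pipeline (IF, ID, EX, M, W), no stalls.   */
    #include <stdio.h>

    int main(void)
    {
        const char *stage[] = { "IF", "ID", "EX", "M", "W" };
        const int n_stages = 5, n_instr = 4;

        for (int cycle = 0; cycle < n_instr + n_stages - 1; cycle++) {
            printf("cycle %d:", cycle + 1);
            for (int i = 0; i < n_instr; i++) {
                int s = cycle - i;                  /* stage of instruction i */
                if (s >= 0 && s < n_stages)
                    printf("  I%d:%-2s", i + 1, stage[s]);
            }
            printf("\n");
        }
        return 0;
    }

In every middle cycle several instructions are active at once, one per segment, which is the overlapped execution the figure illustrates.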

* Types of pipeline processors are classified on the basis of the following three factors: level of processing, pipeline configuration, and type of instruction and data.

* Classification according to level of processing:
  - Instruction Pipelines
  - Arithmetic Pipelines
  - Processor Pipelines

* Classification according to pipeline configuration:
  - Unifunction Pipelines
  - Multifunction Pipelines

* Classification according to type of instruction and data:
  - Scalar Pipelines
  - Vector Pipelines

* GENERAL PIPELINE AND RESERVATION TABLES
  General pipelines have interconnection structures and data flow patterns with either feedforward or feedback connections. Reservation tables are used to show how successive pipeline stages are utilized for a specific evaluation function.
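A reservation table can be kept as a small matrix in which rows are pipeline stages, columns are clock cycles, and a mark means the stage is in use during that cycle. The three-stage table in the sketch below is invented for illustration (it is not taken from the slides); the code derives the forbidden latencies, i.e. the initiation intervals at which a second evaluation would collide with the first in some stage.

    /* Reservation-table sketch: rows = stages, columns = clock cycles.
     * A latency k is forbidden if some stage is marked in two columns
     * that are exactly k cycles apart.                                   */
    #include <stdio.h>

    #define STAGES 3
    #define CYCLES 5

    int main(void)
    {
        /* Example table (1 = stage busy in that cycle), chosen arbitrarily */
        int r[STAGES][CYCLES] = {
            { 1, 0, 0, 0, 1 },   /* stage S1 busy in cycles 1 and 5 */
            { 0, 1, 0, 1, 0 },   /* stage S2 busy in cycles 2 and 4 */
            { 0, 0, 1, 0, 0 },   /* stage S3 busy in cycle 3        */
        };

        for (int k = 1; k < CYCLES; k++) {
            int forbidden = 0;
            for (int s = 0; s < STAGES && !forbidden; s++)
                for (int t = 0; t + k < CYCLES; t++)
                    if (r[s][t] && r[s][t + k]) { forbidden = 1; break; }
            printf("latency %d is %s\n", k, forbidden ? "forbidden" : "allowed");
        }
        return 0;
    }

For this example table, latencies 2 and 4 are forbidden (stages S2 and S1 would be claimed twice), while latencies 1 and 3 are allowed.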

* Data Hazards: an instruction uses the result of the previous instruction. A hazard occurs exactly when an instruction tries to read a register in its ID stage that an earlier instruction intends to write in its WB stage. (A small check for this case is sketched after this list.)

* Control Hazards: the location of the next instruction to execute depends on a previous instruction (for example, a branch).

* Structural Hazards: two instructions need to access the same resource, e.g. two dogs fighting for the same bone.
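The read-after-write case described in the data-hazard bullet can be detected mechanically by comparing the destination register of an earlier instruction with the source registers of a later one. The register numbers and the two-instruction window below are made up purely for illustration.

    /* RAW-hazard sketch: instruction B reads a register that the
     * immediately preceding instruction A has not yet written back.     */
    #include <stdio.h>

    struct instr { int dest; int src1; int src2; };

    /* Returns 1 if 'later' reads the register that 'earlier' writes. */
    static int raw_hazard(struct instr earlier, struct instr later)
    {
        return later.src1 == earlier.dest || later.src2 == earlier.dest;
    }

    int main(void)
    {
        struct instr add = { 1, 2, 3 };   /* add r1, r2, r3  (writes r1) */
        struct instr sub = { 4, 1, 5 };   /* sub r4, r1, r5  (reads  r1) */

        if (raw_hazard(add, sub))
            printf("RAW hazard: the second instruction needs r1 before "
                   "the first has written it back (stall or forward).\n");
        return 0;
    }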

* An array processor is a synchronous parallel computer with multiple arithmetic logic units, called processing elements (PEs), that can operate in parallel in a lock-step fashion.

Functional structure of an SIMD Array processor with concurrent scalar processing in the control unit

* Its purpose is to enhance the performance of the computer by providing vector processing for complex scientific applications.
* The objective of the attached array processor is to provide vector manipulation capabilities to a conventional computer at a fraction of the cost of a supercomputer.

Attached array processor with host computer

* An array processor can handle a single instruction and multiple data streams; such machines are also known as SIMD computers.
* SIMD appears in two basic architectural organizations: array processors using random access memory, and associative processors.

Classification of Parallel Machines:
* Single Instruction Stream, Single Data Stream: SISD
* Multiple Instruction Stream, Single Data Stream: MISD
* Single Instruction Stream, Multiple Data Stream: SIMD
* Multiple Instruction Stream, Multiple Data Stream: MIMD

SIMD COMPUTER ORGANIZATION:
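Lock-step SIMD execution can be pictured as one instruction, broadcast by the control unit, that every processing element applies to its own data element in the same step. The sketch below models this in plain C; the number of PEs, the data values, and the broadcast operation are all assumptions made for illustration.

    /* SIMD sketch: one "instruction" (add a scalar) is broadcast and every
     * processing element applies it to its own data element in lock step. */
    #include <stdio.h>

    #define PES 8   /* number of processing elements, chosen for illustration */

    int main(void)
    {
        int data[PES] = { 0, 1, 2, 3, 4, 5, 6, 7 };  /* one operand per PE */
        int scalar = 10;                             /* broadcast by the control unit */

        /* Single instruction, multiple data: the same operation, once per PE */
        for (int pe = 0; pe < PES; pe++)
            data[pe] += scalar;

        for (int pe = 0; pe < PES; pe++)
            printf("PE%d: %d\n", pe, data[pe]);
        return 0;
    }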

* Research and development of multiprocessor systems are aimed at improving throughput, reliability, flexibility, and availability.

Why Choose a Multiprocessor?
* A single CPU can only go so fast; use more than one CPU to improve performance.
* Multiple users can work simultaneously.
* Multiple applications can be accessible.
* Multi-tasking within an application is possible.
* Responsiveness and/or throughput can be increased due to multitasking.
* Hardware can be shared between CPUs.

Three different interconnection structures are used:
* Time-shared common bus
* Crossbar switch network
* Multiport memories

Functional design of an MIMD multiprocessor system

* Time-shared or common bus: The simplest interconnection system for multiple processors is a common communication path connecting all of the functional units. The common path is often called a time-shared or common bus.

* Crossbar switch: If the number of buses in a time-shared bus system is increased, a point is reached at which there is a separate path available for each memory unit. The interconnection network is then called a non-blocking crossbar. The crossbar switch possesses complete connectivity with respect to memory modules because there is a separate bus associated with each memory module.

* Multistage network for multiprocessors: In order to design a multistage network, we need to understand the basic principles involved in the construction and control of simple crossbar switches.
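The complete connectivity of a crossbar can be modelled as a processor-by-memory-module matrix of crosspoints: every processor has a path to every module, and two requests conflict only when they target the same module in the same cycle. The 4x4 size and the request pattern in the sketch below are invented for illustration.

    /* Crossbar sketch: 4 processors request 4 memory modules; each module
     * grants at most one request per cycle, so requests conflict only when
     * two processors target the same module.                              */
    #include <stdio.h>

    #define P 4   /* processors     */
    #define M 4   /* memory modules */

    int main(void)
    {
        /* request[i] = memory module requested by processor i (example data) */
        int request[P] = { 0, 2, 2, 3 };
        int granted_to[M];

        for (int m = 0; m < M; m++)
            granted_to[m] = -1;                 /* module currently free */

        for (int p = 0; p < P; p++) {
            int m = request[p];
            if (granted_to[m] == -1) {
                granted_to[m] = p;              /* crosspoint (p, m) closed */
                printf("P%d -> M%d granted\n", p, m);
            } else {
                printf("P%d -> M%d blocked (module busy with P%d)\n",
                       p, m, granted_to[m]);
            }
        }
        return 0;
    }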

* Two different architectural models of multiprocessors are:

  Tightly Coupled System
  - Tasks and/or processors communicate in a highly synchronized fashion
  - Communicates through a common shared memory
  - Shared memory system

  Loosely Coupled System
  - Tasks or processors do not communicate in a synchronized fashion
  - Communicates by message-passing packets
  - Overhead for data exchange is high
  - Distributed memory system

Message Passing:
* Separate address space for each processor
* Processors communicate via message passing
* Processors have private memories
* Focuses attention on costly non-local operations

Shared Memory:
* Processors communicate with shared address space
* Processors communicate by memory read/write
* Easy on small-scale machines
* Lower latency
* SMP or NUMA

A Global-Memory Message-Passing Multiprocessor:

Eric E. Johnson researched a global-memory message-passing multiprocessor. This unusual class of multiprocessors, which employs a global memory and private process access environments, presents intriguing possibilities for machines that retain high efficiency even when process interactions are relatively frequent. Because all writable data is private to a process, such machines are free from the cache-consistency concerns of shared-variable architectures, while, in contrast to distributed memory machines, a global memory is available for code and data which is physically (not logically) shared. An example of this class of multiprocessors, the virtual port memory (VPM) architecture, is apparently scalable to at least 256 processors using only a simple bus-based interconnection network.

Architectural Differences of Efficient Sequential and Parallel Computers:

Martti J. Forsell researched the architectural differences of efficient sequential and parallel computers. In this paper he tries to conclude what kind of computer architecture is efficient for executing sequential problems, and what kind of architecture is efficient for executing parallel problems, from the processor architect's point of view. For that purpose, the paper analytically evaluates the performance of eight general-purpose processor architectures, representing a wide range of both commercial and scientific processor designs, in both single-processor and multiprocessor setups. The results are interesting: the most efficient architecture for sequential problems is a two-level pipelined VLIW (Very Long Instruction Word) architecture with a few parallel functional units, while the most efficient architecture for parallel problems is a deeply inter-thread super-pipelined architecture in which functional units are chained. Thus, designing a computer for efficient sequential computation leads to a very different architecture than designing one for efficient parallel computation, and there exists no single optimal architecture for general-purpose computation.

* Parallel computing is fast, and there are many different approaches and models of parallel computing.
* Parallel processing is a technique of the future, offering higher performance and effectiveness for multiprogrammed workloads; parallel computing is the future of computing.
* The purpose of an array processor is to enhance the performance of the computer by providing vector processing for complex scientific applications.
* The purpose of parallel processing is to speed up the computer's processing capability and increase its throughput, that is, the amount of processing that can be accomplished during a given time interval.
* Because the amount of hardware increases with parallel processing, the cost of the system also increases.
