Вы находитесь на странице: 1из 7

COMPUTER

ARCHITECTURE
AND
PARALLEL
PROCESSING

Kai Hwang
University of Southern California

Faye A. Briggs
Rice University

Technische Hochschule Darmstadt


FACHBEREICH INFORMATIK

B I B L I O T H E K .
3?CD t\
Inventar-Nr.:
Sachgebiete:

9J7.TS.V.
O.Hv

Standort:

McGraw-Hill Book Company


New York
Panama

St. Louis San Francisco Auckland Bogota Hamburg


London Madrid Mexico Montreal New Delhi
Paris Sao Paulo Singapore Sydney Tokyo Toronto

CONTENTS

Preface

XV

Chapter 1 Introduction to Parallel Processing


1.1

1.2

1.3

1.4

1.5

1.6

Chapter 2
2.1

Evolution of Computer Systems


1.1.1 Generations of Computer Systems
1.1.2 Trends towards Parallel Processing
Parallelism in Uniprocessor Systems
.2.1 Basic Uniprocessor Architecture
.2.2 Parallel Processing Mechanisms
.2.3 Balancing of Subsystem Bandwidth
.2.4 Multiprogramming and Time Sharing
Parallel Computer Structures
.3.1 Pipeline Computers
.3.2 Array Computers
.3.3 Multiprocessor Systems
.3.4 Performance of Parallel Computers
.3.5 Dataflow, and New Concepts
Architectural Classification Schemes
.4.1 Multiplicity of Instruction-Data Streams
.4.2 Serial versus Parallel Processing
.4.3 Parallelism versus Pipelining
Parallel Processing Applications
.5.1' Predictive Modeling and Simulations
.5.2 Engineering Design and Automation
1.5.3 Energy Resources Exploration
1.5.4 Medical, Military, and Basic Research
Bibliographic Notes and Problems

11
13
16
20
20
22
25
27
29
32
32
35
37
40
42
44
46
48
49

Memory and Input-Output Subsystems

52

Hierarchical Memory Structure


2.1.1 Memory Hierarchy
2.1.2 Optimization of Memory Hierarchy
2.1.3 Addressing Schemes for Main Memory

52
52
56
58

X CONTENTS

2.2

2.3

2.4

2.5

2.6

Chapter 3
3.1

3.2

3.3

3.4

3.5

Chapter 4
4.1

Virtual Memory System


2.2.1 The Concept of Virtual Memory
2.2.2 Paged Memory System
2.2.3 Segmented Memory System
2.2.4 Memory with Paged Segments
Memory Allocation and Management
2.3.1 Classification of Memory Policies
2.3.2 Optimal Load Control
2.3.3 Memory Management Policies
Cache Memories and Management
2.4.1 Characteristics of Cache Memories
2.4.2 Cache Memory Organizations
2.4.3 Fetch and Main Memory Update Policies
2.4.4 Block Replacement Policies
Input-Output Subsystems
2.5.1 Characteristics of I/O Subsystem
2.5.2 Interrupt Mechanisms and Special Hardware
2.5.3 I/O Processors and I/O Channels
Bibliographic Notes and Problems

60
61
65
71
77
80
80
86
91
98
98
102
113
115
118
118
123
128
141

Principles of Pipelining and Vector Processing

145

Pipelining: An Overlapped Parallelism


3.1.1 Principles of Linear Pipelining
3.1.2 Classification of Pipeline Processors
3.1.3 General Pipelines and Reservation Tables
3.1.4 Interleaved Memory Organizations
Instruction and Arithmetic Pipelines
3.2.1 Design of Pipelined Instruction Units
3.2.2 Arithmetic Pipelines Design Examples
3.2.3 Multifunction and Array Pipelines
Principles of Designing Pipelined Processors
3.3.1 Instruction Prefetch and Branch Handling
3.3.2 Data Buffering and Busing Structures
3.3.3 Internal Forwarding and Register Tagging
3.3.4 Hazard Detection and Resolution
3.3.5 Job Sequencing and Collision Prevention
*3.3.6 Dynamic Pipelines and Reconfigurability
Vector Processing Requirements
3.4.1 Characteristics of Vector Processing
*3:4.2 Multiple Vector task Dispatching
3.4.3 Pipelined Vector Processing Methods
Bibliographic Notes and Problems

Pipeline Computers a n d Vectorization Methods


The Space of Pipelined Computers
4.1.1 Vector Supercomputers
4.1.2 Scientific Attached Processors

145
146
151
154
156
164
164
170
181
187
187
193
196
200
203
208
212
213
218
226
229

233
233
234
235

CONTENTS XI

4.2

4.3

4.4

4.5

4.6

Early Vector Processors


4.2.1 Architectures of Star-100 and TI-ASC
4.2.2 Vector Processing in Streaming Mode
Scientific Attached Processors
4.3.1 The Architecture of AP-120B
4.3.2 Back-end Vector Computations
4.3.3 FPS-164, IBM 3838 and Datawest MATP
Recent Vector Processors
4.4.1 The Architecture of Cray-1
4.4.2 Pipeline Chaining and Vector Loops
4.4.3 The Architecture of Cyber-205
4.4.4 Vector Processing in Cyber-205 and CDC-NASF
4.4.5 Fujitsu VP-200 and Special Features
Vectorization and Optimization Methods
4.5.1 Language Features in Vector Processing
4.5.2 Design of Vectorizing Compilers
4.5.3 Optimization of Vector Operations
*4.5.4 Performance Evaluation of Pipelined Computers
Bibliographic Notes and Problems

237
237
245
249
249
255
258
264
264
271
280
285
293
301
301
305
308
314
320

Chapter 5 Structures and Algorithms for Array Processors


5.1.

SIMD Array Processors


5.1.1 SIMD Computer Organizations
5.1.2 Masking and Data Routing Mechanisms
5.1.3 Inter-PE Communications
5.2 SIMD Interconnection Networks
5.2.1 Static versus Dynamic Networks
5.2.2 Mesh-Connected Illiac Network
5.2.3 Cube Interconnection Networks
5.2.4 Barrel Shifter and Data Manipulator
5.2.5 Shuffle-Exchange and Omega Networks
5.3 Parallel Algorithms for Array Processors
5.3.1 SIMD Matrix Multiplication
5.3.2 Parallel Sorting on Array Processors
5.3.3 SIMD Fast Fourier Transform
5.3.4 Connection Issues for SIMD Processing
5.4 Associative Array Processing
5.4.1 Associative Memory Organizations
5.4.2 Associative Processors (PEPE and STARAN)
*5.4.3 Associative Search Algorithms
5.5 Bibliographic Notes and Problems

''

--

325
325
326
328
332
333
334
339
342
345
350
355
355
361
367
373
" 374
375
380
385
388

Chapter 6 S I M D Computers and Performance Enhancement 393


6.1

The Space of SIMD Computers


6.1.1 Array and Associative Processors
6.1.2 SIMD Computer Perspectives

393
394
396

Xll CONTENTS

6.2

The Illiac-IV and the BSP Systems


6.2.1 The Illiac-IV System Architecture
6.2.2 Applications of the Illiac-IV
6.2.3 The BSP System Architecture
6.2.4 The Prime Memory System
6.2.5 The BSP Fortran Vectorizer
6.3 The Massively Parallel Processor
6.3.1 The MPP System Architecture
6.3.2 Processing Array, Memory, and Control
6.3.3 Image Processing on the MPP
6.4 Performance Enhancement Methods
6.4.1 Parallel Memory Allocation
6.4.2 Array Processing Languages
*6.4.3 Performance Analysis of Array Processors
*6.4.4 Multiple-SIMD Computer Organization
6.5 Bibliographic Notes and Problems

Chapter 7
7.1

7.2

7.3

7.4

7.5

7.6

Chapter 8
8.1

8.2

398
399
402
410
414
417
422
423
426
430
434
434
440
445
448
454

Multiprocessor Architecture a n d P r o g r a m m i n g
Functional Structures
7.1.1 Loosely Coupled Multiprocessors
7.1.2 Tightly Coupled Multiprocessors
7.1.3 Processor Characteristics for Multiprocessing
Interconnection Networks
7.2.1 Time Shared or Common Buses
7.2.2 Crossbar Switch and Multiport Memories
7.2.3 Multistage Networks for Multiprocessors
*7.2.4 Performance of Interconnection Networks
Parallel Memory Organizations
7.3.1 Interleaved Memory Configurations
*7.3.2 Performance Tradeoffs in Memory Organizations
7.3.3 Multicache Problems and Solutions
Multiprocessor Operating Systems
7.4.1 Classification of Multiprocessor Operating Systems
7.4.2 Software Requirements for Multiprocessors
7.4.3 Operating System Requirements
Exploiting Concurrency for Multiprocessing
7.5.1 Language Features to Exploit Parallelism
7.5.2 Detection of Parallelism in Programs
*7.5.3 Program and Algorithm Restructuring
Bibliographic Notes and Problems

Multiprocessing C o n t r o l a n d Algorithms
Interprocess Communication Mechanisms
8.1.1 Process Synchronization Mechanisms
8.1.2 Synchronization with Semaphores
8.1.3 Conditional Critical Sections and Monitors
System Deadlocks and Protection
8.2.1 System Deadlock Problems

459
459
459
460
478
481
481
487
492
502
508
508
513
517
525
526
528
531
533
533
541
545
551

557
557
557
565
572
577
577

CONTENTS Xlll

8.2.2 Deadlock Prevention and Avoidance


8.2.3 Deadlock Detection and Recovery
8.2.4 Protection Schemes
8.3 Multiprocessor Scheduling Strategies
8.3.1 Dimensions of Multiple Processor Management
8.3.2 Deterministic Scheduling Models
*8.3.3 Stochastic Scheduling Models
8.4 Parallel Algorithms for Multiprocessors
8.4.1 Classification of Parallel Algorithms
8.4.2 Synchronized Parallel Algorithms
8.4.3 Asynchronous Parallel Algorithms
*8.4.4 Performance of Parallel Algorithms
8.5 Bibliographic Notes and Problems

Chapter 9
9.1

9.2

9.3

9.4

9.5

9.6

9.7

580
582
583
590
590
596
606
613
614
616
622
628
637

Example Multiprocessor Systems


The Space of Multiprocessor Systems
9.1.1 Exploratory Systems
9.1.2 Commercial Multiprocessors
The C.mmp Multiprocessor System
9.2.1 The C.mmp System Architecture
9.2.2 The Hydra Operating System
*9.2.3 Performance of the C.mmp
The S-l Multiprocessor
9.3.1 -The S-l System Architecture
9.3.2 Multiprocessing Uniprocessors
9.3.3 The S-l Software Development
The HEP Multiprocessor
1
9.4.1 The HEP System Architecture
9.4.2 Process Execution Modules
9.4.3 Parallel Processing'on the HEP
Mainframe Multiprocessor Systems
9.5.1 IBM 370/168 MP, 3033, and 3081
9.5.2 Operating System for IBM Multiprocessors
9.5.3 Univac 1100/80 and 1100/90 Series
9.5.4 The'Tandem Nonstop System
- The Cray X-MP and Cray 2
9.6.1 Cray X-MP System Architecture
9.6.2 Multitasking on Cray X-MP
*9.6.3 Performance of Cray X-MP
Bibliographic Notes and Problems

643

"~

Chapter 10 Data Flow Computers and VLSI Computations


10.1

10.2

Data-Driven Computing and Languages


10.1.1 Control-Flow versus Data Flow Computers
10.1.2 Data Flow Graphs and Languages
10.1.3 Advantages and Potential Problems
Data Flow Computer Architectures

643
643
644
645
645
650
654
658
659
661
668
669
669
674
680
684
684
693
694
705
714
714
1\1
721
728

732
732
733
740
745
748

XIV CONTENTS

10.2.1 Static Data Flow Computers


10.2.2 Dynamic Data Flow Computers
10.2.3 Data-Flow Design Alternatives
10.3 VLSI Computing Structures
10.3.1 The Systolic Array Architecture
*10.3.2 Mapping Algorithms into Systolic Arrays
10.3.3 Reconfigurable Processor Array
10.4 VLSI Matrix Arithmetic Processors
10.4.1 VLSI Arithmetic Modules
10.4.2 Partitioned Matrix Algorithms
10.4.3 Matrix Arithmetic Pipelines
*10.4.4 Real-Time Image Processing
10.5 Bibliographic Notes and Problems

748
755
763
768
769
774
779
788
788
790
798
803
807

Bibliography

813

Index

833

Вам также может понравиться