
Limits of Instruction-Level Parallelism


Copyright: Attribution Non-Commercial (BY-NC)


Uploaded by: Shijith Thotton



Presentation by: Robert Duckles, CSE 520
Paper being presented: "Limits of Instruction-Level Parallelism," David W. Wall, WRL Research Report, November 1993

What is ILP?
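Instruction-level parallelism (ILP) exists when instructions do not depend on each other, so they can be executed in any order, or at the same time. In the examples below, n[rX] denotes the memory word at address rX + n.

Has ILP (the three instructions are independent):
    r1 := 0[r9]
    r2 := 17
    4[r3] := r6

No ILP (each instruction needs the result of the one before it):
    r1 := 0[r9]
    r2 := r1 + 17
    4[r2] := r6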

Superscalar machine: a machine that can issue multiple independent instructions in the same clock cycle.

Definition of Parallelism

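Parallelism = (number of instructions) / (number of cycles needed to execute them)

Assuming each instruction takes one cycle:
    r1 := 0[r9]
    r2 := 17
    4[r3] := r6
Parallelism = 3 / 1 = 3 (all three instructions can execute in the same cycle)

    r1 := 0[r9]
    r2 := r1 + 17
    4[r2] := r6
Parallelism = 3 / 3 = 1 (each instruction must wait for the one before it)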
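As a minimal sketch of the definition, the Python below computes parallelism for straight-line code. It assumes, as an illustration and not as the paper's model, that every instruction takes one cycle and that the machine can issue any number of independent instructions per cycle. The function name `parallelism` and the hand-written dependency lists for the two examples above are hypothetical.

```python
from typing import Sequence


def parallelism(deps: Sequence[Sequence[int]]) -> float:
    """Instructions / cycles for non-empty straight-line code.

    deps[i] lists the indices of earlier instructions whose results
    instruction i needs. Assumes unit latency and unlimited issue width.
    """
    cycle = []  # cycle[i] = cycle in which instruction i executes
    for producers in deps:
        # Run one cycle after the latest producer (cycle 1 if none).
        cycle.append(1 + max((cycle[p] for p in producers), default=0))
    return len(deps) / max(cycle)


# r1 := 0[r9]; r2 := 17; 4[r3] := r6  -> no instruction needs another
print(parallelism([[], [], []]))    # 3.0
# r1 := 0[r9]; r2 := r1 + 17; 4[r2] := r6  -> a dependency chain
print(parallelism([[], [0], [1]]))  # 1.0
```

The store `4[r2] := r6` depends on `r2 := r1 + 17` because it needs `r2` to form its address, which is why the second example is a chain.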

How much parallelism is there?

That depends on how hard you look for it. Ways to increase ILP:
* Register renaming
* Branch prediction
* Alias analysis
* Indirect-jump prediction

Low estimate for ILP

Programs are made up of basic blocks: uninterrupted sequences of instructions with no branches. In typical applications, basic blocks average about 10 instructions, and the parallelism within a single basic block is around 3.
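To show where basic-block boundaries fall, here is a sketch of the standard "leaders" rule. A block starts at the first instruction, at every branch target, and right after every branch. The tuple-based mini-IR and the name `basic_blocks` are hypothetical, and the sketch assumes branch targets are given as instruction indices.

```python
from typing import List, Optional, Tuple

# Hypothetical mini-IR: (text, branch_target), where branch_target is the
# index a branch may jump to, or None for an instruction that never branches.
Instr = Tuple[str, Optional[int]]


def basic_blocks(code: List[Instr]) -> List[List[int]]:
    """Split code into basic blocks using the classic "leaders" rule."""
    leaders = {0}  # the first instruction starts a block
    for i, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)  # a branch target starts a block
            leaders.add(i + 1)   # so does the instruction after a branch
    starts = sorted(x for x in leaders if x < len(code))
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


code = [
    ("r1 := 0[r9]", None),
    ("if r1 = 0 goto L", 4),
    ("r2 := r1 + 17", None),
    ("4[r2] := r6", None),
    ("L: r3 := r2 * r3", None),
]
print(basic_blocks(code))  # [[0, 1], [2, 3], [4]]
```

A scheduler restricted to one basic block can only overlap instructions inside each inner list. This is why short blocks cap the low estimate.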

High estimate for ILP

If you look beyond a basic block, across the entire scope of a program, studies have shown that an omniscient scheduler can achieve parallelism greater than 1000 in some numerical applications. Omniscient scheduling can be implemented by saving a trace of a program execution and using an oracle to schedule it. The oracle knows what will happen, and thus can create a perfect execution schedule. Practical, achievable ILP should therefore lie somewhere between 3 and 1000.
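A rough sketch of oracle scheduling over a recorded trace follows. Each traced instruction is placed in the earliest cycle its true (read-after-write) inputs allow. It assumes unit latency and unlimited issue width, and it represents memory locations as hypothetical strings such as `"mem:A"` standing in for concrete addresses observed at runtime. The names `Event` and `oracle_parallelism` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# One executed instruction from a trace: (locations read, locations written).
# Locations are register names or concrete memory addresses seen at runtime.
Event = Tuple[Sequence[str], Sequence[str]]


def oracle_parallelism(trace: List[Event]) -> float:
    """Schedule each traced instruction as early as its true
    (read-after-write) dependencies allow, with unit latency.

    Following the recorded path models perfect branch prediction,
    ignoring write-after-read/write-after-write models perfect renaming,
    and using real addresses models perfect alias analysis.
    """
    ready: Dict[str, int] = {}  # location -> cycle its current value is produced
    last = 0
    for reads, writes in trace:
        cycle = 1 + max((ready.get(loc, 0) for loc in reads), default=0)
        for loc in writes:
            ready[loc] = cycle
        last = max(last, cycle)
    return len(trace) / last if trace else 0.0


trace = [
    (["r9", "mem:A"], ["r1"]),  # r1 := 0[r9]   (address A observed in the trace)
    (["r1"], ["r2"]),           # r2 := r1 + 17
    (["r2", "r6"], ["mem:B"]),  # 4[r2] := r6   (address B observed in the trace)
    (["r3"], ["r1"]),           # r1 := r3 - 1  (reuses r1 but is independent)
]
print(oracle_parallelism(trace))  # 4 instructions / 3 cycles
```

The last instruction lands in cycle 1 even though it overwrites `r1`. The oracle behaves as if every value had its own register.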

Types of dependencies
* True dependency: given the computations involved, the dependency must exist.
* False dependency: the dependency exists only as an artifact of code generation. For example, the compiler allocates two independent values to the same register.

(a) true data dependency
    r1 := 20[r4]
    ...
    r2 := r1 + r4

(b) anti-dependency
    r2 := r1 + 1
    ...
    r1 := r17 - 1

(c) output dependency
    r1 := r2 * r3
    ...
    r1 := 0[r7]

(d) control dependency
    if r17 = 0 goto L
    ...
    r1 := r2 + r3
    ...
    L:
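The three data-dependency kinds in (a) to (c) can be told apart purely from which registers each instruction reads and writes. The sketch below checks register sets only. It ignores the memory operands and the branch-based control dependency (d), and the function name `classify` is hypothetical.

```python
from typing import List, Set


def classify(first_reads: Set[str], first_writes: Set[str],
             later_reads: Set[str], later_writes: Set[str]) -> List[str]:
    """Register data dependencies of a later instruction on an earlier one.
    Control dependencies come from branches and are not modeled here."""
    kinds = []
    if first_writes & later_reads:
        kinds.append("true")    # read after write
    if first_reads & later_writes:
        kinds.append("anti")    # write after read (false)
    if first_writes & later_writes:
        kinds.append("output")  # write after write (false)
    return kinds


# (a) r1 := 20[r4] ... r2 := r1 + r4
print(classify({"r4"}, {"r1"}, {"r1", "r4"}, {"r2"}))  # ['true']
# (b) r2 := r1 + 1 ... r1 := r17 - 1
print(classify({"r1"}, {"r2"}, {"r17"}, {"r1"}))        # ['anti']
# (c) r1 := r2 * r3 ... r1 := 0[r7]
print(classify({"r2", "r3"}, {"r1"}, {"r7"}, {"r1"}))   # ['output']
```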

Register renaming
The compiler's register allocator can introduce false dependencies by assigning unrelated values to the same register. We can undo this damage by giving each value its own register, so that only true dependencies remain. However, a real machine has a finite number of registers, so renaming cannot always remove every false dependency, and perfect parallelism can never be guaranteed.
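A minimal renaming sketch follows. It assumes an unlimited supply of fresh register names (hypothetically `p0`, `p1`, ...), which is exactly the idealization the finite-register caveat above warns about. Every write gets a new name, and each source is rewritten to the name of the value that is live at that point.

```python
from itertools import count
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)


def rename(code: List[Instr]) -> List[Instr]:
    """Give every new value its own register. Assumes an unlimited supply
    of fresh names p0, p1, ..., which real hardware does not have."""
    fresh = (f"p{i}" for i in count())
    current: Dict[str, str] = {}  # architectural name -> name of latest value
    out: List[Instr] = []
    for dest, srcs in code:
        # Read the value that is live now (original name if never written).
        new_srcs = [current.get(s, s) for s in srcs]
        new_dest = next(fresh)
        current[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out


# Anti-dependency (b): r2 := r1 + 1 ... r1 := r17 - 1
print(rename([("r2", ["r1"]), ("r1", ["r17"])]))
# [('p0', ['r1']), ('p1', ['r17'])]: the second write no longer touches r1
```

Sources are translated before the destination is updated, so an instruction like `r1 := r1 + 1` still reads the old value of `r1`.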

Alias analysis
Registers often hold a pointer to a memory location or a memory offset. Can two memory references point to the same place in memory? If they can, a store through one and a load through the other may be dependent, and unless we can rule this out we must assume the dependency exists. One approach is to inspect the actual pointer values at runtime and check whether the accessed memory overlaps.
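The runtime check described above reduces to an interval-overlap test on the accessed bytes. This is a sketch under stated assumptions: the helper `may_conflict`, the concrete addresses, and the 4-byte word size are illustrative, not taken from the paper. Two reads never conflict. A store conflicts with any access whose byte range overlaps its own.

```python
def may_conflict(addr_a: int, size_a: int, write_a: bool,
                 addr_b: int, size_b: int, write_b: bool) -> bool:
    """True if two accesses must stay in order: their byte ranges
    overlap and at least one of them is a store."""
    overlap = addr_a < addr_b + size_b and addr_b < addr_a + size_a
    return overlap and (write_a or write_b)


# 4[r3] := r6 with r3 = 1000 stores bytes 1004..1007 (4-byte words assumed)
print(may_conflict(1004, 4, True, 1004, 4, False))  # True: 0[r9] with r9 = 1004
print(may_conflict(1004, 4, True, 1008, 4, False))  # False: disjoint bytes
```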


Limitations of branch prediction

We can correctly predict about 90% of branches by counting how often each branch has recently been taken and predicting the more common outcome.
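One common way to "count recent outcomes" is a 2-bit saturating counter per branch. The sketch below assumes that scheme; the slide does not name the exact predictor, and the class name and branch address `0x400` are hypothetical. Each counter moves toward taken or not taken by one step per outcome, so a single surprise does not flip a strongly held prediction.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CounterPredictor:
    """One 2-bit saturating counter per branch: values 0-1 predict
    not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        # Unseen branches start at 1 ("weakly not taken").
        self.counters: DefaultDict[int, int] = defaultdict(lambda: 1)

    def predict(self, pc: int) -> bool:
        return self.counters[pc] >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes: List[bool], pc: int = 0x400) -> float:
    """Fraction of outcomes for one branch (hypothetical address pc)
    that the predictor gets right."""
    predictor = CounterPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then falls through; repeated 10 times.
print(accuracy(([True] * 9 + [False]) * 10))  # 0.89
```

On this loop pattern the predictor misses only the loop exit once it has warmed up. Real programs contain less regular branches, which is where the roughly 90% ceiling comes from.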

Indirect-jump prediction
An indirect jump is one whose destination address is not known at compile time, for example because the destination is calculated into a register at runtime. Returns are a common case: the return address in the calling function is stored on the stack. For such jumps we can use indirect-jump prediction.
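For returns, a common predictor is a small ring of return addresses. A call pushes the address it will return to, and a return pops the most recent entry as its prediction. The sketch below assumes that scheme with an illustrative class name `ReturnRing` and ring size. It also shows how a finite ring mispredicts once calls nest deeper than its capacity.

```python
from typing import List, Optional


class ReturnRing:
    """Predict return targets with a fixed-size ring of return addresses."""

    def __init__(self, size: int = 8) -> None:
        self.ring: List[Optional[int]] = [None] * size
        self.top = 0  # pushes minus pops; indexes the ring modulo its size

    def on_call(self, return_addr: int) -> None:
        # Remember where this call should return; the oldest entry is
        # overwritten when the ring is full (deep recursion).
        self.ring[self.top % len(self.ring)] = return_addr
        self.top += 1

    def predict_return(self) -> Optional[int]:
        self.top -= 1
        return self.ring[self.top % len(self.ring)]


ring = ReturnRing(size=2)
for addr in (0x100, 0x200, 0x300):  # three nested calls, ring of two
    ring.on_call(addr)
print([hex(ring.predict_return()) for _ in range(3)])
# ['0x300', '0x200', '0x300']: the third prediction is wrong (should be 0x100)
```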

Latency

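Multi-cycle instructions can greatly decrease parallelism, because every instruction that depends on a multi-cycle result must wait until that result is available.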
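To see how latency stretches a schedule, the sketch below lets each instruction's result become ready a given number of cycles after it starts. Otherwise it keeps the assumption of unlimited issue width. The function name and the 3-cycle latency are hypothetical.

```python
from typing import Sequence


def parallelism_with_latency(deps: Sequence[Sequence[int]],
                             latency: Sequence[int]) -> float:
    """Instructions / cycles for non-empty straight-line code on a machine
    with unlimited issue width, where instruction i's result is ready
    latency[i] cycles after it starts. deps[i] lists earlier producers."""
    finish = []  # finish[i] = cycle at which instruction i's result is ready
    for i, producers in enumerate(deps):
        start = max((finish[p] for p in producers), default=0)
        finish.append(start + latency[i])
    return len(deps) / max(finish)


chain = [[], [0], [1]]  # each instruction needs the previous result
print(parallelism_with_latency(chain, [1, 1, 1]))  # 1.0
print(parallelism_with_latency(chain, [3, 1, 1]))  # 0.6: a 3-cycle first step
```

Making one instruction in the chain three cycles long delays everything after it. Parallelism then drops below 1 even though nothing else changed.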

Window size
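The window size is the maximum number of instructions that can appear in the pending cycle list. A smaller window therefore limits how far ahead the scheduler can look for independent instructions.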
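The effect of a bounded window can be illustrated with a simplified sliding-window model. This is an assumption for illustration, not the paper's exact pending-cycle scheduler. Each cycle, the machine may issue only among the `window` oldest instructions that have not yet issued. The function name is hypothetical, and unit latency is assumed.

```python
from typing import List, Sequence


def windowed_parallelism(deps: Sequence[Sequence[int]], window: int) -> float:
    """Instructions / cycles when, each cycle, the scheduler may issue only
    among the `window` oldest unissued instructions. Unit latency; deps[i]
    lists earlier producers of instruction i; window must be >= 1."""
    n = len(deps)
    if n == 0:
        return 0.0
    issued_in = [0] * n  # cycle each instruction issued in; 0 = not yet
    pending: List[int] = list(range(n))
    cycle = 0
    while pending:
        cycle += 1
        for i in pending[:window]:
            # Ready if every producer issued in an earlier cycle.
            if all(0 < issued_in[p] < cycle for p in deps[i]):
                issued_in[i] = cycle
        pending = [i for i in pending if issued_in[i] == 0]
    return n / cycle


independent = [[], [], [], []]
print(windowed_parallelism(independent, window=1))  # 1.0
print(windowed_parallelism(independent, window=4))  # 4.0
```

The oldest pending instruction always has its producers already issued, so the loop always makes progress. Even fully independent code only reaches the parallelism the window lets the scheduler see.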


Conclusions: the ILP Wall

Even with perfect techniques, most real applications hit an ILP limit of around 20. With reasonable, practical methods it's even worse: it's very difficult to get an ILP above 10.

Relationship to Term Project

Our term project is about optimization techniques for AMD64 Opteron/Athlon processors. Maximizing ILP is essential to getting the most performance out of any processor. Branch prediction, register renaming, and similar techniques are all particularly relevant optimizations.
