
[ II - IT- II semester

Computer Organization -- Unit-3 ]

COMPUTER ORGANIZATION UNIT -3


Instruction pipelining: Pipelining Hazards, Dealing with Branches. 8086 Processor Family. Reduced Instruction Set Computers: Instruction Execution Characteristics, Large Register Files, RISC Architecture.

R.Veeranjaneyulu M.Tech

PACE Institute of Technology & Sciences,

Ongole


Instruction pipelining:
Pipelining is a technique of decomposing a sequential process into sub-operations, with each sub-operation executed in a dedicated segment that operates concurrently with all other segments. It is an implementation technique whereby multiple instructions are overlapped in execution. An instruction pipeline operates on a stream of instructions by overlapping the fetch, decode and execute phases of the instruction cycle.

The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps, each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or pipe segment). Pipe stages are connected to form a pipe:

The time required to move an instruction from one stage to the next is a machine cycle (often this is one clock cycle). The execution of one instruction takes several machine cycles as it passes through the pipeline.

Two stage pipeline:

FI: fetch instruction
EI: execute instruction

Assume that each instruction takes execution time Tex, split equally between the two stages. Execution time for 7 instructions with pipelining: (Tex/2) * 8 = 4 * Tex, compared with 7 * Tex without pipelining.
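As a worked check of the figure above, the calculation can be written out step by step. This is a sketch that assumes both stages take Tex/2 and that there are no stalls; the symbols k (number of stages) and n (number of instructions) are introduced here only for illustration.

```latex
\begin{align*}
T_{\text{pipelined}}  &= (k + n - 1)\,\frac{T_{ex}}{k}
                       = (2 + 7 - 1)\,\frac{T_{ex}}{2}
                       = 4\,T_{ex} \\
T_{\text{sequential}} &= n\,T_{ex} = 7\,T_{ex} \\
\text{speedup}        &= \frac{7\,T_{ex}}{4\,T_{ex}} = 1.75
\end{align*}
```

The first instruction needs both stage times to complete; each of the remaining six instructions completes one stage time later, which gives the factor of 8.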


Working of Instruction Pipelining:

An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments. This causes the instruction fetch and execute phases to overlap and perform simultaneous operations. When a branch instruction is encountered, the pipeline must be emptied and all the instructions that have been read from memory after the branch instruction must be discarded. A greater number of stages can provide better performance, but the gain is limited by the stage overheads and hazards discussed later in this unit.

Six stage pipeline:

FI: fetch instruction
DI: decode instruction
CO: calculate operand address
FO: fetch operand
EI: execute instruction
WO: write operand
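The overlap of the six stages can be visualized with a small space-time table. The following is a minimal Python sketch, assuming the stage order FI, DI, CO, FO, EI, WO, one clock cycle per stage, and no hazards; the function name `space_time` is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def space_time(num_instructions, stages=STAGES):
    """Return one row per instruction.

    row[c] holds the name of the stage that the instruction occupies in
    clock cycle c + 1, or an empty string if it is not in the pipeline.
    """
    k = len(stages)
    total_cycles = k + num_instructions - 1  # ideal pipeline, no stalls
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the diagram for four instructions.
    for number, row in enumerate(space_time(4), start=1):
        print(f"I{number}: " + " ".join(f"{cell:>2}" for cell in row))
```

Under these assumptions, nine instructions occupy the pipeline for 6 + 9 - 1 = 14 cycles instead of the 54 cycles needed without overlap.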

Branch in a Pipeline:


Pipeline performance:
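Pipeline performance is measured in terms of the time taken to execute a program. Consider a non-pipelined unit that performs a given task in time tn. A pipeline needs k segment times to complete the first task and one more segment time for each of the remaining n - 1 tasks. The speedup of pipelined processing over equivalent non-pipelined processing is therefore defined by the ratio:

S = (n * tn) / ((k + n - 1) * tp)

Where

k = number of segments in the pipeline
tp = time taken by each segment to process a sub-operation
n = number of tasks
tn = time taken by the non-pipelined unit to complete one task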
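The speedup ratio can be evaluated numerically. This is a minimal Python sketch; the function name `pipeline_speedup` is hypothetical, and the default tn = k * tp is an assumption (the non-pipelined unit is taken to need as long as all k segments together) that applies only when tn is not given.

```python
def pipeline_speedup(k, n, tp, tn=None):
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks.

    k  -- number of segments
    n  -- number of tasks
    tp -- time per segment
    tn -- time per task without pipelining (assumed k * tp if omitted)
    """
    if tn is None:
        tn = k * tp  # assumption: same total work, just not overlapped
    return (n * tn) / ((k + n - 1) * tp)


if __name__ == "__main__":
    # Illustrative numbers only: 4 segments, 100 tasks, 20 time units each.
    print(round(pipeline_speedup(k=4, n=100, tp=20), 2))
```

As n grows, the ratio approaches tn / tp, which equals k under the default assumption, so the number of segments is the theoretical upper bound on the speedup.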

Problems with Pipeline:
- A greater number of stages increases the overhead in moving information between stages and in synchronization between stages.
- The complexity of the CPU grows with the number of stages.
- It is difficult to keep a large pipeline at maximum rate because of pipeline hazards.

Pipelining Hazards:
Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions that entered the pipeline after it are also stalled, while instructions that entered before it can continue. No new instructions are fetched during the stall.

Types of hazards:
1. Structural hazards
2. Data hazards
3. Control hazards

Structural Hazards:
Structural hazards occur when a certain resource (memory, functional unit) is requested by more than one instruction at the same time.

Example: Instruction ADD R4,X fetches operand X from memory in the FO stage. The memory does not accept another access during that cycle, so an instruction fetch scheduled for the same cycle must wait.

Penalty: 1 cycle

Solutions:
- Certain resources are duplicated in order to avoid structural hazards.
- Functional units (ALU, FP unit) can be pipelined themselves in order to support several instructions at a time.
- A classical way to avoid hazards at memory access is to provide separate data and instruction caches.


Data Hazards:
This conflict arises when an instruction depends on the result of a previous instruction, but this result is not yet available. Suppose we have two instructions, I1 and I2. In a pipeline, the execution of I2 can start before I1 has terminated. If, in a certain stage of the pipeline, I2 needs the result produced by I1 but this result has not yet been generated, we have a data hazard. Example:

Before executing its FO stage, the ADD instruction is stalled until the MUL instruction has written the result into R2.

Penalty: 2 cycles

Solutions: The problem of data dependency can be solved in the following ways.
1. Operand forwarding: The hardware avoids the conflict by routing the data through special paths between pipeline segments.
2. Through the compiler: The compiler inserts no-operation (NOP) instructions into the program.

After the EI stage of the MUL instruction the result is available by forwarding. The penalty is reduced to one cycle.
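The penalties quoted above can be tallied with a short Python sketch. It is a simplification under stated assumptions: only dependences between adjacent instructions are checked, each hazard costs the 2-cycle or 1-cycle penalty given in the text, and the `Instr` tuple, the `stall_cycles` function and the sample program are hypothetical illustrations, not the notes' original example.

```python
from collections import namedtuple

# op: mnemonic, dest: register written, srcs: registers read (hypothetical format)
Instr = namedtuple("Instr", "op dest srcs")


def stall_cycles(program, forwarding=False):
    """Count stall cycles caused by read-after-write hazards.

    Only adjacent instruction pairs are examined (a simplification).
    """
    penalty = 1 if forwarding else 2  # per-hazard penalties quoted in the text
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.dest is not None and prev.dest in cur.srcs:
            stalls += penalty  # cur needs a result that prev has not written yet
    return stalls


# Hypothetical program: ADD reads R2 immediately after MUL writes it.
program = [
    Instr("MUL", "R2", ("R2", "R3")),
    Instr("ADD", "R1", ("R1", "R2")),
    Instr("SUB", "R5", ("R6", "R7")),
]

if __name__ == "__main__":
    print("without forwarding:", stall_cycles(program))
    print("with forwarding:   ", stall_cycles(program, forwarding=True))
```

A compiler that inserts NOPs is doing the same bookkeeping statically: it places as many NOPs between the two dependent instructions as the hardware would otherwise stall.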

Control Hazards:

Control hazards are produced as a consequence of branch instructions.

Unconditional branch:

    BR TARGET
    TARGET _______

After the FO stage of the branch instruction, the address of the target is known and the target instruction can be fetched.


Conditional branch:

Handling branch difficulties: The methods used are
(i) Prefetch target instructions
(ii) Use of a branch target buffer
(iii) Use of a loop buffer
(iv) Branch prediction
(v) Delayed branch


Dealing with Branches:


A number of techniques can be used to minimize the impact of branch instructions, i.e. the branch penalty:
- Multiple streams
- Prefetch branch target
- Loop buffer
- Branch prediction
- Delayed branching

Multiple Streams:
- Replicate the initial portions of the pipeline and fetch both possible next instructions
- Have two pipelines
- Prefetch each branch into a separate pipeline
- Use the appropriate pipeline
- Increases chance of memory contention
- Must support multiple streams for each instruction in the pipeline

Prefetch Branch Target:


- Target of branch is prefetched in addition to the instructions following the branch
- Keep target until branch is executed
- Used by IBM 360/91

Loop buffer:
A loop buffer is a small, very high speed memory maintained by the instruction fetch stage of the pipeline, containing the n most recently fetched instructions in sequence (a look-ahead, look-behind buffer). If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer.

Benefits of Loop Buffer:
- With the use of prefetching, instructions are fetched in sequence without the usual memory access time.
- If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target is already in the buffer.
- Very good for small loops or jumps.
- If the buffer is big enough, an entire loop can be held in it, reducing the branch penalty (c.f. cache).
- Used by CRAY-1
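The buffer check described above can be sketched in Python. This is only an illustration under simplifying assumptions: the buffer tracks instruction addresses rather than instruction words, it models the look-behind part only (no prefetched look-ahead entries), and the names `LoopBuffer` and `fetch_source` are hypothetical.

```python
from collections import deque


class LoopBuffer:
    """Holds the addresses of the n most recently fetched instructions."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.entries = deque(maxlen=size)

    def record_fetch(self, address):
        self.entries.append(address)

    def contains(self, target):
        return target in self.entries


def fetch_source(buffer, target):
    """Say where a taken branch to `target` would be served from."""
    return "buffer" if buffer.contains(target) else "memory"


if __name__ == "__main__":
    buf = LoopBuffer(size=8)
    for address in range(100, 110):  # sequential fetches of addresses 100..109
        buf.record_fetch(address)
    print(fetch_source(buf, 104))  # short backward jump, still buffered
    print(fetch_source(buf, 100))  # too far back, already evicted
```

A small loop whose body fits in the buffer hits on every backward branch, which is why the technique suits short loops.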


Branch Prediction:
Make a good guess as to which instruction will be executed next and start that one down the pipeline.
- If the guess turns out to be right, there is no loss of performance in the pipeline.
- If the guess was wrong, empty the pipeline and restart with the correct instruction, suffering the full branch penalty.

Static guesses: make the guess without considering the runtime history of the program
- Predict never taken
- Predict always taken
- Predict based on the opcode

Dynamic guesses: track the history of conditional branches in the program
- Taken / not taken switch
- History table

Predict never taken:
- Assume that the jump will not happen
- Always fetch the next instruction
- 68020 & VAX 11/780
- VAX will not prefetch after a branch if a page fault would result (O/S v CPU design)

Predict always taken:
- Assume that the jump will happen
- Always fetch the target instruction

Predict by Opcode:
- Some instructions are more likely to result in a jump than others
- Can get up to 75% success

Taken/Not taken switch:
- Based on previous history
- Good for loops

Branch Prediction Flowchart:


Branch Prediction State Diagram:

Dealing With Branches:
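A taken/not taken switch with two history bits can be sketched as a small state machine. The Python below assumes a two-bit saturating counter, which is one common variant and not necessarily the exact state diagram referred to in these notes; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Two consecutive wrong guesses are needed to flip a strong prediction.
    """

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)  # learn from the actual outcome
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then falls through; run twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(outcomes, TwoBitPredictor(state=3)))
```

With this scheme a loop-closing branch is mispredicted only once per loop exit, because a single not-taken outcome does not change the prediction for the next time the loop is entered.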


Delayed branch:
- Minimize the branch penalty by finding valid instructions to execute in the pipeline while the branch address is being resolved.
- The compiler is tasked with reordering the instruction sequence to find enough independent instructions (with respect to the conditional branch) to feed into the pipeline after the branch, so that the branch penalty is reduced to zero.
- Consider the sequence:
    Instruction x
    Instruction x+1
    Instruction x+2
    Conditional branch
- Do not take the jump until you have to
- Rearrange instructions
- Implemented on many RISC architectures


8086 Processor Family:


8086 Register Organization:
Intel 8086 was the first 16-bit microprocessor introduced by Intel in 1978. The register organization described here is that of the later 32-bit members of the family, and includes the following types of registers.

1. General Purpose: There are eight 32-bit general purpose registers.
- Used for all types of x86 instructions.
- Hold the operands for address calculations.
- String instructions use the contents of the ECX, ESI and EDI registers.
- In 64-bit mode there are sixteen 64-bit general purpose registers.

2. Segment: The six 16-bit segment registers contain segment selectors, which index into segment tables.
- The Code Segment (CS) register references the segment containing the instruction being executed.
- The Stack Segment (SS) register references the segment containing a user-visible stack.
- The remaining segment registers (DS, ES, FS, GS) enable the user to reference up to four separate data segments at a time.

3. FLAGS: The 32-bit EFLAGS register contains the condition codes and various mode bits.

4. Instruction Pointer: Contains the address of the current instruction.

5. Numeric: Each register holds an extended-precision 80-bit floating point number. There are 8 registers that function as a stack, with push and pop operations available in the instruction set.

6. Control: The 16-bit control register contains bits that control the operation of the floating point unit, including rounding, exception and precision controls.

7. Status: The 16-bit status register contains bits that reflect the current state of the floating point unit. It includes a 3-bit pointer to the top of the stack and the condition codes.

8. Tag word: This 16-bit register contains a 2-bit tag for each floating point numeric register, which indicates the nature of the contents of the corresponding register. The four possible values are valid, zero, special and empty. The tags enable a program to check the contents of a numeric register without performing complex decoding of the actual data in the register.


EFLAGS Register: There is a special register in the processor called EFLAGS. This register is 32 bits wide and is used to track a variety of conditions in the processor. It includes the six condition codes (carry, parity, auxiliary carry, zero, sign, overflow), which report the results of integer operations.


- Trap Flag (TF): When set, causes an interrupt after the execution of each instruction. Used for debugging.
- Interrupt Enable Flag (IF): When set, the processor will recognize external interrupts.
- Direction Flag (DF): Used in string processing.
- I/O Privilege Level (IOPL): Used in protected mode to provide four levels of security.
- Resume Flag (RF): Enables the programmer to turn off certain exceptions while debugging code.
- Identification Flag (ID): If this bit can be set and cleared, then the processor supports the CPUID instruction, which provides information about vendor, family and model.
- Nested Task Flag (NT): Indicates that the current task is nested within another task in protected mode.
- Virtual Mode (VM): Allows the programmer to enable or disable virtual mode.
- Virtual Interrupt Flag (VIF) and Virtual Interrupt Pending (VIP): Used in a multitasking environment.
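The flags listed above occupy fixed bit positions in EFLAGS, so a register value can be decoded with shifts and masks. The following Python sketch uses the standard x86 bit positions; the table name `EFLAGS_BITS` and the function `decode_eflags` are hypothetical, and reserved bits are simply ignored.

```python
# Bit position of each single-bit flag in EFLAGS (standard x86 layout).
EFLAGS_BITS = {
    "CF": 0,    # carry
    "PF": 2,    # parity
    "AF": 4,    # auxiliary carry
    "ZF": 6,    # zero
    "SF": 7,    # sign
    "TF": 8,    # trap
    "IF": 9,    # interrupt enable
    "DF": 10,   # direction
    "OF": 11,   # overflow
    "NT": 14,   # nested task
    "RF": 16,   # resume
    "VM": 17,   # virtual mode
    "VIF": 19,  # virtual interrupt flag
    "VIP": 20,  # virtual interrupt pending
    "ID": 21,   # identification
}

IOPL_SHIFT = 12   # IOPL is a two-bit field in bits 12 and 13
IOPL_MASK = 0b11


def decode_eflags(value):
    """Return a dict mapping each flag name to True/False, plus IOPL as 0..3."""
    flags = {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> IOPL_SHIFT) & IOPL_MASK
    return flags


if __name__ == "__main__":
    decoded = decode_eflags(0x246)  # illustrative value
    print([name for name, is_set in decoded.items() if is_set is True])
```

Because IOPL spans two bits, it is reported as a number from 0 to 3 rather than as a boolean, matching the four levels mentioned in the list.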

Control Registers:

MMX Registers:
- MMX uses several 64-bit data types
- Uses 3-bit register address fields, giving 8 registers
- No MMX-specific registers: aliased to the lower 64 bits of the existing floating point registers


Interrupt Processing:
Interrupt processing within a processor is a facility provided to support the operating system. It allows an application program to be suspended so that a variety of interrupt conditions can be serviced, and later resumed.

Interrupts & Exceptions:
An interrupt is generated by a signal from hardware, and it may occur at random times during the execution of a program. An exception is generated from software, and it is provoked by the execution of an instruction. There are two sources of interrupts and two sources of exceptions.

Interrupts:
- Maskable: Received on the processor's INTR pin. The processor does not recognize a maskable interrupt unless the Interrupt Enable Flag (IF) is set.
- Nonmaskable: Received on the processor's NMI pin. Recognition of such interrupts cannot be prevented.

Exceptions:
- Processor detected: Results when the processor encounters an error while attempting to execute an instruction.
- Programmed: These are instructions that generate an exception.

Interrupt vector table:
- Each interrupt type is assigned a number
- The number is used as an index into the vector table
- 256 * 32-bit interrupt vectors
- 5 priority classes:
  - Class 1: Traps on the previous instruction
  - Class 2: External interrupts
  - Class 3: Faults from fetching the next instruction
  - Class 4: Faults from decoding the next instruction
  - Class 5: Faults on executing an instruction
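The indexing into the vector table can be illustrated with a few lines of Python. This sketch assumes the table is a contiguous array starting at offset 0 and indexed directly by the interrupt number; the constant and function names are hypothetical.

```python
VECTOR_COUNT = 256        # number of interrupt types, from the text
VECTOR_SIZE_BYTES = 4     # each vector is 32 bits wide

# Total size of the table under the contiguous-array assumption.
TABLE_SIZE_BYTES = VECTOR_COUNT * VECTOR_SIZE_BYTES


def vector_offset(interrupt_number):
    """Byte offset of the vector for the given interrupt number."""
    if not 0 <= interrupt_number < VECTOR_COUNT:
        raise ValueError("interrupt number must be in the range 0..255")
    return interrupt_number * VECTOR_SIZE_BYTES


if __name__ == "__main__":
    print(TABLE_SIZE_BYTES)         # size of the whole table in bytes
    print(hex(vector_offset(0x21))) # offset of the vector for type 0x21
```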


RISC (Reduced Instruction Set Computers):


Major Advances in Computers:
- The family concept
  - IBM System/360 1964
  - DEC PDP-8
  - Separates architecture from implementation
- Microprogrammed control unit
  - Idea by Wilkes 1951
  - Produced by IBM S/360 1964
- Cache memory
  - IBM S/360 model 85 1969
- Solid State RAM (See memory notes)
- Microprocessors
  - Intel 4004 1971
- Pipelining
  - Introduces parallelism into fetch execute cycle
- Multiple processors

Reduced Instruction Set Computer key features:
- Large number of general purpose registers, or use of compiler technology to optimize register use
- Limited and simple instruction set
- Emphasis on optimising the instruction pipeline

Instruction Execution Characteristics:


Driving force for CISC:
- Software costs far exceed hardware costs
- Increasingly complex high level languages
- Semantic gap

Leads to:
- Large instruction sets
- More addressing modes
- Hardware implementations of HLL statements, e.g. CASE (switch) on VAX

Intention of CISC:
- Ease compiler writing
- Improve execution efficiency: complex operations in microcode
- Support more complex HLLs

Execution Characteristics:
- Operations performed
- Operands used
- Execution sequencing


Studies have been done based on programs written in HLLs. Dynamic studies are measured during the execution of the program.

Operations:
- Assignments: movement of data
- Conditional statements (IF, LOOP): sequence control
- Procedure call-return is very time consuming
- Some HLL instructions lead to many machine code operations

Operands:
- Mainly local scalar variables
- Optimisation should concentrate on accessing local variables

Procedure Calls:
- Very time consuming
- Depends on number of parameters passed
- Depends on level of nesting
- Most programs do not do a lot of calls followed by lots of returns
- Most variables are local (c.f. locality of reference)

Implications:
- Best support is given by optimising most used and most time consuming features
- Large number of registers: operand referencing
- Careful design of pipelines: branch prediction etc.
- Simplified (reduced) instruction set

Large Register File:
- Software solution
  - Require compiler to allocate registers
  - Allocate based on most used variables in a given time
  - Requires sophisticated program analysis
- Hardware solution
  - Have more registers
  - Thus more variables will be in registers

Registers for Local Variables:
- Store local scalar variables in registers
- Reduces memory access
- Every procedure (function) call changes locality
- Parameters must be passed
- Results must be returned
- Variables from calling programs must be restored

Register Windows:
- Only few parameters
- Limited range of depth of call
- Use multiple small sets of registers
- Calls switch to a different set of registers


- Returns switch back to a previously used set of registers
- Three areas within a register set:
  - Parameter registers
  - Local registers
  - Temporary registers
- Temporary registers from one set overlap parameter registers from the next
- This allows parameter passing without moving data

Circular Buffer Organization of overlapped windows:

Operation of Circular Buffer:
- When a call is made, a current window pointer is moved to show the currently active register window.
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory.
- A saved window pointer indicates where the next saved window should be restored to.

Global Variables:
- Allocated by the compiler to memory


  - Inefficient for frequently accessed variables
- Have a set of registers for global variables
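The circular-buffer operation of register windows described earlier in this section can be modelled in a few lines of Python. This is a simplified sketch, not a description of any particular processor: it ignores the overlap between adjacent windows, tracks only which call depth owns each window, and spills exactly when all windows are in use; the class name `RegisterWindows` and its attributes are hypothetical.

```python
class RegisterWindows:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, num_windows):
        self.num_windows = num_windows
        self.cwp = 0        # current window pointer
        self.depth = 0      # call nesting depth (0 = outermost procedure)
        self.resident = 1   # windows currently held in registers
        self.saved = []     # call depths whose windows were spilled to memory

    def call(self):
        if self.resident == self.num_windows:
            # All windows in use: save the oldest one (furthest back in the
            # call nesting) to memory, as an overflow interrupt would.
            self.saved.append(self.depth - (self.num_windows - 1))
            self.resident -= 1
        self.cwp = (self.cwp + 1) % self.num_windows
        self.depth += 1
        self.resident += 1

    def ret(self):
        if self.depth == 0:
            raise RuntimeError("return from outermost procedure")
        self.cwp = (self.cwp - 1) % self.num_windows
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            # The caller's window was spilled earlier: restore it from memory.
            restored = self.saved.pop()
            assert restored == self.depth
            self.resident = 1


if __name__ == "__main__":
    windows = RegisterWindows(num_windows=4)
    for _ in range(5):
        windows.call()
    print("spilled depths:", windows.saved, "cwp:", windows.cwp)
```

Memory traffic occurs only when the call depth swings by more than the number of windows, which fits the observation that most programs do not do a lot of calls followed by lots of returns.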

Large Register File v Cache Organization:

Large Register File                            | Cache
All local scalars                              | Recently-used local scalars
Individual variables                           | Blocks of memory
Compiler-assigned global variables             | Recently-used global variables
Save/Restore based on procedure nesting depth  | Save/Restore based on cache replacement algorithm
Register addressing                            | Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache:


Why CISC:
Compiler simplification?
- Disputed
- Complex machine instructions are harder to exploit
- Optimization is more difficult

Smaller programs?
- Program takes up less memory, but memory is now cheap
- May not occupy fewer bits, just look shorter in symbolic form
  - More instructions require longer op-codes
  - Register references require fewer bits

Faster programs?
- Bias towards use of simpler instructions
- More complex control unit
- Microprogram control store is larger, thus simple instructions take longer to execute

It is far from clear that CISC is the appropriate solution.

RISC Characteristics:
- One instruction per cycle
- Register to register operations
- Few, simple addressing modes
- Few, simple instruction formats
- Hardwired design (no microcode)
- Fixed instruction format
- More compile time/effort

RISC VS CISC


RISC Pipelining:
Most instructions are register to register, with two phases of execution:
- I: Instruction fetch
- E: Execute (ALU operation with register input and output)

For load and store, there are three phases:
- I: Instruction fetch
- E: Execute (calculate memory address)
- D: Memory (register to memory or memory to register operation)

Effects of Pipelining:

Optimization of Pipelining:

Delayed branch:
- Does not take effect until after execution of the following instruction
- This following instruction is the delay slot

Delayed load:
- Register to be target is locked by processor
- Continue execution of instruction stream until register required
- Idle until load complete
- Re-arranging instructions can allow useful work whilst loading

Loop Unrolling:


- Replicate body of loop a number of times
- Iterate loop fewer times
- Reduces loop overhead
- Increases instruction parallelism
- Improved register, data cache or TLB locality

Example:

    do i=2, n-1
        a[i] = a[i] + a[i-1] * a[i+1]
    end do

Becomes

    do i=2, n-2, 2
        a[i] = a[i] + a[i-1] * a[i+1]
        a[i+1] = a[i+1] + a[i] * a[i+2]
    end do
    if (mod(n-2,2) = 1) then
        a[n-1] = a[n-1] + a[n-2] * a[n]
    end if

Use of Delayed Branch:
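The loop unrolling transformation in the example above can be checked with a short Python sketch. It assumes 1-based array indexing as in the pseudocode, emulated here by a list of n + 1 elements whose index 0 is unused; the function names `rolled` and `unrolled` are hypothetical.

```python
def rolled(a, n):
    """Original loop: do i = 2, n-1."""
    a = list(a)  # work on a copy
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled(a, n):
    """Loop unrolled by a factor of two: do i = 2, n-2, 2."""
    a = list(a)
    for i in range(2, n - 1, 2):
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    if (n - 2) % 2 == 1:
        # Odd number of iterations: one is left over after the pairs.
        a[n - 1] = a[n - 1] + a[n - 2] * a[n]
    return a


if __name__ == "__main__":
    for n in range(2, 12):
        data = list(range(n + 1))  # indices 0..n, index 0 unused
        assert rolled(data, n) == unrolled(data, n)
    print("rolled and unrolled loops agree")
```

The unrolled version executes about half as many loop-control tests and branches, which is the reduction in loop overhead listed among the benefits.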


Assignment Questions
1. What is a pipeline register? What is the use of it? Explain in detail.
2. (a) Differentiate RISC and CISC computers. (b) Explain RISC pipelining.
3. Explain vector processing.
4. (a) What is a pipeline? Explain. (b) Explain arithmetic pipeline.
