
Chap 5: Pentium and Pentium Pro
Pentium and Pentium Pro
! Original Pentium
! Introduced in 1993
! 0.8 micron process
! 3.1 million transistors
! 60-66 MHz clock
! L1: 8 KB instruction, 8 KB data
! MMX extensions added to the ISA in 1997
! See fig 5-1
Pentium and Pentium Pro
! Caches
! Main memory speed
! Has not kept pace with processor speed
! Takes many clock cycles to transfer code and data between memory and processor
! Fast memory is available but expensive
! So put small amounts of it in a cache
! Between registers and main memory
! Holds frequently used code and data
Pentium and Pentium Pro
! Can have multiple cache levels
! L1
! Smallest
! Most expensive
! Located closest to processor’s back end
! L2
! Located between L1 and main memory
! Slower than L1
! See fig 5-2
Pentium and Pentium Pro
! Processor checks L1 first
! If the code or data is found
! Cache hit
! If not
! Cache miss
! Go to L2
! If found there, copy to L1
! Then pass along to the processor
! If not in L2, go to other cache levels if present
! Then main memory
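The lookup order above can be sketched in a few lines of Python. This is only a toy model: the dictionaries standing in for L1, L2 and main memory, and the function name load, are assumptions for illustration, and eviction and cache-line size are not modeled.

```python
# Toy model of a two-level cache lookup. Dictionaries stand in for the
# caches and main memory; capacity limits and eviction are not modeled.

def load(address, l1, l2, memory):
    """Return (value, where_found) for address, filling caches on a miss."""
    if address in l1:                  # L1 hit: fastest path
        return l1[address], "L1"
    if address in l2:                  # L1 miss, L2 hit
        l1[address] = l2[address]      # copy into L1 for next time
        return l1[address], "L2"
    value = memory[address]            # missed every cache: go to main memory
    l2[address] = value                # fill both levels on the way back
    l1[address] = value
    return value, "memory"


memory = {addr: addr * 10 for addr in range(8)}
l1, l2 = {}, {}
first = load(3, l1, l2, memory)    # nothing cached yet
second = load(3, l1, l2, memory)   # same address again
```

The first access has to go all the way to main memory; the second is served from L1 because the first access filled it.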
Pentium and Pentium Pro
! L1 layout
! Usually code and data stored in separate halves
! Code part
! Called instruction cache or I-cache
! Data part
! Called data cache or D-cache
Pentium and Pentium Pro
! L1 and L2 usually on chip with CPU
! Called on-die cache
Pentium and Pentium Pro
! Pentium’s pipeline
! Has multiple pipelines
! Basic integer pipelines have five stages
! Prefetch/fetch
! Decode-1
! Decode-2
! Execute
! Write-back
Pentium and Pentium Pro
! Processor’s various execution units
! Have pipelines of different depths
! Integer
! Shortest
! Usually the depth quoted in spec sheets
! Floating point
! Longest
! See fig 5-3
Pentium and Pentium Pro
! Integer pipeline stages
! Prefetch/fetch
! Instructions fetched from instruction cache
! Put in prefetch buffers for decoding
! Decode-1
! Instructions decoded according to hardware-based rules
! Instructions can vary in length
! One to 15 bytes
! Instruction boundaries must be found and aligned
! Branch prediction done here
Pentium and Pentium Pro
! Decode-2
! Instructions that require the microcode ROM are decoded here
! Address computations done here
! Execute
! ALU executes the instruction
! Write-back
! Results written back to register file
Pentium and Pentium Pro
! Branch unit and branch prediction
! Branch unit (BU)
! Contains
! Branch execution unit (BEU)
! Branch prediction unit (BPU)
! Whenever the front end's decoder encounters a branch
! Sends it to the BU
! BU sends it to an execution unit to evaluate the branch’s condition
Pentium and Pentium Pro
! To determine if the branch is
! Taken
! Go get starting address of next block of code to be executed
! Start address is the branch target and must be calculated
! Front end must be told to fetch code at the new address
! Not taken
! Moves on to the next instruction in sequence
Pentium and Pentium Pro
! Processors use speculative execution
! While waiting for the branch condition to be calculated
! Make an educated guess on the direction the branch will take
! Based on history
! Start executing before the calculation is actually done
! Avoids delays
! Avoids pipeline bubbles
Pentium and Pentium Pro
! Instructions executed speculatively
! Results cannot be written to the register file until the branch condition is evaluated
! If the guess was right
! Results made non-speculative
! Written to the register file
! If the guess was wrong
! Pipeline must be flushed
! Front end fetches from the correct branch target address
! Processor continues
! Big hit to completion rate
Pentium and Pentium Pro
! Branch prediction
! Static
! Branches usually determine whether a loop should continue
! A backward branch
! Loop exit condition usually false, so branch is taken
! Goes back to code already executed
! Majority go this way, so predict taken
! A forward branch
! Predicted not taken
! Execution continues into new code
Pentium and Pentium Pro
! Static prediction is fast
! No look-up tables
! No calculations
! Success depends on instruction mix
! Code with many loops predicts well
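The static rule is small enough to write out. The sketch below assumes the predictor can compare the branch's own address with its target address; the function names and the example addresses are hypothetical.

```python
def predict_static(branch_address, target_address):
    """Static rule: backward branches are predicted taken, forward ones not taken."""
    return target_address < branch_address


def loop_accuracy(iterations):
    """Fraction of correct static predictions for one loop-closing branch."""
    # The loop-closing branch is taken on every pass except the last one.
    outcomes = [True] * (iterations - 1) + [False]
    prediction = predict_static(branch_address=0x120, target_address=0x100)
    correct = sum(1 for taken in outcomes if taken == prediction)
    return correct / len(outcomes)
```

For a loop that runs ten times, this rule is wrong only on the final pass, which is why loop-heavy code suits static prediction.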
Pentium and Pentium Pro
! Dynamic branch prediction
! Uses
! Branch history table (BHT)
! Holds an entry for each branch encountered over the last few cycles
! Each entry has some bits that indicate whether the branch is likely to be taken
! Based on past history
! Front end uses the entry in this table to let the BPU decide whether to speculatively execute the branch
Pentium and Pentium Pro
! Branch target buffer (BTB)
! Should the BPU decide to execute a branch speculatively
! Needs to know where in memory the branch is pointing
! Needs the branch target
! BTB stores the branch targets of previously executed branches
! When a branch is predicted taken
! BPU grabs the target from the BTB
! Tells front end to start fetching instructions from that address
! If the target is right, fine
! If not, take a performance hit
Pentium and Pentium Pro
! If a branch has an entry in the BHT
! Dynamic prediction used
! If not, static prediction used
! Pentium's BHT holds 256 entries
! Not enough for most programs
! But fairly successful
! 75-85% success rate
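A rough sketch of how a BHT and BTB could work together, with static prediction as the fallback. The slides only say the BHT keeps "some bits" per branch; the 2-bit saturating counter used here is a common textbook scheme and an assumption, as are the class name, the method names, and the oldest-first eviction.

```python
class BranchPredictor:
    """BHT of 2-bit saturating counters plus a BTB, both keyed by branch address."""

    def __init__(self, entries=256):
        self.entries = entries
        self.bht = {}   # branch address -> counter 0..3 (2 or 3 means "taken")
        self.btb = {}   # branch address -> last known target address

    def predict(self, branch_address, target_address):
        """Return (predicted_taken, predicted_target or None)."""
        if branch_address in self.bht:                    # dynamic prediction
            taken = self.bht[branch_address] >= 2
            target = self.btb.get(branch_address) if taken else None
            return taken, target
        taken = target_address < branch_address           # static fallback
        return taken, None    # no BTB entry yet: target must be calculated

    def update(self, branch_address, taken, target_address):
        """Record the real outcome once the branch has been evaluated."""
        if branch_address in self.bht:
            counter = self.bht[branch_address]
            counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        else:
            if len(self.bht) >= self.entries:             # table full
                evicted = next(iter(self.bht))            # drop the oldest entry
                del self.bht[evicted]
                self.btb.pop(evicted, None)
            counter = 2 if taken else 1                   # start weakly biased
        self.bht[branch_address] = counter
        if taken:
            self.btb[branch_address] = target_address
```

With only 256 entries, a large program keeps evicting branches it will meet again, which is one way to read the "not enough for most programs" remark.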
Pentium and Pentium Pro
! Pentium’s back end
! Two five-stage integer pipelines
! U
! V
! One six-stage floating point pipeline
Pentium and Pentium Pro
! Integer ALUs
! U is the default
! Complex integer unit (CIU)
! Contains a shifter
! V
! Simple integer unit (SIU)
! Address calculations
Pentium and Pentium Pro
! Floating point ALU
! Processor can dispatch an integer and a floating point operation together only under very restrictive circumstances
! Registers arranged in a stack
! Eight 80-bit registers
! Push and pop data onto the stack
! Stack grows and shrinks
! In a classic stack, only the stack top (ST) is accessible
! If desired data is in the middle, must pop to get to it
! See fig 5-4
Pentium and Pentium Pro
! x87 registers are addressed relative to ST with an index value
! e.g. ST(1)
! See fig 5-5
! To add two floating point numbers
! One must be in ST
! Other can be in any register
! fadd ST, ST(5)
! Flat register file is easier for a compiler, as on RISC
Pentium and Pentium Pro
! Use a hack to get around the stack concept
! fxch
! Can swap any element of the stack with ST
! See program 5-1
! Executes in zero cycles
! Free of charge
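To make the stack addressing and the fxch trick concrete, here is a small Python model of the register stack. It is a sketch under simplifying assumptions: the class and method names are invented, values are plain Python floats rather than 80-bit registers, and overflow simply raises an error instead of following real x87 fault behavior.

```python
class X87Stack:
    """Eight-entry register stack; ST(i) is the i-th entry below the top."""

    def __init__(self):
        self.regs = []                     # regs[-1] is ST(0), the stack top

    def fld(self, value):
        """Push a value; the old ST(0) becomes ST(1)."""
        if len(self.regs) == 8:
            raise OverflowError("the x87 stack holds only eight values")
        self.regs.append(value)

    def st(self, i):
        """Read ST(i)."""
        return self.regs[-1 - i]

    def fadd(self, i):
        """fadd ST, ST(i): ST(0) = ST(0) + ST(i)."""
        self.regs[-1] += self.st(i)

    def fxch(self, i):
        """fxch ST(i): swap ST(0) with ST(i) so ST(i) can be operated on."""
        j = -1 - i
        self.regs[-1], self.regs[j] = self.regs[j], self.regs[-1]


fpu = X87Stack()
for value in (1.0, 2.0, 3.0):
    fpu.fld(value)          # now ST(0)=3.0, ST(1)=2.0, ST(2)=1.0
fpu.fxch(2)                 # bring the bottom value to the top
fpu.fadd(1)                 # ST(0) = 1.0 + 2.0
```

Because one operand must always be ST, fxch is what lets code treat the stack almost like a flat register file.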
Pentium and Pentium Pro
! x86 overhead
! About 30% of transistors used for legacy support
! Support microcode ROM
! Instructions are not uniform in size
! Segmented memory model
Pentium and Pentium Pro
! Intel P6 microarchitecture
! Pentium Pro features
! 0.35 micron process
! 5.5 million transistors
! 200 MHz clock speed
! L1 cache
! 8 KB instruction
! 8 KB data
! L2 cache
! 512 KB in the same package, on a separate die
Pentium and Pentium Pro
! Success related to decoupling of front end and back end
! Installed an instruction window
! See Fig 5-6
Pentium and Pentium Pro
! Decoupling front and back ends
! Original Pentium
! Statically scheduled
! Front and back ends connected together
! See fig 5-7
! Decode/dispatch logic
! Looks at two instructions headed for the two ALUs
! Determines whether they can be executed in parallel
! Kept in original program order
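The decode/dispatch check above can be sketched as a fixed pairing rule. The real Pentium pairing rules are more detailed; this version only tests register dependencies between two adjacent instructions, and the Instr record and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination register, a tuple of sources.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])


def can_pair(first, second):
    """True if second may run in the V pipe alongside first in the U pipe."""
    reads_result = first.dest in second.srcs    # second needs first's result
    same_dest = first.dest == second.dest       # both write the same register
    return not (reads_result or same_dest)


def dispatch(stream):
    """Group an in-order instruction stream into per-cycle issue groups."""
    cycles, i = [], 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            cycles.append((stream[i], stream[i + 1]))   # both pipes busy
            i += 2
        else:
            cycles.append((stream[i],))                 # V pipe sits idle
            i += 1
    return cycles


stream = [
    Instr("add", "eax", ("eax", "ebx")),
    Instr("mov", "ecx", ("eax",)),        # depends on the add
    Instr("sub", "edx", ("edx", "esi")),  # independent of the mov
]
groups = dispatch(stream)
```

The rule never looks further ahead than the next instruction, so a single dependency leaves one pipe idle for that cycle.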
Pentium and Pentium Pro
! Problem with static scheduling
! Adapts poorly to dynamic code stream
! Poor use of superscalar hardware
! Code stream changes from application to application
! But scheduling rules remain fixed
Pentium and Pentium Pro
! The issue phase
! Uses a special buffer to hold newly decoded instructions
! Called the issue buffer
! Once buffer has some instructions
! Examines instructions
! Examines state of processor
! Issues instructions at the most opportune time
! Possibly out of program order
! Called dynamic scheduling
Pentium and Pentium Pro
! Once out of execution
! Instructions must be put back into program order
! Called the completion phase
! Whole scheme called out-of-order execution or dynamic execution
! See Fig 5-8
! Squeezes bubbles out of pipeline
! Front end must keep up with back end to benefit from squeezing
Pentium and Pentium Pro
! Completion phase
! Instructions wait here after execution
! Until results are written to the register file in program order
! Called commit
! Why a completion buffer is needed
! Has a rename register to hold the result until the final register is ready
! Bookkeeping register: not an architectural register, just temporary
Pentium and Pentium Pro
! P6 issue phase
! Issue buffer called the reservation station (RS)
! Gets decoded instructions
! Each waits until its execution requirements are met
! Then issued to execution stage
! P6 can send three instructions per cycle to RS
! RS can issue up to five instructions per cycle
! Heart of P6 performance
Pentium and Pentium Pro
! Completion or reorder buffer (ROB)
! Puts instructions back into the order they came in
! Stores data about each instruction
! Status
! Operands
! Register needs
! Original place in program
Pentium and Pentium Pro
! Reorder buffer sits at both the front and back of the back end
! New instructions given a tracking entry
! Each entry has a temporary rename register
! Holds 40 entries
! Each with a data field
Pentium and Pentium Pro
! Instruction window
! ROB+RS
! ROB handles up to 40 instructions
! RS handles up to 20 instructions
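The interplay of ROB and RS can be illustrated with a small cycle-by-cycle simulation. This is a loose sketch, not the P6's actual logic: it tracks only true data dependencies (renaming is assumed to remove the rest), ignores execution-unit limits and the three-per-cycle front-end width, and every name in it is hypothetical. The sizes 40, 20 and 5 come from the slides.

```python
from collections import namedtuple

# Hypothetical instruction record; latency is in cycles and assumed >= 1.
Instr = namedtuple("Instr", ["name", "dest", "srcs", "latency"])


def run(program, rob_size=40, rs_size=20, issue_width=5):
    """Return (issue_order, commit_order) as lists of instruction names."""
    rob = []                          # in-flight entries, oldest first
    issue_order, commit_order = [], []
    next_fetch, cycle = 0, 0
    while next_fetch < len(program) or rob:
        cycle += 1
        # Execution units finish whatever was due this cycle.
        for entry in rob:
            if entry["state"] == "executing" and entry["done_at"] <= cycle:
                entry["state"] = "done"
        # Commit: retire finished instructions from the head only (program order).
        while rob and rob[0]["state"] == "done":
            commit_order.append(rob.pop(0)["instr"].name)
        # Front end: fill the window while ROB and RS both have room.
        while (next_fetch < len(program) and len(rob) < rob_size
               and sum(e["state"] == "waiting" for e in rob) < rs_size):
            rob.append({"instr": program[next_fetch], "state": "waiting", "done_at": None})
            next_fetch += 1
        # Issue: a waiting instruction may go once no older in-flight
        # instruction is still producing one of its sources.
        issued = 0
        for index, entry in enumerate(rob):
            if issued == issue_width:
                break
            if entry["state"] != "waiting":
                continue
            pending = {old["instr"].dest for old in rob[:index] if old["state"] != "done"}
            if pending.isdisjoint(entry["instr"].srcs):
                entry["state"] = "executing"
                entry["done_at"] = cycle + entry["instr"].latency
                issue_order.append(entry["instr"].name)
                issued += 1
    return issue_order, commit_order


program = [
    Instr("load", "r1", (), 3),
    Instr("add", "r2", ("r1",), 1),    # must wait for the load
    Instr("mul", "r3", ("r4",), 1),    # independent, can go early
]
issue_order, commit_order = run(program)
```

In this run the independent mul issues ahead of the stalled add, but results still commit in the original program order.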
Pentium and Pentium Pro
! P6 pipeline
! 12 stages
! BTB access and instruction fetch
! 3.5 stages
! BTB is the branch target buffer
! Two-cycle fetch phase
! Decode
! 2.5 stages
Pentium and Pentium Pro
! Register rename
! In ROB
! Write to RS
! Read from RS
! Execute
! Commit
Pentium and Pentium Pro
! Lengthened pipeline allowed increase in clock speed
! Decoupling reduced bubbles
! Hides hang-ups
! But if a major problem occurs
! Entire pipeline must be flushed
Pentium and Pentium Pro
! Branch prediction
! 90% correct
! Has 512-entry BHT+BTB
! Uses four bits for history information
! Gets more important as pipeline lengthens
! Flush more costly
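A back-of-the-envelope calculation shows why prediction matters more as the pipeline grows. The sketch below treats the pipeline depth as a stand-in for the flush penalty, which is an assumption made only for illustration; real penalties depend on where in the pipeline the branch is resolved.

```python
def expected_flush_cycles(accuracy, flush_penalty):
    """Average cycles lost per branch to mispredictions."""
    return (1.0 - accuracy) * flush_penalty


# Assumed penalties: 5 cycles for the five-stage Pentium pipeline,
# 12 cycles for the twelve-stage P6 pipeline.
pentium_cost = expected_flush_cycles(accuracy=0.80, flush_penalty=5)
p6_cost = expected_flush_cycles(accuracy=0.90, flush_penalty=12)
```

Under these assumptions the P6 still loses slightly more cycles per branch than the Pentium despite predicting better, so the longer pipeline needs the larger tables just to keep pace.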
Pentium and Pentium Pro
! Back end
! Two asymmetrical integer ALUs
! Single-cycle throughput and latency
! Multiplication has
! Single-cycle throughput
! Four-cycle latency
! One floating point ALU
! Three cycle throughput for most operations
! Multiplication
! Five cycles
! Dedicated memory access units
! Load address
! Store address
! Store data
Pentium and Pentium Pro
! CISC, RISC and instruction set translation
! Older ISAs had register-to-memory and memory-to-memory instruction formats
! Helped when the programmer had to do everything in assembly
! Now compilers can do that work
! Instruction set translation handles this in hardware
! String instructions supported as well
Pentium and Pentium Pro
! Variable length instructions
! Harder to fetch and decode
! Harder to schedule
! x86 is CISC and uses instruction set translation
! Confines CISC complexity to the front end by translating CISC operations into smaller, faster operations
! PowerPC uses RISC
! Use C
! Shifts burden of scheduling from processor to compiler
Pentium and Pentium Pro
! Decoding unit
! Breaks down complex variable-length x86 instructions
! Into one or more micro-operations (micro-ops)
! Three separate decoders
! Two simple, fast
! One complex, slow
! 16-byte groups of x86 instructions fetched from I-cache
! Go to 32-byte front end instruction queue
Pentium and Pentium Pro
! Instruction boundaries identified
! Type established
! Aligned for entry into decoders
! Up to three instructions per cycle
! Moved to decoders
! Converted to micro-ops
! Passed to micro-op queue
! Moved to ROB
! Decoders produce up to six micro-ops per cycle
! Four from the complex decoder
! One from each of the two simple decoders
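The decoder limits can be turned into a small throughput model. It assumes instructions are decoded strictly in order, that the first instruction of each cycle goes to the complex decoder, and that the two simple decoders accept only single micro-op instructions; this ordering is an assumption, and instructions longer than four micro-ops (handled by the microcode ROM) are left out. The function name is hypothetical.

```python
def decode_cycles(uop_counts):
    """Group x86 instructions (given as micro-op counts) into decode cycles."""
    cycles, i = [], 0
    while i < len(uop_counts):
        if not 1 <= uop_counts[i] <= 4:
            raise ValueError("only instructions of 1 to 4 micro-ops are modeled")
        group = [uop_counts[i]]        # complex decoder: up to four micro-ops
        i += 1
        # Each simple decoder takes the next instruction only if it is one micro-op.
        while len(group) < 3 and i < len(uop_counts) and uop_counts[i] == 1:
            group.append(1)
            i += 1
        cycles.append(group)
    return cycles


best_case = decode_cycles([4, 1, 1])             # six micro-ops in one cycle
mixed = decode_cycles([1, 1, 1, 2, 1, 4, 3])     # complex instructions serialize
```

Since simple instructions are the most common, most cycles can keep all three decoders busy; a run of complex instructions drops the rate to one instruction per cycle.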
Pentium and Pentium Pro
! Micro-op queue
! Can pass up to three micro-ops per cycle into the instruction window
! Simple x86 instructions are the most common type of instruction
Pentium and Pentium Pro
! Cost of x86 legacy support
! About 40% of transistor budget
! Increased L1 to keep speed up
Pentium and Pentium Pro
! Conclusions