
Nirmal Chhugani

PowerPC Architecture
Introduction
o PowerPC (Performance Optimization With
Enhanced RISC - Performance Computing) is a
RISC architecture created by the Apple-IBM-Motorola
(AIM) alliance in 1991.

o The original idea for the PowerPC architecture
came from IBM's POWER architecture (introduced in the
RISC System/6000) and retains a high level of
compatibility with it.

o The intention was to build a high-performance,
superscalar low-cost processor.


History
o The history of the PowerPC began with IBM's 801
prototype chip, built on the RISC ideas of John Cocke
(IBM Watson Research Lab) in the late 1970s (with
further refinements developed by David Patterson).
o 801-based cores were used in a number of IBM
embedded products, eventually becoming the
16-register ROMP (Research Office Products Division
Micro Processor), a 10 MHz RISC microprocessor
designed by IBM in the early 1980s and used in the
IBM RT (a computer workstation by IBM).
o The RT had disappointing performance and IBM
started the project to build the fastest processor on
the market. The result was the POWER architecture,
introduced with the RISC System/6000 in early 1990.
History: POWER architecture
The POWER architecture incorporated many of the
RISC characteristics:
fixed-length instructions,
register-to-register architecture,
simple addressing modes,
large general register file
three-operand instruction format.


Additionally, it has other features more characteristic of more
complex ISAs.
POWER Architecture
o Designed to be superscalar: instructions are dispatched across three
independent units (branch, fixed-point arithmetic, and floating-point).
This allows out-of-order execution.

o Compound instructions--updating the base register on a load and
store with the newly calculated effective address, thus eliminating
the need for extra add instructions required to increment the index
for array traversals.

o Does not implement delayed branches: instead, the POWER
architecture uses a branch target buffer and the now well-known
branch folding technique.

o Branching technique: the POWER architecture has eight
condition register fields that are set by compare instructions. One
additional bit in the opcode of each instruction signaled that
instructions should be executed only under certain conditions, a
form of predicated execution.
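
The update-form loads described above can be sketched as a toy model. This is only an illustration under stated assumptions: the dictionary-based memory, the register names, and the helper functions are hypothetical and are not part of any real PowerPC tool or emulator.

```python
# Toy model of an update-form load; not a real PowerPC emulator.
# Memory is a dict keyed by byte address, registers a dict keyed by name.

def load_word(memory, regs, rt, ra, d):
    """Plain load: rt <- MEM[ra + d]; the base register is unchanged."""
    regs[rt] = memory[regs[ra] + d]


def load_word_update(memory, regs, rt, ra, d):
    """Update-form load: rt <- MEM[ra + d], then ra <- ra + d."""
    ea = regs[ra] + d
    regs[rt] = memory[ea]
    regs[ra] = ea  # the base register keeps the new effective address


def sum_array(memory, base, count):
    """Sum `count` words starting at `base` using only update-form loads."""
    # r4 starts one word before the array so the first update lands on it.
    regs = {"r3": 0, "r4": base - 4, "r5": 0}
    for _ in range(count):
        load_word_update(memory, regs, "r5", "r4", 4)  # load and advance together
        regs["r3"] += regs["r5"]
    return regs["r3"], regs["r4"]


memory = {0x100: 10, 0x104: 20, 0x108: 30}
total, last_address = sum_array(memory, 0x100, 3)
```

With the plain `load_word`, the loop would need a separate add to advance `r4` on every iteration; the update form removes that instruction.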


Shortfalls
o The original POWER microprocessor, one of the
first superscalar RISC implementations, was a
high performance, multi-chip design.
o IBM soon realized that they would need a single-
chip microprocessor to scale their RS/6000 line
from lower-end to high-end machines.
o Work on a single-chip POWER microprocessor,
called the RSC (RISC Single Chip) began. In
early 1991 IBM realized that their design could
potentially become a high-volume microprocessor
used across the industry.

PowerPC Architecture
o In order to maintain RS/6000 software compatibility, the
PowerPC adapted the POWER architecture, and many
enhancements were added to provide a low-cost, single-chip,
superscalar, multiprocessor-capable, 64-bit processor.

Several bit/field instructions that use three source operands were
eliminated to avoid the need for extra register ports.
Complex string instructions were left out, consistent with the
RISC philosophy.
Instructions whose operation depended on the value of a
source operand were eliminated.
Precision shifts, integer multiplies, and divide-with-remainder
instructions were omitted.
Support for operation in both big-endian and little-endian
modes.
Single- and double-precision floating-point arithmetic.
64-bit architecture, backward compatible with 32-bit.

PowerPC family
o PowerPC 601:
medium-sized, medium-performance processor
includes a more sophisticated branch unit
capable of dispatching three instructions per cycle, out of order.
up to eight instructions per cycle can be fetched directly into an eight-entry
instruction queue (IQ), where they are decoded before being
dispatched to the execution core.
Branch folding: The instruction queue is used for detecting and dealing
with branches. The branch unit scans the bottom four entries of the queue,
identifying branch instructions and determining what type they are
(conditional, unconditional).
In cases where the branch unit has enough information to resolve the
branch immediately (an unconditional branch, or a conditional
branch whose condition depends on information that is already in the
condition register), the branch instruction is simply deleted from
the instruction queue and replaced with the instruction located at the branch
target.
o PowerPC 603:
smaller die size than the 601
smaller cache
capable of dispatching three instructions per cycle, out of order.
The 604 and 620 microprocessors followed later in the PowerPC product line.
Both aimed for higher performance. The 604 is based on the
32-bit architecture, while the 620 is a 64-bit design.
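
The branch folding behavior described for the 601's instruction queue can be sketched roughly as follows. This is a simplified model, not the actual hardware logic: the `Instr` tuple, the `program` dictionary, and the decision to drop younger fall-through entries after a taken branch are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical instruction record used only for this sketch.
Instr = namedtuple("Instr", "kind name target cr_bit")


def op(name):
    return Instr("op", name, None, None)


def b(target):
    return Instr("b", "b", target, None)


def bc(cr_bit, target):
    return Instr("bc", "bc", target, cr_bit)


def fold_branches(queue, program, known_cr, window=4):
    """Fold branches among the oldest `window` queue entries.

    queue: list of Instr, oldest first.
    program: dict mapping an address to the Instr stored there.
    known_cr: dict mapping a condition bit name to True/False for bits
              whose value is already available in the condition register.
    """
    folded = []
    for position, instr in enumerate(queue):
        if position >= window or instr.kind == "op":
            folded.append(instr)
            continue
        if instr.kind == "bc" and instr.cr_bit not in known_cr:
            folded.append(instr)  # cannot be resolved yet, so it stays queued
            continue
        taken = instr.kind == "b" or known_cr[instr.cr_bit]
        if taken:
            # The branch is replaced by the instruction at its target.
            # Assumption: younger fall-through entries are discarded.
            folded.append(program[instr.target])
            return folded
        # A resolved, not-taken branch is simply deleted from the queue.
    return folded


program = {0x40: op("add")}
unconditional = fold_branches([op("lwz"), b(0x40), op("stw")], program, {})
unresolved = fold_branches([bc("eq", 0x40), op("stw")], program, {})
not_taken = fold_branches([bc("eq", 0x40), op("stw")], program, {"eq": False})
```

In each case the branch itself never reaches an execution unit once it is resolved, which is the point of folding.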


Current Status
PowerPC e200 - 32-bit Power Architecture microprocessor - speeds ranging
up to 600 MHz - ideal for embedded applications.
PowerPC e300 - similar to the e200, with speeds up to 667 MHz.
PowerPC e600 - speeds up to 2 GHz - ideal for high-performance routing
and telecommunications applications.
POWER5 - IBM dual-core microprocessor
POWER6 - IBM dual-core microprocessor - a notable difference from POWER5 is
that the POWER6 executes instructions in order instead of out of order
PowerPC G3 - Apple Macintosh computers such as the PowerBook G3, the
multicolored iMacs, iBooks and several desktops, including both the Beige
and Blue and White Power Macintosh G3s.
PowerPC G4 - a designation used by Apple Computer to describe a
fourth generation of 32-bit PowerPC microprocessors.
PowerPC G5 - 64-bit Power Architecture processors
Xenon - based on IBM's PowerPC ISA - Xbox 360 game console.
Broadway - based on IBM's PowerPC ISA - Nintendo Wii gaming
console

Blue Gene/L - dual core PowerPC 440, 700 MHz, 2004
Blue Gene/P - quad core PowerPC 450, 850 MHz, 2007

PowerPC ISA
o Mix between SPARC (RISC) and Motorola (CISC).
o Different implementation levels (so the chip does
not need to be fully implemented for embedded
solutions).
o Load and store architecture. Operations are always
done over registers. Memory is never directly
addressed.
o Offers a large number of mnemonics that increase
the number of instructions without increasing the
number of on-chip instructions.
o Passes arguments using registers and the stack.
o 32-bit registers allow addressing 4 gigabytes of
virtual memory.
Overall design
Integer Execution Unit
Floating Point Unit
Load/Store Unit (LSU)
Branch Execution Units
Memory Management Unit
Memory Unit
Cache

PowerPC Registers
PowerPC's application-level registers are broken into three categories :
general purpose, floating point and special purpose registers.
o General-purpose registers (GPRs) - r0 to r31
flat-scheme of 32 general purpose registers.
Source and destination for all integer operations
address source for all load/store operations.
They also provide access to SPRs.
All GPRs are available for use with one exception: in certain instructions,
GPR0 simply means the value 0, and no lookup is done for GPR0's
contents.
o Some of these registers have special tasks assigned to them:
r0        Volatile register which may be modified during function linkage
r1        Stack frame pointer, always valid
r2        System-reserved register
r3-r4     Volatile registers used for parameter passing and return values
r5-r10    Volatile registers used for parameter passing
r11-r12   Volatile registers which may be modified during function linkage
r13       Small data area pointer register
r14-r30   Registers used for local variables
r31       Used for local variables or "environment pointers"

Floating point registers
o Floating-point registers (FPRs)- fr0 to fr31
32 floating-point registers with 64-bit precision.
source and destination operands of all floating-point operations
can contain 32-bit and 64-bit signed and unsigned integer values,
as well as single-precision and double-precision floating-point
values.
FPRs also provide access to the FPSCR (Floating-Point Status and
Control Register).
The FPSCR captures status and exceptions resulting from floating-point
operations, and also provides control bits for enabling specific
exception types.
Instructions that load and store double-precision floating-point
numbers transfer 64 bits of data without conversion.
Instructions that load single-precision floating-point numbers from
memory convert them to double-precision format before storing them in
the register.
f0 Volatile register
f1 Volatile register used for parameter passing and return values
f2-f8 Volatile registers used for parameter passing
f9-f13 Volatile registers
f14-f31 Registers used for local variables
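
The single-to-double conversion on load, mentioned in the list above, can be illustrated with Python's `struct` module. This is only a sketch of the data transformation, assuming big-endian byte order; the function names are hypothetical and nothing here models the actual load instructions.

```python
import struct


def load_single_as_double(raw4):
    """Widen 4 bytes of big-endian single precision to a 64-bit register image."""
    (value,) = struct.unpack(">f", raw4)
    return struct.pack(">d", value)  # every single is exactly representable as a double


def load_double(raw8):
    """Double-precision loads move 64 bits unchanged."""
    return bytes(raw8)


single_bits = struct.pack(">f", 1.5)
register_image = load_single_as_double(single_bits)
unchanged = load_double(register_image)
```

Here `single_bits` is the 32-bit pattern as it would sit in memory, and `register_image` is the 64-bit pattern held in the FPR after the widening load.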
Special-purpose registers (SPRs)
The Fixed-Point Exception Register (XER)- used for
indicating conditions for integer operations, such as carries
and overflows.

The Floating-Point Status and Control Register
(FPSCR)- 32-bit register used to store the status and
control of the floating-point operations.

The Count Register (CTR)- used to hold a loop count that
can be decremented during the execution of branch
instructions.

The Condition Register (CR)- 32-bit register grouped into
eight fields, where each field is 4 bits that signify the result
of an instruction's operation: Less Than (LT), Greater Than
(GT), Equal (EQ), and Summary Overflow (SO).

The Link Register (LR) contains the address to return to
at the end of a function call.
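
As a rough illustration of how a compare result lands in one of the eight CR fields, the sketch below packs a 4-bit result into a 32-bit value. It assumes the usual bit order within a field (LT, GT, EQ, SO from most to least significant) and that field 0 occupies the most significant bits; the helper names are hypothetical.

```python
# Bit masks within one 4-bit CR field (LT is the most significant bit).
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001


def compare_signed(a, b, summary_overflow=False):
    """Return the 4-bit field a signed compare would produce."""
    if a < b:
        field = LT
    elif a > b:
        field = GT
    else:
        field = EQ
    if summary_overflow:
        field |= SO  # SO is copied from the XER in real hardware
    return field


def set_cr_field(cr, index, field):
    """Write a 4-bit field into CR; field 0 is the most significant nibble."""
    shift = 4 * (7 - index)
    return (cr & ~(0xF << shift) & 0xFFFFFFFF) | (field << shift)


def get_cr_field(cr, index):
    return (cr >> (4 * (7 - index))) & 0xF


cr = set_cr_field(0, 0, compare_signed(3, 7))   # field 0 records "less than"
cr = set_cr_field(cr, 7, compare_signed(5, 5))  # field 7 records "equal"
```

A later conditional branch would then test a single bit of one of these fields.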


Data Types
It can use either little-endian or big-endian byte ordering.

Fixed-point data types include:
o Unsigned byte: 8 bits
o Unsigned halfword: 16 bits
o Signed halfword: 16 bits
o Unsigned word: 32 bits
o Signed word: 32 bits
o Unsigned doubleword: 64 bits
o Byte strings: from 0 to 128 bytes in length

Two's complement is used for negative values
Floating-point data formats:
single-precision, 32 bits long (23 fraction + 8 exponent + 1 sign)
double-precision, 64 bits long (52 fraction + 11 exponent + 1 sign)
Characters are stored using 8-bit ASCII codes
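
The data formats listed above can be checked with a short sketch. It splits a single-precision value into its sign, exponent, and fraction fields, and interprets a fixed-width bit pattern as two's complement. The function names are illustrative only.

```python
import struct


def float_fields(value):
    """Split a single-precision value into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction


def to_signed(pattern, width):
    """Interpret a `width`-bit pattern as a two's complement integer."""
    pattern &= (1 << width) - 1
    if pattern >> (width - 1):      # sign bit set
        return pattern - (1 << width)
    return pattern


fields = float_fields(-0.75)        # -0.75 = -1.1 (binary) x 2^-1
halfword = to_signed(0xFFFF, 16)    # signed halfword
byte = to_signed(0x80, 8)
```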


Instruction types
Instruction Format
All instruction encodings are 32 bits in length.
Bit numbering for PowerPC is the opposite of most other definitions:
bit 0 is the most significant bit, and bit 31 is the least significant bit.
Instructions are first decoded by the upper 6 bits in a field, called
the primary opcode. The remaining 26 bits contain fields for
operand specifiers, immediate operands, and extended opcodes,
and these may be reserved bits or fields.
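
Because bit 0 is the most significant bit, field extraction is easy to get backwards. The sketch below shows one way to pull fields out of a 32-bit instruction word using this numbering. The helper `field` is hypothetical, and the example word `0x38610008` is assumed to encode `addi r3, r1, 8` (primary opcode 14, a D-form instruction with register fields in bits 6-10 and 11-15 and the immediate in bits 16-31).

```python
def field(word, first, last):
    """Extract bits first..last of a 32-bit word, where bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


word = 0x38610008                    # assumed encoding of: addi r3, r1, 8

primary_opcode = field(word, 0, 5)   # upper 6 bits
target = field(word, 6, 10)          # target register number
source = field(word, 11, 15)         # source register number
immediate = field(word, 16, 31)      # 16-bit immediate
```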
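Common instruction formats (fields with their bit positions):
Format    Fields
D-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), immediate (16-31)
X-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), extended opcd (21-30)
A-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), src (21-25), extended opcd (26-30), Rc (31)
BD-form   opcd (0-5), BO (6-10), BI (11-15), BD (16-29), AA (30), LK (31)
I-form    opcd (0-5), LI (6-29), AA (30), LK (31)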
Instruction format
D-form- provides up to two registers as source operands, one immediate source, and
up to two registers as target operands. Some variations of this instruction format use
portions of the target and source register operand specifiers as immediate fields or as
extended opcodes.

X-form- provides up to two registers as source operands and up to two target
operands. Some variations of this instruction format use portions of the target and
source operand specifiers as immediate fields or as extended opcodes.

A-form- provides up to three registers as source operands, and one target operand.
Some variations of this instruction format use portions of the target and source
operand specifiers as immediate fields or as extended opcodes.

BD-form- conditional branch instruction. The BO field specifies the type of condition;
the BI field specifies which CR bit is used as the condition; the BD field is used as the
branch displacement. AA bit specifies whether the branch is an absolute or relative
branch. The LK bit specifies whether the address of the next sequential instruction is
saved in the Link Register as a return address for a subroutine call.

I-form- used by the unconditional branch instruction. Being unconditional, the BO and
BI fields of the BD format are exchanged for additional branch displacement to form
the LI instruction field. This instruction format also supports the AA and LK bits in the
same fashion as the BD format.
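
The two branch formats described above can be decoded with a small sketch. It assumes the standard encodings: primary opcode 18 for the I-form branch and 16 for the BD-form conditional branch, with LI and BD stored as word offsets (two zero bits are appended before sign extension). The function names and the returned dictionary layout are hypothetical.

```python
def field(word, first, last):
    """Extract bits first..last of a 32-bit word, where bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def sign_extend(value, bits):
    """Interpret a `bits`-wide pattern as a signed integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_branch(word, current_address):
    opcode = field(word, 0, 5)
    aa, lk = field(word, 30, 30), field(word, 31, 31)
    if opcode == 18:    # I-form: unconditional branch
        info = {"form": "I"}
        displacement = sign_extend(field(word, 6, 29) << 2, 26)
    elif opcode == 16:  # BD-form: conditional branch
        info = {"form": "BD", "BO": field(word, 6, 10), "BI": field(word, 11, 15)}
        displacement = sign_extend(field(word, 16, 29) << 2, 16)
    else:
        raise ValueError("not a branch handled by this sketch")
    # AA selects absolute addressing; otherwise the branch is relative.
    target = displacement if aa else current_address + displacement
    info["target"] = target & 0xFFFFFFFF
    # LK saves the address of the next sequential instruction.
    info["link"] = current_address + 4 if lk else None
    return info


forward = decode_branch(0x48000010, 0x1000)      # branch 16 bytes ahead
call_back = decode_branch(0x4BFFFFF1, 0x1000)    # branch 16 bytes back, LK set
conditional = decode_branch(0x40820008, 0x2000)  # BO=4, BI=2, 8 bytes ahead
```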

Simplified PowerPC instruction set
http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/

Instruction formats
[Figures: A-form, BD-form, and D-form layouts]
PowerPC Addressing Modes
Load/store architecture
Indirect
Instruction includes a 16-bit displacement to be added to a base register
(may be a GP register)
Can replace the base register content with the new address
Indirect indexed
Instruction references a base register and an index register (both may be
GP registers)
EA is the sum of their contents
Branch address
Mode              Target address (TA) calculation
Absolute          TA = actual address
Relative          TA = current instruction address + displacement
                  (signed; the 24-bit LI or 14-bit BD field extended
                  with two low-order zero bits)
Indirect
Link Register     TA = (LR)
Count Register    TA = (CTR)
Arithmetic
Operands in registers or part of instruction
Floating point is register only
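
The load/store effective address (EA) calculations can be sketched as below. The register file is modeled as a plain list of 32 integers, and the functions are hypothetical helpers; the treatment of base register number 0 as the literal value zero follows the GPR0 exception noted in the register description.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit displacement as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def base_value(regs, ra):
    """Base register number 0 means the value 0, not the contents of GPR0."""
    return 0 if ra == 0 else regs[ra]


def ea_indirect(regs, ra, displacement, update=False):
    """EA = base + 16-bit signed displacement; optionally write EA back to the base."""
    ea = (base_value(regs, ra) + sign_extend16(displacement)) & MASK32
    if update and ra != 0:
        regs[ra] = ea
    return ea


def ea_indexed(regs, ra, rb):
    """EA = base register + index register."""
    return (base_value(regs, ra) + regs[rb]) & MASK32


regs = [0] * 32
regs[0], regs[4], regs[5] = 0xDEAD, 0x1000, 0x20

below_base = ea_indirect(regs, 4, 0xFFFC)  # displacement of -4
absolute = ea_indirect(regs, 0, 0x10)      # GPR0's contents are ignored
indexed = ea_indexed(regs, 4, 5)
```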


PowerPC function call conventions
Results from a function call are returned in GPR3, FPR1, or
by passing a pointer to a structure as the implicit leftmost
parameter.
Any parameters that do not fit into the designated registers
are passed on the stack. In addition, enough space is
allocated on the stack to hold all parameters, whether they
are passed in registers or not.
PowerPC run-time environment uses a
grow-down stack that allocates space for
a function's parameters, linkage
information, and for local variables.
The environment uses a single stack
pointer without any frame pointer.
To achieve this simplification, the
PowerPC stack has a much more rigidly
defined structure.
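
A much simplified sketch of the parameter-passing rule follows. It uses the register ranges from the tables in this document (r3-r10 for integer parameters, f1-f8 for floating-point parameters) and sends everything else to the stack. Real ABIs have further rules (alignment, multi-word values, structures) that are deliberately ignored here, and the function name and "stack" label are assumptions.

```python
GPR_ARGS = [f"r{n}" for n in range(3, 11)]  # r3-r10, per the GPR table
FPR_ARGS = [f"f{n}" for n in range(1, 9)]   # f1-f8, per the FPR table


def assign_arguments(arguments):
    """Map each argument to a register, or to the stack once registers run out."""
    gprs, fprs = iter(GPR_ARGS), iter(FPR_ARGS)
    placement = []
    for value in arguments:
        pool = fprs if isinstance(value, float) else gprs
        location = next(pool, "stack")  # simplification: no stack offsets computed
        placement.append(location)
    return placement


mixed = assign_arguments([1, 2.5, 3])
many = assign_arguments(list(range(10)))  # ten integer arguments
```

The first integer result would come back in r3 and the first floating-point result in f1, matching the return-value rule stated above.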

PowerPC G4e Pipelining
Seven Stage Pipeline

Superscalar Microprocessor allows multiple instructions
to be executed in parallel.

Nine Execution Units
BPU : Branch Processing Unit
VPU : Vector Permute Unit
VIU : Vector Integer Unit
VCIU : Vector Complex Integer Unit
VFPU : Vector Floating Point Unit
FPU : Floating Point Unit
IU : Integer Unit
CIU : Complex Integer Unit
LSU : Load/Store Unit
PowerPC G4e Pipeline Stages
Stages 1 and 2 - Instruction Fetch:

These two stages are both dedicated primarily to grabbing an
instruction from the L1 cache.

The G4e can fetch four instructions per clock cycle from the L1
cache and send them on to the next stage

Stage 3 - Decode/Dispatch:

Once an instruction has been fetched, it goes into a 12-entry
instruction queue to be decoded.

The G4e's decoder can dispatch up to three instructions per clock
cycle to the next stage.
Stage 4 - Issue:

Instructions wait in one of three issue queues. The first is the
Floating-Point Issue Queue (FIQ),
which holds floating-point (FP) instructions that are
waiting to be executed.

The second is the Vector Issue Queue (VIQ), which
holds vector operations.

The third queue is the General Instruction Queue
(GIQ), which holds everything else.

Once the instruction leaves its issue queue, it goes
to the execution engine to be executed.

Stage 5 - Execute:

The instructions can pass out-of-order from their
issue queues into their respective functional units
and be executed.

Stage 6 and 7 - Complete and Write-Back :

In these two stages, the instructions are put back
into the order in which they came into the
processor, and their results are written back to
memory.
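
The idea of executing out of order but completing in program order can be sketched with a tiny model. This is not a description of the G4e's actual completion logic; the function name and the use of program indices are assumptions for illustration.

```python
def retire_steps(finish_order):
    """Return, for each finished instruction, which instructions can now retire.

    finish_order lists program indices (0 = oldest) in the order in which
    execution finished. An instruction retires only after every older
    instruction has retired.
    """
    finished = set()
    next_to_retire = 0
    steps = []
    for index in finish_order:
        finished.add(index)
        retired_now = []
        while next_to_retire in finished:
            retired_now.append(next_to_retire)
            next_to_retire += 1
        steps.append(retired_now)
    return steps


# Instruction 2 finishes first but must wait for 0 and 1.
steps = retire_steps([2, 0, 1, 3])
```

Flattening `steps` always yields 0, 1, 2, 3, which is the sense in which results are put back into program order.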

Design principles
Simplicity favors regularity
Standard 32-bit instruction format for all instructions
fixed-length instructions,
register-to-register architecture
three-operand instruction format.
Smaller is faster
Three categories of registers, each handling specific instructions,
so access time is presumably faster
Make the common case fast
Integer and floating point instructions
Good design demands good compromises
To align with RISC principles, many instructions that required
three source operands were eliminated
Many complex instructions were curtailed to conform to RISC
principles, but this is compensated by a large number of mnemonics that
increase the number of instructions.
Pros and Cons
Instruction Set
200 machine instructions
More complex than most RISC machines
e.g. floating-point multiply and add instructions that take three
input operands
e.g. load and store instructions may automatically update the
index register to contain the just-computed target address
Pipelined execution
More sophisticated than SPARC
Input and Output
Two different modes
Direct-store segment: map virtual address space to an external
address space
Normal virtual memory access
Permits a range of implementation from low cost
controllers through high performance processors.


References
http://www.ibm.com/developerworks/linux/library/l-powarch/
http://www.cresco.enea.it/LA1/cresco_sp14_ylichron/CBE-docs/PowerPC_Vers202_Book1_public.pdf
http://en.wikipedia.org/wiki/PowerPC
http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set
http://www.eecs.umich.edu/~stever/373/lecnotes2.pdf
http://www.devx.com/ibm/Article/20943
