Alpha
Microprocessor - Case Study I
Computer Architecture Lab.
Alpha Philosophy
Smart compiler, smart machine, and a GREAT circuit design
Compiler creates a record of execution
Machine exploits additional information available at runtime
Works across barriers to compile-time analysis
Focus on scalar programs
Add resources for vector programs
Amdahl's law
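Amdahl's law is the usual argument for this priority: if a fraction f of run time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s), so the part that is not accelerated quickly dominates. A minimal Python sketch; the 20% vectorizable fraction and 10x unit speedup in the usage lines are illustrative assumptions, not Alpha measurements.

```python
def amdahl_speedup(fraction: float, unit_speedup: float) -> float:
    """Overall speedup when `fraction` of the original run time is sped up by `unit_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if unit_speedup <= 0:
        raise ValueError("unit_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / unit_speedup)


# Illustrative only: a 10x faster vector unit applied to 20% of the run time
print(round(amdahl_speedup(0.2, 10.0), 3))  # 1 / (0.8 + 0.02), about 1.22
# Even an enormously fast vector unit cannot beat 1 / (1 - 0.2) = 1.25
print(round(amdahl_speedup(0.2, 1e12), 3))
```

The scalar 80% caps the gain, which is why extra vector resources pay off only when the vectorizable fraction is large.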
Alpha Roadmap
[Figure: Alpha roadmap, 1995-2001, plotted by higher performance and lower cost; process generations 0.5 µm, 0.35 µm, 0.28 µm, 0.18 µm, and 0.13 µm]
21164: EV5/333, EV56/600
21164PC: PCA56/533, PCA57/600
21264: EV6/575, EV67/750, EV68/1000
21364: EV7/1000
EV8
Alpha Architecture
Full 64-bit load/store RISC architecture
High clock speed, multiple instruction issue, and multiple processors
Separate integer and floating-point registers (thirty-two each)
64-bit virtual byte addressing
32-bit fixed instruction size (6-bit opcode)
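All Alpha instructions share the 6-bit opcode in bits 31-26. In the memory format (loads and stores) the rest of the word holds two 5-bit register fields and a 16-bit signed displacement, as decoded below. The example field values are hypothetical, and the operate, branch, and PALcode formats are not covered.

```python
from typing import NamedTuple


class MemoryFormat(NamedTuple):
    opcode: int  # bits 31..26 (the 6-bit opcode)
    ra: int      # bits 25..21
    rb: int      # bits 20..16
    disp: int    # bits 15..0, sign-extended


def decode_memory_format(word: int) -> MemoryFormat:
    """Split a 32-bit Alpha memory-format instruction word into its fields."""
    if not 0 <= word < 1 << 32:
        raise ValueError("instruction word must fit in 32 bits")
    disp = word & 0xFFFF
    if disp & 0x8000:  # sign-extend the 16-bit displacement
        disp -= 1 << 16
    return MemoryFormat(
        opcode=(word >> 26) & 0x3F,
        ra=(word >> 21) & 0x1F,
        rb=(word >> 16) & 0x1F,
        disp=disp,
    )


def encode_memory_format(opcode: int, ra: int, rb: int, disp: int) -> int:
    """Inverse of decode_memory_format, handy for building test words."""
    return ((opcode & 0x3F) << 26) | ((ra & 0x1F) << 21) | ((rb & 0x1F) << 16) | (disp & 0xFFFF)


# Hypothetical fields: opcode 0x29, Ra = 1, Rb = 30, displacement -8
word = encode_memory_format(0x29, 1, 30, -8)
print(decode_memory_format(word))
```

Because every format keeps the opcode in the same place, the decoder can select the format from bits 31-26 before looking at anything else.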
PALcode
Similar to the BIOS libraries in a PC
Privileged mode
complete control of the machine's state
physical I-stream
interrupts disabled
Special instructions
access all internal state, private GPRs
virtual or physical LD/ST
privileged jump
Strict coding rules
[Figure: software layers: Applications, Operating System, PAL, HW]
Alpha 21064 Overview
Very fast clock (200 MHz 21064 in 1992 and 275 MHz 21064A in 1994)
Simple instructions
Dual-issue superscalar
Three parallel pipelines
Integer pipeline: 7 stages
Floating-point pipeline: 10 stages
Load/store pipeline: 7 stages
Dynamic branch prediction with a 2048-entry table
Separate instruction and data caches
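The slide gives only the size of the branch-prediction table. As a sketch of how such a table is typically used, the code below indexes 2048 entries with low-order PC bits and keeps a 2-bit saturating counter per entry; the counter width and indexing are assumptions, not a description of the 21064's actual predictor.

```python
class BranchHistoryTable:
    """2048-entry direct-mapped table of 2-bit saturating counters (assumed scheme)."""

    def __init__(self, entries: int = 2048) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.mask = entries - 1
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes, so the low two PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        """True means 'predict taken'."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
loop_branch = 0x1000  # hypothetical PC of a loop-closing branch
for _ in range(9):
    bht.update(loop_branch, True)
bht.update(loop_branch, False)   # loop exit
print(bht.predict(loop_branch))  # True: one not-taken outcome does not flip a strong counter
```

Two branches whose PCs differ by a multiple of 2048 × 4 bytes share an entry; that aliasing is the price of a small, fast table.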
Alpha 21064 Block Diagram
Alpha 21064 Pipeline
[Figure: Alpha 21064 pipelines. Shared front end: F (instruction fetch), S (swap), D (decode), I (issue). Integer instruction: ALU 1, ALU 2, write registers, with bypasses. Load/store instruction: address add, cache access, write registers, with bypass.]
Alpha 21164 Overview
Quad-issue superscalar
Low latency in functional units
High-throughput, nonblocking memory subsystem
low-latency primary caches
Large second-level, on-chip write-back cache
10 percent faster than the previous 21064 implementation
Alpha 21164PC
Shipping at 583 MHz, November 1998
16.7/17.0 estimated SPECint95 (base/peak)
20.7/22.7 estimated SPECfp95 (base/peak)
340 MB/s STREAM bandwidth
Chip features:
1.0 cm² die
7 million transistors
32 KB 2-way set-associative I-cache
16 KB virtual D-cache
improved 3-cycle multiplier
improved 6-bit/cycle divider
increased write buffer size (8 × 32 B)
support for 200 MHz off-chip cache
Register File (21164)
40 integer registers
R0-R31 as GPRs
Eight shadow registers for PALcode
4 read ports (2 per pipe), 2 write ports (1 per pipe)
32 FP registers
9 ports (5 read, 4 write)
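Integer Pipeline (21164)
Bypass from any stage
except multiply: only from S6
[Figure: 21164 integer pipeline, stages 0-6, covering instruction cache access, decode, swap, branch predict, issue, register-file read, ALU 1, ALU 2, and write. Also shown: PC generation, VA generation, ITB and DTB hit/miss, and bypass (forwarding) paths.]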
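Floating-Point Pipeline (21164)
9 stages (5 ns cycles)
[Figure: 21164 floating-point pipeline, stages 0-8: instruction cache access, decode, swap, branch predict, issue, FP register-file read, then add and multiply paths (labels include Add, Mul 1, Mul 2, Shift, Add/Rnd, and Write), with bypass.]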
Memory Pipeline (21164)
The 21164 includes L2 cache access in the pipeline
Instruction/Data Stream (21164)
I-stream support
Stream buffer prefetches in-line code
BHT/JSR stack/bubble squash
D-stream support
Pending LD/wrapped reads improve latency
Burst-mode RAM support
Hit under miss
Pending stores, fully pipelined LD/ST to cache
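The 21164's stream buffer is only named on the slide. As a hedged illustration of the general technique, the sketch keeps a small FIFO of the next sequential cache lines after an I-cache miss, so straight-line code finds its next line already prefetched; the depth of 4 and the flush-on-mismatch policy are assumptions, not 21164 specifications.

```python
from collections import deque


class StreamBuffer:
    """Sequential instruction prefetch buffer; depth and refill policy are assumptions."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.lines = deque()  # cache-line numbers already prefetched, in order

    def on_icache_miss(self, line: int) -> bool:
        """Handle an I-cache miss; return True if the stream buffer supplied the line."""
        if self.lines and self.lines[0] == line:
            self.lines.popleft()
            # Top the buffer back up with the next sequential line.
            self.lines.append(self.lines[-1] + 1 if self.lines else line + 1)
            return True
        # Not in the buffer: fetch from the next level and restart the stream after it.
        self.lines = deque(range(line + 1, line + 1 + self.depth))
        return False


sb = StreamBuffer()
print(sb.on_icache_miss(100))                           # cold miss starts a stream at 101
print([sb.on_icache_miss(n) for n in range(101, 105)])  # straight-line code hits the buffer
```

A taken branch to a distant line misses the buffer and restarts the stream, which is why this helps "in-line" code specifically.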
Alpha 21264 Overview
Third-generation 64-bit Alpha microprocessor
New motion-video instructions (MVI)
4-way out-of-order issue
dynamic scheduling
register renaming
speculative execution
4 integer execution units
2 floating-point execution units
BIU maintains coherency between the D-cache, the L2 cache, and main memory
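Register renaming lets independent instructions that write the same architectural register proceed out of order. A minimal Python sketch of the common map-table-plus-free-list scheme; the default of 80 physical registers is an assumption borrowed from the integer register-file size in the 21364 core diagram, and the sketch ignores Alpha's hardwired-zero R31, branch recovery, and the FP side.

```python
from __future__ import annotations


class RenameMap:
    """Register-renaming sketch: architectural -> physical map plus a free list."""

    def __init__(self, arch_regs: int = 32, phys_regs: int = 80) -> None:
        if phys_regs < arch_regs:
            raise ValueError("need at least as many physical as architectural registers")
        self.map = list(range(arch_regs))              # arch reg i starts in phys reg i
        self.free = list(range(arch_regs, phys_regs))  # unallocated physical registers

    def rename(self, dst: int, srcs: tuple[int, ...]) -> tuple[list[int], int, int]:
        """Return (physical sources, new physical dest, previous physical dest)."""
        phys_srcs = [self.map[s] for s in srcs]  # look up sources before remapping dst
        if not self.free:
            raise RuntimeError("free list empty: rename stage must stall")
        new = self.free.pop(0)
        old = self.map[dst]
        self.map[dst] = new
        return phys_srcs, new, old

    def retire(self, old_phys: int) -> None:
        """Once the renamed instruction retires, its dest's previous mapping is dead."""
        self.free.append(old_phys)


rm = RenameMap()
print(rm.rename(1, (2, 3)))  # r1 = r2 + r3
print(rm.rename(1, (1, 4)))  # r1 = r1 + r4: reads the first r1, writes a fresh register
print(rm.rename(5, (1,)))    # r5 = r1: sees the second r1
```

The two writes to r1 land in different physical registers, so the write-after-write and write-after-read hazards between them disappear; only true data dependences remain.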
Alpha 21264 Pipeline
Alpha 21264 Update
Microprocessor Forum 1996
30+ SPECint95 and 50+ SPECfp95
500 MHz in 0.35 µm CMOS
Spectacular memory bandwidth
Systems 2H97
First power-on July 1997 (no FP)
Full-function power-on February 1998
Production power-on June 1998
Alpha 21364 Highlights
Improve
Single-processor performance, operating frequency, and memory system
SMP scaling
System performance density
Reliability and availability
Decrease
System cost
System complexity
Alpha 21364 Features
Alpha 21264 core with enhancements
Integrated L2 cache
Integrated memory controller
Integrated network interface
Support for lock-step operation to enable high-availability systems
Alpha 21364 Block Diagram
[Figure: 21264 core with 64 KB I-cache and 64 KB D-cache; 16 L1 miss buffers, 16 L1 victim buffers, and 16 L2 victim buffers; L2 cache; memory controller with RAMBUS interface; network interface with N, S, E, W, and I/O ports; address-in and address-out paths]
Alpha 21364 Core
[Figure: Alpha 21364 core pipeline, stages 0-6, labeled FETCH, MAP, QUEUE, REG, EXEC, DCACHE. Branch predictors; L1 instruction cache 64 KB 2-way set-associative; 4 instructions/cycle; integer register map, integer issue queue (20 entries), two integer register files (80 each), four integer execution units; FP register map, FP issue queue (15 entries), FP register file (72), FP ADD, Div/Sqrt, FP MUL; L1 data cache 64 KB 2-way set-associative; victim buffer; L2 cache 1.5 MB 6-way set-associative; miss-address and next-line-address paths; 80 in-flight instructions plus 32 loads and 32 stores.]
Alpha 21364 Integrated L2 Cache
1.5 MB
6-way set associative
16 GB/s total read/write bandwidth
16 victim buffers for L1 -> L2
16 victim buffers for L2 -> memory
ECC SECDED code
12 ns load-to-use latency
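The L2 figures fix the capacity (1.5 MB) and associativity (6 ways) but not the line size. Assuming 64-byte lines (an assumption, not stated on the slide), the sketch below derives the number of sets and the address split. A 1.5 MB, 6-way cache still gets a power-of-two set count because 1.5 MB / 6 = 256 KB per way.

```python
from __future__ import annotations


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    if size_bytes % line_bytes or (size_bytes // line_bytes) % ways:
        raise ValueError("size must divide evenly into lines and ways")
    sets = size_bytes // line_bytes // ways
    if sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("sets and line size must be powers of two")
    return sets, line_bytes.bit_length() - 1, sets.bit_length() - 1


def split_address(addr: int, offset_bits: int, index_bits: int) -> tuple[int, int, int]:
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 1.5 MB and 6 ways from the slide; the 64-byte line size is an assumption
sets, off_bits, idx_bits = cache_geometry(3 * 512 * 1024, 6, 64)
print(sets, off_bits, idx_bits)  # 4096 6 12
```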
Alpha 21364 Integrated Memory Controller
Direct Rambus
High data capacity per pin
800 MHz operation
30 ns CAS latency pin to pin
6 GB/s read or write bandwidth
100s of open pages
Directory-based cache coherence
ECC SECDED
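SECDED (single-error correction, double-error detection) appears on both the L2 and memory slides. Memory ECC commonly protects 64 data bits with 8 check bits; the sketch below shows the same idea at toy scale with an extended Hamming (8,4) code: three Hamming parity bits locate a single flipped bit, and an overall parity bit separates one flip (correctable) from two (detectable only). It illustrates the technique and is not the 21364's actual code.

```python
from __future__ import annotations

DATA_POS = (3, 5, 6, 7)  # data bits sit at the positions that are not powers of two


def secded_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (bit 0 = overall parity)."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected four bits")
    code = [0] * 8
    for pos, bit in zip(DATA_POS, data):
        code[pos] = bit
    for p in (1, 2, 4):  # parity bit p covers every position whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity upgrades SEC to SEC-DED
    return code


def secded_decode(code: list[int]) -> tuple[str, list[int]]:
    """Return (status, data); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)  # do not modify the caller's list
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:  # odd number of flips: assume one, located by the syndrome
        code[syndrome] ^= 1  # syndrome 0 means the overall-parity bit itself flipped
        status = "corrected"
    else:  # parity still even but syndrome nonzero: two bits flipped
        return "double-error", []
    return status, [code[p] for p in DATA_POS]
```

For example, flipping bit 5 of `secded_encode([1, 0, 1, 1])` decodes as `("corrected", [1, 0, 1, 1])`, while flipping bits 2 and 6 is reported as a double error.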
Alpha 21364 Integrated Network Interface
Direct processor-to-processor interconnect
10 GB/s per processor
15 ns processor-to-processor latency
Out-of-order network with adaptive routing
Asynchronous clocking between processors
3 GB/s I/O interface per processor
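The N, S, E, and W network ports in the 21364 block diagram suggest a two-dimensional torus of processors. Assuming that topology, minimal adaptive routing can be sketched as: find which ports lie on a shortest path (with wraparound), then pick the least congested one. Real adaptive networks also need a deadlock-avoidance mechanism, which this sketch omits; all names are hypothetical.

```python
from __future__ import annotations


def productive_directions(src: tuple[int, int], dst: tuple[int, int],
                          width: int, height: int) -> set[str]:
    """Ports (E/W/N/S) that lie on some minimal path from src to dst in a 2-D torus."""
    dirs: set[str] = set()
    dx = (dst[0] - src[0]) % width   # hops needed going east (+x), with wraparound
    if dx:
        if dx <= width - dx:
            dirs.add("E")
        if width - dx <= dx:
            dirs.add("W")
    dy = (dst[1] - src[1]) % height  # hops needed going north (+y), with wraparound
    if dy:
        if dy <= height - dy:
            dirs.add("N")
        if height - dy <= dy:
            dirs.add("S")
    return dirs


def choose_port(src: tuple[int, int], dst: tuple[int, int], width: int, height: int,
                queue_depth: dict[str, int]) -> str | None:
    """Adaptive step: the least-congested productive port, or None if already at dst."""
    candidates = sorted(productive_directions(src, dst, width, height))
    if not candidates:
        return None
    return min(candidates, key=lambda port: queue_depth.get(port, 0))


print(choose_port((0, 0), (1, 3), 4, 4, {"E": 3, "S": 0}))  # S: both E and S are minimal
```

Because several ports can be equally short, packets between the same pair of processors may take different paths and arrive out of order, consistent with the slide's "out-of-order network".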
Alpha 21364 System Block Diagram
6-layer metal
100 million transistors
8 million logic, 92 million RAM
70 SPECint95 (estimated)
120 SPECfp95 (estimated)
RTL model running
Tapeout 4Q99
ARM
Microprocessor - Case Study I
ARM Architecture Overview
Advanced RISC Machine
Originally intended as a simple, low-cost, 32-bit system for personal computers
very definite RISC properties
low number of instructions, addressing modes, and instruction formats
all instructions execute in a single cycle
memory accessed only by load/store instructions
hardwired control
Best MIPS per watt and per dollar
ARM Application Overview
Portable
Apple Newton PDA, Mobile Computer
GSM, PCS, Smart Phone, Video Phone
ISDN Chip
Embedded
ATM Card
Smart Card
Consumer Multimedia
Oracle Network Computer
Set-top Box
Video Game, Educational Game
Camera
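ARM Chips
CPU | Product description | Process | Die area | Average power | Performance
ARM7TDMI | ARM7TDMI core (optimized hard macro) | 0.35 µm | 2.1 mm² | 0.6 mW/MHz | 0.9 MIPS/MHz, or 59 MIPS at 66 MHz
ARM7TDMI | ARM7TDMI core (optimized hard macro) | 0.25 µm | 1.0 mm² | N/A | N/A
ARM710T | ARM7TDMI core, 8 KB unified cache, MMU | 0.35 µm | 11.7 mm² | 1.8 mW/MHz | 0.9 MIPS/MHz, or 53 MIPS at 59 MHz
ARM710T | ARM7TDMI core, 8 KB unified cache, MMU | 0.25 µm | 5.8 mm² | N/A | N/A
ARM740T | ARM7TDMI core, 8 KB unified cache, protection unit | 0.35 µm | 9.8 mm² | 1.8 mW/MHz | 0.9 MIPS/MHz, or 53 MIPS at 59 MHz
ARM740T | ARM7TDMI core, 8 KB unified cache, protection unit | 0.25 µm | 4.9 mm² | N/A | N/A
Note: power figures for the ARM710T and ARM740T were calculated with the cache on.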
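ARM Chips (Cont'd)
CPU | Product description | Process | Die area | Average power | Performance
SA-110 | SA-110 core | 0.35 µm | 50 mm² | 110 mW | 1.15 MIPS/MHz, or 115 MIPS at 100 MHz
SA-110 | SA-110 core | 0.35 µm | N/A | N/A | 1.15 MIPS/MHz, or 191 MIPS at 166 MHz
SA-110 | SA-110 core | 0.35 µm | N/A | N/A | 1.15 MIPS/MHz, or 230 MIPS at 200 MHz
SA-110 | SA-110 core | 0.35 µm | N/A | N/A | 1.15 MIPS/MHz, or 268 MIPS at 233 MHz
SA-1100 | SA-110 core, caches, MMU, display controller | 0.35 µm | N/A | 230 mW | 1.13 MIPS/MHz, or 150 MIPS at 133 MHz
SA-1100 | SA-110 core, caches, MMU, display controller | 0.35 µm | N/A | N/A | 1.13 MIPS/MHz, or 180 MIPS at 160 MHz
SA-1100 | SA-110 core, caches, MMU, display controller | 0.35 µm | N/A | 330 mW | 1.16 MIPS/MHz, or 220 MIPS at 190 MHz
SA-1100 | SA-110 core, caches, MMU, display controller | 0.35 µm | N/A | 550 mW | 1.14 MIPS/MHz, or 250 MIPS at 220 MHz
SA-1110 | SA-110 core, SA-1100 functions, enhanced memory I/O | 0.35 µm | N/A | 240 mW | 1.13 MIPS/MHz, or 150 MIPS at 133 MHz
SA-1110 | SA-110 core, SA-1100 functions, enhanced memory I/O | 0.35 µm | N/A | 400 mW | 1.14 MIPS/MHz, or 235 MIPS at 206 MHz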
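Each performance entry in the ARM chip tables is the MIPS/MHz rating multiplied by the clock frequency. The short check below recomputes a few rows (the row selection is ours); the listed values appear to be rounded or truncated to whole MIPS. The last line applies the same scaling to a mW/MHz power figure, assuming power grows linearly with clock.

```python
def mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """MIPS = MIPS/MHz rating x clock frequency in MHz."""
    return mips_per_mhz * clock_mhz


# (part, MIPS/MHz, clock in MHz, MIPS as listed in the ARM chip tables)
ROWS = [
    ("ARM7TDMI", 0.9, 66, 59),
    ("ARM710T", 0.9, 59, 53),
    ("SA-110", 1.15, 233, 268),
    ("SA-1100", 1.14, 220, 250),
    ("SA-1110", 1.14, 206, 235),
]

for part, rating, mhz, listed in ROWS:
    print(f"{part}: {rating} x {mhz} MHz = {mips(rating, mhz):.1f} MIPS (listed {listed})")

# Assuming linear scaling, 0.6 mW/MHz at 66 MHz gives the ARM7TDMI core power:
print(f"ARM7TDMI at 66 MHz: about {0.6 * 66:.1f} mW")
```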
Instruction Set Summary (v4)
Features
high code density
Conditional execution
the top 4 bits (31-28) of every instruction select one of 16 possible conditions
Barrel shifter
Encoding of semantic content in each instruction
easy instruction decoding
A small number of highly flexible instruction types
Consistent instruction data formats
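Every ARM instruction carries a 4-bit condition field in bits 31-28, tested against the N, Z, C, and V flags in the CPSR; if the test fails, the instruction behaves as a no-op, which lets short if/else sequences avoid branches. The table uses the standard ARM condition mnemonics; the function name is hypothetical, and the 1111 (NV) encoding, which should not be used in ARMv4, is simply treated as "never".

```python
# Condition field (bits 31..28) -> (mnemonic, predicate on the N, Z, C, V flags)
CONDITIONS = {
    0x0: ("EQ", lambda n, z, c, v: z),
    0x1: ("NE", lambda n, z, c, v: not z),
    0x2: ("CS", lambda n, z, c, v: c),
    0x3: ("CC", lambda n, z, c, v: not c),
    0x4: ("MI", lambda n, z, c, v: n),
    0x5: ("PL", lambda n, z, c, v: not n),
    0x6: ("VS", lambda n, z, c, v: v),
    0x7: ("VC", lambda n, z, c, v: not v),
    0x8: ("HI", lambda n, z, c, v: c and not z),
    0x9: ("LS", lambda n, z, c, v: not c or z),
    0xA: ("GE", lambda n, z, c, v: n == v),
    0xB: ("LT", lambda n, z, c, v: n != v),
    0xC: ("GT", lambda n, z, c, v: not z and n == v),
    0xD: ("LE", lambda n, z, c, v: z or n != v),
    0xE: ("AL", lambda n, z, c, v: True),
    0xF: ("NV", lambda n, z, c, v: False),  # reserved in ARMv4; treated here as never
}


def should_execute(instr: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if the instruction's condition passes for the given flags."""
    _, predicate = CONDITIONS[(instr >> 28) & 0xF]
    return bool(predicate(n, z, c, v))


# 0x02911005 is ADDS r1, r1, #5 with condition EQ: it runs only when Z is set
print(should_execute(0x02911005, False, True, False, False))
```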
10 Basic Instruction Types
2 types (ALU, barrel shifter, multiplier, 16 visible 32-bit registers)
Data processing and PSR transfer
arithmetic (SUB RSB ADD ADC SBC RSC CMP CMN)
logic (AND EOR TST TEQ ORR MOV BIC MVN)
shift (LSL, LSR, ASR, ASL, ROR)
Multiply and multiply-accumulate (MUL, MLA)
3 types (transfer of data between main memory and register bank)
flexible addressing (single data transfer: LDR, STR)
rapid context switching (block data transfer: LDM, STM)
managing semaphores (single data swap: SWP)
2 types (flow and privilege level of execution)
Branch and Branch with Link (B, BL)
Software Interrupt (SWI)
3 types (external coprocessor)
coprocessor data operation (CDP)
coprocessor data transfer (LDC, STC), register transfer (MRC, MCR)
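In ARM data-processing instructions the second operand passes through the barrel shifter, so a shift comes free with the ALU operation; for example, ADD r0, r1, r2, LSL #2 computes r1 + 4 × r2 in one instruction. The sketch models the listed shift types on 32-bit values for shift amounts 0-31 only; encoding special cases (such as ROR #0 meaning RRX) and the shifter's carry-out are deliberately left out, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value: int, op: str, amount: int) -> int:
    """Apply an ARM-style shift to a 32-bit value (amounts 0-31, no carry-out)."""
    value &= MASK32
    if not 0 <= amount < 32:
        raise ValueError("this sketch only handles shift amounts 0-31")
    if op in ("LSL", "ASL"):  # ASL is a synonym for LSL
        return (value << amount) & MASK32
    if op == "LSR":           # zeros shifted in from the top
        return value >> amount
    if op == "ASR":           # copies of the sign bit shifted in from the top
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if op == "ROR":           # bits leaving the bottom re-enter at the top
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift {op}")


# Operand 2 of ADD r0, r1, r2, LSL #2 with r2 = 3
print(barrel_shift(3, "LSL", 2))  # 12
```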
Instruction Format
Operating Modes
Mode bits M[4:0] of the PSR (Program Status Register):
10000 User mode
10001 FIQ (Fast Interrupt Request) mode
10010 IRQ (Interrupt Request) mode
10011 Supervisor mode
10111 Abort mode
11011 Undefined mode
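The mode table maps directly onto a lookup on the low five bits of a PSR value. A minimal sketch; System mode (11111), which shares the User registers, is part of ARMv4 but does not appear in the mode table, and the helper names are hypothetical.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",  # ARMv4 mode not listed in the table above
}


def current_mode(psr: int) -> str:
    """Decode the processor mode from bits M[4:0] of a CPSR/SPSR value."""
    bits = psr & 0x1F
    if bits not in MODES:
        raise ValueError(f"reserved mode bits {bits:05b}")
    return MODES[bits]


def is_privileged(psr: int) -> bool:
    """Every mode except User is privileged."""
    return current_mode(psr) != "User"


print(current_mode(0x600000D3))  # low five bits 10011 -> Supervisor
```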
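Register Bank
37 registers in total (31 general-purpose, 6 status), of which 16 general registers are visible at a time
Each mode sees 16 general registers (R0-R15) plus the CPSR and, in the five exception modes, an SPSR
R15 => PC, R14 => link register (LR)
CPSR (Current Program Status Register): holds the condition flags and the current mode
SPSR (Saved Program Status Register): holds a copy of the CPSR saved on exception entry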
Register Bank (Cont'd)
[Figure: general registers, program counter, and program status registers in each mode]
System & User: R0-R14, R15 (PC); CPSR
FIQ: R0-R7, R8_fiq-R14_fiq, R15 (PC); CPSR, SPSR_fiq
Supervisor: R0-R12, R13_svc, R14_svc, R15 (PC); CPSR, SPSR_svc
Abort: R0-R12, R13_abt, R14_abt, R15 (PC); CPSR, SPSR_abt
IRQ: R0-R12, R13_irq, R14_irq, R15 (PC); CPSR, SPSR_irq
Undefined: R0-R12, R13_und, R14_und, R15 (PC); CPSR, SPSR_und
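The banked-register figure can be expressed as a small lookup: FIQ mode has private copies of R8-R14, the other exception modes bank only R13 (stack pointer) and R14 (link register), and User and System share one set. The sketch below (helper names are hypothetical) also checks the 37-register total from the Register Bank slide: 31 general registers plus 6 status registers.

```python
# Architectural registers that each mode replaces with private (banked) copies
BANKED = {
    "User": (),
    "System": (),                # shares the User registers
    "FIQ": tuple(range(8, 15)),  # R8-R14
    "IRQ": (13, 14),
    "Supervisor": (13, 14),
    "Abort": (13, 14),
    "Undefined": (13, 14),
}
SUFFIX = {"FIQ": "fiq", "IRQ": "irq", "Supervisor": "svc", "Abort": "abt", "Undefined": "und"}


def physical_register(mode: str, reg: int) -> str:
    """Name the physical register that R<reg> refers to in the given mode."""
    if not 0 <= reg <= 15:
        raise ValueError("ARM has R0-R15")
    if reg in BANKED[mode]:
        return f"R{reg}_{SUFFIX[mode]}"
    return "R15(PC)" if reg == 15 else f"R{reg}"


# Distinct physical registers: the 16 shared names plus every banked copy
general = {physical_register(m, r) for m in BANKED for r in range(16)}
status = {"CPSR"} | {f"SPSR_{s}" for s in SUFFIX.values()}
print(len(general), len(status), len(general) + len(status))  # 31 6 37
```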
ARM7 Block Diagram
ARM7 Pipeline
Fetch
PH-1: off-chip memory access
PH-2: instruction register ← instruction from off-chip memory
Decode
PH-1: decode-stage instruction register ← instruction register
PH-2: decode instruction
Execute
PH-1: operand fetch and shifter operation
PH-2: ALU operation and result write
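The three stages overlap, so once the pipeline is full one instruction completes per cycle. The idealized sketch below shows which instruction occupies each stage in each cycle; it ignores the two clock phases (PH-1/PH-2), stalls, and multi-cycle instructions, and the function name is hypothetical.

```python
from __future__ import annotations

STAGES = ("Fetch", "Decode", "Execute")


def pipeline_timeline(n_instructions: int) -> list[list[str]]:
    """For each clock cycle, list what each stage holds (no stalls or branches)."""
    cycles = n_instructions + len(STAGES) - 1
    table = []
    for cycle in range(cycles):
        row = []
        for stage in range(len(STAGES)):
            i = cycle - stage  # index of the instruction in this stage during this cycle
            row.append(f"I{i}" if 0 <= i < n_instructions else "-")
        table.append(row)
    return table


for cycle, row in enumerate(pipeline_timeline(4)):
    print(cycle, dict(zip(STAGES, row)))
```

Four instructions finish in 4 + 2 = 6 cycles instead of 12, and the advantage grows with longer straight-line runs.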
ARM7TDMI (Thumb)
For control applications
CISC: good code density but power limitations
RISC: poor code density
Solutions to the code-size problem
hand coding (reduces size by 10-20%)
compressed code that is expanded at run time (-30%)
Thumb concept
on execution, 16-bit Thumb code is decompressed to equivalent 32-bit ARM instructions
ARM7TDMI (Thumb)
Example: ADD Rd, #Constant
ARM code: 1110 | 00 | 1 | 0100 | 1 | 0Rd | 0Rd | 0000 | 8-bit immediate
1110: always condition code
0Rd, 0Rd: destination and source register
Thumb code: 001 | 10 | Rd | 8-bit immediate
001: major opcode denoting format 3 (move/compare/add/subtract with immediate value)
10: minor opcode denoting the ADD instruction
Rd: destination and source register
Thumb code limitations
only eight registers
2-operand instructions
uses 3-bit register specifiers
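The ADD example can be reproduced as a bit-level expansion: the Thumb fields are copied into an ARM data-processing word with condition AL, I = 1, opcode ADD, S = 1, and Rd used as both Rn and Rd. This sketch handles only format-3 ADD; a real Thumb decompressor covers every Thumb format, and the function name is hypothetical.

```python
def thumb_add_imm_to_arm(thumb: int) -> int:
    """Expand Thumb format-3 ADD Rd, #imm8 into the 32-bit ARM ADDS Rd, Rd, #imm8."""
    if not 0 <= thumb < 1 << 16:
        raise ValueError("Thumb instructions are 16 bits")
    if (thumb >> 13) != 0b001 or ((thumb >> 11) & 0b11) != 0b10:
        raise ValueError("not a format-3 ADD (major opcode 001, minor opcode 10)")
    rd = (thumb >> 8) & 0b111  # 3-bit register field: only R0-R7 are reachable
    imm8 = thumb & 0xFF
    return (
        (0b1110 << 28)         # condition: always (AL)
        | (0b00 << 26)         # data-processing class
        | (1 << 25)            # I = 1: second operand is an immediate
        | (0b0100 << 21)       # data-processing opcode ADD
        | (1 << 20)            # S = 1: update condition flags
        | (rd << 16)           # Rn = Rd (two-operand form)
        | (rd << 12)           # Rd
        | (0b0000 << 8)        # rotate = 0
        | imm8
    )


# Thumb ADD r1, #5 -> ARM ADDS r1, r1, #5
print(hex(thumb_add_imm_to_arm(0x3105)))  # 0xe2911005
```

The expansion never needs information that the 16-bit form lacks, which is what makes run-time decompression cheap; the price is the limits listed above (eight registers, two operands).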
ARM7TDMI (Thumb) - Cont'd
[Figure: normalized Dhrystone 1.1 code size]
(Source: Microprocessor Forum, 1993, and vendor data)
ARM7TDMI (Thumb) - Cont'd
Processors at 5 volts in 16-bit memory systems
[Table: system clock (MHz), power (W), Dhrystone 1.1 MIPS, and MIPS/W for several processors, including the ARM710 and Z380]
(Source: Microprocessor Forum, 1993, and vendor data)
ARM810 Overview
80 MIPS performance (3.3 V, 0.5 micron)
5-stage pipeline
Higher clock rate
increased die size
Parallel operation of shifter and adder
decreased core cycle time
CPI = 1.43
double-bandwidth cache reads
load and store instructions: 1 cycle
branch prediction
ARM810 Block Diagram
5-stage pipeline
ARM8 CPU core
- PU (prefetch unit)
static branch prediction in the PU
8 KB unified cache
write-back/write-through
MMU
two-level page-table structure
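The slide does not give the page-table layout. The sketch below assumes the ARMv4-style arrangement with small pages: VA[31:20] selects one of 4096 first-level entries, VA[19:12] selects one of 256 entries in a coarse second-level table, and VA[11:0] is the offset within a 4 KB page. Python dictionaries stand in for the in-memory tables; descriptor type bits, domains, and access permissions are omitted, and all names are hypothetical.

```python
from __future__ import annotations

PAGE_SHIFT = 12                  # 4 KB small pages
L2_BITS = 8                      # 256 entries per second-level (coarse) table
L1_SHIFT = PAGE_SHIFT + L2_BITS  # first level indexed by VA[31:20]


class PageFault(Exception):
    """Raised when either level of the walk finds no mapping."""


def translate(va: int, first_level: dict[int, dict[int, int]]) -> int:
    """Walk two levels: VA[31:20] -> second-level table, VA[19:12] -> page frame number."""
    l1_index = (va >> L1_SHIFT) & 0xFFF
    l2_index = (va >> PAGE_SHIFT) & ((1 << L2_BITS) - 1)
    offset = va & ((1 << PAGE_SHIFT) - 1)
    second_level = first_level.get(l1_index)
    if second_level is None:
        raise PageFault(f"no second-level table for VA {va:#010x}")
    frame = second_level.get(l2_index)
    if frame is None:
        raise PageFault(f"page not mapped for VA {va:#010x}")
    return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping: VA page 0x00401xxx -> physical frame 0x80123
tables = {0x004: {0x01: 0x80123}}
print(hex(translate(0x00401234, tables)))  # 0x80123234
```

The two-level split means unmapped 1 MB regions cost only an empty first-level entry, instead of a full set of page entries.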
ARM8 Core Block Diagram
[Figure: ARM8 core block diagram: prefetch unit, CPU core, and memory interface, exchanging PC, instruction, read-data, and write-data signals]
StrongARM-110
185 MIPS (2.0 V, 0.35 micron) -> PDA
Core logic
5-stage pipeline
Harvard architecture (I-cache, D-cache)
Cache
I-cache: 16 KB 32-way set-associative with 32-byte blocks
D-cache: 16 KB 32-way set-associative write-back with 32-byte blocks
MMU
IMMU, DMMU
separate TLBs (32 entries each)
data TLB (flush all/single entry), instruction TLB (flush all)
StrongARM-110 Block Diagram
StrongARM-110
Die area: 50 mm²