Computer System Architecture Lecture

Computer System Architecture
Chapter 1: Introduction to Computer Architecture
Engr. Carla May C. Ceribo, Instructor
CPE 513/L

Architecture and Organization
• Architecture is those attributes visible to the programmer
  • Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques
  • e.g. Is there a multiply instruction?
• Organization is how features are implemented
  • Control signals, interfaces, memory technology
  • e.g. Is there a hardware multiply unit or is it done by repeated addition?

Architecture and Organization
• All Intel x86 family share the same basic architecture
• The IBM System/370 family share the same basic architecture
• This gives code compatibility
  • At least backwards
• Organization differs between different versions

Design Goals
• Functional
  • Needs to be correct
  • And unlike software, difficult to update once deployed
  • What functions should it support (Turing completeness aside)
• Reliable
  • Does it continue to perform correctly?
  • Hard fault vs transient fault
  • Google story: memory errors and sun spots
  • Space satellites vs desktop vs server reliability
• High performance
  • “Fast” is only meaningful in the context of a set of important tasks
  • Not just “Gigahertz”: truck vs sports car analogy
  • Impossible goal: fastest possible design for all programs

Design Goals
• Low cost
  • Per unit manufacturing cost (wafer cost)
  • Cost of making first chip after design (mask cost)
  • Design cost (huge design teams, why? Two reasons…)
  • (Dime/dollar joke)
• Low power/energy
  • Energy in (battery life, cost of electricity)
  • Energy out (cooling and related costs)
  • Cyclic problem, very much a problem today
• Challenge: balancing the relative importance of these goals
  • And the balance is constantly changing
  • No goal is absolutely important at expense of all others
• Our focus: performance, only touch on cost, power, reliability

Shaping Force: Applications/Domains
• Another shaping force: applications (usage and context)
  • Applications and application domains have different requirements
  • Domain: group with similar character
  • Lead to different designs
• Scientific: weather prediction, genome sequencing
  • First computing application domain: naval ballistics firing tables
  • Need: large memory, heavy-duty floating point
  • Examples: CRAY T3E, IBM BlueGene
• Commercial: database/web serving, e-commerce, Google
  • Need: data movement, high memory + I/O bandwidth
  • Examples: Sun Enterprise Server, AMD Opteron, Intel Xeon

Shaping Force: Applications/Domains
• Desktop: home office, multimedia, games
  • Need: integer, memory bandwidth, integrated graphics/network?
  • Examples: Intel Core 2, Core i7, AMD Athlon
• Mobile: laptops, mobile phones
  • Need: low power, integer performance, integrated wireless
  • Laptops: Intel Core 2 Mobile, Atom, AMD Turion
  • Smaller devices: ARM chips by Samsung and others, Intel Atom
• Embedded: microcontrollers in automobiles, door knobs
  • Need: low power, low cost
  • Examples: ARM chips, dedicated digital signal processors (DSPs)
  • Over one billion ARM cores sold in 2006 (at least one per phone)
• Deeply Embedded: disposable “smart dust” sensors
  • Need: extremely low power, extremely low cost

Functional View

Operations: Data Movement
Operations: Data Storage
Operations: Processing from/to Storage
Operations: Processing from Storage to I/O

Structure - Top Level
[Figure: the computer with its peripherals and communication lines; inside the computer: central processing unit, main memory, input/output, systems interconnection]

Structure - The CPU
[Figure: inside the CPU: registers, arithmetic and logic unit, control unit, internal CPU interconnection]

Structure - The Control Unit
[Figure: inside the control unit: sequencing logic, control unit registers and decoders, control memory]

Basic Operational Concept

Load R2, LOC
Add R4, R2, R3
Store R4, LOC
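
The three instructions above can be traced with a small simulation. This is a minimal sketch of a toy register machine, not any real instruction set: the `run` helper, the dictionary-based memory and registers, and the sample values (5 in LOC, 7 in R3) are assumptions made for illustration only.

```python
def run(program, memory, registers):
    """Execute (opcode, operands...) tuples on a toy register machine."""
    for op, *args in program:
        if op == "Load":      # Load Ri, LOC: copy memory[LOC] into Ri
            reg, loc = args
            registers[reg] = memory[loc]
        elif op == "Add":     # Add Rd, Rs1, Rs2: Rd receives Rs1 + Rs2
            dst, src1, src2 = args
            registers[dst] = registers[src1] + registers[src2]
        elif op == "Store":   # Store Ri, LOC: copy Ri into memory[LOC]
            reg, loc = args
            memory[loc] = registers[reg]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory, registers


# Hypothetical starting state: LOC holds 5, R3 holds 7.
program = [
    ("Load", "R2", "LOC"),
    ("Add", "R4", "R2", "R3"),
    ("Store", "R4", "LOC"),
]
memory, registers = run(program, {"LOC": 5}, {"R2": 0, "R3": 7, "R4": 0})
print(memory["LOC"], registers)
```

The point of the trace is that the processor only computes on registers: the operand must be loaded from memory first and the result stored back afterwards.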

CHALLENGE (a)
• Add the contents of memory location A to those of location B, and place the answer in location C.
• Load Ri, LOC and Store Ri, LOC are the only instructions available to transfer data between the memory and the general-purpose registers. Do not change the contents of either location A or B.

CHALLENGE (b)
• Suppose that Move and Add instructions are available with the formats
  Move Location1, Location2
  and
  Add Location1, Location2
• These instructions move or add a copy of the operand at the second location to the first location, overwriting the original operand at the first location. Either or both of the operands can be in the memory or the general-purpose registers.
• Is it possible to use fewer instructions of these types to accomplish the task in part (a)? If yes, give the sequence.

Character Representation

Parallelism
• Instruction-level parallelism
  • The simplest way to execute a sequence of instructions in a processor is to complete all steps of the current instruction before starting the steps of the next instruction.
• Multicore processors
  • Multiple processing units can be fabricated on a single chip. The term core is used for each of these processors. The term processor is then used for the complete chip.
• Multiprocessors
  • Shared-memory multiprocessors
  • Message-passing multicomputers

ENIAC - background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
  • Too late for war effort
• Used until 1955

ENIAC - details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second

von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies
  • IAS
• Completed 1952

[Figure: Structure of von Neumann machine]
 1000
x 40
 Binar b it w
ords
y nu
2x2 m ber
 Set o bit instru
0
il s
(sto f regist ction
s
ra g e ers
d e ta  M e m in C P U )
o
 Mem ry Bufer R
ory
 Instru Addres egister
ction s Re
 Instru
IAS -
Reg giste
i r
s t er
Reg ction B
ister ufe
 Progr r
 Accu am Counte
mu l r
 Mu l t i ato r
p l i er
Q uo
t i en
t
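
To make the "2 x 20 bit instructions" per 40-bit word concrete, the sketch below unpacks one word. It is illustrative only: the function names are hypothetical, and the split of each 20-bit instruction into an 8-bit opcode and a 12-bit address is the commonly described IAS format, which the slide itself does not state.

```python
def split_word(word):
    """Split a 40-bit memory word into its left and right 20-bit instructions."""
    if not 0 <= word < (1 << 40):
        raise ValueError("word must fit in 40 bits")
    return word >> 20, word & 0xFFFFF


def decode(instruction):
    """Split a 20-bit instruction into (opcode, address).

    Assumes an 8-bit opcode followed by a 12-bit address.
    """
    return instruction >> 12, instruction & 0xFFF


# Hypothetical word: left = opcode 0x01, address 250; right = opcode 0x05, address 251.
word = (0x01 << 32) | (250 << 20) | (0x05 << 12) | 251
left, right = split_word(word)
print(decode(left), decode(right))
```

Holding two instructions in one word is why the register set includes an Instruction Buffer Register: one instruction can be held there while the other is being executed.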

[Figure: Structure of IAS - detail]

Commercial Computers
• 1947 - Eckert-Mauchly Computer Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
  • Faster
  • More memory

IBM
• Punched-card processing equipment
• 1953 - the 701
  • IBM’s first stored program computer
  • Scientific calculations
• 1955 - the 702
  • Business applications
• Led to 700/7000 series

Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid state device
• Made from silicon (sand)
• Invented 1947 at Bell Labs
• William Shockley et al.

Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor machines
• IBM 7000
• DEC - 1957
  • Produced PDP-1

Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory cells and interconnections
• These can be manufactured on a semiconductor
• e.g. silicon wafer

Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
  • Up to 100 devices on a chip
• Medium scale integration - to 1971
  • 100-3,000 devices on a chip
• Large scale integration - 1971-1977
  • 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978-1991
  • 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration - 1991 onwards
  • Over 100,000,000 devices on a chip

Moore’s Law
• Increased density of components on chip
• Gordon Moore - co-founder of Intel
• Number of transistors on a chip will double every year
• Since 1970’s development has slowed a little
  • Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability
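
The doubling rule can be turned into a rough projection. This is a back-of-the-envelope sketch, not a fitted model: the function name and the starting count of 1,000 transistors are hypothetical, and real chips only loosely follow either doubling period.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count forward, assuming one doubling per period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical chip with 1,000 transistors, projected 3 years ahead.
every_18_months = projected_transistors(1_000, 3)                         # two doublings
every_year = projected_transistors(1_000, 3, doubling_period_years=1.0)   # three doublings
print(every_18_months, every_year)
```

Even over three years the two rates differ by a factor of two, which is why the slowdown from 12 to 18 months matters for long-range planning.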

[Figure: Growth in CPU Transistor Count]
 1964
 Replaced (& not compatible with)
s
7000 series
serie
 First planned “family” of
computers
 Similar or identical instruction sets
 Similar or identical O/S
36 0
 Increasing speed
 Increasing number of I/O ports (i.e.
more terminals)
IBM
 Increased memory size

 Increased cost
 Multiplexed switch structure
 1964
 First minicomputer (after
-8
miniskirt!)
PDP  Did not need air conditioned

room
 Small enough to sit on a lab
DEC
bench
 $16,000
 $100k+ for IBM 360
 Embedded applications & OEM
 BUS STRUCTURE
DEC - PDP-8 Bus Structure
 1970
Mem r
 Fairchild
or y
o
duct
 Size of a single core
 i.e. 1 bit of magnetic core
storage
ic on
 Holds 256 bits

 Non-destructive read
Sem
 Much faster than core

 Capacity approximately
doubles each year
 1971 - 4004
 First microprocessor
l
Inte
 All CPU components on a single
chip
 4 bit
 Followed in 1972 by 8008
 8 bit
 Both designed for specifc
applications
 1974 - 8080
 Intel’s frst general purpose
microprocessor

Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution

Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

[Figure: Logic and Memory Performance Gap]

Solutions
• Increase number of bits retrieved at one time
  • Make DRAM “wider” rather than “deeper”
• Change DRAM interface
  • Cache
• Reduce frequency of memory access
  • More complex cache and cache on chip
• Increase interconnection bandwidth
  • High speed buses
  • Hierarchy of buses

I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Problem moving data
• Solutions:
  • Caching
  • Buffering
  • Higher-speed interconnection buses
  • More elaborate bus structures
  • Multiple-processor configurations

[Figure: Typical I/O Device Data Rates]

Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures

Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced
• Increase size and speed of caches
  • Dedicating part of processor chip
    • Cache access times drop significantly
• Change processor organization and architecture
  • Increase effective speed of execution
  • Parallelism

Problems with Clock Speed and Logic Density
• Power
  • Power density increases with density of logic and clock speed
  • Dissipating heat
• RC delay
  • Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
  • Delay increases as RC product increases
  • Wire interconnects thinner, increasing resistance
  • Wires closer together, increasing capacitance
• Memory latency
  • Memory speeds lag processor speeds
• Solution:
  • More emphasis on organizational and architectural approaches

[Figure: Intel Microprocessor Performance]

Increased Cache Capacity
• Typically two or three levels of cache between processor and main memory
• Chip density increased
  • More cache memory on chip
  • Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%

More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
  • Different stages of execution of different instructions at same time along pipeline
• Superscalar allows multiple pipelines within single processor
  • Instructions that do not depend on one another can be executed in parallel
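
The assembly-line picture can be put into numbers with the standard idealized cycle count. The sketch below assumes every stage takes exactly one clock cycle and that there are no stalls, hazards, or branches; the function names and the 5-stage, 100-instruction example are hypothetical.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """The first instruction fills the pipeline; then one completes per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


def pipeline_speedup(instructions, stages):
    """Ideal speedup of the pipelined over the unpipelined design."""
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(instructions, stages)


# Hypothetical 5-stage pipeline running 100 independent instructions.
print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5), pipeline_speedup(100, 5))
```

As the instruction count grows, the ideal speedup approaches the number of stages. Real programs fall short of that because dependent instructions and branches stall the pipeline.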

Diminishing Returns
• Internal organization of processors complex
  • Can get a great deal of parallelism
  • Further significant increases likely to be relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power dissipation problem
  • Some fundamental physical limits are being reached

New Approach - Multiple Cores
• Multiple processors on single chip
  • Large shared cache
• Within a processor, increase in performance proportional to square root of increase in complexity
• If software can use multiple processors, doubling number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
  • Power consumption of memory logic less than processing logic
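
The argument for two simpler cores can be checked with a quick calculation. This sketch takes the square-root rule from the slide at face value; the function names and the 0.95 parallel-efficiency figure (standing in for "almost doubles") are assumptions for illustration.

```python
import math


def single_core_performance(complexity_factor):
    """Relative performance of one core whose complexity grows by the given factor."""
    return math.sqrt(complexity_factor)


def multicore_performance(cores, parallel_efficiency=1.0):
    """Relative performance of several simple cores.

    parallel_efficiency (0 to 1) is a hypothetical knob for how well the
    software actually uses the extra cores.
    """
    return cores * parallel_efficiency


one_big_core = single_core_performance(2)        # spend 2x the logic on one core
two_small_cores = multicore_performance(2, 0.95)  # spend it on a second core instead
print(one_big_core, two_small_cores)
```

With the same doubling of logic, the single complex core gains roughly 41% while two simple cores approach 2x, provided the software can use both.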
 8080
 frst general purpose microprocessor
 8 bit data path
(1)
 Used in frst personal computer – Altair
 8086 – 5MHz – 29,000 transistors
 much more powerful
t io n  16 bit
 instruction cache, prefetch few instructions
 8088 (8 bit external bus) used in frst IBM PC
 80286
 16 Mbyte memory addressable
u
 up from 1Mb
Evol
 80386
 32 bit
 Support for multitasking
 80486
 sophisticated powerful cache and instruction pipelining
x86
 built in maths co-processor


x86 Evolution (2)
• Pentium
  • Superscalar
  • Multiple instructions executed in parallel
• Pentium Pro
  • Increased superscalar organization
  • Aggressive register renaming
  • branch prediction
  • data flow analysis
  • speculative execution
• Pentium II
  • MMX technology
  • graphics, video & audio processing
• Pentium III
  • Additional floating point instructions for 3D graphics

x86 Evolution (3)
• Pentium 4
  • Note Arabic rather than Roman numerals
  • Further floating point and multimedia enhancements
• Core
  • First x86 with dual core
• Core 2
  • 64 bit architecture
• Core 2 Quad - 3GHz - 820 million transistors
  • Four processors on chip
• x86 architecture dominant outside embedded systems
• Organization and technology changed dramatically
• Instruction set architecture evolved with backwards compatibility
  • ~1 instruction per month added
  • 500 instructions available
• See Intel web pages for detailed information on processors

Modern Multicore Processors
• Intel Core i7 (2009)
  • Application: desktop/server
  • Technology: 45nm (1/2x)
  • 774M transistors (12x)
  • 296 mm2 (3x)
  • 3.2 GHz to 3.6 GHz (~1x)
  • 0.7 to 1.4 Volts
  • 128-bit data (2x)
  • 14-stage pipelined datapath (0.5x)
  • 4 instructions per cycle (~1x)
  • Three levels of on-chip cache
  • data-parallel vector (SIMD) instructions, hyperthreading
  • Four-core

Embedded Systems ARM
• ARM evolved from RISC design
• Used mainly in embedded systems
  • Used within product
  • Not general purpose computer
  • Dedicated function
  • E.g. Anti-lock brakes in car

Embedded Systems Requirements
• Different sizes
  • Different constraints, optimization, reuse
• Different requirements
  • Safety, reliability, real-time, flexibility, legislation
  • Lifespan
  • Environmental conditions
  • Static vs dynamic loads
  • Slow to fast speeds
  • Computation vs I/O intensive
  • Discrete event vs continuous dynamics

[Figure: Possible Organization of an Embedded System]

ARM Evolution
• Designed by ARM Inc., Cambridge, England
• Licensed to manufacturers
• High speed, small die, low power consumption
• PDAs, hand held games, phones
  • E.g. iPod, iPhone
• Acorn produced ARM1 & ARM2 in 1985 and ARM3 in 1989
• Acorn, VLSI and Apple Computer founded ARM Ltd.

ARM Systems Categories
• Embedded real time
• Application platform
  • Linux, Palm OS, Symbian OS, Windows mobile
• Secure applications

Performance Assessment Clock Speed
• Key parameters
  • Performance, cost, size, security, reliability, power consumption
• System clock speed
  • In Hz or multiples of
  • Clock rate, clock cycle, clock tick, cycle time
• Signals in CPU take time to settle down to 1 or 0
• Signals may change at different speeds
• Operations need to be synchronised
• Instruction execution in discrete steps
  • Fetch, decode, load and store, arithmetic or logical
  • Usually require multiple clock cycles per instruction
• Pipelining gives simultaneous execution of instructions
• So, clock speed is not the whole story

[Figure: System Clock]

Instruction Execution Rate
• Millions of instructions per second (MIPS)
• Millions of floating point instructions per second (MFLOPS)
• Heavily dependent on instruction set, compiler design, processor implementation, cache & memory hierarchy
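
The MIPS figure can be tied back to clock speed through the average cycles per instruction (CPI), using the usual textbook relations MIPS = f / (CPI x 10^6) and T = Ic x CPI / f. The slides do not give these formulas, so treat this as a sketch; the function names and the 400 MHz, CPI 2.0 example are hypothetical.

```python
def mips_rate(clock_hz, cpi):
    """Millions of instructions per second for a given clock rate and average CPI."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count, cpi, clock_hz):
    """Seconds needed to run a program of instruction_count instructions."""
    return instruction_count * cpi / clock_hz


# Hypothetical processor: 400 MHz clock, 2.0 cycles per instruction on average.
rate = mips_rate(400e6, 2.0)
seconds = execution_time(2_000_000, 2.0, 400e6)
print(rate, seconds)
```

Because CPI depends on the instruction mix, the compiler, and cache behaviour, two machines with the same clock rate can report very different MIPS ratings on the same program.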

Benchmarks
• Programs designed to test performance
• Written in high level language
  • Portable
• Represents style of task
  • Systems, numerical, commercial
• Easily measured
• Widely distributed
• E.g. System Performance Evaluation Corporation (SPEC)
  • CPU2006 for computation bound
    • 17 floating point programs in C, C++, Fortran
    • 12 integer programs in C, C++
    • 3 million lines of code
  • Speed and rate metrics
    • Single task and throughput

SPEC Speed Metric
• Single task
• Base runtime defined for each benchmark using reference machine
• Results are reported as ratio of reference time to system run time: ratio_i = Tref_i / Tsut_i
  • Tref_i: execution time for benchmark i on reference machine
  • Tsut_i: execution time of benchmark i on test system
• Overall performance calculated by averaging ratios for all 12 integer benchmarks
  • Use geometric mean
  • Appropriate for normalized numbers such as ratios
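
The speed metric reduces to a few lines of arithmetic. The sketch below uses made-up timings for two benchmarks rather than real SPEC results, and the function names are hypothetical.

```python
import math


def speed_ratios(t_ref, t_sut):
    """Per-benchmark ratio of reference time to time on the system under test."""
    return [ref / sut for ref, sut in zip(t_ref, t_sut)]


def geometric_mean(values):
    """n-th root of the product, computed through logarithms for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds for two benchmarks.
ratios = speed_ratios([100.0, 200.0], [50.0, 25.0])
overall = geometric_mean(ratios)
print(ratios, overall)
```

The geometric mean is used because it treats a doubling on any one benchmark equally, so no single ratio dominates the overall score the way it would in an arithmetic mean.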

SPEC Rate Metric
• Measures throughput or rate of a machine carrying out a number of tasks
• Multiple copies of benchmarks run simultaneously
  • Typically, same as number of processors
• Ratio is calculated as follows: rate_i = (N x Tref_i) / Tsut_i
  • Tref_i: reference execution time for benchmark i
  • N: number of copies run simultaneously
  • Tsut_i: elapsed time from start of execution of program on all N processors until completion of all copies of program
  • Again, a geometric mean is calculated
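
The rate metric differs from the speed metric only by the factor N. A minimal sketch with invented numbers (four copies, two benchmarks) follows; the function names are hypothetical.

```python
import math


def rate_ratio(n_copies, t_ref, t_sut):
    """Throughput ratio for one benchmark: N * Tref / Tsut."""
    return n_copies * t_ref / t_sut


def overall_rate(n_copies, t_refs, t_suts):
    """Geometric mean of the per-benchmark rate ratios."""
    ratios = [rate_ratio(n_copies, ref, sut) for ref, sut in zip(t_refs, t_suts)]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical run: 4 copies of each benchmark on a 4-processor system.
single = rate_ratio(4, 1000.0, 500.0)
combined = overall_rate(4, [1000.0, 1000.0], [500.0, 2000.0])
print(single, combined)
```

If running N copies takes no longer than running one, the rate ratio is N times the speed ratio. Contention for memory and I/O usually pulls it below that.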

Amdahl’s Law
• Gene Amdahl [AMDA67]
• Potential speed up of program using multiple processors
• Concluded that:
  • Code needs to be parallelizable
  • Speed up is bound, giving diminishing returns for more processors
• Task dependent
  • Servers gain by maintaining multiple connections on multiple processors
  • Databases can be split into parallel tasks

Amdahl’s Law Formula
• For program running on single processor
  • Fraction f of code infinitely parallelizable with no scheduling overhead
  • Fraction (1-f) of code inherently serial
  • T is total execution time for program on single processor
  • N is number of processors that fully exploit parallel portions of code
• Speedup = T / (T(1-f) + Tf/N) = 1 / ((1-f) + f/N)
• Conclusions
  • f small: parallel processors have little effect
  • N -> ∞: speedup bound by 1/(1 - f)
  • Diminishing returns for using more processors
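
These conclusions are easy to check numerically with the standard form of Amdahl's law, speedup = 1 / ((1 - f) + f/N). The function name and the sample values of f and N below are chosen only for illustration.

```python
def amdahl_speedup(f, n):
    """Speedup on n processors when a fraction f of the code is parallelizable."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# Hypothetical program with 90% parallelizable code.
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))

# The bound as N grows without limit is 1 / (1 - f).
bound = 1.0 / (1.0 - 0.9)
```

With f = 0.9, going from 64 to 1024 processors adds very little: the speedup is already close to the bound of 10 set by the serial 10% of the code.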
