Assembly Languages
Fundamentals
Pu-Jen Cheng
Materials
Some materials used in this course are adapted from
¾ The slides prepared by Kip Irvine for the book, Assembly Language
for Intel-Based Computers, 5th Ed.
¾ The slides prepared by S. Dandamudi for the book, Fundamentals of
Computer Organization and Design.
¾ The slides prepared by S. Dandamudi for the book, Introduction to
Assembly Language Programming, 2nd Ed.
¾ Introduction to Computer Systems, CMU
(http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15213-f05/www/)
¾ Assembly Language & Computer Organization, NTU
(http://www.csie.ntu.edu.tw/~cyy/courses/assembly/05fall/news/)
(http://www.csie.ntu.edu.tw/~acpang/course/asm_2004)
Outline
General Concepts of Computer Organization
¾ Overview of Microcomputer
CPU, Memory, I/O
Instruction Execution Cycle
¾ Central Processing Unit (CPU)
CISC vs. RISC
6 Instruction Set Design Issues
¾ How Hardware Executes Processor’s Instructions
Digital Logic Design (Combinational & Sequential Circuits)
Microprogrammed Control
¾ Pipelining
3 Hazards
3 Technologies for Performance Improvement
¾ Memory
Data Alignment
2 Design Issues (Cache, Virtual Memory)
¾ I/O Devices
General Concepts of Computer Organization
Overview of Microcomputer
Von Neumann Machine, 1945
Memory, Input/Output, Arithmetic/Logic Unit, Control Unit
Stored-program Model
¾ Both data and programs are stored in the same main memory
Sequential Execution
http://www.virtualtravelog.net/entries/2003-08-TheFirstDraft.pdf
What is Microcomputer
Microcomputer
¾ A computer with a microprocessor (µP) as its central processing
unit (CPU)
Microprocessor (µP)
¾ A digital electronic component with transistors on a single
semiconductor integrated circuit (IC)
¾ One or more microprocessors typically serve as a central
processing unit (CPU) in a computer system or handheld device.
Components of Microcomputer
Basic Microcomputer Design
[Figure: the Central Processor Unit (registers, ALU, CU, clock), the Memory Storage Unit, and I/O Devices #1 and #2, connected by the data bus, control bus, and address bus]
CPU
Arithmetic and logic unit (ALU) performs arithmetic (add, subtract) and
logical (AND, OR, NOT) operations
Registers store data and instructions used by the processor
Control unit (CU) coordinates sequence of execution steps
¾ Fetch instructions from memory, decode them to find their types
Clock
Datapath consists of registers and ALU(s)
[Figure: datapath and ALU input of a RISC processor]
Clock
Provide timing signal and the basic unit of time
Synchronize all CPU and BUS operations
Machine (clock) cycle measures time of a single operation
Clock is used to trigger events
Clock period = 1 / clock frequency, e.g., 1 GHz → clock cycle = 1 ns
An instruction could take multiple cycles to complete, e.g., multiply in
8088 takes 50 cycles
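The relation between clock frequency, clock period, and instruction time can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 5 MHz clock used for the 50-cycle multiply is an assumed value, not one given in the slides.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Clock period in nanoseconds: period = 1 / frequency."""
    return 1e9 / frequency_hz


def instruction_time_ns(cycles: int, frequency_hz: float) -> float:
    """Time taken by an instruction that needs `cycles` clock cycles."""
    return cycles * clock_period_ns(frequency_hz)


# 1 GHz clock: one cycle lasts 1 ns.
print(clock_period_ns(1e9))
# A 50-cycle multiply on an assumed (hypothetical) 5 MHz clock.
print(instruction_time_ns(50, 5e6))
```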
Memory, I/O, System Bus
Main/primary memory (random access memory, RAM)
stores both program instructions and data
I/O devices
¾ Interface: I/O controller
¾ User interface: keyboard, display screen, printer, modem, …
¾ Secondary storage: disk
¾ Communication network
System Bus
¾ A bunch of parallel wires
¾ Transfer data among the components
¾ Address bus (determine the amount of physical memory addressable)
¾ Data bus (indicate the size of the data transferred)
¾ Control bus (consists of control signals:
memory/IO read/write, interrupt, bus request/grant)
Instruction Execution Cycle
Execution Cycle
¾ Fetch (IF): CU fetches next instruction, advance PC/IP
¾ Decode (ID): CU determines what the instruction will do
¾ Execute
Fetch operands (OF): (memory operand needed) read value from memory
Execute the instruction (IE)
Store output operand (WB): (memory operand needed) write result to
memory
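The five steps of the execution cycle can be mimicked by a short loop. The sketch below assumes a toy machine invented for illustration: instructions are `(opcode, dest, src1, src2)` tuples, memory is a Python dict keyed by variable name, and only `add` and `sub` exist. It is not a model of any real processor.

```python
def run(program, memory):
    """Run a toy program and return a trace of (pc, opcode, result)."""
    pc = 0                                   # program counter (PC/IP)
    trace = []
    while pc < len(program):
        instr = program[pc]                  # IF: fetch the next instruction
        pc += 1                              #     and advance the PC
        op, dest, src1, src2 = instr         # ID: decode the instruction
        a, b = memory[src1], memory[src2]    # OF: fetch memory operands
        if op == "add":                      # IE: execute the instruction
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown opcode: {op}")
        memory[dest] = result                # WB: store the output operand
        trace.append((pc, op, result))
    return trace


memory = {"x": 7, "y": 5, "z": 0}
print(run([("add", "z", "x", "y"), ("sub", "z", "z", "y")], memory))
print(memory["z"])
```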
Instruction Execution Cycle (cont.)
[Figure: instruction execution cycle. Stages: fetch, decode, fetch operands, execute, store output. Elements shown: PC, program (I-1 to I-4), memory (op1, op2), registers, instruction register, ALU, flags]
Introduction to Digital Logic Design
¾ See asm_ch2_dl.ppt
CPU
CPU
CISC vs. RISC
6 Instruction Set Design Issues
¾ Number of Addresses
¾ Flow of Control
¾ Operand Types
¾ Addressing Modes
¾ Instruction Types
¾ Instruction Formats
Processor
RISC and CISC designs
¾ Reduced Instruction Set Computer (RISC)
Simple instructions, small instruction set
Operands are assumed to be in processor registers
Not in memory
Simplify design (e.g., fixed instruction size)
Examples: ARM (Advanced RISC Machines),
DEC Alpha (now Compaq)
¾ Complex Instruction Set Computer (CISC)
Complex instructions, large instruction set
¾ 2-address machines
One address doubles as source and result
¾ 1-address machine
Accumulator machines
¾ 0-address machines
Stack machines
Result goes onto the stack
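The stack-machine idea can be sketched in Python for a statement such as A = B + C*D - E + F + A. The `push` and `pop` mnemonics are assumptions standing in for the load and store instructions that carry an address; the arithmetic instructions carry no address at all and work on the top two stack entries.

```python
def run_stack_machine(program, memory):
    """Interpret a toy zero-address program; memory is a dict of variables."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mult": lambda x, y: x * y,
    }
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":                 # push the value stored at an address
            stack.append(memory[instr[1]])
        elif op == "pop":                # pop the top of stack into an address
            memory[instr[1]] = stack.pop()
        else:                            # zero-address arithmetic
            y = stack.pop()              # top of stack is the second operand
            x = stack.pop()
            stack.append(ops[op](x, y))  # result goes back onto the stack
    return memory


program = [
    ("push", "B"), ("push", "C"), ("push", "D"), ("mult",), ("add",),
    ("push", "E"), ("sub",), ("push", "F"), ("add",),
    ("push", "A"), ("add",), ("pop", "A"),
]
memory = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
print(run_stack_machine(program, memory)["A"])
```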
Number of Addresses (cont.)
Three-address machines
¾ Two for the source operands, one for the result
¾ RISC processors use three addresses
¾ Sample instructions
add dest,src1,src2 ; M(dest)=[src1]+[src2]
sub dest,src1,src2 ; M(dest)=[src1]-[src2]
mult dest,src1,src2 ; M(dest)=[src1]*[src2]
Number of Addresses (cont.)
Example
¾ C statement
A=B+C*D–E+F+A
¾ Equivalent code:
mult T,C,D ;T = C*D
add T,T,B ;T = B+C*D
sub T,T,E ;T = B+C*D-E
add T,T,F ;T = B+C*D-E+F
add A,T,A ;A = B+C*D-E+F+A
Number of Addresses (cont.)
Two-address machines
¾ One address doubles (for source operand & result)
¾ Last example makes a case for it
¾ Sample instructions
A=B+C*D–E+F+A
¾ Equivalent code:
load T,C ;T = C
mult T,D ;T = C*D
add T,B ;T = B+C*D
sub T,E ;T = B+C*D-E
add T,F ;T = B+C*D-E+F
add A,T ;A = B+C*D-E+F+A
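One way to see why one address can double as source and result is to lower three-address instructions to two-address form mechanically. The sketch below is an illustration only: the tuple encoding and the `to_two_address` helper are assumptions, not part of the course material.

```python
COMMUTATIVE = {"add", "mult"}


def to_two_address(instr):
    """Lower one (op, dest, src1, src2) instruction to two-address form."""
    op, dest, src1, src2 = instr
    if dest == src1:
        # dest already holds the left operand: op dest,src2
        return [(op, dest, src2)]
    if dest == src2:
        if op in COMMUTATIVE:
            # Operand order does not matter, so dest can stay on the left.
            return [(op, dest, src1)]
        raise ValueError("needs a temporary: dest is the right operand")
    # Otherwise copy the left operand into dest first.
    return [("load", dest, src1), (op, dest, src2)]


three_address = [
    ("mult", "T", "C", "D"),
    ("add", "T", "T", "B"),
    ("sub", "T", "T", "E"),
    ("add", "T", "T", "F"),
    ("add", "A", "T", "A"),
]
two_address = [out for instr in three_address for out in to_two_address(instr)]
for instr in two_address:
    print(instr)
```

Applied to the three-address form of the example statement, this yields the same six-instruction two-address sequence listed above.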
Number of Addresses (cont.)
One-address machines
¾ Use special set of registers called accumulators
Specify one source operand & receive the result
¾ Sample instructions
A=B+C*D–E+F+A
¾ Equivalent code:
load C ;load C into accum
mult D ;accum = C*D
add B ;accum = C*D+B
sub E ;accum = B+C*D-E
add F ;accum = B+C*D-E+F
add A ;accum = B+C*D-E+F+A
store A ;store accum contents in A
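The accumulator program above can be traced with a small interpreter. This is a sketch under assumptions: a single accumulator, memory modeled as a Python dict, and sample values for A to F chosen only for illustration.

```python
def run_accumulator(program, memory):
    """Interpret a toy one-address program with a single accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]       # accum = [addr]
        elif op == "store":
            memory[addr] = acc       # [addr] = accum
        elif op == "add":
            acc += memory[addr]
        elif op == "sub":
            acc -= memory[addr]
        elif op == "mult":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


program = [
    ("load", "C"), ("mult", "D"), ("add", "B"), ("sub", "E"),
    ("add", "F"), ("add", "A"), ("store", "A"),
]
memory = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
print(run_accumulator(program, memory)["A"])
```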
Number of Addresses (cont.)
Zero-address machines
¾ Stack supplies operands and receives the result
Special instructions to load and store use an address
¾ Sample instructions
load Rd,addr ;Rd = [addr]
store addr,Rs ;(addr) = Rs
add Rd,Rs1,Rs2 ;Rd = Rs1 + Rs2
sub Rd,Rs1,Rs2 ;Rd = Rs1 - Rs2
mult Rd,Rs1,Rs2 ;Rd = Rs1 * Rs2
Number of Addresses (cont.)
Example
¾ C statement
A = B + C * D – E + F + A
¾ Equivalent code:
load R1,B
load R2,C
load R3,D
load R4,E
load R5,F
load R6,A
mult R2,R2,R3
add R2,R2,R1
sub R2,R2,R4
add R2,R2,R5
add R2,R2,R6
store A,R2
Flow of Control
Default is sequential flow
Several instructions alter this default execution
¾ Branches
Unconditional
Conditional
Delayed branches
¾ Procedure calls
Delayed procedure calls
Flow of Control (cont.)
Branches
¾ Unconditional
Absolute address
PC-relative
j target
PC-relative
b target
Flow of Control (cont.)
[Figure: two branch examples, one labeled "e.g., Pentium" and one labeled "e.g., SPARC"]
Flow of Control (cont.)
Branches
¾ Conditional
Jump is taken only if the condition is met
¾ Two types
Set-Then-Jump
Pentium
uses ret instruction
MIPS
uses jr instruction
Return address
In a (special) register
MIPS allows any general-purpose register
On the stack
Pentium
Flow of Control (cont.)
Flow of Control (cont.)
Delay slot
Parameter Passing
Two basic techniques
¾ Register-based (e.g., PowerPC, MIPS)
Internal registers are used
Faster
Limit the number of parameters
Recursive procedure
¾ Stack-based (e.g., Pentium)
Stack is used
More general
Operand Types
Instructions support basic data types
¾ Characters
¾ Integers
¾ Floating-point
Instruction overload
¾ Same instruction for different data types
¾ Example: Pentium
mov AL,address ;loads an 8-bit value
mov AX,address ;loads a 16-bit value
mov EAX,address ;loads a 32-bit value
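The effect of the operand size can be illustrated by reading 1, 2, or 4 bytes from the same address. The sketch assumes little-endian byte order (as on the Pentium) and uses a hypothetical `load` helper and a made-up four-byte memory image.

```python
def load(memory: bytes, address: int, size: int) -> int:
    """Read `size` bytes at `address` as a little-endian unsigned integer."""
    return int.from_bytes(memory[address:address + size], "little")


# Hypothetical memory contents starting at address 0.
memory = bytes([0x78, 0x56, 0x34, 0x12])

al = load(memory, 0, 1)    # like mov AL,address  (8-bit value)
ax = load(memory, 0, 2)    # like mov AX,address  (16-bit value)
eax = load(memory, 0, 4)   # like mov EAX,address (32-bit value)
print(hex(al), hex(ax), hex(eax))
```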
Operand Types
Separate instructions
¾ Instructions specify the operand size
¾ Example: MIPS
lb Rdest,address ;loads a byte
lh Rdest,address ;loads a halfword (16 bits)
lw Rdest,address ;loads a word (32 bits)
ld Rdest,address ;loads a doubleword (64 bits)
Similar instruction: store
Addressing Modes
How the operands are specified
¾ Operands can be in three places
Registers
instructions
Indirect data movement
add Rdest,Rsrc,0 ;Rdest = Rsrc+0
¾ Arithmetic and Logical
Arithmetic
Integer and floating-point, signed and unsigned
add, subtract, multiply, divide
Logical
and, or, not, xor
Instruction Types (cont.)
Condition code bits
¾ S: Sign bit (0 = +, 1= -)
¾ Z: Zero bit (0 = nonzero, 1 = zero)
¾ O: Overflow bit (0 = no overflow, 1 = overflow)
¾ C: Carry bit (0 = no carry, 1 = carry)
Example: Pentium
cmp count,25 ;compare count to 25
;subtract 25 from count
je target ;jump if equal
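How a compare sets the condition code bits can be sketched by doing the subtraction in Python and deriving S, Z, O, and C from the result. The `cmp_flags` helper is hypothetical, and the sketch assumes the x86 convention that the carry bit reports a borrow on subtraction; `je` is then taken when Z = 1.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Condition code bits after computing a - b on `bits`-wide operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "S": int(bool(result & sign)),                 # sign of the result
        "Z": int(result == 0),                         # result is zero
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the sign of the first operand.
        "O": int(bool((a ^ b) & (a ^ result) & sign)),
        "C": int(a < b),                               # unsigned borrow
    }


flags = cmp_flags(25, 25)          # cmp count,25 with count = 25
je_taken = flags["Z"] == 1         # je target jumps only when Z = 1
print(flags, je_taken)
```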
Instruction Types (cont.)
¾ Flow control and I/O instructions
Branch
Procedure call
Interrupts
¾ I/O instructions
Memory-mapped I/O
Examples: SPARC, MIPS, PowerPC
¾ Variable-length
Used by CISC processors
Opcode
¾ Major and exact operation
Examples of Instruction Formats
How Hardware Executes
Processor’s Instructions
Operating System: Level 3
Instruction Set Architecture: Level 2
Microarchitecture: Level 1
[Figure: basic microcomputer design, repeated: CPU (registers, ALU, CU, clock), Memory Storage Unit, and I/O Devices #1 and #2 on the data, control, and address buses]
Consider 1-bus Datapath
Assume all entities are
32-bit wide
1-bit ALU
ALU Circuit in 1-bus Datapath
Memory Interface Implementation
Microprogrammed Control
32 32-bit general-purpose registers
¾ Interface only with the A-bus
¾ Each register has two control signals
Gxin and Gxout
Control signals used by the other registers
¾ PC register:
PCin, PCout, and PCbout
¾ IR register:
IRout and IRbin
¾ MAR register:
MARin, MARout, and MARbout
¾ MDR register:
MDRin, MDRout, MDRbin and MDRbout
Microprogrammed Control (cont.)
add %G9,%G5,%G7
Implemented as
Transfer G5 contents to A register
Read: IRbin;
¾ Register
Arithmetic and logic instructions
¾ Branch
Jump direct/indirect
¾ Call
Procedure invocation mechanisms
¾ More…
Microprogrammed Control (cont.)
High-level FSM
for instruction
execution
expensive
Example
add %G9,%G5,%G7
¾ Three steps
S1 G5out: Ain;
S2 G7out: ALU=add: Cin;
S3 Cout: G9in: end;
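The three control steps can be walked through in Python. The sketch assumes, as the step listing suggests, that the ALU takes one input from the A latch and the other from the bus, that its output is captured in the C latch, and that all entities are 32 bits wide; the function and dictionary names are illustrative only.

```python
def add_on_one_bus(regs, dest, src1, src2):
    """Run steps S1..S3 for `add dest,src1,src2` on a single shared bus."""
    latches = {"A": 0, "C": 0}

    # S1  src1out: Ain      (source register drives the bus, A latches it)
    bus = regs[src1]
    latches["A"] = bus

    # S2  src2out: ALU=add: Cin   (ALU adds A and the bus value, C latches it)
    bus = regs[src2]
    latches["C"] = (latches["A"] + bus) & 0xFFFFFFFF

    # S3  Cout: destin      (C drives the bus, destination register latches it)
    bus = latches["C"]
    regs[dest] = bus
    return regs


regs = {"G5": 7, "G7": 35, "G9": 0}
print(add_on_one_bus(regs, "G9", "G5", "G7")["G9"])
```

Each step uses the bus exactly once, which is why the single-bus design needs three steps for one addition.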
Microprogrammed Control (cont.)
Simple
microcode
organization
Microprogrammed Control (cont.)
Uses a microprogram to generate the control
signals
¾ Encode the signals of each step as a codeword
Called microinstruction
Microprogram essentially implements the FSM
discussed before
Microprogrammed Control (cont.)
A simple microcontroller can execute a
microprogram to generate the control signals
¾ Control store
Store microprogram
¾ Use μPC
Similar to PC
¾ Address generator
Generates appropriate address depending on the
Opcode, and
Condition code inputs
Microprogrammed Control (cont.)
Microcontroller
Vertical organization
¾ Horizontal organization
One bit for each signal
Very flexible
Long microinstructions
Example: 1-bus datapath
Needs 90 bits for each microinstruction
Microprogrammed Control (cont.)
Horizontal
microinstruction
format
Microprogrammed Control (cont.)
Vertical organization
¾ Encodes to reduce microinstruction length
Reduced flexibility
¾ Example:
Horizontal organization
64 control signals for the 32 general-purpose registers
Vertical organization
5 bits to identify the register and 1 for in/out
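The bit counts in this example follow from simple arithmetic, sketched below. The helper names are hypothetical, and `decode_vertical` shows the decoding that a vertical field needs before it can drive the individual in/out control lines.

```python
def horizontal_bits(num_registers: int) -> int:
    """One 'in' bit and one 'out' bit per register."""
    return 2 * num_registers


def vertical_bits(num_registers: int) -> int:
    """Encoded register number plus one bit selecting in or out."""
    return (num_registers - 1).bit_length() + 1


def decode_vertical(reg_number: int, is_out: bool, num_registers: int = 32):
    """Expand an encoded field into one-hot (in_signals, out_signals)."""
    in_signals = [0] * num_registers
    out_signals = [0] * num_registers
    target = out_signals if is_out else in_signals
    target[reg_number] = 1
    return in_signals, out_signals


print(horizontal_bits(32), vertical_bits(32))
ins, outs = decode_vertical(5, is_out=True)   # asserts only G5out
print(outs.index(1), sum(ins))
```

The encoded form is shorter, but it can name only one register signal per field, which is the reduced flexibility noted above.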
2-bus Datapath
Microprogrammed Control (cont.)
Adding more buses reduces time needed to
execute instructions
¾ No need to multiplex the bus
Example
add %G9,%G5,%G7
¾ Needed three steps in 1-bus datapath
¾ Need only two steps with a 2-bus datapath
S1 G5out: Ain;
S2 G7out: ALU=add: G9in;
Pipelining
Pipelining
Introduction
3 Hazards
¾ Resource, Data, and Control Hazards
3 Technologies for Performance Improvement
¾ Superscalar, Superpipelined, and Very Long Instruction
Word
Serial and Pipelining
pipeline
Pipelining (cont.)
Some reasons for unequal work stages
¾ A complex step cannot be subdivided conveniently
¾ An operation takes variable amount of time to execute
EX: Operand fetch time depends on where the operands
are located
Registers
Cache
Memory
¾ Complexity of operation depends on the type of operation
Add: may take one cycle
Multiply: may take several cycles
l
Pipeline Stall
Operand fetch of I2 takes three cycles
¾ Pipeline stalls for two cycles
Caused by hazards
resource
Also called structural hazards
¾ Data hazards
Caused by data dependencies between instructions
Example: Result produced by I1 is read by I2
¾ Control hazards
Default: sequential execution suits pipelining
¾ Superpipelined
Increases the pipeline depth
Ex: Pentium
Wasted Cycles (pipelined)
When one of the stages requires two or more clock cycles,
clock cycles are again wasted.
[Figure: pipeline timing table, stages S1 to S6 (with exe marked) against cycles, showing I-1, I-2, and I-3 entering in successive cycles]
For k stages and n instructions, the number of required
cycles is:
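The effect of a multi-cycle stage on total cycle count can be explored with a small simulation. This is a sketch under an assumed, simplified model: instructions issue in order, each stage handles one instruction at a time, and an instruction can wait between stages without blocking the stage behind it. The function name and the six-stage configuration are illustrative.

```python
def pipeline_cycles(stage_cycles, n):
    """Cycles needed to run n instructions through an in-order pipeline.

    stage_cycles[s] is the number of clock cycles stage s needs
    for every instruction.
    """
    done = [0] * len(stage_cycles)   # cycle at which each stage last finished
    for _ in range(n):
        t = 0                        # when this instruction left the previous stage
        for s, cycles in enumerate(stage_cycles):
            start = max(t, done[s])  # wait until the stage is free
            t = start + cycles
            done[s] = t
    return done[-1] if n else 0


# Six one-cycle stages, three instructions.
print(pipeline_cycles([1, 1, 1, 1, 1, 1], 3))
# Same pipeline, but the fourth stage (execute) needs two cycles.
print(pipeline_cycles([1, 1, 1, 2, 1, 1], 3))
```

With every stage taking one cycle, the model reproduces the usual k + (n - 1) count; lengthening a single stage makes every later instruction queue behind it, so the extra cost grows with n.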
Almost all processors support this
Memory address space
¾ Determined by the address bus width
¾ Pentium has a 32-bit address bus
address space = 4GB (2^32)
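The 4GB figure follows directly from the bus width. A minimal check, assuming byte-addressable memory (one address per byte) and a hypothetical helper name:

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits


GIB = 2 ** 30
# 32-bit address bus: 2^32 bytes, expressed in units of 2^30 bytes.
print(address_space_bytes(32) // GIB)
```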
A 4 X 3 memory design
using D flip-flops
Building a Memory Block (cont’d)
Block diagram representation of a 4x3 memory
Address
Data
Control signals
¾ Read
¾ Write
Building Larger Memories
2 X 16 memory module using 74373 chips
Designing Larger Memories
64M X 32 memory using 16M X 16 chips
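The number of chips such a design needs can be worked out from the two size ratios. The sketch below uses a hypothetical `chips_needed` helper and assumes both dimensions divide evenly.

```python
def chips_needed(target_words, target_width, chip_words, chip_width):
    """Return (rows, columns, total chips) for a memory built from smaller chips."""
    if target_words % chip_words or target_width % chip_width:
        raise ValueError("target size must be a multiple of the chip size")
    rows = target_words // chip_words     # chip groups selected by high address bits
    columns = target_width // chip_width  # chips side by side to widen the word
    return rows, columns, rows * columns


M = 2 ** 20
# 64M X 32 memory from 16M X 16 chips.
print(chips_needed(64 * M, 32, 16 * M, 16))
```

Here two chips side by side form each 32-bit word, and four such rows cover the 64M addresses.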
Alignment of Data
Programmer managed
¾ Virtual memory
Automates overlay management
¾ Set-associative mapping
Specifies a set of cache lines for each memory block
¾ Associative mapping
No restrictions
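The difference between the mapping schemes comes down to which cache lines a given memory block may occupy. The sketch below assumes the common convention that the set is chosen as block number modulo the number of sets; the `candidate_lines` helper and the 8-line cache are made up for illustration.

```python
def candidate_lines(block: int, num_lines: int, ways: int) -> list:
    """Cache lines that memory block `block` is allowed to occupy.

    ways = number of lines per set; ways == num_lines gives fully
    associative mapping, where any line may be used.
    """
    num_sets = num_lines // ways
    set_index = block % num_sets          # assumed modulo placement
    first = set_index * ways
    return list(range(first, first + ways))


print(candidate_lines(13, num_lines=8, ways=2))   # 2-way set-associative
print(candidate_lines(13, num_lines=8, ways=8))   # associative: no restrictions
```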
Most processors use this I/O mapping
¾ Isolated I/O
Separate I/O address space
transfers
Typically used for bulk data transfer
¾ Interrupt-driven I/O
Interrupts are used to initiate and/or terminate data
transfers
Powerful technique
Handles unanticipated transfers
Interconnection
System components are interconnected by buses
¾ Bus: a bunch of parallel wires
Uses several buses at various levels
¾ On-chip buses
Buses to interconnect ALU and registers
A, B, and C buses in our example
Data and address buses to connect on-chip caches
¾ Internal buses
PCI, AGP, PCMCIA
¾ External buses
Serial, parallel, USB, IEEE 1394 (FireWire)
PC System Buses
ISA (Industry Standard Architecture)
PCI (Peripheral Component Interconnect)
AGP (Accelerated Graphics Port)
Interconnection (cont.)
Bus is a shared resource
¾ Bus transactions
Sequence of actions to complete a well-defined
activity
Involves a master and a slave
operations
¾ Bus arbitration
Summary
General Concepts of Computer Organization
¾ Overview of Microcomputer
CPU, Memory, I/O
Instruction Execution Cycle
¾ Central Processing Unit (CPU)
CISC vs. RISC
6 Instruction Set Design Issues
¾ How Hardware Executes Processor’s Instructions
Digital Logic Design (Combinational & Sequential Circuits)
Microprogrammed Control
¾ Pipelining
3 Hazards
3 Technologies for Performance Improvement
¾ Memory
Data Alignment
2 Design Issues (Cache, Virtual Memory)
¾ I/O Devices