Hardware/Software Introduction
Chapter 1: Introduction
Outline
Embedded systems overview
What are they?
PCs
Laptops
Mainframes
Servers
Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCRs, DVD players
Video game consoles
Video phones
Washers and dryers
Tightly-constrained
Low cost, low power, small, fast, etc.
[Figure: digital camera block diagram: lens, A2D, D2A, pixel coprocessor, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, display controller, UART, LCD controller]
Design metric
A measurable feature of a system's implementation
Optimizing design metrics is a key challenge
Maintainability: the ability to modify the system after its initial release
Correctness, safety, many more
[Figure: competing design metrics (power, performance, size, NRE cost), shown next to the digital camera block diagram (CCD, CCD preprocessor, pixel coprocessor, and so on); both hardware and software are involved]
Average time-to-market constraint is about 8 months
Delays can be costly
[Figure: simplified revenue model, revenues ($) versus time. On-time entry: the market rises to peak revenue at time W and falls until time 2W. Delayed entry starts D later and reaches a lower peak revenue. The area between the on-time and delayed triangles is the revenue loss]
On-time revenue (area of the on-time triangle) = 1/2 * 2W * W
Delayed revenue (area of the delayed triangle) = 1/2 * (W-D+W) * (W-D)
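The two triangle areas above can be turned into a fractional revenue loss. The following is a minimal sketch, not part of the original slides; the function name `revenue_loss_fraction` is hypothetical, and it assumes 0 <= D <= W.

```c
/* Fraction of on-time revenue lost by entering the market D time units late,
 * using the triangle model: on-time = 1/2 * 2W * W,
 * delayed = 1/2 * (W - D + W) * (W - D). Assumes 0 <= D <= W. */
double revenue_loss_fraction(double W, double D)
{
    double on_time = 0.5 * (2.0 * W) * W;
    double delayed = 0.5 * (W - D + W) * (W - D);
    /* Algebraically this equals D * (3W - D) / (2 * W * W). */
    return (on_time - delayed) / on_time;
}
```

As an illustration with assumed numbers, W = 26 weeks and D = 4 weeks give a loss of about 0.22, that is, roughly 22% of the on-time revenue.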
Example
NRE=$2000, unit=$100
For 10 units
total cost = $2000 + 10*$100 = $3000
per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an
additional $200 per unit
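The arithmetic in this example can be sketched in a few lines of C. The function names are hypothetical and only restate the two formulas above; `units` is assumed to be greater than zero.

```c
/* Total cost = NRE cost + unit cost * number of units. */
long total_cost(long nre, long unit_cost, long units)
{
    return nre + unit_cost * units;
}

/* Per-product cost = total cost / units = NRE / units + unit cost.
 * Assumes units > 0. */
double per_product_cost(long nre, long unit_cost, long units)
{
    return (double)total_cost(nre, unit_cost, units) / (double)units;
}
```

With NRE = 2000, unit cost = 100 and 10 units, these return 3000 and 300, matching the figures above.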
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
[Figure: total cost ($0 to $200,000) and per-product cost ($0 to $200) versus number of units (0 to 2400) for technologies A, B, and C]
Throughput
Tasks per second, e.g. Camera A processes 4 images per second
Throughput can be more than latency seems to imply due to concurrency, e.g.
Camera B may process 8 images per second (by capturing a new image while
previous image is being stored).
Processor technology
The architecture of the computation engine used to implement a
system's desired functionality
Processor does not have to be programmable
"Processor" is not the same as general-purpose processor
[Figure: controller and datapath of three processor types, each implementing "total = 0; for i = 1 to ...". General-purpose: control logic and state register, IR, PC, register file, general ALU, program memory holding the assembly code, data memory. Application-specific: control logic and state register, IR, PC, registers, custom ALU, program memory holding the assembly code, data memory. Single-purpose (hardware): control logic, state register, index and total registers, adder, data memory]
Processor technology
Processors vary in their customization for the problem at hand
Desired functionality:
total = 0
for i = 1 to N loop
   total += M[i]
end loop
[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]
General-purpose processors
Programmable device used in a variety of
applications
Also known as microprocessor
Features
Program memory
General datapath with large register file and
general ALU
User benefits
Low time-to-market and NRE costs
High flexibility
[Figure: general-purpose processor: controller (control logic and state register, IR, PC), datapath (register file, general ALU), program memory holding the assembly code for "total = 0; for i = 1 to ...", data memory]
Single-purpose processors
Digital circuit designed to execute exactly
one program
a.k.a. coprocessor, accelerator or peripheral
Features
Contains only the components needed to
execute a single program
No program memory
[Figure: single-purpose processor: controller (control logic, state register), datapath (index, total), data memory]
Benefits
Fast
Low power
Small size
Application-specific processors
Programmable processor optimized for a
particular class of applications having
common characteristics
Compromise between general-purpose and
single-purpose processors
Features
Program memory
Optimized datapath
Special functional units
[Figure: application-specific processor: controller (control logic and state register, IR, PC), datapath (registers, custom ALU), program memory holding the assembly code for "total = 0; for i = 1 to ...", data memory]
Benefits
Some flexibility, good performance, size and
power
IC technology
The manner in which a digital (gate-level)
implementation is mapped onto an IC
IC: Integrated circuit, or chip
IC technologies differ in their customization to a design
ICs consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer and
when
[Figure: IC package, IC, and transistor cross-section: source, gate, oxide, channel, drain, silicon substrate]
IC technology
Three types of IC technologies
Full-custom/VLSI
Semi-custom ASIC (gate array and standard cell)
PLD (Programmable Logic Device)
Full-custom/VLSI
All layers are optimized for an embedded system's particular digital implementation
Placing transistors
Sizing transistors
Routing wires
Benefits
Excellent performance, small size, low power
Drawbacks
High NRE cost (e.g., $300k), long time-to-market
Semi-custom
Lower layers are fully or partially built
Designers are left with routing of wires and maybe placing
some blocks
Benefits
Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
Drawbacks
Still require weeks to months to develop
PLD (Programmable Logic Device)
Benefits
Low NRE costs, almost instant IC availability
Drawbacks
Bigger, expensive (perhaps $30 per unit), power hungry,
slower
Moore's law
The most important trend in embedded systems
Predicted in 1965 by Intel co-founder Gordon Moore
IC transistor capacity has doubled roughly every 18 months
for the past several decades
[Figure: logic transistors per chip (in millions) versus year; note: logarithmic scale]
Moore's law
Wow
This growth rate is hard to imagine; most people underestimate it
How many ancestors do you have from 20 generations ago?
i.e., roughly how many people alive in the 1500s did it take to make you?
2^20 = more than 1 million people
[Figure: graphical illustration of Moore's law, 1981 to 2002: a leading-edge chip in 1981 held 10,000 transistors; a leading-edge chip in 2002 held 150,000,000 transistors]
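The doubling rule quoted earlier (capacity doubling roughly every 18 months) can be checked against the 1981 and 2002 chip sizes with a small sketch. The function name is hypothetical, and counting only whole 18-month periods is an assumption made to keep the arithmetic in integers.

```c
/* Projected transistor count after `months`, assuming one doubling per
 * 18 months. Only whole doublings are counted; partial periods are ignored. */
unsigned long long projected_transistors(unsigned long long start, unsigned months)
{
    unsigned doublings = months / 18u;
    unsigned long long n = start;
    for (unsigned i = 0; i < doublings; i++) {
        n *= 2ULL;
    }
    return n;
}
```

Starting from 10,000 transistors and projecting 21 years (252 months, 14 doublings) gives 163,840,000, which is of the same order as the 150,000,000 transistors cited for 2002.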
Design Technology
The manner in which we convert our concept of
desired system functionality into an implementation
Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.

Specification level       | Compilation/Synthesis | Libraries/IP  | Test/Verification
System specification      | System synthesis      | Hw/Sw/OS      | Model simulators/checkers
Behavioral specification  | Behavior synthesis    | Cores         | Hw-Sw cosimulators
RT specification          | RT synthesis          | RT components | HDL simulators
Logic specification       | Logic synthesis       | Gates/Cells   | Gate simulators
To final implementation
[Figure: design productivity in thousands of transistors per staff-month versus year (1983 to 2009), logarithmic scale]
Hardware/software codesign
[Figure: the co-design ladder. Software side: compilers (1960s, 1970s), assembly instructions, assemblers and linkers (1950s, 1960s), machine instructions, ending in a microprocessor plus program bits (software). Hardware side: register transfers, RT synthesis (1980s, 1990s), logic synthesis (1970s, 1980s), logic gates. Both sides end in an implementation]
The choice of hardware versus software for a particular function is simply a tradeoff among various
design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no
fundamental difference between what hardware or software can implement.
[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) and IC technologies (PLD, semi-custom, full-custom), ordered from general to customized]
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
Customized, providing improved: power efficiency, performance, size, cost (high volume)
[Figure: design productivity gap: IC capacity (logic transistors per chip, in millions) and designer productivity (thousands of transistors per staff-month) versus time on logarithmic scales; the gap is the distance between the IC capacity curve and the productivity curve]
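Example: a 1M-transistor design; 1 designer working alone produces 5000 transistors/month
Each additional designer reduces individual productivity by 100 transistors/month
So 2 designers produce 4900 transistors/month each
[Figure: team productivity and individual productivity (0 to 60000 transistors/month) and months until completion versus number of designers (0 to 40); plotted completion times: 43, 24, 19, 16, 15, 16, 18, 23 months]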
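The productivity model in this example can be written out as a short sketch. The function names are hypothetical; the constants 5000 and 100 are the ones stated in the example, and the model is only meaningful while team productivity stays positive.

```c
/* Transistors per month produced by each designer in a team of n (n >= 1):
 * 5000 for a lone designer, minus 100 for every additional team member. */
long individual_productivity(int n)
{
    return 5000L - 100L * (n - 1);
}

/* Transistors per month produced by the whole team. */
long team_productivity(int n)
{
    return n * individual_productivity(n);
}

/* Months needed to finish a design of the given size.
 * Assumes team_productivity(n) > 0. */
double months_to_complete(long transistors, int n)
{
    return (double)transistors / (double)team_productivity(n);
}
```

For a 1,000,000-transistor design this gives roughly 24.4 months with 10 designers, 15.4 months with 25, and 22.7 months with 40, so adding designers beyond a certain point lengthens the schedule.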
Summary
Embedded systems are everywhere
Key challenge: optimization of design metrics
Design metrics compete with one another
Outline
Anatomy of integrated circuits
Full-Custom (VLSI) IC Technology
Semi-Custom (ASIC) IC Technology
Programmable Logic Device (PLD) IC Technology
CMOS transistor
Source, Drain
Diffusion area where electrons can flow
Can be connected to metal contacts (vias)
Gate
Polysilicon area where control voltage is applied
Oxide
SiO2 insulator so the gate voltage can't leak
[Figure: IC package, IC, and transistor cross-section: source, gate, oxide, channel, drain, silicon substrate]
20 GHz+
FinFET has been manufactured to
18nm
Still acts as a very good transistor
NAND
Spin
One time through the manufacturing process
Photolithography
Drawing patterns by using photoresist to form barriers for deposition
Full Custom
Very Large Scale Integration (VLSI)
Placement
Place and orient transistors
Routing
Connect transistors
Sizing
Make fat, fast wires or thin, slow wires
May also need to size buffer
Design Rules
simple rules for correct circuit function
Metal/metal spacing, min poly width
Full Custom
Best size, power, performance
Hand design
Horrible time-to-market/flexibility/NRE cost
Reserve for the most important units in a processor
ALU, Instruction fetch
Semi-Custom
Gate Array
Standard Cell
Semi-Custom
Most popular design style
Jack of all trades
Good
Power, time-to-market, performance,
NRE cost, per-unit cost, area
Master of none
Integrate with full custom for
critical regions of design
PLD (Programmable Logic Device)
Benefits
Very low NRE costs
Great time to market
Drawbacks
High unit cost, bad for large volume
Power hungry (except special PLA)
Slower
Xilinx FPGA
[Figure: Xilinx FPGA architecture, including the I/O block]
Outline
Models vs. Languages
State Machine Model
FSM/FSMD
HCFSM and Statecharts Language
Program-State Machine (PSM) Model
Dataflow Model
Real-Time Systems
Introduction
Describing an embedded system's processing behavior
Can be extremely difficult
Complexity increasing with increasing IC capacity
Past: washing machines, small games, etc.
Hundreds of lines of code
Today: TV set-top boxes, Cell phone, etc.
Hundreds of thousands of lines of code
All that just for crossing the street (and there's much more)!
Dataflow model
For data dominated systems, transforms input data streams into output streams
Object-oriented model
For breaking complex software into simpler, well-defined pieces
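[Figure: models versus languages. A recipe or a story can be written in English, Spanish, or Japanese; likewise, models such as a state machine, a sequential program, or dataflow can be captured in languages such as C++ or Java. Example statements: X = 1; Y = X + 1]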
Simple elevator
controller
Request Resolver
resolves various floor
requests into single
requested floor
Unit Control moves
elevator to this requested
floor
[Figure: system interface. Request Resolver takes b1, b2, ..., bN (buttons inside elevator) and up1, up2, dn2, up3, dn3, ..., dnN (up/down buttons on each floor) and produces req. Unit Control takes req and floor and produces up, down, and open]
void RequestResolver()
{
while (1)
...
req = ...
...
}
void main()
{
Call concurrently:
UnitControl() and
RequestResolver()
}
Partial English description
Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Don't change directions unless there are no higher requests when moving up or no lower requests when moving down.
Try it...
[Figure: UnitControl state machine. Outputs per state (u,d,o,t): Idle = 0,0,1,0; GoingUp = 1,0,0,0; GoingDn = 0,1,0,0; DoorOpen = 0,0,1,1. Idle remains in Idle while req == floor. u is up, d is down, o is open, t is timer_start]
Formal definition
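An FSM is a 6-tuple F<S, I, O, F, H, s0>
S is the set of states, I the set of inputs, O the set of outputs, F the next-state function (S x I → S), H the output function, and s0 the initial state
Moore-type
Associates outputs with states (as given above, H maps S → O)
Mealy-type
Associates outputs with transitions (H maps S x I → O)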
FSMD extends FSM: complex data types and variables for storing data
FSMs use only Boolean data types and operations, no variables
I,O,V may represent complex data types (i.e., integers, floating point, etc.)
F,H may include arithmetic operations
H is an action function, not just an output function
Describes variable updates as well as outputs
Complete system state now consists of current state, si, and values of all variables
[Figure: UnitControl state machine with transition conditions req == floor, req < floor, !(req<floor), timer < 10; u is up, d is down, o is open, t is timer_start]
The conditions on the transitions leaving a state must be mutually exclusive; otherwise the result is a nondeterministic state machine
Despite benefits of state machine model, most popular development tools use
sequential programming language
C, C++, Java, Ada, VHDL, Verilog, etc.
Development tools are complex and expensive, therefore not easy to adapt or replace
Must protect investment
Drawback: must support additional tool (licensing costs, upgrades, training, etc.)
#define IDLE     0
#define GOINGUP  1
#define GOINGDN  2
#define DOOROPEN 3
/* req, floor and timer are inputs; up, down, open and timer_start are outputs */
void UnitControl() {
   int state = IDLE;
   while (1) {
      switch (state) {
         case IDLE: up=0; down=0; open=1; timer_start=0;
            if (req == floor)   {state = IDLE;}
            if (req > floor)    {state = GOINGUP;}
            if (req < floor)    {state = GOINGDN;}
            break;
         case GOINGUP: up=1; down=0; open=0; timer_start=0;
            if (req > floor)    {state = GOINGUP;}
            if (!(req > floor)) {state = DOOROPEN;}
            break;
         case GOINGDN: up=0; down=1; open=0; timer_start=0;
            if (req < floor)    {state = GOINGDN;}
            if (!(req < floor)) {state = DOOROPEN;}
            break;
         case DOOROPEN: up=0; down=0; open=1; timer_start=1;
            if (timer < 10)     {state = DOOROPEN;}
            if (!(timer < 10))  {state = IDLE;}
            break;
      }
   }
}
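Because UnitControl loops forever on shared signals, it is awkward to exercise in isolation. A minimal sketch, assuming a hypothetical split into a pure next-state function and a pure output function (the names `unit_control_step`, `unit_control_outputs`, `struct Outputs` and the parameter `cur_floor` are illustrative and not from the slides):

```c
enum State { IDLE, GOINGUP, GOINGDN, DOOROPEN };

/* Outputs depend on the current state only (Moore-type). */
struct Outputs { int up, down, open, timer_start; };

struct Outputs unit_control_outputs(enum State s)
{
    switch (s) {
    case GOINGUP:  return (struct Outputs){1, 0, 0, 0};
    case GOINGDN:  return (struct Outputs){0, 1, 0, 0};
    case DOOROPEN: return (struct Outputs){0, 0, 1, 1};
    case IDLE:
    default:       return (struct Outputs){0, 0, 1, 0};
    }
}

/* One transition of the state machine for the sampled inputs. */
enum State unit_control_step(enum State s, int req, int cur_floor, int timer)
{
    switch (s) {
    case IDLE:
        if (req > cur_floor) return GOINGUP;
        if (req < cur_floor) return GOINGDN;
        return IDLE;
    case GOINGUP:  return (req > cur_floor) ? GOINGUP : DOOROPEN;
    case GOINGDN:  return (req < cur_floor) ? GOINGDN : DOOROPEN;
    case DOOROPEN: return (timer < 10) ? DOOROPEN : IDLE;
    }
    return IDLE;
}
```

A driver loop would call `unit_control_step` once per sampling period and copy the result of `unit_control_outputs` to the actuators; separating the two keeps each transition easy to check by hand.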
General template
#define S0 0
#define S1 1
...
#define SN N
void StateMachine() {
   int state = S0; // or whatever is the initial state.
   while (1) {
      switch (state) {
         case S0:
            // Insert S0's actions here & insert transitions Ti leaving S0:
            if ( T0's condition is true ) {state = T0's next state; /*actions*/ }
            if ( T1's condition is true ) {state = T1's next state; /*actions*/ }
            ...
            if ( Tm's condition is true ) {state = Tm's next state; /*actions*/ }
            break;
         case S1:
            // Insert S1's actions here
            // Insert transitions Ti leaving S1
            break;
         ...
         case SN:
            // Insert SN's actions here
            // Insert transitions Ti leaving SN
            break;
      }
   }
}
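Statecharts
[Figure: hierarchy: states A1, A2, and B drawn with hierarchy and without hierarchy]
[Figure: concurrency: state B contains concurrent sub-machines C (states C1, C2) and D (states D1, D2) with transitions x, y, v; known as AND-decomposition]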
[Figure: UnitControl with FireMode, without hierarchy: each of Idle, GoingUp, GoingDn, and DoorOpen has its own fire transition to FireGoingDn (u,d,o = 0,1,0), which continues while floor>1 and moves to FireDrOpen (u,d,o = 0,0,1) when floor==1; !fire returns to normal operation]
[Figure: UnitControl with FireMode, with hierarchy: UnitControl contains NormalMode (Idle, GoingUp, GoingDn, DoorOpen, with timeout(10) leaving DoorOpen) and FireMode (FireGoingDn, FireDrOpen), connected by single fire and !fire transitions]
[Figure: ElevatorController contains UnitControl (NormalMode, FireMode) and RequestResolver as concurrent states]
ElevatorController
   int req;
   UnitControl
      NormalMode
         up = down = 0; open = 1;
         while (1) {
            while (req == floor);
            open = 0;
            if (req > floor) { up = 1;}
            else {down = 1;}
            while (req != floor);
            open = 1;
            delay(10);
         }
      FireMode (entered on fire, left on !fire)
         up = 0; down = 1; open = 0;
         while (floor > 1);
         up = 0; down = 0; open = 1;
   RequestResolver
      ...
      req = ...
      ...
To create state machine, we thought in terms of states and transitions among states
When system must react to changing inputs, state machine might be best model
HCFSM described FireMode easily, clearly
ConcurrentProcessExample() {
   x = ReadX()
   y = ReadY()
   Call concurrently:
      PrintHelloWorld(x) and
      PrintHowAreYou(y)
}
PrintHelloWorld(x) {
   while( 1 ) {
      print "Hello world."
      delay(x);
   }
}
PrintHowAreYou(y) {
   while( 1 ) {
      print "How are you?"
      delay(y);
   }
}
[Figure: subroutine execution over time: ReadX, ReadY, then PrintHelloWorld and PrintHowAreYou running concurrently]
Sample input and output:
Enter X: 1
Enter Y: 2
Hello world. (Time = 1 s)
Hello world. (Time = 2 s)
How are you? (Time = 2 s)
Hello world. (Time = 3 s)
How are you? (Time = 4 s)
Hello world. (Time = 4 s)
...
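Dataflow model
Example: Z = (A + B) * (C - D)
[Figure: dataflow graph: a + node consumes A and B and produces t1; a - node consumes C and D and produces t2; a * node consumes t1 and t2 and produces Z]
When all of a node's input edges have at least one token, the node may fire
When a node fires, it consumes input tokens, processes the transformation, and generates an output token
Nodes may fire simultaneously
Several commercial tools support graphical languages for capture of dataflow model
[Figure: the same graph structure with more complex transformations: modulate, convolve, transform]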
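The firing rule for Z = (A + B) * (C - D) can be sketched in C. This is an illustrative model, not a tool's implementation: the `Edge` type, the helper names, and the restriction that each edge holds at most one token are assumptions of the sketch.

```c
#include <stdbool.h>

/* An edge holds at most one token in this sketch. */
typedef struct { bool full; int value; } Edge;

typedef int (*Transform)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Fire a two-input node if both input edges hold a token and the output
 * edge is free: consume the inputs, apply the transformation, emit a token. */
static bool try_fire(Edge *in0, Edge *in1, Edge *out, Transform f)
{
    if (!in0->full || !in1->full || out->full) {
        return false;
    }
    out->value = f(in0->value, in1->value);
    out->full = true;
    in0->full = false;
    in1->full = false;
    return true;
}

/* Evaluate Z = (A + B) * (C - D) by firing nodes until none can fire. */
int dataflow_z(int a, int b, int c, int d)
{
    Edge A = {true, a}, B = {true, b}, C = {true, c}, D = {true, d};
    Edge t1 = {false, 0}, t2 = {false, 0}, Z = {false, 0};
    bool fired;
    do {
        fired = false;
        if (try_fire(&t1, &t2, &Z, mul)) fired = true;
        if (try_fire(&A, &B, &t1, add))  fired = true;
        if (try_fire(&C, &D, &t2, sub))  fired = true;
    } while (fired);
    return Z.value;
}
```

The multiply node is deliberately tried first to show that evaluation order does not matter: it cannot fire until both t1 and t2 carry tokens.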
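Synchronous dataflow
[Figure: synchronous dataflow graph with inputs A, B, C, D, intermediate edges t1 and t2, and output Z; the edges of the modulate, convolve, and transform nodes carry the annotations mA, mB, mC, mD, mt1, ct2, tt1, tt2, tZ]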
Concurrent processes
Consider two examples
having separate tasks running
independently but sharing
data
Difficult to write system
using sequential program
model
Concurrent process model
easier
Separate sequential
programs (processes) for
each task
Programs communicate with
each other
Heartbeat monitoring system (input: heart-beat pulse)
Task 1:
   Read pulse
   If pulse < Lo then Activate Siren
   If pulse > Hi then Activate Siren
   Sleep 1 second
   Repeat
Task 2:
   If B1/B2 pressed then Lo = Lo +/- 1
   If B3/B4 pressed then Hi = Hi +/- 1
   Sleep 500 ms
   Repeat
Set-top box (input: signal; outputs: video, audio)
Task 1:
   Read Signal
   Separate Audio/Video
   Send Audio to Task 2
   Send Video to Task 3
   Repeat
Task 2:
   Wait on Task 1
   Decode/output Audio
   Repeat
Task 3:
   Wait on Task 1
   Decode/output Video
   Repeat
Process
A sequential program, typically an infinite loop
Executes concurrently with other processes
We are about to enter the world of concurrent programming
Join
A process suspends until a particular child process finishes execution
Encoded video packets
void processA() {
   // Decode packet
   // Communicate packet to B
}
Decoded video packets
void processB() {
   // Get packet from A
   // Display packet
}
To display
Shared Memory
Processes read and write shared variables
No time overhead, easy to implement
But hard to use: mistakes are common
Error when both processes try to update count concurrently (lines 10 and 19)
and the following execution sequence occurs. Say count is 3.
01: data_type buffer[N];
02: int count = 0;
03: void processA() {
04:   int i = 0;
05:   while( 1 ) {
06:     produce(&data);
07:     while( count == N );/*loop*/
08:     buffer[i] = data;
09:     i = (i + 1) % N;
10:     count = count + 1;
11:   }
12: }
13: void processB() {
14:   int i = 0;
15:   while( 1 ) {
16:     while( count == 0 );/*loop*/
17:     data = buffer[i];
18:     i = (i + 1) % N;
19:     count = count - 1;
20:     consume(&data);
21:   }
22: }
23: void main() {
24:   create_process(processA);
25:   create_process(processB);
26: }
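The execution sequence referred to above is not reproduced in this text. As an illustration only, the sketch below replays one interleaving that produces the error, under the assumption that each update of count compiles to a separate load, modify, and store; `r1` and `r2` stand for hypothetical private registers of processA and processB.

```c
/* Replays one problematic interleaving of "count = count + 1" (processA)
 * and "count = count - 1" (processB), each split into load, modify, store. */
int replay_lost_update(int count)
{
    int r1, r2;
    r1 = count;      /* A loads count               */
    r1 = r1 + 1;     /* A increments its copy       */
    r2 = count;      /* B loads the stale count     */
    r2 = r2 - 1;     /* B decrements its copy       */
    count = r1;      /* A stores its result         */
    count = r2;      /* B stores, overwriting A     */
    return count;
}
```

Starting from count = 3, one increment and one decrement should leave 3, but this interleaving leaves 2 because A's update is lost.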
Message Passing
Message passing
Data explicitly sent from one process to
another
Sending process performs special operation,
send
Receiving process must perform special
operation, receive, to receive the data
Both operations must explicitly specify which
process it is sending to or receiving from
Receive is blocking, send may or may not be
blocking
void processA() {
   while( 1 ) {
      produce(&data);
      send(B, &data);
      /* region 1 */
      receive(B, &data);
      consume(&data);
   }
}
void processB() {
   while( 1 ) {
      receive(A, &data);
      transform(&data);
      send(A, &data);
      /* region 2 */
   }
}
When a process enters the critical section, all other processes must be locked
out until it leaves the critical section
Mutex
A shared object used for locking and unlocking segment of shared data
Disallows read/write access to memory it guards
Multiple processes can perform lock operation simultaneously, but only one process
will acquire lock
All other processes trying to obtain lock will be put in blocked state until unlock
operation performed by acquiring process when it exits critical section
These processes will then be placed in runnable state and will compete for lock again
Say B acquires it
A will be put in blocked state
data_type buffer[N];
int count = 0;
mutex count_mutex;
void processA() {
   int i = 0;
   while( 1 ) {
      produce(&data);
      while( count == N );/*loop*/
      buffer[i] = data;
      i = (i + 1) % N;
      count_mutex.lock();
      count = count + 1;
      count_mutex.unlock();
   }
}
void processB() {
   int i = 0;
   while( 1 ) {
      while( count == 0 );/*loop*/
      data = buffer[i];
      i = (i + 1) % N;
      count_mutex.lock();
      count = count - 1;
      count_mutex.unlock();
      consume(&data);
   }
}
void main() {
   create_process(processA);
   create_process(processB);
}
Process Communication
Try modeling req value of our
elevator controller
DEADLOCK!
void processA() {
   while( 1 ) {
      mutex1.lock();
      /* critical section 1 */
      mutex2.lock();
      /* critical section 2 */
      mutex2.unlock();
      /* critical section 1 */
      mutex1.unlock();
   }
}
void processB() {
   while( 1 ) {
      mutex2.lock();
      /* critical section 2 */
      mutex1.lock();
      /* critical section 1 */
      mutex1.unlock();
      /* critical section 2 */
      mutex2.unlock();
   }
}
Condition variables
Condition variable is an object that has 2 operations, signal and wait
When process performs a wait on a condition variable, the process is blocked
until another process performs a signal on the same condition variable
How is this done?
Process A acquires lock on a mutex
Process A performs wait, passing this mutex
Causes mutex to be unlocked
2 condition variables
buffer_empty
Signals at least 1 free location available in buffer
buffer_full
Signals at least 1 valid data item in buffer
data_type buffer[N];
int count = 0;
mutex cs_mutex;
condition buffer_empty, buffer_full;
void processA() {
   int i = 0;
   while( 1 ) {
      produce(&data);
      cs_mutex.lock();
      if( count == N ) buffer_empty.wait(cs_mutex);
      buffer[i] = data;
      i = (i + 1) % N;
      count = count + 1;
      cs_mutex.unlock();
      buffer_full.signal();
   }
}
void processB() {
   int i = 0;
   while( 1 ) {
      cs_mutex.lock();
      if( count == 0 ) buffer_full.wait(cs_mutex);
      data = buffer[i];
      i = (i + 1) % N;
      count = count - 1;
      cs_mutex.unlock();
      buffer_empty.signal();
      consume(&data);
   }
}
void main() {
   create_process(processA); create_process(processB);
}
Monitors
[Figure: a monitor (DATA and CODE) shown in four snapshots (a) to (d) with Process X and Process Y: at any time one process is inside the monitor while the other is waiting]
Monitor {
data_type buffer[N];
int count = 0;
condition buffer_full, buffer_empty;
void processA() {
int i;
while( 1 ) {
produce(&data);
if( count == N ) buffer_empty.wait();
buffer[i] = data;
i = (i + 1) % N;
count = count + 1;
buffer_full.signal();
}
}
void processB() {
int i;
while( 1 ) {
if( count == 0 ) buffer_full.wait();
data = buffer[i];
i = (i + 1) % N;
count = count - 1;
buffer_empty.signal();
consume(&data);
buffer_full.signal();
}
}
} /* end monitor */
void main() {
create_process(processA); create_process(processB);
}
Implementation
[Figure: computational models (state machine, sequential program, dataflow, concurrent processes) are captured in languages (Pascal, C/C++, Java, VHDL), which are mapped to Implementation A, Implementation B, and Implementation C]
The choice of computational model(s) is based on whether it allows the designer to describe the system.
The choice of language(s) is based on whether it captures the computational model(s) used by the designer.
The choice of implementation is based on whether it meets power, size, performance and cost requirements.
[Figure: mapping processes to processors. (a) Process1 to Process4 each on its own processor (Processor A to Processor D), connected by a communication bus. (b) Process1 to Process4 all on one general-purpose processor. (c) Some processes on a general-purpose processor and others on Processor A, connected by a communication bus. One of the arrangements is marked "More common"]
Implementation:
multiple processes sharing single processor
Can convert processes to sequential program with process scheduling right in code
Threads
Lightweight process
Subprocess within process
Only program counter, stack, and registers
Shares address space, system resources with other threads
Allows quicker communication between threads
Implementation:
suspending, resuming, and joining
Multiple processes mapped to single-purpose processors
Built into processor's implementation
Could be extra input signal that is asserted when process suspended
Additional logic needed for determining process completion
Extra output signals indicating process done
Scheduler
Special process that decides when and for how long each process is executed
Implemented as preemptive or nonpreemptive scheduler
Preemptive
Determines how long a process executes before preempting to allow another process
to execute
Time quantum: predetermined amount of execution time preemptive scheduler allows each
process (may be 10 to 100s of milliseconds long)
Nonpreemptive
Only determines which process is next after current process finishes execution
Scheduling: priority
Process with highest priority always selected first by scheduler
Typically determined statically during creation and dynamically during
execution
FIFO
Runnable processes added to end of FIFO as created or become runnable
Front process removed from FIFO when time quantum of current process is up
or process is blocked
Priority queue
Runnable processes again added as created or become runnable
Process with highest priority chosen when new process needed
If multiple processes with same highest priority value then selects from them
using first-come first-served
Called priority scheduling when nonpreemptive
Called round-robin when preemptive
Priority assignment
Period of process
Repeating time interval the process must complete one execution within
Execution deadline
Rate monotonic
Process | Period | Priority
A       | 25 ms  | 5
B       | 50 ms  | 3
C       | 12 ms  | 6
D       | 100 ms | 1
E       | 40 ms  | 4
F       | 75 ms  | 2
Deadline monotonic
Process | Deadline | Priority
G       | 17 ms    | 5
H       | 50 ms    | 2
I       | 32 ms    | 3
J       | 10 ms    | 6
K       | 140 ms   | 1
L       | 32 ms    | 4
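Both tables follow the same rule: the shorter the period (or deadline), the higher the priority number. A minimal sketch of that rule, with a hypothetical function name; the tie-breaking choice (the later entry gets the higher number) is an assumption picked so that processes I and L, which share a 32 ms deadline, come out as in the table.

```c
#include <stddef.h>

/* Assign priorities 1 (lowest) to n (highest) so that a shorter interval
 * (period or deadline, in ms) gets a higher number. Among equal intervals
 * the later entry gets the higher number (an assumption of this sketch). */
void assign_monotonic_priorities(const int *interval_ms, int *priority, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rank = 1;
        for (size_t j = 0; j < n; j++) {
            if (interval_ms[j] > interval_ms[i] ||
                (interval_ms[j] == interval_ms[i] && j < i)) {
                rank++;
            }
        }
        priority[i] = rank;
    }
}
```

Each process's priority is one plus the number of processes it outranks, so the result does not depend on the order in which the processes are examined.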
Real-time systems
Systems composed of 2 or more cooperating, concurrent processes with
stringent execution time constraints
E.g., set-top boxes have separate processes that read or decode video and/or
sound concurrently and must decode 20 frames/sec for output to appear
continuous
Other examples with stringent time constraints are:
Provide mechanisms, primitives, and guidelines for building real-time embedded systems
Windows CE
QNX
Real-time microkernel surrounded by optional processes (resource managers) that provide POSIX and
UNIX compatibility
Summary
Computation models are distinct from languages
Sequential program model is popular
Most common languages like C support it directly
Outline
Introduction
Combinational logic
Sequential logic
Custom single-purpose processor design
RT-level custom single-purpose processor design
Introduction
Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Single-purpose: one particular computation task
Custom single-purpose: non-standard task
A custom single-purpose processor may be
Fast, small, low power
But, high NRE, longer time-to-market, less flexible
[Figure: digital camera block diagram: CCD, lens, A2D, D2A, JPEG codec, pixel coprocessor, microcontroller, multiplier/accumulator, DMA controller, memory controller, display controller, UART, LCD controller]
[Figure: CMOS transistor on silicon: IC package, IC, source, gate, oxide, channel, drain, silicon substrate]
nMOS transistor: conducts if gate at 1
pMOS transistor: conducts if gate at 0
Basic gates built from nMOS and pMOS transistors:
inverter: F = x'
NAND gate: F = (xy)'
NOR gate: F = (x+y)'
Basic logic gates
Driver:   F = x
Inverter: F = x'
AND:      F = xy
NAND:     F = (xy)'
OR:       F = x + y
NOR:      F = (x+y)'
XOR:      F = x ⊕ y
XNOR:     F = (x ⊕ y)'

x | Driver | Inverter
0 |   0    |    1
1 |   1    |    0

x y | AND | NAND | OR | NOR | XOR | XNOR
0 0 |  0  |  1   | 0  |  1  |  0  |  1
0 1 |  0  |  1   | 1  |  0  |  1  |  0
1 0 |  0  |  1   | 1  |  0  |  1  |  0
1 1 |  1  |  0   | 1  |  0  |  0  |  1
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
Inputs  | Outputs
a b c   | y z
0 0 0   | 0 0
0 0 1   | 0 1
0 1 0   | 0 1
0 1 1   | 1 0
1 0 0   | 1 0
1 0 1   | 1 1
1 1 0   | 1 1
1 1 1   | 1 1
C) Output equations (minimized using K-maps)
y = a + bc
z = ab + b'c + bc'
E) Logic Gates
[Figure: gate-level implementation of y and z from inputs a, b, and c]
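One way to gain confidence in the minimized equations is to compare them with the word description for all eight input combinations. The sketch below does that; the function names are hypothetical, and the complemented terms of z (b'c and bc') are as reconstructed from the truth table.

```c
/* Minimized equations from the combinational example; inputs are 0 or 1. */
int out_y(int a, int b, int c) { return a | (b & c); }
int out_z(int a, int b, int c) { return (a & b) | (!b & c) | (b & !c); }

/* Returns 1 if both equations agree with the word description
 * for all eight input combinations, 0 otherwise. */
int equations_match_description(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            for (int c = 0; c <= 1; c++) {
                /* y is 1 if a is 1, or b and c are 1. */
                int y = a || (b && c);
                /* z is 1 if exactly one of b, c is 1, or if all are 1. */
                int z = (b != c) || (a && b && c);
                if (out_y(a, b, c) != y || out_z(a, b, c) != z) {
                    return 0;
                }
            }
        }
    }
    return 1;
}
```

Exhaustive checking is practical here because three inputs give only eight rows; larger designs usually rely on simulation or formal tools instead.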
Combinational components
n-bit, m x 1 Multiplexor: inputs I(m-1) ... I1 I0, select S(log m) ... S0, output O
   O = I0 if S=0..00; I1 if S=0..01; ...; I(m-1) if S=1..11
log n x n Decoder: inputs I(log n -1) ... I0, outputs O(n-1) ... O1 O0
   O0 = 1 if I=0..00; O1 = 1 if I=0..01; ...; O(n-1) = 1 if I=1..11
n-bit Adder: inputs A, B, outputs carry, sum
   sum = A+B (first n bits); carry = (n+1)th bit of A+B
   With a carry-in input Ci: sum = A + B + Ci
n-bit Comparator: inputs A, B, outputs less, equal, greater
   less = 1 if A<B; equal = 1 if A=B; greater = 1 if A>B
n-bit, m-function ALU: inputs A, B, select S(log m) ... S0, output O
   O = A op B, op determined by S
Sequential components
n-bit Register (storage): inputs I, load, clear, output Q
   Q = 0 if clear=1; I if load=1 and clock=1; Q(previous) otherwise
n-bit Shift register: inputs I, shift, output Q
   Q = lsb; content shifted; I stored in msb
n-bit Counter: output Q
   Q = 0 if clear=1; Q(prev)+1 if count=1 and clock=1
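B) State Diagram
[Figure: four states with input a and output x; a=1 advances to the next state and a=0 stays in the current state; x=0 in the first three states and x=1 in the last]
C) Implementation Model
[Figure: combinational logic with input a and state bits Q1, Q0 produces output x and next-state bits I1, I0, which are stored in the state register]
State table
Inputs    | Outputs
Q1 Q0 a   | I1 I0 x
0  0  0   | 0  0  0
0  0  1   | 0  1  0
0  1  0   | 0  1  0
0  1  1   | 1  0  0
1  0  0   | 1  0  0
1  0  1   | 1  1  0
1  1  0   | 1  1  1
1  1  1   | 0  0  1
Minimized output equations (from K-maps)
I1 = Q1'Q0a + Q1a' + Q1Q0'
I0 = Q0a' + Q0'a
x = Q1Q0
[Figure: combinational logic (random logic) implementing I1, I0, and x from a, Q1, and Q0]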
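The next-state and output logic of this controller can be exercised in software. The sketch below uses the equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a and x = Q1Q0 as reconstructed from the state table; the function names and the state encoding 2*Q1 + Q0 are assumptions of the sketch.

```c
/* Next-state and output logic of the example controller.
 * q1, q0 and a are each 0 or 1. */
int next_i1(int q1, int q0, int a)
{
    return (!q1 & q0 & a) | (q1 & !a) | (q1 & !q0);
}

int next_i0(int q0, int a)
{
    return (q0 & !a) | (!q0 & a);
}

int out_x(int q1, int q0)
{
    return q1 & q0;
}

/* Clock the state register n times with input a held constant.
 * The state is encoded as 2*Q1 + Q0. */
int run_clocks(int state, int a, int n)
{
    for (int k = 0; k < n; k++) {
        int q1 = (state >> 1) & 1;
        int q0 = state & 1;
        state = 2 * next_i1(q1, q0, a) + next_i0(q0, a);
    }
    return state;
}
```

With a held at 1 the state steps 0, 1, 2, 3 and wraps to 0; with a held at 0 the state does not change.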
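[Figure: controller and datapath. The controller receives external control inputs and datapath control outputs, and produces external control outputs and datapath control inputs. The datapath receives external data inputs and produces external data outputs]
[Figure: a view inside the controller (next-state and control logic, state register) and the datapath (registers, functional units)]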
(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]
(b) alg. specification
0: int x, y;
1: while (1) {
2:    while (!go_i) ;
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }
[Figure: state diagram with states 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J and transition conditions 1, !go_i, !(!go_i), x!=y, !(x!=y), x<y, !(x<y)]
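Assignment statement
a = b
next statement
[State template: one state performing a = b, followed by the state of the next statement]
Loop statement
while (cond) {
   loop-body-statements
}
next statement
[State template: condition state C; on cond, execute the loop-body-statements and return to C; on !cond, go to join state J and then to the next statement]
Branch statement
if (c1)
   c1 stmts
else if c2
   c2 stmts
else
   other stmts
next statement
[State template: condition state C; on c1, execute c1 stmts; on !c1*c2, execute c2 stmts; on !c1*!c2, execute others; all branches meet at join state J and continue to the next statement]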
Datapath
[Figure: the GCD FSMD next to the datapath derived from it. Registers 0: x, 0: y and 9: d with load signals x_ld, y_ld, d_ld; two n-bit 2x1 multiplexors selected by x_sel and y_sel feed x and y from x_i, y_i or the subtractor results; comparator 5: x!=y produces x_neq_y; comparator 6: x<y produces x_lt_y; subtractors compute 8: x-y and 7: y-x; register d drives d_o.]
Controller
[Figure: controller FSM derived from the FSMD, shown with the datapath. State encoding and actions:
0000 1:
0001 2: (loops on !go_i)
0010 2-J:
0011 3: x_sel = 0, x_ld = 1
0100 4: y_sel = 0, y_ld = 1
0101 5: (goes to 6: on x_neq_y, to 9: on !x_neq_y)
0110 6: (goes to 7: on x_lt_y, to 8: on !x_lt_y)
0111 7: y_sel = 1, y_ld = 1
1000 8: x_sel = 1, x_ld = 1
1001 6-J:
1010 5-J:
1011 9: d_ld = 1
1100 1-J:]
[Figure: controller implementation model. Combinational logic takes go_i, x_neq_y, x_lt_y and the state register outputs Q3 Q2 Q1 Q0, and produces x_sel, y_sel, x_ld, y_ld, d_ld and the next-state bits I3 I2 I1 I0. The controller FSM and (b) Datapath repeat the preceding figure.]
[Table: controller state table with inputs Q3, Q2, Q1, Q0, x_neq_y, x_lt_y, go_i and outputs I3, I2, I1, I0, x_sel, y_sel, x_ld, y_ld, d_ld]
[Figure: controller (next-state and control logic, state register) and datapath (registers, functional units)]
Example
Bridge
Problem Specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Bridge with inputs clock, rdy_in, data_in(4) and outputs rdy_out, data_out(8) connected to the Receiver]
FSMD
Inputs
rdy_in: bit; data_in: bit[4];
Outputs
rdy_out: bit; data_out: bit[8]
Variables
data_lo, data_hi: bit[4];
States
WaitFirst4: stays while rdy_in=0; goes to RecFirst4Start on rdy_in=1
RecFirst4Start: data_lo=data_in; goes to RecFirst4End
RecFirst4End: stays while rdy_in=1; goes to WaitSecond4 on rdy_in=0
WaitSecond4: stays while rdy_in=0; goes to RecSecond4Start on rdy_in=1
RecSecond4Start: data_hi=data_in; goes to RecSecond4End
RecSecond4End: stays while rdy_in=1; goes to Send8Start on rdy_in=0
Send8Start: data_out=data_hi & data_lo; rdy_out=1; goes to Send8End
Send8End: rdy_out=0; goes to WaitFirst4
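The bridge FSMD can be modeled in C with one function call per clock cycle. This is a minimal sketch under assumptions: the `bridge` struct and `bridge_clock` name are illustrative, the `&` in the FSMD is read as concatenation of data_hi and data_lo, and data_in is assumed to stay stable while rdy_in is high.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WaitFirst4, RecFirst4Start, RecFirst4End, WaitSecond4,
               RecSecond4Start, RecSecond4End, Send8Start, Send8End } bridge_state;

typedef struct {
    bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables of the FSMD */
    uint8_t data_out;           /* 8-bit output */
    unsigned rdy_out;
} bridge;

/* Advances the bridge by one clock using the sampled inputs. */
void bridge_clock(bridge *b, unsigned rdy_in, uint8_t data_in) {
    assert(data_in < 16);
    switch (b->state) {
    case WaitFirst4:      if (rdy_in) b->state = RecFirst4Start; break;
    case RecFirst4Start:  b->data_lo = data_in; b->state = RecFirst4End; break;
    case RecFirst4End:    if (!rdy_in) b->state = WaitSecond4; break;
    case WaitSecond4:     if (rdy_in) b->state = RecSecond4Start; break;
    case RecSecond4Start: b->data_hi = data_in; b->state = RecSecond4End; break;
    case RecSecond4End:   if (!rdy_in) b->state = Send8Start; break;
    case Send8Start:      /* concatenate high and low nibbles */
                          b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
                          b->rdy_out = 1; b->state = Send8End; break;
    case Send8End:        b->rdy_out = 0; b->state = WaitFirst4; break;
    }
}
```

Feeding the nibbles 0xB and then 0x9, each held for two clocks with rdy_in high, produces data_out = 0x9B together with a one-cycle rdy_out pulse.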
(a) Controller
[State diagram: same states and rdy_in transitions as the FSMD, with data operations replaced by control signals:
RecFirst4Start: data_lo_ld=1
RecSecond4Start: data_hi_ld=1
Send8Start: data_out_ld=1, rdy_out=1
Send8End: rdy_out=0]
(b) Datapath
[Figure: registers data_lo and data_hi loaded from data_in(4) under data_lo_ld and data_hi_ld; register data_out loaded under data_out_ld and driving data_out; clk to all registers; controller signals rdy_in and rdy_out]
original program
FSMD
datapath
FSM
number of computations
size of variable
time and space complexity
operations used
multiplication and division very expensive
optimized program
0: int x, y, r;
1: while (1) {
2:    while (!go_i);
      // x must be the larger number
3:    if (x_i >= y_i) {
4:       x = x_i;
5:       y = y_i;
      }
6:    else {
7:       x = y_i;
8:       y = x_i;
      }
9:    while (y != 0) {
10:      r = x % y;
11:      x = y;
12:      y = r;
      }
13:   d_o = x;
   }
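To make the benefit of the optimized program concrete, the two loops can be compared by counting iterations. The sketch below is an illustration only; the function names are assumptions, and a real comparison would also weigh the hardware cost of the modulo operation.

```c
#include <assert.h>

/* Loop iterations of the original subtraction-based program. */
unsigned long steps_subtract(unsigned x, unsigned y) {
    assert(x > 0 && y > 0);
    unsigned long n = 0;
    while (x != y) {
        if (x < y) y = y - x; else x = x - y;
        n++;
    }
    return n;
}

/* Loop iterations of the optimized modulo-based program; stores the GCD in *gcd. */
unsigned long steps_modulo(unsigned x, unsigned y, unsigned *gcd) {
    assert(x > 0 && y > 0 && gcd != 0);
    if (x < y) { unsigned t = x; x = y; y = t; }  /* x must be the larger number */
    unsigned long n = 0;
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        n++;
    }
    *gcd = x;
    return n;
}
```

For the inputs 42 and 8, the subtraction loop runs 8 times while the modulo loop runs twice, and both arrive at a GCD of 2.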
separate states
states which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size
scheduling
original FSMD
[State diagram: the 13-state GCD FSMD with states 1:, 2:, 2-J:, 3: x = x_i, 4: y = y_i, 5:, 6:, 7: y = y -x, 8: x = x - y, 6-J:, 5-J:, 9: d_o = x, 1-J:]
optimized FSMD
int x, y;
[State diagram:
2: loops on !go_i, goes to 3: on go_i
3: x = x_i, y = y_i
5: goes to 7: on x<y, to 8: on x>y, otherwise to 9:
7: y = y -x
8: x = x - y
9: d_o = x]
Multi-functional units
ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
State minimization
task of merging equivalent states into a single state
two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
Summary
Custom single-purpose processors
29
Introduction
Instruction-Set Processor
Processor designed for a variety of computation tasks
General-Purpose Processor (GPP)
Application-Specific Processor (ASIP): optimized for a specific subset of tasks
Low unit cost because NRE is spread over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
System implementations designed with low NRE cost, short time-to-market/prototype, high flexibility
User just writes software; no processor design
Basic Architecture
Control unit and datapath
Note similarity to single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is programmed into the memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, control/status, registers), connected to I/O and memory]
Datapath Operations
Load
ALU operation
Store
Write register to memory location
[Figure: processor datapath with ALU (+1) and registers; memory locations 10 and 11]
Control Unit
[Figure sequence: the processor (control unit with controller, PC, IR; datapath with ALU, control/status, registers R0 and R1) stepping through one instruction. Memory holds the program 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1, and the data M[500] = 10. With PC = 100, IR receives load R0, M[500] and R0 receives 10.]
Instruction Cycles
[Figure sequence: one instruction cycle per step of the same program]
PC=100: IR = load R0, M[500]; R0 = 10
PC=101: IR = inc R1, R0; R1 = 11
PC=102: IR = store M[501], R1; M[501] = 11
Memory after execution: 500: 10, 501: 11
Architectural Considerations
N-bit processor
N-bit ALU, registers, buses, memory data interface
Embedded: 8-bit, 16-bit, 32-bit common
Desktop/servers: 32-bit, even 64
PC size determines address space
Architectural Considerations
Clock frequency
Inverse of clock period
Clock period must be longer than the longest register-to-register delay in the entire processor
Memory access is often the longest
[Figure: non-pipelined versus pipelined execution over time; the pipelined instruction execution stages are Fetch-instr., Decode, Fetch ops., Execute, Store res.]
[Figure: Harvard architecture: processor with separate program memory and data memory. Princeton architecture: processor with a single memory (program and data).]
Cache Memory
Memory access may be slow
Cache is small but fast
memory close to processor
Holds copy of part of memory
Hits and misses
Program Cache
Data Cache
Memory
19
Programmer's View
Programmer doesn't need detailed understanding of architecture
Instead, needs to know what instructions can be executed
Assembly-Level Instructions
Instruction 1    opcode    operand1    operand2
Instruction 2    opcode    operand1    operand2
Instruction 3    opcode    operand1    operand2
Instruction 4    opcode    operand1    operand2
...
Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1
Assembly instruct. | First byte (opcode, Rn) | Second byte (operands) | Operation
MOV Rn, direct | 0000 Rn | direct | Rn = M(direct)
MOV direct, Rn | 0001 Rn | direct | M(direct) = Rn
MOV @Rn, Rm | 0010 Rn | Rm | M(Rn) = Rm
MOV Rn, #immed. | 0011 Rn | immediate | Rn = immediate
ADD Rn, Rm | 0100 Rn | Rm | Rn = Rn + Rm
SUB Rn, Rm | 0101 Rn | Rm | Rn = Rn - Rm
JZ Rn, relative | 0110 Rn | relative | PC = PC + relative (only if Rn is 0)
Addressing Modes
Addressing mode | Operand field | Register-file contents | Memory contents
Immediate | Data | |
Register-direct | Register address | Data |
Register indirect | Register address | Memory address | Data
Direct | Memory address | | Data
Indirect | Memory address | | Memory address, then Data
Sample Programs
C program
int total = 0;
for (int i=10; i!=0; i--)
    total += i;
// next instructions...
Equivalent assembly program
0       MOV R0, #0;     // total = 0
1       MOV R1, #10;    // i = 10
2       MOV R2, #1;     // constant 1
3       MOV R3, #0;     // constant 0
Loop:   JZ R1, Next;    // Done if i=0
5       ADD R0, R1;     // total += i
6       SUB R1, R2;     // i--
7       JZ R3, Loop;    // Jump always
Next:   // next instructions...
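To connect the assembly program with the instruction encodings, the loop can be hand-assembled and run on a tiny interpreter. Everything below is a sketch under stated assumptions: `run` and `sum_prog` are hypothetical names, Rm is taken from the upper nibble of the second byte, and the relative offset is treated as a signed 8-bit value added to the already incremented PC.

```c
#include <assert.h>
#include <stdint.h>

/* Mini-interpreter for the seven-instruction set; returns the final R0.
   prog holds n two-byte instructions (first byte, second byte). */
uint8_t run(const uint8_t prog[][2], int n) {
    uint8_t reg[16] = {0}, mem[256] = {0};
    int pc = 0;
    while (pc >= 0 && pc < n) {
        uint8_t fb = prog[pc][0], sb = prog[pc][1];
        uint8_t rn = fb & 0x0f, rm = sb >> 4;
        pc++;                                    /* PC = PC + 1 during fetch */
        switch (fb >> 4) {
        case 0: reg[rn] = mem[sb]; break;        /* MOV Rn, direct  */
        case 1: mem[sb] = reg[rn]; break;        /* MOV direct, Rn  */
        case 2: mem[reg[rn]] = reg[rm]; break;   /* MOV @Rn, Rm     */
        case 3: reg[rn] = sb; break;             /* MOV Rn, #immed. */
        case 4: reg[rn] = (uint8_t)(reg[rn] + reg[rm]); break;  /* ADD */
        case 5: reg[rn] = (uint8_t)(reg[rn] - reg[rm]); break;  /* SUB */
        case 6: if (reg[rn] == 0) pc += (int8_t)sb; break;      /* JZ  */
        default: return 0xff;                    /* unknown opcode */
        }
    }
    return reg[0];
}

/* Hand-encoded version of the sample program (addresses 0 to 7). */
const uint8_t sum_prog[8][2] = {
    {0x30, 0},    /* MOV R0, #0   */
    {0x31, 10},   /* MOV R1, #10  */
    {0x32, 1},    /* MOV R2, #1   */
    {0x33, 0},    /* MOV R3, #0   */
    {0x61, 3},    /* JZ R1, Next  (address 8 = 5 + 3) */
    {0x40, 0x10}, /* ADD R0, R1   */
    {0x51, 0x20}, /* SUB R1, R2   */
    {0x63, 0xfc}  /* JZ R3, Loop  (address 4 = 8 - 4) */
};
```

Calling `run(sum_prog, 8)` leaves 55 in R0, the sum of 10 down to 1, which is what the C program computes in total.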
Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
I/O
How to communicate with external signals?
Interrupts
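[Table: parallel port pins. Columns: pin, I/O Direction, Register Address.
(pin not recoverable) | Output |
2-9 | Output | up to the 7th bit of register #0
10,11,12,13,15 | Input |
14,16,17 | Output | ]
[Figure: PC parallel port with a switch on Pin 13 and an LED on Pin 2]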
CheckPort   proc
        push  ax              ; save the content
        push  dx              ; save the content
        mov   dx, 3BCh + 1    ; base + 1 for register #1
        in    al, dx          ; read register #1
        and   al, 10h         ; mask out all but bit # 4
        cmp   al, 0           ; is it 0?
        jne   SwitchOn        ; if not, we need to turn the LED on
SwitchOff:
        mov
        in
        and
        out
        jmp
SwitchOn:
        mov   dx,
        in    al,
        or    al,
        out   dx,
Done:
        pop   dx
        pop   ax
CheckPort   endp
extern "C" void CheckPort(void);   // defined in assembly
int main(void) {
    while( 1 ) {
        CheckPort();
    }
}
Operating System
Optional software layer
providing low-level services to
a program (application).
File management, disk access
Keyboard/display interfacing
Scheduling multiple programs for
execution
Or even just multiple threads from
one program
R0, 1324
R1, file_name
34
R0, L1
-----
29
Development Environment
Development processor
The processor on which we write and debug our programs
Usually a PC
Target processor
The processor that the program will run on in our embedded
system
Often different from the development processor
Development processor
Target processor
30
[Figure: software development process. Implementation Phase: C File and Asm. File pass through the Compiler and Assembler into Binary Files; the Linker combines them with a Library into an Exec. File. Verification Phase: Debugger, Profiler.]
Compilers
Cross compiler
Runs on one processor, but generates code for another
Assemblers
Linkers
Debuggers
Profilers
Running a Program
If development processor is different from target, how can we run our compiled code? Two options:
Download to target processor
Simulate
Simulation
One method: Hardware description language
But slow, not always available
32
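#include <stdio.h>

// Assumed layout: the struct definition is missing from the extracted source.
typedef struct {
    unsigned char first_byte, second_byte;
} instruction;

instruction program[1024];       // instruction memory
unsigned char memory[256];       // data memory

// Prints data memory, 16 bytes per row.
void print_memory_contents(void) {
    for (int i = 0; i < 256; i++)
        printf("%02x%c", memory[i], (i % 16 == 15) ? '\n' : ' ');
}

// Executes num_bytes / 2 instructions; returns 0 on success, 1 on a bad opcode.
int run_program(int num_bytes) {
    int pc = -1;
    unsigned char reg[16] = {0}, fb, sb;

    while( ++pc < (num_bytes / 2) ) {
        fb = program[pc].first_byte;
        sb = program[pc].second_byte;
        switch( fb >> 4 ) {
            case 0: reg[fb & 0x0f] = memory[sb]; break;
            case 1: memory[sb] = reg[fb & 0x0f]; break;
            case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
            case 3: reg[fb & 0x0f] = sb; break;
            case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
            case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
            // JZ jumps only if Rn is 0; the offset is assumed to be signed
            case 6: if (reg[fb & 0x0f] == 0) pc += (signed char)sb; break;
            default: return 1;
        }
    }
    return 0;
}

int main(int argc, char *argv[]) {
    FILE* ifs;

    if( argc != 2 || (ifs = fopen(argv[1], "rb")) == NULL ) {
        return 1;
    }
    // fread returns the number of 2-byte instructions actually read
    size_t count = fread(program, sizeof(instruction), 1024, ifs);
    fclose(ifs);
    if (run_program((int)(count * 2)) == 0) {
        print_memory_contents();
        return(0);
    }
    else return(-1);
}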
ISS
[Figure: (a) implementation and verification phases both on the development processor, using a debugger / ISS; (b) implementation phase on the development processor, verification phase using an emulator, external tools, or a programmer]
Download to board
Use device programmer
Runs in real environment, but not controllable
Compromise: emulator
Application-Specific Instruction-Set Processors (ASIPs)
GPPs
Sometimes too general to be effective in demanding application
e.g., video processing requires huge video buffers and operations on large arrays of data, inefficient on a GPP
Still programmable
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
36
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations e.g., add two arrays
Vector ALUs, loop buffers, etc.
37
38
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
39
Instruction-Set Processors
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | | | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | | | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | | | | NA | NA | $34
Lucent DSP32C | 80 MHz | | 32 | | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Declarations:
bit PC[16], IR[16];           // Program Counter, Instruction Reg.
bit M[64k][16], RF[16][16];   // Memory, Register File
Aliases:
Op   IR[15..12]
Rn   IR[11..8]
Rm   IR[7..4]
dir  IR[7..0]
imm  IR[7..0]
rel  IR[7..0]
States:
Reset:  PC=0;
Fetch:  IR=M[PC]; PC=PC+1
Decode: (from states below; branches on Op)
Op = 0000  Mov1:  RF[Rn] = M[dir]            to Fetch
     0001  Mov2:  M[dir] = RF[Rn]            to Fetch
     0010  Mov3:  M[@Rn] = RF[Rm]            to Fetch
     0011  Mov4:  RF[Rn] = imm               to Fetch
     0100  Add:   RF[Rn] = RF[Rn]+RF[Rm]     to Fetch
     0101  Sub:   RF[Rn] = RF[Rn]-RF[Rm]     to Fetch
     0110  Jz
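[Figure: architecture of the simple microprocessor. Control unit: controller (next-state and control logic; state register) wired to all input control signals and from all output control signals; 16-bit PC with PCld, PCinc, PCclr; IR with IRld. Datapath: register file RF (16) with write port RFw (RFwa, RFwe) and read ports RFr1 (RFr1a, RFr1e) and RFr2 (RFr2a, RFr2e); 2x1 mux selected by RFs; ALU with ALUs and ALUz; 4x1 mux selected by Ms; memory with Mre and Mwe.]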
A Simple Microprocessor
FSMD operations and the FSM control signals that implement them
Reset:  PC=0;                          PCclr=1;
Fetch:  IR=M[PC]; PC=PC+1              Ms=10; Irld=1; Mre=1; PCinc=1;
Decode: (from states below)
Op = 0000  Mov1:  RF[Rn] = M[dir]      to Fetch
     0001  Mov2:  M[dir] = RF[Rn]      to Fetch    RFr1a=Rn; RFr1e=1; Ms=01; Mwe=1;
     0010  Mov3:  M[@Rn] = RF[Rm]      to Fetch
     0011  Mov4:  RF[Rn] = imm         to Fetch
     0100  Add
     0101  Sub
     0110  Jz
[Figure: FSMD next to the control unit and datapath of the preceding figure]
Chapter Summary
Instruction-Set processors
Good performance, low NRE, flexible
ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs
44
Introduction
Single-purpose processors
Performs specific computation task
Custom single-purpose processors
Designed by us for a unique task
Counters
Cascaded counters
Prescaler
Divides clock
Increases range, decreases resolution
[Figure: Basic timer: Clk drives a 16-bit up counter with 16-bit output Cnt, overflow output Top, and Reset]
[Figure: Timer/counter: a 2x1 mux selects Clk or Cnt_in under Mode and drives a 16-bit up counter (Cnt, Top, Reset)]
[Figure: 16/32-bit timer: two cascaded 16-bit up counters with outputs Cnt1/Top1 and Cnt2/Top2]
[Figure: Timer with a terminal count: the 16-bit up counter output Cnt is compared (=) with the Terminal count to produce Top; Reset]
[Figure: Timer with prescaler: Clk passes through the Prescaler (Mode) into a 16-bit up counter]
[Figure: reaction timer example with indicator light and LCD]
/* main.c */
#define MS_INIT
63535
void main(void){
int count_milliseconds = 0;
time: 100 ms
Watchdog timer
Must reset timer every X time unit, else timer generates a signal
Common use: detect failure, self-reset
Another use: timeouts
e.g., ATM machine
16-bit timer, 2 millisec. resolution
timereg value = 2*(2^16 - 1) - X = 131070 - X
For 2 min. timeout, X = 120,000 millisec.; so timereg = 11070
[Figure: osc (12 MHz) feeds the prescaler (/12) to produce clk (1 MHz); clk drives scalereg (12 bits), whose overflow (1/(2ms)) drives timereg (16 bits), whose overflow (1/(131070 ms)) goes to system reset or interrupt; checkreg]
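The timereg arithmetic can be written as a one-line helper. This sketch simply applies the slide's formula, timereg = 2*(2^16 - 1) - X with X in milliseconds; the function name is an assumption.

```c
#include <assert.h>

/* timereg load value for a timeout of x_ms milliseconds,
   following the formula timereg = 2*(2^16 - 1) - X. */
long watchdog_timereg(long x_ms) {
    const long range_ms = 2L * ((1L << 16) - 1);   /* 131070 */
    assert(x_ms >= 0 && x_ms <= range_ms);
    return range_ms - x_ms;
}
```

A two-minute timeout is 120,000 ms, so `watchdog_timereg(120000L)` gives 11070, the value loaded in the ATM example.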
/* main.c */
main(){
    wait until card inserted
    call watchdog_reset_routine
    while(transaction in progress){
        if(button pressed){
            perform corresponding action
            call watchdog_reset_routine
        }
        /* if watchdog_reset_routine not called every
           < 2 minutes, interrupt_service_routine is
           called */
    }
}
watchdog_reset_routine(){
    /* checkreg is set so we can load value into
       timereg. Zero is loaded into scalereg and
       11070 is loaded into timereg */
    checkreg = 1
    scalereg = 0
    timereg = 11070
}
void interrupt_service_routine(){
    eject card
    reset screen
}
[Figure: UART. The sending UART in the embedded device serializes the byte 10011011 as start bit, data, end bit; the receiving UART reassembles 10011011.]
[Figure: pwm_o and clk waveforms at different duty cycles]
50% duty cycle: average pwm_o is 2.5V.
75% duty cycle: average pwm_o is 3.75V.
[Figure: internal structure: clk_div controls how fast the counter increments; an 8-bit comparator compares the counter with cycle_high and drives pwm_o]
counter < cycle_high, pwm_o = 1
counter >= cycle_high, pwm_o = 0
Input Voltage | % of Maximum Voltage Applied | RPM of DC Motor
2.5 | 50 | 1840
3.75 | 75 | 6900
5.0 | 100 | 9200
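The comparator rule above is enough to compute the average output voltage. The following is a sketch under assumptions: the function names are illustrative, and the period of 256 counter steps used in the example is an assumed value for an 8-bit counter.

```c
#include <assert.h>

/* Comparator rule: output is 1 while counter < cycle_high, else 0. */
unsigned pwm_out(unsigned counter, unsigned cycle_high) {
    return counter < cycle_high ? 1u : 0u;
}

/* Average output voltage over one period of `period` counter steps,
   where v_high is the voltage of a logic 1. */
double pwm_average(unsigned cycle_high, unsigned period, double v_high) {
    assert(period > 0);
    unsigned high = 0;
    for (unsigned c = 0; c < period; c++)
        high += pwm_out(c, cycle_high);
    return v_high * high / period;
}
```

With a 5 V supply, `pwm_average(128, 256, 5.0)` returns 2.5 and `pwm_average(192, 256, 5.0)` returns 3.75, matching the 50% and 75% duty cycle figures.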
void main(void){
    /* controls period */
    PWMP = 0xff;
    /* controls duty cycle */
    PWM1 = 0x7f;
    while(1){};
}
[Figure: 5V DC MOTOR driven from the processor output]
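LCD controller
[Figure: microcontroller connected to the LCD controller through E, R/W, RS and the 8-bit communications bus DB7-DB0; the LCD controller drives the LCD]
RS = 1;
DATA_BUS = c;
EnableLCD(45);
CODES
I/D = 1 cursor moves left
DL = 1 8-bit
DL = 0 4-bit
N = 1 2 rows
N = 0 1 row
F = 1 5x10 dots
F = 0 5x7 dots
[Table: command encodings with columns RS, R/W, DB7 ... DB0, Description; surviving entries include the I/D, S/C, R/L and DL fields and the command WRITE DATA: Writes Data]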
Keypad controller
[Figure: keypad with N=4 rows (N1 to N4) and M=4 columns (M1 to M4) connected to the keypad controller, which outputs k_pressed and a 4-bit key_code]
[Table: stepper motor energizing sequence, steps 1 to 5, giving the polarity (+ or -) of coils A, B, A', B']
[Figure: MC3479P Stepper Motor Driver pinout with Vd, Vm, GND, Bias/Set, Phase A, Clk, CW/CCW, O|C, Full/Half Step and the coil outputs A, A', B, B' wired to the motor leads Red, White, Yellow, Black]
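sbit clk=P1^1;
sbit cw=P1^0;
void delay(void){
    int i, j;
    for (i=0; i<1000; i++)
        for ( j=0; j<50; j++)
            i = i + 0;
}
void main(void){
[Figure: 8051 pins P1.0 (CW/CCW) and P1.1 (CLK) connected to the MC3479P stepper motor driver; driver outputs A, B (pins 2, 3, 15, 14) drive the Stepper Motor through a buffer stage with transistors Q1, Q2, 1K resistors and +V]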
/*main.c*/
sbit notA=P2^0;
sbit isA=P2^1;
sbit notB=P2^2;
sbit isB=P2^3;
sbit dir=P2^4;
void delay(){
    int a, b;
    for(a=0; a<5000; a++)
        for(b=0; b<10000; b++)
            a=a+0;
}
void move(int dir, int steps) {
    int y, z;
    /* clockwise movement */
    if(dir == 1){
        for(y=0; y<=steps; y++){
            /* z must advance by 4 each pass; "z+4" never updated z */
            for(z=0; z<=19; z+=4){
                isA=lookup[z];
                isB=lookup[z+1];
                notA=lookup[z+2];
                notB=lookup[z+3];
                delay();
            }
        }
    }
[Figure: 8051 pins P2.0 to P2.3 drive the Stepper Motor coils A and B through transistors Q1, Q2, Q3 with 1K and 330 ohm resistors between GND/+V and +V]
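Analog-to-digital converters
proportionality
Voltage | Code
Vmax = 7.5V | 1111
7.0V | 1110
6.5V | 1101
6.0V | 1100
5.5V | 1011
5.0V | 1010
4.5V | 1001
4.0V | 1000
3.5V | 0111
3.0V | 0110
2.5V | 0101
2.0V | 0100
1.5V | 0011
1.0V | 0010
0.5V | 0001
0V | 0000
analog to digital
[Figure: analog input sampled at times t1, t2, t3, t4, producing the digital output 0100 1000 0110 0101]
digital to analog
[Figure: digital input 0100 1000 0110 0101 converted to analog output levels at times t1, t2, t3, t4]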
Successive-approximation method
(Vmax - Vmin) = 7.5 volts
Vmax = 7.5 volts.
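The successive-approximation idea is a binary search over the voltage range, settling one bit per step. The sketch below is not the slide's worked example; `sar_encode` is a hypothetical name, and the range of 0 to 8 V (sixteen 0.5 V steps) is an assumption chosen so that the codes match the 4-bit proportionality table.

```c
#include <assert.h>

/* Successive approximation: returns the `bits`-bit code for voltage v
   within [vmin, vmax), deciding one bit per halving of the range. */
unsigned sar_encode(double v, double vmin, double vmax, unsigned bits) {
    assert(bits >= 1 && bits <= 16 && vmin < vmax);
    unsigned code = 0;
    for (unsigned i = 0; i < bits; i++) {
        double mid = (vmin + vmax) / 2.0;
        code <<= 1;
        if (v >= mid) {      /* input is in the upper half: bit is 1 */
            code |= 1u;
            vmin = mid;
        } else {             /* input is in the lower half: bit is 0 */
            vmax = mid;
        }
    }
    return code;
}
```

Under that assumed range, `sar_encode(2.0, 0.0, 8.0, 4)` gives 4 (0100) and `sar_encode(7.5, 0.0, 8.0, 4)` gives 15 (1111), in line with the table entries for 2.0V and 7.5V.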
Chapter 5 Memory
Outline
Introduction
Embedded system's functionality aspects
Processing
processors
transformation of data
Storage
memory
retention of data
Communication
buses
transfer of data
m x n memory
m words
e.g., a memory with 32,768 bits, 12 address input signals and 8 input/output data signals (4,096 x 8)
Memory access
r/w: selects read or write
enable: read or write only when asserted
multiport: multiple accesses to different locations simultaneously
[Figure: memory external view with r/w, enable, address inputs A0 ... Ak-1 and data lines Qn-1 ... Q0]
ROM
RAM
EEPROM
FLASH
NVRAM
[Figure: write ability versus storage permanence of memory types; the ideal memory is at the high end of both axes. Regions: Nonvolatile; In-system programmable.]
Storage permanence axis: Near zero; Battery life (10 years); Tens of years; Life of product
Write ability axis:
During fabrication only: Mask-programmed ROM
External programmer, one time only: OTP ROM
External programmer, 1,000s of cycles: EPROM
External programmer OR in-system, 1,000s of cycles: EEPROM
External programmer OR in-system, block-oriented writes, 1,000s of cycles: FLASH
In-system, fast writes, unlimited cycles: NVRAM, SRAM/DRAM
Write ability
Middle range
processor writes to memory, but slower
e.g., FLASH, EEPROM
Lower range
special equipment, programmer, must be used to write to memory
e.g., EPROM, OTP ROM
Low end
bits stored only during fabrication
e.g., Mask-programmed ROM
Storage permanence
Middle range
holds bits days, months, or years after memory's power source turned off
e.g., NVRAM
Lower range
holds bits as long as power supplied to memory
e.g., SRAM
Low end
begins to lose bits almost immediately after written
e.g., DRAM
Nonvolatile memory
Holds bits after power is no longer supplied
High end and middle range of storage permanence
Nonvolatile memory
Can be read from but not written to, by a processor in an embedded system
Traditionally written to ("programmed") before inserting into embedded system
Uses
External view
[Figure: 2^k x n ROM with enable, address inputs A0 ... Ak-1 and data outputs Qn-1 ... Q0]
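Example: 8 x 4 ROM
Internal view
[Figure: 8 x 4 ROM. A 3x8 decoder with enable and address inputs A0, A1, A2 selects one word line (word 0, word 1, word 2, ...); programmable connections tie word lines to the data lines, which are wired-OR to the outputs Q3 Q2 Q1 Q0.]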
Truth table
Inputs (address)    Outputs
a  b  c             y  z
0  0  0             0  0
0  0  1             0  1
0  1  0             0  1
0  1  1             1  0
1  0  0             1  0
1  0  1             1  1
1  1  0             1  1
1  1  1             1  1
[Figure: 8x2 ROM with enable and address inputs c, b, a; word 0 through word 7 store the y and z columns of the truth table]
Mask-programmed ROM
Connections programmed at fabrication
set of masks
[Figure: EPROM floating-gate transistor with source, drain and floating gate: (a) 0V applied; (b) +15V applied for programming; (c) erasure, taking 5-30 min; (d)]
Flash Memory
Extension of EEPROM
Same floating gate principle
Same write ability and storage permanence
Fast erase
Large blocks of memory erased at once, rather than one word at a time
Blocks typically several thousand bytes large
15
during execution
Internal structure more complex than ROM
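external view
[Figure: 2^k x n RAM with r/w, enable, address inputs A0 ... Ak-1 and data lines Qn-1 ... Q0]
internal view
[Figure: 4 x 4 RAM with data inputs I3 I2 I1 I0, a 2x4 decoder on A0 and A1 selecting rows of memory cells, rd/wr routed to every cell, and outputs Q3 Q2 Q1 Q0]
[Figure: SRAM memory cell with Data' and Data lines; DRAM memory cell with Data and W lines]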
RAM variations
PSRAM: Pseudo-static RAM
DRAM with built-in memory refresh controller
Popular low-cost high-density alternative to SRAM
18
Example:
HM6264 & 27C256 RAM/ROM devices
Low-cost low-capacity memory devices
Commonly used in 8-bit microcontroller-based embedded systems
First two numeric digits indicate device type
RAM: 62
ROM: 27
block diagrams (pins: signal)
HM6264: 11-13, 15-19: data<7...0>; 2, 23, 21, 24, 25, 3-10: addr<15...0>; 22: /OE; 27: /WE; 20: /CS1; 26: CS2
27C256: 11-13, 15-19: data<7...0>; 27, 26, 2, 23, 21, 24, 25, 3-10: addr<15...0>; 22: /OE; 20: /CS
device characteristics
Device | Access Time (ns)
HM6264 | 85-100
27C256 | 90
timing diagrams
[Figure: read operation (data, addr, OE, /CS1, CS2) and write operation (data, addr, WE, /CS1, CS2)]
Example:
TC55V2325FF-100 memory device
2-megabit synchronous pipelined burst SRAM memory device
Designed to be interfaced with 32-bit processors
Capable of fast sequential reads and writes as well as single byte I/O
block diagram
[Figure: TC55V2325FF-100 with data<31...0>, addr<15...0>, addr<10...0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, /ADV, CLK]
device characteristics
Device | Access Time (ns)
TC55V2325FF-100 | 10
timing diagram
[Figure: CLK, /ADSP, /ADSC, /ADV, addr<15...0>, /WE, /OE, /CS1 and /CS2, CS3, data<31...0>]
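Composing memory
Increase width of words
[Figure: a 2^m x 3n ROM built from three 2^m x n ROMs sharing enable and address inputs A0 ... Am-1; outputs Q3n-1 ... Q2n-1 ... Q0]
[Figure: two 2^m x n ROMs with a 1x2 decoder on address input Am selecting which ROM is enabled; shared outputs Qn-1 ... Q0]
Increase number and width of words
[Figure: both techniques combined, with address inputs A0 ... Am, enable, and the combined outputs]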
Memory hierarchy
Want inexpensive, fast memory
Main memory
Large, inexpensive, slow memory stores entire program and data
Cache
Small, expensive, fast memory stores copy of likely accessed parts of larger memory
Can be multiple levels of cache
[Figure: memory hierarchy from top to bottom: Processor, Registers, Cache, Main memory, Disk, Tape]
Cache
Usually designed with SRAM
faster but more expensive than DRAM
Cache operation:
Request for main memory access (read or write)
First, check cache for copy
cache hit
copy is in cache, quick access
cache miss
copy not in cache, read address and possibly its neighbors into cache
23
Cache mapping
Far fewer number of available cache addresses
Are address contents in cache?
Cache mapping used to assign main memory address to cache
address and determine hit or miss
Three basic techniques:
Direct mapping
Fully associative mapping
Set-associative mapping
24
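Direct mapping
Main memory address divided into 2 fields
Index
cache address
number of bits determined by cache size
Tag
compared with tag stored in cache at address indicated by index
if tags match, check valid bit
Valid bit
Offset
used to find particular word in cache line
[Figure: address split into Tag | Index | Offset; the Index selects a cache entry V T D; the stored tag is compared (=) with the address Tag to produce Valid, and the Offset selects the Data word]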
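The direct-mapped lookup can be sketched as an address split followed by a tag compare. The names (`cache_addr`, `split_address`, `cache_line`, `is_hit`) and the field widths in the example are assumptions for illustration; real caches pick the widths from the cache and line sizes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t tag, index, offset; } cache_addr;

/* Splits an address into tag | index | offset fields. */
cache_addr split_address(uint32_t addr, unsigned index_bits, unsigned offset_bits) {
    assert(index_bits + offset_bits < 32);
    cache_addr r;
    r.offset = addr & ((1u << offset_bits) - 1u);
    r.index  = (addr >> offset_bits) & ((1u << index_bits) - 1u);
    r.tag    = addr >> (offset_bits + index_bits);
    return r;
}

/* One cache entry: valid bit and stored tag (data omitted in this sketch). */
typedef struct { unsigned valid; uint32_t tag; } cache_line;

/* Hit when the entry selected by the index is valid and its tag matches. */
int is_hit(const cache_line *cache, cache_addr a) {
    return cache[a.index].valid && cache[a.index].tag == a.tag;
}
```

With 4 index bits and 2 offset bits, address 0x1234 splits into tag 72, index 13 and offset 0; the access hits only if entry 13 is valid and holds tag 72.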
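[Figure: cache lookup in which the tags of all V T D entries are compared (=) in parallel to produce Valid; the Offset selects the Data word]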
Set-associative mapping
Compromise between direct mapping and fully associative mapping
Index same as in direct mapping
But, each cache address contains content and tags of 2 or more memory address locations
Tags of that set simultaneously compared as in fully associative mapping
Cache with set size N called N-way set-associative
[Figure: address split into Tag | Index | Offset; the Index selects a set of V T D entries whose tags are compared (=) in parallel to produce Valid and Data]
Cache-replacement policy
Technique for choosing which block to replace
when fully associative cache is full
when set-associative cache's line is full
FIFO: first-in-first-out
push block onto queue when accessed
choose block to replace by popping queue
Write-back
main memory only written when dirty block replaced
extra dirty bit for each block set when cache block written to
reduces number of slow main memory writes
29
Degree of associativity
Data block size
Larger caches achieve lower miss rates but higher access cost
e.g.,
2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
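The three averages come from the same weighted sum, which can be captured in a small helper. This is a sketch; the function name `avg_access_cost` is an assumption.

```c
#include <assert.h>

/* Average memory access cost in cycles:
   (1 - miss_rate) * hit_cost + miss_rate * miss_cost. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

Evaluating it for the 2, 4 and 8 Kbyte caches (miss rates 0.15, 0.065 and 0.05565 with hit costs 2, 3 and 4 cycles and a 20-cycle miss cost) reproduces 4.7, 4.105 and 4.8904 cycles, so the 4 Kbyte cache is the best of the three.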
[Figure: plot against cache size (1 Kb, 2 Kb, 4 Kb, 8 Kb, 16 Kb, 32 Kb, 64 Kb, 128 Kb) with a vertical axis from 0 to 0.1 and one curve each for 1 way, 2 way, 4 way and 8 way associativity]
Advanced RAM
DRAMs commonly used as main memory in processor based
embedded systems
high capacity, low cost
32
Basic DRAM
[Figure: DRAM block diagram with address, ras, cas, clock, rd/wr and data signals; Row Decoder, Col Decoder, Sense Amplifiers, Refresh Circuit, Data In Buffer]
[Timing diagrams: with ras and cas, the address bus carries row, col, col, col and the data bus carries data, data, data for consecutive column accesses within one row]
(S)ynchronous and Enhanced Synchronous (ES) DRAM
SDRAM latches data on active edge of clock
Eliminates time to detect ras/cas and rd/wr signals
A counter is initialized to column address then incremented on active edge of clock to access consecutive memory locations
ESDRAM improves SDRAM
added buffers enable overlapping of column addressing
faster clocking and lower read/write latency possible
[Timing diagram: clock, ras, cas, address (row, col), data (data, data, data, data)]
Outline
Automation: synthesis
Verification: hardware/software co-simulation
Reuse: intellectual property cores
Design process models
Introduction
Design task
Define system functionality
Convert functionality to physical implementation while
Satisfying constrained metrics
Optimizing other design metrics
Productivity gap
As low as 10 lines of code or 100 transistors produced per day
Improving productivity
Design technologies developed to improve productivity
We focus on technologies advancing hardware/software unified
view
Automation
[Figure: flow from Specification to Implementation, supported by Automation, Verification and Reuse]
Reuse
Predesigned components
Cores
General-purpose and single-purpose processors on single IC
Verification
Ensuring correctness/completeness of each design step
Hardware/software co-simulation
Automation: synthesis
[Figure: parallel evolution of compilation and synthesis, both ending at Implementation.
Software side: Compilers (1960s,1970s) produce Assembly instructions; Assemblers, linkers (1950s, 1960s) produce Machine instructions; Microprocessor plus program bits.
Hardware side: Behavioral synthesis (1990s) produces Register transfers; RT synthesis (1980s, 1990s); Logic synthesis (1970s, 1980s) produces Logic gates.]
Compilers
translate sequential programs into assembly
Behavioral synthesis
converts sequential programs into FSMDs
[Figure: design flows (a) and (b) from idea to implementation through the levels back-of-the-envelope, sequential program, register-transfers, logic]
Synthesis
Automatically converting system's behavioral description to a structural implementation
Complex whole formed by parts
Structural implementation must optimize design metrics
Gajski's Y-chart
Behavior
E.g., Sequential programs; Register transfers; Logic equations/FSM; Transfer functions
Structural
Implements behavior by connecting components with known behavior
E.g., Processors, memories; Gates, flip-flops; Transistors
Physical
E.g., Cell Layout; Modules; Chips; Boards
Logic synthesis
Minimize size
Minimum cover
Minimum cover that is prime
Heuristics
Multilevel minimization
Trade performance for size
Pareto-optimal solution
Heuristics
FSM synthesis
State minimization
State encoding
10
Two-level minimization
Represent logic function as sum of
products (or product of sums)
AND gate for each product
OR gate for each sum
Sum of products
F = abc'd' + a'b'cd + a'bcd + ab'cd
Direct implementation
a
b
c
11
Minimum cover
Minimum # of AND gates (sum of products)
Literal: variable or its complement
a or a', b or b', etc.
12
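Minimum cover
Covering all 1's with min # of circles
Example: direct vs. min cover
Minimum cover
F=abc'd' + a'cd + ab'cd
Minimum cover implementation
Fewer gates: 4 vs. 5
Fewer transistors: 28 vs. 40
[Figure: Karnaugh maps (rows and columns 00, 01, 11, 10) for the direct and minimum-cover forms, and the gate-level implementation with inputs a, b, c]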
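That the minimum cover computes the same function as the direct sum of products can be checked exhaustively over the 16 input combinations. The sketch below does this; the function names are illustrative assumptions.

```c
#include <assert.h>

/* Direct sum of products: F = abc'd' + a'b'cd + a'bcd + ab'cd */
unsigned f_direct(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (a & b & !c & !d) | (!a & !b & c & d) |
           (!a & b & c & d)  | (a & !b & c & d);
}

/* Minimum cover: F = abc'd' + a'cd + ab'cd */
unsigned f_min_cover(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (a & b & !c & !d) | (!a & c & d) | (a & !b & c & d);
}

/* Counts input combinations on which the two forms disagree. */
unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    for (unsigned v = 0; v < 16; v++) {
        unsigned a = (v >> 3) & 1u, b = (v >> 2) & 1u;
        unsigned c = (v >> 1) & 1u, d = v & 1u;
        if (f_direct(a, b, c, d) != f_min_cover(a, b, c, d))
            mismatches++;
    }
    return mismatches;
}
```

`count_mismatches()` returns 0, because the two terms a'b'cd and a'bcd differ only in b and merge into the single term a'cd.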
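[Figure: Karnaugh map (rows ab and columns 00, 01, 11, 10) for the minimum cover that is prime]
Fewer transistors: 26 vs. 28
Implementation
[Figure: gate-level implementation with inputs a, b, c, d]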
Heuristic
Solution technique where optimal solution not guaranteed
Hopefully comes close
Reduce
Opposite of expand
Reshape
Expands one implicant while reducing another
Maintains total # of implicants
Irredundant
Selects min # of implicants that cover from existing implicants
Synthesis tools differ in modifications used and the order they are used
[Figure: delay versus size trade-off plot, with the 2-level minim. solution marked]
Example
Minimized 2-level logic function:
F = adef + bdef + cdef + gh
Requires 5 gates with 18 total gate inputs
4 ANDs and 1 OR
[Figure: 2-level minimized circuit and multilevel minimized circuit, each with inputs a, b, c, d, e, f, g, h]
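One way to see what multilevel minimization buys is to factor the common term def out of the first three products, giving F = (a + b + c)def + gh. That factored form is an illustration of the general idea, not necessarily the exact circuit in the slide's figure; by my count it needs 4 gates with 11 gate inputs, at the price of an extra logic level. The sketch below checks that both forms agree on all 256 input combinations; the names and the bit packing (bit 0 = a through bit 7 = h) are assumptions.

```c
#include <assert.h>

/* Extracts bit i of v as 0 or 1. */
#define BIT(v, i) (((v) >> (i)) & 1u)

/* 2-level form: F = adef + bdef + cdef + gh (v packs a..h into bits 0..7). */
unsigned f_two_level(unsigned v) {
    unsigned a = BIT(v, 0), b = BIT(v, 1), c = BIT(v, 2), d = BIT(v, 3);
    unsigned e = BIT(v, 4), f = BIT(v, 5), g = BIT(v, 6), h = BIT(v, 7);
    return (a & d & e & f) | (b & d & e & f) | (c & d & e & f) | (g & h);
}

/* Factored multilevel form: F = (a + b + c)def + gh. */
unsigned f_multilevel(unsigned v) {
    unsigned a = BIT(v, 0), b = BIT(v, 1), c = BIT(v, 2), d = BIT(v, 3);
    unsigned e = BIT(v, 4), f = BIT(v, 5), g = BIT(v, 6), h = BIT(v, 7);
    return ((a | b | c) & d & e & f) | (g & h);
}

/* Counts input combinations on which the two forms disagree. */
unsigned multilevel_mismatches(void) {
    unsigned mismatches = 0;
    for (unsigned v = 0; v < 256; v++)
        if (f_two_level(v) != f_multilevel(v))
            mismatches++;
    return mismatches;
}
```

`multilevel_mismatches()` returns 0, so the smaller circuit is functionally equivalent; the trade-off is the added delay of the extra OR level before the AND.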
FSM synthesis
FSM to gates
State minimization
Reduce # of states
Identify and merge equivalent states
Outputs, next states same for all possible inputs
Tabular method gives exact solution
Table of all possible state pairs
If n states, n^2 table entries
Thus, heuristics used with large # of states
State encoding
19
Technology mapping
Library of gates available for implementation
Simple
only 2-input AND,OR gates
Complex
various-input AND,OR,NAND,NOR,etc. gates
Efficiently implemented meta-gates (i.e., AND-OR-INVERT,MUX)
20
Fast heuristics
21
Today
Wire
Delay
Transistor
22
Register-transfer synthesis
Converts FSMD to custom single-purpose processor
Datapath
Register units to store variables
Complex data types
Functional units
Arithmetic operations
Connection units
Buses, MUXs
FSM controller
Controls datapath
Binding
Mapping FSMD operations to specific units
Behavioral synthesis
High-level synthesis
Converts single sequential program to single-purpose processor
Does not require the program to schedule states
Optimizations important
Compiler
Constant propagation, dead-code elimination, loop unrolling
24
System synthesis
Convert 1 or more processes into 1 or more processors (system)
For complex embedded systems
Multiple processes may provide better performance/power
May be better described using concurrent sequential programs
Tasks
Transformation
Allocation
Essentially design of system architecture
Select processors to implement processes
Also select memories and busses
System synthesis
Tasks (cont.)
Partitioning
Mapping 1 or more processes to 1 or more processors
Variables among memories
Communications among buses
Scheduling
Multiple processes on a single processor
Memory accesses
Bus communications
26
System synthesis
Synthesis driven by constraints
E.g.,
Meet performance requirements at minimum cost
Allocate as much behavior as possible to general-purpose processor
Low-cost/flexible implementation
Minimum # of SPPs used to meet performance
Hardware/software codesign
Simultaneous consideration of GPPs/SPPs during synthesis
Made possible by maturation of behavioral synthesis in 1990s
Spatial thinking
Structural diagrams
Data sheets
28
Temporal thinking
States or sequential statements have relationship over time
29
Verification
Ensuring design is correct and complete
Correct
Implements specification accurately
Complete
Describes appropriate output to all relevant input
Formal verification
Hard
For small designs or verifying certain key properties only
Simulation
Most common verification method
30
Formal verification
Analyze design to prove or disprove certain properties
Correctness example
Prove ALU structural implementation equivalent to behavioral
description
Derive Boolean equations for outputs
Create truth table for equations
Compare to truth table from original behavior
Completeness example
Formally prove elevator door can never open while elevator is moving
Derive conditions for door being open
Show conditions conflict with conditions for elevator moving
31
Simulation
Create computer model of design
Provide sample input
Check for acceptable output
Correctness example
ALU
Provide all possible input combinations
Check outputs for correct results
Completeness example
Elevator door closed when moving
Provide all possible input sequences
Check door always closed when elevator moving
32
Increases confidence
Simulating all possible input sequences impossible for most
systems
E.g., 32-bit ALU
33
Observability
Examine system/environment values at any time
Debugging
Can stop simulation at any point and:
Observe internal values
Modify system/environment values before restarting
34
Disadvantages
Simulation setup time
Often has complex external environments
Could spend more time modeling environment than system
35
Simulation speed
Relative speeds of different types of simulation/emulation
1 hour actual execution of SOC
= 1.2 years instruction-set simulation
= 10,000,000 hours gate-level simulation
Method                               Slowdown      Time for 1 hour of actual execution
IC                                   x1            1 hour
FPGA                                 x10           1 day
hardware emulation                   x100          4 days
throughput model                     x1000         1.4 months
instruction-set simulation           x10,000       1.2 years
cycle-accurate simulation            x100,000      12 years
register-transfer-level simulation   x1,000,000    >1 lifetime
gate-level simulation                x10,000,000   1 millennium
36
Reduced confidence
1 msec of cruise controller operation tells us little
Faster simulator
Emulators
Special hardware for simulations
37
Reducing precision/accuracy
Don't need gate-level analysis for all simulations
E.g., cruise control
Don't care what happens at every input/output of each logic gate
38
Hardware/software co-simulation
Variety of simulation approaches exist
From very detailed
E.g., gate-level model
To very abstract
E.g., instruction-level model
Hardware (SPP)
Typically with models in HDL environment
39
Hardware-software co-simulator
40
Minimizing communication
Memory shared between GPP and SPPs
Where should memory go?
In ISS
HDL simulator must stall for memory access
In HDL?
ISS must stall when fetching each instruction
41
Emulators
General physical device system mapped to
Microprocessor emulator
Microprocessor IC with some monitoring, control circuitry
SPP emulator
FPGAs (10s to 100s)
42
Disadvantages
Still not as fast as real implementations
E.g., emulated cruise-control may not respond fast enough to
keep control of car
43
System-on-a-chip (SOC)
All components of system implemented on single chip
Made possible by increasing IC capacities
Changing the way COTS components sold
As intellectual property (IP) rather than actual IC
Behavioral, structural, or physical descriptions
Processor-level components known as cores
44
Cores
Soft core
Synthesizable behavioral description
Typically written in HDL (VHDL/Verilog)
Firm core
Structural description
Typically provided in HDL
Hard core
Physical description
Provided in variety of physical layout file formats
[Figure: Gajski's Y-chart]
Behavior axis: sequential programs, register transfers, logic equations/FSM, transfer functions
Structural axis: processors, memories; gates, flip-flops; transistors
Physical axis: cell layout, modules, chips, boards
45
Predictability
Size, power, performance predicted accurately
46
Firm cores
Compromise between hard and soft cores
Some retargetability
Limited optimization
Better predictability/ease of use
Today
Vendors can sell as IP
Designers can make as many copies as needed
48
IP protection
Past
Illegally copying IC very difficult
Reverse engineering required tremendous, deliberate effort
Accidental copying not possible
Today
Cores sold in electronic format
Watermarking
determines if particular instance of processor was copied
whether copy authorized
Licensing arrangements
Not as easy as purchasing IC
More contracts enforcing pricing model and IP protection
Possibly requiring legal assistance
50
Waterfall model
Physical
Spiral model
Proceed through 3 steps in order but with less
detail
Repeat 3 steps gradually increasing detail
Keep repeating until desired system obtained
Becoming extremely popular (hardware &
software development)
Behavioral
Physical
51
Waterfall method
Not very realistic
Bugs often found in later steps that must be fixed in
earlier step
E.g., forgot to handle certain input condition
Behavioral
Structural
Physical
Lost revenues
May never make it to market
Spiral method
First iteration of 3 steps incomplete
Much faster, though
End up with prototype
Use to test basic functions
Get idea of functions to add/remove
Behavioral
Physical
Extra effort/cost
53
Spiral-like model
Beginning to be applied by embedded system designers
Spiral-like model
Y-chart
Architecture
Application(s)
Mapping
Analysis
55
Summary
Design technology seeks to reduce gap between IC
capacity growth and designer productivity growth
Synthesis has changed digital design
Increased IC capacity means sw/hw components
coexist on one chip
Design paradigm shift to core-based design
Simulation essential but hard
Spiral design process is popular
Outline
Introduction
Putting it all together
Instruction-set processor (GPP, ASIP)
Single-purpose processor
Custom
Standard
Memory
Interfacing
Downloads images to PC
Only recently possible
Systems-on-a-chip
Multiple processors and memories on one IC
Designer's perspective
Two key tasks
Processing images and storing in memory
When shutter pressed:
Image captured
Converted to digital form by charge-coupled device (CCD)
Compressed and archived in internal memory
Uploading images to PC
Digital camera attached to PC
Special software commands camera to transmit archived
images serially
[Figure: CCD layout, showing the lens area, the pixel rows and pixel columns, the covered columns, the electromechanical shutter, and the electronic circuitry]
Zero-bias error
Manufacturing errors cause cells to measure slightly above or below actual
light intensity
Error is typically the same across columns, but is different across rows
Some of the leftmost columns are blocked by black paint to detect zero-bias error
Reading of other than 0 in blocked cells is zero-bias error
Each row is corrected by subtracting the average error found in blocked cells for
that row
Before zero-bias adjustment (8 pixel values, then the 2 covered cells, then the zero-bias adjustment for the row):
136 170 155 140 144 115 112 248 | 12 14 | -13
145 146 168 123 120 117 119 147 | 12 10 | -11
144 153 168 117 121 127 118 135 |  9  9 |  -9
176 183 161 111 186 130 132 133 |  0  0 |   0
144 156 161 133 192 153 138 139 |  7  7 |  -7
122 131 128 147 206 151 131 127 |  2  0 |  -1
121 155 164 185 254 165 138 129 |  4  4 |  -4
173 175 176 183 188 184 117 129 |  5  5 |  -5
After zero-bias adjustment:
123 157 142 127 131 102  99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
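The per-row correction described above can be sketched in C as follows. This is an illustrative model only: the function name, the array sizes, and the use of integer division for the average are assumptions chosen to match the 8-pixel example rows.

```c
#include <assert.h>

#define PIXEL_COLS 8    /* pixels per row in the example block */
#define COVERED_COLS 2  /* covered cells per row */

/* Subtracts the average covered-cell reading from every pixel of one row.
   Returns the bias that was subtracted. */
int zero_bias_adjust_row(int pixels[PIXEL_COLS], const int covered[COVERED_COLS]) {
    /* covered cells see no light, so readings are expected to be small and non-negative */
    assert(covered[0] >= 0 && covered[1] >= 0);
    int bias = (covered[0] + covered[1]) / 2;
    for (int c = 0; c < PIXEL_COLS; c++)
        pixels[c] -= bias;
    return bias;
}
```

Applied to the first example row, with covered cells 12 and 14, the bias is 13 and the pixel 136 becomes 123.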
Compression
Store more images
Transmit image to PC in less time
JPEG (Joint Photographic Experts Group)
Popular standard format for representing digital images in a compressed
form
Provides for a number of different modes of operation
Mode used in this chapter provides high compression ratios using DCT
(discrete cosine transform)
Image data divided into blocks of 8 x 8 pixels
3 steps performed on each block
DCT
Quantization
Huffman encoding
DCT step
Transforms original 8 x 8 block into a cosine-frequency
domain
Upper-left corner values represent more of the essence of the image
Lower-right corner values represent finer details
Can reduce precision of these values and retain reasonable image quality
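$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D_{xy} \cos\left(\frac{\pi (2x+1) u}{16}\right) \cos\left(\frac{\pi (2y+1) v}{16}\right)$
where $C(h) = 1/\sqrt{2}$ if $h = 0$ and $C(h) = 1$ otherwise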
Gives encoded pixel at row u, column v
Dxy is original pixel value at row x, column y
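As an illustration of the forward DCT formula, the sketch below evaluates F(u,v) for one 8 x 8 block in C. To avoid a math-library dependency it approximates each cosine with the 64-scaled integer table that appears in this chapter's fixed-point CODEC, so cos(pi(2x+1)u/16) is taken as COS_TABLE[x][u] / 64. The function names and the use of a double result are assumptions, not the authors' code.

```c
#include <assert.h>

/* COS_TABLE[x][u] is approximately 64 * cos(pi * (2x + 1) * u / 16) */
static const signed char COS_TABLE[8][8] = {
    { 64,  62,  59,  53,  45,  35,  24,  12 },
    { 64,  53,  24, -12, -45, -62, -59, -35 },
    { 64,  35, -24, -62, -45,  12,  59,  53 },
    { 64,  12, -59, -35,  45,  53, -24, -62 },
    { 64, -12, -59,  35,  45, -53, -24,  62 },
    { 64, -35, -24,  62, -45, -12,  59, -53 },
    { 64, -53,  24,  12, -45,  62, -59,  35 },
    { 64, -62,  59, -53,  45, -35,  24, -12 }
};

/* C(h) = 1/sqrt(2) for h = 0, otherwise 1 */
static double dct_scale(int h) { return h == 0 ? 0.70710678 : 1.0; }

/* Computes F(u,v) for one block, with D[x][y] the pixel at row x, column y. */
double fdct(int u, int v, short block[8][8]) {
    assert(u >= 0 && u < 8 && v >= 0 && v < 8);
    long sum = 0;
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++)
            sum += (long)block[x][y] * COS_TABLE[x][u] * COS_TABLE[y][v];
    /* divide out the two factors of 64 introduced by the integer table */
    return 0.25 * dct_scale(u) * dct_scale(v) * (double)sum / (64.0 * 64.0);
}
```

For a block in which every pixel is 1, only F(0,0) is non-zero and is close to 8, which reflects the statement that the upper-left value carries the essence of the image.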
Quantization step
Achieve high compression ratio by reducing image quality (lossy compression)
Reduce bit precision of encoded data
Fewer bits needed for encoding
One way is to divide all values by a power of 2 (by 8 in this chapter's example)
Simple right shifts can do this
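A minimal C sketch of this quantization step follows. The example values in this chapter (39 becomes 5, -43 becomes -5, -3 becomes 0) are consistent with dividing by 8 and rounding to the nearest integer, so the sketch rounds rather than using a bare right shift, which would round negative values downward on most compilers. The function name and the rounding rule are assumptions inferred from the example data.

```c
#include <assert.h>

/* Divides value by 2^shift, rounding to the nearest integer (halves away from zero). */
int quantize(int value, int shift) {
    assert(shift >= 0 && shift < 15);
    int divisor = 1 << shift;
    int half = divisor / 2;
    return value >= 0 ? (value + half) / divisor
                      : -((-value + half) / divisor);
}
```

With shift = 3 the divisor is 8, so each quantized value needs 3 fewer bits than the DCT output it came from.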
DCT output block before quantization (only columns 2-8 of the 8 x 8 block are shown):
 39 -43 -10  26 -83  11  41
 -3 115 -73  -6  -2  22  -5
-11   1 -42  26  -3  17 -38
-61 -13 -12  36 -23 -18   5
 13  37  -4  10 -21   7  -8
-11  -9  -4  20 -28 -21  14
 -7  21  -6   3   3  12 -21
-13 -11 -17  -4  -1   7  -4
After quantization:
144   5  -5  -1   3 -10   1   5
-10   0  14  -9  -1   0   3  -1
  2  -1   0  -5   3   0   2  -5
  0  -8  -2  -2   5  -3  -2   1
  6   2   5  -1   1  -3   1  -1
  5  -1  -1  -1   3  -4  -3   2
 -2  -1   3  -1   0   0   2  -3
 -1  -2  -1  -2  -1   0   1  -1
Pixel frequencies and Huffman codes
Pixel value   Frequency   Huffman code
-1            15x         00
0             8x          100
-2            6x          110
1             5x          010
2             5x          1110
3             5x          1010
5             5x          0110
-3            4x          11110
-5            3x          10110
-10           2x          01110
144           1x          111111
-9            1x          111110
-8            1x          101111
-4            1x          101110
6             1x          011111
14            1x          011110
[Figure: Huffman tree built from the pixel frequencies]
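To see what the variable-length codes buy, the following C sketch totals the number of bits needed to encode the 64 quantized values using the frequencies and codes listed in the figure. The table layout and function names are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

struct huff_entry {
    int value;          /* quantized pixel value */
    int count;          /* occurrences in the 8 x 8 block */
    const char *code;   /* Huffman code as a bit string */
};

static const struct huff_entry HUFF_TABLE[16] = {
    { -1, 15, "00" },     {  0, 8, "100" },    { -2, 6, "110" },    {  1, 5, "010" },
    {  2, 5, "1110" },    {  3, 5, "1010" },   {  5, 5, "0110" },   { -3, 4, "11110" },
    { -5, 3, "10110" },   { -10, 2, "01110" }, { 144, 1, "111111" },{ -9, 1, "111110" },
    { -8, 1, "101111" },  { -4, 1, "101110" }, {  6, 1, "011111" }, { 14, 1, "011110" }
};

/* Number of values covered by the table (should be 64 for one block). */
int total_values(void) {
    int n = 0;
    for (int i = 0; i < 16; i++) n += HUFF_TABLE[i].count;
    return n;
}

/* Total bits needed to encode the block with the listed codes. */
int total_encoded_bits(void) {
    int bits = 0;
    for (int i = 0; i < 16; i++) {
        assert(HUFF_TABLE[i].count > 0);
        bits += HUFF_TABLE[i].count * (int)strlen(HUFF_TABLE[i].code);
    }
    return bits;
}
```

The block encodes in 228 bits, an average of about 3.6 bits per value, because the most frequent values receive the shortest codes.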
12
Archive step
Record starting address and image size
Can use linked list
13
Uploading to PC
When connected to PC and upload command received
Read images from memory
Transmit serially using UART
While transmitting
Reset pointers, image-size variables and global memory pointer
accordingly
14
Requirements Specification
System's requirements: what the system should do
Nonfunctional requirements
Constraints on design metrics (e.g., should use 0.001 watt or less)
Functional requirements
System's behavior (e.g., output X should be input Y times 2)
Initial specification may be very general and come from marketing dept.
E.g., short document detailing market need for a low-end digital camera that:
15
Nonfunctional requirements
Design metrics of importance based on initial specification
Constrained metrics
Values must be below (sometimes above) certain threshold
Optimization metrics
Improved as much as possible to improve product
16
Performance
Must process image fast enough to be useful
1 sec reasonable constraint
Slower would be annoying
Faster not necessary for low-end of market
Size
Must use IC that fits in reasonably sized camera
Constrained and optimization metric
Constraint could be 200,000 gates, but smaller would be cheaper
Power
Must operate below certain temperature (cooling fan not possible)
Therefore, constrained metric
Energy
Reducing power or time reduces energy
Optimized metric: want battery to last as long as possible
17
[Figure: informal functional specification flowchart. CCD input, zero-bias adjust, DCT, archive in memory, "More 8 x 8 blocks?" (yes/no), transmit serially, "Done?" (yes/no), serial output (e.g., 011010...)]
18
101011010
110101010
010101101.
..
CCD.C
CCDPP.C
image file
CNTRL.C
101010101
010101010
101010101
0...
CODEC.C
UART.C
output file
19
CCD module
#include <stdio.h>
#define SZ_ROW 64
#define SZ_COL (64 + 2)
void CcdCapture(void) {
int pixel;
20
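#define SZ_ROW 64
#define SZ_COL 64
/* The declarations and prototypes below are assumed; the slide text omits them. */
static char buffer[SZ_ROW][SZ_COL];
static int rowIndex, colIndex;
char CcdPopPixel(void);   /* provided by the CCD module */
void CcdCapture(void);    /* provided by the CCD module */
void CcdppCapture(void) {
    char bias;
    CcdCapture();
    for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
        /* read the 64 image pixels of this row */
        for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            buffer[rowIndex][colIndex] = CcdPopPixel();
        }
        /* the two covered cells give this row's zero-bias error */
        bias = (CcdPopPixel() + CcdPopPixel()) / 2;
        for(colIndex=0; colIndex<SZ_COL; colIndex++) {
            buffer[rowIndex][colIndex] -= bias;
        }
    }
    rowIndex = 0;
    colIndex = 0;
}
char CcdppPopPixel(void) {
    char pixel;
    pixel = buffer[rowIndex][colIndex];
    /* advance to the next pixel, wrapping at the end of each row */
    if( ++colIndex == SZ_COL ) {
        colIndex = 0;
        if( ++rowIndex == SZ_ROW ) {
            colIndex = -1;
            rowIndex = -1;
        }
    }
    return pixel;
}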
21
UART module
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
fprintf(outputFileHandle, "%i\n", (int)d);
}
22
CODEC module
static short ibuffer[8][8], obuffer[8][8], idx;
void CodecPushPixel(short p) {
if( idx == 64 ) idx = 0;
ibuffer[idx / 8][idx % 8] = p; idx++;
}
void CodecDoFdct(void) {
int x, y;
for(x=0; x<8; x++) {
for(y=0; y<8; y++)
obuffer[x][y] = FDCT(x, y, ibuffer);
}
idx = 0;
}
short CodecPopPixel(void) {
short p;
if( idx == 64 ) idx = 0;
p = obuffer[idx / 8][idx % 8]; idx++;
return p;
}
23
CODEC (cont.)
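{ 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
{ 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
{ 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
{ 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
{ 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
{ 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
{ 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
{ 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
};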
static int FDCT(int u, int v, short img[8][8]) {
double s[8], r = 0; int x;
for(x=0; x<8; x++) {
s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
24
void CntrlSendImage(void) {
for(i=0; i<SZ_ROW; i++)
for(j=0; j<SZ_COL; j++) {
temp = buffer[i][j];
UartSend(((char*)&temp)[0]);
UartSend(((char*)&temp)[1]);
}
}
}
void CntrlCompressImage(void) {
for(i=0; i<NUM_ROW_BLOCKS; i++)
for(j=0; j<NUM_COL_BLOCKS; j++) {
for(k=0; k<8; k++)
void CntrlCaptureImage(void) {
CcdppCapture();
CodecPushPixel(
buffer[i][j] = CcdppPopPixel();
}
#define SZ_ROW 64
#define SZ_COL 64
#define NUM_ROW_BLOCKS (SZ_ROW / 8)
#define NUM_COL_BLOCKS (SZ_COL / 8)
}
}
25
26
Design
Memories, buses
Implementation
A particular architecture and mapping
Solution space is set of all implementations
Starting point
Low-end general-purpose processor connected to flash memory
All functionality mapped to software running on processor
Usually satisfies power, size, and time-to-market constraints
If timing constraint not satisfied then later implementations could:
use single-purpose processors for time-critical functions
rewrite functional specification
27
28
Implementation 2:
Microcontroller and CCDPP
EEPROM
SOC
UART
8051
RAM
CCDPP
29
Microcontroller
Instruction
Decoder
Controller
128
RAM
ALU
30
UART
UART in idle mode until invoked
UART invoked when 8051 executes store instruction
with UART's enable register as target address
Memory-mapped communication between 8051 and
all single-purpose processors
Lower 8-bits of memory address for RAM
Upper 8-bits of memory address for memory-mapped
I/O devices
UART FSMD states
Idle: I = 0
Start: transmit LOW
Data: transmit data(I), then I++ (repeat while I < 8)
Stop: transmit HIGH (entered when I = 8)
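The UART state machine can be modeled in a few lines of C. The sketch below walks the Idle, Start, Data, and Stop states for one byte and records the level driven on the serial line in each step. It assumes data(0) is the least significant bit, which the slide does not state, and all names are illustrative.

```c
#include <assert.h>

enum uart_state { UART_IDLE, UART_START, UART_DATA, UART_STOP };

/* Fills frame[] with the line levels sent for one byte:
   one start bit (LOW), eight data bits, one stop bit (HIGH).
   Returns the number of bits written (always 10). */
int uart_frame(unsigned char data, int frame[10]) {
    enum uart_state state = UART_IDLE;
    int i = 0;   /* the FSMD's bit counter I */
    int n = 0;   /* number of line levels recorded so far */
    for (;;) {
        switch (state) {
        case UART_IDLE:               /* Idle: I = 0, then start once invoked */
            i = 0;
            state = UART_START;
            break;
        case UART_START:              /* Start: transmit LOW */
            frame[n++] = 0;
            state = UART_DATA;
            break;
        case UART_DATA:               /* Data: transmit data(I), then I++ */
            frame[n++] = (data >> i) & 1;
            i++;
            if (i == 8) state = UART_STOP;
            break;
        case UART_STOP:               /* Stop: transmit HIGH */
            frame[n++] = 1;
            assert(n == 10);
            return n;
        }
    }
}
```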
31
CCDPP
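CCDPP FSMD states
Idle: R=0, C=0; leaves Idle when invoked
GetRow: B[R][C]=Pxl, C=C+1; repeats while C < 66; when C = 66 goes to ComputeBias
ComputeBias: Bias=(B[R][11] + B[R][10]) / 2, C=0
FixBias: B[R][C]=B[R][C]-Bias; repeats while C < 64; when C = 64 goes to NextRow
NextRow: R++, C=0; if R < 64 goes to GetRow; if R = 64 goes to Idle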
32
Read
Write
33
Software
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
fprintf(outputFileHandle, "%i\n", (int)d);
}
34
Analysis
VHDL
VHDL
VHDL
simulator
Power
equation
Synthesis
tool
Gate level
simulator
gates
Execution time
gates
gates
Sum gates
Power
Chip area
35
Implementation 2:
Microcontroller and CCDPP
Analysis of implementation 2
Total execution time for processing one image:
9.1 seconds
Power consumption:
0.033 watt
Energy consumption:
0.30 joule (9.1 s x 0.033 watt)
36
37
38
Fixed-point arithmetic
Integer used to represent a real number
Constant number of integer's bits represents fractional portion of real number
More bits, more accurate the representation
E.g., represent 3.14 as an 8-bit integer with 4 bits for the fraction
2^4 = 16
3.14 x 16 = 50.24, truncated to 50 = 00110010
16 (2^4) possible values for fraction, each represents 0.0625 (1/16)
Last 4 bits (0010) = 2
2 x 0.0625 = 0.125
3(0011) + 0.125 = 3.125, approximately 3.14 (more bits for fraction would increase accuracy)
39
Multiply
Multiply integer representations
Shift result right by # of bits in fractional part
E.g., 3.14 * 2.71 = 8.5094
50 * 43 = 2150 = 1000.01100110
[ = (3.14*16) * (2.71*16) = (3.14*2.71*16) *16 ]
>> 4 = 1000.0110
8(1000) + 6(0110) x 0.0625 = 8.375
Range of real values used is limited by bit widths of possible resulting values
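The multiply example can be reproduced with a short C sketch that uses 4 fractional bits. The helper names are assumptions made for illustration; the arithmetic follows the steps given above.

```c
#include <assert.h>

#define FRAC_BITS 4   /* bits used for the fractional part */

/* Converts a non-negative real to fixed point by scaling and truncating. */
int to_fixed(double x) {
    assert(x >= 0.0);
    return (int)(x * (1 << FRAC_BITS));
}

/* Multiplies two fixed-point values: multiply the integer
   representations, then shift right by the number of fractional bits. */
int fixed_mul(int a, int b) {
    return (a * b) >> FRAC_BITS;
}

/* Converts a fixed-point value back to a real number. */
double to_real(int f) {
    return (double)f / (1 << FRAC_BITS);
}
```

Here to_fixed(3.14) is 50 and to_fixed(2.71) is 43; their product 2150 shifted right by 4 is 134, which to_real maps back to 8.375. The gap to the exact product 8.5094 comes from truncating both operands and the result to 4 fractional bits.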
{ 64,  62,  59,  53,  45,  35,  24,  12 },
{ 64,  53,  24, -12, -45, -62, -59, -35 },
{ 64,  35, -24, -62, -45,  12,  59,  53 },
{ 64,  12, -59, -35,  45,  53, -24, -62 },
{ 64, -12, -59,  35,  45, -53, -24,  62 },
{ 64, -35, -24,  62, -45, -12,  59, -53 },
{ 64, -53,  24,  12, -45,  62, -59,  35 },
{ 64, -62,  59, -53,  45, -35,  24, -12 }
};
static const char ONE_OVER_SQRT_TWO = 5;
static short xdata inBuffer[8][8], outBuffer[8][8], idx;
void CodecInitialize(void) { idx = 0; }
void CodecPushPixel(short p) {
unsigned char x, j;
void CodecDoFdct(void) {
unsigned short x, y;
for(x=0; x<8; x++)
for(y=0; y<8; y++)
outBuffer[x][y] = F(x, y, inBuffer);
idx = 0;
}
41
Power consumption:
0.033 watt (same as 2)
Energy consumption:
0.050 joule (1.5 s x 0.033 watt)
Battery life 6x longer!!
42
Implementation 4:
Microcontr. and CCDPP/DCT and CODEC
EEPROM
SOC
CODEC
RAM
8051
UART
CCDPP
43
CODEC design
4 memory mapped registers
C_DATAI_REG/C_DATAO_REG used to
push/pop 8 x 8 block into and out of
CODEC
C_CMND_REG used to command
CODEC
Writing 1 to this register invokes CODEC
44
Implementation 4:
Microcontr. and CCDPP/DCT and CODEC
Analysis of implementation 4
Total execution time for processing one image:
0.099 seconds (well under 1 sec)
Power consumption:
0.040 watt
Increase over 2 and 3 because SOC has another processor
Energy consumption:
0.0040 joule (0.099 s x 0.040 watt)
Battery life 12x longer than previous implementation!!
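The energy figures follow directly from energy = power x time. The C sketch below recomputes them for implementations 2, 3, and 4 from the execution times and power values quoted in this chapter; the function name is an assumption.

```c
#include <assert.h>

/* Energy in joules consumed while drawing `watts` for `seconds`. */
double energy_joules(double seconds, double watts) {
    assert(seconds >= 0.0 && watts >= 0.0);
    return seconds * watts;
}
```

With the quoted numbers, energy_joules(9.1, 0.033) is about 0.30 J, energy_joules(1.5, 0.033) is about 0.050 J, and energy_joules(0.099, 0.040) is about 0.0040 J. The ratios between successive implementations are roughly 6 and 12.5, matching the stated battery-life improvements.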
45
Summary of implementations
Performance (second)
Power (watt)
Size (gate)
Energy (joule)
Implementation 3
Close in performance
Cheaper
Less time to build
Implementation 4
Great performance and energy consumption
More expensive and may miss time-to-market window
If DCT designed ourselves then increased NRE cost and time-to-market
If existing DCT purchased then increased IC cost (IP royalties)
Which is better?
Summary
Digital camera example
Specifications in English and executable language
Design metrics: performance, power and area
Several implementations
47
Introduction to VHDL
.. .
VHDL
VHDL is a language for describing digital
hardware used by industry worldwide
.. .
Genesis of VHDL
State of art circa 1980
Multiple design entry methods and
hardware description languages in use
No or limited portability of designs
between CAD tools from different vendors
Objective: shortening the time from a
design concept to implementation from
18 months to 6 months
.. .
VHDL-87
VHDL-93
VHDL-01
.. .
.. .
Algorithmic level
Register Transfer Level
Level of description
most suitable for synthesis
.. .
Combinational
Logic
Combinational
Logic
Registers
.. .
.. .
10
Free Format
VHDL is a free format language
No formatting conventions, such as spacing or
indentation imposed by VHDL compilers. Space
and carriage return treated the same way.
Example:
if (a=b) then
or
if (a=b)
then
or
if (a =
b) then
11
Comments
Comments in VHDL are indicated with
a double dash, i.e., --
Comment indicator can be placed anywhere in the
line
Any text that follows in the same line is treated as
a comment
Carriage return terminates a comment
No method for commenting a block extending over
a couple of lines
Examples:
-- main subcircuit
Data_in <= Data_bus; -- reading data from the input FIFO
.. .
12
Design Entity
design entity
entity declaration
architecture 1
architecture 2
architecture 3
.. .
13
Entity Declaration
Entity Declaration describes the interface of the
component, i.e. input and output ports.
Entity name
Port names
Port type
ENTITY nand_gate IS
PORT(
a : IN STD_LOGIC;
b : IN STD_LOGIC;
z : OUT STD_LOGIC
);
END nand_gate;
Reserved words
Semicolon
No Semicolon
14
ENTITY entity_name IS
PORT (
port_name : signal_mode signal_type;
port_name : signal_mode signal_type;
.
port_name : signal_mode signal_type);
END entity_name;
.. .
15
Architecture
Describes an implementation of a design
entity.
Architecture example:
.. .
16
.. .
17
.. .
18
Mode In
Port signal
Entity
Driver resides
outside the entity
.. .
19
.. .
20
Mode out
Entity
Port signal
Driver resides
inside the entity
c <= z
Signal X can be
read inside the entity
Driver resides
inside the entity
z <= x
c <= x
.. .
21
Mode inout
Entity
Port signal
Signal can be
read inside the entity
.. .
22
Mode buffer
Entity
Port signal
z
c
Driver resides
inside the entity
c <= z
.. .
23
Port Modes
The Port Mode of the interface describes the direction in which data travels with
respect to the component
In: Data comes in this port and can only be read within the entity. It can
appear only on the right side of a signal or variable assignment.
Out: The value of an output port can only be updated within the entity. It
cannot be read. It can only appear on the left side of a signal
assignment.
Inout: The value of a bi-directional port can be read and updated within
the entity model. It can appear on both sides of a signal assignment.
Buffer: Used for a signal that is an output from an entity. The value of the
signal can be used inside the entity, which means that in an assignment
statement the signal can appear on the left and right sides of the <=
operator
.. .
24
Library declarations
Library declaration
Use all definitions from the package
LIBRARY ieee;
std_logic_1164
USE ieee.std_logic_1164.all;
ENTITY nand_gate IS
PORT(
a : IN STD_LOGIC;
b : IN STD_LOGIC;
z : OUT STD_LOGIC);
END nand_gate;
ARCHITECTURE model OF nand_gate IS
BEGIN
z <= a NAND b;
END model;
.. .
25
LIBRARY library_name;
USE library_name.package_name.package_parts;
.. .
26
PACKAGE 2
TYPES
CONSTANTS
FUNCTIONS
PROCEDURES
COMPONENTS
TYPES
CONSTANTS
FUNCTIONS
PROCEDURES
COMPONENTS
.. .
27
Libraries
ieee
Specifies multi-level logic system,
including STD_LOGIC, and
STD_LOGIC_VECTOR data types
Need to be explicitly
declared
std
Specifies pre-defined data types
(BIT, BOOLEAN, INTEGER, REAL,
SIGNED, UNSIGNED, etc.), arithmetic
operations, basic type conversion
functions, basic text i/o functions, etc.
Visible by default
work
Current designs after compilation
.. .
28
STD_LOGIC
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY nand_gate IS
PORT(
a : IN STD_LOGIC;
b : IN STD_LOGIC;
z : OUT STD_LOGIC);
END nand_gate;
ARCHITECTURE model OF nand_gate IS
BEGIN
z <= a NAND b;
END model;
29
Meaning
High Impedance
Don't Care
.. .
30
X
Contention on the bus
X
0
.. .
31
0
0
.. .
32
H
1
0
.. .
33
Do not care.
Can be assigned to outputs for the case of invalid
inputs(may produce significant improvement in
resource utilization after synthesis).
Use with caution
'1' = '-' gives FALSE
.. .
34
Resolving logic levels (value seen when two drivers drive the same signal)
    X  0  1  Z  W  L  H  -
X   X  X  X  X  X  X  X  X
0   X  0  X  0  0  0  0  X
1   X  X  1  1  1  1  1  X
Z   X  0  1  Z  W  L  H  X
W   X  0  1  W  W  W  W  X
L   X  0  1  L  W  L  W  X
H   X  0  1  H  W  W  H  X
-   X  X  X  X  X  X  X  X
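The resolution table can be read as a lookup: the row is the value from one driver, the column is the value from the other, and the entry is the resolved signal value. The C sketch below models that lookup for the eight levels shown in the table. It is a model for illustration, not VHDL, and the names are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Logic levels in the same order as the table's rows and columns. */
static const char LEVELS[] = "X01ZWLH-";

/* RESOLUTION[row][col]: resolved value for two drivers. */
static const char RESOLUTION[8][9] = {
    "XXXXXXXX",   /* X */
    "X0X0000X",   /* 0 */
    "XX11111X",   /* 1 */
    "X01ZWLHX",   /* Z */
    "X01WWWWX",   /* W */
    "X01LWLWX",   /* L */
    "X01HWWHX",   /* H */
    "XXXXXXXX"    /* - */
};

/* Returns the resolved level when drivers a and b drive the same signal. */
char resolve(char a, char b) {
    const char *pa = strchr(LEVELS, a);
    const char *pb = strchr(LEVELS, b);
    assert(a != '\0' && b != '\0' && pa != NULL && pb != NULL);
    return RESOLUTION[pa - LEVELS][pb - LEVELS];
}
```

For example, resolve('0', '1') gives 'X' (contention on the bus), while resolve('Z', '1') gives '1' because a high-impedance driver yields to any other value.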
.. .
35
Signals
SIGNAL a : STD_LOGIC;
a
1
wire
b
8
bus
.. .
36
37
-- c = 00001111
-- d <= 00001111
.. .
38
structural
dataflow
Concurrent
statements
Components and
interconnects
behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.
39
.. .
40
xor3 Example
Entity xor3
ENTITY xor3 IS
PORT(
A : IN STD_LOGIC;
B : IN STD_LOGIC;
C : IN STD_LOGIC;
Result : OUT STD_LOGIC
);
end xor3;
.. .
41
.. .
42
Dataflow Description
Describes how data moves through the system
and the various processing steps.
Data Flow uses series of concurrent statements
to realize logic. Concurrent statements are
evaluated at the same time; thus, order of these
statements doesn't matter.
Data Flow is most useful style when series of
Boolean equations can represent a logic.
43
.. .
Y
XOR2
A
B
C
Result
XOR3
U1_OUT
A
B
RESULT
XOR3
.. .
44
.. .
45
.. .
46
Structural Description
Structural design is the simplest to understand.
This style is the closest to schematic capture and
utilizes simple building blocks to compose logic
functions.
Components are interconnected in a hierarchical
manner.
Structural descriptions may connect simple gates
or complex, abstract components.
Structural style is useful when expressing a
design that is naturally composed of sub-blocks.
.. .
47
48
Behavioral Description
It accurately models what happens on the inputs
and outputs of the black box (no matter what is
inside and how it works).
This style uses PROCESS statements in VHDL.
.. .
49
Testbench
Processes
Generating
Design Under
Test (DUT)
Stimuli
Observed Outputs
.. .
50
Testbench Defined
Testbench applies stimuli (drives the inputs) to
the Design Under Test (DUT) and (optionally)
verifies expected outputs.
The results can be viewed in a waveform window
or written to a file.
Since Testbench is written in VHDL, it is not
restricted to a single simulation tool (portability).
The same Testbench can be easily adapted to
test different implementations (i.e. different
architectures) of the same design.
.. .
51
Testbench Anatomy
ENTITY tb IS
--TB entity has no ports
END tb;
ARCHITECTURE arch_tb OF tb IS
--Local signals and constants
COMPONENT TestComp --All Design Under Test component declarations
PORT ( );
END COMPONENT;
-----------------------------------------------------
BEGIN
testSequence: PROCESS
-- Input stimuli
END PROCESS;
-- Instantiations of DUTs
DUT:TestComp PORT MAP(
);
END arch_tb;
.. .
52
.. .
53
54
Constants
Syntax:
CONSTANT name : type := value;
Examples:
CONSTANT init_value : STD_LOGIC_VECTOR(3 downto 0) := "0100";
CONSTANT ANDA_EXT : STD_LOGIC_VECTOR(7 downto 0) := X"B4";
CONSTANT counter_width : INTEGER := 16;
CONSTANT buffer_address : INTEGER := 16#FFFE#;
CONSTANT clk_period : TIME := 20 ns;
CONSTANT strobe_period : TIME := 333.333 ms;
.. .
55
Constants - features
Constants can be declared in a
PACKAGE, ENTITY, ARCHITECTURE
When declared in a PACKAGE, the constant
is truly global, for the package can be used
in several entities.
When declared in an ARCHITECTURE, the
constant is local, i.e., it is visible only within this architecture.
When declared in an ENTITY declaration, the constant
can be used in all architectures associated with this entity.
.. .
56
57
Space
Unit of time
(dimension)
.. .
58
TIME values
Numeric value can be an integer or
a floating point number.
Numeric value is optional. If not given, 1 is
implied.
Numeric value and dimension MUST be
separated by a space.
.. .
59
Units of time
Unit
Base Unit
fs
Derived Units
ps
ns
us
ms
sec
min
hr
Definition
femtoseconds (10^-15 seconds)
picoseconds (10^-12 seconds)
nanoseconds (10^-9 seconds)
microseconds (10^-6 seconds)
milliseconds (10^-3 seconds)
seconds
minutes (60 seconds)
hours (3600 seconds)
.. .
60
61
.. .
62
dataflow
Concurrent
statements
structural
Components and
interconnects
behavioral
Sequential statements
Registers
State machines
Test benches
Algorithm spec.
.. .
63
Data-flow VHDL
Major instructions
Concurrent statements
.. .
64
Data-flow VHDL
Major instructions
Concurrent statements
.. .
65
[Figure: full adder truth table, Karnaugh maps, and circuit]
ci xi yi | ci+1 si
0  0  0  | 0    0
0  0  1  | 0    1
0  1  0  | 0    1
0  1  1  | 1    0
1  0  0  | 0    1
1  0  1  | 1    0
1  1  0  | 1    0
1  1  1  | 1    1
$s_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$
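The two full-adder equations can be checked against ordinary addition with a small C model. This is a software sketch of the logic, not VHDL, and the function names are illustrative.

```c
#include <assert.h>

/* One-bit full adder: s = x XOR y XOR cin, cout = xy + x cin + y cin. */
void full_add(int x, int y, int cin, int *s, int *cout) {
    assert((x | y | cin) >> 1 == 0);   /* inputs must be single bits */
    *s = x ^ y ^ cin;
    *cout = (x & y) | (x & cin) | (y & cin);
}

/* Returns 1 if, for all 8 input combinations, 2*cout + s equals x + y + cin. */
int full_adder_is_consistent(void) {
    for (int i = 0; i < 8; i++) {
        int x = (i >> 1) & 1, y = i & 1, cin = (i >> 2) & 1, s, cout;
        full_add(x, y, cin, &s, &cout);
        if (2 * cout + s != x + y + cin) return 0;
    }
    return 1;
}
```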
.. .
66
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY fulladd IS
PORT ( x    : IN  STD_LOGIC ;
       y    : IN  STD_LOGIC ;
       cin  : IN  STD_LOGIC ;
       s    : OUT STD_LOGIC ;
       cout : OUT STD_LOGIC ) ;
END fulladd ;
.. .
67
.. .
68
Logic Operators
Logic operators
and or not nand nor xor xnor (xnor only in VHDL-93)
Logic operators precedence
Highest: not
Lowest: and or nand nor xor xnor
.. .
69
.. .
70
No Implied Precedence
Wanted: y = ab + cd
Incorrect
y <= a and b or c and d ;
equivalent to
y <= ((a and b) or c) and d ;
equivalent to
y = (ab + c)d
Correct
y <= (a and b) or (c and d) ;
Concatenation
SIGNAL a: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL c, d, e, f: STD_LOGIC_VECTOR(7 DOWNTO 0);
a <= "0000";
b <= "1111";
c <= a & b;
-- c = "00001111"
-- d <= "00001111"
71
.. .
72
Rotations in VHDL
a<<<1
a(3) a(2)
a(1)
a(0)
.. .
73
C <= A + B;
.. .
74
Data-flow VHDL
Major instructions
Concurrent statements
75
.. .
Value N
Value N-1
0
1
0
1
0
1
Value 2
Target Signal
Value 1
Condition N-1
Condition 2
Condition 1
.. .
76
Operators
Relational operators
=  /=  <  <=  >  >=
Logic and relational operators precedence (highest to lowest)
not
=  /=  <  <=  >  >=
and  or  nand  nor  xor  xnor
.. .
77
.. .
78
.. .
79
.. .
80
Data-flow VHDL
Major instructions
Concurrent statements
.. .
81
expression1
choices_1
expression2
choices_2
target_signal
expressionN
choices_N
choice expression
.. .
82
WHEN value
WHEN value_1 to value_2
WHEN value_1 | value_2 | .... | value N
.. .
83
.. .
84
IN0
NEG_A
MUX_1
IN1
MUX_2
Y1
IN2
IN3
OUTPUT
SEL1
SEL0
B1
NEG_Y
MUX_4_1
MUX_3
NEG_B
L1 L0
.. .
85
.. .
86
A1 : STD_LOGIC;
B1 : STD_LOGIC;
Y1 : STD_LOGIC;
MUX_0 : STD_LOGIC;
MUX_1 : STD_LOGIC;
MUX_2 : STD_LOGIC;
MUX_3 : STD_LOGIC;
L: STD_LOGIC_VECTOR(1 DOWNTO 0);
.. .
87
.. .
88
AND B1;
OR B1;
XOR B1;
XNOR B1;
WHEN "00",
WHEN "01",
WHEN "10",
WHEN OTHERS;
END mlu_dataflow;
Data-flow VHDL
Major instructions
Concurrent statements
.. .
89
.. .
90
.. .
91
.. .
92
xor_out(2)
xor_out(3)
xor_out(4)
xor_out(5) xor_out(6)
.. .
93
.. .
94
PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
SIGNAL xor_out: std_logic_vector (6 downto 1);
BEGIN
xor_out(1) <= parity_in(0) XOR parity_in(1);
xor_out(2) <= xor_out(1) XOR parity_in(2);
xor_out(3) <= xor_out(2) XOR parity_in(3);
xor_out(4) <= xor_out(3) XOR parity_in(4);
xor_out(5) <= xor_out(4) XOR parity_in(5);
xor_out(6) <= xor_out(5) XOR parity_in(6);
parity_out <= xor_out(6) XOR parity_in(7);
END parity_dataflow;
Figure: parity generator XOR chain with nodes labeled xor_out(1) to xor_out(7).
PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
SIGNAL xor_out: STD_LOGIC_VECTOR (7 downto 0);
BEGIN
xor_out(0) <= parity_in(0);
xor_out(1) <= xor_out(0) XOR parity_in(1);
xor_out(2) <= xor_out(1) XOR parity_in(2);
xor_out(3) <= xor_out(2) XOR parity_in(3);
xor_out(4) <= xor_out(3) XOR parity_in(4);
xor_out(5) <= xor_out(4) XOR parity_in(5);
xor_out(6) <= xor_out(5) XOR parity_in(6);
xor_out(7) <= xor_out(6) XOR parity_in(7);
parity_out <= xor_out(7);
END parity_dataflow;
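The regular structure of the second architecture lends itself to a for-generate loop. The sketch below is one possible compact form; the entity declaration is an assumption (an 8-bit parity_in and a single-bit parity_out, inferred from the indices used above), and the architecture name parity_generate is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Assumed entity: port widths inferred from the indices in the slides.
ENTITY parity IS
    PORT ( parity_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
           parity_out : OUT STD_LOGIC );
END parity;

ARCHITECTURE parity_generate OF parity IS
    SIGNAL xor_out : STD_LOGIC_VECTOR(7 DOWNTO 0);
BEGIN
    xor_out(0) <= parity_in(0);
    -- Each stage XORs the running parity with the next input bit
    G1: FOR i IN 1 TO 7 GENERATE
        xor_out(i) <= xor_out(i-1) XOR parity_in(i);
    END GENERATE;
    parity_out <= xor_out(7);
END parity_generate;
```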
Right side of a signal assignment
(<=, <= when-else, with-select <=)
Expressions including:
Internal signals (defined in a given architecture)
Ports of the mode in, inout, buffer
Arithmetic operations
Synthesizable arithmetic operations:
Addition, +
Subtraction, -
Comparisons, >, >=, <, <=
Multiplication, *
Division by a power of 2, /2**6
(equivalent to right shift)
Shifts by a constant, SHL, SHR
Arithmetic operations
The result of synthesis of an arithmetic
operation is a
- combinational circuit
- without pipelining.
The exact internal architecture used
(and thus delay and area of the circuit)
may depend on the timing constraints specified
during synthesis (e.g., the requested maximum
clock frequency).
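As an illustration of a synthesizable arithmetic operation, here is a minimal sketch of a 16-bit adder with carry-in and carry-out using ieee.numeric_std. The port names Cin and Cout and the internal signals Sum and Carry are assumptions; only X, Y and S appear in the slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

ENTITY adder16 IS
    PORT ( Cin  : IN  STD_LOGIC;                 -- assumed carry input
           X, Y : IN  UNSIGNED(15 DOWNTO 0);
           S    : OUT UNSIGNED(15 DOWNTO 0);
           Cout : OUT STD_LOGIC );               -- assumed carry output
END adder16;

ARCHITECTURE dataflow OF adder16 IS
    SIGNAL Sum   : UNSIGNED(16 DOWNTO 0);        -- one extra bit for the carry
    SIGNAL Carry : UNSIGNED(0 DOWNTO 0);
BEGIN
    Carry(0) <= Cin;
    -- Operands are extended to 17 bits so the carry is not lost
    Sum  <= ('0' & X) + ('0' & Y) + Carry;
    S    <= Sum(15 DOWNTO 0);
    Cout <= Sum(16);
END dataflow;
```

The synthesis tool is free to implement the + operator as a ripple-carry, carry-lookahead or other adder, depending on the timing constraints.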
: IN STD_LOGIC ;
: IN STD_LOGIC_VECTOR(15 DOWNTO 0) ;
: OUT STD_LOGIC_VECTOR(15 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
: IN STD_LOGIC ;
: IN UNSIGNED(15 DOWNTO 0) ;
: OUT UNSIGNED(15 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
: IN STD_LOGIC ;
: IN STD_LOGIC_VECTOR(15 DOWNTO 0) ;
: OUT STD_LOGIC_VECTOR(15 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
: IN STD_LOGIC ;
: IN SIGNED(15 DOWNTO 0) ;
: OUT SIGNED(15 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
Integer Types
Operations on signals (variables)
of the integer types:
INTEGER, NATURAL,
and their subtypes, such as
TYPE day_of_month IS RANGE 0 TO 31;
are synthesizable in the range
$-(2^{31}-1)$ .. $2^{31}-1$ for INTEGERs and their subtypes
0 .. $2^{31}-1$ for NATURALs and their subtypes
Integer Types
Operations on signals (variables)
of the integer types:
INTEGER, NATURAL,
are less flexible and more difficult to control
than operations on signals (variables) of the types
STD_LOGIC_VECTOR
UNSIGNED
SIGNED, and thus
beginners are advised to avoid them.
ENTITY adder16 IS
PORT ( X, Y : IN
S : OUT
END adder16 ;
dataflow: concurrent statements
structural: components and interconnects
behavioral: sequential statements (registers, state machines, test benches, algorithm spec.)
Structural VHDL
Major instructions
component instantiation (port map)
generate scheme for component instantiations
(for-generate)
component instantiation with generic
(generic map, port map)
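Figure: priority resolver block diagram. Inputs r(0) ... r(5), s(1), En and Clock; 2-to-1 multiplexers drive p(0) ... p(3); the priority encoder (inputs w0 ... w3, outputs y0, y1, z) drives q(0), q(1) and ena; the decoder dec2to4 (inputs w0, w1, En, outputs y0 ... y3) drives z(0) ... z(3); the register regn (D, Enable, Clock, Q) stores the result.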
2-to-1 Multiplexer
Figure: graphical symbol and circuit of the 2-to-1 multiplexer with data inputs w0 and w1.
: IN STD_LOGIC ;
: OUT STD_LOGIC ) ;
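A complete 2-to-1 multiplexer in dataflow style might look as follows. This is a sketch: the port names w0, w1, s and f are assumptions based on the figure labels, and the architecture name dataflow matches the name used for mux2to1 in the configuration declaration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY mux2to1 IS
    PORT ( w0, w1, s : IN  STD_LOGIC;   -- assumed port names
           f         : OUT STD_LOGIC );
END mux2to1;

ARCHITECTURE dataflow OF mux2to1 IS
BEGIN
    -- s = '0' selects w0, any other value of s selects w1
    f <= w0 WHEN s = '0' ELSE w1;
END dataflow;
```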
Priority Encoder
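Figure: graphical symbol of the priority encoder with inputs w0 ... w3 and outputs y0, y1.
w3 w2 w1 w0 | y1 y0 | z
0  0  0  0  | d  d  | 0
0  0  0  1  | 0  0  | 1
0  0  1  x  | 0  1  | 1
0  1  x  x  | 1  0  | 1
1  x  x  x  | 1  1  | 1
PORT ( w : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
z : OUT STD_LOGIC ) ;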
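The truth table maps directly onto a conditional signal assignment, because when-else checks its conditions in priority order. The following is a sketch of one possible dataflow architecture; the architecture body is an assumption, while the port list follows the component declaration of priority.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY priority IS
    PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
           y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
           z : OUT STD_LOGIC );
END priority;

ARCHITECTURE dataflow OF priority IS
BEGIN
    -- Highest-numbered active input wins
    y <= "11" WHEN w(3) = '1' ELSE
         "10" WHEN w(2) = '1' ELSE
         "01" WHEN w(1) = '1' ELSE
         "00";
    -- z indicates that at least one input is active
    z <= '0' WHEN w = "0000" ELSE '1';
END dataflow;
```

For w = "0000" the table marks y as don't care; this sketch returns "00" in that case.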
2-to-4 Decoder
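Truth table columns: En w1 w0 | y0 y1 y2 y3
Figure: graphical symbol of the 2-to-4 decoder with inputs w0, w1, En and outputs y0, y1, y2, y3.
PORT ( w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
En : IN STD_LOGIC ;
y : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
STD_LOGIC ;
STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
Figure: N-bit register regn with inputs Enable and Clock and output Q.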
Figure: priority resolver block diagram with the register outputs labeled t(0) ... t(3). Inputs r(0) ... r(5), s(1), En and Clock; internal signals p(0) ... p(3), q(0), q(1), ena and z(0) ... z(3); blocks priority, dec2to4 and regn.
SIGNAL p : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL q : STD_LOGIC_VECTOR (1 DOWNTO 0) ;
SIGNAL z : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL ena : STD_LOGIC ;
: IN STD_LOGIC ;
: OUT STD_LOGIC ) ;
COMPONENT priority
PORT (w : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
z : OUT STD_LOGIC ) ;
END COMPONENT ;
COMPONENT dec2to4
PORT (w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
En : IN STD_LOGIC ;
y : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END COMPONENT ;
END structural;
STD_LOGIC_VECTOR(1 DOWNTO 0) ;
STD_LOGIC ;
STD_LOGIC_VECTOR(0 TO 3) ) ;
STD_LOGIC_VECTOR(1 DOWNTO 0) ;
STD_LOGIC ;
STD_LOGIC_VECTOR(0 TO 3) ) ;
STD_LOGIC ;
STD_LOGIC ) ;
COMPONENT priority
PORT (w : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0) ;
z : OUT STD_LOGIC ) ;
END COMPONENT ;
ENTITY priority_resolver IS
PORT (r : IN STD_LOGIC_VECTOR(5 DOWNTO 0) ;
s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
clk : IN STD_LOGIC;
en : IN STD_LOGIC;
t : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END priority_resolver;
ARCHITECTURE structural OF priority_resolver IS
SIGNAL p : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL q : STD_LOGIC_VECTOR (1 DOWNTO 0) ;
SIGNAL z : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
SIGNAL ena : STD_LOGIC ;
END structural;
Configuration declaration
CONFIGURATION SimpleCfg OF priority_resolver IS
FOR structural
FOR ALL: mux2to1
USE ENTITY work.mux2to1(dataflow);
END FOR;
FOR u3: priority
USE ENTITY work.priority(dataflow);
END FOR;
FOR u4: dec2to4
USE ENTITY work.dec2to4(dataflow);
END FOR;
END FOR;
END SimpleCfg;
Configuration specification
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE work.GatesPkg.all;
ENTITY priority_resolver IS
PORT (r : IN STD_LOGIC_VECTOR(5 DOWNTO 0) ;
s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
z : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
END priority_resolver;
Example 1
Figure: 16-to-1 multiplexer built from five 4-to-1 multiplexers, with data inputs w0 ... w15, select inputs s0 ... s3 and output f.
A 4-to-1 Multiplexer
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY mux4to1 IS
PORT ( w0, w1, w2, w3 : IN STD_LOGIC ;
s : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
f : OUT STD_LOGIC ) ;
END mux4to1 ;
w : IN STD_LOGIC_VECTOR(0 TO 15) ;
s : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
f : OUT STD_LOGIC ) ;
: IN STD_LOGIC ;
: IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
Mux1: mux4to1 PORT MAP ( w(0), w(1), w(2), w(3), s(1 DOWNTO 0), m(0) ) ;
Mux2: mux4to1 PORT MAP ( w(4), w(5), w(6), w(7), s(1 DOWNTO 0), m(1) ) ;
Mux3: mux4to1 PORT MAP ( w(8), w(9), w(10), w(11), s(1 DOWNTO 0), m(2) ) ;
Mux4: mux4to1 PORT MAP ( w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3) ) ;
Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;
: IN STD_LOGIC ;
: IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
: OUT STD_LOGIC ) ;
SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
G1: FOR i IN 0 TO 3 GENERATE
Muxes: mux4to1 PORT MAP (
w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i) ) ;
END GENERATE ;
Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;
Example 2
Figure: 4-to-16 decoder built from five 2-to-4 decoders, with inputs w0 ... w3 and En and outputs y0 ... y15.
w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
En : IN STD_LOGIC ;
y : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
w : IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
En : IN STD_LOGIC ;
y : OUT STD_LOGIC_VECTOR(0 TO 15) ) ;
: IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
: IN STD_LOGIC ;
: OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
G1: FOR i IN 0 TO 3 GENERATE
Dec_ri: dec2to4 PORT MAP ( w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3) );
G2: IF i=3 GENERATE
Dec_left: dec2to4 PORT MAP ( w(i DOWNTO i-1), En, m ) ;
END GENERATE ;
END GENERATE ;
END Structure ;
Concurrent statements:
Concurrent simple signal assignment
Conditional signal assignment
Selected signal assignment
Generate statement
begin
Concurrent Statements
end ARCHITECTURE_NAME;
Anatomy of a Process
Statement Part
Contains sequential statements to be executed each time the process is activated
Analogous to conventional programming languages
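The parts of a process can be seen together in a small example. This is a sketch only; the entity process_anatomy, its ports and the variable tmp are hypothetical names used to label the optional and mandatory parts.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical entity used only to show the layout of a process.
ENTITY process_anatomy IS
    PORT ( a, b : IN  STD_LOGIC;
           y    : OUT STD_LOGIC );
END process_anatomy;

ARCHITECTURE behavioral OF process_anatomy IS
BEGIN
    -- Optional label, keyword PROCESS, optional sensitivity list
    comb: PROCESS (a, b)
        -- Declaration part (optional)
        VARIABLE tmp : STD_LOGIC;
    BEGIN
        -- Statement part: executed top to bottom on each activation
        tmp := a AND b;
        y   <= tmp OR (a XOR b);
    END PROCESS comb;
END behavioral;
```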
What is a PROCESS?
A process is a sequence of instructions referred to as
sequential statements.
A process can be given a unique name
using an optional LABEL
This is followed by the keyword
PROCESS
The keyword BEGIN is used to indicate
the start of the process
All statements within the process are
executed SEQUENTIALLY. Hence, the
order of statements is important.
Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT FOR 10 ns;
END PROCESS;
Order of execution
Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT FOR 10 ns;
END PROCESS;
After the last statement, program control is passed back to the
first statement after BEGIN
Order of execution
Testing: PROCESS
BEGIN
test_vector<="00";
WAIT FOR 10 ns;
test_vector<="01";
WAIT FOR 10 ns;
test_vector<="10";
WAIT FOR 10 ns;
test_vector<="11";
WAIT; -- suspends the process indefinitely, so the sequence runs only once
END PROCESS;
If Statement - Example
SELECTOR: process
begin
WAIT UNTIL Clock'EVENT AND Clock = '1' ;
IF Sel = "00" THEN
f <= x1;
ELSIF Sel = "10" THEN
f <= x2;
ELSE
f <= x3;
END IF;
end process;
Loop Statement
FOR i IN range LOOP
statements
END LOOP;
Testing: PROCESS
BEGIN
test_vector<="000";
FOR i IN 0 TO 7 LOOP
WAIT FOR 10 ns;
test_vector<=test_vector+"001";
END LOOP;
END PROCESS;
........
END behavioral;
........
END behavioral;
Typical error
SIGNAL test_vector : STD_LOGIC_VECTOR(2 downto 0);
SIGNAL reset : STD_LOGIC;
BEGIN
.......
generator1: PROCESS
BEGIN
reset <= '1';
WAIT FOR 100 ns;
reset <= '0';
test_vector <= "000";
WAIT;
END PROCESS;
generator2: PROCESS
BEGIN
WAIT FOR 200 ns;
test_vector <= "001"; -- error: test_vector is also driven by generator1
WAIT FOR 600 ns;
test_vector <= "011";
END PROCESS;
.......
END behavioral;
Figure: blocks of combinational logic separated by registers.
Figure labels: clk, w, a, b, c, y, priority.
Processes in VHDL
Processes describe sequential behavior
Processes in VHDL are very powerful statements
They allow defining arbitrary behavior that may
be difficult to represent by a real circuit
Not every process can be synthesized
D latch
Graphical symbol: inputs D and Clock, output Q
Truth table
Clock D | Q(t+1)
0     x | Q(t)
1     0 | 0
1     1 | 1
Timing diagram: Clock, D and Q over time, with instants t1, t2, t3, t4
D flip-flop
Graphical symbol: inputs D and Clock, output Q
Truth table
Clk D | Q(t+1)
↑   0 | 0
↑   1 | 1
0   x | Q(t)
1   x | Q(t)
Timing diagram: Clock, D and Q over time, with instants t1, t2, t3, t4
D latch
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY latch IS
PORT ( D, Clock : IN STD_LOGIC ;
Q : OUT STD_LOGIC) ;
END latch ;
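A behavioral architecture for the latch could be written as follows. This is a sketch that matches the truth table of the D latch; the architecture name behavioral is an assumption.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY latch IS
    PORT ( D, Clock : IN  STD_LOGIC;
           Q        : OUT STD_LOGIC );
END latch;

ARCHITECTURE behavioral OF latch IS
BEGIN
    -- Sensitive to both D and Clock: Q follows D while Clock = '1'
    PROCESS (D, Clock)
    BEGIN
        IF Clock = '1' THEN
            Q <= D;
        END IF;
        -- No ELSE branch: Q keeps its value while Clock = '0' (level-sensitive storage)
    END PROCESS;
END behavioral;
```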
D flip-flop (1)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
Q : OUT STD_LOGIC) ;
END flipflop ;
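One common way to describe a positive-edge-triggered D flip-flop is a process sensitive only to the clock. The sketch below uses rising_edge from std_logic_1164; the architecture name is an assumption, and it may differ from the variants shown on the original slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY flipflop IS
    PORT ( D, Clock : IN  STD_LOGIC;
           Q        : OUT STD_LOGIC );
END flipflop;

ARCHITECTURE behavioral OF flipflop IS
BEGIN
    -- Only Clock is in the sensitivity list: D is sampled on the rising edge
    PROCESS (Clock)
    BEGIN
        IF rising_edge(Clock) THEN
            Q <= D;
        END IF;
    END PROCESS;
END behavioral;
```

The condition Clock'EVENT AND Clock = '1' is an older, widely used alternative to rising_edge(Clock).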
D flip-flop (2)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
Q : OUT STD_LOGIC) ;
END flipflop ;
D flip-flop (3)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
Q : OUT STD_LOGIC) ;
END flipflop ;
D flip-flop (4)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
ENTITY flipflop IS
PORT ( D, Clock : IN STD_LOGIC ;
Q : OUT STD_LOGIC) ;
END flipflop ;
: IN STD_LOGIC ;
: OUT STD_LOGIC) ;
: IN STD_LOGIC ;
: OUT STD_LOGIC) ;
Figure: D flip-flops with Clock and Resetn inputs.
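The difference between asynchronous and synchronous reset shows up in the sensitivity list and in where the reset test sits relative to the clock edge. The following sketch places both variants side by side; the entity name flipflop_reset and the outputs Q_async and Q_sync are hypothetical, and an active-low Resetn is assumed from the signal name.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical entity with one flip-flop of each reset style.
ENTITY flipflop_reset IS
    PORT ( D, Resetn, Clock : IN  STD_LOGIC;
           Q_async, Q_sync  : OUT STD_LOGIC );
END flipflop_reset;

ARCHITECTURE behavioral OF flipflop_reset IS
BEGIN
    -- Asynchronous reset: Resetn is in the sensitivity list and is tested first
    async_reset: PROCESS (Resetn, Clock)
    BEGIN
        IF Resetn = '0' THEN
            Q_async <= '0';
        ELSIF rising_edge(Clock) THEN
            Q_async <= D;
        END IF;
    END PROCESS;

    -- Synchronous reset: Resetn is examined only on the rising clock edge
    sync_reset: PROCESS (Clock)
    BEGIN
        IF rising_edge(Clock) THEN
            IF Resetn = '0' THEN
                Q_sync <= '0';
            ELSE
                Q_sync <= D;
            END IF;
        END IF;
    END PROCESS;
END behavioral;
```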
: IN STD_LOGIC_VECTOR(7 DOWNTO 0) ;
: IN STD_LOGIC ;
: OUT STD_LOGIC_VECTOR(7 DOWNTO 0) ) ;
Figure: 8-bit register reg8 with inputs D, Resetn and Clock.
Figure: N-bit register regn with inputs D, Resetn and Clock.
STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
STD_LOGIC ;
STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
Figure: N-bit register regn with inputs Enable and Clock and output Q.
STD_LOGIC ;
STD_LOGIC_VECTOR(1 DOWNTO 0) ) ;
Figure: 2-bit up-counter upcount with inputs Clear and Clock and output Q.
Figure: up-counter upcount with inputs Enable, Resetn and Clock and output Q.
Shift register
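Figure: 4-bit shift register with serial input Sin, inputs Clock and Enable, and outputs Q(3), Q(2), Q(1), Q(0).
Figure: 4-bit shift register with parallel data inputs D, serial input Sin, inputs Clock and Enable, and outputs Q(3), Q(2), Q(1), Q(0).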
: IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
: IN STD_LOGIC ;
: IN STD_LOGIC ;
: IN STD_LOGIC ;
: IN STD_LOGIC ;
: BUFFER STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
Figure: graphical symbol of shift4 with inputs Enable, D, Load, Sin, Clock and output Q.
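A 4-bit shift register with parallel load might be described as below. This is a sketch under several assumptions: the ports are named after the figure labels, the shift direction is right (Sin enters at Q(3)), and Load takes priority over Enable; none of these details are confirmed by the surviving slide text.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY shift4 IS
    PORT ( D      : IN     STD_LOGIC_VECTOR(3 DOWNTO 0);
           Enable : IN     STD_LOGIC;
           Load   : IN     STD_LOGIC;
           Sin    : IN     STD_LOGIC;
           Clock  : IN     STD_LOGIC;
           Q      : BUFFER STD_LOGIC_VECTOR(3 DOWNTO 0) );
END shift4;

ARCHITECTURE behavioral OF shift4 IS
BEGIN
    PROCESS (Clock)
    BEGIN
        IF rising_edge(Clock) THEN
            IF Load = '1' THEN
                Q <= D;                      -- parallel load (assumed priority)
            ELSIF Enable = '1' THEN
                Q <= Sin & Q(3 DOWNTO 1);    -- shift right, Sin enters at Q(3)
            END IF;
        END IF;
    END PROCESS;
END behavioral;
```

Mode BUFFER lets the process read Q on the right-hand side; an internal signal with an OUT port is a common alternative.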
Figure: graphical symbol of the N-bit shift register shiftn with inputs Enable, D, Load, Sin, Clock and output Q.
: IN STD_LOGIC_VECTOR(1 TO 3) ;
: OUT INTEGER RANGE 0 TO 3) ;
Variables - features
Can only be declared within processes and
subprograms (functions & procedures)
Initial value can be explicitly specified in the
declaration
When assigned take an assigned value
immediately
Variable assignments represent the desired
behavior, not the structure of the circuit
Should be avoided, or at least used with
caution in a synthesizable code
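The immediate update of a variable is what makes an accumulating loop work inside a process. The sketch below counts the ones in a 3-bit input; the entity name numbits, the port names X and Count and the variable tmp are assumptions, with the port types following the fragment STD_LOGIC_VECTOR(1 TO 3) / INTEGER RANGE 0 TO 3.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical entity: counts the number of '1' bits in X.
ENTITY numbits IS
    PORT ( X     : IN  STD_LOGIC_VECTOR(1 TO 3);
           Count : OUT INTEGER RANGE 0 TO 3 );
END numbits;

ARCHITECTURE behavioral OF numbits IS
BEGIN
    PROCESS (X)
        VARIABLE tmp : INTEGER RANGE 0 TO 3;
    BEGIN
        tmp := 0;
        FOR i IN 1 TO 3 LOOP
            IF X(i) = '1' THEN
                tmp := tmp + 1;   -- takes effect immediately, visible in the next iteration
            END IF;
        END LOOP;
        Count <= tmp;             -- signal is scheduled once, with the final value
    END PROCESS;
END behavioral;
```

With a signal in place of tmp, every iteration would read the old value and the count would be wrong.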
Delays
Delays are not synthesizable
Statements, such as
wait for 5 ns
a <= b after 10 ns
will not produce the required delay, and
should not be used in the code intended
for synthesis.
Initializations
Declarations of signals (and variables)
with initialized values, such as
SIGNAL a : STD_LOGIC := '0';
cannot be synthesized, and thus should
be avoided.
If present, they will be ignored by the
synthesis tools.
Use set and reset signals instead.
Floating-point operations
Operations on signals (and variables)
of the type
real
are not synthesizable by the
current generation of synthesis tools.
2-to-4 Decoder
Truth table columns: En w1 w0 | y0 y1 y2 y3
Figure: graphical symbol of the 2-to-4 decoder with inputs w0, w1, En and outputs y0, y1, y2, y3.
w : IN STD_LOGIC_VECTOR(1 DOWNTO 0) ;
En : IN STD_LOGIC ;
y : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
y <= "1000" ;
y <= "0100" ;
y <= "0010" ;
y <= "0001" ;
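The four assignments above fit a behavioral description in which a CASE statement selects the active output while the decoder is enabled. The following is a sketch; the process structure and the all-zero output for En = '0' are assumptions about the lost slide content.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY dec2to4 IS
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
           En : IN  STD_LOGIC;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
END dec2to4;

ARCHITECTURE behavioral OF dec2to4 IS
BEGIN
    PROCESS (w, En)
    BEGIN
        IF En = '1' THEN
            CASE w IS
                WHEN "00"   => y <= "1000";   -- y(0) active
                WHEN "01"   => y <= "0100";   -- y(1) active
                WHEN "10"   => y <= "0010";   -- y(2) active
                WHEN OTHERS => y <= "0001";   -- y(3) active
            END CASE;
        ELSE
            y <= "0000";                      -- disabled: no output active (assumed)
        END IF;
    END PROCESS;
END behavioral;
```

Because y is declared (0 TO 3), the leftmost character of each literal is y(0).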
STD_LOGIC ;
STD_LOGIC ) ;
STD_LOGIC ;
STD_LOGIC ) ;
Figure: comparator with inputs A and B and output AeqB.
CASE y IS
WHEN S1 => Z <= "10";
WHEN S2 => Z <= "01";
WHEN S3 => Z <= "00";
WHEN OTHERS => Z <= "--";
END CASE;
array of bits
string
array of characters
array of std_logic_vectors
Array Attributes
A'left(N)
A'right(N)
A'low(N)
A'high(N)
A'range(N)
A'left(1) = 1
A'right(2) = 0
A'low(1) = 1
A'high(2) = 31
A'range(1) = 1 to 4
A'length(2) = 32
A'ascending(2) = false
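The attribute values listed above are consistent with a two-dimensional array indexed 1 to 4 and 31 downto 0. The sketch below assumes that declaration (it is not present in the surviving text) and checks each value with an assertion; the entity name attr_demo is hypothetical.

```vhdl
-- Hypothetical simulation-only entity checking the array attribute values.
ENTITY attr_demo IS
END attr_demo;

ARCHITECTURE sim OF attr_demo IS
    -- Assumed declaration, chosen to agree with the listed attribute values
    TYPE A IS ARRAY (1 TO 4, 31 DOWNTO 0) OF BOOLEAN;
BEGIN
    PROCESS
    BEGIN
        ASSERT A'LEFT(1)   = 1  REPORT "left(1)"   SEVERITY ERROR;
        ASSERT A'RIGHT(2)  = 0  REPORT "right(2)"  SEVERITY ERROR;
        ASSERT A'LOW(1)    = 1  REPORT "low(1)"    SEVERITY ERROR;
        ASSERT A'HIGH(2)   = 31 REPORT "high(2)"   SEVERITY ERROR;
        ASSERT A'LENGTH(2) = 32 REPORT "length(2)" SEVERITY ERROR;
        -- 'ASCENDING is available from VHDL-93 onward
        ASSERT A'ASCENDING(2) = FALSE REPORT "ascending(2)" SEVERITY ERROR;
        WAIT;   -- run the checks once
    END PROCESS;
END sim;
```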
Subprograms
Include functions and procedures
Figure: a FUNCTION / PROCEDURE can be placed in a LIBRARY (global) or in an ENTITY.
Function syntax
FUNCTION function_name
(<parameter_list>)
RETURN data_type IS
[declarations]
BEGIN
(sequential statements)
END function_name;
Function Example 1
LIBRARY ieee;
USE ieee.std_logic_1164.all;
PACKAGE my_package IS
FUNCTION log2_ceil (CONSTANT s: INTEGER) RETURN INTEGER;
END my_package;
PACKAGE body my_package IS
FUNCTION log2_ceil (CONSTANT s: INTEGER) RETURN INTEGER IS
VARIABLE m,n : INTEGER;
BEGIN
m := 0;
n := 1;
WHILE (n < s) LOOP
m := m + 1;
n := n*2;
END LOOP;
RETURN m;
END log2_ceil;
END my_package;
Function Example 2
library IEEE;
use IEEE.std_logic_1164.all;
ENTITY powerOfFour IS
PORT(
X : IN INTEGER;
Y : OUT INTEGER
);
END powerOfFour;
Function Example 2
ARCHITECTURE behavioral OF powerOfFour IS
FUNCTION Pow ( SIGNAL N:INTEGER; Exp : INTEGER)
RETURN INTEGER IS
VARIABLE Result : INTEGER := 1;
BEGIN
FOR i IN 1 TO Exp LOOP
Result := Result * N;
END LOOP;
RETURN( Result );
END Pow;
BEGIN
Y <= Pow(X, 4);
END behavioral;
LIBRARY ieee;
USE ieee.std_logic_1164.all;
-------------------------------------------------
PACKAGE my_package IS
FUNCTION conv_integer (SIGNAL vector: STD_LOGIC_VECTOR)
RETURN INTEGER;
END my_package;
-------------------------------------------------
Procedure syntax
PROCEDURE procedure_name
(<parameter_list>) IS
[declarations]
BEGIN
(sequential statements)
END procedure_name;
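Unlike a function, a procedure can return more than one value through OUT parameters. The following sketch declares a procedure in an architecture and calls it concurrently; the names procedure_demo, sort2 and all port and parameter names are hypothetical.

```vhdl
-- Hypothetical entity: outputs the smaller and the larger of two inputs.
ENTITY procedure_demo IS
    PORT ( in1, in2         : IN  INTEGER RANGE 0 TO 255;
           min_out, max_out : OUT INTEGER RANGE 0 TO 255 );
END procedure_demo;

ARCHITECTURE behavioral OF procedure_demo IS
    PROCEDURE sort2 ( SIGNAL a, b   : IN  INTEGER;
                      SIGNAL lo, hi : OUT INTEGER ) IS
    BEGIN
        IF a > b THEN
            lo <= b;
            hi <= a;
        ELSE
            lo <= a;
            hi <= b;
        END IF;
    END sort2;
BEGIN
    -- Concurrent procedure call: re-executed whenever in1 or in2 changes
    sort2 (in1, in2, min_out, max_out);
END behavioral;
```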
Operator overloading
Operator overloading allows different
argument types for a given operation
(function)
The VHDL tools resolve which of these
functions to select based on the types of
the inputs
This selection is transparent to the user as
long as the function has been defined for
the given argument types.
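As an illustration, the "+" operator can be overloaded for an INTEGER and a BIT operand. This is a sketch; the package overload_pkg, the entity overload_demo and the port names are hypothetical.

```vhdl
PACKAGE overload_pkg IS
    -- Additional definition of "+" for (INTEGER, BIT) operands
    FUNCTION "+" (a : INTEGER; b : BIT) RETURN INTEGER;
END overload_pkg;

PACKAGE BODY overload_pkg IS
    FUNCTION "+" (a : INTEGER; b : BIT) RETURN INTEGER IS
    BEGIN
        IF b = '1' THEN
            RETURN a + 1;   -- resolves to the predefined integer "+"
        ELSE
            RETURN a;
        END IF;
    END "+";
END overload_pkg;

USE work.overload_pkg.all;

ENTITY overload_demo IS
    PORT ( count_in  : IN  INTEGER RANGE 0 TO 14;
           inc       : IN  BIT;
           count_out : OUT INTEGER RANGE 0 TO 15 );
END overload_demo;

ARCHITECTURE dataflow OF overload_demo IS
BEGIN
    -- The tool selects the (INTEGER, BIT) version from the operand types
    count_out <= count_in + inc;
END dataflow;
```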
Notion of type
Type defines a set of values and a set of
applicable operations
Declaration of a type determines which values
can be stored in an object (signal, variable,
constant) of a given type
Every object can only assume values of its
nominated type
Each operation (e.g., and, +, *) includes the types
of values to which the operation may be applied,
and the type of the result
The goal of strong typing is a detection of errors
at an early stage of the design process
Integer type
Name: integer
Status: predefined
Contents: all integer numbers representable on a particular host computer, but at least numbers in the range $-(2^{31}-1)$ .. $2^{31}-1$
boolean: (true, false)
bit: ('0', '1')
character:
VHDL-87: 128 7-bit ASCII characters
VHDL-93: 256 ISO 8859 Latin-1 8-bit characters
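23.1
$46 \times 10^{5}$
$1 \times 10^{12}$
$1.234 \times 10^{9}$
$34.0 \times 10^{-8}$
2#0.101#E5 = $0.101_2 \times 2^{5} = (2^{-1} + 2^{-3}) \times 2^{5}$
8#0.4#E-6 = $0.4_8 \times 8^{-6} = (4 \times 8^{-1}) \times 8^{-6}$
16#0.a5#E-8 = $0.\mathrm{a5}_{16} \times 16^{-8} = (10 \times 16^{-1} + 5 \times 16^{-2}) \times 16^{-8}$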
= 21
= 11
= 11
= 21
= false
= 14
= 20
T'pos(x): position number of x in T
T'val(n): value in T at position n
T'succ(x): value in T at position one greater than position of x
T'pred(x): value in T at position one less than position of x
T'leftof(x): value in T at position one to the left of x
T'rightof(x): value in T at position one to the right of x
=0
= high
= low
= low
error
= high
Subtype
Defines a subset of the values of a base type
A condition that is used to determine which
values are included in the subtype is called
a constraint
All operations that are applicable to the
base type also apply to any of its subtypes
Base type and subtype can be mixed in the
operations, but the result must belong to
the subtype, otherwise an error is
generated.
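A short example of mixing a base type and a subtype follows. It is a sketch with hypothetical names (subtype_demo, small_int, a, b, c); the initial values are chosen so that the result stays inside the subtype range.

```vhdl
-- Hypothetical simulation-only entity illustrating subtype range checks.
ENTITY subtype_demo IS
END subtype_demo;

ARCHITECTURE sim OF subtype_demo IS
    SUBTYPE small_int IS INTEGER RANGE 0 TO 15;   -- constraint: 0 .. 15
    SIGNAL a : small_int := 7;
    SIGNAL b : INTEGER   := 5;
    SIGNAL c : small_int;
BEGIN
    -- "+" of the base type INTEGER applies to both operands;
    -- the result 12 lies within 0 .. 15, so the assignment is legal.
    -- A result outside 0 .. 15 would raise a range error in simulation.
    c <= a + b;
END sim;
```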
Predefined subtypes
natural: integers >= 0
positive: integers > 0
delay_length: time >= 0
Operators (1)
Operators (2)
Operators (3)
Event-driven simulation
Event list entries: time, signal, new value
Delta delay
A propagation delay of 0 time units is
equivalent to omitting the after clause and is
called a delta delay.
Used for functional simulation.
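Delta delays order signal updates that happen at the same simulation time. In the sketch below (entity and signal names are hypothetical), a changes at 10 ns, b follows one delta cycle later and c one further delta cycle later, all still at 10 ns.

```vhdl
-- Hypothetical simulation-only entity illustrating delta delays.
ENTITY delta_demo IS
END delta_demo;

ARCHITECTURE sim OF delta_demo IS
    SIGNAL a    : BIT := '0';
    SIGNAL b, c : BIT;
BEGIN
    a <= '1' AFTER 10 ns;   -- explicit propagation delay
    b <= NOT a;             -- no AFTER clause: updated one delta after a
    c <= NOT b;             -- updated one delta after b, at the same simulation time
END sim;
```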
Signals vs Variables
architecture DUMMY_1 of JUNK is
signal Y : bit := '0';
begin
process
variable X : bit := '0';
begin
wait for 10 ns;
X := '1';
Y <= X;
wait for 10 ns;
-- What is Y at this point? '1'
...
end process;
end DUMMY_1;
Properties of signals
Signals represent a time-ordered list of values
denoting past, present and future values.
This time history of a signal is called a waveform.
A value/time pair (v, t) is called a transaction.
If a transaction changes value of a signal, it is
called an event.