# VHDL Coding Exercise 4: FIR Filter

## Where to start?

Design flow, with feedback from later steps to earlier ones (design-space exploration):
1. Algorithm
2. Architecture
3. Optimization
4. RTL block diagram
5. VHDL code

## Algorithm

High-level system diagram: x(k) → FIR → y(k)

Context of the design:
- Inputs and outputs
- Throughput/rates
- Algorithmic requirements

Algorithm description:
- Mathematical description: $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$
- Performance criteria: accuracy
- Optimization constraints
- Implementation constraints: area, speed
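As a worked example, take N = 2 with the illustrative (assumed) coefficients $b_0 = 1$, $b_1 = 2$, $b_2 = 1$ and the samples $x(k) = 3$, $x(k-1) = 1$, $x(k-2) = 2$. Expanding the sum term by term:

$$y(k) = b_0\,x(k) + b_1\,x(k-1) + b_2\,x(k-2) = 1\cdot 3 + 2\cdot 1 + 1\cdot 2 = 7$$

Each output sample therefore needs N + 1 multiplications and N additions.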

## Architecture (1)

Isomorphic architecture: a straightforward implementation of the algorithm.

[Figure: tapped delay line on x(k) with coefficients b0, b1, b2, …, bN-2, bN-1, bN; the weighted taps are summed into y(k).]
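A minimal VHDL sketch of the isomorphic architecture, assuming N = 2 with hypothetical coefficients b = (1, 2, 1), signed arithmetic from `ieee.numeric_std`, and port names (`XxDI`, `YxDO`, an active-low asynchronous reset `RSTxRBI`) that only follow the document's suffix convention. The delay line holds x(k-1) … x(k-N); the multipliers and the adder chain are purely combinatorial, so the critical path grows with N.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_direct is
  generic (WIDTH : natural := 8);                    -- width of x(k) and b_i
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                         -- asynchronous reset, active low
    XxDI    : in  signed(WIDTH-1 downto 0);          -- x(k)
    YxDO    : out signed(2*WIDTH+3 downto 0));       -- y(k), 4 guard bits
end entity;

architecture rtl of fir_direct is
  type coeff_array is array (natural range <>) of integer;
  constant COEFFS    : coeff_array := (1, 2, 1);     -- hypothetical b0, b1, b2
  constant N         : natural := COEFFS'length - 1; -- filter order
  constant OUT_WIDTH : natural := 2*WIDTH + 4;
  type sample_array is array (1 to N) of signed(WIDTH-1 downto 0);
  signal DelayxDP, DelayxDN : sample_array;          -- DelayxDP(i) = x(k-i)
begin
  -- next state of the delay line: shift the new sample in
  DelayxDN(1) <= XxDI;
  shift : for i in 2 to N generate
    DelayxDN(i) <= DelayxDP(i-1);
  end generate;

  -- delay-line flip-flops: only clock and reset in the sensitivity list
  regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DelayxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      DelayxDP <= DelayxDN;
    end if;
  end process;

  -- combinatorial multipliers and adder chain: b0*x(k) + b1*x(k-1) + ... + bN*x(k-N)
  adder_chain : process (XxDI, DelayxDP)
    variable Acc : signed(OUT_WIDTH-1 downto 0);     -- intermediate result only
  begin
    Acc := resize(XxDI * to_signed(COEFFS(0), WIDTH), OUT_WIDTH);
    for i in 1 to N loop
      Acc := Acc + resize(DelayxDP(i) * to_signed(COEFFS(i), WIDTH), OUT_WIDTH);
    end loop;
    YxDO <= Acc;
  end process;
end architecture;
```

With the input sequence 3, 1, 2 the outputs are 3, 7 and 7, matching the hand calculation of the sum.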

## Architecture (2)

Pipelining/retiming: improve timing.

[Figure: the isomorphic architecture (input x(k), coefficients b0 … bN, output y(k)) with additional registers.]

Insert register(s) at the inputs or outputs. This increases the latency.


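Perform retiming: move registers through the logic without changing the functionality.
- Backwards: from the outputs of a logic block to its inputs.
- Forward: from the inputs of a logic block to its outputs.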
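A minimal way to see why retiming preserves the function, using a single adder with inputs a(k), b(k) and output s(k) = a(k) + b(k): registers on both inputs delay each operand by one sample, which is the same as delaying the sum by one sample:

$$a(k-1) + b(k-1) = s(k-1)$$

Read from left to right, this is forward retiming (two input registers become one output register); read from right to left, it is backward retiming. The same holds for a multiplication by a constant coefficient: $b_i\,x(k-1) = p(k-1)$ with $p(k) = b_i\,x(k)$.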


## Architecture (3)

Retiming and simple transformation: optimization.

[Figure: filter with input x(k), coefficients b0 … bN and output y(k).]

Reverse the adder chain, then perform retiming.
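Reversing the adder chain and retiming the registers yields what is commonly called the transposed form: every multiplier sees the same input x(k), and a register sits between neighbouring adders. A minimal sketch, assuming N = 2 with hypothetical coefficients b = (1, 2, 1), `ieee.numeric_std` signed arithmetic, an active-low asynchronous reset `RSTxRBI`, and port names that only follow the document's suffix convention.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_transposed is
  generic (WIDTH : natural := 8);                    -- width of x(k) and b_i
  port (
    CLKxCI  : in  std_logic;
    RSTxRBI : in  std_logic;                         -- asynchronous reset, active low
    XxDI    : in  signed(WIDTH-1 downto 0);          -- x(k)
    YxDO    : out signed(2*WIDTH+3 downto 0));       -- y(k), 4 guard bits
end entity;

architecture rtl of fir_transposed is
  type coeff_array is array (natural range <>) of integer;
  constant COEFFS    : coeff_array := (1, 2, 1);     -- hypothetical b0, b1, b2
  constant N         : natural := COEFFS'length - 1;
  constant OUT_WIDTH : natural := 2*WIDTH + 4;
  subtype acc_t is signed(OUT_WIDTH-1 downto 0);
  type acc_array is array (1 to N) of acc_t;
  signal SumxDP, SumxDN : acc_array;                 -- partial sums between the taps

  -- b_i * x(k), sign-extended to the accumulator width
  function product (i : natural; x : signed) return acc_t is
  begin
    return resize(x * to_signed(COEFFS(i), x'length), OUT_WIDTH);
  end function;
begin
  -- reversed adder chain: SumxDP(i) = b_i*x(k-1) + SumxDP(i+1) delayed by one sample
  next_sums : process (XxDI, SumxDP)
  begin
    SumxDN(N) <= product(N, XxDI);
    for i in 1 to N-1 loop
      SumxDN(i) <= product(i, XxDI) + SumxDP(i+1);
    end loop;
  end process;

  regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      SumxDP <= (others => (others => '0'));
    elsif rising_edge(CLKxCI) then
      SumxDP <= SumxDN;
    end if;
  end process;

  -- critical path: one multiplier and one adder, independent of N
  YxDO <= product(0, XxDI) + SumxDP(1);
end architecture;
```

For the input sequence 3, 1, 2 this produces 3, 7 and 7, the same outputs as a direct evaluation of the sum, while the longest combinatorial path no longer depends on the number of taps.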


## Architecture (4)

More pipelining: add one pipelining stage to the retimed circuit.

[Figure: retimed filter (input x(k), coefficients b0 … bN, output y(k)) with one additional pipeline stage.]

The longest path is given by the multiplier.

Unbalanced: the delay from the input to the first pipeline stage is much longer than the delay from the first to the second stage.

## Architecture (5)

More pipelining: add one pipelining stage to the retimed circuit.

[Figure: as in Architecture (4), with the pipeline registers moved into the multipliers.]

Move the pipeline registers into the multiplier: the paths between the pipeline stages become balanced, which improves the timing:

$$T_{clock} = \frac{T_{add} + T_{mult}}{2} + T_{reg}$$
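With illustrative delays, assumed here only for the arithmetic and not taken from any particular technology, $T_{mult} = 6$ ns, $T_{add} = 2$ ns and $T_{reg} = 1$ ns, the minimum clock periods of the last three architectures compare as follows:

$$\begin{aligned}
\text{Architecture (3):}\quad & T_{clock} = T_{mult} + T_{add} + T_{reg} = 9\ \text{ns}\\
\text{Architecture (4):}\quad & T_{clock} = \max(T_{mult}, T_{add}) + T_{reg} = 7\ \text{ns}\\
\text{Architecture (5):}\quad & T_{clock} = \tfrac{T_{add} + T_{mult}}{2} + T_{reg} = 5\ \text{ns}
\end{aligned}$$

Architecture (4) gains little because the multiplier stage dominates; balancing the stages in Architecture (5) removes that imbalance.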

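## Architecture (6)

Iterative decomposition: reuse hardware.

[Figure: the filter (input x(k), coefficients b0 … bN, output y(k)) and its decomposed version, in which one reused multiplier and adder process b0 … bN one after another.]

Identify regularity and reusable hardware components, then add control:
- multiplexers
- storage elements
- control

This increases the number of cycles per sample.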
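For the iterative decomposition, the reusable component is a single multiply-accumulate (MAC) unit that processes one tap per clock cycle. A minimal sketch, assuming the FSM clears the accumulator once per sample and then enables it for N + 1 cycles while the memory and the LUT supply x(k-i) and b_i; the entity and port names and the active-low reset `RSTxRBI` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_mac is
  generic (WIDTH : natural := 8);
  port (
    CLKxCI   : in  std_logic;
    RSTxRBI  : in  std_logic;                     -- asynchronous reset, active low
    ClearxSI : in  std_logic;                     -- clear before a new output sample
    EnxSI    : in  std_logic;                     -- accumulate one tap this cycle
    XxDI     : in  signed(WIDTH-1 downto 0);      -- x(k-i), e.g. from the sample memory
    BxDI     : in  signed(WIDTH-1 downto 0);      -- b_i, e.g. from the coefficient LUT
    AccxDO   : out signed(2*WIDTH+3 downto 0));   -- running sum, 4 guard bits
end entity;

architecture rtl of fir_mac is
  signal AccxDP, AccxDN : signed(2*WIDTH+3 downto 0);
begin
  -- next state: one shared multiplier and adder, reused for every tap
  next_acc : process (AccxDP, ClearxSI, EnxSI, XxDI, BxDI)
  begin
    AccxDN <= AccxDP;                             -- default: hold the sum
    if ClearxSI = '1' then
      AccxDN <= (others => '0');
    elsif EnxSI = '1' then
      AccxDN <= AccxDP + resize(XxDI * BxDI, AccxDN'length);
    end if;
  end process;

  regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      AccxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      AccxDP <= AccxDN;
    end if;
  end process;

  AccxDO <= AccxDP;
end architecture;
```

One multiplier and one adder now serve all N + 1 taps; the price is N + 1 clock cycles per output sample instead of one.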

## RTL-Design

Choose an architecture that satisfies the following constraints:
- It meets ALL timing specifications/constraints: throughput and latency.
- It consumes the smallest possible area.
- It requires the least possible power.
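As a rough throughput check for the iterative decomposition of Architecture (6), assume one multiply-accumulate per clock cycle plus $c$ cycles of control overhead per sample ($c$ is an assumption that depends on the FSM, e.g. one cycle each for IDLE, NEW DATA and DATA OUT). The clock frequency must then satisfy

$$f_{clk} \ge (N + 1 + c)\, f_s$$

For example, with N = 15, c = 3 and a sample rate $f_s = 48$ kHz (illustrative values), $f_{clk} \ge 19 \cdot 48\ \text{kHz} = 912\ \text{kHz}$.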

Iterative decomposition: decide which additional functions are needed and how they can be implemented efficiently:
- Storage of the samples x(k) => MEMORY
- Storage of the coefficients b_i => LUT
- Address generators for the MEMORY and the LUT => COUNTERS
- Control => FSM
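A sketch of the two storage functions, assuming N = 2 (so a 2-bit address suffices), hypothetical coefficients b = (1, 2, 1), and hypothetical entity and port names. The LUT is a constant table; the sample memory is a small register file with synchronous write and combinatorial read, and, like most RAMs, it has no reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Coefficient LUT: b_i as a constant table
entity coeff_lut is
  generic (WIDTH : natural := 8);
  port (
    AddrxDI : in  unsigned(1 downto 0);             -- tap index i
    BxDO    : out signed(WIDTH-1 downto 0));        -- b_i
end entity;

architecture rtl of coeff_lut is
  type coeff_array is array (0 to 3) of integer;
  constant COEFFS : coeff_array := (1, 2, 1, 0);    -- hypothetical b0..b2, entry 3 unused
begin
  BxDO <= to_signed(COEFFS(to_integer(AddrxDI)), WIDTH);
end architecture;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sample memory: synchronous write, combinatorial read
entity sample_mem is
  generic (WIDTH : natural := 8);
  port (
    CLKxCI    : in  std_logic;
    WrEnxSI   : in  std_logic;
    WrAddrxDI : in  unsigned(1 downto 0);
    RdAddrxDI : in  unsigned(1 downto 0);
    WrDataxDI : in  signed(WIDTH-1 downto 0);
    RdDataxDO : out signed(WIDTH-1 downto 0));
end entity;

architecture rtl of sample_mem is
  type mem_array is array (0 to 3) of signed(WIDTH-1 downto 0);
  signal MemxD : mem_array := (others => (others => '0'));
begin
  -- synchronous write: only the clock in the sensitivity list (no reset for RAM)
  mem_write : process (CLKxCI)
  begin
    if rising_edge(CLKxCI) then
      if WrEnxSI = '1' then
        MemxD(to_integer(WrAddrxDI)) <= WrDataxDI;
      end if;
    end if;
  end process;

  -- combinatorial read
  RdDataxDO <= MemxD(to_integer(RdAddrxDI));
end architecture;
```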

## RTL-Design

RTL block diagram:
- Datapath: computes $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$ from the input x(k) and the coefficients b0 … bN.
- FSM: interface protocols and datapath control.


## RTL-Design

How it works, computing $y(k) = \sum_{i=0}^{N} b_i\, x(k-i)$:
- IDLE: wait for a new sample; store it to the input register.
- NEW DATA: store the new sample to memory.
- RUN: compute the sum; store the result to the output register.
- DATA OUT: output the result and wait for ACK.


## Translation into VHDL

Some basic VHDL building blocks:

Signal assignments:
- Outside a process: concurrent assignments (signals AxD, BxD, YxD).
- Within a process (sequential execution): the statements execute in order, and the last assignment is kept when the process terminates.
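A minimal example of both kinds of assignment; the port names extend the slide's AxD, BxD and YxD with an I/O suffix and are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    AxDI  : in  std_logic;
    BxDI  : in  std_logic;
    Y1xDO : out std_logic;
    Y2xDO : out std_logic);
end entity;

architecture rtl of assign_demo is
begin
  -- outside a process: a concurrent assignment, re-evaluated whenever AxDI or BxDI changes
  Y1xDO <= AxDI and BxDI;

  -- within a process: statements run in order and only the last assignment to Y2xDO counts
  comb : process (AxDI, BxDI)
  begin
    Y2xDO <= AxDI;             -- overridden by the next line
    Y2xDO <= AxDI or BxDI;     -- this value is kept when the process suspends
  end process;
end architecture;
```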

## Translation into VHDL

Some basic VHDL building blocks:
- Multiplexer (data inputs AxD, BxD, CxD, select SELxS, output YxD), written with a default assignment.
- Conditional statements (data inputs AxD, BxD, CxD, DxD, selects SelAxS and SelBxS, output OUTxD, state STATExDP).
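A minimal sketch of both building blocks, with port names adapted from the slide's signals (the exact decoding is an assumption): the multiplexer starts with a default assignment that the case statement may override, and the if/elsif chain ends with an else, so every output is assigned on every path.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
  port (
    AxDI, BxDI, CxDI, DxDI : in  std_logic_vector(7 downto 0);
    SELxSI                 : in  std_logic_vector(1 downto 0);
    SelAxSI, SelBxSI       : in  std_logic;
    YxDO                   : out std_logic_vector(7 downto 0);
    OUTxDO                 : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of mux_demo is
begin
  -- multiplexer: the default assignment covers every SELxSI value not listed
  mux : process (AxDI, BxDI, CxDI, SELxSI)
  begin
    YxDO <= AxDI;                    -- default assignment
    case SELxSI is
      when "01"   => YxDO <= BxDI;
      when "10"   => YxDO <= CxDI;
      when others => null;           -- keep the default
    end case;
  end process;

  -- conditional statements: priority if/elsif, closed by an else
  cond : process (AxDI, BxDI, DxDI, SelAxSI, SelBxSI)
  begin
    if SelAxSI = '1' then
      OUTxDO <= AxDI;
    elsif SelBxSI = '1' then
      OUTxDO <= BxDI;
    else
      OUTxDO <= DxDI;
    end if;
  end process;
end architecture;
```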

## Translation into VHDL

Common mistakes with conditional statements. Example with AxD, BxD, SelAxS, SelBxS, OUTxD and STATExDP:
- NO default assignment
- NO else statement

In the cases where nothing is assigned, OUTxD has to remember its old value, so synthesis infers a latch. ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE!!! => Use flip-flops!!!

## Translation into VHDL

Some basic VHDL building blocks:
- Register: the next state DataREGxDN is stored into the present state DataREGxDP.
- Register with ENABLE: DataREGxDN is stored into DataREGxDP only when the register is enabled.
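A minimal sketch of a plain register and a register with enable, assuming an active-low asynchronous reset `RSTxRBI` and hypothetical port names. The enable is a multiplexer in front of the flip-flops that feeds back DataREGxDP when it is low; the clock is never touched. This is also how a value is kept: in a flip-flop, not by assigning nothing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_demo is
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;                      -- asynchronous reset, active low
    DataRegENxSI : in  std_logic;                      -- enable
    DataxDI      : in  std_logic_vector(7 downto 0);
    PlainxDO     : out std_logic_vector(7 downto 0);   -- plain register
    DataREGxDO   : out std_logic_vector(7 downto 0));  -- register with enable
end entity;

architecture rtl of reg_demo is
  signal PlainxDP                 : std_logic_vector(7 downto 0);
  signal DataREGxDP, DataREGxDN   : std_logic_vector(7 downto 0);
begin
  -- enable as a multiplexer: load new data or feed back the present state
  DataREGxDN <= DataxDI when DataRegENxSI = '1' else DataREGxDP;

  -- only clock and reset in the sensitivity list
  regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      PlainxDP   <= (others => '0');
      DataREGxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      PlainxDP   <= DataxDI;       -- loads on every rising edge
      DataREGxDP <= DataREGxDN;    -- loads only when enabled
    end if;
  end process;

  PlainxDO   <= PlainxDP;
  DataREGxDO <= DataREGxDP;
end architecture;
```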

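## Translation into VHDL

Common mistakes with sequential processes (signals DataREGxDN, DataREGxDP, CLKxCI, DataRegENxS):
- Correct: the enable DataRegENxS controls a multiplexer in front of the flip-flop; with 0 it feeds back the present value DataREGxDP, with 1 it loads the new value. The flip-flop is clocked by CLKxCI.
- Wrong: combining CLKxCI and DataRegENxS in logic to clock DataREGxDP.

Clocks are NEVER generated within any logic. Gated clocks are more complicated than this. Avoid them!!!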

## Translation into VHDL

Some basic rules:

Sequential processes (flip-flops):
- Only CLOCK and RESET in the sensitivity list.
- Logic signals are NEVER used as clock signals.

Combinatorial processes:
- Multiple assignments to the same signal are ONLY possible within the same process => ONLY the last assignment is valid.
- Something must be assigned to each signal in every case, OR there MUST be an ELSE for every IF statement.

More rules that help to avoid problems and surprises:
- Use separate signals for the PRESENT state and the NEXT state of every flip-flop in your design.
- Use variables ONLY to store intermediate results, or even avoid them whenever possible in an RTL design.

## Translation into VHDL

Write the ENTITY definition of your design to specify:
Inputs, Outputs and Generics
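A possible ENTITY for the iteratively decomposed filter, as a sketch: the generics, the valid/acknowledge handshake and all port names are assumptions (the handshake mirrors the FSM steps "wait for new sample" and "wait for ACK"). The architecture is only a placeholder that drives the outputs to constants so that the skeleton compiles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_filter is
  generic (
    DATA_WIDTH : natural := 16;    -- width of x(k) and of the coefficients b_i
    ORDER      : natural := 15);   -- N: the filter has N + 1 taps
  port (
    -- global signals
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;   -- asynchronous reset, active low
    -- input side: a new sample x(k) is announced by InValidxSI
    InValidxSI  : in  std_logic;
    XxDI        : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    -- output side: y(k) is held until the consumer asserts OutAckxSI
    OutValidxSO : out std_logic;
    OutAckxSI   : in  std_logic;
    -- 2*DATA_WIDTH product bits + 4 guard bits (enough for up to 16 taps)
    YxDO        : out std_logic_vector(2*DATA_WIDTH+3 downto 0));
end entity;

architecture rtl of fir_filter is
begin
  -- placeholder: the memory, LUT, counters, datapath and FSM are described here
  OutValidxSO <= '0';
  YxDO        <= (others => '0');
end architecture;
```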

## Translation into VHDL

Describe the functional units in your block diagram one after another in the architecture section, for example the sample memory, the coefficient LUT and the address counters.
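As an example of a functional unit, a sketch of an address counter in the same style (separate present and next state, default assignment in the combinatorial process). The clear and enable inputs, the active-low reset `RSTxRBI` and all names are assumptions; the counter wraps modulo 2^ADDR_WIDTH, which suits a circular sample buffer when N + 1 is a power of two.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_counter is
  generic (ADDR_WIDTH : natural := 4);
  port (
    CLKxCI   : in  std_logic;
    RSTxRBI  : in  std_logic;                       -- asynchronous reset, active low
    ClearxSI : in  std_logic;                       -- synchronous clear
    EnxSI    : in  std_logic;                       -- count one step
    CntxDO   : out unsigned(ADDR_WIDTH-1 downto 0));
end entity;

architecture rtl of addr_counter is
  signal CntxDP, CntxDN : unsigned(ADDR_WIDTH-1 downto 0);
begin
  -- next state: clear has priority over counting
  next_cnt : process (CntxDP, ClearxSI, EnxSI)
  begin
    CntxDN <= CntxDP;                               -- default: hold
    if ClearxSI = '1' then
      CntxDN <= (others => '0');
    elsif EnxSI = '1' then
      CntxDN <= CntxDP + 1;                         -- wraps around modulo 2**ADDR_WIDTH
    end if;
  end process;

  regs : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      CntxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      CntxDP <= CntxDN;
    end if;
  end process;

  CntxDO <= CntxDP;
end architecture;
```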

## Translation into VHDL

The FSM is described with one sequential process and one combinatorial process. Here it is a MEALY machine: its outputs depend on the present state and on the inputs.
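A sketch of the FIR control FSM with the states IDLE, NEW DATA, RUN and DATA OUT from the RTL design. All port names are hypothetical, `LastTapxSI` is assumed to come from the tap counter, and, as a simplification, the accumulator itself is assumed to hold y(k) during DATA OUT (a separate output register would need one more enable). `InRegEnxSO` is a Mealy output: in IDLE it depends directly on `InValidxSI`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fir_fsm is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;   -- asynchronous reset, active low
    InValidxSI  : in  std_logic;   -- a new sample is available
    LastTapxSI  : in  std_logic;   -- the tap counter has reached N
    OutAckxSI   : in  std_logic;   -- the consumer has taken the result
    InRegEnxSO  : out std_logic;   -- store the sample to the input register
    MemWrEnxSO  : out std_logic;   -- store the sample to memory
    AccClrxSO   : out std_logic;   -- clear the accumulator
    MacEnxSO    : out std_logic;   -- accumulate one tap
    OutValidxSO : out std_logic);  -- the result is valid
end entity;

architecture rtl of fir_fsm is
  type state_t is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal STATExDP, STATExDN : state_t;
begin
  -- sequential process: only the clock and the asynchronous reset
  fsm_seq : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      STATExDP <= IDLE;
    elsif rising_edge(CLKxCI) then
      STATExDP <= STATExDN;
    end if;
  end process;

  -- combinatorial process: next state and outputs
  fsm_comb : process (STATExDP, InValidxSI, LastTapxSI, OutAckxSI)
  begin
    -- default assignments: no latches, stay in the present state
    STATExDN    <= STATExDP;
    InRegEnxSO  <= '0';
    MemWrEnxSO  <= '0';
    AccClrxSO   <= '0';
    MacEnxSO    <= '0';
    OutValidxSO <= '0';

    case STATExDP is
      when IDLE =>                   -- wait for a new sample
        if InValidxSI = '1' then
          InRegEnxSO <= '1';         -- Mealy: depends on the input, not only on the state
          STATExDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>               -- store the new sample to memory
        MemWrEnxSO <= '1';
        AccClrxSO  <= '1';
        STATExDN   <= RUN;
      when RUN =>                    -- one multiply-accumulate per cycle
        MacEnxSO <= '1';
        if LastTapxSI = '1' then
          STATExDN <= DATA_OUT;
        end if;
      when DATA_OUT =>               -- output the result, wait for ACK
        OutValidxSO <= '1';
        if OutAckxSI = '1' then
          STATExDN <= IDLE;
        end if;
    end case;
  end process;
end architecture;
```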

## Translation into VHDL

Complete and check the code:
- Declare the signals and components.
- Check and complete the sensitivity lists of ALL combinatorial processes with ALL signals that are:
  - used as a condition in any IF or CASE statement,
  - assigned to any other signal,
  - used in any operation with any other signal.
- Check that the sensitivity lists of ALL sequential processes contain ONLY one global clock and one global asynchronous reset signal, and no other signals.

## Other Good Ideas

- Keep things simple.
- Partition the design (Divide et Impera). Example: to start processing the next sample while the previous result is still waiting in the output register, just add a FIFO at the output of your filter.
- Do NOT try to optimize each gate or flip-flop.
- Do not try to save cycles if it is not necessary.

VHDL code:
- is usually long, and that is good!
- is just a representation of your block diagram.
- does not mind hierarchy.
