Objectives
After completing this module, you will be able to:
- Specify Xilinx resources that may need to be instantiated
- Identify some basic design guidelines that successful FPGA designers follow
- Select a proper HDL coding style for fast, efficient circuits
Note: The guidelines in this module are not specific to any particular synthesis tool or Xilinx FPGA family
Outline
- Achieving Breakthrough Performance
- Basic Design Guidelines
- Coding for Combinatorial Logic
- Register Inference
- Finite State Machine Design
- Pipelining
- Summary
Breakthrough Performance
1. Utilize embedded (dedicated) resources
  - Dedicated resources are faster than a LUT/flip-flop implementation and consume less power
  - Typically built with the CORE Generator tool and instantiated
  - DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example
2. Write code for performance
  - Use synchronous design methodology
  - Ensure the code is written optimally for critical paths
  - Pipeline
3. Drive your synthesis tool
  - Try different optimization techniques
  - Add critical timing constraints in synthesis
  - Preserve hierarchy
  - Apply full and correct constraints
  - Use High effort
2008 Xilinx, Inc. All Rights Reserved
Dedicated resources offer as much as 3x the performance of soft implementations and use less power. Examples:
- FIFO at 500 MHz
- DSP slices at 500 MHz
- PowerPC processor at 702 DMIPS
Timing Closure
Basic Design Guidelines
Instantiate a component when you must dictate exactly which resource is needed:
- The synthesis tool is unable to infer the resource
- The synthesis tool fails to infer the resource
Otherwise, prefer inference: it makes your code more portable.
Xilinx recommends using the CORE Generator software to create functions such as Arithmetic Logic Units (ALUs), fast multipliers, and Finite Impulse Response (FIR) filters for instantiation.
Xilinx recommends using the Architecture Wizard utility to create DCM, PLL, and BUFG instantiations.
FPGA Resources
Generally inferred by synthesis tools:
- Shift register LUT (SRL16/SRLC16)
- F5, F6, F7, and F8 multiplexers
- Carry logic
- MULT_AND
- Multipliers and counters using the DSP48
- Global clock buffers (BUFG)
- SelectIO (single-ended) interface
- I/O registers (single data rate)
- Input DDR registers
Inferred by some synthesis tools:
- Memories
- Global clock buffers (BUFGCE, BUFGMUX, BUFGDLL)
- Some complex DSP functions
Generally must be instantiated:
- SelectIO (differential) interface
- Output DDR registers
- DCM / PLL
- Local clock buffers (BUFIO, BUFR)
Suggested Instantiation
Memory resources:
- Block RAMs specifically (use the CORE Generator software to build large memories)
Clocking resources:
- DCM, PMCD (use the Architecture Wizard)
- IBUFG, BUFGMUX, BUFGCE
- BUFIO, BUFR
Suggested Instantiation
Benefits of isolating the Xilinx-specific instantiations:
- Easier to port your HDL to other and newer technologies
- Fewer synthesis constraints and attributes to pass on
- Keeping most of the attributes and constraints in the Xilinx User Constraints File (UCF) keeps it simple: one file contains critical information
Above the top-level block, create a Xilinx wrapper with instantiations specific to Xilinx.
[Figure: Xilinx wrapper surrounding the top-level block, containing STARTUP, IBUFG, IBUF_SSTL2_I, and OBUF_GTL instantiations]
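A minimal sketch of such a wrapper, assuming a hypothetical portable core `my_core` with a single clock pin. `IBUFG` and `BUFG` are Xilinx clock-buffer primitives (ports `I` and `O`); all other names, and the placeholder logic inside the core, are illustrative only. The core holds only portable RTL, so retargeting the design means replacing the wrapper, not editing the core.

```verilog
// Hypothetical portable core: plain RTL, no vendor primitives.
module my_core (
    input  wire       clk,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    // Placeholder logic (illustrative): registered increment.
    always @(posedge clk)
        dout <= din + 8'd1;
endmodule

// Xilinx-specific wrapper placed above the top-level block.
// Only this module changes when the design moves to another technology.
module my_core_xilinx (
    input  wire       clk_pad,   // clock pin
    input  wire [7:0] din,
    output wire [7:0] dout
);
    wire clk_ibufg;  // clock after the dedicated clock input buffer
    wire clk_g;      // clock on the global clock network

    IBUFG u_ibufg (.I(clk_pad),   .O(clk_ibufg));
    BUFG  u_bufg  (.I(clk_ibufg), .O(clk_g));

    my_core u_core (.clk(clk_g), .din(din), .dout(dout));
endmodule
```

Keeping the primitives out of `my_core` also keeps the synthesis attributes they may need out of the portable code, in line with the UCF advice above.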
Hierarchy Management
Synthesis tools offer two basic choices:
- Flatten the design: allows total combinatorial optimization across all boundaries
- Maintain hierarchy: preserves hierarchy without allowing optimization of combinatorial logic across boundaries
Recommendation:
- If you have followed the synchronous design guidelines, use the setting -maintain hierarchy
- If you have not followed the synchronous design guidelines, use the setting -flatten the design
- Your synthesis tool may have additional settings
Advantages of maintaining hierarchy:
- Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
- Enables floorplanning and incremental design flow
The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries.
If the outputs of leaf-level blocks are registered, there is generally no need to flatten
Coding for Combinatorial Logic
Multiplexers
Multiplexers are generated from IF/THEN and CASE statements:
- IF/THEN statements generate priority encoders
- Use a CASE statement to generate complex encoding
Issues to consider:
- Delay and size: affected by the number of inputs and the number of nested clauses in an IF/THEN or CASE statement
- Unintended latches or clock enables: generated when IF/THEN or CASE statements do not cover all conditions. Review your synthesis tool warnings, and check by looking at the component with a schematic viewer
Priority Encoder
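Most critical input listed first; least critical input listed last:
IF (crit_sig) THEN
  oput <= do_d;
ELSIF cond_a THEN
  oput <= do_a;
ELSIF cond_b THEN
  oput <= do_b;
ELSIF cond_c THEN
  oput <= do_c;
ELSE
  oput <= do_e;
END IF;
[Figure: chain of 2:1 multiplexers driving oput. cond_c selects between do_c and do_e at the start of the chain, cond_b and cond_a add do_b and do_a, and crit_sig selects do_d at the last multiplexer, closest to oput]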
- Nested IF or IF/THEN/ELSE statements form priority encoders
- CASE statements do not have priority
- If nested IF statements are necessary, put critical input signals on the first IF statement
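For readers working in Verilog, here is a sketch of the same priority structure as the VHDL priority-encoder example, wrapped in a hypothetical module `prio_mux` and assuming all signals are single bits. Because `crit_sig` is tested first, it controls the multiplexer nearest the output and passes through the fewest logic levels.

```verilog
// Priority multiplexer: conceptually, each "else if" adds one 2:1 mux level.
// crit_sig is tested first, so it has the shortest path to oput.
module prio_mux (
    input  wire crit_sig, cond_a, cond_b, cond_c,
    input  wire do_a, do_b, do_c, do_d, do_e,
    output reg  oput
);
    always @* begin
        if (crit_sig)    oput = do_d;
        else if (cond_a) oput = do_a;
        else if (cond_b) oput = do_b;
        else if (cond_c) oput = do_c;
        else             oput = do_e;   // final else: every path assigns oput, no latch
    end
endmodule
```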
CASE Statements
In combinatorial processes or always blocks, latches are inferred if outputs are not defined in all branches. Use default assignments before the CASE statement to prevent latches.
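A minimal sketch of that rule in a combinatorial always block, using a hypothetical 2-bit opcode decoder named `op_decode`. Because `rd` and `wr` receive default values before the `case`, every path through the block assigns them, so no latch is inferred for the opcodes the `case` does not list.

```verilog
// Combinatorial decoder: every output gets a default value first,
// so no path through the case leaves an output unassigned (no latches).
module op_decode (
    input  wire [1:0] opcode,
    output reg        rd,
    output reg        wr
);
    always @* begin
        rd = 1'b0;          // default assignments
        wr = 1'b0;
        case (opcode)
            2'b01: rd = 1'b1;
            2'b10: wr = 1'b1;
            // 2'b00 and 2'b11 fall through to the defaults
        endcase
    end
endmodule
```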
In clocked processes or always blocks, clock enables are inferred if outputs are not defined in all branches. This is not wrong, but it might generate a long clock enable equation. Use default assignments before the CASE statement to prevent clock enables.
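The clocked counterpart, again as an illustrative sketch with hypothetical names (`pulse_gen`, `cmd`, `start_pulse`, `stop_pulse`). The default nonblocking assignments before the `case` give each output a defined next value on every clock edge, so each output is a plain flip-flop fed by decode logic. Without them, the outputs would hold their values in the unlisted branches, and synthesis would typically infer a clock enable for each one.

```verilog
module pulse_gen (
    input  wire       clk,
    input  wire [1:0] cmd,
    output reg        start_pulse,
    output reg        stop_pulse
);
    always @(posedge clk) begin
        start_pulse <= 1'b0;   // defaults: outputs are defined on every clock
        stop_pulse  <= 1'b0;
        case (cmd)
            2'b01:   start_pulse <= 1'b1;   // last nonblocking assignment wins
            2'b10:   stop_pulse  <= 1'b1;
            default: ;                      // keep the defaults
        endcase
    end
endmodule
```

A side effect of this style is that each output becomes a one-cycle pulse rather than a held level, which is usually what such control signals need.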
CASE Statements
- Can reduce the number of logic levels between flip-flops
- Eliminating the select decoding can improve performance
Determine how your synthesis tool synthesizes the order of the select lines
If there is a critical select input, this input should be included last in the logic for fastest performance
CASE Statement
This Verilog code describes a 6:1 multiplexer with binary-encoded select inputs
module case_binary (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
  input clock;
  input [2:0] sel;
  input in_a, in_b, in_c, in_d, in_e, in_f;
  output data_out;
  reg data_out;
  always @(posedge clock)
  begin
    case (sel)
      3'b000 : data_out <= in_a;
      3'b001 : data_out <= in_b;
      3'b010 : data_out <= in_c;
      3'b011 : data_out <= in_d;
      3'b100 : data_out <= in_e;
      3'b101 : data_out <= in_f;
      default : data_out <= 1'bx;
    endcase
  end
endmodule
The advantage of using a don't care (1'bx) for the default is that the synthesizer has more flexibility to create a smaller, faster circuit.
How could the code be changed to use one-hot select inputs?
CASE Statement
module case_onehot (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
  input clock;
  input [5:0] sel;
  input in_a, in_b, in_c, in_d, in_e, in_f;
  output data_out;
  reg data_out;
  always @(posedge clock)
  begin
    case (sel)
      6'b000001 : data_out <= in_a;
      6'b000010 : data_out <= in_b;
      6'b000100 : data_out <= in_c;
      6'b001000 : data_out <= in_d;
      6'b010000 : data_out <= in_e;
      6'b100000 : data_out <= in_f;
      default : data_out <= 1'bx;
    endcase
  end
endmodule
This synthesized to 8 LUTs and 1 F5 mux; the longest path is 2 LUTs + 1 F5. This yields no benefit for a 6:1 mux, but as the multiplexer gets larger the benefit becomes significant.
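The one-hot `case` still compares all six select bits in each branch. An alternative coding, not the one on the slide, uses the one-hot property directly: AND each input with its select bit and OR the results, so there is no select decoding at all. This is a sketch under the assumption that `sel` is always one-hot; the module name `onehot_andor` is hypothetical. If several select bits were set at once, the output would be the OR of the selected inputs, which is acceptable only because such codes are don't-cares here.

```verilog
module onehot_andor (
    input  wire       clock,
    input  wire [5:0] sel,    // assumed one-hot: exactly one bit set
    input  wire       in_a, in_b, in_c, in_d, in_e, in_f,
    output reg        data_out
);
    // Bit i of sel gates input i (sel[0] -> in_a, ..., sel[5] -> in_f),
    // matching the case items of case_onehot.
    wire [5:0] inputs = {in_f, in_e, in_d, in_c, in_b, in_a};

    // Reduction OR merges the gated inputs; no select decoding needed.
    always @(posedge clock)
        data_out <= |(sel & inputs);
endmodule
```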
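Synthesis tools may not produce optimal results for long arithmetic expressions, so order and group the operators explicitly:
A <= B + C + D + E; should be: A <= (B + C) + (D + E)
Written without parentheses, the expression evaluates left to right as a chain of three adders; the grouped form computes B + C and D + E in parallel, leaving only two adder levels.

Why use synchronous design?
- Better system control
- Not using a synchronous element will not save silicon, and it wastes money
- Asynchronous design wastes performance: it reduces the capability of end products, and higher speed grades cost more
- Asynchronous design leads to difficult timing specifications and tool-effort levels, and to problems with probability, race conditions, temperature, and process effects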
Register Inference
Ex 3. D flip-flop with asynchronous reset
always @(posedge CLOCK or posedge RESET)
  if (RESET)
    Q <= 0;
  else
    Q <= D_IN;
Clock Enables
Verilog
always @(posedge CLOCK)
  if (ENABLE)
    Q <= D_IN;
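Putting the asynchronous-reset and clock-enable examples together, here is a sketch of a complete register module; the module name `dff_ce_ar` is hypothetical, and the port names follow the slide examples. Nonblocking assignments (`<=`) are used, as is conventional for sequential logic, so that several registers updated on the same edge do not depend on statement order.

```verilog
// D flip-flop with asynchronous reset and clock enable.
module dff_ce_ar (
    input  wire CLOCK,
    input  wire RESET,   // active-high asynchronous reset
    input  wire ENABLE,  // clock enable
    input  wire D_IN,
    output reg  Q
);
    always @(posedge CLOCK or posedge RESET)
        if (RESET)
            Q <= 1'b0;       // takes effect immediately, no clock edge needed
        else if (ENABLE)
            Q <= D_IN;       // load only when enabled; otherwise hold
endmodule
```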
Finite State Machine Design
The state register can be included in the same process or always block as the next-state logic, or in a separate process or always block.
- Inputs: input signals and state jumps
- Outputs: output states, and control and enable signals to the rest of the design
- NO arithmetic logic, datapaths, or combinatorial functions inside the state machine
[Figure: FSM block diagram. Input signals and the current state (fed back from the state register to drive state jumps) feed the next-state logic, which loads the state register]
Most synthesis tools have commands to extract and re-encode state machines described in this way.
- One-hot encoding uses more registers but simplifies next-state logic
- Examine trade-offs: Gray and Johnson encoding styles can also improve performance
- Refer to the documentation of your synthesis tool to determine how it chooses the default encoding scheme
Binary
- Smallest (fewest registers)
- Complex FSMs tend to build multiple levels of logic (slow)
- Synthesis tools usually map to this encoding when the FSM has eight or fewer states
One-hot
- Largest (more registers), but simplifies next-state logic (fast)
- Synthesis tools usually map to this encoding when the FSM has between 8 and 16 states
- Always evaluate undefined states (you may need to cover your undefined states)
Gray
- Efficient size and can have good speed
Which is best?
- Depends on the number of states, inputs to the FSM, and complexity of transitions
- Build your FSM, then synthesize it with each encoding and compare size and speed
always @(posedge clock or posedge reset)
begin
  if (reset)
    current_state <= s0;
  else
    current_state <= next_state;
end
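A small sketch of the recommended structure, assuming a hypothetical three-state handshake controller `hs_fsm` with inputs `req` and `done`. It uses the state-register block shown above, one `case` for the next-state logic, and a separate block that decodes the outputs from the state alone. The state codes are `localparam`s written here as one-hot (an assumption, not a recommendation for every FSM); binary or Gray codes also fit in the 3-bit register, so comparing encodings only means editing the three constants.

```verilog
module hs_fsm (
    input  wire clock,
    input  wire reset,
    input  wire req,
    input  wire done,
    output reg  busy,
    output reg  ack
);
    // One-hot state codes (assumed encoding).
    localparam [2:0] s0 = 3'b001,   // idle
                     s1 = 3'b010,   // run
                     s2 = 3'b100;   // acknowledge

    reg [2:0] current_state, next_state;

    // State register
    always @(posedge clock or posedge reset) begin
        if (reset)
            current_state <= s0;
        else
            current_state <= next_state;
    end

    // Next-state logic: one CASE; the default assignment avoids latches
    always @* begin
        next_state = current_state;
        case (current_state)
            s0: if (req)  next_state = s1;
            s1: if (done) next_state = s2;
            s2:           next_state = s0;
            default:      next_state = s0;   // recover from undefined states
        endcase
    end

    // Output logic: separate block, no datapath inside the FSM
    always @* begin
        busy = (current_state == s1);
        ack  = (current_state == s2);
    end
endmodule
```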
Pipelining
Pipelining Concept
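[Figure: left, two levels of logic between flip-flops, fMAX = n MHz; right, the same logic split by a pipeline register into one level per stage, fMAX ≈ 2n MHz]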
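A first-order timing model shows where the roughly 2x comes from. Assume, as a simplification, that each level of logic (LUT plus its routing) adds about the same delay $t_L$, and ignore clock skew; $t_{co}$ is the flip-flop clock-to-out delay and $t_{su}$ its setup time:

$$
f_{\max} = \frac{1}{t_{co} + 2t_L + t_{su}}
\qquad\longrightarrow\qquad
f_{\max,\text{pipelined}} = \frac{1}{t_{co} + t_L + t_{su}}
$$

When $t_L$ dominates, the ratio of the two approaches 2, which matches the 2n MHz figure. Because $t_{co} + t_{su}$ is paid again in every stage, the real gain is somewhat less than 2x, and each added register stage adds one clock cycle of latency.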
Pipelining
- Typically done after timing analysis
- Can easily be done for purely combinatorial components
- Usually done by the designer from the beginning
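A sketch of pipelining a purely combinatorial function, assuming a hypothetical module `add4_pipe` that computes the grouped sum (B + C) + (D + E) on 8-bit operands. Stage 1 registers the two partial sums and stage 2 registers the final sum, so each stage contains a single adder; the result appears two clock cycles after the operands.

```verilog
module add4_pipe (
    input  wire       clk,
    input  wire [7:0] b, c, d, e,
    output reg  [9:0] a          // 10 bits: sum of four 8-bit values
);
    reg [8:0] sum_bc, sum_de;    // stage-1 registers (9 bits, no overflow)

    always @(posedge clk) begin
        // Stage 1: two independent adds in parallel (balanced tree)
        sum_bc <= b + c;
        sum_de <= d + e;
        // Stage 2: combine the registered partial sums from the previous cycle
        a <= sum_bc + sum_de;
    end
endmodule
```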
Register I/O
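A sketch of registering I/O, assuming a hypothetical 8-bit equality checker `io_reg_cmp`: the inputs are captured in flip-flops as soon as they enter the block, and the result is registered before it leaves. Every path to or from the pins therefore starts or ends at a flip-flop, so the compare logic sits entirely between internal registers.

```verilog
module io_reg_cmp (
    input  wire       clk,
    input  wire [7:0] x_pin,     // raw inputs from device pins
    input  wire [7:0] y_pin,
    output reg        eq_pin     // registered output to a device pin
);
    reg [7:0] x_r, y_r;          // input registers

    always @(posedge clk) begin
        x_r    <= x_pin;         // register the inputs
        y_r    <= y_pin;
        eq_pin <= (x_r == y_r);  // compare registered values, register the result
    end
endmodule
```

The cost is two cycles of latency from pin to pin, in exchange for pin-to-register and register-to-pin paths that contain no logic.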
Performance by Design
Code A:
- Two levels of logic (connections dominate)
- May require a higher speed grade, adding cost
Code B:
- One level of logic
- Maximum time for routing of the high-fanout net
- The flip-flop adds nothing to the cost
[Figure: in Code A, switch and enable are combined in logic that drives the CE inputs of the reg_data flip-flops (data_in on D); in Code B, that combination is first registered, and the flip-flop output drives the CE inputs]
Tip: Remember that the LUT output feeds the D input to the flip-flop.
Ken Chapman (Xilinx UK) 2003
Code B forms a pipeline stage for the circuit and improves its speed.
Code A
always @(posedge clk) begin
  if (switch && enable)
    reg_data <= data_in;
end
Code B
always @(posedge clk) begin
  if (switch && enable)
    reg_enable <= 1'b1;
  else
    reg_enable <= 1'b0;
  if (reg_enable)
    reg_data <= data_in;
end
In each case, reg_data and data_in are 16-bit buses, and switch and enable are outputs from flip-flops.
Code A (VHDL)
capture: process (clk)
begin
  if clk'event and clk = '1' then
    if switch = '1' and enable = '1' then
      reg_data <= data_in;
    end if;
  end if;
end process;
Code B (VHDL)
capture: process (clk)
begin
  if clk'event and clk = '1' then
    if switch = '1' and enable = '1' then
      reg_enable <= '1';
    else
      reg_enable <= '0';
    end if;
    if reg_enable = '1' then
      reg_data <= data_in;
    end if;
  end if;
end process;
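Complete, self-contained Verilog versions of Code A and Code B, assuming hypothetical module names `capture_a` and `capture_b` and the port names used on the slide. They make the trade-off visible: in Code B the `switch && enable` decision is registered in `reg_enable`, so the 16 clock enables of `reg_data` are driven directly by a flip-flop, but data is captured one clock cycle later than in Code A. That extra cycle of latency must be acceptable to the surrounding design.

```verilog
// Code A: switch and enable are combined in logic that drives all 16 CEs.
module capture_a (
    input  wire        clk,
    input  wire        switch, enable,
    input  wire [15:0] data_in,
    output reg  [15:0] reg_data
);
    always @(posedge clk)
        if (switch && enable)
            reg_data <= data_in;
endmodule

// Code B: the decision is registered first; reg_enable alone drives the CEs.
module capture_b (
    input  wire        clk,
    input  wire        switch, enable,
    input  wire [15:0] data_in,
    output reg  [15:0] reg_data
);
    reg reg_enable = 1'b0;

    always @(posedge clk) begin
        reg_enable <= switch && enable;   // pipeline stage
        if (reg_enable)
            reg_data <= data_in;          // uses the previous cycle's decision
    end
endmodule
```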
Review Questions
1. What is the approach presented here for obtaining breakthrough performance?
2. Compare CASE and IF/THEN statements when creating multiplexers.
3. What problem occurs with nested CASE and IF/THEN statements?
Answers
1. Utilize embedded (dedicated) resources
  - Performance by construction
  - DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example
2. Write code for performance
  - Pipeline: Xilinx FPGAs have abundant registers (one register per LUT)
3. Drive your synthesis tool
  - Apply full and correct constraints
  - Utilize optional settings
  - Use High effort
Answers
- Both types of statements produce multiplexers
- CASE statements produce smaller, faster circuits in general
- IF/THEN statements are more flexible but create a priority encoder
- IF/THEN statements may be faster for late-arriving signals
- Nested CASE and IF/THEN statements can generate long delays due to cascaded functions
Summary
- Use as much of the dedicated hardware resources as possible to ensure optimum speed and device utilization
- Plan on instantiating clocking and memory resources
- Try to use the CORE Generator tool to create optimized components that target dedicated FPGA resources (BRAM, DSP48, and FIFO)
- Maintain your design hierarchy to make debugging, simulation, and report generation easier
- CASE and IF/THEN statements produce different types of multiplexers
CASE statements tend to build logic in parallel while IF/THEN statements tend to build priority encoders
Summary
- You should always build a synchronous design for your FPGA
- Inferring many types of flip-flops from HDL code is possible
- When coding a state machine, separate the next-state logic from the state machine output equations
- Evaluate whether you need to use binary, one-hot, or Gray encoding style for your FSM
- Synthesis & Simulation Design Guide: www.xilinx.com/support > Documentation > search for Synthesis & Simulation Design Guide
- User guides: www.xilinx.com/support > Documentation > search for User Guides
- XST documentation: www.xilinx.com/support > Documentation > search for XST. This guide has example inferences of many architectural resources
- Software manuals: Start > Xilinx ISE 10.1i > Documentation > Software Manuals