
Basic FPGA Architecture

This material exempt per Department of Commerce license exception TSU

Objectives
After completing this module, you will be able to:
Identify the basic architectural resources of the Virtex-II FPGA
List the differences between the Virtex-II, Virtex-II Pro, Spartan-3, and Spartan-3E devices
List the new and enhanced features of the Virtex-4 device family

Outline

Overview
Slice Resources
I/O Resources
Memory and Clocking
Spartan-3, Spartan-3E, and Virtex-II Pro Features
Virtex-4 Features
Summary
Appendix


Basic Architecture 3

Overview
All Xilinx FPGAs contain the same basic resources
Slices (grouped into CLBs)
Contain combinatorial logic and register resources

IOBs
Interface between the FPGA and the outside world

Programmable interconnect
Other resources
  Memory, multipliers, global clock buffers, and boundary scan logic

Virtex-II Architecture
Block SelectRAM resources
I/O Blocks (IOBs)
Programmable interconnect
Dedicated multipliers
Configurable Logic Blocks (CLBs)
Clock management (DCMs, BUFGMUXes)

The Virtex-II core voltage (VCCINT) is 1.5V

The Spartan-3 Solution


A New Class of Spartan FPGAs
18 x 18-bit embedded pipelined multipliers for efficient DSP
Configurable 18K block RAMs plus distributed RAM
Up to four on-chip Digital Clock Managers to support multiple system clocks
Four I/O banks, with support for many I/O standards including PCI, DDR333, RSDS, and mini-LVDS
[Figure: Spartan-3 die with I/O Banks 0 to 3 around its edges]

Virtex-II Pro Platform FPGA


3.125 Gbps MultiGigabit Transceivers (MGTs)
  Supports 10 Gbps standards
  Up to 24 per device
IP-Immersion fabric
  Active Interconnect
  18Kb dual-port RAM
  Xtreme multipliers
  16 global clock domains
PowerPC 405 core
  300+ MHz / 450+ DMIPS performance
  Up to 4 per device
[Figure: MGT blocks surrounding the FPGA fabric]


Slices and CLBs


Each Virtex-II CLB contains four slices
Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
A switch matrix provides access to general routing resources
[Figure: Virtex-II CLB with slices S0 to S3, switch matrix, local routing, two BUFTs, SHIFT chain, and CIN/COUT carry connections]

Simplified Slice Structure


Each slice has four outputs
  Two registered outputs and two non-registered outputs
Two BUFTs are associated with each CLB and are accessible by all 16 CLB outputs
Carry logic runs vertically, upward only
  Two independent carry chains per CLB
[Figure: simplified slice 0 with two LUTs, carry logic, and two registers (D, Q, CE, PRE, CLR)]

Detailed Slice Structure


The next few slides discuss the following slice features:
  LUTs
  MUXF5, MUXF6, MUXF7, and MUXF8 (only the F5 and F6 MUXes are shown in this diagram)
  Carry logic
  MULT_ANDs
  Sequential elements

Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
  Also called Function Generators (FGs)
  Capacity is limited by the number of inputs, not by the complexity
Delay through the LUT is constant

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
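The Python sketch below shows why LUT capacity depends only on the number of inputs. It models a 4-input LUT as a 16-bit truth table (an INIT value) that the inputs simply index. The function `example` is hypothetical: it was chosen only because it agrees with every row visible in the table above, since the middle rows are elided on the slide. Treating A as the most significant address bit is also an assumption of this sketch.

```python
from typing import Callable


def lut4_init(func: Callable[[int, int, int, int], int]) -> int:
    """Pack any 4-input Boolean function into a 16-bit truth table.

    Bit i holds the output for the input combination whose binary value is i,
    with A as the most significant bit (a convention assumed for this sketch).
    """
    init = 0
    for i in range(16):
        a, b, c, d = (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1
        if func(a, b, c, d):
            init |= 1 << i
    return init


def lut4_eval(init: int, a: int, b: int, c: int, d: int) -> int:
    """Read the LUT: the inputs only form an address, so every function has the same delay."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (init >> index) & 1


def example(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical function that matches the visible rows of the truth table."""
    return (c & d) | (a ^ b)


init = lut4_init(example)
print(hex(init))                    # 0x8ff8
print(lut4_eval(init, 0, 0, 1, 1))  # 1
```

Changing the logic function changes only the 16 stored bits. The lookup path stays the same, which is consistent with the slide's point that delay through the LUT is constant.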

Connecting Look-Up Tables


MUXF5 combines the two LUTs in each slice
MUXF6 combines slices S0 and S1
MUXF6 combines slices S2 and S3
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB with slices S0 to S3 and the F5, F6, F7, and F8 multiplexers]
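Shannon expansion is one way to see what the MUXF5 to MUXF8 chain provides. A function of more than four inputs is split on its leading inputs until 4-input LUTs remain, and the dedicated multiplexers then select among those LUTs. The Python sketch below is a behavioral model built on that assumption. It does not describe how the implementation tools actually map logic. It shows that 5, 6, 7, and 8 inputs need 2, 4, 8, and 16 LUTs: one slice (MUXF5), two slices (MUXF6), one CLB (MUXF7), and two CLBs (MUXF8). The function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]


def lut4_init(func: Callable[[Bits], int], prefix: Bits) -> int:
    """Truth table of func with its leading inputs fixed to `prefix`."""
    init = 0
    # product() enumerates the last four inputs with the first one as the MSB.
    for index, low in enumerate(product((0, 1), repeat=4)):
        if func(prefix + low):
            init |= 1 << index
    return init


def map_wide(func: Callable[[Bits], int], n: int, prefix: Bits = ()) -> List[int]:
    """Shannon-expand an n-input function (4 <= n <= 8) into one LUT per leaf."""
    if n - len(prefix) == 4:
        return [lut4_init(func, prefix)]
    return map_wide(func, n, prefix + (0,)) + map_wide(func, n, prefix + (1,))


def evaluate(inits: List[int], inputs: Bits) -> int:
    """Leading inputs drive the MUXF selects; the last four address the chosen LUT."""
    k = len(inputs) - 4          # number of multiplexer levels (MUXF5 up to MUXF8)
    leaf = 0
    for select in inputs[:k]:    # inputs[0] drives the outermost multiplexer
        leaf = (leaf << 1) | select
    index = 0
    for bit in inputs[k:]:
        index = (index << 1) | bit
    return (inits[leaf] >> index) & 1


def parity(bits: Bits) -> int:
    return sum(bits) % 2


for n in range(5, 9):
    print(n, "inputs ->", len(map_wide(parity, n)), "LUTs")
```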

Fast Carry Logic


Simple, fast, and complete arithmetic logic
  Dedicated XOR gate for single-level sum completion
  Uses dedicated routing resources
  All major synthesis tools can infer carry logic
[Figure: CLB with two carry chains through slices S0 to S3; each chain's COUT drives the CIN of S0 or S2 in the next CLB]
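The carry structure can be modeled one bit at a time. This Python sketch assumes the usual Xilinx arrangement: the LUT computes the propagate signal A XOR B, and the A operand feeds the CY_MUX data input. CY_MUX either passes the incoming carry (propagate) or generates a new carry from A, and CY_XOR forms the sum in a single level. Bit 0 sits at the bottom of the chain, matching the upward-only carry direction. The function name is hypothetical.

```python
from typing import Tuple


def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> Tuple[int, int]:
    """Add two unsigned `width`-bit values one slice bit at a time; return (sum, carry out)."""
    total = 0
    carry = cin
    for i in range(width):                  # bit 0 is at the bottom; the carry moves upward
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi                 # computed in the LUT; drives the CY_MUX select
        total |= (propagate ^ carry) << i   # CY_XOR: dedicated single-level sum
        carry = carry if propagate else ai  # CY_MUX: pass the carry, or generate it from A
    return total, carry


print(carry_chain_add(11, 6, 4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```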

MULT_AND Gate
Highly efficient multiply and add implementation
  Earlier FPGA architectures required two LUTs per bit to perform the multiplication and addition
  The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: a LUT with inputs A and B, a MULT_AND gate feeding A x B to the DI input of CY_MUX, and CY_XOR producing the sum output S (pins S, CO, DI, CI)]
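The extra gate saves a LUT when a partial-product row is added to a running sum. The LUT can compute (a_j AND b_i) XOR acc_i in one step. The CY_MUX data input, however, also needs the partial-product bit a_j AND b_i itself, and MULT_AND regenerates that bit next to the carry chain. The Python sketch below models an unsigned shift-and-add multiplier this way. The function names are hypothetical, and the row-by-row structure is only one possible arrangement.

```python
def add_partial_product(acc: int, a_bit: int, b: int, width: int) -> int:
    """Return acc + (a_bit * b) using one LUT, one MULT_AND, and the carry chain per bit."""
    result = 0
    carry = 0
    for i in range(width):
        p = a_bit & ((b >> i) & 1)      # MULT_AND: partial-product bit, fed to CY_MUX DI
        s = p ^ ((acc >> i) & 1)        # LUT: AND of the multiplier bits XOR the running sum
        result |= (s ^ carry) << i      # CY_XOR: sum bit
        carry = carry if s else p       # CY_MUX: propagate or generate the carry
    return result | (carry << width)


def shift_add_multiply(a: int, b: int, width: int) -> int:
    """Unsigned width x width multiply built from rows of add_partial_product."""
    acc = 0
    for j in range(width):
        a_bit = (a >> j) & 1
        low = acc & ((1 << j) - 1)      # bits below row j are already final
        high = add_partial_product(acc >> j, a_bit, b, width)
        acc = (high << j) | low
    return acc


print(shift_add_multiply(13, 11, 4))  # 143
```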

Flexible Sequential Elements


Either flip-flops or latches
Two in each slice; eight in each CLB
Inputs come from LUTs or from an independent CLB input
Separate set and reset controls
  Can be synchronous or asynchronous
All controls are shared within a slice
Control signals can be inverted locally within a slice
[Figure: FDRSE_1, FDCPE, and LDCPE primitives]
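Two small behavioral functions capture the difference between synchronous and asynchronous set/reset. The priority order used here (reset over set over clock enable) is assumed from the Xilinx FDRSE and FDCPE primitive descriptions and should be verified against the libraries guide. The function names are hypothetical.

```python
def fdrse_on_clock_edge(q: int, d: int, ce: int, r: int, s: int) -> int:
    """Next Q of a flip-flop with synchronous reset and set (FDRSE-style).

    Evaluated only on an active clock edge: R and S have no effect between edges.
    Assumed priority: R over S over CE.
    """
    if r:
        return 0
    if s:
        return 1
    return d if ce else q


def fdcpe_output(q: int, d: int, ce: int, clr: int, pre: int, clock_edge: bool) -> int:
    """Q of a flip-flop with asynchronous clear and preset (FDCPE-style).

    CLR and PRE act immediately, with or without a clock edge.
    Assumed priority: CLR over PRE over CE.
    """
    if clr:
        return 0
    if pre:
        return 1
    if clock_edge and ce:
        return d
    return q


# An asynchronous clear takes effect without waiting for a clock edge.
print(fdcpe_output(q=1, d=1, ce=1, clr=1, pre=0, clock_edge=False))  # 0
# A synchronous reset is applied only when the edge arrives.
print(fdrse_on_clock_edge(q=1, d=1, ce=1, r=1, s=0))  # 0
```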

Shift Register LUT (SRL16CE)


Dynamically addressable serial shift registers
  Maximum delay of 16 clock cycles per LUT (128 per CLB)
  Cascadable to other LUTs or CLBs for longer shift registers
    Dedicated connection from Q15 to the D input of the next SRL16CE
  Shift register length can be changed asynchronously by changing the address A[3:0]
[Figure: SRL16CE as a chain of 16 D flip-flops with D, CE, and CLK inputs, output selected by A[3:0], and Q15 as the cascade output]
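The following Python sketch is a behavioral model of the SRL16CE. Its 16 storage bits shift on each enabled clock. The address A[3:0] selects which bit drives the output, so data appears A + 1 clock edges after it is shifted in. Q15 feeds the next SRL16CE. The class and method names are hypothetical, and timing details such as clock-to-output delay are ignored.

```python
from typing import List


class SRL16CE:
    """Behavioral model of a 16-bit addressable shift register in one LUT."""

    def __init__(self) -> None:
        self.bits: List[int] = [0] * 16   # bits[0] holds the most recently shifted-in value

    def clock(self, d: int, ce: int = 1) -> None:
        """Rising clock edge: shift in D when CE is High."""
        if ce:
            self.bits = [d] + self.bits[:-1]

    def q(self, addr: int) -> int:
        """Addressed output; changing addr needs no clock (asynchronous length change)."""
        return self.bits[addr]

    @property
    def q15(self) -> int:
        """Cascade output to the D input of the next SRL16CE."""
        return self.bits[15]


def delay_32(samples: List[int]) -> List[int]:
    """Two cascaded SRL16CEs form a 32-clock delay line."""
    first, second = SRL16CE(), SRL16CE()
    out = []
    for d in samples:
        cascade = first.q15          # both stages see the old Q15 on the same edge
        first.clock(d)
        second.clock(cascade)
        out.append(second.q15)
    return out
```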

Shift Register LUT Example


The SRL can be used to create a No Operation (NOP) delay that balances pipeline paths
This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and the associated routing and delays
[Figure: two 64-bit paths, one through Operations A and B and one through Operations C and D, statically balanced at 12 cycles by an SRL-based NOP; stage lengths shown: 4, 8, 3, and 9 cycles]
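The resource figures follow from simple arithmetic. Delaying a 64-bit bus by 9 cycles takes 64 x 9 = 576 flip-flops, or 64 SRL16s (one per bit, each addressed for 9 cycles). The Python sketch below reproduces the comparison. It assumes the Virtex-II CLB figures given earlier (eight LUTs and eight flip-flops per CLB, 16 stages per SRL) and ignores routing. The function name is hypothetical.

```python
import math
from typing import Dict

LUTS_PER_CLB = 8     # four slices x two LUTs (Virtex-II)
FFS_PER_CLB = 8      # four slices x two registers (Virtex-II)
SRL_STAGES = 16      # maximum delay of one SRL16


def delay_line_cost(bus_width: int, delay_cycles: int) -> Dict[str, int]:
    """Compare an SRL16-based delay line with a flip-flop pipeline of the same depth."""
    srl_luts = bus_width * math.ceil(delay_cycles / SRL_STAGES)
    flip_flops = bus_width * delay_cycles
    return {
        "srl_luts": srl_luts,
        "srl_clbs": math.ceil(srl_luts / LUTS_PER_CLB),
        "flip_flops": flip_flops,
        "ff_clbs": math.ceil(flip_flops / FFS_PER_CLB),
    }


print(delay_line_cost(64, 9))
# {'srl_luts': 64, 'srl_clbs': 8, 'flip_flops': 576, 'ff_clbs': 72}
```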


IOB Element
Input path
  Two DDR registers
Output path
  Two DDR registers
  Two 3-state enable DDR registers
Separate clocks and clock enables for I and O
Set and reset signals are shared
[Figure: IOB with input registers (ICK1, ICK2), output and 3-state registers (OCK1, OCK2) feeding DDR multiplexers, and the PAD]

SelectIO Standard
Allows direct connections to external signals of varied voltages and thresholds
  Optimizes the speed/noise tradeoff
  Saves having to place interface components onto your board
Differential signaling standards
  LVDS, BLVDS, ULVDS, LDT, LVPECL
Single-ended I/O standards
  LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
  PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
  GTL, GTLP, and more!

Digitally Controlled Impedance (DCI)

DCI provides
  Output drivers that match the impedance of the traces
  On-chip termination for receivers and transmitters
DCI advantages
  Improves signal integrity by eliminating stub reflections
  Reduces board routing complexity and component count by eliminating external resistors
  Eliminates the effects of temperature, voltage, and process variations by using an internal feedback circuit


Other Virtex-II Features


Distributed RAM and block RAM
  Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
  Block RAM is a dedicated resource on the device (18-kb blocks)
Dedicated 18 x 18 multipliers next to block RAMs
Clock management resources
  Sixteen dedicated global clock multiplexers
  Digital Clock Managers (DCMs)

Distributed SelectRAM Resources


Uses a LUT in a slice as memory
  Synchronous write
  Asynchronous read
  Accompanying flip-flops can be used to create synchronous read
RAM and ROM are initialized during configuration
  Data can be written to RAM after configuration
Emulated dual-port RAM
  One read/write port (address A, output SPO) and one read-only port (address DPRA, output DPO)
[Figure: RAM16X1S, RAM32X1S, and RAM16X1D primitives implemented in slice LUTs]
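The following Python sketch is a behavioral model of the RAM16X1D emulated dual-port RAM. The 16 bits of one LUT are written synchronously through the read/write port, and both ports read combinationally (asynchronously). The class name mirrors the primitive, but the method names are hypothetical and timing is ignored.

```python
from typing import List


class RAM16X1D:
    """16 x 1 distributed RAM with one read/write port and one read-only port."""

    def __init__(self, init: int = 0) -> None:
        # Initial contents come from configuration; bit i is stored at address i.
        self.mem: List[int] = [(init >> i) & 1 for i in range(16)]

    def clock(self, d: int, we: int, a: int) -> None:
        """Rising edge of WCLK: write D at address A when WE is High (synchronous write)."""
        if we:
            self.mem[a] = d

    def spo(self, a: int) -> int:
        """Read/write port output: combinational read at address A."""
        return self.mem[a]

    def dpo(self, dpra: int) -> int:
        """Read-only port output: combinational read at address DPRA."""
        return self.mem[dpra]


ram = RAM16X1D(init=0x0001)
print(ram.spo(0), ram.dpo(5))   # 1 0
ram.clock(d=1, we=1, a=5)
print(ram.dpo(5))               # 1, visible without another clock edge
```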

Block SelectRAM Resources


Up to 3.5 Mb of RAM in 18-kb blocks
  Synchronous read and write
True dual-port memory
  Each port has synchronous read and write capability
  Different clocks for each port
Supports initial values
Synchronous reset on output latches
Supports parity bits
[Figure: 18-kb block SelectRAM with ports A and B; inputs DI, DIP, ADDR, WE, EN, SSR, CLK; outputs DO, DOP]

Dedicated Multiplier Blocks


18-bit two's complement signed operation
Optimized to implement multiply and accumulate functions
Multipliers are physically located next to block SelectRAM memory
[Figure: 18 x 18 multiplier with 18-bit Data_A and Data_B inputs and a 36-bit output; operand sizes shown: 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed]
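The multiplier is 18 x 18 two's complement, so the full product always fits in 36 bits. Smaller signed operands (such as 4 x 4 or 12 x 12) are handled by sign-extending them to 18 bits. The Python sketch below models the multiplier and a simple multiply-accumulate loop. The function names and the unbounded-width accumulator are assumptions of the sketch; in hardware, the accumulator would be built in fabric logic with a chosen width.

```python
from typing import Iterable, Tuple

A_BITS = 18
P_BITS = 36


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(a: int, b: int) -> int:
    """Return the 36-bit product bus (two's complement) of two signed 18-bit operands."""
    lo, hi = -(1 << (A_BITS - 1)), (1 << (A_BITS - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in 18-bit two's complement")
    return (a * b) & ((1 << P_BITS) - 1)


def mac(pairs: Iterable[Tuple[int, int]]) -> int:
    """Multiply-accumulate: sum of signed products, decoding each 36-bit product bus."""
    acc = 0
    for a, b in pairs:
        acc += to_signed(mult18x18(a, b), P_BITS)
    return acc


print(to_signed(mult18x18(-3, 5), P_BITS))                       # -15
print(to_signed(mult18x18(-131072, -131072), P_BITS) == 2 ** 34)  # True: worst case fits
print(mac([(1, 2), (-3, 4), (5, -6)]))                            # -40
```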

Global Clock Routing Resources


Sixteen dedicated global clock multiplexers
  Eight on the top-center of the die, eight on the bottom-center
  Driven by a clock input pad, a DCM, or local routing
Global clock multiplexers provide the following:
  Traditional clock buffer (BUFG) function
  Global clock enable capability (BUFGCE)
  Glitch-free switching between clock signals (BUFGMUX)
Up to eight clock nets can be used in each clock region of the device
  Each device contains four or more clock regions

Digital Clock Manager (DCM)


Up to twelve DCMs per device
  Located on the top and bottom edges of the die
  Driven by clock input pads
DCMs provide the following:
  Delay-Locked Loop (DLL)
  Digital Frequency Synthesizer (DFS)
  Digital Phase Shifter (DPS)
Up to four outputs of each DCM can drive onto global clock buffers
All DCM outputs can drive general routing
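The DFS produces an output whose frequency is the input frequency multiplied by a ratio M/D. In the Virtex-II DCM, M and D correspond to the CLKFX_MULTIPLY and CLKFX_DIVIDE attributes. The ranges used below (M from 2 to 32, D from 1 to 32) are given from memory and should be checked against the Virtex-II User Guide. The sketch also ignores the DCM's input and output frequency limits, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple

M_RANGE = range(2, 33)   # assumed CLKFX_MULTIPLY range
D_RANGE = range(1, 33)   # assumed CLKFX_DIVIDE range


def dfs_output_mhz(clkin_mhz: int, multiply: int, divide: int) -> Fraction:
    """Exact CLKFX frequency for the given multiply/divide settings."""
    if multiply not in M_RANGE or divide not in D_RANGE:
        raise ValueError("multiply/divide outside the assumed ranges")
    return clkin_mhz * Fraction(multiply, divide)


def closest_setting(clkin_mhz: int, target_mhz: int) -> Tuple[int, int]:
    """Search all M/D pairs for the output closest to the target (first match wins on ties)."""
    best: Optional[Tuple[Fraction, int, int]] = None
    for m in M_RANGE:
        for d in D_RANGE:
            error = abs(dfs_output_mhz(clkin_mhz, m, d) - target_mhz)
            if best is None or error < best[0]:
                best = (error, m, d)
    assert best is not None
    return best[1], best[2]


print(dfs_output_mhz(50, 3, 2))   # 75
print(closest_setting(50, 75))    # (3, 2)
```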


Spartan-3 versus Virtex-II


Lower cost
Smaller process = lower core voltage
  0.09 micron versus 0.15 micron
  VCCINT = 1.2V versus 1.5V
More I/O pins per package
Only one-half of the slices support RAM or SRL16s (SLICEM)
Fewer block RAMs and multiplier blocks
  Same size and functionality as in the Virtex-II
Different I/O standard support
  New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
  Default is LVCMOS, versus LVTTL on the Virtex-II
Eight global clock multiplexers
Two or four DCM blocks
No internal 3-state buffers

SLICEM and SLICEL


Each Spartan-3 CLB contains four slices
  Similar to the Virtex-II
Slices are grouped in pairs
  Left-hand SLICEM (Memory)
    LUTs can be configured as memory or SRL16
  Right-hand SLICEL (Logic)
    LUTs can be used as logic only
[Figure: Spartan-3 CLB with slices X0Y0, X0Y1, X1Y0, and X1Y1, a switch matrix, fast connects, SHIFTIN/SHIFTOUT, and CIN/COUT]

Spartan-3E Features
More gates per I/O than Spartan-3
Removed some I/O standards
  Higher-drive LVCMOS
  GTL, GTLP
  SSTL2_II
  HSTL_II_18, HSTL_I, HSTL_III
  LVDS_EXT, ULVDS
16 BUFGMUXes on the left and right sides
  Drive half the chip only
  In addition to eight global clocks
Pipelined multipliers
Additional configuration modes
  SPI, BPI
  Multi-Boot mode
DDR cascade
  Internal data is presented on a single clock edge

Virtex-II Pro Features


0.13 micron process
Up to 24 RocketIO Multi-Gigabit Transceiver (MGT) blocks
  Serializer and deserializer (SERDES)
  Fibre Channel, Gigabit Ethernet, XAUI, InfiniBand compliant transceivers, and others
  8-, 16-, and 32-bit selectable FPGA interface
  8B/10B encoder and decoder
PowerPC RISC processor blocks
  Thirty-two 32-bit General Purpose Registers (GPRs)
  Low power consumption: 0.9 mW/MHz
  IBM CoreConnect bus architecture support
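The selectable fabric interface width and the 8B/10B coding together set the clock rate that the FPGA logic must run at. 8B/10B sends 10 line bits for every 8 data bits, so the data rate is 0.8 times the line rate. The fabric clock is that data rate divided by the interface width. The Python sketch below works this out exactly. It ignores clock correction and channel bonding, and the function name is hypothetical.

```python
from fractions import Fraction


def fabric_clock_mhz(line_rate_mbps: int, interface_bits: int) -> Fraction:
    """Parallel-side clock for an 8B/10B-encoded serial link."""
    if interface_bits not in (8, 16, 32):
        raise ValueError("the MGT fabric interface is 8, 16, or 32 bits wide")
    data_rate_mbps = Fraction(line_rate_mbps) * Fraction(8, 10)   # 8B/10B overhead
    return data_rate_mbps / interface_bits


for width in (8, 16, 32):
    print(width, float(fabric_clock_mhz(3125, width)))
# 8 312.5
# 16 156.25
# 32 78.125
```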


Virtex-4 Architecture Has the Most Advanced Feature Set


RocketIO Multi-Gigabit Transceivers
  622 Mbps to 10.3 Gbps
Smart RAM
  New block RAM/FIFO
Advanced CLBs
  200K logic cells
Xesium Clocking Technology
  500 MHz
XtremeDSP Technology Slices
  18x18, 256 GMACs
Tri-Mode Ethernet MAC
  10/100/1000 Mbps
PowerPC 405 with APU Interface
  450 MHz, 680 DMIPS
1 Gbps SelectIO
  ChipSync source-synchronous interfacing, XCITE active termination

Choose the Platform that Best Fits the Application


Resource     | LX              | SX             | FX
Logic        | 14K to 200K LCs | 23K to 55K LCs | 12K to 140K LCs
Memory       | 0.9 to 6 Mb     | 2.3 to 5.7 Mb  | 0.6 to 10 Mb
DCMs         | 4 to 12         | 4 to 8         | 4 to 20
DSP Slices   | 32 to 96        | 128 to 512     | 32 to 192
SelectIO     | 240 to 960      | 320 to 640     | 240 to 896
RocketIO     | N/A             | N/A            | 0 to 24 channels
PowerPC      | N/A             | N/A            | 1 or 2 cores
Ethernet MAC | N/A             | N/A            | 2 or 4 cores


Review Questions
List the primary slice features
List the three ways a LUT can be configured

Answers
List the primary slice features
  Look-up tables, also called function generators (two per slice, eight per CLB)
  Registers (two per slice, eight per CLB)
  Dedicated multiplexers (MUXF5, MUXF6, MUXF7, MUXF8)
  Carry logic
  MULT_AND gate
List the three ways a LUT can be configured
  Combinatorial logic
  Shift register (SRL16CE)
  Distributed memory

Summary
Slices contain LUTs, registers, and carry logic
  LUTs are connected with dedicated multiplexers and carry logic
  LUTs can be configured as shift registers or memory
IOBs contain DDR registers
SelectIO standards and DCI enable direct connection to multiple I/O standards while reducing component count
Virtex-II memory resources include the following:
  Distributed SelectRAM resources and distributed SelectROM (uses CLB LUTs)
  18-kb block SelectRAM resources
The Virtex-II devices contain dedicated 18x18 multipliers next to each block SelectRAM resource
Digital clock managers provide the following:
  Delay-Locked Loop (DLL)
  Digital Frequency Synthesizer (DFS)
  Digital Phase Shifter (DPS)

Where Can I Learn More?


User Guides
  www.xilinx.com > Documentation > User Guides
Application Notes
  www.xilinx.com > Documentation > Application Notes
Education resources
  Designing with the Virtex-4 Family course
  Spartan-3E Architecture free recorded e-Learning


Double Data Rate Registers


DDR registers can be clocked
  By Clock and NOT(Clock) if the duty cycle is 50/50
  By the CLK0 and CLK180 outputs of a DCM
If D1 = 1 and D2 = 0, the output is a copy of Clock
  Use this technique to generate a clock output that is synchronized to DDR output data
[Figure: FDDR with D1 registered on OCK1 (Clock) and D2 on OCK2, combined by the DDR MUX and driven through OBUF to the PAD]
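At the half-cycle level, the DDR output acts like a multiplexer. It presents the OCK1 register (D1) while Clock is High and the OCK2 register (D2) while Clock is Low. The Python sketch below models only that idealized behavior and shows why D1 = 1, D2 = 0 reproduces the clock. Register latency and the 50/50 duty-cycle requirement are assumed away, and the function name is hypothetical.

```python
from typing import List, Sequence


def ddr_output(d1: Sequence[int], d2: Sequence[int]) -> List[int]:
    """Idealized pad waveform: two entries (High phase, Low phase) per clock cycle."""
    pad = []
    for first, second in zip(d1, d2):
        pad.append(first)    # register clocked by Clock (OCK1)
        pad.append(second)   # register clocked by NOT(Clock) (OCK2)
    return pad


clock = [1, 0] * 4
print(ddr_output([1] * 4, [0] * 4) == clock)   # True: forwarded clock
print(ddr_output([1, 0, 1, 1], [0, 0, 1, 0]))  # [1, 0, 0, 0, 1, 1, 1, 0]
```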

Dual-Port Block RAM Configurations


Configurations available on each port:

Configuration | Depth | Data Bits | Parity Bits
16k x 1       | 16K   | 1         | 0
8k x 2        | 8K    | 2         | 0
4k x 4        | 4K    | 4         | 0
2k x 9        | 2K    | 8         | 1
1k x 18       | 1K    | 16        | 2
512 x 36      | 512   | 32        | 4

Independent configurations on ports A and B
  Supports data-width conversion, including parity bits
[Figure: Port A configured 8 bits wide for input, Port B configured 32 bits wide for output]
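Data-width conversion works because both ports address the same 16 Kb of data storage, only in different-sized words. The Python sketch below assumes a linear mapping: word `addr` of a `width`-bit port covers data bits addr x width through addr x width + width - 1. It ignores the parity bits and the synchronous read timing, and it uses hypothetical names. It reproduces the 8-bit-write, 32-bit-read example.

```python
from typing import List

DATA_BITS = 16 * 1024   # data storage per 18-kb block (parity bits not modeled)
WIDTHS = (1, 2, 4, 8, 16, 32)


class BlockRAMData:
    """Shared data array seen through two ports of independently chosen widths."""

    def __init__(self) -> None:
        self.bits: List[int] = [0] * DATA_BITS

    def _base(self, width: int, addr: int) -> int:
        if width not in WIDTHS or not 0 <= addr < DATA_BITS // width:
            raise ValueError("unsupported width or address out of range")
        return addr * width

    def write(self, width: int, addr: int, value: int) -> None:
        base = self._base(width, addr)
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width: int, addr: int) -> int:
        base = self._base(width, addr)
        return sum(self.bits[base + i] << i for i in range(width))


ram = BlockRAMData()
for addr, byte in enumerate([0x11, 0x22, 0x33, 0x44]):
    ram.write(8, addr, byte)            # Port A: 2k x 8 (parity unused)
print(hex(ram.read(32, 0)))             # Port B: 512 x 32 -> 0x44332211
```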

Clock Buffer Configurations


Clock buffer (BUFG)
  Low-skew clock distribution
Clock enable buffer (BUFGCE)
  Holds the clock output Low when Clock Enable (CE) is inactive
  CE can be active-High or active-Low
  Changes in CE are only recognized when the clock input is Low, to avoid glitches and short clock pulses
[Figure: BUFG (input I) and BUFGCE (inputs I and CE) symbols]
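The BUFGCE rule above can be expressed as a latch that samples CE only while the input clock is Low, followed by an AND gate. This Python sketch operates on sampled waveforms. It is an idealized model of the behavior the slide describes, not of the actual circuit, and it assumes an active-High CE. The function name is hypothetical.

```python
from typing import List, Sequence


def bufgce(clk: Sequence[int], ce: Sequence[int]) -> List[int]:
    """Gate a sampled clock waveform with an active-High CE, glitch-free."""
    enabled = 0
    out = []
    for c, e in zip(clk, ce):
        if c == 0:
            enabled = e          # CE is only recognized while the clock input is Low
        out.append(c & enabled)  # output is held Low while disabled
    return out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
ce = [1, 1, 0, 0, 0, 1, 1, 0]   # CE changes while the clock is High
print(bufgce(clk, ce))          # [0, 1, 1, 0, 0, 0, 0, 0]: no truncated pulses
```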

Clock multiplexer (BUFGMUX)
  Switches from one clock to another, glitch-free
  After a change on S, the BUFGMUX waits for the currently selected clock input to go Low
  The output is held Low until the newly selected clock goes Low, then switches
[Figure: BUFGMUX symbol (I0, I1, S) and timing diagram (S, I0, I1, O) showing the wait-for-Low interval before the switch]
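The switching sequence above can be written as a small state machine over sampled waveforms. This Python sketch assumes that S stays stable until a switch completes. It models only the sequence stated on the slide, not the actual circuit timing, and the names are hypothetical.

```python
from typing import List, Sequence


def bufgmux(i0: Sequence[int], i1: Sequence[int], s: Sequence[int]) -> List[int]:
    """Glitch-free clock multiplexer following the wait-for-Low sequence."""
    current, target, state = s[0], s[0], "run"
    out = []
    for a, b, sel in zip(i0, i1, s):
        clocks = (a, b)
        if state == "run" and sel != current:
            state, target = "wait_old_low", sel     # S changed: start the switch
        if state == "wait_old_low" and clocks[current] == 0:
            state = "hold_low"                      # old clock is Low: hold output Low
        if state == "hold_low" and clocks[target] == 0:
            current, state = target, "run"          # new clock is Low: switch over
        out.append(0 if state == "hold_low" else clocks[current])
    return out


i0 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
i1 = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
s = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
print(bufgmux(i0, i1, s))   # [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]
```

In this example, the partial High pulse of I1 that is in progress when S changes never reaches the output. The output resumes with the next complete I1 pulse.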
