L12: Reconfigurable Logic Architectures

Acknowledgements:

Materials in this lecture are courtesy of the following sources and are used with permission.
Frank Honore
Prof. Randy Katz (United Microelectronics Corporation Distinguished Professor
in Electrical Engineering and Computer Science at the University of California, Berkeley) and
Prof. Gaetano Borriello (University of Washington Department of Computer Science &
Engineering) From Chapter 2 of R. Katz, G. Borriello. Contemporary Logic Design. 2nd ed.
Prentice-Hall/Pearson Education, 2005.
L12: 6.111 Spring 2006 Introductory Digital Systems Laboratory 1
History of Computational Fabrics

Discrete devices: relays, transistors (1940s-50s)


Discrete logic gates (1950s-60s)
Integrated circuits (1960s-70s)
e.g. TTL packages: Data Book for 100s of different parts
Gate Arrays (IBM 1970s)
Transistors are pre-placed on the chip; Place and Route software
puts the chip together automatically. Only the interconnect is
programmed (mask programming)
Software Based Schemes (1970s- present)
Run instructions on a general purpose core
Programmable Logic (1980s to present)
A chip that can be reprogrammed after it has been fabricated
Examples: PALs, EPROM, EEPROM, PLDs, FPGAs
Excellent support for mapping from Verilog
ASIC Design (1980s to present)
Turn Verilog directly into layout using a library of standard cells
Effective for high-volume and efficient use of silicon area
Reconfigurable Logic

Logic blocks
To implement combinational and sequential logic
Interconnect
Wires to connect inputs and outputs to logic blocks
I/O blocks
Special logic blocks at periphery of device for external connections

Key questions:
How to make logic blocks programmable?
(after chip has been fabbed!)
What should the logic granularity be?
How to make the wires programmable?
(after chip has been fabbed!)
Specialized wiring structures for local vs. long distance routes?
How many wires per logic block?

[Figure: logic block with n inputs and m outputs: combinational logic and a D flip-flop (SET, CLR), controlled by configuration]
Programmable Array Logic (PAL)

Based on the fact that any combinational logic can be realized as a sum-of-products
PALs feature an array of AND-OR gates with programmable interconnect

[Figure: input signals feed the AND array (programming of product terms), which feeds the OR array (programming of sum terms), which drives the output signals]



Inside the 22v10 PAL

Each input pin (and its complement) is sent to the AND array
OR gates for each output can take 8-16 product terms, depending on output pin
Macrocell block provides additional output flexibility...

Image removed due to copyright restrictions.



Cypress PAL CE22V10

From Lattice Semiconductor

Image removed due to copyright restrictions.

Images courtesy of Lattice Semiconductor Corporation. Used with permission.

Outputs may be registered or combinational, positive or inverted
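The output options above can be sketched as a small parameterized module. This is a simplified model under stated assumptions: the two parameters stand in for the macrocell's configuration bits, the names are hypothetical, and the output enable and feedback paths of a real macrocell are left out.

```verilog
// Simplified macrocell sketch. REGISTERED and INVERT are hypothetical
// stand-ins for the programmable configuration bits.
module macrocell_sketch #(
    parameter REGISTERED = 1,  // 1: output taken from the flip-flop, 0: combinational
    parameter INVERT     = 0   // 1: inverted output, 0: positive output
) (
    input  wire clk,
    input  wire sum_term,      // output of the OR gate (sum of product terms)
    output wire pin_out
);
    reg q;

    // The flip-flop always captures the sum term; it is used only if REGISTERED is set.
    always @(posedge clk)
        q <= sum_term;

    wire selected = REGISTERED ? q : sum_term;
    assign pin_out = INVERT ? ~selected : selected;
endmodule
```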
Anti-Fuse-Based Approach (Actel)

Rows of programmable logic building blocks + rows of interconnect
Anti-fuse Technology: Program Once
Use Anti-fuses to build up long wiring runs from short segments
8 input, single output combinational logic blocks
FFs constructed from discrete cross coupled gates

[Figure: rows of Logic Modules alternating with Wiring Tracks, surrounded on all sides by I/O Buffers, Programming and Test Logic]



Actel Logic Module

Combinational block does not have the output FF


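[Figure: Example Gate Mapping. A 4-to-1 multiplexer (data inputs 00, 01, 10, 11) with output Y; the signals shown are GND, A, B, C, D, E]

[Figure: S-R Flip-Flop. The same multiplexer structure with output Q; the signals shown are GND, VDD, S, R]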
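To make the idea of mapping a gate onto a multiplexer-based module concrete, here is a minimal sketch. It does not reproduce the wiring in the slide's figure; the module names and the choice of an AND gate are assumptions made for illustration.

```verilog
// Generic 4-to-1 multiplexer: y = d[{s1, s0}].
module mux4 (
    input  wire d0, d1, d2, d3,
    input  wire s1, s0,
    output wire y
);
    assign y = s1 ? (s0 ? d3 : d2) : (s0 ? d1 : d0);
endmodule

// Hypothetical gate mapping: a 2-input AND built from the multiplexer
// by tying unused inputs to constants.
module and2_from_mux (
    input  wire a, b,
    output wire y
);
    // s1 is tied low, so only d0 and d1 are reachable:
    // y = b ? a : 0, which equals a & b.
    mux4 m (
        .d0(1'b0), .d1(a), .d2(1'b0), .d3(1'b0),
        .s1(1'b0), .s0(b),
        .y(y)
    );
endmodule
```

Other gates follow the same pattern: the function is chosen by which signals and constants are wired to the data and select inputs.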



Actel Routing & Programming
[Figure: Actel routing. Logic Module inputs and outputs connect through Input Segments and Output Segments to the Horizontal Channel and Long Vertical Tracks]

[Figure: Programming an Antifuse. Precharge Phase: lines held at Vpp/2. The selected antifuse is placed between Gnd and Vpp and is shorted]

Programming is Permanent (one time)

Courtesy of Actel. Used with permission.
RAM Based Field Programmable Logic - Xilinx

[Figure: array of Configurable Logic Blocks (CLBs) connected by Programmable Interconnect and Switch Matrices, surrounded by I/O Blocks (IOBs)]

[Figure: IOB detail. Output Buffer with Slew Rate Control, Passive Pull-Up/Pull-Down, Pad, Input Buffer with Delay, and D flip-flops on the output and input paths]

[Figure: CLB detail. Function generators G (inputs G1-G4), F (inputs F1-F4), and H (input H1); control inputs C1-C4 supply H1, DIN, S/R, and EC; two D flip-flops with S/R Control, clock K, and clock enable EC; outputs X and Y]

Courtesy of Xilinx. Used with permission.



The Xilinx 4000 CLB
Courtesy of Xilinx. Used with permission.



Two 4-input Functions, Registered Output
and a Two Input Function
Courtesy of Xilinx. Used with permission.



5-input Function, Combinational Output
Courtesy of Xilinx. Used with permission.



LUT Mapping

N-LUT: direct implementation of a truth table, any function of N inputs.
N-LUT requires 2^N storage elements (latches)
N inputs select one latch location (like a memory)
Why Latches and Not Registers?
Latches set by configuration bitstream

[Figure: 4-LUT example. Inputs select one of the latches to drive the Output]

Courtesy of Xilinx. Used with permission.
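A 4-input LUT can be modeled in a few lines. This is a behavioral sketch, not a vendor primitive: the module name and the INIT parameter are assumptions, with INIT standing in for the 16 latch values that the configuration bitstream would load.

```verilog
// Behavioral model of a 4-input LUT. INIT is a hypothetical parameter
// holding the truth table, one bit per input combination.
module lut4_sketch #(
    parameter [15:0] INIT = 16'h0000
) (
    input  wire [3:0] in,
    output wire       out
);
    wire [15:0] truth = INIT;

    // The inputs act as an address that selects one stored bit.
    assign out = truth[in];
endmodule
```

For example, setting INIT to 16'h8000 stores a 1 only at address 15, which gives a 4-input AND, and 16'hFFFE stores a 0 only at address 0, which gives a 4-input OR.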



Configuring the CLB as a RAM

Memory is built using Latches, not FFs
Read is the same as a LUT function!

[Figure: CLB configured as a 16x2 memory]

Courtesy of Xilinx. Used with permission.
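The 16x2 memory can be sketched behaviorally as follows. The module and port names are hypothetical, and the clocked write is an assumption made for simplicity; the point is that the read path is the same address-selects-a-bit lookup that a LUT performs.

```verilog
// Behavioral sketch of a 16x2 memory built from LUT storage.
// The synchronous write port is an assumption of this sketch.
module ram16x2_sketch (
    input  wire       clk,
    input  wire       we,
    input  wire [3:0] addr,
    input  wire [1:0] din,
    output wire [1:0] dout
);
    reg [1:0] mem [0:15];

    // Write: update the addressed location when write enable is high.
    always @(posedge clk)
        if (we)
            mem[addr] <= din;

    // Read: combinational lookup, the same operation a LUT performs.
    assign dout = mem[addr];
endmodule
```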


Xilinx 4000 Interconnect

Courtesy of Xilinx.
Used with permission.



Xilinx 4000 Interconnect Details

Wires are not ideal!

Courtesy of Xilinx.
Used with permission.



Xilinx 4000 Flexible IOB

Adjust Transition Time

Outputs through FF or bypassed

Courtesy of Xilinx.
Used with permission.
Adjust the Sampling Edge



Add Bells & Whistles
Hard Processor
Gigabit Serial I/O
18 Bit x 18 Bit Multiplier (36 Bit result)
Programmable Termination (VCCIO, Impedance Control)
Clock Mgmt
BRAM

Courtesy of David B. Parlour, ISSCC 2004 Tutorial, The Reality and Promise of
Reconfigurable Computing in Digital Signal Processing. and Xilinx. Used with permission.



The Virtex II CLB (Half Slice Shown)

Courtesy of Xilinx.
Used with permission.



Adder Implementation

LUT: A ⊕ B
Y = A ⊕ B ⊕ Cin
Dedicated carry logic (Cin to Cout)
1 half-Slice = 1-bit adder

Courtesy of Xilinx. Used with permission.
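The split between the LUT and the dedicated carry logic can be sketched as below. This is a behavioral approximation with hypothetical names: the LUT produces A ⊕ B, a dedicated XOR forms the sum, and a carry multiplexer either passes Cin along or generates a new carry.

```verilog
// One adder bit, arranged the way a half-slice splits the work.
module adder_bit_sketch (
    input  wire a, b, cin,
    output wire y, cout
);
    // LUT: propagate signal.
    wire p = a ^ b;

    // Dedicated XOR: sum bit.
    assign y = p ^ cin;

    // Carry multiplexer: if a and b differ, pass cin along;
    // otherwise a equals b and the carry out is a.
    assign cout = p ? cin : a;
endmodule
```

Because the carry path goes only through the multiplexer and not through the LUT or general routing, the carry can ripple quickly from bit to bit.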



Carry Chain
1 CLB = 4 Slices = 2, 4-bit adders
64-bit Adder: 16 CLBs
CLBs must be in same column

[Figure: A[63:0] + B[63:0] = Y[63:0] with carry out Y[64], built as a column from CLB0 (A[3:0], B[3:0], Y[3:0]) and CLB1 (A[7:4], B[7:4], Y[7:4]) up to CLB15 (A[63:60], B[63:60], Y[63:60])]

Courtesy of Xilinx. Used with permission.
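The column of 16 CLBs can be sketched structurally as a chain of 4-bit stages. The module names are hypothetical, and each 4-bit stage is written behaviorally rather than as real slice primitives; it only stands for the four bits one CLB contributes to the chain.

```verilog
// One 4-bit stage of the chain, written behaviorally.
module add4_sketch (
    input  wire [3:0] a, b,
    input  wire       cin,
    output wire [3:0] y,
    output wire       cout
);
    assign {cout, y} = a + b + cin;
endmodule

// 64-bit adder as 16 chained stages, mirroring CLB0 through CLB15.
module adder64_chain_sketch (
    input  wire [63:0] a, b,
    output wire [64:0] y
);
    wire [16:0] c;          // c[i] is the carry into stage i
    assign c[0] = 1'b0;

    genvar i;
    generate
        for (i = 0; i < 16; i = i + 1) begin : clb
            add4_sketch stage (
                .a(a[4*i +: 4]), .b(b[4*i +: 4]), .cin(c[i]),
                .y(y[4*i +: 4]), .cout(c[i+1])
            );
        end
    endgenerate

    assign y[64] = c[16];   // final carry out
endmodule
```

In practice the synthesis tool builds this chain automatically from a plain `a + b`; the explicit structure is shown only to mirror the figure.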



Virtex II Features

Double Data Rate registers
Digital Clock Manager
Embedded Multiplier
Block SelectRAM

Courtesy of Xilinx. Used with permission.
The Latest Generation: Virtex-II Pro

FPGA Fabric
Embedded memories
Embedded PowerPC
Hardwired multipliers
High-speed I/O

Courtesy of Xilinx. Used with permission.



FPGA Evolution Summary [Parlour04]
[Figure: Transistors (x 10^6, log scale from 0.1 to 1000) vs. year, 1980 to 2005. Features added over time: Logic + FF, Distributed RAM, Arithmetic Support, Block RAM, Hard MAC, High Speed Serial IO, Hard CPU, DSP System Design Tools. Era labels along the time axis run from Glue Logic to Domain Specific Platform]
Courtesy of Xilinx. Used with permission.



Design Flow - Mapping

Technology Mapping: Schematic/HDL to Physical Logic units
Compile functions into basic LUT-based groups
(function of target architecture)

[Figure: gates on inputs a, b, c, d feeding a D flip-flop are mapped to a LUT feeding a D flip-flop]

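module mapped_logic (input Clock, Reset, a, b, c, d, output reg q);
  // Asynchronous, active-low reset; the logic function fits in one 4-input LUT
  always @(posedge Clock or negedge Reset)
  begin
    if (!Reset)
      q <= 0;
    else
      q <= (a & b & c) | (b & d);
  end
endmodule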
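To show what the mapping step produces, the sketch below writes the same function as an explicit 4-input lookup table followed by a flip-flop. The module name, the lower-case port names, and the input ordering {d, c, b, a} are assumptions of this sketch; the table constant 16'hCC80 was worked out by hand for that ordering.

```verilog
// Post-mapping view of q <= (a & b & c) | (b & d): one LUT plus one flip-flop.
module mapped_lut_ff_sketch (
    input  wire clock,
    input  wire reset,          // active low, asynchronous
    input  wire a, b, c, d,
    output reg  q
);
    // Truth table indexed by {d, c, b, a}.
    // The function is 1 at addresses 7, 10, 11, 14, and 15.
    localparam [15:0] LUT_INIT = 16'hCC80;

    wire [15:0] truth   = LUT_INIT;
    wire        lut_out = truth[{d, c, b, a}];

    always @(posedge clock or negedge reset)
        if (!reset)
            q <= 1'b0;
        else
            q <= lut_out;
endmodule
```

A different input ordering would give a different constant for the same function, which is one reason the mapping depends on the target architecture.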



Design Flow - Placement & Route
Placement: assign logic location on a particular device

[Figure: LUTs placed at locations on the device]

Routing: iterative process to connect CLB inputs/outputs and IOBs. Optimizes critical path delay; can take hours or days for large, dense designs
Iterate placement if timing not met
Satisfy timing? Generate Bitstream to configure the device

Challenge! Cannot use full chip for reasonable speeds (wires are not ideal).
Typically no more than 50% utilization.
Example: Verilog to FPGA

module adder64 (a, b, sum);
  input [63:0] a, b;
  output [63:0] sum;
  assign sum = a + b;
endmodule

Flow: Synthesis, Tech Map, Place&Route

64-bit Adder Example Virtex II XC2V2000

Courtesy of Xilinx.
Used with permission.



How are FPGAs Used?

Prototyping
Ensemble of gate arrays used to emulate a circuit to be manufactured
Get more/better/faster debugging done than with simulation
Reconfigurable hardware
One hardware block used to implement more than one function
Special-purpose computation engines
Hardware dedicated to solving one problem (or class of problems)
Accelerators attached to general-purpose computers (e.g., in a cell phone!)



Summary

FPGAs provide a flexible platform for implementing digital computing
A rich set of macros and I/Os is supported (multipliers, block RAMs, ROMs, high-speed I/O)
A wide range of applications, from prototyping (to validate a design before ASIC mapping) to high-performance spatial computing
Interconnects are a major bottleneck (physical design and locality are important considerations)

"College students will study concurrent programming instead of C as their first computing experience."
-- David B. Parlour, ISSCC 2004 Tutorial
