
Basics of Reconfigurable Computing

Nagendra P Gajjar
Associate Professor, EC Dept
IT, Nirma University
nagendra.gajjar@nirmauni.ac.in

Features of Reconfigurable Computing

 Sits between microprocessors (μP) and ASICs in performance and power consumption
 Parallelism due to hardware implementation
 Cheaper for low volume production
 Design time and time to market are much shorter
 Functional units can change over time
 General Purpose

System Level Architectures

Five classes of reconfigurable systems


a External stand-alone processing unit
b Attached processing unit
c Co-processor
d Reconfigurable functional unit
e Processor embedded in a reconfigurable fabric

Reconfigurable Fabric

 Consists of reconfigurable functional units, reconfigurable interconnect and a flexible interface to the rest of the system
 There is a tradeoff between flexibility and efficiency.
Functional Units

 Fine Grained: implements functions for a single bit or a small number of bits
 Coarse Grained: larger units, such as ALUs and DSP blocks
Emerging Directions

 Low power techniques
   Activity reduction in power-aware design tools
   Leakage current reduction
   Dual supply voltage methods
 Asynchronous architectures
   Fine grained asynchronous pipelines
   Quasi delay insensitive architectures
   Globally asynchronous, locally synchronous (GALS)
 Molecular microelectronics
   Promising for increasing the capacity and performance of reconfigurable computing architectures
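The low power techniques listed above act on different terms of the textbook CMOS dynamic power approximation, P = α · C · V² · f. The sketch below is only an illustration of that relationship: the function name and every numeric value are assumptions chosen for the example, not measurements from any device.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Textbook CMOS dynamic power estimate in watts: alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical baseline: 20% switching activity, 1 nF switched capacitance,
# 1.2 V supply, 100 MHz clock.
baseline = dynamic_power(0.20, 1e-9, 1.2, 100e6)

# Activity reduction: assume a power-aware tool halves the switching activity.
low_activity = dynamic_power(0.10, 1e-9, 1.2, 100e6)

# Dual supply voltage: assume half of the switched capacitance is
# non-critical logic that can run from a 0.9 V supply instead of 1.2 V.
dual_vdd = (dynamic_power(0.20, 0.5e-9, 1.2, 100e6)
            + dynamic_power(0.20, 0.5e-9, 0.9, 100e6))

print(f"baseline      {baseline * 1e3:.2f} mW")
print(f"low activity  {low_activity * 1e3:.2f} mW")
print(f"dual supply   {dual_vdd * 1e3:.2f} mW")
```

Because voltage enters quadratically, lowering the supply of non-critical logic can save more than its share of the capacitance would suggest. Leakage (static) power is not covered by this formula.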
Main Trends in Architectures

 Coarse grained fabrics
   The cost of interconnect is increasing as technology advances, so the granularity of logic units is increasing to reduce routing overhead
 Heterogeneous functions
   With migration to more advanced technologies, the number of transistors that can be devoted to reconfigurable logic increases, so multipliers and DSP units can be added
 Soft cores
   Instructions are programmable
   Flexible, although less area- and speed-efficient
Design Methods
• Hardware compilers for high-level descriptions are key to reducing the productivity gap in advanced circuit development
• Design methods fall into two broad categories: general purpose and special purpose design methods
 General purpose design
   Annotation and constraint driven approach
   Source-directed compilation approach
 Special purpose design
   Digital Signal Processing
   The word-length optimisation
 Other design methods
   Run time customisation
   Soft instruction processor
   Multi FPGA compilation
General Purpose Design

 Design methods and tools are based on general purpose programming languages such as C, C++ and Java
 HDLs such as VHDL and Verilog are also available
 A number of compilers from C to hardware have been developed
 They target hardware only, or combined hardware/software
 Two different approaches:
 Annotation and constraint driven approach
   Annotations are used in the source code, together with constraint files
   Only minor changes are needed to produce a compilable program from a software description
 Source-directed compilation approach
   Source languages are adapted to enable explicit description of parallelism, communication and other customisable hardware resources
Summary of General Purpose Hardware Compilers
Special Purpose Design

 There are many specific problem domains which deserve special consideration
 Exploiting domain specific properties:
   Describe the computation (MATLAB-Simulink)
   Optimise the implementation
 Digital Signal Processing
 The word-length optimisation
Digital Signal Processing

 One of the most successful applications of reconfigurable computing is real-time digital signal processing (DSP)
 FPGAs include hardware support for DSP
 DSP problems share similar properties
   Algorithms are numerically intensive but have very simple control structures
   Controlled numerical error is acceptable
   Metrics such as signal-to-noise ratio (SNR) exist for measuring numerical precision
 Design is performed in Simulink
   The designer need not deal with low level implementation issues
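To make "controlled numerical error" and SNR concrete, here is a minimal sketch that quantizes a signal to a fixed-point grid and measures the resulting SNR. The helper names `quantize` and `snr_db`, the test signal and the bit widths are all assumptions made for illustration; they do not come from Simulink or any FPGA tool.

```python
import math


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def snr_db(reference, approx):
    """Signal-to-noise ratio in dB of `approx` relative to `reference`."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical test signal: one period of a sine wave, 64 samples.
samples = [math.sin(2 * math.pi * n / 64) for n in range(64)]

for bits in (4, 8, 12):
    quantized = [quantize(s, bits) for s in samples]
    print(f"{bits:2d} fractional bits -> {snr_db(samples, quantized):.1f} dB")
```

Each extra fractional bit halves the quantization step, so the SNR rises with word-length; the designer picks the smallest width that still meets the required SNR.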
The word-length optimisation

 Unlike μP-based implementations, the size of each variable is customisable.
   This affects numerical accuracy, design size, speed and power consumption
 The most important decision is the selection of an appropriate word-length and scaling for each signal
 The accuracy is less sensitive to some variables than to others
 Considering error and area information, it is possible to achieve a highly efficient DSP implementation
 The word-length optimisation problem is NP-hard
   Heuristics are used, trading area against signal quality
   Analytic methods are faster but pessimistic

Method             Area %    Speedup %
Bitwise            15-56     65
MATCH              80        20
Wadekar & Parker   15-40
Bitsize            20-30
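Since the problem is NP-hard, practical tools rely on heuristics. The sketch below shows one simple greedy heuristic, under stated assumptions: each FIR coefficient is treated as a signal with its own fractional word-length, the sum of the word-lengths stands in for area, and an SNR target stands in for the quality constraint. It is not the algorithm of any method in the table above, and the filter, input and target are hypothetical.

```python
import math


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def fir(coeffs, xs):
    """Direct-form FIR output for every full window of the input."""
    n = len(coeffs)
    return [sum(c * xs[i + k] for k, c in enumerate(coeffs))
            for i in range(len(xs) - n + 1)]


def snr_db(reference, approx):
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


def greedy_wordlengths(coeffs, xs, target_db, max_bits=16):
    """Shrink word-lengths one bit at a time while the SNR target holds."""
    reference = fir(coeffs, xs)
    bits = [max_bits] * len(coeffs)

    def quality(widths):
        quantized = [quantize(c, w) for c, w in zip(coeffs, widths)]
        return snr_db(reference, fir(quantized, xs))

    changed = True
    while changed:
        changed = False
        for i in range(len(bits)):
            if bits[i] > 1:
                trial = bits[:i] + [bits[i] - 1] + bits[i + 1:]
                if quality(trial) >= target_db:   # keep only safe reductions
                    bits, changed = trial, True
    return bits, sum(bits), quality(bits)


# Hypothetical filter and input signal.
coeffs = [0.5, 0.3125, 0.1234, 0.0457]
xs = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(200)]
bits, area, snr = greedy_wordlengths(coeffs, xs, target_db=40.0)
print("word-lengths:", bits, "area proxy:", area, f"SNR: {snr:.1f} dB")
```

The result illustrates the slide's point that accuracy is less sensitive to some variables than to others: coefficients that are exactly representable in few bits shrink much further than the rest.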
Other Design Methods

 Run-time customisation
 Soft instruction processors
 Multi-FPGA compilation
Run-time customisation

 One part of the system continues to be operational while another part is being reconfigured
 At compile time, an initial configuration bitstream and incremental bitstreams have to be produced
 Runtime design optimisation techniques
   Runtime constraint propagation: produces a smaller circuit with higher performance through Boolean algebra optimisation
   Library compilation: precompiled modules are loaded into reconfigurable resources by procedure call mechanisms
 Exploiting information about program branch probabilities
   Promote utilization by dedicating more resources to the branches which execute more frequently
   The hardware compiler produces a collection of designs, each optimised for a particular branch probability; the best one is selected at runtime
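The branch-probability idea above can be sketched as a runtime selection among precompiled designs. Everything here is hypothetical: the `Design` record, the design names and the cycle counts are invented for illustration, and a real system would load the chosen bitstream rather than return an object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str              # hypothetical bitstream label
    tuned_for: float       # branch-taken probability assumed at compile time
    cycles_taken: int      # latency when the branch is taken
    cycles_not_taken: int  # latency when the branch is not taken


def expected_cycles(design, p_taken):
    """Average latency of a design for an observed branch probability."""
    return (p_taken * design.cycles_taken
            + (1 - p_taken) * design.cycles_not_taken)


def select_design(designs, observed_p):
    """Pick the precompiled design with the lowest expected latency."""
    return min(designs, key=lambda d: expected_cycles(d, observed_p))


designs = [
    Design("favour_not_taken", 0.1, cycles_taken=12, cycles_not_taken=2),
    Design("balanced", 0.5, cycles_taken=6, cycles_not_taken=6),
    Design("favour_taken", 0.9, cycles_taken=2, cycles_not_taken=12),
]

for p in (0.1, 0.5, 0.8):
    print(p, "->", select_design(designs, p).name)
```

Each design spends more resources on the path it expects to be hot, so the selection simply follows the measured probability.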
Soft Instruction Processor

 Examples of soft instruction processors are ARM, LEON, MicroBlaze and Nios
 Customisation of resources and instructions is supported
 Time for instruction fetch and decode is reduced, since each custom instruction replaces several regular instructions
 Additional resources can be assigned to a custom instruction to improve performance
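A toy model shows why a custom instruction reduces fetch and decode work. The interpreter below and its `mac` (multiply-accumulate) opcode are assumptions made for this example; they do not correspond to the instruction set of any processor named above.

```python
def run(program, regs):
    """Execute (opcode, dst, a, b) tuples on `regs`; return the fetch count."""
    fetches = 0
    for op, dst, a, b in program:
        fetches += 1
        if op == "mul":
            regs[dst] = regs[a] * regs[b]
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mac":  # hypothetical custom instruction: dst += a * b
            regs[dst] = regs[dst] + regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return fetches


def fresh_registers():
    return {"acc": 0, "t": 0,
            "x0": 1, "x1": 2, "x2": 3,
            "y0": 4, "y1": 5, "y2": 6}


# Dot product with regular instructions: one mul and one add per element.
regular = []
for i in range(3):
    regular += [("mul", "t", f"x{i}", f"y{i}"), ("add", "acc", "acc", "t")]

# Same dot product with the custom instruction: one mac per element.
custom = [("mac", "acc", f"x{i}", f"y{i}") for i in range(3)]

regs_regular, regs_custom = fresh_registers(), fresh_registers()
fetches_regular = run(regular, regs_regular)
fetches_custom = run(custom, regs_custom)
print(regs_regular["acc"], fetches_regular)
print(regs_custom["acc"], fetches_custom)
```

Both programs compute the same result, but the custom version fetches half as many instructions; in hardware the extra multiplier-adder logic is the "additional resources" assigned to the instruction.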
Field Programmable Gate Array (FPGA)
• An FPGA is a device that a developer or customer can easily configure to suit their needs.
• SRAM-based reprogrammable device.
• Logic blocks are reconfigurable.
• A design can be programmed with just the features it needs.
Typical Architectures, Advantages and Disadvantages
• 4-input lookup table (LUT).
• Flip-flops.
• Output block.
• Better flexibility with higher bandwidth.
• Parallel processing.
• ADC or DAC incorporated in the FPGA.
• High processing speed.
• Large circuits can be complex to design and costly.
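The 4-input lookup table listed above can be modelled in a few lines: 16 configuration bits, one for each input combination, fully define the logic function. The class name `LUT4` and its methods are assumptions for this sketch and do not mirror any vendor primitive.

```python
class LUT4:
    """A 4-input lookup table: 16 configuration bits, one per input pattern."""

    def __init__(self, config_bits=0):
        self.config = config_bits & 0xFFFF

    @classmethod
    def from_function(cls, fn):
        """Build the configuration by tabulating fn over all 16 inputs."""
        bits = 0
        for index in range(16):
            a, b, c, d = ((index >> k) & 1 for k in range(4))
            if fn(a, b, c, d):
                bits |= 1 << index
        return cls(bits)

    def evaluate(self, a, b, c, d):
        """Use the inputs as an address into the configuration bits."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.config >> index) & 1


and4 = LUT4.from_function(lambda a, b, c, d: a & b & c & d)
parity = LUT4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.config), hex(parity.config))

# "Reconfiguring" the same LUT is just rewriting its 16 bits.
lut = LUT4(and4.config)
lut.config = parity.config
print(lut.evaluate(1, 0, 0, 0))
```

Loading a different 16-bit pattern turns the same hardware into a different gate, which is the basis of fine-grained reconfigurability.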
Internals of a Reconfigurable Architecture
Application-Specific Integrated Circuit (ASIC)
• An ASIC is an IC customized for a particular task or application.
• Not a general purpose device.
• Modern ASICs often include a 32-bit processor and memory blocks such as ROM, RAM and Flash, along with other components.
Different Design Architectures
• Full Custom Design.
• Standard Cell Design.
• Gate Array Design.
• Field Programmable Logic.
ASIC Properties

• Much higher bandwidth, lower power consumption and lower cost.
• Customer-specific design rather than a general purpose device.
• Possible to design for optimum performance.
• Supports advanced programming languages.
• Full custom design is very costly.
• Less reconfigurability.
Partial Reconfiguration

Remote HW-SW Reconfigurable

Introduction to PR
 Partial Reconfiguration: modifying a portion of the device without affecting other areas of the device

o Static Partial Reconfiguration: the device is inactive while it is being reconfigured

o Dynamic Partial Reconfiguration (DPR): reconfiguring a portion of the device while the remaining design is still active and operating, without affecting the remaining portion of the device.
Device Partial Reconfiguration

[Figure: device partial reconfiguration via ICAP, with Reconfiguration Modules I, II and III; legend: 3 – PRR, 4 – RM]
Typical Configuration Mode
 Fixed configuration
 Data loads from PROM or other source at power on
 Configuration fixed until the end of the FPGA duty cycle
 Used extensively during traditional design flow
 Evaluate functionality of design as it is developed

[Figure: function over time from power-on to shutdown: configuration overhead, then device duty-cycle]
Reconfiguration Mode
 Configuration memory is no longer fixed during the system duty cycle
 Initial bitstream loaded at power-on
 Different, full device bitstreams loaded over time

[Figure: function over time from power-on to shutdown: configuration overhead, then reconfiguration overhead]
Partial Reconfiguration Mode
 Only a subset of configuration data is altered
 But all computation halts while modification is in progress…
 Main benefit: reduced configuration overhead

[Figure: function over time from power-on to shutdown: configuration overhead, then reconfiguration overhead]
Dynamic PR Mode
 A subset of the configuration data changes…
 But logic layer continues operating while configuration layer is modified…
 Configuration overhead limited to circuit that is changing…

[Figure: function over time from power-on to shutdown: configuration overhead, then reconfiguration overhead]
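The four configuration modes differ mainly in how long the whole device is halted for (re)configuration. The sketch below compares them with a deliberately crude model. Its assumptions are that loading a partial bitstream takes time proportional to the fraction of the device it covers, and that in dynamic PR only the initial load stops all logic. The function name and all numbers are hypothetical.

```python
def running_time_ms(total_ms, full_cfg_ms, n_reconfigs, mode,
                    region_fraction=0.25):
    """Rough time (ms) during which at least part of the logic is running."""
    if mode == "fixed":
        halted = full_cfg_ms                      # one load at power-on
    elif mode == "full_reconfig":
        halted = full_cfg_ms * (1 + n_reconfigs)  # full bitstream every time
    elif mode == "partial":
        # Smaller bitstream, but all computation still halts while it loads.
        halted = full_cfg_ms + n_reconfigs * full_cfg_ms * region_fraction
    elif mode == "dynamic_partial":
        halted = full_cfg_ms                      # static logic keeps running
    else:
        raise ValueError(f"unknown mode: {mode}")
    return total_ms - halted


# Hypothetical scenario: 1 s of operation, 100 ms full configuration,
# 4 function changes, each touching a quarter of the device.
for mode in ("fixed", "full_reconfig", "partial", "dynamic_partial"):
    print(f"{mode:16s} {running_time_ms(1000, 100, 4, mode):6.1f} ms")
```

The fixed mode is listed only as a reference, since it cannot change function at all. In dynamic PR the region being rewritten is still unavailable during its own reconfiguration, which this model ignores.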
DPR Tools Flow
(Using MicroBlaze and ICAP)
• Design entry (RModules): Xilinx ISE 9.2i
• Design of base system: Xilinx EDK 9.2i
• Design entry (top-level): Xilinx ISE 9.2i
• Floor planning: Xilinx PlanAhead 10.1 with PR9
• Initial budgeting
• Active module implementation: Xilinx Explore Ahead 10.1 with PR9
• Final assembly
• Verify design
• Create bitstream
• Merging .bit file with .elf file: Xilinx EDK Shell 9.2i
System Architecture

[Figure: system block diagram: MicroBlaze with PLB and LMB buses, bus macro, ICAP, SystemACE, UART, PRR, terminal and sampler]
Working Design Flow

Partial Reconfigurable Arithmetic Coprocessor (PRAC)

[Figure: PRAC datapath with control circuits, multiplexers (Mux) and mux control, left shifters (LS), a right shifter (RS), an adder (ADD), an adder/subtractor (ADD/SUB) and the output]
LS – Left Shifter
RS – Right Shifter
Device Utilization data – Comparison between reconfigurable unit and individual units - 32 bit

Logic Utilization                 Reconfigurable   Combined utilization   % saving in      Maximum size available
                                  Unit             of Individual Units    hardware space   in the device
Number of Slice Registers         440              1048                   58.0             69120
Number of Slice LUTs              442              499                    11.4             69120
Number of fully used Bit Slices   325              442                    26.4             357
Number of bonded IOBs             598              357                    ----             640
Number of BUFG/BUFGCTRLs          1                1                      --               32
Device Utilization data – Comparison between reconfigurable unit and individual units - 3 bit

Logic Utilization                 Reconfigurable   Combined utilization   % saving in      Maximum size available
                                  Unit             of Individual Units    hardware space   in the device
Number of Slice Registers         102              140                    27.1             69120
Number of Slice LUTs              101              78                     -29.5            69120
Number of fully used Bit Slices   46               64                     28.2             357
Number of bonded IOBs             115              57                     ----             640
Number of BUFG/BUFGCTRLs          1                1                      --               32
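The "% saving in hardware space" column in the two utilization tables appears to follow (individual − reconfigurable) / individual × 100. This is inferred from the numbers, not stated in the slides. The sketch below recomputes the column under that assumption; the recomputed values agree with the tables to within about 0.1 percentage point.

```python
def percent_saving(reconfigurable, individual):
    """Relative saving of the reconfigurable unit over the individual units."""
    return 100.0 * (individual - reconfigurable) / individual


# (reconfigurable unit, combined individual units, % saving as tabulated)
rows = {
    "32 bit, Slice Registers": (440, 1048, 58.0),
    "32 bit, Slice LUTs": (442, 499, 11.4),
    "32 bit, fully used Bit Slices": (325, 442, 26.4),
    "3 bit, Slice Registers": (102, 140, 27.1),
    "3 bit, Slice LUTs": (101, 78, -29.5),
    "3 bit, fully used Bit Slices": (46, 64, 28.2),
}

for name, (reconf, indiv, tabulated) in rows.items():
    computed = percent_saving(reconf, indiv)
    print(f"{name:32s} computed {computed:6.2f}  tabulated {tabulated:6.1f}")
```

A negative value, as for the 3-bit Slice LUTs, means the reconfigurable unit uses more of that resource than the individual units combined.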
Conclusion
+ Energy savings average 35% to 70%, and speedup averages 3 to 7 times.
+ Reduction in size and component count
+ Time to market
+ Flexibility and upgradability

Thank you