
INTRODUCTION TO FPGA BASED EMBEDDED SYSTEMS

What is an embedded system?

Any system implemented using a general-purpose CPU that is not itself a general-purpose computer: the CPU is programmed to perform a specific task

Examples of embedded systems


Home automation
Microwave ovens, washing machines, automatic caller ID machines

Office automation
Printers and peripherals, fax modems, Xerox machines, scanners

Instrumentation
Panel meters, dataloggers

Industrial control
Automatic circuit breakers, CNC machines, energy monitoring & control

Medical electronics
Remote patient monitoring units: ECG, EEG, CAT & MRI scanning

Automobile industry
Fuel injection control, security & safety devices, window & wiper control

Why embedded systems?


Embedded systems:
Are small & compact
Have a tight architecture & limited instructions
Are limited to specific applications
Can operate with low-powered batteries

Microcontrollers
Most commonly used in embedded systems
Microprocessors with all peripherals integrated into a single chip
E.g.: Atmel, Intel, Philips, Microchip's PIC16/17Cxxx family, PowerPC, ARM, StrongARM

Limitations of microcontrollers
MIPS rating is inadequate for more advanced applications
Cellphones require an equaliser, a source codec and a channel codec (Viterbi decoder)
These are too complex to be done with an MCU
Advanced processing functions require either a DSP or an FPGA/ASIC

Block diagram of Cellphone Transmitter

Advantages of MCU
Very good application base for user interfaces such as:
  Keyboard interface
  Display interface
  A/D & D/A interface
  Interface to motors
TCP/IP protocol stack with advanced MCUs
Programming using C, C++

What is an OS?
An OS is the core software that enables the computer system to function
It is the interface between the application and the hardware
Examples: MS-DOS, Windows, UNIX, Linux

OS Size
They cater to a variety of applications and hence require memory
Win 95: 180 MB HDD & 16 MB RAM
Win 98: 195 MB HDD & 32 MB RAM
Win XP-1: 1 GB HDD & 256 MB RAM
Linux: depends on version

Real time OS (RTOS)
OS with features to support real-time systems
Correctness depends not only on the logical result but also on the result delivery time
Must respond in a timely & predictable way to unpredictable external stimuli (interrupts)
Hard real time: missing a deadline leads to catastrophic results. E.g. anti-aircraft missile
Firm real time: missing a deadline leads to an unacceptable reduction in quality. E.g. international telephone calls through satellite links
Soft real time: reduction in system quality due to delayed response is tolerable. E.g. airline reservation system

Features of embedded OS
1. Small footprint: uses only a few kilobytes of RAM & ROM
2. Unmanned intelligence:
   Should run for years without human intervention
   H/w & s/w should never fail
   Should have no mechanical parts such as floppy drives or hard disks: they occupy more space, use more energy, are slow to communicate & are complex to drive
3. Monitoring & control of the application:
   Hardware watchdog: a timer forces the CPU to reset if the CPU fails to reset the timer because of a malfunction
   Software watchdog: if the CPU is all right but some other block is faulty, an interrupt is invoked to rectify it
4. Low power: battery life
5. Rebooting: instantly & in a safe state
6. Low cost
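The hardware watchdog described above can be illustrated with a small simulation. This is only a sketch: the class `WatchdogTimer`, the helper `run`, and the 3-tick timeout are hypothetical names and values chosen for illustration, not part of any particular RTOS or MCU.

```python
class WatchdogTimer:
    """Toy model of a hardware watchdog: counts down once per tick."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks
        self.resets = 0  # number of forced CPU resets so far

    def kick(self):
        """Called by healthy firmware to reload the counter."""
        self.counter = self.timeout_ticks

    def tick(self):
        """Advance one time unit; force a reset when the counter expires."""
        self.counter -= 1
        if self.counter <= 0:
            self.resets += 1
            self.counter = self.timeout_ticks  # system restarts in a safe state
            return True
        return False


def run(firmware_alive, timeout_ticks=3):
    """firmware_alive[i] is True when the firmware kicks the watchdog at tick i."""
    wdt = WatchdogTimer(timeout_ticks)
    for alive in firmware_alive:
        if alive:
            wdt.kick()
        wdt.tick()
    return wdt.resets


healthy = run([True] * 10)  # firmware kicks every tick: no reset
hung = run([True, True, False, False, False, False, True, True])  # firmware hangs for 4 ticks
```

As long as the firmware keeps kicking the timer, no reset occurs; once it stops for longer than the timeout, the watchdog forces a restart.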

Typical EOS/RTOS
VxWorks: embedded TCP/IP
QNX: automotive & control
RTX51: instrumentation & control
WinNTE: networked applications
Win CE: PDAs, mobile communication

Advanced Embedded systems


Require both the MCU function & the DSP function
Systems on a chip integrate both
OMAP from TI has both an ARM processor & a TMS320C54x/55x DSP on a single chip
Virtex-II Pro FPGA from Xilinx has both a PowerPC & FPGA logic on a single chip

FPGAs with 10M gates a reality


Enables integration of MCU & DSP functions in a single core
Enables implementation of MCUs as soft cores
A number of MCUs can be programmed into a single FPGA

VLSI implementation of functions: PLA, PAL & GAL
SOP: Y = ACD + ABC + BCD
The AND array realizes the products; the OR array realizes the sums of products
Programmable logic array (PLA): two-level AND-OR device that can be programmed to realize any SOP logic expression. Both the AND and the OR arrays are programmable
Programmable array logic (PAL): here the OR array is fixed
Generic array logic (GAL): reprogrammable; either Y or its complement Y' can be obtained; tristate output possible
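The two-level AND-OR structure can be sketched in a few lines of Python. The tables `AND_PLANE` and `OR_PLANE` and the function `pla_eval` are hypothetical names for illustration; the product terms are taken literally from the SOP expression above, with no complemented literals assumed.

```python
from itertools import product

# AND plane: each product term maps an input name to the value it requires
# (1 = true literal, 0 = complemented literal); unlisted inputs are don't-cares.
AND_PLANE = [
    {"A": 1, "C": 1, "D": 1},  # ACD
    {"A": 1, "B": 1, "C": 1},  # ABC
    {"B": 1, "C": 1, "D": 1},  # BCD
]
# OR plane: which product terms feed each output.
OR_PLANE = {"Y": [0, 1, 2]}


def pla_eval(inputs):
    """Evaluate every output for one input assignment, e.g. {"A": 1, "B": 0, ...}."""
    products = [all(inputs[name] == value for name, value in term.items())
                for term in AND_PLANE]
    return {out: int(any(products[i] for i in terms))
            for out, terms in OR_PLANE.items()}


def reference(a, b, c, d):
    """Direct evaluation of Y = ACD + ABC + BCD for comparison."""
    return int((a and c and d) or (a and b and c) or (b and c and d))


# Exhaustive check over all 16 input combinations.
mismatches = [bits for bits in product((0, 1), repeat=4)
              if pla_eval(dict(zip("ABCD", bits)))["Y"] != reference(*bits)]
```

In this model, "programming" a PLA means editing both tables, while a PAL would keep `OR_PLANE` fixed and allow changes to `AND_PLANE` only.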

ALTERA MAX 22V10 PLD

FPGAs: Field Programmable Gate Arrays
Gate arrays whose functionality is user programmable
Multilevel logic functions (PLDs offer only 2-level logic)
FPGAs are of 2 types: MUX based and look-up table (LUT) based
FPGAs have a 2D array of user-programmable logic blocks, variously called:
  Configurable logic blocks (CLBs)
  Slices
  Logic array blocks
  Logic modules (LMs)
They implement smaller functions. A number of such blocks are interconnected to realise bigger functions. E.g. implement a 5-input AND gate using 3-input AND gates
Interconnects can be realised using antifuses or SRAM

MUXes may be used for realising boolean functions
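A minimal sketch of how a multiplexer realises a Boolean function: the inputs drive the select lines and the data inputs are tied to the truth-table values. The function names (`mux4`, `mux2`, `xor_via_mux4`, and so on) are hypothetical and chosen only for this illustration.

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """4:1 multiplexer: the select lines (s1, s0) pick one data input."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def mux2(d0, d1, s):
    """2:1 multiplexer."""
    return d1 if s else d0


def xor_via_mux4(a, b):
    # Data inputs hold the XOR truth table; A and B are the select lines.
    return mux4(0, 1, 1, 0, a, b)


def xor_via_mux2(a, b):
    # Shannon expansion about A: Y = A'.B + A.B'
    return mux2(b, 1 - b, a)


def majority_via_mux4(a, b, c):
    # A 3-input function on a 4:1 MUX: the data inputs are 0, C, C, 1.
    return mux4(0, c, c, 1, a, b)
```

The last function shows why MUX-based logic modules are economical: a 4:1 MUX covers any 3-input function when the data inputs may be 0, 1, the third variable, or its complement.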

FPGA TECHNOLOGIES

ROMs may be used for implementing Boolean functions (BFs)

A ROM with 4 locations implements the EXOR function
A, B are the address inputs; Y is the stored data

A  B  |  Y
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  0

Xilinx & Altera FPGAs use LUTs/ROMs with 4 address inputs & 16 locations
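A 4-input LUT can be modelled as a 16-entry table addressed by the inputs. The sketch below uses hypothetical helper names (`make_lut4`, `lut4_read`, `and5`); it also shows how a function with more than four inputs is built by chaining LUTs.

```python
def make_lut4(func):
    """Return the 16 configuration bits for a 4-input Boolean function."""
    return [int(bool(func((addr >> 3) & 1, (addr >> 2) & 1,
                          (addr >> 1) & 1, addr & 1)))
            for addr in range(16)]


def lut4_read(table, a, b, c, d):
    """Look up the output: the four inputs form the address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Two example configurations: 4-input parity and 4-input AND.
parity_lut = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4_lut = make_lut4(lambda a, b, c, d: a and b and c and d)


def and5(a, b, c, d, e):
    """A 5-input AND needs two chained 4-input LUTs (unused inputs tied to 1)."""
    first = lut4_read(and4_lut, a, b, c, d)
    return lut4_read(and4_lut, first, e, 1, 1)
```

Changing the function only changes the 16 stored bits, not the hardware, which is what makes the logic block reprogrammable.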

Altera FLEX FPGAs also have LUTs, fast carry logic & flip-flops

Logic cells and interconnects use SRAM cells

Logic cell array

FPGA WITH SRAM TECHNOLOGY

Details of interconnect in Xilinx LCA

Advanced compared to XC4000

Similar to 1 CLB of XC4000

Advantages of FPGAs

Small development time
Availability of IP cores
Lower cost for smaller numbers
Field upgradability of functions
Wider application potential

Evolution of FPGAs
Replacement for TTL glue logic
Fast prototyping device
Complete system on a chip (SoC) with:
  Configurable logic blocks (FFs & LUTs)
  Embedded RAM
  RISC processor
  Dedicated multiplier blocks
  Multi-gigabit transceivers

Intellectual property (IP) cores


PCI interface
Voice/video codec
FFT/DCT computation
Convolutional encoder/Viterbi decoder
Filters & equalisers
Data encryption/decryption
Digital modulation/demodulation blocks

Advantages of advanced FPGAs

RISC processor, LUTs & RAM on a single FPGA enable optimum apportioning of tasks between software & hardware
Internet reconfigurability enables the partitioning to be changed by remote control
Altera and Xilinx compete & together hold > 50% share
Both design FPGAs with increased complexity & reduced cost per logic element

Programmable logic device market leaders


In Q4 of 2002, Altera introduced the first two Cyclone FPGAs, the EP1C20 and EP1C6
Densities range from 2,910 to 20,060 LEs, with embedded memory up to 288 Kbits
Priced at < $1.50 per 1,000 LEs for volume applications (200 K & above)
APEX 200K-gate EP200KE costs < $80 in prototyping quantities & less than $40 in large volumes

Xilinx matching show


300K-gate XC2S300 costs $20 in large volumes
Board with XC2S300 & WebPack s/w costs $275
Virtex-II Pro at $45 in '04 for large volumes
Virtex-II Pro includes MGTs (multi-gigabit transceivers) & up to 4 integrated IBM PowerPC CPU cores
APEX, Cyclone, Spartan-II & Virtex-II Pro use a 1.5-V, 0.13-µm all-layer-copper SRAM process
The copper routing provides low-power & low-latency clock distribution

Common Features of the leaders


Altera's 1G Excalibur XA hybrid chips include an ARM architecture processor as a hard core
Altera also has soft-core 16- & 32-bit RISC IP called Nios, which resides in FPGA logic instead of in ASIC gates
Virtex-II Pro: up to 4 integrated IBM PowerPC CPU cores

MicroBlaze: Xilinx's soft 32-bit CPU IP core
Both Xilinx & Altera supply free design software:
  Altera: Quartus II design software
  Xilinx: WebPack ISE 5.1i for design sizes up to 300K gates

FPGA based H/w implementation schemes


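Direct: minimum sampling period T_min = T_m + (N-1) T_a
T_m = computation time of a multiplier, T_a = computation time of an adder
The chain of N-1 adders restricts the maximum sampling rate

[Figure: direct-form FIR filter. The input x(n) passes through a chain of delays (D) giving x(n-1), x(n-2), ...; each tap is multiplied (X) by a coefficient h0, h1, h2, ..., hN-1, and the products are summed (+) to give y(n).]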

FPGA based H/w implementation schemes


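Transpose: T_min = T_m + T_a
Needs the input to drive N multipliers
Outputs must be fed to N-1 registers in sync

[Figure: transposed-form FIR filter. The input x(n) drives all N multipliers (X) with coefficients hN-1, hN-2, ..., h2, h1, h0; each product is added (+) to the previous partial sum held in a delay register (D), and the last adder gives y(n).]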
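The two structures compute the same output and differ only in where the registers sit, which is what shortens the critical path. The sketch below models both in software; the function names and the timing values used in the note afterwards are assumptions for illustration only.

```python
def fir_direct(x, h):
    """Direct form: a delay line of past inputs, N multiplies, then an adder chain."""
    delay = [0] * len(h)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]  # shift the new sample into the delay line
        y.append(sum(c * d for c, d in zip(h, delay)))
    return y


def fir_transposed(x, h):
    """Transposed form: the input drives all N multipliers; N-1 registers hold partial sums."""
    n = len(h)
    regs = [0] * (n - 1)
    y = []
    for sample in x:
        products = [c * sample for c in h]
        y.append(products[0] + (regs[0] if regs else 0))
        # Each register captures its product plus the old value of the next register.
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < n - 1 else 0)
                for k in range(n - 1)]
    return y


def t_min_direct(n_taps, t_m, t_a):
    """Critical path: one multiplier followed by N-1 adders."""
    return t_m + (n_taps - 1) * t_a


def t_min_transposed(t_m, t_a):
    """Critical path: one multiplier followed by one adder, independent of N."""
    return t_m + t_a
```

With hypothetical delays of T_m = 10 ns and T_a = 1 ns, a 50-tap direct form would need 59 ns per sample while the transposed form would need 11 ns.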

(32 read operations /sample)

Comparison of single MAC DSP with FPGA


A 50-tap FIR filter requires 500 ns per output in a single-MAC DSP with a 100 MHz clock (10 ns period)
Sampling period > 500 ns
Sampling rate < 2 MSPS
Input frequency < 1 MHz
With 50 multipliers, a 100 MHz FPGA can do the same in about 11 ns (T_m + T_a)
Sampling rate < 90 MSPS
Input frequency < 45 MHz
Roughly a 45-fold increase in speed
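The figures above follow from simple arithmetic, reproduced here as a check. The variable names are illustrative; the only inputs are the tap count, the clock rate and the 11 ns FPGA period quoted above, and the input-frequency limit assumes Nyquist sampling (half the sampling rate).

```python
TAPS = 50
CLOCK_HZ = 100e6
clock_period_ns = 1e9 / CLOCK_HZ          # 10 ns per cycle

# Single-MAC DSP: one multiply-accumulate per clock, so 50 cycles per output.
dsp_period_ns = TAPS * clock_period_ns    # 500 ns
dsp_rate_msps = 1e3 / dsp_period_ns       # 2 MSPS
dsp_max_input_mhz = dsp_rate_msps / 2     # 1 MHz (Nyquist limit)

# FPGA with 50 parallel multipliers: one output every T_m + T_a.
fpga_period_ns = 11.0
fpga_rate_msps = 1e3 / fpga_period_ns     # about 90.9 MSPS
fpga_max_input_mhz = fpga_rate_msps / 2   # about 45 MHz

speedup = dsp_period_ns / fpga_period_ns  # about 45
```

The speed-up is the ratio of the two sampling periods, 500 ns / 11 ns, which comes to about 45.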

DSP Vs Microcontroller
Microcontroller                     | Digital signal processor
Multicycle instruction set          | Single-cycle instruction set
Multicycle multiply                 | Single-cycle multiply
8- or 16-bit support                | 16/32-bit fixed or floating point
Limited on-chip RAM                 | Large on-chip data RAM
Limited data pointers               | Data pointers
Limited BW and limited algorithms   | Speed!

Comparison of DSPs & FPGAs

FPGAs have a no. of choices for filter implementation


Direct (Form 1)
Transpose (Form 2)
Pipelined filter
Parallel filter
Polyphase filter
Pipelined & paralleled filter
Trade off speed vs power or speed vs area, or minimise area & power

Approaches for DSP system design


Full Custom Design
Application Specific Integrated Circuits (ASIC).

Semi Custom Design
1. Gate arrays
2. Standard cell based design


Field Programmable Gate arrays (FPGA) & CPLDs

Programmable DSP (PDSPs)

Advantages of VLSI
Better reliability
Lower board space
Lower power consumption
Larger complexity

Challenges of VLSI era


Complexity in testing
Need for CAD-based tools for design & testing
Smaller time to market
Smaller product life cycle
Complexity increases following Moore's law: the number of devices doubles every 18 months

Need for hardware description languages


Limitations of high-level languages:
  Useful only for sequential hardware
  Word size cannot be exactly specified
  Delays cannot be specified: machine dependent
Popular HDLs: Verilog, VHDL & SystemC

Role of softcore processors


Microcontrollers are well suited for:
Control loops (if-then-else)
Interfacing input & output devices (keyboards and displays)
Sending & receiving data from remote units using serial & parallel transmission modes
Sending & receiving packets


In software defined radio, the frequency, phase and amplitude of the DDFS need to be controlled through s/w
Cut-off frequencies of the analog BPF and LPF need to be tuned through s/w
These data may be received remotely, which requires a communication interface

A rake receiver implemented in digital hardware/software requires a large no. of parameters to be altered through training

The RAKE receiver, used in CDMA cellular systems, can combine multipath components to improve the signal-to-noise ratio (SNR) at the receiver
It provides a separate correlation receiver for each of the multipath signals
Multipath components are practically uncorrelated when their relative propagation delay exceeds one chip period
The basic idea of a RAKE receiver was first proposed by Price and Green and patented in 1956

M-finger RAKE receiver
The RAKE receiver utilizes multiple correlators to separately detect the M strongest multipath components
Each correlator detects a time-shifted version of the original transmission, and each finger correlates to a portion of the signal that is delayed by at least one chip in time from the other fingers
The outputs of the correlators are weighted to provide a better estimate of the transmitted signal than is provided by a single component
Outputs of the M correlators are denoted as Z1, Z2, ..., and ZM
Outputs are weighted by α1, α2, ..., and αM, respectively
Demodulation and bit decisions are then based on the weighted outputs of the M correlators
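The weighting step can be sketched as a weighted sum of the M correlator outputs. The slides do not say how the weights are chosen; the rule used below (each weight proportional to the squared correlator output, normalised so that the weights sum to 1) is an assumption, one common choice among several. The function names and sample values are hypothetical.

```python
def rake_combine(z, weights=None):
    """Weighted sum of M correlator outputs; returns (decision_variable, weights)."""
    if weights is None:
        # Assumed rule: weight each finger by its share of the total output power.
        total = sum(v * v for v in z)
        if total == 0:
            weights = [1.0 / len(z)] * len(z)  # no signal: fall back to equal weights
        else:
            weights = [v * v / total for v in z]
    combined = sum(a * v for a, v in zip(weights, z))
    return combined, weights


def bit_decision(combined):
    """Hard decision on the combined output (antipodal signalling assumed)."""
    return 1 if combined >= 0 else 0


# Three fingers: a strong path, a weaker path, and a noisy one.
combined, alphas = rake_combine([0.9, 0.4, -0.1])
bit = bit_decision(combined)
```

Because the strongest finger dominates the sum, a weak or corrupted finger contributes little to the bit decision. These weights are the kind of parameters that a softcore processor would update during training.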

FPGA vendors provide tools and s/w for implementation of multiple softcore processors in an FPGA, e.g. MUTEX
Multiple processors can be used to derive the benefits of multicore programming
Implementation of the TCP/IP stack is simplified by Linux for softcore processors, e.g. µClinux
Optimum partitioning of tasks between h/w & s/w:
  Speech codec
  Video codec
  Encryption
  Channel decoder: Viterbi decoder
  Transform blocks: FFT, DCT, DWT

Reconfiguration through E-Mail


Earlier FPGA families require reconfiguration of the whole FPGA
With Virtex FPGA families & above, partial configuration or partial reconfiguration is possible
A remote FPGA can be reconfigured via the Internet using Xilinx Internet Reconfigurable Logic (IRL) technology
FPGAs in the field can be securely reconfigured simply by sending an e-mail message
Xilinx IRL reconfiguration technology uses Internet e-mail protocols:
  TCP/IP: Transmission Control Protocol over Internet Protocol transports the e-mail over the Internet to its destination
  SMTP: Simple Mail Transfer Protocol is used to deliver the messages
  POP3: Post Office Protocol 3 retrieves the messages

Each layer of the protocol stack is an abstraction level hiding details from other layers on top or below. For example, the network access layer does not need to know what kind of data it is carrying.

Implementation
Implementing the Internet stack in an FPGA as hardware will:
  Take a lot of time (including VHDL or Verilog design and simulation)
  Require a robust FPGA
  Cost a lot of money
Microcontrollers are good at protocol handling and can mitigate the time and cost of building an IRL solution
Two microcontroller solutions are possible:
  Use external microcontrollers
  Put a microcontroller inside a Virtex Platform FPGA
There are two ways to embed a microcontroller in a Virtex device:
  Use the soft-core MicroBlaze microcontroller in Virtex, Virtex-E, Virtex-II, or the new Virtex-II PRO Platform FPGAs
  Buy a Virtex-II PRO Platform FPGA with a hard-wired IBM PowerPC 405 microcontroller already on board

External Microcontrollers
Several microcontroller manufacturers have Internet-capable components, e.g. the Ubicom SX52BD and IP2022 microcontrollers
Some modifications & additions are required to control downloading to an FPGA
Due to the small amount of internal memory of the microcontrollers, the Internet protocol stack is tuned to perform only the necessary functions
Once the link has been set up with the e-mail server, the microcontroller asks if there is e-mail available
When there is, the microcontroller checks the e-mail header
If the header is not of a particular type, the controller deletes the mail message on the server
When the e-mail has the correct header type, the microcontroller downloads it
The /PROGRAM pin is toggled, the contents are serialized onto an output pin, and a bit clock is generated
When the DONE pin goes High, the microcontroller deletes the e-mail on the server and breaks the connection
If the DONE pin does not go High after a period of time, the download operation is repeated until the DONE pin goes High
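The control flow above can be sketched as a short simulation. Everything here is a stand-in: `FakeMailServer`, `FakeFpga`, the header tag `FPGA-CONFIG` and the `max_attempts` bound are hypothetical and are not part of the Xilinx IRL tools or of any microcontroller firmware. The slides describe retrying until DONE goes High; the retry limit is added here only so the sketch always terminates.

```python
EXPECTED_HEADER = "FPGA-CONFIG"  # hypothetical header type marking a bitstream e-mail


class FakeMailServer:
    """In-memory stand-in for the POP3 mailbox: a queue of (header, payload)."""

    def __init__(self, messages):
        self.messages = list(messages)

    def has_mail(self):
        return bool(self.messages)

    def peek_header(self):
        return self.messages[0][0]

    def download(self):
        return self.messages[0][1]

    def delete(self):
        self.messages.pop(0)


class FakeFpga:
    """Stand-in for the FPGA configuration port (/PROGRAM in, DONE out)."""

    def __init__(self, failures_before_success=0):
        self.failures_left = failures_before_success
        self.done = False
        self.bitstream = None
        self.attempts = 0

    def program(self, bitstream):
        self.attempts += 1
        self.done = False  # toggling /PROGRAM clears the device
        if self.failures_left > 0:
            self.failures_left -= 1  # simulate a failed download
            return
        self.bitstream = bitstream
        self.done = True  # DONE goes High


def check_mail_and_configure(server, fpga, max_attempts=5):
    """Return True once the FPGA reports DONE for a downloaded bitstream."""
    while server.has_mail():
        if server.peek_header() != EXPECTED_HEADER:
            server.delete()  # wrong header type: delete the message on the server
            continue
        bitstream = server.download()
        for _ in range(max_attempts):
            fpga.program(bitstream)
            if fpga.done:
                server.delete()  # success: remove the e-mail, then disconnect
                return True
        return False  # gave up; the message stays on the server
    return False  # no configuration mail available
```

A usage example would create a `FakeMailServer` holding one unrelated message and one `FPGA-CONFIG` message, then call `check_mail_and_configure` with a `FakeFpga` that fails a couple of times before DONE goes High.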

Here's what happens when the system is powered up:
The FPGA is empty
The microcontroller waits for a certain time until all components of the IRL design have reached a stable state
Then the controller connects to the network by sending AT commands to an external modem through an RS-232 device, or by sending AT commands to an onboard chip modem, or by another physical implementation
Fail-safe setup: make the download more reliable by storing the downloaded bitstream in semipermanent memory (flash)
The FPGA can then be reconfigured from the flash memory

An even more secure solution is to work with two memories:
A basic configuration is loaded into the FPGA when it is shipped from the manufacturer
During operation in the field, the microcontroller connects to the Internet and downloads a new configuration into the second memory
The new configuration bitstream is downloaded into the FPGA at the next boot
When the download works, the new configuration is used
If the new programming bitstream fails, the microcontroller boots again from the original memory
Shortcomings of the external microcontroller approach:
An Internet connection is obligatory
The server must always have mail ready for the application, or else the application cannot start
When configuration fails, no fail-safe recovery mechanism exists
The design can go into an endless loop trying to download its configuration
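The two-memory scheme amounts to "try the updated image, fall back to the factory image". A minimal sketch follows, in which a one-byte checksum stands in for the FPGA's DONE signal; the checksum format and the names `make_image`, `image_ok` and `boot` are assumptions made purely for illustration.

```python
def make_image(payload):
    """Append a one-byte checksum to a configuration payload (toy format)."""
    return payload + bytes([sum(payload) % 256])


def image_ok(image):
    """Stand-in for 'DONE went High': accept the image only if the checksum matches."""
    return len(image) > 1 and sum(image[:-1]) % 256 == image[-1]


def boot(fpga_accepts, golden, update=None):
    """Try the updated image first and fall back to the factory (golden) image."""
    if update is not None and fpga_accepts(update):
        return "update", update
    if fpga_accepts(golden):
        return "golden", golden
    raise RuntimeError("factory configuration failed to load")


golden = make_image(b"factory")                              # first memory, written at manufacture
good = make_image(b"new")                                    # second memory, downloaded in the field
corrupt = good[:-1] + bytes([(good[-1] + 1) % 256])          # damaged download

source_good, _ = boot(image_ok, golden, good)
source_bad, _ = boot(image_ok, golden, corrupt)
source_none, _ = boot(image_ok, golden)
```

Keeping the factory image in a memory that field updates never overwrite is what guarantees that the fallback path always exists.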

Internal Microcontrollers in FPGAs
MicroBlaze soft processor & PowerPC 405 processor in Virtex-II PRO devices
In both cases, soft and hard microcontrollers, the Internet protocol stack must be ported onto the FPGA
The FPGA must get a basic (possibly partial) bitstream to download the MicroBlaze controller, its memory, and peripherals
The PowerPC Virtex-II PRO Platform FPGA must have memory and peripherals downloaded before it is able to boot
The small control algorithm done within the external microcontroller in the previous description must now be implemented in a small CoolRunner CPLD, as shown in Figure 4
