Unit III: CPLD & FPGA Architecture & Applications

Dr.Y.Narasimmha Murthy Ph.D yayavaram@yahoo.com
UNIT III : CASE STUDIES [CPLD & FPGA ARCHITECTURE & APPLICATIONS]
INTRODUCTION: Field Programmable Gate Arrays (FPGAs) consist of an array of programmable logic blocks, including general logic, memory and multiplier blocks, surrounded by a programmable routing fabric that allows the blocks to be interconnected. The array is surrounded by programmable input/output blocks, labeled I/O in the figure, that connect the chip to the outside world. Here the term programmable indicates the ability to program a function into the chip after silicon fabrication is complete. This is made possible by the programming technology, a method of changing the behavior of the pre-fabricated chip in the field, where system users create their designs. The first programmable logic devices used very small fuses as the programming technology. Every FPGA depends on a programming technology to control the programmable switches that give FPGAs their programmability.

Programming Technologies
A number of programming technologies have been used for reconfigurable architectures. Each of these technologies has different characteristics, which have a significant effect on the programmable architecture. Some of the well-known technologies are:
(i). SRAM-based programming technology
(ii). Flash programming technology (EEPROM), and
(iii). Anti-fuse based programming technology

SRAM-Based Programming Technology
Static memory cells are the basic cells used in SRAM-based FPGAs. Most commercial vendors, such as Xilinx, Lattice and Altera, use static memory (SRAM) based programming technology in their devices. These devices use static memory cells which are distributed throughout the FPGA to provide configurability. An example of such a memory cell is shown below. In an SRAM-based FPGA, SRAM cells are mainly used for the following purposes:
(i). To program the routing interconnect of FPGAs, which is generally steered by small multiplexers.
(ii). To program the Configurable Logic Blocks (CLBs) that are used to implement logic functions.
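The first use listed above, SRAM cells steering a routing multiplexer, can be sketched in a few lines of Python. This is only an illustrative model: the function name `routing_mux` and the bit ordering (least significant select bit first) are assumptions made for the example and do not come from any vendor documentation.

```python
def routing_mux(inputs, config_bits):
    """Pick one routing input using stored SRAM bits as the select lines.

    config_bits is a list of 0/1 values, least significant bit first
    (an ordering assumed here purely for illustration).
    """
    if len(inputs) != 2 ** len(config_bits):
        raise ValueError("a mux with k select bits needs 2**k inputs")
    # Turn the stored bits into the index of the selected input.
    index = sum(bit << position for position, bit in enumerate(config_bits))
    return inputs[index]


# Four candidate wires; two SRAM cells decide which one drives the output.
wires = [0, 1, 1, 0]
selected = routing_mux(wires, config_bits=[1, 0])  # selects wires[1]
```

Changing the stored bits changes which wire is passed on, without any change to the hardware itself, which is the sense in which the routing is "programmed".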
There are two primary uses for the SRAM cells. Most are used to set the select lines of the multiplexers that steer interconnect signals. The majority of the remaining SRAM cells store the data in the lookup tables (LUTs) that are typically used in SRAM-based FPGAs to implement logic functions. Historically, SRAM cells were also used to control the tri-state buffers and simple pass transistors that served as programmable interconnect. SRAM-based programming technology has become the dominant approach for FPGAs because of its re-programmability and its use of standard CMOS process technology, which gives these devices the increased integration, higher speed and lower dynamic power consumption of new processes with smaller geometries. There are, however, a number of drawbacks associated with SRAM-based programming technology. For example, an SRAM cell requires six transistors, which makes it costly in terms of area compared to other programming technologies.
Further, SRAM cells are volatile, so external devices are required to store the configuration data permanently. These external devices add to the cost and area overhead of SRAM-based FPGAs. There is also a problem of data security. Since the configuration information must be loaded into the device at power-up, there is the possibility that the configuration information could be intercepted and stolen for use in a competing system. To overcome this problem, encryption techniques are used. Finally, the electrical properties of pass transistors are not ideal. SRAM-based FPGAs typically rely on pass transistors to implement multiplexers. However, pass transistors are far from ideal switches, as they have significant on-resistance and present an appreciable capacitive load. As FPGAs migrate to smaller device geometries these issues may be exacerbated.

Flash Programming Technology
An important alternative to SRAM-based programming technology is flash or EEPROM based programming technology. This technology injects charge onto a gate that floats above the transistor. This approach is used in flash or EEPROM memory cells. These cells are non-volatile; they do not lose information when the device is powered down. With modern IC fabrication processes, it has become possible to use the floating gate cells directly as switches. Flash memory cells, in particular, are now used because of their improved area efficiency. The widespread use of flash memory cells in non-volatile memory chips ensures that flash manufacturing processes will benefit from steady decreases in process geometries.
Flash-based programming technology offers several advantages. It is non-volatile, and it is more area efficient than SRAM-based programming technology. It also has disadvantages. Unlike SRAM-based devices, flash-based devices cannot be reconfigured an unlimited number of times, and flash-based technology uses a non-standard CMOS process.
The most important advantage is non-volatility. This feature eliminates the need for the external resources required to store and load configuration data when SRAM-based programming technology is used. Additionally, a flash-based device can function immediately upon power-up instead of having to wait for the configuration data to load. The flash approach is also more area efficient than SRAM-based technology, which requires up to six transistors to implement the programmable storage. The programming circuitry, such as the high and low voltage buffers needed to program the cell, contributes an area overhead not present in SRAM-based devices. However, this cost is relatively modest, as it is amortized across numerous programmable elements. In comparison to
anti-fuses, an alternative non-volatile programming technology, flash-based FPGAs are reconfigurable and can be programmed without being removed from the printed circuit board. The use of a floating gate to control the switching transistor adds design complexity, because care must be taken to ensure that the source-drain voltage remains sufficiently low to prevent charge injection into the floating gate. Since newer processes require lower voltage levels, this issue may become less of a concern in the future.
One disadvantage of flash-based devices is that they cannot be reprogrammed an unlimited number of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed. Devices such as the Actel ProASIC3 are useful for only 500 programming cycles. For most uses of FPGAs, this programming count is more than sufficient; in many cases FPGAs are programmed for only one use. Another significant disadvantage of flash devices is the need for a non-standard CMOS process. Also, like static memory-based technology, this programming technology suffers from relatively high resistance and capacitance due to the use of transistor-based switches.
One trend that has recently emerged is the use of flash storage in combination with SRAM programming technology. In devices from Altera, Xilinx and Lattice, on-chip flash memory provides non-volatile storage while SRAM cells still control the programmable elements in the design. This addresses the problems associated with the volatility of pure-SRAM approaches, such as the cost of additional storage devices or the possibility of configuration data interception, while maintaining the unlimited re-configurability of SRAM-based devices. It is important to recognize that, since the programming technology is still based on SRAM cells, these devices are no different from pure-SRAM based devices from an FPGA architecture standpoint.
However, the incorporation of flash memory generally means that the processing technology will not be as advanced as that of pure-SRAM devices. Additionally, the devices incur more area overhead than pure-SRAM devices, since both flash and SRAM bits are required for every programmable element.

Anti-fuse Programming Technology
An alternative to SRAM and floating gate-based technologies is anti-fuse programming technology. This technology is based on structures which exhibit very high resistance under normal circumstances but can be programmably "blown" (in reality, connected) to create a low resistance link.
An anti-fuse is a two-terminal device which, in its unprogrammed state, presents a very high resistance between its terminals. When a high voltage (from 11 to 20 volts, depending on the type of anti-fuse) is applied across its terminals, the anti-fuse blows and creates a low resistance link. This link is permanent. Anti-fuses in use today are built either using an Oxide-Nitride-Oxide (ONO) dielectric between N+ diffusion and polysilicon, or using amorphous silicon between metal layers or between polysilicon and the first layer of metal. Programming an anti-fuse requires extra circuitry to deliver the high programming voltage and a relatively high current of 5 mA or more. This is done through fairly sizable pass transistors that provide addressing to each anti-fuse. Anti-fuse technology is used in the FPGAs from Actel, Quicklogic and Crosspoint.
A major advantage of the anti-fuse is its small size, little more than the cross-section of two metal wires. But this advantage is limited by the large size of the necessary programming transistors, which must handle large currents, and by the inclusion of isolation transistors that are sometimes needed to protect low voltage transistors from high programming voltages. A second major advantage of an anti-fuse is its relatively low series resistance. The on-resistance of the ONO anti-fuse is 300 to 500 ohms, while that of the amorphous silicon anti-fuse is 50 to 100 ohms. Additionally, the parasitic capacitance of an unprogrammed amorphous silicon anti-fuse is significantly lower than for other programming technologies.
The limitations of this technology are that it does not use a standard CMOS process and that anti-fuse based devices cannot be reprogrammed.
The ideal technology would be re-programmable, non-volatile, and based on a standard CMOS process. It is clear that none of the above technologies satisfies all of these conditions. However, SRAM-based programming technology is the most widely used. The main reason is its use of a standard CMOS process. For this reason it is expected that this technology will continue to dominate the other two programming technologies.
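The comparison against the "ideal" technology can be written down as a small check. The sketch below encodes only the three yes/no properties stated in the text for each technology; the class and field names are hypothetical, chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgrammingTechnology:
    name: str
    reprogrammable: bool
    non_volatile: bool
    standard_cmos: bool


# Properties as described in the preceding sections.
TECHNOLOGIES = [
    ProgrammingTechnology("SRAM", reprogrammable=True, non_volatile=False, standard_cmos=True),
    ProgrammingTechnology("Flash/EEPROM", reprogrammable=True, non_volatile=True, standard_cmos=False),
    ProgrammingTechnology("Anti-fuse", reprogrammable=False, non_volatile=True, standard_cmos=False),
]


def is_ideal(tech):
    """An ideal technology meets all three conditions at once."""
    return tech.reprogrammable and tech.non_volatile and tech.standard_cmos


ideal = [t.name for t in TECHNOLOGIES if is_ideal(t)]
uses_standard_cmos = [t.name for t in TECHNOLOGIES if t.standard_cmos]
```

With these entries `ideal` is empty, and SRAM is the only technology in `uses_standard_cmos`, which matches the reason given above for its dominance. Note that flash is marked re-programmable even though its number of programming cycles is limited.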
Comparison of Programming Technologies

Property             Static RAM   Anti-Fuse   EPROM             EEPROM
Re-programmable      In-circuit   No          Outside circuit   In-circuit
Volatile storage     Yes          No          No                No
Series resistance    1K           50-500      2K                2K
Capacitance in pf    15           1.2-5.0     10                10
Cell area            5X           1X          1X                2X
XILINX XC3000 FPGA Device
Xilinx introduced the first FPGA family, called the XC2000 series, in 1984 and subsequently offered three more series of FPGAs, namely the XC3000, XC4000 and XC5000. The first modern-era FPGA was introduced with 64 logic blocks and 58 inputs and outputs. The XC3000 series of FPGA devices was introduced in 1985 by Xilinx Inc. and was the most successful family of FPGAs. The XC3000 architecture includes enhancements to the XC2000 architecture to improve performance, density and usability. The XC3000 architecture was developed with manual tools for design implementation, and the architecture also shows a bias towards manual design.
The XC3000 family covers a range of nominal device densities from 2,000 to 9,000 gates, with practically achievable densities from 1,000 to 6,000 gates, and offers up to 144 user-definable I/Os. Device speeds, described in terms of maximum guaranteed toggle frequencies, range from 70 to 125 MHz.
The XC3000 Configurable Logic Block is substantially larger than that of the XC2000. Each of its two lookup tables has four inputs and requires 16 bits of configuration memory. The two lookup tables can be combined with a multiplexer to produce any function of five inputs and some functions of up to seven inputs. The XC3000 architecture therefore allows faster logic implementation, with fewer CLBs in series.
There are now four distinct families within the XC3000 series of FPGA devices:
XC3000A Family
XC3000L Family
XC3100A Family
XC3100L Family
All four families share a common architecture, development software, design and programming methodology, and also common package pin-outs.
XC3000A Family: The XC3000A is an enhanced version of the basic XC3000 family, featuring additional interconnect resources and other user-friendly enhancements.
XC3000L Family: The XC3000L is identical in architecture and features to the XC3000A family, but operates at a nominal supply voltage of 3.3 V. The XC3000L is the right solution for battery-operated and low-power applications.
XC3100A Family: The XC3100A is a performance-optimized relative of the XC3000A family. While both families are bitstream and footprint compatible, the XC3100A family extends toggle rates to 370 MHz and in-system performance to over 80 MHz. The XC3100A family also offers one additional array size, the XC3195A.
XC3100L Family: The XC3100L is identical in architecture and features to the XC3100A family, but operates at a nominal supply voltage of 3.3 V.
The details of the XC3000 family of devices are given in the table below.

S.No   Member of the family   CLB Array Size   IOs   Gate Capacity (Max)   Gate Capacity (Typical)
1      XC3020                 8 x 8            64    2000                  1200
2      XC3030                 10 x 10          80    3000                  1800
3      XC3042                 12 x 12          96    4200                  2500
4      XC3064                 16 x 14          120   6400                  3800
5      XC3090                 16 x 20          144   9000                  5500
6      XC3195                 22 x 22          168   13000                 7500
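As a quick worked example of the figures above, the CLB array sizes in the table can be turned into CLB counts, and from there into the number of lookup-table configuration bits, using the statement that each CLB holds two 4-input lookup tables of 16 bits each. The function names are illustrative, and the result counts LUT contents only; routing, flip-flop and IOB configuration bits are deliberately left out.

```python
# (member, CLB rows, CLB columns), copied from the family table above.
XC3000_ARRAYS = [
    ("XC3020", 8, 8),
    ("XC3030", 10, 10),
    ("XC3042", 12, 12),
    ("XC3064", 16, 14),
    ("XC3090", 16, 20),
    ("XC3195", 22, 22),
]

LUTS_PER_CLB = 2        # two lookup tables per XC3000 CLB
BITS_PER_LUT = 2 ** 4   # a 4-input LUT stores 16 truth-table bits


def clb_count(rows, cols):
    """Number of CLBs in a rows x cols array."""
    return rows * cols


def lut_bits(rows, cols):
    """Configuration bits needed just for the LUT contents of the array."""
    return clb_count(rows, cols) * LUTS_PER_CLB * BITS_PER_LUT


for member, rows, cols in XC3000_ARRAYS:
    print(f"{member}: {clb_count(rows, cols)} CLBs, {lut_bits(rows, cols)} LUT bits")
```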
The basic LCA (Logic Cell Array) of the XC3000 consists of three components: Programmable I/O Blocks, Configurable Logic Blocks and Programmable Interconnect. In addition, a small amount of configurable memory is also present.
Programmable I/O Block
The I/O Block (IOB) of the XC3000 is more complex than the XC2000 IOB. The important addition is a flip-flop in the output path. By registering the data in the IOB, the clock-to-output time does not include interconnect delays. The result is a fast, predictable clocked output. The XC3000 IOB also includes a programmable pull-up, optional output inversion and a selectable slew rate. A lower I/O slew rate for low speed signals reduces power surges, simplifying board level design. Input from the pad can be brought into the interior of the chip directly, registered, or both. By allowing both, the I/O block can de-multiplex external signals such as address/data busses, storing the address in the IOB flip-flop and feeding the data directly into the wiring.
Each user-configurable IOB, as shown below, provides an interface between the external package pin of the device and the internal user logic. Each IOB includes both registered and direct input paths. Each IOB provides a programmable 3-state output buffer, which may be driven by a registered or direct output signal. Configuration options allow each IOB an inversion, a controlled slew rate and a high impedance pull-up. Each input circuit also provides input clamping diodes to provide electrostatic protection, and circuits to inhibit latch-up produced by input currents.
Each IOB includes input and output storage elements and I/O options selected by configuration memory cells. A choice of two clocks is available on each die edge. The polarity of each clock line (not each flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the rising edge is an active Low Latch Enable (latch transparent) signal and vice versa. Passive pull-up can only be enabled on inputs, not on outputs. All user inputs are programmed for TTL or CMOS thresholds.

Configurable Logic Block :
The XC3000 CLB is substantially larger than the XC2000 CLB. Each of the lookup tables has four inputs rather than three and hence requires sixteen bits of configuration memory rather than eight. The lookup tables can be combined with a multiplexer to produce any function of five inputs and some functions of up to seven inputs. This allows the XC3000 architecture to implement faster logic. The XC3000 CLB has two flip-flops, to ensure that all combinational logic can be followed by a pipelining flip-flop. The register-rich CLB allows the XC3000 to implement state-intensive applications and heavily pipelined designs efficiently. As shown in the block diagram, each CLB includes a combinatorial logic section, two flip-flops and a program memory controlled multiplexer selection of function. It has the following components:
Five logic variable inputs A, B, C, D and E
A direct data in DI
An enable clock EC
A clock (invertible) K
An asynchronous direct RESET RD
Two outputs X and Y
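The claim that two 4-input lookup tables plus a multiplexer can produce any function of five inputs can be checked with a short Python model. The sketch assumes that input E drives the multiplexer select line, so that one table holds the function for E = 0 and the other for E = 1; the helper names are hypothetical and the 5-input majority function is just an example target.

```python
from itertools import product


def make_lut4(func):
    """Return the 16 configuration bits of a 4-input LUT for func(a, b, c, d)."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(bits, a, b, c, d):
    """Look up one output bit; the inputs act as the memory address."""
    return bits[(a << 3) | (b << 2) | (c << 1) | d]


def five_input_function(func5):
    """Split a 5-input function across two LUTs selected by input E."""
    f_bits = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # case E = 0
    g_bits = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # case E = 1

    def evaluate(a, b, c, d, e):
        f = read_lut4(f_bits, a, b, c, d)
        g = read_lut4(g_bits, a, b, c, d)
        return g if e else f  # the multiplexer

    return evaluate


def majority5(a, b, c, d, e):
    return int(a + b + c + d + e >= 3)


clb = five_input_function(majority5)
matches = all(clb(*v) == majority5(*v) for v in product((0, 1), repeat=5))
```

The same construction works for any 5-input function, because fixing E leaves a function of the remaining four inputs, which a 16-bit table can always hold.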
XC3000 CLB
Each CLB has a combinatorial logic section, two flip-flops, and an internal control section. The CLB has five logic inputs (A, B, C, D and E), a common clock input (K), an asynchronous direct RESET input (RD) and an enable clock (EC), as shown in the block diagram. Each CLB also has two outputs (X and Y) which may drive interconnect networks. Data input for the flip-flops within a CLB is supplied from the function F or G outputs of the combinatorial logic, or from the block input, DI. Both flip-flops in each CLB share the asynchronous RD which, when enabled, is dominant over clocked inputs. All flip-flops are reset by the active-Low chip input, RESET, or during the configuration process. The flip-flops share the enable clock (EC) which, when Low, recirculates the flip-flops' present states and inhibits response to the data-in or combinatorial function inputs on a CLB. The user may enable these control inputs and select their sources. The user may also select the clock net input (K), as well as its active sense within each CLB. This programmable inversion eliminates the need to route both phases of a clock signal throughout the device.

Programmable Interconnect :
Programmable interconnection resources in the Field Programmable Gate Array provide routing paths to connect the inputs and outputs of the IOBs and CLBs into logic networks. Interconnections between blocks are composed of a two-layer grid of metal segments. Specially designed pass transistors, each controlled by a configuration bit, form programmable interconnect points (PIPs) and switching matrices used to implement the necessary connections between selected metal segments and block pins. The XC3000 interconnect structure has five general interconnect lines both vertically and horizontally. In addition, each CLB has direct connections to adjacent CLBs both vertically and horizontally. Three types of metal resources are provided to accommodate various network interconnect requirements:
General Purpose Interconnect
Direct Connection
Long lines (multiplexed busses and wide AND gates)
These interconnects are shown in the diagrams below.
XC3000 Interconnect
The channels in the XC3000 are at the ends of the fixed output wires. The output channels are not both adjacent to the CLB. This enlarges the immediate neighbourhood for high speed connections between CLBs, since a signal can skip a switch box in two of the four directions. In the XC3000, the pins are accessible from more than one channel. Therefore the routability depends on which channel the placer expects the router to use to route to the pin, and on the ability of the router to bring the signal into the channel on the correct track.
Additional enhancements to XC3000 speed and density came as a result of software improvements. These improvements do not change the maximum gate capacity, nor do they change the maximum toggle frequency, but they do increase the typical capacity and narrow the difference between the toggle frequency and the automatically achievable clock frequency. Software has improved the speed of automatically placed and routed designs by about 50%.

XILINX XC4000 FPGA Device :
The XC4000 was designed to improve performance and gate density for large designs. Several dedicated features were added to the general purpose logic features of the XC3000, resulting in an interesting combination of special purpose and general purpose functions. The XC4000 family was designed using placement and routing tools to evaluate architectural decisions. The architectural features were designed to interact efficiently with an automated design methodology. The basic building blocks used in the XC4000 family are:
(i). Lookup tables for the implementation of logic functions. A designer can use a function generator to implement any Boolean function of a given number of inputs by pre-loading the memory with the bit pattern corresponding to the truth table of the function. All functions of a function generator have the same timing, namely the time to look up the result in the memory. Therefore, the inputs to the function generator are fully interchangeable by simple rearrangement of the bits in the lookup table.
(ii). A Programmable Interconnect Point (PIP) is a pass transistor controlled by a memory cell. The PIP is the basic unit of the configurable interconnect mechanism. The wire segments on each side of the transistor are connected or not, depending on the value in the memory cell. The pass transistor introduces resistance into the interconnect path and hence adds delay.
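The PIP described in (ii) can be modelled as an edge between two wire segments that conducts only when its memory cell holds a 1. The sketch below is a toy model, not a description of the real XC4000 routing: the class `PipFabric`, the segment names and the use of a breadth-first search to count pass transistors on the shortest conducting path are all assumptions made for illustration.

```python
from collections import deque


class PipFabric:
    """Wire segments joined by PIPs, each PIP gated by one memory cell."""

    def __init__(self):
        self.pips = {}  # (segment_a, segment_b) -> memory cell value (0 or 1)

    def add_pip(self, a, b):
        self.pips[(a, b)] = 0  # unprogrammed: the pass transistor is off

    def program(self, a, b, value):
        if (a, b) not in self.pips:
            raise KeyError(f"no PIP between {a} and {b}")
        self.pips[(a, b)] = value

    def switches_between(self, start, goal):
        """Fewest conducting PIPs from start to goal, or None if unconnected."""
        frontier = deque([(start, 0)])
        seen = {start}
        while frontier:
            segment, hops = frontier.popleft()
            if segment == goal:
                return hops
            for (a, b), cell in self.pips.items():
                if not cell:
                    continue  # transistor off: the segments stay isolated
                if a == segment:
                    neighbour = b
                elif b == segment:
                    neighbour = a
                else:
                    continue
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, hops + 1))
        return None


fabric = PipFabric()
for a, b in [("clb0_out", "h1"), ("h1", "v1"), ("v1", "clb1_in"), ("h1", "clb1_in")]:
    fabric.add_pip(a, b)

# Turn on three PIPs to build a path from one CLB output to another CLB input.
for a, b in [("clb0_out", "h1"), ("h1", "v1"), ("v1", "clb1_in")]:
    fabric.program(a, b, 1)
hops = fabric.switches_between("clb0_out", "clb1_in")
```

Since every PIP on the path adds series resistance, the hop count returned here is a rough stand-in for the routing delay of that connection.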
The advanced features of the XC4000 FPGAs are:
(i). CLBs can be used as on-chip RAM
(ii). Fast carry chain for high speed implementation of arithmetic
(iii). Boundary scan compatibility (JTAG)
(iv). Wide decode logic
(v). More global clocks
(vi). Faster placement and routing algorithms
(vii). Scaled routing resources

Configurable Logic Block (CLB):
The XC4000 CLB is similar to the XC3000 CLB. It contains three lookup tables and two flip-flops. The two primary lookup tables, F and G, each implement any function of four variables. These two results can be brought out of the block independently, or they can be combined with another input in the H lookup table to make any function of five inputs or some functions of up to nine inputs. This allows functions such as nine-input AND, OR, exclusive OR (parity) or address decode to be done at high speed in one clock. The flip-flops can take their inputs independently from the lookup tables or from external signals, but they share control signals. Unlike the XC2000 and XC3000, flip-flop outputs are not recirculated internally. A registered feedback signal in the XC4000 must be routed in the general interconnect back to a CLB input pin.
The XC3000 can implement arithmetic with the sum in one lookup table and the carry in another. The XC4000 CLB can also implement arithmetic in this way, but as the speed of an arithmetic operation is dominated by the speed of the carry chain, the XC4000 CLB includes dedicated high speed carry logic. The block diagram below shows the XC4000 CLB. The dedicated carry logic in the XC4000 substantially speeds up arithmetic while doubling its density.
The XC4000 Configurable Logic Block (CLB) is based on lookup tables (LUTs). A LUT is a small one-bit-wide memory array, where the address lines of the memory are the inputs of the logic block and the one-bit output of the memory is the LUT output. A LUT with K inputs therefore corresponds to a 2^K x 1 bit memory and can realize any logic function of its K inputs by programming the logic function's truth table directly into the memory. The XC4000 CLB contains three separate LUTs, in the configuration shown below. There are two 4-input LUTs that are fed by CLB inputs, and the third LUT can be used in combination with the other two. This arrangement allows the CLB to implement a wide range of logic functions of up to nine inputs, two separate functions of four inputs, or other possibilities. Each CLB also contains two flip-flops.
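The nine-input parity example mentioned above makes a compact illustration of the F, G and H arrangement. In the sketch below each LUT is modelled as a 2^K x 1 memory holding a truth table, and the H table is assumed to take three inputs (the F output, the G output and one extra CLB input), which is how the text describes the combination; the helper names are hypothetical.

```python
from itertools import product


def build_lut(k, func):
    """Build a K-input LUT as a list of 2**K bits holding func's truth table."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]


def read_lut(memory, *inputs):
    """The LUT inputs form the memory address, first input most significant."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return memory[address]


def xor_of(*bits):
    return sum(bits) % 2


F = build_lut(4, xor_of)  # parity of inputs 0..3
G = build_lut(4, xor_of)  # parity of inputs 4..7
H = build_lut(3, xor_of)  # combines F, G and the ninth input


def clb_parity9(inputs):
    """Nine-input parity computed inside one modelled CLB."""
    f = read_lut(F, *inputs[0:4])
    g = read_lut(G, *inputs[4:8])
    return read_lut(H, f, g, inputs[8])


all_match = all(clb_parity9(v) == sum(v) % 2 for v in product((0, 1), repeat=9))
```

Parity works because it decomposes cleanly into two 4-input pieces plus a 3-input combiner. An arbitrary nine-input function does not decompose this way, which is why the text says only some functions of up to nine inputs fit in one CLB.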
Xilinx XC4000 Configurable Logic Block (CLB).

XC4000 I/O BLOCK:
The block diagram below shows the I/O block. Signals to be output from the chip can be registered before output and enabled by a separate control signal. Outputs can be optionally pulled up or down, and the output driver can be configured with either a fast or a slow slew rate. Inputs from the pad can be brought into the interior of the chip directly, registered, or both, to facilitate multiplexed bus interfaces. Furthermore, inputs can drive dedicated decoders, built into the edge interconnect, for fast recognition of addresses. The XC4000 IOB includes boundary scan logic compatible with the ANSI/IEEE 1149.1 (JTAG) boundary scan standard. The boundary scan can check internal logic or external logic. Scan operations can take place before and after the FPGA is programmed and do not interfere with the operation of the part.
To provide high density devices that support the integration of entire systems, the XC4000 chips have system-oriented features. For example, each CLB contains circuitry that allows it to perform arithmetic efficiently (i.e., a circuit that can implement a fast carry operation for adder-like circuits), and the LUTs in a CLB can also be configured as read/write RAM cells. A newer version of this family, the 4000E, has the additional feature that the RAM can be configured as a dual-port RAM with a single write port and two read ports. In the 4000E, RAM blocks can be synchronous RAM. Also, each XC4000 chip includes very wide AND-planes around the periphery of the logic block array to facilitate implementing circuit blocks such as wide decoders.

Interconnect Structure :
The other important feature of this FPGA is its interconnect structure. The XC4000 interconnect is arranged in horizontal and vertical channels. Each channel contains some number of short wire segments that span a single CLB (the number of segments in each channel depends on the specific part number), longer segments that span two CLBs, and very long segments that span the entire length or width of the chip. Programmable switches are available to connect the inputs and outputs of the CLBs to the wire segments, or to connect one wire segment to another. The figure below shows only the wire segments in a horizontal channel, and does not show the vertical routing channels, the CLB inputs and outputs, or the routing switches.
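To see why the mix of segment lengths matters, the following rough sketch counts how many wire segments, and how many segment-to-segment switches, a connection needs when it is routed with only one kind of segment. It is a deliberately simplified model: it ignores the switches at the CLB pins, assumes the route lies in a single channel, and the function names are made up for the example.

```python
import math


def segments_needed(distance_in_clbs, segment_span):
    """Segments of a given span (in CLBs) needed to cover the distance."""
    return math.ceil(distance_in_clbs / segment_span)


def joining_switches(segment_count):
    """Switches needed to chain the segments end to end."""
    return max(segment_count - 1, 0)


def compare_routes(distance_in_clbs):
    """Segment counts for single-length, double-length and long-line routes."""
    return {
        "single": segments_needed(distance_in_clbs, 1),
        "double": segments_needed(distance_in_clbs, 2),
        "long": 1,  # a long line spans the whole row or column
    }


routes = compare_routes(6)
switch_counts = {kind: joining_switches(count) for kind, count in routes.items()}
```

For a distance of six CLBs this gives 6, 3 and 1 segments respectively, so the choice of segments directly changes how many switches the signal passes through.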
The salient feature of the Xilinx interconnect is that signals must pass through switches to reach one CLB from another, and the total number of switches traversed depends on the particular set of wire segments used. Thus, the speed-performance of an implemented circuit depends in part on how the wire segments are allocated to individual signals by CAD tools.

Actel FPGAs
In contrast to Xilinx FPGAs, the devices manufactured by Actel are based on anti-fuse technology. Actel offers three main families: Act 1, Act 2 and Act 3. Actel devices are based on a structure similar to traditional gate arrays; the logic blocks are arranged in rows and there are horizontal routing channels between adjacent rows. This architecture is shown in the figure below. The logic blocks in the Actel devices are relatively small in comparison to the LUT-based ones, and are based on multiplexers. The figure illustrates the logic block in the Act 3 and shows that it comprises an AND gate and an OR gate that are connected to a multiplexer-based circuit block. The multiplexer circuit is arranged such that, in combination with the two logic gates, a very wide range of functions can be realized in a single logic block. About half of the logic blocks in an Act 3 device also contain a flip-flop.
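How a multiplexer plus two gates can realize many different functions is easiest to see with a small model. The exact wiring of the real Act 3 module is not given in this text, so the block below is an assumed, simplified stand-in: a 4-to-1 multiplexer whose high select bit is driven by an AND gate and whose low select bit is driven by an OR gate. Different functions are obtained simply by tying inputs to constants or to signals.

```python
def mux_logic_block(d0, d1, d2, d3, a0, a1, b0, b1):
    """Hypothetical mux-based block: AND and OR gates drive the select lines."""
    s1 = a0 & a1  # AND gate drives the high select bit
    s0 = b0 | b1  # OR gate drives the low select bit
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def and2(x, y):
    # Data inputs tied to constants; the AND gate alone picks d0 or d2.
    return mux_logic_block(0, 0, 1, 1, x, y, 0, 0)


def or2(x, y):
    # The OR gate alone picks d0 or d1.
    return mux_logic_block(0, 1, 0, 0, 0, 0, x, y)


def mux2(select, p, q):
    # A plain 2-to-1 multiplexer: returns p when select is 0, q when it is 1.
    return mux_logic_block(p, q, 0, 0, 0, 0, select, 0)
```

In an anti-fuse device the "tying" of each input to a constant or to a signal is done by programming the anti-fuses on the routing tracks, rather than by storing a truth table as in a LUT.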
Actel FPGA structure.
Actel's interconnect is organized in horizontal routing channels. The channels consist of wire segments of various lengths, with anti-fuses to connect logic blocks to wire segments or one wire to another. Also, Actel chips have vertical wires that overlay the logic blocks, for signal paths that span multiple rows. In terms of speed-performance, Actel chips are not fully predictable, because the number of anti-fuses traversed by a signal depends on how the wire segments are allocated during circuit implementation by CAD tools. However, Actel provides a rich selection of wire segments of different lengths in each channel and has developed algorithms that guarantee strict limits on the number of anti-fuses traversed by any two-point connection in a circuit, which improves speed-performance significantly.
Quicklogic pASIC FPGAs :
Quicklogic is the main competitor for Actel in anti-fuse based FPGAs. It produces two families of devices, called pASIC and pASIC-2. The pASIC-2 is an enhanced version of the pASIC. The pASIC consists of a regular two-dimensional array of blocks called pASIC Logic Blocks (pLBs). The logic capacity of the first generation of Quicklogic FPGAs is between 48 and 380 pLBs, or 500 to 4000 equivalent MPGA gates. As shown in the figure below, the pASIC has similarities to other FPGAs: the overall structure is array-based like Xilinx FPGAs, the logic blocks use multiplexers similar to Actel FPGAs, and the interconnect consists of only long lines, as in the Altera FLEX 8000. It is to be noted that the pASIC architecture is now independently developed by Cypress also.
Structure of Quicklogic pASIC FPGA.
The pASIC anti-fuse, called ViaLink, consists of a top layer of metal, an insulating layer of amorphous silicon, and a bottom layer of metal. When compared to Actel's PLICE anti-fuse, ViaLink offers a very low on-resistance of about 50 ohms (PLICE is about 300 ohms) and a low parasitic capacitance. ViaLink anti-fuses are present at every crossing of logic block pins and interconnect wires, providing generous connectivity.
Quicklogic (Cypress) Logic Cell
The pASIC's multiplexer-based logic block is shown in the above figure. It is more complex than Actel's Logic Module, with more inputs and wide (6-input) AND gates on the multiplexer select lines. Every logic block also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs :
The first FPGA chips from Altera were simple arrays of logic cells, which are relatively simple logic elements (LEs), each comprising a three-input lookup table (LUT) to generate logic functions, a single configurable flip-flop, and multiplexers for routing the signals and selecting clocks. The logic cells were connected by switch boxes instead of fixed interconnect. The general architecture of Altera's FPGAs is shown in the diagram below.
There are two high performance FPGA series, called the FLEX series. Altera's FLEX 8000 series consists of a three-level hierarchy similar to that of CPLDs. However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that the FLEX 8000 is a combination of FPGA and CPLD technologies. The FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4,000 gates to more than 15,000 gates for the 8000 series. The architecture of the FLEX 8000 is shown in the figure below. The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits (similar to the Xilinx XC4000). The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions.
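The role of the cascade circuitry can be illustrated with a short sketch of a wide AND built from a chain of LEs. The model assumes that each LE's four-input LUT is programmed as an AND of its own inputs and that the cascade path ANDs this result with the value arriving from the previous LE; this is one plausible reading of "wide AND functions" and not a statement about the exact FLEX 8000 circuit.

```python
def wide_and(inputs):
    """AND together any number of bits using a chain of 4-input LEs.

    Returns (result, number_of_LEs_used).
    """
    cascade = 1  # value fed into the first LE of the chain
    les_used = 0
    for start in range(0, len(inputs), 4):
        group = inputs[start:start + 4]
        lut_out = int(all(group))  # this LE's LUT, programmed as an AND
        cascade &= lut_out         # cascade path combines it with the chain so far
        les_used += 1
    return cascade, les_used


result, les = wide_and([1] * 10)  # a 10-input AND needs three LEs
```

Without a cascade path, the partial results would have to leave each LE through the general interconnect and be combined in further LUTs, which costs both extra logic and extra routing delay.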
In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in the figure below, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB.
Architecture of Altera FLEX 8000 FPGAs.
Altera FLEX 8000 Logic Element (LE). Local interconnect also connects to the FLEX 8000s global interconnect, called Fast Track. Fast Track is similar to Xilinx long lines in that each Fast Track wire extends the full width or height of the device. However, a major difference between FLEX 8000 and Xilinx chips is that Fast Track consists of only long lines. This makes the FLEX 8000 easy for CAD tools to
automatically configure. All horizontal Fast Track wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many shorter segments, because there are fewer programmable switches in the longer paths. Predictability is further aided by the fact that connections between horizontal and vertical lines pass through active buffers.
The FLEX 8000 architecture has been extended in the FLEX 10K family. FLEX 10K offers all of the features of FLEX 8000, with the addition of variable-sized blocks of SRAM called Embedded Array Blocks (EABs). Each row in a FLEX 10K chip has an EAB on one end. Each EAB is configurable to serve as an SRAM block with a variable aspect ratio: 256 x 8, 512 x 4, 1K x 2, or 2K x 1 (2,048 bits in every case). In addition, an EAB can alternatively be configured to implement a complex logic circuit, such as a multiplier, by employing it as a large multi-output look-up table. Altera provides, as part of its CAD tools, several macrofunctions that implement useful logic circuits in EABs. Counting the EABs as logic gates, FLEX 10K offered the highest logic capacity of any FPGA when the source tutorial was written, although it is hard to provide an accurate number.
Concurrent Logic FPGA Device: The manufacturer Concurrent Logic offers the CFA6006 FPGA device, which is based on a two-dimensional array of identical blocks, where each block is symmetrical on its four sides. The array holds 3136 such blocks, providing a total logic capacity of about 5000 equivalent gates. Connections are formed using multiplexers that are configured by a static RAM programming technology. The structure of the Concurrent logic block is shown in the diagram below. It comprises user-configurable multiplexers, basic gates and a D-type flip-flop. The Concurrent FPGA is especially suitable for register-intensive and arithmetic applications, since the logic block can easily implement a half-adder and a register bit.
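The use of an Embedded Array Block as a large multi-output look-up table, described above, can be sketched in Python. The sketch assumes the 256 x 8 configuration and a 4-bit by 4-bit multiplier as the example circuit. Packing the two operands into one 8-bit address is an illustrative choice, not something stated in the text, and the function names are hypothetical.

```python
# Every aspect ratio listed for the EAB holds the same number of bits.
ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]
EAB_BITS = 2048


def build_multiplier_table():
    """Fill a 256 x 8 memory so that address (a, b) holds the product a * b."""
    table = [0] * 256
    for a in range(16):
        for b in range(16):
            address = (a << 4) | b   # assumed packing: a in the high nibble, b in the low
            table[address] = a * b   # largest product is 15 * 15 = 225, which fits in 8 bits
    return table


def multiply(table, a, b):
    """A 'multiplication' is now a single memory read."""
    return table[(a << 4) | b]


table = build_multiplier_table()
print(multiply(table, 13, 11))
```

The eight address lines carry the inputs and the eight data lines carry all output bits at once, which is what makes the block a multi-output look-up table rather than eight separate single-output LUTs.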
There are two direct connections, A and B, formed by routing signals through the multiplexers within the blocks. Long connections are implemented using a bussing network, in which wires of various lengths are superimposed on the array of logic blocks.
Crosspoint Solutions FPGAs: The Crosspoint FPGA differs from other FPGAs because it is configurable at the transistor level, as opposed to the logic block level in other FPGAs. Basically, the architecture consists of rows of transistor pairs, where the rows are separated by horizontal wiring segments. Vertical wiring segments are also available for connections among the rows. Each transistor row comprises two lines of series-connected transistors, with one line being NMOS and the other PMOS. The wiring resources allow individual transistor pairs to be interconnected to implement CMOS logic gates. The programming technology used for the programmable switches is similar to the Via-Link anti-fuse, which is based on amorphous silicon. The structure of the transistor pair rows is shown in the diagram below. The diagram shows the implementation of a NOR gate and a NAND gate using the transistor lines. The transistor gates, drains and sources can be programmably interconnected to other transistors and also to power and ground. The series connection across a line is broken where necessary by permanently
holding a transistor in its OFF state. A wide range of logic gates can be implemented by the transistor lines and the interconnection patterns.
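The NAND and NOR gates mentioned above can be modeled at switch level with a short Python sketch. It assumes ideal transistors: an NMOS device conducts when its gate is 1, a PMOS device conducts when its gate is 0, and the output follows whichever network (pull-up to power or pull-down to ground) conducts. The helper names are hypothetical, and the anti-fuse wiring that forms these networks on the chip is not modeled.

```python
def series(*switches):
    """A series chain conducts only if every transistor in it conducts."""
    return all(switches)


def parallel(*switches):
    """A parallel group conducts if any transistor in it conducts."""
    return any(switches)


def cmos_gate(pull_up, pull_down, a, b):
    """Evaluate a two-input static CMOS gate from its two transistor networks."""
    pmos_on = (a == 0, b == 0)   # PMOS line, connected towards power
    nmos_on = (a == 1, b == 1)   # NMOS line, connected towards ground
    up = pull_up(*pmos_on)
    down = pull_down(*nmos_on)
    if up == down:
        raise ValueError("networks are not complementary")
    return 1 if up else 0


def nand2(a, b):
    # NMOS transistors in series, PMOS transistors in parallel.
    return cmos_gate(parallel, series, a, b)


def nor2(a, b):
    # NMOS transistors in parallel, PMOS transistors in series.
    return cmos_gate(series, parallel, a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand2(a, b), nor2(a, b))
```

Both gates use the same two transistor pairs; only the interconnection pattern differs, which is the point of configuring the device at the transistor level.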
The FPGAs offered by Crosspoint Solutions have a total logic capacity of 4200 gates. The chip has 256 rows of transistor pairs, and an additional 64 rows of multiplexer-like structures are provided. With their row-based architecture, anti-fuse programming technology and multiplexers, the Crosspoint FPGAs are most similar to the Actel FPGAs.
ALGOTRONIX CAL-1024: This design has a two-dimensional mesh array structure which resembles the gate-array sea-of-gates architecture. Like Xilinx, Algotronix uses static RAM programming technology to specify the function performed by each logic cell and to control the switching of connections between cells. The CAL-1024 design contains 1024 identical logic cells arranged in a 32 x 32 matrix. The design is considered a mesh-connected architecture, since each cell is directly connected to its nearest north, south, east and west neighbors. In addition to these direct connections, two global interconnect signals are routed to each cell to distribute the clock and other control signals that require low skew. The figure below shows the basic array architecture, indicating both nearest-neighbor and global connections to the logic cells.
ALGOTRONIX Array Architecture
The basic building block of the Algotronix design is a configurable cell containing multiplexers and a function unit. As indicated in the figure, the function unit is preceded by multiplexers which select the source for the X1 and X2 inputs. The function unit is capable of generating any logic function of the two inputs, or of operating as a D-type latch. There are four additional multiplexers which select the function output or one of the external inputs for routing to each of the four outputs (north, south, east, and west).
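The cell just described can be approximated by a small Python model. This is a sketch under assumptions: the function unit is represented as a four-entry truth table, each multiplexer setting is a plain string, any restrictions on which inputs a given multiplexer can select are ignored, and the D-type latch mode is left out. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field

SIDES = ("north", "south", "east", "west")


@dataclass
class Cell:
    x1_source: str = "north"              # input multiplexer setting for X1
    x2_source: str = "west"               # input multiplexer setting for X2
    truth_table: tuple = (0, 0, 0, 1)     # indexed by (x1 << 1) | x2; default is AND
    out_select: dict = field(
        default_factory=lambda: {side: "function" for side in SIDES}
    )

    def evaluate(self, inputs):
        """Return the value driven on each of the four outputs."""
        x1 = inputs[self.x1_source]
        x2 = inputs[self.x2_source]
        f = self.truth_table[(x1 << 1) | x2]
        outputs = {}
        for side in SIDES:
            choice = self.out_select[side]
            # Each output multiplexer passes either the function result
            # or one of the cell's external inputs straight through.
            outputs[side] = f if choice == "function" else inputs[choice]
        return outputs


# An XOR cell that also routes its north input through to its south output.
cell = Cell(
    truth_table=(0, 1, 1, 0),
    out_select={"north": "function", "south": "north",
                "east": "function", "west": "east"},
)
print(cell.evaluate({"north": 1, "south": 0, "east": 0, "west": 0}))
```

Because a cell can pass a neighbor's signal straight through, routing in this architecture consumes logic cells rather than a separate wiring fabric.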
A unique feature of the Algotronix I/O pad design is its capability to provide simultaneous input and output on the same pin when communicating with another Algotronix chip. This is done through a 3-level (ternary) logic signaling scheme in which the I/O pads sense, via a contention scheme, whenever two outputs are driving each other. Even during contention, the pad can deduce the correct input value and pass it along to the internal circuitry. This makes it easier to partition a single design across multiple FPGAs, because the increased connectivity reduces pin limitations on communications bandwidth.
AMD Mach: AMD offers a CPLD family comprising five subfamilies called Mach. Each Mach device consists of multiple PAL-like blocks (or optimized PALs). Mach 1 and 2 consist of optimized 22V16 PALs, Mach 3 and 4 consist of several optimized 34V16 PALs, and Mach 5 is similar to Mach 3 and 4 but offers enhanced speed performance. All Mach chips use EEPROM technology, and together the five subfamilies provide a wide range of selection, from small, inexpensive chips to larger, state-of-the-art ones.
Figure (a) below depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks and the interconnect, called the central switch matrix. The in-circuit programmable chips range in size from 6 to 16 PAL-like blocks, corresponding roughly to 2,000 to 5,000 equivalent gates. All connections between PAL-like blocks (even from a PAL-like block to itself) pass through the central switch matrix. Thus, the device is not merely a collection of PAL-like blocks but a single, large device. Since all connections travel through the same path, circuit timing delays are predictable. Figure (b) illustrates a Mach 4 PAL-like block. It has 16 outputs and a total of
34 inputs (16 of which are the fed-back outputs), so it corresponds to a 34V16 PAL. However, there are two key differences between this block and a normal PAL: 1) a product term (PT) allocator between the AND plane and the macrocells (each macrocell comprises an OR gate, an EXOR gate, and a flip-flop), and 2) an output switch matrix between the OR gates and the I/O pins. These features make a Mach 4 chip easier to use because they decouple sections of the PAL-like block. More specifically, the product term allocator distributes and shares product terms from the AND plane to the OR gates that require them, allowing much more flexibility than the fixed-size OR gates in regular PALs. The output switch matrix enables any macrocell output (OR gate or flip-flop) to drive any I/O pin connected to the PAL-like block, again providing greater flexibility than a PAL, in which each macrocell can drive only one specific I/O pin. Mach 4's combination of in-system programmability and high flexibility allows easy hardware design changes.
AMD Mach 4 structure
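The benefit of a product term allocator over fixed-size OR gates can be shown with a toy Python comparison. This is only a sketch: the numbers of product terms are made up for illustration, the allocator is modeled as one freely shared pool, and the actual steering rules of the Mach 4 allocator, which the text does not give, are not modeled. The function names are hypothetical.

```python
def fits_fixed(needs, terms_per_gate):
    """Regular PAL: every OR gate has the same fixed number of product terms."""
    return all(n <= terms_per_gate for n in needs)


def allocate_shared(needs, total_terms):
    """Idealized allocator: hand out product-term indices from one shared pool.

    Returns one list of product-term indices per OR gate, or None if the
    pool is too small.
    """
    if sum(needs) > total_terms:
        return None
    allocation, next_term = [], 0
    for n in needs:
        allocation.append(list(range(next_term, next_term + n)))
        next_term += n
    return allocation


# Four hypothetical logic functions needing 2, 9, 1 and 4 product terms.
needs = [2, 9, 1, 4]
print(fits_fixed(needs, terms_per_gate=4))       # one function is too wide
print(allocate_shared(needs, total_terms=16))    # the same 16 terms suffice when shared
```

With fixed gates the wide function does not fit even though most product terms sit unused; sharing lets the same AND plane serve all four functions.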
Summary of Commercially Available FPGAs:

Manufacturer           General Architecture   Type of Logic Block               Programming Technology Used
XILINX                 Symmetrical Array      Look-up Table                     Static RAM
ACTEL                  Row-Based              Multiplexer Based                 Anti-fuse
ALTERA                 Hierarchical-PLD       PLD-Block                         EPROM
Plessey                Sea of Gates           NAND-Gate                         Static RAM
PLUS Logic             Hierarchical-PLD       PLD-Block                         EPROM
AMD                    Hierarchical-PLD       PLD-Block                         EEPROM
QUICK Logic            Symmetrical Array      Multiplexer Based                 Anti-fuse
ALGOTRONIX             Sea of Gates           Multiplexers and Basic Gates      Static RAM
CONCURRENT             Sea of Gates           Multiplexers and Basic Gates      Static RAM
CROSSPOINT Solutions   Row-Based              Transistor Pairs & Multiplexers   Anti-fuse
FPGA Design Flow:

Earlier PLD and FPGA designs were performed largely by hand, but today's complex programmable logic devices require the use of an integrated Computer-Aided Design (CAD) system. Both commercial CAD tool vendors and FPGA companies offer appropriate tools. For example, traditional Electronic Design Automation (EDA) vendors such as Cadence, Mentor Graphics, Synopsys and Viewlogic offer tools to support FPGA design. These tools are typically used for front-end design entry and simulation, and provide the necessary interfaces to vendor-specific back-end tools for chip placement and routing. Examples of vendor-specific tools are the Xilinx XACT system and the Altera MAX+PLUS II software. Altera's MAX+PLUS II software supports the entire design flow on either PC or workstation platforms.
The first step in the design process is the description of the logic circuit, which can be done either with a schematic capture tool (Cadence, Viewlogic, OrCAD, etc.) or with Boolean expressions. This is followed by a translation that converts the original circuit description into a standard format used by the CAD tools (for example, the Xilinx CAD tools). The circuit is then passed through CAD programs that partition it into appropriate logic blocks, select a specific location in the FPGA for each logic block, and form the required interconnections. The performance of the implemented circuit can then be checked and its functionality verified. Finally, a bitmap is generated and downloaded in a serial fashion to configure the FPGA.
Initial Design Entry: The detailed description of the logic circuit is entered using a schematic capture program. In the design entry phase, RTL or schematic entry is used to create the logic to be implemented in the device. Pin assignments can also be made, including pin placement information and any timing constraints needed to build a functioning design. In the design entry step, a schematic or Block Design File (.bdf) is created as the top-level design.
Library of parameterized modules (LPM) functions are added to it, and Verilog HDL code is used to add a logic block. The library may be supplied either by the vendor of the schematic capture program or by an FPGA vendor (such as Xilinx or Altera). An alternative way to specify the logic circuit is to use a
Boolean expression or state machine language. This is done without the graphical interface. Sometimes it is possible to use a mixture of both schematic and Boolean expressions.
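Textual entry of a state machine, as opposed to drawing a schematic, can be illustrated with a minimal Python sketch. Python stands in here for a vendor's state machine language, which the text does not name. The machine itself (a detector for two or more consecutive 1s on a serial input) and all names are made up for illustration.

```python
# State transition table: (current state, input bit) -> next state.
TRANSITIONS = {
    ("IDLE", 0): "IDLE",
    ("IDLE", 1): "ONE",
    ("ONE", 0): "IDLE",
    ("ONE", 1): "TWO",
    ("TWO", 0): "IDLE",
    ("TWO", 1): "TWO",
}

# Moore outputs: the output depends only on the current state.
OUTPUT = {"IDLE": 0, "ONE": 0, "TWO": 1}


def run(bits, state="IDLE"):
    """Clock the machine once per input bit and collect the output after each clock."""
    outputs = []
    for bit in bits:
        state = TRANSITIONS[(state, bit)]
        outputs.append(OUTPUT[state])
    return outputs


print(run([1, 1, 0, 1, 1, 1]))
```

A design tool would turn such a table into flip-flops for the state and Boolean equations for the next-state and output logic, which is the form the later steps of the flow operate on.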
Translation to XNF Format: After the logic circuit is successfully designed and merged into one circuit, it is translated into a special format that is understood by the CAD tools. For Xilinx this format is called the Xilinx Netlist Format (XNF). The translation utility is supplied either by Xilinx or by the vendor of the logic entry tool. The translation process may also involve automatic optimization of the circuit.
Partition: The XNF circuit is partitioned into logic cells (this step is also known as technology mapping). Technology mapping converts the XNF circuit, which is a netlist of basic logic gates, into a netlist of Xilinx logic cells. The logic cell used depends on which Xilinx product the circuit is to be implemented in. The XACT tools also attempt to optimize the circuit during this step. For example, circuitry associated with unused logic block inputs or outputs is eliminated from the design. In addition, the partitioning program attempts to minimize either the total number of CLBs used or the number of logic stages in the critical delay path.
Place and Route: This step is performed by the CAD tools, manually by the user, or by a mixture of the two. The first step is placement, in which each logic cell generated during the partition step is assigned to a specific location in the FPGA. Automatic placement can be done using the simulated annealing algorithm. After placement, the required interconnections among the logic cells must be realized by selecting wire segments and routing switches within the FPGA interconnection resources. An automatic routing algorithm, based on the maze routing algorithm, is used for this task. Generally placement and routing are done automatically, but sometimes they are done
manually by the user also. With the physical placement and routing completed, exact timing values can now be used to determine chip performance. The XACT tools provide a critical path timing analyzer, which reports delay information for the paths through the chip, from the longest to the shortest. In addition, the physical layout timing information can be back-annotated to the schematics to get more accurate simulation results. The final step in the Xilinx design flow is the creation of the BIT file, which contains the binary programming data needed to configure the SRAM bits of the target chip. This file is then downloaded to configure the chip for final functional and timing tests of the programmed chip.
In the Altera flow, the design must be compiled after it is created. Compilation converts the design into a bitstream that can be downloaded into the FPGA. The most important output of compilation is an SRAM Object File (.sof), which is used to program the device. The software also generates report files that provide information about the code as it compiles.
Simulation is an important part of the design flow, and there are entire applications devoted to simulating hardware designs. There are two types of simulation, RTL and timing. RTL (or functional) simulation allows you to verify that your code is functionally correct. Timing (or post place-and-route) simulation verifies that the design meets timing and functions appropriately in the device.
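The place and route step described earlier names simulated annealing for placement and maze routing for interconnection. Both ideas can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the square grid of slots, two-pin nets, Manhattan wirelength as the cost, the move set and cooling schedule, and a routing grid with blocked cells. It is not the algorithm of any particular vendor tool.

```python
import math
import random
from collections import deque


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)


def anneal(cells, nets, size, steps=3000, temp=5.0, cooling=0.995, seed=0):
    """Simulated-annealing placement; returns (initial placement, best placement)."""
    rng = random.Random(seed)
    slots = [(x, y) for x in range(size) for y in range(size)]
    placement = dict(zip(cells, rng.sample(slots, len(cells))))
    initial = dict(placement)
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    for _ in range(steps):
        cell, target = rng.choice(cells), rng.choice(slots)
        other = next((c for c, s in placement.items() if s == target), None)
        old = placement[cell]
        placement[cell] = target          # move the cell, swapping if the slot is taken
        if other is not None:
            placement[other] = old
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:                             # undo the rejected move
            placement[cell] = old
            if other is not None:
                placement[other] = target
        temp *= cooling
    return initial, best


def maze_route(size, start, goal, blocked=frozenset()):
    """Maze router: breadth-first wave expansion from start, then trace back from goal."""
    previous = {start: None}
    frontier = deque([start])
    while frontier:
        x, y = frontier.popleft()
        if (x, y) == goal:
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in previous):
                previous[nxt] = (x, y)
                frontier.append(nxt)
    return None                           # no route exists


cells = ["a", "b", "c", "d"]
nets = [("a", "b"), ("b", "c"), ("c", "d")]
initial, best = anneal(cells, nets, size=4)
print(wirelength(initial, nets), wirelength(best, nets))
print(maze_route(4, (0, 0), (3, 0), blocked={(1, 0), (1, 1)}))
```

Accepting some cost-increasing moves while the temperature is high lets the placer escape poor local arrangements. The breadth-first expansion in the router guarantees that, if any path exists on the grid, the one returned has the minimum number of steps.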
After completion of the design, its performance is checked either by downloading the configuration bits into the FPGA or by using an interface to a timing simulation program. If the performance is not satisfactory, suitable modifications are made at some point in the design flow. Once the timing and functionality are verified, the implementation is complete.
References:
1. Field-Programmable Gate Arrays, S. D. Brown, R. J. Francis et al.
2. FPGA and CPLD Architectures: A Tutorial, Stephen Brown and Jonathan Rose.
3. FPGA Architecture: Survey and Challenges, Ian Kuon, Russell Tessier and Jonathan Rose.