Description Languages

Zoran Salcic
The University of Auckland

Asim Smailagic
Carnegie Mellon University

Kluwer Academic Publishers
2000

PREFACE TO THE SECOND EDITION

As the response to the first edition of the book has been mostly positive, we felt a responsibility to respond with this second edition. The task of writing has never been easy: the moment you believe you have finished and release the text for printing, you see that many things could be better and get ideas for further improvements and modifications. Digital systems design is a field in which there is no end. Our belief is that with this second edition we have succeeded in improving the book and have made all the modifications that we found necessary or that our numerous colleagues suggested.

This edition comprises a number of changes intended to make it more readable and useful for teaching purposes, and also for the numerous engineers who are entering the field of digital systems design and field-programmable logic devices (FPLDs). In that context, the second edition contains four additional chapters, updated information on current developments in the area of FPLDs, and examples of the most recent developments that lead towards very complex system-on-chip solutions on FPLDs. Some of the new design examples and suggested problems point in the direction of systems on chip. The number of examples is further increased, as we think that the best learning is by example. Besides further emphasis on AHDL as the main language for design specification, the presentation of two other hardware description languages, VHDL and Verilog, is further extended. However, in order to preserve complementarity with another book, "VHDL and FPLDs in Digital Systems Design, Prototyping and Customization" (Zoran Salcic, Kluwer Academic Publishers, 1998), the presentation of VHDL is oriented mostly towards synthesizable designs in FPLDs.
This book focuses on digital systems design and FPLDs, combining them into an entity useful for designers in the areas of digital systems and rapid system prototyping. It is also useful for the growing community of engineers and researchers dealing with the exciting field of FPLDs and of reconfigurable and programmable logic. Our goal is to bring these areas to students studying digital system design, computer design, and related topics, and to show how very complex circuits can be implemented at the desk. Hardware and software designers are getting closer every day through the emerging technologies of in-circuit reconfigurable and in-system programmable logic of very high complexity.

Field-programmable logic has been available for a number of years. The role of FPLDs has evolved from simply implementing the system "glue logic" to implementing very complex system functions, such as microprocessors and microcomputers. The speed with which these devices can be programmed makes them ideal for prototyping and education. Low production cost makes them competitive for small to medium volume production. These devices make new sophisticated applications possible, bring up new hardware/software trade-offs, and diminish the traditional hardware/software demarcation line. Advanced design tools are being developed for automatic compilation of complex designs and routing to custom circuits.

To our knowledge, this book makes a pioneering effort to present rapid prototyping and generation of computer systems using FPLDs. Rapid prototyping systems composed of programmable components show great potential for full implementation of microelectronics designs. Prototyping systems based on FPLDs present many technical challenges affecting system utilization and performance.

The book contains fifteen chapters. Chapter 1 is an introduction to field-programmable logic.
The main types of FPLDs are introduced, including programming technologies, logic cell architectures, and the routing architectures used to interconnect logic cells. Architectural features are discussed to allow the reader to compare different devices appearing on the market, which are sometimes described using confusing terminology that hides the real nature of the devices. The main characteristics of the design process using FPLDs are also discussed, and the differences from design for custom integrated circuits are underlined. The necessity of introducing and using new advanced tools when designing complex digital systems is also emphasized. A new section on typical applications is introduced to show at the very beginning where FPLDs and complex system design are heading.

Chapter 2 describes the field-programmable devices of three major manufacturers in the market: Altera, Xilinx, and Atmel. This does not mean that devices from other manufacturers are inferior to those presented. The purpose of this book is not to compare different devices, but to emphasize the most important features found in the majority of FPLDs and their use in complex digital system prototyping and design. Altera and Xilinx invented some of the concepts found in the major types of field-programmable logic and also produce devices that employ all major programming technologies. Complex Programmable Logic Devices (CPLDs) and Field-Programmable Gate Arrays (FPGAs) are presented in Chapter 2, along with their main architectural and application-oriented features. Although we sometimes use different names to distinguish CPLDs and FPGAs, we usually use the term FPLD to refer to both types of devices. Atmel's devices, on the other hand, offer the option of partial reconfiguration, which makes them potential candidates for a range of new applications.

Chapter 3 covers aspects of the design methodology and design tools used to design with FPLDs.
The need for tightly coupled design frameworks, or environments, is discussed, together with the hierarchical nature of digital systems design. All major design description (entry) tools are briefly introduced, including schematic entry tools and hardware description languages. The complete design procedure, which includes design entry, processing, and verification, is shown in an example of a simple digital system. An integrated design environment for FPLD-based designs, Altera's Max+Plus II environment, is introduced. It includes various design entry, processing, and verification tools. A typical prototyping system, Altera's UP1 board, is also described, as it will be used by many who try the designs presented in the book or make their own designs.

Chapter 4 is devoted to design using Altera's Hardware Description Language (AHDL). First, the basic features of AHDL are introduced without a formal presentation of the language. Small examples are used to illustrate its features and how they are used. Readers can intuitively understand the language and its syntax from the examples. Methods for designing combinatorial logic in AHDL, including the implementation of bidirectional pins, as well as standard sequential circuits such as registers and counters, and state machines, are presented.

Chapter 5 introduces more advanced features of AHDL. Vendor-supplied and user-defined macrofunctions appear as library entities. The implementation of user designs as hierarchical projects consisting of a number of subdesigns is also shown. AHDL, as a lower-level hardware description language, allows user control of resource assignments and very effective control of the design fit to target either speed or size optimization. Still, designs specified in AHDL can be of a behavioral or structural type and can easily be retargeted to another device without changing the design specification.
New AHDL features that enable parameterized designs, as well as conditional generation of logic, are introduced. They provide mechanisms for the design of more general digital circuits and systems that are customized at the time of use and compilation of the design.

Chapter 6 shows how designs can be handled using primarily AHDL, but also in combination with the more convenient schematic entry tools. Two relatively simple design case studies, which include a number of combinational and sequential circuit designs, are shown in this chapter. The first example is an electronic lock that consists of a hexadecimal keypad as the basic input device and a number of LEDs as the output indicators of different states. The lock activates an unlock signal after recognizing the input of a sequence of five digits acting as a kind of password. The second example is a temperature control system that enables temperature control in a small chamber (incubator). The temperature controller continuously scans the current temperature and activates one of two actuators, a lamp for heating or a fan for cooling. The controller allows a low and a high temperature limit to be set, defining the range within which the current temperature should be maintained. It also provides a basic interface with the operator in the form of a hexadecimal keypad as input and a 7-segment display and a couple of LEDs as output. Both designs fit into standard Altera devices.

Chapter 7 includes a more complex example of a simple custom configurable microprocessor called SimP. The microprocessor contains a fixed core that implements a set of instructions and addressing modes, which serve as the base for more complex microprocessors with additional instructions and processing capabilities as needed by a user and/or application. It provides mechanisms that allow designers to extend it in various directions, and with some further modifications it can be converted into a sort of dynamically reconfigurable processor.
Most of the design is specified in AHDL to demonstrate the power of the language.

Chapter 8 presents a case study of a digital system based on the combination of a standard microprocessor and FPLD-implemented logic. The VuMan wearable computer, developed at Carnegie Mellon University (CMU), is presented in this chapter. Examples from the VuMan, including the design of memory interfacing logic and a peripheral controller for the Private Eye head-mounted display, are shown. FPLDs are used as the most appropriate prototyping and implementation technology.

Although AHDL represents an ideal vehicle for learning design with hardware description languages (HDLs), it is an Altera proprietary language and as such cannot be used for other target technologies. That is the reason for expanding the VHDL presentation in the second part of the book. Chapter 9 provides an introduction to VHDL as a more abstract and powerful hardware description language, which is also adopted as an IEEE standard. The goal of this chapter is to demonstrate how VHDL can be used in digital system design. A subset of the language features is used to provide designs that can almost always be synthesized. The features of sequential and concurrent statements, objects, entities, architectures, and configurations allow very abstract approaches to system design, while at the same time controlling the design in terms of versions, reusability, or exchangeability of portions of the design. Combined with the flexibility and potential reconfigurability of FPLDs, VHDL represents a tool that will be used more and more in digital system prototyping and design. This chapter also makes a bridge between a proprietary HDL and a standard one.

Chapter 10 introduces all major mechanisms of VHDL used in the description and design of digital systems. It emphasizes those features not found in AHDL, such as objects and data types.
As VHDL is built around objects and data types, it permits a much higher level of abstraction in describing digital systems. The use of basic objects, such as constants, signals, and variables, is introduced. Mechanisms that allow user-defined data types enable simpler modeling and much more designer-friendly descriptions of designs. Finally, behavioral modeling, enabled by processes as the basic mechanism for describing concurrency, is presented.

Chapter 11 goes a step further to explain how synthesis from VHDL descriptions is performed. This is especially important for those who are not interested in VHDL as a description, documentation, or simulation tool, but whose goal is a synthesized design. Numerous examples are used to show how synthesizable combinational and standard sequential circuits are described. Finite state machines and typical models for Moore and Mealy machine descriptions are also shown.

In Chapter 12 we introduce two full examples. The first example, an input sequence classifier and recognizer, is used to demonstrate the use of VHDL in the design of digital systems that are easily implemented in FPLDs. As the system contains a hierarchy of subsystems, it is also used to demonstrate a typical approach to digital systems design when using VHDL. The second example is a simple asynchronous receiver/transmitter (SART) for serial data transfers. This example is used to further demonstrate the decomposition of a digital system into its parts and their integration at a higher level, as well as the use of behavioral modeling and processes. It also leaves room for adding further user options to make the serial receiver/transmitter as sophisticated as required.

Chapter 13 presents the third hardware description language, one in widespread use in industry: Verilog HDL. The presentation of Verilog is mostly restricted to a subset useful for the synthesis of digital systems. Basic features of the language are presented and their utilization shown.
Chapter 14 is oriented only towards synthesizable models in Verilog. A number of standard combinational and sequential circuits are described by synthesizable models. Those examples provide a clear parallel with modeling the same circuits using other HDLs and demonstrate the power and simplicity of Verilog. They also show why many hardware designers prefer Verilog over VHDL as the language that is primarily suited for digital hardware design.

The final chapter, Chapter 15, is dedicated to the design of a more complex digital system. The SimP microprocessor, introduced in Chapter 7 as an example of a simple general-purpose processor, is redesigned to introduce pipelining. The advantages of Verilog as a language suitable for both behavioral and structural modeling are clearly demonstrated. The pipelined SimP model represents a good base for further experiments with the SimP open architecture and its customization in any desired direction.

The problems given at the end of each chapter are usually linked to, and require extension of, examples presented within that or other chapters. By solving them, the reader will have the opportunity to further develop his or her own skills and feel the real power of both HDLs and FPLDs as an implementation technology. By going through the whole design process, from description and entry through simulation to real implementation, the reader will form his or her own ideas about how to use all these technologies in the best way.

The book is based on lectures we have taught in different courses at Auckland University and CMU, on various projects carried out in the course of different degrees, and on courses for professional engineers who are entering the field of FPLDs and CAD tools for complex digital systems design. As with any book, it is still open and can be improved and enriched with new materials, especially because the subject area is changing rapidly. The whole of Chapter 8 represents a portion of the VuMan project carried out at Carnegie Mellon University.
Some of the original VuMan designs were modified for the purpose of this book at Auckland University.

Special gratitude is directed to the Altera Corporation for enabling us to try many of the concepts using their tools and devices in the course of its University Program Grant and for providing the design software on the CD-ROM included with this book. Altera also made it possible for numerous students at Auckland University to take part in various courses designing digital systems using these new technologies. Our thanks also go to a number of reviewers and colleagues who gave valuable suggestions. We believe that the book will meet their expectations. This book would not have been possible without the supportive environment at Auckland University and Carnegie Mellon University, as well as early support from Cambridge University, Czech Technical University, University of Edinburgh, and Sarajevo University, where we spent memorable years teaching and conducting research.

In the end, when we analyze the final manuscript as it will be printed, the book looks more like a completely new one than like the second edition of the original. Still, as it owes much to its predecessor, we have preserved the main title. However, the subtitle reflects the shift of balance towards hardware description languages, as we explained in this preface.

Z. A. Salcic, Auckland, New Zealand
A. Smailagic, Pittsburgh, USA
May 2000

1 INTRODUCTION TO FIELD PROGRAMMABLE LOGIC DEVICES

Programmable logic design is beginning the same paradigm shift that drove the success of logic synthesis within ASIC design, namely the move from schematics to HDL-based design tools and methodologies. Technology advancements, such as 0.25 micron five-level metal processing, and architectural innovations, such as large amounts of on-chip memory, have significantly broadened the applications for Field-Programmable Logic Devices (FPLDs).

This chapter represents an introduction to field-programmable logic.
The main types of FPLDs are introduced, including programming technologies, logic cell architectures, and the routing architectures used to interconnect logic cells. Architectural features are discussed to allow the reader to compare different devices appearing on the market. The main characteristics of the design process using FPLDs are also discussed, and the differences from design for custom integrated circuits are underlined. In addition, the necessity of introducing and using new advanced tools when designing complex digital systems is emphasized.

1.1. Introduction

FPLDs represent a relatively new development in the field of VLSI circuits. They implement thousands of logic gates in multilevel structures. The architecture of an FPLD, similar to that of a Mask-Programmable Logic Device (MPLD), consists of an array of logic cells that can be interconnected by programming to implement different designs. The major difference between an FPLD and an MPLD is that an MPLD is programmed using integrated circuit fabrication to form metal interconnections, while an FPLD is programmed using electrically programmable switches similar to the ones in traditional Programmable Logic Devices (PLDs). FPLDs can achieve much higher levels of integration than traditional PLDs due to their more complex routing architectures and logic implementation.

The first PLD developed for implementing logic circuits was the field-programmable logic array (PLA). A PLA is implemented using AND-OR logic, with wide-input programmable AND gates followed by a programmable OR gate plane. PLA routing architectures are very simple, with inefficient crossbar-like structures in which every output is connectable to every input through one switch. As such, PLAs are suitable for implementing logic in two-level sum-of-products form.
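The two-level AND-OR structure just described can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real device: the names `pla_eval`, `AND_PLANE`, `OR_PLANE`, and `full_adder`, and the choice of a one-bit full adder as the programmed function, are assumptions made for the example.

```python
def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a two-level AND-OR array (a simplified PLA model).

    and_plane: list of product terms; each term maps an input index to the
               polarity it requires (True = true input, False = inverted input).
    or_plane:  list of outputs; each output is the set of product-term indices
               whose switches into that OR gate are "programmed" closed.
    """
    # Programmable AND plane: a product term is 1 when all its literals match.
    terms = [all(inputs[i] == pol for i, pol in term.items()) for term in and_plane]
    # Programmable OR plane: an output is 1 when any connected term is 1.
    return [any(terms[t] for t in out) for out in or_plane]


# Hypothetical programming: a 1-bit full adder with inputs (a, b, cin).
AND_PLANE = [
    {0: True, 1: True},               # a AND b
    {0: True, 2: True},               # a AND cin
    {1: True, 2: True},               # b AND cin
    {0: True, 1: False, 2: False},    # a AND NOT b AND NOT cin
    {0: False, 1: True, 2: False},    # NOT a AND b AND NOT cin
    {0: False, 1: False, 2: True},    # NOT a AND NOT b AND cin
    {0: True, 1: True, 2: True},      # a AND b AND cin
]
OR_PLANE = [
    {3, 4, 5, 6},   # sum   = OR of the odd-parity product terms
    {0, 1, 2},      # carry = OR of the pairwise product terms
]


def full_adder(a, b, cin):
    """Return (sum, carry) computed by the programmed AND-OR array."""
    s, c = pla_eval(AND_PLANE, OR_PLANE, (a, b, cin))
    return int(s), int(c)
```

Both planes are plain data here, which mirrors the idea that a PLA is customized by programming connections rather than by changing the gates themselves; fixing `OR_PLANE` while leaving `AND_PLANE` programmable would model a device with a single level of programmability.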
The next step in PLD development was the introduction of Programmable Array Logic (PAL) devices with a single level of programmability: programmable AND gates followed by fixed OR gates. In order to allow implementation of sequential circuits, the OR gates are usually followed by flip-flops. A variant of the basic PLD architecture appears in several of today's FPLDs. An FPLD combines multiple simple PLDs on a single chip using programmable interconnect structures. Today such combinations are known as Complex PLDs (or CPLDs), with capacities equivalent to tens of simple PLDs. FPLD routing architectures provide a more efficient MPLD-like routing where each connection typically passes through several switches. FPLD logic is implemented using multiple levels of lower fan-in gates, which is often more compact than two-level implementations.

Building FPLDs with very high capacity requires a different approach, more similar to Mask-Programmable Gate Arrays (MPGAs), which are the highest-capacity general-purpose logic chips. As an MPGA consists of an array of prefabricated transistors that are customized for user logic by means of wire connections, customization during chip fabrication is required. An FPLD that is the field-programmable equivalent of an MPGA is very often known as an FPGA. The end user configures an FPGA through programming. In this text we use FPLD as a term that covers all field-programmable logic devices, including CPLDs and FPGAs.

An FPLD manufacturer makes a single, standard device that users program to carry out desired functions. Field programmability comes at a cost in logic density and performance. FPLD capacity trails MPLD capacity by about a factor of 10, and FPLD performance trails MPLD performance by about a factor of three. Why then FPLDs? FPLDs can be programmed in seconds or minutes, rather than the weeks or months required for production of mask-programmed parts. Programming is done by end users at their site with no IC masking steps.
FPLDs are currently available in densities of over 100,000 gates in a single device. This size is large enough to implement many digital systems on a single chip, and larger systems can be implemented on multiple FPLDs on a standard PCB or in the form of Multi-Chip Modules (MCMs). Although the unit cost of an FPLD is higher than that of an MPLD of the same density, there are no up-front engineering charges for using an FPLD, so FPLDs are more cost-effective for many applications. The result is a low-risk design style, where the price of a logic error is small, both in money and in project delay.

FPLDs are useful for rapid product development and prototyping. They provide very fast design cycles, and when the major value of the product is in its algorithms or in fast time-to-market, they prove to be cost-effective even as the final deliverable product. Since FPLDs are fully tested after manufacture, user designs do not require test program generation, automatic test pattern generation, or design for testability. Some FPLDs have found a suitable place in designs that require reconfiguration of the hardware structure during system operation, so that functionality can change "on the fly."

An illustration of device option ratings for standard discrete logic, FPLDs, and custom logic is given in Figure 1.1. Although not quantitative, the figure demonstrates many advantages of FPLDs over other types of available logic.

Figure 1.1 Device options ratings for different device technologies (the chart rates FPLD, discrete logic, and custom logic as very effective, adequate, or poor against these requirements: speed, density, cost, development, prototyping and simulation, manufacturing, future modification, inventory, and development tools)

The purpose of Figure 1.1 and this discussion is to point out some of the major features of currently used options for digital system design, and to show why we consider FPLDs the most promising technology for implementation of a very large number of digital systems.

Until recently only two major options were available to digital system designers. First, they could use Small-Scale Integrated (SSI) and Medium-Scale Integrated (MSI) circuits to implement a relatively small amount of logic with a large number of devices. Second, they could use a Mask-Programmed Gate Array (MPGA), or simply gate array, to implement tens or hundreds of thousands of logic gates on a single integrated circuit in multi-level logic with wiring between logic levels. The wiring of the logic is built during the manufacturing process, requiring a custom mask for the wiring. Low-volume MPGAs have been expensive due to high mask-making charges.

As intermediate solutions during the 1980s and early 1990s, various kinds of simple PLDs (PLAs, PALs) were available. A simple PLD is a general-purpose logic device capable of implementing the logic of tens or hundreds of SSI circuits, with logic functions customized in the field using inexpensive programming hardware. Large designs require a multi-level logic implementation, introducing high power consumption and large delays.

FPLDs offer the benefits of both PLDs and MPLDs. They allow the implementation of thousands of logic gates in a single circuit and can be programmed by designers on site, without requiring expensive manufacturing processes. The discussion below is largely targeted at a comparison of FPLDs and MPLDs as the technologies suitable for complex digital system design and implementation.
1.1.1 Speed

FPLDs offer devices that operate at speeds exceeding 200 MHz in many applications. Obviously, speeds are higher than in systems implemented by SSI circuits, but lower than the speeds of MPLDs. The main reason for this comes from FPLD programmability. Programmable interconnect points add resistance to the internal path, while programming points in the interconnect mechanism add capacitance to the internal path. Despite these disadvantages when compared to MPLDs, FPLD speed is adequate for most applications. Also, some dedicated architectural features of FPLDs can eliminate unneeded programmability in speed-critical paths.

As FPLDs move to faster processes, application speed can be increased by simply buying and using a faster device without design modification. The situation with MPLDs is quite different; new processes require new mask-making and increase the overall product cost.

1.1.2 Density

FPLD programmability introduces on-chip programming overhead circuitry, requiring area that cannot be used by designers. As a result, the same amount of logic will always be larger and more expensive in an FPLD than in an MPLD. However, a large area of the die cannot be used for core functions in MPLDs due to I/O pad limitations. The use of this wasted area for field programmability does not result in an increase of area for the resulting FPLD. Thus, for a given number of gates, the size of an MPLD and an FPLD is dictated by the I/O count, so the FPLD and MPLD capacity will be the same. This is especially true with the migration of FPLDs to submicron processes. MPLD manufacturers have already shifted to high-density products, leaving designs with fewer than 20,000 gates to FPLDs.

1.1.3 Development Time

FPLD development is followed by the development of tools for system design. All those tools are high-level tools, affordable even to very small design houses.
The development time primarily includes prototyping and simulation, while the other phases, including time-consuming test pattern generation, mask-making, wafer fabrication, packaging, and testing, are completely avoided. This leads to typical development times for FPLD designs measured in days or weeks, in contrast to MPLD development times of several weeks or months.

1.1.4 Prototyping and Simulation Time

While the MPLD manufacturing process takes weeks or months from design completion to the delivery of finished parts, FPLDs require only design completion. Modifications to correct a design flaw are made quickly and easily, providing a short turnaround time that leads to faster product development and shorter time-to-market for new FPLD-based products.

Proper verification requires MPLD users to verify their designs by extensive simulation before manufacture, introducing all of the drawbacks of the speed/accuracy trade-off connected with any simulation. In contrast, FPLD simulations are much simpler because timing characteristics and models are known in advance. Also, many designers avoid simulation completely and choose in-circuit verification. They implement the design and use a functioning part as a prototype that operates at full speed and with absolute time accuracy. A prototype can be easily changed and reinserted into the system within minutes or hours.

FPLDs provide low-cost prototyping, while MPLDs provide low-cost volume production. This leads to prototyping on an FPLD and then switching to an MPLD for volume production. Usually there is no need for design modification when retargeting to an MPLD, except sometimes when timing path verification fails. Some FPLD vendors offer mask-programmed versions of their FPLDs, giving users the flexibility and advantages of both implementation methods.

1.1.5 Manufacturing Time

All integrated circuits must be tested to verify manufacturing and packaging.
The test is different for each design. MPLDs typically incur three types of costs associated with testing: on-chip logic to enable easier testing; generation of test programs for each design; and testing the parts when manufacturing is complete.

Because FPLDs have a simple and repeatable structure, the test program for one FPLD device is the same for all designs and all users of that part. This further justifies all reasonable efforts and investments to produce extensive and high-quality test programs that will be used during the lifetime of the FPLD. Users are not required to write design-specific tests, because manufacturer testing verifies that every FPLD will function for all possible designs implemented. The consequences of manufacturing chips from both categories are obvious. Once verified, FPLDs can be manufactured in any quantity and delivered as fully tested parts ready for design implementation, while MPLDs require separate production preparation for each new design.

1.1.6 Future Modifications

Instead of customizing the part in the manufacturing process, as for MPLDs, FPLDs are customized by electrical modifications. The electrical customization takes milliseconds or minutes and can even be performed without special devices, or with low-cost programming devices. Moreover, it can usually be performed in-system, meaning that the part can already be on the printed circuit board, reducing the danger of damage due to careless handling. On the other hand, every modified design to be implemented in an MPLD requires a custom mask that costs several thousand dollars, which can only be amortized over the total number of units manufactured.

1.1.7 Inventory Risk

An important feature of FPLDs is low inventory risk, similar to SSI and MSI parts. Since actual manufacturing is done at the time of programming a device, the same part can be used for different functionality and different designs.
This is not the case for an MPLD, since its functionality and application are fixed forever once it is produced. Also, the decision on the volume of MPLDs must be made well in advance of the delivery date, requiring concern about the probability that too many or not enough parts are ordered for manufacture.

Generally, FPLDs are associated with very low-risk design in terms of both money and delays. Rapid and easy prototyping enables all errors to be corrected with short delays, and also gives designers the chance to try riskier logic designs in the early stages of product development. Development tools used for FPLD designs usually integrate the whole range of design entry, processing, and simulation tools, which enables easy reuse of all parts of a correct design.

FPLD designs can be made with the same design entry tools used in traditional MPLD and Application-Specific Integrated Circuit (ASIC) development. The resulting netlist is further manipulated by FPLD-specific fitting, placement, and routing algorithms that are available either from FPLD manufacturers or from CAE vendors. However, FPLDs also allow designing at a very low, device-dependent level, providing the best device utilization if needed.

1.1.8 Cost

Finally, the options introduced above are reflected in the costs. The major benefit of an MPLD-based design is low cost in large quantities. The actual volume of the product determines which technology is more appropriate. FPLDs have much lower costs of design development and modification, including initial Non-Recurring Engineering (NRE) charges, tooling, and testing costs. However, larger die area and lower circuit density result in higher manufacturing costs per unit. The break-even point depends on the application and volume, and is usually between ten and twenty thousand units for large-capacity FPLDs.
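The break-even argument can be made concrete with a simple linear cost model, in which the total cost is the one-off NRE charge plus the unit cost times the volume. The sketch below is only an illustration under that assumption; the function names and every dollar figure in it are hypothetical and are not taken from the text.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of a production run: one-off NRE plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(nre_fpld, unit_fpld, nre_mpld, unit_mpld):
    """Volume at which FPLD and MPLD total costs are equal.

    Solves nre_fpld + unit_fpld * n == nre_mpld + unit_mpld * n for n.
    Assumes the FPLD has the higher unit cost and the lower NRE.
    """
    if unit_fpld <= unit_mpld:
        raise ValueError("model assumes the FPLD unit cost exceeds the MPLD unit cost")
    return (nre_mpld - nre_fpld) / (unit_fpld - unit_mpld)


# Hypothetical figures, chosen only to illustrate the calculation.
FPLD_NRE, FPLD_UNIT = 0.0, 20.0          # no mask charges, higher unit price
MPLD_NRE, MPLD_UNIT = 150_000.0, 10.0    # mask and tooling charges, lower unit price

n_star = break_even_volume(FPLD_NRE, FPLD_UNIT, MPLD_NRE, MPLD_UNIT)
```

Below `n_star` units the FPLD is the cheaper option under this model, and above it the MPLD is. Real pricing is rarely linear in volume, so the result should be read as a rough guide only.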
This limit is even higher when an integrated volume production approach is applied, using a combination of FPLDs and their corresponding mask-programmed counterparts. Integrated volume production also introduces further flexibility, satisfying short-term needs with FPLDs and long-term needs at the volume level with mask-programmed devices.

1.2 Types of FPLDs

The general architecture of an FPLD is shown in Figure 1.2. A typical FPLD consists of a number of logic cells that are used for implementation of logic functions. Logic cells are arranged in the form of a matrix. Interconnection resources connect logic cell outputs and inputs, as well as the input/output blocks used to connect the FPLD with the outer world. Despite the same general structure, concrete implementations of FPLDs differ among the major competitors. There are differences in the approach to circuit programmability, internal logic cell structure, input/output blocks, and routing mechanisms. An FPLD logic cell can be as simple as a transistor or as complex as a microprocessor. Typically, it is capable of implementing combinational and sequential logic functions of different complexities.

Figure 1.2 FPLD architecture (logic cells, interconnect resources, I/O blocks)

Current commercial FPLDs employ logic cells that are based on one or more of the following:
- Transistor pairs
- Basic small gates, such as two-input NANDs or XORs
- Multiplexers
- Look-up tables (LUTs)
- Wide-fan-in AND-OR structures

Three major programming technologies, each associated with area and performance costs, are commonly used to implement the programmable switch for FPLDs.
These are:
- Static Random Access Memory (SRAM), where the switch is a pass transistor controlled by the state of an SRAM bit
- EPROM, where the switch is a floating-gate transistor that can be turned off by injecting charge onto its floating gate
- Antifuse, which, when electrically programmed, forms a low-resistance path

In all cases, a programmable switch occupies a larger area and exhibits much higher parasitic resistance and capacitance than a typical contact used in a custom MPLD. Additional area is also required for programming circuitry, resulting in lower density and lower speed of FPLDs compared to MPLDs.

An FPLD routing architecture incorporates wire segments of varying lengths which can be interconnected with electrically programmable switches. The density achieved by an FPLD depends on the number of wires incorporated. If the number of wire segments is insufficient, only a small fraction of the logic cells can be utilized. An excessive number of wire segments wastes area. The distribution of wire segments greatly affects both the density and the performance of an FPLD. For example, if all segments stretch over the entire length of the device (so-called long segments), implementing local interconnections costs area and time. On the other hand, employing only short segments requires long interconnections to be implemented using many switches in series, resulting in unacceptably large delays.

Both density and performance can be optimized by choosing the appropriate granularity and functionality of the logic cell, as well as by designing the routing architecture to achieve a high degree of routability while minimizing the number of switches. Various combinations of programming technology, logic cell architecture, and routing mechanisms lead to various designs suitable for specific applications. A more detailed presentation of all major components of FPLD architectures is given in the sections and chapters that follow.
If programming technology and device architecture are combined, three major categories of FPLDs are distinguished:
- Complex Programmable Logic Devices (CPLDs)
- Static RAM Field Programmable Gate Arrays, or simply FPGAs
- Antifuse FPGAs

In this section we present the major features of these three categories of FPLDs.

1.2.1 CPLDs

A typical CPLD architecture is shown in Figure 1.3. The user creates logic interconnections by programming EPROM or EEPROM transistors to form wide fan-in gates.

Figure 1.3 Typical CPLD architecture

Function Blocks (FBs) are similar to a simple two-level PLD. Each FB contains a PLD AND-array that feeds its macrocells (MCs). The AND-array consists of a number of product terms. The user programs the AND-array by turning on EPROM transistors that allow selected inputs to be included in a product term. A macrocell includes an OR gate to complete the AND-OR logic and may also include registers and an I/O pad. It can also contain additional EPROM cells to control multiplexers that select a registered or non-registered output and decide whether or not the macrocell result is output on the I/O pad at that location. Macrocell outputs are connected as additional FB inputs or as the inputs to a global universal interconnect mechanism (UIM) that reaches all FBs on the chip. FBs, macrocells, and interconnect mechanisms vary from one product to another, giving a range of device capacities and speeds.

1.2.2 Static RAM FPGAs

In SRAM FPGAs, static memory cells hold the program that represents the user design. SRAM FPGAs implement logic as look-up tables (LUTs) made from memory cells, with the function inputs controlling the address lines. Each LUT of 2^n memory cells implements any function of n inputs. One or more LUTs, combined with flip-flops, form a logic block (LB).
LBs are arranged in a two-dimensional array with interconnect segments in channels, as shown in Figure 1.4.

Figure 1.4 Typical SRAM FPGA architecture

Interconnect segments connect to LB pins in the channels and to the other segments in the switch boxes through pass transistors controlled by configuration memory cells. The switch boxes, because of their high complexity, are not full crossbar switches.

An SRAM FPGA program consists of a single long program word. On-chip circuitry loads this word, reading it serially out of an external memory every time power is applied to the chip. The program bits set the values of all configuration memory cells on the chip, thus setting the look-up table values and selecting which segments connect to each other. SRAM FPGAs are inherently reprogrammable. They can be easily updated, providing designers with new capabilities such as reconfigurability.

1.2.3 Antifuse FPGAs

An antifuse is a two-terminal device that, when exposed to a very high voltage, forms a permanent short circuit (the opposite of a fuse) between the nodes on either side. Individual antifuses are small, enabling an antifuse-based architecture to have thousands or millions of antifuses. An antifuse FPGA, as illustrated in Figure 1.5, usually consists of rows of configurable logic elements with interconnect channels between them, much like traditional gate arrays. The pins on logic blocks (LBs) extend into the channel. An LB is usually a simple gate-level network, which the user programs by connecting its input pins to fixed values or to interconnect nets. There are antifuses at every wire-to-pin intersection point in the channel and at all wire-to-wire intersection points where channels intersect.
Figure 1.5 Antifuse FPGA architecture

Commercial FPLDs use different programming technologies, different logic cell architectures, and different routing architectures. A survey of major commercial architectures is given in the rest of this part, and a more detailed presentation of FPLD families from two major manufacturers, Xilinx and Altera, is given in Part 2. The majority of design examples introduced in later chapters are illustrated using Altera's FPLDs.

1.3 Programming Technologies

An FPLD is programmed using electrically programmable switches. The first user-programmable switch was the fuse used in simple PLDs. For higher-density devices, especially in the dominant CMOS IC industry, different approaches are used to achieve programmable switches. The properties of these programmable switches, such as size, volatility, process technology, on-resistance, and capacitance, determine the major features of an FPLD architecture. In this section we introduce the programmable switch technologies most commonly used in commercial FPLDs.

1.3.1 SRAM Programming Technology

SRAM programming technology uses static RAM cells to configure logic and to control intersections and paths for signal routing. The configuration is done by controlling pass gates or multiplexers, as illustrated in Figure 1.6. When a "1" is stored in the SRAM cell in Figure 1.6(a), the pass gate acts as a closed switch and can be used to make a connection between two wire segments. For the multiplexer, the state of the SRAM cells connected to the select lines controls which of the multiplexer's inputs is connected to the output, as shown in Figure 1.6(b). Reprogrammability allows the circuit manufacturer to test all paths in the FPGA by reprogramming it on the tester. The users get well-tested parts and 100% "programming yield" with no design-specific test patterns and no "design for testability."
Since on-chip programming is done with memory cells, the part can be programmed an unlimited number of times. This allows prototyping to proceed iteratively, re-using the same chip for new design iterations. Reprogrammability has advantages in systems as well. In cases where parts of the logic in a system are not needed simultaneously, they can be implemented in the same reprogrammable FPGA, and the FPGA logic can be switched between applications.

Besides volatility, a major disadvantage of SRAM programming technology is its large area. At least five transistors are needed to implement an SRAM cell, plus at least one transistor to implement a programmable switch. A typical five-transistor memory cell is illustrated in Figure 1.7. There is no separate RAM area on the chip. The memory cells are distributed among the logic elements they control. Since FPGA memories do not change during normal operation, they are built for stability and density rather than speed. However, SRAM programming technology has two further major advantages: fast reprogrammability, and the fact that it requires only standard integrated circuit process technology.

Figure 1.6 SRAM programming technology: (a) pass gate controlled by an SRAM cell, (b) 4-to-1 multiplexer controlled by two SRAM cells

Since SRAM is volatile, the FPGA must be loaded and configured at the time of chip power-up. This requires external permanent memory, such as PROM, EPROM, EEPROM, or magnetic disk, to provide the programming bitstream. For this reason, SRAM-programmable FPGAs include logic to sense power-on and to initialize themselves automatically, provided the application can wait the tens of milliseconds required to program the device.
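The two configuration mechanisms of Figure 1.6 can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the function names `pass_gate` and `mux4` are hypothetical, signals are plain 0/1 integers, and an open switch is represented by `None` rather than by a floating electrical node.

```python
def pass_gate(sram_bit, signal):
    """Model of Figure 1.6(a): the SRAM cell opens or closes the switch.

    Returns the driving signal when the stored bit is 1 (closed switch)
    and None when it is 0 (open switch, no connection made).
    """
    return signal if sram_bit == 1 else None


def mux4(sram_bits, inputs):
    """Model of Figure 1.6(b): two SRAM cells drive the select lines.

    sram_bits is (bit1, bit0); inputs is a sequence of four signals.
    """
    bit1, bit0 = sram_bits
    return inputs[2 * bit1 + bit0]


# Stored configuration: close the pass gate, route input 2 through the mux.
connected = pass_gate(1, 1)
routed = mux4((1, 0), [0, 0, 1, 0])
```

Changing the stored bits changes the connection without touching the wiring, which is the sense in which the device is reprogrammable.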
Figure 1.7 Five-transistor memory cell

1.3.2 Floating Gate Programming Technology

Floating gate programming technology uses the technology of ultraviolet-erasable EPROM and electrically erasable EEPROM devices. The programmable switch, as shown in Figure 1.8, is a transistor that can be permanently "disabled." To disable the transistor, a charge is injected onto the floating polysilicon gate using a high voltage between the control gate and the drain of the transistor. This charge increases the threshold voltage of the transistor so that it turns off. The charge is removed by exposing the floating gate to ultraviolet light. This lowers the threshold voltage of the transistor and makes the transistor function normally. Rather than using an EPROM transistor directly as a programmable switch, the unprogrammed transistor is used to pull down a "bit line" when the "word line" is set high. While this approach can be used simply to provide a connection between word and bit lines, it can also be used to implement a wired-AND style of logic, in that way providing both logic and routing.

Figure 1.8 Floating gate programming technology

The major advantage of the EPROM programming technology is its reprogrammability. An advantage over SRAM is that no external permanent memory is needed to program a chip at power-on. On the other hand, reconfiguration itself cannot be done as fast as in SRAM technology devices. Additional disadvantages are that EPROM technology requires three more processing steps than an ordinary CMOS process, the high on-resistance of an EPROM transistor, and the high static power consumption due to the pull-up resistor used.
EEPROM technology used in some devices is similar to the EPROM approach, except that removal of the gate charge can be done electrically, in-circuit, without ultraviolet light. This gives the advantage of easy reprogrammability, but requires more space because an EEPROM cell is roughly twice the size of an EPROM cell.

1.3.3 Antifuse Programming Technology

An antifuse is an electrically programmable two-terminal device. It irreversibly changes from high resistance to low resistance when a programming voltage (in excess of normal signal levels) is applied across its terminals. Antifuses offer several unique features for FPGAs, most notably a relatively low on-resistance of 100-600 ohms and a small size. The layout area of an antifuse cell is generally smaller than the pitch of the metal lines it connects; it is about the same size as a via connecting metal lines in an MPLD. When a high voltage (11 to 20 volts) is applied across its terminals, the antifuse will "blow" and create a low-resistance link. This link is permanent. Antifuses are built either using an Oxide-Nitride-Oxide (ONO) dielectric between an N+ diffusion and polysilicon, or using amorphous silicon between metal layers or between polysilicon and the first layer of metal.

Programming an antifuse requires extra circuitry to deliver the high programming voltage and a relatively high current of 5 mA or more. This is done through large transistors that provide addressing to each antifuse.

Antifuses are normally "off" devices. Only the small fraction of the total that needs to be turned on must be programmed (about 2% for a typical application). So, other things being equal, programming is faster with antifuses than with "normally on" devices.

Antifuse reliability must be considered for both the unprogrammed and programmed states. Time-dependent dielectric breakdown (TDDB) reliability over 40 years is an important consideration.
It is equally important that the resistance of a programmed antifuse remains low during the life of the part. Analysis of ONO dielectrics shows that their resistance does not increase with time. Additionally, the parasitic capacitance of an unprogrammed amorphous antifuse is significantly lower than for other programming technologies.

1.3.4 Summary of Programming Technologies

The major properties of each of the programming technologies presented above are shown in Table 1.1. All data assume a 1.2 µm CMOS process technology and are used only for comparison purposes. The most recent devices achieve much higher densities, and many of them are implemented in 0.5 or even 0.22 µm CMOS process technology, with the tendency to reduce it even further (0.18 µm and 0.15 µm).

Table 1.1 Comparison of programming technologies

Technology and process                Volatile  Reprogrammable       Area                                  R (ohm), on switch  C (fF), parasitic  Extra fabrication steps
SRAM (mux, pass trans.), 1.2 µm CMOS  Yes       Yes, in-circuit      Large                                 0.5-2K              10-20              0
ONO antifuse, 1.2 µm CMOS             No        No                   Fuse small, programming trans. large  300-600             5                  3
Amorphous antifuse, 1.2 µm CMOS       No        No                   Fuse small, programming trans. large  50-100              1.1-1.3            3
EPROM, 1.2 µm CMOS                    No        Yes, out of circuit  Small in array                        2-4K                10-20              3
EEPROM, 1.2 µm CMOS                   No        Yes, in-circuit      2 x EPROM                             2-4K                10-20              >5

1.4 Logic Cell Architecture

In this section we present a survey of commercial FPLD logic cell architectures in use today, including their combinational and sequential portions. FPLD logic cells differ both in size and in implementation capability. A two-transistor logic cell can only implement a small inverter, while a look-up table logic cell can implement any logic function of several input variables and is significantly larger. To capture these differences we usually classify logic blocks by their granularity.
Since granularity can be defined in various ways (as the number of Boolean functions that the logic block can implement, the number of two-input AND gates, the total number of transistors, etc.), we choose to classify commercial blocks into just two categories: fine-grain and coarse-grain.

Fine-grain logic cells resemble MPLD basic cells. The most fine-grain logic cell would be identical to a basic cell of an MPLD and would consist of a few transistors that can be programmably interconnected.

The FPGA from Crosspoint Solutions uses a single transistor pair in the logic cell. In addition to the transistor pair tiles, as depicted in Figure 1.9, the Crosspoint FPGA has a second type of logic cell, called a RAM logic tile, that is tuned for the implementation of random access memory but can also be used to build other logic functions.

Figure 1.9 Transistor pair tiles in the Crosspoint FPGA

A second example of a fine-grain FPGA architecture is the FPGA from Plessey. Here the basic cell is a two-input NAND gate, as illustrated in Figure 1.10. Logic is formed in the usual way by connecting the NAND gates to implement the desired function. If the latch is not needed, the configuration memory is set to make the latch permanently transparent.

Several other commercial FPGAs employ fine-grain logic cells. The main advantage of using fine-grain logic cells is that the usable cells are fully utilized. This is because it is easier to use small logic gates efficiently, and the logic synthesis techniques for such cells are very similar to those for conventional MPGAs (Mask-Programmable Gate Arrays) and standard cells.

The main disadvantage of fine-grain cells is that they require a relatively large number of wire segments and programmable switches. Such routing resources are costly in both delay and area.
If a function that could be packed into a few complex cells must instead be distributed among many simple cells, more connections must be made through the programmable routing network. As a result, FPLDs with fine-grain logic cells are in general slower and achieve lower densities than those using coarse-grain logic cells.

Figure 1.10 The Plessey logic cell

As a rule of thumb, an FPLD should be as fine-grained as possible while maintaining good routability and routing delay for the given switch technology. The cell should be chosen to implement a wide variety of functions efficiently, yet have minimum layout area and delay.

Actel's logic cells have been designed on the basis of a usage analysis of various logic functions in actual gate array applications. The Act-1 family uses one general-purpose logic cell, as shown in Figure 1.11. The cell is composed of three 2-to-1 multiplexers, one OR gate, 8 inputs, and one output. Various macrofunctions (AND, NOR, flip-flops, etc.) can be implemented by applying each input signal to the appropriate cell inputs and tying other cell inputs to 0 or 1. The cell can implement all combinational functions of two inputs, all functions of three inputs with at least one positive input, many functions of four inputs, and some ranging up to eight inputs. Any sequential macro can be implemented from one or more cells using appropriate feedback routings.

Figure 1.11 Act-1 logic cell

Further analysis of macros indicates that a significant proportion of the nets driving the data input of a flip-flop have no other fan-out. This motivated the use of a mixture of two specialized cells for the Act-2 and Act-3 families.
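Returning to the Act-1 cell of Figure 1.11, the way macrofunctions arise from tying inputs to constants can be illustrated with a short behavioral sketch. The wiring used here is an assumption inferred from the pin names in the figure (A0, A1, SA, B0, B1, SB, S0, S1, OUT): two first-level multiplexers selected by SA and SB feed an output multiplexer whose select line is S0 OR S1. The helper names `mux2`, `act1_cell`, `and2`, `or2`, and `xor2` are hypothetical.

```python
def mux2(sel, d0, d1):
    """2-to-1 multiplexer: d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def act1_cell(a0, a1, sa, b0, b1, sb, s0, s1):
    """Assumed Act-1 wiring: three 2-to-1 muxes and one OR gate."""
    upper = mux2(sa, a0, a1)          # first-level mux A
    lower = mux2(sb, b0, b1)          # first-level mux B
    return mux2(s0 | s1, upper, lower)  # OR gate drives the output select


def and2(x, y):
    # Only the A mux is used: pass y when x is 1, constant 0 otherwise.
    return act1_cell(a0=0, a1=y, sa=x, b0=0, b1=0, sb=0, s0=0, s1=0)


def or2(x, y):
    # The built-in OR gate selects constant 1 when either input is 1.
    return act1_cell(a0=0, a1=0, sa=0, b0=1, b1=1, sb=0, s0=x, s1=y)


def xor2(x, y):
    # A mux yields y, B mux yields NOT y, x chooses between them.
    return act1_cell(a0=0, a1=1, sa=y, b0=1, b1=0, sb=y, s0=x, s1=0)
```

Each macro uses one cell and differs only in which pins receive signals and which are tied to 0 or 1, which is the mechanism the text describes.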
The "C" cell and its equivalent, shown in Figure 1.12, are modified versions of the Act-1 cell, re-optimized to better accommodate high fan-in combinational macros. The cell effectively comprises a 4-to-1 multiplexer and two gates, implementing a total of 766 distinct combinational functions.
Figure 1.12 Act-2 "C" cell

The "S" cell, shown in Figure 1.13, consists of a front end equivalent to the "C" cell followed by a sequential block built around two latches. The sequential block can be used as a rising- or falling-edge D flip-flop or as a transparent-high or transparent-low latch, by tying the C1 and C2 inputs to a clock signal, logical zero, or logical one in various combinations. For example, tying C1 to 0 and clocking C2 implements a rising-edge D flip-flop. Toggle or enabled flip-flops can be built using the combinational front end in addition to the D flip-flop. JK or SR flip-flops can be configured from one or more "C" or "S" cells using external feedback connections. A chip with an equal mixture of "C" and "S" cells provides sufficient flip-flops for most designs plus extra flexibility in placement. Over a range of designs, the Act-2 mixture provides about 40-100% greater logic capacity per cell than the Act-1 cell.

Figure 1.13 Act-2 "S" cell

The logic cell in the FPLD from QuickLogic is similar to the Actel logic cell in that it uses a 4-to-1 multiplexer. Each input to the multiplexer is fed by an AND gate, as shown in Figure 1.14. Alternating inputs to the AND gates are inverted, allowing input signals to be passed in true or complement form and therefore eliminating the need to use extra logic cells to perform simple inversions.

Multiplexer-based logic cells provide a large degree of functionality for a relatively small number of transistors. However, this is achieved at the expense of a large number of inputs, which places high demands on the routing resources. They are best suited to FPLDs that use small programmable switches such as antifuses.
Figure 1.14 The QuickLogic logic cell

Xilinx logic cells are based on the use of SRAM as a look-up table. The truth table for a K-input logic function is stored in a 2^K x 1 SRAM, as illustrated in Figure 1.15. The address lines of the SRAM function as inputs, and the output (data) line of the SRAM provides the value of the logic function. The major advantage of a K-input look-up table is that it can implement any function of K inputs. The disadvantage is that it becomes unacceptably large for more than five inputs, since the number of memory cells needed for a K-input look-up table is 2^K. Since many of the possible logic functions are not commonly used, a large look-up table will be largely underutilized.

Figure 1.15 Look-up table

The Xilinx 3000 series logic cell contains a five-input, one-output look-up table. This block can be configured as two four-input LUTs if the total number of distinct inputs is not greater than five. The logic cell also contains sequential logic (two D flip-flops) and several multiplexers that connect combinational inputs and outputs to the flip-flops or outputs.

The Xilinx 4000 series logic cell contains two four-input look-up tables feeding into a three-input LUT. All of the inputs are distinct and available external to the logic cell. The other difference from the 3000 series cell is the use of two nonprogrammable connections from the two four-input LUTs to the three-input LUT. These connections are much faster since no programmable switches are used in series. A detailed explanation of the Xilinx 3000 and 4000 series logic cells is given in Chapter 2, since they represent two of the most popular and widely used FPGAs.
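The look-up table principle described above can be sketched as a small behavioral model. This is an illustration only, not a description of any Xilinx circuit: the class name `LookUpTable` and its methods are hypothetical, and the first input is assumed to be the most significant address bit.

```python
from itertools import product


class LookUpTable:
    """K-input LUT modeled as a 2**K x 1 memory addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Programming" the LUT: store one truth-table bit per address.
        self.cells = [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

    def read(self, *inputs):
        # The function inputs form the address; the first input is the MSB.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.cells[address]


# A 3-input majority function needs 2**3 = 8 memory cells.
majority = LookUpTable(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
cell_count = len(majority.cells)
```

Because `cells` holds 2**k entries, each added input doubles the storage, which illustrates why large look-up tables become impractical.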
Other popular families of FPLDs with coarse-grain logic cells are Altera's EPLDs and CPLDs. The architecture of the Altera 5000 and 7000 series EPLDs has evolved from a PLA-based architecture with logic cells consisting of wide fan-in (20 to over 100 inputs) AND gates feeding into an OR gate with three to eight inputs. They employ a floating-gate-transistor-based programmable switch that enables an input wire to be connected to an input of the gate, as shown in Figure 1.16. The three product terms are then OR-ed together and can be programmably inverted by an XOR gate, which can also be used to produce other arithmetic functions. Each signal is provided in both true and complement form on two separate wires. The programmable inversion significantly increases the functional capability of the block.

Figure 1.16 The Altera 5000 series logic block

The advantage of this type of block is that the wide AND gate can be used to form logic functions with few levels of logic cells, reducing the need for programmable interconnect resources. However, it is difficult to make efficient use of the inputs to all of the gates. This loss is compensated by the high packing density of the wired AND gates. Some shortcomings of the 5000 series devices are overcome in the 7000 series; most notably, it provides two more product terms and has more flexibility because neighboring blocks can "borrow" product terms from each other.

The Altera Flex 8000 and 10K series CPLDs are SRAM-based devices providing low stand-by power and in-circuit reconfigurability. A logic cell contains a 4-input LUT that provides combinational logic capability and a programmable register that offers sequential logic capability. High system performance is provided by a fast, continuous network of routing resources. A detailed description of both major series of Altera's CPLDs is given in Chapter 2.
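The AND-OR-XOR structure of the product-term block of Figure 1.16 can be illustrated with a simplified behavioral sketch. The names `product_term` and `macrocell` and the connection-mask encoding are assumptions made for illustration; they do not describe Altera's actual programming data.

```python
def product_term(inputs, true_mask, comp_mask):
    """Wide AND gate with programmable connections.

    true_mask[i] = 1 connects input i in true form,
    comp_mask[i] = 1 connects input i in complement form.
    """
    result = 1
    for x, use_true, use_comp in zip(inputs, true_mask, comp_mask):
        if use_true:
            result &= x
        if use_comp:
            result &= 1 - x
    return result


def macrocell(inputs, terms, invert):
    """OR the product terms together, then apply programmable inversion."""
    total = 0
    for true_mask, comp_mask in terms:
        total |= product_term(inputs, true_mask, comp_mask)
    return total ^ invert  # the XOR gate acts as a programmable inverter


# XNOR(a, b) built as the inverted sum of products a*b' + a'*b.
xnor_terms = [((1, 0), (0, 1)),   # a AND NOT b
              ((0, 1), (1, 0))]   # NOT a AND b
value = macrocell([1, 1], xnor_terms, invert=1)
```

Here the inversion bit lets a function with a compact complement be realized with two product terms, which illustrates how programmable inversion extends the capability of the block.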
Most of the logic cells described above include some form of sequential logic. The Xilinx devices have two D flip-flops, while the Altera devices have one D flip-flop per logic cell. Some devices, such as Act-1, do not explicitly include sequential logic, forming it instead from programmable routing and combinational logic cells.

1.5 Routing Architecture

The routing architecture of an FPLD determines the way in which the programmable switches and wiring segments are positioned to allow the programmable interconnection of logic cells. A routing architecture for an FPLD must meet two criteria: routability and speed.

Routability refers to the capability of an FPLD to accommodate all the nets of a typical application, despite the fact that the wiring segments must be defined at the time the blank FPLD is made. Only switches connecting wiring segments can be programmed (customized) for a specific application, not the numbers, lengths, or locations of the wiring segments themselves. The goal is to provide a sufficient number of wiring segments while not wasting chip area. It is also important that the routing of an application can be determined by an automated algorithm with minimal intervention.

Propagation delay through the routing is a major factor in FPLD performance. After routing an FPLD, the exact segments and switches used to establish each net are known, and the delay from the driving output to each input can be computed. Any programmable switch (EPROM, pass transistor, or antifuse) has a significant resistance and capacitance. Each time a signal passes through a programmable switch, another RC stage is added to the propagation delay. For a fixed R and C, the propagation delay grows quadratically with the number of series RC stages. The use of a low-resistance switch, such as an antifuse, keeps the delay low and its distribution tight. Of equal significance is optimization of the routing architecture.
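The quadratic growth of delay with the number of series switches can be illustrated with a first-order estimate. The sketch below assumes the Elmore delay approximation for a uniform RC ladder, which is a modeling choice made here and is not stated in the text; the function name `elmore_delay` is hypothetical, and the example values (1 kilohm, 10 fF) are merely picked from the SRAM switch range of Table 1.1.

```python
def elmore_delay(n_stages, r_ohm, c_farad):
    """Elmore delay of n identical series RC stages.

    The capacitor at stage i is charged through i resistors, so the
    total is R*C*(1 + 2 + ... + n) = R*C*n*(n + 1)/2.
    """
    delay = 0.0
    for i in range(1, n_stages + 1):
        delay += (i * r_ohm) * c_farad
    return delay


# Illustrative values: R = 1 kilohm and C = 10 fF per switch.
one_switch = elmore_delay(1, 1000.0, 10e-15)
four_switches = elmore_delay(4, 1000.0, 10e-15)
eight_switches = elmore_delay(8, 1000.0, 10e-15)
```

Under this model, going from four to eight switches in series increases the estimated delay by a factor of 3.6 rather than 2, which is why architectures try to limit the number of switches on a path.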
Routing architectures of some commercial FPLD families are presented in this section. In order to present commercial routing architectures, we will use the routing architecture model shown in Figure 1.17. First, a few definitions are introduced in order to form a unified viewpoint when considering routing architectures.

A wire segment is a wire unbroken by programmable switches. One or more switches may attach to a wire segment. Typically, one switch is attached to each end of a wire segment. A track is a sequence of one or more wire segments in a line. A routing channel is a group of parallel tracks.

Figure 1.17 General FPLD routing architecture model (logic cells and their pins, connection blocks, switch block, routing channel, tracks, wire segments)

As shown in Figure 1.17, the model contains two basic structures. The first is a connection block, which appears in all architectures. The connection block provides connectivity from the inputs and outputs of a logic block to the wire segments in the channels and can be either vertical or horizontal. The second structure is the switch block, which provides connectivity between the horizontal as well as the vertical wire segments. The switch block in Figure 1.17 provides connectivity among wire segments on all four of its sides.

Trade-offs in routing architectures are illustrated in Figure 1.18. Figure 1.18(a) represents a set of nets routed in a conventional channel. Freedom to configure the wiring of an MPLD allows us to customize the lengths of horizontal wires.
Figure 1.18 Types of routing architecture: (a) routing in an unconstrained channel, (b) routing in a fully segmented channel, (c) routing in a non-segmented channel

In order to have complete freedom of routing, a switch is required at every cross point. More switches are required between two cross points along a track to allow the track to be subdivided into segments of arbitrary length, as shown in Figure 1.18(b). In FPLDs, each signal enters or leaves the channel on its own vertical segment.

An alternative is to provide continuous tracks in sufficient number to accommodate all nets, as shown in Figure 1.18(c). This approach is used in many types of programmable logic arrays and in the interconnect portion of certain programmable devices. Its advantages are that only two RC stages are encountered and that the delay of each net is identical and predictable. However, full-length tracks are used for all nets, even short ones. Furthermore, the area is excessive, growing quadratically with the number of nets. For this reason, intermediate approaches are employed, usually based on segmentation of tracks into varying (appropriate) sizes. A well-designed segmented channel does not require many more tracks than would be needed in a conventional channel. Although surprising, this finding has been supported both experimentally and analytically.

In the Xilinx 3000 series FPGAs, connections are made from the logic cell to the channel through a connection block. Since the connection site is large, because of the SRAM programming technology, the Xilinx connection block typically connects each pin to only two or three out of the five tracks passing a cell. Connection blocks connect on all four sides of the cell.
The connections are implemented by pass transistors for the output pins and by multiplexers for the input pins. The use of multiplexers reduces the number of SRAM cells needed per pin. The switch block makes connections between segments in intersecting horizontal and vertical channels. Each wire segment can connect to a subset of the wire segments on opposing sides of the switch block (typically to 5 or 6 out of 15 possible wire segments). This number is limited by the large size and capacitance of the SRAM programmable switches.

There are four types of wire segments in the Xilinx 3000 architecture and five types in the Xilinx 4000 architecture. The additional type consists of so-called double-length lines, which are wire segments of double length connected to every second switch block. In the Xilinx 4000 devices the connectivity between the logic cell pins and tracks is much higher, because each logic pin connects to almost all of the tracks. A detailed presentation of the Xilinx routing architectures is given in Chapter 2.

The routing architecture of the Altera 5000 and 7000 series EPLDs uses a two-level hierarchy. At the first level of the hierarchy, 16 or 32 logic cells are grouped into a Logic Array Block (LAB), providing a structure very similar to the traditional PLD. There are four types of tracks passing each LAB. In the connection block every such track can connect to every logic cell pin, making routing very simple. Using fewer connection points results in better density and performance, but yields more complex routing. The internal LAB routing structure can be considered a segmented channel in which the segments are as long as possible. Since the connections also perform wired ANDing, the transistors serve two purposes. Connections among different LABs are made using a global interconnect structure called a Programmable Interconnect Array (PIA).
The PIA connects outputs from each LAB to inputs of other LABs, and acts as one large switch block. There is full connectivity among the logic cell outputs and LAB inputs within a PIA. The advantage of this scheme is that it makes routing easy; the drawback is that it requires many switches, adding more capacitive load than necessary. Another advantage is that the delay through the PIA is the same regardless of which track is used, which further helps to predict system performance. However, the circuits can be much slower than with segmented tracks.

A similar approach is found in the Altera 8000 series CPLDs. Connections among LABs are implemented using FastTrack Interconnect, continuous channels that run the length of the device. A detailed presentation of both of Altera's interconnect and routing mechanisms is given in Chapter 2.

1.6 Design Process

The complexity of FPLDs has surpassed the point where manual design is desirable or feasible. The utility of an FPLD architecture becomes more and more dependent on automated logic and layout synthesis tools.

The design process with FPLDs is similar to other programmable logic design. Input can come from a schematic netlist, a hardware description language, or a logic synthesis system. After defining what has to be designed, the next step is design implementation. It consists of fitting the logic into the FPLD structures. This step is called "logic partitioning" by some FPGA manufacturers and "logic fitting" in reference to CPLDs. After partitioning, the design software assigns the logic, now described in terms of functional units on the FPLD, to particular physical locations on the device and chooses the routing paths. This is similar to placement and routing for traditional gate arrays.

One of the main advantages of FPLDs is their short development cycle compared to full- or semi-custom integrated circuits.
Circuit design consists of three main tasks:
- design definition
- design implementation
- design modification

From the designer's point of view, the following are important features of design tools:
- they enable the design process to evolve towards behavioral-level specification and synthesis
- they provide design freedom from the details of mapping to a specific chip architecture
- they provide an easy way to change or correct the design

A variety of design tools are used to perform all or some of the above tasks. Chapter 3 is devoted to high-level design tools, with an emphasis on those that enable behavioral-level specification and synthesis, primarily high-level hardware description languages. Examples of designs using two such languages, the Altera Hardware Description Language (AHDL) and the VHSIC Hardware Description Language (VHDL), are given together with an introduction to these specification tools.

An application targeted to an FPLD can be designed on any one of several logic or ASIC design systems, including schematic capture and hardware description languages. To target an FPLD, the design is passed to FPLD-specific implementation software. The interface between design entry and design implementation is a netlist that contains the desired nets, gates, and references to specific vendor-provided macros. Manual and automatic tools can be used interchangeably, or an implementation can be done fully automatically.

To a hardware designer, the combination of moderate density, reprogrammability, and powerful prototyping tools provides a software-like iterative implementation methodology. Figure 1.19 compares a typical ASIC design cycle with a typical FPLD design cycle. In a typical ASIC design cycle, the design is verified by simulation at each stage of refinement. Accurate simulators are slow. ASIC designers use the whole range of simulators in the speed/accuracy spectrum in an attempt to verify their design.
Although simulation can be used in designing for FPLDs, it can be replaced with in-circuit verification, that is, by exercising the circuitry in real time with a prototype. The path from design to prototype is short, allowing verification of the operation over a wide range of conditions at high speed and high accuracy. A fast design-place-route-load loop is similar to the software edit-compile-run loop and provides similar benefits: a design can be verified by trial and error. A designer can also verify that a design works in a real system, not merely in a potentially erroneous simulation.

Design by prototype does not verify proper operation with worst-case timing, but rather that a design works on the typical prototype part. To verify worst-case timing, designers can check speed margins in actual voltage and temperature corners with an oscilloscope and logic analyzer, and speed up marginal signals. They may also use a software timing analyzer or simulator after debugging to verify worst-case paths, or simply use faster speed grade parts in production to ensure sufficient speed margins over the complete temperature and voltage range.

Figure 1.19 Comparing Design Cycles (traditional ASIC design cycle: logic design, logic simulation, place and route, timing simulation, test pattern generation, fault simulation, wafer fabrication, testing, prototype debug, production; design cycle for FPLDs: logic design, place and route, configure, prototype debug)

As with software development, a reprogrammable FPLD removes the dividing line between prototyping and production. A working prototype may qualify as a production part if it meets performance and cost goals. Rather than redesign, a designer may choose to substitute a faster FPLD and use the same programming bitstream, or choose a smaller, cheaper FPLD (with more manual work to squeeze the design into a smaller device).
A third choice is to substitute a mask-programmed version of the logic array for the field-programmable array. All three options are much simpler than a system redesign, which must be done for traditional MPLDs or ASICs.

The design process usually begins with the capture of the design. Most users enter their designs as schematics built of macros from a library. An alternative is to enter designs in terms of Boolean equations, state machine descriptions, or functional specifications. Different portions of a design can be described in different ways, compiled separately, and merged at some higher hierarchical level in the schematic.

Several guidelines are suggested for reliable design with FPLDs, mostly the same as those for users of MPLDs. The major goal is to make the circuit function properly regardless of shifts in timing from one part to the next. These guidelines will be discussed in Chapter 3.

Rapid system prototyping is most effective when it becomes rapid product development. Reprogrammability gives a system designer another option: to modify the design in an FPLD by changing the programming bitstream after the design is in the hands of the customer. The bitstream can be stored in a dedicated (E)PROM or elsewhere in the system. In some existing systems, manufacturers send modified versions of hardware on a floppy disk or as a file sent over modem.

1.7 FPLD Applications

FPLDs have been used in a large number of applications, ranging from simple ones that replace glue logic to those implementing new computing paradigms that are not possible using other technologies. In this section we list some of them, classify them into typical groups, and emphasize the most important features of each group.

CPLDs are used in applications that can efficiently use the wide fan-in of AND/OR gates and do not need a large number of flip-flops.
Examples of such circuits are various kinds of finite state machines. On the other hand, FPGAs with a large number of flip-flops are better suited for applications that need memory functions and complex data paths. Also, due to their easy reprogrammability they have become an important element of prototyping digital system designs. As such they enable emulation of entire complex systems, and in many cases also their final implementation. Finally, all static RAM based FPGAs allow at least a minimum level of dynamic reconfigurability. While all of them allow full device reconfiguration by downloading another bitstream (configuration file), some of them also allow partial reconfiguration. Partial reconfiguration changes the function of a part of the device while the remaining part operates without disruption of the system function.

In order to visualize the range of current and potential applications, we have to mention typical features of FPLDs in terms of their capacity and speed. Today the leading suppliers of FPLDs offer devices containing up to 500,000 equivalent (two-input NAND) gates, with the prospect of quadrupling this figure in the next two to three years. These devices are delivered in a number of configurations, so that application designers can choose the smallest-capacity device into which their design fits. They also come in a range of speed grades and in different packages with different numbers of input/output pins. The number of pins sometimes exceeds 600. The speed of circuits implemented in FPLDs varies depending primarily on the application and design approach. As an illustration, all major manufacturers offer devices that provide full compliance with 64-bit 66 MHz PCI-bus requirements.

1.7.1 Glue Random Logic Replacement

Initial applications of FPLDs were influenced by earlier use of simple PLDs.
Having larger capacity than simple PLDs, FPLDs have been used to replace random glue logic. In an application of this type FPLDs provide lower chip count, more compact and reliable designs, and higher speeds, as the entire circuit can usually be implemented within a single device. The overall system design can be placed on a smaller PCB. With good planning of external pin assignment, the design fitted into an FPLD can later be modified without redesigning the entire PCB. In particular, this applies when using FPLDs that are in-circuit programmable. In this case there is no need to remove and reinsert FPLDs on a PCB, as the circuit can be programmed while it is on the PCB. In the case of in-system programmable FPLDs, it is possible to reconfigure hardware and partially change a function implemented in an FPLD without powering it down.

A typical example of replacing glue logic is in interfacing standard microprocessor or microcontroller based systems. Glue logic is used to provide interfaces to subsystems such as external memories and specific peripherals. It usually requires timing and logic adjustment that is achieved using different combinational and sequential circuits (decoders, multiplexers, registers, and finite state machines). An example is given in Chapter 8, where the interfacing requirements for connecting memory and a head-on display to the VuMan wearable computer, which is based on a standard 386SX processor, are implemented in a single FPLD. A number of other small examples of circuits that are easily customized and replace a number of standard SSI and MSI circuits are given throughout the book. Even simpler standard VLSI circuits can often fit into an FPLD, as illustrated in Figure 1.20.

Figure 1.20 Glue logic replacement

1.7.2 Hardware Accelerators

For many applications FPLDs have a performance which is superior to that of a more traditional microprocessor or digital signal processor.
This is especially true for tasks that can be parallelized and executed several orders of magnitude faster on an FPLD than on a microprocessor. The idea of using hardware accelerators that supply a single, dedicated service to the microprocessor is employed in such applications as graphics acceleration, sound, and video processing. However, if the algorithms require processing of data with non-standard formats and repetitive execution of relatively simple operations, FPLDs represent an obvious choice. Furthermore, a system with reconfigurable hardware offers several advantages, such as:
- reduced number of components and space, as the FPLD can implement different functions/systems at different times
- new versions of a design are implemented by simply downloading a configuration bitstream
- new functions can be added as required

The acceleration functions implemented in the FPLD can be in the form of a functional unit, co-processor, attached processing unit, or stand-alone processing unit, connected by an input/output interface to the main microprocessor-based system. The further away the function is from the microprocessor, the slower the exchange of data between the microprocessor and the function. An example of an image enhancement co-processor hardware accelerator is shown in Figure 1.21.

[Figure 1.20 labels: standard microprocessor; address, data, and control and status bus; FPLD containing address decoders, multiplexers, registers, general timing adjustments, memory interface, display interface, serial UART, A/D and D/A controller interface]

Figure 1.21 Using FPLD to implement a hardware accelerator

1.7.3 Non-standard Data Path/Control Unit Oriented Systems

Oftentimes complex computational systems and algorithms can be described in the form of a dataflow-oriented description and implemented as a data path controlled by its own control unit.
Such non-standard dedicated systems, especially in the case of low volumes, are the best candidates for implementation in FPLDs. Typical applications include digital signal and image processing, neural networks, and other complex, computationally demanding algorithms. With high-level integrated design tools that can easily capture such an application (hardware description languages), the design process for dedicated hardware systems becomes comparable in complexity to software-only solutions. An illustration of a non-standard data path/control unit system is given in Figure 1.22.

[Figure 1.21 labels: standard microprocessor; address, data, and control and status bus; input image memory; output image memory; image enhancement co-processor; image memory control and address generators; FPLD]

Figure 1.22 Complex non-standard datapath/control unit systems

1.7.4 Virtual Hardware

Reconfigurable hardware can be viewed in two completely new ways. The first is as a hardware resource that can perform different tasks on demand, executing them one at a time. The user perceives the hardware resource to be "larger" than it actually is. An illustration of such a system is given in Figure 1.23. Different hardware configurations are stored in a configuration memory and executed one at a time, as the overall application requires. The second view is to consider it as a hardware cache, where the most recently used hardware elements are stored and accessed by an application.

The implementation of a hardware cache, or virtual hardware system, requires some form of management to control the process and ensure it runs effectively. There are several options available for this management task, including standalone hardware (which can be another FPLD), custom software routines, or integrated operating system support.
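The hardware-cache view can be sketched in a few lines of software. The following is a minimal model under stated assumptions, not a description of any particular management scheme: the class name ConfigurationCache, the number of slots, the least-recently-used replacement policy, and the configuration names are all hypothetical choices made for illustration. With a single slot the model reduces to the first view, in which configurations are executed one at a time.

```python
from collections import OrderedDict
from typing import Dict


class ConfigurationCache:
    """Toy model of an FPLD managed as a cache of hardware configurations.

    `slots` is the number of configurations assumed to fit on the device
    at once; slots=1 models the "one task at a time" view.
    """

    def __init__(self, slots: int, configuration_memory: Dict[str, bytes]) -> None:
        self.slots = slots
        self.configuration_memory = configuration_memory  # all stored bitstreams
        self.loaded = OrderedDict()  # name -> bitstream, least recently used first
        self.downloads = 0  # number of (re)configurations performed

    def request(self, name: str) -> bytes:
        """Make configuration `name` available, downloading it on a miss."""
        if name in self.loaded:  # hit: already on the device
            self.loaded.move_to_end(name)
            return self.loaded[name]
        if len(self.loaded) >= self.slots:  # full: evict least recently used
            self.loaded.popitem(last=False)
        self.loaded[name] = self.configuration_memory[name]
        self.downloads += 1
        return self.loaded[name]


# Invented configuration names and placeholder bitstreams.
memory = {"fir": b"\x01", "fft": b"\x02", "crc": b"\x03"}
cache = ConfigurationCache(slots=2, configuration_memory=memory)
for task in ["fir", "fft", "fir", "crc", "fft"]:
    cache.request(task)
print(cache.downloads, list(cache.loaded))
```

In this run the five requests cause four downloads; only the second request for fir finds its configuration already on the device. A management layer of this kind could reside in standalone hardware, in custom software routines, or in the operating system.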
[Figure 1.22 labels: inputs; outputs; datapath with registers (REG), Algorithm 1, Algorithm 2, and a function block (FUNC); control unit; FPLD]

Figure 1.23 FPLD used to implement virtual hardware (labels: configuration host; configuration memory; Configuration 1, Configuration 2 (active), Configuration n; Configuration 2; FPLD)

1.7.5 Custom-Computing Machines

In recent years, several computing systems have been developed that implement a custom processor within an FPLD. In this type of system, the goal is not to compete with the performance of dedicated processors, but rather to provide a platform with an optimal partitioning of functions between hardware and software components. This approach allows the attributes of the custom processor, such as the architecture of its core and its instruction set, to be modified as the application requires. The FPLD can implement not only the processor core but also hardware acceleration units in the form of functional units, as illustrated in Figure 1.24. This type of system can utilize the flexibility of software and the speed of hardware in a single device to achieve optimal performance. The advantage of this approach is that the hardware functional units are located as close as possible to the processor core, and therefore the communication interface can be very fast. However, the entire system is configurable at compile (synthesis) time; to be run-time reconfigurable it requires truly dynamically reconfigurable FPLDs.

As such, custom-computing machines represent an ideal platform for achieving the goal of hardware/software co-design, which is to partition a task into software and hardware components so as to satisfy design criteria. The design criteria may not necessarily be to develop a system with the highest speed performance. Taking into account the cost and other constraints, a trade-off between a fully hardware and a fully software solution is required. Almost all digital systems designers have been aware of that fact, especially those designing embedded systems. Traditionally, however, hardware and software parts have been designed independently, with little effort to pursue a concurrent design. The goal of hardware/software co-design is to design both hardware and software parts from a single specification and make the partitioning decisions based on design criteria. In Chapter 7 we present a simple processor core, called SimP, that can be easily modified and customized at compile time as the application requires, and that was used as an initial vehicle in our hardware/software co-design research.

Figure 1.24 Custom-computing machine based on a fixed processor core (labels: processor core with data path and control unit and core instruction set; custom instructions that operate on core data path; Functional Unit 1 ... Functional Unit n; FPLD)

1.8 Questions and Problems

1.1 Describe the major differences between discrete logic, field-programmable logic and custom logic.
1.2 What are the major ways of classifying FPLDs?
1.3 How would you describe the impact of complexity and granularity of a logic element on the design of more complex logic?
1.4 What is the role of multiplexers in programmable logic circuits? Explain it using examples of implementation of different (alternative) data paths depending on the value of select inputs.
1.5 How do look-up tables (LUTs) implement logic functions? What are the advantages of using LUTs for this purpose? What is the role of read and write operations on the look-up table?
1.6 Given a five-input/single-output look-up table, how many memory locations does it contain? How many different logic (Boolean) functions can be implemented in it?
Implement the following Boolean functions using this table:
a) F(A, B, C, D, E) = ABC'D'E' + A'BCDE' + A'BCDE' + A'B'C'DE
b) F(A, B, C, D, E) = ABC + AB'CDE' + DE
c) F(A, B, C, D, E) = (AB'C)(AB'C'DE)(A'BE')
1.7 A four-input/single-output LUT is given as the basic building block for combinational logic. Implement the following logic functions
a) F(A, B, C, D, E) = AB'CDE' + ABC'D'E
b) F(A, B, C, D, E) = (ABCE)(AB'D')(B'C'DE)
using only LUTs of the given type. How many LUTs do you need for this implementation? Show the interconnection of all LUTs and list the contents of each of them.
1.8 "Design" your own FPLD circuit that contains only four-input/single-output LUT-based logic elements and can fit the designs from the previous problem. Show the partitioning and fitting of the design to your FPLD. Draw all connections, assuming that a sufficient number of long interconnect lines are available. Your FPLD should be organized as a matrix of LUTs.
1.9 List at least three advantages and disadvantages of the segmented and non-segmented interconnection mechanisms used in FPLDs.
1.10 Analyze a typical microprocessor-based embedded system that requires external RAM and ROM and address decoding to access other external chips. How would you implement address decoding using standard SSI/MSI components? How can an FPLD-based solution reduce the number of components?
1.11 Give a few examples of hardware accelerators that can significantly improve the performance of a microprocessor/DSP-based solution. Explain the advantages of implementing the accelerator in an FPLD.
1.12 What is the difference between reconfigurability and dynamic reconfigurability? Illustrate this with examples of the use of each of them.
1.13 What are in-circuit and in-system programmability of FPLDs? What are their advantages for the implementation of digital systems over other technologies?
1.14 What are the obstacles in implementing virtual hardware?
Explain them using examples of currently available FPLD architectures.
1.15 Analyze a typical 8- or 16-bit microprocessor and its instruction set. How would you minimize the instruction set and processor architecture and still be able to implement practically any application?