Contents
List of Figures . . . 0-12
List of Tables . . . 0-29
1 Basics of CMOS Circuit Design . . . 1-1
1.1 pn Junction Properties . . . 1-1
1.1.1 pn Junction Space Charge Area and Electric Field . . . 1-1
1.1.2 pn Junction Built-in Potential . . . 1-2
1.1.3 pn Junction Depletion Width . . . 1-3
1.1.4 pn Junction with External Voltage . . . 1-4
1.1.5 pn Junction Capacitance . . . 1-5
1.1.6 pn Junction Current Flow . . . 1-5
1.2
1.2.1 MOSFET Structure . . . 1-6
1.2.2 MOS Capacitor and Threshold Voltage . . . 1-7
1.2.3 MOSFET Operation Modes . . . 1-13
1.2.4 MOSFET current characteristic . . . 1-16
1.2.5 Biased MOSFET Current Equations . . . 1-21
1.2.6 Measurement of device parameters . . . 1-22
1.2.7 The Complete MOSFET GCA Analysis . . . 1-23
1.2.8 Depletion mode n-channel MOSFET . . . 1-24
1.2.9 p-channel MOSFET . . . 1-28
1.2.10 Conclusions . . . 1-29
1.2.11 Modelling the MOS Transistor for Circuit simulation . . . 1-30
1.3 DC Characteristics of MOS Inverters . . . 1-32
1.3.1 Basic Inverter characteristics . . . 1-33
1.3.2 Inverter with Linear Resistor Load . . . 1-38
VLSI
Design Course
0-1
1.3.3 . . . 1-42
1.3.4 Inverter with Saturated Enhancement Load . . . 1-44
1.3.5 Inverter with Nonsaturated Enhancement Load . . . 1-45
1.3.6 Inverter with Depletion mode MOSFET Load . . . 1-46
1.3.7 CMOS inverter . . . 1-51
1.4
1.4.1 The output High-to-Low Time tHL . . . 1-54
1.4.2 Rise Time tLH . . . 1-54
1.4.3 NMOS Propagation Delay Time . . . 1-56
1.4.4 CMOS Inverter Transient Response . . . 1-57
1.4.5 Propagation Delay Time tp of CMOS Inverters . . . 1-57
1.4.6 Power-Delay-Product (PDP) . . . 1-65
1.4.7 MOSFET Capacitances . . . 1-70
1.4.8 Inverter Output Capacitance . . . 1-75
1.4.9 Scaled Inverter Performance . . . 1-78
1.5
1.5.1 CMOS Process Flow . . . 1-79
1.5.2 The Latch-Up Effect . . . 1-80
. . . 2-1
Overview: Combinational Logic . . . 2-1
Complex nMOS Logic . . . 2-5
2.2.1 nMOS NOR Gates . . . 2-5
2.2.2 nMOS NAND Gates . . . 2-6
2.2.3 nMOS Complex Gates . . . 2-7
CMOS NAND and NOR Gates . . . 2-10
Static CMOS Logic Design . . . 2-13
Pseudo nMOS Logic . . . 2-23
Passtransistor Charging Characteristics . . . 2-25
Passtransistor Discharging Characteristics . . . 2-26
CMOS Transmission Gates . . . 2-28
. . . 3-1
2.3
2.4
Clocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
. . . . . . . . . . . . . . . . . . . . . 3-2
Clocked Static Logic . . . 3-5
Charge Sharing . . . 3-10
Dynamic Logic . . . 3-12
3.4.1 Dynamic nMOS Inverter . . . 3-12
3.4.2 Dynamic pMOS Inverter . . . 3-15
3.4.3 Dynamic CMOS Properties and Conditions . . . 3-15
3.4.4 Complex Logic . . . 3-16
3.4.5 Dynamic Cascades . . . 3-17
3.5
Domino Logic Properties . . . 3-22
Analysis . . . 3-23
Charge Leakage and Charge Sharing . . . 3-24
3.6
NORA Properties . . . 3-26
The Signal Race Problem . . . 3-26
NORA Structuring . . . 3-28
3.7 Memory Structures . . . 3-31
3.7.1 Principle of CMOS Information Storage . . . 3-31
3.7.2 Dynamic Flip-Flops: Pseudo 2-Phase Clocking . . . 3-33
3.7.3 Pseudo 2-Phase Memory Structures . . . 3-34
3.7.4 Dynamic Flip-Flop with reduced Transistor Count and Clock Connection . . . 3-37
3.7.5 Dynamic D-Latches . . . 3-38
3.7.6 Pseudo 2-Phase Logic Structures . . . 3-39
3.7.7 Pseudo 2-Phase Logic Structures: Domino Logic . . . 3-40
3.7.8 2-Phase Memory Structures: Skew Reduction . . . 3-41
3.7.9 2-Phase Memory Structures: Chain Latch . . . 3-42
3.7.10 2-Phase Memory Structures: Static Flip-Flops . . . 3-43
3.7.11 2-Phase Memory Structures: Static D Flip-Flops . . . 3-44
3.7.12 Static D Flip-Flop with Set and Reset . . . 3-47
4 Performance . . . 4-1
4.1
4.1.1
4.1.2
CMOS Gate Transistor Sizing . . . 4-8
Power Dissipation
4.3.1 Static power dissipation . . . 4-9
4.3.2 Dynamic power dissipation . . . 4-10
4.3.3 Power delay product . . . 4-11
4.4
Scaling principles . . . 4-15
Interconnect layer scaling . . . 4-17
4.5 Power and Clock Distribution . . . 4-18
4.5.1 Power distribution . . . 4-18
4.5.2 Clock distribution . . . 4-19
4.5.3 Clock and Timing Circles . . . 4-20
4.5.4 Clock Generation Circuits . . . 4-21
4.5.5 Clock Drivers and Distribution Techniques . . . 4-22
Input Protection Circuits . . . 4-23
Static Gate Sizing . . . 4-25
Off-Chip Driver Circuits . . . 4-29
4.8.1 Basic Off-Chip Driver Design . . . 4-29
4.8.2 Tri-State and Bidirectional I/O . . . 4-30
5 CMOS Process and Layout Design of Integrated Circuits . . . 5-1
5.1 Processing Steps . . . 5-1
5.1.1 Wafer Processing . . . 5-1
5.1.2 The n-Well CMOS Process . . . 5-1
5.1.3 The p-Well CMOS Process . . . 5-7
5.1.4 The Twin-Tub Process . . . 5-7
5.1.5 Isolation . . . 5-10
5.1.6 Latchup . . . 5-12
5.2
5.2.1 Lithography and Fabrication . . . 5-14
5.2.2 Basic Design Rule Set . . . 5-15
5.3
5.3.1 Connectivity Extraction . . . 5-23
5.3.2 Parasitic Capacitance Extraction . . . 5-24
5.3.3 Transistor Size Extraction . . . 5-24
5.3.4 Parasitic Resistance Extraction . . . 5-25
5.3.5 Process Parameter and Technology Description . . . 5-25
5.4
5.4.1 IC Design . . . 5-27
5.4.2 General Layout Strategies . . . 5-27
5.4.3 Equivalent Load Concept . . . 5-28
5.4.4 Latch-Up Prevention . . . 5-31
5.4.5 Static Gate Layout . . . 5-31
5.4.6 Transistor-Gate-Based Logic . . . 5-32
5.5
Introduction . . . 6-1
Package Types . . . 6-4
6.2.1 24-pin Packaging Evolution . . . 6-6
Design Considerations . . . 6-7
6.3.1 VLSI Design Rules . . . 6-7
6.3.2 Thermal Considerations . . . 6-10
6.3.3 Electrical Considerations . . . 6-10
6.3.4 Mechanical Design Considerations . . . 6-11
Wafer Preparation . . . 6-13
Die Bonding . . . 6-13
Wire Bonding . . . 6-15
Ceramic Package Technology . . . 6-17
Glass-Sealed Refractory Technology . . . 6-19
Plastic Molding Technology . . . 6-20
Molding Process . . . 6-21
6.4
6.5
6.6 IC Package Market Share . . . 6-22
6.7 Packaging Trends . . . 6-23
6.7.1 MultiChip Modules . . . 6-23
6.7.2 Comparison of Packaging Alternatives . . . 6-27
. . . 7-1
CAD Tools . . . 7-1
Full Custom Design . . . 7-2
Cell Based Design . . . 7-7
Design Verification . . . 7-9
7.4.1 Physical Design Rule Check . . . 7-9
7.4.2 Extraction . . . 7-9
7.4.3 LVS . . . 7-11
7.4.4 Schematic / Electrical Rule Check (SRC / ERC) . . . 7-11
7.5 Simulation . . . 7-12
7.5.1 Goal of Simulation . . . 7-12
7.5.2 Simulator Classification . . . 7-12
7.5.3 Signal Modelling . . . 7-12
7.5.4 Signal States . . . 7-12
7.5.5 Circuit and Delay Modelling . . . 7-13
7.5.6 Advanced Logic Simulators . . . 7-14
7.5.7 Simulation Techniques . . . 7-14
7.5.8 Switch Level Simulation . . . 7-15
7.6
Weinberger Structuring . . . 8-1
Gate Matrix Layout . . . 8-7
8.2.1 Creating a Gate Matrix . . . 8-7
8.2.2 Example: Half-Adder . . . 8-9
8.2.3 Character Definitions for Symbolic Layout . . . 8-10
8.2.4 Summary of Gate Matrix Properties . . . 8-13
8.3 Optimal CMOS Complex Gate Layout . . . 8-14
8.3.1 CMOS Functional Cells . . . 8-15
8.3.2 Basic Layout Strategy . . . 8-18
8.3.3 Graph Theoretical Algorithm . . . 8-21
8.3.4 Problem Reduction . . . 8-22
8.3.5 Algorithm for Calculating Minimal Interlace . . . 8-24
8.3.6 Examples . . . 8-27
. . . 8-30
8.4 8.5
8.5.1 Floor Plan for PLA . . . 8-35
8.5.2 Static nMOS and Pseudo-nMOS PLA . . . 8-36
8.5.3 Static CMOS PLA . . . 8-39
8.5.4 Dynamic CMOS PLA . . . 8-39
8.5.5 Noise in PLAs . . . 8-41
8.5.6 Optimization of PLAs . . . 8-41
8.5.7 Timing and Power Dissipation of a Static PLA . . . 8-44
8.5.8 Automatic PLA Layout Generation . . . 8-45
8.6 Finite-State Machine . . . 8-47
8.6.1 Introduction into Finite State Machines . . . 8-49
8.6.2 Realization of Finite-State Machines . . . 8-51
8.6.3 Synchronous FSM Circuit Models . . . 8-54
8.6.4 States and Bits . . . 8-56
8.6.5 Equivalence of FSMs . . . 8-57
8.6.6 Regular Expressions and Nondeterministic FSMs . . . 8-59
8.6.7 Context . . . 8-62
9 ASIC Design Concepts . . . 9-1
9.1 ASIC Design Process . . . 9-1
9.1.1 The VLSI Design Process as a Transformation from Higher to Lower Descriptive Levels . . . 9-1
9.1.2 Phases of Electronic System Design . . . 9-2
9.1.3 Application Architectural Properties . . . 9-3
9.1.4 Synthesis Steps . . . 9-3
9.2
9.2.1 ASIC Technology Tree . . . 9-4
9.3
9.3.1 Introduction to Gate Arrays . . . 9-6
9.3.2 IMI Grid Structure . . . 9-8
9.3.3 CDI Grid Structure . . . 9-13
9.3.4 Gate Array Design Flow . . . 9-14
9.3.5 Personalization Examples for IMI and CDI Gate Array . . . 9-15
9.3.6 Qualification of Gate Array Design Style . . . 9-17
9.3.7 Gate Array Market . . . 9-18
9.4
9.4.1 Introduction to Standard Cells . . . 9-19
9.4.2 Qualification of Standard Cell Design Style . . . 9-21
9.4.3 Standard Cell Market . . . 9-22
9.5
9.5.1 Introduction to the Macro Cell Concept . . . 9-23
9.6
9.6.1 Introduction: Mixed Design Styles . . . 9-24
9.6.2 Features of Mixed-Mode ASICs . . . 9-24
9.7
9.7.1 Classical PLD Devices . . . 9-25
9.7.2 Advanced PLD Devices . . . 9-29
9.7.3 PLA-based Device Properties . . . 9-33
9.8
9.8.1 The FPGA Concept . . . 9-34
9.8.2 FPGA Categories . . . 9-35
9.8.3 Programming Technologies . . . 9-36
9.8.4 Overview: Commercially Available FPGAs . . . 9-38
9.8.5 Xilinx Architecture . . . 9-39
9.8.6 Actel Architecture . . . 9-41
9.8.7 CAD for FPGAs . . . 9-44
9.8.8 Economical Considerations . . . 9-46
9.9
10.1 Adders / Subtracters . . . 10-1
10.1.2 Adders / Subtracters for Binary Coded Integers . . . 10-6
11 Microarchitectures . . . 11-1
11.1 Datapath Design . . . 11-2
11.1.1 Bit-slice ALU AMD 2901 . . . 11-2
11.2 Controller Implementations . . . 11-5
11.2.1 Microprogrammed Controllers . . . 11-6
12 ASIC Design Guidelines . . . 12-1
12.2 Synchronous Circuits . . . 12-1
12.2.1 Non-Recommended Circuits . . . 12-1
12.2.2 Recommended Circuits . . . 12-3
12.3 Clock Buffering . . . 12-3
12.3.1 Non-Recommended Circuits . . . 12-3
12.3.2 Recommended Circuits . . . 12-5
12.4 Gated Clocks . . . 12-7
12.4.1 Non-Recommended Circuits . . . 12-7
12.4.2 Recommended Circuits . . . 12-7
12.5 Double-edged Clocking . . . 12-8
12.5.1 Non-Recommended Circuit . . . 12-8
12.5.2 Recommended Circuit . . . 12-8
12.6 Asynchronous Resets . . . 12-9
12.6.1 Non-Recommended Circuit . . . 12-9
12.6.2 Recommended Circuits . . . 12-9
12.7 Shift-Registers . . . 12-10
12.7.1 Non-Recommended Circuits . . . 12-10
12.7.2 Recommended Circuits . . . 12-10
12.8 Asynchronous Inputs . . . 12-11
12.8.1 Non-Recommended Circuits . . . 12-11
12.8.2 Recommended Circuits . . . 12-11
12.9 Delay Lines and Monostables . . . 12-14
12.9.1 Non-Recommended Circuits . . . 12-14
12.9.2 Recommended Circuits . . . 12-14
12.10 Bistable Elements . . . 12-14
12.10.1 Non-Recommended Circuits . . . 12-14
12.10.2 Recommended Circuits . . . 12-16
12.11 RAMs and ROMs in Synchronous Circuits . . . 12-17
12.11.1 Recommended Circuits . . . 12-17
12.12 Tristates . . . 12-19
12.12.1 Non-Recommended Circuit . . . 12-19
12.12.2 Recommended Circuits . . . 12-20
12.12.3 Multiplexer Tristates . . . 12-20
12.13 Parallel Signals . . . 12-21
12.13.1 Non-Recommended Circuits . . . 12-21
12.13.2 Recommended Circuit . . . 12-21
12.14 Fanout . . . 12-22
12.14.1 Non-Recommended Circuit . . . 12-22
12.14.2 Recommended Circuits . . . 12-23
12.15 Design for Speed . . . 12-25
12.16 Design for Testability . . . 12-27
12.16.1 Non-Recommended Circuits . . . 12-27
12.16.2 Recommended Circuits . . . 12-29
. . . 13-1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-1
13.2 Economical Considerations . . . 13-2
13.2.1 Average Quality Level (AQL) . . . 13-2
13.2.2 Correlation: Fault Coverage and Defective Parts . . . 13-3
13.3 Design Flow: Testing . . . 13-5
13.3.1 Chip Test after Manufacturing . . . 13-6
13.4 Fundamental Definitions . . . 13-6
13.5 Fault Models . . . 13-7
13.6 Fault Tolerant Design . . . 13-12
13.7 Test Pattern Generation . . . 13-15
13.7.1 The D-Algorithm . . . 13-15
13.8 Fault Simulation . . . 13-24
13.8.1 Algorithms: Serial Fault Simulation . . . 13-24
13.8.2 Improved Algorithms . . . 13-24
13.9 Design for Testability . . . 13-25
13.9.1 Ad-Hoc Techniques . . . 13-25
13.9.2 Scan-Path Methods . . . 13-28
13.9.3 Built-In Tests . . . 13-29
13.9.4 Evaluation of Testing Data . . . 13-33
13.9.5 Built-In Logic Block Observation . . . 13-37
13.9.6 Example: Self-testing Circuit . . . 13-38
14 Boundary-Scan Architecture JTAG Standard . . . 14-1
14.1 Classical Board Test Approaches . . . 14-2
14.2 Introduction to Boundary Scan . . . 14-4
14.3 The IEEE Standard 1149.1 . . . 14-7
14.3.1 IEEE Std 1149.1 Architecture . . . 14-7
14.3.2 Test Access Port . . . 14-8
14.3.3 TAP-Controller . . . 14-10
14.3.4 The Instruction Register . . . 14-11
14.3.5 Test Data Registers . . . 14-13
15 Analog VLSI systems . . . 15-1
15.1 Analog Signal Processing . . . 15-1
15.1.1 Signal Bandwidths in Analog VLSI . . . 15-2
15.1.2 A/D and D/A Conversion in Signal Processing Systems . . . 15-3
15.2 Digital-To-Analog Converters . . . 15-4
15.2.1 Current Scaling D/A Converters . . . 15-4
15.2.2 Voltage Scaling D/A Converters . . . 15-8
15.3 Analog-To-Digital Converters . . . 15-9
15.3.1 Serial A/D Converters . . . 15-10
15.3.2 Successive Approximation A/D Converters . . . 15-10
15.3.3 Parallel A/D Converters . . . 15-11
15.3.4 Sigma-Delta A/D Converter . . . 15-14
Bibliography . . . 16-1
Figures
List of Figures
1.1 Step-profile of pn junction . . . 1-1
1.2 n-channel enhancement-mode MOSFET . . . 1-6
1.3 The basic MOS structure . . . 1-7
1.4 MOS accumulation state . . . 1-8
1.5 MOS fields and potentials for positive gate voltages . . . 1-9
1.6 Depletion in the MOS system . . . 1-10
1.7 Surface inversion in the MOS system . . . 1-10
1.8 Increase in depletion charge from body bias VB . . . 1-12
1.9 Basic MOSFET channel formation . . . 1-13
1.10 MOSFET in cutoff mode . . . 1-14
1.11 MOSFET in nonsaturation mode . . . 1-15
1.12 MOSFET in saturation mode . . . 1-15
1.13 MOSFET geometry used in GCA (MOSFET in linear/nonsaturated region) . . . 1-16
1.14 Geometry for GCA current analysis . . . 1-17
1.15 Nonsaturated MOS current . . . 1-18
1.16 Basic MOSFET characteristics . . . 1-19
1.17 Start of Saturation in a MOSFET . . . 1-19
1.18 Channel length modulation . . . 1-20
1.19 MOSFET characteristics with channel length modulation . . . 1-21
1.20 General MOSFET bias . . . 1-21
1.21 Body bias effects . . . 1-22
1.22 Device parameter measurement (a) . . . 1-22
1.23 Device parameter measurement (b) . . . 1-23
1.24 Comparison of circuit equations with the complete GCA model . . . 1-24
1.25 Comparison of modified circuit equations with the complete GCA model . . . 1-24
1.26 Depletion-mode MOSFET . . . 1-25
1.27 Simplified depletion-mode MOSFET model . . . 1-26
1.28 Depletion-mode MOSFET characteristics . . . 1-27
1.29 Square root of saturated depletion-mode MOSFET current . . . 1-28
1.30 p-channel MOSFET . . . 1-28
1.31 Ideal inverter properties . . . 1-32
1.32 Basic nMOS inverter structure . . . 1-33
1.33 Voltage transfer curve of an nMOS inverter . . . 1-34
1.34 Definition: Noise margins . . . 1-35
1.35 Base for NM definitions: cascaded inverter stages . . . 1-35
1.36 Model for transmission network problem . . . 1-36
1.37 Simplified AC circuit model for noise margins . . . 1-37
1.38 Inverter transient response definitions . . . 1-38
1.39 Physical reason for transition times . . . 1-39
1.40 Inverter with linear resistor load . . . 1-40
1.41 VTC for linear resistor load nMOS inverter . . . 1-41
1.42 VOH resistor model . . . 1-42
1.43 VOL resistor model . . . 1-43
1.44 Saturated enhancement load nMOS inverter . . . 1-44
1.45 VTC for saturated enhancement load nMOS inverter . . . 1-45
1.46 Nonsaturated enhancement load nMOS inverter . . . 1-46
1.47 VTC for nonsaturated enhancement load nMOS inverter . . . 1-47
1.48 Symbol for depletion mode MOSFET . . . 1-47
1.49 Depletion mode MOSFET load . . . 1-48
1.50 VTC for inverter with depletion mode MOSFET load . . . 1-49
1.51 Driver-load ratio for depletion-load inverter . . . 1-50
1.52 R for various VOL choices . . . 1-58
1.53 Basic CMOS inverter structure . . . 1-59
1.54 CMOS inverter characteristics . . . 1-60
1.55 Output high to low time . . . 1-61
1.56 Rise time circuit . . . 1-62
1.57 Depletion load rise time . . . 1-62
1.58 Propagation delay time definitions . . . 1-63
1.59 CMOS transient analysis . . . 1-64
1.60 PDP: input signal waveforms . . . 1-65
1.61 PDP for inverter with resistor load . . . 1-66
1.62 Power supply currents . . . 1-68
1.63 Capacitances: basic MOSFET structure . . . 1-70
1.64 MOSFET capacitor model . . . 1-71
1.65 MOSFET gate capacitances in the three operational regions . . . 1-72
1.66 Gate capacitances as functions of gate-source voltage . . . 1-73
1.67 Expanded view of an n+ drain or source region for computing depletion capacitances . . . 1-73
1.68 Approximation used for Cout in cascaded nMOS inverters . . . 1-75
1.69 Simplified interconnect scheme for line capacitance . . . 1-75
1.70 Capacitance calculation for FO = 3 . . . 1-76
1.71 Approximation used for Cout in cascaded CMOS inverters . . . 1-77
1.72 CMOS process flow . . . 1-79
1.73 Latch-up in n-tub CMOS inverter . . . 1-80
1.74 Guard rings for latch-up prevention . . . 1-81
2.1 Example for random logic: adder . . . 2-1
2.2 Complex gate logic primitive: CMOS inverter . . . 2-2
2.3 MOS transistors viewed as switches . . . 2-2
2.4 A complementary switch . . . 2-3
2.5 Example for regular design: gate-matrix layout . . . 2-4
2.6 nMOS 2-input NOR gate . . . 2-5
2.7 nMOS N-input NOR gate . . . 2-5
2.8 nMOS 2-input NAND gate . . . 2-6
2.9 Example of a complex nMOS circuit . . . 2-7
2.10 Evolution of an nMOS XOR circuit . . . 2-8
2.11 Direct NOT XOR complex gate implementation . . . 2-9
2.12 CMOS NAND gate . . . 2-10
2.13 CMOS NAND gate layout . . . 2-11
2.14 CMOS NOR gate . . . 2-12
2.15 General CMOS static logic gate . . . 2-13
2.16 CMOS complex gate construction . . . 2-14
2.17 Systematic function construction . . . 2-18
2.18 Combinational adder schematic . . . 2-21
2.19 Combinational adder layout possibilities for one adder circuit . . . 2-22
2.20 Pseudo nMOS logic . . . 2-23
2.21 Pass transistor logic model . . . 2-24
2.22 Pass transistor structure for NXOR function . . . 2-24
2.23 Pass transistor charging characteristics . . . 2-25
2.24 Pass transistor discharge characteristics . . . 2-27
2.25 nMOS pass characteristics . . . 2-28
2.26 CMOS transmission gate symbols . . . 2-28
2.27 CMOS transmission gate realisation . . . 2-29
2.28 pMOS pass transistor . . . 2-29
2.29 pMOS pass characteristics . . . 2-30
2.30 CMOS transmission gate . . . 2-30
2.31 MOSFET operational states . . . 2-31
2.32 Transmission gate: resistor switch model . . . 2-31
2.33 Transmission gate: RC switch logic transfer . . . 2-32
2.34 Transmission gate: equivalent resistances . . . 2-32
2.35 Transmission gate: basic layout . . . 2-33
2.36 Transmission gate logic . . . 2-34
2.37 TG-logic: 2-input path selector . . . 2-34
2.38 TG-logic: OR gate . . . 2-35
2.39 TG-logic: XOR and equivalence . . . 2-36
2.40 TG-logic: alternate equivalence logic circuit . . . 2-36
2.41 Half adder logic symbol . . . 2-37
2.42 TG-logic: Half adder . . . 2-37
2.43 TG-logic: Full adder . . . 2-38
2.44 Multiplex/Demultiplex operations . . . 2-39
2.45 TG-logic: 4-to-1 multiplexer . . . 2-39
2.46 TG-logic: Split-Array MUX . . . 2-40
2.47 Pass transistor logic with pMOS pull-up . . . 2-40
3.1 Ideal nonoverlapping 2-phase clocks . . . 3-1
3.2 Basic 2-phase clocking . . . 3-2
3.3 Single clock 2-phase timing . . . 3-2
3.4 Generation of inverted clock phase . . . 3-3
3.5 TG delay circuit . . . 3-3
3.6 Pseudo 2-φ clocking . . . 3-4
3.7 Shift register . . . 3-5
3.8 Clocked shift register circuit . . . 3-5
3.9 Leakage path in a CMOS TG . . . 3-6
3.10 Charge leakage problem in CMOS TG . . . 3-7
3.11 Charge leakage circuit . . . 3-8
3.12 Transmission gate capacitance . . . 3-8
3.13 Basic charge sharing circuit . . . 3-10
3.14 Transient voltage behaviour . . . 3-11
3.15 Basic dynamic nMOS inverter . . . 3-12
3.16 Dynamic nMOS inverter: precharge and evaluate . . . 3-13
3.17 Precharge network for worst case . . . 3-14
3.18 Evaluation discharge network . . . 3-14
3.19 Basic dynamic pMOS inverter . . . 3-15
3.20 Complex dynamic logic . . . 3-16
3.21 Cascaded nMOS-nMOS glitch problem . . . 3-17
3.22 Dynamic cascades . . . 3-17
3.23 Basic domino logic circuit . . . 3-18
3.24 Domino AND gate . . . 3-19
3.25 Cascaded domino logic . . . 3-19
3.26 Visualization of domino effect . . . 3-19
3.27 Domino timing . . . 3-20
3.28 Cascaded domino circuit with fanout = 2 . . . 3-21
3.29 Cascaded domino logic . . . 3-22
3.30 Domino AND4 gate . . . 3-23
3.31 Domino stage with pull-up MOSFET . . . 3-24
3.32 Charge sharing in a domino chain . . . 3-25
3.33 Use of feedback to control a pull-up MOSFET for charge sharing problem . . . 3-25
3.34 Signal race problem . . . 3-26
3.35 Clock skew . . . 3-27
3.36 NORA structuring . . . 3-28
3.37 NORA φ and φ̄ sections . . . 3-29
3.38 C²MOS latch . . . 3-30
3.39 NORA pipelined logic . . . 3-30
3.40 Connection of components for a simple CMOS flip-flop . . . 3-31
3.41 Physical Construction of a CMOS flip-flop . . . 3-32
3.42 Pseudo 2-phase clocking (a) waveforms and simple latch, (b) clock skew, and (c) slow clock edges . . . 3-33
3.43 Pseudo 2-phase latches (! charge redistribution problem in (b)) . . . 3-34
3.44 Pseudo 2-phase latch layouts . . . 3-35
3.45 Shift register array layout . . . 3-36
3.46 Reduced transistor count latch . . . 3-37
3.47 Reduced transistor count latch with high impedance sustainer transistor . . . 3-37
3.48 Dynamic D-Latches . . . 3-38
3.49 Pseudo 2-phase dynamic logic . . . 3-39
3.50 Pseudo 2-phase domino logic . . . 3-40
3.51 2-phase flip-flop and skew reduction . . . 3-41
3.52 Chain latch . . . 3-42
3.53 2-phase static flip-flops . . . 3-43
3.54 2-phase static D flip-flops . . . 3-44
3.55 2-phase static D flip-flops (continued) . . . 3-45
3.56 2-phase D flip-flops layouts . . . 3-46
3.57 Static D flip-flop with set and reset . . . 3-47
4.1 Basic LOCOS MOSFET structure . . . 4-2
4.2 MOSFET capacitor model . . . 4-3
4.3 Expanded view of an n+ drain or source region for computing depletion capacitances . . . 4-5
4.4 Representation of long wire in terms of distributed RC sections . . . 4-6
4.5 Segmentation of polysilicon line . . . 4-7
4.6 Simple model for RC delay calculation . . . 4-7
4.7 CMOS inverter pair timing response . . . 4-9
4.8 Waveforms for determination of dynamic power dissipation . . . 4-11
4.9 Input voltage waveforms for the power-delay products . . . 4-12
4.10 Power-delay product in a resistively loaded inverter . . . 4-13
4.11 Current waveforms for the power-delay product calculations . . . 4-14
4.12 Layout pattern for VDD and VSS lines . . . 4-19
4.13 Pseudo 2-Phase Clocking Chart . . . 4-20
4.14 Pseudo 2-Phase Overlap Times . . . 4-20
4.15 Clock Skew . . . 4-21
4.16 Clock Generator With a TG Delay . . . 4-22
4.17 Latch-Based Clock Generator . . . 4-22
4.18 Clock Skew Due to Distribution . . . 4-23
4.19 Clock Line Capacitance . . . 4-24
4.20 Clock Line Capacitance . . . 4-24
4.21 Input Protection Circuit . . . 4-25
4.22 Thin Oxide MOSFET Protection Circuit . . . 4-26
4.23 Capacitive Loading Problem . . . 4-26
4.24 Inverter Sizing Problem . . . 4-27
4.25 Double-Inverter Off-Chip Driver Circuit . . . 4-30
4.26 Tri-State Output Circuit . . . 4-31
4.27 Bi-Directional I/O Circuit . . . 4-32
5.1 Czochralski process for manufacturing silicon ingots . . . 5-2
5.2 The n-Well Mask . . . 5-3
5.3 The Active Mask . . . 5-3
5.4 The Poly Mask . . . 5-4
5.5 The n+ Mask . . . 5-4
5.6 The p+ Mask . . . 5-5
5.7 The Contact Mask . . . 5-6
5.8 The Metalisation Mask . . . 5-6
5.9 An Example of a p-Well CMOS Process . . . 5-7
5.10 continued . . . 5-8
5.11 Twin-tub process cross-section and layout of an inverter . . . 5-9
5.12 LOCOS Isolation . . . 5-10
5.13 Encroachment in LOCOS . . . 5-11
5.14 Trench Isolation . . . 5-12
5.15 Trench Capacitor . . . 5-13
5.16 Origin of CMOS Latchup . . . 5-13
5.17 Trench-isolated CMOS . . . 5-14
5.18 Active Area Encroachment in LOCOS . . . 5-21
5.19 Effective Channel Length . . . 5-22
5.20 Design-mask transformation . . . 5-23
5.21 Contact Cuts . . . 5-23
5.22 A region with eight terminals has 28 interconnection resistances. Making the cross-hatched junctions into new nodes splits the region into 10 electrically isolated regions and reduces the number of interconnection resistances to 10 . . . 5-25
5.23 Design-mask transformation . . . 5-28
5.24 General Layout Grid . . . 5-29
5.25 Complementary Transistor/Logic Blocks . . . 5-29
5.26 Equivalent Load . . . 5-30
5.27 Guard Ring Example . . . 5-31
5.28 Complement Static Gates . . . 5-32
5.29 Transmission Gate Layout . . . 5-33
5.30 Layout of an inverter . . . 5-34
5.31 Layout of a 2-input NAND gate . . . 5-35
5.32 Layout of a 2-input NOR gate . . . 5-35
5.33 Layout of an EXOR gate . . . 5-36
5.34 Layout of a RAM cell . . . 5-36
5.35 Layout of a pad . . . 5-37
5.36 Layout of an RS-latch . . . 5-37
5.37 Layout of a D-latch . . . 5-38
5.38 Layout of a comparator . . . 5-38
5.39 Layout of a 1-bit full adder . . . 5-39
6.1 Continuous growth in DRAM complexity and size places little demand on package size and number of I/Os . . . 6-2
6.2 Comparison of I/O requirements for DRAM, logic and microprocessor devices . . . 6-3
6.3 Examples for packages and PWB mounting techniques: (a) TH: Dual-in-line (DIL) package. (b) TH: Pin-grid-array (PGA) package. (c) SM: J-leaded packages, leaded chip carrier or small-outline. (d) SM: Gull-wing-leaded packages, chip-carrier or small-outline. (e) SM: Butt-leaded package, small-outline dual-in-line type. (f) Leadless type, ceramic chip carrier mounted to a matching ceramic substrate . . . 6-5
6.4 IC package types as a function of I/Os and attachment type . . . 6-5
6.5 Package history . . . 6-6
6.6 Comparison: 24-pin SO package and 48-pin SSO package . . . 6-6
6.7 Bonding-pad pitch versus chip lead count for several chip sizes . . . 6-7
6.8 Arrangement of staggered bonding pads: lower pitch than with single line of bonding pads. (a) Bonding pads size and spacing. (b) Maximum wire angle with respect to die edge . . . 6-7
6.9 CAD template for positioning bonding pads (assures that wire span length meets the design rules) . . . 6-8
6.10 CAD template for checking adherence to wire-span guidelines. The template also provides an extended zone (beyond the optimum shown in Fig. 6.9) for cases where location in optimum zone is not compatible with the device layout . . . 6-8
6.11 CAD template for checking the maximum distance that wire spans over silicon. Here: violation of the guidelines. The circle must be at minimum tangent to the step-and-repeat centerline (case of maximum distance) or cross it . . . 6-9
6.12 Lead inductances for various package sizes . . . 6-10
6.13 TCE of materials for semiconductor devices, ( C) . . . 6-11
6.14 Plastic package: composite structure consisting of silicon chip, metal leadframe and plastic moulding compound . . . 6-12
6.15 Generic assembly sequence for plastic and ceramic packages . . . 6-13
6.16 Eutectic die bonding . . . 6-14
6.17 Epoxy die bonding . . . 6-15
6.18 Tailless ball-and-wedge bonding cycle . . . 6-16
6.19 Thermosonic ball wire bonds on a gate array VLSI chip . . . 6-17
6.20 Process sequence to create a laminated refractory-ceramic product from a ceramic slurry . . . 6-18
6.21 Cross-sectional sketches of several package types . . . 6-18
6.22 Structures of CERDIP and quad CERPAC . . . 6-19
6.23 Ball-and-wedge-bonded silicon die in a plastic DIP . . . 6-20
6.24 Molding processing system . . . 6-21
6.25 IC package market share . . . 6-22
6.26 Worldwide IC package market share by material . . . 6-22
6.27 Pin count versus usable gates . . . 6-23
6.28 Plastic IC package material costs . . . 6-24
6.29 Ceramic IC package material costs . . . 6-25
6.30 MCM: microprocessor performance . . . 6-26
7.1 Cell orientations . . . 7-3
7.2 Full custom layout (hand crafted or generated out of a stick diagram or a layout description) . . . 7-4
7.3 Corresponding geometrical specification file and schematic diagram . . . 7-4
7.4 Memory cell schematic and corresponding stick diagram . . . 7-5
7.5 Full Custom Design Flow . . . 7-6
7.6 Standard Cell Design Flow . . . 7-8
7.7 Example of a design rules set checked during design verification . . . 7-10
7.8 Competing drivers at a bus . . . 7-15
7.9 Example: compiler driven simulation . . . 7-16
8.1 NOR gate reduction for Weinberger structuring . . . 8-2
8.2 Weinberger structuring for 3-to-8 decoder . . . 8-3
8.3 Weinberger structuring for 3-to-8 decoder (continued) . . . 8-4
8.4 Function representation in random logic . . . 8-5
8.5 Weinberger NOR array representation . . . 8-5
8.6 Weinberger stick diagram . . . 8-5
8.7 Weinberger array structure: (a) schematic (b) layout . . . 8-6
8.8 Gate matrix layout: (a) schematic (b) layout (c) optimized layout of n part . . . 8-8
8.9 Half adder NAND/INV representation . . . 8-9
8.10 Half adder realizations: (a) standard cell (b) gate matrix . . . 8-10
8.11 Typical gate matrix layout . . . 8-11
8.12 Gate matrix row and column spacings . . . 8-12
8.13 (a) CMOS complex gate schematic and (b) corresponding layout . . . 8-14
8.14 Implementation of an EXOR function: (a) Logic diagram. (b) Circuit. (c) Layout . . . 8-15
8.15 Example of row-based layout scheme . . . 8-16
8.16 Alternative complex gate implementation of EXOR function: (a) Logic diagram. (b) Circuit. (c) Layout . . . 8-17
8.17 Basic layout of the functional cell: (a) Logic diagram. (b) Circuit. (c) Graph model. (d) Layout . . . 8-18
8.18 Layout optimization: (a) Diffusion connection of adjacent transistors. (b) Optimal arrangement (reordered input lines) . . . 8-19
8.19 Alternative optimal circuit layout: (a) Logic diagram. (b) Circuit. (c) Graph model. (d) Optimal Layout . . . 8-20
8.20 Reduction of odd numbers of edges . . . 8-22
8.21 Application of reduction rule: (a) Logic Diagram. (b) Graph model and its reduction. (c) Reconstruction of an Euler path . . . 8-23
8.22 Application of the heuristic algorithm: (a) New inputs p1 and p2 are added. (b) Optimal sequence of inputs without the interlace of p1 or p2. (c) Circuit with the dual path {p1,2,3,1,4,5,p2} . . . 8-24
8.23 Minimal interlace algorithm . . . 8-25
8.24 Application example for minimal interlace algorithm . . . 8-26
8.25 Carry look-ahead circuit (this representation has no Euler path) . . . 8-27
8.26 Alternative topology for carry look-ahead circuit (with possibility of constructing an Euler path) . . . 8-28
8.27 Comparison of space: (a) Functional cell realization. (b) Conventional NAND realization . . . 8-29
8.28 Standard cell architecture . . . 8-30
8.29 Synchronous counter schematic . . . 8-30
8.30 Synchronous counter floorplan using standard cells . . . 8-31
8.31 AND-OR-PLA . . . 8-32
8.32 Programmable logic approaches . . . 8-33
8.33 PLA realization for given example . . . 8-34
8.34 PLA generic floor plan . . . 8-35
8.35 NOR-NOR PLA structure . . . 8-36
8.36 Pseudo nMOS NOR-NOR PLA circuit . . . 8-37
8.37 PLA implementation in pseudo nMOS logic . . . 8-37
8.38 Stick diagram of nMOS PLA . . . 8-38
8.39 PLA NAND-INV-INV-NAND implementation . . . 8-39
8.40 CMOS PLA layout . . . 8-40
8.41 Dynamic 2-phase PLA circuit . . . 8-41
8.42 Noise problem in dynamic PLAs . . . 8-42
8.43 Multiple sided input/output access . . . 8-43
8.44 PLA before folding . . . 8-43
8.45 Row-folded PLA . . . 8-43
8.46 Column-folded PLA . . . 8-44
8.47 Automatic PLA layout generation . . . 8-45
8.48 Datapath and controller block . . . 8-47
8.49 Sequential circuit example . . . 8-49
8.50 State-transition diagram . . . 8-51
8.51 State-transition diagram for the divide-by-5 counter . . . 8-52
8.52 State-transition diagram of an arbitrary FSM . . . 8-52
8.53 FSM-realization for first encoding scheme . . . 8-54
8.54 FSM-realization for second encoding scheme . . . 8-55
8.55 FSM (Moore automata) implementation . . . 8-55
8.56 Treatment of asynchronous inputs in a Moore machine . . . 8-56
8.57 Five-state FSM . . . 8-58
8.58 Reduced equivalent FSM . . . 8-59
8.59 Example FSM . . . 8-60
8.60 Nondeterministic FSM . . . 8-60
8.61 NFSM that recognizes strings of form (A | (AB )) . . . 8-61
9.1 Gate array floorplan with row structure . . . 9-6
9.2 Floorplan for a sea of gates array . . . 9-7
9.3 IMI gate array structure . . . 9-8
9.4 Corner of IMI gate array die . . . 9-9
9.5 Grid representation of IMI gate array . . . 9-10
9.6 Explanations of grid: (a) basic cell. (b) internal interconnects. (c) basic cell and crossover (poly) block. (d) XR = transistor. (e) crossover block interconnects . . . 9-11
9.7 Symbolic IMI cell structure representation . . . 9-12
9.8 CMOS matrixcell . . . 9-12
9.9 CDI single metal layer gate array structure . . . 9-13
9.10 Gate array design flow . . . 9-14
9.11 Personalization for inverter: (a) schematic. (b),(c) IMI layout. (d) CDI layout . . . 9-15
9.12 NOR gate on IMI . . . 9-16
9.13 Layout of transmission gates: (a) single TG. (b) pair of TGs with common output . . . 9-16
9.14 Gate array market by process technology . . . 9-18
9.15 Worldwide gate array market by user sector . . . 9-18
9.16 Circuit and corresponding standard cell . . . 9-19
9.17 Standard cell scheme . . . 9-20
9.18 Standard cell floorplan . . . 9-20
9.19 Standard cell market by process technology . . . 9-22
9.20 Standard cell market by application . . . 9-22
9.21 Floor plan for macro cell design style (= building block approach) . . . 9-23
9.22 Mixed design style structures . . . 9-24
9.23 Combinational PAL devices: AMD 16L2 . . . 9-26
9.24 Sequential PAL devices: AMD PAL16R4 . . . 9-27
9.25 Arithmetic PAL devices: AMD PAL16A4 . . . 9-28
9.26 Advanced PLD devices: Altera EP1800 . . . 9-29
9.27 Local macro cell . . . 9-30
9.28 Global macro cell . . . 9-30
9.29 Synchronous clock, output enabled by product term . . . 9-31
9.30 Asynchronous clock, output permanently enabled . . . 9-31
9.31 Block diagram of MAX7000 family . . . 9-32
9.32 MAX7000 macrocell . . . 9-33
9.33 Principal FPGA structure . . . 9-34
9.34 Four classes of commercially available FPGAs . . . 9-35
9.35 SRAM programming technology . . . 9-36
9.36 Actel PLICE anti-fuse structure . . . 9-37
9.37 Quicklogic ViaLink Anti-Fuse . . . 9-37
9.38 EEPROM programming technology . . . 9-38
9.39 General architecture of XILINX FPGAs . . . 9-39
9.40 Xilinx XC4000 CLB . . . 9-39
9.41 Xilinx XC4000 single length lines . . . 9-40
9.42 Xilinx XC4000 double length lines and long lines
9.43 General architecture of Actel FPGAs
9.44 Act-1 logic module
9.45 Act-1 programmable interconnection architecture
9.46 Act-2 logic cells
9.47 FPGA CAD
9.48 The Xilinx design flow
9.49 Cost per Chip (Dollars)
9.50 Logic design alternatives
9.51 Relative merits of various ASIC implementation styles
10.1 Serial adder principle
10.2 Ripple carry adder principle
10.3 Carry lookahead adder for 4 bits
10.4 Clustered carry lookahead adder for 16 bits
10.5 Carry select adder for 16 bits
10.6 Carry save adder for summation of 4 operands (V, W, X, Y)
10.7 Structure of SAA multipliers
10.8 Structure of CSM multipliers
10.9 Architecture of the block multiplier
11.1 Microarchitecture blocks
11.2 Datapath example
11.3 Corresponding layout scheme
11.4 2901 4-bit ALU slice
11.5 2901 µ-OPs
11.6 16-bit bit-sliced ALU
11.7 Basic controller structure
11.8 ROM based controller
11.9 PLA based controller
11.10 Horizontal microinstruction
11.11 Vertical microinstruction
11.12 A microcode/nanocode controller
12.1 Flip-flop driving clock input of another flip-flop
12.2 Gated clock line
12.3 Double-edged clocking
12.4 Flip-flop driving asynchronous reset of another flip-flop
12.5 Unequal depth of clock buffering
12.6 Unbalanced fanout of clock buffers
12.7 Balanced clock tree buffering
12.8 Combined geometric/tree buffering
12.9 Multiplexer on clock line
12.10 Enabled (E-type) flip-flop
12.11 Toggle (T-type) flip-flop
12.12 Pipelined logic with double-edged clocking
12.13 Pipelined logic with single-edged clocking
12.14 Flip-flop driving asynchronous reset of another flip-flop
12.15 Global asynchronous reset by external signal
12.16 Flip-flop driving synchronous reset of flip-flop
12.17 Shift register with forward chain of clock buffers
12.18 Shift register with balanced tree of clock buffers
12.19 Series D-type flip-flops for capturing asynchronous input
12.20 4-bit register used as shift register to capture an asynchronous input
12.21 Asynchronous handshake circuit
12.22 Operation of asynchronous handshake circuit
12.23 Monostable pulse generator
12.24 Pulse generator using flip-flop
12.25 Multivibrator
12.26 Synchronous pulse generator
12.27 Bistable storing element formed by cross-coupled NAND gates
12.28 Bistable storing element formed by cross-coupled NOR gates
12.29 Asynchronous RS flip-flop
12.30 Latch configured as RS flip-flop
12.31 ME and WEbar RAM/DPRAM timing scheme
12.32 Interfacing RAM into synchronous circuit: ME and WEbar generation
12.33 Using flip-flop for WEbar generation: timing scheme
12.34 Avoiding floating RAM/DPRAM output propagation
12.35 Tristate bus with non-central enable control
12.36 Tristate bus with central control of tristate enables and additional driver activated on non-controlled states
12.37 Wired-OR part used to create higher fanout
12.38 High-fanout buffer replacing wired OR part
12.39 Excessive fanout on control signal
12.40 Geometric buffering on control signal
12.41 Tree buffering on control signal
12.42 4-input AND gate and 2-input NAND/NOR equivalent
12.43 Multiplexer using AOI logic
12.44 Late changing input fed late into combinational logic
12.45 4-stage Johnson counter
12.46 Using duplicate logic for reducing fanout
12.47 Circuit with inaccessible internal logic: only first block is controllable and only last block is directly observable
12.48 Chain of counters: first counter is not directly observable and second counter is not directly controllable
12.49 Counter with closed feedback loop: initial state not known
12.50 Circuit with test inputs and outputs
12.51 Chain of counters broken by test input and output signals
12.52 Counter with feedback loop opened by test control and output signals
12.53 Compiled megacell with compiled inputs/outputs
12.54 E-type scan path flip-flop
12.55 Circuit with scan path
12.56 JTAG test circuitry
13.1 Defect level as function of yield and fault coverage
13.2 A typical synthesis flow
13.3 Relationship between faults, errors and failures
13.4 Three-universe model of a system
13.5 Examples for physical faults
13.6 Fault detection by duplication with complementary logic
13.7 4-by-4 array with one spare column
13.8 Reconfigured array
13.9 Basic concept of D-algorithm
13.10 Primitive D-cube of fault (pdcf) for two-input NAND gate
13.11 Propagation-D-cube (pdc) for two-input NAND gate
13.12 Singular cover for two-input NAND gate
13.13 Singular covers for several basic logic gates
13.14 Construction of the singular cover of a logic module
13.15 Example circuit illustrating D-algorithm
13.16 Serial fault simulation
13.17 Design for testability: complex gate (a) not testable with stuck-at model. (b) fully testable with stuck-at model
13.18 Testability: ad-hoc techniques (partitioning for testability)
13.19 Testability: ad-hoc techniques (a) insertion of register in order to limit logic depth to a given maximum value. (b) test shift registers for PLA test (increasing PLA area)
13.20 Feedback logic with scanpath
13.21 Examples for built-in test pattern generators
13.22 Pseudo random pattern generator
13.23 Example for pseudo random pattern generator
13.24 Counting techniques for test data evaluation
13.25 Test data evaluation by signature analysis
13.26 Parallel signature register
13.27 BILBO registers: 1. full circuit 2. normal use 3. scan-path use 4. signature analysis
13.28 Example: self-testing circuit
14.1 In-circuit test using bed-of-nails
14.2 Functional test using board connector
14.3 Combined use of in-circuit and functional test
14.4 Scan design at the board level
14.5 Testing for interconnection faults
14.6 Testing on-chip logic
14.7 IEEE Std 1149.1 test logic
14.8 Test data registers
14.9 Serial connection of IEEE Std 1149.1-compatible ICs
14.10 Parallel connection of IEEE Std 1149.1-compatible ICs
14.11 Use of bus master chip to control IEEE Std 1149.1 chips
14.12 Daisy-chain connection of instruction registers
14.13 Instruction register
14.14 An example instruction register cell (stage)
14.15 Example design for bypass register
14.16 Use of bypass register
14.17 Provision of boundary-scan cells
14.18 Basic boundary-scan cell for input pin
14.19 Basic boundary scan cell for output pin
15.1 Block diagram of a typical signal processing system
15.2 Bandwidths of signals used in signal processing applications
15.3 Signal bandwidths that can be processed by present day (1989) technologies
15.4 Converters in signal processing systems: (a) A/D, (b) D/A
15.5 (a) Conceptual block diagram of a D/A converter, (b) Clocked D/A converter
15.6 (a) Sample-and-hold circuit, (b) Waveforms illustrating the operation of the sample-and-hold circuit
15.7 Block diagram of a D/A converter
15.8 Ideal input-output characteristics for a 3-bit D/A converter
15.9 (a) Conceptual illustration of a current-scaling D/A converter, (b) Implementation of (a)
15.10 A current-scaling D/A converter using an R-2R ladder
15.11 Illustration of a voltage-scaling D/A converter
15.12 Block diagram of a general analog-to-digital converter
15.13 Ideal input-output characteristics for a 3-bit A/D converter
15.14 Example of a successive approximation A/D converter architecture
15.15 The successive approximation process
15.16 A 3-bit parallel A/D converter
15.17 A time-interleaved A/D converter array
15.18 Basic structure of a sigma-delta converter
15.19 First-order sigma-delta modulator block diagram
15.20 Output of first-order sigma-delta modulator
15.21 Frequency domain linearized model of a sigma-delta modulator
15.22 Noise-shaping filter function
List of Tables
3.1 Static D flip-flop set/reset truth table
4.1 Approximation of intrinsic MOS gate capacitance
4.2 Influence of first-order scaling on MOS device characteristics
4.3 Influence of scaling on interconnect media
5.1 CMOS 1.5-Micron Design Rule Example
5.2 Basic n-well CMOS Process
5.3 Layer capacitances of an n-well CMOS process
5.4 Layer resistances of an n-well CMOS process
7.1 Simplified geometrical specification language
7.2 MOS layer definitions
7.3 Rotations of geometry
8.1 State-transition truth table
8.2 State-transition table for divide-by-5 counter
8.3 State-transition truth table
13.1 Propagation D-cube table
13.2 Singular cover table
13.3 D-cube intersection table
pn Junction Properties
Chapter 1
Figure 1.1: Step profile of a pn junction
Figure 1.1 shows the doping profile of a pn junction:
p-type region ($x < 0$): doping $N_a$ [cm$^{-3}$]
n-type region ($x > 0$): doping $N_d$ [cm$^{-3}$]
The following analysis is done for the pn junction without external voltage (V = 0).
1.1.1 pn Junction Space Charge Area and Electric Field
Diffusion (a statistical phenomenon) of mobile carriers across the junction leaves ionized dopants behind, so that space charge regions arise. The diffusion is limited by the electric field caused by the space charge (the displaced electrons and holes). The relation between the space charge density $\rho(x)$, the depletion electric field $E(x)$ and the potential $\phi(x)$ is given by the Poisson equation
\[ \frac{dE(x)}{dx} = \frac{\rho(x)}{\varepsilon_{Si}} = -\frac{d^2\phi(x)}{dx^2}. \tag{1.1} \]
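$\rho(x)$ is the volume charge density of the ionized dopants and can be idealized as
\[ \rho(x) = \begin{cases} +qN_d, & x \in [0, x_n], \\ -qN_a, & x \in [-x_p, 0]. \end{cases} \tag{1.2} \]
The electric field follows by integration:
\[ E(x) = \int \frac{\rho(x)}{\varepsilon_{Si}}\,dx \tag{1.3} \]
Integrating with the boundary conditions $E(x_n) = 0 = E(-x_p)$ (due to $V = 0$) gives
\[ E(x) = \begin{cases} -\dfrac{qN_d}{\varepsilon_{Si}}\,(x_n - x), & x \in [0, x_n], \\[2ex] -\dfrac{qN_a}{\varepsilon_{Si}}\,(x + x_p), & x \in [-x_p, 0]. \end{cases} \tag{1.4} \]
The field has its largest magnitude at the junction ($x = 0$):
\[ E(0) = -\frac{qN_a x_p}{\varepsilon_{Si}}. \tag{1.5} \]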
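1.1.2 pn Junction Built-in Potential
The built-in potential $\phi_0$ is the potential drop across the space charge region:
\[ \phi_0 = -\int_{-x_p}^{x_n} E(x)\,dx. \tag{1.6} \]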
The built-in potential can be derived from the following equations. The diffusion hole-current density $J_{p,diff}(x)$ is proportional to the gradient of the positive charge carrier density and is given by
\[ J_{p,diff}(x) = -qD_p\frac{dp(x)}{dx} \tag{1.7} \]
where $D_p$ is the diffusion constant for holes and $p(x)$ is the density of holes at $x$. Diffusion and charge carrier mobility are statistical phenomena, and the relationship between them is given by the Einstein equation
\[ \frac{D_p}{\mu_p} = \frac{D_n}{\mu_n} = \frac{kT}{q} = V_T \tag{1.8} \]
where $k$ is the Boltzmann constant (in joules per kelvin) and $T$ the temperature (in K). The electric field $E(x)$ in the analyzed junction is not zero for all $x$, which means that a drift current density $J_{drift}$ also exists. For positive charge carriers it is
\[ J_{p,drift}(x) = q\mu_p\,p(x)\,E(x) \tag{1.9} \]
The resulting hole current density is
\[ J_p(x) = q\mu_p\,p(x)\,E(x) - qD_p\frac{dp(x)}{dx} \quad \left[\frac{A}{m^2}\right] \tag{1.10} \]
and equivalently for electrons
\[ J_n(x) = q\mu_n\,n(x)\,E(x) + qD_n\frac{dn(x)}{dx} \quad \left[\frac{A}{m^2}\right]. \tag{1.11} \]
Setting $J_p = 0$ (equilibrium condition) and using the Einstein relationship $D_p = \mu_p V_T$ we obtain
\[ E(x) = -\frac{d\phi(x)}{dx} = \frac{V_T}{p(x)}\frac{dp(x)}{dx} \tag{1.12} \]
and we can calculate the potential as
\[ dV = -V_T\frac{dp(x)}{p(x)}. \tag{1.13} \]
Integration from a point $x_1$ (with concentration $p_1$ and potential $V_1$) to a point $x_2$ (with $p_2$ and $V_2$) yields
\[ V_{21} = V_2 - V_1 = V_T\ln\frac{p_1}{p_2}. \tag{1.14} \]
For the built-in potential $\phi_0$ we obtain
\[ \phi_0 = V_T\ln\frac{p(-x_p)}{p(x_n)}. \tag{1.15} \]
With
\[ p(-x_p) = N_a, \tag{1.16} \]
\[ n\,p = n_i^2 \tag{1.17} \]
and
\[ p(x_n) = \frac{n_i^2}{N_d} \tag{1.18} \]
we get the final expression for $\phi_0$:
\[ \phi_0 = V_T\ln\frac{N_a N_d}{n_i^2}. \tag{1.19} \]
Note: Equation 1.17 is valid independent of the amount of donor and acceptor impurity doping.
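As a numerical illustration of the built-in potential $\phi_0 = V_T \ln(N_a N_d / n_i^2)$, the following minimal Python sketch evaluates the thermal voltage and $\phi_0$. The doping values, the temperature of 300 K and the function names are assumptions chosen for illustration; $n_i = 1.45 \times 10^{10}$ cm$^{-3}$ is the value quoted later in this chapter.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def built_in_potential(n_a, n_d, n_i=1.45e10, temp_k=300.0):
    """Built-in potential phi_0 = V_T * ln(N_a * N_d / n_i^2) in volts.

    All concentrations are in cm^-3.
    """
    return thermal_voltage(temp_k) * math.log(n_a * n_d / n_i**2)


# Hypothetical example doping levels (not taken from the text).
phi_0 = built_in_potential(n_a=1e15, n_d=1e17)
print(f"V_T = {thermal_voltage() * 1e3:.2f} mV, phi_0 = {phi_0:.3f} V")
```

Because the doping levels enter only through a logarithm, changing one of them by a factor of ten shifts $\phi_0$ by only $V_T \ln 10$, roughly 60 mV at room temperature.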
1.1.3 pn Junction Depletion Width
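Charge neutrality requires $N_a x_p = N_d x_n$; with $W = x_p + x_n$ this gives $x_p = \frac{N_d}{N_a + N_d}W$ and $x_n = \frac{N_a}{N_a + N_d}W$. From
\[
\begin{aligned}
\phi_0 &= \frac{qN_a}{\varepsilon_{Si}}\int_{-x_p}^{0}(x + x_p)\,dx + \frac{qN_d}{\varepsilon_{Si}}\int_{0}^{x_n}(x_n - x)\,dx \\
&= \frac{q}{\varepsilon_{Si}}\left[\frac{N_a}{2}\left(x^2 + 2x\,x_p\right)\Big|_{-x_p}^{0} + \frac{N_d}{2}\left(2x\,x_n - x^2\right)\Big|_{0}^{x_n}\right] \\
&= \frac{q}{2\varepsilon_{Si}}\left(N_a x_p^2 + N_d x_n^2\right) \\
&= \frac{q}{2\varepsilon_{Si}}\left[N_a\left(\frac{N_d W}{N_a + N_d}\right)^2 + N_d\left(\frac{N_a W}{N_a + N_d}\right)^2\right] \\
&= \frac{q}{2\varepsilon_{Si}}\,W^2\left[\frac{N_a N_d^2}{(N_a + N_d)^2} + \frac{N_d N_a^2}{(N_a + N_d)^2}\right] \\
&= \frac{q}{2\varepsilon_{Si}}\,\frac{N_a N_d}{N_a + N_d}\,W^2
\end{aligned}
\tag{1.25}
\]
we obtain for the depletion width $W$ the following equation:
\[ W = \sqrt{\frac{2\varepsilon_{Si}\,\phi_0}{q}\left(\frac{1}{N_a} + \frac{1}{N_d}\right)}. \tag{1.26} \]
A one-sided junction is obtained if $N_d \gg N_a$ or $N_a \gg N_d$; then
\[ W \approx \sqrt{\frac{2\varepsilon_{Si}\,\phi_0}{qN}} \tag{1.27} \]
where $N = \min(N_a, N_d)$.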
1.1.4 pn Junction with External Voltage
Assuming that the positive terminal of an external voltage $V$ is attached to the p-type region and the negative terminal to the n-type region ($V > 0$: forward bias; $V < 0$: reverse bias), we can modify the equilibrium equations by the transformation $\phi_0 \to (\phi_0 - V)$ and obtain for the depletion width
\[ W = \sqrt{\frac{2\varepsilon_{Si}}{q}\,(\phi_0 - V)\left(\frac{1}{N_a} + \frac{1}{N_d}\right)}. \tag{1.28} \]
1.1.5 pn Junction Capacitance
The junction capacitance originates from the depletion charge. It is important in reverse bias ($V < 0$), where it is given by
\[ C_j(V) = \frac{\varepsilon_{Si}}{W(V)} \quad [F/cm^2] \tag{1.29} \]
$C_j$ is nonlinear since it changes with the voltage $V$.
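The bias dependence of the depletion width, $W(V) = \sqrt{(2\varepsilon_{Si}/q)(\phi_0 - V)(1/N_a + 1/N_d)}$, and of the junction capacitance per unit area, $C_j = \varepsilon_{Si}/W(V)$, can be sketched numerically as follows. The doping levels, the built-in potential of 0.695 V and the function names are illustrative assumptions; $\varepsilon_{Si} = 11.8\,\varepsilon_0$ is the value used in this chapter.

```python
import math

Q_ELECTRON = 1.602176634e-19  # elementary charge [C]
EPS_0 = 8.854e-14             # vacuum permittivity [F/cm]
EPS_SI = 11.8 * EPS_0         # silicon permittivity [F/cm]


def depletion_width(phi_0, v, n_a, n_d):
    """Depletion width W(V) in cm for a step junction.

    phi_0: built-in potential [V]; v: applied voltage [V] (v < phi_0);
    n_a, n_d: doping concentrations [cm^-3].
    """
    if v >= phi_0:
        raise ValueError("the depletion model requires v < phi_0")
    return math.sqrt(2 * EPS_SI / Q_ELECTRON * (phi_0 - v) * (1 / n_a + 1 / n_d))


def junction_capacitance(phi_0, v, n_a, n_d):
    """Junction capacitance per unit area C_j(V) = eps_Si / W(V) in F/cm^2."""
    return EPS_SI / depletion_width(phi_0, v, n_a, n_d)


# Hypothetical example: N_a = 1e15 cm^-3, N_d = 1e17 cm^-3, phi_0 = 0.695 V.
for v in (0.0, -1.0, -5.0):
    w = depletion_width(0.695, v, 1e15, 1e17)
    c = junction_capacitance(0.695, v, 1e15, 1e17)
    print(f"V = {v:5.1f} V: W = {w * 1e4:.3f} um, C_j = {c:.3e} F/cm^2")
```

A larger reverse bias widens the depletion region and therefore lowers $C_j$, which is the nonlinearity referred to in the text.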
1.1.6 pn Junction Current Flow
Current flow through the junction is established by tracking the minority carriers:
- electron current $I_n$ on the p-side
- hole current $I_p$ on the n-side
- recombination-generation current originating from the depletion region
$I_n$ and $I_p$ combine to give the ideal diode equation
\[ I = I_0\left(e^{qV/kT} - 1\right), \tag{1.30} \]
where
\[ I_0 = qA\left(\frac{D_n n_{p0}}{L_n} + \frac{D_p p_{n0}}{L_p}\right) \tag{1.31} \]
is the reverse saturation current. The reverse generation current ($V < 0$) is found as
\[ I_{gen} \approx -\frac{qAn_i}{2\tau_0}\,W(V), \tag{1.32} \]
while the forward recombination current assumes the form
\[ I_{rec} \approx \frac{qAn_i}{2\tau_0}\,W(V)\,e^{qV/2kT}, \tag{1.33} \]
where $\tau_0$ is the average carrier lifetime. These contributions must be added to the ideal diode current.
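The ideal diode equation $I = I_0(e^{qV/kT} - 1)$ can be evaluated with a few lines of Python. This is only a sketch of the ideal term: the generation and recombination contributions are not included, and the saturation current of $10^{-14}$ A and the function name are hypothetical values chosen for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]


def ideal_diode_current(v, i_0, temp_k=300.0):
    """Ideal diode current I = I_0 * (exp(qV/kT) - 1) in amperes."""
    v_t = K_BOLTZMANN * temp_k / Q_ELECTRON
    # expm1 keeps precision for small |v| where exp(x) - 1 would cancel.
    return i_0 * math.expm1(v / v_t)


# Hypothetical reverse saturation current of 1e-14 A.
for v in (-1.0, 0.0, 0.3, 0.6):
    print(f"V = {v:5.2f} V: I = {ideal_diode_current(v, 1e-14):.3e} A")
```

For reverse voltages of more than a few $V_T$ the exponential vanishes and the current settles at $-I_0$, which is why $I_0$ is called the reverse saturation current.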
The aspect ratio W/L is the characteristic transistor design parameter.

MOSFET type:             n-channel                  p-channel
Substrate material:      weak p-type silicon        weak n-type silicon
Drain, source material:  strong n+ silicon          strong p+ silicon
Gate material:           strongly doped polysilicon (low resistance)
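Figure 1.3: The basic MOS structure
n-channel MOSFET: p-type wafer (single-crystal p-type silicon) uniformly doped with an acceptor (e.g. boron) concentration $N_a$ ($N_a \approx 10^{15}$ cm$^{-3}$). Close to the bulk electrode, the majority and minority thermal equilibrium concentrations are approximated by
\[ p_{p0} \approx N_a \quad\text{and}\quad n_{p0} \approx \frac{n_i^2}{N_a} \qquad (n_i \approx 1.45 \times 10^{10}\ \text{cm}^{-3}). \tag{1.34} \]
The oxide layer (SiO$_2$ = quartz glass) is used as the insulating dielectric between the metal and semiconductor layers, with a resistivity $> 10^{15}\ \Omega\,$cm. State-of-the-art MOS processes use polysilicon as gate material. The gate capacitance per unit area is given by
\[ C_{ox} = \frac{\varepsilon_{ox}}{x_{ox}} \quad [F/cm^2] \tag{1.35} \]
with $\varepsilon_{ox} = 3.9\,\varepsilon_0$ and $\varepsilon_0 = 8.854 \times 10^{-14}$ F/cm; $C_{ox}$ is typically of the order of $10^{-8}$ F/cm$^2$. The total capacitance of a gate with area $A$ is $C_{ox} A$ [F].
The top layer of metal is used for low-resistance connections of transistor structures.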
Varying the gate voltage gives three modes of operation for the MOS capacitor:
1. accumulation ($V_G < 0$)
2. depletion ($V_G > 0$, $V_G$ small)
3. inversion ($V_G > 0$)
Accumulation
Positively charged majority carriers (holes) accumulate at the Si-SiO$_2$ interface (Fig. 1.4). The MOS system behaves as a capacitor (Eq. 1.35). This state is only useful for measuring some basic MOS properties; it is not a normal operating region.
Depletion
Figure 1.5: MOS fields and potentials for positive gate voltages
MOS field effect: An externally applied gate voltage $V_G$ controls the semiconductor electric field $E(x)$ and the semiconductor potential $\phi(x)$, and therefore the silicon carrier densities $p$, $n$:
\[ E(x) = -\frac{d\phi(x)}{dx} \tag{1.36} \]
Potential boundary condition: $\phi(x) \to V_B = 0$ at the bulk electrode. The total voltage across the semiconductor is equal to the surface potential
\[ \phi_S = \phi(x = 0). \tag{1.37} \]
Applying KVL leads to
\[ V_G = V_{ox} + \phi_S. \tag{1.38} \]
Connection between $V_G$ and $E_S$:
\[ E_S = E(x = 0) = -\frac{d\phi}{dx}\bigg|_{x=0} \tag{1.39} \]
$E_S$ is the maximum value of the semiconductor field. It is controlled (Poisson equation) by the voltage $V_G$ and influences the surface carrier concentrations
\[ p_S = p_p(x = 0) \quad\text{and}\quad n_S = n_p(x = 0). \tag{1.40} \]
Negatively charged acceptor ions are the termination points for the electric field lines.
If $V_G$ is increased to a point where $p_S \ll N_a$ (induced by the electric field $E_S$) is satisfied, the depletion region extends from $x = 0$ to $x = x_d$. The depletion phenomenon in the MOS system is analogous to the p-side of a one-sided n$^+$p step-profile junction, with the difference that the voltage across the depletion region is $\phi_S$. Replacing the built-in voltage $\phi_0$ by the surface potential $\phi_S$ leads to an equation for the depletion width:
\[ x_d = \sqrt{\frac{2\varepsilon_{Si}\,\phi_S}{qN_a}}, \qquad \varepsilon_{Si} = 11.8\,\varepsilon_0. \tag{1.41} \]
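Figure 1.6: Depletion in the MOS system
The bulk depletion charge per unit area is
\[ Q_{B0} = -qN_a x_d \quad [C/cm^2] \tag{1.42} \]
\[ \phantom{Q_{B0}} = -\sqrt{2q\varepsilon_{Si}N_a\phi_S}, \tag{1.43} \]
where $Q_{B0} = Q_B|_{V_B = 0}$. MOS capacitor: $Q_S = Q_B < 0$, so that
\[ V_{ox} = -\frac{Q_S}{C_{ox}}. \tag{1.44} \]
Inversion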
Increasing $V_G$ implies increasing $\phi_S$ and driving $x_d$ deeper towards the bulk electrode. When $V_G$ reaches a critical threshold value $V_{T0}$ (assuming $V_B = 0$) the inversion phenomenon occurs: the depth of the depletion region remains constant ($x_d = x_{dm}$) and a layer of minority carriers accumulates at the surface ($x = 0$). The depth of the depletion region remains constant because the inversion layer electrons shield the bulk substrate from the increasing field at the surface. The inversion condition is given by
\[ V_G \ge V_T \tag{1.45} \]
with
\[ \phi_S(V_G = V_T) = 2|\phi_F|, \tag{1.46} \]
where
\[ |\phi_F| = \frac{kT}{q}\ln\frac{N_a}{n_i} \quad\text{(bulk Fermi potential)}. \tag{1.47} \]
The maximum depletion width is
\[ x_{dm} = \sqrt{\frac{2\varepsilon_{Si}\,(2|\phi_F|)}{qN_a}}, \tag{1.48} \]
the bulk depletion charge density
\[ Q_{B0} = -\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F|)}, \tag{1.49} \]
and the total surface charge density
\[ Q_S(V_G) = Q_{B0} + Q_I(V_G) \tag{1.50} \]
(where $Q_I(V_G)$ is the electron inversion layer charge). At the onset of inversion, $|Q_I| \ll |Q_{B0}|$, so the ideal threshold voltage is
\[ V_{T0}^{ideal} = -\frac{Q_{B0}}{C_{ox}} + 2|\phi_F| \tag{1.51} \]
\[ \phantom{V_{T0}^{ideal}} = \underbrace{\frac{\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F|)}}{C_{ox}}}_{\text{voltage drop across oxide}} + \underbrace{2|\phi_F|}_{\text{voltage drop across substrate}} \tag{1.52} \]
In reality there is an additional term $V_{FB}$ (called the flatband voltage) in the oxide voltage drop:
\[ V_{FB} = \phi_{GS} - \frac{1}{C_{ox}}\left(Q_{ox} + Q_{SS}\right) \tag{1.53} \]
$\phi_{GS} = \phi_G - \phi_S$ represents the difference in work functions between the gate and substrate materials (material-specific contact voltages which can be taken from tables).
$Q_{ox}$ is the oxide charge density (unwanted positive ions).
$Q_{SS}$ is the surface state density.
Since $Q_{ox}$ and $Q_{SS}$ are positive, $V_{FB}$ may become negative, resulting in a negative threshold voltage.
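To ensure a positive $V_{T0}$, an additional acceptor ion implantation with an ion dose $D_I$ [ions/cm$^2$] is introduced in the MOS process. Final threshold voltage for $V_B = 0$:
\[ V_{T0} = V_{FB} + 2|\phi_F| + \frac{\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F|)}}{C_{ox}} + \frac{qD_I}{C_{ox}} \tag{1.54} \]
The electron charge density in the inversion layer is
\[ Q_I = -C_{ox}\,(V_G - V_{T0}) \tag{1.55} \]
MOS Transistor Threshold Voltage for Nonzero Bulk-Source Voltages
Figure 1.8: Increase in depletion charge from body bias $V_B$
A nonzero bulk voltage reverse biases the pn junction. The depletion charge is
\[ Q_B = -\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F| + V_B)} \tag{1.56} \]
Threshold shift:
\[ \Delta V_T = V_T(V_B) - V_{T0}, \qquad V_{T0} = V_T(V_B = 0) \tag{1.57} \]
\[ \phantom{\Delta V_T} = \frac{\sqrt{2q\varepsilon_{Si}N_a}}{C_{ox}}\left(\sqrt{2|\phi_F| + V_B} - \sqrt{2|\phi_F|}\right) \tag{1.58} \]
\[ V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_F| + V_B} - \sqrt{2|\phi_F|}\right) \tag{1.59} \]
with the body-effect coefficient $\gamma = \sqrt{2q\varepsilon_{Si}N_a}/C_{ox}$.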
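The body-effect relation $V_T = V_{T0} + \gamma(\sqrt{2|\phi_F| + V_B} - \sqrt{2|\phi_F|})$ with $\gamma = \sqrt{2q\varepsilon_{Si}N_a}/C_{ox}$ and $|\phi_F| = (kT/q)\ln(N_a/n_i)$ can be evaluated numerically as in the sketch below. The oxide thickness of 50 nm, the zero-bias threshold of 0.7 V, the doping level and all function names are assumptions made for illustration only.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]
EPS_0 = 8.854e-14             # vacuum permittivity [F/cm]
EPS_SI = 11.8 * EPS_0         # silicon permittivity [F/cm]
EPS_OX = 3.9 * EPS_0          # oxide permittivity [F/cm]


def oxide_capacitance(x_ox_cm):
    """Gate oxide capacitance per unit area C_ox = eps_ox / x_ox [F/cm^2]."""
    return EPS_OX / x_ox_cm


def fermi_potential(n_a, n_i=1.45e10, temp_k=300.0):
    """Magnitude of the bulk Fermi potential |phi_F| = (kT/q) ln(N_a/n_i) [V]."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON * math.log(n_a / n_i)


def body_effect_coefficient(n_a, c_ox):
    """gamma = sqrt(2 q eps_Si N_a) / C_ox in V^0.5."""
    return math.sqrt(2 * Q_ELECTRON * EPS_SI * n_a) / c_ox


def threshold_voltage(v_t0, gamma, phi_f, v_b):
    """V_T for a reverse body bias v_b >= 0 (magnitude, in volts)."""
    return v_t0 + gamma * (math.sqrt(2 * phi_f + v_b) - math.sqrt(2 * phi_f))


# Hypothetical process: N_a = 1e15 cm^-3, x_ox = 50 nm, V_T0 = 0.7 V.
c_ox = oxide_capacitance(50e-7)  # 50 nm expressed in cm
phi_f = fermi_potential(1e15)
gamma = body_effect_coefficient(1e15, c_ox)
for v_b in (0.0, 1.0, 2.0, 5.0):
    v_t = threshold_voltage(0.7, gamma, phi_f, v_b)
    print(f"V_B = {v_b:.1f} V: V_T = {v_t:.3f} V")
```

The square-root dependence means that the first volt of body bias shifts the threshold more than each subsequent volt.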
Figure 1.9: Basic MOSFET channel formation
n-channel MOSFET:
The source electrode (n$^+$ region) is at the lowest potential.
The source potential is the reference potential for all voltages:
\[ V_{DS} = V_D - V_S, \qquad V_{GS} = V_G - V_S, \qquad V_{SB} = (V_S - V_B) \tag{1.61} \]
$V_{SB} > 0$ because $V_B$ must be more negative than $V_S$ to make sure that the pn junction from bulk to source is reverse biased.
Figure 1.10: MOSFET in cutoff mode
Nonsaturation: $V_{GS} \ge V_T$ and $V_{DS} \le (V_{GS} - V_T)$
Saturation: $V_{GS} \ge V_T$ and $V_{DS} \ge (V_{GS} - V_T)$
The Gradual Channel Approximation
- analysis with the gradual channel approximation (GCA) = reduction of the three-dimensional problem to a one-dimensional current flow problem
- the approximation describes large devices very well
- the analysis is first done for $V_S = 0$
- assumption for the derivation of the GCA equations: the depletion charge is supported entirely by the vertical electric field $E_x(y)$ (assume $V_{T0}$ ($Q_{B0}$) independent of $V(y)$)
Figure 1.13: MOSFET geometry used in GCA (MOSFET in linear/nonsaturated region)
The channel electric field $E_y(y)$ established by the drain-source voltage $V_{DS}$ is
\[ E_y(y) = -\frac{dV(y)}{dy} \tag{1.62} \]
with $V(y = 0) = V_S = 0$, $V(y = L) = V_{DS}$. The depletion depth has its maximum at the drain electrode because $V(y)$ has a maximum at $y = L$:
\[ X_{dm}(y) \approx \sqrt{\frac{2\varepsilon_{Si}}{qN_a}\left[2|\phi_F| + V(y)\right]} \tag{1.63} \]
The inversion charge density as a function of the position $y$ is given by
\[ Q_I(y = 0) = -C_{ox}\left[V_{GS} - V_T\right] \tag{1.64} \]
\[ Q_I(y) = -C_{ox}\left[V_{GS} - V_T - V(y)\right] \tag{1.65} \]
The resistance of a differential channel increment $dy$ is
\[ dR = \frac{dy}{\sigma A} = -\frac{dy}{\mu_n W Q_I(y)} \quad [\Omega] \tag{1.66} \]
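Figure 1.14: Geometry for GCA current analysis
with
$A$: channel cross section
$\mu_n$: electron surface mobility
$W$: channel width
$\sigma$: conductivity
Rearranging
\[ dV = I_D\,dR = -\frac{I_D\,dy}{\mu_n W Q_I(y)} \tag{1.67} \]
\[ I_D\int_0^L dy = -\mu_n W\int_0^{V_{DS}} Q_I(V)\,dV \tag{1.68} \]
\[ I_D = \mu_n C_{ox}\frac{W}{L}\int_0^{V_{DS}}\left(V_{GS} - V_T - V\right)dV \tag{1.69} \]
\[ \phantom{I_D} = k'\frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{1}{2}V_{DS}^2\right] \tag{1.70} \]
with the process transconductance parameter $k' = \mu_n C_{ox}$ [A/V$^2$] and the device transconductance parameter $\beta = k'\frac{W}{L}$ [A/V$^2$].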
MOSFET Current Equations
The resulting equation from the GCA for the nonsaturated current in a convenient form is
\[ I_D = \frac{\beta}{2}\left[2(V_{GS} - V_T)V_{DS} - V_{DS}^2\right] \tag{1.71} \]
Figure 1.15: Nonsaturated MOS current
At the onset of saturation the current $I_D$ reaches a peak value and remains constant in the saturation region:
\[ \frac{\partial I_D}{\partial V_{DS}} = 0 = \beta\left(V_{GS} - V_T - V_{DS}\right) \tag{1.72} \]
Evaluating the derivative yields
\[ V_{DS,SAT} = V_{GS} - V_T \tag{1.73} \]
\[ I_{D,SAT} = I_D(V_{DS} = V_{DS,SAT}) = \frac{\beta}{2}(V_{GS} - V_T)^2 \tag{1.74} \]
Figure 1.17: Start of saturation in a MOSFET
Channel length modulation in saturation
The effective channel length in saturation is $L' = L - \Delta L$. From the GCA:
\[ Q_I(L') = 0 \tag{1.75} \]
\[ V(L') = V_{DS,SAT} \tag{1.76} \]
($V_{DS,SAT} = V_{GS} - V_{T0}$: no inversion charge is induced.) $\Delta L$ may be approximated as the depletion region of a one-sided pn junction with a voltage $V_{DS} - V_{DS,SAT}$ across it:
\[ \Delta L \approx \sqrt{\frac{2\varepsilon_{Si}}{qN_a}\left[V_{DS} - V_{DS,SAT}\right]} \tag{1.77} \]
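Figure 1.18: Channel length modulation
The saturated current is modified to
\[ I_D \approx \frac{k'}{2}\frac{W}{L'}(V_{GS} - V_T)^2 = \frac{I_{D0}}{1 - \frac{\Delta L}{L}} \tag{1.78} \]
\[ \phantom{I_D} \approx \frac{\beta}{2}\,\frac{(V_{GS} - V_{T0})^2}{1 - \lambda V_{DS}} \tag{1.79} \]
with $\lambda$ [V$^{-1}$] the channel length modulation factor and assuming that $\Delta L / L$ can be represented by $\lambda V_{DS}$:
\[ I_D = \frac{I_{D0}}{1 - \lambda V_{DS}} = \frac{I_{D0}}{1 - \lambda V_{DS}}\cdot\frac{1 + \lambda V_{DS}}{1 + \lambda V_{DS}} = \frac{I_{D0}\,(1 + \lambda V_{DS})}{1 - (\lambda V_{DS})^2} \tag{1.81} \]
For $(\lambda V_{DS})^2 \ll 1$:
\[ I_D \approx I_{D0}\,(1 + \lambda V_{DS}) = \frac{\beta}{2}(V_{GS} - V_{T0})^2\,(1 + \lambda V_{DS}) \tag{1.82} \]
$\lambda$ has typical values from 0.1 to 0.01 V$^{-1}$ and represents the influence of $V_{DS}$ on $I_D$ in saturation. $\lambda$ is important in small-geometry devices. In the following exercises we will neglect $\lambda$.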
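The quality of the first-order form $I_D \approx I_{D0}(1 + \lambda V_{DS})$ relative to $I_{D0}/(1 - \lambda V_{DS})$ can be checked numerically. The short sketch below compares both expressions; the values of $\lambda$, $V_{DS}$ and $I_{D0}$ and the function names are hypothetical and serve only as an illustration.

```python
def saturation_current_exact(i_d0, lam, v_ds):
    """I_D = I_D0 / (1 - lambda * V_DS); requires lambda * V_DS < 1."""
    if lam * v_ds >= 1.0:
        raise ValueError("lambda * V_DS must be smaller than 1")
    return i_d0 / (1.0 - lam * v_ds)


def saturation_current_first_order(i_d0, lam, v_ds):
    """First-order approximation I_D = I_D0 * (1 + lambda * V_DS)."""
    return i_d0 * (1.0 + lam * v_ds)


# Hypothetical values: I_D0 = 100 uA, V_DS = 5 V, several lambda values.
for lam in (0.01, 0.05, 0.1):
    exact = saturation_current_exact(100e-6, lam, 5.0)
    approx = saturation_current_first_order(100e-6, lam, 5.0)
    rel_err = 1.0 - approx / exact
    print(f"lambda = {lam:.2f} 1/V: relative error = {rel_err:.4f}")
```

The ratio of the two expressions is exactly $1 - (\lambda V_{DS})^2$, so the relative error of the first-order form equals $(\lambda V_{DS})^2$.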
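1.2.5 Biased MOSFET Current Equations
\[ V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_F| + V_{SB}} - \sqrt{2|\phi_F|}\right) \]
\[ I_D = 0 \qquad (V_{GS} < V_T) \]
\[ I_D = \frac{\beta}{2}\left[2(V_{GS} - V_T)V_{DS} - V_{DS}^2\right] \qquad (V_{GS} > V_T,\ V_{DS} < V_{DS,sat}) \]
\[ V_{DS,sat} = V_{GS} - V_T \]
\[ I_D = \frac{\beta}{2}(V_{GS} - V_T)^2\,(1 + \lambda V_{DS}) \qquad (V_{GS} > V_T,\ V_{DS} \ge V_{DS,sat}) \]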
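The three operating regions summarized above (cutoff, nonsaturation, saturation) translate directly into a piecewise function. The following Python sketch is one possible formulation; the function name and the numerical values of $\beta$ and $V_T$ are assumptions for illustration.

```python
def drain_current(v_gs, v_ds, beta, v_t, lam=0.0):
    """Drain current of an n-channel MOSFET from the square-law equations.

    beta: device transconductance parameter [A/V^2]
    v_t:  threshold voltage [V] (already including any body effect)
    lam:  channel length modulation factor [1/V], applied in saturation only
    """
    if v_gs <= v_t:
        return 0.0  # cutoff (the current is also zero exactly at V_GS = V_T)
    v_ds_sat = v_gs - v_t
    if v_ds < v_ds_sat:
        # nonsaturation region
        return beta / 2 * (2 * (v_gs - v_t) * v_ds - v_ds**2)
    # saturation region
    return beta / 2 * (v_gs - v_t) ** 2 * (1 + lam * v_ds)


# Hypothetical device: beta = 100 uA/V^2, V_T = 1 V, V_GS = 3 V.
for v_ds in (0.5, 1.0, 2.0, 5.0):
    i_d = drain_current(3.0, v_ds, beta=1e-4, v_t=1.0)
    print(f"V_DS = {v_ds:.1f} V: I_D = {i_d * 1e6:.1f} uA")
```

With `lam=0.0` the two branches meet continuously at $V_{DS} = V_{DS,sat}$; a nonzero `lam` introduces a small step at that boundary, which is a known property of this simple model.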
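1.2.6 Measurement of device parameters
Figure 1.22: Device parameter measurement
Get
(1) $V_{T0}$ from the intercept
(2) $k = k'\frac{W}{L}$ from the slope: $\sqrt{k} = \dfrac{\sqrt{2I_D}}{V_{GS} - V_T}$
(3) $\gamma = \dfrac{V_T(V_{SB}) - V_{T0}}{\sqrt{2|\phi_F| + V_{SB}} - \sqrt{2|\phi_F|}}$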
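1.2.7 The Complete MOSFET GCA Analysis
- includes the additional depletion charge created by the channel voltage $V(y)$, which is a reverse bias across the n$^+$p junction at the channel-substrate boundary
- assume $V_S = 0 = V_B$
- calculation for the nonsaturated MOSFET
\[ V_{T0}(V) = V_{FB} + 2|\phi_F| + \frac{1}{C_{ox}}\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F| + V)} + \frac{qD_I}{C_{ox}} \tag{1.88} \]
The basic GCA integral
\[ I_D = \beta\int_0^{V_{DS}}\left[V_{GS} - V_{T0}(V) - V\right]dV \tag{1.89} \]
becomes
\[ I_D = \beta\int_0^{V_{DS}}\left[V_{GS} - V_{FB} - 2|\phi_F| - \frac{qD_I}{C_{ox}} - V - \frac{1}{C_{ox}}\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F| + V)}\right]dV \]
which gives for the nonsaturated drain current
\[ I_D = \beta\left\{\left(V_{GS} - V_{FB} - 2|\phi_F| - \frac{qD_I}{C_{ox}}\right)V_{DS} - \frac{1}{2}V_{DS}^2 - \frac{2}{3C_{ox}}\sqrt{2q\varepsilon_{Si}N_a}\left[(2|\phi_F| + V_{DS})^{3/2} - (2|\phi_F|)^{3/2}\right]\right\} \]
Introduction of a reduction factor $M < 1$ modifies the nonsaturated current equation to
\[ I_D = M\,\frac{\beta}{2}\left[2(V_{GS} - V_{T0})V_{DS} - V_{DS}^2\right]. \tag{1.92} \]
The saturated current is then given by
\[ I_{D,sat} = M\,\frac{\beta}{2}(V_{GS} - V_{T0})^2 \tag{1.93} \]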
Figure 1.24: Comparison of circuit equations with the complete GCA model
Figure 1.25: Comparison of modified circuit equations with the complete GCA model
1.2.8 Depletion mode n-channel MOSFET
With the donor implant dose $D_I$, $V_T$ is modified to
\[ V_T = V_{FB} + 2|\phi_F| + \frac{1}{C_{ox}}\sqrt{2q\varepsilon_{Si}N_a\,(2|\phi_F| + V_{SB})} - \frac{qD_I}{C_{ox}} \tag{1.94} \]
so that $V_T$ of a depletion MOSFET is negative. The n-type layer resulting from donor doping is modeled by
\[ (N_d - N_a) > 0. \tag{1.95} \]
Figure 1.26: Depletion-mode MOSFET
The current $I_D$ can be modeled by
\[ I_D = -\mu_n\frac{W}{L}\int_0^{V_{DS}} Q_C(V)\,dV \tag{1.96} \]
with $Q_C(V)$ the channel charge density
\[ Q_C(V) = Q_n + Q_S(V) + Q_j(V) \tag{1.97} \]
VLSI
Design Course
1-25
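Figure 1.27: Simplified depletion-mode MOSFET model

- $Q_n$: total charge density of electrons in the n-type layer
- $Q_S$: MOS surface charge density ($V_{FB}$ gives the voltage necessary to create a charge-neutral flatband state at the surface of the semiconductor)
- $Q_j$: amount of depletion charge on the n-side of the pn junction between n-type layer and substrate

$Q_n = q(N_d - N_a)a$

$Q_S(V) = C_{ox}\left[V_{GS} - V_{FB} - V\right]$

$Q_j(V) = -\sqrt{2q\varepsilon_{Si}N(\phi_0 + V)}$

with $N = \ldots$ and $\phi_0 = \frac{kT}{q}\ln(\ldots)$.

$I_D = \mu_n\frac{W}{L}\int_0^{V_{DS}}\left[q(N_d-N_a)a + C_{ox}(V_{GS}-V_{FB}-V) - \sqrt{2q\varepsilon_{Si}N(\phi_0 + V)}\right]dV$

$I_D = \beta\left\{\frac{q(N_d-N_a)a}{C_{ox}}V_{DS} + (V_{GS}-V_{FB})V_{DS} - \frac{1}{2}V_{DS}^2 - \frac{2}{3C_{ox}}\sqrt{2q\varepsilon_{Si}N}\left[(\phi_0 + V_{DS})^{3/2} - (\phi_0)^{3/2}\right]\right\}$.   (1.103)

This equation is too complicated for hand calculations, so usually the D-mode MOSFET is described by

$I_D = \frac{\beta}{2}\left[2(V_{GS}-V_{T0})V_{DS} - V_{DS}^2\right]$,   (1.104)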
1.2.9 p-channel MOSFET
+ Qox )
1 Cox QBn
>0
VBSp + 2Fn
with $\gamma_p = \frac{\sqrt{2qN_d\varepsilon_{Si}}}{C_{ox}}$

$V_{Tp}$ is negative for an enhancement-mode p-channel MOSFET. The current equations are similar to those of the n-channel MOSFET, but all signs are opposite.
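1.2.10 Conclusions

n-channel transistor
- Bulk Fermi potential: $\phi_{Fp} = \frac{kT}{q}\ln\frac{n_i}{N_a} < 0$
- Threshold voltage: positive; $V_{Tn} = V_{T0n} + \gamma_n\left(\sqrt{|2\phi_{Fp}| - V_{BS}} - \sqrt{|2\phi_{Fp}|}\right)$ with $\gamma_n = \sqrt{2qN_a\varepsilon_{Si}}/C_{ox}$
- Current, cutoff: $V_{GS} < V_{Tn}$: $I_D = 0$
- Current, nonsaturation: $V_{GS} > V_{Tn}$ and $V_{DS} < (V_{GS} - V_{Tn})$: $I_D = \frac{\beta_n}{2}\left[2(V_{GS}-V_{Tn})V_{DS} - V_{DS}^2\right]$
- Current, saturation: $V_{GS} > V_{Tn}$ and $V_{DS} \ge (V_{GS} - V_{Tn})$: $I_D = \frac{\beta_n}{2}(V_{GS}-V_{Tn})^2$

p-channel transistor
- Bulk Fermi potential: $\phi_{Fn} = \frac{kT}{q}\ln\frac{N_d}{n_i} > 0$
- Threshold voltage: negative; $V_{Tp} = V_{T0p} - \gamma_p\left(\sqrt{V_{BSp} + 2\phi_{Fn}} - \sqrt{2\phi_{Fn}}\right)$
- Current, cutoff: $|V_{GSp}| < |V_{Tp}|$: $I_{Dp} = 0$
- Current, nonsaturation: $|V_{GSp}| > |V_{Tp}|$ and $|V_{DSp}| < |V_{GSp} - V_{Tp}|$: $I_{Dp} = \frac{\beta_p}{2}\left[2(V_{SGp} + V_{Tp})V_{SDp} - V_{SDp}^2\right]$
- Current, saturation: $|V_{GSp}| > |V_{Tp}|$ and $|V_{DSp}| \ge |V_{GSp} - V_{Tp}|$: $I_{Dp} = \frac{\beta_p}{2}(V_{SGp} + V_{Tp})^2$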
1.2.11 Modelling the MOS Transistor for Circuit Simulation
F/m
tox NA or ND QSS /q
Xj LD
XJ LD UO
m m cm2 /Vs
The voltage transfer curve for DC voltages is defined as $V_{out}(V_{in})$. The DC equation for the load current $I_L$ is

$I_L = I_D(V_{in}, V_{out})$   (1.107)

and for the output voltage we get

$V_{out} = V_{DD} - V_L(I_L)$.   (1.108)

If $V_{in}$ is increased from 0 ($V_{out} = V_{DD}$ initially) to values greater than $V_T$, then $V_{DS} = V_{out} > (V_{GS} - V_T)$ and the driver changes from cutoff mode to saturation:

$\frac{\beta_D}{2}(V_{in}-V_T)^2(1+\lambda V_{out}) = I_L(V_L) = I_L(V_{DD} - V_{out})$   (1.109)

If $V_{in}$ is increased further and the point is reached where $V_{out} < (V_{GS} - V_T)$, the driver is in ohmic mode:

$\frac{\beta_D}{2}\left[2(V_{in}-V_T)V_{out} - V_{out}^2\right] = I_L(V_L) = I_L(V_{DD} - V_{out})$   (1.110)
Characteristic points of the voltage transfer curve:
- $V_{OL}$: output low voltage of the inverter
- $V_{OH}$: output high voltage of the inverter
- $V_{IL}$: input low voltage of the inverter
- $V_{IH}$: input high voltage of the inverter

$V_{IL}$ and $V_{IH}$ are taken at the points where $\frac{dV_{out}}{dV_{in}} = -1$.
Figure 1.34: Definition: Noise margins

Noise margins:
$NM_H = V_{OH} - V_{IH}$
$NM_L = V_{IL} - V_{OL}$

Input voltage ranges for
- Logic 1: $V_{IH}$ to $V_{DD}$
- Logic 0: 0 to $V_{IL}$

Output voltage ranges for
- Logic 1: $V_{OH}$ to $V_{DD}$
- Logic 0: 0 to $V_{OL}$
Figure 1.37: Simplified AC circuit model for noise margins

Inverter Transient Response

Current equation for a change of $V_{in}$ from $V_{OL}$ to $V_{OH}$:

$I_D = -C_{out}\frac{dV_{out}}{dt} + I_L$   (1.111)

and for a change of $V_{in}$ from $V_{OH}$ to $V_{OL}$:

$I_L(V_{out}) = C_{out}\frac{dV_{out}}{dt}$   (1.112)
1.3.2 Inverter with Linear Resistor Load

The description of the load is given by

$V_{out} = V_{DD} - I_L R_L$   (1.113)

$I_L = \frac{V_{DD} - V_{out}}{R_L}$   (1.114)

To obtain the VTC, the load current must be set equal to the driver current ($I_L = I_D$), assuming a slow change of $V_{in}$. When the driver is in cutoff ($V_{in} < V_T \Rightarrow I_D = 0$), there is zero voltage drop across $R_L$ and $V_{out} = V_{DS} = V_{OH}$. When $V_{in}$ is increased, the driver starts conducting in saturation mode, because the output voltage is initially high, so $V_{out} = V_{DS} > (V_{GS} - V_T)$. In this case, the VTC equation is

$V_{out} = V_{DD} - \frac{\beta_D R_L}{2}(V_{in} - V_T)^2$.   (1.115)

When $V_{in}$ is increased further, $V_{out} = V_{DS}$ drops to the value $(V_{in} - V_T)$ and the driver changes to ohmic mode, where the VTC equation is given by

$V_{out} = V_{DD} - \frac{\beta_D R_L}{2}\left[2(V_{in}-V_T)V_{out} - V_{out}^2\right]$.   (1.116)
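Figure 1.39: Physical reason for transition times

Calculation of $V_{OH}$

$V_{OH} = V_{DD}$, because $I_D = 0$ when the driver is in cutoff.

Calculation of $V_{OL}$

$V_{in} = V_{OH}$ and the driver is nonsaturated, because $V_{out} < V_{in} - V_T$.

$\frac{\beta_D}{2}\left[2(V_{OH}-V_T)V_{OL} - V_{OL}^2\right] = \frac{V_{DD} - V_{OL}}{R_L}$   (1.117)

$V_{OL}^2 - 2\left(V_{OH} - V_T + \frac{1}{\beta_D R_L}\right)V_{OL} + \frac{2V_{DD}}{\beta_D R_L} = 0$   (1.118)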
Figure 1.40: Inverter with linear resistor load

Calculation of $V_{IL}$

For $V_{in} = V_{IL}$ the driver transistor is saturated, because $V_{out}$ is slightly below $V_{OH}$. From $I_D = I_L$ follows:

$\frac{\beta_D}{2}(V_{in} - V_T)^2 = \frac{V_{DD} - V_{out}}{R_L}$   (1.119)

$V_{IL}$ is defined as the point where

$\frac{dV_{out}}{dV_{in}} = -1$   (1.120)

Differentials of both sides of $I_D(V_{in}) = I_L(V_{out})$:

$\frac{dI_D}{dV_{in}}dV_{in} = \frac{dI_L}{dV_{out}}dV_{out}$   (1.121)

$\frac{dV_{out}}{dV_{in}} = \frac{dI_D/dV_{in}}{dI_L/dV_{out}} = \frac{\beta_D(V_{in} - V_T)}{-1/R_L}$   (1.122)

$= -\beta_D R_L(V_{in} - V_T) = -1$   (1.123)

Figure 1.41: VTC for linear resistor load nMOS inverter

With $V_{in} = V_{IL}$:

$V_{IL} = V_T + \frac{1}{\beta_D R_L}$   (1.124)
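Replacing $V_{in}$ in equation 1.119 by the preceding expression yields for $V_{out}$:

$V_{out}(V_{IL}) = V_{DD} - \frac{1}{2\beta_D R_L}$   (1.125)

Calculation of $V_{IH}$

For $V_{in} = V_{IH}$, $V_{out} < (V_{GS} - V_T)$, so the driver is in the ohmic (nonsaturated) mode. Equating $I_D$ and $I_L$ gives

$\frac{\beta_D}{2}\left[2(V_{IH}-V_T)V_{out} - V_{out}^2\right] = \frac{1}{R_L}(V_{DD} - V_{out})$   (1.126)

Evaluation of the condition $(dV_{out}/dV_{in}) = -1$ for $I_D(V_{in}, V_{out}) = I_L(V_{out})$ gives

$\frac{\partial I_D}{\partial V_{in}}dV_{in} + \frac{\partial I_D}{\partial V_{out}}dV_{out} = \frac{dI_L}{dV_{out}}dV_{out}$.   (1.127)

Rearranging,

$\frac{dV_{out}}{dV_{in}} = \frac{\partial I_D/\partial V_{in}}{\dfrac{dI_L}{dV_{out}} - \dfrac{\partial I_D}{\partial V_{out}}}$   (1.128)

$= \frac{\beta_D V_{out}}{-\dfrac{1}{R_L} - \beta_D(V_{in} - V_T - V_{out})} = -1$;   (1.129)

rearranging and setting $V_{in} = V_{IH}$ yields

$V_{out} = \frac{1}{2}(V_{IH} - V_T) + \frac{1}{2\beta_D R_L}$.   (1.130)

Substitution of this expression for $V_{out}$ in equation 1.126 gives

$(V_{IH}-V_T)^2 + \frac{2}{\beta_D R_L}(V_{IH}-V_T) - \frac{8V_{DD}}{3\beta_D R_L} + \frac{1}{\beta_D^2 R_L^2} = 0$   (1.131)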
$V_{IH}$ can be computed by solving this quadratic equation and selecting the proper physical root.

Calculation of $V_{th}$

The inverter threshold voltage is defined as the VTC point where $V_{in} = V_{out}$. The current equation can be written as (with $V_{th} = V_{in} = V_{out}$):

$\frac{\beta_D}{2}(V_{th} - V_T)^2 = \frac{V_{DD} - V_{th}}{R_L}$   (1.132)

Rearranging and solving the equation

$V_{th}^2 - 2\left(V_T - \frac{1}{\beta_D R_L}\right)V_{th} + V_T^2 - \frac{2V_{DD}}{\beta_D R_L} = 0$   (1.133)

yields $V_{th}$.
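The critical voltages of the resistor-load inverter follow from the quadratics above and can be evaluated numerically. The sketch below is an illustration only: the function name and the parameter values (5 V supply, 1 V threshold, 50 µA/V² driver, 100 kΩ load) are assumptions, and the closed-form roots are obtained here by applying the quadratic formula to the equations of this section and picking the physically meaningful root.

```python
import math

def resistor_load_vtc_points(vdd, vt, beta_d, r_l):
    """Critical VTC voltages of an nMOS inverter with linear resistor load."""
    a = 1.0 / (beta_d * r_l)                 # recurring term 1/(beta_D * R_L)
    v_oh = vdd                               # driver in cutoff, no drop across R_L
    # V_OL: smaller root of V^2 - 2(V_OH - V_T + a)V + 2*a*V_DD = 0
    b = v_oh - vt + a
    v_ol = b - math.sqrt(b * b - 2.0 * a * vdd)
    # V_IL: slope -1 with the driver saturated
    v_il = vt + a
    # V_IH: positive root of x^2 + 2*a*x + a^2 - (8/3)*a*V_DD = 0, x = V_IH - V_T
    v_ih = vt - a + math.sqrt(8.0 * a * vdd / 3.0)
    # V_th: larger root of V^2 - 2(V_T - a)V + V_T^2 - 2*a*V_DD = 0
    c = vt - a
    v_th = c + math.sqrt(c * c - vt * vt + 2.0 * a * vdd)
    return {"VOH": v_oh, "VOL": v_ol, "VIL": v_il, "VIH": v_ih, "Vth": v_th,
            "NMH": v_oh - v_ih, "NML": v_il - v_ol}

# Hypothetical example values (assumptions, not taken from the course)
points = resistor_load_vtc_points(vdd=5.0, vt=1.0, beta_d=50e-6, r_l=100e3)
```

The two noise margins in the returned dictionary use the definitions $NM_H = V_{OH} - V_{IH}$ and $NM_L = V_{IL} - V_{OL}$.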
1.3.3
In this approach $V_{OH}$ and $V_{OL}$ are of primary and $V_{IH}$ and $V_{IL}$ of secondary importance. The inverter is modeled as a series resistive voltage divider.

Figure 1.42: $V_{OH}$ resistor model

$V_{OH} = \frac{R_{off}}{R_{off} + R_L}V_{DD}$   (1.134)
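Figure 1.43: $V_{OL}$ resistor model

For $R_L \ll R_{off}$, $V_{OH} \approx V_{DD}$. Current equation for $V_{OL}$ (assuming $V_{in} = V_{OH}$):

$\frac{\beta_D}{2}\left[2(V_{OH}-V_T)V_{OL} - V_{OL}^2\right] = \frac{V_{DD} - V_{OL}}{R_L}$   (1.135)

Rearrangement yields

$R_L\left(\frac{W}{L}\right)_D = \frac{2(V_{DD} - V_{OL})}{k'\left[2(V_{OH}-V_T)V_{OL} - V_{OL}^2\right]}$   (1.136)

with $\beta_D = k'(W/L)_D$. This equation describes the product $R_L(W/L)$ needed for a given output low voltage $V_{OL}$. The driver on-resistance can be written as follows:

$R_{on} = \frac{V_{OL}}{I_D} = \frac{1}{k'\left(\frac{W}{L}\right)_D\left[(V_{OH}-V_T) - \frac{1}{2}V_{OL}\right]}$   (1.137)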
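Figure 1.44: Saturated enhancement load nMOS inverter

With $V_{GSL} = V_{DSL} \Rightarrow V_{DSL} > (V_{GSL} - V_{TL})$ the load is automatically saturated and the current is given by

$I_L = \frac{k'}{2}\left(\frac{W}{L}\right)_L(V_{GSL} - V_{TL})^2$   (1.138)

Since $V_{GSL} = (V_{DD} - V_{out})$ and $V_{out} = V_{DSD}$,

$I_D = I_L = \frac{k'}{2}\left(\frac{W}{L}\right)_L\left[V_{DD} - V_{DSD} - V_{TL}(V_{DSD})\right]^2$   (1.139)

$V_{SBL} = V_{out}$, so

$V_{TL} = V_{T0L} + \gamma\left(\sqrt{V_{out} + 2|\phi_F|} - \sqrt{2|\phi_F|}\right)$   (1.140)

The driver is in cutoff for $V_{in} < V_{TD} \Rightarrow V_{out} = V_{OH}$. As $V_{in}$ increases above $V_{TD}$ the driver is saturated, so

$\frac{\beta_D}{2}(V_{in} - V_{TD})^2 = \frac{\beta_L}{2}\left[V_{DD} - V_{out} - V_{TL}(V_{out})\right]^2$   (1.141)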
Figure 1.45: VTC for saturated enhancement load nMOS inverter

When $V_{in}$ is increased further and the condition $V_{out} < (V_{in} - V_{TD})$ becomes true, the current is

$\frac{\beta_D}{2}\left[2(V_{in}-V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[V_{DD} - V_{out} - V_{TL}(V_{out})\right]^2$.   (1.142)
1.3.5
The condition for the load being in the nonsaturated region is

$V_{DSL} < V_{GSL} - V_{TL}(V_{BSL})$   (1.143)

$\Rightarrow V_{GG} > V_{DD} + V_{TL}(V_{DD})$   (1.144)

This extra bias ensures that $V_{OH} = V_{DD}$. Writing $V_{DSL} = (V_{DD} - V_{out})$ and $V_{GSL} = (V_{GG} - V_{out})$, the nonsaturated load current is given by

$I_L = \frac{\beta_L}{2}\left[2(V_{GG} - V_{out} - V_{TL})(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$.   (1.145)

The load line is obtained from this equation by setting $I_D = I_L$, $V_{out} = V_{DSD}$ and rearranging:

$I_D = \frac{\beta_L}{2}(2V_{GG} - V_{DD} - 2V_{TL} - V_{DSD})(V_{DD} - V_{DSD})$.   (1.146)
1.3.6
The ideal load line in fig. 1.49 is for the case that the body bias effects of the load transistor are ignored. Because $V_{GSL} = 0 > V_{TL}$ is always satisfied, there always exists a conducting channel in the depletion load.

$V_{DSL,sat} = (V_{GSL} - V_{TL}) = |V_{TL}|$   (1.147)

Border between saturated and nonsaturated load region:

$V_{DD} - V_{out} = |V_{TL}|$.   (1.148)

$V_{TL}(V_{out}) = V_{T0L} + \gamma_L\left(\sqrt{V_{out} + 2|\phi_{F,L}|} - \sqrt{2|\phi_{F,L}|}\right)$   (1.149)

Condition for the load being in saturation: $V_{out}$ small $\Rightarrow (V_{DD} - V_{out}) > |V_{TL}(V_{out})|$

$I_L = \frac{\beta_L}{2}\left[V_{GSL} - V_{TL}(V_{out})\right]^2 = \frac{\beta_L}{2}\left[V_{TL}(V_{out})\right]^2$   (1.150)
Figure 1.48: Symbol for depletion mode MOSFET

Condition for the load being in nonsaturation: $(V_{DD} - V_{out}) < |V_{TL}(V_{out})|$

$I_L = \frac{\beta_L}{2}\left[2|V_{TL}(V_{out})|(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$   (1.151)

For the following discussion it is assumed that

$V_{TD} < |V_{TL}| < V_{DD}$   (1.152)

When $V_{in} < V_{TD}$ the driver is in cutoff and the load provides a conduction path between $V_{DD}$ and $V_{out}$, so $V_{out} \approx V_{OH} \approx V_{DD}$. When $V_{in}$ is increased above $V_{TD}$ the driver enters the saturation region while the load remains ohmic ($V_{DD} - V_{out} < |V_{TL}|$):

$\frac{\beta_D}{2}(V_{in} - V_{TD})^2 = \frac{\beta_L}{2}\left[2|V_{TL}(V_{out})|(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$   (1.153)
Figure 1.49: Depletion mode MOSFET load

When $V_{in}$ is increased further, either the driver or the load changes its operating region. If

$V_{out} < V_{DD} - |V_{TL}|$   (1.154)

is satisfied first, the load changes to saturation while the driver remains in saturation; otherwise

$V_{out} < V_{in} - V_{TD}$   (1.155)

is satisfied first and the driver becomes nonsaturated while the load is still nonsaturated. When $V_{in}$ is increased further to a voltage slightly below $V_{DD}$, the driver is nonsaturated and the load is in the saturation region:

$\frac{\beta_D}{2}\left[2(V_{in}-V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[V_{TL}(V_{out})\right]^2$   (1.156)
Figure 1.50: VTC for inverter with depletion mode MOSFET load

Taking into consideration the resistance of the load device:

$V_{OH} = V_{DD} - V_{DSL}\big|_{V_{in}=0}$   (1.158)

The current $I_L$ is in this case the driver leakage current. The conductance of the nonsaturated load is:

$G_{DSL} = \frac{I_L}{V_{DSL}} = \frac{\beta_L}{2}\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right]$   (1.159)

With $V_{DSL} = I_L/G_{DSL}$ results:

$V_{OH} = V_{DD} - \frac{I_L}{\frac{k'_L}{2}\left(\frac{W}{L}\right)_L\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right]}$   (1.160)
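Calculation of $V_{OL}$

$\frac{\beta_D}{2}\left[2(V_{in}-V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[V_{TL}(V_{out})\right]^2$   (1.161)

with $V_{in} = V_{OH}$ and $V_{out} = V_{OL}$ yields

$\beta_R\left[2(V_{OH}-V_{TD})V_{OL} - V_{OL}^2\right] = |V_{TL}(V_{OL})|^2$   (1.162)

where $\beta_R = \beta_D/\beta_L$. Rearranging

$V_{OL}^2 - 2(V_{OH}-V_{TD})V_{OL} + \frac{1}{\beta_R}|V_{TL}(V_{OL})|^2 = 0$   (1.163)

and solving this quadratic equation (body bias is ignored at this step) yields

$V_{OL} = (V_{OH}-V_{TD}) - \sqrt{(V_{OH}-V_{TD})^2 - \frac{1}{\beta_R}|V_{TL}(V_{OL})|^2}$.   (1.164)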
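Design of Depletion Mode Inverters

The output voltages $V_{OL}$ and $V_{OH}$ are tuned to predefined values by adjusting the $(W/L)$ ratios. For $V_{OH}$ the following equation has been given before:

$V_{OH} = V_{DD} - \frac{I_L}{\frac{k'_L}{2}\left(\frac{W}{L}\right)_L\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right]}$   (1.165)

where $I_L$ is constrained by the driver leakage current. The difference $V_{DD} - V_{OH}$ may be decreased by
- increasing $(W/L)_L$ by the designer (more chip area required)
- adjusting a proper process parameter $V_{T0L}$

$V_{TL}(V_{OH}) = V_{T0L} + \gamma_L\left(\sqrt{V_{OH} + 2|\phi_{F,L}|} - \sqrt{2|\phi_{F,L}|}\right)$   (1.166)

Setting $V_{OL}$: rearranging the current equation for $V_{OL}$ gives

$\beta_R = \frac{|V_{TL}(V_{OL})|^2}{2(V_{OH}-V_{TD})V_{OL} - V_{OL}^2}$   (1.167)

where the driver-load ratio is

$\beta_R = \frac{\beta_D}{\beta_L} = \frac{k'_D\left(\frac{W}{L}\right)_D}{k'_L\left(\frac{W}{L}\right)_L}$   (1.168)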
If the design problem is described by a simplified resistive network, the driver on-resistance is

$R_{on} = \frac{1}{G_{on}} = \frac{1}{\frac{\beta_D}{2}\left[2(V_{OH}-V_{TD}) - V_{OL}\right]}$   (1.169)

With the load in saturation, the drain-source resistance of the depletion-mode MOSFET is

$R_{DSL} = \frac{1}{G_{DSL}} = \frac{V_{DD} - V_{OL}}{\frac{k'_L}{2}\left(\frac{W}{L}\right)_L|V_{TL}(V_{OL})|^2}$   (1.170)

The equation

$V_{OL} = \frac{R_{on}}{R_{on} + R_{DSL}}V_{DD}$   (1.171)

implies that the value of $V_{OL}$ is lowered by increasing $\beta_R$. The transistor conductances are proportional to their $(W/L)$ ratios.
1.3.7 CMOS inverter

Disadvantages of CMOS:
- processing is more complex than for NMOS: extra processing steps must be added to create n-tub areas for p-transistor realizations (including an extra step for adjusting the threshold voltage of the p-channel device)
- additional processing steps for latch-up prevention: guard rings prevent unwanted forward-biased pn junctions
- CMOS realizations of circuits generally require more transistors than equivalent NMOS designs

Advantages of CMOS:
- CMOS circuits dissipate power only during switching events. When the inputs are stable, only leakage currents are drawn from the power supply. (NMOS: current flows when the driver is on.)
- $V_{OH} = V_{DD}$ and $V_{OL} = 0$ V
- the voltage transfer curve of a CMOS inverter exhibits a sharp transition
CMOS Inverter Characteristics

$V_{in} = V_{GSn} = V_{DD} + V_{GSp}$   (1.172)

$V_{out} = V_{DSn} = V_{DD} + V_{DSp}$   (1.173)

For $V_{in} < V_{Tn} \Rightarrow V_{out} = V_{DD}$ the nMOS transistor is in cutoff while the pMOS transistor is in nonsaturation ($|V_{DSp}| = |V_{out} - V_{DD}| < |V_{GSp} - V_{Tp}| = |V_{in} - V_{DD} - V_{Tp}|$). When $V_{in}$ is increased to values above $V_{Tn}$, the nMOS transistor starts conducting in saturation mode while the pMOS transistor is still in the ohmic region:

$\frac{\beta_n}{2}(V_{in} - V_{Tn})^2 = \frac{\beta_p}{2}\left[2(V_{DD} - V_{in} - |V_{Tp}|)(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$   (1.174)
As $V_{in}$ is increased further, $V_{out}$ decreases. When the point is reached where $(V_{DD} - V_{out}) > (V_{DD} - V_{in} - |V_{Tp}|)$, both transistors are in saturation:

$\frac{\beta_n}{2}(V_{in} - V_{Tn})^2 = \frac{\beta_p}{2}(V_{DD} - V_{in} - |V_{Tp}|)^2$   (1.175)

When $V_{out}$ falls to a level where $V_{out} < (V_{in} - V_{Tn})$, the nMOS transistor becomes nonsaturated:

$\frac{\beta_n}{2}\left[2(V_{in}-V_{Tn})V_{out} - V_{out}^2\right] = \frac{\beta_p}{2}(V_{DD} - V_{in} - |V_{Tp}|)^2$.   (1.176)

When the point is reached where $(V_{DD} - V_{in}) < |V_{Tp}|$, the pMOS transistor goes into cutoff ($\Rightarrow I_{Dn} = I_{Dp} = 0$, $V_{out} = 0$).   (1.177)

Calculation of $V_{OH}$

$V_{OH} \approx V_{DD}$ when $V_{in} < V_{Tn}$ (n-channel transistor in cutoff, current is leakage current only)   (1.178)

Calculation of $V_{OL}$

$V_{OL} \approx 0$ when $(V_{DD} - V_{in}) < |V_{Tp}|$ (p-channel transistor in cutoff)   (1.179)

Calculation of $V_{IL}$

Equating currents for the saturated nMOS and the nonsaturated pMOS device:

$\frac{\beta_n}{2}(V_{IL} - V_{Tn})^2 = \frac{\beta_p}{2}\left[2(V_{DD} - V_{IL} - |V_{Tp}|)(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$   (1.180)

Evaluation of the condition $(dV_{out}/dV_{in}) = -1$ for $I_{Dn}(V_{in}) = I_{Dp}(V_{in}, V_{out})$:

$\frac{dV_{out}}{dV_{in}} = \frac{(dI_{Dn}/dV_{in}) - (\partial I_{Dp}/\partial V_{in})}{\partial I_{Dp}/\partial V_{out}} = -1$   (1.181)

Evaluating the derivatives gives

$V_{IL}\left(1 + \frac{\beta_n}{\beta_p}\right) = 2V_{out} + \frac{\beta_n}{\beta_p}V_{Tn} - V_{DD} - |V_{Tp}|$   (1.182)
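Calculation of $V_{IH}$

At this point of the VTC the nMOS device is nonsaturated and the pMOS transistor is saturated.

$\frac{\beta_n}{2}\left[2(V_{IH}-V_{Tn})V_{out} - V_{out}^2\right] = \frac{\beta_p}{2}(V_{DD} - V_{IH} - |V_{Tp}|)^2$.   (1.183)

The derivative condition $(dV_{out}/dV_{in}) = -1$ has to be evaluated for $I_{Dn}(V_{in}, V_{out}) = I_{Dp}(V_{in})$:

$\frac{dV_{out}}{dV_{in}} = \frac{(dI_{Dp}/dV_{in}) - (\partial I_{Dn}/\partial V_{in})}{\partial I_{Dn}/\partial V_{out}} = -1$   (1.184)

which gives

$V_{IH}\left(1 + \frac{\beta_p}{\beta_n}\right) = 2V_{out} + V_{Tn} + \frac{\beta_p}{\beta_n}(V_{DD} - |V_{Tp}|)$   (1.185)

This equation, together with equation 1.183, forms a quadratic in $V_{IH}$ which has to be solved.

Calculation of $V_{th}$

For $V_{th} = V_{in} = V_{out}$ both transistors are saturated.

$\frac{\beta_n}{2}(V_{th} - V_{Tn})^2 = \frac{\beta_p}{2}(V_{DD} - V_{th} - |V_{Tp}|)^2$   (1.186)

Solving for $V_{th}$ yields:

$V_{th} = \frac{V_{Tn} + \sqrt{\beta_p/\beta_n}\,(V_{DD} - |V_{Tp}|)}{1 + \sqrt{\beta_p/\beta_n}}$   (1.187)

Design

While in nMOS design a lot of effort has to be spent on optimizing the levels of $V_{OH}$ and $V_{OL}$, the ratio $(W/L)$ in CMOS design is used to set the level of $V_{th}$ ($V_{OH} = V_{DD}$, $V_{OL} = 0$).

$\frac{\beta_p}{\beta_n} = \frac{k'_p\left(\frac{W}{L}\right)_p}{k'_n\left(\frac{W}{L}\right)_n}$   (1.188)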
The ratio required to establish a given inverter threshold voltage is

$\sqrt{\frac{\beta_n}{\beta_p}} = \frac{V_{DD} - V_{th} - |V_{Tp}|}{V_{th} - V_{Tn}}$.   (1.189)

To get a symmetrical VTC, $V_{th}$ is set to $V_{DD}/2$:

$\sqrt{\frac{\beta_n}{\beta_p}} = \frac{\frac{1}{2}V_{DD} - |V_{Tp}|}{\frac{1}{2}V_{DD} - V_{Tn}}$   (1.190)

If $|V_{Tp}| = V_{Tn}$ is set in a process, then

$\beta_n = \beta_p$.   (1.191)

Since $\mu_n/\mu_p \approx 2.5$, a minimum-area CMOS inverter will have $(W/L)_n \approx 1$ and $(W/L)_p \approx 2.5$. In this case the VTC is completely symmetric.
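The threshold-voltage formula and the design ratio can be checked against each other numerically. The following is a minimal sketch; the function names and the example voltages are assumptions made for illustration.

```python
import math

def cmos_inverter_threshold(vdd, vtn, vtp_abs, beta_n, beta_p):
    """Inverter threshold V_th with both transistors saturated (V_in = V_out)."""
    r = math.sqrt(beta_p / beta_n)
    return (vtn + r * (vdd - vtp_abs)) / (1.0 + r)

def beta_ratio_for_threshold(vdd, vth, vtn, vtp_abs):
    """Ratio beta_n / beta_p that places the inverter threshold at vth."""
    return ((vdd - vth - vtp_abs) / (vth - vtn)) ** 2

# Hypothetical example values: 5 V supply, |V_Tp| = V_Tn = 1 V
vth_symmetric = cmos_inverter_threshold(5.0, 1.0, 1.0, beta_n=40e-6, beta_p=40e-6)
vth_strong_n = cmos_inverter_threshold(5.0, 1.0, 1.0, beta_n=160e-6, beta_p=40e-6)
ratio_back = beta_ratio_for_threshold(5.0, vth_strong_n, 1.0, 1.0)
```

With equal transconductance parameters and equal threshold magnitudes the threshold sits at $V_{DD}/2$; a stronger n-channel device shifts it toward lower input voltages, and the second function recovers the ratio that was put in.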
1.4
1.4.1
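$I_D = -C_{OUT}\frac{dV_{OUT}}{dt}$, $t_{HL} = t_2 - t_1$   (1.192)

The driver goes from cutoff through saturation into the nonsaturation region. The border between saturation and nonsaturation is reached at the time $t_x$ and the output voltage $V_{out} = V_{OH} - V_T$ (with $V_T$ the driver threshold $V_{TD}$). To simplify the final expressions, the following integrations for computing $t_{HL}$ are done with the limits $V_{OH}$ and $V_{OL}$ (the correct limits would be $V_1 = V_{OL} + 0.9(V_{OH} - V_{OL})$ and $V_0 = V_{OL} + 0.1(V_{OH} - V_{OL})$).

Saturation: $t_x - t_1 = -C_{OUT}\int_{V_{OH}}^{V_{OH}-V_T}\frac{dV_{OUT}}{I_{D(SAT)}}$   (1.193)

Nonsaturation: $t_2 - t_x = -C_{OUT}\int_{V_{OH}-V_T}^{V_{OL}}\frac{dV_{OUT}}{I_{D(nonSAT)}}$   (1.194)

With $I_{D(SAT)} = \frac{\beta_D}{2}(V_{OH}-V_T)^2$ and $I_{D(nonSAT)} = \frac{\beta_D}{2}\left[2(V_{OH}-V_T)V_{OUT} - V_{OUT}^2\right]$ follows:

$t_x - t_1 = \frac{2C_{OUT}V_T}{\beta_D(V_{OH}-V_T)^2}$

$t_2 - t_x = \frac{C_{OUT}}{\beta_D(V_{OH}-V_T)}\ln\left[\frac{2(V_{OH}-V_T)}{V_{OL}} - 1\right]$

$t_{HL} = \tau\left\{\frac{2V_T}{V_{OH}-V_T} + \ln\left[\frac{2(V_{OH}-V_T)}{V_{OL}} - 1\right]\right\}$

with the time constant, which also contains the line resistance $R_{LINE}$,

$\tau = C_{OUT}\left[\frac{1}{\beta_D(V_{OH}-V_T)} + R_{LINE}\right]$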
1.4.2
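$t_{LH} = \int_{t_0}^{t_1}dt = C_{OUT}\int_{V_0}^{V_1}\frac{dV_{OUT}}{I_L(V_{OUT})}$   (1.203)

With the resistor load current $I_L = \frac{V_{DD} - V_{OUT}}{R_L}$:

$t_{LH} = R_L C_{OUT}\int_{V_0}^{V_1}\frac{dV_{OUT}}{V_{DD} - V_{OUT}} = R_L C_{OUT}\ln\frac{V_{DD} - V_0}{V_{DD} - V_1}$

with $V_0$ = 10% and $V_1$ = 90% of the whole voltage swing:

$V_0 = V_{OL} + 0.1(V_{DD} - V_{OL})$, $V_1 = V_{OL} + 0.9(V_{DD} - V_{OL})$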
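NMOS Rise Time for Depletion Load Inverter

First approximation:

$I_L \approx \frac{\beta_L}{2}|V_{TL}|^2$, $t_{LH} = \frac{C_{OUT}\,\Delta V}{I_L}$   (1.210)

With more accuracy: $V_{TL}$ is not constant because of the substrate effect (body bias effect). The depletion MOSFET changes from saturation to the nonsaturated mode if $V_{DD} - V_{OUT} < |V_{TL}|$.

Nonsaturation: $I_L = \frac{\beta_L}{2}\left[2|V_{TL}(V_{OUT})|(V_{DD} - V_{OUT}) - (V_{DD} - V_{OUT})^2\right]$   (1.211)

$t_{LH} = C_{OUT}\int_{V_0}^{V_{DD}-|V_{TL}|}\frac{dV_{OUT}}{I_{L(SAT)}} + C_{OUT}\int_{V_{DD}-|V_{TL}|}^{V_1}\frac{dV_{OUT}}{I_{L(nonSAT)}}$   (1.212)

The result scales with the time constant

$\tau_L = \frac{C_{OUT}}{\beta_L|V_{TL}|}$   (1.213)

or, with the line resistance included, with

$\tau = \left[\frac{1}{\beta_L|V_{TL}|} + R_{LINE}\right]C_{OUT}$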
(1.214)
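$t_{PHL} = -C_{OUT}\int_{V_{OH}}^{V_{1/2}}\frac{dV_{OUT}}{I_D(V_{OUT})}$   (1.215)

$= -C_{OUT}\int_{V_{OH}}^{V_{OH}-V_{TD}}\frac{dV_{OUT}}{I_{D(SAT)}} - C_{OUT}\int_{V_{OH}-V_{TD}}^{V_{1/2}}\frac{dV_{OUT}}{I_{D(nonSAT)}}$   (1.216)

$= \tau_D\left\{\frac{2V_{TD}}{V_{OH}-V_{TD}} + \ln\left[\frac{2(V_{OH}-V_{TD})}{V_{1/2}} - 1\right]\right\}$   (1.217)

with $\tau_D = \frac{C_{OUT}}{\beta_D(V_{OH} - V_{TD})}$   (1.218)

Depletion load

$t_{PLH} = C_{OUT}\int_{V_{OL}}^{V_{DD}-|V_{TL}|}\frac{dV_{OUT}}{I_{L(SAT)}} + C_{OUT}\int_{V_{DD}-|V_{TL}|}^{V_{1/2}}\frac{dV_{OUT}}{I_{L(nonSAT)}}$   (1.220)

$t_{PLH}$ again scales with $\tau_L = \frac{C_{OUT}}{\beta_L|V_{TL}|}$.   (1.221)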
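The CMOS inverter has a full supply voltage swing:

$V_{OH} = V_{DD}$ and $V_{OL} = 0$ V   (1.222)

$V_0 = 0.1V_{DD}$ and $V_1 = 0.9V_{DD}$.   (1.223)

The high-to-low time $t_{HL}$ is similar to that of the NMOS inverter:

$t_{HL} = \frac{C_{OUT}}{\beta_n(V_1 - V_{Tn})}\left\{\frac{2V_{Tn}}{V_1 - V_{Tn}} + \ln\left[\frac{2(V_1 - V_{Tn})}{V_0} - 1\right]\right\}$   (1.224)

From symmetry ($V_{Tn} \rightarrow |V_{Tp}|$; $\beta_n \rightarrow \beta_p$) follows:

$t_{LH} = \frac{C_{OUT}}{\beta_p(V_1 - |V_{Tp}|)}\left\{\frac{2|V_{Tp}|}{V_1 - |V_{Tp}|} + \ln\left[\frac{2(V_1 - |V_{Tp}|)}{V_0} - 1\right]\right\}$   (1.225)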
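The two switching-time expressions share one form, so a single helper can evaluate both. The sketch below assumes the form $t = \frac{C_{OUT}}{\beta(V_1 - V_T)}\{\frac{2V_T}{V_1 - V_T} + \ln[\frac{2(V_1 - V_T)}{V_0} - 1]\}$ with $V_0 = 0.1V_{DD}$ and $V_1 = 0.9V_{DD}$; the function name and the component values are illustrative assumptions.

```python
import math

def cmos_switching_time(c_out, beta, vdd, vt_abs):
    """Fall time (nMOS: beta_n, V_Tn) or rise time (pMOS: beta_p, |V_Tp|)."""
    v0, v1 = 0.1 * vdd, 0.9 * vdd            # 10 % and 90 % points of the swing
    tau = c_out / (beta * (v1 - vt_abs))     # time constant of the device
    return tau * (2.0 * vt_abs / (v1 - vt_abs)
                  + math.log(2.0 * (v1 - vt_abs) / v0 - 1.0))

# Hypothetical example: 100 fF load, 5 V supply, 1 V threshold magnitudes
t_hl = cmos_switching_time(100e-15, 50e-6, 5.0, 1.0)   # nMOS discharges the load
t_lh = cmos_switching_time(100e-15, 20e-6, 5.0, 1.0)   # weaker pMOS charges it
```

Because the time is inversely proportional to the transconductance parameter, the weaker p-channel device in this example gives a rise time 2.5 times the fall time.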
1.4.5
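From symmetry follows:

$t_{PLH} = \tau_p\left\{\frac{2|V_{Tp}|}{V_{OH} - |V_{Tp}|} + \ln\left[\frac{4(V_{OH} - |V_{Tp}|)}{V_{OH} + V_{OL}} - 1\right]\right\}$   (1.227)

$t_p = \frac{1}{2}(t_{PHL} + t_{PLH})$   (1.228)

with

$\tau_n = \frac{C_{OUT}}{\beta_n(V_{OH} - V_{Tn})}$   (1.229)

and

$\tau_p = \frac{C_{OUT}}{\beta_p(V_{OH} - |V_{Tp}|)}$.   (1.230)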
The power-delay product characterizes the overall performance of a digital circuit:

$PDP = P_{av}\,t_p$   (1.233)

where $P_{av}$ is the average power dissipated by the circuit and $t_p$ is the average propagation delay time. A small PDP is desirable. For PDP computation the input signal waveform must be taken into consideration (Fig. 1.60).

Figure 1.60: PDP: input signal waveforms

For the following PDP analysis, simplified versions of the propagation delay time equations will be used:

$t_{PHL} \approx \tau_D = R_{on}C_{out}$   (1.234)

$t_{PLH} \approx \tau_L = R_L C_{out}$   (1.235)

$t_p \approx \frac{1}{2}(R_{on} + R_L)C_{out}$   (average propagation delay)   (1.236)
Figure 1.61: PDP for inverter with resistor load

With $I_{av}$ (average power supply current) the power dissipated by the circuit is

$P_{av} = I_{av}V_{DD}$   (1.237)

Static State Contribution to the PDP:

$P_{av} \approx \frac{V_{DD}^2}{2(R_{on} + R_L)}$   (1.238)

(the factor $\frac{1}{2}$ appears because the resistively loaded inverter is considered to be in the output low state half of the time; in the output high state no power is dissipated)

$(PDP)_{DC} \approx \frac{1}{4}C_{out}V_{DD}^2$   (1.239)

Output Rise Interval Contribution to the PDP:

With the driver in cutoff:

$I_{av} \approx C_{out}\frac{\Delta V}{\Delta t} = C_{out}\frac{V_l}{t_{LH}}$   (1.240)

with $V_l = V_{DD}$ ($V_{out}$: $0 \rightarrow V_{DD}$).

$(PDP)_{LH} \approx C_{out}V_{DD}^2\,\frac{t_p}{t_{LH}}$   (1.241)
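Output Fall Interval Contribution to the PDP:

$I_{av} \approx \frac{1}{2}(I_{initial} + I_{final})$   (1.242)

where $I_{initial} = \frac{1}{R_L}(V_{DD} - V_{OH})$ and $I_{final} = \frac{1}{R_L}(V_{DD} - V_{OL})$.   (1.243)

Assuming $V_{OL} \approx 0$ and $V_{OH} = V_{DD}$:

$I_{av} \approx \frac{V_{DD}}{2R_L}$   (1.244)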
With tP HL
(1.245) (1.246)
Ron tp RL tHL
(1.247)

For well-designed inverters $R_{on} \ll R_L$. The propagation delay time is then $t_p \approx \tau_L/2$. With the approximations $t_{LH} = 2\tau_L$ and $t_{HL} = 2\tau_D$ it follows that

$PDP \approx \frac{3}{4}C_{out}V_{DD}^2$   (1.248)
PDP for Depletion-Load nMOS Inverter

Static State Contribution to the PDP:

Average DC power dissipation:

$(P_{av})_{DC} \approx \frac{1}{2}I_{max}V_{DD}$   (1.249)

where $I_{max}$ is the maximum power supply current (this is for $V_{out} = V_{OL} \Rightarrow I_{max} = I_D(V_{out} = V_{OL})$), assuming that the probability for the inverter being in this state is 50%.

$(P_{av})_{DC} \approx \frac{\beta_D}{4}\left[2(V_{OH}-V_{TD})V_{OL} - V_{OL}^2\right]V_{DD}$   (1.250)

$(P_{av})_{DC} \approx \frac{\beta_L}{4}\left[V_{TL}(V_{OL})\right]^2 V_{DD}$   (1.251)

Output Rise and Fall Interval Contributions to the PDP:

$(I_{av})_{LH} = \frac{1}{T}\int_0^{t_{LH}} I_L(t)\,dt$   (1.252)

with

$I_L(t) = I_D(t) + C_{out}\frac{dV_{out}}{dt}$   (1.253)
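$(I_{av})_{LH} = \frac{1}{T}\int_0^{t_{LH}} I_D(t)\,dt + \frac{1}{T}C_{out}\int_0^{t_{LH}}\frac{dV_{out}}{dt}\,dt$   (1.254)

$\bar{I}_{D,LH} \equiv \frac{1}{t_{LH}}\int_0^{t_{LH}} I_D(t)\,dt$.   (1.255)

$\bar{I}_{D,LH} = 0$ if $V_{in}$ is an ideal square wave (driver in cutoff). The second term can be evaluated as follows:

$\int_0^{t_{LH}}\frac{dV_{out}}{dt}\,dt = \int_{V_0}^{V_1}dV_{out}$   (1.256)

$(I_{av})_{LH} = \frac{1}{T}\left[\bar{I}_{D,LH}\,t_{LH} + C_{out}(V_1 - V_0)\right]$   (1.257)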
The equations for the discharge time are similar:

$(I_{av})_{HL} = \frac{1}{T}\int_0^{t_{HL}} I_L(t)\,dt$   (1.258)

$= \frac{1}{T}\int_0^{t_{HL}}\left[I_D(t) + C_{out}\frac{dV_{out}}{dt}\right]dt$   (1.259)

$= \frac{1}{T}\left[\bar{I}_{D,HL}\,t_{HL} - C_{out}(V_1 - V_0)\right]$   (1.260)

$\bar{I}_{D,HL} \equiv \frac{1}{t_{HL}}\int_0^{t_{HL}} I_D(t)\,dt$.   (1.261)

So the transient power supply current is

$(I_{av})_{transient} = \frac{1}{T}\left(\bar{I}_{D,LH}\,t_{LH} + \bar{I}_{D,HL}\,t_{HL}\right)$   (1.262)

For the total PDP of the depletion-load inverter follows:

$PDP \approx \frac{1}{2}I_{max}V_{DD}\,t_p + \frac{t_p}{T}\left(\bar{I}_{D,LH}\,t_{LH} + \bar{I}_{D,HL}\,t_{HL}\right)V_{DD}$   (1.263)

To understand this expression, assume that $\bar{I}_{D,LH} = \bar{I}_{D,HL} = I_{D,av}$. The logic switching frequency is $f = 1/T$ and the maximum switching frequency is

$f_{max} = \frac{1}{t_{HL} + t_{LH}}$.   (1.264)

$PDP \approx \frac{1}{2}I_{max}V_{DD}\,t_p + \frac{f}{f_{max}}I_{D,av}V_{DD}\,t_p$   (1.265)

For $f \ll f_{max}$ the DC term of this equation dominates. When $f = f_{max}$, the inverter is never in the stable state where $V_{out} = V_{OL}$, so

$PDP \approx I_{D,av}V_{DD}\,t_p$   (1.266)

PDP dependence:

$PDP \propto C_{out}V_{DD}\cdot(\text{Voltage})$   (1.267)

PDP for CMOS Inverter

Current flows only during a switching event, so the average current in a logic cycle $T$ can be written as

$I_{av} = \frac{1}{T}\left[\bar{I}_{Dn,LH}\,t_{LH} + \bar{I}_{Dn,HL}\,t_{HL}\right]$.   (1.268)

In this equation

$\bar{I}_{Dn,LH} \equiv \frac{1}{t_{LH}}\int_0^{t_{LH}} I_{Dn}(t)\,dt$   (1.269)

gives the average current during the rise time, while

$\bar{I}_{Dn,HL} \equiv \frac{1}{t_{HL}}\int_0^{t_{HL}} I_{Dn}(t)\,dt$   (1.270)

is the average fall time current. For a completely symmetric CMOS inverter $\bar{I}_{Dn,LH} = \bar{I}_{Dn,HL} = I_{Dn,av}$, so the power-delay product is given by

$PDP_{CMOS} = I_{Dn,av}V_{DD}\,t_p\,\frac{f}{f_{max}}$   (1.271)
MOSFET capacitances
- are complicated functions of the fabrication processes and the layout geometry
- are nonlinear, voltage-dependent capacitances
- exact analysis is not possible ($\rightarrow$ computer simulation)
- here: hand computations/estimations in an average sense

MOS Overlap Capacitors

Referring to Fig. 1.64, the physical length of a polysilicon gate is given by

$L' = L_s + L + L_D$   (1.272)

The gate overlap is necessary to ensure the contact of the channel and the n+ regions. The overlap capacitances are given by

$C_{ols} = C_{ox}WL_s$, $C_{old} = C_{ox}WL_d$   (1.273)
with $C_{ox} = \varepsilon_{ox}/x_{ox}$ (gate capacitance per unit area). Self-aligned process: the polysilicon gate is employed as a mask to define the n+ source and drain regions. The overlaps occur because the following processing steps require heating of the wafer ($\rightarrow$ lateral diffusion). The designer can influence the overlap capacitances only by varying the channel width $W$. In design rule sets the overlap capacitance is often defined by:

$C_o = C_{ox}L_o \Rightarrow C_{ols} = C_{old} = C_oW$   (1.274)
Figure 1.64: MOSFET capacitor model

MOSFET Gate Capacitances

$C_{gs} = C_{ox}WL\,f_1(V_{GS}, V_{GD})$   (1.275)

$C_{gd} = C_{ox}WL\,f_2(V_{GS}, V_{GD})$   (1.276)

$C_{gb} = C_{ox}WL\,f_3(V_{GS}, V_{GD}, V_{SB})$   (1.277)

The gate-bulk capacitance consists of the gate capacitance in series with the depletion capacitance of the depletion region.

1. Cutoff: no inversion layer $\rightarrow$ no channel

$C_{gb} \approx C_{ox}WL$   (1.278)

$C_{gs} \approx 0$   (1.279)

$C_{gd} \approx 0$   (1.280)
Figure 1.65: MOSFET gate capacitances in the three operational regions

2. Nonsaturation: the channel shields the bulk electrode from the gate, since the inversion layer acts as a conductor between drain and source $\Rightarrow C_{gb} = 0$

$C_{gb} \approx 0$   (1.281)

$C_{gs} \approx \frac{1}{2}C_{ox}WL\left[1 + \frac{V_{DS}}{3V_{DS,sat}}\right]$   (1.282)

$C_{gd} \approx \frac{1}{2}C_{ox}WL\left[1 - \frac{V_{DS}}{V_{DS,sat}}\right]$   (1.283)

3. Saturation: the channel shields the bulk electrode from the gate, since the inversion layer acts as a conductor between drain and source $\Rightarrow C_{gb} = 0$. The channel is pinched off and does not contact the drain n+ region.

$C_{gb} \approx 0$   (1.284)

$C_{gs} \approx \frac{2}{3}C_{ox}WL$   (1.285)

$C_{gd} \approx 0$   (1.286)

Combination of the gate capacitances with the overlap contributions:

$C_G = C_{ox}WL'$   (1.287)

where

$L' = L + 2L_o$   (1.288)
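The three-region approximation of the gate capacitances lends itself to a small lookup function. The sketch below is illustrative only: the function name, the region labels and the numerical values are assumptions, and the formulas are the hand-calculation approximations listed for cutoff, nonsaturation and saturation.

```python
def gate_capacitances(region, c_ox, w, l, vds=0.0, vds_sat=1.0):
    """Approximate C_gb, C_gs, C_gd for one of the three operating regions."""
    c_total = c_ox * w * l                      # full gate-oxide capacitance
    if region == "cutoff":                      # no channel: gate couples to bulk
        return {"cgb": c_total, "cgs": 0.0, "cgd": 0.0}
    if region == "nonsaturation":               # channel shields the bulk
        return {"cgb": 0.0,
                "cgs": 0.5 * c_total * (1.0 + vds / (3.0 * vds_sat)),
                "cgd": 0.5 * c_total * (1.0 - vds / vds_sat)}
    if region == "saturation":                  # channel pinched off at the drain
        return {"cgb": 0.0, "cgs": 2.0 / 3.0 * c_total, "cgd": 0.0}
    raise ValueError("region must be 'cutoff', 'nonsaturation' or 'saturation'")

# Hypothetical device: C_ox = 7e-4 F/m^2, W = 10 um, L = 2 um
C_OX, W, L = 7e-4, 10e-6, 2e-6
caps_edge = gate_capacitances("nonsaturation", C_OX, W, L, vds=3.0, vds_sat=3.0)
caps_sat = gate_capacitances("saturation", C_OX, W, L)
```

Evaluating the nonsaturation expressions at $V_{DS} = V_{DS,sat}$ reproduces the saturation values, so the approximation is continuous at the region border.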
Figure 1.66: Gate capacitances as functions of gate-source voltage

The Bulk Junction Capacitances

Figure 1.67: Expanded view of an n+ drain or source region for computing depletion capacitances

The reverse-biased depletion capacitance per unit area of a pn junction is given by

$C = \frac{C_{j0}}{\left(1 + \frac{V_r}{\phi_0}\right)^{1/2}}$   (1.290)

where $V_r$ is the magnitude of the reverse-bias voltage applied to the junction. $\phi_0$ is the built-in potential

$\phi_0 = \frac{kT}{q}\ln\frac{N_dN_a}{n_i^2}$   (1.291)

and the zero-bias capacitance per unit area is

$C_{j0} = \sqrt{\frac{q\varepsilon_{Si}}{2\left(\frac{1}{N_a} + \frac{1}{N_d}\right)\phi_0}}$   (1.292)
The bottom capacitance can be computed simply using the doping concentrations $N_d$ and $N_a$ for the pn junction:

$C_{bottom} = \frac{C_{j0}WY}{\left(1 + \frac{V_r}{\phi_0}\right)^{1/2}}$   (1.293)

For computing the sidewall capacitance the p+ channel stop doping must be taken into consideration (see also the technology description later on). The sidewall capacitance is usually computed by first taking the sidewall capacitance per unit area as

$C_{j0sw} = \sqrt{\frac{q\varepsilon_{Si}}{2\left(\frac{1}{N_{a,sw}} + \frac{1}{N_d}\right)\phi_{0sw}}}$   (1.294)

where

$\phi_{0sw} = \frac{kT}{q}\ln\frac{N_dN_{a,sw}}{n_i^2}$   (1.295)

is the sidewall built-in potential. Because the n+ area has a junction depth of $x_j$, the sidewall capacitance per unit length $C_{jsw}$ is taken as

$C_{jsw} = C_{j0sw}\,x_j$   (1.296)

The total sidewall capacitance is then given by

$C_{sw} = \frac{C_{jsw}\,l}{\left(1 + \frac{V_r}{\phi_{0sw}}\right)^{1/2}}$   (1.297)

where $l$ is the total sidewall perimeter length $(2W + 2Y)$. Assuming $\phi_0 = \phi_{0sw}$, the total depletion capacitance for a drain or source area is given by

$C_d(V_r) = C_{bottom} + C_{sw} = \frac{C_{j0}WY + C_{jsw}\,l}{\left(1 + \frac{V_r}{\phi_0}\right)^{1/2}}$.   (1.298)
For drain regions $V_r = V_{DB}$ and for source regions $V_r = V_{SB}$, so the depletion capacitance depends on the actual voltages. An average depletion capacitance may be defined by

$C_{av} = \frac{1}{V_2 - V_1}\int_{V_1}^{V_2} C_d(V_r)\,dV_r$   (1.299)

$= \frac{2\phi_0C_T}{V_2 - V_1}\left[\left(1 + \frac{V_2}{\phi_0}\right)^{1/2} - \left(1 + \frac{V_1}{\phi_0}\right)^{1/2}\right]$   (1.300)

where

$C_T = C_{j0}WY + C_{jsw}\,l$.   (1.301)

Defining a dimensionless voltage factor

$K(V_1, V_2) = \frac{2\phi_0}{V_2 - V_1}\left[\left(1 + \frac{V_2}{\phi_0}\right)^{1/2} - \left(1 + \frac{V_1}{\phi_0}\right)^{1/2}\right] < 1$   (1.302)

yields

$C_{av} = K(V_1, V_2)\,C_T$   (1.303)
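The voltage factor $K(V_1, V_2)$ is simply the average of the normalized junction capacitance over the voltage interval, which can be confirmed numerically. The sketch below is a minimal illustration; the function names and the built-in potential of 0.9 V are assumptions for the example.

```python
import math

def depletion_capacitance(c_t, v_r, phi0):
    """Reverse-biased junction capacitance C_d(V_r) = C_T / sqrt(1 + V_r/phi0)."""
    return c_t / math.sqrt(1.0 + v_r / phi0)

def k_factor(v1, v2, phi0):
    """Dimensionless averaging factor K(V1, V2), so that C_av = K * C_T."""
    return (2.0 * phi0 / (v2 - v1)
            * (math.sqrt(1.0 + v2 / phi0) - math.sqrt(1.0 + v1 / phi0)))

def average_by_midpoint_rule(c_t, v1, v2, phi0, steps=20000):
    """Numerical average of C_d over [v1, v2], used only as a cross-check."""
    dv = (v2 - v1) / steps
    total = sum(depletion_capacitance(c_t, v1 + (i + 0.5) * dv, phi0)
                for i in range(steps))
    return total * dv / (v2 - v1)

# Hypothetical example: output swing 0 V to 5 V, phi0 = 0.9 V, C_T = 20 fF
K = k_factor(0.0, 5.0, 0.9)
c_av_closed = K * 20e-15
c_av_numeric = average_by_midpoint_rule(20e-15, 0.0, 5.0, 0.9)
```

For this swing the factor comes out near 0.56, so the averaged junction capacitance is roughly half of its zero-bias value.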
1.4.8
(1.304)
For computation of the line capacitance, transmission line theory should be used (parasitic capacitances, structures must be treated in a distributed manner). The problem can be reduced by a lumped-element approximation:

$C_{line} = C_{int}A_{line}$   (1.305)

with

$C_{int} = \frac{\varepsilon_{ox}}{x_{int}}$ [F/cm$^2$].   (1.306)

$C_{int}$ is the capacitance per unit area formed between the line and the substrate, $x_{int}$ is the oxide thickness between line and substrate. The line resistance can be estimated in a similar manner by

$R_{line} = nR_\square$ [$\Omega$]   (1.307)

where $R_\square$ is the sheet resistance and $n = (d/w)$ is the number of squares ($\square$) with area $w^2$ as seen in the direction of current flow. Fig. 1.70 gives an example for cascaded stages with a fanout of three:

Figure 1.70: Capacitance calculation for FO = 3

$C \approx C_{G3} + C_{G4} + C_{G5} + (C_{line})$   (1.308)

The output capacitance of CMOS inverters can be computed using similar techniques. In Fig. 1.71 two cascaded CMOS inverters are shown.

$C_{out} \approx C_{GDn} + C_{GDp} + K(V_{OL}, V_{OH})(C_{dbp} + C_{dbn}) + C_{line} + C_G$   (1.309)
Figure 1.71: Approximation used for $C_{out}$ in cascaded CMOS inverters

with $C_G$ the input capacitance of the next stage, which is given by

$C_G = C_{Gn} + C_{Gp}$   (1.310)
Assume that the device dimensions are scaled with $S > 1$, such that

$\text{Length}' = \frac{\text{Length}}{S}$   (1.311)

This length reduction applies to all geometries in the chip. nMOS high-to-low time:

$t_{HL} = \tau_D\left\{\frac{2V_{TD}}{V_{OH} - V_{TD}} + \ln\left[\frac{2(V_{OH} - V_{TD})}{V_{OL}} - 1\right]\right\}$   (1.312)

Scaling: also voltage reduction by $V' = (V/S)$. The term enclosed by curly brackets in the previous equation remains constant, but $\tau_D$ is modified:

$\tau_D = \frac{C_{out}}{\beta_D(V_{OH} - V_{TD})}$   (1.313)

$(V_{OH} - V_{TD})' = \frac{(V_{OH} - V_{TD})}{S}$   (1.314)

$\beta_D' = S\beta_D$   (1.315)

$C_{out}$ consists of oxide and depletion capacitances:

$(C')_{oxide} = C_{ox}'(\text{Area})' = \frac{(C)_{oxide}}{S}$   (1.316)

$(C')_{junction} \approx \frac{(C)_{junction}}{S}$ (approximation)   (1.317)

$C_{out}' \approx \frac{C_{out}}{S}$   (1.318)

$\tau_D' \approx \frac{\tau_D}{S}$   (1.319)

The maximum switching frequency is

$f_{max} = \frac{1}{t_{HL} + t_{LH}} \Rightarrow f_{max}' \approx Sf_{max}$   (1.320)

If the voltage is kept constant (only lengths are scaled):

$\tau_D' \approx \frac{\tau_D}{S^2}$   (1.321)

$f_{max}' = S^2f_{max}$   (1.322)
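The scaling relations can be traced step by step from the definition $\tau_D = C_{out}/(\beta_D(V_{OH} - V_{TD}))$. The sketch below applies the individual scaling factors and returns the scaled time constant; the function name and the starting values are illustrative assumptions.

```python
def scaled_time_constant(c_out, beta_d, v_drive, s, constant_voltage=False):
    """Time constant tau_D after scaling all lengths by 1/S.

    v_drive stands for (V_OH - V_TD). With constant_voltage=False the
    voltages are scaled by 1/S as well (full scaling).
    """
    c_scaled = c_out / s                    # oxide and junction capacitance ~ 1/S
    beta_scaled = beta_d * s                # C_ox grows by S, W/L stays constant
    v_scaled = v_drive if constant_voltage else v_drive / s
    return c_scaled / (beta_scaled * v_scaled)

# Hypothetical starting point: 100 fF, 50 uA/V^2, 4 V gate drive, S = 2
tau_0 = 100e-15 / (50e-6 * 4.0)
tau_full = scaled_time_constant(100e-15, 50e-6, 4.0, s=2.0)
tau_const_v = scaled_time_constant(100e-15, 50e-6, 4.0, s=2.0, constant_voltage=True)
```

Full scaling shortens the time constant by the factor $S$, while scaling only the lengths shortens it by $S^2$; the maximum switching frequency rises by the same factors.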
CMOS Technology
CMOS Process Flow
1.5.2 The Latch-Up Effect

Figure 1.73: Latch-up in n-tub CMOS inverter

Latch-up is a significant problem in CMOS circuits. If the base-emitter junction of the pnp transistor becomes forward biased, the transistor is switched on and $I$ begins to flow, causing the npn transistor to be forward biased. The collector current of the npn transistor forces the pnp transistor to conduct more current. This feedback leads to latch-up, and the circuit will be destroyed by heat.
Figure 1.74: Guard rings for latch-up prevention

The circuit can be protected from latch-up by placing heavily doped guard rings around the MOSFETs. This reduces the effectiveness of the base and emitter regions in both transistors.
Chapter 2
Several kinds of combinational logic: Random Logic: Circuit design using NAND gates, NOR gates and Inverters (often called AOI Logic Gate Representation = AND-OR-Inverter Logic)
Figure 2.2: Complex gate logic primitive: CMOS inverter Passtransistor Logic: transistors are used as switches which are controlled by input literals.
Figure 2.4: A complementary switch Logic Arrays: PLA (programmable logic arrays), gate-matrix layout, Weinberger Arrays and regular layout achieved by application of the Euler-Graph method
Static CMOS Complex Gate Logic Properties

- Build logic gates as shown in figure 2.15, where transistors are represented as switches
- The pMOS pull-up network replaces the resistive or depletion loads used in nMOS technique
- Configure so that for each input combination either a p-chain pulls the output up or an n-chain pulls the output down
- Pull-up and pull-down networks implement complementary functions: when one conducts, the other does not
- No quiescent current through the gate means zero or very low static power dissipation
- Active pull-up chains are faster than resistive loads
- Switching time is the same for both kinds of output changes
Design Method

- nMOS devices pull the output to 0 when the gate inputs are 1
- pMOS devices pull the output to 1 when the gate inputs are 0

Consider a function to be realized: $F(A, B, C, \ldots)$
- The nMOS pull-down network must realize the pull-down function $F_{PD} = \overline{F}(A, B, C, \ldots)$
- The pMOS pull-up network must realize the pull-up function $F_{PU} = F(\overline{A}, \overline{B}, \overline{C}, \ldots)$

The literals in $F_{PU}$ have to be inverted, because the p-channel transistors conduct if their gate input is 0 (low).

Example: Realization of $F = \overline{A + B + C}$ (NOR)

$F_{PD} = A + B + C$

$F_{PU} = \overline{A + B + C} = \overline{A}\,\overline{B}\,\overline{C}$

(The Boolean expression transformation is done by applying Shannon's inversion theorem / De Morgan's law.) Synthesis can use conventional logic design techniques (Boolean functions, Karnaugh maps, logic minimization) and express the results in AND/OR form for realisation as series and parallel connections of devices.
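The NOR example can be verified exhaustively with a truth table. The sketch below models each network as a Boolean function that is true when the network conducts; the function names are hypothetical and only serve this illustration.

```python
from itertools import product

def target_f(a, b, c):
    """Function to be realized: three-input NOR."""
    return not (a or b or c)

def pull_down_conducts(a, b, c):
    """nMOS network F_PD = A + B + C (three nMOS devices in parallel)."""
    return a or b or c

def pull_up_conducts(a, b, c):
    """pMOS network: series devices, each conducting when its gate input is 0."""
    return (not a) and (not b) and (not c)

rows = []
for a, b, c in product([False, True], repeat=3):
    up, down = pull_up_conducts(a, b, c), pull_down_conducts(a, b, c)
    # The output is 1 exactly when the pull-up network conducts.
    rows.append((a, b, c, up, down, target_f(a, b, c)))

exactly_one_path = all(up != down for _, _, _, up, down, _ in rows)
output_matches_f = all(up == f for _, _, _, up, _, f in rows)
```

For every input combination exactly one of the two networks conducts, which is the property that removes the quiescent current path in static CMOS.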
Rules for Logic Formation

Rule 1: nMOS transistors in series implement the AND operation
Rule 2: nMOS transistors in parallel implement the OR operation
Rule 3: Logic functions in series are ANDed together
Rule 4: Parallel nMOS branches OR the individual branch functions

First the nMOS logic transistors are structured according to the rules above. The output of the function is the complement of the nMOS logic. Then the pMOS transistor network has to be structured according to the following rules:

Rule 5: Parallel connections of nMOS transistors have to be transformed to serial connections of pMOS transistors. The input literals applied to the pMOS transistors are identical to the gate inputs of the nMOS transistors (no inversion needed)
Rule 6: Serial connections of nMOS transistors have to be transformed to parallel connections of pMOS transistors. Input literals remain unchanged
Rule 7: Parallel connected logic blocks of the nMOS network $\rightarrow$ serial connection in the pMOS network
Rule 8: Serial connected logic blocks of the nMOS network $\rightarrow$ parallel connection in the pMOS network
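Example: Combinational Adder

$CARRY = AB + AC + BC = AB + C(A + B)$   (2.4)

$SUM = ABC + A\overline{B}\,\overline{C} + \overline{A}B\overline{C} + \overline{A}\,\overline{B}C$
$\quad = ABC + (\overline{A}\,\overline{B} + \overline{B}\,\overline{C} + \overline{A}\,\overline{C})(A + B + C)$
$\quad = ABC + \overline{CARRY}\,(A + B + C)$   (2.5)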
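The factored forms of CARRY and SUM can be compared against ordinary binary addition for all eight input combinations. The sketch below is a plain truth-table check; the function names are chosen for this example only.

```python
from itertools import product

def carry(a, b, c):
    """CARRY = AB + C(A + B)."""
    return (a and b) or (c and (a or b))

def sum_via_carry(a, b, c):
    """SUM = ABC + NOT(CARRY) * (A + B + C)."""
    return (a and b and c) or ((not carry(a, b, c)) and (a or b or c))

table = []
for a, b, c in product([False, True], repeat=3):
    total = int(a) + int(b) + int(c)          # reference: arithmetic sum of the bits
    table.append((a, b, c, carry(a, b, c), sum_via_carry(a, b, c), total))

adder_is_correct = all(int(cy) == total // 2 and int(s) == total % 2
                       for _, _, _, cy, s, total in table)
```

Reusing the inverted carry inside the sum term is what allows the sum gate to share the carry gate's output in a transistor-level realization.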
Figure 2.19: Combinational adder layout possibilities for one adder circuit
Figure 2.20: Pseudo nMOS logic Substitute the pMOS network by one single pMOS load transistor Consists of a single pMOS load per gate (emulating the nMOS depletion load, without body eect) and a nMOS pull-down network Needs ratioed devices Dissipates static power, when pull-down network is on Provides a method of emulating nMOS circuits in CMOS Reduced noise margin
2.4 Passtransistor and Transmission Gate Logic
Example: Pass Transistor NXOR Realisation

A   B   $\overline{A \oplus B}$   Pass Function
0   0   1   $\bar{A} + \bar{B}$
0   1   0   $A + \bar{B}$
1   0   0   $\bar{A} + B$
1   1   1   $A + B$
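$$V_{GS,P} = V_{DD} - V_{in} = V_{DS,P} \qquad (2.6)$$
Since the passtransistor is always saturated, the charging current equation can be written as:
$$C_{in}\frac{dV_{in}}{dt} = \frac{\beta_P}{2}(V_{DD} - V_{in} - V_{TP})^2 \qquad (2.7)$$
where
$$\beta_P = (\mu_n C_{ox})\left(\frac{W}{L}\right)_P \qquad (2.8)$$
Ignoring the body bias effect, the solution of this differential equation is given by (initial condition: $V_{in}(0) = 0$):
$$V_{in}(t) = (V_{DD} - V_{TP})\,\frac{\dfrac{\beta_P (V_{DD} - V_{TP})\,t}{2C_{in}}}{1 + \dfrac{\beta_P (V_{DD} - V_{TP})\,t}{2C_{in}}} \qquad (2.9)$$
With
$$\tau_{ch} = \frac{2C_{in}}{\beta_P (V_{DD} - V_{TP})} \qquad (2.10)$$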
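this solution may be written as
$$V_{in}(t) = (V_{DD} - V_{TP})\,\frac{(t/\tau_{ch})}{1 + (t/\tau_{ch})}. \qquad (2.11)$$
The maximum load voltage is given by
$$V_{in}(t \to \infty) = (V_{DD} - V_{TP}) = V_{max} \qquad (2.12)$$
or, taking into account the body bias with
$$V_{TP}(V_{in}) = V_{T0P} + \gamma\left(\sqrt{2|\phi_F| + V_{in}} - \sqrt{2|\phi_F|}\right) \qquad (2.13)$$
for the maximum voltage follows:
$$V_{max} = V_{DD} - V_{TP}(V_{max}) = (V_{DD} - V_{T0P}) - \gamma\left(\sqrt{2|\phi_F| + V_{max}} - \sqrt{2|\phi_F|}\right). \qquad (2.14)$$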
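A short numerical sketch may help to see the two effects: the slow, non-exponential approach to $V_{max}$ and the additional loss caused by the body effect. The function names and all parameter values below are assumptions chosen for illustration; the implicit equation for $V_{max}$ is solved here by simple bisection, which is one possible method among several.

```python
import math

def tau_ch(c_in, beta_p, v_dd, v_tp):
    """Charging time constant 2*C_in / (beta_P * (V_DD - V_TP))."""
    return 2.0 * c_in / (beta_p * (v_dd - v_tp))

def v_in(t, c_in, beta_p, v_dd, v_tp):
    """Logic-1 transfer through the pass transistor, body effect ignored."""
    x = t / tau_ch(c_in, beta_p, v_dd, v_tp)
    return (v_dd - v_tp) * x / (1.0 + x)

def v_max_body(v_dd, v_t0, gamma, phi_f, tol=1e-9):
    """Solve V = V_DD - V_T0 - gamma*(sqrt(2|phi_F| + V) - sqrt(2|phi_F|)) by bisection."""
    two_phi = 2.0 * abs(phi_f)

    def residual(v):
        v_t = v_t0 + gamma * (math.sqrt(two_phi + v) - math.sqrt(two_phi))
        return v_dd - v_t - v

    lo, hi = 0.0, v_dd            # residual is positive at 0 and negative at V_DD
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

After one time constant the node has reached only half of $(V_{DD} - V_{TP})$, and after nine time constants 90 %, which is noticeably slower than an exponential RC charge.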
Consequences of the Passtransistor Charging Characteristics for the Design of Passtransistor Networks
1. Cascaded Passtransistor Chain: $V_{chainout} = V_{max} = (V_{DD} - V_{TP})$; $V_{max}$ is propagated through the passtransistor chain
2. Pass Transistor driving another Pass Transistor: $V_{1,max} = (V_{DD} - V_{TP1})$ and $V_{2,max} = (V_{1,max} - V_{TP2})$, i.e. a reduction of $V_{max}$!
2.4.2
Since the passtransistor is always nonsaturated, the charging current differential equation can be written as:
$$C_{in}\frac{dV_{in}}{dt} = -\frac{\beta_P}{2}\left[2(V_{DD} - V_{TP})V_{in} - V_{in}^2\right] \qquad (2.16)$$
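Ignoring the body bias effect, the solution of this differential equation is given by:
$$V_{in}(t) = (V_{DD} - V_{TP})\,\frac{2e^{-t/\tau_{dis}}}{1 + e^{-t/\tau_{dis}}} \qquad (2.17)$$
where
$$\tau_{dis} = \frac{C_{in}}{\beta_P (V_{DD} - V_{TP})} \qquad (2.18)$$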
2.4.3
Figure 2.28: pMOS pass transistor
pMOS Transmission Characteristics
It is not possible to discharge the capacitor to 0 V because
$$V_{out}(t \to \infty) = |V_{TP}| = V_{min} \qquad (2.20)$$
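Transmission Gate Model

Logic Level   nMOS                  pMOS         CMOS
Logic 0       0                     $|V_{Tp}|$   0
Logic 1       $(V_{DD} - V_{Tn})$   $V_{DD}$     $V_{DD}$

Logic 1 transfer:
$$C_{out}\frac{dV_{out}}{dt} = \frac{V_{DD} - V_{out}}{R_{TG}} \qquad (2.21)$$
$$V_{out}(t) = V_{DD}\left[1 - e^{-t/\tau_{TG}}\right] \qquad (2.22)$$
with
$$\tau_{TG} = R_{TG} C_{out} \qquad (2.23)$$
Logic 0 transfer:
$$V_{out}(t) = V_{DD}\,e^{-t/\tau_{TG}} \qquad (2.24)$$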
Figure 2.30: CMOS transmission gate
Equivalent Resistance
$$R_{TG} = \frac{V_{TG}}{I_{Dn} + I_{Dp}} \qquad (2.25)$$
Rn = Rp =
(2.26) (2.27)
= AB + A B
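Full adder equations:
$$S_n = (A_n \oplus B_n)\,\overline{C}_{n-1} + \overline{(A_n \oplus B_n)}\,C_{n-1} \qquad (2.35)$$
$$C_n = (A_n \oplus B_n)\,C_{n-1} + \overline{(A_n \oplus B_n)}\,A_n \qquad (2.36)$$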
Figure 2.44: Multiplex/Demultiplex operations
4-to-1 Multiplexer:
$$F = D_0(AB) + D_1(A\bar{B}) + D_2(\bar{A}B) + D_3(\bar{A}\bar{B}) \qquad (2.37)$$
Multiplexers can be used as function generators
Figure 2.45: TG-logic: 4-to-1 multiplexer (example: for D0=1, D1=0, D2=0, D3=0 an AND function is realized)
Split Arrays: improvement of the layout efficiency by separating pMOS and nMOS transistors into two distinct areas (physical separation)
Pass Transistor Logic with pMOS Pull-Up: to reduce device count and area, an nMOS version with pMOS pull-up can also be useful (a kind of pseudo nMOS).
Clocking
Chapter 3
Clock Signal:
- used to synchronize data flow through a digital network
- clocked static or dynamic circuits
- problems: clock skew (delay caused by clock distribution wires)
Figure 3.1: Ideal nonoverlapping 2-phase clocks
Condition for nonoverlapping clock signals $\phi_1(t)$ and $\phi_2(t)$:
$$\phi_1(t)\,\phi_2(t) = 0 \quad \forall t \qquad (3.1)$$
3.1.1
Figure 3.3: Single clock 2-phase timing
For nonoverlapping clock phases $\phi$ and $\overline{\phi}$, fine-tuned and well-designed delay lines (realized as transmission gates) have to be inserted in order to avoid overlapping of $\phi$ and $\overline{\phi}$.
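Figure 3.8: Clocked shift register circuit
Time constant for charging and discharging:
$$\tau_{TG} = R_{TG} C_L \qquad (3.2)$$
where
$$C_L = C_{TG} + C_{in} + C_{line} \qquad (3.3)$$
$V_A = V_{DD}$: ($V_{in}(0) = 0$)
$$V_{in}(t) \approx V_{DD}\left[1 - e^{-t/\tau_{TG}}\right] \qquad (3.4)$$
The inverter is switched when $V_{in} = V_{IH}$, which occurs after
$$t_1 \approx \tau_{TG}\ln\left[\frac{1}{1 - V_{IH}/V_{DD}}\right] \qquad (3.5)$$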
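$V_A = 0$: ($V_{in}(0) = V_{DD}$)
$$V_{in}(t) \approx V_{DD}\,e^{-t/\tau_{TG}} \qquad (3.7)$$
The inverter is switched when $V_{in} = V_{IL}$, which occurs after a time of approximately $\tau_{TG}\ln(V_{DD}/V_{IL})$.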
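The two switching times follow directly from inverting the exponential charge and discharge curves. The sketch below is illustrative; the function names and the example values are assumptions, and the transmission gate is treated as the constant resistance $R_{TG}$ used in the text.

```python
import math

def time_to_reach_high(v_ih, v_dd, tau_tg):
    """Time for V_in = V_DD*(1 - exp(-t/tau)) to rise from 0 to V_IH."""
    return tau_tg * math.log(1.0 / (1.0 - v_ih / v_dd))

def time_to_reach_low(v_il, v_dd, tau_tg):
    """Time for V_in = V_DD*exp(-t/tau) to fall from V_DD to V_IL."""
    return tau_tg * math.log(v_dd / v_il)

# Hypothetical example: R_TG = 10 kOhm, C_L = 100 fF, so tau_TG = 1 ns
tau_example = 10e3 * 100e-15
t_rise_example = time_to_reach_high(3.0, 5.0, tau_example)
t_fall_example = time_to_reach_low(2.0, 5.0, tau_example)
```

Both times scale linearly with $\tau_{TG}$, so halving either $R_{TG}$ or $C_L$ halves the time before the following inverter switches.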
Figure 3.9: Leakage path in a CMOS TG
The load capacitance seen by the transmission gate (TG) is
$$C_L = C_{TG} + C_{line} + C_{in} \qquad (3.9)$$
The depletion capacitance contributions to $C_L$ are due to the reverse biased pn junctions in the MOS transistors. As shown in fig. 3.9, a leakage current flow exists across the reverse biased pn junctions. The influence of this leakage current on the charge stored in $C_L$ depends on the values of $I_{Lp}$ and $I_{Ln}$. With
$$I_L = I_{Ln} - I_{Lp} \qquad (3.10)$$
the leakage current influence on $V_{in}$ is given by
$$C_L\frac{dV_{in}}{dt} = -I_L \qquad (3.11)$$
If $I_{Lp} > I_{Ln}$ the capacitance is charged by the leakage current; otherwise it is discharged, or remains constant when the ideal condition $I_{Lp} = I_{Ln}$ is true.
$$\frac{dQ_{store}}{dt} = I_{Lp} - I_{Ln} \qquad (3.12)$$
$$C_{store} = \frac{dQ_{store}}{dV} \qquad (3.13)$$
Assuming that the leakage currents $I_{Lp}$ and $I_{Ln}$ are constant and that the node charge-voltage relation is linear, of the form
$$Q_{store} = C_{store} V \qquad (3.14)$$
Figure 3.10: Charge leakage problem in CMOS TG
it follows (because $C_{store}$ is constant) that
$$C_{store}\frac{dV}{dt} = I_{Lp} - I_{Ln}. \qquad (3.15)$$
The solution of this equation is
$$V(t) = \frac{(I_{Lp} - I_{Ln})}{C_{store}}\,t + V(0) \qquad (3.16)$$
Figure 3.11: Charge leakage circuit
With $T_{max} = 2t_{max}$ (the longest allowed clock period), the minimum frequency follows as
$$f_{min} \approx \frac{1}{2t_{max}} \approx \frac{I_L}{2C_{store}\,\Delta V} \qquad (3.18)$$
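As a rough numerical illustration of the minimum clock frequency, the following sketch assumes a constant net leakage current and a constant storage capacitance, so the node voltage drifts linearly in time. The function names and the example values are assumptions made for this example.

```python
def max_hold_time(c_store, delta_v, i_leak):
    """Time until a constant leakage current has shifted the node by delta_v."""
    return c_store * delta_v / abs(i_leak)

def min_clock_frequency(c_store, delta_v, i_leak):
    """f_min = 1 / (2 * t_max), taking T_max = 2 * t_max as the longest clock period."""
    return 1.0 / (2.0 * max_hold_time(c_store, delta_v, i_leak))

# Hypothetical example: 50 fF storage node, 1 V tolerable drift, 1 pA net leakage
t_hold_example = max_hold_time(50e-15, 1.0, 1e-12)
f_min_example = min_clock_frequency(50e-15, 1.0, 1e-12)
```

With these example values the node holds its level for about 50 ms, so the circuit must be clocked at roughly 10 Hz or faster.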
Figure 3.12: Transmission gate capacitance
$$C_T \approx C_G + C_{line} + C_{ols} + C_{old} + C_{SBp}(V) + C_{DBn}(V). \qquad (3.19)$$
So the storage capacitance can be estimated by voltage averaging of this expression:
$$C_{store} \approx C_G + C_{line} + C_{ols} + C_{old} + K(0, V_{DD})\,[C_{SBp} + C_{DBn}] \qquad (3.20)$$
For a realistic analysis of the charge leakage problem, the dependence of the leakage currents on the reverse bias voltage has to be taken into consideration (see [25]).
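Figure 3.13: Basic charge sharing circuit
$t < 0$ (TG switched off):
$$V_1(t<0) = V_{DD} \qquad (3.21)$$
$$V_2(t<0) = 0 \qquad (3.22)$$
$$Q_T = C_1 V_{DD} \qquad (3.23)$$
$t > 0$ (TG switched on):
$$Q_T = (C_1 + C_2)V_f \qquad (3.24)$$
$$V_f = V_1(t>0) = V_2(t>0) = \frac{C_1}{C_1 + C_2}V_{DD} = \frac{1}{1 + (C_2/C_1)}V_{DD} \qquad (3.25)$$
For $N$ capacitors the total charge before and after switching is
$$Q_T = \sum_{i=1}^{N} C_i V_i(0) \qquad (3.26)$$
$$Q_T = \sum_{i=1}^{N} C_i V_f \qquad (3.27)$$
Final voltage:
$$V_f = \frac{\sum_{i=1}^{N} C_i V_i(0)}{\sum_{i=1}^{N} C_i} \qquad (3.28)$$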
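The final voltage after charge sharing is simply a capacitance-weighted average of the initial node voltages. A minimal sketch, with a function name and example values chosen only for illustration:

```python
def final_voltage(capacitances, initial_voltages):
    """Charge conservation: V_f = sum(C_i * V_i(0)) / sum(C_i)."""
    total_charge = sum(c * v for c, v in zip(capacitances, initial_voltages))
    return total_charge / sum(capacitances)

# Hypothetical two-node case: C1 = 30 fF precharged to 5 V, C2 = 10 fF at 0 V
v_f_example = final_voltage([30e-15, 10e-15], [5.0, 0.0])
```

In the two-node case the result equals $V_{DD}/(1 + C_2/C_1)$, so the storage node keeps a safe logic 1 only if $C_1$ is much larger than $C_2$.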
Charge Sharing
- Pull-up (pull-down) network of static CMOS is replaced by a single precharge (discharge) transistor. The remaining network then conditionally discharges (charges up) the output in a second operation phase
- One logic level is held by dynamic charge storage
- Transistor count is reduced from 2n (static CMOS) to n+2 for dynamic precharged CMOS (but now: 2 phases of operation)
3.4.1
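Precharge Phase
If $V_{in} = 0$ then
$$\tau_{ch} = \frac{C_{out}}{\beta_p (V_{DD} - |V_{Tp}|)} = R_p C_{out} \qquad (3.29)$$
Worst case ($V_{in} = V_{DD}$):
$$\tau_{ch,max} = R_p (C_{out} + C_n) \qquad (3.30)$$
$$t_{ch,max} = \tau_{ch,max}\left[\frac{2|V_{Tp}|}{(V_{DD} - |V_{Tp}|)} + \ln\left(\frac{2(V_{DD} - |V_{Tp}|)}{V_0} - 1\right)\right] \qquad (3.31)$$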
Dynamic Logic
Figure 3.16: Dynamic nMOS inverter: precharge and evaluate
Evaluation Phase
For the case that M1 is switched on, and with identically designed channel widths for M1 and Mn, the discharge time constant is given by
$$\tau_{dis} = \frac{(L_1 + L_n)\,C_{out}}{k_n W (V_{DD} - V_{Tn})} \qquad (3.32)$$
tdis = dis
(3.33)
$$f_{max} \approx \frac{1}{2t_M} \qquad (3.35)$$
3.4.2
3.4.3
- single phase clock
- input should change during precharge only
- input must be stable at the end of the precharge phase
- in the evaluation phase the output remains HIGH (LOW) or is optionally discharged (charged)
3.4.4 Complex Logic
3.4.5 Dynamic Cascades
pMOS blocks and nMOS blocks have to be arranged alternately in order to avoid glitches
Figure 3.23: Basic domino logic circuit
Domino Logic: design method for glitch-free cascading of nMOS logic blocks
- Each stage is driven by $\phi$
- Precharge during $\phi = 0$
- Evaluation when $\phi = 1$
- Domino logic blocks consist of a precharge/evaluation block and an output inverter
Precharge Phase: The gate output is precharged to logic 1 and the inverter output goes to logic 0. Logic transmission errors are avoided by providing a logic 0 at the inverter output (avoiding discharge of the next logic stage).
Evaluation Phase: Depending on the actual input values, the inverter output either stays at logic 0 or is set to logic 1. The correct result signal is available at the end of the domino cascade after all stages have stabilized.
- Domino logic consists of either n-type or p-type blocks
- small load capacity to be driven by logic (one inverter only), hence low dimension of transistors
- only one clock signal required
- only positive logic realizations possible; because of the input inverters domino logic is noninverting
- Functions as
$$F_1 = A \oplus B = \bar{A}B + A\bar{B}, \qquad F_2 = \overline{A \oplus B} = AB + \bar{A}\bar{B} \qquad (3.36)$$
Precharge
Assuming that all $A_i$ (coming from previous stages) are zero, the capacitance $C_X$ is charged, where
$$C_X = C_0 + C_T \qquad (3.37)$$
$$C_X \approx (C_{GDn1} + C_{BDn1}) + (C_{GDp1} + C_{BDp1}) + C_G + C_{line} \qquad (3.38)$$
Evaluate
If all inputs $A_i$ are set to logic 1, the worst case delay time can be estimated by
$$t_D \approx R_n C_n + (R_n + R_3)C_3 + (R_n + R_3 + R_2)C_2 + (R_n + R_3 + R_2 + R_1)C_1 + (R_n + R_3 + R_2 + R_1 + R_0)C_X \qquad (3.39)$$
with
$$R_j = \frac{1}{k_n (W/L)_j (V_{DD} - V_{Tn})} \qquad (3.40)$$
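The worst case delay estimate has the form of a sum over the nodes of the discharge chain, where each node capacitance is multiplied by the total resistance between that node and ground. The sketch below illustrates this; the function names, the ordering convention (ground side first) and the example values are assumptions made for this example.

```python
def on_resistance(k_n, w_over_l, v_dd, v_tn):
    """R_j = 1 / (k_n * (W/L)_j * (V_DD - V_Tn))."""
    return 1.0 / (k_n * w_over_l * (v_dd - v_tn))

def chain_delay(resistances, capacitances):
    """Sum of (resistance from ground up to node k) * (capacitance at node k).

    Both lists are ordered from the ground side upwards, e.g.
    [R_n, R_3, R_2, R_1, R_0] and [C_n, C_3, C_2, C_1, C_X].
    """
    delay = 0.0
    path_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        path_resistance += r
        delay += path_resistance * c
    return delay

# Hypothetical example: five identical 2.5 kOhm devices, 10 fF internal nodes, 50 fF at C_X
r_unit = on_resistance(50e-6, 2.0, 5.0, 1.0)
t_d_example = chain_delay([r_unit] * 5, [10e-15] * 4 + [50e-15])
```

Because the output capacitance $C_X$ sees the resistance of the whole chain, it usually dominates the delay, which is why long series chains are avoided in domino gates.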
Figure 3.33: Use of feedback to control a pull-up MOSFET for charge sharing problem
(NORA = NO RAce)
3.6.1
NORA Properties
- NORA is very insensitive to clock delay
- one clock signal and the inverted clock signal with short slopes (rise times) are sufficient
- no inverter is needed between the logic stages, because of alternate use of n-type and p-type blocks
- the last stage is a clocked inverter, a C$^2$MOS latch
3.6.2
Figure 3.34: Signal race problem
Fig. 3.34 illustrates the signal race problem: a signal race can arise when both transmission gates conduct at the same time. If the new input from TG1 reaches the input of TG2 while TG2 is still transmitting the output, the output information will be lost. Imperfect TG synchronization occurs because of normal transition intervals or clock skew.
NORA Logic
3.6.3 NORA Structuring
Memory Structures
Principle of CMOS Information Storage
3.7.2 Dynamic Flip-Flops: Pseudo 2-Phase Clocking
Figure 3.42: Pseudo 2-phase clocking (a) waveforms and simple latch, (b) clock skew, and (c) slow clock edges
3.7.3
3.7.4
Figure 3.46: Reduced transistor count latch better with high impedance sustainer transistor: (accurate simulation is required for correct function)
Figure 3.47: Reduced transistor count latch with high impedance sustainer transistor
3.7.5
Dynamic D-Latches
Figure 3.48: Dynamic D-Latches
Characteristic Equation:
$Q(t) = D(t)$ for $LD = 1$
$Q(t) = Q(t-1)$ for $LD = 0$
where
D(t) is the state of the data at time t
Q(t) is the state of the latch at time t
Q(t-1) is the state of the latch at time t-1
3.7.6
3.7.7
3.7.8
3.7.9
3.7.10
3.7.11
2-Phase D Flip-Flops layouts of Fig 3.53a, 3.54a and 3.54b
3.7.12
INPUTS            OUTPUT
CL   D   R   S    Q
X    X   1   0    0
X    X   0   1    1
X    X   1   1    NA
Chapter 4
Performance
4.1 Signal Delay
4.1.1 Resistance Estimation
The resistance of a uniform slab of conducting material may be expressed as
$$R = \frac{\rho}{t}\,\frac{l}{w}$$
where
$\rho$ = resistivity
$t$ = thickness
$l$ = conductor length
$w$ = conductor width
The expression may be rewritten as
$$R = R_s\,\frac{l}{w}$$
where $R_s$ is the sheet resistance, having units of $\Omega/\square$ (ohms per square). Thus, to obtain the resistance of a layer, one simply multiplies the sheet resistance $R_s$ by the ratio of the length to the width of the conductor. Note that for metal having a given thickness t, the resistivity is known, while for poly and diffusion the resistivities are significantly influenced by the concentration density of the impurities that have been introduced into the conducting regions during implantation. This means that the process parameters have to be known to estimate these quantities accurately.
Although the voltage-current characteristic of a MOS transistor is generally nonlinear, it is sometimes useful to approximate its behavior in terms of a channel resistance to estimate performance. The channel resistance may be expressed by
$$R_c = k\,\frac{L}{W}$$
with
$$k = \frac{1}{\mu\,\dfrac{\varepsilon_0 \varepsilon_r}{t_{ox}}\,(V_{gs} - V_t)}$$
Figure 4.1: Basic LOCOS MOSFET structure.
For both the n-channel and p-channel devices, k may take a value within the range 50,000 to 30,000 $\Omega/\square$. The equation for k as given above demonstrates the dependence of channel resistance on the surface mobility $\mu$ of the majority carriers. Since the mobility is also a function of temperature, the channel resistance, and therefore the switching time parameters as well as the power dissipation, change with temperature variations. The increase in channel resistance may be approximated by +0.25% per °C for an increase in temperature above 25 °C.
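The resistance estimates above reduce to counting squares. The following sketch is illustrative only: the function names and example values are assumptions, and the temperature correction applies the 0.25 % per °C figure linearly, which is itself an assumption about how the approximation is meant to be used.

```python
def wire_resistance(r_sheet, length, width):
    """R = R_s * (l / w); length and width in the same unit, r_sheet in ohms per square."""
    return r_sheet * length / width

def channel_resistance(k, length, width):
    """R_c = k * (L / W), with k in ohms per square."""
    return k * length / width

def channel_resistance_at(r_at_25c, temperature_c, coeff_per_c=0.0025):
    """Linear approximation: +0.25 % per degree Celsius above 25 degrees."""
    return r_at_25c * (1.0 + coeff_per_c * (temperature_c - 25.0))

# Hypothetical example: poly wire with 30 ohm/square, 100 um long and 2 um wide
r_poly_example = wire_resistance(30.0, 100.0, 2.0)
```

The 100 µm by 2 µm wire consists of 50 squares, giving 1.5 kΩ; a channel resistance of 1 kΩ at 25 °C would rise to about 1.25 kΩ at 125 °C under the linear approximation.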
4.1.2
Capacitance Estimation
The dynamic response of MOS systems is very much dependent on the parasitic capacitances associated with the MOS device and the interconnection capacitances that are formed by metal, poly, and diffusion wires, in concert with transistor and conductor resistances. The total load capacitance on the output of a MOS gate is the sum of:
- gate capacitance (of other inputs connected to the output of the gate)
- diffusion capacitance (of the drain regions connected to the output)
- routing capacitance (of connections between the output and other inputs).
Gate Capacitances
The large-signal MOSFET capacitance model that will be used to compute $C_{gate}$ is based on the self-aligned, poly gate LOCOS (local oxidation of silicon) structure depicted in Fig. 4.1.
Figure 4.2: MOSFET capacitor model.
Although the LOCOS MOSFET has been singled out for the analysis, the model developed here is generally applicable to any MOSFET regardless of the technology base. Figure 4.2a shows the basic lumped-element capacitances and their physical origins in terms of the device cross section. This particular model is chosen because it allows the capacitors to be divided into contributions that may be computed directly from the device and processing parameters.
1. The overlap capacitances $C_{ols}$ and $C_{old}$ are parasitic elements that originate from the basic fabrication steps. In the self-aligned process, the polysilicon gate is employed as a mask to define the n+ drain and source regions. Directly after this step, $L_s = L_d = 0$ and the effective channel length equals the drawn length $L$. The overlaps occur because the remaining steps require heating of the wafer, which gives rise to lateral diffusion of the n+ dopants. Typically, these overlap distances are less than a few tenths of a micron.
$$C_{ols} = C_{ox} W L_s, \qquad C_{old} = C_{ox} W L_d, \qquad \text{where } C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$
2. The gate-source capacitance $C_{gs}$ is really the gate-to-channel capacitance as seen between the gate and source; similarly, $C_{gd}$ represents the gate-drain capacitance when the channel is acting as a conductor to the drain n+ region. The voltage-dependent nature of the channel implies that these elements are nonlinear. $C_{gb}$ is the gate-bulk capacitance and consists of the gate capacitance in series with the depletion capacitance established by the p-type space charge region. Table 4.1 shows approximated values of these three capacitances in various states of the MOS transistor.

Table 4.1: Approximate gate capacitances in the different operating regions

Capacitance                     Off                        Linear                      Saturation
$C_{gb}$                        $\varepsilon A / t_{ox}$   0                           0
$C_{gs}$                        0                          $\varepsilon A / 2t_{ox}$   $2\varepsilon A / 3t_{ox}$
$C_{gd}$                        0                          $\varepsilon A / 2t_{ox}$   0
$C_{gb} + C_{gs} + C_{gd}$      $\varepsilon A / t_{ox}$   $\varepsilon A / t_{ox}$    $2\varepsilon A / 3t_{ox}$
Diffusion Capacitances
The two remaining capacitors in the model of Fig. 4.2a are $C_{sb}$ and $C_{db}$. These represent the voltage-dependent depletion capacitances that result from the pn junctions at the drain and source regions. The problem of determining these elements is aided by using the expanded drawing in Fig. 4.3. This shows an n+ well in a p-type bulk region and is representative of either a drain or a source; note that a p+ region surrounds the n+ sidewalls. The actual doping profile around the pn junction is generally quite complicated. A step doping will be assumed for simplicity. The total depletion capacitance $C_d$ can be represented by
$$C_d = C_{ja}(W Y_d) + C_{jp}(2W + 2Y_d)$$
where
Figure 4.3: Expanded view of an n+ drain or source region for computing depletion capacitances.
$C_{ja}$ = junction capacitance per $\mu m^2$
$C_{jp}$ = periphery capacitance per $\mu m$
$W$ = width of diffusion region
$Y_d$ = extent of diffusion region
Since the thickness of the depletion layer depends on the voltage across the junction, both $C_{ja}$ and $C_{jp}$ are functions of the junction voltage $V_j$. A general expression that describes the junction capacitance is
$$C_j = C_{j0}\left(1 - \frac{V_j}{\phi_B}\right)^{-m}$$
where $V_j$ is the junction voltage (negative for reverse bias), $C_{j0}$ the zero bias capacitance ($V_j = 0$), and $\phi_B$ the built-in junction potential ($\approx 0.6$ V). m is a constant which depends on the distribution of impurities near the junction and has a value of the order of 0.3 to 0.5.
Routing Capacitances
Routing capacitances between metal and poly layers and the substrate can be approximated using a parallel plate model ($C = \frac{\varepsilon}{t}A$), where A is the area of the plate capacitor, t is the insulator thickness, and $\varepsilon$ is the dielectric constant of the insulating material between the plates. The parallel-plate approximation, however, ignores fringing fields. The effect of fringing fields is to increase the effective area of the plates. Consequently, poly and metal lines will actually have a higher capacitance (up to twice as large) than that predicted by the model. Interlayer capacitance such as metal-poly capacitance is also enhanced by fringing. As line widths are scaled, the widths (w) and heights of wires tend to reduce less than their separations (l). Accordingly, this fringing effect increases in importance. For current processes, a factor of 1.5 to 3 should be used. Another factor which should be taken into account for small geometries when using the parallel plate model is that a drawn shape (on the mask) will not be the same as the actual physical shape produced on silicon.
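The diffusion and junction capacitance expressions can be evaluated as follows. This is a sketch under stated assumptions: the function names, the default values for the built-in potential and the grading constant, and all numerical examples are chosen for illustration and are not process data.

```python
def junction_capacitance(v_j, c_j0, phi_b=0.6, m=0.5):
    """C_j = C_j0 * (1 - V_j / phi_B)**(-m); v_j is negative for reverse bias."""
    return c_j0 * (1.0 - v_j / phi_b) ** (-m)

def diffusion_capacitance(width, y_d, c_ja, c_jp):
    """C_d = C_ja * (W * Y_d) + C_jp * (2W + 2Y_d): area term plus periphery term."""
    return c_ja * width * y_d + c_jp * (2.0 * width + 2.0 * y_d)

def routing_capacitance(eps, area, thickness, fringe_factor=1.0):
    """Parallel plate estimate (eps / t) * A, optionally scaled by a fringing factor."""
    return fringe_factor * eps * area / thickness
```

At a reverse bias of 1.8 V and with $m = 0.5$, the junction capacitance drops to half of its zero bias value, which shows why the diffusion contribution to a node capacitance is voltage dependent.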
4.1.3
RC-line model
The propagation of a signal along a wire depends on many factors, including the distributed resistance and capacitance of the wire, the impedance of the driving source, and the load impedance. For very long wires, propagation delays caused by the distributed resistance-capacitance (RC) in the wiring layer tend to dominate. This transmission line effect is particularly severe in poly wires because of the relatively high resistance of this layer. A long wire can be represented in terms of several RC sections, as shown in Fig. 4.4. The response at node $V_j$ with respect to time is then given by
$$C\frac{dV_j}{dt} = (I_{j-1} - I_j) = \frac{(V_{j-1} - V_j)}{R} - \frac{(V_j - V_{j+1})}{R}$$
As the number of sections in the network becomes large (and the sections become small), the above expression reduces to the differential form:
$$rc\,\frac{dV}{dt} = \frac{d^2V}{dx^2}$$
where
x = distance from input
r = resistance per unit length
c = capacitance per unit length
Solution of this differential form yields an approximate signal delay of:
$$t_l = \frac{rcl^2}{2}$$
where
r = resistance per unit length
c = capacitance per unit length
l = length of the wire
The $l^2$ term in the equation above shows that signal delay will be totally dominated by this RC effect for very long signal paths. In order to optimize speed in a long poly line, one possible strategy is to segment the line into several sections and insert buffers within these sections as shown in Fig. 4.5. A model for the distributed RC delay, which takes driver and receiver loading into account, is shown in Fig. 4.6. $R_s$ is the output resistance of the driver. $C_l$ is the receiver input capacitance. $R_t$ and $C_t$ are the total lumped resistance and capacitance of the line. $\tau$ is the RC delay calculated using the equation $\tau = rcl^2/2$. The concept of using RC time constants for delay estimations is based upon the assumption that the time taken for a signal to reach 63% of its final value approximates the switching point of an inverter.
Wire length design guide
For the purpose of timing analysis, an electrical node may be defined as that region of connected paths in which the delay associated with signal propagation is small in comparison with gate delays. For sufficiently small wire lengths, RC delays can be ignored. Wires can then be treated as one electrical node and modeled as simple capacitive loads. It is therefore useful to define simple electrical rules that can be used as a guide in determining the maximum length of communication paths for the various interconnect levels. To do this we require that the wire delay $\tau_w$ and the gate delay $\tau_g$ satisfy the following condition:
$$\tau_w \ll \tau_g$$
To fulfil this condition, the maximum length of the wire is given by:
$$l \ll \sqrt{\frac{2\tau_g}{rc}}$$
This establishes an upper bound on the allowable length of the interconnects where the above approximations are valid.
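The quadratic length dependence and the benefit of buffer insertion can be made concrete with a few lines of Python. The function names, the buffer delay parameter `t_buffer` and the numerical values are assumptions chosen for this sketch; the segmented delay simply adds the delays of equal wire sections and of the buffers between them.

```python
import math

def wire_delay(r, c, length):
    """Distributed RC delay t_l = r * c * l**2 / 2 (r, c per unit length)."""
    return r * c * length ** 2 / 2.0

def max_wire_length(r, c, tau_gate):
    """Length at which the wire delay equals the gate delay: sqrt(2 * tau_g / (r * c))."""
    return math.sqrt(2.0 * tau_gate / (r * c))

def segmented_delay(r, c, length, sections, t_buffer):
    """Delay of a line split into equal sections with a buffer between sections."""
    return sections * wire_delay(r, c, length / sections) + (sections - 1) * t_buffer
```

Splitting a line into two sections halves the pure wire delay, so buffer insertion pays off as soon as the buffer delay is smaller than the wire delay that is saved. Wires should be kept well below the length returned by `max_wire_length` if they are to be treated as a single electrical node.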
4.2
To have the same rise and fall times for an inverter, we must make $W_p = 2W_n$, where $W_p$ is the channel width of the p-device and $W_n$ is the channel width of the n-device. This, of course, increases layout area and dynamic power dissipation. In some cascaded structures it is possible to use minimum size devices without compromising the switching response. This is illustrated in the following analysis, in which the delay response for an inverter pair (Fig. 4.7a) with $W_p = 2W_n$ is given by
$$t_{inv.pair} \propto R\,(3C_{eq}) + R\,(3C_{eq}) = 6RC_{eq}$$
where R is the effective on resistance of a unit-sized n-transistor and $C_{eq} = C_g + C_d$ is the capacitance of a unit-size gate and drain region. The inverter pair delay with $W_p = W_n$ is
$$t_{inv.pair} \propto R\,(2C_{eq}) + 2R\,(2C_{eq}) = 6RC_{eq}$$
Thus we find that similar responses are obtained for the two different conditions.
4.3
Power Dissipation
There are two components that establish the amount of power dissipated in a CMOS circuit. These are: 1. Static dissipation due to leakage current.
2. Dynamic dissipation due to: (a) switching transient current (b) charging and discharging of load capacitances
4.3.1
Considering a complementary CMOS gate, if the input = 0, the associated n-device is OFF and the p-device is ON. The output voltage is $V_{DD}$ or logic 1. When the input = 1, the associated n-channel device is biased ON and the p-channel device is OFF. The output voltage is 0 V ($V_{SS}$). Note that one of the transistors is always OFF when the gate is in either of these logic states. Since no current flows into the gate terminal and there is no DC current path from $V_{DD}$ to $V_{SS}$, the static power $P_s$ is ideally zero. However, there is some small static dissipation due to reverse bias leakage between diffusion regions and the substrate. The source-drain diffusion and the p-well diffusion form parasitic diodes. Since the diodes are reverse biased, only their leakage current contributes to static power dissipation. The leakage current is described by the diode equation
$$i_0 = i_s\left(e^{qV/kT} - 1\right)$$
where
$i_s$ = reverse saturation current
$V$ = diode voltage
$q$ = electronic charge
$k$ = Boltzmann's constant
$T$ = temperature
The static power dissipation is the product of the device leakage current and the supply voltage. A useful estimate is to allow a leakage current of 0.1 nA to 0.5 nA per gate at room temperature. Then the total static power dissipation $P_s$ is obtained from
$$P_s = \sum_{1}^{n} \text{leakage current} \times \text{supply voltage}$$
For example, typical static power dissipation due to leakage for an inverter operating at 5 V is between 1 and 2 nW (nanowatts).
4.3.2
During transition from either 0 to 1 or, alternatively, from 1 to 0, both n- and p-transistors are on for a short period of time. This results in a short current pulse from $V_{DD}$ to $V_{SS}$. Current is also required to charge and discharge the output capacitive load. This latter term is generally the dominant term. The current pulse from $V_{DD}$ to $V_{SS}$ results in a short circuit dissipation which is dependent on the load capacitance and the gate design. This is of relevance to I/O buffer design. The dynamic dissipation can be modeled by assuming that the rise and fall time of the step input is much less than the repetition period. The average dynamic power $P_d$ dissipated during switching for a square-wave input $V_{in}$ having a repetition frequency of $f_p = 1/t_p$, as shown by Fig. 4.8, is given by
$$P_d = \frac{1}{t_p}\int_0^{t_p/2} i_n(t)\,V_o\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\,(V_{DD} - V_o)\,dt$$
where
$i_n$ = n-device transient current
$i_p$ = p-device transient current
For a step input with $i_n(t) = C_L\,dV_o/dt$ ($C_L$ = load capacitance)
$$P_d = \frac{C_L}{t_p}\int_0^{V_{DD}} V_o\,dV_o + \frac{C_L}{t_p}\int_{V_{DD}}^{0} (V_{DD} - V_o)\,d(V_{DD} - V_o) = \frac{C_L V_{DD}^2}{t_p}$$
with $f_p = \frac{1}{t_p}$, resulting in
$$P_d = C_L V_{DD}^2 f_p$$
Thus for the repetitive step input the average power that is dissipated is proportional to the energy required to charge and discharge the circuit capacitance. The important point to note is that the last equation shows the power to be proportional to the switching frequency but independent of the device parameters.
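Both power components can be estimated with simple arithmetic. The following sketch uses function names and example values that are assumptions for illustration; the static estimate just multiplies a per-gate leakage current by the number of gates and the supply voltage.

```python
def dynamic_power(c_load, v_dd, f_p):
    """P_d = C_L * V_DD**2 * f_p."""
    return c_load * v_dd ** 2 * f_p

def static_power(gate_count, leakage_per_gate, v_dd):
    """P_s as the sum of (leakage current * supply voltage) over all gates."""
    return gate_count * leakage_per_gate * v_dd

# Hypothetical example: 1 pF switched at 1 MHz from a 5 V supply
p_dyn_example = dynamic_power(1e-12, 5.0, 1e6)
# Hypothetical example: a single gate leaking 0.3 nA at 5 V
p_stat_example = static_power(1, 0.3e-9, 5.0)
```

The example gives 25 µW of dynamic power against 1.5 nW of static power, which illustrates why the dynamic term dominates in a switching CMOS circuit.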
4.3.3
The power delay product (PDP) is used to characterize the overall performance of a digital gate circuit. It is given by
$$PDP = P_{av}\,t_p$$
where $P_{av}$ is the average power dissipated by the gate and $t_p$ is the average propagation delay time. Typically, MOS-based digital gates display power-delay products on the order of a few picojoules (pJ). The PDP is commonly used to compare the performance of various logic families or processing technologies. A small PDP is desirable, as this implies both low power consumption and fast switching speeds.
As a first step towards understanding the meaning of the PDP, suppose that an ideal square wave $V_{in}(t)$ (Fig. 4.9a) is applied to the resistively loaded nMOS inverter shown in Fig. 4.10a; the output voltage $V_{out}(t)$ then assumes the form drawn in Fig. 4.10b. The average propagation delay is
$$t_p \approx \frac{1}{2}(R_{on} + R_L)\,C_{out}$$
with the approximations
$$t_{PHL} \approx \tau_D = R_{on}C_{out}, \qquad t_{PLH} \approx \tau_L = R_L C_{out}$$
where $R_{on}$ is the on-resistance of the driver; note that $R_{on} = R_{DS}$. The average power dissipated by the circuit is given by
$$P_{av} = I_{av}V_{DD}$$
$I_{av}$ is the average power supply current and is separated into two contributions: the constant (DC) current flow when the output is stable with $V_{out} = V_{OL}$, and the transient current that flows during the rise and fall times. Using Ohm's law, the average DC power dissipation during the period T is
$$P_{av} \approx \frac{V_{DD}^2}{2(R_{on} + R_L)}$$
The PDP that results from the constant DC current flow only is given by
$$(PDP)_{DC} \approx \frac{1}{4}C_{out}V_{DD}^2$$
Figure 4.10: Resistively loaded nMOS inverter: (a) circuit, (b) output voltage $V_{out}(t)$, (c) resistor analogy for $V_{out} = V_{OL}$
The total power-delay product for the circuit must also account for the average power consumed by the gate during the rise and fall time intervals. Consider first the charging current supplied by $V_{DD}$ during the rise time $t_{LH}$. Since the driver is in cutoff, this can be estimated by
$$I_{av} \approx C_{out}\frac{(\Delta V)}{(\Delta t)} = C_{out}\frac{V_l}{t_{LH}}$$
with $V_l = V_{DD}$ being the logic swing. The resulting PDP contribution due to this current is then
$$(PDP)_{LH} \approx C_{out}V_{DD}^2\,\frac{t_p}{t_{LH}}$$
The power supply current used by the inverter during the discharge time $t_{HL}$ is approximated by
$$I_{av} \approx \frac{1}{2}(I_{initial} + I_{final}) = \frac{1}{2}\left[\frac{(V_{DD} - V_{OH})}{R_L} + \frac{(V_{DD} - V_{OL})}{R_L}\right]$$
$I_{initial}$ and $I_{final}$ give the current at the beginning and end of the discharging event. Thus,
$$I_{av} \approx \frac{V_{DD}}{2R_L}$$
assuming $V_{OH} = V_{DD}$ and $V_{OL} \approx 0$. Now, noting that $t_{PHL} \approx \tau_D$, a first-order estimate for the discharge time $t_{HL}$ is $t_{HL} \approx 2\tau_D = 2R_{on}C_{out}$. Forming the power-delay product for this time interval gives the term
$$(PDP)_{HL} \approx C_{out}V_{DD}^2\,\frac{R_{on}}{R_L}\,\frac{t_p}{t_{HL}}$$
The complete expression for the PDP is obtained by summing all contributions:
$$PDP \approx C_{out}V_{DD}^2\left[\frac{1}{4} + \frac{t_p}{t_{LH}} + \frac{R_{on}}{R_L}\,\frac{t_p}{t_{HL}}\right]$$
This can be simplified by noting that $R_{on} \ll R_L$ will be valid in a well-designed inverter. The propagation delay time is then $t_p \approx (\tau_L/2)$. Using this in conjunction with the approximations $t_{LH} \approx 2\tau_L$ and $t_{HL} \approx 2\tau_D$ gives
$$PDP \approx \frac{3}{4}C_{out}V_{DD}^2$$
as the lowest-order approximation for the total PDP.
Figure 4.11: Input voltage waveform (a) and supply current waveforms $I(t)$
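The three contributions to the power-delay product of the resistively loaded inverter can be added up numerically to confirm the lowest-order result. The sketch below assumes $R_{on} \ll R_L$ and uses the first-order timing estimates $t_p \approx \tau_L/2$, $t_{LH} \approx 2\tau_L$ and $t_{HL} \approx 2\tau_D$; the function name and the example values are chosen for illustration only.

```python
def pdp_resistive_inverter(c_out, v_dd, r_on, r_l):
    """Sum of the DC, rise-time and fall-time PDP contributions (first order)."""
    tau_l = r_l * c_out           # charging time constant through the load
    tau_d = r_on * c_out          # discharging time constant through the driver
    t_p = tau_l / 2.0             # average propagation delay for R_on << R_L
    t_lh = 2.0 * tau_l            # rise time estimate
    t_hl = 2.0 * tau_d            # fall time estimate
    dc_term = 0.25
    rise_term = t_p / t_lh
    fall_term = (r_on / r_l) * (t_p / t_hl)
    return c_out * v_dd ** 2 * (dc_term + rise_term + fall_term)

# Hypothetical example: 100 fF load, 5 V supply, R_on = 2 kOhm, R_L = 40 kOhm
pdp_example = pdp_resistive_inverter(100e-15, 5.0, 2e3, 40e3)
```

Each of the three terms evaluates to one quarter of $C_{out}V_{DD}^2$, so the example returns about 1.9 pJ, in line with the statement that MOS gates show power-delay products of a few picojoules.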
The power-delay product for the CMOS inverter is computed by using the current waveform in Fig. 4.11c. Since current flows only during a switching event, the average power supply current required during a single logic cycle T can be written as
$$I_{av} = \frac{1}{T}\left[I_{Dn,LH}\,t_{LH} + I_{Dn,HL}\,t_{HL}\right]$$
In this equation $I_{Dn,LH}$ gives the average current during the rise time, while $I_{Dn,HL}$ is the average fall time current. For a completely symmetric CMOS inverter, the two currents are the same, so the power-delay product is given by
$$PDP_{CMOS} = I_{Dn,av}\,V_{DD}\,t_p\,\frac{f}{f_{max}}$$
4.4
Scaling
Very large-scale integration (VLSI) requires dense circuit layouts on silicon. The level of integration depends on the smallest feature size permitted by the fabrication processes. To obtain the highest packing density, the size of the transistors must be made as small as possible. This, however, changes the internal operating physics of the MOSFETs. Phenomena that are negligible in large devices become limiting factors as device geometries are reduced. This section discusses some of the important aspects involved in describing small MOSFETs. The level is introductory, with emphasis on parameters that affect circuit design. The model we use is simple first-order constant field scaling.
4.4.1
Scaling principles
First-order MOS scaling theory indicates that the characteristics of an MOS device can be maintained and the basic operational characteristics preserved if the critical parameters of a device are scaled in accordance to a given criterion. Such an approach has shown to be very eective in scaling from the range 5m to 10m minimum features to the range 1m to 3m minimum feature size. Although rst-order scaling does not give optimized device performance at small dimensions, the technique is very powerful in providing the necessary guidelines to identify the improvements (or otherwise) that can be expected as processes are scaled. Basically the scaled device is obtained by applying a dimensionless factor to all dimensions, including those vertical to the surface device voltages the concentration densities. The resultant eect of the rst-order scaling process is illustrated in Table 4.2. Table 4.2 shows that if device dimensions (which include channel length L, channel width W , oxide thickness Tox , junction depth Xj , applied voltages, and substrate concentration density N ) are scaled by the constant parameter , then the depletion thickness d, the threshold voltage
Table 4.2

PARAMETER                                    SCALING FACTOR

DEVICE PARAMETER
Length; L                                    $1/\alpha$
Width; W                                     $1/\alpha$
Gate oxide thickness; tox                    $1/\alpha$
Junction depth; Xj                           $1/\alpha$
Substrate doping; Na or Nd                   $\alpha$
Supply voltage; VDD                          $1/\alpha$

RESULTANT INFLUENCE
Electric field across gate oxide; E          $1$
Depletion layer thickness; d                 $1/\alpha$
Parasitic capacitance; W L/tox               $1/\alpha$
Gate delay; (V C/I)                          $1/\alpha$
DC power dissipation; Ps                     $1/\alpha^2$
Dynamic power dissipation; Pd                $1/\alpha^2$
Power speed product                          $1/\alpha^3$
Gate area                                    $1/\alpha^2$
Power density; (V I/A)                       $1$
Current density; (I/A)                       $\alpha$
Transconductance; gm                         $1$
$V_t$, and drain-to-source current $I_{ds}$ are also scaled. One of the important factors to be noted is that since the voltage is scaled, the electric field $E$ in the device remains constant. This has the desirable effect that many nonlinear factors essentially remain unaffected. A further point is that reduction in oxide thickness would require the fabrication process to provide thinner oxides with a yield comparable to that of conventional oxide thicknesses.
The depletion regions associated with the pn junctions of the source and drain determine how small we can make the channel. As a rule, the source-drain distance must be greater than the sum of the widths of the depletion layers to ensure that the gate is able to exercise control over the conductance of the channel. Thus in order to reduce the length of the channel one needs to reduce the width of the depletion layers. This is accomplished by increasing the doping level of the substrate silicon.
As we scale device dimensions by $1/\alpha$, the drain-to-source current $I_{ds}$ per transistor is reduced by a factor of $\alpha$, while the number of transistors per unit area (that is, the circuit density) scales up by $\alpha^2$. As a result, the current density scales linearly with $\alpha$. Thus wider metal conductors will be necessary for densely packed structures.
A second characteristic illustrated in Table 4.2 is power density. Both the static power dissipation $P_s$ and the frequency-dependent dissipation $P_d$ decrease by $1/\alpha^2$ as the result of scaling. However, since the number of devices per unit area increases by $\alpha^2$, the resultant effect is that the power density remains constant. An estimation of the limit in power density is derived from the thermodynamic relationship given by
$$T_j = T_{amb} + \theta_{jA} \cdot P$$
where
$T_j$ = temperature of silicon chip
$T_{amb}$ = ambient temperature
$\theta_{jA}$ = thermal resistance of the package
$P$ = power dissipation.
Generally the thermal resistance $\theta_{jA}$ is expressed in °C per watt, which means that one watt of dissipated power will raise the temperature by $\theta_{jA}$ °C. As the temperature increases, the carrier mobility falls, thus reducing the gain of devices. This, in turn, would reduce the speed of circuits. If high-temperature, high-speed circuits are required, then special consideration during design is necessary.
One of the limitations of first-order scaling is that it gives the wrong impression of being able to scale proportionally to zero dimension, or to zero threshold voltage. In reality, both theoretical and practical considerations do not permit such behavior. This is highlighted when the surface concentrations become larger than $1 \times 10^{19}\ \text{cm}^{-3}$, above which the gate oxide breaks down before surface inversion can take place for the formation of the channel.
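As a rough illustration of Table 4.2 and of the junction-temperature relation, the following Python sketch applies the first-order scaling exponents to a few device quantities. The function names, the dictionary keys, and the numerical values (a 3 µm device scaled by $\alpha = 2$, a 40 °C/W package) are assumptions chosen only for illustration and do not describe a particular process.

```python
# Exponent n for each row of Table 4.2: the quantity is multiplied by alpha**n.
SCALING_EXPONENTS = {
    "length_L": -1,
    "width_W": -1,
    "oxide_thickness_tox": -1,
    "junction_depth_Xj": -1,
    "substrate_doping_N": +1,
    "supply_voltage_VDD": -1,
    "oxide_field_E": 0,
    "depletion_thickness_d": -1,
    "parasitic_capacitance": -1,
    "gate_delay": -1,
    "dc_power_Ps": -2,
    "dynamic_power_Pd": -2,
    "power_speed_product": -3,
    "gate_area": -2,
    "power_density": 0,
    "current_density": +1,
    "transconductance_gm": 0,
}


def scale_device(values, alpha):
    """Apply constant-field scaling to a dict of named quantities."""
    return {name: v * alpha ** SCALING_EXPONENTS[name] for name, v in values.items()}


def junction_temperature(t_amb_c, theta_ja_c_per_w, power_w):
    """T_j = T_amb + theta_jA * P, with temperatures in degrees Celsius."""
    return t_amb_c + theta_ja_c_per_w * power_w


# Hypothetical example: 3 um channel, 5 V supply, normalized current density.
original = {"length_L": 3.0, "supply_voltage_VDD": 5.0, "current_density": 1.0}
print(scale_device(original, alpha=2.0))
print(junction_temperature(t_amb_c=25.0, theta_ja_c_per_w=40.0, power_w=0.5))
```

The power-density row follows from two other rows: power per gate scales as $1/\alpha^2$ and so does the gate area, so their ratio is unchanged.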
4.4.2
Although constant-field (first-order) scaling gives a number of improvements, there are a number of circuit parameters such as voltage drop, line propagation delay, current density, and contact resistance that exhibit significant degradation with scaling. For example, scaling the thickness and width of a conductor by $\alpha$ reduces the cross-sectional area by $\alpha^2$. The scaled line resistance $R'$ is given by
$$R' = \rho \frac{L/\alpha}{(t/\alpha)(W/\alpha)} = \alpha R$$
where $\rho$ is the resistivity of the conductor and $t$ is the conductor thickness. The voltage drop along such a line can now be expressed as
$$V_d = (I/\alpha)(\alpha R) = IR$$
which is a constant. However, for constant chip size, the length of some of the signal paths that traverse the chip does not, as a rule, scale down. This gives the principal result that voltage drops along communication paths are larger by a factor of $\alpha$ with respect to the scaled voltages. In a similar manner, we can derive the line response time as
$$\tau_s = (\alpha R)(C/\alpha) = RC$$
which is a constant. However, as before, for a constant chip size many of the communication paths do not scale. Thus the line response time normalized to the scaled gate response is larger by a factor of $\alpha$. The significance of this result is that it is somewhat difficult to take full advantage of the higher switching speeds inherent in scaled devices when signals are required to propagate over long paths. Thus the distribution and organization of clocking signals becomes a major problem as geometries are scaled. The influence of scaling on interconnection paths is summarized in Table 4.3. As seen from Table 4.3, metal lines must carry a higher current with respect to cross-sectional area; thus
electromigration becomes a major factor to consider. The second problem relates to an increase in the capacitance of wiring. As the level of integration increases, the average line length on a chip tends to increase also. However, the power dissipation per gate decreases, which diminishes the ability of gates to drive wiring capacitances. Under such conditions, the average gate delay is determined by the interconnection rather than by the gate itself. Many of these limitations are being overcome by scaling lateral dimensions while keeping vertical dimensions approximately constant.
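The interconnect relations above can be checked with a short Python sketch. It only encodes the first-order rules stated in the text (line resistance grows by $\alpha$, current and capacitance shrink by $1/\alpha$, supply voltage and gate delay shrink by $1/\alpha$). The function name and the numbers in the example are illustrative assumptions.

```python
def scaled_interconnect(r_line, c_line, current, v_supply, t_gate, alpha):
    """First-order scaling of a wire whose length scales with the devices."""
    r_s = alpha * r_line          # R' = alpha * R
    i_s = current / alpha         # current per line scales by 1/alpha
    c_s = c_line / alpha          # line capacitance scales by 1/alpha
    v_s = v_supply / alpha        # supply voltage scales by 1/alpha
    t_gate_s = t_gate / alpha     # gate delay scales by 1/alpha
    v_drop = i_s * r_s            # equals I * R (constant)
    tau = r_s * c_s               # equals R * C (constant)
    return {
        "line_resistance": r_s,
        "voltage_drop": v_drop,
        "line_response": tau,
        "normalized_voltage_drop": v_drop / v_s,
        "normalized_line_response": tau / t_gate_s,
    }


# Hypothetical line: 100 ohm, 1 pF, 1 mA, 5 V supply, 1 ns gate delay.
before = scaled_interconnect(100.0, 1e-12, 1e-3, 5.0, 1e-9, alpha=1.0)
after = scaled_interconnect(100.0, 1e-12, 1e-3, 5.0, 1e-9, alpha=2.0)
for key in before:
    print(key, before[key], "->", after[key])
```

The absolute voltage drop and line response stay the same, while both normalized quantities grow by the factor $\alpha$, which is the degradation the text describes.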
4.5
4.5.1
One of the most important issues in chip planning is the routing of power. In technologies in which there is only one level of metal, $V_{DD}$ and ground are routed in interdigitated trees. This is illustrated in Fig. 4.12. Crossunders are very difficult. When necessary, these are done in low resistance interconnect (poly over buried contact over active area) with a multiplicity of contact cuts. Consider the extreme case of a crossunder that must carry 100 mA. One square of low resistance interconnect might have a maximum resistance of, say, 10 Ω per square. Thus a square crossunder would drop 1 volt. More than fifty 2 µm contact cuts to the metal on each side would be needed because of metal migration limits. Obviously, 100 mA is an awful lot of current to squeeze through a crossunder. Even 10 mA can be difficult, and 10 mA corresponds only to about twenty nMOS inverters.
Power is usually distributed locally in diffusion since it must get to the sources and drains anyway. For low-power gates, this local power distribution is not too bad, but for high performance devices, great care must be taken. When two levels of metal are available the general power distribution is much easier, though by no means trivial.
Clearly, one of the worst scenarios for power supply noise is when large segments of the chip transition simultaneously. One strategy, therefore, is to distribute power in such a way that parts of the chip that are likely to transition all at once are routed separately. If power is distributed across these simultaneously switching segments, we would expect large surges on the power lines, but if power is distributed along the signal lines, then surge currents should be much smaller.
A major problem of high performance chips is bringing power onto the chip. Bonding wires
can have anywhere from 0.25 to 2 nH of inductance (about 0.5 to 1 nH/mm). $V_{DD}$ and ground are often double-bonded (two wires to the bonding pad), but while this lowers the inductance somewhat, it does not give the expected factor of two unless the wires are kept far apart. This is because there is mutual coupling between the wires. Separate power pins might be used for the output drivers, since these drivers cause huge switching transients and can tolerate more power supply noise than the internal circuitry.
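Two of the numbers in this subsection can be reproduced with a minimal Python sketch: the IR drop across a crossunder, and the effective inductance of a double-bonded supply pin. The second function uses the standard result for two identical, magnetically coupled inductors in parallel, $L_{eff} = (L + M)/2$ with $M = kL$. The function names and the coupling coefficient $k = 0.5$ are assumptions for illustration, not values taken from the text.

```python
def crossunder_drop(current_a, sheet_res_ohm_per_sq, squares=1.0):
    """Voltage drop (V) across a crossunder of the given number of squares."""
    return current_a * sheet_res_ohm_per_sq * squares


def double_bond_inductance(l_wire_h, coupling_k):
    """Effective inductance of two identical parallel bond wires.

    coupling_k is the (assumed) coupling coefficient, M = k * L.
    With k = 0 the wires are far apart and the result is L / 2.
    """
    mutual = coupling_k * l_wire_h
    return (l_wire_h + mutual) / 2.0


print(crossunder_drop(0.100, 10.0))          # 100 mA through one 10 ohm square
print(double_bond_inductance(1e-9, 0.0))     # widely separated wires
print(double_bond_inductance(1e-9, 0.5))     # closely spaced wires (assumed k)
```

With any nonzero coupling the double bond yields more than half of the single-wire inductance, which is why the expected factor of two is not reached.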
4.5.2 Clock distribution
Synchronizing machine operations and data transfers with clock pulses provides us with a structured framework for dealing with the complexities of large system designs. Clocking is a global control technique which provides the glue for system operation. It is equally important at the circuit level, particularly in a dynamic logic stage.
System level timing can be described using circular timing charts. Consider an ideal pseudo 2-phase scheme with mutually-exclusive pulses $\phi_1$ and $\phi_2$:
$$\phi_1(t) \cdot \phi_2(t) = 0$$
System timing can be described by constructing the chart shown in Fig. 4.13. Time increases in a counter-clockwise direction with one full rotation corresponding to the clock period $T$. Segments are labeled according to time intervals when a clock signal is high. In this example, $\phi_1 = 1$ during the first half-period, while $\phi_2 = 1$ during the last half-period.
Figure 4.13: Pseudo 2-Phase Clocking Chart
A more realistic clocking arrangement is depicted by the clocking circle in Fig. 4.14. If both clocks have 50% duty cycles, normal operation gives
$$\phi_1(t) \cdot \phi_2(t) = 0$$
except during the transition times. Mutually-exclusive clock signals provide timing intervals for logical operations, and are used to allow for normal gate delay times. Overlapped segments are avoided to prevent ill-defined movement of data, instructions, or control signals. Transition times can be made small by proper clock generator design.
Figure 4.14: Pseudo 2-Phase Overlap Times
Clock skew is represented by rotating one of the clocks as shown in Fig. 4.15. The skew time $t_s$ is defined as the time interval where
$$\phi_1(t) \cdot \phi_2(t) = 1$$
and indicates the possibility of unwanted simultaneous bit transfers. This may lead to severe conflict problems in the operation.
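The definition of the skew time as the interval in which $\phi_1(t) \cdot \phi_2(t) = 1$ can be sketched in Python by counting the time steps in one period during which both clocks are high. The integer time grid (for example picoseconds), the function names, and the example numbers are assumptions made for this illustration.

```python
def is_high(t, period, start, width):
    """True if a clock that rises at `start` and stays high for `width` is high at t."""
    return (t - start) % period < width


def skew_time(period, start1, width1, start2, width2):
    """Number of integer time steps per period in which phi1 * phi2 = 1."""
    return sum(
        1
        for t in range(period)
        if is_high(t, period, start1, width1) and is_high(t, period, start2, width2)
    )


# Ideal pseudo 2-phase clocks: phi1 high in the first half, phi2 in the second.
print(skew_time(period=1000, start1=0, width1=500, start2=500, width2=500))
# phi2 rotated early by 50 time steps: the two phases now overlap.
print(skew_time(period=1000, start1=0, width1=500, start2=450, width2=500))
```

Rotating one clock on the timing circle corresponds to changing its `start` value; the overlap returned is the skew time in the same units as the grid.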
4.5.4
A basic 2-phase clock generator circuit is designed to generate $\phi$ and $\bar{\phi}$ from a single input CLK signal. This is often a matter of convenience to the user: requiring only a single external clock makes the chip's usage more attractive to the board designer. Various circuits have been developed for use in clock generation. Fig. 4.16 provides a CMOS generator/driver which uses a transmission gate as a delay element. MOSFETs $M_{n1}$ and $M_{p1}$ form an inverter which acts as the first driver for the chain. The upper branch of the circuit consists of two cascaded inverters and generates the signal $\phi = CLK$, while the lower branch only has a single inverter and gives $\bar{\phi} = \overline{CLK}$. Transmission gate $TG$ is used as a delay element to minimize clock skew between $\phi$ and $\bar{\phi}$. Since it is biased into active conduction, we will model it using an equivalent resistance $R_{TG}$, and introduce the time constant
$$t_D \approx R_{TG} C_{in}$$
which equalizes the delay between the upper and lower branches. Recalling that the transmission gate conductance can be approximated by
$$G_{TG} \approx \beta_n (V_{DD} - V_{Tn}) + \beta_p (V_{DD} - |V_{Tp}|)$$
we see that clocking skew can be controlled by adjusting the size of the TG transistors.
Another straightforward approach uses an SR latch as shown in Fig. 4.17. The clocking signal CLK is inverted, and CLK and $\overline{CLK}$ are used to drive the SR circuit. The 2-phase clock signals $\phi$ and $\bar{\phi}$ are taken from the latch outputs. This logic can also be used to generate pseudo 2-phase clocks $\phi_1$ and $\phi_2$ by redefining the outputs.
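To see how the transmission gate sizing enters the delay, the two approximations for the transmission gate ($t_D \approx R_{TG} C_{in}$ with $R_{TG} = 1/G_{TG}$) can be combined in a small Python sketch. The function names and the device values in the example are hypothetical and serve only to show the trend.

```python
def tg_conductance(beta_n, beta_p, vdd, vtn, vtp):
    """G_TG ~ beta_n (VDD - VTn) + beta_p (VDD - |VTp|), in siemens."""
    return beta_n * (vdd - vtn) + beta_p * (vdd - abs(vtp))


def tg_delay(beta_n, beta_p, vdd, vtn, vtp, c_in):
    """t_D ~ R_TG * C_in, using R_TG = 1 / G_TG."""
    return c_in / tg_conductance(beta_n, beta_p, vdd, vtn, vtp)


# Assumed values: beta = 100 uA/V^2 for both devices, VDD = 5 V, |VT| = 1 V, C_in = 50 fF.
base = tg_delay(100e-6, 100e-6, 5.0, 1.0, -1.0, 50e-15)
wide = tg_delay(200e-6, 200e-6, 5.0, 1.0, -1.0, 50e-15)
print(base, wide)
```

Doubling both conduction factors (wider TG transistors) halves the modeled delay, which is the adjustment knob the text refers to.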
4.5.5
Once the clocking pulses are generated they must be distributed throughout the chip in a manner which minimizes clock skew. Fig. 4.18 illustrates the problem in a pseudo 2-phase circuit by showing timing circles at various points on a chip. Skew problems originate mostly from
- Unbalanced loads at the driver,
- Unequal RC line delays,
so that the driver circuits and associated distribution schemes are important in maintaining the synchronous logic design. A related problem is that the drive capability of the circuit must be able to handle large capacitive loads at the required clock frequency.
Figure 4.18: Clock Skew Due to Distribution
One approach to designing a clock distribution network is to use a cascaded chain of inverting buffers that matches the clock generator to the distribution line. Careful global planning and structured distribution patterns can also be used to solve the problem.
Clock distribution can also be accomplished by using a balanced tree network with multiple fanouts as shown in Fig. 4.19. Identical drivers can be used within a given stage. Moreover, the drive requirements of the output circuits are reduced from the single inverter design since the fan-out (FO) has been split into groups. Each inverter reshapes the clocking waveform, making the performance less sensitive to variations in the interconnect routing.
Clock skew problems can be minimized by using symmetrical geometries for the clock distribution lines. An example is the H-tree network shown in Fig. 4.20. Every clock distribution point O is the same distance from the driver D, giving equal delay times. If the load capacitance is the same at every O-point, then the clocks will all be in phase with one another. Other geometrical patterns can be used so long as the general design criteria are unchanged.
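The equal-distance property of the H-tree can be illustrated with a short recursive Python sketch. It builds the leaf points of an idealized H-tree and records the wire length from the driver to each leaf. The geometry (each level halves the arm length), the function name, and the dimensions are assumptions for illustration; a real layout would also have to balance loads and wire widths.

```python
def h_tree_leaves(x, y, half, levels, path=0.0):
    """Return (x, y, wire_length_from_driver) for every leaf of an H-tree.

    From the centre (x, y) the wire runs `half` horizontally and then
    `half` vertically to reach each of the four corners of the H.
    """
    if levels == 0:
        return [(x, y, path)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(
                x + dx, y + dy, half / 2, levels - 1, path + abs(dx) + abs(dy)
            )
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half=8.0, levels=3)
lengths = {length for _, _, length in leaves}
print(len(leaves), lengths)
```

All 64 leaves of this three-level tree report the same wire length, so with equal loads the RC delay from the driver D to every O-point is the same.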
4.6
Input pads connect data, control, or clocking signals to on-chip logic gates. When the pads are directly connected to the gate electrodes of MOSFETs, care must be taken to ensure that excessive static electrical charge does not destroy the transistor. Protection circuits are designed to drain excessive charge away from the MOS capacitance to avoid static burnout. To understand the origin of the problem, recall that a MOSFET gate is basically a capacitor of value
$$C_g = C_{ox} W L$$
With a gate-substrate voltage $V_G$ applied to the transistor, the internal oxide electric field is
$$E_{ox} = \frac{V_G}{x_{ox}}$$
where we have ignored any trapped oxide or surface charge. Breakdown occurs because silicon dioxide has a breakdown field value of approximately
$$E_{BD} \approx 7.5 \times 10^{6}\ \text{V/cm}$$
If $E_{ox}$ exceeds this value, the oxide insulating properties break down and charge is transported
through the material. This usually results in destruction of the device. Since $x_{ox}$ is usually less than about 450 Å, the maximum gate voltage
$$V_{G,max} \approx E_{BD}\, x_{ox}$$
which can be applied to the device is a relatively small number.
The basic idea of an input protection circuit is to allow for alternate charge flow paths when the input voltage gets too large. Diode structures are very useful in this application since they have relatively low breakdown voltages which can be controlled. Moreover, reverse breakdown in a pn-junction is non-destructive, so that the protection circuit is reusable. Junctions which are purposely used at the reverse-bias breakdown voltage are generally termed Zener diodes.
Fig. 4.21 illustrates a simple input protection circuit for a CMOS IC. Reverse-biased pn-junctions are used as protection diodes, and a series-connected resistor is included to drop some of the voltage. Both diode pairs $(D_1, D_2)$ and $(D_3, D_4)$ are designed to undergo breakdown for positive or negative voltage surges. $R$ is designed to reduce the voltage that reaches $(D_3, D_4)$; this effectively increases the level of protection to the transistor gate.
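The oxide breakdown limit quoted earlier in this section can be evaluated directly. The Python sketch below computes $V_{G,max} \approx E_{BD}\, x_{ox}$ for the 450 Å upper bound mentioned in the text and for a thinner oxide; the function name and the 200 Å value are assumptions added for comparison.

```python
E_BD_V_PER_CM = 7.5e6  # approximate breakdown field of silicon dioxide


def max_gate_voltage(x_ox_angstrom, e_bd_v_per_cm=E_BD_V_PER_CM):
    """V_G,max ~ E_BD * x_ox, with the oxide thickness given in angstroms."""
    x_ox_cm = x_ox_angstrom * 1e-8  # 1 angstrom = 1e-8 cm
    return e_bd_v_per_cm * x_ox_cm


print(max_gate_voltage(450.0))  # about 33.75 V for a 450 angstrom oxide
print(max_gate_voltage(200.0))  # about 15 V for an assumed 200 angstrom oxide
```

Static charge on an unprotected pad can produce voltages far above these few tens of volts, which is why an alternate discharge path is needed.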
Figure 4.21: Input Protection Circuit
One problem that exists with this input protection circuit is the introduction of parasitic RC time constants into the network. Other input protection schemes are also used. Fig. 4.22 shows a common circuit based on the properties of a thick field oxide MOSFET. The transistor has a threshold voltage of $V_{T,F} > V_{DD}$ and is in cutoff during normal operation. A large input voltage $V > V_{T,F}$ drives the transistor into conduction, providing a path to ground to drain off the excessive charge. The breakdown voltage of the FOX MOSFET is large enough to withstand the high voltages since $X_{FOX}$ is large.
4.7
An interesting and useful problem is that of optimizing a chain of static gates to minimize the overall propagation delay. This type of problem arises in many different situations and is important to high-performance circuits. In particular, it is relevant to the output drivers and clocking circuits.
Figure 4.22: Thin Oxide MOSFET Protection Circuit
A classic example is shown in Fig. 4.23 where the objective is to design the fastest network for driving a large capacitance. For the problem at hand, we will assume a series of inverting buffers for the driving network. At first sight, it may appear that we would want the fewest possible gates between the input and the load. This simple solution, however, ignores the effect of capacitive loading on successive stages. Accounting for these factors shows that the sizing of the transistors in the chain allows for minimization of the delay. This gives the interesting result that additional logic gates are often inserted to reduce the overall propagation delay between two points.
Figure 4.23: Capacitive Loading Problem
Consider the scaled inverter chain shown in Fig. 4.24. Each gate is characterized by a sizing factor $S_j$ which is normalized to the first stage such that $S_1 = 1$, while $S_j > 1$ for $(j > 1)$. By definition, the first stage has a MOSFET conduction factor
$$\beta_1 = k' \left(\frac{W}{L}\right)$$
while the $j$-th stage is described by
$$\beta_j = S_j \beta_1$$
The values of $C_i$ and $C_o$ are determined by gate 1, and scaled for successive gates. Note that an additional capacitive component $C_w$ has been added between stages. This represents
the wiring contribution. We assume that the wiring capacitance between two stages is proportional to the sizing factor of the second stage. The capacitance between the $j$-th gate and the $(j+1)$-st gate can be summarized as follows:
- $S_j C_o$, output capacitance from gate $j$
- $S_{j+1} C_i$, input capacitance to gate $(j+1)$
- $S_{j+1} C_w$, wiring capacitance into gate $(j+1)$.
The time delay through gate $j$ is thus estimated by
$$t_{D,j} = \frac{R}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right]$$
Our calculation is to determine the values of $S_j$ for $(j = 2, \ldots)$ that minimize the total delay through the chain.
Figure 4.24: Inverter Sizing Problem
Suppose that there are $N$ stages in the chain. The total time delay is given by
$$T_D = \sum_{j=1}^{N} \frac{R}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right]$$
To minimize $T_D$, we differentiate with respect to $S_j$ and look for zero slope points via
$$\frac{\partial T_D}{\partial S_j} = 0;$$
this results in the recursion relation
$$\frac{S_{j+1}}{S_j} = \frac{S_j}{S_{j-1}}$$
for $j = 2, 3, \ldots, N$. If this is to hold for arbitrary values of $j$, then
$$\frac{S_{j+1}}{S_j} = K = \text{constant}$$
must be true. Now then, the boundary conditions of the problem are
$$S_1 = 1, \qquad S_{N+1} = \frac{C_L}{C_i}$$
at the ends of the chain. Forming the product
$$\frac{S_2}{S_1} \cdot \frac{S_3}{S_2} \cdot \frac{S_4}{S_3} \cdots \frac{S_{N+1}}{S_N} = K^N$$
and using the boundary conditions gives
$$K^N = \frac{C_L}{C_i}$$
Thus, we obtain the scaling ratio in the form
$$K = \left(\frac{C_L}{C_i}\right)^{1/N}$$
which is our final result. Explicitly, the scaling factors are given by
$$S_1 = 1, \quad S_2 = K, \quad S_3 = K^2, \quad \ldots, \quad S_N = K^{N-1}$$
as the scaling required to optimize the chain. The minimum delay is then
$$T_{D,min} = \sum_{j=1}^{N} R\left[C_o + K(C_i + C_w)\right] = N R\left[C_o + K(C_i + C_w)\right]$$
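The sizing result above can be checked numerically. The following Python sketch computes $K$, the sizing factors $S_1 \ldots S_N$, and the minimum delay, and compares that delay with a direct evaluation of the stage-by-stage sum. The function names and all component values are assumptions chosen for illustration.

```python
def optimal_chain(c_load, c_in, c_out, c_wire, r, n_stages):
    """Return (K, [S_1..S_N], T_D,min) for an N-stage scaled inverter chain."""
    k = (c_load / c_in) ** (1.0 / n_stages)
    sizes = [k ** j for j in range(n_stages)]          # S_j = K**(j-1)
    t_min = n_stages * r * (c_out + k * (c_in + c_wire))
    return k, sizes, t_min


def chain_delay(sizes, c_load, c_in, c_out, c_wire, r):
    """Sum t_D,j = (R/S_j)[S_j C_o + S_{j+1}(C_i + C_w)] with S_{N+1} = C_L/C_i."""
    s = list(sizes) + [c_load / c_in]
    return sum(
        r / s[j] * (s[j] * c_out + s[j + 1] * (c_in + c_wire))
        for j in range(len(sizes))
    )


# Assumed values: C_L = 10 pF, C_i = 10 fF, C_o = 10 fF, C_w = 2 fF, R = 5 kohm, N = 3.
args = dict(c_load=10e-12, c_in=10e-15, c_out=10e-15, c_wire=2e-15, r=5e3)
k, sizes, t_min = optimal_chain(n_stages=3, **args)
print(k, sizes, t_min)
print(chain_delay(sizes, **args))                 # matches t_min
print(chain_delay([1.0, 5.0, 100.0], **args))     # a non-geometric sizing is slower
```

Any sizing that departs from the geometric progression, such as the last line, gives a larger total delay for the same number of stages.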
One important point which is obtained from the above analysis deals with the delay time. The equation $K = (S_{j+1}/S_j)$ says physically that the minimum chain delay occurs when every stage has the same individual time delay $t_D$. The final question which must be answered is the number of stages $N$ needed to optimize the delay. To calculate this, we differentiate $T_D$ with respect to $N$ and set the result to 0. This gives the general equation
$$R C_o + R(C_i + C_w)\left(\frac{C_L}{C_i}\right)^{1/N}\left[1 - \frac{\ln(C_L/C_i)}{N}\right] = 0$$
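Because the condition for the optimum $N$ is transcendental, it is convenient to search for the best integer number of stages numerically. The Python sketch below does this by evaluating $T_{D,min}(N) = N R [C_o + K(C_i + C_w)]$ for a range of $N$. In the special case $C_o = 0$ the condition reduces to $N = \ln(C_L/C_i)$, that is $K = e$. The function names, the search limit, and the normalized values are assumptions for illustration.

```python
import math


def min_delay(n, c_load, c_in, c_out, c_wire, r):
    """T_D,min for an n-stage chain with the optimum ratio K = (C_L/C_i)**(1/n)."""
    k = (c_load / c_in) ** (1.0 / n)
    return n * r * (c_out + k * (c_in + c_wire))


def best_stage_count(c_load, c_in, c_out, c_wire, r, n_max=20):
    """Integer N in 1..n_max that gives the smallest T_D,min."""
    return min(
        range(1, n_max + 1),
        key=lambda n: min_delay(n, c_load, c_in, c_out, c_wire, r),
    )


# Normalized example with C_L / C_i = 1000.
print(math.log(1000.0))                                # continuous optimum for C_o = 0
print(best_stage_count(1000.0, 1.0, 0.0, 0.0, 1.0))    # integer optimum for C_o = 0
print(best_stage_count(1000.0, 1.0, 1.0, 0.0, 1.0))    # fewer stages once C_o = C_i
```

A nonzero output capacitance penalizes every added stage, so the optimum shifts to fewer stages and a larger ratio $K$ than the ideal value $e$.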
4.8
Off-chip driver circuits are critical to the overall chip design. Much effort is put into speeding up internal switching networks. Careful output design ensures that the high-performance specifications apply to the external characteristics as well. Some important problems which must be addressed include
- Efficient buffer circuitry between internal and off-chip drivers
- Minimization of transmission line effects
- Fast switching
- Static charge protection
as well as interface-specific items such as a CMOS-TTL level converter.
An inverter circuit can be used as a basic off-chip driver. The dominant performance factors are the transient switching times $t_{LH}$ and $t_{HL}$. Transmission line effects also enter into the problem; this is complicated by the fact that the line characteristics such as $Z_0$ depend on the specifics of the mounting and circuit traces.
4.8.1
The simplest off-chip driver circuit consists of an inverter chain which is designed to handle a large capacitive load. $C_{out}$ includes contributions from the bonding pad, the package wiring, and the circuit board trace. Since this easily amounts to tens or a few hundred picofarads depending on the interface specifications, the transistors must be relatively large.
Consider the 2-stage off-chip driver network shown in Fig. 4.25. We may use time constants to obtain first-order design estimates for the sizes of the output transistors $M_{n2}$ and $M_{p2}$ by writing the aspect ratios $(W/L)_{n2}$ and $(W/L)_{p2}$ in terms of $\tau_n$ and $\tau_p$, where $\tau_n$ and $\tau_p$ are the high-to-low and low-to-high time constants, respectively. Since the output capacitance seen by an off-chip driver can be large, the MOSFET aspect ratios are also quite large. These are obtained using several parallel-connected transistors to aid in layout and parasitic control. Sizing theory may be used to determine the sizes of the first stage transistors $M_{n1}$ and $M_{p1}$.
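Figure 4.25: Double-Inverter Off-Chip Driver Circuit
The actual values of the fall and rise times can be estimated from
$$t_{HL} = \tau_n \left[\frac{2V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2(V_{DD} - V_{Tn})}{V_o} - 1\right)\right]$$
$$t_{LH} = \tau_p \left[\frac{2|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2(V_{DD} - |V_{Tp}|)}{V_o} - 1\right)\right]$$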
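The two switching-time expressions share the same form, so a single helper is enough to evaluate them. The Python sketch below is a minimal example; the function name is hypothetical, and the choice $V_o = 0.1\,V_{DD}$ (a 10% reference level) together with the time constants and thresholds are assumptions, since the text does not fix them here.

```python
import math


def switching_time(tau, vdd, vt, v_o):
    """tau * [2|VT|/(VDD - |VT|) + ln(2(VDD - |VT|)/V_o - 1)]."""
    vt = abs(vt)
    return tau * (2.0 * vt / (vdd - vt) + math.log(2.0 * (vdd - vt) / v_o - 1.0))


VDD = 5.0
V_O = 0.1 * VDD  # assumed reference level

t_hl = switching_time(tau=1.0e-9, vdd=VDD, vt=1.0, v_o=V_O)    # nMOS pull-down
t_lh = switching_time(tau=1.5e-9, vdd=VDD, vt=-1.0, v_o=V_O)   # pMOS pull-up
print(t_hl, t_lh)
```

Both times are proportional to the corresponding time constant, so with equal threshold magnitudes the ratio $t_{LH}/t_{HL}$ equals $\tau_p/\tau_n$.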
4.8.2
Tri-state off-chip driver circuits are constructed by splitting the input signal to individually control each output transistor. Normal operation gives high and low voltages, while the high-impedance state is obtained by driving both the nMOS and pMOS devices into cutoff. An inverting tri-state circuit is shown in Fig. 4.26. When the tri-state variable $Z = 1$, pMOSFETs $M_{p1}$ and $M_{p2}$ are off, while nMOSFET $M_n$ conducts. This gives normal circuit operation. If $Z = 0$, then the gate voltages to the output transistors are given by
$$V_p = V_{DD}, \qquad V_n = 0$$
so that both are in cutoff. A condition of $Z = 0$ thus provides the necessary high-impedance state.
Bi-directional input/output (I/O) circuits are also quite useful. An example is shown in Fig. 4.27. The tri-state section of the circuit is a non-inverting buffer with an enable control $E$, where $E = 0$ gives the High-Z state. Operation is straightforward and easily understood by examining the circuit.
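The behavior of the inverting tri-state driver can be summarized with a small logic-level model in Python. This is only a sketch: it assumes that for $Z = 1$ both output-transistor gates simply follow the data input, which is one way to obtain an inverting output and is not read off Fig. 4.26. The function name and the 5 V supply are likewise illustrative.

```python
def tristate_inverter(data, z, vdd=5.0):
    """Return (V_p, V_n, output) where output is 1, 0, or None for High-Z."""
    if z == 0:
        v_p, v_n = vdd, 0.0                  # both output transistors in cutoff
    else:
        v_p = v_n = vdd if data else 0.0     # assumed: gates follow the input
    pmos_on = v_p == 0.0                     # pMOS conducts with its gate low
    nmos_on = v_n == vdd                     # nMOS conducts with its gate high
    if pmos_on:
        output = 1
    elif nmos_on:
        output = 0
    else:
        output = None                        # high-impedance state
    return v_p, v_n, output


for z in (1, 0):
    for data in (0, 1):
        print("Z =", z, "data =", data, "->", tristate_inverter(data, z))
```

With $Z = 1$ the output is the complement of the data input, and with $Z = 0$ the output floats regardless of the data.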
Chapter 5
Processing Steps
The fabrication of an integrated circuit consists of a series of steps carried out in a specific order. These steps convert the circuit design into an operable silicon integrated circuit chip. The way in which individual IC fabrication steps are carried out is of critical importance to the outcome of the manufacturing process. The main objective is to minimize the departure of geometrical features of the processed circuit from those determined during the design. To achieve this, a high degree of control over the parameters of each processing step is required. Equally rigid requirements apply to the physical and chemical properties of materials used for IC fabrication as well as to the cleanliness of the production environment.
5.1.1 Wafer Processing
The basic raw material used in semiconductor plants is a wafer or disk of silicon, which varies from 75mm to 150mm in diameter and is less than 1mm thick. Wafers are cut from ingots of single crystal silicon that have been pulled from a crucible melt of pure molten polycrystalline silicon. Controlled amounts of impurities are added to the melt to provide the crystal with the required electrical properties. The crystal orientation is determined by a seed crystal that is dipped into the melt to initiate single crystal growth. The seed is then gradually withdrawn vertically from the melt while simultaneously being rotated. Slicing into wafers is usually carried out using internal cutting edge diamond blades.
5.1.2
A common approach to n-well CMOS fabrication has been to start with a moderately doped p-type substrate (wafer), create the n-type well for the p-channel devices, and build the n-channel transistors in the native p-substrate. The mask that is used in each process step is shown in addition to a sample cross-section through an n-device and a p-device.
Figure 5.1: Czochralski process for manufacturing silicon ingots
1. The first mask defines the n-well (or n-tub). p-channel transistors will be fabricated in this well. Field oxide is etched away to allow a deep diffusion.
2. The next mask is called the thin oxide or thinox mask, as it defines where areas of thin oxide are needed to implement transistor gates and allow implantation to form p- or n-type diffusions for transistor source/drain regions. The field oxide areas are etched to the silicon surface and then the thin oxide is grown on these areas. Other terms for this mask include active area, island, and mesa.
3. Polysilicon gate definition is then completed. This involves covering the surface with polysilicon and then etching the required pattern. In a self-aligned process, the poly gate regions lead to aligned source-drain regions.
4. An n+-mask is then used to indicate those thin-oxide areas (and polysilicon) that are to be implanted n+. Hence the thin-oxide area exposed by the n+-mask will become an n+ diffusion area. If the n+ area is in the p-substrate, then an n-channel transistor or n-type wire may be constructed. If the n+ area is in the n-well, then an ohmic contact to the n-well may be constructed. An ohmic contact is one which is only resistive in nature and is not rectifying (as in the case of a diode). In other words, there is no junction and current can flow in both directions in an ohmic contact. This type of mask is sometimes called the select mask as it selects those transistor regions that are to be n-type.
5. The next step usually uses the complement of the n+-mask, although an extra mask is
normally not needed. The absence of an n+-region over a thin-oxide area indicates that the area will be a p+-diffusion. A p+-diffusion in the n-well defines possible p-transistors and wires. A p+-diffusion in the p-substrate allows an ohmic contact to be made. Following this step, the surface of the chip is covered with a layer of SiO2.
Figure 5.6: The p+ Mask
6. Contact cuts are then defined. This involves etching any SiO2 down to the contacted surface. These allow metal to contact diffusion regions or polysilicon regions.
7. Metallization is then applied to the surface and selectively etched.
8. As a final step, the wafer is passivated and openings to the bond pads are etched to allow for wire bonding. Passivation protects the silicon surface against the ingress of contaminants that can modify circuit behavior in deleterious ways.
Additional steps might include threshold adjust steps to set the threshold voltages of the n- and p-devices. In current fabrication processes the polysilicon is normally doped n+. The p+ doping phase reduces the poly doping such that the polysilicon inside the p+ regions has a higher sheet resistance than the polysilicon outside the p+ regions. The extent of this reduction may influence the quality of metal-poly contacts within p+ regions.
5.1.3 The p-Well CMOS Process
Typical p-well fabrication steps are similar to an n-well process, except that a p-well is used. The first masking step defines the p-well regions. This is followed by a low-dose boron implant driven in by a high-temperature step for the formation of the p-well. The next steps are to define the devices and other diffusions, to grow field oxide, and to form contact cuts and metallization. A p-well mask is used to define the p-well regions, as opposed to an n-well mask in an n-well process. A p+-mask may be used to define the p-channel transistors and $V_{SS}$ contacts. Alternatively, we could use an n+-mask to define the n-channel transistors, as the masks usually are the complement of each other.
5.1.4
Twin-tub CMOS technology provides the basis for separate optimization of the p-type and n-type transistors, thus making it possible for threshold voltage, body effect, and the gain associated with n- and p-devices to be independently optimized. Generally the starting material is either an n+ or p+ substrate with a lightly doped epitaxial or epi layer, which is
used for protection against latch-up. The aim of epitaxy (which means "arranged upon") is to grow high purity silicon layers of controlled thickness with accurately determined dopant concentrations distributed homogeneously throughout the layer. The electrical properties of this layer are determined by the dopant and its concentration in the silicon. The process sequence, which is similar to the p-well process apart from the tub formation where both p-well and n-well are utilized, entails the following steps:
- tub formation
- thin oxide etching
- source and drain implantations
- contact cut definition
- metallization.
Fig. 5.11 illustrates the cross-sections of the 3 processes using the example of an inverter.
5.1.5 Isolation
Device isolation deals with electrically decoupling neighboring transistors on a densely-packed integrated circuit. Unwanted conduction channels must be eliminated by preventing both direct and indirect current flow paths. The most common isolation techniques used in bulk CMOS are LOCOS and trench isolation.
LOCOS
The Local Oxidation of Silicon (LOCOS) achieves device isolation by selective oxide growth. A typical LOCOS process starts by growing a thin stress relief thermal oxide (SiO2) layer on the silicon surface. Next, silicon nitride (Si3N4) is deposited and patterned, keeping nitride in the areas where transistors will be built. The entire surface is then exposed to an oxidizing ambient. Nitride does not oxidize, but any exposed silicon will react to form SiO2. The resulting LOCOS structure is illustrated in Fig. 5.12.
Figure 5.12: LOCOS Isolation
Simple analysis shows that
$$X_R = 0.46\, X_{FOX}$$
where $X_R$ is the depth of recession and $X_{FOX}$ is the thickness of the grown field oxide (FOX) which separates device locations. In general, the patterned nitride regions are called active areas, while the oxide growth defines the field regions between active transistor sections.
LOCOS is a widely used isolation technique in many processing lines. However, a major limitation is the problem of active area encroachment, which occurs during the FOX growth process and reduces the usable size of the region. The problem is illustrated in Fig. 5.13. Even though the nitride protects the silicon surface, oxygen diffuses through the sides of the stress-relief oxide layer during the FOX growth. SiO2 is thus formed around the edges, lifting the nitride upwards and forming a characteristic bird's beak transition region between the active area and the field oxide. Encroachment cannot be avoided and affects the integration density.
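The recession relation $X_R = 0.46\, X_{FOX}$ is easy to evaluate for a given field oxide thickness. The short Python sketch below does so; the part of the oxide that ends up above the original silicon surface follows as the remainder, $X_{FOX} - X_R$. The function names and the 0.8 µm thickness are assumptions for illustration.

```python
RECESSION_RATIO = 0.46  # X_R / X_FOX from the relation above


def recession_depth(x_fox):
    """Depth X_R by which the silicon surface recedes under the field oxide."""
    return RECESSION_RATIO * x_fox


def oxide_above_surface(x_fox):
    """Part of the field oxide that sits above the original silicon surface."""
    return x_fox - recession_depth(x_fox)


x_fox_um = 0.8  # assumed field oxide thickness in micrometres
print(recession_depth(x_fox_um), oxide_above_surface(x_fox_um))
```

For this assumed thickness roughly 0.37 µm of silicon is consumed, and the remaining 0.43 µm of oxide forms a step above the original surface.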
Trench Isolation
Trench isolation uses reactive ion etching (RIE) to form small trenches in the silicon. The trenches are then filled with oxide and polysilicon to electrically isolate neighboring device regions from one another. High integration levels are possible since the trench widths can be reduced to the order of a few microns. Trench isolation is illustrated in Fig. 5.14. A field implant may be used to increase the trench threshold voltage $V_{T,Tr}$. Small trench dimensions make this approach particularly important for high-density integration.
Figure 5.14: Trench Isolation
The vertical trench regions may also be used to create large-value capacitors without consuming valuable surface real estate. An example geometry which uses doped poly and p+ as capacitor plates is shown in Fig. 5.15. Trench capacitors are commonly used in advanced dynamic RAM (DRAM) cell design since they conserve surface real estate.
Trench isolation has been developed to the point where it is a viable production line technique. It almost eliminates the problem of active area encroachment found in LOCOS and is useful when increasing the logic integration density.
5.1.6
Latchup
Bulk CMOS technologies are susceptible to latchup. This condition occurs when a parasitic conducting path is established between VDD and ground, directing current away from the circuit. Once latchup occurs, it can only be stopped by removing the power supply and restarting the circuit. In addition to halting the circuit operation, latchup may induce catastrophic failure from heating. Fig. 5.16 shows the cross-section of an n-well CMOS substrate region where the latchup problem originates. To understand the origin of the latchup problem, note that the voltage across the parasitic resistor $R_{w1}$ acts to forward bias the emitter-base junction of $Q_2$. If $V_{EB2}$ reaches the turn-on voltage of about 0.7 volts, $I_{C2}$ flows. This current flowing through $R_{s1}$ develops a forward bias $V_{BE1}$ across the base-emitter junction of $Q_1$, causing $I_{C1}$ to increase. The transistor pair $Q_1$ and $Q_2$ is connected to form a positive feedback loop, so that the buildup continues.

Figure 5.15: Trench Capacitor
Figure 5.16: Origin of CMOS Latchup

Latchup triggering may occur any time the circuit voltages exceed normal levels. Causes include
- Voltage overshoot/undershoot
- Avalanche breakdown
- Punchthrough
- Parasitic MOSFETs
- Photocurrent
and others. Although careful circuit design may reduce the possibility of inducing latchup, it is generally worthwhile to take extra precautions. There are two main approaches to dealing with the latchup problem: (a) reduce the transistor current gains, or (b) decouple the transistor feedback loop; it is common to use both in practice. Deep trench isolation can also be used to reduce the possibility of latchup. Fig. 5.17 illustrates adjacent nMOS and pMOS transistors separated by deep trenches. Parasitic bipolar transistors are not found in the structure since the isolating pn-junctions have been replaced by an oxide barrier.
Figure 5.17: Trench-isolated CMOS

Latchup prevention is an important aspect of CMOS chip layout and design. One should always check to ensure that all suggested rules have been followed to guard against the problem.
5.2
Design Rules
Design rules are sets of geometrical specifications which govern chip design for a given fabrication process. The layout rules are statements of the geometrical limits placed on the mask patterns and include items such as minimum widths, dimensions, and spacings. Violating the design rules can lead to a geometry which cannot be replicated in the fabrication line, yielding a non-functional circuit. Designers are often saved from simple mistakes by the omnipotent design rule checker (DRC) used to find layout violations. Another important fact is that parasitic circuit component values are a direct consequence of the layout geometry. Since the layout is an integral part of the circuit design, it is important to examine how a design rule set affects the overall performance.
5.2.1
Microelectronic lithography is the science of transferring a pattern to each layer of material in an integrated circuit. The resolution of the lithography limits the smallest line dimension and constitutes a metric for the surface dimensions. The most common approach is optical lithography, which uses an ultraviolet light source through a patterned mask to selectively expose a light-sensitive photoresist layer. Alternate approaches include electron-beam and X-ray sources; these offer finer resolution but introduce other problems. X-ray lithography currently appears to be the likely winner in the next generation, but recent advances in e-beam systems still look promising. Regardless of the approach, the resolution is limited by diffraction effects which occur whenever a wave passes by an opaque edge. This results in the minimum linewidth specification in the design rule set, which may be viewed as the smallest mask dimension that can be reliably transferred to the chip surface. UV optical lithography has a minimum linewidth on the order of about 0.5 microns; e-beam systems can pattern down to one-tenth of a micron or less. Diffraction also limits how small we can make the spacing between two lines; this consideration gives a set of minimum spacing allowances in the design rule set. Minimum spacings are also needed to account for misaligned masking steps, lateral spreading, and other problems which occur during the many weeks it takes to fabricate a wafer. Yield enhancement plays an important role in setting the final numbers.
5.2.2
Design rules are best illustrated by example. We consider a 1.5-micron n-well, single-poly, double-metal process which uses 10 masks. The process flow description in Table 5.2 lists the major steps in the fabrication and indicates each mask in proper sequence. Geometrical layout rules specify minimum mask feature sizes. Rules are provided for each masking layer, and also for spacings between different layers. The former originate from lithographic constraints or physical considerations. Bloats and shrinks may be applied to selected layers during the fabrication process, but the resulting physical overlay for the structure is still represented by the layout drawing. Table 5.1 provides a listing of design rules for a 2-micron CMOS process. These consist of minimum widths or dimensions, minimum spacings between features on the same or other layers, overlap distances, and other items of importance to the chip layout. Some examples of the rules are shown below. Ground rules are usually accompanied by a complete set of drawings to illustrate each specification.
Mask         Value (µm)   Description
01 NWELL     3.0          Minimum width
             2.0          Minimum spacing (same polarity)
             13.0         Minimum spacing (different polarity)
02 ACTIVE    1.5          Minimum width (diffusion line)
             2.25         Minimum width under POLY
             2.25         Minimum spacing (same polarity)
             2.50         Minimum spacing (different polarity)
             3.0          p-ACTIVE inside of NWELL to NWELL-edge: pMOSFET
             3.75         p-ACTIVE outside of NWELL to NWELL-edge: substrate contact
             0.0          n-ACTIVE inside of NWELL to NWELL-edge: well contact
             6.0          n-ACTIVE outside of NWELL to NWELL-edge: nMOSFET
03 POLY      1.5          Minimum width
             2.0          Minimum spacing
             1.25         Gate overlap with ACTIVE
             0.75         POLY outside of ACTIVE to ACTIVE edge
             2.25         POLY inside of ACTIVE to ACTIVE edge
04 NPLUS     1.5          Minimum spacing
             1.25         Spacing to ACTIVE
05 PPLUS                  PPLUS is reverse of NPLUS
06 CONTACT   1.5 × 1.5    Size
             1.5          Minimum spacing
             1.25         Spacing to POLY edge (from inside)
             1.75         Spacing to POLY (contact outside of POLY)
             1.0          Spacing to ACTIVE edge (from inside)
             1.75         Spacing to ACTIVE (contact outside of POLY)
07 METAL1    2.0          Minimum width
             2.25         Minimum spacing
08 VIA       1.5          Size
             2.0          Minimum spacing
             1.0          Overlap with METAL1
             1.5          Overlap with METAL2
             2.0          Spacing to POLY or ACTIVE
             1.5          Spacing to CONTACT
09 METAL2    2.75         Minimum width
             3.0          Minimum spacing
10 PAD       100 × 100    Dimensions
             5            Spacing to glass edge
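To make the idea of a width or spacing rule concrete, the following minimal sketch checks a handful of axis-aligned POLY rectangles against the POLY minimum width (1.5) and minimum spacing (2.0) listed above. The rectangle representation, the function names, the corner-to-corner distance measure, and the example coordinates are all assumptions made for illustration; a production design rule checker works on general polygons and many more rule types.

```python
from itertools import combinations

# POLY rules from the listing above (microns).
POLY_MIN_WIDTH = 1.5
POLY_MIN_SPACING = 2.0


def width_violations(rects, min_width):
    """Return rectangles whose smaller side is below min_width.

    Each rectangle is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    return [r for r in rects if min(r[2] - r[0], r[3] - r[1]) < min_width]


def spacing(a, b):
    """Edge-to-edge distance of two rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def spacing_violations(rects, min_spacing):
    """Return pairs of separate rectangles closer than min_spacing.

    Touching or overlapping rectangles are treated as one shape (assumption).
    """
    return [(a, b) for a, b in combinations(rects, 2)
            if 0.0 < spacing(a, b) < min_spacing]


# Hypothetical poly lines: the third is too narrow, the first two too close.
poly = [(0.0, 0.0, 1.5, 10.0), (3.0, 0.0, 4.5, 10.0), (6.5, 0.0, 7.5, 10.0)]
print("width violations:  ", width_violations(poly, POLY_MIN_WIDTH))
print("spacing violations:", spacing_violations(poly, POLY_MIN_SPACING))
```

The pairwise loop is quadratic in the number of shapes, which is acceptable for a sketch; real checkers use scan-line or spatial-index techniques to stay efficient on full chips.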
[Design rule illustrations: n-well, active area, poly, n+/p+ implant, contact, metal1, via, and metal2 geometries, annotated with layer dimensions and scribe-lane spacings.]

The minimum channel length is 1.5 µm for VDD = 5 V and 2.25 µm for VDD > 5 V.

STEP NO.   MASK NO.   LAYER NAME    PROCESS STEP
0                                   Start with n-type wafer
1          01         NWELL         n-tub diffusion
2          02         ACTIVE        Active area definition
3                     THIN OXIDE    Grow gate oxide
4                     POLY          Deposit polysilicon
5          03         POLY          Pattern polysilicon
6          04         NPLUS         n+ implant
7          05         PPLUS         p+ implant
8                                   Deposit oxide
9          06         CONTACT       Pattern poly contacts
10                    METAL1        Deposit metal 1
11         07         METAL1        Pattern metal 1
12                                  Deposit CVD oxide
13         08         VIA           Pattern metal 2 contacts
14                    METAL2        Deposit metal 2
15         09         METAL2        Pattern metal 2
16                    GLASS         Nitride passivation
17         10         PAD           Pattern pad openings

An integrated circuit may be viewed as a set of overlaid geometric patterns. Each layer is
shaped to provide the proper characteristics when referenced to every other layer. High-density circuit design requires compacting the geometrical patterns into a small area without violating the design rules.

Active Areas

Dimensional specifications for active device areas are larger than those permitted by the lithography to account for encroachment from the isolation. As shown in the sequence of Fig. 5.18, growth of the field oxide creates the bird's beak region, which must be avoided when patterning the device.

Figure 5.18: Active Area Encroachment in LOCOS

Gate Dimensions

Basic self-aligned MOSFETs are fabricated using the polysilicon gate as a mask for an n+ or p+ drain/source ion implant. Lateral doping effects give effective channel lengths which are smaller than the drawn values shown on the poly mask.

Figure 5.19: Effective Channel Length

Gate Overhang

Self-aligned MOSFETs use the gate polysilicon as a mask for the drain and source implants. To ensure a functional MOSFET, we require that the masks are drawn so that the poly gate extends further than required in the W direction. Fig. 5.20 shows the geometry. Providing for a gate overhang allowance compensates for mask misalignment between the poly and n+ or p+ regions. If the gate overhang is reduced to zero, then even a minor registration error would result in a shorted transistor.

Contacts and Vias

Contact and via etches in the oxide can be troublesome failure points in a high-density layout. If the contact windows are too large, nonuniform coverage may result in void formation and other problems. The same comment also applies to oxide cuts which are too small. To avoid inducing contact-related failure modes, it is common practice to allow only one size for contact windows; large areas are connected by multiple contacts. This is illustrated in Fig. 5.21.

Metal Dimensions

Metal layers are deposited at the end of the fabrication sequence. They generally encounter a very rugged terrain due to patterning of the previous layers. Owing to this fact, the design rule widths and spacings must be large to ensure electrical current flow. Another reason for increased widths is to allow larger current flow levels for power and ground connections.
5.3
The title circuit extraction includes a broad class of layout analysis problems. The fundamental problem is connectivity extraction, which derives a list of interconnections among the terminals from a layout description. There are also several parameter extraction problems, which augment the basic connectivity information with measurements of features that are related to the (analog) electrical characteristics of the chip. Consider the problem of finding transistors. Transistors are formed by intersecting the polysilicon and diffusion layers; their type depends on the presence or absence of different kinds of implant or tub. Most circuit extractors treat two points (on the same or different layers) as electrically connected if they lie in the same region of a single layer or if they can be joined by a sequence of regions on several layers that are connected explicitly by contact windows. A common circuit extraction operation is to find maximal regions of electrically connected points, more commonly called nodes. This operation involves labeling the contents of each layer so that items belong to the same node if and only if they have the same label.
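The node-labeling operation described above can be sketched with a disjoint-set (union-find) structure. This is only an illustration under stated assumptions: shapes are axis-aligned rectangles that already represent conducting regions (transistor channels removed), a contact merges every shape it touches on its two layers, and the class, function, and layer names are hypothetical rather than taken from any particular extractor.

```python
class DisjointSet:
    """Minimal union-find used to merge shapes into nodes."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)


def touches(a, b):
    """True if rectangles (x1, y1, x2, y2) overlap or abut (corners included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def label_nodes(shapes, contacts):
    """shapes: list of (layer, rect); contacts: list of (layer_a, layer_b, rect).

    Returns one label per shape; equal labels mean the same electrical node.
    """
    ds = DisjointSet(len(shapes))
    # Shapes on the same layer that overlap or abut form one region.
    for i, (layer_i, rect_i) in enumerate(shapes):
        for j in range(i + 1, len(shapes)):
            layer_j, rect_j = shapes[j]
            if layer_i == layer_j and touches(rect_i, rect_j):
                ds.union(i, j)
    # A contact window merges the shapes it touches on its two layers.
    for layer_a, layer_b, cut in contacts:
        hit = [i for i, (layer, rect) in enumerate(shapes)
               if layer in (layer_a, layer_b) and touches(rect, cut)]
        for i in hit[1:]:
            ds.union(hit[0], i)
    return [ds.find(i) for i in range(len(shapes))]


# Hypothetical layout: a metal wire contacted to an L-shaped poly wire,
# plus an unrelated metal wire.
shapes = [("metal", (0, 0, 10, 2)), ("poly", (8, 0, 10, 8)),
          ("metal", (20, 0, 30, 2)), ("poly", (0, 6, 9, 8))]
contacts = [("metal", "poly", (8.5, 0.5, 9.5, 1.5))]
labels = label_nodes(shapes, contacts)
print(labels)
```

Shapes 0, 1 and 3 end up with the same label, while the isolated metal wire keeps its own, which is exactly the "same node if and only if same label" property.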
5.3.1
Connectivity Extraction
The output of connectivity extraction is a list of transistors on the chip, together with node numbers on each transistor's gate, source, and drain. This transistor list is adequate for checking the logical correctness of the circuit. In order to check analog characteristics of the circuit, it is necessary to extract parasitic capacitances and resistances and transistor size information. The first step in connectivity extraction is to create derived layers that correspond to transistors of different kinds and to electrically connected regions on single layers. To illustrate the creation of derived layers using the edge representation, suppose the artwork for an nMOS chip includes the following six levels: Dmask (the diffusion mask), Pmask (the polysilicon mask), Mmask (the metal mask), Cmask (contact windows from metal to underlying layers), Bmask (buried contact windows between polysilicon and diffusion), and Imask (the depletion transistor implant). Then we could create five derived layers as follows:

trans  = Dmask and Pmask and not Bmask
dwires = Dmask and not trans
PDcuts = Pmask and Dmask and Bmask
MPcuts = Mmask and Pmask and Cmask
MDcuts = Mmask and Dmask and Cmask and Pmask
Regions in layer trans are transistor channels, that is, places where polysilicon crosses diffusion outside of a buried contact region. Conducting diffusion regions are represented in layer dwires. Files PDcuts, MPcuts, and MDcuts contain precisely the places where materials of the appropriate types make electrical contact. The next step is to assign globally consistent signal labels to the items on each conducting layer that belong to a node, using the contact windows to merge signals between layers. The final step in connectivity extraction is to find for each transistor the signal labels on the nodes that are its terminals. This requires examining all regions that abut a transistor region.
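The Boolean mask operations that define the derived layers can be tried out on a coarse grid. The sketch below represents each mask as a set of unit grid cells instead of the edge representation mentioned in the text, which is an assumption made purely for brevity; the helper names are hypothetical. The five expressions are transcribed exactly as listed in the text (Imask is not used by any of them).

```python
def cells(x1, y1, x2, y2):
    """Set of unit grid cells covered by the rectangle [x1, x2) x [y1, y2)."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


def derive_layers(Dmask, Pmask, Mmask, Cmask, Bmask):
    """Apply the five derived-layer expressions using set algebra."""
    trans = (Dmask & Pmask) - Bmask          # Dmask and Pmask and not Bmask
    dwires = Dmask - trans                   # Dmask and not trans
    PDcuts = Pmask & Dmask & Bmask
    MPcuts = Mmask & Pmask & Cmask
    MDcuts = Mmask & Dmask & Cmask & Pmask   # as listed in the text
    return {"trans": trans, "dwires": dwires, "PDcuts": PDcuts,
            "MPcuts": MPcuts, "MDcuts": MDcuts}


# Hypothetical artwork: a horizontal diffusion strip crossed by a vertical poly strip.
diffusion = cells(0, 4, 10, 6)
poly = cells(4, 0, 6, 10)

plain = derive_layers(diffusion, poly, set(), set(), set())
buried = derive_layers(diffusion, poly, set(), set(), cells(4, 4, 6, 6))

print(len(plain["trans"]), len(plain["dwires"]))    # channel cells, diffusion wire cells
print(len(buried["trans"]), len(buried["PDcuts"]))  # buried contact removes the channel
```

Without a buried contact the crossing becomes a transistor channel that splits the diffusion wire; with the buried contact window over the crossing, the same cells become a poly-diffusion cut instead.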
5.3.2
To extract capacitance we still treat each node as equipotential but also consider it as the terminal of one or more capacitors. Each region has a capacitance between itself and the chip substrate and also internodal capacitances between itself and other overlapping or nearby nodes. Substrate capacitance can be accurately approximated as a function of the area and perimeter of each region on each layer. Capacitance between two nodes of the circuit is much harder to compute accurately. Internodal capacitance is not a simple function of area and perimeter.
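As a rough illustration of an area-and-perimeter estimate, the sketch below assumes the simplest possible model, a linear sum of a bottom (area) term and a sidewall (perimeter) term. The linear form, the function name, and the 6 x 4 micron region are assumptions; the coefficients are the n+ diffusion to substrate values from the n-well CMOS process parameters listed later in this chapter (25 nF/cm² bottom, 4 pF/cm sidewall).

```python
def substrate_capacitance_fF(area_um2, perimeter_um,
                             c_area_nF_per_cm2, c_perim_pF_per_cm):
    """Estimate substrate capacitance in fF as C = Ca * area + Cp * perimeter."""
    c_area = c_area_nF_per_cm2 * 1e-2    # nF/cm^2 -> fF/um^2
    c_perim = c_perim_pF_per_cm * 1e-1   # pF/cm   -> fF/um
    return c_area * area_um2 + c_perim * perimeter_um


# Hypothetical 6 um x 4 um n+ diffusion region.
width_um, height_um = 6.0, 4.0
c_node = substrate_capacitance_fF(
    area_um2=width_um * height_um,
    perimeter_um=2 * (width_um + height_um),
    c_area_nF_per_cm2=25.0,
    c_perim_pF_per_cm=4.0,
)
print(f"estimated n+ to substrate capacitance: {c_node:.1f} fF")
```

For small regions the perimeter term can exceed the area term, which is why an extractor that tracks only area tends to underestimate diffusion capacitance.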
5.3.3
Analog characteristics such as the drive of an MOS transistor are a function of its channel length and width. For a rectangular transistor formed by polysilicon that completely overlaps diffusion, length is one-half of the transistor's perimeter with polysilicon, and width is one-half of the transistor's perimeter with diffusion.
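The half-perimeter rule for a rectangular transistor translates directly into code. This is a minimal sketch; the function name and the example perimeters are hypothetical.

```python
def channel_dimensions(perimeter_with_poly, perimeter_with_diffusion):
    """Return (length, width) of a rectangular channel from its shared perimeters.

    The two channel edges that continue into polysilicon each have length L,
    and the two edges that abut source/drain diffusion each have length W.
    """
    length = perimeter_with_poly / 2.0
    width = perimeter_with_diffusion / 2.0
    return length, width


# Hypothetical channel: 3 um of boundary shared with poly, 12 um with diffusion.
L, W = channel_dimensions(3.0, 12.0)
print(f"L = {L} um, W = {W} um, W/L = {W / L}")
```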
When we consider the problem of extracting resistances from a layout, the abstraction of transistors connected by equipotential nodes breaks down completely. It does not make sense to associate resistance with a node: resistance is defined between pairs of points. Thus, a node attached to the terminals of $k$ transistors gives rise to $k(k-1)/2$ resistances, one between each pair of terminals. One idea is to reduce the number of resistances we must compute by chopping the region into electrically isolated regions. If we add the appropriate $k-2$ junctions to a node attached to $k$ terminals, then we need to compute only $O(k)$ resistances, instead of $k(k-1)/2$. (See Fig. 5.22)
Figure 5.22: A region with eight terminals has 28 interconnection resistances. Making the cross-hatched junctions into new nodes splits the region into 10 electrically isolated regions and reduces the number of interconnection resistances to 10.

A second way to reduce the number of resistances is to break nodes into rectangles by introducing artificial junctions at corners. The resistances can then be computed more easily. Careful resistance extraction is the hardest and most expensive problem. Indeed, most chips are manufactured without ever undergoing a complete resistance extraction because such an extraction would result in a prohibitively large network of resistors.
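The quadratic growth of the pairwise count is easy to see numerically. The short sketch below only evaluates the number of terminal pairs (one resistance per pair); the function name is hypothetical.

```python
def pairwise_resistances(k):
    """Number of terminal pairs, i.e. resistances, for a node with k terminals."""
    return k * (k - 1) // 2


for k in (2, 4, 8, 16, 32):
    print(f"{k:2d} terminals -> {pairwise_resistances(k):3d} resistances")
```

For eight terminals this reproduces the 28 resistances quoted in the caption of Fig. 5.22, compared with the 10 that remain after the region is split at the added junctions.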
5.3.5
The technology description file contains all information specific to a particular technology. Among this information, and of particular importance for the extractor, is the specification of the layers that can be used in a process and the electrical parameters of that process. Layers are specified by their name and their type. The type of a layer distinguishes between auxiliary layers, implantation layers, and interconnect layers. Auxiliary layers are ignored by the extractor. Interconnect layers form the conducting patterns in a chip layout, so in a chip all interconnections will always be made via such layers. If a layer is of type interconnect, an associated terminal layer must be specified for it. Given the interconnect layers, the extractor is able to determine where the nodes of an element are located. Another important part of the technology description is the specification of the elements to be extracted. For extraction of parasitic elements, electrical process parameters must be known. The layer capacitances and the layer and contact resistances are necessary for exact modelling of parasitic capacitances and resistances on a wafer, e.g. for calculating load capacitances (gates) and coupling (between wires). Furthermore, process parameters must be taken into account during the design. For example, poly lines must not be made too long because of the high capacitance and resistance of polysilicon. Example parameters of an n-well CMOS process are listed in Tables 5.3 and 5.4.

Capacitance                         Value (nF/cm²)
Gate oxide                          135
n+ diff. to substrate (bottom)      25
n+ diff. to substrate (sidewall)    4 (pF/cm)
p+ diff. to n-well (bottom)         38
p+ diff. to n-well (sidewall)       4 (pF/cm)
Poly to substrate                   5.9
Metal1 to substrate                 3.2
Metal2 to substrate                 2
Metal1 to metal2                    3.9
Metal1 to poly                      5.4
Metal2 to poly                      2.5
Metal1 to n+ diff.                  5.2
Metal1 to p+ diff.                  5.5
Metal2 to n+ diff.                  2.4
Metal2 to p+ diff.                  2.5
5.4
Basic Layout
Transforming schematics into physical circuits occurs during the layout process. All aspects of the circuit performance are structured by the patterning. Parasitics, interconnect coupling, and logic integration density are also determined by the geometries used in the layout artwork. Although layout is easy to learn, the interplay between the geometrical shapes and the resulting electrical behavior makes it difficult to master.
5.4.1 IC Design
IC design is a very complex process that involves hundreds of decisions dealing with a variety of IC performance and manufacturing-related issues. The final phase in the design is the creation of an IC layout, i.e. the creation of the drawing representing the geometry of the designed circuit. For a given process such a drawing uniquely defines the IC geometry and therefore the performance of the designed circuit. The layout of an IC is defined as a set of polygons that determines the presence or absence of regions in a number of conducting and isolating layers. In other words, an IC layout shows from which part of the IC surface such materials as metal, silicon dioxide, photoresist, and so on should be removed, and where other materials should be deposited. During the design the IC is represented by a set of numbers that can be manipulated to create a composite drawing of IC masks on the screen of the terminal or on the color plotter. In the manufacturing process a hard copy of this layout is needed in the form of photolithographic masks. Typically, the IC design is transformed into a set of masks in a sequence of steps illustrated in Fig. 5.23. First, coordinates of all elements of the IC composite drawing are computed. Then data representing different layers are separated (Fig. 5.23 (c) and (d)) and an image of each IC layer is produced. Typically, such images are engraved on the surface of glass plates covered with chromium, using a photographic technique and a pattern generator or E-beam equipment. Masks created in this way are called master masks. Next, master masks are scaled down (Fig. 5.23 (e-f)) and duplicated (Fig. 5.23 (g-h)) so that working masks made in this way contain a couple of tens to a couple of hundreds of the same images as the master masks. The size of the working mask is such that with a single exposure the entire area of a single manufacturing wafer can be covered.
In the new lithography techniques, working masks are not needed and the image from the mask is transferred directly onto the surface of the wafer (the master mask is then called a reticle). Special high-precision optical step-and-repeat cameras are used for this purpose. Data that describe a single IC layer can also be used to project an image directly onto the surface of the manufacturing wafer using an electron beam technique. In this technique a deflected beam of electrons exposes appropriate regions directly on the surface of the photoresist.
5.4.2
Structured layout is based on the idea of grids and cells. The simplest approaches start with the power distribution lines VDD and VSS and structure the circuits as needed. Each gate is placed in a semi-rectangular cell, and cascaded logic is achieved using adjacent cells. Fig. 5.24 illustrates the general idea. Both signal and power lines run horizontally in the network. Logic gates are built between metal VDD and VSS lines, while the signals may move between poly and metal layers when necessary. Minimization of the area is achieved by creative placement and shaping of the MOSFETs, interconnects, and cells in the overall grid structure. It is important to remember that the dimensions set the electrical characteristics and must adhere to the design rule set. CMOS has the added complications of complementary nMOS/pMOS logic blocks and physical separation of nMOS and pMOS transistors, which affect the layout.
Figure 5.23: Design-mask transformation

Complementary structuring is illustrated in Fig. 5.25. Each input is connected to both nMOS and pMOS transistors which are physically separated from one another due to the opposite background polarity requirements.
5.4.3
Figure 5.25: Complementary Transistor/Logic Blocks

High-speed switching requires large currents and small $C_{out}$ to ensure small charging and discharging time constants. It is evident that this leads to a design problem: to increase current flow, we must use large $(W/L)$ values for the MOSFETs, which in turn increases the transistor capacitances. Increasing the aspect ratios in a CMOS circuit gives larger values for both $C_{in}$ and $C_{out}$, affecting the performance of the entire logic chain. In bottom-up design, we attempt to optimize each gate, both intrinsically and with respect to its nearest neighbors. The concept of the equivalent load helps the initial layout problem by defining standard transistor or logic gate capacitances which are used as a reference. All loads are then specified by the number of equivalent loads. A common choice is a minimum-area transistor as shown in Fig. 5.26. Assuming drawn gate dimensions of $(W \times L)$, the gate input capacitance is approximated by $C_G \approx C_{ox} W L$. An inverter made using minimum-area nMOS and pMOS transistors has an input capacitance of approximately $C_{in} = 2C_G$, which becomes our reference value. To use the equivalent load concept, we assume that the circuit we are designing must drive a load of value $C_L = nC_{in}$, where $n$ is a scaling factor indicating the size of the transistors used in the next gate. For example, $n = 2$ may imply a single gate with MOSFETs which are twice as large as the reference, or a fan-out $FO = 2$ into two minimum-size gates. The circuit is designed according to the assumed load value. After the design of the logic chain is completed, we recheck the circuit to ensure that the actual switching performance is acceptable.
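The equivalent-load bookkeeping can be written out in a few lines. The sketch below follows the approximations in the text (gate capacitance as oxide capacitance times W times L, inverter input capacitance as twice that, load as n reference inverters). The function names and the drawn dimensions W = 2.25 µm, L = 1.5 µm are assumptions chosen for illustration; the oxide capacitance of 135 nF/cm² (1.35 fF/µm²) is the gate-oxide value from the process parameters in the extraction section.

```python
def gate_capacitance_fF(c_ox_fF_per_um2, w_um, l_um):
    """C_G ~ C_ox * W * L for drawn gate dimensions W x L."""
    return c_ox_fF_per_um2 * w_um * l_um


def equivalent_load_fF(n, c_ox_fF_per_um2, w_um, l_um):
    """C_L = n * C_in, with C_in = 2 * C_G for equal-size nMOS and pMOS gates."""
    c_in = 2.0 * gate_capacitance_fF(c_ox_fF_per_um2, w_um, l_um)
    return n * c_in


C_OX = 1.35          # fF/um^2, i.e. 135 nF/cm^2
W, L = 2.25, 1.5     # hypothetical minimum-area transistor, microns

c_in = equivalent_load_fF(1, C_OX, W, L)
c_load = equivalent_load_fF(2, C_OX, W, L)   # n = 2: fan-out of two reference inverters
print(f"C_in = {c_in:.2f} fF, C_L(n=2) = {c_load:.2f} fF")
```

Working in multiples of a reference inverter keeps early sizing decisions independent of the exact process numbers; the actual extracted capacitances are then used for the final recheck mentioned in the text.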
Figure 5.26: Equivalent Load

Optimization of the circuit performance can also be specified at the system level and then applied to each gate. This type of top-down approach has been used to estimate gate sizing rules to speed up the response of a static logic chain. In general, combining the two views offered by bottom-up (circuit level) and top-down (system level) design provides the most powerful approach to high-performance design. Large digital networks contain both critical and non-critical logic paths, so that intermixing design philosophies is often required.
5.4.4 Latch-Up Prevention
Circuits which are fabricated in bulk CMOS require additional safeguards to avoid latch-up. A common approach is to use guard rings, which are heavily doped n+ or p+ regions around MOSFETs as shown in Fig. 5.27. Guard rings reduce the transistor current gain and offset the potential and are effective in preventing latch-up. Another common preventative measure is providing substrate bias contacts next to every MOSFET which is connected to the power supply or ground.
5.4.5
Static CMOS gates are based on complementary nMOS/pMOS logic blocks. Cell design can be split into two tasks: transistor placement and interconnect routing. Real estate budgets often have priority status, so that some thought may be required to fit the subsystem into the allocated area. The main limitations are usually due to design rule spacings and the complexity of the interconnect topology. Other considerations which may come into play include the shape of the allocated area, location of input and output lines relative to neighboring logic units, and clock distribution. Some of the more interesting designs are based on the complementary placement of opposite polarity MOSFETs. Consider a NOR2 gate. This circuit uses 2 nMOS transistors in parallel and 2 pMOS transistors in series. Fig. 5.28 shows how the complementary arrangement can be implemented by using similar transistor arrays with different interconnect patterning. Reversing the transistors in the NOR2 gate in Fig. 5.28(a) directly yields the NAND2 gate shown in Fig. 5.28(b). Although some layouts are based on the schematic patterning, these do not generally yield minimum-area circuits. Thoughtful use of transistor arrays and interconnect routing is usually
5.4.6
Transmission-Gate-Based Logic
The layout of transmission-gate logic circuits is complicated by the transmission gate itself. The switch uses parallel-connected nMOS and pMOS transistors which reside in opposite-polarity backgrounds. Consider, for example, a p-well process. The p-channel transistor is located on the n-substrate, while the nMOS is in a p-well region. Two extreme layout philosophies are (a) use a p-well for every transmission gate, or (b) use a single p-well for all transmission gates in the circuit. These are illustrated in Fig. 5.29. Approach (a) reduces integration density due to the p-well spacing requirement, but is easy to replicate on a CAD system; (b), on the other hand, may provide higher logic density, but has a larger capacitance from the extra interconnect. Although both are used in practice, minimizing the number of wells is usually the preferred strategy. Since each well requires a connection to either VDD or VSS, this also aids in power distribution. A critical aspect of high-speed CMOS layout is control of the parasitic capacitance values.
Layout Examples
Chapter 6
Introduction
Packaging significantly affects, and in some cases dominates, the overall chip cost ([22]). The increase of packaging cost with an increasing number of gates on the chip is different for memory and logic/microprocessor devices:
- Memory devices: Due to multiplexing techniques on the chip, the I/O requirements remain essentially constant.
- Logic and microprocessor devices: The number of required I/O terminals increases in proportion to the number of gates on the chip.
An empirical estimate of the number of I/O terminals needed for logic devices is known as Rent's Rule:

$\#I/O = \alpha\,(\#Gates)^{\beta}$   (6.1)

where $\alpha$ and $\beta$ are empirical constants.

Package design has to provide:
- good heat dissipation
- good electrical performance
- high reliability
- easy inspection after assembly
- compatibility with a variety of assembly, test and handling systems
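To get a feeling for how Rent's Rule scales, the sketch below evaluates the usual power-law form #I/O = α (#Gates)^β. The function name and the values α = 2.5 and β = 0.6 are placeholder assumptions chosen only for illustration; they are not taken from this text, and real values depend on the device family.

```python
def rent_io_count(gates, alpha, beta):
    """Estimate the number of I/O terminals as alpha * gates**beta."""
    return alpha * gates ** beta


ALPHA, BETA = 2.5, 0.6   # illustrative placeholder constants, not from the text

for gates in (1_000, 10_000, 100_000):
    print(f"{gates:7d} gates -> about {rent_io_count(gates, ALPHA, BETA):.0f} I/Os")
```

With any exponent below one, a tenfold increase in gate count raises the terminal count by a factor of 10^β rather than 10, which is still a far steeper demand on the package than the nearly constant I/O count of memory devices.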
Figure 6.1: Continuous growth in DRAM complexity and size places little demand on package size and number of I/Os
Figure 6.2: Comparison of I/O requirements for DRAM, logic and microprocessor devices
In principle, there are two ways of mounting devices on printed wiring boards (PWB):
1. Through-hole (TH) mounting:
   - Dual-in-line packages (DIP)
   - Pin-grid-array (PGA)
   (available in hermetic plastic and ceramic types) (pitches: 2.54, 1.78 and 1.27 mm)
2. Surface mounting (SM):
   - up to 48 terminals:
     - small outline (SO) (available in plastic only): SOP (small outline package), SSOP (shrink small outline package)
     - quad types: chip carriers (CC) and flatpacks (available in ceramic and plastic)
   - above 48 terminals: quad types only: leaded plastic (PLCC), leaded ceramic (LDCC), leadless ceramic (LLCC)
   (pitches: 1.37 or 0.635 mm)
Package Types
Figure 6.3: Examples of packages and PWB mounting techniques: (a) TH: Dual-in-line (DIL) package. (b) TH: Pin-grid-array (PGA) package. (c) SM: J-leaded packages, leaded chip carrier or small-outline. (d) SM: Gull-wing-leaded packages, chip-carrier or small-outline. (e) SM: Butt-leaded package, small-outline dual-in-line type. (f) Leadless type, ceramic chip carrier mounted to a matching ceramic substrate
6.2.1 24-pin Packaging Evolution
Design Considerations
VLSI Design Rules
Figure 6.7: Bonding-pad pitch versus chip lead count for several chip sizes
Figure 6.8: Arrangement of staggered bonding pads: lower pitch than with single line of bonding pads. (a) Bonding pads size and spacing. (b) Maximum wire angle with respect to die edge
Figure 6.9: CAD template for positioning bonding pads (assures that wire span length meets the design rules)
Figure 6.10: CAD template for checking adherence to wire-span guidelines. The template also provides an extended zone (beyond the optimum shown in Fig. 6.9) for cases where location in optimum zone is not compatible with the device layout.
Figure 6.11: CAD template for checking the maximum distance that wire spans over silicon. Here: violation of the guidelines. The circle must be at minimum tangent to the step-and-repeat centerline (case of maximum distance) or cross it
6.3.2 Thermal Considerations
Objective: keep the temperature of the silicon die low enough to prevent an excessive failure rate.
Conductive thermal resistance: function of package materials, geometry and orientation.
6.3.3
Electrical Considerations
Increased operation speed and reduced noise margins demand a more careful consideration of package design. Performance criteria:
- low ground resistance (minimum power-supply voltage drop)
- short signal leads (minimum self-inductance)
- minimum power-supply spiking due to signal lines switching simultaneously
- short parallel signal runs (cross talk)
- short signal runs near a ground plane (minimum capacitive loading)
Figure 6.12: Lead inductances for various package sizes

The inductances of SM packages are significantly lower than the inductances of TH packages due to their shorter lead traces. The most important problem is noise reduction. The noise induced in the ground line when one line is switching is given by

$V_i = L_g \frac{di}{dt}$   (6.2)

When several lines switch simultaneously, their contributions add:

$V = L_g \sum_j \frac{di_j}{dt}$   (6.3)
If $m$ ground leads are used, the total inductance is approximately $L_g/m$. In practical designs often up to 25% of the leads have to be grounded in order to keep the noise within the desired limits (large-area power and ground planes within the package are also used).
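The effect of adding ground leads can be estimated with a few lines of arithmetic. The sketch below assumes that all simultaneously switching lines have the same di/dt and that their contributions to the ground noise simply add, and it uses the L_g/m approximation from the text. The function name and all numerical values (10 nH lead inductance, 20 mA/ns per line, 8 lines) are hypothetical.

```python
def ground_noise_V(l_g_nH, di_dt_mA_per_ns, n_switching=1, m_ground_leads=1):
    """Ground-line noise in volts for n identical lines switching at once.

    Uses V = (L_g / m) * n * di/dt. Note that 1 nH * 1 mA/ns = 1 mV.
    """
    noise_mV = (l_g_nH / m_ground_leads) * n_switching * di_dt_mA_per_ns
    return noise_mV * 1e-3


# Hypothetical package: 10 nH per ground lead, 8 outputs switching at 20 mA/ns each.
for m in (1, 2, 4, 8):
    v = ground_noise_V(10.0, 20.0, n_switching=8, m_ground_leads=m)
    print(f"{m} ground lead(s): {v:.2f} V of ground noise")
```

Under these assumptions a single ground lead would see 1.6 V of noise, which is clearly unacceptable for logic with a few volts of swing, and four parallel ground leads bring it down to 0.4 V. This illustrates why a sizeable fraction of the leads ends up assigned to ground.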
6.3.4
Ideally: prefer materials that are matched in their physical properties, especially materials that have the same TCE (Thermal Coefficient of Expansion).
Figure 6.14: Plastic package: composite structure consisting of silicon chip, metal leadframe and plastic moulding compound
Figure 6.15: Generic assembly sequence for plastic and ceramic packages
6.4.1 Wafer Preparation
- Wafer sawing with diamond blade technology
- In some cases: wafer thinning using highly automated backgrinding processes
- The sawed wafer is still mounted on a tape frame fixture (to which it was attached before sawing and which is not destroyed by the sawing step) and is loaded into an automatic die bonder that picks only the good chips from the tape
6.4.2 Die Bonding
The back of the die is mechanically attached to a mount medium, such as a ceramic substrate, a multilayer-ceramic-package piece part or a metal leadframe. This attachment sometimes also enables an electrical connection to the back of the die. Two common methods of die bonding:
1. Eutectic die bonding
2. Epoxy die bonding

Eutectic Die Bonding (Hard Solders)
- The die is metallurgically attached to a substrate material
- Substrate material: metal leadframe made of Alloy 42 or ceramic material (usually 90...95% Al2O3)
- Melting preform: thin sheet of the appropriate solder-bonding alloy
- Substrate metallization: Ag (leadframes) or Au (leadframes or ceramic)
- Bonding temperature: about 400 °C

Epoxy Die Bonding
- Bond material: silver-filled adhesives
- Advantage: less expensive than the high-gold-content hard solders and easy to process
6.4.3 Wire Bonding
Typically, gold wire is ball-wedge bonded (thermosonic or thermocompression):
- ball bonding to the chip bond pad (typically Al)
- wedge bonding to the package substrate (typically Ag or Au)
Description of the bonding cycle steps as seen in Fig. 6.18:
(a) the capillary is targeted on the die's bond pad
(b) the capillary presses the ball onto the pad; in a thermosonic system, ultrasonic vibration is then applied
(c) the clamp opens and the capillary rises
(d) the lead of the device is positioned under the capillary, which is then lowered onto the lead
(e) the capillary deforms the wire against the lead; in a thermosonic system, ultrasonic vibration is applied
(f) the capillary rises and the wire clamp closes at a predefined height
(g) a new ball is formed using a hydrogen flame or an electronic spark
Figure 6.19: Thermosonic ball wire bonds on a gate array VLSI chip
6.5 Package Technologies
6.5.1 Ceramic Package Technology
Very effective for constructing complex packages with many signal, power, ground, bonding and sealing layers.
Figure 6.20: Process sequence to create a laminated refractory-ceramic product from a ceramic slurry
6.5.2 Glass-Sealed Refractory Technology
Figure 6.22: Structures of CERDIP and quad CERPAC
Lower-cost ceramic technology applicable to single-chip DIPs and quad CERPACs. This technology relies on glass-sealing a leadframe between two pressed ceramic units.
6.5.3 Plastic Molding Technology
Postmolding
- low-cost, state-of-the-art plastic package technology
- thermosetting epoxy resins are molded around the leadframe-chip subassembly after the chip has been wire-bonded to the leadframe
Premolding
- avoids exposure of die and wire bonds to the viscous molding material
- the package is molded first and the chip-leadframe compound is added afterwards
6.5.4 Molding Process
Figure 6.24: Molding processing system
The preheated molding compound flows under pressure to fill the cavities containing the leadframe strips with their attached ICs.
6.7.1 MultiChip Modules
- multiple dies are mounted on multilayer ceramic packages
- performance increases because the inter-die line length is reduced
Packaging Trends
6.7.2 Comparison of Packaging Alternatives
Features: mature; reliable; low risk
Limitations: low density; speed: <30 MHz; increased PWB complexity; low PWA producibility; requires automated assembly equipment; cost; test and burn-in of bare chips required; environmental protection of bare die; TCE effects of coatings and/or PWBs; difficult repairability; available 1995-1999; defect density of wafers requires redundancy; thermal management; TCE effects; vibration/shock environments; no repairability
Chip-on-Board (COB)
Features: increased density; speed: 30...100 MHz; average PWB complexity; good PWA producibility; good middle ground between MCMs and MWSI; high density; speed: GHz range; extreme density; speed: high GHz range; potential for low cost; simplicity (once fabrication processes are fully developed)
CAD Tools
Chapter 7
The following list shows some important CAD tools used for the design of integrated circuits:
- graphics editor (drawing schematic diagrams, physical layout, stick layout diagrams, ...; also used for displaying results from simulations, layout verifications (like design rule checks), placement and routing, ...)
- language-based circuit capture tools (for hardware description languages like VHDL, Verilog, EDIF, ...)
- physical design verification tools (design rule checker, extractor, LVS, schematic and electrical rule checker, ...)
- simulation tools (analog simulation: circuit level; digital simulation: circuit level, switch level, logic level, register transfer level, architectural level, behavioural level; thermal simulation: displaying heat dissipation on chip)
- layout compilers (stick2layout, macrocell generators, datapath compilers)
- layout synthesizer, layout compactor
- logic optimizer
- database interfaces (file input/output from/to standardized interchange formats)
- database management (to keep different versions (current, backup 1, ..., backup n) and views (schematic, simulation netlist, stick diagram, physical layout, ...) of a design object in the design database)
With Full Custom Design techniques, the designer is able to individually specify the geometrical layout of the integrated circuit (transistor size [channel length, channel width, shape, ...], transistor placement, wire width, ...). The designer has the option to optimize the layout manually; the most dense layouts can be generated using the full custom design styles.

Hand Crafted Layout
The layout is drawn in the form of rectangles and polygons on different layers using a graphics editor. The designer has to know a large set of process-dependent design rules. The mask layout is generated as drawn on the screen, which gives direct influence on component placement and on important parameters such as W and L of transistors, wire widths, ....

Stick Diagram
The layout is drawn in the form of lines and polygons on different layers using a graphics editor. A stick-to-layout converter, together with a compactor and a description of the process design rules, is then used to generate the rectangle-based layout. The designer can draw symbolic layouts that are almost independent of process and design rules. Process adaption is done by the converter/compactor. Converter constraints (cell dimensions, channel widths/lengths of transistors, ...) can be specified.

Geometrical Specification Language
The layout is specified in textual form, giving either the position and layer of rectangles (similar to hand crafted layout) or of lines (as in stick diagrams). Since programming language constructs like parameterized macros (to be used for layout segments such as cells, ...), loops (while, repeat, for, ...), and conditional statements (if, case, ...) may be available, parameterized layouts (e.g. a generic transistor with W and L as parameters, cells for different bit widths, ...) can be described using geometrical specification languages. They are used in a large number of macrocell compilers.
Command        Meaning
B x y dx dy    Box with length dx, width dy, and lower left-hand corner placed at (x, y)
L n            Layout level (layer) for the box definitions that follow
M n            Start of macro definition n
E              End of macro definition
C n x y m      Call for macro number n with translation x, y and orientation m
Q              End of layout file
Table 7.1: Simplified geometrical specification language

Layer   CMOS          NMOS
1       n-diffusion   n-diffusion
2       p-diffusion   ion implant
3       polysilicon   polysilicon
4       metal         metal
5       contact       contact
8       n-well        -
9       overglass     overglass

Figure 7.1: Cell orientations

Orientation   Description
1             no rotation
2             rotate 90° counterclockwise
3             rotate 180° counterclockwise
4             rotate 270° counterclockwise
5             mirror about y-axis
6             rotate 90° counterclockwise and mirror about y-axis
7             rotate 180° counterclockwise and mirror about y-axis
8             rotate 270° counterclockwise and mirror about y-axis
Table 7.3: Rotations of geometry
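To illustrate how such a description can be processed, the following minimal sketch flattens a layout file into a list of boxes. It is not a reference implementation: it assumes one command per line with whitespace-separated integer arguments, assumes that for orientations 6 to 8 the rotation is applied before the mirroring, and uses hypothetical helper names (`orient`, `flatten`).

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int, int]  # (layer, x, y, dx, dy)


def orient(px: int, py: int, m: int) -> Tuple[int, int]:
    """Apply orientation m (1..8) to a point: rotate first, then mirror (assumption)."""
    for _ in range((m - 1) % 4):      # number of 90-degree counterclockwise steps
        px, py = -py, px
    if m > 4:                         # orientations 5..8 add a mirror about the y-axis
        px = -px
    return px, py


def flatten(text: str) -> List[Box]:
    """Expand all macro calls and return the boxes of the top-level layout."""
    macros: Dict[int, List[Box]] = {}
    top: List[Box] = []
    target = top                      # list that currently receives boxes
    layer = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        cmd, args = tokens[0], [int(t) for t in tokens[1:]]
        if cmd == "L":
            layer = args[0]
        elif cmd == "B":
            x, y, dx, dy = args
            target.append((layer, x, y, dx, dy))
        elif cmd == "M":
            target = macros.setdefault(args[0], [])
        elif cmd == "E":
            target = top
        elif cmd == "C":
            n, tx, ty, m = args
            for lay, x, y, dx, dy in list(macros[n]):
                # Two opposite corners suffice for multiples of 90 degrees.
                ax, ay = orient(x, y, m)
                bx, by = orient(x + dx, y + dy, m)
                x0, y0 = min(ax, bx), min(ay, by)
                target.append((lay, x0 + tx, y0 + ty, abs(bx - ax), abs(by - ay)))
        elif cmd == "Q":
            break
    return top


example = "M 1\nL 3\nB 0 0 4 2\nE\nC 1 10 5 1\nC 1 10 5 2\nQ"
print(flatten(example))
```

The example defines macro 1 as a single polysilicon box and calls it twice, once unrotated and once rotated by 90° counterclockwise.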
Figure 7.2: Full custom layout (hand crafted or generated out of a stick diagram resp. a layout description)
[Flow diagram: Symbol Generation, Schematic Entry, Layout Editor, Block Layout, Circuit Simulation, Fabrication]
The Cell Based Design approaches rely on layout components predefined and provided by the silicon foundry. Several implementation styles can be distinguished:

Gate Array
- pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)
- customized interconnect structures (wires in metal 1 and metal 2)
- fixed-size interconnect areas (channels)

Sea of Gate Array
- pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)
- customized interconnect structures (wires in metal 1 and metal 2)
- variable-size interconnect areas (channels) over unused transistors

Standard Cell
- layout blocks predefined by the silicon foundry
- full process sequence for chip fabrication required
[Flow diagram: Specification / Compilation, Compiled Macrocell, Schematic Entry, Routing (Channel Generation, Global Routing, Detailed Routing, P & R Optimization, Design Analysis), Fabrication]
Design Verification
Physical Design Rule Check
Physical design rule checks (DRCs) are performed to guarantee that a layout conforms to the silicon vendor's set of design rules. Design rules are defined between objects on the same layer (minimum width, minimum spacing) as well as between objects on different layers (minimum spacing, overlapping, extension).
[Figure: minimum width, minimum spacing, overlapping, extension]
Design rule violations are usually reported in the physical layout using a graphics editor. Sometimes a table indicating the location and type of each design rule violation can also be generated.
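The two same-layer rules can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how a production DRC tool works: shapes are axis-aligned rectangles, touching or overlapping shapes on one layer are treated as connected (no spacing check), corner-to-corner spacing is measured as Euclidean distance, and the rule values and names (`Rect`, `check`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    layer: str
    x0: int
    y0: int
    x1: int
    y1: int


def gap(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0)
    return (dx * dx + dy * dy) ** 0.5


def check(rects: List[Rect], min_width: Dict[str, int],
          min_spacing: Dict[str, int]) -> List[Tuple]:
    """Return a table of (rule type, offending shapes...) entries."""
    violations: List[Tuple] = []
    for r in rects:
        if min(r.x1 - r.x0, r.y1 - r.y0) < min_width[r.layer]:
            violations.append(("width", r))
    for a, b in combinations(rects, 2):
        if a.layer == b.layer and 0 < gap(a, b) < min_spacing[a.layer]:
            violations.append(("spacing", a, b))
    return violations


shapes = [Rect("metal", 0, 0, 3, 10), Rect("metal", 5, 0, 8, 10), Rect("metal", 20, 0, 21, 10)]
for entry in check(shapes, {"metal": 3}, {"metal": 3}):
    print(entry)
```

The returned list corresponds to the tabular report form mentioned above; a graphical report would instead highlight the listed rectangles in the layout editor.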
7.4.2 Extraction
Circuit Level Extraction: can be used to create a netlist for circuit level simulations (e.g. SPICE, ...). The netlist consists of MOS transistors (including geometrical parameters such as W/L and parasitic capacitances), resistors, capacitances, diodes, ....

Switch Level Extraction: can be used to create a netlist which can be processed by a switch level simulator. The resulting netlist consists of MOS transistors and parasitic capacitances (to model storage effects in MOS circuits).

Parasitics Extraction: is used in conjunction with cell based design techniques. Since wire delay depends on the parasitic capacitance of a wire, the parasitic capacitances of nets and the input capacitances of other gates connected to an output can be used to estimate the extrinsic delays (note: intrinsic delays, i.e. the delays of unloaded gates, are fetched from the cell library's simulation model data).

Schematic Extraction: is executed to generate the connectivity data out of a graphical representation (schematic diagram) of a circuit module. The connectivity data is forwarded to a netlister, which provides the information required e.g. by simulation tools (simulators cannot operate on graphical data; they require netlists in a textual format). This kind of extraction is usually required in pre-layout design specification phases.
Figure 7.7: Example of a design rules set checked during design verication
7.4.3 LVS
The layout-versus-schematic (LVS) comparison tool checks the equivalence of a layout and its schematic. The tool can be used to find wrong connections or parameter mismatches (such as W/L of transistors, ...) between a schematic and its physical layout representation.
7.4.4
To verify schematics used e.g. in cell based designs, a schematic rule checker can find schematic rule violations such as the following:
Warnings:
- unconnected (floating) wire segments
- open outputs
- exceeded fanout
Errors:
- open inputs (undefined input value!)
- number of bits differs for two buses connected together
- number of input/output pins in a schematic differs from its symbol representation (i.e. pins are not accessible / not present at higher levels of the schematic hierarchy)
- more than one active driver connected to a net at the same time
7.5 Simulation
7.5.1 Goal of Simulation
- Validation of the system, logic timing, and electrical behaviour
- Verification of testability aspects
- Software development
7.5.2 Simulator Classification
Level        Primitives                                           Observable values       Timing model
RT           registers, user coded primitives, busses, etc.       bit strings, vectors    discrete time set
Gate         gates                                                bits                    continuous or discrete
Switch       transistors, capacitors                              bits                    continuous or discrete
Electrical   capacitors, resistors, inductors, diodes, etc.       real values             continuous time set
7.5.3 Signal Modelling
- values which exist in real circuits (0, 1, high impedance, oscillation, ...)
- values which exist only in the simulator (unknown, transition, ...)
- the Boolean logic set is not sufficient
7.5.4 Signal States
Problems:
- pessimism of the U-value (for example: circuit initialisation, spikes)
- logic values are often not sufficient (value strength needed)
7.5.5
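[Figure: gate with input x, output y and delay $\tau(n)$]
$$y(t) = x(t - \tau(n)), \qquad \tau(n) = n \cdot \delta$$
- $\delta$: basic time unit
- $\tau(n)$: delay of the gate
- $t_1, t_2, t_3, \ldots$: clock times of the synchronous circuit ($t_{\nu+1} - t_\nu = \Delta t = m \cdot \delta$)
Timing models:
- Zero-Delay: $\tau = 0$
- Unit-Delay: $\tau(n)$ = constant
- Nominal-Delay: $\tau(n)$ = user specified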
7.5.6 Advanced Logic Simulators
Introduction of signal strength in addition to logic values for driver and bus modelling:
A: active, e.g. low impedance driver
P: passive, e.g. high impedance driver (depletion load)
S: storing, e.g. capacitive stored state
X: active indeterminate (e.g. active or storing)
Y: passive indeterminate (e.g. passive or storing)
Z: high impedance

Instead of simple logical values, signals are used for simulation. A signal consists of a logical value and a strength. Logical values = {0, 1, X}; together with the strengths this gives 16 states.

Overview of Signal Combinations
     A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ
A0   A0
A1   AX A1
AX   AX A1 AX
P0   A0 A1 AX P0
P1   A0 A1 AX PX P1
PX   A0 A1 AX PX PX PX
S0   A0 A1 AX P0 P1 PX S0
S1   A0 A1 AX P0 P1 PX SX S1
SX   A0 A1 AX P0 P1 PX SX SX SX
X0   A0 AX AX X0 XX XX X0 XX XX X0
X1   AX A1 AX XX X1 XX XX X1 XX XX X1
XX   AX AX AX XX XX XX XX XX XX XX XX XX
Y0   A0 A1 AX P0 PX PX Y0 YX YX X0 X1 XX Y0
Y1   A0 A1 AX PX P1 PX YX Y1 YX X0 XX XX YX Y1
YX   A0 A1 AX PX PX PX YX YX YX XX XX XX YX YX YX
ZZ   A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ
7.5.7 Simulation Techniques
Compiler driven technique
Problems:
- feedback loops
- sorting of the gate netlist
- zero delay model
- the entire circuit is simulated
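The points listed for the compiler driven technique can be made concrete with a small sketch. It assumes a purely combinational gate netlist given as a dictionary (a hypothetical format), sorts the gates once so that every gate is evaluated after its drivers, and then evaluates the whole circuit with zero delay. A feedback loop makes the sort fail, which is the first problem named above.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

GATES = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOR": lambda v: int(not any(v)),
    "NOT": lambda v: 1 - v[0],
}


def levelize(netlist):
    """Sort gate outputs so that every gate comes after the gates driving it.

    netlist: {output_net: (gate_type, [input_nets])}; raises CycleError on feedback.
    """
    deps = {out: [i for i in ins if i in netlist] for out, (_, ins) in netlist.items()}
    return list(TopologicalSorter(deps).static_order())


def simulate(netlist, order, inputs):
    """Evaluate every gate once in sorted order (zero delay, entire circuit)."""
    values = dict(inputs)
    for net in order:
        kind, ins = netlist[net]
        values[net] = GATES[kind]([values[i] for i in ins])
    return values


netlist = {"y": ("NOT", ["n1"]), "n1": ("NOR", ["a", "b"])}
order = levelize(netlist)
print(order, simulate(netlist, order, {"a": 0, "b": 1})["y"])

latch = {"q": ("NOR", ["r", "qb"]), "qb": ("NOR", ["s", "q"])}
try:
    levelize(latch)
except CycleError:
    print("feedback loop: netlist cannot be sorted")
```

In a real compiled simulator the sorted order would be turned into straight-line code once and then reused for every input vector.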
[Figure: examples of two signals A and B driving a common node C]
- A = A1, B = P0: C = A1 (A stronger than B)
- A = A1, B = A0: C = AX (short circuit)
- A = P0, B = S1: C = P0 (P stronger than S)
- A = X1, B = P0: C = XX
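The cases "A stronger than B", "Short circuit" and "P stronger than S" follow a simple rule that can be sketched in code. The sketch below is a simplification under stated assumptions: it covers only the definite strengths A, P, S and Z with definite logic values 0 and 1 as inputs, and omits the indeterminate strengths X and Y, so it does not reproduce the full 16-state combination table. The function name `resolve` is hypothetical.

```python
STRENGTH = {"A": 3, "P": 2, "S": 1, "Z": 0}  # active > passive > storing > high impedance


def resolve(a: str, b: str) -> str:
    """Combine two signals such as 'A1' and 'P0' that drive the same node."""
    strength_a, value_a = a
    strength_b, value_b = b
    if STRENGTH[strength_a] > STRENGTH[strength_b]:
        return a                      # the stronger driver wins
    if STRENGTH[strength_b] > STRENGTH[strength_a]:
        return b
    if value_a == value_b:
        return a                      # equal strength, same value
    return strength_a + "X"           # equal strength, conflicting values


print(resolve("A1", "P0"))  # A stronger than P
print(resolve("A1", "A0"))  # short circuit
print(resolve("P0", "S1"))  # P stronger than S
```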
7.5.8
Figure 7.9: Example: compiler driven simulation
- no fixed direction of signal flow
- transistor modeled as a switch with three states: open, closed, unknown
- algebraic or RC models
[Figure: switch model of a MOS transistor with terminals Drain, Gate and Source; the switch state depends on the gate logic value (1, 0, X)]
Remarks: Switch transition time is assumed to be zero or some nominal value. Unknown states can cause problems.

[Figure: linear model of a MOS transistor with terminals Drain, Gate and Source and an effective resistance R_EFF; gate logic values 1, 0, X]
Remarks: In the linear model, node capacitance and device resistance are used to compute output logic levels and transition times. Ratio errors can be detected.
Weinberger Structuring
Chapter 8
Weinberger structuring is a structured approach that simplifies physical layout and improves layout density. The method was presented by Weinberger in 1967. Weinberger arrays are created by placing transistors on the chip in a geometrically regular manner. Horizontal and vertical interconnect patterns are used to wire the devices together.
- only one type of gate is used; for example, NOR gates form a complete logic set for nMOS circuits
- the regularity of Weinberger arrays is very suitable for automatic layout generation
Example:
$$F = \overline{A + B + C} = \bar{A} \cdot \bar{B} \cdot \bar{C} \qquad (8.1)$$
Figure 8.1: NOR gate reduction for Weinberger structuring
- empty squares denote input connections
- filled squares denote output connections
Example: Z = U + V + W + X + Y (8.2)
[Figure: logic diagram and Weinberger array with signals U, V, W, X, Y and Z]
Gate matrix layout is a character-based layout style for custom CMOS circuitry. It is a regular design style, employing a matrix of intersecting transistor diffusion rows and polysilicon columns such that the intersections are potential transistor sites.
8.2.1
- representational line drawing or stick figure using the available levels of interconnection (e.g. polysilicon gate technology: polysilicon, metal, diffusion)
- immediately draw a series of parallel poly lines corresponding to the number of inputs to the circuit (there may be more if an output is chosen to be polysilicon)
- subsequent transistor placements are determined by two factors: the input column and the serial or parallel association among transistors
- after row definition, further interconnections may be made with horizontal and vertical metal interconnection tracks
- final improvements
Figure 8.8: Gate matrix layout: (a) schematic (b) layout (c) optimized layout of n part
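[Figure: half adder block HA with inputs A and B]
$$C = AB = \overline{\overline{AB}} \qquad (8.3)$$
$$S = \bar{A}B + A\bar{B} = (\bar{A} + \bar{B})\,B + (\bar{A} + \bar{B})\,A = \overline{AB}\,B + \overline{AB}\,A = \overline{\overline{\overline{AB}\,B} \cdot \overline{\overline{AB}\,A}} \qquad (8.4)$$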
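The NAND-only forms in equations (8.3) and (8.4) can be checked exhaustively. The short sketch below assumes the usual reading of the equations, with the shared term NAND(A, B) feeding both outputs; the function names are hypothetical.

```python
def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def half_adder_nand(a: int, b: int):
    """Half adder built only from two-input NAND gates."""
    n1 = nand(a, b)                          # shared term NOT(A AND B)
    carry = nand(n1, n1)                     # NAND used as inverter: C = A AND B
    total = nand(nand(n1, b), nand(n1, a))   # S = A XOR B
    return carry, total


for a in (0, 1):
    for b in (0, 1):
        assert half_adder_nand(a, b) == (a & b, a ^ b)
        print(a, b, half_adder_nand(a, b))
```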
Figure 8.10: Half adder realizations: (a) standard cell (b) gate matrix
8.2.3
Symbol   Meaning
N        n-channel transistor
P        p-channel transistor
+        metal-poly or metal-diffusion crossover
*        contact
|        polysilicon or n-diffusion wire
!        p-diffusion wire
:        vertical metal
-        horizontal metal
The following rules summarise the gate-matrix technique:
1. Polysilicon runs only in one direction and is of constant width and pitch.
2. Diffusion wires (of constant width) may run vertically between polysilicon columns.
3. Metal may run horizontally and vertically. Any pitch departures from a minimum (e.g. power rails) are manually specified.
4. Transistors can only exist on polysilicon columns. Wide transistors may be specified by abutting two or more N or P symbols.
+ regular design style
+ technology updatable
+ modularity is encouraged by the block nature of the layout style
+ circuit extraction may be done at the symbolic level or at the mask level by conventional circuit extractors
- character symbolic description is not hierarchical
- modules must be assembled in their entirety and pasted together at the mask level
- no freedom to locally optimize geometry, e.g. transistor size
8.3 Optimal CMOS Complex Gate Layout
In MOS circuit design, complex functional cells can be used to achieve better performance. This section discusses the implementation of a random logic function on an array of CMOS transistors. The method was presented by Uehara and van Cleemput in 1981. A graph-theoretical approach to systematic and efficient layout generation minimizes the required chip area.
Figure 8.13: (a) CMOS complex gate schematic and (b) corresponding layout
Figure 8.14: Implementation of an EXOR function: (a) Logic diagram. (b) Circuit. (c) Layout
Figure 8.15: Example of row-based layout scheme
Advantages of complex gate approach:
+ better performance
+ smaller size
In the following, the consideration is limited to AND/OR networks realized in complex gate CMOS by means of series/parallel connections of transistors. The topologies of the nMOS network and the pMOS network are assumed to be dual. The delay of a complex CMOS cell mainly depends on the maximum number of series transistors between VDD or VSS and the cell output, which is called the level of the complex cell. This quantity has a direct influence on the charging or discharging resistance of the cell. Generally, cells with fewer than four levels are desirable. The number of cells with parallel/serial topology is given by the following table:
number of levels   1   2   3    4
number of cells    1   6   80   3434
So it is reasonable to use mainly cells with up to three levels, and only occasionally cells with four levels, in order to get sufficient performance.
Figure 8.16: Alternative complex gate implementation of EXOR function: (a) Logic diagram. (b) Circuit. (c) Layout
Figure 8.17: Basic layout of the functional cell: (a) Logic diagram. (b) Circuit. (c) Graph model. (d) Layout
Layout properties (from Fig. 8.17(d)):
- two rows of transistors, implementing the pMOS and nMOS parts of the circuit
- equal number of transistors in both rows
Figure 8.18: Layout optimization: (a) Diffusion connection of adjacent transistors. (b) Optimal arrangement (reordered input lines)
Fig. 8.18 shows layout improvements for the circuit in Fig. 8.17. If the metal connections between adjacent transistors are replaced by diffusion (the designer should be careful in doing this for high-speed circuits), the layout of Fig. 8.18(a) is obtained. An even more sophisticated layout arrangement, which reduces the required area, is shown in Fig. 8.18(b). The best layout is achieved by the transistor arrangement of Fig. 8.19, which is logically equivalent to the previous figures.
Figure 8.19: Alternative optimal circuit layout: (a) Logic diagram. (b) Circuit. (c) Graph model. (d) Optimal Layout.
A separation is required when there is no connection between physically adjacent transistors. An optimal layout is obtained by reducing the number of separations.
8.3.3
The p-side and the n-side of the circuit can be formulated as graphs, which are defined as follows:
$$G_P = (V_P, E_P) \quad \text{p-side network} \qquad (8.8)$$
$$G_N = (V_N, E_N) \quad \text{n-side network} \qquad (8.9)$$
Graph properties:
- the graphs are series/parallel graphs (CMOS complex gate property/assumption)
- every source/drain potential is represented by a vertex $V$
- every transistor is represented by an edge $E$ connecting the vertices that represent its source and drain
- edges are labeled by the corresponding transistor gate input signal
- $G_P$ and $G_N$ are dual
If two edges $E_i$ and $E_j$ are adjacent in the graph model, then it is possible to place the corresponding gates in physically adjacent positions of an array and hence to connect them by a diffusion area. In order to minimize the number of separations, a set of minimum size of paths has to be found, which correspond to chains of transistors in the array.
Definition 1: An Euler path is a path on a graph that covers every edge of the graph exactly once.
If there exist Euler paths for $G_N$ and $G_P$, then all transistors can be chained by diffusion areas. Otherwise the graphs have to be partitioned into subgraphs which have Euler paths. It is necessary to find a pair of paths for $G_P$ and $G_N$ with the same sequence of labels, because p- and n-type transistors corresponding to the same input have to be positioned at the same horizontal position (poly line).
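Whether a single network graph admits an Euler path at all can be tested cheaply. The sketch below treats an Euler path as a trail that uses every edge exactly once, open or closed (an assumption of this sketch), and applies the standard graph-theoretic criterion: the graph is connected and has zero or two vertices of odd degree. The edge-list format and the function name are hypothetical.

```python
from collections import defaultdict


def has_euler_path(edges):
    """edges: list of (label, u, v) tuples for one network; parallel edges allowed."""
    if not edges:
        return True
    degree = defaultdict(int)
    neighbours = defaultdict(set)
    for _, u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Connectivity check by depth-first search from an arbitrary vertex.
    start = edges[0][1]
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(degree):
        return False
    odd = sum(1 for d in degree.values() if d % 2)
    return odd in (0, 2)


# n-side network of the complex gate NOT(a*b + c): a and b in series, c in parallel.
g_n = [("a", "out", "x"), ("b", "x", "gnd"), ("c", "out", "gnd")]
print(has_euler_path(g_n))

# A star with three branches has four odd-degree vertices and needs a separation.
star = [("a", "n", "p"), ("b", "n", "q"), ("c", "n", "r")]
print(has_euler_path(star))
```

This test only answers the existence question for one graph; it does not yet ensure that the p-side and n-side paths share the same label sequence.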
General algorithm:
1. enumerate all possible decompositions of the graph model to find the minimum number of Euler paths that cover the graph
2. chain the gates by means of a diffusion area according to the order of the edges in each Euler path
3. if more than two Euler paths are necessary to cover the graph model, then provide a separation area between each pair of chains
⇒ The search for the minimal number of Euler paths is NP-complete.
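For very small gates, the requirement of a common label sequence in both graphs can be explored by brute force. The following sketch is only an illustration of the enumeration idea, not the algorithm of the original method: it tries every ordering of the input labels and keeps the first one that is a valid trail in both the n-side and the p-side graph. It only handles the case where a single path covers each graph, and its run time grows factorially with the number of transistors. The graph format and function names are hypothetical.

```python
from itertools import permutations


def is_trail(edges, order):
    """True if the labels in 'order' can be walked as one continuous path.

    edges: {label: (u, v)}; 'ends' tracks every vertex the walk could currently be at.
    """
    ends = set(edges[order[0]])
    for label in order[1:]:
        u, v = edges[label]
        nxt = set()
        if u in ends:
            nxt.add(v)
        if v in ends:
            nxt.add(u)
        if not nxt:
            return False
        ends = nxt
    return True


def common_order(g_n, g_p):
    """Return one label sequence that is an Euler path in both graphs, or None."""
    for order in permutations(g_n):
        if is_trail(g_n, order) and is_trail(g_p, order):
            return list(order)
    return None


# Complex gate NOT(a*b + c): series a, b in parallel with c on the n-side, dual p-side.
g_n = {"a": ("out", "x"), "b": ("x", "gnd"), "c": ("out", "gnd")}
g_p = {"a": ("vdd", "y"), "b": ("vdd", "y"), "c": ("y", "out")}
print(common_order(g_n, g_p))
```

The returned sequence gives the left-to-right order of the polysilicon columns for which both transistor rows can be chained by diffusion without a separation.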
8.3.4 Problem Reduction
Figure 8.20: Reduction of odd numbers of edges
Definition 2: The reduced graph is obtained by iteratively replacing an odd number of series (parallel) edges by a single edge, until no further reduction is possible.
Theorem 1: If there is an Euler path in the reduced graph, then there exists an Euler path in the original graph.
Proof: It is possible to reconstruct an Euler path in the original graph by replacing each edge of the Euler path in the reduced graph by a sequence of the original odd number of edges.
Theorem 2: If the number of inputs to every AND/OR element is odd, then
1. the corresponding graph model has a single Euler path
2. there exists a graph model such that the sequence of edges on an Euler path corresponds to the vertical order of inputs on a planar representation of the logic diagram.
If there are gates in the logic diagram with an even number of inputs, additional pseudo inputs have to be introduced in order to guarantee an odd number of inputs. Theorem 2 guarantees that there exists an Euler path for this modified problem. However, the pseudo edges in the Euler path have to be removed afterwards, and they can then cause diffusion separations. An algorithm for minimizing separations caused by pseudo edges is given in the next section (minimal interlace of normal and pseudo inputs).
Figure 8.21: Application of reduction rule: (a) Logic diagram. (b) Graph model and its reduction. (c) Reconstruction of an Euler path
The heuristic algorithm for generating an Euler path is given by:
1. To every gate with an even number of inputs, a pseudo input is added.
2. Add this new input to the gate such that the planar representation of the logic diagram shows a minimal interlace of pseudo and real inputs. It should be noted that a pseudo input at the top or at the bottom of the logic diagram does not contribute to the separation areas, as shown in Fig. 8.22(b) and (c).
3. Construct the graph model such that the sequence of edges corresponds to the vertical order of inputs on the planar logic diagram.
4. Chain together the gates by means of diffusion areas, as indicated by the sequence of edges on the Euler path. Pseudo edges indicate separation areas.
5. The final circuit topology can be derived by deleting pseudo edges in parallel with other edges and by contracting pseudo edges in series with other edges.
This heuristic algorithm does not necessarily give the optimal layout, but if the resulting sequence has no separation areas, it is a truly optimal solution.
Figure 8.22: Application of the heuristic algorithm: (a) New inputs p1 and p2 are added. (b) Optimal sequence of inputs without the interlace of p1 or p2. (c) Circuit with the dual path {p1,2,3,1,4,5,p2}
8.3.5
Figure 8.25: Carry look-ahead circuit (this representation has no Euler path)
Figure 8.26: Alternative topology for carry look-ahead circuit (with possibility of constructing an Euler path)
Figure 8.27: Comparison of space: (a) Functional cell realization. (b) Conventional NAND realization
A programmable logic array (PLA) maps a set of Boolean functions in canonical, two-level sum-of-products form into a geometrical structure. A PLA consists of an AND-plane and an OR-plane. For every input variable in the Boolean equations, there is an input signal to the AND-plane. The AND-plane produces a set of product terms by performing an AND operation. The OR-plane generates output signals by performing an OR operation on the product terms fed by the AND-plane.
Figure 8.31: AND-OR-PLA
- PLA: AND array and OR array programmable; product term sharing: every product term of the AND array can be connected to any of the OR output gates
- PAL: AND array is programmable and OR array has fixed connection points (OR gates)
- PROM: AND array hardwired, OR array programmable (the set of all possible product terms is realized)
Example:
x0 x1 x2 | z0 z1
 0  0  0 |  1  1
 0  0  1 |  1  1
 0  1  0 |  0  0
 0  1  1 |  0  0
 1  0  0 |  0  0
 1  0  1 |  0  0
 1  1  0 |  1  0
 1  1  1 |  0  1
Here:
- the PROM implementation realizes all 8 product terms
- the PLA implementation needs only 3 terms
[Figure: PLA for the example, with AND plane inputs x0, x1, x2 and OR plane outputs z0, z1]
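The claim that three product terms suffice can be checked with a small PLA model. The sketch below is an illustration under assumptions: the three cubes are one possible choice derived from the truth table of the example (they are not taken from the figure), "X" marks a don't-care input, and the personality format and function names are hypothetical.

```python
# Personality matrix: (AND-plane cube over x0 x1 x2, OR-plane connections to z0 z1).
PLA = [("00X", "11"), ("110", "10"), ("111", "01")]

TRUTH = {
    "000": "11", "001": "11", "010": "00", "011": "00",
    "100": "00", "101": "00", "110": "10", "111": "01",
}


def matches(cube: str, bits: str) -> bool:
    """AND plane: a product term is active if every non-X literal agrees."""
    return all(c in ("X", b) for c, b in zip(cube, bits))


def pla_eval(pla, bits: str) -> str:
    """OR plane: each output is the OR of the active product terms connected to it."""
    outputs = [0] * len(pla[0][1])
    for cube, connections in pla:
        if matches(cube, bits):
            outputs = [o | int(k) for o, k in zip(outputs, connections)]
    return "".join(str(o) for o in outputs)


for x, z in TRUTH.items():
    assert pla_eval(PLA, x) == z
print("3 product terms reproduce the truth table")
```

A PROM model would instead contain one fully specified cube for each of the eight input combinations.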
Figure 8.34: PLA generic floor plan
A     AND plane programming cell
O     OR plane programming cell
AO    AND-OR communication cell
IN    AND plane input cell
OUT   OR plane output cell
LA    Left AND plane cell
RO    Right OR plane cell
BL    Bottom left cell
BM    Bottom middle cell
BR    Bottom right cell
TL    Top left cell
TA    Top AND cell
TM    Top middle cell
TO    Top OR cell
TR    Top right cell
nMOS PLA: pull-up network realized by a single nMOS depletion transistor
Pseudo-nMOS PLA: pull-up by a high-resistance pMOS transistor with permanently grounded gate input
Since the AND-OR structure is not suited to MOS circuit technology, both the AND and the OR plane are implemented using distributed NOR or NAND gate structures based on De Morgan's law:
1. INV-NOR-NOR-INV structure:
$$a\,b + c\,d = \overline{\overline{\overline{(\bar{a} + \bar{b})} + \overline{(\bar{c} + \bar{d})}}} \qquad (8.13)$$
Reading from the inside out: input inverters (INV), first NOR plane, second NOR plane, output inverter (INV).
Figure 8.35: NOR-NOR PLA structure
- high static power dissipation
- small area
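Figure 8.38: Stick diagram of nMOS PLA
- useful if high speed is not required
2. NAND-NAND structure:
$$ab + cd = \overline{\overline{ab + cd}} = \overline{\overline{(a\,b)} \cdot \overline{(c\,d)}} \qquad (8.16)$$
Example:
$$z_0 = \bar{x}_0 \bar{x}_1 + x_0 x_1 \bar{x}_2 \qquad (8.17)$$
$$z_0 = \overline{\overline{(\bar{x}_0 \bar{x}_1)} \cdot \overline{(x_0 x_1 \bar{x}_2)}} \qquad (8.18)$$
Properties:
- NAND-NAND approach not recommended: performance decreases with an increasing number of inputs (because of the series connection of nMOS transistors)
- high static power dissipation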
NOR gates with a large number of inputs should be avoided in CMOS, because the p-channel devices are in series. Static CMOS PLAs are usually realized with a NAND-INV-INV-NAND structure in order to avoid long chains of pMOS transistors. Properties:
- no static power dissipation
- area increase becomes unacceptable for large PLAs
- fast operation
8.5.4
Figure 8.40: CMOS PLA layout
$\phi_1 = 1$:
- no path to ground
- inputs change
- both NOR planes are precharged
$\phi_1 = 0$:
- first NOR plane discharges
- dummy: worst-case discharge (prevents the second NOR plane from discharging)
- after the first NOR plane, the second NOR plane evaluates
Figure 8.41: Dynamic 2-phase PLA circuit 2 is used to latch the second stage intermediate clock is required to precharge OR plane: this is as mentioned above generated by the cells TL, TA and TM. This uses a dummy product row that discharges at the worst case rate according to the loading of the and array
8.5.5
Noise in PLAs
- in dynamic PLAs: noise problems on switched supply lines
- the discharging current generates transients in the power supply bus
- to reduce noise: ground the PLA locally; use metal lines for the power supply whenever possible (reduced impedance)
8.5.6
Optimization of PLAs
Figure 8.42: Noise problem in dynamic PLAs

Logic Minimization

- optimization (minimization) of the Boolean equations in order to reduce the number of minterms or literals
- if a term is needed in both positive and negative form, a reduction can sometimes be achieved using negative logic

Example:

$$z = \bar x_1 + x_0 x_1 x_2 + \bar x_0 x_1 \bar x_2 \qquad \text{(3 product terms)}$$

$$\bar z = \overline{\bar x_1 + x_0 x_1 x_2 + \bar x_0 x_1 \bar x_2}$$
$$= x_1 \cdot \overline{(x_0 x_1 x_2)} \cdot \overline{(\bar x_0 x_1 \bar x_2)}$$
$$= x_1 (\bar x_0 + \bar x_1 + \bar x_2)(x_0 + \bar x_1 + x_2)$$
$$= (x_1 \bar x_0 + x_1 \bar x_2)(x_0 + \bar x_1 + x_2)$$
$$= \bar x_0 x_1 x_2 + x_0 x_1 \bar x_2 \qquad \text{(2 product terms)}$$

- decoder in front of the AND plane to generate combined input variables
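The negative-logic example can be checked with a truth table. The sketch below is illustrative only: the complemented literals are an assumption (the overbars of the example are not legible in this copy), chosen so that every step of the derivation is consistent. It confirms that a function with three product terms can have a complement with only two.

```python
from itertools import product

def z(x0, x1, x2):
    # Assumed polarity: z = ~x1 + x0*x1*x2 + ~x0*x1*~x2 (three product terms).
    return int((not x1) or (x0 and x1 and x2) or ((not x0) and x1 and (not x2)))

def z_bar(x0, x1, x2):
    # Negative-logic form: ~z = ~x0*x1*x2 + x0*x1*~x2 (two product terms).
    return int(((not x0) and x1 and x2) or (x0 and x1 and (not x2)))

# z_bar must be the exact complement of z on all eight input combinations.
complementary = all(z(*v) == 1 - z_bar(*v) for v in product((0, 1), repeat=3))
print(complementary)
```

Realizing the complement and inverting at the output saves one product row in the AND plane in this case.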
Folding
Figure 8.45: Row-folded PLA

An advantage of multiple-sided access and folding is the decreased layout area; however, the layout structure changes and the wiring becomes more difficult.
8.5.7
Delay is determined by
- (W/L) of the AND/OR load
- (W/L) of the AND/OR cells

Minimum delay:
- large load current $I_{load}$
- $(W/L)_{\text{OR plane}} = e \cdot (W/L)_{\text{AND plane}}$

Limitations:
- $I_{load}$ is limited by
  - the total power of the PLA
  - the internal logical 0: $(I \cdot R_{nMOS} = \text{"0"}) < V_T$
- the stage sizing factor e for successive stages cannot always be realized due to the floor plan
[Figure: PLA generation flow. Logical optimization yields a truth table (= matrix); the floor planner combines it with library cells (input/output buffer, clock driver, VDD/VSS cells, Schmitt trigger, ...) to produce the structure of the PLA.]
Truth table matrix: optimized intermediate result

Inputs   Outputs
11X      10
1X1      10
X11      10
100      01
010      01
001      01
111      01
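The matrix can be read as a PLA personality: each row is one product term of the AND plane (X means the input is not connected) together with the outputs it drives in the OR plane. The following is a small sketch of that reading; the function names are illustrative and not taken from any PLA generator. As an observation that is not stated in the notes, the seven rows happen to produce the carry and sum bits of three input bits.

```python
from itertools import product

# Personality matrix: (AND-plane cube, OR-plane row), copied from the table.
ROWS = [
    ("11X", "10"), ("1X1", "10"), ("X11", "10"),
    ("100", "01"), ("010", "01"), ("001", "01"), ("111", "01"),
]

def cube_matches(cube, bits):
    """A product term fires when every non-X position equals the input bit."""
    return all(c == "X" or c == b for c, b in zip(cube, bits))

def pla_eval(bits):
    """OR together the output rows of all product terms that fire."""
    outputs = [0] * len(ROWS[0][1])
    for cube, out_row in ROWS:
        if cube_matches(cube, bits):
            outputs = [o | int(r) for o, r in zip(outputs, out_row)]
    return tuple(outputs)

for bits in product("01", repeat=3):
    word = "".join(bits)
    print(word, pla_eval(word))
```

The same two functions could serve as a quick software check of a matrix before it is handed to a floor planner.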
Finite-State Machine

A typical digital circuit architecture for computation-intensive applications consists of a data-path and a controller. The data-path is formed by a number of arithmetic units such as adders, ALUs and multipliers, connected through a network of connections, busses, multiplexors and registers. Registers are required to separate computational stages from each other (to synchronize computations) or to feed back data for further arithmetic operations (to break up circuit loops). However, no circuit can be realized through a data-path alone, since this part of the circuit has to be controlled in order to perform actual computations. Signals are required, for example, to select the functionality of an ALU, to steer data through multiplexors to a dedicated input of an arithmetic unit, or to control the reading of values into registers. Those signals are provided by a control unit, or controller for short. To support a hierarchical design approach, data-path and controller are always regarded separately, as shown in figure 8.48. The control section provides the control signals required for data-path control and, in turn, reads status information such as overflow flags or comparator results (to control loop execution etc.).

Figure 8.48: Datapath and controller block

A typical control task is the instruction set execution of standard microprocessors. In simplified form, the controller can work in the following way:

step 1: initialize processor
step 2: fetch instruction (address)
step 3: decode instruction
  instr 1 (add):
    step 4: load operand 1
    step 5: load operand 2
    step 6: add op1 op2
    step 7: store result
  instr 2 (move):
    step 4: load operand
    step 5: store operand
  instr 3 ...
  ...
step n: address = address + 1
step n+1: goto step 2

Different steps are required to fetch, decode and execute an instruction. Depending on the decoding of the instruction, a dedicated sequence of steps is executed. During each step an output vector is produced to control the data-path (e.g. to switch a multiplexor to select a certain operand, or to determine the ALU operation to be performed on the operands). In this example the controller also receives signals from the data-path, such as the instruction decoding information in step 3, in order to branch into the corresponding instruction execution sequence.

The question now arises how such a controller can be specified and designed. Combinational circuit specification through Boolean equations provides a good model for the behaviour of memoryless digital circuits. However, a controller realization obviously cannot be memoryless, because it passes through a sequence of steps which will generally be influenced by signals read from the data-path. During each step an output vector has to be produced to control the data-path. A controller can therefore be regarded as a black box with an input and an output vector, where the values of the output vector depend on the current step. A certain step is reached through a sequence of preceding steps, which means that the value of an output vector depends on the history of the circuit. Such behaviour is only possible when memory is available. Synchronous digital circuits which comprise memory elements are called sequential circuits, since the results produced at the primary outputs generally depend on the values at the primary inputs and on the history of the circuit. History in this context means the values of all registers in the current step (or state), which received those values before the current clock cycle.
Therefore, during its operation the circuit runs through a sequence of states represented by the register contents. Each sequential circuit can be represented in the way depicted for the controller on the left side of figure 8.48, if all registers are collected into the state register, all combinational logic producing the contents of the registers into the next-state logic, and all combinational logic producing the primary output values into the output function. Because of the memory, combinational circuit theory is not a well-suited model for the description of controllers or any other sequential logic. Since a controller can be regarded as a special application of sequential logic (and one is interested in a general approach that copes with all sequential logic circuits), the more general term sequential logic is investigated in the remainder of this section.

Figure 8.49 shows a small example of a sequential circuit. Although it is possible in principle to replace the registers by corresponding combinational circuits and to open the feedback loop so that combinational circuit theory can be applied, a more abstract description of behaviour is desirable. This is especially true for complex controllers, where a designer does not want to be concerned with too many circuit details. Fortunately, the theory of finite state machines provides an abstract basis for the modelling of sequential logic.
8.6.1
In this section we will show how a sequential circuit can be seen as one of several possible implementations of a particular finite state machine (FSM). Each FSM has a finite set of discrete states, a finite set of digital inputs and outputs, and a set of rules that govern its behaviour. An FSM operates in discrete time, so its behaviour can be characterized as a sequence of steps that occur at regular intervals (all registers are clocked synchronously). An FSM's inputs, outputs and state are assumed to be constant during each interval, changing only at the boundaries between consecutive intervals (the registers are triggered by the rising or falling clock edge). In summary, an FSM is defined in the following way:

A finite state machine is a digital device having
- a finite set of states S1, S2, ..., Sk (where k is the number of states); optionally one of these, SI, is distinguished as the initial state of the FSM
- a finite number of binary inputs I1, I2, ..., Im (where m is the number of inputs)
- a finite number of binary outputs O1, O2, ..., On (where n is the number of outputs)
- a set of state-transition rules specifying, for each choice of current state Ss and input values I1, I2, ..., Im, a next state Ss'
- a set of output rules specifying, for each choice of current state Ss and input values I1, I2, ..., Im, the binary value at each output

One distinguishes between two types of finite state machines, namely the Moore machine and the Mealy machine. The two types differ in the last item of the list above. In a Moore machine the output rules are such that the outputs are functions of the current state only. In figure 8.48 this would mean that the control inputs go only into the next-state function block and not into the output function block. The alternative Mealy machine model allows outputs to reflect the current inputs as well as the current state. Therefore, figure 8.48 represents a Mealy machine.
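The definition above maps naturally onto a small data structure. The sketch below is one possible rendering, not a prescribed one: the class name `FSM`, its fields and the two example machines are hypothetical. Both machines report whether the input is 1; the Mealy version needs one state and reacts in the same interval, while the Moore version needs two states and reacts one interval later.

```python
from dataclasses import dataclass

@dataclass
class FSM:
    initial: str
    next_state: dict      # (state, input) -> next state
    output: dict          # Moore: state -> output; Mealy: (state, input) -> output
    mealy: bool = False

    def run(self, inputs):
        """Return the output produced in each clock interval."""
        state, outputs = self.initial, []
        for x in inputs:
            key = (state, x) if self.mealy else state
            outputs.append(self.output[key])
            state = self.next_state[(state, x)]   # taken at the interval boundary
        return outputs

# Hypothetical Mealy machine: a single state, output follows the input directly.
mealy = FSM("S", {("S", 0): "S", ("S", 1): "S"},
            {("S", 0): 0, ("S", 1): 1}, mealy=True)

# Hypothetical Moore machine: the output depends on the state only.
moore = FSM("LOW",
            {("LOW", 0): "LOW", ("LOW", 1): "HIGH",
             ("HIGH", 0): "LOW", ("HIGH", 1): "HIGH"},
            {"LOW": 0, "HIGH": 1})

stimulus = [0, 1, 1, 0]
print(mealy.run(stimulus), moore.run(stimulus))
```

The Moore output sequence is the Mealy sequence shifted by one clock interval, which is the timing difference between the two models.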
The behaviour of every FSM can be described using either model, although the number of states and the timing details will generally differ. The Moore machine has some advantages for theoretical reasoning and is therefore generally used in proofs; the Mealy machine, however, is preferred in actual circuit implementations, since it generally requires fewer states (which means less logic for its realization) and can respond immediately to changes of the input vector (a Moore machine first has to branch into a new state, since its output values depend only on the state information). Practical FSM implementations typically have a reset input, which returns the FSM to a well-defined initial state, so that the automaton can be reset before a new input sequence is applied (e.g. when the system containing the FSM is turned on).

Returning to the circuit of figure 8.49, one can identify the discrete states by tabulating combinations of values for its state variables. If q0 and q1 denote the values of the state variables in the current state, and n0 and n1 the values in the succeeding state, the following equations describe this circuit:

$$n_0 = in \cdot \bar q_1, \qquad n_1 = q_0, \qquad out = q_1 q_0$$

The state-transition and output rules are shown in the truth table of table 8.1, which lists all possible combinations of current state and input variables on the left side, and the next state which the machine should enter on the right side, along with the corresponding output.

Table 8.1: State-transition truth table

Current state (q1 q0)   Input   Next state (n1 n0)   Output
00                      0       00                   0
00                      1       01                   0
01                      0       10                   0
01                      1       11                   0
10                      0       00                   0
10                      1       00                   0
11                      0       10                   1
11                      1       10                   1

These tables can easily be obtained from the implementation of the FSM. For example, if in the circuit above q1 = 0, q0 = 0 and in = 0, then the next state is q1 = 0, q0 = 0. If in = 1, the next state is q1 = 0, q0 = 1. The state-transition table immediately suggests a ROM implementation of the FSM, with the left-hand side of the table forming the ROM address and the right-hand columns the data outputs.

The final and most abstract representation for a finite-state machine is a state-transition diagram. In such a diagram, states are shown as circles. Outputs associated with a state are given inside the circle. Transitions between states are represented as directed arcs from one circle to another. The input combination that causes a given transition is written along the arc. Since we are dealing with clocked sequential machines, transitions only occur on clock edges, and for this reason the clock is not explicitly shown on state-transition diagrams. Figure 8.50 gives the state-transition diagram for the FSM discussed above.
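The ROM view of table 8.1 can be sketched in a few lines. The example below assumes that the equations of the circuit are n0 = in AND NOT q1, n1 = q0 and out = q1 AND q0 (the reading that reproduces every row of table 8.1); the names `ROM` and `run` are illustrative.

```python
def next_state_and_output(q1, q0, inp):
    """Assumed circuit equations: n0 = in AND NOT q1, n1 = q0, out = q1 AND q0."""
    n0 = inp & (1 - q1)
    n1 = q0
    out = q1 & q0
    return n1, n0, out

# ROM contents: address = (q1, q0, in), data = (n1, n0, out).
ROM = {
    (q1, q0, inp): next_state_and_output(q1, q0, inp)
    for q1 in (0, 1) for q0 in (0, 1) for inp in (0, 1)
}

def run(inputs, state=(0, 0)):
    """Clock the ROM-based FSM once per input bit; return the output sequence."""
    outputs = []
    for inp in inputs:
        n1, n0, out = ROM[(*state, inp)]
        outputs.append(out)
        state = (n1, n0)          # the state register loads the ROM data
    return outputs

for address, data in sorted(ROM.items()):
    print(address, "->", data)
```

Printing the dictionary row by row reproduces the truth table, and `run` plays the role of the state register feeding the ROM address lines.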
8.6.2
The realization of clocked sequential circuits is a fairly straightforward process with four main steps. The first step is to draw a state-transition diagram for the FSM. This is often a very difficult step, since it requires thinking very precisely about what the FSM is supposed to do.

Next, determine the number of state variables (and therefore registers) from the number of states in the state-transition diagram and assign a binary encoding to each state. This assignment can be made arbitrarily, but an arbitrary choice may result in an inefficient solution: the state assignment strongly affects the amount of combinational circuitry required to implement the FSM. Unfortunately, finding an optimal assignment is NP-hard, which means that it is suspected to require computation time that grows exponentially with the problem size. The importance of an appropriate state encoding is illustrated at the end of this subsection.

Then, based on the state-transition diagram, a state-transition table has to be built. It is important that the table covers all possible input combinations for each possible state (if a combination does not occur, don't cares should be inserted, which can be exploited during combinational logic minimization). From the table, the circuit can be implemented directly with ROMs. If another implementation is required (logic gates, for example), Karnaugh maps have to be developed from the state-transition table for each next-state variable.

Finally, a reduced sum-of-products expression has to be found for each of them, which can be implemented through appropriate combinational logic.

To illustrate these steps, consider the design of a simple FSM whose single output goes high every five clock periods and remains high for one clock period. The frequency of the output pulses is one-fifth that of the clock. This type of circuit is called a divide-by-5 counter. The machine has no external inputs. Its state-transition diagram is shown in figure 8.51.
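As a rough illustration of the divide-by-5 counter, the sketch below steps through five states and raises the output in the last one. The 3-bit codes 000 to 100 are one possible assignment, and the dictionary and function names are hypothetical.

```python
# One possible 3-bit state assignment; the three unused codes are left out
# (they would be don't cares in a gate-level design).
NEXT = {0b000: 0b001, 0b001: 0b010, 0b010: 0b011, 0b011: 0b100, 0b100: 0b000}
OUT = {0b000: 0, 0b001: 0, 0b010: 0, 0b011: 0, 0b100: 1}

def divide_by_5(clock_ticks, state=0b000):
    """Return the output value seen during each clock period."""
    outputs = []
    for _ in range(clock_ticks):
        outputs.append(OUT[state])
        state = NEXT[state]       # no external inputs: the next state is fixed
    return outputs

pulses = divide_by_5(15)
print(pulses)
```

Over 15 clock periods the output is high three times, once in every fifth period.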
A state assignment and a state-transition table for this counter are given in table 8.2. Note that both the choice of 3 bits for the state encoding and the actual encoding of each state were made arbitrarily; a larger number of bits or another encoding could have been selected. The table can now be realized through e.g. ROMs or explicit combinational logic (two-level or multilevel gates, PLA etc.).

Figure 8.51: State-transition diagram for the divide-by-5 counter

Table 8.2: Transition table and output table of the divide-by-5 counter

Current state   Encoding   Next state   Output
A               000        001          0
B               001        010          0
C               010        011          0
D               011        100          0
E               100        000          1

Figure 8.52 shows another example, which will be used in the following to illustrate the importance of state encoding.

Figure 8.52: State-transition diagram of an arbitrary FSM

The behaviour of the FSM is given in transition table 8.3. State encoding is the process of assigning a unique bit vector to each state of the FSM; for example, the following two encodings can be selected:
Encoding 1: S0 = 00, S1 = 01, S2 = 11
Encoding 2: S0 = 00, S1 = 11, S2 = 01

Table 8.3: State-transition truth table

Current state (q1 q0)   Input   Next state (n1 n0)   Output
S0                      0       S0                   0
S0                      1       S1                   0
S1                      0       S2                   0
S1                      1       S1                   0
S2                      0       S0                   0
S2                      1       S1                   1

There are s possible encodings with

$$s = \frac{k!}{(k - m)!}$$

where $k = 2^n$ (n is the number of selected state bits) and m is the number of states to be encoded. Typically n is chosen as $n = \lceil \log_2 m \rceil$. However, other values are possible for n, e.g. one bit per state. In the example above, $k = 2^2 = 4$ and m = 3, so the number of possible encodings is $4!/(4 - 3)! = 24$. Different encodings result in realizations of different complexity.

In the following expressions a denotes the input and b, c denote the state bits q1, q0; the unused state code is treated as producing 0.

The first state encoding was S0 = 00, S1 = 01, S2 = 11. The corresponding output function is

$$out = abc$$

resulting in the state-transition functions

$$y_1 = \bar a \bar b c$$
$$y_2 = a \bar b + \bar b c + ac$$

The number of product terms is 5 and the number of literals 12. A resulting hardware implementation using combinational logic is shown in figure 8.53.

Figure 8.53: FSM-realization for first encoding scheme

The second encoding was S0 = 00, S1 = 11, S2 = 01. The corresponding output function is

$$out = a \bar b c$$

and the state-transition functions are

$$y_1 = a \bar b + ac$$
$$y_2 = a \bar b + bc$$

Since the product term $a \bar b$ is shared, the number of distinct product terms and literals is now 4 and 9, respectively. Figure 8.54 shows a corresponding realization. As one can see, state encoding is crucial for the efficiency of the final solution. Unfortunately, no algorithm with polynomially bounded complexity is known for finding an optimal assignment. A good heuristic is to select an encoding in which only one bit changes when sequencing from state to state (Gray code). Another good approach can be one-hot encoding (where a single bit represents each state), which is of course restricted to a small number of states.
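Both the encoding count and the sum-of-products functions can be checked mechanically against table 8.3. The sketch below is illustrative only and rests on stated assumptions: a is the input, (b, c) are the state bits (q1, q0), the unused code maps to all zeros rather than being a don't care, and the complemented literals are the ones that reproduce the table. All function names are hypothetical.

```python
from itertools import product
from math import ceil, factorial, log2

# Table 8.3 as (state, input) -> (next state, output).
TABLE = {
    ("S0", 0): ("S0", 0), ("S0", 1): ("S1", 0),
    ("S1", 0): ("S2", 0), ("S1", 1): ("S1", 0),
    ("S2", 0): ("S0", 0), ("S2", 1): ("S1", 1),
}

def encodings_count(m, n=None):
    """s = k! / (k - m)! with k = 2**n code words for m states."""
    n = ceil(log2(m)) if n is None else n
    k = 2 ** n
    return factorial(k) // factorial(k - m)

def inv(x):
    """Complement of a single bit."""
    return 1 - x

def check(encoding, out, y1, y2):
    """Compare sum-of-products functions with the encoded table."""
    code_to_state = {code: s for s, code in encoding.items()}
    for a, b, c in product((0, 1), repeat=3):
        state = code_to_state.get((b, c))
        if state is None:
            expected = (0, 0, 0)          # assumption: unused code yields zeros
        else:
            nxt, o = TABLE[(state, a)]
            expected = (*encoding[nxt], o)
        if (y1(a, b, c), y2(a, b, c), out(a, b, c)) != expected:
            return False
    return True

enc1 = {"S0": (0, 0), "S1": (0, 1), "S2": (1, 1)}
enc2 = {"S0": (0, 0), "S1": (1, 1), "S2": (0, 1)}

ok1 = check(enc1,
            out=lambda a, b, c: a & b & c,
            y1=lambda a, b, c: inv(a) & inv(b) & c,
            y2=lambda a, b, c: (a & inv(b)) | (inv(b) & c) | (a & c))
ok2 = check(enc2,
            out=lambda a, b, c: a & inv(b) & c,
            y1=lambda a, b, c: (a & inv(b)) | (a & c),
            y2=lambda a, b, c: (a & inv(b)) | (b & c))
print(encodings_count(3), ok1, ok2)
```

Calling `encodings_count(3, n=3)` gives the count for a three-bit encoding such as one-hot, which shows how quickly the search space grows with the number of state bits.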
8.6.3
Figure 8.54: FSM-realization for second encoding scheme

Although it is possible to base FSM realizations on self-timed or other timing disciplines, most FSM implementations are based on a synchronous, single-clock scheme. As already mentioned in connection with figure 8.48, a general sketch of an implementation strategy using the Moore machine model (outputs are functions only of the current state, independent of the current inputs) is shown in figure 8.55. One should note the use of a clocked register to hold the current state information. All other blocks are combinational logic components which can be realized in different ways (PLA, ROM, dedicated logic circuits etc.).

Figure 8.55: FSM (Moore automaton) implementation

The inputs of such a circuit have to be synchronous with the FSM's clock, because all outputs of the next-state logic have to settle before the values are loaded into the registers at the rising clock edge. In the case of asynchronous transitions, invalid values might be loaded or metastable states of the registers might be triggered. Asynchronous inputs can be treated as shown in figure 8.56. The synchronisation through additional clocked registers guarantees that the inputs to the state register are stable at each active clock edge, assuming of course that the propagation delay along the combinational path through the logic, plus the setup time of the state registers, is shorter than the clock period.

Figure 8.56: Treatment of asynchronous inputs in a Moore machine

Moreover, although metastable behaviour of the input register remains a possibility, it has a clock period (minus the next-state propagation delay) to become valid before it corrupts the contents of the state register. Thus, for sufficiently long clock periods, this latter design can be made arbitrarily reliable. It is important to recognize that the implementations of figures 8.55 and 8.56 behave slightly differently, owing to the extra clock delay in the inputs of figure 8.56. Given identical next-state logic, identical input sequences will yield output sequences delayed by one clock cycle in the second approach.
8.6.4
Most real digital systems are finite-state machines, yet the view and techniques introduced in this chapter are not appropriate in every circumstance. The binary encoding of an FSM's state allows at most 2^k states to be represented in k bits of state variables, and in general about k flip-flops are required to hold the state of a 2^k-state machine. Adding a single flip-flop to a machine potentially doubles its number of states. This exponential relationship between the number of states and the amount of physical hardware in a sequential circuit makes the FSM model awkward for sequential circuits having more than a few bits of storage. A 10-bit register, for example, has 2^10 = 1024 states and would be quite difficult to characterize by a state-transition diagram; the number of states of a supercomputer is inconceivably large. Typically, such systems are viewed in terms of memory cells and registers, partitioning the enormous state into more tractable units. It is important to recognize that sequential circuits may be viewed either in state or in bit terms, that the two are exponentially related, and that it is often useful to change between these views. The reader should therefore be aware that it makes no sense to apply the FSM model to every type of sequential circuit. However, the FSM model is very well suited to support the design of controllers, since their number of states is reasonably small.
8.6.5 Equivalence of FSMs
The input/output behaviour of two FSMs may be identical even though the machines have different transition and output rules or even different numbers of states. As a degenerate example, consider two single-input FSMs whose output remains constant, independent of their state. From external observations it is impossible to distinguish between the states of such machines: one might have one state and the other nine, yet the machines are externally indistinguishable. We call FSMs equivalent if they are indistinguishable; for all practical purposes, equivalent FSMs are interchangeable. The equivalence of FSMs is therefore important for their construction, since the designer is interested in transforming an initial FSM specification into an equivalent machine which can be realized most efficiently on silicon while meeting all required constraints. It is therefore useful to develop the notion of equivalence together with engineering tools for reducing a specified FSM to a simpler equivalent. The terms state equivalence and FSM equivalence are defined in the following way:

State equivalence: Let s1 and s2 be particular states of FSMs M1 and M2. State s1 of M1 is equivalent to state s2 of M2 if and only if, for every finite sequence of inputs, the outputs resulting from the application of that sequence to M1 in s1 are identical to the outputs resulting from the application of the same sequence to M2 in s2. Thus two states are not equivalent only if there exists a finite input sequence that leads them to produce distinct outputs. The notation M : s will be used to specify state s of machine M.

FSM equivalence: Let s1 and s2 be the initial states of FSMs M1 and M2. Then the machines M1 and M2 are equivalent if and only if M1 : s1 is equivalent to M2 : s2.

Given an FSM that solves some practical problem, one is often interested in finding the smallest equivalent FSM in order to minimize costs.
While several measures of "smallest" might be proposed, a natural candidate (and the usual choice) is the number of FSM states. Thus one seeks to perform a state reduction on a given FSM M1 to yield an equivalent M2 having fewer states. In general, this may be done by detecting and merging equivalent states within M1. For example, one can look for pairs M1 : si and M1 : sj that are equivalent. When such a pair is found, the two states can simply be combined into a single state, yielding an equivalent FSM with one fewer state. This process of looking for equivalent states can be continued in the new FSM and terminates when a pair of equivalent states can no longer be found. This is an example of a relaxation algorithm, in which a set of reduction rules is repeatedly applied to reduce a structure until it can be reduced no more. It begins with a pessimistic but working model of the desired FSM and iteratively improves the cost while maintaining equivalence. This approach has the disadvantage that the equivalence of two states can be difficult to detect.

Rather than incrementally improving an initial pessimistic model, the optimistic relaxation approach begins with the assumption that all of the states of M1 are equivalent (yielding a one-state machine). The relaxation iteratively discovers pairs of presumed equivalent states that cannot in fact be equivalent and grudgingly splits them into their components. This scheme is based on the detection of state nonequivalence through the following two rules:

- If states si and sj have different outputs, then they are nonequivalent
- If, for some input combination v1, v2, ..., vm, state si1 goes to state si2 and state sj1 goes to state sj2, where si2 and sj2 are nonequivalent, then si1 and sj1 are nonequivalent

Beginning with the unrealistic assumption that all states are equivalent, iteration of the above rules will uncover more and more nonequivalent pairs of states until every pair that has not been shown nonequivalent is in fact equivalent. Consider e.g. the FSM diagrammed in figure 8.57.

Figure 8.57: Five-state FSM

The search for a reduced equivalent starts by constructing a truth table for the output and transition rules of a one-state equivalent:

New state                 Transition on 0   Transition on 1   Output
S0 = S1 = S2 = S3 = S4                                        X
In the course of building the table, it has to be checked that each output and next-state value for a merged state is consistent with each of the component states from the original FSM. In this first step, an inconsistency is detected immediately: it is impossible to put a value into the output column for the single combined state that is consistent with all five component states. Thus the aggregate state has to be split into two new states for the next iteration, with output values of 0 and 1. One partitions the five-state aggregate into one state corresponding to the original S0 and S3 states with a 1 output, and a second state corresponding to the original states with a 0 output. Then one attempts to fill out the truth table:

New state       Transition on 0   Transition on 1   Output
S0 = S3         S1 = S4           S0                1
S1 = S2 = S4    S2                X                 0
This time the table can nearly be completed. A single inconsistency is encountered when trying to assign a transition for the S1 = S2 = S4 state on a 1 input: in the original machine, S1 and S4 both go to S3 in this case, while S2 goes to S4. Since the respective next states S3 and S4 are not equivalent, S2 has to be split off into a separate state. This results in:

New state   Transition on 0   Transition on 1   Output
S0 = S3     S1 = S4           S0                1
S1 = S4     S2                S3                0
S2          S2                S4                0
The corresponding state-transition diagram is shown in figure 8.58. The reader might verify that this reduced FSM is equivalent to the original.

Figure 8.58: Reduced equivalent FSM

While simple optimistic relaxation gives optimal reductions in the case of completely specified FSMs, optimal solutions to interesting variations of the FSM reduction problem are known to be computationally intractable. For example, optimal reduction of an incompletely specified FSM (one with don't cares, in the sense that any values are acceptable for certain outputs and/or transitions) is NP-hard. The development of good heuristic approaches to this and related optimization problems remains a topic of research.
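The optimistic relaxation can be written as a short partition-refinement loop. The sketch below is a minimal version under stated assumptions: the five-state machine is hypothetical, since figure 8.57 is not reproduced here, and was chosen only to agree with the facts given in the text (S0 and S3 output 1; on a 1 input S1 and S4 go to S3 while S2 goes to S4). The function name `reduce_states` is illustrative.

```python
def reduce_states(outputs, delta, inputs):
    """Optimistic relaxation: split blocks until no rule applies any more.

    outputs: state -> output, delta: (state, input) -> next state.
    Returns a sorted list of blocks (tuples of mutually equivalent states).
    """
    # Rule 1: states with different outputs are nonequivalent.
    block_of = {s: outputs[s] for s in outputs}
    while True:
        # Rule 2: states whose successors lie in different blocks are nonequivalent.
        signature = {
            s: (block_of[s], tuple(block_of[delta[(s, x)]] for x in inputs))
            for s in outputs
        }
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        if len(ids) == len(set(block_of.values())):
            break                      # no block was split: partition is stable
        block_of = {s: ids[signature[s]] for s in outputs}
    blocks = {}
    for s in sorted(outputs):
        blocks.setdefault(block_of[s], []).append(s)
    return sorted(tuple(b) for b in blocks.values())

# Hypothetical five-state machine consistent with the description in the text.
OUTPUTS = {"S0": 1, "S1": 0, "S2": 0, "S3": 1, "S4": 0}
DELTA = {
    ("S0", 0): "S1", ("S0", 1): "S3",
    ("S1", 0): "S2", ("S1", 1): "S3",
    ("S2", 0): "S2", ("S2", 1): "S4",
    ("S3", 0): "S4", ("S3", 1): "S0",
    ("S4", 0): "S2", ("S4", 1): "S3",
}

reduced = reduce_states(OUTPUTS, DELTA, inputs=(0, 1))
print(reduced)
```

The loop performs the same two splits as the tables: first by output value, then S2 is separated from S1 and S4 because their successors on a 1 input fall into different blocks.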
8.6.6
Regular expressions are a commonly used notation for describing simple classes of strings of symbols. For the purpose of this subsection the following regular-expression syntax for describing strings of uppercase letters will be used:

1. Finite strings of symbols (letters), including the empty string (which will be written as ε), are regular expressions. Thus ε, A, and ABCAABCAAABB are valid regular expressions, each denoting a set containing only the specified string of zero or more letters
2. If p and q are regular expressions, then pq is a regular expression denoting the set of strings formed by concatenating a string from p with a string from q
3. If p and q are regular expressions, then p | q is a regular expression denoting the set of strings that includes both the strings denoted by p and the strings denoted by q. Thus A | B is a regular expression defining a set containing the strings A and B
4. If p is a regular expression, then (p) is a regular expression denoting the same set of strings; parentheses are used to disambiguate, for example to distinguish (AB) | C from A(B | C)
5. If p is a regular expression, then p* is a regular expression denoting all strings that are concatenations of finitely many (zero or more) strings denoted by p. Thus A* denotes the set of strings containing the empty string as well as every string consisting of finitely many As; A(A | B)*B denotes the set of all strings of As and Bs that begin with A and end with B

An interesting property of regular expressions is that each regular expression defines a set of strings that can be recognized by a finite state machine. It is assumed that the input to the FSM is a sequence of symbols (in this case, encoded uppercase letters) and that each consecutive input symbol can cause a transition from the current FSM state to a new state. Whenever the sequence of input symbols read so far corresponds to a string to be recognized, the FSM is in a distinguished state marked R; several states may be marked in this way. The starting state is marked S. The FSM of figure 8.59, for example, recognizes the strings B*(AB)*.

Figure 8.59: Example FSM

Note that transitions corresponding to input strings that are not recognized (such as those containing the letter C) are omitted. The selected convention is that such strings cause implicit transitions to a BAD state, which causes the entire input sequence to be rejected.

Although every regular expression denotes a set of strings recognizable by an FSM, the systematic derivation of an FSM recognizer from a regular expression is not entirely trivial. A useful conceptual tool in dealing with regular expressions is the nondeterministic FSM (NFSM), whose state-transition diagram is ambiguous in the sense that it may indicate several possible transitions on a given input symbol. The simple NFSM in figure 8.60 recognizes the strings A | (AB).

Figure 8.60: Nondeterministic FSM

One can view the NFSM as being in several states simultaneously. Its behaviour can be emulated by hand, using tokens that are moved about on the state-transition diagram to record active states. One begins with a token on the starting state. At each input symbol, tokens are placed on each state at the arrow end of a transition from a marked state, and the previous tokens are removed. Note that at most one token is placed on each state. Whenever one or more states marked R contain a token, the input string is accepted (recognized) by the NFSM.

It is possible to construct a deterministic FSM that recognizes any regular expression, but the construction becomes cumbersome when an expression of the form α | β is encountered. In effect, the FSM under construction must entertain the two alternative forms α and β as possible inputs until some input symbol rules one or both forms out; this may require a number of states, each corresponding to some combination of a tentative parse of form α or an alternative parse of form β. In contrast, the NFSM provides direct accommodation for alternative input forms by means of ambiguous transitions. The dual paths between the S and R states of figure 8.60, for example, correspond directly to the alternative input forms A and AB.

As a further convenience in the construction of NFSMs from regular expressions, the use of transitions on the empty input string is allowed; such transitions are taken spontaneously by the NFSM. In the token model, whenever there is an empty transition from a state marked by a token, the target of the empty transition is marked as well. Figure 8.61 shows how one might use empty transitions, designated by ε, to convert the A | (AB) NFSM, for example, to recognize (A | (AB))*.

Figure 8.61: NFSM that recognizes strings of form (A | (AB))*

Nondeterministic FSMs are, in an important sense, no more powerful than deterministic FSMs: the same set of strings (the ones that can be described by regular expressions) can be recognized by each. NFSMs, however, provide a primitive model for parallelism because of their ability to model several discrete states simultaneously. While NFSMs and FSMs perform the same computations, a deterministic FSM may require exponentially many states compared to the equivalent NFSM.

The nondeterministic FSM, although not directly realizable in hardware, can be an important tool in the synthesis of realizable deterministic FSMs that perform useful computations. The synthesis of an FSM to recognize strings described by the regular expression (A | (AB))*, for example, might be approached by the straightforward synthesis of the NFSM of figure 8.61, followed by the derivation of an equivalent (but less intuitive) deterministic FSM using a computer-based algorithm.
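The token model translates directly into set operations. The sketch below simulates a small NFSM for (A | (AB))* with one empty transition. The state names S, M and R, the edge table and the choice to mark S as accepting (so that the empty string is recognized) are assumptions made for illustration; they need not match figure 8.61.

```python
import re
from itertools import product

# Hypothetical NFSM for (A | (AB))*; edges labelled "" are empty transitions.
EDGES = {
    ("S", "A"): {"R", "M"},   # ambiguous: A on its own, or the A of AB
    ("M", "B"): {"R"},
    ("R", ""): {"S"},         # start over after each recognized piece
}
START, ACCEPT = "S", {"S", "R"}

def closure(tokens):
    """Also place a token on every state reachable through empty transitions."""
    tokens, stack = set(tokens), list(tokens)
    while stack:
        for target in EDGES.get((stack.pop(), ""), ()):
            if target not in tokens:
                tokens.add(target)
                stack.append(target)
    return tokens

def accepts(string):
    """Move tokens symbol by symbol; accept if an accepting state holds a token."""
    tokens = closure({START})
    for symbol in string:
        moved = set()
        for state in tokens:
            moved |= EDGES.get((state, symbol), set())
        tokens = closure(moved)   # previous tokens are removed, new ones placed
    return bool(tokens & ACCEPT)

samples = ["", "A", "AB", "AAB", "ABA", "B", "ABB"]
print([(s, accepts(s)) for s in samples])
```

Each distinct token set that can occur corresponds to one state of an equivalent deterministic FSM, which is where the possible exponential growth in the number of states comes from.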
Finite-State Machine
8.6.7 Context
Finite-state machines are simultaneously a mathematical abstraction that has received considerable attention from theorists and a practical engineering tool of enormous consequence to the designer of digital systems. These roles are not independent; the formal study of FSMs has significantly enriched the repertoire of optimizations and techniques available to the engineer, while their practical significance stimulates continued attention by theorists.
Chapter 9
Data path oriented: microprocessors, DSPs and co-processors
- data operation by a number of functional units interconnected by a word-sized datapath
- functional units: ALU, multiplier, ...

Control-dominated: sequencers, protocol engines
- no arithmetic structures
- no or small data path
- decentralized set of coupled controllers
9.1.4 Synthesis Steps
1. Architectural Synthesis (= Behavioural Synthesis)
   - translation of a source description into a data flow graph
   - scheduling the events in the flow graph
   - allocation of functional units in the machine
   - binding the functional units to real components in a specific technology
2. Logic Synthesis
   - translation of a register-transfer level description of a circuit into combinational logic and registers
   - finite state machine synthesis
   - technology-independent logic optimization
   - mapping the result on a suitable target technology (Gate Arrays, Standard Cells, Sea of Gates, ...)
   - circuit retiming to meet performance requirements
3. Layout Synthesis
   - module generators automatically generate a dense layout of specific modules
   - typical modules: functional units of data paths (ALU, register, shifter, adder, ...)
   - greatest leverage for data path oriented design
   - PLA generators for control logic
   - most useful in the design of application-specific DSPs and generic components such as microprocessors
Gate Arrays
Introduction to Gate Arrays
Gate Arrays (Masterslices):
- Prefabricated active elements (master)
- Construction of logic functions by personalization (wiring macros from a cell library, intra-cell routing)
- Connection of functional blocks by inter-cell routing in 1 ... 3 layers + contact/via layers

Arrangement of gate arrays:
- Row structure
- Island structure
- Matrix of structures (= sea of gates)
- Mixed analog/digital gate arrays
9.3.2 IMI Grid Structure
Figure 9.3: IMI gate array structure

Fig. 9.3 shows the principal structure of the gate arrays of International Microcircuits Inc. (IMI) (single metal layer). The real circuit has 1440 cells; the figure shows a reduced number of 40 cells to improve the clarity of the representation. The gate array consists of the following elements:
- Pad (connection to outside world)
- Buffer devices (drive off-chip load capacitances)
- Distributed power and ground buses
- Underpasses to cross under the power and ground buses without contacting them
- Each point represents a contact (potential interconnection point)

From Fig. 9.4 the following features can be seen:
Figure 9.4: Corner of IMI gate array die
- Cells containing transistors are clustered around the VDD and VSS buses.
- In each cell four horizontal bars (crossing VDD and VSS) can be seen. The thick bar represents a poly underpass, while the three thin bars are common poly input lines to an nMOS/pMOS transistor pair.
- Between cell columns a column of short horizontal poly underpasses is placed.
Figure 9.6: Explanations of grid: (a) basic cell. (b) internal interconnects. (c) basic cell and crossover (poly) block. (d) XR = transistor. (e) crossover block interconnects

In Fig. 9.6 (b) the internal gate lines (long horizontal poly lines) and the internal diffusion lines (short horizontal diffusion lines) are shown. From Fig. 9.6 (d) it can be seen that adjacent nMOS or pMOS transistors have a common drain/source connection. Contacts for the nMOS source and drain connections are at both sides of the VSS bus (the same holds for the pMOS transistors and the VDD bus).
9.3.3 CDI Grid Structure
9.3.4 Gate Array Design Flow
9.3.5 Personalization Examples for IMI and CDI Gate Array
Figure 9.11: Personalization for inverter: (a) schematic. (b),(c) IMI layout. (d) CDI layout
Figure 9.13: Layout of transmission gates: (a) single TG. (b) pair of TGs with common output
9.3.6 Qualification of Gate Array Design Style
Advantages:
- Lower number of individual masks needed
- Higher number of pieces for uncustomized master (cost reduction)
- Many others for masters, second source fabrication, libraries and design systems

Disadvantages:
- Area overhead (by unused transistor cells)
- Overdimensioned routing channels
- Larger cell size

⇒ Advantages dominate for smaller production volumes
9.3.7 Gate Array Market
Standard Cells:
- No prefabrication: all cell layouts from a system library
- Cells in rows: VDD/VSS lines connected by cell abutment; uniform cell height, variable width; I/O connections at top and bottom
- Cell rows alternating with routing channels
- Width of routing channel adaptable to design needs
- Crossing of cells possible: feed-through cells, electrically equivalent pins
9.4.2
Advantages:
- substantial saving of chip area compared to Gate Arrays (typically 40%)
- thereby reduction of fabrication costs per chip
- higher flexibility in cell design

Disadvantages:
- all masks are made individually (high initial cost and turn-around time)
- very complex or large-area functional blocks like RAM, ROM or PLA cannot be inserted

⇒ Advantages dominate with a higher number of pieces (> 10000)
- Rectangular cells, any form and size
- Free cell arrangement
- Wiring channels between the cells
- Width of wiring channels according to routing needs
- Power/ground routing not separated from signal routing

Figure 9.21: Floor plan for macro cell design style (= building block approach)
9.6.2
- Mixed analog/digital macros
- EEPROM cells
- Power components: high-current analog buffer, power MOSFET driver
- ASIC-hybrid combinations
- Subsystem cells: 555 Timer, 4046 PLL, ...
- SC-filter biquad units
- Temperature sensors
PROM (Programmable Read-Only Memory)
Device with fixed AND array and a programmable OR array
1. mask programmable
   + superior speed performance due to internal connections hardwired during manufacture
   + cheap at high volumes
   - can only be programmed by the manufacturer
   - development cycle = weeks or months
2. field programmable
   + immediately programmable
   + at low volumes less expensive than mask-programmable devices
   - resistance of programmable routing switches lowers signal performance

EPROM (Erasable Programmable Read-Only Memory)
EEPROM (Electrically Erasable Programmable Read-Only Memory)
⇒ additional advantage of being erasable and re-programmable
⇒ structures of PROMs are best suited for the implementation of memories

PLA
AND array and OR array programmable
product term sharing: every product term of the AND array can be connected to any of the OR output gates

PAL
AND array is programmable and OR array has fixed connection points
combinational PAL devices: used for implementation of logic functions
sequential PAL devices: used for implementation of sequential logic (finite state machines)
arithmetic PAL devices: sums of product terms may be combined by EXOR gates at the input of the macrocell D flip-flop
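The AND/OR plane structure and product term sharing can be illustrated with a small Python model. This is a sketch under assumptions: the data layout (`and_plane`, `or_plane`) and the full-adder personalization are invented for the example and do not describe a particular device.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a PLA.

    and_plane: one dict per product term, mapping input index -> required value
    or_plane:  one list per output, holding the indices of the connected product terms
    """
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


# Hypothetical personalization: 1-bit full adder with inputs (a, b, cin).
AND_PLANE = [
    {0: 0, 1: 0, 2: 1},  # term 0: a' b' cin
    {0: 0, 1: 1, 2: 0},  # term 1: a' b  cin'
    {0: 1, 1: 0, 2: 0},  # term 2: a  b' cin'
    {0: 1, 1: 1, 2: 1},  # term 3: a  b  cin   (shared by both outputs)
    {0: 1, 1: 1, 2: 0},  # term 4: a  b  cin'
    {0: 1, 1: 0, 2: 1},  # term 5: a  b' cin
    {0: 0, 1: 1, 2: 1},  # term 6: a' b  cin
]
OR_PLANE = [
    [0, 1, 2, 3],  # output 0: sum
    [3, 4, 5, 6],  # output 1: carry
]
```

Product term 3 is connected to both OR gates, which is the sharing that a PLA allows. In a PAL the `or_plane` connections would be fixed by the manufacturer and only `and_plane` would be programmable.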
EPLD (Erasable Programmable Logic Devices)
EEPLD (Electrically Erasable Programmable Logic Devices)
⇒ these devices use EPROM cells or EEPROM cells instead of fuses as programmable connections
⇒ tendency: instead of large global logic planes, a block-oriented architecture with local logic blocks and macrocells and an interconnection network between the blocks is used

Example: Altera EP1800
- each EP1800 quadrant contains 12 macrocells and has a local bus with 24 lines (for normal and inverted macrocell outputs) and a local clock
- the global bus has 64 lines and runs through all four quadrants: true and complement signals of 12 inputs (= 24 lines) + true and complement of 4 clocks (= 8 lines) + true and complement of the I/O pins of the 4 global macrocells in each quadrant (= 32 lines)
- macrocells: combinational or registered data output; the flip-flop is configurable (D, T, JK or SR)
9.7.3
1. Easy to map Espresso/MIS style logic into sum of products
2. Easy to route, very fast turnaround
3. Performance independent of netlist
4. Wide designer acceptance
5. Relatively mature technology, but some innovation still ongoing
Figure 9.33: Principal FPGA structure
- Logic blocks
- Routing resources that can connect the logic blocks
- The routing resources are both the greatest strength and the greatest weakness of FPGAs
1. Block organized, SRAM based (internal block structure not restricted to AND-OR)
   - Xilinx
   - Altera (FLEX)
   - Plessey
   - AT&T
   - ...
2. Cell organized, anti-fuse based
   - Actel
   - Quicklogic
   - ...
Static SRAM Programming Technology
Connection elements are controlled by SRAM cells:
1. Function unit and routing are controlled by SRAM cells
2. These cells are located adjacent to the logic they control (not in a separate chip)
3. SRAM cells are configured at power-up and potentially reconfigured during operation
4. Configuration is a non-destructive process
5. SRAM cells are large (5 transistors) and require connection to power, ground, data and select lines
6. ... but they can be intimately intermixed with CMOS logic
7. SRAM memory design is highly refined

Anti-Fuse Programming Technology
- Anti-fuses are made with a modified CMOS process involving an extra step
- This step creates a very thin insulating layer that separates two conducting layers
- This insulator is penetrated by applying a high voltage to the two conducting layers (this process is not reversible)
- The programming voltage must be much higher than the logic threshold, otherwise the chip would program itself during operation
- Such high voltages can be destructive for CMOS logic circuitry
- Large isolation devices may be required to protect logic gates from the programming voltage
Figure 9.36: Actel PLICE anti-fuse structure

Actel PLICE anti-fuses
- are programmed by placing a relatively high voltage (18 V) across the anti-fuse terminals; a driving current of about 5 mA heats and melts the dielectric and forms a conductive link between poly-Si and n+ diffusion
- bottom and top layer of the anti-fuse are connected to metal; the overall resistance of a programmed anti-fuse (from metal to metal) is about 300 ... 500 Ω
- are manufactured by adding 3 masks to a normal CMOS process

Quicklogic ViaLink Anti-Fuse Programming Technology

Figure 9.37: Quicklogic ViaLink Anti-Fuse
- amorphous silicon anti-fuse
- a low-resistance path (80 Ω) between two metal wires is created by a 10 V programming voltage at the terminals of the anti-fuse
Figure 9.38: EEPROM programming technology
- used in FPGAs (PLDs) manufactured by Altera and Plus Logic
- static charge on the floating gate turns the transistor permanently off
- EPROM transistors are used to pull bit lines to ground
- disadvantage of EPROM technology: static power dissipation
9.8.4
Figure 9.39: General architecture of XILINX FPGAs

Series            XC2000          XC3000          XC4000
Number of CLBs    64 ... 100      64 ... 320      64 ... 900
Equivalent Gates  1200 ... 1800   2000 ... 9000   2000 ... 20000
Figure 9.40: Xilinx XC4000 CLB
- two-stage look-up tables: two functions of 4 variables or one function of five variables can be implemented
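How two 4-input look-up tables and a second stage can realize one function of five variables may be sketched as follows. The sketch assumes a Shannon expansion on the fifth variable, with the second-stage table acting as a 2:1 multiplexer; the names `lut_f`, `lut_g` and `lut_h` are illustrative and the code does not model the actual CLB wiring.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute the truth table of `func`, as the memory cells of a LUT would store it."""
    return {bits: func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)}


def five_input_function(func5):
    """Shannon expansion on the fifth variable e: f = G if e else F."""
    lut_f = make_lut(lambda a, b, c, d: func5(a, b, c, d, 0), 4)  # cofactor for e = 0
    lut_g = make_lut(lambda a, b, c, d: func5(a, b, c, d, 1), 4)  # cofactor for e = 1
    lut_h = make_lut(lambda f, g, e: g if e else f, 3)            # second stage: 2:1 mux

    def evaluate(a, b, c, d, e):
        return lut_h[(lut_f[(a, b, c, d)], lut_g[(a, b, c, d)], e)]

    return evaluate


# Example: 5-input parity built from two 4-input tables and one 3-input table.
parity5 = five_input_function(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
```

Used without the second stage, `lut_f` and `lut_g` are simply two independent functions of four variables.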
Figure 9.41: Xilinx XC4000 single length lines

XC4000 routing architecture:
- single-length lines and double-length lines
- high CLB connectivity to wiring segments
Figure 9.42: Xilinx XC4000 double length lines and long lines
Figure 9.43: General architecture of Actel FPGAs

Series            Act-1           Act-2
Number of LMs     295 ... 546     430 ... 1232
Equivalent Gates  1200 ... 2000   6250 ... 20000

- rows of programmable Logic Modules (LM)
- horizontal routing channels between rows
Figure 9.49: Cost per Chip (Dollars)

Economics and performance of FPGAs compared to MPGAs:

FPGA:
+ no overhead cost → less cost-intensive for low volumes
+ short turnaround time → short time to market
+ high designer flexibility (short turnaround time), low redesign costs
- relatively low speed of operation, caused by the resistance and capacitance of programmable switches in the routing network
- decreased logic density: programmable switches and the configuration network require chip area

MPGA:
+ low per-chip costs at high volumes
+ metal connection layers hardwired during fabrication → fast operation
+ high logic density
- very high costs for low volumes (for example prototypes)
- no redesign flexibility
Chapter 10
Arithmetic Units
In the following chapter, basic arithmetic units like adders, subtracters, or multipliers are discussed. These components are widely used in VLSI circuits e. g. for the digital signal processing application domain. More detailed descriptions on arithmetic units can be found e. g. in [13] or [3].
10.1 Adders / Subtracters
10.1.1 Basic Adder Cells
Half Adder
The circuit realizing the functions

$$C = A_1 A_2 \tag{10.1}$$
$$S = A_1 \oplus A_2 \tag{10.2}$$

is called a half adder and can be used to calculate the sum S of two bits $A_1$ and $A_2$. A possible carry is set at the C output.

Full Adder
For adding binary numbers with a bit width of more than one bit, the concept of the half adder has to be extended. The carry outputs of the less significant bits have to be taken into account in the more significant bits. For that, a new circuit structure called a full adder is used, which is based on the following functional equations:

$$C_{out} = C_{in}(A_1 + A_2) + A_1 A_2 \tag{10.3}$$
$$S_{out} = A_1 \oplus A_2 \oplus C_{in} \tag{10.4}$$
These equations can be realized either by logic gates (AND, OR, XOR) or by two halfadders and an OR gate.
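Both realizations of the full adder can be checked against each other with a short Python sketch. The function names are chosen for this illustration only; bits are modelled as the integers 0 and 1.

```python
def half_adder(a1, a2):
    """Return (C, S) for two input bits: C = A1 AND A2, S = A1 XOR A2."""
    return a1 & a2, a1 ^ a2


def full_adder(a1, a2, c_in):
    """Full adder built from two half adders and an OR gate."""
    c1, s1 = half_adder(a1, a2)
    c2, s_out = half_adder(s1, c_in)
    return c1 | c2, s_out


def full_adder_gates(a1, a2, c_in):
    """Full adder written directly from the functional equations (AND, OR, XOR)."""
    c_out = (c_in & (a1 | a2)) | (a1 & a2)
    s_out = a1 ^ a2 ^ c_in
    return c_out, s_out
```

For all eight input combinations both functions return the same pair, and `2 * c_out + s_out` equals the arithmetic sum of the three input bits.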
10.1.2
The following section introduces the basic arithmetic components used in VLSI designs. First, adder and subtracter architectures are discussed. Since addition and subtraction of binary numbers can be calculated by almost the same hardware (by first selecting the appropriate complement representation), the term adder is used as a synonym for both adder and subtracter in the following section.

Serial Adders
The principle of serial adders is shown in Fig. 10.1:
Figure 10.1: Serial adder principle

At the beginning of the operation, the two n-bit operands A and B are loaded into the shift registers. The carry register is cleared or set to the value of the carry input. During the next n clock cycles (assuming a wordlength of n bits for each operand), the operands are added bitwise in the full adder and stored in the sum register. For that, the operand shift registers apply the least significant bit to the full-adder inputs, whereas the sum shift register reads the current sum output of the full adder at its serial input and shifts its contents by one bit to the right each clock cycle. The carry output of an addition is stored in the carry register for use in the next clock cycle. The n-bit sum and the carry output are available after (n+1) clock cycles (1 operand load, n calculation). The serial adder has the smallest hardware complexity, which is independent of the wordlength (if the shift registers are not considered), but requires the highest computation time of all adder implementations.

Parallel Adders

Ripple Carry Adder
Chained full adders that form an adder of the required wordlength are called a ripple carry adder, since during addition the carry ripples through the whole chain from the least significant to the most significant bit, as shown in Fig. 10.2. The addition time therefore depends on the wordlength of the operands.

Figure 10.2: Ripple carry adder principle

Carry Lookahead Adder
To speed up the addition process, lookahead methods can be applied to reduce the time associated with carry propagation. The carry input of a stage i is calculated directly from the inputs of the preceding stages i-1, i-2, ..., i-k rather than allowing carries to ripple from stage to stage. To perform that task, the carry outputs of ordinary full adders are replaced by the generate and propagate signals defined by

$$g_i = a_i b_i \tag{10.5}$$
$$p_i = a_i + b_i \tag{10.6}$$

The carry input signal of stage i+1 is defined by the equation

$$c_{in,i+1} = c_i = g_i + p_i c_{i-1} \tag{10.7}$$

and, by recursive substitution, for the example of a 4-bit adder

$$c_0 = c_{in,1} = g_0 + p_0 c_{in} \tag{10.8}$$
$$c_1 = c_{in,2} = g_1 + p_1 g_0 + p_1 p_0 c_{in} \tag{10.9}$$
$$c_2 = c_{in,3} = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_{in} \tag{10.10}$$
$$c_3 = c_{out} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_{in} \tag{10.11}$$
As can be seen in the equations above, the carry lookahead logic circuits can be realized by a two-level logic implementation, which means that the whole addition is performed in constant time (without influence of the wordlength). The implementation of the carry lookahead corresponding to the above equations is shown in Fig. 10.3.
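The 4-bit lookahead equations can be written out directly in Python to check them against ordinary addition. This is a behavioural sketch only: the bit lists are stored least significant bit first, and the helper names `to_bits` and `from_bits` are assumptions of the example.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def cla4(a, b, c_in):
    """4-bit carry lookahead addition; every carry is a two-level function of g, p, c_in."""
    g = [x & y for x, y in zip(a, b)]  # generate:  g_i = a_i AND b_i
    p = [x | y for x, y in zip(a, b)]  # propagate: p_i = a_i OR b_i
    c0 = g[0] | (p[0] & c_in)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c_in))
    carries_in = [c_in, c0, c1, c2]    # carry input of stages 0..3
    s = [x ^ y ^ c for x, y, c in zip(a, b, carries_in)]
    return s, c3
```

None of the carry expressions depends on another carry, which is why all of them can be evaluated in parallel once the generate and propagate signals are available.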
Figure 10.3: Carry lookahead adder for 4 bits

The number of gate inputs is restricted by technological constraints. This means that the wordlength of a carry lookahead adder cannot be increased arbitrarily. For that reason, adders for large wordlengths are split into smaller groups, each processed by a single carry lookahead adder of reasonable wordlength, as shown in Fig. 10.4.
Figure 10.4: Clustered carry lookahead adder for 16 bits

The carry signal produced by a group is forwarded to the next group so that, if each group is considered as a single block, the carry ripples through the different blocks as in the ripple carry adder. Alternatively, a hierarchical approach can be chosen in which each group produces a group-generate and a group-propagate signal, which are evaluated by a second-level carry lookahead circuit.

Carry Select Adder
In the following adder type, the wordlength of the operands is again subdivided into clusters (see Fig. 10.5). The cluster subwordlength is chosen to balance the time required for the intra-cluster ripple carry additions and the carry calculation of the preceding clusters. The additions are all performed in parallel for both possible cases: the carry-in of a cluster is 0, and the carry-in is 1. The results (cluster carry out and partial sum C/Sum[i:j]) are forwarded to multiplexors which select the appropriate value depending on the carry output of the preceding stages. Since the time to switch a multiplexor is almost negligible compared to the time required for the ripple carry additions, the overall addition time is almost independent of the wordlength.
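The carry select principle can be sketched behaviourally in Python. The function names and the default parameters (16-bit words, 4-bit clusters) are assumptions of this example; the wordlength is assumed to be a multiple of the cluster size, and for simplicity the sketch also computes both candidates for the least significant cluster.

```python
def ripple_add(a, b, c_in, width):
    """Ripple carry addition of two `width`-bit integers; returns (sum, carry out)."""
    total, carry = 0, c_in
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x | y))
    return total, carry


def carry_select_add(a, b, c_in=0, width=16, cluster=4):
    """Carry select addition: each cluster is added for carry-in 0 and carry-in 1."""
    mask = (1 << cluster) - 1
    result, carry = 0, c_in
    for low in range(0, width, cluster):
        xa, xb = (a >> low) & mask, (b >> low) & mask
        # In hardware both candidates of all clusters are computed in parallel.
        sum0, carry0 = ripple_add(xa, xb, 0, cluster)
        sum1, carry1 = ripple_add(xa, xb, 1, cluster)
        # The multiplexor picks one candidate once the incoming carry is known.
        part, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= part << low
    return result, carry
```

The loop is sequential only in the selection step; the two `ripple_add` calls per cluster do not depend on the incoming carry, which is the source of the speed-up.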
Figure 10.5: Carry select adder for 16 bits

Since the carry select adder requires two ripple carry adder chains for each cluster (except the least significant one), the hardware amount is almost twice that of a simple ripple carry adder. It is slower than a carry lookahead adder, but it has a higher regularity and is for that reason better suited for VLSI implementation.
Carry Save Adder
For the addition of many addends (e. g. in parallel multipliers), the time required for full carry propagation, even when carry lookahead adders are used, might be too high for some applications. To achieve constant addition time, the computed carries are not propagated within the same stage; instead, both the S and the Cout vectors are connected to the corresponding adders in the succeeding stage. This concept requires a final addition to merge the sum and the carry vector of the final stage into a single sum vector, which can be realized using any of the adders discussed above (in Fig. 10.6 a ripple carry adder has been chosen for simplicity). In a carry save adder, each additional operand increases the adder delay by one full-adder delay.
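A minimal Python sketch of the carry save principle follows. Operands are modelled as unbounded integers whose bits are processed in parallel by the bitwise operators; the function names are assumptions of this illustration.

```python
def carry_save_step(x, y, z):
    """One row of full adders without carry propagation (3 operands -> 2 vectors)."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1  # carries feed the next higher bit position
    return partial_sum, carry


def carry_save_sum(operands):
    """Add any number of operands; carries are only propagated in the final addition."""
    s, c = 0, 0
    for operand in operands:
        s, c = carry_save_step(s, c, operand)
    return s + c  # final carry-propagating addition (any adder type could be used)
```

Each call of `carry_save_step` corresponds to one full-adder delay, independent of the wordlength, which matches the statement that every additional operand adds one full-adder delay.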
[Figure 10.6: carry save adder built from rows of full adders; inputs X, Y, W and V, outputs Sum[0] ... Sum[n+1] and Cout]
Shift and Add Multiplier
The most common multiplier is the Shift and Add Multiplier (SAA Mult.). Two binary unsigned integer words X and Y of bit-size $N_x$ and $N_y$, respectively, can be written using their binary representation:

$$X = \sum_{i=0}^{N_x-1} x_i 2^i \qquad Y = \sum_{j=0}^{N_y-1} y_j 2^j \tag{10.12}$$

The product Z is then

$$Z = X \cdot Y = \sum_{i=0}^{N_x-1} x_i 2^i \, Y \tag{10.13}$$

The following recurrence can be derived from formula 10.13:

$$D_0 = 0, \qquad D_{i+1} = D_i 2^{-1} + x_i Y, \qquad Z = D_{N_x} 2^{N_x-1} \tag{10.14}$$
In each step of the recurrence one bit of X is multiplied (a simple AND-operation) with Y and added to the intermediate result Di which is shifted one bit. Figure 10.7 shows the general structure of the Shift and Add multiplier with bit-sizes Nx and Ny .
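The recurrence can be executed step by step in Python. This sketch uses exact fractions for the intermediate result so that the right shift by one bit loses no information; a hardware implementation would instead keep the shifted-out bits in a register. The function name is an assumption of the example.

```python
from fractions import Fraction


def shift_and_add(x, y, n_x):
    """Multiply unsigned integers via D_{i+1} = D_i / 2 + x_i * Y, Z = D_Nx * 2^(Nx-1)."""
    d = Fraction(0)                   # D_0 = 0
    for i in range(n_x):              # one bit of X per clock cycle
        x_i = (x >> i) & 1
        d = d / 2 + x_i * y           # shift intermediate result, add x_i AND Y
    return int(d * 2 ** (n_x - 1))    # rescale to obtain Z
```

The loop runs exactly `n_x` times, one iteration per bit of X.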
Figure 10.7: Structure of SAA multipliers

For this multiplier type it takes Nx clock cycles to complete the multiplication, since one bit of X is processed in each step. The delay of the combinational circuit (which determines the maximum clock frequency) is approximately Ny · FA (FA is the delay of a full adder; the register delays are not considered). The cost of a Shift and Add Multiplier is (3Ny + 2Nx) · FA (the cost of a full adder FA is assumed to be equal to the cost of a register).

Carry Save Multiplier
In contrast to the SAA multiplier, the Carry Save Multiplier (CSM) calculates the result in one step. Every bit of the first argument is multiplied with every bit of the second argument concurrently. The results are added up according to the position of the source bits.
Multipliers
The CSM consists of combinatorial logic only. The multiplication of two 4-bit binary numbers can be written as

                    X3   X2   X1   X0
                  × Y3   Y2   Y1   Y0
    ---------------------------------
                    P30  P20  P10  P00
               P31  P21  P11  P01
          P32  P22  P12  P02
     P33  P23  P13  P03
    ---------------------------------
Z7   Z6   Z5   Z4   Z3   Z2   Z1   Z0
where Pij = Xi · Yj. The addition of all Pij terms can be done in an array of full adders. Figure 10.8 shows the general structure of a Carry Save Multiplier assuming Nx Ny . Part II is omitted if Nx and Ny have the same size. The Carry In of each full adder is supplied in the upper right corner. Not every full adder needs a Carry In; for some positions half adders are sufficient. The adder Carry Out is depicted in the lower left corner.
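The partial product scheme can be reproduced in Python. The sketch below forms all Pij with AND operations, reduces the rows in carry save fashion and performs one final addition; it is a behavioural model with assumed names and does not reflect the exact cell arrangement of Figure 10.8.

```python
def array_multiply(x, y, n_x, n_y):
    """Multiply unsigned integers from single-bit partial products P_ij = X_i AND Y_j."""
    rows = []
    for j in range(n_y):
        y_j = (y >> j) & 1
        # row j holds P_0j ... P_(Nx-1)j, each placed at bit position i + j
        rows.append(sum((((x >> i) & 1) & y_j) << (i + j) for i in range(n_x)))
    s, c = 0, 0
    for row in rows:
        # carry save row: sum and carry vectors are passed on without propagation
        s, c = s ^ c ^ row, ((s & c) | (s & row) | (c & row)) << 1
    return s + c  # final carry-propagating addition
```

All partial products are available at once, so the whole computation is purely combinational; the loop only mirrors the row-by-row structure of the adder array.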
Figure 10.8: Structure of CSM multipliers

The delay of this type of multiplier is (Nx + Ny - 2) · FA. The cost is (Nx - 1) · Ny · FA plus (2Ny + 2Nx) · FA if the X, Y and Z registers are accounted for as in the shift and add case above.

Block Multiplier
A combination of the fully parallel Carry Save Multiplier and the serial Shift and Add Multiplier leads to a flexible architecture which can be configured from working fully serially to working fully in parallel. Many combinations in between are possible, thus allowing adaptation to given specifications and restrictions.
The basic idea of the block multiplication is to divide each argument into blocks of the same size. Each block of the first argument is multiplied with each block of the second argument in a fast Carry Save Multiplier. All calculated block products are added up, taking into account the positions of the current argument blocks. Therefore, as in the Shift and Add Multiplier, the arguments and the intermediate result have to be shifted in an appropriate way.
Figure 10.9: Architecture of the block multiplier

Figure 10.9 shows the architecture of the block multiplier. The argument registers and the Carry Hold Register are simple shift registers. The intermediate result has to be shifted in both directions, thus requiring a bidirectional shift register. Signals for controlling the shift directions are generated by a controller, which can be realized using a simple counter.

The multiplier can be configured by varying the block sizes of the arguments. With increasing block sizes the multiplier becomes more parallel, thus reducing the number of clock cycles needed to perform a multiplication. Larger block sizes, however, require a larger Carry Save Multiplier, which increases the area needed to realize the multiplier. Assuming that the first argument is separated into $k_x$ blocks of size $n_x$ and the second argument into $k_y$ blocks of size $n_y$, the multiplier needs $k_x \cdot k_y$ clock cycles to perform a multiplication. The delay of the multiplier is determined by the size of the ripple carry adder, which has a width of $n_x + n_y$ bits.
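The block scheme and its cycle count can be sketched in Python. The block product is written as an ordinary multiplication, standing in for the Carry Save Multiplier; the function name and argument order are assumptions of the example.

```python
def block_multiply(x, y, n_x, n_y, k_x, k_y):
    """Multiply x (k_x blocks of n_x bits) by y (k_y blocks of n_y bits).

    Returns (product, clock cycles used).
    """
    mask_x, mask_y = (1 << n_x) - 1, (1 << n_y) - 1
    z, cycles = 0, 0
    for i in range(k_x):
        x_block = (x >> (i * n_x)) & mask_x
        for j in range(k_y):
            y_block = (y >> (j * n_y)) & mask_y
            # One clock cycle: parallel block product, shifted to its weight, accumulated.
            z += (x_block * y_block) << (i * n_x + j * n_y)
            cycles += 1
    return z, cycles
```

With `n_x = n_y = 1` the loop degenerates to a fully serial bit-by-bit multiplication, while `k_x = k_y = 1` yields the fully parallel case that finishes in a single cycle.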
Chapter 11
Microarchitectures
The term microarchitecture describes the domain between the macroarchitecture (the lowest-level hardware visible to the user) and the implementation technology (MOS VLSI) [27]. For better analysis, microarchitectures are usually divided into 3 parts: the data path, which performs the data manipulations and calculations; the control path, which applies the correct sequences of control signals to the data path; and the input/output unit, which provides access from and to the external world (see Fig. 11.1).
Figure 11.1: Microarchitecture blocks

The control path, which can be interpreted as a more or less complex finite state machine (FSM), can be either hardwired (used in fixed applications like a controller for the serial adder in Fig. 10.1) or programmable (microprocessor with downloadable microcode). The microarchitecture scheme shown in Fig. 11.1 can represent quite simple circuits (like a traffic light controller) as well as complex microprocessors.
In the datapath of a microarchitecture, the operations and data manipulations are performed. For that, control signals are generated by the control path depending on the operation(s) to be executed. By forwarding information about the status of the data path (e. g. exceptional conditions, underflow, overflow, division by zero, ...), the control path is able to react in a correct way to the actual needs. The state signals (flags) can be used to enable conditional branching depending on the state of the data path. Data processing is usually performed by typical components like ALUs, shifters, register files, ... The following section shows how datapath structures are usually implemented in larger VLSI designs. For that, we assume the following simple datapath structure:
Figure 11.2: Datapath example

The datapath consists of 2 input registers for the input operands Ain and Bin, an arithmetic-logic unit (ALU), a multiplexor to select between the Cin input and the ALU output, a shifter unit, and an output register. The datapath structure could be implemented based on standard cells, where basic library cells (like gates, muxes, registers, ...) are selected and interconnected, or, if a datapath compiler is used, based on a set of several layout tiles as shown in Fig. 11.3. A datapath compiler creates a regular layout depending on the wordlength of the operands by stacking the appropriate number of tiles in the layout. The horizontal structure consisting of a set of tiles performing all functions for a single bit is called a bit slice. If we apply vertical cuts to the layout structure, the whole layout is subdivided into layout blocks, each corresponding to a single implemented function. These layout stripes are called functional slices.
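A behavioural Python sketch of such a datapath is given below. The wordlength of 8 bits, the ALU operation names, the encoding of the `sel` and `shift` control signals and the two status flags are all assumptions made for the illustration; the source figure does not specify them.

```python
from dataclasses import dataclass

WIDTH = 8                      # assumed wordlength
MASK = (1 << WIDTH) - 1

ALU_OPS = {                    # hypothetical OP-Sel encoding
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


@dataclass
class Datapath:
    a_reg: int = 0             # input register for Ain
    b_reg: int = 0             # input register for Bin
    r_out: int = 0             # output register

    def clock(self, a_in, b_in, c_in, op_sel, sel, shift):
        """One clock edge; returns the status flags of the operands stored so far."""
        raw = ALU_OPS[op_sel](self.a_reg, self.b_reg)
        alu = raw & MASK
        flags = {"zero": alu == 0, "carry": raw != alu}
        value = (c_in & MASK) if sel else alu       # multiplexor: Cin or ALU output
        if shift == "left":                         # shifter unit
            value = (value << 1) & MASK
        elif shift == "right":
            value >>= 1
        self.r_out = value                          # all registers update together
        self.a_reg, self.b_reg = a_in & MASK, b_in & MASK
        return flags
```

Because the input registers and the output register are clocked together, a result appears in `r_out` one clock after its operands have been applied. The returned flags are what a control path would use for conditional branching.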
[Figure 11.3: layout tiles AReg, BReg, ALU, MUX, Shifter and status buffers, annotated with bit slices and functional slices]

11.1.1
As an example of a discrete datapath implementation, the 2901 bit-slice will be discussed in the following section [10]. Besides a 16-word register set, the 2901 integrated circuit contains a Q register (used within add-shift multiplications or divisions), an arithmetic-logic unit (ALU), a shifter, and an instruction decoder (see Fig. 11.4). All operations and the registers are designed for 4-bit operands. The set of instructions which can be executed by the 2901 IC is shown in Fig. 11.5. The instructions are encoded in a 9-bit I vector which is provided by an external microcode controller. The first of these tables shows the selection of the sources for both ALU inputs (R and S), the second lists the ALU functions, and the third indicates the destination of the ALU results. To form an ALU for wordlengths that are multiples of 4 bits, the 2901 ICs can be cascaded as shown in Fig. 11.6. In the example, a simple carry propagation scheme has been selected. As an additional option, carry-lookahead circuits (AMD 2902) could be used to speed up carry propagation.
Figure 11.6: 16-bit bit-sliced ALU

The 2901 IC has been widely used for applications in digital signal processing and for minicomputers. It is available as a stand-alone IC, and some silicon manufacturers also provide macrocells with the functionality of the 2901 (for different wordlengths) that can be included in ASIC designs.
Controllers are used to apply a sequence of control signals to the datapath components. These control signals are chosen to perform the desired operation(s) within the datapath. The datapath is able to interact with the controller unit by sending appropriate status signals (e. g. overflow flag when an addition is performed, equal flag as a result of a comparison, ...). The controller can be designed to change the sequence of control signals depending on these flags (used e. g. in microprocessors to perform conditional branches). The general structure of such a controller can be found in Fig. 11.7.
[Figure block labels: Environmental Inputs, Combinational Logic, State Register, Control Outputs]
Figure 11.7: Basic controller structure
The controller consists of a combinational logic block and a register. From the input signals (e.g. an instruction word defining the sequence of control signals to be generated, state flags, ...) and parts of the previous register content, the combinational logic block generates the control output signals as well as the information about which step in the sequence of control signals is to be executed in the next cycle. The controller can be seen as a realization of the abstract model of a finite state machine. To get a high level of regularity in the design of a controller, regular layout structures (like ROMs or PLAs) are very often used to implement the combinational logic block rather than implementing the logic functions directly in separate gates (random logic). The random logic approach was chosen in the control unit of many early microprocessors (≤ 8 bit) and in RISC (Reduced Instruction Set Computer) processors, whereas regular layout structures are used in CISC (Complex Instruction Set Computer) processors to simplify their controller design. Regular structures simplify the design process because, if modifications in the control sequences are required, only the contents of a PLA or ROM have to be redefined instead of redesigning a whole combinational gate network. Since this design process is comparable to programming memory contents rather than designing a circuit, the approach is called microprogramming and is considered in detail below.
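The structure described above (combinational logic plus state register) can be sketched as a table-driven finite state machine. The state names, the single status flag and the control signal names below are purely illustrative assumptions, not taken from the course material.

```python
# Combinational logic as a lookup table:
# (current state, status flag) -> (control outputs, next state)
LOGIC = {
    ("FETCH", 0): ({"load_ir": 1}, "EXEC"),
    ("FETCH", 1): ({"load_ir": 1}, "EXEC"),
    ("EXEC", 0): ({"alu_add": 1}, "STORE"),
    ("EXEC", 1): ({"alu_add": 1}, "FETCH"),  # flag set: skip the store step
    ("STORE", 0): ({"write_reg": 1}, "FETCH"),
    ("STORE", 1): ({"write_reg": 1}, "FETCH"),
}


def run_controller(flags, state="FETCH"):
    """Apply one status flag per clock cycle; return (trace, final state)."""
    trace = []
    for flag in flags:
        outputs, next_state = LOGIC[(state, flag)]
        trace.append((state, outputs))
        state = next_state  # the state register takes the new value at the clock edge
    return trace, state
```

Changing the control sequence only means editing the `LOGIC` table, which mirrors the argument for ROM or PLA based controllers.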
Controller Implementations
11.2.1 Microprogrammed Controllers
Microprogrammed controllers mainly consist of a control memory and a microinstruction register. The control memory is implemented using ROM (Fig. 11.8) or PLA (Fig. 11.9) structures. For special applications, RAM based control memories are also used, e.g. if the instruction set of a processor has to be changed for special purposes. That flexibility is not available when using hardwired logic. On the other hand, the price for that flexibility can be extra hardware cost compared to random logic, due to address decoding (in the ROM based controller) and sparse control matrices, and a performance penalty due to larger internal delays in the PLA or ROM. The control memory contains both the control signals to be forwarded through the microinstruction register to the datapath and some sequencing information giving the address (NA, next address) of the subsequent microinstruction. The concatenation of the control signals and the next address is called a microinstruction.
[Figure block labels (ROM based controller): Address Decoder, ROM, Control, NA, Control Outputs, Environmental Inputs]
Figure 11.9: PLA based controller
Depending on the generation of the control signals, two types of microinstructions can be distinguished:
Horizontal Microinstructions. The control word from the microinstruction register is directly applied to the circuit which is to be controlled (see Fig. 11.10). Each elementary control point has a corresponding entry in the control word. That results in a very long control word and therefore big control memories. On the other hand, very specific encoding and a high degree of parallelism in the operations are possible.
Vertical Microinstructions. This type of microinstruction is based on a different approach: an n-bit control word allows 2^n configurations, of which only a few are actually used by the controller. The word length of the control word in the control memory is therefore reduced by encoding the smaller number of, say, M used control vectors into a vector of log2 M bits (rounded up to an integer). In a second step, the n-bit control word is fetched from a secondary memory used as control vector decoder (implemented e.g. as ROM or PLA) and forwarded to the datapath (see Fig. 11.11). It is also possible to encode the control vector in groups for different hardware units (one group for ALU control, the next for shifter control, ...) which are decoded group by group instead of using a single large control vector decoder.
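The vertical encoding step can be sketched as follows. The function name `encode_vertical` and the example control words in the usage note are hypothetical; the sketch only shows how M distinct control vectors are replaced by short codes and recovered through a secondary decoder table.

```python
import math


def encode_vertical(microprogram):
    """Replace each n-bit control word by a short code.

    Returns (codes, decoder, width): `decoder` plays the role of the
    secondary memory (code -> control word) and `width` is the number
    of code bits, i.e. log2(M) rounded up.
    """
    decoder = sorted(set(microprogram))  # the M control vectors actually used
    width = max(1, math.ceil(math.log2(len(decoder))))
    codes = [decoder.index(word) for word in microprogram]
    return codes, decoder, width
```

For a microprogram such as `[0b10000001, 0b01000010, 0b10000001, 0b00100100]` only three distinct 8-bit control words occur, so two code bits per microinstruction suffice, at the cost of the extra decoder lookup.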
[Figure labels: Control Bits in the Microinstruction, Control Lines]
Figure 11.11: Vertical microinstruction
In controller design, one can go one step further: if a microinstruction itself can be represented as a sequence of submicroinstructions (so-called nanoinstructions), the structure shown in Fig. 11.12 can be used. The simplest approach, which has already been mentioned under vertical microcode, is a single-step sequence of nanoinstructions, namely the decoding of the control outputs from an encoded control vector stored in the microcode control memory.
If feedback is introduced in the decoder PLA (via the NNA [nanocode next address] register), control sequences can be generated by the nanocode PLA. As long as a nanocode sequence is running, the MNA [microcode next address] register is halted. If many microinstructions use the same nanocode sequences, significant savings in implementation area for the whole controller can be achieved.
[Figure labels: Microcode PLA (AND, OR), MNA, Environmental Inputs]
Introduction
Chapter 12
The following design guidelines have been adapted from [5]. These recommendations are useful in order to avoid functional faults and get the desired functionality.
12.2 Synchronous Circuits
- All data storage elements are clocked.
- The same active edge of a single clock is applied at precisely the same time to all storage elements.
12.2.1 Non-Recommended Circuits
The clock input of the second FF is skewed by the clock-to-q delay of the first FF and is not activated at every active clock edge (e.g. ripple counter).
Clock skew caused by gating the clock line (e.g. multiplexer in clock line)
- FFs are clocked on opposite edges of the clock signal
- Insertion of a scan-path is impossible
- Difficulties in determining critical path lengths
- The synchronous design principle, that all FFs change state at exactly the same time, is not fulfilled
12.2.2 Recommended Circuits
Recommended circuits for synchronous circuit design are described in the subsequent sections.
12.3 Clock Buffering
12.3.1 Non-Recommended Circuits
Clock skew
- Clock skew caused by different load-dependent delays
- Excessive clock fanouts should be avoided (slow edges)
12.3.2 Recommended Circuits
- Same depth of buffering
- Same fanout
- Limited fanout in order to achieve sharp clock edges
12.4 Gated Clocks
12.4.1 Non-Recommended Circuits
- A signal change at the multiplexer input can cause a glitch at the clk input (FF captures invalid data)
- Gating the clock line introduces clock skew
12.4.2 Recommended Circuits
12.5 Double-edged Clocking
12.5.1 Non-Recommended Circuit
12.5.2 Recommended Circuit
12.6 Asynchronous Resets
12.6.1 Non-Recommended Circuit
12.6.2 Recommended Circuits
12.7 Shift-Registers
12.7.1 Non-Recommended Circuits
12.7.2 Recommended Circuits
12.8 Asynchronous Inputs
12.8.1 Non-Recommended Circuits
Circuits with complicated feedback loops to capture asynchronous inputs (very sensitive to noise, and functionality can be influenced by placement and routing delays)
12.8.2 Recommended Circuits
1. Chain of two or more D-type registers (reducing the probability of metastability)
2. Use of a 4-bit register as shift register
3. Asynchronous handshake circuit
The probability of propagating a metastable state decreases with an increasing number of register stages.
Figure 12.20: 4-bit register used as shift register to capture an asynchronous input
The asynchronous handshake circuit works as follows:
- The first flip-flop is reset asynchronously when the r input is zero or when the qb outputs of the second and the third FF both have the value 0.
- The q output of the first FF is asynchronously set to high when a positive edge arises at its ck input.
- The high output of the first FF is propagated through the second and the third FF in the two following cycles. The qb outputs of these FFs are then zero and the reset logic for the first FF is activated. Now the first FF is ready to receive another edge at its input.
Three cases of metastability caused by simultaneously rising edges of the asynchronous input and the system clock:
1. The second FF stabilizes to q=1 before the next rising clock edge (circuit works as desired).
2. The second FF settles to q=0 and the third FF remains in its state. Since the output q of the first FF is high, the propagation of this output works correctly, but it needs one cycle more than in the first case.
3. The metastable state of the second FF is still present at the next rising edge of the clock signal. Then the third FF also becomes metastable.
The probability of receiving a metastable d (internal) signal can be reduced by increasing the length of the register chain.
In general, circuits whose functionality relies on delays cannot be recommended.
12.9.2 Recommended Circuits
- Use a higher clock speed and build a synchronous pulse generator
- The minimum time resolution is given by the clock cycle
12.10 Bistable Elements
12.10.1 Non-Recommended Circuits
Figure 12.27:
12.10.2 Recommended Circuits
12.11 RAMs and ROMs in Synchronous Circuits
Problem: RAMs are double-edge triggered. The address is latched on the opposite edge to the data
12.11.1 Recommended Circuits
Figure 12.32: Interfacing RAM into synchronous circuit: ME and WEbar generation
12.12 Tristates
12.12.1 Non-Recommended Circuit
12.12.2 Recommended Circuits
Figure 12.36: Tristate bus with central control of tristate enables and additional driver activated on non-controlled states
12.12.3 Multiplexer Tristates
Disadvantages of tristates:
- large area
- limited buffering
- large routing load, slow
Advantages of multiplexers:
- small area
- efficient routing
Control decoding expense is the same for tristates and multiplexers.
12.13 Parallel Signals
12.13.1 Non-Recommended Circuits
12.13.2 Recommended Circuit
12.14 Fanout
12.14.1 Non-Recommended Circuit
12.14.2 Recommended Circuits
2. Use AOI logic (complex cells from standard cell library) where possible
Figure 12.44: Late changing input fed late into combinational logic
q1 0 0 1 1 1 1 0 0 0
q2 0 0 0 1 1 1 1 0 0
q3 0 0 0 0 1 1 1 1 0
Figure 12.46: Using duplicate logic for reducing fanout
6. Use fast library cells where available
7. Reduce length of critical signal paths
8. Use Schmitt trigger inputs in noisy environments
12.16.1 Non-Recommended Circuits
Figure 12.47: Circuit with inaccessible internal logic: only the first block is controllable and only the last block is directly observable
Figure 12.48: Chain of counters: the first counter is not directly observable and the second counter is not directly controllable
Figure 12.49: Counter with closed feedback loop: initial state not known
Figure 12.51: Chain of counters broken by test input and output signals
Figure 12.52: Counter with feedback loop opened by test control and output signals
Motivation
Chapter 13
- Stable chip manufacturing costs
- Increasing testing costs:
  - Increasing number of gates per device
  - Limited number of pins
  - Increasing number of internal states
  - Increasing logical and sequential depth
Example: Testing of a combinational circuit with n inputs (10 MHz, one test per cycle)
n     time for test
25    3 s
30    107 s
40    1 day
50    3.5 years
60    3656 years
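The figures in the example are consistent with an exhaustive test that applies all 2^n input combinations, one per clock cycle at 10 MHz. That reading is an assumption, as is the helper name `exhaustive_test_seconds`; under it, the table can be recomputed with a short sketch:

```python
def exhaustive_test_seconds(n_inputs, test_rate_hz=10e6):
    """Time to apply all 2**n input vectors at one vector per clock cycle."""
    return 2 ** n_inputs / test_rate_hz


SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

for n in (25, 30, 40, 50, 60):
    t = exhaustive_test_seconds(n)
    print(n, f"{t:.3g} s", f"{t / SECONDS_PER_DAY:.3g} days", f"{t / SECONDS_PER_YEAR:.3g} years")
```

Every additional input doubles the test time, which is why exhaustive testing stops being practical well before n = 40.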
Economical Considerations
Average Quality Level (AQL)
$$ AQL = \frac{\#\,\text{DefectiveParts}}{\#\,\text{AcceptedParts}} \qquad (13.1) $$
13.2.2 Correlation: Fault Coverage and Defective Parts
DL (= AQL): number of defective circuits which have been classified as working correctly (testing with T)
Y: yield
T: fault coverage
$$ DL = 1 - Y^{1-T} \qquad (13.2) $$
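A small numerical sketch of the relation DL = 1 - Y^(1-T) shows how strongly the defect level depends on the fault coverage. The function name `defect_level` and the sample values in the usage note are illustrative assumptions only.

```python
def defect_level(yield_fraction, fault_coverage):
    """Defect level DL = 1 - Y**(1 - T), with Y and T given as fractions in [0, 1]."""
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


for coverage in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(coverage, defect_level(0.5, coverage))
```

For a yield of 50%, no testing at all (T = 0) ships 50% defective parts, while T = 1 ships none. Intermediate coverages fall between these limits.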
13.3.1 Chip Test after Manufacturing
Manufacturing Process Parametric Test (current/power dissipation)
(erroneous chips are marked with color points and removed after sawing)
13.4 Fundamental Definitions
fault: physical defect, imperfection or flaw which occurs in a hardware or software component
error: manifestation of a fault (erroneous information on a hardware line or in a program, caused by a fault)
failure: malfunction of a system
Basis: physical phenomena
- Oxide defects
- Missing implants
- Lithographic defects
- Junction defects
- Metal shorts and opens
- Moisture accumulation
- Impurities/contaminants
- Static discharge
Fault Models
Figure 13.6: Fault detection by duplication with complementary logic
Self-Checking Logic
Reconfigurable Array Structures
- manually
- pseudo random (leads up to 60% fault coverage)
- algorithmic
- special test patterns for RAMs
13.7.1 The D-Algorithm
Every test generation procedure has to solve the following problems:
1. Creation of a change at the faulty line
2. Propagation of the change to the primary output line
In the D-algorithm the symbols D and D̄ are used to refer to the changes:
D: used if a line has the value 1 in the absence of a fault and the value 0 when the fault occurs
D̄: used if a line has the value 0 if no fault occurs and otherwise the value 1
The D-algorithm method for path sensitization consists of two principal phases:
1. forward drive (propagation) of a D-value to a primary output
2. backward trace (consistency operation)
These two steps are iterated for different propagation paths for the D-value from one dedicated internal point i to one dedicated primary output point o until the backward trace phase is finished without any contradiction (a test vector for a fault at i has been found) or until all possible paths from i to o have been examined.
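The D notation can be made concrete by representing every signal as a pair (fault-free value, faulty value). The sketch below is an illustration under that assumption, with hypothetical names, and it leaves out the unknown value X. It shows how a two-input NAND gate propagates or blocks a D.

```python
# Each signal value is a pair (value in the fault-free circuit, value in the faulty circuit).
ZERO, ONE = (0, 0), (1, 1)
D = (1, 0)      # 1 without the fault, 0 with the fault
D_BAR = (0, 1)  # 0 without the fault, 1 with the fault


def nand(a, b):
    """Evaluate a NAND gate separately for the fault-free and the faulty circuit."""
    return (1 - (a[0] & b[0]), 1 - (a[1] & b[1]))
```

With the second input at the non-controlling value, `nand(D, ONE)` yields `D_BAR`, so the change is propagated (forward drive). With a controlling 0, `nand(D, ZERO)` yields `ONE` and the change is blocked, which is why the backward trace must justify a 1 on every side input of the sensitized path.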
Figure 13.9: Basic concept of D-algorithm
1. A primitive D-cube of a failure is a D-cube associated with a fault l/α (line l stuck at the value α) on the output line l of a gate G. This produces the value D or D̄ on l, and the input lines have values which would produce the complement of α in the fault-free case.
Figure 13.10: Primitive D-cube of fault (pdcf) for two-input NAND gate
2. A propagation D-cube of a failure specifies the propagation of changes at one (or more) inputs of a gate G to its output l.
Figure 13.11: Propagation-D-cube (pdc) for two-input NAND gate
3. A singular cover of a gate G is a {0,1,X} truth table representation of G.
In the following, the D-algorithm is illustrated for the example from Fig. 13.15.
Running the D-algorithm for generating a test for line 5/0:
1. Start with the D-cube for the fault 5/0:
2. The D of line 5 is automatically propagated to lines 6 and 7 by cube j.
3. Now the propagation along the path 6 → 9 → 11 is considered: D on line 6 is propagated to line 9 by cube d. Combining d and k yields cube l:
4. If cube i is used with D̄ instead of D, the propagation to the output can be done:
5. Now the consistency phase is started and a value for line 4 has to be found. The singular cover table shows that a 0 on line 10 implies that both line 7 and line 8 are 1. In cube m, line 7 is a D (and so is line 5, which is connected to 7 by j), and this D would now have to be set to 1. This contradiction rules out the path sensitization 5 → 6/7 → 9 → 11, so test vector generation is restarted using another path.
6. Starting the propagation along 5 → 7 → 10 → 11 leads to the following cube:
7. From the singular cover table we get the information that a 1 on line 8 is the same as a 0 on line 4. Additionally, it can be seen that the 0 on line 9 can be obtained by a 1 on line 1.
8. This yields the final cube 1110DDD10DD
Fault Simulation
Algorithms: Serial Fault Simulation
13.8.2 Improved Algorithms
Circuit level: restriction of physically possible faults
Logic level: restrict possibilities of realizations
System level: restrict size of components and number of states
Testability:
- controllability
- observability
- additional chip area required
- shorter design cycle
Methods to improve controllability and observability:
- ad-hoc techniques
- structured approaches
Figure 13.17: Design for testability: complex gate (a) not testable with stuck-at model. (b) fully testable with stuck-at model
13.9.1 Ad-Hoc Techniques
- developed for a special design
- less silicon area
- design automation almost impossible
- partitioning (test of circuit components by use of dedicated multiplexers)
Figure 13.19: Testability: ad-hoc techniques (a) insertion of register in order to limit logic depth to a given maximum value. (b) test shift registers for PLA test (increasing PLA area).
Scan-Path:
- Main idea: the test of a sequential network is reduced to the test of a combinational network
- for circuits consisting of logic with some feedback
- can be realized by reconfiguring latches as shift registers (two modes of use)
Figure 13.20: Feedback logic with scan-path
Test the scan-path/register function first: flush test (0...010...0) or shift test (00110011...); each register transfer is tested by this combination: 0 → 0, 0 → 1, 1 → 1 and 1 → 0.
Cycle for testing the combinational logic function:
1. Scan mode: Preload Y and set PI.
2. System operation mode: Wait until the inputs of Y are steady. Clock the new state into Y.
3. Shift the state out. Compare PO and state values with the expected responses.
Advantages:
- Testability of clocked circuits is improved and guaranteed at the design stage
- Consistent with good VLSI design practice (rules, abstraction, modularity, ...)
- Does not require special CAD
Disadvantages:
- Wastes silicon
- Constrains the designer to design according to given conditions
- Additional complexity
Overhead:
- 2% for a fundamentally structured design
- 30% for wild logic
13.9.3 Built-In Tests
- The system generates test vectors on its own
- Analysis and evaluation of the test vectors is also done automatically
- Compromise: silicon area versus testability
Test Pattern Generators
- Test patterns are generated inside the circuit to be tested
- Short testing time, simple test programs, self-test
- Example: test pattern memories, deterministic generators, counters
Example:
Evaluation of testing results inside the circuit
- Counting techniques, signature analysis
Example: Counting Techniques
Signature Analysis
- Communication technique: coding theory
- Code words: data stream D, polynomial P(x), division modulo 2:
$$ \frac{D}{P} = Q + \frac{R}{P} $$
- Evaluation of testing data
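The division modulo 2 can be sketched as a serial shift register with feedback, where the remainder R left in the register is the signature. The function name `signature` and the example polynomial P(x) = x^4 + x + 1 in the usage note are illustrative assumptions, not taken from the course material.

```python
def signature(bits, poly, degree):
    """Remainder R of the bit stream D (MSB first) divided by P(x) modulo 2.

    `poly` contains all coefficients of P(x) including the leading one,
    e.g. 0b10011 for x^4 + x + 1 with degree 4.
    """
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit   # shift the next data bit into the register
        if reg >> degree:        # leading coefficient reached: subtract P(x), i.e. XOR
            reg ^= poly
    return reg
```

For P(x) = x^4 + x + 1, the stream `[1, 0, 0, 1, 1]` (which is P itself) leaves the signature 0, whereas the same stream with its last bit flipped leaves a non-zero signature, so that single-bit error is detected.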
Signature Analysis: Degree of Fault Recognition
1. Length of sequence: m bit, so 2^m sequences are possible
2. One sequence contains no faults, so the number of erroneous sequences is 2^m - 1
3. Length of signature register: n bit, so there are 2^n signatures
4. 2^m sequences are mapped onto 2^n signatures, so the number of nondetectable faults is
$$ \frac{2^m}{2^n} - 1 = 2^{m-n} - 1 $$
5. Probability of nondetection of an erroneous sequence: number of nondetectable faults divided by number of possible faults:
$$ N = \frac{2^{m-n} - 1}{2^m - 1} $$
6. Fault detection rate F:
$$ F = 1 - \frac{2^{m-n} - 1}{2^m - 1} \approx 1 - \frac{1}{2^n} $$
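The fault detection rate F = 1 - (2^(m-n) - 1)/(2^m - 1) can be evaluated exactly with rational arithmetic. The function name `fault_detection_rate` is hypothetical, and sequences no longer than the register (m <= n) are treated as fully detectable, which is an assumption of this sketch.

```python
from fractions import Fraction


def fault_detection_rate(m, n):
    """F for an m-bit sequence compressed into an n-bit signature register."""
    if m <= n:
        return Fraction(1)  # no two different sequences share a signature
    return 1 - Fraction(2 ** (m - n) - 1, 2 ** m - 1)


print(float(fault_detection_rate(1000, 16)))
print(1 - 2 ** -16)
```

For long sequences the two printed values agree to many digits, which illustrates that only the register length n matters.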
Interpretation:
- all faults are recognized if m < n (trivial)
- long sequences: only n is important
- n = 16 bit gives F = 99.9985%
Parallel Signature Register with k Inputs
Figure 13.26: Parallel signature register
Fault recognition rate:
$$ F = 1 - \frac{2^{mk-n} - 1}{2^{mk} - 1} $$
A BILBO register is a universal element for use in either a scanpath environment or a self-test (signature analysis) environment.
Figure 13.27: BILBO registers: 1. full circuit 2. normal use 3. scan-path use 4. signature analysis
Advantages:
- Versatility
- Normal operation
- Scan-path test: enhances testability
- Test vector generation via LFSR
- Data compression via LFSR
- Combined scan-path/self-test using the same LFSRs
Disadvantages:
- Silicon area
- A BILBO latch can be 50% larger than an ordinary latch
JTAG Standard
Chapter 14
- 1985: first meeting of a small group from European electronics companies
- later North American companies joined the group (Joint Test Action Group = JTAG)
- results: IEEE Standard Test Access Port and Boundary-Scan Architecture
Figure 14.3: Combined use of in-circuit and functional test
Disadvantages of the classical approach:
- high costs for test hardware
- increased density
- not suited for surface mount technology
- modern chip testing techniques such as scan path techniques and built-in self-test techniques (BIST)/BILBO are not exploited well
Scan-testing at the board level:
- permits use of automatic test pattern generation tools
- simplification of the hardware of the test equipment
Boundary scan application properties and limitations:
- each test vector has to be shifted into the scan path
- not very suitable for testing the chips themselves because of the reduced test rate compared to stand-alone chip testing
- well suited for interconnection testing
- testing of dynamic behaviour impossible
- self-testing ICs: boundary scan can be used to trigger the self-test procedure
TAP Controller: responds to the control sequences supplied through the test access port (TAP) and generates the clocks and control signals required for the operation of the other circuit blocks
Instruction Register: shift register which is serially loaded with the instruction for the test
Test Data Registers: bank of shift registers. The stimuli values required for a test are serially loaded into a test register selected by the current instruction. After execution the results can be shifted out for examination
14.3.2
Test Clock Input (TCK): independent of the system clock; used for synchronization of test operations between various chips on a board
Test Mode Select Input (TMS): input for controlling the test logic
Test Data Input (TDI): serial input for instruction and test register data
Test Data Output (TDO): serial output of instruction or test register data (source selected by TMS code)
Optional Test Reset Input (TRST): for test initialization
Control of the test signals by external automatic test equipment (ATE) or by on-board bus master chip
Figure 14.11: Use of bus master chip to control IEEE Std 1149.1 chips
14.3.3 TAP-Controller
16-state FSM which controls data register (DR) and instruction register (IR) operations
Input signals:
- TRST
- TCK
- TMS
- last state (stored in internal FFs)
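The 16-state FSM can be sketched as a transition table indexed by TMS, with one transition per TCK cycle. The state names and transitions below follow the commonly documented IEEE Std 1149.1 state diagram, which is not reproduced in this text, so they should be checked against the standard. TRST and the generated control outputs are left out of the sketch.

```python
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def run_tap(state, tms_bits):
    """Advance the controller by one TCK cycle per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state
```

A useful property of this table is that holding TMS at 1 for five TCK cycles leads to Test-Logic-Reset from every state, so the test logic can be initialized even without the optional TRST input.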
14.3.4
Test data registers:
- bypass register (mandatory)
- boundary scan register (mandatory)
- device identification register (optional)
Bypass Register
Chapter 15
Typical signal processing applications require mixed analog/digital implementations. These mainly consist of
- Preprocessing of the signals, e.g. filtering and A/D conversion
- Digital signal processing, e.g. digital filtering, calculation of FFT
- Postprocessing, e.g. D/A conversion
as shown in Fig. 15.1. The aim of development is to integrate all these functions on a single chip.
Figure 15.3: Signal bandwidths that can be processed by present day (1989) technologies
Fig. 15.4 illustrates how analog-to-digital (A/D) and digital-to-analog (D/A) converters are used in data systems. In general, an A/D conversion process converts a sampled and held analog signal to a digital word that is representative of the analog signal. The D/A conversion process is essentially the inverse of the A/D process. Digital words are applied to the input of the D/A converter to create, from a reference voltage, an analog output signal that is representative of the digital word.
Figure 15.4: Converters in signal processing systems: (a) A/D, (b) D/A
Inputs to D/A converters are
(a) a digital word of N bits (b_1, b_2, b_3, ..., b_N)
(b) a reference voltage V_ref
The output voltage can be expressed as
$$ V_{OUT} = K V_{ref} D \qquad (15.1) $$
where K is a scaling factor and D is given as
$$ D = \frac{b_1}{2^1} + \frac{b_2}{2^2} + \frac{b_3}{2^3} + \ldots + \frac{b_N}{2^N} \qquad (15.2) $$
$$ D = \sum_{i=1}^{N} b_i 2^{-i} \qquad (15.3) $$
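The relation between the digital word and the analog output can be checked numerically with a small sketch. The function name `dac_output` and the default values K = 1 and V_ref = 1 V are illustrative assumptions; `bits[0]` stands for b_1, the most significant bit.

```python
def dac_output(bits, v_ref=1.0, k=1.0):
    """Ideal D/A output V_OUT = K * V_ref * D with D = sum of b_i * 2**-i, i = 1..N."""
    d = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))
    return k * v_ref * d
```

For a 3-bit word, `dac_output([1, 0, 0])` gives half of the reference voltage, and the all-ones word gives 7/8 of it, so the full reference voltage is never quite reached.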
Figure 15.5: (a) Conceptual block diagram of a D/A converter, (b) Clocked D/A converter
In most cases, the digital input of the D/A converter is synchronously clocked. It is therefore necessary to provide a latch to hold the word for conversion and a sample-and-hold circuit at the output, as shown in Fig. 15.5(b). The basic architecture of the D/A converter without an output sample-and-hold circuit is shown in Fig. 15.7. Fig. 15.8 shows the ideal input-output characteristics for such a D/A converter.
15.2.1
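The output voltage of a current-scaling D/A converter as shown in Fig. 15.9 can be expressed as
$$ V_{out} = -\frac{R}{2} I_0 = -\frac{R}{2}\left(\frac{b_1}{R} + \frac{b_2}{2R} + \frac{b_3}{4R} + \ldots + \frac{b_N}{2^{N-1}R}\right) V_{ref} \qquad (15.4) $$
$$ V_{out} = -V_{ref}\left(b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \ldots + b_N 2^{-N}\right) \qquad (15.5) $$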
Digital-To-Analog Converters
Figure 15.6: (a) Sample-and-hold circuit, (b) Waveforms illustrating the operation of the sample-and-hold circuit
The major disadvantage of this approach is the large ratio of component values. For example, the ratio of the resistor for the MSB to the resistor for the LSB is given by
$$ \frac{R_{MSB}}{R_{LSB}} = \frac{1}{2^{N-1}} \qquad (15.6) $$
For an 8-bit converter, this gives a ratio of 1/128. An alternative to this approach is the use of an R-2R ladder as shown in Fig. 15.10. Using the fact that the resistance to the right of any of the vertical 2R resistors is 2R, we see that the currents I_1, I_2, I_3, ..., I_N are binary-weighted and given as
$$ I_1 = 2 I_2 = 4 I_3 = \ldots = 2^{N-1} I_N \qquad (15.7) $$
Thus, the output voltage of the R-2R D/A converter is given by Eq. 15.5.
Figure 15.9: (a) Conceptual illustration of a current-scaling D/A converter, (b) Implementation of (a)
15.2.2 Voltage Scaling D/A Converters
A voltage-scaling D/A converter is shown in Fig. 15.11. Its output voltage at any tap i can be expressed as
$$ V_i = \frac{V_{ref}}{8}\,(i - 0.5) \qquad (15.8) $$
The output voltage of the D/A converter is then determined by the values of the inputs b_1, b_2 and b_3.
Figure 15.11: Illustration of a voltage-scaling D/A converter
The structure of this voltage-scaling D/A converter is very regular and thus well suited for MOS technology. A problem with this type of D/A converter is the accuracy required of the resistors. This makes it difficult to build D/A converters of this type with more than 8 bit resolution.
The objective of an A/D converter is the determination of the digital word corresponding to the analog input signal. Usually a sample-and-hold circuit (see Fig. 15.6) is required at the input of the A/D converter because it is not possible to convert a changing analog signal. A block diagram of a general A/D converter is shown in Fig. 15.12. The ideal input-output characteristics for an A/D converter are shown in Fig. 15.13.
Analog-To-Digital Converters
15.3.1 Serial A/D Converters
Two possible implementations of serial A/D converters are single-slope and dual-slope A/D converters. Neither is discussed in detail here. The main advantage of these converters is their simplicity; their main disadvantage is the long conversion time required.
15.3.2
A successive approximation A/D converter converts an analog input into an N-bit digital word in N clock cycles. Consequently, the conversion time is shorter than for the serial converters without much increase in the complexity of the circuit. Fig. 15.14 shows an example of a successive approximation A/D converter architecture.
Figure 15.14: Example of a successive approximation A/D converter architecture
The successive approximation process is shown in Fig. 15.15.
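The successive approximation process amounts to a binary search, deciding one bit per clock cycle from the MSB down. The sketch below assumes an ideal internal D/A converter with output code * V_ref / 2^N and an ideal comparator; the function name `sar_convert` is hypothetical.

```python
def sar_convert(v_in, v_ref, n_bits):
    """Ideal successive approximation: one comparison (clock cycle) per bit."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                 # tentatively set the next bit
        v_dac = trial * v_ref / (1 << n_bits)     # ideal internal D/A output
        if v_in >= v_dac:                         # comparator decision
            code = trial                          # keep the bit, otherwise drop it
    return code
```

For V_ref = 1 V and 8 bits, an input of 0.7 V is resolved to code 179 after exactly eight comparisons, since 179/256 is the largest D/A level that does not exceed the input.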
15.3.3
In many applications, it is necessary to have a smaller conversion time than is possible with the previously described A/D converter architectures. Parallel A/D converters, also known as flash A/D converters, typically require as little as one clock cycle for conversion. An architecture of a 3-bit parallel A/D converter is shown in Fig. 15.16.
Parallel A/D converters can typically reach up to 20 MHz in CMOS technology. The sample-and-hold time, however, may be larger than 50 ns and could prevent this conversion time from being realised. Another problem is that the number of comparators required is 2^N - 1. For N greater than 8, too much area is required.
One method of achieving small system conversion times is to use slower A/D converters in parallel, which is called time-interleaving and is shown in Fig. 15.17. Here M successive approximation A/D converters are used in parallel to complete the N-bit conversion of one analog signal per clock cycle. The sample-and-hold circuits consecutively sample and apply the input analog signal to their respective A/D converters. N clock cycles later, the A/D converter provides a digital word output. If M = N, then a digital word is given out every clock cycle. If one compares the chip area for an N-bit A/D converter using the parallel A/D converter architecture (M = 1) with the time-interleaved architecture for M = N, the minimum area will occur for a value of M between 1 and N.
15.3.4 Sigma-Delta A/D Converter
Introduction
The basic structure of a sigma-delta converter is shown in Fig. 15.18. The sigma-delta converter can be referred to as an oversampling converter, although oversampling is just one of the techniques contributing to the performance of a sigma-delta converter. The sigma-delta converter shown in Fig. 15.18 quantizes an analog signal with very low resolution (1 bit) and a very high sampling rate (2 MHz). With the use of oversampling techniques and digital filtering, the sampling rate is reduced (8 kHz) and the resolution is increased (16 bits).
Figure 15.18: Basic structure of a sigma-delta converter
A more detailed block diagram of the sigma-delta modulator is shown in Fig. 15.19. It consists of an integrator, a quantizer (a comparator for 1 bit) and a feedback loop with a D/A converter (a switch for 1 bit). The output of the sigma-delta modulator is shown in Fig. 15.20 for a sine wave input. The single-bit conversion results in an output which is either 1 or 0. When the signal is near plus full scale, the output is positive during most of the clock cycles. The opposite is true for signals near minus full scale. When the output is followed by a digital filter as shown in Fig. 15.18, which can perform sophisticated averaging functions, the 1-bit sequence is transformed into a much more meaningful signal.
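The loop of integrator, 1-bit quantizer and 1-bit D/A feedback can be simulated in a few lines. This is a minimal discrete-time sketch under assumptions not stated in the text: the input is normalized to the range -1 to +1, the feedback D/A outputs +1 or -1, and the function name `sigma_delta` is hypothetical.

```python
def sigma_delta(samples):
    """First-order modulator: returns the 1-bit output stream for the input samples."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:                      # x is assumed to lie in [-1, +1]
        integrator += x - feedback         # summing node followed by the integrator
        bit = 1 if integrator >= 0 else 0  # 1-bit quantizer (comparator)
        feedback = 1.0 if bit else -1.0    # 1-bit D/A converter (switch)
        bits.append(bit)
    return bits


bits = sigma_delta([0.5] * 1000)
average = sum(2 * b - 1 for b in bits) / len(bits)  # crude averaging "filter"
print(average)
```

For a constant input of 0.5, three out of four output bits are 1, and averaging the bit stream (mapped to +1 and -1) recovers the input value. This averaging is a much simplified stand-in for the digital filter that follows the modulator.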
Noise Shaping

One feature that makes the sigma-delta converter so powerful is its noise shaping capability. To understand how this works, an analysis of the sigma-delta modulator in the frequency domain is appropriate. Fig. 15.21 shows the linearized frequency domain model of a sigma-delta modulator.
The integrator is represented as an analog filter. For an integrator, the transfer function has an amplitude which is inversely proportional to the input frequency ($1/f$ relationship). The quantizer is modelled as a gain stage followed by the addition of quantization noise. Thus, the output $y$ of the sigma-delta converter can be expressed by

$$ y = (x - y)\,\frac{1}{f} + q \tag{15.9} $$
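where $(x - y)$ is the difference signal from the summing node at the input and $q$ is the quantization noise. Applying some algebraic rearrangement yields

$$
\begin{aligned}
y &= \frac{x}{f} - \frac{y}{f} + q \\
\left(1 + \frac{1}{f}\right) y &= \frac{x}{f} + q \\
y &= \frac{x/f}{1 + 1/f} + \frac{q}{1 + 1/f} \\
y &= \frac{x}{f + 1} + \frac{q f}{f + 1}
\end{aligned}
\tag{15.10}
$$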
At a frequency $f = 0$, the output signal equals $x$ with no noise component $q$. At higher frequencies, the contribution of $x$ is reduced and the influence of $q$ increases. In essence, the sigma-delta modulator has a low-pass effect on the signal and a high-pass effect on the noise. As a result, the modulator can be thought of as a noise shaping filter in which noise in the signal pass band is reduced and noise energy is pushed into the higher frequency region. The effect of this procedure on normally equally distributed (white) quantization noise is shown in Fig. 15.22.
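The low-pass and high-pass behaviour can be checked numerically by evaluating the two coefficients of Eq. (15.10). The sketch below follows the simplified model of the text, in which $f$ is treated as a real, normalized frequency and phase is ignored; the function names are chosen here for illustration.

```python
def signal_gain(f):
    """Coefficient of x in Eq. (15.10): 1 / (f + 1), a low-pass response."""
    return 1.0 / (f + 1.0)


def noise_gain(f):
    """Coefficient of q in Eq. (15.10): f / (f + 1), a high-pass response."""
    return f / (f + 1.0)


def modulator_output(x, q, f):
    """Output y of the linearized model for signal x and noise q."""
    return x * signal_gain(f) + q * noise_gain(f)
```

At $f = 0$ the signal gain is 1 and the noise gain is 0, so `modulator_output(x, q, 0.0)` returns `x`. As $f$ grows, the signal gain falls towards 0 while the noise gain rises towards 1.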
Digital Filtering

The sigma-delta modulator described so far produces a stream of single-bit digital values at a very high rate. The modulator's output bit stream is fed into the converter's digital filter, which performs several different functions. All of these functions, however, are integrated into a single filter implementation. The functions of the filter are:

- sophisticated averaging (low-pass filtering)
- removing high-frequency noise (quantization noise)
- reducing the sampling rate

The sampling rate reduction is done by averaging over a number of cycles of the input bit stream. This produces an output data stream that is reduced in sampling rate but increased in resolution (i.e. number of bits per sample).

Advantages of Sigma-Delta Converters

The advantages of the sigma-delta converter technology are:

- Sigma-delta converters are a complete conversion and filtering system; additional digital filtering functions may easily be implemented in the digital output filter of the converter.
- Very low-cost and high-performance conversion is possible, as the analog part of the converter is very simple and need not be as accurate as in other A/D converters. The main part of the converter is the digital filter, which can be integrated more easily in MOS technology.
- The signal-to-noise performance is excellent, so high-resolution converters are possible.
- No sample-and-hold circuit preceding the converter is necessary, as sampling rates are very high.
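The sampling rate reduction described under Digital Filtering can be sketched with a plain block average. This is only a stand-in for the "sophisticated averaging" filter of a real converter: the function names are hypothetical, and the ratio of 250 is simply the 2 MHz to 8 kHz example from the introduction.

```python
def decimation_ratio(input_rate_hz, output_rate_hz):
    """Number of input bits averaged into one output sample."""
    return input_rate_hz // output_rate_hz


def decimate_by_averaging(bits, ratio):
    """Average each block of `ratio` bits into one output sample.

    bits: sequence of 0/1 values from the modulator.
    Returns one value between 0 and 1 per complete block; a trailing
    incomplete block is discarded.
    """
    n_blocks = len(bits) // ratio
    return [
        sum(bits[i * ratio:(i + 1) * ratio]) / ratio
        for i in range(n_blocks)
    ]
```

With `ratio = decimation_ratio(2_000_000, 8_000)`, every 250 input bits yield one output sample. Note that a plain average of 250 one-bit values can take only 251 distinct levels, so it does not by itself reach the 16-bit resolution quoted in the introduction; that requires a filter which exploits the noise shaping of the modulator.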