CHP 3 - Sub-System Design

Sub-System Design
Topics

Architectural issues.
Switch logic and gate logic.
Examples of structured design: combinational logic.
Design of an ALU subsystem.
Consideration of adders and multipliers.
Sequential circuits.
Semiconductor memories.
Introduction

Large systems are composed of sub-systems, known as leaf cells.
The most basic leaf cell is the common logic gate (inverter, NAND, etc.).
Structured Design

High regularity: leaf cells are replicated many times and interconnected to form the system.
A logical and systematic approach to VLSI design is essential.
Dealing with Complexity
Divide and conquer: limit the number of components you deal with at any one time.
Group several components into larger components:

transistors form gates
gates form functional units
functional units form processing elements
A System-on-a-Chip
Courtesy: Philips
Major Levels of Design
Specification: description of requirements.
Systems level: placing and interconnecting major functional units.
Function level: specification and design of major functional units.
Logic/circuit level: gate-level design and gate interconnection design.
Layout level: what will actually be patterned onto the chip and how the chip will be processed.
Physics level: the physics of gate and switch operation.
Sub-System Design Guidelines
Define the requirements.
Partition the overall architecture into subsystems.
Consider the interconnection paths between the subsystems.
Draw up the system floor plan on the silicon chip.
Use regular structures that can be replicated.
Draw a stick diagram for each leaf cell (module) in the system.
Convert the stick diagram of each leaf cell into layout and run a design rule check.
Simulate the performance of each cell.
Design Validation
Check at every step that errors have not been introduced.
The longer an error remains, the more expensive it becomes to remove.
Chip Architecture Floor plan

After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
The implementation plan is known as the floor plan.
The first step in laying out a floor plan is the routing of the supply and clock rails.
In doing this, sufficient space must be left between the power rails for data buses and combinational logic cells.
Decide on the relative positions of the major functional blocks.
Use a routing algorithm (software); the routing algorithm aims to minimize the total routing area.
Alpha 21364 Microprocessor Floor plan
Switch Logic
How do we build switches from MOS transistors?
1) Pass transistors
2) Transmission gates
Pass Transistors

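So far we have assumed that the source is grounded. What if the source voltage is greater than 0?
Example: a pass transistor passing VDD.

[Figure: nMOS pass transistor with VDD applied to both its gate and its input.]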
NMOS Pass Transistors
Require one transistor and one gate signal.
Transmit logic 0 well, but when Vdd is applied to the drain, the voltage at the source only reaches Vdd - Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems.
When an n-type switch drives a complementary gate, the gate runs slower when the switch input is 1.
This is because the pull-down current is weaker when a lower gate voltage is applied, so the complementary gate's pull-down does not drain current from the output capacitance as fast as it should.
PMOS Pass Transistors

When Vin = Vdd, Vout = Vdd.
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|; the p-device then stops conducting.
Logic 0 is therefore somewhat degraded through a p-device.
Voltage degradation of Pass Transistors
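Pass Transistor Circuits

[Figure: pass transistor circuits. An nMOS pass transistor with its gate at VDD passes VDD as Vs = VDD - Vtn, and a series chain of such transistors still delivers VDD - Vtn. A pMOS pass transistor passes VSS as Vs = |Vtp|. When a degraded output VDD - Vtn drives the gate of a further nMOS pass transistor, that stage delivers VDD - 2Vtn.]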
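The threshold drops described above can be sketched numerically. The following is a minimal model that ignores the body effect; the function names and the values of VDD, VTN and VTP_ABS are illustrative assumptions, not taken from the slides.

```python
def nmos_pass_high(v_gate, v_in, vtn):
    """Highest voltage an nMOS pass transistor delivers when passing v_in."""
    return min(v_in, v_gate - vtn)


def pmos_pass_low(v_gate, v_in, vtp_abs):
    """Lowest voltage a pMOS pass transistor delivers when passing v_in."""
    return max(v_in, v_gate + vtp_abs)


# Illustrative (assumed) supply and threshold voltages, in volts.
VDD, VTN, VTP_ABS = 5.0, 1.0, 1.0

# Series chain with every gate tied to VDD: one threshold drop in total.
v = VDD
for _ in range(3):
    v = nmos_pass_high(VDD, v, VTN)
series_out = v  # VDD - Vtn

# Gate-driven cascade: each degraded output drives the next gate,
# so every stage loses a further Vtn.
gate_voltage = VDD
cascade = []
for _ in range(2):
    gate_voltage = nmos_pass_high(gate_voltage, VDD, VTN)
    cascade.append(gate_voltage)  # VDD - Vtn, then VDD - 2*Vtn

# pMOS passing VSS = 0 V with its gate at 0 V: the output stops at |Vtp|.
pmos_low = pmos_pass_low(0.0, 0.0, VTP_ABS)
```

The distinction the sketch makes is that series-connected pass transistors share a single drop, whereas using a degraded signal as a gate voltage accumulates drops.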
Complementary Pass Transistor Logic

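[Figure (a): complementary inputs A, ~A, B, ~B drive a pass-transistor network producing F and an inverse pass-transistor network producing ~F.]
[Figure (b): example gates. AND/NAND: F = AB, ~F = ~(AB). OR/NOR: F = A + B, ~F = ~(A + B). EXOR/NEXOR: F = A ⊕ B, ~F = ~(A ⊕ B).]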
Advantages and Disadvantages
Advantages:

Fewer transistors.
No static power consumption.
Disadvantages:

Output voltage degrades.
Not an ideal switch, due to series resistance.
The delay of series pass transistors is large.
Transmission gates
Logic with Transmission gates
Advantages:

Fewer transistors compared to CMOS.
No static power consumption.
Efficient building of complex gates.
Disadvantages:

Not an ideal switch, due to series resistance.
The delay of series transmission gates is large.
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:

Zp.u / Zp.d ratio.
Pull-up and pull-down resistance.
Power dissipation.
Standard unit of capacitance.
Gate delay.
Sizing of NMOS Nand Gate.
The ratio of the pull-up impedance to the total impedance of the n series pull-down transistors, Zp.u / (n Zp.d), must be at least 4:1 to give the correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
The area of the NAND gate is greater than that of the inverter, because there are more pull-down transistors and the length of the pull-up transistor increases correspondingly.
The delay also increases in direct proportion to the number of inputs added.
NMOS Nor Gate
The pull-down transistors of a NOR gate are in parallel, so the pull-up to pull-down ratio is the same for each of them and the gate has the same characteristics as an inverter. The area occupied is reasonable, since the length of the pull-up transistor does not increase. The NOR gate is therefore preferable to the NAND gate.
CMOS Logic
Properties of CMOS Gates

High noise margins: VOH and VOL are at VDD and GND, respectively.
No static power consumption: there is never a direct path between VDD and VSS (GND) in steady state.
Comparable rise and fall times (under appropriate sizing conditions).
CMOS Nand and Nor Characteristics
The CMOS NAND gate does not have the ratio restriction of the nMOS NAND gate, but its geometry still has to be kept symmetrical:
Allow for the extended fall times caused by the series resistance of the series nMOS transistors.
Keep the transfer characteristic centred on Vdd/2.
The CMOS NOR gate has series p-transistors, which increase the resistance and the delay. This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.
CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic
Static CMOS: Switch Delay Model

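[Figure: switch-level RC models of the NAND2, NOR2 and inverter (INV) gates. Each pMOS transistor is modelled as a switch in series with Rp and each nMOS transistor as a switch in series with Rn, with internal node capacitance Cint and load capacitance CL; Req denotes the equivalent resistance.]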
Input Pattern Effects on Delay
[Figure: NAND2 switch model with pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint and load capacitance CL.]
Delay is dependent on the pattern of inputs.
Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL
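The three delay expressions above can be evaluated directly. This is a small sketch only: the resistances Rp = 10 kΩ and Rn = 5 kΩ are assumed for illustration, and CL = 100 fF is borrowed from the simulation conditions quoted in the next section.

```python
def nand2_delays(rp, rn, cl):
    """Switch-model delays (in seconds) of a 2-input NAND gate."""
    return {
        # Both pMOS devices conduct in parallel: Rp/2 charges CL.
        "low-to-high, both inputs go low": 0.69 * (rp / 2) * cl,
        # A single pMOS device charges CL.
        "low-to-high, one input goes low": 0.69 * rp * cl,
        # Two nMOS devices in series discharge CL.
        "high-to-low, both inputs go high": 0.69 * (2 * rn) * cl,
    }


# Assumed resistances (ohms); load capacitance in farads.
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

The model ignores Cint, which is one reason the simulated delays differ between patterns that this formula treats as identical.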
Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic

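Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors, one per input, to a single transistor.
[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both a pull-up network (PUN) connected to VDD and a pull-down network (PDN); the output F is taken between the two networks.]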
Pseudo NMOS Operation

The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.
Pseudo NMOS Characteristics
Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], with curves for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages:
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, in terms of both the number of devices and wiring complexity.
Disadvantages:
Because of the higher pull-up resistance, the delay is larger and the circuit is slower.
More static power is dissipated, because of the conduction path between VDD and VSS.
Dynamic CMOS Logic
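The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic. It avoids static power dissipation and adds a clock input that defines precharge and conditional evaluation phases.
[Figure: dynamic CMOS gate. The clock Clk drives a pMOS precharge transistor Mp and an nMOS evaluation transistor Me; inputs In1, In2, In3 drive the pull-down network (PDN) between the output node Out, with load capacitance CL, and Me.]
Two-phase operation:
Precharge (CLK = 0)
Evaluate (CLK = 1)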
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
A fan-in of n requires 2n devices (n N-type + n P-type).
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
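The device counts quoted above (2n for static CMOS, n + 2 for dynamic CMOS) can be tabulated for a few fan-ins. The function names below are hypothetical and only restate those two formulas.

```python
def static_cmos_transistors(fan_in):
    """Static CMOS: n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """Dynamic CMOS: n N-type logic devices, one N-type evaluation
    device and one P-type precharge device."""
    return fan_in + 2


# (fan-in, static count, dynamic count) for fan-ins 2 to 8.
table = [
    (n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
    for n in range(2, 9)
]
for n, static, dynamic in table:
    print(f"fan-in {n}: static {static}, dynamic {dynamic}")
```

The two counts are equal at a fan-in of 2, and the saving grows by one transistor for each additional input.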
Dynamic CMOS Logic operation
Precharge. When CLK goes low, the p-type transistor starts charging the precharge capacitance. The pull-down transistor controlled by the clock keeps the precharge node from being drained. The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.
Evaluate. When CLK goes high, precharging stops, i.e. the p-type pull-up turns off, and the evaluation phase begins, i.e. the clocked n-type pull-down turns on. The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance. If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0. If the inputs do not create such a path, the gate's output is left charged at logic 1.
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: example dynamic gate with inputs A, B and C. During precharge (Clk = 0) Mp is on, Me is off and Out is charged to 1; during evaluation (Clk = 1) Mp is off, Me is on and Out = ~((A·B) + C).]
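The precharge/evaluate behaviour and the "cannot be charged again" condition can be illustrated with a purely behavioural model. This is a sketch under stated assumptions: the class `DynamicGate` and its methods are hypothetical names, and the model ignores leakage and charge sharing.

```python
class DynamicGate:
    """Behavioural model of a precharged dynamic gate (assumed names)."""

    def __init__(self, pdn_conducts):
        # pdn_conducts(*inputs) is truthy when the pull-down network conducts.
        self.pdn_conducts = pdn_conducts
        self.out = 1

    def precharge(self):
        """CLK = 0: the storage node is charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: discharge the node if the PDN conducts.

        Calling this repeatedly within one evaluation phase models inputs
        that change; once discharged, the node stays at 0.
        """
        if self.pdn_conducts(*inputs):
            self.out = 0
        return self.out


# Example gate from the figure: Out = ~((A.B) + C).
gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
r1 = gate.evaluate(1, 1, 0)  # PDN conducts, node discharges
r2 = gate.evaluate(0, 0, 0)  # inputs fall again, node cannot recover

gate.precharge()
r3 = gate.evaluate(1, 0, 0)  # no conducting path, node stays charged
```

The second call shows why inputs may make at most one transition during evaluation: an input that pulses high leaves the output at 0 until the next precharge.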
Properties of Dynamic Gates
The logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, giving a low noise margin (NML).
Advantages:
Low area.
Higher speed than static complementary gates.
Disadvantages:
Precharged gates introduce functional complexity, because they must be operated in two distinct phases and require a clock signal.
They are more sensitive to noise.
Their clocking signals consume power and are difficult to turn off to save power.
Cascading Dynamic Gates

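[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluation transistor Me driven by Clk. In drives the first stage, Out1 drives the second stage, and Out2 is the final output. The waveforms show Clk, In, Out1 and Out2 against time, with the threshold VTn marked.]
Only 0 → 1 transitions are allowed at the inputs!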
Cascading Dynamic Gates- Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge. The PDN of the second dynamic inverter turns off only when Out1 falls to VTn. Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period. Setting all inputs of the second gate to 0 during precharge fixes the problem.
Domino CMOS Logic

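[Figure: domino gate. A dynamic stage (precharge transistor Mp, pull-down network PDN with inputs In1, In2, In3, and evaluation transistor Me, all clocked by Clk) drives a static inverter whose output is Out1. The dynamic node makes a 1 → 1 or 1 → 0 transition during evaluation.]
Domino logic is the combination of a dynamic logic stage and a static inverter.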
Domino CMOS Logic operation
Precharge phase: same as dynamic logic.
Evaluate phase: a modification of dynamic logic.
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off, and the evaluation phase begins, i.e. the clocked n-type pull-down turns on. The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance. If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1. If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.
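The evaluate phase of a domino gate can be modelled as a dynamic node followed by an inverter. The sketch below is behavioural only; the function names and the choice of a two-stage AND-then-OR example are assumptions made for illustration.

```python
def domino_stage(pdn_conducts):
    """One domino stage during evaluation.

    The precharged node starts at 1 and is discharged to 0 if the
    pull-down network conducts; the static inverter produces the output.
    """
    node = 0 if pdn_conducts else 1
    return 1 - node


def domino_and_or(a, b, c):
    """Two cascaded domino stages: X = A AND B, then F = X OR C."""
    x = domino_stage(a and b)  # series nMOS pair in the first PDN
    f = domino_stage(x or c)   # parallel nMOS pair in the second PDN
    return f


# Output of a stage whose PDN does not conduct, as at the end of precharge.
precharge_out = domino_stage(False)

# Evaluate the cascade for every input combination.
results = {
    (a, b, c): domino_and_or(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}
```

Because each stage output is 0 after precharge and can only rise, the second stage never sees a falling input, and the cascade computes the non-inverting function (A AND B) OR C.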
Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the precharged node can only make a transition from logic 1 to logic 0.
Very high speed:

the static inverter can be skewed, since only the low-to-high output transition matters
input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates

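[Figure: two cascaded domino gates clocked by Clk. The first stage (Mp, PDN with inputs In1, In2, In3, Me) has a dynamic node that makes a 1 → 1 or 1 → 0 transition and drives an inverter whose output Out1 makes a 0 → 0 or 0 → 1 transition. Out1, together with In4 and In5, drives the PDN of the second stage, which produces Out2.]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period. Hence, the only possible transition during evaluation is 0 → 1.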
Clocked CMOS Logic
It is a combination of:

Static CMOS.
A Clk input driving one additional NMOS transistor.
A clock input driving one additional PMOS transistor.
A fan-in of n therefore requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated, because the clocked NMOS is off.

Advantages:
Clocked CMOS logic has been used for very low power CMOS and for minimizing hot-electron effects in N-FET devices.
Disadvantages:
More area.
More complexity.
N-P CMOS Logic
An elegant solution to the erroneous evaluation problem of dynamic CMOS logic is to use NP domino logic (also called NORA logic).
Stages of N logic alternate with stages of P logic.
N-logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P-logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage evaluate; the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example
Logic below:
Stage 1 is X = (A B)
Stage 2 is G = X + Y
Stage 3 is Z = (F G + H)
BiCMOS Technology
CMOS properties:

Low power.
Slower.
BJT properties:

Faster.
High power.
Implementation: CMOS and BJT

Core logic: CMOS. Interface: BJT.
Basic Circuits
Inverter
NAND gate
NOR gate
BiCMOS Circuits
Inverter
Nand Gate
Structured Design
A digital block or standard cell is designed at the following levels:

Logic level
Circuit level
Stick diagram
Layout
Examples: Combinational Logic

5-way selector.
Multiplexers and demultiplexers.
Encoders and decoders.
Parity generator.
Bus arbitration logic.
Gray code to binary code converter.
Adders

1-bit full adder
Manchester carry chain adder
Enhancement techniques:
Carry select adders
Carry skip adders
Carry look-ahead adders
32-bit adders
1-bit full adder
CMOS Full Adder
Sum Output
Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)

Butting contact: L = 22λ, W = 10λ
Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So, for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1, independent of C
G = A · B
Propagate: Cout = C
P = A ⊕ B
Kill: Cout = 0, independent of C
K = ~A · ~B
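The three signals can be checked against ordinary addition with a few lines of code. This is a minimal sketch; the function names are hypothetical, and the relation Cout = G + P·C is the standard way of combining the definitions above.

```python
def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b              # both inputs 1: a carry is generated
    p = a ^ b              # inputs differ: the incoming carry is propagated
    k = (1 - a) & (1 - b)  # both inputs 0: any incoming carry is killed
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed through generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


# Exactly one of G, P, K is active for any input pair, and the carry
# matches the arithmetic carry of a + b + c.
for a in (0, 1):
    for b in (0, 1):
        assert sum(pgk(a, b)) == 1
        for c in (0, 1):
            assert carry_out(a, b, c) == (a + b + c) >> 1
```

Since the three cases are mutually exclusive, a carry chain only ever needs one of three actions per bit: pull the carry high, pass it on, or pull it low.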
Manchester Carry Chain
Built from switch logic using the propagate, generate and kill signals.
Manchester Chain adders
The carry input is precharged by the clock signal instead of being passed through the logic.
The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor.
The generate signal is G = A · B.
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
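Simplest design: cascade full adders.

The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder. Four full adders with inputs A1..A4 and B1..B4 produce sums S1..S4; the carry ripples from Cin through C1, C2 and C3 to Cout.]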
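A behavioural sketch of the cascade follows. The helper names and the LSB-first bit-list convention are assumptions chosen for the example, not part of the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry feeds the next stage
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
result = from_bits(sums) + (cout << 4)  # 11 + 6 = 17
```

The loop makes the critical path visible: each stage cannot finish until the previous stage's carry is available, so delay grows linearly with the word length.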
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel. Output Carry is represented in terms of generate and propagate signals. The Expressions for 4-bit CLA are
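As an illustration, the carries of a 4-bit CLA can be written out by repeatedly expanding C(i+1) = Gi + Pi·Ci. Only the first expression, C1 = G0 + P0·C0, appears in this document; the remaining expansions below are the standard ones and the function names are hypothetical.

```python
def cla4_carries(a, b, c0):
    """Carries C1..C4 of a 4-bit CLA; a and b are LSB-first bit lists."""
    g = [x & y for x, y in zip(a, b)]  # generate signals G0..G3
    p = [x ^ y for x, y in zip(a, b)]  # propagate signals P0..P3
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4], p


def cla4_add(x, y, c0=0):
    """Add two 4-bit integers using the look-ahead carries."""
    a = [(x >> i) & 1 for i in range(4)]
    b = [(y >> i) & 1 for i in range(4)]
    carries, p = cla4_carries(a, b, c0)
    carry_in = [c0] + carries[:3]  # carry entering each bit position
    total = sum((p[i] ^ carry_in[i]) << i for i in range(4))
    return total, carries[3]


total, cout = cla4_add(9, 7)  # 9 + 7 = 16: sum bits 0000, carry-out 1
```

Every carry depends only on the G, P and C0 signals, never on another carry, which is what allows all four to be computed in parallel.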
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred to as a conditional sum adder.
The adder is divided into two blocks:

One block with a logical 0 Cin.
Another block with a logical 1 Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2, where
M = number of blocks in the adder,
P = number of cells in each block,
K1 = delay through one adder cell,
K2 = delay through a multiplexer.
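The delay expression can be evaluated for a sample configuration. The numbers below (16 bits, four blocks of four cells, K1 = 1.0 and K2 = 0.5 in arbitrary units) are assumed for illustration, and the ripple-carry figure of n·K1 is a simple assumption used only for comparison.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n, k1):
    """Assumed comparison model: the carry ripples through all n cells."""
    return n * k1


# Assumed example: 16-bit adder split into 4 blocks of 4 cells.
K1, K2 = 1.0, 0.5
t_select = carry_select_delay(p=4, m=4, k1=K1, k2=K2)
t_ripple = ripple_delay(n=16, k1=K1)
print(f"carry select: {t_select}, ripple: {t_ripple}")
```

Only the first block contributes its full ripple time; every later block adds just one multiplexer delay because its two candidate results are computed in parallel.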
Carry Skip Adder (CSA)

It is also called a carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
It is typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic

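[Figure: simplified carry-skip adder logic. Two full adders (FA) take inputs ai, bi and ai+1, bi+1 and produce sums si and si+1 and propagate signals pi and pi+1; the propagate signals are combined into a skip signal that selects between the rippled carry and ci for the outgoing carry.]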
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0, else C3 = output carry of the 4-stage adder.
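The skip rule can be expressed as a behavioural model of one 4-bit block. The function name `skip_block` and the integer-based interface are assumptions for this sketch; it models the logic only, not the timing benefit.

```python
def skip_block(a, b, c0):
    """One 4-bit carry-skip block on integers a, b in 0..15.

    Returns (sum, carry-out, block-propagate).
    """
    carry = c0
    total = 0
    p_all = 1
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        pi = ai ^ bi                      # propagate for this bit
        total |= (pi ^ carry) << i        # sum bit
        carry = (ai & bi) | (pi & carry)  # rippled carry
        p_all &= pi                       # P0 and P1 and P2 and P3
    # Bypass: if every bit propagates, the carry-in goes straight out.
    cout = c0 if p_all else carry
    return total, cout, p_all


s1, c1, skip1 = skip_block(0b1010, 0b0101, 1)  # all bits propagate
s2, c2, skip2 = skip_block(0b1100, 0b0100, 0)  # bit 2 generates a carry
```

When all four propagate signals are 1 the rippled carry and the bypassed carry are logically identical, so the bypass changes only how quickly the carry-out becomes valid.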
Delay of CSA
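The worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where
K1 = delay through one adder cell,
K2 = delay for skipping the carry over a block,
with M and P as defined for the carry select adder (number of blocks and number of cells per block).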
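A quick numerical evaluation of the worst-case expression follows. The configuration (16 bits, four blocks of four cells, K1 = 1.0, K2 = 0.5 in arbitrary units) is assumed, as is the n·K1 ripple figure used for comparison.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Assumed example: 16-bit adder, 4 blocks of 4 cells.
K1, K2 = 1.0, 0.5
t_skip = carry_skip_delay(p=4, m=4, k1=K1, k2=K2)
t_ripple = 16 * K1  # assumed ripple-carry delay for the same width
print(f"carry skip: {t_skip}, ripple: {t_ripple}")
```

In the worst case the carry ripples through the first and last blocks and skips over the M - 2 blocks in between, which is where the two terms of the formula come from.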
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster than the carry-lookahead adder and dissipated 58% of its power.
Using 64-bit operands, a one-level carry-skip adder was 38% slower than the carry-lookahead adder and consumed 68% of its power.
Comparison of Adders
MULTIPLIERS
Multipliers: Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier: Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
It is an optimized column adder tree.
It combines all partial products into 2 vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
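The reduction to two vectors can be sketched at word level using carry-save (3:2) addition. This is a simplified model under stated assumptions: it groups whole partial-product rows in threes rather than compressing individual bit columns as a hardware Wallace tree does, and the function names are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def wallace_multiply(a, b, n=8):
    """Multiply a by an n-bit b; returns (product, reduction levels)."""
    # One partial product per multiplier bit.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(partials) > 2:
        reduced = []
        full = len(partials) - len(partials) % 3
        for i in range(0, full, 3):
            reduced.extend(carry_save(*partials[i:i + 3]))
        reduced.extend(partials[full:])  # leftover rows pass through
        partials = reduced
        levels += 1
    # The final two vectors are combined by a conventional adder.
    return sum(partials), levels


product, levels = wallace_multiply(200, 100)
```

For eight partial products the row count shrinks as 8, 6, 4, 3, 2, so four carry-save levels are needed before the final carry-propagate addition.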
Wallace Tree Multiplier Structure
Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed using a top-down approach with the help of CAD tools.
Partition the system sensibly.
Aim for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of each subsystem and check its proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
