VLSI Design
• pMOS
• nMOS
• CMOS
• BiCMOS
• GaAs
BiCMOS:
• A known deficiency of MOS technology is its limited load driving capabilities (due to
limited current sourcing and sinking abilities of pMOS and nMOS transistors.
• Bipolar transistors have
– higher gain
– better noise characteristics
– better high frequency characteristics
• BiCMOS gates can be an efficient way of speeding up VLSI circuits
• See table for comparison between CMOS and BiCMOS
• CMOS fabrication process can be extended for BiCMOS
• Example Applications
– CMOS - Logic
– BiCMOS - I/O and driver circuits
– ECL - critical high speed parts of the system
TECHNOLOGIES:
Starting with a uniformly doped silicon wafer, the fabrication of integrated circuits (ICs) requires hundreds of sequential process steps. The most important process steps used in semiconductor fabrication are
a) Lithography
Lithography is used to transfer a pattern from a photomask to the surface of the wafer. For
example the gate area of a MOS transistor is defined by a specific pattern. The pattern
information is recorded on a layer of photoresist which is applied on the top of the wafer. The
photoresist changes its physical properties when exposed to light (often ultraviolet) or another
source of illumination (e.g. X-ray). The photoresist is either developed by (wet or dry)
etching or by conversion to volatile compounds through the exposure itself. The pattern
defined by the mask is either removed or retained after development, depending on whether the resist is positive or negative. For example, the developed photoresist can act as an etching
mask for the underlying layers.
b) Etching
Etching is used to remove material selectively in order to create patterns. The pattern is
defined by the etching mask, because the parts of the material, which should remain, are
protected by the mask. The unmasked material can be removed either by wet (chemical) or
dry (physical) etching. Wet etching is strongly isotropic, which limits its application, and the etch time is difficult to control. Because of the so-called under-etch effect, wet etching is not suited to transferring patterns with sub-micron feature sizes. However, wet etching has high selectivity (the etch rate depends strongly on the material) and it does not damage the material. On the other hand, dry etching is highly anisotropic but less selective, and it is better suited to transferring small structures.
c) Deposition and planarization
Processes like etching, deposition, or oxidation, which modify the topography of the wafer surface, lead to a non-planar surface. Chemical mechanical planarization (CMP) is used to planarize the wafer surface with the help of a chemical slurry. First, a planar surface is necessary for lithography to ensure correct pattern transfer. Furthermore, CMP enables indirect patterning,
because the material removal always starts on the highest areas of the wafer surface. This
means that in defined lower-lying regions, such as trenches, material can be left behind. Together with
the deposition of non-planar layers, CMP is an effective method to build up IC structures.
e) Oxidation
Oxidation is a process which converts silicon on the wafer into silicon dioxide. The chemical
reaction of silicon and oxygen already starts at room temperature but stops after a very thin native oxide film has formed. For an effective oxidation rate the wafer must be placed in a furnace with oxygen or water vapor at elevated temperatures. Silicon dioxide layers are used as high-quality insulators or as masks for ion implantation. The ability of silicon to form high-quality silicon dioxide is an important reason why silicon is still the dominant material in IC fabrication.
OXIDATION TECHNIQUES
1.Cleaned wafers are placed in the wafer load station where dry nitrogen (N2) is introduced
into the chamber. The nitrogen prevents oxidation from occurring while the furnace reaches
the required temperature.
2. Once the specified temperature in the chamber is reached, the nitrogen gas flow is shut off and oxygen (O2) is added to the chamber. The source of the oxygen can be gas or water vapor, depending on whether a dry or a wet process is used.
•After the oxidation is complete and the oxide layer is the correct thickness, nitrogen is
reintroduced into the chamber to prevent further oxidation from occurring.
•The wafers are then removed from the chamber. After inspection, they are ready for further
processing.
•Thermal oxidation can be either a dry or a wet process
f) Ion Implantation
Ion implantation is the dominant technique to introduce dopant impurities into crystalline
silicon. This is performed with an electric field which accelerates the ionized atoms or
molecules so that these particles penetrate into the target material until they come to rest
because of interactions with the silicon atoms. Ion implantation is able to control exactly the
distribution and dose of the dopants in silicon, because the penetration depth depends on the
kinetic energy of the ions which is proportional to the electric field. The dopant dose can be
controlled by varying the ion source. Unfortunately, after ion implantation the crystal structure is damaged, which degrades the electrical properties. Another problem is that the implanted dopants are initially electrically inactive, because they are situated on interstitial sites.
Therefore after ion implantation a thermal process step is necessary which repairs the crystal
damage and activates the dopants.
g) Diffusion
Diffusion is the other technique for introducing dopants: the wafer is exposed to a dopant source at a high temperature, and the dopant atoms migrate from the region of high concentration at the surface into the bulk silicon.
We have two types of FETs: enhancement mode and depletion mode transistors. We also have pMOS and nMOS transistors.
In an enhancement mode transistor, the channel forms only after a suitable gate voltage is applied (positive for nMOS). We have nMOS and pMOS enhancement transistors.
In a depletion mode transistor, the channel is already present because of an implant. It can be removed by applying a suitable gate voltage of the opposite polarity (negative for nMOS). We have nMOS and pMOS depletion mode transistors.
N-MOS enhancement mode transistor:-
This transistor is normally OFF. It can be turned ON by applying a positive gate voltage: the positive voltage forms a channel of electrons between source and drain.
N-MOS depletion mode transistor:-
This transistor is normally ON, even with Vgs = 0. The channel is implanted during fabrication, hence it is normally ON. To cause the channel to cease to exist, a negative voltage must be applied between gate and source.
NMOS Fabrication:
The process starts with the oxidation of the silicon substrate (Fig. 9(a)), in which a relatively
thick silicon dioxide layer, also called field oxide, is created on the surface (Fig. 9(b)). Then,
the field oxide is selectively etched to expose the silicon surface on which the MOS transistor
will be created (Fig. 9(c)).
Following this step, the surface is covered with a thin, high-quality oxide layer, which will
eventually form the gate oxide of the MOS transistor (Fig. 9(d)). On top of the thin
oxide, a layer of polysilicon (polycrystalline silicon) is deposited (Fig. 9(e)). Polysilicon is
used both as gate electrode material for MOS transistors and also as an interconnect medium
in silicon integrated circuits. Undoped polysilicon has relatively high resistivity. The
resistivity of polysilicon can be reduced, however, by doping it with impurity atoms. After
deposition, the polysilicon layer is patterned and etched to form the interconnects and
the MOS transistor gates (Fig. 9(f)). The thin gate oxide not covered by polysilicon is
also etched away, which exposes the bare silicon surface on which the source and drain
junctions are to be formed (Fig. 9(g)).
The entire silicon surface is then doped with a high concentration of impurities, either
through diffusion or ion implantation (in this case with donor atoms to produce n-type
doping).
Figure 9(h) shows that the doping penetrates the exposed areas on the silicon surface,
ultimately creating two n-type regions (source and drain junctions) in the p-type
substrate. The impurity doping also penetrates the polysilicon on the surface, reducing
its resistivity.
Note that the polysilicon gate, which is patterned before doping actually defines the
precise location of the channel region and, hence, the location of the source and the drain
regions.
Since this procedure allows very precise positioning of the two regions relative to the gate, it is also called the self-aligned process. Once the source and drain regions are
completed, the entire surface is again covered with an insulating layer of silicon dioxide (Fig. 9(i)).
The insulating oxide layer is then patterned in order to provide contact windows for the drain and source junctions (Fig. 9(j)).
The surface is covered with evaporated aluminum, which will form the interconnects
(Fig. 9 (k)).
Finally, the metal layer is patterned and etched, completing the interconnection of the
MOS transistors on the surface (Fig. 9 (l)).
Usually, a second (and third) layer of metallic interconnect can also be added on top of this
structure by creating another insulating oxide layer, cutting contact (via) holes, depositing,
and patterning the metal.
CMOS fabrication: When we need to fabricate both nMOS and pMOS transistors on
the same substrate we need to follow different processes. The three different processes
are,P-well process ,N-well process and Twin tub process.
P-WELL PROCESS:
The p-well process starts with an n-type substrate. The n-type substrate can be used to implement the pMOS transistors, but to implement the nMOS transistors we need to provide a p-well; hence we have provided a place for both nMOS and pMOS transistors on the same n-type substrate.
Mask sequence.
Mask 1 defines the areas in which the deep p-well diffusion takes place.
Mask 2: It defines the thin oxide region (where the thick oxide is to be removed or
stripped and thin oxide grown)
Mask 3: It’s used to pattern the polysilicon layer which is deposited after thin oxide.
Mask 4: A p+ mask (ANDed with mask 2) to define areas where p-diffusion is to take place.
Mask 5: We use the negative form of mask 4 (the p+ mask); it defines where n-diffusion is to take place.
Mask 6: Contact cuts are defined using this mask.
Mask 7: The metal layer pattern is defined by this mask.
Mask 8: An overall passivation (overglass) is now applied and it also defines openings for
accessing pads.
Twin-tub process:
Here we use both the p-well and the n-well approach. The starting point is an n-type material, and then we create both n-well and p-well regions. To create both wells we first go for the epitaxial process and then form both wells on the same substrate.
Bi-CMOS technology: - (Bipolar CMOS)
The driving capability of MOS transistors is low because of the limited current sourcing and sinking capabilities of the transistors. To drive large capacitive loads we can turn to BiCMOS technology.
This technology combines Bipolar and CMOS transistors in a single integrated circuit,
by retaining benefits of bipolar and CMOS, BiCMOS is able to achieve VLSI circuits
with speed-power-density performance previously unattainable with either technology
individually.
The diagram given below shows the cross section of the BiCMOS process which
uses an npn transistor.
BASIC ELECTRICAL PROPERITIES:
1. Linear Region:
KVL:
VG - VS = (VG - VC) + (VC - VS)
VG - VS = VGS
VG - VC = VGC
VC - VS = V(x)
VGS = VGC + V(x), or VGC = VGS - V(x)
The total charge density at position x on the gate capacitor (of capacitance Cox per unit area) is
QT(x) = Cox [VGS - V(x) - VTH]
2. Saturation Region
When VDS ≥ (VGS - VTH) the channel pinches off. This means that the channel current near the drain spreads out and the channel near the drain can be approximated as a depletion region. After this occurs, at VDS = (VGS - VTH), if you make VDS larger, the current ID does not change (to zero-order approximation). This is because any additional VDS you add is dropped across the depletion region and does not change the current ID.
So for VDS ≥ (VGS - VTH) we find ID by setting VDS = (VGS - VTH) in the linear equation, which gives the saturation current ID = (β/2)(VGS - VTH)², where β = μ Cox (W/L).
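For illustration, here is a minimal Python sketch of the resulting square-law model; the threshold voltage, process gain and W/L values are assumed examples, not values from the text.

```python
# Simple square-law (long-channel) NMOS drain-current model, covering the
# cutoff, linear and saturation regions. Parameter values are assumed examples.

def nmos_id(vgs, vds, vth=0.7, k=100e-6, w_over_l=2.0):
    """Drain current in amperes; k stands for the process gain u_n*Cox."""
    if vgs <= vth:                        # cutoff (ideal model: ID = 0)
        return 0.0
    vov = vgs - vth                       # overdrive voltage VGS - VTH
    if vds < vov:                         # linear (triode) region
        return k * w_over_l * (vov * vds - 0.5 * vds ** 2)
    return 0.5 * k * w_over_l * vov ** 2  # saturation: ID at VDS = VGS - VTH

print(nmos_id(vgs=1.8, vds=0.1))          # linear region
print(nmos_id(vgs=1.8, vds=1.8))          # saturation region
```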
Transconductance:
Transconductance is a property of certain electronic components. Conductance is the reciprocal of resistance; transconductance is the ratio of the current change at the output port to the voltage change at the input port. It is written as gm. For direct current, transconductance is defined as gm = ΔIout / ΔVin; for a MOSFET it is gm = ∂ID/∂VGS at constant VDS.
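A small sketch, again with assumed example values, showing that the numerical derivative of the saturation current agrees with the analytic gm = k (W/L)(VGS - VTH):

```python
# Transconductance gm = dID/dVGS of an NMOS device in saturation, estimated
# numerically from the square-law model and compared with the analytic result.
# All parameter values are assumed examples.

def id_sat(vgs, vth=0.7, k=100e-6, w_over_l=2.0):
    vov = max(vgs - vth, 0.0)
    return 0.5 * k * w_over_l * vov ** 2

def gm_numeric(vgs, dv=1e-6, **params):
    return (id_sat(vgs + dv, **params) - id_sat(vgs, **params)) / dv

vgs, vth, k, w_over_l = 1.8, 0.7, 100e-6, 2.0
print(gm_numeric(vgs))                 # numerical estimate of gm
print(k * w_over_l * (vgs - vth))      # analytic gm in saturation
```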
Introduction:
NMOS (n-type MOS transistor):
The three MOS operating regions are: the cutoff or subthreshold region, the linear region and the saturation region.
If Vsb is zero, then Vt = Vt(0), which means the threshold voltage is not changed by the body effect. Therefore, we short-circuit the source and substrate so that Vsb is zero.
Sub-threshold region:
For Vgs < Vt we still get some value of drain current; this is called the subthreshold current, and the region is called the subthreshold region.
Channel length modulation:
The channel length of the MOSFET changes with the drain-to-source voltage. This effect is called channel length modulation. Taking channel length modulation into account, the effective channel length is Leff = L - ΔL and the drain current is given by ID = (β/2)(VGS - VTH)²(1 + λVDS).
Mobility: Mobility is defined as the ease with which charge carriers drift in the substrate material.
Mobility decreases with increasing doping concentration and increasing temperature. Mobility is the ratio of the average carrier drift velocity to the electric field, and is represented by the symbol μ.
Drain punch-through:
When the drain is at a high voltage, the depletion region around the drain may extend to the source, causing current to flow even if the gate voltage is zero. This is known as the punch-through condition.
CMOS INVERTER CHARACTERISTICS
CMOS inverters (complementary MOSFET inverters) are some of the most widely used and adaptable MOSFET inverters used in chip design. They operate with very little power loss and at relatively high speed. Furthermore, the CMOS inverter has good logic buffer characteristics, in that its noise margins in both the low and high states are large.
A CMOS inverter contains a pMOS and an nMOS transistor connected at the drain and gate terminals, a supply voltage VDD at the pMOS source terminal, and ground connected at the nMOS source terminal, where VIN is connected to the gate terminals and VOUT is connected to the drain terminals (as given in the diagram). It is important to notice that the CMOS inverter does not contain any resistors, which makes it more power-efficient than a regular resistor-MOSFET inverter.
As the voltage at the input of the CMOS device varies between 0 and VDD, the state of the nMOS and pMOS varies accordingly. If we model each transistor as a simple switch activated by VIN, the inverter's operation can be seen very easily:
The table given explains when each transistor turns on and off. When VIN is low, the nMOS is "off" while the pMOS stays "on", instantly charging VOUT to logic high. When VIN is high, the nMOS is "on" and the pMOS is "off", taking the voltage at VOUT to logic low.
Inverter DC Characteristics:
Before we study the DC characteristics of the inverter we should examine the ideal characteristics of an inverter, which are shown below. The characteristic shows that when the input is zero the output is high, and vice versa.
The actual characteristic is also given here for reference. Here we have shown the status of both the NMOS and PMOS transistors in all regions of the characteristic.
Graphical Derivation of Inverter DC Characteristics:
The actual characteristics are drawn by plotting the values of the output voltage for different values of the input voltage. We can also draw the characteristics starting with the V-I characteristics of the PMOS and NMOS devices.
Figure shows five regions, namely regions A, B, C, D and E. We have also shown a dotted curve, which is the current drawn by the inverter.
Region A:
The output in this region is high because the p-device is ON and the n-device is OFF. In region A, the nMOS is in the cutoff region and the pMOS is on, therefore the output is logic high. We can analyze the inverter when it is in region B; the analysis is given below.
Region B:
The equivalent circuit of the inverter when it is in region B is given below.
In this region the pMOS is in the linear region and the nMOS is in the saturation region. The expression for the nMOS current is
Region C:
The equivalent circuit of the CMOS inverter when it is in region C is given here. Both the n and p transistors are in the saturation region; we can equate the two saturation currents and obtain the expression for the midpoint voltage, or switching point voltage, of the inverter.
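Since the equations themselves are not reproduced here, the following Python sketch (with assumed example values) solves the equal-current condition (βn/2)(VM - Vtn)² = (βp/2)(VDD - VM - |Vtp|)² for the switching voltage:

```python
import math

# Switching (midpoint) voltage of a CMOS inverter, obtained by equating the
# nMOS and pMOS saturation currents (region C, both devices in saturation).
# Supply, threshold and beta values are assumed examples.

def inverter_vm(vdd, vtn, vtp_mag, beta_n, beta_p):
    r = math.sqrt(beta_p / beta_n)
    return (vtn + r * (vdd - vtp_mag)) / (1.0 + r)

print(inverter_vm(vdd=5.0, vtn=1.0, vtp_mag=1.0, beta_n=20e-6, beta_p=20e-6))  # = VDD/2
print(inverter_vm(vdd=5.0, vtn=1.0, vtp_mag=1.0, beta_n=20e-6, beta_p=8e-6))   # VM falls below VDD/2
```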
UNIT II
The VLSI design cycle starts with a formal specification of a VLSI chip, follows a
series of steps, and eventually produces a packaged chip. A typical design cycle may be
represented by the flow chart shown in Figure. Our emphasis is on the physical design step of
the VLSI design cycle. However, to gain a global perspective, we briefly outline all the steps
of the VLSI design cycle.
System specification
Architectural design
Functional design
Logic design
Circuit design
Physical design
Fabrication
1.System Specification:
The first step of any design process is to lay down the specifications of the system.
System specification is a high level representation of the system. The factors to be considered
in this process include: performance, functionality, and physical dimensions (size of the die
(chip)). The fabrication technology and design techniques are also considered.
2. Architectural Design:
The basic architecture of the system is designed in this step. This includes such
decisions as RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction
Set Computer), number of ALUs, Floating Point units, number and structure of pipelines, and
size of caches among others.
The outcome of architectural design is a Micro-Architectural Specification (MAS).
While MAS is a textual (English like) description, architects can accurately predict the
performance, power and die size of the design based on such a description.
3. Functional Design:
In this step, the main functional units of the system are identified. This also identifies the
interconnect requirements between the units. The area, power, and other parameters of each
unit are estimated.
The behavioral aspects of the system are considered without implementation specific
information. For example, it may specify that a multiplication is required, but exactly in
which mode such multiplication may be executed is not specified. We may use a variety of
multiplication hardware depending on the speed and word size requirements. The key idea is
to specify behavior, in terms of input, output and timing of each unit, without specifying its
internal structure.
The outcome of functional design is usually a timing diagram or other relationships
between units. This information leads to improvement of the overall design process and
reduction of the complexity of subsequent phases. Functional or behavioral design provides
quick emulation of the system and allows fast debugging of the full system. Behavioral
design is largely a manual step with little or no automation help available.
4. Logic Design:
In this step the control flow, word widths, register allocation, arithmetic operations,
and logic operations of the design that represent the functional design are derived and tested.
This description is called Register Transfer Level (RTL) description. RTL is
expressed in a Hardware Description Language (HDL), such as VHDL or Verilog. This
description can be used in simulation and verification. This description consists of Boolean
expressions and timing information. The Boolean expressions are minimized to achieve the
smallest logic design which conforms to the functional design. This logic design of the
system is simulated and tested to verify its correctness. In some special cases, logic design
can be automated using high level synthesis tools. These tools produce a RTL description
from a behavioral description of the design.
5. Circuit Design:
The purpose of circuit design is to develop a circuit representation based on the logic
design. The Boolean expressions are converted into a circuit representation by taking into
consideration the speed and power requirements of the original design. Circuit simulation is
used to verify the correctness and timing of each component.
The circuit design is usually expressed in a detailed circuit diagram. This diagram
shows the circuit elements (cells, macros, gates, transistors) and interconnection between
these elements. This representation is also called a netlist. Tools used to manually enter such
description are called schematic capture tools. In many cases, a netlist can be created
automatically from logic (RTL) description by using logic synthesis tools.
6. Physical Design:
In this step the circuit representation (or netlist) is converted into a geometric
representation. As stated earlier, this geometric representation of a circuit is called
a layout.Layout is created by converting each logic component (cells, macros, gates,
transistors) into a geometric representation (specific shapes in multiple layers), which
perform the intended logic function of the corresponding component. Connections between
different components are also expressed as geometric patterns typically lines in multiple
layers.
The exact details of the layout also depend on design rules, which are guidelines
based on the limitations of the fabrication process and the electrical properties of the
fabrication materials. Physical design is a very complex process and therefore it is usually
broken down into various sub-steps. Various verification and validation checks are performed
on the layout during physical design.
In many cases, physical design can be completely or partially automated and layout
can be generated directly from netlist by Layout Synthesis tools. Layout synthesis tools, while
fast, do have an area and performance penalty, which limit their use to some designs. Manual
layout, while slow and manually intensive, does have better area and performance as
compared to synthesized layout. However this advantage may dissipate as larger and larger
designs may undermine human capability to comprehend and obtain globally optimized
solutions.
7. Fabrication:
After layout and verification, the design is ready for fabrication. Since layout data is
typically sent to fabrication on a tape, the event of release of data is called Tape Out.Layout
data is converted (or fractured) into photo-lithographic masks, one for each layer. Masks
identify spaces on the wafer, where certain materials need to be deposited, diffused or even
removed. Silicon crystals are grown and sliced to produce wafers. Extremely small
dimensions of VLSI devices require that the wafers be polished to near perfection. The
fabrication process consists of several steps involving deposition, and diffusion of various
materials on the wafer. During each step one mask is used. Several dozen masks may be used
to complete the fabrication process.
A large wafer is 20 cm (8 inches) in diameter and can be used to produce hundreds of chips, depending on the size of the chip. Before the chip is mass-produced, a prototype is made and tested. Industry is rapidly moving towards 30 cm (12 inch) wafers, allowing even more chips per wafer and leading to lower cost per chip.
The layers available for layout in an nMOS or CMOS process include:
N-diffusion
P-diffusion
Polysilicon
Metal
These layers are isolated from one another by thick or thin silicon dioxide insulating layers. The thin oxide mask region includes n-diffusion, p-diffusion and transistor channels.
Stick Diagrams:
Stick diagrams may be used to convey layer information through the use of a color code. For example:
n-diffusion -- green
poly -- red
metal -- blue
implant -- yellow
contact areas -- black
Encodings for nMOS Process:
Figure shows the way of representing different layers in stick diagram notation and
mask layout using nmos style.
Figure 1 shows how an n-transistor is formed: a transistor is formed when a green line (n+ diffusion) crosses a red line (poly) completely. The figure also shows how a depletion mode transistor is represented in the stick format.
Encodings for CMOS process:
Figure 2 shows how an n-transistor is formed: a transistor is formed when a green line (n+ diffusion) crosses a red line (poly) completely.
Figure 2 also shows how a p-transistor is formed: a transistor is formed when a yellow line (p+ diffusion) crosses a red line (poly) completely.
Encoding for BJT and MOSFETs:
An nMOS structure consists of:
a p-type substrate
paths of n-type diffusion
a thin layer of silicon dioxide
paths of polycrystalline silicon
a thick layer of silicon dioxide
paths of metal (usually aluminium)
a further thick layer of silicon dioxide, with contact cuts through the silicon dioxide where connections are required.
The three layers carrying paths can be considered as independent conductors that
only interact where polysilicon crosses diffusion to form a transistor. These tracks can be drawn as stick diagrams with
diffusion in green
polysilicon in red
metal in blue
using black to indicate contacts between layers and yellow to mark regions of implant in the
channels of depletion mode transistors. With CMOS there are two types of diffusion: n-type
is drawn in green and p-type in brown.
These are on the same layers in the chip and must not meet. In fact, the method of fabrication requires that they be kept relatively far apart. Modern CMOS processes usually support more than one layer of metal: two are common and three or more are often available. Actually, these conventions for colors are not universal; in particular, industrial (rather than academic) systems tend to use red for diffusion and green for polysilicon. Moreover, a shortage of colored pens normally means that both types of diffusion in CMOS are colored green, with the polarity indicated by drawing a circle around the p-type transistor or inferred from the context. Colorings for multiple layers of metal are even less standard.
There are three ways that nMOS inverter might be drawn:
Figure 4 shows schematic, stick diagram and corresponding layout of nMOS depletion
load inverter
Figure 5 shows schematic, stick diagram and corresponding layout of CMOS inverter
Figure 7 shows the stick diagram nMOS implementation of the function f=[(xy)+z]’
Figure 8 shows the stick diagrams of CMOS NOR and NAND gates, where we can see that the p-diffusion line never touches the n-diffusion directly; they are always joined using a blue (metal) line.
Design Rules:
Design rules include width rules and spacing rules. Mead and Conway developed a
set of simplified scalable λ -based design rules, which are valid for a range of fabrication
technologies. In these rules, the minimum feature size of a technology is characterized as 2λ .
All width and spacing rules are specified in terms of the parameter λ. Suppose we have design rules that call for a minimum width of 2λ and a minimum spacing of 3λ. If we select a 2 µm technology (i.e., λ = 1 µm), the above rules translate to a minimum width of 2 µm and a minimum spacing of 3 µm. On the other hand, if a 1 µm technology (i.e., λ = 0.5 µm) is selected, then the same width and spacing rules become 1 µm and 1.5 µm, respectively.
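The translation from λ-based rules to absolute dimensions is a simple multiplication, as this short Python sketch of the 2 µm and 1 µm examples shows:

```python
# Translating lambda-based width and spacing rules into microns for a chosen
# technology, following the 2 um and 1 um examples in the text.

RULES_IN_LAMBDA = {"min_width": 2, "min_spacing": 3}

def rules_in_microns(lambda_um):
    return {name: value * lambda_um for name, value in RULES_IN_LAMBDA.items()}

print(rules_in_microns(1.0))   # 2 um technology (lambda = 1 um): 2 um width, 3 um spacing
print(rules_in_microns(0.5))   # 1 um technology (lambda = 0.5 um): 1 um width, 1.5 um spacing
```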
Figure 10 shows the design rules for n-diffusion, p-diffusion, poly, metal1 and metal2. The n and p diffusion lines have a minimum width of 2λ and a minimum spacing of 3λ. Similar rules are shown for the other layers.
Figure shows the design rule for the transistor, and it also shows that the poly should extend for a minimum of 2λ beyond the diffusion boundaries (the gate overhang distance).
Via:
It is used to connect higher-level metal layers to the metal1 connection. The cross section and layout view given in Figure 13 explain the via in a better way.
The above figure shows the design rules for contact cuts and vias. The design rule for a contact is a minimum of 2λ x 2λ, and the same is applicable for a via.
Buried contact: The contact cut is made down to each layer to be joined, as shown in Figure 14.
Butting contact: The layers are butted together in such a way that the two contact cuts become contiguous. We can better understand the butting contact from Figure 15.
CMOS LAMBDA BASED DESIGN RULES:
Till now we have studied the design rules with respect to nMOS only; the rules to be followed if we have both p and n transistors on the same chip are made clear with the diagram. Figure 16 shows the rules to be followed in CMOS well processes to accommodate both n and p transistors.
In one way, lambda-based design rules are better than micron-based design rules, in that lambda-based rules are feature-size independent.
Figure 17 shows the design rule for BiCMOS process using orbit 2um process.
The following is the example stick and layout for 2way selector with enable (2:1 MUX).
Layout diagram of cmos inverter:
In the following, the mask layout design of a CMOS inverter will be examined step-
by-step. The circuit consists of one nMOS and one pMOS transistor, therefore, one would
assume that the layout topology is relatively simple. Yet, we will see that there exist quite a
number of different design possibilities even for this very simple circuit.
First, we need to create the individual transistors according to the design rules.
Assume that we attempt to design the inverter with minimum-size transistors. The width of
the active area is then determined by the minimum diffusion contact size (which is necessary
for source and drain connections) and the minimum separation from diffusion contact to both
active area edges. The width of the polysilicon line over the active area (which is the gate of
the transistor) is typically taken as the minimum poly width . Then, the overall length of the
active area is simply determined by the following sum: (minimum poly width) + 2 x
(minimum poly-to- contact spacing) + 2 x (minimum spacing from contact to active area
edge). The pMOS transistor must be placed in an n-well region, and the minimum size of the
n- well is dictated by the pMOS active area and the minimum n-well overlap over n+. The
distance between the nMOS and the pMOS transistor is determined by the minimum
separation between the n+ active area and the n-well. The polysilicon gates of the nMOS and
the pMOS transistors are usually aligned. The final step in the mask layout is the local
interconnections in metal, for the output node and for the VDD and GND contacts. Notice
that in order to be biased properly, the n-well region must also have a VDD contact.
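The dimension sums described above can be turned into numbers with a short sketch; the rule names and λ values used here are assumed placeholders, not the rules of any particular process.

```python
# Rough minimum-size active-area dimensions from the design-rule sums described
# above. Rule values (in lambda) are assumed placeholders.

rules = {
    "min_poly_width":              2,
    "min_contact_size":            2,
    "min_poly_to_contact_spacing": 2,
    "min_contact_to_active_edge":  1,
}

def active_area_length(r):
    # (min poly width) + 2*(poly-to-contact spacing) + 2*(contact-to-edge spacing)
    return (r["min_poly_width"]
            + 2 * r["min_poly_to_contact_spacing"]
            + 2 * r["min_contact_to_active_edge"])

def active_area_width(r):
    # (min diffusion contact size) + 2*(contact-to-active-edge separation)
    return r["min_contact_size"] + 2 * r["min_contact_to_active_edge"]

print(active_area_length(rules), "lambda x", active_area_width(rules), "lambda")
```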
The mask layout designs of CMOS NAND and NOR gates follow the general
principles examined earlier for the CMOS inverter layout. Figure 18 shows the sample
layouts of a two- input NOR gate and a two-input NAND gate, using single-layer polysilicon
and single-layer metal. Here, the p-type diffusion area for the pMOS transistors and the n-
type diffusion area for the nMOS transistors are aligned in parallel to allow simple routing of
the gate signals with two parallel polysilicon lines running vertically. Also notice that the two
mask layouts show a very strong symmetry, due to the fact that the NAND and the NOR gate
have a symmetrical circuit topology. Finally, Figs. 19 and 20 show the major steps of the
mask layout design for both gates, starting from the stick diagram and progressively defining
the mask layers.
Figure 18: Sample layouts of a CMOS NOR2 gate and a CMOS NAND2 gate.
Figure 19: Major steps required for generating the mask layout of a CMOS NOR2
gate.
Figure 20: Major steps required for generating the mask layout of a CMOS
NAND2 gate.
Scaling of MOS Circuits:
1.What is Scaling?
2. Why Scaling?
Scale the devices and wires down. Make the chips 'fatter' – more functionality, intelligence and memory – and faster. Make more chips per wafer – increased yield. Make the end user happy by giving more for less and, therefore, make MORE MONEY!!
Many of the figures of merit (FoMs) can be improved by shrinking the dimensions of transistors and interconnections, shrinking the separation between features – transistors and wires – and adjusting doping levels and supply voltages.
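These improvements are usually summarized by first-order scaling models. The sketch below lists the standard constant-field (full) scaling factors when all dimensions and the supply voltage are divided by α; it is a generic summary, not taken from this text.

```python
# First-order constant-field (full) scaling by a factor alpha: all device
# dimensions and the supply voltage are divided by alpha.

def constant_field_scaling(alpha):
    return {
        "dimensions (L, W, tox)": 1 / alpha,
        "supply voltage VDD":     1 / alpha,
        "drain current":          1 / alpha,
        "gate capacitance":       1 / alpha,
        "gate delay":             1 / alpha,
        "power per gate":         1 / alpha ** 2,
        "power density":          1.0,            # unchanged to first order
    }

for name, factor in constant_field_scaling(2.0).items():
    print(f"{name:24s} x{factor:g}")
```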
Technology Scaling :
Improved Performance
Improved Cost
Interconnect Woes
Power Woes
Productivity Challenges
Physical Limits
Physical Limits :
8. Limitations of Scaling
The effects listed below result from scaling down and eventually become severe enough to prevent further miniaturization:
Substrate doping
Depletion width
Limits of miniaturization
INTRODUCTION :
The module (integrated circuit) is implemented in terms of logic gates and
interconnections between these gates. Designer should know the gate-level diagram of the
design. In general, gate-level modeling is used for implementing lowest level modules in a
design like, full-adder, multiplexers, etc.
For example, given the expression a+b, we can compute its truth value for any given values of a and b, and we can also evaluate relationships such as a+b = c. But logic design is difficult for many reasons:
• We may not have a logic gate for every possible function, or even for every function of n
inputs.
• Not all gate networks that compute a given function are alike-networks may differ greatly
in their area and speed.
• Thus combinational logic expressions are the specification,
Logic gate networks are the implementation,
Area, delay, and power are the costs.
• A logic gate is an idealized or physical device implementing a Boolean function, that is,
it performs a logical operation on one or more logic inputs and produces a single logic
output.
• Logic gates are primarily implemented using diodes or transistors acting as electronic
switches, but can also be constructed using electromagnetic relays (relay logic), fluidic
logic, pneumatic logic, optics, molecules, or even mechanical elements.
• With amplification, logic gates can be cascaded in the same way that Boolean functions
can be composed, allowing the construction of a physical model of all of Boolean logic.
• Simplest form of electronic logic is diode logic. This allows AND and OR gates to be
built, but not inverters, and so is an incomplete form of logic. Further, without some kind
of amplification it is not possible to have such basic logic operations cascaded as required
for more complex logic functions.
• To build a functionally complete logic system, relays, valves (vacuum tubes), or
transistors can be used.
• The simplest family of logic gates using bipolar transistors is called resistor-transistor
logic (RTL). Unlike diode logic gates, RTL gates can be cascaded indefinitely to produce
more complex logic functions. These gates were used in early integrated circuits. For
higher speed, the resistors used in RTL were replaced by diodes, leading to diode-
transistor logic (DTL).
• Transistor-transistor logic (TTL) then supplanted DTL with the observation that one
transistor could do the job of two diodes even more quickly, using only half the space.
• In virtually every type of contemporary chip implementation of digital systems, the bipolar transistors have been replaced by complementary field-effect transistors (MOSFETs) to reduce size and power consumption still further, thereby resulting in complementary metal–oxide–semiconductor (CMOS) logic, which can be described with Boolean logic.
Thus the output (F) is either connected to VDD or the ground, where the logic 0 is represented
by the ground and the logic 1 is represented by VDD. So the requirement of digital logic design is
to implement the pull-up switch (S1) and the pull-down switch(S2).
A generalized CMOS logic circuit consists of two transistor nets nMOS and pMOS. The
pMOS transistor net is connected between the power supply and the logic gate output and is called the pull-up network (PUN), whereas the nMOS transistor net is connected between the output and ground and is called the pull-down network (PDN). Depending on the applied input logic, the PUN connects the output node to VDD and the PDN connects the output node to ground.
The transistor network is related to the Boolean function with a straight forward design
procedure:
• Design the pull-down network (PDN) by realizing AND (product) terms using series-connected nMOSFETs and OR (sum) terms using parallel-connected nMOSFETs.
CMOS inverter:
A CMOS inverter is the simplest logic circuit that uses one nMOS and one pMOS
transistor. The nMOS is used in PDN and the pMOS is used in the PUN as shown in
figure.
Finally, join the PUN and PDN as shown in the figure, which realizes a two-input NAND gate. Note that we have realized Y', rather than Y, because the inversion is automatically provided by the nature of CMOS circuit operation.
Working operation:
A complex logic gate is one that implements a function combining the basic NOT, AND and OR operations in a single circuit. CMOS is ideally suited for creating such gates, whose logic equations exhibit the following:
An AOI logic equation is equivalent to a complemented SOP form, while an OAI equation is equivalent to a complemented POS structure. In CMOS, the output is always a NOT operation acting on the input variables.
1) AOI Logic Function (or) Design of an XOR gate using CMOS logic.
Step 3: Y = A.B + C.D. In this function, A.B and C.D are summed (ORed); for the OR operation we draw a parallel connection, so the series-connected A.B branch is placed in parallel with the series-connected C.D branch, as shown in the figure.
Step 5: Take the output at the point between the nMOS and pMOS networks.
2) OAI Logic Function (or) Design of an XNOR gate using CMOS logic.
The OR-AND-INVERT (OAI) logic function implements operations in the order OR, AND, NOT. For example, let us consider the function Y = (A+B).(C+D), i.e., the gate output is NOT((A OR B) AND (C OR D)). The OAI logic gate implementation for Y is shown in the figure.
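As a quick behavioural check of these two forms (remembering that the CMOS output is the complement of the pull-down condition), the following Python sketch tabulates the AOI and OAI outputs for all input combinations:

```python
from itertools import product

# Truth-table check of the AOI and OAI examples above. In CMOS, the gate output
# is the complement of the pull-down network's conduction condition.

def aoi(a, b, c, d):          # output = NOT(A.B + C.D)
    return int(not ((a and b) or (c and d)))

def oai(a, b, c, d):          # output = NOT((A + B).(C + D))
    return int(not ((a or b) and (c or d)))

for a, b, c, d in product((0, 1), repeat=4):
    print(a, b, c, d, " AOI:", aoi(a, b, c, d), " OAI:", oai(a, b, c, d))
```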
SWITCH LOGIC:
Switch logic is mainly based on pass transistor or transmission gate. It is fast for small
arrays and takes no static current from the supply, VDD. Hence power dissipation of such arrays
is small since current only flows on switching.
Switch (pass transistor) logic is analogous to logic arrays based on relay contacts, wherein the path through each switch is isolated from the logic levels activating the switch.
PASS TRANSISTOR
This logic uses transistors as switches to carry logic signals from node to node instead of connecting output nodes directly to VDD or ground (GND). If a single transistor is a switch between two nodes, then a voltage degradation equal to Vt (the threshold voltage) occurs for the high or low level, depending on whether nMOS or pMOS logic is used. When using nMOS switch logic, no pass transistor gate input may be driven through one or more pass transistors, as shown in the figure.
Since the signal out of pass transistor T1 does not reach a full logic 1 because of threshold voltage effects (the signal is degraded to Vt below a true logic 1), this degraded voltage would not permit the output of T2 to reach an acceptable logic 1 level.
Advantages:
2) They do not dissipate standby power, since they do not have a path from supply to ground.
Disadvantages:
1) Degradation in the voltage levels due to undesirable threshold voltage effects.
2) Never drive a pass transistor with the output of another pass transistor.
TRANSMISSION GATE:
Thus current can flow through this element in either direction. Depending on whether or not there is a voltage on the gate, the connection between the input and output is either low-resistance or high-resistance, respectively: Ron ≈ 100 Ω and Roff > 5 MΩ.
Operation:
• When the gate input to the nMOS transistor is '0' and the complementary '1' is the gate input to the pMOS, both are turned off.
• When the gate input to the nMOS is '1' and its complement '0' is the gate input to the pMOS, both are turned on and pass any signal, '1' and '0' alike, without degradation.
• The use of transmission gates eliminates the undesirable threshold voltage effects which give
rise to loss of logic levels in pass-transistors as shown in figure.
2) A transmission gate consists of two transistors in parallel, so, except near the positive and negative rails, its effective on-resistance remains fairly constant over the signal swing.
Disadvantages:
CMOS suffers from increased area, and correspondingly increased capacitance and delay, as the logic gates become more complex. For this reason, designers developed circuits (alternate gate circuits) that can be used to supplement the complementary-type circuits. These forms are not intended to replace CMOS but rather to be used in special applications for special purposes.
Pseudo nMOS logic is one type of alternate gate circuit that is used as a supplement for the
complementary MOS logic circuits. In the pseudo-nMOS logic, the pull up network (PUN) is
realized by a single pMOS transistor. The gate terminal of the pMOS transistor is connected to
the ground. It remains permanently in the ON state. Depending on the input combinations, output
goes low through the PDN. Figure shows the general building block of logic circuits that follows
pseudo nMOS logic.
Here, only the nMOS logic (Qn) is driven by the input voltage, while the gate of p-
transistor(Qp) is connected to ground or substrate and Qp acts as an active load for Qn. Except
for the load device, the pseudo-nMOS gate circuit is identical to the pull-down network(PDN) of
the complementary CMOS gate.The realization of logic circuits using pseudo-nMOS logic is as
shown in figure.
A dynamic CMOS logic uses charge storage and clocking properties of MOS transistors to
implement logic operations. Figure shows the basic building block of dynamic CMOS logic.
Here the clock ø drives nMOS evaluation transistor and pMOS precharge transistor. A logic is
implemented using an nFET array connected between output node and ground.
The gate (clock ø) defines two phases, evaluation and precharge phase during each clock cycle.
Working :
• When clock ø = 0 the circuit is in precharge phase with the pMOS device Mp ON and the
evaluation nMOS Mn OFF. This establishes a conducting path between VDD and the output
allowing Cout to charge to a voltage Vout = VDD. Mp is often called the precharge FET.
• When clock ø = 1 the circuit is in the evaluation phase, with the pMOS device Mp OFF and the evaluation nMOS Mn ON. If the logic block acts like a closed switch, Cout can discharge through the logic array and Mn, giving a final result of Vout = 0 V, logically an output of F = 0. If the logic block is open, the output ideally remains at Vout = VDD (F = 1); however, charge leakage eventually drops the output towards 0 V, which could be an incorrect logic value.
A logic block formed by three series-connected FETs (a 3-input NAND gate) is shown in the figure.
The dynamic CMOS logic circuit has a serious problem when they are cascaded. In the
precharged phase (ø = 0) , output of all the stages are pre-charged to logic high. In the evaluation
phase (ø = 1), the output of all stages are evaluated simultaneously. Suppose in the first stage, the
inputs are such that the output is logic low after the evaluation. In the second stage, the output of
the first stage is one input and there are other inputs.
If the other inputs of the second stage are such that its output discharges to logic low, then the evaluated output of the first stage can never make the output of the second stage logic high. This is because, by the time the first stage is being evaluated, the output of the second stage has already discharged, since evaluation happens simultaneously. Remember that the output cannot be charged to logic high in the evaluation phase (ø = 1, the pMOSFET in the PUN is OFF); it can only be retained at logic high, depending on the inputs.
Advantages:
Large noise margin and Low power dissipation, Small area due to less number of transistors.
c) CMOS DOMINO LOGIC
Standard CMOS logic gates need a pMOS and an nMOS transistor for each logic input. The pMOS transistors require a greater area than the nMOS transistors carrying the same current. So, a large chip area is necessary to perform complex logic operations. The packing density in CMOS is improved if a dynamic logic circuit, called the domino CMOS logic circuit, is used.
Domino CMOS logic is a slightly modified version of the dynamic CMOS logic circuit. In this case, a static inverter is connected at the output of each dynamic CMOS logic block. The addition of the inverter solves the problem of cascading dynamic CMOS logic circuits. The circuit diagram of the domino CMOS logic structure is shown in the figure as follows.
Working:
When ø = 0, Mp is ON and Mn is OFF, so that no current flows in the AND-OR paths of the AOI. The capacitor CL is charged to VDD through Mp, since the latter is ON. The input to the inverter is high, which drives the output voltage V0 to logic 0.
When ø = 1, Mp is turned OFF and Mn is turned ON. If either (or both) A and B, or C and D, are at logic 1, CL discharges through either T2, T1 and Mn or T3, T4 and Mn. So the inverter input is driven to logic 0 and hence the output voltage V0 to logic 1. The Boolean expression for the output voltage is Y = AB + CD.
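The precharge/evaluate behaviour described above can be sketched at the logic level as follows; this is only an illustrative model (the names phi, a–d and the stored-bit representation of the dynamic node are assumptions, and charge leakage and charge sharing are ignored):

```python
# Behavioural sketch of the domino AND-OR gate described above (Y = AB + CD).
# The dynamic node is modelled as a stored bit: precharged when phi = 0,
# conditionally discharged through the nMOS array when phi = 1.

def domino_ab_cd(phi, a, b, c, d, dynamic_node):
    if phi == 0:
        dynamic_node = 1                    # Mp ON: precharge CL to VDD
    elif (a and b) or (c and d):            # a conducting path to ground exists
        dynamic_node = 0                    # CL discharges through Mn
    y = int(not dynamic_node)               # static inverter at the output
    return dynamic_node, y

node = 1
node, y = domino_ab_cd(0, 1, 1, 0, 0, node)   # precharge phase: Y is forced to 0
print("precharge:", node, y)
node, y = domino_ab_cd(1, 1, 1, 0, 0, node)   # evaluate phase: A.B = 1, so Y = 1
print("evaluate: ", node, y)
```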
Note :
Logic input can change only when ø = 0. No changes of the inputs are permitted when ø
= 1 since a discharge path may occur.
Advantages:
Smaller areas compared to conventional CMOS logic.
Parasitic capacitances are smaller so that higher operating speeds are possible.
Operation is free of glitches, since each gate can make only one transition per evaluation.
Disadvantages:
Only non-inverting structures are possible, because of the presence of the inverting buffer.
Charge sharing may be a problem.
Working
When ø = 1 the circuit acts as an inverter, because transistors Q3 and Q4 are ON. It is said to be in the "evaluation mode". Therefore the output Z is evaluated and may change from its previous value.
When ø = 0 the circuit is in the hold mode, because transistors Q3 and Q4 are OFF. It is said to be in the "precharge mode". Therefore the output Z retains its previous value.
The figure shows another variation of the basic dynamic CMOS logic arrangement, called n-p CMOS logic. In this logic, the actual logic blocks are alternately 'n' and 'p' in a cascaded structure. The clocks ø and ø' are used alternately to feed the precharge and evaluate transistors, and the functions of the top and bottom transistors also alternate between precharge and evaluate.
Disadvantages:
Here, the p-tree blocks are slower than the n-tree modules, due to the lower current drive
of the pMOS transistors in the logic network.
TIME DELAYS:
Consider the basic nMOS inverter with a channel length of 8λ and width 2λ for the pull-up transistor, and a channel length of 2λ and width 2λ for the pull-down transistor. Hence the resistance of the pull-up transistor is Rp.u = 4Rs = 40 kΩ and the resistance of the pull-down transistor is Rp.d = 1Rs = 10 kΩ.
Since τ = RC depends upon the values of R and C, the delay associated with the inverter depends on whether it is being turned on or off. Now, consider a pair of cascaded inverters as shown in the figure; the delay over the pair will then be constant, irrespective of the sense of the logic-level transition of the input to the first.
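As a rough numerical illustration (the Rs and □Cg figures below are assumed, typical 5 µm-style values, not quoted from the text), the delay of such a cascaded pair can be estimated with a lumped-RC model:

```python
# Lumped-RC estimate of the delay of a pair of cascaded 4:1 nMOS inverters,
# using tau = Rs * Cg. Rs and Cg are assumed illustrative values.

R_S = 10e3        # sheet resistance per square, ohms
C_G = 0.01e-12    # standard unit of gate capacitance, farads

z_pu, z_pd = 4, 1                        # pull-up and pull-down L:W ratios
tau = R_S * C_G                          # basic time constant, ~0.1 ns
pair_delay = (1 + z_pu / z_pd) * tau     # fall (1*tau) plus rise (4*tau) = 5*tau

print(f"tau = {tau * 1e9:.2f} ns, inverter-pair delay = {pair_delay * 1e9:.2f} ns")
```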
When we consider CMOS inverters, the rules for nMOS inverters are not directly applicable; we need to account for the naturally unequal resistance values of equal-sized p-type pull-up and n-type pull-down transistors.
Figure shows the theoretical delay associated with a pair of lambda-based inverters built from both n and p transistors. Here the gate capacitance is double that of the nMOS inverter, since the input to a CMOS inverter is connected to both transistor gates.
NOTE: Here the asymmetry (unevenness) of the resistance values can be eliminated by increasing the width of the p-device channel by a factor of two or three; at the same time, the gate capacitance of the p-transistor is increased by the same factor.
In this analysis we assume that the p-device stays in saturation for the entire charging period of the load capacitor CL. Consider the circuit as follows. The saturation current of the p-device is
Idsp = (βp / 2)(VDD – |Vtp|)² ………………………(1)
This current charges CL and, since its magnitude is approximately constant, we have
Vout = (Idsp / CL) t …………………………(2)
Substituting the value of Idsp, the time taken for the output to rise to VDD (the rise time) is
τr = CL VDD / Idsp ………………………(3)
τr = 2 CL VDD / [βp (VDD – |Vtp|)²] ………………………..(4)
If Vtp = 0.2VDD, then
τr = 3 CL / (βp VDD) ………………………(5)
A similar analysis of the n-device discharging CL gives the fall time
τf = 3 CL / (βn VDD) …………………….(6)
So, the rise time is slower by a factor of 2.5 when using minimum-size devices for both n and p (since µn ≈ 2.5 µp, and hence βn ≈ 2.5 βp for equal geometries).
• In order to achieve symmetrical operation using minimum channel length we need to make
Wp = 2.5 Wn.
• For minimum-size lambda-based geometries this would result in the inverter having an input capacitance of
1 □Cg (n-device) + 2.5 □Cg (p-device) = 3.5 □Cg
From the above equations we can conclude that
1. τr and τf are proportional to 1/VDD
2. τr and τf are proportional to CL
3. τr = 2.5τf for equal n and p- transistor geometries.
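As a quick numerical check of these relationships, here is a minimal Python sketch using the τr and τf expressions quoted above; the CL, VDD and β values are assumed examples only.

```python
# Rise and fall time estimates from tau_r = 3*CL/(beta_p*VDD) and
# tau_f = 3*CL/(beta_n*VDD), valid for |Vtp| = Vtn = 0.2*VDD.
# The load and beta values are assumed examples.

def rise_fall_times(c_load, vdd, beta_p, beta_n):
    tau_r = 3 * c_load / (beta_p * vdd)
    tau_f = 3 * c_load / (beta_n * vdd)
    return tau_r, tau_f

tr, tf = rise_fall_times(c_load=0.1e-12, vdd=5.0, beta_p=10e-6, beta_n=25e-6)
print(f"rise = {tr * 1e9:.2f} ns, fall = {tf * 1e9:.2f} ns, ratio = {tr / tf:.1f}")
```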
When signals are propagated from the chip to off-chip destinations, we face the problem of driving large capacitive loads. Generally, off-chip capacitances may be several orders of magnitude higher than on-chip □Cg values:
CL ≥ 10⁴ □Cg
where CL denotes the off-chip load. Capacitances of this order must be driven through low resistances, otherwise excessively long delays will occur. A large capacitance presented at an input also slows down the rate of change of voltage at that input.
Inverters intended to drive large capacitive loads must therefore present low pull-up and pull-down resistances. For MOS circuits, low resistance values imply a low L:W ratio (since R = (L/W)Rs). Since the length L cannot be reduced below the minimum feature size, the channels must be made very wide to reduce the resistance value. Consider N cascaded inverters, each one increased in width by a factor f over the previous stage, as shown in the figure.
As the width factor increases, the capacitive load presented at the inverter input increases, and the area occupied increases also. It is observed that as the width increases, the number N of stages needed to drive a particular value of CL decreases. Thus with a large f (width factor), N decreases but the delay per stage increases, for 4:1 nMOS inverters:
Delay per stage = fτ for ∆Vin
where ∆Vin indicates a logic 0 to 1 transition of Vin (the opposite, logic 1 to 0, transition takes 4fτ for a 4:1 nMOS inverter).
N = ln(y) / ln(f), where y = CL / □Cg
It can be shown that total delay is minimized if f assumes the value of e for both CMOS
and nMOS inverters.
td = [ 2.5(N-1) + 4] eτ (nMOS)
td = [3.5(N-1) +5] eτ (CMOS) (for logical transition 1 to 0) ...…….(14)
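As an illustration of these results, the following sketch (with an assumed value of τ) picks the number of stages from N = ln(y)/ln(f) with f = e and evaluates the td expressions quoted above:

```python
import math

# Choosing the number of cascaded driver stages for a large off-chip load,
# using N = ln(y)/ln(f) with f = e, where y = CL/Cg. tau is an assumed example.

def driver_stages(y, f=math.e):
    return max(1, round(math.log(y) / math.log(f)))

y = 1e4                       # CL >= 10^4 * Cg, as in the text
n = driver_stages(y)          # ln(1e4) ~ 9.2, so about 9 stages
tau = 0.1e-9

print("stages N =", n)
print("nMOS td =", (2.5 * (n - 1) + 4) * math.e * tau, "s")
print("CMOS td =", (3.5 * (n - 1) + 5) * math.e * tau, "s")
```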
Super buffers:
Generally the pull-up and the pull-down transistors are not equally capable of driving capacitive loads. This asymmetry is avoided in super buffers. Basically, a super buffer is a symmetric inverting or non-inverting driver that can supply or remove large currents and is nearly symmetrical in its ability to drive capacitive loads. It can switch larger capacitive loads than an inverter. An inverting-type nMOS super buffer is shown in the figure.
• Consider a positive-going (0 to 1) transition at the input: Vin turns ON the inverter formed by T1 and T2.
• With a small delay, the gate of T3 is pulled down to 0 volts. Thus, device T3 is cut off.
Since gate of T4 is connected to Vin, it is turned ON and the output is pulled down very
fast.
• For the opposite transition of Vin (1 to 0), Vin drops to 0 volts. The gate of transistor T3 is
allowed to rise to VDD quickly.
• Simultaneously, the low Vin turns off T4 very quickly. This makes T3 conduct, with its gate voltage approximately equal to VDD.
• This gate voltage is twice the average voltage that would appear if the gate were connected to the source, as in the conventional nMOS inverter.
Now, as Ids ∝ Vgs, doubling the effective Vgs increases the current and thereby reduces the delay in charging the load capacitor at the output. The result is a more symmetrical transition.
Figure shows the non-inverting nMOS super buffer, where structures fabricated in 5 µm technology are capable of driving a capacitance of 2 pF with a rise time of 5 ns.
BiCMOS drivers:
1. In BiCMOS technology we use bipolar transistor drivers as the output stage of inverter and
logic gate circuits.
2. In bipolar transistors, there is an exponential dependence of the collector (output) current on
the base to emitter (input) voltage Vbe .
3. Hence, the bipolar transistors can be operated with much smaller input voltage swings than
MOS transistors and still switch large current.
4. Another consideration in bipolar devices is the temperature effect on the input voltage Vbe.
5. In bipolar transistor, Vbe is logarithmically dependent on collector current IC and also other
parameters such as base width, doping level, electron mobility.
6. Now, the temperature differences across an IC are not very high. Thus the Vbe values of the
bipolar devices spread over the chip remain same and do not differ by more than a few milli
volts.
The switching performance of a bipolar transistor driving a capacitive load can be analyzed
to begin with the help of equivalent circuit as shown in figure.
The time ∆t required to change the output voltage Vout by an amount equal to the input voltage is
∆t = CL / gm …………………………………(15)
The value of ∆t is small because the transconductance of bipolar transistors is relatively high. There are two main components of the delay due to the bipolar transistors: Tin and TL.
Fig: 25 Tin vs TL
• Tin is the time required to first charge the base emitter junction of the bipolar (npn) transistor.
This time is typically 2ns for the BiCMOS transistor base driver.
• For the CMOS driver the time required to charge the input gate capacitance is 1ns.
• TL is the time required to charge the output load capacitance.
• The combined effect of Tin and TL is represented as shown in figure.
• The delay of the BiCMOS inverter is reduced by a factor of hfe compared with a CMOS inverter driving the same load.
• In bipolar transistors, another significant parameter when considering delay is the collector resistance Rc, through which the charging current for CL flows.
• For a high value of Rc, there is a long propagation delay through the transistor when charging a capacitive load.
• Figure shows the typical delay values at two values of CL as follows.
The devices thus have high β, high gm, high hfe and low RC. The presence of such
efficient and advantageous devices on chip offers a great deal of scope and freedom to the VLSI
designer.
Propagation delays:
Propagation delay is the delay in the propagation of the signal created by the change of
logical status at the input to create same change at the output.
The current entering this node is I1 = (V1 – V2)/R and the current leaving it is I2 = (V2 – V3)/R; the difference, I1 – I2, charges the node capacitance C.
As the number of sections in the network increases, the circuit parameters become distributed. Assume that R and C are the resistance per unit length and the capacitance per unit length, respectively.
τp ∝ X² ……...………..…………(22)
Simplifying the analysis, if all the sheet resistance and gate-to-channel capacitance (Rs and □Cg) are lumped together,
Rtotal = n r Rs and Ctotal = n c □Cg …………………….(23)
where r gives the relative resistance per section in terms of Rs and c gives the relative capacitance per section in terms of □Cg. Then the overall delay for n sections is given by
τp = n² r c (τ) …………………..…………(24)
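A minimal sketch of equation (24), with assumed per-section values, showing how the delay of a long wire grows quadratically with its length:

```python
# Lumped-section estimate of a long wire's delay, tau_p = n^2 * r * c * tau.
# The per-section r, c and the base tau are assumed placeholder values.

def wire_delay(n_sections, r_rel=1.0, c_rel=1.0, tau=0.1e-9):
    return (n_sections ** 2) * r_rel * c_rel * tau

for n in (1, 2, 4, 8):
    print(n, "sections ->", wire_delay(n) * 1e9, "ns")
# Doubling the wire length quadruples the delay (tau_p grows as X^2).
```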
It can be shown that the signal delay in a section containing N identical pass transistors driving a matched load (CL = Cg) is proportional to N(N + 1)RC/2. For large values of N, the quantity (N + 1) can be replaced by N, so the delay grows as N². Since the delay increases rapidly with N, the number of cascaded pass transistors is restricted to about 4. A cascade of more pass transistors will produce a very slow circuit, and the signal needs to be restored by an inverter after every three or four pass transistors.
WIRING CAPACITANCES:
The significant sources of capacitance which contribute to the overall wiring capacitance
are as follows
(i)Fringing fields
Capacitance due to fringing field effects can be a major component of the overall
capacitance of interconnect wires. For fine line metallization, the value of fringing field
capacitance (Cff) can be of the same order as that of the area capacitance. Thus , Cff should be
taken into account if accurate prediction of performance is needed.
The fringing-field capacitance depends on the wire thickness t, the wire length, and the separation of the wire from the underlying substrate.
(ii)Interlayer capacitances:
From the definition of capacitance itself, it can be said that there exists a capacitance
between the layers due to parallel plate effects. This capacitance will depend upon the layout i.e.,
where the layers cross or whether one layer underlies another etc., by the knowledge of these
capacitances, the accuracy of circuit modeling and delay calculations will be improved. It can be
readily calculated for regular structures.
• The source and drain p-diffusion regions form junctions with the n-substrate (or n-well) at well-defined and uniform depths.
• Similarly, the source and drain n-diffusion regions form junctions with the p-substrate (or p-well) at well-defined and uniform depths.
• Hence, for diffusion regions, each diode thus formed has a peripheral (sidewall) capacitance associated with it.
• As a whole, the peripheral capacitance Cp will be of the order of pF per unit length, so its value will be greater than Carea of the diffusion region to substrate.
• Cp increases with reduction in source or drain area.
However, as the n- and p-active regions are formed by impurity implants at the surface of the silicon in the case of Orbit processes, they have negligible depth. Hence Cp is quite negligible for them.
Typical values are given in tabular form:

Diffusion capacitance      Typical value
Periphery (Cperiph)        8.0 × 10⁻⁴ pF/µm   (negligible, assuming implanted regions of negligible depth)
• An additional input to a CMOS logic gate requires an additional nMOS and pMOS pair, i.e., two additional transistors, while in the case of other MOS logic gates it requires only one additional transistor.
• In CMOS logic gates, these additional transistors increase not only the chip area but also the total effective capacitance per gate, and hence the propagation delay increases. Some of the increase in propagation delay can be compensated by the size-scaling method: by increasing the size of the device, its current-driving capability can be preserved.
• Because both the number of inputs and the device sizes increase, the capacitance increases; hence the propagation delay will still increase with fan-in.
• An increase in the number of outputs of a logic gate directly adds to its load capacitances.
Hence, the propagation delay increases with fan-out.
CHOICE OF LAYERS:
The following are the constraints which must be considered for the proper choice of layers.
1. Since the polysilicon layer has relatively high specific resistance (RS), it should not be used
for routing VDD and VSS (GND) except for small distances.
2. VDD and GND (VSS) must be distributed only on metal layers, due to the consideration of Rs
value.
3. The capacitive effects also impose certain restrictions on the choice of layers, as follows:
(i) Care is needed where fast signal lines are required, particularly for signals carried on wiring having relatively high values of RS.
(ii) The diffusion areas have higher values of capacitance to substrate and are harder to drive.
4. Over small equipotential regions, the signal on a wire can be treated as being identical at all
points.
5. Within each region, the propagation delay of the signal will be small compared with the gate delays and with the signal delays caused in a system connected by wires.
Thus the wires in a MOS system can be modeled as simple capacitors. This concept leads to the
establishment of electrical rules (guidelines) for communication paths (wires) as given in tabular
form.
Layer        Maximum length of communication wire
Silicide     2,000λ
Introduction
CMOS system design consists of partitioning the system into subsystems of the types listed
above. Many options exist that make trade-offs between speed, density, programmability, ease of design, and other variables. This chapter addresses design options for common datapath operators.
The next chapter addresses arrays, especially those used for memory. Control structures are most
commonly coded in a hardware description language and synthesized.
Datapath operators benefit from the structured design principles of hierarchy, regularity,
modularity, and locality. They may use N identical circuits to process N-bit data. Related data
operators are placed physically adjacent to each other to reduce wire length and delay. Generally,
data is arranged to flow in one direction, while control signals are introduced in a direction
orthogonal to the data flow.
Common data path operators considered in this chapter include adders, one/zero detectors,
comparators, counters, shifters, ALUs, and multipliers.
Shifters
Consider a direct MOS switch implementation of a 4×4 crossbar switch as shown in Fig. 4.1. The arrangement is quite general and may be readily expanded to accommodate n-bit inputs/outputs. In fact, this arrangement is an overkill in that any input line can be connected to any or all output lines; if all switches are closed, then all inputs are connected to all outputs in one glorious short circuit. Furthermore, 16 control signals (sw00–sw15), one for each transistor switch, must be provided to drive the crossbar switch, and such complexity is highly undesirable. An adaptation of this arrangement recognizes the fact that we can couple the switch gates together in groups of four (in this case) and also form four separate groups corresponding to shifts of zero, one, two, and three bits. The arrangement is readily adapted so that the input lines also run horizontally (to conform to the required strategy). The resulting arrangement is known as a barrel shifter, and a 4×4-bit barrel shifter circuit diagram is given in Fig. 4.2. The interbus switches have their gate inputs connected in staircase fashion in groups of four, and there are now four shift control inputs which must be mutually exclusive in the active state. CMOS transmission gates may be used in place of the simple pass transistor switches if appropriate.
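A behavioural sketch of the 4×4 barrel shifter may help; this Python model only mimics the wrap-around data movement, with the shift amount standing in for the four mutually exclusive shift-control lines.

def barrel_shift(bits, shift):
    """Rotate the input word right by 'shift' positions, as the pass-transistor
    array does when exactly one shift-control line is active."""
    n = len(bits)
    shift %= n
    return bits[shift:] + bits[:shift]   # wrap-around (barrel) connection

word = [1, 0, 1, 1]                      # 4-bit input bus
for s in range(4):                       # shifts of zero, one, two and three bits
    print(s, barrel_shift(word, s))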
Adders
Addition is one of the basic operations performed in various processing tasks such as counting, multiplication and filtering. Adders can be implemented in various forms to suit different speed and density requirements.
The truth table of a binary full adder is shown in Figure 4.3, along with some functions that will be
of use during the discussion of adders. Adder inputs: A, B
Figure 2: Barrel shifter
Generate signal: G = A·B; occurs when a CARRY is internally generated within the adder.
Propagate signal: P = A + B; when it is 1, C is passed to CARRY. In some adders A ⊕ B is used as the P term because it may be reused to generate the sum term.
Single-Bit Adders
Probably the simplest approach to designing an adder is to implement gates to yield the required
majority logic functions.
SUM = C′·(A ⊕ B) + C·(A ⊕ B)′
    = A ⊕ B ⊕ C
CARRY = A·B + A·C + B·C
      = A·B + C·(A + B)
CARRY′ = A′·B′ + C′·(A′ + B′)
The direct implementation of the above equations is shown in Fig. 4, both as a gate schematic and at the transistor level.
The full adder employs 32 transistors (6 for the inverters, 10 for the carry circuit, and 16 for the
3-input XOR). A more compact design is based on the observation that S can be factored to reuse
the CARRY term as follows:
Figure 5: Transistor implementation of 1-Bit adder
Such a design is shown at the transistor level in Figure 5 and uses only 28 transistors. Note that the pMOS network is the complement of the nMOS network.
Here Cin=C
A ripple carry adder is a digital circuit that produces the arithmetic sum of two binary numbers. It can be constructed with full adders connected in cascade, with the carry output from each full adder connected to the carry input of the next full adder in the chain. Figure 6 shows the interconnection of four full adder (FA) circuits to provide a 4-bit ripple carry adder. Notice from Figure 6 that the input is from the right side because the first cell traditionally represents the least significant bit (LSB). Bits a0 and b0 in the figure represent the least significant bits of the numbers to be added. The sum output is represented by the bits S0–S3.
The carry lookahead adder (CLA) solves the carry delay problem by calculating the carry
signals in advance, based on the input signals. It is based on the fact that a carry signal will be
generated in two cases: (1) when both bits ai and bi are 1, or
(2) when one of the two bits is 1 and the carry-in is 1. Thus, one can write
ci+1 = ai·bi + ai·ci + bi·ci
ci+1 = ai·bi + (ai + bi)·ci
The above two equations can be written in terms of two new signals Pi and Gi, which are shown in Figure 8:
ci+1 = Gi + Pi·ci, where Gi = ai·bi (generate) and Pi = ai + bi (propagate). Expanding for each stage gives
c1 = G0 + P0·c0
c2 = G1 + P1·G0 + P1·P0·c0
c3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·c0
c4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·c0
Notice that the carry-out bit, ci+1, of the last stage will be available after four delays: two gate
delays to calculate the propagate signals and two delays as a result of the gates required to
implement Equation c4.
Figure 9 shows that a 4-bit CLA is built using gates to generate the Pi and Gi and signals and a
logic block to generate the carry out signals according to Equations c1 to c4.
(a) Logic network for 4-bit CLA carry bits (b) Sum calculation using CLA
network
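The lookahead carries c1–c4 can be checked with a short behavioural sketch. The Python below is illustrative only; it uses Gi = ai·bi and Pi = ai + bi as above and expands each carry so that it depends only on the operand bits and c0.

def cla_carries(a_bits, b_bits, c0=0):
    """Return [c0, c1, ..., cn] using the expanded lookahead equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate terms Gi
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate terms Pi
    carries = [c0]
    for i in range(len(a_bits)):
        ci1 = g[i]                 # ci+1 = Gi + Pi.Gi-1 + ... + Pi...P0.c0
        chain = p[i]
        for j in range(i - 1, -1, -1):
            ci1 |= chain & g[j]
            chain &= p[j]
        carries.append(ci1 | (chain & c0))
    return carries

a = [1, 0, 1, 0]                  # 5, LSB first
b = [1, 1, 0, 0]                  # 3, LSB first
print(cla_carries(a, b))          # [0, 1, 1, 1, 0]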
transmission gate. If the carry path is precharged to VDD, the transmission gate is then reduced to a simple nMOS transistor. In the same way the pMOS transistors of the carry generation are removed. One gets a Manchester cell.
The Manchester cell is very fast, but a large set of such cascaded cells would be slow. This is due to the distributed RC effect and the body effect making the propagation time grow with the square of the number of cells. Practically, an inverter is added every four cells, as in Figure 12.
Figure 13: Cascaded Manchester carry-chain elements with buffering
Multipliers
In many digital signal processing operations - such as correlations, convolution, filtering, and frequency analysis - one needs to perform multiplication. The most basic form of multiplication consists of forming the product of two positive binary numbers. This may be accomplished through the traditional technique of successive additions and shifts, in which each addition is conditional on one of the multiplier bits. Here is an example.
The multiplication process may be viewed as consisting of the following two steps:
1. Evaluation of the partial products.
2. Accumulation (addition) of the shifted partial products.
It should be noted that binary multiplication is equivalent to a logical AND operation. Thus evaluation of partial products consists of the logical ANDing of the multiplicand and the relevant multiplier bit. Each column of partial products must then be added and, if necessary, any carry values passed to the next column.
There are a number of techniques that may be used to perform multiplication. In general, the choice is based on factors such as speed, throughput, numerical accuracy, and area. As a rule, multipliers may be classified by the format in which data words are accessed, namely:
Serial form
Serial/parallel form
Parallel form
Array Multiplication
A parallel multiplier is based on the observation that partial products in the multiplication process may be independently computed in parallel. For example, consider the unsigned binary integers X and Y.
Thus Pk are the partial product terms called summands. There are mn summands, which are
produced in parallel by a set of mn AND gates. For 4-bit numbers, the expression above may be
expanded as in the table below.
Figure 15
An n × n multiplier requires
(n−1)² full adders,
n² AND gates.
The worst-case delay associated with such a multiplier is (2n + 1)tg, where tg is the worst-case adder delay.
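As a behavioural illustration of forming the m·n summands with AND operations and accumulating them with the appropriate column shifts (which the hardware array does in parallel), here is a minimal Python sketch:

def array_multiply(x_bits, y_bits):
    """Sum the partial products x_i AND y_j, each weighted by 2**(i + j)."""
    product = 0
    for j, yj in enumerate(y_bits):          # multiplier bits
        for i, xi in enumerate(x_bits):      # multiplicand bits
            product += (xi & yj) << (i + j)  # one summand per AND gate
    return product

x = [1, 1, 0, 1]     # 11 (1011), LSB first
y = [1, 0, 1, 0]     # 5  (0101), LSB first
print(array_multiply(x, y))   # 55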
The cell shown in Figure 16 may be used to construct a parallel multiplier. The Xi term is propagated diagonally from top right to bottom left, while the Yj term is propagated horizontally. Incoming partial products enter at the top. Incoming CARRY IN values enter at the top right of the cell. The bit-wise AND is performed in the cell, and the SUM is passed to the next cell below. The CARRY OUT is passed to the bottom left of the cell.
Figure 17 depicts the multiplier array with the partial products enumerated. The multiplier can be drawn as a square array, as shown here; Figure 18 is the most convenient form for implementation. In this version the degeneration of the first two rows of the multiplier is shown. The first row of the multiplier adders has been replaced with AND gates, while the second row employs half-adders rather than full adders.
This optimization might not be done if a completely regular multiplier were required (i.e., one built from a single type of array cell). In that case the appropriate inputs to the first and second rows would be connected to ground, as shown previously. An adder with equal carry and sum propagation times is advantageous, because the worst-case multiply time depends on both paths.
If the truth table for an adder is examined, it may be seen that an adder is in effect a "ones counter" that counts the number of 1's on the A, B, and C inputs and encodes them on the SUM and CARRY outputs.
A 1-bit adder provides a 3:2 (3 inputs, 2 outputs) compression in the number of bits. The addition of partial products in a column of an array multiplier may be thought of as totaling up the number of 1's in that column, with any carry being passed to the next column to the left.
Figure 18: Most convenient way for implementation of array multiplier
Figure 19
Example of the implementation of a 4×4 (4-bit) multiplier using the Wallace Tree multiplication method
Considering the product P3, it may be seen that it requires the summation of four partial
products and a possible column carry from the summation of P2.
Consider the 6 x 6 multiplication table shown below. Considering the product P5, it may be seen
that it requires the summation of six partial products and a possible column carry from the
summation of P4. Here we can see the adders required in a multiplier based on this style of addition.
The adders have been arranged vertically into ranks that indicate the time at which the adder
output becomes available. While this small example shows the general Wallace addition technique,
it does not show the real speed advantage of a Wallace tree. There is an identifiable array part, and a
CPA part, which is at the top right. While this has been shown as a ripple-carry adder, any fast CPA
can be used here. The delay through the array addition (not including the CPA) is proportional to
log1.5(n), where n is the width of the Wallace tree.
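The column-compression idea behind Wallace addition can be sketched in software. The Python below is illustrative only: it applies 3:2 (full-adder) compression to lists of column bits until no column holds more than two bits, then uses ordinary integer addition as the final CPA; a real Wallace tree would also use half adders and a fast CPA.

def wallace_reduce(columns):
    """columns[k] holds the partial-product bits of weight 2**k."""
    cols = [list(c) for c in columns]
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            while len(col) >= 3:                       # 3:2 compression
                a, b, c = col.pop(), col.pop(), col.pop()
                nxt[k].append(a ^ b ^ c)               # sum stays in column k
                nxt[k + 1].append((a & b) | (c & (a | b)))   # carry moves left
            nxt[k].extend(col)                         # 0..2 leftover bits
        cols = nxt
    return sum(bit << k for k, col in enumerate(cols) for bit in col)

x, y = [1, 0, 1, 1], [1, 1, 0, 1]          # 13 and 11, LSB first
columns = [[] for _ in range(8)]
for i, xi in enumerate(x):
    for j, yj in enumerate(y):
        columns[i + j].append(xi & yj)     # the 16 summands of a 4x4 multiply
print(wallace_reduce(columns))             # 143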
Parity generator
1. Parity is a very useful tool in information processing in digital computers to indicate any
presence of error in bit information.
2. External noise and loss of signal strength cause loss of data bit information while transporting data from one device to another, located inside the computer or externally.
3. To indicate any occurrence of error, an extra bit is included with the message according to the
total number of 1s in a set of data, which is called parity.
4. If the extra bit is considered 0 if the total number of 1s is even and 1 for odd quantities of 1s
in a set of data, then it is called even parity.
5. On the other hand, if the extra bit is 1 for even quantities of 1s and 0 for an odd number of 1s,
then it is called odd parity.
A parity generator is a combination logic system to generate the parity bit at the transmitting
side.
Table 1.1: Truth table for generating even and odd parity bit
If the message bit combination is designated D3D2D1D0 and Pe, Po are the even and odd parity bits respectively, then it is obvious from the table that the Boolean expressions for even parity and odd parity are
Pe = D3 ⊕ D2 ⊕ D1 ⊕ D0
Po = (D3 ⊕ D2 ⊕ D1 ⊕ D0)′
The above illustration is given for a message with four bits of information.
However, the logic diagrams can be expanded with more XOR gates for any number of bits.
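A behavioural sketch of the parity expressions above (Python, illustrative only):

from functools import reduce

def even_parity(bits):
    """Pe = D3 xor D2 xor ... xor D0: 1 when the data word holds an odd
    number of 1s, so that data plus parity together hold an even number."""
    return reduce(lambda x, y: x ^ y, bits, 0)

def odd_parity(bits):
    """Po is the complement of Pe."""
    return 1 - even_parity(bits)

data = [1, 0, 1, 1]                          # D0..D3
print(even_parity(data), odd_parity(data))   # 1 0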
Zero/One detector
Figure 26: One/zero detectors (a) All one detector (b) All zero detector (c) All zero detector
transistor level representation
Detecting all ones or zeros on wide N-bit words requires large fan-in AND or NOR gates. Recall that by DeMorgan's law, AND, OR, NAND, and NOR are fundamentally the same operation except for possible inversions of the inputs and/or outputs. You can build a tree of AND gates, as shown in Figure 26(b). Here, alternate NAND and NOR gates have been used. The path has log N stages.
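A small sketch of the tree-reduction idea (Python, illustrative only; the alternation of NAND and NOR ranks is abstracted into plain ANDs):

def all_ones(bits):
    """Reduce an N-bit word with a tree of 2-input ANDs: about log2(N) ranks."""
    while len(bits) > 1:
        pairs = [a & b for a, b in zip(bits[0::2], bits[1::2])]
        if len(bits) % 2:              # an odd bit passes straight to the next rank
            pairs.append(bits[-1])
        bits = pairs
    return bits[0]

def all_zeros(bits):
    """NOR-style detection: invert the inputs and reuse the AND tree."""
    return all_ones([1 - b for b in bits])

print(all_ones([1, 1, 1, 1]), all_zeros([0, 0, 0, 0]), all_ones([1, 0, 1, 1]))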
Comparators
Another common and very useful combinational logic circuit is that of the Digital Comparator
circuit. Digital or Binary Comparators are made up from standard AND, NOR and NOT gates that
compare the digital signals present at their input terminals and produce an output depending upon
the condition of those inputs.
For example, along with being able to add and subtract binary numbers we need to be able to
compare them and determine whether the value of input A is greater than, smaller than or equal to
the value at input B etc. The digital comparator accomplishes this using several logic gates that
operate on the principles of Boolean Algebra. There are two main types of Digital Comparator
available and these are.
1. Identity Comparator: an identity comparator is a digital comparator that has only one output terminal, for when A = B, which is either "HIGH" (A = B = 1) or "LOW" (A = B = 0).
2. Magnitude Comparator: a magnitude comparator has three output terminals, one each for A > B, A = B and A < B.
The purpose of a Digital Comparator is to compare a set of variables or unknown numbers, for example A (A1, A2, A3, … An, etc.) against that of a constant or unknown value such as B (B1, B2, B3, … Bn, etc.) and produce an output condition or flag depending upon the result of the comparison. For example, a magnitude comparator of two 1-bit inputs (A and B) would produce the following three output conditions when compared to each other:
A > B, A = B, A < B
This is useful if we want to compare two variables and want to produce an output when any of
the above three conditions are achieved. For example, produce an output from a counter when a
certain count number is reached. Consider the simple 1-bit comparator below. Then the operation of
a 1-bit digital comparator is given in the following Truth Table.
Inputs Outputs
B A A>B A=B A<B
0 0 0 1 0
0 1 1 0 0
1 0 0 0 1
1 1 0 1 0
From the above table the obtained expressions for magnitude comparator using K-map are as
follows
For A < B : C = A′B
For A = B : D = A′B′ + AB
For A > B : E = AB′
The logic diagram of the 1-bit comparator using basic gates is shown.
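The truth table and the K-map expressions can be checked with a short sketch (Python, illustrative only; a complemented variable is modelled as 1 − A):

def compare_1bit(a, b):
    """A<B: C = A'.B   A=B: D = A'.B' + A.B   A>B: E = A.B'"""
    lt = (1 - a) & b
    eq = ((1 - a) & (1 - b)) | (a & b)
    gt = a & (1 - b)
    return gt, eq, lt

for b in (0, 1):
    for a in (0, 1):
        print(b, a, compare_1bit(a, b))   # matches the truth table above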
Counters
Counters can be implemented using adder/subtractor circuits and registers (or equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops because the toggle feature is naturally suited to the implementation of the counting operation. Counters are available in two categories:
1. Asynchronous (ripple) counters: The flip-flop output transition serves as a source for triggering other flip-flops, i.e., the C (clock) inputs of some or all flip-flops are triggered not by the common clock pulses but by other flip-flop outputs. E.g.: binary ripple counters, BCD ripple counters.
2. Synchronous counters: A synchronous counter has an internal clock, and the external event is used to produce a pulse which is synchronized with this internal clock. The C (clock) inputs of all flip-flops receive the common clock pulses. E.g.: binary counter, up-down binary counter, BCD binary counter, ring counter, Johnson counter.
Figure 28 shows a 3-bit counter capable of counting from 0 to 7. The clock inputs of the three flip-flops are connected in cascade. The T input of each flip-flop is connected to a constant 1, which means that the state of the flip-flop will be toggled at each active edge (here, the positive edge) of its clock. We assume that the purpose of this circuit is to count the number of pulses that occur on the primary input called Clock. Thus the clock input of the first flip-flop is connected to the Clock line. The other two flip-flops have their clock inputs driven by the Q̄ output of the preceding flip-flop. Therefore, they toggle their states whenever the preceding flip-flop changes its state from Q = 1 to Q = 0, which results in a positive edge of the Q̄ signal.
Note that the value of the count is indicated by the 3-bit binary number Q2Q1Q0. Since the second flip-flop is clocked by Q̄0, the value of Q1 changes shortly after the change of the Q0 signal. Similarly, the value of Q2 changes shortly after the change of the Q1 signal. This circuit is a modulo-8 counter. Because it counts in the upward direction, we call it an up-counter. This behavior is similar to the rippling of carries in a ripple-carry adder. The circuit is therefore called an asynchronous counter, or a ripple counter.
Some modifications of the circuit in Figure 4.29 lead to a down-counter which counts in the sequence 0, 7, 6, 5, 4, 3, 2, 1, 0, 7, and so on. The modified circuit is shown in Figure 3.
Here the clock inputs of the second and third flip-flops are driven by the Q outputs of the preceding stages, rather than by the Q̄ outputs.
Although the asynchronous counter is easier to construct, it has some major disadvantages compared with the synchronous counter.
First of all, the asynchronous counter is slow. In a synchronous counter, all the flip-flops change state simultaneously, while in an asynchronous counter the propagation delays of the flip-flops add together to produce the overall delay. Hence, the more bits or flip-flops in an asynchronous counter, the slower it will be.
Secondly, there are certain "risks" when using an asynchronous counter. In a complex system, many state changes occur on each clock edge and some ICs respond faster than others. If an external event is allowed to affect a system whenever it occurs (unsynchronized), there is a small chance that it will occur near a clock transition, after some ICs have responded but before others have. This intermingling of transitions often causes erroneous operation. What is worse, these problems are difficult to foresee and test for because of the random timing difference between the events.
Synchronous Counters
A synchronous counter usually consists of two parts: the memory element and the combinational element. The memory element is implemented using flip-flops, while the combinational element can be implemented in a number of ways. Using logic gates is the traditional method of implementing combinational logic and has been applied for decades.
Therefore, if we use T flip-flops to realize the 4-bit counter, then the T inputs should be defined as follows (a behavioural sketch follows the list):
T0 = 1
T1 = Q0
T2 = Q0Q1
T3 = Q0Q1Q2
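A behavioural sketch of these T-input equations (Python, illustrative only; each T flip-flop is modelled as a toggle on the common clock edge):

def sync_counter(cycles):
    """4-bit synchronous up-counter: T0 = 1, T1 = Q0, T2 = Q0.Q1, T3 = Q0.Q1.Q2."""
    q = [0, 0, 0, 0]                                  # Q0..Q3
    for _ in range(cycles):
        t = [1, q[0], q[0] & q[1], q[0] & q[1] & q[2]]
        q = [qi ^ ti for qi, ti in zip(q, t)]         # all stages toggle together
        print(q[3], q[2], q[1], q[0])                 # count shown as Q3 Q2 Q1 Q0
    return q

sync_counter(5)   # prints 0001, 0010, 0011, 0100, 0101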
In Figure 30, instead of using AND gates of increased size for each stage, we use a factored arrangement. This arrangement does not slow down the response of the counter, because all flip-flops change their states after one propagation delay from the positive edge of the clock.
Note that a change in the value of Q0 may have to propagate through several AND gates to reach the flip-flops in the higher stages of the counter, which requires a certain amount of time. This time must not exceed the clock period. Actually, it must be less than the clock period minus the setup time of the flip-flops. It shows that the circuit behaves as a modulo-16 up-counter. Because all changes take place with the same delay after the active edge of the Clock signal, the circuit is called a synchronous counter.
SRAM Read Operation:
1. Before the clock transition (low to high) that initiates the read operation (1), the row and column addresses must be applied to the address input pins (ADDR) (2), the chip must be selected (3), and the Write Enable must be high (4). Note that each of these signals must be present and valid a specified amount of time (the setup time) before the clock switches from low to high, and must remain valid for a specified amount of time (the hold time) after the clock switches (7). When the chip select (CS) is low, the chip is selected. When it is high (inactive), the chip cannot accept any input signals. The Write Enable is used to choose between reading and writing: when it is low, a write operation occurs; when it is high, a read operation occurs.
2. On the rising edge of the clock (CLK) (1), the address is registered and the read cycle
begins.
3. If the Output Enable is being used to control the appearance of data at the output, OE
must go low (5). OE is an asynchronous signal; it can be activated at any time. When
OE is high, the DQs are tri-stated; data from the memory will not appear on the outputs.
4. Data appears at the output pins of the SRAM (6). The time at which the data appears
depends on the access time of the device, the delay associated with the Output Enable
and the type of SRAM you are using. The access time of the SRAM is the amount of
time required to read a bit of data from the memory when all of the timing requirements
have been met.
DRAM:
Dynamic Random Access Memory (DRAM) devices are used in a wide range of
electronics applications. Although they are produced in many sizes and sold in a variety of
packages, their overall operation is essentially the same. DRAMs are designed for the sole
purpose of storing data. The only valid operations on a memory device are reading the data
stored in the device, writing (or storing) data in the device, and refreshing the data
periodically. To improve efficiency and speed, a number of methods for reading and writing
the memory have been developed. While many aspects of a synchronous DRAM are similar
to an asynchronous DRAM, synchronous operation differs because it uses a clocked interface
and multiple bank architecture.
DRAM Architecture:
DRAM chips are large, rectangular arrays of memory cells with support logic that is used for
reading and writing data in the arrays, and refresh circuitry to maintain the integrity of stored
data.
Memory Arrays:
Memory arrays are arranged in rows and columns of memory cells called wordlines and
bitlines, respectively. Each memory cell has a unique location or address defined by the
intersection of a row and a column.
Memory Cells:
Reading Data From Memory:
Figure 5 is the timing diagram of a simplified Read cycle that illustrates the following
description. To read the data from a memory cell, the cell must be selected by its row and
column coordinates, the charge on the cell must be sensed, amplified, and sent to the support
circuitry, and the data must be sent to the data output. In terms of timing, the following steps
must occur:
1. The row address must be applied to the address input pins on the memory device for the
prescribed amount of time before RAS goes low (tASR) and held (tRAH) after RAS goes
low.
4. WE must be set high for a read operation to occur prior (tRCS) to the transition of CAS,
and remain high (tRCH) after the transition of CAS.
5. CAS must switch from high to low and remain low (tCAS).
6. OE goes low within the prescribed window of time. Cycling OE is optional; it may be tied
low, if desired.
7. Data appears at the data output pins of the memory device. The time at which the data
appears depends on when RAS (tRAC), CAS (tCAC), and OE (tOEA) went low, and when
the address is supplied (tAA).
8. Before the read cycle can be considered complete, CAS and RAS must return to their
inactive states (tCRP, tRP).
Writing Data To Memory:
Figure 3 is the timing diagram of a simplified Write cycle that illustrates described below. To
write to a memory cell, the row and column address for the cell must be selected and data
must be presented at the data input pins. The chip's onboard logic either charges the memory
cell's capacitor or discharges it, depending on whether a 1 or 0 is to be stored. In terms of
timing, the following steps must occur:
1. The row address must be applied to the address input pins on the memory device for the
prescribed amount of time before RAS goes low and be held for a period of time.
3. A column address must be applied to the address input pins on the memory device for the
prescribed amount of time after RAS goes low and before CAS goes low and held for the
prescribed time
4. WE must be set low for a certain time for a write operation to occur (tWP). The timing of the transitions is determined by CAS going low (tWCS, tWCH).
5. Data must be applied to the data input pins the prescribed amount of time before CAS goes
low (tDS) and held (tDH).
7. Before the write cycle can be considered complete, CAS and RAS must return to their
inactive states. Note: There is considerable latitude within the memory chip's timings with
respect to OE when data is actually written. The memory specifications show how to set up
chip timings for early and delayed write options.
Fig: 39 Writing to DRAM Memory Cell
Read-only memory (usually known by its acronym, ROM) is a class of storage media
used in computers and other electronic devices.
Because data stored in ROM cannot be modified (at least not very quickly or easily),
it is mainly used to distribute firmware
Firmware is software that is very closely tied to specific hardware, and unlikely to
require frequent updates.
In its strictest sense, ROM refers only to mask ROM (the oldest type of solid state
ROM), which is fabricated with the desired data permanently stored in it, and thus
can never be modified.
However, more modern types such as EPROM and flash EEPROM can be erased and
re-programmed multiple times; they are still described as "read-only memory"
because the reprogramming process is generally infrequent, comparatively slow, and
often does not permit random access writes to individual memory locations.
Despite the simplicity of mask ROM, economies of scale and field-programmability
often make reprogrammable technologies more flexible and inexpensive, so that
mask ROM is rarely used in new products as of 2007.
Classic mask-programmed ROM chips are integrated circuits that physically encode the data to be stored, and thus it is impossible to change their contents after fabrication. Other types of non-volatile solid-state memory permit some degree of modification.
Types of ROM:
Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be written to or programmed via a special device called a PROM programmer. Typically, this device uses high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip. Consequently, a PROM can only be programmed once.
Erasable programmable read-only memory (EPROM) can be erased by exposure to
strong ultraviolet light (typically for 10 minutes or longer), then rewritten with a
process that again requires application of higher than usual voltage. Repeated
exposure to UV light will eventually wear out an EPROM, but the endurance of most
EPROM chips exceeds 1000 cycles of erasing and reprogramming. EPROM chip
packages can often be identified by the prominent quartz "window" which allows UV
light to enter. After programming, the window is typically covered with a label to
prevent accidental erasure.
Electrically erasable programmable read-only memory (EEPROM) is based on a
similar semiconductor structure to EPROM, but allows its entire contents (or selected
banks) to be electrically erased, then rewritten electrically, so that they need not be
removed from the computer (or camera, MP3 player, etc.). Writing or flashing an
EEPROM is much slower (milliseconds per bit) than reading from a ROM or writing
to a RAM (nanoseconds in both cases).
Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be
modified one bit at a time. Writing is a very slow process and again requires higher
voltage (usually around 12 V) than is used for read access. EAROMs are intended for
applications that require infrequent and only partial rewriting. EAROM may be used
as non-volatile storage for critical system setup information; in many applications,
EAROM has been supplanted by CMOS RAM supplied by mains power and backed-
up with a lithium battery.
Flash memory (or simply flash) is a modern type of EEPROM invented in 1984.
Flash memory can be erased and rewritten faster than ordinary EEPROM, and newer
designs feature very high endurance (exceeding 1,000,000 cycles). Modern NAND
flash makes efficient use of silicon chip area, resulting in individual ICs with a
capacity as high as 16 GB as of 2007; this feature, along with its endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications (such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used as a replacement for older ROM types, but not in applications that take advantage of its ability to be modified quickly.
SHIFT REGISTER:
QUEUES:
Queues allow data to be read and written at different rates.
Read and write each use their own clock and data signals
Queue indicates whether it is full or empty
Build with SRAM and read/write counters (pointers)
Fig: 43 Queue
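A minimal software model of such a queue (Python; the class, its depth, and the single count field standing in for the full/empty flag logic are all illustrative):

class Queue:
    """FIFO built from a small memory plus read/write pointers (counters)."""
    def __init__(self, depth):
        self.mem = [0] * depth
        self.depth = depth
        self.rd = self.wr = self.count = 0

    def empty(self):
        return self.count == 0

    def full(self):
        return self.count == self.depth

    def write(self, data):                    # driven by the write-side clock
        assert not self.full(), "queue full"
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % self.depth
        self.count += 1

    def read(self):                           # driven by the read-side clock
        assert not self.empty(), "queue empty"
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return data

q = Queue(4)
q.write(0xA)
q.write(0xB)
print(q.read(), q.read(), q.empty())          # 10 11 True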
Field-Programmable Device (FPD) — a general term that refers to any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs. Programming of such a device often involves placing the chip into a special programming unit, but some chips can also be configured “in-system”. Another name for FPDs is programmable logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the term FPD because historically the word PLD has referred to relatively simple types of devices.
[Figure: a PLD as a block of logic gates and programmable switches, with inputs (logic variables) on one side and outputs (logic functions) on the other.]
• PAL* — a Programmable Array Logic (PAL) is a relatively small FPD that has a
programmable AND-plane followed by a fixed OR-plane.
Field Programmable Gate Arrays
Description
An FPGA consists of three major modules:
1. Configurable logic blocks
2. I/O blocks
3. Routing resources
Xilinx user-programmable gate arrays include two major configurable elements: configurable
logic blocks (CLBs) and input/output blocks (IOBs).
• CLBs provide the functional elements for constructing the user’s logic.
• IOBs provide the interface between the package pins and internal signal lines.
Programmable interconnect resources provide routing paths to connect the inputs and
outputs of these configurable elements to the appropriate networks.
Configurable Logic Blocks implement most of the logic in an FPGA. Two 4-input function generators (F and G) offer unrestricted versatility. Most combinatorial logic functions need four or fewer inputs. A third function generator (H) has three inputs. Either zero, one, or two of these inputs can be the outputs of F and G; the other input(s) are from outside the CLB.
The CLB can, therefore, implement certain functions of up to nine variables, like parity checking or expandable identity comparison of two sets of four inputs.
Each CLB contains two storage elements that can be used to store the function generator
outputs.
Thirteen CLB inputs and four CLB outputs provide access to the function generators and
storage elements. These inputs and outputs connect to the programmable interconnect resources
outside the block.
Function Generators
Four independent inputs are provided to each of two function generators (F1 - F4 and G1
- G4). These function generators, with outputs labeled F’ and G’, are each capable of
implementing any arbitrarily defined Boolean function of four inputs. The function generators
are implemented as memory look-up tables.
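A look-up-table function generator is easy to model in software: the inputs simply address a small memory programmed with the desired truth table. The sketch below (Python; make_lut and the XOR example are illustrative, not the Xilinx circuitry) shows the idea for a 4-input LUT.

def make_lut(truth_table):
    """Return a 4-input function generator backed by a 16-entry memory."""
    def generator(f1, f2, f3, f4):
        address = (f4 << 3) | (f3 << 2) | (f2 << 1) | f1   # assumed bit order
        return truth_table[address]
    return generator

# Program the LUT with a 4-input XOR (parity) function
xor4 = make_lut([bin(i).count("1") & 1 for i in range(16)])
print(xor4(1, 0, 1, 1))   # 1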
A third function generator, labeled H’, can implement any Boolean function of its three inputs. Two of these inputs can optionally be the F’ and G’ function generator outputs. Alternatively, one or both of these inputs can come from outside the CLB (H2, H0). The third input must come from outside the block (H1).
Signals from the function generators can exit the CLB on two outputs. F’ or H’ can be
connected to the X output. G’ or H’ can be connected to the Y output.
Figure 2: Simplified Block Diagram of XC4000 Series CLB (RAM and Carry Logic functions not shown)
Flip-Flops
The CLB can pass the combinatorial output(s) to the interconnect network, but can also
store the combinatorial results or other incoming data in one or two flip-flops, and connect their
outputs to the interconnect network as well.
Each CLB F and G function generator contains dedicated arithmetic logic for the fast
generation of carry and borrow signals. This extra output is passed on to the function generator
in the adjacent CLB. The carry chain is independent of normal routing resources. Dedicated fast
carry logic greatly increases the efficiency and performance of adders, subtractors, accumulators,
comparators and counters.
The carry chain in XC4000E devices can run either up or down. At the top and bottom of
the columns where there are no CLBs above or below, the carry is propagated to the right.
Programmable Interconnect
All internal connections are composed of metal segments with programmable switching
points and switching matrices to implement the desired routing. A structured, hierarchical matrix
of routing resources is provided to achieve efficient automated routing.
Interconnect Overview
CLB routing is associated with each row and column of the CLB array.
IOB routing forms a ring (called a VersaRing) around the outside of the CLB array. It
connects the I/O with the internal logic blocks.
Global routing consists of dedicated networks primarily designed to distribute clocks
throughout the device with minimum delay and skew. Global routing can also be used for
other high-fanout signals.
Five interconnect types are distinguished by the relative length of their segments: single-
length lines, double-length lines, quad and octal lines (XC4000X only), and longlines. In the
XC4000X, direct connects allow fast data flow between adjacent CLBs, and between IOBs and
CLBs. Extra routing is included in the IOB pad ring. The XC4000X also includes a ring of octal
interconnect lines near the IOBs to improve pin-swapping and routing to locked pins.
CLB Routing Connections
A high-level diagram of the routing resources associated with one CLB is shown in
Figure. The shaded arrows represent routing present only in XC4000X devices.
CLB inputs and outputs are distributed on all four sides, providing maximum routing
flexibility. In general, the entire architecture is symmetrical and regular. It is well suited to
established placement and routing algorithms. Inputs, outputs, and function generators can freely
swap positions within a CLB to avoid routing congestion during the placement and routing
operation.
The horizontal and vertical single- and double-length lines intersect at a box called a
programmable switch matrix (PSM). Each switch matrix consists of programmable pass
transistors used to establish connections between the lines.
Figure 7: Programmable Switch Matrix (PSM)
Single-Length Lines
Single-length lines provide the greatest interconnect flexibility and offer fast routing
between adjacent blocks. There are eight vertical and eight horizontal single-length lines
associated with each CLB. These lines connect the switching matrices that are located in every row and column of CLBs.
Double-Length Lines
The double-length lines consist of a grid of metal segments, each twice as long as the
single-length lines: they run past two CLBs before entering a switch matrix. Double-length lines
are grouped in pairs with the switch matrices staggered, so that each line goes through a switch
matrix at every other row or column of CLBs.
There are four vertical and four horizontal double-length lines associated with each CLB.
These lines provide faster signal routing over intermediate distances, while retaining routing
flexibility. Double-length lines are connected by way of the programmable switch matrices.
Figure 8: Single- and Double-Length Lines, with
Programmable Switch Matrices (PSMs)
I/O Routing
XC4000 Series devices have additional routing around the IOB ring. This routing is
called a VersaRing. The VersaRing facilitates pin-swapping and redesign without affecting board
layout.
CPLD Architecture
Function Block
Each Function Block is comprised of 18 independent macrocells, each capable of
implementing a combinatorial or registered function. The FB also receives global clock, output
enable, and set/reset signals. The FB generates 18 outputs that drive the Fast CONNECT switch
matrix. These 18 outputs and their corresponding output enable signals also drive the IOB.
Logic within the FB is implemented using a sum-of-products representation. Thirty-six inputs provide 72 true and complement signals into the programmable AND-array to form 90 product terms. Any number of these product terms, up to the 90 available, can be allocated to each macrocell by the product term allocator.
Macrocell
A macrocell array is an approach to the design and manufacture of ASICs. Essentially, it
is a small step up from the otherwise similar gate array, but rather than being a prefabricated
array of simple logic gates, the macrocell array is a prefabricated array of higher-level logic
functions such as flip-flops, ALU functions, registers, and the like.
The Fast CONNECT switch matrix connects signals to the FB inputs. All IOB outputs
(corresponding to user pin inputs) and all FB outputs drive the Fast CONNECT matrix.
I/O Block
The I/O Block (IOB) interfaces between the internal logic and the device user I/O pins.
Each IOB includes an input buffer, output driver, output enable selection multiplexer, and user
programmable ground control.
Standard cell design
Standard cell design is another approach to ASIC design. In this approach a library of standard cells is available for the design. The cell library consists of basic gates, flip-flops, and some commonly used functions such as decoders, multiplexers, and full adders, as well as memories. Standard cells have lower development cost; in terms of size and performance they are less efficient.
Design Approach
Design flow steps to implement VLSI circuits using PLDs:
Definition:
Design for testability (DFT) refers to those design techniques that make test generation
and test application cost-effective.
Some terminologies:
Input / output (I/O) pads
• Protection of circuitry on chip from damage
• Care to be taken in handling all MOS circuits
• Provide the necessary buffering between the on-chip and off-chip environments
• Provide for the connections of power supply
• Pads must always be placed around the periphery of the chip
Minimum set of pads include:
• VDD connection pad
• GND(VSS) connection pad
• Input pad
• Output pad
• Bidirectional I/O pad
Designer must be aware of:
• nature of circuitry
• ratio/size of inverters/buffers on which output lines are connected
• how input lines pass through the pad circuit (pass transistor/transmission gate)
System delays
Buses:
• convenient concept in distributing data & control through a system
• bidirectional buses are convenient
• in design of datapath
• problems: capacitive load present
• largest capacitance
• sufficient time must be allowed to charge the total bus
• clock φ1 & φ2
Control paths, selectors & decoders
1. select registers and open pass transistors to connect cells to bus
2. Data propagation delay bus
3. Carry chain delay
The solution to the problem of testing a purely combinational logic block is a good set of patterns detecting "all" the possible faults.
The first idea for testing an N-input circuit would be to apply an N-bit counter to the inputs (controllability), generate all 2^N combinations, and observe the outputs for checking (observability). This is called "exhaustive testing", and it is very efficient, but only for circuits with few inputs. When the number of inputs increases, this technique becomes very time consuming.
Most of the time, in exhaustive testing, many patterns do not occur during the application of the circuit. So instead of spending a huge amount of time searching for faults everywhere, the possible faults are first enumerated and a set of appropriate vectors is then generated. This is called "single-path sensitization" and it is based on "fault-oriented testing".
The basic idea is to select a path from the site of a fault, through a sequence
of gates leading to an output of the combinational logic under test. The process is
composed of three steps :
• Manifestation : gate inputs, at the site of the fault, are specified as to generate
the opposite value of the faulty value (0 for SA1, 1 for SA0).
• Propagation : inputs of the other gates are determined so as to propagate the fault
signal along the specified path to the primary output of the circuit. This is done by
setting these inputs to "1" for AND/NAND gates and "0" for OR/NOR gates.
• Consistency: or justification. This final step helps find the primary input pattern that will realize all the necessary input values. This is done by tracing backward from the gate inputs to the primary inputs of the logic in order to obtain the test patterns.
Example1 - SA1 of line1 (L1) : the aim is to find the vector(s) able to detect this fault.
These three steps have led to four possible vectors detecting L1=SA1.
Example 2 - SA1 of line8 (L8) : The same combinational logic having one internal line
SA1
• Manifestation : L8 = 0
• Propagation: Through the AND-gate: L5 = L1 = 1, then L10 = 0 Through the
NOR-gate: we want to have L11 = 0, not to mask L10 = 0.
• Consistency: From the AND-gate L8 = 0 leads to L7 = 0. From the NOT-gate L11
= 0 means L9 = L7 = 1, L7 could not be set to 1 and 0 at the same time. This
incompatibility could not be resolved in this case, and the fault "L8 SA1" remains
undetectable.
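The idea of fault-oriented testing can be illustrated with a brute-force sketch: compare a good copy of a small circuit against a copy with one internal line stuck at 1, and keep every input vector on which the outputs differ. The Python below is purely illustrative; the circuit is hypothetical and this is not the single-path sensitization procedure itself.

from itertools import product

def good(a, b, c):
    """Hypothetical circuit: y = NOT((a AND b) OR (NOT c))."""
    l5 = a & b
    l6 = 1 - c
    return 1 - (l5 | l6)

def faulty(a, b, c):
    """Same circuit with the internal line L5 stuck-at-1."""
    l5 = 1                       # the fault: L5 is forced to 1
    l6 = 1 - c
    return 1 - (l5 | l6)

tests = [v for v in product((0, 1), repeat=3) if good(*v) != faulty(*v)]
print(tests)   # every (a, b, c) vector that detects the L5 SA1 fault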
D – Algorithm:
Practical guidelines for testability should aim to facilitate test processes in three
main ways:
All "design for test" methods ensure that a design has enough observability and
controllability to provide for a complete and efficient testing. When a node has difficult
access from primary inputs or outputs (pads of the circuit), a very efficient method is to
add internal pads acceding to this kind of node in order, for instance, to control block B2
and observe block B1 with a probe.
It is easy to observe block B1 by adding a pad just on its output, without breaking
the link between the two blocks. The control of the block B2 means to set a 0 or a 1 to its
input, and also to be transparent to the link B1-B2. The logic functions of this purpose are
a NOR- gate, transparent to a zero, and a NAND-gate, transparent to a one. By this way
the control of B2 is possible across these two gates.
In this case the major penalties are extra devices and propagation delays due to
multiplexers. Demultiplexers are also used to improve observability. Using multiplexers
and demultiplexers allows internal access of blocks separately from each other, which is
the basis of techniques based on partitioning or bypassing blocks to observe or control
separately other blocks.
Based on the same principle of partitioning, the counters are sequential elements
that need a large number of vectors to be fully tested. The partitioning of a long counter
corresponds to its division into sub-counters.
The full test of a 16-bit counter requires the application of 2^16 + 1 = 65,537 clock pulses. If this counter is divided into two 8-bit counters, then each counter can be tested separately, and the total test time is reduced by a factor of about 128 (2^7). This is also useful if there are subsequent requirements to set the counter to a particular count for tests associated with other parts of the circuit (pre-loading facilities).
One of the most important problems in sequential logic testing occurs at the time
of power-on, where the first state is random if there were no initialization. In this case it is
impossible to start a test sequence correctly, because of memory effects of the sequential
elements.
Ideally, all memory elements should be able to be set to a known state, but practically this could consume a great deal of silicon area; also, it is not always necessary to initialize all the sequential logic. For example, a serial-in serial-out counter could have its first flip-flop provided with an initialization; then, after a few clock pulses, the counter is in a known state.
Overriding of the tester is necessary some times, and requires the addition of gates
before a Set or a Reset so the tester can override the initialization state of the logic.
Automatic test pattern generators work in the logic domain; they view delay-dependent logic as redundant combinational logic. In this case the ATPG will see an AND of a signal with its complement, and will therefore always compute a 0 on the output of the AND-gate (instead of a pulse). Adding an OR-gate after the AND-gate output permits the ATPG to substitute a clock signal directly.
When a clock signal is gated with any data signal, for example a load signal
coming from a tester, a skew or any other hazard on that signal can cause an error on the
output of logic.
Figure 8.8: Avoid Clock Gating
This is also due to asynchronous type of logic. Clock signals should be distributed
in the circuit with respect to synchronous logic structure.
This is another timing situation to avoid, in which the tester could not be
synchronized if one clock or more are dependent on asynchronous delays (across D-input
of flip-flops, for example).
The self resetting logic is more related to asynchronous logic, since a reset input is
independent of clock signal.
Before the delayed reset, the tester reads the set value and continues the normal
operation. If a reset has occurred before tester observation, then the read value is
erroneous. The solution to this problem is to allow the tester to override by adding an OR-
gate, for example, with an inhibition input coming from the tester. By this way the right
response is given to the tester at the right time.
Figure 8.10: Avoid Self Resetting Logic
Use Bused Structure
The tester can then disconnect any module from the buses by putting its output
into a high- impedance state. Test patterns can then be applied to each module separately.
Testing analog circuits requires a completely different strategy than testing digital circuits. Also, the sharp edges of digital signals can cause cross-talk problems on the analog lines if they are close to each other.
Figure 8.12: Separate Analog and Digital Circuits
If it is necessary to route digital signals near analog lines, then the digital lines
should be properly balanced and shielded. Also, in the cases of circuits like Analog-
Digital converters, it is better to bring out analog signals for observation before
conversion. For Digital-Analog converters, digital signals are to be brought out also for
observation before conversion.
The set of design-for-testability guidelines presented above is a set of ad hoc methods for designing random logic with respect to testability requirements. The scan design techniques are a set of structured approaches to designing the sequential circuits for testability.
The major difficulty in testing sequential circuits is determining the internal state
of the circuit. Scan design techniques are directed at improving the controllability and
observability of the internal states of a sequential circuit. By this the problem of testing a
sequential circuit is reduced to that of testing a combinational circuit, since the internal
states of the circuit are under control.
Scan Path
The goal of the scan path technique is to reconfigure a sequential circuit, for the
purpose of testing, into a combinational circuit. Since a sequential circuit is based on a
combinational circuit and some storage elements, the technique of scan path consists in
connecting together all the storage elements to form a long serial shift register. Thus the
internal state of the circuit can be observed and controlled by shifting (scanning) out the
contents of the storage elements. The shift register is then called a scan path.
The storage elements can either be D, J-K, or R-S types of flip-flops, but simple
latches cannot be used in scan path. However, the structure of storage elements is slightly
different than classical ones. Generally the selection of the input source is achieved using
a multiplexer on the data input controlled by an external mode signal. This multiplexer is
integrated into the D-flip-flop, in our case; the D-flip-flop is then called MD-flip-flop
(multiplexed-flip-flop).
The sequential circuit containing a scan path has two modes of operation: a
normal mode and a test mode which configure the storage elements in the scan path.
As analyzed from figure 8.13, in the normal mode, the storage elements are
connected to the combinational circuit, in the loops of the global sequential circuit, which
is considered then as a finite state machine.
In the test mode, the loops are broken and the storage elements are connected
together as a serial shift register (scan path), receiving the same clock signal. The input of
the scan path is called scan-in and the output scan-out. Several scan paths can be
implemented in one same complex circuit if it is necessary, though having several scan-in
inputs and scan-out outputs.
Before applying test patterns, the shift register itself has to be verified by shifting
in all ones i.e. 111...11, or zeros i.e. 000...00, and comparing.
1. Set test mode signal, flip-flops accept data from input scan-in
2. Verify the scan path by shifting in and out test data
3. Set the shift register to an initial state
4. Apply a test pattern to the primary inputs of the circuit
5. Set normal mode, the circuit settles and can monitor the primary outputs
of the circuit
6. Activate the circuit clock for one cycle
7. Return to test mode
8. Scan out the contents of the registers, simultaneously scan in the next pattern
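A minimal software model of the scan-path procedure above (Python; the class and the stand-in combinational logic are illustrative only):

class ScanRegister:
    """Storage elements reconfigurable as a serial shift register (scan path)."""
    def __init__(self, n):
        self.q = [0] * n

    def shift(self, scan_in):                 # test mode: one scan-clock pulse
        scan_out = self.q[-1]
        self.q = [scan_in] + self.q[:-1]
        return scan_out

    def capture(self, logic_outputs):         # normal mode: one system-clock pulse
        self.q = list(logic_outputs)

reg = ScanRegister(4)
for bit in [1, 0, 1, 1]:                      # scan a test pattern in
    reg.shift(bit)
print(reg.q)                                  # internal state is now controlled
reg.capture([b ^ 1 for b in reg.q])           # stand-in for the combinational logic
print([reg.shift(0) for _ in range(4)])       # response observed by scanning out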
Level sensitivity scan design (LSSD)
Advantages:
• Circuit operation is independent of dynamic characteristics of the logic elements
• Automatic test pattern generation (ATPG) is simplified
• Eliminate hazards and races
• Simplifies test generation and fault simulation
Boundary Scan Test (BST) is a technique involving scan path and self-testing
techniques to resolve the problem of testing boards carrying VLSI integrated circuits
and/or surface mounted devices (SMD).
Printed circuit boards (PCB) are becoming very dense and complex, especially
with SMD circuits, that most test equipment cannot guarantee good fault coverage.
BST (figure 8.15) consists of placing a scan path (shift register) cell adjacent to each component pin and interconnecting the cells in order to form a chain around the border of the circuit. The BST circuits contained on one board are then connected together to form a single path through the board.
The boundary scan path is provided with serial input and output pads and appropriate
clock pads which make it possible to:
• Test the interconnections between the various chips
• Deliver test data to the chips on board for self-testing
• Test the chips themselves with internal self-test
Procedure:
Set test inputs to all test points
Apply the master reset signal to initialize all memory elements
Set scan-in address & data, then apply the scan clock
Repeat the above step until all internal test inputs are scanned
Clock once for normal operation
Check states of the output points
Read the scan-out states of all memory elements by applying the address
Built-in-self test
Objectives:
1. To reduce test pattern generation cost
2. To reduce volume of test data
3. To reduce test time
Built-in Self Test, or BIST, is the technique of designing additional hardware and
software features into integrated circuits to allow them to perform self-testing, i.e., testing
of their own operation (functionally, parametrically, or both) using their own circuits,
thereby reducing dependence on an external automated test equipment (ATE).
Signature analysis performs polynomial division, that is, division of the data stream coming out of the device under test (DUT). This data is represented as a polynomial P(x), which is divided by a characteristic polynomial C(x); the remainder of this division is the signature R(x):
R(x) = P(x) mod C(x)
This is summarized in figure 8.16.
Figure 8.16: BIST – signature analysis
An LFSR is a shift register that, when clocked, advances the signal through the register from one bit to the next most-significant bit. Some of the outputs are combined in an exclusive-OR configuration to form a feedback mechanism. A linear feedback shift register can be formed by performing an exclusive-OR (Figure 8.16) on the outputs of two or more of the flip-flops and feeding that result back into the input of one of the flip-flops.
[Figure: simple LFSR stage with inputs i0, i1, i2, data input D0, a common Clock, and outputs Q0, Q1, Q2.]
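A behavioural sketch of a small LFSR (Python; the 3-bit length and the tap positions are assumed purely for illustration):

def lfsr(seed, taps, cycles):
    """Shift register with XOR feedback: the feedback bit is the XOR of the
    tapped outputs and is shifted in at the first stage on every clock."""
    state = list(seed)                        # [Q0, Q1, Q2]
    sequence = []
    for _ in range(cycles):
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]       # shift towards the last stage
        sequence.append(tuple(state))
    return sequence

for s in lfsr([1, 0, 0], taps=(1, 2), cycles=7):
    print(s)   # steps through all seven non-zero 3-bit states before repeating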
A BILBO register (built-in logic block observer) combines normal flipflops with
a few additional gates to provide four different functions. The example circuit shown in
the applet realizes a four-bit register. However, the generalization to larger bit-widths
should be obvious, with the XOR gates in the LFSR feedback path chosen to implement a
good polynomial for the given bit-width.
When the A and B control inputs are both 1, the circuit functions as a normal
parallel D-type register.
When both A and B inputs are 0, the D-inputs are ignored (due to the AND gate
connected to A), but the flipflops are connected as a shift-register via the NOR and XOR
gates. The input to the first flipflop is then selected via the multiplexer controlled by the S
input. If the S input is 1, the multiplexer transmits the value of the external SIN shift-in
input to the first flipflop, so that the BILBO register works as a normal shift-register. This
allows the register contents to be initialized using a single signal wire, e.g. from an external test controller.
If all of the A, B, and S inputs are 0, the flipflops are configured as a shift-register,
again, but the input bit to the first flipflop is computed by the XOR gates in the LFSR
feedback path. This means that the register works as a standard LFSR pseudorandom
pattern generator, useful to drive the logic connected to the Q outputs. Note that the start
value of the LFSR sequence can be set by shifting it in via the SIN input.
Because a BILBO register can be used as a pattern generator for the block it drives, as well as provide signature analysis for the block it is driven by, a whole circuit can be made self-testable with very low overhead and with only minimal performance degradation (two extra gates before the D inputs of the flipflops).
Figure 8.17: BIST – BILBO
Self-checking techniques:
A self-checking circuit consists of a logic block and checkers; these should obey a set of rules under which the logic block is ‘strongly fault secure’ and the checker ‘strongly code disjoint’. The code used in data encoding depends on the type of errors that may occur at the logic block output. In general three types are possible:
• Simple error: one bit only affected at a time.
• Unidirectional error: multiple bits at 1 instead of 0 (or 0 instead of 1)
• Multiple errors: multiple bits affected in any order.
Self-checking techniques are applied to circuits in which security is important, so that fault tolerance is of major interest. Such techniques occupy more silicon area than classical techniques such as functional testing, but provide very high test coverage.