UNIT IV
CIRCUITS
I. Interconnect Parameters:
The wires linking transistors are called interconnect. The sum of width and spacing is called the
wire pitch. The thickness to width ratio is called the aspect ratio. When one wire switches, it affects
its neighbor through capacitive coupling; this effect is called crosstalk.
1. Capacitance Parameter
2. Resistance Parameter
3. Inductance Parameter
1.Capacitance Parameter:
An isolated wire over the substrate can be modeled as a conductor over a ground plane. Wire
capacitance has two components, namely the parallel plate capacitance and the fringing
capacitance.
2. Resistance Parameter:
3. Inductance Parameter:
By definition, a changing current passing through an inductor generates a
voltage drop ΔV = L (di/dt).
It is possible to compute the inductance of a wire directly from its geometry and its environment. A
simpler approach relies on the fact that the capacitance c and the inductance l (per unit length) of
a wire are related by the following expression,
cl = εµ
with ε and µ respectively the permittivity and permeability of the surrounding dielectric.
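As a rough numerical sketch of this relation, the snippet below derives l from a given c and also evaluates the wave velocity 1/sqrt(l·c). The relative permittivity of 3.9 (typical for SiO2) and the example capacitance of 100 pF/m are assumptions chosen only for illustration; the function names are hypothetical.

```python
import math

EPS0 = 8.854e-12      # vacuum permittivity, F/m
MU0 = 4e-7 * math.pi  # vacuum permeability, H/m

def inductance_per_length(c, eps_r=3.9, mu_r=1.0):
    """Return l (H/m) from c (F/m) using c * l = eps * mu."""
    return (eps_r * EPS0) * (mu_r * MU0) / c

def wave_velocity(eps_r=3.9, mu_r=1.0):
    """Propagation speed 1/sqrt(l*c) = 1/sqrt(eps*mu), in m/s."""
    return 1.0 / math.sqrt(eps_r * EPS0 * mu_r * MU0)

c = 100e-12                    # assumed wire capacitance, F/m
l = inductance_per_length(c)   # resulting inductance, H/m
v = wave_velocity()            # depends only on the dielectric
```

Note that the velocity does not depend on the wire geometry, only on the surrounding dielectric.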
1. Lumped Model
2. Lumped RC Model- The Elmore Delay
3. Distributed RC Line Model
4. Transmission Line Model
1. Lumped Model:
The lumped model combines the total wire resistance of each wire segment into one single R and similarly combines the
global capacitance into a single capacitor C. This simple model is called the lumped RC model.
As a special case of the RC tree network, let us consider the simple, non-branched RC chain (or ladder) shown in
figure. This network is worth analyzing because it is a structure that is often encountered in
digital circuits, and also because it represents an approximate model of a resistive-capacitive
wire.
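The Elmore delay of such a chain can be sketched in a few lines. The code below assumes the standard Elmore formula for a non-branched chain, tau = sum over nodes i of C_i times the total resistance between the source and node i; the function name and the example component values are hypothetical.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay at the last node of a non-branched RC chain.

    resistances[i] is the resistor feeding node i and capacitances[i] is the
    capacitor from node i to ground: tau = sum_i C_i * (R_1 + ... + R_i).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistor per capacitor")
    tau = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance shared between source and this node
        tau += c * upstream_r
    return tau

# Three identical sections of 1 kOhm and 1 fF (illustrative values)
tau = elmore_delay_chain([1e3, 1e3, 1e3], [1e-15, 1e-15, 1e-15])
```

For this example the result is 1 fF x (1 + 2 + 3) kOhm = 6 ps.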
A distributed RC line is a more appropriate model of a wire, as shown below; here R and C
stand for the resistance and capacitance per unit length.
A distributed RLC model of a wire is known as the transmission line model. The transmission line
has the prime property that a signal propagates over the interconnection medium as a wave. This
is in contrast to the distributed RC model, where the signal diffuses from the source to the
destination governed by the diffusion equation, i.e.
RC (∂V/∂t) = ∂²V/∂x², where V is the voltage at position x along the wire.
1. Adders
2. Multipliers
3. Comparators
4. Shift registers
1. Adders:
i. Carry look ahead adder
ii. Manchester carry chain adder
iii. Ripple carry adder
iv. High speed adder
a. Carry skip adder
b. Carry select adder
c. Carry save adder
The CLA is based on the fact that a carry signal will be generated in two cases.
Case (i):
When both inputs Ai and Bi are 1.
Case (ii):
When one of the two bits is 1 and the carry-in (carry of the previous stage) is 1.
These two cases give
Ci+1 = Gi + Pi·Ci, with Gi = Ai·Bi and Pi = Ai ⊕ Bi,
where Gi is known as the carry generate signal since a carry (Ci+1) is generated whenever Gi = 1,
regardless of the input carry (Ci).
Pi is known as the carry propagate signal since whenever Pi = 1, the input carry is propagated to
the output carry, i.e. Ci+1 = Ci (note that whenever Pi = 1, Gi = 0).
The values of Pi and Gi depend only on the input operand bits (Ai and Bi), as is clear
from the figure and equations.
Thus these signals settle to their steady-state values after propagating through their respective
gates.
The Boolean expressions of the carry outputs of the various stages can be obtained by putting
i = 0, 1, 2, 3 in the carry equation Ci+1 = Gi + Pi·Ci (equation (4)); then we get,
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
Disadvantages:
• CLA adders become more complex for word lengths of more than 4 bits.
• Thus, CLA adders are usually implemented as 4-bit modules that are used to build larger
adders.
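The look-ahead idea can be illustrated with a small behavioural sketch. It assumes the standard definitions Gi = Ai·Bi and Pi = Ai ⊕ Bi and writes every carry directly in terms of the generate and propagate signals and C0, so that no carry waits on the previous stage. The function name `cla_add` and the default 4-bit width are choices made for this example.

```python
def cla_add(a, b, c0=0, width=4):
    """Add two width-bit integers using carry look-ahead equations."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate:  Gi = Ai.Bi
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # propagate: Pi = Ai xor Bi

    # C(i+1) = Gi + Pi.G(i-1) + Pi.P(i-1).G(i-2) + ... + Pi...P0.C0
    carries = [c0]
    for i in range(width):
        c_next = 0
        prefix = 1                    # running product Pi.P(i-1)...
        for j in range(i, -1, -1):
            c_next |= prefix & g[j]
            prefix &= p[j]
        c_next |= prefix & c0
        carries.append(c_next)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i   # Si = Pi xor Ci
    return total, carries[width]
```

For example, `cla_add(0b1011, 0b0110)` returns the 4-bit sum 0001 with a carry-out of 1 (11 + 6 = 17).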
ii. Manchester carry chain adder:
A Manchester carry chain generates the intermediate carries by tapping off nodes in the gate that
calculates the most significant carry value.
Fig.7.53 shows the dynamic circuit.
If φ = 0, then precharge occurs and the output is 1.
If φ = 1, then evaluation occurs.
The dynamic Manchester carry chain for the carry bits up to C4 is shown in Fig.7.54
iii. Ripple carry adder:
In a ripple carry adder, the carry bit is calculated along with the sum bit. Each bit must wait for
the calculation of the previous carry.
In the ripple carry adder, the output of a stage is known only after the carry generated by the previous stage is
produced.
Thus, the sum of the most significant bit is only available after the carry signal has rippled
through the adder from the least significant stage to the most significant stage. As a result, the
final sum and carry bits will be valid after a considerable delay.
If n bits are added, we get an n-bit sum and a carry-out Cn. Ci is the carry-in bit from the previous
column. An n-bit ripple carry adder needs n full adders, each producing a carry-out bit Ci+1.
Drawbacks:
• Circuit is slower.
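A minimal behavioural sketch of the ripple carry adder follows. Each loop iteration plays the role of one full adder and cannot start until the previous carry is known, which is the source of the delay noted above. The function names and the LSB-first bit-list convention are assumptions made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """a_bits, b_bits: lists of bits, LSB first. Returns (sum_bits, carry_out)."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # stage i waits for carry from stage i-1
        sum_bits.append(s)
    return sum_bits, carry

# 1011 (11) + 0110 (6), written LSB first
result = ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0])
```

Here `result` is `([1, 0, 0, 0], 1)`, i.e. sum bits 0001 with carry-out 1, which is 17.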
The carry skip adder provides a compromise between a ripple carry adder and a CLA adder. The
carry skip adder divides the words to be added into blocks. Within each block, ripple carry is
used to produce the sum bits and the carry. The carry skip adder reduces the delay due to the carry
computation by skipping over groups of consecutive adder stages.
• The carry-select adder generally consists of two ripple carry adders and a multiplexer.
• Adding two n-bit numbers with a carry-select adder is done with two adders in order to
perform the calculation twice, once assuming a carry-in of 0 and once assuming a carry-in of 1.
• Below is the basic building block of a carry-select adder, where the block size is 4.
• Two 4-bit ripple carry adders are multiplexed together, where the resulting carry and sum
bits are selected by the carry-in.
• Carry save adder is similar to the full adder. It is used when adding multiple numbers.
• All the bits of a carry save adder work in parallel.
• In carry save adder, the carry does not propagate. So, it is faster than carry propagate
adder.
• It has three inputs and produces two outputs. The carry-out is saved; it is not immediately used
to find the final sum value.
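The carry-save idea can be sketched on whole words: three operands are compressed into a sum word and a carry word with purely bitwise (parallel) operations, and a carry-propagate addition is needed only once at the very end. The function names below are hypothetical, and Python integers stand in for bit vectors.

```python
def carry_save_add(x, y, z):
    """Compress three non-negative integers into (sum_word, carry_word)."""
    sum_word = x ^ y ^ z                              # per-bit sum, no carry propagation
    carry_word = ((x & y) | (y & z) | (x & z)) << 1   # saved carries, one position higher
    return sum_word, carry_word

def add_many(values):
    """Add several numbers with carry-save stages and one final carry-propagate add."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c   # the only carry-propagate addition

total = add_many([5, 9, 12, 7])
```

At every stage `sum_word + carry_word` equals the sum of the three inputs, so `total` is 33.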
2. Multipliers:
Each partial product is generated by the multiplication of the multiplicand with one
multiplier bit. The partial products are shifted according to their bit orders and then added.
The addition can be performed with a normal carry propagate adder.
A multiplier is used in computation processes to multiply two binary numbers. The basic
operations in multiplication are given below. 0 x 0 = 0, 0 x 1 = 0, 1 x 0 = 0, 1 x 1 = 1
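The partial-product procedure can be sketched as follows: each multiplier bit is ANDed with the multiplicand, the result is shifted to its bit position, and the shifted partial products are summed. The function name and the default 4-bit width are assumptions made for this example.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=4):
    """Return (product, partial_products) for unsigned n_bits-wide operands."""
    partial_products = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        # AND the multiplicand with one multiplier bit, then shift to bit order i
        partial_products.append((multiplicand if bit else 0) << i)
    return sum(partial_products), partial_products

product, partials = shift_add_multiply(0b1011, 0b0101)
```

For 1011 x 0101 the partial products are 11, 0, 44 and 0, and the product is 55.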
i. Array multiplier
ii. Carry save multiplier
iii. Tree multiplier
i. Array multiplier:
Array multiplier uses an array of cells for calculation. Multiplier circuit is based on
repeated addition and shifting procedure. Each partial product is generated by the
multiplication of the multiplicand with one multiplier digit. N-1 adders are required
where N is the number of multiplier bits.
The generation of N partial products requires N x M two-input AND gates. Most of the area of
the multiplier is devoted to the adding of the N partial products, which requires N − 1 M-bit
adders. The shifting of the partial products for their proper alignment is performed by
simple routing and does not require any logic. The overall structure can easily be
compacted into a rectangle, resulting in a very efficient layout.
The method is simple, but when ripple carry adders are used in the array multiplier the delay
is high and the area is large. Product expression is given below
If two different 4-bit numbers (x0, x1, x2, x3 and y0, y1, y2, y3) are multiplied, then
This multiplier can accept all the inputs at the same time. An array multiplier for an n-bit
word needs n(n-2) full adders, n half adders and n² AND gates.
For n = 4:
Full adders = n(n-2) = 4(4-2) = 8
Half adders = n = 4
AND gates = n² = 4² = 16
FA represents full adder, and HA stands for a half adder or an adder with two inputs.
Figure: 4 x 4 array multiplier using full adders, half adders and AND gates. The partial-product
rows X3 X2 X1 X0 · Yj (j = 0 to 3) feed three adder rows (HA FA FA HA; FA FA FA HA; FA FA FA HA)
that produce the product bits Z7 to Z0.
A more efficient realization can be obtained by noticing that the multiplication result
does not change when the output carry bits are passed diagonally downwards instead of
only to the right, as shown in Figure 10.28. We include an extra adder called a vector-
merging adder to generate the final result. The resulting multiplier is called a carry-save
multiplier, because the carry bits are not immediately added, but rather are 'saved' for the
next adder stage. In the final stage, carries and sums are merged in a fast carry-propagate
(e.g. carry-lookahead) adder stage. While this structure has a slightly increased area cost
(one extra adder), it has the advantage that its worst-case critical path is shorter and
uniquely defined, as highlighted in Figure 10.28 and is expressed as
Advantage:
i. Shorter worst case critical path.
Disadvantage
i. Increased area cost (because of one extra adder).
iii. Tree multiplier:
Partial sum adders are rearranged in a tree like fashion to reduce both the critical path and
the number of adder cells. Consider four partial products each of which is four bits wide
as shown in figure 4.13(a). Number of full adders needed for this operation is reduced by
observing that only column 3 in the array has to add four bits. All other columns are less
complex as shown in 4.13(b). Now the original matrix of partial products is reorganized
into a tree shape to visually illustrate its varying depth. The challenge is to realize the
complete matrix with a minimum depth and a minimum number of adder elements. The
first type of operator used to cover the array is a full adder, which takes three inputs and
produces two outputs: the sum, located in the same column and the carry, located in the
next one. For this reason, the full adder is called a 3-2 compressor. It is denoted by a
circle covering three bits. The other operator is the half adder, which takes two input bits
in a column and produces two outputs. The half adder is denoted by a circle covering two
bits.
To obtain a minimal implementation, the tree is covered with full adders and half adders,
starting from its densest part. First, half adders are introduced in columns 4 and 3, as shown
in figure 4.13(b). The reduced tree is shown in figure 4.13(c). A second round of
reductions creates a tree of depth 2, as shown in figure 4.13(d). Only three full adders and
three half adders are used for the reduction process, compared with six full adders and six
half adders in the carry save multiplier. The final stage consists of a simple 2-input adder,
for which any type of adder can be used.
This structure is called the Wallace tree multiplier. This structure is substantially faster
than the carry save structure for large multiplier word lengths.
Disadvantages:
i. Very irregular
ii. Complicated layout
3. Comparators:
Digital or Binary Comparators are made up from standard AND, NOR and NOT gates that
compare the digital signals present at their input terminals and produce an output depending
upon the condition of those inputs. For example, along with being able to add and subtract binary
numbers we need to be able to compare them and determine whether the value of input A is
greater than, smaller than or equal to the value at input B etc. The digital comparator
accomplishes this using several logic gates that operate on the principles of Boolean algebra.
There are two main types of digital comparator available and these are.
1. Identity Comparator - an Identity Comparator is a digital comparator that has only one
output terminal for when A = B, whether both are "HIGH" (A = B = 1) or both are "LOW" (A = B = 0).
2. Magnitude Comparator - a Magnitude Comparator is a type of digital comparator that
has three output terminals, one each for equality (A = B), greater than (A > B) and less than
(A < B).
The purpose of a Digital Comparator is to compare a set of variables or unknown numbers, for
example A (A1, A2, A3, .... An, etc) against that of a constant or unknown value such as B (B1,
B2, B3, Bn, etc) and produce an output condition or flag depending upon the result of the
comparison. For example, a magnitude comparator with two 1-bit inputs (A and B) would
produce the following three output conditions when the inputs are compared to each other.
A>B, A=B, A<B
Which means: A is greater than B, A is equal to B, and A is less than B.
This is useful if we want to compare two variables and want to produce an output when any of
the above three conditions are achieved. For example, produce an output from a counter when a
certain count number is reached. You may notice two distinct features about the comparator from
the above truth table. Firstly, the circuit does not distinguish between either two "0" or two "1"s
as an output A=B is produced when they are both equal, either A=B="0" or A =B ="1".
Secondly, the output condition for A=B resembles that of a commonly available logic gate, the
Exclusive-NOR or Ex-NOR function (equivalence) on each of the n-bits.
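The 1-bit magnitude comparator and the n-bit identity comparator described above can be sketched with bitwise operations. The gate-level expressions used here (A AND NOT B for A>B, NOT A AND B for A<B, XNOR for A=B) are the usual textbook realization and are assumed rather than taken from the missing truth table; the function names are hypothetical.

```python
def compare_1bit(a, b):
    """Return (A>B, A=B, A<B) for single-bit inputs."""
    gt = a & (1 - b)        # A AND (NOT B)
    lt = (1 - a) & b        # (NOT A) AND B
    eq = 1 - (a ^ b)        # Exclusive-NOR
    return gt, eq, lt

def identity_compare(a_bits, b_bits):
    """n-bit equality: AND of the per-bit Ex-NOR outputs."""
    result = 1
    for a, b in zip(a_bits, b_bits):
        result &= 1 - (a ^ b)
    return result

# Truth table of the 1-bit magnitude comparator, keyed by (A, B)
truth_table = {(a, b): compare_1bit(a, b) for a in (0, 1) for b in (0, 1)}
```

The A=B output is 1 for both (0, 0) and (1, 1), which is the Ex-NOR behaviour noted in the text.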
4. Shift Register:
A shift register is commonly used in signal-processing applications to store and delay data.
Figure 9.42(a) shows a shift register built from flip-flops. Because there is no logic between the
registers, particular care must be taken that hold times are satisfied. Flip-flops are
rather big, so large, dense shift registers use dual-port RAMs instead. The RAM is configured as
a circular buffer with a pair of counters specifying where the data is read and written. The read
counter is initialized to the first entry and the write counter to the last entry on reset, as shown in
Figure 9.42(b). Alternately, an N-stage shift register can replace the counters with two 1-of-N hot
registers that track which entries should be read and written. Again one is initialized to point to the
first entry and the other to the last entry. These registers can drive the word lines directly without
the need for a separate decoder, as shown in Figure 9.42(c). One variant of a shift register is a
tapped delay line that offers a variable number of stages of delay. Delay blocks are built from
32-, 16-, 8-, 4-, 2-, and 1-stage shift registers. Multiplexers control pass-around of the delay blocks
to provide the appropriate total delay.
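The RAM-based shift register can be sketched behaviourally as a circular buffer with a read pointer and a write pointer. The class below is a simplified model, not a description of the circuit in the figure: it assumes that each step reads before it writes and that both pointers advance together, so a sample reappears `depth - 1` steps after it was written. The class and method names are hypothetical.

```python
class RamShiftRegister:
    """Circular-buffer model of a shift register built from a dual-port RAM."""

    def __init__(self, depth):
        self.ram = [0] * depth
        self.read_ptr = 0            # first entry on reset
        self.write_ptr = depth - 1   # last entry on reset

    def step(self, data_in):
        data_out = self.ram[self.read_ptr]     # read port
        self.ram[self.write_ptr] = data_in     # write port
        depth = len(self.ram)
        self.read_ptr = (self.read_ptr + 1) % depth
        self.write_ptr = (self.write_ptr + 1) % depth
        return data_out

sr = RamShiftRegister(depth=4)
outputs = [sr.step(x) for x in [1, 2, 3, 4, 5, 6]]
```

With a depth of 4, `outputs` is `[0, 0, 0, 1, 2, 3]`: the input stream delayed by three steps in this model.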
Shifters are an important element in microprocessor design for arithmetic shifting, logical
shifting and rotation functions.
Shifters are mainly used to shift numbers from one bit position to another.
Several commonly used shifters are:
• Logical shifter
• Arithmetic shifter
• Barrel shifter
• Funnel shifter
i. Logical Shifter:
In logical shifter, the shifter shifts the number left or right and fills the empty spots with 0’s.
Shift left:
MSB: shifted out
LSB: shifted in with a "0"
Examples:
1011
Left shift=0110
Shift right:
MSB: shifted in with a "0"
LSB: shifted out
Examples:
1011
Right shift=0101
ii. Arithmetic Shifters:
An arithmetic shifter is the same as the logical shifter, but on a right shift it fills the MSB with copies of
the sign bit (to properly sign-extend 2's complement numbers when using a right shift by k for
division by 2^k).
Examples:
1011
Right shift=1101
For left shift, same as logic shifter.
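The three single-position shifts can be sketched on fixed-width words as below, reusing the 1011 example from the text. The function names and the default 4-bit width are assumptions made for this example.

```python
def logical_shift_left(x, n_bits=4):
    """MSB is shifted out, LSB is filled with 0."""
    return (x << 1) & ((1 << n_bits) - 1)

def logical_shift_right(x, n_bits=4):
    """LSB is shifted out, MSB is filled with 0."""
    return (x & ((1 << n_bits) - 1)) >> 1

def arithmetic_shift_right(x, n_bits=4):
    """LSB is shifted out, MSB is filled with a copy of the sign bit."""
    sign = x & (1 << (n_bits - 1))
    return ((x & ((1 << n_bits) - 1)) >> 1) | sign

left = logical_shift_left(0b1011)        # 0110
right = logical_shift_right(0b1011)      # 0101
arith = arithmetic_shift_right(0b1011)   # 1101
```

Interpreted as a 4-bit 2's complement number, 1011 is -5 and 1101 is -3, which is -5 divided by 2 rounded toward minus infinity.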
The simplest funnel shifter design consists of an array of N N-input multiplexers accepting
1-of-N-hot select signals (one multiplexer for each output bit).
Such a design can be built using nMOS pass transistors for a 4-bit shifter. The shift amount is conditionally inverted and
decoded into select signals that are fed vertically across the array. The outputs are taken
horizontally. Each row of transistors attached to an output forms one of the multiplexers. The
2N-1 inputs run diagonally to the appropriate mux inputs.
IV. Logic Implementation using Programmable Devices:
i. ROM
ii. PLA
iii. FPGA
i. ROM:
A block diagram of a ROM is shown in Figure 5.4. It consists of k inputs and n outputs. The
inputs provide the address for the memory and the outputs give the data bits of the stored word
which is selected by the address. The number of words in a ROM is determined from the fact
that k address input lines are needed to specify 2^k words.
Integrated circuit ROM chips have one or more enable inputs and sometimes with three-state
outputs to facilitate the construction of large arrays of ROM. Consider for example a 32 x 8
ROM. The unit consists of 32 words of 8 bits each. There are five input lines that form the binary
numbers from 0 through 31 for the address.
The hardware procedure that programs the ROM results in blowing fuse links according to a
given truth table. For example, programming the ROM according to the truth table given by
Table 5.2 results in the configuration shown in Figure 5.6.
Every 0 listed in the truth table specifies a no connection and every 1 listed specifies a path that
is obtained by a connection. For example, the table specifies the 8- bit word 10110010 for
permanent storage at address 3.
The four 0's in the word are programmed by blowing the fuse links between output 3 of the
decoder and the inputs of the OR gates associated with outputs A6, A3, A2, and A0. The four 1's
in the word are marked in the diagram with an x to denote a connection, in place of the dot used for
permanent connections in logic diagrams. When the input of the ROM is 00011, all the outputs of the
decoder are 0 except for output 3, which is at logic 1.
The signal equivalent to logic 1 at decoder output 3 propagates through the connections to the
OR gate outputs of A7, A5, A4, and A1. The other four outputs remain at 0. The result is that the
stored word 10110010 is applied to the eight data outputs.
Types of ROMs:
The required paths in a ROM may be programmed in four different ways.
• The first is called mask programming and is done by the semiconductor company during
the last fabrication process of the unit. The procedure for fabricating a ROM requires that
the customer fill out the truth table he wishes the ROM to satisfy. The truth table may be
submitted in a special form provided by the manufacturer or in a specified format on a
computer output medium. The manufacturer makes the corresponding mask for the paths
to produce the 1's and 0's according to the customer's truth table.
• The second type of ROM is the programmable read-only memory or PROM. When ordered,
PROM units contain all the fuses intact giving all 1's in the bits of the stored words. The
fuses in the PROM are blown by application of a high-voltage pulse to the device through
a special pin. A blown fuse defines a binary 0 state and an intact fuse gives a binary 1
state. This allows the user to program the PROM in the laboratory to achieve the desired
relationship between input addresses and stored words.
• A third type of ROM is the erasable PROM or EPROM. The EPROM can be restructured
to the initial state even though it has been programmed previously. When the EPROM is
placed under a special ultraviolet light for a given period of time, the short wave radiation
discharges the internal floating gates that serve as the programmed connections. After
erase, the EPROM returns to its initial state and can be reprogrammed to a new set of
values.
• The fourth type of ROM is the electrically-erasable PROM (EEPROM or E²PROM). It is
like the EPROM except that the previously programmed connections can be erased with
an electrical signal instead of ultraviolet light. The advantage is that the device can be
erased without removing it from its socket.
Example:
Design a combinational circuit using a ROM. The circuit accepts a 3-bit number and
generates an output binary number equal to the square of the input number.
Solution:
The first step is to derive the truth table of the combinational circuit. In most cases this is all that
is needed. In other cases, we can use a partial truth table for the ROM by utilizing certain
properties in the output variables.
Table 5.3 is the truth table for the combinational circuit. Three inputs and six outputs are needed
to accommodate all possible binary numbers. We note that output B0 is always equal to input A0,
so there is no need to generate B0 with a ROM since it is equal to an input variable.
Moreover, output B1 is always 0, so this output is a known constant. We actually need to
generate only four outputs with the ROM; the other two are readily obtained. The minimum size
ROM needed must have three inputs and four outputs.
Three inputs specify eight words, so the ROM must be of size 8 × 4. The ROM implementation
is shown in Figure 5.7. The three inputs specify eight words of four bits each.
The truth table in Figure 5.7(b) specifies the information needed for programming the ROM. The
block diagram of Figure 5.7(a) shows the required connections of the combinational circuit.
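The squaring example can be checked with a short sketch. The list `ROM` below plays the role of the 8 × 4 ROM and holds only outputs B5 to B2; B1 is wired to constant 0 and B0 is taken directly from input A0, as argued above. The names `ROM` and `square_via_rom` are hypothetical.

```python
# 8 x 4 ROM contents: the word at address n holds bits B5 B4 B3 B2 of n*n
ROM = [(n * n) >> 2 for n in range(8)]

def square_via_rom(n):
    """Square a 3-bit number using the 8 x 4 ROM plus two wired outputs."""
    a0 = n & 1
    upper = ROM[n]                          # B5..B2 read from the ROM
    return (upper << 2) | (0 << 1) | a0     # B1 is constant 0, B0 = A0

squares = [square_via_rom(n) for n in range(8)]
```

Every stored word fits in four bits, and `squares` reproduces 0, 1, 4, 9, 16, 25, 36, 49.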
ii. PLA (Programmable Logic Array):
In PLAs, instead of using a decoder as in PROMs, a number (k) of AND gates is used where k <
2^n (n is the number of inputs). Each of the AND gates can be programmed to generate a
product term of the input variables and does not generate all the minterms as in the ROM. The
AND and OR gates inside the PLA are initially fabricated with the links (fuses) among them.
The specific Boolean functions are implemented in sum of products form by opening appropriate
links and leaving the desired connections.
A block diagram of the PLA is shown in the figure. It consists of n inputs, m outputs, and k
product terms. The product terms constitute a group of k AND gates each of 2n inputs. Links are
inserted between all n inputs and their complement values to each of the AND gates. Links are
also provided between the outputs of the AND gates and the inputs of the OR gates.
Since PLA has m-outputs, the number of OR gates is m. The output of each OR gate goes to an
XOR gate, where the other input has two sets of links, one connected to logic 0 and other to logic
1. It allows the output function to be generated either in the true form or in the complement form.
The output is inverted when the XOR input is connected to 1 (since X ⊕ 1 = X’). The output
does not change when the XOR input is connected to 0 (since X ⊕ 0 = X).
Thus, the total number of programmable links is 2n x k + k x m + 2m.
The size of the PLA is specified by the number of inputs (n), the number of product terms (k),
and the number of outputs (m), (the number of sum terms is equal to the number of outputs).
Example 1:
Implement the combinational circuit having the shown truth table, using PLA.
Each product term in the expression requires an AND gate. To minimize the cost, it is necessary
to simplify the function to a minimum number of product terms.
Designing using a PLA, a careful investigation must be taken in order to reduce the distinct
product terms. Both the true and complement forms of each function should be simplified to see
which one can be expressed with fewer product terms and which one provides product terms that
are common to other functions.
F1’ = AB + AC + BC, i.e. F1 = (AB + AC + BC)’
F2 = AB + AC + A’B’C’
This gives only 4 distinct product terms: AB, AC, BC, and A’B’C’. So the PLA table will be as
follows,
For each product term, the inputs are marked with 1, 0, or – (dash). If a variable in the product
term appears in its normal form (unprimed), the corresponding input variable is marked with a 1.
If it appears complemented (primed), it is marked with a 0, and if it is absent from the product
term, it is marked with a dash. A 1
in the Inputs column specifies a path from the corresponding input to the input of the AND gate
that forms the product term. A 0 in the Inputs column specifies a path from the corresponding
complemented input to the input of the AND gate. A dash specifies no connection.
The appropriate fuses are blown and the ones left intact form the desired paths. It is assumed that
the open terminals in the AND gate behave like a 1 input.
In the Outputs column, a T (true) specifies that the other input of the corresponding XOR gate
can be connected to 0, and a C (complement) specifies a connection to 1.
Note that output F1 is the normal (or true) output even though a C (for complement) is marked
over it. This is because F1’ is generated with AND-OR circuit prior to the output XOR. The
output XOR complements the function F1’ to produce the true F1 output as its second input is
connected to logic 1.
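The programming table for this example can be written out and simulated as below. The table is reconstructed from the four product terms and the T/C markings described in the text, so its exact row order is an assumption; the names `PLA_TABLE`, `XOR_LINKS` and `pla_eval` are hypothetical.

```python
# Each row: (pattern for inputs A, B, C; which OR gates (F1, F2) use the term)
PLA_TABLE = [
    ("11-", (1, 1)),   # AB     -> F1 column and F2 column
    ("1-1", (1, 1)),   # AC     -> F1 column and F2 column
    ("-11", (1, 0)),   # BC     -> F1 column only
    ("000", (0, 1)),   # A'B'C' -> F2 column only
]
XOR_LINKS = (1, 0)     # F1 is marked C (XOR input = 1), F2 is marked T (XOR input = 0)

def pla_eval(a, b, c):
    """Evaluate (F1, F2) through the AND array, OR array and output XOR gates."""
    inputs = (a, b, c)
    or_out = [0, 0]
    for pattern, outputs in PLA_TABLE:
        # 1 = true input, 0 = complemented input, '-' = no connection
        term = all(p == "-" or int(p) == x for p, x in zip(pattern, inputs))
        for k in range(2):
            or_out[k] |= int(term) & outputs[k]
    return tuple(o ^ link for o, link in zip(or_out, XOR_LINKS))
```

For instance, `pla_eval(0, 0, 0)` gives `(1, 1)`: no term of F1’ is active so F1 = 1, and A’B’C’ sets F2.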
Limitations of PLA:
PLAs come in various sizes. A typical size is 16 inputs, 32 product terms, 8 outputs.
• Each AND gate has a large fan-in. This limits the number of inputs that can be provided in
a PLA.
• 16 inputs form 2^16 possible input combinations; only 32 product terms are permitted (since
there are 32 AND gates) in a typical PLA.
• With 32 AND terms permitted, the OR gates have a large fan-in as well.
➢ This makes PLAs slower and slightly more expensive than some alternatives
to be discussed shortly.
• The 8 outputs can share product terms, but are not required to.
Applications of PLA:
PLA-Based CLB :
PLA-based FPGA devices are based on conventional PLDs. The important advantage of this
structure is that logic circuits are implemented using only a few levels of logic. To improve
integration density, a logic expander is used.
Multiplexer-Based CLB :
In multiplexer-based FPGAs, multiplexers are used to implement the logic circuits. The main
advantage of a multiplexer-based FPGA is that it provides more functionality using a minimum
number of transistors. Due to the large number of inputs, multiplexer-based FPGAs place high demands on
routing.
3. Programmable I/O:
These are mainly buffers that can be configured either as input buffers, output buffers or
input/output buffers. They allow the pins of the FPGA chip to function either as input pins,
output pins or input/output pins. The IOBs provide a simple interface between the internal user
logic and the package pins.
Input Signals:
Two paths, labeled I1 and I2, bring input signals into the array. Inputs also connect to an input
register that can be programmed as either an edge-triggered flip-flop or a level sensitive
transparent-Low latch. The choice is made by placing the appropriate primitive from the symbol
library. The inputs can be globally configured for either TTL (1.2V) or CMOS (2.5V) thresholds.
The two global adjustments of input threshold and output level are independent of each other.
There is a slight hysteresis of about 300 mV. Separate clock signals are provided for the input and
output registers; these clocks can be inverted, generating either falling-edge or rising edge
triggered flip-flops. As is the case with the CLB registers, a global set/reset signal can be used to
set or clear the input and output registers whenever the RESET net is alive.
Registered Inputs:
The I1 and I2 signals that exit the block can each carry either the direct or registered input signal.
The input and output storage elements in each IOB have a common clock enable input, which
through configuration can be activated individually for the input or output flip flop or both. This
clock enable operates exactly like the EC pin on the XC4000E CLB. It cannot be inverted within
the IOB.
FPGA Architecture:
The basic architecture of FPGA consists of an array of logic blocks with programmable row and
column interconnecting channels surrounded by programmable I/O blocks as shown in Fig.
Many FPGA architectures are based on a type of memory called LUT (look-up table) rather than
on sum-of-products (SOP) AND/OR arrays as CPLDs are. Another approach found on some
FPGAs is the use of multiplexers to generate logic functions.
LUT:
The look-up table (LUT) used in FPGAs is actually a memory device that can be programmed to
perform logic functions. The LUT essentially replaces the AND/OR array logic in a CPLD. As
an example of how an LUT can be used to produce a logic function, Fig. 9.7.2 shows a simple
diagram of an 8 bit by 1 bit (8 x 1) memory programmed to produce the SOP function
ABC+ABC+ABC. When any one of the three product terms appears on the LUT inputs, the
corresponding memory cell storing a 1 is selected and the 1 (HIGH) appears on the output. For
any product terms that are not part of the SOP function, the LUT output is 0 (LOW).
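The LUT behaviour can be sketched as a small memory indexed by the input bits. Because the product terms of the figure are not recoverable here, the three minterms used below are hypothetical and chosen only for illustration, as are the function names.

```python
def make_lut(minterms, n_inputs=3):
    """Build LUT contents: store a 1 at every address named by a minterm."""
    lut = [0] * (1 << n_inputs)
    for term in minterms:            # each term is a tuple of input bits, e.g. (1, 0, 1)
        address = 0
        for bit in term:
            address = (address << 1) | bit
        lut[address] = 1
    return lut

def lut_read(lut, a, b, c):
    """The inputs A, B, C act as the memory address."""
    return lut[(a << 2) | (b << 1) | c]

# Hypothetical three-minterm SOP function: A'BC' + A'BC + ABC'
lut = make_lut([(0, 1, 0), (0, 1, 1), (1, 1, 0)])
```

Reading `lut_read(lut, 0, 1, 1)` returns 1 because that minterm was programmed, while `lut_read(lut, 1, 1, 1)` returns 0.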
Logic Block:
Each logic block in a generic FPGA contains several logic elements, as shown in Fig. 9.7.3.
Generally there can be well over ten thousand logic elements in a single chip.
Logic Element:
A simplified diagram of a typical FPGA logic element is shown in Fig. 9.7.4. It contains an LUT,
associated logic, and a flip-flop. In this case, each logic element contains a 4-input LUT that can
be programmed as logic function generator. It can be used to produce SOP functions or logic
functions such as adders and comparators. When configured as an adder, the carry in and carry
out allow for adder expansion. Using the cascade logic, an LUT can be expanded by cascading
with LUTs in other logic elements. The programmable selects let you choose either
combinational functions from the LUT output or registered functions from the flip-flop output.
Advantages of FPGA:
Application of FPGA:
• Medical imaging
• Reconfigurable computing
• Speech recognition
• Cryptography
• Bioinformatics
V. Memory Architecture:
Memory is classified based on the following parameters.
i. Size
ii. Timing Parameters
iii. Function
iv. Access Pattern
v. Input/Output Architecture
vi. Application
Size - Size of memory is defined in terms of bits that are equivalent to the number of individual
cells (flip-flops or registers) needed to store the data. The chip designer expresses the memory
size in bytes, Kilobytes, Megabytes, Gigabytes, or Terabytes.
Timing Parameters -Read-access time is defined as the delay between the read request and the
moment the data is available at the output. Write- access time is the time elapsed between a write
request and the final writing of the input data into the memory. Read or write cycle time is the
minimum time required by the memory for successive read or write operation.
Function - Read only memory (ROM) and read write memory (RWM) are the important types
of semiconductor memory based on functions. Data is stored either in flip-flops or as a charge on
a capacitor and the corresponding memory cells are called static and dynamic memories
respectively. RWMs are volatile memories, in which the data is lost when the supply voltage is
turned off. ROMs are nonvolatile memories, in which disconnection of the supply voltage does not
result in a loss of the stored data.
Access Pattern - Most memories belong to the random access class, which means memory
locations can be read or written in a random order; such a memory is called Random Access Memory (RAM).
Some memory types restrict the order of access, which results in faster access times, smaller
area, or a memory with a special functionality. Examples are the First-In First-Out (FIFO)
memory, the Last-In First-Out (LIFO) memory used as a stack, and the shift
register. Video memories are examples of this class.
Input/Output Architecture - Semiconductor memories are classified based on the number of
data input and output ports. Most memories have only a single port that is shared between
input and output. Memories with higher bandwidth requirements have multiple
data input and output ports and are called multiport memories. Examples of multiport memories are
the register files used in Reduced Instruction Set Computer (RISC) microprocessors.
Application - Most large size memories were packaged as standalone ICs. Memories that are instead
integrated on the same die as the logic are called embedded.
The following are the different types of memory architectures.
i. N-word Memory Architecture
ii. Array Structured Memory Architecture
iii. Hierarchical Memory Architecture
iv. Content Addressable Memory Architecture
The memory is partitioned into P smaller blocks. The composition of each of the individual
blocks is identical. A word is selected on the basis of the row and column addresses that are
broadcast to all the blocks. An extra address word called the block address, selects one of the P
blocks to be read or written.
Advantages:
• The lengths of the local word lines and bit lines are kept within bounds
• Faster access times
• The block address activates only the addressed block
• Non-active blocks are put in power-saving mode
• Large power saving.
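The addressing scheme of the block-partitioned memory can be sketched as a simple field split. The ordering of the fields (block address in the most significant bits, then row, then column) and the field widths in the example are assumptions made for illustration; the function name is hypothetical.

```python
def split_address(address, block_bits, row_bits, column_bits):
    """Split a flat address into (block, row, column) fields."""
    column = address & ((1 << column_bits) - 1)
    row = (address >> column_bits) & ((1 << row_bits) - 1)
    block = (address >> (column_bits + row_bits)) & ((1 << block_bits) - 1)
    return block, row, column

# Example: P = 4 blocks (2 bits), 256 rows (8 bits), 4 columns (2 bits)
block, row, column = split_address(0b10_10110011_01, 2, 8, 2)

# Only the addressed block is enabled; the others can stay in power-saving mode
block_enable = [int(p == block) for p in range(4)]
```

The row and column fields are broadcast to every block, while `block_enable` activates only block 2 in this example.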
Figure 3.57 shows a 512 word Content Addressable Memory (CAM) that supports three modes
of operation namely read, write and match. The read and write modes access and manipulate data
in the CAM array as in ordinary memory. The match mode is unique to associative memory.
The comparand block is filled with the data pattern to match and the mask word indicates which
bits are significant. Every row that matches the pattern is passed to the validity block. The valid
rows that match are passed to the priority encoder. If two or more rows match the pattern, the
address of the row in the CAM array is used to break the tie. The priority encoder considers all
512 match lines from the CAM array, selects the one with the highest address, and encodes it in
binary. For 512 rows in the CAM array, 9 bits are required to indicate the highest row that
matched.
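The match mode can be modelled behaviourally as below. This is a minimal sketch, assuming that a mask bit of 1 marks a significant bit and that the validity block is a simple per-row flag; the function name `cam_match` is hypothetical.

```python
def cam_match(rows, valid, comparand, mask):
    """Return the highest matching valid row address, or None.

    rows      : list of stored words (ints), one per CAM row
    valid     : list of bools, one per row (validity block)
    comparand : data pattern to match
    mask      : bits set to 1 are significant (assumed convention)
    """
    # A row matches when it agrees with the comparand on every significant bit.
    match_lines = [
        v and ((word ^ comparand) & mask) == 0
        for word, v in zip(rows, valid)
    ]
    # Priority encoder: the highest matching address wins the tie.
    for addr in range(len(rows) - 1, -1, -1):
        if match_lines[addr]:
            return addr
    return None


# Number of bits needed to encode a row address in a 512-row array.
address_bits = (512 - 1).bit_length()
```

For a 512-row array, `address_bits` evaluates to 9, in agreement with the text.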
1. Address Decoders:
Address decoders are present whenever a memory allows for random address-based access. The
decoder design has a substantial impact on the speed and power consumption of the memory.
Address decoders are classified into three types.
a. Row Decoders
b. Column and Block Decoders
c. Decoders for Non-random Access Memories
a. Row Decoders:
Row decoders are implemented as static or dynamic logic.
Static Decoder Design - Different methods are used to implement a static decoder. One method
uses the pseudo-nMOS design style. This style dissipates static power, so it is
not widely used. Another method is to split a complex gate into two or more logic layers, which
produces both a faster and a cheaper implementation. This decomposition makes it
possible to build fast and area-efficient decoders in complementary CMOS, and it is widely
used today. Segments of the address are decoded in a first logic layer called the
predecoder. A second layer of logic gates then produces the final word-line signals.
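The two-layer structure can be sketched functionally as follows. This is a behavioural model only: it ignores gate polarity (NAND/NOR) and buffering, and the 2-bit segment size and the names `one_hot` and `row_decode` are assumptions chosen for illustration.

```python
def one_hot(value, width):
    """Decode a `width`-bit value into 2**width one-hot lines."""
    return [int(i == value) for i in range(1 << width)]


def row_decode(addr, addr_bits=4, group_bits=2):
    """Two-layer decode: predecode address segments, then AND them."""
    if addr_bits % group_bits != 0:
        raise ValueError("addr_bits must be a multiple of group_bits")
    groups = addr_bits // group_bits
    seg_mask = (1 << group_bits) - 1

    # First layer (predecoder): one-hot decode each address segment.
    predecoded = [
        one_hot((addr >> (g * group_bits)) & seg_mask, group_bits)
        for g in range(groups)
    ]

    # Second layer: each word line ANDs one predecoded line per segment.
    word_lines = []
    for row in range(1 << addr_bits):
        line = 1
        for g in range(groups):
            line &= predecoded[g][(row >> (g * group_bits)) & seg_mask]
        word_lines.append(line)
    return word_lines


word_lines = row_decode(9)  # exactly one of the 16 word lines is asserted
```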
Advantages:
i. Reduces the number of transistors. (An 8-input single-stage decoder requires 4,096
transistors, whereas the predecoded version needs only 2,112 transistors, which is 52% of the
single-stage decoder.)
ii. The number of inputs to the NAND gates is halved.
iii. Because gate delay grows roughly quadratically with fan-in, halving the number of inputs
reduces the propagation delay by approximately a factor of 4.
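The transistor counts quoted above can be reproduced with a short calculation. The sketch assumes complementary static CMOS gates with 2N transistors per N-input gate, 2-bit predecoders, and no extra inverters or buffers; the function names are hypothetical.

```python
def single_stage_transistors(address_bits):
    """One N-input complementary CMOS gate (2N transistors) per word line."""
    return (1 << address_bits) * 2 * address_bits


def predecoded_transistors(address_bits, group_bits=2):
    """Predecoders on address segments plus one smaller gate per word line."""
    groups = address_bits // group_bits
    # Final layer: one `groups`-input gate per word line.
    final_layer = (1 << address_bits) * 2 * groups
    # Predecoders: per segment, 2**group_bits gates with group_bits inputs each.
    predecoders = groups * (1 << group_bits) * 2 * group_bits
    return final_layer + predecoders


single = single_stage_transistors(8)      # 256 * 16 = 4096
split = predecoded_transistors(8)         # 256 * 8 + 4 * 4 * 4 = 2112
ratio_percent = round(100 * split / single)
```

With these assumptions the ratio rounds to 52%, matching the figure in the list.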
Disadvantages:
i. The 4-input NAND gate driving the word line presents a large load.
ii. An inverter is needed as a driver for the large word-line capacitance.
iii. The output of the NAND gate should therefore be buffered.
iv. The rules of logical effort should be used to size the resulting gate chain.
Dynamic Decoder Design - NOR decoders are faster but consume more area and power than
NAND decoders. In a NAND decoder, the outputs of the array are high by default with the
exception of the selected row, which is low. This "active low" signaling corresponds to the
word-line requirements of the NAND ROM.
2. Sense Amplifiers:
Sense amplifiers play a major role in the functionality, performance and reliability of memory
circuits. They perform the following functions.
• Amplification
• Delay Reduction
• Power Reduction
• Signal Restoration
i. Differential Voltage Sensing Amplifiers:
The differential approach is directly applicable to SRAM memories. This is because SRAM
memories offer true differential output.
Figure 3.64 shows a differential sense amplifier. Amplification is accomplished with a single
stage, based on the current mirroring concept. The input signals (bit and its complement,
bit-bar) are heavily loaded and driven by the SRAM memory cell. The swing on those lines is
small because the small memory cell drives a large capacitive load. The inputs are fed to the
differential input devices M1 and M2. Transistors M3 and M4 act as an active current mirror
load. The amplifier is conditioned by the sense amplifier enable signal, SE. The inputs are
precharged and equalized to a common value while SE is low, which disables the circuit. Once
the read operation is initiated, one of the bit lines drops. SE is enabled when a sufficient
differential signal is established, and the amplifier evaluates.
The gain of the differential-to-single-ended amplifier is given by
$A_{sense} = -g_{m1}\,(r_{o2} \parallel r_{o4})$
where
$g_{m1}$ is the transconductance of the input transistors, and
$r_{o}$ is the small-signal device resistance of a transistor.
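As a quick numerical illustration of the gain expression, the sketch below evaluates it for made-up device values. The numbers (1 mS transconductance, 100 kΩ output resistances) are assumptions chosen for easy arithmetic, not data from the notes.

```python
def sense_amp_gain(gm1, ro2, ro4):
    """Small-signal gain: -gm1 times the parallel combination of ro2 and ro4."""
    r_out = (ro2 * ro4) / (ro2 + ro4)
    return -gm1 * r_out


# Hypothetical values: gm1 = 1 mS, ro2 = ro4 = 100 kOhm.
gain = sense_amp_gain(gm1=1e-3, ro2=100e3, ro4=100e3)
```

With these values the parallel resistance is 50 kΩ, so the gain is about -50; the negative sign indicates that the single-ended output is inverted with respect to the input.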
Figure 3.65 is a charge-redistribution amplifier, which is often used in small memory structures.
The idea is to exploit the imbalance between a large capacitance Clarge and a much smaller
component Csmall. The two capacitors are isolated by the pass transistor M1. The initial voltages
on nodes L and S (VL0 and VS0) are precharged to Vref - VTn and VDD, respectively, by
connecting node S to the supply voltage. Because of the voltage drop over M1, VL only
precharges to Vref - VTn. When one of the pull-down devices, M2, turns ON, node L with its
large capacitance slowly discharges. As long as VL ≥ Vref - VTn, transistor M1 is OFF, and VS
remains constant. Once VL drops below the trigger voltage (Vref - VTn), M1 turns ON. Charge
redistribution is initiated, and nodes L and S equalize. This can happen very fast due to the
small capacitance on the latter node.
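The equalization step can be illustrated with an ideal charge-conservation calculation. This is a simplified sketch: it assumes M1 behaves as an ideal switch once it turns ON and ignores the continuing discharge through M2. The capacitor and voltage values are hypothetical.

```python
def m1_conducts(v_l, v_ref, v_tn):
    """M1 turns ON once node L drops below the trigger voltage Vref - VTn."""
    return v_l < v_ref - v_tn


def equalized_voltage(c_large, v_large, c_small, v_small):
    """Common voltage of two capacitors after ideal charge sharing."""
    total_charge = c_large * v_large + c_small * v_small
    return total_charge / (c_large + c_small)


# Hypothetical values: Clarge = 1 pF at 1.0 V, Csmall = 50 fF at VDD = 2.5 V.
v_final = equalized_voltage(1e-12, 1.0, 50e-15, 2.5)
```

Here node L barely moves (from 1.0 V to about 1.07 V), while node S swings from 2.5 V down to the same value. A small change on the heavily loaded node therefore produces a large swing on the lightly loaded one.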
3. Voltage References:
The operation of a sophisticated memory requires a number of voltage references and supply
levels, including the following.
• Boosted-Word-line Voltage
• Half-VDD
• Reduced Internal Supply
• Negative Substrate
i. Voltage Down Converters:
Voltage down converters are used to create lower internal supplies, allowing the interface
circuits to operate at higher voltages. Reducing the supply is necessary to avoid breakdown in
deep-submicron devices. Level converters act as interfaces between the internal core and the
external circuits. Regulators set a stable internal voltage while accepting a broad range of
unregulated input voltages. This is useful in battery-operated systems, where the battery
voltage varies as a function of time.
The figure above shows the structure of a voltage down converter, also called a linear regulator. It is
based on the operational amplifier. The circuit uses a large pMOS output driver transistor to
drive the load memory circuit. The circuit uses negative feedback to set the output voltage VDL to
the reference voltage.
The converter must offer a voltage that is immune to variations in operating conditions. Slow
variations, such as temperature changes, can be compensated by the feedback loop. The
current drawn by the load, however, varies wildly over time.
ii. Charge Pumps:
Word-line boosting and well biasing need voltage sources that exceed the supply voltage.
A charge pump is used for this purpose, and its concept is shown in figure 3.68.
Transistors M1 and M2 are diode-connected. Initially the clock CLK is high. During this phase,
node A is at GND and node B is at VDD - VT. The charge stored in the capacitor is
Q = Cpump (VDD - VT)
During the second phase, CLK goes low, raising node A to VDD. Node B rises and shuts OFF M1.
When B is one threshold above Vload, M2 starts conducting and charge is transferred to Cload.
During consecutive clock cycles, the pump continues to deliver charge to Vload until the
maximum voltage of 2 (VDD - VT) is reached at the output. The amount of current that can be
drawn from the generator is determined by the capacitor's size and the clock frequency. The
efficiency of the generator, which measures how much current is wasted during every pump
cycle, typically ranges between 30 and 50%. Charge pumps are therefore used only for
generators that draw little current.
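The cycle-by-cycle behaviour can be approximated with an idealized charge-conservation model. The sketch assumes ideal diode-connected transistors with a fixed threshold VT, no parasitic capacitance on node B, and no load current. The function name `pump_output` and the component values are hypothetical.

```python
def pump_output(vdd, vt, c_pump, c_load, cycles, v_load=0.0):
    """Return the load voltage after each pump cycle (idealized model)."""
    history = []
    for _ in range(cycles):
        # Phase 1 (CLK high): node A at GND, node B charges to VDD - VT via M1.
        v_b = vdd - vt
        # Phase 2 (CLK low): node A rises to VDD, boosting node B by VDD.
        v_b += vdd
        # M2 conducts only while B is more than one threshold above the load.
        if v_b - vt > v_load:
            # Charge is shared until V_B = V_load + VT (charge conservation).
            v_load = (c_pump * (v_b - vt) + c_load * v_load) / (c_pump + c_load)
        history.append(v_load)
    return history


# Hypothetical values: VDD = 2.5 V, VT = 0.5 V, Cpump = 1 pF, Cload = 10 pF.
voltages = pump_output(2.5, 0.5, 1e-12, 10e-12, cycles=300)
charge_per_cycle = 1e-12 * (2.5 - 0.5)  # Q = Cpump (VDD - VT)
```

In this model the output rises a little every cycle and settles at 2 (VDD - VT), here 4 V, which is the maximum voltage stated above.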
An accurate and stable voltage reference is an important component of the voltage down
converter. The reference voltage is assumed to be relatively constant over power supply and
temperature variations.
4. Drivers/Buffers:
The length of word and bit lines increases with increasing memory sizes. Some of the resulting
performance degradation is alleviated by partitioning the memory array. Even so, a large portion
of the read and write access time is attributed to the wire delays. A major part of the
memory-periphery area is therefore allocated to the drivers, in particular the address buffers
and the I/O drivers.
To achieve maximum performance, careful timing is needed. Although the timing and control
circuitry occupies a minimal amount of area, its design is an integral and major part of the
memory design process and requires careful optimization.