Semiconductor Memory: Santanu Chattopadhyay
Santanu Chattopadhyay
Electronics and Electrical Communication Engineering
Introduction
• There are two types of memories that are used in digital systems:
Random-access memory (RAM): performs both write and read operations.
Read-only memory (ROM): performs only the read operation.
• The cost per bit of DRAM storage is three to four times lower than that of
SRAM. DRAM also has a lower power requirement.
Address multiplexing
• Address multiplexing reduces the number of pins in the IC package: the row
address and the column address are sent over the same pins, one after the other.
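The pin saving can be illustrated with a minimal Python sketch. It assumes a hypothetical memory with a 16-bit cell address that is sent as an 8-bit row address followed by an 8-bit column address; the sizes and the function names are illustrative and not taken from the slides.

```python
ADDRESS_BITS = 16                # assumed width of the full cell address
PIN_COUNT = ADDRESS_BITS // 2    # address pins needed when the halves share pins


def split_address(address):
    """Return the (row, column) halves that are sent in turn on the shared pins."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    row = address >> PIN_COUNT                    # upper half, assumed sent first
    column = address & ((1 << PIN_COUNT) - 1)     # lower half, assumed sent second
    return row, column


def join_address(row, column):
    """Rebuild the full address from the two latched halves."""
    return (row << PIN_COUNT) | column


row, column = split_address(0xA53C)
print(PIN_COUNT, hex(row), hex(column))
```

With these assumed sizes, 8 address pins do the work of 16, at the cost of sending the address in two steps.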
P1 = XOR of bits(3,5,7,9,11) = 1 ⊕ 1 ⊕ 0 ⊕ 0 ⊕ 0 = 0
P2 = XOR of bits(3,6,7,10,11) = 1 ⊕ 0 ⊕ 0 ⊕ 1 ⊕ 0 = 0
P4 = XOR of bits(5,6,7,12) = 1 ⊕ 0 ⊕ 0 ⊕ 0 = 1
P8 = XOR of bits(9,10,11,12) = 0 ⊕ 1 ⊕ 0 ⊕ 0 = 1
Hamming Code
• The data is stored in memory together with the parity bit as 12-bit composite word.
Bit position: 1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 1 1 0 0 1 0 1 0 0
• When read from memory, the parity is checked over the same combination of bits
including the parity bit.
C1 = XOR of bits(1,3,5,7,9,11)
C2 = XOR of bits(2,3,6,7,10,11)
C4 = XOR of bits(4,5,6,7,12)
C8 = XOR of bits(8,9,10,11,12)
Error-Detection
• A 0 check bit designates an even parity over the checked bits and
a 1 designates an odd parity.
• Since the bits were stored with even parity, the result,
C = C8C4C2C1 = 0000, indicates that no error has occurred.
• If C ≠ 0, then the 4-bit binary number formed by the check bits
gives the position of the erroneous bit.
Example
Bit position: 1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 1 1 0 0 1 0 1 0 0 No error
1 0 1 1 1 0 0 1 0 1 0 0 Error in bit 1
0 0 1 1 0 0 0 1 0 1 0 0 Error in bit 5
• Evaluating the XOR of the corresponding bits gives the four check bits:
C8 C4 C2 C1
For no error: 0 0 0 0
With error in bit 1: 0 0 0 1
With error in bit 5: 0 1 0 1
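The encoding and checking steps of this example can be sketched in Python. This is a minimal illustration, assuming the 8-bit data word 11000100 from the example, even parity, and check groups that include the parity bit itself; the function names are illustrative.

```python
# Bit positions covered by each check bit, parity position included.
PARITY_GROUPS = {
    1: (1, 3, 5, 7, 9, 11),
    2: (2, 3, 6, 7, 10, 11),
    4: (4, 5, 6, 7, 12),
    8: (8, 9, 10, 11, 12),
}
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)


def xor_bits(word, positions):
    """XOR together the bits of word at the given 1-based positions."""
    result = 0
    for position in positions:
        result ^= word[position]
    return result


def encode(data_bits):
    """Build the 12-bit composite word; index 0 is unused so indices match positions."""
    word = [0] * 13
    for position, bit in zip(DATA_POSITIONS, data_bits):
        word[position] = bit
    for parity_position, group in PARITY_GROUPS.items():
        # The parity slot still holds 0 here, so this is the XOR of the data bits.
        word[parity_position] = xor_bits(word, group)
    return word


def syndrome(word):
    """Return C = C8 C4 C2 C1 as an integer; 0 means no error was detected."""
    return sum(weight for weight, group in PARITY_GROUPS.items() if xor_bits(word, group))


def correct(word):
    """Flip the bit named by the syndrome, assuming at most one bit is wrong."""
    fixed = list(word)
    c = syndrome(fixed)
    if 1 <= c <= 12:
        fixed[c] ^= 1
    return fixed


stored = encode([1, 1, 0, 0, 0, 1, 0, 0])
damaged = list(stored)
damaged[5] ^= 1                      # inject an error in bit 5
print(stored[1:], syndrome(stored), syndrome(damaged))
```

For the damaged word the syndrome evaluates to 5, the position of the flipped bit, and `correct` restores the stored word.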
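Single-Error correction, Double-Error detection
• An additional parity bit P13 is stored with the 12-bit word so that all 13 bits have
even parity. On reading, P is the XOR of all 13 bits: if P = 0, the parity is correct
(even parity), but if P = 1, the parity over the 13 bits is incorrect (odd parity).
• The following four cases can occur:
If C = 0 and P = 0, no error occurred.
If C ≠ 0 and P = 1, a single error occurred that can be corrected.
If C ≠ 0 and P = 0, a double error occurred that is detected but cannot be corrected.
If C = 0 and P = 1, an error occurred in the P13 bit.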
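The decision based on the check bits C and the overall parity P can be sketched in Python. This is a minimal illustration that assumes a thirteenth bit (called P13 here) stored so that all 13 bits have even parity, and the 12-bit word 001110010100 from the earlier example; the names are illustrative.

```python
# Check groups over the 12-bit Hamming word, parity positions included.
GROUPS = {
    1: (1, 3, 5, 7, 9, 11),
    2: (2, 3, 6, 7, 10, 11),
    4: (4, 5, 6, 7, 12),
    8: (8, 9, 10, 11, 12),
}


def check(word13):
    """Return (C, P) for a word whose bits sit at indices 1..13 (index 0 unused)."""
    c = 0
    for weight, positions in GROUPS.items():
        parity = 0
        for position in positions:
            parity ^= word13[position]
        if parity:
            c += weight
    p = 0
    for bit in word13[1:14]:
        p ^= bit                      # overall parity across all 13 bits
    return c, p


def classify(c, p):
    """Map the (C, P) pair to one of the four cases."""
    if c == 0 and p == 0:
        return "no error"
    if c != 0 and p == 1:
        return "single error, correctable"
    if c != 0 and p == 0:
        return "double error, detected only"
    return "error in P13"


def flipped(word13, *positions):
    """Return a copy of the word with the given bit positions inverted."""
    copy = list(word13)
    for position in positions:
        copy[position] ^= 1
    return copy


# 12-bit word from the example plus P13 = 1 (five 1s in the word, so P13 makes it even).
stored = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
for word in (stored, flipped(stored, 5), flipped(stored, 1, 5), flipped(stored, 13)):
    print(classify(*check(word)))
```

With two errors the syndrome is still nonzero but no longer points at either bad bit, which is why that case is reported as detected only.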
[Figure: ROM internal connections; an X marks a connection]
Combinational circuit implementation
• The internal operation of a ROM can be interpreted in two ways: first, as a
memory unit that contains a fixed pattern of stored words; second, as a unit that
implements a combinational circuit.
• The previous figure may be considered a combinational circuit with eight
outputs, each being a function of the five input variables.
In Table, output A7
Example
• Design a combinational circuit using a ROM. The circuit accepts a 3-bit number and
generates an output binary number equal to the square of the input number.
Derive truth table first
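The truth table for this example can be generated with a short Python sketch. The input labels A2 A1 A0 and output labels B5 to B0 are assumed names, and six output bits are used because the largest square, 7 × 7 = 49, fits in six bits.

```python
def build_square_rom():
    """Return the stored word for each 3-bit address: the square of the address."""
    return [n * n for n in range(8)]


rom = build_square_rom()

print("A2 A1 A0 | B5 B4 B3 B2 B1 B0 | decimal")
for address, word in enumerate(rom):
    inputs = " ".join(format(address, "03b"))   # one column per input bit
    outputs = " ".join(format(word, "06b"))     # one column per output bit
    print(f" {inputs}   |  {outputs}    | {word}")

# Two outputs need no ROM columns at all: B0 always equals A0 and B1 is always 0.
b0_equals_a0 = all((word & 1) == (address & 1) for address, word in enumerate(rom))
b1_always_zero = all(((word >> 1) & 1) == 0 for word in rom)
print(b0_equals_a0, b1_always_zero)
```

Because both checks hold, only the four outputs B5 to B2 have to be stored, so under these assumptions an 8 × 4 ROM is sufficient.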
Types of ROMs
• The required paths in a ROM may be programmed in four
different ways.
1. Mask programming: done by the manufacturer during the fabrication process.
2. Programmable read-only memory or PROM: each path is programmed by blowing a
fuse or leaving it intact.
3. Erasable PROM or EPROM: placing the device under a special ultraviolet light
for a given period of time erases the pattern in the ROM.
4. Electrically erasable PROM (EEPROM): erased with an electrical
signal instead of ultraviolet light.
Combinational PLDs
• A combinational PLD is an integrated circuit with programmable
gates divided into an AND array and an OR array to provide an
AND-OR sum-of-products implementation.
• PROM: fixed AND array constructed as a decoder and
programmable OR array.
• PAL: programmable AND array and fixed OR array.
• PLA: both the AND and OR arrays can be programmed.
Programmable Logic Array
• The decoder in PROM example can be replaced by an array of
AND gates that can be programmed to generate any product
term of the input variables.
• The product terms are then connected to OR gates to provide the
sum of products for the required Boolean functions.
• The output is inverted when the XOR input is connected to 1
(since x⊕1 = x’). The output does not change when the XOR input is
connected to 0 (since x⊕0 = x).
PLA
F1 = AB’+AC+A’BC’
F2 = (AC+BC)’
Programming Table
1. First: list the product terms numerically.
2. Second: specify the required paths between the inputs
and the AND gates.
3. Third: specify the paths between the AND and OR gates.
4. For each output variable, we may have a T (true) or
C (complement) for programming the XOR gate.
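A programming table of this kind can be written down as data and evaluated in Python. The sketch below uses the functions F1 = AB’ + AC + A’BC’ and F2 = (AC + BC)’ shown for the PLA; the numbering of the product terms and the dictionary layout are assumptions made for illustration.

```python
# AND array: for each product term, 1 = true input, 0 = complemented input,
# and a missing variable means no connection.
PRODUCT_TERMS = {
    1: {"A": 1, "B": 0},            # AB'
    2: {"A": 1, "C": 1},            # AC
    3: {"B": 1, "C": 1},            # BC
    4: {"A": 0, "B": 1, "C": 0},    # A'BC'
}

# OR array plus the T/C choice that programs each output XOR gate.
OUTPUTS = {
    "F1": {"terms": (1, 2, 4), "mode": "T"},
    "F2": {"terms": (2, 3), "mode": "C"},
}


def evaluate(inputs):
    """Evaluate both PLA outputs for a dict such as {"A": 1, "B": 0, "C": 0}."""
    and_plane = {
        number: all(inputs[name] == literal for name, literal in term.items())
        for number, term in PRODUCT_TERMS.items()
    }
    result = {}
    for output, spec in OUTPUTS.items():
        or_value = any(and_plane[number] for number in spec["terms"])
        invert = spec["mode"] == "C"        # XOR input tied to 1 inverts the output
        result[output] = int(or_value ^ invert)
    return result


print(evaluate({"A": 1, "B": 0, "C": 0}))
print(evaluate({"A": 1, "B": 1, "C": 1}))
```

Product term 2 (AC) feeds both OR gates, which is the kind of sharing a PLA allows.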
Simplification of PLA
• Careful investigation must be undertaken in order to
reduce the number of distinct product terms, since a PLA has a
finite number of AND gates.
• Both the true and complement of each function should
be simplified to see which one can be expressed with
fewer product terms and which one provides product
terms that are common to other functions.
Example
Implement the following two Boolean functions with a PLA:
F1(A, B, C) = ∑(0, 1, 2, 4)
F2(A, B, C) = ∑(0, 5, 6, 7)
The two functions are simplified in the maps shown, once by grouping the 1 elements
(true form) and once by grouping the 0 elements (complement form).
PLA table by simplifying the function
F1 = (AB + AC + BC)’
F2 = AB + AC + A’B’C’
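The simplified expressions can be checked against the original minterm lists with a few lines of Python. This is only a verification sketch; A is taken as the most significant bit of the minterm number.

```python
from itertools import product


def minterms(function):
    """List the minterm numbers (A is the most significant bit) where function is 1."""
    return [
        index
        for index, (a, b, c) in enumerate(product((0, 1), repeat=3))
        if function(a, b, c)
    ]


def f1(a, b, c):
    # F1 = (AB + AC + BC)'
    return not ((a and b) or (a and c) or (b and c))


def f2(a, b, c):
    # F2 = AB + AC + A'B'C'
    return bool((a and b) or (a and c) or (not a and not b and not c))


print(minterms(f1))   # [0, 1, 2, 4]
print(minterms(f2))   # [0, 5, 6, 7]
```

Written this way, the two outputs need only four distinct product terms (AB, AC, BC and A’B’C’), with AB and AC shared between them.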
PLA implementation
Programmable Array Logic
• The PAL is a programmable logic device with a fixed OR array and a programmable
AND array.
PAL
• When designing with a PAL, the Boolean functions must be
simplified to fit into each section.
• Unlike the PLA, a product term cannot be shared among two
or more OR gates. Therefore, each function can be simplified
by itself without regard to common product terms.
• The output terminals are sometimes driven by three-state
buffers or inverters.
Example
w(A, B, C, D) = ∑(2, 12, 13)
x(A, B, C, D) = ∑(7, 8, 9, 10, 11, 12, 13, 14, 15)
y(A, B, C, D) = ∑(0, 2, 3, 4, 5, 6, 7, 8, 10, 11, 15)
z(A, B, C, D) = ∑(1, 2, 8, 12, 13)
w = ABC’ + A’B’CD’
x = A + BCD
y = A’B + CD + B’D’
z = ABC’ + A’B’CD’ + AC’D’ + A’B’C’D = w + AC’D’ + A’B’C’D
PAL Table
• z has four product terms, but two of them together equal w. Replacing them with w
reduces the number of terms for z from four to three.
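The PAL table for this example can be modelled in Python to confirm that each output still matches its minterm list when z reuses w. The sketch assumes sections that are three product terms wide and that w is fed back as an input; the data layout is illustrative.

```python
SECTION_WIDTH = 3   # assumed: three AND gates feed each fixed OR gate

# One section per output; each product term maps a signal name to the
# required value (1 = true, 0 = complemented). "w" is the fed-back output.
PAL = {
    "w": [{"A": 1, "B": 1, "C": 0}, {"A": 0, "B": 0, "C": 1, "D": 0}],
    "x": [{"A": 1}, {"B": 1, "C": 1, "D": 1}],
    "y": [{"A": 0, "B": 1}, {"C": 1, "D": 1}, {"B": 0, "D": 0}],
    "z": [{"w": 1}, {"A": 1, "C": 0, "D": 0}, {"A": 0, "B": 0, "C": 0, "D": 1}],
}


def evaluate(a, b, c, d):
    """Return all signal values; w is computed first so that z can use it."""
    signals = {"A": a, "B": b, "C": c, "D": d}
    for name in ("w", "x", "y", "z"):
        terms = PAL[name]
        if len(terms) > SECTION_WIDTH:
            raise ValueError(f"{name} does not fit in one section")
        signals[name] = int(
            any(all(signals[s] == v for s, v in term.items()) for term in terms)
        )
    return signals


def minterms(name):
    """List the minterm numbers (A is the most significant bit) where name is 1."""
    return [
        i for i in range(16)
        if evaluate((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)[name]
    ]


for name in ("w", "x", "y", "z"):
    print(name, minterms(name))
```

Without the feedback of w, z would need four product terms and would not fit in a three-wide section.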
PAL implementation
Fuse map for example
Field Programmable Gate Array (FPGA)
Evolution of implementation technologies
• Logic gates (1950s-60s)
• Regular structures for two-level logic (1960s-70s)
– muxes and decoders, PLAs
• Programmable sum-of-products arrays (1970s-80s)
– PLDs, complex PLDs
• Programmable gate arrays (1980s-90s)
– densities high enough to permit entirely new
classes of application, e.g., prototyping, emulation,
acceleration
• Overall trend: toward higher levels of integration
Gate Array Technology (IBM - 1970s)
• Simple logic gates
– combine transistors to
implement combinational
and sequential logic
• Interconnect
– wires to connect inputs and
outputs to logic blocks
• I/O blocks
– special blocks at periphery
for external connections
• Add wires to make connections
– done when chip is fabbed
• “mask-programmable”
– construct any circuit
Field-Programmable Gate Arrays
• Logic blocks
– to implement combinational
and sequential logic
• Interconnect
– wires to connect inputs and
outputs to logic blocks
• I/O blocks
– special logic blocks at periphery
of device for external connections
• Key questions:
– how to make logic blocks programmable?
– how to connect the wires?
– both must be done after the chip has been fabbed
CPLD vs. FPGA
• A CPLD has a somewhat restrictive structure consisting of one or more programmable
sum-of-products logic arrays feeding a relatively small number of clocked registers.
• This results in less flexibility, with the advantage of more predictable timing delays and a higher
logic-to-interconnect ratio.
• The FPGA architectures are dominated by interconnect. This makes them far more flexible (in
terms of the range of designs that are practical for implementation within them) but also far
more complex to design for.
• In practice, the distinction between FPGAs and CPLDs is often one of size as FPGAs are usually
much larger in terms of resources than CPLDs.
• Typically, only FPGAs contain more advanced embedded functions such as adders, multipliers,
memory, and other hardened functions.
• Another common distinction is that CPLDs contain embedded flash to store their
configuration, while FPGAs usually, but not always, require external flash memory.
Programmability of FPGAs
• User programmability of CPLDs and FPGAs is achieved
via user-programmable switch technologies.
• For CPLDs, floating-gate transistors are used, as in EPROM
or EEPROM. On the other hand, FPGAs normally use
SRAM (static RAM) or antifuse technology.
• Properties of the switches, such as, size, on-resistance, and
capacitance dictate trade-offs in architecture.
• In SRAM based FPGAs, there is an SRAM bit
corresponding to each of the programmable points within
the device.
• When the device is powered-on or reset, it reads a configuration
program from an off-chip memory and loads it into on-chip SRAM.
• The configuration program defines the logic function realized by
individual logic blocks and interconnections.
• Devices using SRAM based switching can be reprogrammed easily
by just changing the configuration program.
• FPGAs from Xilinx, Plessey, Algotronix, Concurrent Logic,
Toshiba, etc. are SRAM based.
• SRAM provides fast reprogrammability at the cost of large area (at
least five transistors per cell and one for the switch).
SRAM controlled switching
Antifuse based programming
• Antifuses are originally open circuits, offering very high resistance. However, on
programming (applying 11-20 V across the terminals), the resistance becomes very low,
thus establishing an electrical connection.
• Antifuses can be made very small using modified CMOS technology, thus offering very
high device density, compared to SRAM.
• However, once programmed, they cannot be reused. Thus, the device is one-time
programmable.
• The structure is commonly known as PLICE (Programmable Low-Impedance Circuit
Element).
• PLICE uses Poly-Si and n+ diffusion as conductors and ONO (silicon diOxide - silicon
Nitride- silicon diOxide) as an insulator.
• The advantages include small size (little more than the cross-section of two metal wires)
and low series resistance.
• Its disadvantages include the large size of the programming transistors, the need for
isolation transistors, and one-time programmability.
• FPGAs from Actel, Quicklogic, Crosspoint, etc. support antifuses.
Antifuse structure
Floating gate
• FPGA devices from Altera, Plus Logic, AMD, etc. use floating gate programming
technology.
• While Altera and Plus Logic use ultraviolet-erasable EPROM, AMD uses
electrically erasable EEPROM.
• The programmable transistor contains a control gate and a floating gate. The transistor
can be disabled by applying a high voltage between the control gate and the drain. This
injects charge onto the floating gate, increasing the threshold voltage of the transistor
and thereby disabling it.
• Charge can be removed by exposing the floating gate to ultraviolet light or by erasing
it electrically. This provides reprogrammability and, unlike SRAM, no external memory is
needed to program the chip on power-up.
• However, EPROM technology requires additional processing steps and suffers from high ON
resistance and high static power consumption due to the pull-up resistor.
Floating gate programming
Comparison between programming techniques
FPGA Logic Blocks
• There are wide variations in the logic block structure of
FPGAs available from different vendors.
• They vary in number of inputs and outputs, amount of
area consumed, complexity of logic functions that they
can realize, total number of transistors needed, and so
on.
• The logic blocks can broadly be classified into the
following two categories – Fine Grain, Coarse Grain
Fine Grain Logic Block
• The block contains a few transistors that can be interconnected via
programming.
• Crosspoint FPGA uses a single transistor pair for each Boolean variable in
the logic block.
f = ab + c’
Coarse Grain Block – XC4000 from Xilinx
Coarse Grain Block – ACT1 from Actel
Trade-offs
• A large logic block can implement more logic within a single block, so fewer
logic blocks are required to realize a given functionality on the FPGA.
• On the other hand, a large logic block consumes more area on the FPGA.
• A 4-input look-up table gives the best result in terms of logic synthesized and area
consumed.
• A higher granularity level results in a smaller delay between system input and
output.
• As the granularity level increases, the average fanout increases, and the number of
switches also increases because each block has more pins.
• Also, the length of the wires increases with the size of the logic block.
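The look-up table mentioned above can be pictured as a small memory addressed by the block inputs. The Python sketch below is an illustration, not a vendor's implementation: it assumes a k-input LUT that stores one configuration bit per input combination (2^k bits), and it loads the function f = ab + c’ from the fine-grain example into a 4-input LUT whose fourth input is unused.

```python
class LookUpTable:
    """A k-input LUT modelled as a list of 2**k configuration bits."""

    def __init__(self, k, function):
        self.k = k
        # "Programming" the LUT: evaluate the function once per input combination.
        self.bits = [
            int(bool(function(*self._inputs(index)))) for index in range(1 << k)
        ]

    def _inputs(self, index):
        """Unpack a table index into k input bits, first input most significant."""
        return [(index >> (self.k - 1 - j)) & 1 for j in range(self.k)]

    def read(self, *inputs):
        """Look up the output: the inputs simply form the address of the stored bit."""
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.bits[index]


# f = ab + c'; the fourth input d is ignored by this particular function.
lut = LookUpTable(4, lambda a, b, c, d: (a and b) or not c)
print(len(lut.bits), lut.read(1, 1, 1, 0), lut.read(0, 0, 1, 0))
```

Each extra input doubles the number of stored bits, which is one way to see why larger logic blocks cost more area.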
FPGA Design Flow
Modern FPGAs
• In addition to the basic blocks (such as, logic blocks, I/O blocks and interconnects), modern
FPGAs have additional units that make the design process simpler and more efficient.
• The two major system components that are difficult to implement in FPGAs are embedded
memories and blocks for arithmetic calculations.
• Amongst the various calculations, multiplication is the most widely used one. Most of the
modern FPGAs contain embedded logic blocks for multiplication and memories to hold data.
DSP functionalities are highly facilitated by the availability of these.
• In many applications, FPGAs need to communicate with microprocessors. This has motivated
many FPGA vendors to embed soft processor cores within FPGAs. This reduces the latency of
communication between the microprocessor and the FPGA.
Xilinx Virtex-6 and Virtex-7 FPGA
• Each CLB of a Virtex-6 FPGA can be configured as one 6-input LUT or two 5-input LUTs.
• The LUT can also be used as a 64-bit RAM or two 32-bit RAMs.
• Apart from this, every Virtex- 6 FPGA has 156-1064 (depending upon the subfamily) dual port
block RAMs, each storing 36 Kbits.
• They also possess many dedicated, full-custom, low-power DSP slices. Each slice contains a
25 × 18-bit two’s complement multiplier and a 48-bit accumulator.
• Each Virtex-6 device has a 17-channel, 10-bit ADC and 8-72 Gbps transceiver.
• The next advanced version, Virtex-7 is a 3D IC with many improved features.
• The peak transceiver speed varies between 12.5-28.05 Gbps with 36-96 transceivers.
• It can perform 2756-5314 giga multiply-accumulates per second (GMACS) and contains 46.5-85 Mb
of block RAM, a PCI Express bus interface, and up to 1200 I/O pins.