DRAM Circuit Design: A Tutorial
ISBN:0780360141
This book instructs readers on the nuances of DRAM design, making it accessible for both
novice and practicing engineers and covering particular information necessary for working
with both the analog and digital circuits present in DRAM chips.
Table of Contents
Preface
Acknowledgments
Glossary
Index
List of Figures
List of Tables
DRAM Circuit Design—A Tutorial
Brent Keeth
Micron Technology, Inc.
Boise, Idaho
R. Jacob Baker
Boise State University
Micron Technology, Inc.
Boise, Idaho
IEEE Solid-State Circuits Society, Sponsor
IEEE PRESS
ISBN: 0780360141
IEEE Order No. PC5863
Library of Congress Cataloging-in-Publication Data
Keeth, Brent, 1960–
DRAM circuit design: a tutorial / Brent Keeth, R. Jacob Baker.
p. cm.
“IEEE Solid-State Circuits Society, sponsor.”
Includes bibliographical references and index.
ISBN 0-7803-6014-1
1. Semiconductor storage devices—Design and construction. I. Baker, R. Jacob, 1964–. II. Title.
For
Susi, John, Katie, Julie, Kyri, Josh,
and the zoo.
About the Authors
Brent Keeth was born in Ogden, Utah, on March 30, 1960. He received
the B.S. and M.S. degrees in electrical engineering from the University of
Idaho, Moscow, in 1982 and 1996, respectively.
Mr. Keeth joined Texas Instruments in 1982, spending the next two years
designing hybrid integrated circuits for avionics control systems and a
variety of military radar sub-systems. From 1984 to 1987, he worked for
General Instruments Corporation designing baseband scrambling and
descrambling equipment for the CATV industry.
Thereafter, he spent 1987 through 1992 with the Grass Valley Group (a
subsidiary of Tektronix) designing professional broadcast, production, and
post-production video equipment. Joining Micron Technology in 1992, he
has engaged in the research and development of various CMOS DRAMs
including 4Mbit, 16Mbit, 64Mbit, 128Mbit, and 256Mbit devices. As a
Principal Fellow at Micron, his present research interests include
high-speed bus protocols and open standard memory design.
In 1995 and 1996, Brent served on the Technical Program Committee for
the Symposium on VLSI Circuits. In addition, he served on the Memory
Subcommittee of the U.S. Program Committee for the 1996 and 1999
IEEE International Solid-State Circuits Conferences. Mr. Keeth holds over
60 U.S. and foreign patents.
R. Jacob Baker (S’83, M’88, SM’97) was born in Ogden, Utah, on October
5, 1964. He received the B.S. and M.S. degrees in electrical engineering
from the University of Nevada, Las Vegas, and the Ph.D. degree in
electrical engineering from the University of Nevada, Reno.
From 1981 to 1987, Dr. Baker served in the United States Marine Corps
Reserves. From 1985 to 1993, he worked for E.G.&G. Energy
Measurements and the Lawrence Livermore National Laboratory
designing nuclear diagnostic instrumentation for underground nuclear
weapons tests at the Nevada test site. During this time, he designed over
30 electronic and electro-optic instruments, including high-speed (750
Mb/s) fiber-optic receiver/transmitters, PLLs, frame- and bit-syncs, data
converters, streak-camera sweep circuits, micro-channel plate gating
circuits, and analog oscilloscope electronics. From 1993 to 2000, he was a
faculty member in the Department of Electrical Engineering at the
University of Idaho. In 2000, he joined a new electrical engineering
program at Boise State University as an associate professor. Also, since
1993, he has consulted for various companies, including the Lawrence
Berkeley Laboratory, Micron Technology, Micron Display, Amkor Wafer
Fabrication Services, Tower Semiconductor, Rendition, and the Tower
ASIC Design Center.
Holding 12 patents in integrated circuit design, Dr. Baker is a member of
Eta Kappa Nu and is a coauthor (with H. Li and D. Boyce) of a popular
textbook covering CMOS analog and digital circuit design entitled, CMOS:
Circuit Design, Layout, and Simulation (IEEE Press, 1998). His research
interests focus mainly on CMOS mixed-signal integrated circuit design.
Preface
From the core memory that rocketed into space during the Apollo moon missions to the
solid-state memories used in today’s commonplace computer, memory technology has
played an important, albeit quiet, role during the last century. It has been quiet in the
sense that memory, although necessary, is not glamorous and sexy, and is instead being
relegated to the role of a commodity. Yet, it is important because memory technology,
specifically, CMOS DRAM technology, has been one of the greatest driving forces in the
advancement of solid-state technology. It remains a driving force today, despite the
segmenting that is beginning to appear in its market space.
The very nature of the commodity memory market, with high product volumes and low
pricing, is what ultimately drives the technology. To survive, let alone remain viable over
the long term, memory manufacturers must work aggressively to drive down their
manufacturing costs while maintaining, if not increasing, their share of the market. One of
the best tools to achieve this goal remains the ability for manufacturers to shrink their
technology, essentially getting more memory chips per wafer through process scaling.
Unfortunately, with all memory manufacturers pursuing the same goals, it is literally a
race to see who can get there first. As a result, there is tremendous pressure to advance
the state of the art—more so than in other related technologies due to the commodity
status of memory.
While the memory industry continues to drive forward, most people can relax and enjoy
the benefits—except for those of you who need to join in the fray. For you, the only way
out is straight ahead, and it is for you that we have written this book.
The goal of DRAM Circuit Design: A Tutorial is to bridge the gap between the introduction
to memory design available in most CMOS circuit texts and the advanced articles on
DRAM design that are available in technical journals and symposium digests. The book
introduces the reader to DRAM theory, history, and circuits in a systematic, tutorial
fashion. The level of detail varies, depending on the topic. In most cases, however, our
aim is merely to introduce the reader to a functional element and illustrate it with one or
more circuits. After gaining familiarity with the purpose and basic operation of a given
circuit, the reader should be able to tackle more detailed papers on the subject. We have
included a thorough list of papers in the Appendix for readers interested in taking that
next step.
The book begins in Chapter 1 with a brief history of DRAM device evolution from the first
1Kbit device to the more recent 64Mbit synchronous devices. This chapter introduces the
reader to basic DRAM operation in order to lay a foundation for more detailed discussion
later. Chapter 2 investigates the DRAM memory array in detail, including fundamental
array circuits needed to access the array. The discussion moves into array architecture
issues in Chapter 3, including a design example comparing known architecture types to a
novel, stacked digitline architecture. This design example should prove useful, for it
delves into important architectural trade-offs and exposes underlying issues in memory
design. Chapter 4 then explores peripheral circuits that support the memory array,
including column decoders and redundancy. The reader should find Chapter 5 very
interesting due to the breadth of circuit types discussed. This includes data path elements,
address path elements, and synchronization circuits. Chapter 6 follows with a discussion
of voltage converters commonly found on DRAM designs. The list of converters includes
voltage regulators, voltage references, VDD/2 generators, and voltage pumps. We wrap
up the book with the Appendix, which directs the reader to a detailed list of papers from
major conferences and journals.
Brent Keeth
R. Jacob Baker
Acknowledgments
We acknowledge with thanks the pioneering work accomplished over the past 30 years
by various engineers, manufacturers, and institutions that have laid the foundation for this
book. Memory design is no different than any other field of endeavor in which new
knowledge is built on prior knowledge. We therefore extend our gratitude to past, present,
and future contributors to this field. We also thank Micron Technology, Inc., for the high level of support that we received for this work. Specifically, we thank the many individuals
at Micron who contributed in various ways to its completion, including Mary Miller, who
gave significant time and energy to build and edit the manuscript, and Jan Bissey and
crew, who provided the wonderful assortment of SEM photographs used throughout the
text.
Brent Keeth
R. Jacob Baker
Chapter 1: An Introduction to DRAM
Dynamic random access memory (DRAM) integrated circuits (ICs) have existed for more
than twenty-five years. DRAMs evolved from the earliest 1-kilobit (Kb) generation to the
recent 1-gigabit (Gb) generation through advances in both semiconductor process and
circuit design technology. Tremendous advances in process technology have
dramatically reduced feature size, permitting ever higher levels of integration. These
increases in integration have been accompanied by major improvements in component
yield to ensure that overall process solutions remain cost-effective and competitive.
Technology improvements, however, are not limited to semiconductor processing. Many
of the advances in process technology have been accompanied or enabled by advances
in circuit design technology. In most cases, advances in one have enabled advances in
the other. In this chapter, we introduce some fundamentals of the DRAM IC, assuming
that the reader has a basic background in complementary metal-oxide semiconductor
(CMOS) circuit design, layout, and simulation [1].
To gain insight into how modern DRAM chips are designed, it is useful to look into the
evolution of DRAM. In this section, we offer an overview of DRAM types and modes of
operation.
The dynamic nature of DRAM requires that the memory be refreshed periodically so as
not to lose the contents of the memory cells. Later we will discuss the mechanisms that
lead to the dynamic operation of the memory cell. At this point, we discuss how memory
Refresh is accomplished for the 1k DRAM.
Refreshing a DRAM is accomplished internally: external data to the DRAM need not be
applied. To refresh the DRAM, we periodically access the memory with every possible
row address combination. A timing diagram for a Refresh cycle is shown in Figure 1.7.
With the CE input pulled HIGH, the address is changed, while the R/W input is used as a
strobe or clock signal. Internally, the data is read out and then written back into the same
location at full voltage; thus, logic levels are restored (or refreshed).
If we want to read out the contents of the cell, we begin by first precharging the Read
columnline to a known voltage and then driving the Read rowline HIGH. Driving the Read
rowline HIGH turns M3 ON and allows M2 either to pull the Read columnline LOW or to
not change the precharged voltage of the Read columnline. (If M2’s gate is a logic LOW,
then M2 will be OFF, having no effect on the state of the Read columnline.) The main
drawback of using the 3-transistor DRAM cell, and the reason it is no longer used, is that
it requires two rowlines and two columnlines (separate Read and Write lines) and a large layout area. Modern 1-transistor,
1-capacitor DRAM cells use a single rowline, a single columnline, and considerably less
area.
In the next section, we cover the operation of this cell in detail. Here we introduce the
operation of the cell. Data is written to the cell by driving the rowline (a.k.a., wordline)
HIGH, turning ON the MOSFET, and allowing the columnline (a.k.a., digitline or bitline) to
charge or discharge the storage capacitor. After looking at this circuit for a moment, we
can make the following observations:
1. The wordline (rowline) may be fabricated using polysilicon (poly). This allows
the MOSFET to be formed by crossing the poly wordline over an n+ active
area.
2. To write a full VCC logic voltage (where VCC is the maximum positive power supply voltage) to the storage capacitor, the rowline must be driven to a voltage greater than VCC plus the n-channel MOSFET threshold voltage (with body effect). This voltage, greater than VCC + VTH, is often labeled VCC pumped (VCCP).
3. The bitline (columnline) may be made using metal or polysilicon. The main
concern, as we’ll show in a moment, is to reduce the parasitic capacitance
associated with the bitline.
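As a rough numeric illustration of point 2 (the values here are assumptions chosen for illustration, not figures from the text), with VCC = 3.3 V and an n-channel threshold of about 1 V once body effect is included, the pumped rowline voltage must satisfy

    VCCP > VCC + VTH ≈ 3.3 V + 1.0 V = 4.3 V

which is why DRAMs include the voltage pump circuits discussed with the other voltage converters in Chapter 6.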
Consider the row of N dynamic memory elements shown in Figure 1.13. Typically, in a
modern DRAM, N is 512, which is also the number of bitlines. When a row address is
strobed into the DRAM, via the address input pins using the falling edge of RAS, the
address is decoded to drive a wordline (rowline) to VCCP. This turns ON an entire row in a
DRAM memory array. Turning ON an entire row in a DRAM memory array allows the
information stored on the capacitors to be sensed (for a Read) via the bitlines or allows
the charging or discharging, via the bitlines, of the storage capacitors (for a Write).
Opening a row of data by driving a wordline HIGH is a very important concept for
understanding the modes of DRAM operation. For Refresh, we only need to supply row
addresses during a Refresh operation. For page Reads—when a row is open—a large
amount of data, which is set by the number of columns in the DRAM array, can be
accessed by simply changing the column address.
Figure 1.13: Row of N dynamic memory elements.
We’re now in a position to answer the questions: “Why are we limited in the number of columnlines (or bitlines) used in a memory array?” and “Why do we need to break up the memory into smaller memory arrays?” The answer to these questions
comes from the realization that the more bitlines we use in an array, the longer the delay
through the wordline (see Figure 1.13).
If we drive the wordline on the left side of Figure 1.13 HIGH, the signal will take a finite
time to reach the end of the wordline (the wordline on the right side of Figure 1.13). This is
due to the distributed resistance/capacitance structure formed by the resistance of the
polysilicon wordline and the capacitance of the MOSFET gates. The delay limits the
speed of DRAM operation. To be precise, it limits how quickly a row can be opened and
closed. To reduce this RC time, a polycide wordline is formed by adding a silicide, for
example, a mixture of a refractory metal such as tungsten with polysilicon, on top of
polysilicon. Using a polycide wordline will have the effect of reducing the wordline
resistance. Also, additional drivers can be placed at different locations along the wordline,
or the wordline can be stitched at various locations with metal.
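To make the distributed RC argument concrete, the short Python sketch below computes an Elmore-delay estimate for a wordline modeled as a chain of identical RC segments. The per-cell resistance and capacitance are illustrative values loosely based on the example process parameters listed in Chapter 3 (6 Ω/square sheet resistance, roughly two squares of polysilicon and about 0.6 fF of gate load per cell); treat the numbers as a sketch, not as data from the text.

    # Elmore-delay estimate for a wordline modeled as N identical RC segments.
    def elmore_delay(n_cells, r_per_cell, c_per_cell):
        """Elmore delay (seconds) of a uniform RC ladder driven from one end."""
        # Far-end delay of a uniform ladder: R * C * N * (N + 1) / 2.
        return r_per_cell * c_per_cell * n_cells * (n_cells + 1) / 2

    R_CELL = 12.0      # ohms of wordline resistance per cell (assumed: ~2 squares at 6 ohms/sq)
    C_CELL = 0.6e-15   # farads of gate loading per cell (assumed)

    for n in (128, 256, 512, 1024):
        print(f"{n:5d} cells: ~{elmore_delay(n, R_CELL, C_CELL) * 1e9:.2f} ns")

    # Lowering the resistance (for example, with a polycide wordline) or driving and
    # stitching the line at additional points shortens this delay, which is the
    # motivation described above.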
The limit on the number of wordlines can be understood by realizing that adding more wordlines to the array adds more parasitic capacitance to the bitlines.
This parasitic capacitance becomes important when sensing the value of data charge
stored in the memory element. We’ll discuss this in more detail in the next section.
Our 4-bit word comes from the group-of-four memory arrays (one bit from each memory
array). We can define a page of data in the DRAM by realizing that when we open a row
in each of the four memory arrays, we are accessing 2k bits of data (512 bits/array × 4 arrays). By simply changing the column address without changing the row address (and thus without opening another group-of-four wordlines), we can access the 2k “page” of data. With a
little imagination, we can see different possibilities for the addressing. For example, we
could open 8 group-of-four memory arrays with a row address and thus increase the page
size to 16k, or we could use more than one bit at a time from an array to increase word
size.
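The page-size arithmetic above is easy to verify; the following few lines of Python simply restate the numbers used in this example.

    # One open wordline exposes 512 bits in each array; four arrays are opened together.
    bits_per_array_row = 512
    arrays_per_group = 4

    page_bits = bits_per_array_row * arrays_per_group
    print(page_bits)                                   # 2048 bits, the "2k" page above

    # Opening 8 groups-of-four with one row address raises the page to 16k bits.
    print(page_bits * 8)                               # 16384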
From the last section, we know that we can open a row in one or more DRAM arrays
concurrently, allowing a page of data to be written to or read from the DRAM. In this
section, we look at the different modes of operation possible for accessing this data via
the column address decoder. Our goal in this section is not to present all possible modes
of DRAM operation but rather to discuss the modes that have been used in
second-generation DRAMs. These modes are page mode, nibble mode, static column
mode, fast page mode, and extended data out.
Figure 1.14 shows the timing diagram for a page mode Read, Write, and
Read-Modify-Write. We can understand this timing diagram by first noticing that when
RAS goes LOW, we clock in a row address, decode the row address, and then drive a
wordline in one or more memory arrays to VCCP. The result is an open row(s) of data
sitting on the digitlines (columnlines). Only one row can be opened in any single array at a
time. Prior to opening a row, the bitlines are precharged to a known voltage. (Precharging
to VCC/2 is typically performed using internal circuitry.) Also notice at this time that data
out, DOUT, is in a Hi-Z state; that is, the DRAM is not driving the bus line connected to the
DOUT pin.
For the SDRAM Write, we change the syntax of the descriptions of what’s happening in
the part. However, the fundamental operation of the DRAM circuitry is the same as that of
the second-generation DRAMs. We can list these syntax changes as follows:
1. The memory is segmented into banks. For the 64Mb memory of Figure 1.17
and Figure 1.18, each bank has a size of 16Mb (organized as 4,096 row
addresses [12 bits] × 256 column addresses [8 bits] × 16 bits [16 DQ I/O pins];
a quick arithmetic check appears after this list). As discussed earlier, this is
nothing more than a simple logic design of the address decoder (although in
most practical situations, the banks are also laid out so that they are physically
in the same area). The bank selected is determined by the addresses BA0 and BA1.
2. In second-generation DRAMs, we said, “We open a row,” as discussed earlier.
In SDRAM, we now say, “We activate a row in a bank.” We do this by issuing
an active command to the part. Issuing an active command is accomplished
on the rising edge of CLK with a row/bank address applied to the part with CS
and RAS LOW, while CAS and WE are held HIGH.
3. In second-generation DRAMs, we said, “We write to a location given by a
column address,” by driving CAS LOW with the column address applied to the
part and then applying data to the part. In an SDRAM, we write to the part by
issuing the Write command to the part. Issuing a Write command is
accomplished on the rising edge of CLK with a column/bank address applied
to the part: CS, CAS, and WE are held LOW, and RAS is held HIGH.
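As a quick check of the bank-organization arithmetic mentioned in item 1, the short Python fragment below multiplies out the address space; the figures are simply those quoted above for the 64Mb part.

    # 64Mb SDRAM example: four banks, each 16Mb.
    rows_per_bank = 2 ** 12     # 4,096 row addresses (12 bits)
    cols_per_bank = 2 ** 8      # 256 column addresses (8 bits)
    dq_width = 16               # 16 DQ I/O pins

    bank_bits = rows_per_bank * cols_per_bank * dq_width
    print(bank_bits)            # 16,777,216 bits = 16 Mb per bank
    print(bank_bits * 4)        # 67,108,864 bits = 64 Mb total; BA0/BA1 select the bank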
Table 1.1 shows the commands used in an SDRAM. In addition, this table shows how
inputs/outputs (DQs) can be masked using the DQ mask (DQM) inputs. This feature is
useful when the DRAM is used in graphics applications.
Table 1.1: SDRAM commands. (See the notes following the table.)

Command                                        CS    RAS   CAS   WE    DQM   ADDR
Command inhibit (NOP)                          H     X     X     X     X     X
No operation (NOP)                             L     H     H     H     X     X
Burst terminate                                L     H     H     L     X     X
PRECHARGE (deactivate row in bank or banks)    L     L     H     L     X     Code
Auto-Refresh or Self-Refresh
  (enter Self-Refresh mode)                    L     L     L     H     X     X
Write enable/output enable                     —     —     —     —     L     —
Write inhibit/output Hi-Z                      —     —     —     —     H     —
Notes
1. CKE is HIGH for all commands shown except for Self-Refresh.
2. A0–A11 define the op-code written to the mode register.
3. A0–A11 provide the row address, and BA0, BA1 determine which bank is made active.
4. A0–A9 (x4), A0–A8 (x8), or A0–A7 (x16) provide the column address; A10 HIGH enables the auto PRECHARGE feature (nonpersistent), while A10 LOW disables the auto PRECHARGE feature; BA0, BA1 determine which bank is written to.
5. A10 LOW: BA0, BA1 determine the bank being precharged. A10 HIGH: all banks are precharged, and BA0, BA1 are “don’t care.”
6. This command is Auto-Refresh if CKE is HIGH and Self-Refresh if CKE is LOW.
7. The internal Refresh counter controls row addressing; all inputs and I/Os are “don’t care” except for CKE.
8. Activates or deactivates the DQs during Writes (zero-clock delay) and Reads (two-clock delay).
SDRAMs often employ pipelining in the address and data paths to increase operating
speed. Pipelining is an effective tool in SDRAM design because it helps disconnect
operating frequency and access latency. Without pipelining, a DRAM can only process
one access instruction at a time. Essentially, the address is held valid internally until data
is fetched from the array and presented to the output buffers. This single instruction mode
of operation ties operating frequency and access time (or latency) together. However,
with pipelining, additional access instructions can be fed into the SDRAM before prior
access instructions have completed, which permits access instructions to be entered at a
higher rate than would otherwise be allowed. Hence, pipelining increases operating
speed.
Pipeline stages in the data path can also be helpful when synchronizing output data to the
system clock. CAS latency refers to a parameter used by the SDRAM to synchronize the
output data from a Read request with a particular edge of the system clock. A typical
Read for an SDRAM with CAS latency set to three is shown in Figure 1.19. SDRAMs
must be capable of reliably functioning over a range of operating frequencies while
maintaining a specified CAS latency. This is often accomplished by configuring the
pipeline stage to register the output data to a specific clock edge, as determined by the
CAS latency parameter.
Figure 1.19: SDRAM with a latency of three.
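The way pipelining separates latency from throughput can be illustrated with a toy software model. This is not the book’s circuit; it is a minimal sketch that assumes a CAS latency of three and simply shifts each Read through three register stages, so data for a Read issued on clock edge n appears on edge n + 3 while a new Read can still be issued on every edge.

    # Toy model of an SDRAM read data path with CAS latency = 3.
    CAS_LATENCY = 3

    def run(commands):
        pipeline = [None] * CAS_LATENCY        # one entry per pipeline stage
        for cycle, cmd in enumerate(commands):
            data_out = pipeline.pop()          # stage that reaches the DQ pins this edge
            pipeline.insert(0, cmd)            # a newly issued Read enters stage 0
            if data_out is not None:
                print(f"cycle {cycle}: data for {data_out} appears on DQ")

    # Issue a Read to a new column address on every clock edge, then idle to flush.
    run([f"READ col {c}" for c in range(6)] + [None] * CAS_LATENCY)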
At this point, we should understand the basics of SDRAM operation, but we may be
asking, “Why are SDRAMs potentially faster than second-generation DRAMs such as
EDO or FPM?” The answer to this question comes from the realization that it’s possible to
activate a row in one bank and then, while the row is opening, perform an operation in
some other bank (such as reading or writing). In addition, one of the banks can be in a
PRECHARGE mode (the bitlines are driven to VCC/2) while accessing one of the other
banks and, thus, in effect hiding PRECHARGE and allowing data to be continuously
written to or read from the SDRAM. (Of course, this depends on which application and
memory address locations are used.) We use a mode register, as shown in Figure 1.20,
to put the SDRAM into specific modes of operation for programmable operation, including
pipelining and burst Reads/Writes of data [2].
Figure 1.20: Mode register.
where C is the capacitance value in farads. Conversely, storing a logic zero in the cell requires a capacitor with a voltage of −VCC/2 across it. Note that the stored charge on the mbit capacitor for a logic zero is

    Q = −(VCC/2) · C

The charge is negative with respect to the VCC/2 common node voltage in this state.
Various leakage paths cause the stored capacitor charge to slowly deplete. To return the
stored charge and thereby maintain the stored data state, the cell must be refreshed. The
required refreshing operation is what makes DRAM memory dynamic rather than static.
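To get a feel for the magnitudes involved, assume (purely for illustration) a 30 fF storage capacitor and VCC = 3.3 V. The stored charge is then

    Q = ±(VCC/2) · C = ±(1.65 V)(30 fF) ≈ ±50 fC

or roughly three hundred thousand electrons, which is why the small leakage currents mentioned above are enough to corrupt the stored state between Refresh operations.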
Figure 1.21: 1T1C DRAM memory cell. (Note the rotation of the rowline and columnline.)
The digitline referred to earlier consists of a conductive line connected to a multitude of
mbit transistors. The conductive line is generally constructed from either metal or
silicide/polycide polysilicon. Because of the quantity of mbits connected to the digitline
and its physical length and proximity to other features, the digitline is highly capacitive.
For instance, a typical value for digitline capacitance on a 0.35 μm process might be
300fF. Digitline capacitance is an important parameter because it dictates many other
aspects of the design. We discuss this further in Section 2.1. For now, we continue
describing basic DRAM operation.
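Because the digitline capacitance is roughly ten times the cell capacitance, the signal available for sensing is small. A standard charge-sharing estimate, using the 300 fF digitline figure above and assuming a 30 fF cell and VCC = 3.3 V purely for illustration, gives

    Vsignal = (VCC/2) · CC / (CC + CD) = (1.65 V)(30 fF) / (30 fF + 300 fF) ≈ 150 mV

which is why digitline capacitance dictates so many other aspects of the design, as treated in Section 2.1.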
The mbit transistor gate terminal is connected to a wordline (rowline). The wordline, which
is connected to a multitude of mbits, is actually formed of the same polysilicon as that of
the transistor gate. The wordline is physically orthogonal to the digitline. A memory array
is formed by tiling a selected quantity of mbits together such that mbits along a given
digitline do not share a common wordline and mbits along a common wordline do not
share a common digitline. Examples of this are shown in Figures 1.22 and 1.23. In these
layouts, mbits are paired to share a common contact to the digitline, which reduces the
array size by eliminating duplication.
A note is in order here regarding the word size stored in or read out of the memory array.
We may have 512 active bitlines when a single rowline in an array goes HIGH (keeping in
mind once again that only one wordline in an array can go HIGH at any given time). This
literally means that we could have a word size of 512 bits from the active array. The
inherent wide word size has led to the push, at the time of this writing, to embed DRAM with a processor (for example, a graphics or data processor). The wide word size and the fact
that the word doesn’t have to be transmitted off-chip can result in lower-power,
higher-speed systems. (Because the memory and processor don’t need to communicate
off-chip, there is no need for power-hungry, high-speed buffers.)
1.2.4 Open/Folded DRAM Array Architectures
Throughout the book, we make a distinction between the open array architecture as
shown in Figures 1.22 and 1.24 and the folded DRAM array used in modern DRAMs and
seen in Figure 1.31. At the cost of increased layout area, folded arrays increase noise
immunity by moving sense amp inputs next to each other. These sense amp inputs come
directly from the DRAM array. The term folded comes from taking the DRAM arrays seen
in Figure 1.24 and folding them together to form the topology seen in Figure 1.31.
REFERENCES
[1] R. J. Baker, H. W. Li, and D. E. Boyce, CMOS: Circuit Design, Layout, and Simulation. Piscataway, NJ: IEEE Press, 1998.
It is easier to explain the 8F2 designation with the aid of Figure 2.3. An imaginary box
drawn around the mbit defines the cell’s outer boundary. Along the x-axis, this box
includes one-half digitline contact feature, one wordline feature, one capacitor feature,
one field poly feature, and one-half poly space feature for a total of four features. Along the y-axis, this box contains two one-half field oxide features and one active area feature for a total of two features. The area of the mbit is therefore

    4F × 2F = 8F2
Ideally, a twisting scheme equalizes the coupling terms from each digitline to all other
digitlines, both true and complement. If implemented properly, the noise terms cancel or
at least produce only common-mode noise to which the differential sense amplifier is
more immune.
Each digitline twist region consumes valuable silicon area. Thus, design engineers resort
to the simplest and most efficient twisting scheme to get the job done. Because the
coupling between adjacent metal lines is inversely proportional to the line spacing, the
signal-to-noise problem gets increasingly worse as DRAMs scale to smaller and smaller
dimensions. Hence, the industry trend is toward use of more complex twisting schemes
on succeeding generations [6] [7].
An alternative to the folded array architecture, popular prior to the 64kbit generation [1],
was the open digitline architecture. Seen schematically in Figure 2.6, this architecture
also features the sense amplifier circuits between two sets of arrays [8]. Unlike the folded
array, however, true and complement digitlines (D and D*) connected to each sense
amplifier pair come from separate arrays [9]. This arrangement precludes using digitline
twisting to improve signal-to-noise performance, which is the prevalent reason why the
industry switched to folded arrays. Note that unlike the folded array architecture, each
wordline in an open digitline architecture connects to mbit transistors on every digitline,
creating crosspoint-style arrays.
Figure 2.7: Open digitline array layout. (Feature size (F) is equal to one-half digitline pitch.)
Furthermore, the capacitor does not add stack height to the design, greatly simplifying
contact technology. The disadvantage of trench capacitor technology is the difficulty
associated with reliably building capacitors in deep silicon holes and connecting the
trench capacitor to the transistor drain terminal.
2.2.6 Configurations
Figure 2.19 shows a sense amplifier block commonly used in double- or triple-metal
designs. It features two Psense-amplifiers outside the isolation transistors, a pair of
EQ/bias (EQb) devices, a single Nsense-amplifier, and a single I/O transistor for each
digitline. Because only half of the sense amplifiers for each array are on one side, this
design is quarter pitch, as are the designs in Figures 2.20 and 2.21. Placement of the
Psense-amplifiers outside the isolation devices is necessary because a full one level (VCC)
cannot pass through unless the gate terminal of the ISO transistors is driven above VCC.
EQ/bias transistors are placed outside of the ISO devices to permit continued
equilibration of digitlines in arrays that are isolated. The I/O transistor gate terminals are
connected to a common CSEL signal for four adjacent digitlines. Each of the four I/O
transistors is tied to a separate I/O bus. This sense amplifier, though simple to implement,
is somewhat larger than other designs due to the presence of two Psense-amplifiers.
Figure 2.20: Complex sense amplifier block.
2.2.7 Operation
A set of signal waveforms is illustrated in Figure 2.24 for the sense amplifier of Figure
2.19. These waveforms depict a Read-Modify-Write cycle (Late Write) in which the cell
data is first read out and then new data is written back. In this example, a one level is read
out of the cell, as indicated by D* rising above D during cell access. A one level is always
+VCC/2 in the mbit cell, regardless of whether it is connected to a true or complement
digitline. The correlation between mbit cell data and the data appearing at the DRAM’s
data terminal (DQ) is a function of the data topology and the presence of data scrambling.
Data or topo scrambling is implemented at the circuit level: it ensures that the mbit data
state and the DQ logic level are in agreement. An mbit one level (+VCC/2) corresponds to
a logic one at the DQ, and an mbit zero level (−VCC/2) corresponds to a logic zero at the
DQ terminal.
Row decode circuits are similar to sense amplifier circuits in that they pitch up to mbit
arrays and have a variety of implementations. A row decode block is comprised of two
basic elements: a wordline driver and an address decoder tree. There are three basic
configurations for wordline driver circuits: the NOR driver, the inverter (CMOS) driver, and
the bootstrap driver. In addition, the drivers and associated decode trees can be
configured either as local row decodes for each array section or as global row decodes
that drive a multitude of array sections.
Global row decodes connect to multiple arrays through metal wordline straps. The straps
are stitched to the polysilicon wordlines at specific intervals dictated by the polysilicon
resistance and the desired RC wordline time constant. Most processes that strap
wordlines with metal do not silicide the polysilicon, although doing so would reduce the
number of stitch regions required. Strapping wordlines and using global row decoders
obviously reduces die size [22], very dramatically in some cases. The disadvantage of
strapping is that it requires an additional metal layer at minimum array pitch. This puts a tremendous burden on process technologists because three conductors are then at minimum pitch: wordlines, digitlines, and wordline straps.
Local row decoders, on the other hand, require additional die size rather than metal
straps. It is highly advantageous to reduce the polysilicon resistance in order to stretch
the wordline length and lower the number of row decodes needed. This is commonly
achieved with silicided polysilicon processes. On large DRAMs, such as the 1Gb, the
area penalty can be prohibitive, making low-resistance wordlines all the more necessary.
With the wordline driver circuits behind us, we can turn our attention to the address
decoder tree. There is no big secret to address decoding in the row decoder network. Just
about any type of logic suffices: static, dynamic, pass gate, or a combination thereof. With
any type of logic, however, the primary objectives in decoder design are to maximize
speed and minimize die area. Because a great variety of methods have been used to
implement row address decoder trees, it is next to impossible to cover them all. Instead,
we will give an insight into the possibilities by discussing a few of them.
Regardless of the type of logic with which a row decoder is implemented, the layout must
completely reside beneath the row address signal lines to constitute an efficient,
minimized design. In other words, the metal address tracks dictate the die area available
for the decoder. Any additional tracks necessary to complete the design constitute wasted
silicon. For DRAM designs requiring global row decode schemes, the penalty for
inefficient design may be insignificant; however, for distributed local row decode schemes,
the die area penalty may be significant. As with mbits and sense amplifiers, time spent
optimizing row decode circuits is time well spent.
2.3.7 Predecoding
The row address lines shown as RA1–RA3 can be either true and complement or
predecoded. Predecoded address lines are formed by logically combining (AND)
addresses as shown in Table 2.1.
Table 2.1: Predecoded address truth table.

Address inputs   Decimal value   Predecoded outputs <0:3>
0 0              0               1 0 0 0
1 0              1               0 1 0 0
0 1              2               0 0 1 0
1 1              3               0 0 0 1
The advantages of using predecoded addresses include lower power (fewer signals
make transitions during address changes) and higher efficiency (only three transistors are
necessary to decode six addresses for the circuit of Figure 2.31). Predecoding is
especially beneficial in redundancy circuits. In fact, predecoded addresses are used
throughout most DRAM designs today.
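A minimal logical sketch of predecoding is shown below in Python; the function and signal names are invented for illustration and are not the circuit of Figure 2.31, which realizes the same logic with transistors.

    # Predecode a pair of address bits into a one-hot group of four lines.
    def predecode_pair(a_low, a_high):
        """Return four one-hot predecoded lines for two address bits."""
        value = (a_high << 1) | a_low
        return [int(value == i) for i in range(4)]

    # Example: address bits (0, 1) activate predecoded line 2, matching Table 2.1.
    print(predecode_pair(0, 1))          # [0, 0, 1, 0]

    # A decode tree then ANDs one line from each predecoded group; a wordline
    # driver fires only when all of its group inputs are HIGH.
    pr_a = predecode_pair(0, 1)
    pr_b = predecode_pair(1, 1)
    print(bool(pr_a[2] and pr_b[3]))     # True only for this particular address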
We have briefly examined the basic elements required in DRAM row decoder blocks.
Numerous variations are possible. No single design is best for all applications. As with sense amplifiers, the design depends on technology, performance, and cost trade-offs.
REFERENCES
[1] K. Itoh, “Trends in Megabit DRAM Circuit Design,” IEEE Journal of Solid-State Circuits, vol. 25, pp. 778–791, June 1990.
[5] R. Kraus and K. Hoffmann, “Optimized Sensing Scheme of DRAMs,” IEEE Journal of Solid-State Circuits, vol. 24, pp. 895–899, August 1989.
[6] T. Yoshihara, H. Hidaka, Y. Matsuda, and K. Fujishima, “A Twisted Bitline Technique for Multi-Mb DRAMs,” 1988 IEEE ISSCC Digest of Technical Papers, pp. 238–239.
[17] N. C.-C. Lu and H. H. Chao, “Half-VDD Bit-Line Sensing Scheme in CMOS DRAMs,” IEEE Journal of Solid-State Circuits, vol. SC-19, p. 451, August 1984.
[21] T. Ooishi, K. Hamade, M. Asakura, K. Yasuda, H. Hidaka, H. Miyamoto, and H. Ozaki, “An Automatic Temperature Compensation of Internal Sense Ground for Sub-Quarter Micron DRAMs,” 1994 Symposium on VLSI Circuits Digest of Technical Papers, pp. 77–78.
[22] K. Noda, T. Saeki, A. Tsujimoto, T. Murotani, and K. Koyama, “A Boosted Dual Word-line Decoding Scheme for 256 Mb DRAMs,” 1992 Symposium on VLSI Circuits Digest of Technical Papers, pp. 112–113.
Chapter 3: Array Architectures
This chapter presents a detailed description of the two most prevalent array architectures under consideration for future large-scale DRAMs: the aforementioned open digitline and folded digitline architectures.
Parameter                                   Value
Digitline width (WDL)                       0.3 μm
Digitline pitch (PDL)                       0.6 μm
Wordline width (WWL)                        0.3 μm
Wordline pitch for 8F2 mbit (PWL8)          0.6 μm
Wordline pitch for 6F2 mbit (PWL6)          0.9 μm
Cell capacitance (CC)                       30 fF
Digitline capacitance per mbit (CDM)        0.8 fF
Wordline capacitance per 8F2 mbit (CW8)     0.6 fF
Wordline capacitance per 6F2 mbit (CW6)     0.5 fF
Wordline sheet resistance (RS)              6 Ω/sq
Array core size, as measured in the number of mbits, is restricted by two factors: a desire
to keep the quantity of mbits a power of two and the practical limits on wordline and
digitline length. The need for a binary quantity of mbits in each array core derives from the
binary nature of DRAM addressing. Given N row address bits and M column address bits for a given part, there are 2^(N+M) addressable mbits. Address decoding is greatly simplified
within a DRAM if array address boundaries are derived directly from address bits.
Because addressing is binary, the boundaries naturally become binary. Therefore, the
size of each array core must necessarily have 2^X addressable rows and 2^Y addressable digitlines. The resulting array core size is 2^(X+Y) mbits, which is, of course, a binary number.
The second set of factors limiting array core size involves practical limits on digitline and
wordline length. From earlier discussions in Section 2.1, the digitline capacitance is
limited by two factors. First, the ratio of cell capacitance to digitline capacitance must fall
within a specified range to ensure reliable sensing. Second, operating current and power
for the DRAM is in large part determined by the current required to charge and discharge
the digitlines during each active cycle. Power considerations restrict digitline length for the
256-Mbit generation to approximately 128 mbit pairs (256 rows), with each mbit
connection adding capacitance to the digitline. The power dissipated during a Read or
Refresh operation is proportional to the digitline capacitance (CD), the supply voltage
(VCC), the number of active columns (N), and the Refresh period (P). Accordingly, the
power dissipated is given as
On a 256-Mbit DRAM in 8k (rows) Refresh, there are 32,768 (2^15) active columns during
each Read, Write, or Refresh operation. The active array current and power dissipation
for a 256-Mbit DRAM appear in Table 3.2 for a 90 ns Refresh period (−5 timing) at
various digitline lengths. The budget for the active array current is limited to 200mA for
this 256-Mbit design. To meet this budget, the digitline cannot exceed a length of 256
mbits.
Table 3.2: Active current and power versus digitline length.
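A rough estimate of these numbers can be sketched in a few lines of Python. The model below assumes that each active digitline swings VCC/2 from its precharge level once per cycle; the per-mbit digitline capacitance is taken from the process parameters listed earlier, while the supply voltage is an assumed value, so the results are illustrative estimates rather than the exact entries of Table 3.2.

    # Rough active-current estimate versus digitline length (open digitline case).
    C_PER_MBIT = 0.8e-15   # digitline capacitance per attached mbit (parameter list above)
    VCC = 3.3              # assumed supply voltage, volts (illustrative)
    N_COLUMNS = 2 ** 15    # 32,768 active columns in 8k Refresh
    T_CYCLE = 90e-9        # Refresh period, seconds (-5 timing)

    for mbits_per_digitline in (128, 256, 512):
        c_digitline = mbits_per_digitline * C_PER_MBIT       # farads per digitline
        q_per_cycle = N_COLUMNS * c_digitline * (VCC / 2)    # charge moved each cycle
        i_active = q_per_cycle / T_CYCLE                     # average current, amps
        print(f"{mbits_per_digitline:4d} mbits: I = {i_active * 1e3:6.1f} mA, "
              f"P = {i_active * VCC * 1e3:7.1f} mW")

    # With these assumed values, only digitline lengths of 256 mbits or less stay
    # under the 200 mA active-current budget quoted in the text.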
where PDL is the digitline pitch, WWL is the wordline width, and CW8 is the wordline
capacitance in an 8F2 mbit cell.
Table 3.3 contains the effective wordline time constants for various wordline lengths. As
shown in the table, the wordline length cannot exceed 512 mbits (512 digitlines) if the
wordline time constant is to remain under 4 nanoseconds.
Table 3.3: Wordline time constant versus wordline length.
Accordingly,
where TR is the number of local row decoders, HLDEC is the height of each decoder, TDL is
the number of digitlines including redundant and dummy lines, and PDL is the digitline
pitch. Similarly, the width of the 32-Mbit block is found by summing the total width of the
sense amplifier blocks with the product of the wordline pitch and the number of wordlines.
This bit of math yields

    Width of the 32-Mbit block = (TSA × WAMP) + (TWL × PWL6)
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy lines, and
PWL6 is the wordline pitch for the 6F2 mbit.
Table 3.4 contains calculation results for the 32-Mbit block shown in Figure 3.2. Although
overall size is the best measure of architectural efficiency, a second popular metric is
array efficiency. Array efficiency is determined by dividing the area consumed by
functionally addressable mbits by the total die area. To simplify the analysis in this book,
peripheral circuits are ignored in the array efficiency calculation. Rather, the calculation
considers only the 32-Mbit memory block, ignoring all other factors. With this
simplification, the array efficiency for a 32-Mbit block is given as
where 2^25 is the number of addressable mbits in each 32-Mbit block. The open digitline
architecture yields a calculated array efficiency of 51.7%.
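Array efficiency, as defined above, is straightforward to compute once the block dimensions are known. The small helper below expresses the calculation; the mbit area assumes the 6F2 cell with F = 0.3 μm from the parameter list, and the block height and width are placeholder values (the real numbers come from the Table 3.4 size calculations), so the printed result is only illustrative.

    # Array efficiency = (area of addressable mbits) / (total block area).
    def array_efficiency(n_mbits, cell_area_um2, block_height_um, block_width_um):
        return n_mbits * cell_area_um2 / (block_height_um * block_width_um)

    F = 0.3                            # feature size, microns
    cell_area_6f2 = 6 * F * F          # 6F2 mbit area = 0.54 square microns
    n_mbits = 2 ** 25                  # addressable mbits per 32-Mbit block

    # Placeholder block dimensions (microns), chosen only to show the calculation;
    # with the dimensions from Table 3.4 this works out to the 51.7% quoted above.
    print(array_efficiency(n_mbits, cell_area_6f2, 5600.0, 6500.0))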
Table 3.4: Open digitline (local row decode)—32-Mbit size calculations.
Unfortunately, the ideal open digitline architecture presented in Figure 3.2 is difficult to
realize in practice. The difficulty stems from an interdependency between the memory
array and sense amplifier layouts in which each array digitline must connect to one sense
amplifier and each sense amplifier must connect to two array digitlines.
This interdependency, which exists for all array architectures, becomes problematic for
the open digitline architecture. The two digitlines, which connect to a sense amplifier,
must come from two separate memory arrays. As a result, sense amplifier blocks must
always be placed between memory arrays for open digitline array architectures [3], unlike
the depiction in Figure 3.2.
Two layout approaches may be used to achieve this goal. First, design the sense
amplifiers so that the sense amplifier block contains a set of sense amplifiers for each
digitline in the array. This single-pitch solution, shown in Figure 3.3, eliminates the need
for sense amplifiers on both sides of an array core because all of the digitlines connect to
a single sense amplifier block. Not only does this solution eliminate the edge problem, but
it also reduces the 32-Mbit block size. There are now only eight sense amplifier strips
instead of the seventeen of Figure 3.2. Unfortunately, it is nearly impossible to lay out
sense amplifiers in this fashion [4]. Even a single-metal sense amplifier layout,
considered the tightest layout in the industry, achieves only one sense amplifier for every
two digitlines (double-pitch).
This approach solves the array edge problem. However, by producing a larger 32-Mbit
memory block, array efficiency is reduced. Dummy arrays solve the array edge problem
inherent in open digitline architecture but require sense amplifier layouts that are on the
edge of impossible. The problem of sense amplifier layout is all the more difficult because global column select lines must also be routed through the sense amplifier block. For all intents and purposes, therefore,
the sense amplifier layout cannot be completed without the presence of an additional
conductor, such as a third metal, or without time multiplexed sensing. Thus, for the open
digitline architecture to be successful, an additional metal must be added to the DRAM
process.
With the presence of Metal3, the sense amplifier layout and either a full or hierarchical
global row decoding scheme is made possible. A full global row decoding scheme using
wordline stitching places great demands on metal and contact/via technologies; however,
it represents the most efficient use of the additional metal. Hierarchical row decoding
using bootstrap wordline drivers is slightly less efficient. Wordlines no longer need to be
strapped with metal on pitch, and, thus, process requirements are relaxed significantly [5].
For a balanced perspective, both global and hierarchical approaches are analyzed. The
results of this analysis for the open digitline architecture are summarized in Tables 3.5
and 3.6. Array efficiency for global and hierarchical row decoding calculates to 60.5% and 55.9%, respectively, for the 32-Mbit memory blocks based on data from these tables.
Table 3.5: Open digitline (dummy arrays and global row decode)—32-Mbit size
calculations.
Table 3.6: Open digitline (dummy arrays and hierarchical row decode)—32-Mbit
size calculations.
where PDL is the digitline pitch and CW8 is the wordline capacitance in an 8F2 mbit cell. As
shown in Table 3.7, the wordline length cannot exceed 512 mbits (1,024 digitlines) for the
wordline time constant to remain under 4 nanoseconds. Although the wordline connects
to only 512 mbits, it is two times longer (1,024 digitlines) than wordlines in open digitline
array cores. The folded digitline architecture therefore requires half as many row decode
blocks or wordline stitching regions as the open digitline architecture.
Table 3.7: Wordline time constant versus wordline length (folded).
Accordingly,
where TR is the number of row decoders, HRDEC is the height of each decoder, TDL is the
number of digitlines including redundant and dummy, and PDL is the digitline pitch.
Similarly,
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy, PWL8 is
the wordline pitch for the 8F2 mbit, TTWIST is the total number of twist regions, and WTWIST
is the width of the twist regions.
Table 3.8 shows the calculated results for the 32-Mbit block shown in Figure 3.6. In this
table, a double-metal process is used, which requires local row decoder blocks. Note that
Table 3.8 for the folded digitline architecture contains approximately twice as many
wordlines as does Table 3.5 for the open digitline architecture. The reason for this is that
each wordline in the folded array only connects to mbits on alternating digitlines, whereas
each wordline in the open array connects to mbits on every digitline. A folded digitline
design therefore needs twice as many wordlines as a comparable open digitline design.
Table 3.8: Folded digitline (local row decode)—32-Mbit size calculations.
Array efficiency for the 32-Mbit memory block from Figure 3.6 is again found by dividing
the area consumed by functionally addressable mbits by the total die area. For the
simplified analysis presented in this book, the peripheral circuits are ignored. Array
efficiency for the 32-Mbit block is therefore given as
Bilevel Digitline Architecture
This section introduces a novel, advanced architecture possible for use on future
large-scale DRAMs. First, we discuss technical objectives for the proposed architecture.
Second, we develop and describe the concept for an advanced array architecture
capable of meeting these objectives. Third, we conceptually construct a 32-Mbit memory
block with this new architecture for a 256-Mbit DRAM. Finally, we compare the results
achieved with the new architecture to those obtained for the open digitline and folded
digitline architectures from Section 3.1.
An underlying goal of the new architecture is to reduce overall die size beyond that
obtainable from either the folded or open digitline architectures. A second yet equally
important goal is to achieve signal-to-noise performance that meets or approaches that of
the folded digitline architecture.
A bilevel digitline architecture resulted from 256-Mbit DRAM research and design
activities carried out at Micron Technology, Inc., in Boise, Idaho. This bilevel digitline
architecture is an innovation that evolved from a comparative analysis of open and folded
digitline architectures. The analysis served as a design catalyst, ultimately leading to the
creation of a new DRAM array configuration—one that allows the use of 6F2 mbits in an
otherwise folded digitline array configuration. These memory cells are a by-product of
crosspoint-style (open digitline) array blocks. Crosspoint-style array blocks require that
every wordline connect to mbit transistors on every digitline, precluding the formation of
digitline pairs. Yet, digitline pairs (columns) remain an essential element in folded
digitline-type operation. Digitline pairs and digitline twisting are important features that
provide for good signal-to-noise performance.
The bilevel digitline architecture solves the crosspoint and digitline pair dilemma through
vertical integration. In vertical integration, essentially, two open digitline crosspoint array
sections are placed side by side, as seen in Figure 3.7. Digitlines in one array section are
designated as true digitlines, while digitlines from the second array section are
designated as complement digitlines. An additional conductor is added to the DRAM
process to complete the formation of the digitline pairs. The added conductor allows
digitlines from each array section to route across the other array section with both true
and complement digitlines vertically aligned. At the juncture between each section, the
true and complement signals are vertically twisted. With this twisting, the true digitline
connects to mbits in one array section, and the complement digitline connects to mbits in
the other array section. This twisting concept is illustrated in Figure 3.8.
Figure 3.7: Development of bilevel digitline architecture.
In the bilevel and folded digitline architectures, both true and complement digitlines exist
in the same array core. Accordingly, the sense amplifier block needs only one sense
amplifier for every two digitline pairs. For the folded digitline architecture, this yields one
sense amplifier for every four Metal 1 digitlines—quarter pitch. The bilevel digitline
architecture that uses vertical digitline stacking needs one sense amplifier for every two
Metal1 digitlines—half pitch. Sense amplifier layout is therefore more difficult for bilevel
than for folded designs. The three-metal DRAM process needed for bilevel architectures
concurrently enables and simplifies sense amplifier layout. Metal1 is used for lower level
digitlines and local routing within the sense amplifiers and row decodes. Metal2 is
available for upper level digitlines and column select signal routing through the sense
amplifiers. Metal3 can therefore be used for column select routing across the arrays and
for control and power routing through the sense amplifiers. The function of Metal2 and
Metal3 can easily be swapped in the sense amplifier block depending on layout
preferences and design objectives.
Wordline pitch is effectively relaxed for the plaid 6F2 mbit of the bilevel digitline
architecture. The mbit is still built using the minimum process feature size of 0.3 μm. The
relaxed wordline pitch stems from structural differences between a folded digitline mbit
and an open digitline or plaid mbit. There are essentially four wordlines running across
each folded digitline mbit pair compared to two wordlines running across each open
digitline or plaid mbit pair. Although the plaid mbit is 25% shorter than a folded mbit (three
versus four features), it also has half as many wordlines, effectively reducing the wordline
pitch. This relaxed wordline pitch makes layout of the wordline drivers and the address
decode tree much easier. In fact, both odd and even wordlines can be driven from the
same row decoder block, thus eliminating half of the row decoder strips in a given array
block. This is an important distinction, as the tight wordline pitch for folded digitline
designs necessitates separate odd and even row decode strips.
Sense amplifier blocks are placed on both sides of each array core. The sense amplifiers
within each block are laid out at half pitch: one sense amplifier for every two Metal 1
digitlines. Each sense amplifier connects through isolation devices to columns (digitline
pairs) from two adjacent array cores. Similar to the folded digitline architecture, odd
columns connect on one side of the array core, and even columns connect on the other
side. Each sense amplifier block is then exclusively connected to either odd or even
columns, never to both.
Unlike a folded digitline architecture that uses a local row decode block connected to both
sides of an array core, the bilevel digitline architecture uses a local row decode block
connected to only one side of each core. As stated earlier, both odd and even rows can
be driven from the same local row decoder block with the relaxed wordline pitch. Because
of this feature, the bilevel digitline architecture is more efficient than alternative
architectures. A four-metal DRAM process allows local row decodes to be replaced by
either stitch regions or local wordline drivers. Either approach could substantially reduce
die size. The array core also includes the three twist regions necessary for the bilevel
digitline architecture. The twist region is larger than that used in the folded digitline
architecture, owing to the complexity of twisting digitlines vertically. The twist regions
again constitute a break in the array structure, making it necessary to include dummy
wordlines.
As with the open digitline and folded digitline architectures, the bilevel digitline length is
limited by power dissipation and a minimum cell-to-digitline capacitance ratio. In the
256-Mbit generation, the digitlines are again restricted from connecting to more than 256
mbits (128 mbit pairs). The analysis to arrive at this quantity is the same as that for the
open digitline architecture, except that the overall digitline capacitance is higher. The
bilevel digitline runs over twice as many cells as the open digitline with the digitline
running in equal lengths in both Metal2 and Metal1. The capacitance added by the Metal2
component is small compared to the already present Metal1 component because Metal2
does not connect to mbit transistors. Overall, the digitline capacitance increases by about
25% compared to an open digitline. The power dissipated during a Read or Refresh
operation is proportional to the digitline capacitance (CD), the supply (internal) voltage
(VCC), the external voltage (VCCX), the number of active columns (N), and the Refresh
period (P). It is given as
On a 256-Mbit DRAM in 8k Refresh, there are 32,768 (2^15) active columns during each
Read, Write, or Refresh operation. Active array current and power dissipation for a
256-Mbit DRAM are given in Table 3.11 for a 90 ns Refresh period (−5 timing) at various
digitline lengths. The budget for active array current is limited to 200mA for this 256-Mbit
design. To meet this budget, the digitline cannot exceed a length of 256 mbits.
Table 3.11: Active current and power versus bilevel digitline length.
As before,
where TR is the number of bilevel row decoders, HRDEC is the height of each decoder, TDL
is the number of bilevel digitline pairs including redundant and dummy, and PDL is the digitline pitch. Also,
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy, PWL6 is
the wordline pitch for the plaid 6F2 mbit, TTWIST is the total number of twist regions, and
WTWIST is the width of the twist regions. Table 3.12 shows the calculated results for the
bilevel 32-Mbit block shown in Figure 3.13. A three-metal process is assumed in these calculations, which requires the use of local row decoders. Array efficiency for the bilevel
digitline 32-Mbit array block, which yields 63.1% for this design example, is given as
With Metal4 added to the bilevel DRAM process, the local row decoder scheme can be
replaced by a global or hierarchical row decoder scheme. The addition of a fourth metal to
the DRAM process places even greater demands on process engineers. Regardless, an
analysis of 32-Mbit array block size was performed assuming the availability of Metal4.
The results of the analysis are shown in Tables 3.13 and 3.14 for the global and
hierarchical row decode schemes. Array efficiency for the 32-Mbit memory block using
global and hierarchical row decoding is 74.5% and 72.5%, respectively.
Table 3.13: Bilevel digitline (global decode)—32-Mbit size calculations.
REFERENCES
[1] H. Hidaka, Y. Matsuda, and K. Fujishima, “A Divided/Shared Bit-Line Sensing Scheme for ULSI DRAM Cores,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 473–477, April 1991.
[5] K. Noda, T. Saeki, A. Tsujimoto, T. Murotani, and K. Koyama, “A Boosted Dual Word-line Decoding Scheme for 256 Mb DRAMs,” 1992 Symposium on VLSI Circuits Digest of Technical Papers, pp. 112–113.
[9] T. Yoshihara, H. Hidaka, Y. Matsuda, and K. Fujishima, “A Twisted Bitline Technique for Multi-Mb DRAMs,” 1988 IEEE ISSCC Digest of Technical Papers, pp. 238–239.
Chapter 4: The Peripheral Circuitry
In this chapter, we briefly discuss the peripheral circuitry. In particular, we discuss the
column decoder and its implementation. We also cover the implementation of row and
column redundancy.
Column redundancy adds complexity to the column decoder because the redundancy
operation in FPM and EDO DRAMs requires the decode circuit to terminate column transitions prior to completion. In this way, redundant column elements can replace
normal column elements. Generally, the addressed column select is allowed to fire
normally. If a redundant column match occurs for this address, the normal column select
is subsequently turned OFF; the redundant column select is fired. The redundant match is
timed to disable the addressed column select before enabling the I/O devices in the
sense amplifier.
The fire-and-cancel operation used on the FPM and EDO column decoders is best
achieved with static logic gates. In packet-based and synchronous DRAMs, column
select firing can be synchronized to the clock. Synchronous operation, however, does not
favor a fire-and-cancel mode, preferring instead that the redundant match be determined
prior to firing either the addressed or redundant column select. This match is easily
achieved in a pipeline architecture because the redundancy match analysis can be
performed upstream in the address pipeline before presenting the address to the column
decode logic.
A typical FPM- or EDO-type column decoder realized with static logic gates is shown
schematically in Figure 4.1. The address tree is composed of combinations of NAND or
NOR gates. In this figure, the address signals are active HIGH, so the tree begins with
two-input NAND gates. Using predecoded address lines is again preferred. Predecoded
address lines both simplify and reduce the decoder logic because a single input can
represent two or more address terms. In the circuit shown in Figure 4.1, the four input
signals CA23, CA45, CA67, and CA8 represent seven address terms, permitting 1 of 128
decoding. Timing of the column selection is controlled by a signal called column decode
enable (CDE), which is usually combined with an input signal, as shown in Figure 4.2, or
as an additional term in the tree.
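To make the predecoding arithmetic concrete, the short Python sketch below models the
decode behaviorally. It is not the static-gate implementation of Figure 4.1; the bit ordering
(CA2 as the least significant bit) and the widths of the predecoded groups (four lines each
for CA23, CA45, and CA67, two for CA8) are assumptions made for illustration.

# Behavioral sketch of predecoded column selection (1 of 128).

def predecode(col_addr):
    """Split a 7-bit column address (CA2..CA8) into predecoded one-hot groups."""
    ca23 = [int(((col_addr >> 0) & 0b11) == i) for i in range(4)]   # from CA2, CA3
    ca45 = [int(((col_addr >> 2) & 0b11) == i) for i in range(4)]   # from CA4, CA5
    ca67 = [int(((col_addr >> 4) & 0b11) == i) for i in range(4)]   # from CA6, CA7
    ca8  = [int(((col_addr >> 6) & 0b1) == i) for i in range(2)]    # from CA8
    return ca23, ca45, ca67, ca8

def column_select(col_addr, cde=1):
    """Return the index (0..127) of the CSEL line that fires, or None if CDE is LOW."""
    if not cde:                        # column decode enable gates the whole decoder
        return None
    ca23, ca45, ca67, ca8 = predecode(col_addr)
    for sel in range(128):             # one decode tree per column select line
        i, j, k, m = sel & 3, (sel >> 2) & 3, (sel >> 4) & 3, (sel >> 6) & 1
        if ca23[i] and ca45[j] and ca67[k] and ca8[m]:
            return sel
    return None

assert column_select(0b1010110) == 0b1010110    # exactly one CSEL fires per address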
Therefore, the column decoder fires either the addressed column select or the redundant
column select in synchrony with the clock. The decode tree is similar to that used for the
CMOS wordline driver; a pass transistor was added so that a decoder enable term could
be included. This term allows the tree to disconnect from the latching column select driver
while new address terms flow into the decoder. A latching driver was used in this pipeline
implementation because it held the previously addressed column select active with the
decode tree disconnected. Essentially, the tree would disconnect after a column select
was fired, and the new address would flow into the tree in anticipation of the next column
select. Concurrently, redundant match information would flow into the phase term driver
along with CA45 address terms to select the correct phase signal. A redundant match
would then override the normal phase term and enable a redundant phase term.
Operation of this column decoder is shown in Figure 4.4. Once again, deselection of the
old column select CSEL<0> and selection of a new column select RCSEL<1> are
enveloped by EQIO. Column transition timing is under the control of the column latch
signal CLATCH*. This signal shuts OFF the old column select and enables firing of the
new column select. Concurrent with CLATCH* firing, the decoder is enabled with decoder
enable (DECEN) to reconnect the decode tree to the column select driver. After the new
column select fires, DECEN transitions LOW to once again isolate the decode tree.
Redundancy has been used in DRAM designs since the 256k generation to improve yield
and profitability. In redundancy, spare elements such as rows and columns are used as
logical substitutes for defective elements. The substitution is controlled by a physical
encoding scheme. As memory density and size increase, redundancy continues to gain
importance. The early designs might have used just one form of repairable elements,
relying exclusively on row or column redundancy. Yet as processing complexity increased
and feature size shrank, both types of redundancy—row and column—became
mandatory.
Today various DRAM manufacturers are experimenting with additional forms of repair,
including replacing entire subarrays. The most advanced type of repair, however,
involves using movable saw lines as realized on a prototype 1Gb design [1]. Essentially,
any four good adjacent quadrants from otherwise bad die locations can be combined into
a good die by simply sawing the die along different scribe lines. Although this idea is far
from reaching production, it illustrates the growing importance of redundancy.
4.2.1 Row Redundancy
The concept of row redundancy involves replacing bad wordlines with good wordlines.
There could be any number of problems on the row to be repaired, including shorted or
open wordlines, wordline-to-digitline shorts, or bad mbit transistors and storage
capacitors. The row is not physically but logically replaced. In essence, whenever a row
address is strobed into a DRAM by RAS, the address is compared to the addresses of
known bad rows. If the address comparison produces a match, then a replacement
wordline is fired in place of the normal (bad) wordline.
The replacement wordline can reside anywhere on the DRAM. Its location is not restricted
to the array containing the normal wordline, although its range may be restricted by
architectural considerations. In general, the redundancy is considered local if the
redundant wordline and the normal wordline must always be in the same subarray.
If, however, the redundant wordline can exist in a subarray that does not contain the
normal wordline, the redundancy is considered global. Global repair generally results in
higher yield because the number of rows that can be repaired in a single subarray is not
limited to the number of its redundant rows. Rather, global repair is limited only by the
number of fuse banks, termed repair elements, that are available to any subarray.
Local row repair was prevalent through the 16-Meg generation, producing adequate yield
for minimal cost. Global row repair schemes are becoming more common for 64-Meg or
greater generations throughout the industry. Global repair is especially effective for
repairing clustered failures and offers superior repair solutions on large DRAMs.
Dynamic logic is a traditional favorite among DRAM designers for row redundant match
circuits. Dynamic gates are generally much faster than static gates and well suited to row
redundancy because they are used only once in an entire RAS cycle operation. The
dynamic logic we are referring to again is called precharge and evaluate (P&E). Match
circuits can take many forms; a typical row match circuit is shown in Figure 4.5. It consists
of a PRECHARGE transistor M1, match transistors M2-M5, laser fuses F1-F4, and static
gate I1. In addition, the node labeled row PRECHARGE (RPRE*) is driven by static logic
gates. The fuses generally consist of narrow polysilicon lines that can be blown or opened
with either a precision laser or a fusing current provided by additional circuits (not shown).
Figure 4.5: Row fuse block.
For our example using predecoded addresses, three of the four fuses shown must be
blown in order to program a match address. If, for instance, F2-F4 were blown, the circuit
would match for RA12<0> but not for RA12<1:3>. Prior to RAS falling, the node labeled
RED* is precharged to VCC by the signal RPRE*, which is LOW. Assuming that the circuit
is enabled by fuse F5, EVALUATE* will be LOW. After RAS falls, the row addresses
eventually propagate into the redundant block. If RA12<0> fires HIGH, RED* discharges
through M2 to ground. If, however, RA12<0> does not fire HIGH, RED* remains at VCC,
indicating that a match did not occur. A weak latch composed of I1 and M6 ensures that
RED* remains at VCC and does not discharge due to junction leakage.
This latch can easily be overcome by any of the match transistors. The signal RED* is
combined with static logic gates that have similar RED* signals derived from the
remaining predecoded addresses. If all of the RED* signals for a redundant element go
LOW, then a match has occurred, as indicated by row match (RMAT*) firing LOW. The
signal RMAT* stops the normal row from firing and selects the appropriate replacement
wordline. The fuse block in Figure 4.5 shows additional fuses F5-F6 for enabling and
disabling the fuse bank. Disable fuses are important in the event that the redundant
element fails and the redundant wordline must itself be repaired.
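A behavioral sketch of this match operation follows (Python, for illustration only). It models
the programmed fuses implicitly as a stored match address rather than simulating the
precharged RED* node, and it assumes an eight-bit row address split into four predecoded
pairs.

# Behavioral sketch of a row redundancy match over predecoded address groups.

class FuseBank:
    def __init__(self, programmed_row, enabled=True, disabled=False):
        self.match_addr = programmed_row   # address of the known-bad row
        self.enabled = enabled             # enable fuse (F5) blown: bank active
        self.disabled = disabled           # disable fuse (F6) blown: bank ignored

    def red_n(self, group, fired_line):
        """RED* for one predecoded group: LOW (0) when the fired line matches."""
        programmed_line = (self.match_addr >> (2 * group)) & 0b11
        return 0 if fired_line == programmed_line else 1

    def rmat_n(self, row_addr):
        """RMAT*: LOW (0) only if every predecoded group matches."""
        if not self.enabled or self.disabled:
            return 1
        fired = [(row_addr >> (2 * g)) & 0b11 for g in range(4)]
        return 0 if all(self.red_n(g, line) == 0
                        for g, line in enumerate(fired)) else 1

bank = FuseBank(programmed_row=0b01_10_11_00)   # program this bank to repair one row
assert bank.rmat_n(0b01_10_11_00) == 0          # match: redundant wordline fires
assert bank.rmat_n(0b01_10_11_01) == 1          # no match: normal wordline fires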
The capability to pretest redundant wordlines is an important element in most DRAM
designs today. For the schematic shown in Figure 4.5, the pretest is accomplished
through the set of static gates driven by the redundant test (REDTEST) signal. The input
signals labeled TRA12n, TRA34n, TRA56n, and TODDEVEN are programmed uniquely
for each redundant element through connections to the appropriate predecoded address
lines. REDTEST is HIGH whenever the DRAM is in the redundant row pretest mode. If
the current row address corresponds to the programmed pretest address, RMAT* will be
forced LOW, and the corresponding redundant wordline rather than the normal wordline
will be fired. This pretest capability permits all of the redundant wordlines to be tested
prior to any laser programming.
Fuse banks or redundant elements, as shown in Figure 4.5, are physically associated
with specific redundant wordlines in the array. Each element can fire only one specific
wordline, although generally in multiple subarrays. The number of subarrays that each
element controls depends on the DRAM’s architecture, refresh rate, and redundancy
scheme. It is not uncommon in 16-Meg DRAMs for a redundant row to replace physical
rows in eight separate subarrays at the same time. Obviously, the match circuits must be
fast. Generally, firing of the normal row must be held off until the match circuits have
enough time to evaluate the new row address. As a result, time wasted during this phase
shows up directly on the part’s row access (tRAC) specification.
4.2.2 Column Redundancy
The column address match (CAM) signals from all of the predecoded addresses are
combined in standard static logic gates to create a column match (CMAT*) signal for the
column fuse block. The CMAT* signal, when active, cancels normal CSEL signals and
enables redundant RCSEL signals, as described in Section 4.1. Each column fuse block
is active only when its corresponding enable fuse has been blown. The column fuse block
usually contains a disable fuse for the same reason as a row redundant block: to repair a
redundant element. Column redundant pretest is implemented somewhat differently in
Figure 4.6 than row redundant pretest is. In Figure 4.6, the bottom fuse terminal is not
connected directly to ground. Rather, all of the signals for the entire column fuse block are
brought out and programmed either to ground or to a column pretest signal from the test
circuitry.
During standard part operation, the pretest signal is biased to ground, allowing the fuses
to be read normally. However, during column redundant pretest, this signal is brought to
VCC, which makes the laser fuses appear to be programmed. The fuse/latch circuits latch
the apparent fuse states on the next RAS cycle. Then, subsequent column accesses
allow the redundant column elements to be pretested by merely addressing them via their
pre-programmed match addresses.
The method of pretesting just described always uses the match circuits to select a
redundant column. It is a superior method to that described for the row redundant pretest
because it tests both the redundant element and its match circuit. Furthermore, as the
match circuit is essentially unaltered during redundant column pretest, the test is a better
measure of the obtainable DRAM performance when the redundant element is active.
Obviously, the row and column redundant circuits that are described in this section are
only one embodiment of what could be considered a wealth of possibilities. It seems that
every DRAM design uses its own variant form of redundancy. Other types of fuse elements
could be used in place of the laser fuses that are described. A simple transistor could
replace the laser fuses in either Figure 4.5 or Figure 4.6, its gate being connected to an
alternative fuse element. Furthermore, circuit polarity could be reversed and
non-predecoded addressing and other types of logic could be used. The options are
nearly limitless. Figure 4.7 shows a SEM image of a set of poly fuses.
REFERENCES
Essentially, as the quantity of input buffers is reduced, the amount of column address
space for the remaining buffers is increased. This concept is easy to understand as it
relates to a 16Mb DRAM. As a x16 part, this DRAM has 1 mbit per data pin; as a x8 part,
2 mbits per data pin; and as a x4 part, 4 mbits per data pin. For each configuration, the
number of array sections available to an input buffer must change. By using Data Write
muxes that permit a given input buffer to drive as few or as many Write driver circuits as
required, design flexibility is easily accommodated.
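A small Python sketch of this bookkeeping (the total number of Write drivers is an assumed
value, not taken from a particular design) shows how the per-pin storage and the Data Write
mux fan-out change with configuration:

# Sketch of per-pin storage and Write-driver fan-out versus configuration for the
# 16Mb example above.

DENSITY_MBITS = 16            # 16Mb DRAM
TOTAL_WRITE_DRIVERS = 16      # assumed: one Write driver per x16 data path

for width in (16, 8, 4):                              # x16, x8, x4 configurations
    mbits_per_dq = DENSITY_MBITS // width             # 1, 2, 4 Mbit per data pin
    drivers_per_buffer = TOTAL_WRITE_DRIVERS // width # Data Write mux fan-out
    print(f"x{width}: {mbits_per_dq} Mbit per DQ, "
          f"{drivers_per_buffer} Write driver(s) per input buffer")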
Address compression and data compression are two special test modes that are usually
supported by the data path design. Test modes are included in a DRAM design to extend
test capabilities, to speed component testing, or to subject a part to conditions that are not
seen during normal operation. Compression test modes yield shorter test times by
allowing data from multiple array locations to be tested and compressed on-chip, thereby
reducing the effective memory size by a factor of 128 or more in some cases. Address
compression, usually on the order of 4x to 32x, is accomplished by internally treating
certain address bits as “don’t care” addresses.
The data from all of the “don’t care” address locations, which correspond to specific data
input/output pads (DQ pins), are compared using special match circuits. Match circuits
are usually realized with NAND and NOR logic gates or through P&E-type drivers on the
differential DR<n> buses. The match circuits determine if the data from each address
location is the same, reporting the result on the respective DQ pin as a match or a fail.
The data path must be designed to support the desired level of address compression.
This may necessitate more DCSA circuits, logic, and pathways than are necessary for
normal operation.
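The Python sketch below illustrates the compression idea behaviorally; the read callback,
the choice of don't-care bits, and the 8x factor are assumptions for illustration, whereas a
real part performs the comparison with NAND/NOR match circuits on the DR<n> buses as
described above.

# Behavioral sketch of address compression: data read from every combination of
# the "don't care" address bits is compared, and the DQ reports match (1) or fail (0).

def compress_read(read_fn, base_addr, dont_care_bits):
    """Compare data across all don't-care address combinations for one DQ."""
    expected = read_fn(base_addr)
    for combo in range(1 << len(dont_care_bits)):
        addr = base_addr
        for i, bit in enumerate(dont_care_bits):
            if (combo >> i) & 1:
                addr ^= (1 << bit)          # toggle one don't-care address bit
        if read_fn(addr) != expected:
            return 0                        # any mismatch reports a fail
    return 1                                # all locations agree: match

memory = [1] * 64                           # toy array, 8x compression over bits 0..2
assert compress_read(lambda a: memory[a], base_addr=8, dont_care_bits=[0, 1, 2]) == 1
memory[9] = 0                               # inject a single-bit failure
assert compress_read(lambda a: memory[a], base_addr=8, dont_care_bits=[0, 1, 2]) == 0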
The second form of test compression is data compression: combining data at the output
drivers. Data compression usually reduces the number of DQ pins to four. This
compression reduces the number of tester pins required for each part and increases the
throughput by allowing additional parts to be tested in parallel. In this way, x16 parts
accommodate 4x data compression, and x8 parts accommodate 2x data compression.
The cost of any additional circuitry to implement address and data compression must be
balanced against the benefits derived from test time reduction. It is also important that
operation in test mode correlate 100% with operation in non-test mode. Correlation is
often difficult to achieve, however, because additional circuitry must be activated during
compression, modifying noise and power characteristics on the die.
The address path for a DRAM can be broken into two parts: the row address path and the
column address path. The design of each path is dictated by a unique set of requirements.
The address path, unlike the data path, is unidirectional, with address information flowing
only into the DRAM. The address path must achieve a high level of performance with
minimal power and die area just like any other aspect of DRAM design. Both paths are
designed to minimize propagation delay and maximize DRAM performance. In this
chapter, we discuss various elements of the row and column address paths.
Refresh    Row addresses    Column addresses    Row address bits    Column address bits
4K         4,096            1,024               12                  10
2K         2,048            2,048               11                  11
1K         1,024            4,096               10                  12
As DRAM clock speeds continue to increase, the skew becomes the dominating concern,
outweighing the RDLL disadvantage of longer time to acquire lock.
This section describes an RSDLL (register-controlled symmetrical DLL), which meets the
requirements of DDR SDRAM. (Read/Write accesses occur on both rising and falling
edges of the clock.) Here, symmetrical means that the delay line used in the DLL has the
same delay whether a HIGH-to-LOW or a LOW-to-HIGH logic signal is propagating along
the line. The data output timing diagram of a DDR SDRAM is shown in Figure 5.23. The
RSDLL increases the valid output data window and diminishes the undefined tDSDQ by
synchronizing both the rising and falling edges of the DQS signal with the output data DQ.
Another concern with the phase-detector design is the design of the flip-flops (FFs). To
minimize the static phase error, very fast FFs should be used, ideally with zero setup
time.
Also, the metastability of the flip-flops becomes a concern as the loop locks. This,
together with possible noise contributions and the need to wait, as discussed above,
before implementing a shift-right or -left, may make it more desirable to add more filtering
in the phase detector. Some possibilities include increasing the divider ratio of the phase
detector or using a shift register in the phase detector to determine when a number
of shift-rights or shift-lefts, say four, have occurred. For the design in Figure 5.26, a
divide-by-two was used in the phase detector due to lock-time requirements.
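One way to realize such filtering is to shift the raw early/late decisions into a short register
and act only when several consecutive decisions agree. The Python sketch below is a
behavioral illustration of that idea; the depth of four follows the example in the text.

# Behavioral sketch: act on the raw early/late decisions only after four consecutive
# decisions agree, mimicking a shift-register filter in the phase detector.

from collections import deque

class FilteredPhaseDetector:
    def __init__(self, depth=4):
        self.history = deque(maxlen=depth)   # shift register of recent decisions

    def update(self, decision):
        """decision: +1 = shift right, -1 = shift left, 0 = inside the dead zone."""
        if decision == 0:
            self.history.clear()             # dead zone: no evidence either way
            return 0
        self.history.append(decision)
        if len(self.history) == self.history.maxlen and len(set(self.history)) == 1:
            self.history.clear()
            return decision                  # issue a single shift command
        return 0                             # otherwise keep waiting

pd = FilteredPhaseDetector()
shifts = [pd.update(d) for d in (+1, +1, -1, +1, +1, +1, +1)]
assert shifts == [0, 0, 0, 0, 0, 0, +1]      # only a four-in-a-row run causes a shift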
Figure 5.29: Measured ICC (DLL current consumption) versus input frequency.
5.3.6 Discussion
In this section we have presented one possibility for the design of a delay-locked loop.
While there are others, this design is simple, manufacturable, and scalable.
In many situations the resolution of the phase detector must be decreased. A useful
circuit to determine which one of two signals occurs earlier in time is shown in Figure 5.30.
This circuit is called an arbiter. If S1 occurs slightly before S2 then the output SO1 will go
HIGH, while the output SO2 stays LOW. If S2 occurs before S1, then the output SO2
goes HIGH and SO1 remains LOW. The fact that the inverters on the outputs are
powered from the SR latch (the cross-coupled NAND gates) ensures that SO1 and SO2
cannot be HIGH at the same time. When designed and laid out correctly, this circuit is
capable of discriminating tens of picoseconds of difference between the rising edges of
the two input signals.
Figure 5.30: Two-way arbiter as a phase detector.
The arbiter alone is not capable of controlling the shift register. A simple logic block
to generate shift-right and shift-left signals is shown in Figure 5.31. The rising edge of
SO1 or SO2 is used to clock two D-latches so that the shift-right and shift-left signals may
be held HIGH for more than one clock cycle. Figure 5.31 uses a divide-by-two to hold the
shift signals valid for two clock cycles. This is important because the output of the arbiter
can have glitches coming from the different times when the inputs go back LOW. Note
that using an arbiter-based phase detector alone can result in an alternating sequence of
shift-right, shift-left. We eliminated this problem in the phase-detector of Figure 5.24 by
introducing the dead zone so that a minimum delay spacing of the clocks would result in
no shifting.
[1] This material is taken directly from [4].
REFERENCES
[1] H.Ikeda and H.Inukai, “High-Speed DRAM Architecture Development,” IEEE Journal of
Solid-State Circuits, vol. 34, no. 5, pp. 685–692, May 1999.
[2] M.Bazes,“Two Novel Full Complementary Self-Biased CMOS Differential Amplifiers,” IEEE
Journal of Solid-State Circuits, vol. 26, no. 2, pp. 165–168, February 1991.
[3] R.J.Baker, H.W.Li, and D.E.Boyce, CMOS: Circuit Design, Layout, and Simulation,
Piscataway, NJ: IEEE Press, 1998.
[4] F.Lin, J.Miller, A.Schoenfeld, M.Ma, and R.J.Baker, “A Register-Controlled Symmetrical
DLL for Double-Data-Rate DRAM,” IEEE Journal of Solid-State Circuits, vol. 34, no. 4, April 1999.
All DRAM voltage regulators are built from the same basic elements: a voltage reference,
one or more output power stages, and some form of control circuit. How each of these
elements is realized and combined into the overall design is the product of process and
design limitations and the design engineer’s preferences. In the paragraphs that follow,
we discuss each element, overall design objectives, and one or more circuit
implementations.
Although static voltage characteristics of the DRAM regulator are determined by the
voltage reference circuit, dynamic voltage characteristics are dictated by the power
stages. The power stage is therefore a critical element in overall DRAM performance. To
date, the most prevalent type of power stage among DRAM designers is a simple,
unbuffered op-amp. Because unbuffered op-amps provide high open-loop gain, fast
response, and low offset, they allow design engineers to use feedback in the overall
regulator design. Feedback reduces temperature and process sensitivity and ensures better load
regulation than any type of open loop system. Design of the op-amps, however, is
anything but simple.
The ideal power stage would have high bandwidth, high open-loop gain, high slew rate,
low systematic offset, low operating current, high drive, and inherent stability.
Unfortunately, several of these parameters are contradictory, which compromises certain
aspects of the design and necessitates trade-offs. While it seems that many DRAM
manufacturers use a single op-amp for the regulator’s power stage, we have found that it
is better to use a multitude of smaller op-amps. These smaller op-amps have wider
bandwidth, greater design flexibility, and an easier layout than a single, large op-amp.
The power op-amp is shown in Figure 6.7. The schematic diagram for a voltage regulator
power stage is shown in Figure 6.8. This design is used on a 256Mb DRAM and consists
of 18 power op-amps, one boost amp, and one small standby op-amp. The VCC power
buses for the array and peripheral circuits are isolated except for the 20-ohm resistor that
bridges the two together. Isolating the buses is important to prevent high-current spikes
that occur in the array circuits from affecting the peripheral circuits. Failure to isolate
these buses can result in speed degradation for the DRAM because high-current spikes
in the array cause voltage cratering and a corresponding slow-down in logic transitions.
6.2.1 Pumps
Voltage pump operation can be understood with the assistance of the simple voltage
pump circuit depicted in Figure 6.10. For this positive pump circuit, imagine, for one
phase of a pump cycle, that the clock CLK is HIGH. During this phase, node A is at
ground and node B is clamped to VCC−VTH by transistor M1. The charge stored in
capacitor C1 is then C1 · (VCC − VTH).
During the second phase, the clock CLK will transition LOW, which brings node A HIGH.
As node A rises to VCC, node B begins to rise to VCC + (VCC−VTH), shutting OFF transistor
M1. At the same time, as node B rises one VTH above VLOAD, transistor M2 begins to
conduct. The charge from capacitor C1 is transferred through M2 and shared with the
capacitor CLOAD. This action effectively pumps charge into CLOAD and ultimately raises the
voltage VOUT. During subsequent clock cycles, the voltage pump continues to deliver
charge to CLOAD until the voltage VOUT equals 2VCC−VTH1−VTH2, one VTH below the peak
voltage occurring at node B. A simple, negative voltage pump could be built from the
circuit of Figure 6.10 by substituting PMOS transistors for the two NMOS transistors
shown and moving their respective gate connections.
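A short numerical sketch of this charge transfer (Python; the component values and the
ideal-diode treatment of M1 and M2 are assumptions for illustration) shows VOUT creeping
toward 2VCC−VTH1−VTH2 over successive cycles:

# Numerical sketch of the positive voltage pump of Figure 6.10.
# Phase 1: C1 charges to VCC - VTH1 through M1. Phase 2: node A rises to VCC,
# boosting node B, and charge flows through M2 until node B falls to VOUT + VTH2.

VCC, VTH1, VTH2 = 3.3, 0.7, 0.7     # supply and threshold voltages (volts)
C1, CLOAD = 10e-12, 100e-12         # pump and load capacitance (farads)

vout = 0.0
for cycle in range(200):
    vb = VCC + (VCC - VTH1)         # node B just after CLK drives node A HIGH
    headroom = vb - VTH2 - vout     # M2 conducts only while this is positive
    if headroom > 0:
        dq = headroom * (C1 * CLOAD) / (C1 + CLOAD)   # charge transferred this cycle
        vout += dq / CLOAD
print(f"VOUT after 200 cycles: {vout:.2f} V "
      f"(limit 2*VCC - VTH1 - VTH2 = {2 * VCC - VTH1 - VTH2:.2f} V)")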
Two important characteristics of a voltage pump are capacity and efficiency. Capacity is a
measure of how much current a pump can continue to supply, and it is determined
primarily by the capacitor’s size and its operating frequency. The operating frequency is
limited by the rate at which the pump capacitor C1 can be charged and discharged.
Efficiency, on the other hand, is a measure of how much charge or current is wasted
during each pump cycle. A typical DRAM voltage pump might be 30–50% efficient. This
translates into 2–3 milliamps of supply current for every milliamp of pump output current.
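In sketch form, treating capacity as the charge moved per cycle times the pump frequency
and efficiency as the ratio of output current to supply current (simplified definitions
assumed here):

    I_{out(max)} \approx f_{pump} \cdot C_{1} \cdot \Delta V_{C1},
    \qquad
    I_{supply} \approx \frac{I_{out}}{\eta}

With \eta between 0.3 and 0.5, the second relation gives roughly 2 to 3 mA of supply
current for every milliamp delivered, consistent with the figures above.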
In addition to the pump circuits just described, regulator and oscillator circuits are needed
to complete a voltage pump design. The most common oscillator used in voltage pumps
is the standard CMOS ring oscillator. A typical voltage pump ring oscillator is shown in
Figure 6.13. A unique feature of this oscillator is the multifrequency operation permitted
by including mux circuits connected to various oscillator tap points. These muxes,
controlled by signals such as PWRUP, enable higher frequency operation by reducing the
number of inverter stages in the ring oscillator.
The voltage dropped across the PMOS diode does not affect the regulated voltage
because the reference voltage supply VDD is translated through a matching PMOS diode.
Both of the translated voltages are fed into the comparator stage, which enables the
pump oscillator whenever the translated VCCP voltage falls below the translated VDD
reference voltage. The comparator has built-in hysteresis via the middle stage, which
dictates the amount of ripple present on the regulated VCCP supply.
The VBB regulator in Figure 6.17 operates in a similar fashion to the VCCP regulator of
Figure 6.16. The primary difference lies in the voltage translator stage. For the VBB
regulator, this stage translates the pumped voltage VBB and the reference voltage VSS up
within the input common mode range of the comparator circuit. The reference voltage VSS
is translated up by one threshold voltage (VTH) by sourcing a reference current with a
current mirror stage through an NMOS diode. The regulated voltage VBB is similarly
translated up by sourcing the same reference current with a matching current mirror stage
through a diode stack. This diode stack, similar to the VCCP case, contains an NMOS
diode that matches that used in translating the reference voltage VSS. The stack also
contains a mask-adjustable, pseudo-NMOS diode. The voltage across this pseudo-NMOS
diode sets the regulated level of VBB, as sketched below.
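A sketch of the resulting regulation point, assuming the current mirrors and the NMOS
diodes match so that their threshold drops cancel (VPD denotes the drop across the
mask-adjustable pseudo-NMOS diode):

    V_{+} = V_{SS} + V_{TH} = V_{TH},
    \qquad
    V_{-} = V_{BB} + V_{TH} + V_{PD}

Regulation drives V_{-} toward V_{+}, which gives V_{BB} \approx -V_{PD}.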
The comparator includes a hysteresis stage, which dictates the amount of ripple present
on the regulated VBB supply.
6.3 DISCUSSION
In this chapter, we introduced the popular circuits used on a DRAM for voltage generation
and regulation. Because this introduction is far from exhaustive, we include a list of
relevant readings and references in the Appendix for those readers interested in greater
detail.
Appendix
Supplemental Reading
In this tutorial overview of DRAM circuit design, we may not have covered specific topics
to the reader’s satisfaction. For this reason, we have compiled a list of supplemental
readings from major conferences and journals, categorized by subject. It is our hope that
unanswered questions will be addressed by the authors of these readings, who are
experts in the field of DRAM circuit design.
[1] S.Fuji, K.Natori, T.Furuyama, S.Saito, H.Toda, T.Tanaka, and O.Ozawa,“A Low-Power Sub
100 ns 256K Bit Dynamic RAM,” IEEE Journal of Solid-State Circuits, vol. 18, pp. 441–446,
October 1983.
[6] K.Itoh, “Trends in Megabit DRAM Circuit Design,” IEEE Journal of Solid-State Circuits, vol.
25, pp. 778–789, June 1990.
[9] K.Kimura, T.Sakata, K.Itoh, T.Kaga, T.Nishida, and Y.Kawamoto, “A Block-Oriented RAM
with Half-Sized DRAM Cell and Quasi-Folded Data-Line Architecture,” IEEE Journal of
Solid-State Circuits, vol. 26, pp. 1511–1518, November 1991.
[12] K.Shimohigashi and K.Seki, “Low-Voltage ULSI Design,” IEEE Journal of Solid-State
Circuits, vol. 28, pp. 408–413, April 1993.
[16] T.Ooishi, K.Hamade, M.Asakura, K.Yasuda, H.Hidaka, H.Miyamoto, and H.Ozaki, “An
Automatic Temperature Compensation of Internal Sense Ground for Sub-Quarter Micron
DRAMs,” 1994 Symposium on VLSI Circuits, p. 77, June 1994.
[20] S.Yoo, J.Han, E.Haq, S.Yoon, S.Jeong, B.Kim, J.Lee, T.Jang, H.Kim,C.Park, D.Seo,
C.Choi, S.Cho, and C.Hwang, “A 256M DRAM with Simplified Register Control for Low Power
Self Refresh and Rapid Burn-In,” 1994 Symposium on VLSI Circuits, p. 85, June 1994.
[22] D.Stark, H.Watanabe, and T.Furuyama, “An Experimental Cascade Cell Dynamic
Memory,” 1994 Symposium on VLSI Circuits, p. 89, June 1994.
[23] T.Inaba, D.Takashima, Y.Oowaki, T.Ozaki, S.Watanabe, and K.Ohuchi,“A 250mV Bit-Line
Swing Scheme for a 1V 4Gb DRAM,” 1995 Symposium on VLSI Circuits, p. 99, June 1995.
[24] I.Naritake, T.Sugibayashi, S.Utsugi, and T.Murotani, “A Crossing Charge Recycle Refresh
Scheme with a Separated Driver Sense-Amplifier for Gb DRAMs,” 1995 Symposium on VLSI
Circuits, p. 101, June 1995.
[28] D.Takashima, Y.Oowaki, S.Watanabe, and K.Ohuchi, “A Novel Power-Off Mode for a
Battery-Backup DRAM,” 1995 Symposium on VLSI Circuits, p. 109, June 1995.
[34] J.Han, J.Lee, S.Yoon, S.Jeong, C.Park, I.Cho, S.Lee, and D.Seo, “Skew Minimization
Techniques for 256M-bit Synchronous DRAM and Beyond,” 1996 Symposium on VLSI Circuits,
p. 192, June 1996.
[40] Y.Idei, K.Shimohigashi, M.Aoki, H.Noda, H.Iwai, K.Sato, and T.Tachibana, “Dual-Period
Self-Refresh Scheme for Low-Power DRAM’s with On-Chip PROM Mode Register,” IEEE
Journal of Solid-State Circuits, vol. 33, pp. 253–259, February 1998.
[41] K.Kim, C.-G.Hwang, and J.G.Lee, “DRAM Technology Perspective for Gigabit Era,” IEEE
Transactions Electron Devices, vol. 45, pp. 598–608, March 1998.
[42] H.Tanaka, M.Aoki, T.Sakata, S.Kimura, N.Sakashita, H.Hidaka,T.Tachibana, and
K.Kimura, “A Precise On-Chip Voltage Generator for a Giga-Scale DRAM with a Negative
Word-Line Scheme,” 1998 Symposium on VLSI Circuits, p. 94, June 1998.
[43] T.Fujino and K.Arimoto, “Multi-Gbit-Scale Partially Frozen (PF) NAND DRAM with
SDRAM Compatible Interface,” 1998 Symposium on VLSI Circuits, p. 96, June 1998.
DRAM Cells
[47] C.G.Sodini and T.I.Kamins, “Enhanced Capacitor for One-Transistor Memory Cell,” IEEE
Transactions Electron Devices, vol. ED-23, pp. 1185–1187, October 1976.
[48] J.E.Leiss, P.K.Chatterjee, and T.C.Holloway, “DRAM Design Using the Taper-Isolated
Dynamic RAM Cell,” IEEE Journal of Solid-State Circuits, vol. 17, pp. 337–344, April 1982.
[52] K.W.Kwon, I.S.Park, D.H.Han, E.S.Kim, S.T.Ahn, and M.Y.Lee,“Ta2O5 Capacitors for 1
Gbit DRAM and Beyond,” 1994 IEDM Technical Digest, pp. 835–838.
[53] D.Takashima, S.Watanabe, H.Nakano, Y.Oowaki, and K.Ohuchi,“Open/Folded Bit-Line
Arrangement for Ultra-High-Density DRAM’s,” IEEE Journal of Solid-State Circuits, vol. 29, pp.
539–542, April 1994.
[56] B.El-Kareh, G.B.Bronner, and S.E.Schuster, “The Evolution of DRAM Cell Technology,”
Solid State Technology, vol. 40, pp. 89–101, May 1997.
[57] S.Takehiro, S.Yamauchi, M.Yoshimaru, and H.Onoda, “The Simplest Stacked BST
Capacitor for Future DRAM’s Using a Novel Low Temperature Growth Enhanced
Crystallization,” IEEE Symposium on VLSI Technology Digest of Technical Papers, pp.
153–154, June 1997.
[58] T.Okuda and T.Murotani, “A Four-Level Storage 4-Gb DRAM,” IEEE Journal of Solid State
Circuits, vol. 32, pp. 1743–1747, November 1997.
[59] A.Nitayama, Y.Kohyama, and K.Hieda, “Future Directions for DRAM Memory Cell
Technology,” 1998 IEDM Technical Digest, pp. 355–358.
DRAM Sensing
[62] N.C.-C.Lu and H.H.Chao, “Half-VDD/Bit-Line Sensing Scheme in CMOS DRAMs,” IEEE
Journal of Solid-State Circuits, vol. 19, pp. 451–454, August 1984.
[63] P.A.Layman and S.G.Chamberlain, “A Compact Thermal Noise Model for the
Investigation of Soft Error Rates in MOS VLSI Digital Circuits,” IEEE Journal of Solid-State
Circuits, vol. 24, pp. 79–89, February 1989.
[64] R.Kraus, “Analysis and Reduction of Sense-Amplifier Offset,” IEEE Journal of Solid-State
Circuits, vol. 24, pp. 1028–1033, August 1989.
[65] R.Kraus and K.Hoffmann, “Optimized Sensing Scheme of DRAMs,” IEEE Journal of
Solid-State Circuits, vol. 24, pp. 895–899, August 1989.
[66] H.Hidaka, Y.Matsuda, and K.Fujishima, “A Divided/Shared Bit-Line Sensing Scheme for
ULSI DRAM Cores,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 473–478, April 1991.
[68] T.N.Blalock and R.C.Jaeger, “A High-Speed Sensing Scheme for 1T Dynamic RAMs
Utilizing the Clamped Bit-Line Sense Amplifier,” IEEE Journal of Solid-State Circuits, vol. 27,
pp. 618–625, April 1992.
[72] T.Sunaga, “A Full Bit Prefetch DRAM Sensing Circuit,” IEEE Journal of Solid-State
Circuits, vol. 31, pp. 767–772, June 1996.
DRAM SOI
Embedded DRAM
Redundancy Techniques
DRAM Testing
Synchronous DRAM
Low-Voltage DRAMs
[91] K.Lee, C.Kim, D.Yoo, J.Sim, S.Lee, B.Moon, K.Kim, N.Kim, S.Yoo,J.Yoo, and S.Cho,
“Low Voltage High Speed Circuit Designs for Giga-bit DRAMs,” 1996 Symposium on VLSI
Circuits, p. 104, June 1996.
[92] M.Saito, J.Ogawa, K.Gotoh, S.Kawashima, and H.Tamura, “Technique for Controlling
Effective VTH in Multi-Gbit DRAM Sense Amplifier,” 1996 Symposium on VLSI Circuits, p. 106,
June 1996.
[93] K.Gotoh, J.Ogawa, M.Saito, H.Tamura, and M.Taguchi, “A 0.9 V Sense-Amplifier Driver
for High-Speed Gb-Scale DRAMs,” 1996 Symposium on VLSI Circuits, p. 108, June 1996.
[94] T.Hamamoto, Y.Morooka, T.Amano, and H.Ozaki, “An Efficient Charge Recycle and
Transfer Pump Circuit for Low Operating Voltage DRAMs,” 1996 Symposium on VLSI Circuits,
p. 110, June 1996.
[95] T.Yamada, T.Suzuki, M.Agata, A.Fujiwara, and T.Fujita, “Capacitance Coupled Bus with
Negative Delay Circuit for High Speed and Low Power (10GB/s<500mW) Synchronous
DRAMs,” 1996 Symposium on VLSI Circuits, p. 112, June 1996.
High-Speed DRAMs
[96] S.Wakayama, K.Gotoh, M.Saito, H.Araki, T.S.Cheung, J.Ogawa, and H.Tamura, “10-ns
Row Cycle DRAM Using Temporal Data Storage Buffer Architecture,” 1998 Symposium on
VLSI Circuits, p. 12, June 1998.
[100] S.-J.Jang, S.-H.Han, C.-S.Kim, Y.-H.Jun, and H.-J.Yoo, “A Compact Ring Delay Line for
High Speed Synchronous DRAM,” 1998 Symposium on VLSI Circuits, p. 60, June 1998.
[101] H.Noda, M.Aoki, H.Tanaka, O.Nagashima, and H.Aoki, “An On-Chip Timing Adjuster
with Sub-100-ps Resolution for a High-Speed DRAM Interface,” 1998 Symposium on VLSI
Circuits, p. 62, June 1998.
[102] T.Sato, Y.Nishio, T.Sugano, and Y.Nakagome, “5GByte/s Data Transfer Scheme with
Bit-to-Bit Skew Control for Synchronous DRAM,” 1998 Symposium on VLSI Circuits, p. 64,
June 1998.
High-Performance DRAM
[107] Y.Kanno, H.Mizuno, and T.Watanabe, “A DRAM System for Consistently Reducing CPU
Wait Cycles,” 1999 Symposium on VLSI Circuits, p. 131, June 1999.
[108] S.Perissakis, Y.Joo, J.Ahn, A.DeHon, and J.Wawrzynek, “Embedded DRAM for a
Reconfigurable Array,” 1999 Symposium on VLSI Circuits, p. 145, June 1999.
[109] T.Namekawa, S.Miyano, R.Fukuda, R.Haga, O.Wada, H.Banba,S.Takeda, K.Suda,
K.Mimoto, S.Yamaguchi, T.Ohkubo, H.Takato, and K.Numata, “Dynamically Shift-Switched
Dataline Redundancy Suitable for DRAM Macro with Wide Data Bus,” 1999 Symposium on
VLSI Circuits, p. 149, June 1999.
Glossary
1T1C
A DRAM memory cell consisting of a single MOSFET access transistor and a
single storage capacitor.
Bitline
Also called a digitline or columnline. A common conductor made from metal or
polysilicon that connects multiple memory cells together through their access
transistors. The bitline is ultimately used to connect memory cells to the sense
amplifier block to permit Refresh, Read, and Write operations.
Bootstrapped Driver
A driver circuit that employs capacitive coupling to boot, or raise up, a capacitive
node to a voltage above VCC.
Buried Capacitor Cell
A DRAM memory cell in which the capacitor is constructed below the digitline.
Charge Pump
See Voltage Pump.
CMOS, Complementary Metal-Oxide Semiconductor
A silicon technology for fabricating integrated circuits. Complementary refers to
the technology’s use of both NMOS and PMOS transistors in its construction. The
PMOS transistor is used primarily to pull signals toward the positive power supply
VDD. The NMOS transistor is used primarily to pull signals toward ground. The
term metal-oxide semiconductor describes the sandwich of metal (actually
polysilicon in modern devices), oxide, and silicon that makes up the NMOS and
PMOS transistors.
COB, Capacitor over Bitline
A DRAM memory cell in which the capacitor is constructed above the digitline
(bitline).
Columnline
See Bitline.
Column Redundancy
The use of spare column elements as logical substitutes for defective columns in
the array.
DLL, Delay-Locked Loop
A circuit that generates and inserts an optimum delay to temporarily align two
signals. In DRAM, a DLL synchronizes the input and output clock signals of the
DRAM to the I/O data signals.
DRAM, Dynamic Random Access Memory
A memory technology that stores information in the form of electric charge on
capacitors. This technology is considered dynamic because the stored charge
degrades over time due to leakage mechanisms. The leakage necessitates
periodic Refresh of the memory cells to replace the lost charge.
Efficiency (Array Efficiency)
A design metric, which is defined as the ratio of memory array die area to total
die area (chip size). It is expressed as a percentage.
Equilibration
A circuit that equalizes the voltages of a digitline pair by shorting the two digitlines
together. Most often, the equilibration circuit includes a bias network, which helps
to set and hold the equilibration level to a known voltage (generally VCC/2) prior to
Sensing.
Helper Flip-Flop
A positive feedback (regenerative) circuit for amplifying the signals on the I/O
lines.
I/O Devices
MOSFET transistors that connect the array digitlines to the I/O lines (through the
sense amplifiers). Read and Write operations from/to the memory arrays always
occur through I/O devices.
Isolation Devices
MOSFET transistors that isolate array digitlines from the sense amplifiers.
Mbit
A memory cell capable of storing one bit of data. In modern DRAMs, the mbit
consists of a single MOSFET access transistor and a single storage capacitor.
The gate of the MOSFET connects to the wordline or rowline, while the source
and drain of the MOSFET connect to the storage capacitor and the digitline,
respectively.
Memory Array
An array of memory or mbit cells.
Multiplexed Addressing
The practice of using the same chip address pins for both the row and column
addresses. The addresses are clocked into the device at different times.
Open Digitline Architecture
A DRAM architecture that uses cross-point-style memory arrays in which a
memory cell is placed at every wordline and digitline intersection. Digitline pairs,
for connection to the sense amplifiers, consist of a single digitline from two
adjacent memory arrays.
Pitch
The distance between like points in a periodic array. For example, digitline pitch
in a DRAM array is the distance between the centers or edges of two adjacent
digitlines.
RAM, Random Access Memory
Computer memory that allows access to any memory location without
restrictions.
Refresh
The process of restoring the electric charge in DRAM memory cell capacitors to
full levels through Sensing. Note that Refresh occurs every time a wordline is
activated and the sense amplifiers are fired.
Rowline
See Wordline.
Sense Amplifier
A type of regenerative amplifier that senses the contents of memory cells and
restores them to full levels.
Voltage Pump
Also called a charge pump. A circuit for generating voltages that lie outside of the
power supply range.
List of Figures
Chapter 1: An Introduction to DRAM
Figure 1.1: 1,024-bit DRAM functional diagram.
Figure 1.2: 1,024-bit DRAM pin connections.
Figure 1.3: Ideal address input buffer.
Figure 1.4: Layout of a 1,024-bit memory array.
Figure 1.5: 1k DRAM Read cycle.
Figure 1.6: 1k DRAM Write cycle.
Figure 1.7: 1k DRAM Refresh cycle.
Figure 1.8: 3-transistor DRAM cell.
Figure 1.9: Block diagram of a 4k DRAM.
Figure 1.10: 4,096-bit DRAM pin connections.
Figure 1.11: Address timing.
Figure 1.12: 1-transistor, 1-capacitor (1T1C) memory cell.
Figure 1.13: Row of N dynamic memory elements.
Figure 1.14: Page mode.
Figure 1.15: Fast page mode.
Figure 1.16: Nibble mode.
Figure 1.17: Pin connections of a 64Mb SDRAM with 16-bit I/O.
Figure 1.18: Block diagram of a 64Mb SDRAM with 16-bit I/O.
Figure 1.19: SDRAM with a latency of three.
Figure 1.20: Mode register.
Figure 1.21: 1T1C DRAM memory cell. (Note the rotation of the rowline and columnline.)
Figure 1.22: Open digitline memory array schematic.
Figure 1.23: Open digitline memory array layout.
Figure 1.24: Simple array schematic (an open DRAM array).
Figure 1.25: Cell access waveforms.
Figure 1.26: DRAM charge-sharing.
Figure 1.27: Sense amplifier schematic.
Figure 1.28: Sensing operation waveforms.
Figure 1.29: Sense amplifier schematic with I/O devices.
Figure 1.30: Write operation waveforms.
Figure 1.31: A folded DRAM array.
List of Tables
Chapter 1: An Introduction to DRAM
Table 1.1: SDRAM commands. (Notes: 1)