
DRAM Circuit Design: A Tutorial

by Brent Keeth and R. Jacob Baker

John Wiley & Sons © 2001 (200 pages)

ISBN: 0780360141

This book instructs readers on the nuances of DRAM design, making it accessible to both novice and practicing engineers and covering particular information necessary for working with both the analog and digital circuits present in DRAM chips.

Table of Contents

DRAM Circuit Design—A Tutorial

Preface

Acknowledgments

Chapter 1 - An Introduction to DRAM

Chapter 2 - The DRAM Array

Chapter 3 - Array Architectures

Chapter 4 - The Peripheral Circuitry

Chapter 5 - Global Circuitry and Considerations

Chapter 6 - Voltage Converters

Appendix - Supplemental Reading

Glossary

Index

List of Figures

List of Tables
DRAM Circuit Design—A Tutorial
Brent Keeth
Micron Technology, Inc.
Boise, Idaho
R. Jacob Baker
Boise State University
Micron Technology, Inc.
Boise, Idaho
IEEE Solid-State Circuits Society, Sponsor

IEEE Press Series on Microelectronic Systems


Stuart K. Tewksbury and Joe E. Brewer, Series Editors

IEEE PRESS

The Institute of Electrical and Electronics Engineers, Inc., New York

This book and other books may be purchased at a discount from the
publisher when ordered in bulk quantities. Contact:

IEEE Press Marketing


Attn: Special Sales
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855–1331
Fax: +1 732 981 9334
For more information about IEEE Press products, visit the IEEE Online
Catalog & Store: http://www.ieee.org/ieeestore.

© 2001 by the Institute of Electrical and Electronics Engineers, Inc.

3 Park Avenue, 17th Floor, New York, NY 10016–5997.


All rights reserved. No part of this book may be reproduced in any form,
nor may it be stored in a retrieval system or transmitted in any form,
without written permission from the publisher.

10 9 8 7 6 5 4 3 2 1
ISBN: 0780360141
IEEE Order No. PC5863
Library of Congress Cataloging-in-Publication Data
Keeth, Brent, 1960–
DRAM circuit design: a tutorial / Brent Keeth, R. Jacob Baker.
p. cm.
“IEEE Solid-State Circuits Society, sponsor.”
Includes bibliographical references and index.
ISBN 0-7803-6014-1
1. Semiconductor storage devices—Design and construction. I. Baker, R. Jacob, 1964–
II. Title

TK7895.M4 K425 2000


621.39’732—dc21
00–059802
Dedication

For
Susi, John, Katie, Julie, Kyri, Josh,
and the zoo.
About the Authors

Brent Keeth was born in Ogden, Utah, on March 30, 1960. He received
the B.S. and M.S. degrees in electrical engineering from the University of
Idaho, Moscow, in 1982 and 1996, respectively.

Mr. Keeth joined Texas Instruments in 1982, spending the next two years
designing hybrid integrated circuits for avionics control systems and a
variety of military radar sub-systems. From 1984 to 1987, he worked for
General Instruments Corporation designing baseband scrambling and
descrambling equipment for the CATV industry.

Thereafter, he spent 1987 through 1992 with the Grass Valley Group (a
subsidiary of Tektronix) designing professional broadcast, production, and
post-production video equipment. Joining Micron Technology in 1992, he
has engaged in the research and development of various CMOS DRAMs
including 4Mbit, 16Mbit, 64Mbit, 128Mbit, and 256Mbit devices. As a
Principal Fellow at Micron, his present research interests include
high-speed bus protocols and open standard memory design.

In 1995 and 1996, Brent served on the Technical Program Committee for
the Symposium on VLSI Circuits. In addition, he served on the Memory
Subcommittee of the U.S. Program Committee for the 1996 and 1999
IEEE International Solid-State Circuits Conferences. Mr. Keeth holds over
60 U.S. and foreign patents.

R. Jacob Baker (S’83, M’88, SM’97) was born in Ogden, Utah, on October
5, 1964. He received the B.S. and M.S. degrees in electrical engineering
from the University of Nevada, Las Vegas, and the Ph.D. degree in
electrical engineering from the University of Nevada, Reno.

From 1981 to 1987, Dr. Baker served in the United States Marine Corps
Reserves. From 1985 to 1993, he worked for E.G.&G. Energy
Measurements and the Lawrence Livermore National Laboratory
designing nuclear diagnostic instrumentation for underground nuclear
weapons tests at the Nevada test site. During this time, he designed over
30 electronic and electro-optic instruments, including high-speed (750
Mb/s) fiber-optic receiver/transmitters, PLLs, frame- and bit-syncs, data
converters, streak-camera sweep circuits, micro-channel plate gating
circuits, and analog oscilloscope electronics. From 1993 to 2000, he was a
faculty member in the Department of Electrical Engineering at the
University of Idaho. In 2000, he joined a new electrical engineering
program at Boise State University as an associate professor. Also, since
1993, he has consulted for various companies, including the Lawrence
Berkeley Laboratory, Micron Technology, Micron Display, Amkor Wafer
Fabrication Services, Tower Semiconductor, Rendition, and the Tower
ASIC Design Center.
Holding 12 patents in integrated circuit design, Dr. Baker is a member of
Eta Kappa Nu and is a coauthor (with H. Li and D. Boyce) of a popular
textbook covering CMOS analog and digital circuit design entitled CMOS:
Circuit Design, Layout, and Simulation (IEEE Press, 1998). His research
interests focus mainly on CMOS mixed-signal integrated circuit design.
Preface
From the core memory that rocketed into space during the Apollo moon missions to the
solid-state memories used in today’s commonplace computer, memory technology has
played an important, albeit quiet, role during the last century. It has been quiet in the
sense that memory, although necessary, is not glamorous and sexy, and is instead being
relegated to the role of a commodity. Yet, it is important because memory technology,
specifically, CMOS DRAM technology, has been one of the greatest driving forces in the
advancement of solid-state technology. It remains a driving force today, despite the
segmenting that is beginning to appear in its market space.

The very nature of the commodity memory market, with high product volumes and low
pricing, is what ultimately drives the technology. To survive, let alone remain viable over
the long term, memory manufacturers must work aggressively to drive down their
manufacturing costs while maintaining, if not increasing, their share of the market. One of
the best tools to achieve this goal remains the ability for manufacturers to shrink their
technology, essentially getting more memory chips per wafer through process scaling.
Unfortunately, with all memory manufacturers pursuing the same goals, it is literally a
race to see who can get there first. As a result, there is tremendous pressure to advance
the state of the art—more so than in other related technologies due to the commodity
status of memory.

While the memory industry continues to drive forward, most people can relax and enjoy
the benefits—except for those of you who need to join in the fray. For you, the only way
out is straight ahead, and it is for you that we have written this book.
The goal of DRAM Circuit Design: A Tutorial is to bridge the gap between the introduction
to memory design available in most CMOS circuit texts and the advanced articles on
DRAM design that are available in technical journals and symposium digests. The book
introduces the reader to DRAM theory, history, and circuits in a systematic, tutorial
fashion. The level of detail varies, depending on the topic. In most cases, however, our
aim is merely to introduce the reader to a functional element and illustrate it with one or
more circuits. After gaining familiarity with the purpose and basic operation of a given
circuit, the reader should be able to tackle more detailed papers on the subject. We have
included a thorough list of papers in the Appendix for readers interested in taking that
next step.
The book begins in Chapter 1 with a brief history of DRAM device evolution from the first
1Kbit device to the more recent 64Mbit synchronous devices. This chapter introduces the
reader to basic DRAM operation in order to lay a foundation for more detailed discussion
later. Chapter 2 investigates the DRAM memory array in detail, including fundamental
array circuits needed to access the array. The discussion moves into array architecture
issues in Chapter 3, including a design example comparing known architecture types to a
novel, stacked digitline architecture. This design example should prove useful, for it
delves into important architectural trade-offs and exposes underlying issues in memory
design. Chapter 4 then explores peripheral circuits that support the memory array,
including column decoders and redundancy. The reader should find Chapter 5 very
interesting due to the breadth of circuit types discussed. This includes data path elements,
address path elements, and synchronization circuits. Chapter 6 follows with a discussion
of voltage converters commonly found on DRAM designs. The list of converters includes
voltage regulators, voltage references, VDD/2 generators, and voltage pumps. We wrap
up the book with the Appendix, which directs the reader to a detailed list of papers from
major conferences and journals.
Brent Keeth
R. Jacob Baker

Acknowledgments
We acknowledge with thanks the pioneering work accomplished over the past 30 years
by various engineers, manufacturers, and institutions that have laid the foundation for this
book. Memory design is no different than any other field of endeavor in which new
knowledge is built on prior knowledge. We therefore extend our gratitude to past, present,
and future contributors to this field. We also thank Micron Technology, Inc., and the high
level of support that we received for this work. Specifically, we thank the many individuals
at Micron who contributed in various ways to its completion, including Mary Miller, who
gave significant time and energy to build and edit the manuscript, and Jan Bissey and
crew, who provided the wonderful assortment of SEM photographs used throughout the
text.
Brent Keeth
R. Jacob Baker
Chapter 1: An Introduction to DRAM
Dynamic random access memory (DRAM) integrated circuits (ICs) have existed for more
than twenty-five years. DRAMs evolved from the earliest 1-kilobit (Kb) generation to the
recent 1-gigabit (Gb) generation through advances in both semiconductor process and
circuit design technology. Tremendous advances in process technology have
dramatically reduced feature size, permitting ever higher levels of integration. These
increases in integration have been accompanied by major improvements in component
yield to ensure that overall process solutions remain cost-effective and competitive.
Technology improvements, however, are not limited to semiconductor processing. Many
of the advances in process technology have been accompanied or enabled by advances
in circuit design technology. In most cases, advances in one have enabled advances in
the other. In this chapter, we introduce some fundamentals of the DRAM IC, assuming
that the reader has a basic background in complementary metal-oxide semiconductor
(CMOS) circuit design, layout, and simulation [1].

1.1 DRAM TYPES AND OPERATION

To gain insight into how modern DRAM chips are designed, it is useful to look into the
evolution of DRAM. In this section, we offer an overview of DRAM types and modes of
operation.

1.1.1 The 1k DRAM (First Generation)


We begin our discussion by looking at the 1,024-bit DRAM (1,024 × 1 bit). Functional
diagrams and pin connections appear in Figure 1.1 and Figure 1.2, respectively. Note that
there are 10 address inputs with pin labels R1−R5 and C1−C5. Each address input is
connected to an on-chip address input buffer. The input buffers that drive the row (R) and
column (C) decoders in the block diagram have two purposes: to provide a known input
capacitance (CIN) on the address input pins and to detect the input address signal at a
known level so as to reduce timing errors. The level VTRIP, an idealized trip point around
which the input buffers slice the input signals, is important due to the finite transition times
on the chip inputs (Figure 1.3). Ideally, to avoid distorting the duration of the logic zeros
and ones, VTRIP should be positioned at a known level relative to the maximum and
minimum input signal amplitudes. In other words, the reference level should change with
changes in temperature, process conditions, input maximum amplitude (VIH), and input
minimum amplitude (VIL). Having said this, we note that the input buffers used in
first-generation DRAMs were simply inverters.
Figure 1.1: 1,024-bit DRAM functional diagram.

Figure 1.2: 1,024-bit DRAM pin connections.


Figure 1.3: Ideal address input buffer.
Continuing our discussion of the block diagram shown in Figure 1.1, we see that five
address inputs are connected through a decoder to the 1,024-bit memory array in both
the row and column directions. The total number of addresses in each direction, resulting
from decoding the 5-bit word, is 32. The single memory array is made up of 1,024
memory elements laid out in a square of 32 rows and 32 columns. Figure 1.4 illustrates
the conceptual layout of this memory array. A memory element is located at the
intersection of a row and a column.

Figure 1.4: Layout of a 1,024-bit memory array.


By applying an address of all zeros to the 10 address input pins, the memory data located
at the intersection of row 0, RA0, and column 0, CA0, is accessed. (It is either written to
or read out, depending on the state of the R/W input and assuming that the CE pin is
LOW so that the chip is enabled.) It is important to realize that a single bit of memory is
accessed by using both a row and a column address. Modern DRAM chips reduce the
number of external pins required for the memory address by using the same pins for both
the row and column address inputs (address multiplexing). A clock signal row address
strobe (RAS) strobes in a row address and then, on the same set of address pins, a clock
signal column address strobe (CAS) strobes in a column address at a different time.
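As a concrete illustration of this row/column addressing, the sketch below is our own toy model (not anything from the book): it treats the 1k part as a 32 x 32 array of single-bit cells and splits a 10-bit address into 5-bit row and column fields, as in the all-zeros example above.

```python
# Toy model (ours, not from the book) of the 1k DRAM's addressing: 1,024
# single-bit cells arranged as 32 rows x 32 columns, selected by a 5-bit row
# address and a 5-bit column address.
class Array1k:
    def __init__(self):
        self.cells = [[0] * 32 for _ in range(32)]   # 32 x 32 = 1,024 bits

    def access(self, row5, col5, write=None):
        """Read the bit at (row5, col5), or store 'write' there first if given."""
        if write is not None:
            self.cells[row5][col5] = write
        return self.cells[row5][col5]

mem = Array1k()
addr = 0b00000_00000                     # all zeros: row 0 (RA0), column 0 (CA0)
row5, col5 = addr >> 5, addr & 0x1F      # split the 10-bit address into R and C fields
mem.access(row5, col5, write=1)          # write a 1 to that cell
print(mem.access(row5, col5))            # -> 1
```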

Also note how a first-generation memory array is organized as a logical square of memory elements. (At this point, we don’t know what or how the memory elements are made. We just know that there is a circuit at the intersection of a row and column that stores a single bit of data.) In a modern DRAM chip, many smaller memory arrays are organized to achieve a larger memory size. For example, 1,024 smaller memory arrays, each composed of 256 kbits, may constitute a 256-Meg (256 million bits) DRAM.

1.1.1.1 Reading Data Out of the 1k DRAM.


Data can be read out of the DRAM by first putting the chip in the Read mode by pulling
the R/W pin HIGH and then placing the chip enable pin CE in the LOW state. Figure 1.5
illustrates the timing relationships between changes in the address inputs and data
appearing on the DOUT pin. Important timing specifications present in this figure are Read
cycle time (tRC) and Access time (tAC). The term tRC specifies how fast the memory can be
read. If tRC is 500 ns, then the DRAM can supply 1-bit words at a rate of 2 MHz. The term
tAC specifies the maximum length of time after the input address is changed before the
output data (DOUT) is valid.

Figure 1.5: 1k DRAM Read cycle.
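As a quick check of the rate quoted above, the read throughput implied by a cycle time is just its reciprocal; the snippet below (purely illustrative) reproduces the 2 MHz figure.

```python
# Reproduces the read-rate arithmetic from the text: with a read cycle time
# (tRC) of 500 ns, the part delivers one 1-bit word per cycle.
t_rc = 500e-9                          # read cycle time, seconds
rate = 1 / t_rc                        # words per second
print(f"{rate / 1e6:.0f} M words/s")   # -> 2 M words/s, i.e., 2 MHz
```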

1.1.1.2 Writing to the 1k DRAM.


Writing data to the DRAM is accomplished by bringing the R/W input LOW with valid data
present on the DIN pin. Figure 1.6 shows the timing diagram for a Write cycle. The term
Write cycle time (tWC) is related to the maximum frequency at which we can write data into
the DRAM. The term Address to Write delay time (tAW) specifies the time between the
address changing and the R/W input going LOW. Finally, Write pulse width (tWP) specifies
how long the input data must be present before the R/W input can go back HIGH in
preparation for another Read or Write to the DRAM. When writing to the DRAM, we can
think of the R/W input as a clock signal.
Figure 1.6: 1k DRAM Write cycle.

1.1.1.3 Refreshing the 1k DRAM.

The dynamic nature of DRAM requires that the memory be refreshed periodically so as
not to lose the contents of the memory cells. Later we will discuss the mechanisms that
lead to the dynamic operation of the memory cell. At this point, we discuss how memory
Refresh is accomplished for the 1k DRAM.
Refreshing a DRAM is accomplished internally: external data to the DRAM need not be
applied. To refresh the DRAM, we periodically access the memory with every possible
row address combination. A timing diagram for a Refresh cycle is shown in Figure 1.7.
With the CE input pulled HIGH, the address is changed, while the R/W input is used as a
strobe or clock signal. Internally, the data is read out and then written back into the same
location at full voltage; thus, logic levels are restored (or refreshed).

Figure 1.7: 1k DRAM Refresh cycle.

1.1.1.4 A Note on the Power Supplies.


The voltage levels used in the 1k DRAM are unusual by modern-day standards. In
reviewing Figure 1.2, we see that the 1k DRAM chip uses two power supplies: VDD and
VSS. To begin, VSS is a greater voltage than VDD: VSS is nominally 5 V, while VDD is −12 V.
The value of VSS was set by the need to interface to logic circuits that were implemented
using transistor-transistor logic (TTL). The 17-V difference between VDD and VSS
was necessary to maintain a large signal-to-noise ratio in the DRAM array. We discuss
these topics in greater detail later in the book. The VSS power supply used in modern
DRAM designs, at the time of this writing, is generally zero; the VDD is in the neighborhood
of 2.5 V.

1.1.1.5 The 3-Transistor DRAM Cell.


One of the interesting circuits used in the 1k DRAM (and a few of the 4k and 16k DRAMs)
is the 3-transistor DRAM memory cell shown in Figure 1.8. The column- and rowlines
shown in the block diagram of Figure 1.1 are split into Write and Read line pairs. When
the Write rowline is HIGH, M1 turns ON. At this point, the data present on the Write
columnline is passed to the gate of M2, and the information voltage charges or
discharges the input capacitance of M2. The next, and final, step in writing to the mbit cell
is to turn OFF the Write rowline by driving it LOW. At this point, we should be able to see
why the memory is called dynamic. The charge stored on the input capacitance of M2 will
leak off over time.

Figure 1.8: 3-transistor DRAM cell.

If we want to read out the contents of the cell, we begin by first precharging the Read
columnline to a known voltage and then driving the Read rowline HIGH. Driving the Read
rowline HIGH turns M3 ON and allows M2 either to pull the Read columnline LOW or to
not change the precharged voltage of the Read columnline. (If M2’s gate is a logic LOW,
then M2 will be OFF, having no effect on the state of the Read columnline.) The main
drawback of using the 3-transistor DRAM cell, and the reason it is no longer used, is that
it requires two pairs of column and rowlines and a large layout area. Modern 1-transistor,
1-capacitor DRAM cells use a single rowline, a single columnline, and considerably less
area.

1.1.2 The 4k–64 Meg DRAM (Second Generation)


We distinguish second-generation DRAMs from first-generation DRAMs by the
introduction of multiplexed address inputs, multiple memory arrays, and the
1-transistor/1-capacitor memory cell. Furthermore, second-generation DRAMs offer more
modes of operation for greater flexibility or higher speed operation. Examples are page
mode, nibble mode, static column mode, fast page mode (FPM), and extended data out
(EDO). Second-generation DRAMs range in size from 4k (4,096×1 bit, i.e., 4,096 address
locations with 1-bit input/output word size) up to 64 Meg (67,108,864 bits) in memory
sizes of 16 Meg×4 organized as 16,777,216 address locations with 4-bit input/output
word size, 8 Meg×8, or 4 Meg×16.
Two other major changes occurred in second-generation DRAMs: (1) the power supply
transitioned to a single 5 V and (2) the technology advanced from NMOS to CMOS. The
change to a single 5 V supply occurred at the 64kbit density. It simplified system design by
allowing a single power supply to serve the memory, the processor, and any TTL logic used
in the system. As a result, rowlines had to be driven to a voltage greater than 5 V to turn the
NMOS access devices fully ON (more on this later), and the substrate had to be held at a
potential less than zero. For voltages outside the supply range, charge pumps are used (see Chapter 6).
The move from NMOS to CMOS, at the 1Mb density level, occurred because of concerns
over speed, power, and layout size. At the cost of process complexity, complementary
devices improved the design.

1.1.2.1 Multiplexed Addressing.


Figure 1.9 shows a 4k DRAM block diagram, while Figure 1.10 shows the pin connections
for a 4k chip. Note that compared to the block diagram of the 1k DRAM shown in Figure
1.1, the number of address input pins has decreased from 10 to 6, even though the
memory size has quadrupled. This is the result of using multiplexed addressing in which
the same address input pins are used for both the row and column addresses. The row
address strobe (RAS) input clocks the address present on the DRAM address pins A0 to
A5 into the row address latches on the falling edge. The column address strobe (CAS)
input clocks the input address into the column address latches on its falling edge.

Figure 1.9: Block diagram of a 4k DRAM.


Figure 1.10: 4,096-bit DRAM pin connections.
Figure 1.11 shows the timing relationships between RAS, CAS, and the address inputs.
Note that tRC is still (as indicated in the last section) the random cycle time for the DRAM,
indicating the maximum rate we can write to or read from a DRAM. Note too how the row
(or column) address must be present on the address inputs when RAS (or CAS) goes
LOW. The parameters tRAS and tCAS indicate how long RAS and CAS, respectively, must remain
LOW after clocking in a row or column address. The parameters tASR, tRAH, tASC, and tCAH
indicate the setup and hold times for the row and column addresses, respectively.

Figure 1.11: Address timing.

1.1.2.2 Multiple Memory Arrays.


As mentioned earlier, second-generation DRAMs began to use multiple or segmented
memory arrays. The main reason for splitting up the memory into more than one array at
the cost of a larger layout area can be understood by considering the parasitics present in
the dynamic memory circuit element. To understand the origins of these parasitics,
consider the modern DRAM memory cell comprising one MOSFET and one capacitor, as
shown in Figure 1.12.

Figure 1.12: 1-transistor, 1-capacitor (1T1C) memory cell.

In the next section, we cover the operation of this cell in detail. Here we introduce the
operation of the cell. Data is written to the cell by driving the rowline (a.k.a., wordline)
HIGH, turning ON the MOSFET, and allowing the columnline (a.k.a., digitline or bitline) to
charge or discharge the storage capacitor. After looking at this circuit for a moment, we
can make the following observations:
1. The wordline (rowline) may be fabricated using polysilicon (poly). This allows
the MOSFET to be formed by crossing the poly wordline over an n+ active
area.
2. To write a full VCC logic voltage (where VCC is the maximum positive power
supply voltage) to the storage capacitor, the rowline must be driven to a
voltage greater than VCC plus the n-channel MOSFET threshold voltage (with
body effect). This voltage, greater than VCC + VTH, is often labeled VCC pumped (VCCP).
3. The bitline (columnline) may be made using metal or polysilicon. The main
concern, as we’ll show in a moment, is to reduce the parasitic capacitance
associated with the bitline.
Consider the row of N dynamic memory elements shown in Figure 1.13. Typically, in a
modern DRAM, N is 512, which is also the number of bitlines. When a row address is
strobed into the DRAM, via the address input pins using the falling edge of RAS, the
address is decoded to drive a wordline (rowline) to VCCP. This turns ON an entire row in a
DRAM memory array. Turning ON an entire row in a DRAM memory array allows the
information stored on the capacitors to be sensed (for a Read) via the bitlines or allows
the charging or discharging, via the bitlines, of the storage capacitors (for a Write).
Opening a row of data by driving a wordline HIGH is a very important concept for
understanding the modes of DRAM operation. For Refresh, we only need to supply row
addresses during a Refresh operation. For page Reads—when a row is open—a large
amount of data, which is set by the number of columns in the DRAM array, can be
accessed by simply changing the column address.
Figure 1.13: Row of N dynamic memory elements.
We’re now in a position to answer the questions: “Why is the number of columnlines (or
bitlines) used in a memory array limited?” and “Why do we need to break up the memory
into smaller memory arrays?” The answer to these questions comes from the realization
that the more bitlines we use in an array, the longer the delay through the wordline (see
Figure 1.13).
If we drive the wordline on the left side of Figure 1.13 HIGH, the signal will take a finite
time to reach the end of the wordline (the wordline on the right side of Figure 1.13). This is
due to the distributed resistance/capacitance structure formed by the resistance of the
polysilicon wordline and the capacitance of the MOSFET gates. The delay limits the
speed of DRAM operation. To be precise, it limits how quickly a row can be opened and
closed. To reduce this RC time, a polycide wordline is formed by adding a silicide, for
example, a mixture of a refractory metal such as tungsten with polysilicon, on top of
polysilicon. Using a polycide wordline will have the effect of reducing the wordline
resistance. Also, additional drivers can be placed at different locations along the wordline,
or the wordline can be stitched at various locations with metal.

The limit on the number of wordlines can be understood by realizing that adding more
wordlines to the array adds more parasitic capacitance to the bitlines.
This parasitic capacitance becomes important when sensing the value of data charge
stored in the memory element. We’ll discuss this in more detail in the next section.

1.1.2.3 Memory Array Size.


A comment is in order about memory array size and how addressing can be used for
setting word and page size. (We’ll explain what this means in a moment.) If we review the
block diagram of the 4k DRAM shown in Figure 1.9, we see that two 2k-DRAM memory
arrays are used. Each 2k memory is composed of 64 wordlines and 32 bitlines for 2,048
memory elements/address locations per array. In the block diagram, notice that a single
bit, coming from the column decoder, can be used to select data, via the bitlines, from
Array0 or Array1.
From our discussion earlier, we can open a row in Array0 while at the same time opening
a row in Array1 by simply applying a row address to the input address pins and driving
RAS LOW. Once the rows are open, it is a simple matter of changing the column address
to select different data associated with the same open row from either array. If our word
size is 1 bit, we could define a page as being 64 bits in length (32 bits from each array).
We could also define our page size as 32 bits with a 2-bit word for input/output. We would
then say that the DRAM is a 4k DRAM organized as 2k×2. Of course, in the 4k DRAM, in
which the number of bits is small, the concepts of page reads or size aren’t too useful. We
present them here simply to illustrate the concepts. Let’s consider a more practical and
modern configuration.
Suppose we have a 64-Meg DRAM organized as 16 Meg×4 (4 bits input/output) using 4k
row address locations and 4k column address locations (12 bits or pins are needed for
each 4k of addressing). If our (sub) memory array size is 256kbits, then we have a total of
256 memory arrays on our DRAM chip. We’ll assume that there are 512 wordlines and
512 bitlines (digitlines), so that the memory array is logically square. (However, physically,
as we shall see, the array is not square.) Internal to the chip, in the address decoders, we
can divide the row and column addresses into two parts: the lower 9 bits for addressing
the wordlines/bitlines in a 256k memory array and the upper 3 bits for addressing one of
the 64 “group-of-four” memory arrays (6 bits total coming from the upper 3 bits of the row
and column addresses).

Our 4-bit word comes from the group-of-four memory arrays (one bit from each memory
array). We can define a page of data in the DRAM by realizing that when we open a row
in each of the four memory arrays, we are accessing 2k of data (512 bits/array×4 arrays).
By simply changing the column address, without changing the row address (and thus
without opening another group-of-four wordlines), we can access the 2k “page” of data. With a
little imagination, we can see different possibilities for the addressing. For example, we
could open 8 group-of-four memory arrays with a row address and thus increase the page
size to 16k, or we could use more than one bit at a time from an array to increase word
size.
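To make the address-field arithmetic in this example concrete, the sketch below splits the two 12-bit addresses the way the text describes; the function and field names are ours, purely for illustration.

```python
# Illustrative split of the 12-bit row and column addresses from the 64-Meg
# example above: the lower 9 bits select one of 512 wordlines/digitlines inside
# a 256k sub-array, and the upper 3 bits of each address together select one of
# the 64 "group-of-four" arrays. Field names are ours, not the book's.
def split_address(row_addr, col_addr):
    row_in_array = row_addr & 0x1FF                    # lower 9 bits: wordline 0..511
    col_in_array = col_addr & 0x1FF                    # lower 9 bits: digitline 0..511
    group = ((row_addr >> 9) << 3) | (col_addr >> 9)   # 6 bits: group-of-four 0..63
    return group, row_in_array, col_in_array

group, wl, dl = split_address(row_addr=0xABC, col_addr=0x123)
print(group, wl, dl)   # which group-of-four array, which wordline, which digitline
```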

1.1.2.4 Refreshing the DRAM.


Refreshing the DRAM is accomplished by sequentially opening each row in the DRAM.
(We’ll discuss how the DRAM cell is refreshed in greater detail later in the book.) If we
use the 64-Meg example in the last section, we need to supply 4k row addresses to the
DRAM by changing the external address inputs from 000000000000 to 111111111111
while clocking the addresses into the DRAM using the falling edge of RAS. In some
DRAMs, an internal row address counter is present to make the DRAM easier to refresh.
The general specification for 64-Meg DRAM Refresh is that all rows must be refreshed at
least every 64 ms, which is an average of 15.7 μs per row. This means that if the Read
cycle time tRC is 100 ns (see Figure 1.11), it will take 4,096 × 100 ns or 410 μs to refresh a
DRAM with 4k of row addresses. The percentage of time the DRAM is unavailable due to
Refresh can be calculated as 410 μs/64 ms or 0.64% of the time. Note that the Refresh
can be a burst, taking 410 μs as just described, or distributed, where a row is refreshed
every 15.7 μs.
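The refresh-overhead figures above follow from simple arithmetic; the snippet below just reproduces them with the numbers quoted in the text.

```python
# Refresh arithmetic from the text: 4k rows, 100 ns per row, 64 ms refresh window.
rows     = 4096
t_rc     = 100e-9                # read cycle time per row access, seconds
t_window = 64e-3                 # all rows must be refreshed within 64 ms

burst    = rows * t_rc           # time for a burst refresh of every row
overhead = burst / t_window      # fraction of time the DRAM is busy refreshing
spacing  = t_window / rows       # row-to-row interval for distributed refresh

print(f"{burst*1e6:.0f} us  {overhead:.2%}  {spacing*1e6:.2f} us/row")
# -> 410 us  0.64%  15.62 us/row (the text rounds the last figure to 15.7 us)
```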

1.1.2.5 Modes of Operation.

From the last section, we know that we can open a row in one or more DRAM arrays
concurrently, allowing a page of data to be written to or read from the DRAM. In this
section, we look at the different modes of operation possible for accessing this data via
the column address decoder. Our goal in this section is not to present all possible modes
of DRAM operation but rather to discuss the modes that have been used in
second-generation DRAMs. These modes are page mode, nibble mode, static column
mode, fast page mode, and extended data out.
Figure 1.14 shows the timing diagram for a page mode Read, Write, and
Read-Modify-Write. We can understand this timing diagram by first noticing that when
RAS goes LOW, we clock in a row address, decode the row address, and then drive a
wordline in one or more memory arrays to VCCP. The result is an open row(s) of data
sitting on the digitlines (columnlines). Only one row can be opened in any single array at a
time. Prior to opening a row, the bitlines are precharged to a known voltage. (Precharging
to VCC/2 is typically performed using internal circuitry.) Also notice at this time that data
out, DOUT, is in a Hi-Z state; that is, the DRAM is not driving the bus line connected to the
DOUT pin.

Figure 1.14: Page mode.


The next significant timing event occurs when CAS goes LOW and the column address is
clocked into the DRAM (Figure 1.14). At this time, the column address is decoded, and,
assuming that the data from the open row is sitting on the digitlines, it is steered using the
column address decoder to DOUT. We may have an open row of 512 bits, but we are
steering only one bit to DOUT. Notice that when CAS goes HIGH, DOUT goes back to the
Hi-Z state.
By strobing in another column address with the same open row, we can select another bit
of data (again via the column address decoder) to steer to the DOUT pin. In this case,
however, we have changed the DRAM to the Write mode (Figure 1.14). This allows us to
write, via the DIN pin in Figure 1.10, to any column address on the open row while the row
remains open. Later, second-generation DRAMs used the same pins for both data input and
output to reduce pin count. These bidirectional pins are labeled DQ.
The final set of timing signals in Figure 1.14 (the right side) read data out of the DRAM
with R/W HIGH, change R/W to a LOW, and then write to the same location. Again, when
CAS goes HIGH, DOUT goes back to the Hi-Z state.
The remaining modes of operation are simple modifications of page mode. As seen in
Figure 1.15, FPM allows the column address to change while CAS is LOW. The speed of
the DRAM improves by reducing the delay between CAS going LOW and valid data
present, or accessed, on DOUT (tCAC). EDO is simply an FPM DRAM that doesn’t force
DOUT to a Hi-Z state when CAS goes HIGH. The data out of the DRAM is thus available
for a longer period of time, allowing for faster operation. In general, opening the row is the
operation that takes the longest amount of time. Once a row is open, the data sitting on
the columnlines can be steered to DOUT at a fast rate. Interestingly, using column access
modes has been the primary method of boosting DRAM performance over the years.

Figure 1.15: Fast page mode.


The other popular modes of operation in second-generation DRAMs were the static
column and nibble modes. Static column mode DRAMs used flow-through latches in the
column address path. When a column address was changed externally, with CAS LOW,
the column address fed directly to the column address decoder. (The address wasn’t
clocked on the falling edge of CAS.) This increased the speed of the DRAM by preventing
the outputs from going into the Hi-Z state with changes in the column address.
Nibble mode DRAMs used an internal presettable address counter so that by strobing
CAS, the column address would change internally. Figure 1.16 illustrates the timing
operation for a nibble mode DRAM. The first time CAS transitions LOW (first being
defined as the first transition after RAS goes LOW), the column address is loaded into the
counter. If RAS is held LOW and CAS is toggled, the internal address counter is
incremented, and the sequential data appears on the output of the DRAM. The term
nibble mode comes from limiting the number of CAS cycles to four (a nibble).

Figure 1.16: Nibble mode.
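The behavior of the internal nibble counter can be illustrated with a toy model. The sketch below is our own and assumes the counter wraps within a four-column nibble; the class and method names are hypothetical.

```python
# Toy model of nibble-mode column addressing (our illustration, not the book's
# circuit). It assumes the internal counter wraps within a four-column nibble:
# the first CAS* falling edge after RAS* loads the column address, and each
# further CAS* cycle advances to the next column in the nibble.
class NibbleCounter:
    def __init__(self):
        self.base = None      # column address loaded on the first CAS* edge
        self.offset = 0       # position within the four-column nibble

    def cas_falling(self, addr=None):
        if self.base is None:
            self.base = addr                       # first CAS*: latch the column address
        else:
            self.offset = (self.offset + 1) % 4    # later CAS* cycles: step within the nibble
        return (self.base & ~0x3) | ((self.base + self.offset) & 0x3)

nc = NibbleCounter()
cols = [nc.cas_falling(addr=6)] + [nc.cas_falling() for _ in range(4)]
print(cols)   # -> [6, 7, 4, 5, 6]: the address steps through the nibble and wraps
```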

1.1.3 Synchronous DRAMs (Third Generation)


Synchronous DRAMs (SDRAMs) are made by adding a synchronous interface between
the basic core DRAM operation/circuitry of second-generation DRAMs and the control
coming from off-chip to make the DRAM operation faster. All commands and operations
to and from the DRAM are executed on the rising edge of a master or command clock
signal that is common to all SDRAMs and labeled CLK. See Figure 1.17 for the pin
connections of a 64Mb SDRAM with 16-bit input/output (I/O).
Figure 1.17: Pin connections of a 64Mb SDRAM with 16-bit I/O.
At the time of this writing, SDRAMs operate with a maximum CLK frequency in the range
of 100–143 MHz. This means that if a 64Mb SDRAM is organized as a x16 part (that is,
the input/output word size is 16 bits), the maximum rate at which the words can be written
to the part is 200–286 MB/s.
Another variation of the SDRAM is the double-data-rate SDRAM (DDR SDRAM, or simply
DDR DRAM). The DDR parts register commands and operations on the rising edge of the
clock signal while allowing data to be transferred on both the rising and falling edges. A
differential input clock signal is used in the DDR DRAM with the labeling of, not
surprisingly, CLK and CLK* (its complement). In addition, the DDR DRAM provides an output data strobe,
labeled DQS, synchronized with the output data and the input CLK. DQS is used at the
controller to strobe in data from a DRAM. The big benefit of using a DDR part is that the
data transfer rate can be twice the clock frequency because data can be transferred on
both the rising and falling edges of CLK. This means that when using a 133 MHz clock,
the data written to and read from the DRAM can be transferred at 266M words/s. Using
the numbers from the previous paragraph, this means that a 64Mb DDR SDRAM with an
input/output word size of 16 bits will transfer data to and from the memory controller at
400–572 MB/s.
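The data-rate arithmetic above is easy to reproduce. The helper below is our own sketch, with the clock rates taken from the text; it compares single-data-rate and double-data-rate transfers for a x16 part.

```python
# Bandwidth arithmetic for a x16 part, using the clock rates quoted in the text.
def bandwidth_MBps(clock_hz, word_bits, ddr=False):
    transfers = clock_hz * (2 if ddr else 1)     # DDR moves data on both clock edges
    return transfers * (word_bits / 8) / 1e6     # bytes per second, expressed in MB/s

print(bandwidth_MBps(100e6, 16))                 # SDR at 100 MHz -> 200 MB/s
print(bandwidth_MBps(143e6, 16))                 # SDR at 143 MHz -> 286 MB/s
print(bandwidth_MBps(133e6, 16, ddr=True))       # DDR at 133 MHz -> 532 MB/s
```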
Figure 1.18 shows the block diagram of a 64Mb SDRAM with 16-bit I/O. Note that
although CLK is now used for transferring data, we still have the second-generation
control signals CS, WE, CAS, and RAS present on the part. (CKE is a clock enable signal
which, unless otherwise indicated, is assumed HIGH.) Let’s discuss how these control
signals are used in an SDRAM by recalling that in a second-generation DRAM, a Write
was executed by first driving WE and CS LOW. Next a row was opened by applying a row
address to the part and then driving RAS LOW. (The row address is latched on the falling
edge of RAS.) Finally, a column address was applied and latched on the falling edge of
CAS. A short time later, the data applied to the part would be written to the accessed
memory location.

Figure 1.18: Block diagram of a 64Mb SDRAM with 16-bit I/O.

For the SDRAM Write, we change the syntax of the descriptions of what’s happening in
the part. However, the fundamental operation of the DRAM circuitry is the same as that of
the second-generation DRAMs. We can list these syntax changes as follows:
1. The memory is segmented into banks. For the 64Mb memory of Figure 1.17
and Figure 1.18, each bank has a size of 16Mb (organized as 4,096 row
addresses [12 bits]×256 column addresses [8 bits]×16 bits [16 DQ I/O pins]).
As discussed earlier, this is nothing more than a simple logic design of the
address decoder (although in most practical situations, the banks are also laid
out so that they are physically in the same area). The bank selected is
determined by the addresses BA0 and BA1.
2. In second-generation DRAMs, we said, “We open a row,” as discussed earlier.
In SDRAM, we now say, “We activate a row in a bank.” We do this by issuing
an active command to the part. Issuing an active command is accomplished
on the rising edge of CLK with a row/bank address applied to the part with CS
and RAS LOW, while CAS and WE are held HIGH.
3. In second-generation DRAMs, we said, “We write to a location given by a
column address,” by driving CAS LOW with the column address applied to the
part and then applying data to the part. In an SDRAM, we write to the part by
issuing the Write command to the part. Issuing a Write command is
accomplished on the rising edge of CLK with a column/bank address applied
to the part: CS, CAS, and WE are held LOW, and RAS is held HIGH.
Table 1.1 shows the commands used in an SDRAM. In addition, this table shows how
inputs/outputs (DQs) can be masked using the DQ mask (DQM) inputs. This feature is
useful when the DRAM is used in graphics applications.
Table 1.1: SDRAM commands. (Notes: 1)

Name                                                    CS   RAS  CAS  R/W  DQM      ADDR
Command inhibit (NOP)                                   H    X    X    X    X        X
No operation (NOP)                                      L    H    H    H    X        X
Active (select bank and activate row)                   L    L    H    H    X        Bank/row
Read (select bank and column, and start Read burst)     L    H    L    H    L/H (8)  Bank/col
Write (select bank and column, and start Write burst)   L    H    L    L    L/H (8)  Bank/col
Burst terminate                                         L    H    H    L    X        X
PRECHARGE (deactivate row in bank or banks)             L    L    H    L    X        Code
Auto-Refresh or Self-Refresh (enter Self-Refresh mode)  L    L    L    H    X        X
Load mode register                                      L    L    L    L    X        Opcode
Write Enable/output Enable                              —    —    —    —    L        —
Write inhibit/output Hi-Z                               —    —    —    —    H        —
Notes
1. CKE is HIGH for all commands shown except for Self-Refresh.
2. A0–A11 define the op-code written to the mode register.
3. A0–A11 provide the row address, and BA0, BA1 determine which bank is made active.
4. A0–A9 (x4), A0–A8 (x8), or A0–A7 (x16) provide the column address; A10 HIGH enables the auto PRECHARGE feature (nonpersistent), while A10 LOW disables the auto PRECHARGE feature; BA0, BA1 determine which bank is written to.
5. A10 LOW: BA0, BA1 determine the bank being precharged. A10 HIGH: all banks are precharged, and BA0, BA1 are “don’t care.”
6. This command is Auto-Refresh if CKE is HIGH and Self-Refresh if CKE is LOW.
7. Internal Refresh counter controls row addressing; all inputs and I/Os are “don’t care” except for CKE.
8. Activates or deactivates the DQs during Writes (zero-clock delay) and Reads (two-clock delay).
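Table 1.1 is essentially a truth table on CS, RAS, CAS, and WE, so it maps naturally onto a small lookup. The sketch below is our own illustration of that decoding, not a complete or exact model of any real part, and it assumes CKE is HIGH.

```python
# A small decoder in the spirit of Table 1.1 (illustrative only). Signal values
# are 'H' or 'L'; DQM, the address pins, and CKE are ignored here.
COMMANDS = {
    # (CS, RAS, CAS, WE) : command
    ('H', 'X', 'X', 'X'): 'COMMAND INHIBIT (NOP)',
    ('L', 'H', 'H', 'H'): 'NO OPERATION (NOP)',
    ('L', 'L', 'H', 'H'): 'ACTIVE (select bank and activate row)',
    ('L', 'H', 'L', 'H'): 'READ',
    ('L', 'H', 'L', 'L'): 'WRITE',
    ('L', 'H', 'H', 'L'): 'BURST TERMINATE',
    ('L', 'L', 'H', 'L'): 'PRECHARGE',
    ('L', 'L', 'L', 'H'): 'AUTO-REFRESH / SELF-REFRESH',
    ('L', 'L', 'L', 'L'): 'LOAD MODE REGISTER',
}

def decode(cs, ras, cas, we):
    if cs == 'H':                              # chip not selected: command inhibit
        return COMMANDS[('H', 'X', 'X', 'X')]
    return COMMANDS[(cs, ras, cas, we)]

print(decode('L', 'L', 'H', 'H'))              # -> ACTIVE (select bank and activate row)
print(decode('L', 'H', 'L', 'L'))              # -> WRITE
```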

SDRAMs often employ pipelining in the address and data paths to increase operating
speed. Pipelining is an effective tool in SDRAM design because it helps disconnect
operating frequency and access latency. Without pipelining, a DRAM can only process
one access instruction at a time. Essentially, the address is held valid internally until data
is fetched from the array and presented to the output buffers. This single instruction mode
of operation ties operating frequency and access time (or latency) together. However,
with pipelining, additional access instructions can be fed into the SDRAM before prior
access instructions have completed, which permits access instructions to be entered at a
higher rate than would otherwise be allowed. Hence, pipelining increases operating
speed.
Pipeline stages in the data path can also be helpful when synchronizing output data to the
system clock. CAS latency refers to a parameter used by the SDRAM to synchronize the
output data from a Read request with a particular edge of the system clock. A typical
Read for an SDRAM with CAS latency set to three is shown in Figure 1.19. SDRAMs
must be capable of reliably functioning over a range of operating frequencies while
maintaining a specified CAS latency. This is often accomplished by configuring the
pipeline stage to register the output data to a specific clock edge, as determined by the
CAS latency parameter.
Figure 1.19: SDRAM with a latency of three.
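The effect of CAS latency on a Read can be modeled as a short shift register clocked by the system clock. The sketch below is our own behavioral illustration, not a circuit: with CL = 3, data for a READ registered on clock edge 0 reaches the DQs on edge 3.

```python
# Behavioral sketch of CAS latency (our model): data fetched for a READ issued
# on clock edge N appears at the output on edge N + CL.
from collections import deque

class ReadPipeline:
    def __init__(self, cas_latency):
        self.stages = deque([None] * cas_latency)   # one slot per clock of latency

    def clock(self, read_data=None):
        """Advance one clock edge; read_data is data fetched for a new READ."""
        self.stages.append(read_data)
        return self.stages.popleft()                # value reaching the DQs now

pipe = ReadPipeline(cas_latency=3)
outputs = [pipe.clock(read_data='D0' if edge == 0 else None) for edge in range(5)]
print(outputs)   # [None, None, None, 'D0', None] -> D0 appears on edge 3
```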
At this point, we should understand the basics of SDRAM operation, but we may be
asking, “Why are SDRAMs potentially faster than second-generation DRAMs such as
EDO or FPM?” The answer to this question comes from the realization that it’s possible to
activate a row in one bank and then, while the row is opening, perform an operation in
some other bank (such as reading or writing). In addition, one of the banks can be in a
PRECHARGE mode (the bitlines are driven to VCC/2) while accessing one of the other
banks and, thus, in effect hiding PRECHARGE and allowing data to be continuously
written to or read from the SDRAM. (Of course, this depends on which application and
memory address locations are used.) We use a mode register, as shown in Figure 1.20,
to put the SDRAM into specific modes of operation for programmable operation, including
pipelining and burst Reads/Writes of data [2].
Figure 1.20: Mode register.

1.2 DRAM BASICS


A modern DRAM memory cell or memory bit (mbit), as shown in Figure 1.21, is formed
with one transistor and one capacitor, accordingly referred to as a 1T1C cell. The mbit is
capable of holding binary information in the form of stored charge on the capacitor. The
mbit transistor operates as a switch interposed between the mbit capacitor and the
digitline. Assume that the capacitor’s common node is biased at VCC/2, which we will later
show as a reasonable assumption. Storing a logic one in the cell requires a capacitor with
a voltage of +VCC/2 across it. Therefore, the charge stored in the mbit capacitor is

Q = +(VCC/2) · C

where C is the capacitance value in farads. Conversely, storing a logic zero in the cell requires a capacitor with a voltage of −VCC/2 across it. Note that the stored charge on the mbit capacitor for a logic zero is

Q = −(VCC/2) · C

The charge is negative with respect to the VCC/2 common node voltage in this state.
Various leakage paths cause the stored capacitor charge to slowly deplete. To return the
stored charge and thereby maintain the stored data state, the cell must be refreshed. The
required refreshing operation is what makes DRAM memory dynamic rather than static.

Figure 1.21: 1T1C DRAM memory cell. (Note the rotation of the rowline and columnline.)
The digitline referred to earlier consists of a conductive line connected to a multitude of
mbit transistors. The conductive line is generally constructed from either metal or
silicide/polycide polysilicon. Because of the quantity of mbits connected to the digitline
and its physical length and proximity to other features, the digitline is highly capacitive.
For instance, a typical value for digitline capacitance on a 0.35 μm process might be
300fF. Digitline capacitance is an important parameter because it dictates many other
aspects of the design. We discuss this further in Section 2.1. For now, we continue
describing basic DRAM operation.
The mbit transistor gate terminal is connected to a wordline (rowline). The wordline, which
is connected to a multitude of mbits, is actually formed of the same polysilicon as that of
the transistor gate. The wordline is physically orthogonal to the digitline. A memory array
is formed by tiling a selected quantity of mbits together such that mbits along a given
digitline do not share a common wordline and mbits along a common wordline do not
share a common digitline. Examples of this are shown in Figures 1.22 and 1.23. In these
layouts, mbits are paired to share a common contact to the digitline, which reduces the
array size by eliminating duplication.

Figure 1.22: Open digitline memory array schematic.

Figure 1.23: Open digitline memory array layout.

1.2.1 Access and Sense Operations


Next, we examine the access and sense operations. We begin by assuming that the cells
connected to D1, in Figure 1.24, have logic one levels (+VCC/2) stored on them and that
the cells connected to D0 have logic zero levels (−VCC/2) stored on them. Next, we form a
digitline pair by considering two digitlines from adjacent arrays. The digitline pairs, labeled
D0/D0* and D1/D1*, are initially equilibrated to VCC/2 V. All wordlines are initially at 0 V,
ensuring that the mbit transistors are OFF. Prior to a wordline firing, the digitlines are
electrically disconnected from the VCC/2 bias voltage and allowed to float. They remain at
the VCC/2 PRECHARGE voltage due to their capacitance.
Figure 1.24: Simple array schematic (an open DRAM array).
To read mbit1, wordline WL0 changes to a voltage that is at least one transistor VTH above
VCC. This voltage level is referred to as VCCP or VPP. To ensure that a full logic one value
can be written back into the mbit capacitor, VCCP must remain greater than one VTH above
VCC. The mbit capacitor begins to discharge onto the digitline at two different voltage
levels depending on the logic level stored in the cell. For a logic one, the capacitor begins
to discharge when the wordline voltage exceeds the digitline PRECHARGE voltage by
VTH. For a logic zero, the capacitor begins to discharge when the wordline voltage
exceeds VTH. Because of the finite rise time of the wordline voltage, this difference in
turn-on voltage translates into a significant delay when reading ones, as seen in Figure
1.25.

Figure 1.25: Cell access waveforms.


Accessing a DRAM cell results in charge-sharing between the mbit capacitor and the
digitline capacitance. This charge-sharing causes the digitline voltage either to increase
for a stored logic one or to decrease for a stored logic zero. Ideally, only the digitline
connected to the accessed mbit will change. In reality, the other digitline voltage also
changes slightly, due to parasitic coupling between digitlines and between the firing
wordline and the other digitline. (This is especially true for the folded bitline architecture
discussed later.) Nevertheless, a differential voltage develops between the two digitlines.
The magnitude of this voltage difference, or signal, is a function of the mbit capacitance
(Cmbit), digitline capacitance (Cdigit), and voltage stored on the cell prior to access (Vcell).
See Figure 1.26. Accordingly,

Vsignal = Vcell · Cmbit / (Cmbit + Cdigit)

Figure 1.26: DRAM charge-sharing.

A Vsignal of approximately 235 mV results from a design in which Vcell = 1.65 V, Cmbit = 50fF, and
Cdigit = 300fF.
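Plugging the example numbers into the charge-sharing expression gives the quoted signal level; the short check below uses only the values stated in the text.

```python
# Charge-sharing check: Vsignal = Vcell * Cmbit / (Cmbit + Cdigit), using the
# book's example numbers.
v_cell  = 1.65       # volts stored on the cell relative to the VCC/2 precharge
c_mbit  = 50e-15     # mbit capacitance, farads
c_digit = 300e-15    # digitline capacitance, farads

v_signal = v_cell * c_mbit / (c_mbit + c_digit)
print(f"{v_signal*1e3:.1f} mV")    # -> 235.7 mV, matching the ~235 mV quoted in the text
```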
After the cell has been accessed, sensing occurs. Sensing is essentially the amplification
of the digitline signal or the differential voltage between the digitlines. Sensing is
necessary to properly read the cell data and refresh the mbit cells. (The reason for
forming a digitline pair now becomes apparent.) Figure 1.27 presents a schematic
diagram for a simplified sense amplifier circuit: a cross-coupled NMOS pair and a
cross-coupled PMOS pair. The sense amplifiers also appear like a pair of cross-coupled
inverters in which ACT and NLAT* provide power and ground. The NMOS pair or
Nsense-amp has a common node labeled NLAT* (for Nsense-amp latch).

Figure 1.27: Sense amplifier schematic.


Similarly, the Psense-amp has a common node labeled ACT (for Active pull-up). Initially,
NLAT* is biased to VCC/2, and ACT is biased to VSS or signal ground. Because the
digitline pair D1 and D1* are both initially at VCC/2, the Nsense-amp transistors are both
OFF. Similarly, both Psense-amp transistors are OFF. Again, when the mbit is accessed,
a signal develops across the digitline pair. While one digitline contains charge from the
cell access, the other digitline does not but serves as a reference for the Sensing
operation. The sense amplifiers are generally fired sequentially: the Nsense-amp first,
then the Psense-amp. Although designs vary at this point, the higher drive and better VTH
matching of NMOS transistors give Nsense-amps better sensing characteristics, and thus
lower error probability, than Psense-amps.
Waveforms for the Sensing operation are shown in Figure 1.28. The Nsense-amp is fired
by bringing NLAT* (N sense-amp latch) toward ground. As the voltage difference between
NLAT* and the digitlines (D1 and D1* in Figure 1.27) approaches VTH, the NMOS
transistor whose gate is connected to the higher voltage digitline begins to conduct. This
conduction occurs first in the subthreshold and then in the saturation region as the
gate-to-source voltage exceeds VTH and causes the low-voltage digitline to discharge
toward the NLAT* voltage. Ultimately, NLAT* will reach ground and the digitline will be
brought to ground potential. Note that the other NMOS transistor will not conduct: its gate
voltage is derived from the low-voltage digitline, which is being discharged toward ground.
In reality, parasitic coupling between digitlines and limited subthreshold conduction by the
second transistor result in voltage reduction on the high digitline.

Figure 1.28: Sensing operation waveforms.


Sometime after the Nsense-amp fires, ACT will be brought toward VCC to activate the
Psense-amp, which operates in a complementary fashion to the Nsense-amp. With the
low-voltage digitline approaching ground, there is a strong signal to drive the appropriate
PMOS transistor into conduction. This conduction, again moving from subthreshold to
saturation, charges the high-voltage digitline toward ACT, ultimately reaching VCC.
Because the mbit transistor remains ON, the mbit capacitor is refreshed during the
Sensing operation. The voltage, and hence charge, which the mbit capacitor held prior to
accessing, is restored to a full level: VCC for a logic one and ground for a logic zero. It
should be apparent now why the minimum wordline voltage is a VTH above VCC. If VCCP
were anything less, a full VCC level could not be written back into the mbit capacitor. The
mbit transistor source voltage Vsource cannot be greater than Vgate−VTH because this would
turn OFF the transistor.

1.2.2 Write Operation


A Write operation is similar to a Sensing and Restore operation except that a separate
Write driver circuit determines the data that is placed into the cell. The Write driver circuit
is generally a tristate inverter connected to the digitlines through a second pair of pass
transistors, as shown in Figure 1.29. These pass transistors are referred to as I/O
transistors. The gate terminals of the I/O transistors are connected to a common column
select (CSEL) signal. The CSEL signal is decoded from the column address to select
which pair (or multiple pairs) of digitlines is routed to the output pad or, in this case, the
Write driver.
Figure 1.29: Sense amplifier schematic with I/O devices.
In most current DRAM designs, the Write driver simply overdrives the sense amplifiers,
which remain ON during the Write operation. After the new data is written into the sense
amplifiers, the amplifiers finish the Write cycle by restoring the digitlines to full rail-to-rail
voltages. An example is shown in Figure 1.30 in which D1 is initially HIGH after the
Sensing operation and LOW after the writing operation. A Write operation usually involves
only 2–4 mbits within an array of mbits because a single CSEL line is generally connected
to only four pairs of I/O transistors. The remaining digitlines are accessed through
additional CSEL lines that correspond to different column address locations.

Figure 1.30: Write operation waveforms.

1.2.3 Opening a Row (Summary)


Opening a row of mbits in a DRAM array is a fundamental operation for both reading and
writing to the DRAM array. The chain of events that leads to an open row, viewed from a
circuit designer’s perspective, is sometimes called the RAS timing chain. We summarize the
RAS timing chain of events below, assuming that for a second-generation DRAM both
RAS and CAS are HIGH. (It’s trivial to extend our discussion to third-generation DRAMs
where RAS and CAS are effectively generated from the control logic.)
1. Initially, both RAS and CAS are HIGH. All bitlines in the DRAM are
driven to VCC/2, while all wordlines are at 0 V. This ensures that all of
the mbit access transistors in the DRAM are OFF.
2. A valid row address is applied to the DRAM and RAS goes LOW. While
the row address is being latched, on the falling edge of RAS, and
decoded, the bitlines are disconnected from the VCC/2 bias and allowed
to float. The bitlines at this point are charged to VCC/2, and they can be
thought of as capacitors.
3. The row address is decoded and applied to the wordline drivers. This
forces only one rowline in at least one memory array to VCCP. Driving
the wordline to VCCP turns ON the mbits attached to this rowline and
causes charge-sharing between the mbit capacitance and the
capacitance of the corresponding bitline. The result is a small
perturbation (upwards for a logic one and downwards for a logic zero)
in the bitline voltages.
4. The next operation is Sensing, which has two purposes: (a) to determine
if a logic one or zero was written to the cell and (b) to refresh the
contents of the cell by restoring a full logic zero (0 V) or one (VCC) to the
capacitor. Following the wordlines going HIGH, the Nsense-amp is
fired by driving, via an n-channel MOSFET, NLAT* to ground. The
inputs to the sense amplifier are two bitlines: the bitline we are sensing
and the bitline that is not active (a bitline that is still charged to
VCC/2—an inactive bitline). Pulling NLAT* to ground results in one of
the bitlines going to ground. Next, the ACT signal is pulled up to VCC,
driving the other bitline to VCC. Some important notes:
a. It doesn’t matter if a logic one or logic zero was sensed because the inactive and active bitlines are pulled in opposite directions.
b. The contents of the active cell, after opening a row, are restored to full voltage levels (either 0 V or VCC). The entire DRAM can be refreshed by opening each row.
Now that the row is open, we can write to or read from the DRAM. In either case, it is a
simple matter of steering data to or from the active array(s) using the column decoder.
When writing to the array, buffers set the new logic voltage levels on the bitlines. The row
is still open because the wordline remains HIGH. (The row stays open as long as RAS is
LOW.)
When reading data out of the DRAM, the values sitting on the bitlines are transmitted to
the output buffers via the I/O MOSFETs. To increase the speed of the reading operation,
this data, in most situations, is transmitted to the output buffer (sometimes called a DQ
buffer) either through a helper flip-flop or another sense amplifier.

A note is in order here regarding the word size stored in or read out of the memory array.
We may have 512 active bitlines when a single rowline in an array goes HIGH (keeping in
mind once again that only one wordline in an array can go HIGH at any given time). This
means that we could have a word size of 512 bits from the active array. The
inherently wide word size has led to the push, at the time of this writing, toward embedding
DRAM with a processor (for example, for graphics or data processing). The wide word size and the fact
that the word doesn’t have to be transmitted off-chip can result in lower-power,
higher-speed systems. (Because the memory and processor don’t need to communicate
off-chip, there is no need for power-hungry, high-speed buffers.)
1.2.4 Open/Folded DRAM Array Architectures
Throughout the book, we make a distinction between the open array architecture as
shown in Figures 1.22 and 1.24 and the folded DRAM array used in modern DRAMs and
seen in Figure 1.31. At the cost of increased layout area, folded arrays increase noise
immunity by moving sense amp inputs next to each other. These sense amp inputs come
directly from the DRAM array. The term folded comes from taking the DRAM arrays seen
in Figure 1.24 and folding them together to form the topology seen in Figure 1.31.

Figure 1.31: A folded DRAM array.


FOR FURTHER REFERENCE


See the Appendix for additional readings and references.
Chapter 2: The DRAM Array
This chapter begins a more detailed examination of standard DRAM array elements. This
examination is necessary for a clear understanding of fundamental DRAM elements and
how they are used in memory block construction. A common point of reference is required
before considering the analysis of competing array architectures. Included in this chapter
is a detailed discussion of mbits, array configurations, sense amplifier elements, and row
decoder elements.

2.1 THE MBIT CELL


The primary advantage of DRAM over other types of memory technology is low cost. This
advantage arises from the simplicity and scaling characteristics of its one-transistor, one-capacitor (1T1C) memory cell
[1]. Although the DRAM mbit is simple conceptually, its actual design and implementation
are highly complex. Therefore, successful, cost-effective DRAM designs require a
tremendous amount of process technology.
Figure 2.1 presents the layout of a modern buried capacitor DRAM mbit pair. (Buried
means that the capacitor is below the digitline.) This type of mbit is also referred to as a
bitline over capacitor (BOC) cell. Because sharing a contact significantly reduces overall
cell size, DRAM mbits are constructed in pairs. In this way, a digitline contact can be
shared. The mbits comprise an active area rectangle (in this case, an n+ active area), a
pair of polysilicon wordlines, a single digitline contact, a metal or polysilicon digitline, and
a pair of cell capacitors formed with an oxide-nitride-oxide dielectric between two layers of
polysilicon. For most processes, the wordline polysilicon is silicided to reduce sheet
resistance, permitting longer wordline segments without reducing speed. The mbit layout,
as shown in Figure 2.1, is essentially under the control of process engineers, for every
aspect of the mbit must meet stringent performance and yield criteria.

Figure 2.1: Mbit pair layout.


A small array of mbits appears in Figure 2.2. This figure is useful to illustrate several
features of the mbit. First, note that the digitline pitch (width plus space) dictates the
active area pitch and the capacitor pitch. Process engineers adjust the active area width
and the field oxide width to maximize transistor drive and minimize transistor-to-transistor
leakage. Field oxide technology greatly impacts this balance. A thicker field oxide or a
shallower junction depth affords a wider transistor active area. Second, the wordline pitch
(width plus space) dictates the space available for the digitline contact, transistor length,
active area, field poly width, and capacitor length. Optimization of each of these features
by process engineers is necessary to maximize capacitance, minimize leakage, and
maximize yield. Contact technology, subthreshold transistor characteristics,
photolithography, and etch and film technology dictate the overall design.

Figure 2.2: Layout to show array pitch.


At this point in the discussion, it is appropriate to introduce the concept of feature size and
how it relates to cell size. The mbit shown in Figures 2.1 and 2.2 is by definition an
eight-square feature cell (8F2) [2] [3]. The intended definition of feature (F) in this case is
minimum realizable process dimension but in fact equates to a dimension that is one-half
the wordline (row) or digitline (column) pitch. A 0.25 μm process having wordline and
digitline pitches of 0.6 μm therefore has F = 0.3 μm and yields an mbit size of
8F2 = 8 × (0.3 μm)² = 0.72 μm².
It is easier to explain the 8F2 designation with the aid of Figure 2.3. An imaginary box
drawn around the mbit defines the cell’s outer boundary. Along the x-axis, this box
includes one-half digitline contact feature, one wordline feature, one capacitor feature,
one field poly feature, and one-half poly space feature for a total of four features. Along
the y-axis, this box contains two one-half field oxide features and one active area feature
for a total of two features. The area of the mbit is therefore 4F × 2F = 8F2.
Figure 2.3: Layout to show 8F2 derivation.


The folded array architecture, as shown in Figure 2.2, always produces an 8F2 mbit. This
results from the fact that each wordline connects or forms a crosspoint with an mbit
transistor on every other digitline and must pass around mbit transistors as field poly on
the remaining digitlines. The field poly in each mbit cell adds two square features to what
would have been a 6F2 cell. Although the folded array yields a cell that is 33% larger than
the 6F2 cell available to other array architectures (equivalently, the 6F2 cell is 25% smaller), it also produces superior signal-to-noise performance,
especially when combined with some form of digitline twisting [4]. Superior low-noise
performance has made folded array architecture the architecture of choice since the
64kbit generation [5].
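As a quick arithmetic check of the preceding feature-size discussion, the sketch below computes the 8F2 and 6F2 cell areas for the 0.25 μm example (F = 0.3 μm, one-half the 0.6 μm pitch). The numbers follow directly from the definitions above; the code itself is only illustrative.

def cell_area_um2(feature_um, squares):
    """Cell area in square microns for a cell occupying 'squares' square features."""
    return squares * feature_um ** 2

F = 0.3  # one-half of the 0.6 um wordline/digitline pitch
a8 = cell_area_um2(F, 8)   # folded-array 8F2 cell
a6 = cell_area_um2(F, 6)   # open-array (crosspoint) 6F2 cell
print(f"8F2 cell: {a8:.2f} um2, 6F2 cell: {a6:.2f} um2")
print(f"The 8F2 cell is {100 * (a8 / a6 - 1):.0f}% larger; the 6F2 cell is "
      f"{100 * (1 - a6 / a8):.0f}% smaller")
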
A folded array is schematically depicted in Figure 2.4. Sense amplifier circuits placed at
the edge of each array connect to both true and complement digitlines (D and D*) coming
from a single array. Optional digitline pair twisting in one or more positions reduces and
balances the coupling to adjacent digitline pairs and improves overall signal-to-noise
characteristics [4]. Figure 2.5 shows the variety of twisting schemes used throughout the
DRAM industry [6].

Figure 2.4: Folded digitline array schematic.


Figure 2.5: Digitline twist schemes.

Ideally, a twisting scheme equalizes the coupling terms from each digitline to all other
digitlines, both true and complement. If implemented properly, the noise terms cancel or
at least produce only common-mode noise to which the differential sense amplifier is
more immune.
Each digitline twist region consumes valuable silicon area. Thus, design engineers resort
to the simplest and most efficient twisting scheme to get the job done. Because the
coupling between adjacent metal lines is inversely proportional to the line spacing, the
signal-to-noise problem gets increasingly worse as DRAMs scale to smaller and smaller
dimensions. Hence, the industry trend is toward use of more complex twisting schemes
on succeeding generations [6] [7].
An alternative to the folded array architecture, popular prior to the 64kbit generation [1],
was the open digitline architecture. Seen schematically in Figure 2.6, this architecture
also features the sense amplifier circuits between two sets of arrays [8]. Unlike the folded
array, however, true and complement digitlines (D and D*) connected to each sense
amplifier pair come from separate arrays [9]. This arrangement precludes using digitline
twisting to improve signal-to-noise performance, which is the prevalent reason why the
industry switched to folded arrays. Note that unlike the folded array architecture, each
wordline in an open digitline architecture connects to mbit transistors on every digitline,
creating crosspoint-style arrays.

Figure 2.6: Open digitline array schematic.


This feature permits a 25% reduction in mbit size to only 6F2 because the wordlines do
not have to pass alternate mbits as field poly. The layout for an array of standard 6F2 mbit
pairs is shown in Figure 2.7 [2]. A box is drawn around one of the mbits to show the 6F2
cell boundary. Again, two mbits share a common digitline contact to improve layout
efficiency. Unfortunately, most manufacturers have found that the signal-to-noise
problems of open digitline architecture outweigh the benefits derived from reduced array
size [8].

Figure 2.7: Open digitline array layout. (Feature size (F) is equal to one-half digitline pitch.)

Digitline capacitive components, contributed by each mbit, include junction capacitance,
digitline-to-cellplate (poly3), digitline-to-wordline, digitline-to-digitline, digitline-to-substrate,
and, in some cases, digitline-to-storage cell (poly2) capacitance. Therefore, each mbit
connected to the digitline adds a specific amount of capacitance to the digitline. Most
modern DRAM designs have no more than 256 mbits connected to a digitline segment.
Two factors dictate this quantity. First, for a given cell size, as determined by row and
column pitches, a maximum storage capacitance can be achieved without resorting to
exotic processes or excessive cell height. For processes in which the digitline is above
the storage capacitor (buried capacitor), contact technology determines the maximum
allowable cell height. This fixes the volume available (cell area multiplied by cell height) in
which to build the storage capacitor. Second, as the digitline capacitance increases, the
power associated with charging and discharging this capacitance during Read and Write
operations increases. Any given wordline essentially accesses (crosses) all of the
columns within a DRAM. For a 256-Meg DRAM, each wordline crosses 16,384 columns.
With a multiplier such as that, it is easy to appreciate why limits to digitline capacitance
are necessary to keep power dissipation low.
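To see why roughly 256 mbits per digitline is a practical ceiling, the sketch below estimates the digitline capacitance and the resulting cell-to-digitline charge-transfer ratio. The 0.8 fF-per-mbit loading and 30 fF cell capacitance match the design parameters tabulated later in Table 3.1; the extra 30 fF of sense-node and wiring parasitics and the 3.3 V supply are assumptions added for illustration.

C_PER_MBIT = 0.8e-15   # digitline capacitance added by each mbit (Table 3.1 value)
C_PARASITIC = 30e-15   # assumed sense-node and wiring parasitics
C_CELL = 30e-15        # storage cell capacitance (Table 3.1 value)
VCC = 3.3              # assumed supply voltage

for n_mbits in (128, 256, 512):
    c_digit = n_mbits * C_PER_MBIT + C_PARASITIC
    transfer = C_CELL / (C_CELL + c_digit)      # fraction of the stored level seen as signal
    signal_mv = transfer * (VCC / 2) * 1000     # one-level signal relative to VCC/2
    print(f"{n_mbits:4d} mbits: Cdigit = {c_digit * 1e15:5.1f} fF, signal = {signal_mv:4.0f} mV")

Doubling the digitline length roughly halves the available signal while doubling the switched capacitance, so both sensing margin and power push designs toward shorter digitlines.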
Figure 2.8 presents a process cross section for the buried capacitor mbit depicted in
Figure 2.2, and Figure 2.9 shows a SEM image of the buried capacitor mbit. This type of
mbit, employing a buried capacitor structure, places the digitline physically above the
storage capacitor [10]. The digitline is constructed from either metal or polycide, while the
digitline contact is formed using a metal or polysilicon plug technology. The mbit capacitor
is formed with polysilicon (poly2) as the bottom plate, an oxide-nitride-oxide (ONO)
dielectric, and a sheet of polysilicon (poly3). This top sheet of polysilicon becomes a
common node shared by all mbit capacitors. The capacitor shape can be simple, such as
a rectangle, or complex, such as concentric cylinders or stacked discs. The most complex
capacitor structures are the topic of many DRAM process papers [11] [12] [13].

Figure 2.8: Buried capacitor cell process cross section.

Figure 2.9: Buried capacitor cell process SEM image.


The ONO dielectric undergoes optimization to achieve maximum capacitance with
minimum leakage. It must also tolerate the maximum DRAM operating voltage without
breaking down. For this reason, the cellplate (poly3) is normally biased at +VCC/2 V,
ensuring that the dielectric has no more than VCC/2 V across it for either stored logic state,
a logic one at +VCC/2 V or a logic zero at −VCC/2 V.
Two other basic mbit configurations are used in the DRAM industry. The first, shown in
Figures 2.10, 2.11, and 2.12, is referred to as a buried digitline or capacitor over bitline
(COB) cell [14] [15]. The digitline in this cell is almost always formed of polysilicon rather
than of metal.

Figure 2.10: Buried digitline mbit cell layout.

Figure 2.11: Buried digitline mbit process cross section.

Figure 2.12: Buried digitline mbit process SEM image.


As viewed from the top, the active area is normally bent or angled to accommodate the
storage capacitor contact that must drop between digitlines. An advantage of the buried
digitline cell over the buried capacitor cell of Figure 2.8 is that its digitline is physically very
close to the silicon surface, making digitline contacts much easier to produce. The angled
active area, however, reduces the effective active area pitch, constraining the isolation
process even further. In buried digitline cells, it is also very difficult to form the capacitor
contact. Because the digitline is at or near minimum pitch for the process, insertion of a
contact between digitlines can be difficult.
Figures 2.13 and 2.14 present a process cross section of the third type of mbit used in the
construction of DRAMs. Using trench storage capacitors, this cell is accordingly called a
trench cell [12] [13]. Trench capacitors are formed in the silicon substrate, rather than
above the substrate, after etching deep holes into the wafer. The storage node is a doped
polysilicon plug, which is deposited in the hole following growth or deposition of the
capacitor dielectric. Contact between the storage node plug and the transistor drain is
usually made through a poly strap.

Figure 2.13: Trench capacitor mbit process cross section.


Figure 2.14: Trench capacitor mbit process SEM image.
With most trench capacitor designs, the substrate serves as the common-node
connection to the capacitors, preventing the use of +VCC/2 bias and thinner dielectrics.
The substrate is heavily doped around the capacitor to reduce resistance and improve the
capacitor’s CV characteristics. A real advantage of the trench cell is that the capacitance
can be increased by merely etching a deeper hole into the substrate [16].

Furthermore, the capacitor does not add stack height to the design, greatly simplifying
contact technology. The disadvantage of trench capacitor technology is the difficulty
associated with reliably building capacitors in deep silicon holes and connecting the
trench capacitor to the transistor drain terminal.

2.2 THE SENSE AMP


The term sense amplifier actually refers to a collection of circuit elements that pitch up to
the digitlines of a DRAM array. This collection most generally includes isolation
transistors, devices for digitline equilibration and bias, one or more Nsense-amplifiers,
one or more Psense-amplifiers, and devices connecting selected digitlines to I/O signal
lines. All of the circuits along with the wordline driver circuits, as discussed in Section 2.3,
are called pitch cells. This designation comes from the requirement that the physical
layout for these circuits is constrained by the digitline and wordline pitches of an array of
mbits. For example, the sense amplifiers for a specific digitline pair (column) are generally
laid out within the space of four digitlines. With one sense amplifier for every four
digitlines, this is commonly referred to as quarter pitch or four pitch.

2.2.1 Equilibration and Bias Circuits


The first elements analyzed are the equilibration and bias circuits. As we can recall from
the discussions of DRAM operation in Section 1.1, the digitlines start at VCC/2 V prior to
cell Access and Sensing [17]. It is vitally important to the Sensing operation that both
digitlines, which form a column pair, are of the same voltage before the wordline is fired.
Any offset voltage appearing between the pair directly reduces the effective signal
produced during the Access operation [5]. Equilibration of the digitlines is accomplished
with one or more NMOS transistors connected between the digitline conductors. NMOS is
used because of its higher drive capability and the resulting faster equilibration.
An equilibration transistor, together with bias transistors, is shown in Figure 2.15. The
gate terminal is connected to a signal called equilibrate (EQ). EQ is held to VCC whenever
the external row address strobe signal RAS is HIGH. This indicates an inactive or
PRECHARGE state for the DRAM. After RAS has fallen, EQ transitions LOW, turning the
equilibration transistor OFF just prior to any wordline firing. EQ will again transition HIGH
at the end of a RAS cycle to force equilibration of the digitlines. The equilibration
transistor is sized large enough to ensure rapid equilibration of the digitlines to prepare
the part for a subsequent access.

Figure 2.15: Equilibration schematic.


As shown in Figure 2.15, two more NMOS transistors accompany the EQ transistor to
provide a bias level of VCC/2 V. These devices operate in conjunction with equilibration to
ensure that the digitline pair remains at the prescribed voltage for Sensing. Normally,
digitlines that are at VCC and ground equilibrate to VCC/2 V [5]. The bias devices ensure
that this occurs and also that the digitlines remain at VCC/2, despite leakage paths that
would otherwise discharge them. Again, for the same reasons as for the equilibration
transistor, NMOS transistors are used. Most often, the bias and equilibration transistors
are integrated to reduce their overall size. VCC/2 V PRECHARGE is used on most modern
DRAMs because it reduces power consumption and Read-Write times and improves
Sensing operations. Power consumption is reduced because a VCC/2 PRECHARGE
voltage can be obtained by equilibrating the digitlines (which are at VCC and ground,
respectively) at the end of each cycle.
The charge-sharing between the digitlines produces VCC/2 without additional ICC current.
The IBM® PMOS 16-Meg DRAM designs are exceptions: they equilibrate and bias
the digitlines to VCC [18]. Because the wordlines and digitlines are both at VCC when the
part is inactive, row-to-column shorts that exist do not contribute to increased standby
current. VCC/2 PRECHARGE DRAMs, on the other hand, suffer higher standby current
with row-to-column shorts because the wordlines and digitlines are at different potentials
when the part is inactive. A layout for the equilibration and bias circuits is shown in Figure
2.16.

Figure 2.16: Equilibration and bias circuit layout.

2.2.2 Isolation Devices


Isolation devices are important to the sense amplifier circuits. These devices are NMOS
transistors placed between the array digitlines and the sense amplifiers (see Figure 2.19).
Isolation transistors are physically located on both ends of the sense amplifier layout. In
quarter-pitch sense amplifier designs, there is one isolation transistor for every two
digitlines. Although this is twice the active area width and space of an array, it
nevertheless sets the limit for isolation processing in the pitch cells.

Figure 2.19: Standard sense amplifier block.


The isolation devices provide two functions. First, if the sense amps are positioned
between and connected to two arrays, they electrically isolate one of the two arrays. This
is necessary whenever a wordline fires in one array because isolation of the second array
reduces the digitline capacitance driven by the sense amplifiers, thus speeding
Read-Write times, reducing power consumption, and extending Refresh for the isolated
array. Second, the isolation devices provide resistance between the sense amplifier and
the digitlines. This resistance stabilizes the sense amplifiers and speeds up the Sensing
operation by somewhat isolating the highly capacitive digitlines from the low-capacitance
sense nodes [19]. Capacitance of the sense nodes between isolation transistors is
generally less than 15fF, permitting the sense amplifier to latch much faster than if it were
solidly connected to the digitlines. The isolation transistors slow Write-Back to the mbits,
but this is far less of a problem than initial Sensing.

2.2.3 Input/Output Transistors


The input/output (I/O) transistors allow data to be read from and written to specific
digitline pairs. A single I/O transistor is connected to each sense node as shown in Figure
2.17. The outputs of each I/O transistor are connected to I/O signal pairs. Commonly,
there are two pairs of I/O signal lines, which permit four I/O transistors to share a single
column select (CSEL) control signal. DRAM designs employing two or more metal layers
run the column select lines across the arrays in either Metal2 or Metal3. Each column
select can activate four I/O transistors on each side of an array to connect four digitline
pairs (columns) to peripheral data path circuits. The I/O transistors must be sized
carefully to ensure that instability is not introduced into the sense amplifiers by the I/O
bias voltage or remnant voltages on the I/O lines.

Figure 2.17: I/O transistors.


Although designs vary significantly as to the numerical ratio, I/O transistors are designed
to be two to eight times smaller than the Nsense-amplifier transistors. This is sometimes
referred to as beta ratio. A beta ratio between five and eight is considered standard;
however, it can only be verified with silicon. Simulations fail to adequately predict sense
amplifier instability, although theory would predict better stability with higher beta ratio
and better Write times with lower beta ratio. During Write, the sense amplifier remains ON
and must be overdriven by the Write driver (see Section 1.2.2).
2.2.4 Nsense- and Psense-Amplifiers
The fundamental elements of any sense amplifier block are the Nsense-amplifier and the
Psense-amplifier. These amplifiers, as previously discussed, work together to detect the
access signal voltage and drive the digitlines accordingly, to VCC and ground. The
Nsense-amplifier depicted in Figure 2.18 consists of cross-coupled NMOS transistors.
The Nsense-amplifier drives the low-potential digitline to ground. Similarly, the
Psense-amplifier consists of cross-coupled PMOS transistors and drives the
high-potential digitline to VCC.

Figure 2.18: Basic sense amplifier block.


The sense amplifiers are carefully designed to guarantee correct detection and
amplification of the small signal voltage produced during cell access (less than 200mV)
[5]. Matching of transistor VTH, transconductance, and junction capacitance within close
tolerances helps ensure reliable sense amplifier operation. Ultimately, the layout dictates
the overall balance and performance of the sense amplifier block. As a result, a
tremendous amount of time is spent ensuring that the sense amplifier layout is optimum.
Symmetry and exact duplication of elements are critical to a successful design. This
includes balanced coupling to all sources of noise, such as I/O lines and latch signals
(NLAT* and ACT). Balance is especially critical for layout residing inside the isolation
transistors. And because the sense node capacitance is very low, it is more sensitive to
noise and circuit imbalances.
While the majority of DRAM designs latch the digitlines to VCC and ground, a growing
number of designs are beginning to reduce these levels. Various technical papers report
improved Refresh times and lower power dissipation through reductions in latch voltages
[20] [21]. At first, this appears contradictory: writing a smaller charge into the memory cell
would require refreshing the cell more often. However, by maintaining lower
drain-to-source voltages (VDS) and negative gate-to-source voltages (VGS) across
nonaccessed mbit transistors, substantially lower subthreshold leakage and longer
Refresh times can be realized, despite the smaller stored charge. An important concern
not treated in detail here is the leakage resulting from defects in the silicon crystal
structure. These defects can result in an increase in the drain/substrate diode saturation
current and can practically limit the Refresh times.
Most designs that implement reduced latch voltages generally raise the Nsense-amplifier
latch voltage without lowering the Psense-amplifier latch voltage. Designated as boosted
sense ground designs, they write data into each mbit using full VCC for a logic one and
boosted ground for a logic zero. The sense ground level is generally a few hundred
millivolts above true ground. In standard DRAMs, which drive digitlines fully to ground, the
VGS of nonaccessed mbits becomes zero when the digitlines are latched. This results in
high subthreshold leakage for a stored one level because full VCC exists across the mbit
transistor while the VGS is held to zero. Stored zero levels do not suffer from prolonged
subthreshold leakage: any amount of cell leakage produces a negative VGS for the
transistor. The net effect is that a stored one level leaks away much faster than a stored
zero level. One’s level retention, therefore, establishes the maximum Refresh period for
most DRAM designs. Boosted sense ground extends Refresh by reducing subthreshold
leakage for stored ones. This is accomplished by guaranteeing negative gate-to-source
bias on nonaccessed mbit transistors. The benefit of extended Refresh from these
designs is somewhat diminished, though, by the added complexity of generating boosted
ground levels and the problem of digitlines that no longer equilibrate at VCC/2 V.
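A rough sense of the leverage involved can be obtained from a first-order subthreshold model. The sketch below assumes an ideal exponential subthreshold characteristic with a swing near 90 mV/decade and a boosted sense-ground level of about 0.3 V; both numbers are assumptions for illustration, not values from the text.

def leakage_reduction(vgs_shift_v, swing_v_per_decade=0.090):
    """Factor by which subthreshold current drops when VGS is lowered by vgs_shift_v volts."""
    return 10 ** (vgs_shift_v / swing_v_per_decade)

boost = 0.3   # assumed boosted sense-ground level (volts)
print(f"~{leakage_reduction(boost):,.0f}x lower subthreshold leakage for stored ones")

Even a few hundred millivolts of negative gate-to-source bias buys several orders of magnitude in subthreshold leakage, which is the mechanism behind the extended Refresh of boosted sense ground designs.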

2.2.5 Rate of Activation


The rate at which the sense amplifiers are activated has been the subject of some debate.
A variety of designs use multistage circuits to control the rate at which NLAT* fires.
Especially prevalent with boosted sense ground designs are two-stage circuits that
initially drive NLAT* quickly toward true ground to speed sensing and then bring NLAT* to
the boosted ground level to reduce cell leakage. An alternative to this approach, again
using two-stage drivers, drives NLAT* to ground, slowly at first to limit current and digitline
disturbances. This is followed by a second phase in which NLAT* is driven more strongly
toward ground to complete the Sensing operation. This phase usually occurs in
conjunction with ACT activation. Although these two designs have contrary operation,
each meets specific performance objectives: trading off noise and speed.

2.2.6 Configurations
Figure 2.19 shows a sense amplifier block commonly used in double- or triple-metal
designs. It features two Psense-amplifiers outside the isolation transistors, a pair of
EQ/bias (EQb) devices, a single Nsense-amplifier, and a single I/O transistor for each
digitline. Because only half of the sense amplifiers for each array are on one side, this
design is quarter pitch, as are the designs in Figures 2.20 and 2.21. Placement of the
Psense-amplifiers outside the isolation devices is necessary because a full one level (VCC)
cannot pass through unless the gate terminal of the ISO transistors is driven above VCC.
EQ/bias transistors are placed outside of the ISO devices to permit continued
equilibration of digitlines in arrays that are isolated. The I/O transistor gate terminals are
connected to a common CSEL signal for four adjacent digitlines. Each of the four I/O
transistors is tied to a separate I/O bus. This sense amplifier, though simple to implement,
is somewhat larger than other designs due to the presence of two Psense-amplifiers.
Figure 2.20: Complex sense amplifier block.

Figure 2.21: Reduced sense amplifier block.


Figure 2.20 shows a second, more complicated style of sense amplifier block. This design
employs a single Psense-amplifier and three sets of Nsense-amplifiers. Because the
Psense-amplifier is within the isolation transistors, the isolation devices must be the
NMOS depletion type, the PMOS enhancement type, or the NMOS enhancement type
with boosted gate drive to permit writing a full logic one into the array mbits. The triple
Nsense-amplifier is suggestive of PMOS isolation transistors; it prevents full zero levels to
be written unless the Nsense-amplifiers are placed adjacent to the arrays. In this more
complicated style of sense amplifier block, using three Nsense-amplifiers guarantees
faster sensing and higher stability than a similar design using only two Nsense-amplifiers.
The inside Nsense-amplifier is fired before the outside Nsense-amplifiers. However, this
design will not yield a minimum layout, an objective that must be traded off against
performance needs.
The sense amplifier block of Figure 2.21 can be considered a reduced configuration. This
design has only one Nsense-amp and one Psense-amp, both of which are placed within
the isolation transistors. To write full logic levels, either the isolation transistors must be
depletion mode devices or the gate voltage must be boosted above VCC by at least one
VTH. This design still uses a pair of EQ/bias circuits to maintain equilibration on isolated
arrays.
Only a handful of designs operates with a single EQ/bias circuit inside the isolation
devices, as shown in Figure 2.22. Historically, DRAM engineers tended to shy away from
designs that permitted digitlines to float for extended periods of time. However, as of this
writing, at least three manufacturers in volume production have designs using this
scheme.

Figure 2.22: Minimum sense amplifier block.


A sense amplifier design for single-metal DRAMs is shown in Figure 2.23. Prevalent on
1-Meg and 4-Meg designs, single-metal processes conceded to multi-metal processes at
the 16-Meg generation. Unlike the sense amplifiers shown in Figures 2.19, 2.20, 2.21,
and 2.22, single-metal sense amps are laid out at half pitch: one amplifier for every two
array digitlines. This type of layout is extremely difficult and places tight constraints on
process design margins. With the loss of Metal2, the column select signals are not
brought across the memory arrays. Generating column select signals locally for each set
of I/O transistors requires a full column decode block.

Figure 2.23: Single-metal sense amplifier block.


As shown in Figure 2.23, the Nsense-amp and Psense-amps are placed on separate
ends of the arrays. Sharing sense amps between arrays is especially beneficial for
single-metal designs. As illustrated in Figure 2.23, two Psense-amps share a single
Nsense-amp. In this case, with the I/O devices on only one end, the right Psense-amp is
activated only when the right array is accessed. Conversely, the left Psense-amp is
always activated, regardless of the accessed array, because all Read and Write data
must pass through the left Psense-amp to get to the I/O devices.

2.2.7 Operation
A set of signal waveforms is illustrated in Figure 2.24 for the sense amplifier of Figure
2.19. These waveforms depict a Read-Modify-Write cycle (Late Write) in which the cell
data is first read out and then new data is written back. In this example, a one level is read
out of the cell, as indicated by D* rising above D during cell access. A one level is always
+VCC/2 in the mbit cell, regardless of whether it is connected to a true or complement
digitline. The correlation between mbit cell data and the data appearing at the DRAM’s
data terminal (DQ) is a function of the data topology and the presence of data scrambling.
Data or topo scrambling is implemented at the circuit level: it ensures that the mbit data
state and the DQ logic level are in agreement. An mbit one level (+VCC/2) corresponds to
a logic one at the DQ, and an mbit zero level (−VCC/2) corresponds to a logic zero at the
DQ terminal.

Figure 2.24: Waveforms for the Read-Modify-Write cycle.


Writing specific data patterns into the memory arrays is important to DRAM testing. Each
type of data pattern identifies the weaknesses or sensitivities of each cell to the data in
surrounding cells. These patterns include solids, row stripes, column stripes, diagonals,
checkerboards, and a variety of moving patterns. Test equipment must be programmed
with the data topology of each type of DRAM to correctly write each pattern. Often the
tester itself guarantees that the pattern is correctly written into the arrays, unscrambling
the complicated data and address topology as necessary to write a specific pattern. On
some newer DRAM designs, part of this task is implemented on the DRAM itself, in the
form of a topo scrambler, such that the mbit data state matches the DQ logic level. This
implementation somewhat simplifies tester programming.
Returning to Figure 2.24, we see that a wordline is fired in Array 1. Prior to this, ISOa* will
go LOW to isolate Array0, and EQb will go LOW to disable the EQ/bias transistors
connected to Array1. The wordline then fires HIGH, accessing an mbit, which dumps
charge onto D0*. NLAT*, which is initially at VCC/2, drives LOW to begin the Sensing
operation and pulls D0 toward ground. Then, ACT fires, moving from ground to VCC,
activating the Psense-amplifier and driving D0* toward VCC. After separation has
commenced, CSEL0 rises to VCC, turning ON the I/O transistors so that the cell data can
be read by peripheral circuits. The I/O lines are biased at a voltage close to VCC, which
causes D0 to rise while the column is active. After the Read is complete, Write drivers in
the periphery turn ON and drive the I/O lines to opposite data states (in our example).
This new data propagates through the I/O devices and writes over the existing data
stored in the sense amplifiers. Once the sense amplifiers latch the new data, the Write
drivers and the I/O devices can be shut down, allowing the sense amplifiers to finish
restoring the digitlines to full levels. Following this restoration, the wordline transitions
LOW to shut OFF the mbit transistor. Finally, EQb and ISOa* fire HIGH to equilibrate the
digitlines back to VCC/2 and reconnect Array0 to the sense amplifiers. The timing for each
event of Figure 2.24 depends on circuit design, transistor sizes, layout, device
performance, parasitics, and temperature. While timing for each event must be minimized
to achieve optimum DRAM performance, it cannot be pushed so far as to eliminate all
timing margins. Margins are necessary to ensure proper device operation over the
expected range of process variations and the wide range of operating conditions.
Again, there is not one set of timing waveforms that covers all design options. The sense
amps of Figures 2.19–2.23 all require slightly different signals and timings. Various
designs actually fire the Psense-amplifier prior to or coincident with the Nsense-amplifier.
This obviously places greater constraints on the Psense-amplifier design and layout, but
these constraints are balanced by potential performance benefits. Similarly, the sequence
of events as well as the voltages for each signal can vary. There are almost as many
designs for sense amplifier blocks as there are DRAM design engineers. Each design
reflects various influences, preconceptions, technologies, and levels of understanding.
The bottom line is to maximize yield and performance and minimize everything else.

2.3 ROW DECODER ELEMENTS

Row decode circuits are similar to sense amplifier circuits in that they pitch up to mbit
arrays and have a variety of implementations. A row decode block is comprised of two
basic elements: a wordline driver and an address decoder tree. There are three basic
configurations for wordline driver circuits: the NOR driver, the inverter (CMOS) driver, and
the bootstrap driver. In addition, the drivers and associated decode trees can be
configured either as local row decodes for each array section or as global row decodes
that drive a multitude of array sections.
Global row decodes connect to multiple arrays through metal wordline straps. The straps
are stitched to the polysilicon wordlines at specific intervals dictated by the polysilicon
resistance and the desired RC wordline time constant. Most processes that strap
wordlines with metal do not silicide the polysilicon, although doing so would reduce the
number of stitch regions required. Strapping wordlines and using global row decoders
obviously reduces die size [22], very dramatically in some cases. The disadvantage of
strapping is that it requires an additional metal layer at minimum array pitch. This puts a
tremendous burden on process technologists, because three conductors must then be
patterned at minimum pitch: wordlines, digitlines, and wordline straps.

Local row decoders, on the other hand, require additional die size rather than metal
straps. It is highly advantageous to reduce the polysilicon resistance in order to stretch
the wordline length and lower the number of row decodes needed. This is commonly
achieved with silicided polysilicon processes. On large DRAMs, such as the 1Gb, the
area penalty can be prohibitive, making low-resistance wordlines all the more necessary.

2.3.1 Bootstrap Wordline Driver


The bootstrap wordline driver shown in Figure 2.25 is built exclusively from NMOS
transistors [22], resulting in the smallest layout of the three types of driver circuits. The
absence of PMOS transistors eliminates large Nwell regions from the layout. As the name
denotes, this driver relies on bootstrapping principles to bias the output transistor’s gate
terminal. This bias voltage must be high enough to allow the NMOS transistor to drive the
wordline to the boosted wordline voltage VCCP.
Figure 2.25: Bootstrap wordline driver.
Operation of the bootstrap driver is described with help from Figure 2.26. Initially, the
driver is OFF with the wordline and PHASE terminals at ground. The wordline is held at
ground by transistor M2 because the decoder output signal DEC* is at VCC. The gate of
M3, the pass transistor, is fixed at VCC. The signals DEC and DEC* are fed from a decode
circuit that will be discussed later. As a complement pair, DEC and DEC* represent the
first of two terms necessary to decode the correct wordline. PHASE0, which is also fed
from a decode circuit, represents the second term. If DEC rises to VCC and DEC* drops to
ground, as determined by the decoder, the boot node labeled B1 will rise to VCC−VTN V,
and M2 will turn OFF. The wordline continues to be held to ground by M1 because
PHASE0 is still grounded. After B1 rises to VCC−VTN, the PHASE0 signal fires to the
boosted wordline voltage VCCP. Because of the gate-to-drain and gate-to-source
capacitance of M1, the gate of M1 boots to an elevated voltage, Vboot. This voltage is
determined by the parasitic capacitance of node B1 (CB1), the gate-to-drain and gate-to-source
capacitances of M1 (CGD1 and CGS1), the VCCP level, and the initial voltage at B1, VCC−VTN.
Accordingly,
Vboot = (VCC − VTN) + VCCP × (CGD1 + CGS1)/(CB1 + CGD1 + CGS1).
Figure 2.26: Bootstrap operation waveforms.
In conjunction with the wordline voltage rising from ground to VCCP, the gate-to-source
capacitance of M1 provides a secondary boost to the boot node. The secondary boost
helps to ensure that the boot voltage is adequate to drive the wordline to a full VCCP level.
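A quick numerical check of the boot-node expression is given below. The threshold, supply, pumped-supply, and capacitance values are illustrative assumptions (they depend entirely on the process and on the layout of node B1), not numbers taken from the text.

def boot_voltage(vcc, vtn, vccp, c_b1, c_gd1, c_gs1):
    """Boot-node voltage after PHASE0 rises to VCCP (secondary boost ignored)."""
    coupling = (c_gd1 + c_gs1) / (c_b1 + c_gd1 + c_gs1)
    return (vcc - vtn) + vccp * coupling

# Assumed values: VCC = 3.3 V, VTN = 0.8 V, VCCP = 4.6 V, 10 fF of boot-node
# parasitics (CB1) against 25 fF of M1 gate overlap/channel capacitance.
vccp, vtn = 4.6, 0.8
vboot = boot_voltage(3.3, vtn, vccp, 10e-15, 10e-15, 15e-15)
print(f"Vboot = {vboot:.2f} V (must exceed VCCP + VTN = {vccp + vtn:.2f} V "
      f"for M1 to pass a full VCCP to the wordline)")
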
The layout of the boot node is very important to the bootstrap wordline driver. First, the
parasitic capacitance of node B1, which includes routing, junction, and overlap
components, must be minimized to achieve maximum boot efficiency. Second, charge
leakage from the boot node must be minimized to ensure adequate VGS for transistor M1
such that the wordline remains at VCCP for the maximum RAS low period. Low leakage is
often achieved by minimizing the source area for M3 or using donut gate structures that
surround the source area, as illustrated in Figure 2.27.

Figure 2.27: Donut gate structure layout.


The bootstrap driver is turned OFF by first driving the PHASE0 signal to ground. M1
remains ON because node B1 cannot drop below VCC−VTN; M1 substantially discharges
the wordline toward ground. This is followed by the address decoder turning OFF,
bringing DEC to ground and DEC* to VCC. With DEC* at VCC, transistor M2 turns ON and
fully clamps the wordline to ground. A voltage level translator is required for the PHASE
signal because it operates between ground and the boosted voltage VCCP. For a global
row decode configuration, this requirement is not much of a burden. For a local row
decode configuration, however, the requirement for level translators can be very
troublesome. Generally, these translators are placed either in the array gap cells at the
intersection of the sense amplifier blocks and row decode blocks or distributed throughout
the row decode block itself. The translators require both PMOS and NMOS transistors
and must be capable of driving large capacitive loads. Layout of the translators is
exceedingly difficult, especially because the overall layout needs to be as small as
possible.

2.3.2 NOR Driver


The second type of wordline driver is similar to the bootstrap driver in that two decode
terms drive the output transistor from separate terminals. The NOR driver, as shown in
Figure 2.28, uses a PMOS transistor for M1 and does not rely on bootstrapping to derive
the gate bias. Rather, the gate is driven by a voltage translator that converts DEC from
VCC to VCCP voltage levels. This conversion is necessary to ensure that M1 remains OFF
for unselected wordlines as the PHASE signal, which is shared by multiple drivers, is
driven to VCCP.

Figure 2.28: NOR driver.


To fire a specific wordline, DEC must be HIGH and the appropriate PHASE must fire
HIGH. Generally, there are four to eight PHASE signals per row decoder block. The NOR
driver requires one level translator for each PHASE and DEC signal. By comparison, the
bootstrap driver only requires level translators for the PHASE signal.

2.3.3 CMOS Driver


The final wordline driver configuration is shown in Figure 2.29. In general, it lacks a
specific name: it is sometimes referred to as a CMOS inverter driver or a CMOS driver.
Unlike the first two drivers, the output transistor M1 has its source terminal permanently
connected to VCCP. This driver, therefore, features a voltage translator for each and every
wordline. Both decode terms DEC and PHASE* combine to drive the output stage
through the translator. The advantage of this driver, other than simple operation, is low
power consumption. Power is conserved because the translators drive only the small
capacitance associated with a single driver. The PHASE translators of both the bootstrap
and NOR drivers must charge considerable junction capacitance. The disadvantages of
the CMOS driver are layout complexity and standby leakage current. Standby leakage
current is a product of VCCP voltage applied to M1 and its junction and subthreshold
leakage currents. For a large DRAM with high numbers of wordline drivers, this leakage
current can easily exceed the entire standby current budget unless great care is
exercised in designing output transistor M1.
Figure 2.29: CMOS driver.

2.3.4 Address Decode Trees

With the wordline driver circuits behind us, we can turn our attention to the address
decoder tree. There is no big secret to address decoding in the row decoder network. Just
about any type of logic suffices: static, dynamic, pass gate, or a combination thereof. With
any type of logic, however, the primary objectives in decoder design are to maximize
speed and minimize die area. Because a great variety of methods have been used to
implement row address decoder trees, it is next to impossible to cover them all. Instead,
we will give an insight into the possibilities by discussing a few of them.

Regardless of the type of logic with which a row decoder is implemented, the layout must
completely reside beneath the row address signal lines to constitute an efficient,
minimized design. In other words, the metal address tracks dictate the die area available
for the decoder. Any additional tracks necessary to complete the design constitute wasted
silicon. For DRAM designs requiring global row decode schemes, the penalty for
inefficient design may be insignificant; however, for distributed local row decode schemes,
the die area penalty may be significant. As with mbits and sense amplifiers, time spent
optimizing row decode circuits is time well spent.

2.3.5 Static Tree


The most obvious form of address decode tree uses static CMOS logic. As shown in
Figure 2.30, a simple tree can be designed using two-input NAND gates. While easy to
design schematically, static logic address trees are not popular. They waste silicon and
are very difficult to lay out efficiently. Static logic requires two transistors for each address
term, one NMOS and one PMOS, which can be significant for many address terms.
Furthermore, static gates must be cascaded to accumulate address terms, adding gate
delays at each level. For these and other reasons, static logic gates are not used in row
decode address trees in today’s state-of-the-art DRAMs.

Figure 2.30: Static decode tree.


2.3.6 P&E Tree
The second type of address tree uses dynamic logic, the most prevalent being precharge
and evaluate (P&E) logic. Used by the majority of DRAM manufacturers, P&E address
trees come in a variety of configurations, although the differences between them can be
subtle. Figure 2.31 shows a simplified schematic for one version of a P&E address tree
designed for use with bootstrapped wordline drivers. P&E address tree circuits feature
one or more PMOS PRECHARGE transistors and a cascade of NMOS ENABLE
transistors M2–M4. This P&E design uses half of the transistors required by the static
address tree of Figure 2.30. As a result, the layout of the P&E tree is much smaller than
that of the static tree and fits more easily under the address lines. The PRE transistor is
usually driven by a PRECHARGE* signal under the control of the RAS chain logic.
PRECHARGE* and transistor M1 ensure that DEC* is precharged HIGH, disabling the
wordline driver and preparing the tree for row address activation.

Figure 2.31: P&E decode tree.


M7 is a weak PMOS transistor driven by the DEC inverter (M5 and M6). Together, M7
and the inverter form a latch to ensure that DEC* remains HIGH for all decoders that are
not selected by the row addresses. At the beginning of any RAS cycle, PRECHARGE* is
LOW and the row addresses are all disabled (LOW). After RAS falls, PRECHARGE*
transitions HIGH to turn OFF M1; then the row addresses are enabled. If RA1–RA3 all go
HIGH, then M2–M4 turn ON, overpowering M7 and driving DEC* to ground and
subsequently DEC to VCC. The output of this tree segment normally drives four
bootstrapped wordline drivers, each connected to a separate PHASE signal. Therefore,
for an array with 256 wordlines, there will be 64 such decode trees.
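The arithmetic of the last sentence can be illustrated with a short behavioral sketch. The particular assignment of row-address bits to PHASE signals and decode trees below is an assumption made for illustration; actual designs choose their own mapping.

def select_wordline(row_addr):
    """Behavioral model: 8 row-address bits select 1 of 256 wordlines in an array.

    The two least-significant bits choose one of four PHASE signals; the remaining
    six bits choose one of 64 P&E decode trees (DEC/DEC* outputs).
    """
    phase = row_addr & 0x3          # 1 of 4 PHASE signals per decode tree
    tree = (row_addr >> 2) & 0x3F   # 1 of 64 decode trees
    return tree * 4 + phase         # physical wordline index

assert {select_wordline(a) for a in range(256)} == set(range(256))
print(select_wordline(0b10110110))  # tree 45, PHASE 2 -> wordline 182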

2.3.7 Predecoding
The row address lines shown as RA1–RA3 can be either true and complement or
predecoded. Predecoded address lines are formed by logically combining (AND)
addresses as shown in Table 2.1.
Table 2.1: Predecoded address truth table.

RA0  RA1  PR01<n>  PR01<0>  PR01<1>  PR01<2>  PR01<3>
0    0    0        1        0        0        0
1    0    1        0        1        0        0
0    1    2        0        0        1        0
1    1    3        0        0        0        1
The advantages of using predecoded addresses include lower power (fewer signals
make transitions during address changes) and higher efficiency (only three transistors are
necessary to decode six addresses for the circuit of Figure 2.31). Predecoding is
especially beneficial in redundancy circuits. In fact, predecoded addresses are used
throughout most DRAM designs today.
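A minimal sketch of the predecoding in Table 2.1 follows; it simply forms the four one-hot PR01 lines from the two address bits RA0 and RA1 and is written here only to make the truth table explicit.

def predecode_pr01(ra0, ra1):
    """Return the one-hot predecoded lines PR01<0>..PR01<3> for address bits RA0 and RA1."""
    n = ra0 + 2 * ra1                         # which predecoded line is selected
    return [1 if i == n else 0 for i in range(4)]

for ra1 in (0, 1):
    for ra0 in (0, 1):
        print(ra0, ra1, predecode_pr01(ra0, ra1))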

2.3.8 Pass Transistor Tree


The final type of address tree to be examined, shown in Figure 2.32, uses pass transistor
logic. Pass transistor address trees are similar to P&E trees in numerous ways. Both
designs use PMOS PRECHARGE transistors and NMOS address ENABLE transistors.
Unlike P&E logic, however, the NMOS cascade does not terminate at ground. Rather, the
cascade of M2–M4 goes to a PHASE* signal, which is HIGH during PRECHARGE and
LOW during EVALUATE. The address signals operate the same as in the P&E tree:
HIGH to select and LOW to deselect. The pass transistor tree is shown integrated into a
CMOS wordline driver. This is necessary because the pass transistor tree and the CMOS
wordline driver are generally used together and their operation is complementary. The
cross-coupled PMOS transistors of the CMOS level translator provide a latch necessary
to keep the final interstage node biased at VCC. Again the latch has a weak pull-up, which
is easily overpowered by the cascaded NMOS ENABLE transistors. A pass transistor
address tree is not used with bootstrapped wordline drivers because the PHASE signal
feeds into the address tree logic rather than into the driver, as required by the bootstrap
driver.

Figure 2.32: Pass transistor decode tree.


2.4 DISCUSSION

We have briefly examined the basic elements required in DRAM row decoder blocks.
Numerous variations are possible. No single design is best for all applications. As with
sense amplifiers, design depends on technology, performance, and cost trade-offs.

REFERENCES

[1] K.Itoh, "Trends in Megabit DRAM Circuit Design," IEEE Journal of Solid-State Circuits, vol. 25, pp. 778–791, June 1990.

[2] D.Takashima, S.Watanabe, H.Nakano, Y.Oowaki, and K.Ohuchi, "Open/Folded Bit-Line Arrangement for Ultra-High-Density DRAM's," IEEE Journal of Solid-State Circuits, vol. 29, pp. 539–542, April 1994.

[3] Hideto Hidaka, Yoshio Matsuda, and Kazuyasu Fujishima, "A Divided/Shared Bit-Line Sensing Scheme for ULSI DRAM Cores," IEEE Journal of Solid-State Circuits, vol. 26, pp. 473–477, April 1991.

[4] M.Aoki, Y.Nakagome, M.Horiguchi, H.Tanaka, S.Ikenaga, J.Etoh, Y.Kawamoto, S.Kimura, E.Takeda, H.Sunami, and K.Itoh, "A 60-ns 16-Mbit CMOS DRAM with a Transposed Data-Line Structure," IEEE Journal of Solid-State Circuits, vol. 23, pp. 1113–1119, October 1988.

[5] R.Kraus and K.Hoffmann, "Optimized Sensing Scheme of DRAMs," IEEE Journal of Solid-State Circuits, vol. 24, pp. 895–899, August 1989.

[6] T.Yoshihara, H.Hidaka, Y.Matsuda, and K.Fujishima, "A Twisted Bitline Technique for Multi-Mb DRAMs," 1988 IEEE ISSCC Digest of Technical Papers, pp. 238–239.

[7] Yukihito Oowaki, Kenji Tsuchida, Yohji Watanabe, Daisaburo Takashima, Masako Ohta, Hiroaki Nakano, Shigeyoshi Watanabe, Akihiro Nitayama, Fumio Horiguchi, Kazunori Ohuchi, and Fujio Masuoka, "A 33-ns 64Mb DRAM," IEEE Journal of Solid-State Circuits, vol. 26, pp. 1498–1505, November 1991.

[8] M.Inoue, H.Kotani, T.Yamada, H.Yamauchi, A.Fujiwara, J.Matsushima, H.Akamatsu, M.Fukumoto, M.Kubota, I.Nakao, N.Aoi, G.Fuse, S.Ogawa, S.Odanaka, A.Ueno, and Y.Yamamoto, "A 16Mb DRAM with an Open BitLine Architecture," 1988 IEEE ISSCC Digest of Technical Papers, pp. 246–247.

[9] Y.Kubota, Y.Iwase, K.Iguchi, J.Takagi, T.Watanabe, and K.Sakiyama, "Alternatively-Activated Open Bitline Technique for High Density DRAM's," IEICE Trans. Electron., vol. E75-C, pp. 1259–1266, October 1992.

[10] T.Hamada, N.Tanabe, H.Watanabe, K.Takeuchi, N.Kasai, H.Hada, K.Shibahara, K.Tokashiki, K.Nakajima, S.Hirasawa, E.Ikawa, T.Saeki, E.Kakehashi, S.Ohya, and T.Kunio, "A Split-Level Diagonal Bit-Line (SLDB) Stacked Capacitor Cell for 256Mb DRAMs," 1992 IEDM Technical Digest, pp. 799–802.

[11] Toshinori Morihara, Yoshikazu Ohno, Takahisa Eimori, Toshiharu Katayama, Shinichi Satoh, Tadashi Nishimura, and Hirokazu Miyoshi, "Disk-Shaped Stacked Capacitor Cell for 256Mb Dynamic Random-Access Memory," Japan Journal of Applied Physics, vol. 33, Part 1, pp. 4570–4575, August 1994.

[12] J.H.Ahn, Y.W.Park, J.H.Shin, S.T.Kim, S.P.Shim, S.W.Nam, W.M.Park, H.B.Shin, C.S.Choi, K.T.Kim, D.Chin, O.H.Kwon, and C.G.Hwang, "Micro Villus Patterning (MVP) Technology for 256Mb DRAM Stack Cell," 1992 Symposium on VLSI Tech. Digest of Technical Papers, pp. 12–13.

[13] Kazuhiko Sagara, Tokuo Kure, Shoji Shukuri, Jiro Yugami, Norio Hasegawa, Hidekazu Goto, and Hisaomi Yamashita, "Recessed Memory Array Technology for a Double Cylindrical Stacked Capacitor Cell of 256M DRAM," IEICE Trans. Electron., vol. E75-C, pp. 1313–1322, November 1992.

[14] S.Ohya, "Semiconductor Memory Device Having Stacked-Type Capacitor of Large Capacitance," United States Patent Number 5,298,775, March 29, 1994.

[15] M.Sakao, N.Kasai, T.Ishijima, E.Ikawa, H.Watanabe, K.Terada, and T.Kikkawa, "A Capacitor-Over-Bit-Line (COB) Cell with Hemispherical-Grain Storage Node for 64Mb DRAMs," 1990 IEDM Technical Digest, pp. 655–658.

[16] G.Bronner, H.Aochi, M.Gall, J.Gambino, S.Gernhardt, E.Hammerl, H.Ho, J.Iba, H.Ishiuchi, M.Jaso, R.Kleinhenz, T.Mii, M.Narita, L.Nesbit, W.Neumueller, A.Nitayama, T.Ohiwa, S.Parke, J.Ryan, T.Sato, H.Takato, and S.Yoshikawa, "A Fully Planarized 0.25μm CMOS Technology for 256Mbit DRAM and Beyond," 1995 Symposium on VLSI Tech. Digest of Technical Papers, pp. 15–16.

[17] N.C.-C.Lu and H.H.Chao, "Half-VDD/Bit-Line Sensing Scheme in CMOS DRAMs," IEEE Journal of Solid-State Circuits, vol. SC-19, p. 451, August 1984.

[18] E.Adler, J.K.DeBrosse, S.F.Geissler, S.J.Holmes, M.D.Jaffe, J.B.Johnson, C.W.Koburger, III, J.B.Lasky, B.Lloyd, G.L.Miles, J.S.Nakos, W.P.Noble, Jr., S.H.Voldman, M.Armacost, and R.Ferguson, "The Evolution of IBM CMOS DRAM Technology," IBM Journal of Research and Development, vol. 39, pp. 167–188, March 1995.

[19] R.Kraus, "Analysis and Reduction of Sense-Amplifier Offset," IEEE Journal of Solid-State Circuits, vol. 24, pp. 1028–1033, August 1989.

[20] M.Asakura, T.Ohishi, M.Tsukude, S.Tomishima, H.Hidaka, K.Arimoto, K.Fujishima, T.Eimori, Y.Ohno, T.Nishimura, M.Yasunaga, T.Kondoh, S.I.Satoh, T.Yoshihara, and K.Demizu, "A 34ns 256Mb DRAM with Boosted Sense-Ground Scheme," 1994 IEEE ISSCC Digest of Technical Papers, pp. 140–141.

[21] T.Ooishi, K.Hamade, M.Asakura, K.Yasuda, H.Hidaka, H.Miyamoto, and H.Ozaki, "An Automatic Temperature Compensation of Internal Sense Ground for Sub-Quarter Micron DRAMs," 1994 Symposium on VLSI Circuits Digest of Technical Papers, pp. 77–78.

[22] K.Noda, T.Saeki, A.Tsujimoto, T.Murotani, and K.Koyama, "A Boosted Dual Word-line Decoding Scheme for 256 Mb DRAMs," 1992 Symposium on VLSI Circuits Digest of Technical Papers, pp. 112–113.
Chapter 3: Array Architectures
This chapter presents a detailed description of the two most prevalent array architectures
under consideration for future large-scale DRAMs: the aforementioned open digitline
and folded digitline architectures.

3.1 ARRAY ARCHITECTURES


To provide a viable point for comparison, each architecture is employed in the theoretical
construction of 32-Mbit memory blocks for use in a 256-Mbit DRAM. Design parameters
and layout rules from a typical 0.25 μm DRAM process provide the necessary dimensions
and constraints for the analysis. Some of these parameters are shown in Table 3.1. By
examining DRAM architectures in the light of a real-world design problem, an objective
and unbiased comparison can be made. In addition, using this approach, we readily
detect the strengths and weaknesses of either architecture.
Table 3.1: 0.25 μm design parameters.

Parameter                                  Value
Digitline width WDL                        0.3 μm
Digitline pitch PDL                        0.6 μm
Wordline width WWL                         0.3 μm
Wordline pitch for 8F2 mbit PWL8           0.6 μm
Wordline pitch for 6F2 mbit PWL6           0.9 μm
Cell capacitance CC                        30 fF
Digitline capacitance per mbit CDM         0.8 fF
Wordline capacitance per 8F2 mbit CW8      0.6 fF
Wordline capacitance per 6F2 mbit CW6      0.5 fF
Wordline sheet resistance RS               6 Ω/sq

3.1.1 Open Digitline Array Architecture


The open digitline array architecture was the prevalent architecture prior to the 64kbit
DRAM. A modern embodiment of this architecture, as shown in Figure 3.1 [1] [2], is
constructed with multiple crosspoint array cores separated by strips of sense amplifier
blocks in one axis and either row decode blocks or wordline stitching regions in the other
axis. Each 128kbit array core is built using 6F2 mbit cell pairs. There are 131,072 (2^17)
functionally addressable mbits arranged in 264 rows and 524 digitlines. In the 264 rows,
there are 256 actual wordlines, 4 redundant wordlines, and 4 dummy wordlines. In the
524 digitlines, there are 512 actual digitlines, 8 redundant digitlines, and 4 dummy
digitlines. Photolithography problems usually occur at the edge of large repetitive
structures, such as mbit arrays. These problems produce malformed or nonuniform
structures, rendering the edge cells useless. Therefore, including dummy mbits,
wordlines, and digitlines on each array edge ensures that these problems occur only on
dummy cells, leaving live cells unaffected. Although dummy structures enlarge each array
core, they also significantly improve device yield. Thus, they are necessary on all DRAM
designs.

Figure 3.1: Open digitline architecture schematic.

Array core size, as measured in the number of mbits, is restricted by two factors: a desire
to keep the quantity of mbits a power of two and the practical limits on wordline and
digitline length. The need for a binary quantity of mbits in each array core derives from the
binary nature of DRAM addressing. Given N row addresses and M column addresses for
a given part, there are 2^(N+M) addressable mbits. Address decoding is greatly simplified
within a DRAM if array address boundaries are derived directly from address bits.

Because addressing is binary, the boundaries naturally become binary. Therefore, the
size of each array core must necessarily have 2^X addressable rows and 2^Y addressable
digitlines. The resulting array core size is 2^(X+Y) mbits, which is, of course, a binary number.
The second set of factors limiting array core size involves practical limits on digitline and
wordline length. From earlier discussions in Section 2.1, the digitline capacitance is
limited by two factors. First, the ratio of cell capacitance to digitline capacitance must fall
within a specified range to ensure reliable sensing. Second, operating current and power
for the DRAM is in large part determined by the current required to charge and discharge
the digitlines during each active cycle. Power considerations restrict digitline length for the
256-Mbit generation to approximately 128 mbit pairs (256 rows), with each mbit
connection adding capacitance to the digitline. The power dissipated during a Read or
Refresh operation is proportional to the digitline capacitance (CD), the supply voltage
(VCC), the number of active columns (N), and the Refresh period (P). Accordingly, the
power dissipated is given as

Power = (N × CD × VCC^2) / (2 × P)
On a 256-Mbit DRAM in 8k (rows) Refresh, there are 32,768 (2^15) active columns during
each Read, Write, or Refresh operation. The active array current and power dissipation
for a 256-Mbit DRAM appear in Table 3.2 for a 90 ns Refresh period (−5 timing) at
various digitline lengths. The budget for the active array current is limited to 200mA for
this 256-Mbit design. To meet this budget, the digitline cannot exceed a length of 256
mbits.
Table 3.2: Active current and power versus digitline length.

Digitline Length (mbits)   Digitline Capacitance (fF)   Active Current (mA)   Power Dissipation (mW)

128 102 60 199

256 205 121 398

512 410 241 795
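
As a rough numerical check of Table 3.2, the short sketch below assumes that each of the
N active columns restores one digitline through a VCC/2 swing every cycle and that VCC is
3.3 V. Neither the swing model nor the supply value is stated here, so the sketch should
only be expected to land within a few percent of the tabulated values, which it does.

# Rough check of Table 3.2: active array current and power vs. digitline length.
# Assumptions (not taken from the table): VCC = 3.3 V, and each active column
# restores one digitline through a VCC/2 swing every Refresh period.
N = 2 ** 28 // 2 ** 13   # 32,768 active columns: 256 Mbit in 8k (2^13) Refresh
VCC = 3.3                # assumed supply voltage, volts
P = 90e-9                # Refresh period (-5 timing), seconds
CDM = 0.8e-15            # digitline capacitance per mbit, farads (Table 3.1)

for mbits in (128, 256, 512):
    CD = mbits * CDM                    # total digitline capacitance
    current = N * CD * (VCC / 2) / P    # average supply current
    power = current * VCC               # power dissipated in the array
    print(f"{mbits:4d} mbits: CD = {CD * 1e15:3.0f} fF, "
          f"I = {current * 1e3:3.0f} mA, P = {power * 1e3:3.0f} mW")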


Wordline length, as described in Section 2.1, is limited by the maximum allowable RC
time constant of the wordline. To ensure acceptable access time for the 256-Mbit DRAM,
the wordline time constant should be kept below 4 nanoseconds. For a wordline
connected to N mbits, the total resistance and capacitance follow:

RWL = RS × (N × PDL) / WWL and CWL = N × CW6

where PDL is the digitline pitch, WWL is the wordline width, and CW6 is the wordline
capacitance per 6F2 mbit.
Table 3.3 contains the effective wordline time constants for various wordline lengths. As
shown in the table, the wordline length cannot exceed 512 mbits (512 digitlines) if the
wordline time constant is to remain under 4 nanoseconds.
Table 3.3: Wordline time constant versus wordline length.

Wordline Length (mbits)   RWL (ohms)   CWL (fF)   Time Constant (ns)

128 1,536 64 0.098

256 3,072 128 0.39

512 6,144 256 1.57

1024 12,288 512 6.29
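
The Table 3.3 entries follow from the expressions above together with the Table 3.1
parameters; the sketch below takes the time constant as the simple lumped product
RWL × CWL, which reproduces the tabulated values.

# Wordline time constant vs. length for the open digitline (6F2) array core.
RS = 6.0        # wordline sheet resistance, ohms/square (Table 3.1)
PDL = 0.6       # digitline pitch, micrometers
WWL = 0.3       # wordline width, micrometers
CW6 = 0.5e-15   # wordline capacitance per 6F2 mbit, farads

for mbits in (128, 256, 512, 1024):
    R = RS * mbits * PDL / WWL   # the wordline crosses one digitline pitch per mbit
    C = mbits * CW6
    print(f"{mbits:5d} mbits: R = {R:6.0f} ohms, C = {C * 1e15:3.0f} fF, "
          f"tau = {R * C * 1e9:5.2f} ns")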


The open digitline architecture does not support digitline twisting because the true and
complement digitlines, which constitute a column, are in separate array cores. Therefore,
no silicon area is consumed for twist regions. The 32-Mbit array block requires a total of
two hundred fifty-six 128kbit array cores in its construction. Each 32-Mbit block
represents an address space comprising a total of 4,096 rows and 8,192 columns. A
practical configuration for the 32-Mbit block is depicted in Figure 3.2.

Figure 3.2: Open digitline 32-Mbit array block.


In Figure 3.2, the 256 array cores appear in a 16×16 arrangement. The x16 arrangement
produces 2-Mbit sections of 256 wordlines and 8,192 digitlines (4,096 columns). Sixteen
2-Mbit sections are required to form the complete 32-Mbit block: sense amplifier strips are
positioned vertically between each 2-Mbit section, and row decode strips or wordline
stitching strips are positioned horizontally between each array core.
Layout can be generated for the various 32-Mbit elements depicted in Figure 3.2. This
layout is necessary to obtain reasonable estimates for pitch cell size. With these size
estimates, overall dimensions of the 32-Mbit memory block can be calculated. The results
of these estimates appear in Table 3.4. Essentially, the overall height of the 32-Mbit block
can be found by summing the height of the row decode blocks (or stitch regions) together
with the product of the digitline pitch and the total number of digitlines.

Accordingly,

Height32 = (TR × HLDEC) + (TDL × PDL)
where TR is the number of local row decoders, HLDEC is the height of each decoder, TDL is
the number of digitlines including redundant and dummy lines, and PDL is the digitline
pitch. Similarly, the width of the 32-Mbit block is found by summing the total width of the
sense amplifier blocks with the product of the wordline pitch and the number of wordlines.
This bit of math yields

Width32 = (TSA × WAMP) + (TWL × PWL6)
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy lines, and
PWL6 is the wordline pitch for the 6F2 mbit.
Table 3.4 contains calculation results for the 32-Mbit block shown in Figure 3.2. Although
overall size is the best measure of architectural efficiency, a second popular metric is
array efficiency. Array efficiency is determined by dividing the area consumed by
functionally addressable mbits by the total die area. To simplify the analysis in this book,
peripheral circuits are ignored in the array efficiency calculation. Rather, the calculation
considers only the 32-Mbit memory block, ignoring all other factors. With this
simplification, the array efficiency for a 32-Mbit block is given as

Efficiency (%) = 100 × (2^25 × 0.54 μm2) / Area32

where 2^25 is the number of addressable mbits in each 32-Mbit block and 0.54 μm2 is the
area of a 6F2 mbit at the 0.3 μm feature size. The open digitline
architecture yields a calculated array efficiency of 51.7%.
Table 3.4: Open digitline (local row decode)—32-Mbit size calculations.

Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 88 μm

Number of local decode strips TLDEC 17

Height of local decode strips HLDEC 93μm

Number of digitlines TDL 8,400

Number of wordlines TWL 4,224

Height of 32-Mbit block Height32 6,621 μm

Width of 32-Mbit block Width32 5,298 μm

Area of 32-Mbit block Area32 35,078,058 μm2
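
The Table 3.4 dimensions can be verified directly from these expressions. In the sketch
below, the 6F2 mbit area is taken as 0.54 μm2, assuming F = 0.3 μm (the minimum feature
size quoted later for the mbit); this assumption reproduces the 51.7% efficiency figure.

# Size and efficiency check for the open digitline 32-Mbit block (Table 3.4).
TR, H_LDEC = 17, 93     # local row decoder strips and their height (um)
TSA, W_AMP = 17, 88     # sense amplifier strips and their width (um)
TDL, PDL = 8400, 0.6    # digitlines (incl. redundant and dummy) and pitch (um)
TWL, PWL6 = 4224, 0.9   # wordlines (incl. redundant and dummy) and pitch (um)

height = TR * H_LDEC + TDL * PDL    # ~6,621 um
width = TSA * W_AMP + TWL * PWL6    # ~5,298 um
area = height * width               # ~35.1e6 um^2

mbit_area = 6 * 0.3 ** 2            # 6F2 cell at F = 0.3 um -> 0.54 um^2
efficiency = 100 * (2 ** 25) * mbit_area / area
print(f"{height:.0f} um x {width:.1f} um = {area:.3e} um^2, "
      f"efficiency = {efficiency:.1f}%")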

Unfortunately, the ideal open digitline architecture presented in Figure 3.2 is difficult to
realize in practice. The difficulty stems from an interdependency between the memory
array and sense amplifier layouts in which each array digitline must connect to one sense
amplifier and each sense amplifier must connect to two array digitlines.
This interdependency, which exists for all array architectures, becomes problematic for
the open digitline architecture. The two digitlines, which connect to a sense amplifier,
must come from two separate memory arrays. As a result, sense amplifier blocks must
always be placed between memory arrays for open digitline array architectures [3], unlike
the depiction in Figure 3.2.
Two layout approaches may be used to achieve this goal. First, design the sense
amplifiers so that the sense amplifier block contains one sense amplifier for each
digitline in the array. This single-pitch solution, shown in Figure 3.3, eliminates the need
for sense amplifiers on both sides of an array core because all of the digitlines connect to
a single sense amplifier block. Not only does this solution eliminate the edge problem, but
it also reduces the 32-Mbit block size. There are now only eight sense amplifier strips
instead of the seventeen of Figure 3.2. Unfortunately, it is nearly impossible to lay out
sense amplifiers in this fashion [4]. Even a single-metal sense amplifier layout,
considered the tightest layout in the industry, achieves only one sense amplifier for every
two digitlines (double-pitch).

Figure 3.3: Single-pitch open digitline architecture.


A second approach to solving the interdependency problem in open digitline architectures
is to maintain the configuration shown in Figure 3.2 but include some form of reference
digitline for the edge sense amplifiers. The reference digitline can assume any form as
long as it accurately models the capacitance and behavior of a true digitline. Obviously,
the best type of reference digitline is a true digitline. Therefore, with this approach, more
dummy array cores are added to both edges of the 32-Mbit memory block, as shown in
Figure 3.4. The dummy array cores need only half as many wordlines as the true array
core because only half of the digitlines are connected to any single sense amplifier strip.
The unconnected digitlines double the effective length of the reference digitlines.
Figure 3.4: Open digitline architecture with dummy arrays.

This approach solves the array edge problem. However, by producing a larger 32-Mbit
memory block, array efficiency is reduced. Dummy arrays solve the array edge problem
inherent in open digitline architecture but require sense amplifier layouts that are on the
edge of impossible. The problem of sense amplifier layout is all the more difficult because
global column select lines must be routed through the sense amplifiers. For all intents and purposes, therefore,
the sense amplifier layout cannot be completed without the presence of an additional
conductor, such as a third metal, or without time multiplexed sensing. Thus, for the open
digitline architecture to be successful, an additional metal must be added to the DRAM
process.
With the presence of Metal3, the sense amplifier layout and either a full or hierarchical
global row decoding scheme is made possible. A full global row decoding scheme using
wordline stitching places great demands on metal and contact/via technologies; however,
it represents the most efficient use of the additional metal. Hierarchical row decoding
using bootstrap wordline drivers is slightly less efficient. Wordlines no longer need to be
strapped with metal on pitch, and, thus, process requirements are relaxed significantly [5].
For a balanced perspective, both global and hierarchical approaches are analyzed. The
results of this analysis for the open digitline architecture are summarized in Tables 3.5
and 3.6. Array efficiency for global and hierarchical row decoding calculates to 60.5% and
55.9%, respectively, for the 32-Mbit memory blocks, based on data from these tables.
Table 3.5: Open digitline (dummy arrays and global row decode)—32-Mbit size
calculations.

Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 88 μm

Number of global decode strips TGDEC 1

Height of global decode strips HGDEC 200 μm


Number of stitch regions NST 17

Height of stitch regions HST 10 μm

Number of digitlines TDL 8,400

Number of wordlines TWL 4,488

Height of 32-Mbit block Height32 5,410 μm

Width of 32-Mbit block Width32 5,535 μm

Area of 32-Mbit block Area32 29,944,350 μm2

Table 3.6: Open digitline (dummy arrays and hierarchical row decode)—32-Mbit
size calculations.

Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 88 μm

Number of global decode strips TGDEC 1

Height of global decode strips HGDEC 190 μm

Number of hier decode strips THDEC 17

Height of hier decode strips HHDEC 37 μm

Number of digitlines TDL 8,400

Number of wordlines TWL 4,488

Height of 32-Mbit block Height32 5,859 μm

Width of 32-Mbit block Width32 5,535 μm

Area of 32-Mbit block Area32 32,429,565 μm2

3.1.2 Folded Array Architecture


The folded array architecture depicted in Figure 3.5 is the standard architecture of today’s
modern DRAM designs. The folded architecture is constructed with multiple array cores
separated by strips of sense amplifiers and either row decode blocks or wordline stitching
regions. Unlike the open digitline architecture, which uses 6F2 mbit cell pairs, the folded
array core uses 8F2 mbit cell pairs [6]. Modern array cores include 262,144 (2^18)
functionally addressable mbits arranged in 532 rows and 1,044 digitlines. In the 532 rows,
there are 512 actual wordlines, 4 redundant wordlines, and 16 dummy wordlines. Each
row (wordline) connects to mbit transistors on alternating digitlines. In the 1,044 digitlines,
there are 1,024 actual digitlines (512 columns), 16 redundant digitlines (8 columns), and 4
dummy digitlines. As discussed in Section 3.1.1, because of the additive or subtractive
photolithography effects, dummy wordlines and digitlines are necessary to guardband live
digitlines. These photo effects are pronounced at the edges of large repetitive structures
such as the array cores.

Figure 3.5: Folded digitline array architecture schematic.


Sense amplifier blocks are placed on both sides of each array core. The sense amplifiers
within each block are laid out at quarter pitch: one sense amplifier for every four digitlines.
Each sense amplifier connects through isolation devices to columns (digitline pairs) from
both adjacent array cores. Odd columns connect on one side of the core, and even
columns connect on the opposite side. Each sense amplifier block is therefore connected
to only odd or even columns and is never connected to both odd and even columns within
the same block. Connecting to both odd and even columns requires a half-pitch sense
amplifier layout: one sense amplifier for every two digitlines. While half-pitch layout is
possible with certain DRAM processes, the bulk of production DRAM designs remain
quarter pitch due to the ease of laying them out. The analysis presented in this chapter is
accordingly based on quarter-pitch design practices.
The location of row decode blocks for the array core depends on the number of available
metal layers. For one- and two-metal processes, local row decode blocks are located at
the top and bottom edges of the core. For three- and four-metal processes, global row
decodes are used. Global row decodes require only stitch regions or local wordline
drivers at the top and bottom edges of the core [7]. Stitch regions consume much less
silicon area than local row decodes, substantially increasing array efficiency for the
DRAM. The array core also includes digitline twist regions running parallel to the
wordlines. These regions provide the die area required for digitline twisting. Depending on
the particular twisting scheme selected for a design (see Section 2.1), the array core
needs between one and three twist regions. For the sake of analysis, a triple twist is
assumed, as it offers the best overall noise performance and has been chosen by DRAM
manufacturers for advanced large-scale applications [8]. Because each twist region
constitutes a break in the array structure, it is necessary to use dummy wordlines. For this
reason, there are 16 dummy wordlines (2 for each array edge) in the folded array core
rather than 4 dummy wordlines as in the open digitline architecture.
There are more mbits in the array core for folded digitline architectures than there are for
open digitline architectures. Larger core size is an inherent feature of folded architectures,
arising from the very nature of the architecture. The term folded architecture comes from
the fact that folding two open digitline array cores one on top of the other produces a
folded array core. The digitlines and wordlines from each folded core are spread apart
(double pitch) to allow room for the other folded core. After folding, each constituent core
remains intact and independent, except for the mbit changes (8F2 conversion) necessary
in the folded architecture. The array core size doubles because the total number of
digitlines and wordlines doubles in the folding process. It does not quadruple as one
might suspect because the two constituent folded cores remain independent: the
wordlines from one folded core do not connect to mbits in the other folded core.
Digitline pairing (column formation) is a natural outgrowth of the folding process; each
wordline only connects to mbits on alternating digitlines. The existence of digitline pairs
(columns) is the one characteristic of folded digitline architectures that produces superior
signal-to-noise performance. Furthermore, the digitlines that form a column are physically
adjacent to one another. This feature permits various digitline twisting schemes to be
used, as discussed in Section 2.1, further improving signal-to-noise performance.
Similar to the open digitline architecture, digitline length for the folded digitline
architecture is again limited by power dissipation and minimum cell-to-digitline
capacitance ratio. For the 256-Mbit generation, digitlines are restricted from connecting to
more than 256 cells (128 mbit pairs). The analysis used to arrive at this quantity is similar
to that for the open digitline architecture. (Refer to Table 3.2 to view the calculated results
of power dissipation versus digitline length for a 256-Mbit DRAM in 8k Refresh.) Wordline
length is again limited by the maximum allowable RC time constant of the wordline.
Contrary to an open digitline architecture in which each wordline connects to mbits on
each digitline, the wordlines in a folded digitline architecture connect to mbits only on
alternating digitlines. Therefore, a wordline can cross 1,024 digitlines while connecting to
only 512 mbit transistors. The wordlines have twice the overall resistance, but only
slightly more capacitance because they run over field oxide on alternating digitlines.
Table 3.7 presents the effective wordline time constants for various wordline lengths for a
folded array core. For a wordline connected to N mbits, the total resistance and
capacitance follow:

RWL = RS × (2N × PDL) / WWL and CWL = N × CW8
where PDL is the digitline pitch and CW8 is the wordline capacitance in an 8F2 mbit cell. As
shown in Table 3.7, the wordline length cannot exceed 512 mbits (1,024 digitlines) for the
wordline time constant to remain under 4 nanoseconds. Although the wordline connects
to only 512 mbits, it is two times longer (1,024 digitlines) than wordlines in open digitline
array cores. The folded digitline architecture therefore requires half as many row decode
blocks or wordline stitching regions as the open digitline architecture.
Table 3.7: Wordline time constant versus wordline length (folded).

Wordline Length (mbits)   RWL (ohms)   CWL (fF)   Time Constant (ns)

128 3,072 77 0.24

256 6,144 154 0.95

512 12,288 307 3.77

1,024 24,576 614 15.09


A diagram of a 32-Mbit array block using folded digitline architecture is shown in Figure
3.6. This block requires a total of one hundred twenty-eight 256kbit array cores. In this
figure, the array cores are arranged in an 8-row and 16-column configuration. The x8 row
arrangement produces 2-Mbit sections of 512 wordlines and 8,192 digitlines (4,096
columns). A total of sixteen 2-Mbit sections form the complete 32-Mbit array block. Sense
amplifier strips are positioned vertically between each 2-Mbit section, as in the open
digitline architecture. Again, row decode blocks or wordline stitching regions are
positioned horizontally between the array cores.

Figure 3.6: Folded digitline architecture 32-Mbit array block.


The 32-Mbit array block shown in Figure 3.6 includes size estimates for the various pitch
cells. The layout was generated where necessary to arrive at the size estimates. The
overall size for the folded digitline 32-Mbit block can be found by again summing the
dimensions for each component.

Accordingly,

Height32 = (TR × HRDEC) + (TDL × PDL)
where TR is the number of row decoders, HRDEC is the height of each decoder, TDL is the
number of digitlines including redundant and dummy, and PDL is the digitline pitch.
Similarly,

Width32 = (TSA × WAMP) + (TWL × PWL8) + (TTWIST × WTWIST)
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy, PWL8 is
the wordline pitch for the 8F2 mbit, TTWIST is the total number of twist regions, and WTWIST
is the width of the twist regions.
Table 3.8 shows the calculated results for the 32-Mbit block shown in Figure 3.6. In this
table, a double-metal process is used, which requires local row decoder blocks. Note that
Table 3.8 for the folded digitline architecture contains approximately twice as many
wordlines as does Table 3.5 for the open digitline architecture. The reason for this is that
each wordline in the folded array only connects to mbits on alternating digitlines, whereas
each wordline in the open array connects to mbits on every digitline. A folded digitline
design therefore needs twice as many wordlines as a comparable open digitline design.
Table 3.8: Folded digitline (local row decode)—32-Mbit size calculations.

Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 45 μm

Number of local decode strips TLDEC 9

Height of local decode strips HLDEC 93 μm

Number of digitlines TDL 8,352

Number of wordlines TWL 8,512

Number of twist regions TTWIST 48

Width of twist regions WTWIST 6 μm

Height of 32-Mbit block Height32 6,592 μm

Width of 32-Mbit block Width32 6,160 μm

Area of 32-Mbit block Area32 40,606,720 μm2

Array efficiency for the 32-Mbit memory block from Figure 3.6 is again found by dividing
the area consumed by functionally addressable mbits by the total die area. For the
simplified analysis presented in this book, the peripheral circuits are ignored. Array
efficiency for the 32-Mbit block is therefore given as

Efficiency (%) = 100 × (2^25 × 0.72 μm2) / Area32

where 0.72 μm2 is the area of an 8F2 mbit at the 0.3 μm feature size. This calculation
yields 59.5% for the folded array design example.


With the addition of Metal3 to the DRAM process, either a global or hierarchical row
decoding scheme, similar to the open digitline analysis, can be used. While global row
decoding and stitched wordlines achieve the smallest die size, they also place greater
demands on the fabrication process. For a balanced perspective, both approaches were
analyzed for the folded digitline architecture. The results of this analysis are presented in
Tables 3.9 and 3.10. Array efficiency for the 32-Mbit memory blocks using global and
hierarchical row decoding calculate to 74.0% and 70.9%, respectively.
Table 3.9: Folded digitline (global decode)—32-Mbit size calculations.

Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 45 μm

Number of global decode strips TGDEC 1

Height of global decode strips HGDEC 200 μm

Number of stitch regions NST 9

Height of stitch regions HST 10μm

Number of digitlines TDL 8,352

Number of wordlines TWL 8,512

Number of twist regions TTWIST 48

Width of twist regions WTWIST 6 μm

Height of 32-Mbit block Height32 5,301 μm

Width of 32-Mbit block Width32 6,160 μm

Area of 32-Mbit block Area32 32,654,160 μm2

Table 3.10: Folded digitline (hierarchical row decode)—32-Mbit size calculations.



Description Parameter Size

Number of sense amplifier strips TSA 17

Width of sense amplifiers WAMP 45 μm


Number of global decode strips TGDEC 1

Height of global decode strips HGDEC 190 μm

Number of hier decode strips NHDEC 9

Height of hier decode strips HHDEC 37 μm

Number of digitlines TDL 8,352

Number of wordlines TWL 8,512

Number of twist regions TTWIST 48

Width of twist regions WTWIST 6 μm

Height of 32-Mbit block Height32 5,534 μm

Width of 32-Mbit block Width32 6,160 μm

Area of 32-Mbit block Area32 34,089,440 μm2
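
These folded digitline figures can be spot-checked in the same way. The sketch below
reproduces the Table 3.10 dimensions and the 70.9% efficiency, again assuming an 8F2
mbit area of 0.72 μm2 (F = 0.3 μm).

# Check of Table 3.10: folded digitline, hierarchical row decode.
TGDEC, H_GDEC = 1, 190    # global decode strip and its height (um)
NHDEC, H_HDEC = 9, 37     # hierarchical decode strips and their height (um)
TSA, W_AMP = 17, 45       # sense amplifier strips and their width (um)
TDL, PDL = 8352, 0.6      # digitlines and digitline pitch (um)
TWL, PWL8 = 8512, 0.6     # wordlines and 8F2 wordline pitch (um)
TTWIST, W_TWIST = 48, 6   # twist regions and their width (um)

height = TGDEC * H_GDEC + NHDEC * H_HDEC + TDL * PDL    # ~5,534 um
width = TSA * W_AMP + TWL * PWL8 + TTWIST * W_TWIST     # ~6,160 um
efficiency = 100 * (2 ** 25) * 0.72 / (height * width)  # ~70.9%
print(f"{height:.0f} um x {width:.1f} um, efficiency = {efficiency:.1f}%")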

3.2 DESIGN EXAMPLES: ADVANCED BILEVEL DRAM

ARCHITECTURE
This section introduces a novel, advanced architecture suitable for use in future
large-scale DRAMs. First, we discuss technical objectives for the proposed architecture.
Second, we develop and describe the concept for an advanced array architecture
capable of meeting these objectives. Third, we conceptually construct a 32-Mbit memory
block with this new architecture for a 256-Mbit DRAM. Finally, we compare the results
achieved with the new architecture to those obtained for the open digitline and folded
digitline architectures from Section 3.1.

3.2.1 Array Architecture Objectives


Both the open digitline and folded digitline architectures have distinct advantages and
disadvantages. While open digitline architectures achieve smaller array layouts by virtue
of using smaller 6F2 mbit cells, they suffer from poor signal-to-noise performance. A
relaxed wordline pitch, which stems from the 6F2 mbit, simplifies the task of wordline
driver layout. Sense amplifier layout, however, is difficult because the array configuration
is inherently half pitch: one sense amplifier for every two digitlines. The superior
signal-to-noise performance [9] of folded digitline architectures comes at the expense of
larger, less efficient array layouts. Good signal-to-noise performance stems from the
adjacency of true and complement digitlines and the capability to twist these digitline
pairs. Sense amplifier layout is simplified because the array configuration is quarter
pitch—that is, one sense amplifier for every four digitlines. Wordline driver layout is more
difficult because the wordline pitch is effectively reduced in folded architectures.
The main objective of the new array architecture is to combine the advantages, while
avoiding the disadvantages, of both folded and open digitline architectures. To meet this
objective, the architecture needs to include the following features and characteristics:
Open digitline mbit configuration
Small 6F2 mbit
Small, efficient array layout
Folded digitline sense amplifier configuration
Adjacent true and complement digitlines
Twisted digitline pairs
Relaxed wordline pitch
High signal-to-noise ratio

An underlying goal of the new architecture is to reduce overall die size beyond that
obtainable from either the folded or open digitline architectures. A second yet equally
important goal is to achieve signal-to-noise performance that meets or approaches that of
the folded digitline architecture.

3.2.2 Bilevel Digitline Construction

A bilevel digitline architecture resulted from 256-Mbit DRAM research and design
activities carried out at Micron Technology, Inc., in Boise, Idaho. This bilevel digitline
architecture is an innovation that evolved from a comparative analysis of open and folded
digitline architectures. The analysis served as a design catalyst, ultimately leading to the
creation of a new DRAM array configuration—one that allows the use of 6F2 mbits in an
otherwise folded digitline array configuration. These memory cells are a by-product of
crosspoint-style (open digitline) array blocks. Crosspoint-style array blocks require that
every wordline connect to mbit transistors on every digitline, precluding the formation of
digitline pairs. Yet, digitline pairs (columns) remain an essential element in folded
digitline-type operation. Digitline pairs and digitline twisting are important features that
provide for good signal-to-noise performance.
The bilevel digitline architecture solves the crosspoint and digitline pair dilemma through
vertical integration. In vertical integration, essentially, two open digitline crosspoint array
sections are placed side by side, as seen in Figure 3.7. Digitlines in one array section are
designated as true digitlines, while digitlines from the second array section are
designated as complement digitlines. An additional conductor is added to the DRAM
process to complete the formation of the digitline pairs. The added conductor allows
digitlines from each array section to route across the other array section with both true
and complement digitlines vertically aligned. At the juncture between each section, the
true and complement signals are vertically twisted. With this twisting, the true digitline
connects to mbits in one array section, and the complement digitline connects to mbits in
the other array section. This twisting concept is illustrated in Figure 3.8.
Figure 3.7: Development of bilevel digitline architecture.

Figure 3.8: Digitline vertical twisting concept.


To improve the signal-to-noise characteristics of this design, the single twist region is
replaced by three twist regions, as illustrated in Figure 3.9. A benefit of adding multiple
twist regions is that only half of the digitline pairs actually twist within each region, making
room in each region for the twists to occur. The twist regions are equally spaced at the
25%, 50%, and 75% marks in the overall array. Assuming that even digitline pairs twist at
the 50% mark, odd digitlines twist at the 25% and 75% marks. Each component of a
digitline pair, true and complement, spends half of its overall length on the bottom
conductor connecting to mbits and half of its length on the top conductor. This
characteristic balances the capacitance and the number of mbits associated with each
digitline. Furthermore, the triple twisting scheme guarantees that the noise terms are
balanced for each digitline, producing excellent signal-to-noise performance.
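
The balancing property is easy to see with a small sketch; positions are normalized to the
overall array length, the 25%/50%/75% twist assignment follows the description above, and
the code itself is only illustrative.

# Triple-twist balance: with twists at the 25%, 50%, and 75% marks, every
# digitline spends half of its length on each conductor.
def conductor_fractions(twist_points):
    """Fractions of length spent on (bottom, top) for a digitline that starts
    on the bottom conductor and changes level at each of its twist points."""
    edges = [0.0] + sorted(twist_points) + [1.0]
    level = 0                       # 0 = bottom (Metal1), 1 = top (Metal2)
    fractions = [0.0, 0.0]
    for start, end in zip(edges, edges[1:]):
        fractions[level] += end - start
        level ^= 1                  # a twist moves the digitline to the other level
    return tuple(fractions)

print("even pairs:", conductor_fractions([0.50]))        # (0.5, 0.5)
print("odd pairs: ", conductor_fractions([0.25, 0.75]))  # (0.5, 0.5)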

Figure 3.9: Bilevel digitline architecture schematic.


A variety of vertical twisting schemes is possible with the bilevel digitline architecture. As
shown in Figure 3.10, each scheme uses conductive layers already present in the DRAM
process to complete the twist. Vertical twisting is simplified because only half of the
digitlines are involved in a given twist region. The final selection of a twisting scheme is
based on yield factors, die size, and available process technology.

Figure 3.10: Vertical twisting schemes.


To further advance the bilevel digitline architecture concept, its 6F2 mbit was modified to
improve yield. Shown in an arrayed form in Figure 3.11, the plaid mbit is constructed
using long parallel strips of active area vertically separated by traditional field oxide
isolation. Wordlines run perpendicular to the active area in straight strips of polysilicon.
Plaid mbits are again constructed in pairs that share a common contact to the digitline.
Isolation gates (transistors) formed with additional polysilicon strips provide horizontal
isolation between mbits. Isolation is obtained from these gates by permanently
connecting the isolation gate polysilicon to either a ground or a negative potential. Using
isolation gates in this mbit design eliminates one- and two-dimensional encroachment
problems associated with normal isolation processes. Furthermore, many
photolithography problems are eliminated from the DRAM process as a result of the
straight, simple design of both the active area and the polysilicon in the mbit. The plaid
designation for this mbit is derived from the similarity between an array of mbits and
tartan fabric that is apparent in a color array plot.

Figure 3.11: Plaid 6F2 mbit array.


(The isolation gates are tied to ground or a negative voltage.)

In the bilevel and folded digitline architectures, both true and complement digitlines exist
in the same array core. Accordingly, the sense amplifier block needs only one sense
amplifier for every two digitline pairs. For the folded digitline architecture, this yields one
sense amplifier for every four Metal 1 digitlines—quarter pitch. The bilevel digitline
architecture that uses vertical digitline stacking needs one sense amplifier for every two
Metal1 digitlines—half pitch. Sense amplifier layout is therefore more difficult for bilevel
than for folded designs. The three-metal DRAM process needed for bilevel architectures
concurrently enables and simplifies sense amplifier layout. Metal1 is used for lower level
digitlines and local routing within the sense amplifiers and row decodes. Metal2 is
available for upper level digitlines and column select signal routing through the sense
amplifiers. Metal3 can therefore be used for column select routing across the arrays and
for control and power routing through the sense amplifiers. The function of Metal2 and
Metal3 can easily be swapped in the sense amplifier block depending on layout
preferences and design objectives.
Wordline pitch is effectively relaxed for the plaid 6F2 mbit of the bilevel digitline
architecture. The mbit is still built using the minimum process feature size of 0.3 μm. The
relaxed wordline pitch stems from structural differences between a folded digitline mbit
and an open digitline or plaid mbit. There are essentially four wordlines running across
each folded digitline mbit pair compared to two wordlines running across each open
digitline or plaid mbit pair. Although the plaid mbit is 25% shorter than a folded mbit (three
versus four features), it also has half as many wordlines, effectively reducing the wordline
pitch. This relaxed wordline pitch makes layout of the wordline drivers and the address
decode tree much easier. In fact, both odd and even wordlines can be driven from the
same row decoder block, thus eliminating half of the row decoder strips in a given array
block. This is an important distinction, as the tight wordline pitch for folded digitline
designs necessitates separate odd and even row decode strips.

3.2.3 Bilevel Digitline Array Architecture


The bilevel digitline array architecture depicted in Figure 3.12 is a potential architecture
for tomorrow’s large-scale DRAM designs. The bilevel architecture is constructed with
multiple array cores separated by strips of sense amplifiers and either row decode blocks
or wordline stitching regions. Wordline stitching requires a four-metal process, while row
decode blocks can be implemented in a three-metal process. The array cores include
262,144 (2^18) functionally addressable plaid 6F2 mbits arranged in 532 rows and 524
bilevel digitline pairs. The 532 rows comprise 512 actual wordlines, 4 redundant wordlines,
and 16 dummy wordlines. There are also 267 isolation gates in each array due to the use
of plaid mbits. Because they are accounted for in the wordline pitch, however, they can be
ignored. The 524 bilevel digitline pairs comprise 512 actual digitline pairs, 8 redundant
digitline pairs, and 4 dummy digitline pairs. The term digitline pair describes the array core
structure because pairing is a natural product of the bilevel architecture. Each digitline
pair consists of one digitline on Metal 1 and a vertically aligned complementary digitline
on Metal2.
Figure 3.12: Bilevel digitline array schematic.

Sense amplifier blocks are placed on both sides of each array core. The sense amplifiers
within each block are laid out at half pitch: one sense amplifier for every two Metal 1
digitlines. Each sense amplifier connects through isolation devices to columns (digitline
pairs) from two adjacent array cores. Similar to the folded digitline architecture, odd
columns connect on one side of the array core, and even columns connect on the other
side. Each sense amplifier block is then exclusively connected to either odd or even
columns, never to both.

Unlike a folded digitline architecture that uses a local row decode block connected to both
sides of an array core, the bilevel digitline architecture uses a local row decode block
connected to only one side of each core. As stated earlier, both odd and even rows can
be driven from the same local row decoder block with the relaxed wordline pitch. Because
of this feature, the bilevel digitline architecture is more efficient than alternative
architectures. A four-metal DRAM process allows local row decodes to be replaced by
either stitch regions or local wordline drivers. Either approach could substantially reduce
die size. The array core also includes the three twist regions necessary for the bilevel
digitline architecture. The twist region is larger than that used in the folded digitline
architecture, owing to the complexity of twisting digitlines vertically. The twist regions
again constitute a break in the array structure, making it necessary to include dummy
wordlines.
As with the open digitline and folded digitline architectures, the bilevel digitline length is
limited by power dissipation and a minimum cell-to-digitline capacitance ratio. In the
256-Mbit generation, the digitlines are again restricted from connecting to more than 256
mbits (128 mbit pairs). The analysis to arrive at this quantity is the same as that for the
open digitline architecture, except that the overall digitline capacitance is higher. The
bilevel digitline runs over twice as many cells as the open digitline with the digitline
running in equal lengths in both Metal2 and Metal1. The capacitance added by the Metal2
component is small compared to the already present Metal1 component because Metal2
does not connect to mbit transistors. Overall, the digitline capacitance increases by about
25% compared to an open digitline. The power dissipated during a Read or Refresh
operation is proportional to the digitline capacitance (CD), the supply (internal) voltage
(VCC), the external voltage (VCCX), the number of active columns (N), and the Refresh
period (P). It is given as

Power = (N × CD × VCC × VCCX) / (2 × P)
On a 256-Mbit DRAM in 8k Refresh, there are 32,768 (215) active columns during each
Read, Write, or Refresh operation. Active array current and power dissipation for a
256-Mbit DRAM are given in Table 3.11 for a 90 ns Refresh period (−5 timing) at various
digitline lengths. The budget for active array current is limited to 200mA for this 256-Mbit
design. To meet this budget, the digitline cannot exceed a length of 256 mbits.
Table 3.11: Active current and power versus bilevel digitline length.

Digitline Length (mbits)   Digitline Capacitance (fF)   Active Current (mA)   Power Dissipation (mW)

128 128 75 249

256 256 151 498

512 513 301 994
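
As a quick check of the 256-mbit row above, assuming that both VCC and VCCX are 3.3 V
(neither value is given in this section) and the same half-VCC restore model used earlier:

# Check of the 256-mbit row in Table 3.11 (bilevel digitline).
N = 32768                 # active columns (8k Refresh)
P = 90e-9                 # Refresh period, seconds
CD = 256e-15              # bilevel digitline capacitance at 256 mbits (Table 3.11)
VCC = VCCX = 3.3          # assumed internal and external supply voltages

current = N * CD * (VCC / 2) / P   # ~154 mA vs. 151 mA tabulated
power = VCCX * current             # ~0.51 W vs. 498 mW tabulated
print(f"I = {current * 1e3:.0f} mA, power = {power * 1e3:.0f} mW")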


Wordline length is again limited by the maximum allowable RC time constant of the
wordline. The calculation for bilevel digitline is identical to that performed for open digitline
due to the similarity of array core design. These results are given in Table 3.3.
Accordingly, if the wordline time constant is to remain under the required
4-nanosecond limit, the wordline length cannot exceed 512 mbits (512 bilevel digitline
pairs).
A layout of various bilevel elements was generated to obtain reasonable estimates of
pitch cell size. With these size estimates, overall dimensions for a 32-Mbit array block
could be calculated. The diagram for a 32-Mbit array block using the bilevel digitline
architecture is shown in Figure 3.13. This block requires a total of one hundred
twenty-eight 256kbit array cores. The 128 array cores are arranged in 16 rows and 8
columns. Each 4-Mbit vertical section consists of 512 wordlines and 8,192 bilevel digitline
pairs (8,192 columns). Eight 4-Mbit strips are required to form the complete 32-Mbit block.
Sense amplifier blocks are positioned vertically between each 4-Mbit section.
Figure 3.13: Bilevel digitline architecture 32-Mbit array block.
Row decode strips are positioned horizontally between the array cores. Only eight row
decode strips are needed for the sixteen rows of array cores because each row decode strip
contains wordline drivers for both odd and even rows. The 32-Mbit array block shown in Figure 3.13
includes pitch cell layout estimates. Overall size for the 32-Mbit block is found by
summing the dimensions for each component.

As before,

Height32 = (TR × HRDEC) + (TDL × PDL)
where TR is the number of bilevel row decoders, HRDEC is the height of each decoder, TDL
is the number of bilevel digitline pairs including redundant and dummy, and PDL is the
digitline pitch. Also,

Width32 = (TSA × WAMP) + (TWL × PWL6) + (TTWIST × WTWIST)
where TSA is the number of sense amplifier strips, WAMP is the width of the sense
amplifiers, TWL is the total number of wordlines including redundant and dummy, PWL6 is
the wordline pitch for the plaid 6F2 mbit, TTWIST is the total number of twist regions, and
WTWIST is the width of the twist regions. Table 3.12 shows the calculated results for the
bilevel 32-Mbit block shown in Figure 3.13. A three-metal process is assumed in these
calculations; with only three metal layers, local row decoders are required. Array efficiency
for the bilevel digitline 32-Mbit array block, which yields 63.1% for this design example, is given as

Efficiency (%) = 100 × (2^25 × 0.54 μm2) / Area32

Table 3.12: Bilevel digitline (local row decode)—32-Mbit size calculations.



Description Parameter Size

Number of sense amplifier strips TSA 9

Width of sense amplifiers WAMP 65 μm

Number of local decode strips TLDEC 8

Height of local decode strips HLDEC 149 μm

Number of digitlines TDL 8,352

Number of wordlines TWL 4,256

Number of twist regions TTWIST 24

Width of twist regions WTWIST 9 μm

Height of 32-Mbit block Height32 6,203 μm

Width of 32-Mbit block Width32 4,632 μm


Area of 32-Mbit block Area32 28,732,296 μm2
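
A spot-check of these dimensions, taking the plaid 6F2 mbit area as 0.54 μm2 (F = 0.3 μm)
and 24 twist regions (three per core column across the eight core columns):

# Check of Table 3.12: bilevel digitline, local row decode.
TR, H_LDEC = 8, 149       # local row decoder strips and their height (um)
TSA, W_AMP = 9, 65        # sense amplifier strips and their width (um)
TDL, PDL = 8352, 0.6      # digitlines listed in Table 3.12 and digitline pitch (um)
TWL, PWL6 = 4256, 0.9     # wordlines and plaid 6F2 wordline pitch (um)
TTWIST, W_TWIST = 24, 9   # twist regions and their width (um)

height = TR * H_LDEC + TDL * PDL                        # ~6,203 um
width = TSA * W_AMP + TWL * PWL6 + TTWIST * W_TWIST     # ~4,632 um
efficiency = 100 * (2 ** 25) * 0.54 / (height * width)  # ~63.1%
print(f"{height:.0f} um x {width:.1f} um, efficiency = {efficiency:.1f}%")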

With Metal4 added to the bilevel DRAM process, the local row decoder scheme can be
replaced by a global or hierarchical row decoder scheme. The addition of a fourth metal to
the DRAM process places even greater demands on process engineers. Regardless, an
analysis of 32-Mbit array block size was performed assuming the availability of Metal4.
The results of the analysis are shown in Tables 3.13 and 3.14 for the global and
hierarchical row decode schemes. Array efficiency for the 32-Mbit memory block using
global and hierarchical row decoding is 74.5% and 72.5%, respectively.
Table 3.13: Bilevel digitline (global decode)—32-Mbit size calculations.

Description Parameter Size

Number of sense amplifier strips TSA 9

Width of sense amplifiers WAMP 65 μm

Number of global decode strips TGDEC 1

Height of global decode strips HGDEC 200 μm

Number of stitch regions NST 4

Height of stitch regions HST 10 μm

Number of digitlines TDL 8,352

Number of wordlines TWL 4,256

Number of twist regions TTWIST 24

Width of twist regions WTWIST 9 μm

Height of 32-Mbit block Height32 5,251 μm

Width of 32-Mbit block Width32 4,632 μm

Area of 32-Mbit block Area32 24,322,632 μm2

Table 3.14: Bilevel digitline (hierarchical row decode)—32-Mbit size calculations.



Description Parameter Size

Number of sense amplifier strips TSA 9

Width of sense amplifiers WAMP 65 μm

Number of global decode strips TGDEC 1


Height of global decode strips HGDEC 190 μm

Number of hier decode strips NHDEC 4

Height of hier decode strips HHDEC 48 μm

Number of digitlines TDL 8,352

Number of wordlines TWL 4,256

Number of twist regions TTWIST 24

Width of twist regions WTWIST 9 μm

Height of 32-Mbit block Height32 5,393 μm

Width of 32-Mbit block Width32 4,632 μm

Area of 32-Mbit block Area32 24,980,376 μm2

3.2.4 Architectural Comparison


Although a straight comparison of DRAM architectures might appear simple, in fact it is
very complicated. Profit remains the critical test of architectural efficiency and is the true
basis for comparison. This in turn requires accurate yield and cost estimates for each
alternative. Without these estimates and a thorough understanding of process capabilities,
conclusions are elusive and the exercise is academic. The data necessary to perform the
analysis and render a decision also varies from manufacturer to manufacturer.
Accordingly, a conclusive comparison of the various array architectures is beyond the
scope of this book. Rather, the architectures are compared in light of the available data.
To better facilitate a comparison, the 32-Mbit array block size data from Sections 3.1 and
3.2 is summarized in Table 3.15 for the open digitline, folded digitline, and bilevel digitline
architectures.
Table 3.15: 32-Mbit size calculations summary.

Architecture   Row Decode   Metals   32-Mbit Area (μm2)   Efficiency (%)

Open digit Global 3 29,944,350 60.5

Open digit Hier 3 32,429,565 55.9

Folded digit Local 2 40,606,720 59.5

Folded digit Global 3 32,654,160 74.0


Folded digit Hier 3 34,089,440 70.9

Bilevel digit Local 3 28,732,296 63.1

Bilevel digit Global 4 24,322,632 74.5

Bilevel digit Hier 4 24,980,376 72.5


From Table 3.15, it can be concluded that overall die size (32-Mbit area) is a better metric
for comparison than array efficiency. For instance, the three-metal folded digitline design
using hierarchical row decodes has an area of 34,089,440 μm2 and an efficiency of 70.9%.
The three-metal bilevel digitline design with local row decodes has an efficiency of only
63.1% but an overall area of 28,732,296 μm2. Array efficiency for the folded digitline is
higher. This is misleading, however, because the folded digitline yields a die that is 18.6%
larger for the same number of conductors.
Table 3.15 also illustrates that the bilevel digitline architecture always yields the smallest
die area, regardless of the configuration. The smallest folded digitline design at
32,654,160 μm2 and the smallest open digitline design at 29,944,350 μm2 are still larger
than the largest bilevel digitline design at 28,732,296 μm2. It is also apparent that both the
bilevel and open digitline architectures need at least three conductors in their construction.
The folded digitline architecture still has a viable design option using only two conductors.
The penalty of using two conductors is a much larger die size—a full 41% larger than the
three-metal bilevel digitline design.
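
The relative-size figures quoted above follow directly from the Table 3.15 areas; a quick
check:

# Relative die size using the 32-Mbit block areas from Table 3.15 (um^2).
areas = {
    "bilevel, local decode, 3 metals": 28_732_296,
    "folded, hier decode, 3 metals":   34_089_440,
    "folded, local decode, 2 metals":  40_606_720,
}
base = areas["bilevel, local decode, 3 metals"]
for name, area in areas.items():
    print(f"{name:33s} {100 * (area / base - 1):+5.1f}% vs. bilevel (local)")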

REFERENCES

[1] H.Hidaka, Y.Matsuda, and K.Fujishima, “A Divided/Shared Bit-Line Sensing Scheme for
ULSI DRAM Cores,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 473–77, April 1991.

[2] T.Hamada, N.Tanabe, H.Watanabe, K.Takeuchi, N.Kasai, H.Hada, K.Shibahara,
K.Tokashiki, K.Nakajima, S.Hirasawa, E.Ikawa, T.Saeki, E.Kakehashi, S.Ohya, and T.A.Kunio,
“A Split-Level Diagonal Bit-Line (SLDB) Stacked Capacitor Cell for 256Mb DRAMs,” 1992 IEDM
Technical Digest, pp. 799–802.

[3] M.Inoue, H.Kotani, T.Yamada, H.Yamauchi, A.Fujiwara, J.Matsushima, H.Akamatsu,
M.Fukumoto, M.Kubota, I.Nakao, N.Aoi, G.Fuse, S.Ogawa, S.Odanaka, A.Ueno, and
H.Yamamoto, “A 16Mb DRAM with an Open Bit-Line Architecture,” 1988 IEEE ISSCC Digest of
Technical Papers, pp. 246–247.

[4] M.Inoue, T.Yamada, H.Kotani, H.Yamauchi, A.Fujiwara, J.Matsushima, H.Akamatsu,
M.Fukumoto, M.Kubota, I.Nakao, N.Aoi, G.Fuse, S.Ogawa, S.Odanaka, A.Ueno, and
H.Yamamoto, “A 16-Mbit DRAM with a Relaxed Sense-Amplifier-Pitch Open-Bit-Line
Architecture,” IEEE Journal of Solid-State Circuits, vol. 23, pp. 1104–1112, October 1988.

[5] K.Noda, T.Saeki, A.Tsujimoto, T.Murotani, and K.Koyama, “A Boosted Dual Word-line
Decoding Scheme for 256Mb DRAMs,” 1992 Symposium on VLSI Circuits Digest of Technical
Papers, pp. 112–113.

[6] D.Takashima, S.Watanabe, H.Nakano, Y.Oowaki, and K.Ohuchi, “Open/Folded Bit-Line
Arrangement for Ultra-High-Density DRAM’s,” IEEE Journal of Solid-State Circuits, vol. 29,
pp. 539–542, April 1994.

[7] Y.Oowaki, K.Tsuchida, Y.Watanabe, D.Takashima, M.Ohta, H.Nakano, S.Watanabe,
A.Nitayama, F.Horiguchi, K.Ohuchi, and F.Masuoka, “A 33-ns 64Mb DRAM,” IEEE Journal of
Solid-State Circuits, vol. 26, pp. 1498–1505, November 1991.

[8] Y.Nakagome, M.Aoki, S.Ikenaga, M.Horiguchi, S.Kimura, Y.Kawamoto, and K.Itoh, “The
Impact of Data-Line Interference Noise on DRAM Scaling,” IEEE Journal of Solid-State
Circuits, vol. 23, pp. 1120–1127, October 1988.

[9] T.Yoshihara, H.Hidaka, Y.Matsuda, and K.Fujishima, “A Twisted Bitline Technique for
Multi-Mb DRAMs,” 1988 IEEE ISSCC Digest of Technical Papers, pp. 238–239.
Chapter 4: The Peripheral Circuitry
In this chapter, we briefly discuss the peripheral circuitry. In particular, we discuss the
column decoder and its implementation. We also cover the implementation of row and
column redundancy.

4.1 COLUMN DECODER ELEMENTS


The column decoder circuits represent the final DRAM elements that pitch up to the array
mbits. Historically, column decode circuits were simple and straightforward: static logic
gates were generally used for both the decode tree elements and the driver output. Static
logic was used primarily because of the nature of column addressing in fast page mode
(FPM) and extended data out (EDO) devices. Unlike row addressing, which occurred
once per RAS cycle, column addressing could occur multiple times per RAS cycle, with
each column held open until a subsequent column address appeared. The transition
interval from one column to the next had to be minimized, allowing just enough time to
turn OFF the previous column, turn ON the new column, and equilibrate the necessary
data path circuits. Furthermore, because column address transitions were unpredictable
and asynchronous, a column decode logic that was somewhat forgiving of random
address changes was required—hence the use of static logic gates.
With the advent of synchronous DRAMs (SDRAMs) and high-speed, packet-based
DRAM technology, the application of column, or row, addresses became synchronized to
a clock. More importantly, column addressing and column timing became more
predictable, allowing design engineers to use pipelining techniques along with dynamic
logic gates in constructing column decode elements.

Column redundancy adds complexity to the column decoder because the redundancy
operation in FPM and EDO DRAMs requires the decode circuit to terminate column
transitions, prior to completion. In this way, redundant column elements can replace
normal column elements. Generally, the addressed column select is allowed to fire
normally. If a redundant column match occurs for this address, the normal column select
is subsequently turned OFF; the redundant column select is fired. The redundant match is
timed to disable the addressed column select before enabling the I/O devices in the
sense amplifier.

The fire-and-cancel operation used on the FPM and EDO column decoders is best
achieved with static logic gates. In packet-based and synchronous DRAMs, column
select firing can be synchronized to the clock. Synchronous operation, however, does not
favor a fire-and-cancel mode, preferring instead that the redundant match be determined
prior to firing either the addressed or redundant column select. This match is easily
achieved in a pipeline architecture because the redundancy match analysis can be
performed upstream in the address pipeline before presenting the address to the column
decode logic.
A typical FPM- or EDO-type column decoder realized with static logic gates is shown
schematically in Figure 4.1. The address tree is composed of combinations of NAND or
NOR gates. In this figure, the address signals are active HIGH, so the tree begins with
two-input NAND gates. Using predecoded address lines is again preferred. Predecoded
address lines both simplify and reduce the decoder logic because a single input can
represent two or more address terms. In the circuit shown in Figure 4.1, the four input
signals CA23, CA45, CA67, and CA8 represent seven address terms, permitting 1 of 128
decoding. Timing of the column selection is controlled by a signal called column decode
enable (CDE), which is usually combined with an input signal, as shown in Figure 4.2, or
as an additional term in the tree.
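
The 1-of-128 selection can be illustrated with a small sketch of the predecoding. The
grouping of address bits into the CA23, CA45, CA67, and CA8 fields follows the signal names
in Figure 4.1, but the exact bit assignments used here are an assumption made for
illustration only.

# Sketch of predecoded column addressing: seven column address bits expand into
# four one-hot predecode groups (CA23, CA45, CA67, CA8), and a column select
# fires only when one line from each group is active.  The bit groupings are
# illustrative, not taken from the actual design.
def predecode(col_addr):
    """Split a 7-bit column address into four one-hot predecode fields."""
    ca23 = 1 << (col_addr & 0b11)           # 2 bits -> 4 lines
    ca45 = 1 << ((col_addr >> 2) & 0b11)    # 2 bits -> 4 lines
    ca67 = 1 << ((col_addr >> 4) & 0b11)    # 2 bits -> 4 lines
    ca8 = 1 << ((col_addr >> 6) & 0b1)      # 1 bit  -> 2 lines
    return ca23, ca45, ca67, ca8

def csel_index(ca23, ca45, ca67, ca8):
    """Recombine the one-hot fields into the 1-of-128 column select number."""
    bit = lambda field: field.bit_length() - 1
    return bit(ca23) | (bit(ca45) << 2) | (bit(ca67) << 4) | (bit(ca8) << 6)

# Every 7-bit address maps to exactly one of the 128 column selects and back.
assert all(csel_index(*predecode(a)) == a for a in range(128))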

Figure 4.1: Column decode.

Figure 4.2: Column selection timing.


The final signal shown in the figure is labeled RED. This signal disables the normal
column term and enables redundant column decoders whenever the column address
matches any of the redundant address locations, as determined by the column-redundant
circuitry. A normal column select begins to turn ON before RED makes a LOW-to-HIGH
transition. This fire-and-cancel operation maximizes the DRAM speed between column
accesses, making column timing all the more critical. An example, which depicts a page
mode column transition, is shown in Figure 4.2. During the fire-and-cancel transition, I/O
equilibration (EQIO) envelops the deselection of the old column select CSEL<0> and the
selection of a new column select RCSEL<1>. In this example, CSEL<1> initially begins to
turn ON until RED transitions HIGH, disabling CSEL<1> and enabling redundant column
RCSEL<1>.
The column decode output driver is a simple CMOS inverter because the column select
signal (CSEL) only needs to drive to VCC. On the other hand, the wordline driver, as we
have seen, is rather complex; it needed to drive to a boosted voltage, VCCP. Another
feature of column decoders is that their pitch is very relaxed relative to the pitch of the
sense amplifiers and row decoders. From our discussion in Section 2.2 concerning I/O
transistors and CSEL lines, we learned that a given CSEL is shared by four to eight I/O
transistors. Therefore, the CSEL pitch is one CSEL for every eight to sixteen digitlines. As
a result, the column decoder layout is much less difficult to implement than either the row
decoder or the sense amplifiers.
A second type of column decoder, realized with dynamic P&E logic, is shown in Figure
4.3. This particular design was first implemented in an 800MB/s packet-based SLDRAM
device. The packet nature of SLDRAM and the extensive use of pipelining supported a
column decoder built with P&E logic. The column address pipeline included redundancy
match circuits upstream from the actual column decoder, so that both the column address
and the corresponding match data could be presented at the same time. There was no
need for the fire-and-cancel operation: the match data was already available.
Figure 4.3: Column decode: P&E logic.

Therefore, the column decoder fires either the addressed column select or the redundant
column select in synchrony with the clock. The decode tree is similar to that used for the
CMOS wordline driver; a pass transistor was added so that a decoder enable term could
be included. This term allows the tree to disconnect from the latching column select driver
while new address terms flow into the decoder. A latching driver was used in this pipeline
implementation because it held the previously addressed column select active with the
decode tree disconnected. Essentially, the tree would disconnect after a column select
was fired, and the new address would flow into the tree in anticipation of the next column
select. Concurrently, redundant match information would flow into the phase term driver
along with CA45 address terms to select the correct phase signal. A redundant match
would then override the normal phase term and enable a redundant phase term.
Operation of this column decoder is shown in Figure 4.4. Once again, deselection of the
old column select CSEL<0> and selection of a new column select RCSEL<1> are
enveloped by EQIO. Column transition timing is under the control of the column latch
signal CLATCH*. This signal shuts OFF the old column select and enables firing of the
new column select. Concurrent with CLATCH* firing, the decoder is enabled with decoder
enable (DECEN) to reconnect the decode tree to the column select driver. After the new
column select fires, DECEN transitions LOW to once again isolate the decode tree.

Figure 4.4: Column decode waveforms.

4.2 COLUMN AND ROW REDUNDANCY

Redundancy has been used in DRAM designs since the 256k generation to improve yield
and profitability. In redundancy, spare elements such as rows and columns are used as
logical substitutes for defective elements. The substitution is controlled by a physical
encoding scheme. As memory density and size increase, redundancy continues to gain
importance. The early designs might have used just one form of repairable elements,
relying exclusively on row or column redundancy. Yet as processing complexity increased
and feature size shrank, both types of redundancy—row and column—became
mandatory.
Today various DRAM manufacturers are experimenting with additional forms of repair,
including replacing entire subarrays. The most advanced type of repair, however,
involves using movable saw lines as realized on a prototype 1Gb design [1]. Essentially,
any four good adjacent quadrants from otherwise bad die locations can be combined into
a good die by simply sawing the die along different scribe lines. Although this idea is far
from reaching production, it illustrates the growing importance of redundancy.
4.2.1 Row Redundancy
The concept of row redundancy involves replacing bad wordlines with good wordlines.
There could be any number of problems on the row to be repaired, including shorted or
open wordlines, wordline-to-digitline shorts, or bad mbit transistors and storage
capacitors. The row is not physically but logically replaced. In essence, whenever a row
address is strobed into a DRAM by RAS, the address is compared to the addresses of
known bad rows. If the address comparison produces a match, then a replacement
wordline is fired in place of the normal (bad) wordline.

The replacement wordline can reside anywhere on the DRAM. Its location is not restricted
to the array containing the normal wordline, although its range may be restricted by
architectural considerations. In general, the redundancy is considered local if the
redundant wordline and the normal wordline must always be in the same subarray.
If, however, the redundant wordline can exist in a subarray that does not contain the
normal wordline, the redundancy is considered global. Global repair generally results in
higher yield because the number of rows that can be repaired in a single subarray is not
limited to the number of its redundant rows. Rather, global repair is limited only by the
number of fuse banks, termed repair elements, that are available to any subarray.

Local row repair was prevalent through the 16-Meg generation, producing adequate yield
for minimal cost. Global row repair schemes are becoming more common for 64-Meg or
greater generations throughout the industry. Global repair is especially effective for
repairing clustered failures and offers superior repair solutions on large DRAMs.
Dynamic logic is a traditional favorite among DRAM designers for row redundant match
circuits. Dynamic gates are generally much faster than static gates and well suited to row
redundancy because they are used only once in an entire RAS cycle operation. The
dynamic logic we are referring to again is called precharge and evaluate (P&E). Match
circuits can take many forms; a typical row match circuit is shown in Figure 4.5. It consists
of a PRECHARGE transistor M1, match transistors M2-M5, laser fuses F1-F4, and static
gate I1. In addition, the node labeled row PRECHARGE (RPRE*) is driven by static logic
gates. The fuses generally consist of narrow polysilicon lines that can be blown or opened
with either a precision laser or a fusing current provided by additional circuits (not shown).
Figure 4.5: Row fuse block.
For our example using predecoded addresses, three of the four fuses shown must be
blown in order to program a match address. If, for instance, F2-F4 were blown, the circuit
would match for RA12<0> but not for RA12<1:3>. Prior to RAS falling, the node labeled
RED* is precharged to VCC by the signal RPRE*, which is LOW. Assuming that the circuit
is enabled by fuse F5, EVALUATE* will be LOW. After RAS falls, the row addresses
eventually propagate into the redundant block. If RA12<0> fires HIGH, RED* discharges
through M2 to ground. If, however, RA12<0> does not fire HIGH, RED* remains at VCC,
indicating that a match did not occur. A weak latch composed of I1 and M6 ensures that
RED* remains at VCC and does not discharge due to junction leakage.
This latch can easily be overcome by any of the match transistors. The signal RED* is
combined with static logic gates that have similar RED* signals derived from the
remaining predecoded addresses. If all of the RED* signals for a redundant element go
LOW, then a match has occurred, as indicated by row match (RMAT*) firing LOW. The
signal RMAT* stops the normal row from firing and selects the appropriate replacement
wordline. The fuse block in Figure 4.5 shows additional fuses F5-F6 for enabling and
disabling the fuse bank. Disable fuses are important in the event that the redundant
element fails and the redundant wordline must itself be repaired.
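To make the fuse-bank behavior concrete, the following sketch models the match logic of Figure 4.5 at a purely logical level (it is a minimal Python abstraction, not the precharge-and-evaluate timing itself). The group names and the example programming are illustrative only.

# Logical model of the P&E row-redundancy match of Figure 4.5.
# A fuse bank holds one group of four laser fuses per set of
# predecoded row addresses (e.g., RA12<0:3>).  Blowing three of the
# four fuses programs the group to match the remaining address line.

def red_star(predecoded_lines, fuses_blown):
    """Return the logic level of RED* for one predecoded group.

    predecoded_lines : list of 4 bools, the predecoded lines (one HIGH).
    fuses_blown      : list of 4 bools, True if the laser fuse is blown.

    RED* is precharged HIGH and discharges (goes LOW, i.e., matches)
    only if the firing address line still has an intact fuse to ground.
    """
    discharged = any(line and not blown
                     for line, blown in zip(predecoded_lines, fuses_blown))
    return 0 if discharged else 1

def rmat_star(groups):
    """RMAT* fires LOW (0) only if every RED* in the element is LOW."""
    return 0 if all(red_star(lines, fuses) == 0
                    for lines, fuses in groups) else 1

# Example: the element is programmed to match RA12<0> (F2-F4 blown)
# and, for illustration, RA34<2> (F1, F2, F4 blown).
groups = [
    ([True, False, False, False], [False, True, True, True]),   # RA12<0:3>
    ([False, False, True, False], [True, True, False, True]),   # RA34<0:3>
]
assert rmat_star(groups) == 0   # match: redundant wordline replaces normal row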
The capability to pretest redundant wordlines is an important element in most DRAM
designs today. For the schematic shown in Figure 4.5, the pretest is accomplished
through the set of static gates driven by the redundant test (REDTEST) signal. The input
signals labeled TRA12n, TRA34n, TRA56n, and TODDEVEN are programmed uniquely
for each redundant element through connections to the appropriate predecoded address
lines. REDTEST is HIGH whenever the DRAM is in the redundant row pretest mode. If
the current row address corresponds to the programmed pretest address, RMAT* will be
forced LOW, and the corresponding redundant wordline rather than the normal wordline
will be fired. This pretest capability permits all of the redundant wordlines to be tested
prior to any laser programming.
Fuse banks or redundant elements, as shown in Figure 4.5, are physically associated
with specific redundant wordlines in the array. Each element can fire only one specific
wordline, although generally in multiple subarrays. The number of subarrays that each
element controls depends on the DRAM’s architecture, refresh rate, and redundancy
scheme. It is not uncommon in 16-Meg DRAMs for a redundant row to replace physical
rows in eight separate subarrays at the same time. Obviously, the match circuits must be
fast. Generally, firing of the normal row must be held off until the match circuits have
enough time to evaluate the new row address. As a result, time wasted during this phase
shows up directly on the part’s row access (tRAC) specification.

4.2.2 Column Redundancy


Column redundancy is the second type of repair available on most DRAM designs. In
Section 4.1, it was stated that column accesses can occur multiple times per RAS cycle.
Each column is held open until a subsequent column appears. Therefore, column
redundancy was generally implemented with circuits that are very different from those
seen in row redundancy. As shown in Figure 4.6, a typical column fuse block is built from
static logic gates rather than from P&E dynamic gates. P&E logic, though extremely fast,
needs to be precharged prior to each evaluation. The transition from one column to
another must occur within the span of a few nanoseconds. On FPM and EDO devices,
which had unpredictable and asynchronous column address transitions, there was no
guarantee that adequate time existed for this PRECHARGE to occur. Yet the predictable
nature of column addressing on the newer type of synchronous and packet-based
DRAMs affords today’s designers an opportunity to use P&E-type column redundancy
circuits. In these cases, the column redundant circuits appear very similar to P&E-style
row redundant circuits.
Figure 4.6: Column fuse block.
An example of a column fuse block for an FPM or EDO design is shown in Figure 4.6.
This column fuse block has four sets of column fuse circuits and additional enable/disable
logic. Each column fuse circuit, corresponding to a set of predecoded column addresses,
contains compare logic and two sets of laser fuse/latch circuits. The laser fuse/latch reads
the laser fuse whenever column fuse power (CFP) is enabled, generally on POWERUP
and during RAS cycles. The fuse values are held by the simple inverter latch circuits
composed of I0 and I1. Both true and complement data are fed from the fuse/latch circuit
into the comparator logic. The comparator logic, which appears somewhat complex, is
actually quite simple as shown in the following Boolean expression where F0 without the
bar indicates a blown fuse:
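The expression depends on how the two latched fuse bits are mapped onto the four predecoded column address lines. As a minimal sketch, assuming the fuse pair simply encodes which of the four lines the circuit should match, the comparator behaves as follows; the line ordering is an assumption made only for illustration.

# Sketch of a column fuse/latch comparator (assumed encoding).
# Two latched fuse bits select one of four predecoded column address
# lines; CAM goes HIGH when the selected line is HIGH.

def cam(predecoded_ca, f0, f1):
    """predecoded_ca: list of 4 bools (e.g., CA12<0:3>).
    f0, f1: latched fuse states, True meaning a blown fuse.
    The mapping of fuse codes to lines is illustrative only."""
    selected = (1 if f0 else 0) | (2 if f1 else 0)
    return predecoded_ca[selected]

# Example: fuses programmed to code 2, so the circuit matches CA12<2>.
assert cam([False, False, True, False], f0=False, f1=True) is True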

The column address match (CAM) signals from all of the predecoded addresses are
combined in standard static logic gates to create a column match (CMAT*) signal for the
column fuse block. The CMAT* signal, when active, cancels normal CSEL signals and
enables redundant RCSEL signals, as described in Section 4.1. Each column fuse block
is active only when its corresponding enable fuse has been blown. The column fuse block
usually contains a disable fuse for the same reason as a row redundant block: to repair a
redundant element. Column redundant pretest is implemented somewhat differently in
Figure 4.6 than the row redundant pretest of Figure 4.5. In Figure 4.6, the bottom fuse terminal is not
connected directly to ground. Rather, all of the signals for the entire column fuse block are
brought out and programmed either to ground or to a column pretest signal from the test
circuitry.
During standard part operation, the pretest signal is biased to ground, allowing the fuses
to be read normally. However, during column redundant pretest, this signal is brought to
VCC, which makes the laser fuses appear to be programmed. The fuse/latch circuits latch
the apparent fuse states on the next RAS cycle. Then, subsequent column accesses
allow the redundant column elements to be pretested by merely addressing them via their
pre-programmed match addresses.

The method of pretesting just described always uses the match circuits to select a
redundant column. It is a superior method to that described for the row redundant pretest
because it tests both the redundant element and its match circuit. Furthermore, as the
match circuit is essentially unaltered during redundant column pretest, the test is a better
measure of the obtainable DRAM performance when the redundant element is active.
Obviously, the row and column redundant circuits that are described in this section are
only one embodiment of what could be considered a wealth of possibilities. It seems that
all DRAM designs use some alternate form of redundancy. Other types of fuse elements
could be used in place of the laser fuses that are described. A simple transistor could
replace the laser fuses in either Figure 4.5 or Figure 4.6, its gate being connected to an
alternative fuse element. Furthermore, circuit polarity could be reversed and
non-predecoded addressing and other types of logic could be used. The options are
nearly limitless. Figure 4.7 shows a SEM image of a set of poly fuses.

Figure 4.7: 8-Meg×8-sync DRAM poly fuses.

REFERENCES

[1] T.Sugibayashi, I.Naritake, S.Utsugi, K.Shibahara, R.Oikawa, H.Mori, S.Iwao, T.Murotani, K.Koyama, S.Fukuzawa, T.Itani, K.Kasama, T.Okuda, S.Ohya, and M.Ogawa, “A 1Gbit DRAM for File Applications,” Digest of International Solid-State Circuits Conference, pp. 254–255, 1995.

Chapter 5: Global Circuitry and Considerations
In this chapter, we discuss the circuitry and design considerations associated with the
circuitry external to the DRAM memory array and memory array peripheral circuitry. We
call this global circuitry.
5.1 DATA PATH ELEMENTS
The typical DRAM data path is bidirectional, allowing data to be both written to and read
from specific memory locations. Some of the circuits involved are truly bidirectional,
passing data in for Write operations and out for Read operations. Most of the circuits,
however, are unidirectional, operating on data in only a Read or Write operation. To
support both operations, therefore, unidirectional circuits occur in complementary pairs:
one for reading and one for writing. Operating the same regardless of the data direction,
sense amplifiers, I/O devices, and data muxes are examples of bidirectional circuits.
Write drivers and data amplifiers, such as direct current sense amplifiers (DCSAs) or data
input buffers and data output buffers, are examples of paired, unidirectional circuits. In
this chapter, we explain the operation and design of each of these elements and then
show how the elements combine to form a DRAM data path. In addition, we discuss
address test compression circuits and data test compression circuits and how they affect
the overall data path design.

5.1.1 Data Input Buffer


The first element of any DRAM data path is the data input buffer. Shown in Figure 5.1, the
input buffer consists of both NMOS and PMOS transistors, basically forming a pair of
cascaded inverters. The first inverter stage has ENABLE transistors M1 and M2, allowing
the buffer to be powered down during inactive periods. The transistors are carefully sized
to provide high-speed operation and specific input trip points. The high-input trip point VIH
is set to 2.0 V for low-voltage TTL (LVTTL)-compatible DRAMs, while the low-input trip
point VIL is set to 0.8 V.

Figure 5.1: Data input buffer.


Designing an input buffer to meet specified input trip points generally requires a flexible
design with a variety of transistors that can be added or deleted with edits to the metal
mask. This is apparent in Figure 5.1 by the presence of switches in the schematic; each
switch represents a particular metal option available in the design. Because of variations
in temperature, device, and process, the final input buffer design is determined with
actual silicon, not simulations. For a DRAM that is 8 bits wide (x8), there will be eight input
buffers, each driving into one or more Write driver circuits through a signal labeled
DW<n> (Data Write where n corresponds to the specific data bit 0–7).
As the power supply drops, the basic inverter-based input buffer shown in Figure 5.1 is
finding less use in DRAM. The required noise margins and speed of the interconnecting
bus between the memory controller and the DRAM are getting difficult to meet. One
high-speed bus topology, called stub series terminated logic (SSTL), is shown in Figure
5.2 [1]. Tightly controlled transmission line impedances and series resistors transmit
high-speed signals with little distortion. Figure 5.2a shows the bus for clocks, command
signals, and addresses. Figure 5.2b shows the bidirectional bus for transmitting data to
and from the DRAM controller. In either circuit, VTT and VREF are set to VCC/2.

Figure 5.2: Stub series terminated logic (SSTL).


From this topology, we can see that a fully differential input buffer should be used: an
inverter won’t work. Some examples of fully differential input buffers are seen in Figure
5.3 [1] [2], Figure 5.4 [2], and Figure 5.5 [1].

Figure 5.3: Differential amplifier-based input receiver.


Figure 5.4: Self-biased differential amplifier-based input buffer.

Figure 5.5: Fully differential amplifier-based input buffer.


Figure 5.3 is simply a CMOS differential amplifier with an inverter output to generate valid
CMOS logic levels. Common-mode noise on the diffamp inputs is, ideally, rejected while
amplifying the difference between the input signal and the reference signal. The diff-amp
input common-mode range, say a few hundred mV, sets the minimum input signal
amplitude (centered around VREF) required to cause the output to change states. The
speed of this configuration is limited by the diff-amp biasing current. Using a large current
will increase input receiver speed and, at the same time, decrease amplifier gain and
reduce the diff-amp’s input common-mode range.
The input buffer of Figure 5.3 requires an external biasing circuit. The circuit of Figure 5.4
is self-biasing. This circuit is constructed by joining a p-channel diff-amp and an
n-channel diff-amp at the active load terminals. (The active current mirror loads are
removed.) This circuit is simple and, because of the adjustable biasing connection,
potentially very fast. An output inverter, which is not shown, is often needed to ensure that
valid output logic levels are generated.
Both of the circuits in Figures 5.3 and 5.4 suffer from duty-cycle distortion at high speeds.
The PULLUP delay doesn’t match the PULLDOWN delay. Duty-cycle distortion becomes
more of a factor in input buffer design as synchronous DRAMs move toward clocking on
both the rising and falling edges of the system clock. The fully differential self-biased input
receiver of Figure 5.5 provides an adjustable bias, which acts to stabilize the PULLUP
and PULLDOWN drives. An inverter pair is still needed on the output of the receiver to
generate valid CMOS logic levels (two inverters in cascade on each output). A pair of
inverters is used so that the delay from the inverter pair's input to its output is constant
independent of a logic one or zero propagating through the pair.

5.1.2 Data Write Muxes


Data muxes are often used to extend the versatility of a design. Although some DRAM
designs connect the input buffer directly to the Write driver circuits, most architectures
place a block of Data Write muxes between the input buffers and the Write drivers. The
muxes allow a given DRAM design to support multiple configurations, such as x4, x8, and
x16 I/O. A typical schematic for these muxes is shown in Figure 5.6. As shown in this
figure, the muxes are programmed according to the bond option control signals labeled
OPTX4, OPTX8, and OPTX16. For x16 operation, each input buffer is muxed to only one
set of DW lines. For x8 operation, each input buffer is muxed to two sets of DW lines,
essentially doubling the quantity of mbits available to each input buffer. For x4 operation,
each input buffer is muxed to four sets of DW lines, again doubling the number of mbits
available to the remaining four operable input buffers.
Figure 5.6: Data Write mux.

Essentially, as the quantity of input buffers is reduced, the amount of column address
space for the remaining buffers is increased. This concept is easy to understand as it
relates to a 16Mb DRAM. As a x16 part, this DRAM has 1 Mbit per data pin; as a x8 part,
2 Mbits per data pin; and as a x4 part, 4 Mbits per data pin. For each configuration, the
number of array sections available to an input buffer must change. By using Data Write
muxes that permit a given input buffer to drive as few or as many Write driver circuits as
required, design flexibility is easily accommodated.
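A short behavioral sketch of this fan-out, with the option names taken from Figure 5.6, is shown below; the total number of DW line sets is assumed only for illustration.

# Illustrative fan-out of input buffers onto DW line sets for the bond
# options of Figure 5.6.  The exact assignment of DW sets to buffers is
# design specific; this sketch only shows how the fan-out scales.

def dw_sets_per_buffer(option, total_dw_sets=16):
    buffers = {"OPTX16": 16, "OPTX8": 8, "OPTX4": 4}[option]
    return total_dw_sets // buffers   # 1, 2, or 4 sets per active buffer

for opt in ("OPTX16", "OPTX8", "OPTX4"):
    print(opt, "->", dw_sets_per_buffer(opt), "DW set(s) per input buffer")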

5.1.3 Write Driver Circuit


The next element in the data path to be considered is the Write driver circuit. This circuit,
as the name implies, writes data from the input buffers into specific memory locations.
The Write driver, as shown in Figure 5.7, drives specific I/O lines coming from the mbit
arrays. A given Write driver is generally connected to only one set of I/O lines, unless
multiple sets of I/O lines are fed by a single Write driver circuit via additional muxes. Using
muxes between the Write driver and the arrays to limit the number of Write drivers and
DCSA circuits is quite common. Regardless, the Write driver uses a tristate output stage
to connect to the I/O lines. Tristate outputs are necessary because the I/O lines are used
for both Read and Write operations. The Write driver remains in a high-impedance state
unless the signal labeled Write is HIGH and either DW or DW* transitions LOW from the
initial HIGH state, indicating a Write operation. As shown in Figure 5.7, the Write driver is
controlled by specified column addresses, the Write signal, and DW<n>. The driver
transistors are sized large enough to ensure a quick, efficient Write operation. This is
important because the array sense amplifiers usually remain ON during a Write cycle.
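A logical sketch of the tristate behavior just described follows; the mapping of DW polarity to the data value actually driven is an assumption, since only the tristate condition is spelled out above, and the column address gating is omitted for brevity.

# Logical sketch of the tristate Write driver of Figure 5.7: the output
# stays high-impedance until Write is HIGH and one of the DW lines has
# dropped from its initial HIGH state.  Column address gating omitted.

def write_driver(write, dw, dw_star):
    if not write or (dw and dw_star):
        return "Z"                      # no Write in progress: tristate
    return "I/O driven HIGH" if dw else "I/O driven LOW"   # assumed polarity

assert write_driver(write=False, dw=True, dw_star=True) == "Z"
assert write_driver(write=True, dw=True, dw_star=False) == "I/O driven HIGH"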

Figure 5.7: Write driver.


The remaining elements of the Write Data path reside in the array and pitch circuits. As
previously discussed in Sections 1.2, 2.1, and 2.2, the mbits and sense amplifier block
constitute the end of the Write Data path. The new input data is driven by the Write driver,
propagating through the I/O transistors and into the sense amplifier circuits. After the
sense amplifiers are overwritten and the new data is latched, the Write drivers are no
longer needed and can be disabled. Completion of the Write operation into the mbits is
accomplished by the sense amplifiers, which restore the digitlines to full VCC and ground
levels. See Sections 1.2 and 2.2 for further discussion.

5.1.4 Data Read Path


The Data Read path is similar, yet complementary, to the Data Write path. It begins, of
course, in the array, as previously discussed in Sections 1.2 and 2.2. After data is read
from the mbit and latched by the sense amplifiers, it propagates through the I/O
transistors onto the I/O signal lines and into a DC sense amplifier (DCSA) or helper
flip-flop (HFF). The I/O lines, prior to the column select (CSEL) firing, are equilibrated and
biased to a voltage approaching VCC. The actual bias voltage is determined by the I/O
bias circuit, which serves to control the I/O lines through every phase of the Read and
Write cycles. This circuit, as shown in Figure 5.8, consists of a group of bias and
equilibrate transistors that operate in concert with a variety of control signals. When the
DRAM is in an idle state, such as when RAS is HIGH, the I/O lines are generally biased to
VCC. During a Read cycle and prior to CSEL, the bias is reduced to approximately one VTH
below VCC.
Figure 5.8: I/O bias circuit.
The actual bias voltage for a Read operation is optimized to ensure sense amplifier
stability and fast sensing by the DCSA or HFF circuits. Bias is maintained continually
throughout a Read cycle to ensure proper DCSA operation and to speed equilibration
between cycles by reducing the range over which the I/O lines operate. Furthermore,
because the DCSAs or HFFs are very high-gain amplifiers, rail-to-rail input signals are not
necessary to drive the outputs to CMOS levels. In fact, it is important that the input levels
not exceed the DCSA or HFF common-mode operating range. During a Write operation,
the bias circuits are disabled by the Write signal, permitting the Write drivers to drive
rail-to-rail.
Operation of the bias circuits is seen in the signal waveforms shown in Figure 5.9. For the
Read-Modify-Write cycle, the I/O lines start at VCC during standby; transition to VCC−VTH
at the start of a Read cycle; separate but remain biased during the Read cycle; drive
rail-to-rail during a Write cycle; recover to Read cycle levels (termed Write Recovery); and
equilibrate to VCC−VTH in preparation for another Read cycle.

Figure 5.9: I/O bias and operation waveforms.

5.1.5 DC Sense Amplifier (DCSA)


The next data path element is the DC sense amplifier (DCSA). This amplifier, termed
Data amplifier or Read amplifier by DRAM manufacturers, is an essential component in
modern, high-speed DRAM designs and takes a variety of forms. In essence, the DCSA
is a high-speed, high-gain differential amplifier for amplifying very small Read signals
appearing on the I/O lines into full CMOS data signals used at the output data buffer. In
most designs, the I/O lines connected to the sense amplifiers are very capacitive.
The array sense amplifiers have very limited drive capability and are unable to drive these
lines quickly. Because the DCSA has very high gain, it amplifies even the slightest
separation of the I/O lines into full CMOS levels, essentially gaining back any delay
associated with the I/O lines. Good DCSA designs can output full rail-to-rail signals with
input signals as small as 15mV. This level of performance can only be accomplished
through very careful design and layout. Layout must follow good analog design principles,
with each element a direct copy (no mirrored layouts) of any like elements.
As illustrated in Figure 5.10, a typical DCSA consists of four differential pair amplifiers and
self-biasing CMOS stages. The differential pairs are configured as two sets of balanced
amplifiers. Generally, the amplifiers are built with an NMOS differential pair using PMOS
active loads and NMOS current mirrors. Because NMOS has higher mobility, providing for
smaller transistors and lower parasitic loads, NMOS amplifiers usually offer faster
operation than PMOS amplifiers. Furthermore, VTH matching is usually better for NMOS,
offering a more balanced design. The first set of amplifiers is fed with I/O and I/O* signals
from the array; the second set, with the output signals from the first pair, labeled DX and
DX*. Bias levels into each stage are carefully controlled to provide optimum performance.

Figure 5.10: DC sense amp.


The outputs from the second stage, labeled DY and DY*, feed into self-biasing CMOS
inverter stages for fast operation. The final output stage is capable of tristate operation to
allow multiple sets of DCSA to drive a given set of Data Read lines (DR<n> and DR*<n>).
The entire amplifier is equilibrated prior to operation, including the self-biasing CMOS
inverter stages, by all of the devices connected to the signals labeled EQSA, EQSA*, and
EQSA 2. Equilibration is necessary to ensure that the amplifier is electrically balanced
and properly biased before the input signals are applied. The amplifier is enabled
whenever ENSA* is brought LOW, turning ON the output stage and the current mirror
bias circuit, which is connected to the differential amplifiers via the signal labeled CM. For
a DRAM Read cycle, operation of this amplifier is depicted in the waveforms shown in
Figure 5.11. The bias levels are reduced for each amplifier stage, approaching VCC/2 for
the final stage.
Figure 5.11: DCSA operation waveforms.

5.1.6 Helper Flip-Flop (HFF)


The DCSA of the last section can require a large layout area. To reduce the area, a
helper flip-flop (HFF) as seen in Figure 5.12, can be used. The HFF is basically a clocked
connection of two inverters as a latch [3]. When CLK is LOW, the I/O lines are connected
to the inputs/outputs of the inverters. The inverters don’t see a path to ground because
M1 is OFF when CLK is LOW. When CLK transitions HIGH, the outputs of the HFF
amplify, in effect, the inputs into full logic levels.

Figure 5.12: A helper flip-flop.


For example, if I/O=1.25 V and I/O*=1.23 V, then I/O becomes VCC, and I/O* goes to
zero when CLK transitions HIGH. Using positive feedback makes the HFF sensitive and
fast. Note that HFFs can be used at several locations on the I/O lines due to the small
size of the circuit.

5.1.7 Data Read Muxes


The Read Data path proceeds from the DCSA block to the output buffers. The connection
between these elements can either be direct or through Data Read muxes. Similar to
Data Write muxes, Data Read muxes are commonly used to accommodate multiple-part
configurations with a single design. An example of this is shown in Figure 5.13. This
schematic of a Data Read mux block is similar to that found in Figure 5.6 for the Data
Write mux block. For x16 operation, each output buffer has access to only one Data Read
line pair (DR<n> and DR*<n>). For x8 operation, the eight output buffers each have two
pairs of DR<n> lines available, doubling the quantity of mbits accessible by each output.
Similarly, for x4 operation, the four output buffers have four pairs of DR<n> lines available,
again doubling the quantity of mbits available for each output. For those configurations
with multiple pairs available, address lines control which DR<n> pair is connected to an
output buffer.

Figure 5.13: Data Read mux.

5.1.8 Output Buffer Circuit


The final element in our Read Data path is the output buffer circuit. It consists of an output
latch and an output driver circuit. A schematic for an output buffer circuit is shown in
Figure 5.14. The output driver on the right side of Figure 5.14 uses three NMOS
transistors to drive the output pad to either VCCX or ground. VCCX is the external supply
voltage to the DRAM, which may or may not be the same as VCC, depending on whether
or not the part is internally regulated. Output drivers using only NMOS transistors are
common because they offer better latch-up immunity and ESD protection than CMOS
output drivers. Nonetheless, PMOS transistors are still used in CMOS output drivers by
several DRAM manufacturers, primarily because they are much easier to drive than full
NMOS stages. CMOS outputs are also more prevalent on high-speed synchronous and
double-data rate (DDR) designs because they operate at high data rates with less
duty-cycle degradation.

Figure 5.14: Output buffer.


In Figure 5.14, two NMOS transistors are placed in series with VCCX to reduce substrate
injection currents. Substrate injection currents result from impact ionization, occurring
most commonly when high drain-to-source and high gate-to-source voltages exist
concurrently. These conditions usually occur when an output driver is firing to VCCX,
especially for high-capacitance loads, which slow the output transition. Two transistors in
series reduce this effect by lowering the voltages across any single device. The output
stage is tristated whenever both signals PULLUP and PULLDOWN are at ground.
The signal PULLDOWN is driven by a simple CMOS inverter, whereas PULLUP is driven
by a complex circuit that includes voltage charge pumps. The pumps generate a voltage
to drive PULLUP higher than one VTH above VCCX. This is necessary to ensure that the
series output transistors drive the pad to VCCX. The output driver is enabled by the signal
labeled QED. Once enabled, it remains tristated until either DR or DR* fires LOW. If DR
fires LOW, PULLDOWN fires HIGH, driving the pad to ground through M3. If DR* fires
LOW, PULLUP fires HIGH, driving the pad to VCCX through M1 and M2.
The output latch circuit shown in Figure 5.14 controls the output driver operation. As the
name implies, it contains a latch to hold the output data state. The latch frees the DCSA
or HFF and other circuits upstream to get subsequent data for the output. It is capable of
storing not only one and zero states, but also a high-impedance state (tristate). It offers
transparent operation to allow data to quickly propagate to the output driver. The input to
this latch is connected to the DR<n> signals coming from either the DCSAs or Data Read
muxes. Output latch circuits appear in a variety of forms, each serving the needs of a
specific application or architecture. The data path may contain additional latches or
circuits in support of special modes such as burst operation.

5.1.9 Test Modes

Address compression and data compression are two special test modes that are usually
supported by the data path design. Test modes are included in a DRAM design to extend
test capabilities or speed component testing or to subject a part to conditions that are not
seen during normal operation. Compression test modes yield shorter test times by
allowing data from multiple array locations to be tested and compressed on-chip, thereby
reducing the effective memory size by a factor of 128 or more in some cases. Address
compression, usually on the order of 4x to 32x, is accomplished by internally treating
certain address bits as “don’t care” addresses.
The data from all of the “don’t care” address locations, which correspond to specific data
input/output pads (DQ pins), are compared using special match circuits. Match circuits
are usually realized with NAND and NOR logic gates or through P&E-type drivers on the
differential DR<n> buses. The match circuits determine if the data from each address
location is the same, reporting the result on the respective DQ pin as a match or a fail.
The data path must be designed to support the desired level of address compression.
This may necessitate more DCSA circuits, logic, and pathways than are necessary for
normal operation.
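A behavioral sketch of the per-DQ match operation is shown below; the number of compressed locations per DQ is illustrative.

# Sketch of the on-chip address-compression match used in test mode.
# Data read from all "don't care" address locations tied to one DQ are
# compared; the DQ reports a match (all locations equal) or a fail.

def compressed_dq_result(bits):
    """bits: data read from every location covered by the compression
    (e.g., 4 to 32 locations for 4x to 32x address compression).
    Returns the common bit value if all locations agree, else 'FAIL'."""
    return bits[0] if all(b == bits[0] for b in bits) else "FAIL"

assert compressed_dq_result([1, 1, 1, 1]) == 1        # 4x compression, pass
assert compressed_dq_result([0, 0, 1, 0]) == "FAIL"   # one bad location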
The second form of test compression is data compression: combining data at the output
drivers. Data compression usually reduces the number of DQ pins to four. This
compression reduces the number of tester pins required for each part and increases the
throughput by allowing additional parts to be tested in parallel. In this way, x16 parts
accommodate 4x data compression, and x8 parts accommodate 2x data compression.
The cost of any additional circuitry to implement address and data compression must be
balanced against the benefits derived from test time reduction. It is also important that
operation in test mode correlate 100% with operation in non-test mode. Correlation is
often difficult to achieve, however, because additional circuitry must be activated during
compression, modifying noise and power characteristics on the die.

5.2 ADDRESS PATH ELEMENTS


DRAMs have used multiplexed addresses since the 4kb generation. Multiplexing is
possible because DRAM operation is sequential: column operations follow row operations.
Obviously, the column address is not needed until the sense amplifiers have latched,
which cannot occur until some time after the wordline has fired. DRAMs operate at higher
current levels with multiplexed addressing because an entire page (row address) must be
opened with each row access. This disadvantage is overcome by the lower packaging
cost associated with multiplexed addresses. In addition, owing to the presence of the
column address strobe (CAS), column operation is independent of row operation,
enabling a page to remain open for multiple, high-speed column accesses. This page
mode type of operation improves system performance because column access time is
much shorter than row access time. Page mode operation appears in more advanced
forms, such as EDO and synchronous burst mode, providing even better system
performance through a reduction in effective column access time.

The address path for a DRAM can be broken into two parts: the row address path and the
column address path. The design of each path is dictated by a unique set of requirements.
The address path, unlike the data path, is unidirectional, with address information flowing
only into the DRAM. The address path must achieve a high level of performance with
minimal power and die area just like any other aspect of DRAM design. Both paths are
designed to minimize propagation delay and maximize DRAM performance. In this
chapter, we discuss various elements of the row and column address paths.

5.2.1 Row Address Path


The row address path encompasses all of the circuits from the address input pad to the
wordline driver. These circuits generally include the address buffer, CAS before RAS
(CBR) counter, predecode logic, array buffers, redundancy logic, phase drivers, and row
decoder blocks. Row decoder blocks are discussed in Section 2.3, while redundancy is
addressed in Section 4.2. We will now focus on the remaining elements of the row
address path, namely, the row address buffer, CBR counter, predecode logic, array
buffers, and phase drivers.

5.2.2 Row Address Buffer


The row address buffer, as shown schematically in Figure 5.15, consists of a standard
input buffer and the additional circuits necessary to implement the functions required for
the row address path. The address input buffer (inpBuf) is the same as that used for the
data input buffer (see Figure 5.1) and must meet the same criteria for VIH and VIL as the
data input buffer. The row address buffer includes an inverter latch circuit, as shown in
Figure 5.15. This latch, consisting of two inverters and any number of input muxes (two in
this case), latches the row address after RAS falls.

Figure 5.15: Row address buffer.


The input buffer drives through a mux, which is controlled by a signal called row address
latch (RAL). Whenever RAL is LOW, the mux is enabled. The feedback inverter has low
drive capability, allowing the latch to be overwritten by either the address input buffer or
the CBR counter, depending on which mux is enabled. The latch circuit drives into a pair
of NAND gates, forcing both RA<n> and RA*<n> to logic LOW states whenever the row
address buffer is disabled because RAEN is LOW.

5.2.3 CBR Counter


As illustrated in Figure 5.15, the CBR ( CAS before RAS) counter consists of a single
inverter and a pair of inverter latches coupled through a pair of complementary muxes to
form a one-bit counter. For every HIGH-to-LOW transition of CLK*, the register output at
Q toggles. All of the CBR counters from each row address buffer are cascaded together
to form a CBR ripple counter. The Q output of one stage feeds the CLK* input of a
subsequent stage. The first register in the counter is clocked whenever RAS falls while in
a CBR Refresh mode. By cycling through all possible row address combinations in a
minimum of clocks, the CBR ripple counter provides a simple means of internally
generating Refresh addresses. The CBR counter drives through a mux into the inverter
latch of the row address buffer. This mux is enabled whenever CBR address latch
(CBRAL) is LOW. Note that the signals RAL and CBRAL are mutually exclusive in that
they cannot be LOW at the same time. For each and every DRAM design, the row
address buffer and CBR counter designs take on various forms. Logic may be inverted,
counters may be more or less complex, and muxes may be replaced with static gates.
Whatever the differences, however, the function of the input buffer and its CBR counter
remains essentially the same.
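The ripple-counter behavior can be sketched as follows; the number of stages and the POWERUP state are assumptions made only for illustration.

# Behavioral sketch of the CBR ripple counter: one toggle stage per row
# address buffer, each stage clocked by the falling edge of the
# previous stage's Q output.

class CBRCounter:
    def __init__(self, n_bits=12):
        self.q = [0] * n_bits          # one bit per row address buffer

    def ras_falls_in_cbr(self):
        """Each CBR Refresh cycle toggles bit 0; a HIGH-to-LOW transition
        on any Q ripples into the next stage, stepping the refresh address."""
        for i in range(len(self.q)):
            self.q[i] ^= 1
            if self.q[i] == 1:         # no falling edge produced; stop rippling
                break
        return self.refresh_address()

    def refresh_address(self):
        return sum(bit << i for i, bit in enumerate(self.q))

ctr = CBRCounter()
addresses = [ctr.ras_falls_in_cbr() for _ in range(4)]
assert addresses == [1, 2, 3, 4]       # internally generated refresh rows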

5.2.4 Predecode Logic


As discussed in earlier sections of this book, using predecoded addressing internal to a
DRAM has many advantages: lower power, higher efficiency, and a simplified layout. An
example of predecode circuits for the row address path is shown in Figure 5.16. This
schematic consists of seven sets of predecoders. The first set is used for even and odd
row selection and consists only of cascaded inverter stages. RA<0> is not combined with
other addresses due to the nature of row decoding. In our example, we assume that odd
and even will be combined with the predecoded row addresses at a later point in the
design, such as at the array interfaces. The next set of predecoders is for addresses
RA<1> and RA<2>, which together form RA 12<0:3>. Predecoding is accomplished
through a set of two-input NAND gates and inverter buffers, as shown. The remaining
addresses are identically decoded except for RA<12>. As illustrated at the bottom of the
schematic, RA<12> and RA*<12> are taken through a NOR gate circuit. This circuit
forces both RA<12> lines to a HIGH state as they enter the decoder, whenever the
DRAM is configured for 4k Refresh. This process essentially makes RA<12> a “don’t
care” address in the predecode circuit, forcing twice as many wordlines to fire at a time.
Figure 5.16: Row address predecode circuits.
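The predecoding of one address pair can be sketched as a simple one-hot expansion; the mapping of address codes to predecoded lines is an assumption for illustration.

# Sketch of row address predecoding: each pair of address bits is
# expanded into four one-hot predecoded lines (e.g., RA<1> and RA<2>
# become RA12<0:3>), built on-chip from NAND gates and inverters.

def predecode_pair(a_low, a_high):
    """Return the one-hot predecoded lines for a 2-bit address pair."""
    code = (1 if a_low else 0) | (2 if a_high else 0)
    return [i == code for i in range(4)]

# RA<1>=1, RA<2>=0 selects RA12<1>.
assert predecode_pair(True, False) == [False, True, False, False]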

5.2.5 Refresh Rate


Normally, when Refresh rates change for a DRAM, a higher order address is treated as a
“don’t care” address, thereby decreasing the row address space but increasing the
column address space. For example, a 16Mb DRAM bonded as a 4Mb x4 part could be
configured in several Refresh rates: 1k, 2k, and 4k. Table 5.1 shows how row and column
addressing is related to these Refresh rates for the 16Mb example. In this example, the
2k Refresh rate would be more popular because it has an equal number of row and
column addresses or square addressing.
Table 5.1: Refresh rate versus row and column addresses.
Refresh Rate    Rows     Columns    Row Addresses    Column Addresses
4k              4,096    1,024      12               10
2k              2,048    2,048      11               11
1k              1,024    4,096      10               12
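The split in Table 5.1 follows directly from the part's total address space, as the following sketch shows for the 4Mb x4 configuration of a 16Mb part.

# The row/column split of Table 5.1: a 16Mb part organized as 4M x4 has
# 22 address bits per DQ to divide between rows (refresh rate) and columns.

def addressing(total_bits=16 * 2**20, width=4, refresh_rows=2048):
    cells_per_dq = total_bits // width
    columns = cells_per_dq // refresh_rows
    return refresh_rows.bit_length() - 1, columns.bit_length() - 1

assert addressing(refresh_rows=4096) == (12, 10)   # 4k refresh
assert addressing(refresh_rows=2048) == (11, 11)   # 2k refresh (square)
assert addressing(refresh_rows=1024) == (10, 12)   # 1k refresh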

Refresh rate is also determined by backward compatibility, especially in personal computer designs. Because a 4Mb DRAM has less memory space than a 16Mb DRAM,
the 4Mb DRAM should naturally have fewer address pins. To sell 16Mb DRAMs into
personal computers that are designed for 4Mb DRAMs, the 16Mb part must be configured
with no more address pins than the 4Mb part. If the 4Mb part has eleven address pins,
then the 16Mb part should have only eleven address pins, hence 2k Refresh. To trim cost,
most PC designs keep the number of DRAM address pins to a minimum. Although this
practice holds cost down, it also limits expandability and makes conversion to newer
DRAM generations more complicated owing to resultant backward compatibility issues.

5.2.6 Array Buffers


The next elements to be discussed in the row address path are array buffers and phase
drivers. The array buffers drive the predecoded address signals into the row decoder
blocks. In general, the buffers are no more than cascaded inverters, but in some cases
they include static logic gates or level translators, depending on row decoder
requirements. Additional logic gates could be included for combining the addresses with
enabling signals from the control logic or for making odd/even row selection by combining
the addresses with the odd and even address signals. Regardless, the resulting signals
ultimately drive the decode trees, making speed an important issue. Buffer size and
routing resistance, therefore, become important design parameters in high-speed designs
because the wordline cannot be fired until the address tree is decoded and ready for the
PHASE signal to fire.

5.2.7 Phase Drivers


As the discussion concerning wordline drivers and tree decoders in Section 2.3 showed,
the signal that actually fires the wordline is called PHASE. Although the signal name may
vary from company to company, the purpose of the signal does not. Essentially, this
signal is the final address term to arrive at the wordline driver. Its timing is carefully
determined by the control logic. PHASE cannot fire until the row addresses are set up in
the decode tree. Normally, the timing of PHASE also includes enough time for the row
redundancy circuits to evaluate the current address. If a redundancy match is found, the
normal row cannot be fired. In most DRAM designs, this means that the normally
decoded PHASE signal will not fire but will instead be replaced by some form of
redundant PHASE signal.
A typical phase decoder/driver is shown in Figure 5.17. Again, like so many other DRAM
circuits, it is composed of standard static logic gates. A level translator would be included
in the design if the wordline driver required a PHASE signal that could drive to a boosted
voltage supply. This translator could be included with the phase decoder logic or placed in
array gaps to locally drive selected row decoder blocks. Local phase translators are
common on double-metal designs with local row decoder blocks.

Figure 5.17: Phase decoder/driver.

5.2.8 Column Address Path


With our examination of row address path elements complete, we can turn our attention
to the column address path. The column address path consists of input buffers, address
transition detection circuits, predecode logic, redundancy, and column decode circuits.
Redundancy and column decode circuits are addressed in Section 4.2 and Section 4.1,
respectively.
A column address buffer schematic, as shown in Figure 5.18, consists of an input buffer,
a latch, and address transition circuits. The input buffer shown in this figure is again
identical to that described in Section 5.1 for the data path, so further description is
unnecessary. The column address input buffer, however, is disabled by the signal power
column (PCOL*) whenever RAS is HIGH and the part is inactive. Normally, the column
address buffers are enabled by PCOL* shortly after RAS goes LOW. The input buffer
feeds into a NAND latch circuit, which traps the column address whenever the column
address latch (CAL*) fires LOW. At the start of a column cycle, CAL* is HIGH, making the
latch transparent. This transparency permits the column address to propagate through to
the predecode circuits. The latch output also feeds into an address transition detection
(ATD) circuit that is shown on the right side of Figure 5.18.

Figure 5.18: Column address buffer.

5.2.9 Address Transition Detection


The address transition detection (ATD) circuit is extremely important to page mode
operation in a DRAM. An ATD circuit detects any transition that occurs on a respective
address pin. Because it follows the NAND latch circuit, the CAL* control signal must be
HIGH. This signal makes the latch transparent and thus the ATD functional. The ATD
circuit in Figure 5.18 has symmetrical operation: it can sense either a rising or a falling
edge. The circuit uses a two-input XNOR gate to generate a pulse of prescribed duration
whenever a transition occurs. Pulse duration is dictated by simple delay elements: I1 and
C1 or I2 and C2. Both delayed and undelayed signals from the latch are fed into the
XNOR gate. The ATD output signals from all of the column addresses, labeled TDX*<n>,
are routed to the equilibration driver circuit shown in Figure 5.19. This circuit generates a
set of equilibration signals for the DRAM. The first of these signals is equilibrate I/O
(EQIO*), which, as the name implies, is used in arrays to force equilibration of the I/O
lines. As we learned in Section 5.1.4, the I/O lines need to be equilibrated to VCC−VTH
prior to a new column being selected by the CSEL<n> lines. EQIO* is the signal used to
accomplish this equilibration.
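A discrete-time sketch of the transition detector is shown below; the delay length and the sampling are illustrative only.

# Discrete-time sketch of address transition detection: the XNOR of an
# address signal and a delayed copy of itself drops LOW for one delay
# period after either edge, producing the active-LOW TDX* pulse.

def atd_pulse(address_samples, delay=3):
    """Return TDX* samples (active LOW) for a sampled address waveform."""
    out = []
    for t, a in enumerate(address_samples):
        delayed = address_samples[max(t - delay, 0)]
        out.append(1 if a == delayed else 0)    # XNOR of signal and its delay
    return out

addr = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(atd_pulse(addr))   # LOW pulses follow both the rising and falling edges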

Figure 5.19: Equilibration driver.


The second signal generated by the equilibration driver is called equilibrate sense amp
(EQSA). This signal is generated from address transitions occurring on all of the column
addresses, including the least significant addresses. The least significant column
addresses are not decoded into the column select lines (CSEL). Rather, they are used to
select which set of I/O lines is connected to the output buffers. As shown in the schematic,
EQSA is activated regardless of which address is changed because the DCSAs must be
equilibrated prior to sensing any new data. EQIO, on the other hand, is not affected by the
least significant addresses because the I/O lines do not need equilibrating unless the
CSEL lines are changed. The equilibration driver circuit, as shown in Figure 5.19, uses a
balanced NAND gate to combine the pulses from each ATD circuit. Balanced logic helps
ensure that the narrow ATD pulses are not distorted as they progress through the circuit.
The column addresses are fed into predecode circuits, which are very similar to the row
address predecoders. One major difference, however, is that the column addresses are
not allowed to propagate through the part until the wordline has fired. For this reason, the
signal Enable column (ECOL) is gated into the predecode logic as shown in Figure 5.20.
ECOL disables the predecoders whenever it is LOW, forcing the outputs all HIGH in our
example. Again, the predecode circuits are implemented with simple static logic gates.
The address signals emanating from the predecode circuits are buffered and distributed
throughout the die to feed the column decoder logic blocks. The column decoder
elements are described in Section 4.1.
Figure 5.20: Column predecode logic.

5.3 SYNCHRONIZATION IN DRAMS [1]


In a typical SDRAM, when reading data out of the DRAM, the relationship between the
CLK input and the valid data out time, or access time tAC, can vary widely with process
shifts, temperature, and operating clock frequency. Figure 5.21 shows a typical
relationship between the SDRAM CLK input and a DQ output. The parameter
specification tAC is generally specified as being less than some value, for example,
tAC<5ns.

Figure 5.21: SDRAM CLK input and DQ output.


As clock frequencies increase, it is desirable to have less uncertainty in the availability of
valid data on the output of the DRAM. Towards this goal, double data rate (DDR)
SDRAMs use a delay-locked loop (DLL) [3] to drive tAC to zero. (A typical specification for
tAC in a DDR SDRAM is ±0.75ns.) Figure 5.22 shows the block diagram for a DLL used in
a DDR SDRAM. Note that the data I/O in DDR is clocked on both the rising and falling
edges of the input CLK (actually on the output of the delay line) [4]. Also, note that the
input CLK should be synchronized with the DQ strobe (DQS) clock output.

Figure 5.22: Block diagram for DDR SDRAM DLL.


To optimize and stabilize the clock-access and output-hold times in an SDRAM, an
internal register-controlled delay-locked loop (RDLL) has been used [4] [5] [6]. The RDLL
adjusts the time difference between the output (DQs) and input (CLK) clock signals in
SDRAM until they are aligned. Because the RDLL is an all-digital design, it provides
robust operation over all process corners. Another solution to the timing constraints found
in SDRAM was given by using the synchronous mirror delay (SMD) in [7]. Compared to
RDLL, the SMD does not lock as tightly, but the time to acquire lock between the input
and output clocks is only two clock cycles.

As DRAM clock speeds continue to increase, the skew becomes the dominating concern,
outweighing the RDLL disadvantage of longer time to acquire lock.
This section describes an RSDLL (register-controlled symmetrical DLL), which meets the
requirements of DDR SDRAM. (Read/Write accesses occur on both rising and falling
edges of the clock.) Here, symmetrical means that the delay line used in the DLL has the
same delay whether a HIGH-to-LOW or a LOW-to-HIGH logic signal is propagating along
the line. The data output timing diagram of a DDR SDRAM is shown in Figure 5.23. The
RSDLL increases the valid output data window and diminishes the undefined tDQSQ by
synchronizing both the rising and falling edges of the DQS signal with the output data DQ.

Figure 5.23: Data timing chart for DDR DRAM.


Figure 5.22 shows the block diagram of the RSDLL. The replica input buffer dummy delay
in the feedback path is used to match the delay of the input clock buffer. The phase
detector (PD) compares the relative timing of the edges of the input clock signal and the
feedback clock signal, which comes through the delay line and is controlled by the shift
register. The outputs of the PD, shift-right and shift-left, control the shift register. In the
simplest case, one bit of the shift register is HIGH. This single bit selects a point of entry
for CLKIn into the symmetrical delay line. (More on this later.) When the rising edge of the
input clock falls between the rising edge of the output clock and that edge delayed by one
unit delay, both outputs of the PD, shift-right and shift-left, go LOW and the loop is locked.

5.3.1 The Phase Detector


The basic operation of the phase detector (PD) is shown in Figure 5.24. The resolution of
this RSDLL is determined by the size of the unit delay used in the delay line. The locking
range is determined by the number of delay stages used in the symmetrical delay line.
Because the DLL circuit inserts an optimum delay time between CLKIn and CLKOut,
making the output clock change simultaneously with the next rising edge of the input
clock, the minimum operating frequency to which the RSDLL can lock is the reciprocal of
the product of the number of stages in the symmetrical delay line with the delay per stage.
Adding more delay stages increases the locking range of the RSDLL at the cost of
increased layout area.
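Using the prototype numbers reported later in this section (48 stages and roughly 150 ps per unit delay), the minimum lockable frequency works out as shown below; the arithmetic is only an illustration, since the actual delay per stage varies with VCC and temperature (Figure 5.28).

# Rough illustration of the RSDLL locking-range statement using the
# prototype numbers quoted later in the section (48 stages, ~150 ps/stage).

n_stages = 48
unit_delay = 150e-12                    # seconds per symmetrical delay stage
f_min = 1.0 / (n_stages * unit_delay)   # reciprocal of the total line delay
print(f"{f_min/1e6:.0f} MHz")           # ~139 MHz minimum lockable frequency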

Figure 5.24: Phase detector used in RSDLL.

5.3.2 The Basic Delay Element


Rather than using an AND gate as the unit-delay stage (NAND+ inverter), as in [5], a
NAND-only-based delay element can be used. The implementation of a three-stage delay
line is shown in Figure 5.25. The problem when using a NAND+inverter as the basic delay
element is that the propagation delay through the unit delay resulting from a
HIGH-to-LOW transition is not equal to the delay of a LOW-to-HIGH transition (tPHL ≠ tPLH).
Furthermore, the delay varies from one run to another. If the skew between tPHL and tPLH is
50 ps, for example, the total skew of the falling edges through ten stages will be 0.5 ns.
Because of this skew, the NAND+ inverter delay element cannot be used in a DDR
DRAM. In our modified symmetrical delay element, another NAND gate is used instead of
an inverter (two NAND gates per delay stage). This scheme guarantees that tPHL = tPLH
independent of process variations. While one NAND switches from HIGH to LOW, the
other switches from LOW to HIGH. An added benefit of the two-NAND delay element is
that two point-of-entry control signals are now available. The shift register uses both to
solve the possible problem caused by the POWERUP ambiguity in the shift register.

Figure 5.25: Symmetrical delay element used in RSDLL.

5.3.3 Control of the Shift Register


As shown in Figures 5.25 and 5.26, the input clock is a common input to every delay
stage. The shift register selects a different tap of the delay line (the point of entry for the
input clock signal into the symmetrical delay line). The complementary outputs of each
register cell select the different tap: Q is connected directly to input A of a delay
element, and Q* is connected to input B of the previous stage.

Figure 5.26: Delay line and shift register for RSDLL.


From right to left, the first LOW-to-HIGH transition in the shift register sets the point of
entry into the delay line. The input clock passes through the tap with a HIGH logic state in
the corresponding position of the shift register. Because the Q* of this tap is equal to a
LOW, it disables the previous stages; therefore, the previous states of the shift register do
not matter (shown as “don’t care,” X, in Figure 5.25). This control mechanism guarantees
that only one path is selected. This scheme also eliminates POWERUP concerns
because the selected tap is simply the first, from the right, LOW-to-HIGH transition in the
register.
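The tap-selection rule can be sketched as a simple scan of the register contents; the register length and the example contents are illustrative.

# Sketch of how the shift-register contents select the delay-line entry
# tap (Figure 5.26): scanning from the right, the first HIGH bit marks
# the tap, and register bits to its left are "don't care."

def entry_tap(shift_register_bits):
    """shift_register_bits: list ordered left to right, as drawn.
    Returns the index of the selected tap, or None if no bit is HIGH."""
    for i in range(len(shift_register_bits) - 1, -1, -1):
        if shift_register_bits[i] == 1:
            return i          # first HIGH bit found scanning from the right
    return None

# X X X 1 0 0 0 0 -> tap 3 is the point of entry; bits 0-2 do not matter.
assert entry_tap([1, 0, 1, 1, 0, 0, 0, 0]) == 3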

5.3.4 Phase Detector Operation


To stabilize the movement in the shift register, after making a decision, the phase
detector waits at least two clock cycles before making another decision (Figure 5.24). A
divide-by-two is included in the phase detector so that every other decision, resulting from
comparing the rising edges of the external clock and the feedback clock, is used. This
provides enough time for the shift register to operate and the output waveform to stabilize
before another decision by the PD is implemented. The unwanted side effect of this delay
is an increase in lock time. The shift register is clocked by combining the shift-left and
-right signals. The power consumption decreases when there are no shift-left or -right
signals and the loop is locked.

Another concern with the phase-detector design is the design of the flip-flops (FFs). To
minimize the static phase error, very fast FFs should be used, ideally with zero setup
time.
Also, the metastability of the flip-flops becomes a concern as the loop locks. This,
together with possible noise contributions and the need to wait, as discussed above,
before implementing a shift-right or -left, may make it more desirable to add more filtering
in the phase detector. Some possibilities include increasing the divider ratio of the phase
detector or using a shift register in the phase detector to determine when a number of
shift-rights or shift-lefts, say four, have occurred. For the design in Figure 5.26, a
divide-by-two was used in the phase detector due to lock-time requirements.

5.3.5 Experimental Results


The RSDLL of Figure 5.22 was fabricated in a 0.21 μm, four-poly, double-metal CMOS
technology (a DRAM process). A 48-stage delay line with an operation frequency of
125–250 MHz was used. The maximum operating frequency was limited by delays
external to the DLL, such as the input buffer and interconnect. There was no noticeable
static phase error on either the rising or falling edges. Figure 5.27 shows the resulting rms
jitter versus input frequency. One sigma of jitter over the 125–250 MHz frequency range
was below 100 ps. The measured delay per stage versus VCC and temperature is shown
in Figure 5.28. Note that the 150 ps typical delay of a unit-delay element was very close to
the rise and fall times on-chip of the clock signals and represents a practical minimum
resolution of a DLL for use in a DDR DRAM fabricated in a 0.21 μm process.

Figure 5.27: Measured rms jitter versus input frequency.


Figure 5.28: Measured delay per stage versus VCC and temperature.
The power consumption (the current draw of the DLL when VCC=2.8 V) of the prototype
RSDLL is illustrated in Figure 5.29. It was found that the power consumption was
determined mainly by the dynamic power dissipation of the symmetrical delay line. The
NAND delays in this test chip were implemented with 10/0.21 μm NMOS and 20/0.21 μm
PMOS. By reducing the widths of both the NMOS and PMOS transistors, the power
dissipation is greatly reduced without a speed or resolution penalty (with the added
benefit of reduced layout size).

Figure 5.29: Measured ICC (DLL current consumption) versus input frequency.

5.3.6 Discussion

In this section we have presented one possibility for the design of a delay-locked loop.
While there are others, this design is simple, manufacturable, and scalable.
In many situations the resolution of the phase detector must be decreased. A useful
circuit to determine which one of two signals occurs earlier in time is shown in Figure 5.30.
This circuit is called an arbiter. If S1 occurs slightly before S2 then the output SO1 will go
HIGH, while the output SO2 stays LOW. If S2 occurs before S1, then the output SO2
goes HIGH and SO1 remains LOW. The fact that the inverters on the outputs are
powered from the SR latch (the cross-coupled NAND gates) ensures that SO1 and SO2
cannot be HIGH at the same time. When designed and laid out correctly, this circuit is
capable of discriminating tens of picoseconds of difference between the rising edges of
the two input signals.
Figure 5.30: Two-way arbiter as a phase detector.
The arbiter alone is not capable of controlling the shift register. A simple logic block
to generate shift-right and shift-left signals is shown in Figure 5.31. The rising edge of
SO1 or SO2 is used to clock two D-latches so that the shift-right and shift-left signals may
be held HIGH for more than one clock cycle. Figure 5.31 uses a divide-by-two to hold the
shift signals valid for two clock cycles. This is important because the output of the arbiter
can have glitches coming from the different times when the inputs go back LOW. Note
that using an arbiter-based phase detector alone can result in an alternating sequence of
shift-right, shift-left. We eliminated this problem in the phase-detector of Figure 5.24 by
introducing the dead zone so that a minimum delay spacing of the clocks would result in
no shifting.

Figure 5.31: Circuit for generating shift register control.
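The behavior described above (shift decisions held for two clock cycles and a dead zone that prevents shift-right/shift-left dithering) can be illustrated with a small behavioral model. All of the numbers in this sketch, including the unit delay, the fixed delay, and the clock period, are assumed values chosen only to show the register walking toward lock and then holding.

    # Behavioral sketch of a register-controlled DLL with a dead-zone phase
    # detector. All numeric values are illustrative assumptions.
    UNIT_DELAY = 0.15      # ns per unit-delay stage (assumed)
    FIXED_DELAY = 2.07     # ns of input-buffer/clock-tree delay (assumed)
    T_CLK = 5.0            # ns clock period (assumed)
    DEAD_ZONE = UNIT_DELAY # no shift when the error is within one unit delay

    def phase_error(stages):
        """Signed error (ns) between the inserted delay and the delay that
        would align the fed-back clock with the input clock."""
        needed = (T_CLK - FIXED_DELAY % T_CLK) % T_CLK
        return stages * UNIT_DELAY - needed

    stages = 0
    for cycle in range(50):
        if cycle % 2:                      # divide-by-two: act every other cycle
            continue
        err = phase_error(stages)
        if abs(err) <= DEAD_ZONE:
            action = "hold"                # inside the dead zone: no dithering
        elif err < 0:
            action, stages = "shift-right", stages + 1          # add delay
        else:
            action, stages = "shift-left", max(stages - 1, 0)   # remove delay
        print(f"cycle {cycle:2d}: stages = {stages:2d}, "
              f"error = {err:+.2f} ns, {action}")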


In some situations, the fundamental delay time of an element in the delay line needs to be
reduced. Using the NAND-based delay shown in Figure 5.26, we are limited to delay
times much longer than a single inverter delay. However, using a single inverter delay
results in an inversion in our basic cell. Figure 5.32 shows that using a double
inverter-based delay element can solve this problem. By crisscrossing the single delay
element inputs the cell appears to be non-inverting. This scheme results in the least delay
because the delay between the cell’s input and output is only the delay of a single inverter.
The drawbacks of this cell, compared to the NAND-based delay, are the difficulty of
inserting the feedback clock and of controlling the delay of the resulting delay line with the
shift register.

Figure 5.32: A double inverter used as a delay element.


Figure 5.33 shows how inserting transmission gates (TGs) that are controlled by the shift
register allows the insertion point to vary along the line. When C is HIGH, the feedback
clock is inserted into the output of the delay stage. The inverters in the stage are isolated
from the feedback clock by an additional set of TGs. We might think, at first glance, that
adding the TGs in Figure 5.33 would increase the delay significantly; however, there is
only a single set of TGs in series with the feedback before the signal enters the line. The
other TGs can be implemented as part of the inverter to minimize their impact on the
overall cell delay. Figure 5.34 shows a possible inverter implementation.

Figure 5.33: Transmission gates added to delay line.


Figure 5.34: Inverter implementation.
Finally, in many situations other phases of an input clock need to be generated. This is
especially useful in minimizing the requirements on the setup and hold times in a
synchronous system. Figure 5.35 shows a method of segmenting the delays in a DLL
delay line. A single control register can be used for all four delay segments. The
challenge becomes reducing the amount of delay in a single delay element. A shift in this
segmented configuration results in a change in the overall delay of four delays rather than
a single delay. Note that with a little thought, and realizing that the delay elements of
Figure 5.32 can be used in inverting or non-inverting configurations, the delay line of
Figure 5.35 can be implemented with only two segments and still provide taps of 90°,
180°, 270°, and 360°.

Figure 5.35: Segmenting delays for additional clocking taps.
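Once the segmented line of Figure 5.35 is locked so that its total delay (ignoring fixed buffer delays) equals one clock period, each segment boundary becomes a phase tap. A trivial sketch with an assumed clock period:

    # Phase taps from a four-segment delay line locked to one clock period.
    # The clock period is an assumed value.
    T_CLK_NS = 8.0                        # assumed 125 MHz clock
    SEGMENTS = 4

    for i in range(1, SEGMENTS + 1):
        delay_ns = i * T_CLK_NS / SEGMENTS
        phase_deg = 360 * delay_ns / T_CLK_NS
        print(f"tap {i}: {delay_ns:.1f} ns -> {phase_deg:.0f} degrees")

As the text notes, a two-segment version can recover the missing taps by operating the delay elements of Figure 5.32 in their inverting configuration.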

[1]
This material is taken directly from [4].

REFERENCES

[1] H.Ikeda and H.Inukai, “High-Speed DRAM Architecture Development,” IEEE Journal of
Solid-State Circuits, vol. 34, no. 5, pp. 685–692, May 1999.

[2] M.Bazes,“Two Novel Full Complementary Self-Biased CMOS Differential Amplifiers,” IEEE
Journal of Solid-State Circuits, vol. 26, no. 2, pp. 165–168, February 1991.

[3] R.J.Baker, H.W.Li, and D.E.Boyce, CMOS: Circuit Design, Layout, and Simulation,
Piscataway, NJ: IEEE Press, 1998.
[4] F.Lin, J.Miller, A.Schoenfeld, M.Ma, and R.J.Baker, “A Register-Controlled Symmetrical
DLL for Double-Data-Rate DRAM," IEEE Journal of Solid-State Circuits, vol. 34, no. 4, 1999.

[5] A.Hatakeyama, H.Mochizuki, T.Aikawa, M.Takita, Y.Ishii, H.Tsuboi, S.Y.Fujioka,
S.Yamaguchi, M.Koga, Y.Serizawa, K.Nishimura, K.Kawabata, Y.Okajima, M.Kawano,
H.Kojima, K.Mizutani, T.Anezaki, M.Hasegawa, and M.Taguchi, “A 256-Mb SDRAM Using a
Register-Controlled Digital DLL,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 1728–1732,
November 1997.

[6] S.Eto, M.Matsumiya, M.Takita, Y.Ishii, T.Nakamura, K.Kawabata, H.Kano, A.Kitamoto,
T.Ikeda, T.Koga, M.Higashiro, Y.Serizawa, K.Itabashi, O.Tsuboi, Y.Yokoyama, and M.Taguchi,
“A 1Gb SDRAM with Ground Level Precharged Bitline and Non-Boosted 2.1V Word Line,”
ISSCC Digest of Technical Papers, pp. 82–83, February 1998.

[7] T.Saeki, Y.Nakaoka, M.Fujita, A.Tanaka, K.Nagata, K.Sakakibara, T.Matano, Y.Hoshino,
K.Miyano, S.Isa, S.Nakazawa, E.Kakehashi, J.M.Drynan, M.Komuro, T.Fukase, H.Iwasaki,
M.Takenaka, J.Sekine, M.Igeta, N.Nakanishi, T.Itani, K.Yoshida, H.Yoshino, S.Hashimoto,
T.Yoshii, M.Ichinose, T.Imura, M.Uziie, S.Kikuchi, K.Koyama, Y.Fukuzo, and T.Okuda, “A
2.5-ns Clock Access 250-MHz, 256-Mb SDRAM with Synchronous Mirror Delay,” IEEE
Journal of Solid-State Circuits, vol.31, pp. 1656–1665, November 1996.

Chapter 6: Voltage Converters


In this chapter, we discuss the circuitry for generating the on-chip voltages that lie outside
the supply range. In particular, we look at the wordline and substrate voltage pumps. We
also discuss the voltage regulators that generate the internal power supply
voltages.

6.1 INTERNAL VOLTAGE REGULATORS

6.1.1 Voltage Converters


DRAMs depend on a variety of internally generated voltages to operate and to optimize
their performance. These voltages generally include the boosted wordline voltage VCCP,
the internally regulated supply voltage VCC, the VCC/2 cellplate and digitline bias voltage
DVC2, and the pumped substrate voltage VBB. Each of these voltages is regulated with a
different kind of voltage generator. A linear voltage converter generates VCC, while a
modified CMOS inverter creates DVC2. Generating the pumped supply voltages VCCP and
VBB requires more sophisticated circuits that employ voltage pumps (a.k.a., charge
pumps). In this chapter, we discuss each of these circuits and how they are used in
modern DRAM designs to generate the required supply voltages.
Most modern DRAM designs rely on some form of internal voltage regulation to convert
the external supply voltage VCCX into an internal supply voltage VCC. We say most, not all,
because the need for internal regulation is dictated by the external voltage range and the
process in which the DRAM is based. The process determines gate oxide thickness, field
device characteristics, and diffused junction properties. Each of these properties, in turn,
affects breakdown voltages and leakage parameters, which limit the maximum operating
voltage that the process can reliably tolerate. For example, a 16Mb DRAM built in a 0.35
μm CMOS process with a 120 Å thick gate oxide can operate reliably with an internal
supply voltage not exceeding 3.6 V. If this design had to operate in a 5 V system, an
internal voltage regulator would be needed to convert the external 5 V supply to an
internal 3.3 V supply. For the same design operating in a 3.3 V system, an internal
voltage regulator would not be required. Although the actual operating voltage is
determined by process considerations and reliability studies, the internal supply voltage is
generally proportional to the minimum feature size. Table 6.1 summarizes this
relationship.
Table 6.1: DRAM process versus supply voltage.

Process     Internal VCC
0.45 μm     4 V
0.35 μm     3.3 V
0.25 μm     2.5 V
0.20 μm     2 V

All DRAM voltage regulators are built from the same basic elements: a voltage reference,
one or more output power stages, and some form of control circuit. How each of these
elements is realized and combined into the overall design is the product of process and
design limitations and the design engineer’s preferences. In the paragraphs that follow,
we discuss each element, overall design objectives, and one or more circuit
implementations.

6.1.2 Voltage References


The sole purpose of a voltage reference circuit is to establish a nominal operating point
for the voltage regulator circuit. However, this nominal voltage is not a constant; rather, it
is a function of the external voltage VCCX and temperature. Figure 6.1 is a graph of VCC
versus VCCX for a typical DRAM voltage regulator. This figure shows three regions of
operation. The first region occurs during a POWERUP or POWERDOWN cycle, in which
VCCX is below the desired VCC operating voltage range. In this region, VCC is set equal to
VCCX, providing the maximum operating voltage allowable in the part. A maximum voltage
is desirable in this region to extend the DRAM’s operating range and ensure data
retention during low-voltage conditions.
Figure 6.1: Ideal regulator characteristics.
The second region exists whenever VCCX is in the nominal operating range. In this range,
VCC flattens out and establishes a relatively constant supply voltage to the DRAM.
Various manufacturers strive to make this region absolutely flat, eliminating any
dependence on VCCX. We have found, however, that a moderate amount of slope in this
range is advantageous for characterizing performance. It is critically important in a
manufacturing environment that each DRAM meet the advertised specifications, with
some margin for error. A simple way to ensure these margins is to exceed the operating
range by a fixed amount during component testing. The voltage slope depicted in Figure
6.1 allows this margin testing to occur by establishing a moderate degree of dependence
between VCCX and VCC.
The third region shown in Figure 6.1 is used for component burn-in. During burn-in, both
the temperature and voltage are elevated above the normal operating range to stress the
DRAMs and weed out infant failures. Again, if there were no VCCX and VCC dependency,
the internal voltage could not be elevated. A variety of manufacturers do not use the
monotonic curve shown in Figure 6.1. Some designs break the curve as shown in Figure
6.2, producing a step in the voltage characteristics. This step creates a region in which
the DRAM cannot be operated. We will focus on the more desirable circuits that produce
the curve shown in Figure 6.1.

Figure 6.2: Alternative regulator characteristics.
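The three-region behavior just described can be captured with a simple piecewise model. Every number in the sketch below (transition point, nominal voltage, slope, and burn-in clamp) is an assumed value chosen only to illustrate the shape of the curve in Figure 6.1, not the characteristics of any particular part.

    # Piecewise sketch of the ideal regulator characteristic of Figure 6.1.
    # Every numeric value here is an assumed, illustrative parameter.
    V1 = 3.1          # assumed Region 1 / Region 2 transition point (V)
    VCC_NOM = 3.3     # assumed nominal internal supply at nominal VCCX (V)
    SLOPE = 0.1       # assumed small dVCC/dVCCX slope kept for margin testing
    VCCX_NOM = 5.0    # assumed nominal external supply (V)
    CLAMP_DROP = 2.9  # assumed fixed drop below VCCX in the burn-in region (V)

    def vcc(vccx):
        region1 = vccx                                  # power-up/down: VCC follows VCCX
        region2 = VCC_NOM + SLOPE * (vccx - VCCX_NOM)   # gently sloped operating region
        region3 = vccx - CLAMP_DROP                     # burn-in: tracks VCCX
        if vccx <= V1:
            return region1
        return max(region2, region3)                    # Region 3 takes over at high VCCX

    for v in (2.0, 3.1, 5.0, 5.5, 6.5, 7.5):
        print(f"VCCX = {v:.1f} V  ->  VCC = {vcc(v):.2f} V")

The max() of the Region 2 and Region 3 expressions is what keeps the modeled curve monotonic, mirroring the diode clamp described later in this section.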


To design a voltage reference, we need to make some assumptions about the power
stages. First, we will assume that they are built as unbuffered, two-stage, CMOS
operational amplifiers and that the gain of the first stage is sufficiently large to regulate the
output voltage to the desired accuracy. Second, we will assume that they have a closed
loop gain of Av. The value of Av influences not only the reference design, but also the
operating characteristics of the power stage (to be discussed shortly). For this design
example, assume Av=1.5. The voltage reference circuit shown in Figure 6.3 can realize
the desired VCC characteristics shown in Figure 6.4. This circuit uses a simple resistor and
a PMOS diode reference stack that is buffered and amplified by an unbuffered CMOS
op-amp. The resistor and diode are sized to provide the desired output voltage and
temperature characteristics and the minimum bias current. Note that the diode stack is
programmed through the series of PMOS switch transistors that are shunting the stack. A
fuse element is connected to each PMOS switch gate. Unfortunately, this
programmability is necessary to accommodate process variations and design changes.

Figure 6.3: Resistor/diode voltage reference.

Figure 6.4: Voltage regulator characteristics.
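As a quick sanity check on these assumptions: with a closed-loop power-stage gain of Av = 1.5, the reference only needs to produce VCC/Av. The target VCC values below are assumed examples.

    # Reference voltage implied by the closed-loop gain Av = 1.5 quoted above.
    AV = 1.5
    for vcc_target in (2.5, 3.3):          # assumed example targets
        print(f"VCC = {vcc_target:.1f} V  ->  VREF = {vcc_target / AV:.2f} V")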


The reference temperature characteristics rely on establishing a proper balance between
VTH, mobility, and resistance variations with temperature. An ideal temperature coefficient
for this circuit would be positive such that the voltage rises with temperature, somewhat
compensating for the CMOS gate delay speed loss associated with increasing
temperature.
Two PMOS diodes (M3–M4) connected in series with the op-amp output terminal provide
the necessary burn-in characteristics in Region 3. In normal operation, the diodes are
OFF. As VCCX is increased into the burn-in range, VCCX will eventually exceed VREF by two
diode drops, turning ON the PMOS diodes and clamping VREF to two VTH below VCCX. The
clamping action will establish the desired burn-in characteristics, keeping the regulator
monotonic in nature.
In Region 2, voltage slope over the operating range is determined by the resistance ratio
of the PMOS reference diode M1 and the bias resistor R1. Slope reduction is
accomplished by either increasing the effective PMOS diode resistance or replacing the
bias resistor with a more elaborate current source as shown in Figure 6.5. This current
source is based on a VTH referenced source to provide a reference current that is only
slightly dependent on VCCX [1]. A slight dependence is still necessary to generate the
desired voltage slope.

Figure 6.5: Improved voltage reference.


The voltage reference does not actually generate Region 1 characteristics for the voltage
regulator. Rather, the reference ensures a monotonic transition from Region 1 to Region
2. To accomplish this task, the reference must approximate the ideal characteristics for
Region 1, in which VCC=VCCX. The regulator actually implements Region 1 by shorting the
VCC and VCCX buses together through the PMOS output transistors found in each power
stage opamp. Whenever VCCX is below a predetermined voltage V1, the PMOS gates are
driven to ground, actively shorting the buses together. As VCCX exceeds the voltage level
V1, the PMOS gates are released and normal regulator operation commences. To ensure
proper DRAM operation, this transition needs to be as seamless as possible.

6.1.3 Bandgap Reference


Another type of voltage reference that is popular among DRAM manufacturers is the
bandgap reference. The bandgap reference is traditionally built from vertical pnp
transistors. A novel bandgap reference circuit is presented in Figure 6.6. As shown, it
uses two bipolar transistors with an emitter size ratio of 10:1. Because both are biased
with the same current but have different emitter sizes, a differential voltage
will exist between the two transistors. The differential voltage appearing across resistor
R1 will be amplified by the op-amp. Resistors R2 and R1 establish the closed loop gain
for this amplifier and determine nominal output voltage and bias currents for the
transistors [1].

Figure 6.6: Bandgap reference circuit.


The almost ideal temperature characteristics of a bandgap reference are what make it
attractive to regulator designers. Through careful selection of emitter ratios and bias
currents, the temperature coefficient can be set to approximately zero. Also, because the
reference voltage is determined by the bandgap characteristics of silicon rather than a
PMOS VTH, this circuit is less sensitive to process variations than the circuit in Figure 6.3.
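The near-zero temperature coefficient mentioned above comes from balancing the negative coefficient of VBE against the positive coefficient of the PTAT difference voltage. The sketch below works through the standard first-order numbers; the 10:1 emitter ratio comes from the text, while VBE, its temperature coefficient, and the operating temperature are assumed values.

    import math

    # First-order bandgap arithmetic for a 10:1 emitter-ratio pair.
    # VBE, its temperature coefficient, and the temperature are assumed values.
    K_BOLTZ = 1.380649e-23   # J/K
    Q_E     = 1.602177e-19   # C
    T       = 300.0          # K, assumed operating temperature

    VT   = K_BOLTZ * T / Q_E          # thermal voltage, ~25.9 mV at 300 K
    N    = 10                         # emitter size ratio (from the text)
    DVBE = VT * math.log(N)           # PTAT difference developed across R1

    VBE     = 0.65                    # assumed VBE of the larger device (V)
    TC_VBE  = -2.0e-3                 # assumed dVBE/dT, roughly -2 mV/K
    TC_DVBE = (K_BOLTZ / Q_E) * math.log(N)   # d(DVBE)/dT

    # Choose the gain so the two temperature coefficients cancel.
    K_GAIN = -TC_VBE / TC_DVBE
    VREF   = VBE + K_GAIN * DVBE

    print(f"dVBE          = {DVBE*1e3:.1f} mV")
    print(f"required gain = {K_GAIN:.1f}")
    print(f"VREF          = {VREF:.2f} V  (close to the 1.2 V silicon bandgap)")

In the circuit of Figure 6.6, this gain is established by the R2/R1 ratio.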
Three problems with the bandgap reference shown in Figure 6.6, however, make it much
less suitable for DRAM applications. First, the bipolar transistors need moderate current
to ensure that they operate beyond the knee on their I-V curves. This bias current is
approximately 10–20 μA per transistor, which puts the total bias current for the circuit
above 25 μA. The voltage reference shown in Figure 6.3, on the other hand, consumes
less than 10 μA of the total bias current. Second, the vertical pnp transistors inject a
significant amount of current into the substrate—as high as 50% of the total bias current
in some cases. For a pumped substrate DRAM, the resulting charge from this injected
current must be removed by the substrate pump, which raises standby current for the part.
Third, the voltage slope for a bandgap reference is almost zero because the feedback
configuration in Figure 6.6 has no dependence on VCCX. Although seemingly ideal, this
behavior is unacceptable because a finite voltage slope is needed to perform margin
testing on the DRAM.

6.1.4 The Power Stage

Although static voltage characteristics of the DRAM regulator are determined by the
voltage reference circuit, dynamic voltage characteristics are dictated by the power
stages. The power stage is therefore a critical element in overall DRAM performance. To
date, the most prevalent type of power stage among DRAM designers is a simple,
unbuffered op-amp. Unbuffered op-amps, while providing high open loop gain, fast
response, and low offset, allow design engineers to use feedback in the overall regulator
design. Feedback reduces temperature and process sensitivity and ensures better load
regulation than any type of open loop system. Design of the op-amps, however, is
anything but simple.
The ideal power stage would have high bandwidth, high open-loop gain, high slew rate,
low systematic offset, low operating current, high drive, and inherent stability.
Unfortunately, several of these parameters are contradictory, which compromises certain
aspects of the design and necessitates trade-offs. While it seems that many DRAM
manufacturers use a single opamp for the regulator’s power stage, we have found that it
is better to use a multitude of smaller op-amps. These smaller op-amps have wider
bandwidth, greater design flexibility, and an easier layout than a single, large opamp.
The power op-amp is shown in Figure 6.7. The schematic diagram for a voltage regulator
power stage is shown in Figure 6.8. This design is used on a 256Mb DRAM and consists
of 18 power op-amps, one boost amp, and one small standby op-amp. The VCC power
buses for the array and peripheral circuits are isolated except for the 20-ohm resistor that
bridges the two together. Isolating the buses is important to prevent high-current spikes
that occur in the array circuits from affecting the peripheral circuits. Failure to isolate
these buses can result in speed degradation for the DRAM because high-current spikes
in the array cause voltage cratering and a corresponding slow-down in logic transitions.

Figure 6.7: Power op-amp.


Figure 6.8: Power stage.
With isolation, the peripheral VCC is almost immune to array noise. To improve slew rate,
each of the power op-amp stages shown in Figure 6.8 features a boost circuit that raises
the differential pair bias current and slew rate during expected periods of large current
spikes. Large spikes are normally associated with Psense-amp activation. To reduce
active current consumption, the boost current is disabled a short time after Psense-amp
activation by the signal labeled BOOST. The power stages themselves are enabled by
the signals labeled ENS and ENS* only when RAS is LOW and the part is active. When
RAS is HIGH, all of the power stages are disabled.
The signal labeled CLAMP* ensures that the PMOS output transistor is OFF whenever
the amplifier is disabled to prevent unwanted charging of the VCC bus. When forced to
ground, however, the signal labeled PWRUP shorts the VCCX and VCC buses together
through the PMOS output transistor. (The need for this function was described earlier in
this section.) Basically, these two buses are shorted together whenever the DRAM
operates in Region 1, as depicted in Figure 6.1. Obviously, to prevent a short circuit
between VCCX and ground, CLAMP* and PWRUP are mutually exclusive.
The smaller standby amplifier is included in this design to sustain the VCC supply
whenever the part is inactive, as determined by RAS. This amplifier has a very low
operating current and a correspondingly low slew rate. Accordingly, the standby amplifier
cannot sustain any type of active load. For this reason, the third and final type of amplifier
is included. The boost amplifier shown in Figure 6.8 is identical to the power stage
amplifiers except that the bias current is greatly reduced. This amplifier provides the
necessary supply current to operate the VCCP and VBB voltage pumps.
The final element in the voltage regulator is the control logic. An example of this logic is
shown in Figure 6.9. It consists primarily of static CMOS logic gates and level translators.
The logic gates are referenced to VCC. Level translators are necessary to drive the power
stages, which are referenced to VCCX levels. A series of delay elements tune the control
circuit relative to Psense-amp activation (ACT) and RAS (RL*) timing. Included in the
control circuit is the block labeled VCCX level detector [1]. The reference generator
produces two reference signals, which are fed into the comparator to determine the
transition point V1 between Region 1 and Region 2 operation for the regulator. In addition,
the boost amp control logic block is shown in Figure 6.9. This circuit examines the VBB
and VCCP control signals to enable the boost amplifier whenever either voltage pump is
active.

Figure 6.9: Regulator control logic.

6.2 PUMPS AND GENERATORS


Generation of the boosted wordline voltage and negative substrate bias voltage requires
voltage pump (or charge pump) circuits. Pump circuits are commonly used to create
voltages that are more positive or negative than available supply voltages. Two voltage
pumps are commonly used in a DRAM today. The first is a VCCP pump, which generates
the boosted wordline voltage and is built primarily from NMOS transistors. The second is
a VBB pump, which generates the negative substrate bias voltage and is built from PMOS
transistors. The exclusive use of NMOS or PMOS transistors in each pump is required to
prevent latch-up and current injection into the mbit arrays. PMOS transistors are required
in the VBB pump because various active nodes would swing negative with respect to the
substrate voltage VBB. Any n-diffusion regions connected to these active nodes would
forward bias and cause latch-up and injection. Similar conditions mandate the use of
NMOS transistors in the VCCP pump.

6.2.1 Pumps
Voltage pump operation can be understood with the assistance of the simple voltage
pump circuit depicted in Figure 6.10. For this positive pump circuit, imagine, for one
phase of a pump cycle, that the clock CLK is HIGH. During this phase, node A is at
ground and node B is clamped to VCC−VTH by transistor M1. The charge stored in
capacitor C1 is then

Q1 = C1 (VCC − VTH)
During the second phase, the clock CLK will transition LOW, which brings node A HIGH.
As node A rises to VCC, node B begins to rise to VCC+ (VCC−VTH), shutting OFF transistor
M1. At the same time, as node B rises one VTH above VLOAD, transistor M2 begins to
conduct. The charge from capacitor C1 is transferred through M2 and shared with the
capacitor CLOAD. This action effectively pumps charge into CLOAD and ultimately raises the
voltage VOUT. During subsequent clock cycles, the voltage pump continues to deliver
charge to CLOAD until the voltage VOUT equals 2VCC−VTH1−VTH2, one VTH below the peak
voltage occurring at node B. A simple, negative voltage pump could be built from the
circuit of Figure 6.10 by substituting PMOS transistors for the two NMOS transistors
shown and moving their respective gate connections.

Figure 6.10: Simple voltage pump circuit.
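A minimal behavioral model makes the charge-transfer arithmetic concrete. The supply, thresholds, and capacitor values below are assumed numbers; the model simply repeats the clamp and charge-sharing steps described above and shows VOUT creeping toward 2VCC−VTH1−VTH2.

    # Behavioral sketch of the simple voltage pump in Figure 6.10.
    # Capacitor values, thresholds, and supply are assumed, illustrative numbers.
    VCC, VTH1, VTH2 = 3.3, 0.7, 0.7     # supply and device thresholds (assumed)
    C1, CLOAD = 10e-12, 100e-12         # pump and load capacitance (assumed)

    vout = 0.0
    for cycle in range(1, 201):
        # Phase 1: CLK HIGH, node A at ground, node B clamped to VCC - VTH1 by M1.
        # Phase 2: CLK LOW, node A rises to VCC and boots node B to 2*VCC - VTH1.
        vb = 2 * VCC - VTH1
        if vb - VTH2 > vout:
            # M2 conducts until node B has fallen to VOUT + VTH2; charge
            # conservation between C1 and CLOAD gives the new output voltage.
            vout = (C1 * (vb - VTH2) + CLOAD * vout) / (C1 + CLOAD)
        if cycle in (1, 10, 50, 200):
            print(f"after {cycle:3d} cycles: VOUT = {vout:.3f} V")

    print(f"limit = 2*VCC - VTH1 - VTH2 = {2 * VCC - VTH1 - VTH2:.3f} V")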


Schematics for actual VCCP and VBB pumps are shown in Figures 6.11 and 6.12,
respectively. Both of these circuits are identical except for the changes associated with
the NMOS and PMOS transistors. These pump circuits operate as two phase pumps
because two identical pumps are working in tandem. As discussed in the previous
paragraph, note that transistors M1 and M2 are configured as switches rather than as
diodes. The drive signals for these gates are derived from secondary pump stages and
the tandem pump circuit. Using switches rather than diodes improves pumping efficiency
and operating range by eliminating the VTH drops associated with diodes.
Figure 6.11: VCCP pump.

Figure 6.12: VBB pump.

Two important characteristics of a voltage pump are capacity and efficiency. Capacity is a
measure of how much current a pump can continue to supply, and it is determined
primarily by the capacitor’s size and its operating frequency. The operating frequency is
limited by the rate at which the pump capacitor C1 can be charged and discharged.
Efficiency, on the other hand, is a measure of how much charge or current is wasted
during each pump cycle. A typical DRAM voltage pump might be 30–50% efficient. This
translates into 2–3 milliamps of supply current for every milliamp of pump output current.
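A back-of-the-envelope calculation ties these two figures of merit together. All of the values below are assumed except the 30-50% efficiency range quoted above.

    # Back-of-the-envelope pump capacity and efficiency numbers (assumed values).
    C1         = 50e-12    # pump capacitor (assumed)
    DELTA_V    = 2.0       # usable voltage swing transferred per cycle (assumed)
    FREQ       = 20e6      # pump oscillator frequency (assumed)
    EFFICIENCY = 0.4       # within the 30-50% range quoted above

    i_out = C1 * DELTA_V * FREQ          # maximum sustained output current
    i_cc  = i_out / EFFICIENCY           # supply current needed to deliver it

    print(f"capacity     ~ {i_out*1e3:.1f} mA")
    print(f"supply draw  ~ {i_cc*1e3:.1f} mA "
          f"({1/EFFICIENCY:.1f} mA of ICC per mA of pump output)")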
In addition to the pump circuits just described, regulator and oscillator circuits are needed
to complete a voltage pump design. The most common oscillator used in voltage pumps
is the standard CMOS ring oscillator. A typical voltage pump ring oscillator is shown in
Figure 6.13. A unique feature of this oscillator is the multifrequency operation permitted
by including mux circuits connected to various oscillator tap points. These muxes,
controlled by signals such as PWRUP, enable higher frequency operation by reducing the
number of inverter stages in the ring oscillator.

Figure 6.13: Ring oscillator.
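The reason the mux taps raise the frequency follows from the usual ring-oscillator relation f ≈ 1/(2·N·td), where N is the number of stages in the ring and td is the per-stage delay. The stage counts and gate delay below are assumed values.

    # Ring-oscillator frequency versus the number of stages selected by the muxes.
    # Per-stage delay and stage counts are assumed, illustrative values.
    T_STAGE = 0.5e-9                 # assumed delay of one inverter stage (s)

    for n_stages in (15, 9, 5):      # assumed tap points selected by the muxes
        freq = 1.0 / (2 * n_stages * T_STAGE)
        print(f"{n_stages:2d} stages -> {freq/1e6:6.1f} MHz")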


Typically, the oscillator is operated at a higher frequency when the DRAM is in a PWRUP
state, since this will assist the pump in initially charging the load capacitors. The oscillator
is enabled and disabled through the signal labeled REGDIS*. This signal is controlled by
the voltage regulator circuit shown in Figure 6.14. Whenever REGDIS* is HIGH, the
oscillator is functional, and the pump is operative. Examples of VCCP and VBB pump
regulators are shown in Figures 6.14 and 6.15, respectively.
Figure 6.14: VCCP regulator.

Figure 6.15: VBB regulator.


These circuits use low-bias current sources and NMOS/PMOS diodes to translate the
pumped voltage levels, VCCP and VBB, to normal voltage levels. This shifted voltage level
is fed into a modified inverter stage that has an output-dependent trip point: the trip point
is modified with feedback to give the inverter hysteresis. Subsequent inverter stages
provide additional gain for the regulator and boost the
signal to the full CMOS level necessary to drive the oscillator. Minimum and maximum
operating voltages for the pump are controlled by the first inverter stage trip point,
hysteresis, and the NMOS/PMOS diode voltages. The clamp circuit shown in Figure 6.14
is included in the regulator design to limit the pump voltage when VCC is elevated, such as
during burn-in.
A second style of voltage pump regulator for controlling VCCP and VBB voltages is shown in
Figures 6.16 and 6.17, respectively. This type of regulator uses a high-gain comparator
coupled to a voltage translator stage. The translator stage of Figure 6.16 translates the
pumped voltage VCCP and the reference voltage VDD down within the input common-mode
range of the comparator circuit. The translator accomplishes this with a reference current
source and MOS diodes. The reference voltage supply VDD is translated down by one
threshold voltage (VTH) by sinking the reference current with a current mirror stage
through a PMOS diode connected to VDD. The pumped voltage supply VCCP is similarly
translated down by sinking the same reference current with a matching current mirror
stage through a diode stack. The diode stack consists of a PMOS diode, matching that in
the VDD reference translator, and a pseudo-NMOS diode. The pseudo-NMOS diode is
actually a series of NMOS transistors with a common gate connection.
Figure 6.16: VCCP differential regulator.

Figure 6.17: VBB differential regulator.


The quantity and sizes of the transistors included in this pseudo-NMOS diode are
mask-programmable. The voltage drop across this pseudo-NMOS diode determines, in
essence, the regulated voltage for VCCP. Accordingly,

VCCP = VDD + VDIODE

where VDIODE is the voltage dropped across the pseudo-NMOS diode.
The voltage dropped across the PMOS diode does not affect the regulated voltage
because the reference voltage supply VDD is translated through a matching PMOS diode.
Both of the translated voltages are fed into the comparator stage, which enables the
pump oscillator whenever the translated VCCP voltage falls below the translated VDD
reference voltage. The comparator has built-in hysteresis (via its middle stage), which
dictates the amount of ripple present on the regulated VCCP supply.
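Because the regulation point is simply VDD plus the drop across the mask-programmable pseudo-NMOS diode, the VCCP target moves in steps as devices are added to or removed from the stack. A rough numeric sketch, with an assumed per-device drop and supply:

    # Rough sketch of how the mask-programmable pseudo-NMOS diode stack sets VCCP.
    # The per-device drop and supply are assumed, illustrative values.
    VDD       = 2.5     # assumed regulated internal supply (V)
    V_PER_DEV = 0.45    # assumed drop across each series NMOS device at the bias current

    for n_devices in (1, 2, 3):
        v_diode = n_devices * V_PER_DEV
        vccp = VDD + v_diode    # regulation point: VCCP = VDD + V(pseudo-NMOS diode)
        print(f"{n_devices} device(s): VCCP regulates to about {vccp:.2f} V")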
The VBB regulator in Figure 6.17 operates in a similar fashion to the VCCP regulator of
Figure 6.16. The primary difference lies in the voltage translator stage. For the VBB
regulator, this stage translates the pumped voltage VBB and the reference voltage VSS up
within the input common mode range of the comparator circuit. The reference voltage VSS
is translated up by one threshold voltage (VTH) by sourcing a reference current with a
current mirror stage through an NMOS diode. The regulated voltage VBB is similarly
translated up by sourcing the same reference current with a matching current mirror stage
through a diode stack. This diode stack, similar to the VCCP case, contains an NMOS
diode that matches that used in translating the reference voltage VSS. The stack also
contains a mask-adjustable, pseudo-NMOS diode. The voltage across the pseudo-NMOS
diode determines the regulated voltage for VBB such that

VBB = VSS − VDIODE = −VDIODE

where VDIODE is the voltage dropped across the pseudo-NMOS diode and VSS is ground.
The comparator includes a hysteresis stage, which dictates the amount of ripple present
on the regulated VBB supply.

6.2.2 DVC2 Generator


With our discussion of voltage regulator circuits concluded, we can briefly turn our
attention to the DVC2 generation. As discussed in Section 1.2, the memory capacitor
cellplate is biased to VCC/2. Furthermore, the digitlines are always equilibrated and biased
to VCC/2 between array accesses. In most DRAM designs, the cellplate and digitline bias
voltages are derived from the same generator circuit. A simple circuit for generating VCC/2
(DVC2) voltage is shown in Figure 6.18. It consists of a standard CMOS inverter with the
input and output terminals shorted together. With correct transistor sizing, the output
voltage of this circuit can be set precisely to VCC/2. Taking this simple design one step
further results in the actual DRAM DVC2 generator found in Figure 6.19. This generator
contains additional transistors to improve both its stability and drive capability.

Figure 6.18: Simple DVC2 generator.


Figure 6.19: DVC2 generator.
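The shorted inverter of Figure 6.18 settles at its own switching threshold. Evaluating the standard first-order (square-law) expression for that threshold with assumed device parameters shows why the output sits at VCC/2 when the NMOS and PMOS devices are matched, and why it drifts when they are not.

    import math

    # Switching threshold of the shorted inverter used as a DVC2 generator.
    # A first-order square-law model is used; device values are assumed.
    VCC = 3.3

    def dvc2(beta_n, beta_p, vtn, vtp_mag):
        """Input=output voltage of a CMOS inverter (first-order model)."""
        r = math.sqrt(beta_n / beta_p)
        return (VCC - vtp_mag + r * vtn) / (1 + r)

    # Matched devices: output sits at VCC/2.
    print(f"matched:     {dvc2(1.0, 1.0, 0.7, 0.7):.3f} V  (VCC/2 = {VCC/2:.3f} V)")
    # Mismatched betas pull the output away from VCC/2.
    print(f"strong NMOS: {dvc2(2.0, 1.0, 0.7, 0.7):.3f} V")
    print(f"strong PMOS: {dvc2(1.0, 2.0, 0.7, 0.7):.3f} V")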

6.3 DISCUSSION

In this chapter, we introduced the popular circuits used on a DRAM for voltage generation
and regulation. Because this introduction is far from exhaustive, we include a list of
relevant readings and references in the Appendix for those readers interested in greater
detail.

REFERENCES

[1] R.J.Baker, H.W.Li, and D.E.Boyce, CMOS: Circuit Design, Layout, and Simulation.
Piscataway, NJ: IEEE Press, 1998.

[2] B.Keeth, Control Circuit Responsive to Its Supply Voltage Level, United States Patent
#5,373,227, December 13, 1994.
Appendix
Supplemental Reading
In this tutorial overview of DRAM circuit design, we may not have covered specific topics
to the reader’s satisfaction. For this reason, we have compiled a list of supplemental
readings from major conferences and journals, categorized by subject. It is our hope that
unanswered questions will be addressed by the authors of these readings, who are
experts in the field of DRAM circuit design.

General DRAM Design and Operation

[1] S.Fuji, K.Natori, T.Furuyama, S.Saito, H.Toda, T.Tanaka, and O.Ozawa,“A Low-Power Sub
100 ns 256K Bit Dynamic RAM,” IEEE Journal of Solid-State Circuits, vol. 18, pp. 441–446,
October 1983.

[2] A.Mohsen, R.I.Kung, C.J.Simonsen, J.Schutz, P.D.Madland,E.Z.Hamdy, and M.T.Bohr,


“The Design and Performance of CMOS 256K Bit DRAM Devices,” IEEE Journal of
Solid-State Circuits, vol. 19, pp. 610–618, October 1984.

[3] M.Aoki, Y.Nakagome, M.Horiguchi, H.Tanaka, S.Ikenaga, J.Etoh,Y.Kawamoto, S.Kimura,


E.Takeda, H.Sunami, and K.Itoh, “A 60-ns 16Mbit CMOS DRAM with a Transposed Data-Line
Structure,” IEEE Journal of Solid-State Circuits, vol. 23, pp. 1113–1119, October 1988.

[4] M.Inoue, T.Yamada, H.Kotani, H.Yamauchi, A.Fujiwara, J.Matsushima,H.Akamatsu,


M.Fukumoto, M.Kubota, I.Nakao, N.Aoi, G.Fuse,S.Ogawa, S.Odanaka, A.Ueno, and
H.Yamamoto, “A 16-Mbit DRAM with a Relaxed Sense-Amplifier-Pitch Open-Bit-Line
Architecture,” IEEE Journal of Solid-State Circuits, vol. 23, pp. 1104–1112, October 1988.

[5] T.Watanabe, G.Kitsukawa, Y.Kawajiri, K.Itoh, R.Hori, Y.Ouchi,T.Kawahara, and


R.Matsumoto, “Comparison of CMOS and BiCMOS 1Mbit DRAM Performance,” IEEE Journal
of Solid-State Circuits, vol. 24, pp. 771–778, June 1989.

[6] K.Itoh, “Trends in Megabit DRAM Circuit Design,” IEEE Journal of Solid-State Circuits, vol.
25, pp. 778–789, June 1990.

[7] Y.Nakagome, H.Tanaka, K.Takeuchi, E.Kume, Y.Watanabe, T.Kaga,Y.Kawamoto, F.Murai,


R.Izawa, D.Hisamoto, T.Kisu, T.Nishida,E.Takeda, and K.Itoh, “An Experimental 1.5-V 64-Mb
DRAM,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 465–472, April 1991.
[8] P.Gillingham, R.C.Foss, V.Lines, G.Shimokura, and T.Wojcicki, “High-Speed,
High-Reliability Circuit Design for Megabit DRAM,” IEEE Journal of Solid-State Circuits, vol. 26,
pp. 1171–1175, August 1991.

[9] K.Kimura, T.Sakata, K.Itoh, T.Kaga, T.Nishida, and Y.Kawamoto, “A Block-Oriented RAM
with Half-Sized DRAM Cell and Quasi-Folded Data-Line Architecture,” IEEE Journal of
Solid-State Circuits, vol. 26, pp. 1511–1518, November 1991.

[10] Y.Oowaki, K.Tsuchida, Y.Watanabe, D.Takashima, M.Ohta, H.Nakano,S.Watanabe,


A.Nitayama, F.Horiguchi, K.Ohuchi, and F.Masuoka, “A 33ns 64-Mb DRAM,” IEEE Journal of
Solid-State Circuits, vol. 26, pp. 1498–1505, November 1991.

[11] T.Kirihata, S.H.Dhong, K.Kitamura, T.Sunaga, Y.Katayama,R.E.Scheuerlein, A.Satoh,


Y.Sakaue, K.Tobimatus, K.Hosokawa,T.Saitoh, T.Yoshikawa, H.Hashimoto, and M.Kazusawa,
“A 14-ns 4-Mb CMOS DRAM with 300-mW Active Power,” IEEE Journal of Solid-State Circuits,
vol. 27, pp. 1222–1228, September 1992.

[12] K.Shimohigashi and K.Seki, “Low-Voltage ULSI Design,” IEEE Journal of Solid-State
Circuits, vol. 28, pp. 408–413, April 1993.

[13] G.Kitsukawa, M.Horiguchi, Y.Kawajiri, T.Kawahara, T.Akiba, Y.Kawase,T.Tachibana,


T.Sakai, M.Aoki, S.Shukuri, K.Sagara, R.Nagai, Y.Ohji,N.Hasegawa, N.Yokoyama, T.Kisu,
H.Yamashita, T.Kure, and T.Nishida,“256-Mb DRAM Circuit Technologies for File
Applications,” IEEE Journal of Solid-State Circuits, vol. 28, pp. 1105–1113, November 1993.

[14] T.Kawahara, Y.Kawajiri, M.Horiguchi, T.Akiba, G.Kitsukawa, T.Kure,and M.Aoki, “A


Charge Recycle Refresh for Gb-Scale DRAM’s in File Applications,” IEEE Journal of
Solid-State Circuits, vol. 29, pp. 715–722, June 1994.

[15] S.Shiratake, D.Takashima, T.Hasegawa, H.Nakano, Y.Oowaki,S.Watanabe, K.Ohuchi,


and F.Masuoka, “A Staggered NAND DRAM Array Architecture for a Gbit Scale Integration,”
1994 Symposium on VLSI Circuits, p. 75, June 1994.

[16] T.Ooishi, K.Hamade, M.Asakura, K.Yasuda, H.Hidaka, H.Miyamoto, and H.Ozaki, “An
Automatic Temperature Compensation of Internal Sense Ground for Sub-Quarter Micron
DRAMs,” 1994 Symposium on VLSI Circuits, p. 77, June 1994.

[17] A.Fujiwara, H.Kikukawa, K.Matsuyama, M.Agata, S.Iwanari,M.Fukumoto, T.Yamada,


S.Okada, and T.Fujita, “A 200MHz 16Mbit Synchronous DRAM with Block Access Mode,”
1994 Symposium on VLSI Circuits, p. 79, June 1994.

[18] Y.Kodama, M.Yanagisawa, K.Shigenobu, T.Suzuki, H.Mochizuki, and T.Ema, “A


150-MHz 4-Bank 64M-bit SDRAM with Address Incrementing Pipeline Scheme,” 1994
Symposium on VLSI Circuits, p. 81, June 1994.
[19] D.Choi, Y.Kim, G.Cha, J.Lee, S.Lee, K.Kirn, E.Haq, D.Jun, K.Lee,S.Cho, J.Park, and
H.Lim, “Battery Operated 16M DRAM with Post Package Programmable and Variable Self
Refresh,” 1994 Symposium on VLSI Circuits, p. 83, June 1994.

[20] S.Yoo, J.Han, E.Haq, S.Yoon, S.Jeong, B.Kim, J.Lee, T.Jang, H.Kim,C.Park, D.Seo,
C.Choi, S.Cho, and C.Hwang, “A 256M DRAM with Simplified Register Control for Low Power
Self Refresh and Rapid Burn-In,” 1994 Symposium on VLSI Circuits, p. 85, June 1994.

[21] M.Tsukude, M.Hirose, S.Tomishima, T.Tsuruda, T.Yamagata, K.Arimoto,and K.Fujishima,


“Automatic Voltage-Swing Reduction (AVR) Scheme for Ultra Low Power DRAMs,” 1994
Symposium on VLSI Circuits, p. 87, June 1994.

[22] D.Stark, H.Watanabe, and T.Furuyama, “An Experimental Cascade Cell Dynamic
Memory,” 1994 Symposium on VLSI Circuits, p. 89, June 1994.

[23] T.Inaba, D.Takashima, Y.Oowaki, T.Ozaki, S.Watanabe, and K.Ohuchi,“A 250mV Bit-Line
Swing Scheme for a 1V 4Gb DRAM,” 1995 Symposium on VLSI Circuits, p. 99, June 1995.

[24] I.Naritake, T.Sugibayashi, S.Utsugi, and T.Murotani, “A Crossing Charge Recycle Refresh
Scheme with a Separated Driver Sense-Amplifier for Gb DRAMs,” 1995 Symposium on VLSI
Circuits, p. 101, June 1995.

[25] S.Kuge, T.Tsuruda, S.Tomishima, M.Tsukude, T.Yamagata, and K.Arimoto, “SOI-DRAM


Circuit Technologies for Low Power High Speed Multi-Giga Scale Memories,” 1995
Symposium on VLSI Circuits, p. 103, June 1995.

[26] Y.Watanabe, H.Wong, T.Kirihata, D.Kato, J.DeBrosse, T.Hara,M.Yoshida, H.Mukai,


K.Quader, T.Nagai, P.Poechmueller, K.Pfefferl,M.Wordeman, and S.Fujii, “A 286mm2 256Mb
DRAM with X32 BothEnds DO,” 1995 Symposium on VLSI Circuits, p. 105, June 1995.

[27] T.Kirihata, Y.Watanabe, H.Wong, J.DeBrosse, M.Yoshida, D.Katoh,S.Fujii, M.Wordeman,


P.Poechmueller, S.Parke, and Y.Asao, “Fault-Tolerant Designs for 256 Mb DRAM,” 1995
Symposium on VLSI Circuits, p. 107, June 1995.

[28] D.Takashima, Y.Oowaki, S.Watanabe, and K.Ohuchi, “A Novel Power-Off Mode for a
Battery-Backup DRAM,” 1995 Symposium on VLSI Circuits, p. 109, June 1995.

[29] T.Ooishi, Y.Komiya, K.Hamada, M.Asakura, K.Yasuda, K.Furutani,T.Kato, H.Hidaka, and


H.Ozaki, “A Mixed-Mode Voltage-Down Converter with Impedance Adjustment Circuitry for
Low-Voltage Wide-Frequency DRAMs,” 1995 Symposium on VLSI Circuits, p. 111, June 1995.

[30] S.-J.Lee, K.-W.Park, C.-H.Chung, J.-S.Son, K.-H.Park, S.-H.Shin,S.T.Kim, J.-D.Han,


H.-J.Yoo, W.-S.Min, and K.-H.Oh, “A Low Noise 32Bit-Wide 256M Synchronous DRAM with
Column-Decoded I/O Line,” 1995 Symposium on VLSI Circuits, p. 113, June 1995.
[31] T.Sugibayashi, I.Naritake, S.Utsugi, K.Shibahara, R.Oikawa, H.Mori,S.Iwao, T.Murotani,
K.Koyama, S.Fukuzawa, T.Itani, K.Kasama,T.Okuda, S.Ohya, and M.Ogawa, “A 1-Gb DRAM
for File Applications,” IEEE Journal of Solid-State Circuits, vol. 30, pp. 1277–1280, November
1995.

[32] T.Yamagata, S.Tomishima, M.Tsukude, T.Tsuruda, Y.Hashizume, and K.Arimoto, “Low


Voltage Circuit Design Techniques for Battery-Operated and/or Giga-Scale DRAMs,” IEEE
Journal of Solid-State Circuits, vol. 30, pp. 1183–1188, November 1995.

[33] H.Nakano, D.Takashima, K.Tsuchida, S.Shiratake, T.Inaba, M.Ohta,Y.Oowaki,


S.Watanabe, K.Ohuchi, and J.Matsunaga, “A Dual Layer Bitline DRAM Array with VCC/VSS
Hybrid Precharge for Multi-Gigabit DRAMs,” 1996 Symposium on VLSI Circuits, p. 190, June
1996.

[34] J.Han, J.Lee, S.Yoon, S.Jeong, C.Park, I.Cho, S.Lee, and D.Seo, “Skew Minimization
Techniques for 256M-bit Synchronous DRAM and Beyond,” 1996 Symposium on VLSI Circuits,
p. 192, June 1996.

[35] H.Wong, T.Kirihata, J.DeBrosse, Y.Watanabe, T.Hara, M.Yoshida,M.Wordeman, S.Fujii,


Y.Asao, and B.Krsnik, “Flexible Test Mode Design for DRAM Characterization,” 1996
Symposium on VLSI Circuits, p. 194, June 1996.

[36] D.Takashima, Y.Oowaki, S.Watanabe, K.Ohuchi, and J.Matsunaga,“Noise Suppression


Scheme for Giga-Scale DRAM with Hundreds of I/Os,” 1996 Symposium on VLSI Circuits, p.
196, June 1996.

[37] S.Tomishima, F.Morishita, M.Tsukude, T.Yamagata, and K.Arimoto,“A Long Data


Retention SOI-DRAM with the Body Refresh Function,” 1996 Symposium on VLSI Circuits, p.
198, June 1996.

[38] M.Nakamura, T.Takahashi, T.Akiba, G.Kitsukawa, M.Morino,T.Sekiguchi, I.Asano,


K.Komatsuzaki, Y.Tadaki, SongsuCho,K.Kajigaya, T.Tachibana, and K.Sato, “A 29-ns 64-Mb
DRAM with Hierarchical Array Architecture,” IEEE Journal of Solid-State Circuits, vol. 31, pp.
1302–1307, September 1996.

[39] K.Itoh, Y.Nakagome, S.Kimura, and T.Watanabe, “Limitations and Challenges of


Multigigabit DRAM Chip Design,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 624–634,
May 1997.

[40] Y.Idei, K.Shimohigashi, M.Aoki, H.Noda, H.Iwai, K.Sato, and T.Tachibana, “Dual-Period
Self-Refresh Scheme for Low-Power DRAM’s with On-Chip PROM Mode Register,” IEEE
Journal of Solid-State Circuits, vol. 33, pp. 253–259, February 1998.

[41] K.Kim, C.-G.Hwang, and J.G.Lee, “DRAM Technology Perspective for Gigabit Era,” IEEE
Transactions Electron Devices, vol. 45, pp. 598–608, March 1998.
[42] H.Tanaka, M.Aoki, T.Sakata, S.Kimura, N.Sakashita, H.Hidaka,T.Tachibana, and
K.Kimura, “A Precise On-Chip Voltage Generator for a Giga-Scale DRAM with a Negative
Word-Line Scheme,” 1998 Symposium on VLSI Circuits, p. 94, June 1998.

[43] T.Fujino and K.Arimoto, “Multi-Gbit-Scale Partially Frozen (PF) NAND DRAM with
SDRAM Compatible Interface,” 1998 Symposium on VLSI Circuits, p. 96, June 1998.

[44] A.Yamazaki, T.Yamagata, M.Hatakenaka, A.Miyanishi, I.Hayashi,S.Tomishima,


A.Mangyo, Y.Yukinari, T.Tatsumi, M.Matsumura,K.Arimoto, and M.Yamada, “A 5.3Gb/s 32Mb
Embedded SDRAM Core with Slightly Boosting Scheme,” 1998 Symposium on VLSI Circuits,
p. 100, June 1998.

[45] C.Kim, K.H.Kyung, W.P.Jeong, J.S.Kim, B.S.Moon, S.M.Yim,J.W.Chai, J.H.Choi, C.K.Lee,


K.H.Han, C.J.Park, H.Choi, and S.I.Cho, “A 2.5V, 2.0GByte/s Packet-Based SDRAM with a
1.0Gbps/pin Interface,” 1998 Symposium on VLSI Circuits, p. 104, June 1993.

[46] M.Saito, J.Ogawa, H.Tamura, S.Wakayama, H.Araki, Tsz-ShingCheung,K.Gotoh,


T.Aikawa, T.Suzuki, M.Taguchi, and T.Imamura, “500-Mb/s Nonprecharged Data Bus for
High-Speed Dram’s,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 1720–1730, November
1998.

DRAM Cells

[47] C.G.Sodini and T.I.Kamins, “Enhanced Capacitor for One-Transistor Memory Cell,” IEEE
Transactions Electron Devices, vol. ED-23, pp. 1185–1187, October 1976.

[48] J.E.Leiss, P.K.Chatterjee, and T.C.Holloway, “DRAM Design Using the Taper-Isolated
Dynamic RAM Cell,” IEEE Journal of Solid-State Circuits, vol. 17, pp. 337–344, April 1982.

[49] K.Yamaguchi, R.Nishimura, T.Hagiwara, and H.Sunami, “Two-Dimensional Numerical


Model of Memory Devices with a Corrugated Capacitor Cell Structure,” IEEE Journal of
Solid-State Circuits, vol. 20, pp. 202–209, February 1985.

[50] N.C.Lu, P.E.Cottrell, W.J.Craig, S.Dash, D.L.Critchlow, R.L.Mohler,B.J.Machesney,


T.H.Ning, W.P.Noble, R.M.Parent, R.E.Scheuerlein,E.J.Sprogis, and L.M.Terman, “A
Substrate-Plate Trench-Capacitor (SPT) Memory Cell for Dynamic RAM’s,” IEEE Journal of
Solid-State Circuits, vol. 21, pp. 627–634, October 1986.

[51] Y.Nakagome, M.Aoki, S.Ikenaga, M.Horiguchi, S.Kimura, Y.Kawamoto,and K.Itoh, “The


Impact of Data-Line Interference Noise on DRAM Scaling,” IEEE Journal of Solid-State
Circuits, vol. 23, pp. 1120–1127, October 1988.

[52] K.W.Kwon. I.S.Park, D.H.Han, E.S.Kim, S.T.Ahn, and M.Y.Lee,“Ta2O5 Capacitors for 1
Gbit DRAM and Beyond,” 1994 IEDM Technical Digest, pp. 835–838.
[53] D.Takashima, S.Watanabe, H.Nakano, Y.Oowaki, and K.Ohuchi,“Open/Folded Bit-Line
Arrangement for Ultra-High-Density DRAM’s,” IEEE Journal of Solid-State Circuits, vol. 29, pp.
539–542, April 1994.

[54] WonchanKim, JoongsikKih, GyudongKim, SanghunJung, and GijungAhn, “An


Experimental High-Density DRAM Cell with a Built-in Gain Stage,” IEEE Journal of Solid-State
Circuits, vol. 29, pp. 978–981, August 1994.

[55] Y.Kohyama, T.Ozaki, S.Yoshida, Y.Ishibashi, H.Nitta, S.Inoue,K.Nakamura, T.Aoyama,


K.Imai, and N.Hayasaka, “A Fully Printable, Self-Aligned and Planarized Stacked Capacitor
DRAM Access Cell Technology for 1 Gbit DRAM and Beyond,” IEEE Symposium on VLSI
Technology Digest of Technical Papers, pp. 17–18, June 1997.

[56] B.El-Kareh, G.B.Bronner, and S.E.Schuster, “The Evolution of DRAM Cell Technology,
Solid State Technology, vol. 40, pp. 89–101, May 1997.

[57] S.Takehiro, S.Yamauchi, M.Yoshimaru, and H.Onoda, “The Simplest Stacked BST
Capacitor for Future DRAM’s Using a Novel Low Temperature Growth Enhanced
Crystallization,” IEEE Symposium on VLSI Technology Digest of Technical Papers, pp.
153–154, June 1997.

[58] T.Okuda and T.Murotani, “A Four-Level Storage 4-Gb DRAM,” IEEE Journal of Solid State
Circuits, vol. 32, pp. 1743–1747, November 1997.

[59] A.Nitayama, Y.Kohyama, and K.Hieda, “Future Directions for DRAM Memory Cell
Technology,” 1998 IEDM Technical Digest, pp. 355–358.

[60] K.Ono, T.Horikawa, T.Shibano, N.Mikami, T.Kuroiwa, T.Kawahara,S.Matsuno,


F.Uchikawa, S.Satoh, and H.Abe, “(Ba, Sr)TiO3 Capacitor Technology for Gbit-Scale DRAMs,”
1998 IEDM Technical Digest, pp. 803–806.

[61] H.J.Levy, E.S.Daniel, and T.C.McGill, “A Transistorless-Current-Mode Static RAM


Architecture,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 669–672, April 1998.

DRAM Sensing

[62] N.C.-C.Lu and H.H.Chao, “Half-VDD/Bit-Line Sensing Scheme in CMOS DRAMs,” IEEE
Journal of Solid-State Circuits, vol. 19, pp. 451–454, August 1984.

[63] P.A.Layman and S.G.Chamberlain, “A Compact Thermal Noise Model for the
Investigation of Soft Error Rates in MOS VLSI Digital Circuits,” IEEE Journal of Solid-State
Circuits, vol. 24, pp. 79–89, February 1989.
[64] R.Kraus, “Analysis and Reduction of Sense-Amplifier Offset,” IEEE Journal of Solid-State
Circuits, vol. 24, pp. 1028–1033, August 1989.

[65] R.Kraus and K.Hoffmann, “Optimized Sensing Scheme of DRAMs,” IEEE Journal of
Solid-State Circuits, vol. 24, pp. 895–899, August 1989.

[66] H.Hidaka, Y.Matsuda, and K.Fujishima, “A Divided/Shared Bit-Line Sensing Scheme for
ULSI DRAM Cores,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 473–478, April 1991.

[67] T.Nagai, K.Numata, M.Ogihara, M.Shimizu, K.Imai, T.Hara, M.Yoshida,Y.Saito, Y.Asao,


S.Sawada, and S.Fuji, “A 17-ns 4-Mb CMOS DRAM,” IEEE Journal of Solid-State Circuits, vol
26, pp. 1538–1543, November 1991.

[68] T.N.Blalock and R.C.Jaeger, “A High-Speed Sensing Scheme for IT Dynamic RAMs
Utilizing the Clamped Bit-Line Sense Amplifier,” IEEE Journal of Solid-State Circuits, vol. 27,
pp. 618–625, April 1992.

[69] M.Asakura, T.Ooishi, M.Tsukude, S.Tomishima, T.Eimori, H.Hidaka,Y.Ohno, K.Arimoto,


K.Fujishima, T.Nishimura, and T.Yoshihara, “An Experimental 256-Mb DRAM with Boosted
Sense-Ground Scheme,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 1303–1309,
November 1994.

[70] T.Eirihata, S.H.Dhong, L.M.Terman, T.Sunaga, and Y.Taira, “A Variable Precharge


Voltage Sensing,” IEEE Journal of Solid-State Circuits, vol. 30, pp. 25–28, January 1995.

[71] T.Hamamoto, Y.Morooka, M.Asakura, and H.Ozaki, “Cell-Plate-Line/BitLine


Complementary Sensing (CBCS) Architecture for Ultra Low-Power DRAMs,” IEEE Journal of
Solid-State Circuits, vol. 31, pp. 592–601, April 1996.

[72] T.Sunaga, “A Full Bit Prefetch DRAM Sensing Circuit,” IEEE Journal of Solid-State
Circuits, vol. 31, pp. 767–772, June 1996.

DRAM On-Chip Voltage Generation

[73] M.Horiguchi, M.Aoki, J.Etoh, H.Tanaka, S.Ikenaga, K.Itoh, K.Kajigaya,H.Kotani,


K.Ohshima, and T.Matsumoto, “A Tunable CMOS-DRAM Voltage Limiter with Stabilized
Feedback Amplifier,” IEEE Journal of Solid-State Circuits, vol. 25, pp. 1129–1135, October
1990.

[74] D.Takashima, S.Watanabe, T.Fuse, K.Sunouchi, and T.Hara, “Low-Power On-Chip


Supply Voltage Conversion Scheme for Ultrahigh-Density DRAMs,” IEEE Journal of
Solid-State Circuits, vol. 28, pp. 504–509, April 1993.
[75] T.Kuroda, K.Suzuki, S.Mita, T.Fujita, F.Yamane, F.Sano, A.Chiba,Y.Watanabe,
K.Matsuda, T.Maeda, T.Sakurai, and T.Furuyama, “Variable Supply-Voltage Scheme for
Low-Power High-Speed CMOS Digital Design,” IEEE Journal of Solid-State Circuits, vol. 33,
pp. 454–462, March 1998.

DRAM SOI

[76] S.Kuge, F.Morishita, T.Tsuruda, S.Tomishima, M.Tsukude, T.Yamagata,and K.Arimoto,


“SOI-DRAM Circuit Technologies for Low Power High Speed Multigiga Scale Memories,” IEEE
Journal of Solid-State Circuits, vol. 31, pp. 586–591, April 1996.

[77] K.Shimomura, H.Shimano, N.Sakashita, F.Okuda, T.Oashi, Y.Yamaguchi,T.Eimori,


M.Inuishi, K.Arimoto, S.Maegawa, Y.Inoue, S.Komori, and K.Kyuma, “A 1-V 46-ns 16-Mb
SOI-DRAM with Body Control Technique,” IEEE Journal of Solid-State Circuits, vol. 32, pp.
1712–1720, November 1997.

Embedded DRAM

[78] T.Sunaga, H.Miyatake, K.Kitamura, K.Kasuya, T.Saitoh, M.Tanaka,N.Tanigaki, Y.Mori,


and N.Yamasaki, “DRAM Macros for ASIC Chips,” IEEE Journal of Solid-State Circuits, vol. 30,
pp. 1006–1014, September 1995.

Redundancy Techniques

[79] H.L.Kalter, C.H.Stapper, J.E.Earth, Jr., J.DiLorenzo, C.E.Drake,J.A.Fifield, G.A.Kelley, Jr.,


S.C.Lewis, W.B.van der Hoeven, and J.A.Yankosky, “A 50-ns 16-Mb DRAM with a 10-ns Data
Rate and On-Chip ECC," IEEE Journal of Solid-State Circuits, vol. 25, pp. 1118–1128,
October 1990.

[80] M.Horiguchi, J.Etoh, M.Aoki, K.Itoh, and T.Matsumoto, “A Flexible Redundancy


Technique for High-Density DRAMs,” IEEE Journal of Solid-State Circuits, vol. 26, pp. 12–17,
January 1991.

[81] S.Kikuda, H.Miyamoto, S.Mori, M.Niiro, and M.Yamada, “Optimized Redundance


Selection based on Failure-Related Yield Model for 64-Mb DRAM and Beyond,” IEEE Journal
of Solid-State Circuits, vol. 26, pp. 1550–1555, November 1991.
[82] T.Kirihata, Y.Watanabe, HingWong, J.K.DeBrosse, M.Yoshida, D.Kato,S.Fujii,
M.R.Wordeman, P.Poechmueller, S.A.Parke, and Y.Asao, “Fault-Tolerant Designs for 256 Mb
DRAM,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 558–566, April 1996.

DRAM Testing

[83] T.Ohsawa, T.Furuyama, Y.Watanabe, H.Tanaka, N.Kushiyama,K.Tsuchida, Y.Nagahama,


S.Yamano, T.Tanaka, S.Shinozaki, and K.Natori, “A 60-ns 4-Mbit CMOS DRAM with Built-in
Selftest Function,” IEEE Journal of Solid-State Circuits, vol. 22, pp. 663–668, October 1987.

[84] P.Mazumder, “Parallel Testing of Parametric Faults in a Three-Dimensional Dynamic


Random-Access Memory,” IEEE Journal of Solid-State Circuits, vol. 23, pp. 933–941, August
1988.

[85] K.Arimoto, Y.Matsuda, K.Furutani, M.Tsukude, T.Ooishi, K.Mashiko,and K.Fujishima, “A


Speed-Enhanced DRAM Array Architecture with Embedded ECC,” IEEE Journal of Solid-State
Circuits, vol. 25, pp. 11–17, February 1990.

[86] T.Takeshima, M.Takada, H.Koike, H.Watanabe, S.Koshimaru, K.Mitake,W.Kikuchi,


T.Tanigawa, T.Murotani, K.Noda, K.Tasaka, K.Yamanaka,and K.Koyama, “A 55-ns 16-Mb
DRAM with Built-in Self-test Function Using Microprogram ROM,” IEEE Journal of Solid-State
Circuits, vol. 25, pp. 903–911, August 1990.

[87] T.Kirihata, HingWong, J.K.DeBrosse, Y.Watanabe, T.Hara, M.Yoshida,M.R.Wordeman,


S.Fujii, Y.Asao, and B.Krsnik, “Flexible Test Mode Approach for 256-Mb DRAM," IEEE Journal
of Solid-State Circuits, vol. 32, pp. 1525–1534, October 1997.

[88] S.Tanoi, Y.Tokunaga, T.Tanabe, K.Takahashi, A.Okada, M.Itoh,Y.Nagatomo, Y.Ohtsuki,


and M.Uesugi, “On-Wafer BIST of a 200-Gb/s Failed-Bit Search for 1-Gb DRAM,” IEEE
Journal of Solid-State Circuits, vol. 32, pp. 1735–1742, November 1997.

Synchronous DRAM

[89] T.Sunaga, K.Hosokawa, Y.Nakamura, M.Ichinose, A.Moriwaki,S.Kakimi, and N.Kato, “A


Full Bit Prefetch Architecture for Synchronous DRAM’s,” IEEE Journal of Solid-State Circuits,
vol. 30, pp. 998–1005, September 1995.

[90] T.Kirihata, M.Gall, K.Hosokawa, J.-M.Dortu, HingWong, P.Pfefferi,B.L.Ji, O.Weinfurtner,


J.K.DeBrosse, H.Terletzki, M.Selz, W.Ellis,M.R.Wordeman, and O.Kiehl, “A 220-mm/sup2/,
Four-and Eight-Bank, 256-Mb SDRAM with Single-Sided Stitched WL Architecture,” IEEE
Journal of Solid-State Circuits, vol. 33, pp. 1711–1719, November 1998.

Low-Voltage DRAMs

[91] K.Lee, C.Kim, D.Yoo, J.Sim, S.Lee, B.Moon, K.Kim, N.Kim, S.Yoo,J.Yoo, and S.Cho,
“Low Voltage High Speed Circuit Designs for Giga-bit DRAMs,” 1996 Symposium on VLSI
Circuits, p. 104, June 1996.

[92] M.Saito, J.Ogawa, K.Gotoh, S.Kawashima, and H.Tamura, “Technique for Controlling
Effective VTH in Multi-Gbit DRAM Sense Amplifier,” 1996 Symposium on VLSI Circuits, p. 106,
June 1996.

[93] K.Gotoh, J.Ogawa, M.Saito, H.Tamura, and M.Taguchi, “A 0.9 V Sense-Amplifier Driver
for High-Speed Gb-Scale DRAMs,” 1996 Symposium on VLSI Circuits, p. 108, June 1996.

[94] T.Hamamoto, Y.Morooka, T.Amano, and H.Ozaki, “An Efficient Charge Recycle and
Transfer Pump Circuit for Low Operating Voltage DRAMs,” 1996 Symposium on VLSI Circuits,
p. 110, June 1996.

[95] T.Yamada, T.Suzuki, M.Agata, A.Fujiwara, and T.Fujita, “Capacitance Coupled Bus with
Negative Delay Circuit for High Speed and Low Power (10GB/s<500mW) Synchronous
DRAMs,” 1996 Symposium on VLSI Circuits, p. 112, June 1996.

High-Speed DRAMs

[96] S.Wakayama, K.Gotoh, M.Saito, H.Araki, T.S.Cheung, J.Ogawa, and H.Tamura, “10-ns
Row Cycle DRAM Using Temporal Data Storage Buffer Architecture,” 1998 Symposium on
VLSI Circuits, p. 12, June 1998.

[97] Y.Kato, N.Nakaya, T.Maeda, M.Higashiho, T.Yokoyama, Y.Sugo,F.Baba, Y.Takemae,


T.Miyabo, and S.Saito, “Non-Precharged Bit-Line Sensing Scheme for High-Speed
Low-Power DRAMs,” 1998 Symposium on VLSI Circuits, p. 16, June 1998.

[98] S.Utsugi, M.Hanyu, Y.Muramatsu, and T.Sugibayashi, “Non-Complimentary Rewriting


and Serial-Data Coding Scheme for Shared-Sense-Amplifier Open-Bit-Line DRAMs,” 1998
Symposium on VLSI Circuits, p. 18, June 1998.

[99] Y.Sato, T.Suzuki, T.Aikawa, S.Fujioka, W.Fujieda, H.Kobayashi,H.Ikeda, T.Nagasawa,


A.Funyu, Y.Fujii, K.I.Kawasaki, M.Yamazaki, and M.Taguchi, “Fast Cycle RAM (FCRAM); a
20-ns Random Row Access, Pipe-Lined Operating DRAM,” 1998 Symposium on VLSI Circuits,
p. 22, June 1998.

High-Speed Memory Interface Control

[100] S.-J.Jang, S.-H.Han, C.-S.Kim, Y.-H.Jun, and H.-J.Yoo, “A Compact Ring Delay Line for
High Speed Synchronous DRAM,” 1998 Symposium on VLSI Circuits, p. 60, June 1998.

[101] H.Noda, M.Aoki, H.Tanaka, O.Nagashima, and H.Aoki, “An On-Chip Timing Adjuster
with Sub-100-ps Resolution for a High-Speed DRAM Interface,” 1998 Symposium on VLSI
Circuits, p. 62, June 1998.

[102] T.Sato, Y.Nishio, T.Sugano, and Y.Nakagome, “5GByte/s Data Transfer Scheme with
Bit-to-Bit Skew Control for Synchronous DRAM,” 1998 Symposium on VLSI Circuits, p. 64,
June 1998.

[103] T.Yoshimura, Y.Nakase, N.Watanabe, Y.Morooka, Y.Matsuda,M.Kumanoya, and


H.Hamano, “A Delay-Locked Loop and 90-Degree Phase Shifter for 800Mbps Double Data
Rate Memories,” 1998 Symposium on VLSI Circuits, p. 66, June 1998.

High-Performance DRAM

[104] T.Kono, T.Hamamoto, K.Mitsui, and Y.Konishi, “A Precharged-Capacitor-Assisted


Sensing (PCAS) Scheme with Novel Level Controlled for Low Power DRAMs,” 1999
Symposium on VLSI Circuits, p. 123, June 1999.

[105] H.Hoenigschmid, A.Frey, J.DeBrosse, T.Kirihata, G.Mueller, G.Daniel,G.Frankowsky,


K.Guay, D.Hanson, L.Hsu, B.Ji, D.Netis, S.Panaroni,C.Radens, A.Reith, D.Storaska,
H.Terletzki, O.Weinfurtner, J.Alsmeier,W.Weber, and M.Wordeman, “A 7F2 Cell and Bitline
Architecture Featuring Tilted Array Devices and Penalty-Free Vertical BL Twists for 4Gb
DRAM’s” 1999 Symposium on VLSI Circuits, p. 125, June 1999.

[106] S.Shiratake, K.Tsuchida, H.Toda, H.Kuyama, M.Wada, F.Kouno,T.Inaba, H.Akita, and


K.Isobe, “A Pseudo Multi-Bank DRAM with Categorized Access Sequence,” 1999 Symposium
on VLSI Circuits, p. 127, June 1999.

[107] Y.Kanno, H.Mizuno, and T.Watanabe, “A DRAM System for Consistently Reducing CPU
Wait Cycles,” 1999 Symposium on VLSI Circuits, p. 131, June 1999.

[108] S.Perissakis, Y.Joo, J.Ahn, A.DeHon, and J.Wawrzynek, “Embedded DRAM for a
Reconfigurable Array,” 1999 Symposium on VLSI Circuits, p. 145, June 1999.
[109] T.Namekawa, S.Miyano, R.Fukuda, R.Haga, O.Wada, H.Banba,S.Takeda, K.Suda,
K.Mimoto, S.Yamaguchi, T.Ohkubo, H.Takato, and K.Numata, “Dynamically Shift-Switched
Dataline Redundancy Suitable for DRAM Macro with Wide Data Bus,” 1999 Symposium on
VLSI Circuits, p. 149, June 1999.

[110] C.Portmann, A.Chu, N.Hays, S.Sidiropoulos, D.Stark, P.Chau,K.Donnelly, and


B.Garlepp, “A Multiple Vendor 2.5-V DLL for 1.6-GB/s RDRAMs,” 1999 Symposium on VLSI
Circuits, p. 153, June 1999.

Glossary
1T1C
A DRAM memory cell consisting of a single MOSFET access transistor and a
single storage capacitor.
Bitline
Also called a digitline or columnline. A common conductor made from metal or
polysilicon that connects multiple memory cells together through their access
transistors. The bitline is ultimately used to connect memory cells to the sense
amplifier block to permit Refresh, Read, and Write operations.
Bootstrapped Driver
A driver circuit that employs capacitive coupling to boot, or raise up, a capacitive
node to a voltage above VCC.
Buried Capacitor Cell
A DRAM memory cell in which the capacitor is constructed below the digitline.
Charge Pump
See Voltage Pump.
CMOS, Complementary Metal-Oxide Semiconductor
A silicon technology for fabricating integrated circuits. Complementary refers to
the technology’s use of both NMOS and PMOS transistors in its construction. The
PMOS transistor is used primarily to pull signals toward the positive power supply
VDD. The NMOS transistor is used primarily to pull signals toward ground. The
metal-oxide semiconductor describes the sandwich of metal (actually polysilicon in
modern devices), oxide, and silicon that makes up the NMOS and PMOS
transistors.
COB, Capacitor over Bitline
A DRAM memory cell in which the capacitor is constructed above the digitline
(bitline).
Columnline
See Bitline.
Column Redundancy
The practice of adding spare digitlines to a memory array so that defective
digitlines can be replaced with nondefective digitlines.
DC Sense Amplifier (DCSA)
An amplifier connected to the memory array I/O lines that amplifies signals
coming from the array.

Digitline
See Bitline.
DLL, Delay-Locked Loop
A circuit that generates and inserts an optimum delay to temporarily align two
signals. In DRAM, a DLL synchronizes the input and output clock signals of the
DRAM to the I/O data signals.
DRAM, Dynamic Random Access Memory
A memory technology that stores information in the form of electric charge on
capacitors. This technology is considered dynamic because the stored charge
degrades over time due to leakage mechanisms. The leakage necessitates
periodic Refresh of the memory cells to replace the lost charge.
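A back-of-the-envelope relation, with purely illustrative values not taken from this text: if a cell of storage capacitance Cs can tolerate a voltage droop of ΔVmax while leaking a current Ileak, its retention time is roughly

tretention = Cs × ΔVmax/Ileak

For example, Cs = 30 fF, ΔVmax = 150 mV, and Ileak = 1 fA give about 4.5 s; practical Refresh intervals are set far shorter than this to cover worst-case cells and temperatures.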

Dummy Structures
Additional circuitry or structures added to a design, most often to the memory array, that help maintain uniformity in live circuit structures. Nonuniformity occurs at the edges of repetitive structures due to photolithography and etch process limitations.
EDO, Extended Data Out
A variation of fast page mode (FPM) DRAM. In EDO DRAM, the data outputs
remain valid for a specified time after CAS goes HIGH during a Read operation.
This feature permits higher system performance than would be obtained
otherwise.

Efficiency
A design metric, which is defined as the ratio of memory array die area and total
die area (chip size). It is expressed as a percentage.
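Stated as a formula, with hypothetical numbers for illustration:

Efficiency (%) = 100 × (memory array area)/(total die area)

For example, 60 mm² of memory array on a 100 mm² die corresponds to 60% array efficiency.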
Equilibration
A circuit that equalizes the voltages of a digitline pair by shorting the two digitlines
together. Most often, the equilibration circuit includes a bias network, which helps
to set and hold the equilibration level to a known voltage (generally VCC/2) prior to
Sensing.

Feature Size
Generally refers to the minimum realizable process dimension. In the context of DRAM design, however, feature size equates to a dimension that is half of the digitline or wordline layout pitch.
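Equivalently, using the definition of pitch given later in this glossary,

F = (digitline pitch)/2 = (wordline pitch)/2

so, for example, an 8F² folded-array mbit spanning one digitline pitch by two wordline pitches occupies 2F × 4F = 8F² of silicon area.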
Folded Digitline Architecture
A DRAM architecture that uses non-crosspoint-style memory arrays in which a
memory cell is placed only at alternating wordline and digitline intersections.
Digitline pairs, for connection to the sense amplifiers, consist of two adjacent
digitlines from a single memory array. For layout efficiency, each sense amplifier
connects to two adjacent memory arrays through isolation transistor pairs.
FPM, Fast Page Mode
A second-generation memory technology permitting consecutive Reads from an
open page of memory, in which the column address could be changed while CAS
was still low.

Helper Flip-Flop
A positive feedback (regenerative) circuit for amplifying the signals on the I/O
lines.

I/O Devices
MOSFET transistors that connect the array digitlines to the I/O lines (through the
sense amplifiers). Read and Write operations from/to the memory arrays always
occur through I/O devices.
Isolation Devices
MOSFET transistors that isolate array digitlines from the sense amplifiers.

Mbit
A memory cell capable of storing one bit of data. In modern DRAMs, the mbit
consists of a single MOSFET access transistor and a single storage capacitor.
The gate of the MOSFET connects to the wordline or rowline, while the source
and drain of the MOSFET connect to the storage capacitor and the digitline,
respectively.
Memory Array
An array of memory or mbit cells.

Multiplexed Addressing
The practice of using the same chip address pins for both the row and column
addresses. The addresses are clocked into the device at different times.
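As a hypothetical example (not a specific device described in this text): a DRAM with 12 row address bits and 10 column address bits decodes 2^22 = 4,194,304 locations yet needs only 12 multiplexed address pins rather than 22, at the cost of clocking the row and column addresses in at different times under RAS and CAS control.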
Open Digitline Architecture
A DRAM architecture that uses cross-point-style memory arrays in which a
memory cell is placed at every wordline and digitline intersection. Digitline pairs,
for connection to the sense amplifiers, consist of a single digitline from two
adjacent memory arrays.

Pitch
The distance between like points in a periodic array. For example, digitline pitch
in a DRAM array is the distance between the centers or edges of two adjacent
digitlines.
RAM, Random Access Memory
Computer memory that allows access to any memory location without restrictions.

Refresh
The process of restoring the electric charge in DRAM memory cell capacitors to
full levels through Sensing. Note that Refresh occurs every time a wordline is
activated and the sense amplifiers are fired.

Rowline
See Wordline.

SDRAM, Synchronous DRAM
A DRAM technology in which addressing, command, control, and data operations are accomplished in synchronism with a master clock signal.

Sense Amplifier
A type of regenerative amplifier that senses the contents of memory cells and
restores them to full levels.

Trench Capacitor
A DRAM storage capacitor fabricated in a deep hole (trench) in the semiconductor substrate.

Voltage Pump
Also called a charge pump. A circuit for generating voltages that lie outside of the
power supply range.
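As an idealized sketch that ignores switch and threshold-voltage losses (which reduce the achievable levels in practice), a single-stage pump that charges a capacitor to VCC and then stacks it on top of the supply can drive its output toward

VCCP ≈ 2 × VCC

while the complementary arrangement, stacking the charged capacitor below ground, can drive a substrate supply toward VBB ≈ −VCC.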

Wordline
Also called a rowline. A wordline is a polysilicon conductor for forming memory cell access transistor gates and connecting multiple memory cells into a physical row. Driving a wordline HIGH activates, or turns ON, the access transistors in a memory array row.

List of Figures
Chapter 1: An Introduction to DRAM
Figure 1.1: 1,024-bit DRAM functional diagram.
Figure 1.2: 1,024-bit DRAM pin connections.
Figure 1.3: Ideal address input buffer.
Figure 1.4: Layout of a 1,024-bit memory array.
Figure 1.5: 1k DRAM Read cycle.
Figure 1.6: 1k DRAM Write cycle.
Figure 1.7: 1k DRAM Refresh cycle.
Figure 1.8: 3-transistor DRAM cell.
Figure 1.9: Block diagram of a 4k DRAM.
Figure 1.10: 4,096-bit DRAM pin connections.
Figure 1.11: Address timing.
Figure 1.12: 1-transistor, 1-capacitor (1T1C) memory cell.
Figure 1.13: Row of N dynamic memory elements.
Figure 1.14: Page mode.
Figure 1.15: Fast page mode.
Figure 1.16: Nibble mode.
Figure 1.17: Pin connections of a 64Mb SDRAM with 16-bit I/O.
Figure 1.18: Block diagram of a 64Mb SDRAM with 16-bit I/O.
Figure 1.19: SDRAM with a latency of three.
Figure 1.20: Mode register.
Figure 1.21: 1T1C DRAM memory cell. (Note the rotation of the rowline and columnline.)
Figure 1.22: Open digitline memory array schematic.
Figure 1.23: Open digitline memory array layout.
Figure 1.24: Simple array schematic (an open DRAM array).
Figure 1.25: Cell access waveforms.
Figure 1.26: DRAM charge-sharing.
Figure 1.27: Sense amplifier schematic.
Figure 1.28: Sensing operation waveforms.
Figure 1.29: Sense amplifier schematic with I/O devices.
Figure 1.30: Write operation waveforms.
Figure 1.31: A folded DRAM array.

Chapter 2: The DRAM Array


Figure 2.1: Mbit pair layout.
Figure 2.2: Layout to show array pitch.
Figure 2.3: Layout to show 8F² derivation.
Figure 2.4: Folded digitline array schematic.
Figure 2.5: Digitline twist schemes.
Figure 2.6: Open digitline array schematic.
Figure 2.7: Open digitline array layout. (Feature size (F) is equal to one-half digitline
pitch.)
Figure 2.8: Buried capacitor cell process cross section.
Figure 2.9: Buried capacitor cell process SEM image.
Figure 2.10: Buried digitline mbit cell layout.
Figure 2.11: Buried digitline mbit process cross section.
Figure 2.12: Buried digitline mbit process SEM image.
Figure 2.13: Trench capacitor mbit process cross section.
Figure 2.14: Trench capacitor mbit process SEM image.
Figure 2.15: Equilibration schematic.
Figure 2.16: Equilibration and bias circuit layout.
Figure 2.17: I/O transistors.
Figure 2.18: Basic sense amplifier block.
Figure 2.19: Standard sense amplifier block.
Figure 2.20: Complex sense amplifier block.
Figure 2.21: Reduced sense amplifier block.
Figure 2.22: Minimum sense amplifier block.
Figure 2.23: Single-metal sense amplifier block.
Figure 2.24: Waveforms for the Read-Modify-Write cycle.
Figure 2.25: Bootstrap wordline driver.
Figure 2.26: Bootstrap operation waveforms.
Figure 2.27: Donut gate structure layout.
Figure 2.28: NOR driver.
Figure 2.29: CMOS driver.
Figure 2.30: Static decode tree.
Figure 2.31: P&E decode tree.
Figure 2.32: Pass transistor decode tree.

Chapter 3: Array Architectures


Figure 3.1: Open digitline architecture schematic.
Figure 3.2: Open digitline 32-Mbit array block.
Figure 3.3: Single-pitch open digitline architecture.
Figure 3.4: Open digitline architecture with dummy arrays.
Figure 3.5: Folded digitline array architecture schematic.
Figure 3.6: Folded digitline architecture 32-Mbit array block.
Figure 3.7: Development of bilevel digitline architecture.
Figure 3.8: Digitline vertical twisting concept.
Figure 3.9: Bilevel digitline architecture schematic.
Figure 3.10: Vertical twisting schemes.
Figure 3.11: Plaid 6F² mbit array. (The isolation gates are tied to ground or a negative voltage.)
Figure 3.12: Bilevel digitline array schematic.
Figure 3.13: Bilevel digitline architecture 32-Mbit array block.
Chapter 4: The Peripheral Circuitry
Figure 4.1: Column decode.
Figure 4.2: Column selection timing.
Figure 4.3: Column decode: P&E logic.
Figure 4.4: Column decode waveforms.
Figure 4.5: Row fuse block.
Figure 4.6: Column fuse block.
Figure 4.7: 8-Meg×8-sync DRAM poly fuses.

Chapter 5: Global Circuitry and Considerations


Figure 5.1: Data input buffer.
Figure 5.2: Stub series terminated logic (SSTL).
Figure 5.3: Differential amplifier-based input receiver.
Figure 5.4: Self-biased differential amplifier-based input buffer.
Figure 5.5: Fully differential amplifier-based input buffer.
Figure 5.6: Data Write mux.
Figure 5.7: Write driver.
Figure 5.8: I/O bias circuit.
Figure 5.9: I/O bias and operation waveforms.
Figure 5.10: DC sense amp.
Figure 5.11: DCSA operation waveforms.
Figure 5.12: A helper flip-flop.
Figure 5.13: Data Read mux.
Figure 5.14: Output buffer.
Figure 5.15: Row address buffer.
Figure 5.16: Row address predecode circuits.
Figure 5.17: Phase decoder/driver.
Figure 5.18: Column address buffer.
Figure 5.19: Equilibration driver.
Figure 5.20: Column predecode logic.
Figure 5.21: SDRAM CLK input and DQ output.
Figure 5.22: Block diagram for DDR SDRAM DLL.
Figure 5.23: Data timing chart for DDR DRAM.
Figure 5.24: Phase detector used in RSDLL.
Figure 5.25: Symmetrical delay element used in RSDLL.
Figure 5.26: Delay line and shift register for RSDLL.
Figure 5.27: Measured rms jitter versus input frequency.
Figure 5.28: Measured delay per stage versus VCC and temperature.
Figure 5.29: Measured ICC (DLL current consumption) versus input frequency.
Figure 5.30: Two-way arbiter as a phase detector.
Figure 5.31: Circuit for generating shift register control.
Figure 5.32: A double inverter used as a delay element.
Figure 5.33: Transmission gates added to delay line.
Figure 5.34: Inverter implementation.
Figure 5.35: Segmenting delays for additional clocking taps.

Chapter 6: Voltage Converters


Figure 6.1: Ideal regulator characteristics.
Figure 6.2: Alternative regulator characteristics.
Figure 6.3: Resistor/diode voltage reference.
Figure 6.4: Voltage regulator characteristics.
Figure 6.5: Improved voltage reference.
Figure 6.6: Bandgap reference circuit.
Figure 6.7: Power op-amp.
Figure 6.8: Power stage.
Figure 6.9: Regulator control logic.
Figure 6.10: Simple voltage pump circuit.
Figure 6.11: VCCP pump.
Figure 6.12: VBB pump.
Figure 6.13: Ring oscillator.
Figure 6.14: VCCP regulator.
Figure 6.15: VBB regulator.
Figure 6.16: VCCP differential regulator.
Figure 6.17: VBB differential regulator.
Figure 6.18: Simple DVC2 generator.
Figure 6.19: DVC2 generator.

List of Tables
Chapter 1: An Introduction to DRAM
Table 1.1: SDRAM commands. (Notes: 1)

Chapter 2: The DRAM Array


Table 2.1: Predecoded address truth table.

Chapter 3: Array Architectures


Table 3.1: 0.25 μm design parameters.
Table 3.2: Active current and power versus digitline length.
Table 3.3: Wordline time constant versus wordline length.
Table 3.4: Open digitline (local row decode)—32-Mbit size calculations.
Table 3.5: Open digitline (dummy arrays and global row decode)—32-Mbit size
calculations.
Table 3.6: Open digitline (dummy arrays and hierarchical row decode)—32-Mbit size
calculations.
Table 3.7: Wordline time constant versus wordline length (folded).
Table 3.8: Folded digitline (local row decode)—32-Mbit size calculations.
Table 3.9: Folded digitline (global decode)—32-Mbit size calculations.
Table 3.10: Folded digitline (hierarchical row decode)—32-Mbit size calculations.
Table 3.11: Active current and power versus bilevel digitline length.
Table 3.12: Bilevel digitline (local row decode)—32-Mbit size calculations.
Table 3.13: Bilevel digitline (global decode)—32-Mbit size calculations.
Table 3.14: Bilevel digitline (hierarchical row decode)—32-Mbit size calculations.
Table 3.15: 32-Mbit size calculations summary.

Chapter 5: Global Circuitry and Considerations


Table 5.1: Refresh rate versus row and column addresses.

Chapter 6: Voltage Converters


Table 6.1: DRAM process versus supply voltage.
