
Lecture 9 Memory Peripherals 2021

This lecture discusses memory peripherals such as row decoders, column multiplexers, sense amplifiers, and write drivers that are used in memory architectures. It provides an overview of memory architecture including the storage cell, word line, bit line, and address decoding. It also covers synchronous SRAM interface timing and definitions of memory timing parameters. Finally, it discusses the design of row decoders using AND/NAND gates and analyzes their logical effort.


Digital Integrated Circuits

(83-313)

Lecture 9:
Memory Peripherals
Prof. Adam Teman
25 May 2021

Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from sources freely available on the internet. When possible, these sources have been cited;
however, some references may have been cited incorrectly or overlooked. If you feel that a picture, graph, or code example has been copied from you and either needs to be cited or removed,
please feel free to email [email protected] and I will address this as soon as possible.
Lecture Content

© Adam Teman, May 25, 2021
Memory Peripherals Overview

Memory Architecture
• Memory size: W words of C bits = $W \times C$ bits
• Address bus: A bits → $W = 2^{A}$
• Multiplexing factor: M
• Number of words in a row: $2^{M}$
• Number of rows: $2^{A-M}$
• Number of columns: $C \times 2^{M}$
• Row decoder: $A-M \rightarrow 2^{A-M}$, driven by $ADD_{A-1}:ADD_{M}$
• Column decoder: $M \rightarrow 2^{M}$, driven by $ADD_{M-1}:ADD_{0}$
[Figure: array of storage cells on word lines (driven by the row decoder) and bit lines (through the column decoder to the sense amplifiers/drivers and the C-bit input/output)]
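The address arithmetic above can be checked with a short sketch. The function names and the example sizes (A = 10, M = 2, C = 32) are hypothetical. The split of the address into the upper A−M bits for the row decoder and the lower M bits for the column decoder follows the slide.

```python
def memory_organization(A: int, M: int, C: int) -> dict:
    """Array dimensions for W = 2^A words of C bits, with 2^M words per row."""
    if not 0 <= M <= A:
        raise ValueError("need 0 <= M <= A")
    W = 2 ** A
    return {
        "words": W,                              # W = 2^A
        "total_bits": W * C,                     # W x C bits
        "words_per_row": 2 ** M,                 # words sharing one row
        "rows": 2 ** (A - M),                    # row decoder outputs
        "columns": C * 2 ** M,                   # C x 2^M columns
        "row_decoder": (A - M, 2 ** (A - M)),    # (inputs, outputs)
        "column_decoder": (M, 2 ** M),           # (inputs, outputs)
    }


def split_address(addr: int, A: int, M: int) -> tuple[int, int]:
    """Upper A-M bits select the row; lower M bits select the word in that row."""
    if not 0 <= addr < 2 ** A:
        raise ValueError("address out of range")
    return addr >> M, addr & ((1 << M) - 1)


# Hypothetical example: 1024 words of 32 bits, 4 words per row
org = memory_organization(A=10, M=2, C=32)
row, col = split_address(718, A=10, M=2)
```

With these numbers the array is 256 rows by 128 columns, so rows × columns equals the 32 kbit total.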
Synchronous SRAM Interface
[Figure: $2^{m} \times n$ SRAM macro with ports A[m-1:0], D[n-1:0], Q[n-1:0], WEN[p-1:0], CEN, CLK]
• A typical on-chip synchronous SRAM features:
  • Single-cycle write/read latency
  • Byte write mask
  • Active-low Write Enable (i.e., WEN=1 → Read Enable)
• The timing diagram can be read as follows:
  (1) A rising clock edge results in a WRITE when WE is low.
  (2) A rising clock edge results in a READ when WE is high. Valid data appears on the output after a delay.
[Figure: timing diagram with CLK, address A (A0–A3), write data D (D0, D1), WE, and read data Q (D2, D3)]
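The interface behavior can be sketched as a cycle-level model. The class and method names are hypothetical, and several details are assumptions rather than facts from the slide: n is a multiple of 8, WEN[p-1:0] carries one active-low enable bit per byte (p = n/8), CEN is an active-low chip enable, read data is updated on the same edge (single-cycle latency), and Q holds its previous value during a write.

```python
class SyncSRAM:
    """Cycle-level sketch of a 2^m x n synchronous SRAM (hypothetical model).

    Assumptions: n % 8 == 0, WEN has one active-low bit per byte lane,
    CEN is active low, and Q is unchanged by a write.
    """

    def __init__(self, m: int, n: int) -> None:
        if n % 8:
            raise ValueError("n must be a multiple of 8 for byte masking")
        self.p = n // 8                  # number of byte lanes = width of WEN
        self.mem = [0] * (2 ** m)        # 2^m words of n bits
        self.q = 0                       # registered output Q[n-1:0]

    def clock(self, a: int, d: int, wen: int, cen: int) -> int:
        """Apply one rising CLK edge and return Q after the edge."""
        if cen:                          # assumed active-low enable: 1 = idle
            return self.q
        if wen == (1 << self.p) - 1:     # all WEN bits high -> READ
            self.q = self.mem[a]
            return self.q
        word = self.mem[a]               # at least one WEN bit low -> masked WRITE
        for lane in range(self.p):
            if not (wen >> lane) & 1:    # a low WEN bit enables this byte lane
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (d & mask)
        self.mem[a] = word
        return self.q


sram = SyncSRAM(m=4, n=32)
sram.clock(a=3, d=0xAABBCCDD, wen=0b0000, cen=0)   # write all four bytes
sram.clock(a=3, d=0x11223344, wen=0b1110, cen=0)   # write byte 0 only
q = sram.clock(a=3, d=0, wen=0b1111, cen=0)        # read back
```

The masked write replaces only the lowest byte, so the read returns 0xAABBCC44.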
Memory Timing: Definitions

[Figure: simple definitions of the write-cycle and read-cycle timing parameters, alongside a real datasheet example. Source: CMU, ECE548]
Major Peripheral Circuits
• Row Decoder
• Column Multiplexer
• Sense Amplifier
• Write Driver
• Precharge Circuit
[Figure: memory array block diagram highlighting the row decoder, column decoder, and sense amplifiers/drivers]
Row Decoder Design

Row Decoders
• A decoder reduces the number of select signals logarithmically: W word lines need only $\log_2 W$ address bits.
• Number of rows: W
• Number of row address bits: $A = \log_2 W$
[Figure: row decoder driven by $ADD_{A-1}:ADD_0$, with outputs Word 0, Word 1, Word 2, …, Word W−2, Word W−1]
Row Decoders
• Standard decoder design:
  • Each output row is driven by an AND gate with $k = \log_2 W$ inputs.
  • Each gate has a unique combination of address inputs (or their inverted values).
  • For example, an 8-bit row address requires 256 8-input AND gates, such as:
$WL_0 = \bar{A}_7 \bar{A}_6 \bar{A}_5 \bar{A}_4 \bar{A}_3 \bar{A}_2 \bar{A}_1 \bar{A}_0 \qquad WL_{255} = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$
• NOR decoder:
  • De Morgan’s law gives us a NOR decoder.
  • In the previous example, we get 256 8-input NOR gates:
$WL_0 = \overline{A_7 + A_6 + A_5 + A_4 + A_3 + A_2 + A_1 + A_0}$
$WL_{255} = \overline{\bar{A}_7 + \bar{A}_6 + \bar{A}_5 + \bar{A}_4 + \bar{A}_3 + \bar{A}_2 + \bar{A}_1 + \bar{A}_0}$
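The AND and NOR forms above can be compared with a behavioral sketch. Bits are modeled as 0/1 integers, and the function names are hypothetical. Gate j uses $A_i$ where bit i of j is 1 and $\bar{A}_i$ otherwise. The NOR version applies De Morgan’s law by OR-ing the complement of each of those literals and inverting the result.

```python
def and_decoder(addr_bits: list[int]) -> list[int]:
    """WL_j = AND over i of (A_i if bit i of j is 1, else NOT A_i)."""
    k = len(addr_bits)
    wl = []
    for j in range(2 ** k):
        out = 1
        for i, a in enumerate(addr_bits):           # addr_bits[i] is A_i
            literal = a if (j >> i) & 1 else 1 - a
            out &= literal
        wl.append(out)
    return wl


def nor_decoder(addr_bits: list[int]) -> list[int]:
    """WL_j = NOR over i of the complemented AND literals (De Morgan)."""
    k = len(addr_bits)
    wl = []
    for j in range(2 ** k):
        any_high = 0
        for i, a in enumerate(addr_bits):
            literal_bar = 1 - a if (j >> i) & 1 else a
            any_high |= literal_bar
        wl.append(1 - any_high)
    return wl


addr_bits = [(178 >> i) & 1 for i in range(8)]   # A_0 ... A_7 of row 178
wl = and_decoder(addr_bits)                      # exactly one word line, WL_178, is high
```

For j = 0 the NOR gate sees $A_7 \ldots A_0$ directly, and for j = 255 it sees their complements, matching the two NOR expressions on the slide.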
How should we build it?
• Let’s build a row decoder for a 256×256 SRAM array.
• We need 256 8-input AND gates.
• Each gate drives 256 bitcells.
• We have various options:
[Figure: alternative gate-level implementations of the 8-input AND gate driving WL0, WL1, …, WL255]
• Which one is best?
Reminder: Logical Effort
t pd ,i = t pINV ( pi + EFi )
bi  Cin,i +1
PE = F   LE  B =   LEi  bi
CL
EFi LEi  fi = LEi 
Cin,i Cin ,1
EFopt = PE = N F   LEi  bi
N

Nopt = log EFopt PE = log EFopt F  LE  B

(
t pd = t pINV  ( pi + EFi ) = t pINV   pi + N  N PE )
12 Row Decoder Column Mux Precharge Sense Amp © Adam May
Teman,
25, 2021
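The reminder formulas translate directly into a few helper functions, which later slides can reuse for their arithmetic. This is a sketch that assumes every stage is sized for equal stage effort, so the delay reduces to $\sum_i p_i + N \sqrt[N]{PE}$ (in units of $t_{pINV}$). The default stage effort of 3.6 is the value this lecture uses for $EF_{opt}$, and the function names are hypothetical.

```python
import math


def path_effort(C_L: float, C_in1: float, LEs: list[float], bs: list[float]) -> float:
    """PE = F * LE * B, with F = C_L / C_in,1, LE = prod(LE_i), B = prod(b_i)."""
    return (C_L / C_in1) * math.prod(LEs) * math.prod(bs)


def optimal_stages(PE: float, EF_opt: float = 3.6) -> float:
    """N_opt = log_{EF_opt}(PE)."""
    return math.log(PE) / math.log(EF_opt)


def path_delay(PE: float, ps: list[float]) -> float:
    """Normalized delay t_pd / t_pINV = sum(p_i) + N * PE^(1/N), N = len(ps)."""
    N = len(ps)
    return sum(ps) + N * PE ** (1 / N)


# Sanity check: three inverters driving 64x their own input capacitance
PE_inv = path_effort(C_L=64.0, C_in1=1.0, LEs=[1, 1, 1], bs=[1, 1, 1])
delay_inv = path_delay(PE_inv, ps=[1, 1, 1])    # 3 + 3 * 64^(1/3), i.e. about 15
```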
Problem Setup
• For the LE calculation we need to start with:
  • Output load ($C_L$)
  • Input capacitance ($C_{in}$)
  • Branching ($B$)
• What is the load capacitance?
  • 256 bitcells on each word line: $C_{WL} = 256 \cdot C_{cell} + C_{wire}$
  • Let’s ignore the wire for now…
• What is the input capacitance?
  • Let’s assume our address drivers can drive a bit more than a bitcell, so: $C_{in,addr\_driver} = 4 \cdot C_{cell}$
Problem Setup
• What is the branching effort?
• Let’s take another look at the Boolean expressions:
$WL_0 = \bar{A}_7 \bar{A}_6 \bar{A}_5 \bar{A}_4 \bar{A}_3 \bar{A}_2 \bar{A}_1 \bar{A}_0 \qquad WL_{255} = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$
• We see that half of the word lines use $A_i$ and half use $\bar{A}_i$!
• So each address driver drives 128 8-input AND gates, but only one is on the selected WL path:
$C_{on\,path} = C_{nand}; \quad C_{off\,path} = 127 \cdot C_{nand}$
$B_{addr\_driver} = \frac{C_{on\,path} + C_{off\,path}}{C_{on\,path}} = \frac{C_{nand} + 127 \cdot C_{nand}}{C_{nand}} = 128$
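The "half use $A_i$, half use $\bar{A}_i$" observation can be confirmed by enumerating all 256 AND gates and counting how many gates each of the 16 address literals feeds. The function name is hypothetical. Since only one of those gates lies on the selected path, the count is the branching effort.

```python
from collections import Counter


def literal_fanout(k: int) -> Counter:
    """For a k-bit AND decoder, count how many of the 2^k gates use each literal.

    Key (i, 1) is the true address A_i; key (i, 0) is its complement.
    """
    uses = Counter()
    for j in range(2 ** k):              # gate j decodes row j
        for i in range(k):
            uses[(i, (j >> i) & 1)] += 1
    return uses


uses = literal_fanout(8)
# B = (C_on + C_off) / C_on = (gates on this driver) / (one on-path gate)
B_addr_driver = max(uses.values())
```

Every literal feeds exactly $2^{k-1} = 128$ gates, so all address drivers see the same branching.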
Number of Stages
• Altogether, the path effort is:
$PE = LE \cdot B \cdot F = LE \cdot \prod_i b_i \cdot \frac{C_{WL}}{C_{address}} = LE \cdot 128 \cdot \frac{256 \, C_{cell}}{4 \, C_{cell}} = LE \cdot 8\text{k} = 2^{13} \cdot LE$
• The best-case logical effort is $LE = 1$.
• So the minimum number of stages for optimal delay is:
$PE = 2^{13} \;\Rightarrow\; N_{opt} = \log_{3.6} 2^{13} \approx 7$
• That’s a lot of stages!
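The stage count can be reproduced numerically. This sketch normalizes all capacitances to one bitcell ($C_{cell} = 1$), ignores the wire as the slide does, and uses the lecture’s stage effort of 3.6. The variable names are illustrative.

```python
import math

C_cell = 1.0                    # normalize capacitances to one bitcell
C_WL = 256 * C_cell             # word-line load, wire ignored
C_address = 4 * C_cell          # assumed address-driver input capacitance
B = 128                         # branching of each address driver
LE = 1.0                        # best-case logical effort

F = C_WL / C_address            # electrical effort = 64
PE = LE * B * F                 # 8192 = 2^13
N_opt = math.log(PE, 3.6)       # about 7.03 stages
```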
So which implementation should we use?
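• The one with the minimum logical effort:
  • Option 1 (8-input NAND + inverter): $LE = \frac{10}{3} \cdot 1 = \frac{10}{3}$; $p = 8 + 1 = 9$
  • Option 2 (4-input NAND + 2-input NOR): $LE = 2 \cdot \frac{5}{3} = \frac{10}{3}$; $p = 4 + 2 = 6$
  • Option 3 (2-input NAND + 2-input NOR + 2-input NAND + inverter): $LE = \frac{4}{3} \cdot \frac{5}{3} \cdot \frac{4}{3} \cdot 1 = \frac{80}{27} \approx 2.96$; $p = 2 + 2 + 2 + 1 = 7$
  • Option 4 (three 2-input NAND + inverter pairs): $LE = \left(\frac{4}{3}\right)^3 \approx 2.37$; $p = 2 \cdot 3 + 1 \cdot 3 = 9$
• Option 4 has the minimum logical effort, $LE \approx 2.37$.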
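The four candidates can be tallied with exact fractions. This sketch assumes the standard logical-effort values normalized to $p_{inv} = 1$: an n-input NAND has $LE = (n+2)/3$ and $p = n$, an n-input NOR has $LE = (2n+1)/3$ and $p = n$, and an inverter has $LE = 1$ and $p = 1$. The gate topologies and option labels are inferred from the LE and p products listed above, so treat them as illustrative.

```python
from fractions import Fraction


# Standard logical-effort values, normalized to p_inv = 1: (LE, parasitic p)
def nand(n: int) -> tuple[Fraction, int]:
    return Fraction(n + 2, 3), n


def nor(n: int) -> tuple[Fraction, int]:
    return Fraction(2 * n + 1, 3), n


INV = (Fraction(1), 1)

options = {
    "NAND8-INV": [nand(8), INV],
    "NAND4-NOR2": [nand(4), nor(2)],
    "NAND2-NOR2-NAND2-INV": [nand(2), nor(2), nand(2), INV],
    "(NAND2-INV)x3": [nand(2), INV] * 3,
}


def summarize(stages: list[tuple[Fraction, int]]) -> tuple[Fraction, int, int]:
    """Return (path LE, total parasitic delay, number of stages)."""
    LE = Fraction(1)
    for le, _ in stages:
        LE *= le
    return LE, sum(p for _, p in stages), len(stages)


results = {name: summarize(stages) for name, stages in options.items()}
best = min(results, key=lambda name: results[name][0])   # minimum path LE
```

The winner, (NAND2-INV)x3, has LE = 64/27 but also the most stages (6), which matters once the actual path effort is known.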
New optimal number of Stages
• So now we can calculate the actual path effort:
$PE = F \cdot \prod_i b_i \cdot \prod_i LE_i = 2.37 \cdot 2^{13} = 19.418\text{k}$
$N_{opt} = \log_{3.6} PE = 7.7$
• We could add another inverter or two to get closer to the optimal number of stages…
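The payoff of extra inverters can be estimated with the normalized delay $\sum_i p_i + N \sqrt[N]{PE}$ from the logical effort reminder. This is a sketch under idealized assumptions: every stage is sized for equal stage effort, $p_{inv} = 1$, each added inverter has $LE = 1$ and $p = 1$, and the wire is still ignored. The helper name is hypothetical.

```python
import math

F_times_B = 2 ** 13                 # 128 * (256 C_cell / 4 C_cell), from the slides
LE_path = (4 / 3) ** 3              # minimum-LE option, about 2.37
PE = LE_path * F_times_B            # about 19.4k


def normalized_delay(extra_inverters: int) -> float:
    """t_pd / t_pINV for the 6-stage path plus extra inverters (LE = 1, p = 1 each)."""
    N = 6 + extra_inverters
    P = 9 + extra_inverters         # parasitics: 3 * p_NAND2 + 3 * p_INV = 9
    return P + N * PE ** (1 / N)


delays = {k: normalized_delay(k) for k in range(4)}
best_extra = min(delays, key=delays.get)
```

Under this model the minimum falls at two extra inverters (N = 8), which conveniently also preserves the word-line polarity, whereas a single extra inverter would invert it.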
Implementation Problems
• Address line capacitance:
  • Our assumption was $C_{in,addr\_driver} = 4 \cdot C_{cell}$.
  • But each address drives 128 gates.
  • That’s a really long wire with high capacitance.
  • This means that we will need to buffer the address lines.
  • This will probably ruin our whole analysis...
• Bit-cell pitch:
  • Each decoder output drives one row of bitcells, so its gate must fit within the height of a bitcell row.
  • How will we fit 8 address signals into this pitch?
Predecoding - Concept
• Solution:
  • Let’s look at two decoder paths: $WL_{254}$ and $WL_{255}$.
[Figure: gate-level paths for WL255 and WL254, each built from the address inputs A0–A7 and their complements]
• We see that there are many “shared” gates.
• So why not share them?
  • For instance, we can use the purple output for both gates…
Predecoding - Method

• How do we do this?
• If we look at the final Boolean expression, it has combinations of groups of inputs.
• By grouping together a few inputs, we actually create a small decoder.
• Then we just AND the outputs of all the “pre” decoders.
• For example, two 4:16 predecoders:
$D = dec(A_0, A_1, A_2, A_3); \quad E = dec(A_4, A_5, A_6, A_7)$
$WL_0 = D_0 \cdot E_0; \quad WL_{255} = D_{15} \cdot E_{15}; \quad WL_{254} = D_{14} \cdot E_{15}$
[Figure: two 4→16 predecoders, D (inputs A0–A3) and E (inputs A4–A7)]
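The two-level structure can be checked against a direct decoder. In this behavioral sketch (function names hypothetical), `decode` stands in for a k → 2^k predecoder by comparing the input value with each output index instead of modeling gates. Each word line is then the AND of one D output (low nibble, A0–A3) and one E output (high nibble, A4–A7).

```python
def decode(bits: list[int]) -> list[int]:
    """Behavioral k -> 2^k one-hot decoder; bits[i] is input i (LSB first)."""
    value = sum(b << i for i, b in enumerate(bits))
    return [1 if j == value else 0 for j in range(2 ** len(bits))]


def predecoded_wordlines(addr: int) -> list[int]:
    """8-bit row address -> 256 word lines via two 4:16 predecoders (D, E)."""
    a = [(addr >> i) & 1 for i in range(8)]
    D = decode(a[0:4])              # D = dec(A0, A1, A2, A3)
    E = decode(a[4:8])              # E = dec(A4, A5, A6, A7)
    # Post-decode: 256 2-input ANDs, WL_j = D[j mod 16] AND E[j // 16]
    return [D[j & 0xF] & E[j >> 4] for j in range(256)]
```

For example, row 254 = 0b1111_1110 has low nibble 14 and high nibble 15, so it is selected by $D_{14} \cdot E_{15}$.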
Predecoding - Example
• Let’s look at our example:
$D = dec(A_0, A_1, A_2, A_3); \quad E = dec(A_4, A_5, A_6, A_7)$
$WL_0 = D_0 \cdot E_0; \quad WL_{255} = D_{15} \cdot E_{15}; \quad WL_{254} = D_{14} \cdot E_{15}$
• What is our new branching effort?
  • As before, each address drives half the lines of the small decoder (16/2 = 8 gates).
  • Each predecoder output drives 256/16 = 16 post-decoder gates.
  • Altogether, the branching effort is:
$B = b_{addr\_driver} \cdot b_{predecoder} = \frac{16}{2} \cdot \frac{256}{16} = 128$
  • Same as before!
Predecoding - Solution
• Why is this a better solution?
• Each address driver now drives only eight gates, so it sees less capacitance.
• We saved a ton of area by “sharing” gates.
• We can “Pitch Fit” 2-input NAND gates.

Another Predecoding Example
• We can try using four 2→4 predecoders:
• This will require us to use 256 4-input NAND gates.
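Different groupings of the address bits can be compared with a short counting sketch. The function and field names are hypothetical, and the sketch counts only three quantities: the fan-in of the final word-line gates, the number of predecoded wires routed along the array, and the branching effort of each address-to-word-line path. In every case the branching works out to $2^{k-1} = 128$, the same as the direct decoder.

```python
def predecode_config(k: int, group_sizes: list[int]) -> dict:
    """Summarize a k-bit row decoder whose address bits are split into predecoder groups."""
    if sum(group_sizes) != k:
        raise ValueError("group sizes must add up to k")
    rows = 2 ** k
    branching = []
    for g in group_sizes:
        b_addr = 2 ** g // 2            # each address literal feeds half of its predecoder gates
        b_pre = rows // 2 ** g          # each predecoder output feeds rows / 2^g final gates
        branching.append(b_addr * b_pre)
    return {
        "final_gate_inputs": len(group_sizes),                  # one input per predecoder
        "predecoder_wires": sum(2 ** g for g in group_sizes),   # lines routed to the final gates
        "branching_per_path": branching,
    }


two_by_4to16 = predecode_config(8, [4, 4])          # 256 2-input final gates
four_by_2to4 = predecode_config(8, [2, 2, 2, 2])    # 256 4-input final gates
```

The two configurations trade final-gate fan-in (2 vs. 4 inputs, which affects pitch fitting) against the number of predecoded wires (32 vs. 16).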

How do we choose a configuration?
• Pitch fitting: 2-input NANDs vs. 4-input NANDs.
• Switching capacitance: how many wires switch at each transition?
• Stages before the large capacitance: distribution of the load along the delay path.
• Conclusion: usually do as much predecoding as possible!
[Figure: four 2→4 predecoders (A0A1, A2A3, A4A5, A6A7) feeding 4-input word-line gates vs. two 4→16 predecoders (A0A1A2A3, A4A5A6A7) feeding 2-input word-line gates, driving WL0 … WL127]
Alternative Solution: Dynamic Decoders

[Figure: 2-input NOR decoder and 2-input NAND decoder implemented as dynamic circuits. Labels include the precharge signal PC, VDD, GND, the address inputs A0 and A1 (true and complement), and the outputs WL0–WL3.]
Column Multiplexer

Column Multiplexer
• First option – PTL mux with decoder
  • Fast – only 1 transistor in the signal path.
  • Large transistor count.
[Figure: 4:1 pass-transistor mux in which a decoder on A1, A0 selects one of B0–B3 onto Y]
4-to-1 Tree Decoder
• Second option – Tree decoder
  • For a $2^k$:1 mux, it uses k series transistors.
  • Delay increases quadratically with k, since the series pass transistors form an RC chain.
  • No external decode logic → big area reduction.
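The quadratic growth can be illustrated with the Elmore delay of a chain of identical RC segments. This is a simplified sketch: each pass transistor is modeled as a resistance R with capacitance C at its output node, both normalized to 1, and the decoder delay and bitline load of the decoder-driven option are ignored. The pass-transistor counts assume NMOS-only switches (a tree has 2 + 4 + … + 2^k of them), and the function names are hypothetical.

```python
def elmore_chain(k: int, R: float = 1.0, C: float = 1.0) -> float:
    """Elmore delay of k identical series RC segments (R and C normalized)."""
    # Node i is reached through i resistors, so its capacitance contributes i*R*C
    return sum(i * R * C for i in range(1, k + 1))   # = R*C*k*(k+1)/2


def mux_compare(k: int) -> dict:
    """Compare a 2^k:1 decoder-driven PTL mux with a 2^k:1 tree mux (NMOS pass gates only)."""
    return {
        "ptl_series": 1,
        "ptl_delay": elmore_chain(1),
        "ptl_pass_transistors": 2 ** k,          # plus a k -> 2^k decoder, not counted
        "tree_series": k,
        "tree_delay": elmore_chain(k),
        "tree_pass_transistors": 2 ** (k + 1) - 2,
    }


comparison = {k: mux_compare(k) for k in (1, 2, 3, 4)}
```

Going from a 4:1 tree (k = 2) to a 16:1 tree (k = 4) raises the normalized delay from 3 to 10, while the decoder-driven mux keeps a single transistor in the signal path.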

Combining the Two

Precharge and Sense Amp

Precharge Circuitry
• Precharge bitlines high before reads.
• Equalize bitlines to minimize the voltage difference when using sense amplifiers.
[Figure: precharge and equalization circuits on the bit and bit_b lines]
Sense Amplifiers
$t_p = \frac{C \cdot \Delta V}{I_{av}}$
• C is large and $I_{av}$ is small, so make $\Delta V$ as small as possible.
• Idea: use a sense amplifier.
[Figure: a small transition at the input is amplified by the sense amplifier (s.a.) into a full-swing output]
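A quick numeric example shows why reducing $\Delta V$ pays off. The bitline capacitance (100 fF), average cell current (20 µA), and the 1 V and 100 mV swings are hypothetical values chosen only for illustration, not figures from the lecture.

```python
def bitline_delay(C: float, dV: float, I_av: float) -> float:
    """t_p = C * dV / I_av, all in SI units."""
    return C * dV / I_av


C_BL = 100e-15      # hypothetical bitline capacitance: 100 fF
I_CELL = 20e-6      # hypothetical average bitcell read current: 20 uA

full_swing = bitline_delay(C_BL, 1.0, I_CELL)   # 1 V swing: about 5 ns
sensed = bitline_delay(C_BL, 0.1, I_CELL)       # 100 mV swing for a sense amp: about 0.5 ns
```

Because $t_p$ is linear in $\Delta V$, sensing a 10× smaller swing cuts the bitline delay by 10×.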
Differential Sense Amplifier
• A non-clocked sense amp has high static power.
• A clocked sense amp saves power:
  • It requires sense_clk to fire only after enough bitline swing has developed.
  • Isolation transistors cut off the large bitline capacitance.
The Computer Hall of Fame
• The machine that many of us got to know
during our military service:

Source: pcworld.com
• 32-bit, CISC architecture, introduced in 1977
• The VAX-11/780 was TTL-based, 5MHz, 2kB cache, reaching 1 MIPS
• Known as a “minicomputer”, even though it took up a whole room.
• VAX means “Virtual Address Extension”,
since the VAX was one of the first minicomputers to use virtual memory.
• Ran the VMS operating system.
• Many systems that were developed during the cold war
(e.g., F-15, F-18, Hawk missiles, nuclear programs) still use VAX today!