DRAM Basics by Prof. Matthew D. Sinclair

1) Die-stacked DRAM stacks multiple DRAM dies on a single logic die to increase density and bandwidth. Current examples include Hybrid Memory Cube and High Bandwidth Memory. 2) DRAM uses capacitors to store bits which requires periodic refreshing to prevent data loss from charge leakage over time. DRAM operations like reading and writing involve precharging and accessing the row buffer to avoid destructive reads. 3) DRAM has a two-level addressing scheme using row and column addresses to access a 2D array, and requires multi-step operations to perform reads and writes via the row buffer.


CS/ECE 752:

Advanced Computer Architecture I

Prof. Matthew D. Sinclair


DRAM Basics
Slide History/Attribution: UW Madison (Hill, Sohi, Smith, Wood); UPenn (Amir Roth, Milo Martin); UW Madison (Hill, Sohi, Wood, Sankaralingam, Sinclair); UCLA (Nowatzki); various universities (Asanovic, Falsafi, Hoe, Lipasti, Shen, Smith, Vijaykumar)
Die-stacked DRAM (3D-DRAM)
• Die-stacked DRAM: multiple DRAM dies are stacked on top of a bottom logic die
  • Top layers store data
  • Bottom logic layer holds the various control, access, and interface circuits
• Magic: stacking means high density, so high-b/w interposer integration is not so expensive
• Current products (figure source: AMD.com):
  • Hybrid Memory Cube (Micron)
  • High Bandwidth Memory (Samsung, AMD, and Hynix)
• Tradeoffs:
  • Basically the same latency as DRAM, but much higher bandwidth
  • More expensive, so we can’t have as much memory…
  • In a GPU: need high bandwidth all the time, but don’t need that much memory, so it can serve as the main memory
  • What about the CPU? Need huge memory, so cost is critical…
2
Emerging: Hybrid Memory Cube

• Micron proposal [Pawlowski, Hot Chips 11]


• www.hybridmemorycube.org (Now a dead link :( )
3
Hybrid Memory Cube MCM

• Micron proposal [Pawlowski, Hot Chips 11]


• www.hybridmemorycube.org (Now a dead link :( )

4
Network of DRAM

• Traditional DRAM: star topology


• HMC: mesh, etc. are feasible

5
Hybrid Memory Cube

• High-speed logic segregated in chip stack


• 3D TSVs for bandwidth
6
High Bandwidth Memory (HBM)

[Shmuel Csaba Otto Traian]

• High-speed serial links vs. 2.5D silicon interposer


• Commercialized, HBM2/HBM3 …
7
Future: Resistive memory
• PCM: store bit in phase state of material
• Alternatives:
• Memristor, STT-MRAM
• Nonvolatile
• Dense: cross-point architecture (no access device)
• Relatively fast for read
• Very slow for write (also high power)
• Write endurance often limited
• Wear leveling (also done for flash)
• Avoid redundant writes (read, cmp, write)
• Fix individual bit errors (write, read, cmp, fix)
• Lots of work on using this to augment/replace main memory
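To make the last two mitigations concrete, here is a minimal sketch (illustrative names and sizes, not from the slides) of redundant-write avoidance and write-verify over a software-modeled write-limited memory:

#include <stddef.h>
#include <stdint.h>

/* Sketch: a write-limited memory modeled as a plain byte array, with per-byte
 * write counters so the effect of redundant-write avoidance is visible. */
#define NVM_SIZE 4096
static uint8_t  nvm[NVM_SIZE];
static uint32_t writes[NVM_SIZE];        /* endurance bookkeeping */

/* Avoid redundant writes: read, compare, write only the bytes that changed. */
void nvm_store(size_t addr, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (nvm[addr + i] != data[i]) {  /* read + cmp */
            nvm[addr + i] = data[i];     /* write only if different */
            writes[addr + i]++;
        }
    }
}

/* Fix individual bit errors: write, read back, compare, rewrite bad bytes. */
void nvm_store_verified(size_t addr, const uint8_t *data, size_t len) {
    nvm_store(addr, data, len);          /* write */
    for (size_t i = 0; i < len; i++) {
        if (nvm[addr + i] != data[i]) {  /* read + cmp (a real device could have stuck bits) */
            nvm[addr + i] = data[i];     /* fix */
            writes[addr + i]++;
        }
    }
}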

8
RAM

• RAM: large storage arrays
• Basic structure
  • MxN array of bits (M N-bit words)
  • This one is 4x2
  • Bits in a word connected by a wordline
  • Bits in a position connected by a bitline
• Operation
  • Address decodes into M wordlines
  • High wordline → word on bitlines
  • Bit/bitline connection → read/write
• Access latency
  • Grows as #ports * √#bits
[Figure: 4x2 array of bit cells with an address decoder driving wordline0–wordline3, bitline0/bitline1 columns, and data outputs.]

9
SRAM

• SRAM: static RAM
• Bits as cross-coupled inverters (CCI)
  – Four transistors per bit
  – More transistors for ports
• “Static” means
  • Inverters connected to pwr/gnd
  + Bits naturally/continuously “refreshed”
• Designed for speed
[Figure: SRAM array of cross-coupled inverter cells with an address decoder and data lines.]

10
DRAM
• DRAM: dynamic RAM
• Bits as capacitors
  + Single transistors as ports
  + One transistor per bit/port
• “Dynamic” means
  • Capacitors not connected to pwr/gnd
  – Stored charge decays over time
  – Must be explicitly refreshed
• Designed for density
[Figure: DRAM bit array with an address decoder and data lines.]

11
DRAM Basics [Jacob and Wang]
• Precharge and Row Access

12
DRAM Basics, cont.
• Column Access

13
DRAM Basics, cont.
• Data Transfer

14
DRAM Operation I
• Read: similar to cache read
  • Phase I: pre-charge bitlines to 0.5 V
  • Phase II: decode address, enable wordline
    • Capacitor swings bitline voltage up (down)
    • Sense amplifier interprets swing as 1 (0)
  – Destructive read: word bits now discharged
• Write: similar to cache write
  • Phase I: decode address, enable wordline
  • Phase II: enable bitlines
    • High bitlines charge corresponding capacitors
  – What about leakage over time?
[Figure: DRAM array with address decoder, write enable, sense amps (sa), and data lines.]


15
DRAM Operation II
• Solution: add a set of D-latches (row buffer)
• Read: two steps
  • Step I: read selected word into row buffer
  • Step IIA: read row buffer out to pins
  • Step IIB: write row buffer back to selected word
  + Solves “destructive read” problem
• Write: two steps
  • Step IA: read selected word into row buffer
  • Step IB: write data into row buffer
  • Step II: write row buffer back to selected word
[Figure: DRAM array with sense amps (sa) and row-buffer D-latches (DL); the labels r-I, r/w-I, and r/w-II mark the steps on the address and data paths.]
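To make the two-step operations concrete, here is a minimal software model of one bank with a row buffer (a sketch; the array dimensions and function names are illustrative, not from the slides). The key point is that every access goes through the row buffer, and the buffer is written back to restore the destructively-read row:

#include <stdint.h>
#include <string.h>

#define ROWS 4096              /* illustrative dimensions */
#define COLS 1024              /* bytes per row           */

static uint8_t cells[ROWS][COLS];   /* the DRAM bit array         */
static uint8_t row_buf[COLS];       /* the D-latches (row buffer) */

/* Read: Step I load the row, Step IIA return the column to the pins,
 * Step IIB write the buffer back to restore the destructively-read row. */
uint8_t dram_read(int row, int col) {
    memcpy(row_buf, cells[row], COLS);   /* Step I   */
    uint8_t data = row_buf[col];         /* Step IIA */
    memcpy(cells[row], row_buf, COLS);   /* Step IIB */
    return data;
}

/* Write: Step IA load the row, Step IB merge the new data into the buffer,
 * Step II write the whole buffer back to the selected row. */
void dram_write(int row, int col, uint8_t data) {
    memcpy(row_buf, cells[row], COLS);   /* Step IA */
    row_buf[col] = data;                 /* Step IB */
    memcpy(cells[row], row_buf, COLS);   /* Step II */
}

In a real chip the restore happens electrically through the sense amplifiers while the row is open, rather than as an explicit copy, but the data movement is the same.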

16
DRAM Refresh
• DRAM periodically refreshes all contents
• Loops through all words
  • Reads word into row buffer
  • Writes row buffer back into DRAM array
• 1–2% of DRAM time occupied by refresh
[Figure: DRAM array with sense amps (sa), row-buffer latches (DL), and address/data paths.]

17
DRAM Parameters
• DRAM parameters
  • Large capacity: e.g., 64–256Mb
  • Arranged as square
    + Minimizes wire length
    + Maximizes refresh efficiency
  • Narrow data interface: 1–16 bits
    • Cheap packages → few bus pins
  • Narrow address interface: N/2 bits
    • 16Mb DRAM has a 12-bit address bus
    • How does that work?
[Figure: DRAM bit array with a row buffer and narrow address/data interfaces.]

18
Two-Level Addressing
• Two-level addressing
  • Row decoder and column muxes share the address lines
  • Two strobes (RAS, CAS) signal which part of the address is currently on the bus
[Figure: address bits [23:12] and [11:2] share the bus; RAS latches the row bits into a 12-to-4K decoder over a 4K x 4K bit array, the selected row lands in the row buffer, and CAS drives four 1K-to-1 column muxes onto the data pins.]
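A sketch of how a controller would drive the shared 12-bit address bus in two steps for the 4K x 4K example above (the pin-driving functions are placeholders, not a real API):

#include <stdint.h>

/* Placeholder pins; a real controller drives these as bus signals. */
extern void drive_address_pins(uint16_t a);   /* the shared 12-bit address bus */
extern void assert_RAS(void);
extern void assert_CAS(void);

/* Two-level addressing for the 4K x 4K example: bits [23:12] select one of
 * 4K rows, bits [11:2] select one of 1K 4-bit column groups. */
void dram_access(uint32_t addr) {
    uint16_t row = (addr >> 12) & 0xFFF;   /* [23:12], 12 bits */
    uint16_t col = (addr >> 2)  & 0x3FF;   /* [11:2],  10 bits */

    drive_address_pins(row);   /* put the row on the shared address lines */
    assert_RAS();              /* row address strobe: row -> row buffer   */

    drive_address_pins(col);   /* reuse the same lines for the column     */
    assert_CAS();              /* column address strobe: select 4 bits    */
}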

19
Access Latency and Cycle Time
• DRAM access much slower than SRAM
• More bits → longer wires
• Buffered access with two-level addressing
• SRAM access latency: <1ns
• DRAM access latency: 30–50ns

• DRAM cycle time also longer than access time


• Cycle time: time between start of consecutive accesses
• SRAM: cycle time = access time
• Begin second access as soon as first access finishes
• DRAM: cycle time = 2 * access time
• Why? Can’t begin new access while DRAM is refreshing row

20
Open v. Closed Pages
• Open Page
• Row stays active until another row needs to be accessed
• Acts as memory-level cache to reduce latency
• Variable access latency complicates memory controller
• Higher power dissipation (sense amps remain active)
• Closed Page
• Immediately deactivate row after access
• All accesses become Activate Row, Read/Write, Precharge

• Complex power v. performance trade off

21
DRAM Bandwidth
• Use multiple DRAM chips to increase bandwidth
• Recall: accesses are the same size as a second-level cache block
• Example: 16 two-byte-wide chips for a 32B access

• DRAM density increasing faster than demand


• Result: number of memory chips per system decreasing

• Need to increase the bandwidth per chip


• Especially important in game consoles
• SDRAM ➔ DDR ➔ DDR2 ➔ FBDIMM (➔ DDR3)
• Rambus - high-bandwidth memory
• Used by several game consoles

22
Synchronous DRAM (SDRAM)

RAS’

CAS’

Column add
Row add

Data Data Data


• Add Clock and Wider data!
• Also multiple transfers per RAS/CAS
23
Enhanced SDRAM & DDR
• Evolutionary Enhancements on SDRAM:
1. ESDRAM (Enhanced): Overlap row buffer access with
refresh

2. DDR (Double Data Rate): Transfer on both clock edges


3. DDR2’s small improvements
lower voltage, on-chip termination, driver calibration
prefetching, conflict buffering
4. DDR3, more small improvements
lower voltage, 2X speed, 2X prefetching,
2X banks, “fly-by topology”, automatic calibration

24
Interleaved Main Memory
• Divide memory into M banks and “interleave” addresses
across them, so word A is
• in bank (A mod M)
• at word (A div M)
         Bank 0      Bank 1      Bank 2      …    Bank n-1
         word 0      word 1      word 2      …    word n-1
         word n      word n+1    word n+2    …    word 2n-1
         word 2n     word 2n+1   word 2n+2   …    word 3n-1

PA fields: | Doubleword in bank | Bank | Word in doubleword |

Interleaved memory increases memory BW without wider bus


• Use parallelism in memory banks to hide memory latency
Copyright © 2002 Falsafi, from Hill,
Smith, Sohi, Vijaykumar, and Wood 25
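A small sketch of the interleaving function above (M = 4 here, just for illustration):

#include <stdio.h>

#define M 4   /* number of banks (illustrative) */

int main(void) {
    for (unsigned A = 0; A < 12; A++) {
        unsigned bank   = A % M;   /* word A lives in bank (A mod M)     */
        unsigned offset = A / M;   /* ... at word (A div M) in that bank */
        printf("word %2u -> bank %u, offset %u\n", A, bank, offset);
    }
    return 0;
}

In hardware M is a power of two, so “A mod M” is just the low bank bits and “A div M” is the remaining address bits.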
Block interleaved memory systems
• Cache blocks map to separate memory controllers
• Interleave across DRAMs w/i a MC
• Interleave across intra-DRAM banks w/i a DRAM
[Figure: a CPU connected over the data bus to four memory controllers (MC), each controlling multiple DRAMs; consecutive 64B blocks B, B+64, B+128, B+192 map to different MCs.]

26
Research: Processing in Memory
• Processing in memory
  • Embed some ALUs in DRAM
  • Picture is logical, not physical
• Do computation in DRAM rather than…
  • Move data from DRAM to CPU
  • Compute on CPU
  • Move data from CPU to DRAM
• Will come back to this in “vectors” unit
• E.g., IRAM: intelligent RAM
  • Berkeley research project
  • [Patterson+, ISCA’97]
• Very hot again
[Figure: DRAM bit array with row buffers and compute logic embedded next to the data path.]

27
Memory Hierarchy Review
• Storage: registers, memory, disk
• Memory is the fundamental element

• Memory component performance


• t_avg = t_hit + %_miss * t_miss
• Can’t get both low t_hit and %_miss in a single structure
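A quick worked example with illustrative numbers (not from the slides): if t_hit = 1 ns, %_miss = 5%, and t_miss = 50 ns, then

t_avg = 1 ns + 0.05 * 50 ns = 3.5 ns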

• Memory hierarchy
• Upper components: small, fast, expensive
• Lower components: big, slow, cheap
• t_avg of hierarchy is close to t_hit of upper (fastest) component
• 10/90 rule: 90% of stuff found in fastest component
• Temporal/spatial locality: automatic up-down data movement

28
Bonus

29
The DRAM Subsystem

Onur Mutlu….

Slide History/Attribution: UW Madison (Hill, Sohi, Smith, Wood); UPenn (Amir Roth, Milo Martin); UW Madison (Hill, Sohi, Wood, Sankaralingam, Sinclair); UCLA (Nowatzki); various universities (Asanovic, Falsafi, Hoe, Lipasti, Shen, Smith, Vijaykumar)
DRAM Subsystem Organization

• Channel
• DIMM
• Rank
• Chip
• Bank
• Row/Column

31
Page Mode DRAM
• A DRAM bank is a 2D array of cells: rows x columns
• A “DRAM row” is also called a “DRAM page”
• “Sense amplifiers” also called “row buffer”

• Each address is a <row,column> pair


• Access to a “closed row”
• Activate command opens row (placed into row buffer)
• Read/write command reads/writes column in the row buffer
• Precharge command closes the row and prepares the bank for next
access
• Access to an “open row”
• No need for activate command

32
DRAM Bank Operation
Access address sequence: (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 85), (Row 1, Column 0)
[Figure: a bank as a 2D array of rows and columns with a row decoder, a row buffer, and a column mux. The first access finds an empty row buffer and loads Row 0; the following accesses to Row 0, Columns 1 and 85, are row-buffer HITs; the access to Row 1 is a row-buffer CONFLICT.]

33
The DRAM Chip
• Consists of multiple banks (2-16)
• Banks share command/address/data buses
• The chip itself has a narrow interface (4-16 bits per read)

34
128M x 8-bit DRAM Chip

35
DRAM Rank and Module
• Rank: Multiple chips operated together to form a wide
interface
• All chips comprising a rank are controlled at the same time
• Respond to a single command
• Share address and command buses, but provide different data

• A DRAM module consists of one or more ranks


• E.g., DIMM (dual inline memory module)
• This is what you plug into your motherboard

• If we have chips with an 8-bit interface, to read 8 bytes in a single access, use 8 chips in a DIMM

36
A 64-bit Wide DIMM (One Rank)

[Figure: eight DRAM chips side by side, sharing the command bus and together driving the 64-bit data bus.]

37
A 64-bit Wide DIMM (One Rank)
• Advantages:
  • Acts like a high-capacity DRAM chip with a wide interface
  • Simplicity: the memory controller does not need to deal with individual chips
• Disadvantages:
  • Granularity: accesses cannot be smaller than the interface width

38
Multiple DIMMs
• Advantages:
  • Enables even higher capacity
• Disadvantages:
  • Interconnect complexity and energy consumption can be high

40
DRAM Channels

• 2 Independent Channels: 2 Memory Controllers (Above)


• 2 Dependent/Lockstep Channels: 1 Memory Controller with
wide interface (Not shown above)

42
Generalized Memory Structure

43
The DRAM Subsystem
The Top Down View

Onur Mutlu….

Slide History/Attribution: UW Madison (Hill, Sohi, Smith, Wood); UPenn (Amir Roth, Milo Martin); UW Madison (Hill, Sohi, Wood, Sankaralingam, Sinclair); UCLA (Nowatzki); various universities (Asanovic, Falsafi, Hoe, Lipasti, Shen, Smith, Vijaykumar)
DRAM Subsystem Organization

• Channel
• DIMM
• Rank
• Chip
• Bank
• Row/Column

45
The DRAM subsystem

[Figure: a processor with two memory channels, each channel connecting to a DIMM (dual in-line memory module).]

46
Breaking down a DIMM

[Figure: a DIMM (dual in-line memory module) shown in side view, front view, and back view.]

47
Breaking down a DIMM

[Figure: the same DIMM; the front side carries Rank 0 (a collection of 8 chips) and the back side carries Rank 1.]

48
Rank

[Figure: Rank 0 (front) and Rank 1 (back) each drive Data <0:63>; the two ranks share the Addr/Cmd bus on the memory channel and are selected by CS <0:1>.]
49
Breaking down a Rank

[Figure: Rank 0 is built from Chips 0–7; Chip 0 supplies Data <0:7>, Chip 1 supplies <8:15>, …, Chip 7 supplies <56:63>, together forming Data <0:63>.]

50
Breaking down a Chip

[Figure: Chip 0 contains multiple banks (Bank 0, …), all sharing the chip’s 8-bit data interface <0:7>.]

51
Breaking down a Bank
[Figure: Bank 0 is an array of 16k rows (row 0 … row 16k-1), each 2kB wide; a column is 1B. The row buffer holds one 2kB row and delivers it 1B at a time over the chip’s <0:7> interface.]

52
DRAM Subsystem Organization

• Channel
• DIMM
• Rank
• Chip
• Bank
• Row/Column

53
Example: Transferring a cache block
[Figure: the physical memory space from 0x00 to 0xFFFF…F; the 64B cache block at address 0x40 maps to Channel 0, DIMM 0, Rank 0.]
54
Example: Transferring a cache block
[Figure sequence: Rank 0 with Chips 0–7, each supplying 8 bits (<0:7> … <56:63>) of Data <0:63>. The 64B cache block at physical address 0x40 maps to Row 0 in every chip; Column 0, then Column 1, and so on are read, and each column access delivers 8B onto the bus (1B per chip).]
Example: Transferring a cache block
[Figure: after eight column accesses, the eight chips of Rank 0 have delivered the full 64B cache block, 8B at a time, over Data <0:63>.]
A 64B cache block takes 8 I/O cycles to transfer.

During the process, 8 columns are read sequentially.
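A quick sketch of the byte-lane view of this transfer (the byte-to-beat assignment below is one plausible choice, not taken from the slides): with eight chips, each with an 8-bit interface, beat b of the burst carries column b, and chip c always supplies byte lane c:

#include <stdio.h>

#define CHIPS 8    /* chips in the rank, 8-bit interface each -> 64-bit bus */
#define BLOCK 64   /* cache block size in bytes */

int main(void) {
    /* 8 beats (I/O cycles); each beat reads one column from every chip. */
    for (int beat = 0; beat < BLOCK / CHIPS; beat++) {
        printf("beat %d (column %d):", beat, beat);
        for (int chip = 0; chip < CHIPS; chip++) {
            int byte = beat * CHIPS + chip;   /* which byte of the 64B block */
            printf("  chip%d->byte%2d", chip, byte);
        }
        printf("\n");
    }
    return 0;
}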


60
Latency Components: Basic DRAM Operation
• CPU → controller transfer time
• Controller latency
• Queuing & scheduling delay at the controller
• Access converted to basic commands
• Controller → DRAM transfer time
• DRAM bank latency
• Simple CAS (column address strobe) if row is “open” OR
• RAS (row address strobe) + CAS if array precharged OR
• PRE + RAS + CAS (worst case)
• DRAM → Controller transfer time
• Bus latency (BL)
• Controller to CPU transfer time

61
Multiple Banks (Interleaving) and Channels
• Multiple banks
• Enable concurrent DRAM accesses
• Bits in address determine which bank an address resides in
• Multiple independent channels serve the same purpose
• But they are even better because they have separate data buses
• Increased bus bandwidth

• Enabling more concurrency requires reducing


• Bank conflicts
• Channel conflicts
• How to select/randomize bank/channel indices in address?
• Lower order bits have more entropy
• Randomizing hash functions (XOR of different address bits)
• Pathological cases (strided accesses with long strides)

62
How Multiple Banks Help

63
Address Mapping (Single Channel)
• Single-channel system with 8-byte memory bus
• 2GB memory, 8 banks, 16K rows & 2K columns per bank

• Row interleaving
• Consecutive rows of memory in consecutive banks

| Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits) |

• Accesses to consecutive cache blocks serviced in a pipelined manner

• Cache block interleaving


• Consecutive cache block addresses in consecutive banks
• 64 byte cache blocks

| Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Column (3 bits) | Byte in bus (3 bits) |
• Accesses to consecutive cache blocks can be serviced in parallel
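Both mappings can be written as simple bit-field extraction; a sketch using the field widths above (31-bit physical address for 2GB; the helper names are illustrative):

#include <stdint.h>

/* Row interleaving:  Row(14) | Bank(3) | Column(11) | Byte-in-bus(3) */
void map_row_interleaved(uint32_t pa, uint32_t *row, uint32_t *bank, uint32_t *col) {
    *col  = (pa >> 3)  & 0x7FF;    /* 11 bits */
    *bank = (pa >> 14) & 0x7;      /*  3 bits */
    *row  = (pa >> 17) & 0x3FFF;   /* 14 bits */
}

/* Cache-block interleaving:
 * Row(14) | High Column(8) | Bank(3) | Low Column(3) | Byte-in-bus(3) */
void map_block_interleaved(uint32_t pa, uint32_t *row, uint32_t *bank, uint32_t *col) {
    uint32_t lo_col = (pa >> 3) & 0x7;        /* 3 bits */
    *bank           = (pa >> 6) & 0x7;        /* 3 bits */
    uint32_t hi_col = (pa >> 9) & 0xFF;       /* 8 bits */
    *row            = (pa >> 17) & 0x3FFF;    /* 14 bits */
    *col            = (hi_col << 3) | lo_col; /* full 11-bit column */
}

With cache-block interleaving, addresses 64B apart differ in the bank bits, which is why consecutive cache blocks can be serviced in parallel.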
64
Bank Mapping Randomization
• DRAM controller can randomize the address mapping to
banks so that bank conflicts are less likely

[Figure: the 3 bank bits sitting above Column (11 bits) and Byte in bus (3 bits) are XORed with another 3-bit address field to produce the Bank index (3 bits).]
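A sketch of the XOR hash, building on the row-interleaved layout from the previous slide (exactly which higher-order bits are folded into the bank index is a design choice; the bit positions here are illustrative):

#include <stdint.h>

/* Bank index = (plain bank bits) XOR (some higher-order address bits), so
 * strided access patterns spread across banks instead of colliding. */
uint32_t bank_index(uint32_t pa) {
    uint32_t bank_bits = (pa >> 14) & 0x7;   /* the 3 bits above the column   */
    uint32_t row_bits  = (pa >> 17) & 0x7;   /* illustrative: low 3 row bits  */
    return bank_bits ^ row_bits;
}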

65
DRAM Refresh (I)
• DRAM capacitor charge leaks over time
• The memory controller needs to read each row periodically
to restore the charge
• Activate + precharge each row every N ms
• Typical N = 64 ms
• Implications on performance?
-- DRAM bank unavailable while refreshed
-- Long pause times: If we refresh all rows in burst, every 64ms the
DRAM will be unavailable until refresh ends
• Burst refresh: All rows refreshed immediately after one
another
• Distributed refresh: Each row refreshed at a different time,
at regular intervals
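A quick sanity check (assuming 8,192 rows per bank, a common figure not given on the slide): distributed refresh must issue one row refresh about every 64 ms / 8192 ≈ 7.8 µs, which matches the standard tREFI interval of DDR3/DDR4 parts.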

68
DRAM Refresh (II)

• Distributed refresh eliminates long pause times


• How else we can reduce the effect of refresh on
performance?
• Can we reduce the number of refreshes?

69
Downsides of DRAM Refresh
-- Energy consumption: Each refresh consumes energy
-- Performance degradation: DRAM rank/bank unavailable while
refreshed
-- QoS/predictability impact: (Long) pause times during refresh
-- Refresh rate limits DRAM density scaling

Liu et al., “RAIDR: Retention-aware Intelligent DRAM Refresh,” ISCA 2012.

70
Memory Controllers

Onur Mutlu….

Slide History/Attribution: UW Madison (Hill, Sohi, Smith, Wood); UPenn (Amir Roth, Milo Martin); UW Madison (Hill, Sohi, Wood, Sankaralingam, Sinclair); UCLA (Nowatzki); various universities (Asanovic, Falsafi, Hoe, Lipasti, Shen, Smith, Vijaykumar)
DRAM versus Other Types of Memories
• Long latency memories have similar characteristics that
need to be controlled.

• The following discussion will use DRAM as an example, but


many issues are similar in the design of controllers for
other types of memories
• Flash memory
• Other emerging memory technologies
• Phase Change Memory
• Spin-Transfer Torque Magnetic Memory

72
DRAM Types
• DRAM has different types with different interfaces
optimized for different purposes
• Commodity: DDR, DDR2, DDR3, DDR4
• Low power (for mobile): LPDDR[1-5]
• High bandwidth (for graphics): GDDR[1-5]
• Low latency: eDRAM, RLDRAM, …
• 3D stacked: HBM, HMC,…
• Underlying microarchitecture is fundamentally the same
• A flexible memory controller can support various DRAM
types, but…
• This complicates the memory controller
• Difficult to support all types (and upgrades)

73
DRAM Controller: Functions
• Ensure correct operation of DRAM (refresh and timing)

• Service DRAM requests while obeying timing constraints of


DRAM chips
• Constraints: resource conflicts (bank, bus, channel), minimum
write-to-read delays
• Translate requests to DRAM command sequences

• Buffer and schedule requests to improve performance


• Reordering, row-buffer, bank, rank, bus management

• Manage power consumption and thermals in DRAM


• Turn on/off DRAM chips, manage power modes

74
DRAM Controller: Where to Place
• In chipset
+ More flexibility to plug different DRAM types into the system
+ Less power density in the CPU chip

• On CPU chip
+ Reduced latency for main memory access
+ Higher bandwidth between cores and controller
• More information can be communicated (e.g. request’s
importance in the processing core)

75
A Modern DRAM Controller

76
DRAM Scheduling Policies (I)
• FCFS (first come first served)
• Oldest request first

• FR-FCFS (first ready, first come first served)


1. Row-hit first
2. Oldest first
Goal: Maximize row buffer hit rate → maximize DRAM throughput

• Actually, scheduling is done at the command level


• Column commands (read/write) prioritized over row commands
(activate/precharge)
• Within each group, older commands prioritized over younger ones
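A minimal sketch of FR-FCFS request selection (the request and bank-state structures are made up for illustration; as noted above, a real controller applies the same priorities at the command level):

#include <stdint.h>

typedef struct {
    int      bank;
    int      row;
    uint64_t arrival_time;   /* smaller = older */
} Request;

/* FR-FCFS: among pending requests, prefer a row-buffer hit, breaking ties by
 * age; if there is no hit, pick the oldest request. Returns an index or -1. */
int frfcfs_pick(const Request *q, int n, const int *open_row /* per bank */) {
    int best = -1, best_hit = 0;
    for (int i = 0; i < n; i++) {
        int hit = (q[i].row == open_row[q[i].bank]);
        if (best < 0 ||
            (hit && !best_hit) ||                          /* 1. row-hit first */
            (hit == best_hit &&
             q[i].arrival_time < q[best].arrival_time)) {  /* 2. oldest first  */
            best = i;
            best_hit = hit;
        }
    }
    return best;
}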

77
DRAM Scheduling Policies (II)
• A scheduling policy is essentially a prioritization order

• Prioritization can be based on


• Request age
• Row buffer hit/miss status
• Request type (prefetch, read, write)
• Requestor type (load miss or store miss)
• Request criticality
• Oldest miss in the core?
• How many instructions in core are dependent on it?

78
Row Buffer Management Policies
• Open row
• Keep the row open after an access
+ Next access might need the same row → row hit
-- Next access might need a different row → row conflict, wasted energy

• Closed row
• Close the row after an access (if no other requests already in the request
buffer need the same row)
+ Next access might need a different row → avoid a row conflict
-- Next access might need the same row → extra activate latency

• Adaptive policies
• Predict whether or not the next access to the bank will be to the
same row

79
Open vs. Closed Row Policies

Policy      | First access | Next access                                        | Commands needed for next access
Open row    | Row 0        | Row 0 (row hit)                                    | Read
Open row    | Row 0        | Row 1 (row conflict)                               | Precharge + Activate Row 1 + Read
Closed row  | Row 0        | Row 0 – access in request buffer (row hit)         | Read
Closed row  | Row 0        | Row 0 – access not in request buffer (row closed)  | Activate Row 0 + Read + Precharge
Closed row  | Row 0        | Row 1 (row closed)                                 | Activate Row 1 + Read + Precharge
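The table maps directly onto a small decision function; a sketch with illustrative names (a real controller issues these as DRAM commands rather than printing them):

#include <stdio.h>

typedef enum { OPEN_ROW_POLICY, CLOSED_ROW_POLICY } Policy;

/* Commands the controller issues for the next access, given the policy and
 * whether the requested row is (still) in the row buffer. Mirrors the table. */
void commands_for_access(Policy p, int open_row, int req_row) {
    if (p == OPEN_ROW_POLICY) {
        if (open_row == req_row)
            printf("Read\n");                                             /* row hit      */
        else
            printf("Precharge + Activate Row %d + Read\n", req_row);      /* row conflict */
    } else {
        /* Closed-row policy: the row was precharged after the last access
         * unless a pending request to the same row kept it open. */
        if (open_row == req_row)
            printf("Read\n");                                             /* row hit      */
        else
            printf("Activate Row %d + Read + Precharge\n", req_row);      /* row closed   */
    }
}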

80
Why are DRAM Controllers Difficult to Design?
• Need to obey DRAM timing constraints for correctness
• There are many (50+) timing constraints in DRAM
• tWTR: Minimum number of cycles to wait before issuing a read command after a write command is issued (rank-level constraint – the data bus must change direction)
• tRC: Minimum number of cycles between the issuing of two
consecutive activate commands to the same bank
• …
• Need to keep track of many resources to prevent conflicts
• Channels, banks, ranks, data bus, address bus, row buffers
• Need to handle DRAM refresh
• Need to optimize for performance & QoS (in the presence of constraints)
• Reordering is not simple
• Fairness and QoS needs complicate the scheduling problem

81
Many DRAM Timing Constraints

• From Lee et al., “DRAM-Aware Last-Level Cache Writeback: Reducing


Write-Caused Interference in Memory Systems,” HPS Technical Report,
April 2010.

82
2D Packaging

Board level System on Chip

[M. Maxfield, “2D vs. 2.5D vs. 3D ICs 101,” EE Times, April
2012]
Conventional packaging approaches
93
2D Packaging

[M. Maxfield, “2D vs. 2.5D vs. 3D ICs 101,” EE Times, April
2012]
Move toward System in Package (SIP)
• PCB, ceramic, semiconductor substrates
94
2.5D Packaging

2D Packaging 2.5D Packaging

[M. Maxfield, “2D vs. 2.5D vs. 3D ICs 101,” EE Times, April
2012]
2.5D uses a silicon interposer and through-silicon vias (TSVs)
95
3D Packaging

3D Homogeneous 3D Heterogeneous

[M. Maxfield, “2D vs. 2.5D vs. 3D ICs 101,” EE Times, April
2012]
3D uses through-silicon vias (TSVs) and/or an interposer
96
Packaging Discussion

• Heterogeneous integration
• RF, analog (PHY), FG/PCM/ReRAM, photonics
• Cost
• Silicon yield
• Bandwidth, esp. interposer
• Thermals
• It’s real!
• DRAM: HMC, HBM
• FPGAs
• GPUs: AMD, NVIDIA
• CPUs: AMD Zen, EPYC
97
Brief History of DRAM
• DRAM (memory): a major force behind computer industry
• Modern DRAM came with introduction of IC (1970)
• Preceded by magnetic “core” memory (1950s)
• Each cell was a small magnetic “donut”
• And by mercury delay lines before that (e.g., EDSAC)
• Re-circulating vibrations in mercury tubes

“the one single development that put computers on their feet was the
invention of a reliable form of memory, namely the core memory… Its
cost was reasonable, it was reliable, and because it was reliable it
could in due course be made large”
Maurice Wilkes
Memoirs of a Computer Pioneer, 1985

98
DRAM Basics [Jacob and Wang]
• Precharge and Row Access

99
DRAM Basics, cont.
• Column Access

100
DRAM Basics, cont.
• Data Transfer

101
Open v. Closed Pages
• Open Page
• Row stays active until another row needs to be accessed
• Acts as memory-level cache to reduce latency
• Variable access latency complicates memory controller
• Higher power dissipation (sense amps remain active)
• Closed Page
• Immediately deactivate row after access
• All accesses become Activate Row, Read/Write, Precharge

• Complex power v. performance trade off

102
DRAM Bandwidth
• Use multiple DRAM chips to increase bandwidth
• Recall: accesses are the same size as a second-level cache block
• Example: 16 two-byte-wide chips for a 32B access

• DRAM density increasing faster than demand


• Result: number of memory chips per system decreasing

• Need to increase the bandwidth per chip


• Especially important in game consoles
• SDRAM ➔ DDR ➔ DDR2 ➔ FBDIMM (➔ DDR3)
• Rambus - high-bandwidth memory
• Used by several game consoles

103
DRAM Evolution
• Survey by Cuppu et al.
1. Early Asynchronous Interface
2. Fast Page Mode/Nibble Mode/Static Column (skip)
3. Extended Data Out
4. Synchronous DRAM & Double Data Rate
5. Rambus & Direct Rambus
6. FB-DIMM

104
Old 64 Mbit DRAM Example from Micron
Clock Recovery

105
Extended Data Out (EDO)

RAS’

CAS’

Row add Column add Column add Column add

Data Data Data

• Similar to Fast Page Mode


• But overlapped Column Address assert with Data Out
106
Synchronous DRAM (SDRAM)

RAS’

CAS’

Column add
Row add

Data Data Data


• Add Clock and Wider data!
• Also multiple transfers per RAS/CAS
107
Enhanced SDRAM & DDR
• Evolutionary Enhancements on SDRAM:
1. ESDRAM (Enhanced): Overlap row buffer access with
refresh

2. DDR (Double Data Rate): Transfer on both clock edges


3. DDR2’s small improvements
lower voltage, on-chip termination, driver calibration
prefetching, conflict buffering
4. DDR3, more small improvements
lower voltage, 2X speed, 2X prefetching,
2X banks, “fly-by topology”, automatic calibration

108
Wide v. Narrow Interfaces
• High frequency ➔ short wavelength ➔ data skew issues
• Balance wire lengths

DDR-2 serpentine board routing FB-DIMM board routing

109
Rambus RDRAM
• High-frequency, narrow channel
• Time multiplexed “bus” ➔ dynamic point-to-point channels
• ~40 pins ➔ 1.6GB/s
• Proprietary solution
• Never gained industry-wide acceptance (cost and power)
• Used in some game consoles (e.g., PS2)

[Figure: a CPU/memory controller connected to a chain of RDRAM devices over a 16-bit data bus at 800 MHz, with from_clock/to_clock signals running along the channel.]

110
FB-DIMM

111
DRAM Reliability
• One last thing about DRAM technology… errors
• DRAM bits can flip from 0➔1 or 1➔0
• Small charge stored per bit
• Energetic -particle strikes disrupt stored charge
• Many more bits
• Modern DRAM systems: built-in error detection/correction
• Today all servers; most new desktop and laptops
• Key idea: checksum-style redundancy
• Main DRAM chips store data, additional chips store f(data)
• |f(data)| < |data|
• On read: re-compute f(data), compare with stored f(data)
• Different ? Error…
• Option I (detect): kill program
• Option II (correct): enough information to fix error? fix and go on

112
DRAM Error Detection and Correction
[Figure: four 4M x 2B data chips (0, 1, 2, 3) plus one 4M x 2B check chip (f), sharing address, data, and error signals.]

• Performed by memory controller (not the DRAM chip)


• Error detection/correction schemes distinguished by…
• How many (simultaneous) errors they can detect
• How many (simultaneous) errors they can correct
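A toy version of the f(data) idea, using byte-wise parity so it can only detect (not correct) errors; real DIMMs use SECDED ECC codes that can also correct single-bit errors:

#include <stdint.h>

/* f(data): byte-wise XOR parity over the 8 data bytes of a 64-bit word,
 * so |f(data)| = 8 bits < |data| = 64 bits. */
uint8_t f(uint64_t data) {
    uint8_t p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (uint8_t)(data >> (8 * i));
    return p;
}

/* On read: re-compute f(data) and compare with the stored check byte.
 * Returns 1 if an error is detected (this toy code cannot locate or fix it). */
int check_on_read(uint64_t data, uint8_t stored_f) {
    return f(data) != stored_f;
}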

113
Interleaved Main Memory
• Divide memory into M banks and “interleave” addresses
across them, so word A is
• in bank (A mod M)
• at word (A div M)
         Bank 0      Bank 1      Bank 2      …    Bank n-1
         word 0      word 1      word 2      …    word n-1
         word n      word n+1    word n+2    …    word 2n-1
         word 2n     word 2n+1   word 2n+2   …    word 3n-1

PA fields: | Doubleword in bank | Bank | Word in doubleword |

Interleaved memory increases memory BW without wider bus


• Use parallelism in memory banks to hide memory latency
Copyright © 2002 Falsafi, from Hill,
Smith, Sohi, Vijaykumar, and Wood 114
Block interleaved memory systems
• Cache blocks map to separate memory controllers
• Interleave across DRAMs w/i a MC
• Interleave across intra-DRAM banks w/i a DRAM
[Figure: a CPU connected over the data bus to four memory controllers (MC), each controlling multiple DRAMs; consecutive 64B blocks B, B+64, B+128, B+192 map to different MCs.]

115
Memory Hierarchy Review
• Storage: registers, memory, disk
• Memory is the fundamental element

• Memory component performance


• t_avg = t_hit + %_miss * t_miss
• Can’t get both low t_hit and %_miss in a single structure

• Memory hierarchy
• Upper components: small, fast, expensive
• Lower components: big, slow, cheap
• t_avg of hierarchy is close to t_hit of upper (fastest) component
• 10/90 rule: 90% of stuff found in fastest component
• Temporal/spatial locality: automatic up-down data movement

116
Software Managed Memory
• Isn’t full associativity difficult to implement?
• Yes … in hardware
• Implement fully associative memory in software

• Let’s take a step back…

117
