The Memory System: Fundamental Concepts

Basic Concepts
Memory Addressing
The maximum size of the main memory (MM) that can be used in any computer is determined by its addressing scheme.
The addressing capability of the system depends on the number of address lines it has.
Ex: 2^20 = 1M memory locations; 2^32 = 4G memory locations.
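The relationship between the number of address lines and the number of addressable locations can be checked with a quick sketch:

```python
# Number of addressable memory locations for a k-bit address.
def addressable_locations(k: int) -> int:
    return 2 ** k

print(addressable_locations(20))  # 1048576 (1M locations)
print(addressable_locations(32))  # 4294967296 (4G locations)
```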
[Figure: byte and word addressing — word addresses 0, 4, 8, …; the word at address 0 contains bytes 0–3, the word at 4 contains bytes 4–7, the word at 8 contains bytes 8–11, and so on.]
[Figure: connection of the memory to the processor — the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus to a memory of up to 2^k addressable locations with word length n bits; control lines carry R/W, MFC, etc.]
Basic Concepts (contd.)
Measures of the speed of a memory: memory access time and memory cycle time.
Virtual Memory
In a virtual memory system, the address generated by the CPU is referred to as a virtual or logical address. The corresponding physical address can be different, and the required mapping is implemented by a special memory control unit, often called the memory management unit (MMU).
Internal Organization of a Memory Chip

[Figure: organization of bit cells in a 16 × 8 memory chip — address lines A0–A3 feed an address decoder that drives word lines W0–W15; each word line selects a row of memory cells (flip-flops), which connect through Sense/Write circuits to data input/output lines b0–b7; control inputs are R/W and CS.]
The chip stores 16 × 8 = 128 bits. It requires 14 external connections for address (4), data (8), and control (2) lines, plus two more for the power supply and ground connections — a total of 16 connections.
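The connection count works out as a quick arithmetic check:

```python
# External connections for the 128-bit (16 x 8) chip described above.
address_lines = 4    # 2**4 = 16 word locations
data_lines = 8       # one 8-bit word per access
control_lines = 2    # R/W and CS
signal_pins = address_lines + data_lines + control_lines
print(signal_pins)       # 14 address, data, and control lines
print(signal_pins + 2)   # 16 total, adding power supply and ground
```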
1K × 1 Memory Chip Organization

A 10-bit address is required, 5 bits each forming the row and column addresses.
A row address selects a row of 32 cells, all of which are accessed in parallel.
One of these cells, selected by the column address, is connected to the external data line by the input demultiplexer and output multiplexer.
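The address split can be sketched as a bit-field extraction, assuming the upper 5 bits form the row address:

```python
# Split the 10-bit address of the 1K x 1 chip into 5-bit row and
# 5-bit column parts (assuming the upper bits select the row).
def split_address(addr: int) -> tuple[int, int]:
    row = (addr >> 5) & 0b11111   # selects one of 32 rows
    col = addr & 0b11111          # selects one of 32 columns
    return row, col

print(split_address(0b1011001101))  # (22, 13)
```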
[Figure: organization of a 1K × 1 memory chip — the 10-bit address splits into two 5-bit parts; a 5-bit decoder selects a row of the 32 × 32 memory cell array, whose Sense/Write circuits feed two 32-to-1 multiplexers selected by the 5-bit column address; control inputs are R/W and CS.]
SRAM Cell

[Figure: a static RAM cell — the cell connects to the bit lines b and b′ through transistors T1 and T2, which act as switches controlled by the word line.]
Read Operation

The word line is activated to close switches T1 and T2.
If the cell is in state 1, the signal on bit line b is high and the signal on b′ is low.
Sense/Write circuits at the ends of the bit lines monitor the state of b and b′ and set the output accordingly.
Write Operation

The state of the cell is set by placing the appropriate value on bit line b and its complement on b′, and then activating the word line.
This forces the cell into the corresponding state.
CMOS (Complementary Metal-Oxide-Semiconductor) Memory Cell
Asynchronous DRAMs
[Figure: internal organization of a 16-megabit DRAM chip configured as a 4096 × (512 × 8) cell array — address lines A20-9/A8-0 are multiplexed; RAS latches the row address into the row address latch feeding the row decoder, and CAS latches the column address into the column address latch feeding the column decoder; Sense/Write circuits connect to data lines D7–D0; control inputs include CS and R/W.]
A 16-megabit DRAM can be organized as a 4K × 4K array. Each row can store 512 bytes: 12 bits select a row, and 9 bits select a group (byte) within a row, for a total of 21 bits.
First the row address is applied, and the RAS signal latches it; then the column address is applied, and the CAS signal latches it.
The timing of the memory unit is controlled asynchronously — this is an asynchronous DRAM.
A specialized memory controller circuit provides the necessary control signals, RAS and CAS, that govern the timing.
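The two-step multiplexed addressing can be sketched as follows (the exact bit layout is an assumption for illustration):

```python
# 21-bit address of the 4K x 4K DRAM, presented on shared pins in two
# steps: a 12-bit row address latched by RAS, then a 9-bit column
# address latched by CAS.
def multiplexed_address(addr21: int) -> tuple[int, int]:
    row = (addr21 >> 9) & 0xFFF   # 12 bits -> one of 4096 rows
    col = addr21 & 0x1FF          # 9 bits  -> one of 512 bytes in the row
    return row, col

print(multiplexed_address(0x1FFFFF))  # (4095, 511), the last byte on chip
```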
Synchronous DRAMs
[Figure: synchronous DRAM — adds a refresh counter, row address latch, row decoder, column address counter, column decoder, and a mode register with timing control, all driven by a clock; inputs include the multiplexed Row/Column address, RAS, CAS, R/W, and CS.]
Timing Diagram

[Figure: timing diagram for a burst read of length 4.]
Latency, Bandwidth
Memory latency is the time it takes to transfer a word of data to or from memory.
Memory bandwidth is the number of bits or bytes that can be transferred in one second.
[Figure: organization of a memory module using 512K × 8 memory chips — of the 21-bit address, bits A20 and A19 drive a 2-bit decoder that generates the Chip-select signals, while the 19-bit internal chip address goes to every chip; each chip has an 8-bit data input/output, supplying one byte of the wide data word (D31–24, D23–16, D15–8, …).]
Memory controller
Recall that in a dynamic memory chip, multiplexed addresses are used to reduce the number of pins. The address is divided into two parts:
[Figure: use of a memory controller — the processor sends the Address, R/W, Request, and Clock signals to the memory controller, which generates the multiplexed address, RAS, CAS, R/W, CS, and Clock for the memory; Data passes directly between the processor and the memory.]
Refresh overhead
DRAMs have to be refreshed periodically.
Older DRAMs used a refresh period of 16 ms; in SDRAMs (synchronous) the period is 64 ms.
In an SDRAM with 8K (8192) rows, if it takes 4 clock cycles to access each row, refreshing all the rows takes 8192 × 4 = 32,768 cycles.
If the clock rate is 133 MHz, the time needed to refresh all the rows is 32,768 / (133 × 10^6) = 246 × 10^-6 s.
The refreshing process thus occupies 0.246 ms in each 64-ms interval.
The refresh overhead is 0.246/64 = 0.0038 — less than 0.4% of the total time available for accessing the memory.
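The overhead calculation above, reproduced as a check:

```python
# Refresh overhead of an SDRAM with 8192 rows at a 133 MHz clock.
rows = 8192
cycles_per_row = 4
clock_hz = 133e6
refresh_period_s = 64e-3

refresh_cycles = rows * cycles_per_row        # 32768 cycles
refresh_time_s = refresh_cycles / clock_hz    # about 246 microseconds
overhead = refresh_time_s / refresh_period_s  # about 0.0038
print(refresh_cycles, refresh_time_s, overhead)
```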
Rambus memory
The performance of a dynamic memory is characterized by its latency and bandwidth.
All chips use a similar organization for their cell array, so latency will be the same if the chips are produced using the same manufacturing process.
Bandwidth, however, depends not only on the memory but also on the nature of the connecting path to the processor.
One way to increase the amount of data that can be transferred on a speed-limited bus is to provide more data lines, i.e., to widen the bus.
RAMBUS
A very wide bus is expensive and requires a lot of space on the motherboard.
An alternative approach is to implement a narrow bus that is much faster — Rambus.
Its key feature is a fast signaling method used to transfer information between chips.
Instead of using signals with voltage levels of either 0 or V_supply to represent the logic values, the signals consist of much smaller voltage swings around a reference voltage V_ref (about 2 V): the two logic values are represented by 0.3 V swings above and below V_ref — differential signaling.
RAMBUS Communication
[Figure: a ROM cell — the bit line connects to ground through a transistor T controlled by the word line; point P is connected to store a 0.]
Read-Only Memories (contd.)

Programmable Read-Only Memory (PROM)
Erasable Programmable Read-Only Memory (EPROM)
Electrically Erasable Programmable Read-Only Memory (EEPROM)
Read-Only Memories (contd.)

Flash memory:
Uses an approach similar to EEPROM.
One can read the contents of a single cell, but must write the contents of an entire block of cells.
Flash devices have greater density, hence higher capacity and lower storage cost per bit.
Dynamic RAM:
Simpler basic cell circuit, hence much less expensive, but significantly slower than SRAMs.
Magnetic disks:
The storage capacity provided by DRAMs is higher than that of SRAMs, but still less than what is necessary.
Secondary storage such as magnetic disks provides a large amount of storage, but is much slower than DRAMs.
Memory Hierarchy
[Figure: memory hierarchy — processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; size increases moving down the hierarchy.]
Cache Memories
The processor is much faster than the main memory. As a result, the processor has to spend much of its time waiting while instructions and data are fetched from the main memory — a major obstacle to achieving good performance.
Locality of Reference
Analysis of programs indicates that many
instructions in localized areas of a program
are executed repeatedly during some period
of time, while the others are accessed
relatively less frequently.
These instructions may be the ones in a loop, nested loop or few procedures
calling each other repeatedly.
This is called locality of reference.
[Figure: use of a cache memory — the cache is placed between the processor and the main memory.]
Cache hit

The existence of a cache is transparent to the processor: the processor issues Read and Write requests in the same manner. If the data is in the cache, the access is called a Read or Write hit.
Read hit: the data is obtained from the cache.
Write hit: the cache has a replica of the contents of the main memory. The contents of the cache and the main memory may be updated simultaneously — this is the write-through protocol. Alternatively, update only the contents of the cache and mark it as updated by setting a bit known as the dirty bit or modified bit; the contents of the main memory are updated when this block is replaced. This is the write-back or copy-back protocol.
Cache miss

If the data is not present in the cache, a Read miss or Write miss occurs.
Read miss: the block of words containing the requested word is transferred from the memory. After the block is transferred, the desired word is forwarded to the processor. The desired word may also be forwarded to the processor as soon as it arrives, without waiting for the entire block to be transferred — this is called load-through or early restart.
Write miss: if the write-through protocol is used, the contents of the main memory are updated directly. If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and the desired word is then overwritten with the new information.
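The two write policies can be contrasted with a toy sketch — a hypothetical word-granularity "cache", not the block-based organization described above:

```python
# Toy contrast of write-through vs. write-back on a write hit.
class Cache:
    def __init__(self, memory, write_back=False):
        self.memory = memory      # backing main memory (a dict)
        self.write_back = write_back
        self.data = {}            # cached copies
        self.dirty = set()        # dirty/modified bits (write-back only)

    def write(self, addr, value):
        self.data[addr] = value
        if self.write_back:
            self.dirty.add(addr)          # defer the memory update
        else:
            self.memory[addr] = value     # write-through: update now

    def evict(self, addr):
        if addr in self.dirty:            # write back only if modified
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)

mem = {0x10: 1}
wb = Cache(mem, write_back=True)
wb.write(0x10, 99)
print(mem[0x10])   # 1: main memory is stale until replacement
wb.evict(0x10)
print(mem[0x10])   # 99: updated on write-back
```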
Mapping functions
Mapping functions determine how
memory blocks are placed in the cache.
A simple processor example:
Direct mapping

[Figure: direct-mapped cache — each of main memory blocks 0–4095 maps to exactly one of cache blocks 0–127, and a tag is stored with each cache block; the main memory address is divided into Tag, Block, and Word fields.]
Associative mapping

[Figure: associative-mapped cache — any of main memory blocks 0–4095 may be placed in any of cache blocks 0–127; the main memory address is divided into a 12-bit Tag and a 4-bit Word field.]
Set-Associative mapping

[Figure: set-associative-mapped cache with two blocks per set — main memory blocks 0–4095 map into the 64 sets of the 128-block cache; the main memory address is divided into a 6-bit Tag, a 6-bit Set, and a 4-bit Word field.]
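The three address decompositions in the figures above can be sketched as bit-field extractions; the 5-bit direct-mapped tag is implied by 4096 memory blocks divided among 128 cache blocks:

```python
# Bit-field extraction for the three mappings, with a 16-bit main
# memory address (4096 blocks of 16 words each).
def direct(addr):       # 5-bit tag, 7-bit block, 4-bit word
    return addr >> 11, (addr >> 4) & 0x7F, addr & 0xF

def associative(addr):  # 12-bit tag, 4-bit word
    return addr >> 4, addr & 0xF

def set_assoc(addr):    # 6-bit tag, 6-bit set, 4-bit word
    return addr >> 10, (addr >> 4) & 0x3F, addr & 0xF

addr = 0xFFFF           # highest word of the 64K-word memory
print(direct(addr))       # (31, 127, 15)
print(associative(addr))  # (4095, 15)
print(set_assoc(addr))    # (63, 63, 15)
```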
Replacement algorithm

When there is a cache miss and the cache is full, space is required for the new block, so an existing block must be replaced.
Replacement algorithm — FIFO: the block that entered the cache first is replaced first.
Performance considerations
A key design objective of a computer system is to
achieve the best possible performance at the
lowest possible cost.
Price/performance ratio is a common measure of
success.
Interleaving
Divides the memory system into a number of
memory modules. Each module has its own address buffer register
(ABR) and data buffer register (DBR).
[Figure: addressing multiple-module memory systems — (a) consecutive words in a module: the high-order k bits of the MM address select a module (0 … n−1) and the remaining m bits select a word within the module; (b) consecutive words in consecutive modules (interleaving): the low-order k bits of the MM address select one of the 2^k modules and the high-order m bits select the word within the module. Each module has its own ABR and DBR.]
Effect of interleaving

Consider the time needed to transfer a block of data from MM to the cache on a read miss, with an 8-word cache block (the whole block is copied to the cache).
Hardware properties: 1 clock cycle is needed to send an address to MM. The MM is built from slow DRAMs that allow the first word to be accessed in 8 cycles and each subsequent word in 4 cycles; 1 clock cycle is then needed to send a word to the cache.
With a single module, the time to load the desired block into the cache is 1 + 8 + (7 × 4) + 1 = 38 cycles.
With 4 modules

When the starting address of the block arrives at the memory, all 4 modules begin accessing the required data. After 8 cycles, each module has one word of data in its DBR. During the next 4 cycles these words are transferred to the cache one word at a time, and during this time the next word in each module is accessed. It then takes 4 more cycles to transfer these words to the cache.
So the total time with interleaved memory is 1 + 8 + 4 + 4 = 17 cycles. Interleaving reduces the block transfer time by more than a factor of 2.
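The two cycle counts can be reproduced directly:

```python
# Cycle counts for an 8-word block transfer on a read miss.
def single_module_cycles(block=8, addr=1, first=8, nxt=4, xfer=1):
    # send address + access first word + access remaining words
    # + send the last word to the cache
    return addr + first + (block - 1) * nxt + xfer

def four_module_cycles():
    # 1 (address) + 8 (all modules access in parallel)
    # + 4 (transfer first four words while the next ones are accessed)
    # + 4 (transfer the last four words)
    return 1 + 8 + 4 + 4

print(single_module_cycles())  # 38
print(four_module_cycles())    # 17
```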
The average access time seen by the processor with two cache levels is
t_ave = h1·c1 + (1 − h1)·h2·c2 + (1 − h1)(1 − h2)·M
where h1 and h2 are the hit rates in the L1 and L2 caches, c1 and c2 are their access times, and M is the time to access a block in the main memory.
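With hypothetical hit rates and access times, the formula can be evaluated as follows:

```python
# Average memory access time with two cache levels.
def t_ave(h1, c1, h2, c2, M):
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * M

# Hypothetical numbers: 95% L1 hits at 1 cycle, 90% L2 hits at
# 10 cycles, 100 cycles to reach main memory.
print(t_ave(0.95, 1, 0.90, 10, 100))  # about 1.9 cycles
```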
Other Performance Enhancements
Several other possibilities exist for
enhancing performance
Write buffer
Prefetching
Lockup-Free cache
Write-through:
Each write operation involves writing to the main memory. If the processor has to wait for the write operation to complete, it is slowed down. Since the processor does not depend on the result of the write operation, a write buffer can be included for temporary storage of write requests: the processor places each write request into the buffer and continues execution. If a subsequent Read request references data that is still in the write buffer, the data is read from the write buffer.
Write-back:
A block is written back to the main memory when it is replaced. If the processor had to wait for this write to complete before reading the new block, it would be slowed down. A fast write buffer can hold the block to be written, so the new block can be read first.
Virtual memories
Recall that an important challenge in the
design of a computer system is to provide a
large, fast memory system at an affordable
cost.
There are architectural solutions to increase the effective speed and size of the memory system. Cache memories were developed to increase the effective speed of the memory system; virtual memory is an architectural solution to increase its effective size.
[Figure: virtual memory organization — data moves between disk storage and the main memory by DMA transfer, and between the main memory and the cache.]
Address translation
Assume that program and data are composed
of fixed-length units called pages.
A page consists of a block of words that occupy
contiguous locations in the main memory.
Page is a basic unit of information that is
transferred between secondary storage and
main memory.
Size of a page commonly ranges from 2K to
16K bytes.
Pages should not be too small, because the access time of a secondary storage
device is much larger than the main memory.
Pages should not be too large, else a large portion of the page may not be used,
and it will occupy valuable space in the main memory.
Virtual memory:
Introduced to bridge the speed gap between the main memory
and secondary storage.
Implemented in part by software.
[Figure: virtual-memory address translation — the virtual address from the processor is interpreted as a virtual page number plus an offset; the page table base register (PTBR) added to the virtual page number gives the location of the corresponding entry in the page table; that entry supplies the page frame in memory, which is combined with the offset to form the physical address in main memory.]
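A minimal sketch of this translation, assuming 4K-byte pages (a 12-bit offset) and a page table stored as a simple list with hypothetical frame values:

```python
# Virtual-to-physical address translation with 4K pages (assumed size).
PAGE_BITS = 12

def translate(virtual_addr, page_table):
    vpn = virtual_addr >> PAGE_BITS              # virtual page number
    offset = virtual_addr & ((1 << PAGE_BITS) - 1)
    frame = page_table[vpn]                      # page frame from the table
    return (frame << PAGE_BITS) | offset

page_table = [7, 3, 12]                          # hypothetical mapping
print(hex(translate(0x1ABC, page_table)))        # page 1 -> frame 3: 0x3abc
```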
Associative-mapped TLB

[Figure: use of an associative-mapped TLB — the virtual page number is compared with all TLB entries; if a match is found (hit), the page frame from that entry is combined with the offset to form the physical address in main memory; otherwise (miss), the page table must be consulted. Each TLB entry holds a virtual page number, control bits, and the page frame in memory.]
Disk
Disk drive
Disk controller
Organization of Data on a Disk

[Figure 5.30: organization of one surface of a disk — tracks (track 0 … track n) divided into sectors; e.g., sector 0 of track 0, sector 0 of track 1, sector 3 of track n.]
Disk Controller
[Figure 5.31: disks connected to the system bus — the processor and main memory attach to the system bus, along with a disk controller serving two disk drives.]
Optical Disks

[Figure: (a) cross-section of an optical disk — layers of polycarbonate plastic, aluminum, acrylic, and label; the surface carries pits and lands. Light from the source is reflected back to the detector from a land, but a pit gives no reflection.]
Optical Disks
CD-ROM
CD-Recordable (CD-R)
CD-ReWritable (CD-RW)
DVD
DVD-RAM
[Figure 5.33: organization of data on magnetic tape — data is written 7 or 9 bits across the tape width; records are separated by record gaps, and files by file marks and a file gap.]
Data transfers between the main memory and the disk occur directly, bypassing the cache.
When the data on a disk changes, the main memory block is also updated; if the data is also resident in the cache, its valid bit is set to 0.
What happens if the data on the disk and in main memory change while the write-back protocol is being used? In this case, the data in the cache may also have changed, as indicated by the dirty bit, so the copies of the data in the cache and the main memory differ. This is called the cache coherence problem.
One option is to force a write-back before the main memory is updated from the disk.