Chapter 2 discusses key characteristics of computer memory, including types such as volatile, nonvolatile, and cache memory, emphasizing their performance, capacity, and cost. It explains the memory hierarchy, access methods, and the importance of locality of reference in optimizing cache performance. Additionally, it covers various mapping functions for cache organization, including direct, associative, and set-associative mapping, along with their implications on performance and efficiency.

Chapter 2: Computer System Memory
Key Characteristics of Computer Memory Systems
• Memory can be volatile or nonvolatile; nonvolatile memory may also be nonerasable (it cannot be altered)

• The most common forms are:
— Semiconductor memory
— Magnetic surface memory
— Optical memory
— Magneto-optical memory
Method of Accessing Units of Data
What Users Care About
• Performance
• Capacity
• Cost
Performance
Three parameters determine memory performance:
• Access time (latency)
• Memory cycle time
• Transfer rate
Memory Hierarchy
As we move from the top of the pyramid to the bottom:
a) Cost per bit decreases
b) Capacity increases
c) Access time increases
d) Frequency of access by the processor decreases
Semiconductor Main
Memory
Semiconductor Memory Types
Memory Cell Operation
Organization of bit cells in a memory chip
Organization of a 2M × 32 memory module using 512K × 8
static memory chips
Internal organization of a 32M × 8 dynamic memory chip
Typical 16 Mb DRAM (4M x 4)
Cache memories
• The processor is much faster than the main memory.
— As a result, the processor has to spend much of its time waiting while instructions and data are fetched from the main memory.
— This is a major obstacle to achieving good performance.
• The speed of the main memory cannot be increased beyond a certain point.
• Cache memory is an architectural arrangement that makes the main memory appear faster to the processor than it really is.
Cache Memory-Principles
Cache Memory-Operation
• When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time.
• Subsequent references to the data in this block of words are found in the cache.
• At any given time, only some blocks of the main memory are held in the cache.
• Which blocks of the main memory are in the cache is determined by a “mapping function”.
• When the cache is full and a new block of words needs to be transferred from the main memory, some block of words in the cache must be replaced. This is determined by a “replacement algorithm”.
Cache Memory-Read/Write
• The existence of a cache is transparent to the processor; the processor issues Read and Write requests in the same manner.
• If the data is in the cache, the access is called a Read or Write hit.

Read hit:
- The data is obtained from the cache.

Write hit:
- The cache is a replica of the contents of the main memory.
- The contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol.
- Alternatively, update only the contents of the cache, and mark the block as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are then updated when this block is replaced. This is the write-back or copy-back protocol.
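The two write-hit policies can be sketched in a few lines of Python (a hedged illustration, not from the slides; the cache is modeled as a dict mapping block numbers to (data, dirty-bit) pairs, and `write_hit`/`evict` are hypothetical names):

```python
# Sketch of write-through vs. write-back on a write hit.
# cache:  dict {block: (data, dirty)};  memory: dict {block: data}
def write_hit(cache, memory, block, data, policy="write-through"):
    if policy == "write-through":
        cache[block] = (data, False)  # cache and memory updated together
        memory[block] = data
    else:  # write-back: only the cache is updated; the dirty bit is set
        cache[block] = (data, True)   # memory is updated when the block is replaced

def evict(cache, memory, block):
    data, dirty = cache.pop(block)
    if dirty:                         # write-back: flush the block on replacement
        memory[block] = data
```

Under write-through, memory is always consistent with the cache; under write-back, memory catches up only when a dirty block is evicted.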
Cache Memory-Read/Write
• If the data is not present in the cache, a Read miss or Write miss occurs.

Read miss:
- The block of words containing the requested word is transferred from the main memory.
- After the block is transferred, the desired word is forwarded to the processor.
- The desired word may also be forwarded to the processor as soon as it arrives, without waiting for the entire block to be transferred. This is called load-through or early restart.

Write miss:
- If the write-through protocol is used, the contents of the main memory are updated directly.
- If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and the desired word is then overwritten with the new information.
Locality of reference
• Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while other areas are accessed relatively infrequently.
— These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly.
— This behavior is called “locality of reference”.
• Temporal locality of reference:
— A recently executed instruction is likely to be executed again very soon.
• Spatial locality of reference:
— Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Locality of reference (contd..)
• Cache memory is based on the concept of locality of reference.
— If the active segments of a program are placed in a fast cache memory, execution time can be reduced.
• Exploiting temporal locality:
— Whenever an instruction or data item is needed for the first time, it should be brought into the cache; it will hopefully be used again repeatedly.
• Exploiting spatial locality:
— Instead of fetching just one item from the main memory at a time, fetch several items that have addresses adjacent to the requested item.
— The term “block” refers to a set of contiguous address locations of some size.
Hit Rate & Miss Penalty
• A successful access to data in the cache is called a hit.
• The number of hits stated as a fraction of all attempted accesses is called the hit rate.
• The number of misses stated as a fraction of all attempted accesses is called the miss rate.
• High hit rates, well over 0.9, are essential for high-performance computers.
• Performance is adversely affected by the actions that must be taken when a miss occurs. The extra time needed to bring the desired information into the cache is called the miss penalty.
• The average access time experienced by the processor is:
tavg = hC + (1 − h)M
where h is the hit rate, (1 − h) the miss rate, C the time to access data in the cache, and M the miss penalty.
Example: Suppose that the processor has access to two levels of memory. Level 1 contains 1000 words and has an access time of 0.01 µs; level 2 contains 100,000 words and has an access time of 0.1 µs. Assume 95% of the memory accesses are found in level 1. Calculate the average time to access a word in memory.

Average access time = (0.95 × 0.01 µs) + 0.05 × (0.01 µs + 0.1 µs)
= 0.015 µs ≈ T1
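The calculation above can be checked with a short Python snippet (a sketch; `avg_access_time` is an illustrative name, and a miss is assumed to cost the level-1 check plus the level-2 access, as in the example):

```python
def avg_access_time(h, t1, t2):
    """Average access time for a two-level memory.

    h  -- hit rate in level 1
    t1 -- level-1 access time
    t2 -- level-2 access time (a miss costs t1 + t2: check L1, then go to L2)
    """
    return h * t1 + (1 - h) * (t1 + t2)

print(avg_access_time(0.95, 0.01, 0.1))  # approx 0.015 (microseconds)
```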
Mapping functions
• Mapping functions determine how
memory blocks are placed in the cache.
• A simple processor example:
— Cache consisting of 128 blocks of 16 words each.
— Total size of cache is 2048 (2K) words.
— Main memory is addressable by a 16-bit address.
— Main memory has 64K words.
— Main memory has 4K blocks of 16 words each.
— Consecutive addresses refer to consecutive words.
• Three mapping functions:
— Direct mapping
— Associative mapping
— Set-associative mapping.
Mapping of MM blocks (Bj) to cache lines (Li) for Direct mapping

        Tag 00   Tag 01   Tag 10   Tag 11
L0      B0       B8       B16      B24
L1      B1       B9       B17      B25
L2      B2       B10      B18      B26
L3      B3       B11      B19      B27
L4      B4       B12      B20      B28
L5      B5       B13      B21      B29
L6      B6       B14      B22      B30
L7      B7       B15      B23      B31

Address length = (s + w) = (5 + 2) bits
Number of addressable units = 2^(s+w) = 2^(5+2) = 128 words or bytes
Block size = line size = 2^w = 2^2 = 4 words or bytes
Number of blocks in main memory = 2^(s+w)/2^w = 2^s = 2^5 = 32 blocks
Number of lines in cache = m = 2^r = 2^3 = 8
Size of cache = 2^(r+w) = 2^(3+2) = 32 words or bytes
Size of tag = (s − r) bits = (5 − 3) = 2 bits

i = j modulo m
where i = cache line number, j = MM block number, m = number of lines in cache

Address format: Tag (2 bits) | Line (3 bits) | Word (2 bits)
Ex: 10 101 10
Direct-Mapping Cache Organization
Direct mapping
[Figure: main memory blocks 0–4095 mapped onto cache blocks 0–127, each cache block holding a tag]
• Block j of the main memory maps to block j modulo 128 of the cache. Thus block 0 maps to cache block 0, and block 129 maps to cache block 1.
• More than one memory block is mapped onto the same position in the cache.
• This may lead to contention for cache blocks even if the cache is not full.
• The contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
• The memory address is divided into three fields:
- The low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.
- The high-order 5 bits determine which of the 32 possible memory blocks is currently present in that cache block. These are the tag bits.
• Simple to implement, but not very flexible.

Main memory address format: Tag (5 bits) | Block (7 bits) | Word (4 bits)
Mapping of MM blocks (Bj) to cache lines (Li) for Associative mapping

        Tag 00000–11111
L0      B0 ------- B31
L1      B0 ------- B31
L2      B0 ------- B31
L3      B0 ------- B31
L4      B0 ------- B31
L5      B0 ------- B31
L6      B0 ------- B31
L7      B0 ------- B31
(any main memory block can be placed in any cache line)

• The penalty with associative mapping is the cost of comparing a tag with every line in the cache.
— To do this efficiently, the cache must be small.
• Address length = (s + w) bits = 5 + 2 bits
• Number of addressable units = 2^(s+w) = 2^7 = 128 words
• Block size = line size = 2^w = 2^2 = 4 words
• Size of tag = s bits = 5 bits

Address format: Tag (5 bits) | Word (2 bits)
Ex: 11001 10
Fully Associative Cache
Associative mapping
[Figure: main memory blocks 0–4095 can be placed in any of the 128 cache blocks, each holding a 12-bit tag]
• A main memory block can be placed into any cache position.
• The memory address is divided into two fields:
- The low-order 4 bits identify the word within a block.
- The high-order 12 bits, the tag bits, identify a memory block when it is resident in the cache.
• Flexible, and uses the cache space efficiently.
• Replacement algorithms can be used to replace an existing block in the cache when the cache is full.
• Cost is higher than a direct-mapped cache because of the need to search all 128 tags to determine whether a given block is in the cache.

Main memory address format: Tag (12 bits) | Word (4 bits)
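The "search all tags" cost can be made concrete with a minimal Python sketch (illustrative names; real hardware compares all tags in parallel, whereas this loop does it sequentially):

```python
def assoc_lookup(cache_lines, tag):
    """Fully associative lookup: compare the tag against every line."""
    for stored_tag, data in cache_lines:
        if stored_tag == tag:
            return data        # hit
    return None                # miss: no line holds this tag

lines = [(0b11001, "block 25"), (0b00010, "block 2")]
assert assoc_lookup(lines, 0b11001) == "block 25"
assert assoc_lookup(lines, 0b11111) is None
```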
K-Way Set-Associative Cache Organization
2-Way Set Associative
[Figure: sets S0 to S3, each containing 2 lines]
• 2 lines per set
• Number of sets = 4 (S0 to S3)
2-Way Set Associative

Set bits        Tag bits
                000  001  010  011  100  101  110  111
00  S0          B0   B4   B8   B12  B16  B20  B24  B28
01  S1          B1   B5   B9   B13  B17  B21  B25  B29
10  S2          B2   B6   B10  B14  B18  B22  B26  B30
11  S3          B3   B7   B11  B15  B19  B23  B27  B31

i = j modulo v
where i = cache set number, j = MM block number, v = number of sets in cache
Number of sets = 4; number of lines per set = 2

Address format: Tag (3 bits) | Set (2 bits) | Word (2 bits)
Ex: 001 10 11
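The i = j modulo v rule behind the table above, as a short Python sketch (hypothetical names):

```python
V = 4  # number of sets (2-way set associative, 8 lines total)

def set_assoc_map(j):
    """Return (set index, tag) for main-memory block j."""
    return j % V, j // V   # i = j modulo v; tag = remaining high-order bits

# Reproduce one row of the table: B1, B5, B9, ..., B29 all map to set S1,
# with tags 0 through 7.
for tag, block in enumerate(range(1, 32, 4)):
    assert set_assoc_map(block) == (1, tag)
```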
Set-Associative mapping
[Figure: main memory blocks 0–4095 mapped onto a cache of 64 sets, two blocks per set, each block holding a tag]
• Blocks of the cache are grouped into sets.
• The mapping function allows a block of the main memory to reside in any block of a specific set.
• Divide the cache into 64 sets, with two blocks per set.
• Memory blocks 0, 64, 128, etc. map to set 0, and they can occupy either of the two positions within the set.
• The memory address is divided into three fields:
- A 6-bit field determines the set number.
- The high-order 6-bit field is compared to the tag fields of the two blocks in the set.
• Set-associative mapping is a combination of direct and associative mapping.
• The number of blocks per set is a design parameter.
- One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).
- The other extreme, one block per set, is the same as direct mapping.

Main memory address format: Tag (6 bits) | Set (6 bits) | Word (4 bits)
The main memory of the computer is organized as 64 blocks, with a block size of 8 words. The cache has eight block frames. Show the mapping from the numbered blocks in main memory to the block frames in the cache. Draw all lines showing the mappings clearly.
a) Show the direct mapping and the address bits that identify the tag field, the block number, and the word number.
b) Show the fully associative mapping and the address bits that identify the tag field and the word number.
c) Show the two-way set-associative mapping and the address bits that identify the tag field, the set number, and the word number.
2-way set associative mapping
Set Associative Mapping Summary
• Address length = (s + w) bits = (5 + 2) bits
• Number of addressable units = 2^(s+w) = 2^(5+2) = 128 words
• Block size = line size = 2^w = 2^2 = 4 words or bytes
• Number of lines per set = k = 2
• Number of sets = v = 2^d = 2^2 = 4
• Number of lines in cache = m = kv = 2 × 4 = 8 lines
• Size of cache = No. of sets × No. of lines per set × No. of words per line = 2^d × k × 2^w = 4 × 2 × 4 = 32 words
• Size of tag = (s − d) = 5 − 2 = 3 bits
Example 1:
A set-associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

No. of sets = No. of lines / No. of lines per set = 64/4 = 16 = 2^4
No. of main memory blocks = 4K = 4 × 1024 = 2^12
No. of main memory locations = No. of blocks × No. of words per block = 2^12 × 2^7 = 2^19
No. of address lines = 19
No. of words per block = 128 = 2^7
No. of sets = 2^4

Address format: Tag (8 bits) | Set (4 bits) | Word (7 bits)
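Both worked examples follow the same arithmetic, which can be packaged as a small Python helper (a sketch; `addr_fields` is an illustrative name, and all sizes are assumed to be powers of two):

```python
from math import log2

def addr_fields(num_lines, lines_per_set, num_blocks, words_per_block):
    """Return (tag, set, word) bit widths for a set-associative cache."""
    word_bits = int(log2(words_per_block))
    set_bits = int(log2(num_lines // lines_per_set))
    addr_bits = int(log2(num_blocks)) + word_bits  # total MM address bits
    return addr_bits - set_bits - word_bits, set_bits, word_bits

# Example 1: 64 lines, 4-line sets, 4K blocks of 128 words
print(addr_fields(64, 4, 4 * 1024, 128))        # (8, 4, 7)
# Example 2: 1024 lines, 8-line sets, 16K blocks of 64 words
print(addr_fields(1024, 8, 16 * 1024, 64))      # (7, 7, 6)
```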
Example 2:
A set-associative cache consists of 1024 lines, divided into 8-line sets. Main memory contains 16K blocks of 64 words each. Show the format of main memory addresses and diagrammatically show the mapping of the main memory blocks to cache lines.
Solution:
• No. of cache lines = 1024
• No. of cache lines per set = 8
• No. of sets = 1024/8 = 128 = 2^7
• No. of blocks in MM = 16K = 16 × 1024 = 2^4 × 2^10 = 2^14
• No. of words per block = 64 = 2^6
• No. of words in MM = 16K × 64 = 2^4 × 2^10 × 2^6 = 2^20
• No. of MM address bits = 20

Address format: Tag (7 bits) | Set (7 bits) | Word (6 bits)
• A computer system has a main memory consisting of 2M memory locations. It has a cache of size 8K organized in a block-set-associative manner, with 8 blocks per set and 32 words per block. Calculate the number of bits in each of the Tag, Set, and Word fields of the main memory address. Draw all lines showing the mappings clearly.
I/O Modules
Module Function
I/O Module Structure
Programmed I/O
• Three techniques are possible for I/O operations:
• Programmed I/O
— Data are exchanged between the processor and the I/O module.
— The processor executes a program that gives it direct control of the I/O operation.
— When the processor issues a command, it must wait until the I/O operation is complete.
— If the processor is faster than the I/O module, this is wasteful of processor time.
• Interrupt-driven I/O
— The processor issues an I/O command, continues to execute other instructions, and is interrupted by the I/O module when the latter has completed its work.
• Direct memory access (DMA)
— The I/O module and main memory exchange data directly, without processor involvement.
I/O Techniques
I/O Commands
• There are four types of I/O commands that an I/O module may receive when it is addressed by a processor:

1) Control
- Used to activate a peripheral and tell it what to do.

2) Test
- Used to test various status conditions associated with an I/O module and its peripherals.

3) Read
- Causes the I/O module to obtain an item of data from the peripheral and place it in an internal buffer.

4) Write
- Causes the I/O module to take an item of data from the data bus and subsequently transmit that data item to the peripheral.
3 Techniques for Input of a Block of Data
I/O Instructions
Interrupt-Driven I/O
Simple Interrupt Processing
Direct Memory Access (DMA)
• Direct Memory Access (DMA) transfers a block of data between the memory and the peripheral devices of the system without the participation of the processor.
Typical DMA Block Diagram
