Topic 2: Memory Hierarchy Design Fundamentals
Instructor:
Prof. Rajat Subhra Chakraborty
Professor
Dept. of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India 721302
E-mail: [email protected]
Different Levels of Memory
The typical levels in the hierarchy slow down and get larger as we move away from the processor for a large workstation or
small server. Embedded computers might have no disk storage and much smaller memories and caches. Increasingly, FLASH is
replacing magnetic disks, at least for first level file storage. The access times increase as we move to lower levels of the hierarchy,
which makes it feasible to manage the transfer less responsively. The implementation technology shows the typical technology used
for these functions. The access time is given in nanoseconds for typical values in 2017; these times will decrease over time.
Bandwidth is given in megabytes per second between levels in the memory hierarchy. Bandwidth for disk/FLASH storage includes
both the media and the buffered interfaces.
Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time between
processor memory requests (for a single processor or core) and the latency of a DRAM access, is plotted over time. Note that
the vertical axis must be on a logarithmic scale to record the size of the processor-DRAM performance gap. The memory
baseline is 64 KiB DRAM in 1980, with a 1.07 per year performance improvement in latency (see Figure 2.4 on page 88). The
processor line assumes a 1.25 improvement per year until 1986, a 1.52 improvement until 2000, a 1.20 improvement between
2000 and 2005, and only small improvements in processor performance (on a per-core basis) between 2005 and 2015. As you
can see, until 2010 memory access times in DRAM improved slowly but consistently; since 2010 the improvement in access
time has reduced, compared with the earlier periods, although there have been continued improvements in bandwidth. In
mid-2017, AMD, Intel, and Nvidia all announced chip sets using versions of HBM technology.
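The quoted per-year rates compound into the gap that the figure plots. The sketch below is illustrative only; the post-2005 per-core rate (1.035 per year) is an assumption, since the text just calls those improvements "small".

#include <stdio.h>

/* Compound the per-year improvement rates quoted above (1980 = 1.0). */
int main(void) {
    double cpu = 1.0, dram = 1.0;
    for (int year = 1981; year <= 2015; year++) {
        double r = (year <= 1986) ? 1.25     /* until 1986       */
                 : (year <= 2000) ? 1.52     /* 1986-2000        */
                 : (year <= 2005) ? 1.20     /* 2000-2005        */
                 : 1.035;                    /* assumed, "small" */
        cpu  *= r;                           /* processor performance    */
        dram *= 1.07;                        /* DRAM latency improvement */
        if (year % 5 == 0)
            printf("%d: processor/DRAM gap = %.0fx\n", year, cpu / dram);
    }
    return 0;
}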
Disk Storage
• Nonvolatile, rotating magnetic storage
• Important concepts: sector, track, cylinder
Given:
512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk
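From these figures, the average time to read one sector is the sum of the average seek time, the average rotational latency (half a rotation), the transfer time, and the controller overhead. A minimal sketch of the arithmetic in C:

#include <stdio.h>

/* Average time to read one 512 B sector with the figures above
 * (the disk is idle, so there is no queuing delay). */
int main(void) {
    double seek_ms     = 4.0;                      /* average seek time            */
    double rotate_ms   = 0.5 * 60000.0 / 15000.0;  /* half a rotation = 2 ms       */
    double transfer_ms = 512.0 / 100e6 * 1e3;      /* 512 B at 100 MB/s ~ 0.005 ms */
    double ctrl_ms     = 0.2;                      /* controller overhead          */

    printf("average read time = %.3f ms\n",
           seek_ms + rotate_ms + transfer_ms + ctrl_ms);   /* about 6.2 ms */
    return 0;
}

Note that the transfer time is negligible here; seek and rotational latency dominate.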
Capacity and access times for DDR SDRAMs by year of production. Access time is for a random
memory word and assumes a new row must be opened. If the row is in a different bank, we assume the
bank is precharged; if the row is not open, then a precharge is required, and the access time is longer. As
the number of banks has increased, the ability to hide the precharge time has also increased. DDR4
SDRAMs were initially expected in 2014, but did not begin production until early 2016.
Row buffer
• Allows several words to be read and refreshed in parallel
Synchronous DRAM
• Allows consecutive accesses in bursts without needing to send each address
• Improves bandwidth
DRAM banking
• Allows simultaneous access to multiple DRAMs
• Improves bandwidth
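As a rough illustration of banking, the sketch below maps consecutive line-sized addresses to banks using low-order interleaving, so back-to-back accesses land in different banks and their activate/precharge times can overlap. The bank count and interleave granularity are assumed, not taken from the slides.

#include <stdio.h>

#define NUM_BANKS  8     /* assumed number of banks        */
#define LINE_BYTES 64    /* assumed interleave granularity */

int main(void) {
    /* Consecutive lines rotate through banks 0..7, 0..7, ... */
    for (unsigned addr = 0; addr < 8 * LINE_BYTES; addr += LINE_BYTES) {
        unsigned bank = (addr / LINE_BYTES) % NUM_BANKS;
        printf("addr 0x%04X -> bank %u\n", addr, bank);
    }
    return 0;
}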
Direct Mapped Cache Example
• Direct mapped, 32-bit addresses
• Word size: 4 bytes (32 bits)
• Block size: 16 words => 64 bytes (512 bits)
• 8-bit index => 256 blocks
• Cache size (data blocks only): 256 blocks * (64 bytes/block) = 16 kB
• Address breakdown: Tag = bits 31-14 (18 bits), Index = bits 13-6 (8 bits), Block offset = bits 5-2 (4 bits), Byte offset = bits 1-0 (2 bits)
• Block offset (4 bits) selects one of the 16 words in the block using a 16:1 MUX, which returns the selected word to the processor
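A minimal sketch of this address decomposition, assuming the geometry above; the example address is arbitrary.

#include <stdio.h>

/* Decompose a 32-bit byte address for the cache above: 18-bit tag,
 * 8-bit index, 4-bit block (word) offset, 2-bit byte offset. */
int main(void) {
    unsigned addr = 0x0001234Cu;                  /* arbitrary example   */

    unsigned byte_offset  =  addr        & 0x3;   /* bits 1-0            */
    unsigned block_offset = (addr >> 2)  & 0xF;   /* bits 5-2, 16:1 MUX  */
    unsigned index        = (addr >> 6)  & 0xFF;  /* bits 13-6, 256 sets */
    unsigned tag          =  addr >> 14;          /* bits 31-14          */

    printf("tag=0x%X index=%u word=%u byte=%u\n",
           tag, index, block_offset, byte_offset);
    return 0;
}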
Associative Caches
Basic idea: give more options for where a memory block can be placed in the cache
Fully Associative
Allow a given block to go in any cache entry
Requires all entries to be searched at once
Tag comparator per entry => expensive
n-way Set Associative
Each set contains n entries
# of Blocks in cache = (# of sets) * (associativity) = (# of sets) * n
Block number determines set:
Set no. = Block no. % (# of sets)
Search all entries in a given set at once
n tag comparators => less expensive than fully associative cache
Note: Direct mapped cache is 1-way set associative!
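A minimal sketch of an n-way set-associative lookup following the mapping above; the set count and associativity are illustrative, not from the slides. With N_WAYS = 1 this degenerates to the direct-mapped case noted above.

#include <stdio.h>
#include <stdbool.h>

#define NUM_SETS 64    /* assumed number of sets */
#define N_WAYS    4    /* assumed associativity  */

struct line { bool valid; unsigned tag; };
static struct line cache[NUM_SETS][N_WAYS];   /* all invalid at start */

/* Returns true on a hit for the given memory block number. */
bool lookup(unsigned block_no) {
    unsigned set = block_no % NUM_SETS;   /* Set no. = Block no. % (# of sets) */
    unsigned tag = block_no / NUM_SETS;   /* remaining high bits form the tag  */
    for (int way = 0; way < N_WAYS; way++)        /* n tag comparators */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}

int main(void) {
    printf("block 12 maps to set %u\n", 12u % NUM_SETS);
    printf("hit? %d\n", lookup(12));      /* 0: cache starts empty */
    return 0;
}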
Associative Cache Examples
(Figure: block placement examples for direct mapped and associative caches)
Cache Misses
Capacity Misses:
• In addition to compulsory misses
• Occur because the cache cannot contain all the blocks needed during execution
• Blocks are repeatedly discarded and later retrieved after a miss
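As a small illustration (assuming the 16 kB cache from the earlier example and a roughly least-recently-used replacement policy), streaming twice over an array larger than the cache triggers capacity misses: by the time the second pass begins, the start of the array has already been evicted.

#include <stdio.h>
#include <stdint.h>

#define ARRAY_BYTES (64 * 1024)    /* 64 kB working set > 16 kB cache */
static uint8_t a[ARRAY_BYTES];

int main(void) {
    long s = 0;
    /* Second pass gets no reuse: each block was discarded before it
     * is touched again, so it must be retrieved after a miss. */
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < ARRAY_BYTES; i++)
            s += a[i];
    printf("%ld\n", s);
    return 0;
}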