


Brief Overview of Cache Memory

Technical Report · April 2020 · DOI: 10.13140/RG.2.2.22359.21921

Ameer Khan
Ameer Khan Research & Development Center, 52250, Pakistan
[email protected]

Introduction

Cache memory is one of the fastest memories inside a computer and acts as a buffer or mediator between the CPU and main memory (RAM). When the CPU requires a data element, it goes to the cache first: if that data element is present in the cache, it is fetched from there; otherwise, the cache controller requests the data from memory. The cache holds the most frequently accessed data locations. While it provides high-speed data access, cache is correspondingly more expensive than the other memories in a machine. The major purpose of a cache is to reduce memory access time, because going to primary memory costs far more time than going to the cache. Since the development of high-speed processors, memory access has been a bottleneck for the throughput of computational machines for decades. Multiple advances have been made to improve the throughput of computers, one of which was the introduction of cache memory. A cache has two main parts: the directory, which stores the addresses of the lines, and the data lines, which hold the data stored in the cache memory and are addressed through the directory.[1]

Memory hierarchy

A computer has different types of memories which serve different purposes depending on their speed and cost. Some of these memories are volatile, which means they lose their state when power is turned off, while others are non-volatile and retain their state even when the power is turned off. However, the main purpose of all these memories is to store data and provide it to the processing unit when required. To reduce the latency of data transfer between memories and processing units, multiple strategies and techniques have been adopted over the decades. The major classification of memory is into primary memory and secondary memory. Primary memory includes the internal memories of a CPU, usually the registers, cache, and RAM, whereas secondary memory includes hard disks, compact disks, etc.

[Figure: the memory hierarchy, from fastest to slowest: Registers, Cache, RAM, HDD, DVD]

Cache controller

The cache controller handles data requests and controls data transfer between the cache and the processor, and between the cache and memory. When the processor requests a data element, the cache controller checks for that element in the cache and provides it to the processor if present. In case the required data element is not present in the cache, the cache controller requests that data from memory. The read and write requests to memory themselves are handled by the memory controller.

The cache controller handles a request by dividing the address into three parts: the tag, the set index, and the data index. The set index is used to locate the corresponding line of the cache memory. If the valid bit indicates that the line is active, the tag is compared. If both checks succeed, the element is fetched and this is counted as a cache hit; otherwise, it is a cache miss.
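
To make this decomposition concrete, the following Python sketch (ours, not the report's) splits a 32-bit address into tag, set index, and byte offset, where the offset plays the role of the report's data index. The geometry (64-byte lines, 128 sets) is assumed purely for illustration:

    # Hypothetical cache geometry, assumed for illustration only.
    LINE_SIZE = 64   # bytes per cache line
    NUM_SETS = 128   # number of sets in the cache

    OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 6 bits for a 64-byte line
    INDEX_BITS = NUM_SETS.bit_length() - 1     # 7 bits for 128 sets

    def split_address(addr: int):
        """Split a memory address into (tag, set_index, offset)."""
        offset = addr & (LINE_SIZE - 1)
        set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, set_index, offset

    tag, set_index, offset = split_address(0x12345678)
    print(f"tag={tag:#x}, set={set_index}, offset={offset}")

The controller would use set_index to select a line, compare tag against the stored tag when the valid bit is set, and use offset to pick the requested bytes within the line.
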
Cache Hit

If the data requested by the processor is present in the cache, it is accessed and provided to the processor by the cache controller; this is regarded as a cache hit. The throughput of a system depends largely on the frequency of cache hits, because in this case the processor faces only a very short latency.[2]

Cache Miss

If the data requested by the processor is not present in the cache, it is requested from memory and brought into the cache to make it available to the processor; this is regarded as a cache miss. In this case, the data request is delayed by a considerable amount of time: the processor has to wait for the data to arrive from memory, which is a comparatively time-consuming process.[2]

Miss Penalty

In case of a cache miss, the data is brought from memory into the cache to make it available to the processor, which takes time. The time taken to bring the data into the cache from memory after a miss is regarded as the miss penalty.[2]

Time to Hit

This is the time taken to process a data request in case of a cache hit.
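
As a minimal illustration of hits and misses (our sketch, with an assumed 64-byte line size and LRU replacement, neither of which is specified in the report), the following Python replays a short address trace against a tiny cache and counts both events:

    from collections import OrderedDict

    def count_hits(trace, capacity=4):
        """Replay an address trace against a tiny LRU cache of `capacity` lines."""
        cache = OrderedDict()   # line address -> None, kept in LRU order
        hits = misses = 0
        for addr in trace:
            line = addr // 64   # 64-byte lines assumed
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # refresh LRU position
            else:
                misses += 1
                cache[line] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used line
        return hits, misses

    print(count_hits([0, 8, 64, 0, 128, 192, 256, 64]))   # -> (2, 6)

Note how the final access to address 64 misses even though it was cached earlier: the line was evicted in the meantime, the capacity effect discussed under "Measuring Cache Performance" below.
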
Levels of Cache

There are mainly three levels of cache (L1, L2, L3), which are categorized by their speed and capacity. Going from L1 to L3, both memory access time and storage capacity increase.[3]

A) L1:
This is the fastest cache and is placed close to or alongside the processor to make data access faster. The Level 1 cache is separate for each processor in multiprocessor machines, and this is where requested data is checked first. Its size is usually up to 256 KB; however, in some processors, such as Xeon, it can be up to 1 MB. Instructions and data are kept separate in this cache, although this separation depends on the architecture of the cache design.

B) L2:
This is slower than the L1 cache and larger in size, up to about 8 MB. The Level 2 cache keeps the data that is expected to be accessed by the processor in the coming clock cycles. The Level 2 cache is also separate for each core.

C) L3:
This is the slowest cache and the largest compared to the other cache memories, with sizes up to about 50 MB.

Data writing methods

There are two main techniques for writing data:

A) Write-Back:
In write-back, a value is updated in the cache but is not simultaneously updated in memory. The memory update occurs later, once the update (dirty) bit has been set to 1.[4]

B) Write-Through:
In write-through, a value is updated in the cache and in memory simultaneously. All writes therefore go to memory as well, which makes write operations slower.[5]
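
The difference between the two policies can be sketched in a few lines of Python (an illustrative toy model, not code from the report): write-through updates memory on every store, while write-back only marks the line dirty and updates memory when the line is evicted.

    class Cache:
        """Toy cache illustrating write-back vs write-through policies."""
        def __init__(self, memory, write_back=True):
            self.memory = memory     # backing store: dict of addr -> value
            self.lines = {}          # cached copies: addr -> value
            self.dirty = set()       # addresses modified but not yet written back
            self.write_back = write_back

        def write(self, addr, value):
            self.lines[addr] = value
            if self.write_back:
                self.dirty.add(addr)        # defer the memory update
            else:
                self.memory[addr] = value   # write-through: update memory now

        def evict(self, addr):
            if addr in self.dirty:          # write-back: flush dirty data on eviction
                self.memory[addr] = self.lines[addr]
                self.dirty.discard(addr)
            self.lines.pop(addr, None)

    mem = {}
    c = Cache(mem, write_back=True)
    c.write(0x10, 42)
    print(mem)       # {}          -- memory is not updated yet
    c.evict(0x10)
    print(mem)       # {16: 42}    -- memory updated only at eviction
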
Locality

A) Spatial:
Spatial locality means that the data elements to be accessed are placed close to each other in space.[6]

B) Temporal:
Temporal locality means that the same data elements are accessed repeatedly within a short span of time.[7]
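
A classic illustration of both kinds of locality (our example, not the report's): summing a 2-D array in row-major order touches neighbouring addresses on each step (spatial locality), while the accumulator is reused on every iteration (temporal locality). The effect is most pronounced in languages with contiguous arrays, such as C; this Python version only illustrates the access pattern.

    N = 1024
    matrix = [[i * N + j for j in range(N)] for i in range(N)]

    total = 0                    # 'total' is reused on every iteration: temporal locality
    for i in range(N):
        for j in range(N):       # row-major order visits adjacent elements: spatial locality
            total += matrix[i][j]

    # Swapping the loop order (j outer, i inner) would stride across rows instead,
    # touching distant addresses and losing most of the spatial locality.
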
Single Core & Multi Core Machines

Cache memory in a single-core machine is quite simple. It is used to hold the data required by the CPU: if found in the cache, the data is provided promptly to the CPU, and if it is not present in the cache, it is fetched from memory, loaded into the cache, and then provided to the CPU.

In multi-core machines, the working of the cache is more complicated, because each core has its own cache but all cores use the same main memory. So, when something is updated by a processor, it is updated in the cache of that processor, which makes it difficult to maintain consistency and coherency between the cache memories of all the cores and main memory.

Coherency:
In multi-core systems, all cores have their own caches, and the cache memories are expected to work together so smoothly that the data provided to a processor is correct at any given point in time. When one core updates a value in its cache, all the other copies of that value should be invalidated and should not be provided to any processor for any operation until the updated value has been propagated to all locations. This phenomenon is called cache coherency. In simple words: all the cores see the same data, or they are all provided the correct data at all times.[9]

Consistency:
Cache-to-memory consistency refers to the fact that the data copies in cache and memory agree; that includes both the data values and the ordered arrangement of instructions.[9]
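
A heavily simplified sketch of invalidation-based coherency (our illustration; real hardware uses protocols such as MESI that track per-line states): when one core writes a value, the copies held by the other cores are invalidated, so a later read by another core must re-fetch the updated value.

    class CoherentSystem:
        """Toy invalidation-based coherency model: one cache dict per core."""
        def __init__(self, num_cores):
            self.memory = {}
            self.caches = [{} for _ in range(num_cores)]

        def read(self, core, addr):
            if addr not in self.caches[core]:       # miss: fill from shared memory
                self.caches[core][addr] = self.memory.get(addr, 0)
            return self.caches[core][addr]

        def write(self, core, addr, value):
            for i, cache in enumerate(self.caches):
                if i != core:
                    cache.pop(addr, None)           # invalidate other cores' copies
            self.caches[core][addr] = value
            self.memory[addr] = value               # write-through, for simplicity

    smp = CoherentSystem(num_cores=2)
    smp.write(0, 0x40, 7)       # core 0 updates the value
    print(smp.read(1, 0x40))    # core 1 re-fetches the up-to-date value: 7
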
Cache Mappings

A) Fully Associative:
In a fully associative cache, a new cache line can be placed anywhere in the cache. It has comparators on all the elements of the directory. It is slower than the other types of mapping, but it is highly versatile.[8]

B) Directly Mapped:
In a directly mapped cache, a cache line has a unique address where it is placed in the cache. Each block in memory is mapped to exactly one line of the cache, so no other lines need to be checked. It has one comparator used for the tag comparison, which makes it faster, but less versatile.[1]

C) n-Way Set Associative:
These are intermediate schemes between fully associative and directly mapped caches. In these schemes, a new cache line can be placed in any of N cache lines. In a 4-way set associative cache, for example, one block can be mapped to 4 different locations in the cache. This is relatively fast and relatively versatile compared to the above-mentioned mapping schemes.[1]
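
The three mapping schemes can be contrasted with a small Python sketch (ours, with an assumed 8-line cache and 4-way associativity) that lists, for a given memory block, the cache lines where that block may be placed:

    NUM_LINES = 8   # total cache lines, assumed for illustration
    N_WAYS = 4      # associativity for the n-way scheme

    def candidate_lines(block, scheme):
        """Return the cache lines where `block` may be placed under each mapping."""
        if scheme == "fully_associative":
            return list(range(NUM_LINES))     # anywhere in the cache
        if scheme == "direct_mapped":
            return [block % NUM_LINES]        # exactly one possible line
        if scheme == "set_associative":
            num_sets = NUM_LINES // N_WAYS    # 2 sets of 4 ways each
            s = block % num_sets
            return [s * N_WAYS + way for way in range(N_WAYS)]
        raise ValueError(scheme)

    for scheme in ("fully_associative", "direct_mapped", "set_associative"):
        print(scheme, candidate_lines(block=13, scheme=scheme))
    # fully_associative: all 8 lines; direct_mapped: [5]; set_associative: [4, 5, 6, 7]
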
Unified vs Split Cache

A) Unified Cache:
In a unified cache, code and data are located in the same cache, and the portion of the cache taken by each can vary according to the situation. So, all fetch or load requests, for both code and data, come to the same cache, and only one cache needs to be designed and handled.[10]

B) Split Cache:
In a split cache, code and data are placed separately in two different cache portions. In this cache, the sizes of the code and data portions are not flexible and cannot be changed according to the situation. All load requests for data go to the data cache portion, and all fetch requests for code go to the code cache portion, which is very effective for pipelining. This is also known as an I&D cache.[10]

Write Buffer for Cache:

In ARM (Advanced RISC Machines) processors, a write buffer is used between the cache and memory to improve the performance of memory write requests. A first-in, first-out (FIFO) buffer is used for this purpose, and data is transferred into the write buffer at the speed of the system clock.[11]
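
A minimal sketch of the idea (ours, not ARM's actual design): the processor posts writes into a FIFO queue and continues immediately, while the slower memory drains the queue in order.

    from collections import deque

    class WriteBuffer:
        """Toy FIFO write buffer sitting between the cache and memory."""
        def __init__(self, memory, depth=8):
            self.memory = memory
            self.queue = deque()
            self.depth = depth

        def post_write(self, addr, value):
            if len(self.queue) >= self.depth:
                self.drain_one()                  # a full buffer would stall the processor
            self.queue.append((addr, value))      # the processor continues immediately

        def drain_one(self):
            if self.queue:
                addr, value = self.queue.popleft()   # oldest write reaches memory first
                self.memory[addr] = value

    mem = {}
    wb = WriteBuffer(mem)
    wb.post_write(0x100, 1)
    wb.post_write(0x104, 2)
    print(mem)        # {}        -- writes are still queued
    wb.drain_one()
    print(mem)        # {256: 1}  -- the oldest write has reached memory
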
Victim Cache:

A victim cache holds recently evicted elements from the cache. It is fully associative and is used to reduce miss rates. In case of a cache miss, the victim cache is checked before going to memory for the data element. If the element is found in the victim cache, that block of the victim cache is swapped with the corresponding block of the main cache.[12]

Measuring Cache Performance

AMAT (Average Memory Access Time) is a measure used to assess the performance of a memory. It is calculated using the following formula:

AMAT = Hit Time + Miss Rate × Miss Penalty

The performance of a cache can accordingly be improved in three ways:
1. Reducing the miss rate
2. Reducing the miss penalty
3. Reducing the time to hit

At the start there are compulsory misses, because these are the processor's first accesses to the memory locations. Due to the limited size of the cache, data that was used in the past and is not required now is replaced with data that is required now; a later access to the replaced data then misses (a capacity miss). There can also be misses due to collisions (conflict misses). Reducing the number of misses increases the number of hits, which eventually increases throughput by reducing the average memory access time significantly.[13]
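
As a worked example of the formula (the numbers are assumed, not from the report): with a 1 ns hit time, a 5% miss rate, and a 100 ns miss penalty, AMAT = 1 + 0.05 × 100 = 6 ns; halving the miss rate to 2.5% brings the average down to 3.5 ns, which is why reducing misses pays off so strongly. The same computation in Python:

    def amat(hit_time, miss_rate, miss_penalty):
        """Average Memory Access Time: hit_time + miss_rate * miss_penalty."""
        return hit_time + miss_rate * miss_penalty

    print(amat(1.0, 0.05, 100.0))    # 6.0 ns
    print(amat(1.0, 0.025, 100.0))   # 3.5 ns
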
References

[1] Hill, M.D., 1987. Aspects of cache memory and instruction buffer performance (Report No. UCB/CSD-87-381). University of California, Berkeley, Department of Electrical Engineering and Computer Sciences.
[2] Kowarschik, M. and Weiß, C., 2003. An overview of cache optimization techniques and cache-aware numerical algorithms. In Algorithms for Memory Hierarchies (pp. 213-232). Springer, Berlin, Heidelberg.
[3] Knotts, B.W., NCR Corp, 1997. Coherent copyback protocol for multi-level cache memory systems. U.S. Patent 5,671,391.
[4] Steps, S.C., HP Inc, 1989. Write-back cache system using concurrent address transfers to setup requested address in main memory before dirty miss signal from cache. U.S. Patent 4,858,111.
[5] Martinez Jr, M.W., Bluhm, M., Byrne, J.S., Courtright, D.A., Duschatko, D.E., Garibay Jr, R.A. and Herubin, M.R., Cyrix Corp, 1996. Coherency for write-back cache in a system designed for write-through cache including write-back latency control. U.S. Patent 5,524,234.
[6] Kumar, S. and Wilkerson, C., 1998, July. Exploiting spatial locality in data caches using spatial footprints. In Proceedings of the 25th Annual International Symposium on Computer Architecture (Cat. No. 98CB36235) (pp. 357-368). IEEE.
[7] Song, Y. and Li, Z., 1999. New tiling techniques to improve cache temporal locality. ACM SIGPLAN Notices, 34(5), pp. 215-228.
[8] Singh, J.P., Stone, H.S. and Thiebaut, D.F., 1992. A model of workloads and its use in miss-rate prediction for fully associative caches. IEEE Transactions on Computers, (7), pp. 811-825.
[9] Petrot, F., Greiner, A. and Gomez, P., 2006, August. On cache coherency and memory consistency issues in NoC based shared memory multiprocessor SoC architectures. In 9th EUROMICRO Conference on Digital System Design (DSD'06) (pp. 53-60). IEEE.
[10] Coutinho, L.M., Mendes, J.L.D. and Martins, C.A., 2006, October. MSCSim: multilevel and split cache simulator. In Proceedings of the 36th Annual Frontiers in Education Conference (pp. 7-12). IEEE.
[11] Miyake, J., Panasonic Corp, 1996. Cache memory with a write buffer indicating way selection. U.S. Patent 5,564,034.
[12] Peled, G. and Spillinger, I., Intel Corp, 2001. Trace victim cache. U.S. Patent 6,216,206.
[13] Sun, X.H. and Wang, D., 2012. APC: a performance metric of memory systems. ACM SIGMETRICS Performance Evaluation Review, 40(2), pp. 125-130.
