

Cache Memory: An Analysis on Performance Issues

Conference Paper · June 2021


DOI: 10.1109/INDIACom51348.2021.00033



Cache Memory: An Analysis on Performance Issues
Sonia, Yogananda School of AI, Computer and Data Sciences, Shoolini University, Solan, India ([email protected])
Ahmad Alsharef, Yogananda School of AI, Computer and Data Sciences, Shoolini University, Solan, India ([email protected])
Pankaj Jain, Department of Management, JJT University, Rajasthan, India ([email protected])
Monika Arora, Department of Operations, Apeejay School of Management, New Delhi, India ([email protected])
Syed Rameem Zahra, National Institute of Technology (NIT), Srinagar, India ([email protected])
Gaurav Gupta, Yogananda School of AI, Computer and Data Sciences, Shoolini University, Solan, India ([email protected])

Abstract—Cache memory is mainly incorporated in systems to overcome the performance gap created between the main memory and the CPU. Since processor speeds are ever-increasing, a need arises for a faster cache memory that can assist in bridging the gap between processor and memory speed. Therefore, this paper proposes an architecture built around three improvement techniques, namely victim cache, sub-blocks, and memory banks. These three techniques are applied one after the other to improve the speed and performance of the cache relative to main memory. Moreover, variables that are already in common use, such as the miss penalty ratio, the cache access speed, and the miss rate ratio, are used in this paper to estimate cache memory performance after the proposed approach is implemented. From this estimation it can be determined that, at level 1, the victim cache technique decreases the miss rate; at level 2, the sub-block division technique further reduces the miss penalty ratio; and at level 3, the memory bank technique further decreases the memory access time. Thus, using the suggested approach, the performance of cache memory can be improved several times over.

Keywords—RAM, Miss Ratio, Access Rate, Hit Ratio, Tags and Addresses, Associative Mapping

I. INTRODUCTION

Cache memory is an important element of a computer that strongly affects program execution, because its access time is lower than that of the other memories. It is the fastest component in the memory hierarchy and approaches the speed of the CPU. It works on the property of 'locality of reference', i.e., references to memory in any given interval of time tend to be confined to a few localized areas of memory. The active instructions of a program, along with its variables, are kept in the cache memory; this reduces the total execution time of the program by reducing the average memory access time.

The idea behind cache memory organization is to make the average access time of the memory system approach the average access time of the cache by keeping the most frequently accessed instructions and data in the fast cache memory. Very frequent and large numbers of memory requests are therefore served by the cache, although it is much smaller than the main memory. Several levels of cache are present in a computer; among them, the largest and slowest cache sits at the last level, while the smallest and fastest cache is found at the first level. The level-one (L1) cache normally resides inside the processor, whereas the level-two (L2) and level-three (L3) caches are placed on separate chips outside the processor. In multi-core processors, each core has its own L1 cache, and the last level of cache is shared by all the cores.

Whenever a word is to be accessed, the CPU first searches for the address of that word in its cache memory. If it is found there, a HIT occurs; otherwise a MISS occurs. In the latter case the word is searched for in main memory, the corresponding data is fetched from main memory, and it is finally stored in the cache for future reference. The hit ratio is defined as the number of HITs divided by the sum of the number of HITs and the number of MISSes; a hit ratio close to one is desirable. Misses occur in the following situations:

• If a memory address is accessed for the first time, a miss takes place.

• If two blocks are simultaneously mapped onto the same cache location, misses occur because of the conflict.

• Misses can also occur simply because of the small size of the cache.

The cache miss rate, together with the handling time needed by the cache, are the two main factors that have the greatest impact on cache performance. A victim cache (a short-term store for cache lines evicted from the cache), together with a column-associative cache, can be used to reduce the cache miss rate. Overall, the cache memory acts as a connecting point between the CPU and the slower memory unit, since it can provide data at a fast pace for execution; this is shown in Fig. 1.
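To make these quantities concrete, the short C sketch below computes the hit ratio and the resulting average memory access time (AMAT) from a hit/miss count. It is an illustrative example written for this text: the reference counts and the cache and main-memory access times are assumed values, not measurements from this paper.

    #include <stdio.h>

    /* Assumed timings for illustration only. */
    #define CACHE_TIME_NS   1.0    /* time to access the cache                    */
    #define MISS_PENALTY_NS 60.0   /* extra time when the word must come from RAM */

    int main(void)
    {
        double hits = 950.0, misses = 50.0;          /* assumed reference counts */
        double hit_ratio = hits / (hits + misses);   /* HITs / (HITs + MISSes)   */
        double miss_rate = 1.0 - hit_ratio;

        /* Average memory access time = hit time + miss rate * miss penalty. */
        double amat = CACHE_TIME_NS + miss_rate * MISS_PENALTY_NS;

        printf("hit ratio = %.2f, AMAT = %.1f ns\n", hit_ratio, amat);
        return 0;
    }

With 950 hits out of 1000 references, the sketch reports a hit ratio of 0.95 and an AMAT of 4.0 ns, which shows how strongly even a small miss rate drives the average access time.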

Fig. 1. Location of cache memory

Since cache systems are built from on-chip memory elements that can retain data, the cache also works like a buffer, acting as an intermediary between the processor and the main memory of a system. Recent studies suggest that on-board elements such as the cache memory play a crucial role in deciding the performance of multi-core systems. Indeed, cache memory is quicker than main memory and RAM because it is much nearer to the microprocessor; thus, it becomes a basic need for transferring data synchronously between the processor and main memory. Data storage in the cache has substantial advantages in comparison to RAM, one of them being the speed with which data can be fetched from the cache. However, the cache consumes more energy, as it is on-chip. Since processor speeds are increasing day by day, a dire need arises to increase the speed of the cache memory, because a faster cache can assist in narrowing the gap between processor and memory speed. Thus, in this paper three improvement techniques, namely victim cache, sub-blocks, and memory banks, are implemented to increase the cache speed at each level. These techniques are applied one after the other to improve the speed and performance of the cache relative to main memory. Moreover, variables that are already in common use, such as the miss penalty ratio, the cache access speed, and the miss rate ratio, are used in this paper to estimate cache memory performance after the proposed approach is implemented.

II. BACKGROUND

The concepts used in this paper are based on the techniques described below.

A. Existing Mapping Techniques

There are three main mapping techniques for organizing a cache.

1) Associative Mapping
Associative mapping is the most flexible and fastest cache organization, since the cache works on the basis of an associative memory. The associative memory stores both the address and the content (data) of the memory word, as shown in Fig. 2, which allows any location in the cache to hold any word from main memory. The fifteen-bit address value is represented as a five-digit octal number, and the corresponding 12-bit word is represented as a four-digit octal number. A fifteen-bit processor address is placed in the argument register, and the associative memory is searched for a matching address. If the address exists in the associative cache, the twelve-bit data word is sent to the CPU. Otherwise, the request is sent to main memory, which is searched for the matching address; when the address is found, the address-data pair is fetched from main memory and stored in the associative cache.

Fig. 2. Associative mapping cache (all numbers in octal)

2) Direct Mapping
Fig. 3 illustrates how an ordinary random-access memory can be used for the cache. The fifteen bits of the processor address are divided into two parts: the index field, formed by the nine least significant bits (LSBs), and the tag field, formed by the remaining six bits. As shown in Fig. 3, the full main-memory address consists of both the tag and the index parts, and the number of bits in the index field must equal the number of address bits required to access the cache memory. In general, the cache holds 2^k words while the main memory holds 2^n words, so the n-bit memory address is divided into a k-bit index field and an (n-k)-bit tag field. Direct mapping uses the full n-bit address to access main memory, while the k-bit index alone is used to access the cache. Each word in the cache is organized as shown in Fig. 3, with the data and its associated tag combined to form one cache word; when a new word is first brought into the cache, its tag bits are stored alongside the data bits. When the CPU generates a memory request, the cache is accessed using the index field of the address, the tag part of the processor address is compared with the tag field of the word read from the cache, and the required data is fetched if the two tags match.
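As a concrete illustration of the index/tag split just described, the following C sketch decodes a 15-bit address into a 9-bit index and a 6-bit tag and performs a direct-mapped lookup. It is a simplified model written for this text; the structure layout and sizes are assumptions, not code from the paper.

    #include <stdbool.h>
    #include <stdint.h>

    #define INDEX_BITS  9                     /* 2^9 = 512 cache words */
    #define CACHE_WORDS (1u << INDEX_BITS)

    struct cache_word {
        bool     valid;
        uint8_t  tag;    /* upper 6 bits of the 15-bit address */
        uint16_t data;   /* 12-bit data word                   */
    };

    static struct cache_word cache[CACHE_WORDS];

    /* Returns true on a hit and stores the word in *out; false means the
     * word must be fetched from main memory and written into cache[index]. */
    bool direct_mapped_read(uint16_t addr15, uint16_t *out)
    {
        uint16_t index = addr15 & (CACHE_WORDS - 1u);      /* low 9 bits  */
        uint8_t  tag   = (uint8_t)(addr15 >> INDEX_BITS);  /* high 6 bits */

        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data;                      /* cache hit   */
            return true;
        }
        return false;                                      /* cache miss  */
    }

On a miss, the handler would store the word fetched from main memory into cache[index] together with its tag and set the valid bit, exactly as described above for a newly fetched word.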
3) Set Associative Mapping
Set-associative mapping is another type of cache organization. It builds on the direct-mapped organization and improves its performance by allowing each word of the cache to store two or more data words under the same index address. In this organization, every data word is stored together with its tag, and several such tag-data items placed in one word of the cache form a set.

To illustrate, consider a main memory of 1024 words in which each word of the cache memory holds two data words. In general, a set-associative cache of set size k holds k words of main memory in each word of the cache, as demonstrated in Fig. 4.

Fig. 3. Illustration of the relation between the main and cache memory

Fig. 4. Two-way set-associative mapping cache (all numbers in octal):

Index   Tag   Data   Tag   Data
000     01    3450   02    5670
777     02    6710   00    2340

When the processor generates a memory request, the cache is accessed using the index value of the address. The tag field of the processor address is then compared with the two tags stored in the selected cache word in order to determine whether a match occurs and, if so, to fetch the required data. The comparison is performed by searching the tags in the set associatively, in the same manner as an associative memory search, which is why this organization is called 'set-associative'. The hit ratio improves as the set size rises, because several words with different tags can be assigned to the same index.

III. LITERATURE REVIEW

Various studies have shown how to improve cache performance. Chaplot [1] proposed an approach that provides higher associativity to improve the cache miss rate and discussed the two types of locality, in time and in space, i.e., temporal locality and spatial locality, respectively. In [2] and [3], researchers proposed automatic, self-adaptable cache memory models for memory systems. A performance analysis of cache memory in the context of replacement policies is presented in [4] and [5]. Westerholz et al. [6] proposed cache designs and efficient cache-driven memory management to bridge the communication between the CPU and main memory. Divya [7] suggested a useful graceful code for an efficient virtual memory.

In [8] and [9], an approach for reducing the average memory access time by adding a victim cache is proposed. Kim and Song detailed the effect of processor cache memory on data storage [10]. Some researchers have discussed cache memory performance together with other memories such as scratchpad memory and split memory [11], [12]. Kolanski [13] proposed the usage of cache memory for mobile and embedded systems. Smith and Kuban [14] studied virtual memory performance using a page-based approach affected by the page fault frequency. Moreover, Banday and Khan [15] presented an overview and review of the present status of cache memory performance; they also discussed cache performance and its adaptability when the system is working over a network.

A summary of related research works carried out to improve cache performance is presented in Table I.

TABLE I. SUMMARY OF RELATED RESEARCH WORKS

[1] "Virtual Memory Benefits and Uses"
    Advantages: Provides higher associativity, in terms of time and space, to improve the cache miss rate.
    Gaps: The reduction in average memory access time obtained by this approach was not examined.

[2], [3] "A Self-Tuning Cache Architecture for Embedded Systems"; "Column-associative Caches: a Technique for Reducing the Miss Rate of Direct-mapped Caches"
    Advantages: Proposed automatic, self-adaptable cache memory models for memory systems.
    Gaps: No measure to improve cache performance has been suggested.

[4], [5] "An overview of modern cache memory and performance analysis of replacement policies"; "Implementation of LRU Replacement Policy for Reconfigurable Cache Memory Using FPGA"
    Advantages: Analyzed the performance of cache memory in the context of replacement policies.
    Gaps: No improvement approach in terms of replacement policies was suggested.

[8], [9] "A Fine-Grained Configurable Cache Architecture for Soft Processors"; "Cache Memory Simulators: A Comparative Study"
    Advantages: Used a victim cache and proposed an average memory access time reduction technique.
    Gaps: The technique is beneficial; however, it improves cache performance only up to a certain extent.

[14] "Modelling and Enhancing Virtual Memory Performance in Logic Simulation"
    Advantages: Analyzed and suggested a technique to improve the performance of virtual memory as affected by the page fault frequency.
    Gaps: The technique alone is not sufficient to improve the performance of cache memory.

Based upon the tabulated summary and the reviewed literature, it is observed that there is a need for a comprehensive model which provides the improvements to cache memory performance in one place. Therefore, we suggest an architecture that brings together three improvement techniques, namely victim cache, sub-blocks, and memory banks. These techniques are implemented one after the other to improve the speed and performance of the cache relative to main memory.

IV. PROPOSED ARCHITECTURE FOR IMPROVING CACHE PERFORMANCE

As discussed in the previous sections, there is a dire need for detailed research in order to bridge the speed gap between cache memory and main memory. Thus, in this section, a solution to this problem is suggested in the form of an architecture.

In this architecture, different techniques are combined so that each stage removes the limitations of the previous one. This step-by-step architecture increases the speed of the cache memory by considering all of the techniques comprehensively, where each one of them is individually helpful in improving memory performance. Further, in this paper, different techniques and methodologies are examined to assess the usefulness of the proposed approach in improving performance. The architecture is illustrated by the flow diagram in Fig. 5 and then elaborated in detail along with its implementation.

Fig. 5. Proposed architecture for improving cache performance

Step 1: Reducing the Miss Rate Using a Victim Cache

A victim cache, which can also be viewed as a small hardware cache memory, is used to improve the effective hit rate of direct-mapped caches and to reduce the conflict miss rate. This cache is placed on the refill path of the Level 1 cache, so that whenever a cache line is evicted from the Level 1 cache it is cached again in the victim cache. Thus, any data evicted from the Level 1 cache automatically populates the victim cache. The victim cache is consulted only when a miss occurs in Level 1; if the victim-cache access results in a hit, the contents of the matching victim cache line and the Level 1 cache line are swapped.

Implementation of the Victim Cache: The behaviour of the victim cache in its interaction with the associated cache level is described below.

Cache Hit: No action.

Cache Miss, Victim Hit: The block in the victim cache and the block in the cache are swapped. The block newly stored in the victim cache is therefore treated as the most recently used block.

Cache Miss, Victim Miss: The block is fetched from the next level into the cache, and the block evicted from the cache is stored in the victim cache.

This can be explained through an example. Assume a direct-mapped cache with two blocks, A and B, that map to the same set, linked to a two-entry fully associative victim cache holding blocks C and D. The access pattern to be considered is A, B, A, B, ... and so on, as shown in Fig. 6.

Fig. 6. Implementation example

As shown in Fig. 6, when a victim cache hit occurs, blocks A and B are swapped and the LRU block of the victim cache does not change. Thus, the victim cache gives an illusion of associativity to the direct-mapped L1 cache, which in turn decreases the number of conflict misses.

If there are two cache memories, Level 1 and Level 2, with an exclusive policy (L1 and L2 never store the same memory location twice), L2 behaves like a victim cache for L1.
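The three cases above can be captured in a short simulation. The C sketch below models a direct-mapped L1 cache backed by a small fully associative victim cache; it is a simplified, tag-only illustration written for this text (the sizes and the FIFO victim replacement are assumptions), not the implementation evaluated in the paper.

    #include <stdbool.h>
    #include <stdint.h>

    #define L1_SETS     256
    #define VICTIM_WAYS 4

    static uint32_t l1_tag[L1_SETS];
    static bool     l1_valid[L1_SETS];
    static uint32_t victim_addr[VICTIM_WAYS];   /* full block addresses              */
    static bool     victim_valid[VICTIM_WAYS];
    static unsigned victim_next;                /* FIFO replacement (assumed policy) */

    /* Access one block address; returns true if served by L1 or the victim cache. */
    bool access_block(uint32_t block_addr)
    {
        unsigned set = block_addr % L1_SETS;
        uint32_t tag = block_addr / L1_SETS;

        if (l1_valid[set] && l1_tag[set] == tag)          /* cache hit: no action   */
            return true;

        for (unsigned w = 0; w < VICTIM_WAYS; w++) {      /* cache miss, victim hit */
            if (victim_valid[w] && victim_addr[w] == block_addr) {
                if (l1_valid[set])
                    victim_addr[w] = l1_tag[set] * L1_SETS + set;  /* swap blocks   */
                else
                    victim_valid[w] = false;
                l1_tag[set] = tag;
                l1_valid[set] = true;
                return true;
            }
        }

        if (l1_valid[set]) {                              /* cache miss, victim miss */
            victim_addr[victim_next] = l1_tag[set] * L1_SETS + set;
            victim_valid[victim_next] = true;
            victim_next = (victim_next + 1) % VICTIM_WAYS;
        }
        l1_tag[set] = tag;                                /* fetch from next level   */
        l1_valid[set] = true;
        return false;
    }

Running the access pattern A, B, A, B, ... from the example through access_block() shows that, after the first two conflict misses, every access is served either by L1 or by the victim cache.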
Step 2: Reducing the Penalty Ratio of Cache Misses

Performance can also be affected by the tags: they require considerable space, and this reduces the speed of the cache. To decrease the space required by the tags on the chip (in addition to making them shorter), large blocks are used. Because larger blocks reduce the number of compulsory misses, the miss rate may decrease as well. However, because the full block has to be transferred between the cache and the other memories, the miss penalty becomes high.

To overcome this problem, every block is divided into sub-blocks, as shown in Fig. 7, each of which has its own valid bit. Although the tag is valid for the whole block, only a single sub-block has to be read on a miss. A smaller miss penalty is therefore achieved, since the block is no longer the minimum unit transferred between the cache and memory.

Fig. 7. Division of a block into sub-blocks (one tag for the whole block and one valid bit per sub-block; the arrangement looks a lot like a write buffer):

Tag   Valid bits
100   1 1 1 1
300   1 1 0 0
200   0 1 0 1
204   0 0 0 0
Step 3: Decreasing Memory Access Time

A memory bank is a hardware-dependent logical unit of storage in electronics. In a computer, the memory bank is defined by the memory controller together with the physical organization of the hardware memory slots. A memory bank is a partition of the cache memory that is addressed sequentially across the entire set of memory banks: assuming that data item a(n) is stored in bank b, the following data item a(n+1) is stored in bank b+1. To avoid the impact of the bank cycle time, the cache memory is separated into several banks; if data is then saved or recovered sequentially, each bank has sufficient time to recover before the next request to the same bank is processed.

Each bank requires enough memory modules to provide the same number of data bits as the bus, and a bank can contain many memory modules, as illustrated in Fig. 8.

Fig. 8. Memory bank modules
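The sequential bank assignment described above is easy to express in code. The short C sketch below shows low-order interleaving across banks, where consecutive data items fall into consecutive banks; the bank count is an assumed value used only for illustration.

    #include <stdio.h>

    #define NUM_BANKS 4u   /* assumed number of banks */

    /* With low-order interleaving, item n lives in bank (n % NUM_BANKS),
     * so a(n) and a(n+1) always land in consecutive banks. */
    static unsigned bank_of(unsigned n)        { return n % NUM_BANKS; }
    static unsigned offset_in_bank(unsigned n) { return n / NUM_BANKS; }

    int main(void)
    {
        for (unsigned n = 0; n < 8; n++)
            printf("a(%u) -> bank %u, offset %u\n", n, bank_of(n), offset_in_bank(n));
        return 0;
    }

Consecutive requests therefore cycle through banks 0, 1, 2, 3, 0, 1, ..., giving each bank time to recover before it is addressed again.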
V. CONCLUSION AND FUTURE SCOPE

In this work, different methods for improving the performance of cache memory are proposed. The proposed methods are discussed in detail to bring out the advantages and limitations of each one, and several metrics are used to compare their behaviour. The main conclusions drawn from this study are as follows. The conflict miss rate can be decreased by taking a bigger block size, but this requires a larger cache. Using a bigger block size can raise the miss penalty ratio while decreasing the hit time and the power dissipation. A larger cache results in a slower access time and higher cost, whereas higher associativity results in a faster access time and consequently a smaller cycle time, or a smaller number of cycles. A victim cache always decreases the miss rate, but at a higher cost than a look-aside miss cache.

As future work, the suggested model will be implemented. Moreover, further improvements in cache performance will be proposed by considering methods that can predict future accesses of data and instructions.

REFERENCES

[1] V. Chaplot, "Virtual Memory Benefits and Uses," International Journal of Advance Research in Computer Science and Management Studies, vol. 4, no. 9, 2016.
[2] C. Zhang, F. Vahid, and R. Lysecky, "A Self-Tuning Cache Architecture for Embedded Systems," ACM Transactions on Embedded Computing Systems, vol. 3, no. 2, pp. 407-425, 2004.
[3] A. Agrawal and S. D. Pudar, "Column-associative Caches: a Technique for Reducing the Miss Rate of Direct-mapped Caches," in Proc. of the 20th Annual International Symposium on Computer Architecture, 1993, pp. 179-190.
[4] S. Kumar and P. K. Singh, "An overview of modern cache memory and performance analysis of replacement policies," in Proc. of the IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 2016, pp. 210-214.
[5] S. S. Omran and I. A. Amory, "Implementation of LRU Replacement Policy for Reconfigurable Cache Memory Using FPGA," in Proc. of the International Conference on Advanced Science and Engineering, 2018, pp. 13-18.
[6] K. Westerholz, S. Honal, J. Plankl, and C. Hafer, "Improving performance by cache driven memory management," in Proc. of the 1st IEEE Symposium on High Performance Computer Architecture, Raleigh, USA, 1995, pp. 234-242.
[7] Y. A. Divya, "An Efficient Virtual Memory using Graceful Code," International Journal of Trend in Scientific Research and Development, vol. 3, no. 4, pp. 623-626, 2019.
[8] M. Biglari, K. Barijough, M. Goudarzi, and B. Pourmohseni, "A Fine-Grained Configurable Cache Architecture for Soft Processors," in Proc. of the 18th CSI International Symposium on Computer Architecture and Digital Systems (CADS), 2015.
[9] A. Q. Ahmad, M. Masoud, and S. S. Ismail, "Average Memory Access Time Reduction Via Adding Victim Cache," International Journal of Applied Engineering Research, vol. 11, no. 19, pp. 9767-9771, 2016.
[10] Y. Kim and Y. H. Song, "Impact of processor cache memory on storage performance," in Proc. of the International SoC Design Conference (ISOCC), Seoul, 2017, pp. 304-305.
[11] W. P. Dias and E. Colonese, "Performance Analysis of Cache and Scratchpad Memory in an Embedded High Performance Processor," in Proc. of the 5th International Conference on Information Technology: New Generations, Las Vegas, USA, 2008, pp. 657-661.
[12] K. Singh and S. Khanna, "Split Memory Based Memory Architecture with Single-ended High Speed Sensing Circuit to Improve Cache Memory Performance," in Proc. of the 6th International Conference on Signal Processing and Communication (ICSC), Noida, India, 2020, pp. 188-193.
[13] R. Kolanski, "A logic for virtual memory," Electronic Notes in Theoretical Computer Science, vol. 217, pp. 61-77, 2008.
[14] S. P. Smith and J. Kuban, "Modelling and Enhancing Virtual Memory Performance in Logic Simulation," in Proc. of the IEEE International Conference on Computer-Aided Design, January 1988, pp. 264-265.
[15] M. T. Banday and M. Khan, "A study of recent advances in cache memories," in Proc. of the International Conference on Contemporary Computing and Informatics (IC3I), Mysore, India, 2014, pp. 398-403.
