
Weighting Cache Replace Algorithm for Storage System

Yihui Luo (1,2), Changsheng Xie (2), Chengfeng Zhang (2)

(1) School of Mathematics and Computer Science, Hubei University, Wuhan 430062, P.R. China
(2) National Storage System Laboratory, School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, P.R. China

Abstract

Usual cache replacement algorithms are based on the assumption that a higher cache hit ratio brings a lower average access time, which is true when the storage system has a uniform device access time. By analyzing the performance of the storage system, we draw the conclusion that the storage system obtains a lower average access time only when the cache hit ratios of the objects with long device access times are high. Based on this, we propose weighting cache replacement algorithms, namely Weighting LFU (WLFU), Weighting LRU (WLRU) and Weighting LFRU (WLFRU), which are designed to minimize both the average access time and the algorithm overhead. Experiments show that WLFU and WLRU perform better than usual algorithms such as LRU, and that WLFRU performs best.

Keywords: Cache replace algorithm, Average access time, Algorithm overhead, Weighting value

1. Introduction

With processor speeds increasing dramatically over the last few years and main memory density doubling every two years, the appetite for I/O continues to grow, especially with the development of applications such as multimedia and networking, which place an ever-increasing demand on the storage subsystem. Although storage device I/O speeds have increased greatly, they cannot satisfy the demand of the computer. As a result, the storage system becomes the bottleneck of the computer system, and the performance of the computer system is not optimal. Among the many methods for improving the performance of the storage system, one of the most effective is to place a cache in the storage system. Cache devices are usually fast storage devices of small capacity, so suitable replacement algorithms must be used in the cache to improve the performance of the storage system.

There are many cache replacement algorithms, such as FIFO, LFU, LRU and their varieties. All of them improve cache performance by increasing the cache hit ratio based on the locality of data references. But must a higher cache hit ratio bring better I/O performance in a storage system? If all storage devices in the system have the same speed, the conclusion holds. However, storage devices usually have different I/O speeds, in which case the conclusion no longer holds, so the usual algorithms are not optimal for the storage system. Based on this standpoint, we propose a new cache replacement algorithm named Weighting Least Frequently/Recently Used (WLFRU), which is designed not to maximize the cache hit ratio but to minimize the average access time of the storage system. To do so, WLFRU increases the cache hit ratios of objects with longer device access times, considering not only I/O locality but also the differences in device access time.

The rest of this paper is organized as follows. After reviewing related work, we analyze the I/O performance of a storage system with a cache, showing that a higher cache hit ratio does not necessarily result in a higher I/O speed, and concluding that a higher I/O speed is obtained only when the cache hit ratios of the objects with longer device access times are higher. Based on this conclusion, we present the Weighting Least Frequently Used (WLFU) algorithm, the Weighting Least Recently Used (WLRU) algorithm and WLFRU. Afterwards, we present experiments that validate our cache algorithms. Finally, we give conclusions and future work.

2. Related Works

There has been much research on cache replacement algorithms. The LRU algorithm always replaces the least recently used objects. Various approximations and improvements to LRU abound; see, for example, the enhanced clock algorithm [1]. If the workload or the request stream is drawn from an LRU Stack Depth Distribution (SDD), LRU is the optimal policy. LRU has several advantages: for example, it is simple to implement and responds well to changes in the underlying SDD model. However, while the SDD model captures "recency", it does not capture "frequency". The LFU algorithm replaces the least frequently used objects. A relatively recent algorithm, LRU-2 [2], approximates LFU: it remembers the last two request times of each object and replaces the object with the least recent penultimate reference. Algorithms that consider both recency and frequency include Frequency-Based Replacement (FBR) [3], Least Recently/Frequently Used (LRFU) [5] and Multi-Queue replacement (MQ) [6]. Building on these, the Adaptive Replacement Cache (ARC) algorithm was proposed in [7]. The basic idea behind ARC is to maintain two LRU lists of objects: one list, say L1, contains objects that have been seen only once "recently", while the other list, say L2, contains objects that have been seen at least twice "recently". L1 is viewed as capturing "recency" while L2 captures "frequency". Although ARC captures both frequency and recency, it does not consider the access cost of objects. Additionally, the size of L1 is resized very frequently, which may increase the algorithm overhead.

Ekow Otoo et al. gave a replacement algorithm for storage resource managers in data grids [4], which defines a utility function for each object to express its usage status. The function relates to reference frequency, object size and device access speed, but it has several drawbacks: it pays almost no attention to recent history; it does not adapt well to changing access patterns, since it accumulates stale objects with high frequency counts that may no longer be useful; moreover, it requires logarithmic implementation complexity in the cache size. Ulrich Hahn et al. offered a replacement algorithm called ObjectLRU, which takes into account the influence of various object properties on the replacement decision [8]. It uses a weighting function to evaluate combinations of object properties, which provides a more flexible approach, but its weight values are difficult to select and the cost of realizing the algorithm is not low.

3. The Performance Formula of Storage System

The basic goal of configuring a cache in a storage system is to improve its I/O performance. So, after describing the model of the hierarchical storage system, we first calculate the performance gain of a storage system with a cache. From the performance gain we draw the conclusions on which the design of our cache replacement algorithms is based, in order to optimize the I/O performance.

3.1. The Model of Hierarchical Storage System

The hierarchical storage system, shown in Figure 1, consists of a cache and storage devices. The cache is a small storage device with high speed, which may be the server's memory or its local disk. The storage devices include different devices such as disks, CD-ROMs, tapes and other storage devices, which may have different access speeds. The links between the cache and the devices may be buses or networks, and their communication speeds may differ; if a link is a network, its speed may also vary with the network load. To simplify the calculation, we assume that the communication time is contained in the access time of the devices and that the speeds do not vary with the network load. The accessed data may be data blocks or files in the storage system, and their sizes may be the same or different, so we call the accessed data "data objects", whose sizes may differ.

[Figure 1 depicts a server whose cache is connected to storage device 1, storage device 2 and storage device 3.]

Fig. 1: The model of hierarchical storage system.

3.2. The Performance Formula of Storage System

The performance gain is defined to be the ratio of the access times without and with the cache [8]:

    g = \frac{\text{access time without cache}}{\text{access time with cache}}    (1)
The factors influencing the I/O performance are the I/O latency of the devices, the hit ratio of the cache and the data sizes. To simplify the calculation, we assume that all data objects have the same size. On the assumption that the accessed data objects are {O_1, O_2, ..., O_n} and their access frequencies are {m_1, m_2, ..., m_n}, the total access frequency of the storage system is

    M = \sum_{i=1}^{n} m_i    (2)

Without a cache, the accessed data come from the different storage devices, whose access times are {t_1, t_2, ..., t_n}, so the total access time is

    T_o = \sum_{i=1}^{n} m_i t_i    (3)

If the storage system is configured with a cache, and the cache hit ratios of the objects are {p_1, p_2, ..., p_n}, the total cache hit ratio is

    P = \sum_{i=1}^{n} p_i    (4)

Suppose the access time of a data object in the cache is t_c; the total access time with the cache is

    T_c = \sum_{i=1}^{n} M p_i t_c + \sum_{i=1}^{n} (m_i - M p_i) t_i    (5)

Thus the performance gain is

    g = \frac{T_o}{T_c} = \frac{\sum_{i=1}^{n} m_i t_i}{\sum_{i=1}^{n} M p_i t_c + \sum_{i=1}^{n} (m_i - M p_i) t_i} = \frac{1}{1 - \dfrac{M \sum_{i=1}^{n} p_i (t_i - t_c)}{\sum_{i=1}^{n} m_i t_i}}    (6)

From formula (6) the following conclusion is drawn: it is not a high total cache hit ratio but high cache hit ratios of the data objects with long device access times that result in a high performance gain. The usual cache replacement algorithms are designed to increase the total cache hit ratio, so they are no longer appropriate in this case.

If the access times of the different devices are all the same, say t_i = t for every i, formula (6) can be expressed as

    g = \frac{1}{1 - \dfrac{M (t - t_c) \sum_{i=1}^{n} p_i}{\left(\sum_{i=1}^{n} m_i\right) t}} = \frac{1}{1 - \dfrac{(t - t_c) P}{t}}    (7)

In this case a higher total cache hit ratio does mean a higher I/O performance gain, and the usual cache replacement algorithms such as LFU, LRU and OPT are based on this conclusion.
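To make the contrast between formulas (6) and (7) concrete, the following short Python sketch evaluates formula (6) for a hypothetical two-object workload; the numbers are illustrative only and are not taken from the experiments in Section 5:

import math

def gain(m, t, p, tc):
    # Performance gain g from formula (6).
    # m[i]: access frequency of object i; t[i]: its device access time;
    # p[i]: its cache hit ratio, as a fraction of all M accesses;
    # tc: access time of an object in the cache.
    M = sum(m)
    To = sum(mi * ti for mi, ti in zip(m, t))
    return 1.0 / (1.0 - M * sum(pi * (ti - tc) for pi, ti in zip(p, t)) / To)

m, t, tc = [50, 50], [2.0, 20.0], 1.0    # one object on a fast device, one on a slow device
print(gain(m, t, [0.4, 0.0], tc))        # all hits on the fast object: g is about 1.04
print(gain(m, t, [0.0, 0.4], tc))        # all hits on the slow object: g is about 3.24

Both allocations have the same total cache hit ratio P = 0.4, yet caching the slow object yields roughly three times the gain of caching the fast one, which is exactly the effect the weighting algorithms below are designed to exploit.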
4. The Cache Replace Algorithms

There are usually two goals in designing cache replacement algorithms: one is to raise the cache hit ratio in order to obtain the least access time, and the other is to keep the cache algorithm simple in order to minimize the overhead [9].

As described in Section 3, the usual replacement algorithms are based on formula (7), so they are not optimal for a storage system with heterogeneous devices. In this section we propose algorithms based on formula (6), which are designed to obtain the least access time rather than the highest cache hit ratio. In addition, we also try to keep the cache replacement algorithms simple in order to minimize the algorithm overhead.

4.1. The WLFU algorithm

As described above, high cache hit ratios for the data objects with long device access times result in a high performance gain. We therefore define a weighting hit ratio for each object in the cache:

    p_i' = \lfloor t_i / t_{min} \rfloor \, p_i = c_i p_i    (8)

where t_min is the minimal device access time, t_i is the object's device access time, and c_i is an integer expressing the relative access time of O_i. We use c_i instead of t_i to keep the weighting value simple. t_min is maintained as follows: when the first data object is accessed, its device access time is taken as t_min; afterwards, if the device access time of an accessed object is not less than t_min, t_min is unchanged; otherwise the new device access time becomes t_min, and at the same time the weighting hit ratios of all objects in the cache are multiplied by the integer ratio of the old and the new t_min. The weighting cache hit ratio thus replaces the plain cache hit ratio, and the objects with the smallest p_i' in the cache are replaced first, which keeps objects with long device access times in the cache longer and improves their cache hits. According to formula (6), the average access time of the storage system is therefore small.

To keep the algorithm simple, the LFU algorithm usually replaces the hit ratio by the access frequency, and the WLFU algorithm uses the same technique. WLFU uses a queue to record the weighting frequency of every object in the cache. When an object that is not in the cache is inserted, its weight value is calculated and its weighting frequency is set to the weight value; at the same time t_min may change, in which case all weighting frequencies and weight values are renewed. When an object is hit in the cache, its weighting frequency is increased by its weight value. WLFU then selects the data object with the least weighting frequency as the victim when the cache needs space for a new data object. WLFU is described in Figure 2.

Input: the request stream x1, x2, ..., xi, ...
For every i >= 1, the following two cases must occur.
Case 1: If i = 1 then tmin = ti.
    Otherwise, if ti < tmin then:
        k = ⌊tmin / ti⌋, tmin = ti, and
        for every Oj in cache: pj = pj * k and cj = cj * k.
Case 2: If xi is in cache then:
        pi = pi + ci, and re-sort the p queue.
    Otherwise, the following two steps must occur.
        1: If the cache is full then delete the object with the minimal pi.
        2: ci = ⌊ti / tmin⌋, pi = ci, and insert pi into the queue.

Fig. 2: The Weighting Least Frequently Used algorithm.

This algorithm is designed under the assumption that all data objects in the storage system have the same size, but it also suits storage systems with different object sizes. When all data objects have the same size, the device access time can be read as the access time per unit of data. Although a larger object can increase its weighting frequency, it also occupies more cache space, so in a storage system with different object sizes the device access time is replaced by the access time per unit of data, and the same algorithm is used. In that case many objects may have the same weighting frequency but different sizes; which one should be replaced? Our scheme is to select the largest object, because writing out a large object reduces the replacement latency.

The following algorithms are likewise designed for a storage system with a single object size, but they are all suitable for storage systems with different object sizes for the above reasons.
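As an illustration of Figure 2, the following Python sketch implements the WLFU bookkeeping for same-sized objects. It is our reading of the pseudocode, not the authors' code: the class and variable names are ours, and a linear scan of a dictionary stands in for the sorted p queue.

import math

class WLFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tmin = None   # minimal device access time seen so far
        self.freq = {}     # object -> weighting frequency p_i (the p queue)
        self.weight = {}   # object -> weight value c_i = floor(t_i / tmin)

    def access(self, obj, t):
        # Case 1: maintain tmin; if it shrinks, rescale every cached
        # object by the integer ratio of the old and new tmin (Fig. 2).
        if self.tmin is None:
            self.tmin = t
        elif t < self.tmin:
            k = math.floor(self.tmin / t)
            self.tmin = t
            for o in self.freq:
                self.freq[o] *= k
                self.weight[o] *= k
        # Case 2: on a hit, raise the weighting frequency by the weight
        # value; on a miss, evict the minimal-p_i object and insert the newcomer.
        if obj in self.freq:
            self.freq[obj] += self.weight[obj]
        else:
            if len(self.freq) >= self.capacity:
                victim = min(self.freq, key=self.freq.get)
                del self.freq[victim], self.weight[victim]
            c = math.floor(t / self.tmin)
            self.weight[obj] = c
            self.freq[obj] = c

The min() scan makes eviction O(n) in the cache size; a heap or a sorted structure would match the paper's sorted queue more closely.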
4.2. The WLRU algorithm

The WLFU algorithm captures the notion of frequency, but it pays almost no attention to recent history and does not adapt well to changing access patterns, since it accumulates stale objects with high frequency counts that may no longer be useful. The WLRU algorithm is designed to capture recency. To design it, we first look at the LRU algorithm. LRU uses recency instead of frequency, which reduces the algorithm overhead. LRU realizes the object queue with a recency stack: the top object of the LRU stack was accessed most recently and the bottom object least recently. LRU selects the object at the bottom of the stack for replacement when the storage system needs cache space for a new object.

To account for the differences in device access time, the WLRU algorithm attaches a weight value k_i to the recency of each object. Precisely, if an object has a long device access time, its recency is multiplied by a small weight value; otherwise, its recency is multiplied by a large weight value. k_i is the integer calculated as

    r_i' = \lfloor t_{max} / t_i \rfloor \, r_i = k_i r_i    (9)

To maintain a queue with weighted recency, WLRU sets a window at the top of the LRU stack. Inside the window a new object can be inserted anywhere; outside the window, the order of objects does not change unless objects are deleted. The window size is set to the integer ratio of the maximal and the minimal device access times. When the first object is put in the cache, its device access time is recorded as both the maximal and the minimal device access time and the window size is set to 1. Afterwards, when a new object is put in the cache, if its device access time is greater (or less) than the maximal (or minimal) device access time, the maximum (or minimum) is updated to the new device access time and the window size is renewed; otherwise the window size is unchanged.

Input: the request stream x1, x2, ..., xi, ...
Initialization: window = 1
For every i >= 1, the following four cases must occur.
Case 1: If i = 1 then tmin = tmax = ti.
    Otherwise, one and only one of the following two cases must occur.
        1: If ti < tmin then tmin = ti.
        2: If ti > tmax then, for every Oj in cache, kj = kj * ⌊ti / tmax⌋, and tmax = ti.
Case 2: k = ⌊tmax / tmin⌋; if k > window then window = k.
Case 3: If xi is not in cache then the following two steps must occur.
        1: If the stack is full then delete the object at the bottom of the stack
           and evict it from the cache.
        2: ki = ⌊tmax / ti⌋.
    Otherwise, delete xi from the stack.
Case 4: If the ki-th tier is not empty then move all the consecutive Oj whose
    position >= ki down to the next tier. Put xi in the ki-th tier of the stack.

Fig. 3: The Weighting Least Recently Used algorithm.

The WLRU algorithm is realized with a stack as in Figure 3. Case 1 revises the maximal and minimal device access times; at the same time, the weight values of all objects in the cache are renewed if the maximal device access time changes. Case 2 changes the size of the window as the maximal/minimal device access time is revised. Case 3 deletes the hit object from the stack, or replaces the weighting least recently accessed object in the stack. Case 4 inserts the recently accessed object into the window according to its weight value.
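The tiered stack of Figure 3 can be approximated in a few lines. The sketch below is one possible reading, with names of our own choosing: instead of rescaling the stored k_j when t_max changes, it simply recomputes k = ⌊t_max / t_i⌋ on every access and reinserts the object k - 1 positions below the top of the stack, so objects on fast devices sink toward the eviction end more quickly.

import math

class WLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tmax = None    # maximal device access time seen so far
        self.stack = []     # index 0 = top (most recent); last = eviction victim
        self.t = {}         # cached object -> device access time

    def access(self, obj, t):
        # Case 1: maintain tmax (tmin is only needed for the window size,
        # which is implicit here because k never exceeds tmax / tmin).
        self.tmax = t if self.tmax is None else max(self.tmax, t)
        # Case 3: a hit is removed for reinsertion; a miss may evict the bottom.
        if obj in self.t:
            self.stack.remove(obj)
            del self.t[obj]
        elif len(self.stack) >= self.capacity:
            victim = self.stack.pop()
            del self.t[victim]
        # Case 4: weighted reinsertion, k - 1 tiers below the top.
        k = math.floor(self.tmax / t)   # weight value k_i >= 1
        self.stack.insert(min(k - 1, len(self.stack)), obj)
        self.t[obj] = t

With equal device access times, k is always 1, every object re-enters at the top, and the sketch degenerates to plain LRU, which is consistent with formula (7).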
4.3. The WLFRU algorithm

The WLFU algorithm captures frequency and the WLRU algorithm captures recency; each utilizes only one part of I/O locality to improve the performance of the storage system. The WLFRU algorithm is a comprehensive one that uses both frequency and recency to select the replacement victim.

To utilize frequency locality, we use two variable-sized stacks, Q1 and Q2, to record the history of the cached objects. The first holds objects that have been accessed only once recently and the second holds objects that have been accessed at least twice recently, so the objects in Q1 (or Q2) are the less (or more) frequently accessed ones. To utilize recency locality, the objects in both Q1 and Q2 are ordered according to WLRU. Suppose the cache size is C and the maximal size of Q1 is L; then Q1 and Q2 satisfy

    0 <= |Q1| <= L,  0 <= |Q2| <= C,  0 <= |Q1| + |Q2| <= C.

L is continually revised to improve the utility of the cache, and at the same time the revision reduces thrashing in Q1. For example, if some objects are accessed cyclically and the size of Q1 is less than the total size of those objects, the objects are frequently replaced and their cache hit ratio is 0; in that case L must be increased. However, if L is too big, Q2 is small, which may cause frequently accessed objects to be replaced. The algorithm revises L according to an accumulating integer k: k starts at 0, and it is increased (or decreased) whenever a replacement takes place in Q1 (or Q2). L is changed only when k has accumulated to some threshold (such as 10), which reduces the number of revisions of L.

The WLFRU algorithm is described in Figure 4. L is initialized to C/2, which also reduces the revising of L. In Case 1 a cache hit takes place and the hit object is inserted into Q2 according to WLRU. In Case 2 a cache miss takes place and the object is inserted into Q1 according to WLRU. A cache replacement may take place in Case 2: if |Q1| = L, the replacement takes place in Q1 regardless of whether the cache is full, so k increases, and L may increase once k exceeds the threshold. If |Q1| < L and |Q1| + |Q2| = C, then L is too big, so the replacement takes place in Q2 and k is decreased in order to reduce L.

Input: the request stream x1, x2, ..., xi, ...
Initialization: L = C/2, window1 = window2 = 1, k = 0
For every i >= 1, one and only one of the following two cases must occur.
Case 1: xi is in Q1 or Q2: cache hit; insert xi into Q2 according to WLRU.
Case 2: xi is neither in Q1 nor in Q2: cache miss. One and only one of the
    following two cases must occur.
        1: If |Q1| = L then:
            a) replace the WLRU object in Q1 with xi, and k = k + 1;
            b) if k > 10 then L = L + 1.
        2: If |Q1| < L then:
            a) if |Q1| + |Q2| = C then delete the WLRU object in Q2, and k = k - 1;
            b) insert xi into Q1 according to WLRU;
            c) if k < -10 then L = L - 1.

Fig. 4: The Weighting Least Frequently/Recently Used algorithm.
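Combining the sketches above gives a compact approximation of Figure 4, reusing the WLRUCache class from Section 4.2. This is again our reading with illustrative names; in particular, we reset k to 0 and clamp L after each adjustment, details that the figure leaves open.

class WLFRUCache:
    def __init__(self, capacity, threshold=10):
        self.C = capacity
        self.L = max(capacity // 2, 1)  # target size of Q1
        self.k = 0                      # accumulated replacement imbalance
        self.threshold = threshold
        self.q1 = WLRUCache(capacity)   # seen once recently
        self.q2 = WLRUCache(capacity)   # seen at least twice recently

    def _drop(self, q, obj):
        q.stack.remove(obj)
        del q.t[obj]

    def access(self, obj, t):
        if obj in self.q1.t or obj in self.q2.t:
            # Case 1: cache hit; promote into Q2 in weighted-recency order.
            self._drop(self.q1 if obj in self.q1.t else self.q2, obj)
            self.q2.access(obj, t)
        elif len(self.q1.t) >= self.L:
            # Case 2.1: Q1 is at its target size; replace there, and grow L
            # once k exceeds the threshold.
            self._drop(self.q1, self.q1.stack[-1])
            self.q1.access(obj, t)
            self.k += 1
            if self.k > self.threshold:
                self.L, self.k = min(self.L + 1, self.C), 0
        else:
            # Case 2.2: if the cache is full, replace in Q2 instead, and
            # shrink L once k falls below -threshold.
            if len(self.q1.t) + len(self.q2.t) >= self.C:
                self._drop(self.q2, self.q2.stack[-1])
                self.k -= 1
                if self.k < -self.threshold:
                    self.L, self.k = max(self.L - 1, 1), 0
            self.q1.access(obj, t)

Like the paper's version, the sketch adjusts L only after the imbalance counter crosses the threshold, so the revising of L stays infrequent.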
5. Experiment and Results

The cache hit ratio is usually used to evaluate the performance of a cache, since it shows how much the cache can reduce device I/O. However, as described above, a higher cache hit ratio does not mean a lower average access time in a storage system. We therefore use the average access time to evaluate the performance of the storage system; at the same time, we also measure the cache hit ratios for comparison with the average access times.

We used the Disksim simulator, developed by Carnegie Mellon University, to simulate the cached storage system. Disksim is an efficient, accurate and highly configurable disk system simulator developed to support research into various aspects of storage subsystem architecture [10]. Disksim contains a cache module that can simulate cache replacement algorithms such as LRU, and we programmed it to realize the algorithms WLFU, WLRU and WLFRU. We used the synthetic traces contained in Disksim to simulate these algorithms and compare their performance.

The storage system was configured with two disks. The average device access time was held constant while the ratio of the device access times was set to 1, 4 and 16. We measured both the cache hit ratios and the average response times of the storage system under LRU, WLFU, WLRU and WLFRU. The cache hit ratios are shown in Figure 5: WLFU and WLRU have cache hit ratios about as high as LRU, and WLFRU has the highest cache hit ratio. Since WLRU and WLFU raise the hit ratios of objects with longer device access times while reducing the hit ratios of objects with shorter device access times, they do not improve the cache hit ratio compared to LRU. However, WLFRU uses both frequency and recency to capture locality, so its cache hit ratio is higher than that of the other two.

[Figure 5 plots the cache hit ratio (0 to 0.35) of LRU, WLFU, WLRU and WLFRU against device access time ratios of 1, 4 and 16.]

Fig. 5: The cache hit ratios vary with the ratio of device access time.

The average response times of the storage system are shown in Figure 6: WLRU and WLFU have lower average response times than LRU, and WLFRU has the lowest average response time. This is because WLFU, WLRU and WLFRU consider not only access locality but also the differences in device access time.

[Figure 6 plots the average access time (0 to 30 ms) of LRU, WLFU, WLRU and WLFRU against device access time ratios of 1, 4 and 16.]

Fig. 6: The average access time varies with the ratio of device access time.
6. Conclusions and the Future Work

Configuring a cache in a storage system is an effective means to improve its performance. When the effect of device access time is considered, a higher cache hit ratio does not mean a lower average access time. Based on this conclusion, we propose the WLFU, WLRU and WLFRU algorithms, which consider not only I/O locality but also the device access time, so they achieve higher performance than the usual algorithms. However, these algorithms do not consider the effects of network bandwidth and I/O load. Additionally, although WLFRU uses the accumulated counter k to revise L, how to select the threshold as the I/O load varies is not considered. In the future, we will research cache replacement algorithms that adapt to varying network bandwidth and I/O load, and we plan to improve the WLFRU algorithm to suit different I/O loads. Cache algorithms for distributed caches and cooperative caches are also among our goals.

Acknowledgement

This work is partially supported by the National Natural Science Foundation of China (Grant No. 60273073).

References

[1] R. W. Carr and J. L. Hennessy, "WSClock – a simple and effective algorithm for virtual memory management," in Proc. 8th Symp. on Operating System Principles, pp. 87–95, 1981.
[2] E. J. O'Neil, P. E. O'Neil, and G. Weikum, "An optimality proof of the LRU-K page replacement algorithm," J. ACM, 46: 92–112, 1999.
[3] J. T. Robinson and M. V. Devarakonda, "Data cache management using frequency-based replacement," in Proc. ACM SIGMETRICS Conf., pp. 134–142, 1990.
[4] E. Otoo, F. Olken, and A. Shoshani, "Disk cache replacement algorithm for storage resource managers in data grids," in Proceedings of the IEEE/ACM SC2002 Conference, November 2002.
[5] D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim, "LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies," IEEE Trans. Computers, 50: 1352–1360, 2001.
[6] Y. Zhou and J. F. Philbin, "The multi-queue replacement algorithm for second level buffer caches," in Proc. USENIX Annual Tech. Conf. (USENIX 2001), Boston, MA, pp. 91–104, June 2001.
[7] N. Megiddo and D. S. Modha, "ARC: A self-tuning, low overhead replacement cache," in Proceedings of the Second USENIX Conference on File and Storage Technologies (FAST), pp. 115–130, San Francisco, CA, March 2003.
[8] U. Hahn, W. Dilling, and D. Kallta, "Improved adaptive replacement algorithm for disk caches in HSM systems," in Proc. IEEE Symposium on Mass Storage Systems, pp. 128–140, March 1999.
[9] A. Singh et al., "A hybrid access model for storage area networks," in Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2005), 2005.
[10] J. S. Bucy et al., The DiskSim Simulation Environment Version 3.0 Reference Manual. School of Computer Science, Carnegie Mellon University, 2003.
