PSAC Proactive Sequence-Aware Content Caching Via Deep Learning at The Network Edge

This document proposes a new strategy called PSAC (Proactive Sequence-Aware Content Caching) that uses deep learning models for proactive content caching at network edges. It aims to improve user experience and reduce network load. PSAC includes two models - PSAC_gen uses CNN for general content, and PSAC_seq uses attention mechanisms to capture sequential features in content. Experiments show PSAC can effectively improve performance while reducing resource usage, which is important for resource-limited edge devices. This is a novel approach that considers both general and sequential content for proactive caching.


IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, VOL. 7, NO. 4, OCTOBER-DECEMBER 2020

PSAC: Proactive Sequence-Aware Content Caching via Deep Learning at the Network Edge

Yin Zhang, Senior Member, IEEE, Yujie Li, Ranran Wang, Jianmin Lu, Xiao Ma, Member, IEEE, and Meikang Qiu, Senior Member, IEEE

Abstract—Compared with traditional ineffective methods, such as acquiring more spectrum and deploying more base stations, edge caching is a highly promising solution for increased data flow needs and has attracted considerable attention. However, owing to the lack of careful consideration of cached data, existing related methods neither reduce network load nor improve the quality of experience. In this study, we propose a proactive sequence-aware content caching strategy (PSAC). Specifically, for general content at the network edge and for content with sequential features, PSAC_gen (based on a convolutional neural network) and PSAC_seq (based on an attention mechanism that can automatically capture sequential features), respectively, are proposed to implement proactive caching. Experiments demonstrate that the proposed deep learning content caching method can effectively improve user experience and reduce network load.

Index Terms—Content caching, sequence-aware, deep learning, edge caching, cognitive networks.

I. INTRODUCTION

EDGE caching in the 5G environment is a promising new solution for high-load network transmission and has attracted considerable attention [1]. Although the amount of content requested by users is increasing, the storage space and computing resources of edge nodes are usually limited [2]; thus, they cannot cache all popular content. An advanced caching strategy at the network edge is valuable for improving the overall networking performance, decreasing the access delay, and enhancing the user experience [3]-[5]. In addition, as an emerging research topic, edge caching still faces several challenges, such as cache framework design [6] and content replacement [7], [8]. Furthermore, with the increase in user requirements, the content to be cached is becoming increasingly diverse, and sequential user requests are more frequent [9].

Unfortunately, although Zeydan et al. [10] proposed adopting collaborative filtering in the recommendation system for achieving proactive content caching, this simple machine learning method usually requires large-scale offline computing, which is not suitable for processing the time-sequential, complex, and real-time content required by users. In summary, content caching currently faces the following challenges:

- Performance improvement. With the increase in user data demand, the network load rate and user service quality are difficult to improve under the existing content caching strategies [11].
- Limited resources. As edge devices are inherently short of resources, reducing algorithm time and space consumption should be considered in the deployment of content caching policies [12].
- Sequential data. Several user requests have sequential characteristics (such as a series of web page requests in a browser or a series of user requests in a short time on a video playback site), but the existing caching mechanism lacks pertinence for sequential data processing. Thus, it is difficult to extract sequential behavioral patterns for improved user experience [13].

To address these challenges, we propose proactive sequence-aware content caching (PSAC) via deep learning at the network edge. In particular, a PSAC_gen model based on a convolution kernel and an attention mechanism is proposed for general content, and a PSAC_seq model based on self-attention is proposed for sequential content. Sufficient experiments verify that the proposed models can reduce the network traffic load, improve the quality of experience (QoE), and reduce resource consumption. Specifically, the main contributions of this paper are as follows:

- First, we construct a proactive caching strategy based on deep learning. This strategy can achieve a smaller traffic load and higher user QoE. In addition, it is experimentally demonstrated that the proposed strategy is more advantageous in terms of resource consumption; this is desirable in resource-poor edge devices.
- Considering the characteristics of sequential content, we develop different cache models for sequential and general data. In particular, the sequence-aware cache PSAC_seq model is more efficient than ordinary deep learning models based on Recursive Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) owing to its transformer-type attention mechanism.

Manuscript received January 30, 2020; revised March 26, 2020; accepted April 24, 2020. Date of publication April 27, 2020; date of current version December 30, 2020. This work was supported by the China National Natural Science Foundation under Grant 61702553. Recommended for acceptance by Dr. Huimin Lu. (Corresponding author: Meikang Qiu.)
Yin Zhang is with the University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]).
Yujie Li is with Yangzhou University, Yangzhou 450001, China (e-mail: [email protected]).
Ranran Wang, Jianmin Lu, and Xiao Ma are with the Zhongnan University of Economics and Law, Wuhan 430073, China (e-mail: [email protected]; [email protected]; [email protected]).
Meikang Qiu is with Texas A&M University Commerce, TX 75428 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNSE.2020.2990963
2327-4697 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

The paper is organized as follows. Section II presents related work. Section III details the two proposed caching strategies. Section IV presents the experiments. Section V concludes the paper.

II. RELATED WORK

In this section, we present studies concerned with proactive content caching based on conventional machine learning as well as deep learning.

A. Conventional Proactive Content Caching

Several studies on proactive content caching have been conducted. For example, Muller et al. [14] proposed a proactive caching algorithm that learns the popularity of related content by regularly observing user- and context-related information. Zhou et al. [15] focused on energy cost and proactive content caching, and they proposed a green delivery framework to improve the quality of content caching services and reduce energy consumption costs. Manzoor et al. [16] proposed proactively caching potentially popular content in the future by considering user mobility. Zeydan et al. [10] proposed the use of collaborative filtering to predict the content that users may request in the future. To maximize QoE, Chen et al. [17] proposed a machine learning algorithm based on conceptor echo state networks for predicting users' movement patterns and the distribution of requested content. Chang et al. [18] discussed a learning-based edge caching method in which a large amount of data can be used for content popularity estimation and proactive cache strategy design.

Although most of these studies are useful for reducing network latency and traffic load, the improvements are limited owing to the simplicity of the methods involved. Thus, the proposed PSAC uses deep learning to further improve the performance of the cache model.

B. Deep Learning-Based Content Caching

In recent years, deep learning has achieved great success in various fields, and several studies have been concerned with its application to content caching at the complex network edge. Zhong et al. [19] proposed a deep Q-Network framework based on deep reinforcement learning for maximizing the cache hit ratio. Dai et al. [20] proposed a joint edge computing and caching mechanism based on deep reinforcement learning for dynamically orchestrating edge computing and caching resources to improve resource utilization in the Internet of Vehicles. To improve the caching and computing performance of vehicle-to-vehicle networks, Tan et al. [21] proposed a deep reinforcement learning method based on multiple time scales and a motion-perception reward estimation method based on a large-scale event model.

Of course, sequential content characteristics are known. For example, Ale et al. [22] proposed an online proactive caching scheme based on a bidirectional deep RNN model to predict content requests in time series and accordingly update the edge cache. Narayanan et al. [23] proposed a deep caching framework using a powerful deep RNN model consisting of object feature prediction and caching strategy components based on a long short-term memory encoder–decoder.

Although these approaches are slightly better than conventional content caching methods, they still do not resolve the time and space resource consumption problems at the network edge, and a more efficient learning method should be developed to reduce traffic load and enhance user QoE. In addition, the sequential characteristics of user-requested content have been noticed [9], [22], [23], but the performance (such as solving long-distance dependence, parallelism, or operation efficiency) of the adopted RNN or CNN models and even ordinary machine learning frameworks is not as good as that of self-attention models [24], [25].

Considering this, we propose the PSAC strategy to address the challenges of network performance bottlenecks, resource consumption, and content serialization. In our carefully designed PSAC strategy, we use two deep learning models to improve cache performance; in particular, a sequence-aware model for sequential content.

III. ARCHITECTURE OF PSAC

A. Design Issues

It has been claimed that caching, in advance, content that may be popular in the future at the network edge may reduce peak traffic and result in higher QoE [26], [27]. In this case, one should be concerned with the objects that are proactively cached, that is, what is predicted to be popular in the future. User-requested content generally includes the following types:

1) General Content: This includes, for instance, real-time streaming data and HTTP requests, which usually require an immediate response from the server. The requested data are usually transmitted to the user's device as soon as possible. If a proactive caching strategy is adopted to anticipate these user needs in advance, the quality of service will obviously be greatly improved. Qiao et al. [28] proposed reducing latency after user requests using caching. For general user needs, we designed PSAC_gen to predict user interests. This caching strategy uses convolution and self-attention, and it is based on historical requests (only relevant to the current user; they can be user requests in a time sequence but are not limited to this content). It achieves user interest prediction with high accuracy and thereby reduces traffic load and improves QoE.

2) Sequential Content: Most user requests in daily life are sequential. For example, on browsers, at e-commerce sites, and in several other scenarios, requests usually have a certain timing. If we can predict the content of forthcoming requests according to sequential historical requests, and this content is proactively cached at the network edge, we can reduce the peak traffic congestion of the backbone network and improve QoE. If a user request is context-dependent (for example, a request for the <'Toy Story 3', 'Toy Story 4', 'Finding Nemo'> movie series), then the future requests may also be in the

Fig. 1. Framework of PSAC.

form of a sequence (for example, "Finding Dory" or "Frozen II"). User requests are serialized in this manner in several scenarios, and for such sequential requests we use the sequence-aware PSAC_seq model to predict the user's future interests.

Considering the adequate capacity of deep learning for feature extraction, PSAC_gen, based on CNN and self-attention, is developed to extract semantic features from general content, while PSAC_seq, based on a deep neural network and self-attention, is developed to extract sequential features from sequential content. Fig. 1 shows the framework of PSAC. For general-content requests, according to PSAC_gen, the predicted popular content is stored at a higher position, in the evolved packet core or radio access network. For sequential personalized needs that are more apparent to the user, we use PSAC_seq to predict the corresponding content near the edge of user devices. Details are provided in the following.

B. PSAC_gen for General Content

For general-content requests, we propose the PSAC_gen model, which uses a self-attention layer. The attention mechanism can learn the dependencies between various pieces of content better than a simple convolution kernel. Therefore, in PSAC_gen, we use self-attention to replace the vertical convolutional layer in [29]. For the user request, the edge device uses PSAC_gen for edge caching. Fig. 2 illustrates the framework of PSAC_gen, while Table I introduces the details of the related parameters. It can be divided into four parts: input, embedding, pattern-capture, and fully connected layers.

1) Input Layer: Our goal is to predict the top-N pieces of content that users may request in the future based on past requests. Before sending data to the model, we should construct inputs that meet the requirements. If the user set and the requested content set are $U = \{u_1, u_2, u_3, \ldots, u_{|U|}\}$ and $R = \{r_1, r_2, r_3, \ldots, r_{|R|}\}$, respectively, the content sequence requested by each user $u$ in the past can be expressed as $RS^u = \{RS^u_1, RS^u_2, RS^u_3, \ldots, RS^u_{|RS^u|}\}$, where $RS^u_t \in R$ is in chronological order, that is, the subscript $t$ represents the position of a piece of content in the entire sequence.

After obtaining the content sequence requested by user $u$ in the past, we extract every $L$ successive pieces of content to predict their next $T$ pieces from the sequence. This is performed by sliding a window of size $L + T$ over the sequence, where each window generates a training instance for $u$, denoted by a triplet ($u$, previous $L$ pieces of content, next $T$ pieces of content).

2) Embedding Layer: The embedding layer mainly maps user $u$ and each piece of content to the corresponding vector space and obtains $d$-dimensional dense vectors, which are denoted by $E^u$ and $E^c$, respectively. Each piece of content is represented by a dense vector of $d$ dimensions; then, the $L$ pieces of content are combined to form an $L \times d$ matrix $C^{(u,t)} \in \mathbb{R}^{L \times d}$, that is,

$$C^{(u,t)} = \begin{bmatrix} E^c_{RS^u_{t-L}} \\ \vdots \\ E^c_{RS^u_{t-2}} \\ E^c_{RS^u_{t-1}} \end{bmatrix} \qquad (1)$$

where $E^c_{RS^u_i}$ is the vector representation of the $i$-th piece of content in the previous $L$ pieces.

3) Patterns-Capture Layer: As is well known, CNNs have a strong ability to capture local features. PSAC_gen has a vertical convolution layer that uses multiple convolution kernels of different shapes to move along the row direction of the embedding matrix $C$ to capture union-level features, where several pieces of content requested in the past jointly influence the content to be predicted.
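As a concrete illustration, the sliding-window construction of training instances described in the Input Layer above can be sketched as follows (a minimal sketch; the function and variable names are ours, not the paper's):

```python
def make_instances(user, seq, L, T):
    """Slide a window of size L + T over one user's request sequence and
    emit training triplets (user, previous L pieces, next T pieces)."""
    instances = []
    for start in range(len(seq) - L - T + 1):
        prev = seq[start:start + L]          # previous L pieces of content
        nxt = seq[start + L:start + L + T]   # next T pieces to predict
        instances.append((user, prev, nxt))
    return instances
```

For example, with L = 3 and T = 1, a sequence of five requests yields two training instances, each pairing three observed requests with the one that followed them.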

Fig. 2. Framework of PSAC_gen.

TABLE I
DETAILS OF THE RELATED PARAMETERS.

Moreover, PSAC_gen replaces the vertical convolutional layer of Caser in [29] with a self-attention layer. The reason is that, as each piece of content is mapped to a $d$-dimensional vector, only the entire $d$-dimensional vector can represent the content, and the vertical convolution layer in Caser has an $L \times 1$ convolution kernel performing only one convolution at a time. It is conceivable that this does not accurately reflect the dependency relationship between the pieces of content. The specific implementation process of the vertical convolutional and self-attention layers is as follows.

a) Vertical convolution layer: The vertical convolution layer has $n$ convolution kernels $f^k$ ($1 \le k \le n$) of shape $h \times d$, where $h \in \{1, 2, \ldots, L\}$ is the height of the convolution kernel. In our method, $n = 16$. Each convolution kernel performs a convolution operation from top to bottom with every $h$ rows of the embedding matrix $C$. Then, the final result of the convolution kernel $f^k$ is a vector

$$c^k = [c^k_1, c^k_2, \ldots, c^k_{L-h+1}], \qquad (2)$$

where $c^k_i$ is the result of the $i$-th convolution operation. Then, we perform the maximum pooling operation on $c^k$, and thus the final output $o \in \mathbb{R}^n$ of the $n$ filters is

$$o = \{\max(c^1), \max(c^2), \ldots, \max(c^n)\}. \qquad (3)$$

b) Self-attention layer: The self-attention mechanism can obtain the dependency relationship between the pieces of content. For the matrix $C$ passed from the embedding layer, the self-attention layer uses a feed-forward network with a tanh activation function, and each element in the matrix is evaluated for its attention value $a$ to other elements. Thus, a new fusion representation $attn$ is obtained using this attention value, as shown in the following equations:

$$s_{ij} = V^T \tanh\left(w_j E^c_{RS^u_{t-j}} + w_i E^c_{RS^u_{t-i}}\right) \qquad (4)$$

$$a_{ij} = \frac{\exp(s_{ij})}{\sum_{j=1}^{L} \exp(s_{ij})} \qquad (5)$$

$$attn_i = \sum_{j=1}^{L} a_{ij} E^c_{RS^u_{t-j}} \qquad (6)$$

where $V^T$, $w_j$, and $w_i$ are the parameters to be learnt.

4) Fully Connected Layer: We combine the information obtained from the convolutional and self-attention layers, that is, we connect the outputs of the two layers and input them into a fully connected neural network layer (shown in part 4 of Fig. 2) to obtain more high-level and abstract features, as follows:

$$z = f_f\left(w \begin{bmatrix} o \\ attn \end{bmatrix} + b\right) \qquad (7)$$

where $f_f(\cdot)$ represents the activation function of the fully connected layer, $w$ represents the weight matrix, and $b$ is the bias. $z$ contains the user's short-term content request information.
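The vertical convolution and max-pooling of eqs. (2)-(3) can be sketched with NumPy as follows. This is our own illustration, not the paper's code; it assumes each "convolution operation" is the elementwise product-sum of a kernel with $h$ consecutive rows of $C$, as in Caser:

```python
import numpy as np

def vertical_conv_maxpool(C, kernels):
    """Eqs. (2)-(3): each kernel f^k of shape (h, d) slides from top to
    bottom over the L x d embedding matrix C; the resulting vector c^k of
    length L - h + 1 is then max-pooled into one scalar per kernel."""
    L = C.shape[0]
    o = []
    for f in kernels:
        h = f.shape[0]
        ck = [float(np.sum(C[i:i + h] * f)) for i in range(L - h + 1)]  # eq. (2)
        o.append(max(ck))                                               # eq. (3)
    return np.array(o)  # o in R^n, one value per filter
```

With $n$ kernels of varying heights, the output is the $n$-dimensional vector $o$ that is later concatenated with the self-attention output.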

   
To obtain the user's long-term content request information, we add a user attention layer to the content sequence. This represents a weighting of each piece of content by the user, that is, what content is frequently requested, reflecting the general preference of the user. The specific implementation is as follows:

1) Using the following dot product, we calculate the similarity matrix $S$ of the user embedding vector $E^u$ and $z$:

$$S = (E^u)^T z \qquad (8)$$

2) Based on the similarity matrix $S$, we obtain the user attention vector $u\_i\_attn$ as follows:

$$a_t = \mathrm{softmax}(S_{t:}) \qquad (9)$$

$$u\_i\_attn = \sum_j a_{tj} z_j, \qquad \sum_j a_{tj} = 1 \qquad (10)$$

where $S_{t:}$ represents the $t$-th row vector of the similarity matrix $S$, and $a_t$ represents the normalization result; then, the weighted sum is used to obtain the representation containing the user preference information. After obtaining the user attention vector to the piece of content, $u\_i\_attn$, we concatenate it with the user embedding vector $E^u$, and we input the result into an output layer with $|R|$ nodes to obtain

$$y^{(u,t)} = \sigma\left(w' \begin{bmatrix} E^u \\ u\_i\_attn \end{bmatrix} + b'\right) \qquad (11)$$

where $\sigma(x) = 1/(1 + e^{-x})$, $w'$ represents the weight matrix, $b'$ is the bias, and $y^{(u,t)}$ is the probability that the user will request the piece of content $i$ at time $t$.

5) Loss Function: With the concept of pairwise learning, we randomly sample three negative samples for each prediction target $i$. Thus, we use binary cross-entropy as our loss function, as follows:

$$loss = -\sum_u \sum_{t \in Time^u} \left( \sum_{i \in D^u_t} \log\left(y_i^{(u,t)}\right) + \sum_{j \ne i} \log\left(1 - y_j^{(u,t)}\right) \right), \qquad (12)$$

where $Time^u = \{L+1, L+2, \ldots, |RS^u|\}$ indicates the time steps for which the prediction is to be made for user $u$, and $D^u_t = \{RS^u_t, RS^u_{t+1}, \ldots, RS^u_{t+T}\}$ denotes the set of $T$ pieces of content to be predicted.

C. PSAC_seq for Sequential Content

For sequential content, we are inspired by the sequence-aware recommendation model in [24], and we propose PSAC_seq to implement sequential user request caching. The proposed PSAC_seq constructs an $L$-order Markov chain to predict the next $T$ pieces of content. As shown in Fig. 3, setting $L = 3$ and $T = 1$ and using the sliding-window method to segment the historical interaction sequence between user and content, the historical interaction content sequence and the target content sequence can be obtained.

Fig. 3. Framework of PSAC_seq.

The specific structure of the proposed PSAC_seq model for serialized content caching is as follows.

1) Nonlinear Layer: The three inputs of the model, query $Q$, key $K$, and value $V$, are the historical content interaction sequence $\sum_{t=1}^{n} X^u_t$. We use ReLU as the activation function to map query $Q$ and key $K$ to the same hidden layer space:

$$\tilde{Q} = \mathrm{relu}(X^u_t W_Q) \qquad (13)$$

$$\tilde{K} = \mathrm{relu}(X^u_t W_K) \qquad (14)$$

where $W_Q$ and $W_K$ are the weights of the two non-linear layers, and $W_Q = W_K$.

2) Self-attention Layer: The self-attention layer mainly models the short-term dependence of the content sequence. In this layer, attention is calculated using the scaled dot attention mechanism. The scale factor is $\sqrt{d}$, and the attention weight matrix $a^u_t$ is obtained as follows:

$$a^u_t = \mathrm{softmax}\left(\frac{\tilde{Q}\tilde{K}^T}{\sqrt{d}}\right) \qquad (15)$$

We multiply the weight matrix $a^u_t$ by the value $V$, that is, the historical content interaction sequence $\sum_{t=1}^{n} X^u_t$, to obtain the weighted historical content interaction sequence:

$$s^u_t = \sum_{t=1}^{n} a^u_t X^u_t \qquad (16)$$
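A minimal NumPy sketch of the nonlinear and self-attention layers of eqs. (13)-(16), assuming, as stated above, a single shared weight matrix ($W_Q = W_K$) and the raw interaction sequence as the values (the function and variable names are ours):

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def psac_seq_attention(X, W):
    """X is the n x d historical interaction sequence (one row per piece
    of content); W plays the role of the shared weight W_Q = W_K."""
    Q = np.maximum(X @ W, 0.0)            # eq. (13): relu(X W_Q)
    K = np.maximum(X @ W, 0.0)            # eq. (14): relu(X W_K)
    d = X.shape[1]
    A = softmax(Q @ K.T / np.sqrt(d))     # eq. (15): scaled dot attention
    return A @ X                          # eq. (16): weighted sequence
```

Each output row is a convex combination of the input rows, so the layer re-expresses every position of the sequence in terms of the positions it attends to.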

In particular, several important processes are performed in this layer:

- To prevent an excessively high matching score between the mapping vector of the query $Q$ and the key $K$, a diagonal operation is performed on the weight matrix $a^u_t$.
- In addition, to introduce timing signals (similar to the Transformer), the model retains the timing information in the data through position coding. The timing embedding (TE) consists of the following sine and cosine signals [25]:

$$TE(t, 2i) = \sin\left(t / 10000^{2i/d}\right) \qquad (17)$$

$$TE(t, 2i+1) = \cos\left(t / 10000^{2i/d}\right) \qquad (18)$$

Here, $t$ is the time step, and $i$ is the vector dimension. Before the non-linear transformation, the TE is simply concatenated with the query and key representations. Of course, PSAC_seq can realize timing awareness owing to this procedure, which existing proactive caching of content sequences has never achieved.

- To learn a single attention representation, we average the self-attention embedding vectors of order $L$ as the user's time-series intention expression, where $L = 3$ is the order of the Markov chain and $s^u_{t-l}$ is the weight of the previous step in the historical interaction sequence:

$$H^u_t = \frac{1}{L} \sum_{l=1}^{L} s^u_{t-l} \qquad (19)$$

3) Construction of Short-Term and Long-Term Preferences: After obtaining the self-attention code $H^u_t$ of input $X^u_t$, we compare it with the content sequence $X^u_{t+1}$ of the next time step using the Euclidean distance to calculate their similarity. The short-term dependency can thus be represented as $\| H^u_t - X^u_{t+1} \|^2_2$.

Moreover, we calculate the Euclidean distance between the user sequence and the target content sequence, and we obtain their similarity and the long-term user preference as $\| U_u - V_i \|^2_2$.

4) Objective: The short-term dependence $\| H^u_t - X^u_{t+1} \|^2_2$ and the long-term preference $\| U_u - V_i \|^2_2$ are weighted and summed to obtain the final prediction score, as shown in the following equation, where $w$ is the user weight parameter:

$$y^u_{t+1} = w \| U_u - V_i \|^2_2 + (1 - w) \| H^u_t - X^u_{t+1} \|^2_2 \qquad (20)$$

Furthermore, in the content sequence, we search for content that did not interact with the user in the historical behavior as negative samples. Specifically, for each user, $T$ negative samples are sampled and used in the construction of the loss function.

First, we predict the content score in which the user is interested, and we calculate the negative log loss of the positive samples:

$$loss_p = -\sum_{i=0}^{n} \log\left(\sigma(y^u_i)\right) \qquad (21)$$

Here, $\sigma$ is set to be the sigmoid function. We denote the negative sample as $y^u_j$, and then the negative log loss of the negative samples is

$$loss_n = -\sum_{j \ne i}^{n} \log\left(1 - \sigma(y^u_j)\right), \qquad (22)$$

where $j \ne i$ indicates that the negative samples are from the complement of the positive samples. We add these two expressions to obtain the total loss:

$$loss = -\sum_{i=0}^{n} \log\left(\sigma(y^u_i)\right) - \sum_{j \ne i}^{n} \log\left(1 - \sigma(y^u_j)\right) \qquad (23)$$

IV. EXPERIMENTS

In this section, we describe in detail a series of related experiments.

A. Experiment Design

All experiments were conducted on a single 16 GB RAM 3.6 GHz Intel(R) Core(TM) CPU. We designed two experiments. First, we verify the accuracy of the model in predicting requested content in Section 4.3. This is primarily demonstrated by the traffic load and QoE_Score. Subsequently, in Section 4.4, we verify the resource consumption of the model; specifically, the time and memory resource consumption of the model during the training process.

We used the MovieLens dataset [30]. This is a classic movie recommendation dataset and is widely used for related research on content caching strategies [14], [31]. In fact, the movie recommendation scenario corresponds to the content requested by the user: each user requests a movie and scores it; thus, the process of rating the movie by the user corresponds to a content request. In addition, the dataset itself has sequential characteristics. We selected MovieLens 100K[1] to verify the proposed caching strategy. Overall, the dataset contains 943 different users and 1,682 different rated movies, for a total of 100,000 rating records, including at least 20 ratings per user. The original dataset contains information such as the user's occupation, age, gender, zip code, and movie score. In this study, however, we used only the user and historical movie sequences.

To evaluate the final performance of various caching strategies, we designed two evaluation indicators, namely, QoE_Score and traffic load, to measure the QoE of individual users and the traffic load of the network:

$$C(\| R_i \cap P_i \| > u) = \begin{cases} 1, & \| R_i \cap P_i \| > u \\ 0, & \text{otherwise} \end{cases} \qquad (24)$$

$$QoE\_Score = \frac{1}{\| \#users \|} \sum_{i \in \#users} C(\| R_i \cap P_i \| > u), \qquad (25)$$

where $R$ represents the content set corresponding to actual user requests, $P$ represents the predictions by the cache model, $i$ represents each user, and $\#users$ represents the user set in the current network.

[1] https://grouplens.org/datasets/movielens/100k/
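The sinusoidal timing embedding of eqs. (17)-(18) can be sketched as follows (our own minimal implementation; the function name is ours):

```python
import numpy as np

def timing_embedding(t, d):
    """Eqs. (17)-(18): TE(t, 2i) = sin(t / 10000^(2i/d)) and
    TE(t, 2i+1) = cos(t / 10000^(2i/d)) for time step t and width d."""
    te = np.zeros(d)
    for k in range(0, d, 2):                 # k = 2i runs over the even dims
        angle = t / (10000.0 ** (k / d))
        te[k] = np.sin(angle)
        if k + 1 < d:
            te[k + 1] = np.cos(angle)
    return te
```

Because each dimension pair oscillates at a different wavelength, every time step receives a distinct vector, which is what lets the otherwise order-agnostic attention layer distinguish positions in the request sequence.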

TABLE II
THE COMPARED APPROACHES.

$\| R_i \cap P_i \|$ denotes the amount of content in the cache device that can meet the user's needs, $u$ denotes the average number of requests, and $C(\| R_i \cap P_i \| > u)$ indicates whether the current cached content meets the user's needs, with a value of 0 or 1. $\| \#users \|$ represents the number of users considered in the current network. A larger QoE_Score value indicates a better caching strategy.

$$TrafficLoad = \frac{\sum_{i \in \#users} \| R_i - P_i \|}{\sum_{i \in \#users} \| R_i \|} \qquad (26)$$

where $\| R_i - P_i \|$ represents the number of pieces of content not predicted in the content set corresponding to user $i$'s clicks, and $\| R_i \|$ represents the number of pieces of content that user $i$ actually requests. A smaller traffic load indicates a better caching strategy.

Moreover, the performance of PSAC is compared with the representative approaches in Table II, which illustrates more details about their scenarios, advantages, weaknesses, etc.

First, the following approaches are conventional passive caching strategies mentioned in Zeydan et al. [10]:

LFU (least frequently used). After the buffer is full, we replace the least frequently requested piece of content.

LRU (least recently used). After the cache is full, we replace the piece of content from past user requests that has not been used for the longest time.

FIFO (first in first out). Content is cached based on the order in which users request it, replacing earlier requests when the cache is full.

Given that human behavior is predictable and there are sufficient data on the web to support the analysis of such behavior [10], we also adopt proactive caching strategies for the prediction of requested content:

MF (matrix factorization [32]). The frequency matrix of user clicks on items is decomposed into user and item matrices of lower dimensions, and then the stochastic gradient descent algorithm is used to learn the representation vectors of each user and item. Finally, the top K items with the highest similarity to users are cached.

SVD (singular value decomposition [33]). The frequency matrix of user clicks on items is decomposed in a manner that generalizes the eigendecomposition to obtain the score of each piece of content by each user. We select the top K items and cache them.

UserCF. The user-based collaborative filtering algorithm [34] finds a set of users whose interests are similar to the target's, and then caches items that the target user has not clicked on but that are similar to what the user has clicked on.

ItemCF. The basic principle of item-based collaborative filtering [35] caching is to cache content similar to what users have clicked on in the past. In [10], this method is also used to implement proactive content caching.

Caser. We chose Caser [29] as one of the comparison methods. This model uses a series of convolution kernels to complete top-n recommendation. We use this method to predict the top K pieces of content that users may cache at the network edge.

We compared these approaches and the two proposed models on the MovieLens dataset in terms of QoE_Score, traffic load, and time as well as memory consumption.

Fig. 4. Performance in terms of traffic load.
Fig. 5. Performance in terms of QoE_Score.

As can be seen from Figs. 4 and 5, the traffic load of proactive caching is generally lower than that of passive caching, whereas the quality of service is higher. This is mainly because proactive caching precaches popular content on edge devices according to predicted user interests, so this content is obtained directly from the edge device when the request is made. Therefore, the overall network load is lower. In the passive caching mode, by contrast, the requested content often cannot be found on the edge device and must be obtained from the cloud. This results in larger network traffic, longer response time, and hence lower user satisfaction. The three deep learning methods are significantly better than the other methods in reducing the overall traffic load and improving QoE, which indicates that deep learning can improve the network performance of content caching. PSAC_gen and PSAC_seq are superior to Caser (which is based only on convolution kernels) because they both contain a self-attention mechanism that is more effective than a convolution kernel in handling the dependencies in sequential content.
In addition, the MovieLens dataset is in fact sequence-dependent; thus PSAC_seq, which is sequence-aware, performs better in terms of traffic load and quality of service than PSAC_gen, which applies to general types of data, at a cache size of 80% or less.

C. Comparison of Time and Memory Consumption

In this section, we compare the proposed PSAC strategy with the other caching methods in terms of resource consumption.

Fig. 6. Runtime of one training epoch.
Fig. 7. Memory consumption of each method during training on MovieLens 100k.

Fig. 6 shows the time consumption of one training epoch for each caching strategy. At the network edge, it is obvious that if a model is easier to train, it responds faster and consumes fewer computing resources. In fact, some proactive caching strategies, such as SVD, MF, UserCF, and ItemCF, have significantly higher computing time than passive caching strategies, such as LFU, LRU, and FIFO. Therefore, it is not cost-effective to adopt machine learning methods that bring little improvement, as mentioned in [10]. The deep learning methods, in contrast, require less than 10 s (perhaps because most of the other caching strategies involve computing the similarity of the content requested by the user, which is more time consuming than learning the parameters in the neural network), which is obviously more advantageous in real-world scenarios. Among the three deep learning models, the two proposed models are not faster than Caser, mainly because PSAC includes more parameters in its design; nevertheless, PSAC is still cost-effective given its improvement in network performance.
Fig. 7 shows the space consumption of the various caching strategies during training. It can be seen that the memory resources consumed by passive caching and by some proactive caching policies such as SVD, MF, UserCF, and ItemCF are all higher than those of the deep learning-based models. This again demonstrates the advantage of the proposed deep learning-based approach in terms of resource consumption. However, as PSAC_gen is in fact more complex than Caser's structure (with an additional self-attention layer), it has more parameters and is slightly more memory-intensive. PSAC_seq also uses slightly more resources than Caser because it considers sequential information and performs similarity calculations to determine long-term content preferences. However, the performance of PSAC_gen and PSAC_seq in reducing traffic load and improving service quality demonstrates that they still have high practical value.
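The self-attention operation that separates PSAC_gen and PSAC_seq from the purely convolutional Caser (and that contributes their extra parameters) can be sketched compactly. The following pure-Python rendition of scaled dot-product attention in the spirit of [25] omits the learned query, key, and value projections for brevity; the function names and the identity projections are simplifying assumptions, not the exact PSAC layers.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention over a request sequence.

    seq: list of embedding vectors (lists of floats), one per requested item.
    Query, key, and value projections are taken as identity for this sketch.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # similarity of this position to every position, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        w = softmax(scores)
        # each output is a convex mixture of the whole sequence's vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, seq)) for j in range(d)])
    return out
```

Because each output position is a softmax-weighted mixture over every position in the request sequence, attention can relate any two items in one step, whereas a convolution kernel only sees a fixed-size local window; this is the property credited above for the better handling of sequential dependencies.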

D. Discussion

Through the experimental results, the availability and performance of PSAC are verified and evaluated, but it has the following limitations.
Due to the lack of actual data, only MovieLens is available in our experiment, which cannot fully reflect the diversity of users with different backgrounds and habits. The effectiveness of PSAC in reducing network latency has yet to be verified. Moreover, security and privacy are great challenges for content caching [36], but they are not discussed in this paper.
In our work, general requests and sequential requests need manual marking; thus, the efficiency could be further improved. Besides, the feasibility should be validated through practical applications.

V. CONCLUSION

In edge caching, it is highly important to adopt a suitable caching strategy for responding to user requests using limited edge resources at a higher speed. Considering the problem of proactive content caching at the network edge, we developed the PSAC caching strategy for general and sequential user requests, including the PSAC_gen and PSAC_seq models. These elaborately designed models automatically capture user interests through a self-attention mechanism, thus assisting the model in predicting future interests more accurately and caching them at the network edge in advance. This further improves QoE and reduces the peak traffic load of the network. In particular, PSAC_seq can realize the perception of sequential content by considering its temporal nature. In addition, a series of experiments demonstrated the improvement of the two deep learning models in terms of QoE and traffic load, and the time and memory consumption of the proposed proactive content caching strategy are significantly more cost-effective than those of other approaches at the resource-limited network edge.

REFERENCES

[1] P. Yang, N. Zhang, S. Zhang, L. Yu, J. Zhang, and X. S. Shen, "Content popularity prediction towards location-aware mobile edge caching," IEEE Trans. Multimedia, vol. 21, no. 4, pp. 915–929, Apr. 2018.
[2] J. Li, Z. Ming, M. Qiu, G. Quan, X. Qin, and T. Chen, "Resource allocation robustness in multi-core embedded systems with inaccurate information," J. Syst. Archit., vol. 57, no. 9, pp. 840–849, 2011.
[3] W. Jiang, G. Feng, and S. Qin, "Optimal cooperative content caching and delivery policy for heterogeneous cellular networks," IEEE Trans. Mobile Comput., vol. 16, no. 5, pp. 1382–1393, May 2017.
[4] A. Ioannou and S. Weber, "A survey of caching policies and forwarding mechanisms in information-centric networking," IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2847–2886, Fourth Quarter 2016.
[5] J. Li, M. Qiu, J. Niu, W. Gao, Z. Zong, and X. Qin, "Feedback dynamic algorithms for preemptable job scheduling in cloud systems," in Proc. 2010 IEEE/WIC/ACM Int. Conf. Web Intell. Intell. Agent Technol., vol. 1, Aug. 2010, pp. 561–564.
[6] R. Tandon and O. Simeone, "Harnessing cloud and edge synergies: Toward an information theory of fog radio access networks," IEEE Commun. Mag., vol. 54, no. 8, pp. 44–50, Aug. 2016.
[7] B. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Trans. Commun., vol. 64, no. 4, pp. 1674–1686, Apr. 2016.
[8] S. Zhang, N. Zhang, P. Yang, and X. Shen, "Cost-effective cache deployment in mobile heterogeneous networks," IEEE Trans. Veh. Technol., vol. 66, no. 12, pp. 11264–11276, Dec. 2017.
[9] T. Giannakas, P. Sermpezis, and T. Spyropoulos, "Show me the cache: Optimizing cache-friendly recommendations for sequential content access," in Proc. 2018 IEEE 19th Int. Symp. "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM), 2018, pp. 14–22.
[10] E. Zeydan, E. Bastug, M. Bennis, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, "Big data caching for networking: Moving from cloud to edge," IEEE Commun. Mag., vol. 54, no. 9, pp. 36–42, Sep. 2016.
[11] D. Prerna, R. Tekchandani, and N. Kumar, "Device-to-device content caching techniques in 5G: A taxonomy, solutions, and challenges," Comput. Commun., vol. 153, pp. 48–84, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0140366419318225
[12] D. Liu, B. Chen, C. Yang, and A. F. Molisch, "Caching at the wireless edge: Design aspects, challenges, and future directions," IEEE Commun. Mag., vol. 54, no. 9, pp. 22–28, Sep. 2016.
[13] T. Giannakas, T. Spyropoulos, and P. Sermpezis, "The order of things: Position-aware network-friendly recommendations in long viewing sessions," CoRR, vol. abs/1905.04947, 2019. [Online]. Available: http://arxiv.org/abs/1905.04947
[14] S. Müller, O. Atan, M. van der Schaar, and A. Klein, "Context-aware proactive content caching with service differentiation in wireless networks," IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 1024–1036, Feb. 2017.
[15] S. Zhou, J. Gong, Z. Zhou, W. Chen, and Z. Niu, "GreenDelivery: Proactive content caching and push with energy-harvesting-based small cells," IEEE Commun. Mag., vol. 53, no. 4, pp. 142–149, Apr. 2015.
[16] S. Manzoor, S. Mazhar, A. Asghar, A. N. Mian, A. Imran, and J. Crowcroft, "Leveraging mobility and content caching for proactive load balancing in heterogeneous cellular networks," Trans. Emerging Telecommun. Technol., 2019.
[17] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, "Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience," IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
[18] Z. Chang, L. Lei, Z. Zhou, S. Mao, and T. Ristaniemi, "Learn to cache: Machine learning for network edge caching in the big data era," IEEE Wireless Commun., vol. 25, no. 3, pp. 28–35, Jun. 2018.
[19] C. Zhong, M. C. Gursoy, and S. Velipasalar, "A deep reinforcement learning-based framework for content caching," in Proc. 2018 52nd Annu. Conf. Inf. Sci. Syst. (CISS), 2018, pp. 1–6.
[20] Y. Dai, D. Xu, S. Maharjan, G. Qiao, and Y. Zhang, "Artificial intelligence empowered edge computing and caching for Internet of Vehicles," IEEE Wireless Commun., vol. 26, no. 3, pp. 12–18, Jun. 2019.
[21] L. T. Tan and R. Q. Hu, "Mobility-aware edge caching and computing in vehicle networks: A deep reinforcement learning," IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 10190–10203, Nov. 2018.
[22] L. Ale, N. Zhang, H. Wu, D. Chen, and T. Han, "Online proactive caching in mobile edge computing using bidirectional deep recurrent neural network," IEEE Internet Things J., vol. 6, no. 3, pp. 5520–5530, Jun. 2019.
[23] A. Narayanan, S. Verma, E. Ramadan, P. Babaie, and Z.-L. Zhang, "DeepCache: A deep learning based framework for content caching," in Proc. 2018 Workshop Netw. Meets AI & ML, 2018, pp. 48–53.
[24] S. Zhang, Y. Tay, L. Yao, A. Sun, and J. An, "Next item recommendation with self-attentive metric learning," in Proc. 33rd AAAI Conf. Artificial Intell., vol. 9, 2019.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, Red Hook, NY, USA: Curran Assoc., 2017, pp. 5998–6008.
[26] Y. Zhang, R. Wang, M. S. Hossain, M. F. Alhamid, and M. Guizani, "Heterogeneous information network-based content caching in the Internet of Vehicles," IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10216–10226, Oct. 2019.
[27] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks," IEEE Commun. Mag., vol. 52, no. 8, pp. 82–89, Aug. 2014.
[28] J. Qiao, Y. He, and X. S. Shen, "Proactive caching for mobile video streaming in millimeter wave 5G networks," IEEE Trans. Wireless Commun., vol. 15, no. 10, pp. 7187–7198, Oct. 2016.
[29] J. Tang and K. Wang, "Personalized top-n sequential recommendation via convolutional sequence embedding," in Proc. 11th ACM Int. Conf. Web Search Data Mining, 2018, pp. 565–573.

[30] F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and context," ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, p. 19, 2016.
[31] B. Chen and C. Yang, "Caching policy for cache-enabled D2D communications by learning user preference," IEEE Trans. Commun., vol. 66, no. 12, pp. 6586–6601, 2018.
[32] X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua, "Fast matrix factorization for online recommendation with implicit feedback," in Proc. 39th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2016, pp. 549–558.
[33] M. Jian and K.-M. Lam, "Simultaneous hallucination and recognition of low-resolution faces based on singular value decomposition," IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 11, pp. 1761–1772, Nov. 2015.
[34] Z. Jia, Y. Yang, W. Gao, and X. Chen, "User-based collaborative filtering for tourist attraction recommendations," in Proc. 2015 IEEE Int. Conf. Comput. Intell. Commun. Technol., 2015, pp. 22–25.
[35] F. Xue, X. He, X. Wang, J. Xu, K. Liu, and R. Hong, "Deep item-based collaborative filtering for top-n recommendation," ACM Trans. Inf. Syst., vol. 37, no. 3, p. 33, 2019.
[36] Z. Shao, C. J. Xue, Q. Zhuge, M. Qiu, B. Xiao, and E. H. M. Sha, "Security protection and checking for embedded system integration against buffer overflow attacks via hardware/software," IEEE Trans. Comput., vol. 55, no. 4, pp. 443–453, Apr. 2006.

Yin Zhang (Senior Member, IEEE) is a full Professor of the School of Information and Communication Engineering, University of Electronic Science and Technology of China. He is Co-chair of the IEEE Computer Society Big Data STC. He has published more than 90 prestigious conference and journal papers, including 14 ESI Highly Cited Papers. He has been an IEEE Senior Member since 2016. He received the Systems Journal Best Paper Award of the IEEE Systems Council in 2018. He was named in the Clarivate Analytics Highly Cited Researchers List in 2019. His research interests include intelligent mobile services and applications, mobile computing, edge computing, cognitive wireless communications, etc.

Yujie Li received the B.S. degree in Computer Science and Technology from Yangzhou University in 2009. She received M.S. degrees in electrical engineering from the Kyushu Institute of Technology and Yangzhou University in 2012, respectively. She received a Ph.D. degree from the Kyushu Institute of Technology in 2015. From 2016 to 2017, she was a Lecturer at Yangzhou University. Currently, she is a JSPS Research Fellow (FPD) at the Kyushu Institute of Technology and an Assistant Professor at Fukuoka University, Japan. Her research interests include computer vision, sensors, the Internet of Things, image segmentation, and machine learning.

Ranran Wang is currently pursuing the M.S. degree with the School of Information and Safety Engineering, Zhongnan University of Economics and Law (ZUEL). Her research interests include recommendation systems, data mining, and edge caching.

Jianmin Lu is currently pursuing the M.S. degree with the School of Information and Safety Engineering, Zhongnan University of Economics and Law (ZUEL). Her research interests include deep learning, natural language processing, and edge caching.

Xiao Ma (Member, IEEE) is an Assistant Professor of the School of Information and Safety Engineering, Zhongnan University of Economics and Law (ZUEL), China. She received her Ph.D. from the Huazhong University of Science and Technology in 2017. From October 2015 to March 2017, she was visiting the University of Illinois at Urbana-Champaign. Her research interests include recommendation systems, data mining, and machine learning.

Meikang Qiu (Senior Member, IEEE) received the BE and ME degrees from Shanghai Jiao Tong University and the Ph.D. degree in computer science from the University of Texas at Dallas. He is the Department Head and a tenured full professor at Texas A&M University-Commerce. He is an ACM Distinguished Member and IEEE Senior Member. He is the Chair of the IEEE Smart Computing Technical Committee. He has published 20+ books and 550+ peer-reviewed journal and conference papers, including 80+ IEEE/ACM Transactions papers. His research interests include cyber security, big data analysis, cloud computing, smart computing, intelligent data, embedded systems, etc. He is an Associate Editor of 10+ international journals, including IEEE TRANSACTIONS ON COMPUTERS and IEEE TRANSACTIONS ON CLOUD COMPUTING.
