[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (WSDM’18)

January 8, 2020
Personalized Top-N Sequential Recommendation
via Convolutional Sequence Embedding (WSDM’18)
Jihoo Kim
datartist@hanyang.ac.kr
Dept. of Computer and Software, Hanyang University
Jiaxi Tang, Ke Wang
Simon Fraser University

Author
Jiaxi Tang
• PhD Student, School of Computing Science
• Intern at Google AI, Research & Machine Intelligence Team
Ke Wang
• Professor, School of Computing Science
• PhD, Georgia Institute of Technology
• MS, Georgia Institute of Technology
Recent papers
• Towards Neural Mixture Recommender for Long Range Dependent User Sequences (WWW’19)
  Jiaxi Tang*, Francois Belletti*, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu and Ed H. Chi
• Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System (KDD’18)
  Jiaxi Tang, Ke Wang

Minimum qualifications:
• Currently enrolled in a Master’s or PhD degree in Computer Science or a related technical field.
• Experience (classroom/work) in Natural Language Understanding, Neural Networks, Computer Vision, Machine
Learning, Deep Learning, Algorithmic Foundations of Optimization, Data Science, Data Mining and/or Machine
Intelligence/Artificial Intelligence.
• Experience with one or more general purpose programming languages: Java, C++ or Python.
• Experience with research communities and/or efforts, including having published papers (being listed as author)
at conferences (e.g. NIPS, ICML, ACL, CVPR, etc).
About the job
Research and Machine Intelligence is a high impact team that’s building the next generation of intelligence and
language understanding for all Google products. To achieve this, we’re working on projects that utilize the latest
techniques in Artificial Intelligence, Machine Learning (including Deep Learning approaches like Google AI) and
Natural Language Understanding. We impact products across Google including Search, Maps and Google Now.
https://careers.google.com/jobs/results/136271419680924358-research-intern-2020/
Google AI Research Intern

Contents
1. Introduction
1.1 Top-N Sequential Recommendation
1.2 Limitations of Previous Work
1.3 Contributions
2. Related Work
3. Proposed Methodology
3.1 Embedding Look-up
3.2 Convolutional Layers
3.3 Fully-connected Layers
3.4 Network Training
3.5 Recommendation
4. Experiments
4.1 Experimental Setup
4.2 Performance Comparison
4.3 Network Visualization

<Motivation>
General preferences: user’s long-term and static behaviors (always)
e.g. “I love Apple’s products”
vs
Sequential patterns: user’s short-term and dynamic behaviors (recent → next)
e.g. After buying an iPhone, buy phone accessories
1. Introduction

1.1 Top-N Sequential Recommendation
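<Notations>
Users, Items, and each user’s Sequence of items (the index denotes the order of actions)
<Top-N Sequential Recommendation>
Input: users’ sequences, which reflect both general preferences and sequential patterns
Output: a list of items for user u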

<Markov chain based model>
1) FPMC (Factorized Personalized Markov Chains), WWW’10
2) Fossil (Factorized Sequential Prediction with Item Similarity Model), ICDM’16
<Two major limitations>
1) Fail to model union-level* sequential patterns.
2) Fail to allow skip behaviors**.
*Union-level? e.g. … (milk, butter) → flour …
**Skip behaviors? e.g. … airport, hotel, restaurant, bar, attraction … (some steps not necessary)
Figure 1

To provide evidence of union-level influences and skip behaviors:
Sequential association rules X → Y are mined from the sequences
minimum support count = 5
minimum confidence = 50%
Figure 2
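As a rough illustration of how support and confidence thresholds filter such rules, the sketch below counts rules x → y where x is a single earlier item and y is the item that appears a fixed number of steps later. The function name `mine_rules`, the `skip` parameter, and the toy check-in sequences are assumptions made for this example; the exact rule form and counting procedure used in the paper may differ.

```python
from collections import Counter

def mine_rules(sequences, min_support=5, min_confidence=0.5, skip=0):
    """Count rules x -> y where y occurs (skip + 1) positions after x."""
    pair_counts = Counter()  # how often x is followed by y at the given gap
    x_counts = Counter()     # how often x has any item at that gap
    gap = skip + 1
    for seq in sequences:
        for pos in range(len(seq) - gap):
            x, y = seq[pos], seq[pos + gap]
            pair_counts[(x, y)] += 1
            x_counts[x] += 1
    rules = {}
    for (x, y), support in pair_counts.items():
        confidence = support / x_counts[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules

# Hypothetical toy sequences (not from the paper's datasets).
toy = [["airport", "hotel", "bar"]] * 5 + [["airport", "museum", "bar"]] * 2
rules_no_skip = mine_rules(toy, min_support=5, min_confidence=0.5, skip=0)
rules_one_skip = mine_rules(toy, min_support=5, min_confidence=0.5, skip=1)
```

In this toy data the rule airport → bar only shows up when one step is skipped, which is the kind of pattern a first-order Markov chain cannot express.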

1.3 Contributions
Caser (ConvolutionAl Sequence Embedding Recommendation Model)
• Caser uses horizontal and vertical convolutional filters to capture point-level and union-level sequential patterns as well as skip behaviors.
• Caser models both users’ general preferences and sequential patterns, and generalizes several existing state-of-the-art methods in a single unified framework.
• Caser outperforms state-of-the-art methods for top-N sequential recommendation on real-life datasets.

• Sequential pattern mining depends on the explicit representation of patterns and thus could miss patterns in unobserved states (= could miss implicit patterns).
• CNNs have been used to extract users’ preferences from their reviews, but none of these works addresses sequential recommendation.
• RNNs have been used for session-based recommendation. They may not work well in sequential recommendation, because not all adjacent actions have dependency relationships.
• Temporal recommendation is a related but different problem (session-based recommendation is also different).
(e.g. recommend coffee in the morning instead of in the evening.)
2. Related Work

Figure 3
<Network Architecture of Caser>

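The user $u$’s sequence: $S^u = (S^u_1, S^u_2, S^u_3, S^u_4, S^u_5)$
A window of size $L + T$ slides over the sequence: every $L$ successive items are taken as input and their next $T$ items as the targets.
The embedding for item $i$: $Q_i \in \mathbb{R}^d$, where $d$ is the number of latent dimensions
Example (two stacked rows, i.e. $L = 2$):
$E^{(u,3)} = [Q_{S^u_1};\ Q_{S^u_2}]$
$E^{(u,4)} = [Q_{S^u_2};\ Q_{S^u_3}]$
$E^{(u,5)} = [Q_{S^u_3};\ Q_{S^u_4}]$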
3.1 Embedding Look-up
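The window extraction and embedding look-up can be sketched as follows. This is a minimal illustration, assuming items are integer ids and the embedding table is a plain dictionary; the names `sliding_windows` and `embed` and the toy embedding values are hypothetical, not taken from the paper.

```python
def sliding_windows(sequence, L, T):
    """Return (input_items, target_items) pairs from a window of size L + T."""
    pairs = []
    for start in range(len(sequence) - L - T + 1):
        inputs = sequence[start:start + L]            # L successive items
        targets = sequence[start + L:start + L + T]   # their next T items
        pairs.append((inputs, targets))
    return pairs

def embed(items, Q):
    """Stack the item embeddings row by row into an L x d matrix E."""
    return [Q[item] for item in items]

# Hypothetical toy data: five items with d = 2 dimensional embeddings.
Q = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6], 4: [0.7, 0.8], 5: [0.9, 1.0]}
S_u = [1, 2, 3, 4, 5]

pairs = sliding_windows(S_u, L=2, T=1)
E_u3 = embed(pairs[0][0], Q)  # embedding matrix used to predict the 3rd item
```

With L = 2 and T = 1 the first window uses items 1 and 2 to predict item 3, matching the stacked-embedding example on the slide.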

Figure 4
“image” = 𝑳 × 𝒅 matrix 𝑬
local features = sequential patterns
Unlike image recognition, the “image” 𝑬 is not given and must be learnt.
3.2 Convolutional Layers

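<Horizontal Filter>
Example: $L = 4$, $d = 3$, $h = 2$, so $F^k \in \mathbb{R}^{2 \times 3}$
$F^k$: the $k$-th filter, $1 \le k \le n$ ($n$ = # of filters, $h$ = height of filter)
The filter slides over positions $i = 1, \dots, L - h + 1 = 4 - 2 + 1 = 3$, i.e. over $E_{1:2}$, $E_{2:3}$, $E_{3:4}$
$i$-th convolution value: $c^k_i = \phi_c(E_{i:i+h-1} \odot F^k)$ ($\odot$: inner product, $\phi_c$: activation function)
Convolution value (by $F^k$): $c^k = [c^k_1, c^k_2, \dots, c^k_{L-h+1}]$
<Max Pooling>
The maximum value of $c^k$ is kept for each filter
<Vertical Filter>
Example: $L = 4$, $d = 3$, $\tilde{F}^k \in \mathbb{R}^{4 \times 1}$
→ weighted sum over the $L$ rows of $E$
→ no max pooling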
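The two filter types can be illustrated with a small pure-Python sketch using the same shapes as the slide (L = 4, d = 3, h = 2). ReLU as the activation, the function names, and all numeric values are assumptions for illustration only.

```python
def relu(x):
    return max(0.0, x)

def horizontal_conv(E, F, activation=relu):
    """Slide an h x d filter F down the L x d matrix E; one value per position."""
    L, h, d = len(E), len(F), len(F[0])
    values = []
    for i in range(L - h + 1):
        # Inner product between rows i .. i+h-1 of E and the filter.
        inner = sum(E[i + r][c] * F[r][c] for r in range(h) for c in range(d))
        values.append(activation(inner))
    return values

def vertical_conv(E, F_tilde):
    """Weighted sum of the L rows of E with an L x 1 filter; returns d values."""
    d = len(E[0])
    return [sum(F_tilde[l] * E[l][c] for l in range(len(E))) for c in range(d)]

# Hypothetical toy embedding matrix and filters.
E = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
F = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
F_tilde = [0.0, 0.0, 0.5, 0.5]

c = horizontal_conv(E, F)             # L - h + 1 = 3 convolution values
o_k = max(c)                          # max pooling keeps one value per filter
c_tilde = vertical_conv(E, F_tilde)   # d values, no max pooling
```

The horizontal filter yields one value per window position and is then max-pooled, while the vertical filter keeps all d values of its weighted sum.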

3.3 Fully-connected Layers
Fully-connected layer with an activation function → convolutional sequence embedding
Output layer → the probability of how likely user 𝒖 will interact with item 𝒊 at time step 𝒕
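A minimal sketch of these layers, assuming the pooled horizontal outputs and the vertical outputs are concatenated and passed through one hidden layer to give the convolutional sequence embedding, which is then concatenated with the user embedding to score every item. The weight names `W`, `b`, `W_out`, `b_out`, the ReLU activation, and all toy numbers are assumptions for illustration.

```python
def relu(x):
    return max(0.0, x)

def affine(W, x, b):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def caser_scores(o, o_tilde, P_u, W, b, W_out, b_out):
    """Return one score per item for a single (user, time step) pair."""
    # Convolutional sequence embedding from the concatenated conv outputs.
    z = [relu(v) for v in affine(W, o + o_tilde, b)]
    # Concatenate with the user embedding and project onto the items.
    return affine(W_out, z + P_u, b_out)

# Hypothetical toy values: 1 horizontal output, 1 vertical output, 3 items.
o, o_tilde, P_u = [1.0], [2.0], [1.0]
W = [[1.0, 1.0], [1.0, -1.0]]
b = [0.0, 0.0]
W_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
b_out = [0.0, 0.0, 0.5]

y = caser_scores(o, o_tilde, P_u, W, b, W_out, b_out)
```

Each entry of `y` is an unnormalized score for one item; a sigmoid would turn it into a probability.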

union-level sequential patterns
point-level sequential patterns
short-term sequential patterns
long-term general preferences

3.4 Network Training
To train the network, we transform the values of the output layer into probabilities with the sigmoid function.
The likelihood of all sequences in the dataset is taken over the collection of time steps for which we would like to make predictions for each user 𝒖.

To further capture skip behaviors, we could consider the next 𝑻 target items.
Taking the negative logarithm of the likelihood, we get the objective function, the “binary cross-entropy loss”.
Model parameters are learned by minimizing the loss function (13).
Hyper-parameters are tuned on the validation set via grid search.
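The binary cross-entropy objective can be sketched for a single training instance as below. This assumes the loss rewards high scores for the target items and low scores for a set of non-target (negative) items; how negatives are chosen, and the names `bce_loss`, `target_items`, and `negative_items`, are assumptions of this example rather than details confirmed by the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(y, target_items, negative_items):
    """Binary cross-entropy over item scores y for one (user, time step)."""
    loss = 0.0
    for i in target_items:
        loss += -math.log(sigmoid(y[i]))        # targets should score high
    for j in negative_items:
        loss += -math.log(1.0 - sigmoid(y[j]))  # non-targets should score low
    return loss

# Hypothetical scores for two items: item 0 is the target, item 1 a negative.
uninformed = bce_loss([0.0, 0.0], target_items=[0], negative_items=[1])
confident = bce_loss([3.0, -3.0], target_items=[0], negative_items=[1])
```

Summing this quantity over all users and all predicted time steps gives the total loss that training minimizes.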

3.5 Recommendation
After obtaining the trained neural network, to make recommendations for a user 𝒖 at time step 𝒕:
Input: 𝒖’s latent embedding $P_u$ and the embedding $E^{(u,t)}$ of 𝒖’s last 𝑳 items
Output: we recommend the 𝑵 items that have the highest values in the output layer for 𝒖
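Selecting the top-N items from the output layer amounts to a sort by score. A minimal sketch, assuming the output layer is a list indexed by item id; `recommend_top_n` and the scores are hypothetical.

```python
def recommend_top_n(y, n):
    """Return the ids of the n items with the highest output values."""
    ranked = sorted(range(len(y)), key=lambda item: y[item], reverse=True)
    return ranked[:n]

# Hypothetical output-layer values for four items.
scores = [3.0, 1.0, 2.5, -0.4]
top2 = recommend_top_n(scores, 2)
```

For large item catalogs, a partial selection such as `heapq.nlargest` avoids sorting every item.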

<Datasets>
Amazon data was not used, due to its low sequential intensity (SI):
0.0026 for ‘Office Products’
0.0019 for ‘Clothing’ / ‘Shoes’ / ‘Jewelry’ / ‘Video Games’
Each user’s sequence is split in order into training (70%), validation (10%), and test (20%).
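A chronological 70/10/20 split of one user's sequence can be sketched as follows. Integer percentages are used to avoid floating-point rounding at the boundaries; the function name and the rounding-down behavior are assumptions of this example.

```python
def split_sequence(sequence, train_pct=70, valid_pct=10):
    """Split one user's sequence in order into train / validation / test."""
    n = len(sequence)
    train_end = n * train_pct // 100
    valid_end = n * (train_pct + valid_pct) // 100
    return sequence[:train_end], sequence[train_end:valid_end], sequence[valid_end:]

# Hypothetical sequence of ten actions, oldest first.
train, valid, test = split_sequence(list(range(10)))
```

Because the split follows the order of actions, the test set always contains the most recent part of the sequence.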
4. Experiments

<Evaluation Metrics>
Precision@N, Recall@N: computed from the top 𝑵 predicted items for a user and the last 20% of actions in the user’s sequence (= test set)
MAP (Mean Average Precision): the average of AP for all users
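The three metrics can be computed per user as in the sketch below. The AP normalization (dividing by the number of held-out items) is one common convention and is an assumption here; the paper's exact definition may differ. Function names and toy data are hypothetical.

```python
def precision_recall_at_n(predicted, actual, n):
    """Precision@N and Recall@N for one user."""
    hits = len(set(predicted[:n]) & set(actual))
    return hits / n, hits / len(actual)

def average_precision(predicted, actual):
    """AP for one user: precision at each rank where a hit occurs.

    Assumption: normalized by the number of held-out items.
    """
    actual = set(actual)
    hits, total = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in actual:
            hits += 1
            total += hits / rank
    return total / len(actual)

def mean_average_precision(predictions, actuals):
    """MAP: the average of AP over all users."""
    aps = [average_precision(p, a) for p, a in zip(predictions, actuals)]
    return sum(aps) / len(aps)

# Hypothetical ranked predictions and held-out test items for one user.
predicted = [10, 20, 30, 40]
held_out = [20, 40, 50]
precision, recall = precision_recall_at_n(predicted, held_out, n=2)
ap = average_precision(predicted, held_out)
```

Here one of the top-2 predictions is in the test set, so Precision@2 is 1/2 and Recall@2 is 1/3.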

<Influence of hyper-parameters 𝒅, 𝑳, 𝑻>

<Analysis of Caser Components>
𝒉 denotes the horizontal convolutional layer
𝒗 denotes the vertical convolutional layer
𝒑 denotes personalization
Any missing component is represented by setting its corresponding $o$, $\tilde{o}$, or $P_u$ to zero.

Caser puts more emphasis on recent actions,
demonstrating a major difference from the conventional top-N recommendation.
<Vertical convolutional filters>

<Horizontal convolutional filters>
<Previous Sequence>
$S_1$ (13th Warrior): History
$S_2$ (American Beauty): Romance
$S_3$ (Star Trek): Action & SF
$S_4$ (Star Trek III)
$S_5$ (Star Trek IV)
<Predictions>
$R_1$ (Mad Max)
$R_2$ (Star Wars)
$R_3$ (Star Trek) ← ground truth

Thank you!
Q & A
