0% found this document useful (0 votes)

50 views

Deep Learning of Semantic Word Representations To Implement A Content-Based Recommender For The Recsys Challenge'14

This document discusses using a recurrent neural network to learn semantic word representations from text documents to build a content-based recommender system. It trains an RNN on document text to obtain latent feature representations of words, which are then used as features in a matrix factorization model to predict user preferences. The system achieved a root mean squared error of 0.902 on the RecSys Challenge 2014 dataset, outperforming LDA, LSA and PCA for modeling book abstract content.

Uploaded by

Anonymous t5aOqdHpuh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

50 views

Deep Learning of Semantic Word Representations To Implement A Content-Based Recommender For The Recsys Challenge'14

Uploaded by

Anonymous t5aOqdHpuh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Deep Learning of Semantic Word

Representations to Implement a Content-based

Recommender for the RecSys Challenge14

Omar U. Florez1 , Lama Nachman1

Intel Labs, Santa Clara CA 95053, USA,

{Omar.U.Florez, Lama.Nachman}@intel.com

Abstract. In this paper, we will discuss a recommender system that

exploits the semantics regularities captured by a Recurrent Neural Net-
work (RNN) in text documents. Many information retrieval systems treat
words as binary vectors under the classic bag-of-words model; however
there is not a notion of semantic similarity between words when describ-
ing a document in the resulting feature space. Recent advances in neu-
ral networks have shown that continuous word vectors can be learned
as a probability distribution over the words of a document [3, 4]. Sur-
prisingly, researchers have found that algebraic operations on this new
representation captures semantic regularities in language [5]. For exam-
ple, Intel + P entium Google results in word vectors associated to
{Search, Android, P hones}. We used this deep learning approach to dis-
cover the continuous and latent semantic features describing content of
documents and fit a linear regression model to approximate user pref-
erences for documents. Our submission to the RecSys Challenge14 ob-
tained a Root Mean Squared Error (RMSE) of 0.902 and ranked 6th for
Task 1. Interestingly enough, our approach provided better vector repre-
sentations than LDA, LSA, and PCA for modeling the content of book
abstracts, which are well-known techniques currently used to implement
content-based recommender systems in the recommendation community.

Keywords: deep learning, recommender systems, neural network lan-

guage models

1 Introduction
Latent representation of text is an important task in context-based recommender
systems. Especially cold-start problems impose a need to trust the content to
infer accurate recommendations when few (or non-existing) user ratings are
provided. Beyond are the n-grams and bag-of-words models to demonstrate
text, continuous representations techniques such as Latent Dirichlet Allocation
(LDA), Latent Semantic Analysis (LSA), and Principal Component Analysis
(PCA) which have been used to describe the content of a document as a proba-
bility distribution of latent variables known as topics.
The idea is that a sparse matrix M that characterizes the user preferences
(rows) in items (columns) can be factorized into two matrices U and V of joint
latent factor space of dimensionality K. This way the user preference u of item v
can be approximated by the dot product uT v. This method is known as matrix
factorization and has been proved to be effective in the Netflix Prize competition
combining better scalability and predictive accuracy than Collaborative Filtering
methods [1]. We have followed a similar approach, but have not assume a random
initialization for matrix V . Our hypothesis is that the features describing a
document can be learned in an unsupervised way considering how words form
a document in sentences. This provides a context for each word and thus not
every word is independent of each other as in the bag-of-words model.

2 Approach

Our method consists of two steps: a) feature learning, and b) user preference
learning. While the second step is the traditional matrix, factorization with
stochastic gradient descend, the feature learning step uses a neural network to
learn a continuous representation of words according to its context in a sentence.
Google has shown Word2Vec as a similar deep-learning approach to model se-
mantic word representations1 , but the task of recommendations on this new
representation is still unexplored.

2.1 Feature Learning

We describe a topic representation by learning the word topic proportions with

a Recurrent Neural Network (RNN) as in [5]. This architecture is illustrated
in Figure 1 and consists of an input layer w(t), hidden layer s(t), output layer
y(t), and the corresponding weigh matrices U , V , and W . The hidden layer s(t)
provides recurrent connections to s(t 1) forming a short-term memory that
models the context of a word in a document. Thus, the input layer consists of
the current word w(t) and of the previous values of the hidden layer s(t 1).
This layer is projected to a lower dimensional space using matrices U and W
and a non-linear activation function with a dimensionality of K.

Fig. 1: a) A Recurrent Neural Network (RNN)

1
https://code.google.com/p/word2vec/
The network is trained by stochastic gradient descend using back propagation
algorithm as in [5]. Hidden and output states are computed as follows:

X X
sj (t) = f ( wi (t)Uji + sl (t 1)Wjl )
i l
X
yk (t) = g( sj (t)Vkj )
j

where f (x) and g(x) are sigmoid and softmax activity functions:
1 exm
f (x) = g(xm ) = P x
1 + ex ke
k

When the RNN is trained, the output layer y(t) contain P (wt+1 |wt , s(t 1)),
the probability distribution of a word given a history of words (context) stored in
time t 1. To obtain a per document topic distribution, we match the empirical
distribution of words in a document d by using a continuous distribution over
these words indexed by a random variable .
Z Z N
Y
P (d) = P (d, ) = P ( ) P (wi |) d
i=1

where N is the number of words in document d and wi is the i-th word in d. As

in[2], we use a Gaussian prior on . Note that the term P (w|) is approximated
by the output of the RNN, so the problem
QN consists of estimating . The MAP

of for a document d is maxd P (d ) i=1 P (w|d ). We find d with stochastic
gradient descend for each document with a fixed number of iterations.

2.2 User Preference Learning

Given a matrix V learned in the previous step with RNN for each document
(cf. Feature Learning), the problem of predicting user preferences in a subset of
documents v V consists of finding the regression weights ui for user i such that
their inner product approximates the real user preferences yui in the training
dataset with yui ui v T . After taking the derivative of the squared error and
d
setting it to zero ( dv (yui ui v T )2 = 0), we solve for each user ui the following
equation:

ui = (v T v)1 v T yui
and then predict preference values for new documents within V .

3 Why is the System Innovative?

The popular bag-of-words model used in natural language processing has been
outperformed by neural networks language models in recent years [3, 4]. The
novelty of our attempt in the RecSys Challenge strives on the way how features
are learned to describe content. The weights of a trained RNN provide high
quality word vectors that show semantic relationships in large datasets [5]. This
way, words are not only similar or belong to the same topic, but share a similar
context. As far as we know, none of the previously proposed architectures has
been successfully trained for the problem of recommendation based on content.

4 Results

Given the 8,170 DBpedia URIs provided in the competition, we extract the
abstract of each book with SPARQL to obtain a vector of size (8, 170 V ),
where V is the number of words in the vocabulary. After stop-removal and
removing words with low frequency, we ended up with V 1, 500. We then
trained 99 RNNs with hidden layers ranging from 2 to 100 nodes to cover a
broad range in the number of latent features K to describe content in the DBbook
dataset. Similarly, we have set the same number of topics for LDA, LSA, and
PCA in the experiments (Figure 3 (a)). Initially we had considered gradient
descend to evaluate the regression weights during user preference learning, but
this approach (with 1, 2, 3 5, 10, 20, and 100 iterations) didnt provide lower
Root Mean Squared Error (RMSE) values than using the normal equation for
linear regression (Figure 3 (b)), so we used the normal equation as the best
(and fastest) approach. When gradient descend was considered, we measured
100 values for the regularization parameter in the range of 106 and 103 , so
in total 9, 900 deep learning-based recommenders were implemented.

(a) (b)

Fig. 3: a) Effect of the number of topics b) Effect of the number of iterations

We compare the quality of different content representations in terms of

RMSE. When a Recurrent Neural Network (RNN) is well trained, the result-
ing recommender will be able to linearly combine the latent features learned
during training to better approximate the user preferences. Our submission to
the RecSys Challenge14 obtained a RMSE of 0.902 and ranked number 6 for
Task 1 with K = 20. Interestingly enough our approach has always provided bet-
ter vector representations than LDA, LSA, and PCA for modeling the content of
book abstracts in our experiments, as shown in Figure 3 (a). Those techniques
are well-established methods currently used when implementing content-based
recommender systems in the recommendation community.

5 Learned Lessons
During the implementation the recommender system, we come up with the fol-
lowing findings:
Modeling word contexts provides a semantic relationship between words that
improves the latent representation of documents.
When the matrix V contains enough information to describe the structure in
the content of documents, updating U and V with coordinate ascend provides
minor improvements. Thus, we can train U with a very small number of
iterations if enough effort has been devoted to train V or both steps can be
performed independently without loss of RMSE.
The above approach requires a more aggressive regularization parameter to
control overfitting in matrix U . Empirically, large numbers for the regular-
ization parameters provide the smallest RMSE with this setting.
Surprisingly, projecting LDA topics to an orthogonal space with PCA no-
tably improved the prediction results. We believe this is because the linear
combination of features and weights provided by the regression algorithms
is better suit in a space with low correlation between latent variables.

6 Conclusions
We presented a recommender system that uses the semantic word properties of a
type of deep learning algorithm called Recurrent Neural Network. This method
provides a lower and less sparse representation of content of text documents.
Our results and submission to the RecSys Challenge shows that a RNN provides
a lower RMSE values than a latent representation with LDA, PCA, and LSA.

References
1. Robert M. Bell and Yehuda Koren. Lessons from the netflix prize challenge.
SIGKDD Explor. Newsl., 9(2):7579, December 2007.
2. Andrew L Maas and Andrew Y Ng. A probabilistic model for semantic word vectors.
In NIPS, volume 10, 2010.
3. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of
word representations in vector space. Computing Research Repository, 2013.
4. Tomas Mikolov and et al. Distributed representations of words and phrases and
their compositionality. In NIPS, 2013.
5. Tomas Mikolov, Wen tau Yih, and Geoffrey Zweig. Linguistic regularities in con-
tinuous space word representations. In Conference of the North American Chapter
of the Association for Computational Linguistics, pages 746751, 2013.

Cheatsheet Recurrent Neural Networks
No ratings yet
Cheatsheet Recurrent Neural Networks
5 pages
Design of Toggle Jack
100% (2)
Design of Toggle Jack
9 pages
Roth The Straight-Wire Appliance 17 Years Later - Journal of Clinical Orthodontics
100% (4)
Roth The Straight-Wire Appliance 17 Years Later - Journal of Clinical Orthodontics
6 pages
Articles_search_project (1)
No ratings yet
Articles_search_project (1)
8 pages
Learning Hierarchical Review Graph Representations For Recommendation
No ratings yet
Learning Hierarchical Review Graph Representations For Recommendation
14 pages
We Used Neural Networks To Detect Clickbaits: You Won't Believe What Happened Next!
No ratings yet
We Used Neural Networks To Detect Clickbaits: You Won't Believe What Happened Next!
7 pages
Learning To Generate Reviews and Discovering Sentiment
No ratings yet
Learning To Generate Reviews and Discovering Sentiment
9 pages
Sentiment Classification With Deep Neural Networks: Yi Zhou
No ratings yet
Sentiment Classification With Deep Neural Networks: Yi Zhou
58 pages
Text Classification Improved BT Integrating Bidirectional LSTM With Two-Dimensional Max Pooling
No ratings yet
Text Classification Improved BT Integrating Bidirectional LSTM With Two-Dimensional Max Pooling
11 pages
Overview On Movie Recommender System
100% (1)
Overview On Movie Recommender System
4 pages
Deep Learning Based Recommendation Systems
No ratings yet
Deep Learning Based Recommendation Systems
47 pages
Sequential Short-Text Classification With Recurrent and Convolutional Neural Networks
No ratings yet
Sequential Short-Text Classification With Recurrent and Convolutional Neural Networks
6 pages
lecture 11
No ratings yet
lecture 11
57 pages
Recurrent Neural Network For Text Classification With Multi-Task Learning
No ratings yet
Recurrent Neural Network For Text Classification With Multi-Task Learning
7 pages
Wide and Deep Learning For Recommender Systems - Google Play Store - Highlighted Paper
No ratings yet
Wide and Deep Learning For Recommender Systems - Google Play Store - Highlighted Paper
4 pages
NN TextClassification
No ratings yet
NN TextClassification
7 pages
2204.09930v1
No ratings yet
2204.09930v1
9 pages
Character-Aware Neural Language Models
No ratings yet
Character-Aware Neural Language Models
9 pages
1508.06615 - PTB Character Aware Neural Language Models Yoon Kim
No ratings yet
1508.06615 - PTB Character Aware Neural Language Models Yoon Kim
9 pages
Recurrent Neural Networks: CMSC498L
No ratings yet
Recurrent Neural Networks: CMSC498L
36 pages
Nn4ir PDF
No ratings yet
Nn4ir PDF
290 pages
Recurrent Convolutional Neural Networks For Text Classification
No ratings yet
Recurrent Convolutional Neural Networks For Text Classification
7 pages
Article Semantic Recommendation by Recurrent Neural Networks David
No ratings yet
Article Semantic Recommendation by Recurrent Neural Networks David
6 pages
Reverse Engineering Recurrent Networks For Sentiment Classification Reveals Line Attractor Dynamics
No ratings yet
Reverse Engineering Recurrent Networks For Sentiment Classification Reveals Line Attractor Dynamics
17 pages
Lecture Notes 6
No ratings yet
Lecture Notes 6
5 pages
11.RNN and Transformers
No ratings yet
11.RNN and Transformers
100 pages
Unit 4 - DL
No ratings yet
Unit 4 - DL
23 pages
ML 4
No ratings yet
ML 4
10 pages
Recommender Systems in the Era of Large Language M
No ratings yet
Recommender Systems in the Era of Large Language M
16 pages
Multimodal Recommender Systems: Rakuten Institute of Technology
No ratings yet
Multimodal Recommender Systems: Rakuten Institute of Technology
44 pages
Recurrent Neural Networks cheatsheet
No ratings yet
Recurrent Neural Networks cheatsheet
44 pages
6 - RNN LSTM & Gru
No ratings yet
6 - RNN LSTM & Gru
14 pages
1308 0850 PDF
No ratings yet
1308 0850 PDF
43 pages
For Seminar
No ratings yet
For Seminar
17 pages
CISC 867 Deep Learning: 14. Text Classification With Recurrent Neural Networks and Word Embeddings
No ratings yet
CISC 867 Deep Learning: 14. Text Classification With Recurrent Neural Networks and Word Embeddings
28 pages
ML 5
No ratings yet
ML 5
20 pages
ML 5
No ratings yet
ML 5
20 pages
RNN
No ratings yet
RNN
22 pages
Day 4
No ratings yet
Day 4
22 pages
28-Recurrent Neural Networks - Bidirectional RNNs-19!09!2024
No ratings yet
28-Recurrent Neural Networks - Bidirectional RNNs-19!09!2024
12 pages
Contrastive Learning for Sequential Recommendation
No ratings yet
Contrastive Learning for Sequential Recommendation
15 pages
A Short Course in Discrete Mathematics
From Everand
A Short Course in Discrete Mathematics
Edward A. Bender
3/5 (1)
Rnn
No ratings yet
Rnn
50 pages
Navigation Recomm by RNN David 06 2007
No ratings yet
Navigation Recomm by RNN David 06 2007
9 pages
Linguistic Regularities in Continuous Space Word Representations
No ratings yet
Linguistic Regularities in Continuous Space Word Representations
6 pages
Recurrent Neural Networks With Top-K Gains For Session-Based Recommendations
No ratings yet
Recurrent Neural Networks With Top-K Gains For Session-Based Recommendations
10 pages
Sentiment Analysis with an Recurrent Neural Networks
No ratings yet
Sentiment Analysis with an Recurrent Neural Networks
12 pages
lecture10-lstms
No ratings yet
lecture10-lstms
34 pages
IJCNN 2019 Utilizing Generative Adversarial Networks For Recommendation Based On Ratings and Reviews
No ratings yet
IJCNN 2019 Utilizing Generative Adversarial Networks For Recommendation Based On Ratings and Reviews
8 pages
Recurrent Neural Network Using LSTM Model
No ratings yet
Recurrent Neural Network Using LSTM Model
15 pages
A Novel Deep Learning-Based Collaborative Filtering Model For Recommendation System
No ratings yet
A Novel Deep Learning-Based Collaborative Filtering Model For Recommendation System
6 pages
Algorithms 17 00104
No ratings yet
Algorithms 17 00104
28 pages
Deep Learning For Information Retrieval
No ratings yet
Deep Learning For Information Retrieval
136 pages
Cs224n 2025 Lecture06 Fancy Rnn
No ratings yet
Cs224n 2025 Lecture06 Fancy Rnn
57 pages
Assignment_14_Modern_AI
No ratings yet
Assignment_14_Modern_AI
3 pages
Appendix
No ratings yet
Appendix
9 pages
Report On Text Classification Using CNN, RNN & HAN - Jatana - Medium
No ratings yet
Report On Text Classification Using CNN, RNN & HAN - Jatana - Medium
15 pages
Unit iv
No ratings yet
Unit iv
58 pages
Doc2vec Explain
No ratings yet
Doc2vec Explain
5 pages
CSE 4237 SoftCom Solutions
No ratings yet
CSE 4237 SoftCom Solutions
115 pages
K Nearest Neighbor Algorithm: Fundamentals and Applications
From Everand
K Nearest Neighbor Algorithm: Fundamentals and Applications
Fouad Sabry
No ratings yet
Ordinary Differential Equations and Stability Theory: An Introduction
From Everand
Ordinary Differential Equations and Stability Theory: An Introduction
David A. Sanchez
No ratings yet
Explosive Material
100% (1)
Explosive Material
11 pages
Manjunath B.S., Salembier P., Sikora T. - Introduction To MPEG 7. Multimedia Content Description Language
No ratings yet
Manjunath B.S., Salembier P., Sikora T. - Introduction To MPEG 7. Multimedia Content Description Language
400 pages
Test A 9to
No ratings yet
Test A 9to
2 pages
8th Maths Lesson 10 To 12 Eng
No ratings yet
8th Maths Lesson 10 To 12 Eng
2 pages
Lab-02 Declarations and Initialization of Data Variables, Data Types, Escape Sequence
No ratings yet
Lab-02 Declarations and Initialization of Data Variables, Data Types, Escape Sequence
4 pages
Quantitative Research Samples
No ratings yet
Quantitative Research Samples
28 pages
Laser Range Finder: Diode-Pumped Technology
No ratings yet
Laser Range Finder: Diode-Pumped Technology
2 pages
StolPer Samuelson Theorem Note
No ratings yet
StolPer Samuelson Theorem Note
10 pages
Bba 4 Sem Basic Informatics For Management 19101343 May 2019
No ratings yet
Bba 4 Sem Basic Informatics For Management 19101343 May 2019
2 pages
General Mathematics: Quarter 1 - Module 16: Solving Problems Involving Inverse Functions
No ratings yet
General Mathematics: Quarter 1 - Module 16: Solving Problems Involving Inverse Functions
20 pages
S21 - DLD Lab MidTerm
No ratings yet
S21 - DLD Lab MidTerm
1 page
Geometric Design Standard RHD - BD
100% (10)
Geometric Design Standard RHD - BD
46 pages
Pi - Trahair - Inelastic Torsion of Steel I Beams
No ratings yet
Pi - Trahair - Inelastic Torsion of Steel I Beams
12 pages
3 - Dynamics Force and Motion Analysis of Plane Mechanism
No ratings yet
3 - Dynamics Force and Motion Analysis of Plane Mechanism
21 pages
What Are The Various Types of C Vuser Correlation Functions
No ratings yet
What Are The Various Types of C Vuser Correlation Functions
3 pages
Wheel-Balancer-Ml Balanceadora Manatec
No ratings yet
Wheel-Balancer-Ml Balanceadora Manatec
4 pages
Broiler Preventive Maintenance Checklist
No ratings yet
Broiler Preventive Maintenance Checklist
5 pages
June 2017 QP - Unit 4 Edexcel Physics A-Level PDF
No ratings yet
June 2017 QP - Unit 4 Edexcel Physics A-Level PDF
28 pages
Laporan Allison
No ratings yet
Laporan Allison
7 pages
Satir D300
No ratings yet
Satir D300
2 pages
XS-4222 4100XPC Assembly Procedure
100% (4)
XS-4222 4100XPC Assembly Procedure
8 pages
0231 - PM Internship Report
No ratings yet
0231 - PM Internship Report
38 pages
Mechanical Design Exit Exam Reviewer PDF
100% (1)
Mechanical Design Exit Exam Reviewer PDF
70 pages
Basic Concepts of Thermodynamics
100% (1)
Basic Concepts of Thermodynamics
19 pages
Cloud Resource Virtualization
100% (1)
Cloud Resource Virtualization
39 pages
Answer Key Elecs Superbook
100% (1)
Answer Key Elecs Superbook
46 pages
Science: Pure Substances Vs Mixtures
No ratings yet
Science: Pure Substances Vs Mixtures
17 pages
Mba Strategy Quant Advanced
No ratings yet
Mba Strategy Quant Advanced
196 pages

Deep Learning of Semantic Word Representations To Implement A Content-Based Recommender For The Recsys Challenge'14

Uploaded by

Deep Learning of Semantic Word Representations To Implement A Content-Based Recommender For The Recsys Challenge'14

Uploaded by

Deep Learning of Semantic Word

Representations to Implement a Content-based

Omar U. Florez1 , Lama Nachman1

Intel Labs, Santa Clara CA 95053, USA,

Abstract. In this paper, we will discuss a recommender system that

Keywords: deep learning, recommender systems, neural network lan-

2.1 Feature Learning

We describe a topic representation by learning the word topic proportions with

Fig. 1: a) A Recurrent Neural Network (RNN)

where N is the number of words in document d and wi is the i-th word in d. As

2.2 User Preference Learning

3 Why is the System Innovative?

Fig. 3: a) Effect of the number of topics b) Effect of the number of iterations

We compare the quality of different content representations in terms of

You might also like