1 Introduction - Recommender Systems
Alaa BAKHTI
The course syllabus
- 5 sessions - 21 hours
- Each session = 2h theory + 2h practical work
● Session 1: Introduction to recommender systems
● Session 2: Content-based filtering
● Session 3: Memory-based collaborative filtering
- Grading
- Research paper presentation
- Subject: choose one paper from this list and post it in the course Teams channel (first come, first served)
- What: 25 min presentation + 5 min Q&A
- Who: 2 students per group
- When: last session
Two-tower model implementation
TF-IDF
Embedding space
word2vec
Cosine similarity
SVD
NMF
EDA
Loss function
L2 regularization
Dropout
Dense layer
A/B testing?
Introduction
The long tail problem
[Figure: long-tail distribution across items]
Netflix: movie recommendation
75% of the watched content comes from some sort of recommendation.
Source: Netflix recommendations: Beyond the 5 stars - X. Amatriain and J. Basilico - Netflix Inc - 2012.
$1B per year is the estimated business value of recommendation.
Source: The Netflix recommender system: Algorithms, business value, and innovation - C. A. Gomez-Uribe and N. Hunt - Netflix Inc - 2015.
Amazon: product recommendation
35% of Amazon sales originate from cross-sales (recommendation).
Source: How retailers can keep up with consumers - McKinsey & Company - 2013.
YouTube: video recommendation
60% of the clicks on the home screen are on the recommendations.
Source: The YouTube Video Recommendation System - J. Davidson et al. - Google Inc - 2010.
Problem formulation
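How to determine items that the user may be interested in?
[Figure: the items a user listened to, and the catalog to recommend from]
[Figure: the items a user listened to and rated, and the catalog to recommend from]
Rating matrix
[Figure: rating matrix with a few known ratings (4/5, 2/5) and unknown entries (?)]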
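The rating matrix idea can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the course material: the user names, item names, and the helper functions `build_rating_matrix` and `unrated_items` are all hypothetical. Known ratings fill the matrix, unknown entries (the "?" cells) are stored as `None`, and the recommendation candidates for a user are the items in that user's row that are still `None`.

```python
# Hypothetical toy data: (user, item, rating out of 5) triples.
ratings = [
    ("alice", "song_a", 4),
    ("alice", "song_b", 2),
    ("bob", "song_a", 5),
    ("bob", "song_c", 3),
]


def build_rating_matrix(triples):
    """Return (users, items, matrix); matrix[u][i] is a rating or None."""
    users = sorted({user for user, _, _ in triples})
    items = sorted({item for _, item, _ in triples})
    user_index = {user: k for k, user in enumerate(users)}
    item_index = {item: k for k, item in enumerate(items)}
    # Start with every cell unknown (the "?" entries of the matrix).
    matrix = [[None] * len(items) for _ in users]
    for user, item, rating in triples:
        matrix[user_index[user]][item_index[item]] = rating
    return users, items, matrix


def unrated_items(user, users, items, matrix):
    """Items the user has not rated yet: the candidates to recommend from."""
    row = matrix[users.index(user)]
    return [item for item, rating in zip(items, row) if rating is None]


users, items, matrix = build_rating_matrix(ratings)
candidates = unrated_items("alice", users, items, matrix)
```

A recommender then has to estimate the missing cells of each row and rank the candidates by the estimated rating.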
Explicit
- Data provided by users intentionally.
- Example: pressing the like button on a YouTube video.
- Problem: it requires effort from the user => doesn’t scale.
Implicit
- Data generated based on the user interaction with items (easier to collect).
- Example: purchased an item => high rating.
- Problem: poorly learns low ratings (what the user doesn’t like).
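To make the implicit case concrete, here is a small sketch of turning interaction events into preference scores. The event types, the weights in `EVENT_WEIGHTS`, and the function `implicit_scores` are assumptions chosen for illustration only. The only idea taken from the text above is that a purchase maps to a high rating.

```python
from collections import defaultdict

# Assumed, purely illustrative weights: stronger interactions score higher.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def implicit_scores(events):
    """Map (user, item) to the weight of the strongest observed interaction."""
    scores = defaultdict(float)
    for user, item, event in events:
        key = (user, item)
        scores[key] = max(scores[key], EVENT_WEIGHTS[event])
    return dict(scores)


# Hypothetical interaction log.
events = [
    ("u1", "i1", "view"),
    ("u1", "i1", "purchase"),
    ("u2", "i1", "view"),
]
scores = implicit_scores(events)
```

Note what is missing from `scores`: a pair such as `("u2", "i2")` has no entry at all, so the data cannot tell whether the user disliked the item or never saw it. That is the "poorly learns low ratings" problem in practice.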
Subject
● Exploratory Data Analysis (EDA) on the MovieLens dataset (ml-latest-small.zip); a more complex alternative is the Movies dataset
● The final dataset will be used in the next sessions
● Use only the files ratings.csv and movies.csv
TODO
● Create a Git repository for the recsys class
● Create a virtual environment “recsys”
● Download the data and store it in the “data” folder
● Create a notebook “movielens-eda.ipynb” to prepare and clean up the dataset (remove missing values, duplicates, check distributions, …)
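A possible starting point for the notebook, sketched with pandas. It assumes the standard MovieLens columns (`userId`, `movieId`, `rating`, `timestamp` in ratings.csv; `movieId`, `title`, `genres` in movies.csv). The function names `clean_ratings` and `summarize` are hypothetical, and so is the choice to keep only the most recent rating for each user and movie pair. Treat it as one reasonable cleaning strategy, not the expected solution.

```python
import pandas as pd


def clean_ratings(ratings: pd.DataFrame, movies: pd.DataFrame) -> pd.DataFrame:
    """Drop missing values and duplicates, then attach movie metadata."""
    ratings = ratings.dropna(subset=["userId", "movieId", "rating"])
    # Assumption: if a user rated a movie several times, keep the latest rating.
    ratings = ratings.sort_values("timestamp").drop_duplicates(
        subset=["userId", "movieId"], keep="last"
    )
    movies = movies.dropna(subset=["movieId", "title"]).drop_duplicates(
        subset=["movieId"]
    )
    # The inner join drops ratings whose movie has no metadata.
    return ratings.merge(movies, on="movieId", how="inner")


def summarize(df: pd.DataFrame) -> dict:
    """Basic counts and the distribution of rating values."""
    return {
        "n_users": df["userId"].nunique(),
        "n_movies": df["movieId"].nunique(),
        "n_ratings": len(df),
        "rating_counts": df["rating"].value_counts().sort_index().to_dict(),
    }


# In the notebook (the paths are an assumption about the "data" folder layout):
# ratings = pd.read_csv("data/ratings.csv")
# movies = pd.read_csv("data/movies.csv")
# df = clean_ratings(ratings, movies)
# print(summarize(df))
```

Counting ratings per movie (`df["movieId"].value_counts()`) is a natural next check, since it shows how unevenly ratings are spread across items.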