Movie Recommender Systems
Dataset: The dataset is available at the given link and can be downloaded at your convenience.
This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a
5-star scale in half-star steps (0.5-5) and have been obtained from the official GroupLens website.
Content
This dataset consists of the following files:
movies_metadata.csv: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full
MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production
countries and companies.
keywords.csv: Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified
JSON Object.
credits.csv: Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON
Object.
links.csv: The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.
links_small.csv: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.
ratings_small.csv: The subset of 100,000 ratings from 700 users on 9,000 movies.
The Full MovieLens Dataset, consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all
45,000 movies in this dataset, can be accessed here.
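The keywords.csv and credits.csv files store lists of objects as strings, so each cell has to be parsed before it can be used. A minimal sketch, assuming each cell is a list of dicts with a "name" key written either as strict JSON or in Python-literal syntax (single-quoted keys, which json.loads rejects); ast.literal_eval handles the latter without executing any code. The helper name parse_names and the sample value are hypothetical.

```python
import ast
import json


def parse_names(cell):
    """Hypothetical helper: extract the "name" fields from a stringified list of objects."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # missing (e.g. NaN from pandas) or empty cell
    try:
        items = json.loads(cell)  # strict JSON with double-quoted keys
    except json.JSONDecodeError:
        items = ast.literal_eval(cell)  # Python-literal syntax; evaluates literals only, never code
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]


# Illustrative cell value, not copied from the dataset
sample = "[{'id': 1, 'name': 'jealousy'}, {'id': 2, 'name': 'toy'}]"
print(parse_names(sample))  # ['jealousy', 'toy']
```

With pandas this could be applied per column, for example keywords['keywords'].apply(parse_names), assuming the column is named keywords.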
Acknowledgements
This dataset is an ensemble of data collected from TMDB and GroupLens.
The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the
TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional
movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.
The Movie Links and Ratings have been obtained from the official GroupLens website. The files are part of the
dataset available here.
Inspiration
This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I
wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this
metadata in combination with MovieLens ratings to build various types of Recommender Systems.
Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems.
This project involves building a movie recommender system using machine learning techniques. Here's a
step-by-step guide:
1. Problem Definition
Objective: Develop a movie recommender system that suggests movies to users based on their past behavior and
preferences.
2. Data Collection
For this example, we'll use the MovieLens dataset, which is commonly used for movie recommendation systems.
You can download it from MovieLens.
3. Data Preprocessing
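import pandas as pd
import matplotlib.pyplot as plt
from surprise import Dataset, Reader, SVD
# Load the data (file names assume a MovieLens download such as ml-latest-small; adjust paths as needed)
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
print(movies.head())
print(ratings.head())
4. Exploratory Data Analysis
# Basic statistics
print(ratings.describe())
# Histogram of ratings
ratings['rating'].hist(bins=30)
plt.title('Distribution of Movie Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()
5. Model Building
# Wrap the ratings for the Surprise library; MovieLens ratings use half-star steps from 0.5 to 5
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
# Fit an SVD (matrix factorization) model on every available rating
trainset = data.build_full_trainset()
svd = SVD()
svd.fit(trainset)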
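To show roughly what svd.fit does under the hood, here is a simplified from-scratch sketch of biased matrix factorization trained with stochastic gradient descent, the Funk-style SVD that the Surprise documentation describes for its SVD class. For each observed rating r_ui, the prediction error e = r_ui - r_hat_ui nudges the biases b_u, b_i and the factor vectors p_u, q_i along their regularized gradients. The function train_mf, its hyperparameter values, and the toy ratings are illustrative assumptions; this is not Surprise's actual code.

```python
import random


def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.02, reg=0.02, seed=0):
    """Fit biased matrix factorization with SGD (illustrative sketch).

    ratings: list of (user, item, rating) triples.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    # Small random initial factor vectors
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users/items get zero bias and no factors, i.e. fall back to mu
        dot = sum(a * b for a, b in zip(pu.get(u, []), qi.get(i, [])))
        return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + dot

    data = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]  # use the old values for both updates
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
    return predict


toy = [
    ("alice", "m1", 5), ("alice", "m2", 4), ("alice", "m3", 1),
    ("bob", "m1", 4), ("bob", "m3", 2),
    ("carol", "m2", 1), ("carol", "m3", 5),
]
predict = train_mf(toy)
print(round(predict("bob", "m2"), 2))  # estimate for a pair bob never rated
```

Surprise exposes separate learning rates and regularization terms per parameter and defaults to 100 factors and 20 epochs; the sketch uses a single lr and reg and a tiny factor count to stay readable.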
6. Making Predictions
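Once trained, a biased matrix-factorization model predicts user u's rating of movie i as r_hat(u, i) = mu + b_u + b_i + q_i · p_u, where mu is the global mean rating, b_u and b_i are the user and item biases, and p_u, q_i are the learned factor vectors. This matches the prediction rule the Surprise documentation gives for SVD, whose predict() also clips estimates to the rating scale by default; the .est attribute of its result holds that estimate. A worked example with made-up parameter values, assuming a 0.5-5 scale (the function name predict_rating is hypothetical):

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, low=0.5, high=5.0):
    """Biased matrix-factorization estimate mu + b_u + b_i + q_i . p_u, clipped to [low, high].

    The default bounds assume MovieLens's half-star 0.5-5 scale.
    """
    est = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    return min(high, max(low, est))


# Made-up parameters: global mean 3.5, a user who rates generously, a well-liked movie
print(round(predict_rating(3.5, 0.3, 0.4, [0.5, -0.2], [0.6, 0.1]), 2))  # 4.48
# Estimates outside the scale are clipped
print(predict_rating(4.5, 0.5, 0.5, [1.0], [1.0]))  # 5.0
```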
7. Recommending Movies
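def recommend_movies(user_id, num_recommendations=10):
    # Predict ratings for all movies the user hasn't rated yet, then keep the highest estimates
    rated = ratings.loc[ratings['userId'] == user_id, 'movieId']
    candidates = movies[~movies['movieId'].isin(rated)].copy()
    candidates['predicted_rating'] = [svd.predict(user_id, movie_id).est for movie_id in candidates['movieId']]
    return candidates.nlargest(num_recommendations, 'predicted_rating')[['movieId', 'title', 'predicted_rating']]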
To deploy the recommender system, you could create a simple web application using Flask.
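from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/recommend', methods=['POST'])
def recommend():
    # Expects a JSON body such as {"user_id": 1, "num_recommendations": 5}
    data = request.get_json(force=True)
    user_id = data['user_id']
    num_recommendations = data.get('num_recommendations', 10)
    # recommend_movies returns a DataFrame, hence the conversion to a list of records
    recommendations = recommend_movies(user_id, num_recommendations)
    return jsonify(recommendations.to_dict(orient='records'))
if __name__ == '__main__':
    # debug=True is meant for local development only
    app.run(debug=True)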
Set up logging and monitoring to track the performance of your recommender system, and
schedule regular retraining with new data to keep the recommendations relevant.
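One way to track performance is to log the prediction error on ratings collected since the last training run and to retrain when it drifts too high. A minimal sketch, assuming RMSE as the monitoring metric and an arbitrary threshold of 1.0; the logger name, the function names, and the sample pairs are hypothetical.

```python
import logging
import math

# Hypothetical logger name; configure handlers and levels as your deployment requires
logger = logging.getLogger("recommender.monitoring")


def rmse(pairs):
    """Root mean squared error over (actual, predicted) rating pairs."""
    if not pairs:
        raise ValueError("need at least one (actual, predicted) pair")
    return math.sqrt(sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs))


def needs_retraining(pairs, threshold=1.0):
    """Assumed policy: retrain when RMSE on fresh ratings exceeds the threshold."""
    score = rmse(pairs)
    logger.info("RMSE on recent ratings: %.3f (threshold %.3f)", score, threshold)
    return score > threshold


# (actual rating, model prediction) pairs for ratings logged since the last training run
recent = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5)]
print(round(rmse(recent), 3))    # 0.707
print(needs_retraining(recent))  # False
```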
Simple Recommender
The Simple Recommender offers generalized recommendations to every user based on movie popularity
and (sometimes) genre. The basic idea behind this recommender is that movies that are more popular and
more critically acclaimed will have a higher probability of being liked by the average audience. This model
does not give personalized recommendations based on the user.
Implementing this model is straightforward: sort the movies by rating and popularity and display the top
movies of the list. As an added step, we can pass in a genre argument to get the top movies of a particular genre.
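Sorting on raw average ratings lets a movie with a handful of perfect votes outrank widely rated favorites, so rating and popularity have to be combined somehow. One common choice, assumed here rather than taken from the text above, is IMDB's weighted rating WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is a movie's mean rating, v its vote count, C the mean rating across the candidate movies, and m the minimum vote count needed to qualify (here a chosen percentile of vote counts; a high one such as the 95th is a common pick). The field names vote_count, vote_average, and genres mirror columns of movies_metadata.csv, with genres assumed to be already parsed into a plain list of names; the records and the function names are made up.

```python
def weighted_rating(vote_count, vote_average, m, c):
    """IMDB-style weighted rating: movies with few votes are pulled toward the mean c."""
    v, r = vote_count, vote_average
    if v + m == 0:
        return c  # no votes and no threshold: fall back to the mean
    return (v / (v + m)) * r + (m / (v + m)) * c


def top_movies(movies, n=10, genre=None, percentile=0.95):
    """Titles of the n best movies by weighted rating, optionally limited to one genre."""
    if genre is not None:
        movies = [mv for mv in movies if genre in mv["genres"]]
    if not movies:
        return []
    c = sum(mv["vote_average"] for mv in movies) / len(movies)
    counts = sorted(mv["vote_count"] for mv in movies)
    # m: vote count at the requested percentile (simple index-based cut-off)
    m = counts[min(len(counts) - 1, int(percentile * len(counts)))]
    qualified = [mv for mv in movies if mv["vote_count"] >= m]
    qualified.sort(key=lambda mv: weighted_rating(mv["vote_count"], mv["vote_average"], m, c), reverse=True)
    return [mv["title"] for mv in qualified[:n]]


# Made-up records using movies_metadata.csv-style field names
catalog = [
    {"title": "A", "genres": ["Drama"], "vote_count": 10000, "vote_average": 8.0},
    {"title": "B", "genres": ["Comedy"], "vote_count": 5, "vote_average": 10.0},
    {"title": "C", "genres": ["Drama", "Comedy"], "vote_count": 4000, "vote_average": 7.5},
    {"title": "D", "genres": ["Comedy"], "vote_count": 2000, "vote_average": 6.0},
]
print(top_movies(catalog, n=3, percentile=0.5))                  # ['A', 'C']
print(top_movies(catalog, n=3, genre="Comedy", percentile=0.5))  # ['C', 'D']
```

Movie B's perfect 10.0 from five votes does not even qualify, which is the point of the vote threshold; the genre filter simply recomputes C and m within that genre before ranking.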
Reference link