Movie Recommender Systems

Uploaded by

Smit Mandavia

Project Title: Movie Recommender Systems

Tools: Jupyter Notebook and VS Code

Technologies: Data Science

Project Difficulty Level: Intermediate

Dataset: The dataset is available at the given link. You can download it at your convenience.

Click here to download data set

Movies Recommender System


About Dataset
Context
These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of
movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters,
release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a
scale of 1-5 and have been obtained from the official GroupLens website.

Content
This dataset consists of the following files:
movies_metadata.csv: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full
MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production
countries and companies.

keywords.csv: Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified
JSON Object.

credits.csv: Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON
Object.

links.csv: The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.

links_small.csv: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.

ratings_small.csv: The subset of 100,000 ratings from 700 users on 9,000 movies.

The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all
the 45,000 movies in this dataset can be accessed here
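
Because keywords.csv and credits.csv store their fields as stringified objects, each cell has to be parsed before use. The sketch below is a minimal illustration, assuming the strings use Python-literal quoting (single quotes); if a file turns out to hold strict JSON, json.loads would be the natural substitute. The sample string and the helper name parse_names are hypothetical.

```python
import ast

# Hypothetical sample of a stringified cell, as it might appear in keywords.csv
raw_keywords = "[{'id': 931, 'name': 'jealousy'}, {'id': 4290, 'name': 'toy'}]"


def parse_names(raw):
    """Return the 'name' values from a stringified list of dicts."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []  # malformed or empty cell
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items
            if isinstance(item, dict) and "name" in item]


print(parse_names(raw_keywords))
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for data read from a file.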

Acknowledgements
This dataset is an ensemble of data collected from TMDB and GroupLens.
The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the
TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional
movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.

The Movie Links and Ratings have been obtained from the Official GroupLens website. The files are a part of the
dataset available here

Inspiration
This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I
wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this
metadata in combination with MovieLens ratings to build various types of Recommender Systems.

Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems

Some of the things you can do with this dataset:


● Predicting movie revenue and/or movie success based on a certain metric.
● What movies tend to get higher vote counts and vote averages on TMDB?
● Building Content Based and Collaborative Filtering Based Recommendation Engines.

Movie Recommender System Machine Learning Project

This project involves building a movie recommender system using machine learning techniques. Here's a
step-by-step guide:

1. Problem Definition

Objective: Develop a movie recommender system that suggests movies to users based on their past behavior and
preferences.

2. Data Collection

For this example, we'll use the MovieLens dataset, which is commonly used for movie recommendation systems.
You can download it from MovieLens.

3. Data Preprocessing

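import pandas as pd

# Load the datasets
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Display basic info (info() prints directly and returns None)
movies.info()
ratings.info()

# Check for missing values in each column
print(movies.isnull().sum())
print(ratings.isnull().sum())

# Preview the first few rows
print(movies.head())
print(ratings.head())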

4. Exploratory Data Analysis (EDA)

import seaborn as sns
import matplotlib.pyplot as plt

# Basic statistics
print(ratings.describe())

# Histogram of ratings
ratings['rating'].hist(bins=30)
plt.title('Distribution of Movie Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

# Number of ratings per movie
ratings_per_movie = ratings.groupby('movieId')['rating'].count()
ratings_per_movie.hist(bins=50)
plt.title('Number of Ratings per Movie')
plt.xlabel('Number of Ratings')
plt.ylabel('Count')
plt.show()
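
One quantity the histograms above do not show directly is how sparse the user-item matrix is, which matters for collaborative filtering. The following is a small sketch on made-up (userId, movieId, rating) triples; the data and the function name sparsity are illustrative assumptions, not part of the original project.

```python
# Toy (userId, movieId, rating) triples standing in for rows of ratings.csv
toy_ratings = [
    (1, 10, 4.0), (1, 20, 3.5),
    (2, 10, 5.0),
    (3, 30, 2.0), (3, 20, 4.5),
]


def sparsity(triples):
    """Fraction of the user-item matrix that has no rating."""
    if not triples:
        raise ValueError("need at least one rating")
    users = {u for u, _, _ in triples}
    items = {i for _, i, _ in triples}
    pairs = {(u, i) for u, i, _ in triples}
    return 1.0 - len(pairs) / (len(users) * len(items))


print(f"Sparsity: {sparsity(toy_ratings):.2%}")
```

Here 5 of the 9 possible user-movie pairs are rated, so roughly 44% of the matrix is empty. Real rating datasets are typically far sparser.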

5. Building the Recommender System

Collaborative Filtering using Matrix Factorization (SVD)

from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Load the data into Surprise format
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Use SVD for matrix factorization
svd = SVD()

# Cross-validation to evaluate the algorithm
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
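
To make the two reported measures concrete, here is a minimal plain-Python sketch of RMSE and MAE on made-up ratings. It is not how Surprise computes them internally; the function names and sample values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of the error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up true and predicted ratings
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(f"RMSE: {rmse(actual, predicted):.3f}")  # RMSE: 0.750
print(f"MAE:  {mae(actual, predicted):.3f}")   # MAE:  0.625
```

Lower is better for both. RMSE is never smaller than MAE on the same predictions, and a large gap between them suggests a few big misses.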

Training the Model

trainset = data.build_full_trainset()
svd.fit(trainset)

6. Making Predictions

# Predict the rating for a specific user and movie


user_id = 1
movie_id = 10
rating_prediction = svd.predict(user_id, movie_id)
print(f"Predicted rating for user {user_id} and movie {movie_id}: {rating_prediction.est}")
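
As a rough picture of what sits behind the estimate, SVD-style matrix factorization models combine a global mean, a user bias, an item bias, and the dot product of user and item factor vectors, then clip the result to the rating scale. The sketch below uses made-up parameter values and a hypothetical helper predict_rating. It shows the form of the computation, not Surprise's internals.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, scale=(0.5, 5.0)):
    """Estimate = mu + b_u + b_i + dot(p_u, q_i), clipped to the rating scale."""
    dot = sum(p * q for p, q in zip(p_u, q_i))
    estimate = mu + b_u + b_i + dot
    low, high = scale
    return min(high, max(low, estimate))


# Made-up values: global mean, user bias, item bias, and two latent factors
est = predict_rating(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, -0.2], q_i=[0.4, 0.1])
print(f"Estimated rating: {est:.2f}")  # Estimated rating: 3.78
```

For a user or movie that never appeared in training, the corresponding bias and factors are unavailable, so the estimate falls back toward the global mean and any known bias.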

7. Recommending Movies

# Function to recommend top N movies for a given user
def recommend_movies(user_id, num_recommendations=10):
    # Get a list of all movie ids
    movie_ids = movies['movieId'].unique()

    # Keep only the movies the user hasn't rated yet
    rated_ids = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
    unrated_ids = [movie_id for movie_id in movie_ids if movie_id not in rated_ids]

    # Predict ratings for those movies
    movie_ratings = [svd.predict(user_id, movie_id).est for movie_id in unrated_ids]

    # Create a DataFrame of movie ids and predicted ratings
    recommendations = pd.DataFrame({
        'movieId': unrated_ids,
        'predicted_rating': movie_ratings
    })

    # Sort the DataFrame by predicted rating in descending order
    recommendations = recommendations.sort_values(by='predicted_rating',
                                                  ascending=False)

    # Get the top N recommended movies
    top_recommendations = recommendations.head(num_recommendations)

    # Merge with the movies DataFrame to get movie titles
    top_recommendations = pd.merge(top_recommendations, movies, on='movieId')

    return top_recommendations

# Recommend top 10 movies for user with ID 1
recommendations = recommend_movies(1, 10)
print(recommendations)

8. Deployment

To deploy the recommender system, you could create a simple web application using Flask.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/recommend', methods=['POST'])
def recommend():
    # Parse the JSON request body
    data = request.get_json(force=True)
    user_id = data['user_id']
    num_recommendations = data.get('num_recommendations', 10)
    recommendations = recommend_movies(user_id, num_recommendations)
    return jsonify(recommendations.to_dict(orient='records'))

if __name__ == '__main__':
    # debug=True is intended for local development only
    app.run(debug=True)
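
The route above trusts whatever JSON it receives. A small validation step, sketched below in plain Python, could reject bad input before it reaches the model. The helper name parse_recommend_request and the cap of 50 recommendations are assumptions made for illustration, not part of the original project.

```python
def parse_recommend_request(payload, max_recommendations=50):
    """Validate the JSON body of a /recommend call (hypothetical helper)."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")

    user_id = payload.get("user_id")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise ValueError("user_id must be an integer")

    n = payload.get("num_recommendations", 10)
    if isinstance(n, bool) or not isinstance(n, int):
        raise ValueError("num_recommendations must be an integer")
    if not 1 <= n <= max_recommendations:
        raise ValueError(
            f"num_recommendations must be between 1 and {max_recommendations}"
        )
    return user_id, n


print(parse_recommend_request({"user_id": 1}))  # (1, 10)
```

Inside the Flask view, one option is to catch ValueError and return the message with a 400 status code rather than letting the request fail with a server error.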

9. Monitoring and Maintenance

Set up logging and monitoring to track the performance of your recommender system, and
schedule regular retraining with new data to keep the recommendations relevant.
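
As one possible starting point for the logging side, the sketch below wraps a function with a decorator that records how long each call takes, using only the standard library. The logger name, the decorator log_latency, and the stand-in function recommend_stub are all illustrative. In the project, the decorator would wrap recommend_movies instead.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("recommender")


def log_latency(func):
    """Log how long each call to func takes, even if it raises."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper


@log_latency
def recommend_stub(user_id, num_recommendations=10):
    """Placeholder standing in for recommend_movies."""
    return list(range(num_recommendations))


recommend_stub(1, 3)
```

Latency is only one signal. Tracking RMSE on recent ratings over time is a reasonable way to decide when retraining is due.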

10. Documentation and Reporting

Maintain comprehensive documentation of the project, including data sources, preprocessing
steps, model selection, and evaluation results. Create detailed reports and visualizations to
communicate findings and insights to stakeholders.

Tools and Technologies

● Programming Language: Python
● Libraries: pandas, numpy, seaborn, matplotlib, scikit-learn, Surprise, Flask
● Visualization Tools: Tableau, Power BI, or any dashboarding tool for advanced
visualizations

This is a basic outline of a movie recommender system project. Depending on your specific
goals and data, you may need to adjust the steps accordingly.

Sample Project Report

Simple Recommender
The Simple Recommender offers generalized recommendations to every user based on movie popularity
and (sometimes) genre. The basic idea behind this recommender is that movies that are more popular and
more critically acclaimed will have a higher probability of being liked by the average audience. This model
does not give personalized recommendations based on the user.

The implementation of this model is straightforward. All we have to do is sort our movies based on ratings
and popularity and display the top movies of our list. As an added step, we can pass in a genre argument to
get the top movies of a particular genre.
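
The idea can be sketched in a few lines of plain Python. The records, field names, and the min_votes cutoff below are assumptions made for illustration. The text does not specify a vote threshold, but some cutoff is a common way to keep movies with only a handful of votes from topping the list.

```python
# Toy records; field names and values are illustrative, not taken from the dataset
toy_movies = [
    {"title": "A", "genres": ["Drama"], "vote_average": 8.5, "vote_count": 1200},
    {"title": "B", "genres": ["Comedy"], "vote_average": 9.0, "vote_count": 12},
    {"title": "C", "genres": ["Drama", "Crime"], "vote_average": 8.1, "vote_count": 3000},
    {"title": "D", "genres": ["Comedy"], "vote_average": 7.2, "vote_count": 800},
]


def simple_recommender(movies, genre=None, min_votes=100, top_n=10):
    """Rank movies by average vote, optionally restricted to one genre."""
    candidates = [
        m for m in movies
        if m["vote_count"] >= min_votes
        and (genre is None or genre in m["genres"])
    ]
    # Sort by rating first, then by vote count to break ties
    ranked = sorted(candidates,
                    key=lambda m: (m["vote_average"], m["vote_count"]),
                    reverse=True)
    return [m["title"] for m in ranked[:top_n]]


print(simple_recommender(toy_movies))                  # ['A', 'C', 'D']
print(simple_recommender(toy_movies, genre="Comedy"))  # ['D']
```

Movie "B" has the highest average but only 12 votes, so the cutoff excludes it from both lists.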