
Department of Computer engineering

TSSM BSCOER
Project
Report On

Movie Recommendation System

Karan Devdas Parge

Roll No:12

GUIDED BY:
Prof. A. D. Gujar

Movie Recommendation System

PROBLEM STATEMENT
Develop a movie recommendation model using the scikit-learn library in Python.

OBJECTIVE
The objective of this recommendation system is to provide satisfactory movie
recommendations to users while keeping the system user-friendly, i.e. by taking minimal
input from users. It recommends movies based on the metadata of the movies and past user
ratings.

TECHNOLOGY USED

Machine Learning Library:

• pandas
• numpy
• difflib
• AST
• scikit-learn

Requirements:

• Python 3.6

THEORY

1. What is scikit-learn?

Scikit-learn is a free machine learning library for Python. It supports both supervised and
unsupervised machine learning, providing diverse algorithms for classification, regression,
clustering, and dimensionality reduction. It is released under a permissive simplified BSD
license and is packaged by many Linux distributions, which encourages academic and
commercial use.

The library is built upon SciPy (Scientific Python), which must be installed before you can
use scikit-learn. The SciPy stack includes:

• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console
• Sympy: Symbolic mathematics
• Pandas: Data structures and analysis

Extensions or modules for SciPy are conventionally named SciKits. As such, the module that
provides learning algorithms is named scikit-learn.
The vision for the library is a level of robustness and support required for use in production
systems. This means a deep focus on concerns such as ease of use, code quality,
collaboration, documentation and performance.

Although the interface is Python, C libraries are leveraged for performance, such as NumPy for
array and matrix operations.
It was originally called scikits.learn and was initially developed by David Cournapeau as a
Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael Varoquaux,
Alexandre Gramfort, and Vincent Michel, from INRIA (French Institute for Research in
Computer Science and Automation), took the project to another level and made the first
public release (v0.1 beta) on 1 February 2010.

1.1 FEATURES:
The library is focused on modelling data. It is not focused on loading, manipulating and
summarizing data; for these features, refer to NumPy and Pandas. Some popular groups of
models provided by scikit-learn include:

• Clustering: for grouping unlabelled data, such as K-Means.
• Cross Validation: for estimating the performance of supervised models on unseen data.
• Datasets: for test datasets and for generating datasets with specific properties for
investigating model behaviour.
• Dimensionality Reduction: for reducing the number of attributes in data for
summarization, visualization and feature selection such as Principal component
analysis.
• Ensemble methods: for combining the predictions of multiple supervised models.
• Feature extraction: for defining attributes in image and text data.
• Feature selection: for identifying meaningful attributes from which to create
supervised models.
• Parameter Tuning: for getting the most out of supervised models.
• Manifold Learning: for summarizing and depicting complex multi-dimensional data.
• Supervised Models: a vast array, including but not limited to generalized linear models,
discriminant analysis, naive Bayes, lazy methods, neural networks, support vector
machines and decision trees.

2. What is a Recommendation System?

Simply put, a recommendation system is a filtration program whose prime goal is to predict
the “rating” or “preference” of a user towards a domain-specific item or items. In our case, this
domain-specific item is a movie, so the main focus of our recommendation system is to
filter and predict only those movies which a user would prefer, given some data about that
user.

2.1. Recommendation System Mechanism:

The engine of the recommendation system filters the data via different machine learning
algorithms and, based on that filtering, predicts the most relevant entities to
recommend. After studying the previous behaviour of the users, it recommends
products/services that the user may be interested in.

The working of a recommendation engine is divided into these 3 steps:

2.1.1. Data Collection

The techniques that can be used to collect data are:

1. Explicit, where data are provided intentionally as information (e.g. user input
such as movie ratings).

2. Implicit, where data are not provided intentionally but gathered from available data
streams (e.g. search history, clicks, order history, etc.).

2.1.2 Data Storage

The data can be stored in cloud storage, such as an SQL database, a NoSQL database, or some
other kind of object storage; the choice depends on the type and amount of data. The
more data the storage can hold for the model, the better the recommendation system can
be.

3. What are the different filtration strategies?


3.1. Content-based Filtering:

This filtration strategy is based on the data provided about the items. The algorithm
recommends products that are similar to the ones that a user has liked in the past. This
similarity (generally cosine similarity) is computed from the data we have about the items as
well as the user’s past preferences.

For example, if a user likes a movie such as ‘The Prestige’, then we can recommend other
movies starring ‘Christian Bale’, movies in the ‘Thriller’ genre, or even movies directed
by ‘Christopher Nolan’. Here the recommendation system checks the past preferences of the
user and finds the film ‘The Prestige’. It then looks for similar movies using the
information available in the database, such as the lead actors, the director, the genre
of the film, the production house, etc.
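
As a rough illustration of the idea, the sketch below computes cosine similarity between
word-count vectors in plain Python, so the arithmetic is visible. The metadata strings and the
helper names cosine() and most_similar() are assumptions made for this example; they are not
taken from the project dataset or code.

```python
import math
from collections import Counter

# Toy metadata strings (illustrative only, not from the project dataset).
movies = {
    "The Prestige": "christian bale thriller christopher nolan",
    "The Dark Knight": "christian bale action christopher nolan",
    "Notting Hill": "hugh grant romance roger michell",
}

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# One count vector per movie: word -> number of occurrences.
vectors = {title: Counter(text.split()) for title, text in movies.items()}

def most_similar(liked):
    """Rank every other movie by its similarity to the liked one."""
    scores = [(t, cosine(vectors[liked], v)) for t, v in vectors.items() if t != liked]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("The Prestige")
print(ranking)
```

In this toy data, ‘The Dark Knight’ shares four of five words with ‘The Prestige’ and so ranks
first, while ‘Notting Hill’ shares none and scores zero.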

Disadvantages:

1. Different products do not get much exposure to the user.

2. Businesses cannot be expanded as the user does not try different types of products.

3.2. Collaborative Filtering:

This filtration strategy is based on comparing and contrasting the user’s behaviour with
other users’ behaviour in the database. The history of all users plays an
important role in this algorithm. The main difference between content-based filtering and
collaborative filtering is that in the latter, the interaction of all users with the items
influences the recommendation algorithm, while for content-based filtering only the concerned
user’s data is taken into account. There are multiple ways to implement collaborative
filtering, but the main concept to grasp is that multiple users’ data influences
the outcome of the recommendation; the model does not depend on only one user’s data.

There are 2 types of collaborative filtering algorithms:

3.2.1. User-based Collaborative filtering:

The basic idea here is to find users whose past preference patterns are similar to those of
the user ‘A’, and then to recommend to ‘A’ items liked by those similar users which ‘A’ has
not encountered yet. This is achieved by building a matrix of the items each user has
rated/viewed/liked/clicked, depending on the task at hand, then computing the
similarity score between users, and finally recommending items that the concerned user
is not aware of but that similar users know and liked. For example, if the user ‘A’ likes
‘Batman Begins’, ‘Justice League’ and ‘The Avengers’ while the user ‘B’ likes ‘Batman
Begins’, ‘Justice League’ and ‘Thor’, then they have similar interests, because
these movies all belong to the super-hero genre. So there is a high probability that the
user ‘A’ would like ‘Thor’ and the user ‘B’ would like ‘The Avengers’.
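
The example with users ‘A’ and ‘B’ can be sketched in a few lines of Python. This is a minimal
sketch, assuming binary "liked" data and cosine similarity between users; user ‘C’ and the
helper names user_similarity() and recommend() are hypothetical additions for illustration.

```python
import math

# Liked-movie sets per user. A and B follow the example in the text;
# user C is a hypothetical addition for contrast.
likes = {
    "A": {"Batman Begins", "Justice League", "The Avengers"},
    "B": {"Batman Begins", "Justice League", "Thor"},
    "C": {"Notting Hill", "Titanic"},
}

def user_similarity(u, v):
    """Cosine similarity between two users' binary like-vectors."""
    shared = len(likes[u] & likes[v])
    return shared / math.sqrt(len(likes[u]) * len(likes[v]))

def recommend(user):
    """Suggest unseen movies liked by the single most similar other user."""
    others = [v for v in likes if v != user]
    nearest = max(others, key=lambda v: user_similarity(user, v))
    return sorted(likes[nearest] - likes[user])

print(recommend("A"))  # ['Thor']
print(recommend("B"))  # ['The Avengers']
```

A practical system would usually combine several neighbours and weight them by similarity;
using only the nearest user keeps the sketch short.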

Disadvantages:

1. People are fickle-minded, i.e. their tastes change from time to time. As this algorithm
is based on user similarity, it may pick up initial similarity patterns between 2 users
who after a while have completely different preferences.
2. There are many more users than items, so the matrices become very large, are difficult
to maintain, and need to be recomputed very regularly.
3. This algorithm is very susceptible to shilling attacks, where fake user profiles
consisting of biased preference patterns are used to manipulate key decisions.

3.2.2. Item-based Collaborative Filtering:

The concept in this case is to find similar movies instead of similar users, and then to
recommend movies similar to those that ‘A’ has in his/her past preferences. This is done
by finding every pair of items that were rated/viewed/liked/clicked by the same user, then
measuring the similarity of those items across all users who
rated/viewed/liked/clicked both, and finally recommending them based on similarity scores.

Here, for example, we take 2 movies ‘A’ and ‘B’ and check their ratings by all users who
have rated both, and we use the similarity of these ratings to find similar movies. So if
most common users have rated ‘A’ and ‘B’ similarly, it is highly probable that ‘A’ and ‘B’
are similar; therefore, if someone has watched and liked ‘A’ they should be recommended ‘B’,
and vice versa.
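
A minimal sketch of this item-to-item comparison follows, assuming cosine similarity over the
ratings of users who rated both movies. The ratings table and the function name
item_similarity() are hypothetical and exist only for this example.

```python
import math

# Hypothetical ratings (user -> {movie: rating on a 1-5 scale}).
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 4, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}

def item_similarity(x, y):
    """Cosine similarity of two movies over the users who rated both."""
    common = [u for u in ratings if x in ratings[u] and y in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][x] * ratings[u][y] for u in common)
    norm_x = math.sqrt(sum(ratings[u][x] ** 2 for u in common))
    norm_y = math.sqrt(sum(ratings[u][y] ** 2 for u in common))
    return dot / (norm_x * norm_y)

sim_ab = item_similarity("A", "B")
sim_ac = item_similarity("A", "C")
print(round(sim_ab, 3), round(sim_ac, 3))
print(sim_ab > sim_ac)  # True: A and B were rated alike, A and C were not
```

Cosine similarity on raw ratings tends to be fairly high for any pair of movies, so practical
systems often mean-centre the ratings first; that refinement is left out here.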

Advantages over User-based Collaborative Filtering:

1. Unlike people’s taste, movies don’t change.

2. There are usually far fewer items than people, so the matrices are easier to maintain
and compute.
3. Shilling attacks are much harder because items cannot be faked.

4. Data Description:

A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds
to one or more database tables, where every column of a table represents a particular variable,
and each row corresponds to a given record of the data set in question. The data set lists
values for each of the variables, such as the height and weight of an object, for each
member of the data set. Data sets can also consist of a collection of documents or files. In the
open data discipline, the data set is the unit used to measure the information released in a
public open data repository. The European Open Data portal aggregates more than half a million
data sets.[2] Some other issues (real-time data sources,[3] non-relational data sets, etc.)
increase the difficulty of reaching a consensus about it.
This dataset contains 26 million ratings from 270,000 users for all 45,000 movies listed in the
Full MovieLens Dataset. The dataset consists of movies released on or before July 2017.
Data points include cast, crew, plot keywords, budget, revenue, posters, release dates,
languages, production companies, countries, TMDB vote counts and vote averages.

5. Building a Movie Recommendation System:

The approach to build the movie recommendation engine consists of the following:

1. Perform Exploratory Data Analysis (EDA) on the data.

2. Build the recommendation system.
3. Get recommendations.

• After downloading the dataset, we need to import all the required libraries and then
read the csv file using the read_csv() method.
• If you inspect the dataset, you will see that it holds a lot of extra information about each
movie. We don’t need all of it, so we choose the keywords, cast, genres and director columns
to use as our feature set (the so-called “content” of the movie).
• Next, we define a function, combine_features(), that joins the values of these columns
into a single string.
• Now, we need to call this function over each row of our dataframe. But before doing
that, we need to clean and preprocess the data for our use.
• We will fill all the NaN values in the dataframe with blank strings. Once we have
obtained the combined strings, we can feed them to a CountVectorizer()
object to get the count matrix.
• At this point, 60% of the work is done. Now, we need to obtain the cosine similarity
matrix from the count matrix.
• Now, we will define two helper functions to get the movie title from the movie index and
vice versa.
• Our next step is to get the title of the movie that the user currently likes. Then we
will find the index of that movie.
• After that, we will access the row corresponding to this movie in the similarity matrix.
• Thus, we will get the similarity scores of all other movies with respect to the current
movie. Then we will enumerate through all the similarity scores of that movie to
make tuples of movie index and similarity score.
• This will convert a row of similarity scores like [1 0.5 0.2 0.9] into [(0, 1) (1,
0.5) (2, 0.2) (3, 0.9)]. Here, each item has the form (movie index, similarity
score). Now comes the most vital point.
• We will sort the list similar_movies according to similarity scores in descending
order. Since the most similar movie to a given movie will be itself, we will discard
the first element after sorting the movies.
• Now, we will run a loop to print the first 5 entries from the sorted_similar_movies list.
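
The enumerate-and-sort steps above can be tried on the example row [1 0.5 0.2 0.9] on its own.
The sketch below assumes the row belongs to the movie at index 0; the variable movie_index and
the by_index variation are illustrative additions and not part of the project code.

```python
# One row of a similarity matrix, as in the worked example in the steps above.
row = [1, 0.5, 0.2, 0.9]
movie_index = 0  # assumed index of the movie this row belongs to

# Pair each score with its movie index.
similar_movies = list(enumerate(row))

# Sort by score, highest first, then drop the first entry (the movie itself).
sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]

# Variation: exclude the movie by its index rather than by its sorted position,
# which does not rely on the movie itself sorting first when scores are tied.
by_index = sorted(
    ((i, s) for i, s in similar_movies if i != movie_index),
    key=lambda x: x[1],
    reverse=True,
)

print(similar_movies)         # [(0, 1), (1, 0.5), (2, 0.2), (3, 0.9)]
print(sorted_similar_movies)  # [(3, 0.9), (1, 0.5), (2, 0.2)]
print(by_index)               # [(3, 0.9), (1, 0.5), (2, 0.2)]
```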

INPUT
Here we use the movie_dataset.csv file.
The code goes as follows:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset and choose the metadata columns used as features.
df = pd.read_csv("movie_dataset.csv")
features = ['keywords', 'cast', 'genres', 'director']

def combine_features(row):
    # Join the selected columns into one space-separated string.
    return row['keywords'] + " " + row['cast'] + " " + row["genres"] + " " + row["director"]

# Replace missing values with empty strings so the concatenation does not fail.
for feature in features:
    df[feature] = df[feature].fillna('')
df["combined_features"] = df.apply(combine_features, axis=1)

# Build the count matrix, then the pairwise cosine similarity matrix.
cv = CountVectorizer()
count_matrix = cv.fit_transform(df["combined_features"])
cosine_sim = cosine_similarity(count_matrix)

def get_title_from_index(index):
    return df[df.index == index]["title"].values[0]

def get_index_from_title(title):
    return df[df.title == title]["index"].values[0]

movie_user_likes = "Avatar"
movie_index = get_index_from_title(movie_user_likes)
similar_movies = list(enumerate(cosine_sim[movie_index]))
# Sort by similarity, highest first, and drop the first entry (the movie itself).
sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]

print("Top 5 similar movies to " + movie_user_likes + " are:\n")
for element in sorted_similar_movies[:5]:
    print(get_title_from_index(element[0]))
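
The technology list mentions difflib, which the listing above does not use. One plausible use,
assumed here rather than taken from the report, is to match a mistyped title to the closest
known title before looking up its index, since get_index_from_title() raises an IndexError
when the title is not found. The title list and the helper closest_title() are hypothetical.

```python
import difflib

# A few known titles (illustrative; in the project these would come from df["title"]).
titles = ["Avatar", "Aliens", "Guardians of the Galaxy", "Star Trek Beyond"]

def closest_title(query, candidates):
    """Return the best fuzzy match for query, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(closest_title("Avatr", titles))  # Avatar
print(closest_title("zzzz", titles))   # None
```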

OUTPUT
Top 5 similar movies to Avatar are:

Guardians of the Galaxy
Aliens
Star Wars: Clone Wars: Volume 1
Star Trek Into Darkness
Star Trek Beyond

CONCLUSION

Recommendation systems have become an important part of everyone’s lives. With the
enormous number of movies released worldwide every year, people often miss out on some
amazing works of art for lack of the right suggestion. Putting machine-learning-based
recommendation systems to work is thus very important for getting the right recommendations.
We saw that content-based recommendation systems, although they may not seem very effective
on their own, can, when combined with collaborative techniques, solve the cold start problems
that collaborative filtering methods face when run independently.
