Netflix Movie Recommendation System

The document discusses a project report on a Netflix movie recommendation system. It describes the relevance and objectives of the project which are to improve the quality, accuracy and scalability of movie recommendation systems. It outlines the methodology, literature survey, system requirements, analysis, design, implementation and testing of the hybrid recommendation system.

Uploaded by

yashwardhan6031

A

Project I Report
On
“NETFLIX MOVIE RECOMMENDATION SYSTEM”

Submitted for Partial Fulfillment of the Award of


Bachelor of Technology (B.Tech) in CSE
Kurukshetra University Kurukshetra

Submitted By: Ashutosh Panwar (1220188), 4 CSE-A
Submitted To: Dr. Monika (A.P, Deptt. of CSE)

Department of Computer Science and Engineering
Seth Jai Parkash Mukand Lal Institute of Engineering & Technology
Affiliated to Kurukshetra University Kurukshetra

Declaration

We hereby certify that the work which is being presented in the Project I Report entitled,
“NETFLIX MOVIE RECOMMENDATION SYSTEM” by us, Ashutosh Panwar(1220188)
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science Engineering submitted in the Department of Computer Science and
Engineering at JMIT Radaur (Affiliated to Kurukshetra University, Kurukshetra,
Haryana (India)) is an authentic record of our own work carried out under the supervision of
Dr. Monika. The matter presented in the report has not been submitted to any other
University/Institute for the award of any degree.

Ashutosh Panwar

This is to certify that the above statement made by the candidate is correct to the best of
my/our knowledge.

Dr. Monika
A.P, Department of CSE. JMIT Radaur

Countersigned By: Dr. Gaurav Sharma (H.O.D)


Acknowledgement

The writing of this project report has been assisted by the generous help of many people. We feel
that we were very fortunate to receive assistance from them. We wish to express our sincere
appreciation to them.

First and foremost, we are indebted to our principal supervisor, Dr. Monika (A.P, Department of
Computer Science and Engineering) of JMIT Radaur, who has been very supportive at every
stage of our project completion. We wish to express our utmost gratitude to her for her
invaluable advice and patience in reading, correcting, and commenting on the drafts of this report
and, more importantly, for the generosity which we have received from her throughout our project
completion.

We wish to express our thanks to all staff members of JMIT Radaur, who also helped us in
conducting this study.

Finally, we are particularly indebted to our dearest parents/guardians as without their generous
assistance and love; this project could never have been completed.

Ashutosh Panwar
(1220188)

Abstract

In a busy world, entertainment helps each of us refresh our mood and energy, and lets us return
to work with more confidence and enthusiasm. To revitalize ourselves, we can listen to music we
like or watch movies of our choice. Searching for suitable movies online takes time that many
users cannot afford, so a reliable movie recommendation system is valuable. In this report, to
improve the quality of a movie recommendation system, we present a hybrid approach that
combines content-based filtering and collaborative filtering, using a Support Vector Machine as a
classifier together with a genetic algorithm. Comparative results on three different datasets show
that the proposed approach improves the accuracy, quality, and scalability of the movie
recommendation system over the pure approaches. The hybrid recommendation system combines
content-based and collaborative filtering algorithms to predict the user's interest in movies.

Table of Contents

Declaration
Acknowledgement
Abstract
Table of Contents
List of Figures

Chapter 1. INTRODUCTION
I. Relevance of the Project
II. Problem Statement
III. Objective
IV. Scope of the Project
V. Methodology

Chapter 2. LITERATURE SURVEY
I. k-means and k-nearest

Chapter 3. SYSTEM REQUIREMENTS SPECIFICATION
I. Hardware Requirements
II. Software Specification
III. Software Requirements

Chapter 4. SYSTEM ANALYSIS AND DESIGN
I. System Architecture
II. Activity Diagram
III. Flowchart

Chapter 5. IMPLEMENTATION
I. Cosine similarity
II. Singular Value Decomposition (SVD)
III. Experimental Setup
IV. Front-End/Back-End implementation details

Chapter 6. RESULTS AND DISCUSSION
I. Screenshots

Chapter 7. Testing
Chapter 8. Future Scope & Limitations
Chapter 9. Conclusion
Chapter 10. Bibliography
Chapter 11. Appendix
Chapter 12. Plagiarism report

List of figures

Figure Number    Description

4.1      System Architecture
4.2      Data Flow Diagram
4.3      Activity Diagram
5.1      Front-End code snippet
5.2      Back-End code snippet
6.1      Comparison between the three approaches
6.2      User id window
6.3      List of recommended movies
7.1      Performance Analysis
9.1      Project Meeting Log Sheet
9.2.1    Home Page
9.2.2    Recommendation Page
9.2.3    Movie Page (a)
9.2.4    Movie Page (b)
9.2.5    Netflix Page

Chapter 1
Introduction

A recommendation system, or recommendation engine, is an information-filtering model that
tries to predict a user's preferences and provides suggestions based on them. These systems
have become increasingly popular and are widely used in areas such as movies, music, books,
videos, clothing, restaurants, food, places and other utilities. They collect information about a
user's preferences and behaviour, and then use this information to improve their future
suggestions.

Movies are part and parcel of life. There are different types of movies: some for entertainment,
some for educational purposes, animated movies for children, horror movies, action films and
so on. Movies can be easily differentiated by genre, such as comedy, thriller, animation or
action. They can also be distinguished by release year, language, director etc. When watching
movies online, there is a very large number of movies to search through to find the ones we like
most. A movie recommendation system helps us find our preferred movies among all of these
different types and hence reduces the time spent searching for them. The movie recommendation
system should therefore be very reliable and should recommend movies that match our
preferences exactly or as closely as possible.

A large number of companies use recommendation systems to increase user interaction and
enrich the user's shopping experience. Recommendation systems have several benefits, the most
important being customer satisfaction and revenue. A movie recommendation system is a
powerful and important system. However, due to the problems associated with the pure
collaborative approach, movie recommendation systems also suffer from poor recommendation
quality and scalability issues.

1.II Problem Statement:

● The goal of the project is to recommend movies to the user, that is, to provide users of
online service providers with related content out of a collection of relevant and irrelevant
items.

1.III Objectives of the Project

● Improving the accuracy of the recommendation system.
● Improving the quality of the movie recommendation system.
● Improving the scalability.
● Enhancing the user experience.

1.IV Scope of the Project

● The objective of this project is to provide accurate movie recommendations to users. The
goal is to improve the accuracy, quality and scalability of the movie recommendation
system over the pure approaches. This is done using a hybrid approach that combines
content-based filtering and collaborative filtering. Recommendation systems are used as
information filtering tools in social networking sites to reduce data overload. Hence, there
is a huge scope for exploration in this field to improve the scalability, accuracy and quality
of movie recommendation systems, which, when built on a pure collaborative approach,
suffer from poor recommendation quality and scalability issues.

1.V Methodology for Movie Recommendation

1. The hybrid approach proposed an integrative method that merges the fuzzy k-means
clustering method with a genetic-algorithm-based weighted similarity measure to construct a
movie recommendation system. The proposed movie recommendation system gives finer
similarity metrics and better quality than the existing movie recommendation system, but it
takes more computation time. This problem can be addressed by taking the clustered data
points as the input dataset.
2. The proposed approach aims to improve the scalability and quality of the movie
recommendation system. We use a hybrid approach that unifies content-based filtering and
collaborative filtering, so that each approach can benefit from the other. To compute the
similarity between the movies in the given dataset efficiently, and to reduce the computation
time of the movie recommender engine, we use the cosine similarity measure.

Agile Methodology:

1. Collecting the data sets: All the required data sets are collected from the Kaggle website.
This project requires movies.csv, ratings.csv and users.csv.
2. Data analysis: We make sure that the collected data sets are correct and analyse the data
in the CSV files, i.e. check whether all the column fields are present in the data sets.
3. Algorithms: Two algorithms, cosine similarity and singular value decomposition, are used
to build the machine learning recommendation model.
4. Training and testing the model: Once the implementation of the algorithms is complete,
the model has to be trained to get results. We tested it several times, and the model
recommends different sets of movies to different users.
5. Improvements in the project: At a later stage, different algorithms and methods can be
implemented for better recommendations.
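
The data analysis step (checking that the expected column fields are present) could be sketched as follows. This is only an illustration: the column names in REQUIRED_COLUMNS are assumptions in the style of common movie rating data sets, not the confirmed layout of the Kaggle files, and small in-memory strings stand in for the real CSV files.

```python
import csv
import io

# Hypothetical column layout; the real Kaggle files may use other names.
REQUIRED_COLUMNS = {
    "movies.csv": {"movieId", "title", "genres"},
    "ratings.csv": {"userId", "movieId", "rating"},
}


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    present = set(reader.fieldnames or [])
    return sorted(required - present)


# In-memory stand-ins for the real files, so the sketch runs on its own.
movies_ok = io.StringIO("movieId,title,genres\n1,Toy Story,Animation\n")
ratings_bad = io.StringIO("userId,movieId\n1,1\n")

movies_missing = missing_columns(movies_ok, REQUIRED_COLUMNS["movies.csv"])
ratings_missing = missing_columns(ratings_bad, REQUIRED_COLUMNS["ratings.csv"])
print(movies_missing)
print(ratings_missing)
```

With real data, the same function can be called on a file opened with open(path, newline="").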

Chapter 2
Research Design & Methodology

Over the years, many recommendation systems have been developed using either collaborative,
content based or hybrid filtering methods. These systems have been implemented using various
big data and machine learning algorithms.

2.I Movie Recommendation System by K-Means Clustering and K-Nearest Neighbour

A recommendation system collects data about the user's preferences on different items, such as
movies, either implicitly or explicitly. Implicit acquisition uses the user's behaviour while
watching movies. Explicit acquisition, on the other hand, uses the user's previous ratings or
history. Another supporting technique used in the development of recommendation systems is
clustering. Clustering is a process that groups a set of objects in such a way that objects in the
same cluster are more similar to each other than to those in other clusters. K-Means clustering
along with K-Nearest Neighbour is implemented on the MovieLens dataset in order to obtain the
best-optimized result. In the existing technique, the data is scattered, which results in a high
number of clusters, while in the proposed technique the data is gathered, which results in a low
number of clusters. The proposed scheme thus optimizes the process of recommending a movie.
The proposed recommender system predicts the user's preference for a movie on the basis of
different parameters. It works on the concept that people have common preferences or choices,
and that these users influence each other's opinions. The resulting process has a lower RMSE.
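
For reference, RMSE (root mean squared error) over N ratings is the square root of the mean of the squared differences between actual and predicted ratings: RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(r_i - \hat{r}_i)^2}. A minimal sketch of the computation is given below; the rating values are made up for illustration and are not results from this project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings on a 1-5 scale, for illustration only.
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.0, 3.0, 4.0, 4.0]
error = rmse(actual_ratings, predicted_ratings)
```

A lower value means the predicted ratings are closer to the ratings users actually gave.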

2.2 Movie Recommendation System Using Collaborative Filtering:

Collaborative filtering systems analyse the user's behaviour and preferences and predict what
they would like based on similarity with other users. There are two kinds of collaborative
filtering systems: user-based recommenders and item-based recommenders.
● User-based filtering: User-based preferences are very common in the design of
personalized systems. This approach is based on the user's likings. The process starts with
users giving ratings (1-5) to some movies. These ratings can be implicit or explicit.
Explicit ratings are given when the user rates the item on some scale or gives it a
thumbs-up/thumbs-down. Explicit ratings are often hard to gather, as not every user is
interested in providing feedback. In these scenarios, we gather implicit ratings based on
the user's behaviour. For instance, if a user buys a product more than once, it indicates a
positive preference. In the context of movie systems, we can infer that a user who watches
an entire movie likes it to some degree. Note that there are no clear rules for determining
implicit ratings. Next, for each user, we find a defined number of nearest neighbours. We
calculate the correlation between users' ratings using the Pearson correlation coefficient.
● Item-based filtering: Unlike the user-based filtering method, the item-based method
focuses on the similarity between the items that users like instead of the users themselves.
The most similar items are computed ahead of time. Then, for recommendation, the items
that are most similar to the target item are recommended to the user.
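
The user-based step (Pearson correlation followed by nearest-neighbour selection) might look roughly like the sketch below. The user ids, movie ids and ratings are toy values, and the function names pearson and nearest_neighbours are chosen here for illustration; the report does not show its own implementation.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the movies both users have rated.

    Each argument maps a movie id to a rating. Returns 0.0 when fewer than two
    movies are shared or when either user's shared ratings have no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[m] for m in common]
    ys = [ratings_b[m] for m in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def nearest_neighbours(target, all_ratings, k):
    """Return the k user ids most correlated with `target`, best first."""
    scores = [
        (pearson(all_ratings[target], other_ratings), user)
        for user, other_ratings in all_ratings.items()
        if user != target
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in scores[:k]]


# Toy data (hypothetical user and movie ids, ratings on the 1-5 scale).
ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 1},
    "u2": {"m1": 4, "m2": 3, "m3": 2},
    "u3": {"m1": 1, "m2": 3, "m3": 5},
}
neighbours = nearest_neighbours("u1", ratings, k=1)
```

Here u2 rates movies in the same order of preference as u1 and is picked as the nearest neighbour, while u3 has the opposite taste and gets a negative correlation.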

4. Data Analysis:
● The collected data from blogs, existing models, focus groups, user testing, and market
research were subjected to rigorous analysis.
● Qualitative data from blogs, focus groups, and user testing sessions were transcribed,
coded, and thematically analyzed to identify key insights and patterns.
● The data analysis provided a comprehensive understanding of user needs, preferences,
pain points, and market trends, which formed the foundation for decision-making in the
system design and development process.
● By employing these primary research techniques and conducting detailed analysis, a
comprehensive and data-driven understanding of user needs, market dynamics, and
industry trends was obtained. The findings from primary research informed key decisions
in the initial phase of system development for Voyance, ensuring that the proposed
system addresses user requirements, aligns with industry best practices, and delivers an
enhanced user experience.

Chapter 3
SYSTEM REQUIREMENTS SPECIFICATION

This chapter involves both the hardware and software requirements needed for the project and
detailed explanation of the specifications.

3.1 Hardware Requirements

● A PC with Windows/Linux OS
● Processor with 1.7-2.4 GHz speed
● Minimum of 8 GB RAM
● 2 GB graphics card

3.2 Software Specification


● Text Editor (VS-code/WebStorm)
● Anaconda distribution package (PyCharm Editor)
● Python libraries

3.3 Software Requirements


3.3.1 Anaconda distribution:
Anaconda is a free and open-source distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.) that aims to simplify
package management and deployment. Package versions are managed
by the package management system conda. The Anaconda distribution includes
data-science packages suitable for Windows, Linux and macOS.

3.3.3 Python libraries:


For the computation and analysis we need certain Python libraries that are
used to perform analytics. Packages such as SKlearn, NumPy, pandas,
Matplotlib, the Flask framework, etc. are needed.

● SKlearn: It features various classification, regression and clustering
algorithms including support vector machines, random forests, gradient
boosting, k-means and DBSCAN, and is designed to interoperate with the
Python numerical and scientific libraries NumPy and SciPy.
● NumPy: NumPy is a general-purpose array-processing package. It
provides a high-performance multidimensional array object, and tools for
working with these arrays. It is the fundamental package for scientific
computing with Python.
● Pandas: Pandas is one of the most widely used Python libraries in data
science. It provides high-performance, easy-to-use data structures and
data analysis tools. Unlike the NumPy library, which provides objects for
multi-dimensional arrays, Pandas provides an in-memory 2D table object
called DataFrame.
● Flask: It is a lightweight WSGI web application framework. It is designed
to make getting started quick and easy, with the ability to scale up to
complex applications. It began as a simple wrapper around Werkzeug.
Chapter 4
SYSTEM ANALYSIS AND DESIGN

4.1 System Architecture of Proposed System:

Fig:-4.1 System Architecture

A different list of movies is recommended to each individual user. When the user logs
in or enters the user id, each of the two approaches used in the project recommends a
set of movies for that particular user. The hybrid model then combines both sets and
recommends a single list of movies to the user.

4.3 Dataflow:

Fig:-4.2 Data Flow Diagram

First, the data sets required to build the model are loaded. The data sets required in this
project are movies.csv, ratings.csv and users.csv, all of which are available on Kaggle.com.
Two models are built in this project, content-based and collaborative filtering. Each produces
a list of movies for a particular user. By combining both lists based on the user id, a single
final list of movies is recommended to that user.
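
The report does not state how the two lists are merged. One plausible way, shown here purely as an assumption, is to blend the scores from the two models with a weight and rank the movies by the blended score. The function name hybrid_recommend, the weight parameter and all scores and titles below are hypothetical.

```python
def hybrid_recommend(content_scores, collab_scores, weight=0.5, top_n=3):
    """Blend two {movie: score} dicts into one ranked list of movie titles.

    `weight` is the share given to the content-based score; the rest goes to
    the collaborative score. A movie missing from one model scores 0 there.
    """
    movies = set(content_scores) | set(collab_scores)
    blended = {
        movie: weight * content_scores.get(movie, 0.0)
        + (1.0 - weight) * collab_scores.get(movie, 0.0)
        for movie in movies
    }
    # Sort by blended score (highest first), then by title for stable ties.
    ranked = sorted(blended, key=lambda movie: (-blended[movie], movie))
    return ranked[:top_n]


# Made-up scores in [0, 1] for one user; titles are placeholders.
content = {"Movie A": 0.9, "Movie B": 0.4, "Movie C": 0.7}
collab = {"Movie B": 0.8, "Movie C": 0.6, "Movie D": 1.0}
final_list = hybrid_recommend(content, collab)
```

Movies that both models score reasonably well rise to the top, while a movie favoured by only one model can still appear lower in the list.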
Activity Diagram:

Fig:-4.3 Activity diagram

Once the user logs in by entering a user id present in the CSV file (ids range from 1 to 5000),
the list of movies is recommended to the user.

Chapter 5
IMPLEMENTATION

The proposed system makes use of different algorithms and methods for the implementation
of the hybrid approach.

5.1 Cosine Similarity:


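Cosine similarity is a measure of similarity between two non-zero vectors of an inner product
space that measures the cosine of the angle between them.
Formula:

\[ \cos(\theta) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}} \]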
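
As a minimal sketch, the measure can be computed as the dot product of the two vectors divided by the product of their lengths. The genre-flag encoding of the movies below is an assumption made for illustration; the report does not specify which movie features are compared.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


# Hypothetical genre flags in the order (action, comedy, thriller).
movie_a = [1, 0, 1]
movie_b = [1, 0, 0]
movie_c = [0, 1, 0]
sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)
```

Movies a and b share one genre and get a similarity of about 0.71, while a and c share none and get 0.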

5.2 Singular Value Decomposition (SVD):


Let A be an n × d matrix with right singular vectors v1, v2, . . . , vr and corresponding singular
values σ1, σ2, . . . , σr. Then ui = (1/σi) A vi, for i = 1, 2, . . . , r, are the left singular vectors,
and A can be decomposed into a sum of rank-one matrices.

Formula:

\[ A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T} \]

The proof rests on a simple lemma stating that two matrices A and B are identical if Av = Bv
for all v. In other words, a matrix A can be viewed in the abstract as a transformation
that maps a vector v onto Av.
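
The decomposition can be checked numerically with NumPy, as in the sketch below. The rating matrix is made up, and the final rank-k truncation is shown only as the common way SVD is applied in recommenders; the report does not say how many singular values its model keeps.

```python
import numpy as np

# Toy user-by-movie rating matrix (made-up values, 4 users x 3 movies).
A = np.array(
    [
        [5.0, 4.0, 1.0],
        [4.0, 5.0, 1.0],
        [1.0, 1.0, 5.0],
        [2.0, 1.0, 4.0],
    ]
)

# Thin SVD: columns of U are the u_i, S holds the sigma_i, rows of Vt are v_i^T.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as the sum of rank-one matrices sigma_i * u_i * v_i^T.
A_rebuilt = sum(S[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(S)))

# Check the relation u_i = (1 / sigma_i) * A * v_i for the first singular triple.
u1_from_formula = A @ Vt[0, :] / S[0]

# Keeping only the k largest singular values gives a low-rank approximation of A.
k = 2
A_rank_k = (U[:, :k] * S[:k]) @ Vt[:k, :]
```

A_rebuilt matches A up to floating-point error, and A_rank_k is a smoothed version of A built from the two strongest patterns in the ratings.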

Experimental requirements:

Code: Front-end

In this project we have used a popular front-end web framework (React.js) to build an
interactive user interface.

Fig:-5.1 Front-End code snippet

In React.js we used the axios npm module to fetch the data from the API generated by
Flask.

Backend: For the backend we used a Flask app to expose an API on localhost. The
front end fetches this API to display the result.

Fig:-5.2 Back-End code snippet

We have developed our machine learning model in Python.

Using Flask, we expose an API that returns the data in JSON format. This data is
retrieved in React using the axios npm module and then displayed.
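
Since the back-end snippet itself is not reproduced here, the following is only a rough sketch of what such a Flask endpoint could look like. The route /api/recommend/<user_id>, the JSON field names and the placeholder data are all assumptions, not the project's actual API.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder results; in the project these would come from the hybrid model.
FAKE_RECOMMENDATIONS = {
    1: ["Movie A", "Movie B"],
    2: ["Movie C"],
}


@app.route("/api/recommend/<int:user_id>")
def recommend(user_id):
    """Return the recommended movies for `user_id` as JSON."""
    movies = FAKE_RECOMMENDATIONS.get(user_id)
    if movies is None:
        return jsonify({"error": "unknown user id"}), 404
    return jsonify({"user_id": user_id, "movies": movies})


# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.get("/api/recommend/1")
payload = response.get_json()
```

To serve the API on localhost, the app can be started with app.run() or the flask command-line tool; the React front end would then request the same URL with axios.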

Chapter 6
RESULTS AND DISCUSSION

6. Future Scope:

Since our project is a movie recommendation system, it can be developed using
content-based filtering, collaborative filtering, or a combination of both.

In our project we have developed a hybrid approach, i.e. a combination of content-based
and collaborative filtering. Both approaches have advantages and disadvantages.
Content-based filtering relies on the user's own ratings or likes, so only movies similar
to those are recommended to the user.
Advantages: It is easy to design and takes less time to compute.

Disadvantages: The model can only make recommendations based on the existing interests
of the user. In other words, the model has limited ability to expand on the users'
existing interests.

In collaborative filtering, the recommendation is based on a comparison with similar users.

Advantages: No domain knowledge is needed because the embeddings are automatically
learned. The model can help users discover new interests. In isolation, the ML system
may not know the user is interested in a given item, but the model might still
recommend it because similar users are interested in that item.

Disadvantages: The prediction of the model for a given (user, item) pair is the dot
product of the corresponding embeddings. So, if an item is not seen during training, the
system can't create an embedding for it and can't query the model with this item. This
issue is often called the cold-start problem.
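
The dot-product prediction and the cold-start case can be illustrated with a small sketch. The two-dimensional embeddings below are invented for the example; a trained model would learn them from rating data.

```python
def predict(user_vec, item_vec):
    """Predicted preference: dot product of a user and an item embedding."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Made-up 2-dimensional embeddings; a trained model would learn these values.
user_embeddings = {"u1": [0.9, 0.1]}
item_embeddings = {"Movie A": [1.0, 0.0], "Movie B": [0.0, 1.0]}


def score(user, item):
    """Return the predicted score, or None for an item unseen in training."""
    if item not in item_embeddings:
        return None  # cold start: no embedding exists for this item
    return predict(user_embeddings[user], item_embeddings[item])


seen = score("u1", "Movie A")
unseen = score("u1", "New Release")
```

The item that was present during training gets a score, while the new release has no embedding and cannot be scored at all.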

The hybrid approach resolves these limitations by combining content-based and
collaborative filtering.

Fig:-6.1 Comparison between the three approaches

The main disadvantage of the hybrid approach is that it requires a large amount of memory.

Screen shot of the result:

Fig:-6.2 user id window

Fig:-6.3 Display of list of recommended movies

Once the name of a movie is entered, the list of recommended movies is displayed.

Chapter 7
TESTING

System testing is a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different purpose, all of them
verify that the system elements have been properly integrated and perform their allocated
functions. Testing is carried out to make sure that the product does exactly what it is
supposed to do. The testing stage tries to achieve the following goals:

● To affirm the quality of the project.

● To find and eliminate any residual errors from previous stages.

● To validate the software as a solution to the original problem.


● To provide operational reliability of the system.

Figure 7.1: Performance Analysis

7. TESTING METHODOLOGIES:

There are many different types of testing methods or techniques used as part of the
software testing methodology. Some of the important testing methodologies are:

Unit Testing:

Unit testing is the first level of testing and is often performed by the developers
themselves. It is the process of ensuring individual components of a piece of software
at the code level are functional and work as they were designed to. Developers in a
test-driven environment will typically write and run the tests prior to the software or
feature being passed over to the test team. Unit testing can be conducted manually, but
automating the process will speed up delivery cycles and expand test coverage. Unit
testing will also make debugging easier because finding issues earlier means they take
less time to fix than if they were discovered later in the testing process. Test Left is a
tool that allows advanced testers and developers to shift left with the fastest test
automation tool embedded in any IDE.
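
As an illustration of unit testing at the code level, the sketch below tests a small cosine similarity helper with Python's built-in unittest module. The helper and the three test cases are examples written for this purpose, not the project's actual test suite.

```python
import math
import unittest


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


class CosineSimilarityTest(unittest.TestCase):
    def test_identical_vectors_score_one(self):
        self.assertAlmostEqual(cosine_similarity([1, 2, 3], [1, 2, 3]), 1.0)

    def test_orthogonal_vectors_score_zero(self):
        self.assertAlmostEqual(cosine_similarity([1, 0], [0, 1]), 0.0)

    def test_scaling_does_not_change_the_score(self):
        self.assertAlmostEqual(cosine_similarity([1, 2], [2, 4]), 1.0)


# Run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CosineSimilarityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test checks one property of the function in isolation, so a failure points directly at the component that broke.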

Integration Testing:
After each unit is thoroughly tested, it is integrated with other units to create modules or
components that are designed to perform specific tasks or activities. These are then tested
as group through integration testing to ensure whole segments of an application behave
as expected (i.e, the interactions between units are seamless). These tests are often
framed by user scenarios, such as logging into an application or opening files. Integrated
tests can be conducted by either developers or independent testers and are usually
comprised of a combination of automated functional and manual tests.

System Testing
System testing is a black box testing method used to evaluate the completed and
integrated system, as a whole, to ensure it meets specified requirements. The
functionality of the software is tested from end-to-end and is typically conducted by
a separate testing team than the development team before the product is pushed into
production.

Chapter 8
FUTURE SCOPE

8.1 Future scope:
The proposed approach considers the genres of movies. In the future, the age of the user can
also be considered, since movie preferences change with age; for example, during childhood
we like animated movies more than other movies. The memory requirements of the proposed
approach also need further work. The proposed approach has been implemented here on
different movie datasets only. In the future, it can also be implemented on the FilmAffinity
and Netflix datasets and its performance measured.

● Use collaborative filtering recommendation. After getting enough user data,
collaborative filtering recommendation will be introduced. As discussed in
Section 2.2, collaborative filtering is based on the social information of users, which
will be analyzed in future research.
● Introduce more precise and proper features of movies.[1] Typical collaborative filtering
recommendation uses the ratings instead of object features. In the future we should extract
features such as color and subtitles from movies, which can provide a more accurate
description of each movie.
● Introduce a user dislike movie list. User data is always useful in recommender systems. In
the future we will collect more user data and add a user dislike movie list. We will feed the
dislike list into the recommender system as well and generate scores that will be
added to the previous result. In this way we can improve the results of the recommender system.

Chapter 9

CONCLUSION

9.1 Conclusion

In this project, to improve the accuracy, quality and scalability of a movie recommendation
system, a hybrid approach that unifies content-based filtering and collaborative filtering,
using Singular Value Decomposition (SVD) and cosine similarity, is presented. The existing
pure approaches and the proposed hybrid approach are implemented on three different movie
datasets and their results are compared. The comparative results show that the proposed
approach improves the accuracy, quality and scalability of the movie recommendation system
over the pure approaches. Also, the computing time of the proposed approach is lower than
that of the two pure approaches.

Chapter 10
Bibliography


● Netflix India – Watch TV Shows Online, Watch Movies Online

● Netflix - Wikipedia

● Architecture of Netflix - Bing images

● Google

● https://github.com

● http://in.youtube.com/

● https://www.learnpython.org/en/Pandas_Basics

● https://en.wikipedia.org/wiki/Recommender_system

● https://www.imdb.com/list/ls063596142/

Chapter 11
Appendix

9.II Screen Print-Outs

Fig 9.2.1 Home Page

Fig 9.2.2 Recommendation Page

Fig 9.2.3 Movie Page (a)

Fig 9.2.4 Movie Page (b)

Fig 9.2.5 Netflix Page

Chapter 12
Plagiarism Report