Mca Project
BANGALORE UNIVERSITY
By
Name: Sneha V (P03CT23S126036)
UNDER THE GUIDANCE OF
Certificate
Certified that the project work entitled “Movie Recommender System”
is a bona fide work carried out by Sneha V (P03CT23S126036) in partial
fulfillment for the award of the Degree of Master of Computer Applications of
Bangalore University, Bangalore, during the year 2024-2025. The project report
has been approved as it satisfies the academic requirements in respect of project
work prescribed for the Master of Computer Applications.
Date:
1.
2.
DECLARATION
We hereby declare that the work which is being presented in the project
entitled “Movie Recommender System”, in partial fulfillment of the
requirements for the award of the Degree of Master of Computer Applications,
submitted in the Department of Computer Science, East West College of
Management, is an authentic record of our own work carried out under the
supervision and guidance of Prof. Harshitha K in the Department of
Computer Science, East West College of Management.
The matter embodied in this project work has not been submitted for the
award of any other degree.
Date: Name: Sneha V
(P03CT23S126036)
Place: Bangalore
ACKNOWLEDGEMENT
“Task successful” makes everyone happy. But the happiness would be gold without
glitter if we did not mention the people who supported us in making it a success.
Success crowns the people who made it a reality, but those whose constant
guidance and encouragement made it possible deserve to be crowned first on
the eve of success.
We express our sincere gratitude to our respected lecturers, Mrs. and Mr., for
enabling us to make liberal use of the laboratory and library facilities, which
helped us a long way in carrying out our project work successfully, and for their
consistent supervision, guidance and co-operation throughout the project. We
would also like to thank them for their constant motivation and valuable help
throughout the project work.
We extend our sincere gratitude to our parents, who encouraged us with
their blessings to complete this project successfully. Finally, we would like to
thank all our friends and all the teaching and non-teaching staff members of the
MCA Department for their timely help, ideas and encouragement, which helped us
throughout the completion of the project.
EAST WEST COLLEGE OF MANAGEMENT
2. INTRODUCTION
In the era of digital streaming platforms and vast movie libraries, users are
often overwhelmed by the sheer volume of available choices. As a result,
personalized recommendation systems have become a critical tool for
enhancing user satisfaction and engagement. These systems guide users toward
movies that align with their preferences, streamlining the decision-making
process and enriching their overall viewing experience.
3. OBJECTIVE
The objective of this research is to design a system that allows users to input a
favorite movie and receive recommendations for similar movies. To achieve
this, the study employs advanced natural language processing (NLP)
techniques, vectorization methods like TF-IDF and count vectorization, and
similarity metrics such as cosine similarity. These tools transform raw metadata
into meaningful numerical representations, allowing the system to calculate and
rank similarities between movies.
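The idea of turning metadata into count vectors and ranking by cosine similarity can be sketched in a few lines. The following is a minimal pure-Python illustration, not the project's implementation (the notebook listing later in this report uses scikit-learn's CountVectorizer). The helper names and the three toy metadata strings are invented for this example.

```python
import math
from collections import Counter


def count_vector(text):
    """Bag-of-words count vector: token -> number of occurrences."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_title, tags_by_title):
    """Return (title, score) pairs for every other movie, best match first."""
    query_vec = count_vector(tags_by_title[query_title])
    scores = [
        (title, cosine_similarity(query_vec, count_vector(tags)))
        for title, tags in tags_by_title.items()
        if title != query_title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical metadata strings, invented for this illustration.
toy_tags = {
    "Space War": "action space alien battle",
    "Alien Planet": "space alien adventure",
    "Love Letters": "romance drama letters",
}
ranking = rank_similar("Space War", toy_tags)
print(ranking)
```

Movies that share more tokens with the query end up with a larger cosine score, so "Alien Planet" is ranked ahead of "Love Letters" for the query "Space War".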
This study is significant because it highlights the effectiveness of content-based
recommendation techniques in scenarios where user interaction data is limited
or unavailable. By leveraging only the metadata of movies, the system becomes
particularly valuable in cold-start situations or for new users with no historical
activity.
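Because only metadata is needed, a title with no ratings can still be described by weighted terms. The sketch below shows one common textbook form of TF-IDF (raw term frequency multiplied by ln(N / df)); it is an illustration under that assumption, and library implementations such as scikit-learn's TfidfVectorizer use a smoothed variant by default. The function name and the toy documents are invented for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Map each document name to a {token: tf-idf weight} dictionary.

    Uses raw term frequency and idf = ln(N / df), one textbook variant.
    """
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each token once per document
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            token: count * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        }
    return vectors


# Hypothetical metadata strings, invented for this illustration.
docs = {
    "Space War": "space alien battle movie",
    "Alien Planet": "space alien planet movie",
    "Love Letters": "romance letters movie",
}
weights = tf_idf_vectors(docs)
print(weights["Space War"])
```

A token that appears in every description ("movie") receives weight zero, while a token unique to one title ("battle") receives the largest weight, which is why TF-IDF tends to emphasise distinguishing terms.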
4. MODULES DESCRIPTION
4.1.1 BENEFITS
1. Sparsity of data: Data sets are often filled with rows and rows of values that
are blank or zero, so finding ways to use the denser, more informative parts of
the data set is critical.
For the k-NN-based model, the underlying dataset ml-100k from the Surprise
Python scikit was used. Surprise is a good choice for learning about
recommendation systems. It is suitable for building and analyzing
recommendation systems that deal with explicit rating data.
2. Libraries/Frameworks:
3. Storage:
Minimum 500 MB free space for dataset storage and project files
SSD recommended for faster read/write operations
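The sparsity problem and the k-NN idea mentioned in item 1 above can be illustrated on a tiny explicit-rating table. This is a hedged pure-Python sketch and does not use the Surprise library or the ml-100k data; the ratings, the function names and the choice of cosine similarity over co-rated items are assumptions made for the example.

```python
import math

# Hypothetical explicit ratings (user -> {item: rating}); invented for illustration.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2},
    "u3": {"B": 5, "D": 1},
    "u4": {"A": 5, "C": 5, "D": 2},
}


def sparsity(ratings):
    """Fraction of the user-item matrix that has no rating."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - filled / (len(ratings) * len(items))


def user_similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings, k=2):
    """Similarity-weighted mean rating of the k nearest users who rated item."""
    neighbours = sorted(
        (
            (user_similarity(ratings[user], other_ratings), other_ratings[item])
            for other, other_ratings in ratings.items()
            if other != user and item in other_ratings
        ),
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in neighbours) / total_weight


print(sparsity(ratings))
print(predict("u2", "C", ratings))
```

With 10 ratings in a 4 x 4 table, 37.5% of the cells are empty. Real rating data is usually far sparser, which is why the number of co-rated items between two users is often too small for a reliable similarity estimate.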
6. SYSTEM ANALYSIS
DATABASE
2. The data flow diagram (DFD) is one of the most important modeling
tools. It is used to model the system components. These components are
the system process, the data used by the process, an external entity that
interacts with the system and the information flows in the system.
3. A DFD shows how information moves through the system and how it is
modified by a series of transformations that are applied as data moves
from input to output. A DFD is also known as a bubble chart and may be used
to represent a system at any level of abstraction.
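import pickle

import requests
import streamlit as st

API_KEY = "5528de46eee10fe55e11d4f1ea5c9bd7"
BASE_URL = "https://api.themoviedb.org/3/movie/"

# Assumed file names: this listing does not show how movies_df and
# similarity are loaded, so adjust the paths to the pickled objects.
movies_df = pickle.load(open("movies.pkl", "rb"))
similarity = pickle.load(open("similarity.pkl", "rb"))

# Create a session to reuse connections
session = requests.Session()


def fetch_poster(movie_id):
    """Return the poster URL for a TMDB movie ID, or a placeholder image."""
    url = f"{BASE_URL}{movie_id}?api_key={API_KEY}"
    try:
        response = session.get(url, timeout=10)
        response.raise_for_status()
        data = response.json()
        poster_path = data.get('poster_path')
        if poster_path:
            return f"https://image.tmdb.org/t/p/w500/{poster_path}"
        return "https://via.placeholder.com/300x450?text=No+Poster+Available"
    except requests.exceptions.RequestException:
        return "https://via.placeholder.com/300x450?text=Error+Fetching+Poster"


def recommend(movie):
    """Return the titles and poster URLs of the five most similar movies."""
    try:
        # Positional index, so it matches the rows of `similarity`
        # even if the DataFrame index has gaps.
        movie_index = (movies_df['title'] == movie).to_numpy().nonzero()[0][0]
        distances = similarity[movie_index]
        similar_movies = sorted(
            enumerate(distances), key=lambda x: x[1], reverse=True
        )[1:6]
        recommended_movies = []
        posters = []
        for i in similar_movies:
            movie_id = movies_df.iloc[i[0]].id
            recommended_movies.append(movies_df.iloc[i[0]].title)
            posters.append(fetch_poster(movie_id))
        return recommended_movies, posters
    except IndexError:
        st.error("Movie not found in the dataset.")
        return [], []
    except Exception as e:
        st.error(f"Unexpected error: {e}")
        return [], []


movies_list = movies_df['title'].values

# Streamlit App
selected_movie_name = st.selectbox("Select a movie:", movies_list)

if st.button("Recommend"):
    names, posters = recommend(selected_movie_name)
    if names:
        st.write("Recommended Movies:")
        cols = st.columns(5)
        for idx, col in enumerate(cols[:len(names)]):
            with col:
                st.text(names[idx])
                st.image(posters[idx], use_container_width=True)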
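The ranking step inside recommend() sorts one row of the similarity matrix and skips the first entry, on the assumption that the selected movie is always its own best match. A small sketch of that step, with the selected movie excluded explicitly instead of by position, is given below. The function name and the sample scores are invented for this illustration.

```python
def top_n_similar(similarity_row, own_index, n=5):
    """Indices of the n highest scores in one row, skipping own_index."""
    ranked = sorted(
        enumerate(similarity_row), key=lambda pair: pair[1], reverse=True
    )
    return [index for index, _ in ranked if index != own_index][:n]


# Hypothetical similarity scores of movie 0 against six movies.
row = [1.0, 0.12, 0.57, 0.0, 0.33, 0.57]
print(top_n_similar(row, own_index=0, n=3))

# If another movie has identical tags, it ties with the selected movie at 1.0.
# A positional slice such as [1:] could then return the selected movie itself.
tied_row = [1.0, 1.0, 0.2]
print(top_n_similar(tied_row, own_index=1, n=2))
```

Excluding the selected index explicitly keeps the result correct when two movies tie at the top score, at the cost of one extra comparison per entry.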
DATA ANALYSIS
In [1]:
import pandas as pd

In [2]:
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
In [3]:
movies.head(1)

Out[3]:
Columns: budget, genres, homepage, id, keywords, original_language,
original_title, overview, popularity, production_companies,
production_countries, release_date, revenue, runtime, spoken_languages,
status, tagline, title, vote_average, vote_count
Row 0:
    budget               237000000
    homepage             http://www.avatarmovie.com/
    id                   19995
    original_language    en
    original_title       Avatar
    popularity           150.437577
    release_date         2009-12-10
    revenue              2787965087
    runtime              162.0
    status               Released
    title                Avatar
    vote_average         7.2
    vote_count           11800

In [4]:
credits.head(1)

Out[4]:
Columns: movie_id, title, cast, crew
Row 0:
    movie_id             19995
    title                Avatar
In [5]:
movies = movies.merge(credits, on='title')

In [7]:
movies.head(1)

Out[7]:
Columns: budget, genres, homepage, id, keywords, original_language,
original_title, overview, popularity, production_companies, ..., runtime,
spoken_languages, status, tagline, title, vote_average, vote_count,
movie_id, cast, crew
Row 0:
    budget               237000000
    homepage             http://www.avatarmovie.com/
    id                   19995
    original_language    en
    original_title       Avatar
    popularity           150.437577
    runtime              162.0
    status               Released
    tagline              Enter the World of Pandora.
    title                Avatar
    vote_average         7.2
    vote_count           11800
    movie_id             19995
In [8]:
# Required columns for analysis [genres, ID, keywords, title, overview, cast, crew]
movies.columns

Out[8]:
Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count', 'movie_id', 'cast', 'crew'],
      dtype='object')

In [37]:
movies = movies[['id','title','overview','genres','keywords','cast','crew']]
In [38]:
# altered data
movies.head()

Out[38]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title                                     overview
0   19995  Avatar
1     285  Pirates of the Caribbean: At World's End
2  206647  Spectre                                   A cryptic message from Bond’s past sends him o...
3   49026
4   49529
In [39]:
movies.isnull().sum()

Out[39]:
id          0
title       0
overview    0
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [40]:
movies.dropna(inplace=True)

In [41]:
movies.duplicated().sum()

Out[41]:
0
In [42]:
movies.iloc[0].genres
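In the TMDB CSV files, columns such as genres, keywords, cast and crew hold a list of dictionaries stored as text, so each cell has to be parsed before the names can be used. A minimal sketch of that parsing step with the standard-library `ast.literal_eval` follows. The function name `extract_names`, the `limit` parameter and the sample string are assumptions made for this illustration.

```python
import ast


def extract_names(raw, limit=None):
    """Parse a stringified list of dicts and return the 'name' values.

    limit, if given, keeps only the first `limit` names.
    """
    records = ast.literal_eval(raw)  # safe parsing of Python literals only
    names = [record["name"] for record in records]
    return names if limit is None else names[:limit]


# Illustrative cell value in the style of the genres column.
sample = (
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
    '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
)
print(extract_names(sample))
print(extract_names(sample, limit=3))
```

`ast.literal_eval` only evaluates literals (strings, numbers, lists, dictionaries and similar), so it is a safer choice than `eval` for text read from a data file.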
In [49]:
import ast

def convert(obj):
    # Parse the stringified list of dictionaries and keep each 'name'
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L

In [52]:
movies['genres'] = movies['genres'].apply(convert)
In [53]:
movies.head()

Out[53]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title    overview
0   19995  Avatar
1     285
2  206647  Spectre  A cryptic message from Bond’s past sends him o...
3   49026           Following the death of District Attorney Harve...
4   49529

In [54]:
movies['keywords'] = movies['keywords'].apply(convert)
In [55]:
movies.head()

Out[55]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title    overview                                           keywords
0   19995  Avatar
1     285                                                              [ocean, drug abuse, exotic island, east india ...
2  206647  Spectre  A cryptic message from Bond’s past sends him o...
3   49026           Following the death of District Attorney Harve...
4   49529

In [56]:
movies['cast']
In [62]:
def convert3(obj):
    # Keep only the first three cast names
    L = []
    counter = 0
    for i in ast.literal_eval(obj):
        if counter != 3:
            L.append(i['name'])
            counter += 1
        else:
            break
    return L

In [63]:
movies['cast'] = movies['cast'].apply(convert3)

In [64]:
movies.head()

Out[64]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title    overview                                           keywords
0   19995  Avatar
1     285                                                              [ocean, drug abuse, exotic island, east india ...
2  206647  Spectre  A cryptic message from Bond’s past sends him o...
3   49026
4   49529
In [78]:
movies['crew'][0]

Out[78]:
['James Cameron']

In [82]:
def fetch_director(obj):
    # Keep only crew members whose job is 'Director'
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

# Applied as movies['crew'] = movies['crew'].apply(fetch_director)
# (cell not shown; implied by Out[78])

In [84]:
movies.head()

Out[84]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title                  overview                                           keywords
0   19995  Avatar
1     285                                                                            [ocean, drug abuse, exotic island, east india ...
2  206647  Spectre                A cryptic message from Bond’s past sends him o...
3   49026  The Dark Knight Rises
4   49529
In [86]:
movies['overview'] = movies['overview'].apply(lambda x: x.split())

In [87]:
movies['overview']

In [88]:
movies.head()

Out[88]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title    keywords
0   19995  Avatar
1     285           [ocean, drug abuse, exotic island, east india ...
2  206647  Spectre
3   49026
4   49529
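In [102]:
# Remove spaces inside each name so that a multi-word name becomes one token
# (statements inferred from the later outputs, e.g. [JamesCameron])
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])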
In [103]:
movies['genres']

Out[103]:
...
4808    [Documentary]

In [104]:
movies['keywords']

Out[104]:
...
4805    []
4807    []

In [105]:
movies.columns

In [106]:
movies['cast']

In [107]:
movies['crew']

Out[107]:
0           [JamesCameron]
1          [GoreVerbinski]
2              [SamMendes]
3       [ChristopherNolan]
4          [AndrewStanton]
               ...
4804     [RobertRodriguez]
4805         [EdwardBurns]
4806          [ScottSmith]
4807          [DanielHsia]
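The crew names above appear without internal spaces (for example JamesCameron). One reason for collapsing names in this way is that a bag-of-words model splits on whitespace, so two different people who share a first name would otherwise contribute the same token. The sketch below illustrates the effect; the helper names are invented for this example.

```python
def collapse_spaces(names):
    """Turn each multi-word name into a single token, e.g. 'SamMendes'."""
    return [name.replace(" ", "") for name in names]


def shared_tokens(a, b):
    """Lower-cased whitespace tokens that two name lists have in common."""
    tokens_a = set(" ".join(a).lower().split())
    tokens_b = set(" ".join(b).lower().split())
    return tokens_a & tokens_b


cast = ["Sam Worthington"]
crew = ["Sam Mendes"]

# Without collapsing, the two different people share the token 'sam'.
print(shared_tokens(cast, crew))

# After collapsing, each person is a distinct token and nothing is shared.
print(shared_tokens(collapse_spaces(cast), collapse_spaces(crew)))
```

The same reasoning applies to multi-word genres and keywords such as "Science Fiction", which becomes the single token ScienceFiction.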
In [108]:
movies.head()

Out[108]:
Columns: id, title, overview, genres, keywords, cast, crew
       id  title    crew
0   19995  Avatar   [JamesCameron]
1     285           [GoreVerbinski]
2  206647  Spectre  [SamMendes]
3   49026           [ChristopherNolan]
4   49529           [AndrewStanton]
In [110]:
# Concatenate the token lists into one 'tags' list per movie
# (statement inferred from the order of fields in Out[122])
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [111]:
movies.head()

Out[111]:
Columns: id, title, overview, genres, keywords, cast, crew, tags
       id  title    crew
0   19995  Avatar   [JamesCameron]
1     285           [GoreVerbinski]
2  206647  Spectre  [SamMendes]
3   49026           [ChristopherNolan]
4   49529           [AndrewStanton]

In [112]:
sorted_df = movies[['id','title','tags']]
In [113]:
sorted_df.head()

Out[113]:
Columns: id, title, tags
       id  title
0   19995  Avatar
1     285
2  206647  Spectre
3   49026
4   49529

In [120]:
# Join each list of tokens into a single space-separated string
# (statement inferred from the string shown in Out[122])
sorted_df['tags'] = sorted_df['tags'].apply(lambda x: " ".join(x))

stderr:
C:\Users\CSC\AppData\Local\Temp\ipykernel_21432\269873301.py:1: SettingWithCopyWarning:

In [121]:
sorted_df.head()

Out[121]:
       id  title    tags
0   19995  Avatar
1     285
2  206647  Spectre  A cryptic message from Bond’s past sends him o...
3   49026           Following the death of District Attorney Harve...
4   49529

In [122]:
sorted_df['tags'][0]

Out[122]:
'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'
In [123]:
sorted_df['tags'] = sorted_df['tags'].apply(lambda x: x.lower())

stderr:
C:\Users\CSC\AppData\Local\Temp\ipykernel_21432\3814400023.py:1: SettingWithCopyWarning:

In [124]:
sorted_df['tags']

Out[124]:
...
4808    ever since the second grade when he first saw ...
In [153]:
import nltk
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')

vectors = cv.fit_transform(sorted_df['tags']).toarray()

In [154]:
vectors[0]

In [155]:
cv.get_feature_names_out()

Out[155]:
array([...], dtype=object)
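The listing computes `vectors`, while the Streamlit script earlier in this report reads a precomputed `similarity` object through pickle; the step that connects the two is not shown. A plausible version of that step, given the cosine-similarity metric named in the objective, is sketched below in pure Python on toy data. The file names `movies.pkl` and `similarity.pkl`, the helper names and the use of a plain list of titles instead of a DataFrame are all assumptions made for this illustration. With scikit-learn, `sklearn.metrics.pairwise.cosine_similarity(vectors)` would produce the matrix directly.

```python
import math
import os
import pickle
import tempfile


def cosine_matrix(vectors):
    """Pairwise cosine similarity for a list of equal-length count vectors."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in vectors]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                matrix[i][j] = dot / (norms[i] * norms[j])
    return matrix


def save_artifacts(titles, matrix, directory):
    """Pickle the titles and the similarity matrix (hypothetical file names)."""
    with open(os.path.join(directory, "movies.pkl"), "wb") as f:
        pickle.dump(titles, f)
    with open(os.path.join(directory, "similarity.pkl"), "wb") as f:
        pickle.dump(matrix, f)


def load_artifacts(directory):
    """Load the two pickled objects written by save_artifacts."""
    with open(os.path.join(directory, "movies.pkl"), "rb") as f:
        titles = pickle.load(f)
    with open(os.path.join(directory, "similarity.pkl"), "rb") as f:
        matrix = pickle.load(f)
    return titles, matrix


# Toy count vectors for three hypothetical movies.
toy_vectors = [[1, 0, 1], [1, 1, 0], [0, 0, 0]]
toy_matrix = cosine_matrix(toy_vectors)

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(["A", "B", "C"], toy_matrix, tmp)
    loaded_titles, loaded_matrix = load_artifacts(tmp)

print(loaded_titles, loaded_matrix[0])
```

Computing the matrix once and storing it keeps the app responsive, since a recommendation then only requires reading one row and sorting it.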
In [156]:
# Stemming
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

def stem(text):
    # Stem every whitespace-separated token and join the result back together
    y = []
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)

In [157]:
# Input line not preserved; the output matches stem() applied to sorted_df['tags'][0]
stem(sorted_df['tags'][0])

Out[157]:
'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'

In [158]:
sorted_df['tags'] = sorted_df['tags'].apply(stem)

stderr:
C:\Users\CSC\AppData\Local\Temp\ipykernel_21432\3626395266.py:1: SettingWithCopyWarning:
  sorted_df['tags']=sorted_df['tags'].apply(stem)

In [159]:
cv.get_feature_names_out()

Out[159]:
array([...], dtype=object)

Notebook metadata: kernel python3 (language: python), Python 3.11.5, nbformat 4.5
Screenshots
CONCLUSION
In the last few decades, recommendation systems have been used, among the
many available solutions, to mitigate the information and cognitive
overload problem by suggesting related and relevant items to users. In this
regard, numerous advances have been made toward high-quality, fine-tuned
recommendation systems. Nevertheless, designers still face several prominent
issues and challenges. Although researchers have been working on these issues
and have devised solutions that resolve them to some extent, much remains to be
done to reach the desired goal. In this research article, we focused on these
prominent issues and challenges, discussed what has been done to mitigate
them, and outlined what still needs to be done in the form of research
opportunities and guidelines for coping with problems such as latency,
sparsity, context-awareness, grey sheep and cold start.
BIBLIOGRAPHY
Dataset Source:
Kaggle – TMDB Movie Dataset
Methodologies: