
A PROJECT REPORT ON

“Movie Recommender System”

Project Report submitted to

BANGALORE UNIVERSITY

For partial fulfillment of the requirement for the award of

Master of Computer Application

By
Name: Sneha V (P03CT23S126036)
UNDER THE GUIDANCE OF

Prof. Ms. Harshitha.K

EAST WEST COLLEGE OF MANAGEMENT


Off Magadi Road, Vishwaneedam Post, Bangalore 560091
YEAR 2025
EAST WEST COLLEGE OF MANAGEMENT
Off Magadi Road, Vishwaneedam Post, Bangalore-560091
Department of Computer Applications

Certificate
Certified that the project work entitled “Movie Recommender System”
is a bona fide work carried out by Sneha V (P03CT23S126036) in partial
fulfillment for the award of the Degree of Master of Computer Application of
Bangalore University, Bangalore during the year 2024-2025. The project report
has been approved as it satisfies the academic requirements in respect of Project
work prescribed for the Master of Computer Applications.

Signature of the Guide Signature of the HOD

Date:

Signature of the Examiner

1.

2.
DECLARATION

We hereby declare that the work which is being presented in the project
entitled “Movie Recommender System”, in partial fulfillment of the
requirements for the award of the Degree of Master of Computer Application,
submitted in the Department of Computer Science, East West College of
Management, is an authentic record of our own work carried out under the
supervision and guidance of Prof. Harshitha K in the Department of
Computer Science, East West College of Management.

The matter embodied in this project work has not been submitted for the
award of any other degree.

Date: Name:Sneha.V
(P03CT23S126036)

Place: Bangalore
ACKNOWLEDGEMENT

“Task successful” makes everyone happy. But the happiness will be gold without
glitter if we do not mention the people who supported us in making it a success.
Success will be crowned to the people who made it a reality, but the people whose
constant guidance and encouragement made it possible will be crowned first on
the eve of success.

This acknowledgment transcends the reality of formality when we would like to
express deep gratitude and respect to all those people behind the screen who
guided, inspired and helped us in the completion of our project work. We
consider ourselves lucky to have received such a good project. This project will
be an asset to our academic profile.

We express our sincere gratitude to our respected Lecturers Mrs. and Mr. for
enabling us to make liberal use of the laboratory and library facilities, which
helped us greatly in carrying out our project work successfully, and for their
consistent supervision, guidance and co-operation throughout the project. We
would also like to thank them for their constant motivation and valuable help
through the project work.

We extend our sincere gratitude to our parents, who have encouraged us with
their blessings to do this project successfully. Finally, we would like to thank
all our friends and all the teaching and non-teaching staff members of the MCA
Department for their timely help, ideas and encouragement, which helped
throughout the completion of the project.
EAST WEST COLLEGE OF MANAGEMENT

No 63, off Magadi Road, Vishwaneedam Post Bangalore


Urban – 560091

DEPARTMENT OF COMPUTER SCIENCE

Sl No PROJECT REPORT FORMAT


1 Main Page
2 Certificate Page
3 Declaration
4 Acknowledgement
5 Abstract
6 Index
7 Table of Contents
8 Introduction
9 Learning Objectives/Scope
10 Modules Description
11 Software requirement specifications (SRS)
12 System Analysis
13 Database
14 DFD / ERD’s / Class Diagrams etc.
15 Coding
16 Analysis And Test case
17 Screen Shots
18 Conclusion
19 References/ Bibliography
1. ABSTRACT

The project focuses on developing a content-based movie recommender system to
provide personalized movie suggestions. It uses movie metadata such as titles,
overviews, genres, keywords, cast, and crew to analyze and rank movies based on
their similarities.
Key components of the system include:

1. Data Preprocessing: Textual data is cleaned and tokenized to create
meaningful representations. Stop words are removed, and special tokens are
created for features like genres and keywords to enhance the feature extraction
process.

2. Vectorization Techniques:
TF-IDF Vectorizer: Used to transform movie overviews into numerical vectors,
highlighting significant terms while reducing noise from frequently occurring
words.
Count Vectorizer: Applied to categorical features such as genres, keywords,
and cast, enabling a bag-of-words representation.

3. Similarity Computation: The system leverages cosine similarity to
calculate the degree of similarity between movies based on their feature
vectors. This metric enables the ranking of movies by relevance to a given
input movie.

4. Recommendation Engine: By taking a user-selected movie as input, the
system ranks all other movies in the dataset based on their similarity scores and
returns the top recommendations.

5. Scalability and Modularity: The system is implemented in a modular way,
allowing easy integration with larger recommendation pipelines. It focuses on
interpretability and transparency, making it an effective tool for personalized
recommendations. While the system is content-driven, its modular design
allows for future extensions, such as incorporating collaborative filtering or
deep learning models for hybrid recommendations. This approach highlights
the potential of content-based techniques in providing relevant, scalable, and
user-centric movie suggestions.
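
The preprocessing, bag-of-words and cosine-similarity steps listed above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the project's scikit-learn pipeline; the stop-word list, the movie names and their tag strings are invented for the example.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy tag strings, invented for illustration.
movies = {
    "Movie A": "action adventure space marine alien",
    "Movie B": "adventure space alien planet",
    "Movie C": "romance drama paris",
}
vectors = {title: Counter(tokenize(tags)) for title, tags in movies.items()}

def rank_similar(title):
    """Rank every other movie by cosine similarity to the given one."""
    query = vectors[title]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(rank_similar("Movie A"))
```

Movies that share more tags score closer to 1, and movies with no tags in common score 0.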

2. INTRODUCTION

In the era of digital streaming platforms and vast movie libraries, users are
often overwhelmed by the sheer volume of available choices. As a result,
personalized recommendation systems have become a critical tool for
enhancing user satisfaction and engagement. These systems guide users toward
movies that align with their preferences, streamlining the decision-making
process and enriching their overall viewing experience.

This study focuses on the development of a content-based movie recommender


system, which uses the intrinsic attributes of movies to suggest
recommendations. Unlike collaborative filtering methods that rely on user
behavior and interactions, content-based approaches emphasize the properties of
items, such as movie metadata. By analyzing features like genres, cast,
keywords, and plot summaries, the system identifies patterns and similarities,
enabling accurate and personalized suggestions.

3. OBJECTIVE

The objective of this research is to design a system that allows users to input a
favorite movie and receive recommendations for similar movies. To achieve
this, the study employs advanced natural language processing (NLP)
techniques, vectorization methods like TF-IDF and count vectorization, and
similarity metrics such as cosine similarity. These tools transform raw metadata
into meaningful numerical representations, allowing the system to calculate and
rank similarities between movies.
This study is significant because it highlights the effectiveness of content-based
recommendation techniques in scenarios where user interaction data is limited
or unavailable. By leveraging only the metadata of movies, the system becomes
particularly valuable in cold-start situations or for new users with no historical
activity.

4. MODULES DESCRIPTION

4.1 SYSTEM STUDY

A recommendation engine is a system that suggests products, services, or
information to users based on an analysis of data. The recommendation can
derive from a variety of factors, such as the history of the user and the
behavior of similar users. Recommendation systems are quickly becoming the
primary way for users to be exposed to the whole digital world through the lens
of their experiences, behaviors, preferences and interests. In a world of
information density and product overload, a recommendation engine provides an
efficient way for companies to provide consumers with personalized information
and solutions.

4.1.1 BENEFITS

A recommendation engine can significantly boost revenues, Click-Through
Rates (CTRs), conversions, and other essential metrics. It can have positive
effects on the user experience, thus translating to higher customer satisfaction
and retention. Let’s take Netflix as an example. Instead of having to browse
through thousands of box sets and movie titles, Netflix presents you with a
much narrower selection of items that you are likely to enjoy. This capability
saves you time and delivers a better user experience. With this function, Netflix
achieved lower cancellation rates, saving the company around a billion dollars a
year. Although recommendation systems have been used for almost 20 years by
companies like Amazon, they have spread to other industries such as finance and
travel only during the last few years.

4.1.2 DIFFERENT TYPES

The most common types of recommendation systems are CONTENT-BASED
and COLLABORATIVE FILTERING recommendation systems. In
collaborative filtering, the behavior of a group of users is used to make
recommendations to other users; the recommendation is based on the
preferences of other users. A simple example would be recommending a movie to
a user based on the fact that their friend liked the movie. There are two types of
collaborative models: MEMORY-BASED methods and MODEL-BASED
methods. The advantage of memory-based techniques is that they are simple to
implement and the resulting recommendations are often easy to explain. They
are divided into two:

• User-based collaborative filtering: In this model, products are recommended
to a user based on the fact that the products have been liked by users similar to
the user. For example, if Derrick and Dennis like the same movies and a new
movie comes out that Derrick likes, then we can recommend that movie to Dennis
because Derrick and Dennis seem to like the same movies.

• Item-based collaborative filtering: These systems identify similar items based
on users’ previous ratings. For example, if users A, B, and C gave a 5-star rating
to books X and Y, then when a user D buys book Y they also get a
recommendation to purchase book X, because the system identifies books X and
Y as similar based on the ratings of users A, B, and C.
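
The user-based variant described above can be illustrated with a small sketch. It is not part of the project's code: the ratings are invented, and the similarity used here (cosine over mean-centred ratings of co-rated items) is one common choice among several.

```python
import math

# Toy explicit ratings (1-5), invented for illustration.
ratings = {
    "Derrick": {"Movie 1": 5, "Movie 2": 4, "Movie 3": 5},
    "Dennis": {"Movie 1": 5, "Movie 2": 4},
    "Erin": {"Movie 1": 1, "Movie 2": 2, "Movie 4": 5},
}

def mean_rating(user):
    """Average of all ratings given by a user."""
    values = ratings[user].values()
    return sum(values) / len(values)

def user_similarity(u, v):
    """Cosine similarity of mean-centred ratings over the items both users rated."""
    common = ratings[u].keys() & ratings[v].keys()
    mu, mv = mean_rating(u), mean_rating(v)
    du = [ratings[u][item] - mu for item in common]
    dv = [ratings[v][item] - mv for item in common]
    norm = math.sqrt(sum(x * x for x in du)) * math.sqrt(sum(y * y for y in dv))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(du, dv)) / norm

def recommend(user, min_rating=4):
    """Suggest unseen items that the most similar user rated highly."""
    others = [v for v in ratings if v != user]
    best = max(others, key=lambda v: user_similarity(user, v))
    if user_similarity(user, best) <= 0:
        return []
    return sorted(
        item
        for item, rating in ratings[best].items()
        if rating >= min_rating and item not in ratings[user]
    )

print(recommend("Dennis"))
```

Because Dennis and Derrick rate their shared movies alike, Derrick's highly rated movie that Dennis has not seen becomes the suggestion.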

Model-based methods are based on Matrix Factorization and are better at
dealing with sparsity. They are developed using data mining and machine learning
algorithms to predict users’ ratings of unrated items. In this approach, techniques
such as dimensionality reduction are used to improve accuracy. Examples of
such model-based methods include Decision trees, Rule-based Models, Bayesian
Models, and latent factor models.

• Content-based systems use metadata such as genre, producer, actor and
musician to recommend items such as movies or music. Such a recommendation
would be, for instance, recommending Infinity War, which featured Vin Diesel,
because someone watched and liked The Fate of the Furious. Similarly, you can
get music recommendations from certain artists because you liked their music.
Content-based systems are based on the idea that if you liked a certain item you
are most likely to like something that is similar to it.
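
As a hedged illustration of turning such metadata into comparable tokens, the sketch below assumes the metadata is stored as a JSON-encoded list of objects with a `name` field, as the TMDB dataset used later in this report appears to do. The helper names and the shortened sample values are assumptions made for the example.

```python
import json

def names_from_json(column_value, limit=None):
    """Extract the 'name' fields from a JSON-encoded list of objects."""
    entries = json.loads(column_value)
    names = [entry["name"] for entry in entries]
    return names if limit is None else names[:limit]

def to_tokens(names):
    """Collapse spaces so that a multi-word name becomes a single token."""
    return [name.replace(" ", "") for name in names]

# Shortened sample values in the dataset's style (not the full original rows).
genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
keywords = '[{"id": 1463, "name": "culture clash"}]'

tags = to_tokens(names_from_json(genres)) + to_tokens(names_from_json(keywords))
print(tags)
```

Collapsing "culture clash" into one token keeps a vectorizer from treating "culture" and "clash" as unrelated words.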

4.1.3 CHALLENGES A RECOMMENDATION SYSTEM FACES

1. Sparsity of data: Data sets are filled with rows and rows of values that
contain blanks or zeros, so finding ways to use the denser, more informative
parts of the data set is critical.

2. Latent association: Labelling is imperfect. The same products with different
labelling can be ignored or incorrectly consumed, meaning that the information
does not get incorporated correctly.

3. Scalability: The traditional approach has become overwhelmed by the
multiplicity of products and clients. This becomes a challenge as data sets widen
and can lead to performance reduction.
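
To make the sparsity challenge concrete, the fraction of empty cells in a user-item rating matrix can be measured directly. The matrix below is invented for illustration, and treating 0 or None as "no rating" is an assumption of this sketch.

```python
def sparsity(matrix):
    """Fraction of cells in a user-item matrix that hold no rating (0 or None)."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        return 0.0
    empty = sum(1 for row in matrix for cell in row if not cell)
    return empty / total

# Rows are users, columns are items; 0 means "not rated" (toy data).
ratings = [
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
]

print(f"{sparsity(ratings):.0%} of the cells are empty")
```

Real rating matrices are usually far sparser than this toy one, which is why storing only the observed ratings is the norm.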

4.2 DATA PRE-PROCESSING

For the k-NN-based model, the underlying dataset ml-100k from the Surprise
Python scikit was used. Surprise is a good choice to begin with when learning
about recommendation systems. It is suitable for building and analyzing
recommendation systems that deal with explicit rating data.

4.3 MODEL BUILDING


The data is split into a 75% train set and a 25% holdout test set.
GridSearchCV, carried out over 5 folds, is used to find the best set of
similarity measure configurations (sim_options) for the prediction algorithm.
It uses accuracy metrics as the basis for comparing different combinations of
sim_options over a cross-validation procedure.
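
The split-then-search procedure can be sketched without any library. This is not the Surprise API: the function names are hypothetical, and `baseline_rmse` is only a stand-in for fitting and scoring a k-NN model with a given configuration.

```python
import math
import random
from itertools import product

def holdout_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into train and holdout parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_folds(data, k=5):
    """Yield (train, validation) pairs; each item is validated exactly once."""
    for i in range(k):
        validation = data[i::k]
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, validation

def grid_search(data, param_grid, score_fn, k=5):
    """Return (best_params, best_mean_score); lower scores are better (e.g. RMSE)."""
    names = sorted(param_grid)
    best_params, best_score = None, math.inf
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(train, val, params) for train, val in k_folds(data, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def baseline_rmse(train, validation, params):
    """Stand-in model: predict the train mean plus an offset, then report RMSE."""
    prediction = sum(train) / len(train) + params["offset"]
    errors = [(rating - prediction) ** 2 for rating in validation]
    return math.sqrt(sum(errors) / len(errors))

ratings = [1, 2, 3, 4, 5] * 20  # toy explicit ratings, invented for illustration
train_set, holdout_set = holdout_split(ratings)
best_params, best_score = grid_search(
    train_set, {"offset": [-1.0, 0.0, 1.0]}, baseline_rmse
)
print(best_params, round(best_score, 3))
```

The holdout set is left untouched by the search, so it can serve as a final check of the chosen configuration.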

5. SYSTEM REQUIREMENT SPECIFICATIONS (SRS)

5.1 Software Requirements:

1. Programming Language: Python

2. Libraries/Frameworks:

Pandas (for data preprocessing)

NumPy (for numerical computing)

Scikit-learn (for vectorization and similarity computation)

Matplotlib & Seaborn (for data visualization)

3. Dataset Source: TMDB Movie Dataset (available on Kaggle)

4. Development Tools: Integrated Development Environment (IDE) like Jupyter


Notebook, PyCharm, or VS Code

5.2 Hardware Requirements:

1. Processor: Minimum Intel i3 or equivalent; recommended Intel i5 or higher

2. RAM: At least 4 GB (8 GB recommended for better performance with larger


datasets)

3. Storage:

Minimum 500 MB free space for dataset storage and project files
SSD recommended for faster read/write operations

4. Graphics: Basic integrated graphics support is sufficient for running


visualizations (e.g., heatmaps or charts)

5. OS Compatibility: Windows 10/11, macOS, or any Linux distribution


supporting Python and required libraries

6. SYSTEM ANALYSIS

6.1 PROPOSED SYSTEM

The proposed system provides a foundation for future extensions, such as


hybrid recommendation engines that integrate content-based and collaborative
filtering approaches. This flexibility makes it adaptable to dynamic user needs
and ensures its relevance in a rapidly evolving industry. Through this research,
the study aims to address key challenges in recommendation systems, such as
relevance, scalability, and interpretability, ultimately contributing to the
improvement of personalized entertainment experiences.

6.2 EXISTING SYSTEM

The rapid growth of digital platforms has led to an explosion of content,


particularly in the entertainment industry. Streaming services, such as Netflix,
Amazon Prime Video, and Disney+, offer vast libraries of movies and TV
shows, creating a challenging environment for users to find content that aligns
with their preferences. This challenge has driven the development of
recommendation systems, which aim to simplify the decision-making process by
presenting users with personalized suggestions.

Recommendation systems are broadly categorized into three approaches:


Content-based filtering, collaborative filtering, and hybrid models. Content-
based filtering, the focus of this study, utilizes the intrinsic properties of items,
such as textual descriptions, genres, and keywords, to identify similar items.
Unlike collaborative filtering, which relies on user interactions and preferences,
content-based methods are particularly advantageous in scenarios where user
data is sparse or unavailable, commonly referred to as the cold-start problem. In
the context of movie recommendation systems, content-based filtering leverages
metadata such as movie titles, overviews, genres, cast, and crew to analyze and
rank movies based on their similarities. These techniques use natural language
processing (NLP) and vectorization methods, like TF-IDF and count
vectorization, to convert textual and categorical data into meaningful numerical
representations. Similarity metrics, such as cosine similarity, then determine the
degree of relevance between movies, forming the basis for personalized
recommendations.
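
The TF-IDF weighting mentioned above can be sketched in plain Python. The formula below is the smoothed variant, idf = ln((1 + n) / (1 + df)) + 1, which is the default in scikit-learn's `TfidfVectorizer`. That vectorizer additionally L2-normalizes each vector, a step this sketch omits. The overview texts are invented.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * smoothed idf."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in counts.items()})
    return vectors

# Toy overviews, invented for illustration.
overviews = [
    "marine explores the distant moon",
    "pirate captain sails the distant sea",
    "detective hunts the killer",
]
vectors = tfidf_vectors(overviews)
for term in ("the", "distant", "marine"):
    print(term, round(vectors[0][term], 3))
```

A word that occurs in every overview ("the") receives the lowest weight, while a word unique to one overview ("marine") receives the highest.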

Historically, movie recommendation systems have evolved significantly from


simple manual curation to sophisticated machine learning-based algorithms.
Early systems relied heavily on user ratings and reviews, which posed
limitations in terms of scalability and accuracy. The advent of data-driven
models enabled the automation of recommendations, resulting in more efficient
and dynamic systems. Content-based systems, in particular, have gained
prominence for their ability to provide tailored suggestions without requiring
extensive user interaction data.

This study builds upon these advancements by developing a content-based


movie recommender system that analyzes movie metadata to deliver accurate
and scalable recommendations. The system demonstrates the utility of content-
based methods in addressing the challenges of personalization, especially in
datasets with limited user interaction data. Furthermore, it provides a foundation
for future hybrid models that could integrate the strengths of both content-based
and collaborative filtering approaches.
Through this study, the proposed recommender system contributes to the
growing body of knowledge in recommendation system research and addresses
the ongoing demand for efficient, user-centric solutions in the entertainment
industry.

DATABASE

DATA FLOW DIAGRAM

1. The DFD is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the system,
the various processing carried out on this data, and the output data
generated by the system.

2. The data flow diagram (DFD) is one of the most important modeling
tools. It is used to model the system components. These components are
the system process, the data used by the process, an external entity that
interacts with the system and the information flows in the system.

3. A DFD shows how the information moves through the system and how it is
modified by a series of transformations that are applied as data moves
from input to output. A DFD may be used to represent a
system at any level of abstraction.

4. A DFD may be partitioned into levels that represent increasing information
flow and functional detail.
USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of


behavioral diagram defined by and created from a Use-case analysis. Its purpose
is to present a graphical overview of the functionality provided by a system in
terms of actors, their goals (represented as use cases), and any dependencies
between those use cases. The main purpose of a use case diagram is to show
what system functions are performed for which actor. Roles of the actors in the
system can be depicted.
CODING

import pickle
import time

import requests
import streamlit as st

# Load movie data and similarity matrix
with open('movies.pkl', 'rb') as f:
    movies_df = pickle.load(f)  # Ensure this is a DataFrame
with open('similarity.pkl', 'rb') as f:
    similarity = pickle.load(f)  # Cosine similarity matrix

# TMDb API details
API_KEY = "5528de46eee10fe55e11d4f1ea5c9bd7"
BASE_URL = "https://api.themoviedb.org/3/movie/"

# Create a session to reuse connections
session = requests.Session()


def fetch_poster(movie_id):
    """Fetches the poster URL for a movie using its ID."""
    url = f"{BASE_URL}{movie_id}?api_key={API_KEY}"
    for attempt in range(3):  # Retry up to 3 times
        try:
            # Use session for connection pooling
            response = session.get(url, timeout=10)
            response.raise_for_status()  # Raise exception for HTTP errors
            data = response.json()
            poster_path = data.get('poster_path')
            if poster_path:
                return f"https://image.tmdb.org/t/p/w500/{poster_path}"
            else:
                return "https://via.placeholder.com/300x450?text=No+Poster+Available"
        except requests.exceptions.RequestException as e:
            if attempt < 2:  # Retry on first 2 attempts
                st.warning(
                    f"Retrying fetching poster for movie ID {movie_id}... "
                    f"({attempt + 1}/3)"
                )
                time.sleep(1)  # Short delay before retrying
            else:
                st.error(f"Error fetching poster for movie ID {movie_id}: {e}")
                return "https://via.placeholder.com/300x450?text=Error+Fetching+Poster"


def recommend(movie):
    """Recommends movies based on a given movie title."""
    try:
        movie_index = movies_df[movies_df['title'] == movie].index[0]
        distances = similarity[movie_index]
        # Sort by similarity, highest first; position 0 is the movie itself
        similar_movies = sorted(
            list(enumerate(distances)), reverse=True, key=lambda x: x[1]
        )[1:6]
        recommended_movies = []
        posters = []
        for i in similar_movies:
            movie_id = movies_df.iloc[i[0]].id
            recommended_movies.append(movies_df.iloc[i[0]].title)
            posters.append(fetch_poster(movie_id))
            time.sleep(0.5)  # Throttle requests to avoid rate limiting
        return recommended_movies, posters
    except IndexError:
        st.error("Movie not found in the dataset. Please check your selection.")
        return [], []
    except Exception as e:
        st.error(f"An unexpected error occurred: {e}")
        return [], []


# Extract movie titles for the dropdown
movies_list = movies_df['title'].values

# Streamlit App
st.title('Movie Recommender System')

selected_movie_name = st.selectbox(
    "Select a movie to get recommendations", movies_list
)

if st.button("Recommend"):
    names, posters = recommend(selected_movie_name)
    if names and posters:
        st.write("Recommended Movies:")
        cols = st.columns(5)
        for idx, col in enumerate(cols):
            if idx < len(names):
                with col:
                    st.text(names[idx])
                    st.image(posters[idx], use_container_width=True)
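
The ranking step inside `recommend` relies on the selected movie sorting first, so that the slice `[1:6]` skips it. A hedged alternative, sketched below with a hypothetical helper `top_similar` and an invented 4x4 similarity matrix, excludes the query index explicitly. This stays correct even if another movie ties with a similarity of 1.0.

```python
def top_similar(similarity, index, n=5):
    """Return the indices of the n items most similar to `index`, excluding itself."""
    scores = [(j, s) for j, s in enumerate(similarity[index]) if j != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:n]]

# Toy symmetric similarity matrix (rows and columns are movie positions).
similarity = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.9],
    [0.5, 0.2, 0.9, 1.0],
]

print(top_similar(similarity, 0, n=2))
```

The returned positions can then be mapped back to titles and IDs with `movies_df.iloc`, as the script above does.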

DATA ANALYSIS

"cells": [

"cell_type": "code",

"execution_count": 1,

"id": "6945eafa-5504-4841-99de-8a4150795f63",

"metadata": {},

"outputs": [],

"source": [

"#import necessary libraries\n",


"\n",

"import numpy as np\n",

"import pandas as pd"

},

"cell_type": "code",

"execution_count": 2,

"id": "274e079c-4196-4135-ab7e-20178e62b3e1",

"metadata": {},

"outputs": [],

"source": [

"#create dataframe and load dataset\n",

"\n",

"movies=pd.read_csv('tmdb_5000_movies.csv')\n",

"credits=pd.read_csv('tmdb_5000_credits.csv')"

},

"cell_type": "code",

"execution_count": 3,

"id": "006b3902-4620-44b8-a491-97f9d656aaf0",
"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",


" <th></th>\n",

" <th>budget</th>\n",

" <th>genres</th>\n",

" <th>homepage</th>\n",

" <th>id</th>\n",

" <th>keywords</th>\n",

" <th>original_language</th>\n",

" <th>original_title</th>\n",

" <th>overview</th>\n",

" <th>popularity</th>\n",

" <th>production_companies</th>\n",

" <th>production_countries</th>\n",

" <th>release_date</th>\n",

" <th>revenue</th>\n",

" <th>runtime</th>\n",

" <th>spoken_languages</th>\n",

" <th>status</th>\n",

" <th>tagline</th>\n",

" <th>title</th>\n",

" <th>vote_average</th>\n",

" <th>vote_count</th>\n",

" </tr>\n",
" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>237000000</td>\n",

" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",

" <td>http://www.avatarmovie.com/</td>\n",

" <td>19995</td>\n",

" <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",

" <td>en</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>150.437577</td>\n",

" <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",

" <td>[{\"iso_3166_1\": \"US\", \"name\": \"United States o...</td>\n",

" <td>2009-12-10</td>\n",

" <td>2787965087</td>\n",

" <td>162.0</td>\n",

" <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...</td>\n",

" <td>Released</td>\n",

" <td>Enter the World of Pandora.</td>\n",

" <td>Avatar</td>\n",
" <td>7.2</td>\n",

" <td>11800</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" budget genres \\\n",

"0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \


n",

"\n",

" homepage id \\\n",

"0 http://www.avatarmovie.com/ 19995 \n",

"\n",

" keywords original_language \\\n",

"0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n",

"\n",

" original_title overview \\\n",

"0 Avatar In the 22nd century, a paraplegic Marine is di... \n",

"\n",

" popularity production_companies \\\n",

"0 150.437577 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... \n",


"\n",

" production_countries release_date revenue \\\n",

"0 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2009-12-10


2787965087 \n",

"\n",

" runtime spoken_languages status \\\n",

"0 162.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...


Released \n",

"\n",

" tagline title vote_average vote_count \n",

"0 Enter the World of Pandora. Avatar 7.2 11800 "

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"#display the data movies\n",

"\n",

"movies.head(1)"

]
},

"cell_type": "code",

"execution_count": 4,

"id": "86bb3f87-8d31-41ba-ba32-d5c75a34c97e",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",


" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>movie_id</th>\n",

" <th>title</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",
"</div>"

],

"text/plain": [

" movie_id title cast \\\n",

"0 19995 Avatar [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \


n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... "

},

"execution_count": 4,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"#display the data of credits\n",

"\n",

"credits.head(1)"

},

{
"cell_type": "code",
"execution_count": 5,
"id": "938f69b5-857d-4295-b0ec-051b74dd6f98",
"metadata": {},
"outputs": [],
"source": [
"#merge two dataframes\n",
"movies=movies.merge(credits,on='title')"
]
},

"cell_type": "code",

"execution_count": 7,

"id": "79ea2aa4-9e15-465e-aef2-b407ab861cbc",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",


" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>budget</th>\n",

" <th>genres</th>\n",

" <th>homepage</th>\n",

" <th>id</th>\n",

" <th>keywords</th>\n",

" <th>original_language</th>\n",

" <th>original_title</th>\n",
" <th>overview</th>\n",

" <th>popularity</th>\n",

" <th>production_companies</th>\n",

" <th>...</th>\n",

" <th>runtime</th>\n",

" <th>spoken_languages</th>\n",

" <th>status</th>\n",

" <th>tagline</th>\n",

" <th>title</th>\n",

" <th>vote_average</th>\n",

" <th>vote_count</th>\n",

" <th>movie_id</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>237000000</td>\n",

" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",

" <td>http://www.avatarmovie.com/</td>\n",
" <td>19995</td>\n",

" <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",

" <td>en</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>150.437577</td>\n",

" <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",

" <td>...</td>\n",

" <td>162.0</td>\n",

" <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...</td>\n",

" <td>Released</td>\n",

" <td>Enter the World of Pandora.</td>\n",

" <td>Avatar</td>\n",

" <td>7.2</td>\n",

" <td>11800</td>\n",

" <td>19995</td>\n",

" <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"<p>1 rows × 23 columns</p>\n",


"</div>"

],

"text/plain": [

" budget genres \\\n",

"0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \


n",

"\n",

" homepage id \\\n",

"0 http://www.avatarmovie.com/ 19995 \n",

"\n",

" keywords original_language \\\n",

"0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n",

"\n",

" original_title overview \\\n",

"0 Avatar In the 22nd century, a paraplegic Marine is di... \n",

"\n",

" popularity production_companies ... runtime \\\n",

"0 150.437577 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... ...


162.0 \n",

"\n",

" spoken_languages status \\\n",

"0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n",

"\n",
" tagline title vote_average vote_count movie_id \\\n",

"0 Enter the World of Pandora. Avatar 7.2 11800 19995 \n",

"\n",

" cast \\\n",

"0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",

"\n",

"[1 rows x 23 columns]"

},

"execution_count": 7,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"#Display the data after merge\n",

"movies.head(1)\n"

},
{

"cell_type": "code",

"execution_count": 8,

"id": "4eb4d0a8-dd4d-4e4b-8bf3-984b69e73469",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',\


n",

" 'original_title', 'overview', 'popularity', 'production_companies',\n",

" 'production_countries', 'release_date', 'revenue', 'runtime',\n",

" 'spoken_languages', 'status', 'tagline', 'title', 'vote_average',\n",

" 'vote_count', 'movie_id', 'cast', 'crew'],\n",

" dtype='object')"

},

"execution_count": 8,

"metadata": {},

"output_type": "execute_result"

],
"source": [

"# Required columns for analysis [genres, ID, keywords, title, overview, cast, crew]\n",

"movies.columns\n"

},

"cell_type": "code",

"execution_count": 37,

"id": "1ae598a9-a482-4b75-8223-ef4b68b52dca",

"metadata": {},

"outputs": [],

"source": [

"# Extract the required columns; .copy() avoids pandas SettingWithCopyWarning on later in-place edits\n",

"movies=movies[['id','title','overview','genres','keywords','cast','crew']].copy()\n"

},

"cell_type": "code",

"execution_count": 38,

"id": "d43d1fc0-3531-43d2-8697-ad8ef29d7957",

"metadata": {},

"outputs": [
{

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",
" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",

" <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",

" <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",
" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",

" <td>[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...</td>\n",

" <td>[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...</td>\n",

" <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",

" <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",

" <td>[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...</td>\n",

" <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",

" <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",


" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...</td>\n",

" <td>[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...</td>\n",

" <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",

" <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>John Carter is a war-weary, former military ca...</td>\n",

" <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",

" <td>[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...</td>\n",

" <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",

" <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",


"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",

"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... \n",

"\n",

" genres \\\n",

"0 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",

"1 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n",

"2 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",

"3 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n",

"4 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",

"\n",

" keywords \\\n",

"0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... \n",

"1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... \n",


"2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... \n",

"3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... \n",

"4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... \n",

"\n",

" cast \\\n",

"0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",

"1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n",

"2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n",

"3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n",

"4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",

"1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",

"2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",

"3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",

"4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "

},

"execution_count": 38,

"metadata": {},

"output_type": "execute_result"
}

],

"source": [

"# Preview the DataFrame after column selection\n",

"movies.head()"

},

"cell_type": "code",

"execution_count": 39,

"id": "de98365b-1765-46d5-9b48-4d5f21093bd7",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"id 0\n",

"title 0\n",

"overview 0\n",

"genres 0\n",

"keywords 0\n",

"cast 0\n",
"crew 0\n",

"dtype: int64"

},

"execution_count": 39,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.isnull().sum()"

},

"cell_type": "code",

"execution_count": 40,

"id": "056052f4-04b1-43c2-8d4c-64db195e20af",

"metadata": {},

"outputs": [],

"source": [

"movies.dropna(inplace=True)"

]
},

"cell_type": "code",

"execution_count": 41,

"id": "52cd4fcf-e41a-4e44-8404-a307045bafbb",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0"

},

"execution_count": 41,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.duplicated().sum()"

},
{

"cell_type": "code",

"execution_count": 42,

"id": "4002881a-70df-4d92-ace7-5e7b79c03f6d",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"'[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"name\": \"Fantasy\"}, {\"id\": 878, \"name\": \"Science Fiction\"}]'"

},

"execution_count": 42,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.iloc[0].genres"

},
{

"cell_type": "code",

"execution_count": 49,

"id": "6076bf25-62b5-407f-ada2-d8aecbd6bb53",

"metadata": {},

"outputs": [],

"source": [

"# Convert '[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"name\": \"Fantasy\"}, {\"id\": 878, \"name\": \"Science Fiction\"}]' to ['Action', 'Adventure', 'Fantasy', 'Science Fiction']\n",

"import ast\n",

"\n",

"def convert(obj):\n",

"    # Parse the stringified list of dicts and keep only each 'name' value\n",

"    L=[]\n",

"    for i in ast.literal_eval(obj):\n",

"        L.append(i['name'])\n",

"    return L"

},

"cell_type": "code",

"execution_count": 52,
"id": "c80a260e-c874-4628-8279-f28438576420",

"metadata": {},

"outputs": [],

"source": [

"movies['genres']=movies['genres'].apply(convert)"

},

"cell_type": "code",

"execution_count": 53,

"id": "e083abf1-3bfe-4286-8408-4694816279e8",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",
" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",
" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>[Action, Adventure, Fantasy, Science Fiction]</td>\n",

" <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",

" <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...</td>\n",

" <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",

" <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",
" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...</td>\n",

" <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",

" <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...</td>\n",

" <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",

" <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",


" <td>John Carter is a war-weary, former military ca...</td>\n",

" <td>[Action, Adventure, Science Fiction]</td>\n",

" <td>[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...</td>\n",

" <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",

" <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",
"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, Science Fiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, Science Fiction] \n",

"\n",

" keywords \\\n",

"0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... \n",

"1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... \n",

"2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... \n",

"3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... \n",

"4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... \n",

"\n",

" cast \\\n",

"0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",

"1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n",

"2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n",

"3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n",


"4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",

"1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",

"2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",

"3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",

"4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "

},

"execution_count": 53,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 54,
"id": "40c510df-1c99-4173-b9fd-828ae264ea6a",

"metadata": {},

"outputs": [],

"source": [

"movies['keywords']=movies['keywords'].apply(convert)"

},

"cell_type": "code",

"execution_count": 55,

"id": "127e3cf9-12f5-49c3-9495-7789d48e6cbe",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",
" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",
" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>[Action, Adventure, Fantasy, Science Fiction]</td>\n",

" <td>[culture clash, future, space war, space colon...</td>\n",

" <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drug abuse, exotic island, east india ...</td>\n",

" <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",

" <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",
" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, based on novel, secret agent, sequel, mi...</td>\n",

" <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",

" <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dc comics, crime fighter, terrorist, secret i...</td>\n",

" <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",

" <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",


" <td>John Carter is a war-weary, former military ca...</td>\n",

" <td>[Action, Adventure, Science Fiction]</td>\n",

" <td>[based on novel, mars, medallion, space travel...</td>\n",

" <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",

" <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",
"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, Science Fiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, Science Fiction] \n",

"\n",

" keywords \\\n",

"0 [culture clash, future, space war, space colon... \n",

"1 [ocean, drug abuse, exotic island, east india ... \n",

"2 [spy, based on novel, secret agent, sequel, mi... \n",

"3 [dc comics, crime fighter, terrorist, secret i... \n",

"4 [based on novel, mars, medallion, space travel... \n",

"\n",

" cast \\\n",

"0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",

"1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n",

"2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n",

"3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n",


"4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",

"1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",

"2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",

"3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",

"4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "

},

"execution_count": 55,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 56,
"id": "40a77d0a-2572-4c38-a5d8-fc0da1c6baec",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...\n",

"1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa...\n",

"2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...\n",

"3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...\n",

"4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c...\n",

" ... \n",

"4804 [{\"cast_id\": 1, \"character\": \"El Mariachi\", \"c...\n",

"4805 [{\"cast_id\": 1, \"character\": \"Buzzy\", \"credit_...\n",

"4806 [{\"cast_id\": 8, \"character\": \"Oliver O\\u2019To...\n",

"4807 [{\"cast_id\": 3, \"character\": \"Sam\", \"credit_id...\n",

"4808 [{\"cast_id\": 3, \"character\": \"Herself\", \"credi...\n",

"Name: cast, Length: 4806, dtype: object"

},

"execution_count": 56,

"metadata": {},
"output_type": "execute_result"

],

"source": [

"movies['cast']"

},

"cell_type": "code",

"execution_count": 62,

"id": "daa8f3de-3c6f-4289-9e53-e9959e574ff6",

"metadata": {},

"outputs": [],

"source": [

"def convert3(obj):\n",

"    # Keep only the first three names in the cast list\n",

"    L=[]\n",

"    counter=0\n",

"    for i in ast.literal_eval(obj):\n",

"        if counter != 3:\n",

"            L.append(i['name'])\n",

"            counter+=1\n",

"        else:\n",

"            break\n",

"    return L"

},

"cell_type": "code",

"execution_count": 63,

"id": "e7dbf82a-cbe2-46f4-a8f4-d8b11757b589",

"metadata": {},

"outputs": [],

"source": [

"movies['cast']=movies['cast'].apply(convert3)"

},

"cell_type": "code",

"execution_count": 64,

"id": "afa0ee75-0a32-46ab-a094-5452f82719a8",

"metadata": {},

"outputs": [

"data": {
"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",
" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>[Action, Adventure, Fantasy, Science Fiction]</td>\n",

" <td>[culture clash, future, space war, space colon...</td>\n",

" <td>[Sam Worthington, Zoe Saldana, Sigourney Weaver]</td>\n",

" <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",


" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drug abuse, exotic island, east india ...</td>\n",

" <td>[Johnny Depp, Orlando Bloom, Keira Knightley]</td>\n",

" <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, based on novel, secret agent, sequel, mi...</td>\n",

" <td>[Daniel Craig, Christoph Waltz, Léa Seydoux]</td>\n",

" <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dc comics, crime fighter, terrorist, secret i...</td>\n",


" <td>[Christian Bale, Michael Caine, Gary Oldman]</td>\n",

" <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>John Carter is a war-weary, former military ca...</td>\n",

" <td>[Action, Adventure, Science Fiction]</td>\n",

" <td>[based on novel, mars, medallion, space travel...</td>\n",

" <td>[Taylor Kitsch, Lynn Collins, Samantha Morton]</td>\n",

" <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",


"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",

"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, Science Fiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, Science Fiction] \n",

"\n",

" keywords \\\n",

"0 [culture clash, future, space war, space colon... \n",

"1 [ocean, drug abuse, exotic island, east india ... \n",

"2 [spy, based on novel, secret agent, sequel, mi... \n",

"3 [dc comics, crime fighter, terrorist, secret i... \n",


"4 [based on novel, mars, medallion, space travel... \n",

"\n",

" cast \\\n",

"0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] \n",

"1 [Johnny Depp, Orlando Bloom, Keira Knightley] \n",

"2 [Daniel Craig, Christoph Waltz, Léa Seydoux] \n",

"3 [Christian Bale, Michael Caine, Gary Oldman] \n",

"4 [Taylor Kitsch, Lynn Collins, Samantha Morton] \n",

"\n",

" crew \n",

"0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",

"1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",

"2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",

"3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",

"4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "

},

"execution_count": 64,

"metadata": {},

"output_type": "execute_result"

],
"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 78,

"id": "d5724d6f-b047-43ea-93b6-ecd0b9b37696",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"['James Cameron']"

},

"execution_count": 78,

"metadata": {},

"output_type": "execute_result"

],

"source": [
"movies['crew'][0]"

},

"cell_type": "code",

"execution_count": 82,

"id": "7f645cdf-3cfa-42ff-befb-51b2a2fda0be",

"metadata": {},

"outputs": [],

"source": [

"def fetch_director(obj):\n",

"    # Collect the names of crew members whose job is 'Director'\n",

"    L=[]\n",

"    for i in ast.literal_eval(obj):\n",

"        if i['job']=='Director':\n",

"            L.append(i['name'])\n",

"    return L"

},

"cell_type": "code",

"execution_count": 84,

"id": "0626e06c-80ba-41b4-b97a-e915a2cff68f",
"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",


" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" <td>[Action, Adventure, Fantasy, Science Fiction]</td>\n",

" <td>[culture clash, future, space war, space colon...</td>\n",

" <td>[Sam Worthington, Zoe Saldana, Sigourney Weaver]</td>\n",

" <td>[James Cameron]</td>\n",

" </tr>\n",

" <tr>\n",
" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drug abuse, exotic island, east india ...</td>\n",

" <td>[Johnny Depp, Orlando Bloom, Keira Knightley]</td>\n",

" <td>[Gore Verbinski]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, based on novel, secret agent, sequel, mi...</td>\n",

" <td>[Daniel Craig, Christoph Waltz, Léa Seydoux]</td>\n",

" <td>[Sam Mendes]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",
" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dc comics, crime fighter, terrorist, secret i...</td>\n",

" <td>[Christian Bale, Michael Caine, Gary Oldman]</td>\n",

" <td>[Christopher Nolan]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>John Carter is a war-weary, former military ca...</td>\n",

" <td>[Action, Adventure, Science Fiction]</td>\n",

" <td>[based on novel, mars, medallion, space travel...</td>\n",

" <td>[Taylor Kitsch, Lynn Collins, Samantha Morton]</td>\n",

" <td>[Andrew Stanton]</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [
" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",

"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, Science Fiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, Science Fiction] \n",

"\n",

" keywords \\\n",


"0 [culture clash, future, space war, space colon... \n",

"1 [ocean, drug abuse, exotic island, east india ... \n",

"2 [spy, based on novel, secret agent, sequel, mi... \n",

"3 [dc comics, crime fighter, terrorist, secret i... \n",

"4 [based on novel, mars, medallion, space travel... \n",

"\n",

" cast crew \n",

"0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] [James Cameron] \n",

"1 [Johnny Depp, Orlando Bloom, Keira Knightley] [Gore Verbinski] \n",

"2 [Daniel Craig, Christoph Waltz, Léa Seydoux] [Sam Mendes] \n",

"3 [Christian Bale, Michael Caine, Gary Oldman] [Christopher Nolan] \n",

"4 [Taylor Kitsch, Lynn Collins, Samantha Morton] [Andrew Stanton] "

},

"execution_count": 84,

"metadata": {},

"output_type": "execute_result"

],
"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 86,

"id": "9657b5a0-ebb5-41b4-9ad1-ea6062966a72",

"metadata": {},

"outputs": [],

"source": [

"movies['overview']=movies['overview'].apply(lambda x:x.split())"

},

"cell_type": "code",

"execution_count": 87,

"id": "55acf88b-a0f4-459c-8b72-22be237ee308",

"metadata": {},

"outputs": [

"data": {
"text/plain": [

"0 [In, the, 22nd, century,, a, paraplegic, Marin...\n",

"1 [Captain, Barbossa,, long, believed, to, be, d...\n",

"2 [A, cryptic, message, from, Bond’s, past, send...\n",

"3 [Following, the, death, of, District, Attorney...\n",

"4 [John, Carter, is, a, war-weary,, former, mili...\n",

" ... \n",

"4804 [El, Mariachi, just, wants, to, play, his, gui...\n",

"4805 [A, newlywed, couple's, honeymoon, is, upended...\n",

"4806 [\"Signed,, Sealed,, Delivered\", introduces, a,...\n",

"4807 [When, ambitious, New, York, attorney, Sam, is...\n",

"4808 [Ever, since, the, second, grade, when, he, fi...\n",

"Name: overview, Length: 4806, dtype: object"

},

"execution_count": 87,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies['overview']"
]

},

"cell_type": "code",

"execution_count": 88,

"id": "8420fe80-2368-401f-a9ac-0c042fcbe5f0",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",


" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",


" <td>[Action, Adventure, Fantasy, Science Fiction]</td>\n",

" <td>[culture clash, future, space war, space colon...</td>\n",

" <td>[Sam Worthington, Zoe Saldana, Sigourney Weaver]</td>\n",

" <td>[James Cameron]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drug abuse, exotic island, east india ...</td>\n",

" <td>[Johnny Depp, Orlando Bloom, Keira Knightley]</td>\n",

" <td>[Gore Verbinski]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, based on novel, secret agent, sequel, mi...</td>\n",


" <td>[Daniel Craig, Christoph Waltz, Léa Seydoux]</td>\n",

" <td>[Sam Mendes]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>[Following, the, death, of, District, Attorney...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dc comics, crime fighter, terrorist, secret i...</td>\n",

" <td>[Christian Bale, Michael Caine, Gary Oldman]</td>\n",

" <td>[Christopher Nolan]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",

" <td>[Action, Adventure, Science Fiction]</td>\n",

" <td>[based on novel, mars, medallion, space travel...</td>\n",

" <td>[Taylor Kitsch, Lynn Collins, Samantha Morton]</td>\n",

" <td>[Andrew Stanton]</td>\n",


" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",

"1 [Captain, Barbossa,, long, believed, to, be, d... \n",

"2 [A, cryptic, message, from, Bond’s, past, send... \n",

"3 [Following, the, death, of, District, Attorney... \n",

"4 [John, Carter, is, a, war-weary,, former, mili... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, Science Fiction] \n",


"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, Science Fiction] \n",

"\n",

" keywords \\\n",

"0 [culture clash, future, space war, space colon... \n",

"1 [ocean, drug abuse, exotic island, east india ... \n",

"2 [spy, based on novel, secret agent, sequel, mi... \n",

"3 [dc comics, crime fighter, terrorist, secret i... \n",

"4 [based on novel, mars, medallion, space travel... \n",

"\n",

" cast crew \n",

"0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] [James


Cameron] \n",

"1 [Johnny Depp, Orlando Bloom, Keira Knightley] [Gore Verbinski]


\n",

"2 [Daniel Craig, Christoph Waltz, Léa Seydoux] [Sam Mendes] \


n",

"3 [Christian Bale, Michael Caine, Gary Oldman] [Christopher


Nolan] \n",

"4 [Taylor Kitsch, Lynn Collins, Samantha Morton] [Andrew Stanton]


"

]
},

"execution_count": 88,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 102,

"id": "ebab29d1-d533-4580-ac45-ab15338b4b60",

"metadata": {},

"outputs": [],

"source": [

"movies['genres']=movies['genres'].apply(lambda x:[i.replace(\" \",\"\") for i


in x])\n",

"movies['keywords']=movies['keywords'].apply(lambda x:[i.replace(\" \",\"\")


for i in x])\n",

"movies['cast']=movies['cast'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\


n",

"movies['crew']=movies['crew'].apply(lambda x:[i.replace(\" \",\"\") for i in


x])"

},

"cell_type": "code",

"execution_count": 103,

"id": "ed252a14-e4eb-40d6-b83d-91fed4d1bfa7",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0 [Action, Adventure, Fantasy, ScienceFiction]\n",

"1 [Adventure, Fantasy, Action]\n",

"2 [Action, Adventure, Crime]\n",

"3 [Action, Crime, Drama, Thriller]\n",

"4 [Action, Adventure, ScienceFiction]\n",

" ... \n",

"4804 [Action, Crime, Thriller]\n",

"4805 [Comedy, Romance]\n",

"4806 [Comedy, Drama, Romance, TVMovie]\n",


"4807 []\n",

"4808 [Documentary]\n",

"Name: genres, Length: 4806, dtype: object"

},

"execution_count": 103,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies['genres']"

},

"cell_type": "code",

"execution_count": 104,

"id": "fd837d08-96d7-40a0-b56c-a7351afd87c4",

"metadata": {},

"outputs": [

"data": {
"text/plain": [

"0 [cultureclash, future, spacewar, spacecolony, ...\n",

"1 [ocean, drugabuse, exoticisland, eastindiatrad...\n",

"2 [spy, basedonnovel, secretagent, sequel, mi6, ...\n",

"3 [dccomics, crimefighter, terrorist, secretiden...\n",

"4 [basedonnovel, mars, medallion, spacetravel, p...\n",

" ... \n",

"4804 [unitedstates–mexicobarrier, legs, arms, paper...\n",

"4805 []\n",

"4806 [date, loveatfirstsight, narration, investigat...\n",

"4807 []\n",

"4808 [obsession, camcorder, crush, dreamgirl]\n",

"Name: keywords, Length: 4806, dtype: object"

},

"execution_count": 104,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies['keywords']"
]

},

"cell_type": "code",

"execution_count": 105,

"id": "1de56d9c-33de-41e3-8c92-ec9256cc5c3f",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"Index(['id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew'],


dtype='object')"

},

"execution_count": 105,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.columns"

]
},

"cell_type": "code",

"execution_count": 106,

"id": "e4324870-df88-4056-86a4-0c2c55d1abe8",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0 [SamWorthington, ZoeSaldana, SigourneyWeaver]\n",

"1 [JohnnyDepp, OrlandoBloom, KeiraKnightley]\n",

"2 [DanielCraig, ChristophWaltz, LéaSeydoux]\n",

"3 [ChristianBale, MichaelCaine, GaryOldman]\n",

"4 [TaylorKitsch, LynnCollins, SamanthaMorton]\n",

" ... \n",

"4804 [CarlosGallardo, JaimedeHoyos, PeterMarquardt]\n",

"4805 [EdwardBurns, KerryBishé, MarshaDietlein]\n",

"4806 [EricMabius, KristinBooth, CrystalLowe]\n",

"4807 [DanielHenney, ElizaCoupe, BillPaxton]\n",

"4808 [DrewBarrymore, BrianHerzlinger, CoreyFeldman]\n",

"Name: cast, Length: 4806, dtype: object"


]

},

"execution_count": 106,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies['cast']"

},

"cell_type": "code",

"execution_count": 107,

"id": "ee4f6745-9f88-4bc9-8e82-f0577363b088",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0 [JamesCameron]\n",

"1 [GoreVerbinski]\n",
"2 [SamMendes]\n",

"3 [ChristopherNolan]\n",

"4 [AndrewStanton]\n",

" ... \n",

"4804 [RobertRodriguez]\n",

"4805 [EdwardBurns]\n",

"4806 [ScottSmith]\n",

"4807 [DanielHsia]\n",

"4808 [BrianHerzlinger, JonGunn, BrettWinn]\n",

"Name: crew, Length: 4806, dtype: object"

},

"execution_count": 107,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies['crew']"

},

{
"cell_type": "code",

"execution_count": 108,

"id": "16149ecf-5668-43d6-b855-9a56b45b57a8",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>cast</th>\n",

" <th>crew</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",

" <td>[Action, Adventure, Fantasy, ScienceFiction]</td>\n",

" <td>[cultureclash, future, spacewar, spacecolony, ...</td>\n",

" <td>[SamWorthington, ZoeSaldana, SigourneyWeaver]</td>\n",


" <td>[JamesCameron]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drugabuse, exoticisland, eastindiatrad...</td>\n",

" <td>[JohnnyDepp, OrlandoBloom, KeiraKnightley]</td>\n",

" <td>[GoreVerbinski]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, , secretagent


, sequel, mi6, ...</td>\n",

" <td>[DanielCraig, ChristophWaltz, LéaSeydoux]</td>\n",

" <td>[SamMendes]</td>\n",

" </tr>\n",
" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>[Following, the, death, of, District, Attorney...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dccomics, crimefighter, terrorist, secretiden...</td>\n",

" <td>[ChristianBale, MichaelCaine, GaryOldman]</td>\n",

" <td>[ChristopherNolan]</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",

" <td>[Action, Adventure, ScienceFiction]</td>\n",

" <td>[basedonnovel, mars, medallion, spacetravel, p...</td>\n",

" <td>[TaylorKitsch, LynnCollins, SamanthaMorton]</td>\n",

" <td>[AndrewStanton]</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",
"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",

"1 [Captain, Barbossa,, long, believed, to, be, d... \n",

"2 [A, cryptic, message, from, Bond’s, past, send... \n",

"3 [Following, the, death, of, District, Attorney... \n",

"4 [John, Carter, is, a, war-weary,, former, mili... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, ScienceFiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",

"3 [Action, Crime, Drama, Thriller] \n",


"4 [Action, Adventure, ScienceFiction] \n",

"\n",

" keywords \\\n",

"0 [cultureclash, future, spacewar, spacecolony, ... \n",

"1 [ocean, drugabuse, exoticisland, eastindiatrad... \n",

"2 [spy, basedonnovel, secretagent, sequel, mi6, ... \n",

"3 [dccomics, crimefighter, terrorist, secretiden... \n",

"4 [basedonnovel, mars, medallion, spacetravel, p... \n",

"\n",

" cast crew \n",

"0 [SamWorthington, ZoeSaldana, SigourneyWeaver] [JamesCameron]


\n",

"1 [JohnnyDepp, OrlandoBloom, KeiraKnightley] [GoreVerbinski] \


n",

"2 [DanielCraig, ChristophWaltz, LéaSeydoux] [SamMendes] \n",

"3 [ChristianBale, MichaelCaine, GaryOldman] [ChristopherNolan] \


n",

"4 [TaylorKitsch, LynnCollins, SamanthaMorton] [AndrewStanton] "

},

"execution_count": 108,

"metadata": {},

"output_type": "execute_result"
}

],

"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 110,

"id": "23284842-6ad5-4dce-85fa-3c4da56474c4",

"metadata": {},

"outputs": [],

"source": [

"movies['tags']=movies['overview']+ movies['genres']+ movies['keywords']+


movies['cast']+ movies['crew']"

},

"cell_type": "code",

"execution_count": 111,

"id": "df758de9-a3e4-4e1c-9cae-d3f43ad9a52a",

"metadata": {},

"outputs": [
{

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",
" <th>id</th>\n",

" <th>title</th>\n",

" <th>overview</th>\n",

" <th>genres</th>\n",

" <th>keywords</th>\n",

" <th>crew</th>\n",

" <th>tags</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",

" <td>[Action, Adventure, Fantasy, ScienceFiction]</td>\n",

" <td>[cultureclash, future, spacewar, spacecolony, ...</td>\n",

" <td>[SamWorthington, ZoeSaldana, SigourneyWeaver]</td>\n",

" <td>[JamesCameron]</td>\n",

" <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",

" </tr>\n",

" <tr>\n",
" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",

" <td>[Adventure, Fantasy, Action]</td>\n",

" <td>[ocean, drugabuse, exoticisland, eastindiatrad...</td>\n",

" <td>[JohnnyDepp, OrlandoBloom, KeiraKnightley]</td>\n",

" <td>[GoreVerbinski]</td>\n",

" <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",

" <td>[Action, Adventure, Crime]</td>\n",

" <td>[spy, basedonnovel, secretagent, sequel, mi6, ...</td>\n",

" <td>[DanielCraig, ChristophWaltz, LéaSeydoux]</td>\n",

" <td>[SamMendes]</td>\n",

" <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",

" </tr>\n",

" <tr>\n",
" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>[Following, the, death, of, District, Attorney...</td>\n",

" <td>[Action, Crime, Drama, Thriller]</td>\n",

" <td>[dccomics, crimefighter, terrorist, secretiden...</td>\n",

" <td>[ChristianBale, MichaelCaine, GaryOldman]</td>\n",

" <td>[ChristopherNolan]</td>\n",

" <td>[Following, the, death, of, District, Attorney...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",

" <td>[Action, Adventure, ScienceFiction]</td>\n",

" <td>[basedonnovel, mars, medallion, spacetravel, p...</td>\n",

" <td>[TaylorKitsch, LynnCollins, SamanthaMorton]</td>\n",

" <td>[AndrewStanton]</td>\n",

" <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",

" </tr>\n",

" </tbody>\n",
"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" overview \\\n",

"0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",

"1 [Captain, Barbossa,, long, believed, to, be, d... \n",

"2 [A, cryptic, message, from, Bond’s, past, send... \n",

"3 [Following, the, death, of, District, Attorney... \n",

"4 [John, Carter, is, a, war-weary,, former, mili... \n",

"\n",

" genres \\\n",

"0 [Action, Adventure, Fantasy, ScienceFiction] \n",

"1 [Adventure, Fantasy, Action] \n",

"2 [Action, Adventure, Crime] \n",


"3 [Action, Crime, Drama, Thriller] \n",

"4 [Action, Adventure, ScienceFiction] \n",

"\n",

" keywords \\\n",

"0 [cultureclash, future, spacewar, spacecolony, ... \n",

"1 [ocean, drugabuse, exoticisland, eastindiatrad... \n",

"2 [spy, basedonnovel, secretagent, sequel, mi6, ... \n",

"3 [dccomics, crimefighter, terrorist, secretiden... \n",

"4 [basedonnovel, mars, medallion, spacetravel, p... \n",

"\n",

" cast crew \\\n",

"0 [SamWorthington, ZoeSaldana, SigourneyWeaver] [JamesCameron]


\n",

"1 [JohnnyDepp, OrlandoBloom, KeiraKnightley] [GoreVerbinski] \


n",

"2 [DanielCraig, ChristophWaltz, LéaSeydoux] [SamMendes] \n",

"3 [ChristianBale, MichaelCaine, GaryOldman] [ChristopherNolan] \


n",

"4 [TaylorKitsch, LynnCollins, SamanthaMorton] [AndrewStanton] \


n",

"\n",

" tags \n",

"0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",


"1 [Captain, Barbossa,, long, believed, to, be, d... \n",

"2 [A, cryptic, message, from, Bond’s, past, send... \n",

"3 [Following, the, death, of, District, Attorney... \n",

"4 [John, Carter, is, a, war-weary,, former, mili... "

},

"execution_count": 111,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"movies.head()"

},

"cell_type": "code",

"execution_count": 112,

"id": "1ff5c7cb-0904-4527-bcc6-e726529ab0ba",

"metadata": {},

"outputs": [],

"source": [
"sorted_df=movies[['id','title','tags']]"

},

"cell_type": "code",

"execution_count": 113,

"id": "e143beea-371b-4421-ade6-e9855ea92de1",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",
" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>tags</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",

" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",
" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>[Following, the, death, of, District, Attorney...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",

" </tr>\n",
" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" tags \n",

"0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",

"1 [Captain, Barbossa,, long, believed, to, be, d... \n",

"2 [A, cryptic, message, from, Bond’s, past, send... \n",

"3 [Following, the, death, of, District, Attorney... \n",

"4 [John, Carter, is, a, war-weary,, former, mili... "

},

"execution_count": 113,

"metadata": {},
"output_type": "execute_result"

],

"source": [

"sorted_df.head()"

},

"cell_type": "code",

"execution_count": 120,

"id": "31204b88-6c44-4ab2-8e1e-4c3a976e7462",

"metadata": {},

"outputs": [

"name": "stderr",

"output_type": "stream",

"text": [

"C:\\Users\\CSC\\AppData\\Local\\Temp\\
ipykernel_21432\\269873301.py:1: SettingWithCopyWarning: \n",

"A value is trying to be set on a copy of a slice from a DataFrame.\n",

"Try using .loc[row_indexer,col_indexer] = value instead\n",

"\n",

"See the caveats in the documentation: https://pandas.pydata.org/pandas-


docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",

" sorted_df['tags']=sorted_df['tags'].apply(lambda x:\" \".join(x))\n"

],

"source": [

"sorted_df['tags']=sorted_df['tags'].apply(lambda x:\" \".join(x))"

},

"cell_type": "code",

"execution_count": 121,

"id": "20a641b5-0c92-4e59-90b0-f3f5f522e0d7",

"metadata": {},

"outputs": [

"data": {

"text/html": [

"<div>\n",

"<style scoped>\n",

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",


" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"</style>\n",

"<table border=\"1\" class=\"dataframe\">\n",

" <thead>\n",

" <tr style=\"text-align: right;\">\n",

" <th></th>\n",

" <th>id</th>\n",

" <th>title</th>\n",

" <th>tags</th>\n",

" </tr>\n",

" </thead>\n",

" <tbody>\n",

" <tr>\n",

" <th>0</th>\n",
" <td>19995</td>\n",

" <td>Avatar</td>\n",

" <td>In the 22nd century, a paraplegic Marine is di...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>1</th>\n",

" <td>285</td>\n",

" <td>Pirates of the Caribbean: At World's End</td>\n",

" <td>Captain Barbossa, long believed to be dead, ha...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>2</th>\n",

" <td>206647</td>\n",

" <td>Spectre</td>\n",

" <td>A cryptic message from Bond’s past sends him o...</td>\n",

" </tr>\n",

" <tr>\n",

" <th>3</th>\n",

" <td>49026</td>\n",

" <td>The Dark Knight Rises</td>\n",

" <td>Following the death of District Attorney Harve...</td>\n",

" </tr>\n",
" <tr>\n",

" <th>4</th>\n",

" <td>49529</td>\n",

" <td>John Carter</td>\n",

" <td>John Carter is a war-weary, former military ca...</td>\n",

" </tr>\n",

" </tbody>\n",

"</table>\n",

"</div>"

],

"text/plain": [

" id title \\\n",

"0 19995 Avatar \n",

"1 285 Pirates of the Caribbean: At World's End \n",

"2 206647 Spectre \n",

"3 49026 The Dark Knight Rises \n",

"4 49529 John Carter \n",

"\n",

" tags \n",

"0 In the 22nd century, a paraplegic Marine is di... \n",

"1 Captain Barbossa, long believed to be dead, ha... \n",

"2 A cryptic message from Bond’s past sends him o... \n",
"3 Following the death of District Attorney Harve... \n",

"4 John Carter is a war-weary, former military ca... "

},

"execution_count": 121,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"sorted_df.head()"

},

"cell_type": "code",

"execution_count": 122,

"id": "cb8a2bb1-03ce-4ec2-9975-83c5a078266e",

"metadata": {},

"outputs": [

"data": {

"text/plain": [
"'In the 22nd century, a paraplegic Marine is dispatched to the moon
Pandora on a unique mission, but becomes torn between following orders and
protecting an alien civilization. Action Adventure Fantasy ScienceFiction
cultureclash future spacewar spacecolony society spacetravel futuristic romance
space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar
powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver
JamesCameron'"

},

"execution_count": 122,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"#converted 4 columns into single paragraph\n",

"sorted_df['tags'][0]"

},

"cell_type": "code",

"execution_count": 123,

"id": "dadabb94-9b17-4d01-a25f-c8764abfdb2e",

"metadata": {},
"outputs": [

"name": "stderr",

"output_type": "stream",

"text": [

"C:\\Users\\CSC\\AppData\\Local\\Temp\\
ipykernel_21432\\3814400023.py:1: SettingWithCopyWarning: \n",

"A value is trying to be set on a copy of a slice from a DataFrame.\n",

"Try using .loc[row_indexer,col_indexer] = value instead\n",

"\n",

"See the caveats in the documentation: https://pandas.pydata.org/pandas-


docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",

" sorted_df['tags']=sorted_df['tags'].apply(lambda x:x.lower())\n"

],

"source": [

"sorted_df['tags']=sorted_df['tags'].apply(lambda x:x.lower())"

},

"cell_type": "code",

"execution_count": 124,
"id": "1a1389eb-5aa2-4477-a6d2-d042395764cc",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"0 in the 22nd century, a paraplegic marine is di...\n",

"1 captain barbossa, long believed to be dead, ha...\n",

"2 a cryptic message from bond’s past sends him o...\n",

"3 following the death of district attorney harve...\n",

"4 john carter is a war-weary, former military ca...\n",

" ... \n",

"4804 el mariachi just wants to play his guitar and ...\n",

"4805 a newlywed couple's honeymoon is upended by th...\n",

"4806 \"signed, sealed, delivered\" introduces a dedic...\n",

"4807 when ambitious new york attorney sam is sent t...\n",

"4808 ever since the second grade when he first saw ...\n",

"Name: tags, Length: 4806, dtype: object"

},

"execution_count": 124,

"metadata": {},
"output_type": "execute_result"

],

"source": [

"sorted_df['tags']"

},

"cell_type": "code",

"execution_count": 153,

"id": "03de0106-7284-487a-b29f-4e98e640a8b4",

"metadata": {},

"outputs": [],

"source": [

"#import necessary libraries\n",

"import nltk\n",

"from sklearn.feature_extraction.text import CountVectorizer \n",

"cv=CountVectorizer(max_features=5000, stop_words='english')\n",

"\n",

"# Text vectorization\n",

"vectors=cv.fit_transform(sorted_df['tags']).toarray()"

]
},

"cell_type": "code",

"execution_count": 154,

"id": "f27952ac-280e-460c-b911-19df31a25531",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"array([0, 0, 0, ..., 0, 0, 0], dtype=int64)"

},

"execution_count": 154,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"vectors[0]"

},
{

"cell_type": "code",

"execution_count": 155,

"id": "3d0b3b20-4d21-49ea-ae62-81b5b2ab676a",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"array(['000', '007', '10', ..., 'zone', 'zoo', 'zooeydeschanel'],\n",

" dtype=object)"

},

"execution_count": 155,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"cv.get_feature_names_out()"

},
{

"cell_type": "code",

"execution_count": 156,

"id": "3a3277bc-e3a0-4d3e-8ae8-e546811c3db7",

"metadata": {},

"outputs": [],

"source": [

"#Steammming\n",

"import nltk\n",

"from nltk.stem.porter import PorterStemmer \n",

"ps=PorterStemmer()\n",

"\n",

"def stem(text):\n",

" y=[]\n",

" for i in text.split():\n",

" y.append(ps.stem(i))\n",

" return \" \".join(y)\n"

},

"cell_type": "code",

"execution_count": 157,
"id": "772e24b2-9232-44f1-82a2-8c8f94a4b3b7",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"'in the 22nd century, a parapleg marin is dispatch to the moon pandora on
a uniqu mission, but becom torn between follow order and protect an alien
civilization. action adventur fantasi sciencefict cultureclash futur spacewar
spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi
marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington
zoesaldana sigourneyweav jamescameron'"

},

"execution_count": 157,

"metadata": {},

"output_type": "execute_result"

],

"source": [

"stem('In the 22nd century, a paraplegic Marine is dispatched to the moon


Pandora on a unique mission, but becomes torn between following orders and
protecting an alien civilization. Action Adventure Fantasy ScienceFiction
cultureclash future spacewar spacecolony society spacetravel futuristic romance
space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar
powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver
JamesCameron')"

},

"cell_type": "code",

"execution_count": 158,

"id": "2582c623-3bcf-40de-8225-3b0c74591878",

"metadata": {},

"outputs": [

"name": "stderr",

"output_type": "stream",

"text": [

"C:\\Users\\CSC\\AppData\\Local\\Temp\\
ipykernel_21432\\3626395266.py:1: SettingWithCopyWarning: \n",

"A value is trying to be set on a copy of a slice from a DataFrame.\n",

"Try using .loc[row_indexer,col_indexer] = value instead\n",

"\n",

"See the caveats in the documentation: https://pandas.pydata.org/pandas-


docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",

" sorted_df['tags']=sorted_df['tags'].apply(stem)\n"

]
}

],

"source": [

"sorted_df['tags']=sorted_df['tags'].apply(stem)"

},

"cell_type": "code",

"execution_count": 159,

"id": "b258ffbd-013d-4c71-94fa-53b5fd3c3457",

"metadata": {},

"outputs": [

"data": {

"text/plain": [

"array(['000', '007', '10', ..., 'zone', 'zoo', 'zooeydeschanel'],\n",

" dtype=object)"

},

"execution_count": 159,

"metadata": {},

"output_type": "execute_result"
}

],

"source": [

"cv.get_feature_names_out()"

},

"cell_type": "code",

"execution_count": null,

"id": "19a59917-ad12-4431-bed1-d8f81030b39b",

"metadata": {},

"outputs": [],

"source": []

],

"metadata": {

"kernelspec": {

"display_name": "Python 3 (ipykernel)",

"language": "python",

"name": "python3"

},

"language_info": {
"codemirror_mode": {

"name": "ipython",

"version": 3

},

"file_extension": ".py",

"mimetype": "text/x-python",

"name": "python",

"nbconvert_exporter": "python",

"pygments_lexer": "ipython3",

"version": "3.11.5"

},

"nbformat": 4,

"nbformat_minor": 5

Screenshots
CONCLUSION

Over the last few decades, recommendation systems have been used, among the
many available solutions, to mitigate the problem of information and cognitive
overload by suggesting related and relevant items to users. In this regard,
numerous advances have been made toward high-quality, fine-tuned
recommendation systems. Nevertheless, designers still face several prominent
issues and challenges. Researchers have been working on these issues and have
devised solutions that resolve them to some extent, but much remains to be done
to reach the desired goal. In this research article, we focused on these
prominent issues and challenges, discussed what has been done to mitigate
them, and outlined what still needs to be done, in the form of research
opportunities and guidelines for coping with problems such as latency,
sparsity, context-awareness, grey sheep and cold start.

BIBLIOGRAPHY

• Dataset Source:
    - Kaggle – TMDB Movie Dataset

• Tools and Resources:
    - Scikit-learn Documentation: Machine Learning Algorithms.
    - Pandas Documentation: Data Analysis and Manipulation in Python.
    - NumPy Documentation: Numerical Computing in Python.

• Methodologies:
    - Cosine Similarity: A Brief Overview of Cosine Similarity in Recommender Systems.
    - TF-IDF Vectorizer: Introduction to TF-IDF for Text Mining.
    - Collaborative Filtering vs. Content-Based Filtering: Overview of Recommender System Approaches.
