Project Movielense Solution

The GroupLens Research Project conducts research on recommender systems and information filtering. This case study asks you to analyze movie rating data, identify the factors that influence ratings, and build a model to predict ratings. The datasets provided cover movie information, user ratings of movies, and user demographics, and they support the feature engineering and modeling tasks.

Uploaded by Sanjib Ganguly


DESCRIPTION

Background of Problem Statement:

The GroupLens Research Project is a research group in the Department of
Computer Science and Engineering at the University of Minnesota. Members of
the GroupLens Research Project are involved in many research projects related
to the fields of information filtering, collaborative filtering, and recommender
systems. The project is led by professors John Riedl and Joseph Konstan. The
project began to explore automated collaborative filtering in 1992 but is best
known for its worldwide trial of an automated collaborative filtering system
for Usenet news in 1996. Since then the project has expanded its scope to
research overall information filtering solutions, integrating content-based
methods as well as improving current collaborative filtering technology.

Problem Objective:

Here, we ask you to perform the analysis using the Exploratory Data Analysis
technique. You need to find features affecting the ratings of any particular movie
and build a model to predict the movie ratings.

Domain: Entertainment

Analysis Tasks to be performed:

• Import the three datasets.
• Create a new dataset [Master_Data] with the following columns: MovieID,
  Title, UserID, Age, Gender, Occupation, Rating. (Hint: (i) Merge two tables
  at a time. (ii) Merge the tables using the two primary keys MovieID and
  UserID.)
• Explore the datasets using visual representations (graphs or tables), and
  include your comments on the following:
  1. User age distribution
  2. User ratings of the movie "Toy Story"
  3. Top 25 movies by viewership rating
  4. The ratings for all the movies reviewed by one particular user, the user
     with UserID = 2696
• Feature Engineering:
  Use the Genres column:
  1. Find all the unique genres (Hint: split the data in the Genres column
     into a list, then process the data to keep only the unique genre
     categories).
  2. Create a separate column for each genre category with a one-hot encoding
     (1 and 0) indicating whether or not the movie belongs to that genre.
  3. Determine the features affecting the ratings of any particular movie.
  4. Develop an appropriate model to predict the movie ratings.
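
The merge hint and a few of the exploration items above could be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the three small DataFrames are made-up stand-ins for the real files, the variable names are hypothetical, and "viewership" is read here as the number of ratings per title, which is one possible interpretation.

```python
import pandas as pd

# Tiny in-memory stand-ins for the three files (values are made up for illustration).
movies = pd.DataFrame({
    "MovieID": [1, 2],
    "Title": ["Toy Story (1995)", "Jumanji (1995)"],
    "Genres": ["Animation|Children's|Comedy", "Adventure|Children's|Fantasy"],
})
users = pd.DataFrame({
    "UserID": [10, 11],
    "Gender": ["F", "M"],
    "Age": [25, 35],
    "Occupation": [12, 7],
    "Zip-code": ["00000", "11111"],
})
ratings = pd.DataFrame({
    "UserID": [10, 10, 11],
    "MovieID": [1, 2, 1],
    "Rating": [5, 3, 4],
    "Timestamp": [978300760, 978302109, 978301968],
})

# Merge two tables at a time: ratings + movies on MovieID, then + users on UserID.
ratings_movies = ratings.merge(movies, on="MovieID", how="inner")
master_data = ratings_movies.merge(users, on="UserID", how="inner")[
    ["MovieID", "Title", "UserID", "Age", "Gender", "Occupation", "Rating"]
]

# Ratings given to one title, and all ratings made by one user.
toy_story = master_data[master_data["Title"] == "Toy Story (1995)"]
user_10 = master_data[master_data["UserID"] == 10]

# Titles ranked by number of ratings (an assumed reading of "viewership").
top_titles = (
    master_data.groupby("Title")["Rating"]
    .count()
    .sort_values(ascending=False)
    .head(25)
)
```

With the real data, the same filter on `UserID` would use 2696 instead of the placeholder 10.
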

Dataset Description:

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies
made by 6,040 MovieLens users who joined MovieLens in 2000.

Ratings.dat
Format - UserID::MovieID::Rating::Timestamp

Field      Description
UserID     Unique identification for each user
MovieID    Unique identification for each movie
Rating     User rating for each movie
Timestamp  Timestamp generated while adding user review

• UserIDs range between 1 and 6040
• MovieIDs range between 1 and 3952
• Ratings are made on a 5-star scale (whole-star ratings only)
• Timestamp is represented in seconds since the epoch, as returned by time(2)
• Each user has at least 20 ratings
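
As a rough sketch of how a file in this `::`-separated format might be read, the snippet below parses two sample lines written in the documented layout and converts the epoch seconds into timestamps. The sample lines and the `RatedAt` column name are assumptions for illustration; with the real file, the path to Ratings.dat would replace the in-memory buffer.

```python
import io

import pandas as pd

# Two sample lines in the documented UserID::MovieID::Rating::Timestamp layout.
sample = io.StringIO("1::1193::5::978300760\n1::661::3::978302109\n")

# A multi-character separator needs the Python parsing engine in pandas.
ratings = pd.read_csv(
    sample,
    sep="::",
    engine="python",
    header=None,
    names=["UserID", "MovieID", "Rating", "Timestamp"],
)

# Timestamps are seconds since the epoch, so unit="s" converts them directly.
ratings["RatedAt"] = pd.to_datetime(ratings["Timestamp"], unit="s", utc=True)

# Sanity check against the documented 5-star, whole-star scale.
valid_scale = ratings["Rating"].between(1, 5).all()
```
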

Users.dat
Format - UserID::Gender::Age::Occupation::Zip-code

Field       Description
UserID      Unique identification for each user
Gender      User's gender
Age         User's age
Occupation  User's occupation
Zip-code    Zip code for the user's location

All demographic information is provided voluntarily by the users and is not
checked for accuracy. Only users who have provided demographic information
are included in this data set.

• Gender is denoted by an "M" for male and "F" for female

• Age is chosen from the following ranges:

Value  Description
1      "Under 18"
18     "18-24"
25     "25-34"
35     "35-44"
45     "45-49"
50     "50-55"
56     "56+"

• Occupation is chosen from the following choices:

Value  Description
0      "other" or not specified
1      "academic/educator"
2      "artist"
3      "clerical/admin"
4      "college/grad student"
5      "customer service"
6      "doctor/health care"
7      "executive/managerial"
8      "farmer"
9      "homemaker"
10     "K-12 student"
11     "lawyer"
12     "programmer"
13     "retired"
14     "sales/marketing"
15     "scientist"
16     "self-employed"
17     "technician/engineer"
18     "tradesman/craftsman"
19     "unemployed"
20     "writer"
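
Because Age and Occupation are stored as codes, it can help to decode them before plotting something like the user age distribution. The following is a minimal sketch using the code tables above; the three user rows and the new column names (`AgeGroup`, `OccupationName`) are hypothetical.

```python
import pandas as pd

# Lookup tables copied from the Age and Occupation code lists above.
AGE_LABELS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}
OCCUPATION_LABELS = {
    0: "other or not specified", 1: "academic/educator", 2: "artist",
    3: "clerical/admin", 4: "college/grad student", 5: "customer service",
    6: "doctor/health care", 7: "executive/managerial", 8: "farmer",
    9: "homemaker", 10: "K-12 student", 11: "lawyer", 12: "programmer",
    13: "retired", 14: "sales/marketing", 15: "scientist",
    16: "self-employed", 17: "technician/engineer",
    18: "tradesman/craftsman", 19: "unemployed", 20: "writer",
}

# Made-up user rows for illustration only.
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Gender": ["F", "M", "M"],
    "Age": [1, 56, 25],
    "Occupation": [10, 16, 15],
})

# Translate the numeric codes into readable labels.
users["AgeGroup"] = users["Age"].map(AGE_LABELS)
users["OccupationName"] = users["Occupation"].map(OCCUPATION_LABELS)

# Count of users per age group, ready for a bar chart or a table.
age_distribution = users["AgeGroup"].value_counts()
```
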

Movies.dat
Format - MovieID::Title::Genres

Field    Description
MovieID  Unique identification for each movie
Title    A title for each movie
Genres   Category of each movie

• Titles are identical to titles provided by the IMDB (including year of
  release)
• Genres are pipe-separated and are selected from the following genres:
  1. Action
  2. Adventure
  3. Animation
  4. Children's
  5. Comedy
  6. Crime
  7. Documentary
  8. Drama
  9. Fantasy
  10. Film-Noir
  11. Horror
  12. Musical
  13. Mystery
  14. Romance
  15. Sci-Fi
  16. Thriller
  17. War
  18. Western
• Some MovieIDs do not correspond to a movie due to accidental duplicate
  entries and/or test entries
• Movies are mostly entered by hand, so errors and inconsistencies may
  exist
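
Since the genres are pipe-separated, the unique-genre and one-hot encoding steps of the feature engineering task could be approached along these lines. This is a sketch on two made-up movie rows, not the full dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Two illustrative rows in the MovieID / Title / Genres layout.
movies = pd.DataFrame({
    "MovieID": [1, 2],
    "Title": ["Toy Story (1995)", "Jumanji (1995)"],
    "Genres": ["Animation|Children's|Comedy", "Adventure|Children's|Fantasy"],
})

# Split each Genres string into a list, then collect the distinct values.
genre_lists = movies["Genres"].str.split("|")
unique_genres = sorted({genre for genres in genre_lists for genre in genres})

# One 0/1 column per genre; get_dummies splits on the pipe itself.
genre_flags = movies["Genres"].str.get_dummies(sep="|")

# Attach the indicator columns to the original movie table.
movies_encoded = pd.concat([movies, genre_flags], axis=1)
```

The resulting indicator columns can then be merged into the master dataset on `MovieID` and used as candidate features when studying what affects ratings.
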

Movielens Case Study.ipynb
