BDA Mini Project Report
A DISSERTATION REPORT ON
DEGREE
BACHELOR OF ENGINEERING
IN
COMPUTER ENGINEERING
GROUP MEMBERS
UNIVERSITY OF MUMBAI
2022-2023
DECLARATION
We declare that this written submission represents our ideas in our own words, and where others’
ideas or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented, fabricated, or falsified any idea, data, fact, or source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
Date:
Place: SHELU
CERTIFICATE
Index
TABLE OF CONTENTS
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Survey
Chapter 9 Conclusion
Reference
Abstract
The data available online helps users find information about anything of interest to them.
But because the data is huge and complex, it is difficult to extract useful information from it.
Recommender systems are effective software techniques for overcoming this problem. Based on
the available user and item information, these techniques provide recommendations to
users in their areas of interest. Recommender systems have wide applications, such as providing
suggested lists of items to customers for online shopping, recommending articles or books for
online reading, and recommending movies, music, or news. In this project,
a collaborative filtering based book recommendation platform is proposed which uses both
the memory-based and model-based approaches.
List of Figures
1 System Architecture
List of Tables
1 Tools
Chapter 1
Introduction
1.1 Motivation:
There has been a lot of research in both industry and academia on developing new
approaches for service recommender systems. Many firms capture large-scale data
about their customers, providers, and operations. The growth in the number of
consumers, services, and other online data places service recommender systems in a
"Big Data" setting, which poses crucial challenges for them.
Examples of existing service recommender systems include hotel reservation
systems and restaurant guides.
1.3 Aim:
This project aims to build and optimize a book recommendation system based on
collaborative filtering, covering one example each of the memory-based and
model-based approaches (KNNWithMeans and Singular Value Decomposition, respectively).
Recommendation systems are one of the largest application areas of Big Data
Analytics. They enable tailoring personalized content for users, thereby generating
revenue for businesses.
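To make the memory-based approach concrete, the sketch below predicts a rating in the style of KNNWithMeans: the user's mean rating plus a similarity-weighted average of how far each neighbour's rating lies from that neighbour's own mean. It is a minimal pure-Python illustration, not the project's implementation. The toy `ratings` dictionary, the choice of cosine similarity over co-rated books, the user-based neighbourhood, and the helper names `mean`, `cosine_sim`, and `predict_knn_with_means` are all assumptions made for this example.

```python
from math import sqrt

# Toy ratings: user -> {book: rating}. All values are made up for illustration.
ratings = {
    "u1": {"A": 8, "B": 6, "C": 9},
    "u2": {"A": 7, "B": 5, "C": 8, "D": 9},
    "u3": {"A": 3, "B": 9, "D": 2},
}

def mean(user):
    """Mean of all ratings given by one user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def cosine_sim(u, v):
    """Cosine similarity computed over the books both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][b] * ratings[v][b] for b in common)
    norm_u = sqrt(sum(ratings[u][b] ** 2 for b in common))
    norm_v = sqrt(sum(ratings[v][b] ** 2 for b in common))
    return dot / (norm_u * norm_v)

def predict_knn_with_means(user, book, k=2):
    """User mean plus the similarity-weighted average of the k nearest
    neighbours' deviations from their own means."""
    neighbours = [
        (cosine_sim(user, v), v)
        for v in ratings
        if v != user and book in ratings[v]
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    numerator = sum(s * (ratings[v][book] - mean(v)) for s, v in neighbours)
    denominator = sum(s for s, _ in neighbours)
    if denominator == 0:
        # No usable neighbour: fall back to the user's own mean.
        return mean(user)
    return mean(user) + numerator / denominator

prediction = predict_knn_with_means("u1", "D")
print(f"Predicted rating of u1 for book D: {prediction:.2f}")
```

Centering on each user's mean compensates for users who rate systematically high or low, which is the main difference from a plain KNN average.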
Chapter 2
Problem Statement
2.2 Objectives
The objective of a book recommender system is to provide recommendations based on
recorded information about users' preferences. These systems use information filtering
techniques to process information and present the user with potentially more relevant
items. Here, the goal is to recommend relevant books to users based on popularity and user interests.
Chapter 3
Literature Review
One study proposes a simple, comprehensible system for book recommendations that helps
readers find the right book. In recent years, data analysis challenges have
centered on service recommendation systems. For consumers, network resources
are fully connected and rapidly developing. The proposed method works on
training, feedback, management, reporting, and configuration, and uses them to supply
useful information to the user in order to assist in decision-making and data item
recommendations.
Book recommendation systems have developed rapidly because of network
technology and library modernization, which provide a new way for librarians
to learn readers’ demands. However, existing recommendation systems cannot provide
enough information for readers to decide whether or not to recommend a book, and they do not
analyze the recommendation information. Some systems also lack a feedback mechanism for
readers, which might hurt their enthusiasm. To solve these problems, the authors designed
a novel book recommendation system.
Readers are redirected to the recommendation pages when they cannot find the required book through
the library catalog retrieval system. The recommendation pages contain all the essential and extended book
information for readers to refer to. Readers can recommend a book on these pages, and the
recommendation information is analyzed by the recommendation system to make scientific purchasing decisions.
The authors proposed two formulas to compute the book value and the number of copies, respectively, based on the
recommendation information. The application of the recommendation system shows that both the utilization of
recommended books and readers’ satisfaction greatly increased.
Chapter 4
Objectives and scope
Objectives:
Scope:
Given more information about the books in the dataset, such as genre and description,
we could implement a content-based filtering recommendation system and compare its
results with the existing collaborative-filtering system.
We would also like to explore clustering users by features such as age and
location, and then apply voting algorithms to recommend items to a user
depending on the cluster to which the user belongs.
Chapter 5
Proposed methodology
This project will use the Book-Crossing dataset collected by Cai-Nicolas Ziegler.
Exploratory data analysis
Ratings are of two types: implicit and explicit. An implicit rating is
based on tracking user interaction with an item, such as a user clicking on it, and is recorded as '0'.
An explicit rating is one the user gives directly, on a scale of '1' to '10'.
• Majority of ratings are implicit i.e., rating '0'
• Rating of '8' has the highest rating count among explicit ratings '1-10'
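The split between implicit and explicit ratings can be checked with a few lines of Python. The sketch below is only an illustration: the `book_ratings` list is a made-up sample rather than the Book-Crossing data, and it assumes, as described above, that '0' marks an implicit rating and '1' to '10' are explicit ratings.

```python
from collections import Counter

# Hypothetical sample of Book-Rating values; 0 marks an implicit rating.
book_ratings = [0, 8, 0, 5, 0, 10, 8, 0, 7, 8, 0, 0, 0]

implicit = [r for r in book_ratings if r == 0]
explicit = [r for r in book_ratings if 1 <= r <= 10]

# Count how often each explicit rating value occurs.
explicit_counts = Counter(explicit)
most_common_rating, most_common_count = explicit_counts.most_common(1)[0]

print(f"implicit: {len(implicit)}, explicit: {len(explicit)}")
print(f"most frequent explicit rating: {most_common_rating} "
      f"({most_common_count} times)")
```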
Machine Learning – Model Selection
• After data cleaning, there were 78,782 records left
• 686 unique users who have rated > 250 books each
• 1913 unique book titles that have received > 50 ratings each
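The filtering behind these counts can be sketched in plain Python. This is a hedged illustration rather than the project's code: the function name `filter_ratings`, the `(user, book, rating)` tuple layout, and the toy rows are assumptions, and both counts are taken on the unfiltered data, which may differ from the order used in the project.

```python
from collections import Counter

def filter_ratings(rows, min_user_ratings=250, min_book_ratings=50):
    """Keep rows whose user has more than min_user_ratings ratings and whose
    book has more than min_book_ratings ratings.

    rows is a list of (user, book, rating) tuples.
    """
    user_counts = Counter(user for user, _, _ in rows)
    book_counts = Counter(book for _, book, _ in rows)
    return [
        (user, book, rating)
        for user, book, rating in rows
        if user_counts[user] > min_user_ratings
        and book_counts[book] > min_book_ratings
    ]

# Tiny made-up example with both thresholds scaled down to 1.
rows = [
    ("u1", "A", 8), ("u1", "B", 0), ("u1", "C", 7),
    ("u2", "A", 9), ("u2", "B", 6),
    ("u3", "A", 5),
]
kept = filter_ratings(rows, min_user_ratings=1, min_book_ratings=1)
print(kept)
```

Dropping rarely rated books and inactive users makes the rating matrix denser, which tends to make neighbourhood similarities more reliable.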
5-fold cross-validation model performance on the training set (with default model parameters):
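As a rough illustration of how 5-fold cross-validation produces a score, the sketch below splits a list of ratings into five folds, holds out each fold in turn, and reports the RMSE of a deliberately trivial predictor that always outputs the training mean. The helper names `k_fold_indices`, `rmse`, and `cross_validate_mean_baseline`, the toy ratings, and the use of a mean baseline are assumptions for this example; the project evaluates its own models instead.

```python
import random
from math import sqrt

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def cross_validate_mean_baseline(ratings, k=5):
    """Return the per-fold RMSE of a predictor that outputs the training mean."""
    scores = []
    for held_out in k_fold_indices(len(ratings), k):
        held = set(held_out)
        train = [r for i, r in enumerate(ratings) if i not in held]
        test = [ratings[i] for i in held_out]
        train_mean = sum(train) / len(train)
        scores.append(rmse([train_mean] * len(test), test))
    return scores

ratings = [8, 7, 9, 5, 10, 6, 8, 8, 4, 7]  # made-up explicit ratings
fold_scores = cross_validate_mean_baseline(ratings, k=5)
mean_rmse = sum(fold_scores) / len(fold_scores)
print(f"per-fold RMSE: {[round(s, 3) for s in fold_scores]}")
print(f"mean RMSE: {mean_rmse:.3f}")
```

Averaging the per-fold errors gives a more stable estimate than a single train/test split, which is why it is a common basis for comparing candidate models.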
Chapter 6
System Architecture
The datasets were pre-processed to make them suitable for developing the recommendation system. Feature
extraction is performed using Truncated SVD to reduce the number of features in the dataset, and
the data is split into training and testing datasets in an 80:20 ratio.
A content-based filtering system is developed that takes the book description as input, and a
collaborative filtering system is developed by building a model using the K-Means algorithm. The
model is then tested on the test data.
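The 80:20 split described above can be sketched as a shuffle followed by a cut at 80% of the records. The function name `train_test_split`, the fixed seed, and the synthetic `records` are hypothetical and chosen only for illustration.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it at train_fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    threshold = int(len(shuffled) * train_fraction)
    return shuffled[:threshold], shuffled[threshold:]

# Synthetic (user, book, rating) records, made up for the example.
records = [(f"u{i % 5}", f"book{i}", (i % 10) + 1) for i in range(50)]
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before cutting matters because rating files are often ordered by user, and an unshuffled cut would put some users entirely in the test set.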
Chapter 7
Tools
Dataset
Chapter 8
Implementation Screenshot
Program:
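import numpy as np
import pandas as pd
import plotly.offline as py
import plotly.graph_objs as go
import plotly.io as pio
pio.renderers.default = "png"
import warnings
warnings.filterwarnings("ignore")

def loaddata(filename):
    # The BX files are ';'-separated and latin-1 encoded; malformed rows are skipped.
    # on_bad_lines='skip' replaces error_bad_lines/warn_bad_lines, which pandas 2.0 removed.
    df = pd.read_csv(f'{filename}.csv', sep=';', on_bad_lines='skip', encoding='latin-1')
    return df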
book = loaddata("BX-Books")
user = loaddata("BX-Users")
rating = loaddata("BX-Book-Ratings")
rating.shape
rating.head(3)
rating.info()
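# Assumed definitions (not shown in the listing): ratings per user and per book,
# with the key in column 'index' and the count in column 'Rating'.
rating_users = rating['User-ID'].value_counts().rename_axis('index').reset_index(name='Rating')
rating_books = rating['ISBN'].value_counts().rename_axis('index').reset_index(name='Rating')
rating = rating[rating['User-ID'].isin(rating_users[rating_users['Rating']>250]['index'])]
rating = rating[rating['ISBN'].isin(rating_books[rating_books['Rating']>50]['index'])]
rating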
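print(f'Duplicate entries: {rating.duplicated().sum()}')
rating.drop_duplicates(inplace=True)
rating
list_of_distinct_users = list(rating['User-ID'].unique())

import random
from surprise import Dataset, Reader, KNNBasic, KNNWithMeans, KNNWithZScore, KNNBaseline, SVD

# Assumed: the listing does not define `reader`; Book-Crossing ratings range from 0 to 10.
reader = Reader(rating_scale=(0, 10))
# 'Book-Title' is assumed to have been merged into `rating` from `book` in a step not shown.
data = Dataset.load_from_df(rating[['User-ID','Book-Title','Book-Rating']], reader)
raw_ratings = data.raw_ratings

# Shuffle, then hold out 20% of the ratings as a test set.
# The shuffle and slicing lines are assumed; the listing only shows `threshold`.
random.shuffle(raw_ratings)
threshold = int(len(raw_ratings)*0.8)
train_raw_ratings = raw_ratings[:threshold]
test_raw_ratings = raw_ratings[threshold:]
data.raw_ratings = train_raw_ratings

trainset = data.build_full_trainset()
testset = data.construct_testset(test_raw_ratings)
models=[KNNBasic(),KNNWithMeans(),KNNWithZScore(),KNNBaseline(),SVD()]
results = {}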
        recommendations[index] += all_item_weighted_rating[index]
    else:
        recommendations[index] = all_item_weighted_rating[index]
if all_item_weights[index] != 0:
    recommendations[index] = recommendations[index] / (all_item_weights[index] * like_recommend)
temp_df = pd.Series(recommendations).reset_index().sort_values(by=0, ascending=False)
recommendations = list(temp_df.to_records(index=False))
final_recommendations = []
count = 0
flag = True
if item == userItem:
    # ... been rated by user,
if flag == True:
    final_recommendations.append(trainset.to_raw_iid(item))
    count += 1  # trainset has the items stored as inner id,
    # ... recommendations
    break
return(final_recommendations)
... like_recommend=5, get_recommend=10)
recommendationsKNN
model.fit(trainset)
testset = trainset.build_anti_testset()
predictions = model.test(testset)
predictions_df = pd.DataFrame(predictions)
_recommend)
recommendations = []
recommendations.append(list(predictions_userID['iid']))
recommendations = recommendations[0]
return(recommendations)
recommendationsSVD
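The model-based approach can be illustrated with a small matrix factorization trained by stochastic gradient descent, which is the general idea behind SVD-style recommenders. The sketch below is a self-contained approximation under stated assumptions, not the listing above and not the Surprise implementation: the function name `train_mf`, the hyperparameter values, and the toy ratings are hypothetical.

```python
import random
from math import sqrt

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + dot(p_u, q_i) by stochastic gradient descent.

    ratings is a list of (user, item, rating) tuples. Returns predict(u, i).
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
    bu = {u: 0.0 for u in users}                        # user biases
    bi = {i: 0.0 for i in items}                        # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Regularized gradient steps on biases and latent factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        # Unknown users or items fall back to whichever bias terms are known.
        est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in p and i in q:
            est += sum(a * b for a, b in zip(p[u], q[i]))
        return est

    return predict

# Made-up ratings on a 1-10 scale.
toy = [("u1", "A", 8), ("u1", "B", 6), ("u2", "A", 9), ("u2", "C", 4),
       ("u3", "B", 7), ("u3", "C", 3), ("u4", "A", 10), ("u4", "B", 8)]
predict = train_mf(toy)

mu = sum(r for _, _, r in toy) / len(toy)
baseline_rmse = sqrt(sum((r - mu) ** 2 for _, _, r in toy) / len(toy))
model_rmse = sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
print(f"baseline RMSE: {baseline_rmse:.3f}, model RMSE: {model_rmse:.3f}")
print(f"predicted rating of u1 for C: {predict('u1', 'C'):.2f}")
```

Recommendations for a user would then come from scoring every book the user has not rated and keeping the highest predictions, which mirrors the anti-testset step in the listing.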
Output:
Chapter 9
Conclusion
We have successfully implemented both a memory-based and a model-based collaborative
filtering approach to make book recommendations in this project. In instances with a new user
or a new item, where little is known about rating preferences, collaborative filtering may not be
the method of choice for generating recommendations; content-based filtering methods may
be more appropriate. A book recommendation system is a type of recommendation system
that recommends similar books to readers based on their interests. Such
systems are used by online services that provide e-books, such as Google Play
Books, Open Library, and Goodreads. In industry, a hybrid approach that combines multiple
methods is often used for building real-time recommendations.
References
[1] B. Cui and X. Chen, "An Online Book Recommendation System Based on Web Service,"
2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009, pp.
520-524, doi: 10.1109/FSKD.2009.328.
[3] S. S. Sohail, J. Siddiqui and R. Ali, "Book recommendation system using opinion
mining technique," 2013 International Conference on Advances in Computing,
Communications and Informatics (ICACCI), 2013, pp. 1609-1614, doi:
10.1109/ICACCI.2013.6637421.