Lab manual on recommender system
MANUAL ON
RECOMMENDER SYSTEM
CONTENTS
1. INTRODUCTION
2. WHAT IS A RECOMMENDER SYSTEM?
3. WHY RECOMMENDER SYSTEMS?
4. TYPES OF RECOMMENDER SYSTEM
5. CONTENT BASED FILTERING
6. COLLABORATIVE BASED FILTERING
7. HYBRID BASED FILTERING
8. PHASES OF RECOMMENDER SYSTEM
9. REQUIREMENTS
10. INSTALLATION OF ANACONDA
11. BASICS ON PYTHON
12. BASICS ON NUMPY
13. BASICS ON PANDAS
14. BASICS ON MATPLOTLIB
15. DIFFERENT STEPS TO CONSTRUCT A BASIC RECOMMENDER SYSTEM
16. BENEFITS OF RECOMMENDER SYSTEMS
17. CONCLUSION
18. ASSIGNMENT
INTRODUCTION
An abundant amount of information is created and delivered over
electronic media. Users risk becoming overwhelmed by the flow of
information, and they lack adequate tools to help them manage the
situation. Information filtering (IF) is one of the methods that is rapidly
evolving to manage large information flows. The aim of IF is to expose
users to only information that is relevant to them. Many IF systems have
been developed in recent years for various application domains. Some
examples of filtering applications are: filters for search results on the
internet that are employed in the Internet software, personal e-mail filters
based on personal profiles, list servers or newsgroups filters for groups
or individuals, browser filters that block non-valuable information, filters
designed to give children access only to suitable pages, filters for e-commerce
applications that address products and promotions only to potential
customers, and many more. Information filtering deals with delivering the
information that the user is likely to like or find useful. An information
filtering system assists the user by providing the relevant information
from the data source.
In the past, people used to shop in a physical store, in which the items
available are limited. For instance, the number of movies that can be
placed in a Blockbuster store depends on the size of that store. By
contrast, nowadays, the Internet allows people to access abundant
resources online. On the Internet, where the number of choices is
overwhelming, there is a need to filter, prioritize and efficiently deliver
relevant information in order to alleviate the problem of information
overload, which has become a real problem for many Internet users.
Recommender systems solve this problem by searching through a large
volume of dynamically generated information to provide users with
personalized content and services. Netflix, for example, has an enormous
collection of movies. As the amount of available information
increased, a new problem arose: people had a hard time selecting the
items they actually wanted to see. This is where the recommender system
comes in.
History
Before the Internet, there were already several methods of filtering
information; for instance, governments may control and restrict the flow
of information in a given country by means of formal or informal
censorship. Newspaper editors and journalists also act as information
filters when they provide a service that selects the most valuable
information for their clients: readers of books, magazines and
newspapers, radio listeners and television viewers. This filtering
operation is also present in schools and universities, where information
is selected on academic criteria to assist the customers of this
service, the students. With the advent of the Internet, anyone can
publish anything he wishes at a low cost. This considerably increases
the amount of less useful information, and consequently quality
information becomes scattered among it. To deal with this problem, new
filtering methods were devised to obtain the information required for
each specific topic easily and efficiently.
What are recommender systems?
The sudden explosion in the amount of digital information and the
number of Internet users has created a potential challenge of
information overload, which hinders timely access to items of interest.
Information retrieval systems, such as Google, DevilFinder and AltaVista,
have partially solved this problem, but prioritization and personalization
of information were absent. This has increased the demand for
recommender systems.
Recommender systems aim to predict users’ interests and recommend
items that are likely to interest them. The data required for
recommender systems comes from explicit user ratings after watching a
movie or listening to a song, from implicit search engine queries and
purchase histories, or from other knowledge about the users/items
themselves.
A recommender system is defined as a decision-making strategy for users
in complex information environments. From the perspective of
e-commerce, it is a tool that helps users search through records of
knowledge related to their interests and preferences. It can also be
defined as a means of assisting and augmenting the social process of using
the recommendations of others to make choices when there is not sufficient
personal knowledge or experience of the alternatives. Recommender systems
handle the problem of information overload that users normally encounter
by providing them with personalized, exclusive content and service
recommendations.
AREAS WHERE RECOMMENDER SYSTEMS ARE USED
Fig-1 (Searches, Recommender system, Loyalty programs)
Why do we need recommender
systems?
1. Companies use recommender systems to increase sales through highly
personalized offers and an enhanced customer experience.
2. Recommendations typically speed up searches and make it easier for
users to access content they’re interested in.
3. The user starts to feel known and understood and is more likely to buy
additional products or consume more content. By knowing what a user
wants, the company gains a competitive advantage, and the threat of losing
a customer to a competitor decreases.
4. Recommender systems are information filtering systems that deal with
the problem of information overload.
5. They can predict whether a particular user would prefer an item or
not, based on the user’s profile.
6. Recommender systems are beneficial to both service providers and
users. They reduce the transaction costs of finding and selecting items in
an online shopping environment.
7. Recommender systems have also been shown to improve the decision-making
process and its quality.
8. In an e-commerce setting, recommender systems enhance revenue because
they are an effective means of selling more products.
9. In scientific libraries, recommender systems support users by allowing
them to move beyond catalog searches.
TYPES OF RECOMMENDER
SYSTEM
Fig-2
interests. There are other systems, not considered purely content-based,
which utilize user personal and social data.
The content-based technique is a domain-dependent algorithm that focuses
on analyzing the attributes of items in order to generate predictions.
When documents such as web pages, publications and news are to be
recommended, the content-based filtering technique is the most successful.
In the content-based filtering technique, recommendations are made from
the user profile, using features extracted from the content of the items
the user has evaluated in the past.
To generate meaningful recommendations, we use vector space models such
as Term Frequency-Inverse Document Frequency (TF-IDF), or models such as
the (probabilistic) Naïve Bayes classifier, decision trees or neural
networks, to model the relationship between different documents within a
corpus. These techniques make recommendations by learning the underlying
model with either statistical analysis or machine learning techniques.
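To make the vector space idea concrete, here is a minimal pure-Python sketch of TF-IDF weighting combined with cosine similarity, used to rank items by how close their descriptions are to an item the user liked. The catalogue, the helper names and the tf × log(N/df) weighting variant are illustrative assumptions; in practice a library class such as scikit-learn's TfidfVectorizer would usually be used instead.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (c / len(tokens)) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_similar(titles, descriptions, liked_index, top_n=2):
    """Rank the other items by content similarity to the item the user liked."""
    vectors = tfidf_vectors(descriptions)
    scored = [(cosine(vectors[liked_index], vec), titles[i])
              for i, vec in enumerate(vectors) if i != liked_index]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]


# Made-up catalogue used only for illustration.
titles = ["Space Quest", "Galaxy Wars", "Love in Paris", "Star Pirates"]
descriptions = [
    "space adventure with aliens and starships",
    "aliens attack in a space war among starships",
    "romantic comedy set in paris",
    "pirates travel through space on starships",
]
print(recommend_similar(titles, descriptions, liked_index=0))
# ['Galaxy Wars', 'Star Pirates']
```

Rare shared words such as "aliens" weigh more than words found in most descriptions, which is why "Galaxy Wars" ranks above "Star Pirates".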
A common problem is that new users lack a defined profile unless they
are explicitly asked for information. Nevertheless, it is relatively simple
to add new items to the system: we just need to assign them a group
according to their features. Content-based systems can recommend new
items even if no ratings have been provided by users, so even if the
database does not contain other users’ preferences, recommendation
accuracy is not affected. Also, if the user’s preferences change, the
system can adjust its recommendations in a short span of time. It can
handle situations where different users have not rated the same items,
only items that are similar in their intrinsic features. Users can get
recommendations without sharing their profile, which helps preserve privacy.
Content-based filtering techniques depend on item metadata. That is,
they require a rich description of items and a well-organized user
profile before recommendations can be made to users. This problem is
called limited content analysis. So the effectiveness of content-based
filtering (CBF) depends on the availability of descriptive data.
Collaborative filtering:-
Collaborative filtering technique is the most mature and the most
commonly implemented. Collaborative filtering recommends items by
identifying other users with similar taste; it uses their opinion to
recommend items to the active user. Collaborative recommender systems
have been implemented in different application areas. GroupLens is a
news-based architecture that employed collaborative methods to assist
users in locating articles in a massive news database.
Collaborative filtering is currently one of the most frequently used
approaches and usually provides better results than content-based
recommendations. Some examples of this are found in the
recommendation systems of YouTube, Netflix, and Spotify.
Collaborative filtering is a domain-independent prediction technique for
content that cannot easily and adequately be described by metadata such
as movies and music. Collaborative filtering technique works by building
a database (user-item matrix) of preferences for items by users. It then
matches users with relevant interest and preferences by calculating
similarities between their profiles to make recommendations. Such users
build a group called a neighborhood. A user gets recommendations for
items that he has not rated before but that were already positively rated
by users in his neighborhood.
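A minimal sketch of this user-based neighborhood method. The text does not say which similarity measure is used, so Pearson correlation over co-rated items is an assumption here, as are the rule of keeping only positively correlated neighbors, the similarity-weighted average used for prediction and the made-up ratings.

```python
import math

# Made-up user-item ratings (1-5); a missing key means "not rated".
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob":   {"Movie A": 5, "Movie B": 5, "Movie C": 2, "Movie D": 4, "Movie E": 1},
    "carol": {"Movie A": 1, "Movie B": 2, "Movie C": 5, "Movie D": 1, "Movie E": 5},
}


def pearson(u, v):
    """Pearson correlation between two users over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    du = [ratings[u][i] - mu for i in common]
    dv = [ratings[v][i] - mv for i in common]
    den = math.sqrt(sum(x * x for x in du)) * math.sqrt(sum(y * y for y in dv))
    return sum(x * y for x, y in zip(du, dv)) / den if den else 0.0


def predict(user, item, k=2):
    """Similarity-weighted average rating from the k most similar users (the neighborhood)."""
    candidates = [(pearson(user, v), v) for v in ratings
                  if v != user and item in ratings[v]]
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    den = sum(s for s, _ in neighbors)
    if den == 0:
        return None
    return sum(s * ratings[v][item] for s, v in neighbors) / den


def recommend(user, k=2):
    """Predict a rating for every item the user has not rated yet, best first."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = [(predict(user, i, k), i) for i in unseen]
    return sorted(((p, i) for p, i in scored if p is not None), reverse=True)


print(recommend("alice"))  # [(4.0, 'Movie D'), (1.0, 'Movie E')]
```

Bob rates the shared movies like Alice does, while Carol's tastes are the opposite (correlation -1), so only Bob ends up in Alice's neighborhood.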
Item-based collaborative filtering overcomes the scalability issue by
generating a table of similar items offline, using an item-to-item matrix.
The system then recommends similar products online according to the
user’s purchase history.
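The offline/online split can be sketched as follows, assuming binary purchase data and cosine similarity between items' buyer sets. The manual names neither, and the purchase histories and helper names are hypothetical. The similarity table is built once offline; at request time only a cheap lookup over the user's history is needed.

```python
import math
from itertools import combinations

# Hypothetical purchase histories (implicit, binary data).
purchases = {
    "u1": {"laptop", "mouse", "keyboard"},
    "u2": {"laptop", "mouse"},
    "u3": {"phone", "case"},
    "u4": {"phone", "case", "charger"},
    "u5": {"laptop", "keyboard"},
}


def build_item_table(purchases):
    """Offline step: cosine similarity between every pair of co-purchased items."""
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)
    table = {item: {} for item in buyers}
    for a, b in combinations(sorted(buyers), 2):
        overlap = len(buyers[a] & buyers[b])
        if overlap:
            sim = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
            table[a][b] = table[b][a] = sim
    return table


def recommend(history, table, top_n=2):
    """Online step: add up the similarities of items related to what was bought."""
    scores = {}
    for item in history:
        for other, sim in table.get(item, {}).items():
            if other not in history:
                scores[other] = scores.get(other, 0.0) + sim
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


table = build_item_table(purchases)
print(recommend({"laptop"}, table))           # ['keyboard', 'mouse']
print(recommend({"laptop", "mouse"}, table))  # ['keyboard']
```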
There are two types of methods to achieve this goal: memory-based and
model-based.
Memory-based:-
There are two approaches: the first one identifies clusters of users and
utilizes the interactions of one specific user to predict the interactions of
other similar users. The second approach identifies clusters of items that
have been rated by user A and utilizes them to predict the interaction of
user A with a different but similar item B. These methods usually
encounter major problems with large sparse matrices, since the number
of user-item interactions can be too low for generating high quality
clusters.
Model-based:-
These methods are based on machine learning and data mining
techniques. The goal is to train models to be able to make predictions.
For example, we could use existing user-item interactions to train a model
to predict the top-5 items that a user might like the most. One advantage
of these methods is that they are able to recommend a larger number of
items to a larger number of users than memory-based methods can.
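The text does not name a specific model, so here is one widely used model-based technique as an illustration: matrix factorization trained with stochastic gradient descent. Each user and item gets a short vector of learned factors, and their dot product approximates the rating. The toy interactions, the two latent factors, the learning rate and the regularization strength are all assumptions.

```python
import math
import random

# Hypothetical (user, item, rating) interactions; unlisted pairs are unknown.
data = [(0, 0, 5), (0, 1, 3), (0, 3, 1),
        (1, 0, 4), (1, 3, 1),
        (2, 0, 1), (2, 1, 1), (2, 3, 5),
        (3, 0, 1), (3, 2, 5), (3, 3, 4)]
N_USERS, N_ITEMS, N_FACTORS = 4, 4, 2


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train(data, epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates the rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(N_FACTORS)] for _ in range(N_USERS)]
    Q = [[rng.gauss(0, 0.1) for _ in range(N_FACTORS)] for _ in range(N_ITEMS)]
    for _ in range(epochs):
        for u, i, r in data:
            err = r - dot(P[u], Q[i])
            for f in range(N_FACTORS):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, data):
    """Root-mean-square error of the model on the given interactions."""
    return math.sqrt(sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in data) / len(data))


def top_n(P, Q, user, seen, n=2):
    """Score every unseen item with the learned model and keep the n best."""
    scores = {i: dot(P[user], Q[i]) for i in range(N_ITEMS) if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]


P, Q = train(data)
print(round(rmse(P, Q, data), 3))
print(top_n(P, Q, user=1, seen={0, 3}))
```

Once trained, the model can score any user-item pair directly, which is why model-based methods scale to more users and items than memory-based ones.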
Issues with collaborative filtering systems include:-
Cold-start problem
Hybrid Filtering:-
The hybrid filtering technique combines different recommendation
techniques in order to achieve better performance and to avoid some
limitations and problems of pure recommendation systems. The idea
behind hybrid techniques is that a combination of algorithms will provide
more accurate and effective recommendations than a single algorithm as
the disadvantages of one algorithm can be overcome by another
algorithm. The combination of approaches can be done in any of the
following ways: separate implementation of algorithms and combining
the result, utilizing some content-based filtering in collaborative
approach, utilizing some collaborative filtering in content-based
approach, creating a unified recommendation system that brings together
both approaches.
Different types of hybrid filtering are:
Weighted hybridization
Weighted hybridization combines the results of different recommenders
to generate a recommendation list or prediction by integrating the scores
from each of the techniques in use by a linear formula. The techniques
are given equal weights at first, and the weights are adjusted as their
predictions are confirmed or refuted. The benefit of a weighted hybrid is that all the
recommender system’s strengths are utilized during the recommendation
process in a straightforward way.
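Here is a minimal sketch of a weighted hybrid that combines two recommenders' per-item scores with a linear formula, starting from equal weights. The scores are made up. The weight-update rule (multiply each weight by exp(-error) and renormalize) is a hypothetical example of "adjusting weights as predictions are confirmed"; the manual does not specify one.

```python
import math


def weighted_hybrid(score_lists, weights):
    """Combine per-item scores from several recommenders with a linear formula."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda item: (-combined[item], item))


def update_weights(weights, errors, rate=1.0):
    """Hypothetical rule: shrink the weight of recommenders whose predictions erred more."""
    raw = [w * math.exp(-rate * e) for w, e in zip(weights, errors)]
    total = sum(raw)
    return [r / total for r in raw]


# Made-up scores from a content-based and a collaborative recommender.
content_scores = {"Movie A": 0.9, "Movie B": 0.3, "Movie C": 0.6}
collab_scores = {"Movie A": 0.2, "Movie B": 0.7, "Movie C": 0.7}

weights = [0.5, 0.5]  # equal weights at first
print(weighted_hybrid([content_scores, collab_scores], weights))
# ['Movie C', 'Movie A', 'Movie B']

weights = update_weights(weights, errors=[0.8, 0.1])  # collaborative was more accurate
print(weighted_hybrid([content_scores, collab_scores], weights))
# ['Movie C', 'Movie B', 'Movie A']
```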
Switching hybridization
The system switches to one of the recommendation techniques according to
a heuristic reflecting each recommender’s ability to produce a good rating.
The switching hybrid has the ability to avoid problems specific to one
method, e.g. the new user problem of a content-based recommender, by
switching to a collaborative recommendation system.
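A minimal sketch of the switching idea follows. Both recommenders here are hypothetical placeholders that return fixed lists, and the confidence heuristic (profile size, capped at 1) and the 0.5 threshold are assumptions. For a user with an empty profile, the content-based technique is not trusted, so the system switches to the collaborative one.

```python
# Hypothetical user profiles: the items each user has already rated.
profiles = {"ann": ["Movie A", "Movie C"], "new_user": []}


def content_based(user):
    """Placeholder content-based recommender (hypothetical fixed output)."""
    return ["Movie D"]


def collaborative(user):
    """Placeholder collaborative recommender (hypothetical fixed output)."""
    return ["Movie B", "Movie E"]


def content_confidence(user):
    # Heuristic: confidence grows with the size of the user's profile, capped at 1.
    return min(1.0, len(profiles[user]) / 2)


def collaborative_confidence(user):
    return 1.0


def switching_hybrid(user, recommenders, threshold=0.5):
    """Use the first recommender whose confidence heuristic reaches the threshold."""
    for recommend, confidence in recommenders:
        if confidence(user) >= threshold:
            return recommend(user)
    return recommenders[-1][0](user)  # fall back to the last technique


pipeline = [(content_based, content_confidence),
            (collaborative, collaborative_confidence)]
print(switching_hybrid("ann", pipeline))       # ['Movie D']
print(switching_hybrid("new_user", pipeline))  # ['Movie B', 'Movie E']
```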
Cascade hybridization
Feature-augmentation
The technique makes use of the ratings and other information produced
by the previous recommender, and it also requires additional functionality
from the recommender systems. Feature-augmentation hybrids are
superior to feature-combination methods in that they add a small number
of features to the primary recommender.
Meta-level
The internal model generated by one recommendation technique is used
as input for another. The model generated is always richer in
information when compared to a single rating. Meta-level hybrids are
able to solve the sparsity problem of collaborative filtering techniques
by using the entire model learned by the first technique as input for the
second technique.
includes cognitive skills, intellectual abilities, learning styles, interest,
preferences and interaction with the system. The user profile is normally
used to retrieve the needed information to build up a model of the user.
Explicit Feedback:-
Although explicit feedback requires more effort from the user, it is still
seen as providing more reliable data, since it does not involve extracting
preferences from actions. It also provides transparency into the
recommendation process, which results in slightly higher perceived
recommendation quality and more confidence in the recommendations.
Implicit feedback:-
Implicit feedback reduces the burden on users by inferring their
preferences from their behavior with the system. Although the method
requires no effort from the user, it is less accurate.
Learning phase
It applies a learning algorithm to filter and exploit the user’s features from
the feedback gathered in the information collection phase.
Prediction/recommendation phase
It recommends or predicts what kind of items the user may prefer. This
can be done either directly from the dataset collected in the information
collection phase, using a memory-based or model-based approach, or
through the system’s observed activities of the user.
What Are The Prerequisites For
Building A Recommender
System?
Data is the single most important asset. Essentially, you need to know
some details about your users and items. If metadata is all you have
available, you can start with content-based approaches. If you have a
large number of user interactions, you can experiment with more
powerful collaborative filtering. The larger the data set in your
possession, the better your systems will work.
What is metadata?
Metadata is "data that provides information about other data". In other
words, it is "data about data." Many distinct types of metadata exist,
including descriptive metadata, structural metadata, administrative
metadata, reference metadata and statistical metadata. Descriptive
metadata is descriptive information about a resource. It is used for
discovery and identification. It includes elements such as title, abstract,
author, and keywords.
Structural metadata is metadata about containers of data and
indicates how compound objects are put together, for example,
how pages are ordered to form chapters. It describes the types,
versions, relationships and other characteristics of digital materials.
Administrative metadata is information to help manage a resource,
like resource type, permissions, and when and how it was created.
Reference metadata is information about the contents and quality
of statistical data.
Statistical metadata, also called process data, may describe
processes that collect, process, or produce statistical data.
STEP 1: - INSTALLATION OF ANACONDA
1. To download the installer, open https://www.anaconda.com/distribution/
We get the following: -
Fig-3
a. Download the installer that matches your operating system (Windows
64-bit or 32-bit). Choose the Python 3 version rather than Python 2;
Python 3 is recommended.
Fig-4
b. Clicking Download saves an .exe file. Double-click the .exe file to
start the installer; we get the following: -
Fig-5
Fig-6 (click Next)
Fig-7
d. Clicking Next, we get the following: -
Fig-8
Note: - Installation may take some time.
Fig-9
Then click Next, Next again, and Finish.
e. Then go to the Windows Start menu and click Anaconda Prompt.
Fig-10
f. Then, at the Anaconda Prompt, do the following: -
g. Type the command > python and press Enter.
Fig-11
h. Type >>> import this and press Enter. This prints "The Zen of Python",
which confirms that the Python interpreter is working.
Fig-12
Then type exit() to leave the Python interpreter.
Step 2: - Installing Jupyter On Windows
1. Installing Jupyter on Windows using the Anaconda Prompt
a. To install Jupyter on Windows, open the Anaconda Prompt and type:
>conda install jupyter
Type ‘y’ for yes when prompted. Once Jupyter is installed, type the
command jupyter notebook into the Anaconda Prompt to open the Jupyter
Notebook file browser and start using Jupyter notebooks.
Fig-13
If Jupyter is already installed (Anaconda normally includes it), you can
skip the installation and type jupyter notebook directly in the Anaconda
Prompt.
b. Next we get the following: -
Fig-14
c. Now we get the following home page: -
Fig-15
d. Click “NEW” and we get the following: -
Fig-16
e. After clicking New, click Python 3 as shown:
Fig-17
f. Then we get the following: -
Fig-18
g. As shown in the following: -
Fig-19
h. To run the program, click the Run button (or press Shift+Enter): -
Fig-20
2. Steps to rename a Jupyter notebook: -
First go to File in Jupyter Notebook as shown: -
1) Click File and we get the following menu.
Fig-21
2) Click Rename.
Fig-22
3) We get the following dialog: -
Fig-23
Now type the new name for the notebook and save it.
Basic On Python
Introduction on PYTHON: -
Python is one of the most popular programming languages. It was created
by Guido van Rossum.
It is a general-purpose, interpreted, interactive, object-oriented and
high-level programming language.
Python is used by many large technology companies, such as Google,
Amazon, Facebook, Instagram, Dropbox and Uber.
Characteristics of Python
Following are important characteristics of Python
Programming −
It supports functional and structured programming methods as well
as OOP.
It can be used as a scripting language or can be compiled to byte-code
… and Java.
Why Python?
Python is one of the easiest languages to learn, being readable and
understandable.
Code written in Python is close to the English language.
The language imposes few syntactic restrictions, so it is highly popular
among developers.
Basic:
a. For initializing variables in python: -
i. <variable_name> = <value>
Fig-24
The program for taking input from the user and printing “hello”:
Fig - 25
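The program itself appears only as a figure, so here is a minimal sketch of variable initialization and of a greeting program. The variable names and the greet helper are hypothetical. The interactive input() version is left as a comment so the code can run without a keyboard.

```python
# Initializing variables: <variable_name> = <value>
movie_title = "Avatar"   # str
rating = 4.5             # float
is_watched = True        # bool


def greet(name):
    """Return the greeting that the interactive program prints."""
    return "hello " + name


# Interactive version (reads from the keyboard):
# name = input("Enter your name: ")
# print(greet(name))

print(movie_title, rating, is_watched)  # Avatar 4.5 True
print(greet("Asha"))                    # hello Asha
```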
b. Block indentation: -
In Python, indentation is used to define the body of functions, loops and
other control structures, so the user has to pay attention to whitespace.
Other languages mark the start of a block with curly braces “{}”; Python
ends the header line with a colon “:” and marks the block itself by
indentation.
Examples:
def symbol(a):      # user-defined function; its indented lines form the body
    a = a + 1
    return a

print(symbol(1))    # prints 2; text after "#" is a comment
Table -1
d. Decision making
i. if conditional statement: -
Fig - 26
ii. nested if: -
Fig -27
iii. if elif else ladder
Fig – 28
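The three decision-making forms are shown only as figures. Here is a small sketch that uses all of them in one function; the rating thresholds and labels are made up for illustration.

```python
def rating_label(rating):
    """Turn a 1-5 rating into a label using if / elif / else and a nested if."""
    if rating >= 4:              # if ... elif ... else ladder: first true branch wins
        label = "liked"
        if rating == 5:          # nested if: only checked when the outer test is true
            label += " (top rated)"
    elif rating >= 2.5:
        label = "neutral"
    else:
        label = "disliked"
    return label


for r in (5, 4, 3, 1):
    print(r, rating_label(r))
# 5 liked (top rated)
# 4 liked
# 3 neutral
# 1 disliked
```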
e. Loops in Python
1. while loop:
Fig -29
2. for loop:
for iterator_var in sequence:
    statement(s)
Fig -30
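A short sketch contrasting the two loop types on a made-up list of ratings. The while loop sums the list by index, and the for loop counts the high ratings.

```python
ratings = [5, 3, 4, 1]

# while loop: repeats as long as its condition is true.
i = 0
total = 0
while i < len(ratings):
    total += ratings[i]
    i += 1

# for loop: takes each element of the sequence in turn.
count_high = 0
for r in ratings:
    if r >= 4:
        count_high += 1

print(total, count_high)  # 13 2
```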
Functions in python: -
A function is an organized, reusable block of code that performs a
specific task.
Syntax: -
def functionname(parameters):
    """Docstring: describes the task the function performs."""
    return [expression]
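Here is the syntax above filled in with a small, hypothetical helper of the kind used later in recommender code. It computes an average rating and handles the empty case explicitly.

```python
def average_rating(ratings):
    """Return the mean of a list of ratings, or 0.0 if the list is empty."""
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)


print(average_rating([5, 3, 4]))  # 4.0
print(average_rating([]))         # 0.0
```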
Python NumPy
NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object, and tools for working
with these arrays. It is the fundamental package for scientific computing
with Python. In Python, lists serve the purpose of arrays, but they are
slow to process. NumPy aims to provide an array object that is up to 50x
faster than traditional Python lists. Unlike lists, NumPy arrays are
stored in one continuous place in memory, so processes can access and
manipulate them very efficiently. This behavior is called locality of
reference in computer science, and it is the main reason why NumPy is
much faster than lists. NumPy is also optimized to work with the latest
CPU architectures. Some Python distributions, such as Anaconda, already
have NumPy installed.
Arrays in NumPy
An array in NumPy is a table of elements (usually numbers), all of the
same type, indexed by a tuple of non-negative integers. In NumPy, the
number of dimensions of the array is called the rank of the array. A tuple
of integers giving the size of the array along each dimension is known as
the shape of the array. The array class in NumPy is called ndarray. Arrays
in NumPy can be created in multiple ways, with various ranks that define
the size of the array. Arrays can also be created from other Python data
types such as lists and tuples.
Import NumPy
Fig 31
Simple Array program :-
Fig- 32
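The array program is shown only as a figure. Here is a minimal sketch of the ideas from the paragraph on arrays (ndarray, rank, shape, tuple indexing, creation from lists and tuples), using a small made-up user × movie rating table. Treating 0 as "not rated" is an assumption of this example.

```python
import numpy as np

# Rank-2 ndarray: 3 users x 4 movies (made-up ratings, 0 = not rated).
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5]])
from_tuple = np.array((2.5, 3.5))  # arrays can also be built from tuples

print(type(ratings))  # <class 'numpy.ndarray'>
print(ratings.ndim)   # 2 -> the rank
print(ratings.shape)  # (3, 4) -> the shape
print(ratings[0, 3])  # 1 -> indexed by a tuple of integers

# Mean rating per movie, ignoring the zeros (unrated cells).
rated = ratings > 0
counts = np.maximum(rated.sum(axis=0), 1)  # avoid dividing by zero for unrated movies
mean_per_movie = ratings.sum(axis=0) / counts
print(mean_per_movie)
```

The whole-column operations (sum(axis=0), the comparison ratings > 0) run inside NumPy without a Python loop, which is where the speed advantage over lists comes from.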
Pandas Tutorial
Pandas is an open-source library that is built on top of the NumPy library
and is made mainly for working with relational or labeled data both
easily and intuitively. It is a Python package that offers various data
structures and operations for manipulating numerical data and time series.
It is popular because it makes importing and analyzing data much easier.
It is a high-level data manipulation tool developed by Wes McKinney. Its
key data structure is called the DataFrame. DataFrames allow you to store
and manipulate tabular data in rows of observations and columns of
variables.
We import pandas in Anaconda as:-
Fig-33
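A minimal sketch of a DataFrame holding movie data, with rows as observations and columns as variables, plus two everyday operations: boolean filtering and sorting. The titles and numbers are made up. In the lab, the table would instead be read from a file with pd.read_csv.

```python
import pandas as pd

# Made-up movie table: rows are observations, columns are variables.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "vote_average": [7.2, 8.1, 6.5],
    "vote_count": [1200, 300, 5000],
})
print(movies.head())
print(movies.shape)  # (3, 3)

# Keep movies with more than 1000 votes, best-rated first.
popular = movies[movies["vote_count"] > 1000]
print(popular.sort_values("vote_average", ascending=False)["title"].tolist())
# ['Movie A', 'Movie C']
```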
Matplotlib
Matplotlib is a plotting library for the Python programming language
and its numerical mathematics extension NumPy. It provides an object-
oriented API for embedding plots into applications using general-purpose
GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a
procedural "pylab" interface based on a state machine (like OpenGL),
designed to closely resemble that of MATLAB, though its use is
discouraged.
matplotlib.pyplot is a collection of functions that make matplotlib work
like MATLAB. Each pyplot function makes some change to a figure: e.g.,
creates a figure, creates a plotting area in a figure, plots some lines in a
plotting area, decorates the plot with labels, etc. Matplotlib was originally
written by John D. Hunter.
Example of matplotlib:-
Fig-33
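The example itself is only a screenshot. Here is a minimal sketch that follows the pyplot steps described above: create a figure, create a plotting area, plot, and decorate with labels. The data is made up. The Agg backend line is an assumption so the code also runs outside a notebook; inside Jupyter it can be dropped and the figure shown directly.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

titles = ["Movie A", "Movie B", "Movie C"]  # made-up data
scores = [7.2, 8.1, 6.5]

fig, ax = plt.subplots()         # create a figure and a plotting area
bars = ax.barh(titles, scores)   # horizontal bar chart
ax.set_xlabel("Average rating")  # decorate the plot with labels
ax.set_title("Average rating per movie")
fig.tight_layout()
# plt.show()                     # in Jupyter, display the figure
```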
BASIC CONSTRUCTION OF
RECOMMENDER SYSTEM
Here we create a basic hybrid filtering recommender system using Python
and its libraries. We collect the dataset and normalize the data.
Fig 34
Now we can see that the dataset has been downloaded in CSV format.
Fig- 35
Fig-36
Now type the following command and press Enter to open Jupyter Notebook:
jupyter notebook
Fig-37
Now we can see that the notebook has started:-
Fig-38
Step 3:- Start a Python program.
1. Click on “NEW”.
Fig-39
2. After clicking New, click Python 3 as shown:
Fig-40
3. Then we get the following:-
Fig-41
Fig- 42
Step 5: Output of Movie Dataset and Credit Dataset.
Fig-44(movie dataset)
Fig- 45
Fig-46(input)
Fig 47(output)
Step 8: We Will Remove Unwanted Data From The
Merged Table.
Fig- 48
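Steps 5 and 8 are shown only as screenshots, so here is a minimal pandas sketch of what they describe: joining the movie table with the credit table, then dropping columns the recommender does not need. The column names (id, movie_id, title, homepage, cast and so on) and the values are hypothetical stand-ins, because the text does not list the dataset's columns. In the lab, both tables would come from pd.read_csv on the downloaded CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for the movie and credit CSV files.
movies = pd.DataFrame({
    "id": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
    "homepage": ["http://a.example", None, None],
    "vote_average": [7.2, 8.1, 6.5],
    "vote_count": [1200, 300, 5000],
    "popularity": [45.0, 12.5, 80.0],
})
credits = pd.DataFrame({
    "movie_id": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
    "cast": ["Actor X", "Actor Y", "Actor Z"],
})

# Step 5: join the two tables on the movie id (and the shared title column).
credits = credits.rename(columns={"movie_id": "id"})
merged = movies.merge(credits, on=["id", "title"])

# Step 8: remove unwanted columns from the merged table.
merged = merged.drop(columns=["homepage", "cast"])
print(merged.columns.tolist())
# ['id', 'title', 'vote_average', 'vote_count', 'popularity']
```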
Fig -49
Fig-50 (formula)
Fig -51
Step 12: Calculate Weight.
Fig -52
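The formula in Fig-50 is not reproduced in the text, so the one in this sketch is an assumption. It is the IMDB-style weighted rating often used for this step in movie-recommender tutorials:

weight = v/(v+m)·R + m/(v+m)·C

Here v is a movie's vote count, R is its mean rating, C is the mean rating across all movies and m is a minimum-votes constant. In this sketch m is taken to be the median vote count; a higher quantile is also common. The data is made up.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "vote_average": [7.2, 9.0, 6.5, 8.0],  # R: the movie's own mean rating
    "vote_count": [1200, 10, 5000, 2500],  # v: number of votes
})

C = movies["vote_average"].mean()       # mean rating across all movies
m = movies["vote_count"].quantile(0.5)  # minimum-votes constant (assumed: the median)

v = movies["vote_count"]
R = movies["vote_average"]
movies["weight"] = v / (v + m) * R + m / (v + m) * C

print(movies.sort_values("weight", ascending=False)[["title", "weight"]])
```

With only 10 votes, Movie B's 9.0 average is pulled almost all the way back to the overall mean (about 7.68), so Movie D, which has many votes, ends up ranked first.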
Fig -53
Fig- 54
Fig -55
Graph -1
Step 15: Using Sklearn's MinMaxScaler We Will Normalize the Data to
Reduce the Gap Between Values.
Fig- 56
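A minimal sketch of this normalization step, assuming scikit-learn is installed (it ships with Anaconda). The column names and values are hypothetical. MinMaxScaler maps each column to the range [0, 1] using (x - min) / (max - min), so a score in the 6 to 8 range and a popularity in the tens to hundreds end up on the same scale.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "weight": [7.49, 7.68, 6.82, 7.86],       # small range
    "popularity": [45.0, 12.5, 80.0, 150.0],  # much larger range
})

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(movies[["weight", "popularity"]])
movies["weight_norm"] = scaled[:, 0]
movies["popularity_norm"] = scaled[:, 1]
print(movies)
```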
Benefits of Recommender Systems
The benefits of recommender systems are:
1. Revenue: over the past years, many researchers have studied this area
and developed many algorithms to increase sales to online customers on
sites such as Amazon. These studies compare online shopping sites that use
recommender systems for items with those that do not; recommender systems
increase revenue by increasing the number of sales.
2. Client satisfaction: customers often expect to see product
recommendations similar to their last browsing session on the site, mainly
because they believe this gives them a better chance of finding better
products. When they leave the site and come back afterward, it helps if
their browsing data from the previous shopping session or viewed product
list is still available. This further facilitates and guides their
e-commerce activities, much like an experienced shop assistant. Client
satisfaction of this kind contributes to client retention.
3. Personalization: we often get recommendations from our friends. They
recognize what we like better than anyone else, which is why they are
adept at recommending things, and this is what recommendation systems try
to model. You can use the data collected indirectly to improve your
website’s overall services and ensure that they suit each user’s
preferences.
4. Discovery: people want to be recommended items they would like, and
when a shopping, movie or music site meets their expectations, they are
bound to visit it again.
5. Provide reports: reporting is an integral piece of a personalization
scheme. By giving the client accurate, up-to-the-minute information,
reporting allows him to make sound decisions about his site and the
management of a campaign. Based on these reports, clients can make offers
on slow-moving products in order to drive sales.
CONCLUSION
Recommender systems have been a hot topic for a long time. They are
algorithms that aim to provide the most relevant and accurate items to the
user by filtering useful items out of a huge pool of information.
Recommender engines discover patterns in the data set by learning
consumers’ choices and produce outcomes that correlate with their needs
and interests.
Assignment Questions
1. What is data science?
2. What is data mining?
3. Define machine learning.
4. Define information filtering.
5. What are the types of information filtering?
6. Define recommender system.
7. Why do we need recommender systems?
8. What are the benefits of recommender systems?
9. What is the difference between data mining and data filtering?
10. Give the differences between data science and machine learning.
11. What are the problems faced by recommender systems?
12. What are the different ways of creating a recommender system?
13. Which algorithms are used in creating recommender systems?
14. Give a summary of a few of these algorithms.
15. What are the security problems faced by recommender systems?
16. State the different types of learning.
17. State the different types of feedback.
18. Why is matplotlib used?
19. Create a graphical representation showing annual growth in
blockbuster movies using Python.
20. Create a collaborative filtering model in Python using any dataset.
21. Using any language of your choice, create a basic recommender system.