Unit IV Recommender System

The document discusses hierarchical clustering and its steps. It then discusses recommender systems and the algorithms used for building them including association rules, collaborative filtering and matrix factorization. It provides examples of datasets and demonstrates how to generate association rules from transactional data using the apriori algorithm.

Uploaded by

Suja Mary

Unit-IV

HIERARCHICAL CLUSTERING

Hierarchical clustering is a clustering algorithm which uses the following steps to develop clusters:

1. Start with each data point in a single cluster.

2. Find the data points with the shortest distance (using an appropriate distance measure) and merge
them to form a cluster.

3. Repeat step 2 until all data points are merged together to form a single cluster.
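
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), and the function name agglomerate and the sample points are hypothetical, not taken from the notes.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points, n_clusters=1):
    """Repeatedly merge the two closest clusters (single linkage)."""
    # Step 1: every data point starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # history of (cluster_a, cluster_b, distance)

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Step 2: find the pair of clusters with the shortest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        # Step 3: replace the pair with the merged cluster and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical 2-D sample points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0), (9.0, 1.0)]
clusters, merges = agglomerate(points, n_clusters=3)
```

Leaving n_clusters at 1 runs the procedure to the end, so all points finish in a single cluster; the recorded merge distances are the information a dendrogram draws.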

• The above procedure is called agglomerative hierarchical clustering.

• AgglomerativeClustering in sklearn.cluster provides an algorithm for hierarchical clustering and
takes the number of clusters to be created as an argument.
• Agglomerative hierarchical clustering can be represented and understood by using a
dendrogram.
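from sklearn.cluster import AgglomerativeClustering
# Build three clusters from the scaled feature DataFrame
h_clusters = AgglomerativeClustering(n_clusters=3)
h_clusters.fit(scaled_beer_df)
# Store each row's cluster label alongside the original data
beer_df["h_clusterid"] = h_clusters.labels_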
Compare the Clusters Created by K-Means and Hierarchical Clustering
Print each cluster separately to interpret its characteristics. For example, the rows assigned to cluster 0:
beer_df[beer_df.h_clusterid == 0]

Recommender Systems
Recommender systems are a set of algorithms that recommend the most relevant items to
users based on their predicted preferences.
The following three algorithms are widely used for building recommender systems:
1. Association Rules
2. Collaborative Filtering
3. Matrix Factorization
Datasets
The following two publicly available datasets are used to build recommendations.
1. groceries.csv: This dataset contains transactions of a grocery store and can be downloaded from
http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml13/groceries.csv.
2. MovieLens: This dataset contains 20,000,263 ratings and 465,564 tag applications across 27,278 movies.
The dataset can be downloaded from https://grouplens.org/datasets/movielens

ASSOCIATION RULES (ASSOCIATION RULE MINING)

• Association rules find combinations of items that frequently occur together in orders or baskets.
• The items that frequently occur together are called itemsets.
• An application of association rule mining is Market Basket Analysis (MBA).
• MBA is a technique used mostly by retailers to find associations between items purchased by
customers.
To illustrate the association rule mining concept, let us consider a set of baskets and the items in those
baskets purchased by customers, as depicted in Figure 9.1.

Items purchased in different baskets are:

1. Basket 1: egg, beer, sugar, bread, diaper
2. Basket 2: egg, beer, cereal, bread, diaper
3. Basket 3: milk, beer, bread
4. Basket 4: cereal, diaper, bread
The primary objective of a recommender system is to predict items that a customer may purchase in the
future based on his/her purchases so far. In future, if a customer buys beer, can we predict what he/she
is most likely to buy along with beer? To predict this, we need to find out which items have shown a
strong association with beer in previously purchased baskets.

Association rule mining considers all possible combinations of items in the previous baskets and computes
various measures such as support, confidence, and lift to identify rules with stronger associations.
The number of combinations grows quickly with the number of items a retailer sells. One solution to this
problem is to eliminate items that cannot possibly be part of any frequent itemset. One algorithm that
does this is the apriori algorithm, proposed by Agrawal and Srikant (1994). The rules generated are represented as

{diapers} -> {beer}

which means that customers who purchased diapers also purchased beer in the same basket.
{diaper, beer} together is called an itemset; {diaper} is called the antecedent and {beer} the consequent.

Metrics

Concepts such as support, confidence, and lift are used to generate association rules.

Support indicates how frequently the items appear together in baskets, relative to all the
baskets being considered.

Confidence measures how often the consequent appears in the baskets that contain the antecedent.

A lift value of 1 indicates that the items are independent. A lift value less than 1 implies that the
products are substitutes, and a value greater than 1 implies that purchasing one product increases the
likelihood of purchasing the other, which is what a useful association rule requires.
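
As a worked example, the three metrics can be computed by hand for the rule {diaper} -> {beer} on the four baskets listed earlier. This is a minimal sketch using the standard definitions; the function names are illustrative and not part of any library.

```python
# The four baskets from the example above.
baskets = [
    {"egg", "beer", "sugar", "bread", "diaper"},
    {"egg", "beer", "cereal", "bread", "diaper"},
    {"milk", "beer", "bread"},
    {"cereal", "diaper", "bread"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Observed co-occurrence relative to what independence would give."""
    joint = support(antecedent | consequent)
    return joint / (support(antecedent) * support(consequent))


antecedent, consequent = {"diaper"}, {"beer"}
rule_support = support(antecedent | consequent)      # 2 of 4 baskets
rule_confidence = confidence(antecedent, consequent)  # 0.5 / 0.75
rule_lift = lift(antecedent, consequent)              # 0.5 / (0.75 * 0.75)
print(rule_support, round(rule_confidence, 3), round(rule_lift, 3))
```

In this small example the lift comes out slightly below 1 (about 0.889), so on these four baskets alone diapers and beer appear together a little less often than independence would suggest.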

Applying Association Rules

Association rules are created using the transaction data available in the groceries.csv dataset. Each line in the
dataset is an order and contains a variable number of items.
Loading the Dataset
Python's open() function can be used to open the file and readlines() to read all of its lines. The following code
block can be used for loading and reading the data:

The steps in this code block are explained as follows:

1. The code opens the file groceries.csv.
2. Reads all the lines from the file.
3. Removes leading or trailing white spaces from each line.
4. Splits each line by a comma to extract items.
5. Stores the items in each line in a list.
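
A minimal sketch of the five steps listed above. The helper name parse_transactions is hypothetical; the variable name all_txns matches the name used in the next section. To keep the example self-contained it parses a few in-memory lines built from the baskets shown earlier instead of the real file, and skipping blank lines is an added safeguard that is not among the listed steps.

```python
def parse_transactions(lines):
    """Turn raw comma-separated lines into a list of item lists."""
    all_txns = []
    for line in lines:
        stripped = line.strip()      # step 3: drop leading/trailing white space
        if not stripped:             # added safeguard: ignore empty lines
            continue
        items = stripped.split(",")  # step 4: split the line on commas
        all_txns.append(items)       # step 5: store the items as a list
    return all_txns


# Stand-in for the lines of groceries.csv (steps 1 and 2 would be
# `with open("groceries.csv") as f: lines = f.readlines()`).
sample_lines = [
    "egg,beer,sugar,bread,diaper\n",
    "  milk,beer,bread  \n",
    "\n",
]
all_txns = parse_transactions(sample_lines)
print(all_txns)
```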
Encoding the Transactions
• The Python library mlxtend provides methods to generate association rules from a list of transactions. The
transactions and items need to be converted into a tabular or matrix format.
• The matrix will be of size M × N, where M is the total number of transactions and N
is the number of unique items across all transactions.
• The mlxtend library has a preprocessing class called OnehotTransactions (named TransactionEncoder in
more recent mlxtend releases) that takes all_txns as input and converts the transactions and items into
one-hot-encoded format.
• The code for converting the transactional data using one-hot encoding is as follows:
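
To show what the encoded matrix looks like, here is a plain-Python stand-in for the encoding step. It is not the mlxtend call itself; the function name one_hot_encode is hypothetical and the transactions are the four example baskets rather than the grocery data.

```python
def one_hot_encode(transactions):
    """Return (item names, M x N matrix of booleans)."""
    # N columns: every unique item across all transactions, in sorted order.
    items = sorted({item for txn in transactions for item in txn})
    matrix = []
    for txn in transactions:
        present = set(txn)
        # One row per transaction: True where the item was purchased.
        matrix.append([item in present for item in items])
    return items, matrix


all_txns = [
    ["egg", "beer", "sugar", "bread", "diaper"],
    ["egg", "beer", "cereal", "bread", "diaper"],
    ["milk", "beer", "bread"],
    ["cereal", "diaper", "bread"],
]
items, matrix = one_hot_encode(all_txns)
shape = (len(matrix), len(items))  # (M transactions, N unique items)
print(items)
print(shape)
```

With mlxtend installed, TransactionEncoder from mlxtend.preprocessing produces the same kind of boolean matrix through fit() and transform(), and its columns_ attribute supplies the column names for a pandas DataFrame.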

The following code can be used for finding the size (shape or dimension) of the matrix.
one_hot_txns_df.shape
(9835, 171)
Generating Association Rules
The apriori algorithm is used to generate itemsets. The total number of itemsets depends on the
number of items that exist across all transactions.
len(one_hot_txns_df.columns)
171
The code gives an output of 171; that is, as mentioned in the previous section, there are 171 items. For
itemsets containing 2 items each, the total number of itemsets is 171C2, that is,
14535.
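
The count can be checked with Python's math.comb. The same short calculation also gives the number of all non-empty itemsets over 171 items, which is why enumerating every combination is not practical.

```python
from math import comb

n_items = 171

# Number of itemsets containing exactly 2 items: 171C2.
pair_itemsets = comb(n_items, 2)

# Number of non-empty itemsets of any size: 2^171 - 1.
all_itemsets = 2 ** n_items - 1

print(pair_itemsets)
print(len(str(all_itemsets)), "digits in the count of all itemsets")
```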
The apriori function in mlxtend takes the following parameters:
1. df: pandas DataFrame − A DataFrame in one-hot-encoded format.
2. min_support: float − A float between 0 and 1 for the minimum support of the itemsets returned.
Default is 0.5.
3. use_colnames: boolean − If true, uses the DataFrame's column names in the returned DataFrame instead of
column indices.
The following commands can be used for setting minimum support.
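
To illustrate what the min_support threshold does, the sketch below implements the level-wise apriori search in plain Python on the four example baskets. It is a simplified illustration, not mlxtend's implementation; the function name apriori_sketch is hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    txns = [frozenset(t) for t in transactions]
    n = len(txns)

    def support(itemset):
        return sum(itemset <= t for t in txns) / n

    # Level 1: keep only the single items that are frequent enough.
    all_items = {item for t in txns for item in t}
    current = {frozenset([i]) for i in all_items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count support only for the surviving candidates.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [
    ["egg", "beer", "sugar", "bread", "diaper"],
    ["egg", "beer", "cereal", "bread", "diaper"],
    ["milk", "beer", "bread"],
    ["cereal", "diaper", "bread"],
]
frequent_half = apriori_sketch(baskets, min_support=0.5)
frequent_strict = apriori_sketch(baskets, min_support=0.75)
```

Raising min_support shrinks the result: items such as milk and sugar, which appear in only one basket, are dropped at the first level, so no larger itemset containing them is ever counted.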

Top Ten Rules

The top 10 association rules are obtained by sorting the rules stored in the variable rules by
confidence in descending order:
rules.sort_values('confidence', ascending=False)[0:10]
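
The same ranking can be sketched without pandas. The example below builds every one-item-to-one-item rule from the four example baskets, computes its confidence and lift, and keeps the ten rules with the highest confidence. The dictionary layout is an assumption made for illustration and differs from the DataFrame held in rules.

```python
from itertools import permutations

baskets = [
    {"egg", "beer", "sugar", "bread", "diaper"},
    {"egg", "beer", "cereal", "bread", "diaper"},
    {"milk", "beer", "bread"},
    {"cereal", "diaper", "bread"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


items = sorted(set().union(*baskets))
toy_rules = []
for a, c in permutations(items, 2):  # ordered pairs: a -> c
    joint = support({a, c})
    if joint == 0:
        continue  # the two items never occur together
    toy_rules.append({
        "antecedent": a,
        "consequent": c,
        "support": joint,
        "confidence": joint / support({a}),
        "lift": joint / (support({a}) * support({c})),
    })

# Sort by confidence, highest first, and keep the first ten rules.
top_ten = sorted(toy_rules, key=lambda r: r["confidence"], reverse=True)[:10]
for rule in top_ten:
    print(rule["antecedent"], "->", rule["consequent"], round(rule["confidence"], 2))
```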
Pros and Cons of Association Rule Mining
The following are advantages of using association rules:
1. Transaction data, which is used for generating rules, is usually readily available and mostly clean.
2. The rules generated are simple and can be interpreted.
