Recommendations Using Association Rules
• Clicks can mean a tap, swipe, expand, key press, blink, or any other gesture performed on modern devices in apps, websites, and mobile sites.
• Devices – Computing Devices, Laptops, Tablets, Smartphones, Wearables, Gaming Devices, AR/VR Devices
• The Web comprises web properties.
• Web properties are managed with web management software and tools, which can be open-source or proprietary. One popular platform is Google Analytics.
• Types of web properties include apps, websites, mobile sites, and wearable apps.
Web log data
• Clickstream data - the sequence of clicks or interactions made by a user while navigating a website or an application (app).
• It records the pages visited, the order in which they were accessed, and the actions taken on each page (e.g., clicks, taps, zooms, form submissions, downloads).
• Clickstream data is typically collected on the user side.
• This data is crucial for understanding user behavior, identifying patterns, and optimizing the user experience on the website. Clickstream data is commonly used in web analytics, conversion rate optimization, and user journey analysis.
• Clickstream data can be obtained from the Google Analytics platform or similar proprietary tools.
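Because clickstream data is defined by the *order* of interactions, the first processing step is usually to rebuild each user's click path from raw events. The sketch below assumes a hypothetical event layout of (user_id, timestamp, page or action); real exports from Google Analytics or similar tools use their own schemas.

```python
from collections import defaultdict

# Hypothetical raw click events: (user_id, timestamp in seconds, page or action).
# The field layout and values are illustrative assumptions, not a real export format.
events = [
    ("u1", 30, "product/42"),
    ("u2", 5, "home"),
    ("u1", 10, "home"),
    ("u1", 45, "add_to_cart"),
    ("u2", 20, "search?q=milk"),
    ("u1", 20, "search?q=tea"),
]

def clickstreams(events):
    """Return each user's clicks in the order they happened."""
    paths = defaultdict(list)
    # Sorting by (user, timestamp) restores each user's navigation order.
    for user, _ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        paths[user].append(action)
    return dict(paths)

for user, path in clickstreams(events).items():
    print(user, " -> ".join(path))
```

Sorting on the timestamp rather than trusting arrival order matters because logged events can reach the collector out of sequence.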
Transactional Data in Social Media & Web
Transactional data refers to
• Information related to specific user actions or interactions that result in a measurable outcome
or conversion on a website.
• These transactions can include a wide range of activities, such as purchases, sign-ups,
downloads, form submissions, or any other action that is considered valuable to the website
owner.
• Some example transactions for businesses:
• E-commerce Transactions – purchases/views made by users
• Lead Generation - form submissions, sign-ups for newsletters, requests for quotes, or contact form
inquiries
• Clicks on Specific Elements – Clicks for movies, houses, food, grocery, hotels, books, questions etc.
• Event Registrations, Subscription Sign-ups, Download Events
Google Analytics and similar platforms offer features to track and analyze transactional data.
Web Analytics
• Web Content Analytics – text analytics processes can be applied to extract topics and sentiments and to build classification models.
• Web log data, clickstream data, or transactional data of clicks recorded over a period can provide insights in the form of discovered patterns.
Session ID   Item
1            Item 1
1            Item 3
1            Item 4
1            Item 5
2            Item 9
2            Item 4
2            Item 5

Patterns discovered from this data, depending on what the log records:
• Whenever Item 4 is clicked, Item 5 is also clicked
• OR whenever Item 4 is bought, Item 5 is also bought
• OR whenever Item 4 is viewed, Item 5 is also viewed
[Figure: Pre-processed Data → Transformed Data → Discovery → Patterns]
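To check a pattern such as "whenever Item 4 appears, Item 5 also appears" on the session table, the rows are grouped into one item set per session and the sessions containing Item 4 are counted. This is a minimal sketch; the helper name `rule_evidence` is hypothetical.

```python
from collections import defaultdict

# (session_id, item) rows from the web log example.
log = [(1, "Item 1"), (1, "Item 3"), (1, "Item 4"), (1, "Item 5"),
       (2, "Item 9"), (2, "Item 4"), (2, "Item 5")]

# Group the rows into one set of items per session.
sessions = defaultdict(set)
for session_id, item in log:
    sessions[session_id].add(item)

def rule_evidence(sessions, lhs, rhs):
    """Count sessions containing lhs, and how many of those also contain rhs."""
    with_lhs = [items for items in sessions.values() if lhs in items]
    with_both = [items for items in with_lhs if rhs in items]
    return len(with_lhs), len(with_both)

print(rule_evidence(sessions, "Item 4", "Item 5"))  # (2, 2)
print(rule_evidence(sessions, "Item 1", "Item 3"))  # (1, 1)
```

Both patterns hold every time they are tested, but "Item 4 → Item 5" is backed by two sessions while "Item 1 → Item 3" rests on a single one, which is why the amount of supporting evidence matters.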
To rely on a pattern found in historical data, the pattern needs enough support, i.e., it must recur often enough. More support gives more confidence that the pattern is real.
Confidence
• 121212 → ?
• 12121231212123121212 → ? (in this longer history, 121212 has been followed by 3 every time it occurred)
• 121212 → 3
• Models are created from historical data by detecting patterns. A model's prediction is a calculated guess about the likelihood that a pattern will repeat.
Assumption – past behaviour of users is the best predictor of their future behaviour.
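The sequence example can be made concrete by counting what follows each occurrence of 121212 in the history. The share of occurrences followed by a given symbol is the "calculated guess" described above. The function name is illustrative.

```python
from collections import Counter

def next_symbol_counts(history, pattern):
    """Count which symbol follows each occurrence of `pattern` in `history`."""
    counts = Counter()
    n = len(pattern)
    # Check every start position that still leaves room for a following symbol.
    for i in range(len(history) - n):
        if history[i:i + n] == pattern:
            counts[history[i + n]] += 1
    return counts

history = "12121231212123121212"
counts = next_symbol_counts(history, "121212")
total = sum(counts.values())
for symbol, c in counts.most_common():
    print(f"121212 -> {symbol}: seen {c} of {total} times ({c / total:.0%})")
# 121212 -> 3: seen 2 of 2 times (100%)
```

With only two past occurrences, the 100% estimate is still fragile. Association rule mining formalizes this trade-off as support and confidence.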
Transactional Data
• Web log data can be used as transactional data:
Session ID   Item
1            Item 1
1            Item 3
1            Item 4
1            Item 5
2            Item 9
2            Item 4
2            Item 5
• Similarly, clickstream data and order/view/purchase/rating data can be used as transactional data.
• Let us use representative transactional data for items sold by platforms like Big Basket or Grofers.
Transactional Database
Txn_ID   Item
5        Diaper
3        Beer
3        Coke
1        Milk
4        Bread
2        Beer
2        Milk
3        Diaper
4        Diaper
1        Coke
5        Milk
1        Bread
4        Milk
2        Bread
5        Coke
4        Beer
3        Milk

        ↓ Data Selection (group items by Txn_ID)

Transaction ID Dataset
Txn_ID   Itemset
1        Bread, Coke, Milk
2        Beer, Bread, Milk
3        Beer, Coke, Diaper, Milk
4        Beer, Bread, Diaper, Milk
5        Coke, Diaper, Milk
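The Data Selection step, which turns one-row-per-item records into one itemset per transaction, is a group-by on Txn_ID. In this sketch, transaction 2 is taken as Beer, Bread, Milk, matching the itemsets used in the support, confidence, and lift examples that follow.

```python
from collections import defaultdict

# (Txn_ID, Item) rows of the transactional database.
rows = [(5, "Diaper"), (3, "Beer"), (3, "Coke"), (1, "Milk"), (4, "Bread"),
        (2, "Beer"), (2, "Milk"), (3, "Diaper"), (4, "Diaper"), (1, "Coke"),
        (5, "Milk"), (1, "Bread"), (4, "Milk"), (2, "Bread"), (5, "Coke"),
        (4, "Beer"), (3, "Milk")]

def to_itemsets(rows):
    """Group (txn_id, item) rows into {txn_id: sorted list of items}."""
    baskets = defaultdict(set)
    for txn_id, item in rows:
        baskets[txn_id].add(item)
    return {txn_id: sorted(items) for txn_id, items in sorted(baskets.items())}

for txn_id, items in to_itemsets(rows).items():
    print(txn_id, ", ".join(items))
```

Using a set per transaction also removes accidental duplicate rows, since an itemset records only whether an item is present.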
What Is Association Rule Mining?
Given a set of records (say, for a fresh-farm e-commerce app), each of which contains some number of items from a given collection:
• produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
For example, the set of records is a transaction data table from which rules are found:
Txn_ID   Itemset
1        Bread, Coke, Milk
2        Beer, Bread, Milk
3        Beer, Coke, Diaper, Milk
4        Beer, Bread, Diaper, Milk
5        Coke, Diaper, Milk

Discovering rules → Rules found:
{Milk} => {Coke}
{Milk, Beer} => {Diaper}
• Rule
• E.g. X => Y, i.e. LHS itemset => RHS itemset, or antecedent itemset => consequent itemset
• Where
• X is an itemset forming the Left Hand Side (LHS) of the rule, also called the antecedent. Let X = {Milk, Beer}
• Y is an itemset forming the Right Hand Side (RHS) of the rule, also called the consequent. Let Y = {Diaper}
• The rule sign => denotes co-occurrence, NOT causality; it is merely an association.
Support
• Support count
• Frequency of occurrence of an itemset in the Transaction ID dataset
• Example: Support count of {Milk, Beer, Diaper} = 2
• Example: Support count of {Milk, Bread} = 3

Txn_ID   Itemset
1        Bread, Coke, Milk
2        Beer, Bread, Milk
3        Beer, Coke, Diaper, Milk
4        Beer, Bread, Diaper, Milk
5        Coke, Diaper, Milk

• Support
• Fraction of transactions that contain an itemset
• For a rule X => Y, the probability that a transaction contains X ∪ Y, i.e. both X and Y
• Example: Support({Milk, Beer} => {Diaper}) = 2/5 = 0.4
Confidence
• For a rule X => Y, the fraction of transactions containing X that also contain Y
• Alternatively, Confidence(X => Y) = Support(X ∪ Y) / Support(X)
• Example: Confidence({Milk, Beer} => {Diaper}) = 0.4 / 0.6 ≈ 0.67
• If we take E_X and E_Y as the events that a transaction contains itemsets X and Y respectively, then
• Support(X ∪ Y) = P(E_X ∩ E_Y)
• Confidence(X => Y) = P(E_Y | E_X) = P(E_X ∩ E_Y) / P(E_X) = Support(X ∪ Y) / Support(X)
• Confidence indicates how often the rule has been found to be true.
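The definitions of support count, support, and confidence translate directly into set operations over the five transactions. The sketch below recomputes the examples. The function names are illustrative and do not come from any particular library.

```python
# The five transactions of the Transaction ID dataset.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support(X ∪ Y) / Support(X), an estimate of P(E_Y | E_X)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support_count({"Milk", "Beer", "Diaper"}, transactions))           # 2
print(support_count({"Milk", "Bread"}, transactions))                    # 3
print(support({"Milk", "Beer", "Diaper"}, transactions))                 # 0.4
print(round(confidence({"Milk", "Beer"}, {"Diaper"}, transactions), 3))  # 0.667
```

The subset test `set(itemset) <= t` is the "transaction contains the itemset" event E_X from the probability formulation.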
Association Rule Mining Task
• The association rule mining task can be broken down as follows.
• Given a set of transactions T, the goal of association rule mining is to find all rules having
• support ≥ minsup threshold (user-provided parameter) – generate frequent itemsets
• confidence ≥ minconf threshold (user-provided parameter) – generate association rules
• Additionally, we can evaluate rules with the lift measure, keeping those with lift ≥ minlift threshold.

Transactional Database → Transaction ID Dataset → Processed Data (one-hot encoded)
Txn_ID   Itemset                     Beer  Bread  Coke  Diaper  Milk
1        Bread, Coke, Milk           0     1      1     0       1
2        Beer, Bread, Milk           1     1      0     0       1
3        Beer, Coke, Diaper, Milk    1     0      1     1       1
4        Beer, Bread, Diaper, Milk   1     1      0     1       1
5        Coke, Diaper, Milk          0     0      1     1       1

Frequent Itemsets (excerpt)
itemsets                 support
(Beer)                   0.6
(Bread)                  0.6
(Coke, Milk)             0.6
(Milk, Diaper)           0.6
(Diaper, Bread, Beer)    0.2
(Milk, Bread, Beer)      0.4
(Diaper, Coke, Beer)     0.2
(Coke, Milk, Beer)       0.2
(Diaper, Milk, Beer)     0.4
(Milk, Coke, Bread)      0.2

Rules
antecedents    consequents   support   confidence   lift
(Beer)         (Diaper)      0.4       0.666667     1.111111
(Coke)         (Diaper)      0.4       0.666667     1.111111
(Milk)         (Diaper)      0.6       0.6          1
(Milk, Beer)   (Diaper)      0.4       0.666667     1.111111

• Lift and other measures can be used to prune or rank the derived patterns.
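The two-step task (frequent itemsets with support ≥ minsup, then rules with confidence ≥ minconf) can be sketched in plain Python. To keep the sketch short, step 1 uses brute-force enumeration of every itemset rather than Apriori, which is feasible only because there are five items. The thresholds minsup = 0.4 and minconf = 0.6 are assumptions chosen for illustration. The output should contain the four rules in the table above, together with other rules that the table does not list.

```python
from itertools import combinations

# Transaction ID dataset.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def frequent_itemsets(transactions, minsup):
    """Step 1 (brute force): keep every itemset whose support >= minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in transactions if set(combo) <= t) / n
            if s >= minsup:
                freq[frozenset(combo)] = s
    return freq

def generate_rules(freq, minconf):
    """Step 2: split each frequent itemset into X => Y, keep confidence >= minconf."""
    rules = []
    for itemset, s_xy in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                y = itemset - x
                # X and Y are subsets of a frequent itemset, so both are in freq.
                conf = s_xy / freq[x]
                lift = conf / freq[y]
                if conf >= minconf:
                    rules.append((x, y, s_xy, conf, lift))
    return rules

rules = generate_rules(frequent_itemsets(transactions, minsup=0.4), minconf=0.6)
for x, y, s, conf, lift in rules:
    if y == {"Diaper"}:  # same consequent as the rules table
        print(sorted(x), "=>", sorted(y), round(s, 2), round(conf, 6), round(lift, 6))
```

With these thresholds, the filter also keeps {Coke, Milk} => {Diaper} (support 0.4, confidence 0.667). This suggests that the slide's rules table is a selection from the full output.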
• Consider the transactions of an online grocery retailer with the items Tea and Coffee.
• Lift(X => Y) = Confidence(X => Y) / Support(Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
• Lift < 1 denotes negative association (e.g., the items are substitutes), lift > 1 denotes positive association, and lift = 1 means X and Y occur independently.
• In the example, lift < 1, so Tea and Coffee are negatively associated (i.e., substitute items). Lift is therefore useful for judging positive as well as negative association.
• Other rule interest measures include leverage, conviction, rule power factor, chi-square, cosine, and coverage, some of which are described at
http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
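The Tea/Coffee figures are not reproduced here. As a substitute, the sketch below compares a positively associated pair and a negatively associated pair from the five-transaction dataset. It uses the standard definitions lift = Support(X ∪ Y) / (Support(X) × Support(Y)), leverage = Support(X ∪ Y) − Support(X) × Support(Y), and conviction = (1 − Support(Y)) / (1 − Confidence). Picking Beer/Coke as the "negative" example is an assumption of this sketch.

```python
def rule_measures(lhs, rhs, transactions):
    """Lift, leverage and conviction of the rule lhs => rhs (itemsets as sets)."""
    n = len(transactions)

    def supp(items):
        return sum(1 for t in transactions if items <= t) / n

    s_x, s_y, s_xy = supp(lhs), supp(rhs), supp(lhs | rhs)
    conf = s_xy / s_x
    return {
        "lift": conf / s_y,              # = s_xy / (s_x * s_y)
        "leverage": s_xy - s_x * s_y,    # 0 when X and Y are independent
        # Conviction is unbounded when the rule never fails (confidence = 1).
        "conviction": float("inf") if conf == 1 else (1 - s_y) / (1 - conf),
    }

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

for lhs, rhs in [({"Beer"}, {"Diaper"}), ({"Beer"}, {"Coke"})]:
    m = rule_measures(lhs, rhs, transactions)
    print(sorted(lhs), "=>", sorted(rhs), {k: round(v, 3) for k, v in m.items()})
```

Beer => Diaper has lift ≈ 1.11 (positive association), whereas Beer => Coke has lift ≈ 0.56 and negative leverage. In this toy data, the two items co-occur less often than independence would predict.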
Applications of Association Rule Mining
• Market Basket Analysis or Association Rule Mining is applied in areas such as
• Clickstream Analytics – consider a rule for Goodreads such as: if you viewed {Biography, … } --> {Memoir}
• Marketing and Sales Promotion – consider a discovered rule such as {Laptop, … } --> {Mousepad}
• Sequential Pattern Discovery – given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events, of the form (A B) (C) (D E) --> (F).
For example, (Shoes) (Racket, Racketball) --> (Sports Jacket)
• Catalogue design for businesses, product clustering, credit/debit card analysis, web usage mining, and banking & insurance product profiles
• Bundling frequent items together for cross-selling and up-selling.
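For the sequential pattern rule (Shoes) (Racket, Racketball) --> (Sports Jacket), the key operation is checking whether a pattern of itemsets occurs, in order, within an object's timeline. The sketch below does this with greedy earliest matching and then estimates the rule's confidence. The customer timelines are hypothetical, and production miners (e.g., GSP- or PrefixSpan-style algorithms) are considerably more involved.

```python
def contains_sequence(timeline, pattern):
    """True if each itemset in `pattern` is contained in a distinct event of
    `timeline`, with the events appearing in the same order as the pattern."""
    pos = 0
    for element in pattern:
        # Greedily use the earliest remaining event that contains this element.
        while pos < len(timeline) and not element <= timeline[pos]:
            pos += 1
        if pos == len(timeline):
            return False
        pos += 1  # the next element must match a strictly later event
    return True

def sequential_rule_confidence(timelines, antecedent, consequent):
    """Fraction of timelines containing `antecedent` that then continue with `consequent`."""
    with_lhs = [t for t in timelines if contains_sequence(t, antecedent)]
    with_rule = [t for t in with_lhs if contains_sequence(t, antecedent + consequent)]
    return len(with_rule) / len(with_lhs) if with_lhs else 0.0

# Hypothetical customer timelines (each event is the set of items bought on one visit).
customers = [
    [{"Shoes"}, {"Racket", "Racketball"}, {"Sports Jacket"}],
    [{"Shoes", "Socks"}, {"Racket", "Racketball", "Grip"}],
    [{"Racket", "Racketball"}, {"Shoes"}, {"Sports Jacket"}],
]
antecedent = [{"Shoes"}, {"Racket", "Racketball"}]
consequent = [{"Sports Jacket"}]
print(sequential_rule_confidence(customers, antecedent, consequent))  # 0.5
```

The third customer bought the same items but in the wrong order, so that timeline does not count toward the antecedent. This is the difference between sequential patterns and plain itemsets.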
Supermarket shelf management – consider a discovered rule, for simplicity, {bread} => {butter}
E-commerce site and app management – arrange item placement in apps to support promotional strategies
[Figure: "10% Off" promotion]
Apriori Algorithm in Action – Numerical Example
• Another approach is to use algorithms such as Apriori, FP-Growth, and Eclat. Implementations are available in the mlxtend package in Python and the arules package in R.
Algorithms
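• Apriori algorithm - uses a breadth-first search strategy to count the support of itemsets and a candidate generation function that exploits the downward closure property of support: every subset of a frequent itemset is also frequent, so any candidate with an infrequent subset can be pruned without counting it.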
[Figure: itemset lattice for items A–E]
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
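The breadth-first, level-by-level walk over the lattice can be sketched as a compact Apriori. At level k, the sketch joins frequent (k−1)-itemsets into k-item candidates, prunes any candidate with an infrequent (k−1)-subset (downward closure), and counts support only for the survivors. This is an illustrative implementation written for this example, not the mlxtend or arules code. minsup = 0.4 is an assumed threshold.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise (breadth-first) Apriori returning {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(candidate):
        return sum(1 for b in baskets if candidate <= b) / n

    def keep_frequent(candidates):
        """Count support for each candidate and keep those meeting minsup."""
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= minsup}

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune step (downward closure): drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]
freq = apriori(transactions, minsup=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this data, {Bread, Diaper} (support 0.2) is infrequent. As a result, the 3-itemset candidates {Beer, Bread, Diaper} and {Bread, Diaper, Milk} are pruned without a support scan, and no 4-itemset survives the prune step.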