DATA MINING AND BUSINESS INTELLIGENCE - C. CHANDRAPRIYA, M.Sc., M.Phil.

UNIT III
Concept Description and Association Rule Mining: What is concept description? - Data generalization and summarization-based characterization - Attribute relevance - Class comparisons. Association Rule Mining: Market basket analysis - Basic concepts - Finding frequent item sets - Apriori algorithm - Generating rules - Improved Apriori algorithm - Incremental ARM - Associative classification - Rule mining.

What is concept description?
Concept description is a descriptive form of data mining. A concept usually refers to a collection of data, such as frequent_buyers or graduate_students. Concept description produces characterizations and comparisons of the data; it is also called class description when the concept to be described refers to a class of objects.

Data Generalization and Summarization-Based Characterization
From the data-analysis point of view, data mining can be classified into two categories: descriptive mining and predictive mining.

- Descriptive mining: describes the data set in a concise and summative manner and presents interesting general properties of the data.
- Predictive mining: analyzes the data in order to construct one or more models, and attempts to predict the behavior of new data sets.

Databases usually store large amounts of data in great detail, yet users often want to view sets of summarized data in concise, descriptive terms.

The simplest kind of descriptive data mining is concept description. As a data mining task, concept description is not a simple enumeration of the data. Instead, it generates descriptions for the characterization and comparison of the data:

- Characterization: provides a concise and succinct summarization of the given collection of data.
- Comparison: provides descriptions comparing two or more collections of data.

Data and objects in databases contain detailed information at the primitive concept level. For example, the item relation in a sales database may contain attributes describing low-level item information such as item ID, name, brand, category, supplier, place made and price. It is useful to be able to summarize a large set of data and present it at a high conceptual level. For example, summarizing a large set of items relating to Christmas season sales provides a general description of such data, which can be very helpful for sales and marketing managers. This requires an important functionality called data generalization.

Data Generalization
Data generalization is a process that abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones. It summarizes the general features of objects in a target class and produces what are called characteristic rules. The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction. For example, one may want to characterize the "OurVideoStore" customers who regularly rent more than 30 movies a year.
With concept hierarchies on the attributes describing the target class, the attribute-oriented induction (AOI) method can be used, for example, to carry out data summarization. Note that with a data cube containing a summarization of the data, simple OLAP operations fit the purpose of data characterization.

Approaches:
- Data cube approach (OLAP approach).
- Attribute-oriented induction approach.

Presentation of Generalized Results
- Generalized relation: relations where some or all attributes are generalized, with counts or other aggregation values accumulated.
- Cross-tabulation: mapping results into cross-tabulation form (similar to contingency tables).
- Visualization techniques: pie charts, bar charts, curves, cubes, and other visual forms.
- Quantitative characteristic rules: mapping generalized results into characteristic rules with quantitative information associated with them.

Data Cube Approach
The data cube approach performs the computations and stores the results in data cubes.

Strengths
- An efficient implementation of data generalization.
- Computation of various kinds of measures, e.g., count(), sum(), average(), max().
- Generalization and specialization can be performed on a data cube by roll-up and drill-down.

Limitations
- It handles only dimensions of simple non-numeric data and measures of simple aggregated numeric values.
- Lack of intelligent analysis: it cannot tell which dimensions should be used or what level the generalization should reach.
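To make the idea of attribute-oriented induction more concrete, here is a minimal Python sketch written for these notes. The concept hierarchy and the tuples are made-up assumptions, not data from the text: low-level attribute values are replaced by higher-level concepts, identical generalized tuples are merged, and counts are accumulated, producing a generalized relation of the kind described above.

```python
from collections import Counter

# Hypothetical concept hierarchy: raw age -> age group (low level -> high level)
def generalize_age(age):
    if age < 25:
        return "young"
    elif age < 60:
        return "middle_aged"
    return "senior"

# Task-relevant tuples retrieved by a query (toy data): (age, city)
tuples = [(19, "Mysore"), (22, "Mysore"), (45, "Bangalore"),
          (51, "Bangalore"), (63, "Mysore"), (70, "Bangalore")]

# Climb the concept hierarchy on 'age', keep 'city' as it is,
# then merge identical generalized tuples and accumulate counts.
generalized = Counter((generalize_age(age), city) for age, city in tuples)

for (age_group, city), count in generalized.items():
    print(age_group, city, count)   # e.g. young Mysore 2
```

The result is a small generalized relation with an accumulated count column, which is exactly the "generalized relation" form of presentation listed above.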
Attribute Relevance
Attribute relevance analysis is a statistical approach for preprocessing data to filter out irrelevant attributes or to rank the relevant ones. Measures of attribute relevance can be used to recognize irrelevant attributes that can then be excluded from the concept description process. The incorporation of this preprocessing step into class characterization or comparison is called analytical characterization.

Reasons for attribute relevance analysis:
- It helps decide which dimensions should be included.
- It can produce a higher level of generalization.
- It reduces the number of attributes, which helps us read patterns more easily.

The basic idea behind attribute relevance analysis is to evaluate some measure that computes the relevance of an attribute with regard to a given class or concept. Such measures include information gain, ambiguity, and the correlation coefficient.

Attribute relevance analysis for concept description is performed as follows:

1. Data collection: collect data for both the target class and the contrasting class by query processing.
2. Preliminary relevance analysis using conservative AOI: this step identifies the set of dimensions and attributes to which the selected relevance measure is applied. AOI can be used to perform a preliminary analysis of the data by eliminating attributes that have a large number of distinct values. To be conservative, the AOI performed here should use attribute generalization thresholds that are set reasonably large, so that more attributes are passed on for further relevance analysis by the selected measure.
3. Remove irrelevant and weakly relevant attributes using the selected relevance measure.
4. Generate the concept description using AOI, this time with a less conservative set of attribute generalization thresholds. If the descriptive mining function is class characterization, only the initial target class working relation is included at this point. If the descriptive mining function is class comparison, both the initial target class working relation and the initial contrasting class working relation are included.

Class Comparisons
Data discrimination produces discrimination rules, which compare the general features of objects between two classes referred to as the target class and the contrasting class. It is a comparison of the general characteristics of target-class data objects with the general characteristics of objects from one or more contrasting classes; the user can specify the target and contrasting classes. The methods used for data discrimination are very similar to those used for data characterization, except that data discrimination results include comparative measures.

Association Rule Mining
Association rule learning is an unsupervised learning technique that checks for the dependency of one data item on another data item and maps the items accordingly so that the result can be more profitable. It tries to find interesting relations or associations among the variables of a dataset, based on rules discovered from the database. Association rule learning is one of the important concepts of machine learning and is employed in market basket analysis, web usage mining, continuous production, and so on. Market basket analysis is a technique used by large retailers to discover associations between items. We can understand it with the example of a supermarket, where products that are purchased together are placed together: if a customer buys bread, he is likely to also buy butter, eggs, or milk, so these products are stored on the same shelf or nearby.

[Figure: customers 1 to n and the overlapping sets of items (e.g. milk, bread, cereal) each of them purchases.]

Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. FP-Growth

These algorithms are discussed in later sections.

How does Association Rule Learning work?
Association rule learning works on the concept of an if-then statement, such as "if A then B". The "if" element is called the antecedent, and the "then" statement is called the consequent. Relationships of this kind, in which we find an association between two single items, are known as single cardinality.
Association rule mining is all about creating such rules, and as the number of items increases, the cardinality of possible rules increases as well. To measure the associations among thousands of data items, several metrics are used:

- Support
- Confidence
- Lift

Let's understand each of them.

Support
Support is the frequency of an itemset, i.e. how frequently it appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X:

    Support(X) = Freq(X) / |T|

Confidence
Confidence indicates how often the rule has been found to be true: how often the items X and Y occur together in the dataset, given that X already occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

    Confidence(X -> Y) = Freq(X, Y) / Freq(X)

Lift
Lift measures the strength of a rule. It is the ratio of the observed support to the support expected if X and Y were independent of each other:

    Lift(X -> Y) = Supp(X, Y) / (Supp(X) * Supp(Y))

Lift has three possible ranges of values:
- Lift = 1: the occurrence of the antecedent and that of the consequent are independent of each other.
- Lift > 1: the two itemsets are positively dependent; the value indicates the degree to which they depend on each other.
- Lift < 1: the itemsets occur together less often than expected, i.e. one item tends to act as a substitute for the other.
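To make the three metrics concrete, the short Python sketch below computes support, confidence and lift for a hypothetical rule {bread} -> {butter} over a small made-up transaction list; both the data and the rule are illustrative assumptions, not figures from the notes.

```python
# Toy transaction database (illustrative only)
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"butter"}

supp_rule = support(X | Y)                  # Supp(X U Y)
confidence = supp_rule / support(X)         # Freq(X, Y) / Freq(X)
lift = supp_rule / (support(X) * support(Y))

print(supp_rule, confidence, lift)          # 0.6 0.75 1.25
```

Here the lift is above 1, so in this toy data bread and butter appear together more often than independence would predict.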
Market Basket Analysis
Market basket analysis looks for combinations of items that frequently occur together in transactions. For example, the rule {Bread} -> {Butter} says that customers who put bread in their basket also tend to buy butter.

Some terminology to be familiar with in market basket analysis:
- Antecedent: the item or itemset found in the data that forms the "if" component, written on the left-hand side. In the example above, bread is the antecedent.
- Consequent: an item or set of items found in combination with the antecedent; it is the "then" component, written on the right-hand side. In the example above, butter is the consequent.

Types of Market Basket Analysis
Market basket analysis techniques can be categorized based on how the available data is utilized. The types of market basket analysis in data mining are:

1. Descriptive market basket analysis: this type only derives insights from past data and is the most frequently used approach. The analysis does not make any predictions but rates the association between products using statistical techniques. For those familiar with the basics of data analysis, this kind of modelling is known as unsupervised learning.
2. Predictive market basket analysis: this type uses supervised learning models such as classification and regression. It essentially aims to mimic the market to analyze what causes what to happen, considering items purchased in a sequence to determine cross-selling. For example, buying an extended warranty is more likely to follow the purchase of an iPhone. While it is not as widely used as descriptive MBA, it is still a very valuable tool for marketers.
3. Differential market basket analysis: this type of analysis is useful for competitor analysis. It compares purchase history between stores, between seasons, between two time periods, between different days of the week, and so on, to find interesting patterns in consumer behaviour. For example, it can help determine why some users prefer to purchase the same product at the same price on Amazon rather than on Flipkart. The answer may be that the Amazon reseller has more warehouses and can deliver faster, or something more profound such as user experience.

Algorithms associated with Market Basket Analysis
In market basket analysis, association rules are used to predict the likelihood of products being purchased together. Association rules count the frequency of items that occur together, seeking to find associations that occur far more often than expected. Algorithms that use association rules include AIS, SETM and Apriori. The Apriori algorithm is the one most commonly cited by data scientists in research articles about market basket analysis; it identifies frequent items in the database and then evaluates their frequency as the itemsets are expanded to larger sizes. R's arules package is an open-source toolkit for association mining using the R programming language; it supports the Apriori algorithm and other mining algorithms, including arulesNBMiner, opusminer, RKEEL and RSarules.

With the help of the Apriori algorithm, we can classify and simplify the item sets that consumers frequently buy. There are three components in the Apriori algorithm:
- Support
- Confidence
- Lift

For example, suppose 5,000 transactions have been made through a popular e-commerce site, and we want to calculate the support, confidence, and lift for two products, say a pen and a notebook. Suppose that out of the 5,000 transactions, 1,000 contain a pen, 700 contain a notebook, and 500 contain both.

Support
Support is the number of transactions containing the itemset divided by the total number of transactions:

    Support(pen -> notebook) = freq(pen, notebook) / N = 500 / 5000 = 10 percent

Confidence
Confidence measures how often the combined purchase occurs among the transactions that contain the antecedent:

    Confidence(pen -> notebook) = freq(pen, notebook) / freq(pen) = 500 / 1000 = 50 percent

Lift
Lift compares the observed co-occurrence with what would be expected if the two items were sold independently:

    Lift(pen -> notebook) = Confidence(pen -> notebook) / Support(notebook) = 0.50 / 0.14 = 3.6 (approximately)

When the lift value is below 1, the combination is not bought together more often than chance. In this case the lift is well above 1, which shows that the probability of buying both items together is high compared with what the individual sales figures alone would suggest.
Examples of Market Basket Analysis
The following examples explore market basket analysis by market segment:

- Retail: the best-known MBA case study is Amazon.com. Whenever you view a product on Amazon, the product page automatically recommends "items bought together frequently". It is perhaps the simplest and cleanest example of MBA-driven cross-selling. Apart from e-commerce, MBA is also widely applicable to the in-store retail segment: grocery stores pay meticulous attention to product placement and shelving optimization. For example, you are almost always likely to find shampoo and conditioner placed very close to each other in a grocery store. Walmart's famous beer-and-diapers association anecdote is also an example of market basket analysis.
- Telecom: with the ever-increasing competition in the telecom sector, companies are paying close attention to the services their customers use. For example, telecom operators have started to bundle TV and Internet packages with other discounted online services to reduce churn.
- IBFS: tracing credit card history is a hugely advantageous MBA opportunity for IBFS (insurance, banking and financial services) organizations. For example, Citibank frequently employs sales personnel at large malls to attract potential customers with attractive discounts on the go. They also partner with apps such as Swiggy and Zomato to show customers offers they can avail of by purchasing through credit cards. IBFS organizations also use basket analysis to detect fraudulent claims.
- Medicine: in the medical field, basket analysis is used to determine comorbid conditions and for symptom analysis. It can also help identify which genes or traits are hereditary and which are associated with local environmental effects.

Benefits of Market Basket Analysis
The market basket analysis data mining technique has the following benefits:

- Increasing market share: once a company hits peak growth, it becomes challenging to find new ways of increasing market share. Market basket analysis can be combined with demographic and gentrification data to determine the location of new stores or geo-targeted ads.
- Behaviour analysis: understanding customer behaviour patterns is a cornerstone of marketing. MBA can be used anywhere from simple catalogue design to UI/UX.
- Optimization of in-store operations: MBA is helpful not only in determining what goes on the shelves but also in what happens behind the store. Geographical patterns play a key role in determining the popularity or strength of certain products, and MBA is therefore increasingly used to optimize inventory for each store or warehouse.
- Campaigns and promotions: MBA is used not only to determine which products go together but also which products form the keystones of a product line.
- Recommendations: OTT platforms such as Netflix and Amazon Prime benefit from MBA by understanding what kinds of movies people tend to watch frequently.

Basic Concepts
Data mining is the process of finding useful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies including statistical and mathematical techniques. It is the analysis of factual datasets to discover unsuspected relationships and to summarize the records in novel ways that are both logical and helpful to the data owner. The main concepts of data mining are as follows.

Classification
Classification is the procedure of discovering a model that describes and distinguishes data classes or concepts, for the purpose of using the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training records (i.e., data objects whose class label is known).
Prediction
Prediction is similar to classification, except that for prediction the results lie in the future. Examples of prediction tasks in business and research include:

- Predicting the value of a stock three months into the future.
- Predicting the percentage increase in traffic deaths next year if the speed limit is raised.
- Predicting the winner of this fall's baseball World Series, based on a comparison of team statistics.
- Predicting whether a particular molecule in drug discovery will lead to a cost-effective new drug for a pharmaceutical company.

Association Rules and Recommendation Systems
Association rules, or affinity analysis, are designed to find general association patterns between items in large databases. The rules can be used in several ways. For example, grocery stores can use such information for product placement, for weekly promotional offers, or for bundling products. Association rules derived from a hospital database on patients' symptoms during consecutive hospitalizations can help find "which symptom is followed by what other symptom", to help predict future symptoms for returning patients.

Data Reduction
Data mining is applied to selected data in very large databases. When data analysis and mining are performed on a huge volume of records, the processing time can become so long that the analysis is impractical or infeasible. To reduce the processing time, data reduction techniques are used to obtain a reduced representation of the dataset that is much smaller in volume while maintaining the integrity of the original data. By reducing the data, the efficiency of the data mining process is improved while producing the same analytical results. Data reduction aims to represent the data more compactly: when the data size is smaller, it is easier to apply mature but computationally expensive algorithms. The reduction may be in terms of the number of rows (records) or the number of columns (dimensions).
Finding Frequent Item Sets
Frequent item sets are a fundamental concept in association rule mining, a technique used in data mining to discover relationships between items in a dataset. The goal of association rule mining is to identify relationships between items that occur together frequently.

A frequent item set is a set of items that occur together frequently in a dataset. The frequency of an item set is measured by its support count, the number of transactions or records in the dataset that contain the item set. For example, if a dataset contains 100 transactions and the item set {milk, bread} appears in 20 of them, the support count of {milk, bread} is 20.

Association rule mining algorithms, such as Apriori or FP-Growth, are used to find frequent item sets and generate association rules. These algorithms work by iteratively generating candidate item sets and pruning those that do not meet the minimum support threshold. Once the frequent item sets are found, association rules can be generated using the concept of confidence, the ratio of the number of transactions that contain the whole item set to the number of transactions that contain the antecedent (left-hand side) of the rule.

Frequent item sets and association rules can be used for a variety of tasks such as market basket analysis, cross-selling and recommendation systems. However, association rule mining can generate a large number of rules, many of which may be irrelevant or uninteresting, so it is important to use appropriate measures such as lift and conviction to evaluate the interestingness of the generated rules.

Association mining searches for frequent items in the data set. Frequent mining usually discovers interesting associations and correlations between item sets in transactional and relational databases; in short, it shows which items appear together in a transaction or relation.

Need for association mining: frequent mining generates association rules from a transactional dataset. If two items X and Y are purchased together frequently, then it is good to put them together in the store or to offer a discount on one item on purchase of the other; this can really increase sales. For example, it is likely that if a customer buys milk and bread, he or she also buys butter. The association rule is [milk, bread] => [butter], so the seller can suggest buying butter to a customer who buys milk and bread.

Important definitions:

- Support: one of the measures of interestingness; it tells us about the usefulness and certainty of a rule. A support of 5% means that 5% of all transactions in the database follow the rule.
      Support(A -> B) = Support_count(A U B)
- Confidence: a confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
      Confidence(A -> B) = Support_count(A U B) / Support_count(A)
  If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
- Support_count(X): the number of transactions in which X appears. If X is A union B, then it is the number of transactions in which both A and B are present.
- Maximal itemset: an itemset is maximal frequent if none of its supersets is frequent.
- Closed itemset: an itemset is closed if none of its immediate supersets has the same support count as the itemset itself.
- K-itemset: an itemset that contains K items.

So an itemset is frequent if its support count is greater than or equal to the minimum support count.

Example of finding frequent itemsets
Consider the following dataset:

    TID   Items
    1     {A, C, D}
    2     {B, C, D}
    3     {A, B, C, D}
    4     {B, D}
    5     {A, B, C, D}

Let the minimum support count be 3. The relation that holds is: maximal frequent => closed => frequent.

1-frequent itemsets:
    {A} = 3   // not closed (because of {A, C}), not maximal
    {B} = 4   // not closed (because of {B, D}), not maximal
    {C} = 4   // not closed (because of {C, D}), not maximal
    {D} = 5   // closed, since no immediate superset has the same count; not maximal

2-frequent itemsets:
    {A, B} = 2   // not frequent (support count < minimum support count), ignore
    {A, C} = 3   // not closed, because of {A, C, D}
    {A, D} = 3   // not closed, because of {A, C, D}
    {B, C} = 3   // not closed, because of {B, C, D}
    {B, D} = 4   // closed, but not maximal because of {B, C, D}
    {C, D} = 4   // closed, but not maximal because of {B, C, D}

3-frequent itemsets:
    {A, B, C} = 2   // not frequent, ignore
    {A, B, D} = 2   // not frequent, ignore
    {A, C, D} = 3   // maximal frequent
    {B, C, D} = 3   // maximal frequent

4-frequent itemsets:
    {A, B, C, D} = 2   // not frequent, ignore
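The enumeration above can be checked mechanically. The following Python sketch, written for these notes as an illustrative check, counts the support of every candidate itemset over the five transactions and flags which frequent itemsets are closed and which are maximal.

```python
from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
                {"B", "D"}, {"A", "B", "C", "D"}]
min_support = 3
items = sorted(set().union(*transactions))

def count(itemset):
    return sum(set(itemset) <= t for t in transactions)

# Support count of every non-empty candidate itemset
support = {frozenset(c): count(c)
           for k in range(1, len(items) + 1)
           for c in combinations(items, k)}

frequent = {s: n for s, n in support.items() if n >= min_support}

for s, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    supersets = [t for t in frequent if s < t]
    closed = all(frequent[t] < n for t in supersets)   # no frequent superset with an equal count
    maximal = not supersets                            # no frequent superset at all
    print(sorted(s), n, "closed" if closed else "", "maximal" if maximal else "")
```

Running it reproduces the table above: {D} is closed, {B, D} and {C, D} are closed but not maximal, and {A, C, D} and {B, C, D} are the maximal frequent itemsets.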
Apriori Algorithm
Consider the following example rules, based on a set of customer transactions:

1. If customers purchase milk they also purchase bread: {milk} -> {bread}
2. If customers purchase bread they also purchase milk: {bread} -> {milk}
3. If customers purchase milk and eggs they also purchase cheese and bread: {milk, eggs} -> {cheese, bread}
4. If customers purchase milk, cheese, and eggs they also purchase bread: {milk, cheese, eggs} -> {bread}

Note that rules #1 and #2 are not the same, as is demonstrated by the confidence rating of each rule described below. Implication here means co-occurrence, not causality.

Itemset
An itemset is a collection of one or more items, e.g. {Bread, Milk, Diaper}. The running example below uses the following transactions:

    TID   Items
    1     Bread, Milk
    2     Bread, Diaper, Beer, Eggs
    3     Milk, Diaper, Beer, Coke
    4     Bread, Milk, Diaper, Beer
    5     Bread, Milk, Diaper, Coke

Support count
The support count, sigma, is the frequency count of occurrences of an itemset:
    sigma({Bread, Milk, Diaper}) = 2

Support (similar to the idea of coverage for decision rules)
Support is the percentage of instances in the database that contain all items listed in an itemset.
- For the bread-and-milk cases #1 and #2, we might have sigma(bread and milk) = 5,000 out of 50,000 instances, i.e. s = 10% support.
- In the tiny five-transaction dataset above, sigma(bread and milk) = 3 out of 5 instances, i.e. s = 60%.

Association rule
An association rule is an implication expression of the form X -> Y, where X and Y are itemsets.
- Example: {Milk, Diaper} -> {Beer}

Confidence (similar to the idea of accuracy for decision rules)
Each rule has an associated confidence: the conditional probability of the association, i.e. the probability that customers purchasing one set of items then purchase another set of items. If there were 10,000 recorded transactions purchasing milk, and of those 5,000 also purchase bread, we have 50% confidence for rule #1. For rule #2, we might have 15,000 transactions purchasing bread, of which 5,000 purchased milk, giving 33% confidence.

For the five-transaction example and the rule {Milk, Diaper} -> {Beer}:

    s = sigma(Milk, Diaper, Beer) / |T| = 2 / 5 = 0.4
    c = sigma(Milk, Diaper, Beer) / sigma(Milk, Diaper) = 2 / 3 = 0.67

Item sets
Item sets are attribute-value combinations that meet a specified coverage requirement (minimum support); item sets that do not make the cut are discarded. We can also talk about minimum confidence.

Association Rule Mining Approach
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support >= the minsup threshold, and
- confidence >= the minconf threshold.

Brute-force approach:
- List all possible association rules.
- Compute the support and confidence for each rule.
- Prune rules that fail the minsup and minconf thresholds.

This is computationally prohibitive: the number of rules grows exponentially, O(3^d). For d unique items, the total number of candidate rules is

    R = 3^d - 2^(d+1) + 1

so for d = 6, R = 602 rules.
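The rule-count formula can be verified directly. The small Python check below (illustrative, not part of the original notes) counts every rule X -> Y with non-empty, disjoint X and Y over d items and compares the result with the closed form.

```python
from math import comb

d = 6
# Brute force: choose an itemset of size k >= 2, then split it into a
# non-empty left-hand side and a non-empty right-hand side (2**k - 2 ways).
brute = sum(comb(d, k) * (2**k - 2) for k in range(2, d + 1))

closed_form = 3**d - 2**(d + 1) + 1
print(brute, closed_form)   # 602 602
```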
For the five transactions above, all of the following rules come from the same itemset {Milk, Diaper, Beer}:

    {Milk, Diaper} -> {Beer}   (s = 0.4, c = 0.67)
    {Milk, Beer} -> {Diaper}   (s = 0.4, c = 1.0)
    {Diaper, Beer} -> {Milk}   (s = 0.4, c = 0.67)
    {Beer} -> {Milk, Diaper}   (s = 0.4, c = 0.67)
    {Diaper} -> {Milk, Beer}   (s = 0.4, c = 0.5)
    {Milk} -> {Diaper, Beer}   (s = 0.4, c = 0.5)

Observations:
- All the above rules are binary partitions of the same itemset {Milk, Diaper, Beer}.
- Rules originating from the same itemset have identical support but can have different confidence.
- Thus we may decouple the support and confidence requirements.

Mining the Association Rules
Two-step approach:
1. Frequent itemset generation: generate all itemsets whose support >= minsup.
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Frequent itemset generation is still computationally expensive: given d items, there are 2^d possible candidate itemsets.

Frequent Itemset Generation (brute force)
Each itemset in the lattice is a candidate frequent itemset:
- Count the support of each candidate by scanning the database.
- Match each transaction against every candidate.
- Complexity is O(NMw), where N is the number of transactions, M the number of candidates and w the maximum transaction width. This is expensive, since M = 2^d.

Strategies
Reduce the number of candidates (M):
- Complete search has M = 2^d.
- Use pruning techniques to reduce M (the Apriori principle, below).

Reduce the number of transactions (N):
- Reduce the size of N as the size of the itemsets increases.
- Used by DHP and vertical-based mining algorithms.

Reduce the number of comparisons (NM):
- Use efficient data structures to store the candidates or transactions.
- There is then no need to match every candidate against every transaction.

Apriori principle
If an itemset is frequent, then all of its subsets must also be frequent. The Apriori principle holds because of the following property of the support measure:

    For all itemsets X, Y: if X is a subset of Y, then s(X) >= s(Y)

- The support of an itemset never exceeds the support of its subsets.
- This is known as the anti-monotone property of support.

Consequently, once an itemset is found to be infrequent, all of its supersets can be pruned from the search.

Illustrating the Apriori principle
Using the five transactions above with a minimum support count of 3, the single-item counts are:

    Item     Count
    Bread    4
    Coke     2
    Milk     4
    Beer     3
    Diaper   4
    Eggs     1

If every subset is considered, 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates must be counted. With support-based pruning, only 6 + 6 + 4 = 16 candidates are counted.
The formal Apriori algorithm
Let F_k denote the set of frequent k-itemsets and L_k the set of candidate k-itemsets.

Algorithm:
- Let k = 1.
- Generate F_1 = {frequent 1-itemsets}.
- Repeat until F_k is empty:
  - Candidate generation: generate L_{k+1} from F_k.
  - Candidate pruning: prune candidate itemsets in L_{k+1} that contain a subset of length k which is infrequent.
  - Support counting: count the support of each candidate in L_{k+1} by scanning the database.
  - Candidate elimination: eliminate candidates in L_{k+1} that are infrequent, leaving only those that are frequent, which become F_{k+1}.

Informally, the algorithm is:
- Finding one-item sets is easy.
- Use one-item sets to generate two-item sets, two-item sets to generate three-item sets, and so on.
- Keep only those item sets that meet the support threshold at each level, to prune their supersets at higher levels.
- Then partition the retained item sets into rules and keep only those that meet the confidence threshold.

Example 2: credit card promotion database
This example considers a dataset of nominal (here binary) attribute values, where both values of each attribute can be interesting. Unlike market basket data, where only purchases are interesting, single item sets can now be twice as numerous. The ten instances are:

    Magazine Promo   Watch Promo   Life Ins Promo   Credit Card Ins   Sex
    Yes              No            No               No                Male
    Yes              Yes           Yes              No                Female
    No               No            No               No                Male
    Yes              Yes           Yes              Yes               Male
    Yes              No            Yes              No                Female
    No               No            No               No                Female
    Yes              No            Yes              Yes               Male
    No               Yes           No               No                Male
    Yes              No            No               No                Male
    Yes              Yes           Yes              No                Female

Single item sets at a 40% coverage threshold (at least 4 of the 10 instances):

    A. Magazine Promo = Yes    7
    B. Watch Promo = Yes       4
    C. Watch Promo = No        6
    D. Life Ins Promo = Yes    5
    E. Life Ins Promo = No     5
    F. Credit Card Ins = No    8
    G. Sex = Male              6
    H. Sex = Female            4

Pairing (step 2)
Next, pair up the single item sets and keep the pairs that meet the same coverage threshold (again 40% here). For example, A and D occur together in 5 of the 10 instances.

[Table: pairwise co-occurrence counts for the item sets A-H over the 10 instances.]

Resulting rules from the two-item sets (consider rules in both directions):
1. (A -> D): (Magazine Promo = Yes) -> (Life Ins Promo = Yes), confidence 5/7.
2. (D -> A): (Life Ins Promo = Yes) -> (Magazine Promo = Yes), confidence 5/5.
3. Twenty more rules from the remaining ten two-item sets (A then C, C then A, A then F, F then A, and so on).

Now apply the minimum confidence threshold: if the confidence threshold were 80%, the first rule (A -> D) would be eliminated. Repeat the process for three-item-set rules, then four-item-set rules, and so on, keeping the support and confidence thresholds the same.

Candidate generation: F_{k-1} x F_{k-1} method
Merge two frequent (k-1)-itemsets if their first (k-2) items are identical.

Example: F_3 = {ABC, ABD, ABE, ACD, BCD, BDE} (lexicographically ordered). The candidate four-item sets are:
- Merge(ABC, ABD) = ABCD
- Merge(ABC, ABE) = ABCE
- Merge(ABD, ABE) = ABDE

Do not merge ABD and ACD, because they share only a prefix of length 1 instead of length 2. ACDE is not a candidate, because CDE is not in F_3. So L_4 = {ABCD, ABCE, ABDE} is the set of candidate 4-itemsets generated by the first method.

Candidate pruning:
- Prune ABCE because ACE and BCE are infrequent.
- Prune ABDE because ADE is infrequent.

After candidate pruning: F_4 = {ABCD}.
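Putting the pieces together, the following compact Python sketch implements the loop described above: F(k-1) x F(k-1) candidate generation on a shared (k-2)-prefix, subset-based candidate pruning, support counting, and elimination of infrequent candidates. It is an illustrative implementation written for these notes (not library code), run here on the five market-basket transactions used earlier with a minimum support count of 3.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # F1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: merge (k-1)-itemsets sharing their first k-2 items
        prev = sorted(tuple(sorted(s)) for s in frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:k - 2] == prev[j][:k - 2]:
                    candidates.add(frozenset(prev[i]) | frozenset(prev[j]))
        # Candidate pruning: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Support counting and candidate elimination
        counts = {c: sum(c <= set(t) for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

# The five market-basket transactions used earlier, minimum support count 3
db = [["Bread", "Milk"],
      ["Bread", "Diaper", "Beer", "Eggs"],
      ["Milk", "Diaper", "Beer", "Coke"],
      ["Bread", "Milk", "Diaper", "Beer"],
      ["Bread", "Milk", "Diaper", "Coke"]]
for itemset, n in sorted(apriori(db, 3).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this data it finds the four frequent single items (Bread, Milk, Beer, Diaper) and the four frequent pairs, and correctly stops when the only candidate triple, {Bread, Milk, Diaper}, fails the support threshold.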
Alternate F_{k-1} x F_{k-1} method
Merge two frequent (k-1)-itemsets if the last (k-2) items of the first one are identical to the first (k-2) items of the second.

Example: F_3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
- Merge(ABC, BCD) = ABCD
- Merge(ABD, BDE) = ABDE
- Merge(ACD, CDE) = ACDE
- Merge(BCD, CDE) = BCDE

So L_4 = {ABCD, ABDE, ACDE, BCDE} is the set of candidate 4-itemsets generated by the second method. Candidate pruning again results in F_4 = {ABCD}; the other candidates are eliminated because each contains an infrequent 3-subset (ADE for ABDE, ACE and ADE for ACDE, BCE for BCDE).

(Figure 6.8: generating and pruning candidate k-itemsets by merging pairs of frequent (k-1)-itemsets.)

Rule generation
A frequent three-item set is partitioned to generate 6 rules. The item set (A B C) generates the rules:

    (A & B) -> C, (A & C) -> B, (B & C) -> A, A -> (B & C), B -> (A & C), C -> (A & B)

Example with a four-item set L = (A B C D); partitioning gives the following 14 rules:

    ABC -> D, ABD -> C, ACD -> B, BCD -> A, A -> BCD, B -> ACD, C -> ABD, D -> ABC,
    AB -> CD, AC -> BD, AD -> BC, BC -> AD, BD -> AC, CD -> AB

In general, if |L| = k, there are 2^k - 2 candidate association rules (we are ignoring L -> True and True -> L; Weka will include the latter).

In general, confidence does not have an anti-monotone property: conf(ABC -> D) can be larger or smaller than conf(AB -> D). But the confidence of rules generated from the same itemset does have an anti-monotone property. For example, if {A, B, C, D} is a frequent 4-itemset:

    conf(ABC -> D) >= conf(AB -> CD) >= conf(A -> BCD)

Confidence is anti-monotone with respect to the number of items on the right-hand side of the rule, because for a rule LHS -> RHS generated from a fixed itemset,

    conf(LHS -> RHS) = sigma(itemset) / sigma(LHS)

and sigma(LHS) can only grow as items are moved from the left-hand side to the right-hand side. This gives a lattice of rules in which a low-confidence rule prunes all the rules below it.

Weather example
For the standard weather dataset, item sets of size one to four can be listed with their coverage counts, for example: Outlook = Sunny (5); Temperature = Cool (4); Outlook = Sunny and Temperature = Hot (2); Outlook = Sunny, Temperature = Hot, Humidity = High (2); Outlook = Sunny, Temperature = Hot, Humidity = High, Play = No (2); Outlook = Rainy, Temperature = Mild, Windy = False, Play = Yes (2).

In total, with a minimum support of two, there are:
- 12 one-item sets,
- 47 two-item sets,
- 39 three-item sets,
- 6 four-item sets,
- 0 five-item sets.

Once all item sets with minimum support have been generated, we can turn them into rules. For example, the three-item set

    Humidity = Normal, Windy = False, Play = Yes (4)

yields seven (2^3 - 1) potential rules (six useful ones):

    If Humidity = Normal and Windy = False -> Play = Yes            (4/4)
    If Humidity = Normal and Play = Yes -> Windy = False            (4/6)
    If Windy = False and Play = Yes -> Humidity = Normal            (4/6)
    If Humidity = Normal -> Windy = False and Play = Yes            (4/7)
    If Windy = False -> Humidity = Normal and Play = Yes            (4/8)
    If Play = Yes -> Humidity = Normal and Windy = False            (4/9)
    If True -> Humidity = Normal and Windy = False and Play = Yes   (4/12)
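The rule-generation step can likewise be sketched in a few lines of Python. The function below (an illustration written for these notes) enumerates the 2^k - 2 candidate rules of a frequent itemset and keeps those whose confidence reaches a threshold; a production implementation would additionally exploit the anti-monotone property described above to stop expanding the right-hand side once confidence drops below the threshold.

```python
from itertools import combinations

def support_count(itemset, transactions):
    return sum(set(itemset) <= t for t in transactions)

def rules_from_itemset(itemset, transactions, min_conf):
    """Enumerate the 2**k - 2 candidate rules of a frequent k-itemset and
    keep those whose confidence reaches min_conf."""
    itemset = set(itemset)
    whole = support_count(itemset, transactions)
    kept = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            conf = whole / support_count(lhs, transactions)
            if conf >= min_conf:
                kept.append((set(lhs), itemset - set(lhs), conf))
    return kept

transactions = [{"Bread", "Milk"},
                {"Bread", "Diaper", "Beer", "Eggs"},
                {"Milk", "Diaper", "Beer", "Coke"},
                {"Bread", "Milk", "Diaper", "Beer"},
                {"Bread", "Milk", "Diaper", "Coke"}]

for lhs, rhs, conf in rules_from_itemset({"Milk", "Diaper", "Beer"}, transactions, 0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With a confidence threshold of 0.6 this keeps four of the six rules of {Milk, Diaper, Beer}, matching the confidences listed in the table earlier.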
Factors Affecting the Complexity of Apriori
Choice of minimum support threshold:
- Lowering the support threshold results in more frequent itemsets.
- This may increase the number of candidates and the maximum length of frequent itemsets.

Dimensionality (number of items) of the data set:
- More space is needed to store the support count of each item.
- If the number of frequent items also increases, both computation and I/O costs may increase.

Size of the database:
- Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions.

Average transaction width:
- Transaction width increases with denser data sets.
- This may increase the maximum length of frequent itemsets and the number of hash-tree traversals (the number of subsets in a transaction increases with its width).

Support Counting of Candidate Itemsets
To determine the support of each candidate itemset, the database of transactions must be scanned. In the naive approach every candidate itemset is matched against every transaction, which is an expensive operation. To reduce the number of comparisons, the candidate itemsets are stored in a hash structure: instead of matching each transaction against every candidate, the transaction is matched only against the candidates contained in the hashed buckets it reaches.

For example, suppose you have 15 candidate itemsets of length 3:

    {1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4},
    {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

How many of these itemsets are supported by the transaction (1, 2, 3, 5, 6)?

[Figure: a hash tree over the 15 candidates built with the hash function h(p) = p mod 3, so items 1, 4, 7 go to one branch, items 2, 5, 8 to another, and items 3, 6, 9 to the third. Matching the transaction (1 2 3 5 6) leads only to the buckets that can contain its subsets, and the counts of the item sets found there are incremented.]

General Considerations
Association rules do not require identification of a dependent variable first; this is a good example of information discovery. Not all rules may be useful: we may have a rule that exceeds our confidence level, but if the item sets involved are also high in probability, not much new information is revealed and the lift is low. If customers who purchase milk also purchase bread with a confidence level of 50%, but 70% of all purchases involve milk and 50% of purchases include bread, the rule is of little use.

Two types of relationships are of interest:
1. Association rules that show a lift in product sales for a particular product, where the lift in sales is the result of its association with one or more other products. Marketing may be able to use this information.
2. Association rules that show a lower-than-expected confidence for a particular association. We may conclude that the products involved in the rule are competing for the same market.

Start with high thresholds to see what rules are found; then reduce the levels as needed.
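The question posed above can be answered mechanically. The short Python sketch below (illustrative, not part of the original notes) enumerates the 3-subsets of the transaction and looks the candidates up among them; only the candidates found this way would have their support counts incremented.

```python
from itertools import combinations

candidates = [{1, 4, 5}, {1, 2, 4}, {4, 5, 7}, {1, 2, 5}, {4, 5, 8}, {1, 5, 9},
              {1, 3, 6}, {2, 3, 4}, {5, 6, 7}, {3, 4, 5}, {3, 5, 6}, {3, 5, 7},
              {6, 8, 9}, {3, 6, 7}, {3, 6, 8}]
transaction = {1, 2, 3, 5, 6}

# A length-3 candidate is supported by the transaction iff it is one of the
# transaction's 3-subsets, so only those 10 subsets need to be looked up.
subsets = {frozenset(c) for c in combinations(sorted(transaction), 3)}
supported = [sorted(c) for c in candidates if frozenset(c) in subsets]

print(len(subsets), "3-subsets to check")   # 10
print(supported)                            # [[1, 2, 5], [1, 3, 6], [3, 5, 6]]
```

So only three of the fifteen candidates are contained in this transaction, which is exactly the kind of saving the hash-tree bucketing is designed to exploit.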
Improved Apriori Algorithm
Methods to improve Apriori efficiency
Several methods are available for improving the efficiency of the algorithm:

1. Hash-based technique: uses a hash-based structure called a hash table for generating the k-itemsets and their corresponding counts; a hash function is used to fill the table.
2. Transaction reduction: reduces the number of transactions scanned in later iterations; transactions that do not contain any frequent items are marked or removed.
3. Partitioning: requires only two database scans to mine the frequent itemsets. The idea is that for any itemset to be potentially frequent in the database, it must be frequent in at least one of the partitions of the database.
4. Sampling: picks a random sample S from the database D and then searches for frequent itemsets in S. Some globally frequent itemsets may be missed; this risk can be reduced by lowering min_sup.
5. Dynamic itemset counting: new candidate itemsets can be added at any marked start point of the database during the scan.
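As an illustration of the partitioning idea (two database scans), the following self-contained Python sketch mines each partition for locally frequent itemsets by brute force, takes their union as the global candidate set, and counts those candidates in a second pass over the whole database. It is a simplified illustration of the technique written for these notes, not an optimized implementation.

```python
from itertools import combinations

def frequent_in(transactions, min_count):
    """All itemsets whose support count in `transactions` is >= min_count (brute force)."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            n = sum(set(c) <= set(t) for t in transactions)
            if n >= min_count:
                found[frozenset(c)] = n
    return found

def partition_mine(transactions, min_fraction, n_partitions=2):
    """Two-scan, partition-based mining: locally frequent itemsets become the
    global candidates, which are then counted in one more pass over the data."""
    size = len(transactions)
    step = -(-size // n_partitions)          # ceiling division
    candidates = set()
    # Scan 1: locally frequent itemsets, using the same support *fraction* per partition
    for start in range(0, size, step):
        part = transactions[start:start + step]
        local_min = max(1, int(min_fraction * len(part)))
        candidates |= set(frequent_in(part, local_min))
    # Scan 2: count the global candidates over the whole database
    counts = {c: sum(c <= set(t) for t in transactions) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_fraction * size}

db = [["Bread", "Milk"],
      ["Bread", "Diaper", "Beer", "Eggs"],
      ["Milk", "Diaper", "Beer", "Coke"],
      ["Bread", "Milk", "Diaper", "Beer"],
      ["Bread", "Milk", "Diaper", "Coke"]]
print(partition_mine(db, 0.6))
```

Because a globally frequent itemset must be locally frequent in at least one partition, the second scan recovers exactly the same frequent itemsets as a direct Apriori run on the full database.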
Applications of the Apriori Algorithm
Some fields where Apriori is used:
1. Education: extracting association rules in data mining of admitted students through their characteristics and specialities.
2. Medicine: for example, analysis of a patient database.
3. Forestry: analysis of the probability and intensity of forest fires from forest fire data.
4. Recommender systems: Apriori is used by many companies, such as Amazon in its recommender system and Google for the auto-complete feature.

Associative Classification
Bing Liu et al. were the first to propose associative classification, defining a model whose rules are constrained so that the right-hand side is an attribute of the classification class. An associative classifier is a supervised learning model that uses association rules to assign a target value. The model generated by the associative classifier, and used to label new records, consists of association rules whose consequents are class labels. They can therefore be thought of as a list of "if-then" clauses: if a record meets certain criteria (specified on the left-hand side of the rule, the antecedent), it is labelled (or scored) according to the class on the right-hand side. Most associative classifiers read the list of rules sequentially and apply the first matching rule to label a new record. Associative classification rules inherit metrics from association rules, such as support and confidence, which can be used to rank or filter the rules in the model and to evaluate their quality.

Types of associative classification methods:

1. CBA (Classification Based on Associations): uses association rule techniques to classify data, which proves to be more accurate than traditional classification techniques. It is sensitive to the minimum support threshold: when a low minimum support threshold is specified, a large number of rules are generated.
2. CMAR (Classification based on Multiple Association Rules): uses an efficient FP-tree, which consumes less memory and space compared with CBA. The FP-tree will not always fit in main memory, however, especially when the number of attributes is large.
3. CPAR (Classification based on Predictive Association Rules): combines the advantages of associative classification and traditional rule-based classification. CPAR uses a greedy algorithm to generate rules directly from the training data, and it generates and tests more rules than traditional rule-based classifiers in order to avoid missing important rules.

Rule Mining
Association rule mining is a procedure that aims to discover frequently occurring patterns, correlations, or associations in datasets found in various kinds of databases, such as relational databases, transactional databases, and other repositories.

But what is an association rule?
An association rule is a learning technique that helps identify dependencies between two data items; based on the dependency, items can be mapped so that the result is more profitable. Association rule mining looks for interesting associations among the variables of a dataset. It is one of the important concepts of machine learning and has been used in different settings, such as association mining in databases and continuous production. Like all other techniques, however, association rule mining has its own limitations.

An association rule has two parts:
- an antecedent (if), and
- a consequent (then).

An antecedent is something that is found in the data, and a consequent is an item that is found in combination with the antecedent. Consider this rule, for instance: "If a customer buys bread, he is 70% likely to also buy milk." In this association rule, bread is the antecedent and milk is the consequent. Simply put, it can be understood as a retail store's rule for targeting its customers better: if such a rule results from a thorough analysis of the data, it can be used not only to improve customer service but also to improve the company's revenue.

Association rules are created by thoroughly analyzing data and looking for frequent if-then patterns. Then, depending on the following two parameters, the important relationships are identified:
1. Support: indicates how frequently the if-then relationship appears in the database.
2. Confidence: tells us how often these relationships have been found to be true.

So, given transactions with multiple items, association rule mining primarily tries to find the rules that govern how or why such products or items are often bought together. For example, peanut butter and jelly are frequently purchased together because many people like to make PB&J sandwiches.

Association rule mining is sometimes referred to as "market basket analysis", as this was its first application area. The aim is to discover associations of items occurring together more often than you would expect from randomly sampling all the possibilities. The classic anecdote of beer and diapers helps in understanding this: young American men who go to the stores on Fridays to buy diapers also have a predisposition to grab a bottle of beer. However unrelated and vague that may sound, association rule mining shows us how and why. Let's do a little analysis ourselves.
Suppose store X's retail transaction database includes the following data:
- Total number of transactions: 600,000
- Transactions containing diapers: 7,500 (1.25 percent)
- Transactions containing beer: 60,000 (10 percent)
- Transactions containing both beer and diapers: 6,000 (1.0 percent)

From the figures above we can conclude that if there were no relation between beer and diapers (that is, if they were statistically independent), then only about 10% of diaper purchasers would be expected to buy beer as well. However, the figures tell us that 80% (6,000 / 7,500) of the people who buy diapers also buy beer. This is a significant jump of 8 over the expected probability. This factor of increase is known as lift: the ratio of the observed frequency of co-occurrence of the items to the expected frequency.

How did we determine the lift? Simply by counting the transactions in the database and performing simple arithmetic. So, for this example, one plausible association rule states that people who buy diapers will also purchase beer, with a lift factor of 8. Mathematically, lift is the joint probability of two items x and y divided by the product of their individual probabilities:

    Lift = P(x, y) / [P(x) * P(y)]

If the two items are statistically independent, then the joint probability of the two items equals the product of their probabilities, P(x, y) = P(x) * P(y), which makes the lift factor equal to 1. An interesting point worth mentioning is that anti-correlation can even yield lift values less than 1, corresponding to mutually exclusive items that rarely occur together.

Association rule mining has helped data scientists find patterns they never knew existed.
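The arithmetic above can be reproduced in a few lines of Python, using the same figures quoted in the example:

```python
N       = 600_000   # total transactions
diapers = 7_500     # transactions containing diapers
beer    = 60_000    # transactions containing beer
both    = 6_000     # transactions containing both

confidence = both / diapers                        # P(beer | diapers)
lift = (both / N) / ((diapers / N) * (beer / N))   # P(x, y) / (P(x) * P(y))

print(f"confidence = {confidence:.0%}, lift = {lift:.1f}")   # confidence = 80%, lift = 8.0
```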
Types of Association Rules in Data Mining
There are typically four different types of association rules in data mining:
- Multi-relational association rules
- Generalized association rules
- Interval information association rules
- Quantitative association rules

Multi-relational association rules
Also known as MRAR, multi-relational association rules are a class of association rules usually derived from multi-relational databases. Each rule element has one entity with several relationships, the relationships representing indirect connections between entities.

Generalized association rules
The generalized association rule is largely used for getting a rough idea of the interesting patterns that often tend to stay hidden in the data.

Quantitative association rules
This type is one of the most distinctive of the four kinds of association rules: at least one attribute of a quantitative association rule is numeric. This is in contrast to the generalized association rule, where the left-hand and right-hand sides consist of categorical attributes.

Algorithms for Association Rule Mining
There are mainly three types of algorithms that can be used to generate association rules in data mining:

- Apriori algorithm: identifies the frequent individual items in a given database and then extends them to larger item sets, keeping only those item sets that appear sufficiently often in the database.
- Eclat algorithm: ECLAT stands for Equivalence Class Clustering and bottom-up Lattice Traversal and is another widely used method for association rule mining. Some even consider it a better and more efficient version of the Apriori algorithm.
- FP-growth algorithm: also known as the frequent pattern growth algorithm, it is particularly useful for finding frequent patterns without candidate generation. It operates in two stages: FP-tree construction and extraction of the frequent item sets.
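For completeness, here is a sketch of how such an analysis is typically run with an off-the-shelf implementation. It assumes the third-party Python packages pandas and mlxtend are installed (an assumption, not something the notes require), and the exact function signatures may differ slightly between mlxtend versions.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["Bread", "Milk"],
                ["Bread", "Diaper", "Beer", "Eggs"],
                ["Milk", "Diaper", "Beer", "Coke"],
                ["Bread", "Milk", "Diaper", "Beer"],
                ["Bread", "Milk", "Diaper", "Coke"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 0.6, then rules with confidence >= 0.7
frequent = apriori(onehot, min_support=0.6, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)

print(frequent)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```

On the five-transaction example this reports the same frequent itemsets and rules that were worked out by hand in the earlier sections, along with their support, confidence and lift.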
