1. Explain Apriori Algorithm With Example or Finding Frequent Item Sets Using Candidate Generation
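The Apriori algorithm finds sets of products that customers frequently buy together. A store can use these associations to help customers buy their products with ease and to increase its sales performance. The strength of an association is measured with three components: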
• Support
• Confidence
• Lift
Suppose a Big Bazar store has 4000 customer transactions, and you want to calculate the support, confidence, and lift for two products, say Biscuits and Chocolate, because customers frequently buy these two items together.
Out of 4000 transactions, 400 contain Biscuits and 600 contain Chocolate, and 200 of these transactions contain both Biscuits and Chocolate.
Using this data, we will find out the support, confidence, and lift.
Support
Support refers to the default popularity of a product. You find the support of a product by dividing the number of transactions containing that product by the total number of transactions. Hence, for Biscuits we get
Support(Biscuits) = 400/4000 = 10 percent
Confidence
Confidence refers to the likelihood that customers who bought biscuits also bought chocolates. So, you need to divide the number of transactions that contain both biscuits and chocolates by the number of transactions that contain biscuits to get the confidence.
Hence,
Confidence(Biscuits → Chocolate) = 200/400
= 50 percent.
It means that 50 percent of customers who bought biscuits also bought chocolates.
Lift
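Consider the above example; lift refers to the increase in the ratio of the sale of chocolates when you sell biscuits. It is the confidence of the rule divided by the support of Chocolate, which is 600/4000 = 15 percent:
Lift(Biscuits → Chocolate) = Confidence(Biscuits → Chocolate) / Support(Chocolate)
= 50/15 ≈ 3.33
A lift greater than 1 means that customers who buy biscuits are more likely to buy chocolates than customers in general; here they are about 3.33 times as likely.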
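The three measures can be checked with a short Python sketch. The function name `association_metrics` is hypothetical; lift is computed here as confidence divided by the support of Chocolate (600/4000 = 15 percent), which gives about 3.33, the same value as P(Biscuits, Chocolate) / (P(Biscuits) × P(Chocolate)) = 0.05 / (0.10 × 0.15).

```python
def association_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return support(antecedent), support(rule), confidence and lift of antecedent -> consequent."""
    support_antecedent = n_antecedent / n_total  # P(Biscuits)
    support_consequent = n_consequent / n_total  # P(Chocolate)
    support_rule = n_both / n_total              # P(Biscuits and Chocolate)
    confidence = n_both / n_antecedent           # P(Chocolate | Biscuits)
    lift = confidence / support_consequent       # > 1 means biscuits raise the chance of chocolate
    return support_antecedent, support_rule, confidence, lift


# Counts from the Big Bazar example: 4000 transactions, 400 with Biscuits,
# 600 with Chocolate, 200 with both.
sup_b, sup_bc, conf, lift = association_metrics(4000, 400, 600, 200)
print(f"support(Biscuits)            = {sup_b:.0%}")   # 10%
print(f"support(Biscuits, Chocolate) = {sup_bc:.0%}")  # 5%
print(f"confidence(B -> C)           = {conf:.0%}")    # 50%
print(f"lift(B -> C)                 = {lift:.2f}")    # 3.33
```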
Consider a Big Bazar scenario where the product set is P = {Rice, Pulse, Oil,
Milk, Apple}. The database comprises six transactions where 1 represents the
presence of the product and 0 represents the absence of the product.
Step 1
Make a frequency table of all the products that appear in all the transactions.
Now, filter the frequency table to keep only those products with a support level over the 50 percent threshold.
The resulting table indicates the products frequently bought by the customers.
Step 2
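Create pairs of the remaining products, such as RP, RO, RM, PO, PM, and OM (R = Rice, P = Pulse, O = Oil, M = Milk), and count the number of transactions that contain each pair. This gives the frequency table of pairs.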
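Step 3
Apply the same 50 percent support threshold to the pairs and keep only those pairs whose support is over 50 percent.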
Step 4
Now, look for sets of three products that the customers buy together by combining the frequent pairs that share a product. This gives the candidate three-product combinations.
Step 5
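Calculate the frequency of these candidate three-product itemsets to get their frequency table, and keep those whose support is over the threshold; these are the sets of three products that customers frequently buy together.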
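The five steps can be sketched as a level-wise loop in Python. This is a minimal sketch, not the author's implementation: the six 0/1 rows are made up for illustration, the helper names `to_transactions` and `frequent_itemsets` are hypothetical, and "over 50 percent" is read as a strict threshold (support > 0.5, i.e. at least 4 of 6 transactions).

```python
from itertools import combinations

PRODUCTS = ["Rice", "Pulse", "Oil", "Milk", "Apple"]

# Hypothetical 0/1 rows (1 = product present, 0 = absent), one per transaction.
# These are illustrative values, not the table of the example above.
ROWS = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]


def to_transactions(rows, products):
    """Turn each 0/1 row into the set of products it contains."""
    return [frozenset(p for p, flag in zip(products, row) if flag) for row in rows]


def frequent_itemsets(transactions, min_fraction):
    """Level-wise search; keeps itemsets whose support is strictly above min_fraction."""
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 1: frequency table of single products, filtered by the threshold.
    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items if count(frozenset([i])) / n > min_fraction}
    result = {s: count(s) for s in level}
    k = 2
    while level:
        # Steps 2 and 4: build k-product candidates from products that survived,
        # keeping only those whose (k-1)-product subsets are all frequent.
        pool = sorted({i for s in level for i in s})
        candidates = [
            frozenset(c)
            for c in combinations(pool, k)
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
        # Steps 3 and 5: count the candidates and apply the same threshold.
        level = {c for c in candidates if count(c) / n > min_fraction}
        result.update({s: count(s) for s in level})
        k += 1
    return result


transactions = to_transactions(ROWS, PRODUCTS)
found = frequent_itemsets(transactions, 0.5)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

With these made-up rows, Apple is dropped in Step 1, the pairs {Rice, Milk} and {Pulse, Milk} fail the threshold, so every three-product candidate containing Milk is pruned without being counted, and the only frequent three-product set is {Rice, Pulse, Oil}, which appears in 4 of the 6 transactions.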
2. Explain in detail the various methods that improve the efficiency of the
Apriori algorithm.
Ans:
Transaction Reduction
Partitioning
Sampling
Transaction Reduction:
Apriori is an algorithm for frequent item set mining and association rule
learning over relational databases. It proceeds by identifying the frequent
individual items in the database and extending them to larger and larger item
sets as long as those item sets appear sufficiently often in the database.
Apriori uses a "bottom up" approach, where frequent subsets are extended one
item at a time (a step known as candidate generation), and groups of candidates
are tested against the data. The algorithm terminates when no further successful
extensions are found.
Transaction reduction cuts the cost of these repeated scans: a transaction that
does not contain any frequent k-itemset cannot contain any frequent
(k+1)-itemset, so it can be marked or removed and skipped in later scans.
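Transaction reduction can be sketched as a filter applied between two scans: a transaction with no frequent k-itemset is dropped before the (k+1)-itemsets are counted. This is an illustrative sketch under assumed data; the function name `reduce_transactions`, the five transactions and the frequent 2-itemsets are hypothetical.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset.

    A transaction with no frequent k-itemset cannot contain a frequent
    (k+1)-itemset, so later scans can skip it.
    """
    return [t for t in transactions if any(s <= t for s in frequent_k_itemsets)]


# Hypothetical transactions and frequent 2-itemsets (L2), for illustration only.
transactions = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"C", "D"}),
    frozenset({"A", "C"}),
    frozenset({"E"}),
]
frequent_pairs = {frozenset({"A", "B"}), frozenset({"A", "C"})}

remaining = reduce_transactions(transactions, frequent_pairs)
print(len(transactions), "->", len(remaining))  # 5 -> 3
```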
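Partitioning:
The database is divided into non-overlapping partitions that are small enough to
fit in memory, so that mining needs only two scans of the database. In the first
scan, the frequent itemsets local to each partition are found. Any itemset that
is frequent in the whole database must be frequent in at least one partition, so
the union of all local frequent itemsets forms the global candidate set. In the
second scan, the actual support of each candidate is counted to determine the
global frequent itemsets.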
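For Apriori, partitioning means splitting the transactions into chunks, finding the frequent itemsets local to each chunk in a first scan, and verifying their union with a second full scan. The sketch below assumes that local frequent itemsets are found by brute-force counting (only practical for tiny partitions; a real implementation would run Apriori inside each partition), that the minimum support is a fraction applied both locally and globally, and that the function names and transactions are hypothetical.

```python
from itertools import combinations


def local_frequent(partition, min_fraction):
    """Frequent itemsets of one partition, by brute-force counting (tiny data only)."""
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                key = frozenset(c)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= min_fraction * len(partition)}


def partition_mining(transactions, min_fraction, n_parts):
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Scan 1: an itemset frequent in the whole database is frequent in at least
    # one partition, so the union of local frequent itemsets is the candidate set.
    candidates = set().union(*(local_frequent(p, min_fraction) for p in parts))
    # Scan 2: count the actual support of every candidate over all transactions.
    n = len(transactions)
    result = {}
    for s in candidates:
        count = sum(1 for t in transactions if s <= t)
        if count >= min_fraction * n:
            result[s] = count
    return result


# Hypothetical transactions, split into two partitions of four.
transactions = [
    frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BD"),
    frozenset("ABD"), frozenset("CD"), frozenset("ABC"), frozenset("BC"),
]
print(partition_mining(transactions, 0.5, 2))
```

Here the candidate set after the first scan also contains D, AC and BC, which are locally frequent in one partition; the second scan discards them because none reaches 4 of the 8 transactions.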
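Sampling:
Instead of the whole database, a random sample of it is mined, using a minimum
support threshold lower than the required one so that fewer globally frequent
itemsets are missed. The itemsets found in the sample are then verified with a
scan of the full database. This trades some accuracy for efficiency. Like every
Apriori variant, it relies on the fact that for an itemset to be frequent, all of
its subsets must also be frequent, so we only examine those itemsets whose
subsets are all frequent.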
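Sampling for Apriori means mining a random sample with a lowered support threshold and then checking the resulting itemsets against the full database. A minimal sketch, assuming a fixed lowering factor of 0.8 for the sample threshold and brute-force counting on the sample; the function names, parameters and transactions are hypothetical. It omits the extra pass a complete method would need to catch frequent itemsets that the sample missed, so it can return fewer itemsets than exact mining.

```python
import random
from itertools import combinations


def count_itemsets(transactions):
    """Support count of every itemset that occurs (brute force, tiny data only)."""
    counts = {}
    for t in transactions:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                key = frozenset(c)
                counts[key] = counts.get(key, 0) + 1
    return counts


def sample_then_verify(transactions, min_fraction, sample_size, lowering=0.8, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    # Mine the sample with a lowered threshold so fewer truly frequent itemsets are missed.
    lowered = min_fraction * lowering
    promising = {s for s, c in count_itemsets(sample).items() if c >= lowered * sample_size}
    # One scan of the full database keeps only itemsets that are frequent overall.
    n = len(transactions)
    verified = {}
    for s in promising:
        count = sum(1 for t in transactions if s <= t)
        if count >= min_fraction * n:
            verified[s] = count
    return verified


transactions = [
    frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BD"),
    frozenset("ABD"), frozenset("CD"), frozenset("ABC"), frozenset("BC"),
]
print(sample_then_verify(transactions, 0.5, sample_size=5))
```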
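Support: How often a given rule appears in the database being mined, i.e., the
fraction of transactions that contain all the items of the rule.
Confidence: How often a given rule turns out to be true in practice, i.e., the
fraction of transactions containing the rule's antecedent that also contain its
consequent.
For example, with a total number of transactions N = 5, a rule whose items occur
together in 3 transactions and whose antecedent occurs in 4 transactions has
Support = 3 / 5
Confidence = 3 / 4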
Generate candidate set 1, do the first scan, and generate the frequent 1-itemsets.
In this stage, we take the sample data set, count the occurrences of each
individual item, and make frequent item set 1 (K = 1).
The minimum support count is 2, so item E, whose count is below 2, is removed
from candidate set 1.
After elimination:
Generate candidate set 2, do the second scan, and generate the frequent 2-itemsets.
In this step, you create candidate set 2 (K = 2) from the frequent 1-itemsets and
take the support count of each candidate.
After elimination:
Generate candidate set 3, do the third scan, and generate the frequent 3-itemsets.
In this iteration, create candidate set 3 (K = 3) and take the support count of
each candidate. Then compare each count with the minimum support to obtain
frequent item set 3.
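Candidate generation from one level to the next is usually written as a join step followed by a prune step. The sketch below assumes the items are A to E, so the frequent 1-itemsets are A, B, C and D after E is eliminated; the frequent 2-itemsets passed in for K = 3 are invented for illustration, and the name `apriori_gen` is a conventional label rather than a library function.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Candidate k-itemsets from the frequent (k-1)-itemsets: join, then prune."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join: merge two (k-1)-itemsets that share their first k-2 items.
            if a[:k - 2] == b[:k - 2]:
                cand = tuple(sorted(set(a) | set(b)))
                # Prune: drop the candidate if any (k-1)-subset is not frequent.
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.add(frozenset(cand))
    return candidates


# Assumed frequent 1-itemsets after E is eliminated.
L1 = [frozenset(x) for x in "ABCD"]
C2 = apriori_gen(L1, 2)
print(sorted("".join(sorted(c)) for c in C2))  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# Hypothetical frequent 2-itemsets after the second elimination.
L2 = [frozenset("AB"), frozenset("AC"), frozenset("AD"), frozenset("BC")]
C3 = apriori_gen(L2, 3)
print(sorted("".join(sorted(c)) for c in C3))  # ['ABC']
```

For K = 3, the joins also produce ABD and ACD, but they are pruned because BD and CD are not frequent, so only ABC needs to be counted in the third scan.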
Ans:
Correlation Analysis:
If a correlation is found, it can be either positive or negative, depending on
the numerical values measured.
Positive correlation exists if both variables increase or decrease together.
Negative correlation exists if one variable decreases when the other increases,
i.e., the high numerical values of one variable relate to the low numerical values
of the other.
Association Analysis:
Association Rule:
The strength of an association rule can be measured in terms of its support and
confidence. A rule that has very low support may occur simply by chance.
Confidence measures the reliability of the inference made by a rule.
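$$s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N}$$
$$\mathrm{conf}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$$
$$I(X \rightarrow Y) = \frac{P(X, Y)}{P(X) \times P(Y)}$$
where σ(X) is the support count (the number of transactions that contain X), N is
the total number of transactions, and I(X → Y) is the interest, or lift, of the rule.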
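The support s(X → Y) = σ(X ∪ Y) / N, the confidence σ(X ∪ Y) / σ(X), and the interest P(X, Y) / (P(X) × P(Y)) translate directly into code. A minimal sketch; the helper names `sigma` and `rule_measures` and the five transactions are illustrative assumptions, and itemsets are represented as Python frozensets.

```python
def sigma(itemset, transactions):
    """Support count: the number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_measures(x, y, transactions):
    """Return s(X -> Y), conf(X -> Y) and the interest I(X -> Y)."""
    n = len(transactions)
    both = sigma(x | y, transactions)
    support = both / n                          # sigma(X u Y) / N
    confidence = both / sigma(x, transactions)  # sigma(X u Y) / sigma(X)
    p_x = sigma(x, transactions) / n
    p_y = sigma(y, transactions) / n
    interest = (both / n) / (p_x * p_y)         # P(X, Y) / (P(X) * P(Y))
    return support, confidence, interest


# Hypothetical market-basket transactions.
transactions = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers", "Beer", "Eggs"}),
    frozenset({"Milk", "Diapers", "Beer", "Cola"}),
    frozenset({"Bread", "Milk", "Diapers", "Beer"}),
    frozenset({"Bread", "Milk", "Diapers", "Cola"}),
]
s, c, i = rule_measures(frozenset({"Milk", "Diapers"}), frozenset({"Beer"}), transactions)
print(f"s = {s:.2f}, conf = {c:.2f}, I = {i:.2f}")  # s = 0.40, conf = 0.67, I = 1.11
```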
In multilevel association rule mining, rules at a high concept level may only add
to common-sense knowledge, while rules at a low concept level may not always be
useful.
Sometimes, at a low data level, the data does not show any significant pattern
even though there is useful information hiding behind it.
Uniform Support –
When a uniform minimum support threshold is used for all levels, the search
procedure is simplified. The method is also simple in that users are required to
specify only one minimum support threshold.
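Reduced Support –
For mining multilevel associations with reduced support (a lower minimum
support threshold at lower levels), there are several alternative search
techniques, as follows.
• Level-by-Level independence – a full-breadth search in which each node is
examined, regardless of whether or not its parent node is found to be frequent.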
Group-based support –
The user or an expert inputs the group-wise threshold values for support and
confidence. Groups are selected based on product price or item set, because
experts often have insight into which groups are more important than others.
Example –
Experts are interested in the purchase patterns of laptops or clothes in the
electronic and non-electronic categories. Therefore, a low support threshold is
set for this group to give attention to these items' purchase patterns.
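The difference between uniform, reduced, and group-based thresholds can be shown with a single filter that looks up the minimum support for each item's level or group. The supports, levels, groups, and threshold values below are hypothetical, chosen only to make the contrast visible.

```python
def frequent_items(supports, level_of, min_support):
    """Items whose support meets the minimum support of their level or group."""
    return sorted(item for item, sup in supports.items() if sup >= min_support[level_of[item]])


# Hypothetical supports (fractions of all transactions) in a two-level hierarchy.
supports = {"computer": 0.10, "laptop computer": 0.06, "desktop computer": 0.04}
level_of = {"computer": 1, "laptop computer": 2, "desktop computer": 2}

uniform = {1: 0.05, 2: 0.05}  # one threshold for every level
reduced = {1: 0.05, 2: 0.03}  # lower threshold at the lower level
print(frequent_items(supports, level_of, uniform))
# ['computer', 'laptop computer']
print(frequent_items(supports, level_of, reduced))
# ['computer', 'desktop computer', 'laptop computer']

# Group-based support: an expert gives laptops and clothes a low threshold.
group_supports = {"laptop": 0.03, "shirt": 0.025, "television": 0.04}
group_of = {"laptop": "focus", "shirt": "focus", "television": "other"}
group_thresholds = {"focus": 0.02, "other": 0.05}
print(frequent_items(group_supports, group_of, group_thresholds))
# ['laptop', 'shirt']
```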
8. Explain multidimensional space.
A multidimensional space is a space having more than three dimensions. Ordinary
Euclidean space studied in elementary geometry is three-dimensional, planes are
two-dimensional, and lines are one-dimensional. The concept of a multidimensional
space arose in the process of generalizing the subject of geometry.
• A dimension describes some aspect of the data that the company wants to
analyze. For example, if your company's data has a time element in it, Time
could become a dimension in your model.
• A member corresponds to one point on a dimension. For example, in the
Time dimension, Monday would be a dimension member.
• A value is a unique characteristic of a member. For example, in the Time
dimension, 5/12/2008 might be the value of the member with the caption
“Monday.”
• An attribute is the full collection of members. For example, all the days
of the week would be an attribute of the Time dimension.
• The size, or cardinality, of a dimension is the number of members it
contains. For example, a Time dimension made up of the days of the
week would have a size of 7.
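The dimension terms above can be modelled with two small classes: a dimension holds its members, each member has a caption and a value, and the cardinality is the member count. This is an illustrative sketch; `Member` and `Dimension` are hypothetical names, not the API of any particular OLAP product, and the dates simply continue from the 5/12/2008 example (which fell on a Monday).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Member:
    caption: str  # e.g. "Monday"
    value: date   # the member's unique value, e.g. 5/12/2008


@dataclass
class Dimension:
    name: str
    members: list = field(default_factory=list)

    @property
    def cardinality(self):
        """Size of the dimension: the number of members it contains."""
        return len(self.members)

    def attribute(self):
        """The full collection of members, listed by caption."""
        return [m.caption for m in self.members]


DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
start = date(2008, 5, 12)  # a Monday
time_dim = Dimension(
    "Time", [Member(name, start + timedelta(days=i)) for i, name in enumerate(DAY_NAMES)]
)

print(time_dim.cardinality)       # 7
print(time_dim.members[0].value)  # 2008-05-12
```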
The following list defines some more of the common terms we use in
describing a multidimensional space.