Data Mining Merged
(c) Using the Apriori algorithm, find all frequent itemsets for the following 07
transaction data.
(Take min_sup = 60% and min_conf = 80%)
ID Items
1 {M,O,N,K,E,Y}
2 {D,O,N,K,E,Y}
3 {M,A,K,E}
4 {M,U,C,K,Y}
5 {C,O,O,K,I,E}
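A minimal Python sketch, assuming a straightforward level-wise (Apriori-style) search over the five transactions above, for checking the frequent itemsets at min_sup = 60% (support count >= 3 of 5 transactions):

```python
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "O", "K", "I", "E"},   # duplicate O collapses inside a set
]
min_sup = 0.60
min_count = round(min_sup * len(transactions))   # 60% of 5 transactions -> 3

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

frequent = {}
current = [frozenset([i]) for i in sorted(set().union(*transactions))]
k = 1
while current:
    counts = {c: support_count(c) for c in current}
    survivors = {c: n for c, n in counts.items() if n >= min_count}
    frequent.update(survivors)
    # join step: candidate (k+1)-itemsets built from the surviving k-itemsets
    keys = list(survivors)
    current = list({a | b for a in keys for b in keys if len(a | b) == k + 1})
    k += 1

for s, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), "support count =", n)
```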
OR
Q.3 (a) What is the use of proximity measures? Explain any one proximity 03
measure with its equation.
(b) Explain Bayesian learning and inference with suitable example. 04
(c) List the accuracy parameters used for the performance evaluation of 07
classification and discuss any five parameters with appropriate
examples.
Q.4 (a) Differentiate supervised and unsupervised learning. 03
(b) Explain logistic regression with appropriate example. 04
(c) Explain working of decision tree algorithm with suitable example. 07
OR
Q.4 (a) Differentiate agglomerative and divisive methods of clustering. 03
(b) What do you mean by a perceptron? Discuss the single-layer and multi-layer 04
perceptron.
(c) Explain the K-means clustering algorithm and prove that outliers adversely 07
affect the performance of the algorithm.
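A small illustration of the second part of this question: the sketch below runs a plain 1-D k-means (Lloyd's algorithm, written from scratch on made-up data) with and without a single extreme value, showing how an outlier distorts the mean-based centroids:

```python
def kmeans_1d(points, iters=20):
    """Plain 1-D k-means (Lloyd's algorithm) with k = 2 and deterministic seeds."""
    pts = sorted(points)
    centroids = [pts[0], pts[-1]]        # smallest and largest value as initial centroids
    for _ in range(iters):
        clusters = [[], []]
        for p in pts:
            # assign each point to its nearest centroid
            idx = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[idx].append(p)
        # recompute each centroid as the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [2, 3, 4, 10, 11, 12]              # two natural groups around 3 and 11
print(kmeans_1d(data))                    # -> [3.0, 11.0]

print(kmeans_1d(data + [100]))            # one extreme outlier added -> [7.0, 100.0]
# The outlier captures a centroid for itself and the two natural groups are
# merged under a single centroid, i.e. the clustering quality degrades.
```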
Q.5 (a) Give the strengths and weaknesses of k-means in comparison with the 03
k-medoids algorithm.
(b) What is an outlier? Why is outlier mining important? 04
(c) Write about the different clustering approaches with their strengths and 07
weaknesses.
OR
Q.5 (a) Briefly explain spatial data mining and temporal data mining. 03
(b) Discuss any four data mining features available in WEKA. 04
(c) How is data mining useful for web mining? Discuss any four web 07
mining applications.
*************
Q.3 (a) What are the techniques to improve the efficiency of Apriori algorithm? 03
(b) What is an Itemset? What is a Frequent Itemset? 04
(c) For the given data 07
Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips
Find the frequent itemsets and generate association rules for this data. Assume
a minimum support threshold of s = 33.33% and a minimum confidence
threshold of c = 60%.
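For cross-checking an answer, here is a brute-force Python sketch (direct support counting over the six transactions above rather than the Apriori pruning itself) that enumerates the frequent itemsets and keeps the rules whose confidence reaches 60%:

```python
from itertools import combinations

transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]
n = len(transactions)
min_sup, min_conf = 1 / 3, 0.60          # s = 33.33% (count >= 2), c = 60%

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

items = sorted(set().union(*transactions))
for size in range(2, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        if support(itemset) + 1e-9 < min_sup:
            continue                      # not frequent -> no rules from it
        # split every frequent itemset into antecedent -> consequent
        for r in range(1, size):
            for antecedent in combinations(combo, r):
                a = frozenset(antecedent)
                conf = support(itemset) / support(a)
                if conf >= min_conf:
                    print(sorted(a), "->", sorted(itemset - a),
                          f"support={support(itemset):.2f} confidence={conf:.2f}")
```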
OR
Q.3 (a) Describe the different classifications of Association rule mining. 03
(b) What is meant by Reduced Minimum Support? 04
(c) Explain the steps of the “Apriori Algorithm” for mining frequent itemsets 07
with suitable example.
OR
Q.4 (a) What is attribute selection measure? 03
(b) What is the difference between supervised and unsupervised learning 04
schemes?
(c) Describe the issues regarding classification and prediction. Write an 07
algorithm for decision tree.
Transaction ID Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
For the given example, find the support and confidence for:
{Milk, Chocolate} → {Pepsi}
{Milk, Pepsi} → {Chocolate}
{Chocolate, Pepsi} → {Milk}
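A short Python sketch, using the definitions support(A → B) = P(A ∪ B) and confidence(A → B) = P(A ∪ B) / P(A), for computing the three requested values from the table above:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]
n = len(transactions)

def freq(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

rules = [
    ({"Milk", "Chocolate"}, {"Pepsi"}),
    ({"Milk", "Pepsi"}, {"Chocolate"}),
    ({"Chocolate", "Pepsi"}, {"Milk"}),
]
for antecedent, consequent in rules:
    both = freq(antecedent | consequent)
    conf = both / freq(antecedent)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support={both:.0%}, confidence={conf:.0%}")
```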
OR
(c) Solve the following problem using the Apriori algorithm. 07
Find the frequent itemsets and generate association rules for this data.
Assume a minimum support threshold of s = 33.33% (minimum support
count = 2) and a minimum confidence threshold of c = 60%.
Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips
iii. Discretization
(b) Differentiate between Classification and Prediction. 04
(c) Explain Decision Tree Classification algorithm with the help of 07
example.
OR
Q.3 (a) Differentiate between supervised learning and unsupervised 03
learning.
(b) What is Regression? Explain Linear Regression in short. 04
(c) Explain Naïve Bayes Classifier with example. 07
Q.4 (a) What do you mean by Tree Pruning? Explain with example. 03
(b) Explain the following as attribute selection measure: 04
(i) Information Gain
(ii) Gain Ratio
(c) What do you mean by learning-by-observation? Explain k-Means 07
clustering algorithm in detail.
OR
Q.4 (a) Define Data Cube. Explain any two operations on it. 03
(b) Differentiate between Partition method and Hierarchical method of 04
Clustering.
(c) What are the requirements of Clustering in Data Mining? 07
Q.5 (a) How does the K-Means clustering method differ from the K-Medoids 03
clustering method?
(b) Draw and explain the topology of a multilayer, feed-forward Neural 04
Network.
(c) Explain the major issues in data mining. 07
OR
Q.5 (a) Give the difference between text mining and web mining. 03
(b) Why is Hadoop important? 04
(c) What is a web log? Explain web structure mining and web usage 07
mining in detail.
************
Q.1 (a) What is market basket analysis? Precisely explain the meaning of the 03
following association rule:
computer → antivirus_software [support = 60%, confidence = 60%]
(b) In real-world data, tuples with missing values for some attributes are a 04
common occurrence. List and describe various methods for handling this
problem.
(c) With the help of a suitable diagram, describe the steps involved in data 07
mining when viewed as a process of knowledge discovery.
Q.2 (a) Give a short example to show that items in a strong association rule are 03
not always interesting.
(b) Briefly describe how the partitioning technique may improve the efficiency 04
of the Apriori algorithm.
(c) Discuss how frequent itemsets can be generated using the FP-Growth 07
algorithm with the help of the following transactions. Let the minimum
support count be 2.
Transaction ID Item IDs
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3
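For verifying a hand-worked FP-Growth answer, the same frequent itemsets can also be obtained programmatically; the sketch below assumes the third-party pandas and mlxtend packages are available, with the minimum support count of 2 expressed as 2/9 of the transactions:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

# one-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# minimum support count 2 over 9 transactions
itemsets = fpgrowth(df, min_support=2 / 9, use_colnames=True)
print(itemsets.sort_values("support", ascending=False))
```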
OR
(c) A database has the following six transactions. 07
Transaction ID Items
T1 HotDogs, Buns, Ketchup
T2 Chips, Coke
T3 Coke, Chips, HotDogs
T4 Ketchup, Chips
T5 Buns, HotDogs
T6 HotDogs, Chips, Coke
Find all frequent itemsets and also generate the strong association rules
using the Apriori algorithm. Let the minimum support threshold be 33.34%
and the minimum confidence threshold be 60%.
Q.3 (a) Describe any three primitives for specifying a data mining task. 03
(b) The following table shows the midterm and final exam grades obtained 04
by students in a database course.
x (Midterm exam) y (Final exam)
72 84
50 63
81 77
74 78
94 90
86 75
59 49
83 79
65 77
33 52
88 74
81 90
Use the method of least squares to find an equation for the prediction of
a student's final exam grade based on the student's midterm grade in the
course. Predict the final exam grade of a student who received a grade of
86 in the midterm exam.
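A quick Python sketch of the least-squares computation asked for above, using the standard formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1·x̄:

```python
x = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]   # midterm grades
y = [84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90]   # final exam grades

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# slope and intercept of the least-squares line y = b0 + b1 * x
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

print(f"y = {b0:.2f} + {b1:.2f} x")
print(f"predicted final grade for midterm 86: {b0 + b1 * 86:.1f}")
```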
(c) What is noise? Describe the possible reasons for noisy data. Explain the 07
different techniques to remove noise from data.
OR
Q.3 (a) Discuss outlier analysis as a data mining functionality with the help of 03
an example.
(b) Explain how classification rules are extracted from a decision tree with 04
the help of an example.
(c) Explain the min-max normalization method in detail. Use this method to 07
normalize the following group of data by setting min = 0 and max = 1:
200, 400, 600, 1000
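A short Python sketch of min-max normalization, v' = (v − min) / (max − min) · (new_max − new_min) + new_min, applied to the four values above with new_min = 0 and new_max = 1:

```python
values = [200, 400, 600, 1000]
new_min, new_max = 0.0, 1.0

v_min, v_max = min(values), max(values)
normalized = [(v - v_min) / (v_max - v_min) * (new_max - new_min) + new_min
              for v in values]
print(normalized)   # [0.0, 0.25, 0.5, 1.0]
```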
*************