191CSC503T - Data Mining-Cat 2-Question Bank
CO1: To understand data mining principles and techniques and to introduce DM as a cutting-edge business intelligence technique.
CO2: To study an overview of developing areas (web mining and text mining) and the ethical aspects of data mining.
CO3: To study algorithms for finding hidden and interesting patterns in data.
CO4: To understand and apply various classification and clustering techniques using tools.
CO5: To identify business applications and trends of data mining.
Q.No  Questions  CO’s  Bloom’s Level
Part – B
1. Illustrate in detail the Bayesian classification methods with an example. CO3 K3
2. Discuss constraint-based association rule mining with an example. CO3 K3
3. Outline the working principle of the support vector machine with a neat sketch. CO3 K4
Part – C
1. Evaluate the following dataset using the Naive Bayes classification algorithm.
Sl. No. Color Legs Height Smelly Species
2 Green 2 Tall No M
5 Green 2 Short No H
6 White 2 Tall No H
7 White 2 Tall No H
2. For a university dataset, assume the features necessary for model evaluation and selection, and justify your answer. CO3 K5
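A minimal Python sketch of the Naive Bayes computation asked for in question 1. The table above reproduces only rows 2, 5, 6 and 7 and names no query instance, so the sketch uses just those four rows and a hypothetical query (Color = Green, Legs = 2, Height = Tall, Smelly = No). The function name `naive_bayes_scores` is illustrative, and the likelihoods are plain relative frequencies with no Laplace smoothing.

```python
from collections import Counter
from fractions import Fraction

# Only the four rows reproduced in the table (Sl. No. 2, 5, 6, 7).
# Each row: (Color, Legs, Height, Smelly, Species)
ROWS = [
    ("Green", 2, "Tall", "No", "M"),
    ("Green", 2, "Short", "No", "H"),
    ("White", 2, "Tall", "No", "H"),
    ("White", 2, "Tall", "No", "H"),
]


def naive_bayes_scores(rows, query):
    """Return the unnormalised posterior P(class) * prod P(feature | class)
    for every class, using plain relative frequencies (no smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    total = len(rows)
    scores = {}
    for label, count in class_counts.items():
        in_class = [row for row in rows if row[-1] == label]
        score = Fraction(count, total)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for row in in_class if row[i] == value)
            score *= Fraction(matches, count)  # likelihood P(feature_i = value | class)
        scores[label] = score
    return scores


# Hypothetical query instance (not given in the question).
QUERY = ("Green", 2, "Tall", "No")
scores = naive_bayes_scores(ROWS, QUERY)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

On these four rows the unnormalised scores are 1/4 for M and 1/6 for H, so the hypothetical query is labelled M; with the complete table the counts, and possibly the answer, would differ.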
Part – B
1. Consider a data mining task that clusters the following eight points A1, A2, A3, B1, B2, B3, C1 and C2 (with (x, y) representing location) into three clusters: A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The distance function is Euclidean distance. Suppose we initially assign A1, B1 and C1 as the centers of the three clusters, respectively. Use the K-means algorithm to show the three cluster centers after the first round of execution and the final three clusters. CO4 K3
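The K-means rounds in question 1 can be checked with a short Python sketch. The helper names `assign`, `update` and `kmeans` are illustrative, not from any library. It uses Euclidean distance via `math.dist`, recomputes each center as the mean of its members, and stops when an assignment repeats.

```python
import math

# Points from the question, keyed by name: (x, y)
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}


def assign(points, centers):
    """Assign each point to the index of its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        clusters[nearest].append(name)
    return clusters


def update(points, clusters):
    """Recompute each center as the mean of the points assigned to it."""
    # Assumes no cluster becomes empty, which holds for this data.
    return [
        (sum(points[n][0] for n in c) / len(c), sum(points[n][1] for n in c) / len(c))
        for c in clusters
    ]


def kmeans(points, centers):
    """Run assignment/update rounds until the assignment stops changing."""
    history = []
    clusters = assign(points, centers)
    while True:
        centers = update(points, clusters)
        history.append(centers)
        new_clusters = assign(points, centers)
        if new_clusters == clusters:
            return clusters, history
        clusters = new_clusters


initial = [POINTS["A1"], POINTS["B1"], POINTS["C1"]]
final_clusters, history = kmeans(POINTS, initial)
print("Centers after round 1:", history[0])
print("Final clusters:", final_clusters)
```

For this data the first round moves the centers to (2, 10), (6, 6) and (1.5, 3.5), and the assignment settles at {A1, B1, C2}, {A3, B2, B3} and {A2, C1}, with centers (11/3, 9), (7, 13/3) and (1.5, 3.5).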
2. Use the K-medoid algorithm to determine clusters for the following points with k = 2. CO4 K3
Point X Y
P1 2 6
P2 3 4
P3 3 8
P4 4 7
P5 6 2
P6 6 4
P7 7 3
P8 7 4
P9 8 5
P10 7 6
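The question fixes neither the distance measure nor the starting medoids, so the Python sketch below assumes Manhattan distance and initial medoids P2 (3, 4) and P8 (7, 4). It follows the PAM idea: evaluate every (medoid, non-medoid) swap, keep the one that most reduces the total cost, and stop when no swap helps. The function names are hypothetical.

```python
from itertools import product

# Points from the question: name -> (x, y)
POINTS = {
    "P1": (2, 6), "P2": (3, 4), "P3": (3, 8), "P4": (4, 7), "P5": (6, 2),
    "P6": (6, 4), "P7": (7, 3), "P8": (7, 4), "P9": (8, 5), "P10": (7, 6),
}


def manhattan(a, b):
    """Manhattan (city-block) distance, an assumption for this sketch."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_cost(medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(manhattan(p, POINTS[m]) for m in medoids) for p in POINTS.values())


def pam(medoids):
    """PAM-style swap search: apply the best cost-reducing (medoid, non-medoid)
    swap until no swap lowers the total cost."""
    medoids = list(medoids)
    cost = total_cost(medoids)
    while True:
        best = (cost, None)
        for i, candidate in product(range(len(medoids)), POINTS):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            trial_cost = total_cost(trial)
            if trial_cost < best[0]:
                best = (trial_cost, trial)
        if best[1] is None:  # no swap improves the cost
            return medoids, cost
        cost, medoids = best


def clusters_for(medoids):
    """Group every point under its nearest medoid (ties go to the first medoid)."""
    groups = {m: [] for m in medoids}
    for name, p in POINTS.items():
        nearest = min(medoids, key=lambda m: manhattan(p, POINTS[m]))
        groups[nearest].append(name)
    return groups


start = ["P2", "P8"]  # assumed initial medoids
print("Initial cost:", total_cost(start))
medoids, cost = pam(start)
print("Medoids:", medoids, "cost:", cost)
print(clusters_for(medoids))
```

Under these assumptions the initial cost is 20, swapping P2 for P1 lowers it to 18, and the clusters become {P1, P2, P3, P4} and {P5, P6, P7, P8, P9, P10}. Swapping P2 for P3 or P4 gives the same cost of 18, so the reported medoid pair depends on the order in which ties are examined.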
3. Outline the steps involved in the DBSCAN algorithm. Determine the core, border and noise points in the following data using DBSCAN with MinPts = 4 and eps = 1.9. CO4 K4
Point X Y
P1 2 10
P2 2 5
P3 8 4
P4 5 8
P5 7 5
P6 6 4
P7 1 2
P8 4 9
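The core/border/noise labelling in question 3 can be sketched in Python as below (an illustration, not a reference implementation). It assumes Euclidean distance, treats a point as within eps when its distance is at most 1.9, and counts the point itself towards MinPts; this is a common textbook convention, but some courses exclude the point itself, which changes the counts.

```python
import math

# Points from the question: name -> (x, y)
POINTS = {
    "P1": (2, 10), "P2": (2, 5), "P3": (8, 4), "P4": (5, 8),
    "P5": (7, 5), "P6": (6, 4), "P7": (1, 2), "P8": (4, 9),
}
EPS = 1.9
MIN_PTS = 4


def eps_neighbourhood(name):
    """All points within EPS of `name`, including the point itself."""
    p = POINTS[name]
    return [q for q, xy in POINTS.items() if math.dist(p, xy) <= EPS]


def classify(points):
    """Label each point core, border or noise (MinPts counts the point itself)."""
    neighbours = {n: eps_neighbourhood(n) for n in points}
    core = {n for n, nb in neighbours.items() if len(nb) >= MIN_PTS}
    labels = {}
    for n in points:
        if n in core:
            labels[n] = "core"
        elif any(m in core for m in neighbours[n]):
            labels[n] = "border"  # not core, but inside a core point's neighbourhood
        else:
            labels[n] = "noise"
    return neighbours, labels


neighbours, labels = classify(POINTS)
for name in POINTS:
    print(name, neighbours[name], labels[name])
```

With these settings the largest eps-neighbourhood (around P5: P3, P5, P6) holds only three points, so no point is core, there are no border points, and every point is labelled noise.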
Part – C
1. Cluster the following eight points (with (x, y) representing locations) into three clusters: (1, 2), (2, 5), (2, 10), (4, 9), (5, 8), (6, 4), (7, 5), (8, 4). The initial cluster centers are (8, 4), (5, 8) and (1, 2). Use the K-means algorithm to find the three cluster centers up to the second iteration. CO4 K5
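A Python sketch of the two iterations requested in Part C question 1. `kmeans_step` is a hypothetical helper that performs one assignment step (Euclidean distance) followed by one mean-update step; it is called twice starting from the given centers.

```python
import math

POINTS = [(1, 2), (2, 5), (2, 10), (4, 9), (5, 8), (6, 4), (7, 5), (8, 4)]


def kmeans_step(points, centers):
    """One K-means iteration: assign each point to its nearest center
    (Euclidean distance), then move each center to the mean of its points."""
    clusters = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[k].append(p)
    # Assumes no cluster is empty, which holds for this data.
    new_centers = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) for c in clusters
    ]
    return clusters, new_centers


centers = [(8, 4), (5, 8), (1, 2)]
for iteration in (1, 2):
    clusters, centers = kmeans_step(POINTS, centers)
    print(f"Iteration {iteration}: clusters={clusters} centers={centers}")
```

Both iterations produce the clusters {(6, 4), (7, 5), (8, 4)}, {(2, 10), (4, 9), (5, 8)} and {(1, 2), (2, 5)} with centers (7, 13/3 ≈ 4.33), (11/3 ≈ 3.67, 9) and (1.5, 3.5); because the second iteration reassigns no point, the algorithm has already converged.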
2. Outline the steps involved in the DBSCAN algorithm. Determine the core, border and noise points in the following data using DBSCAN with MinPts = 4 and eps = 1.9. CO4 K4
Point X Y
P1 3 7
P2 4 6
P3 5 5
P4 6 4
P5 7 3
P6 6 2
P7 7 2
P8 8 4
P9 3 3
P10 2 6
P11 3 5
P12 2 4
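A compact Python sketch of the full DBSCAN procedure for Part C question 2: find each point's eps-neighbourhood, mark points with at least MinPts neighbours as core, grow clusters outward from core points, then label the remaining cluster members as border and everything unreached as noise. It assumes Euclidean distance, a distance of at most eps counts as a neighbour, and the neighbourhood includes the point itself; the function names are illustrative.

```python
import math

POINTS = {
    "P1": (3, 7), "P2": (4, 6), "P3": (5, 5), "P4": (6, 4),
    "P5": (7, 3), "P6": (6, 2), "P7": (7, 2), "P8": (8, 4),
    "P9": (3, 3), "P10": (2, 6), "P11": (3, 5), "P12": (2, 4),
}
EPS, MIN_PTS = 1.9, 4


def region_query(name):
    """Names of all points within EPS of `name`, the point itself included."""
    return [q for q, xy in POINTS.items() if math.dist(POINTS[name], xy) <= EPS]


def dbscan():
    """Textbook DBSCAN: grow a cluster from each unvisited core point by
    repeatedly adding the eps-neighbours of core points."""
    neighbours = {n: region_query(n) for n in POINTS}
    core = {n for n, nb in neighbours.items() if len(nb) >= MIN_PTS}
    cluster_of = {}
    next_id = 0
    for start in POINTS:
        if start not in core or start in cluster_of:
            continue
        next_id += 1
        queue = [start]
        while queue:
            n = queue.pop()
            if n in cluster_of:
                continue
            cluster_of[n] = next_id
            if n in core:  # only core points expand the cluster
                queue.extend(m for m in neighbours[n] if m not in cluster_of)
    border = {n for n in cluster_of if n not in core}
    noise = {n for n in POINTS if n not in cluster_of}
    return core, border, noise, cluster_of


core, border, noise, cluster_of = dbscan()
print("Core:", sorted(core, key=lambda n: int(n[1:])))
print("Border:", sorted(border, key=lambda n: int(n[1:])))
print("Noise:", sorted(noise, key=lambda n: int(n[1:])))
print("Clusters:", cluster_of)
```

Under these assumptions the core points are P2, P5 and P11; P1, P3, P4, P6, P7, P8, P10 and P12 are border points; and P9 is noise. The points form two clusters, {P1, P2, P3, P10, P11, P12} and {P4, P5, P6, P7, P8}.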
Part – A
1. Why is data preprocessing needed? Name any four preprocessing filters used in the WEKA tool. CO5 K2
2. What are the foundations of data mining? CO5 K1
3. Name some specific application-oriented databases. CO5 K2
4. Explain how data mining is used in healthcare analysis. CO5 K1
5. Explain data mining applications for biomedical and DNA data analysis. CO5 K1
6. Differentiate between data mining and data warehousing. CO5 K2
7. What are the applications of data mining? CO5 K1
8. List out the various data mining tools. CO5 K2
9. What is a dataset? Give an example. CO5 K1
10. What is an association rule learner? CO5 K1
11. Draw the layout of the Weka tool. CO5 K1
12. List out the limitations of the Weka tool. CO5 K2
13. Write down the functionalities of the Weka tool. CO5 K1
14. What is auto import? Give an example. CO5 K1
15. List out various data warehouse tools. CO5 K2
Part – B
1. Discuss in detail the WEKA tool and its functionalities. CO5 K3
2. Outline the features involved in the Iris plant database in detail. CO5 K4
3. Outline the features involved in the breast cancer database in detail. CO5 K3
4. Give a detailed note on the association rule learner. CO5 K3
5. Evaluate the performance measures of different classification algorithms for the Iris plant dataset using the WEKA tool. CO5 K4
6. Evaluate the performance measures of different clustering algorithms for the breast cancer dataset using the WEKA tool. CO5 K4
7. Evaluate the performance measures of different clustering algorithms for the Iris plant dataset using the WEKA tool. CO5 K4
8. Evaluate the performance measures of different classification algorithms for the breast cancer dataset using the WEKA tool. CO5 K4
Part – C
1. Illustrate the steps involved in loading and classifying the Iris plant database in the WEKA tool. CO5 K5
2. Elucidate the steps involved in loading and classifying the breast cancer database in the WEKA tool. CO5 K5