191CSC503T - Data Mining-Cat 2-Question Bank

Question bank for the subjects data mining

Uploaded by

harisiva062005
Copyright: © All Rights Reserved

CONTINUOUS ASSESSMENT TEST – 2

Regulations R 2019 - V21

Department of Computer Science and Engineering


Third Year / Fifth Semester

191CSC503T - DATA MINING

CO1: To understand data mining principles and techniques and introduce DM as a cutting-edge business intelligence
CO2: To study the overview of developing areas – web mining, text mining and ethical aspects of data mining
CO3: To study algorithms for finding hidden and interesting patterns in data
CO4: To understand and apply various classification and clustering techniques using tools
CO5: To identify business applications and trends of data mining

Unit – III CLASSIFICATION (2nd half)


PART A
1. Define Support vector machine. CO3 K1
2. Define back propagation. CO3 K1
3. What are K-nearest neighbor classifiers? CO3 K1
4. Differentiate lazy learners and Eager learners. CO3 K2
5. Illustrate support vector machines with example. CO3 K2
6. How would you show your understanding about rule based classification? CO3 K2
7. Discuss why pruning is needed in decision tree. CO3 K2
8. Define Lazy learners with an example. CO3 K2
9. What are eager learners? CO3 K1

Part – B
Q.No Questions CO’s Bloom’s Level
1. Illustrate in detail the Bayesian classification methods with an example. CO3 K3
2. Discuss constraint-based association rule mining with an example. CO3 K3
3. Outline the working principle of the support vector machine with a neat sketch. CO3 K4
4. Illustrate in detail the backpropagation classification method with an example. CO3 K3
5. Elucidate the different techniques used to improve classification accuracy. CO3 K4

Part C
Q.No Questions CO’s Bloom’s Level
1. Evaluate the following dataset using the Naive Bayes classification algorithm. CO3 K5

   Sl. No.  Color  Legs  Height  Smelly  Species
   1        White  3     Short   Yes     M
   2        Green  2     Tall    No      M
   3        Green  3     Short   Yes     M
   4        White  3     Short   Yes     M
   5        Green  2     Short   No      H
   6        White  2     Tall    No      H
   7        White  2     Tall    No      H
   8        White  2     Short   Yes     H

2. Justify your answer: for a university dataset, assume the necessary features required for model evaluation and selection. CO3 K5
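
For Part C, Question 1 of this unit, the Naive Bayes arithmetic can be sketched in Python. The question does not state which instance is to be classified, so the query used here (Green, 2 legs, Tall, not smelly) is an assumption chosen only for illustration. The sketch also assumes plain relative frequencies with no Laplace smoothing.

```python
from collections import Counter, defaultdict

# Training rows from the question: (Color, Legs, Height, Smelly, Species)
ROWS = [
    ("White", 3, "Short", "Yes", "M"),
    ("Green", 2, "Tall", "No", "M"),
    ("Green", 3, "Short", "Yes", "M"),
    ("White", 3, "Short", "Yes", "M"),
    ("Green", 2, "Short", "No", "H"),
    ("White", 2, "Tall", "No", "H"),
    ("White", 2, "Tall", "No", "H"),
    ("White", 2, "Short", "Yes", "H"),
]


def train(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for *values, label in rows:
        for i, value in enumerate(values):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def posterior_scores(query, class_counts, value_counts):
    """Return P(class) * product of P(value | class) for every class."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior
        for i, value in enumerate(query):
            score *= value_counts[(label, i)][value] / n  # likelihood
        scores[label] = score
    return scores


class_counts, value_counts = train(ROWS)
QUERY = ("Green", 2, "Tall", "No")  # hypothetical test instance, not from the question
scores = posterior_scores(QUERY, class_counts, value_counts)
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

For the assumed query the score for H is larger than the score for M, so the sketch predicts H. A different query would need the same steps with its own attribute values.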

UNIT IV : CLUSTERING TECHNIQUES


Part A
Q.No Questions CO’s Bloom’s Level
1. What is cluster analysis? CO4 K1
2. Define clustering. CO4 K1
3. How is the quality of a cluster represented? CO4 K2
4. Define K-means partitioning. CO4 K1
5. List the major clustering methods. CO4 K2
6. Define outlier. How will you determine outliers in the data? CO4 K1
7. Discuss the challenges of outlier detection. CO4 K2
8. Explain the typical phases of outlier detection methods. CO4 K2
9. Distinguish between Classification and clustering. CO4 K2
10. Give the methods of clustering high dimensional data. CO4 K2
11. How is the goodness of clusters measured? CO4 K2
12. Classify hierarchical clustering methods. CO4 K2
13. Define grid-based method in clustering. CO4 K1
14. What are the applications of cluster analysis? CO4 K1
15. What is the concept of partitioning methods? CO4 K1
16. Define hierarchical method in clustering. CO4 K1
17. Define density-based method in clustering. CO4 K1
18. What are the types of outliers? CO4 K1
19. Mention the applications of outlier analysis. CO4 K2
20. What is outlier analysis? CO4 K1
21. Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8): CO4 K2
    a) Compute the Euclidean distance.
    b) Compute the Manhattan distance.
    c) Compute the Minkowski distance with q = 3.
22. Given the 5-dimensional numeric samples A = (1,0,2,5,3) and B = (2,1,0,3,-1), find the Euclidean distance between the points. CO4 K2
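
The distance computations in Questions 21 and 22 can be checked with a short Python sketch. It relies on the fact that the Manhattan and Euclidean distances are the Minkowski distance with q = 1 and q = 2. The function and variable names are illustrative.

```python
def minkowski(a, b, q):
    """Minkowski distance of order q between two equal-length tuples."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1 / q)


# Question 21: coordinate differences are 2, 1, 6, 2
p, r = (22, 1, 42, 10), (20, 0, 36, 8)
euclidean_21 = minkowski(p, r, 2)  # square root of 45
manhattan_21 = minkowski(p, r, 1)  # 2 + 1 + 6 + 2 = 11
minkowski_21 = minkowski(p, r, 3)  # cube root of 233

# Question 22: squared differences are 1, 1, 4, 4, 16
A, B = (1, 0, 2, 5, 3), (2, 1, 0, 3, -1)
euclidean_22 = minkowski(A, B, 2)  # square root of 26

print(round(euclidean_21, 3), manhattan_21, round(minkowski_21, 3))
print(round(euclidean_22, 3))
```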

Part – B
Q.No Questions CO’s Bloom’s Level
1. Consider that the data mining task is to cluster the following eight points
   A1, A2, A3, B1, B2, B3, C1 and C2 (with (X,Y) representing location) into
   three clusters: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4),
   C1(1,2), C2(4,9).
   The distance function is Euclidean distance. Suppose initially we assign A1,
   B1 and C1 as the center of each cluster, respectively. Use the K-means
   algorithm to show the three cluster centers after the first round of execution
   and the final three clusters. CO4 K3
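
The K-means question above can be traced with a minimal Python sketch. It assumes the usual batch form of the algorithm: assign every point to its nearest center, recompute each center as the mean of its cluster, and stop when the centers no longer change. The function names are illustrative.

```python
import math

POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}


def assign(points, centers):
    """Put every point into the cluster of its nearest center (Euclidean)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def update(clusters, centers):
    """Recompute each center as the mean of its cluster; keep it if the cluster is empty."""
    new_centers = []
    for cluster, old in zip(clusters, centers):
        if cluster:
            new_centers.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
        else:
            new_centers.append(old)
    return new_centers


def kmeans(points, centers, max_rounds=100):
    """Return a list of (centers, clusters) pairs, one per round."""
    history = []
    for _ in range(max_rounds):
        clusters = assign(points, centers)
        new_centers = update(clusters, centers)
        history.append((new_centers, clusters))
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return history


initial = [POINTS["A1"], POINTS["B1"], POINTS["C1"]]
history = kmeans(list(POINTS.values()), initial)
print("centers after round 1:", history[0][0])
print("final centers:", history[-1][0])
print("final clusters:", history[-1][1])
```

The first entry of `history` holds the centers after the first round, and the last entry holds the converged centers and clusters.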
2. Use the K-medoid algorithm to determine clusters for the following points with k = 2. CO4 K3

   Point  X  Y
   P1     2  6
   P2     3  4
   P3     3  8
   P4     4  7
   P5     6  2
   P6     6  4
   P7     7  3
   P8     7  4
   P9     8  5
   P10    7  6
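
A K-medoid run on this data can be sketched in Python as a simple swap search in the style of PAM. The question does not fix the distance measure or the starting medoids, so the Manhattan distance and the initial medoids P2 and P8 used here are assumptions, as is the rule of accepting any swap that lowers the total cost.

```python
POINTS = {
    "P1": (2, 6), "P2": (3, 4), "P3": (3, 8), "P4": (4, 7), "P5": (6, 2),
    "P6": (6, 4), "P7": (7, 3), "P8": (7, 4), "P9": (8, 5), "P10": (7, 6),
}


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


def pam(points, medoids):
    """Swap a medoid with a non-medoid whenever that lowers the total cost."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:  # keep only strictly better swaps
                    medoids, best, improved = trial, cost, True
    return medoids, best


def clusters_for(points, medoids):
    """Group points by nearest medoid (the first medoid wins ties)."""
    groups = {m: [] for m in medoids}
    for p in points:
        groups[min(medoids, key=lambda m: manhattan(p, m))].append(p)
    return groups


coords = list(POINTS.values())
start = [POINTS["P2"], POINTS["P8"]]  # assumed initial medoids
start_cost = total_cost(coords, start)
swap_cost = total_cost(coords, [POINTS["P2"], POINTS["P7"]])  # try P7 in place of P8
medoids, final_cost = pam(coords, start)
print(start_cost, swap_cost, medoids, final_cost)
print(clusters_for(coords, medoids))
```

Several medoid pairs tie for the lowest cost on this data, so the final medoids depend on the order in which swaps are tried.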
3. Outline the steps involved in the DBSCAN algorithm. Determine the core,
   border and noise points from the following data using DBSCAN, with
   minpts = 4 and eps = 1.9. CO4 K4

   Point  X  Y
   P1     2  10
   P2     2  5
   P3     8  4
   P4     5  8
   P5     7  5
   P6     6  4
   P7     1  2
   P8     4  9
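
The DBSCAN steps can be outlined as a minimal Python sketch: find the eps-neighborhood of an unvisited point, mark it as noise if the neighborhood is too small, otherwise start a cluster and grow it through every core point reached. The sketch assumes Euclidean distance and that a point counts as a member of its own neighborhood. The function names are illustrative.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


DATA = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # P1..P8
labels = dbscan(DATA, eps=1.9, min_pts=4)
sizes = [len(region_query(DATA, i, 1.9)) for i in range(len(DATA))]
print(sizes)
print(labels)
```

With the values as printed in the question, the largest neighborhood in this sketch has three members, so no point reaches minpts = 4 and every point is labelled as noise. A larger eps or smaller minpts would be needed for clusters to form.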

4. Discuss the requirements of clustering in data mining. CO4 K3


5. Let us consider four points (X1, X2, X3, X4) with the following coordinates
   as two-dimensional samples for clustering: CO4 K3

   X1 = (1,0), X2 = (0,1), X3 = (2,1), X4 = (3,3)

   a) Apply one iteration of the K-means partition clustering algorithm.
   b) What is the change in the total square error?
   c) Apply the second iteration of the K-means algorithm.
   Clusters: C1 = (X1,X3), C2 = (X2,X4)
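
The three parts of this question can be traced with a short Python sketch. It assumes that the listed clusters C1 and C2 are the starting partition, and that the total square error is the sum of squared Euclidean distances from each sample to the centroid of its cluster. The function names are illustrative, and the sketch does not handle a cluster that becomes empty.

```python
X1, X2, X3, X4 = (1, 0), (0, 1), (2, 1), (3, 3)
SAMPLES = [X1, X2, X3, X4]


def centroid(cluster):
    """Mean of the samples in a non-empty cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_square_error(clusters):
    """Sum of squared distances from each sample to its cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)


def reassign(samples, clusters):
    """One K-means iteration: move every sample to the nearest current centroid."""
    centers = [centroid(c) for c in clusters]
    new_clusters = [[] for _ in centers]
    for p in samples:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        new_clusters[nearest].append(p)
    return new_clusters


start = [[X1, X3], [X2, X4]]          # given clusters C1 and C2
first = reassign(SAMPLES, start)      # part (a)
error_before = total_square_error(start)
error_after = total_square_error(first)
second = reassign(SAMPLES, first)     # part (c)
print(first, error_before, error_after, error_before - error_after)
print(second == first)
```

In this sketch the first iteration moves X2 into C1, which lowers the total square error from 7.5 to about 2.67, and the second iteration leaves the clusters unchanged.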
6. Analyze the different clustering techniques used in data mining. CO4 K4
7. Give an insight into the various outlier detection methods used in data mining. CO4 K3
8. Analyze the various constraints while clustering high dimensional data. CO4 K4

Part C
Q.No Questions CO’s Bloom’s Level
1. Cluster the following eight points (with (x, y) representing locations)
   into three clusters: (1, 2), (2, 5), (2, 10), (4, 9), (5, 8), (6, 4), (7, 5), (8, 4).
   Initial cluster centers are: (8, 4), (5, 8), (1, 2).
   Use the K-means algorithm to find the three cluster centers till the
   second iteration. CO4 K5
2. Outline the steps involved in the DBSCAN algorithm. Determine the core,
   border and noise points from the following data using DBSCAN, with
   minpts = 4 and eps = 1.9. CO4 K4

   Point  X  Y
   P1     3  7
   P2     4  6
   P3     5  5
   P4     6  4
   P5     7  3
   P6     6  2
   P7     7  2
   P8     8  4
   P9     3  3
   P10    2  6
   P11    3  5
   P12    2  4
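
The core, border and noise labels for the twelve points in Question 2 can be checked with a small Python sketch. It assumes Euclidean distance and that a point counts itself among its eps-neighbors. A core point has at least minpts neighbors, a border point is a non-core point with a core point in its neighborhood, and everything else is noise. The function name is illustrative.

```python
import math

POINTS = {
    "P1": (3, 7), "P2": (4, 6), "P3": (5, 5), "P4": (6, 4),
    "P5": (7, 3), "P6": (6, 2), "P7": (7, 2), "P8": (8, 4),
    "P9": (3, 3), "P10": (2, 6), "P11": (3, 5), "P12": (2, 4),
}


def classify(points, eps, min_pts):
    """Split point names into core, border and noise sets."""
    names = list(points)
    # eps-neighborhood of every point, including the point itself
    neighbors = {
        a: [b for b in names if math.dist(points[a], points[b]) <= eps]
        for a in names
    }
    core = {a for a in names if len(neighbors[a]) >= min_pts}
    border = {
        a for a in names
        if a not in core and any(b in core for b in neighbors[a])
    }
    noise = set(names) - core - border
    return core, border, noise


core, border, noise = classify(POINTS, eps=1.9, min_pts=4)
print("core:  ", sorted(core))
print("border:", sorted(border))
print("noise: ", sorted(noise))
```

Under these assumptions the sketch reports P2, P5 and P11 as core points and P9 as the only noise point, with the remaining eight points on the border. Excluding the point itself from its neighborhood count would change the result.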

UNIT V : WEKA TOOL

Part A
Q.No Questions CO’s Bloom’s Level
1. Why is data preprocessing needed? Name any four preprocessing filters used in the WEKA tool. CO5 K2
2. What are the foundations of data mining? CO5 K1
3. Name some specific application oriented databases. CO5 K2
4. Explain how data mining is used in health care analysis. CO5 K1
5. Explain data mining applications for bio medical and DNA data analysis. CO5 K1
6. Differentiate between data mining and data warehousing. CO5 K2
7. What are the applications of data mining? CO5 K1
8. List out the various data mining tools. CO5 K2
9. What is a dataset? Give an example. CO5 K1
10. What is association-rule learner? CO5 K1
11. Draw the layout of the Weka tool. CO5 K1
12. List out the limitations of the Weka tool. CO5 K2
13. Write down the functionalities of the Weka tool. CO5 K1
14. What is auto import? Give an example. CO5 K1
15. List out various data warehouse tools. CO5 K2

Part B
Q.No Questions CO’s Bloom’s Level
1. Discuss in detail about the WEKA tool and its functionalities. CO5 K3
2. Outline the features involved in the Iris plant database in detail. CO5 K4
3. Outline the features involved in the breast cancer database in detail. CO5 K3
4. Give a detailed note on Association rule learner. CO5 K3
5. Evaluate the performance measures of the different classification algorithms for the Iris plant dataset using the WEKA tool. CO5 K4
6. Evaluate the performance measures of the different clustering algorithms for the breast cancer dataset using the WEKA tool. CO5 K4
7. Evaluate the performance measures of the different clustering algorithms for the Iris plant dataset using the WEKA tool. CO5 K4
8. Evaluate the performance measures of the different classification algorithms for the breast cancer dataset using the WEKA tool. CO5 K4

Part C
Q.No Questions CO’s Bloom’s Level
1. Illustrate the steps involved in loading and classifying the Iris plant database in the WEKA tool. CO5 K5
2. Elucidate the steps involved in loading and classifying the breast cancer database in the WEKA tool. CO5 K5
