
Data Mining Merged

Uploaded by

Rishi Bathija
Copyright: © All Rights Reserved

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER– VI (NEW) EXAMINATION – WINTER 2021
Subject Code:3160714 Date:02/12/2021
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
MARKS
Q.1 (a) Justify the importance of data mining. 03
(b) Differentiate OLTP and data warehouse. 04
(c) Briefly discuss the steps of the KDD process. 07

Q.2 (a) Explain data reduction and dimensionality reduction. 03


(b) What do you mean by correlation analysis? Justify its importance. 04
(c) List the common tasks involved in data pre-processing. Briefly explain 07
any four tasks of data pre-processing with suitable examples.
OR
(c) Define the following: 07
concept description, support, confidence, strong association rules, data
generalization, and unsupervised learning.
Q.3 (a) How does classification differ from prediction? Explain the phases of 03
classification.
(b) The attribute income has a minimum value of 12000 INR and a maximum 04
value of 98000 INR. Normalize the income value of 73600 INR:
(i) Using min-max normalization to the range [0, 1].
(ii) Using z-score normalization. Take the mean income as 54000 INR
and the standard deviation as 16000 INR.
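As a check on the arithmetic for Q.3 (b), the two normalizations can be sketched in Python. The helper names `min_max` and `z_score` are illustrative assumptions, not part of the question paper; the formulas are the standard ones.

```python
def min_max(value, lo, hi, new_lo=0.0, new_hi=1.0):
    """Min-max normalization: map value from [lo, hi] onto [new_lo, new_hi]."""
    return (value - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


def z_score(value, mean, std):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (value - mean) / std


income = 73600
mm = min_max(income, 12000, 98000)   # (73600 - 12000) / (98000 - 12000)
z = z_score(income, 54000, 16000)    # (73600 - 54000) / 16000
print(round(mm, 4), round(z, 3))
```

Under these formulas the min-max value is about 0.716 and the z-score is 1.225.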

(c) Using the Apriori algorithm, find all frequent itemsets for the following 07
transaction data.
( Take min_sup=60% and min_conf=80% )

ID Items
1 {M,O,N,K,E,Y}
2 {D,O,N,K,E,Y}
3 {M,A,K,E}
4 {M,U,C,K,Y}
5 {C,O,O,K,I,E}
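A minimal Apriori sketch for the transactions above, assuming min_sup = 60% of 5 transactions, i.e. a support count of 3. The function names are illustrative, and each transaction is treated as a set, so the repeated O in transaction 5 counts once.

```python
from itertools import combinations

# Each string is split into its letters, e.g. set("MAKE") == {"M", "A", "K", "E"}.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]
MIN_COUNT = 3  # 60% of 5 transactions


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= MIN_COUNT]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = [c for c in candidates if support_count(c) >= MIN_COUNT]
    return frequent


frequent_itemsets = apriori()
for itemset, cnt in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), cnt)
```

With these inputs the sketch reports five frequent 1-itemsets (M, O, K, E, Y), five frequent 2-itemsets ({M,K}, {O,K}, {O,E}, {K,E}, {K,Y}) and one frequent 3-itemset ({O,K,E}).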
OR
Q.3 (a) What is the use of proximity measures? Explain any one proximity 03
measure with its equation.
(b) Explain Bayesian learning and inference with suitable example. 04
(c) List the accuracy parameters used for the performance evaluation of 07
classification and discuss any five parameters with appropriate
example.
Q.4 (a) Differentiate supervised and unsupervised learning. 03
(b) Explain logistic regression with appropriate example. 04

(c) Explain working of decision tree algorithm with suitable example. 07

OR
Q.4 (a) Differentiate agglomerative and divisive methods of clustering. 03

(b) What do you mean by a perceptron? Discuss single-layer and multi-layer 04
perceptrons.
(c) Explain the K-means clustering algorithm and prove that outliers adversely 07
affect the performance of the algorithm.
Q.5 (a) Give the strengths and weaknesses of k-means in comparison with the 03
k-medoids algorithm.
(b) What is an outlier? Why is outlier mining important? 04
(c) Write about different clustering approaches with their strengths and 07
weaknesses.
OR
Q.5 (a) Briefly explain spatial data mining and temporal mining. 03

(b) Discuss any four data mining features available in WEKA. 04

(c) How is data mining useful for web mining? Discuss any four web 07
mining applications.

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI(NEW) EXAMINATION – WINTER 2022
Subject Code:3160714 Date:16-12-2022
Subject Name:Data Mining
Time:02:30 PM TO 05:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
Marks
Q.1 (a) Compare descriptive and predictive data mining. 03
(b) Explain the data mining functionalities. 04
(c) Explain major requirements and challenges in data mining. 07

Q.2 (a) What do you mean by concept hierarchy? 03


(b) Explain the smoothing techniques. 04
(c) What is Data Cleaning? Describe various methods of Data Cleaning. 07
OR
(c) Explain the different data reduction techniques. 07

Q.3 (a) What are the techniques to improve the efficiency of the Apriori algorithm? 03
(b) What is an Itemset? What is a Frequent Itemset? 04
(c) For the given data 07

Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips

Find the frequent itemsets and generate association rules from them. Assume
a minimum support threshold (s = 33.33%) and a minimum confidence
threshold (c = 60%).
OR
Q.3 (a) Describe the different classifications of Association rule mining. 03
(b) What is meant by Reduced Minimum Support? 04
(c) Explain the steps of the “Apriori Algorithm” for mining frequent itemsets 07
with suitable example.

Q.4 (a) What are Bayesian Classifiers? 03


(b) What are the hierarchical methods used in classification? 04
(c) Describe rule-based classification in detail. 07

OR
Q.4 (a) What is attribute selection measure? 03
(b) What is the difference between supervised and unsupervised learning 04
schemes?
(c) Describe the issues regarding classification and prediction. Write an 07
algorithm for decision tree.

Q.5 (a) List the requirements of clustering in data mining. 03


(b) Differentiate between Agglomerative and Divisive Hierarchical Clustering. 04
(c) Write a short note: Web content mining. 07
OR
Q.5 (a) What is meant by hierarchical clustering? 03
(b) Illustrate the strengths and weaknesses of k-means in comparison with the 04
k-medoids algorithm.
(c) Write a short note: Web usage mining. 07
*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – WINTER 2023
Subject Code:3160714 Date:11-12-2023
Subject Name:Data Mining
Time:02:30 PM TO 05:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.

Marks

Q.1 (a) Define Data Warehouse. State its features. 03


(b) Differentiate between OLAP and OLTP. 04
(c) Explain in detail different steps of KDD process. 07

Q.2 (a) Why is data preprocessed in data mining? 03


(b) Explain the binning method with the help of an example. 04
(c) Explain the following terms related to Association Rule Mining: 07
Itemset, Support Count, Support, and Association Rule.

Transaction ID Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
For the given example, find the support and confidence for:
{Milk, Chocolate} → {Pepsi}
{Milk, Pepsi} → {Chocolate}
{Chocolate, Pepsi} → {Milk}
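The support and confidence of the three rules can be checked with a short Python sketch. The helper names `count` and `rule_metrics` are illustrative assumptions; support is taken as the fraction of transactions containing both sides of the rule, and confidence as that count divided by the count of the antecedent.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]


def count(itemset):
    """Support count: transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = count(antecedent | consequent)
    return both / len(transactions), both / count(antecedent)


rules = [
    ({"Milk", "Chocolate"}, {"Pepsi"}),
    ({"Milk", "Pepsi"}, {"Chocolate"}),
    ({"Chocolate", "Pepsi"}, {"Milk"}),
]
results = [rule_metrics(a, c) for a, c in rules]
for (a, c), (sup, conf) in zip(rules, results):
    print(sorted(a), "->", sorted(c), f"support={sup:.2f} confidence={conf:.2f}")
```

All three rules share the itemset {Milk, Chocolate, Pepsi}, which appears in 2 of the 5 transactions, so each has support 0.40; the confidences come out as 2/3, 1 and 2/3 respectively.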
OR
(c) Solve the following problem using Apriori algorithm. 07
Find the frequent itemsets and generate association rules from them.
Assume a minimum support threshold (s = 33.33%), a minimum
confidence threshold (c = 60%), and a minimum support count of 2.

Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips
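The rule-generation step can be sketched in Python as a check on a hand-worked answer. Note the swap: for brevity this sketch finds the frequent itemsets by brute-force enumeration rather than by Apriori's level-wise candidate generation, which is feasible only because there are five distinct items. The names `MIN_COUNT`, `MIN_CONF` and `count` are illustrative assumptions.

```python
from itertools import combinations

transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]
MIN_COUNT = 2    # minimum support count given in the question
MIN_CONF = 0.60  # minimum confidence threshold


def count(itemset):
    """Support count: transactions containing every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Brute-force enumeration of every non-empty itemset over the five items.
items = sorted(set().union(*transactions))
frequent = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(c) >= MIN_COUNT]

# For each frequent itemset, try every non-empty proper subset as the antecedent.
strong_rules = []
for itemset in frequent:
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            conf = count(itemset) / count(lhs)
            if conf >= MIN_CONF:
                strong_rules.append((lhs, itemset - lhs, conf))

for lhs, rhs, conf in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

On this data the sketch finds ten frequent itemsets, the largest being {Hot Dogs, Coke, Chips}, and eight strong rules; no rule with Hot Dogs alone as the antecedent reaches 60% confidence.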

Q.3 (a) Define the following terms in Data Transformation: 03


i. Smoothing
ii. Normalization

iii. Discretization
(b) Differentiate between Classification and Prediction. 04
(c) Explain the Decision Tree Classification algorithm with the help of 07
an example.
OR
Q.3 (a) Differentiate between supervised learning and unsupervised 03
learning.
(b) What is Regression? Explain Linear Regression in short. 04
(c) Explain Naïve Bayes Classifier with example. 07
Q.4 (a) What do you mean by Tree Pruning? Explain with example. 03
(b) Explain the following as attribute selection measure: 04
(i) Information Gain
(ii) Gain Ratio
(c) What do you mean by learning-by-observation? Explain k-Means 07
clustering algorithm in detail.
OR
Q.4 (a) Define Data Cube. Explain any two operations on it. 03
(b) Differentiate between Partition method and Hierarchical method of 04
Clustering.
(c) What are the requirements of Clustering in Data Mining? 07

Q.5 (a) How does the K-Means clustering method differ from the K-Medoids 03
clustering method?
(b) Draw and explain the topology of a multilayer, feed-forward Neural 04
Network.
(c) Explain the major issues in data mining. 07
OR
Q.5 (a) Give difference between text mining and web mining. 03
(b) Why is Hadoop important? 04
(c) What is a web log? Explain web structure mining and web usage 07
mining in detail.
************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – SUMMER 2023
Subject Code:3160714 Date:12-07-2023
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.

MARKS
Q.1 (a) What is market basket analysis? Precisely explain the meaning of the 03
following association rule:
computer → antivirus_software [support = 60%, confidence = 60%]
(b) In real-world data, tuples with missing values for some attributes are a 04
common occurrence. List and describe various methods for handling this
problem.
(c) With the help of a suitable diagram, describe the steps involved in data 07
mining when viewed as a process of knowledge discovery.

Q.2 (a) Give a short example to show that items in a strong association rule are 03
not always interesting.
(b) Briefly describe how the partitioning technique may improve the efficiency 04
of the Apriori algorithm.
(c) Discuss how frequent itemsets can be generated using the FP-Growth 07
algorithm with the help of the following transactions. Let the minimum
support threshold be 2.
Transaction ID Item IDs
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3

OR
(c) A database has the following six transactions. 07
Transaction ID Items
T1 HotDogs, Buns, Ketchup
T2 Chips, Coke
T3 Coke, Chips, HotDogs
T4 Ketchup, Chips
T5 Buns, HotDogs
T6 HotDogs, Chips, Coke

Find all frequent itemsets and generate the strong association rules
using the Apriori algorithm. Let the minimum support threshold be 33.34% and
the minimum confidence threshold be 60%.

Q.3 (a) Describe any three primitives for specifying a data mining task. 03
(b) The following table shows the midterm and final exam grades obtained 04
by students in a database course.
x (Midterm exam) y (Final exam)
72 84
50 63
81 77
74 78
94 90
86 75
59 49
83 79
65 77
33 52
88 74
81 90
Use the method of least squares to find an equation for the prediction of
a student’s final exam grade based on the student’s midterm grade in the
course. Predict the final exam grade of a student who received 86 grade
in the midterm exam.
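The least-squares fit for Q.3 (b) can be checked with a minimal Python sketch. It assumes simple linear regression y = intercept + slope * x, with the usual closed-form estimates; the variable and function names are illustrative.

```python
x = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]  # midterm grades
y = [84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90]  # final grades

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope = sum((x_i - mean_x)(y_i - mean_y)) / sum((x_i - mean_x)^2)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x


def predict(midterm):
    """Predicted final grade for a given midterm grade."""
    return intercept + slope * midterm


print(f"y = {intercept:.3f} + {slope:.4f} x")
print(f"predicted final for midterm 86: {predict(86):.2f}")
```

With this data the fitted line is approximately y = 32.03 + 0.582x, which predicts a final grade of about 82 for a midterm grade of 86.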
(c) What is noise? Describe the possible reasons for noisy data. Explain the 07
different techniques to remove the noise from data.
OR
Q.3 (a) Discuss outlier analysis as a data mining functionality with the help of 03
an example.
(b) Explain how classification rules are extracted from a decision tree with 04
the help of an example.
(c) Explain the min-max normalization method in detail. Use this method to 07
normalize the following group of data by setting min = 0 and max = 1.
200, 400, 600, 1000

Q.4 (a) Differentiate classification and clustering. 03


(b) Discuss data matrix and dissimilarity matrix with respect to clustering. 04
(c) Apply ID3 classification algorithm on the following data and construct a 07
decision tree. Show all the stepwise calculations clearly.
age          income  student  credit_rating  Class: buys_computer
youth        high    no       fair           no
youth        high    no       excellent      no
middle_aged  high    no       fair           yes
senior       medium  no       fair           yes
senior       low     yes      fair           yes
senior       low     yes      excellent      no
middle_aged  low     yes      excellent      yes
youth        medium  no       fair           no
youth        low     yes      fair           yes
senior       medium  yes      fair           yes
youth        medium  yes      excellent      yes
middle_aged  medium  no       excellent      yes
middle_aged  high    yes      fair           yes
senior       medium  no       excellent      no
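The first ID3 step, choosing the root attribute by information gain, can be sketched in Python as a check on the hand calculation. Only the root split is covered here, not the full recursive tree construction; the names `entropy` and `info_gain` are illustrative assumptions.

```python
from collections import Counter
from math import log2

ATTRS = ["age", "income", "student", "credit_rating"]
# Each row: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the class labels minus the weighted entropy after splitting on col."""
    base = entropy([r[-1] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


gains = {a: info_gain(ROWS, i) for i, a in enumerate(ATTRS)}
root = max(gains, key=gains.get)
for attr, g in gains.items():
    print(f"Gain({attr}) = {g:.3f}")
print("root attribute:", root)
```

On this table the gains are roughly 0.247 for age, 0.029 for income, 0.152 for student and 0.048 for credit_rating, so age is selected as the root; the same calculation is then repeated on each partition.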
OR
Q.4 (a) Discuss cross-validation method for evaluating the accuracy of a 03
classifier.
(b) How does the k-means clustering method differ from the k-medoids clustering 04
method? Discuss the major drawbacks of the k-means clustering method.
(c) Predict class label of the tuple X = (age = youth, income = medium, 07
student = yes, credit_rating = fair) with the help of Naive Bayesian
classification method and the following data. Show all the stepwise
calculations clearly.
age          income  student  credit_rating  Class: buys_computer
youth        high    no       fair           no
youth        high    no       excellent      no
middle_aged  high    no       fair           yes
senior       medium  no       fair           yes
senior       low     yes      fair           yes
senior       low     yes      excellent      no
middle_aged  low     yes      excellent      yes
youth        medium  no       fair           no
youth        low     yes      fair           yes
senior       medium  yes      fair           yes
youth        medium  yes      excellent      yes
middle_aged  medium  no       excellent      yes
middle_aged  high    yes      fair           yes
senior       medium  no       excellent      no
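The Naive Bayesian calculation for the tuple X can be sketched in Python as a check. This assumes plain relative-frequency estimates with no Laplace smoothing, and compares P(X | C) * P(C) for each class; the function name `score` is illustrative.

```python
# Each row: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def score(rows, x, label):
    """P(label) * product over attributes of P(x_i | label), from raw counts."""
    in_class = [r for r in rows if r[-1] == label]
    p = len(in_class) / len(rows)  # prior P(label)
    for i, value in enumerate(x):
        p *= sum(1 for r in in_class if r[i] == value) / len(in_class)
    return p


X = ("youth", "medium", "yes", "fair")
scores = {c: score(ROWS, X, c) for c in ("yes", "no")}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

Here P(X | yes) * P(yes) is about 0.028 and P(X | no) * P(no) is about 0.007, so the predicted class is buys_computer = yes.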

Q.5 (a) Discuss web structure mining. 03


(b) Discuss multimedia mining. 04
(c) Suppose that the data mining task is to cluster the following eight points 07
(with (x, y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9)
The distance function is Euclidean distance. Suppose initially we assign
A1, B1, and C1 as the center of each cluster, respectively. Using the
k-means algorithm, calculate:
(i) The three cluster centers after the first round of execution
(ii) The final three clusters
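Both parts of this question can be checked with a small k-means sketch in Python. It assumes the standard assign-then-update loop, stops when the assignment no longer changes, and does not handle empty clusters (none occur with this data). The function names are illustrative; `math.dist` requires Python 3.8 or later.

```python
from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}


def assign(centers):
    """Assign each point to the index of its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def update(clusters):
    """Recompute each center as the mean of the points in its cluster."""
    return [tuple(sum(points[n][d] for n in c) / len(c) for d in range(2))
            for c in clusters]


initial = [points["A1"], points["B1"], points["C1"]]

# (i) centers after the first round: one assignment followed by one update
first_round_centers = update(assign(initial))

# (ii) iterate until the cluster membership stops changing
clusters = assign(initial)
while True:
    new_clusters = assign(update(clusters))
    if new_clusters == clusters:
        break
    clusters = new_clusters
final_clusters = clusters

print(first_round_centers)
print(final_clusters)
```

After the first round the centers are (2, 10), (6, 6) and (1.5, 3.5); the loop settles on the clusters {A1, B1, C2}, {A3, B2, B3} and {A2, C1}.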
OR
Q.5 (a) Discuss agglomerative hierarchical clustering method in brief. 03
(b) Explain any four typical requirements of clustering in data mining. 04
(c) What is web mining? Explain web usage mining in detail. 07

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – SUMMER 2022
Subject Code:3160714 Date:08/06/2022
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
Marks
Q.1 (a) What are the types of data? 03
(b) Compare descriptive and predictive data mining. 04
(c) Draw and explain the data mining architecture. 07

Q.2 (a) What is dimensionality reduction? 03


(b) What are the types of concept hierarchies? 04
(c) What is Data Cleaning? Describe various methods of Data Cleaning. 07
OR
(c) Discuss issues to be considered during data integration. 07

Q.3 (a) What is meant by association rule? 03


(b) How are association rules mined from large databases? 04
(c) Explain the various criteria for the classification of frequent pattern mining. 07
OR
Q.3 (a) List two interesting measures for association rules. 03
(b) What is meant by multidimensional association rules? 04
(c) Write short notes on Maximal Frequent Item Set and Closed Frequent Item Set. 07

Q.4 (a) What is an outlier? 03


(b) What is Bayesian theorem? 04
(c) Demonstrate how Bayesian classification helps in predicting class 07
membership probabilities.
OR
Q.4 (a) Differentiate classification and prediction. 03
(b) What is the difference between “supervised” and “unsupervised” learning 04
schemes?
(c) Explain the issues regarding the classification and prediction. 07

Q.5 (a) What is temporal mining? 03


(b) Explain web usage mining. 04
(c) Discuss the K-means clustering algorithm using examples. 07
OR
Q.5 (a) What is multimedia mining? 03
(b) Explain web content mining. 04
(c) What do you mean by Clustering? Explain the requirements of 07
Clustering.
*************
