B.Tech May2022 Comp CSPE-64 Sem4

The document is a theory examination question paper for a 4th semester B.Tech course on Data Mining and Data Warehousing. It contains 6 questions with multiple parts assessing different concepts related to data mining techniques, data pre-processing, clustering, association rule mining and classification.


NATIONAL INSTITUTE OF TECHNOLOGY,
KURUKSHETRA
THEORY EXAMINATION
Roll No.

Month and year: May 2022    Total no. of pages used: 4

Program: B.Tech.    Semester: 4th
Subject: Data Mining and Data Warehousing    Course code: CSPE-64
Maximum Marks: 50    Time allowed: 03 Hours

NOTE: 1. The question paper contains SIX questions.

2. All questions are compulsory.

3. Attempt all parts of a question together at one place.

4. Assume suitable data if missing.

Q1. Attempt all parts of the following: [Marks: 1+3+2+2]

(i) Explain why the term Data Mining is a misnomer.

(ii) Draw a diagram depicting Data Mining as a step in the process of Knowledge Discovery
from Data (KDD).

(iii) Discuss whether or not each of the following activities is a data mining task:
(a) Dividing the customers of a company according to their gender.
(b) Dividing the customers of a company according to their profitability.
(c) Computing the total sales of a company.
(d) Monitoring the heart rate of a patient for abnormalities.

(iv) Draw a Venn diagram showing the relationship of Data Mining with Artificial Intelligence
(AI), Machine Learning (ML), and Deep Learning (DL).

Q2. Attempt all parts of the following: [Marks: 2 × 4 = 8]

(i) Classify the following attributes as discrete or continuous. Also, classify them as qualitative
(nominal or ordinal) or quantitative (interval or ratio).
(a) Angles as measured in degrees between 0 and 360.
(b) ISBN numbers for books.

(ii) A shot-put player records the following scores (in meters): 16.8, 16.9, 17.1, 17.2, 17.8,
17.9, 18.2, 18.3, 18.3, 18.5. Find the 10% trimmed mean.
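The trimmed mean in part (ii) can be checked with a short Python sketch (scores transcribed from the question; the convention assumed here is that a 10% trim drops one point from each tail of the sorted data):

```python
def trimmed_mean(values, proportion):
    """Average after dropping `proportion` of the points from each tail."""
    vals = sorted(values)
    k = int(len(vals) * proportion)      # points trimmed per tail
    kept = vals[k:len(vals) - k]
    return sum(kept) / len(kept)

scores = [16.8, 16.9, 17.1, 17.2, 17.8, 17.9, 18.2, 18.3, 18.3, 18.5]
print(trimmed_mean(scores, 0.10))        # ~17.7125
```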

(iii) Determine the interquartile range value for the first ten prime numbers.
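For part (iii), a sketch under one common quartile convention (split the even-sized sorted list into halves, take the median of each half; other conventions give slightly different quartiles):

```python
def median(xs):
    """Median of a list: middle value, or mean of the two middle values."""
    xs = sorted(xs)
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # first ten primes
q1 = median(primes[:5])                          # lower half -> Q1 = 5
q3 = median(primes[5:])                          # upper half -> Q3 = 19
iqr = q3 - q1                                    # IQR = 14
```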

(iv) Suppose that the minimum and maximum values for the attribute income are Rs 12,000
and Rs 98,000, respectively. Also, the mean and standard deviation are Rs 54,000 and Rs 16,000,
respectively. Normalize a value of Rs 73,000 for income using
(a) min-max normalization to the range [0.0, 1.0]
(b) z-score normalization
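Both normalizations in part (iv) are one-line formulas; a quick sketch with the values from the question:

```python
# Attribute statistics from the question
lo, hi = 12_000, 98_000        # min and max income
mean, std = 54_000, 16_000     # mean and standard deviation
v = 73_000                     # value to normalize

minmax = (v - lo) / (hi - lo)  # (a) min-max to [0.0, 1.0]: 61000/86000
zscore = (v - mean) / std      # (b) z-score: 19000/16000 = 1.1875
```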

Q3. Attempt all parts of the following: [Marks: 2 × 4 = 8]


(i) Define the term data warehouse. Draw a diagram showing a typical framework for the
construction and use ofa data warehouse.

(ii) What is a data cube? Consider a data cube for summarized sales data of AllElectronics
presented in the figure below. The cube has three dimensions: address (with city values
Chicago, New York, Toronto, Vancouver), time (with quarter values Q1, Q2, Q3, Q4), and
item (with item type values home entertainment, computer, phone, security). The aggregate
value stored in each cell of the cube is the sales amount (in thousands). Find the total sales for
the first quarter, Q1, for the items related to security systems in Vancouver.

[Figure: 3-D data cube of AllElectronics sales with dimensions address (Chicago, New York,
Toronto, Vancouver), time (Q1-Q4), and item (home entertainment, computer, phone,
security); the cell values are not legible in the scan.]

(iii) List out the major steps (or methods) involved in data pre-processing.

(iv) List out the ways of handling missing values.

Q4. Attempt all parts of the following: [Marks: 2 + 3 + 3]


(i) Consider the following set of frequent 3-itemsets:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {3, 4, 5}.
Assume that there are only five items in the data set.
(a) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
(b) List all candidate 4-itemsets that survive the candidate pruning step of Apriori.
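Parts (a) and (b) can be checked mechanically. A minimal sketch of the F(k-1) × F(k-1) candidate generation and the subset-based pruning step used by Apriori:

```python
from itertools import combinations

# Frequent 3-itemsets from the question, as sorted tuples
freq3 = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
         (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5)]

# (a) Candidate generation: merge two 3-itemsets that agree on
# their first two items, keeping items in sorted order
candidates = sorted(
    a + (b[-1],)
    for a in freq3 for b in freq3
    if a[:-1] == b[:-1] and a[-1] < b[-1]
)

# (b) Candidate pruning: keep a 4-itemset only if every one of its
# 3-item subsets is frequent
survivors = [c for c in candidates
             if all(s in freq3 for s in combinations(c, 3))]
```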

(ii) Draw a lattice structure for the association rules generated from the frequent itemset
{a, b, c, d}. Given that the confidence of the rule {a, b, d} → {c} is low, then by using
confidence-based pruning, identify the rules that can be pruned and also highlight these
pruned rules in the lattice.

(iii) The figure below shows a data set that contains 10 transactions and 5 items along
with its FP-tree representation.

TID  Items
1    {a, b}
2    {b, c, d}
3    {a, c, d, e}
4    {a, d, e}
5    {a, b, c}
6    {a, b, c, d}
7    {a}
8    {a, b, c}
9    {a, b, d}
10   {b, c, e}

[Figure: FP-tree for the transactions above; node counts not fully legible in the scan.]

Construct the conditional FP-tree for the suffix {c, d} using FP-growth (assume the minimum
support count is 2). Also, find all the frequent itemsets generated from this conditional FP-tree.
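The frequent itemsets that the conditional FP-tree for suffix {c, d} must yield can be cross-checked by brute force (this is not FP-growth itself, just direct support counting over the transactions as transcribed above):

```python
from itertools import combinations

transactions = [
    {'a', 'b'}, {'b', 'c', 'd'}, {'a', 'c', 'd', 'e'}, {'a', 'd', 'e'},
    {'a', 'b', 'c'}, {'a', 'b', 'c', 'd'}, {'a'}, {'a', 'b', 'c'},
    {'a', 'b', 'd'}, {'b', 'c', 'e'},
]
minsup = 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

# Extend the suffix {c, d} with every combination of the other items
# and keep the extensions meeting the minimum support count
rest = ['a', 'b', 'e']
frequent_cd = [
    set(extra) | {'c', 'd'}
    for r in range(len(rest) + 1)
    for extra in combinations(rest, r)
    if support(set(extra) | {'c', 'd'}) >= minsup
]
```

Only {c, d}, {a, c, d}, and {b, c, d} survive, which is what the conditional FP-tree should produce.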

Q5. Attempt all parts of the following: [Marks: 3 + 2 + 3]

(i) Consider the following data set for a binary classification problem. Calculate the gain in
the Gini index when splitting on A and B. Which attribute would the decision tree induction
algorithm choose?

A  B  Class Label
T  F  +
T  T  +
T  T  +
T  F  -
T  T  +
F  F  -
F  F  -
F  F  -
T  T  -
T  F  -
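The Gini gains in part (i) can be verified with a short sketch (records as the table appears to read after OCR cleanup; the missing class labels were reconstructed as "-", so treat the data as an assumption):

```python
def gini(counts):
    """Gini index of a node given per-class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Records (A, B, Class) transcribed from the question's table
records = [('T', 'F', '+'), ('T', 'T', '+'), ('T', 'T', '+'), ('T', 'F', '-'),
           ('T', 'T', '+'), ('F', 'F', '-'), ('F', 'F', '-'), ('F', 'F', '-'),
           ('T', 'T', '-'), ('T', 'F', '-')]

def gini_gain(attr_index):
    """Parent Gini minus the weighted Gini of the children after the split."""
    parent = gini([sum(r[2] == '+' for r in records),
                   sum(r[2] == '-' for r in records)])
    weighted = 0.0
    for v in ('T', 'F'):
        part = [r for r in records if r[attr_index] == v]
        if part:
            counts = [sum(r[2] == '+' for r in part),
                      sum(r[2] == '-' for r in part)]
            weighted += len(part) / len(records) * gini(counts)
    return parent - weighted

gain_a, gain_b = gini_gain(0), gini_gain(1)   # B gives the larger gain
```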


(ii) Consider a training set that contains 100 positive examples and 400 negative examples.
Find the FOIL's information gain for a rule R: C → + (which covers 100 positive and 90
negative examples).
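FOIL's gain compares the log-precision of the refined rule against the initial rule, weighted by the positives still covered; a sketch with the counts from the question:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain when refining a rule that covered
    (p0 positives, n0 negatives) into one covering (p1, n1)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Initial coverage: the whole training set (100 pos, 400 neg);
# rule R covers 100 positives and 90 negatives
gain = foil_gain(100, 400, 100, 90)   # ~139.59
```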

(iii) The figure below shows a confusion matrix for medical data where the class values are
yes and no for a class label attribute, cancer. Calculate the sensitivity, specificity, overall
accuracy, precision, and recall of the classifier.

Classes  yes   no
yes      90    210
no       140   9560
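All five measures follow directly from the four matrix cells. A sketch, with the caveat that the counts below are how the scanned matrix appears to read and should be treated as an assumption:

```python
# Counts as the scanned matrix appears to read (an assumption):
tp, fn = 90, 210      # actual class yes: true positives, false negatives
fp, tn = 140, 9560    # actual class no:  false positives, true negatives

sensitivity = tp / (tp + fn)                  # true positive rate
specificity = tn / (tn + fp)                  # true negative rate
accuracy    = (tp + tn) / (tp + fn + fp + tn)
precision   = tp / (tp + fp)
recall      = sensitivity                     # recall == sensitivity
```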

Q6. Attempt any FOUR parts of the following: [Marks: 2.5 × 4 = 10]

(i) Consider the 1-dimensional data set with 10 data points {1, 2, 3, ..., 10}. Show three
iterations of the k-means algorithm when k = 2, and the random seeds are initialized to {1, 2}.
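The three iterations in part (i) can be traced with a minimal 1-D k-means sketch (each iteration assigns every point to its nearest centroid, then recomputes the centroids as cluster means):

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of assign-then-update k-means iterations in 1-D."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

points = list(range(1, 11))                      # {1, 2, ..., 10}
centroids, clusters = kmeans_1d(points, [1.0, 2.0], 3)
# iteration 1: centroids 1.0 and 6.0; iteration 2: 2.0 and 7.0;
# iteration 3: 2.5 and 7.5, clusters {1..4} and {5..10}
```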

(ii) Use the similarity matrix in the table below to perform single-link hierarchical clustering.
Show your results by drawing a dendrogram.

     p1    p2    p3    p4    p5
p1   1.00  0.10  0.41  0.55  0.35
p2   0.10  1.00  0.64  0.47  0.98
p3   0.41  0.64  1.00  0.44  0.85
p4   0.55  0.47  0.44  1.00  0.76
p5   0.35  0.98  0.85  0.76  1.00
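The merge order for the dendrogram in part (ii) can be computed with a small single-link sketch over the similarity matrix (entries as they appear to read after OCR cleanup): at each step, merge the two clusters whose most similar pair of members has the highest similarity.

```python
# Pairwise similarities (upper triangle; the matrix is symmetric)
sim = {
    ('p1', 'p2'): 0.10, ('p1', 'p3'): 0.41, ('p1', 'p4'): 0.55,
    ('p1', 'p5'): 0.35, ('p2', 'p3'): 0.64, ('p2', 'p4'): 0.47,
    ('p2', 'p5'): 0.98, ('p3', 'p4'): 0.44, ('p3', 'p5'): 0.85,
    ('p4', 'p5'): 0.76,
}

def s(a, b):
    return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

def single_link_sim(c1, c2):
    """Single link on similarities: most similar cross-cluster pair."""
    return max(s(x, y) for x in c1 for y in c2)

clusters = [frozenset({p}) for p in ('p1', 'p2', 'p3', 'p4', 'p5')]
merges = []                      # (members, similarity level) per merge
while len(clusters) > 1:
    best = max(((a, b) for i, a in enumerate(clusters)
                for b in clusters[i + 1:]),
               key=lambda pair: single_link_sim(*pair))
    level = single_link_sim(*best)
    clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    merges.append((sorted(best[0] | best[1]), level))
```

The merges happen at similarities 0.98 (p2, p5), then 0.85 (add p3), 0.76 (add p4), and 0.55 (add p1), which fixes the dendrogram's shape and heights.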

(iii) How does DBSCAN find clusters? Explain briefly.

(iv) Write short notes on any one of the following: Cross-Validation OR Bootstrap OR
Ensemble Methods.

(v) What are outliers? Discuss a distance-based outlier detection method.
