B.Tech May2022 Comp CSPE-64 Sem4
KURUKSHETRA
THEORY EXAMINATION
Roll No.
(ii) Draw a diagram depicting Data Mining as a step in the process of Knowledge Discovery
from Data (KDD).
(iii) Discuss whether or not each of the following activities is a data mining task
(a) Dividing the customers of a company according to their gender.
(b) Dividing the customers of a company according to their profitability.
(c) Computing the total sales of a company.
(d) Monitoring the heart rate of a patient for abnormalities.
(iv) Draw a Venn diagram showing the relationship of Data Mining with Artificial Intelligence
(AI), Machine Learning (ML), and Deep Learning (DL).
(i) Classify the following attributes as discrete or continuous. Also, classify them as qualitative
(nominal or ordinal) or quantitative (interval or ratio).
(a) Angles as measured in degrees between 0 and 360.
(b) ISBN numbers for books.
(ii) A shot-put player records the following scores (in meters): 16.8, 16.9, 17.1, 17.2, 17.8,
17.9, 18.2, 18.3, 18.3, 18.5. Find the 10% trimmed mean.
(iii) Determine the interquartile range value for the first ten prime numbers.
(iv) Suppose that the minimum and maximum values for the attribute income are Rs 12,000
and 98,000, respectively. Also, the mean and standard deviation are 54,000 and 16,000, respectively.
Normalize a value of 73,000 for income using
(a) min-max normalization to the range [0.0, 1.0]
(b) z-score normalization
(ii) What is a data cube? Consider the data cube for summarized sales data of AllElectronics
presented in the figure below. The cube has three dimensions: address (with city values
Chicago, New York, Toronto, Vancouver), time (with quarter values Q1, Q2, Q3, Q4), and
item (with item type values home entertainment, computer, phone, security). The aggregate
value stored in each cell of the cube is the sales amount (in thousands). Find the total sales for
the first quarter, Q1, for the items related to security systems in Vancouver.
[Figure: 3-D data cube of sales data with dimensions address (cities), time (quarters), and item (types)]
(iii) List out the major steps (or methods) involved in data pre-processing.
(ii) Draw a lattice structure for the association rules generated from the frequent itemset
{a, b, c, d}. Given that the confidence of the rule {a, b, d} -> {c} is low, use confidence-based
pruning to identify the rules that can be pruned, and highlight these pruned rules in
the lattice.
(iii) The figure below shows a data set that contains 10 transactions and 5 items along
with its FP-tree representation.
TID   Items
1     {a,b}
2     {b,c,d}
3     {a,c,d,e}
4     {a,d,e}
5     {a,b,c}
6     {a,b,c,d}
7     {a}
8     {a,b,c}
9     {a,b,d}
10    {b,c,e}
[Figure: FP-tree representation of the data set]
[Marks: 3+2+3]
Q5. Attempt all parts of the following:
(i) Consider the following data set for a binary classification problem. Calculate the gain in
the Gini index when splitting on A and B. Which attribute would the decision tree induction
algorithm choose?
A   B   Class Label
T   F   +
T   T   +
T   T   +
T   F   -
T   T   +
F   F   -
F   F   -
F   F   -
T   T   -
T   F   -
(ii) Consider a training set that contains 100 positive examples and 400 negative examples.
Find the FOIL's information gain for a rule R: C -> + (which covers 100 positive and 90
negative examples).
(iii) The figure below shows a confusion matrix for medical data where the class values are yes
and no for a class label attribute, cancer. Calculate the sensitivity, specificity, overall accuracy,
precision, and recall of the classifier.
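The Gini and FOIL computations in parts (i) and (ii) can be sketched in Python as below. Two assumptions are made here. First, every row of the table in part (i) that shows no "+" label is taken to belong to the negative class "-". Second, FOIL's information gain is taken in its usual form, p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0, n0 are the positive and negative counts before adding the rule and p1, n1 are the counts the rule covers. The function names are illustrative only.

```python
from math import log2


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_gain(rows, attr_index):
    """Reduction in Gini index obtained by splitting rows on one attribute."""
    labels = [r[-1] for r in rows]
    weighted = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        weighted += len(subset) / len(rows) * gini(subset)
    return gini(labels) - weighted


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain under the definition assumed above."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


# Columns: A, B, class. Rows without a visible "+" are ASSUMED to be "-".
rows = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]
gain_a = gini_gain(rows, 0)
gain_b = gini_gain(rows, 1)
rule_gain = foil_gain(p0=100, n0=400, p1=100, n1=90)

print(round(gain_a, 4))     # 0.1371
print(round(gain_b, 4))     # 0.1633
print(round(rule_gain, 1))  # 139.6
```

Under these assumptions the split on B yields the larger reduction in Gini index, so a Gini-based tree induction algorithm would prefer B.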
Q6. Attempt any FOUR parts of the following: [Marks: 2.5 × 4 = 10]
(i) Consider the 1-dimensional data set with 10 data points {1, 2, 3, ..., 10}. Show three iterations
of the k-means algorithm when k = 2, and the random seeds are initialized to {1, 2}.
(ii) Use the similarity matrix in the below Table to perform single-link hierarchical clustering.
Show your results by drawing a dendrogram.
      p1     p2     p3     p4     p5
p1    1.00   0.10   0.41   0.55   0.35
(iv) Write short notes on any one of the following: Cross-Validation OR Bootstrap OR
Ensemble Methods.
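The three k-means iterations asked for in part (i) can be traced with a minimal Python sketch of the standard assign-then-update loop. The function name kmeans_1d is hypothetical, and the tie-breaking rule (a point equidistant from two centers goes to the lower-indexed one) is an assumption, although no ties arise for this data.

```python
def kmeans_1d(points, centers, iterations):
    """Run a fixed number of assign/update iterations on 1-D data, recording each step."""
    history = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        history.append((clusters, centers))
    return history


history = kmeans_1d(list(range(1, 11)), [1.0, 2.0], iterations=3)
for step, (clusters, centers) in enumerate(history, start=1):
    print(step, clusters, centers)
# 1 [[1], [2, 3, 4, 5, 6, 7, 8, 9, 10]] [1.0, 6.0]
# 2 [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10]] [2.0, 7.0]
# 3 [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]] [2.5, 7.5]
```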