B.Tech May2022 Comp CSPE-64 Sem4

The document is a theory examination question paper for a 4th semester B.Tech course on Data Mining and Data Warehousing. It contains 6 questions with multiple parts assessing different concepts related to data mining techniques, data pre-processing, clustering, association rule mining and classification.


NATIONAL INSTITUTE OF TECHNOLOGY,
KURUKSHETRA
THEORY EXAMINATION
Roll No.

Month and year: May 2022    Total no. of pages used: 4

Program: B.Tech.    Semester: 4th
Subject: Data Mining and Data Warehousing    Course code: CSPE-64
Maximum Marks: 50    Time allowed: 03 Hours

NOTE: 1. The question paper contains SIX questions.

2. All questions are compulsory.

3. Attempt all parts of a question together at one place.

4. Assume suitable data if missing.

Q1. Attempt all parts of the following: [Marks: 1+3+2+2]

(i) Explain why the term Data Mining is a misnomer.

(ii) Draw a diagram depicting Data Mining as a step in the process of Knowledge Discovery
from Data (KDD).

(iii) Discuss whether or not each of the following activities is a data mining task:
(a) Dividing the customers of a company according to their gender.
(b) Dividing the customers of a company according to their profitability.
(c) Computing the total sales of a company.
(d) Monitoring the heart rate of a patient for abnormalities.

(iv) Draw a Venn diagram showing the relationship of Data Mining with Artificial Intelligence
(AI), Machine Learning (ML), and Deep Learning (DL).

Q2. Attempt all parts of the following: [Marks: 2 × 4 = 8]

(i) Classify the following attributes as discrete or continuous. Also, classify them as qualitative
(nominal or ordinal) or quantitative (interval or ratio).
(a) Angles as measured in degrees between 0 and 360.
(b) ISBN numbers for books.

(ii) A shot-put player records the following scores (in meters): 16.8, 16.9, 17.1, 17.2, 17.8,
17.9, 18.2, 18.3, 18.3, 18.5. Find the 10% trimmed mean.
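The trimmed mean in part (ii) can be checked with a short Python sketch (scores transcribed from the question; the convention assumed here is that a 10% trim drops one point from each tail of the sorted data):

```python
def trimmed_mean(values, proportion):
    """Average after dropping `proportion` of the points from each tail."""
    vals = sorted(values)
    k = int(len(vals) * proportion)      # points trimmed per tail
    kept = vals[k:len(vals) - k]
    return sum(kept) / len(kept)

scores = [16.8, 16.9, 17.1, 17.2, 17.8, 17.9, 18.2, 18.3, 18.3, 18.5]
print(trimmed_mean(scores, 0.10))        # ~17.7125
```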

(iii) Determine the interquartile range value for the first ten prime numbers.
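For part (iii), a sketch under one common quartile convention (split the even-sized sorted list into halves, take the median of each half; other conventions give slightly different quartiles):

```python
def median(xs):
    """Median of a list: middle value, or mean of the two middle values."""
    xs = sorted(xs)
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # first ten primes
q1 = median(primes[:5])                          # lower half -> Q1 = 5
q3 = median(primes[5:])                          # upper half -> Q3 = 19
iqr = q3 - q1                                    # IQR = 14
```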

(iv) Suppose that the minimum and maximum values for the attribute income are Rs 12,000
and Rs 98,000, respectively. Also, the mean and standard deviation are Rs 54,000 and Rs 16,000,
respectively. Normalize a value of Rs 73,000 for income using
(a) min-max normalization to the range [0.0, 1.0]
(b) z-score normalization
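Both normalizations in part (iv) are one-line formulas; a quick sketch with the values from the question:

```python
# Attribute statistics from the question
lo, hi = 12_000, 98_000        # min and max income
mean, std = 54_000, 16_000     # mean and standard deviation
v = 73_000                     # value to normalize

minmax = (v - lo) / (hi - lo)  # (a) min-max to [0.0, 1.0]: 61000/86000
zscore = (v - mean) / std      # (b) z-score: 19000/16000 = 1.1875
```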

Q3. Attempt all parts of the following: [Marks: 2 × 4 = 8]


(i) Define the term data warehouse. Draw a diagram showing a typical framework for the
construction and use ofa data warehouse.

(ii) What is a data cube? Consider a data cube for summarized sales data of AllElectronics
presented in the figure below. The cube has three dimensions: address (with city values
Chicago, New York, Toronto, Vancouver), time (with quarter values Q1, Q2, Q3, Q4), and
item (with item type values home entertainment, computer, phone, security). The aggregate
value stored in each cell of the cube is the sales amount (in thousands). Find the total sales for
the first quarter, Q1, for the items related to security systems in Vancouver.

[Figure: 3-D data cube of AllElectronics sales with dimensions address (Chicago, New York,
Toronto, Vancouver), time (Q1-Q4), and item (home entertainment, computer, phone,
security); the cell values are not legible in the scan.]

(iii) List out the major steps (or methods) involved in data pre-processing.

(iv) List out the ways of handling missing values.

Q4. Attempt all parts of the following: [Marks: 2 + 3 + 3]


(i) Consider the following set of frequent 3-itemsets:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {3, 4, 5}.
Assume that there are only five items in the data set.
(a) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
(b) List all candidate 4-itemsets that survive the candidate pruning step of Apriori.
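Parts (a) and (b) can be checked mechanically. A minimal sketch of the F(k-1) × F(k-1) candidate generation and the subset-based pruning step used by Apriori:

```python
from itertools import combinations

# Frequent 3-itemsets from the question, as sorted tuples
freq3 = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
         (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5)]

# (a) Candidate generation: merge two 3-itemsets that agree on
# their first two items, keeping items in sorted order
candidates = sorted(
    a + (b[-1],)
    for a in freq3 for b in freq3
    if a[:-1] == b[:-1] and a[-1] < b[-1]
)

# (b) Candidate pruning: keep a 4-itemset only if every one of its
# 3-item subsets is frequent
survivors = [c for c in candidates
             if all(s in freq3 for s in combinations(c, 3))]
```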

(ii) Draw a lattice structure for the association rules generated from the frequent itemset
{a, b, c, d}. Given that the confidence of the rule {a, b, d} → {c} is low, then by using
confidence-based pruning, identify the rules that can be pruned and also highlight these
pruned rules in the lattice.

(iii) The figure below shows a data set that contains 10 transactions and 5 items along
with its FP-tree representation.

TID  Items
1    {a, b}
2    {b, c, d}
3    {a, c, d, e}
4    {a, d, e}
5    {a, b, c}
6    {a, b, c, d}
7    {a}
8    {a, b, c}
9    {a, b, d}
10   {b, c, e}

[Figure: FP-tree for the transactions above; node counts not fully legible in the scan.]

Construct the conditional FP-tree for the suffix {c, d} using FP-growth (assume the minimum
support count is 2). Also, find all the frequent itemsets generated from this conditional FP-tree.
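The frequent itemsets that the conditional FP-tree for suffix {c, d} must yield can be cross-checked by brute force (this is not FP-growth itself, just direct support counting over the transactions as transcribed above):

```python
from itertools import combinations

transactions = [
    {'a', 'b'}, {'b', 'c', 'd'}, {'a', 'c', 'd', 'e'}, {'a', 'd', 'e'},
    {'a', 'b', 'c'}, {'a', 'b', 'c', 'd'}, {'a'}, {'a', 'b', 'c'},
    {'a', 'b', 'd'}, {'b', 'c', 'e'},
]
minsup = 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

# Extend the suffix {c, d} with every combination of the other items
# and keep the extensions meeting the minimum support count
rest = ['a', 'b', 'e']
frequent_cd = [
    set(extra) | {'c', 'd'}
    for r in range(len(rest) + 1)
    for extra in combinations(rest, r)
    if support(set(extra) | {'c', 'd'}) >= minsup
]
```

Only {c, d}, {a, c, d}, and {b, c, d} survive, which is what the conditional FP-tree should produce.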

Q5. Attempt all parts of the following: [Marks: 3 + 2 + 3]

(i) Consider the following data set for a binary classification problem. Calculate the gain in
the Gini index when splitting on A and B. Which attribute would the decision tree induction
algorithm choose?

A  B  Class Label
T  F  +
T  T  +
T  T  +
T  F  -
T  T  +
F  F  -
F  F  -
F  F  -
T  T  -
T  F  -
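The Gini gains in part (i) can be verified with a short sketch (records as the table appears to read after OCR cleanup; the missing class labels were reconstructed as "-", so treat the data as an assumption):

```python
def gini(counts):
    """Gini index of a node given per-class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Records (A, B, Class) transcribed from the question's table
records = [('T', 'F', '+'), ('T', 'T', '+'), ('T', 'T', '+'), ('T', 'F', '-'),
           ('T', 'T', '+'), ('F', 'F', '-'), ('F', 'F', '-'), ('F', 'F', '-'),
           ('T', 'T', '-'), ('T', 'F', '-')]

def gini_gain(attr_index):
    """Parent Gini minus the weighted Gini of the children after the split."""
    parent = gini([sum(r[2] == '+' for r in records),
                   sum(r[2] == '-' for r in records)])
    weighted = 0.0
    for v in ('T', 'F'):
        part = [r for r in records if r[attr_index] == v]
        if part:
            counts = [sum(r[2] == '+' for r in part),
                      sum(r[2] == '-' for r in part)]
            weighted += len(part) / len(records) * gini(counts)
    return parent - weighted

gain_a, gain_b = gini_gain(0), gini_gain(1)   # B gives the larger gain
```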


(ii) Consider a training set that contains 100 positive examples and 400 negative examples.
Find the FOIL's information gain for a rule R: C → + (which covers 100 positive and 90
negative examples).
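FOIL's gain compares the log-precision of the refined rule against the initial rule, weighted by the positives still covered; a sketch with the counts from the question:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain when refining a rule that covered
    (p0 positives, n0 negatives) into one covering (p1, n1)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Initial coverage: the whole training set (100 pos, 400 neg);
# rule R covers 100 positives and 90 negatives
gain = foil_gain(100, 400, 100, 90)   # ~139.59
```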

(iii) The figure below shows a confusion matrix for medical data where the class values are
yes and no for a class label attribute, cancer. Calculate the sensitivity, specificity, overall
accuracy, precision, and recall of the classifier.

Classes  yes   no
yes      90    210
no       140   9560
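All five measures follow directly from the four matrix cells. A sketch, with the caveat that the counts below are how the scanned matrix appears to read and should be treated as an assumption:

```python
# Counts as the scanned matrix appears to read (an assumption):
tp, fn = 90, 210      # actual class yes: true positives, false negatives
fp, tn = 140, 9560    # actual class no:  false positives, true negatives

sensitivity = tp / (tp + fn)                  # true positive rate
specificity = tn / (tn + fp)                  # true negative rate
accuracy    = (tp + tn) / (tp + fn + fp + tn)
precision   = tp / (tp + fp)
recall      = sensitivity                     # recall == sensitivity
```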

Q6. Attempt any FOUR parts of the following: [Marks: 2.5 × 4 = 10]

(i) Consider the 1-dimensional data set with 10 data points {1, 2, 3, ..., 10}. Show three
iterations of the k-means algorithm when k = 2, and the random seeds are initialized to {1, 2}.
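The three iterations in part (i) can be traced with a minimal 1-D k-means sketch (each iteration assigns every point to its nearest centroid, then recomputes the centroids as cluster means):

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of assign-then-update k-means iterations in 1-D."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

points = list(range(1, 11))                      # {1, 2, ..., 10}
centroids, clusters = kmeans_1d(points, [1.0, 2.0], 3)
# iteration 1: centroids 1.0 and 6.0; iteration 2: 2.0 and 7.0;
# iteration 3: 2.5 and 7.5, clusters {1..4} and {5..10}
```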

(ii) Use the similarity matrix in the table below to perform single-link hierarchical clustering.
Show your results by drawing a dendrogram.

     p1    p2    p3    p4    p5
p1   1.00  0.10  0.41  0.55  0.35
p2   0.10  1.00  0.64  0.47  0.98
p3   0.41  0.64  1.00  0.44  0.85
p4   0.55  0.47  0.44  1.00  0.76
p5   0.35  0.98  0.85  0.76  1.00
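The merge order for the dendrogram in part (ii) can be computed with a small single-link sketch over the similarity matrix (entries as they appear to read after OCR cleanup): at each step, merge the two clusters whose most similar pair of members has the highest similarity.

```python
# Pairwise similarities (upper triangle; the matrix is symmetric)
sim = {
    ('p1', 'p2'): 0.10, ('p1', 'p3'): 0.41, ('p1', 'p4'): 0.55,
    ('p1', 'p5'): 0.35, ('p2', 'p3'): 0.64, ('p2', 'p4'): 0.47,
    ('p2', 'p5'): 0.98, ('p3', 'p4'): 0.44, ('p3', 'p5'): 0.85,
    ('p4', 'p5'): 0.76,
}

def s(a, b):
    return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

def single_link_sim(c1, c2):
    """Single link on similarities: most similar cross-cluster pair."""
    return max(s(x, y) for x in c1 for y in c2)

clusters = [frozenset({p}) for p in ('p1', 'p2', 'p3', 'p4', 'p5')]
merges = []                      # (members, similarity level) per merge
while len(clusters) > 1:
    best = max(((a, b) for i, a in enumerate(clusters)
                for b in clusters[i + 1:]),
               key=lambda pair: single_link_sim(*pair))
    level = single_link_sim(*best)
    clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    merges.append((sorted(best[0] | best[1]), level))
```

The merges happen at similarities 0.98 (p2, p5), then 0.85 (add p3), 0.76 (add p4), and 0.55 (add p1), which fixes the dendrogram's shape and heights.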

(iii) How does DBSCAN find clusters? Explain briefly.

(iv) Write short notes on any one of the following: Cross-Validation OR Bootstrap OR
Ensemble Methods.

(v) What are outliers? Discuss a distance-based outlier detection method.
