Elective - Data Mining and Data Warehousing

1) The document is an exam paper for a computer science course assessing students' knowledge of data mining concepts and techniques.
2) It contains 3 questions testing students on data preprocessing, data mining processes, and typical data mining algorithms such as chi-square tests, principal component analysis, and linear regression.
3) Students must choose 2 out of 3 subquestions for each of the 3 main questions, applying their understanding of data mining concepts to analyze various case studies and datasets.

Uploaded by Tejas Patil
Copyright © All Rights Reserved

Sanjay Ghodawat University, Kolhapur 2020-21

EXM/P/09/00
Established as State Private University under Govt. of Maharashtra. Act No XL, 2017
Year and Program: TY, B.Tech
School: Technology
Department: Computer Science & Engineering
Course Code: CST313.3
Course Title: Program Vertical I (DMDW)
Semester: V/Odd
Examination: CAT - I
Date: 10/9/2020 (Thursday)
Time: 3.00 PM to 5.00 PM
Max Marks: 50

Instructions:
1) Draw neat and suitable diagrams.
2) Use the Chi-Square table for calculation.
3) Mention formulae wherever necessary.
Q.1 Attempt any two (Marks | Bloom’s Level | CO)
a) Explain the data cleaning and data integration steps of data preprocessing. Demonstrate the entity identification problem with respect to student database data.
   [Marks: 08 | Bloom’s Level: L2 | CO1]
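A minimal Python sketch of the two steps on a hypothetical student database: all records, the field names `roll_no` and `student_id`, and the mapping `SCHEMA_MAP` are invented for illustration. Cleaning trims stray whitespace and fills missing values; entity identification recognises that `student_id` in one source and `roll_no` in the other denote the same attribute, and that `cs101` and `CS101` denote the same student, before the records are merged.

```python
# Hypothetical records from two sources describing the same students.
admissions = [
    {"roll_no": "CS101", "name": " Asha Patil ", "city": "Kolhapur"},
    {"roll_no": "CS102", "name": "Ravi Shinde", "city": None},
]
exam_office = [
    {"student_id": "cs101", "marks": 78},
    {"student_id": "CS102", "marks": None},
]

# Entity identification: hypothetical mapping of equivalent attribute names.
SCHEMA_MAP = {"student_id": "roll_no"}


def clean(record, defaults):
    """Data cleaning: trim whitespace and fill missing values with defaults."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None:
            value = defaults.get(key)
        cleaned[key] = value
    return cleaned


def unify_keys(record):
    """Rename equivalent attributes and normalise the ID's letter case."""
    unified = {SCHEMA_MAP.get(k, k): v for k, v in record.items()}
    unified["roll_no"] = unified["roll_no"].upper()
    return unified


def integrate(source_a, source_b):
    """Data integration: merge both sources into one record per student."""
    known_marks = [r["marks"] for r in source_b if r["marks"] is not None]
    mean_marks = sum(known_marks) / len(known_marks)
    merged = {}
    for r in source_a:
        r = unify_keys(clean(r, {"city": "Unknown"}))
        merged[r["roll_no"]] = r
    for r in source_b:
        r = unify_keys(clean(r, {"marks": mean_marks}))
        merged.setdefault(r["roll_no"], {}).update(r)
    return merged


students = integrate(admissions, exam_office)
for roll_no, record in students.items():
    print(roll_no, record)
```

Here the missing mark for CS102 is filled with the mean of the known marks; ignoring the tuple or using a global constant are other common cleaning strategies.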
b) Explain data normalization using:
   1. Min-max transformation
   2. Z-score transformation
   3. Decimal scaling
   [Marks: 08 | Bloom’s Level: L2 | CO1]
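The three normalization formulas can be sketched in Python as follows. As sample input this reuses the values X = {6, 8, 5, 4, 9, 3, 7} from Q.1 c); the target range [0, 1] for min-max and the use of the population standard deviation for the z-score are assumptions, since the question fixes neither.

```python
import statistics

# Sample values reused from Q.1 c); any numeric attribute works the same way.
X = [6, 8, 5, 4, 9, 3, 7]


def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """v' = (v - mean) / std, using the population standard deviation (assumption)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """v' = v / 10**j, where j is the smallest integer with max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(min_max(X))
print(z_score(X))
print(decimal_scaling(X))
```

For this sample the mean is 6 and the population standard deviation is 2, so 9 maps to a z-score of 1.5; decimal scaling uses j = 1 because the largest absolute value is 9.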
c) Demonstrate the forward and inverse data transformation on the following data using the Haar discrete wavelet transform technique of data reduction.
   X = {6, 8, 5, 4, 9, 3, 7}
   [Marks: 08 | Bloom’s Level: L3 | CO2]
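A sketch of the forward and inverse Haar transform in its unnormalised averaging-and-differencing form: pairwise averages (a + b)/2 and detail coefficients (a - b)/2 (the orthonormal variant divides by √2 instead). The transform needs a length that is a power of 2, so the seven values are zero-padded to eight; that padding choice is an assumption the question does not state.

```python
def haar_forward(data):
    """Full multi-level Haar decomposition by pairwise averaging and differencing."""
    n = 1
    while n < len(data):
        n *= 2
    coeffs = list(data) + [0.0] * (n - len(data))  # zero-pad to a power of 2
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        # Averages go to the front and are decomposed again at the next level.
        coeffs[:length] = averages + details
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Undo haar_forward: each (average a, detail d) pair becomes (a + d, a - d)."""
    values = list(coeffs)
    length = 1
    while length < len(values):
        averages = values[:length]
        details = values[length:2 * length]
        rebuilt = []
        for a, d in zip(averages, details):
            rebuilt.extend([a + d, a - d])
        values[:2 * length] = rebuilt
        length *= 2
    return values


X = [6, 8, 5, 4, 9, 3, 7]
coefficients = haar_forward(X)
restored = haar_inverse(coefficients)[:len(X)]  # drop the padding value
print(coefficients)
print(restored)
```

For X this gives the coefficients [5.25, 0.5, 1.25, 1.25, -1, 0.5, 3, 3.5], where 5.25 is the overall average of the padded vector; the inverse transform rebuilds the padded vector exactly, and dropping the final padding value recovers X. Data reduction would keep only the larger coefficients and set small details to zero before inverting.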
Q.2 Attempt any two
a) Justify the need for data mining.
   [Marks: 08 | Bloom’s Level: L2 | CO1]
b) Explain the data mining / KDD process and present a case study of how data mining is used on business and medical data.
   [Marks: 08 | Bloom’s Level: L3 | CO1]
c) Explain typical OLAP operations with a suitable example.
   [Marks: 08 | Bloom’s Level: L3 | CO2]
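To make the operations concrete, here is a small Python sketch over a hypothetical sales cube (the years, quarters, cities, items and unit counts are invented). Roll-up aggregates away dimensions, drill-down is the reverse (keeping more dimensions, e.g. year and quarter instead of year alone), slice fixes one dimension, dice selects a sub-cube on several dimensions, and pivot rotates the view.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, city, item, units_sold)
FACTS = [
    (2020, "Q1", "Pune", "Laptop", 10),
    (2020, "Q1", "Mumbai", "Laptop", 7),
    (2020, "Q2", "Pune", "Phone", 12),
    (2020, "Q2", "Mumbai", "Phone", 5),
    (2021, "Q1", "Pune", "Laptop", 8),
    (2021, "Q1", "Mumbai", "Phone", 9),
]
DIMS = {"year": 0, "quarter": 1, "city": 2, "item": 3}


def roll_up(facts, keep):
    """Roll-up: sum units over every dimension not listed in `keep`.
    Passing more dimensions (e.g. year and quarter) acts as a drill-down."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMS[d]] for d in keep)
        totals[key] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in facts if row[DIMS[dim]] == value]


def dice(facts, criteria):
    """Dice: keep rows whose values lie in the allowed set for several dimensions."""
    return [row for row in facts
            if all(row[DIMS[d]] in allowed for d, allowed in criteria.items())]


def pivot(facts, row_dim, col_dim):
    """Pivot: rotate the view into a row_dim x col_dim grid of totals."""
    grid = defaultdict(dict)
    for (r, c), total in roll_up(facts, [row_dim, col_dim]).items():
        grid[r][c] = total
    return dict(grid)


print(roll_up(FACTS, ["year"]))
print(slice_(FACTS, "city", "Pune"))
print(dice(FACTS, {"city": {"Pune"}, "item": {"Laptop"}}))
print(pivot(FACTS, "city", "item"))
```

For example, `roll_up(FACTS, ["year"])` gives {(2020,): 34, (2021,): 17}, while `roll_up(FACTS, ["year", "quarter"])` is the corresponding drill-down.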
Q.3 Attempt any two
a) Use the Chi-square (χ²) correlation test for the given case study: A university conducted a survey of its undergraduate students. The survey revealed that a substantial proportion of students were not engaging in regular study: in response to a question on regular study, 60% of all graduates reported doing no regular study, 25% reported reading periodically and 15% reported studying regularly. The next year the University launched a motivational program on campus in an attempt to increase regular reading among undergraduates. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates, and the following observed frequencies were collected.
   [Marks: 09 | Bloom’s Level: L3 | CO2]

                   No Regular Study   Periodic Study   Regular Study   Total
No. of Students          255               125              90          470

Based on the data, is there evidence of a shift in the distribution of responses to the regular study question following the implementation of the motivational program on campus? Run the test at a 5% level of significance. (Hint: use the null hypothesis “No change / no difference after the motivational program”.)
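The arithmetic of the test can be checked with a short Python sketch. Because the question compares one observed distribution against last year's percentages, the computation takes the goodness-of-fit form: expected counts are 470 × 0.60 = 282, 470 × 0.25 = 117.5 and 470 × 0.15 = 70.5, with k - 1 = 2 degrees of freedom. The critical value 5.991 (α = 0.05, df = 2) is taken from a standard chi-square table, as the instructions suggest.

```python
# Null hypothesis: the response proportions are unchanged from last year.
EXPECTED_PROPORTIONS = {"No regular study": 0.60, "Periodic study": 0.25, "Regular study": 0.15}
OBSERVED = {"No regular study": 255, "Periodic study": 125, "Regular study": 90}
CRITICAL_5PCT_DF2 = 5.991  # chi-square table value for df = 2, alpha = 0.05


def chi_square_gof(observed, proportions):
    """Return the chi-square statistic sum((O - E)^2 / E) and the expected counts."""
    n = sum(observed.values())
    statistic = 0.0
    expected = {}
    for category, obs in observed.items():
        exp = n * proportions[category]
        expected[category] = exp
        statistic += (obs - exp) ** 2 / exp
    return statistic, expected


chi2, expected = chi_square_gof(OBSERVED, EXPECTED_PROPORTIONS)
df = len(OBSERVED) - 1
reject_h0 = chi2 > CRITICAL_5PCT_DF2
print(expected)
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these figures χ² ≈ 2.585 + 0.479 + 5.394 ≈ 8.46, which exceeds 5.991, so the null hypothesis of no change would be rejected at the 5% level: the distribution of responses appears to have shifted after the program.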

b) Demonstrate the steps used in Principal Component Analysis (PCA). Calculate the principal components for the given database.
   [Marks: 09 | Bloom’s Level: L3 | CO2]

X Y
2 1
3 4
5 0
7 6
9 2
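The PCA steps (centre the data, form the covariance matrix, solve for its eigenvalues and eigenvectors, project the centred data) can be sketched in plain Python for the two-attribute case, where the eigenvalues follow directly from the characteristic equation of a 2 × 2 matrix. This sketch assumes the sample covariance (dividing by n - 1); dividing by n instead scales both eigenvalues by 4/5 but leaves the components unchanged.

```python
import math

POINTS = [(2, 1), (3, 4), (5, 0), (7, 6), (9, 2)]


def pca_2d(points):
    n = len(points)
    # Step 1: centre each attribute on its mean.
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
    syy = sum(dy * dy for _, dy in centered) / (n - 1)
    sxy = sum(dx * dy for dx, dy in centered) / (n - 1)
    # Step 3: eigenvalues from lambda^2 - trace * lambda + det = 0.
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(0.0, trace ** 2 / 4 - det))
    eigenvalues = [trace / 2 + disc, trace / 2 - disc]  # largest first
    # Step 4: unit eigenvectors; (sxx - lam) * vx + sxy * vy = 0 gives
    # v = (sxy, lam - sxx). Assumes sxy != 0, which holds for this data.
    components = []
    for lam in eigenvalues:
        vx, vy = sxy, lam - sxx
        norm = math.hypot(vx, vy)
        components.append((vx / norm, vy / norm))
    # Step 5: project the centred points onto the components (PC scores).
    scores = [tuple(dx * cx + dy * cy for cx, cy in components) for dx, dy in centered]
    return (mean_x, mean_y), [[sxx, sxy], [sxy, syy]], eigenvalues, components, scores


mean, cov, eigenvalues, components, scores = pca_2d(POINTS)
print("mean:", mean)
print("covariance:", cov)
print("eigenvalues:", eigenvalues)
print("components:", components)
print("scores:", scores)
```

For the given data the means are (5.2, 2.6), the covariance matrix is [[8.2, 1.6], [1.6, 5.8]], the eigenvalues are 9 and 5, and the principal components are (2, 1)/√5 and (1, -2)/√5 (each only up to sign). The first component therefore carries 9/14 ≈ 64% of the total variance.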

c) Apply linear regression to the given database of plant production data. The first row presents a special case: no production, but electricity used for checking the plant machines' functionality prior to production.
   [Marks: 09 | Bloom’s Level: L3 | CO2]
x: Production (million $)    y: Electricity usage (million kWh)
0                            2
1                            3
2                            4
3                            5
4                            6
1) Find the linear regression line y = ax + b.
2) Estimate the electricity usage (y) for a production (x) of 10 (million $).
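A least-squares sketch for parts 1) and 2), using the standard estimates a = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - a·x̄.

```python
X = [0, 1, 2, 3, 4]  # production (million $)
Y = [2, 3, 4, 5, 6]  # electricity usage (million kWh)


def least_squares(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


a, b = least_squares(X, Y)
estimate = a * 10 + b  # predicted electricity usage for production x = 10
print(f"y = {a}x + {b}; estimate at x = 10: {estimate}")
```

For this data x̄ = 2 and ȳ = 4, and both sums equal 10, so a = 1 and b = 2, giving y = x + 2 and an estimated electricity usage of 12 million kWh for a production of 10 million $. The intercept b = 2 matches the first row: the electricity used for checking the machines when there is no production.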

*****
