
Roll Number: __________________________

Thapar Institute of Engineering & Technology, Patiala


Department of Computer Science and Engineering
End Semester Test (EST)
B.E. (CSBS)    Course Code: UCT633
Course Name: Data Mining and Analytics
Date: May 17, 2024    Time: 02:00 PM – 05:00 PM
Duration: 3 Hours    Maximum Marks: 40
Name of Faculty: Dr. Jatin Bedi

Note: Attempt any five questions in a proper sequence. Assume missing data, if any, suitably.

Marks, course outcome (CO), and Bloom's level (BL) for each part are given in brackets.
Q1 (a) [6 marks | CO1 | L4]
Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity. (Example: Age in years. Answer: discrete, quantitative, ratio.)
a) Time in terms of AM or PM.
b) Brightness as measured by a light meter.
c) Brightness as measured by people's judgments.
d) Bronze, silver, and gold medals as awarded at the Olympics.
Q1 (b) [2 marks | CO2 | L3]
Given the following data matrix A,
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}$$
a) Compute the covariance matrix of A.
b) Compute the eigenvalues and eigenvectors of the covariance matrix of A.
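Because A is 2 × 3, the answer depends on which axis holds the observations. The sketch below assumes each row is a variable and each column an observation (the same default convention as `numpy.cov`) and uses the sample divisor n − 1; treating the rows as observations instead would give a 3 × 3 covariance matrix. The 2 × 2 eigenproblem is solved in closed form, so only the standard library is needed.

```python
import math

# Data matrix A from Q1(b). Assumption: each ROW is a variable and each
# COLUMN is an observation (numpy.cov's default convention).
A = [[1, 0, 1],
     [0, -1, 0]]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of row-wise variables."""
    n = len(rows[0])
    means = [sum(r) / n for r in rows]
    centered = [[x - m for x in r] for r, m in zip(rows, means)]
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
            for ri in centered]


def eig_symmetric_2x2(m):
    """Closed-form eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    values = [centre + radius, centre - radius]
    if b == 0:
        # Already diagonal: eigenvectors are the coordinate axes.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return values, vectors
    vectors = []
    for lam in values:
        # (b, lam - a) satisfies (M - lam*I)v = 0 because (lam-a)(lam-d) = b^2.
        v = (b, lam - a)
        norm = math.hypot(*v)
        vectors.append((v[0] / norm, v[1] / norm))
    return values, vectors


cov = covariance_matrix(A)
vals, vecs = eig_symmetric_2x2(cov)
print("Covariance:", cov)
print("Eigenvalues:", vals)
print("Eigenvectors:", vecs)
```

Under these assumptions every entry of the covariance matrix is 1/3, the eigenvalues are 2/3 and 0, and the eigenvectors point along (1, 1)/√2 and (1, −1)/√2. With the population divisor n instead, every entry becomes 2/9.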

Q2 (a) [3 marks | CO5 | L2]
Discuss the following in detail, giving suitable examples:
a) Causality and correlation
b) Stationarity and differencing
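As a small illustration (not a model answer), the sketch below borrows the Q2(b) series. Its strong correlation with the time index reflects an upward trend, i.e. a mean that is not constant, and says nothing about what causes that trend; first differencing is the usual way to remove such a trend. The helper names `pearson` and `difference` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def difference(series, lag=1):
    """Lag-`lag` differencing: d_t = y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Series from Q2(b): it drifts upward, so its mean is not constant (non-stationary).
y = [6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4]
time_index = list(range(len(y)))

# High correlation with time signals a trend; it is not evidence of causation.
print("corr(y, t):", round(pearson(y, time_index), 3))
print("first differences:", [round(d, 2) for d in difference(y)])
```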

Q2 (b) [5 marks | CO5 | L3]
Consider the dataset given below:

6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4

a) Fit a single exponential smoothing model with α = 0.977 to estimate the data values.
b) Fit a double exponential smoothing model (with α = 0.3623, γ = 1.0, S1 = 6.4, and b1 = 0.8) to estimate the data values.
c) Compare the estimated results of the single and double exponential smoothing models.
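A minimal sketch of both models, assuming the standard recursions: single smoothing S_t = α·y_t + (1 − α)·S_{t−1}, and double (Holt level-plus-trend) smoothing S_t = α·y_t + (1 − α)(S_{t−1} + b_{t−1}), b_t = γ(S_t − S_{t−1}) + (1 − γ)·b_{t−1}. The estimate of y_t is the value available at t − 1 (S_{t−1} for single, S_{t−1} + b_{t−1} for double). Starting the single model at S_1 = y_1 is an assumption; the question supplies S1 and b1 only for the double model. Mean squared error over y_2 … y_10 is used for the comparison.

```python
def single_exp_smoothing(y, alpha):
    """One-step-ahead estimates of y[1:], assuming the initial level S_1 = y_1."""
    level = y[0]
    estimates = []
    for obs in y[1:]:
        estimates.append(level)                      # estimate made before seeing obs
        level = alpha * obs + (1 - alpha) * level    # S_t = a*y_t + (1-a)*S_{t-1}
    return estimates


def double_exp_smoothing(y, alpha, gamma, s1, b1):
    """Level-plus-trend (double) smoothing; one-step-ahead estimates of y[1:]."""
    level, trend = s1, b1
    estimates = []
    for obs in y[1:]:
        estimates.append(level + trend)              # F_t = S_{t-1} + b_{t-1}
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    return estimates


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


y = [6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4]
ses = single_exp_smoothing(y, alpha=0.977)
des = double_exp_smoothing(y, alpha=0.3623, gamma=1.0, s1=6.4, b1=0.8)

for t, (obs, s, d) in enumerate(zip(y[1:], ses, des), start=2):
    print(f"t={t:2d}  y={obs:5.1f}  single={s:6.3f}  double={d:6.3f}")
print("MSE single:", round(mse(y[1:], ses), 3))
print("MSE double:", round(mse(y[1:], des), 3))
```

The first estimates can be checked by hand: single smoothing gives 6.4 and 5.6184 for y_2 and y_3, while double smoothing gives 7.2 and 6.84064. With α close to 1, the single model essentially repeats the previous observation and lags behind the trend, which the double model's b_t term is meant to track.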
Q3 (a) [5 marks | CO3 | L3]
Consider that you have been given the following eight data points representing temperatures (in Celsius) and humidity levels (in percentage) recorded in different cities:

A = (20, 60), B = (25, 55), C = (30, 70), D = (22, 65), E = (27, 75), F = (18, 50), G = (28, 80), H = (23, 45).

a) Apply agglomerative clustering with the single linkage method to group the cities based on their weather patterns. Use the Euclidean distance to compute the initial distance matrix and show the clustering output at each step.
b) Generate a dendrogram to visualize the clustering results and determine the optimal number of clusters.
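The procedure in part (a) can be sketched in plain Python: print the initial Euclidean distance matrix, then repeatedly merge the two clusters whose closest members are nearest (single linkage), recording the merge distance. The recorded distances are the dendrogram heights; for plotting, SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` and `scipy.cluster.hierarchy.dendrogram` are one option. Ties are broken by pair order, which is an arbitrary choice in this sketch.

```python
import math
from itertools import combinations

# City readings from Q3(a): (temperature in Celsius, humidity in %).
points = {"A": (20, 60), "B": (25, 55), "C": (30, 70), "D": (22, 65),
          "E": (27, 75), "F": (18, 50), "G": (28, 80), "H": (23, 45)}


def single_linkage(points):
    """Naive agglomerative clustering; returns (cluster1, cluster2, distance) per merge."""
    clusters = [frozenset([name]) for name in points]
    merges = []

    def dist(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[p], points[q]) for p in c1 for q in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


names = list(points)
print("   " + "".join(f"{name:>7}" for name in names))
for p in names:
    print(f"{p:>3}" + "".join(f"{math.dist(points[p], points[q]):7.2f}" for q in names))

merges = single_linkage(points)
for step, (a, b, d) in enumerate(merges, start=1):
    print(f"step {step}: {''.join(sorted(a))} + {''.join(sorted(b))} at {d:.3f}")
```

For this data the merge distances come out as 5.099 (E, G), 5.385 (A, D), 5.831 (C joins EG), 7.071 twice (B joins AD; F and H), 8.602 (FH joins ABD) and 9.434 (final merge). Cutting the dendrogram at the largest gap between successive heights is one common heuristic for choosing the number of clusters.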

Q3 (b) [3 marks | CO5 | L1]
Define and explain the four crucial components of a time series. Use a real-world example to illustrate each component's influence on time series data analysis.
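To show how two of the components can be separated in practice, here is a minimal sketch of a classical additive decomposition on hypothetical quarterly sales (trend plus a repeating seasonal pattern of period 4). Cyclical and irregular components are deliberately left out so that the recovered trend can be checked exactly. A 2 × 4 centred moving average estimates the trend, and subtracting it leaves the seasonal (plus irregular) part; the weighting shown applies only to even periods.

```python
# Hypothetical quarterly sales: linear trend + repeating seasonal pattern (period 4).
PERIOD = 4
seasonal_pattern = [3.0, -1.0, -4.0, 2.0]        # sums to zero over one year
series = [10 + 2 * t + seasonal_pattern[t % PERIOD] for t in range(16)]


def centred_moving_average(y, period=4):
    """2 x period centred moving average (even period only): estimates the trend."""
    half = period // 2
    weights = [0.5] + [1.0] * (period - 1) + [0.5]   # period + 1 weights summing to period
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / period
    return trend


trend = centred_moving_average(series, PERIOD)
# Subtracting the trend leaves the seasonal (+ irregular) component.
detrended = [v - tr if tr is not None else None for v, tr in zip(series, trend)]
print("trend:", trend)
print("seasonal + irregular:", detrended)
```

Because the seasonal pattern sums to zero over a full period, the centred average cancels it and returns the underlying linear trend; with real data the residual after removing trend and seasonality would also contain the cyclical and irregular parts.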

Q4 [8 marks | CO4 | L2]
What is overfitting? How does regularization help in handling overfitting in machine learning models? Derive the coefficient update equation of ridge regularization for multiple linear regression using gradient descent optimization, and discuss how ridge regression shrinks the regression coefficients.
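A minimal sketch of the derivation and its implementation, assuming the cost is scaled by 1/(2n), no intercept is fitted (or the data are centred), and η is the learning rate. With prediction ŷᵢ = xᵢ·β, the ridge cost and its gradient are

$$J(\beta) = \frac{1}{2n}\sum_{i=1}^{n}(\hat y_i - y_i)^2 + \frac{\lambda}{2n}\sum_{j=1}^{p}\beta_j^2, \qquad \frac{\partial J}{\partial \beta_j} = \frac{1}{n}\sum_{i=1}^{n}(\hat y_i - y_i)\,x_{ij} + \frac{\lambda}{n}\beta_j,$$

so the gradient descent update is

$$\beta_j \leftarrow \beta_j\left(1 - \frac{\eta\lambda}{n}\right) - \frac{\eta}{n}\sum_{i=1}^{n}(\hat y_i - y_i)\,x_{ij}.$$

The factor (1 − ηλ/n) < 1 multiplies every coefficient on every step, which is the shrinkage effect. The code below implements exactly this update on hypothetical data.

```python
def ridge_gradient_descent(X, y, lam, lr=0.05, epochs=5000):
    """Batch gradient descent for ridge regression without an intercept.

    Minimises J(beta) = (1/2n) * sum((x_i . beta - y_i)^2) + (lam/2n) * sum(beta_j^2).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(epochs):
        # Residuals use the coefficients from the previous epoch (batch update).
        residuals = [sum(b * x for b, x in zip(beta, row)) - target
                     for row, target in zip(X, y)]
        for j in range(p):
            data_grad = sum(r * row[j] for r, row in zip(residuals, X)) / n
            # Shrink beta_j by (1 - lr*lam/n), then take the ordinary least-squares step.
            beta[j] = beta[j] * (1 - lr * lam / n) - lr * data_grad
    return beta


# Hypothetical data with two features; no intercept term is fitted.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 1.0, 4.0, 7.0]
for lam in (0.0, 1.0, 10.0):
    print(f"lambda={lam:4.1f}  beta={[round(b, 4) for b in ridge_gradient_descent(X, y, lam)]}")
```

As λ grows, the printed coefficients move toward zero. For a single feature the fixed point is β = Σxᵢyᵢ / (Σxᵢ² + λ), which makes the shrinkage explicit: the penalty inflates the denominator.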

Q5 [8 marks | CO3 | L3]
Consider a dataset containing information about customer transactions, where each sample has two features: Feature X (representing transaction amount) and Feature Y (representing transaction frequency). The samples belong to two classes: class 0 (representing non-fraudulent transactions) and class 1 (representing fraudulent transactions). The dataset is provided below:
Feature X (in $)    Feature Y    Class Label
336                 51           1
266                 32           0
234                 43           0
431                 67           0
353                 23           1
359                 67           1
367                 45           1
257                 46           0
233                 29           0
310                 56           1

a) Using the k-nearest neighbor classifier with different values of K (1, 3, 5, and 7) and the Euclidean distance measure, classify the test sample <Feature X = 436, Feature Y = 40>.
b) Explain the impact of varying K on model performance in terms of overfitting and underfitting.
c) List and explain any three potential methods to reduce the inference cost associated with k-nearest neighbor classifiers.
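Part (a) can be sketched as a brute-force k-NN: compute the Euclidean distance from the test sample to every training sample, sort, and take a majority vote among the K nearest. The features are used unscaled, exactly as given, so Feature X (larger in magnitude) dominates the distance; standardising them first is a common alternative that could change the result.

```python
import math
from collections import Counter

# Training data from Q5: (feature X in $, feature Y, class label).
data = [
    (336, 51, 1), (266, 32, 0), (234, 43, 0), (431, 67, 0), (353, 23, 1),
    (359, 67, 1), (367, 45, 1), (257, 46, 0), (233, 29, 0), (310, 56, 1),
]
query = (436, 40)


def knn_predict(train, point, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    ranked = sorted(train, key=lambda s: math.dist(s[:2], point))
    votes = Counter(s[2] for s in ranked[:k])
    return votes.most_common(1)[0][0]


for k in (1, 3, 5, 7):
    print(f"K={k}: class {knn_predict(data, query, k)}")
```

The nearest neighbour, (431, 67), belongs to class 0, so K = 1 predicts class 0, whereas the next five neighbours are all class 1, so K = 3, 5, and 7 predict class 1. Computing the distance to every stored sample for each query is the inference cost referred to in part (c).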
Q6 (a) [4 marks | CO4 | L1]
What are hyperparameters, and why are they important? Explain one popular method for tuning hyperparameters, and illustrate its significance with an example.
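Grid search with cross-validation is one widely used tuning method. The sketch below applies it to the number of neighbours K of the k-NN classifier, reusing the Q5 data purely as an illustration and assuming leave-one-out cross-validation as the scoring scheme; a real dataset would normally use k-fold CV and a held-out test set.

```python
import math
from collections import Counter

# Q5 training data, reused only to illustrate tuning K.
data = [
    (336, 51, 1), (266, 32, 0), (234, 43, 0), (431, 67, 0), (353, 23, 1),
    (359, 67, 1), (367, 45, 1), (257, 46, 0), (233, 29, 0), (310, 56, 1),
]


def knn_predict(train, point, k):
    ranked = sorted(train, key=lambda s: math.dist(s[:2], point))
    return Counter(s[2] for s in ranked[:k]).most_common(1)[0][0]


def loocv_accuracy(train, k):
    """Leave-one-out accuracy: predict each sample from all the others."""
    hits = 0
    for i, sample in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, sample[:2], k) == sample[2]
    return hits / len(train)


grid = [1, 3, 5, 7]                               # candidate hyperparameter values
scores = {k: loocv_accuracy(data, k) for k in grid}
best_k = max(grid, key=lambda k: scores[k])       # ties go to the smaller K
print("LOOCV accuracy per K:", scores, "best K:", best_k)
```

The point of the search is that K is not learned from the training data by the classifier itself; it is chosen by comparing validation performance across candidate values.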
Q6 (b) [4 marks | CO4 | L3]
Consider a scenario where a company is developing a linear regression model to predict the energy consumption of buildings based on factors such as size (in square feet), number of occupants, and outside temperature. The company has collected data from 12 buildings (table below) giving the actual energy consumption values and the corresponding predicted values from the developed linear regression model.

Building                        1    2    3    4    5    6    7    8    9    10   11   12
Actual Energy Consumption       300  320  350  340  360  280  330  350  370  380  290  280
Predicted Energy Consumption    280  310  340  330  350  270  320  340  360  370  280  270

Using the given data, answer the following:
a) Calculate the R-squared and adjusted R-squared values to evaluate the model's fit to the data.
b) Compute the Mean Squared Error (MSE) to quantify the model's prediction accuracy.
c) Determine the Mean Absolute Error (MAE) to assess the average magnitude of prediction errors.
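The four metrics follow directly from the residuals. This sketch assumes R² = 1 − SSE/SST computed on the given predictions, and takes p = 3 predictors (size, number of occupants, outside temperature) for the adjusted R², as listed in the question.

```python
actual = [300, 320, 350, 340, 360, 280, 330, 350, 370, 380, 290, 280]
predicted = [280, 310, 340, 330, 350, 270, 320, 340, 360, 370, 280, 270]
n = len(actual)
p = 3  # assumed predictors: size, number of occupants, outside temperature

errors = [a - f for a, f in zip(actual, predicted)]
sse = sum(e * e for e in errors)                          # sum of squared errors
mean_actual = sum(actual) / n
sst = sum((a - mean_actual) ** 2 for a in actual)         # total sum of squares

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = sse / n
mae = sum(abs(e) for e in errors) / n
print(f"R^2={r2:.4f}  adjusted R^2={adj_r2:.4f}  MSE={mse:.2f}  MAE={mae:.4f}")
```

With these numbers SSE = 1500 and SST ≈ 13491.67, giving R² ≈ 0.8888, adjusted R² ≈ 0.8471, MSE = 125, and MAE ≈ 10.83.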

[Figure: Course Outcome Wise Marks Distribution (CO1 to CO5) and Bloom's Level Wise Marks Distribution (Level 1 to Level 4); bar chart values not recoverable]

