UCT633_EST_23

This document outlines the End Semester Test (EST) for the Data Mining and Analytics course at Thapar Institute, detailing the exam date, duration, and marks. It includes a series of questions covering topics such as data classification, statistical modeling, clustering, overfitting, and hyperparameter tuning. Students are instructed to attempt five questions in sequence and assume missing data as needed.

Roll Number: __________________________

Thapar Institute of Engineering & Technology, Patiala

Department of Computer Science and Engineering
End Semester Test (EST)
B.E. (CSBS)
Course Code: UCT633
Course Name: Data Mining and Analytics
Date: May 17, 2024
Time: 02:00 PM – 05:00 PM
Duration: 3 Hours
Max. Marks: 40
Name of Faculty: Dr. Jatin Bedi

Note: Attempt any five questions in a proper sequence. Assume missing data, if any, suitably.

Q1 (a) Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity. (Example: Age in years. Answer: Discrete, quantitative, ratio) [Marks: 6, CO1, L4]
a) Time in terms of AM or PM
b) Brightness as measured by a light meter
c) Brightness as measured by people's judgments
d) Bronze, silver, and gold medals as awarded at the Olympics
Q1 (b) Given the following data matrix A: [Marks: 2, CO2, L3]
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}$$
a) Compute the covariance matrix of A.
b) Compute the eigenvalues and eigenvectors of Cov(A).
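
A result for Q1 (b) could be checked numerically along the following lines. This is a minimal sketch, not a model answer: it assumes that each row of A is an observation and each column an attribute, and that the sample covariance (divisor n - 1) is intended. The question states neither convention.

```python
import numpy as np

# Data matrix from the question; rows are treated as observations (assumption).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])

# Sample covariance of the three columns (divisor n - 1, assumption).
cov = np.cov(A, rowvar=False)

# eigh suits symmetric matrices such as a covariance matrix;
# it returns the eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print("Covariance matrix:\n", cov)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors (as columns):\n", eigenvectors)
```

Treating the columns as observations instead (`rowvar=True`) would give a 2 x 2 covariance matrix, so the chosen interpretation should be stated in a written answer.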

Q2 (a) Discuss the following in detail, giving suitable examples: [Marks: 3, CO5, L2]
a) Causality and correlation
b) Stationarity and differencing

Q2 (b) Consider the dataset given below: [Marks: 5, CO5, L3]

6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4

a) Fit a single exponential smoothing model with alpha = 0.977 to estimate the data values.
b) Fit a double exponential smoothing model (with alpha = 0.3623, gamma = 1.0, S1 = 6.4, and b1 = 0.8) to estimate the data values.
c) Compare the estimated results of the single and double exponential smoothing models.
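
The two recursions in Q2 (b) can be sketched as below. The initialisation is an assumption, since the question does not fix one: single smoothing starts from S_2 = y_1 and uses S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1}, while double smoothing uses S_t = alpha * y_t + (1 - alpha) * (S_{t-1} + b_{t-1}) and b_t = gamma * (S_t - S_{t-1}) + (1 - gamma) * b_{t-1}. The function names are illustrative only.

```python
def single_exponential_smoothing(y, alpha):
    """Return S_2..S_n with S_2 = y_1 and
    S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1} (assumed convention)."""
    smoothed = [y[0]]
    for observation in y[1:-1]:
        smoothed.append(alpha * observation + (1 - alpha) * smoothed[-1])
    return smoothed


def double_exponential_smoothing(y, alpha, gamma, s1, b1):
    """Return the level series S_1..S_n and the trend series b_1..b_n."""
    level, trend = [s1], [b1]
    for observation in y[1:]:
        new_level = alpha * observation + (1 - alpha) * (level[-1] + trend[-1])
        new_trend = gamma * (new_level - level[-1]) + (1 - gamma) * trend[-1]
        level.append(new_level)
        trend.append(new_trend)
    return level, trend


data = [6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4]
single = single_exponential_smoothing(data, alpha=0.977)
level, trend = double_exponential_smoothing(
    data, alpha=0.3623, gamma=1.0, s1=6.4, b1=0.8
)

print("Single (S_2..S_10):", [round(value, 2) for value in single])
print("Double (S_1..S_10):", [round(value, 2) for value in level])
```

For part c), the two series could then be compared against the data with an error measure such as the sum of squared errors, keeping in mind that the single series starts one period later.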
Q3 (a) You have been given the following eight data points representing temperatures (in Celsius) and humidity levels (in percentage) recorded in different cities: [Marks: 5, CO3, L3]

A = (20, 60), B = (25, 55), C = (30, 70), D = (22, 65), E = (27, 75), F = (18, 50), G = (28, 80), H = (23, 45).

a) Apply agglomerative clustering with the single linkage method to group the cities based on their weather patterns. Use the Euclidean distance to compute the initial distance matrix and show the clustering output at each step.
b) Generate a dendrogram to visualize the clustering results and determine the optimal number of clusters.
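
The merge sequence asked for in Q3 (a) can be reproduced with a short, unoptimised sketch such as the one below. It recomputes the single-link distance (the smallest pairwise Euclidean distance between two clusters) at every step instead of updating a distance matrix, which is adequate for eight points. The helper names are illustrative, and ties between equal distances are broken by iteration order, which is an arbitrary choice.

```python
import math
from itertools import combinations

points = {
    "A": (20, 60), "B": (25, 55), "C": (30, 70), "D": (22, 65),
    "E": (27, 75), "F": (18, 50), "G": (28, 80), "H": (23, 45),
}


def single_link_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any two members of the clusters."""
    return min(
        math.dist(points[p], points[q]) for p in cluster_a for q in cluster_b
    )


def single_linkage(labels):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_link_distance(*pair),
        )
        distance = single_link_distance(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), distance))
    return merges


merges = single_linkage(sorted(points))
for step, (left, right, distance) in enumerate(merges, start=1):
    print(f"Step {step}: merge {left} and {right} at distance {distance:.3f}")
```

The printed merge distances are the heights at which the corresponding branches join in the dendrogram, so a large jump between consecutive heights is one common, though not the only, cue for where to cut it.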

Q3 (b) Define and explain the four crucial components of a time series. Use a real-world example to illustrate each component's influence on time series data analysis. [Marks: 3, CO5, L1]

Q4 What is overfitting? How does regularization help in handling overfitting in machine learning models? Derive the equation for the coefficients of ridge-regularized multiple linear regression using gradient descent optimization, and discuss how ridge regression can shrink the regression coefficients. [Marks: 8, CO4, L2]
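
The shrinkage effect in Q4 can be illustrated numerically. The sketch below assumes the loss J(b) = (1/n) * ||y - Xb||^2 + lam * ||b||^2, whose gradient is (2/n) * X^T (Xb - y) + 2 * lam * b; other scalings of the penalty are equally common, so this is one convention, not the required one. The toy data, function name, learning rate, and step count are all hypothetical.

```python
import numpy as np


def ridge_gradient_descent(X, y, lam, learning_rate=0.01, steps=5000):
    """Minimise (1/n) * ||y - X b||^2 + lam * ||b||^2 by gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(steps):
        residual = X @ beta - y
        # Gradient = data-fit term + ridge penalty term.
        gradient = (2.0 / n) * (X.T @ residual) + 2.0 * lam * beta
        beta = beta - learning_rate * gradient
    return beta


# Hypothetical toy data generated from y = 1*x1 + 2*x2 (no intercept, no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = X @ np.array([1.0, 2.0])

for lam in (0.0, 1.0, 10.0):
    beta = ridge_gradient_descent(X, y, lam)
    print(f"lambda={lam}: beta={beta}, norm={np.linalg.norm(beta):.4f}")
```

Rearranged, the update reads b <- (1 - 2 * learning_rate * lam) * b - learning_rate * (2/n) * X^T (Xb - y), so every step first scales the coefficients by a factor below one, which is where the shrinkage comes from.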

Q5 Consider a dataset containing information about customer transactions, where each sample has two features: Feature X (representing transaction amount) and Feature Y (representing transaction frequency). The samples belong to two classes: class 0 (representing non-fraudulent transactions) and class 1 (representing fraudulent transactions). The dataset is provided below: [Marks: 8, CO3, L3]

Feature X (in $)   Feature Y   Class Label
336                51          1
266                32          0
234                43          0
431                67          0
353                23          1
359                67          1
367                45          1
257                46          0
233                29          0
310                56          1

a) Using the k-nearest neighbor classifier with different values of K (1, 3, 5, and 7) and the Euclidean distance measure, classify the test sample (Feature X = 436, Feature Y = 40).
b) Explain the impact of varying K values on the model performance in terms of overfitting and underfitting.
c) List and explain any three potential methods to reduce the inference cost associated with k-nearest neighbor classifiers.
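
Part a) of Q5 can be checked with a brute-force sketch like the one below. It uses the raw, unscaled features with a simple majority vote, as the question's wording suggests; feature scaling and distance-weighted voting are deliberately left out. The function name is illustrative.

```python
import math
from collections import Counter

# (Feature X, Feature Y, class label) rows from the question's table.
training = [
    (336, 51, 1), (266, 32, 0), (234, 43, 0), (431, 67, 0), (353, 23, 1),
    (359, 67, 1), (367, 45, 1), (257, 46, 0), (233, 29, 0), (310, 56, 1),
]
query = (436, 40)


def knn_predict(rows, point, k):
    """Majority vote among the k training rows closest to the point."""
    by_distance = sorted(rows, key=lambda row: math.dist(row[:2], point))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


predictions = {k: knn_predict(training, query, k) for k in (1, 3, 5, 7)}
print(predictions)
```

Because Feature X spans a much larger numeric range than Feature Y, the unscaled Euclidean distance is dominated by the transaction amount, which is worth noting when discussing the results.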
Q6 (a) What are hyperparameters, and why are they important? Explain one popular method for tuning hyperparameters, and illustrate its significance with an example. [Marks: 4, CO4, L1]
Q6 (b) Consider a scenario where a company is developing a linear regression model to predict the energy consumption of buildings based on factors such as size (in square feet), number of occupants, and outside temperature. The company has collected data from 12 buildings (table below) defining the actual energy consumption values and the corresponding predicted values from the developed linear regression model. [Marks: 4, CO4, L3]

Building                       1    2    3    4    5    6    7    8    9    10   11   12
Actual Energy Consumption      300  320  350  340  360  280  330  350  370  380  290  280
Predicted Energy Consumption   280  310  340  330  350  270  320  340  360  370  280  270

Using the given data, answer the following:
a) Calculate the R-squared and adjusted R-squared values to evaluate the model's fit to the data.
b) Compute the Mean Squared Error (MSE) to quantify the model's prediction accuracy.
c) Determine the Mean Absolute Error (MAE) to assess the average magnitude of prediction errors.
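
The metrics in Q6 (b) follow directly from their definitions, as in the sketch below. Two assumptions are made: R-squared is taken as 1 - SS_res / SS_tot, and the adjusted value uses p = 3 predictors (size, occupants, outside temperature), read from the question's description of the model.

```python
actual = [300, 320, 350, 340, 360, 280, 330, 350, 370, 380, 290, 280]
predicted = [280, 310, 340, 330, 350, 270, 320, 340, 360, 370, 280, 270]

n = len(actual)
p = 3  # assumed number of predictors: size, occupants, outside temperature

errors = [a - f for a, f in zip(actual, predicted)]
mse = sum(e ** 2 for e in errors) / n
mae = sum(abs(e) for e in errors) / n

mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)                    # residual sum of squares
ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
r_squared = 1 - ss_res / ss_tot
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(f"MSE = {mse:.3f}, MAE = {mae:.3f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adjusted_r_squared:.4f}")
```

The adjusted value is always at most the plain R-squared because the factor (n - 1) / (n - p - 1) penalises each additional predictor.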

[Figure: two bar charts, "Course Outcome Wise Marks Distribution" (CO1 to CO5) and "Bloom's Level Wise Marks Distribution" (Level 1 to Level 4), with a vertical axis from 0 to 18 marks.]
