
Problems For Exercise

This document contains information about multiple regression models, clustering algorithms, and logistic regression. It includes: 1) A multiple regression model and ANOVA table with missing values to compute. 2) Two multiple regression models to compare and identify the best fitted model. 3) Instructions to apply hierarchical clustering algorithms and create a confusion matrix using an iris dataset. 4) Demographic and toothpaste brand preference data to build a logistic regression model on, analyze variable significance, predict probabilities, and evaluate the model performance.

Uploaded by

Lalit Yadav

For Exercise

1. Conduct the t-test on the coefficients of the given multiple regression model, with 8 df,
at the α = 0.05 level of significance. Assume the standard errors for “Duration” and
“Importance” are 0.22 and 0.12, respectively. (2 marks)
Attitude = 0.33732 + 0.48108 (Duration) + 0.28865 (Importance)
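
A minimal sketch of the calculation, assuming a two-tailed test of H0: coefficient = 0 for each predictor. The test statistic is t = estimate / standard error, compared against the tabulated critical value 2.306 for 8 df at α = 0.05 (two-tailed). The variable names are illustrative only.

```python
# Coefficient estimates and standard errors taken from the exercise statement
coefficients = {
    "Duration": (0.48108, 0.22),
    "Importance": (0.28865, 0.12),
}

# Assumption: two-tailed test, so the critical value is t(0.025, 8 df) = 2.306
T_CRITICAL = 2.306


def t_statistic(estimate, std_error):
    """t = estimate / standard error, under H0: coefficient = 0."""
    return estimate / std_error


results = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}

for name, t in results.items():
    verdict = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
    print(f"{name}: t = {t:.3f}, critical = {T_CRITICAL}, {verdict}")
```

If a one-tailed alternative is intended instead, the critical value changes and the conclusions may differ.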

Find the missing values (?) and compute R² from the information in the given ANOVA table.
(2 marks)
Source       SS       df   MS   Calculated F
Regression   ?        1    ?    ?
Residual     18.223   ?    ?
Total        63.815   24
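
The missing entries follow from the standard ANOVA identities: SS(Total) = SS(Regression) + SS(Residual), df(Total) = df(Regression) + df(Residual), MS = SS / df, F = MS(Regression) / MS(Residual), and R² = SS(Regression) / SS(Total). A short sketch of the arithmetic, with illustrative variable names:

```python
# Values given in the ANOVA table of the exercise
ss_residual = 18.223
ss_total = 63.815
df_regression = 1
df_total = 24

# Fill in the missing cells from the ANOVA identities
ss_regression = ss_total - ss_residual
df_residual = df_total - df_regression
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_calculated = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"SS regression = {ss_regression:.3f}")
print(f"df residual   = {df_residual}")
print(f"MS regression = {ms_regression:.3f}")
print(f"MS residual   = {ms_residual:.4f}")
print(f"F calculated  = {f_calculated:.2f}")
print(f"R squared     = {r_squared:.4f}")
```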

6. For the given data, two multiple linear regression models were constructed.
(5 marks)

Model 1: Y = 62.405-0.144X4+0.102X3+0.510X2+1.551X1

Model 2: Y = 71.648+0.416X2+1.452X1-0.2365X4

X1 X2 X3 X4 Y
7.0 26.0 6.0 60.0 78.5
1.0 29.0 15.0 52.0 74.3
11.0 56.0 8.0 20.0 104.3
11.0 31.0 8.0 47.0 87.6
7.0 52.0 6.0 33.0 95.9
11.0 55.0 9.0 22.0 109.2
3.0 71.0 17.0 6.0 102.7
1.0 31.0 22.0 44.0 72.5
2.0 54.0 18.0 22.0 93.1
21.0 47.0 4.0 26.0 115.9
1.0 40.0 23.0 34.0 83.8
11.0 66.0 9.0 12.0 113.3
10.0 68.0 8.0 12.0 109.4

Compare the two models. Which one fits the data better, and why?
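
One possible way to compare the models is to evaluate each printed equation on the 13 observations and compute SSE, R² and adjusted R²; adjusted R² penalises Model 1 for its extra predictor. This is a sketch only: it uses the rounded coefficients exactly as printed (it does not refit the models), and the function names are illustrative.

```python
# Columns X1, X2, X3, X4, Y copied from the exercise table
data = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def model_1(x1, x2, x3, x4):
    return 62.405 - 0.144 * x4 + 0.102 * x3 + 0.510 * x2 + 1.551 * x1


def model_2(x1, x2, x3, x4):
    # Model 2 does not use X3
    return 71.648 + 0.416 * x2 + 1.452 * x1 - 0.2365 * x4


def fit_summary(model, n_predictors):
    """SSE, R^2 and adjusted R^2 of a fixed equation on the exercise data."""
    y = [row[4] for row in data]
    y_hat = [model(*row[:4]) for row in data]
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"SSE": sse, "R2": r2, "adj_R2": adj_r2}


summary_1 = fit_summary(model_1, n_predictors=4)
summary_2 = fit_summary(model_2, n_predictors=3)
print("Model 1:", summary_1)
print("Model 2:", summary_2)
```

The model with the higher adjusted R² (equivalently, the lower residual mean square) would usually be preferred; other criteria, such as the significance of each coefficient, can also be considered.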

7. For the given data, apply the hierarchical agglomerative clustering algorithm with the
single linkage, complete linkage and average linkage methods. Create the confusion matrix
using the actual labels and the predicted labels. (Dataset name: IRIS dataset)

Sepal Length  Sepal Width  Petal Length  Petal Width  Iris Group label
50 33 14 2 1
64 28 56 22 1
65 28 46 15 2
67 31 56 24 2
63 28 51 15 2
46 34 14 3 1
69 31 51 23 1
62 22 45 15 2
48 35 15 2 1
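
A minimal pure-Python sketch of agglomerative clustering on the nine rows above, assuming Euclidean distance and a cut at two clusters (neither is specified in the exercise). Cluster numbers are arbitrary, so the table produced by the hypothetical `cross_tab` helper is a cross-tabulation of actual label against cluster number, not a guarantee that cluster 1 means label 1.

```python
from itertools import combinations
from math import dist

# Rows copied from the exercise table:
# (sepal length, sepal width, petal length, petal width)
X = [(50, 33, 14, 2), (64, 28, 56, 22), (65, 28, 46, 15),
     (67, 31, 56, 24), (63, 28, 51, 15), (46, 34, 14, 3),
     (69, 31, 51, 23), (62, 22, 45, 15), (48, 35, 15, 2)]
actual = [1, 1, 2, 2, 2, 1, 1, 2, 1]

# Each linkage reduces the list of pairwise distances between two clusters
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(points, k, linkage):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        def gap(pair):
            a, b = pair
            return linkage([dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b]])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(points)
    for number, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = number
    return labels


def cross_tab(actual, predicted):
    """Count occurrences of each (actual label, cluster number) pair."""
    table = {}
    for a, p in zip(actual, predicted):
        table[(a, p)] = table.get((a, p), 0) + 1
    return table


results = {name: agglomerate(X, 2, link) for name, link in LINKAGES.items()}
for name, predicted in results.items():
    print(name, predicted, cross_tab(actual, predicted))
```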

8. A market researcher wants to understand the characteristics of people that determine which
one of two brands of toothpaste they prefer. Data are collected from 13 people (7 of whom
prefer Brand A and 6 of whom prefer Brand B), with age, gender and income as the three
independent variables.
(10 marks)

Age Gender Income Brand


31 0 5.25 A
22 0 3.5 A
53 1 4.22 A
55 1 3 B
42 1 4 B
29 0 10 B
30 0 5 A
53 0 16 B
34 0 13.98 A
34 0 13.26 B
60 1 6.1 B
35 1 3.75 A
27 0 9 B

i. Build the logistic regression model for the given data.
ii. State the significance of the individual variables.
iii. Find the predicted probability of each brand for every customer.
iv. Draw a confusion matrix and find the G-mean, TPR, TNR, recall and precision.
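
A rough sketch of items i, iii and iv in plain Python, under several assumptions not stated in the exercise: Brand A is coded as the positive class, predictors are standardized, the model is fitted by simple gradient ascent on the log-likelihood, and the classification threshold is 0.5. It does not compute standard errors or Wald tests, so item ii would need a statistics package. All function names are illustrative.

```python
from math import exp, log, sqrt

# (age, gender, income, brand) rows copied from the exercise table
rows = [(31, 0, 5.25, "A"), (22, 0, 3.5, "A"), (53, 1, 4.22, "A"),
        (55, 1, 3, "B"), (42, 1, 4, "B"), (29, 0, 10, "B"), (30, 0, 5, "A"),
        (53, 0, 16, "B"), (34, 0, 13.98, "A"), (34, 0, 13.26, "B"),
        (60, 1, 6.1, "B"), (35, 1, 3.75, "A"), (27, 0, 9, "B")]

# Assumption: Brand A is coded 1 (positive class), Brand B is coded 0
y = [1 if brand == "A" else 0 for *_, brand in rows]


def standardize(column):
    mean = sum(column) / len(column)
    sd = sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


columns = [standardize([row[j] for row in rows]) for j in range(3)]
X = [[1.0] + [col[i] for col in columns] for i in range(len(rows))]  # 1.0 = intercept


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(w, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def log_likelihood(w, X, y):
    return sum(log(p) if t == 1 else log(1 - p) for p, t in zip(predict(w, X), y))


def fit(X, y, learning_rate=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood (weights on the standardized scale)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        p = predict(w, X)
        for j in range(len(w)):
            gradient = sum((t - pi) * xi[j] for t, pi, xi in zip(y, p, X)) / len(y)
            w[j] += learning_rate * gradient
    return w


def evaluate(y, p_a, threshold=0.5):
    predicted = [1 if p >= threshold else 0 for p in p_a]
    tp = sum(t == 1 and q == 1 for t, q in zip(y, predicted))
    tn = sum(t == 0 and q == 0 for t, q in zip(y, predicted))
    fp = sum(t == 0 and q == 1 for t, q in zip(y, predicted))
    fn = sum(t == 1 and q == 0 for t, q in zip(y, predicted))
    tpr = tp / (tp + fn)  # TPR is the same quantity as recall
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "TPR": tpr,
            "TNR": tnr, "Precision": precision, "G-mean": sqrt(tpr * tnr)}


weights = fit(X, y)
p_brand_a = predict(weights, X)
p_brand_b = [1 - p for p in p_brand_a]
metrics = evaluate(y, p_brand_a)
print("weights (intercept, age, gender, income):", weights)
print("P(A), P(B) per customer:", list(zip(p_brand_a, p_brand_b)))
print(metrics)
```

Because the weights are on the standardized scale, they are not directly comparable with coefficients reported by software fitted on the raw variables, although the predicted probabilities should be close once both fits have converged.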
