Exercises Topic 2 Solutions

The document discusses using multiple linear regression to model customer spending based on frequency and age using data from 32 customers. It provides scatter plots and regression results and asks to validate the model. It also discusses choosing the best regression model from three options to model employee salary based on education and experience using data from 36 employees.

L.F.

ST22

Topic 2: Multiple Linear Regression

Exercise 1: Testing a model

A staff restaurant conducted a survey collecting data from a random sample of 32 clients.
They were asked, among other things: how many times they had eaten at the restaurant during the
last month (variable: FREQUENCY); how much they spent on a meal (variable:
SPENDING); how old they were (variable: AGE).

The restaurant manager would like to construct a model that would explain the spending
amount in terms of the frequency and the age for all clients.

You will find below:


Appendix 1: scatterplot of SPENDING w.r.t. FREQUENCY
Appendix 2: scatterplot of SPENDING w.r.t. AGE
Appendix 3: linear regression results obtained using a statistical package modeling
SPENDING in terms of FREQUENCY and AGE.

You are asked to carry out the tests required to validate the linear
regression model for the whole population of clients from which the sample was drawn.

Appendix 1 (1st scatterplot)


Response variable: SPENDING
Explanatory variable: FREQUENCY

[Taken from chap 10 Méthodes Statistiques pour le Management - Hahn & Macé – Pearson 2016
Translated by Lynn FARAH]
Appendix 2 (2nd scatterplot)
Response variable: SPENDING
Explanatory variable: AGE

Appendix 3 (multiple linear regression model)


Response variable: SPENDING
Explanatory variables: FREQUENCY and AGE

Variable         Mean       Corrected Standard Deviation
#1 SPENDING      4.17188    1.53249
#2 FREQUENCY     10.59375   6.76738
#3 AGE           35.75      11.6453

Count n          32
R-square         0.58999

                 Coefficient   Standard Error
Intercept        6.13803
FREQUENCY        -0.17208      0.02782
AGE              -0.00401      0.01617

ANALYSIS OF VARIANCE
                 Sum of Squares
Regression       ?
Residual         29.8504
Total            72.8047
?

1. Model pertinence:

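$$R^2 = \frac{SS_{Reg}}{SS_T} = \frac{SS_T - SS_{Res}}{SS_T} = \frac{72.8047 - 29.8504}{72.8047} = 0.59$$

$$F_{STAT} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{0.59/2}{(1 - 0.59)/29} = 20.865$$

$$\text{Crit. value} = \mathcal{F}_{k;\,n-k-1} = \mathcal{F}_{2;\,29} = 3.33$$

$F_{STAT} >$ critical value => Model is pertinent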
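
The arithmetic of this pertinence test can be checked with a short Python sketch. It only uses the sums of squares from Appendix 3 and the critical value 3.33 quoted in the solution; the variable names are illustrative and not part of the original exercise.

```python
# Values copied from Appendix 3 (Exercise 1).
ss_total = 72.8047
ss_residual = 29.8504
n = 32  # sample size
k = 2   # explanatory variables: FREQUENCY and AGE

# The missing regression sum of squares follows from SST = SSReg + SSRes.
ss_regression = ss_total - ss_residual

# Coefficient of determination and global F statistic.
r_squared = ss_regression / ss_total
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Critical value taken from the solution text (F with 2 and 29 d.f.),
# not recomputed here.
f_critical = 3.33
model_is_pertinent = f_stat > f_critical

print(f"SSReg = {ss_regression:.4f}")
print(f"R^2   = {r_squared:.5f}")
print(f"F     = {f_stat:.3f}")
print("Model is pertinent" if model_is_pertinent else "Model is not pertinent")
```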

2. Model validity for population generalization

Compare the absolute value of each t-stat (coefficient / standard error) to the critical value:

$$\text{Crit. value} = t_{\alpha/2;\,n-k-1} = 2.04$$

             Coefficient   Standard Error   T-stat    Conclusion
Intercept    6.13803
FREQUENCY    -0.17208      0.02782          -6.185    => Significant
AGE          -0.00401      0.01617          -0.248    => Not significant
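
As a minimal sketch of these Student tests, the following Python snippet divides each coefficient by its standard error and compares the absolute value to the critical value 2.04 used in the solution. The dictionary layout and names are assumptions made for illustration.

```python
# (coefficient, standard error) pairs copied from Appendix 3.
coefficients = {
    "FREQUENCY": (-0.17208, 0.02782),
    "AGE": (-0.00401, 0.01617),
}

# Two-sided critical value quoted in the solution (t with n - k - 1 = 29 d.f.).
t_critical = 2.04

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err
    # The test is two-sided, so the sign of the t-stat is ignored.
    results[name] = (t_stat, abs(t_stat) > t_critical)

for name, (t_stat, significant) in results.items():
    verdict = "Significant" if significant else "Not significant"
    print(f"{name:10s} t = {t_stat:7.3f} => {verdict}")
```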

Exercise 2: Choosing a model and using it

The HR director of an industrial group would like to construct a model explaining the
monthly salary of all employees.
Using data collected from a random sample of 36 employees, he tests two explanatory
variables that he deems relevant: the number of years of graduate studies (X1) and the
number of years of service (X2).
You can find below results from three regression models he tested using Excel.

1) For each of the three suggested models:


a. Calculate the coefficient of determination r²
b. Conduct the Student tests for the explanatory variables.

2) Can you help the HR director choose the most suitable model?
a. Using the information provided, which model would you suggest using? Justify
your choice.
b. Estimate the parameters of the chosen model.

3) Pierre Durand, an employee of this group, is 38 years old, with 10 years of service
and 4 years of graduate studies. His monthly salary is 2050 Euros and he thinks he is
underpaid.
Calculate a 95% confidence interval for the mean salary of an employee with Pierre
Durand’s profile.
If you were the HR director of that firm, what would you tell Pierre Durand about his salary?

Variable   Mean    Sample standard deviation*
Y          1850    795.88
X1         3.5     2.40
X2         4.17    2.15
* Reminder: this is the value of the unbiased point estimate (coefficient 1/(n-1))

Regression of Y w.r.t. X1

ANALYSIS OF VARIANCE
               Sum of Squares
Regression     21 475 075
Residual       694 925
Total          22 170 000

R² = 0.969
F-stat = 1050.7 and crit. value = 4.17 => Model pertinent
T-stat = 32.43 and crit. value = 2.045 => Coeff significant

               Coefficients   Standard error
Constant       706
Variable X1    326.9          10.08
4

Regression of Y w.r.t. X2

ANALYSIS OF VARIANCE
               Sum of Squares
Regression     13 975
Residual       22 156 025
Total          22 170 000

R² = 0.00063
F-stat = 0.021 and crit. value = 4.17 => Model NOT pertinent
T-stat = 0.1465 and crit. value = 2.045 => Coeff NOT significant
?

               Coefficients   Standard error
Constant       1 811.2
Variable X2    9.32           63.62

Regression of Y w.r.t. X1, X2

ANALYSIS OF VARIANCE
               Sum of Squares
Regression     21 488 018.6
Residual       681 981.4
Total          22 170 000

R² = 0.9692
F-stat = 519.88 and crit. value = 3.32 => F-stat > crit. value, so the model is pertinent
X1: T-stat = 32.24 and crit. value = 2.042 => Coeff significant
X2: T-stat = -0.79 and crit. value = 2.042 => Coeff NOT significant

               Coefficients   Standard error
Constant       742
Variable X1    327.27         10.15
Variable X2    -8.98          11.35
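
The R² and F-stat values reported for the three models can be reproduced from the sums of squares alone. The sketch below assumes n = 36 and uses a hypothetical helper, `anova_summary`, written only for this illustration; critical values are not recomputed.

```python
def anova_summary(ss_reg, ss_res, n, k):
    """Return (R^2, F) for a regression with k explanatory variables."""
    ss_total = ss_reg + ss_res
    r_squared = ss_reg / ss_total
    f_stat = (ss_reg / k) / (ss_res / (n - k - 1))
    return r_squared, f_stat


n = 36  # employees in the sample

# (SSReg, SSRes, k) copied from the three ANOVA tables of Exercise 2.
models = {
    "Y ~ X1": (21_475_075, 694_925, 1),
    "Y ~ X2": (13_975, 22_156_025, 1),
    "Y ~ X1 + X2": (21_488_018.6, 681_981.4, 2),
}

summaries = {
    name: anova_summary(ss_reg, ss_res, n, k)
    for name, (ss_reg, ss_res, k) in models.items()
}

for name, (r_squared, f_stat) in summaries.items():
    print(f"{name:12s} R^2 = {r_squared:.5f}   F = {f_stat:.3f}")
```

Adding X2 to the model that already contains X1 barely changes R², which is the numerical basis for preferring the simpler model.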

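Q2. A. The chosen model is the first model (containing only variable X1): it is pertinent
and its coefficient is significant, whereas X2 is not significant either on its own (model 2)
or alongside X1 (model 3), where it raises R² only from 0.969 to 0.9692.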

Q2. B. Parameters of the model:


$$\beta_0 = 706; \quad \beta_1 = 326.9; \quad \sigma_\varepsilon = \sqrt{\frac{SS_{Res}}{n-2}} = \sqrt{\frac{694\,925}{34}} = 142.96$$

Q3. A.
Chosen model = model 1 => computations made with model 1

Age = 38; years of service = 10; graduate studies = 4.

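$$CI_{95\%} = \text{Point estimate} \pm \text{Crit. value} \times \text{Standard error}$$

Point estimate = 706 + 326.9 × 4 = 2013.6

$$\text{Critical value} = t_{\alpha/2;\,n-2} = 2.04$$

$$\text{Standard error} = \sigma_{\hat{y}} = \sigma_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} = 24.35$$

$$CI_{95\%} = [1963.919 \,;\, 2063.28]$$

Pierre Durand's salary of 2050 Euros lies inside this interval, so it is consistent with the
mean salary of employees with 4 years of graduate studies; the data do not support the claim
that he is underpaid.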
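
The interval can be recomputed step by step with the Python sketch below. It assumes that SS_xx is recovered from the sample standard deviation of X1 as (n − 1)·s², which follows from the reminder that the table reports the unbiased estimate, and it reuses the critical value 2.04 from the solution rather than recomputing it. Variable names are illustrative.

```python
import math

# Inputs copied from Exercise 2 (model 1: Y regressed on X1).
n = 36
beta_0 = 706
beta_1 = 326.9
ss_residual = 694_925
x_mean = 3.5        # mean of X1
x_std = 2.40        # sample standard deviation of X1 (1/(n-1) version)
x_p = 4             # Pierre Durand's years of graduate studies
t_critical = 2.04   # value used in the solution
salary = 2050       # Pierre Durand's monthly salary in Euros

# Residual standard deviation and sum of squares of X1.
sigma_eps = math.sqrt(ss_residual / (n - 2))
ss_xx = (n - 1) * x_std ** 2  # assumption: SS_xx = (n - 1) * s^2

# Standard error of the estimated mean response at x_p.
std_error = sigma_eps * math.sqrt(1 / n + (x_p - x_mean) ** 2 / ss_xx)

point_estimate = beta_0 + beta_1 * x_p
lower = point_estimate - t_critical * std_error
upper = point_estimate + t_critical * std_error

print(f"sigma_eps      = {sigma_eps:.2f}")
print(f"standard error = {std_error:.2f}")
print(f"CI 95%         = [{lower:.2f} ; {upper:.2f}]")
print("salary inside interval:", lower <= salary <= upper)
```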
