Correlation, Simple Linear Regression and Multiple Linear Regression Practice
11/28/2022
The estimated linear regression: Y' = b0 + b1X
X: independent variable
Y: dependent variable
Interpret the regression coefficients
Practice
Scatter plot
It would appear that there is a positive correlation between X and Y.
Now let’s find the regression equation
Results
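Since the slide’s data and output are not reproduced in this extract, here is a minimal sketch of how the least-squares estimates are computed, using made-up calorie/life-expectancy numbers:

```python
import numpy as np

# Hypothetical data (not the slide's): daily calorie intake (X)
# and female life expectancy (Y)
x = np.array([2200., 2500., 2800., 3100., 3400., 3600.])
y = np.array([62.0, 66.0, 70.0, 73.0, 75.0, 78.0])

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
print(f"Y' = {b0:.2f} + {b1:.5f} X")
```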
Inference for regression coefficients
The t-test for whether a coefficient is equal to zero (two-tailed test with significance level alpha = 0.05).
You can also test whether a coefficient is equal to a specific value.
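A sketch of both tests with scipy, using assumed numbers for the estimate and its standard error (none are given in this extract):

```python
from scipy import stats

# Assumed values, for illustration only
b, se_b = 3.38, 1.50       # estimated coefficient and its standard error
n, k = 103, 1              # sample size and number of predictors
df = n - k - 1

# Two-tailed t-test of H0: coefficient = 0
t0 = (b - 0.0) / se_b
p0 = 2 * stats.t.sf(abs(t0), df)

# Same test against a specific hypothesized value, e.g. 3.0
t1 = (b - 3.0) / se_b
p1 = 2 * stats.t.sf(abs(t1), df)
print(t0, p0, t1, p1)
```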
The standardized coefficients
The coefficients you would obtain if you standardized all of the variables in the regression, including the dependent variable and all of the independent variables in multiple linear regression.
Standardize = put all variables on the same scale (variance of each variable = 1).
You can then compare the magnitudes of the coefficients to see which variable has the larger effect.
Notice that the larger betas are associated with the larger t-values and lower p-values.
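To see how standardizing puts coefficients on a common scale, here is a small simulation (random data, not the slide’s); in simple regression the standardized coefficient equals b * sd(X)/sd(Y), which is Pearson’s r:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

# z-score both variables (mean 0, variance 1), then fit the slope
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.sum(zx * zy) / np.sum(zx ** 2)     # standardized coefficient

# Unstandardized slope, for comparison
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(beta, b * x.std(ddof=1) / y.std(ddof=1))  # identical
```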
Evaluation of the model
R: Pearson correlation
R square: Coefficient of determination
The proportion of the variability of Y that is explained by the linear regression of Y on X (i.e., the linear relationship between Y and X).
R2 = 0.601 → 60.1% of the variability of female life expectancy is explained by its linear relationship with daily calorie intake.
The remaining 100 − 60.1 = 39.9% of the variation is not explained by this relationship.
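The same decomposition in code, with made-up data (the 0.601 above comes from the slide’s dataset, which is not reproduced here):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])            # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])      # hypothetical response

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)             # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)          # total variation
r2 = 1 - ss_res / ss_tot
print(r2)  # equals the squared Pearson correlation in simple regression
```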
• F statistic = MS_regression / MS_residual
• Null hypothesis: all of the model’s slope coefficients are 0
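A sketch of the overall F test from the ANOVA sums of squares (the numbers are assumed, since the slide’s ANOVA table is not reproduced):

```python
from scipy import stats

def overall_f(ss_reg, ss_res, n, k):
    """F = MS_regression / MS_residual with (k, n - k - 1) df."""
    ms_reg = ss_reg / k
    ms_res = ss_res / (n - k - 1)
    f = ms_reg / ms_res
    p = stats.f.sf(f, k, n - k - 1)
    return f, p

# Assumed sums of squares, for illustration
f, p = overall_f(ss_reg=120.0, ss_res=80.0, n=103, k=1)
print(f, p)
```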
Now it’s your turn
Multiple linear regression
The General Idea
Simple vs. Multiple Regression
Design Requirements
Assumptions
Multiple Linear Regression Models
Equation 12-2
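A minimal sketch of fitting a multiple linear regression Y = b0 + b1 X1 + b2 X2 + e by least squares, on simulated data (a real analysis would normally use a statistics package):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 4.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)      # [b0, b1, b2]
print(b)
```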
Notes on the coefficient of determination
More notes on the coefficient of determination
Model 1: R2 = 0.6095
Model 2: R2 = 0.7520
The inclusion of an additional variable in a model can never cause R2 to decrease.
Comparing R2 values to judge the improvement of a model is therefore not suitable; use the adjusted R2 instead.
The adjusted R2 is an estimator of the population squared multiple correlation (ρ2), but it cannot be interpreted as the proportion of the variability among the observed values of y that is explained by the linear regression model.
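The adjusted R2 formula in code, applied to the two models above under an assumed sample size (n is not given in this extract):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Assumed n = 103; Model 1 has k = 1 predictor, Model 2 has k = 2
print(adjusted_r2(0.6095, 103, 1))
print(adjusted_r2(0.7520, 103, 2))
```

The penalty grows with k, so adding a weak predictor can lower the adjusted R2 even though plain R2 never decreases.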
Comparing Models - Testing R2
Explaining Variation: How much?
Total variation in Y = predictable (explained) variation, accounted for by the combination of independent variables, plus unpredictable (unexplained) variation.
R2 = proportion of the variation in Y that is predictable (explained).
Various Significance Tests
Testing R2
- Test R2 through an F test
- Test competing models through an F test of the difference between their R2s
Testing b
- Test each partial regression coefficient (b) with a t-test
- Compare partial regression coefficients with each other via a t-test of the difference between standardized partial regression coefficients (β)
Example: Testing R2
Comparing models
Model 1: Y' = 35.37 + (3.38)X_ASC
Model 2: Y' = 36.83 + (3.52)X_ASC + (−0.44)X_GSC
Compute R2 for each model
Model 1: R2 = r2 = 0.160
Model 2: R2 = 0.161
Test the difference between the R2s
F_obs = 0.119, F_crit(α = 0.05; df = 1, 100) = 3.94
Conclude that GSC does not add significantly to ASC in predicting AA
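The comparison above can be reproduced with the R2-change F test, assuming N = 103 so that the residual df is 100:

```python
from scipy import stats

def r2_change_f(r2_full, r2_red, k_full, k_red, n):
    """F test for the increment in R^2 when adding predictors."""
    df1 = k_full - k_red
    df2 = n - k_full - 1
    f = ((r2_full - r2_red) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2

# Model 2 (ASC + GSC, k = 2) vs Model 1 (ASC only, k = 1), assumed N = 103
f, df1, df2 = r2_change_f(0.161, 0.160, 2, 1, 103)
f_crit = stats.f.ppf(0.95, df1, df2)
print(round(f, 3), round(f_crit, 2))
```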
Testing Significance of b’s
H0: β = 0
t_observed = (b − β) / standard error of b
with N − k − 1 df
Example: t-test of b
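The example itself is not reproduced in this extract; a sketch using Model 2’s GSC coefficient with an assumed (made-up) standard error:

```python
from scipy import stats

# Assumed: GSC's partial coefficient from Model 2 and a made-up SE
b, se_b = -0.44, 1.28
n, k = 103, 2
t_obs = b / se_b                     # H0: beta = 0
p = 2 * stats.t.sf(abs(t_obs), n - k - 1)
print(round(t_obs, 3), round(p, 3))
```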