Understanding Linear Regression Output

Regression is a tool to test for a relationship between a response variable and one
or more predictor variables. Starting with a straight-line relationship between two
variables:

$\hat{y}_i = \beta_0 + \beta_1 x_i$

$y_i = \hat{y}_i + \varepsilon_i$

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

β₀ and β₁ are the coefficients for the population. Since we observe only a sample
and not the whole population, we compute the sample estimates b₀ and b₁, which are
our best guesses of the population parameters β₀ and β₁. Therefore

• β₀ and β₁ are the true (unknown) intercept and slope for the entire population
• b₀ and b₁ are the intercept and slope estimated from a sample of that
population
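The distinction between population parameters and sample estimates can be sketched with simulated data. This is only an illustration: the values of beta0, beta1, the sample size and the error spread below are hypothetical choices, not taken from the fitness data.

```r
# Hypothetical population parameters, chosen only for illustration.
set.seed(1)
beta0 <- 5
beta1 <- 2

# Draw a sample of 30 observations from y = beta0 + beta1 * x + error.
n <- 30
x <- runif(n, min = 0, max = 10)
y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = 1)

# Estimate b0 and b1 from the sample by least squares.
fit <- lm(y ~ x)
b <- coef(fit)
print(b)  # close to, but generally not exactly, beta0 and beta1
```

Rerunning with a different seed gives a different sample and therefore slightly different b₀ and b₁, which is the sampling error discussed later.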

Now let us run the simple linear regression with Oxygen_Consumption as the
response variable and RunTime as the predictor variable in R.

fitness.lm <- lm(Oxygen_Consumption~RunTime, data=fitness)

We are looking at an equation in the form

$\text{OxygenConsumption} = \beta_0 + \beta_1 \cdot \text{RunTime} + \varepsilon$

Here our hypotheses are below:

• Null Hypothesis, H₀: β₁ = 0
• Alternate Hypothesis, Hₐ: β₁ ≠ 0

Now let us ask for the summary of the results.

summary(fitness.lm)

The results are below. I have superscripted the numbers we are going to interpret.

Call:
lm(formula = Oxygen_Consumption ~ RunTime, data = fitness)

Residuals:
    Min      1Q  Median      3Q     Max
-5.3311 -1.8445 -0.0599  1.5352  6.2077

Coefficients:
             Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)  82.4249    3.8558      21.377    < 2e-16    ***
RunTime      -3.3109¹   0.3612²     -9.165³   4.59e-10⁴  ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.745 on 29 degrees of freedom
Multiple R-squared: 0.7434⁵, Adjusted R-squared: 0.7345⁶
F-statistic: 84 on 1 and 29 DF, p-value: 4.59e-10

Our estimation model for Oxygen_Consumption is as below

$\widehat{\text{OxygenConsumption}} = 82.4249 - 3.3109 \cdot \text{RunTime}$
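As a quick illustration of how this estimated equation is used, the sketch below plugs a RunTime value into it. The coefficients are copied from the summary output; the helper name predict_oxygen and the RunTime value of 10 are hypothetical, chosen only for the example.

```r
# Coefficients copied from the summary() output above.
b0 <- 82.4249
b1 <- -3.3109

# Predicted Oxygen_Consumption for a given RunTime under the fitted line.
# predict_oxygen is a hypothetical helper, not part of the original analysis.
predict_oxygen <- function(run_time) b0 + b1 * run_time

predict_oxygen(10)                        # 82.4249 - 33.109 = 49.3159
predict_oxygen(11) - predict_oxygen(10)   # one extra unit of RunTime: the slope
```

With the fitted object at hand, predict(fitness.lm, newdata = data.frame(RunTime = 10)) should give the same kind of prediction using the unrounded coefficients.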

The interpretations of the superscripted values are below:

1. Estimate: Based on the sample data, each one-unit increase in RunTime is
associated with an average decrease of 3.3109 in Oxygen_Consumption. Because this
comes from a sample, the estimate is subject to sampling error.
2. Standard Error: Because -3.3109, the coefficient associated with RunTime, is
only a sample estimate, it carries uncertainty, which is measured by its standard
error of 0.3612. The lower this value, the more precise our estimate of the
coefficient, and vice versa.
3. t value: The estimated coefficient, -3.3109, lies 9.165 standard errors
(0.3612 each) below zero, the null-hypothesized value for the coefficient. The t
value of -9.165 is the estimate divided by its standard error. The larger this value
in absolute terms, the more confident we are that the population coefficient is not
equal to zero.
4. Pr(>|t|): The p-value for the relationship between the predictor variable RunTime
and the response variable Oxygen_Consumption is 4.59 × 10⁻¹⁰. This is the probability
of getting an absolute t-value of more than 9.165 if the null hypothesis is true.
Because this is far below 0.05, we reject the null hypothesis at the 5% significance
level (that is, with more than 95% confidence) and conclude that there is a
relationship between Oxygen_Consumption and RunTime.
5. Coefficient of determination of the model, R-Squared is equal to 0.7434. We
can say that 74.34% of the total variation in the dependent variable, i.e.
Oxygen_Consumption, is explained by the predictor variable i.e. RunTime.
6. The adjusted R-Squared of the model is equal to 0.7345. It penalizes
R-Squared for the number of independent variables in the model, so that
adding another independent variable does not overstate how much of the
variability of the response (dependent) variable is explained.
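The relationships among these numbers can be checked by hand. The sketch below recomputes the t value, the two-sided p-value and the adjusted R-Squared from the rounded figures printed in the summary, so the results agree with the output only up to rounding. The variable names are illustrative; n = 31 is the number of observations and p = 1 the number of predictors.

```r
# Rounded values copied from the summary() output.
estimate  <- -3.3109
std_error <- 0.3612
df_resid  <- 29

# t value: the estimate divided by its standard error.
t_value <- estimate / std_error

# Two-sided p-value: probability of |t| at least this large under H0.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Adjusted R-squared from R-squared, n observations and p predictors.
r_squared <- 0.7434
n <- 31
p <- 1
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(c(t_value = t_value, p_value = p_value, adj_r_squared = adj_r_squared))
```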

Now let us run the anova() function on the regression object fitness.lm.

anova(fitness.lm)

The results are below. I have superscripted the numbers we are going to interpret.

Analysis of Variance Table

Response: Oxygen_Consumption
           Df    Sum Sq    Mean Sq   F value  Pr(>F)
RunTime     1¹   633.01³   633.01⁵   84⁷      4.59e-10⁸ ***
Residuals  29²   218.54⁴     7.54⁶
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

1. The number of degrees of freedom associated with the model is equal
to 1.
2. The number of degrees of freedom associated with the residuals
(errors) is equal to 29. This is equal to the number of observations, 31,
minus the number of parameters, 2 (we have 2 parameters,
β₀ and β₁).
3. The sum of squares of the model is equal to 633.01. This is the
sum of the squared differences between the estimated value of
Oxygen_Consumption and the average value of Oxygen_Consumption:
$\sum_i (\hat{y}_i - \bar{y})^2$, where $\hat{y}_i$ is the predicted value and $\bar{y}$ the average value of Oxygen_Consumption.
4. The sum of squares of the residuals is equal to 218.54. This is the sum of
the squared differences between the estimated value of
Oxygen_Consumption and the actual value of Oxygen_Consumption:
$\sum_i (\hat{y}_i - y_i)^2$, where $\hat{y}_i$ is the predicted value and $y_i$ the actual value of Oxygen_Consumption.
5. Mean Square of Model is equal to 633.01. This is equal to sum of
squares of model divided by the degrees of freedom, i.e. 1.
6. Mean Square of Residuals is equal to 7.54. This is equal to sum of
squares of residuals, i.e. 218.54 divided by the degrees of freedom for
residuals, i.e. 29.
7. F Value of the model is equal to 84. This is equal to Mean Square of
Model, i.e., 633.01 divided by Mean Square of Residual i.e. 7.54.
8. The p-value of the model is 4.59 × 10⁻¹⁰. This is the probability of getting an
F-value of more than 84 if the null hypothesis is true. Because this is far below
0.05, we reject the null hypothesis at the 5% significance level (that is, with more
than 95% confidence) and conclude that there is a relationship between
Oxygen_Consumption and RunTime.
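The arithmetic in the ANOVA table can be reproduced from the two sums of squares and their degrees of freedom. The sketch below uses the rounded values printed in the table, so results match only up to rounding, and the variable names are illustrative. It also shows how the R-Squared and the residual standard error from the summary output follow from the same quantities.

```r
# Rounded values copied from the anova() table.
ss_model <- 633.01
df_model <- 1
ss_resid <- 218.54
df_resid <- 29

# Mean squares: sum of squares divided by degrees of freedom.
ms_model <- ss_model / df_model
ms_resid <- ss_resid / df_resid

# F value and its upper-tail p-value under the null hypothesis.
f_value <- ms_model / ms_resid
p_value <- pf(f_value, df_model, df_resid, lower.tail = FALSE)

# Quantities reported by summary(): R-squared and residual standard error.
r_squared <- ss_model / (ss_model + ss_resid)
resid_se  <- sqrt(ms_resid)

print(c(ms_resid = ms_resid, f_value = f_value, p_value = p_value,
        r_squared = r_squared, resid_se = resid_se))
```

In a simple regression with one predictor, the F value equals the square of the t value for the slope, which is why both tests report the same p-value.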

Now let us get the confidence interval estimates of the model.

confint(fitness.lm)

The results are below and show the 95% confidence intervals of the estimates. I have
superscripted the numbers we are going to interpret.

                2.5 %      97.5 %
(Intercept)  74.53890   90.310980
RunTime¹     -4.04968   -2.572029

1. This row gives the 95% confidence interval for the coefficient associated
with the independent variable RunTime. We can say with 95%
confidence that the true value (as applicable to the population) of the
coefficient associated with RunTime is between -4.04968 and -2.572029.
Since the 95% confidence interval does not include 0 (please
remember the null hypothesized value of β₁ is equal to zero), we can
reject the null hypothesis at the 5% significance level.
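The interval for RunTime can be reconstructed as the estimate plus or minus a critical t value times the standard error. The sketch below assumes the rounded estimate and standard error from the summary output, so it reproduces the confint() bounds only to about three decimal places. The variable names are illustrative.

```r
# Rounded values copied from the summary() output.
estimate  <- -3.3109
std_error <- 0.3612
df_resid  <- 29

# Critical value for a two-sided 95% interval with 29 degrees of freedom.
t_crit <- qt(0.975, df = df_resid)

# Lower and upper bounds: estimate -/+ t_crit * standard error.
ci <- estimate + c(-1, 1) * t_crit * std_error
print(ci)

# The interval excludes zero when both bounds share the same sign.
excludes_zero <- all(ci < 0) || all(ci > 0)
print(excludes_zero)
```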
