STAT2263 HUMBER Linear Regression - Week 12
➢ Linear Regression
* Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of one or more other variables (the independent variables).
MIDORI KOBAYASHI 1
STAT 2263 Linear Regression and Correlation
For example,
x: pulse rate
y: white blood cell count, and say r² = 0.083.
8.3% of the variation in white blood cell count can be explained by the variation in pulse rate. The remaining 91.7% is unexplained.
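Interpreting r² is plain arithmetic: r² × 100 gives the explained percentage, and 100 minus that gives the unexplained percentage. The following is a minimal Python sketch of that split; the function name `split_variation` is hypothetical, and rounding to one decimal place is an assumed presentation choice.

```python
def split_variation(r_squared):
    """Split 100% of the variation in y into explained and unexplained parts."""
    explained = round(r_squared * 100, 1)     # % of variation in y explained by x
    unexplained = round(100 - explained, 1)   # remaining % left unexplained
    return explained, unexplained


# Pulse rate (x) vs. white blood cell count (y), r^2 = 0.083
explained, unexplained = split_variation(0.083)
print(f"{explained}% explained, {unexplained}% unexplained")
# prints: 8.3% explained, 91.7% unexplained
```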
Interpretation of Slope:
As x is increased by one unit, y changes by b1 units (an increase if b1 is positive, a decrease if b1 is negative).
❖ The sign of b1 and the sign of r must be the same.
If b1 is positive, then r must be positive.
If b1 is negative, then r must be negative.
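Because b1 and r always share a sign, r can be recovered from r² by taking the square root and attaching the sign of the slope. The sketch below assumes the values that appear in Example (1) (b1 = −2.999, r² = 0.942); the helper name `r_from_r_squared` is hypothetical.

```python
import math


def r_from_r_squared(r_squared, b1):
    """Recover r from r^2; r takes the same sign as the slope b1."""
    return math.copysign(math.sqrt(r_squared), b1)


# Example (1): b1 = -2.999 (negative slope) and r^2 = 0.942
r = r_from_r_squared(0.942, -2.999)
print(round(r, 3))  # prints -0.971
```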
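Reference:
$$b_1 \text{ (slope)} = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b_0 \text{ (y-intercept)} = \frac{\sum y}{n} - b_1\,\frac{\sum x}{n} \qquad (n:\ \text{number of pairs})$$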
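The reference formulas need only five numbers: n, Σx, Σy, Σxy and Σx². As a worked check, the sketch below plugs in column sums computed by hand from the nurse table in Example (2) (n = 14, Σx = 164, Σy = 902.9, Σxy = 11899.6, Σx² = 2689.5); the function name `slope_intercept_from_sums` is hypothetical.

```python
def slope_intercept_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares b0 and b1 from the summation (reference) formulas."""
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Column sums computed from the Example (2) nurse data (n = 14 pairs)
b0, b1 = slope_intercept_from_sums(n=14, sum_x=164, sum_y=902.9,
                                   sum_xy=11899.6, sum_x2=2689.5)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # prints y-hat = 44.326 + 1.722x
```

The result agrees with the SPSS equation reported in Example (2).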
❖ Example (1)
A statistics professor claims that there is a distinct relationship between a student’s mark
and the student’s class attendance.
students A B C D E F G H
# of missed classes 2 7 13 9 7 4 13 16
There is a negative correlation: as the number of missed classes increases, the statistics mark decreases.
ŷ = b0 + b1x = 96.118 − 2.999x
Slope b1 = −2.999: as the number of missed classes increases by one class, the student's predicted statistics mark decreases by 2.999 percentage points.
(b1 is expressed in the same unit as y, the mark in %.)
r² = 0.942 (do not round)
94.2% of the variation in statistics marks can be explained by the variation in the number of missed classes. The remaining 5.8% (100 − 94.2) is unexplained.
There is a very strong negative linear relationship between the number of missed classes and statistics marks.
(g) Is there a linear relationship between the number of missed classes and the statistics marks at the 5%
significance level?
There is a significant linear relationship between the number of missed classes and statistics marks at the 5% significance level.
(Ref. table)
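One way to reproduce the table decision is to compare |r| with a critical value of r for df = n − 2. The sketch below assumes the reference table lists critical values of Pearson's r; it derives that critical r from a critical t value via r = t / √(t² + df), using 2.447, the standard two-tailed 5% critical t for df = 6 (taken from a t table, not from these notes). The helper name `critical_r` is hypothetical.

```python
import math


def critical_r(t_crit, df):
    """Convert a two-tailed critical t value (df = n - 2) into a critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 8                              # students A to H
r = -math.sqrt(0.942)              # negative, matching b1 = -2.999
r_crit = critical_r(2.447, n - 2)  # 2.447: two-tailed 5% critical t for df = 6
print(round(r_crit, 3), abs(r) > r_crit)  # prints 0.707 True
```

Since |r| ≈ 0.971 exceeds 0.707, the sketch reaches the same conclusion as the notes.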
(h) Predict the stat marks for students who miss 3 classes and 20 classes.
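Since there is a significant linear relationship and the correlation is very strong, we can use the best-fit equation to predict statistics marks within the observed range of 2 to 16 missed classes.
ŷ = 96.118 − 2.999x
When x = 3: ŷ = 96.118 − 2.999(3) = 87.121, so a student who misses 3 classes is predicted to score about 87.1%.
When x = 20: 20 missed classes is outside the observed range of 2 to 16, so the equation should not be used to predict this mark.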
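The prediction rule, including the restriction to the observed range, can be sketched as a small function. The name `predict_mark` and the choice to return `None` for out-of-range inputs are illustrative assumptions, not part of the notes.

```python
def predict_mark(x, x_min=2, x_max=16):
    """Predicted stats mark (%) from y-hat = 96.118 - 2.999x.

    Returns None when x lies outside the observed range of missed classes,
    because the line should not be used for extrapolation.
    """
    if not x_min <= x <= x_max:
        return None
    return round(96.118 - 2.999 * x, 3)


print(predict_mark(3))   # prints 87.121
print(predict_mark(20))  # prints None
```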
❖ Example (2) The following table shows the years of experience of 14 registered nurses and their salaries.

Years of experience          0.5     2     4     5     7     9    10  12.5    13    16    18    20    22    25
Annual salary (in 000's)    45.2  48.3  53.6  46.8  54.3  68.5  58.3  54.6  72.6  76.2  72.6  94.3  75.3  82.3
As the number of years of experience increases, the annual salary increases.
(The graph rises from left to right.)
(c) Write the regression equation using the output from SPSS.
ŷ = b0 + b1x = 44.326 + 1.722x
r² = 0.782
There is a strong positive linear relationship between years of experience and annual salaries.
(g) Is there a linear relationship between the number of years of experience and the annual salary of
nurses at the 1% significance level?
There is a significant linear relationship between the number of years of experience and the salaries of nurses at the 1% significance level.
(Ref. table or P-value)