STAT2263 HUMBER Linear Regression - Week 12

This document discusses linear regression and correlation. It defines key terms like dependent and independent variables, correlation coefficient r, coefficient of determination r^2, and regression equation. Examples are provided to demonstrate how to calculate these from data and interpret the results.

STAT 2263 Linear Regression and Correlation

➢ Linear Regression

* Regression analysis is used to predict the value of one variable (the dependent variable) on
the basis of another variable (the independent variable).

* Dependent variable: usually denoted y

* Independent variable: usually denoted x

* The relationship between x and y is analyzed in three ways:
o Correlation
o Determination
o Equation

➢ (1) Linear Correlation Coefficient, r

Correlation: The relationship between two variables (x and y)

r: measures the strength of the linear relationship between x and y.

• Guideline for interpretation of r (–1 ≤ r ≤ 1)

|r|            Interpretation
0.00           no linear relationship
0.01 – 0.19    very weak
0.20 – 0.39    weak
0.40 – 0.69    modest
0.70 – 0.89    strong
0.90 – 0.99    very strong
1.00           perfect

Note: "r" tells you only how weak or strong the linear relationship is (weak, strong, perfect).
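
The guideline table can be turned into a small lookup. This is only a sketch: the function name `describe_strength` is made up for illustration, and it assumes the bands are contiguous (so a value such as 0.195, which falls between two printed rows, is assigned to the lower band).

```python
def describe_strength(r: float) -> str:
    """Return the guideline label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the guideline uses |r|, so the sign is ignored here
    if size == 0.0:
        return "no linear relationship"
    if size == 1.0:
        return "perfect"
    # Lower bound of each band, checked from strongest to weakest.
    # Assumption: bands are contiguous (the table leaves small gaps).
    bands = [(0.90, "very strong"), (0.70, "strong"),
             (0.40, "modest"), (0.20, "weak")]
    for lower, label in bands:
        if size >= lower:
            return label
    return "very weak"


print(describe_strength(-0.971))  # very strong
print(describe_strength(0.884))   # strong
```

The label describes strength only; the direction (positive or negative) still has to be read from the sign of r.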

MIDORI KOBAYASHI 1

➢ (2) Coefficient of Determination, r^2

❑ A measure of how much of the variation in the dependent variable (y) can be "explained by"
the variation in the independent variable (x).

For example,
x: pulse rate
y: white blood cell count
and say r^2 = 0.083.

8.3% of the variation in the white blood cell count can be explained by the variation in
pulse rate. The remaining 91.7% (100 − 8.3) is unexplained (cannot be explained by pulse rate).
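
The same arithmetic as a short sketch. The helper name `explained_split` is hypothetical, and rounding to one decimal place is an assumption that suits an r^2 reported to three decimals; it only removes floating-point noise.

```python
def explained_split(r_squared: float) -> tuple[float, float]:
    """Return (explained %, unexplained %) for a given r^2."""
    explained = round(r_squared * 100, 1)
    unexplained = round(100 - explained, 1)
    return explained, unexplained


print(explained_split(0.083))  # (8.3, 91.7)
```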


➢ (3) Regression Equation


Given a collection of paired sample data, the regression equation

ŷ = b0 + b1x

algebraically describes the relationship between the two variables.

The graph of the regression equation is called the regression line or line of best fit.

Interpretation of Slope:

As x is increased by one unit, y is increased (or decreased) by b1 units.

❖ The sign of b1 and r must be the same.

If b1 is positive, then r must be positive.

If b1 is negative, then r must be negative.

Reference:

b1 (slope) = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]

b0 (y-intercept) = (Σy)/n − b1(Σx)/n        (n: # of pairs)
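
A minimal sketch of the two reference formulas in code. The function name `fit_line` and the sample points are made up for illustration; the points are chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (b0, b1) computed from the sum-based reference formulas."""
    n = len(xs)                                   # number of pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * sum_x / n                                # y-intercept
    return b0, b1


# Hypothetical points that lie exactly on y = 1 + 2x
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

The function fails with a division by zero if every x value is the same, since no slope can be estimated in that case.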


❖ Example (1)

A statistics professor claims that there is a distinct relationship between a student’s mark
and the student’s class attendance.

Student               A    B    C    D    E    F    G    H
Stat Mark (%)         90   75   55   73   70   85   63   45
# of missed classes   2    7    13   9    7    4    13   16

(a) Determine the independent and dependent variables.


Does class attendance depend on the test mark?
Does the test mark depend on class attendance?

Independent variable: # of missed classes

Dependent variable: Stat Mark

(b) Is there a positive or negative correlation?

Negative correlation.

As the # of missed classes increases, the stat mark decreases.

(The graph slopes downward to the right.)

The following is the SPSS output:

Model Summary and Parameter Estimates
Dependent Variable: StatMark

            Model Summary                                Parameter Estimates
Equation    R Square   F        df1   df2   Sig.         Constant   b1
Linear      .942       96.735   1     6     .00006367    96.118     -2.999

The independent variable is ClassMissed.

(c) Write the regression equation using the output of SPSS.

ŷ = b0 + b1x = 96.118 – 2.999x
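
As a cross-check, the Constant and b1 values in the SPSS output can be reproduced from the table of marks and missed classes using the reference formulas for b1 and b0. The variable names below are illustrative, not part of the handout.

```python
missed = [2, 7, 13, 9, 7, 4, 13, 16]      # x: number of missed classes
marks = [90, 75, 55, 73, 70, 85, 63, 45]  # y: stat mark (%)

n = len(missed)
sum_x, sum_y = sum(missed), sum(marks)
sum_xy = sum(x * y for x, y in zip(missed, marks))
sum_x2 = sum(x * x for x in missed)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b0 = sum_y / n - b1 * sum_x / n                                # y-intercept
print(round(b0, 3), round(b1, 3))  # 96.118 -2.999
```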

(d) Interpret the meaning of the regression coefficient, b1.

b1 = -2.999:
As the number of missed classes increases by one class, the student's statistics mark
decreases by 2.999 percentage points (the same unit as the mark, %).


(e) Find the coefficient of determination and interpret it.

r^2 = 0.942 (do not round)

94.2% of the variation in stat marks can be explained by the variation in the number of
missed classes. The remaining 5.8% (100 − 94.2) is unexplained.

(f) Find the correlation coefficient and interpret it.

|r| = √(r^2) = √0.942 = 0.970566… ≈ 0.971 (keep the same number of decimal places as r^2)

Since b1 is negative, r = -0.971.

There is a very strong negative linear relationship between the # of missed classes and
stat marks.
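
The two steps in (f), taking the square root of r^2 and then attaching the sign of b1, can be sketched as follows. The function name `r_from_r_squared` is hypothetical.

```python
import math


def r_from_r_squared(r_squared: float, b1: float) -> float:
    """Square root of r^2, carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r_squared), b1)


print(round(r_from_r_squared(0.942, -2.999), 3))  # -0.971
```

Because the input r^2 is already rounded to three decimals, the result can differ in the last decimal from an r computed directly from the raw data.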

(g) Is there a linear relationship between the number of missed classes and the statistics marks at the 5%
significance level?

Since p-value = 0.00006367 < 0.05, we reject H0.

There is a significant linear relationship between the number of missed classes and stat marks
at the 5% significance level.

(Ref. table: the p-value is the Sig. value in the SPSS output.)
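
The F value in the SPSS output can be reproduced from the data. The sketch below relies on the identity F = (n − 2)·r^2 / (1 − r^2), which holds for regression with one independent variable; the handout does not state this identity, so it is brought in here as an assumption. The p-value itself is copied from the SPSS output rather than computed.

```python
missed = [2, 7, 13, 9, 7, 4, 13, 16]      # x: number of missed classes
marks = [90, 75, 55, 73, 70, 85, 63, 45]  # y: stat mark (%)

n = len(missed)
sum_x, sum_y = sum(missed), sum(marks)
sxy = n * sum(x * y for x, y in zip(missed, marks)) - sum_x * sum_y
sxx = n * sum(x * x for x in missed) - sum_x ** 2
syy = n * sum(y * y for y in marks) - sum_y ** 2

r_squared = sxy ** 2 / (sxx * syy)                # unrounded r^2
f_stat = (n - 2) * r_squared / (1 - r_squared)    # df1 = 1, df2 = n - 2
print(round(r_squared, 3), round(f_stat, 3))      # 0.942 96.735

p_value, alpha = 0.00006367, 0.05  # Sig. from the SPSS output; 5% level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)  # reject H0
```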


(h) Predict the stat marks for students who miss 3 classes and 20 classes.

Since there is a significant linear relationship and the correlation is very strong, we can
use the best-fit equation to predict statistics marks within 2 – 16 missed classes.

ŷ = 96.118 – 2.999x

When x = 3:

ŷ = 96.118 – 2.999(3) = 87.121 ≈ 87.1

When x = 20:

The equation cannot be used to predict (20 is outside the interval 2 – 16).
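
The rule in (h), predict only inside the range of observed x values, can be built into the prediction itself. This is a sketch; the function name `predict_mark` and the choice to raise an error for out-of-range inputs are illustrative, not part of the handout.

```python
def predict_mark(missed_classes: float) -> float:
    """Predict the stat mark (%) from the fitted line of Example (1)."""
    x_min, x_max = 2, 16  # smallest and largest x observed in the sample
    if not x_min <= missed_classes <= x_max:
        raise ValueError(
            f"x = {missed_classes} is outside {x_min} to {x_max}; "
            "the equation cannot be used to predict"
        )
    return 96.118 - 2.999 * missed_classes


print(round(predict_mark(3), 1))  # 87.1

try:
    predict_mark(20)
except ValueError as err:
    print(err)
```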

❖ Example (2) The following table shows the years of experience of 14 registered nurses and
their salaries.

Years of experience         0.5   2     4     5     7     9     10    12.5  13    16    18    20    22    25
Annual salary (in $000's)   45.2  48.3  53.6  46.8  54.3  68.5  58.3  54.6  72.6  76.2  72.6  94.3  75.3  82.3

(a) Determine the independent and dependent variables.

Independent variable: # of years of experience

Dependent variable: annual salary

(b) Is there a positive or negative correlation?

Positive correlation.

As the # of years of experience increases, the annual salary increases.

(The graph slopes upward to the right.)

(c) Write the regression equation using the output from SPSS.

ŷ = b0 + b1x = 44.326 + 1.722x

(d) Interpret the meaning of the regression coefficient, b1.


b1 = 1.722:
As the number of years of experience increases by one year, the annual salary of nurses
increases by $1,722 (1.722 thousand).

(e) Find the coefficient of determination and interpret it.

r^2 = 0.782 (do not round)

78.2% of the variation in annual salaries of nurses can be explained by the variation in the
number of years of experience. The remaining 21.8% (100 − 78.2) is unexplained.

(f) Find the correlation coefficient and interpret it.

|r| = √(r^2) = √0.782 = 0.88430… ≈ 0.884 (keep the same number of decimal places as r^2)

Since b1 is positive, r = 0.884.

There is a strong positive linear relationship between years of experience and
annual salaries.
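
For comparison, r can also be computed directly from the nurses' data with the usual sum-based Pearson formula. The handout does not print that formula, so treat it (and the name `pearson_r`) as an addition for illustration.

```python
import math

years = [0.5, 2, 4, 5, 7, 9, 10, 12.5, 13, 16, 18, 20, 22, 25]
salary = [45.2, 48.3, 53.6, 46.8, 54.3, 68.5, 58.3,
          54.6, 72.6, 76.2, 72.6, 94.3, 75.3, 82.3]  # in $000's


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums over the paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    sxx = n * sum(x * x for x in xs) - sum_x ** 2
    syy = n * sum(y * y for y in ys) - sum_y ** 2
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(years, salary)
print(round(r, 3), round(r * r, 3))  # 0.884 0.782
```

Computed this way, the sign of r comes out of the formula directly and agrees with the sign of b1.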

(g) Is there a linear relationship between the number of years of experience and the annual salary of
nurses at the 1% significance level?


Since p-value = 0.0000271 < 0.01 (the α value), we reject H0.

There is a significant linear relationship between the years of experience and the salaries of
nurses at the 1% significance level.

(Ref. table: p-value)

(h) Predict the amount of salary if a nurse has 14 years of experience.

Since there is a significant linear relationship and the correlation is strong, we can use the
best-fit equation to predict the salaries within 0.5 – 25 years of experience.

ŷ = 44.326 + 1.722x

Substitute x = 14 into the regression equation to get:

ŷ = 44.326 + 1.722(14) = 68.434 (thousand) = $68,434
