Lesson 6-8 Linear Regression and Correlation
Simple Linear Regression and Correlation Analysis
Introduction to Regression Analysis
Regression analysis is used to:
• predict the value of a dependent variable based on
the value of at least one independent variable
• explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the
dependent variable
Scatter Plot
• a.k.a. scatter diagram
• shows the relationship between two
variables
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear relationships and nonlinear relationships]
Who coined the term ‘regression’?
Simple Linear Regression Model
• Only one independent variable, 𝑥
• Relationship between 𝑥 and 𝑦 is
described by a linear function
• Changes in 𝑦 are assumed to be
caused by changes in 𝑥
Types of Regression Models
[Figure: two panels, one showing a positive linear relationship and one showing a relationship that is NOT linear]
Estimated Regression Model
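ŷ = a + bx
where:
ŷ = Estimated (predicted) value of y
a = Estimate of the regression intercept
b = Estimate of the regression slope
x = Independent variable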
Age at wedding 21 15 22 22 21 25 30 18 24
No. of children 4 8 3 4 2 3 1 5 6
[Figure: scatter plot of Number of Children (vertical axis) against Age at wedding (horizontal axis)]
2. Obtain the estimated regression line equation.
\[ \sum_{i=1}^{n} x_i = 21 + 15 + 22 + 22 + 21 + 25 + 30 + 18 + 24 = 198 \]
\[ \sum_{i=1}^{n} y_i = 4 + 8 + 3 + 4 + 2 + 3 + 1 + 5 + 6 = 36 \]
\[ \sum_{i=1}^{n} x_i^2 = 21^2 + 15^2 + 22^2 + 22^2 + 21^2 + 25^2 + 30^2 + 18^2 + 24^2 = 4500 \]
\[ \sum_{i=1}^{n} y_i^2 = 4^2 + 8^2 + 3^2 + 4^2 + 2^2 + 3^2 + 1^2 + 5^2 + 6^2 = 180 \]
\[ \sum_{i=1}^{n} x_i y_i = 21(4) + 15(8) + 22(3) + 22(4) + 21(2) + 25(3) + 30(1) + 18(5) + 24(6) = 739 \]
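The sums above can be checked with a short Python sketch. The slope and intercept lines use the standard least-squares formulas, which are an assumption here because the slides that presumably derived them did not survive extraction; the variable names are illustrative only.

```python
# Example data from the lesson: mother's age at wedding (x) and number of children (y).
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(children)
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in children)
sum_xy = sum(x * y for x, y in zip(ages, children))

# Assumed standard least-squares estimates (not shown in the surviving text):
#   b = (n*sum_xy - sum_x*sum_y) / (n*sum_x2 - sum_x**2)
#   a = mean(y) - b * mean(x)
slope_b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept_a = sum_y / n - slope_b * (sum_x / n)

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"y-hat = {intercept_a:.4f} + ({slope_b:.4f})x")
```

Under these assumptions the fitted line has a negative slope, which agrees with the downward trend in the scatter plot.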
Correlation Analysis
• measures the strength of the association (linear
relationship) between two variables
• only concerned with strength of the relationship
• no causal effect is implied
Scatter Plot Examples (continued)
[Figure: scatter plots of y against x showing strong relationships, weak relationships, and no relationship]
Correlation Coefficient
The sample correlation coefficient r:
• is an estimate of the population correlation coefficient ρ
• measures the strength of the linear relationship in
the sample observations
Who developed the Pearson product-moment correlation coefficient?
Karl Pearson
(1857-1936)
• English mathematician
• a protégé and biographer of
Sir Francis Galton
Features of ρ and r
• unit free
• range between -1 and 1
• the closer to -1, the stronger the negative linear
relationship
• the closer to 1, the stronger the positive linear
relationship
• the closer to 0, the weaker the linear relationship
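A minimal Python sketch of the "unit free" and sign properties follows. The data are hypothetical, and the helper `corr` is an illustrative name that uses the deviations-from-the-mean form of r, which is algebraically equivalent to the usual computational formula.

```python
from math import sqrt

def corr(xs, ys):
    """Sample correlation coefficient, written with deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r_hours = corr(hours, scores)
r_minutes = corr([60 * h for h in hours], scores)  # same data, x in other units
r_reversed = corr(hours, [-s for s in scores])     # direction of y flipped

print(round(r_hours, 4), round(r_minutes, 4), round(r_reversed, 4))
```

Changing the units of x leaves r unchanged, and reversing the direction of y changes only its sign, so r stays between -1 and 1.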
Examples of Approximate r Values
[Figure: scatter plots of y against x with r = -1 (perfect negative linear relationship), r = -.6, r = 0 (no linear relationship), r = +.3, and r = +1]
Calculating the Correlation Coefficient
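\[ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \]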
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
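The computational formula can be sketched as a small Python function. The name `sample_correlation` and the check data are illustrative assumptions, not part of the lesson.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation coefficient r from the computational (sums) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    # denominator is 0 when either variable is constant; r is undefined then
    return numerator / denominator

# Made-up, perfectly linear data as a sanity check.
xs = [1, 2, 3, 4]
r_up = sample_correlation(xs, [2 * x + 1 for x in xs])     # rising line
r_down = sample_correlation(xs, [10 - 3 * x for x in xs])  # falling line
print(r_up, r_down)  # prints 1.0 -1.0
```

Points that lie exactly on a rising line give r = 1 and points on a falling line give r = -1, the two extremes of the range.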
Coefficient of Determination, r²
• the proportion of the total variation in the dependent
variable that is explained by variation in
the independent variable
0 ≤ r² ≤ 1
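One way to see "proportion of variation explained" is to compare the variation left over after fitting the line with the total variation in y. The sketch below assumes the standard least-squares fit and the usual sums of squares (total, SST, and error, SSE), neither of which is spelled out in the lesson, and applies them to the lesson's age and children data.

```python
# Lesson data: mother's age at wedding (x) and number of children (y).
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(children) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, children))
sxx = sum((x - mean_x) ** 2 for x in ages)

# Assumed least-squares line y-hat = a + b*x.
b = sxy / sxx
a = mean_y - b * mean_x

sst = sum((y - mean_y) ** 2 for y in children)                      # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, children))   # unexplained variation

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

The result lies between 0 and 1, and it matches the square of the correlation coefficient computed from the same data.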
Examples of Approximate r² Values
[Figure: scatter plots of y against x with r² = 1, 0 < r² < 1, and r² = 0 (no linear relationship between x and y)]
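4. Compute the sample correlation coefficient r and interpret.
\[ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} = \frac{9(739) - (198)(36)}{\sqrt{[9(4500) - (198)^2][9(180) - (36)^2]}} = \frac{-477}{648} = -0.7361 \]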
Interpretation:
The value of 𝑟 = −0.7361 indicates a strong negative linear relationship between
the age of the mother at her wedding and the number of children in the family.
5. Compute the coefficient of determination and interpret.
\[ r^2 = (-0.7361)^2 \times 100\% = 54.18\% \]
which means that about 54.18% of the total variation in the number of children in
the family is explained by (accounted for by) the age of the mother at her
wedding.
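The arithmetic in steps 4 and 5 can be reproduced from the five sums alone. This is a minimal sketch with illustrative variable names; it also shows how rounding r before squaring affects the last digit of the percentage.

```python
from math import sqrt

# Sums taken from the worked example.
n = 9
sum_x, sum_y = 198, 36
sum_x2, sum_y2, sum_xy = 4500, 180, 739

numerator = n * sum_xy - sum_x * sum_y                       # -477
denominator = sqrt((n * sum_x2 - sum_x ** 2) *
                   (n * sum_y2 - sum_y ** 2))                # 648.0
r = numerator / denominator

r_rounded = round(r, 4)                    # -0.7361, the value used in the lesson
pct_from_rounded = r_rounded ** 2 * 100    # squares the rounded r
pct_full_precision = r ** 2 * 100          # squares the unrounded r

print(r_rounded, round(pct_from_rounded, 2), round(pct_full_precision, 2))
# prints -0.7361 54.18 54.19
```

The 54.18% in the lesson comes from squaring the rounded value of r; carrying full precision gives 54.19%, and either way about 54% of the variation is explained.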
SEATWORK
Show all the necessary solutions.
1. When buying items, it is sometimes advantageous to buy in large
quantities because the unit price is usually lower for larger quantities. To
test whether there is a linear relationship between the quantity of a
particular item purchased and the cost per unit, the following data were
obtained.
d. Compute the sample correlation coefficient r and interpret.
e. Compute the coefficient of determination and interpret.