
Lesson 6-8 Linear Regression and Correlation

This lesson is meant to help you find the linear regression and correlation of two variables.
Copyright © All Rights Reserved

Chapter 8

Simple Linear Regression and Correlation Analysis
Introduction to Regression and Correlation

Introduction to Regression Analysis
Regression analysis is used to:
• predict the value of a dependent variable based on
the value of at least one independent variable
• explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the
dependent variable
Scatter Plot
• a.k.a. scatter diagram
• shows the relationship between two variables
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships and nonlinear relationships]
Who coined the term ‘regression’?
Simple Linear Regression Model
• Only one independent variable, 𝑥
• Relationship between 𝑥 and 𝑦 is
described by a linear function
• Changes in 𝑦 are assumed to be
caused by changes in 𝑥
Types of Regression Models
[Figure: scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship]

Linear Regression Assumptions
• X is not a random variable. The values of the independent variable X may be "fixed", or the researcher may select the values of X in advance.
• The values of X are measured without error.
• The variance around the line is the same for all values of the independent variable X. This condition is called homoscedasticity.
• For each value of the independent variable X, the subpopulation of the dependent variable Y is normally distributed.
• The means of the subpopulations of Y all lie on the same straight line (this is called the assumption of linearity).

Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

$$\hat{y} = a + bx$$

where $\hat{y}$ is the estimated (or predicted) y value, $a$ is the estimate of the regression intercept, $b$ is the estimate of the regression slope, and $x$ is the independent variable.

The individual random error terms $e_i$ have a mean of zero.


Formula

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
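Interpretation of Slope b and Intercept a
• b is the estimated change in the average value of y for each one-unit increase in x.
• a is the estimated average value of y when x = 0 (meaningful only when x = 0 lies within the range of the observed x values).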
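The slope formula above, together with the intercept formula $a = \bar{y} - b\bar{x}$ used in the worked example below, can be sketched in Python as follows. The function name `fit_line` is hypothetical and chosen only for illustration. The data are the sociologist's nine observations from the example that follows.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x.

    Uses the computational formula
        b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    and the intercept a = y_bar - b*x_bar.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * sum_x / n
    return a, b


# Age at wedding (x) and number of children (y) from the example below
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]
a, b = fit_line(ages, children)
print(f"a = {a:.3f}, b = {b:.3f}")  # prints: a = 12.097, b = -0.368
```

The sketch does not round b before it computes a, so the intercept comes out as about 12.097. The hand calculation rounds b to −0.368 first and therefore gets 12.096.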
A sociologist wants to know whether the number of
children in the family is linearly dependent on the
age of the mother at her wedding. He interviewed 9
housewives and the results are shown below:

Age at wedding 21 15 22 22 21 25 30 18 24
No. of children 4 8 3 4 2 3 1 5 6

1. Construct a scatter plot. Interpret.


2. Obtain the estimated regression line equation.
3. Estimate the number of children if a mother’s age at wedding is 21.
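1. Plot the scatter diagram/scatter plot. Interpret.

[Figure: scatter plot of Number of Children (vertical axis) against Age at wedding (horizontal axis)]

Interpretation: the points tend to fall from left to right. This suggests a negative linear relationship, meaning that mothers who married at an older age tend to have fewer children.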
2. Obtain the estimated regression line equation.

(X) Age at wedding 21 15 22 22 21 25 30 18 24
(Y) No. of children 4 8 3 4 2 3 1 5 6

$$\sum_{i=1}^{n} x_i = 21 + 15 + 22 + 22 + 21 + 25 + 30 + 18 + 24 = 198$$

$$\sum_{i=1}^{n} y_i = 4 + 8 + 3 + 4 + 2 + 3 + 1 + 5 + 6 = 36$$

$$\sum_{i=1}^{n} x_i^2 = 21^2 + 15^2 + 22^2 + 22^2 + 21^2 + 25^2 + 30^2 + 18^2 + 24^2 = 4500$$

$$\sum_{i=1}^{n} y_i^2 = 4^2 + 8^2 + 3^2 + 4^2 + 2^2 + 3^2 + 1^2 + 5^2 + 6^2 = 180$$

$$\sum_{i=1}^{n} x_i y_i = 21(4) + 15(8) + 22(3) + 22(4) + 21(2) + 25(3) + 30(1) + 18(5) + 24(6) = 739$$

$$n = 9, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{198}{9} = 22, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{36}{9} = 4$$
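The sums and means for step 2 can be checked with a short Python sketch. The variable names are illustrative only.

```python
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]   # X: age at wedding
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]        # Y: number of children

n = len(ages)
sum_x = sum(ages)                                    # sum of x_i
sum_y = sum(children)                                # sum of y_i
sum_x2 = sum(x ** 2 for x in ages)                   # sum of x_i^2
sum_y2 = sum(y ** 2 for y in children)               # sum of y_i^2
sum_xy = sum(x * y for x, y in zip(ages, children))  # sum of x_i * y_i
x_bar = sum_x / n
y_bar = sum_y / n

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy, x_bar, y_bar)
# prints: 9 198 36 4500 180 739 22.0 4.0
```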
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{9(739) - (198)(36)}{9(4500) - (198)^2} = \frac{6651 - 7128}{40500 - 39204} = \frac{-477}{1296} = -0.368$$

$$a = \bar{y} - b\bar{x} = 4 - (-0.368)(22) = 12.096$$

Thus, the estimated regression line equation is

$$\hat{y} = a + bx = 12.096 - 0.368x$$
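The arithmetic above can be redone exactly with Python's `fractions` module. This shows that b = −477/1296 = −53/144, and that the intercept 12.096 comes from rounding b to three decimals before computing a. The variable names are illustrative.

```python
from fractions import Fraction

n, sum_x, sum_y, sum_x2, sum_xy = 9, 198, 36, 4500, 739

# Exact slope and intercept (no rounding)
b_exact = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a_exact = Fraction(sum_y, n) - b_exact * Fraction(sum_x, n)

# Hand-calculation route: round b first, then compute a
b_rounded = round(float(b_exact), 3)                  # -0.368
a_from_rounded = sum_y / n - b_rounded * sum_x / n    # 4 + 0.368 * 22

print(b_exact, float(a_exact), a_from_rounded)
```

The exact intercept is 871/72 ≈ 12.0972. The small difference from 12.096 comes only from rounding.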
3. Estimate the number of children if a mother’s age at wedding is 21.
If $x = 21$, then

$$\hat{y} = 12.096 - 0.368x = 12.096 - 0.368(21) = 4.368 \approx 4.$$

This means that the estimated (predicted) number of children in the family is about four when the mother's age at wedding is 21.
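A one-line prediction function reproduces step 3 using the rounded coefficients from the worked example. The name `predict` and its default arguments are hypothetical and exist only for illustration. Predictions for ages outside the observed range (15 to 30) would be extrapolations and should be treated with caution.

```python
def predict(x, a=12.096, b=-0.368):
    """Estimated number of children for a mother's age at wedding x,
    using the rounded coefficients of y_hat = a + b*x."""
    return a + b * x


y_hat = predict(21)
print(round(y_hat, 3), round(y_hat))  # prints: 4.368 4
```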
SEATWORK
Show all the necessary solutions.
1. When buying items, it is sometimes advantageous to buy in large quantities because the unit price is usually lower for larger quantities. To test whether there is a linear relationship between the number of units of a particular item bought and the cost per unit, the following data were obtained.
a. Draw a scatter diagram of the given data.
b. Find the equation of the estimated regression line.
c. What is the expected cost per unit if we buy 2 dozen (24) units of the item?
Number of Units (X) 1 3 5 10 12 15
Cost per Unit (Y) 55 52 48 36 32 30

Correlation Analysis
• measures the strength of the association (linear relationship) between two variables
• is concerned only with the strength of the relationship
• implies no causal effect
Scatter Plot Examples (continued)
[Figure: scatter plots of y against x illustrating strong relationships, weak relationships, and no relationship]
Correlation Coefficient

• The population correlation coefficient ρ (rho) measures the strength of the association between the variables.
• The sample correlation coefficient r is an estimate of ρ and measures the strength of the linear relationship in the sample observations.
Who developed the Pearson product-moment correlation coefficient?

Karl Pearson
(1857-1936)
• English mathematician
• a protégé and biographer of
Sir Francis Galton
Features of ρ and r
• unit free
• range between -1 and 1
• the closer to -1, the stronger the negative linear
relationship
• the closer to 1, the stronger the positive linear
relationship
• the closer to 0, the weaker the linear relationship
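The "unit free" and "between −1 and 1" features listed above can be checked numerically. The sketch below uses the sociologist's data. It computes r once with age measured in years and once with age converted to months. Multiplying x by a positive constant leaves r unchanged. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient via the sums-based formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


ages_years = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]
ages_months = [12 * x for x in ages_years]  # same ages, different unit

r_years = pearson_r(ages_years, children)
r_months = pearson_r(ages_months, children)
print(round(r_years, 4), round(r_months, 4))  # prints: -0.7361 -0.7361
assert -1 <= r_years <= 1
```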
Examples of Approximate r Values
[Figure: scatter plots of y against x for r = −1 (perfect negative linear relationship), r = −0.6, r = 0 (no linear relationship), r = +0.3, and r = +1 (perfect positive linear relationship)]
Calculating the Correlation Coefficient

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
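The sketch below computes r in two ways. The first uses the computational formula above. The second uses the equivalent deviation form, $r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$. The two functions should agree. Both function names are hypothetical.

```python
import math


def r_computational(xs, ys):
    """Sample correlation coefficient using the sums-based formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def r_deviation(xs, ys):
    """Same coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    num = sum(u * v for u, v in zip(dx, dy))
    return num / math.sqrt(sum(u * u for u in dx) * sum(v * v for v in dy))


ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]
print(round(r_computational(ages, children), 4),
      round(r_deviation(ages, children), 4))  # prints: -0.7361 -0.7361
```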
Coefficient of Determination, r²
• the proportion of the total variation in the dependent variable that is explained by variation in the independent variable

$$0 \le r^2 \le 1$$
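One way to see why r² is the proportion of variation explained is to compute it as 1 − SSE/SST. Here SSE = Σ(yᵢ − ŷᵢ)² is the variation left unexplained by the fitted line, and SST = Σ(yᵢ − ȳ)² is the total variation in y. For a least-squares line with an intercept, this equals the square of r. The sketch below uses the sociologist's data, and its variable names are illustrative.

```python
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(children) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in children)          # SST: total variation
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, children))

b = sxy / sxx                     # least-squares slope
a = y_bar - b * x_bar             # least-squares intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, children))  # unexplained

r2_from_variation = 1 - sse / syy
r2_from_r = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient
print(round(r2_from_variation, 4), round(r2_from_r, 4))  # prints: 0.5419 0.5419
```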
Examples of Approximate r² Values
• r² = 1: perfect linear relationship between x and y. 100% of the variation in y is explained by variation in x.
• 0 < r² < 1: weaker linear relationship between x and y. Some, but not all, of the variation in y is explained by variation in x.
• r² = 0: no linear relationship between x and y. The value of y does not depend linearly on x (none of the variation in y is explained by variation in x).
[Figure: example scatter plots of y against x for r² = 1, 0 < r² < 1, and r² = 0]
A sociologist wants to know whether the
number of children in the family is
linearly dependent on the age of the
mother at her wedding. He interviewed 9
housewives and the results are shown
below:

(X) Age at wedding 21 15 22 22 21 25 30 18 24


(Y) No. of children 4 8 3 4 2 3 1 5 6
4. Compute the sample correlation coefficient r and interpret.
5. Compute the coefficient of determination and interpret.
4. Compute the sample correlation coefficient r and interpret. The given values are:

$$\sum_{i=1}^{n} x_i = 198, \quad \sum_{i=1}^{n} y_i = 36, \quad \sum_{i=1}^{n} x_i^2 = 4500, \quad \sum_{i=1}^{n} y_i^2 = 180, \quad \sum_{i=1}^{n} x_i y_i = 739, \quad n = 9$$

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} = \frac{9(739) - (198)(36)}{\sqrt{\left[9(4500) - (198)^2\right]\left[9(180) - (36)^2\right]}} = \frac{-477}{648} = -0.7361$$

Interpretation:
The value of 𝑟 = −0.7361 indicates a strong negative linear relationship between
the age of the mother at her wedding and the number of children in the family.
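The arithmetic in step 4 can be checked exactly in Python. For this data set, the product under the square root (1296 × 324 = 419904) happens to be a perfect square, so r is exactly −477/648 = −53/72. The variable names are illustrative.

```python
import math
from fractions import Fraction

n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = 9, 198, 36, 4500, 180, 739

numerator = n * sum_xy - sum_x * sum_y          # 6651 - 7128 = -477
sxx_part = n * sum_x2 - sum_x ** 2              # 40500 - 39204 = 1296
syy_part = n * sum_y2 - sum_y ** 2              # 1620 - 1296 = 324
denominator = math.isqrt(sxx_part * syy_part)   # integer square root of 419904
assert denominator ** 2 == sxx_part * syy_part  # perfect square for this data only

r = Fraction(numerator, denominator)
print(numerator, sxx_part, syy_part, denominator, r, round(float(r), 4))
# prints: -477 1296 324 648 -53/72 -0.7361
```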
5. Compute the coefficient of determination and interpret.

The sample coefficient of determination r² is computed as

$$r^2 = (-0.7361)^2 \times 100\% = 54.18\%$$

which means that about 54% of the total variation in the number of children in the family is explained, or accounted for, by the age of the mother at her wedding.
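Squaring the rounded value r = −0.7361 gives the 54.18% above. Squaring the unrounded value r = −53/72 gives about 54.19%. The interpretation (about 54%) is the same either way. The short sketch below compares the two. The variable names are illustrative.

```python
from fractions import Fraction

r_exact = Fraction(-53, 72)   # -477/648 in lowest terms
r_rounded = -0.7361           # value reported in step 4

r2_exact_pct = f"{float(r_exact ** 2):.2%}"   # from unrounded r
r2_rounded_pct = f"{r_rounded ** 2:.2%}"      # from rounded r
print(r2_exact_pct, r2_rounded_pct)  # prints: 54.19% 54.18%
```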
SEATWORK
Show all the necessary solutions.
1. When buying items, it is sometimes advantageous to buy in large quantities because the unit price is usually lower for larger quantities. To test whether there is a linear relationship between the number of units of a particular item bought and the cost per unit, the following data were obtained.
d. Compute the sample correlation coefficient r and interpret.
e. Compute the coefficient of determination and interpret.

Number of Units (X) 1 3 5 10 12 15


Cost per Unit (Y) 55 52 48 36 32 30
