Lecture - Slides - 3 - Correlation and Regression
• 9.1 Correlation
• 9.2 Linear Regression
• 9.3 Measures of Regression and Prediction Intervals
• 9.4 Multiple Regression
Correlation
Correlation
• A relationship between two variables.
• The data can be represented by ordered pairs (x, y)
x is the independent (or explanatory) variable
y is the dependent (or response) variable
Example:
x    1    2    3    4    5
y   –4   –2   –1    0    2
[Scatter plots; panel titles: No Correlation, Nonlinear Correlation]
Larson/Farber 4th ed. 7
Example: Constructing a Scatter Plot
[Scatter plot; x-axis: Advertising expenses (in thousands of dollars); y-axis: Company sales (in thousands of dollars)]
Appears to be a positive linear correlation. As the advertising expenses increase, the sales tend to increase.
Example: Constructing a Scatter Plot Using Technology
Old Faithful, located in Yellowstone National Park, is the …
Duration, x    Time, y    Duration, x    Time, y
1.8            56         3.78           79
From the scatter plot, it appears that the variables have a
positive linear correlation.
Correlation Coefficient
Correlation coefficient
• A measure of the strength and the direction of a linear relationship
between two variables.
• The symbol r represents the sample correlation coefficient.
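• A formula for r is
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
where n is the number of pairs of data.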
• The range of the correlation coefficient is –1 to 1.
• If r = –1 there is a perfect negative correlation.
• If r is close to 0 there is no linear correlation.
• If r = 1 there is a perfect positive correlation.
[Scatter plots with sample values of r]
r = –0.91: Strong negative correlation
r = 0.88: Strong positive correlation
r = 0.42: Weak positive correlation
r = 0.07: Nonlinear correlation
Calculating a Correlation Coefficient
In Words                                           In Symbols
1. Find the sum of the x-values.                   Σx
2. Find the sum of the y-values.                   Σy
3. Multiply each x-value by its corresponding
   y-value and find the sum.                       Σxy
x      y      xy       x²      y²
2.4    225    540      5.76    50,625
1.6    184    294.4    2.56    33,856
2.0    220    440      4       48,400
2.6    240    624      6.76    57,600
1.4    180    252      1.96    32,400
1.6    184    294.4    2.56    33,856
2.0    186    372      4       34,596
2.2    215    473      4.84    46,225
Σx = 15.8    Σy = 1634    Σxy = 3289.8    Σx² = 32.44    Σy² = 337,558
$r = \frac{8(3289.8) - (15.8)(1634)}{\sqrt{8(32.44) - 15.8^2}\,\sqrt{8(337{,}558) - 1634^2}} = \frac{501.2}{\sqrt{9.88}\,\sqrt{30{,}508}} \approx 0.9129$
r ≈ 0.913 suggests a strong positive linear correlation. As
the amount spent on advertising increases, the company
sales also increase.
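The hand calculation above can be checked with a short script. This is only a sketch: the function name `correlation_coefficient` is illustrative and not part of the slides, and the data lists are copied from the advertising table.

```python
import math

# Advertising expenses (x) and company sales (y), both in thousands of
# dollars, copied from the table in the slides.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r from the five sums used on the slide."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(x, y)
print(round(r, 3))  # 0.913
```

Working from the raw sums mirrors the step-by-step table, so each intermediate value (Σx, Σxy, and so on) can be compared with the slide.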
Example: Using Technology to Find a Correlation Coefficient
Use a technology tool to calculate the correlation coefficient for the …
Duration, x    Time, y    Duration, x    Time, y
1.8            56         3.78           79
[Table: columns indexed by level of significance; rows indexed by number of pairs of data in sample]
• Right-tailed test
• Two-tailed test
Linear Regression
Residuals
Residual
• The difference between the observed y-value and the
predicted y-value for a given x-value on the line.
For a given x-value,
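$d_i$ = (observed y-value) – (predicted y-value)
[Figure: scatter plot with a regression line; the residuals d1 through d6 are marked between each observed y-value and the corresponding predicted y-value on the line]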
Regression Line
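• ŷ = mx + b where
$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$
• $\bar{y}$ is the mean of the y-values in the data
• $\bar{x}$ is the mean of the x-values in the data
• The regression line always passes through the point $(\bar{x}, \bar{y})$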
ŷ = 50.729x + 104.061
[Figure: scatter plot with regression line; x-axis: Advertising expenses (in thousands of dollars)]
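As a rough check on the slope and intercept, the formulas for m and b can be applied to the advertising data from Section 9.1. The helper name `regression_line` is an assumption made for this sketch, not something the slides define.

```python
# Advertising expenses (x) and company sales (y), in thousands of dollars,
# copied from the Section 9.1 table.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


m, b = regression_line(x, y)
print(round(m, 3), round(b, 3))  # 50.729 104.061

# The line passes through the point (x-bar, y-bar).
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
print(abs(m * mean_x + b - mean_y) < 1e-9)  # True
```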
Example: Using Technology to Find a Regression Equation
Use a technology tool to find the equation of the regression line for …
Duration, x    Time, y    Duration, x    Time, y
1.8            56         3.78           79
Example: Predicting y-Values Using
Regression Equations
The regression equation for the advertising expenses (in
thousands of dollars) and company sales (in thousands
of dollars) data is ŷ = 50.729x + 104.061. Use this
equation to predict the expected company sales for the
following advertising expenses. (Recall from section 9.1
that x and y have a significant linear correlation.)
1. 1.5 thousand dollars
2. 1.8 thousand dollars
3. 2.5 thousand dollars
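The three predictions can be sketched by substituting each advertising expense into the regression equation. The function name `predict_sales` is illustrative; the coefficients are the rounded values quoted in the example.

```python
def predict_sales(advertising):
    """Predicted company sales (thousands of dollars) from the rounded
    regression equation y-hat = 50.729x + 104.061."""
    return 50.729 * advertising + 104.061


# Advertising expenses from the example, in thousands of dollars.
expenses = [1.5, 1.8, 2.5]
predictions = [predict_sales(value) for value in expenses]

for value, predicted in zip(expenses, predictions):
    print(f"x = {value}: predicted sales of about {predicted:.2f} thousand dollars")
```

All three x-values lie inside the range of the observed data (1.4 to 2.6), which is where predictions from a regression line are meaningful.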
Total Deviation = $y_i - \bar{y}$
Explained Deviation = $\hat{y}_i - \bar{y}$
Unexplained Deviation = $y_i - \hat{y}_i$
Total variation
• The sum of the squares of the differences between the
y-value of each ordered pair and the mean of y.
Total variation = $\sum (y_i - \bar{y})^2$
Explained variation
• The sum of the squares of the differences between
each predicted y-value and the mean of y.
Explained variation = $\sum (\hat{y}_i - \bar{y})^2$
Unexplained variation
• The sum of the squares of the differences between the
y-value of each ordered pair and each corresponding
predicted y-value.
Unexplained variation = $\sum (y_i - \hat{y}_i)^2$
Coefficient of determination
• The ratio of the explained variation to the total
variation.
• Denoted by r²
$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$
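To illustrate how the three variations fit together, the sketch below computes them for the advertising data from Section 9.1 and forms r² as their ratio. It refits the line from the raw data rather than using the rounded equation, so the sums may differ slightly from hand calculations; the variable names are illustrative.

```python
# Advertising expenses (x) and company sales (y), in thousands of dollars,
# copied from the Section 9.1 table.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least-squares slope and intercept (unrounded).
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

mean_y = sum_y / n
y_hat = [m * a + b for a in x]  # predicted y-value for each x

total_variation = sum((yi - mean_y) ** 2 for yi in y)
explained_variation = sum((yh - mean_y) ** 2 for yh in y_hat)
unexplained_variation = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))  # 0.833
```

For a least-squares line the explained and unexplained variations add up to the total variation, so r² can be read as the share of the variation in y that the line accounts for, here roughly 83%.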
Solution:
Use a table to calculate the sum of the squared
differences of each observed y-value and the
corresponding predicted y-value.
x      y      ŷ_i       (y_i – ŷ_i)²
2.4    225    225.81    (225 – 225.81)² = 0.6561
1.6    184    185.23    (184 – 185.23)² = 1.5129
2.0    220    205.52    (220 – 205.52)² = 209.6704
2.6    240    235.96    (240 – 235.96)² = 16.3216
1.4    180    175.08    (180 – 175.08)² = 24.2064
1.6    184    185.23    (184 – 185.23)² = 1.5129
2.0    186    205.52    (186 – 205.52)² = 381.0304
2.2    215    215.66    (215 – 215.66)² = 0.4356
Σ = 635.3463 (unexplained variation)
Solution: Standard Error of Estimate
$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{635.3463}{8 - 2}} \approx 10.290$
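The table and the standard error can be reproduced with a few lines of code. This sketch assumes, as the table appears to, that each predicted value is rounded to two decimal places before the difference is squared; the variable names are illustrative.

```python
import math

# Advertising expenses (x) and company sales (y), in thousands of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

# Predicted values from the rounded regression equation, rounded to two
# decimals to match the y-hat column of the table (an assumption).
y_hat = [round(50.729 * a + 104.061, 2) for a in x]

# Unexplained variation: sum of squared residuals.
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Standard error of estimate with n - 2 degrees of freedom.
n = len(x)
s_e = math.sqrt(unexplained / (n - 2))

print(round(unexplained, 4))  # 635.3463
print(round(s_e, 2))          # 10.29
```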
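Margin of error E for the prediction interval at $x_0 = 2.1$, using the critical value $t_c = 2.447$:
$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = (2.447)(10.290)\sqrt{1 + \frac{1}{8} + \frac{8(2.1 - 1.975)^2}{8(32.44) - (15.8)^2}} \approx 26.857$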
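The margin-of-error arithmetic can be checked numerically. In this sketch the critical value 2.447 and the standard error 10.290 are taken as given from the slides. The point estimate at x = 2.1 is computed from the regression equation ŷ = 50.729x + 104.061 quoted earlier, and turning it into an interval ŷ ± E is an assumption about how the example continues. The variable names are illustrative.

```python
import math

n = 8            # number of data pairs
t_c = 2.447      # critical value quoted on the slide
s_e = 10.290     # standard error of estimate from the previous step
x0 = 2.1         # advertising expense of interest (thousands of dollars)
sum_x = 15.8     # sum of the x-values
sum_x2 = 32.44   # sum of the squared x-values
mean_x = sum_x / n  # 1.975

# Margin of error for predicting an individual y-value at x0.
E = t_c * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - mean_x) ** 2 / (n * sum_x2 - sum_x ** 2)
)

# Point estimate from the rounded regression equation (assumed continuation).
y_hat = 50.729 * x0 + 104.061
lower, upper = y_hat - E, y_hat + E

print(f"E is about {E:.2f}")
print(f"interval: {lower:.2f} < y < {upper:.2f}")
```

The `1 +` term under the square root is what separates a prediction interval for a single y-value from a confidence interval for the mean response: it adds the scatter of individual points around the line.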
Multiple Regression