STAT2507.Chapter3Part2.W22
The Dependent Variable and Independent Variable
▪ We are usually interested in investigating the relationship between
two quantitative variables because we wish to predict the value of
one variable based on the value of the other variable.
▪ Examples:
▪ A real estate agent wants to predict the selling price of a house
based on the number of bedrooms.
▪ A personal trainer wants to predict the number of calories
somebody burns based on the total time spent exercising.
▪ A student wants to predict their final exam grade for a course
based on their midterm exam grade.
The Dependent Variable and Independent Variable
▪ The variable whose values we want to predict is referred to as the
dependent variable and is denoted by Y.
▪ The variable on which we base our predictions is referred to as the
independent variable and is denoted by X.
▪ As we saw earlier, a scatterplot of the observed (x, y) values will
show us the nature and strength of the relationship between X and Y.
▪ In this chapter, we will focus our attention on variables that have an
approximately linear relationship.
Population Covariance
▪ The population covariance of two variables X and Y, denoted by $\sigma_{XY}$,
is a measure of the linear dependence between X and Y.
▪ Covariance can assume any real value.
▪ If the covariance is positive, then as one variable changes the other
variable tends to move in the same direction.
▪ If the covariance is negative, then as one variable changes the other
variable tends to move in the opposite direction.
Sample Covariance
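▪ In order to estimate the population covariance, we take a random
sample of n pairs of values
(x1, y1), (x2, y2), …, (xn, yn)
and compute the sample covariance $s_{xy}$ as follows:
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n - 1}$$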
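The two forms of the sample covariance agree because the numerators are equal. A short derivation, included here as a supplementary sketch and using only the definitions $\bar{x} = \frac{1}{n}\sum x_i$ and $\bar{y} = \frac{1}{n}\sum y_i$:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y} - n\,\bar{x}\,\bar{y} + n\,\bar{x}\,\bar{y} \\
  &= \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y} \\
  &= \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right).
\end{align*}
```

The second form is usually more convenient for hand calculation, since it needs only the three sums $\sum x_i$, $\sum y_i$ and $\sum x_i y_i$.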
Exercise 1
In an undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have
an approximately linear relationship.
Compute and interpret the sample covariance between BAC and the
number of beers consumed.

Beers  BAC      Beers  BAC
5      0.100    3      0.020
2      0.030    5      0.050
9      0.190    4      0.070
8      0.120    6      0.100
3      0.040    5      0.085
7      0.095    7      0.090
3      0.070    1      0.010
5      0.060    4      0.050
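One possible way to check a hand calculation for this exercise is the short Python sketch below. The data are transcribed from the table above; the function names `sample_covariance` and `sample_covariance_shortcut` are illustrative choices, not part of the course material. It evaluates both the definitional and the shortcut form of the sample covariance.

```python
# Beers/BAC pairs transcribed from the exercise table (16 students).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor (definitional form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_covariance_shortcut(xs, ys):
    """Same quantity via the computational (shortcut) form."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)


s_xy = sample_covariance(beers, bac)
print(f"s_xy (definition) = {s_xy:.5f}")
print(f"s_xy (shortcut)   = {sample_covariance_shortcut(beers, bac):.5f}")
```

Both forms should give a value of roughly 0.087. Because the covariance is positive, BAC tends to increase as the number of beers consumed increases.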
Exercise 1
[Figure: scatterplot of BAC versus Beers. The sample means x̄ = 4.8125 and ȳ = 0.07375 are marked, and the deviations x_i − x̄ and y_i − ȳ are shown for a point (x_i, y_i).]
Population Correlation Coefficient
▪ The population correlation coefficient of two random variables X and
Y, denoted by $\rho$, is computed as
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$
▪ The correlation coefficient is a unitless measure of the direction and
strength of the linear dependence between X and Y.
Population Correlation Coefficient
▪ The population correlation coefficient always equals a value between
–1 and 1, inclusive.
[Figure: number line running from –1 through 0 to +1.]
Sample Correlation Coefficient
▪ In order to estimate the population correlation coefficient, we take a
random sample of n pairs of values
(x1, y1), (x2, y2), …, (xn, yn)
and compute the sample correlation coefficient r as follows:
$$r = \frac{s_{xy}}{s_x s_y}.$$
▪ The sample correlation coefficient always equals a value between –1
and 1, inclusive.
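To illustrate why the correlation coefficient is described as unitless, the sketch below computes the sample covariance and the sample correlation for the same measurements expressed in two different units. The data are hypothetical (exercise time and calories burned, invented for illustration), and the helper names `sample_cov` and `sample_corr` are assumptions of this sketch.

```python
from math import sqrt


def sample_cov(xs, ys):
    """Sample covariance s_xy with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_corr(xs, ys):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    s_x = sqrt(sample_cov(xs, xs))  # s_x^2 is the covariance of x with itself
    s_y = sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)


# Hypothetical data: exercise time in hours and calories burned.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
calories = [210.0, 380.0, 640.0, 790.0, 1050.0, 1190.0]
minutes = [60 * h for h in hours]  # the same times in a different unit

cov_hours = sample_cov(hours, calories)
cov_minutes = sample_cov(minutes, calories)
r_hours = sample_corr(hours, calories)
r_minutes = sample_corr(minutes, calories)

print(f"covariance:  {cov_hours:.2f} (hours) vs {cov_minutes:.2f} (minutes)")
print(f"correlation: {r_hours:.4f} (hours) vs {r_minutes:.4f} (minutes)")
```

Switching from hours to minutes multiplies the covariance by 60 but leaves r unchanged, which is why r is easier to interpret as a measure of strength than the covariance.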
Examples of the Sample Correlation Coefficient
[Figure: two scatterplots of y versus x. Left panel: r = 1, perfect positive linear relationship. Right panel: r = –1, perfect negative linear relationship.]
Examples of the Sample Correlation Coefficient
[Figure: two scatterplots of y versus x. Left panel: r = –0.874, strong negative linear relationship. Right panel: r = 0.408, moderate positive linear relationship.]
Examples of the Sample Correlation Coefficient
[Figure: two scatterplots of y versus x. Left panel: r = 0.165, weak positive linear relationship. Right panel: r = –0.009, no discernible linear relationship.]
Examples of the Sample Correlation Coefficient
[Figure: scatterplot of y versus x. r = 0, perfect quadratic relationship.]
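The last example shows that r = 0 does not mean "no relationship", only no linear relationship. A small sketch of this, assuming points that lie exactly on the parabola y = 50 + (x − 10)² (values chosen here to resemble the axis ranges of the plot, not taken from the original data):

```python
from math import sqrt

# Hypothetical points lying exactly on the parabola y = 50 + (x - 10)^2.
xs = [7, 8, 9, 10, 11, 12, 13]
ys = [50 + (x - 10) ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample covariance and standard deviations, all with the n - 1 divisor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = s_xy / (s_x * s_y)

print("y values:", ys)
print(f"r = {r:.3f}")
```

Here y is completely determined by x, yet the positive and negative products of deviations cancel on either side of x = 10, so the sample correlation is zero.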
Exercise 2
In an undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have
an approximately linear relationship.
Compute and interpret the sample correlation coefficient between BAC
and the number of beers consumed.

Beers  BAC      Beers  BAC
5      0.100    3      0.020
2      0.030    5      0.050
9      0.190    4      0.070
8      0.120    6      0.100
3      0.040    5      0.085
7      0.095    7      0.090
3      0.070    1      0.010
5      0.060    4      0.050
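As a way to check a hand calculation, the following sketch computes r = s_xy / (s_x s_y) for the 16 pairs in the table. The variable names are illustrative, and the data are transcribed from the exercise.

```python
from math import sqrt

# Beers/BAC pairs transcribed from the exercise table (16 students).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n

# Sample covariance and sample standard deviations (n - 1 divisor).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in beers) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in bac) / (n - 1))

r = s_xy / (s_x * s_y)
print(f"s_x = {s_x:.4f}, s_y = {s_y:.5f}, r = {r:.4f}")
```

The result should be close to r = 0.89, which indicates a strong positive linear relationship between the number of beers consumed and BAC.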
The Least-Squares Regression Line
▪ If X and Y appear to have an approximately linear relationship, then
we can approximate the relationship using the equation of a line
Y = a + bX.
▪ The value a is the y-intercept. It is the value of Y when X = 0.
▪ The value b is the slope. It is the amount by which Y will change if we
increase X by 1 unit.
▪ The best-fitting line relating Y to X is called the least-squares
regression line.
The Least-Squares Regression Line
▪ The least-squares regression line is found by minimizing the sum of
the squared vertical differences between the data points and the line.
[Figure: scatterplot of y versus x with the line Y = a + bX drawn through the points.]
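The minimizing property can be checked numerically. The sketch below uses hypothetical data and computes the slope as s_xy / s_x², which is algebraically the same as the formula b = r s_y / s_x, then compares the sum of squared vertical differences (SSE) for the least-squares line against lines with slightly different intercepts and slopes. The function name `sse` and the perturbation sizes are arbitrary choices for this illustration.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical differences between the points and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data with a roughly linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares slope and intercept.
s_xy_num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx_num = sum((x - x_bar) ** 2 for x in xs)
b_hat = s_xy_num / s_xx_num
a_hat = y_bar - b_hat * x_bar

best = sse(a_hat, b_hat, xs, ys)

# Nudge the intercept and slope; every other line should fit worse.
perturbed = [
    sse(a_hat + da, b_hat + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(f"least-squares line: y = {a_hat:.3f} + {b_hat:.3f} x, SSE = {best:.4f}")
print(f"smallest SSE among perturbed lines: {min(perturbed):.4f}")
```

Every perturbed line has a larger SSE than the least-squares line, which is what "best-fitting" means in this context.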
The Least-Squares Regression Line
▪ The least-squares regression line is Y = a + bX, where
$$b = r\,\frac{s_y}{s_x} \quad \text{and} \quad a = \bar{y} - b\bar{x}.$$
▪ Note: Since $s_y$ and $s_x$ are both positive, b and r have the same sign.
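Two consequences of these formulas are worth spelling out. They follow directly from the definition r = s_xy / (s_x s_y) and are added here as a supplementary note:

```latex
\begin{align*}
b &= r\,\frac{s_y}{s_x}
   = \frac{s_{xy}}{s_x s_y}\cdot\frac{s_y}{s_x}
   = \frac{s_{xy}}{s_x^{2}}, \\[4pt]
a + b\,\bar{x} &= (\bar{y} - b\,\bar{x}) + b\,\bar{x} = \bar{y}.
\end{align*}
```

The first line shows that the slope can be computed from the sample covariance and the sample variance of X alone. The second shows that the least-squares regression line always passes through the point $(\bar{x}, \bar{y})$.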
Exercise 3
In an undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have
an approximately linear relationship.
(a) Find the least-squares regression line for predicting BAC based on
the number of beers consumed.
(b) Predict the BAC for a student who consumed 5 beers.

Beers  BAC      Beers  BAC
5      0.100    3      0.020
2      0.030    5      0.050
9      0.190    4      0.070
8      0.120    6      0.100
3      0.040    5      0.085
7      0.095    7      0.090
3      0.070    1      0.010
5      0.060    4      0.050
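A minimal sketch for checking both parts of this exercise, applying the formulas b = r s_y / s_x and a = ȳ − b x̄ to the data in the table. The helper name `predict_bac` is a hypothetical choice for this illustration.

```python
from math import sqrt

# Beers/BAC pairs transcribed from the exercise table (16 students).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n

# Sample covariance, standard deviations and correlation (n - 1 divisor).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in beers) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in bac) / (n - 1))
r = s_xy / (s_x * s_y)

# (a) Least-squares slope and intercept.
b = r * s_y / s_x
a = y_bar - b * x_bar


def predict_bac(num_beers):
    """Predicted BAC from the fitted line a + b * num_beers."""
    return a + b * num_beers


print(f"(a) BAC = {a:.5f} + {b:.5f} * Beers")
print(f"(b) predicted BAC after 5 beers: {predict_bac(5):.4f}")
```

The coefficients should agree with the fitted line BAC = -0.01270 + 0.01796 Beers, giving a predicted BAC of about 0.077 for a student who consumed 5 beers. The prediction is only reasonable for values of X within the observed range of 1 to 9 beers.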
Exercise 3
Fitted Line Plot
BAC = -0.01270 + 0.01796 Beers
[Figure: scatterplot of BAC versus Beers with the fitted least-squares regression line.]