STAT2507.Chapter3Part2.W22

Chapter 3 – Part 2 discusses numerical measures for quantitative bivariate data, focusing on dependent and independent variables, covariance, correlation coefficients, and least-squares regression lines. It explains how to predict one variable based on another, compute sample covariance and correlation coefficients, and derive the least-squares regression line for linear relationships. The chapter includes practical exercises to apply these concepts using real data on blood alcohol content and beer consumption.
Copyright © All Rights Reserved

Chapter 3 – Part 2

Numerical Measures for Quantitative Bivariate Data

© 2022 Wayne Horn (excluding images)


Outline
▪ In Chapter 3 – Part 2, we will discuss the following topics:
▪ The Dependent Variable and Independent Variable
▪ Covariance
▪ The Correlation Coefficient
▪ The Least-Squares Regression Line

The Dependent Variable and Independent Variable
▪ We are usually interested in investigating the relationship between
two quantitative variables because we wish to predict the value of
one variable based on the value of the other variable.
▪ Examples:
▪ A real estate agent wants to predict the selling price of a house
based on the number of bedrooms.
▪ A personal trainer wants to predict the number of calories
somebody burns based on the total time spent exercising.
▪ A student wants to predict their final exam grade for a course
based on their midterm exam grade.

The Dependent Variable and Independent Variable
▪ The variable whose values we want to predict is referred to as the
dependent variable and is denoted by Y.
▪ The variable on which we base our predictions is referred to as the
independent variable and is denoted by X.
▪ As we saw earlier, a scatterplot of the observed (x, y) values will
show us the nature and strength of the relationship between X and Y.
▪ In this chapter, we will focus our attention on variables that have an
approximately linear relationship.

Population Covariance
▪ The population covariance of two variables X and Y, denoted by $\sigma_{XY}$,
is a measure of the linear dependence between X and Y.
▪ Covariance can assume any real value.
▪ If the covariance is positive, then as one variable increases the other
variable tends to increase as well (they move in the same direction).
▪ If the covariance is negative, then as one variable increases the other
variable tends to decrease (they move in opposite directions).

Sample Covariance
▪ In order to estimate the population covariance, we take a random
sample of n pairs of values

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

and compute the sample covariance $s_{xy}$ as follows:

$$
s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
= \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{n - 1}
$$
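The two expressions for $s_{xy}$ on this slide (the definitional form and the computational shortcut) always give the same value. A minimal Python sketch, with hypothetical function names and a small made-up sample, checks that they agree:

```python
def sample_covariance(xs, ys):
    """Definitional form: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_covariance_shortcut(xs, ys):
    """Computational form: (sum x_i*y_i - (sum x_i)(sum y_i)/n), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)


# Hypothetical sample of n = 5 pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sample_covariance(xs, ys))           # 1.5
print(sample_covariance_shortcut(xs, ys))  # 1.5
```

The shortcut form is convenient by hand or on a calculator; with very large values it can lose precision to cancellation, which is one reason software often prefers the deviation form.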
Exercise 1
In the undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have an
approximately linear relationship. Compute and interpret the sample
covariance between BAC and the number of beers consumed.

Beers   BAC      Beers   BAC
5       0.100    3       0.020
2       0.030    5       0.050
9       0.190    4       0.070
8       0.120    6       0.100
3       0.040    5       0.085
7       0.095    7       0.090
3       0.070    1       0.010
5       0.060    4       0.050

Exercise 1

[Figure: scatterplot of BAC versus Beers with reference lines at $\bar{x} = 4.8125$ and $\bar{y} = 0.07375$; one point $(x_i, y_i)$ is marked with its deviations $x_i - \bar{x}$ and $y_i - \bar{y}$.]
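As a check on the hand computation for Exercise 1, the following Python sketch applies the definitional formula to the 16 (Beers, BAC) pairs transcribed from the exercise table. Variable names are illustrative.

```python
# (Beers, BAC) pairs transcribed from the Exercise 1 table.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

n = len(beers)
x_bar = sum(beers) / n  # mean number of beers
y_bar = sum(bac) / n    # mean BAC
# Sample covariance: sum of products of deviations, divided by n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / (n - 1)

print(f"x_bar = {x_bar:.4f}, y_bar = {y_bar:.5f}, s_xy = {s_xy:.5f}")
# x_bar = 4.8125, y_bar = 0.07375, s_xy = 0.08675
```

Since $s_{xy} \approx 0.0868 > 0$, students who drank more beers tended to have higher BAC. The size of the number is hard to judge on its own, because covariance carries the units of both variables (beers × BAC); this is what the unitless correlation coefficient addresses.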
Population Correlation Coefficient
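▪ The population correlation coefficient of two random variables X and
Y, denoted by $\rho$, is computed as
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y},$$
where $\sigma_X$ and $\sigma_Y$ are the population standard deviations of X and Y.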
▪ The correlation coefficient is a unitless measure of the direction and
strength of the linear dependence between X and Y.

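To illustrate that $\rho$ is unitless, the sketch below treats a small hypothetical list of values as an entire population (so it divides by N rather than N − 1) and records X once in hours and once in minutes. Rescaling X multiplies $\sigma_{XY}$ by 60 but leaves $\rho$ unchanged. The function name and the data are assumptions made for illustration.

```python
import math


def population_cov_and_corr(xs, ys):
    """Return (sigma_XY, rho), treating xs and ys as a complete finite population."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Population formulas divide by N, not N - 1.
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
    return cov, cov / (sd_x * sd_y)


# Hypothetical population: X recorded in hours, then the same X in minutes.
hours = [1, 2, 3, 4]
minutes = [60 * h for h in hours]
ys = [3, 5, 4, 8]  # hypothetical Y values

cov_h, rho_h = population_cov_and_corr(hours, ys)
cov_m, rho_m = population_cov_and_corr(minutes, ys)
print(cov_m / cov_h)  # 60 (up to rounding): covariance depends on the units of X
print(rho_h, rho_m)   # equal (up to rounding): correlation does not
```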
Population Correlation Coefficient
▪ The population correlation coefficient always equals a value between
–1 and 1, inclusive.

▪ $\rho = -1$: perfect negative linear relationship
▪ $\rho = 0$: no linear relationship
▪ $\rho = +1$: perfect positive linear relationship

Sample Correlation Coefficient
▪ In order to estimate the population correlation coefficient, we take a
random sample of n pairs of values
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
and compute the sample correlation coefficient r as follows:
$$r = \frac{s_{xy}}{s_x s_y},$$
where $s_x$ and $s_y$ are the sample standard deviations of the x and y values.
▪ The sample correlation coefficient always equals a value between –1
and 1, inclusive.

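A minimal Python sketch of $r = s_{xy}/(s_x s_y)$, using `statistics.stdev` from the standard library for the sample standard deviations (it uses the n − 1 divisor, matching $s_{xy}$). The function name is hypothetical. Points lying exactly on an increasing or a decreasing line give r = 1 and r = −1, the two ends of the allowed range, and random data never leave that range.

```python
import random
import statistics


def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y); all three quantities use the n - 1 divisor."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


xs = [0, 1, 2, 3, 4]
r_pos = sample_correlation(xs, [2 + 3 * x for x in xs])   # exact increasing line
r_neg = sample_correlation(xs, [30 - 4 * x for x in xs])  # exact decreasing line
print(round(r_pos, 10), round(r_neg, 10))  # 1.0 -1.0

# For arbitrary data, r stays within [-1, 1] (allowing for rounding error).
rng = random.Random(0)
for _ in range(1000):
    u = [rng.random() for _ in range(10)]
    v = [rng.random() for _ in range(10)]
    assert -1 - 1e-12 <= sample_correlation(u, v) <= 1 + 1e-12
```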
Examples of the Sample Correlation Coefficient

[Figure: two scatterplots of y versus x. Left: r = 1, perfect positive linear relationship. Right: r = –1, perfect negative linear relationship.]
Examples of the Sample Correlation Coefficient

[Figure: two scatterplots of y versus x. Left: r = –0.874, strong negative linear relationship. Right: r = 0.408, moderate positive linear relationship.]
Examples of the Sample Correlation Coefficient

[Figure: two scatterplots of y versus x. Left: r = 0.165, weak positive linear relationship. Right: r = –0.009, no discernible linear relationship.]
Examples of the Sample Correlation Coefficient

[Figure: scatterplot of y versus x with r = 0, showing a perfect quadratic relationship.]
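The last example can be reproduced with hypothetical data. The sketch below places seven points exactly on the symmetric parabola $y = 59 - (x - 10)^2$ (chosen for illustration, not read from the figure). Because the points rise and then fall symmetrically about $\bar{x}$, the positive and negative products $(x_i - \bar{x})(y_i - \bar{y})$ cancel, so r = 0 even though y is an exact function of x.

```python
import statistics


def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y), using the n - 1 divisor throughout."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical points on a parabola (illustrative only).
xs = list(range(7, 14))                  # 7, 8, ..., 13
ys = [59 - (x - 10) ** 2 for x in xs]
print(ys)                                # [50, 55, 58, 59, 58, 55, 50]
print(sample_correlation(xs, ys))        # 0.0
```

A value of r near 0 therefore rules out only a linear relationship; a scatterplot is still needed to see whether some other pattern is present.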
Exercise 2
In the undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have an
approximately linear relationship. Compute and interpret the sample
correlation coefficient between BAC and the number of beers consumed.

Beers   BAC      Beers   BAC
5       0.100    3       0.020
2       0.030    5       0.050
9       0.190    4       0.070
8       0.120    6       0.100
3       0.040    5       0.085
7       0.095    7       0.090
3       0.070    1       0.010
5       0.060    4       0.050

Exercise 2

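A Python sketch for checking Exercise 2, using the same 16 transcribed (Beers, BAC) pairs and the n − 1 divisor throughout; variable names are illustrative.

```python
import math

# (Beers, BAC) pairs transcribed from the exercise table.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in beers) / (n - 1))  # sample SD of beers
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in bac) / (n - 1))    # sample SD of BAC
r = s_xy / (s_x * s_y)

print(f"s_x = {s_x:.4f}, s_y = {s_y:.5f}, r = {r:.3f}")
# s_x = 2.1975, s_y = 0.04414, r = 0.894
```

An r of about 0.894 is close to 1, indicating a strong positive linear relationship between the number of beers consumed and BAC in this sample.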
The Least-Squares Regression Line
▪ If X and Y appear to have an approximately linear relationship, then
we can approximate the relationship using the equation of a line
$Y = a + bX$.
▪ The value a is the y-intercept. It is the value of Y on the line when X = 0.
▪ The value b is the slope. It is the amount by which Y on the line changes
when X increases by 1 unit.
▪ The best-fitting line relating Y to X is called the least-squares
regression line.

The Least-Squares Regression Line
▪ The least-squares regression line is found by minimizing the sum of
the squared vertical differences between the data points and the line.
[Figure: data points scattered about the line $Y = a + bX$.]
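The minimizing property can be checked numerically. The sketch below (hypothetical helper names, made-up data) computes the sum of squared vertical differences $\sum_{i=1}^{n} (y_i - (a + b x_i))^2$, fits the line using $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, which is algebraically equal to $r\,s_y/s_x$, and confirms that nudging a or b away from the fitted values increases the sum.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical differences between each y_i and the line a + b*x_i."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares / cross-products (no n - 1 divisor; it cancels in b).
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)
print(round(a, 4), round(b, 4), round(best, 4))  # 2.2 0.6 2.4

# Nudging the intercept or slope away from the fitted values raises the sum.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(da, db, round(sse(a + da, b + db, xs, ys), 4))
```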
The Least-Squares Regression Line
▪ The least-squares regression line is $Y = a + bX$, where

$$b = r\,\frac{s_y}{s_x} \quad \text{and} \quad a = \bar{y} - b\,\bar{x}.$$

▪ Note: Since $s_y$ and $s_x$ are both positive, b and r have the same sign.

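The slope and intercept formulas translate directly into code. This Python sketch (hypothetical function name, made-up data) computes $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$ with the standard library's `statistics` module. Because the example data trend downward, r is negative and so is b, as the note says.

```python
import statistics


def regression_line(xs, ys):
    """Return (a, b, r) for Y = a + bX using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample SD, n - 1 divisor
    s_y = statistics.stdev(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / (s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r


# Hypothetical data with a downward trend.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 3]
a, b, r = regression_line(xs, ys)
print(f"Y = {a:.2f} + ({b:.2f})X, r = {r:.3f}")  # Y = 11.80 + (-1.80)X, r = -0.988
```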
Exercise 3
In the undergraduate statistics project at Ohio State University, blood
alcohol content (BAC) and the number of beers consumed appear to have an
approximately linear relationship.
(a) Find the least-squares regression line for predicting BAC based on the
number of beers consumed.
(b) Predict the BAC for a student who consumed 5 beers.

Beers   BAC      Beers   BAC
5       0.100    3       0.020
2       0.030    5       0.050
9       0.190    4       0.070
8       0.120    6       0.100
3       0.040    5       0.085
7       0.095    7       0.090
3       0.070    1       0.010
5       0.060    4       0.050
Exercise 3

Exercise 3
[Figure: Fitted Line Plot of BAC versus Beers, showing the data and the fitted line BAC = -0.01270 + 0.01796 Beers.]
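The following Python sketch reproduces the fitted line for Exercise 3 from the 16 transcribed (Beers, BAC) pairs using $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$, then evaluates it at 5 beers for part (b). Variable names are illustrative.

```python
import math

# (Beers, BAC) pairs transcribed from the exercise table.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in beers) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in bac) / (n - 1))
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / (n - 1)
r = s_xy / (s_x * s_y)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept
predicted_5 = a + b * 5  # part (b): predicted BAC after 5 beers

print(f"(a) BAC = {a:.5f} + {b:.5f} Beers")      # (a) BAC = -0.01270 + 0.01796 Beers
print(f"(b) predicted BAC = {predicted_5:.4f}")  # (b) predicted BAC = 0.0771
```

Each additional beer is associated with an increase of about 0.018 in predicted BAC. The negative intercept should not be read literally: X = 0 lies outside the observed range of 1 to 9 beers.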
