An Introduction to Econometrics
A Self-Contained Approach
Frank Westhoff
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information,
please email [email protected] or write to Special Sales Department, The MIT Press, 55 Hayward Street,
Cambridge, MA 02142.
This book was set in Times and Syntax by Toppan Best-set Premedia Limited. Printed and bound in the United States
of America.
Contents
1 Descriptive Statistics 1
Chapter 1 Prep Questions 1
1.1 Describing a Single Data Variable 2
1.1.1 Introduction to Distributions 2
1.1.2 Measure of the Distribution Center: Mean (Average) 3
1.1.3 Measures of the Distribution Spread: Range, Variance, and Standard Deviation 8
1.1.4 Histogram: Visual Illustration of a Data Variable’s Distribution 11
1.2 Describing the Relationship between Two Data Variables 13
1.2.1 Scatter Diagram: Visual Illustration of How Two Data Variables Are Related 13
1.2.2 Correlation of Two Variables 13
1.2.3 Measures of Correlation: Covariance 13
1.2.4 Independence of Two Variables 19
1.2.5 Measures of Correlation: Correlation Coefficient 22
1.2.6 Correlation and Causation 27
1.3 Arithmetic of Means, Variances, and Covariances 27
Chapter 1 Review Questions 29
Chapter 1 Exercises 30
Appendix 1.1: The Arithmetic of Means, Variances, and Covariances 40
4.1.5 Importance of the Variance (Spread) of the Estimate's Probability Distribution for an Unbiased Estimation Procedure 125
4.2 Hypothesis Testing 126
4.2.1 Motivating Hypothesis Testing: The Evidence and the Cynic 126
4.2.2 Formalizing Hypothesis Testing: Five Steps 130
4.2.3 Significance Levels and Standards of Proof 133
4.2.4 Type I and Type II Errors: The Trade-Offs 135
Chapter 4 Review Questions 138
Chapter 4 Exercises 139
8.4 Summary: The Ordinary Least Squares (OLS) Estimation Procedure 270
8.4.1 Regression Model and the Role of the Error Term 270
8.4.2 Standard Ordinary Least Squares (OLS) Premises 271
8.4.3 Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures 271
8.4.4 Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard Ordinary Least Squares (OLS) Premises 272
Chapter 8 Review Questions 273
Chapter 8 Exercises 273
Appendix 8.1: Student t-Distribution Table—Right-Tail Critical Values 278
Appendix 8.2: Assessing the Reliability of a Coefficient Estimate Using the Student t-Distribution Table 280
16 Heteroskedasticity 513
Chapter 16 Prep Questions 513
16.1 Review 515
16.1.1 Regression Model 515
16.1.2 Standard Ordinary Least Squares (OLS) Premises 516
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure 516
16.2 What Is Heteroskedasticity? 517
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The Consequences 519
16.3.1 The Mathematics 519
16.3.2 Our Suspicions 522
16.3.3 Confirming Our Suspicions 522
18 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables 579
Chapter 18 Prep Questions 580
18.1 Review 583
18.1.1 Regression Model 583
18.1.2 Standard Ordinary Least Squares (OLS) Premises 583
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure 584
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure 584
18.3 A Closer Look at the Explanatory Variable/Error Term Independence Premise 586
18.4 Explanatory Variable/Error Term Correlation and Bias 588
18.4.1 Geometric Motivation 588
18.4.2 Confirming Our Logic 590
18.5 Estimation Procedures: Large and Small Sample Properties 592
18.5.1 Unbiased and Consistent Estimation Procedure 595
18.5.2 Unbiased but Inconsistent Estimation Procedure 596
18.5.3 Biased but Consistent Estimation Procedure 598
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency 601
18.7 Instrumental Variable (IV) Estimation Procedure: A Two Regression Procedure 602
18.7.1 Motivation of the Instrumental Variables Estimation Procedure 602
18.7.2 Mechanics 603
18.7.3 The “Good” Instrument Conditions 603
18.7.4 Justification of the Instrumental Variables Estimation Procedure 604
Index 867
How to Use This Textbook
This textbook utilizes many empirical examples and Java simulations that play a critical role.
The empirical examples show you how we use statistical software to analyze real world data.
The Java simulations confirm the algebraic equations that are derived in the chapters, providing
you with a better appreciation of what the equations mean, and also demonstrate important
econometric concepts without delving into complicated mathematics. The simulations are called
Econometrics Labs. The textbook calls your attention to the empirical examples and labs by
denoting them with an icon.
To gain the most benefit from the textbook, you should read the textbook while seated at a
computer to take advantage of the empirical examples and labs. Connect to the following url:
http://mitpress.mit.edu/westhoffeconometrics
This takes you to the first page of our textbook website. You will now be asked to select a
data format for the empirical examples. All the data used in the textbook are available on the
website stored as EViews workfiles, Stata workfiles, and Excel spreadsheets. If you will be using
the EViews statistical software, click on EViews, or if you will be using Stata, click on Stata;
otherwise, click on Excel. After doing so, bookmark this page. In the future, whenever you see
the icon in the textbook, connect to your bookmarked page to avoid specifying your statistical software repeatedly.
Next click on the chapter you are reading. A list of the empirical examples and labs for the
chapter will now appear. Click on the appropriate one. To gain the most from the textbook, you
should perform the empirical analysis and complete the labs for yourself as well as read the
results that are presented in the textbook. The textbook includes many “Getting Started in
EViews” sections to guide you through the empirical examples if you are using EViews. Also
note that the labs are Java applets; consequently the computer you use must have the Java
Runtime Environment installed to run the labs. Each lab may take a few seconds to load. (If you
have trouble viewing the applets, be certain you are running an up-to-date version of Java.)
Shortly thereafter instructions will appear and the lab will pose questions. You can navigate from
question to question clicking Next Question and, if need be, Previous Question. You should
work your way through each empirical example and lab as you read along in the textbook. By
doing so, you will gain a better appreciation of the concepts that are introduced.
1 Descriptive Statistics
Chapter 1 Prep Questions
1. Look at precipitation data for the twentieth century. How would you decide which month of
the year was the wettest?
2. Consider the monthly growth rates of the Dow Jones Industrial Average and the Nasdaq
Composite Index.
a. In most months, would you expect the Nasdaq’s growth rate to be high or low when the
Dow’s growth rate is high?
b. In most months, would you expect the Nasdaq’s growth rate to be high or low when the
Dow’s growth rate is low?
c. Would you describe the Dow and Nasdaq growth rates as being correlated or
uncorrelated?
1.1 Describing a Single Data Variable
1.1.1 Introduction to Distributions
Descriptive statistics allow us to summarize the information inherent in a data variable. The
weather provides many examples of how useful descriptive statistics can be. Every day we hear
people making claims about the weather. “The summer of 2012 was the hottest on record,” “April
is the wettest month of the year,” “Last winter was the coldest ever,” and so on. To judge the
validity of such statements, we need some information, some data.
We will focus our attention on precipitation in Amherst, Massachusetts, during the twentieth
century. Table 1.1 reports the inches of precipitation in Amherst for each month of the twentieth
century.1
What is the wettest month of the summer in Amherst? How can we address this question? While
it is possible to compare the inches of precipitation in June, July, and August by carefully studying the numerical values recorded in table 1.1, it is difficult, if not impossible, to draw any
conclusions. There is just too much information to digest. In some sense, the table includes too
much detail; it overwhelms us. For example, we can see from the table that July was the wettest
summer month in 1996, August was the wettest summer month in 1997, June was the wettest
summer month in 1998, August was again the wettest summer month in 1999, and finally, June
was again the wettest summer month in 2000. We need a way to summarize the information
contained in table 1.1. Descriptive statistics perform this task. By describing the distribution of
the values, descriptive statistics distill the information contained in many observations into single
numbers. Summarizing data in this way has both benefits and costs. Without a summary, we can
easily “lose sight of the forest for the trees.” In the process of summarizing, however, some
information will inevitably be lost.
First, we will discuss the two most important types of descriptive statistics that describe a
single data variable: measures of the distribution center and measures of the distribution
spread. Next we will introduce histograms. A histogram visually illustrates the distribution of
a single data variable.
1. With the exception of two months, March 1950 and October 1994, the data were obtained from NOAA’s National
Climate Data Center. Data for these two months were missing from the NOAA center and were obtained from the Phillip
T. Ives records that are stored in the Amherst College archives.
1.1.2 Measure of the Distribution Center: Mean (Average)
No doubt the most commonly cited descriptive statistic is the mean or average.2 We use the
mean to denote the center of the distribution all the time in everyday life. For example, we use
the mean or average income earned by individuals in states, per capita income, to denote how
much a typical state resident earns. Massachusetts per capita income in 2000 equaled $25,952.
This means that some Massachusetts residents earned more than $25,952 and some less, but
$25,952 lies at the center of the income distribution of Massachusetts residents. A typical or
representative state resident earned $25,952. A baseball player’s batting average is also a mean:
the number of hits the player gets per official at bat.
Since the mean represents the center of the distribution, the representative value, why not
simply calculate the mean amount of precipitation in June, July, and August to decide on the
wettest summer month? The month with the highest mean would be deemed the wettest. To
calculate the mean (average) precipitation for June in the twentieth century, we sum the amount
of precipitation in each June and divide the total by the number of Junes, 100 in this case:
\text{Mean for June} = \frac{0.75 + 4.54 + \cdots + 7.99}{100} = \frac{377.76}{100} = 3.78
The mean precipitation for June is 3.78 inches. More formally, we can let x represent the data
variable for monthly precipitation in June:
\text{Mean}[x] = \bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T} = \frac{\sum_{t=1}^{T} x_t}{T}
where
T = total number of observations
The mean of a data variable is often denoted by a bar above the symbol, \bar{x}, pronounced "x bar." The expression \frac{\sum_{t=1}^{T} x_t}{T} is a concise way to describe the arithmetic used to compute the mean. Let us now "dissect" the summation expression.
2. The median and mode are other measures of the center. They are presented in chapter 25.
Table 1.1
Monthly precipitation in Amherst, Massachusetts, 1901 to 2000 (inches)
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
1903 3.28 4.27 6.40 2.30 0.48 7.79 4.64 4.92 1.66 2.72 2.04 3.95
1904 4.74 2.45 4.48 5.73 4.55 5.35 2.62 4.09 5.45 1.74 1.35 2.75
1905 3.90 1.70 3.66 2.56 1.28 2.86 2.63 6.47 6.26 2.27 2.06 3.15
1906 2.18 2.73 4.90 3.25 4.95 2.82 3.45 6.42 2.59 5.69 1.98 4.49
1907 2.73 1.92 1.82 1.98 4.02 2.61 3.87 1.44 8.74 5.00 4.50 3.89
1908 2.25 3.53 2.86 1.97 4.35 0.76 3.28 4.27 1.73 1.57 1.06 3.05
1909 3.56 5.16 3.01 5.53 3.36 2.24 2.24 3.80 4.99 1.23 1.06 2.95
1910 6.14 5.08 1.37 3.07 2.67 2.65 1.90 4.03 2.86 0.93 3.69 1.72
1911 2.36 2.18 3.80 1.87 1.37 2.02 4.21 5.92 3.41 8.81 3.84 4.42
1912 2.18 3.16 5.70 3.92 4.34 0.77 2.61 3.22 2.52 2.07 4.03 4.04
1913 3.98 2.94 6.30 3.30 4.94 0.90 1.59 2.26 2.56 5.16 2.11 3.38
1914 3.72 3.36 5.52 6.59 3.56 2.32 3.53 5.11 0.52 2.09 2.62 2.89
1915 6.52 7.02 0.12 3.99 1.20 3.00 9.13 8.28 1.37 2.89 2.20 5.86
1916 2.56 5.27 3.97 3.69 3.21 4.97 6.85 2.49 5.08 1.01 3.29 2.85
1917 3.30 1.98 4.08 1.83 4.13 5.27 3.36 7.06 2.42 6.60 0.63 2.56
1918 4.11 2.99 2.91 2.78 2.47 4.01 1.84 2.22 7.00 1.32 2.87 2.95
1919 2.02 2.80 4.22 2.37 6.20 1.09 4.17 4.80 4.45 1.81 6.20 1.48
1920 2.74 4.45 2.90 4.71 3.65 6.26 2.06 3.62 6.74 1.54 4.62 6.02
1921 2.00 2.38 3.57 6.47 4.56 3.87 6.00 2.35 1.84 1.08 6.20 1.90
1922 1.56 3.02 5.34 2.81 5.47 9.68 4.28 4.25 2.27 2.55 1.56 3.15
1923 6.02 1.81 1.98 3.19 3.26 2.24 1.77 2.55 1.89 5.50 5.05 4.23
1924 3.85 2.56 1.05 4.54 2.21 1.28 1.75 3.11 5.87 0.01 2.57 2.16
1925 3.42 3.64 4.12 3.10 2.55 4.28 6.97 1.93 3.09 4.74 3.23 3.56
1926 3.23 5.01 3.95 3.62 1.19 2.03 3.24 3.97 1.50 5.02 5.38 2.78
1927 2.50 2.62 1.96 1.60 4.83 3.37 3.40 5.01 2.79 4.59 8.65 5.66
1928 2.19 2.90 1.17 4.16 3.25 6.97 6.23 8.40 3.07 0.87 1.79 0.97
1929 4.33 3.92 3.20 6.89 4.17 3.06 0.70 1.54 3.62 2.75 2.73 4.05
1930 2.59 1.39 3.95 1.41 3.34 4.47 4.50 1.82 2.08 2.24 3.42 1.63
1931 3.58 1.80 3.79 2.95 7.44 4.24 3.87 6.57 2.50 3.06 1.55 3.83
1932 3.68 2.70 4.24 2.33 1.67 2.62 3.83 2.67 3.96 3.69 6.05 1.99
1933 2.44 3.48 4.79 5.03 1.69 3.68 2.25 6.63 12.34 3.90 1.19 2.81
1934 3.50 2.82 3.60 4.44 3.42 4.67 1.73 3.02 9.54 2.35 3.50 2.99
1935 4.96 2.50 1.48 2.54 2.17 5.50 3.10 0.82 4.67 0.88 4.41 1.05
1936 6.47 2.64 7.04 4.07 1.76 3.28 1.45 4.85 3.80 4.80 2.02 5.96
1937 5.38 2.22 3.38 4.03 6.09 5.72 2.88 4.91 3.24 4.33 4.86 2.44
1938 6.60 1.77 2.00 3.07 3.81 8.45 7.45 2.04 14.55 2.49 3.02 3.95
1939 2.21 3.62 4.49 4.56 2.15 3.21 2.30 3.89 2.97 4.55 0.98 3.89
1940 2.63 2.72 5.58 6.37 5.67 2.46 4.69 1.56 1.53 1.04 6.31 3.01
1941 2.21 1.59 1.63 0.55 2.87 6.13 4.04 1.79 2.88 2.13 4.29 3.82
1942 3.54 1.66 7.89 0.96 2.98 3.63 4.95 2.93 3.94 3.27 6.07 6.03
1943 2.92 1.63 3.07 3.66 5.62 2.38 6.18 2.49 2.40 3.88 4.64 0.58
1944 1.24 2.34 4.36 3.66 1.35 4.70 3.88 4.33 5.31 1.74 4.21 2.18
1945 3.07 3.33 2.16 5.43 6.45 7.67 7.36 2.79 3.57 2.18 3.54 3.91
1946 2.72 3.52 1.60 2.16 5.41 3.30 5.30 4.00 4.88 1.51 0.70 3.51
1947 3.37 1.96 3.29 4.59 4.63 3.22 2.73 1.69 2.84 2.04 5.63 2.33
1948 2.63 2.45 2.92 2.87 5.83 5.67 2.95 3.56 1.92 1.14 5.22 2.87
1949 4.52 2.47 1.67 2.70 4.76 0.72 3.41 3.64 3.55 2.58 1.79 2.44
1950 4.33 3.99 2.67 3.64 2.77 3.65 2.83 2.93 2.24 1.87 6.60 4.64
1951 3.28 4.61 5.13 3.63 2.96 3.05 4.15 3.56 2.63 4.66 4.64 4.35
1952 4.02 1.97 3.17 3.40 4.00 4.97 4.99 3.98 4.05 1.07 0.89 4.10
1953 6.24 2.97 8.24 5.36 6.81 2.41 1.95 1.87 1.88 5.15 2.36 4.53
1954 2.45 1.94 3.93 4.24 4.80 2.68 3.00 3.91 6.14 1.89 5.07 3.19
1955 0.81 3.73 4.39 4.76 3.00 4.06 1.99 16.10 3.80 7.57 4.46 0.79
1956 1.75 3.52 4.94 4.49 2.02 2.86 2.90 2.71 5.55 1.64 3.10 4.83
1957 1.38 1.10 1.55 2.75 3.89 4.50 1.67 0.94 1.57 2.19 5.54 6.39
1958 4.03 2.21 2.62 4.58 2.98 1.64 5.13 5.19 3.90 3.79 3.79 1.57
1959 3.81 2.32 3.84 3.80 1.04 5.65 5.07 6.70 1.03 7.81 4.33 3.85
1960 2.35 3.90 3.32 4.30 3.44 4.73 6.84 3.74 6.75 2.43 3.13 2.71
1961 2.52 3.16 3.00 4.72 3.20 6.05 2.82 2.86 2.02 2.33 3.79 3.27
1962 3.01 3.59 1.84 2.69 2.03 1.06 2.16 3.33 3.74 4.16 2.11 3.30
1963 2.95 2.62 3.61 2.00 1.97 3.98 1.92 2.54 3.56 0.32 3.92 2.19
1964 5.18 2.32 2.71 2.72 0.83 1.84 3.02 3.01 0.94 1.32 1.68 3.98
1965 1.57 2.33 1.10 2.43 2.69 2.41 3.97 3.43 3.68 2.32 2.36 1.88
1966 1.72 3.43 2.93 1.28 2.26 3.30 5.83 0.67 5.14 4.51 3.48 2.22
1967 1.37 2.89 3.27 4.51 6.30 3.61 5.24 3.76 2.12 1.92 2.90 5.14
1968 1.87 1.02 4.47 2.62 3.02 7.19 0.73 1.12 2.64 3.10 5.78 5.08
1969 1.28 2.31 1.97 3.93 2.73 3.52 6.89 5.20 2.94 1.53 5.34 6.30
1970 0.66 3.55 3.52 3.69 4.16 4.97 2.17 5.23 3.05 2.45 3.27 2.37
1971 1.95 3.29 2.53 1.49 3.77 2.68 2.77 4.91 4.12 3.60 4.42 3.19
1972 1.86 3.47 4.85 4.06 4.72 10.25 2.42 2.25 1.84 2.51 6.92 6.81
1973 4.26 2.58 3.45 6.40 5.45 4.43 3.38 2.17 1.83 2.24 2.30 8.77
1974 3.35 2.42 4.34 2.61 5.21 3.40 3.71 3.97 7.29 1.94 2.76 3.67
1975 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
1976 5.23 3.30 2.15 3.40 4.49 2.20 2.20 6.21 2.74 4.31 0.71 2.69
1977 2.24 2.21 5.88 4.91 3.57 3.83 4.04 5.94 7.77 5.81 4.37 5.22
1978 8.16 0.88 2.65 1.48 2.53 2.83 1.81 4.85 0.97 2.19 2.31 3.93
1979 11.01 2.49 3.00 5.37 4.78 0.77 6.67 5.14 4.54 5.79 3.84 4.00
1980 0.50 0.99 6.42 3.84 1.47 3.94 2.26 1.43 2.33 2.23 3.63 0.91
1981 0.49 7.58 0.24 4.48 2.99 3.81 3.11 1.36 3.53 6.10 1.57 4.41
1982 3.92 3.65 2.26 4.39 2.54 8.07 4.20 2.00 2.81 2.29 3.55 1.85
1983 4.82 4.42 4.95 8.99 5.54 2.42 3.10 2.39 1.82 5.47 7.05 6.40
1984 1.75 6.42 3.68 4.30 11.95 1.69 4.66 1.34 1.02 3.13 3.97 2.84
1985 1.73 1.97 2.65 1.55 4.53 3.59 2.16 4.29 2.88 3.50 6.27 1.78
1986 5.86 2.83 3.69 1.43 2.36 5.02 7.32 1.99 1.07 2.43 5.32 5.52
1987 4.32 0.08 4.58 4.76 1.44 4.16 1.51 3.84 7.65 4.16 3.27 2.31
1988 2.40 3.40 2.13 3.59 2.58 1.28 6.37 4.71 2.45 1.72 5.83 1.52
1989 0.94 2.55 2.00 4.29 8.79 5.74 3.81 5.97 5.99 8.10 3.21 1.06
1990 4.32 3.15 3.13 4.35 6.79 1.49 1.70 8.05 1.42 6.40 3.64 5.07
1991 2.37 1.67 4.73 3.66 5.40 2.03 1.39 9.06 7.10 4.21 5.01 3.20
1992 2.12 1.78 3.25 2.95 2.32 3.34 4.28 7.63 2.47 2.18 4.43 3.76
1993 2.18 2.31 5.44 4.69 0.88 2.53 2.99 3.04 4.59 3.79 4.35 3.86
1994 5.76 1.87 5.60 3.19 6.34 2.70 6.87 4.39 3.72 1.34 3.87 5.06
1995 3.66 3.00 1.68 2.15 2.09 2.10 3.75 2.38 3.04 10.93 4.66 2.20
1996 6.68 4.01 2.19 8.30 3.62 4.50 6.94 0.70 6.01 4.11 3.59 6.09
1997 3.56 2.27 3.19 3.68 3.56 1.30 3.99 4.69 1.30 2.27 4.67 1.38
1998 4.19 2.56 4.53 2.79 3.50 8.60 2.06 1.45 2.31 5.70 1.78 1.24
1999 5.67 1.89 4.82 0.87 3.83 2.78 1.65 5.45 13.19 3.48 2.77 1.84
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
• The uppercase Greek sigma, Σ, is an abbreviation for the word summation.
• The t = 1 and T represent the first and last observations of the summation.
• The x_t represents observation t of the data variable.

Consequently the expression \sum_{t=1}^{T} x_t says "calculate the sum of the x_t's from t equals 1 to t equals T"; that is,

\sum_{t=1}^{T} x_t = x_1 + x_2 + \cdots + x_T
Note that the x in Mean[x] is in a bold font. This is done to emphasize the fact that the mean
describes a specific characteristic, the distribution center, of the entire collection of values, the
entire distribution.
Suppose that we want to calculate the precipitation mean for each summer month. We could
use the information in tables and a pocket calculator to compute the means. This would not only
be laborious but also error prone. Fortunately, econometric software provides us with an easy
and reliable alternative. The Amherst weather data are posted on our website.
Amherst precipitation data: Monthly time series precipitation in Amherst, Massachusetts from
1901 to 2000 (inches)
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Group window: Click Sample. In the Sample window: Enter month = 7 in the "If condition (optional)" text area to restrict the sample to July only. Click OK. Descriptive statistics for the 100 Julys appear in the Group window. Record the mean.
• In the Group window: Click Sample. In the Sample window: Enter month = 8 in the “If
condition (optional)” text area to restrict the sample to August only. Click OK. Descriptive
statistics for the 100 Augusts appear in the Group window. Record the mean.
3. Common sample eliminates all observations in which there is one or more missing values in one of the variables;
the individual samples option does not do so. Since no values are missing for June, July, and August, the choice of
common or individual has no impact.
• When you are finished, click Sample again and clear the restriction; otherwise, the restriction, month = 8, will remain in effect if you ask EViews to perform any more computations.
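If you prefer to check the EViews results with a general-purpose language, the following minimal Python sketch computes the same three means. It assumes the Amherst data have been saved from the website as an Excel file named amherst_precipitation.xlsx with Year, Month, and Precip columns; the file and column names are our assumptions, not part of the textbook's posted workfiles.

import pandas as pd

# Load the Amherst precipitation data (assumed file and column names).
data = pd.read_excel("amherst_precipitation.xlsx")

# Mean precipitation for each summer month: 6 = June, 7 = July, 8 = August.
for month in (6, 7, 8):
    mean_precip = data.loc[data["Month"] == month, "Precip"].mean()
    print(month, round(mean_precip, 2))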
Table 1.2 summarizes the information. August has the highest mean. Based on the mean criterion,
August was the wettest summer month in the twentieth century; the mean for August equals
3.96, which is greater than the mean for June or July.
1.1.3 Measures of the Distribution Spread: Range, Variance, and Standard Deviation
While the center of the distribution is undoubtedly important, the spread can be crucial also. On
the one hand, if the spread is small, all the values of the distribution lie close to the center, the
mean. On the other hand, if the spread is large, some of the values lie far below the mean and some lie far above the mean. Farming provides a good illustration of why the spread can be important. Obviously the mean precipitation during the growing season is important to the farmer. But the spread of the precipitation is important also. Most crops grow best when they get a steady amount of moderate rain over the entire growing season. An unusually dry period followed by an unusually wet period, or vice versa, is not welcome news for the farmer. Both
the center (mean) and the spread are important. The years 1951 and 1998 illustrate this well (see
table 1.3).
In reality, 1951 was a better growing season than 1998 even though the mean for 1998 was
a little higher. Precipitation was less volatile in 1951 than in 1998. Arguably, the most straightforward measure of distribution spread is its range. In 1951, precipitation ranged from a minimum of 2.96 to a maximum of 4.15. In 1998, the range was larger: from 1.45 to 8.60.
While the range is the simplest, it is not the most sensitive. The most widely cited measure
of spread is the variance and its closely related cousin, the standard deviation. The variance
equals the average of the squared deviations of the values from the mean. While this definition
may sound a little overwhelming when first heard, it is not as daunting as it sounds. We can use
the following three steps to calculate the variance:
Table 1.2
Mean monthly precipitation for the summer months in Amherst, Massachusetts, 1901 to 2000
Table 1.3
Growing season precipitation in Amherst, Massachusetts, 1951 and 1998
•For each month, calculate the amount by which that month’s precipitation deviates from the
mean.
• Square each month’s deviation.
•Calculate the average of the squared deviations; that is, sum the squared deviations and divide
by the number of months, 5 in this case.
Note that the mean and the variance are expressed in different units; the mean is expressed in
inches and the variance in inches squared. Often it is useful to compare the mean and the measure
of spread directly, in terms of the same units. The standard deviation allows us to do just that.
The standard deviation is the square root of the variance; hence the standard deviation is
expressed in inches, just like the mean:
We can use the same procedure to calculate the variance and standard deviation for 1951:
When the spread is small, as it was in 1951, all observations will be close to the mean. Hence
the deviations will be small. The squared deviations, the variance, and the standard deviation
will also be small. However, if the spread is large, as it was in 1998, some observations must
be far from the mean. Hence some deviations will be large. Some squared deviations, the vari-
ance, and the standard deviation will also be large. Let us summarize:
We can concisely summarize the steps for calculating the variance with the following
equations:
\text{Var}[x] = \frac{(x_1 - \text{Mean}[x])^2 + (x_2 - \text{Mean}[x])^2 + \cdots + (x_T - \text{Mean}[x])^2}{T} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T}

or, using summation notation,

\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \text{Mean}[x])^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}
The standard deviation is the square root of the variance:
\text{SD}[x] = \sqrt{\text{Var}[x]}
Again, let us now "dissect" the summation expressions \sum_{t=1}^{T}(x_t - \text{Mean}[x])^2 and \sum_{t=1}^{T}(x_t - \bar{x})^2:
• The uppercase Greek sigma, Σ, is an abbreviation for the word summation.
• The t = 1 and T represent the first and last observations of the summation.
• The xt represents observation t of the data variable.
\sum_{t=1}^{T}(x_t - \text{Mean}[x])^2 and \sum_{t=1}^{T}(x_t - \bar{x})^2 equal the sum of the squared deviations from the mean.
Note that the x in Var[x] and in SD[x] is in a bold font. This emphasizes the fact that the
variance and standard deviation describe one specific characteristic, the distribution spread, of
the entire distribution.
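The three steps translate directly into a few lines of code. The following Python sketch applies them to five hypothetical growing-season values; the numbers are made up for illustration and are not the table 1.3 figures.

import math

precip = [2.8, 3.5, 8.6, 2.1, 1.5]            # hypothetical growing-season values (inches)
T = len(precip)

mean = sum(precip) / T                         # center of the distribution
deviations = [x - mean for x in precip]        # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]         # step 2: squared deviations
variance = sum(squared) / T                    # step 3: average of the squared deviations
std_dev = math.sqrt(variance)                  # standard deviation, back in inches

print(round(mean, 2), round(variance, 2), round(std_dev, 2))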
1.1.4 Histogram: Visual Illustration of a Data Variable's Distribution
A histogram is a bar graph that visually illustrates how the values of a single data variable are distributed. Figure 1.1 is a histogram for September precipitation in Amherst. Each bar of the histogram reports the number of months in which precipitation fell within the specified range.
Figure 1.1
Histogram—September precipitation in Amherst, Massachusetts, 1901 to 2000 (horizontal axis: inches, in one-inch bins from 0–1 through 14–15; vertical axis: number of months)
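A histogram such as figure 1.1 can also be produced with standard plotting software. The sketch below assumes the September precipitation values have been placed in a Python list; only the first five values from table 1.1 are shown, and the one-inch bins mirror the figure.

import matplotlib.pyplot as plt

# First five September values from table 1.1 (1901-1905); extend with the rest.
september = [3.67, 5.83, 1.66, 5.45, 6.26]

# One-inch bins from 0-1 through 14-15, as in figure 1.1.
plt.hist(september, bins=range(0, 16), edgecolor="black")
plt.xlabel("Inches")
plt.ylabel("Number of months")
plt.title("September precipitation in Amherst, 1901-2000")
plt.show()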
We can use histograms to illustrate the differences between two distributions. For example,
compare the histogram for September (figure 1.1) and the histogram for February precipitation
(figure 1.2):
The obvious difference in the two histograms is that the September histogram has a longer
“right-hand tail.” The center of September’s distribution lies to the right of February’s; consequently we would expect September’s mean to exceed February’s. Also the distribution of
precipitation in September is more “spread out” than the distribution in February; hence we
would expect September’s variance to be larger. Table 1.4 confirms quantitatively what we
observe visually. September has a higher mean: 3.89 for September versus 2.88 for February.
Also the variance for September is greater.
Figure 1.2
Histogram—February precipitation in Amherst, Massachusetts, 1901 to 2000 (horizontal axis: inches, in one-inch bins from 0–1 through 14–15; vertical axis: number of months)
Table 1.4
Means and variances of precipitation for February and September, 1901 to 2000
Mean Variance
1.2 Describing the Relationship between Two Data Variables
1.2.1 Scatter Diagram: Visual Illustration of How Two Data Variables Are Related
We will use the Dow Jones and Nasdaq data appearing in tables 1.5a and 1.5b to introduce
another type of useful graph, the scatter diagram, which visually illustrates the relationship
between two variables.
We will focus on the relationship between the Dow Jones and Nasdaq growth rates. Figure
1.3 depicts their scatter diagram by placing the Dow Jones growth rate on the horizontal axis
and the Nasdaq growth rate on the vertical axis.
On the scatter diagram in figure 1.3, each point illustrates the Dow Jones growth rate and the
Nasdaq growth rate for one specific month. For example, the top left point labeled Feb 2000
represents February 2000 when the Dow fell by 7.42 percent and the Nasdaq grew by 19.19
percent. Similarly the point in the first quadrant labeled Jan 1987 represents January 1987 when
the Dow rose by 13.82 percent and the Nasdaq rose by 12.41 percent.
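A scatter diagram like figure 1.3 can be produced the same way. The sketch below shows only the first few monthly observations from tables 1.5a and 1.5b; with all 192 months included it would reproduce the figure.

import matplotlib.pyplot as plt

dow = [6.21, -0.22, -1.34, -0.69]       # first four Dow Jones growth rates from table 1.5a; extend with the rest
nasdaq = [12.79, 1.97, -1.76, 0.50]     # corresponding Nasdaq growth rates from table 1.5b

plt.scatter(dow, nasdaq)
plt.axhline(0)                           # reference lines through the origin
plt.axvline(0)
plt.xlabel("Dow Jones growth rate (percent)")
plt.ylabel("Nasdaq growth rate (percent)")
plt.show()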
1.2.2 Correlation of Two Variables
The Dow Jones and Nasdaq growth rates appear to be correlated. Two variables are correlated when information about one variable helps us predict the other. Typically, when the Dow Jones growth rate is positive, the Nasdaq growth rate is also positive; similarly, when the Dow Jones growth rate is negative, the Nasdaq growth rate is usually negative. Although there are exceptions (February 2000, for example), knowing one growth rate typically helps us predict the other.
For example, if we knew that the Dow Jones growth rate was positive in one specific month,
we would predict that the Nasdaq growth rate would be positive also. While we would not always
be correct, we would be right most of the time.
1.2.3 Measures of Correlation: Covariance
Covariance quantifies the notion of correlation. We can use the following three steps to calculate the covariance of two data variables, x and y:
1. For each observation, calculate the amount by which variable x deviates from its mean and the amount by which variable y deviates from its mean.
2. For each observation, multiply x's deviation by y's deviation.
3. Calculate the average of the deviation products; that is, sum the products of the deviations and divide by the number of observations, T.
14
Table 1.5a
Monthly percentage growth rate of Dow Jones Industrial Index, 1985 to 2000
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 6.21 − 0.22 −1.34 − 0.69 4.55 1.53 0.90 −1.00 − 0.40 3.44 7.12 5.07
1986 1.57 8.79 6.41 −1.90 5.20 0.85 − 6.20 6.93 − 6.89 6.23 1.94 − 0.95
1987 13.82 3.06 3.63 − 0.79 0.23 5.54 6.35 3.53 −2.50 −23.22 −8.02 5.74
1988 1.00 5.79 − 4.03 2.22 − 0.06 5.45 − 0.61 − 4.56 4.00 1.69 −1.59 2.56
1989 8.01 −3.58 1.56 5.46 2.54 −1.62 9.04 2.88 −1.63 −1.77 2.31 1.73
1990 −5.91 1.42 3.04 −1.86 8.28 0.14 0.85 −10.01 − 6.19 −0.42 4.81 2.89
1991 3.90 5.33 1.10 − 0.89 4.83 −3.99 4.06 0.62 − 0.88 1.73 −5.68 9.47
1992 1.72 1.37 − 0.99 3.82 1.13 −2.31 2.27 − 4.02 0.44 −1.39 2.45 − 0.12
1993 0.27 1.84 1.91 − 0.22 2.91 − 0.32 0.67 3.16 −2.63 3.53 0.09 1.90
1994 5.97 −3.68 −5.11 1.26 2.08 −3.55 3.85 3.96 −1.79 1.69 − 4.32 2.55
1995 0.25 4.35 3.65 3.93 3.33 2.04 3.34 −2.08 3.87 −0.70 6.71 0.84
1996 5.44 1.67 1.85 − 0.32 1.33 0.20 −2.22 1.58 4.74 2.50 8.16 −1.13
1997 5.66 0.95 − 4.28 6.46 4.59 4.66 7.17 −7.30 4.24 − 6.33 5.12 1.09
1998 −0.02 8.08 2.97 3.00 −1.80 0.58 − 0.77 −15.13 4.03 9.56 6.10 0.71
1999 1.93 − 0.56 5.15 10.25 −2.13 3.89 −2.88 1.63 − 4.55 3.80 1.38 5.69
2000 −4.84 −7.42 7.84 − 1.72 −1.97 − 0.71 0.71 6.59 −5.03 3.01 −5.07 3.59
Table 1.5b
Monthly percentage growth rate of Nasdaq Index, 1985 to 2000
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 12.79 1.97 −1.76 0.50 3.64 1.86 1.72 −1.19 −5.84 4.35 7.35 3.47
1986 3.35 7.06 4.23 2.27 4.44 1.32 −8.41 3.10 −8.41 2.88 − 0.33 −3.00
1987 12.41 8.39 1.20 −2.86 − 0.31 1.97 2.40 4.62 −2.35 −27.23 −5.60 8.29
1988 4.30 6.47 2.07 1.23 −2.35 6.59 −1.87 −2.76 2.95 −1.34 −2.88 2.66
1989 5.22 − 0.40 1.75 5.14 4.35 −2.44 4.25 3.42 0.77 −3.66 0.11 − 0.29
1990 −8.58 2.41 2.28 −3.54 9.26 0.72 −5.21 −13.01 −9.63 − 4.27 8.88 4.09
1991 10.81 9.39 6.44 0.50 4.41 −5.97 5.49 4.71 0.23 3.06 −3.51 11.92
1992 5.78 2.14 − 4.69 − 4.16 1.15 −3.71 3.06 −3.05 3.58 3.75 7.86 3.71
1993 2.86 −3.67 2.89 − 4.16 5.91 0.49 0.11 5.41 2.68 2.16 −3.19 2.97
1994 3.05 −1.00 − 6.19 −1.29 0.18 −3.98 2.29 6.02 − 0.17 1.73 −3.49 0.22
1995 0.43 5.10 2.96 3.28 2.44 7.97 7.26 1.89 2.30 − 0.72 2.23 − 0.67
1996 0.73 3.80 0.12 8.09 4.44 − 4.70 −8.81 5.64 7.48 − 0.44 5.82 − 0.12
1997 6.88 −5.13 − 6.67 3.20 11.07 2.98 10.52 − 0.41 6.20 −5.46 0.44 −1.89
1998 3.12 9.33 3.68 1.78 − 4.79 6.51 −1.18 −19.93 12.98 4.58 10.06 12.47
1999 14.28 −8.69 7.58 3.31 −2.84 8.73 −1.77 3.82 0.25 8.02 12.46 21.98
2000 −3.17 19.19 −2.64 −15.57 −11.91 16.62 −5.02 11.66 −12.68 −8.25 −22.90 −4.90
Figure 1.3
Scatter diagram—Dow Jones growth rate (horizontal axis, percent) versus Nasdaq growth rate (vertical axis, percent); the points for Feb 2000 and Jan 1987 are labeled
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}

where \bar{x} and \bar{y} denote the means of x and y and T denotes the total number of observations.
Let us calculate the covariance for the Dow and Nasdaq monthly growth rates. The average
monthly increase for the Dow Jones Industrial average was 1.25 percent and the average increase
for the Nasdaq Composite was 1.43 percent. Their covariance equals 19.61:
Figure 1.4
Scatter diagram—Dow Jones growth rate less its mean (horizontal axis) versus Nasdaq growth rate less its mean (vertical axis); the Feb 2000 and Jan 1987 deviations are labeled
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{(6.21 - 1.25)(12.79 - 1.43) + \cdots + (3.59 - 1.25)(-4.90 - 1.43)}{192} = 19.61
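The covariance calculation is easy to code directly from its definition. The following Python sketch is illustrative only; with the full 192-month series it should return a value close to the 19.61 computed above.

def covariance(x, y):
    """Average of the products of the deviations from the means (divide-by-T)."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    return sum((x[t] - x_bar) * (y[t] - y_bar) for t in range(T)) / T

# First four months of 1985 from tables 1.5a and 1.5b, just to show the call;
# replace the lists with the full 192-month series to reproduce the 19.61 above.
print(covariance([6.21, -0.22, -1.34, -0.69], [12.79, 1.97, -1.76, 0.50]))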
A nonzero covariance suggests that the variables are correlated. To understand why, consider a
scatter diagram of the deviations. As seen in figure 1.4, we place the deviation of the Dow Jones
growth rate from its mean on the horizontal axis and the deviation of the Nasdaq growth rate
from its mean on the vertical axis. This scatter diagram allows us to motivate the relationship
between the covariance and correlation.4
The covariance equation and the scatter diagram are related. The numerator of the covariance
equation equals the sum of the products of each month's deviations, (x_t - \bar{x})(y_t - \bar{y}):
4. The discussion that follows is not mathematically rigorous because it ignores the magnitude of the deviation products.
Nevertheless, it provides valuable insights. Chapter 25 provides a more rigorous discussion of covariance.
Figure 1.5
Scatter diagram—Deviations and covariance terms. Quadrant I: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) > 0, so (x_t - \bar{x})(y_t - \bar{y}) > 0. Quadrant II: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) > 0, so (x_t - \bar{x})(y_t - \bar{y}) < 0. Quadrant III: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) < 0, so (x_t - \bar{x})(y_t - \bar{y}) > 0. Quadrant IV: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) < 0, so (x_t - \bar{x})(y_t - \bar{y}) < 0.
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
What can we say about the sign of each observation's deviations and their product, (x_t - \bar{x})(y_t - \bar{y}), in each quadrant of the scatter diagram (figure 1.5)?
•First quadrant. Dow growth rate is greater than its mean and Nasdaq growth is greater than its
mean. Both deviations are positive; hence the product of the deviations is positive in the first
quadrant:
Figure 1.6
Scatter diagram—Amherst precipitation less its mean (horizontal axis) versus Nasdaq growth rate less its mean (vertical axis)
Compare figures 1.4 and 1.5. In the Dow Jones and Nasdaq deviation scatter diagram (figure
1.4), the points representing most months lie in the first and third quadrants. Consequently the
product of the deviations, (x_t - \bar{x})(y_t - \bar{y}), is positive in most months. This explains why the
covariance is positive.5 A positive covariance means that the variables are positively correlated.
When one variable is above average, the other is typically above average as well. Similarly, when
one variable is below average, the other is typically below average.
1.2.4 Independence of Two Variables
Two variables are independent or uncorrelated when information about one variable does not
help us predict the other. The covariance of two independent (uncorrelated) data variables is
approximately zero. To illustrate two independent variables, consider the precipitation in Amherst
and the Nasdaq growth rate. The scatter diagram in figure 1.6 plots the deviation of Amherst
precipitation from its mean versus the deviation of the Nasdaq growth rate from its mean:
Recall what we know about the sign of the deviation in each quadrant:
5. As mentioned above, we are ignoring how the magnitude of the products affects the sum.
• First quadrant: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) > 0 → (x_t - \bar{x})(y_t - \bar{y}) > 0
• Second quadrant: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) > 0 → (x_t - \bar{x})(y_t - \bar{y}) < 0
• Third quadrant: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) < 0 → (x_t - \bar{x})(y_t - \bar{y}) > 0
• Fourth quadrant: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) < 0 → (x_t - \bar{x})(y_t - \bar{y}) < 0
Since the points are distributed more or less evenly across all four quadrants, the products of
the deviations, (x_t - \bar{x})(y_t - \bar{y}), are positive in about half the months and negative in the other
half.6 Consequently the covariance will be approximately equal to 0. In general, if variables are
independent, the covariance will be about 0. In reality, the covariance of precipitation and the
Nasdaq growth rate is −0.91, approximately 0:
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = -0.91 \approx 0
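To see the same point numerically, we can generate two series that are independent by construction and compute their covariance. The sketch below uses arbitrary simulated data (the seed and sample size are arbitrary choices), so the exact value will vary, but it will typically be small.

import random

random.seed(1)                                   # arbitrary seed so the example is reproducible
x = [random.gauss(0, 1) for _ in range(192)]     # two series generated independently of each other
y = [random.gauss(0, 1) for _ in range(192)]

T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T
cov = sum((x[t] - x_bar) * (y[t] - y_bar) for t in range(T)) / T
print(round(cov, 3))                             # typically close to 0, just as precipitation and the Nasdaq are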
We can use EViews to calculate the covariance. The stock market data are posted on our
website.
Stock market data: Monthly time series growth rates of the Dow Jones Industrial and Nasdaq
stock indexes from January 1985 to December 2000
DJGrowth_t: Monthly growth rate of the Dow Jones Industrial Average based on the monthly close for observation t (percent)
NasdaqGrowth_t: Monthly growth rate of the Nasdaq Composite based on the monthly close for observation t (percent)
Precip_t: Monthly precipitation in Amherst, MA, for observation t (inches)
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
Next instruct EViews to calculate the covariance of Amherst precipitation and the Nasdaq
growth rate:
6. Again, note that this explanation ignores the magnitude of the products.
• In the Workfile window: Highlight precip by clicking on it; then, while depressing <Ctrl>, click on nasdaqgrowth to highlight it also.
Both the variances and the covariances are reported in table 1.6. The variances are reported in
the diagonal cells: the variance for Amherst precipitation is 4.17 and the variance for the Nasdaq
growth rate is 43.10. Their covariance appears in the off diagonal cells: the covariance is −0.91.
Note that the two off-diagonal cells report the same number. This results from a basic arithmetic fact: when we multiply two numbers together, the order of the multiplication does not matter, so (x_t - \bar{x})(y_t - \bar{y}) = (y_t - \bar{y})(x_t - \bar{x}) and hence Cov[x, y] = Cov[y, x].
Table 1.6
Amherst precipitation and Nasdaq growth rate covariance matrix

Covariance matrix    Precip    NasdaqGrowth
Precip               4.17      −0.91
NasdaqGrowth         −0.91     43.10
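Other statistical packages report the same kind of covariance matrix. For instance, assuming the two series are columns of a pandas DataFrame (the column names below are ours), the following sketch prints a matrix with variances on the diagonal and covariances off the diagonal. Note that pandas divides by T − 1 rather than T, so its numbers differ slightly from this chapter's divide-by-T convention.

import pandas as pd

# Assumed DataFrame with one column per series; only the first three months of
# 1985 are shown here, so replace these lists with the full 192 observations.
df = pd.DataFrame({
    "Precip":       [1.73, 1.97, 2.65],
    "NasdaqGrowth": [12.79, 1.97, -1.76],
})

# Variances on the diagonal, covariances off the diagonal (divide-by-(T - 1)).
print(df.cov())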
1.2.5 Measures of Correlation: Correlation Coefficient
There is no natural range for the covariance; its magnitude depends on the units used. To appre-
ciate why, suppose that we measured Amherst precipitation in centimeters rather than inches.
Consequently all precipitation figures appearing in table 1.1 would be multiplied by 2.54 to
convert from inches to centimeters. Now consider the covariance equation:
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T}
The covariance for Amherst precipitation would rise by a factor of 2.54. To understand why,
let the variable x represent Amherst precipitation and y the Nasdaq growth rate:
Each x_t rises by a factor of 2.54; hence \bar{x} rises by a factor of 2.54, and each deviation (x_t - \bar{x}) rises by a factor of 2.54. In the covariance equation,

\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T}

each term in the numerator therefore rises by a factor of 2.54, so Cov[x, y] rises by a factor of 2.54.
Our choice of which units to use in measuring rainfall, inches or centimeters, is entirely arbitrary.
The arbitrary choice affects the magnitude of the covariance. How, then, can we judge the covari-
ance to be large or small when its size is affected by an arbitrary decision?
Unit Insensitivity
To address this issue, we introduce the correlation coefficient that is not affected by the choice
of units:
\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}}
To appreciate why this resolves the problem, again let x represent Amherst precipitation. We
know that measuring rainfall in centimeters rather than inches causes the covariance to increase
by a factor of 2.54. But how does the use of centimeters affect the variance of precipitation and
its square root?
Again, each x_t rises by a factor of 2.54; hence \bar{x} rises by a factor of 2.54, and each deviation (x_t - \bar{x}) rises by a factor of 2.54. In the variance equation,

\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}

each squared deviation rises by a factor of 2.54^2; hence Var[x] rises by a factor of 2.54^2, and \sqrt{\text{Var}[x]} rises by a factor of 2.54.
We will now consider the correlation coefficient equation. When we use centimeters rather than inches, both Cov[x, y] and \sqrt{\text{Var}[x]} increase by a factor of 2.54; consequently both the numerator and the denominator of the correlation coefficient equation,

\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}},

increase by a factor of 2.54. The factors of 2.54 cancel, and the correlation coefficient is unaffected by our arbitrary choice of units.
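This cancellation is easy to verify numerically. The sketch below uses the first five months of 1985 from tables 1.1 and 1.5b and compares the correlation coefficient computed from inches with the one computed after rescaling by 2.54.

import numpy as np

x = np.array([1.73, 1.97, 2.65, 1.55, 4.53])    # Amherst precipitation in inches, Jan-May 1985
y = np.array([12.79, 1.97, -1.76, 0.50, 3.64])  # Nasdaq growth rates for the same months

def corr_coef(a, b):
    cov = ((a - a.mean()) * (b - b.mean())).mean()                      # divide-by-T covariance
    return cov / (np.sqrt(((a - a.mean()) ** 2).mean()) *
                  np.sqrt(((b - b.mean()) ** 2).mean()))

print(corr_coef(x, y))              # measured in inches
print(corr_coef(2.54 * x, y))       # measured in centimeters: identical correlation coefficient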
Natural Range
The correlation coefficient also has another important property; it must lie between −1.00 and
+1.00. Therefore it provides us with a sense of how strongly two variables are correlated. A
correlation coefficient of +1.00 represents perfect positive correlation and −1.00 represents
perfect negative correlation (figure 1.7).
To understand why, consider the two polar cases of perfect positive and perfect negative
correlation.
Figure 1.7
Range of correlation coefficients: from −1 to +1, centered at 0
\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}, \qquad \text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T}

\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
Perfect Positive Correlation Consider an example of perfect positive correlation. Suppose that
two variables are identical; that is, suppose that
yt = xt for each t = 1, 2, . . . , T
In this case the variables exhibit perfect positive correlation. If we know the value of x, we can
perfectly predict the value of y, and vice versa. Let us compute their correlation coefficient. To
do so, first note that x and y have identical means
\bar{y} = \bar{x}
and that each observation's deviation from the means is the same for x and y:
y_t - \bar{y} = x_t - \bar{x} for each t = 1, 2, . . . , T
Consider the equations above for the variances and covariance; both the variance of y and the
covariance equal the variance of x:
\text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(The first equality is the definition of Var[y]; the second uses y_t - \bar{y} = x_t - \bar{x}; the third is the definition of Var[x].)
and
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(Again, the middle step uses y_t - \bar{y} = x_t - \bar{x}.)
Now apply the correlation coefficient equation. The correlation coefficient equals 1.00:
\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}} = \frac{\text{Var}[x]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[x]}} = \frac{\text{Var}[x]}{\text{Var}[x]} = 1.00

(The second equality uses Cov[x, y] = Var[x] and Var[y] = Var[x].)
Perfect Negative Correlation Next consider an example of perfect negative correlation; suppose that
y_t = -x_t for each t = 1, 2, . . . , T
In this case the variables exhibit perfect negative correlation. Clearly, y’s mean is the negative
of x’s:
\bar{y} = -\bar{x}
and y’s deviation from its mean equals the negative of x’s deviation from its mean for each
observation
y_t - \bar{y} = -(x_t - \bar{x}) for each t = 1, 2, . . . , T
The variance of y equals the variance of x and the covariance equals the negative of the variance
of x:
\text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(The middle step uses y_t - \bar{y} = -(x_t - \bar{x}), whose square equals (x_t - \bar{x})^2.)
and
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = \frac{-\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = -\text{Var}[x]

(The middle step uses y_t - \bar{y} = -(x_t - \bar{x}).)
Applying the correlation coefficient equation, the correlation coefficient equals −1.00:

\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}} = \frac{-\text{Var}[x]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[x]}} = \frac{-\text{Var}[x]}{\text{Var}[x]} = -1.00

(The second equality uses Cov[x, y] = −Var[x] and Var[y] = Var[x].)
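Both polar cases can be checked numerically as well. The sketch below applies the correlation coefficient formula to y = x and to y = −x for an arbitrary series (a few June precipitation values from table 1.1).

import numpy as np

def corr_coef(a, b):
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return cov / (np.sqrt(a.var()) * np.sqrt(b.var()))   # numpy's var() divides by T, matching the chapter

x = np.array([0.75, 4.54, 7.79, 5.35, 2.86])   # any nonconstant series works; these are June 1901-1905
print(corr_coef(x, x))      # perfect positive correlation: 1.0
print(corr_coef(x, -x))     # perfect negative correlation: -1.0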
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Workfile window: Highlight precip by clicking on it; then, while depressing <Ctrl>,
click on nasdaqgrowth and djgrowth to highlight them also.
• In the Workfile window: Double click on any of the highlighted variables.
•A new list now pops up: Click Open Group. A spreadsheet including the variables Precip,
NasdaqGrowth, and DJGrowth appears.
• In the Group window: Click View and then click Covariance Analysis. . . .
• In the Covariance Analysis window: Clear the Covariance box and select the Correlation box;
then click OK.
All diagonal elements must equal 1.00. This reflects the fact that when two variables are identical, perfect positive correlation results. Each off-diagonal cell reports the correlation coefficient for the two different variables, as shown in table 1.7.
Table 1.7
Amherst precipitation, Nasdaq growth rate, and Dow Jones growth rate correlation matrix
Correlation matrix
Note that all the correlation coefficients fall within the −1.00 to +1.00 range. Each correlation
coefficient provides us with a sense of how correlated two variables are:
The correlation coefficient for the Dow and Nasdaq growth rate is positive. On the one hand,
this illustrates that they are positively correlated. On the other hand, the correlation coefficient
for Nasdaq growth rate and Amherst precipitation is approximately 0, indicating that the Nasdaq
growth rate and Amherst precipitation are independent (figure 1.8).
1.2.6 Correlation and Causation
The fact that two variables are highly correlated does not necessarily indicate that one variable
is causing the other to rise and fall. For example, the Dow Jones and Nasdaq growth rates are
indeed positively correlated. This does not imply that a rise in the Dow Jones causes the Nasdaq
to rise or that a rise in the Nasdaq causes the Dow Jones to rise, however. It simply means that
when one rises, the other tends to rise, and when one falls, the other tends to fall. One reason
that these two variables tend to move together is that both are influenced by similar factors. For
example, both are influenced by the general health of the economy. On the one hand, when
the economy prospers, both Dow Jones stocks and Nasdaq stocks tend to rise; therefore both
indexes tend to rise. On the other hand, when the economy falters, both indexes tend to fall.
While the indexes are correlated, other factors are responsible for the causation.
1.3 Arithmetic of Means, Variances, and Covariances
Elementary algebra allows us to derive the following relationships for means, variances, and
covariances:7
7. See appendix 1.1 at the end of this chapter for the algebraic proofs.
Figure 1.8
Scatter diagrams—Comparison of correlated and independent variables. Top panel: Dow Jones growth rate deviations versus Nasdaq growth rate deviations (the Feb 2000 and Jan 1987 points are labeled). Bottom panel: Amherst precipitation deviations versus Nasdaq growth rate deviations—independent (uncorrelated): knowing the value of one variable does not help us predict the value of the other; Cov = −0.91 ≈ 0 and CorrCoef = −0.07 ≈ 0.
• Variance of the sum of two independent (uncorrelated) variables: Var[x + y] = Var[x] + Var[y]
The variance of the sum of two independent (uncorrelated) variables equals the sum of the variances of the variables.
• Covariance of the sum of a constant and a variable: Cov[c + x, y] = Cov[x, y]
The covariance of two variables is unaffected when a constant is added to one of the
variables.
• Covariance of the product of a constant and a variable: Cov[cx, y] = c Cov[x, y]
Multiplying one of the variables by a constant increases their covariance by a factor equal to the
constant.
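All of these relationships can be verified numerically. The sketch below checks the two covariance rules, together with the variance-of-a-sum rule proved in appendix 1.1, on a small made-up data set.

import numpy as np

x = np.array([1.0, 4.0, 2.0, 5.0, 3.0])   # made-up data, purely for illustration
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
c = 7.0

def cov(a, b):
    return ((a - a.mean()) * (b - b.mean())).mean()   # divide-by-T covariance

def var(a):
    return cov(a, a)

print(np.isclose(cov(c + x, y), cov(x, y)))                        # adding a constant leaves Cov unchanged
print(np.isclose(cov(c * x, y), c * cov(x, y)))                    # scaling x scales Cov by the constant
print(np.isclose(var(x + y), var(x) + 2 * cov(x, y) + var(y)))     # variance of a sum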
Chapter 1 Exercises
1. Consider the inches of precipitation in Amherst, MA, during 1964 and 1975:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1964 5.18 2.32 2.71 2.72 0.83 1.84 3.02 3.01 0.94 1.32 1.68 3.98
1975 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
a. For each year, record the number of months that fall into the following categories:
2. Consider the inches of precipitation in Amherst, MA, during 1964 and 1975.
a. Focus on the two histograms you constructed in exercise 1. Based on the histograms, in
which of the two years is the
i. center of the distribution greater (further to the right)? ______
ii. spread of the distribution greater? ______
b. For each of the two years, use your statistical software to find the mean and the sum of
squared deviations. Report your answers in the table below:
1964 1975
Mean ________ ________
Sum of squared deviations ________ ________
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
8. Common sample eliminates all observations in which there is one or more missing value in one of the variables; the
individual samples option does not do so. Since no values are missing for 1964 and 1975, the choice of common or
individual has no impact.
c. Using your answers to part b and some simple arithmetic (division), compute the variance
for each year:
1964 1975
Variance ________ ________
d. Are your answers to parts b and c consistent with your answer to part a? Explain.
3. Focus on precipitation in Amherst, MA in 1975. Consider a new variable, TwoPlusPrecip,
which equals two plus each month’s precipitation: TwoPlusPrecip = 2 + Precip.
a. Fill in the blanks in the table below:
1975 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Precip 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
TwoPlusPrecip ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____
b. Construct a histogram for TwoPlusPrecip and compare it to the histogram for Precip in
1975 that you constructed in problem 1.
i. How are the histograms related?
ii. What happened to the distribution center?
iii. What happened to the distribution spread?
c. Consider the equations that describe mean and variance of a constant plus a variable: Mean[c + x] = c + Mean[x] and Var[c + x] = Var[x].
Based on these equations and the mean and variance of Precip in 1975, what is the
i. mean of TwoPlusPrecip in 1975? ______
ii. variance of TwoPlusPrecip in 1975? ______
d. Using your statistical package, generate a new variable: TwoPlusPrecip = 2 + Precip. What
is the
i. mean of TwoPlusPrecip in 1975? ______
ii. sum of squared deviations of TwoPlusPrecip in 1975? ______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
TwoPlusPrecip = 2 + Precip
• Click OK.
Instruct EViews to calculate the mean and sum of squared deviations of TwoPlusPrecip:
• In the Workfile window: Double click on TwoPlusPrecip.
• A spreadsheet displaying the value of TwoPlusPrecip for all the months appears.
• In the Series window: Click View; next click Descriptive Statistics & Tests and then Stats
Table. Descriptive statistics for all the months of the twentieth century now appear.
• In the Series window: Click Sample. In the Sample window: Enter year = 1975 in the “If
condition (optional)” text area to restrict the sample to 1975 only.
• Click OK. Descriptive statistics for 1975 appear in the Group window. Record the mean and
sum of squared deviations for 1975.
iii. Using the sum of squared deviations and a calculator, compute the variance of TwoPlus-
Precip in 1975. ______
e. Are your answers to parts b, c, and d consistent? Explain.
4. Focus on precipitation in Amherst, MA, in 1975. Suppose that we wish to report precipitation
in centimeters rather than inches. To do this, just multiply each month’s precipitation by 2.54.
Consider a new variable, PrecipCm, which equals 2.54 times each month’s precipitation as
measured in inches: PrecipCm = 2.54 × Precip.
a. Consider the equations that describe mean and variance of a constant times a variable: Mean[cx] = c Mean[x] and Var[cx] = c^2 Var[x].
Based on these equations and the mean and variance of Precip in 1975, what is the
i. mean of PrecipCm in 1975? ______
ii. variance of PrecipCm in 1975? ______
b. Using your statistical package, generate a new variable: PrecipCm = 2.54 × Precip. What
is the
i. mean of PrecipCm in 1975? ______
ii. sum of squared deviations of PrecipCm in 1975? ______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
Instruct EViews to calculate the mean and sum of squared deviations of PrecipCm:
Table.
• In the Group window: Click Sample. In the Sample window: Enter year = 1975 in the “If
condition (optional)” text area to restrict the sample to 1975 only.
• Click OK. Descriptive statistics for 1975 appear in the Group window. Record the mean and
sum of squared deviations for 1975.
iii. Using the sum of squared deviations and a calculator, compute the variance of Pre-
cipCm in 1975. ______
c. Are your answers to parts a and b consistent? Explain.
Focus on thirty students who enrolled in an economics course during a previous semester.
Student SAT data: Cross-sectional data of student math and verbal high school SAT scores from
a group of 30 students.
5. Consider the equations that describe the mean and variance of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y] and Var[x + y] = Var[x] + 2 Cov[x, y] + Var[y].
SatMath SatVerbal
Mean _______ _______
Variance _______ _______
Covariance _______
Correlation coefficient _______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Workfile window: Highlight satmath by clicking on it; then, while depressing <Ctrl>,
click on satverbal to highlight it also.
• In the Workfile window: Double click on any of the highlighted variables.
•A new list now pops up: Click Open Group. A spreadsheet including the variables SatMath
and SatVerbal for all the students appears.
9. Common sample eliminates all observations in which there is one or more missing value in one of the variables; the
individual samples option does not do so. Since no values are missing, the choice of common or individual has no
impact.
Note: Copying and Pasting EViews Text It is often convenient to copy and paste EViews results
into a word processing document such as Microsoft Word. In the long run this can save you
much time because you can reproduce your results quickly and accurately:
• In EViews, highlight the text you wish to copy and paste.
• Right click on the highlighted area.
• Unless you have a good reason to do otherwise, accept the default choice by clicking OK.
• In your word processor: click Paste.
Based on these equations and the mean and variance of SatMath and SatVerbal, what is the
i. mean of SatSum? ______
ii. variance of SatSum? ______
e. Using your statistical package, generate a new variable:
SatSum = SatMath + SatVerbal. What is the
i. mean of SatSum? ______
ii. sum of squared deviations of SatSum? ______
a. If student 1 drops the course, would the mean Math SAT score of the 29 remaining students
increase or decrease?
b. More generally, if a student drops the course, what determines whether the mean Math
SAT score of the remaining students would increase or decrease?
c. If a student adds the course, what determines whether the mean Math SAT would increase
or decrease?
d. Evaluate the following statement:
“A student transfers from College A to College B. The mean Math SAT scores at both col-
leges increase.”
Could this statement possibly be true? If so, explain how; if not, explain why not.
7. Again, focus on the student SAT data. Consider the equation for the mean Math SAT score
of the students:
\text{Mean}[SatMath] = \frac{x_1 + x_2 + \cdots + x_{30}}{30}

Next consider the mean Math SAT score of just the female students and the mean of just the male students. Since students 1 through 10 are female and students 11 through 30 are male:

\text{Mean}[SatMathFemale] = \frac{x_1 + x_2 + \cdots + x_{10}}{10}, \qquad \text{Mean}[SatMathMale] = \frac{x_{11} + x_{12} + \cdots + x_{30}}{20}
a. Using algebra, show that the mean for all students equals the weighted average of the
mean for female students and the mean for male students where the weights equal the propor-
tion of female and male students; that is,
\text{Mean}[SatMath] = \text{WgtFemale} \times \text{Mean}[SatMathFemale] + \text{WgtMale} \times \text{Mean}[SatMathMale]

where

\text{WgtFemale} = \frac{\text{Number of female students}}{\text{Total number of students}} = \text{Weight given to female students}

and

\text{WgtMale} = \frac{\text{Number of male students}}{\text{Total number of students}} = \text{Weight given to male students}
Mean[SatMath] = ______
Mean[SatMathFemale] = ______ Mean[SatMathMale] = ______
8. The following data from the 1995 and 1996 baseball seasons illustrate what is known as
Simpson’s paradox.
a. Compute the batting average for both players in 1995. Fill in the appropriate blanks. In
1995, who had the higher average?
b. Compute the batting average for both players in 1996. Fill in the appropriate blanks. In
1996, who had the higher average?
c. Next combine the hits and at bats for the two seasons. Fill in the appropriate blanks.
Compute the batting average for both players in the combined seasons. Fill in the appropriate
blanks. In the combined seasons, who had the higher average?
d. Explain why the batting average results appear paradoxical.
e. Resolve the paradox. Hint: Jeter’s combined season average can be viewed as a weighted
average of his two seasons, weighted by his at bats. Similarly Justice’s combined season
average can be viewed as a weighted average also. Apply the equation you derived in
problem 7.
Appendix 1.1: The Arithmetic of Means, Variances, and Covariances

\text{Mean}[x] = \bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T} = \frac{\sum_{t=1}^{T} x_t}{T}

\text{Var}[x] = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}

\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
The mean of a constant plus a variable equals the constant plus the mean of the variable:
\text{Mean}[c + x] = \frac{(c + x_1) + (c + x_2) + \cdots + (c + x_T)}{T}
= \frac{c + c + \cdots + c}{T} + \frac{x_1 + x_2 + \cdots + x_T}{T}
= \frac{Tc}{T} + \frac{x_1 + x_2 + \cdots + x_T}{T}
= c + \text{Mean}[x] = c + \bar{x}
The mean of a constant times a variable equals the constant times the mean of the variable:
\text{Mean}[cx] = \frac{cx_1 + cx_2 + \cdots + cx_T}{T} = c\,\frac{x_1 + x_2 + \cdots + x_T}{T} = c\,\text{Mean}[x] = c\bar{x}
The mean of the sum of two variables equals the sum of the means of the variables:
\text{Mean}[x + y] = \frac{(x_1 + y_1) + (x_2 + y_2) + \cdots + (x_T + y_T)}{T}
= \frac{(x_1 + x_2 + \cdots + x_T) + (y_1 + y_2 + \cdots + y_T)}{T}
= \frac{x_1 + x_2 + \cdots + x_T}{T} + \frac{y_1 + y_2 + \cdots + y_T}{T}
= \bar{x} + \bar{y}
The variance of a constant plus a variable equals the variance of the variable:

\text{Var}[c + x] = \frac{[(c + x_1) - (c + \bar{x})]^2 + \cdots + [(c + x_T) - (c + \bar{x})]^2}{T} = \frac{(x_1 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T} = \text{Var}[x]
The variance of a constant times a variable equals the constant squared times the variance of the
variable:
42 Chapter 1
T
= c 2 Var[ x]
The variance of the sum of two variables equals the sum of the variances of the variables plus
twice the variables’ covariance:
Var[x + y] = {[(x1 + y1) − (x̄ + ȳ)]² + . . . + [(xT + yT) − (x̄ + ȳ)]²}/T
           = {[(x1 − x̄) + (y1 − ȳ)]² + . . . + [(xT − x̄) + (yT − ȳ)]²}/T
           = {[(x1 − x̄)² + 2(x1 − x̄)(y1 − ȳ) + (y1 − ȳ)²] + . . . + [(xT − x̄)² + 2(xT − x̄)(yT − ȳ) + (yT − ȳ)²]}/T
           = [(x1 − x̄)² + . . . + (xT − x̄)²]/T + 2[(x1 − x̄)(y1 − ȳ) + . . . + (xT − x̄)(yT − ȳ)]/T + [(y1 − ȳ)² + . . . + (yT − ȳ)²]/T
           = Var[x] + 2 Cov[x, y] + Var[y]
Variance of the Sum of Two Independent (Uncorrelated) Variables: Var[x + y] = Var[x] + Var[y]
The variance of the sum of two independent (uncorrelated) variables equals the sum of the vari-
ances of the variables; since Cov[x, y] = 0 when x and y are independent (uncorrelated),
Var[x + y] = Var[x] + 2 Cov[x, y] + Var[y] = Var[x] + Var[y]
The covariance of two variables is unaffected when a constant is added to one of the
variables:
Cov[c + x, y] = {[(c + x1) − (c + x̄)](y1 − ȳ) + [(c + x2) − (c + x̄)](y2 − ȳ) + . . . + [(c + xT) − (c + x̄)](yT − ȳ)}/T
              = {[(c − c) + (x1 − x̄)](y1 − ȳ) + [(c − c) + (x2 − x̄)](y2 − ȳ) + . . . + [(c − c) + (xT − x̄)](yT − ȳ)}/T
              = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + . . . + (xT − x̄)(yT − ȳ)]/T
              = Cov[x, y]
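These identities are easy to check numerically. The following short Python sketch (ours, not part of the text; the data values are made up purely for illustration) verifies each one on a small data set, using the population formulas defined above, which divide by T:

    # A quick numerical check of the arithmetic of means, variances, and covariances.
    import numpy as np

    x = np.array([1.0, 4.0, 2.0, 7.0, 6.0])
    y = np.array([2.0, 3.0, 8.0, 5.0, 2.0])
    c = 10.0

    mean = lambda v: v.mean()
    var = lambda v: v.var()                          # divides by T, as above
    cov = lambda u, v: np.cov(u, v, ddof=0)[0, 1]    # also divides by T

    assert np.isclose(mean(c + x), c + mean(x))              # Mean[c + x] = c + Mean[x]
    assert np.isclose(mean(c * x), c * mean(x))              # Mean[cx] = c Mean[x]
    assert np.isclose(mean(x + y), mean(x) + mean(y))        # Mean[x + y] = Mean[x] + Mean[y]
    assert np.isclose(var(c + x), var(x))                    # Var[c + x] = Var[x]
    assert np.isclose(var(c * x), c**2 * var(x))             # Var[cx] = c^2 Var[x]
    assert np.isclose(var(x + y), var(x) + 2 * cov(x, y) + var(y))   # Var[x + y]
    assert np.isclose(cov(c + x, y), cov(x, y))              # Cov[c + x, y] = Cov[x, y]
    print("All identities hold for this data set.")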
Chapter 2 Outline
2.6 Mean, Variance, and Covariance: Data Variables and Random Variables
(1 − p)²p + p²(1 − p) simplifies to p(1 − p)
2.1.1 Random Process: A Process Whose Outcome Cannot Be Predicted with Certainty
The outcome of a random process is uncertain. Tossing a coin is a random process because
you cannot tell beforehand whether the coin will land heads or tails. A baseball game is a random
process because the outcome of the game cannot be known beforehand, assuming of course that
the game has not been fixed. Drawing a card from a well-shuffled deck of fifty-two cards is a
random process because you cannot tell beforehand whether the card will be the ten of hearts,
the six of diamonds, the ace of spades, and so on.
The probability of an outcome tells us how likely it is for that outcome to occur. The value of
a probability ranges from 0 to 1.0. A probability of 0 indicates that the outcome will never occur;
1.0 indicates that the outcome will occur with certainty. A probability of one-half indicates that
the chances of the outcome occurring equals the chances that it will not. For example, if the
experts believe that a baseball game between two teams, say the Red Sox and Yankees, is a
toss-up, then the experts believe that
• the probability of a Red Sox win (and a Yankee loss) is one-half and
• the probability of a Red Sox loss (and a Yankee win) is also one-half.
We will use a card draw as our first illustration of a random process. While we could use a
standard deck of fifty-two cards as the example, the arithmetic can become cumbersome. Con-
sequently, to keep the calculations manageable, we will use a deck of only four cards, the 2 of
clubs, the 3 of hearts, the 3 of diamonds, and the 4 of hearts:
• Shuffle the four cards thoroughly.
• Draw one card and record its value.
• Replace the card.
This experiment represents one repetition of a random process because we cannot determine
which card will be drawn before the experiment is conducted. Throughout this textbook, we will
continue to use the word experiment to represent one repetition of a random process. It is easy
to calculate the probability of each possible outcome for our card draw experiment. Since each
of the four cards is equally likely to be drawn, the probability of drawing the 2 of clubs is 1/4 and
the probability of drawing the 3 of hearts is 1/4. Similarly the probability of drawing the 3 of
diamonds is 1/4 and the probability of drawing the 4 of hearts is 1/4. To summarize,
Prob[2♣] = 1/4,  Prob[3♥] = 1/4,  Prob[3♦] = 1/4,  Prob[4♥] = 1/4
2.1.3 Random Variable: A Variable That Is Associated with an Outcome of a Random Process
A random variable is a variable that is associated with a random process. The value of a random
variable cannot be determined with certainty before the experiment is conducted. There are two
types of random variables:
• A discrete random variable can only take on a countable number of discrete values.
• A continuous random variable can take on a continuous range of values; that is, a continuous
random variable can take on a continuum of values.
To illustrate a discrete random variable, consider our card draw experiment and define v as the
numerical value of the card drawn. That is, v equals 2, if the 2 of clubs were drawn; 3, if the 3 of
hearts or the 3 of diamonds were drawn; and 4, if the 4 of hearts were drawn.
• v is discrete because it can only take on a countable number of values; v can take on three
values: 2 or 3 or 4.
• v is a random variable because we cannot determine the value of v before the experiment is
conducted.
2.2.1 Probability Distribution Describes the Probability for All Possible Values of a Random
Variable
While we cannot determine v’s value beforehand, we can calculate the probability of each pos-
sible value:
• v equals 2 whenever the 2 of clubs is drawn; since the probability of drawing the 2 of clubs
is 1/4, the probability that v will equal 2 is 1/4.
Table 2.1
Probability distribution of random variable v

Card drawn      v     Prob[v]
2♣              2     1/4 = 0.25
3♥ or 3♦        3     1/4 + 1/4 = 1/2 = 0.50
4♥              4     1/4 = 0.25
Figure 2.1
Probability distribution of the random variable v (bars of height 0.25 at v = 2, 0.50 at v = 3, and 0.25 at v = 4)
• v equals 3 whenever the 3 of hearts or the 3 of diamonds is drawn; since the probability of
drawing the 3 of hearts is 1/4 and the probability of drawing the 3 of diamonds is 1/4, the prob-
ability that v will equal 3 is 1/2.
• v equals 4 whenever the 4 of hearts is drawn; since the probability of drawing the 4 of hearts
is 1/4, the probability that v will equal 4 is 1/4.
Table 2.1 describes the probability distribution of the random variable v. The probability
distribution is sometimes called the probability density function of the random variable or simply
the distribution of the random variable. Figure 2.1 illustrates the probability distribution with a
graph that indicates how likely it is for the random variable to equal each of its possible values.
Note that the probabilities must sum to 1 because one of the four cards must be drawn; v must
equal either 2 or 3 or 4. This illustrates a general principle: The sum of the probabilities of all
possible outcomes must equal 1.
In general, a random variable brings both bad and good news. Before the experiment is
conducted:
Bad news: What we do not know: on the one hand, we cannot determine the numerical value
of the random variable with certainty.
Good news: What we do know: on the other hand, we can often calculate the random variable’s
probability distribution telling us how likely it is for the random variable to equal each of its
possible numerical values.
We can interpret the probability of a particular outcome as the relative frequency of the outcome
after the random process, the experiment, is repeated many, many times. We will illustrate the
relative frequency interpretation of probability using our card draw experiment:
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 2?
Answer: Since one of the four cards is a 2, we would expect to draw a 2 about one-fourth of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 2 should be about 1/4, its probability.
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 3?
Answer: Since two of the four cards are 3’s, we would expect to draw a 3 about one-half of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 3 should be about 1/2, its probability.
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 4?
Answer: Since one of the four cards is a 4, we would expect to draw a 4 about one-fourth of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 4 should be about 1/4, its probability.
We could justify this interpretation of probability “by hand,” but doing so would be a very time-
consuming and laborious process. Computers, however, allow us to simulate the experiment
quickly and easily. The Card Draw simulation in our econometrics lab does so (figure 2.2).
Figure 2.2
Card Draw simulation (the window shows the cards selected to be in the deck, with the 2♣, 3♥, 3♦, and 4♥ checked; the card drawn and its value in this repetition; the repetition count; the mean and variance of the numerical values; the relative frequencies of 2, 3, and 4; and Start, Stop, and Pause controls)
We first specify the cards to include in our deck. By default, the 2♣, 3♥, 3♦, and 4♥ are
included. When we click Start, the simulation randomly selects one of our four cards. The card
drawn and its value are reported. To randomly select a second card, click Continue. A table
reports on the relative frequency of each possible value and a histogram visually illustrates the
distribution of the numerical values.
Click Continue repeatedly to convince yourself that our experiment is indeed a random
process; that is, convince yourself that there is no way to determine which card will be drawn
beforehand. Next uncheck the Pause checkbox and click Continue. The simulation no longer
Figure 2.3
Histogram of the numerical values of v (relative frequencies of approximately 0.25, 0.50, and 0.25 at v = 2, 3, and 4)
pauses after each card is selected. It will now repeat the experiment very rapidly. What happens
as the number of repetitions becomes large? The relative frequency of a 2 is approximately 0.25,
the relative frequency of a 3 is approximately 0.50, and the relative frequency of a 4 is approxi-
mately 0.25 as illustrated by the histogram appearing in figure 2.3. After many, many repetitions
click Stop. Recall the probabilities that we calculated for our random variable v:
Prob[v = 2] = 0.25
Prob[v = 3] = 0.50
Prob[v = 4] = 0.25
When the experiment is repeated many, many times, the relative frequency of each outcome
equals its probability. After many, many repetitions the distribution of the numerical values from
all the repetitions mirrors the probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
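Although we cannot run the lab's Card Draw simulation on the printed page, a few lines of Python (our sketch, not the lab's code) mimic the experiment and show the relative frequencies settling down at the probabilities 0.25, 0.50, and 0.25:

    # Sketch of the Card Draw experiment: draw one card (with replacement) many times
    # and compare each value's relative frequency with its probability.
    import random
    from collections import Counter

    deck = [2, 3, 3, 4]          # the 2 of clubs, 3 of hearts, 3 of diamonds, 4 of hearts
    repetitions = 1_000_000
    counts = Counter(random.choice(deck) for _ in range(repetitions))

    for value in (2, 3, 4):
        # relative frequencies: roughly 0.25, 0.50, and 0.25
        print(value, counts[value] / repetitions)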
2.3.1 Center of the Probability Distribution: Mean (Expected Value) of the Random Variable
We have already defined the mean of a data variable in chapter 1. That is, the mean is the average
of the numerical values. We can extend the notion of the mean to a random variable by applying
the relative frequency interpretation of probability. The mean of a random variable equals the
average of the numerical values after the experiment is repeated many, many times. The mean
is often called the expected value because that is what we would expect the numerical value to
equal on average after many, many repetitions of the experiment.1
Question: On average, what would we expect v to equal if we repeated our experiment many,
many times? About one-fourth of the time v would equal 2, one-half of the time v would equal
3, and one-fourth of the time v would equal 4:
1/4 of the time: v = 2;   1/2 of the time: v = 3;   1/4 of the time: v = 4
Answer: On average, v would equal 3. Consequently the mean of the random variable v
equals 3.
More formally, we can calculate the mean of a random variable using the following two steps:
• Multiply each value by the value’s probability.
• Sum the products.
The mean equals the sum of these products. The following equation describes the steps more
concisely:²
Mean[v] = Σ v Prob[v], where the sum is taken over all possible values of v
In words, it states that for each possible value, multiply the value and its probability; then sum
the products.
Let us now “dissect” the right-hand side of the equation:
1. The mean and expected value of a random variable are synonyms. Throughout this textbook we will be consistent
and always use the term “mean.” You should note, however, that the term “expected value” is frequently used instead
of “mean.”
2. Note that the v in Mean[v] is in a bold font. This is done to emphasize the fact that the mean refers to the entire
probability distribution. Mean[v] refers to the center of the entire probability distribution, not just a single value. When
v does not appear in a bold font, we are referring to a specific value that v can take on.
In our example, v can take on three values: 2, 3, and 4. Applying the equation for the mean
obtains
Mean[v] = 2 × 1/4 + 3 × 1/2 + 4 × 1/4 = 1/2 + 3/2 + 1 = 3
Next let us turn our attention to the variance. Recall from chapter 1 that the variance of a data
variable describes the spread of a data variable’s distribution. The variance equals the average
of the squared deviations of the values from the mean. Just as we used the relative frequency
interpretation of probability to extend the notion of the mean to a random variable, we will now
use it to extend the notion of the variance. The variance of a random variable equals the average
of the squared deviations of the values from its mean after the experiment is repeated many,
many times.
Begin by calculating the deviation from the mean and then the squared deviation for each
possible value of v:

v     Deviation from mean, v − Mean[v]     Squared deviation
2     2 − 3 = −1                            1
3     3 − 3 = 0                             0
4     4 − 3 = 1                             1

If we repeat our experiment many, many times, what would the squared deviations equal on
average?
• About one-fourth of the time v would equal 2, the deviation would equal −1, and the squared
deviation 1.
• About one-half of the time v would equal 3, the deviation would equal 0, and the squared
deviation 0.
• About one-fourth of the time v would equal 4, the deviation would equal 1, and the squared
deviation 1.
Half of the time the squared deviation would equal 1 and half of the time 0. On average, the
squared deviations from the mean would equal 1/2.
More formally, we can calculate the variance of a random variable using the following four
steps:
• For each possible value of the random variable, calculate the deviation from the mean.
• Square each value’s deviation.
• Multiply each value’s squared deviation by the value’s probability.
• Sum the products.
For each possible value, multiply the squared deviation and its probability; then sum the
products:
Var[v] = Σ (v − Mean[v])² Prob[v], where the sum is taken over all possible values of v
In our example there are three possible values for v: 2, 3, and 4:
Var[v] = (2 − 3)² × 1/4 + (3 − 3)² × 1/2 + (4 − 3)² × 1/4 = 1/4 + 0 + 1/4 = 1/2
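The two-step recipe for the mean and the four-step recipe for the variance translate directly into a few lines of Python (our illustration, not the lab's code):

    # Mean and variance of a discrete random variable from its probability distribution.
    distribution = {2: 0.25, 3: 0.50, 4: 0.25}       # value: probability

    mean = sum(v * p for v, p in distribution.items())                    # multiply, then sum
    variance = sum((v - mean) ** 2 * p for v, p in distribution.items())  # squared deviations

    print(mean, variance)   # prints 3.0 and 0.5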
Econometrics Lab 2.2: Card Draw Simulation—Checking the Mean and Variance Calculations
It is useful to use our simulation to check our mean and variance calculations. We will exploit
the relative frequency interpretation of probability to do so. Recall our experiment:
• Shuffle the 2♣, 3♥, 3♦, and 4♥ thoroughly.
• Draw one card and record its value.
• Replace the card.
The relative frequency interpretation of probability asserts that when an experiment is repeated
many, many times, the relative frequency of each outcome equals its probability. After many,
many repetitions the distribution of the numerical values from all the repetitions mirrors the
probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
If our equations are correct, what should we expect when we repeat our experiment many,
many times?
• The mean of a random variable’s probability distribution should equal the average of the
numerical values of the variable obtained from each repetition of the experiment after the experi-
ment is repeated many, many times. Consequently after many, many repetitions the mean should
equal about 3.
• The variance of a random variable’s probability distribution should equal the average of the
squared deviations from the mean obtained from each repetition of the experiment after the
experiment is repeated many, many times. Consequently after many, many repetitions the vari-
ance should equal about 0.5.
Figure 2.4
Card Draw simulation (the same window as figure 2.2: the deck list with the 2♣, 3♥, 3♦, and 4♥ selected; the card drawn and its value in this repetition; the repetition count; and Start, Stop, and Pause controls)
The Card Draw simulation in our econometrics lab allows us to confirm this:
As before, the 2♣, 3♥, 3♦, and 4♥ are selected by default (figure 2.4); so just click Start. Recall
that the simulation now randomly selects one of the four cards. The numerical value of the card
selected is reported. Note that the mean and variance of the numerical values are also reported.
You should convince yourself that the simulation is calculating the mean and variance correctly
by clicking Continue and calculating the mean and variance yourself. You will observe that the
simulation is indeed performing the calculations accurately. If you are still skeptical, click Con-
tinue again and perform the calculations. Do so until you are convinced that the mean and
variance reported by the simulation are indeed correct.
Next uncheck the Pause checkbox and click Continue. The simulation no longer pauses after
each card is selected. It will now repeat the experiment very rapidly. After many, many repeti-
tions click Stop. What happens as the number of repetitions becomes large?
• The mean of the numerical values is about 3. This is consistent with our equation for the mean
of the random variable’s probability distribution:
Mean[v] = 2 × 1/4 + 3 × 1/2 + 4 × 1/4 = 1/2 + 3/2 + 1 = 3
• The variance of the numerical values is about 0.5. This is consistent with our equation for the
variance of the random variable’s probability distribution:
Var[v] = 1 × 1/4 + 0 × 1/2 + 1 × 1/4 = 1/4 + 0 + 1/4 = 1/2 = 0.5
The simulation illustrates that the equations we use to compute the mean and variance are indeed
correct.
A continuous random variable, unlike a discrete random variable, can take on a continuous
range of values, a continuum of values. To learn more about these random variables, consider
the following example. Dan Duffer consistently hits 200 yard drives from the tee. A diagram of
the eighteenth hole appears in figure 2.5. The fairway is 32 yards wide 200 yards from the tee.
While the length of Dan’s drives is consistent (he always drives the ball 200 yards from the tee),
he is not consistent “laterally.” That is, his drives sometimes go to the left of where he aims and
sometimes to the right. Despite all the lessons Dan has taken, his drive can land up to 40 yards
to the left and up to 40 yards to the right of his target point. Suppose that Dan’s target point is
the center of the fairway. Since the fairway is 32 yards wide, there are 16 yards of fairway to
the left of Dan’s target point and 16 yards of fairway to the right.
The probability distribution appearing below the diagram of the eighteenth hole describes the
probability that his drive will go to the left and right of his target point. v equals the lateral
distance from Dan’s target point. A negative v represents a point to the left of the target point
and a positive v a point to the right. Note that v can take an infinite number of values between
−40 and +40: v can equal 10 or 16.002 or −30.127, and so on. v is a continuous rather than a
discrete random variable. The probability distribution at the bottom of figure 2.5 indicates how
likely it is for v to equal each of its possible values.
Figure 2.5
A continuous random variable (the eighteenth hole: a 32-yard-wide fairway 200 yards from the tee, with left and right rough and a lake to the right; below it, the triangular probability distribution of v, rising from 0 at v = −40 to 0.025 at v = 0 and falling back to 0 at v = +40)
What is the area beneath the probability distribution? Applying the equation for the area of a
triangle obtains
Area beneath = 1/2 × 0.025 × 40 + 1/2 × 0.025 × 40 = 0.5 + 0.5 = 1
The area equals 1. This is not accidental. Dan’s probability distribution illustrates the property
that all probability distributions must exhibit:
• The area beneath the probability distribution must equal 1.
Figure 2.6
A continuous random variable—Calculating probabilities (the same hole and probability distribution as figure 2.5, with the area between v = −16 and v = +16, Prob[v between −16 and +16], shaded)
The area equaling 1 simply means that a random variable must always take on one of its pos-
sible values (see figure 2.6). In Dan’s case the area beneath the probability distribution must
equal 1 because Dan’s ball must land somewhere.
Let us now calculate some probabilities:
• What is the probability that Dan’s drive will land in the lake? The shore of the lake lies 16
yards to the right of the target point; hence the probability that his drive lands in the lake equals
the probability that v will be greater than 16:
Prob[Drive in lake] = Prob[v > 16]
This just equals the area beneath the probability distribution that lies to the right of 16. Applying
the equation for the area of a triangle:
Prob[Drive in lake] = 1/2 × 0.015 × 24 = 0.18
• What is the probability that Dan’s drive will land in the left rough? The left rough lies 16 yards
to the left of the target point; hence, the probability that his drive lands in the left rough equals
the probability that v will be less than or equal to −16:
Prob[Drive in left rough] = Prob[v ≤ −16]
This just equals the area beneath the probability distribution that lies to the left of −16:
Prob[Drive in left rough] = 1/2 × 0.015 × 24 = 0.18
• What is the probability that Dan’s drive will land in the fairway? The probability that his drive
lands in the fairway equals the probability that v will be within 16 yards of the target point:
Prob[Drive in fairway] = Prob[−16 ≤ v ≤ 16]
This just equals the area beneath the probability distribution that lies between −16 and 16. We
can calculate this area by dividing the area into a rectangle and triangle:
Prob[Drive in fairway] = 0.015 × 32 + 1/2 × 0.010 × 32
                       = 0.015 × 32 + 0.005 × 32
                       = (0.015 + 0.005) × 32
                       = 0.020 × 32 = 0.64
Prob[Drive in lake] + Prob[Drive in left rough] + Prob[Drive in fairway] = 0.18 + 0.18 + 0.64 = 1.0
The sum equals 1.0, illustrating the fact that Dan’s drive must land somewhere. This example
illustrates how we can use probability distributions to compute probabilities.
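The same probabilities can be obtained by numerically integrating the triangular density. The sketch below is ours, not the text's; it assumes the density 0.025 × (1 − |v|/40) implied by figure 2.5:

    def density(v):
        # Triangular density from figure 2.5: peak 0.025 at v = 0, zero at v = -40 and +40.
        return 0.025 * (1 - abs(v) / 40) if abs(v) <= 40 else 0.0

    def prob_between(a, b, steps=100_000):
        # Midpoint-rule numerical integration of the density from a to b.
        width = (b - a) / steps
        return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

    print(prob_between(-40, 40))    # total area beneath the density: about 1.00
    print(prob_between(16, 40))     # Prob[drive in lake]:        about 0.18
    print(prob_between(-40, -16))   # Prob[drive in left rough]:  about 0.18
    print(prob_between(-16, 16))    # Prob[drive in fairway]:     about 0.64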
We will now apply what we have learned about random variables to gain insights into statistics
that are cited in the news every day. For example, when the Bureau of Labor Statistics calculates
the unemployment rate every month, it does not interview every American, the entire American
population; instead, it gathers information from a subset of the population, a sample. More
specifically, data are collected from interviews with about 60,000 households. Similarly political
pollsters do not poll every American voter to forecast the outcome of an election, but rather they
query only a sample of the voters. In each case a sample of the population is used to draw infer-
ences about the entire population. How reliable are these inferences? To address this question,
we consider an example.
A college student, Clinton Jefferson Williams, is running for president of his student body. On
the day before the election, Clint must decide whether or not to hold a pre-election beer tap
rally:
• If he is comfortably ahead, he will not hold the beer tap rally; he will save his campaign funds
for a future political endeavor (or perhaps a Caribbean vacation in January).
• If he is not comfortably ahead, he will hold the beer tap rally to try to sway some voters.
There is not enough time to interview every member of the student body, however. What should
Clint do? He decides to conduct a poll.
Econometrician’s philosophy
If you lack the information to determine the value directly, estimate the value to the best of your
ability using the information you do have. By conducting the poll, Clint has adopted the philoso-
phy of the econometrician. Clint polls a sample of 16 randomly selected students and uses the
information they provide to draw inferences about the entire student body, the population. Twelve
of the 16, or 75 percent (0.75), of the sample support Clint:
Estimate of the actual population fraction supporting Clint = EstFrac = 12/16 = 3/4 = 0.75
This suggests that Clint leads, does it not? But how confident should Clint be that he is in fact
ahead? Clint faces a dilemma.
Clint’s Dilemma
Should Clint be confident that he has the election in hand and save his funds or should he finance
the beer tap rally?
We will now pursue the following project to help Clint resolve his dilemma.
Project
Use Clint’s opinion poll to assess his election prospects.
In reality, Clint only conducts one poll. How, then, can a simulation of the polling process be
useful? The relative frequency interpretation of probability provides the answer. We can use a
simulation to conduct a poll many, many times. After many, many repetitions the simulation
reveals the probability distribution of the possible outcomes for the one poll that Clint
conducts:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
We will now illustrate how the probability distribution might help Clint decide whether or not
to fund the beer tap rally.
The Opinion Poll simulation in our Econometrics Lab can help Clint address his dilemma. In
the simulation, we can specify the sample size. To mimic Clint’s poll, a sample size of 16 is
selected by default (shown in figure 2.7). Furthermore we can do something in the simulation
that we cannot do in the real world. We can specify the actual fraction of the population that
supports Clint, ActFrac. By default, the actual population fraction is set at 0.5; half of all voters
support Clint and half do not. In other words, we are simulating an election that is a toss-up.
When we click the Start button, the simulation conducts a poll of 16 people and reports the
fraction of those polled that support Clint:
EstFrac = Number for Clint/Sample size = Number for Clint/16
Figure 2.7
Opinion Poll simulation (the window offers a sample size list with 10, 16, 25, and 50, an actual population fraction list, and Start, Stop, and Pause controls)
EstFrac equals the estimated fraction of the population supporting Clint. To conduct a second
poll, click the Continue button. Do this several times. What do you observe? Sometimes the
estimated fraction, EstFrac, may equal the actual population fraction, 0.5, but usually it does
not. Furthermore EstFrac is a random variable; we cannot predict its value with certainty before
the poll is conducted. Next uncheck the Pause checkbox and click Continue. After many, many
repetitions click Stop.
The simulation histogram illustrates that sometimes 12 or more of those polled support Clint
even though only half the population actually supports him. So it is entirely possible that the
election is a toss-up even though 12 of the 16 individuals supported Clint in his poll. In other
words, Clint cannot be completely certain that he is leading, despite the fact that 75 percent of
the 16 individuals polled supported him. And where does Clint stand? The poll results do not
allow him to conclude he is leading with certainty. What conclusions can Clint justifiably draw
from his poll results?
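One way to sharpen this point, though the text does not pursue it here, is to ask how often a poll of 16 would show 12 or more supporters if the election really were a toss-up. The binomial calculation below is our addition:

    # Probability of polling 12 or more Clint supporters out of 16
    # when the actual population fraction is only 0.50.
    from math import comb

    n, p = 16, 0.5
    prob_12_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, n + 1))
    print(prob_12_or_more)   # about 0.038: unlikely, but far from impossible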
Experiment 2.2: Opinion Poll with a Sample Size of 1—An Unrealistic but Instructive
Experiment
Write the name of each student in the college, the population, on a 3 × 5 card; then:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
Figure 2.8
Opinion Poll simulation—Sample size of one (a single individual is drawn from the population into the sample; for Clint?)
Figure 2.9
Probabilities for a sample size of one (the individual polled is for Clint, v = 1, with probability 1/2, and not for Clint, v = 0, with probability 1/2)
To explain how we can do so, assume for the moment that the election is actually a toss-up
as we did in our simulation; that is, assume that half the population supports Clint and half does
not. We make this hypothetical assumption only temporarily because it will help us understand
the polling process. With this assumption we can easily determine v’s probability distribution.
Since the individual is chosen at random, the chances that the individual will support Clint equal
the chances he/she will not (see figure 2.9):
Individual’s response      v     Prob[v]
For Clint                  1     1/2
Not for Clint              0     1/2
We describe v’s probability distribution by calculating its center (mean) and spread (variance).
For each possible value, multiply the value and its probability; then sum the products.
There are two possible values for v, 1 and 0:
Mean[v] = 1 × 1/2 + 0 × 1/2 = 1/2 + 0 = 1/2
This makes sense, does it not? In words, the mean of a random variable equals the average
of the values of the variable after the experiment is repeated many, many times. Recall that we
have assumed that the election is a toss-up. Consequently after the experiment is repeated many,
many times we would expect v to equal
• 1, about half of the time.
• 0, about half of the time.
After many, many repetitions of the experiment, the numerical value of v should average out to
equal 1/2.
Recall the equation and the four steps we used to calculate the variance:
Var[v] = Σ (v − Mean[v])² Prob[v], where the sum is taken over all possible values of v
For each possible value, multiply the squared deviation and its probability; then sum the
products.
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Var[v] = 1/4 × 1/2 + 1/4 × 1/2 = 1/8 + 1/8 = 1/4
We will now use our Opinion Poll simulation to check our mean and variance calculations by
specifying a sample size of 1. In this case the estimated fraction, EstFrac, and v are identical:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
If our calculations for the mean and variance of v’s probability distribution are correct, the mean
of the numerical values should equal about 0.50 and the variance about 0.25 after many, many
repetitions:
Mean of the numerical values → Mean of probability distribution = 1/2 = 0.50
Variance of the numerical values → Variance of probability distribution = 1/4 = 0.25
(after many, many repetitions)
Table 2.2
Opinion Poll simulation results—sample size of one
Actual population fraction = ActFrac = p = 1/2 = 0.50

Equations:   Mean = 1/2 = 0.50    Variance = 1/4 = 0.25
Simulation:  >1,000,000 repetitions    Mean ≈ 0.50    Variance ≈ 0.25
Figure 2.10
Probabilities for a sample size of one (the individual polled is for Clint, v = 1, with probability p, and not for Clint, v = 0, with probability 1 − p)
Table 2.2 confirms that after many, many repetitions the mean numerical value is about 0.50
and the variance about 0.25, which is consistent with our calculations.
Generalization
Thus far we have assumed that the portion of the population supporting Clint is 1/2. Let us now
generalize our analysis by letting p equal the actual fraction of the population supporting Clint’s
candidacy: ActFrac = p. The probability that the individual selected will support Clint just
equals p, the actual fraction of the population supporting Clint (see figure 2.10):
Individual’s response      v     Prob[v]
For Clint                  1     p
Not for Clint              0     1 − p
Now calculate the mean: for each possible value, multiply the value and its probability; then sum
the products. As before, there are two possible values for v, 1 and 0:
Mean[v] = 1 × p + 0 × (1 − p) = p + 0 = p
The mean equals p, the actual fraction of the population supporting Clint.
For each possible value, multiply the squared deviation and its probability, and sum the prod-
ucts—in four steps:
• For each possible value, calculate the deviation from the mean.
• Square each value’s deviation.
• Multiply each value’s squared deviation by the value’s probability.
• Sum the products.
Var[v] = (1 − p)² × p + p² × (1 − p)
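This expression simplifies to p(1 − p), the form used throughout the rest of the chapter. The algebra, written out here for completeness, is just factoring:

$$\mathrm{Var}[v] \;=\; (1-p)^2\,p \;+\; p^2\,(1-p) \;=\; p(1-p)\bigl[(1-p)+p\bigr] \;=\; p(1-p)$$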
Experiment 2.3: Opinion Poll with Sample Size of 2—Another Unrealistic but Instructive
Experiment
In reality, we would never use a poll of only two individuals to estimate the actual fraction of
the population supporting Clint. Nevertheless, analyzing such a case is instructive. Therefore let
us consider an experiment in which two individuals are polled (figure 2.11). Remember, we have
written the name of each student enrolled in the college on a 3 × 5 card.
In the first stage:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer; this yields a specific numeri-
cal value of v1 for the random variable. v1 equals 1 if the first individual polled supports Clint;
0 otherwise.
• Replace the card.
In the second stage:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer; this yields a specific numeri-
cal value of v2 for the random variable. v2 equals 1 if the second individual polled supports Clint;
0 otherwise.
• Replace the card.

Figure 2.11
Opinion Poll simulation—Sample size of two (two individuals are drawn at random from the population, with replacement; is individual 1 for Clint? is individual 2 for Clint?)
The estimated fraction of the population supporting Clint is the fraction of the two individuals
polled who support him:
EstFrac = (v1 + v2)/2
EstFrac is a random variable. We cannot determine with certainty the numerical value of the
estimated fraction, EstFrac, before the experiment is conducted.
The probability distribution reports the likelihood of each possible outcome. We can describe
the probability distribution by calculating its center (mean) and spread (variance).
Mean[EstFrac] = Mean[(1/2)(v1 + v2)]
What do we know that would help us calculate the mean? We know the means of v1 and v2; also
we know about the arithmetic of means.
That is, we already have the means of the random variables v1 and v2:
• The first stage of the experiment is identical to the previous experiment in which only one
card is drawn; consequently
Mean[v1] = Mean[v] = p
• Similarly the second stage of the experiment is identical to the previous experiment; consequently
Mean[v2] = Mean[v] = p
We will focus on
Mean[(1/2)(v1 + v2)]
and apply the arithmetic of means:
Mean[EstFrac] = Mean[(1/2)(v1 + v2)]
since Mean[cx] = c Mean[x]
              = (1/2) Mean[v1 + v2]
since Mean[x + y] = Mean[x] + Mean[y]
              = (1/2)(Mean[v1] + Mean[v2])
since Mean[v1] = Mean[v2] = p
              = (1/2)[p + p]
simplifying
              = (1/2)[2p]
              = p
Var[EstFrac] = Var[(1/2)(v1 + v2)]
That is, we already have the variances of the random variables v1 and v2:
• The first stage of the experiment is identical to the previous experiment in which only one
card was drawn; consequently
Var[v1] = Var[v] = p(1 − p)
• Similarly the second stage is identical; consequently
Var[v2] = Var[v] = p(1 − p)
Now focus on the covariance of v1 and v2, Cov[v1, v2]. The covariance tells us whether the
variables are correlated or independent. On the one hand, when two variables are correlated their
covariance is nonzero; knowing the value of one variable helps us predict the value of the other.
On the other hand, when two variables are independent their covariance equals zero; knowing
the value of one does not help us predict the other.
In this case, v1 and v2 are independent and their covariance equals 0. Let us explain why. Since
the first card drawn is replaced, whether or not the first voter polled supports Clint does not
affect the probability that the second voter will support Clint. Regardless of whether or not the
first voter polled supported Clint, the probability that the second voter will support Clint is p,
the actual population fraction:
Cov[v1, v2] = 0
Focus on
Var[(1/2)(v1 + v2)]
and apply the arithmetic of variances:
Var[EstFrac] = Var[(1/2)(v1 + v2)]
since Var[cx] = c² Var[x]
             = (1/4) Var[v1 + v2]
since Cov[v1, v2] = 0, the variance of the sum equals the sum of the variances
             = (1/4)[Var[v1] + Var[v2]]
since Var[v1] = Var[v2] = p(1 − p)
             = (1/4)[p(1 − p) + p(1 − p)]
simplifying
             = (1/4)[2p(1 − p)]
             = p(1 − p)/2
As before, we can use the simulation to check the equations we just derived by exploiting the
relative frequency interpretation of probability. When the experiment is repeated many, many
times, the relative frequency of each outcome equals its probability. After many, many repetitions
the distribution of the numerical values from all the repetitions mirrors the probability
distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
To check the equations, we will specify a sample size of 2 and select an actual population
fraction of 0.50. Using the equations we derived, the mean of the estimated fraction’s probability
distribution should be 0.50 and the variance should be 0.125:
Mean[EstFrac] = p = 0.50
Var[EstFrac] = p(1 − p)/2 = [(1/2)(1 − 1/2)]/2 = [(1/2) × (1/2)]/2 = (1/4)/2 = 1/8 = 0.125
Be certain the simulation’s Pause checkbox is cleared. Click Start and then, after many, many
repetitions, click Stop.
The simulation results (table 2.3) suggest that our equations are correct. After many, many
repetitions the mean (average) of the numerical values equals the mean of the probability dis-
tribution, 0.50. Similarly, the variance of the numerical values equals the variance of the prob-
ability distribution, 0.125.
Table 2.3
Opinion Poll simulation results—sample size of two
Actual population fraction = ActFrac = p = 1/2 = 0.50

Sample size: 2
Equations: Mean of EstFrac’s probability distribution = 1/2 = 0.50; Variance = 1/8 = 0.125
Simulation (>1,000,000 repetitions): Mean of the numerical values of EstFrac ≈ 0.50; Variance ≈ 0.125
2.6 Mean, Variance, and Covariance: Data Variables and Random Variables
In this chapter we extended the notions of mean, variance, and covariance that we introduced
in chapter 1 from data variables to random variables. The mean and variance describe the dis-
tribution of a single variable. The mean depicts the center of a variable’s distribution; the variance
depicts the distribution spread. In the case of a data variable, the distribution is illustrated by a
histogram; consequently the mean and variance describe the center and spread of the data vari-
able’s histogram. In the case of a random variable, mean and variance describe the center and
spread of the random variable’s probability distribution.
Covariance quantifies the notion of how two variables are related. When two data variables
are uncorrelated, they are independent; the value of one variable does not help us predict the
value of the other. In the case of independent random variables, the value of one variable does
not affect the probability distribution of the other variable and their covariance equals 0.
Chapter 2 Exercises
2♠ 2♥ 2♦ 2♣ 3♠
3♥ 3♦ 4♠ 4♥ 5♦
Using these equations, calculate the mean (expected value) and variance of the random variable
v’s probability distribution.
Use the relative frequency interpretation of probability to check your answers to parts b and
c by using the appropriate simulation in our econometrics lab.
d. After the experiment is repeated many, many times, does the distribution of numerical
values from the experiments mirror the random variable v’s probability distribution?
e. After the experiment is repeated many, many times, how are the mean and variance of the
random variable v’s probability distribution related to the mean and variance of the numerical
values?
2. Consider the following experiment. Using the same deck from question 1,
• Thoroughly shuffle the deck of 10 cards.
• Draw one card and record its value.
• Replace the card drawn.
• Thoroughly shuffle the deck of 10 cards.
• Draw a second card and record its value.
Let
(Note: This experiment differs from the earlier one in that the first card drawn is not replaced.)
As before, let
Next suppose that instead of 10 cards, the deck contains 10,000 cards: 4,000 2’s, 3,000 3’s, 2,000
4’s, and 1,000 5’s. Consider this new experiment:
• Thoroughly shuffle the deck of 10,000 cards.
• Draw one card and record its value.
• Do not replace the card drawn.
• Thoroughly shuffle the remaining 9,999 cards.
• Draw a second card and record its value.
Compare the consequences of not replacing the first card drawn with the deck of 10 cards versus
the deck of 10,000.
d. Compare your answers to parts a and c. As the population size increases (i.e., as the
number of cards in the deck increases), is the probability of drawing a 2 on the second draw
affected more or less by whether or not a 2 is drawn on the first draw?
e. Suppose that a student assumes v1 and v2 to be independent even though the first card is
not replaced. As the population size increases, would this assumption become a better or
worse approximation of reality?
f. In view of the fact that there are more than 100 million American voters, should a profes-
sional pollster of American sentiments worry about “the replacement issue?”
4. A European roulette wheel has 37 slots around its perimeter (see figure 2.12). The slots are
numbered 1 through 36 and 0. You can place a bet on the roulette board.
Figure 2.12
Game of roulette (© Can Stock Photo Inc./RaStudio and © Can Stock Photo Inc./oorka)
Many different types of bets can be made. You can bet on a single number, a row of numbers,
a column of numbers, a set of twelve numbers, all red numbers, all black numbers, all even
numbers, or all odd numbers. (Note: The rows, columns, twelves, reds, blacks, evens, and odds
do not include 0.) Once all bets are placed, the roulette wheel is spun and a ball is dropped into
the spinning wheel. Initially, the ball bounces wildly around the wheel, but eventually, it settles
into one of the 37 slots. If this is a slot that you bet on, you win; the amount of winnings depends
on the type of bet you made:
If the ball does not settle into a slot you bet on, you lose your bet. Suppose that you always
place a $1 bet. Let
v = Your net winnings = Your gross winnings − $1
Roulette wheels are precisely balanced so that the ball is equally likely to land in each of the
37 slots.
a. Suppose that you place a $1 bet on the first set of twelve numbers.
i. If the ball ends up in one of the first 12 slots, what will v equal? What is the probability
of this scenario?
ii. If the ball does not end up in one of the first 12 slots, what will v equal? What is the
probability of this scenario?
iii. In this scenario, what are the mean (expected value) and variance of v?
b. Suppose that you place a $1 bet on red.
i. If the ball ends up in one of the 18 red slots, what will v equal? What is the probability
of this scenario?
ii. If the ball does not end up in one of the 18 red slots, what will v equal? What is the
probability of this scenario?
iii. In this scenario, what are the mean (expected value) and variance of v?
c. Compare the two bets. How are they similar? How are they different?
Figure 2.13
Assignment of archery points (the 80 cm target is divided into ten concentric rings whose radii increase in 4 cm increments, scoring 10 points at the center down to 1 point at the outer ring)
5. The International Archery Federation establishes the rules for archery competitions. The
Federation permits the distance between the competitor and the target as well as the size of the
target to vary from competition to competition. Distance varies from 18 to 90 meters; the size
of the target varies from 40 to 122 centimeters in diameter. Say a friend, Archie, is participating
in a 60-meter contest. At a distance of 60 meters, the Federation specifies a target 80 centimeters
in diameter. At this distance Archie, an excellent archer, always shoots his arrows within 20
centimeters of the target’s center. Figure 2.13 describes how points are assigned:
• 10 points if the arrow strikes within 4 centimeters of the target’s center.
• 9 points if the arrow strikes between 4 and 8 centimeters of the target’s center.
• 8 points if the arrow strikes between 8 and 12 centimeters of the target’s center.
• 7 points if the arrow strikes between 12 and 16 centimeters of the target’s center.
• 6 points if the arrow strikes between 16 and 20 centimeters of the target’s center.
Figure 2.14
Probability distribution of v, the distance of Archie’s arrow from the target center (the density is plotted against v from 0 to 40 centimeters, with heights ranging up to 0.10)
• 5 points if the arrow strikes between 20 and 24 centimeters of the target’s center.
• 4 points if the arrow strikes between 24 and 28 centimeters of the target’s center.
• 3 points if the arrow strikes between 28 and 32 centimeters of the target’s center.
• 2 points if the arrow strikes between 32 and 36 centimeters of the target’s center.
• 1 point if the arrow strikes between 36 and 40 centimeters of the target’s center.
Figure 2.14 describes the probability distribution of v, the distance of Archie’s arrow from the
target’s center.
a. Explain why the area beneath Archie’s probability distribution must equal 1.
b. What is the probability that Archie will score at least 6 points?
c. What is the probability that Archie will score 10 points?
d. What is the probability that Archie will score 9 points?
e. What is the probability that Archie will score 7 or 8 points?
6. Recall our friend Dan Duffer who consistently hits 200 yard drives from the tee. Also recall
that Dan is not consistent “laterally”; his drive can land as far as 40 yards to the left or right of
his target point. Here v is the distance from Dan’s target point (see figure 2.15).
Dan has had a tough day on the course and has lost many golf balls. He only has one ball left
and wants to finish the round. Accordingly he wants to reduce the chances of driving his last
ball into the lake. So, instead of choosing a target point at the middle of the fairway, as indicated
in the figure to the right, he contemplates aiming his drive 8 yards to the left of the fairway
midpoint.
Figure 2.15
Dan Duffer’s eighteenth hole (the same hole as figure 2.5, with a 32-yard-wide fairway 200 yards from the tee and a lake to the right, showing both the fairway midpoint and the contemplated target point 8 yards to its left; below it, the triangular probability distribution of v from −40 to +40)
a. Revise and realign the figure to the right to reflect the new target point that Dan is
contemplating.
Based on this new target point:
b. What is the probability that his drive will land in the lake? ______
c. What is the probability that his drive will land in the left rough? ______
d. What is the probability that his drive will land in the fairway? ______
7. Joe passes through one traffic light on his daily commute to work. The traffic department has
set up the traffic light on a one minute cycle:
Red 30 seconds
Yellow 5 seconds
Green 25 seconds
Joe, a safe driver, decides whether or not to brake for the traffic light when he is 10 yards from
the light. If the light is red or yellow, he brakes; otherwise, he continues on.
a. When Joe makes his brake/continue decision next Monday, what is the probability that
the light will be
i. Red? _____
ii. Yellow? _____
iii. Green? _____
b. What is the probability that his brake/continue decision next Monday will be
i. Brake? _____
ii. Continue? _____
c. What is the probability that Joe will not stop at the light for the next five workdays during
his daily commute to work?
8. To avoid studying, you and your roommate decide to play the following game:
• Thoroughly shuffle your roommate’s standard deck of fifty-two cards: 13 spades, 13 hearts,
13 diamonds, and 13 clubs.
• Draw one card.
• If the card drawn is red, you win $1 from your roommate; if the card drawn is black, you lose
$1.
• Replace the card drawn.
After you play the game once, you both decide to play it again. Let us modify the notation to
reflect this:
TNW = v1 + v2 + . . . + v18
Chapter 3 Outline
3.1 Review
3.1.1 Random Variables
3.1.2 Relative Frequency Interpretation of Probability
3.2 Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
3.2.1 Measure of the Probability Distribution Center: Mean of the Random Variable
3.2.2 Measure of the Probability Distribution Spread: Variance of the Random Variable
3.2.3 Why Is the Mean of the Estimate’s Probability Distribution Important? Biased and
Unbiased Estimation Procedures
3.2.4 Why Is the Variance of the Estimate’s Probability Distribution Important? Reliability
of Unbiased Estimation Procedures
Mean[(1/T)(v1 + v2 + . . . + vT)] = p
whenever Mean[vt] = p for each t; that is, Mean[v1] = Mean[v2] = . . . = Mean[vT] = p.
Var[(1/T)(v1 + v2 + . . . + vT)] = p(1 − p)/T
whenever
• Var[vt] = p(1 − p) for each t; that is, Var[v1] = Var[v2] = . . . = Var[vT] = p(1 − p)
and
• the vt’s are independent; that is, all the covariances equal 0.
6. Would you have more confidence in a poll that queries a small number of individuals or a
poll that queries a large number?
3.1 Review
Remember, random variables bring both bad and good news. Before the experiment is
conducted:
Bad news: What we do not know: on the one hand, we cannot determine the numerical value
of the random variable with certainty.
Good news: What we do know: on the other hand, we can often calculate the random variable’s
probability distribution telling us how likely it is for the random variable to equal each of its
possible numerical values.
After many, many repetitions of the experiment the distribution of the numerical values from
the experiments mirrors the random variable’s probability distribution.
3.2 Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
Polling procedures use information gathered from a sample of the population to draw inferences
about the entire population. In the previous chapter we considered two unrealistic samples sizes,
a sample size of 1 and a sample size of 2. Common sense suggests that such small samples
would not be helpful in drawing inferences about an entire population. We considered these
unrealistic sample sizes to lay the groundwork for realistic ones. We are now prepared to analyze
the general case in which the sample size equals T. Let us return to our friend Clint who is
running for president of his student body. Consider the following experiment:
Write the names of every individual in the population on a card. Perform the following procedure
T times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint; the individual’s answer determines the numerical
value of vt: vt equals 1 if the tth individual polled supports Clint; 0 otherwise.
• Replace the card.
where T = sample size. The estimated fraction of the population supporting Clint,
EstFrac = (v1 + v2 + . . . + vT)/T
is a random variable. We cannot determine the numerical value of the estimated fraction, EstFrac,
with certainty before the experiment is conducted.
3.2.1 Measure of the Probability Distribution Center: Mean of the Random Variable
First, consider the mean. Apply the arithmetic of means and what we know about the vt’s:
Mean[EstFrac] = Mean[(1/T)(v1 + v2 + . . . + vT)]
since Mean[cx] = c Mean[x]
              = (1/T) Mean[v1 + v2 + . . . + vT]
since Mean[x + y] = Mean[x] + Mean[y]
              = (1/T)(Mean[v1] + Mean[v2] + . . . + Mean[vT])
since Mean[vt] = p for each t
              = (1/T)(p + p + . . . + p) = (1/T)(Tp)
Simplifying obtains
              = p
3.2.2 Measure of the Probability Distribution Spread: Variance of the Random Variable
Next, focus on the variance. Apply the arithmetic of variances and what we know about the vt’s:
Var[EstFrac] = Var[(1/T)(v1 + v2 + . . . + vT)]
since Var[cx] = c²Var[x]
             = (1/T²) Var[v1 + v2 + . . . + vT]
since Var[x + y] = Var[x] + Var[y] when x and y are independent; hence the covariances are
all 0.
             = (1/T²)(Var[v1] + Var[v2] + . . . + Var[vT])
since Var[vt] = p(1 − p) for each t
             = (1/T²)[p(1 − p) + p(1 − p) + . . . + p(1 − p)] = (1/T²)[Tp(1 − p)]
Simplifying obtains
             = p(1 − p)/T
To summarize:
Mean[EstFrac] = p,   Var[EstFrac] = p(1 − p)/T
where p = ActFrac = actual fraction of the population supporting Clint and T = sample size.
Once again, we will exploit the relative frequency interpretation of probability to check the
equations for the mean and variance of the estimated fraction’s probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
We just derived the mean and variance of the estimated fraction’s probability distribution:
Mean[EstFrac] = p,   Var[EstFrac] = p(1 − p)/T
where p = ActFrac and T = sample size. Consequently after many, many repetitions the mean
of these numerical values should equal approximately p, the actual fraction of the population
that supports Clint, and the variance should equal approximately p(1 − p)/T.
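The claim is easy to check outside the lab as well. The short Python sketch below (ours, not the lab's code) repeats the poll many times for each of several sample sizes and compares the mean and variance of the resulting estimated fractions with p and p(1 − p)/T:

    # Simulate many polls of size T from a population in which a fraction p supports Clint,
    # then compare the estimates' mean and variance with p and p(1-p)/T.
    import random
    import statistics

    def simulate_polls(p=0.5, T=16, repetitions=50_000):
        # Each repetition: poll T randomly chosen voters and record the estimated fraction.
        estimates = [sum(random.random() < p for _ in range(T)) / T
                     for _ in range(repetitions)]
        return statistics.mean(estimates), statistics.pvariance(estimates)

    for T in (1, 2, 25, 100):
        mean, var = simulate_polls(T=T)
        print(T, round(mean, 3), round(var, 5), "theory:", 0.5, 0.25 / T)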
In the simulation we begin by specifying the fraction of the population supporting Clint. We
could choose any actual population fraction; for purposes of illustration we choose 1/2 here.
Actual fraction of the population supporting Clint = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50,   Var[EstFrac] = (1/2)(1 − 1/2)/T = (1/2 × 1/2)/T = (1/4)/T = 1/(4T)
We will now use our simulation to consider different sample sizes, different T’s.
Our simulation results appearing in table 3.1 are consistent with the equations we derived. After
many, many repetitions the means of the numerical values equal the means of the estimated
fraction’s probability distribution. The same is true for the variances.
Public opinion polls use procedures very similar to that described in our experiment. A specific
number of people are asked who or what they support and then the results are reported. We can
think of a poll as one repetition of our experiment. Pollsters use the numerical value of the
estimated fraction from one repetition of the experiment to estimate the actual fraction. But how
reliable is such an estimate? We will now show that the reliability of an estimate depends on the
mean and variance of the estimate’s probability distribution.
Table 3.1
Opinion Poll simulation results with selected sample sizes
(Equations: mean and variance of EstFrac’s probability distribution. Simulation: mean and variance of the numerical values of EstFrac from >1,000,000 repetitions.)

Sample size    Equations: Mean    Equations: Variance      Repetitions     Simulation: Mean    Simulation: Variance
1              1/2 = 0.50         1/4 = 0.25               >1,000,000      ≈0.50               ≈0.25
2              1/2 = 0.50         1/8 = 0.125              >1,000,000      ≈0.50               ≈0.125
25             1/2 = 0.50         1/100 = 0.01             >1,000,000      ≈0.50               ≈0.01
100            1/2 = 0.50         1/400 = 0.0025           >1,000,000      ≈0.50               ≈0.0025
400            1/2 = 0.50         1/1,600 = 0.000625       >1,000,000      ≈0.50               ≈0.000625
3.2.3 Why Is the Mean of the Estimate’s Probability Distribution Important? Biased and
Unbiased Estimation Procedures
Recall Clint’s poll in which 12 of the 16 individuals queried supported him. The estimated frac-
tion, EstFrac, equaled 0.75:
12
EstFrac = = 0.75
16
In chapter 2 we used our Opinion Poll simulation to show that this poll result did not prove with
certainty that the actual fraction of the population supporting Clint exceeded 0.50. In general,
we observed that while it is possible for the estimated fraction to equal the actual population
fraction, it is more likely for the estimated fraction to be greater than or less than the actual
fraction. In other words, we cannot expect the estimated fraction from a single poll to equal the
actual population fraction.
What then can we conclude? We know that the estimated fraction is a random variable. While
we cannot determine its numerical value with certainty before the experiment is conducted, we
can describe its probability distribution. A random variable’s mean describes the center of its
probability distribution. Using a little algebra, we showed that the mean of the estimated frac-
tion’s probability distribution equals the actual fraction of the population supporting Clint.
Whenever the mean of an estimate’s probability distribution equals the actual value, the estima-
tion procedure is unbiased as illustrated in figure 3.1:
Figure 3.1
Probability distribution of EstFrac values (the distribution is centered at Mean[EstFrac] = ActFrac)
When the estimation procedure is unbiased, the average of the numerical values of the estimated
fractions equals the actual population fraction after many, many repetitions. Table 3.1 reports
that this is true.
We can obtain even more intuition about unbiased estimation procedures when the probability
distribution of the estimate is symmetric. In this case the chances that the estimated fraction will
be less than the actual population fraction in one repetition equal the chances that the estimated
fraction will be greater than the actual fraction. We will use a simulation to illustrate this.
Figure 3.2 illustrates the defaults. An actual population fraction of 0.50 and a sample size of
100 are specified. Two new lists appear in the lower left of the window: a From list and a To
list (see figure 3.2).

Figure 3.2
Opinion Poll simulation (in addition to the usual controls, the window reports the mean and variance of the numerical values of the estimated fractions from all repetitions, the From and To lists, and the From–To percent)

By default, a From value of .000 and a To value of .500 are selected. The From–To Percent
line reports the percentage of repetitions in which the estimated fraction lies between the From
value, .000, and the To value, .500.
Check to be certain that the simulation is calculating the From–To Percent correctly by click-
ing Start and then Continue a few times. Then clear the Pause box and click Continue. After
many, many repetitions click Stop. The From–To Percent equals approximately 50 percent. The
estimates in approximately 50 percent of the repetitions are less than 0.5, the actual value;
consequently the estimates in the remaining approximately 50 percent of the repetitions are greater than 0.5. The
chances that the estimated fraction will be less than the actual population fraction in one repeti-
tion equal the chances that the estimated fraction will be greater than the actual fraction.
To summarize, there are two important points to make about Clint’s poll:
Bad news: We cannot expect the estimated fraction from Clint’s poll, 0.75, to equal the actual
population fraction.
Good news: The estimation procedure that Clint used is unbiased. The mean of the estimated
fraction’s probability distribution equals the actual population fraction:
The estimation procedure does not systematically underestimate or overestimate the actual
population fraction. If the probability distribution is symmetric, the chances that the estimated
fraction will be less than the actual population fraction equal the chances that the estimated
fraction will be greater.
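The unbiasedness claim can also be checked outside the textbook's Opinion Poll simulation. The short Python sketch below is only a stand-in for that simulator: it assumes each respondent is an independent draw who supports Clint with probability ActFrac = 0.50 and that 16 individuals are polled in each repetition, then reports the mean and variance of the estimated fractions across many repetitions.

import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only

act_frac = 0.50       # actual population fraction supporting Clint
sample_size = 16      # individuals polled in each repetition
repetitions = 1_000_000

# Each repetition: poll sample_size individuals with replacement and record EstFrac.
supporters = rng.binomial(sample_size, act_frac, size=repetitions)
est_fracs = supporters / sample_size

print(est_fracs.mean())   # approximately 0.50: the average of the estimates equals ActFrac
print(est_fracs.var())    # approximately p(1 - p)/T = 0.25/16 = 0.015625

The first number illustrates the unbiasedness property; the second matches the variance formula p(1 − p)/T used throughout the chapter.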
3.2.4 Why Is the Variance of the Estimate’s Probability Distribution Important? Reliability of
Unbiased Estimation Procedures
We will use the polling simulation to illustrate the importance of the probability distribution’s
variance.
In addition to specifying the actual population fraction and the sample size, the simulation
includes the From–To lists.
As before, the actual population fraction equals 0.50 by default. Select 0.450 from the From list
and 0.550 from the To list. The simulation will now calculate the percent of the repetitions in
which the numerical value of the estimated fraction lies within the 0.450 to 0.550 interval. Since
we have specified the actual population fraction to be 0.50, the simulation will report the percent
of repetitions in which the numerical value of the estimated fraction lies within 0.05 of the actual
fraction. Initially a sample size of 25 is selected. Note that the Pause checkbox is cleared. Click
Start and then after many, many repetitions click Stop. Next consider sample sizes of 100 and
400. Table 3.2 reports the results for the three sample sizes:

Table 3.2
Opinion Poll From–To simulation results

Sample size            25       100      400
From–To percent        ≈39%     ≈69%     ≈95%

Figure 3.3
Histograms of estimated fraction numerical values
When the sample size is 25, the numerical value of the estimated fraction falls within 0.05 of
the actual population fraction in about 39 percent of the repetitions. When the sample size is
100, the numerical value of the estimated fraction falls within 0.05 of the actual population
fraction in about 69 percent of the repetitions. When the sample size is 400, the numerical value
of the estimated fraction falls within 0.05 of the actual population fraction in about 95 percent
of the repetitions (see figure 3.3).
The variance plays the key role here. On the one hand, when the variance is large, the distri-
bution is “spread out”; the numerical value of the estimated fraction falls within 0.05 of the
actual fraction relatively infrequently. On the other hand, when the variance is small, the distri-
bution is tightly “cropped” around the actual population fraction, 0.50; consequently the numeri-
cal value of the estimated fraction falls within 0.05 of the actual population fraction more
frequently.
We can now exploit the relative frequency interpretation of probability to obtain a quantitative
sense of how much confidence we should have in the results of a single opinion poll. We do so
by considering the following interval estimate question:
Interval estimate question: What is the probability that the numerical value of the estimated
fraction, EstFrac, from one repetition of the experiment lies within ___ of the actual population
fraction, ActFrac? ______
Since we are focusing on the interval from 0.450 to 0.550 and the actual population fraction is
specified as 0.50, we can enter 0.05 in the first blank:
Interval estimate question: What is the probability that the numerical value of the estimated
fraction, EstFrac, from one repetition of the experiment lies within 0.05 of the actual population
fraction, ActFrac? ______
Begin by focusing on a sample size of 25. In view of what we just learned from the simula-
tion, we can now answer the interval estimate question. After many, many repetitions of the
experiment, the numerical value of the estimated fraction falls within 0.05 of the actual value
about 39 percent of the time. Now apply the relative frequency interpretation of probability.
When the experiment is repeated many, many times, the relative frequency of each outcome
equals its probability. Consequently, when the sample size is 25, the probability that the numeri-
cal value of the estimated fraction in one repetition of the experiment falls within 0.05 of the
actual value is about 0.39. By the same logic, when the sample size is 100, the probability that
the numerical value of the estimated fraction in one repetition of the experiment will fall within
0.05 of the actual value is about 0.69. When the sample size is 400, the probability that the
numerical value of the estimated fraction in one repetition of the experiment will fall within 0.05
of the actual value is about 0.95 (see table 3.3 and figure 3.4).
As the sample size becomes larger, it becomes more likely that the estimated fraction resulting
from a single poll will be close to the actual population fraction. This is consistent with our
intuition, is it not? When more people are polled, we have more confidence that the estimated
fraction will be close to the actual value.
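The interval estimate question can also be answered by relative frequency in a few lines of Python. This is only a rough stand-in for the textbook's simulator: it models each poll as a binomial draw with ActFrac = 0.50, so, because the estimated fraction takes only a discrete grid of values, the exact percentages need not match table 3.3; the pattern, however, is the same, with the relative frequency of landing within 0.05 of the actual fraction rising toward 1 as the sample size grows.

import numpy as np

rng = np.random.default_rng(1)
act_frac = 0.50
repetitions = 1_000_000

for sample_size in (25, 100, 400):
    est_fracs = rng.binomial(sample_size, act_frac, size=repetitions) / sample_size
    # Relative frequency of repetitions with EstFrac within 0.05 of ActFrac
    within = np.mean(np.abs(est_fracs - act_frac) <= 0.05)
    print(sample_size, round(float(within), 2))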
We can now generalize what we just learned (shown in figure 3.5). When an estimation pro-
cedure is unbiased, the variance of the estimate’s probability distribution is important because
it determines the likelihood that the estimate will be close to the actual value. When the probabil-
ity distribution’s variance is large, it is unlikely that the estimated fraction from one poll will be
close to the actual population fraction; consequently the estimated fraction is an unreliable
estimate of the actual population fraction. However, when the probability distribution’s variance
is small, it is likely that the estimated fraction from one poll will be close to the actual population
fraction; in this case the estimated fraction is a reliable estimate of the actual population fraction.

Table 3.3
Interval estimate question and Opinion Poll simulation results

Sample size                                                        25        100       400
Probability that EstFrac lies within 0.05 of ActFrac              ≈0.39     ≈0.69     ≈0.95

Figure 3.4
Probability distribution of estimated fraction values

Figure 3.5
Probability distribution of estimated fraction values
You might have noticed that the distributions of the numerical values produced by the simulation
for the samples of 25, 100, and 400 look like bell-shaped curves. Although we do not provide
a proof, it can be shown that as the sample size increases, the distribution gradually approaches
what mathematicians and statisticians call the normal distribution. Formally, this result is known
as the Central Limit Theorem.
We will illustrate the Central Limit Theorem by using our Opinion Poll simulation. Again, let
the actual population fraction equal 0.50 and consider three different sample sizes: 25, 100, and
400. In each case we will use our simulation to calculate interval estimates for 1, 2, and 3 stan-
dard deviations around the mean.
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/25 = (1/4)/25 = 1/100
SD[EstFrac] = √Var[EstFrac] = √(1/100) = 1/10 = 0.10
When the sample size equals 25, the standard deviation is 0.10. Since the distribution mean
equals the actual population fraction, 0.50, 1 standard deviation around the mean would be from
0.400 to 0.600, 2 standard deviations from 0.300 to 0.700, and 3 standard deviations from 0.200
to 0.800. In each case specify the appropriate From–To values and be certain that the Pause
checkbox is cleared. Click Start, and then after many, many repetitions click Stop. The simula-
tion results are reported in table 3.4.
Table 3.4
Interval percentages for a sample size of 25

Interval: standard deviations
within the random variable's mean       From       To       Simulation: percent of repetitions within interval
Table 3.5
Interval percentages for a sample size of 100

Interval: standard deviations
within the random variable's mean       From       To       Simulation: percent of repetitions within interval
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/100 = (1/4)/100 = 1/400
SD[EstFrac] = √Var[EstFrac] = √(1/400) = 1/20 = 0.05
When the sample size equals 100, the standard deviation is 0.05. Since the distribution mean
equals the actual population fraction, 0.50, 1 standard deviation around the mean would be from
0.450 to 0.550, 2 standard deviations from 0.400 to 0.600, and 3 standard deviations from 0.350
to 0.650.
Table 3.6
Interval percentages for a sample size of 400

Interval: standard deviations
within the distribution mean            From       To       Simulation: percent of repetitions within interval
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/400 = (1/4)/400 = 1/1,600
SD[EstFrac] = √Var[EstFrac] = √(1/1,600) = 1/40 = 0.025
When the sample size equals 400, the standard deviation is 0.025. Since the distribution mean
equals the actual population fraction, 0.5, 1 standard deviation around the mean would be from
0.475 to 0.525; 2 standard deviations from 0.450 to 0.550, and 3 standard deviations from 0.425
to 0.575.
Let us summarize the simulation results in a single table (table 3.7). Clearly, standard devia-
tions play a crucial and consistent role here. Regardless of the sample size, approximately 68
or 69 percent of the repetitions fall within one standard deviation of the mean, approximately
95 or 96 percent within two standard deviations, and more than 99 percent within three. The
normal distribution exploits the key role played by standard deviations.
The normal distribution is a symmetric, bell-shaped curve with the midpoint of the bell occur-
ring at the distribution mean (figure 3.6). The total area lying beneath the curve is 1.0. As
mentioned before, it can be proved rigorously that as the sample size increases, the probability
distribution of the estimated fraction approaches the normal distribution. This fact allows us to
use the normal distribution to estimate probabilities for interval estimates.
Nearly every econometrics and statistics textbook includes a table that describes the normal
distribution. We will now learn how to use the table to estimate the probability that a random
variable will lie between any two values. The table is based on the “normalized value” of the
random variable. By convention, the normalized value is denoted by the letter z (figure 3.7):
z = (Value of the random variable − Distribution mean) / Distribution standard deviation

Table 3.7
Summary of interval percentages results, by sample size

Figure 3.6
Normal distribution

Figure 3.7
Normal distribution right-tail probabilities. The right-tail probability is the probability of being more than z standard deviations above the distribution mean.
In words, z tells us by how many standard deviations the value lies from the mean. If the value
of the random variable equals the mean, z equals 0.0; if the value is one standard deviation above
the mean, z equals 1.0; if the value is two standard deviations above the mean, z equals 2.0; and
so on.
The equation that describes the normal distribution is complicated (see appendix 3.1). Fortu-
nately, we can avoid using the equation because tables are available that describe the distribution.
The entire normal distribution table appears in appendix 3.1; an abbreviated portion appears in
table 3.8.
In the normal distribution table, the row specifies the z value's whole number and its tenths digit;
the column specifies its hundredths digit. The numbers within the body of the table estimate the
probability that the random variable lies more than z standard deviations above its mean.
Table 3.8
Right-tail probabilities for the normal distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
...
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
...
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
...
For purposes of illustration, suppose that we want to use the normal distribution to calculate the
probability that the estimated fraction from one repetition of the experiment would fall between
0.525 and 0.575 when the actual population fraction was 0.50 and the sample size was 100
(figure 3.8). We begin by calculating the probability distribution’s mean and standard
deviation:
Sample size = T = 100, Actual population fraction = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/100 = (1/4)/100 = 1/400
SD[EstFrac] = √Var[EstFrac] = √(1/400) = 1/20 = 0.05
To calculate the probability that the estimated fraction lies between 0.525 and 0.575, we first
calculate the z-values for 0.525 and 0.575; that is, we calculate the number of standard deviations
Figure 3.8
Interval estimate from 0.525 to 0.575: Prob[EstFrac between 0.525 and 0.575]
that 0.525 and 0.575 lie from the mean. Since the mean equals 0.500 and the standard deviation
equals 0.05,
• z-value for 0.525 equals 0.50:

z = (0.525 − 0.500)/0.05 = 0.025/0.05 = 0.50

0.525 lies one-half of a standard deviation above the mean.

• z-value for 0.575 equals 1.50:

z = (0.575 − 0.500)/0.05 = 0.075/0.05 = 1.50

0.575 lies one and a half standard deviations above the mean.
Next consider the right-tail probabilities for the normal distribution in table 3.9. When we use
this table we implicitly assume that the normal distribution accurately describes the estimated
fraction’s probability distribution. For the moment, assume that this is true. The entry corre-
sponding to z equaling 0.50 is 0.3085; this tells us that the probability that the estimated fraction
lies above 0.525 is 0.3085 (figure 3.9a).
Table 3.9
Selected right-tail probabilities for the normal distribution

z        0.00       0.01
0.5      0.3085     0.3050
1.5      0.0668     0.0655
The entry corresponding to z equaling 1.50 is 0.0668; this tells us that the probability that the
estimated fraction lies above 0.575 is 0.0668 (figure 3.9b).
It is now easy to calculate the probability that the estimated fraction will lie between 0.525
and 0.575: just subtract the probability that the estimated fraction will be greater than 0.575 from
the probability that the estimated fraction will be greater than 0.525:

Prob[EstFrac between 0.525 and 0.575] = 0.3085 − 0.0668 = 0.2417
With a sample size of 100, the probability that EstFrac will lie between 0.525 and 0.575 equals
0.2417. This, of course, assumes that the normal distribution describes EstFrac’s probability
distribution accurately.
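For readers who prefer to check the arithmetic in software, the short scipy calculation below reproduces the same answer; scipy is an assumption here, since the text itself works only from the printed right-tail table.

from scipy.stats import norm

mean, sd = 0.50, 0.05

# Right-tail probability above 0.525 minus right-tail probability above 0.575
prob = norm.sf(0.525, loc=mean, scale=sd) - norm.sf(0.575, loc=mean, scale=sd)
print(round(prob, 4))   # 0.2417, the same value obtained from the table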
To justify using the normal distribution to calculate the probabilities, reconsider our simulations
in which we calculated the percentages of repetitions that fall within one, two, and three standard
deviations of the mean after many, many repetitions. Now use the normal distribution to calculate
these percentages.
We can now calculate the probability of being within one, two, and three standard deviations
of the mean by reviewing two important properties of the normal distribution:
• The normal distribution is symmetric about its mean.
• The area beneath the normal distribution equals 1.0.
Figure 3.9a
Probability of EstFrac greater than 0.525: 0.3085

Figure 3.9b
Probability of EstFrac greater than 0.575: 0.0668
Table 3.10
Right-tail probabilities for the normal distribution

z         Right-tail probability
1.00      0.1587
2.00      0.0228
3.00      0.0013

Figure 3.10
Normal distribution calculations: Prob[within 1 SD] = 0.6826
We begin with the one standard deviation (SD) case. Table 3.10 reports that the right-hand tail
probability for z = 1.00 equals 0.1587:
Prob[more than 1 SD above the mean] = 0.1587

We will now use that to calculate the probability of being within one standard deviation of the
mean, as illustrated in figure 3.10:

• Since the normal distribution is symmetric, the probability of being more than one standard
deviation above the mean equals the probability of being more than one standard deviation below
the mean:

Prob[more than 1 SD below the mean] = Prob[more than 1 SD above the mean] = 0.1587

• Since the area beneath the normal distribution equals 1.0, the probability of being within one
standard deviation of the mean equals 1.0 less the sum of the probability of being more than
one standard deviation above the mean and the probability of being more than one standard
deviation below the mean:

Prob[within 1 SD] = 1.0 − (Prob[1 SD below] + Prob[1 SD above]) = 1.0 − (0.1587 + 0.1587) = 1.0 − 0.3174 = 0.6826
• Two standard deviations. As table 3.10 reports, the right-hand tail probability for z = 2.00
equals 0.0228; by symmetry, so does the left-hand tail probability:

Prob[within 2 SDs] = 1.0 − (Prob[2 SDs below] + Prob[2 SDs above]) = 1.0 − (0.0228 + 0.0228) = 1.0 − 0.0456 = 0.9544
• Three standard deviations. As table 3.10 reports, the right-hand tail probability for z = 3.00
equals 0.0013; by symmetry, so does the left-hand tail probability:

Prob[within 3 SDs] = 1.0 − (Prob[3 SDs below] + Prob[3 SDs above]) = 1.0 − (0.0013 + 0.0013) = 1.0 − 0.0026 = 0.9974
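These three calculations can also be verified directly; the sketch below uses scipy (again an assumption, not something the text relies on) and reproduces the figures above up to the rounding in the printed table.

from scipy.stats import norm

for k in (1, 2, 3):
    # Probability of being within k standard deviations of the mean:
    # 1 minus the two equal tail probabilities.
    within = 1.0 - 2.0 * norm.sf(k)
    print(k, round(float(within), 4))   # 0.6827, 0.9545, 0.9973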
Table 3.11 compares the percentages calculated from our simulations with the percentages that
would be predicted by the normal distribution.
Table 3.11 reveals that the normal distribution percentages are good approximations of the
simulation percentages. Furthermore, as the sample size increases, the percentages of repetitions
within each interval get closer and closer to the normal distribution percentages. This is pre-
cisely what the Central Limit Theorem states. We use the normal distribution to calculate interval
estimates because it provides estimates that are close to the actual values.
Table 3.12 illustrates what are sometimes called the normal distribution’s “rules of thumb.” In
round numbers, the probability of being within one standard deviation of the mean is 0.68, the
Table 3.11
Interval percentages results and normal distribution percentages

Interval: standard deviations       Simulation: percent of repetitions               Normal distribution
within the distribution mean        within interval (sample sizes 25, 100, 400)      percentages
Table 3.12
Normal distribution rules of thumb

Standard deviations within       Probability of being
the distribution mean            within the interval
1                                ≈0.68
2                                ≈0.95
3                                >0.99
probability of being within two standard deviations is 0.95, and the probability of being within
three standard deviations is more than 0.99.
Econometrician’s philosophy: If you lack the information to determine the value directly,
estimate the value to the best of your ability using the information you do have.
More specifically, Clint wrote the name of each student on a 3 × 5 card and repeated the following
procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
After conducting his poll, Clint learns that 12 of the 16 students polled support him. That is, the
estimated fraction of the population supporting Clint is 0.75:
Estimated fraction of the population supporting Clint: EstFrac = 12/16 = 3/4 = 0.75
Based on the results of the poll, it looks like Clint is ahead. But how confident should he be that
this is in fact true? We will address this question in the next chapter.
Chapter 3 Exercises
1. During the 1994 to 1995 academic year, the mean Math and Verbal SAT scores in Ohio were
515 and 460. The standard deviation for both scores was 100. Consider the following two
variables: the sum of a student's two scores, SatSum = SatMath + SatVerbal, and the difference, SatDiff = SatMath − SatVerbal.
d. Assume that the correlation coefficient for SatMath and SatVerbal equals 0.50. What is
the variance of
i. SatSum? _____
ii. SatDiff? _____
e. Assume that the correlation coefficient for SatMath and SatVerbal equals −0.50. What is
the variance of
i. SatSum? _____
ii. SatDiff? _____
f. Using your knowledge of the real world, which of the following do you find most likely?
That is, would you expect the correlation coefficient for SatMath and SatVerbal to be
0.0_____ 1.0_____ between 0.0 and 1.0_____ less than 0.0 _____ Explain.
2. Assume that the correlation coefficient for Math and Verbal SAT scores in Ohio is 0.5.
Suppose that an Ohio student is randomly chosen. What is the probability that his/her
a. SAT sum, SatSum, exceeds 1,000? ______
b. SAT Math and Verbal difference, SatDiff, exceeds 100? ______
Hint: Apply the normal distribution.
3. During the 1994 to 1995 academic year the mean Math SAT score for high school students
in Alaska was 489; in Michigan, the mean was 549. The standard deviation in both states equaled
100. A college admission officer must decide between one student from Alaska and one from
Michigan. Both students have taken the SAT, but the admission office has lost their scores. All
else being equal, the admission officer would like to admit the Alaska student for reasons of
geographic diversity, but he/she is a little concerned that the average math SAT score in Alaska
is lower.
a. Would knowledge of the Alaskan student’s Math SAT score help you predict the Michigan
student’s score, and vice versa?
b. Are the Alaskan student’s and Michigan student’s Math SAT scores independent?
c. The admission officer asks you to calculate the probability that the student from Michigan
has a higher score than the student from Alaska. Assuming that the applicants from each state
mirror that state’s Math SAT distribution, what is this probability?
4. The Wechsler Adult Intelligence Scale is a well-known IQ test. The test results are scaled so
that the mean score is 100 and the standard deviation is 15. There is no systematic difference
between the IQs of men and women. A dating service is matching male and female subscribers
whose IQs mirror the population as a whole.
a. What is the probability that a male subscriber will have an IQ exceeding 110? _____
b. What is the probability that a female subscriber will have an IQ exceeding 110? _____
c. Assume that the dating service does not account for IQ when matching its subscribers;
consequently the IQs of the men and women who are matched are independent. Consider a
couple that has been matched by the dating service. What is the probability that both the male
and female will have an IQ exceeding 110? _____
d. Suppose instead that the dating service does consider IQ; the service tends to match high
IQ men with high IQ women, and vice versa. Qualitatively, how would that affect your answer
to part c? _____
5. Consider the automobiles assembled at a particular auto plant. Even though the cars are the
same model and have the same engine size, they obtain slightly different gas mileages. Presently
the mean is 32 miles per gallon with a standard deviation of 4.
a. What portion of the cars obtains at least 30 miles per gallon? Hint: Apply the normal
distribution.
A rental car company has agreed to purchase several thousand cars from the plant. The contract
demands that at least 90 percent of the autos achieve at least 30 miles per gallon. Engineers
report that there are two ways in which the plant can be modified to achieve this goal:
Approach 1: Increase the mean miles per gallon leaving the standard deviation unaffected; the
cost of increasing the mean is $100,000 for each additional mile.
Approach 2: Decrease the standard deviation leaving the mean unaffected; the cost of decreas-
ing the standard deviation is $200,000 for each mile reduction.
b. If approach 1 is used to achieve the objective, by how much must the mean be increased?
c. If approach 2 is used to achieve the objective, by how much must the standard deviation
be decreased?
d. Assuming that the plant owner wishes to maximize profits, which approach should
be used?
6. Recall the game described in the problems for chapter 2 that you and your roommate played:
• Thoroughly shuffle your roommate's standard deck of fifty-two cards: 13 spades, 13 hearts,
13 diamonds, and 13 clubs.
• Draw one card.
• If the card drawn is red, you win $1 from your roommate; if the card drawn is black, you
lose $1.
• Replace the card drawn.
TNW equals your total net winnings after you played the game eighteen times:
TNW = v1 + v2 + . . . + v18
where vi = your net winnings from the ith repetition of the game. Recall that the mean of TNW’s
probability distribution equals 0 and the variance equals 18. Use the normal distribution to
estimate the probability of the following (one way to set up the calculation is sketched after part d):
a. winning something: TNW greater than 0. ______
b. winning more than $2: TNW greater than 2. ______
c. losing more than $6: TNW less than −6. ______
d. losing more than $12: TNW less than −12. ______
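One way to set up these calculations is sketched below; this is a sketch of the method only, not the unique intended solution. It treats TNW as approximately normal with mean 0 and standard deviation √18, as the hint suggests; TNW is actually a discrete random variable, so the resulting figures are approximations.

from math import sqrt
from scipy.stats import norm

mean, sd = 0.0, sqrt(18.0)

print(norm.sf(0, loc=mean, scale=sd))     # a. Prob[TNW greater than 0]
print(norm.sf(2, loc=mean, scale=sd))     # b. Prob[TNW greater than 2]
print(norm.cdf(-6, loc=mean, scale=sd))   # c. Prob[TNW less than -6]
print(norm.cdf(-12, loc=mean, scale=sd))  # d. Prob[TNW less than -12]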
7. Suppose that you and a friend each decide to play roulette fifty times, each time placing a
$1 bet. Focus on the total net winnings of you and your friend after you played the game fifty
times:
TNW = v1 + v2 + . . . + v50
where vi = Your net winnings from the ith spin of the roulette wheel.
a. You decide to always bet on the first set of twelve numbers.
i. Calculate the mean and variance of TNW’s probability distribution.
Using the normal distribution, estimate the probability that in net, you will
ii. win $10 or more. _____
iii. lose $10 or more. _____
b. Your friend decides to always bet on red.
i. Calculate the mean and variance of TNW’s probability distribution.
Using the normal distribution, estimate the probability that in net, he will
ii. win $10 or more. _____
iii. lose $10 or more. _____
c. A risk averse individual attempts to protect him/herself from losses. Who would be using
a more risk averse strategy, you or your friend?
Focus on the number of individuals polled. Let us do some “back of the envelope” calculations.
For the calculations, consider only two major candidates, Bush and Kerry, and assume that the
election is a tossup; that is,
ActFrac = p = 1/2 = 0.50
b. Compare the numbers in the table to the margins of error. What do you suspect that the
margin of error equals? ____________
Hint: Round off your “table numbers” to the nearest percent.
c. Recall that the polling procedure is unbiased. Using the normal distribution’s rules of
thumb interpret the margin of error.
Appendix 3.1

Figure 3.11
Right-tail probability for the normal distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
Normal distribution probability density function:

f(x) = [1 / (SD[x] √(2π))] e^(−(1/2) ((x − Mean[x]) / SD[x])²)
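A quick numerical check of this formula (the use of scipy is an assumption; the text never needs it): evaluate the density by hand at one point and compare it with a library implementation of the normal pdf.

from math import exp, pi, sqrt
from scipy.stats import norm

mean, sd, x = 0.50, 0.05, 0.525   # values from the worked example earlier in the chapter

# Density evaluated directly from the formula above
by_hand = (1.0 / (sd * sqrt(2.0 * pi))) * exp(-0.5 * ((x - mean) / sd) ** 2)
print(by_hand, norm.pdf(x, loc=mean, scale=sd))   # the two numbers agree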
Estimation Procedures, Estimates, and Hypothesis Testing
4
Chapter 4 Outline
2. After collecting evidence from a crime scene, the police identified a suspect. The suspect
provides the police with a statement claiming innocence. The district attorney is deciding
whether or not to charge the suspect with a crime. The district attorney asks a forensic expert
to examine the evidence and compare it to the suspect’s personal statement. After the expert
completes his/her work, the district attorney poses the following question to the expert:
Question: What is the probability that similar evidence would have arisen IF the suspect were
in fact innocent?
Initially, the forensic expert assesses this probability to be 0.50. A week later, however, more
evidence is uncovered and the expert revises the probability to 0.01. In light of the new evidence,
is it more or less likely that the suspect is telling the truth?
3. The police charge a seventeen-year-old male with a serious crime. History teaches us that no
evidence can ever prove that a defendant is guilty beyond all doubt. In this case, however, the
police do have strong evidence against the young man suggesting that he is guilty, although the
possibility that he is innocent cannot be completely ruled out. You have been impaneled on a
jury to decide this case. The judge instructs you and your fellow jurors to find the young man
guilty if you determine that he committed the crime “beyond a reasonable doubt.”
For each scenario, indicate whether the jury would be correct or incorrect.
b. Consider each scenario in which the jury errs. In each of these cases, what are the conse-
quences (the “costs”) of the error to the young man and/or to society?
4. Suppose that two baseball teams, Team RS and Team Y, have played 185 games against each
other in the last decade. Consider the following statement made by Mac Carver, a self-described
baseball authority:
Carver’s view: “Over the last decade, Team RS and Team Y have been equally strong.”
Now consider two hypothetical scenarios:
We will now return to Clint’s dilemma. The election is tomorrow and Clint must decide whether
or not to hold a pre-election beer tap rally designed to entice more students to vote for him. On
the one hand, if Clint is comfortably ahead, he could save his money by not holding the beer
tap rally. On the other hand, if the election is close, the beer tap rally could prove critical. Ideally
Clint would like to poll each member of the student body, but time does not permit this. Con-
sequently Clint decides to conduct an opinion poll by selecting 16 students at random. Clint
adopts the philosophy of econometricians:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint wrote the name of each student on a 3 × 5 card and repeated the following procedure 16
times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
Twelve of the 16 students polled support Clint. That is, the estimated fraction of the population
supporting him is 0.75:
Estimated fraction of population supporting Clint: EstFrac = 12/16 = 3/4 = 0.75
Based on the results of the poll, it looks like Clint is ahead. But how confident should Clint be
that he is in fact ahead? Clint faces a dilemma:
Clint’s dilemma: Should Clint be confident that he has the election in hand and save his funds
or should he finance the beer tap rally?
Our project is to use the poll to help Clint resolve his dilemma:
Our Opinion Poll simulation taught us that while the numerical value of the estimated fraction
from one poll could equal the actual population fraction, it typically does not. The simulations
showed that in most cases the estimated fraction will be either greater than or less than the actual
population fraction. Accordingly Clint must accept the fact that the actual population fraction
probably does not equal 0.75. So Clint faces a crucial question:
Crucial question: How much confidence should Clint have in his estimate? More to the point,
how confident should Clint be in concluding that he is actually leading?
To address the confidence issue, it is important to distinguish between the general properties of
Clint’s estimation procedure and the one specific application of that procedure, the poll Clint
conducted.
4.1.3 Taking Stock and Our Strategy to Assess the Reliability of Clint’s Poll Results
Let us briefly review what we have done thus far. We have laid the groundwork required to
assess the reliability of Clint’s poll results by focusing on what we know before the poll is
conducted; that is, we have focused on the general properties of the estimation procedure, the
probability distribution of the estimate. In chapter 3 we derived the general equations for the
mean and variance of the estimated fraction’s probability distribution algebraically and then
checked our algebra by exploiting the relative frequency interpretation of probability in our
Opinion Poll simulation:

Mean[EstFrac] = ActFrac = p,  Var[EstFrac] = p(1 − p)/T

where p denotes the actual population fraction and T the sample size.
Let us review the importance of the mean and variance of the estimated fraction’s probability
distribution.
Clint’s estimation procedure is unbiased because the mean of the estimated fraction’s probability
distribution equals the actual fraction of the population supporting Clint (figure 4.1):

Mean[EstFrac] = ActFrac
His estimation procedure does not systematically underestimate or overestimate the actual value.
If the probability distribution is symmetric, the chances that the estimated fraction will be too
high in one poll equal the chances that it will be too low.
We used our Opinion Poll simulation to illustrate the unbiased nature of Clint’s estimation
procedure by exploiting the relative frequency interpretation of probability. After the experiment
is repeated many, many times, the average of the estimates obtained from each repetition of the
experiment equals the actual fraction of the population supporting Clint.

Figure 4.1
Probability distribution of EstFrac, estimated fraction values—Importance of the mean
4.1.5 Importance of the Variance (Spread) of the Estimate’s Probability Distribution for an
Unbiased Estimation Procedure
How confident should Clint be that his estimate is close to the actual population fraction? Since
the estimation procedure is unbiased, the answer to this question depends on the variance of the
estimated fraction’s probability distribution (see figure 4.2). As the variance decreases, the likeli-
hood of the estimate being “close to” the actual value increases; that is, as the variance decreases,
the estimate becomes more reliable.
Figure 4.2
Probability distribution of EstFrac, estimated fraction values—Importance of variance. When the variance is large, there is only a small probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac; the estimate is unreliable. When the variance is small, there is a large probability that EstFrac from one repetition will be close to ActFrac; the estimate is reliable.
Now we will apply what we have learned about the estimate’s probability distribution, the esti-
mation procedure’s general properties, to assess how confident Clint should be in concluding
that he is ahead.
The results, published in the prestigious scientific magazine Nature . . . showed a match between Jefferson
and Eston Hemings, Sally’s last child. The chances of such a match occurring randomly are less than one
in a thousand.
We will motivate the rationale behind hypothesis testing by considering a cynical view.
Cynic’s view: Despite the poll results, the election is actually a toss-up.
Could the cynic be correct? Actually we have already shown that the cynic could be correct
when we introduced our Opinion Poll simulation. Nevertheless, we will do so again for
emphasis.
The Opinion Poll simulation clearly shows that 12 or even more of the 16 students selected
could support Clint in a single poll when the election is a toss-up. Accordingly we cannot simply
dismiss the cynic’s view as nonsense. We must take the cynic seriously. To assess his view, we
pose the following question. It asks how likely it would be to obtain a result like the one that
actually occurred if the cynic is correct.
Question for the cynic: What is the probability that the result from a single poll would be like
the one actually obtained (or even stronger), if the cynic is correct and the election is a
toss-up?
More specifically,
Question for the cynic: What is the probability that the estimated fraction supporting Clint
would equal 0.75 or more in one poll of 16 individuals, if the cynic is correct (i.e., if the election
is actually a toss-up and the fraction of the actual population supporting Clint equals 0.50)?
Prob[Results IF cynic correct] = Probability that the result from a single poll would be like the one actually obtained (or even stronger), IF the cynic is correct (if the election is a toss-up)
When the probability is small, it would be unlikely that the election is a toss-up, and hence we
could be confident that Clint actually leads. When the probability is large, it is likely that the
election is a toss-up even though the poll suggests that Clint leads:
Assessing the Cynic’s View Using the Normal Distribution: Prob[Results IF cynic correct]
How can we answer the question for the cynic? That is, how can we calculate this probability,
Prob[Results IF cynic correct]? To understand how, recall Clint’s estimation procedure, his poll:
Write the names of every individual in the population on a separate card, then perform the
following procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
• Calculate the fraction of those polled supporting Clint.
If the cynic is correct and the election is a toss-up, the actual fraction of the population support-
ing Clint would equal 1/2 or 0.50. Based on this premise, apply the equations we derived to
calculate the mean and variance of the estimated fraction’s probability distribution:
Sample size = T = 16, Actual population fraction = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/16 = (1/4)/16 = 1/64
SD[EstFrac] = √Var[EstFrac] = √(1/64) = 1/8 = 0.125
Since the standard deviation is 0.125, the result of Clint’s poll, 0.75, is two standard deviations
above the mean, 0.50 (figure 4.3).
Next recall the normal distribution’s rules of thumb (as listed in table 4.1).
The rules of thumb tell us that the probability of being within two standard deviations of the
random variable’s mean is approximately 0.95. Recall that the area beneath the normal distribu-
tion equals 1.00. Since the normal distribution is symmetric, the probability of being more than
two standard deviations above the mean is 0.025 as shown in figure 4.3:
Figure 4.3
Probability distribution of EstFrac—Calculating Prob[Results IF cynic correct]. The probability of lying within two standard deviations of the mean, between 0.25 and 0.75, is approximately 0.95; the probability of lying above 0.75 is 0.025.
Table 4.1
Normal distribution rules of thumb

Standard deviations within       Probability of being
the distribution mean            within the interval
1                                ≈0.68
2                                ≈0.95
3                                >0.99
If the cynic is actually correct (if the election is actually a toss-up), the probability that the frac-
tion supporting Clint would equal 0.75 or more in one poll of 16 individuals equals 0.025, that
is, 1 chance in 40. Clint must now make a decision. He must decide whether or not he is willing
to live with the odds of a 1 in 40 chance that the election is actually a toss-up. If he is willing
to do so, he will not fund the beer tap rally; otherwise, he will.
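As a cross-check of the 1-in-40 figure (this is an addition, not the text's method): under the assumption that each respondent is an independent draw who supports Clint with probability 0.50, the exact binomial probability of 12 or more supporters out of 16 can be computed alongside the normal right-tail probability. Because EstFrac is discrete and the normal curve is only an approximation, the two numbers differ somewhat, but both point to the same qualitative conclusion.

from scipy.stats import binom, norm

# Exact binomial: Prob[12 or more of 16 respondents support Clint | ActFrac = 0.50]
exact = binom.sf(11, 16, 0.5)

# Normal approximation: Prob[more than 2 standard deviations above the mean]
approx = norm.sf(2.0)

print(round(float(exact), 4), round(float(approx), 4))   # roughly 0.038 and 0.023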
The following five steps describe how we can formalize hypothesis testing.
Step 1: Collect evidence.

Clint polls 16 students selected randomly; 12 of the 16 support him. The estimated fraction
of the population supporting Clint is 0.75 or 75 percent:
EstFrac = 12/16 = 3/4 = 0.75
Critical result: 75 percent of those polled support Clint. This evidence, the fact that more than
half of those polled support him, suggests that Clint is ahead.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results the election is actually a toss-up; that is, the actual fraction
of the population supporting Clint is 0.50.
The null hypothesis adopts the cynical view by challenging the evidence; the cynic always chal-
lenges the evidence. By convention, the null hypothesis is denoted as H0. The alternative hypoth-
esis is consistent with the evidence; the alternative hypothesis is denoted as H1.
H0: ActFrac = 0.50 ⇒ Election is a toss-up; cynic is correct and the evidence is misleading
H1: ActFrac > 0.50 ⇒ Clint leads; cynic is incorrect and the evidence is correct
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
Question: What is the probability that the result from a single poll would be like the one actually obtained (or even stronger), if H0 were true? This probability is denoted Prob[Results IF H0 true].

The magnitude of this probability determines whether we reject or do not reject the null hypoth-
esis; that is, the magnitude of this probability determines the likelihood that the cynic is correct
and H0 is true:
1. Traditionally this probability is called the p-value. We will use the more descriptive term, however, to emphasize
what it actually represents. Nevertheless, you should be aware that this probability is typically called the p-value.
Step 4: Use the general properties of the estimation procedure, the estimated fraction’s prob-
ability distribution, to calculate Prob[Results IF H0 true].
Prob[Results IF H0 true] equals the probability that 0.75 or more of the 16 individuals polled
would support Clint if H0 is true (if the cynic is correct and the actual population fraction actu-
ally equaled 0.50); more concisely,

Prob[Results IF H0 true] = Prob[EstFrac ≥ 0.75 IF ActFrac = 0.50]
We will use the normal distribution to compute this probability. First calculate the mean and
variance of the estimated fraction’s probability distribution based on the premise that the null
hypothesis is true; that is, calculate the mean and variance based on the premise that the actual
fraction of the population supporting Clint is 0.50:
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/16 = (1/4)/16 = 1/64
SD[EstFrac] = √Var[EstFrac] = √(1/64) = 1/8 = 0.125
Recall that z equals the number of standard deviations that the value lies from the mean:
z = (Value of random variable − Distribution mean) / Distribution standard deviation
The value of the random variable equals 0.75 (from Clint’s poll); the mean equals 0.50, and the
standard deviation 0.125:
z = (0.75 − 0.50)/0.125 = 0.25/0.125 = 2.00
Next consider the table of right-tail probabilities for the normal distribution. Table 4.2, an abbre-
viated form of the normal distribution table, provides the probability (see also figure 4.4):
Table 4.2
Selected right-tail probabilities for the normal distribution

z        0.00       0.01
2.0      0.0228     0.0222
Figure 4.4
Probability distribution of EstFrac—Calculating Prob[Results IF H0 true]. The right-tail probability above 0.75, two standard deviations above the mean of 0.50, equals 0.0228.
Prob[Results IF H0 true] = 0.0228
Clint must now decide whether he considers a probability of 0.0228 to be small or large. The
significance level is the dividing line between the probability being small and the probability
being large. The significance level Clint chooses implicitly establishes his standard of proof;
that is, the significance level establishes what constitutes “proof beyond a reasonable doubt.”
If the Prob[Results IF H0 true] is less than the significance level Clint adopts, he would judge
the probability to be “small.” Clint would conclude that it is unlikely for the null hypothesis to
be true, unlikely that the election is a tossup. He would consider the poll results in which 75
percent of those polled support him to be “proof beyond a reasonable doubt” that he is leading.
If instead the probability exceeds Clint’s significance level, he would judge the probability to
133 Estimation Procedures, Estimates, and Hypothesis Testing
be large. Clint would conclude that it is likely for the null hypothesis to be true, likely that the
election is a toss-up. In this case he would consider the poll results as not constituting “proof
beyond a reasonable doubt.”
Prob[Results IF H0 true] = 0.0228

Now consider two different significance levels that are often used in academe: 5 percent and 1
percent.

Figure 4.5
Significance levels and Clint's election: the significance level is the dividing line between a small probability and a large probability
If Clint adopts a 5 percent significance level, he would reject the null hypothesis; Clint
would conclude that he leads and would not fund the beer tap rally. If instead he adopts a 1
percent significance level, he would not reject the null hypothesis; Clint would not conclude that he
is leading and so would fund the beer tap rally. A 1 percent significance level con-
stitutes a higher standard of proof than a 5 percent significance level; a lower significance level
makes it more difficult for Clint to conclude that he is leading (figure 4.5).
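The five-step logic can be summarized in a short function. This is only a sketch (scipy and the function name are assumptions, not part of the text): compute the probability of results like those observed if the null hypothesis were true, then compare it with the chosen significance level.

from math import sqrt
from scipy.stats import norm

def prob_results_if_h0_true(est_frac, h0_frac, sample_size):
    """Right-tail probability of observing est_frac or more when ActFrac = h0_frac."""
    sd = sqrt(h0_frac * (1.0 - h0_frac) / sample_size)
    z = (est_frac - h0_frac) / sd
    return norm.sf(z)

p = prob_results_if_h0_true(est_frac=0.75, h0_frac=0.50, sample_size=16)
print(round(float(p), 4))   # 0.0228
print(p < 0.05)             # True: reject H0 at the 5 percent significance level
print(p < 0.01)             # False: do not reject H0 at the 1 percent significance level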
Now let us generalize. The significance level is the dividing line between what we consider
a small and large probability:
As we reduce the significance level, we make it more difficult to reject the null hypothesis; we
make it more difficult to conclude that Clint is leading. Consequently the significance level and
standard of proof are intimately related; as we reduce the significance level, we are implicitly
adopting a higher standard of proof:
What is the appropriate standard of proof for Clint? That is, what significance level should
he use? There is no definitive answer; only Clint can decide. The significance level Clint
chooses, his standard of proof, depends on a number of factors. In part, it depends on the impor-
tance he attaches to winning the election. If he attaches great importance to winning, he would
set a very low significance level, making it difficult to reject the null hypothesis. In this case he
would be setting a very high standard of proof; much proof would be required for him to reject
the notion that the election is a toss-up. Also Clint’s choice would depend on how “paranoid”
he is. If Clint is a “worrywart” who always focuses on the negative, he would no doubt adopt a
low significance level. He would require a very high standard of proof before concluding that
he is leading. On the other hand, if Clint is a carefree optimist, he would adopt a higher signifi-
cance level and thus a lower standard of proof.
Traditionally significance levels of 1 percent, 5 percent, and 10 percent are used in academic
papers. It is important to note, however, that there is nothing “sacred” about any of these per-
centages. There is no mechanical way to decide on the appropriate significance level. We can
nevertheless address the general factors that should be considered. We will use a legal example
to illustrate this point.
Suppose that the police charge a seventeen-year-old male with a serious crime. Strong evi-
dence against him exists. The evidence suggests that he is guilty. But a word of caution is now
in order; no evidence can ever prove guilt beyond all doubt. Even confessions do not provide
indisputable evidence. There are many examples of an individual confessing to a crime that he/
she did not commit.
Again, let us play the cynic. The cynic always challenges the evidence:
Cynic’s view: Sure, there is evidence suggesting that the young man is guilty, but the evidence
results from the “luck of the draw.” The evidence is just coincidental. In fact the young man is
innocent.
The null hypothesis, H0, reflects the cynic’s view. We cannot simply dismiss the null hypothesis
as crazy. Many individuals have been convicted on strong evidence when they were actually
innocent. Every few weeks we hear about someone who, after being convicted years ago, was
released from prison as a consequence of DNA evidence indicating that he/she could not have
been guilty of the crime.
Now suppose that you are a juror charged with deciding the fate of the young man. Criminal
trials in the United States require the prosecution to prove that the defendant is guilty “beyond
a reasonable doubt.” The judge instructs you to find the defendant guilty if you believe the
evidence meets the “beyond the reasonable doubt” criterion. You and your fellow jurors must
now decide what constitutes “proof beyond a reasonable doubt.” To help you make this decision,
we will make two sets of observations. We will first express each in simple English and then
“translate” the English into “hypothesis-testing language”; in doing so, remember the null
hypothesis asserts that the defendant is innocent:
• Type I error: Jury finds the defendant guilty when he is actually innocent; in terms of
hypothesis-testing language, the jury rejects the null hypothesis when the null hypothesis is
actually true.

Cost of type I error: Type I error means that an innocent young man is incarcerated; this is a
cost incurred not only by the young man, but also by society.
• Type II error: Jury finds the defendant innocent when he is actually guilty; in terms of
hypothesis-testing language, the jury does not reject the null hypothesis when the null hypothesis
is actually false.
Cost of type II error: Type II error means that a criminal is set free; this can be costly to society
because the criminal is free to continue his life of crime.
Table 4.3
Four possible scenarios
Table 4.4
Costs of type I and type II errors
Question: Suppose that the prosecutor decides to try the seventeen-year-old as an adult rather
than a juvenile. How should the jury’s standard of proof be affected?
In this case the costs of incarcerating an innocent man (type I error) would increase because the
conditions in a prison are more severe than the conditions in a juvenile detention center. Since
the costs of incarcerating an innocent man (type I error) are greater, the jury should demand a
higher standard of proof, thereby making a conviction more difficult:
Now review the relationship between the significance level and the standard of proof; a lower
significance level results in a higher standard of proof:
Figure 4.6
Significance levels and the standard of proof
To make it more difficult to reject the null hypothesis, to demand a higher standard of proof,
the jury should adopt a lower significance level:
The choice of the significance level involves trade-offs, a “tightrope act,” in which we balance
the relative costs of type I and type II error (see figure 4.6). There is no automatic, mechanical
way to determine the appropriate significance level. It depends on the circumstances.
Chapter 4 Exercises
The results, published in the prestigious scientific magazine Nature . . . showed a match between Jefferson
and Eston Hemings, Sally’s last child. The chances of such a match occurring randomly are less than one
in a thousand.
The DNA evidence suggests that a relationship existed between Thomas Jefferson and Sally
Hemings.
a. Play the cynic. What is the cynic’s view?
b. Formulate the null and alternative hypotheses.
c. What does Prob[Results IF H0 true] equal?
3. During 2003 the Texas legislature was embroiled in a partisan dispute to redraw the state’s
US congressional districts. Texas Republicans charged that the districts were drawn unfairly so
as to increase the number of Texas Democrats sent to the US House of Representatives. The
Republican position was based on a comparison of the statewide popular vote for House candi-
dates in the 2002 election and the number of Democratic and Republican congressmen who
were elected:
2002 statewide vote for Congress (total of all votes in the 32 Texas congressional
districts)
Democratic votes 1,885,178
Republican votes 2,290,723
2002 Representatives elected
Democratic representatives 17
Republican representatives 15
a. What is the fraction of voters statewide who cast ballots for a Democratic candidate? Call
this fraction DemVoterFrac:
DemVoterFrac = ________
To assess the cynic’s view, consider the following experiment: First write the names of each
citizen who voted in the Texas election on a card along with the party for whom he/she voted.
1,885,178 of these cards have the name of a Democratic voter and 2,290,723 have the name of
a Republican voter. Repeat the following 32 times:

• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemCongressFrac.
There is no “unfair districting” present in this experiment; that is, there is no gerrymandering
present. Every Texas voter has an equal chance of being chosen. Consequently any discrepancy
between the portion of voters who are Democrats and DemCongressFrac is just a random occur-
rence as the cynic contends.
H0: _____________________________________________________________
H1: _____________________________________________________________
4. The Electoral College became especially controversial after the 2000 presidential election
when Al Gore won the popular vote but lost the Electoral vote to George W. Bush.
To assess the cynic’s view, suppose that the following experiment was used to determine the
makeup of the Electoral College: First write the names of each citizen who voted in the 2000
presidential election on a card along with the party for whom he/she voted. Repeat the following
537 times:

• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemElectColFrac.
There is no unfairness present in this experiment; that is, every voter has an equal chance of
being chosen for the Electoral College. Consequently any discrepancy between the portion of
voters who are Democrats and DemElectColFrac is just a random occurrence as the cynic
contends.
e. Formulate the null and alternative hypotheses.
H0: ________________________________________________________________
H1: ________________________________________________________________
a. What fraction of the popular vote was cast for the Democratic candidate? Call this fraction
DemVoterFrac:
DemVoterFrac = _______
b. What fraction of the Electoral votes was cast for the Democratic candidate?
c. Do your answers to parts a and b suggest, at least the possibility of, Electoral College
unfairness? If so, which party, Democratic or Republican, appears to be favored?
d. Play the cynic. What is the cynic’s view?
To assess the cynic’s view, suppose that the following experiment was used to determine the
makeup of the Electoral College: First write the names of each citizen who voted in the 2008
Presidential election on a card along with the party for whom he/she voted. Repeat the following
537 times:
• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemElectColFrac.
There is no unfairness present in this experiment; that is, every voter has an equal chance of
being chosen for the Electoral College. Consequently any discrepancy between the portion of
voters who are Democrats and DemElectColFrac is just a random occurrence as the cynic
contends.
H0: ____________________________________________________________
H1: ____________________________________________________________
TNW equals your total net winnings after you played the game eighteen times:
TNW = v1 + v2 + . . . + v18
where vi = your net winnings from the ith repetition of the game. Recall that the mean of TNW’s
probability distribution equals 0 and the variance equals 18 when the game is played 18 times.
After you finish playing the game eighteen times, you won three times and your roommate
won fifteen times; you have lost a total of $12, your TNW equals −12.
a. Considering your losses, might you be a little suspicious that your roommate’s deck of
cards might not be a standard deck containing 26 red cards and 26 black cards? Explain why
or why not.
b. Play the cynic. What is the cynic’s view?
c. Formulate the null and alternative hypotheses. Express Prob[Results IF H0 true] in words
and in terms of TNW.
d. What does Prob[Results IF H0 true] equal?
7. Recall the game of roulette that we described in the problems of chapters 2 and 3. While
playing roulette, you notice that the girlfriend of the casino’s manager is also playing roulette.
She always bets $1 on the first set of twelve numbers. You observe that after she has played fifty
times, she has won 35 times and lost 15 times; that is, in net she has won $20, her TNW equals
20. Recall that the mean of TNW’s probability distribution equals −1.35 and the variance equals
98.60 when someone bets on the first set of twelve numbers for fifty spins of the wheel.
a. Considering her winnings, might you be a little suspicious that everything was on the “up
and up”? Explain why or why not.
b. Play the cynic. What is the cynic’s view?
c. Formulate the null and alternative hypotheses. Express Prob[Results IF H0 true] in words
and in terms of TNW.
d. What does Prob[Results IF H0 true] equal?
Ordinary Least Squares Estimation Procedure—The Mechanics
5
Chapter 5 Prep Questions
1. The following table reports the (disposable) income earned by Americans and their total
savings between 1950 and 1975 in billions of dollars:
a. Construct a scatter diagram for income and savings. Place income on the horizontal axis
and savings on the vertical axis.
b. Economic theory teaches that savings increases with income. Do these data tend to support
this theory?
c. Using a ruler, draw a straight line through these points to estimate the relationship between
savings and income. What equation describes this line?
d. Using the equation, estimate by how much savings will increase if income increases by
$1 billion.
2. Three students are enrolled in Professor Jeff Lord’s 8:30 am class. Every week, he gives a
short quiz. After returning the quiz, Professor Lord asks his students to report the number of
minutes they studied; the students always respond honestly. The minutes studied and the quiz
scores for the first quiz appear in the table below:1
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
1. NB: These data are not “real.” Instead, they were constructed to illustrate important pedagogical points.
a. Construct a scatter diagram for minutes studied and quiz scores. Place minutes on the horizontal
axis and score on the vertical axis.
b. Ever since first grade, what have your parents and teachers been telling you about the
relationship between studying and grades? For the most part, do these data tend to support
this theory?
c. Using a ruler, draw a straight line through these points to estimate the relationship between
minutes studied and quiz scores. What equation describes this line?
d. Using the equation, estimate by how much a student’s quiz score would increase if that
student studies one additional minute.
3. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
4. What is the relative frequency interpretation of probability?
5. Calculus problem: Consider the following equation:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²

Differentiate SSR with respect to bConst and set the derivative equal to 0:

dSSR/dbConst = 0

Show that this implies

bConst = ȳ − bx x̄

where

ȳ = (y1 + y2 + y3)/3    and    x̄ = (x1 + x2 + x3)/3

Next, reconsider the equation for SSR:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²

Let bConst = ȳ − bx x̄. Substitute this expression for bConst into the equation for SSR. Show that after the substitution:

SSR = [(y1 − ȳ) − bx(x1 − x̄)]² + [(y2 − ȳ) − bx(x2 − x̄)]² + [(y3 − ȳ) − bx(x3 − x̄)]²
Table 5.1
US annual income and savings data, 1950 to 1975
Recall the income and savings data we introduced in the chapter preview questions. Annual time
series data of US disposable income and savings from 1950 to 1975 are shown in table 5.1.
Economic theory suggests that as American households earn more income, they will save more.
The data appear to support the theory: as income increases, savings generally increase.
Question: How can we estimate the relationship between income and savings?
Answer: Draw a line through the points that best fits the data; then use the equation for the
best fitting line to estimate the relationship (figure 5.2).2
2. In reality this example exhibits a time series phenomenon requiring the use of sophisticated techniques beyond the
scope of an introductory textbook. Nevertheless, it does provide a clear way to motivate the notion of a best fitting line.
Consequently this example is a useful pedagogical tool even though more advanced statistical techniques are required
to analyze the data properly.
Figure 5.1
Income and savings scatter diagram (income on the horizontal axis, savings on the vertical axis; the observations run from 1950 to 1975)
Figure 5.2
Income and savings scatter diagram with best fitting line
By choosing two points on this line, we can solve for the equation of the best fitting line. It
looks like the points (200, 15) and (1200, 155) are more or less on the line. Let us use these two
points to estimate the slope:
Slope = Rise/Run = (155 − 15)/(1200 − 200) = 140/1000 = 0.14

Using the point (200, 15) and the slope:

y − 15 = 0.14(x − 200) = 0.14x − 28
y = 0.14x − 13
This equation suggests that if Americans earn an additional $1 of income, savings will rise by
an estimated $0.14; or equivalently, we estimate that a $1,000 increase in income causes a $140
increase in savings. Since the slope is positive, the data appear to support our theory; additional
income appears to increase savings.
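For readers who want to check this two-point arithmetic with software, here is a minimal Python sketch (the coordinates are the two points read off the figure above; the variable names are ours):

# Fit a line through the two points (200, 15) and (1200, 155) read off figure 5.2
x1, y1 = 200, 15
x2, y2 = 1200, 155
slope = (y2 - y1) / (x2 - x1)         # rise over run = 140/1000 = 0.14
intercept = y1 - slope * x1           # 15 - 0.14*200 = -13
print(slope, intercept)               # 0.14 -13.0
print(f"y = {intercept} + {slope}x")  # y = -13.0 + 0.14x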
Consider a second example. Three students are enrolled in Professor Jeff Lord’s 8:30 am class.
Every week, he gives a short quiz. After returning the quiz, Professor Lord asks his students to
report the number of minutes they studied; the students always respond honestly. The minutes
studied and the quiz scores for the first quiz appear in table 5.2.
The theory suggests that a student’s score on the quiz depends on the number of minutes he/
she studied:
Also it is generally believed that Professor Lord, a very generous soul, awards students some
points just for showing up for a quiz so early in the morning. Our friend Clint has been assigned
the problem of assessing the theory. Clint’s assignment is to use the data from Professor Lord’s
first quiz to assess the theory:
Table 5.2
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz
scores.
The following equation allows us to use the simple regression model to assess the theory:
yt = βConst + βxxt + et
where
yt, the quiz score, is called the dependent variable and xt, the minutes studied, the explanatory
variable. The value of the dependent variable depends on the value of the explanatory variable.
Or, putting it differently, the value of the explanatory variable explains the value of the dependent
variable.
βConst and βx, the constant and coefficient of the equation, are called the parameters of the
model. To interpret the parameters recall the following:
• It is generally believed that Professor Lord gives students some points just for showing up for
the quiz.
• The theory postulates that studying more will improve a student’s score.
Using these observations, we can interpret the parameters, βConst and βx:
• βConst represents the number of points Professor Lord gives students just for showing up.
• βx represents the number of additional points earned for an additional minute of studying.
et is the error term. The error term reflects all the random influences on student t’s quiz score,
yt. For example, if, on the one hand, Professor Lord were in an unusually bad humor when he
graded one student’s quiz, that student’s quiz score might be unusually low; this would be
reflected by a negative error term. If, on the other hand, Professor Lord were in an unusually
good humor, the student’s score might be unusually high and a positive error term would result.
Professor Lord’s disposition is not the only source of randomness. For example, a particular
student could have just “lucked out” by correctly anticipating the questions Professor Lord asked.
In this case the student’s score would be unusually high, his/her error term would be positive.
All such random influences are accounted for by the error term. The error term accounts for all
the factors that cannot be determined or anticipated beforehand.
The word simple is used to describe the model because the model includes only a single explana-
tory variable. Obviously many other factors influence a student’s quiz score; the number of
minutes studied is only one factor. However, we must start somewhere. We will begin with the
simple regression model. Later we will move on and introduce multiple regression models to
analyze more realistic scenarios in which two or more explanatory variables are used to explain
the dependent variable.
Question: How can Clint use the data to assess the effect of studying on quiz scores?
Answer: He begins by drawing a scatter diagram using the data appearing in table 5.2 (plotted
in figure 5.3).
The data appear to confirm the “theory.” As minutes studied increase, quiz scores tend to
increase.
Question: How can Clint estimate the relationship between minutes studied and the quiz score
more precisely?
Answer: Draw a line through the points that best fits the data; then use the best fitting line’s
equation to estimate the relationship.
Clint’s effort to “eyeball” the best fitting line appears in figure 5.4. By choosing two points
on this line, Clint can solve for the equation of his best fitting line. It looks like the points (0,
60) and (20, 90) are more or less on the line. He can use these two points to estimate the slope:
Slope = Rise/Run = (90 − 60)/(20 − 0) = 30/20 = 1.5
Next Clint can use a little algebra to derive the equation for the line:
(y − 60)/(x − 0) = 1.5
y − 60 = 1.5x
y = 60 + 1.5x
Figure 5.3
Minutes and scores scatter diagram (minutes studied on the horizontal axis, quiz score on the vertical axis)
This equation suggests that an additional minute of studying increases a student’s score by 1.5
points.
Let us compare the two examples we introduced. In the income–savings case, the points were
clustered tightly around our best fitting line (figure 5.2). Two individuals might not “eyeball”
the identical “best fitting line,” but the difference would be slight. In the minutes–scores case,
however, the points are not clustered nearly so tightly (figure 5.3). Two individuals could
“eyeball” the “best fitting line” very differently; therefore two individuals could derive substan-
tially different equations for the best fitting line and would then report very different
estimates of the effect that studying has on quiz scores. Consequently we need a systematic
procedure to determine the best fitting line. Furthermore, once we determine the best fitting line,
we need to decide how confident we should be in the theory. We will now address two issues:
Figure 5.4
Minutes and scores scatter diagram with Clint’s eyeballed best fitting line
• What systematic procedure should we use to determine the best fitting line for the data?
• In view of the best fitting line, how much confidence should we have in the theory’s
validity?
The ordinary least squares (OLS) estimation procedure is the most widely used estimation
procedure to determine the equation for the line that “best fits” the data. Its popularity results
from two factors:
• The procedure is computationally straightforward; it provides us (and computer software) with
a relatively easy way to estimate the regression model’s parameters, the constant and slope of
the best fitting line.
• The procedure possesses several desirable properties when the error term meets certain
conditions.
This chapter focuses on the computational aspects of the ordinary least squares (OLS) estimation
procedure. In chapter 6 we turn to the properties of the estimation procedure.
We begin our study of the ordinary least squares (OLS) estimation procedure by introducing
a little notation. We must distinguish between the actual values of the parameters and the esti-
mates of the parameters. We have used the Greek letter beta, β, to denote the actual values.
Recall the original model:
yt = βConst + βxxt + et
βConst denotes the actual constant and βx the actual coefficient.
We will use Roman italicized b’s to denote the estimates. bConst denotes the estimate of the
constant for the best fitting line and bx denotes the estimate of the coefficient for the best fitting
line. That is, the equation for the best fitting line is
y = bConst + bxx
The constant and slope of the best fitting line, bConst and bx, estimate the values of βConst and βx.3
The ordinary least squares (OLS) estimation procedure chooses bConst and bx so as to minimize
the sum of the squared residuals. We will now use our example to illustrate precisely what this
means. We begin by introducing an equation for each student’s estimated score: Esty1, Esty2, and
Esty3.
Esty1, Esty2, and Esty3 estimate the scores received by students 1, 2, and 3 based on the estimated
constant, bConst, the estimated coefficient, bx, and the number of minutes each student studies, x1,
x2, and x3.
The difference between a student’s actual score, yt, and his/her estimated score, Estyt, is called
the residual, Rest:

Rest = yt − Estyt
3. There is another convention that is often used to denote the parameter estimates, the “beta-hat” convention. The
estimate of the constant is denoted by β̂Const and the coefficient by β̂x. While the Roman italicized b’s estimation conven-
tion will be used throughout this textbook, be aware that you will come across textbooks and articles that use the beta-hat
convention. The b’s and β̂’s denote the same thing; they are interchangeable.
Next we square each residual and add them together to compute the sum of squared residuals,
SSR:

SSR = Res1² + Res2² + … + ResT² = Σ_{t=1}^{T} Rest²

where T = sample size. bConst and bx are chosen to minimize the sum of squared residuals. The
following equations for bConst and bx accomplish this:
bConst = ȳ − bx x̄,    bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]

For the three observation case:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²
First focus on bConst. Differentiate the sum of squared residuals, SSR, with respect to bConst and
set the derivative equal to 0:

dSSR/dbConst = −2(y1 − bConst − bx x1) − 2(y2 − bConst − bx x2) − 2(y3 − bConst − bx x3) = 0

Dividing by −2:

(y1 − bConst − bx x1) + (y2 − bConst − bx x2) + (y3 − bConst − bx x3) = 0

simplifying:

(y1 + y2 + y3) − 3bConst − bx(x1 + x2 + x3) = 0

dividing by 3:

(y1 + y2 + y3)/3 − bConst − bx (x1 + x2 + x3)/3 = 0
Since (y1 + y2 + y3)/3 equals the mean of y, ȳ, and (x1 + x2 + x3)/3 equals the mean of x, x̄:

ȳ − bConst − bx x̄ = 0
Our first equation, our equation for bConst, is now justified. To minimize the sum of squared
residuals, the following relationship must be met:
ȳ = bConst + bx x̄    or    bConst = ȳ − bx x̄

As illustrated in figure 5.5, this equation simply says that the best fitting line must pass through
the point (x̄, ȳ), the point representing the mean of x, minutes studied, and the mean of y, the
quiz scores.
Figure 5.5
Minutes and scores scatter diagram with OLS best fitting line, y = bConst + bx x, passing through (x̄, ȳ) = (15, 81)
The best fitting line passes through the point (15, 81).
Next we will justify the equation for bx. Reconsider the equation for the sum of squared
residuals and substitute ȳ − bx x̄ for bConst:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²
    = [y1 − (ȳ − bx x̄) − bx x1]² + [y2 − (ȳ − bx x̄) − bx x2]² + [y3 − (ȳ − bx x̄) − bx x3]²
Switching the “bx terms” within each of the three squared terms:

SSR = [(y1 − ȳ) − bx(x1 − x̄)]² + [(y2 − ȳ) − bx(x2 − x̄)]² + [(y3 − ȳ) − bx(x3 − x̄)]²
To minimize the sum of squared residuals, differentiate SSR with respect to bx and set the deriva-
tive equal to 0:
dSSR/dbx = −2[(y1 − ȳ) − bx(x1 − x̄)](x1 − x̄) − 2[(y2 − ȳ) − bx(x2 − x̄)](x2 − x̄) − 2[(y3 − ȳ) − bx(x3 − x̄)](x3 − x̄) = 0
dividing by −2 and rearranging:

(y1 − ȳ)(x1 − x̄) + (y2 − ȳ)(x2 − x̄) + (y3 − ȳ)(x3 − x̄) = bx[(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

Solving for bx and writing the result for the general case of T observations:

bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]
Now, for each student, calculate the deviation of y from its mean and the deviation of x from
its mean:

Student   yt   ȳ    yt − ȳ   xt   x̄    xt − x̄
1         66   81   −15      5    15   −10
2         87   81   6        15   15   0
3         90   81   9        25   15   10
Next, for each student, calculate the products of the y and x deviations and squared x
deviations:
Student   (yt − ȳ)(xt − x̄)      (xt − x̄)²
1         (−15)(−10) = 150      (−10)² = 100
2         (6)(0) = 0            (0)² = 0
3         (9)(10) = 90          (10)² = 100
          Sum = 240             Sum = 200

Figure 5.6
Minutes and scores scatter diagram with OLS best fitting line: y = bConst + bx x = 63 + 1.2x, passing through (x̄, ȳ) = (15, 81)
bx equals the sum of the products of the y and x deviations divided by the sum of the squared x
deviations:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2
To calculate bConst recall that the best fitting line passes through the point representing the
average value of x and y, (x̄, ȳ) (see figure 5.6):

ȳ = bConst + bx x̄
bConst = ȳ − bx x̄
We just learned that bx equals 6/5. The average of the x’s, x̄, equals 15 and the average of the
y’s, ȳ, equals 81. Substituting, we have

bConst = 81 − (6/5) × 15 = 81 − 18 = 63
Using the ordinary least squares (OLS) estimation procedure, we have the best fitting line for
Professor Lord’s first quiz as
y = 63 + (6/5)x = 63 + 1.2x
Consequently the least squares estimates for βConst and βx are 63 and 1.2. These estimates suggest
that Professor Lord gives each student 63 points just for showing up; each minute studied earns
the student 1.2 additional points. Based on the regression we estimate that:
• 1 additional minute studied increases the quiz score by 1.2 points.
• 2 additional minutes studied increase the quiz score by 2.4 points.
• And so on.
Let us now quickly calculate the sum of squared residuals for the best fitting line:
Student   xt   yt   Estyt = 63 + 1.2xt       Rest = yt − Estyt   Rest²
1         5    66   63 + 1.2 × 5 = 69        66 − 69 = −3        9
2         15   87   63 + 1.2 × 15 = 81       87 − 81 = 6         36
3         25   90   63 + 1.2 × 25 = 93       90 − 93 = −3        9
                                                                 SSR = 9 + 36 + 9 = 54
The sum of squared residuals for the best fitting line is 54.
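The entire calculation can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic (the variable names are ours), not part of the text’s EViews workflow:

# OLS estimates for Professor Lord's first quiz, computed from the formulas above
x = [5, 15, 25]          # minutes studied
y = [66, 87, 90]         # quiz scores

x_bar = sum(x) / len(x)  # 15
y_bar = sum(y) / len(y)  # 81

num = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))  # 240
den = sum((xt - x_bar) ** 2 for xt in x)                        # 200

b_x = num / den                # 1.2
b_const = y_bar - b_x * x_bar  # 63.0

ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))
print(b_const, b_x, ssr)       # 63.0 1.2 54.0 (up to floating point rounding)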
Econometrics Lab 5.1: Finding the Ordinary Least Squares (OLS) Estimates
We can use our Econometrics Lab to emphasize how the ordinary least squares (OLS) estimation
procedure determines the best fitting line by accessing the Best Fitting Line simulation
(figure 5.7).
Figure 5.7
Best Fitting Line simulation—Data
By default the data from Professor Lord’s first quiz are specified: the values of x and y for
the first student are 5 and 66, for the second student 15 and 87, and for the third student 25
and 90.
Click Go. A new screen appears as shown in figure 5.8 with two slider bars, one slide bar for
the constant and one for the coefficient.
By default the constant and coefficient values are 63 and 1.2, the ordinary least squares (OLS)
estimates. Also the arithmetic used to calculate the sum of squared residuals is displayed. When
the constant equals 63 and the coefficient equals 1.2, the sum of squared residuals equals 54.00;
this is just the value that we calculated.
Next experiment with different values for the constant and coefficient values by moving the
two sliders. Convince yourself that the equations we used to calculate the estimate for the con-
stant and coefficient indeed minimize the sum of squared residuals.
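The same experiment can be mimicked outside the lab. The following Python sketch (ours, with an arbitrarily chosen grid of candidate values) searches over constants and coefficients and confirms that the sum of squared residuals is smallest near (63, 1.2):

# Grid search for the (constant, coefficient) pair that minimizes SSR
x = [5, 15, 25]
y = [66, 87, 90]

def ssr(b_const, b_x):
    return sum((yt - b_const - b_x * xt) ** 2 for xt, yt in zip(x, y))

best = None
for i in range(521):              # candidate constants 50.00, 50.05, ..., 76.00
    b_const = 50 + 0.05 * i
    for j in range(61):           # candidate coefficients 0.00, 0.05, ..., 3.00
        b_x = 0.05 * j
        value = ssr(b_const, b_x)
        if best is None or value < best[0]:
            best = (value, b_const, b_x)

print(best)                       # approximately (54.0, 63.0, 1.2)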
Software and the ordinary least squares (OLS) estimation procedure: Fortunately, we do not have
to trudge through the laborious arithmetic to compute the ordinary least squares (OLS) estimates.
Statistical software can do the work for us.
Professor Lord’s first quiz data: Cross-sectional data of minutes studied and quiz scores in the
first quiz for the three students enrolled in Professor Lord’s class (table 5.3).
Figure 5.8
Best Fitting Line simulation—Parameter estimates
Table 5.3
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
We can use the statistical package EViews to perform the calculations. After opening the workfile
in EViews:
• In the Workfile window: Click on the dependent variable, y, first; and then, click on the
explanatory variable, x, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
This window previews the regression that will be run; note that the dependent variable, “y,”
is the first variable listed followed by two expressions representing the explanatory variable, “x,”
and the constant “c.”
Do not forget to close the workfile.
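Any regression package reports the same numbers. For readers without EViews, here is a rough Python equivalent using statsmodels (a sketch under the assumption that the data are typed in by hand rather than read from the workfile):

import statsmodels.api as sm

x = [5, 15, 25]   # minutes studied
y = [66, 87, 90]  # quiz scores

X = sm.add_constant(x)          # adds the constant term "c"
results = sm.OLS(y, X).fit()    # ordinary least squares

print(results.params)           # constant approximately 63, coefficient approximately 1.2
print(results.ssr)              # sum of squared residuals approximately 54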
Table 5.4
OLS first quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 5.4 reports the values of the coefficient and constant for the best fitting line. Note that the
sum of squared residuals for the best fitting line is also included.
yt = βConst + βx xt + et

where yt equals the quiz score of student t, xt equals the minutes student t studied, and et is the error term.
The parameters of the model, the values of the constant, βConst, and the coefficient, βx, represent
the actual number of
• points Professor Lord gives students just for showing up, βConst;
• additional points earned for each minute of study, βx.
Obviously the parameters of the model play an important role, but what about the error term,
et? To illustrate the importance of the error term, suppose that somehow we know the values of
βConst and βx. For the moment, suppose that βConst, the actual constant, equals 50 and βx, the actual
coefficient, equals 2. In words, this means that Professor Lord gives each student 50 points for
showing up; furthermore each minute of study provides the student with two additional points.
Consequently the regression model is
yt = 50 + 2xt + et
Note: In the real world, we never know the actual values of the constant and coefficient. We are
assuming that we do here, just to illustrate the importance of the error term.
The error term reflects all the factors that cannot be anticipated or determined before the quiz
is given; that is, the error term represents all random influences. In the absence of random influ-
ences, the error terms would equal 0.
Assume, only for the moment, that there are no random influences; consequently each error term
would equal 0 (figure 5.9). While this assumption is unrealistic, it allows us to appreciate the
important role played by the error term. Focus on the first student taking Professor Lord’s first
Figure 5.9
Best fitting line with no error term (the line y = 50 + 2x passes through all three points)
quiz. The first student studies for 5 minutes. In the absence of random influences (that is, if e1
equaled 0), what score would the first student receive on the quiz? The answer is 60:
y1 = 50 + 2 × 5 + 0 = 50 + 10 = 60
Next consider the second student. The second student studies for 15 minutes. In the absence of
random influences, the second student would receive an 80 on the quiz:
y2 = 50 + 2 × 15 + 0 = 50 + 30 = 80
The third student would receive a 100:
y3 = 50 + 2 × 25 + 0 = 50 + 50 = 100
All three points lie on the line y = βConst + βx x = 50 + 2x.
In sum, in the absence of random influences, the error term of each student equals 0 and the
best fitting line fits the data perfectly. The slope of this line equals 2, the actual coefficient, and
the vertical intercept of the line equals 50, the actual constant. Without random influences, it is
easy to determine the actual constant and coefficient by applying a little algebra. We will now
use a simulation to emphasize this point (figure 5.10).
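Before turning to the lab, we can preview the point with a short Python sketch (ours): when the error terms all equal 0, the OLS formulas recover the actual constant and coefficient exactly.

# With no random influences the data lie exactly on y = 50 + 2x,
# and OLS recovers the actual parameter values.
beta_const, beta_x = 50, 2
x = [5, 15, 25]
y = [beta_const + beta_x * xt + 0 for xt in x]   # error terms all equal 0

x_bar, y_bar = sum(x) / 3, sum(y) / 3
b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum((xt - x_bar) ** 2 for xt in x)
b_const = y_bar - b_x * x_bar
print(b_const, b_x)   # 50.0 2.0, the actual values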
Econometrics Lab 5.2: Coefficient Estimates When Random Influences Are Absent
The Coefficient Estimate simulation allows us to do something we cannot do in the real world.
It allows us to specify the actual values of the constant and coefficient in the model; that is, we
can select βConst and βx. We can specify the number of:
• Points Professor Lord gives students just for showing up, βConst; by default, βConst is set at 50.
• Additional points earned for an additional minute of study, βx; by default, βx is set at 2.
Table 5.5
Quiz results with no random influences (no error term)
Student   Minutes (x)   Score (y) in the absence of random influences
1         5             60
2         15            80
3         25            100
Figure 5.10
Coefficient Estimate simulation (controls for the actual coefficient βx and the error term; each repetition reports the estimated coefficient value bx = [Σ_{t=1}^{T}(yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T}(xt − x̄)²])
yt = 50 + 2xt + et
Each repetition of the simulation represents a quiz from a single week. In each repetition the
simulation does the following:
• Calculates the score for each student based on the actual constant (βConst), the actual coefficient
(βx), and the number of minutes the student studied; then, to be realistic, the simulation can add
a random influence in the form of the error term, et. An error term is included whenever the Err
Term checkbox is checked.
• Applies the ordinary least squares (OLS) estimation procedure to estimate the coefficient.
When the Pause box is checked the simulation stops after each repetition; when it is cleared,
quizzes are simulated repeatedly until the “Stop” button is clicked.
We can eliminate random influences by clearing the Err Term box. After doing so, click Start
and then Continue a few times. We discover that in the absence of random influences the esti-
mate of the coefficient value always equals the actual value, 2 (see table 5.6).
Table 5.6
Simulation results with no random influences (no error term)
Repetition   Coefficient estimate (no error term)
1            2.0
2            2.0
3            2.0
4            2.0
Table 5.7
Quiz results with random influences (with error term)
Student   Minutes (x)   Score (y) with random influences included
1         5             66
2         15            87
3         25            90
This is precisely what we concluded earlier from the scatter diagram. In the absence of random
influences, the best fitting line fits the data perfectly. The best fitting line’s slope equals the
actual value of the coefficient.
The real world is not that simple, however; random influences play an important role. In the real
world, random influences are inevitably present. In figure 5.11 the actual scores on the first quiz
have been added to the scatter diagram. As a consequence of the random influences, students 1
and 2 overperform while student 3 underperforms (table 5.7).
As illustrated in figure 5.12, when random influences are present, we cannot expect the inter-
cept and slope of the best fitting line to equal the actual constant and the actual coefficient. The
intercept and slope of the best fitting line, bConst and bx, are affected by the random influences.
Consequently the intercept and slope of the best fitting line, bConst and bx, are themselves random
variables. Even if we knew the actual constant and slope, that is, if we knew the actual values
of βConst and βx, we could not predict the values of the constant and slope of the best fitting line,
bConst and bx, with certainty before the quiz was given.
Econometrics Lab 5.3: Coefficient Estimates When Random Influences Are Present
We will now use the Coefficient Estimate simulation to emphasize this point. We will show that
in the presence of random influences, the coefficient of the best fitting line is a random
variable.
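A quick Python sketch (ours; it uses normally distributed error terms with variance 500, matching the lab’s default) makes the same point: with random influences the coefficient estimate changes from quiz to quiz.

import random

random.seed(1)                       # any seed; results vary from run to run
beta_const, beta_x = 50, 2
x = [5, 15, 25]
x_bar = sum(x) / 3

for repetition in range(5):          # five simulated quizzes
    e = [random.gauss(0, 500 ** 0.5) for _ in x]             # error terms, Var[e] = 500
    y = [beta_const + beta_x * xt + et for xt, et in zip(x, e)]
    y_bar = sum(y) / 3
    b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) \
          / sum((xt - x_bar) ** 2 for xt in x)
    print(round(b_x, 2))             # typically not equal to 2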
Figure 5.11
Scatter diagram with error term (the actual line y = 50 + 2x together with the three observed scores)
Note that the Error Term checkbox is now checked to include the error term. Be certain that the
Pause checkbox is checked and then click Start. When the simulation computes the best fitting
line, the estimated value of the coefficient typically is not 2 despite the fact that the actual value
of the coefficient is 2. Click the Continue button a few more times to simulate each successive
week’s quiz. What do you observe? We simply cannot expect the coefficient estimate to equal
the actual value of the coefficient. In fact, when random influences are present, the coefficient
estimate almost never equals the actual value of the coefficient. Sometimes the estimate is less
than the actual value, 2, and sometimes it is greater than the actual value. When random influ-
ences are present, the coefficient estimates are random variables.
While your coefficient estimates will no doubt differ from the estimates in table 5.8, one thing
is clear. Even if we know the actual value of the coefficient, as we do in the simulation, we
cannot predict with certainty the value of the estimate from one repetition. Our last two
Figure 5.12
OLS best fitting line with error term (the actual line y = 50 + 2x and the observed scores)
Table 5.8
Simulation results with random influences (with error term)
Repetition   Coefficient estimate (with error term)
1            1.8
2            1.6
3            3.2
4            1.9
simulations illustrate a critical point: the coefficient estimate is a random variable as a conse-
quence of the random influences introduced by each student’s error term.
We will now use a simulation to gain insights into random influences and error terms. As we
know, random influences are those factors that cannot be anticipated or determined beforehand.
Sometimes random influences lead to a higher quiz score, and other times they lead to a lower
score. The error terms embody these random influences:
• Sometimes the error term is positive, indicating that the score is higher than “usual.”
• Other times the error term is negative indicating that the score is lower than “usual.”
If the random influences are indeed random, they should be a “wash” after many, many quizzes.
That is, random influences should not systematically lead to higher or lower quiz scores. In other
words, if the error terms truly reflect random influences, they should average out to 0 “in the
long run.”
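This “wash out” claim is easy to illustrate with a small Python sketch (ours; it draws error terms from a normal distribution with mean 0 and variance 500) before turning to the lab’s own check:

import random

random.seed(2)
draws = [random.gauss(0, 500 ** 0.5) for _ in range(100_000)]   # many, many error terms

mean = sum(draws) / len(draws)
positive_share = sum(d > 0 for d in draws) / len(draws)

print(round(mean, 2))            # close to 0
print(round(positive_share, 2))  # close to 0.5: positive about half the time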
Econometrics Lab 5.4: Error Terms When Random Influences Are Present
Let us now check to be certain that the simulations are capturing random influences properly
by accessing the Error Term simulation (figure 5.13).
Initially, the Pause checkbox is checked and the error term variance is 500. Now click Start
and observe that the simulation reports the numerical value of the error term for each of the three
students. Record these three values. Also note that the simulation constructs a histogram for each
student’s error term and also reports the mean and variance. Click Continue again to observe
the numerical values of the error terms for the second quiz. Confirm that the simulation is cal-
culating the mean and variance of each student’s error terms correctly. Click Continue a few
more times. Note that the error terms are indeed random variables. Before the quiz is given, we
Figure 5.13
Error Term simulation
Figure 5.14
Error Term simulation results
cannot predict the numerical value of a student’s error term. Each student’s histogram shows
that sometimes the error term for that student is positive and sometimes it is negative. Next clear
the Pause checkbox and click Continue. After many, many repetitions click Stop.
After many, many repetitions, the mean (average) of each student’s error terms equals about
0 (figure 5.14). Consequently each student’s error term truly represents a random influence; it
does not systematically influence the student’s quiz score. It is also instructive to focus on each
student’s histogram. For each student, the numerical value of the error term is positive about
half the time and negative about half the time after many, many repetitions.
In sum, the error terms represent random influences; consequently the error terms have no
systematic effect on quiz scores, the dependent variable:
• Sometimes the error term is positive, indicating that the score is higher than “usual.”
• Other times the error term is negative indicating that the score is lower than “usual.”
What can we say about the students’ error terms beforehand, before the next quiz? We can
describe their probability distribution. The chances that a student’s error term will be positive
are the same as the chances that it will be negative. For any one quiz, the mean of each student’s
error term’s probability distribution equals 0:

Mean[e1] = Mean[e2] = Mean[e3] = 0
Initially, we will make some strong assumptions regarding the explanatory variables and the
error terms:
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = Var[e3] = Var[e]
• Error term/error term independence premise: The error terms are independent; that is,
Cov[et, ej] = 0 whenever t ≠ j.
We call these premises the standard ordinary least squares (OLS) premises. They make the
analysis as straightforward as possible. In part IV of this textbook we relax these premises to
study more general cases. Our strategy is to start with the most straightforward case and then
move on to more complex ones. While we only briefly cite the premises here, we will return to
them in the fourth part of the textbook to study their implications.
Recall Clint’s assignment. He must assess the effect of studying on quiz scores by using Profes-
sor Lord’s first quiz as evidence. Clint can apply the ordinary least squares (OLS) estimation
procedure; the OLS estimate for the value of the coefficient is 1.2. But we now know that the
estimate is a random variable. We cannot expect the coefficient estimate from the one quiz, 1.2,
to equal the actual value of the coefficient, the actual impact that studying has on a student’s
quiz score. We will proceed by dividing Clint’s assignment into two related parts:
• Reliability of the coefficient estimate: How reliable is the coefficient estimate calculated
from the results of the first quiz? That is, how confident should Clint be that the coefficient
estimate, 1.2, will be close to the actual value?
• Assessment of the theory: In view of the fact that Clint’s estimate of the coefficient equals
1.2, how confident should Clint be that the theory is correct, that additional studying increases
quiz scores?
1. What criterion does the ordinary least squares (OLS) estimation procedure apply when deriv-
ing the best fitting line?
2. How are random influences captured in the simple regression model?
3. When applying the ordinary least squares (OLS) estimation procedure, what type of variables
are the parameter estimates as a consequence of random influences?
4. What are the standard ordinary least squares (OLS) premises?
Chapter 5 Exercises
1. A colleague of Professor Lord is teaching another course in which three students are enrolled.
The number of minutes each student studied and his/her score on the quiz are reported below:
Regression example data: Cross-sectional data of minutes studied and quiz scores from a course
taught by Professor Lord’s colleague.
Minutes Quiz
Student studied (x) score (y)
1 5 14
2 10 44
3 30 80
a. On a sheet of graph paper, plot a scatter diagram of the data. Then, using a ruler, draw a
straight line that, by sight, best fits the data.
b. Using a calculator and the equations we derived in class, apply the least squares estimation
procedure to find the best fitting line by filling in the blanks:
First, calculate the means:
Means: x̄ = _____________ = _______
       ȳ = _____________ = _______
Second, for each student calculate the deviation of x from its mean and the deviation of y
from its mean:
Student   yt   ȳ       yt − ȳ   xt   x̄       xt − x̄
1         14   _____   _____    5    _____   _____
2         44   _____   _____    10   _____   _____
3         80   _____   _____    30   _____   _____
Third, calculate the products of the y and x deviations and squared x deviations for each
student; then calculate the sums:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = _____ / _____ = _______

Finally, calculate the estimate for the constant:

bConst = ȳ − bx x̄ = _______
2. Reconsider the data from problem 1, this time using statistical software. After opening the file, use the following steps to run the regression:
• In the Workfile window: Click on the dependent variable, y, first; and then, click on the
explanatory variable, x, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
a. Are the calculations you made in problem 1 consistent with those provided by the
software?
b. Based on the regression results, what equation estimates the effect of minutes studied on
quiz scores?
c. Estimate the effect of minutes studied on quiz scores:
i. 1 additional minute results in ____ additional points.
ii. 2 additional minutes result in ____ additional points.
iii. 5 additional minutes result in ____ additional points.
iv. 1 fewer minute results in ____ fewer points.
v. 2 fewer minutes result in ____ fewer points.
3. Consider crude oil production in the United States.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
a. What does economic theory teach us about how the real price of crude oil should affect
US crude oil production?
b. Using statistical software, estimate the effect that the real price of oil has on US crude oil
production.
Labor supply data: Cross-sectional data of hours worked and wages for the 92 married workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who earned bachelor’s degrees but no advanced degrees.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from
1990 to 1999.
Year   Real price ($ per gal)   Gasoline consumption (millions of gals)
1990   1.43                     303.9
1991   1.35                     301.9
1992   1.31                     305.3
1993   1.25                     314.0
1994   1.23                     319.2
1995   1.25                     327.1
1996   1.31                     331.4
1997   1.29                     336.7
1998   1.10                     346.7
1999   1.19                     354.1
a. What does economic theory teach us about how the real price of gasoline should affect
US gasoline consumption?
b. Using statistical software, estimate the effect that the real price of gasoline has on US
gasoline consumption.
iv. How would a $1.00 decrease in the real price affect US gasoline consumption?
___________
v. How would a $2.00 decrease in the real price affect US gasoline consumption?
___________
6. Consider cigarette smoking data in the United States.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
Conventional wisdom suggests that high school drop outs are more likely to smoke cigarettes
than those who graduate.
a. Using statistical software, estimate the effect that the completion of high school has on
per capita cigarette consumption.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
a. What is an earmark?
It has been alleged that since the Congress was controlled by Democrats, Democratic members
received more solo earmarks than their non-Democratic colleagues.
b. Using statistical software, estimate the effect that the political party of a member of Con-
gress has on the dollars of earmarks received.
Ordinary Least Squares Estimation Procedure—The Properties
6
Chapter 6 Outline
6.2 Review
6.2.1 Regression Model
6.2.2 The Error Term
6.2.3 Ordinary Least Squares (OLS) Estimation Procedure
6.2.4 The Estimates, bConst and bx, Are Random Variables
6.5 New Equation for the Ordinary Least Squares (OLS) Coefficient Estimate
Chapter 6 Prep Questions
1. Run the Distribution of Coefficient Estimates simulation in the Econometrics Lab by clicking
the following link:
Note: You must click the Next Problem button to get to the simulation’s problem 1.
Clint’s assignment is to assess the theory that additional studying increases quiz scores. To do
so, he must use data from Professor Lord’s first quiz, the number of minutes studied, and the
quiz score for each of the three students in the course (table 6.1).
Table 6.1
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz
scores.
6.2 Review
yt = βConst + βxxt + et
where
βConst and βx are the parameters of the model. Let us review their interpretation:
• βConst reflects the number of points Professor Lord gives students just for showing up.
• βx reflects the number of additional points earned for each additional minute of studying.
The error term, et, plays a crucial role in the model. The error term represents random influences.
The mean of the error term’s probability distribution for each student equals 0:

Mean[e1] = Mean[e2] = Mean[e3] = 0

Consequently the error terms have no systematic effect on quiz scores. Sometimes the error term
will be positive and sometimes it will be negative, but after many, many quizzes each student’s
error terms will average out to 0. When the probability distribution of the error term is symmetric,
the chances that a student will score better than “usual” on one quiz equal the chances that the
student will do worse than “usual.”
As a consequence of the error terms (random influences) we can never determine the actual
values of βConst and βx; that is, Clint has no choice but to estimate the values. The ordinary least
squares (OLS) estimation procedure is the most commonly used procedure for doing this:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]

bConst = ȳ − bx x̄
Using the results of the first quiz, Clint estimates the values of the coefficient and constant:
Student   yt   ȳ    yt − ȳ   xt   x̄    xt − x̄
1         66   81   −15      5    15   −10
2         87   81   6        15   15   0
3         90   81   9        25   15   10
Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = 240      Σ_{t=1}^{T} (xt − x̄)² = 200

bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 63
In the previous chapter we used the Econometrics Lab to show that the estimates for the constant
and coefficient, bConst and bx, are random variables. As a consequence of the error terms (random
influences) we could not determine the numerical value of the estimates for the constant and
coefficient, bConst and bx, before we conducted the experiment, even if we knew the actual values
of the constant and coefficient, βConst and βx. Furthermore we can never expect the estimates to
equal the actual values. Consequently we must assess the reliability of the estimates. We will
focus on the coefficient estimate.
Estimate reliability: How reliable is the coefficient estimate calculated from the results of the
first quiz? That is, how confident can Clint be that the coefficient estimate, 1.2, will be close to
the actual value of the coefficient?
Clint faced a similar problem when he polled a sample of the student population to estimate the
fraction of students supporting him. Twelve of the 16 randomly selected students polled, 75
percent, supported Clint, thereby suggesting that he was leading. But we then observed that it
was possible for this result to occur even if the election was actually a toss-up. In view of this,
we asked how confident Clint should be in the results of his single poll. To address this issue,
we turned to the general properties of polling procedures to assess the reliability of the estimate
Clint obtained from his single poll:
While we could not determine the numerical value of the estimated fraction, EstFrac, before
the poll was conducted, we could describe its probability distribution. Using algebra, we derived
the general equations for the mean and variance of the estimated fraction’s, EstFrac’s, probability
distribution. Then we checked our algebra with a simulation by exploiting the relative frequency
interpretation of probability: after many, many repetitions, the distribution of the numerical
values mirrors the probability distribution for one repetition.
The estimated fraction’s probability distribution allowed us to assess the reliability of Clint’s
poll.
Using the ordinary least squares (OLS) estimation procedure we estimated the value of the coef-
ficient to be 1.2. This estimate is based on a single quiz. The fact that the coefficient estimate
is positive suggests that additional studying increases quiz scores. But how confident can we be
that the coefficient estimate is close to the actual value? To address the reliability issue we will
focus on the general properties of the ordinary least squares (OLS) estimation procedure:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 63
Mean[bx] = ?
Var[bx] = ?
↓
Mean and variance describe the center and spread of the estimate’s probability distribution
While we cannot determine the numerical value of the coefficient estimate before the quiz is
given, we can describe its probability distribution. The probability distribution tells us how likely
it is for the coefficient estimate based on a single quiz to equal each of the possible values. Using
algebra, we will derive the general equations for the mean and variance of the coefficient esti-
mate’s probability distribution. Then we will check our algebra with a simulation by exploiting
the relative frequency interpretation of probability: after many, many repetitions the distribution
of the numerical values mirrors the probability distribution for one repetition.
The coefficient estimate’s probability distribution will allow us to assess the reliability of the
coefficient estimate calculated from Professor Lord’s quiz.
To derive the equations for the mean and variance of the coefficient estimate’s probability dis-
tribution, we will apply the standard ordinary least squares (OLS) regression premises. As we
mentioned in chapter 5, these premises make the analysis as straightforward as possible. In later
chapters we will relax these premises to study more general cases. In other words, we will start
with the most straightforward case and then move on to more complex ones later.
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = Var[e3] = Var[e]
• Error term/error term independence premise: The error terms are independent; that is,
Cov[et, ej] = 0 whenever t ≠ j.
To keep the algebra manageable, we will assume that the explanatory variables are constants in
the derivations that follow. This assumption allows us to apply the arithmetic of means and
variances easily. While this simplifies our algebraic manipulations, it does not affect the validity
of our conclusions.
6.5 New Equation for the Ordinary Least Squares (OLS) Coefficient Estimate
In chapter 5 we derived an equation that expressed the OLS coefficient estimate in terms of the
x’s and y’s:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]
It is advantageous to use a different equation to derive the equations for the mean and variance
of the coefficient estimate’s probability distribution, however; we will use an equivalent equation
that expresses the coefficient estimate in terms of the x’s, e’s, and βx rather than in terms of the
x’s and y’s:1
bx = βx + [Σ_{t=1}^{T} (xt − x̄)et] / [Σ_{t=1}^{T} (xt − x̄)²]
To keep the notation as straightforward as possible, we will focus on the 3 observation case. The
logic for the general case is identical to the logic for the 3 observation case:
bx = βx + [(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
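Appendix 6.1 derives this equivalence algebraically; a short Python sketch (ours, with arbitrary made-up error terms) can also confirm numerically that the two expressions for bx agree when the y’s are generated by the model:

# Check that bx computed from the data equals
# beta_x + sum((xt - xbar)*et) / sum((xt - xbar)**2)
beta_const, beta_x = 50, 2
x = [5, 15, 25]
e = [7.0, -4.0, 12.0]                         # arbitrary error terms for the check
y = [beta_const + beta_x * xt + et for xt, et in zip(x, e)]

x_bar, y_bar = sum(x) / 3, sum(y) / 3
den = sum((xt - x_bar) ** 2 for xt in x)

bx_from_data = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / den
bx_from_errors = beta_x + sum((xt - x_bar) * et for xt, et in zip(x, e)) / den

print(bx_from_data, bx_from_errors)           # the two values agree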
To calculate the mean of bx’s probability distribution, review the arithmetic of means:
• Mean of a constant times a variable: Mean[cx] = c Mean[x]
• Mean of a constant plus a variable: Mean[c + x] = c + Mean[x]
• Mean of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y]
1. Appendix 6.1 appearing at the end of this chapter shows how we can derive the second equation for the coefficient
estimate, bx, from the first.
Now we apply the arithmetic of means to the new equation for the coefficient estimate, bx:

Mean[bx] = Mean[βx + ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Mean[c + x] = c + Mean[x]:

= βx + Mean[(1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3)]

Applying Mean[cx] = c Mean[x]:

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) Mean[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3]

Applying Mean[x + y] = Mean[x] + Mean[y]:

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) [Mean[(x1 − x̄)e1] + Mean[(x2 − x̄)e2] + Mean[(x3 − x̄)e3]]

Applying Mean[cx] = c Mean[x] (the x’s are treated as constants):

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) [(x1 − x̄)Mean[e1] + (x2 − x̄)Mean[e2] + (x3 − x̄)Mean[e3]]

Since Mean[e1] = Mean[e2] = Mean[e3] = 0, the last term equals 0. So, as shown in figure 6.1, we have
Mean[bx] = βx
Consequently the ordinary least squares (OLS) estimation procedure for the value of the coef-
ficient is unbiased. In any one repetition of the experiment, the mean (center) of the probability
distribution equals the actual value of the coefficient. The estimation procedure does not sys-
tematically overestimate or underestimate the actual coefficient value, βx. If the probability
distribution is symmetric, the chances that the estimate calculated from one quiz will be too high
equal the chances that it will be too low.
We can use the Distribution of Coefficient Estimates simulation in our Econometrics Lab to
replicate the quiz many, many times. But in reality, Clint only has information from one quiz,
the first quiz. How then can a simulation be useful? The relative frequency interpretation of
Figure 6.1
Probability distribution of coefficient estimates (centered at Mean[bx] = βx)
probability provides the answer. The relative frequency interpretation of probability tells us that
the distribution of the numerical values after many, many repetitions of the experiments mirrors
the probability distribution of one repetition. Consequently repeating the experiment many, many
times reveals the probability distribution for the one quiz:
Distribution of the numerical values after many, many repetitions ↔ Probability distribution of one repetition
We can use the simulation to check the algebra we used to derive the equation for the mean of
the coefficient estimate’s probability distribution:
Mean[bx] = βx
If our algebra is correct, the mean (average) of the estimated coefficient values should equal the
actual value of the coefficient, βx, after many, many repetitions (see figure 6.2).
Figure 6.2
Distribution of coefficient estimates simulation
Recall that a simulation allows us to do something that we cannot do in the real world. In the
simulation, we can specify the actual values of the constant and coefficient, βConst and βx. The
default setting for the actual coefficient value is 2. Be certain that the Pause checkbox is checked.
Click Start. Record the numerical value of the coefficient estimate for the first repetition. Click
Continue to simulate the second quiz. Record the value of the coefficient estimate for the second
repetition and calculate the mean and variance of the numerical estimates for the first two repeti-
tions. Note that your calculations agree with those provided by the simulation. Click Continue
again to simulate the third quiz. Calculate the mean and variance of the numerical estimates for
the first three repetitions. Once again, note that your calculations and the simulation’s calcula-
tions agree. Continue to click Continue until you are convinced that the simulation is calculating
the mean and variance of the numerical values for the coefficient estimates correctly.
Now clear the Pause checkbox and click Continue. The simulation no longer pauses after
each repetition. After many, many repetitions click Stop.
Question: What does the mean (average) of the coefficient estimates equal?
Answer: It equals about 2.0.
This lends support to the equation for the mean of the coefficient estimate’s probability distribu-
tion that we just derived (table 6.2). Now change the actual coefficient value from 2 to 4. Click
Start, and then after many, many repetitions click Stop. What does the mean (average) of the
estimates equal? Next, change the actual coefficient value to 6 and repeat the process.
Note that in all cases the mean (average) of the estimates for the coefficient value equals the
actual value of the coefficient after many, many repetitions (figure 6.3).
The simulations confirm our algebra. The estimation procedure does not systematically under-
estimate or overestimate the actual value of the coefficient. The ordinary least squares (OLS)
estimation procedure for the coefficient value is unbiased.
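The lab’s check can be reproduced with a small Monte Carlo sketch in Python (ours; it uses normally distributed error terms with variance 500, the lab’s default, and far fewer repetitions than the table below):

import random

random.seed(3)
beta_const, beta_x = 50, 2
x = [5, 15, 25]
x_bar = sum(x) / 3
den = sum((xt - x_bar) ** 2 for xt in x)      # 200

estimates = []
for _ in range(100_000):                      # many, many repetitions
    y = [beta_const + beta_x * xt + random.gauss(0, 500 ** 0.5) for xt in x]
    y_bar = sum(y) / 3
    estimates.append(sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / den)

print(round(sum(estimates) / len(estimates), 2))   # close to 2.0, the actual value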
Table 6.2
Distribution of Coefficient Estimates simulation results

Actual value, βx   Equation: mean of coef estimate prob dist, Mean[bx]   Simulation repetitions   Simulation: mean (average) of estimated coef values, bx, from the experiments
2                  2                                                     >1,000,000               ≈ 2.0
4                  4                                                     >1,000,000               ≈ 4.0
6                  6                                                     >1,000,000               ≈ 6.0
Figure 6.3
Histogram of coefficient value estimates (centered at the actual value, 2)
Next we turn our attention to the variance of the coefficient estimate’s probability distribution.
To derive the equation for the variance, begin by reviewing the arithmetic of variances:
• Variance of a constant times a variable: Var[cx] = c2 Var[x]
• Variance of the sum of a variable and a constant: Var[c + x] = Var[x]
• Variance of the sum of two independent variables: Var[x + y] = Var[x] + Var[y]
Focus on the first two standard ordinary least squares (OLS) premises:
• Error term equal variance premise: Var[e1] = Var[e2] = Var[e3] = Var[e].
• Error term/error term independence premise: The error terms are independent; that is, Cov[et,
ej] = 0 whenever t ≠ j.
Therefore

Var[bx] = Var[βx + ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Var[c + x] = Var[x]:

= Var[((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Var[cx] = c² Var[x]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) Var[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3]

Error term/error term independence premise, Var[x + y] = Var[x] + Var[y]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [Var[(x1 − x̄)e1] + Var[(x2 − x̄)e2] + Var[(x3 − x̄)e3]]

Applying Var[cx] = c² Var[x]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [(x1 − x̄)² Var[e1] + (x2 − x̄)² Var[e2] + (x3 − x̄)² Var[e3]]

Error term equal variance premise, Var[e1] = Var[e2] = Var[e3] = Var[e]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²] Var[e]
Simplifying:

= Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

That is,

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
The variance of the coefficient estimate’s probability distribution equals the variance of the error
term’s probability distribution divided by the sum of squared x deviations.
We will now use the Distribution of Coefficient Estimates simulation to check the equation that
we just derived for the variance of the coefficient estimate’s probability distribution (figure 6.4).
The simulation automatically spreads the x values uniformly between 0 and 30. We will continue
to consider three observations; accordingly, the x values are 5, 15, and 25. To convince yourself
of this, be certain that the Pause checkbox is checked. Click Start and then Continue a few
times to observe that the values of x are always 5, 15, and 25.
Figure 6.4
Distribution of Coefficient Estimates simulation
Next recall the equation we just derived for the variance of the coefficient estimate’s probabil-
ity distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
By default, the variance of the error term’s probability distribution is 500; therefore the numerator
equals 500. Let us turn our attention to the denominator, the sum of squared x deviations. We
have just observed that the x values are 5, 15, and 25. Their mean is 15 and their sum of squared
deviations from the mean is 200:
x̄ = (x1 + x2 + x3)/3 = (5 + 15 + 25)/3 = 45/3 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         5    15   −10      (−10)² = 100
2         15   15   0        (0)² = 0
3         25   15   10       (10)² = 100
                             Sum = 200
Σ_{t=1}^{T} (xt − x̄)² = 200

That is,

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
To show that the simulation confirms this, be certain that the Pause checkbox is cleared
and click Continue. After many, many repetitions click Stop. Indeed, after many, many repeti-
tions of the experiment the variance of the numerical values is about 2.50. The simulation
confirms the equation we derived for the variance of the coefficient estimate’s probability
distribution.
6.7 Estimation Procedures and the Estimate’s Probability Distribution: Importance of the Mean
(Center) and Variance (Spread)
Let us review what we learned about estimation procedures when we studied Clint’s opinion
poll in chapter 3:
• Importance of the probability distribution’s mean: Formally, an estimation procedure is
unbiased whenever the mean (center) of the estimate’s probability distribution equals the actual
value. The relative frequency interpretation of probability provides intuition: If the experiment
were repeated many, many times the average of the numerical values of the estimates will equal
the actual value. An unbiased estimation procedure does not systematically underestimate or
overestimate the actual value. If the probability distribution is symmetric, the chances that the
estimate calculated from one repetition of the experiment will be too high equal the chances the
estimate will be too low (figure 6.5).
• Importance of the probability distribution’s variance: When the estimation procedure is
unbiased, the variance of the estimate’s probability distribution’s variance (spread) reveals the
estimate’s reliability; the variance tells us how likely it is that the numerical value of the estimate
calculated from one repetition of the experiment will be close to the actual value (figure 6.6).
When the estimation procedure is unbiased, the variance of the estimate’s probability distribu-
tion determines reliability.
• On the one hand, as the variance decreases, the probability distribution becomes more tightly
cropped around the actual value making it more likely for the estimate to be close to the actual
value.
• On the other hand, as the variance increases, the probability distribution becomes less tightly
cropped around the actual value making it less likely for the estimate to be close to the actual
value.
Figure 6.5
Probability distribution of estimates—Importance of the mean (the distribution is centered on the actual value)
Figure 6.6
Probability distribution of estimates—Importance of the variance (left panel: large variance, estimate is unreliable; right panel: small variance, estimate is reliable)
We will focus on the variance of the coefficient estimate’s probability distribution to explain
what influences its reliability. We will consider three factors:
• Variance of the error term’s probability distribution
• Sample size
• Range of the x’s
6.8.1 Estimate Reliability and the Variance of the Error Term’s Probability Distribution
What is our intuition here? The error term represents the random influences. It is the error term
that introduces uncertainty into the mix. On the one hand, as the variance of the error term’s
probability distribution increases, uncertainty increases; consequently the available information
becomes less reliable, and we would expect the coefficient estimate to become less reliable. On
the other hand, as the variance of the error term’s probability distribution decreases, the available
information becomes more reliable, and we would expect the coefficient estimate to become
more reliable.
To justify this intuition, recall the equation for the variance of the coefficient estimate’s prob-
ability distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
The variance of the coefficient estimate’s probability distribution is directly proportional to the
variance of the error term’s probability distribution.
We will use the Distribution of Coefficient Estimates simulation to confirm the role played by
the variance of the error term’s probability distribution. To do so, check the From–To checkbox.
Two lists now appear: a From list and a To list. Initially, 1.0 is selected in the From list and 3.0
in the To list. Consequently the simulation will report the percent of repetitions in which the
coefficient estimate falls between 1.0 and 3.0. Since the default value for the actual coefficient,
βx, equals 2.0, the simulation reports on the percent of repetitions in which the coefficient esti-
mate falls within 1.0 of the actual value. The simulation reports the percent of repetitions in
which the coefficient estimate is “close to” the actual value where “close to” is considered to
be within 1.0.
By default, the variance of the error term’s probability distribution equals 500 and the sample
size equals 3. Recall that the sum of the squared x deviations equals 200; therefore the variance of the coefficient estimate's probability distribution equals 2.50:

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
Be certain that the Pause checkbox is cleared. Click Start, and then after many, many repetitions,
click Stop. As table 6.3 reports, the coefficient estimate lies within 1.0 of the actual coefficient
value in 47.3 percent of the repetitions.
Now reduce the variance of the error term's probability distribution from 500 to 50. The variance of the coefficient estimate's probability distribution now equals 0.25:

Var[bx] = 50/200 = 0.25
Click Start, and then after many, many repetitions click Stop. The histogram of the coefficient
estimates is now more closely cropped around the actual value, 2.0. The percent of repetitions
Table 6.3
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
in which the coefficient estimate lies within 1.0 of the actual coefficient value rises from 47.3
percent to 95.5 percent.
Why is this increase important? The variance measures the spread of the probability distribu-
tion. This is important when the estimation procedure is unbiased. As the variance decreases,
the probability distribution becomes more closely cropped around the actual coefficient value
and the chances that the coefficient estimate obtained from one quiz will lie close to the actual
value increase. The simulation confirms this; after many, many repetitions the percent of repeti-
tions in which the coefficient estimate lies between 1.0 and 3.0 increases from 47.3 percent to
95.5 percent. Consequently, as the error term’s variance decreases, we can expect the estimate
from one quiz to be more reliable. As the variance of the error term’s probability distribution
decreases, the estimate is more likely to be close to the actual value. This is consistent with our
intuition, is it not?
Next we will investigate the effect of the sample size, the number of observations used to cal-
culate the estimate. Increase the sample size from 3 to 5. What does our intuition suggest? As
we increase the number of observations, we will have more information. With more information
the estimate should become more reliable; that is, with more information the variance of the
coefficient estimate’s probability distribution should decrease. Using the equation, let us now
calculate the variance of the coefficient estimate’s probability distribution when there are 5
observations. With 5 observations the x values are spread uniformly at 3, 9, 15, 21, and 27; the
mean (average) of the x’s, x– , equals 15 and the sum of the squared x deviations equals 360:
x̄ = (x1 + x2 + x3 + x4 + x5)/5 = (3 + 9 + 15 + 21 + 27)/5 = 75/5 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         3    15   −12      (−12)² = 144
2         9    15   −6       (−6)² = 36
3         15   15   0        (0)² = 0
4         21   15   6        (6)² = 36
5         27   15   12       (12)² = 144
                             Sum = 360
Σ_{t=1}^{T} (xt − x̄)² = 360
Applying the equation for the variance of the coefficient estimate's probability distribution obtains
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)² + (x4 − x̄)² + (x5 − x̄)²]

= 50 / [(3 − 15)² + (9 − 15)² + (15 − 15)² + (21 − 15)² + (27 − 15)²]

= 50 / [(−12)² + (−6)² + (0)² + (6)² + (12)²] = 50 / (144 + 36 + 0 + 36 + 144) = 50/360 = 0.1388 . . . ≈ 0.14
The variance of the coefficient estimate’s probability distribution falls from 0.25 to 0.14. The
smaller variance suggests that the coefficient estimate will be more reliable.
Are our intuition and calculations supported by the simulation? The answer is in fact yes.
Note that the sample size has increased from 3 to 5. Click Start, and then after many, many
repetitions click Stop (table 6.4).
After many, many repetitions the percent of repetitions in which the coefficient estimate lies
between 1.0 and 3.0 increases from 95.5 percent to 99.3 percent. As the sample size increases,
we can expect the estimate from one quiz to be more reliable. As the sample size increases, the
estimate is more likely to be close to the actual value.
Let us again begin by appealing to our intuition. As the range of x’s becomes smaller, we are
basing our estimates on less variation in the x’s, less diversity; accordingly we are basing our
Table 6.4
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
estimates on less information. As the range becomes smaller, the estimate should become less
reliable, and consequently the variance of the coefficient estimate’s probability distribution
should increase. To confirm this, increase the minimum value of x from 0 to 10 and decrease
the maximum value from 30 to 20. The five x values are now spread uniformly between 10 and
20 at 11, 13, 15, 17, and 19; the mean (average) of the x’s, x– , equals 15 and the sum of the
squared x deviations equals 40:
x̄ = (x1 + x2 + x3 + x4 + x5)/5 = (11 + 13 + 15 + 17 + 19)/5 = 75/5 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         11   15   −4       (−4)² = 16
2         13   15   −2       (−2)² = 4
3         15   15   0        (0)² = 0
4         17   15   2        (2)² = 4
5         19   15   4        (4)² = 16
                             Sum = 40
Σ_{t=1}^{T} (xt − x̄)² = 40
Applying the equation for the variance of the coefficient estimate's probability distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)² + (x4 − x̄)² + (x5 − x̄)²]

= 50 / [(11 − 15)² + (13 − 15)² + (15 − 15)² + (17 − 15)² + (19 − 15)²]

= 50 / [(−4)² + (−2)² + (0)² + (2)² + (4)²] = 50 / (16 + 4 + 0 + 4 + 16) = 50/40 = 5/4 = 1.25
Table 6.5
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
The variance of the coefficient estimate’s probability distribution increases from about 0.14
to 1.25.
After changing the minimum value of x to 10 and the maximum value to 20, click the Start,
and then after many, many repetitions click Stop.
After many, many repetitions the percent of repetitions in which the coefficient estimate lies
between 1.0 and 3.0 decreases from 99.3 percent to 62.8 percent (table 6.5). An estimate from
one repetition will be less reliable. As the range of the x’s decreases, the estimate is less likely
to be close to the actual value.
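The reliability numbers reported above can be approximated without running the lab, under the assumption that the coefficient estimate's probability distribution is normal (which holds when the error terms themselves are normally distributed). The sketch below, with helper names of our own choosing, computes Var[bx] = Var[e]/Σ_{t=1}^{T}(xt − x̄)² for each setting discussed and the implied chance that bx lands within 1.0 of the actual value.

import math
import numpy as np

def var_bx(x_values, var_e):
    # Variance of the coefficient estimate: Var[e] divided by the sum of squared x deviations
    x = np.asarray(x_values, dtype=float)
    return var_e / ((x - x.mean()) ** 2).sum()

settings = [
    ("Var[e] = 500, x = 5, 15, 25",          [5, 15, 25],          500.0),
    ("Var[e] = 50,  x = 5, 15, 25",          [5, 15, 25],          50.0),
    ("Var[e] = 50,  x = 3, 9, 15, 21, 27",   [3, 9, 15, 21, 27],   50.0),
    ("Var[e] = 50,  x = 11, 13, 15, 17, 19", [11, 13, 15, 17, 19], 50.0),
]

for label, x, var_e in settings:
    v = var_bx(x, var_e)
    # If bx is normally distributed around the actual value, P(|bx - actual| < 1.0) = erf(1/sqrt(2v))
    within = math.erf(1.0 / math.sqrt(2.0 * v))
    print(f"{label}: Var[bx] = {v:.2f}, chance within 1.0 of actual = {within:.1%}")

These come out to roughly 2.50 and 47 percent, 0.25 and 95 percent, 0.14 and 99 percent, and 1.25 and 63 percent, in line with the simulation results reported above.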
Our simulation results illustrate relationships between information, the variance of the coefficient
estimate’s probability distribution, and the reliability of an estimate:
More and/or more reliable information → variance of the coefficient estimate's probability distribution smaller → estimate more reliable; more likely the estimate is "close to" the actual value.

Less and/or less reliable information → variance of the coefficient estimate's probability distribution larger → estimate less reliable; less likely the estimate is "close to" the actual value.
In chapter 5 we introduced the mechanics of the ordinary least squares (OLS) estimation pro-
cedure and in this chapter we analyzed the procedure’s properties. Why have we devoted so
much attention to this particular estimation procedure? The reason is straightforward. When the
standard ordinary least squares (OLS) premises are satisfied, no other linear estimation procedure
produces more reliable estimates. In other words, the ordinary least squares (OLS) estimation
procedure is the best linear unbiased estimation procedure (BLUE). Let us now explain this
more carefully.
If an estimation procedure is the best linear unbiased estimation procedure (BLUE), it must
exhibit three properties:
• The estimate must be a linear function of the dependent variable, the yt’s.
• The estimation procedure must be unbiased; that is, the mean of the estimate’s probability
distribution must equal the actual value.
• No other linear unbiased estimation procedure can be more reliable; that is, the variance of
the estimate’s probability distribution when using any other linear unbiased estimation procedure
cannot be less than the variance when the best linear unbiased estimation procedure is used.
The Gauss–Markov theorem proves that the ordinary least squares (OLS) estimation procedure
is the best linear unbiased estimation procedure.2 We will illustrate the theorem by describing
two other linear unbiased estimation procedures that while unbiased, are not as reliable as the
ordinary least squares (OLS) estimation procedure. Note that while we would never use either
of these estimation procedures to do serious analysis, they are useful pedagogical tools. They
allow us to illustrate what we mean by the best linear unbiased estimation procedure.
We will now consider the Any Two and the Min–Max estimation procedures:
• Any Two estimation procedure: Choose any two points on the scatter diagram (figure 6.7);
draw a straight line through the points. The coefficient estimate equals the slope of this line.
• Min–Max estimation procedure: Choose two specific points on the scatter diagram (figure
6.8); the point with the smallest value of x and the point with the largest value of x; draw a
straight line through the two points. The coefficient estimate equals the slope of this line.
Figure 6.7
Any Two estimation procedure (scatter diagram with x values 3, 9, 15, 21, and 27; a line is drawn through two arbitrarily chosen points)
Figure 6.8
Min–Max estimation procedure (scatter diagram with x values 3, 9, 15, 21, and 27; a line is drawn through the points with the smallest and largest values of x)
Table 6.6
BLUE simulation results, sample size = 5 (actual values and the simulated estimated coefficient values, bx, for each estimation procedure)
Econometrics Lab 6.6: Comparing the Ordinary Least Squares (OLS), Any Two, and Min–Max
Estimation Procedures
We will now use the BLUE simulation in our Econometrics Lab to justify our emphasis on the
ordinary least squares (OLS) estimation procedure.
By default, the sample size equals 5 and the variance of the error term’s probability distribution
equals 500. The From–To values are specified as 1.0 and 3.0 (table 6.6).
Initially the ordinary least squares (OLS) estimation procedure is specified. Be certain that
the Pause checkbox is cleared. Click Start, and then after many, many repetitions click Stop.
For the OLS estimation procedure, the average of the estimated coefficient values equals about
2.0 and the variance 1.4; 60.4 percent of the estimates lie within 1.0 of the actual value. Next
select the Any Two estimation procedure instead of OLS. Click Start, and then after many, many
repetitions click Stop. For the Any Two estimation procedure, the average of the estimated coef-
ficient values equals about 2.0 and the variance 14.0; 29.0 percent of the estimates lie within
1.0 of the actual value. Repeat the process one last time after selecting the Min–Max estimation
procedure; the average equals about 2.0 and the variance 1.7; 55.2 percent of the estimates lie
within 1.0 of the actual value.
Let us summarize:
• In all three cases the average of the coefficient estimates equals 2.0, the actual value; after many,
many repetitions the mean (average) of the estimates equals the actual value. Consequently all
three estimation procedures for the coefficient value appear to be unbiased.
• The variance of the coefficient estimate’s probability distribution is smallest when the ordinary
least squares (OLS) estimation procedure is used. Consequently the ordinary least squares (OLS)
estimation procedure produces the most reliable estimates.
What we have just observed can be generalized. When the standard ordinary least squares (OLS)
regression premises are met, the ordinary least squares (OLS) estimation procedure is the best
linear unbiased estimation procedure because no other linear unbiased estimation procedure
produces estimates that are more reliable.
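The BLUE simulation itself lives in the Econometrics Lab, but a rough Monte Carlo sketch of the same comparison can be written directly. It is ours, not the lab's: it assumes normally distributed error terms, an arbitrary constant of 50, an actual coefficient of 2, Var[e] = 500, and the five x values 3, 9, 15, 21, and 27.

import numpy as np

rng = np.random.default_rng(1)

x = np.array([3.0, 9.0, 15.0, 21.0, 27.0])    # sample size 5
beta_const, beta_x, var_e = 50.0, 2.0, 500.0  # beta_const is an arbitrary choice
reps = 100_000

ols = np.empty(reps)
any_two = np.empty(reps)
min_max = np.empty(reps)
dev_x = x - x.mean()

for r in range(reps):
    y = beta_const + beta_x * x + rng.normal(0.0, np.sqrt(var_e), size=x.size)
    ols[r] = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
    i, j = rng.choice(x.size, size=2, replace=False)       # any two distinct observations
    any_two[r] = (y[j] - y[i]) / (x[j] - x[i])
    min_max[r] = (y[-1] - y[0]) / (x[-1] - x[0])            # smallest and largest x

for name, est in [("OLS", ols), ("Any Two", any_two), ("Min-Max", min_max)]:
    close = np.mean(np.abs(est - beta_x) <= 1.0)
    print(f"{name}: mean {est.mean():.2f}, variance {est.var():.2f}, within 1.0: {close:.1%}")

All three procedures average about 2.0, while the variances come out near 1.4 for OLS, 14 for Any Two, and 1.7 for Min–Max, echoing the simulation results summarized above.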
Chapter 6 Exercises
1. Assume that the standard ordinary least squares (OLS) premises are met. Let (xi, yi) and (xj,
yj) be the values of the explanatory and dependent variables from two different observations.
Also let
bSlope = slope of the straight line connecting the two points representing these two
observations
a. Express bSlope in terms of xi, yi, xj, and yj.
Consider the simple regression model and two different observations, i and j:
yi = βConst + βxxi + ei
yj = βConst + βxxj + ej
b. Using the simple regression model, substitute for yi and yj in the expression for bSlope.
(Assume that xi does not equal xj.)
c. What does the mean of bSlope’s probability distribution, Mean[bSlope], equal?
d. What does the variance of bSlope’s probability distribution, Var[bSlope], equal?
2. Assume that the standard ordinary least squares (OLS) premises are met. Consider the Min–
Max estimation procedure that we simulated in Econometrics Lab 6.6. Assume that:
• The actual coefficient equals 2 (βx = 2).
• The variance of the error term’s probability distribution equals 500 (Var[e] = 500).
• The sample size equals 5, and the values of the x’s equal: 3, 9, 15, 21, and 27.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
4. Using statistical software, generate a new variable that expresses crude oil production in
thousands of gallons per day rather than thousands of barrels per day. Call the new variable
OilProdGallons. Note that there are 42 gallons in 1 barrel.
OilProdGallons = OilProdBarrels*42
Based on this OilProdGallons regression, estimate the effect of a $1 increase in price on the
gallons of oil produced.
c. Do the units in which the dependent variable is measured influence the estimate of how
the explanatory variable affects the dependent variable?
5. Using statistical software, generate a new variable that adds the constant 1,000 to OilProd-
Barrels in every year. Call the new variable OilProdBarrels1000.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999.

GasConst          US gasoline consumption in year t (millions of gallons per day)
PriceDollarst     Real price of gasoline in year t (2000 dollars per gallon)
6. Using statistical software, generate a new variable that expresses the price of gasoline in cents
rather than dollars. Call this new variable PriceCents.
Based on this PriceCents regression, estimate the effect that a 1 cent increase in price has on
the gallons of gasoline demanded.
c. Do the units in which the explanatory variable is measured influence the estimate of how
the explanatory variable affects the dependent variable?
7. Generate a new variable that equals the dollar price of gasoline, PriceDollars, plus 2. Call
this new variable PriceDollarsPlus2.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)²
bx is expressed in terms of the x’s and y’s. We wish to express bx in terms of the x’s, e’s, and βx.
Strategy: Focus on the numerator of the expression for bx and substitute for the y’s to express
the numerator in terms of the x’s, e’s, and βx. As we will shortly show, once we do this, our goal
will be achieved.
We begin with the numerator, Σ_{t=1}^{T} (yt − ȳ)(xt − x̄), and substitute βConst + βx xt + et for yt:

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = Σ_{t=1}^{T} (βConst + βx xt + et − ȳ)(xt − x̄)

Rearranging terms:

= Σ_{t=1}^{T} (βConst − ȳ + βx xt + et)(xt − x̄)

= Σ_{t=1}^{T} (βConst + βx x̄ − ȳ + βx xt − βx x̄ + et)(xt − x̄)

Simplifying:

= Σ_{t=1}^{T} [(βConst + βx x̄ − ȳ) + βx(xt − x̄) + et](xt − x̄)

= Σ_{t=1}^{T} (βConst + βx x̄ − ȳ)(xt − x̄) + Σ_{t=1}^{T} βx(xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

= (βConst + βx x̄ − ȳ) Σ_{t=1}^{T} (xt − x̄) + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et
Focus on the sum Σ_{t=1}^{T} (xt − x̄) that appears in the first term:

Σ_{t=1}^{T} (xt − x̄) = Σ_{t=1}^{T} xt − Σ_{t=1}^{T} x̄

Replacing Σ_{t=1}^{T} x̄ with Tx̄:

= Σ_{t=1}^{T} xt − Tx̄

Since x̄ = Σ_{t=1}^{T} xt / T:

= Σ_{t=1}^{T} xt − T(Σ_{t=1}^{T} xt / T)

Simplifying:

= Σ_{t=1}^{T} xt − Σ_{t=1}^{T} xt = 0
Next return to the expression for the numerator, Σ_{t=1}^{T} (yt − ȳ)(xt − x̄):

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = (βConst + βx x̄ − ȳ) Σ_{t=1}^{T} (xt − x̄) + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

Since Σ_{t=1}^{T} (xt − x̄) = 0, the first term vanishes:

= 0 + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

Therefore

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et
Substituting this expression for the numerator into the equation for bx:

bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)²

= [βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et] / Σ_{t=1}^{T} (xt − x̄)²

= βx Σ_{t=1}^{T} (xt − x̄)² / Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et / Σ_{t=1}^{T} (xt − x̄)²

= βx + Σ_{t=1}^{T} (xt − x̄)et / Σ_{t=1}^{T} (xt − x̄)²
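As a quick sanity check on this algebra, the following sketch, our own with arbitrary parameter values and normally distributed errors, computes bx both ways for one simulated sample; the two expressions agree up to rounding error.

import numpy as np

rng = np.random.default_rng(2)

x = np.array([5.0, 15.0, 25.0])
beta_const, beta_x = 50.0, 2.0            # arbitrary actual values
e = rng.normal(0.0, 10.0, size=x.size)    # one draw of the error terms
y = beta_const + beta_x * x + e

dev_x = x - x.mean()
b_x_from_data = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
b_x_from_errors = beta_x + (dev_x * e).sum() / (dev_x ** 2).sum()

print(b_x_from_data, b_x_from_errors)     # identical up to floating point rounding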
Gauss–Markov theorem: When the standard ordinary least squares (OLS) premises are satisfied, the ordinary least squares (OLS) estimation procedure is the best linear unbiased estimation procedure.
Proof: Let bxOLS denote the ordinary least squares (OLS) estimate:

bxOLS = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²

where

wtOLS = (xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²
3. To reduce potential confusion, the summation index in the denominator has been changed from t to i.
Let wtOLS equal the ordinary least squares (OLS) "linear weights"; more specifically,

bxOLS = Σ_{t=1}^{T} wtOLS (yt − ȳ)

The weights exhibit two properties:

• Σ_{t=1}^{T} wtOLS = 0

• Σ_{t=1}^{T} wtOLS (xt − x̄) = 1
First, Σ_{t=1}^{T} wtOLS = 0:

Σ_{t=1}^{T} wtOLS = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²]

= Σ_{t=1}^{T} (xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt − Σ_{t=1}^{T} x̄] / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt − Tx̄] / Σ_{i=1}^{T} (xi − x̄)²

and since x̄ = Σ_{t=1}^{T} xt / T,

= [Σ_{t=1}^{T} xt − T(Σ_{t=1}^{T} xt / T)] / Σ_{i=1}^{T} (xi − x̄)²

Simplifying:

= [Σ_{t=1}^{T} xt − Σ_{t=1}^{T} xt] / Σ_{i=1}^{T} (xi − x̄)² = 0
Second, Σ_{t=1}^{T} wtOLS (xt − x̄) = 1:

Σ_{t=1}^{T} wtOLS (xt − x̄) = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²](xt − x̄)

Simplifying:

= Σ_{t=1}^{T} (xt − x̄)² / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} (xt − x̄)²] / Σ_{i=1}^{T} (xi − x̄)² = 1
Next consider a new linear estimation procedure whose weights are wtOLS + w′t. Only when each w′t equals 0 will this procedure be identical to the ordinary least squares (OLS) estimation procedure. Let b′x equal the coefficient estimate calculated using this new linear estimation procedure:

b′x = Σ_{t=1}^{T} (wtOLS + w′t)yt

Substituting yt = βConst + βx xt + et:

= Σ_{t=1}^{T} (wtOLS + w′t)(βConst + βx xt + et)
Multiplying through:

= Σ_{t=1}^{T} (wtOLS + w′t)βConst + Σ_{t=1}^{T} (wtOLS + w′t)βx xt + Σ_{t=1}^{T} (wtOLS + w′t)et

Factoring out βConst from the first term and βx from the second:

= βConst[Σ_{t=1}^{T} wtOLS + Σ_{t=1}^{T} w′t] + βx[Σ_{t=1}^{T} wtOLS xt + Σ_{t=1}^{T} w′t xt] + Σ_{t=1}^{T} (wtOLS + w′t)et

since Σ_{t=1}^{T} wtOLS = 0 and Σ_{t=1}^{T} wtOLS xt = 1:

Therefore

b′x = βConst Σ_{t=1}^{T} w′t + βx[1 + Σ_{t=1}^{T} w′t xt] + Σ_{t=1}^{T} (wtOLS + w′t)et

Now calculate the mean of the new estimate's probability distribution, Mean[b′x]. Focusing on the last term, since the error terms represent random influences, Mean[et] = 0; hence the last term contributes nothing to the mean. For the new procedure to be unbiased, the mean of its probability distribution must equal the actual value:

Mean[b′x] = βx
Therefore

Σ_{t=1}^{T} w′t = 0 and Σ_{t=1}^{T} w′t xt = 0
Next turn to the variance of the new estimate's probability distribution. The first two terms of b′x are constants and the error terms are independent, so

Var[b′x] = Σ_{t=1}^{T} Var[(wtOLS + w′t)et] = Σ_{t=1}^{T} (wtOLS + w′t)² Var[e] = [Σ_{t=1}^{T} (wtOLS)² + 2 Σ_{t=1}^{T} wtOLS w′t + Σ_{t=1}^{T} (w′t)²] Var[e]
Now focus on the cross product terms, Σ_{t=1}^{T} wtOLS w′t:

Σ_{t=1}^{T} wtOLS w′t = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²] w′t
= Σ_{t=1}^{T} (xt − x̄)w′t / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt w′t − Σ_{t=1}^{T} x̄ w′t] / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt w′t − x̄ Σ_{t=1}^{T} w′t] / Σ_{i=1}^{T} (xi − x̄)²

since Σ_{t=1}^{T} xt w′t = 0 and Σ_{t=1}^{T} w′t = 0,

= (0 − 0) / Σ_{i=1}^{T} (xi − x̄)² = 0
Therefore, since Σ_{t=1}^{T} wtOLS w′t = 0, the cross product terms drop out:

Var[b′x] = [Σ_{t=1}^{T} (wtOLS)² + Σ_{t=1}^{T} (w′t)²] Var[e] = Var[bxOLS] + Σ_{t=1}^{T} (w′t)² Var[e]

The variance of any other linear unbiased estimation procedure can be no less than the variance of the ordinary least squares (OLS) procedure; the two variances are equal only when every w′t equals 0, that is, only when the new procedure is the ordinary least squares (OLS) procedure itself.
Chapter 7 Outline
7.1 Review
7.1.1 Clint’s Assignment
7.1.2 General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
7.1.3 Importance of the Coefficient Estimate’s Probability Distribution
7.2 Strategy to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
7.3 Step 1: Estimate the Variance of the Error Term’s Probability Distribution
7.3.1 First Attempt: Variance of the Error Term’s Numerical Values
7.3.2 Second Attempt: Variance of the Residual’s Numerical Values
7.3.3 Third Attempt: “Adjusted” Variance of the Residual’s Numerical Values
7.4 Step 2: Use the Estimated Variance of the Error Term’s Probability Distribution to Estimate
the Variance of the Coefficient Estimate’s Probability Distribution
b. In reality, we do not know the actual value of the constant and coefficient. We used the
ordinary least squares (OLS) estimation procedure to estimate their values. The estimated
constant was 63 and the estimated value of the coefficient was 6/5. Fill in the blanks below
to calculate each student’s residual and the residual squared. Then, compute the sum of the
squared residuals.
bConst = 63, bx = 6/5 = 1.2, Rest = yt − (bConst + bxxt)

Student   xt   yt   Estyt = 63 + (6/5)xt      Rest 1st quiz   Res²t 1st quiz
1         5    66   63 + (6/5) × ___ = ____   _____           _____
2         15   87   63 + (6/5) × ___ = ____   _____           _____
3         25   90   63 + (6/5) × ___ = ____   _____           _____
                                                              Sum = _____
c. Compare the sum of squared errors with the sum of squared residuals.
d. In general, when applying the ordinary least squares (OLS) estimation procedure could
the sum of squared residuals ever exceed the sum of squared errors? Explain.
3. Suppose that student 2 had missed Professor Lord’s quiz.
Student xt yt
1 5 66 xt = minutes studied
3 25 90 yt = quiz score
7.1 Review
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz scores.
7.1.2 General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
An estimate’s probability distribution describes the general properties of the estimation proce-
dure. In the last chapter we showed that when the standard ordinary least squares (OLS) premises
are met, the mean of the coefficient estimate’s probability distribution equals the actual value,
βx, and the variance equals the variance of the error term’s probability distribution divided by
the sum of the squared x deviations, Var[e] / Σ_{t=1}^{T} (xt − x̄)²:
Mean[bx] = βx,   Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
Let us now review the importance of the mean and variance. In general, the mean and variance
of the coefficient estimate’s probability distribution play important roles:
• Mean: When the mean of the estimate’s probability distribution equals the actual value the
estimation procedure is unbiased. An unbiased estimation procedure does not systematically
underestimate or overestimate the actual value.
• Variance: When the estimation procedure is unbiased, the variance of the estimate’s prob-
ability distribution determines the reliability of the estimate. As the variance decreases, the
probability distribution becomes more tightly cropped around the actual value making it more
likely for the coefficient estimate to be close to the actual value:
We can apply these general concepts to the ordinary least squares (OLS) estimation procedure.
The mean of the coefficient estimate’s probability distribution, Mean[bx], equals the actual value
of the coefficient, βx; consequently the ordinary least squares (OLS) estimation procedure is
unbiased. The variance of the coefficient estimate’s probability distribution is now important;
the variance determines the reliability of the estimate. What does the variance equal? We derived
the equation for the variance:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
But neither Clint nor we know the variance of the error term’s probability distribution, Var[e].
How then can the variance of the coefficient estimate's probability distribution be calculated?
How can Clint proceed? When Clint was faced with a similar problem before,
what did he do? Clint used the econometrician’s philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
What information does Clint have? Clint has the data from Professor Lord’s first quiz
(table 7.1).
Table 7.1
First quiz results

Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
How can Clint use this information to estimate the variance of the coefficient estimate’s probabil-
ity distribution?
7.2 Strategy to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
Clint needs a procedure to estimate the variance of the coefficient estimate’s probability distribu-
tion. Ideally this procedure should be unbiased. That is, it should not systematically underesti-
mate or overestimate the actual variance. His approach will be based on the relationship between
the variance of the coefficient estimate’s probability distribution and the variance of the error
term’s probability distribution that we derived in chapter 6:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
Clint’s strategy is to replace the actual variances in this equation with estimated variances:
EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²
where
Step 1: Clint estimates the variance of the error term’s probability distribution.
Step 2: Clint uses the estimate for the variance of the error term’s probability distribution to
estimate the variance for the coefficient estimate’s probability distribution.
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between the
probability distribution from the available variances of coefficient estimate’s and
information—data from the first quiz error term’s probability distributions
↓
Var[ e]
Var[ bx ] =
∑
T
EstVar[e] t =1
( xt − x ) 2
EstVar[ e]
EstVar[ bx ] =
∑
T
t =1
( xt − x ) 2
7.3 Step 1: Estimating the Variance of the Error Term’s Probability Distribution
The data from Professor Lord’s first quiz is the only available information that Clint can use to
estimate the variance of the error term’s probability distribution, Var[e].
We will now describe three attempts to estimate the variance using the results of Professor
Lord’s first quiz by calculating the following:
1. Variance of the error term’s numerical values from the first quiz.
2. Variance of the residual’s numerical values from the first quiz.
3. “Adjusted” variance of the residual’s numerical values from the first quiz.
In each case we will use simulations to assess these attempts by exploiting the relative frequency
interpretation of probability:
Relative frequency interpretation of probability: After many, many repetitions of the experiment,
the distribution of the numerical values from the experiments mirrors the random variable’s
probability distribution; the two distributions are identical:
After many, many repetitions: distribution of the numerical values = probability distribution
The first two attempts fail. Nevertheless, they provide the motivation for the third attempt which
succeeds. Even though the first two attempts fail, it is instructive to explore them.
7.3.1 First Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
Variance of the Error Term’s Numerical Values from the First Quiz
In reality Clint cannot observe the actual parameters, βConst and βx, but for the moment, assume
that we know them. If we were privy to the actual parameters, we would be able to calculate
the actual numerical values of the error terms for each of our three students from the first quiz:
Student 1’s error term Student 2’s error term Student 3’s error term
↓ ↓ ↓
e1 = y1 − (βConst + βxx1) e2 = y2 − (βConst + βxx2) e3 = y3 − (βConst + βxx3)
How could we use these three numerical values for the error terms from the first quiz to
estimate the variance of the error term’s probability distribution? Why not calculate the variance
of the numerical values of the three error terms and then use that variance to estimate the vari-
ance of the error term’s probability distribution? That is,
Recall that the variance is the average of the squared deviations from the mean:
We address this question by using the Estimating Variances simulation in our Econometrics Lab (figure 7.1):
Figure 7.1
Variance of the error term’s probability distribution simulation
By selecting “Err” in the “Use” line and “T” in the “Divide by” line, the simulation mimics the
procedure that we just described to estimate the variance of the error term’s probability distribu-
tion. Also note that the actual variance of the error term’s probability distribution equals 500 by
default.
Be certain that the Pause checkbox is checked and click Start. The simulation reports the sum
of squared errors (SSE) and the estimate for variance of the error term’s probability distribution
(Error Var Est) based on the data for the first repetition:
EstVar[e] = Var[e1, e2, and e3 for 1st repetition] = Sum of squared errors for 1st repetition / T
Convince yourself that the simulation is calculating EstVar[e] correctly by applying the proce-
dure we just outlined. Then click Continue to simulate a second quiz. The simulation now
reports on the estimate for variance of the error term’s probability distribution (Error Var Est)
based on the data for the second repetition:
EstVar[e] = Var[e1, e2, and e3 for 2nd repetition] = Sum of squared errors for 2nd repetition / T
Again, convince yourself that the simulation is calculating EstVar[e] by applying the procedure
we outlined. Also the simulation calculates the mean (average) of the two variance estimates;
the mean of the variance estimates is reported in the Mean line directly below Error Var Est.
Convince yourself that the simulation is calculating the mean of the variance estimates
correctly.
Click Continue a few more times. Note that for some repetitions the estimated variance is
less than the actual variance and sometimes the estimate is greater than the actual. Does this
estimation procedure for the variance systematically underestimate or overestimate the actual
variance or is the estimation procedure unbiased? We can apply the relative frequency
interpretation of probability to address this question by comparing the mean (average) of the
variance estimates with the actual variance after many, many repetitions. If the estimation pro-
cedure is unbiased, the mean of the variance estimates will equal the actual variance of the error
term’s probability distribution, 500 in this case, after many, many repetitions:
Clear the Pause checkbox and click Continue; after many, many repetitions click Stop. The
mean of the estimates for the error term’s variance equals about 500, the actual variance. Next
change the actual variance to 200; click Start, and then after many, many repetitions click Stop.
Again, the mean of the estimates approximately equals the actual value. Finally, change the
actual variance to 50 and repeat the procedure (table 7.2).
The simulation illustrates that this estimation procedure does not systematically underestimate
or overestimate the actual variance; that is, this estimation procedure for the variance of the error
term’s probability distribution is unbiased. But does this help Clint? Unfortunately, it does not.
To calculate the error terms we must know the actual value of the constant, βConst, and the actual
value of the coefficient, βx. In a simulation we can specify the actual values of the parameters,
βConst and βx, but neither Clint nor we know the actual values for Professor Lord’s quiz. After
all, if Clint knew the actual value of the coefficient, he would not need to go through the trouble
of estimating it, would he? The whole problem is that Clint will never know what the actual
value equals, that is why he must estimate it. Consequently this estimation procedure does not
help Clint; he lacks the information to perform the calculations. So, what should he do?
Table 7.2
Error Term Variance simulation results—First attempt
7.3.2 Second Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
Variance of the Residual’s Numerical Values from the First Quiz
Clint cannot calculate the actual values of the error terms because he does not know the actual
values of the parameters, βConst and βx. So he decides to do the next best thing. He has already
used the data from the first quiz to estimate the values of βConst and βx.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)² = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
Clint’s estimate of βConst is 63 and βx is 1.2. Consequently, why not use these estimated values
for the constant and coefficient to estimate the numerical values of error terms for the three
students? In other words, just use the residuals to estimate the error terms:
Then use the variance of the three numerical values of the residuals to estimate the variance of
the error term’s probability distribution:
Recall that the variance equals the average of the squared deviations from the mean:
Clint can easily calculate the variance of the estimated errors when using the residuals to do so:
EstVar[e] = Var[Res1, Res2, and Res3 for 1st quiz]

= [(Res1 − Mean[Res])² + (Res2 − Mean[Res])² + (Res3 − Mean[Res])²] / 3

= (Res1² + Res2² + Res3²) / 3

= Sum of squared residuals for 1st quiz / 3 = SSR for 1st quiz / 3 = 54/3 = 18
The good news is that Clint can indeed perform these calculations. He can calculate the
residuals and therefore can estimate the variance of the error term’s probability distribution using
this procedure. Unfortunately, there is also some bad news. This estimation procedure is biased;
it systematically underestimates the variance of the error term’s probability distribution.
1. Using a little algebra, we can in fact show that the mean of the residuals must always equal 0 when we use the
ordinary least squares (OLS) estimation procedure.
Note that the “Res” is selected in the “Use” line, indicating that the variance of the residuals
rather than the error terms will be used to estimate the variance of the error term’s probability
distribution. As before, the actual variance of the error term’s probability distribution is specified
as 500 by default. Be certain that the Pause checkbox is cleared; click Start and after many,
many repetitions click Stop. The mean (average) of the estimates for the variance equals about
167 while the actual variance of the error term is 500. Next select a variance of 200 and then
50 and repeat the process. Convince yourself that this procedure consistently underestimates the
variance.
The mean of the estimates is less than the actual values; this estimation procedure is biased
downward (table 7.3). This estimation procedure systematically underestimates the variance of
the error term’s probability distribution.
Econometrics Lab 7.3: Comparing the Sum of Squared Residuals and the Sum of Squared Errors
To understand why this estimation procedure is biased downward, we will return to the Estimat-
ing Variances simulation.
This time, be certain that the Pause checkbox is checked and then click Start. Note that both
the sum of squared errors and the sum of squared residuals are reported. Which is less in the
first repetition? Click the Continue button to run the second repetition. Which sum is less in
the second repetition? Continue to do this until you recognize the pattern that is emerging. The
sum of squared residuals is always less than the sum of squared errors. Why?
Table 7.3
Error Term Variance simulation results—Second attempt
Recall how bConst and bx were chosen. They were chosen so as to minimize the sum of squared
residuals:
SSR = Res1² + Res2² + Res3² = (y1 − bConst − bxx1)² + (y2 − bConst − bxx2)² + (y3 − bConst − bxx3)²
The sum of squared residuals, Res1² + Res2² + Res3², would equal the actual sum of squared errors, e1² + e2² + e3², only if bConst equaled βConst and bx equaled βx:

Only if bConst = βConst and bx = βx
↓
Res1² + Res2² + Res3² = e1² + e2² + e3²
As a consequence of random influences we can never expect the estimates to equal the actual
values, however. That is, we must expect the sum of squared residuals to be less than the sum
of squared errors:
Typically bConst ≠ βConst and bx ≠ βx
↓
Res1² + Res2² + Res3² < e1² + e2² + e3²
Divide both sides of the inequality by 3 to compare the variance of the Res’s and e’s:
(Res1² + Res2² + Res3²) / 3 < (e1² + e2² + e3²) / 3
↓
Var[Res1, Res2, and Res3] < Var[e1, e2, and e3]
The variance of the residuals will be less than the variance of the actual errors. Recall our
first attempt to estimate the variance of the error term’s probability distribution. When we used
the variance of the actual errors, the procedure was unbiased:
(Res1² + Res2² + Res3²) / 3 < (e1² + e2² + e3²) / 3
↓
Var[Res1, Res2, and Res3]        <        Var[e1, e2, and e3]
↓                                          ↓
Systematically underestimates              Unbiased estimation
the variance                               procedure
Using the variance of the residuals leads to bias because it systematically underestimates the
variance of the error term’s numerical values. So now, what can Clint do?
7.3.3 Third Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
“Adjusted” Variance of the Residual’s Numerical Values from the First Quiz
While we will not provide a mathematical proof, Clint can correct for this bias by calculating
what we will call the “adjusted” variance of the residuals. Instead of dividing the sum of squared
residuals by the sample size, Clint can calculate the adjusted variance by dividing by what are
called the degrees of freedom:
EstVar[ e] = AdjVar[ Res1, Res2, and Res3 for 1st quiz]
Sum of squared residuals for 1st quiz
=
Degrees of freedom
where
The degrees of freedom equal the sample size less the number of estimated parameters. For the
time being, do not worry about precisely what the degrees of freedom represent and why they
solve the problem of bias. We will motivate the rationale later in this chapter. We do not wish
to be distracted from Clint’s efforts to estimate the variance of the error term’s probability dis-
tribution at this time. So let us postpone the rationalization for now. For the moment we will
accept that fact that the degrees of freedom equal 1 in this case:
We subtract 2 because we are estimating the values of 2 parameters: the constant, βConst, and the
coefficient, βx.
Clint has the information necessary to perform the calculations for the adjusted variance of
the residuals. Recall that we have already calculated the sum of squared residuals:
First quiz: bConst = 63, bx = 6/5 = 1.2

Student   xt   yt   Estyt = bConst + bxxt = 63 + (6/5)xt   Rest = yt − Estyt   Res²t
1         5    66   63 + (6/5) × 5 = 69                    66 − 69 = −3        (−3)² = 9
2         15   87   63 + (6/5) × 15 = 81                   87 − 81 = 6         (6)² = 36
3         25   90   63 + (6/5) × 25 = 93                   90 − 93 = −3        (−3)² = 9
                                                           Sum = 0             Sum = 54
So we need only divide the sum, 54, by the degrees of freedom to use the adjusted variance to
estimate the variance of the error term’s probability distribution:
EstVar[e] = AdjVar[Res1, Res2, and Res3 for 1st quiz] = Sum of squared residuals for 1st quiz / Degrees of freedom = 54/1 = 54
We will use the Estimating Variances simulation to illustrate that this third estimation procedure
is unbiased.
In the “Divide by” line, select “T−2” instead of “T.” Since we are estimating two parameters,
the simulation will be dividing by the degrees of freedom instead of the sample size. Initially
the variance of the error term’s probability distribution is specified as 500. Be certain that the
Pause checkbox is cleared; click Start, and then after many, many repetitions click Stop. The
mean (average) of the variance estimates equals about 500, the actual variance. Next repeat the
process by selecting a variance of 200 and then 50. Table 7.4 gives the results. In each case the
mean of the estimates equals the actual value after many, many repetitions. This estimation
procedure proves to be unbiased.
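A short Monte Carlo sketch, ours rather than the lab's, makes the same comparison between the second and third attempts. It assumes normally distributed error terms with Var[e] = 500, an arbitrary constant of 50, an actual coefficient of 2, and the x values 5, 15, and 25.

import numpy as np

rng = np.random.default_rng(3)

x = np.array([5.0, 15.0, 25.0])
beta_const, beta_x, var_e = 50.0, 2.0, 500.0   # beta_const is an arbitrary choice
reps = 200_000
dev_x = x - x.mean()

est_T = np.empty(reps)     # second attempt: divide SSR by the sample size
est_df = np.empty(reps)    # third attempt: divide SSR by the degrees of freedom
for r in range(reps):
    y = beta_const + beta_x * x + rng.normal(0.0, np.sqrt(var_e), size=x.size)
    b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
    b_const = y.mean() - b_x * x.mean()
    res = y - (b_const + b_x * x)
    ssr = (res ** 2).sum()
    est_T[r] = ssr / 3
    est_df[r] = ssr / (3 - 2)

print("Mean of SSR/T estimates:      ", est_T.mean())   # about 167: biased downward
print("Mean of SSR/(T - 2) estimates:", est_df.mean())  # about 500: unbiased

The SSR/T average settles near 167 while the SSR/(T − 2) average settles near 500, the actual variance, mirroring the simulation results reported above.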
Table 7.4
Error Term Variance simulation results—Third attempt
7.4 Step 2: Use the Estimate for the Variance of the Error Term’s Probability Distribution to
Estimate the Variance of the Coefficient Estimate’s Probability Distribution
At last Clint has found an unbiased estimation procedure for the variance of the error term’s
probability distribution:
But why did he need this estimate in the first place? He needs it to estimate the variance of the
coefficient estimate’s probability distribution in order to assess the reliability of the coefficient
estimate. Recall his two-step strategy:
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between
probability distribution from the available the variances of coefficient estimate’s and
information—data from the first quiz error term’s probability distributions
↓
Var[ e]
Var[ bx ] =
∑
T
EstVar[e] t =1
( xt − x ) 2
EstVar[ e]
EstVar[ bx ] =
∑
T
t =1
( xt − x ) 2
A little arithmetic allows Clint to estimate the variance of the coefficient estimate’s probability
distribution:
EstVar[bx] = EstVar[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

= 54 / [(5 − 15)² + (15 − 15)² + (25 − 15)²]

= 54 / [(−10)² + (0)² + (10)²] = 54 / (100 + 0 + 100) = 54/200 = 0.27
Recall that the standard deviation is the square root of the variance; hence we can calculate the estimated standard deviation by computing the square root of the estimated variance:

Estimated standard deviation of bx = √EstVar[bx] = √0.27 ≈ 0.52
Step 1: Clint estimates the variance of the error term’s probability distribution.
Step 2: Clint uses the estimate for the variance of the error term’s probability distribution to
estimate the variance for the coefficient estimate’s probability distribution.
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between the
probability distribution from the available variances of coefficient estimate’s and error
information—data from the first quiz term’s probability distributions
↓ ↓
EstVar[ e] = AdjVar[ Res’s]
Var[ e]
SSR 54 Var[ bx ] =
∑
T
= = = 54 ( xt − x ) 2
Degrees of freedom 1 t =1
EstVar[ e] 54
EstVar[ bx ] = = = 0.27
∑
T
t =1
( xt − x ) 2 200
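The same two steps can be carried out with a few lines of code. The sketch below simply reproduces Clint's arithmetic on the first-quiz data; the variable names are ours.

import numpy as np

# Professor Lord's first quiz: minutes studied and quiz scores for the three students
x = np.array([5.0, 15.0, 25.0])
y = np.array([66.0, 87.0, 90.0])

dev_x = x - x.mean()
b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()   # 240/200 = 1.2
b_const = y.mean() - b_x * x.mean()                         # 81 - 1.2*15 = 63

res = y - (b_const + b_x * x)                               # residuals: -3, 6, -3
ssr = (res ** 2).sum()                                      # 54
degrees_of_freedom = x.size - 2                             # 3 observations, 2 estimated parameters
est_var_e = ssr / degrees_of_freedom                        # 54
est_var_bx = est_var_e / (dev_x ** 2).sum()                 # 54/200 = 0.27
se_bx = np.sqrt(est_var_bx)                                 # about 0.52

print(b_const, b_x, ssr, est_var_e, est_var_bx, se_bx)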
We have already used a simulation to show that step 1 is justified; that is, we have shown that
the estimation procedure for the variance of the error term’s probability distribution is unbiased.
Now we will justify the step 2 by showing that the estimation procedure for the variance of the
coefficient estimate’s probability distribution is also unbiased. To do so, we will once again
exploit the relative frequency interpretation of probability:
After many, many repetitions: distribution of the numerical values = probability distribution
An estimation procedure is unbiased whenever the mean (average) of the estimated numerical
values equals the actual value after many, many repetitions:
Econometrics Lab 7.5: Is the Estimation Procedure for the Variance of the Coefficient Estimate's
Probability Distribution Unbiased?
We will use our Estimating Variance simulation in the Econometrics Lab to show that this two-
step estimation procedure for the variance of the coefficient estimate’s probability distribution
is unbiased (figure 7.2).
By default, the actual variance of the error term’s probability distribution is 500 and the sample
size is 3. We can now calculate the variance of the coefficient estimate’s probability
distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
From before recall that the sum of x squared deviations equals 200:
Σ_{t=1}^{T} (xt − x̄)² = 200
and that when the variance of the error term's probability distribution is specified as 500 and the sum of the squared x deviations equals 200, the variance of the coefficient estimate's probability distribution equals 2.50:

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
Let us begin by confirming that the simulation is performing the calculations correctly. Be
certain that the Pause button is checked, and then click Start. The sum of squared residuals and
the sum of squared x deviations are reported for the first repetition. Use this information along
with a pocket calculator to compute the estimate for the variance of the coefficient estimate’s
probability distribution, EstVar[bx]:
The simulation reports, for each repetition, EstVar[e] = SSR / Degrees of freedom and EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²; it also reports the mean (average) of the coefficient variance estimates from all repetitions, which addresses whether the estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased.
Figure 7.2
Variance of the coefficient estimate’s probability distribution simulation
EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²

where

EstVar[e] = SSR / Degrees of freedom
Compare your calculation with the simulation’s estimate. You will discover that they are
identical. Next click Continue and perform the same calculation for the second repetition. Again,
you will discover that the simulation has calculated the estimate for the variance of the coefficient
estimate’s probability distribution correctly. Also confirm that the simulation is computing the
mean of the variance estimates correctly by taking the average of the coefficient variance esti-
mates from the first two repetitions.
Click Continue a few more times. The variance estimate should be less than the actual value,
2.50, in some of the repetitions and greater than the actual value in others. Now the critical
question:
Critical question: After many, many repetitions, will the mean (average) of the variance esti-
mates equal the actual variance of the coefficient estimate’s probability distribution?
If the answer is yes, the variance estimation procedure is unbiased; the procedure is not system-
atically overestimating or underestimating the actual variance. If instead the answer is no, the
variance estimation procedure is biased. To answer this question, clear the Pause checkbox
and click Continue. After many, many repetitions click Stop. What do you observe? After
many, many repetitions the average of the coefficient’s variance estimates indeed equals
about 2.50.
Repeat this process after you change the error term variance to 200 and then to 50. As reported
above, the answer to the critical question is yes in all cases. The estimation procedure for the
variance of the coefficient estimate’s probability distribution is unbiased.
7.5.1 Reviewing Our Second and Third Attempts to Estimate the Variance of the Error Term’s
Probability Distribution
Earlier in this chapter we postponed our explanation of degrees of freedom because it would
have interrupted the flow of our discussion. We will now return to the topic by reviewing Clint’s
efforts to estimate the variance of the error term’s probability distribution. Since Clint can never
observe the actual constant, βConst, and the actual coefficient, βx, he cannot calculate the actual
values of the error terms. He can, however, use his estimates for the constant, bConst, and coef-
ficient, bx, to estimate the errors by calculating the residuals:
We can think of the residuals as the estimated “error terms.” Now let us briefly review our second
and third attempts to estimate the variance of the error term’s probability distribution.
In our second attempt we used the variance of the residuals (“estimated errors”) to estimate
the variance of the error term’s probability distribution. The variance is the average of the squared
deviations from the mean:
Since the residuals are the “estimated errors,” it seemed natural to divide the sum of squared
residuals by the sample size, 3 in Clint’s case. Furthermore, since the Mean[Res] = 0,
But we showed that this procedure was biased; the Estimating Variance simulation revealed that
it systematically underestimated the error term’s variance.
We then modified the procedure; instead of dividing by the sample size, we divided by the
degrees of freedom, the sample size less the number of estimated parameters: EstVar[e] = SSR / Degrees of freedom = SSR/(T − 2).
The Estimating Variances simulation illustrated that this modified procedure was unbiased.
Why does dividing by 1 rather than 3 “work?” That is, why do we subtract 2 from the sample
size when calculating the average of the squared residuals (“estimated errors”)? To provide some
intuition, we will briefly revisit Amherst precipitation in the twentieth century (table 7.5).
Calculating the mean for June obtains

Mean (average) for June = (0.75 + 4.54 + . . . + 7.99) / 100 = 377.76/100 = 3.78
Table 7.5
Monthly precipitation in Amherst, MA, during the twentieth century
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
. . .
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
Each of the 100 Junes in the twentieth century provides one piece of information that we use to
calculate the average. To calculate an average, we divide the sum by the number of pieces of
information.
Key principle: To calculate a mean (an average), we divide the sum by the number of pieces
of information:
Mean (average) = Sum / Number of pieces of information
Hence, to calculate the average of the squared deviations, the variance, we must divide by the
number of pieces of information.
Now let us return to our efforts to estimate the variance of the error term’s probability
distribution:
Claim: The degrees of freedom equal the number of pieces of information that are available to
estimate the variance of the error term’s probability distribution.
To justify this claim, suppose that the sample size were 2. Plot the scatter diagram (figure 7.3):
• With only two observations, we only have two points.
• The best fitting line passes directly through each of the two points on the scatter diagram.
• Consequently the two residuals, “the two estimated errors,” for each observation must always
equal 0 when the sample size is 2 regardless of what the actual variance of the error term’s
probability distribution equals:
The first two residuals, “the first two estimated errors,” provide no information about the actual
variance of the error term’s probability distribution because the line fits the data perfectly—both
residuals equal 0. Only with the introduction of a third observation do we get some sense of the
error term’s variance (figure 7.4).
Figure 7.3
Degrees of freedom—Two observations
Figure 7.4
Degrees of freedom—Three observations (left panel: large residuals suggest a large error term variance; right panel: small residuals suggest a small error term variance)
To summarize:
• The first two observations provide no information about the error term; stated differently, the
first two observations provide “zero” information about the error term’s variance.
• The third observation provides the first piece of information about the error term’s variance.
This explains why Clint should divide by 1 to calculate the “average” of the squared devia-
tions. In general, the degrees of freedom equal the number of pieces of information that we have
to estimate the variance of the error term’s probability distribution:
Degrees of freedom = Sample size − Number of estimated parameters
To calculate the average of the sum of squared residuals, we should divide the sum of squared
residuals by the degrees of freedom, the number of pieces of information.
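To see the two-observation case concretely, here is a tiny sketch of our own, using two arbitrary data points: the fitted line passes through both points, so both residuals equal 0 and provide no information about the error term's variance.

import numpy as np

# With only two observations the best fitting line passes through both points,
# so both residuals equal 0 regardless of how large Var[e] actually is.
x = np.array([5.0, 25.0])
y = np.array([66.0, 90.0])     # any two points would do

dev_x = x - x.mean()
b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
b_const = y.mean() - b_x * x.mean()
res = y - (b_const + b_x * x)

print(res)   # [0., 0.] up to floating point rounding: no information about Var[e]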
The ordinary least squares (OLS) estimation procedure actually includes three procedures; a
procedure to estimate the following:
• Regression parameters
• Variance of the error term’s probability distribution
• Variance of the coefficient estimate’s probability distribution
All three estimation procedures are unbiased. (Recall that we are assuming that the standard
ordinary least squares (OLS) premises are met. We will address the importance of these premises
in part IV of the textbook.) We will now review the calculations and then show that statistical
software performs all these calculations for us.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)² = 240/200 = 6/5 = 1.2,   bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
We estimated the variance of the error term's probability distribution, EstVar[e], by dividing the sum of squared residuals by the degrees of freedom:

EstVar[e] = AdjVar[Res1, Res2, Res3] = SSR / Degrees of freedom = 54/1 = 54

The square root of this estimated variance is typically called the standard error of the regression:

SE of regression = √EstVar[e] = √54 ≈ 7.348

Note that the term standard error always refers to the square root of an estimated variance. Next, we applied the relationship between the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution to estimate the former:

EstVar[bx] = EstVar[e] / Σ(xt − x̄)² = 54/200 = 0.27

The square root of this estimated variance of the coefficient estimate's probability distribution is called the standard error of the coefficient estimate, SE[bx]:

SE[bx] = √EstVar[bx] = √0.27 ≈ 0.5196
We illustrated that these estimation procedures have nice properties. When the standard ordi-
nary least squares (OLS) premises are satisfied:
• Each of these procedures is unbiased.
• The procedure to estimate the value of the parameters is the best linear unbiased estimation
procedure (BLUE).
In reality, we did not have to make all these laborious calculations. Statistical software performs
these calculations for us thereby saving us the task of performing the arithmetic (table 7.6):
Professor Lord’s first quiz data: Cross-sectional data of minutes studied and quiz scores in
the first quiz for the three students enrolled in Professor Lord’s class.
Table 7.6
Quiz scores’ regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
We previously noted the regression results report the parameter estimates and the sum of squared
residuals. While statistical software typically does not report the estimated variance of the error
term’s probability distribution, it does report the standard error of the regression, SE of regres-
sion, which is just the square root of the estimated variance of the error term’s probability dis-
tribution. We can easily calculate the estimated variance of the error term’s probability distribution
from the regression results by squaring the standard error of the regression:
EstVar[e] = 7.348469² = 54
Similarly, while the statistical software does not report the estimated variance of the coefficient estimate's probability distribution, it does report its standard error. We can easily calculate the estimated variance of the coefficient estimate's probability distribution from the regression results by squaring the standard error of the coefficient estimate:

EstVar[bx] = SE[bx]² = 0.5196² ≈ 0.27
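For readers who want to verify the arithmetic themselves, the following short Python sketch (an illustration, not part of the textbook) reproduces the calculations summarized above for the first-quiz data: the coefficient and constant estimates, the estimated variance of the error term's probability distribution, the standard error of the regression, and the standard error of the coefficient estimate.

import numpy as np

x = np.array([5.0, 15.0, 25.0])      # minutes studied
y = np.array([66.0, 87.0, 90.0])     # quiz scores

bx = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)   # 240/200 = 1.2
bconst = y.mean() - bx * x.mean()                                            # 81 - 1.2*15 = 63

ssr = np.sum((y - (bconst + bx * x)) ** 2)             # sum of squared residuals = 54
df = len(x) - 2                                        # sample size less number of estimated parameters = 1
est_var_e = ssr / df                                   # EstVar[e] = 54
se_regression = np.sqrt(est_var_e)                     # standard error of the regression, about 7.348
est_var_bx = est_var_e / np.sum((x - x.mean()) ** 2)   # EstVar[bx] = 54/200 = 0.27
se_bx = np.sqrt(est_var_bx)                            # SE[bx], about 0.5196

print(bx, bconst, est_var_e, se_regression, est_var_bx, se_bx)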
Chapter 7 Review Questions

1. Consider the ordinary least squares (OLS) estimation procedure. How is the variance of the
coefficient estimate’s probability distribution related to the variance of the error term’s probabil-
ity distribution?
2. What strategy have we used to estimate the variance of the coefficient estimate’s probability
distribution?
3. Consider our first attempt to estimate the variance of the error term’s probability
distribution:
EstVar[e] = Var[e1, e2, . . . , eT] = (e1² + e2² + . . . + eT²) / Sample size = SSE / Sample size
4. Consider our second attempt to estimate the variance of the error term’s probability
distribution:
EstVar[e] = Var[Res1, Res2, . . . , ResT] = (Res1² + Res2² + . . . + ResT²) / Sample size = SSR / Sample size
This attempt succeeded. Explain why it is appropriate to divide by the degrees of freedom rather
than the sample size.
Chapter 7 Exercises
Recall Professor Lord’s colleague who is teaching another course in which three students are
enrolled.
Regression example data: Cross-sectional data of minutes studied and quiz scores from a course
taught by Professor Lord’s colleague.
Student    Minutes studied (x)    Quiz score (y)
1          5                      14
2          10                     44
3          30                     80
yt = βConst + βxxt + et
Using a calculator and the equations we derived in class, apply the least squares estimation
procedure to find the best fitting line by filling in the blanks:
First, calculate the means:
Means:  x̄ = _______________ = _______________
        ȳ = _______________ = _______________

Second, for each student calculate the deviation of x from its mean and the deviation of y from its mean:

Student    yt     ȳ        yt − ȳ     xt     x̄        xt − x̄
1          14     _____    _____      5      _____    _____
2          44     _____    _____      10     _____    _____
3          80     _____    _____      30     _____    _____
Third, calculate the products of the y and x deviations and squared x deviations for each student;
then calculate the sums:
bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)² = _____ / _____ = _________
3. Finally, use the quiz data to estimate the variance and standard deviation of the coefficient
estimate’s probability distribution.
4. Check your answers to exercises 1, 2, and 3 using statistical software.
a. Can you estimate the variance of the error term’s probability distribution, EstVar[e]?
b. Can you estimate the variance of the coefficient estimate’s probability distribution,
EstVar[bx]?
c. Can you calculate Σ(xt − x̄)²?
d. Can you calculate Σ(yt − ȳ)(xt − x̄)?
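One way to "check your answers using statistical software" outside of EViews is the Python statsmodels package. The sketch below is only an illustration (it is not the textbook's procedure); it reports the constant and coefficient estimates, their standard errors, EstVar[e], and the two sums used in the hand calculations.

import numpy as np
import statsmodels.api as sm

x = np.array([5.0, 10.0, 30.0])      # minutes studied
y = np.array([14.0, 44.0, 80.0])     # quiz scores

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.params)                                # bConst and bx
print(results.bse)                                   # standard errors of the estimates
print(results.mse_resid)                             # SSR divided by the degrees of freedom = EstVar[e]
print(np.sum((x - x.mean()) ** 2))                   # sum of squared x deviations
print(np.sum((y - y.mean()) * (x - x.mean())))       # sum of products of the y and x deviations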
Chapter 8 Outline
1. Run the following simulation and answer the questions posed. Summarize your answers by
filling in the following blanks:
Cynic’s view: Studying has no impact on a student’s quiz score; the positive coefficient estimate
obtained from the first quiz was just “the luck of the draw.” In fact, studying does not affect quiz
scores.
b. If the cynic were correct and studying has no impact on quiz scores, what would the actual
coefficient, βx, equal?
c. Is it possible that the cynic is correct? To help you answer this question, run the following
simulation:
We will begin by taking stock of where Clint stands. Recall the theory he must assess:
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz scores.
Clint uses a simple regression model to assess the theory. Quiz score is the dependent variable
and number of minutes studied is the explanatory variable:
yt = βConst + βxxt + et
where
βConst and βx are the model’s parameters. They incorporate the view that Professor Lord awards
each student some points just for showing up; subsequently, the number of additional points
each student earns depends on how much he/she studied:
• βConst represents the number of points Professor Lord gives a student just for showing up.
• βx represents the number of additional points earned for each additional minute of study.
Since the values of βConst and βx are not observable, Clint adopted the econometrician’s
philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint used the results of the first quiz to estimate the values of βConst and βx by applying the ordinary least squares (OLS) estimation procedure to find the best fitting line:

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90

bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)² = 240/200 = 1.2        bConst = ȳ − bx·x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
Clint’s estimates suggest that Professor Lord gives each student 63 points for showing up; sub-
sequently, each student earns 1.2 additional points for each additional minute studied.
Clint realizes that he cannot expect the coefficient estimate to equal the actual value; in fact,
he is all but certain that it will not. So now Clint must address two related issues:
• Estimate reliability: How reliable is the coefficient estimate, 1.2, calculated from the first
quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to
the actual value?
• Theory assessment: How confident should Clint be that the theory is correct, that studying
improves quiz scores?
We will address both of these issues in this chapter. First, we consider estimate reliability.
Interval estimate question: What is the probability that the estimate, 1.20, lies within ____ of
the actual value? ____
The general properties of the ordinary least squares (OLS) estimation procedure allow us to
address this question. It is important to distinguish between the general properties and one spe-
cific application. Recall that the general properties refer to what we know about the estimation
procedure before the quiz is given; the specific application refers to the numerical values of the
estimates calculated from the results of the first quiz:
Mean and variance describe the center and spread of the estimate's probability distribution.
The estimates are random variables and a quiz can be viewed as an experiment. We cannot
determine the numerical value of an estimate with certainty before the experiment (quiz) is
conducted. What then do we know beforehand? We can describe the probability distribution of
the estimate. We know that the mean of the coefficient estimate's probability distribution equals the actual value of the coefficient and its variance equals the variance of the error term's probability distribution divided by the sum of squared x deviations:

Mean[bx] = βx        Var[bx] = Var[e] / Σ(xt − x̄)²
Both the mean and variance of the coefficient estimate’s probability distribution play a
crucial role:
• Since the mean of the coefficient estimate’s probability distribution, Mean[bx], equals the actual
value of the coefficient, βx, the estimation procedure is unbiased; the estimation procedure does
not systematically underestimate or overestimate the actual coefficient value.
• When the estimation procedure for the coefficient value is unbiased, the variance of the esti-
mate’s probability distribution, Var[bx], determines the reliability of the estimate; as the variance
decreases, the probability distribution becomes more tightly cropped around the actual value;
consequently it becomes more likely for the coefficient estimate to be close to the actual coef-
ficient value.
To assess his estimate’s reliability, Clint must consider the variance of the coefficient esti-
mate’s probability distribution. But we learned that Clint can never determine the actual variance
of the error term’s probability distribution, Var[e]. Instead, Clint employs a two step strategy for
estimating the variance of the coefficient estimate’s probability distribution:
Step 1: Estimate the variance of the error term's probability distribution from the available information, the data from the first quiz:

EstVar[e] = AdjVar[Res's] = SSR / Degrees of freedom = 54/1 = 54

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:

Var[bx] = Var[e] / Σ(xt − x̄)²

EstVar[bx] = EstVar[e] / Σ(xt − x̄)² = 54/200 = 0.27
Unfortunately, there is one last complication before we can address the interval estimate
question.
8.2.1 Normal Distribution versus the Student t-Distribution: One Last Complication
We begin by reviewing the normal distribution. Recall that the variable z played a critical role
in using the normal distribution:
z = (Value of random variable − Distribution mean) / Distribution standard deviation
  = Number of standard deviations from the mean
In words, z equals the number of standard deviations the value lies from the mean. But Clint
does not know what the variance and standard deviation of the coefficient estimate’s probability
distribution equal. That is why he must estimate them. Consequently he cannot use the normal
distribution to calculate probabilities.
When the standard deviation is not known and must be estimated, the Student t-distribution
must be used. The variable t is similar to the variable z; instead of equaling the number of stan-
dard deviations the value lies from the mean, t equals the number of estimated standard devia-
tions the value lies from the mean:
t = (Value of random variable − Distribution mean) / Estimated distribution standard deviation
  = Number of estimated standard deviations from the mean
Recall that the estimated standard deviation is called the standard error; hence
t = (Value of random variable − Distribution mean) / Standard error
  = Number of standard errors from the distribution mean
Like the normal distribution, the t-distribution is symmetric about its mean. Since estimating
the standard deviation introduces an additional element of uncertainty, the Student t-distribution
is more "spread out" than the normal distribution, as illustrated in figure 8.1.

Figure 8.1
Normal and Student t-distributions
How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value? We pose the interval estimate question to address this issue:
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
____ of the actual coefficient value? ____
We begin by filling in the first blank, choosing our “close to” value. The value we choose
depends on how demanding we are; that is, our “close to” value depends on the range that we
consider to be “close to” the actual value. For purposes of illustration, we will choose 1.5; so
we write 1.5 in the first blank.
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
1.5 of the actual coefficient value? ____
Figure 8.2 illustrates the probability distribution of the coefficient estimate and the probability
that we wish to calculate. The estimation procedure we used to calculate the coefficient estimate,
the ordinary least squares (OLS) estimation procedure is unbiased:
Mean[bx] = βx
Consequently we place the actual coefficient value, βx, at the center of the probability
distribution.
As discussed above, we must use the Student t-distribution rather than the normal distribution
since we must estimate the standard deviation of the probability distribution. The regression
results from Professor Lord’s first quiz provide the estimate (table 8.1).
The standard error equals the estimated standard deviation. t equals the number of standard
errors (estimated standard deviations) that the value lies from the distribution mean:
Figure 8.2
Probability distribution of coefficient estimate—"Close to" value equals 1.5 (the interval extends from 1.5 below to 1.5 above the actual value βx)
Table 8.1
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Since the distribution mean equals the actual value, we can “translate” 1.5 below and above the
actual value into t’s. Since the standard error equals 0.5196, 1.5 below and above the actual
value translates into 2.89 standard errors below and above the actual value:
Figure 8.3
Probability distribution of coefficient estimate—"Close to" value equals 1.5 (βx − 1.5 and βx + 1.5 lie 2.89 standard errors, t = ±2.89, from the actual value βx)
To summarize:
We can now use the Econometrics Lab to calculate the probability that the estimate is within
1.5 of the actual value by computing the left and right tail probabilities.1
1. Appendix 8.2 shows how we can use the Student t-distribution table to address the interval estimate question. Since
the table is cumbersome, we will use the Econometrics Lab to do so.
Since the Student t-distribution is symmetric, both the left and right tail probabilities equal
0.11 (figure 8.4). Hence, the probability that the estimate is within 1.5 of the actual value
equals 0.78:
Figure 8.4
Probability distribution of coefficient estimate—Applying the Student t-distribution (each tail probability equals 0.11; the probability that the estimate lies within 1.5, that is, 2.89 standard errors, of the actual value equals 0.78)
We can now fill in the second blank in the interval estimate question:
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
1.5 of the actual coefficient value? 0.78
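The 0.78 figure can also be reproduced without the Econometrics Lab. The sketch below (an illustration, not the lab itself) uses the Student t-distribution in scipy to compute the tail probabilities for a "close to" value of 1.5.

from scipy.stats import t

se = 0.5196                # standard error of the coefficient estimate
df = 1                     # degrees of freedom
close_to = 1.5             # the "close to" value chosen above

t_value = close_to / se                    # about 2.89 standard errors
tail = t.sf(t_value, df)                   # right-tail probability, about 0.106 (the text rounds to 0.11)
prob_within = 1 - 2 * tail                 # about 0.79, essentially the 0.78 reported above
print(t_value, tail, prob_within)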
Hypothesis testing allows Clint to assess how much confidence he should have in the theory.
We begin by motivating hypothesis testing using the same approach as we took with Clint’s
opinion poll. We will play the role of the cynic. Then we will formalize the process.
Recall that the “theory” suggests that a student’s score on the quiz depends on the number of
minutes he/she studies:
yt = βConst + βxxt + et
The theory suggests that βx is positive. Review the regression results for the first quiz (table 8.2).
Table 8.2
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
The estimate for βx, 1.2, is positive. We estimate that an additional minute of studying
increases a student’s quiz score by 1.2 points. This lends support to Clint’s theory. But, how
much confidence should Clint have in the theory? Does this provide definitive evidence that
Clint’s theory is correct, or should we be skeptical? To answer this question, recall our earlier
hypothesis-testing discussion and play the cynic. What would a cynic’s view of our theory and
the regression results be?
Cynic’s view: Studying has no impact on a student’s quiz score; the positive coefficient estimate
obtained from the first quiz was just “the luck of the draw.” In fact, studying has no effect on
quiz scores; the actual coefficient, βx, equals 0.
In the simulation, the default actual coefficient value is 0. Check the From–To checkbox. Also
0 is specified in the From list. In the To list, no value is specified; consequently there is no upper
From–To bound. The From–To Percent box will report the percent of repetitions in which the
coefficient estimate equals 0 or more. Be certain that the “Pause” checkbox is cleared. Click
Start, and then after many, many repetitions click Stop. In about half of the repetitions the
coefficient estimate is positive; that is, when the actual coefficient, βx, equals 0, the estimate is
positive about half the time. The histogram illustrates this. Now, we can apply the relative fre-
quency interpretation of probability. If the actual coefficient were 0, the probability of obtaining
a positive coefficient from one quiz would be about one-half as illustrated in figure 8.5.
Consequently we cannot dismiss the cynic’s view as absurd.
To assess the cynic’s view, we pose the following question:
Question for the cynic: What is the probability that the result would be like the one obtained
(or even stronger), if studying actually has no impact on quiz scores? That is, what is the prob-
ability that the coefficient estimate from the first quiz would be 1.2 or more, if studying had no
impact on quiz scores (if the actual coefficient, βx, equals 0)?
Answer: Prob[Results IF cynic correct].
Figure 8.5
Probability distribution of coefficient estimate—Could the cynic be correct? (If βx = 0, Prob[bx > 0] ≈ 0.50.)
The magnitude of the probability determines the likelihood that the cynic is correct, the likeli-
hood that studying has no impact on quiz scores:
To compute this probability, let us review what we know about the probability distribution of
the coefficient estimate:
Question for the cynic: What is the probability that the coefficient estimate from the first quiz
would be 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, βx,
equaled 0)?
Econometrics Lab 8.3: Using the Econometrics Lab to Calculate Prob[Results IF cynic correct]
Mean: 0
Standard error: 0.5196
Value: 1.2
Degrees of freedom: 1
Click Calculate. The probability that the estimate lies in the right tail equals 0.13. The answer
to the question for the cynic is 0.13 (figure 8.6):
In fact there is an even easier way to compute the probability. We do not even need to use the Econometrics Lab because the statistical software calculates this probability automatically.
To illustrate this, we will first calculate the t-statistic based on the premise that the cynic is
correct, based on the premise that the actual value of the coefficient equals 0:
t = (Value of random variable − Distribution mean) / Standard error = (1.2 − 0) / 0.5196 = 2.309
  = Number of standard errors from the distribution mean
1.2 lies 2.309 standard errors from 0. Next return to the regression results (table 8.3) and focus
attention on the row corresponding to the coefficient and on the “t-Statistic” and “Prob” columns.
Figure 8.6
Probability distribution of coefficient estimate—Prob[Results IF cynic correct] (Student t-distribution with mean 0, SE 0.5196, and 1 degree of freedom; the probability that the estimate lies at or above 1.2 equals 0.13)
Table 8.3
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
• First, the t-Statistic column equals 2.309. This is just the value we calculated above: the t-statistic based on the premise that the cynic is correct and the actual coefficient equals 0. The t-Statistic column reports the number of standard errors the coefficient estimate lies from 0, based on the premise that the actual coefficient equals 0.
• Second, the Prob column equals 0.2601. This is just twice the probability we just calculated
using the Econometrics Lab:
2 × 0.13 = 0.26
The Prob column is based on the premise that the actual coefficient equals 0 and then focuses
on the two tails of the probability distribution where each tail begins 1.2 (the numerical value
of the coefficient estimate) from 0. As figure 8.7 illustrates, the value in the Prob column equals
the probability of lying in the tails; the probability that the estimate resulting from one week’s
quiz lies at least 1.2 from 0 assuming that the actual coefficient, βx, equals 0. That is, the Prob
column reports the tails probability:
Tails probability: The probability that the coefficient estimate, bx, resulting from one regression
would lie at least 1.2 from 0 based on the premise that the actual coefficient, βx, equals 0.
Consequently we do not need to use the Econometrics Lab to answer the question that we
pose for the cynic:
Figure 8.7
Probability distribution of coefficient estimate—Tails probability (Student t-distribution with mean 0, SE 0.5196, and 1 degree of freedom; each tail lying 1.2 or more from 0 has probability 0.2601/2)

Figure 8.8
Probability distribution of coefficient estimate—Prob[Results IF cynic correct] (the right tail at or above 1.2 has probability 0.2601/2)
Question for the cynic: What is the probability that the coefficient estimate from the first quiz
is 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, βx, equals 0)?
Answer: Prob[Results IF cynic correct]
We can use the regression results to answer this question. From the Prob column we know that the tails probability equals 0.2601. As figure 8.8 shows, however, we are only interested in the right tail: the probability that the coefficient estimate will equal 1.2 or more if the actual coefficient equals 0.
Since the Student t-distribution is symmetric, the probability of lying in one of the tails is 0.2601/2. The answer to the question we posed to assess the cynic's view is 0.13:

Prob[Results IF cynic correct] = Tails probability / 2 = 0.2601 / 2 ≈ 0.13
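The numbers appearing in the t-Statistic and Prob columns can be reproduced directly. The sketch below (an illustration, not the statistical software's own code) computes the t-statistic, the tails probability, and the right-tail probability for the first-quiz results.

from scipy.stats import t

estimate = 1.2             # coefficient estimate for minutes studied
se = 0.5196                # its standard error
df = 1                     # degrees of freedom

t_stat = (estimate - 0) / se               # about 2.309, the t-Statistic column
tails_prob = 2 * t.sf(t_stat, df)          # about 0.26, the Prob column
right_tail = tails_prob / 2                # about 0.13 = Prob[Results IF cynic correct]
print(t_stat, tails_prob, right_tail)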
We formalized hypothesis testing in chapter 4 when we considered Clint’s public opinion poll.
We will follow the same steps here, with one exception. We add a step 0 to construct an appro-
priate model to assess the theory.
yt = βConst + βxxt + et
where
yt = quiz score
xt = minutes studied
βConst = points for showing up
βx = points for each minute studied
Step 1: Collect data, run the regression, and interpret the estimates (table 8.4).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, studying has no impact on quiz scores. The results were just
“the luck of the draw.”
Table 8.4
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Now we construct the null and alternative hypotheses. Like the cynic, the null hypothesis challenges the evidence; the alternative hypothesis is consistent with the evidence:

H0: βx = 0    Studying has no impact on quiz scores; the cynic is correct
H1: βx > 0    Additional studying increases quiz scores; the cynic is incorrect
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
We have already calculated this probability. First, we did so using the Econometrics Lab. Then,
we noted that the statistical software had done so automatically. We need only divide the tails
probability, as reported in the Prob column of the regression results, by 2:
Prob[Results IF H0 true] = 0.2601 / 2 ≈ 0.13
The probability that the coefficient estimate in one regression would be 1.2 or more if H0 were
actually true (if the actual coefficient, βx, equals 0) is 0.13.
The significance level is the dividing line between the probability being small and the prob-
ability being large.
Recall that the traditional significance levels used in academia are 1, 5, and 10 percent. Obviously
0.13 is greater than 0.10. Consequently Clint would not reject the null hypothesis that studying
has no impact on quiz scores even with a 10 percent significance level.
Let us sum up what we have learned about the ordinary least squares (OLS) estimation
procedure:
yt = βConst + βxxt + et
where
yt = dependent variable
et = error term
xt = explanatory variable
t = 1, 2, . . . , T
T = sample size
The error term is a random variable; it represents random influences. The mean of each error term's probability distribution equals 0:

Mean[et] = 0 for t = 1, 2, . . . , T
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:

Var[et] = Var[e] for t = 1, 2, . . . , T
8.4.3 Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures
There are three important estimation procedures embedded within the ordinary least squares (OLS) estimation procedure:
• A procedure to estimate the values of the regression parameters, βx and βConst:
bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)²        and        bConst = ȳ − bx·x̄
• A procedure to estimate the variance of the error term’s probability distribution, Var[e]:
EstVar[e] = SSR / Degrees of freedom
• A procedure to estimate the variance of the coefficient estimate's probability distribution, Var[bx]:

EstVar[bx] = EstVar[e] / Σ(xt − x̄)²
8.4.4 Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard
Ordinary Least Squares (OLS) Premises
When the standard ordinary least squares (OLS) premises are met:
• Each estimation procedure is unbiased; each estimation procedure does not systematically
underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
yt = βConst + βxxt + et
where
yt = quiz score
xt = minutes studied
et = error term
βConst = points for showing up
βx = points for each minute studied
Correlation results whenever a causal relationship describes the reality accurately. That is, when
additional studying indeed increases quiz scores, studying and quiz scores will be (positively)
correlated:
• Knowing the number of minutes a student studies allows us to predict his/her quiz score.
• Knowing a student’s quiz score helps us predict the number of minutes he/she has studied.
More generally, a causal model that describes reality accurately implies correlation:
Chapter 8 Exercises
Petroleum consumption data for Nebraska: Annual time series data of petroleum consumption
and prices for Nebraska from 1990 to 1999
where
b. Estimate the parameters of the model. Interpret bP, the estimate for βP.
Consider the reliability of the coefficient estimate.
2. Again, consider Nebraska’s petroleum consumption data in the 1990’s and the model cited
in question 1.
a. What does economic theory teach us about how the real price of petroleum should affect
Nebraska petroleum consumption?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999
a. What does economic theory teach us about how the real price of gasoline should affect
US gasoline consumption?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia
Conventional wisdom suggests that high school dropouts are more likely to smoke cigarettes
than those who graduate.
a. Apply the hypothesis-testing approach that we developed to assess the conventional
wisdom.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the 451 House members of the 110th Congress.
It has been alleged that since the Congress was controlled by Democrats, Democratic members
received more solo earmarks than their non-Democratic colleagues.
a. Apply the hypothesis-testing approach that we developed to assess the allegation.
Wage and age data: Cross-sectional data of wages and ages for 190 union members included
in the March 2007 Current Population Survey who have earned high school degrees, but have
not had any additional education.
Many believe that unions strongly support the seniority system. Some union contracts require
employers to pay workers who have been on the job for many years more than newly hired
workers. Consequently older workers should typically be paid more than younger workers:
Use data from the March 2007 Current Population Survey to investigate the seniority theory.
a. Apply the hypothesis-testing approach that we developed to assess the seniority theory.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
a. What does economic theory teach us about how the real price of crude oil should affect
US crude oil production?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Figure 8.9
Student t-distribution—Right-tail probabilities (α denotes the right-tail probability)
Table 8.5
Right-tail critical values for the Student t-distribution

Table 8.6
Right-tail critical values for the Student t-distribution
Appendix 8.2 Assessing the Reliability of a Coefficient Estimate Using the Student
t-Distribution Table
We begin by describing the Student t-distribution table; a portion of it appears in table 8.6.
The first column represents the degrees of freedom. The numbers in the body of the table are
called the “critical values.” A critical value equals the number of standard errors a value lies
from the mean. The top row specifies the value of α, the "right-tail probability." Figure 8.10 helps us understand the table.
Since the t-distribution is symmetric, the "left-tail probability" also equals α. The probability of lying between the tails, in the center of the distribution, is 1 − 2α. This no doubt sounds con-
fusing, but everything should become clear after we show how Clint can use this table to answer
the interval estimate question.
Interval estimate question: What is the probability that the estimate, 1.2, lies within ____ of
the actual value? ____
Figure 8.10
Student t-distribution—Illustrating the probabilities (each tail lying more than one critical value × SE from the distribution mean has probability α; the center of the distribution has probability 1 − 2α)
Let us review the regression results from Professor Lord’s first quiz:
Next we will modify figure 8.10 to reflect our specific example. Focus on figure 8.11:
• We are interested in the coefficient estimate; consequently we replace the horizontal axis label
by substituting bx for estimate.
• Also we know that the estimation procedure Clint uses, the ordinary least squares (OLS) esti-
mation procedure, is unbiased; hence the distribution mean equals the actual value. We can
replace the distribution mean with the actual coefficient value, βx.
Now let us help Clint fill in the blanks. When using the table we begin by filling in the second
blank rather than the first.
Clint must choose a value for α. As we will see, the value he chooses depends on how demand-
ing he is. For example, suppose that Clint believes that a 0.80 probability of the estimate lying
in the center of the distribution, close to the mean, is good enough. He would then choose an α
equal to 0.10. To understand why, note that when α equals 0.10, the probability of the estimate
lying in the right tail would be 0.10. Since the t-distribution is symmetric, the probability of the
estimate lying in the left tail would be 0.10 also. Therefore the probability that the estimate lies
in the center of the distribution would be 0.80; accordingly we write 0.80 in the second blank.
What is the probability that the estimate, 1.2, lies within _____ of the actual value? 0.80
Figure 8.11
Student t-distribution—Illustrating the probabilities for the coefficient estimate (the horizontal axis is now bx; each tail has probability α and the center 1 − 2α)
Table 8.7
Right-tail critical values for the Student t-distribution—α equals 0.10 and degrees of freedom equals 1
The first blank quantifies what “close to” means. The standard error and the Student t-distri-
bution table allow us to fill in the first blank. To do so, we begin by calculating the degrees of
freedom. Recall that the degrees of freedom equal 1:
Degrees of freedom = Sample size − Number of estimated parameters = 3 − 2 = 1
Clint chose a value of α equal to 0.10 (figure 8.12). Table 8.7 indicates that the critical value
for α = 0.10 with one degree of freedom is 3.078. The probability that the estimate falls within
3.078 standard errors of the mean is 0.80. Next the regression results report that the standard
error equals 0.5196:
SE[bx] = 0.5196
Figure 8.12
Student t-distribution—Calculations for an α equal to 0.10 (each tail has probability 0.10; the center, within one critical value × SE of the mean, has probability 0.80)
After multiplying the critical value given in the table, 3.078, by the standard error, 0.5196, we
can fill in the first blank:
What is the probability that the estimate, 1.2, lies within 1.6 of the actual value? 0.80
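The critical value can be computed rather than looked up in the table. The short sketch below (an illustration, not part of the appendix) obtains the 3.078 critical value for α = 0.10 with one degree of freedom and the resulting "close to" value of roughly 1.6.

from scipy.stats import t

alpha = 0.10               # right-tail probability
df = 1                     # degrees of freedom
se = 0.5196                # standard error of the coefficient estimate

critical_value = t.ppf(1 - alpha, df)      # about 3.078, matching the table entry
close_to = critical_value * se             # about 1.6
print(critical_value, close_to)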
9 One-Tailed Tests, Two-Tailed Tests, and Logarithms
Chapter 9 Outline
1. Suppose that the following equation describes how Q and P are related: Q = βConst·P^βP.
a. What does dQ/dP equal?
b. Focus on the ratio of P to Q; that is, focus on P/Q. Substitute βConst·P^βP for Q and show that P/Q equals 1/(βConst·P^(βP−1)).
c. Show that (dQ/dP)(P/Q) equals βP.
2. We would like to express the percent changes algebraically. To do so, we begin with an
example. Suppose that X increases from 200 to 220.
a. In percentage terms by how much has X increased?
b. Argue that you have implicitly used the following equation to calculate the percent change:
Percent change in X = (ΔX / X) × 100
3. Suppose that a household spends $1,000 of its income on a particular good every month.
a. What does the product of the good’s price, P, and the quantity of the good purchased by
the household each month, Q, equal?
b. Solve for Q.
c. Consider the function Q = βConst·P^βP. What would
i. βConst equal?
ii. βP equal?
5. What is the expression for a derivative of a natural logarithm? That is, what does d log(z)/dz
equal?1
Microeconomic theory tells us that the demand curve is typically downward sloping. In introduc-
tory economics and again in intermediate microeconomics we present sound logical arguments
justifying the shape of the demand curve. History has taught us many times, however, that just because a theory sounds sensible does not necessarily mean that it is true. We must test this theory
to determine if it is supported by real world evidence. We will focus on gasoline consumption
in the United States during the 1990s to test the downward sloping demand theory.
1. Be aware that sometimes natural logarithms are denoted as ln(z) rather than log(z). We will use the log(z) notation
for natural logarithms throughout this textbook.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999.
Theory: A higher price decreases the quantity demanded; the demand curve is downward
sloping.
Project: Assess the effect of gasoline prices on gasoline consumption.
Step 0: Formulate a model reflecting the theory to be tested.
where
The theory suggests that βP should be negative. A higher price decreases the quantity demanded; the demand curve is downward sloping.
Step 1: Collect data, run the regression, and interpret the estimates.
The gasoline consumption data can be accessed by clicking within the box below.
While the regression results (table 9.1) indeed support the theory, remember that we can never
expect an estimate to equal the actual value; sometimes the estimate will be greater than the
actual value and sometimes less. The fact that the estimate of the price coefficient is negative,
−151.7, is comforting, but it does not prove that the actual price coefficient, βP, is negative. In
fact we do not have and can never have indisputable evidence that the theory is correct. How
do we proceed?
Table 9.1
Gasoline demand regression results
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: The price actually has no effect on the quantity of gasoline demanded; the nega-
tive coefficient estimate obtained from the data was just “the luck of the draw.” The actual
coefficient, βP, equals 0.
Now, we construct the null and alternative hypotheses:
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is
consistent with the evidence.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the price actually has no impact?
• Specific question: The regression’s coefficient estimate was −151.7: What is the probability
that the coefficient estimate in one regression would be −151.7 or less, if H0 were actually true
(if the actual coefficient, βP, equals 0)?
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
If the null hypothesis were true, the actual price coefficient would equal 0. Since the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, the mean of the
probability distribution for the coefficient estimates would be 0. The regression results provide
us with the standard error of the coefficient estimate. The degrees of freedom equal 8: the number
of observations, 10, less the number of parameters we are estimating, 2 (the constant and the
coefficient).
We now have the information needed to calculate Prob[Results IF H0 true], the probability of a result like the one obtained (or even stronger) if the null hypothesis, H0, were true. We could
use the Econometrics Lab to compute this probability, but in fact the statistical software has
already done this for us (table 9.2).
Recall that the Prob column reports the tails probability:
Tails probability: The probability that the coefficient estimate, bP, resulting from one regression
would lie at least 151.7 from 0, if the actual coefficient, βP, equals 0.
The tails probability reports the probability of lying in the two tails (figure 9.1). We are only interested in the probability that the coefficient estimate will be −151.7 or less; that is, we are only interested in the left tail. Since the Student t-distribution is symmetric, we divide the tails probability by 2 to calculate Prob[Results IF H0 true]:

Prob[Results IF H0 true] = 0.0128 / 2 = 0.0064
Table 9.2
Gasoline demand regression results
Figure 9.1
Probability distribution of the linear model's coefficient estimate (Student t-distribution with mean 0, SE 47.6, and 8 degrees of freedom; each tail lying 151.7 or more from 0 has probability 0.0128/2)
The traditional significance levels in academe are 1, 5, and 10 percent. In this case, the
Prob[Results IF H0 true] equals 0.0064, less than 0.01. So, even with a 1 percent significance
level, we would reject the null hypothesis that price has no impact on the quantity. This result
supports the theory that the demand curve is downward sloping.
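Given only the coefficient estimate and its standard error, the same one-tailed probability can be recovered directly from the Student t-distribution. The sketch below (an illustration using the values reported above, not the textbook's lab) computes the t-statistic and the left-tail and two-tailed probabilities for the linear gasoline demand model.

from scipy.stats import t

estimate = -151.7          # price coefficient estimate
se = 47.6                  # its standard error (figure 9.1)
df = 8                     # degrees of freedom: 10 observations less 2 estimated parameters

t_stat = estimate / se                     # about -3.19
left_tail = t.cdf(t_stat, df)              # about 0.0064 = Prob[Results IF H0 true]
tails_prob = 2 * left_tail                 # about 0.0128, the Prob column value
print(t_stat, left_tail, tails_prob)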
Thus far we have considered only one-tailed tests because the theories we have investigated
suggest that the coefficient was greater than a specific value or less than a specific value:
• Quiz score theory: The theory suggested that studying increases quiz scores, that the coef-
ficient of minutes studied was greater than 0.
• Demand curve theory: The theory suggested that a higher price decreases the quantity demanded, that the coefficient of price was less than 0.
In these cases, we were only concerned with one side or one tail of the distribution, either the
right tail or the left tail. Some theories, however, suggest that the coefficient equals a specific
value. In these cases, both sides (both tails) of the distribution are relevant and two-tailed tests
are appropriate. We will now investigate one such theory, the budget theory of demand.
The budget theory of demand postulates that households first decide on the total number of
dollars to spend on a good. Then, as the price of the good fluctuates, households adjust the
quantity they purchase to stay within their budgets. We will focus on gasoline consumption to
assess this theory:
Budget theory of demand: Expenditures for gasoline are constant. That is, when gasoline prices
change, households adjust the quantity demanded so as to keep their gasoline expenditures
constant. Expressing this mathematically, the budget theory of demand postulates that the price,
P, times the quantity, Q, of the good demanded equals a constant:
P × Q = BudAmt

where BudAmt equals the total number of dollars the household budgets for the good; BudAmt is a constant.
As we will learn, the price elasticity of demand is critical in assessing the budget theory of
demand. Consequently we will now review the verbal definition of the price elasticity of demand
and show how we can make it mathematically rigorous.
Verbal definition: The price elasticity of demand equals the percent change in the quantity demanded resulting from a 1 percent change in price.
To convert the verbal definition into a mathematical one, we start with the verbal definition:
Price elasticity of demand = Percent change in quantity demanded resulting from a 1 percent
change in price
X: 200 → 220
Percent change in X = (220 − 200)/200 × 100 = (20/200) × 100 = 0.1 × 100 = 10 percent. We
can generalize this:
Percent change in X = (ΔX / X) × 100
Substituting for the percent changes:

Price elasticity of demand = [(ΔQ/Q) × 100] / [(ΔP/P) × 100]

Simplifying:

Price elasticity of demand = (ΔQ/ΔP)(P/Q)

Taking limits as ΔP approaches 0:

Price elasticity of demand = (dQ/dP)(P/Q)
There always exists a potential confusion surrounding the numerical value for the price elasticity
of demand. Since the demand curve is downward sloping, dQ/dP is negative. Consequently the
price elasticity of demand will be negative. Some textbooks, in an effort to avoid negative
numbers, refer to price elasticity of demand as an absolute value. This can lead to confusion,
however. Accordingly we will adopt the more straightforward approach: our elasticity of demand
will be defined so that it is negative.
Now we are prepared to embark on the hypothesis-testing process.
Q = βConst·P^βP
Before doing anything else, however, let us now explain why this model indeed exhibits
constant price elasticity. We start with the mathematical definition of the price elasticity of
demand:
Price elasticity of demand = (dQ/dP)(P/Q)

Since Q = βConst·P^βP, the derivative equals dQ/dP = βP·βConst·P^(βP−1), while the ratio equals P/Q = P/(βConst·P^βP) = 1/(βConst·P^(βP−1)). Substituting and simplifying:

Price elasticity of demand = βP·βConst·P^(βP−1) × [1/(βConst·P^(βP−1))] = βP
The price elasticity of demand just equals the value of βP, the exponent of the price, P.
A little algebra allows us to show that the budget theory of demand postulates that the price
elasticity of demand, βP, equals −1. First start with the budget theory of demand:
P × Q = BudAmt

Dividing both sides by P:

Q = BudAmt × P^(−1)

Compare this with the constant price elasticity model:

Q = βConst·P^βP

Clearly, βConst corresponds to BudAmt and βP equals −1.
Q = βConst·P^βP

Taking logarithms of both sides yields the log form of the constant price elasticity model:

LogQ = c + βP·LogP

where

LogQ = log(Q)
c = log(βConst)
LogP = log(P)
Step 1: Collect data, run the regression, and interpret the estimates.
Recall that we are using US gasoline consumption data to assess the theory.
Gasoline consumption data: Annual time series data for US gasoline consumption and prices
from 1990 to 1999.
We must generate the two variables: the logarithm of quantity and the logarithm of price:
• LogQt = log(GasConst)
• LogPt = log(PriceDollarst)
logq = log(gascons)
• Click OK.
logp = log(pricedollars)
• Click OK.
Now we can use EViews to run a regression with logq, the logarithm of quantity, as the dependent
variable and logp, the logarithm of price, as the explanatory variable.
•In the Workfile window: Click on the dependent variable, logq, first, and then click on the
explanatory variable, logp, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
• Do not forget to close the workfile.
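For readers not using EViews, the same regression can be run in Python with statsmodels. The sketch below is hypothetical: it assumes the gasoline data have been saved in a file named gasoline_1990_1999.csv with columns named gascons and pricedollars; the file name is an assumption for illustration, not part of the textbook's materials.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical file name; adjust to wherever the gasoline data are stored.
data = pd.read_csv("gasoline_1990_1999.csv")

data["logq"] = np.log(data["gascons"])           # logarithm of quantity
data["logp"] = np.log(data["pricedollars"])      # logarithm of price

results = sm.OLS(data["logq"], sm.add_constant(data["logp"])).fit()
print(results.summary())                         # the coefficient on logp estimates the price elasticity of demand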
Note that the estimate for the price elasticity of demand equals −0.586 (table 9.3). Since the budget
theory of demand postulates that the price elasticity of demand equals −1.0, the critical result is
not whether the estimate is above or below −1.0. Instead, the critical result is that the estimate
does not equal −1.0; more specifically, the estimate is 0.414 from −1.0. Had the estimate been
−1.414 rather than −0.586, the results would have been just as troubling as far as the budget
theory of demand is concerned (see figure 9.2).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
The cynic always challenges the evidence. The regression results suggest that the price elastic-
ity of demand does not equal −1.0 since the coefficient estimate equals −0.586. Accordingly, the
cynic challenges the evidence by asserting that it does equal −1.0.
Table 9.3
Budget theory of demand regression results
Figure 9.2
Number line illustration of the critical result (the theory places the price elasticity of demand at −1.0; the evidence, the estimate of −0.586, lies 0.414 from −1.0)
Cynic's view: Sure, the coefficient estimate from the regression suggests that the price elasticity of demand does not equal −1.0, but this is just "the luck of the draw." The actual price elasticity of demand equals −1.0.
Question: Can we dismiss the cynic’s view as absurd?
Answer: No, as a consequence of random influences. Even if the actual price elasticity equals
−1.0, we could never expect the estimate to equal precisely −1.0. The effect of random influences
is captured formally by the “statistical significance question:”
Statistical significance question: Is the estimate of −0.586 statistically different from −1.0?
More precisely, if the actual value equals −1.0, how likely would it be for random influences to
cause the estimate to be 0.414 or more from −1.0?
We will now construct the null and alternative hypotheses to address this question:
H0: βP = −1.0 Cynic’s view is correct; actual price elasticity of demand equals −1.0.
H1: βP ≠ −1.0 Cynic’s view is incorrect; actual price elasticity of demand does not
equal −1.0.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
If the null hypothesis were true, the actual coefficient would equal −1.0. Since the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, the mean of the probability distribution of coefficient estimates would be −1.0. The regression results provide us with
the standard error of the coefficient estimate. The degrees of freedom equal 8: the number of
observations, 10, less the number of parameters we are estimating, 2 (the constant and the
coefficient).
Can we use the “tails probability” as reported in the regression results to compute Prob[Results
IF H0 true]? Unfortunately, we cannot. The tails probability appearing in the Prob column of the
regression results is based on the premise that the actual value of the coefficient equals 0. Our
null hypothesis claims that the actual coefficient equals −1.0, not 0. Accordingly the regression
results appearing in table 9.3 do not report the probability we need.
We can, however, use the Econometrics Lab to compute the probability.
Econometrics Lab 9.1: Using the Econometrics Lab to Calculate Prob[Results IF H0 True]

We are interested in the probability that the estimate lies 0.414 or more below −1.0 (at or below −1.414) plus the probability that it lies 0.414 or more above −1.0; that is, the probability that the estimate lies at or above −0.586. With a mean of −1.0, a standard error of 0.183, and 8 degrees of freedom, each of these tail probabilities equals 0.027 (figure 9.3), so Prob[Results IF H0 true] ≈ 0.054.
Figure 9.3
Probability distribution of the constant elasticity model's coefficient estimate (Student t-distribution with mean −1.0, SE 0.183, and 8 degrees of freedom; each tail lying 0.414 or more from −1.0, that is, at or below −1.414 or at or above −0.586, has probability 0.027)
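The same calculation can be done with scipy's Student t-distribution. The sketch below (an illustration, not the Econometrics Lab) centers the distribution at the value claimed by the null hypothesis, −1.0, and uses the standard error and degrees of freedom reported above.

from scipy.stats import t

estimate = -0.586          # estimated price elasticity of demand
se = 0.183                 # its standard error (figure 9.3)
df = 8                     # degrees of freedom
null_value = -1.0          # value claimed by the null hypothesis

t_stat = (estimate - null_value) / se      # about 2.26: the estimate lies 0.414, or 2.26 standard errors, above -1.0
right_tail = t.sf(t_stat, df)              # about 0.027
prob_if_h0_true = 2 * right_tail           # about 0.054, adding both tails
print(t_stat, right_tail, prob_if_h0_true)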
Recall why we could not use the tails probability appearing in the regression results to calculate this probability: the regression's tails probability is based on the premise that the value of the actual coefficient equals 0. Our null hypothesis, however, is based on the premise that the value of the actual coefficient equals −1.0. So the regression results do not report the probability we need.
It is very convenient to use the regression results to calculate the probabilities, however. In fact
we can do so by being clever. Since the results report the tails probability based on the premise
that the actual coefficient equals 0, we can cleverly define a new coefficient that equals 0 when-
ever the price elasticity of demand equals −1.0. The following definition accomplishes this:
βClever = βP + 1.0
The critical property of βClever’s definition is that the price elasticity of demand, βP, equals −1.0
if and only if βClever equals 0:
βP = −1.0 ⇔ βClever = 0
Next recall the log form of the constant price elasticity model:
LogQt = c + βPLogPt
where
LogQt = log(GasConst)
LogPt = log(Pricet)
Let us now perform a little algebra. Since βClever = βP + 1.0, βP = βClever − 1.0. Let us substitute
for βP:
LogQt = c + βP·LogPt = c + (βClever − 1.0)·LogPt

Adding LogPt to both sides:

LogQt + LogPt = c + βClever·LogPt

where the dependent variable is now LogQt + LogPt, the sum of the logarithm of quantity and the logarithm of price.
We can now express the hypotheses in terms of βClever. Recall that βP = −1.0 if and only if
βClever = 0:
H0: βP = −1.0 ⇔ H0: βClever = 0    Actual price elasticity of demand equals −1.0.
H1: βP ≠ −1.0 ⇔ H1: βClever ≠ 0    Actual price elasticity of demand does not equal −1.0.
Now we can use EViews to run a regression with yclever as the dependent variable and logp as
the explanatory variable.
• In the Workfile window: Click on the dependent variable, logqpluslogp, first; and then click
on the explanatory variable, logp, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
• Do not forget to close the workfile.
Table 9.4
Budget theory of demand regression results with clever algebra
First let us compare the estimates for βP in table 9.3 and βClever in table 9.4
• Estimate for βP, bP, equals −0.586;
• Estimate for βClever, bClever, equals 0.414.
This is consistent with the definition of βClever. By definition, βClever equals βP plus 1.0:
βClever = βP + 1.0
bClever = bP + 1.0
= −0.586 + 1.0
= 0.414
Figure 9.4
Probability distribution of the constant elasticity model's coefficient estimate—Clever approach (Student t-distribution with mean 0, SE 0.183, and 8 degrees of freedom; each tail lying 0.414 or more from 0 has probability 0.0538/2)
This is the same value for the probability that we computed when we used the Econometrics
Lab. By a clever algebraic manipulation, we can get the statistical software to perform the prob-
ability calculations. Now we turn to the final hypothesis-testing step.
The significance level is the dividing line between the probability being small and the prob-
ability being large.
Prob[Results IF H0 true] less than significance level → Prob[Results IF H0 true] small → unlikely that H0 is true → reject H0
Prob[Results IF H0 true] greater than significance level → Prob[Results IF H0 true] large → likely that H0 is true → do not reject H0
At a 1 or 5 percent significance level, we do not reject the null hypothesis that the elasticity of
demand equals −1.0, thereby supporting the budget theory of demand. That is, at a 1 or 5 percent
significance level, the estimate of −0.586 is not statistically different from −1.0.
The theory that we are testing determines whether we should use a one-tailed or a two-tailed
test. When the theory suggests that the actual value of a coefficient is greater than or less than
a specific constant, a one-tailed test is appropriate. Most economic theories fall into this category.
In fact most economic theories suggest that the actual value of the coefficient is either greater
than 0 or less than 0 (see figure 9.5). For example, economic theory teaches that the price should
have a negative influence on the quantity demanded; similarly theory teaches that the price
should have a positive influence on the quantity supplied. In most cases economists use one-
tailed tests. However, some theories suggest that the coefficient equals a specific value; in these
cases a two-tailed test is required.
Figure 9.5
One-tailed and two-tailed tests—A comparison (each panel depicts the probability distribution of the estimate b centered at c when H0: β = c is true; the alternative hypothesis H1: β > c or H1: β < c calls for a one-tailed test, while H1: β ≠ c calls for a two-tailed test; in each case Prob[Results IF H0 true] equals the probability of obtaining results like those we actually got, or even stronger, if H0 is true)

The constant price elasticity model is just one example of how logarithms can be a useful econometric tool. Generally, logarithms provide a very convenient way to test hypotheses that
are expressed in terms of percentages rather than “natural” units. To see how, we will first review
three concepts:
• The interpretation of the coefficient estimate
• The differential approximation
• The derivative of a logarithm
Let x increase by Δx: x → x + Δx. Consequently the estimated value of y will increase by Δy:
Esty → Esty + Δy:
Esty + Δy = bConst + bx·x + bx·Δx

Reconsider the original equation:

Esty = bConst + bx·x

Subtracting the second equation from the first:

Δy = bx·Δx
In words, bx estimates the unit change in the dependent variable y resulting from a one unit
change in explanatory variable x.
In words, the derivative tells us by approximately how much y changes when x changes by a
small amount; that is, the derivative equals the change in y caused by a one (small) unit change
in x.
The derivative of the natural logarithm of z with respect to z equals 1 divided by z.2 We have
already considered the case in which both the dependent variable and explanatory variable are
logarithms. Now we will consider two cases in which only one of the two variables is a
logarithm:
• Dependent variable is a logarithm.
• Explanatory variable is a logarithm.
Δz/z ≈ bx·Δx

Multiplying both sides of the equation by 100:

(Δz/z) × 100 ≈ (bx × 100)·Δx

Since (Δz/z) × 100 equals the percent change in z:

Percent change in z ≈ (bx × 100)·Δx
In words, when the dependent variable is a logarithm, bx × 100 estimates the percent change
in the dependent variable resulting from a one unit change in the explanatory variable, which is
the percent change in y resulting from a one (natural) unit change in x.
2. The log notation refers to the natural logarithm (logarithm base e), not the logarithm base 10.
In words, when the explanatory variable is a logarithm, bx/100 estimates the (natural) unit change in the dependent variable resulting from a 1 percent change in the explanatory variable, which is the unit change in y resulting from a 1 percent change in z.
To illustrate the usefulness of logarithms, consider the effect of a worker's education on his/her wage. Economic theory (and common sense) suggests that a worker's wage is influenced
by the number of years of education he/she completes:
To assess this theory we will focus on the effect of high school education; we consider workers
who have completed the ninth, tenth, eleventh, or twelfth grades and have not continued on to
college or junior college. We will use data from the March 2007 Current Population Survey. In
the process we can illustrate the usefulness of logarithms. Logarithms allow us to fine-tune our
hypotheses by expressing them in terms of percentages.
Wage and education data: Cross-sectional data of wages and education for 212 workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who have completed the ninth, tenth, eleventh, or twelfth grades, but have not
continued on to college or junior college.
We can consider four models that capture the theory in somewhat different ways:
• Linear model
• Log dependent variable model
• Log explanatory variable model
• Log-log (constant elasticity) model
The linear model includes no logarithms. Wage is expressed in dollars and education
in years.
Table 9.5
Wage regression results with linear model

As table 9.5 reports, we estimate that an additional year of high school increases the wage by about $1.65 per hour. It is very common to express wage increases in this way; all the time we hear people say that they received a $1.00 per hour raise or a $2.00 per hour raise. It is also very common to hear raises expressed in percentage terms. When the results of new labor contracts are announced, the wage increases are typically expressed in percentage terms; management agreed to give workers a 2 percent increase or a 3 percent increase. This observation leads us to our next model: the log dependent variable model.
LogWage = log(Wage)
The dependent variable (LogWage) is expressed in terms of the logarithm of dollars; the explana-
tory variable (HSEduc) is expressed in years (table 9.6).
Let us compare the estimates derived by our two models:
• The linear model implicitly assumes that the impact of one additional year of high school
education is the same for each worker in terms of dollars. We estimate that a worker’s wage
increases by $1.65 per hour for each additional year of high school (table 9.5).
• The log dependent variable model implicitly assumes that the impact of one additional year of high school education is the same for each worker in terms of percentages. We estimate that a worker's wage increases by 11.4 percent for each additional year of high school (table 9.6).
The estimates each model provides differ somewhat. For example, consider two workers, the first earning $10.00 per hour and the second earning $20.00.

Table 9.6
Wage regression results with log dependent variable model

On the one hand, the linear model estimates that an additional year of high school would increase the wage of each worker by $1.65 per hour. On the other hand, the log dependent variable model estimates that an additional year of high school would increase the wage of the first worker by 11.4 percent of $10.00, or $1.14, and the wage of the second worker by 11.4 percent of $20.00, or $2.28.
As we will see, the last two models (the log explanatory variable and log-log models) are not particularly natural in the context of this example. We seldom express differences in education as percentage differences. Nevertheless, the log explanatory variable and log-log models are appropriate in many other contexts. Therefore we will apply them to our wage and education data even though the interpretations will sound unusual.
LogHSEduc = log(HSEduc)
The dependent variable (Wage) is expressed in terms of dollars; the explanatory variable
(LogHSEduc) is expressed in terms of the log of years. As mentioned above, this model is not
particularly appropriate for this example because we do not usually express education differences
in percentage terms. Nevertheless, the example does illustrate how we interpret the coefficient
in a log explanatory variable model. The regression results estimate that a 1 percent increase in
high school education increases the wage by about $.17 per hour (table 9.7).
Table 9.7
Wage regression results with log explanatory variable model
Table 9.8
Wage regression results with constant elasticity model
Both the dependent and explanatory variables are expressed in terms of logs. This is just the
constant elasticity model that we discussed earlier. The regression results estimate that a
1 percent increase in high school education increases the wage by 1.2 percent (table 9.8).
While the log-log model is not particularly appropriate in this case, we have already seen that
it can be appropriate in other contexts. For example, this was the model we used to assess the
budget theory of demand earlier in this chapter.
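For concreteness, a minimal sketch of how the four wage models might be estimated with statsmodels; the file name and the column names Wage and HSEduc are hypothetical stand-ins for the Current Population Survey extract described above, not something the text specifies:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("cps_wage_education.csv")   # hypothetical file name

    formulas = {
        "linear":                   "Wage ~ HSEduc",
        "log dependent variable":   "np.log(Wage) ~ HSEduc",
        "log explanatory variable": "Wage ~ np.log(HSEduc)",
        "log-log":                  "np.log(Wage) ~ np.log(HSEduc)",
    }
    for name, formula in formulas.items():
        results = smf.ols(formula, data=df).fit()
        # The slope estimate is read differently in each model: dollars per year,
        # percent per year (after multiplying by 100), dollars per 1 percent
        # (after dividing by 100), or an elasticity.
        print(name, results.params.iloc[1])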
Chapter 9 Review Questions

1. Consider the general structure of the theory, the null hypothesis, and the alternative hypothesis.
When is a
a. One-tailed hypothesis appropriate?
Theory: ___________________
H0: ___________
H1: ___________
b. Two-tailed hypothesis appropriate?
Theory: ___________________
H0: ___________
H1: ___________
2. How should the coefficient estimate be interpreted when the dependent and explanatory
variables are specified as:
a. Dependent variable: y and explanatory variable: x
b. Dependent variable: log(y) and explanatory variable: x
c. Dependent variable: y and explanatory variable: log(x)
d. Dependent variable: log(y) and explanatory variable: log(x)
Chapter 9 Exercises
Petroleum consumption data for Nebraska: Annual time series data of petroleum consumption
and prices for Nebraska from 1990 to 1999.
2. Consider the budget theory of demand in the context of per capita petroleum consumption:
where
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
Gasoline consumption data: Annual time series data for US gasoline consumption and prices
from 1990 to 1999.
a. Would you theorize the price elasticity of demand for gasoline to be elastic or inelastic?
Explain.
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
a. Would you theorize that the price elasticity of demand for cigarettes would be elastic or
inelastic? Explain.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the price elasticity
of demand.
c. Does your estimate for the price elasticity of demand support your theory? Explain.
Wage and age data: Cross section data of wages and ages for 190 union members included in
the March 2007 Current Population Survey who have earned high school degrees, but have not
had any additional education.
We often describe wage increases in terms of percent changes. Apply the hypothesis-testing
approach that we developed to assess this “percent increase version” of the seniority theory.
a. Apply the hypothesis-testing approach that we developed to assess the seniority theory.
7. Revisit the Current Population Survey labor supply data.
Labor supply data: Cross-sectional data of hours worked and wages for the 92 married workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who earned bachelor, but no advanced, degrees.
Consider the theory that labor supply is inelastic; that is, that the wage elasticity of labor supply is less than 1.
a. Apply the hypothesis-testing approach that we developed to assess the inelastic labor
supply theory.
Chapter 10 Outline
Q = βConst P^βP I^βI ChickP^βCP
where
Q = βConst (P/ChickP)^βP (I/ChickP)^βI
b. If βCP = −βP − βI, what happens to the quantity of beef demanded when the price of beef
(the good’s own price, P), income (I), and the price of chicken (ChickP) all double?
c. If βP + βI + βCP = 0, what happens to the quantity of beef demanded when the price of
beef (the good’s own price, P), income (I), and the price of chicken (ChickP) all double?
Q = βConst P^βP I^βI ChickP^βCP
Thus far we have focused our attention on simple regression analysis where the model assumes
that only a single explanatory variable affects the dependent variable. In the real world, however,
a dependent variable typically depends on many explanatory variables. For example, while
economic theory teaches that the quantity of a good demanded depends on the good's own price, theory also tells us that the quantity depends on other factors: income, the price of other goods, and so on. Multiple regression analysis allows us to assess such theories.
• Multiple regression analysis attempts to sort out the individual effect of each explanatory
variable.
• An explanatory variable’s coefficient estimate allows us to estimate the change in the dependent
variable resulting from a change in that particular explanatory variable while all other explana-
tory variables remain constant.
Figure 10.1
Downward sloping demand curve
Graphically, the theory is illustrated by a downward sloping demand curve (figure 10.1). When we draw a demand curve for a good, we implicitly assume that all factors relevant to demand other than that good's own price remain constant.
We will focus on the demand for a particular good, beef, to illustrate the importance of mul-
tiple regression analysis. We now apply the hypothesis-testing steps.
We will use a linear demand model to test the theory. Naturally the quantity of beef demanded
depends on its own price, the price of beef. Furthermore we postulate that the quantity of beef
demanded also depends on income and the price of chicken. In other words, our model proposes
that the factors relevant to the demand for beef, other than beef’s own price, are income and the
price of chicken.
where
The theory suggests that when income and the price of chicken remain constant, an increase in the price of beef (the good's own price) decreases the quantity of beef demanded (figure 10.2); similarly, when income and the price of chicken remain constant, a decrease in the price of beef (the good's own price) increases the quantity of beef demanded:
Figure 10.2
Downward sloping demand curve for beef
Economic theory teaches that the sign of coefficients for the explanatory variables other than
the good’s own price may be positive or negative. Their signs depend on the particular good in
question:
• The sign of βI depends on whether beef is a normal or inferior good. Beef is generally regarded
as a normal good; consequently we would expect βI to be positive: an increase in income results
in an increase in the quantity of beef demanded.
Step 1: Collect data, run the regression, and interpret the estimates.
Beef consumption data: Monthly time series data of beef consumption, beef prices, income,
and chicken prices from 1985 and 1986 (table 10.1).
We now use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 10.2).
To interpret these estimates, let us for the moment replace the numerical value of each estimate with the italicized lowercase Roman letter b that we use to denote an estimate. That is, replace the estimated:
• Constant, 159,032, with bConst
• Price coefficient, −549.5, with bP
• Income coefficient, 24.25, with bI
• Chicken price coefficient, 287.4, with bCP
Table 10.1
Monthly beef demand data from 1985 and 1986
Table 10.2
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
The coefficient estimates attempt to separate out the individual effect that each explanatory
variable has on the dependent variable. To justify this, focus on the estimate of the beef price
coefficient, bP. It estimates by how much the quantity of beef changes when the price of beef
(the good’s own price) changes while income and the price of chicken (all other explanatory
variables) remain constant. More formally, when all other explanatory variables remain
constant:
ΔQ = bP ΔP or bP = ΔQ/ΔP
where
A little algebra explains why. We begin with the equation estimating our model:
Now increase the price of beef (the good’s own price) by ΔP while keeping all other explanatory
variables constant. ΔQ estimates the resulting change in quantity of beef demanded.
From To
Price: P → P + ΔP
Quantity: EstQ → EstQ + ΔQ
while all other explanatory variables remain constant; that is, while I and ChickP remain
constant.
In the equation estimating our model, substitute EstQ + ΔQ for EstQ and P + ΔP for P:

EstQ + ΔQ = bConst + bP(P + ΔP) + bI I + bCP ChickP

Subtracting the original equation, EstQ = bConst + bP P + bI I + bCP ChickP, term by term:

ΔQ = 0 + bP ΔP + 0 + 0

Simplifying,
ΔQ = bP ΔP

Dividing through by ΔP:

ΔQ/ΔP = bP while all other explanatory variables remain constant
To summarize,

ΔQ = bP ΔP or bP = ΔQ/ΔP

ΔQ = bI ΔI or bI = ΔQ/ΔI
"Slope" = bP
ΔQ
D
Q
Figure 10.3
Demand curve “slope”
while all other explanatory variables (P and ChickP) remain constant. bI estimates the change
in quantity when income changes while all other explanatory variables (the price of beef and
the price of chicken) remain constant.
ΔQ = bCP ΔChickP or bCP = ΔQ/ΔChickP
while all other explanatory variables (P and I) remain constant. bCP estimates the change in
quantity when the price of chicken changes while all other explanatory variables (the price of
beef and income) remain constant.
What happens when the price of beef (the good’s own price), income, and the price of chicken
change simultaneously? The total estimated change in the quantity of beef demanded just equals
the sum of the individual changes; that is, the total estimated change in the quantity of beef
demanded equals the change resulting from the change in
• the price of beef (the good’s own price)
plus
• income
plus
• the price of chicken:

ΔQ = bP ΔP + bI ΔI + bCP ΔChickP

Each term estimates the change in the dependent variable, the quantity of beef demanded, resulting from a change in each individual explanatory variable.
The estimates achieve the goal:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
ΔQ = bP ΔP = −549.5 ΔP
Interpretation: The ordinary least squares (OLS) estimate of the price coefficient equals −549.5;
that is, we estimate that if the price of beef increases by 1 cent while income and the price of
chicken remain unchanged, the quantity of beef demanded decreases by about 549.5 million
pounds.
ΔQ = bIΔI = 24.25ΔI
Interpretation: The ordinary least squares (OLS) estimate of the income coefficient equals
24.25; that is, we estimate that if disposable income increases by 1 billion dollars while the price
of beef and the price of chicken remain unchanged, the quantity of beef demanded increases by
about 24.25 million pounds.
ΔQ = bCPΔChickP = 287.4ΔChickP
Interpretation: The ordinary least squares (OLS) estimate of the chicken price coefficient equals
287.4; that is, we estimate that if the price of chicken increases by 1 cent while the price of beef
and income remain unchanged, the quantity of beef demanded increases by about 287.4 million
pounds.
We estimate that the total change in the quantity of beef demanded equals −549.5 times the
change in the price of beef (the good’s own price) plus 24.25 times the change in disposable
income plus 287.4 times the change in the price of chicken.
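A small numerical sketch of this "sum of the individual effects" interpretation, using the coefficient estimates just reported and purely hypothetical changes in the explanatory variables:

    # Coefficient estimates reported in the text (table 10.2)
    b_P, b_I, b_CP = -549.5, 24.25, 287.4

    # Hypothetical changes: beef price +2 cents, income +$10 billion, chicken price +1 cent
    dP, dI, dChickP = 2.0, 10.0, 1.0

    dQ = b_P * dP + b_I * dI + b_CP * dChickP
    print(dQ)   # -569.1: estimated change in quantity demanded, millions of pounds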
Recall that the sign of the estimate for the good’s own price coefficient, bP, determines whether
or not the data support the downward sloping demand theory. bP estimates the change in the
quantity of beef demanded when the price of beef (the good’s own price) changes while the
other explanatory variables, income and the price of chicken, remain constant. The theory pos-
tulates that an increase in the good’s own price decreases the quantity of beef demanded. The
negative price coefficient estimate lends support to the theory.
Critical result: The own price coefficient estimate is −549.5. The negative sign of the coefficient estimate suggests that an increase in the price decreases the quantity of beef demanded. This evidence supports the downward sloping demand theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
The cynic is skeptical of the evidence supporting the view that the actual price coefficient,
βP, is negative; that is, the cynic challenges the evidence and hence the downward sloping
demand theory:
Cynic’s view: Sure, the price coefficient estimate from the regression suggests that the demand
curve is downward sloping, but this is just “the luck of the draw.” The actual price coefficient,
βP, equals 0.
H0: βP = 0 Cynic is correct: The price of beef (the good’s own price) has no effect on
quantity of beef demanded.
H1: βP < 0 Cynic is incorrect: An increase in the price decreases quantity of beef demanded.
The null hypothesis, like the cynic, challenges the evidence: an increase in the price of beef has
no effect on the quantity of beef demanded. The alternative hypothesis is consistent with the
evidence: an increase in the price decreases the quantity of beef demanded.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we obtained
(or even stronger), if the cynic is correct and the price of beef actually has no impact?
• Specific question: What is the probability that the coefficient estimate, bP, in one regression would be −549.5 or less, if H0 were true (if the actual price coefficient, βP, equals 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true] (figure 10.5).
Figure 10.4
Probability distribution of coefficient estimate for the beef price

Figure 10.5
Calculating Prob[Results IF H0 true] (Student t-distribution: mean = 0, SE = 130.3, DF = 20; the left-tail probability at bP = −549.5 is 0.0004/2 = 0.0002)
We can now calculate Prob[Results IF H0 true]. The easiest way is to use the regression results.
Recall that the tails probability is reported in the Prob column. The tails probability is .0004;
therefore, to calculate Prob[Results IF H0 true] we need only divide 0.0004 by 2 (table 10.3).
Prob[Results IF H0 true] = 0.0004/2 ≈ 0.0002
Step 5: Decide on the standard of proof, a significance level.

The significance level is the dividing line between the probability being small and the probability being large.
Table 10.3
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
We can reject the null hypothesis at the traditional significance levels of 1, 5, and 10 percent.
Consequently the data support the downward sloping demand theory.
We will now consider a second theory regarding demand. Microeconomic theory teaches that
there is no money illusion; that is, if all prices and income change by the same proportion, the
quantity of a good demanded will not change. The basic rationale of this theory is clear. Suppose
that all prices double. Every good would be twice as expensive. If income also doubles, however,
consumers would have twice as much to spend. When all prices and income double, there is no
reason for a consumer to change his/her spending patterns; that is, there is no reason for a con-
sumer to change the quantity of any good he/she demands.
We can use indifference curve analysis to motivate this more formally.1 Recall the household’s
utility maximizing problem:
1. If you are not familiar with indifference curve analysis, please skip to the Linear Demand Model and Money Illusion
Theory section and accept the fact that the no money illusion theory is well grounded in economic theory.
Figure 10.6
Utility maximization (the budget constraint, with intercepts I/PX and I/PY and slope −PX/PY, and the highest indifference curve touching it at the solution)
A household chooses the bundle of goods that maximizes its utility subject to its budget con-
straint. How can we illustrate the solution to the household’s problem? First, we draw the budget
constraint. To do so, let us calculate its intercepts.
PX X + PY Y = I

X-intercept (Y = 0): PX X = I, so X = I/PX
Y-intercept (X = 0): PY Y = I, so Y = I/PY
Next, to maximize utility, we find the highest indifference curve that still touches the budget
constraint as illustrated in figure 10.6.
Now suppose that all prices and income double:
Before: max Utility = U(X, Y) subject to PX X + PY Y = I
After (PX → 2PX, PY → 2PY, I → 2I): max Utility = U(X, Y) subject to 2PX X + 2PY Y = 2I
How is the budget constraint affected? To answer this question, calculate the intercepts after all
prices and income have doubled and then compare them to the original ones:
2PX X + 2PY Y = 2I

X-intercept (Y = 0): 2PX X = 2I, so X = 2I/2PX = I/PX
Y-intercept (X = 0): 2PY Y = 2I, so Y = 2I/2PY = I/PY
Since the intercepts have not changed, the budget constraint line has not changed; hence, the
solution to the household’s constrained utility maximizing problem will not change.
In sum, the no money illusion theory is based on sound logic. But remember, many theories
that appear to be sensible turn out to be incorrect. That is why we must test our theories.
Project: Use the beef demand data to assess the no money illusion theory.
Can we use our linear demand model to do so? Unfortunately, the answer is no. The linear
demand model is inconsistent with the proposition of no money illusion. We will now explain
why.
The linear demand model is inconsistent with the no money illusion proposition because it implicitly assumes that the "slope" of the demand curve equals a constant value, βP, unaffected by income or the price of chicken.2 To understand why, consider the linear model and recall that when we draw a demand curve, income and the price of chicken remain constant. Consequently, for a demand curve:

Q = QIntercept + βP P

where
2. Again, recall that because quantity is plotted on the horizontal axis and price is plotted on the vertical axis, the slope of the demand curve is actually the reciprocal of βP, 1/βP. That is why we place the word "slope" within quotes. This does not affect the validity of our argument, however. The important point is that the linear model implicitly assumes that the "slope" of the demand curve is constant, unaffected by changes in other factors relevant to demand.
Figure 10.7
Demand curve for beef (quantities Q0, Q1, and Q2 demanded at prices P0, P1, and P2)
This is just an equation for a straight line; βP equals the “slope” of the demand curve.
Now consider three different beef prices and the quantity of beef demanded at each of the
prices while income and chicken prices remain constant:
When the price of beef is P0, Q0 units of beef are demanded; when the price of beef is P1, Q1
units of beef are demanded; and when the price of beef is P2, Q2 units of beef are demanded
(figure 10.7).
Now, suppose that income and the price of chicken double. When there is no money illusion:
• Q0 units of beef would still be demanded if the price of beef rises from P0 to 2P0.
• Q1 units of beef would still be demanded if the price of beef rises from P1 to 2P1.
• Q2 units of beef would still be demanded if the price of beef rises from P2 to 2P2.
"Slope" = βP "Slope" = βP
"Slope" = βP
P0
2P1
P1
2P2
P2
Q Q Q
Q0 Q1 Q2
Figure 10.8
Demand curve for beef and no money illusion
To test the theory of no money illusion, we need a model of demand that can be consistent with
it. The constant elasticity demand model is such a model:
Q = βConst P^βP I^βI ChickP^βCP
The three exponents equal the elasticities. The beef price exponent equals the own price elasticity
of demand, the income exponent equals the income elasticity of demand, and the exponent of
the price of chicken equals the cross price elasticity of demand:
"Slope" = βP
Initial
Figure 10.9
Two demand curves for beef—Before and after income and the price of chicken doubling
A little algebra allows us to show that the constant elasticity demand model is consistent with the no money illusion theory whenever the exponents sum to 0. Let βP + βI + βCP = 0 and solve for βCP:
βP + βI + βCP = 0
↓
βCP = −βP − βI
Q = βConst P^βP I^βI ChickP^βCP
  = βConst P^βP I^βI ChickP^(−βP−βI)
  = βConst P^βP I^βI ChickP^(−βP) ChickP^(−βI)
  = βConst (P^βP/ChickP^βP)(I^βI/ChickP^βI)

Simplifying:

Q = βConst (P/ChickP)^βP (I/ChickP)^βI
What happens to the two fractions whenever the price of beef (the good’s own price), income,
and the price of chicken change by the same proportion? Both the numerators and denominators
increase by the same proportion; hence the fractions remain the same. Therefore the quantity of
beef demanded remains the same. This model of demand is consistent with our theory whenever
the exponents sum to 0.
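A quick numerical sketch of this scale-invariance argument, with purely hypothetical parameter values chosen only so that the exponents sum to 0:

    # Hypothetical parameter values; only the property betaP + betaI + betaCP = 0 matters
    bConst, bP, bI = 1.0, -0.5, 0.4
    bCP = -bP - bI                       # forces the exponents to sum to 0

    def quantity_demanded(P, I, ChickP):
        return bConst * P**bP * I**bI * ChickP**bCP

    print(quantity_demanded(100, 500, 80))      # original price, income, chicken price
    print(quantity_demanded(200, 1000, 160))    # everything doubled: same quantity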
Let us begin the hypothesis testing process. We have already completed step 0.
Q = βConst P^βP I^βI ChickP^βCP
No money illusion theory: The elasticities sum to 0: βP + βI + βCP = 0.
Step 1: Collect data, run the regression, and interpret the estimates.
Natural logarithms convert the original equation for the constant elasticity demand model into its "linear" form:

LogQt = log(βConst) + βP LogPt + βI LogIt + βCP LogChickPt + et
Table 10.4
Beef demand regression results—Constant elasticity model
To apply the ordinary least squares (OLS) estimation procedure we must first generate the
logarithms:
where
LogQt = log(Qt)
LogPt = log(Pt)
LogIt = log(It)
LogChickPt = log(ChickPt)
Next we run a regression with the log of the quantity of beef demanded as the dependent variable; the log of the price of beef (the good's own price), the log of income, and the log of the price of chicken are the explanatory variables (table 10.4).
What happens when the price of beef (the good’s own price), income, and the price of chicken
increase by 1 percent simultaneously? The total estimated percent change in the quantity of beef
demanded equals sum of the individual changes. That is, the total estimated percent change in
the quantity of beef demanded equals the estimated percent change in the quantity demanded
resulting from
• a 1 percent change in the price of beef (the good’s own price)
plus
• a 1 percent change in income
plus
• a 1 percent change in the price of chicken.
The estimated percent change in the quantity demanded equals the sum of the elasticity esti-
mates. We can express this succinctly:
Estimated percent change in Q resulting from a 1 percent increase in the price of beef, income, and the price of chicken:

bP + bI + bCP = −0.41 + 0.51 + 0.12 = 0.22
A 1 percent increase in all prices and income results in a 0.22 percent increase in quantity of
beef demanded, suggesting that money illusion is present. As far as the no money illusion theory
is concerned, the sign of the elasticity estimate sum is not critical. The fact that the estimated
sum is +0.22 is not crucial; a sum of −0.22 would be just as damning. What is critical is that
the sum does not equal 0, as claimed by the no money illusion theory.
Critical result: The sum of the elasticity estimates equals 0.22. The sum does not equal 0; the
sum is 0.22 from 0. This evidence suggests that money illusion is present and the no money
illusion theory is incorrect.
Since the critical result is that the sum lies 0.22 from 0, a two-tailed test, rather than a one-
tailed test is appropriate.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Sure, the elasticity estimates do not sum to 0 suggesting that money illusion
exists, but this is just “the luck of the draw.” In fact money illusion is not present; the sum of
the actual elasticities equals 0.
The cynic claims that the 0.22 elasticity estimate sum results simply from random influences.
A more formal way of expressing the cynic’s view is to say that the 0.22 estimate for the elastic-
ity sum is not statistically different from 0. An estimate is not statistically different from 0
whenever its nonzero value results solely from random influences.
Let us now construct the null and alternative hypotheses:
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is con-
sistent with the evidence. Can we dismiss the cynic’s view as nonsense?
We will use a simulation to show that the cynic could indeed be correct. In this simulation, Coef1,
Coef2, and Coef3 denote the coefficients for the three explanatory variables. By default, the actual
values of the coefficients are −0.5, 0.4, and 0.1. The actual values sum to 0 (figure 10.10).
Figure 10.10
Sum of the elasticity estimates and random influences (simulation screenshot: the actual coefficient values and the coefficient estimates and their sum for each repetition)
Be certain that the Pause checkbox is checked. Click Start. The coefficient estimates for each
of the three coefficients and their sum are reported:
• The coefficient estimates do not equal their actual values.
• The sum of the coefficient estimates does not equal 0 even though the sum of the actual coef-
ficient values equals 0.
Click Continue a few more times. As a consequence of random influences we could never expect
the estimate for an individual coefficient to equal its actual value. Therefore we could never
expect a sum of coefficient estimates to equal the sum of their actual values. Even if the actual
elasticities sum to 0, we could never expect the sum of their estimates to equal precisely 0.
Consequently we cannot dismiss the cynic’s view as nonsense.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually obtained (or even stronger), if the cynic is correct and money illusion was not present?
• Specific question: The sum of the coefficient estimates is 0.22 from 0. What is the probabil-
ity that the sum of the coefficient estimates in one regression would be 0.22 or more from 0, if
H0 were true (if the sum of the actual elasticities equaled 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
How can we calculate this probability? We will explore three approaches that can be used:
• Clever algebraic manipulation
• Wald (F-distribution) test
• Letting statistical software do the work
We begin with the clever algebraic manipulation approach. This approach exploits the tails prob-
ability reported in the regression printout. Recall that the tails probability is based on the premise
that the actual value of the coefficient equals 0. Our strategy takes advantage of this:
• First, cleverly define a new coefficient that equals 0 when the null hypothesis is true.
• Second, reformulate the model to incorporate the new coefficient.
• Third, use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the new model.
• Last, focus on the estimate of the new coefficient. Use the new coefficient estimate’s tails
probability to calculate Prob[Results IF H0 true].
Now cleverly define a new coefficient so that the null hypothesis is true when the new coefficient
equals 0:
βClever = βP + βI + βCP
Clearly, βClever equals 0 if and only if the elasticities sum to 0 and no money illusion exists; that
is, βClever equals 0 if and only if the null hypothesis is true.
Now we will use algebra to reformulate the constant elasticity of demand model to incorporate
βClever:
βClever = βP + βI + βCP, so βCP = βClever − βP − βI

Substitute βClever − βP − βI for βCP in the log form of the model:

LogQt = log(βConst) + βP LogPt + βI LogIt + (βClever − βP − βI)LogChickPt + et

Rearrange terms:

LogQt = log(βConst) + βP (LogPt − LogChickPt) + βI (LogIt − LogChickPt) + βClever LogChickPt + et
     = log(βConst) + βP LogPLessLogChickPt + βI LogILessLogChickPt + βClever LogChickPt + et
where
LogQt = log(Qt)
LogPLessLogChickPt = log(Pt) − log(ChickPt)
LogILessLogChickPt = log(It) − log(ChickPt)
LogChickPt = log(ChickPt)
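A minimal sketch of how this reformulated regression might be run with statsmodels; the file name and the raw column names (Q, P, I, ChickP) are hypothetical stand-ins for the monthly beef demand data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                     # hypothetical file name
    df["LogQ"] = np.log(df["Q"])
    df["LogPLessLogChickP"] = np.log(df["P"]) - np.log(df["ChickP"])
    df["LogILessLogChickP"] = np.log(df["I"]) - np.log(df["ChickP"])
    df["LogChickP"] = np.log(df["ChickP"])

    results = smf.ols("LogQ ~ LogPLessLogChickP + LogILessLogChickP + LogChickP", data=df).fit()
    # The coefficient on LogChickP estimates beta_Clever; its two-tailed ("tails")
    # probability is exactly the Prob[Results IF H0 true] needed for the test.
    print(results.params["LogChickP"], results.pvalues["LogChickP"])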
Step 1: Collect data, run the regression, and interpret the estimates.
Now, use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of this model (table 10.5).
It is important to note that the estimates of the reformulated model are consistent with the
estimates of the original model (table 10.4):
• The estimate of the price coefficient is the same in both cases, −0.41.
• The estimate of the income coefficient is the same in both cases, 0.51.
• In the reformulated model, the estimate of βClever equals 0.22, which equals the sum of the elasticity estimates from the original model: −0.41 + 0.51 + 0.12 = 0.22.
Step 2: Play the cynic and challenge the results; reconstruct the null and alternative
hypotheses.
Cynic’s view: Sure, bClever, the estimate for the sum of the actual elasticities, does not equal 0,
suggesting that money illusion exists, but this is just “the luck of the draw.” In fact money illu-
sion is not present; the sum of the actual elasticities equals 0.
Table 10.5
Beef demand regression results—Constant elasticity model
Figure 10.11
Probability distribution of the clever coefficient estimate (centered at 0 under H0; the observed estimate lies 0.22 from 0)
We have already shown that we cannot dismiss the cynic’s view as nonsense. As a consequence
of random influences we could never expect the estimate for βClever to equal precisely 0, even if
the actual elasticities sum to 0.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis (figure 10.11).
Figure 10.12
Calculating Prob[Results IF H0 true] (Student t-distribution: mean = 0, SE = 0.2759, DF = 20; each tail beyond 0.22 from 0 has probability 0.4325/2)
• Generic question: What is the probability that the results would be like those we obtained
(or even stronger), if the cynic is correct and no money illusion was present?
• Specific question: What is the probability that the coefficient estimate in one regression,
bClever, would be at least 0.22 from 0, if H0 were true (if the actual coefficient, βClever, equals 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true] (figure 10.12).
The software automatically computes the tails probability based on the premise that the actual
value of the coefficient equals 0. This is precisely what we need, is it not? The regression printout
reports that the tails probability equals 0.4325. Consequently
Prob[Results IF H0 true] = 0.4325.
Step 5: Decide on the standard of proof, a significance level.
The significance level is the dividing line between the probability being small and the probability
being large.
The probability exceeds the traditional significance levels of 1, 5, and 10 percent. Based on the traditional significance levels, we would not reject the null hypothesis. We would conclude that the estimate of βClever, the sum of the elasticities, is not statistically different from 0, thereby supporting the no money illusion theory.
In the next chapter we will explore two other ways to calculate Prob[Results IF H0 true].
Chapter 10 Review Questions

1. How does multiple regression analysis differ from simple regression analysis?
2. Consider the following constant elasticity demand model:

Q = βConst P^βP I^βI ChickP^βCP
Chapter 10 Exercises
Agricultural production data: Cross-sectional agricultural data for 140 nations in 2000 that
cultivated more than 10,000 square kilometers of land.
a. What is your theory regarding how the quantity of labor, land, and machinery affects
agricultural value added?
b. What does your theory imply about the signs of the model’s coefficients?
c. What are the appropriate hypotheses?
d. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
Definition: The value-added function exhibits constant returns to scale if and only if increasing
each input by the same factor will increase ValueAdded by that same factor.
For example, suppose that the value-added function exhibits constant returns to scale. If twice
as much labor, land, and machinery are used, then value added will double also.
a. Begin with the equation for the constant elasticity model for value added.
ValueAdded = βConst Labor^βLabor Land^βLand Machinery^βMachinery
Then double labor, land, and machinery; that is, in the equation replace
• Labor with 2Labor
• Land with 2Land
• Machinery with 2Machinery
Derive the expression for NewValueAdded in terms of ValueAdded:
NewValueAdded = _____________________
b. If the value-added function exhibits constant returns to scale, how must the new expression for value added, NewValueAdded, be related to the original expression for value added, ValueAdded?
c. If the value-added function exhibits constant returns to scale, what must the sum of the
exponents, βLabor + βLand + βMachinery, equal?
3. Consider the constant elasticity model for value added
b. Incorporate the “clever” coefficient into the log form of the constant elasticity model for
value added.
c. Estimate the parameters of the equation that incorporates the new coefficient.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
f. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
6. Again, revisit the cigarette consumption data.
a. Instead of a linear model, consider a constant elasticity model to capture the impact that
the price of cigarettes and income per capita have on cigarette consumption. What equation
depicts your model?
b. What does your theory imply about the sign of the coefficients?
c. What are the appropriate hypotheses?
d. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
e. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
7. Again, revisit the cigarette consumption data.
a. What is your theory regarding how
i. The price of cigarettes affects the youth smoking rate?
ii. Per capita income affects the youth smoking rate?
b. Based on your theory, construct a linear regression model. What equation depicts your
model?
c. What does your theory imply about the signs of the model’s coefficients?
d. What are the appropriate hypotheses?
e. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
f. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
11 Hypothesis Testing and the Wald Test

Chapter 11 Outline
2. Review how the ordinary least squares (OLS) estimation procedure determines the value of
the parameter estimates. What criterion does this procedure use to determine the value of the
parameter estimates?
3. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
4. Focus on our beef consumption data:
Beef consumption data: Monthly time series data of beef consumption, beef prices, income,
and chicken prices from 1985 and 1986.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the constant elasticity demand model.
The no money illusion theory contends that whenever all prices and income change by the same proportion, the quantity demanded is unaffected. In terms of elasticities, this means that a good's elasticities (own price, income, and cross price) sum to 0. Let us briefly review the steps that we undertook in the last chapter to assess this theory.
Since the linear demand model is intrinsically inconsistent with the no money illusion theory,
we cannot use it to assess the theory. The constant elasticity demand model can be used, however:

Q = βConst P^βP I^βI ChickP^βCP

where
Next we converted the constant elasticity demand model into a linear relationship by taking natural logarithms:

LogQt = log(βConst) + βP LogPt + βI LogIt + βCP LogChickPt + et
If all prices and income increase by 1 percent, the quantity of beef demanded would increase
by 0.22 percent. The sum of the elasticity estimates does not equal 0; more specifically, the sum
lies 0.22 from 0. The nonzero sum suggests that money illusion exists (table 11.1).
However, as a consequence of random influences, we could never expect the sum of the elasticity estimates to equal precisely 0, even if the sum of the actual elasticities did equal 0. Consequently we followed the hypothesis testing procedure. We played the cynic in
order to construct the null and alternative hypotheses. Finally, we needed to calculate the prob-
ability that the results would be like those we obtained (or even stronger), if the cynic is correct
and null hypothesis is actually true; that is, we needed to calculate Prob[Results IF H0 true].
Table 11.1
Beef demand regression results—Constant elasticity model
In the last chapter we explored one way to calculate this probability, the clever algebraic manipu-
lation approach. First we cleverly defined a new coefficient that equals 0 if and only if the null
hypothesis is true:
βClever = βP + βI + βCP
We then reformulated the null and alternative hypotheses in terms of the new coefficient, βClever:
After incorporating the new coefficient into the model, we used the ordinary least squares (OLS)
estimation procedure to estimate the value of the new coefficient. Since the null hypothesis is
now expressed as the new, clever coefficient equaling 0, the new coefficient’s tails probability
reported in the regression printout is the probability that we need:
βP + βI + βCP = 0
We now incorporate this restriction into the constant elasticity demand model. Substituting −βP − βI for βCP in the log form:

LogQt = log(βConst) + βP LogPt + βI LogIt + (−βP − βI)LogChickPt + et

Rearranging terms:

LogQt = log(βConst) + βP LogPLessLogChickPt + βI LogILessLogChickPt + et
where
LogQt = log(Qt)
LogPLessLogChickPt = log(Pt) − log(ChickPt)
LogILessLogChickPt = log(It) − log(ChickPt)
LogChickPt = log(ChickPt)
To compute the cross price elasticity estimate, we must remember that the restricted regression
is based on the premise that the sum of the elasticities equals 0. Hence
bP + bI + bCP = 0

and therefore

bCP = −bP − bI = −(−0.47) − 0.30 = 0.17
For future reference, note in the restricted regression the sum of squared residuals equals
0.004825 and the degrees of freedom equal 21:
Table 11.2
Beef demand regression results—Restricted model
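A minimal sketch of how the restricted regression might be run with statsmodels, continuing with the same hypothetical file and column names as before; imposing the restriction amounts to regressing LogQ on the two difference variables and omitting LogChickP:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                     # hypothetical file name
    df["LogQ"] = np.log(df["Q"])
    df["LogPLessLogChickP"] = np.log(df["P"]) - np.log(df["ChickP"])
    df["LogILessLogChickP"] = np.log(df["I"]) - np.log(df["ChickP"])

    restricted = smf.ols("LogQ ~ LogPLessLogChickP + LogILessLogChickP", data=df).fit()
    b_P = restricted.params["LogPLessLogChickP"]
    b_I = restricted.params["LogILessLogChickP"]
    b_CP = -b_P - b_I               # the restriction pins down the cross price elasticity
    print(restricted.ssr, restricted.df_resid)   # restricted SSR and degrees of freedom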
Let us review the regression printout (table 11.3). Record the sum of squared residuals and the
degrees of freedom in the unrestricted regression:
Comparing the Restricted Sum of Squared Residuals and the Unrestricted Sum of Squared
Residuals: The F-Statistic
Next we compare the sum of squared residuals for the restricted and unrestricted regressions:
Table 11.3
Beef demand regression results—Unrestricted model
Table 11.4
Comparison of parameter estimates

           Restricted regression    Unrestricted regression
bP         −0.47                    −0.41
bI         0.30                     0.51
bChickP    0.17                     0.12
bConst     9.50                     11.37
SSR        0.004825                 0.004675
The parameter estimates of the restricted and unrestricted regressions differ (table 11.4). Recall
that the estimates of the constant and coefficients are chosen so as to minimize the sum of
squared residuals. In the unrestricted regression no restrictions are placed on the coefficient
estimates; when bP equals −0.41, bI equals 0.51, and bChickP equals 0.12, the sum of squared
residuals is minimized. The estimates of the unrestricted regression minimize the sum of squared
residuals. The estimates of the restricted regression do not equal the estimates of the unrestricted
regression. Hence the restricted sum of squared residuals is greater than the unrestricted sum.
More generally:
• The unrestricted equation places no restrictions on the estimates.
• Enforcing a restriction impedes our ability to make the sum of squared residuals as small as possible.
• A restriction can only increase the sum of squared residuals; a restriction cannot reduce the
sum:
SSRR ≥ SSRU
Figure 11.1
Restricted and unrestricted sums of squared residuals simulation (screenshot: the actual coefficient values, the imposed restriction that the coefficients sum to 0, and the restricted and unrestricted coefficient estimates and sums of squared residuals, SSRR and SSRU, for each repetition)
Econometrics Lab 11.1: The Restricted and Unrestricted Sums of Squared Residuals
This simulation emphasizes the point. It mimics the problem at hand by including three explana-
tory variables whose coefficients are denoted as Coef1, Coef2, and Coef3. By default the actual
values of the three coefficients are −0.5, 0.4, and 0.1, respectively. The simulation allows us
to specify a restriction on the coefficient sum. By default a coefficient sum of 0 is imposed
(figure 11.1).
Be certain the Pause checkbox is checked and click Start. The first repetition is now performed. The simulation calculates the parameter estimates for both the restricted and unrestricted equations. The sum of the restricted coefficient estimates equals 0; the sum of the unrestricted coefficient estimates does not equal 0. If our logic is correct, the restricted sum of squared residuals will be greater than the unrestricted sum. Check the two sums. Indeed, the restricted sum is greater. Click Continue a few times. Each time, the restricted sum is greater than the unrestricted sum, confirming our logic.
Now let us consider a question:
Question: Since the imposition of a restriction can only make the sum of squared residuals larger, how much larger should we expect it to be?
The answer to this question depends on whether or not the restriction is actually true. On the
one hand, if in reality the restriction is not true, we would expect the sum of squared residuals
to increase by a large amount. On the other hand, if the restriction is actually true, we would
expect the sum of squared residuals to increase only modestly.
How do we decide whether the restricted sum of squared residuals is much larger or just a little larger than the unrestricted sum? For reasons that we will not delve into, we compare the magnitudes of the restricted and unrestricted sums of squared residuals by calculating what statisticians call the F-statistic:

F = [(SSRR − SSRU)/(DFR − DFU)] / (SSRU/DFU)

where SSRR and SSRU are the restricted and unrestricted sums of squared residuals and DFR and DFU are the restricted and unrestricted degrees of freedom.
On the one hand, when the restricted sum is much larger than the unrestricted sum,
• SSRR − SSRU is large
and
• the F-statistic is large.
On the other hand, when the restricted sum is only a little larger than the unrestricted sum,
• SSRR − SSRU is small
and
• the F-statistic is small.
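As a check on the arithmetic, a small sketch that plugs in the sums of squared residuals and degrees of freedom reported for the restricted regression (SSRR = 0.004825, 21 degrees of freedom) and the unrestricted regression (SSRU = 0.004675, 20 degrees of freedom) and reproduces the F-statistic used below:

    SSR_R, DF_R = 0.004825, 21       # restricted regression (table 11.2)
    SSR_U, DF_U = 0.004675, 20       # unrestricted regression (table 11.3)

    F = ((SSR_R - SSR_U) / (DF_R - DF_U)) / (SSR_U / DF_U)
    print(round(F, 2))               # approximately 0.64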
Note that since the restricted sum of squared residuals (SSRR) cannot be less than the unrestricted sum (SSRU), the F-statistic can never be negative (figure 11.2):

F ≥ 0
Furthermore the F-statistic is a random variable. The claim is based on the fact that:
• Since the parameter estimates for both the restricted and unrestricted equations are random variables, both the restricted and unrestricted sums of squared residuals are random variables.
• Since both the restricted and unrestricted sums of squared residuals are random variables, the F-statistic is a random variable.
Figure 11.2
Sums of squared residuals and the F-statistic simulation (screenshot: the actual coefficient values, the restriction that the coefficients sum to 0, the restricted and unrestricted coefficient estimates and sums of squared residuals, and the F-statistic for each repetition)
Again, let us use a simulation to illustrate that the F-statistic is a random variable. Be certain
the Pause checkbox is checked and click Start. Then click Continue a few times. We cannot
predict the sums of squared residuals or the F-statistic beforehand. Clearly, the sums of squared
residuals and the F-statistic are random variables.
Let us now put this all together:
Cynic’s view: Sure, the F-statistic is 0.64, but the F-statistic will always be positive because
the restricted sum of squared residuals (SSRR) will always be greater than the unrestricted sum
(SSRU). An F-statistic of 0.64 results from “the luck of the draw.”
We must calculate Prob[Results IF H0 true]. Before doing so, however, recall that the F-
statistic is a random variable. Recall what we have learned about random variables:
• The bad news is that we cannot predict the value of a random variable beforehand.
• The good news is that in this case we can describe its probability distribution.
The F-distribution describes the probability distribution of the F-statistic. As figure 11.3 shows, the F-distribution looks very different from the normal and Student t-distributions. The normal and Student t-distributions are symmetric, bell-shaped curves. The F-distribution is neither
Figure 11.3
F-distribution
symmetric nor bell shaped. Since the F-statistic can never be negative, the F-distribution begins
at F equals 0. Its precise shape depends on the numerator’s and the denominator’s degrees of
freedom, the degrees of freedom of the restricted and unrestricted regressions.
Econometrics Lab 11.3: The Restricted and Unrestricted Sums of Squared Residuals and
the F-Distribution
Now we will use the simulation to calculate Prob[Results IF H0 true] (figure 11.4):
Figure 11.4
F-distribution simulation (screenshot: the actual coefficient values, the imposed restriction that the coefficients sum to 0, the restricted and unrestricted estimates and sums of squared residuals, and the F-statistic for each repetition)
• By default, the actual values of the three coefficients (Coef1, Coef2, and Coef3) are −0.5, 0.4,
and 0.1, respectively. The actual values of the coefficients sum to 0. Hence the premise of the
null hypothesis is met; that is, H0 is true.
• Also the At Least F-Value is set at 0.64; this is the value of the F-statistic that we just calcu-
lated for the restricted and unrestricted beef demand regressions. Click Continue a few more
times. Sometimes the F-statistic is less than 0.64; other times it is greater than 0.64. Note the
At Least Percent line; the simulation is calculating the percent of repetitions in which the
F-statistic is equal to or greater than 0.64.
• Clear the Pause checkbox, click Start, and then after many, many repetitions click Stop. The
F-statistic equals 0.64 or more in about 43 percent of the repetitions.
We can now apply the relative frequency interpretation of probability; in one repetition of the
experiment, the probability that the F-statistic would be 0.64 or more when the null hypothesis
is true equals 0.43:
There is another way to calculate this probability that does not involve a simulation. Just as
there are tables that describe the normal and Student t-distributions, there are tables describing
the F-distribution. Unfortunately, F-distribution tables are even more cumbersome than Student
t-tables. Fortunately, we can use our Econometrics Lab to perform the calculation instead.
We wish to calculate the probability that the F-statistic from one pair of regressions would be
0.64 or more, if H0 were true (if there is no money illusion, if actual elasticities sum to 0),
Prob[Results IF H0 true] (figure 11.5).
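The same probability can also be computed directly from the F-distribution with statistical software; a minimal sketch using SciPy (an assumption on our part, since the text performs the calculation with the Econometrics Lab):

    from scipy.stats import f

    # P(F >= 0.64) with 1 numerator and 20 denominator degrees of freedom
    prob = f.sf(0.64, 1, 20)
    print(round(prob, 2))    # approximately 0.43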
Figure 11.5
Calculating Prob[Results IF H0 true]—Using a simulation (F-distribution with DFNum = 1 and DFDem = 20; the probability of an F-statistic of 0.64 or more is 0.43)
Click Calculate:
Many statistical software packages can be used to conduct a Wald test automatically.
First estimate the unrestricted regression (table 11.5). Then choose the Wald test option and
impose the appropriate restriction that the coefficients sum to 0.
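A minimal sketch of this automatic approach using statsmodels (the file and column names remain hypothetical stand-ins for the beef demand data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                 # hypothetical file name
    for col in ["Q", "P", "I", "ChickP"]:
        df["Log" + col] = np.log(df[col])

    unrestricted = smf.ols("LogQ ~ LogP + LogI + LogChickP", data=df).fit()
    # Wald (F) test of the restriction that the three elasticities sum to 0
    wald = unrestricted.f_test("LogP + LogI + LogChickP = 0")
    print(wald.fvalue, wald.pvalue)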
Table 11.5
Beef demand regression results—Unrestricted model
Prob[Results IF H0 true]: The F-statistic equals 0.64 (table 11.6). The probability that the
F-statistic from one pair of regressions would be 0.64 or more, if H0 were true (if there is no
money illusion, if actual elasticities sum to 0) equals 0.43.
We have now described three ways to calculate Prob[Results IF H0 true]. Let us compare the
results (table 11.7).
While the methods use different approaches, they produce identical conclusions. In fact it can
be shown rigorously that the methods are equivalent.
Table 11.6
Beef demand regression results—Wald test of No Money Illusion theory
Wald test
Degrees of freedom
Table 11.7
Comparison of the methods to calculate Prob[Results IF H0 true]
Next we will consider a set of null and alternative hypotheses that assess the entire model:
On the one hand, if the null hypothesis were true, none of the explanatory variables would affect
the dependent variable, and consequently the model would be seriously deficient. On the other
hand, if the alternative hypothesis were true, at least one of the explanatory variables would be
influencing the dependent variable.
We will use the restricted and unrestricted regressions approach to calculate Prob[Results IF
H0 true]. We begin with estimating the restricted and unrestricted regressions.
Table 11.8
Beef demand regression results—No explanatory variables
Table 11.9
Beef demand regression results—Unrestricted model
Prob[Results IF H0 true]: What is the probability that the F-statistic from one pair of regressions would be 19.4 or more, if H0 were true (i.e., if both prices and income have no effect on the quantity of beef demanded, if each of the actual coefficients, βP, βI, and βCP, equals 0)?
Using the Econometrics Lab, we conclude that the probability of obtaining results like those we obtained if the null hypothesis were true is less than 0.0001:
Also we could let a statistical package do the work by using it to run the Wald test.
After running the unrestricted regression, choose the Wald test option and impose the restriction
that all the coefficients equal 0.
Note that even though the Wald test printout reports the probability to be 0.0000 (table 11.10),
it is not precisely 0 because the printout reports the probability only to four decimals. To empha-
size this fact, we report that Prob[Results IF H0 true] is less than 0.0001.
In fact most statistical packages automatically report this F-statistic and the probability when
we estimate the unrestricted model (table 11.11).
The values appear in the F-statistic and Prob[F-statistic] rows:
F-statistic = 19.4
Using a significance level of 1 percent, we would conclude that Prob[Results IF H0 true] is small.
Consequently we would reject the null hypothesis that none of the explanatory variables included
in the model has an effect on the dependent variable.
Table 11.10
Demand regression results—Wald test of entire model
Wald test
Degrees of freedom
Table 11.11
Beef demand regression results—Unrestricted model
A two-tailed t-test is equivalent to a Wald test. We will use the constant elasticity demand model
to illustrate this:
Focus on the coefficient of the price of chicken, βCP. Consider the following two-tailed
hypotheses:
H0: βCP = 0 ⇒ Price of chicken has no effect on the quantity of beef demanded
H1: βCP ≠ 0 ⇒ Price of chicken has an effect on the quantity of beef demanded
We will first calculate Prob[Results IF H0 true] using a two-tailed t-test and then using a Wald
test.
Table 11.12
Beef demand regression results—Unrestricted model
Figure 11.6
Calculating Prob[Results IF H0 true]—Using a t-test (Student t-distribution: mean = 0, SE = 0.0714, DF = 20; each tail beyond 0.12 from 0 has probability 0.0961/2)
Prob[Results IF H0 true]: What is the probability that the coefficient estimate in one regression,
bCP, would be at least 0.12 from 0, if H0 were true (if the actual coefficient, βCP, equals 0)?
Since the tails probability is based on the premise that the actual coefficient value equals 0, the
tails probability reported in the regression printout is just what we are looking for (figure 11.6):
Next we turn to a Wald test. Let us review the rationale behind the Wald test:
• The null hypothesis enforces the restriction and the alternative hypothesis does not:
H0: βCP = 0 ⇒ Price of chicken has no effect on the quantity of beef demanded
H1: βCP ≠ 0 ⇒ Price of chicken has an effect on the quantity of beef demanded
To run the restricted regression, we just drop the price of chicken, ChickP, as an explanatory variable because its coefficient is specified as 0 (table 11.13):
Table 11.13
Beef demand regression results—Restricted model
Table 11.14
Beef demand regression results—Unrestricted model
Figure 11.7
Calculating Prob[Results IF H0 true]—Using an F-test (F-distribution with DFNum = 1 and DFDem = 20; the probability of an F-statistic of 3.05 or more)
Using these two regressions, we can now calculate the F-statistic (figure 11.7):
Prob[Results IF H0 true]: What is the probability that the F-statistic from one pair of regressions
would be 3.05 or more, if the H0 were true (if the actual coefficient, βCP, equals 0; that is, if the
price of chicken has no effect on the quantity of beef demanded)?
Alternatively we could use statistical software to calculate the probability. After running the unrestricted regression, choose the Wald test option and impose the restriction that the chicken price coefficient equals 0.
C(3) = 0
• Click OK.
Using either method, we conclude that based on a Wald test, Prob[Results IF H0 true] equals 0.0961 (table 11.15).
Now compare the values of Prob[Results IF H0 true] calculated for the two-tailed t-test and the Wald test:
The probabilities are identical. This is not a coincidence. It can be shown rigorously that a two-
tailed t-test is a special case of the Wald test.
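To make the equivalence concrete, here is a minimal sketch in Python using the statsmodels package with simulated data (the numbers and variable names are made up for illustration; they are not the book's beef demand data). The Wald F-statistic for a single zero restriction equals the square of the coefficient's t-statistic, and the two probabilities agree:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a small data set (hypothetical numbers, purely for illustration).
rng = np.random.default_rng(0)
n = 24
log_p_beef = rng.normal(0.0, 0.2, n)
log_income = rng.normal(0.0, 0.2, n)
log_p_chicken = rng.normal(0.0, 0.2, n)
log_q = 10 - 0.5 * log_p_beef + 0.8 * log_income + 0.3 * log_p_chicken + rng.normal(0.0, 0.1, n)

# Unrestricted regression: constant plus the three explanatory variables.
X = sm.add_constant(np.column_stack([log_p_beef, log_income, log_p_chicken]))
results = sm.OLS(log_q, X).fit()

# Two-tailed t-test on the last coefficient versus a Wald test of the single
# restriction that the same coefficient equals 0.
t_prob = results.pvalues[3]
wald = results.f_test([0, 0, 0, 1])   # restriction: 1 * (last coefficient) = 0

print("two-tailed t-test probability:", t_prob)
print("Wald test:", wald)                              # reports F-statistic and probability
print("t-statistic squared:", results.tvalues[3] ** 2)  # equals the Wald F-statistic
```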
We have introduced three distributions that are used to assess theories: Normal, Student-t, and
F (figure 11.8).
• Theories involving a single variable: Normal distribution and Student t-distribution. The
normal distribution is used whenever we know the standard deviation of the distribution; the
normal distribution is described by its mean and standard deviation.
Often we do not know the standard deviation of the distribution, however. In these cases we turn
to the Student t-distribution; it is described by its mean, estimated standard deviation (standard
error), and the degrees of freedom. The Student t-distribution is more “spread out” than the
normal distribution because an additional element of uncertainty is added when the standard
deviation is not known and must be estimated.
Table 11.15
Beef demand regression results—Wald test of LogChickP coefficient
Figure 11.8
Normal distribution, Student t-distribution, and F-distribution
• Theories involving several variables: F-Distribution. The F-distribution can be used to assess
relationships among two or more estimates. We compute the F-statistic by using the sum of
squared residuals and the degrees of freedom in the restricted and unrestricted regressions:
The F-distribution is described by the degrees of freedom in the numerator and denominator,
DFR − DFU and DFU.
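For reference, the computation described above can be written out explicitly; using SSR_R and SSR_U for the restricted and unrestricted sums of squared residuals and DF_R and DF_U for their degrees of freedom, the F-statistic is

\[
F = \frac{(SSR_R - SSR_U)/(DF_R - DF_U)}{SSR_U / DF_U}
\]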
Chapter 11 Review Questions
1. Consider a Wald test. Can the restricted sum of squared residuals be less than the unrestricted
sum of squared residuals? Explain.
2. How is a Wald test F-statistic related to the restricted and unrestricted sum of squared
residuals?
3. How is a two-tailed t-test related to the Wald test?
4. What are the three important probability distributions we have introduced? When is it appro-
priate to use each of them?
Chapter 11 Exercises
Agricultural production data: Cross-sectional agricultural data for 140 nations in 2000 that
cultivated more than 10,000 square kilometers of land.
1. Focus on the log form of the constant elasticity value added model:
+ βMachinerylog(Machineryt) + et
2. Focus on the log form of the constant elasticity value added model:
+ βMachinerylog(Machineryt) + et
Assess the constant returns to scale theory using the Wald approach.
a. Consider the unrestricted regression.
i. Estimate the parameters of the unrestricted regression.
3. Assess the constant returns to scale theory using the Wald approach the “easy way” with
statistical software.
4. Compare the Prob[Results IF H0 true] that has been calculated in three ways: clever algebra,
Wald test using the Econometrics Lab, and Wald test using statistical software.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
We, as consumers, naturally think of the price of cigarettes as what we must pay to purchase a pack of cigarettes, PriceConsumer. The seller of cigarettes, however, must pass the cigarette tax on to the government. From the supplier's standpoint, therefore, the price received equals the price paid by the consumer less the tax:
PriceSupplier = PriceConsumer − Tax
Convince yourself that the values of PriceConsumer, PriceSupplier, and Tax for our cigarette consumption data are actually related in this way.
5. Consider the following model:
This model raises the possibility that consumers of cigarettes react differently to the price
received by the supplier and the tax received by the government even though they both affect
the price paid by consumers in the same way.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients of
PriceSuppliert and Taxt.
b. Interpret the coefficient estimates.
6. Continue using cigarette consumption data and focus on the following null and alternative
hypotheses:
a. In words, what does the null hypothesis suggest? What does the alternative hypothesis
suggest?
b. Use the cigarette consumption data and a clever algebraic manipulation to calculate
Prob[Results IF H0 true].
7. Continue using the cigarette consumption data; use the Wald test to calculate
Prob[Results IF H0 true].
a. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the unrestricted regression. What do the unrestricted sum of squared residuals and degrees
of freedom equal?
b. Derive the equation that describes the restricted regression.
c. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the restricted regression. What do the restricted sum of squared residuals and degrees of
freedom equal?
d. Compute the F-statistic for the Wald test.
e. Using the Econometrics Lab, compute Prob[Results IF H0 true].
8. Continue using the cigarette consumption data in order to calculate Prob[Results IF H0 true]
the “easy way.”
Model Specification and Development
12
Chapter 12 Outline
Chapter 12 Prep Questions
1. Consider a multiple regression model. When a particular explanatory variable has no effect
on the dependent variable, what does the actual value of its coefficient equal?
2. The 1992 Clinton presidential campaign focused on the economy and made the phrase “It’s
the economy stupid” famous. Bill Clinton and his political advisors relied on the theory that
voters hold the President and his party responsible for the state of the economy. When the
economy performs well, the President’s party gets credit; when the economy performs poorly,
the President’s party takes the blame.
“It’s the economy stupid” theory: The American electorate is sensitive to economic conditions.
Good economic conditions increase the vote for the President’s party; bad economic conditions
decrease the vote for the President’s party.
where
VotePresPartyt = percent of the popular vote received by the incumbent President’s party in
year t
UnemPriorAvgt = average unemployment rate in the three years prior to election, that is, three
years prior to year t
a. Assuming that the “It’s the economy stupid” theory is correct, would βUnemPriorAvg be posi-
tive, negative or zero?
b. For the moment assume that when you run the appropriate regression, the sign of the
coefficient estimate agrees with your answer to part a. Formulate the null and alternative
hypotheses for this model.
3. Again focus on the “It’s the economy stupid” theory. Consider a second model:
where
UnemTrendt = unemployment rate change from previous year; that is, the unemployment rate
trend in year t (Note: If the unemployment rate is rising, the trend will be a positive number; if
the unemployment rate is falling, the trend will be a negative number.)
a. Assuming that the theory is correct, would βUnemTrend be positive, negative or zero?
b. For the moment assume that when you run the appropriate regression, the sign of the
coefficient estimate agrees with your answer to part a. Formulate the null and alternative
hypotheses for this model.
5. The following table reports the percent of the popular vote received by the Democrats,
Republicans, and third parties for every presidential election since 1892.
Both models use the same information to explain the quantity of beef demanded: the price of
beef (the good’s own price), income, and the price of chicken. The models use this information
differently, however. That is, the two models specify two different ways in which the
quantity of beef demanded is related to the price of beef (the good’s own price), income, and
the price of chicken. We will now explore how we might decide whether or not a particular
specification of the model can be improved. The RESET test is designed to do just this. In the
test we modify the original model to construct an artificial model. An artificial model is not
designed to test a theory, but rather it is designed to assess the original model.
To explain the RESET test, we begin with the general form of the simple linear regression
model: y is the dependent variable and x is the explanatory variable:
yt = βConst + βxxt + et
We use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters:
• bConst estimates βConst
• bx estimates βx
Using these estimates we compute the estimated value of the dependent variable, Estyt = bConst + bxxt, and then construct an artificial model that adds the square of this estimated value, Esty2t, as an additional explanatory variable:
yt = γConst + γxxt + γEsty2Esty2t + et
Critical point: The artificial model adds no new information. It is just using the same information in a different form.
Question: Can this new form of the information in the artificial model help us explain the
dependent variable significantly better? The coefficient of Esty2 provides the answer to this
question. If γEsty2, the coefficient of Esty2, equals 0, the new form of the information is adding
no explanatory power; if γEsty2 does not equal 0, the new form adds power. We now construct the
appropriate null and alternative hypotheses:
H0: γEsty2 = 0 New form of the information adds no explanatory power
H1: γEsty2 ≠ 0 New form of the information adds explanatory power
We will now consider the linear model of beef demand to illustrate the RESET test:
First run the regression to estimate the parameters of the original model (table 12.1):
Next construct the artificial model:
Step 1: Collect data, run the regression, and interpret the estimates.
EstQSquared = EstQ^2
Then we use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 12.2):
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the new form of the information adds no explanatory power.
The coefficient of EstQ2, γEstQ2, actually equals 0.
Table 12.1
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 12.2
Beef demand regression results—Artificial model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
H0: γEstQ2 = 0 Cynic is correct: New form of the information adds NO explanatory power;
there is no compelling reason to consider a new specification of the original
model.
H1: γEstQ2 ≠ 0 Cynic is incorrect: New form of the information adds explanatory power; there
is reason to consider a new specification of the original model.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the new form of the information adds NO
explanatory power?
• Specific question: The regression’s estimate of γEstQ2 was 0.0000579. What is the probability
that the estimate of γEstQ2 from one regression would be at least 0.0000579 from 0, if H0 were
true (i.e., if γEstQ2 actually equaled 0, if the different form of the information did not improve the
regression)?
Table 12.3
Beef demand regression results—Linear model RESET test
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Steps 4 and 5: The tails probability reported in the regression results is the probability that
we need:
We would reject the null hypothesis at the 5 percent significance level although not at the 1
percent significance level. This suggests that it may be prudent to investigate an alternative
specification of the original model.
Fortunately, statistical software provides a very easy way to run a RESET test by generating
the new variable automatically (table 12.3).
Our calculations and those provided by the statistical software are essentially the same. The
slight differences that do emerge result from the fact that we rounded off some decimal places
from the parameter estimates of the original model when we generated the estimated value of
Q, EstQ.
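As a supplement to the statistical-software discussion above, here is a minimal sketch of the same RESET calculation in Python using statsmodels with simulated data (the variable names and numbers are hypothetical, not the book's beef demand data): estimate the original model, square its fitted values, add the square as an extra explanatory variable, and read off the tails probability on that new variable.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data standing in for the original model's variables (hypothetical).
rng = np.random.default_rng(1)
n = 24
price = rng.uniform(1.0, 5.0, n)
income = rng.uniform(20.0, 60.0, n)
q = 200 - 15 * price + 3 * income + rng.normal(0.0, 5.0, n)

# Step 1: estimate the original model and save the estimated (fitted) values.
X = sm.add_constant(np.column_stack([price, income]))
original = sm.OLS(q, X).fit()
est_q = original.fittedvalues

# Artificial model: same explanatory variables plus the square of the estimate.
X_artificial = sm.add_constant(np.column_stack([price, income, est_q ** 2]))
artificial = sm.OLS(q, X_artificial).fit()

# The coefficient on EstQ squared and its tails probability are what the RESET
# test examines: a small probability suggests rethinking the specification.
print("estimate of the EstQ-squared coefficient:", artificial.params[3])
print("tails probability:", artificial.pvalues[3])
```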
Next consider a different specification of the model, a constant elasticity demand model:
We then estimate its parameters using the ordinary least squares (OLS) estimation procedure
(table 12.4).
Step 1: Collect data, run the regression, and interpret the estimates.
We will estimate the artificial model using statistical software (table 12.5).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the new form of the information adds no explanatory power.
Table 12.4
Beef demand regression results—Constant elasticity model
Table 12.5
Beef demand regression results—Constant elasticity Model RESET test
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
H0: γEstQ2 = 0 Cynic is correct: New form of the information adds NO explanatory power;
there is no compelling reason to consider a new specification of the original
model.
H1: γEstQ2 ≠ 0 Cynic is incorrect: New form of the information adds explanatory power; there
is reason to consider a new specification of the original model.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the new form of the information adds NO
explanatory power?
• Specific question: The regression’s estimate of γEstQ2 was 10.7. What is the probability that
the estimate of γEstQ2 from one regression would be at least 10.7 from 0, if H0 were true (that is,
if γEstQ2 actually equaled 0, if the different form of the information did not improve the
regression)?
The size of this probability determines whether we reject the null hypothesis:
Using the traditional significance levels of 1 or 5 percent, we do not reject the null hypothesis
and conclude that there is no compelling reason to specify a new model.
The 1992 Clinton presidential campaign focused on the economy and made the phrase “It’s the
economy stupid” famous. Bill Clinton, the Democratic challenger, and his political advisors
relied on the theory that voters hold the Republican President, George H. W. Bush, and his party
responsible for the state of the economy. When the economy performs well, the President’s party
gets credit; when the economy performs poorly, the President’s party takes the blame:
“It’s the economy stupid” theory: The American electorate is sensitive to economic conditions.
Good economic conditions increase the vote for the President’s party; bad economic conditions
decrease the vote for the President’s party.
Project: Assess the effect of economic conditions on presidential elections.
Clearly, we need data to test this theory. Fortunately, we have already collected some data. Data
from 1890 to 2008 can be easily accessed:
Presidential election data: Annual time series data of US presidential election and economic
statistics from 1890 to 2008.
First note that the data does not include the variable that we are trying to explain: the vote
received by the incumbent President’s party:
Fortunately, we can generate it from the variables that we have. We have data reporting the
percent of the popular vote received by the Democratic and Republican candidates, VotePartyDemt and VotePartyRept. Another variable, PresPartyR1t, indicates the incumbent President’s party. Focus attention on these three variables; we generate the new variable as
VotePresPartyt = PresPartyR1t × VotePartyRept + (1 − PresPartyR1t) × VotePartyDemt
To show that this new variable indeed equals the vote received by the President’s party, consider
the two possibilities:
• When the Republicans are occupying the White House, PresPartyR1t equals 1, so the new
variable VotePresPartyt will equal the vote received by the Republican candidate:
VotePresPartyt = 1 × VotePartyRept + 0 × VotePartyDemt
= VotePartyRept
• When the Democrats are occupying the White House, PresPartyR1t equals 0, so the new
variable VotePresPartyt will equal the vote received by the Democratic candidate:
VotePresPartyt = 0 × VotePartyRept + 1 × VotePartyDemt
= VotePartyDemt
Table 12.6
Checking generated variables
After generating any new variable, it is important to check to be certain that it is generated cor-
rectly. The first few elections are reported in table 12.6. Everything looks fine. When Republicans
hold the White House (when PresPartyR1t equals 1), the new variable, VotePresPartyt, equals
the vote received by the Republican candidate (VotePartyRept). Alternatively, when Democrats
hold the White House (when PresPartyR1t equals 0), VotePresPartyt equals the vote received by
the Democratic candidate (VotePartyDemt).
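A short sketch of this generate-and-check step in Python with pandas (the column names follow the variables above, but the rows are made up for illustration rather than taken from the actual data file):

```python
import pandas as pd

# Hypothetical rows, not the actual presidential election data.
df = pd.DataFrame({
    "PresPartyR1":  [1, 0, 1],
    "VotePartyRep": [48.3, 51.0, 51.7],
    "VotePartyDem": [46.0, 46.7, 45.5],
})

# Generate the new variable: the incumbent President's party's share of the vote.
df["VotePresParty"] = (df["PresPartyR1"] * df["VotePartyRep"]
                       + (1 - df["PresPartyR1"]) * df["VotePartyDem"])

# Check the generated variable against the underlying columns, as in table 12.6.
print(df)
```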
Next let us look at our voting data to investigate the possibility of data oddities (table 12.7). In
all but a handful of elections, third (minor) parties captured only a small percent of the total
vote. In some years third parties received a substantial fraction, however. The election of 1912
is the most notable example.
In the 1912 election more than a third of the votes were siphoned off from the Republicans and
Democrats. How should we deal with this? One approach is just to focus on those elections that
were legitimate two-party elections. In this approach, we might ignore all elections in which
third (minor) parties received at least 10 percent or perhaps 15 percent of the votes cast. If we
were to pursue this approach, however, we would be discarding information. Econometricians
never like to throw away information. Another approach would be to focus just on the two major
parties by expressing the percent of votes in terms of only those votes cast for the Republican
and Democratic candidates. Let us call this variable VotePresPartyTwot:
As always, it is important to be certain that the new variable has been generated correctly
(table 12.8). Undoubtedly there are other ways to account for third parties. In this chapter,
however, we will do so by focusing on the variable VotePresPartyTwo.
Table 12.7
Checking for data oddities
Table 12.8
Checking generated variables
Now we will illustrate the iterative process that econometricians use to develop their models.
There is no “cookbook” procedure we can follow. Common sense and inventiveness play critical
roles in model development:
Model formulation: Formulate a specific model describing the general theory
        ↓                                        ↑
Model assessment: Apply econometric techniques to assess the model; incorporate insights from the assessment to refine the specific model describing the general theory
Gradually we refine the specific details of the model using an iterative process: model formula-
tion and model assessment. In a real sense this is as much of an art as a science.
We will describe specific models that attempt to explain the percent of the vote received by the
President’s party. In doing so, we will illustrate how the iterative process of model formulation
and model assessment leads us from one model to the next. We begin by observing that the
unemployment rate is the most frequently cited economic statistic. Every month the Bureau of Labor
Statistics announces the previous month’s unemployment rate. The announcement receives
headline attention in the newspapers and on the evening news broadcasts. Consequently it seems
natural to begin with models that focus on the unemployment rate. We will eventually refine our
model by extending our focus to another important economic variable, inflation.
Model 1: Past performance—Electorate is sensitive to how well the economy has performed
in the three years prior to the election.
The first model implicitly assumes that voters conscientiously assess economic conditions over
the three previous years of the President’s administration. If conditions have been good, the
President and his party are rewarded with more votes. If conditions have been bad, fewer votes
would be received. More specifically, we use the average unemployment rate in the three years
prior to the election to quantify economic conditions over the three previous years of the Presi-
dent’s administration.
where
UnemPriorAvgt = average unemployment rate in the three years prior to election; that is, three
years prior to year t
Theory: A high average unemployment rate during the three years prior to the election will
decrease the votes for the incumbent President’s party; a low average unemployment rate will
increase the votes. The actual value of the coefficient, βUnemPriorAvg, is negative:
βUnemPriorAvg < 0
Step 1: Collect data, run the regression, and interpret the estimates.
After generating the variable UnemPriorAvg we use the ordinary least squares (OLS) estima-
tion procedure to estimate the model’s parameters (table 12.9).
The coefficient estimate is 0.33. Because the estimate is positive, it directly contradicts our
theory. Accordingly we will abandon this model, go “back to the drawing board,” and consider
another model.
Model 2: Present performance—Electorate is sensitive to economic conditions in the election year itself.
Our analysis of the first model suggests that voters may not have a long memory; accordingly,
the second model postulates that voters are myopic; voters judge the President’s party only on the
current economic climate; they do not care what has occurred in the past. More specifically, we
use the current unemployment rate to assess economic conditions.
Table 12.9
Election regression results—Past performance model
where
UnemCurrentt = unemployment rate in the election year, year t
Theory: A high unemployment rate in the election year itself will decrease the votes for the
incumbent President’s party; a low unemployment rate will increase the votes. The actual value
of the coefficient, βUnemCurrent, is negative:
βUnemCurrent < 0
Step 1: Collect data, run the regression, and interpret the estimates.
We use the ordinary least squares (OLS) estimation procedure to estimate the second model’s
parameters (table 12.10).
The coefficient estimate is −0.12. This is good news. The evidence supports our theory. Now
we will continue on to determine how confident we should be in our theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the current unemployment rate does not affect the votes
received by the incumbent President’s party.
H0: βUnemCurrent = 0 Cynic is correct: Current unemployment rate has no effect on votes
H1: βUnemCurrent < 0 Cynic is incorrect: High unemployment rate reduces votes for the incum-
bent President’s party
Table 12.10
Election regression results—Present performance model
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the current unemployment rate actually
has no impact?
• Specific question: The regression’s coefficient estimate was −0.12. What is the probability
that the coefficient estimate in one regression would be −0.12 or less, if H0 were actually true
(if the actual coefficient, βUnemCurrent, equals 0)?
The size of this probability determines whether we reject the null hypothesis:
Steps 4 and 5: Use the EViews regression printout to calculate Prob[Results IF H0 true]
(figure 12.2).
The tails probability answers the following question:
Question: If the actual value of the coefficient were 0, what is the probability that the estimate
would be at least 0.12 from 0?
Answer: 0.6746
Figure 12.1
Probability distribution of coefficient estimate (bUnemCurrent; Prob[Results IF H0 true] is the left-tail area at and below −0.12)
Figure 12.2
Probability distribution of coefficient estimate (bUnemCurrent; the tails probability of 0.6746 is split equally between the two tails, 0.12 above and 0.12 below 0)
The probability of being in the left-hand tail equals the tails probability divided by 2:
Prob[Results IF H0 true] = 0.6746/2 ≈ 0.34
This is not good news. By the traditional standards, a significance level of 1, 5, or 10 percent,
this probability is large; we cannot reject the null hypothesis, which asserts that the current
unemployment rate has no effect on votes.
Model 2 provides both good and bad news. The coefficient sign supports the theory suggesting
that we are on the right track. Voters appear to have a short memory; they appear to be more
concerned with present economic conditions than the past. The bad news is that the coefficient
for the current unemployment rate does not meet the traditional standards of significance.
Model 3: Present trend—Electorate is sensitive to the current trend, whether economic condi-
tions are improving or deteriorating during the election year.
The second model suggests that we may be on the right track by just focusing on the election
year. The third model speculates that voters are concerned with the trend in economic conditions
during the election year. If economic conditions are improving, the incumbent President’s party
is rewarded with more votes. On the other hand, if conditions are deteriorating, fewer votes
would be received. We use the trend in the unemployment rate to assess the trend in economic
conditions.
where
UnemTrendt = unemployment rate change from previous year; that is, the unemployment rate
trend in year t
Theory: A rising unemployment rate during the election year will decrease the votes of the
incumbent President’s party; a falling unemployment rate will increase votes. The actual value
of the coefficient, βUnemTrend, is negative:
βUnemTrend < 0
Step 1: Collect data, run the regression, and interpret the estimates.
After generating the variable UnemTrend, we use the ordinary least squares (OLS) estimation
procedure to estimate the model’s parameters (table 12.11).
The coefficient estimate is −0.75. This is good news. The evidence supports our theory. Now
we will continue on to determine how confident we should be in our theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the unemployment rate trend does not affect the votes
received by the incumbent President’s party.
Table 12.11
Election regression results—Present trend model
H0: βUnemTrend = 0 Cynic is correct: Unemployment rate trend has no effect on votes
H1: βUnemTrend < 0 Cynic is incorrect: A rising unemployment rate (a positive value for
UnemTrend) decreases the vote for the incumbent President’s party; a
falling unemployment rate trend (a negative value for UnemTrend)
increases the vote.
Steps 3, 4, and 5: We will now calculate Prob[Results IF H0 true]. Having done this several
times now, we know that since we are conducting a one-tailed test, Prob[Results IF H0 true]
equals half the tails probability:
Prob[Results IF H0 true] = 0.1965/2 ≈ 0.10
While this probability is still considered large at the 5 percent significance level, we appear to
be on the right track. We will shortly consider a fourth model; it postulates that when judging
economic conditions, the electorate considers not only the unemployment rate trend but also the
trend in prices, the inflation rate.
Before moving on to model 4, however, let us illustrate the subtle difference between models
2 and 3 by using each to estimate the vote received by the President’s party in 2008. For model
2 we only need the unemployment rate for 2008 to calculate the estimate; for model 3 we not
only need the unemployment rate in 2008 but also the unemployment rate in the previous year,
2007:
Unemployment rate in 2008 = 5.81% Unemployment rate in 2007 = 4.64%
Model 2: In 2008, UnemCurrent = 5.81; the estimated vote for the President’s party equals
52.7 − 0.12 × 5.81 = 52.7 − 0.7
= 52.0
Model 2’s estimate depends only on the unemployment rate in the current year, 2008 in this
case. The unemployment rate for 2007 is irrelevant. The estimate for 2008 would be the same
regardless of what the unemployment rate for 2007 equaled.
Model 3: In 2008, UnemTrend = 5.81 − 4.64 = 1.17; the estimated vote for the President’s party equals
52.0 − 0.75 × 1.17 = 52.0 − 0.9
= 51.1
Model 3’s estimate depends on the change in the unemployment rate; consequently the unem-
ployment rates in both years are important.
Model 4: Present trend II—Electorate is sensitive not only to the unemployment rate trend, but
also the trend in prices, the inflation rate.
The fourth model, like the third, theorizes that voters are concerned with the trend. If economic
conditions are improving, the incumbent President’s party is rewarded with more votes. If conditions
are deteriorating, fewer votes would be received. The fourth model postulates that voters
are not only concerned with the trend in the unemployment rate but also the trend in prices. The
inflation rate measures the trend in prices. A 2 percent inflation rate means that prices are on
average rising by 2 percent, a 3 percent inflation rate means that prices are rising by 3 percent,
and so on.
where
Theory:
• A rising unemployment rate during the election year will decrease the votes of the incumbent
President’s party; a falling unemployment rate will increase votes. The actual value of the Unem-
Trend coefficient, βUnemTrend, is negative:
βUnemTrend < 0
• An increase in the inflation rate during the election year will decrease the votes of the incum-
bent President’s party; a decrease in the inflation rate will increase votes. The actual value of
the InflCpiCurrent coefficient, βInflCpiCurrent, is negative:
βInflCpiCurrent < 0
Table 12.12
Election regression results—Present trend model
Step 1: Collect data, run the regression, and interpret the estimates (table 12.12).
Both coefficient estimates are negative, suggesting that deteriorating economic conditions decrease
the votes received by the President’s party while improving economic conditions increase them.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view of unemployment rate trend: Despite the results, the unemployment trend has no
effect.
Cynic’s view of inflation rate: Despite the results, the trend in prices has no effect.
Steps 3, 4, and 5: Using the tails probabilities reported in the regression printout, we can easily
compute Prob[Results IF H0 true] for each of our theories:
At the 5 percent significance level both of these probabilities are small. Hence, at the 5 percent
significance level, we can reject the null hypotheses that the unemployment trend and inflation
have no effect on the vote for the incumbent President’s party. This supports the notion that “it’s
the economy stupid.”
This example illustrates the model formulation and assessment process. As mentioned before,
the process is as much of an art as a science. There is no routine “cookbook” recipe that we can
apply. It cannot be emphasized enough that we must use our common sense and
inventiveness.
Chapter 12 Exercises
Presidential election data: Annual time series data of US presidential election and economic
statistics from 1890 to 2008.
Consider the following factors that may or may not influence the vote for the incumbent Presi-
dent’s party:
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
a. Focus on the cigarette tax rate, Tax. Which state has the highest tax rate and which state
has the lowest tax rate?
b. Consider the following linear model that attempts to explain the cigarette tax rate:
What rationale might justify this model? That is, devise a theory explaining why a state’s
tobacco production should affect the state’s tax on cigarettes. What does your theory suggest
about the sign of the coefficient, βTobProd?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the model’s param-
eters. Interpret the coefficient estimate.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
3. Revisit the cigarette consumption data.
a. Perform a RESET test on the linear model explaining the cigarette tax rate. What do you
conclude?
b. Apply the hypothesis testing approach that we developed to assess this model.
SqrtTobProdPC = sqr(TobProdPC)
or
SqrtTobProdPC = TobProdPC^.5
In EViews, the term sqr is the square root function and the character ^ represents an exponent.
c. Perform a RESET test for the nonlinear model. What do you conclude?
4. Revisit the cigarette consumption data.
Consider how the price of cigarettes paid by consumers and per capita income affect the adult
smoking rate.
a. Formulate a theory explaining how each of these factors should affect the adult smoking
rate.
b. Present a linear model incorporating these factors. What do your theories imply about the
sign of each coefficient?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
f. Perform a RESET test for your model. What do you conclude?
5. Revisit the cigarette consumption data.
Focus your attention on explaining the youth smoking rate. Choose the variables that you believe
should affect the youth smoking rate.
a. Formulate a theory explaining how each of these factors should affect the youth smoking
rate.
b. Present a linear model incorporating these factors. What do your theories imply about the
sign of each coefficient?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
f. Perform a RESET test for your model. What do you conclude?
Dummy and Interaction Variables
13
Chapter 13 Outline
Chapter 13 Prep Questions
Student    Minutes Studied (x)    Quiz Score (y)
1          5                      66
2          15                     87
3          25                     90
Consider the most simple of all possible models, one that does not include even a single explana-
tory variable:
Model: yt = βConst + et
SSR = Res1² + Res2² + Res3² = (y1 − bConst)² + (y2 − bConst)² + (y3 − bConst)²
Using calculus, derive the equation for bConst that minimizes the sum of squared residuals by
expressing bConst in terms of y1, y2, and y3.
Faculty salary data: Artificially generated cross section salary data and characteristics for 200
faculty members.
Salary = βConst + et
To estimate the model, you must “trick” EViews into running the appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight one other
variable, say SexM1.
• In the Workfile window: double click a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete SexM1 so that the line specifying the equation
looks like this:
salary c
• Click OK.
e. Run the appropriate regression to estimate the values of the constant and coefficient. What
is the estimated salary for men? What is the estimated salary for women?
f. Compare your answers to d and e with your answers to a, b, and c. What conclusions can
you draw concerning averages and the regression estimates?
3. Consider the following model explaining Internet use in various countries:
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP Gdpt + βIntAuth Autht + eIntt
where
b. Develop a theory that explains how each explanatory variable affects Internet use. What
do your theories suggest about the sign of each coefficient?
4. Consider a similar model explaining television use in various countries:
LogUsersTVt = βTVConst + βTVYear Yeart + βTVCapHum CapitalHumant + βTVCapPhy CapitalPhysicalt + βTVGDP Gdpt + βTVAuth Autht + eTVt
where
Model: yt = βConst + et
Estimates: Estyt = bConst
Residuals: Rest = yt − Estyt
To minimize the sum of squared residuals, differentiate with respect to bConst and set the derivative equal to 0:
dSSR/dbConst = −2(y1 − bConst) − 2(y2 − bConst) − 2(y3 − bConst) = 0
Divide by −2:
(y1 − bConst) + (y2 − bConst) + (y3 − bConst) = 0
Rearrange terms:
y1 + y2 + y3 = 3bConst
Divide by 3:
(y1 + y2 + y3)/3 = bConst
Since (y1 + y2 + y3)/3 equals the mean of y, ȳ:
ȳ = bConst
We have just shown that when a regression includes only a constant the ordinary least squares
(OLS) estimate of the constant equals the average value of the dependent variable, y.
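A quick numerical check of this result, sketched in Python with statsmodels (using the three quiz scores from the prep questions; any numbers would do):

```python
import numpy as np
import statsmodels.api as sm

# Regress a variable on a constant alone; the estimate reproduces the sample mean.
y = np.array([66.0, 87.0, 90.0])
res = sm.OLS(y, np.ones_like(y)).fit()
print(res.params[0], y.mean())   # both print 81.0
```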
Now we consider faculty salary data. It is important to keep in mind that these data were arti-
ficially generated; the data are not “real.” Artificially generated, rather than real, data are used
as a consequence of privacy concerns.
Faculty salary data: Artificially generated cross-sectional salary data and characteristics for 200
faculty members.
On average, males earn nearly $30,000 more than females. This certainly raises the possibility
that gender discrimination exists, does it not?
A dummy variable separates the observations into two disjoint groups; a dummy variable equals
1 for one group and 0 for the other group. The variable SexM1 is a dummy variable; SexM1
denotes whether a faculty member is male or female; SexM1 equals 1 if the faculty member
is male and 0 if female. We will now show that dummy variables prove very useful in exploring
the possibility of discrimination by considering three types of models:
13.2.3 Models
Type 1 Models: A Constant and No Explanatory Variables
Model: Salaryt = βConst + et
Since this model includes only a constant, we are theorizing that except for random influences
each faculty member earns the same salary. That is, this model attributes all variations in income
to random influences.
Step 1: Collect data, run the regression, and interpret the estimates.
To estimate the model, you must “trick” EViews into running the appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight one other
variable, say SexM1.
• In the Workfile window: double click a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete SexM1 so that the window looks like this:
salary c
• Click OK.
Table 13.1 confirms the fact that when a regression only includes a constant, the ordinary least
squares (OLS) estimate of the constant is just the average of the dependent variable. To empha-
size this fact, we will now run two more regressions with only a constant: one regression includ-
ing only men (table 13.2) and one including only women (table 13.3).
Table 13.1
Discrimination regression results—All observations
Table 13.2
Discrimination regression results—Males only
Table 13.3
Discrimination regression results—Females only
Tables 13.1, 13.2, and 13.3 illustrate the important lesson that type 1 models teach us. In a
regression that includes only a constant, the ordinary least squares (OLS) estimate of the constant
is the average of the dependent variable. Next let us consider a slightly more complicated model.
Type 2 Models: A Constant and a Single Dummy Explanatory Variable Denoting Sex
Discrimination theory: Women are discriminated against in the job market; hence, men earn
higher salaries than women. Since SexM1 equals 1 for males and 0 for females, βSexM1 should be
positive indicating that men will earn more than women: βSexM1 > 0.
Step 1: Collect data, run the regression, and interpret the estimates.
Using the ordinary least squares (OLS) estimation procedure, we estimate the parameters
(table 13.4):
For emphasis, let us apply the estimated equation to men and then to women by plugging in
their values for SexM1:
We can now compute the estimated salary for men and women:
Next note something very interesting by comparing the regression results to the salary
averages:
Table 13.4
Discrimination regression results—Male sex dummy
An ordinary least squares (OLS) regression that includes only a constant and a dummy variable
is equivalent to comparing averages. The conclusions are precisely the same: men earn $28,693
more than women. The dummy variable’s coefficient estimate equals the difference of the
averages.
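A minimal sketch of this equivalence in Python with statsmodels, using made-up salary numbers rather than the book's faculty salary data:

```python
import numpy as np
import statsmodels.api as sm

# Six hypothetical salaries: the first three are men (SexM1 = 1), the rest women.
salary = np.array([50_000.0, 62_000.0, 71_000.0, 45_000.0, 48_000.0, 52_000.0])
sex_m1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

res = sm.OLS(salary, sm.add_constant(sex_m1)).fit()

female_avg = salary[sex_m1 == 0].mean()
male_avg = salary[sex_m1 == 1].mean()
print(res.params[0], female_avg)              # constant = average salary of the 0 group
print(res.params[1], male_avg - female_avg)   # dummy coefficient = difference of averages
```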
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, there is no discrimination.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question for discrimination hypothesis: What is the probability that the results
would be like those we obtained (or even stronger), if the cynic is correct and no discrimination
were present?
• Specific question for discrimination hypothesis: What is the probability that the coefficient
estimate, bSexM1, in one regression would be 28,693 or more, if H0 were true (if the actual coefficient,
βSexM1, equals 0)?
Steps 4 and 5: To calculate the Prob[Results IF H0 true], use the tails probability reported in
the regression printout. This is easy to do. Since this is a one-tailed test, we divide the tails
probability by 2:²
Prob[Results IF H0 true] = <0.0001/2 < 0.0001
Clearly, the Prob[Results IF H0 true] is very small. We can reject the null hypothesis which
asserts that no discrimination exists.
Before we continue, let us point out that our dummy variable, SexM1, assigned 1 to males
and 0 to females. This was an arbitrary choice. We could just as easily have assigned 0 to males and
1 to females, could we not? To see what happens when we switch the assignments, generate a
new variable, SexF1:
new variable, SexF1:
2. Note that even though the tails probability is reported as 0.0000, the probability can never precisely equal 0. It will
always exceed 0. Consequently, instead of writing 0.0000, we write < 0.0001 to emphasize the fact that the probability
can never equal precisely 0.
SexF1 = 1 − SexM1
Discrimination theory: Women are discriminated against in the job market; hence women earn
lower salaries than men. Since SexF1 equals 1 for females and 0 for males, βSexF1 should be
negative indicating that women will earn less than men: βSexF1 < 0.
Step 1: Collect data, run the regression, and interpret the estimates.
After we generate the new dummy variable, SexF1, we can easily run the regression (table 13.5).
Let us apply this estimated equation to men and then to women by plugging in their values
for SexF1:
The results are precisely the same as before. This is reassuring. The decision to assign 1 to one
group and 0 to the other group is completely arbitrary. It would be very discomforting if this
Table 13.5
Discrimination regression results—Female sex dummy
arbitrary decision affected our conclusions. The fact that the arbitrary decision does not affect
the results is crucial.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, there is no discrimination.
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is
consistent with the evidence.
Steps 3, 4, and 5: It is easy to calculate the Prob[Results IF H0 true] by using the tails probabil-
ity reported in the regression printout. Since this is a one-tailed test, we divide the tails probabil-
ity by 2:
Prob[Results IF H0 true] = <0.0001/2 < 0.0001
Since the probability is so small, we reject the null hypothesis that no discrimination exists.
Bottom line:
• Our choice of the base group for the dummy variable (i.e., the group that is assigned a value
of 0 for the dummy variable) does not influence the results.
• Type 2 models, models that include only a constant and a dummy variable, are equivalent to
comparing averages.
On the other hand, what implicit assumption is this discrimination model making? The model
implicitly assumes that the only relevant factor in determining faculty salaries is gender. Is this
reasonable? Well, very few individuals contend that gender is the only factor. Many individuals
believe that gender is one factor, perhaps an important factor, affecting salaries, but they believe
that other factors such as education and experience also play a role.
Type 3 Models: A Constant, a Dummy Explanatory Variable Denoting Sex, and Other
Explanatory Variable(s)
While these models allow the possibility of gender discrimination, they also permit us to explore
the possibility that other factors affect salaries too. To explore such models, let us include both
a dummy variable and the number of years of experience as explanatory variables.
Theories:
• Discrimination: As before, we theorize that women are discriminated against: βSexF1 < 0.
• Experience: It is generally believed that in most occupations, employees with more experi-
ence earn more than employees with less experience. Consequently we theorize that the experi-
ence coefficient should be positive: βExper > 0.
Step 1: Collect data, run the regression, and interpret the estimates.
We can now compute the estimated salary for men and women (table 13.6):
Table 13.6
Discrimination regression results—Female sex dummy and experience
For men, SexF1 = 0:
EstSalary = 42,238 + 2,447Experience
For women, SexF1 = 1:
EstSalary = 39,998 + 2,447Experience
We can illustrate the estimated salaries of men and women graphically (figure 13.1).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
• Cynic’s view on discrimination: Despite the results, there is no discrimination.
• Cynic’s view on experience: Despite the results, experience does not increase salary.
Figure 13.1
Salary discrimination (EstSalaryMen = 42,238 + 2,447Experience and EstSalaryWomen = 39,998 + 2,447Experience; both lines have slope 2,447 and intercepts 2,240 apart)
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is con-
sistent with the evidence. We will proceed by focusing on discrimination.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
•Generic question for discrimination hypothesis: What is the probability that the results
would be like those we obtained (or even stronger), if the cynic is correct and no discrimination
were present?
• Specific question for discrimination hypothesis: The regression’s coefficient estimate was
−2,240. What is the probability that the coefficient estimate in one regression would be −2,240
or less, if H0 were true (if the actual coefficient, βSexF1, equals 0; i.e., if no discrimination existed)?
Prob[Results IF H0 true] = 0.4638/2 ≈ 0.23
At the traditional significance levels of 1, 5, and 10 percent, we cannot reject the null hypothesis
that no discrimination exists. What should we make of this dramatic change?
Figure 13.2
Probability distribution of coefficient estimate (bSexF1; the left-tail probability at and below −2,240 equals 0.4638/2 = 0.2319)
Focus on our last model: Salaryt = βConst + βSexF1SexF1t + βExper Experiencet + et.
Implicit assumption: One year of added experience increases the salary of men and women by
equal amounts.
In other words, this model implicitly assumes that women start behind men by a certain amount
and then remain behind men by that same amount for each level of experience. We will call this
“lump sum” discrimination. Figure 13.3 illustrates this well; the slopes of the lines representing
the estimated salaries for men and women are equal.
Might gender discrimination take another form? Yes. Experience could affect the salaries of
men and women differently. It is possible for a man to receive more for an additional year of
experience than a woman. In other words, could men be more highly rewarded for experience
than women? Our last model excludes this possibility because it implicitly assumes that a year
of added experience increases the salary of men and women by equal amounts. To explore the
possibility of this second type of discrimination, we will introduce interaction variables. We will
refer to this type of discrimination as “raise” discrimination.
Figure 13.3
Estimated discrimination equations with “lump sum” discrimination (EstSalaryMen = 42,238 + 2,447Experience and EstSalaryWomen = 39,998 + 2,447Experience; both lines have slope 2,447 and intercepts 2,240 apart)
An interaction variable allows us to explore the possibility that one explanatory variable influ-
ences the effect that a second explanatory variable has on the dependent variable. We generate
an interaction variable by multiplying the two variables together. We will focus on the interaction
of Experience and SexF1 by generating the variable Exper_SexF1:
Exper_SexF1t = Experiencet × SexF1t
We will now add the interaction variable, Exper_SexF1, to our last model.
Theories:
• “Lump sum” discrimination: As before, we theorize that women are discriminated against:
βSexF1 < 0.
• Experience: As before, we theorize that the experience coefficient should be positive:
βExper > 0.
• “Raise” discrimination: One year of additional experience should increase the salary of
women by less than their male counterparts. Hence we theorize that the coefficient of the inter-
action variable is negative: βExper_SexF1 < 0. (If it is not clear why you should expect this coefficient
to be negative, be patient. It should become clear shortly.)
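Before turning to the estimates, here is a minimal sketch in Python with statsmodels of how the interaction variable is generated and used; the data are simulated and the coefficient values invented for illustration, so only the mechanics, not the numbers, mirror the faculty salary example:

```python
import numpy as np
import statsmodels.api as sm

# Simulated faculty-like data (hypothetical; not the book's data set).
rng = np.random.default_rng(2)
n = 200
experience = rng.uniform(0.0, 30.0, n)
sex_f1 = (rng.uniform(size=n) < 0.5).astype(float)   # 1 = female, 0 = male
salary = (40_000 + 8_000 * sex_f1 + 2_500 * experience
          - 1_000 * experience * sex_f1 + rng.normal(0.0, 3_000.0, n))

# Generate the interaction variable by multiplying the two variables together.
exper_sexf1 = experience * sex_f1

# Model with a constant, the sex dummy, experience, and the interaction variable.
X = sm.add_constant(np.column_stack([sex_f1, experience, exper_sexf1]))
res = sm.OLS(salary, X).fit()
print(res.params)   # constant, SexF1, Experience, and Exper_SexF1 estimates
```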
Step 1: Collect data, run the regression, and interpret the estimates (table 13.7).
Table 13.7
Discrimination regression results—Female sex dummy, experience, and Female sex dummy−Experience
interaction variable
For men, SexF1 = 0 and Exper_SexF1 = 0:
EstSalary = 37,595 + 10,970 × 0 + 2,676Experience − 1,135 × 0
= 37,595 + 2,676Experience
For women, SexF1 = 1 and Exper_SexF1 = Experience:
EstSalary = 37,595 + 10,970 + 2,676Experience − 1,135Experience
= 48,565 + 1,541Experience
Plot the estimated salary for men and women (figure 13.4). We can use this regression to assess
the possibility of two different types of discrimination. One of the estimates is a little
surprising:
Figure 13.4
Estimated discrimination equations with “lump sum” and “raise” discrimination (the women’s line starts higher, at 48,565 versus 37,595, but has the flatter slope)
• “Lump sum” discrimination: As before, the coefficient of the sex dummy variable, SexF1,
assesses the possibility of “lump sum” discrimination. The coefficient estimate is positive. This
is unexpected. It suggests that when faculty members are hired from graduate school with no
experience, women receive about $10,970 more than men. The positive coefficient estimate
suggests that reverse discrimination exists at the entry level.
• “Raise” discrimination: The coefficient of the interaction variable, Exper_SexF1, assesses
the possibility of this more subtle type of discrimination, “raise” discrimination. The coefficient
estimate is negative. It suggests that a woman receives $1,135 less than a man for an additional
year of experience. The negative coefficient estimate suggests that women receive smaller annual
raises than their male counterparts.
These regression results paint a more complex picture of possible discrimination than is often
contemplated. Again, recall that as a consequence of privacy concerns these data were artificially
generated. Consequently, do not conclude that the conclusions we have suggested here neces-
sarily reflect the “real world.” This example was used because it illustrates how multiple regres-
sion analysis can exploit dummy variables and interaction variables to investigate important
issues, such as the presence of discrimination.
13.2.6 Conclusions
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP GdpPCt + βIntAuth Autht + eIntt
LogUsersTVt = βTVConst + βTVYear Yeart + βTVCapHum CapitalHumant + βTVCapPhy CapitalPhysicalt + βTVGDP GdpPCt + βTVAuth Autht + eTVt
The dependent variable in both the Internet and television models is the logarithm of users. This
is done so that the coefficients can be interpreted as percentages.
The theory behind the effect of human capital, physical capital, and per capita GDP on both
Internet and television use is straightforward: Additional human capital, physical capital, and
per capita GDP should stimulate both Internet and television use.
We postulate that the impact of time and political factors should be different for the two media,
however:
• On the one hand, since the Internet is an emerging technology, we theorize that there should be
substantial growth of Internet use over time, even after accounting for all the other factors that may affect
Internet use. Television, on the other hand, is a mature technology. After accounting for all the
other factors, time should play little or no role in explaining television use.
• We postulate that the political factors should affect Internet and television use differently. On
the one hand, since authoritarian nations control the content of television, we would expect
authoritarian nations to promote television; television provides the authoritarian nation the means
to get the government’s message out. On the other hand, since it is difficult to control Internet
content, we would expect authoritarian nations to suppress Internet use.
Table 13.8 summarizes our theories and presents the appropriate null and alternative hypoth-
eses. As table 13.8 reports, all the hypothesis tests are one-tailed tests with the exception of the
Year coefficient in the television use model.
Let us begin by focusing on Internet use.
Step 1: Collect data, run the regression, and interpret the estimates.
Since the dependent variables are logarithms, we interpret the coefficient estimates in terms
of percentages (table 13.9). The signs of all the coefficient estimates support our theories.
Table 13.8
Theories and hypotheses for Internet and television use

                    LogUsersInternet                              LogUsersTV
Year                Theory: βIntYear > 0; H0: βIntYear = 0        Theory: βTVYear = 0; H0: βTVYear = 0
CapitalPhysical     Theory: βIntCapPhy > 0; H0: βIntCapPhy = 0    Theory: βTVCapPhy > 0; H0: βTVCapPhy = 0
Auth                Theory: βIntAuth < 0; H0: βIntAuth = 0        Theory: βTVAuth > 0; H0: βTVAuth = 0
Table 13.9
Internet regression results
Note that all the results support the theories and all the coefficients except for the Year coefficient
in the television regression are significant at the 1 percent level. It is noteworthy that the regres-
sion results suggest that the impact of Year and Auth differ for the two media just as we postu-
lated. Our results suggest that after accounting for all other explanatory variables:
• Internet use grows by an estimated 45 percent per year whereas the annual growth rate of
television use does not differ significantly from 0.
• An increase in the authoritarian index results in a significant decrease in Internet use, but a
significant increase in television use.
Table 13.10
Television regression results
Table 13.11
Coefficient estimates and Prob[Results IF H0 true]
LogUsersInternet LogUsersTV
Question: Does per capita GDP have a greater impact on Internet use in authoritarian nations
than nonauthoritarian ones?
Some argue that the answer to this question is yes; that is, that per capita GDP has a greater
impact on Internet use in authoritarian nations. Their rationale is based on the following logic:
• In authoritarian nations, citizens have few sources of uncensored information. There are few,
if any, uncensored newspapers, news magazines, etc. available. The only source of uncensored
information is the Internet. Consequently the effect of per capita GDP on Internet use will be
large.
• In nonauthoritarian nations, citizens have many sources of uncensored information. Higher per
capita GDP will no doubt stimulate Internet use, but it will also stimulate the purchase of uncen-
sored newspapers, news magazines, etc. Consequently the effect on Internet use will be modest.
An authoritarian index–GDP interaction variable can be used to explore this issue. To do so,
generate the interaction variable Auth_GdpPC, the product of the authoritarian index and per
capita GDP:
Auth_GdpPCt = Autht × GdpPCt
We then add the interaction variable to the Internet use model:
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP GdpPCt + βIntAuth Autht + βIntAuth_GDP Auth_GdpPCt + eIntt
If the theory regarding the interaction of authoritarianism and per capita GDP is correct, the
coefficient of the interaction variable, Auth_GdpPC, should be positive: βIntAuth_GDP > 0. (If you are
not certain why, it should become clear shortly.) The null and alternative hypotheses are
H0: βIntAuth_GDP = 0
H1: βIntAuth_GDP > 0
Step 1: Collect data, run the regression, and interpret the estimates.
Focus attention on the estimated effect of GDP. To do so, consider both the GDP and Auth_
GDP terms in the estimated equation (table 13.12):
Table 13.12
Internet regression results—With interaction variable
Table 13.13
Interaction variable estimate calculations
We will now estimate the impact of GDP for several values of the authoritarian index
(table 13.13).
Recall that as the authoritarian index increases, the level of authoritarianism rises. Therefore
the estimates suggest that as a nation becomes more authoritarian, a $1,000 increase in per capita
GDP increases Internet use by larger amounts. This supports the position of those who believe
that citizens of all nations seek out uncensored information. In authoritarian nations, citizens
have few sources of uncensored information; therefore, as per capita GDP rises, they embrace
the uncensored information the Internet provides more enthusiastically than do citizens of
nonauthoritarian nations, in which other sources of uncensored information are available.
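The arithmetic behind a table like 13.13 is simple enough to sketch directly; the two coefficient values below are invented placeholders, not the regression's actual estimates:

```python
# With both GdpPC and Auth_GdpPC in the estimated equation, the estimated impact
# of a one-unit ($1,000) increase in per capita GDP depends on the authoritarian
# index: impact = b_GDP + b_Auth_GDP * Auth.
b_gdp = 0.05          # hypothetical estimate of the GdpPC coefficient
b_auth_gdp = 0.02     # hypothetical estimate of the Auth_GdpPC coefficient

for auth in [1, 3, 5, 7]:
    impact = b_gdp + b_auth_gdp * auth
    print(f"Auth = {auth}: estimated impact of a $1,000 GdpPC increase = {impact:.2f}")
```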
Chapter 13 Exercises
Faculty salary data: Artificially constructed cross section salary data and characteristics for 200
faculty members.
1. Reconsider the faculty salary data and add the number of articles each faculty member has
published to the model:
c. Formulate the null and alternative hypotheses regarding the effect of published articles.
d. Assess the effect of published articles.
2. Again, reconsider the faculty salary data and add an article–sex interaction variable to the
faculty salary model:
+ βArt_SexF1Articles_Sext + et
Allegation: Women receive less credit for their publications than do their male colleagues.
What does the allegation suggest about the sign of the Articles_Sex coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the published articles–sex interaction coefficient estimate.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
3. Revisit the House earmark data and consider the following model:
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
+ βLib_Dem Lib_Demt + et
Allegation: Liberal Democrats receive more earmarks than their non-Democratic liberal
colleagues.
What does the allegation suggest about the sign of the Lib_Dem coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the Lib_Dem interaction coefficient estimate. What do you conclude
about the allegation?
+ βNERegionNortheastt + et
Allegation: Members of Congress from the Northeast receive more earmarks than their col-
leagues from other parts of the country.
What does the allegation suggest about the sign of the RegionNortheast coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the Northeast region coefficient estimate.
c. Formulate the null and alternative hypotheses regarding the allegation.
d. Calculate Prob[Results IF H0 true] and assess the allegation.
14 Omitted Explanatory Variables, Multicollinearity, and Irrelevant Explanatory Variables
Chapter 14 Outline
14.1 Review
14.1.1 Unbiased Estimation Procedures
14.1.2 Correlated and Independent (Uncorrelated) Variables
14.3 Multicollinearity
14.3.1 Perfectly Correlated Explanatory Variables
14.3.2 Highly Correlated Explanatory Variables
14.3.3 “Earmarks” of Multicollinearity
1. Review the goal of multiple regression analysis. In words, explain what multiple regression analysis attempts to do.
2. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
3. Consider an estimate’s probability distribution. Review the importance of its mean and
variance:
Baseball data: Panel data of baseball statistics for the 588 American League games played
during the summer of 1996.
Attendance depends not only on the ticket price but also on the salary of the home team.
i. Devise a theory explaining the effect that home team salary should have on attendance.
What does your theory suggest about the sign of the HomeSalary coefficient, βHomeSalary?
ii. Use the ordinary least squares (OLS) estimation procedure to estimate both of the
model’s coefficients. Interpret the regression results.
c. What do you observe about the estimates for the PriceTicket coefficients in the two
models?
7. Again, focus on the baseball data and consider the following two variables:
Generate a new variable, PriceCents, to express the price in terms of cents rather than dollars:
Run the regression to estimate the parameters of this model. You will get an “unusual” result.
Explain this by considering what multiple regression analysis attempts to do.
8. The following are excerpts from an article appearing in the New York Times on September 1,
2008:
Doubts Grow over Flu Vaccine in Elderly by Brenda Goodman
The influenza vaccine, which has been strongly recommended for people over 65 for more than four
decades, is losing its reputation as an effective way to ward off the virus in the elderly.
A growing number of immunologists and epidemiologists say the vaccine probably does not work very
well for people over 70 . . .
The latest blow was a study in The Lancet last month that called into question much of the statistical
evidence for the vaccine’s effectiveness. . . .
The study found that people who were healthy and conscientious about staying well were the most likely
to get an annual flu shot. . . . [others] are less likely to get to their doctor’s office or a clinic to receive the
vaccine.
Dr. David K. Shay of the Centers for Disease Control and Prevention, a co-author of a commentary that
accompanied Dr. Jackson’s study, agreed that these measures of health . . . “were not incorporated into
early estimations of the vaccine’s effectiveness” and could well have skewed the findings.
a. Does being healthy and conscientious about staying well increase or decrease the chances
of getting flu?
b. According to the article, are those who are healthy and conscientious about staying well
more or less likely to get a flu shot?
c. The article alleges that previous studies did not incorporate health and conscientiousness in judging the effectiveness of flu shots. If the allegation is true, have previous studies overestimated or underestimated the effectiveness of flu shots?
d. Suppose that you were the director of your community's health department. You are considering whether or not to subsidize flu vaccines for the elderly. Would you find the previous studies useful? That is, would a study that did not incorporate health and conscientiousness in judging the effectiveness of flu shots help you decide whether your department should spend its limited budget to subsidize flu vaccines? Explain.
14.1 Review
Estimates are random variables. Consequently there is both good news and bad news. Before
the data are collected and the parameters are estimated:
• Bad news: On the one hand, we cannot determine the numerical values of the estimates with
certainty (even if we knew the actual values).
• Good news: On the other hand, we can often describe the probability distribution of the
estimate telling us how likely it is for the estimate to equal its possible numerical values.
Accordingly we can apply the relative frequency interpretation of probability. In one repetition,
the chances that the estimate will be greater than the actual value equal the chances that the
estimate will be less.
Figure 14.1
Probability distribution of an estimate—Unbiased estimation procedure
Figure 14.2
Probability distribution of an estimate—Importance of variance (left panel: estimate is unreliable; right panel: estimate is reliable)
Scatter Diagrams
The Dow Jones and Nasdaq growth rates are positively correlated. Most of the scatter diagram
points lie in the first and third quadrants (figure 14.3). When the Dow Jones growth rate is high,
the Nasdaq growth rate is usually high also. Similarly, when the Dow Jones growth rate is low,
the Nasdaq growth rate is usually low also. Knowing one growth rate helps us predict the other.
Amherst precipitation and the Nasdaq growth rate are independent, uncorrelated. The scatter
diagram points are spread rather evenly across the graph. Knowing the Nasdaq growth rate does
not help us predict Amherst precipitation, and vice versa.
Figure 14.3
Scatter diagrams, correlation, and independence
Correlation Coefficient
The correlation coefficient indicates the degree to which two variables are correlated; the correla-
tion coefficient ranges from −1 to +1:
• = 0 = Independent (uncorrelated): Knowing the value of one variable does not help us
predict the value of the other.
• > 0 = Positive correlation: Typically, when the value of one variable is high, the value of
the other variable will be high.
• < 0 = Negative correlation: Typically, when the value of one variable is high, the value of
the other variable will be low.
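To make the review concrete, here is a minimal sketch showing how a correlation coefficient can be computed from two data series; the growth rate figures are made up purely for illustration (NumPy is used here rather than the Econometrics Lab or EViews).

    import numpy as np

    # Made-up monthly growth rates (percent), for illustration only
    dow = np.array([2.1, -1.3, 0.8, 3.0, -2.4, 1.1])
    nasdaq = np.array([2.8, -0.9, 1.5, 3.6, -3.1, 0.7])

    # np.corrcoef returns the 2-by-2 correlation matrix; the off-diagonal entry is the
    # correlation coefficient, which always lies between -1 and +1.
    r = np.corrcoef(dow, nasdaq)[0, 1]
    print(f"Correlation coefficient: {r:.2f}")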
We will consider baseball attendance data to study the omitted variable phenomenon.
Let us begin our analysis by focusing on the price of tickets. Consider the following two models
that attempt to explain game attendance:
The first model has a single explanatory variable, ticket price, PriceTicket:
Downward sloping demand theory: This model is based on the economist's downward sloping demand theory. An increase in the price of a good decreases the quantity demanded. Higher ticket prices should reduce attendance; hence the PriceTicket coefficient should be negative:
βPrice < 0
We will use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 14.1):
Table 14.1
Baseball attendance regression results—Ticket price only
The estimated coefficient for the ticket price is positive, suggesting that higher prices lead to an increase in quantity demanded. This contradicts the downward sloping demand theory, does it not?
In the second model, we include not only the price of tickets, PriceTicket, as an explanatory
variable, but also the salary of the home team, HomeSalary:
We can justify the salary explanatory variable on the grounds that fans like to watch good players. We will call this the star theory. Presumably a high salary team has better players, more stars, on its roster and accordingly will draw more fans.

Star theory: Teams with higher salaries will have better players, which will increase attendance. The HomeSalary coefficient should be positive:

βHomeSalary > 0
Now use the ordinary least squares (OLS) estimation procedure to estimate the parameters (table
14.2). These coefficient estimates lend support to our theories.
The two models produce very different results concerning the effect of the ticket price on
attendance. More specifically, the coefficient estimate for ticket price changes drastically from
1,897 to −591 when we add home team salary as an explanatory variable. This is a disquieting
puzzle. We will solve this puzzle by reviewing the goal of multiple regression analysis and then
explaining when omitting an explanatory variable will prevent us from achieving the goal.
Table 14.2
Baseball attendance regression results—Ticket price and home team salary
Multiple regression analysis attempts to sort out the individual effect of each explanatory vari-
able. The estimate of an explanatory variable’s coefficient allows us to assess the effect that an
individual explanatory variable has on the dependent variable. An explanatory variable’s coef-
ficient estimate estimates the change in the dependent variable resulting from a change in that
particular explanatory variable while all other explanatory variables remain constant.
In model 1 we estimate that a $1.00 increase in the ticket price increases attendance by nearly 2,000 per game, whereas in model 2, we estimate that a $1.00 increase decreases attendance by about 600 per game. The two models suggest that the individual effect of the ticket price is very different. The omitted variable phenomenon allows us to resolve this puzzle.
Claim: Omitting an explanatory variable from a regression will bias the estimation procedure
whenever two conditions are met. Bias results if the omitted explanatory variable
• influences the dependent variable;
• is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable
is a composite of two effects, the influence that the
• included explanatory variable itself has on the dependent variable (direct effect);
• omitted explanatory variable has on the dependent variable because the included explanatory
variable also acts as a proxy for the omitted explanatory variable (proxy effect).
Figure 14.4
Omitted variable simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
Since the goal of multiple regression analysis is to sort out the individual effect of each explana-
tory variable we want to capture only the direct effect.
We can now use the Econometrics Lab to justify our claims concerning omitted explanatory variables. The following regression model with two explanatory variables is used (figure 14.4):

yt = βConst + βx1 x1t + βx2 x2t + et
The simulation provides us with two options; we can either include both explanatory variables in the regression, "Both Xs," or just one, "Only X1." By default the "Only X1" option is selected; consequently, the second explanatory variable is omitted. That is, x1t is the included explanatory
variable and x2t is the omitted explanatory variable. For simplicity, assume that x1’s coefficient,
βx1, is positive. We will consider three cases to illustrate when bias does and does not result:
• Case 1: The coefficient of the omitted explanatory variable is positive and the two explana-
tory variables are independent (uncorrelated).
• Case 2: The coefficient of the omitted explanatory variable equals zero and the two explana-
tory variables are positively correlated.
• Case 3: The coefficient of the omitted explanatory variable is positive and the two explana-
tory variables are positively correlated.
We will now show that only in the last case does bias result because only in the last case is the proxy effect present.
Case 1: The coefficient of the omitted explanatory variable is positive and the two explanatory
variables are independent (uncorrelated).
Will bias result in this case? Since the two explanatory variables are independent (uncorrelated),
an increase in the included explanatory variable, x1t, typically will not affect the omitted explana-
tory variable, x2t. Consequently the included explanatory variable, x1t, will not act as a proxy
for the omitted explanatory variable, x2t. Bias should not result.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (independent of x1t): x2t unaffected → yt unaffected (even though βx2 > 0). No proxy effect.
We will use our lab to confirm this logic. By default, the actual coefficient for the included explanatory variable, x1t, equals 2 and the actual coefficient for the omitted explanatory variable, x2t, is nonzero; it equals 5. Their correlation coefficient, Corr X1&X2, equals 0.00; hence the
two explanatory variables are independent (uncorrelated). Be certain that the Pause checkbox is
cleared. Click Start and after many, many repetitions, click Stop. Table 14.3 reports that the
average value of the coefficient estimates for the included explanatory variable equals its actual
value. Both equal 2.0. The ordinary least squares (OLS) estimation procedure is unbiased.
The ordinary least squares (OLS) estimation procedure captures the individual influence that
the included explanatory variable itself has on the dependent variable. This is precisely the effect
that we wish to capture. The ordinary least squares (OLS) estimation procedure is unbiased; it
is doing what we want it to do.
Table 14.3
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50

Table 14.4
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50
2            0            0.30         ≈ 2.0                   ≈ 50                   ≈ 50
Case 2: The coefficient of the omitted explanatory variable equals zero and the two explanatory
variables are positively correlated.
In the second case the two explanatory variables are positively correlated; when the included
explanatory variable, x1t, increases, the omitted explanatory variable, x2t, will typically increase
also. But the actual coefficient of the omitted explanatory variable, βx2, equals 0; hence, the
dependent variable, yt, is unaffected by the increase in x2t. There is no proxy effect because the
omitted variable, x2t, does not affect the dependent variable; hence bias should not result.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (positively correlated with x1t): x2t up → yt unaffected (since βx2 = 0). No proxy effect.
To confirm our logic with the simulation, be certain that the actual coefficient for the omitted
explanatory variable equals 0 and the correlation coefficient equals 0.30. Click Start and then
after many, many repetitions, click Stop. Table 14.4 reports that the average value of the coef-
ficient estimates for the included explanatory variable equals its actual value. Both equal 2.0.
The ordinary least squares (OLS) estimation procedure is unbiased.
Again, the ordinary least squares (OLS) estimation procedure captures the influence that the
included explanatory variable itself has on the dependent variable. Again, there is no proxy effect
and all is well.
Case 3: The coefficient of the omitted explanatory variable is positive and the two explanatory
variables are positively correlated.
As with case 2, the two explanatory variables are positively correlated; when the included explanatory variable, x1t, increases, the omitted explanatory variable, x2t, will typically increase also. But now the actual coefficient of the omitted explanatory variable, βx2, is no longer 0; it is positive. Hence an increase in the omitted explanatory variable, x2t, increases the dependent variable. In addition to having a direct effect on the dependent variable, the included explanatory variable, x1t, also acts as a proxy for the omitted explanatory variable, x2t. There is a proxy effect.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (positively correlated with x1t): x2t up → yt up (since βx2 > 0). Proxy effect.
In the simulation, the actual coefficient of the omitted explanatory variable, βx2, once again equals 5. The two explanatory variables are positively correlated; the correlation coefficient equals 0.30. Click Start and then after many, many repetitions click Stop. Table 14.5 reports that the average value of the coefficient estimates for the included explanatory variable, 3.5, exceeds its actual value, 2.0. The ordinary least squares (OLS) estimation procedure is biased upward.
Now we have a problem. The ordinary least squares (OLS) estimation procedure overstates
the influence of the included explanatory variable, the effect that the included explanatory vari-
able itself has on the dependent variable.
Table 14.5
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50
2            0            0.30         ≈ 2.0                   ≈ 50                   ≈ 50
2            5            0.30         ≈ 3.5                   ≈ 28                   ≈ 72
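The Econometrics Lab is interactive, but the logic of these three cases can also be reproduced with a short Monte Carlo sketch of our own. In the sketch below the coefficient values and correlation mirror the lab's case 3 defaults, while the sample size, error variance, and other data-generating details are assumptions chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    beta_x1, beta_x2, corr = 2.0, 5.0, 0.30    # case 3: nonzero omitted coefficient, positive correlation
    n_obs, n_reps = 50, 5000                   # sample size and repetitions are assumptions

    estimates = []
    for _ in range(n_reps):
        # Draw the two explanatory variables with the chosen correlation
        x1, x2 = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n_obs).T
        y = beta_x1 * x1 + beta_x2 * x2 + rng.normal(0.0, 10.0, size=n_obs)

        # Omit x2: regress y on a constant and x1 alone
        X = np.column_stack([np.ones(n_obs), x1])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

    print(f"Mean estimate of beta_x1: {np.mean(estimates):.2f} (actual value: {beta_x1})")
    # With corr = 0.30 and beta_x2 = 5 the mean is roughly 2 + 5(0.30) = 3.5, the upward bias
    # reported in table 14.5; setting corr = 0.00 or beta_x2 = 0 removes the bias.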
Figure 14.5
Probability distribution of an estimate—Upward bias (the distribution of bx1 is centered at 3.5, above the actual value of 2, yet estimates with bx1 < 2 still occur)
Let us now take a brief aside. Case 3 provides us with the opportunity to illustrate what bias
does and does not mean.
• What bias does mean: Bias means that the estimation procedure systematically overesti-
mates or underestimates the actual value. In this case, upward bias is present. The average of
the estimates is greater than the actual value after many, many repetitions.
• What bias does not mean: Bias does not mean that the value of the estimate in a single
repetition must be less than the actual value in the case of downward bias or greater than the
actual value in the case of upward bias. Focus on the last simulation. The ordinary least squares
(OLS) estimation procedure is biased upward as a consequence of the proxy effect. Despite the
upward bias, however, the estimate of the included explanatory variable is less than the actual
value in many of the repetitions as shown in figure 14.5.
Upward bias does not guarantee that in any one repetition the estimate will be greater than the
actual value. It just means that it will be greater “on average.” If the probability distribution is
symmetric, the chances of the estimate being greater than the actual value exceed the chances
of being less.
Now we return to our three omitted variable cases by summarizing them (table 14.6).
Question: Is the estimation procedure biased or unbiased when both explanatory variables are
included in the regression?
Table 14.6
Omitted variables simulation summary

Case   Omitted variable influences dependent variable   Omitted variable correlated with included variable   Estimation procedure
1      Yes                                              No                                                   Unbiased
2      No                                               Yes                                                  Unbiased
3      Yes                                              Yes                                                  Biased
Table 14.7
Omitted variables simulation results—No omitted variables

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates
2            5            0.30         ≈ 2.0
To address this question, “Both Xs” is now selected. This means that both explanatory variables,
x1t and x2t, will be included in the regression. Both explanatory variables affect the dependent
variable and they are correlated. As we saw in case 3, if one of the explanatory variables is
omitted, bias will result. To see what occurs when both explanatory variables are included, click
Start and after many, many repetitions, click Stop. When both variables are included the ordi-
nary least squares (OLS) estimation procedure is unbiased (table 14.7).
Conclusion: To avoid omitted variable bias, all relevant explanatory variables should be
included in a regression.
The ticket price coefficient estimate is affected dramatically by the presence of home team salary; in model 1 the estimate is much higher: 1,897 versus −591. Why?
We will now argue that when ticket price is included in the regression and home team salary is omitted, as in model 1, there is reason to believe that the estimation procedure for the ticket price coefficient will be biased. We just learned that omitted variable bias results when the following two conditions are met; when an omitted explanatory variable:
• influences the dependent variable
and
• is correlated with an included explanatory variable.
Model 1 omits home team salary, HomeSalaryt. Are the two omitted variable bias conditions
met?
• It certainly appears reasonable to believe that the omitted explanatory variable, HomeSalaryt,
affects the dependent variable, Attendancet. The club owner who is paying the high salaries
certainly believes so. The owner certainly hopes that by hiring better players more fans will
attend the games. Consequently it appears that the first condition required for omitted variable
bias is met.
• We can confirm the correlation by using statistical software to calculate the correlation matrix
(table 14.8).
Table 14.8
Ticket price and home team salary correlation matrix

Correlation matrix   PriceTicket   HomeSalary
PriceTicket          1.00          0.78
HomeSalary           0.78          1.00
The correlation coefficient between PriceTickett and HomeSalaryt is 0.78; the variables are
positively correlated. The second condition required for omitted variable bias is met.
We have reason to suspect bias in model 1. When the included variable, PriceTickett, increases, the omitted variable, HomeSalaryt, typically increases also. An increase in the omitted variable, HomeSalaryt, increases the dependent variable, Attendancet. In addition to having a direct effect on the dependent variable, the included explanatory variable, PriceTickett, also acts as a proxy for the omitted explanatory variable, HomeSalaryt. There is a proxy effect, and upward bias results. This provides us with an explanation of why the ticket price coefficient estimate in model 1 is greater than the estimate in model 2.
Omitting an explanatory variable from a regression biases the estimation procedure whenever
two conditions are met. Bias results if the omitted explanatory variable:
• influences the dependent variable;
• is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable is a composite of two effects; the coefficient estimate of the included explanatory variable reflects two influences:
• The included explanatory variable, which has an effect on the dependent variable (direct
effect).
• The omitted explanatory variable, which has an effect on the dependent variable because the
included explanatory variable also acts as a proxy for the omitted explanatory variable (proxy
effect).
The bad news is that the proxy effect leads to bias. The good news is that we can eliminate the
proxy effect and its accompanying bias by including the omitted explanatory variable. But now,
we will learn that if two explanatory variables are highly correlated a different problem can
emerge.
14.3 Multicollinearity
The phenomenon of multicollinearity occurs when two explanatory variables are highly cor-
related. Recall that multiple regression analysis attempts to sort out the influence of each indi-
vidual explanatory variable. But what happens when we include two explanatory variables in a
single regression that are perfectly correlated? Let us see.
In our baseball attendance workfile, ticket prices, PriceTickett, are reported in terms of dollars. Generate a new variable, PriceCentst, reporting ticket prices in terms of cents rather than dollars:

PriceCentst = 100 × PriceTickett
Note that the variables PriceTickett and PriceCentst are perfectly correlated. If we know one,
we can predict the value of the other with complete accuracy. Just to confirm this, use statistical
software to calculate the correlation matrix (table 14.9).
The correlation coefficient of PriceTickett and PriceCentst equals 1.00. The variables are
indeed perfectly correlated. Now run a regression with Attendance as the dependent variable and
both PriceTicket and PriceCents as explanatory variables.
Your statistical software will report a diagnostic. Different software packages provide different
messages, but basically the software is telling us that it cannot run the regression.
Why does this occur? The reason is that the two variables are perfectly correlated. Knowing the value of one allows us to predict the value of the other with complete accuracy. Both explanatory variables contain precisely the same information. Multiple regression analysis attempts to sort out the influence of each individual explanatory variable. But if both variables contain precisely the same information, it is impossible to do this. How can we possibly separate out each variable's individual effect when the two variables contain identical information? We are asking statistical software to do the impossible.
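A tiny sketch makes the mechanical reason for the diagnostic visible: when one explanatory variable is an exact multiple of another, the matrix of explanatory variables is rank deficient, so the least squares normal equations have no unique solution. The ticket prices below are made up for illustration.

    import numpy as np

    price_ticket = np.array([8.0, 10.0, 12.0, 15.0, 9.0])   # made-up dollar ticket prices
    price_cents = 100.0 * price_ticket                      # perfectly correlated with price_ticket

    # Explanatory variable matrix: constant, PriceTicket, PriceCents
    X = np.column_stack([np.ones(len(price_ticket)), price_ticket, price_cents])

    print("Columns in X:", X.shape[1])                                     # 3
    print("Rank of X:   ", np.linalg.matrix_rank(X))                       # 2: one column is redundant
    print("Correlation: ", np.corrcoef(price_ticket, price_cents)[0, 1])   # 1.0
    # Because X'X is singular, the normal equations have no unique solution;
    # this is the "diagnostic" the statistical software reports.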
Table 14.9
EViews dollar and cent ticket price correlation matrix

Correlation matrix   PriceTicket   PriceCents
PriceTicket          1.00          1.00
PriceCents           1.00          1.00
Explanatory variables perfectly correlated
↓
Knowing the value of one explanatory variable allows us to predict perfectly the value of the other
↓
Both variables contain precisely the same information
↓
Impossible to separate out the individual effect of each variable
Next we consider a case in which the explanatory variables are highly, although not perfectly,
correlated.
To investigate the problems created by highly correlated explanatory variables, we will use our baseball data to investigate a model that includes four explanatory variables:
+ βHomeNW HomeNetWinst + βHomeGB HomeGamesBehindt + et
where
Table 14.10
2009 final season standings—AL East
The variable HomeGamesBehindt captures the home team’s standing in its divisional race. For
those who are not baseball fans, note that all teams that win their division automatically qualify
for the baseball playoffs. Ultimately the two teams that win the American and National League playoffs meet in the World Series. Since it is the goal of every team to win the World Series,
each team strives to win its division. Games behind indicates how close a team is to winning its
division. To explain how games behind are calculated, consider the final standings of the Ameri-
can League Eastern Division in 2009 (table 14.10).
The Yankees had the best record; the games behind value for the Yankees equals 0. The Red
Sox won eight fewer games than the Yankees; hence the Red Sox were 8 games behind. The
Rays won 19 fewer games than the Yankees; hence the Rays were 19 games behind. Similarly
the Blue Jays were 28 games behind and the Orioles 39 games behind.1 During the season if a
team’s games behind becomes larger, it becomes less likely the team will win its division, less
likely for that team to qualify for the playoffs, and less likely for that team to eventually win
the World Series. Consequently, if a team's games behind becomes larger, we would expect home team fans to become discouraged, resulting in lower attendance.
We use the terms team quality and division race to summarize our theories regarding home
net wins and home team games behind:
• Team quality theory: More net wins increase attendance. βHomeNW > 0.
• Division race theory: More games behind decreases attendance. βHomeGB < 0.
1. In this example all teams have played the same number of games. When a different number of games have been
played, the calculation becomes a little more complicated. Games behind for a non–first place team equals
[(Wins of first-place team − Wins of trailing team) + (Losses of trailing team − Losses of first-place team)] / 2
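A minimal sketch of the footnote's formula, using a made-up pair of win-loss records:

    def games_behind(first_wins, first_losses, trail_wins, trail_losses):
        # Footnote formula: ((wins of first - wins of trailing) + (losses of trailing - losses of first)) / 2
        return ((first_wins - trail_wins) + (trail_losses - first_losses)) / 2

    # Made-up records: first-place team 90-60, trailing team 82-68
    print(games_behind(90, 60, 82, 68))   # (8 + 8) / 2 = 8.0 games behind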
Table 14.11
HomeNetWins and HomeGamesBehind correlation matrix

Correlation matrix   HomeNetWins   HomeGamesBehind
HomeNetWins          1.000         −0.962
HomeGamesBehind      −0.962        1.000
Table 14.12
Attendance regression results
Table 14.11 reports that the correlation coefficient for HomeGamesBehindt and HomeNetWinst
equals −0.962. Recall that the correlation coefficient must lie between −1 and +1. When two
variables are perfectly negatively correlated their correlation coefficient equals −1. While Home-
GamesBehindt and HomeNetWinst are not perfectly negatively correlated, they come close; they
are highly negatively correlated.
We use the ordinary least squares (OLS) estimation procedure to estimate the model’s param-
eters (table 14.12).
The sign of each estimate supports the theories. Focus on the two new variables included in
the model: HomeNetWinst and HomeGamesBehindt. Construct the null and alternative
hypotheses.
While the signs of the coefficient estimates are encouraging, some of the results are disappointing:
• The coefficient estimate for HomeNetWinst is positive supporting our theory, but what about
the Prob[Results IF H0 true]? What is the probability that the estimate from one regression would
equal 60.53 or more, if the H0 were true (i.e., if the actual coefficient, βHomeNW, equals 0, if home
team quality has no effect on attendance)? Using the tails probability, we can easily calculate
the probability:

Prob[Results IF H0 true] = 0.4778/2 ≈ 0.24
We cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10 percent,
suggesting that it is quite possible for the null hypothesis to be true, quite possible that home
team quality has no effect on attendance.
• Similarly the coefficient estimate for HomeGamesBehindt is negative supporting our theory,
but what about the Prob[Results IF H0 true]? What is the probability that the estimate from one
regression would equal −84.39 or less, if the H0 were true (i.e., if the actual coefficient, βHomeGB,
equals 0, if games behind has no effect on attendance)? Using the tails probability, we can easily
calculate the probability:

Prob[Results IF H0 true] = 0.6138/2 ≈ 0.31
Again, we cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10
percent, suggesting that it is quite possible for the null hypothesis to be true, quite possible that
games behind has no effect on attendance.
Should we abandon our “theory” as a consequence of these regression results?
Let us perform a Wald test (table 14.13) to assess the proposition that both coefficients equal 0:
Table 14.13
EViews Wald test results
Prob[Results IF H0 true]: What is the probability that the F-statistic would be 111.4 or more, if
the H0 were true (i.e., if both βHomeNW and βHomeGB equal 0, if both team quality and games behind
have no effect on attendance)?
We can reject the null hypothesis at a 1 percent significance level; it is unlikely that both team
quality and games behind have no effect on attendance.
There appears to be a paradox when we compare the t-tests and the Wald test:
Individually, neither team quality nor games behind appears to influence attendance significantly,
but taken together by asking if team quality and/or games behind influence attendance, we
conclude that they do.
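Readers who want to replicate a Wald (F) test of a joint null hypothesis outside of EViews can do so with statsmodels; the sketch below uses simulated stand-in data rather than the baseball workfile, so the variable names and numbers are assumptions.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in data with two highly (negatively) correlated explanatory variables
    rng = np.random.default_rng(1)
    n = 200
    net_wins = rng.normal(0.0, 10.0, n)
    games_behind = -0.9 * net_wins + rng.normal(0.0, 3.0, n)
    attendance = 25000 + 60 * net_wins - 80 * games_behind + rng.normal(0.0, 6000.0, n)
    df = pd.DataFrame({"Attendance": attendance,
                       "HomeNetWins": net_wins,
                       "HomeGamesBehind": games_behind})

    results = smf.ols("Attendance ~ HomeNetWins + HomeGamesBehind", data=df).fit()

    # Wald (F) test of the joint null hypothesis that both coefficients equal 0
    print(results.f_test("HomeNetWins = 0, HomeGamesBehind = 0"))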
Next let us run two regressions, each of which includes only one of the two troublesome explanatory variables (tables 14.14 and 14.15). When only a single explanatory variable is included, the coefficient is significant.
Table 14.14
EViews attendance regression results—HomeGamesBehind omitted
Table 14.15
EViews attendance regression results—HomeNetWins omitted
How can we explain this? Recall that multiple regression analysis attempts to sort out the influ-
ence of each individual explanatory variable. When two explanatory variables are perfectly
correlated, it is impossible for the ordinary least squares (OLS) estimation procedure to separate
out the individual influences of each variable. Consequently, if two variables are highly corre-
lated, as team quality and games behind are, it may be very difficult for the ordinary least squares
(OLS) estimation procedure to separate out the individual influence of each explanatory variable.
This difficulty evidences itself in the variance of the coefficient estimates' probability distributions. When two highly correlated variables are included in the same regression, the variance of each estimate's probability distribution is large. This explains our t-test results.
Figure 14.6
Multicollinearity simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
By default the actual value of the coefficient for the first explanatory variable equals 2 and the actual value for the second equals 5. Note that the "Both Xs" option is selected; both explanatory variables are included in the regression. Initially, the correlation coefficient is specified as 0.00; that is,
initially the explanatory variables are independent. Be certain that the Pause checkbox is cleared
and click Start. After many, many repetitions click Stop. Next repeat this process for a correla-
tion coefficient of 0.30, a correlation coefficient of 0.60, and a correlation coefficient of 0.90
(table 14.16).
Table 14.16
Multicollinearity simulation results
An irrelevant explanatory variable is a variable that does not influence the dependent variable.
Including an irrelevant explanatory variable can be viewed as adding “noise,” an additional
element of uncertainty, into the mix. An irrelevant explanatory variable adds a new random
influence to the model. If our logic is correct, irrelevant explanatory variables should lead to
both good news and bad news:
• Good news: Random influences do not cause the ordinary least squares (OLS) estimation
procedure to be biased. Consequently the inclusion of an irrelevant explanatory variable should
not lead to bias.
• Bad news: The additional uncertainty added by the new random influence means that the
coefficient estimate is less reliable; the variance of the coefficient estimate’s probability distribu-
tion should rise when an irrelevant explanatory variable is present.
Figure 14.7
Irrelevant explanatory variable simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
By default the first explanatory variable, x1t, is the relevant explanatory variable; the default
value of its coefficient is 2. The second explanatory variable, x2t, is the irrelevant one (figure
14.7). An irrelevant explanatory variable has no effect on the dependent variable; consequently
the actual value of its coefficient, βx2, equals 0.
Initially the “Only X1” option is selected indicating that only the relevant explanatory vari-
able, x1t, is included in the regression; the irrelevant explanatory variable, x2t, is not included.
Click Start and then after many, many repetitions click Stop. Since the irrelevant explanatory
variable is not included in the regression, correlation between the two explanatory variables will
have no impact on the results. Confirm this by changing correlation coefficients from 0.00 to
0.30 in the “Corr X1&X2” list. Click Start and then after many, many repetitions click Stop.
Similarly show that the results are unaffected when the correlation coefficient is 0.60 and 0.90.
Subsequently investigate what happens when the irrelevant explanatory variable is included
by selecting the “Both Xs” option; the irrelevant explanatory, x2t, will now be included in the
Table 14.17
Irrelevant explanatory variable simulation results
regression. Be certain that the correlation coefficient for the relevant and irrelevant explanatory
variables initially equals 0.00. Click Start and then after many, many repetitions click Stop.
Investigate how correlation between the two explanatory variables affects the results when the
irrelevant explanatory variable is included by selecting correlation coefficient values of 0.30,
0.60, and 0.90. For each case click Start and then after many, many repetitions click Stop. Table
14.17 reports the results of the lab.
The results reported in table 14.17 are not surprising; the results support our intuition. On the one hand, when only the relevant explanatory variable (variable 1) is included:
• The mean of the coefficient estimate for the relevant explanatory variable, x1t, equals 2, the actual
value; consequently the ordinary least squares (OLS) estimation procedure for the coefficient
estimate is unbiased.
• Naturally, the variance of the coefficient estimate is not affected by correlation between the
relevant and irrelevant explanatory variables because the irrelevant explanatory variable is not
included in the regression.
On the other hand, when both relevant and irrelevant variables (variables 1 and 2) are included:
• The mean of the coefficient estimates for the relevant explanatory variable, x1t, still equals 2, the
actual value; consequently, the ordinary least squares (OLS) estimation procedure for the coef-
ficient estimate is unbiased.
• The variance of the coefficient estimate is greater whenever the irrelevant explanatory variable is included, even when the two explanatory variables are independent (when the correlation coefficient equals 0.00). This occurs because the irrelevant explanatory variable adds a new random influence to the model.
• As the correlation between the relevant and irrelevant explanatory variables increases, it becomes more difficult for the ordinary least squares (OLS) estimation procedure to separate out the individual influence of each explanatory variable. As we saw with multicollinearity, this difficulty evidences itself in the variance of the coefficient estimates' probability distributions. As the two explanatory variables become more correlated, the variance of the coefficient estimate's probability distribution increases.
The simulation illustrates the effect of including an irrelevant explanatory variable in a model.
While it does not cause bias, it does make the coefficient estimate of the relevant explanatory
variable less reliable by increasing the variance of its probability distribution.
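As with the earlier labs, the experiment can be sketched in a few lines of Python; the sample size, error variance, and correlation below are assumptions chosen only to mimic the lab's setup.

    import numpy as np

    rng = np.random.default_rng(2)
    beta_x1, corr, n_obs, n_reps = 2.0, 0.60, 50, 5000   # assumptions chosen for illustration

    with_x2, without_x2 = [], []
    for _ in range(n_reps):
        x1, x2 = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n_obs).T
        y = beta_x1 * x1 + rng.normal(0.0, 10.0, size=n_obs)   # x2 is irrelevant: its coefficient is 0

        X_only = np.column_stack([np.ones(n_obs), x1])
        X_both = np.column_stack([np.ones(n_obs), x1, x2])
        without_x2.append(np.linalg.lstsq(X_only, y, rcond=None)[0][1])
        with_x2.append(np.linalg.lstsq(X_both, y, rcond=None)[0][1])

    print("Only x1 : mean = %.2f, variance = %.3f" % (np.mean(without_x2), np.var(without_x2)))
    print("Both Xs : mean = %.2f, variance = %.3f" % (np.mean(with_x2), np.var(with_x2)))
    # Both means are close to 2 (no bias); the variance is larger when the irrelevant variable
    # is included, and the gap widens as the correlation rises.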
Chapter 14 Exercises
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
+ βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βIIncPCt + βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βDemPartyDemocratt + et
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
15 Other Regression Statistics and Pitfalls
Chapter 15 Outline
15.3 Pitfalls
15.3.1 Explanatory Variable Has the Same Value for All Observations
15.3.2 One Explanatory Variable Is a Linear Combination of Other Explanatory Variables
15.3.3 Dependent Variable Is a Linear Combination of Explanatory Variables
15.3.4 Outlier Observations
15.3.5 Dummy Variable Trap
1. A friend believes that the Internet is displacing television as a source of news and entertainment. The friend theorizes that after accounting for other factors, television usage is falling by 1 percent annually:
−1.0 Percent growth rate theory: After accounting for all other factors, the annual growth rate
of television users is negative, −1.0 percent.
LogUsersTVt = β^TV_Const + β^TV_Year Yeart + β^TV_CapHum CapitalHumant + β^TV_CapPhy CapitalPhysicalt + β^TV_GDP GdpPCt + β^TV_Auth Autht + e^TV_t
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters.
R² = Explained squared deviations from the mean / Actual squared deviations from the mean = Σ(t=1 to T) (Estyt − ȳ)² / Σ(t=1 to T) (yt − ȳ)²
Calculate the R-squared for Professor Lord’s first quiz by filling in the following blanks:
Σ(t=1 to T) (yt − ȳ)² = ______    Σ(t=1 to T) yt = ____    Σ(t=1 to T) (Estyt − ȳ)² = ______
3. Students frequently experience difficulties when analyzing data. To illustrate some of these
pitfalls, we first review the goal of multiple regression analysis:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
Baseball data: Panel data of baseball statistics for the 588 American League games played
during the summer of 1996.
a. Explanatory variable has the same value for all observations. Run the following
regression:
i. What happens?
ii. What is the value of DHt for each of the observations?
iii. Why is it impossible to determine the effect of an explanatory variable if the explana-
tory variable has the same value for each observation? Explain.
b. One explanatory variable is a linear combination of other explanatory variables. Generate
a new variable, the ticket price in terms of cents:
i. What happens?
ii. Is it possible to sort out the effect of two explanatory variables when they contain
redundant information?
c. One explanatory variable is a linear combination of other explanatory variables—another
example. Generate a new variable, the total salaries of the two teams playing:
i. What happens?
ii. Is it possible to sort out the effect of explanatory variables when they are linear com-
binations of each other and therefore contain redundant information?
d. Dependent variable is a linear combination of explanatory variables. Run the following
regression:
What happens?
e. Outlier observations. First, run the following regression:
iii. Look at the first observation. What is the value of HomeSalary for the first observa-
tion? Is the value that was entered correctly?
Run the following regression:
Faculty salary data: Artificially constructed cross section salary data and characteristics for 200
faculty members.
As we did in chapter 13, generate the dummy variable SexF1, which equals 1 for a woman and
0 for a man. Run the following three regressions specifying Salary as the dependent variable:
To estimate the third model (part c) using EViews, you must “fool” EViews into running the
appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight SexF1,
For each regression, what is the equation that estimates the salary for
i. men?
ii. women?
Last, run one more regression specifying Salary as the dependent variable:
d. Explanatory variables: SexF1, SexM1, and Experience—but with a constant. What
happens?
5. Consider a system of 2 linear equations in 3 unknowns. Can you solve for all three unknowns?
15.1.1 Confidence Interval Approach: Which Theories Are Consistent with the Data?
Our approach thus far has been to present a theory first and then use data to assess the
theory:
• First, we presented a theory.
• Second, we analyzed the data to determine whether or not the data were consistent with the
theory.
In other words, we have started with a theory and then decided whether or not the data were
consistent with the theory.
The confidence interval approach reverses this process. Confidence intervals indicate the
range of theories that are consistent with the data.
In other words, the confidence interval approach starts with the data and then decides what theo-
ries are compatible.
Hypothesis testing plays a key role in both approaches. Consequently we must choose a significance level. A confidence interval's "size" and the significance level are intrinsically related:
Two-tailed confidence interval + Significance level = 100%
Since the traditional significance levels are 10, 5, and 1 percent, the three most commonly used
confidence intervals are 90, 95, and 99 percent:
• For a 90 percent confidence interval, the significance level is 10 percent.
• For a 95 percent confidence interval, the significance level is 5 percent.
• For a 99 percent confidence interval, the significance level is 1 percent.
A theory is consistent with the data if we cannot reject the null hypothesis at the confidence
interval’s significance level. No doubt this sounds confusing, so let us work through an example
using our international television data:
Project: Which growth theories are consistent with the international television data?
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
We begin by specifying the "size" of the confidence interval. Let us use a 95 percent confidence interval, which means that we are implicitly choosing a significance level of 5 percent.
The following two steps formalize the procedure to decide whether a theory lies within the two-
tailed 95 percent confidence interval:
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
Step 2: Consider a specific theory. Is the theory consistent with the data? Does the theory lie
within the confidence interval?
• Step 2a: Based on the theory, construct the null and alternative hypotheses. The null hypoth-
esis reflects the theory.
• Step 2b: Compute Prob[Results IF H0 true].
• Step 2c: Do we reject the null hypothesis?
Yes: Reject the theory. The data are not consistent with the theory. The theory does not lie
within the confidence interval.
No: The data are consistent with the theory. The theory does lie within the confidence
interval.
Recall that we decided to use a 95 percent confidence interval and consequently a 5 percent
significance level:
We will illustrate the steps by focusing on four growth rate theories postulating what the
growth rate of television use equals after accounting for other relevant factors:
• 0.0 percent growth rate theory
• −1.0 percent growth rate theory
• 4.0 percent growth rate theory
• 6.0 percent growth rate theory
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
We will apply the same model to explain television use that we used previously:
Model: LogUsersTVt = β^TV_Const + β^TV_Year Yeart + β^TV_CapHum CapitalHumant + β^TV_CapPhy CapitalPhysicalt + β^TV_GDP GdpPCt + β^TV_Auth Autht + e^TV_t
Step 2: 0.0 Percent growth rate theory. Focus on the effect of time. Is a 0.0 percent growth
theory consistent with the data? Does the theory lie within the confidence interval?
0.0 Percent growth rate theory: After accounting for all other explanatory variables, time has no effect on television use; that is, after accounting for all other explanatory variables, the annual growth rate of television use equals 0.0 percent. Accordingly the actual coefficient of Year, β^TV_Year, equals 0.000.
Table 15.1
Television regression results
• Step 2a: Based on the theory, construct the null and alternative hypotheses.
H0: β^TV_Year = 0.000
H1: β^TV_Year ≠ 0.000
• Step 2b: Compute Prob[Results IF H0 true].
Prob[Results IF H0 true] = Probability that the coefficient estimate would be at least 0.023 from 0.000, if H0 were true (if the actual coefficient, β^TV_Year, equals 0.000).
We can use the Econometrics Lab to calculate the probability of obtaining the results if the null
hypothesis is true. Remember that we are conducting a two-tailed test.
Question: What is the probability that the estimate lies at or above 0.023?
Answer: 0.0742 (figure 15.1)
Figure 15.1
Probability distribution of coefficient estimate—0.0 Percent growth rate theory (Student t-distribution: mean = 0.000, SE = 0.0159, DF = 736; the probability in each tail beyond 0.023 from the mean is 0.0742)
Question: What is the probability that the estimate lies at or below −0.023?
Answer: 0.0742
The Prob[Results IF H0 true] equals the sum of the right-tail and left-tail probabilities: 0.0742 + 0.0742 ≈ 0.148.
We will not provide justification for any of these theories. The confidence interval approach
does not worry about justifying the theory. The approach is pragmatic; the approach simply asks
whether or not the data support the theory.
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
Step 2: −1.0 Percent growth rate theory. Is the theory consistent with the data? Does the theory
lie within the confidence interval?
• Step 2a: Based on the theory, construct the null and alternative hypotheses.
H0: β^TV_Year = −0.010
H1: β^TV_Year ≠ −0.010
• Step 2b: Compute Prob[Results IF H0 true].
To compute Prob[Results IF H0 true], we first pose a question:
Question: How far is the coefficient estimate, 0.023, from the value of the coefficient specified
by the null hypothesis, −0.010?
Answer: 0.033
Accordingly,

Prob[Results IF H0 true] = Probability that the coefficient estimate would be at least 0.033 from −0.010, if H0 were true (if the actual coefficient, β^TV_Year, equals −0.010).
We can use the Econometrics Lab to calculate the probability of obtaining the results if the null
hypothesis is true. Once again, remember that we are conducting a two-tailed test:
Figure 15.2
Probability distribution of coefficient estimate—−1.0 Percent growth rate theory (Student t-distribution: mean = −0.010, SE = 0.0159, DF = 736; the probability in each tail beyond 0.033 from the mean is 0.0191)
Question: What is the probability that the estimate lies 0.033 or more above −0.010, at or above
0.023?
Answer: 0.0191
Question: What is the probability that the estimate lies 0.033 or more below −0.010, at or
below −0.043?
Answer: 0.0191
The Prob[Results IF H0 true] equals the sum of the two probabilities: 0.0191 + 0.0191 ≈ 0.038.
Yes, we do reject the null hypothesis at a 5 percent significance level; Prob[Results IF H0 true]
equals 0.038, which is less than 0.05. The theory is not consistent with the data; hence −0.010
does not lie within the 95 percent confidence interval.
We do not reject the null hypothesis at a 5 percent significance level. The theory is consistent
with the data; hence 0.040 does lie within the 95 percent confidence interval.
We do reject the null hypothesis at a 5 percent significance level. The theory is not consistent
with the data; hence 0.060 does not lie within the 95 percent confidence interval.
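Each of these probabilities comes from a Student t-distribution centered at the hypothesized value with SE = 0.0159 and 736 degrees of freedom. A few lines of Python (using scipy rather than the Econometrics Lab) reproduce the same figures:

    from scipy import stats

    estimate, se, df = 0.023, 0.0159, 736

    for hypothesized in [-0.010, 0.000, 0.040, 0.060]:
        t_stat = abs(estimate - hypothesized) / se
        prob = 2 * stats.t.sf(t_stat, df)     # right tail plus the (equal) left tail
        print(f"H0: beta_Year = {hypothesized:+.3f}   Prob[Results IF H0 true] = {prob:.3f}")
    # Prints approximately 0.038, 0.148, 0.285, and 0.020, matching table 15.2.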
We summarize the four theories in figure 15.3 and table 15.2.
Figure 15.3
Probability distribution of coefficient estimate—Comparison of growth rate theories. Each panel is a Student t-distribution with SE = 0.0159 and DF = 736 centered at the hypothesized value: −1.0 percent growth theory (H0: β^TV_Year = −0.010, tail probabilities of 0.0191 each, Prob[Results IF H0 true] ≈ 0.038); 0.0 percent growth theory (H0: β^TV_Year = 0.000, tails of 0.0742, Prob ≈ 0.148); 4.0 percent growth theory (H0: β^TV_Year = 0.040, tails of 0.1427, Prob ≈ 0.285); 6.0 percent growth theory (H0: β^TV_Year = 0.060, tails of 0.0101, Prob ≈ 0.020).
Table 15.2
Growth rate theories and the 95 percent confidence interval

Theory   Null hypothesis        Alternative hypothesis   Prob[Results IF H0 true]   Within 95 percent confidence interval?
−1%      β^TV_Year = −0.010     β^TV_Year ≠ −0.010       ≈ 0.038                    No
0%       β^TV_Year = 0.000      β^TV_Year ≠ 0.000        ≈ 0.148                    Yes
4%       β^TV_Year = 0.040      β^TV_Year ≠ 0.040        ≈ 0.285                    Yes
6%       β^TV_Year = 0.060      β^TV_Year ≠ 0.060        ≈ 0.020                    No
Figure 15.4
Lower and upper confidence interval bounds. Prob[Results IF H0 true] plotted against the growth rate theory: the −1.0 percent (≈ 0.038) and 6.0 percent (≈ 0.020) theories fall below the 0.050 threshold, so their null hypotheses are rejected; the 0.0 percent (≈ 0.148) and 4.0 percent (≈ 0.285) theories lie above 0.050 and fall within the 95 percent confidence interval, between β^LB_Year and β^UB_Year.
Question: What is the lowest growth rate theory that is consistent with the data; that is, what is the lower bound of the confidence interval, β^LB_Year?
• The 4.0 percent growth rate theory lies within the confidence interval, but the 6.0 percent
theory does not (figure 15.4).
Question: What is the highest growth rate theory that is consistent with the data; that is, what is the upper bound of the confidence interval, β^UB_Year?
Figure 15.5 answers these questions visually by illustrating the lower and upper bounds. The
Prob[Results IF H0 true] equals 0.05 for both lower and upper bound growth theories because
our calculations are based on a 95 percent confidence interval:
Figure 15.5
Probability distribution of coefficient estimate—Lower and upper confidence intervals. Lower bound growth theory (H0: β^TV_Year = β^LB_Year, H1: β^TV_Year ≠ β^LB_Year): the Student t-distribution is centered at β^LB_Year with SE = 0.0159 and DF = 736, and the coefficient estimate 0.023 lies at the right-tail 0.025 border, so Prob[Results IF H0 true] = 0.05. Upper bound growth theory (H0: β^TV_Year = β^UB_Year, H1: β^TV_Year ≠ β^UB_Year): the distribution is centered at β^UB_Year, and 0.023 lies at the left-tail 0.025 border, so Prob[Results IF H0 true] = 0.05.
• The lower bound growth theory postulates a growth rate that is less than that estimated. Hence
the coefficient estimate, 0.023, marks the right-tail border of the lower bound.
• The upper bound growth theory postulates a growth rate that is greater than that estimated.
Hence the coefficient estimate, 0.023, marks the left-tail border of the upper bound.
We can use the Econometrics Lab to calculate the lower and upper bounds:
• Calculating the lower bound, β^LB_Year: For the lower bound, the right-tail probability equals 0.025. Mean: −0.0082. Hence β^LB_Year = −0.0082.
• Calculating the upper bound, β^UB_Year: For the upper bound, the left-tail probability equals 0.025. Accordingly the right-tail probability will equal 0.975. Mean: 0.0542. Hence β^UB_Year = 0.0542.
In this case −0.0082 and 0.0542 mark the bounds of the two-tailed 95 percent confidence
interval:
• For any growth rate theory between −0.82 percent and 5.42 percent:
Prob[Results IF H0 true] > 0.05 → Do not reject H0 at the 5 percent significance level.
• For any growth rate theory below −0.82 percent or above 5.42 percent:
Prob[Results IF H0 true] < 0.05 → Reject H0 at the 5 percent significance level.
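The same bounds can be reproduced directly from the regression output. The following minimal sketch, which assumes the SciPy library is available, uses only the coefficient estimate (0.023), its standard error (0.0159), and the degrees of freedom (736) reported above.

    # A minimal sketch: the 95 percent confidence interval from the regression output.
    from scipy import stats

    estimate = 0.023      # coefficient estimate, bYear
    se = 0.0159           # standard error of the estimate
    df = 736              # degrees of freedom

    t_crit = stats.t.ppf(0.975, df)          # critical value leaving 0.025 in the right tail
    lower = estimate - t_crit * se
    upper = estimate + t_crit * se
    print(round(lower, 4), round(upper, 4))  # approximately -0.0082 and 0.0542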
Fortunately, statistical software provides us with an easy and convenient way to compute confidence intervals. The software does all the work for us.
Table 15.3
95 Percent confidence interval calculations
Table 15.3 reports that the lower and upper bounds for the 95 percent confidence interval are
−0.0082 and 0.0542. These are the same values that we calculated using the Econometrics Lab.
All statistical packages report the coefficient of determination, the R-squared, in their regression
printouts. The R-squared seeks to capture the “goodness of fit.” It equals the portion of the depen-
dent variable’s squared deviations from its mean that is explained by the parameter estimates:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}
\]
To explain how the coefficient of determination is calculated, we will revisit Professor Lord’s
first quiz (table 15.4). Recall the theory, the model, and our analysis:
Theory: An increase in the number of minutes studied results in an increased quiz score.
Model: yt = βConst + βxxt + et
Table 15.4
First quiz data

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90
where
yt = quiz score received by student t
xt = minutes studied by student t
Theory: βx > 0
We used the ordinary least squares (OLS) estimation procedure to estimate the model's parameters (table 15.5).
Table 15.5
First quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Next we formulated the null and alternative hypotheses to determine how much confidence we should have in the theory:
H0: βx = 0  Studying has no impact on quiz scores
H1: βx > 0  Additional studying increases quiz scores
Table 15.6
R-squared calculations for first quiz
Σ yt = 243        Σ(yt − ȳ)² = 342        Σ(Estyt − ȳ)² = 288
ȳ = 243/3 = 81    R-squared = 288/342 = 0.84
We then calculated Prob[Results IF H0 true], the probability of obtaining results like those we actually obtained (or even stronger) if studying in fact had no impact on quiz scores. The tails probability reported in the regression printout allows us to calculate this easily. Since a one-tailed test is appropriate, we divide the tails probability by 2:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.2601}{2} \approx 0.13
\]
We cannot reject the null hypothesis that studying has no impact even at the 10 percent signifi-
cance level.
The regression printout reports that the R-squared equals about .84; this means that 84 percent
of the dependent variable’s squared deviations from its mean are explained by the parameter
estimates. Table 15.6 shows the calculations required to compute the R-squared:
The R-squared equals Σ(Estyt − ȳ)² divided by Σ(yt − ȳ)²:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2} = \frac{288}{342} = 0.84
\]
Note that 84 percent of the y’s squared deviations are explained by the estimated constant and
coefficient. Our calculation of the R-squared agrees with the regression printout.
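As a quick check, the R-squared for the first quiz can be reproduced with a few lines of code. The following minimal sketch (assuming the NumPy library is available) applies the ordinary least squares (OLS) formulas and computes the ratio of explained to actual squared deviations.

    # A minimal sketch: OLS estimates and the R-squared for the first quiz data.
    import numpy as np

    x = np.array([5.0, 15.0, 25.0])    # minutes studied
    y = np.array([66.0, 87.0, 90.0])   # quiz scores

    b_x = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_const = y.mean() - b_x * x.mean()
    est_y = b_const + b_x * x

    explained = np.sum((est_y - y.mean()) ** 2)   # 288
    actual = np.sum((y - y.mean()) ** 2)          # 342
    r_squared = explained / actual
    print(round(b_x, 2), round(b_const, 2), round(r_squared, 2))   # 1.2  63.0  0.84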
While the R-squared is always calculated and reported by all statistical software, it is not
useful in assessing theories. We will justify this claim by considering a second quiz that Profes-
sor Lord administered. Each student studies the same number of minutes and earns the same
score in the second quiz as he/she did in the first quiz (table 15.7).
Before we run another regression that includes the data from both quizzes, let us apply our
intuition:
Table 15.7
Second quiz data

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90
Table 15.8
First and second quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
• Begin by focusing on only the first quiz. Taken in isolation, the first quiz suggests that studying improves quiz scores. We cannot be very confident of this, however, since we cannot reject the null hypothesis even at a 10 percent significance level.
• Next consider only the second quiz. Since the data from the second quiz is identical to the
data from the first quiz, the regression results would be identical. Hence, taken in isolation, the
second quiz suggests that studying improves quiz scores.
Each quiz in isolation suggests that studying improves quiz scores. Now consider both quizzes
together. The two quizzes taken together reinforce each other; this should make us more confi-
dent in concluding that studying improves quiz scores, should it not?
If our intuition is correct, how should the Prob[Results IF H0 true] be affected when we con-
sider both quizzes together? Since we are more confident in concluding that studying improves
quiz scores, the probability should be less. Let us run a regression using data from both the first
and second quizzes to determine whether or not this is true (table 15.8).
Table 15.9
R-squared calculations for first and second quizzes
Σ yt = 486        Σ(yt − ȳ)² = 684        Σ(Estyt − ȳ)² = 576
ȳ = 486/6 = 81    R-squared = 576/684 = 0.84
As a consequence of the second quiz, the probability has fallen from 0.13 to 0.005; clearly, our
confidence in the theory rises. We can now reject the null hypothesis that studying has no impact
at the traditional significance levels of 1, 5, and 10 percent. Our calculations confirm our
intuition.
Next consider the R-squared for the last regression that includes both quizzes. The regression
printout reports that the R-squared has not changed; the R-squared is still 0.84. Table 15.9
explains why:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2} = \frac{576}{684} = 0.84
\]
The R-squared still equals 0.84. Both the actual and explained squared deviations have doubled;
consequently their ratio, the R-squared, remains unchanged. Clearly, the R-squared does not help
us assess our theory. We are now more confident in the theory, but the value of the R-squared
has not changed. The bottom line is that if we are interested in assessing our theories we should
focus on hypothesis testing, not on the R-squared.
15.3 Pitfalls
Frequently econometrics students using statistical software encounter pitfalls that are frustrating.
We will now discuss several of these pitfalls and describe the warning signs that accompany
them. We begin by reviewing the “goal” of multiple regression analysis:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
We will illustrate the first four pitfalls by revisiting our baseball attendance data, which reports on every game played in the American League during the summer of the 1996 season.
We begin with a model that we have studied before in which attendance, Attendance, depends
on two explanatory variables, ticket price, PriceTicket, and home team salary, HomeSalary:
15.3.1 Explanatory Variable Has the Same Value for All Observations
One common pitfall is to include an explanatory variable in a regression that has the same value
for each observation. To illustrate this, consider the variable DH:
Table 15.10
Baseball attendance regression results
Our baseball data includes only American League games in 1996. Since interleague play did
not begin until 1997 and all American League games allowed designated hitters, the variable
DHt equals 1 for each observation. Let us try to use the ticket price, PriceTicket, home team
salary, HomeSalary, and the designated hitter dummy variable, DH, to explain attendance,
Attendance:
The statistical software issues a diagnostic. While the verbiage differs from software package
to software package, the message is the same: the software cannot perform the calculations that
we requested. That is, the statistical software is telling us that it is being asked to do the
impossible.
What is the intuition behind this? To determine how a dependent variable is affected by an
explanatory variable, we must observe how the dependent variable changes when the explanatory
variable changes. The intuition is straightforward:
• On the one hand, if the dependent variable tends to rise when the explanatory variable rises,
the explanatory variable affects the dependent variable positively suggesting a positive
coefficient.
• On the other hand, if the dependent variable tends to fall when the explanatory variable rises,
the explanatory variable affects the dependent variable negatively suggesting a negative
coefficient.
The evidence of how the dependent variable changes when the explanatory variable changes is essential. In the case of our baseball example, however, there is no variation in the designated hitter explanatory variable; DHt equals 1 for each observation. We have no way to assess the effect that the designated hitter has on attendance. We are asking our statistical software to do the impossible. While we have attendance information when the designated hitter was used, we have no attendance information when the designated hitter was not used. How then can we expect the software to assess the impact of the designated hitter on attendance?
We have already seen one example of this when we discussed multicollinearity in the previous chapter. We included both the ticket price in terms of dollars and the ticket price in terms of cents as explanatory variables. The ticket price in terms of cents is a linear combination of the ticket price in terms of dollars:
PriceCentst = 100 × PriceTickett
Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, and the ticket price in terms of cents, PriceCents, to explain attendance, Attendance:
When both measures of the price are included in the regression, our statistical software issues a diagnostic indicating that it is being asked to do the impossible. Statistical software cannot separate out the individual influence of the two explanatory variables, PriceTicket and PriceCents, because they contain precisely the same information; the two explanatory variables are redundant. We are asking the software to do the impossible.
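To see why the calculation is impossible, consider what happens when one explanatory variable is an exact linear function of another. The minimal sketch below uses made-up ticket prices (purely illustrative) and shows that the matrix the software must invert becomes singular.

    # A minimal sketch with hypothetical data: a redundant explanatory variable
    # makes the X'X matrix singular, so the OLS coefficients cannot be computed.
    import numpy as np

    price_ticket = np.array([10.0, 12.0, 15.0, 18.0, 20.0])   # dollars (made up)
    price_cents = 100.0 * price_ticket                         # exactly 100 times dollars

    # Design matrix: constant, PriceTicket, PriceCents
    X = np.column_stack([np.ones(5), price_ticket, price_cents])

    print(np.linalg.matrix_rank(X))          # 2, not 3: the columns are linearly dependent
    print(np.linalg.det(X.T @ X))            # 0 (up to rounding): X'X cannot be inverted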
In fact any linear combination of explanatory variables produces this problem. To illustrate
this, we consider two regressions. The first specifies three explanatory variables: ticket price,
home team salary, and visiting team salary (table 15.11).
Table 15.11
Baseball attendance
TotalSalary is a linear combination of HomeSalary and VisitSalary: TotalSalaryt = HomeSalaryt + VisitSalaryt. Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, visiting team salary, VisitSalary, and total salary, TotalSalary, to explain attendance, Attendance:
Our statistical software will issue a diagnostic indicating that it is being asked to do the
impossible.
The information contained in TotalSalary is already included in HomeSalary and VisitSalary.
Statistical software cannot separate out the individual influence of the three explanatory variables
because they contain redundant information. We are asking the software to do the impossible.
Suppose that the dependent variable is a linear combination of the explanatory variables. The
following regression illustrates this scenario. TotalSalary is by definition the sum of HomeSalary
and VisitSalary. Total salary, TotalSalary, is the dependent variable; home team salary, HomeSal-
ary, and visiting team salary, VisitSalary, are the explanatory variables (table 15.12).
The estimates of the constant and coefficients reveal the definition of TotalSalary:
Table 15.12
Total salary
Furthermore the standard errors are very small, approximately 0. In fact they should be precisely 0; they are not reported as 0's only as a consequence of how digital computers round numbers. We can think of these very small standard errors as telling us that we are dealing with an "identity" here, something that is true by definition.
We should be aware of the possibility of "outliers" because the ordinary least squares (OLS) estimation procedure is very sensitive to them. An outlier can occur for many reasons: one observation could have a unique characteristic, or it could simply contain a mundane typo.
To illustrate the effect that an outlier may have, once again consider the games played in the
summer of the 1996 American League season.
The first observation reports the game played in Milwaukee on June 1, 1996: the Cleveland
Indians visited the Milwaukee Brewers. The salary for the home team, the Brewers, totaled
20.232 million dollars in 1996:
Observation    Month    Day    Home team    Visiting team    Home team salary
1              6        1      Milwaukee    Cleveland        20.23200
2              6        1      Oakland      New York         19.40450
3              6        1      Seattle      Boston           38.35453
4              6        1      Toronto      Kansas City      28.48671
5              6        1      Texas        Minnesota        35.86999
Table 15.13
Baseball attendance regression with correct data
Table 15.14
Baseball attendance regression with an outlier
Suppose that when the data were entered, a typo was made: the home team salary for the first observation was entered incorrectly rather than as 20.23200. All the other values were entered correctly. You can access the data, including this "outlier," in table 15.14.
Even though only a single value has been altered, the estimates of both coefficients change dramatically. The estimate of the ticket price coefficient changes from about −591 to 1,896 and
the estimate of the home salary coefficient changes from 783.0 to −0.088. This illustrates how
sensitive the ordinary least squares (OLS) estimation procedure can be to an outlier. Conse-
quently we must take care to enter data properly and to check to be certain that we have generated
any new variables correctly.
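The sensitivity of OLS to a single bad value is easy to reproduce. The sketch below uses small made-up numbers (not the baseball data) and shows how corrupting one observation moves both estimates; the values are purely illustrative.

    # A minimal sketch with hypothetical data: one mis-entered value can move
    # the OLS estimates dramatically.
    import numpy as np

    def ols(x, y):
        """Return (intercept, slope) from the OLS formulas."""
        slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        return y.mean() - slope * x.mean(), slope

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # roughly y = 2x

    print(ols(x, y))                               # close to (0, 2)

    y_typo = y.copy()
    y_typo[0] = 210.0                              # a "mundane typo": 2.1 entered as 210
    print(ols(x, y_typo))                          # both estimates change dramatically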
To illustrate the dummy variable trap, we will revisit our faculty salary data:
Project: Assess the possibility of discrimination in academe.
Faculty salary data: Artificially constructed cross-sectional salary data and characteristics for
200 faculty members.
We will investigate models that include only dummy variables and years of teaching experience.
More specifically, we will consider four cases:
SexF1 = 1 − SexM1
Now we will estimate the parameters of the four models (table 15.15). First, model 1.
Model 1: Salaryt = βConst + βSexF1SexF1t + βEExperiencet + et
Table 15.15
Faculty salary regression
We calculate the estimated salary equation for men and women. For men, SexF1 = 0:
EstSalary = 42,238 + 2,447 Experience
For women, SexF1 = 1; the intercept for women equals $39,998 and the slope equals 2,447:
EstSalary = 39,998 + 2,447 Experience
It is easy to plot the estimated salary equations for men and women (figure 15.6). Both plotted lines have the same slope, 2,447. The intercepts differ, however. The intercept for men is 42,238 while the intercept for women is 39,998:
[Figure 15.6 plots the two estimated salary equations against Experience: a line for men with intercept 42,238 and a line for women with intercept 39,998, both with slope 2,447; the vertical gap between the two lines is 2,240.]
Figure 15.6
Estimated salary equations for men and women
Model 2: Salaryt = βConst + βSexM1SexM1t + βEExperiencet + et
Let us attempt to calculate the second model's estimated constant and the estimated male sex dummy coefficient, bConst and bSexM1, using the intercepts from model 1. In model 2 the intercept for men equals bConst + bSexM1 and the intercept for women equals bConst:
42,238 = bConst + bSexM1
39,998 = bConst
We have two equations and two unknowns, bConst and bSexM1. It is easy to solve for the unknowns. The second equation tells us that bConst equals 39,998:
bConst = 39,998
Substituting into the first equation, bSexM1 = 42,238 − 39,998 = 2,240. Using the estimates from model 1, we compute that model 2's estimate of the constant should be 39,998 and its estimate of the male sex dummy coefficient should be 2,240.
Let us now run the regression. The regression confirms our calculations (table 15.16).
Model 3:
Salaryt = βSexF1SexF1t + βSexM1SexM1t + βEExperiencet + et
Again, let us attempt to calculate the third model’s estimated female sex dummy coefficient and
its male sex dummy coefficient, bSexF1 and bSexM1, using the intercepts from model 1.
Table 15.16
Faculty salary regression
In model 3 the intercept for men equals bSexM1 and the intercept for women equals bSexF1:
42,238 = bSexM1
39,998 = bSexF1
We have two equations and two unknowns, bSexF1 and bSexM1; the solution is immediate. Using the estimates from model 1, we compute that model 3's estimate of the male sex dummy coefficient should be 42,238 and its estimate of the female sex dummy coefficient should be 39,998.
Let us now run the regression:
To estimate the third model (part c) using EViews, you must “fool” EViews into running the
appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl>, highlight SexF1, SexM1, and Experience.
• In the Workfile window: double click on a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete c so that the window looks like this:
salary sexf1 sexm1 experience.
• Click OK.
Table 15.17
Faculty salary regression
Model 4: Salaryt = βConst + βSexF1SexF1t + βSexM1SexM1t + βEExperiencet + et
Question: Can we calculate the fourth model's bConst, bSexF1, and bSexM1 using model 1's intercepts? In model 4 the intercept for men equals bConst + bSexM1 and the intercept for women equals bConst + bSexF1:
42,238 = bConst + bSexM1
39,998 = bConst + bSexF1
We have two equations and three unknowns, bConst, bSexF1, and bSexM1. We have more unknowns than equations. We cannot solve for the three unknowns. It is impossible. This is called the dummy variable trap:
Dummy variable trap: A model in which there are more parameters representing the intercepts than there are intercepts.
There are three parameters, bConst, bSexF1, and bSexM1, estimating the two intercepts.
Let us try to run the regression:
Our statistical software will issue a diagnostic telling us that it is being asked to do the impossible.
In some sense, the software is being asked to solve for three unknowns with only two equations.
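The trap is just another case of redundant explanatory variables: the constant equals SexF1 plus SexM1 for every observation. The minimal sketch below, using a few made-up observations, shows that the design matrix containing the constant and both dummies is rank deficient.

    # A minimal sketch with hypothetical data: the dummy variable trap.
    # Const = SexF1 + SexM1 for every observation, so the columns are redundant.
    import numpy as np

    sex_m1 = np.array([1, 0, 1, 0, 1])               # 1 if male (made-up observations)
    sex_f1 = 1 - sex_m1                               # 1 if female
    experience = np.array([3.0, 5.0, 10.0, 2.0, 7.0])

    X = np.column_stack([np.ones(5), sex_f1, sex_m1, experience])

    print(np.linalg.matrix_rank(X))                   # 3, not 4: one column is redundant
    print(np.linalg.det(X.T @ X))                     # 0 (up to rounding): X'X is singular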
Chapter 15 Review Questions
1. Explain in words how the confidence interval approach differs from the approach we have taken thus far.
2. If you wish to assess a theory should you be concerned with the coefficient of determination,
the R-squared?
3. What is the goal of multiple regression analysis?
4. In each of the following cases, what issue arises for multiple regression analysis? Explain in words why it arises:
a. Explanatory variable has the same value for all observations.
b. One explanatory variable is a linear combination of other explanatory variables.
c. Dependent variable is a linear combination of explanatory variables.
d. Outlier observations.
e. Dummy variable trap.
Chapter 15 Exercises
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
LogUsersInternett = βConst^Int + βYear^Int Yeart + βCapHum^Int CapitalHumant + βCapPhy^Int CapitalPhysicalt + βGDP^Int GdpPCt + βAuth^Int Autht + et^Int
a. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters.
b. Compute the two-tailed 95 percent confidence interval for the coefficient estimate of Year.
Petroleum consumption data for Massachusetts and Nebraska: Panel data of petroleum consumption and prices for two states, Massachusetts and Nebraska, from 1990 to 1999.
3. Estimate this model for Massachusetts by restricting your sample to Massachusetts observa-
tions only. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the model.
a. What equation describes estimated per capita petroleum consumption in Massachusetts,
EstPetroConsPCMass?
• To include only the Massachusetts data, enter Mass1 = 1 in the If condition window.
• Click OK.
NB: Do not forget that the Sample option behaves like a toggle switch. It remains on until
you turn it off.
Therefore, before proceeding, in the Workfile window:
• Click Sample.
4. Estimate the model for Nebraska by restricting your sample to Nebraska observations only.
Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the
model.
a. What equation describes estimated per capita petroleum consumption in Nebraska,
EstPetroConsPCNeb?
• To include only the Nebraska data, enter Mass1 = 0 in the If condition window.
• Click OK.
NB: Do not forget that the Sample option behaves like a toggle switch. It remains on until
you turn it off.
Therefore, before proceeding, in the Workfile window:
• Click Sample.
5.
a. Consider the following new model:
PetroConsPCt = βMassMass1t + βNebNeb1t + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
where
Neb1t = 1 − Mass1t
PriceReal_Masst = PriceRealt × Mass1t
PriceReal_Nebt = PriceRealt × Neb1t
Let bMass, bNeb, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bMassMass1 + bNebNeb1 + bPMassPriceReal_Mass + bPNebPriceReal_Neb
b. Consider the following model:
PetroConsPCt = βConst + βMassMass1t + βNebNeb1t + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
Let bConst, bMass, bNeb, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bConst + bMassMass1 + bNebNeb1 + bPMassPriceReal_Mass + bPNebPriceReal_Neb
c. Consider the following model:
PetroConsPCt = βMassMass1t + βNebNeb1t + βPPriceRealt + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
Let bMass, bNeb, bP, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bMassMass1 + bNebNeb1 + bPPriceReal + bPMassPriceReal_Mass + bPNebPriceReal_Neb
Heteroskedasticity
16
Chapter 16 Outline
16.1 Review
16.1.1 Regression Model
16.1.2 Standard Ordinary Least Squares (OLS) Premises
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
16.3.1 The Mathematics
16.3.2 Our Suspicions
16.3.3 Confirming Our Suspicions
Chapter 16 Prep Questions
2. In chapter 6 we showed that the ordinary least squares (OLS) estimation procedure for the
coefficient value was unbiased; that is, we showed that
Mean[bx] = βx
Review the algebra. What role, if any, did the first standard ordinary least squares (OLS) premise,
the error term equal variance premise, play?
3. In chapter 6 we showed that the variance of the coefficient estimate’s probability distribution
equals the variance of the error term’s probability distribution divided by the sum of the squared
x deviations; that is, we showed that
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
Review the algebra. What role, if any, did the first standard ordinary least squares (OLS) premise,
the error term equal variance premise, play?
1992 Internet data: Cross-sectional data of Internet use and gross domestic product for 29
countries in 1992.
a. What is your theory concerning how per capita GDP should affect Internet use? What does
your theory suggest about the sign of the GdpPC coefficient, βGDP?
b. Run the appropriate regression. Do the data support your theory?
e. Based on the scatter diagram, what do you conclude about the variance of the residuals
as per capita GDP increases?
5. Again, consider the following model:
Assume that the variance of the error term’s probability distribution is proportional to each
nation’s per capita GDP:
Var[et] = V × GdpPCt
where V is a constant.
Now divide both sides of the equation that specifies the model by the square root of per capita GDP, √GdpPCt. Let
\[
\varepsilon_t = \frac{e_t}{\sqrt{GdpPC_t}}
\]
16.1 Review
yt = βConst + βxxt + et
where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
The error term is a random variable that represents random influences: Mean[et] = 0
We will now focus our attention on the standard ordinary least squares (OLS) regression
premises:
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation
Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:
\[
b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
In this chapter we will focus on the first standard ordinary least squares (OLS) premise, the
error term equal variance premise. We begin by examining precisely what the premise means.
Subsequently we investigate what problems do and do not emerge when the premise is violated
and finally what can be done to address the problems that do arise.
16.2 What Is Heteroskedasticity?
Heteroskedasticity refers to the variances of the error terms' probability distributions. The prefix "hetero" means different; "skedasticity" refers to the spread of the distribution.
Heteroskedasticity means that the spread of the error term’s probability distribution differs from
observation to observation. Recall the error term equal variance premise:
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
The presence of heteroskedasticity violates the error term equal variance premise.
We begin by illustrating the effect of heteroskedasticity on the error terms. Consider the three
students in Professor Lord’s class who must take a quiz every Tuesday morning:
[Figure 16.1 shows the simulation window, with a list for the error term variance (50, 200, 350, 500), a list for the heteroskedasticity factor Heter (−2.0, −1.0, 0.0, 1.0), and the Repetition, Pause, and Start controls.]
Figure 16.1
Heteroskedasticity simulation
The error terms, the et’s, represent random influences; that is, the error terms have no systematic
effect on the dependent variable yt. Consequently the mean of each error term's probability distribu-
tion, Mean[et], equals 0. In other words, if the experiment were repeated many, many times, the error
term would average out to be 0. When the distribution is symmetric, half the time the error term
would be positive leading to a higher than normal value of yt and half the time it would be negative
leading to a lower than normal value of yt. We will use a simulation to illustrate this (figure 16.1).
The list labeled Heter is the “heteroskedasticity factor.” Initially, Heter is specified as 0, meaning
that no heteroskedasticity is present. Click Start and then Continue a few times to note that the
distribution of each student’s error terms is illustrated in the three histograms at the top of the
window. Also the mean and variance of each student’s error terms are computed. Next uncheck
the Pause checkbox and click Continue; after many, many repetitions of the experiment click
Stop. The mean of each student’s error term is approximately 0, indicating that the error terms
truly represent random influences; the error terms have no systematic effect on a student's quiz
score. Furthermore the spreads of each student’s error term distribution appear to be nearly
identical; the variance of each student’s error term is approximately the same. Hence the error
term equal variance premise is satisfied (figure 16.2).
Next change the value of the Heter. When a positive value is specified, the distribution spread
increases as we move from student 1 to student 2 to student 3; when a negative value is speci-
fied, the spread decreases. Specify 1 instead of 0. Note that when you do this, the title of the
variance list changes to Mid Err Var. This occurs because heteroskedasticity is now present and
the variances differ from student to student. The list specifies the variance of the middle student’s,
student 2’s, error term probability distribution. By default student 2’s variance is 500. Now, click
Start and then after many, many repetitions of the experiment click Stop. The distribution
spreads of each student’s error terms are not identical (figure 16.3).
Figure 16.2
Error term probability distributions—Error term equal variance premise satisfied
Figure 16.3
Error term probability distributions—Error term equal variance premise violated
The error term equal variance premise is now violated. What might cause this discrepancy?
Suppose that student 1 tries to get a broad understanding of the material and hence reads all of the assignment, albeit quickly. However, student 3 guesses what material will be covered on the
quiz and spends his/her time thoroughly studying only that material. When student 3 guesses
right, he/she will do very well on the quiz, but when he/she guesses wrong, he/she will do very
poorly. Hence we would expect student 3’s quiz scores to be more volatile than student 1’s. This
volatility is reflected by the variance of the error terms. The variance of student 3’s error term
distribution would be greater than student 1’s.
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
Now let us explore the consequences of heteroskedasticity. We will focus on two of the three estimation procedures embedded within the ordinary least squares (OLS) estimation procedure:
• the estimation procedure for the coefficient value;
• the estimation procedure for the variance of the coefficient estimate's probability distribution.
Question: Are these estimation procedures still unbiased when heteroskedasticity is present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously we showed that the estimation procedure
for the coefficient value was unbiased by
• applying the arithmetic of means;
and
• recognizing that the means of the error terms’ probability distributions equal 0 (since the error
terms represent random influences).
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1 + (x_2-\bar{x})e_2 + (x_3-\bar{x})e_3}{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2}
\]
Taking means of both sides:
\[
\text{Mean}[b_x] = \beta_x + \text{Mean}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]
1. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so,
we can apply the arithmetic of means easily. Our results are unaffected by this assumption.
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\,\text{Mean}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[\text{Mean}[(x_1-\bar{x})e_1]+\text{Mean}[(x_2-\bar{x})e_2]+\text{Mean}[(x_3-\bar{x})e_3]\bigr]
\]
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[(x_1-\bar{x})\text{Mean}[e_1]+(x_2-\bar{x})\text{Mean}[e_2]+(x_3-\bar{x})\text{Mean}[e_3]\bigr]
\]
Since Mean[e1] = Mean[e2] = Mean[e3] = 0, it follows that Mean[bx] = βx.
What is the critical point here? We have not relied on the error term equal variance premise
to show that the estimation procedure for the coefficient value is unbiased. Consequently we
suspect that the estimation procedure for the coefficient value should still be unbiased in the
presence of heteroskedasticity.
Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient
Estimate’s Probability Distribution
Next consider the estimation procedure for the variance of the coefficient estimate’s probability
distribution used by the ordinary least squares (OLS) estimation procedure. The strategy involves
two steps:
• First, we used the adjusted variance to estimate the variance of the error term’s probability
distribution: EstVar[e] = SSR/Degrees of freedom.
• Second, we applied the equation relating the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution:
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
Step 1: Estimate the variance of the error term's probability distribution from the available information—data from the first quiz:
\[
\text{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]
Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \quad\Rightarrow\quad \text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
This strategy is grounded on the premise that the variance of each error term's probability distribution is the same, the error term equal variance premise:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
Unfortunately, when heteroskedasticity is present, the error term equal variance premise is vio-
lated because there is not a single Var[e]. The variance differs from observation to observation.
When heteroskedasticity is present, the strategy used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution is based on a faulty premise. The ordinary least squares (OLS) estimation procedure is trying to estimate something that does not exist, a single Var[e]. Consequently we should be suspicious of the procedure.
So, where do we stand? We suspect that when heteroskedasticity is present, the ordinary least
squares (OLS) estimation procedure for the
• coefficient value will still be unbiased;
• variance of the coefficient estimate’s probability distribution may be biased.
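Before turning to the Econometrics Lab, we can check these suspicions with a small Monte Carlo exercise of our own. The sketch below (assuming NumPy; all data-generating values are made up) draws heteroskedastic error terms many times, estimates the coefficient each time, and compares the OLS variance estimate with the variance actually observed across repetitions.

    # A minimal Monte Carlo sketch with made-up values: heteroskedastic errors,
    # the OLS coefficient estimates, and the OLS estimate of the coefficient's variance.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([5.0, 15.0, 25.0])          # explanatory variable (constants)
    beta_const, beta_x = 50.0, 2.0           # actual parameter values (made up)
    sd = np.array([5.0, 10.0, 20.0])         # error spread rises with x: heteroskedasticity

    b_estimates, var_estimates = [], []
    for _ in range(10000):
        e = rng.normal(0.0, sd)                                    # different variances
        y = beta_const + beta_x * x + e
        b_x = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b_const = y.mean() - b_x * x.mean()
        res = y - (b_const + b_x * x)
        est_var_e = np.sum(res ** 2) / 1.0                         # SSR / degrees of freedom (3 - 2)
        b_estimates.append(b_x)
        var_estimates.append(est_var_e / np.sum((x - x.mean()) ** 2))

    print(np.mean(b_estimates))    # close to 2.0: the coefficient estimates remain unbiased
    print(np.var(b_estimates))     # actual variance of the estimates (about 1.06 for this setup)
    print(np.mean(var_estimates))  # mean of the OLS variance estimates (about 0.69): biased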
Econometrics Lab 16.2: Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation
Procedure
The simulation (figure 16.4) allows us to address the two critical questions:
• Question 1: Is the estimation procedure for the coefficient’s value unbiased; that is, does the
mean of the coefficient estimate’s probability distribution equal the actual coefficient value? The
relative frequency interpretation of probability allows us to address this question by using the
simulation. After many, many repetitions the distribution of the estimated values mirrors the
probability distribution. Therefore we need only compare the mean of the estimated coefficient
values with the actual coefficient values. If the two are equal after many, many repetitions, the
estimation procedure is unbiased.
[Figure 16.4 annotates the simulation window: EstVar[e] = SSR/(Degrees of freedom) and EstVar[bx] = EstVar[e]/Σ(xt − x̄)² are computed each repetition; the Coef Var Est box reports the estimate of the variance of the coefficient estimate's probability distribution calculated from the current repetition, and the Mean box reports the mean (average) of the variance estimates from all repetitions, addressing whether the estimation procedure for the variance is unbiased.]
Figure 16.4
Heteroskedasticity simulation
[Diagram: By the relative frequency interpretation of probability, after many, many repetitions the mean (average) of the estimated coefficient values reflects the mean of the coefficient estimate's probability distribution; comparing it with the actual coefficient value (= or ≠) tells us whether the estimation procedure is unbiased or biased.]
• Question 2: Is the estimation procedure for the variance of the coefficient estimate’s prob-
ability distribution unbiased? Again, the relative frequency interpretation of probability allows
us to address this question by using the simulation. We need only compare the variance of the
estimated coefficient values and estimates for the variance after many, many repetitions. If the
two are equal, the estimation procedure is unbiased.
[Diagram: By the relative frequency interpretation of probability, after many, many repetitions the variance of the estimated coefficient values reflects the variance of the coefficient estimate's probability distribution; comparing it with the mean (average) of the variance estimates (= or ≠) tells us whether the estimation procedure for the variance is unbiased or biased.]
Note that the “Heter” list now appears in this simulation. This list allows us to investigate the
effect of heteroskedasticity (figure 16.5). Initially, 0 is specified as a benchmark, meaning that
no heteroskedasticity is present. Click Start and then after many, many repetitions click Stop.
The simulation results appear in table 16.1. In the absence of heteroskedasticity both estimation
procedures are unbiased:
• The estimation procedure for the coefficient value is unbiased. The mean (average) of the
coefficient estimates equals the actual coefficient value; both equal 2.
[The Heter list offers the values −2.0, −1.0, 0.0, 1.0, and 2.0.]
Figure 16.5
Heteroskedasticity factor list
Table 16.1
Heteroskedasticity simulation results
• The estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased. The mean (average) of the variance estimates equals the actual variance of the coefficient estimates; both equal 2.5.
When the standard ordinary least squares (OLS) premises are satisfied, both estimation proce-
dures are unbiased.
Next we will investigate the effect of heteroskedasticity by selecting 1.0 from the “Heter” list.
Heteroskedasticity is now present. Click Start and then after many, many repetitions click Stop.
• The estimation procedure for the coefficient value is unbiased. The mean (average) of the
coefficient estimates equals the actual coefficient value; both equal 2.
• The estimation procedure for the variance of the coefficient estimate's probability distribution is biased. The mean (average) of the estimated variances equals 2.9, while the actual variance equals 3.6.
The simulation results confirm our suspicions. When heteroskedasticity is present there is
some good news, but also some bad news:
• Good news: The ordinary least squares (OLS) estimation procedure for the coefficient value
is still unbiased.
• Bad news: The ordinary least squares (OLS) estimation procedure for the variance of the
coefficient estimate’s probability distribution is biased.
When the estimation procedure for the variance of the coefficient estimate’s probability dis-
tribution is biased, all calculations based on the estimate of the variance will be flawed also;
that is, the standard errors, t-statistics, and tail probabilities appearing on the ordinary least
squares (OLS) regression printout are unreliable. Consequently we will use an example to
explore how we can account for the presence of heteroskedasticity.
We will illustrate this approach by considering the effect of per capita GDP on Internet use.
To assess the theory we construct a simplified model with a single explanatory variable, per
capita GDP. Previously we showed that several other factors proved important in explaining
Internet use. We include only per capita GDP here for pedagogical reasons: to provide a simple
illustration of how we can account for the presence of heteroskedasticity.
LogUsersInternett = βConst + βGDPGdpPCt + et
where
LogUsersInternett = log of Internet users per 1,000 persons in nation t
GdpPCt = per capita GDP of nation t
The theory suggests that the model's coefficient, βGDP, is positive. To keep the exposition clear,
we will use data from a single year, 1992, to test this theory:
Table 16.2
Internet regression results
1992 Internet data: Cross-sectional data of Internet use and gross domestic product for 29
countries in 1992.
Using statistical software, we run a regression with the log of Internet use as the dependent
variable and the per capita GDP as the explanatory variable (table 16.2).
Since the evidence appears to support the theory, we construct the null and alternative
hypotheses:
H0: βGDP = 0 Per capita GDP does not affect Internet use
H1: βGDP > 0 Higher per capita GDP increases Internet use
As always, the null hypothesis challenges the evidence; the alternative hypothesis is consistent
with the evidence. Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the GdpPC estimate from one repetition
of the experiment will be 0.101 or more, if H0 were true (i.e., if the per capita GDP has no effect
on the Internet use, if βGDP actually equals 0)?
To emphasize that the Prob[Results IF H0 true] depends on the standard error we will use the
Econometrics Lab to calculate the probability. The following information has been entered in
the lab:
Click Calculate.
We use the standard error provided by the ordinary least squares (OLS) regression results to
compute the Prob[Results IF H0 true].
We can also calculate Prob[Results IF H0 true] using the tails probability reported in the
regression printout. Since this is a one-tailed test, we divide the tails probability by 2:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.0046}{2} \approx 0.0023
\]
Based on the 1 percent significance level, we would reject the null hypothesis. We would reject the hypothesis that per capita GDP has no effect on Internet use.
There may be a problem with this, however. The equation used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution assumes that the error term equal variance premise is satisfied. Our simulation revealed that when heteroskedasticity is present and the error term equal variance premise is violated, the ordinary least squares (OLS) estimation procedure for the variance of the coefficient estimate's probability distribution is flawed. Recall that the standard error equals the square root of the estimated variance. Consequently, if heteroskedasticity is present, we may have entered the wrong value for the standard error into the Econometrics Lab when we calculated Prob[Results IF H0 true]. When heteroskedasticity is present the ordinary least squares (OLS) estimation procedure bases its computations on a faulty premise, resulting in flawed standard errors, t-Statistics, and tails probabilities. Consequently we should move on to the next step.
Question: Could the variance of the error term's probability distribution differ from nation to nation, increasing with per capita GDP? Intuition leads us to suspect that the answer is yes. When the per capita GDP is low, individuals
have little to spend on any goods other than the basic necessities. In particular, individuals have
little to spend on Internet use and consequently Internet use will be low. This will be true for
all countries in which per capita GDP is low. In contrast, when per capita GDP is high,
individuals can afford to purchase more goods. Naturally, consumer tastes vary from nation to
nation. In some high per capita GDP nations, individuals will opt to spend much on Internet use
while in other nations individuals will spend little. A scatter diagram of per capita GDP and
Internet use appears to confirm our intuition (figure 16.6).
[Figure 16.6 plots LogUsersInternet (vertical axis, 0 to 3) against GdpPC (horizontal axis, 0 to 35).]
Figure 16.6
Scatter diagram: GdpPC versus LogUsersInternet
As per capita GDP rises we observe a greater variance for the log of Internet use per 1,000
persons. In nations with low levels of per capita GDP (less than $15,000), the log varies between about 0 and 1.6, whereas in nations with high levels of per capita GDP (more than $15,000), the log varies between about 0 and 3.20. What does this suggest about the error term in our model:
LogUsersInternett = βConst + βGDPGdpPCt + et
Two nations with virtually the same level of per capita GDP have quite different rates of Internet
use. The error term in the model would capture these differences. Consequently, as per capita
GDP increases, we would expect the variance of the error term’s probability distribution to
increase.
Of course, we can never observe the error terms themselves. We can, however, think of the
residuals as the estimated error terms:
Since the residuals are observable we can plot a scatter diagram of the residuals, the estimated
errors, and per capita GDP to illustrate how they are related (figure 16.7).
[Figure 16.7 plots the residuals (vertical axis, −0.4 to 0.4) against GdpPC (horizontal axis, 0 to 35).]
Figure 16.7
Scatter diagram: GdpPC versus Residuals
Our suspicions appear to be borne out. The residuals in nations with high per capita GDP are
more spread out than in nations with low per capita GDP. It appears that heteroskedasticity could
be present. The error term equal variance premise may be violated. Consequently we must be
suspicious of the standard errors and probabilities appearing in the regression printout; the
ordinary least squares (OLS) estimation procedure is calculating these values based on what
could be an invalid premise, the error term equal variance premise.
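A residual plot like figure 16.7 is easy to produce yourself. The sketch below (assuming NumPy and Matplotlib; the arrays are hypothetical stand-ins for the 1992 data) fits the simple regression and plots the residuals against per capita GDP.

    # A minimal sketch with hypothetical data: plot the residuals against GdpPC.
    import numpy as np
    import matplotlib.pyplot as plt

    gdp_pc = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])     # made up
    log_users = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])       # made up

    dev = gdp_pc - gdp_pc.mean()
    b_gdp = np.sum(dev * (log_users - log_users.mean())) / np.sum(dev ** 2)
    b_const = log_users.mean() - b_gdp * gdp_pc.mean()
    residuals = log_users - (b_const + b_gdp * gdp_pc)

    plt.scatter(gdp_pc, residuals)    # does the spread widen as GdpPC rises?
    plt.axhline(0.0)
    plt.xlabel("GdpPC")
    plt.ylabel("Residuals")
    plt.show()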
Since the scatter diagram suggests that our fears may be warranted, we now test for heteroskedasticity more formally. While there are several different approaches, we will focus on the
Breusch–Pagan–Godfrey test, which utilizes an artificial regression based on the following
model:
Heteroskedasticity model:
(et − Mean[et])² = αConst + αGDPGdpPCt + vt
The model suggests that as GdpPCt increases, the squared deviation of the error term from its mean increases. Based on the scatter diagram appearing in figure 16.7, we suspect that αGDP is positive:
Theory: αGDP > 0
We can simplify this model by recognizing that the error term represents random influences; hence the mean of its probability distribution equals 0. Therefore,
(et − Mean[et])² = et²
and the model becomes
et² = αConst + αGDPGdpPCt + vt
Table 16.3
Breusch–Pagan–Godfrey results
Of course, we can never observe the error terms themselves. We can, however, think of the
residuals as the estimates of the error terms. We substitute the residual squared for the error term
squared:
Heteroskedasticity model:
Rest² = αConst + αGDPGdpPCt + vt
Next formulate the null and alternative hypotheses for the artificial regression model:
H0: αGDP = 0 Per capita GDP does not affect the squared deviation of the residual
H1: αGDP > 0 Higher per capita GDP increases the squared deviation of the residual
and compute Prob[Results IF H0 true] from the tails probability reported in the regression
printout:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.0118}{2} \approx 0.0059
\]
We reject the null hypothesis at the traditional significance levels of 1, 5, and 10 percent. Our
formal test reinforces our suspicion that heteroskedasticity is present. Furthermore note that the
estimate of the constant is not statistically significantly different from 0 even at the 10 percent
significance level. We will exploit this to simplify the mathematics that follow. We assume that
the variance of the error term’s probability distribution is directly proportional to per capita GDP:
Var[et] = V × GdpPCt
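The artificial regression at the heart of the Breusch–Pagan–Godfrey approach is easy to carry out by hand. The sketch below (assuming NumPy and SciPy) simulates made-up data in which the error variance is proportional to GdpPC, runs the original regression, and then regresses the squared residuals on GdpPC; the helper function and all numbers are hypothetical, and the printed values vary with the random draw.

    # A minimal sketch of the Breusch-Pagan-Godfrey artificial regression
    # with simulated (hypothetical) data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    gdp_pc = rng.uniform(1.0, 35.0, 29)                    # 29 made-up per capita GDPs
    e = rng.normal(0.0, np.sqrt(0.005 * gdp_pc))           # variance proportional to GdpPC
    log_users = 0.1 + 0.1 * gdp_pc + e                     # made-up "original" model

    def ols(x, y):
        """Return the slope, its standard error, and the intercept."""
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        res = y - (a + b * x)
        est_var_e = np.sum(res ** 2) / (len(y) - 2)
        se_b = np.sqrt(est_var_e / np.sum((x - x.mean()) ** 2))
        return b, se_b, a

    # Step 1: original regression and its residuals
    b_gdp, _, a_const = ols(gdp_pc, log_users)
    res = log_users - (a_const + b_gdp * gdp_pc)

    # Step 2: artificial regression, Res^2 = alpha_Const + alpha_GDP * GdpPC + v
    alpha_gdp, se_alpha, _ = ols(gdp_pc, res ** 2)
    t_stat = alpha_gdp / se_alpha
    prob = 1.0 - stats.t.cdf(t_stat, len(res) - 2)         # one-tailed Prob[Results IF H0 true]
    print(alpha_gdp, prob)                                  # estimate of alpha_GDP and its probability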
Strategy: Algebraically manipulate the original model so that the problem of heteroskedasticity is eliminated in the new model. That is, tweak the original model so that the variance of each nation's error term's probability distribution is the same. We can accomplish this with just a little algebra.
Based on our scatter diagram and the Breusch–Pagan–Godfrey test, we assume that the vari-
ance of the error term’s probability distribution is proportional to per capita GDP:
Var[et] = V × GdpPCt
Original model:
LogUsersInternett = βConst + βGDPGdpPCt + et
Tweaked model: divide both sides by √GdpPCt and let εt = et/√GdpPCt:
LogUsersInternett/√GdpPCt = βConst(1/√GdpPCt) + βGDP(GdpPCt/√GdpPCt) + εt
The variance of the tweaked error term is the same for every observation:
\[
\text{Var}[\varepsilon_t] = \text{Var}\!\left[\frac{e_t}{\sqrt{GdpPC_t}}\right]
\]
Applying the arithmetic of variances, Var[cx] = c²Var[x]:
\[
= \frac{1}{GdpPC_t}\,\text{Var}[e_t]
\]
Since Var[et] = V × GdpPCt, where V equals a constant:
\[
= \frac{1}{GdpPC_t} \times V \times GdpPC_t = V
\]
We divided the original model by √GdpPCt so that the variance of the error term's probability
distribution in the tweaked model equals V for each observation. Consequently the error term
equal variance premise is satisfied in the tweaked model. Therefore the ordinary least squares
(OLS) estimation procedure computations of the estimates for the variance of the error term’s
probability distribution will not be flawed in the tweaked model.
The dependent and explanatory variables in the new tweaked model are:
Tweaked dependent variable: AdjLogUsersInternett = LogUsersInternett/√GdpPCt
Tweaked explanatory variables: AdjConstt = 1/√GdpPCt and AdjGdpPCt = √GdpPCt
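The tweak amounts to a weighted (generalized) least squares regression. The sketch below, with hypothetical arrays standing in for the 1992 data, constructs the adjusted variables and runs OLS on them with no separate intercept; statistical packages let you specify such weights directly.

    # A minimal sketch: the "tweaked" (generalized least squares) regression.
    # gdp_pc and log_users are hypothetical stand-ins for the 1992 data.
    import numpy as np

    gdp_pc = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])        # made up
    log_users = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])          # made up

    w = np.sqrt(gdp_pc)
    adj_log_users = log_users / w          # tweaked dependent variable
    adj_const = 1.0 / w                    # tweaked "constant"
    adj_gdp_pc = w                         # tweaked GdpPC (= GdpPC / sqrt(GdpPC))

    # OLS on the tweaked model: the intercept's role is played by adj_const.
    X = np.column_stack([adj_const, adj_gdp_pc])
    b_const, b_gdp = np.linalg.lstsq(X, adj_log_users, rcond=None)[0]
    print(b_const, b_gdp)                  # estimates of beta_Const and beta_GDP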
Table 16.4
Tweaked Internet regression results
Table 16.5
Comparison of Internet regression results
Now let us compare the tweaked regression results for βGDP with the original ones (table 16.5). The most striking differences are the calculations that are based on the estimated variance of the coefficient estimate's probability distribution: the standard error, the t-Statistic, and the Prob values. This is hardly surprising. The ordinary least squares (OLS) regression calculations are based on the error term equal variance premise. Our analysis suggests that this premise is violated in the original regression, however. Consequently the standard error, t-Statistic, and Prob calculations will be flawed when we use the ordinary least squares (OLS) estimation procedure. The generalized least squares (GLS) regression corrects for this.
Recall the purpose of our analysis in the first place: Assess the effect of per capita GDP on
Internet use. Recall our theory and associated hypotheses:
Theory: Higher per capita GDP increases Internet use.
H0: βGDP = 0 Per capita GDP does not affect Internet use
H1: βGDP > 0 Higher per capita GDP increases Internet use
We see that the value of the tails probability decreases from 0.0046 to 0.0002. Since a one-tailed
test is appropriate, the Prob[Results IF H0 true] declines from 0.0023 to 0.0001. Accounting for
heteroskedasticity has an impact on the analysis.
We will now use a simulation to illustrate the following properties of the generalized least squares (GLS) estimation procedure when heteroskedasticity is present:
• The estimation procedure for the coefficient value is unbiased.
• The estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased.
• The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
A new drop down box now appears in figure 16.8. We can specify the estimation procedure:
either ordinary least squares (OLS) or generalized least squares (GLS). Initially, OLS is specified
indicating that the ordinary least squares (OLS) estimation procedure is being used. Also note
that by default 0.0 is specified in the Heter list, which means that no heteroskedasticity is present.
Recall our previous simulations illustrating that the ordinary least squares (OLS) estimation
procedure to estimate the coefficient value and the ordinary least squares (OLS) procedure to
estimate the variance of the coefficient estimate’s probability distribution were both unbiased
when no heteroskedasticity is present. To review this, click Start and then after many, many
repetitions click Stop.
Next introduce heteroskedasticity by selecting 1.0 from the “Heter” list. Recall that while the
ordinary least squares (OLS) estimation procedure for the coefficient’s value was still unbiased,
the ordinary least squares (OLS) estimation procedure for the variance of the coefficient esti-
mate’s probability distribution was biased. To review this, click Start and then after many, many
repetitions click Stop.
Finally, select the generalized least squares (GLS) estimation procedure instead of the ordinary
least squares (OLS) estimation procedure. Click Start and then after many, many repetitions
click Stop. The generalized least squares (GLS) results are reported in the last row of table 16.6.
When heteroskedasticity is present and the generalized least squares (GLS) estimation procedure
is used, the variance of the estimated coefficient values from each repetition of the experiment
equals the average of the estimated variances. This suggests that the generalized least squares
(GLS) procedure indeed provides an unbiased estimation procedure for the variance. Also note
that when heteroskedasticity is present, the variance of the estimated values resulting from
generalized least squares (GLS) is less than ordinary least squares (OLS), 2.3 versus 2.9. What
does this suggest? The lower variance suggests that the generalized least squares (GLS) proce-
dure provides more reliable estimates when heteroskedasticity is present. In fact it can be shown
that the generalized least squares (GLS) procedure is indeed the best linear unbiased estimation
(BLUE) procedure when heteroskedasticity is present.
[Figure 16.8 annotates the simulation window: each repetition computes bx, EstVar[e] = SSR/(Degrees of freedom), and EstVar[bx] = EstVar[e]/Σ(xt − x̄)²; the window reports the variance of the estimated coefficient values from all repetitions, the Coef Var Est (the estimate of the variance of the coefficient estimate's probability distribution from the current repetition), and the Mean (the average of the variance estimates from all repetitions).]
Figure 16.8
Heteroskedasticity simulation
Table 16.6
Heteroskedasticity simulation results
Let us summarize:

                                                        Standard premises    Heteroskedasticity
Is the estimation procedure:                            OLS                  OLS        GLS
an unbiased estimation procedure for the
coefficient's value?                                    Yes                  Yes        Yes
an unbiased estimation procedure for the variance
of the coefficient estimate's probability
distribution?                                           Yes                  No         Yes
for the coefficient value the best linear unbiased
estimation procedure (BLUE)?                            Yes                  No         Yes
Robust standard errors address the first issue and are particularly appropriate when the sample size is large. White standard errors constitute one such approach. We will not provide a rigorous justification of this approach; the mathematics is too complex. We will, however, provide the motivation by taking a few liberties. Begin by reviewing our derivation of the variance of the coefficient estimate's probability distribution, Var[bx], presented in chapter 6:
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]
\[
\text{Var}[b_x] = \text{Var}\!\left[\frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\right]
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\,\text{Var}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]
Error term/error term independence premise: the error terms are independent, so Var[x + y] = Var[x] + Var[y]:
\[
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\bigl[\text{Var}[(x_1-\bar{x})e_1]+\text{Var}[(x_2-\bar{x})e_2]+\text{Var}[(x_3-\bar{x})e_3]\bigr]
\]
\[
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\bigl[(x_1-\bar{x})^2\text{Var}[e_1]+(x_2-\bar{x})^2\text{Var}[e_2]+(x_3-\bar{x})^2\text{Var}[e_3]\bigr]
\]
\[
= \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2\,\text{Var}[e_t]}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Focus on Var[et] and recall that the variance equals the average of the squared deviations from the mean:
Var[et] = et²
While the error terms are not observable, we can think of the residuals as the estimated error terms. Consequently we will use Rest² to estimate et²:
et² → Rest²
Table 16.7
Internet regression results—Robust standard errors
Applying this to the equation for the variance of the coefficient estimate's probability distribution obtains
\[
\text{Var}[b_x] = \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2\,\text{Var}[e_t]}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Substituting et² for Var[et]:
\[
= \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2 e_t^2}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Using the residuals as estimated error terms, et² → Rest²:
\[
\text{EstVar}[b_x] = \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2 Res_t^2}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
The White robust standard error is the square root of the estimated variance.2 Statistical software
makes it easy to compute robust standard errors (table 16.7).
2. While it is beyond the scope of this textbook, it can be shown that although this estimation procedure is biased, the
magnitude of the bias diminishes and approaches zero as the sample size approaches infinity.
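The formula above is easy to apply directly. The sketch below computes the White robust standard error for a simple regression with hypothetical data and compares it with the ordinary OLS standard error; packages such as statsmodels report heteroskedasticity-consistent standard errors of this kind automatically.

    # A minimal sketch with hypothetical data: the White robust standard error for b_x.
    import numpy as np

    x = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])     # made up
    y = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])          # made up

    dev = x - x.mean()
    b_x = np.sum(dev * (y - y.mean())) / np.sum(dev ** 2)
    b_const = y.mean() - b_x * x.mean()
    res = y - (b_const + b_x * x)

    # Ordinary OLS standard error: assumes a single Var[e]
    se_ols = np.sqrt((np.sum(res ** 2) / (len(y) - 2)) / np.sum(dev ** 2))

    # White robust standard error: EstVar[b_x] = sum(dev^2 * Res^2) / (sum(dev^2))^2
    se_white = np.sqrt(np.sum(dev ** 2 * res ** 2) / np.sum(dev ** 2) ** 2)

    print(se_ols, se_white)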
Chapter 16 Exercises
Judicial data: Cross-sectional data of judicial and economic statistics for the fifty states in 2000.
JudExpt State and local expenditures for the judicial system per 100,000 persons in state t
CrimesAllt Crimes per 100,000 persons in state t
GdpPCt Real per capita GDP in state t (2000 dollars)
Popt Population in state t (persons)
UnemRatet Unemployment rate in state t (percent)
Statet Name of state t
Yeart Year
1. We wish to explain state and local judicial expenditures. To do so, consider the following
linear model:
3. Apply the generalized least squares (GLS) estimation procedure to the judicial expenditure
model. To simplify the mathematics, use the following equation to model the variance of the
error term’s probability distribution:
Var[et] = V × GdpPCt
where V equals a constant.
Burglary and poverty data: Cross-sectional data of burglary and economic statistics for the fifty states in 2002.
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of each
coefficient using the burglary and poverty data. Interpret the coefficient estimates. What are
the critical results?
• The unemployment rate is calculated from a sample of the population. The sample size in each state is approximately proportional to the state's population.
• Is the reported unemployment rate in a state with a large population more or less reliable than in a state with a small population?
• Would you expect the variance of the error term's probability distribution in a large state to be more or less than the variance in a small state?
b. Consider the ordinary least squares (OLS) estimates of the parameters that you computed
in the previous question. Plot the residuals versus population.
i. Does your graph appear to confirm your suspicions concerning the presence of
heteroskedasticity?
ii. If so, does the variance appear to be directly or inversely proportional to population?
c. Based on your suspicions, formulate a model of heteroskedasticity.
d. Use the Breusch–Pagan–Godfrey approach to test for the presence of heteroskedasticity.
7. Apply the generalized least squares (GLS) estimation procedure to the judicial expenditure
model. To simplify the mathematics use the following equation to model the variance of the
error term’s probability distribution:
Var[et] = V / Popt
8. How, if at all, does accounting for heteroskedasticity affect the assessment of your theories?
17 Autocorrelation (Serial Correlation)
Chapter 17 Outline
17.1 Review
17.1.1 Regression Model
17.1.2 Standard Ordinary Least Squares (OLS) Premises
17.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
17.1.4 Covariance and Independence
17.3 Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
17.3.1 The Mathematics
17.3.2 Our Suspicions
17.3.3 Confirming Our Suspicions
2. In chapter 6 we showed that the ordinary least squares (OLS) estimation procedure for the
coefficient value was unbiased; that is, we showed that
Mean[bx] = βx
Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
3. In chapter 6 we showed that the variance of the coefficient estimate’s probability distribution
equals the variance of the error term’s probability distribution divided by the sum of the squared
x deviations; that is, we showed that
\[
\mathrm{Var}[b_x] = \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
Consumer durable data: Monthly time series data of consumer durable production and income
statistics 2004 to 2009.
d. If the residual is positive in one month, is it usually positive in the next month?
e. If the residual is negative in one month, is it usually negative in the next month?
7. Consider the following equations:
yt = βConst + βxxt + et
et = ρet−1 + vt
Rest = yt − Estyt
Start with the last equation, the equation for Rest. Using algebra and the other equations,
show that
yt = βConst + βxxt + et
et = ρet−1 + vt
Multiply the yt−1 equation by ρ. Then subtract it from the yt equation. Using algebra and the et
equation show that
17.1 Review

17.1.1 Regression Model

yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
The error term is a random variable that represents random influences: Mean[et] = 0
Again, we begin by focusing our attention on the standard ordinary least squares (OLS) regres-
sion premises:
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
17.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation
Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:

\[
b_x = \frac{\sum_{t=1}^{T}(y_t-\bar{y})(x_t-\bar{x})}{\sum_{t=1}^{T}(x_t-\bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]

• Variance of the error term's probability distribution:

\[
\mathrm{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]

• Variance of the coefficient estimate's probability distribution:

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
In chapter 16 we focused on the first standard ordinary least squares (OLS) premise. We will
now turn our attention to the second, error term/error term independence premise. We begin by
examining precisely what the premise means. Subsequently, we investigate what problems do
and do not emerge when the premise is violated and finally what can be done to address the
problems that do arise.
We introduced covariance to quantify the notions of correlation and independence. On the one
hand, if two variables are correlated, their covariance is nonzero. On the other hand, if two
variables are independent their covariance is 0. A scatter diagram allows us to illustrate how
covariance is related to independence and correlation. To appreciate why, consider the equation
we use to calculate covariance:
\[
\mathrm{Cov}[x, y] = \frac{(x_1-\bar{x})(y_1-\bar{y})+(x_2-\bar{x})(y_2-\bar{y})+\cdots+(x_N-\bar{x})(y_N-\bar{y})}{N} = \frac{\sum_{t=1}^{N}(x_t-\bar{x})(y_t-\bar{y})}{N}
\]
[Figure 17.1: Scatter diagram and covariance. Axes: (xi − x̄) horizontal, (yi − ȳ) vertical. Quadrant I: (xi − x̄) > 0 and (yi − ȳ) > 0, so the product (xi − x̄)(yi − ȳ) > 0. Quadrant II: (xi − x̄) < 0 and (yi − ȳ) > 0, so the product is negative. Quadrant III: (xi − x̄) < 0 and (yi − ȳ) < 0, so the product is positive. Quadrant IV: (xi − x̄) > 0 and (yi − ȳ) < 0, so the product is negative.]
Focus on one term in the numerator of the covariance, the deviation product (xi − x̄)(yi − ȳ), and consider its sign in each of the four quadrants (see figure 17.1).
• First quadrant: the Dow growth rate is greater than its mean and the Nasdaq growth rate is greater than its mean, so the product of the deviations is positive in the first quadrant.
Recall that we used precipitation in Amherst, the Nasdaq growth rate, and the Dow Jones growth
rate to illustrate independent and correlated variables in chapter 1.
[Figure 17.2: Precipitation versus Nasdaq growth, plotted as deviations from their means. The points are spread across all four quadrants; Cov ≈ 0.9 ≈ 0.]
Precipitation in Amherst and the Nasdaq growth rate are independent; knowing one does not
help us predict the other. Figure 17.2 shows that the scatter diagram points are distributed rela-
tively evenly throughout the four quadrants thereby suggesting that the covariance is approxi-
mately 0. However, the Dow Jones growth rate and the Nasdaq growth rate are not independent;
they are correlated. Most points on figure 17.3 are located in the first and third quadrants; con-
sequently most of the covariance terms are positive resulting in a positive covariance.
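A quick numerical sketch may help fix the idea; the series below are simulated stand-ins (Python assumed), not the actual Amherst precipitation, Nasdaq, and Dow Jones data.

```python
# A small illustration of covariance with made-up series (Python/numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
precip = rng.normal(4, 1, 120)                   # unrelated to the market series
nasdaq = rng.normal(1, 5, 120)
dow    = 0.7 * nasdaq + rng.normal(0, 2, 120)    # built to be positively correlated with nasdaq

def cov(x, y):
    # Average of the products of the deviations from the means, as in the formula above.
    return ((x - x.mean()) * (y - y.mean())).mean()

print(cov(precip, nasdaq))   # near 0: deviation products spread over all four quadrants
print(cov(dow, nasdaq))      # positive: most deviation products lie in quadrants I and III
```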
Autocorrelation (serial correlation) is present whenever the value of one observation’s error
term allows us to predict the value of the next. When this occurs, one observation’s error term
is correlated with the next observation’s; the error terms are correlated, and the second premise,
the error term/error term independence premise, is violated. The following equation models
autocorrelation:
Autocorrelation model:
et = ρet−1 + vt,
[Figure 17.3: Dow Jones growth versus Nasdaq growth, plotted as deviations from their means (labeled points: January 1987 and February 2000). Most points lie in quadrants I and III; Cov = 19.5.]
The Greek letter “rho” is the traditional symbol that is used to represent autocorrelation. When
rho equals 0, no autocorrelation is present; when rho equals 0, the ρet−1 term disappears and the
error terms, the e’s, are independent because the vt’s are independent. However, when rho does
not equal 0, autocorrelation is present.
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
We can use a simulation to illustrate autocorrelation. We begin by selecting 0.0 in the Rho list (figure 17.4). Focus on the et−1 versus et scatter diagram (figure 17.5). You will observe that this scatter diagram looks very much like the Amherst precipitation–Nasdaq scatter diagram (figure 17.2), indicating that the two error terms are independent; that is, knowing et−1 does not help us predict et. Next specify rho to equal 0.9. Now the scatter diagram (figure 17.6) looks much more like the Dow Jones–Nasdaq scatter diagram (figure 17.3), suggesting that for the most part, when et−1 is positive, et will be positive also, and when et−1 is negative, et will be negative also; this illustrates positive autocorrelation.
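The lab's experiment can also be mimicked with a few lines of simulation; the sketch below (Python assumed, with arbitrary parameter values of our own) generates error terms from the autocorrelation model and measures how strongly et−1 and et move together.

```python
# A sketch of the rho experiment (assumptions: Python/numpy; parameter values are arbitrary).
import numpy as np

def simulate_errors(rho, T=200, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(0, 10, T)          # independent random influences
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]  # autocorrelation model: e_t = rho*e_(t-1) + v_t
    return e

for rho in (0.0, 0.9):
    e = simulate_errors(rho)
    corr = np.corrcoef(e[:-1], e[1:])[0, 1]
    print(f"rho = {rho}: correlation between e(t-1) and e(t) = {corr:.2f}")
# With rho = 0 the correlation is near 0 (figure 17.5); with rho = 0.9 it is strongly positive (figure 17.6).
```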
[Figure 17.4: Rho list (−0.9, −0.6, −0.3, 0.0, 0.3, ...)]

[Figure 17.5: et−1 versus et scatter diagram, ρ = 0]

[Figure 17.6: et−1 versus et scatter diagram, ρ = 0.9]
17.3 Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
Now let us explore the consequences of autocorrelation. Just as with heteroskedasticity, we will
focus on two of the three estimation procedures embedded within the ordinary least squares
(OLS) estimation procedure:
• Value of the coefficient.
• Variance of the coefficient estimate’s probability distribution.
Question: Are these estimation procedures still unbiased when autocorrelation is present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously we showed that the estimation procedure
for the coefficient value was unbiased by
• applying the arithmetic of means
and
• recognizing that the means of the error terms’ probability distributions equal 0 (since the error
terms represent random influences).
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]

\[
\mathrm{Mean}[b_x] = \beta_x + \mathrm{Mean}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]

Applying Mean[cx] = c Mean[x],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\,\mathrm{Mean}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]

Applying Mean[x + y] = Mean[x] + Mean[y],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[\mathrm{Mean}[(x_1-\bar{x})e_1]+\mathrm{Mean}[(x_2-\bar{x})e_2]+\mathrm{Mean}[(x_3-\bar{x})e_3]\bigr]
\]

Applying Mean[cx] = c Mean[x],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[(x_1-\bar{x})\mathrm{Mean}[e_1]+(x_2-\bar{x})\mathrm{Mean}[e_2]+(x_3-\bar{x})\mathrm{Mean}[e_3]\bigr]
\]

Since Mean[e1] = Mean[e2] = Mean[e3] = 0,

\[
= \beta_x
\]

1. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so, we can easily apply the arithmetic of means. Our results are unaffected by this assumption.
What is the critical point here? We have not relied on the error term/error term independence
premise to show that the estimation procedure for the coefficient value is unbiased. Consequently
we suspect that the estimation procedure for the coefficient value will continue to be unbiased
in the presence of autocorrelation.
Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient
Estimate’s Probability Distribution
Next consider the estimation procedure for the variance of the coefficient estimate’s probability
distribution used by the ordinary least squares (OLS) estimation procedure:
The strategy involves two steps:
• First, we used the adjusted variance to estimate the variance of the error term's probability distribution: EstVar[e] = SSR/Degrees of freedom.
• Second, we applied the equation relating the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution: Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)².

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Unfortunately, when autocorrelation is present, the second step is not justified. To understand why, recall the arithmetic of variances:

Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]

Since the covariance of independent variables equals 0, we can simply ignore the covariance term when calculating the variance of a sum of independent variables. However, if two variables are not independent, their covariance does not equal 0. Consequently, when calculating the variance of the sum of two variables that are not independent, we cannot ignore their covariance.
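A one-minute numerical check (Python assumed; the two correlated series are fabricated for illustration) shows how large the neglected covariance term can be:

```python
# Illustrating Var[x + y] = Var[x] + Var[y] + 2 Cov[x, y] (Python/numpy assumed; made-up data).
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 100_000)
y = 0.8 * x + rng.normal(0, 0.6, 100_000)        # x and y are positively correlated

print(np.var(x + y))                             # the full variance of the sum
print(np.var(x) + np.var(y))                     # ignoring the covariance understates it
print(np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1])   # matches the first line
```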
Next apply this to the error terms when autocorrelation is absent and when it is present:
We will now review our derivation of the relationship between the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution, Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)², to illustrate the critical role played by the error term/error term independence premise. We began with the equation for the coefficient estimate:

Equation for coefficient estimate:

\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]
2. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so,
we can apply the arithmetic of variances easily. Our results are unaffected by this assumption.
\[
\mathrm{Var}[b_x] = \mathrm{Var}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]

Error term/error term independence premise: The error terms are independent, Var[x + y] = Var[x] + Var[y]:

\[
= \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\bigl[\mathrm{Var}[(x_1-\bar{x})e_1]+\mathrm{Var}[(x_2-\bar{x})e_2]+\mathrm{Var}[(x_3-\bar{x})e_3]\bigr]
\]

Error term equal variance premise: The error term variance is identical, Var[e1] = Var[e2] = Var[e3] = Var[e].

Simplifying,

\[
= \frac{\mathrm{Var}[e]}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]

Generalizing,

\[
= \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Focus on the fourth step. When the error term/error term independence premise is satisfied,
that is, when the error terms are independent, we can ignore the covariance terms when calculat-
ing the variance of a sum of variables:
\[
\mathrm{Var}[b_x] = \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\,\mathrm{Var}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]

Error term/error term independence premise: The error terms are independent: Var[x + y] = Var[x] + Var[y].
When autocorrelation is present, however, the error terms are not independent and the covariance
terms cannot be ignored. Therefore, when autocorrelation is present, the fourth step is invalid:
\[
\mathrm{Var}[b_x] = \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\bigl[\mathrm{Var}[(x_1-\bar{x})e_1]+\mathrm{Var}[(x_2-\bar{x})e_2]+\mathrm{Var}[(x_3-\bar{x})e_3]\bigr]
\]
Consequently, in the presence of autocorrelation, the equation we used to describe the relationship between the variance of the error term's probability distribution and the variance of the coefficient estimate's probability distribution is no longer valid:

\[
\mathrm{Var}[b_x] = \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
The procedure used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution is therefore flawed:

Step 1: Estimate the variance of the error term's probability distribution from the available information:

EstVar[e] = SSR / Degrees of freedom

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions, Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)²:

EstVar[bx] = EstVar[e] / Σ_{t=1}^{T}(xt − x̄)²
The equation that the ordinary least squares (OLS) estimation procedure uses to estimate the
variance of the coefficient estimate’s probability distribution is flawed when autocorrelation is
present. Consequently, how can we have faith in the variance estimate?
Let us summarize. After reviewing the algebra, we suspect that when autocorrelation is present, the ordinary least squares (OLS) estimation procedure for the coefficient value remains unbiased, but the procedure it uses to estimate the variance of the coefficient estimate's probability distribution is flawed.
We will use a simulation to confirm our suspicions (shown in table 17.1 and figure 17.7).
Econometrics Lab 17.2: The Ordinary Least Squares (OLS) Estimation Procedure and Autocorrelation
Table 17.1
Autocorrelation simulation results

[Figure 17.7: Specifying rho—the Rho list offers values from −0.9 to 0.9]
• Bad news: The ordinary least squares (OLS) estimation procedure for the variance of the coefficient estimate's probability distribution is biased. The actual variance of the estimated coefficient values equals 1.11, while the average of the estimated variances equals 0.28. Just as we feared, when autocorrelation is present, the ordinary least squares (OLS) calculations to estimate the variance of the coefficient estimates are flawed.
When the estimation procedure for the variance of the coefficient estimate’s probability dis-
tribution is biased, all calculations based on the estimate of the variance will be flawed also;
that is, the standard errors, t-statistics, and tail probabilities appearing on the ordinary least
squares (OLS) regression printout are unreliable. Consequently we will use an example to
explore how we account for the presence of autocorrelation.
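The same experiment is easy to replicate outside the lab. The sketch below (Python/statsmodels assumed; the parameter values are our own, so the numbers will not reproduce table 17.1 exactly) repeatedly generates autocorrelated data, runs OLS, and compares the actual variance of the coefficient estimates with the average of the variances that OLS reports.

```python
# A rough Monte Carlo check (assumptions: Python, numpy, statsmodels; arbitrary parameters).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T, rho, beta_x = 30, 0.6, 2.0
x = np.linspace(1, 30, T)
X = sm.add_constant(x)

slopes, reported_vars = [], []
for _ in range(2000):
    v = rng.normal(0, 20, T)
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]          # autocorrelated error terms
    y = 10 + beta_x * x + e
    res = sm.OLS(y, X).fit()
    slopes.append(res.params[1])
    reported_vars.append(res.bse[1] ** 2)     # the variance OLS itself reports

print("mean of coefficient estimates:   ", np.mean(slopes))         # near 2.0: still unbiased
print("actual variance of the estimates:", np.var(slopes))
print("average reported variance:       ", np.mean(reported_vars))  # noticeably smaller: biased downward
```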
Step 1: Apply the ordinary least squares (OLS) estimation procedure. Estimate the model’s
parameters with the ordinary least squares (OLS) estimation procedure.
Step 2: Consider the possibility of autocorrelation.
• Ask whether there is reason to suspect that autocorrelation may be present.
• Use the ordinary least squares (OLS) regression results to “get a sense” of whether autocorrela-
tion is a problem by examining the residuals.
• Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
• Estimate the value of the autocorrelation parameter, ρ.
Step 3: Apply the generalized least squares (GLS) estimation procedure.
• Apply the model of autocorrelation and algebraically manipulate the original model to derive
a new, tweaked model in which the error terms do not suffer from autocorrelation.
• Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the
tweaked model.
Time series data often exhibit autocorrelation. We will consider monthly consumer durables
data:
Consumer durable data: Monthly time series data of consumer durable consumption and
income statistics 2004 to 2009.
ConsDurt Consumption of durables in month t (billions of 2005 chained dollars)
Const Consumption in month t (billions of 2005 chained dollars)
Inct Disposable income in month t (billions of 2005 chained dollars)
Project: Assess the effect of disposable income on the consumption of consumer durables.
These particular start and end dates were chosen to illustrate the autocorrelation phenomenon
clearly.
Economic theory suggests that higher levels of disposable income increase the consumption of
consumer durables:
Theory:
βI > 0
Step 1: Apply the ordinary least squares (OLS) estimation procedure (table 17.2).
H0: βI = 0 Higher disposable income does not affect the consumption of durables
H1: βI > 0 Higher disposable income increases the consumption of durables
As always, the null hypothesis challenges the evidence; the alternative hypothesis is consistent
with the evidence. Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the Inc coefficient estimate from one repetition of the experiment will be 0.087 or more, if H0 were true (i.e., if disposable income actually has no effect on the consumption of durables, if βI actually equals 0)?
Table 17.2
OLS consumer durable regression results
To emphasize that the Prob[Results IF H0 true] depends on the standard error we will use the
Econometrics Lab to calculate the probability. The following information has already been
entered:
Click Calculate.
We use the standard error provided by the ordinary least squares (OLS) regression results to
compute the Prob[Results IF H0 true].
We can also calculate the Prob[Results IF H0 true] by using the tails probability reported in
the regression printout. Since this is a one-tailed test, we divide the tails probability by 2:
Prob[Results IF H0 true] = (<0.0001)/2 < 0.0001
Based on the 1 percent significance level, we would reject the null hypothesis; that is, we would reject the hypothesis that disposable income has no effect on the consumption of consumer durables.
There may be a problem with this, however. The equation used by the ordinary least squares
(OLS) estimation procedure to estimate the variance of the coefficient estimate’s probability
distribution assumes that the error term/error term independence premise is satisfied. Our simula-
tion revealed that when autocorrelation is present and the error term/error term independence
premise is violated, the ordinary least squares (OLS) estimation procedure estimating the vari-
ance of the coefficient estimate’s probability distribution can be flawed. Recall that the standard
error equals the square root of the estimated variance. Consequently, if autocorrelation is present,
we may have entered the wrong value for the standard error into the Econometrics Lab when
we calculated Prob[Results IF H0 true]. When autocorrelation is present the ordinary least
squares (OLS) estimation procedure bases its computations on a faulty premise, resulting in
flawed standard errors, t-statistics, and tails probabilities. Consequently we should move on to
the next step.
Unfortunately, there is reason to suspect that autocorrelation may be present. We would expect the consumption of durables to be influenced not only by disposable income but also by the business cycle:
• When the economy is strong, consumer confidence tends to be high; consumers spend more
freely and purchase more than “usual.” When the economy is strong the error term tends to be
positive.
• When the economy is weak, consumer confidence tends to be low; consumers spend less
freely and purchase less than “usual.” When the economy is weak the error term tends to be
negative.
We know that business cycles tend to last for many months, if not years. When the economy is
strong, it remains strong for many consecutive months; hence, when the economy is strong we
would expect consumers to spend more freely and for the error term to be positive for many
consecutive months. On the other hand, when the economy is weak, we would expect consumers
to spend less freely and the error term to be negative for many consecutive months.
We can think of the residuals as the estimated errors. Since the residuals are observable we use
the residuals as proxies for the error terms. Figure 17.8 plots the residuals.
The residuals are plotted consecutively, one month after another. As we can easily see, a posi-
tive residual is typically followed by another positive residual; a negative residual is typically
followed by a negative residual. “Switchovers” do occur, but they are not frequent. This suggests
that positive autocorrelation is present. Most statistical software provides a very easy way to
look at the residuals.
[Figure 17.8: Plot of the residuals, month by month from January 2004 to July 2009. The residuals range from roughly −100 to 100.]
It is also instructive to construct a scatter diagram (figure 17.9) of the residuals versus the
residuals lagged one month. Most of the scatter diagram points lie in the first and third quadrants.
The residuals are positively correlated.
Since the residual plots suggest that our fears are warranted, we now test the autocorrelation
model more formally. While there are many different approaches, we will focus on the Lagrange
multiplier (LM) approach, which uses an artificial regression to test for autocorrelation.3 We will
proceed by reviewing a mathematical model of autocorrelation.
3. The Durbin–Watson statistic is the traditional method of testing for autocorrelation. Unfortunately, the distribution
of the Durbin–Watson statistic depends on the distribution of the explanatory variable. This makes hypothesis testing
with the Durbin–Watson statistic more complicated than with the Lagrange multiplier test. Consequently we will focus
on the Lagrange multiplier test.
[Figure 17.9: Scatter diagram of the residuals versus the residuals lagged one month. Both axes run from −100 to 100.]
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
In this case we believe that ρ is positive. A positive rho provides the error term with inertia. A
positive error term tends to follow a positive error term and a negative error term tends to follow
a negative term. But also note that there is a second term, vt. The vt’s are independent; they
represent random influences that affect the error term also. It is the vt’s that “switch” the sign
of the error term.
Now we combine the original model with the autocorrelation model:
Rearranging terms,

Rest = (βConst − bConst) + (βx − bx)xt + ρet−1 + vt

Since we cannot observe et−1, we use Rest−1 in its place. NB: Since the vt's are independent, we need not worry about autocorrelation here.
Most statistical software allows us to easily assess this model (table 17.3).
Critical result: The Resid(−1) coefficient estimate equals 0.8394. The positive sign of the coef-
ficient estimate suggests that an increase in last period’s residual increases this period’s residual.
This evidence suggests that autocorrelation is present.
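For readers working outside EViews, a rough sketch of the artificial regression follows (Python/statsmodels assumed; y and x are hypothetical stand-ins for the ConsDur and Inc series, which are not reproduced here). Most statistical packages also provide a ready-made version of this test.

```python
# A sketch of the Lagrange multiplier artificial regression (assumptions: Python/statsmodels;
# y and x are numpy arrays standing in for the dependent and explanatory variables).
import numpy as np
import statsmodels.api as sm

def lm_autocorrelation_test(y, x):
    resid = np.asarray(sm.OLS(y, sm.add_constant(x)).fit().resid)
    # Artificial regression: residual on the explanatory variable and the lagged residual.
    art_X = sm.add_constant(np.column_stack([x[1:], resid[:-1]]))
    art = sm.OLS(resid[1:], art_X).fit()
    lagged_res_coef = art.params[-1]        # the counterpart of the Resid(-1) estimate
    lm_stat = art.nobs * art.rsquared       # T*R-squared, one common form of the LM statistic
    return lagged_res_coef, lm_stat
```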
Now we formulate the null and alternative hypotheses:
Table 17.3
Lagrange multiplier test results
The null hypothesis challenges the evidence by asserting that no autocorrelation is present. The
alternative hypothesis is consistent with the evidence.
Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the coefficient estimate from one regres-
sion would be 0.8394 or more, if the H0 were true (i.e., if no autocorrelation were actually
present, if ρ actually equals 0)?
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
In practice there are a variety of ways to estimate ρ. We will discuss what is perhaps the most
straightforward. Since the error terms are unobservable, we “replace” the error terms with the
residuals:
Table 17.4
Regression results—Estimating ρ
Model: et = ρet−1 + vt   →   Rest = ρRest−1 + vt

where the vt's are independent. Note that there is no constant in this model (table 17.4).
• Run the original regression; EViews automatically calculates the residuals and places them in the variable resid.
• EViews automatically modifies resid every time a regression is run. Consequently we will now generate two new variables before running the next regression to prevent a "clash":
residual = resid
residuallag = residual(−1)
• Now specify residual as the dependent variable and residuallag as the explanatory variable; do not forget to "delete" the constant. (A rough sketch of the equivalent computation outside EViews follows this list.)
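Here is a rough sketch of the same two regressions (Python/statsmodels assumed; y and x are hypothetical stand-ins for the actual series):

```python
# A sketch of estimating rho (assumptions: Python/statsmodels; y and x are numpy arrays).
import numpy as np
import statsmodels.api as sm

def estimate_rho(y, x):
    # Step 1: run the original regression and keep the residuals.
    resid = np.asarray(sm.OLS(y, sm.add_constant(x)).fit().resid)
    # Step 2: regress the residual on the lagged residual, with no constant: Res_t = rho*Res_(t-1) + v_t.
    return sm.OLS(resid[1:], resid[:-1]).fit().params[0]
```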
We can accomplish this with a little algebra. We begin with the original model and then apply
the autocorrelation model:
Original model: yt = βConst + βxxt + et
Original model for period t − 1: yt−1 = βConst + βxxt−1 + et−1

Multiplying by ρ: ρyt−1 = ρβConst + ρβxxt−1 + ρet−1

Subtracting: yt − ρyt−1 = βConst(1 − ρ) + βx(xt − ρxt−1) + (et − ρet−1) = βConst(1 − ρ) + βx(xt − ρxt−1) + vt
Critical point: In the tweaked model, vt’s are independent; hence we need not be concerned
about autocorrelation in the tweaked model.
Now let us run the tweaked regression for our example; using the estimate of ρ, we generate
two new variables:
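A sketch of the quasi-differencing step appears below (Python/statsmodels assumed; cons_dur, inc, and rho_hat are hypothetical names, not the book's):

```python
# Generating the "tweaked" variables and running the GLS regression (assumptions: Python/statsmodels).
import numpy as np
import statsmodels.api as sm

def gls_ar1(y, x, rho_hat):
    y_adj = y[1:] - rho_hat * y[:-1]     # y_t - rho*y_(t-1)
    x_adj = x[1:] - rho_hat * x[:-1]     # x_t - rho*x_(t-1)
    # OLS on the tweaked model; the estimated constant corresponds to bConst*(1 - rho).
    return sm.OLS(y_adj, sm.add_constant(x_adj)).fit()

# results = gls_ar1(cons_dur, inc, rho_hat)   # cons_dur, inc, and rho_hat are hypothetical
```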
Table 17.5
GLS regression results—Accounting for autocorrelation
H0: βI = 0 Higher disposable income does not affect the consumption of durables
H1: βI > 0 Higher disposable income increases the consumption of durables
Prob[Results IF H0 true] = 0.1545/2 ≈ 0.0772
After accounting for autocorrelation, we cannot reject the null hypothesis at the 1 or 5 percent
significance levels.
Let us now compare the disposable income coefficient estimate in the last regression, the generalized least squares (GLS) regression that accounts for autocorrelation, with the disposable income coefficient estimate in the ordinary least squares (OLS) regression that does not account for autocorrelation (table 17.6). The most striking differences are the calculations that are based on the estimated variance of the coefficient estimate's probability distribution: the coefficient's standard error, t-statistic, and tails probability. The standard error nearly doubles when we account for autocorrelation. This is hardly surprising. The ordinary least squares (OLS) regression calculations are based on the premise that the error terms are independent. Our analysis suggests that this is not true. The generalized least squares (GLS) regression accounts for error term correlation.
Table 17.6
Coefficient estimate comparison
Table 17.7
Autocorrelation simulation results
The standard error, t-statistic, and tails probability in the generalized least squares (GLS) regression differ substantially.
We will now use a simulation to illustrate that the generalized least squares (GLS) estimation procedure indeed provides "better" estimates than the ordinary least squares (OLS) estimation procedure. While both procedures provide unbiased estimates of the coefficient's value, only the generalized least squares (GLS) estimation procedure provides an unbiased estimate of the variance.
As before, choose a rho of 0.6; by default the ordinary least squares (OLS) estimation procedure
is chosen. Click Start and then after many, many repetitions click Stop. When the ordinary least
squares (OLS) estimation procedure is used, the variance of the estimated coefficient values
equals about 1.11. Now specify the generalized least squares (GLS) estimation procedure by
clicking GLS. Click Start and then after many, many repetitions click Stop. When the general-
ized least squares (GLS) estimation procedure is used, the variance of the estimated coefficient
values is less, 1.01. Consequently the generalized least squares (GLS) estimation procedure
provides more reliable estimates (table 17.7).
Table 17.8
OLS regression results—Robust standard errors
As before, robust standard errors address the first issue arising when autocorrelation is present.
Newey–West standard errors provide one such approach that is suitable for both autocorrelation
and heteroskedasticity. This approach applies the same type of logic that we used to motivate
the White approach for heteroskedasticity, but it is more complicated. Consequently we will not
attempt to motivate the approach here. Statistical software makes it easy to compute Newey–
West robust standard errors (table 17.8).4
4. While it is beyond the scope of this textbook, it can be shown that while this estimation procedure is biased, the
magnitude of the bias diminishes and approaches zero as the sample size approaches infinity.
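A sketch of how such standard errors are requested in practice follows (Python/statsmodels assumed; the data are simulated and the lag length of 4 is an arbitrary illustration, not a recommendation from the text):

```python
# Newey-West (HAC) standard errors versus conventional ones (assumptions: Python/statsmodels).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 72
x = np.linspace(1, 10, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + rng.normal(0, 1)      # autocorrelated error term
y = 3 + 0.5 * x + e

X = sm.add_constant(x)
conventional = sm.OLS(y, X).fit()
newey_west = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(conventional.bse[1], newey_west.bse[1])     # the robust standard error is typically larger here
```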
Chapter 17 Exercises
Petroleum consumption data for Massachusetts: Annual time series data of petroleum con-
sumption and prices for Massachusetts from 1970 to 2004.
a. Time series data often exhibit autocorrelation. Consequently, plot the residuals. Does the plot of the residuals suggest the possible presence of autocorrelation?
b. Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
c. Estimate the value of the autocorrelation parameter, ρ.
a. Apply the model of autocorrelation and algebraically manipulate the original model to
derive a new, tweaked model in which the error terms do not suffer from autocorrelation.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the tweaked model.
4. How, if at all, does accounting for autocorrelation affect the assessment of your theory?
Crime data for California: Annual time series data of crime and economic statistics for
California from 1989 to 2008.
a. Time series data often exhibit autocorrelation. Consequently, plot the residuals. Does the plot of the residuals suggest the possible presence of autocorrelation?
b. Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
c. Estimate the value of the autocorrelation parameter, ρ.
a. Apply the model of autocorrelation and algebraically manipulate the original model to
derive a new, tweaked model in which the error terms do not suffer from autocorrelation.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the tweaked model.
8. How, if at all, does accounting for autocorrelation affect the assessment of your theories?
18 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables
Chapter 18 Outline
18.1 Review
18.1.1 Regression Model
18.1.2 Standard Ordinary Least Squares (OLS) Premises
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency
yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
Suppose that the actual constant equals 6 and the actual coefficient equals 1/2:
1
βConst = 6, β x =
2
Also suppose that the sample size is 6. The following table reports the value of the explanatory
variable and the error term for each of the six observations:
Observation xt et
1 2 4
2 6 2
3 10 3
4 14 −2
5 18 −1
6 22 −4
a. On a sheet of graph paper place x on the horizontal axis and e on the vertical axis.
i. Plot a scatter diagram of x and e.
ii. As xt increases, does et typically increase or decrease?
iii. Is et positively or negatively correlated with xt?
b. Immediately below this graph construct a second graph with x on the horizontal axis and
y on the vertical axis.
i. Plot the line depicting the actual equation, the line representing the actual constant and
the actual coefficient:
y = 6 + (1/2)x
Observation xt et yt
1 2 4 _____
2 6 2 _____
3 10 3 _____
4 14 −2 _____
5 18 −1 _____
6 22 −4 _____
iii. Plot the x and y values for each of the six observations.
c. Based on the points you plotted in your second graph, “eyeball” the best fitting line and
sketch it in.
d. How are the slope of the line representing the actual equation and the slope of the best
fitting line related?
2. Recall the poll Clint conducted to estimate the fraction of the student population that supported him in the upcoming election for class president. He used the following approach:
Random sample technique: Write the name of each individual in the population on a 3 × 5 card and select the sample by drawing cards at random.
Nonrandom sample technique:
• Leave Clint's dorm room and ask the first 16 people you run into whether he or she is voting for Clint.
• Calculate the fraction of the sample supporting Clint.
Use the Econometrics Lab to simulate the two sampling techniques (figure 18.1).
Figure 18.1
Opinion Poll simulation
a. Answer the questions posed in the lab, and then fill in the following blanks:
b. What happens to the mean of the estimated fraction as the sample size increases?
c. Explain why your answer to part b “makes sense.” To do so, consider the following
questions:
i. Compared to the general student population, are the students who live near Clint more
likely to be Clint’s friends?
ii. Compared to the general student population, are the students who live near Clint more
likely to vote for him?
iii. Would the nonrandom sampling technique bias the poll in Clint’s favor?
iv. What happens to the magnitude of the bias as the sample size increases? Explain.
d. As the sample size increases, what happens to the variance of the estimates?
18.1 Review

18.1.1 Regression Model

yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
Mean[et] = 0
Error term/error term independence premise: Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.
Explanatory variable/error term independence premise: Knowing the value of an observation's explanatory variable does not help us predict the value of that observation's error term.
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:
\[
b_x = \frac{\sum_{t=1}^{T}(y_t-\bar{y})(x_t-\bar{x})}{\sum_{t=1}^{T}(x_t-\bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]

• Variance of the error term's probability distribution:

\[
\mathrm{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]

• Variance of the coefficient estimate's probability distribution:

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure
The ordinary least squares (OLS) estimation procedure is economists' most widely used estimation procedure. When contemplating the use of this procedure, we should keep two issues in
mind: Is the ordinary least squares (OLS) estimation procedure for the coefficient value unbi-
ased? If unbiased, is the ordinary least squares (OLS) estimation procedure reliable in the fol-
lowing two ways:
• Can the calculations for the standard errors be trusted?
• Is the ordinary least square (OLS) estimation procedure for the coefficient value the most
reliable, the best linear unbiased estimation procedure (BLUE)?
In the previous two chapters we showed that violation of the first two standard ordinary least
squares (OLS) premises, the error term equal variance premise and the error term/error term
independence premise, does not cause the ordinary least squares (OLS) estimation procedure
for the coefficient value to be biased. This was good news. We then focused on the reliability
issue. We learned that the standard error calculations could not be trusted and that the ordinary
least squares (OLS) estimation procedure was not the best linear unbiased estimation procedure
(BLUE). In this chapter we turn our attention to the third premise, explanatory variable/error
term independence. Unfortunately, violation of the third premise does cause the ordinary least
squares (OLS) estimation procedure for the coefficient value to be biased. The explanatory vari-
able/error term independence premise determines whether the ordinary least squares (OLS) estimation procedure is unbiased or biased. Figure 18.2 summarizes the roles played by
the three standard premises.
[Figure 18.2: OLS bias and reliability flow diagram. One branch poses the OLS reliability question: are the error term equal variance and error term/error term independence premises satisfied or violated?]
This chapter begins by explaining why bias results when the explanatory variable/error term independence premise is violated. Next we introduce a new property that is used to describe estimation procedures: consistency. Typically, consistency is considered less desirable than being unbiased, but estimation procedures that are biased sometimes meet the consistency standard. We close the chapter by introducing one such procedure: the instrumental variables (IV) estimation procedure.
Initially the explanatory variable/error term correlation coefficient equals 0. Be certain that the
Pause checkbox is checked. Then click the Start button. Note that the blue points indicate the
observations with low x values, the black points the observations with medium x values, and the red
points the observations with high x values. Click the Start button a few more times to convince
yourself that this is always true. Now clear the Pause checkbox and click Start. After many,
many repetitions, click Stop. Note that the scatter diagram points are distributed more or less
evenly across the graph as shown in figure 18.3.
Since the points are spread evenly, knowing the value of the explanatory variable, xt, does not
help us predict the value of the error term, et. The explanatory variable and the error term are
independent: the explanatory variable/error term independence premise is satisfied. The value
of x, low, medium, or high, does not affect the mean of the error terms. The mean is approxi-
mately 0 in each case (figure 18.4).
Next we select 0.60 in the Corr X&E list. Consequently the explanatory variable and error
term are now positively correlated. After many, many repetitions we observe that the explanatory
variable and the error term are no longer independent. The scatter diagram points are no longer
spread evenly; a pattern emerges. As illustrated in figure 18.5, as the value of explanatory vari-
able rises, the error term tends to rise also:
•When the value of the explanatory variable is low, the error term is typically negative. The
mean of the low x value error terms is negative (figure 18.6).
[Figure 18.3: Scatter diagram of et versus xt—Corr X&E = 0]

[Figure 18.4: Error term probability distributions for low, medium, and high x values—Corr X&E = 0]
• When the value of the explanatory variable is high, the error term is typically positive. The mean of the high x value error terms is positive (figure 18.6).
Last, we select −0.60 in the Corr X&E list. Again, the scatter diagram points are not spread
evenly (figure 18.7). The explanatory variable and error term are now negatively correlated. As
the value of explanatory variable rises, the error term falls:
•When the value of the explanatory variable is low, the error term is typically positive. The
mean of the low x value error terms is positive.
•When the value of the explanatory variable is high, the error term is typically negative. The
mean of the high x value error terms is negative.
[Figure 18.5: Scatter diagram of et versus xt—Corr X&E = 0.6]

[Figure 18.6: Error term probability distributions—Corr X&E = 0.6. Low x values: Mean = −24, Variance = 500; medium x values: Mean = 0, Variance = 500; high x values: Mean = 24, Variance = 500.]
We will proceed by explaining geometrically why correlation between the explanatory variable and the error terms biases the ordinary least squares (OLS) estimation procedure for the coefficient value. Then we will use a simulation to confirm our logic.

Focus attention on figure 18.8. The lines in the lower two graphs represent the actual relationship between the dependent variable, yt, and the explanatory variable, xt:

yt = βConst + βxxt
[Figure 18.7: Scatter diagram of et versus xt—Corr X&E = −0.6]

[Figure 18.8: Explanatory variable/error term correlation. Left panels: explanatory variable and error term positively correlated; right panels: negatively correlated. The lower panels show the actual equation line in the xt–yt scatter diagram.]
βConst is the actual constant and βx the actual coefficient. Now we will examine the left and right panels:
• Left panels of figure 18.8: The explanatory variable, xt, and error term, et, are positively correlated, as illustrated in the top left scatter diagram. The et tends to be low for low values of xt and high for high values of xt. Now consider the bottom left scatter diagram in which the xt's and yt's are plotted. When the explanatory variable and the error term are positively correlated, the scatter diagram points tend to lie below the actual equation line for low values of xt and above the actual equation line for high values of xt.
• Right panels of figure 18.8: The explanatory variable, xt, and error term, et, are negatively correlated, as illustrated in the top right scatter diagram. The et tends to be high for low values of xt and low for high values of xt. Now consider the bottom right scatter diagram in which the xt's and yt's are plotted. When the explanatory variable and the error term are negatively correlated, the scatter diagram points tend to lie above the actual equation line for low values of xt and below the actual equation line for high values of xt.
In figure 18.9 we have added the best fitting line for each of the two panels:
• Left panels of figure 18.9: When the explanatory variable and error terms are positively correlated, the best fitting line is more steeply sloped than the actual equation line; consequently the ordinary least squares (OLS) estimation procedure for the coefficient value is biased upward.
• Right panels of figure 18.9: When the explanatory variable and error terms are negatively correlated, the best fitting line is less steeply sloped than the actual equation line; consequently the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward.
Based on our logic, we would expect the ordinary least squares (OLS) estimation procedure for the coefficient value to be biased whenever the explanatory variable and the error term are correlated.
Econometrics Lab 18.2: Ordinary Least Squares (OLS) and Explanatory Variable/Error Term
Correlation
We can confirm our logic using a simulation. As a base case, we begin with 0.00 specified in
the Corr X&E list; the explanatory variables and error terms are independent. Click Start and
then after many, many repetitions click Stop. The simulation confirms that no bias results when-
ever the explanatory variable/error term independence premise is satisfied.
Explanatory variable and error term positively correlated: for low xt's, the et's are low and yt lies below the actual equation line; for high xt's, the et's are high and yt lies above the actual equation line.

Explanatory variable and error term negatively correlated: for low xt's, the et's are high and yt lies above the actual equation line; for high xt's, the et's are low and yt lies below the actual equation line.

[Figure 18.9: Explanatory variable/error term correlation with best fitting line]
Now, specify 0.30 in the Corr X&E list; the explanatory variable and error terms are positively
correlated. Click Start and then after many, many repetitions click Stop. The average of the
estimated coefficient values, 6.1, exceeds the actual value, 2.0; the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased upward whenever the explanatory vari-
able and error terms are positively correlated. By selecting −0.6 from the “Corr X&E” list, we
can show that downward bias results whenever the explanatory variable and error terms are
negatively correlated. The average of the estimated coefficient values, −2.1, is less than the actual
value, 2.0 (table 18.1).
Table 18.1
Explanatory variable/error term correlation—Simulation results
Explanatory variable/error term correlation creates a problem for the ordinary least squares
(OLS) estimation procedure. Positive correlation causes upward bias and negative correlation
causes downward bias. What can we do in these cases? Econometricians respond to this question
very pragmatically by adopting the philosophy that “half a loaf is better than none.” In general,
we use different estimation procedures that, while still biased, may meet an arguably less
demanding criterion called consistency. In most cases, consistency is not as desirable as is being
unbiased; nevertheless, if we cannot find an unbiased estimation procedure, consistency proves
to be better than nothing. After all, “half a loaf is better than none.” To explain the notion of
consistency, we begin by reviewing what it means for an estimation procedure to be unbiased
(figure 18.10).
Unbiased: An estimation procedure is unbiased whenever the mean (center) of the estimate’s
probability distribution equals the actual value.
Mean of the estimate’s probability distribution = Actual value
Mean (average) of the estimates = Actual value after many, many repetitions
[Figure 18.10: Unbiased estimation procedure—the estimate's probability distribution is centered at the actual value]
Being unbiased is a small sample property because the size of the sample plays no role in
determining whether or not an estimation procedure is unbiased.
Consistent: Consistency is a large sample property; the sample size plays a critical role here. Both the mean and the variance of the estimate's probability distribution are important when deciding whether an estimation procedure is consistent.

Mean of the estimate's probability distribution: Consistency requires the mean either to
• equal the actual value, or
• approach the actual value as the sample size approaches infinity; that is, the magnitude of the bias must diminish as the sample size becomes larger.

Variance of the estimate's probability distribution: Consistency requires the variance to approach 0 as the sample size approaches infinity:

Variance[Est] → 0 as Sample size → ∞
Figure 18.11 illustrates the relationship between the two properties of estimation procedures.
Figure 18.12 provides a flow diagram, a “roadmap,” that we can use to determine the properties
of an estimation procedure.
To illustrate the distinction between these two properties of estimation procedures we will
consider three examples:
• Unbiased and consistent.
• Unbiased but not consistent.
• Biased and consistent.
[Figure 18.11: Unbiased and consistent estimation procedures]

[Figure 18.12: Determining the properties of an estimation procedure—a flow diagram: Is the procedure unbiased or biased? Does Var[Est] → 0 as the sample size → ∞?]
When the standard ordinary least squares (OLS) premises are met the ordinary least squares
(OLS) estimation procedure is not only unbiased, but also consistent. We will use our Econo-
metrics Lab to illustrate this.
This estimation procedure is unbiased and consistent (table 18.2). After many, many
repetitions:
• The average of the estimated coefficient values equals the actual value, 2.0, suggesting that
the estimation procedure is unbiased.
• The variance of the estimated coefficient values appears to be approaching 0 as the sample
size increases.
Table 18.2
Unbiased and consistent estimation procedure
[Figure 18.13: OLS estimation procedure—probability distributions of the estimates for small and large samples]
When the standard ordinary least squares (OLS) premises are met, the ordinary least squares
(OLS) estimation procedure provides us with the best of all possibilities; it is both unbiased and
consistent (figure 18.13).
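A rough sketch of that check follows (Python/statsmodels assumed; the parameter values are our own, so only the pattern, not the specific numbers, should be compared with table 18.2):

```python
# When the standard premises hold, estimates center on the actual value and their variance
# shrinks as the sample size grows (assumptions: Python, numpy, statsmodels).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
actual_coef = 2.0
for T in (50, 100, 150):
    estimates = []
    for _ in range(1000):
        x = rng.uniform(0, 30, T)
        y = 10 + actual_coef * x + rng.normal(0, 20, T)
        estimates.append(sm.OLS(y, sm.add_constant(x)).fit().params[1])
    print(T, np.mean(estimates), np.var(estimates))
# The mean stays near 2.0 at every sample size; the variance falls as T increases.
```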
The Any Two estimation procedure that we introduced in chapter 6 provides us with an example
of an estimation procedure that is unbiased but not consistent. Let us review the Any Two esti-
mation procedure. First we construct a scatter diagram plotting the explanatory variable on the
horizontal axis and the dependent variable on the vertical axis. Then we choose any two points
at random and draw a straight line connecting these points. The coefficient estimate equals the
slope of this line (figure 18.14).
[Figure 18.14: Any Two estimation procedure—a line drawn through two randomly chosen points in the xt–yt scatter diagram]
Table 18.3
Any Two estimation procedure
As table 18.3 reports, the Any Two estimation procedure is unbiased but not consistent. After many, many repetitions:
• The average of the estimated coefficient values equals the actual value, 2.0, suggesting that the estimation procedure is unbiased.
• The variance of the estimated coefficient values increases as the sample size increases (figure 18.15); consequently the estimation procedure is not consistent.
[Figure 18.15: Any Two estimation procedure—probability distributions of the estimates for small and large samples]
To illustrate an estimation procedure that is biased, but consistent, we will revisit the opinion
poll conducted by Clint. Recall that Clint used a random sampling procedure to poll the
population.
Calculate the fraction of the sample supporting Clint. This estimation procedure proved to be
unbiased.
But now consider an alternative approach. Suppose that you are visiting Clint in his dorm
room and he asks you to conduct the poll. Instead of taking the time to write the name of each
individual on a 3 × 5 card, you simply leave Clint’s room and ask the first 16 people you run
into how he/she will vote.
Why do we call this a nonrandom sampling technique? Compared to the general student
population:
• Are the students who live near Clint more likely to be a friend of Clint?
• Consequently, are the students who live near Clint more likely to vote for Clint?
Since your starting point is Clint’s dorm room, it is likely that you will poll students who are
Clint’s friends. They will probably be more supportive of Clint than the general student popula-
tion, will they not? Consequently we would expect this nonrandom polling technique to be biased
in favor of Clint. We will use a simulation to test our logic.
Observe that you can select the sampling technique by checking or clearing the Nonrandom
Sample checkbox (see figure 18.16). Begin by clearing the Nonrandom Sample checkbox to
choose the random sampling technique; this provides us with a benchmark. Click Start and then
after many, many repetitions click Stop. As before, we observe that the estimation procedure is
unbiased. Convince yourself that the random sampling technique is also consistent by increasing
the sample size from 16 to 25 and to 100.
Next specify the nonrandom technique that we just introduced by checking the “Nonrandom
Sample” checkbox. You walk out of Clint’s dorm room and poll the first 16 people you run into.
Click Start and then after many, many repetitions click Stop. The simulation results confirm
our logic. The nonrandom polling technique biases the poll results in favor of Clint. But now
what happens as we increase the sample size from 16 to 25 and then to 100?
We observe that while the nonrandom sampling technique is still biased, the magnitude of the
bias declines as the sample size increases (table 18.4). As the sample size increases from 16 to
25 to 100, the magnitude of the bias decreases from 0.06 to 0.04 to 0.01. This makes sense, does
it not? As the sample size becomes larger, you will be farther and farther from Clint’s dorm
room, which means that you will be getting a larger and larger portion of your sample from the
general student population rather than Clint’s friends. Furthermore the variance of the estimates
also decreases as the sample size increases. This estimation procedure is biased but consistent.
After many, many repetitions:
[Figure 18.16: Opinion Poll simulation—Nonrandom sample checkbox; is Clint's estimation procedure unbiased?]

Table 18.4
Opinion Poll simulation—Random and nonrandom samples

[Figure 18.17: Nonrandom sample estimation procedure—probability distributions of the estimates for small and large samples]
• The average of the estimates appears to be approaching the actual value, 0.5.
• The variance of the estimated coefficient values appears to be approaching 0 as the sample
size increases (figure 18.17).
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency
We have shown that when the explanatory variable/error term independence premise is violated,
the ordinary least squares (OLS) estimation procedure for the coefficient estimate is biased. But
might it be consistent?
Econometrics Lab 18.6: Ordinary Least Squares (OLS) Estimation Procedure and Consistency
Clearly, the magnitude of the bias does not diminish as the sample size increases (table 18.5).
The simulation demonstrates that when the explanatory variable/error term independence premise
is violated, the ordinary least squares (OLS) estimation procedure is neither unbiased nor con-
sistent. This leads us to a new estimation procedure, the instrumental variable (IV) estimation
procedure. Like ordinary least squares, the instrumental variables (IV) estimation procedure will prove to be biased when
the explanatory variable/error term independence premise is violated, but it has an advantage:
under certain conditions, the instrumental variable (IV) estimation procedure is consistent.
Table 18.5
Explanatory variable/error term correlation—Simulation results
Original model: yt = βConst + βxxt + εt,   t = 1, 2, ..., T

where yt = dependent variable, xt = explanatory variable, εt = error term, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.

[Figure 18.18: The "problem" explanatory variable]
In some situations the instrumental variable estimation procedure can mitigate, but not com-
pletely remedy, cases where the explanatory variable and the error term are correlated (figure
18.18). When an explanatory variable, xt, is correlated with the error term, εt, we will refer to
the explanatory variable as the “problem” explanatory variable. The correlation of the explana-
tory variable and the error term creates the bias problem for the ordinary least squares (OLS)
estimation procedure.
We begin by searching for another variable called an instrument. Traditionally, we denote the
instrument by the lower case Roman letter z, zt. An effective instrument must possess two prop-
erties. A “good” instrument, zt, must be
• correlated with the “problem” explanatory variable, xt, and
• independent of the error term, εt.
We use the instrument to provide us with an estimate of the “problem” explanatory variable.
Then this estimate is used as a surrogate for the “problem” explanatory variable. The estimate
of the “problem” explanatory variable, rather than the “problem” explanatory variable itself, is
used to explain the dependent variable.
603 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables
18.7.2 Mechanics
Instrumental Variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental Variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
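The two regressions can be carried out with any ordinary least squares routine. The following Python sketch (ours, not part of the text) spells out the mechanics for a single set of hypothetical data; the helper function ols and the particular numbers used to generate y, x, and z are illustrative assumptions only.

import numpy as np

def ols(y, x):
    # OLS regression of y on a constant and a single regressor x; returns (constant, slope)
    X = np.column_stack([np.ones_like(x), x])
    const, slope = np.linalg.lstsq(X, y, rcond=None)[0]
    return const, slope

rng = np.random.default_rng(1)
z = rng.normal(size=200)                       # instrument
e = rng.normal(size=200)                       # error term
x = 0.8 * z + 0.5 * e + rng.normal(size=200)   # "problem" explanatory variable: correlated with e
y = 1.0 + 2.0 * x + e                          # actual model, coefficient = 2.0

# IV Regression 1: regress the "problem" explanatory variable x on the instrument z
a_const, a_z = ols(x, z)
est_x = a_const + a_z * z                      # Estxt, the surrogate for xt

# IV Regression 2: regress the original dependent variable y on the surrogate Estxt
b_const, b_x = ols(y, est_x)
print("IV estimate of the coefficient: ", round(b_x, 3))
print("OLS estimate, for comparison:   ", round(ols(y, x)[1], 3))

With a single simulated sample the two estimates will vary from run to run, but the OLS estimate is systematically pulled upward by the correlation between x and e, while the IV estimate centers near 2.0 as the sample grows.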
Let us now provide the intuition behind why a “good” instrument, zt, must satisfy the two condi-
tions mentioned above.
The estimate, Estxt, will be a good surrogate only if it is a good predictor of the “problem”
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise, the
instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
As we will see, while instrumental variable estimation procedure will not solve the problem of
bias, it can mitigate it. We will use a simulation to illustrate that while the instrumental variable
(IV) estimation procedure is still biased, it is consistent when “good” instrument conditions are
satisfied (figure 18.19).
Econometrics Lab 18.7: Instrumental Variables (IV) Estimation Procedure and Consistency
Two new correlation lists appear in this simulation: Corr X&Z and Corr Z&E. The two new lists
reflect the two conditions required for a good instrument:
• The Corr X&Z list specifies the correlation coefficient for the explanatory variable and the
instrument. To be a “good” instrument the explanatory variable and the instrument must be cor-
related. The default value is 0.50.
• The Corr Z&E specifies the correlation coefficient for the instrument and error term. To be a
“good” instrument the instrument and error term must be independent. The default value is 0.00;
that is, the instrument and error term are independent.
Figure 18.19
Instrumental Variables simulation
Table 18.6
IV estimation procedure—"Good" instrument conditions satisfied
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Table 18.7
IV estimation procedure—A better instrument
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Initially, the sample size equals 50. Click Start and then after many, many repetitions click Stop.
Subsequently we increase the sample size from 50 to 100 and then again from 100 to 150. Table
18.6 reports the simulation results.
Both bad news and good news emerge:
Bad news: The instrumental variable (IV) estimation procedure is biased. The mean of the estimates for the
coefficient of the explanatory variable does not equal the actual value we specified, 2.0.
Good news: As we increase the sample size,
• the mean of the coefficient estimates gets closer to the actual value
and
• the variance of the coefficient estimates becomes smaller.
This illustrates the fact that the instrumental variable (IV) estimation procedure is consistent.
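The pattern in table 18.6 can be imitated with a small Monte Carlo sketch of our own. It generates an explanatory variable that is correlated with the error term together with an instrument that is correlated with the explanatory variable but independent of the error term; the particular correlation strengths and the number of repetitions are arbitrary choices, not the lab's exact settings.

import numpy as np

rng = np.random.default_rng(2)
ACTUAL_COEF = 2.0
REPS = 10_000

def iv_estimate(y, x, z):
    # IV Regression 1: regress x on z; IV Regression 2: regress y on the surrogate Estx
    slope1, const1 = np.polyfit(z, x, 1)
    est_x = const1 + slope1 * z
    slope2, _ = np.polyfit(est_x, y, 1)
    return slope2

for n in (50, 100, 150):
    estimates = np.empty(REPS)
    for rep in range(REPS):
        z = rng.normal(size=n)                       # instrument, independent of the error term
        e = rng.normal(size=n)                       # error term
        x = z + 0.5 * e + rng.normal(size=n)         # explanatory variable correlated with both
        y = 1.0 + ACTUAL_COEF * x + e
        estimates[rep] = iv_estimate(y, x, z)
    print(f"sample size = {n:3d}   mean of coef ests = {estimates.mean():.3f}   "
          f"variance of coef ests = {estimates.var():.4f}")

As the sample size rises, the mean of the coefficient estimates should drift toward 2.0 and their variance should shrink, the two hallmarks of consistency; repeating the exercise with ordinary least squares in place of iv_estimate leaves the bias intact.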
Next we will use the lab to illustrate the importance of the “good” instrument conditions.
First, let us see what happens when we improve the instrument by making it more highly cor-
related with the problem explanatory variable. We do this by increasing the correlation coeffi-
cient of the explanatory variable and the instrument from 0.50 to 0.75 in the Corr X&Z list
(table 18.7).
The magnitude of the bias decreases, and the variance of the coefficient estimates also
decreases. A more highly correlated instrument provides a better estimate of the "problem"
explanatory variable in IV Regression 1 and hence is a better instrument.
Table 18.8
IV estimation procedure—Instrument correlated with error term
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Last, let us use the lab to illustrate the important role that the independence of the error term
and the instrument plays by specifying 0.10 from the Corr Z&E list; the instrument and the error
term are no longer independent (table 18.8). As we increase the sample size from 50 to 100 to
150, the magnitude of the bias does not decrease. The instrumental variable (IV) estimation
procedure is no longer consistent when the instrument is correlated with the error term; the
explanatory variable/error term independence premise is violated in IV Regression 2.
1. What are the ramifications for the ordinary least squares (OLS) estimation procedure for the
value of the coefficient if the explanatory variable and error term are
a. positively correlated?
b. negatively correlated?
c. independent?
2. How does the problem resulting from explanatory variable/error term correlation differ from
the problems caused by heteroskedasticity or autocorrelation?
3. When is an estimation procedure unbiased?
4. When is an estimation procedure consistent?
5. Must an unbiased estimation procedure be consistent? Explain.
6. Must a consistent estimation procedure be unbiased? Explain.
7. What are the two “good” instrument conditions? Why is each important?
Chapter 18 Exercises
1. The Nonrandom Sample box is cleared; hence the random sampling procedure described above
will be used. By default the actual fraction supporting Clint, ActFrac, equals 0.5; also the From–
To values are specified as 0.45 and 0.55. Click the Start button and then after many, many,
repetitions click Stop.
2. The Nonrandom Sample box is checked; hence the nonrandom sampling procedure described
above will be used. As in problem 1, the sample size equals 16, the actual fraction supporting
Clint, ActFrac, equals 0.5, and the From–To values are specified as 0.45 and 0.55. Click the
Start button and then after many, many, repetitions click Stop.
a. What does the mean of the estimates equal?
b. What is the magnitude of the bias?
c. What does the variance of the estimates equal?
d. What percent of the repetitions fall within 0.05 of the actual fraction, ActFrac?
3. Compare your answers to problems 1 and 2. When the sample size is the same which sampling
procedure is more reliable?
4. Clearly, the nonrandom polling procedure requires less “setup.” It is not necessary to write
the name of each student on a separate card, and so forth. Consequently, with the nonrandom
procedure, there is time to poll more students. In the following simulation the sample size has
been raised from 16 to 25 to account for this.
Click the Start button and then after many, many, repetitions click Stop.
a. What does the mean of the estimates equal?
b. What is the magnitude of the bias?
c. What does the variance of the estimates equal?
d. What percent of the repetitions fall within .05 of the actual fraction, ActFrac?
5. Compare your answers to problems 1 and 4. Is an unbiased estimation procedure always
better than a biased one? Explain.
19 Measurement Error and the Instrumental Variables Estimation Procedure
Chapter 19 Outline
19.2 The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable
Measurement Error
19.3 The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable
Measurement Error
19.3.1 Summary: Explanatory Variable Measurement Error Bias
19.3.2 Explanatory Variable Measurement Error: Attenuation (Dilution) Bias
19.3.3 Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
1. Suppose that a physics assignment requires you to measure the amount of time it takes a one
pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop
watch to measure how long it takes the weight to fall.
a. Even though you are very careful and conscientious, would you expect the stop watch to
report precisely the same amount of time on each trial? Explain.
Suppose that the following equation describes the relationship between the measured elapsed
time and the actual elapsed time:
yMeasuredt = yActualt + vt
where yMeasuredt = the measured elapsed time and yActualt = the actual elapsed time,
and where vt is a random variable. vt represents the random influences that cause your measure-
ment of the elapsed time to deviate from the actual elapsed time. The random influences cause
you to click the stop watch a little early or a little late.
b. Recall that you are careful and conscientious in attempting to measure the elapsed time.
i. In approximately what portion of the trials would you overestimate the elapsed time;
that is, in approximately what portion of the trials would you expect vt to be positive?
ii. In approximately what portion of the trials would you underestimate the elapsed time;
that is, in approximately what portion of the trials would you expect vt to be negative?
iii. Approximately what would the mean (average) of vt equal?
2. Economists distinguish between permanent income and annual income. Loosely speaking,
permanent income equals what a household earns per year “on average;” that is, permanent
income can be thought of as the “average” of annual income over an entire lifetime. In some
years, a household's annual income is more than its permanent income, but in other years, it is less. The dif-
ference between the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where IncTranst = transitory income, IncAnnt = annual income, and IncPermt = permanent income,
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what a household earns “on average,” the mean of transitory
income equals 0. Microeconomic theory teaches that households base their consumption deci-
sions on their “permanent” income.
Theory: Additional permanent income increases consumption.
When we attempt to gather data to assess this theory, we immediately encounter a difficulty.
Permanent income cannot be observed. Only annual income data are available to assess the
theory. So, while we would like to specify permanent income as the explanatory variable, we
have no choice. We must use annual disposable income.
a. Can you interpret transitory income as measurement error? Hint: What is the mean
(average) of transitory income?
b. Now represent transitory income, IncTranst, by ut:
IncAnnt = IncPermt + ut
We will argue that dependent variable measurement error does not lead to bias. However, when-
ever explanatory variable measurement error exists, the explanatory variable and error term will be
correlated, resulting in bias. We consider dependent variable measurement error first. Before
doing so, we will describe precisely what we mean by measurement error.
Suppose that a physics assignment requires you to measure the amount of time it takes a one
pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop
watch to measure how long it takes the weight to fall.
Question: Will your stop watch report the same amount of time on each trial?
Answer: No. Sometimes reported times will be lower than other reported times. Sometimes
you will be a little premature in clicking the stop watch button. Other times you will be a little
late.
It is humanly impossible to measure the actual elapsed time perfectly. No matter how careful
you are, sometimes the measured value will be a little low and other times a little high. This
phenomenon is called measurement error.
yMeasuredt = yActualt + vt
Recall that yActualt equals the actual amount of time elapsed and yMeasuredt equals the mea-
sured amount of time; vt represents measurement error. Sometimes vt will be positive when you
are a little too slow in clicking the stop watch button; other times vt will be negative when you
click the button a little too soon. vt is a random variable; we cannot predict the numerical value
of vt beforehand. What can we say about vt? We can describe its distribution. Since you are
conscientious in measuring the elapsed time, the mean of vt’s probability distribution equals 0:
Mean[vt] = 0
Measurement error does not systematically increase or decrease the measured value of yt. The
measured value of yt, yMeasuredt, will not systematically overestimate or underestimate the actual
value.
19.2 The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable
Measurement Error
We begin with the equation specifying the actual relationship between the dependent and
explanatory variables:
yActualt = βConst + βxxActualt + et
But now suppose that as a consequence of measurement error, the actual value of the dependent
variable, yActualt, is not observable. You have no choice but to use the measured value, yMea-
suredt. Recall that the measured value equals the actual value plus the measurement error random
variable, vt:
yMeasuredt = yActualt + vt
where vt is a random variable with mean 0: Mean[vt] = 0. Solving for yActualt:
yActualt = yMeasuredt − vt
Substituting this into the equation describing the actual relationship and rearranging:
yMeasuredt = βConst + βxxActualt + et + vt = βConst + βxxActualt + εt, where εt = et + vt
εt represents the error term in the regression that you will actually be running. Will this result
in bias? To address this issue consider the following question:
Question: Are the explanatory variable, xActualt, and the error term, εt, correlated?
To answer the question, suppose that the measurement error term, vt, were to increase:
vt up
εt = et + vt
xActualt unaffected ↔ εt up
The value of the explanatory variable, xActualt, is unchanged while the error term, εt, increases.
Hence the explanatory variable and error term εt are independent; consequently no bias should
result.
We use a simulation to confirm our logic (figure 19.1). First we consider our base case, the no
measurement error case. The YMeas Err checkbox is cleared indicating that no dependent vari-
able measurement error is present. Consequently no bias should result (figure 19.1).
Figure 19.1
Dependent variable measurement error simulation
Be certain that the Pause checkbox is cleared and click Start. After many, many repetitions, click Stop. The ordinary least
squares (OLS) estimation procedure is unbiased in this case; the average of the estimated coef-
ficient values and the actual coefficient value both equal 2.0. When no measurement error is
present, all is well.
Now we will introduce dependent variable measurement error by checking the YMeas Err
checkbox. The YMeas Var list now appears with 20.0 selected; the variance of the measurement
error’s probability distribution, Var[vt], equals 20.0. Click Start and then after many, many
repetitions click Stop. Again, the average of the estimated coefficient values and the actual coef-
ficient value both equal 2.0. Next increase the measurement error variance from 20.0 to 50.0 and then to 80.0 in the YMeas Var list and
repeat the process.
The simulation confirms our logic (table 19.1). Even when dependent variable measurement
error is present, the average of the estimated coefficient values equals the actual coefficient value.
Dependent variable measurement error does not lead to bias.
What are the ramifications of dependent variable measurement error? The last column of table
19.1 reveals the answer. As measurement error variance increases, the variance of the estimated
coefficient values and hence the variance of the coefficient estimate’s probability distribution
increases. As the variance of the dependent variable measurement error term increases, we
introduce “more uncertainty” into the process and hence, the ordinary least squares (OLS) esti-
mates become less reliable.
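Both points, unbiasedness and reduced reliability, can be checked with a short Monte Carlo sketch of our own; the measurement error variances 0, 20, 50, and 80 echo the YMeas Var settings above, while the sample size of 10, the fixed x values, and the other variances are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(3)
N, REPS, ACTUAL_COEF = 10, 10_000, 2.0
x = np.linspace(1.0, 10.0, N)                        # explanatory variable, held fixed

for meas_var in (0.0, 20.0, 50.0, 80.0):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        e = rng.normal(0.0, np.sqrt(10.0), N)        # the model's error term
        y_actual = 5.0 + ACTUAL_COEF * x + e
        v = rng.normal(0.0, np.sqrt(meas_var), N)    # dependent variable measurement error
        y_measured = y_actual + v                    # only the measured value is observed
        slopes[rep] = np.polyfit(x, y_measured, 1)[0]
    print(f"Var[v] = {meas_var:4.0f}   mean of coef ests = {slopes.mean():.3f}   "
          f"variance of coef ests = {slopes.var():.3f}")

The mean of the estimates stays at roughly 2.0 for every setting, while the variance of the estimates grows with Var[v], matching the pattern described for table 19.1.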
Table 19.1
Dependent variable measurement error simulation results
Sample size = 10
19.3 The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable
Measurement Error
To investigate explanatory variable measurement error we again begin with the equation that
describes the actual relationship between the dependent and explanatory variables:
yt = βConst + βxActualxActualt + et
Next suppose that we cannot observe the actual value of the explanatory variable; we can only
observe the measured value. The measured value equals the actual value plus the measurement
error random variable, ut:
xMeasuredt = xActualt + ut
where ut is a random variable with mean 0: Mean[ut] = 0. Solving for xActualt:
xActualt = xMeasuredt − ut
Substituting this into the equation describing the actual relationship:
yt = βConst + βxActual(xMeasuredt − ut) + et = βConst + βxActualxMeasuredt + εt, where εt = et − βxActualut
Recall what we learned about correlation between the explanatory variable and the error term
(figure 19.2):
Are the explanatory variable, xMeasuredt, and the error term, εt, correlated? The answer to the
question depends on the sign of the actual coefficient. Consider the three possibilities:
• βxActual > 0: When the actual coefficient is positive, negative correlation exists; consequently,
the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased
downward. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual > 0
xMeasuredt up ↔ εt down
• βxActual < 0: When the actual coefficient is negative, positive correlation exists; consequently,
the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased
upward. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual < 0
xMeasuredt up ↔ εt up
Figure 19.2
Explanatory variable measurement error simulation
• βxActual = 0: When the actual coefficient equals 0, no correlation exists; consequently no bias
results. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual = 0
xMeasuredt up ↔ εt unaffected
No explanatory variable/error term correlation
↓
OLS unbiased
We will use a simulation to check our logic. This time we check the XMeas Err checkbox. The
XMeas Var list now appears with 20.0 selected; the variance of the measurement error’s probabil-
ity distribution, Var[ut], equals 20.0. Then we select various values for the actual coefficient. In
each case, click Start and then after many, many repetitions click Stop. The simulation results
are reported in table 19.2.
The simulation results confirm our logic. When the actual coefficient is positive and explanatory
variable measurement error is present, the ordinary least squares (OLS) estimation procedure for
the coefficient value is biased downward. When the actual coefficient is negative and explanatory
variable measurement error is present, upward bias results. Last, when the actual coefficient is
zero, no bias results even in the presence of explanatory variable measurement error.
Table 19.2
Explanatory variable measurement error simulation results
Sample size = 40
Figure 19.3
Effect of explanatory variable measurement error (bias toward 0 whether βxActual < 0 or βxActual > 0)
The simulations reveal an interesting pattern. While explanatory variable measurement error
leads to bias, the bias never appears to be strong enough to change the sign of the mean of the
coefficient estimates. In other words, explanatory variable measurement error biases the ordinary
least squares (OLS) estimation procedure for the coefficient value toward 0. This type of bias
is called attenuation or dilution bias (figure 19.3).
Why does explanatory variable measurement error cause attenuation bias? Even more basic,
why does explanatory variable measurement error cause bias at all? After all, the chances that
the measured value of the explanatory variable will be too high equal the chances it will be too
low. Why should this lead to bias? To appreciate why, suppose that the actual value of the coef-
ficient, βxActual, is positive. When the measured value of the explanatory variable, xMeasuredt,
rises it can do so for two reasons:
• the actual value of explanatory variable, xActualt, rises
or
• the value of the measurement error term, ut, rises.
xActualt up → xMeasuredt up and yActualt up
or
ut up → xMeasuredt up and yActualt unchanged
In the first case the dependent variable rises along with the measured explanatory variable; in the second case it does not. Taking into account both cases, we conclude that the estimation procedure will understate the
effect that the actual value of the explanatory variable has on the dependent variable. Overall,
the estimation procedure will understate the actual value of the coefficient.
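The attenuation story can be seen in another short Monte Carlo sketch of ours; the variances and the sample size of 40 merely mirror the settings mentioned above, and the rest is an arbitrary illustration.

import numpy as np

rng = np.random.default_rng(4)
N, REPS = 40, 10_000

for actual_coef in (2.0, 0.0, -2.0):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        x_actual = rng.normal(0.0, np.sqrt(20.0), N)
        u = rng.normal(0.0, np.sqrt(20.0), N)        # explanatory variable measurement error
        e = rng.normal(0.0, np.sqrt(10.0), N)        # the model's error term
        y = 5.0 + actual_coef * x_actual + e
        x_measured = x_actual + u                    # only the measured value is observed
        slopes[rep] = np.polyfit(x_measured, y, 1)[0]
    print(f"actual coef = {actual_coef:+.1f}   mean of coef ests = {slopes.mean():+.3f}")

With these (arbitrary) variances the mean of the estimates is pulled about halfway toward 0, yet it never crosses 0: positive coefficients stay positive, negative coefficients stay negative, and a zero coefficient remains unbiased.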
19.3.3 Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
We have already shown that when explanatory variable measurement error is present and the
actual coefficient is nonzero, the ordinary least squares (OLS) estimation procedure for the coef-
ficient value is biased. But perhaps it is consistent. Let us see by increasing the sample size
(table 19.3).
The bias does not lessen as the sample size is increased. Unfortunately, when explanatory
variable measurement error is present and the actual coefficient is nonzero, the ordinary least
squares (OLS) estimation procedure for the coefficient value provides only bad news:
• Bad news: The ordinary least squares (OLS) estimation procedure is biased.
• Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
Table 19.3
OLS estimation procedure, measurement error, and consistency
Original model: yt = βConst + βxxt + εt, where yt = dependent variable, xt = explanatory variable, εt = error term, t = 1, 2, . . ., T, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.
Figure 19.4
The "problem" explanatory variable
Recall that the instrumental variable estimation procedure addresses situations in which the
explanatory variable and the error term are correlated (figure 19.4).
When an explanatory variable, xt, is correlated with the error term, εt, we will refer to the
explanatory variable as the “problem” explanatory variable. The correlation of the explanatory
variable and the error term creates the bias problem for the ordinary least squares (OLS) estima-
tion procedure. The instrumental variable estimation procedure can mitigate, but not completely
remedy the problem. Let us briefly review the procedure and motivate it.
19.4.1 Mechanics
Instrumental variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
Let us now provide the intuition behind why a “good” instrument, zt, must satisfy the two condi-
tions: instrument/”problem” explanatory variable correlation and instrument/error term
independence.
The estimate, Estxt, will be a good surrogate only if it is a good predictor of the “problem”
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise in IV
Regression 2, the instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
Economists distinguish between permanent income and annual income. Loosely speaking, per-
manent income equals what a household earns per year “on average;” that is, permanent income
can be thought of as the “average” of annual income over an entire lifetime. In some years, the
household’s annual income is more than its permanent income, but in other years, it is less. The
difference between the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where IncTranst = transitory income, IncAnnt = annual income, and IncPermt = permanent income,
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what the household earns “on average,” the mean of transitory
income equals 0.
Microeconomic theory teaches that households base their consumption decisions on their
“permanent” income. We are going to apply the permanent income consumption theory to health
insurance coverage:
Theory: Additional permanent per capita disposable income within a state increases health
insurance coverage within the state.
Project: Assess the effect of permanent income on health insurance coverage.
Coveredt = βConst + βIncPermIncPermPCt + et
where
Coveredt = percent of adults (25 and older) covered by health insurance in state t
IncPermPCt = per capita permanent disposable income in state t (thousands of dollars)
When we attempt to gather data to assess this theory, we immediately encounter a difficulty.
Permanent income cannot be observed. Only annual income data are available to assess the
theory.
Health insurance data: Cross-sectional data of health insurance coverage, education, and
income statistics from the 50 states and the District of Columbia in 2007.
Coveredt Adults (25 and older) covered by health insurance in state t (percent)
IncAnnPCt Per capita annual disposable income in state t (thousands of dollars)
HSt Adults (25 and older) who completed high school in state t (percent)
Collt Adults (25 and older) who completed a four year college in state t (percent)
AdvDegt Adults (25 and older) who have an advanced degree in state t (percent)
While we would like to specify permanent income as the explanatory variable, permanent income
is unobservable. We have no choice. We must use annual disposable income as the explanatory
variable. We use the ordinary least squares (OLS) estimation procedure to estimate the parameters
(table 19.4).
Table 19.4
Health insurance OLS regression results
Now construct the null and alternative hypotheses:
H0: βIncPerm = 0 Additional permanent income has no effect on health insurance coverage
H1: βIncPerm > 0 Additional permanent income increases health insurance coverage
Since the null hypothesis is based on the premise that the actual value of the coefficient equals
0, we can calculate the Prob[Results IF H0 true] using the tails probability reported in the regres-
sion printout:
Prob[Results IF H0 true] = 0.0352/2 = 0.0176
19.5.2 Might the Ordinary Least Squares (OLS) Estimation Procedure Suffer from a Serious
Econometric Problem?
Might this regression suffer from a serious econometric problem, however? Yes. Annual income
equals permanent income plus transitory income; transitory income can be viewed as measure-
ment error. Sometimes transitory income is positive, sometimes it is negative, on average it
is 0:
IncPermPCt = IncAnnPCt − ut
As a consequence of explanatory variable measurement error the ordinary least squares (OLS)
estimation procedure for the coefficient will be biased downward. To understand why we begin
with our model and then do a little algebra:
Coveredt = βConst + βIncPermIncPermPCt + et = βConst + βIncPerm(IncAnnPCt − ut) + et
= βConst + βIncPermIncAnnPCt + εt, where εt = et − βIncPermut
where βIncPerm > 0. Theory suggests that βIncPerm is positive; consequently we expect the new error
term, εt, and the explanatory variable, IncAnnPCt, to be negatively correlated.
ut up
IncAnnPCt = IncPermPCt + ut εt = et − βIncPermut βIncPerm > 0
IncAnnPCt up ↔ εt down
IncAnnPCt is the “problem” explanatory variable because it is correlated with the error term, εt.
The ordinary least squares (OLS) estimation procedure for the coefficient value is biased toward
0. We will now show how we can use the instrumental variable (IV) estimation procedure to
mitigate the problem.
Choose an instrument: In this example we use percent of adults who completed high school,
HSt, as our instrument. In doing so, we believe that it satisfies the two “good” instrument condi-
tions. We believe that high school education, HSt,
• is positively correlated with the “problem” explanatory variable, IncAnnPCt
and
• is uncorrelated with the error term, εt.
We can motivate IV Regression 1 by devising a theory to explain permanent income. Our theory
is very straightforward: state per capita permanent income depends on the percent of state residents
who are high school graduates:
IncPermPCt = αConst + αHSHSt + et
where
HSt = percent of adults (25 and over) who completed high school in state t
Theory: As a state has a greater percent of high school graduates, its per capita permanent income
increases; hence αHS > 0.
Table 19.5
Health insurance IV Regression 1 results
But, again, we note that permanent income is not observable, only annual income is. Conse-
quently we have no choice but to use annual per capita income as the dependent variable
(table 19.5).
What are the ramifications of using annual per capita income as the dependent variable? We can
view annual per capita income as permanent per capita income with measurement error. What
do we know about dependent variable measurement error? Dependent variable measurement error does not lead to
bias; only explanatory variable measurement error creates bias. Since annual income is the
dependent variable in IV Regression 1, the ordinary least squares (OLS) estimation procedure
for the regression parameters will not be biased.
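For readers who want to organize the two regressions with standard software, here is one possible Python sketch. The file name health_insurance_2007.csv and its column names are hypothetical stand-ins for the health insurance data described above; pandas and statsmodels are used only as generic data-handling and OLS tools, and this is not the text's own workflow.

import pandas as pd
import statsmodels.api as sm

# hypothetical file holding the 2007 state-level data described in the text
data = pd.read_csv("health_insurance_2007.csv")       # columns: Covered, IncAnnPC, HS, ...

# IV Regression 1: the "problem" explanatory variable IncAnnPC regressed on the instrument HS
reg1 = sm.OLS(data["IncAnnPC"], sm.add_constant(data["HS"])).fit()
data["EstIncAnnPC"] = reg1.fittedvalues                # surrogate for per capita income

# IV Regression 2: the dependent variable Covered regressed on the surrogate EstIncAnnPC
reg2 = sm.OLS(data["Covered"], sm.add_constant(data["EstIncAnnPC"])).fit()
print(reg1.params)
print(reg2.params)

One caveat: dedicated two-stage least squares routines also correct the standard errors in the second regression, which this two-regression shortcut does not, so the coefficient estimates are the quantities to compare with tables 19.5 and 19.6.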
Table 19.6
Health insurance IV Regression 2 results
Table 19.7
Comparison of OLS and IV Regression results
19.6.2 Comparison of the Ordinary Least Squares (OLS) and the Instrumental
Variables (IV) Approaches
Now review the two approaches that we used to estimate the effect of permanent income on
health insurance coverage: the ordinary least squares (OLS) estimation procedure and the instru-
mental variable (IV) estimation procedure.
• First, we used annual disposable income as the explanatory variable and applied the ordinary
least squares (OLS) estimation procedure. We estimated that a $1,000 increase in per capita
disposable income increases health insurance coverage by 0.23 percentage points. But we believe
that an explanatory variable measurement error problem is present here.
• Second, we used an instrumental variable (IV) approach, which resulted in a higher estimate
for the impact of permanent income. We estimated that a $1,000 increase in per capita disposable
income increases health insurance coverage by 1.39 percentage points.
These results are consistent with the notion that the ordinary least squares (OLS) estimation
procedure for the coefficient value is biased downward whenever explanatory variable measure-
ment error is present (table 19.7).
We are using the instrument to create a surrogate for the "problem" explanatory variable in IV
Regression 1:
EstIncAnnPCt = aConst + aHSHSt
The estimate, EstIncAnnPCt, will be a “good” surrogate only if the instrument, HSt, is correlated
with the “problem” explanatory variable, IncAnnPCt; that is, only if the estimate is a good pre-
dictor of the “problem” explanatory variable.
The sign of the HSt coefficient is positive, supporting our view that annual income and high
school education are positively correlated. Furthermore, the coefficient is significant at the 5
percent level and nearly significant at the 1 percent level. So it is reasonable to judge that the
instrument meets the first condition.
Next focus on the second “good” instrument condition:
Instrument/error term independence: The instrument, HS, and the error term, εt, must be inde-
pendent. Otherwise, the explanatory variable/error term independence premise would be violated
in IV Regression 2.
The explanatory variable/error term independence premise will be satisfied only if the instru-
ment, HSt, and the new error term, εt, are independent. If they are correlated, then we have gone
“from the frying pan into the fire.” It was the violation of this premise that created the problem
in the first place. There is no obvious reason to believe that they are correlated. Unfortunately,
there is no way to confirm this empirically, however. This can be the “Achilles heel” of the
instrumental variable (IV) estimation procedure. Finding a good instrument can be very tricky.
Claim: While the instrumental variable (IV) estimation procedure for the coefficient value in
the presence of measurement error is biased, it is consistent.
Econometrics Lab 19.4: Consistency and the Instrumental Variable (IV) Estimation Procedure
While this claim can be justified rigorously, we will avoid the mathematics by using a
simulation.
Focus your attention on figure 19.5. Since we wish to investigate the properties of the instru-
mental variable (IV) estimation procedure, IV is selected in the estimation procedure box. Next
note the XMeas Var list. Explanatory variable measurement error is present.
Figure 19.5
Instrumental variable measurement error simulation
Table 19.8
Measurement error, IV estimation procedure, and consistency
By default, the
variance of the probability distribution for the measurement error term, Var[ut], equals 20.0. In
the Corr X&Z list .50 is selected; the correlation coefficient between the explanatory variable
and the instrument is .50.
Initially, the sample size is 40. Click Start and then after many, many repetitions click Stop.
The average of the estimated coefficient values equals 2.24. Next increase the sample size from
40 to 50 and repeat the process. Do the same for a sample size of 60. As table 19.8 reports, the
average of the estimated coefficient values never equals the actual value; consequently the
instrumental variable (IV) estimation procedure for the coefficient value is biased. But also note
that the magnitude of the bias decreases as the sample size increases. Also the variance of the
estimates declines as the sample size increases.
Table 19.8 suggests that when explanatory variable measurement error is present, the instru-
mental variable (IV) estimation procedure for the coefficient value provides both good news and
bad news:
• Bad news: The instrumental variable (IV) estimation procedure for the coefficient value is
still biased; the average of the estimated coefficient values does not equal the actual value.
• Good news: The instrumental variable (IV) estimation procedure for the coefficient value is
consistent.
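A Monte Carlo sketch of our own, parallel to the one above but with explanatory variable measurement error built in, gives the same flavor as table 19.8; the sample sizes 40, 50, and 60 follow the text, while the variances and correlation strengths are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(5)
ACTUAL_COEF, REPS = 2.0, 10_000

def iv_estimate(y, x, z):
    # IV Regression 1: regress the measured x on the instrument z; IV Regression 2: y on Estx
    slope1, const1 = np.polyfit(z, x, 1)
    slope2, _ = np.polyfit(const1 + slope1 * z, y, 1)
    return slope2

for n in (40, 50, 60):
    estimates = np.empty(REPS)
    for rep in range(REPS):
        z = rng.normal(size=n)                         # instrument
        x_actual = z + rng.normal(size=n)              # actual explanatory variable, correlated with z
        u = rng.normal(0.0, np.sqrt(2.0), n)           # measurement error, independent of z
        e = rng.normal(size=n)                         # error term, independent of z
        y = 1.0 + ACTUAL_COEF * x_actual + e
        x_measured = x_actual + u                      # the "problem" explanatory variable
        estimates[rep] = iv_estimate(y, x_measured, z)
    print(f"sample size = {n:3d}   mean of coef ests = {estimates.mean():.3f}   "
          f"variance of coef ests = {estimates.var():.4f}")

The mean of the estimates does not equal 2.0 at any of these sample sizes, but it moves toward 2.0, and the variance shrinks, as the sample size grows: biased but consistent, just as the text claims.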
Chapter 19 Exercises
where
Suppose that we wish to publish the results of our analysis along with the data. As a consequence
of privacy concerns, we wish to prevent “outsiders” from connecting an individual student’s
exam, problem set, and SAT scores.
1. Randomize the student SATScores. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s SATScores.
• If the coin lands tails, subtract 10 points from that student’s SATScores.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
2. Randomize the student ProbScores. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s ProbScores.
• If the coin lands tails, subtract 10 points from that student’s ProbScores.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
3. Randomize the student ExamScore. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s ExamScore.
• If the coin lands tails, subtract 10 points from that student’s ExamScore.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
Health insurance data: Cross-sectional data of health insurance coverage, education, and
income statistics from the 50 states and the District of Columbia in 2007.
Coveredt Percent of adults (25 and older) with health insurance in state t
IncAnnPCt Per capita annual disposable income in state t (thousands of dollars)
HSt Percent of adults (25 and older) who completed high school in state t
Collt Percent of adults (25 and older) who completed a four year college in state t
AdvDegt Percent of adults (25 and older) who have an advanced degree in state t
4. Consider the same theory and model that we used in the chapter:
Theory: Additional permanent per capita disposable income within a state increases health
insurance coverage within the state.
where
Repeat part of what was done in this chapter: use the ordinary least squares (OLS) estimation
procedure to estimate the value of the IncPermPC coefficient. You have no choice but to use
IncAnnPCt as the explanatory variable since IncPermPCt is not observable. Does the sign of the
coefficient lend support for your theory?
Judicial data: Cross-sectional data of judicial and economic statistics for the fifty states in 2000.
JudExpt State and local expenditures for the judicial system per 100,000 persons in state t
CrimesAllt Crimes per 100,000 persons in state t
GdpPCt Real per capita GDP in state t (2000 dollars)
Popt Population in state t (persons)
UnemRatet Unemployment rate in state t (percent)
Statet Name of state t
Yeart Year
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of each
coefficient using the Judicial Data. Interpret the coefficient estimates. What are the critical
results?
8. Many believe that measurement error is present in crime rate statistics. Use the instrumental
variable (IV) estimation procedure to account for the measurement error.
Chapter 20 Outline
20.2 The Ordinary Least Squares Estimation Procedure, Omitted Explanatory Variable Bias,
and Consistency
1. Consider a model with two explanatory variables, x1t and x2t:
yt = βConst + βx1x1t + βx2x2t + et
and a second model in which x2t is omitted:
yt = βConst + βx1x1t + εt
a. Express the second model's error term, εt, as a function of the first model's terms.
Assume that
• the coefficient βx2 is positive
and
• the explanatory variables, x1t and x2t, are positively correlated.
b. Will the explanatory variable, x1t, and the second model’s error term, εt, be correlated? If
so, how?
c. Focus on the second model. Suppose that the ordinary least squares (OLS) estimation
procedure were used to estimate the parameters of the second model. Would the ordinary
least squares (OLS) estimation procedure for the value of βx1 be biased? If so,
how?
2. Consider the following model explaining the vote received by the Democratic Party in the
2008 presidential election:
The variable Liberalt reflects the “liberalness” of the electorate in state t. On the one hand, if
the electorate is by nature liberal in state t, Liberalt would be high; on the other hand, if the
electorate is conservative, Liberalt would be low.
a. Express the second model's error term, εt, as a function of the first model's terms.
Assume that
• the coefficient βLib is positive
and
• the explanatory variables PopDent and Liberalt are positively correlated.
b. Will the explanatory variable PopDent and the second model’s error term, εt, be correlated?
If so, how?
c. Focus on the second model. Suppose that the ordinary least squares (OLS) estimation
procedure were used to estimate the parameters of the second model. Would the ordinary
least squares (OLS) estimation procedure for the value of βLib be biased? If so,
how?
3. What does the correlation coefficient of PopDent and Collt equal?
We will briefly review our previous discussion of omitted explanatory variables that appears in
chapter 14. Then we will show that the omitted explanatory variable phenomenon can also be ana-
lyzed in terms of explanatory variable/error term correlation.
In chapter 14 we argued that omitting an explanatory variable from a regression will bias the
ordinary least squares (OLS) estimation procedure for the coefficient value whenever two condi-
tions are met. Bias results if the omitted variable
• influences the dependent variable;
• is correlated with an included variable.
When these two conditions are met, the ordinary least squares (OLS) procedure to estimate the
coefficient of the included explanatory variable captures two effects:
• Direct effect: The effect that the included explanatory variable actually has on the dependent
variable.
• Proxy effect: The effect that the omitted explanatory variable has on the dependent variable
because the included variable is acting as a proxy for the omitted variable.
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect that each explanatory variable has on the dependent variable.
Consequently we want the coefficient estimate of the included variable to capture only the direct
effect and not the proxy effect. Unfortunately, the ordinary least squares (OLS) estimation pro-
cedure fails to do this when the omitted variable influences the dependent variable and when it
is also correlated with an included variable.
To illustrate this, we considered a model with two explanatory variables, x1 and x2:
yt = βConst + βx1x1t + βx2x2t + et
What happens when we omit the explanatory variable x2t from the regression?
The two conditions necessary for the omitted variable bias are satisfied:
• Since βx2 is positive, the omitted variable influences the dependent variable.
• Since x1t and x2t are positively correlated, the omitted variable is correlated with an included
variable.
An increase in x1t directly affects yt, causing yt to increase; this is the direct effect we want to
capture. But the story does not end here when x2t is omitted. Since the two explanatory variables
are positively correlated, an increase in x1t is typically accompanied by an increase in x2t, which
in turn leads to an additional increase in yt:
Included variable: x1t up → (βx1 > 0) → yt up → Direct effect
Positive correlation: typically, x2t up as well
Omitted variable: x2t up → (βx2 > 0) → yt up → Proxy effect
When the explanatory variable x2t is omitted from a regression, the ordinary least squares (OLS)
estimation procedure for the value of x1t’s coefficient, βx1, is biased upward because it reflects
not only the impact of x1t itself (direct effect) but also the impact of x2t (proxy effect).
20.1.2 Omitted Explanatory Variable Bias and the Explanatory Variable/Error Term
Independence Premise
We can also use what we learned in chapter 18 about correlation between the explanatory vari-
able and error term to explain why bias occurs. When we omit the explanatory variable x2t from
the regression, the error term of the new equation, εt, includes not only the original error term,
et, but also the "omitted variable term," βx2x2t:
yt = βConst + βx1x1t + βx2x2t + et = βConst + βx1x1t + εt, where εt = βx2x2t + et
The new error term, εt, includes the “omitted variable term,” βx2x2t. Therefore the included
explanatory variable, x1t, and the new error term, εt, are positively correlated:
• Since x1t and x2t are positively correlated, when x1t increases, x2t typically increases also.
• Since βx2 is positive, when x2t increases, the "omitted variable term," βx2x2t, and hence the new
error term, εt, will typically increase also. The sketch below illustrates the resulting upward bias.
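A compact Monte Carlo sketch of our own makes the upward bias concrete. The coefficient values 2.0 and 4.0 and the correlation coefficient of 0.60 echo the lab defaults described below; treating x1 and x2 as having unit variances is an extra simplifying assumption of the sketch.

import numpy as np

rng = np.random.default_rng(6)
COEF1, COEF2, RHO, REPS = 2.0, 4.0, 0.60, 10_000

for n in (50, 100, 150):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        x1 = rng.normal(size=n)
        # x2 is positively correlated with x1 (correlation coefficient RHO)
        x2 = RHO * x1 + np.sqrt(1.0 - RHO**2) * rng.normal(size=n)
        e = rng.normal(size=n)
        y = 1.0 + COEF1 * x1 + COEF2 * x2 + e
        slopes[rep] = np.polyfit(x1, y, 1)[0]          # x2 omitted from the regression
    print(f"sample size = {n:3d}   mean of coef1 ests = {slopes.mean():.2f}")

With unit variances the proxy effect adds roughly COEF2 times RHO, about 2.4, to the direct effect of 2.0, so the mean of the estimates hovers around 4.4 at every sample size, in line with the value reported by the lab simulation below; the bias neither disappears nor shrinks as the sample grows.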
20.2 The Ordinary Least Squares Estimation Procedure, Omitted Explanatory Variable Bias,
and Consistency
When an omitted explanatory variable causes the ordinary least squares (OLS) estimation pro-
cedure to be biased, might the procedure still be consistent? We will use a simulation to address
this question (figure 20.1).
Econometrics Lab 20.1: Ordinary Least Squares, Omitted Variables, and Consistency
By default, the actual value of x1t coefficient, Coef1, equals 2.0 and the actual value of the x2t
coefficient, Coef2, equals 4.0. The correlation coefficient for the explanatory variables x1t and
x2t equals 0.60; the explanatory variables are positively correlated. Furthermore the Only X1
option is selected; the explanatory variable x2t is omitted. The included explanatory variable,
x1t, will be positively correlated with the error term. x1t becomes a “problem” explanatory
variable.
Initially, the sample size equals 50. Click Start and then after many, many repetitions click
Stop. The mean of the coefficient estimates for the explanatory variable x1 equals 4.4. Our logic
is confirmed; upward bias results. Nevertheless, to determine if the ordinary least squares (OLS)
estimation procedure might be consistent, we increase the sample size from 50 to 100 and once
more from 100 to 150. As table 20.1 reports, the mean of the coefficient estimates remains
at 4.4.
Unfortunately, the ordinary least squares (OLS) estimation procedure proves to be not only
biased but also not consistent whenever an explanatory variable is omitted that
• affects the dependent variable
and
• is correlated with an included variable.
Figure 20.1
OLS omitted variable simulation
Table 20.1
Ordinary least squares: Bias and consistency
Original model: yt = βConst + βxxt + εt, where yt = dependent variable, xt = explanatory variable, εt = error term, t = 1, 2, . . ., T, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.
Figure 20.2
"Problem" explanatory variable
The instrumental variable (IV) estimation procedure can deal with situations when the explana-
tory variable and the error term are correlated (figure 20.2). When an explanatory variable, xt,
is correlated with the error term, εt, we refer to the explanatory variable as the “problem”
explanatory variable. The correlation of the explanatory variable and the error term creates the
bias problem for the ordinary least squares (OLS) estimation procedure. The instrumental vari-
able estimation procedure can mitigate, but not completely remedy these cases. Let us briefly
review the procedure and motivate it.
20.3.1 Mechanics
Choose a “good” instrument: A “good” instrument, zt, must have two properties:
• Correlated with the “problem” explanatory variable, xt, and
• Uncorrelated with the error term, εt.
Instrumental variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
Let us again provide the intuition behind why a “good” instrument, zt, must satisfy the two
conditions:
Instrument/"problem" explanatory variable correlation: The estimate, Estxt, will be a good surrogate only if it is a good predictor of the "problem"
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
Instrument/error term independence: The instrument, zt, must be independent of the error term,
εt. Focus on IV Regression 2. We begin with the original model and then replace the “problem”
explanatory variable, xt, with its surrogate, Estxt:
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise the
instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
2008 presidential election data: Cross-sectional data of election, population, and economic
statistics from the 50 states and the District of Columbia in 2008.
The variable Liberalt reflects the “liberalness” of the electorate in state t. On the one hand, if
the electorate is by nature liberal in state t, Liberalt would be high; on the other hand, if the
electorate is conservative, Liberalt would be low. The theories described below suggest that the
coefficients of both Liberalt and PopDent would be positive:
• Population density theory: States with high population densities have large urban areas
which are more likely to vote for the Democratic candidate, Obama; hence βPopDen > 0.
• “Liberalness” theory: Since the Democratic party is more liberal than the Republican party,
a high “liberalness” value would increase the vote of the Democratic candidate, Obama; hence
βLib > 0.
Unfortunately, we do not have any data to quantify the "liberalness" of a state; accordingly, Liberal
must be omitted from the regression (table 20.2).
Question: Might the ordinary least squares (OLS) estimation procedure suffer from a serious
econometric problem?
Since the “liberalness” variable is unobservable and must be omitted from the regression, the
explanatory variable/error term premise would be violated if the included variable, PopDent, is
correlated with the new error term, εt.
Table 20.2
Democratic vote OLS regression results
Figure 20.3
PopDen—A "problem" explanatory variable
where εt = βLibLiberalt + et. We have good reason to believe that they will be correlated because
we would expect PopDent and Liberalt to be correlated. States that tend to elect liberal repre-
sentatives and senators tend to have high population densities. That is, we suspect that PopDent
and Liberalt are positively correlated (figure 20.3). Consequently the included explanatory vari-
able, PopDent, and the error term, εt, will be positively correlated. The ordinary least squares
(OLS) estimation procedure for the value of the coefficient will be biased upward.
To summarize, when the explanatory variable Liberal is omitted, as it must be, PopDen
becomes a "problem" explanatory variable because it is correlated with the error term, εt. Now
we will apply the instrumental variable (IV) estimation procedure to understand how it can
address the omitted variable problem.
Choose an instrument: In this example we will use the percent of college graduates, Collt, as
our instrument. In doing so, we believe that it satisfies the two “good” instrument conditions;
that is, we believe that the percentage of college graduates, Collt, is
• positively correlated with the “problem” explanatory variable, PopDent
and
• uncorrelated with the error term, εt = βLibLiberalt + et. Consequently we believe that the instru-
ment, Collt, is uncorrelated with the omitted variable, Liberalt.
Table 20.3
Democratic vote IV Regression 1 results
Table 20.4
Democratic vote IV Regression 2 results
Table 20.5
Correlation matrix: Coll and PopDen
           Coll     PopDen
Coll       1.00     0.60
PopDen     0.60     1.00
The estimate, EstPopDent, will be a “good” surrogate only if the instrument, Collt, is correlated
with the “problem” explanatory variable, PopDent; that is, only if the estimate is a good predictor
of the “problem” explanatory variable. Table 20.5 reports that the correlation coefficient for Collt
and PopDent equals 0.60. Furthermore the IV Regression 1 results appearing in table 20.3 suggest
that the instrument, Collt, will be a good predictor of the “problem” explanatory variable,
PopDent. Clearly, the coefficient estimate is significant at the 1 percent level. So it is reasonable
to judge that the instrument meets the first condition.
Instrument/error term independence: The instrument, Collt, and the error term, εt, must be
independent. Otherwise, the explanatory variable/error term independence premise would be
violated in IV Regression 2.
The explanatory variable/error term independence premise will be satisfied only if the surrogate,
EstPopDent, and the error term, εt, are independent. EstPopDent is a linear function of Collt and
εt is a linear function of PopDent:
• EstPopDent = −560.9 + 28.13Collt
and
• εt = βLibLiberalt + et
Hence the explanatory variable/error term independence premise will be satisfied only if the
instrument, Collt, and the omitted variable, Liberalt, are independent. If they are correlated, then
we have gone “from the frying pan into the fire.” It was the violation of this premise that created
the problem in the first place. Unless we were to believe that liberals are better educated than
conservatives, or vice versa, it is not unreasonable to believe that education and political lean-
ings are independent. Many liberals are highly educated and many conservatives are highly
educated. Unfortunately, there is no way to confirm this empirically. This can be the “Achilles
heel” of the instrumental variable (IV) estimation procedure. When we choose an instrument, it
must be uncorrelated with the omitted variable. Since there is no way to assess this empirically,
we are implicitly assuming that the second good instrument condition is satisfied when we use
the instrumental variables estimation procedure to address the omitted explanatory variables
problem.
We claim that while the instrumental variable (IV) estimation procedure for the coefficient value
is still biased when an omitted explanatory variable problem exists, it will be consistent when
we use a “good” instrument.
Figure 20.4
IV omitted variable simulation
While this claim can be justified rigorously, we will avoid the complicated mathematics by using
a simulation (figure 20.4).
The simulation is based on the following model:
yt = βConst + βx1x1t + βx2x2t + et
The Only X1 button is selected; hence, only the first explanatory variable will be included in
the analysis. The model becomes
yt = βConst + βx1x1t + εt, where εt = βx2x2t + et
The Corr X1&Z list specifies the correlation coefficient of the included explanatory variable,
x1, and the instrument, z. This correlation indicates how good a surrogate the instrument will
be. An increase in correlation means that the instrument should become a better surrogate. By
default, this correlation coefficient equals .50. The Corr X2&Z list specifies the correlation coef-
ficient of the omitted explanatory variable, x2, and the instrument, z. Recall how the omitted
variable, x2, and the error term, εt, are related:
yt = βConst + βx1x1t + εt
where εt = βx2 x2t + et. By default, .00 is selected from the Corr X2&Z list. Hence the instrument,
z, and the error term, εt, are independent. The second condition required for a “good” instrument
is also satisfied. Initially, the sample size equals 50. Then we increase from 50 to 100 and sub-
sequently from 100 to 150. Table 20.6 reports results from this simulation:
Both bad news and good news emerge:
• Bad news: The instrumental variable estimation is biased. The mean of the estimates for the
coefficient of the first explanatory variable, x1, does not equal the actual value we specified, 2.
• Good news: As we increase the sample size, the mean of the coefficient estimates gets closer
to the actual value and the variance of the coefficient estimates becomes smaller. This illustrates
the fact that the instrumental variable (IV) estimation procedure is consistent.
Table 20.6
IV estimation procedure—Biased but consistent
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 ests; magnitude of bias; variance of coef1 ests)
Table 20.7
IV estimation procedure—Improved instrument
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 ests; magnitude of bias; variance of coef1 ests)
Next let us see what happens when we improve the instrument by making it more correlated
with the included “problem” explanatory variable. We do this by increasing the correlation coef-
ficient of the included explanatory variable, x1, and the instrument, z, from 0.50 to 0.75 when
the sample size equals 150 (table 20.7). The magnitude of the bias decreases and the variance
of the coefficient estimates also decreases. We now have a better instrument.
Last, let us use the lab to illustrate the important role that the independence of the error term, ε_t, and the instrument, z, plays:
ε_t = β_x2 x_2t + e_t
By selecting 0.10 from the Corr X2&Z list, the error term, ε_t, and the instrument, z, are no longer independent (table 20.8). As we increase the sample size from 50 to 100 to 150, the magnitude of the bias does not decrease. The instrumental variable (IV) estimation procedure is no longer consistent. This illustrates the “Achilles heel” of the instrumental variable (IV) estimation procedure.
Table 20.8
IV estimation procedure—Instrument correlated with omitted variable
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 estimates; magnitude of bias; variance of coef1 estimates)
Chapter 20 Review Questions
1. In general, what are the two conditions that a “good” instrument must meet?
2. More specifically, when an omitted variable issue arises, a “good” instrument must satisfy
two conditions.
a. The instrument and the included “problem” explanatory variable:
i. How must the instrument be related to the included “problem” explanatory variable?
Explain.
ii. Can we determine whether this condition is met? If so, how?
b. The instrument and the omitted variable:
i. How must the instrument be related to the omitted explanatory variable? Explain.
ii. Can we determine whether this condition is met? If so, how?
Chapter 20 Exercises
Consider the following model explaining the vote received by the Democratic Party in the
2008 presidential election:
where
where
6.
a. First focus on the growth rate of real GDP.
i. How do you believe a state’s GDP growth rate affected the Democrat vote in the 2008
presidential election?
ii. What does this suggest about the sign of βGdpGth?
Chapter 21 Outline
OLS bias question: Is the explanatory variable/error term independence premise satisfied or violated?
Satisfied: Explanatory variable and error term independent
Violated: Explanatory variable and error term correlated
Is the OLS estimation procedure for the value of the coefficient unbiased or biased? __________ __________
2. Suppose that there are three college students enrolled in a small math class: Jim, Peg, and
Tim. A quiz is given weekly. Each student’s quiz score for the first ten weeks of the semester
is reported below along with the number of minutes the student studied and his/her math SAT
score from high school.
Panel data (also called longitudinal data) combines time series and cross-sectional information.
A time series refers to data for a single entity in different time periods. A cross section refers
to data for multiple entities in a single time period. In this example, data from the ten weeks
represent the time series; that is, the ten weeks provide data from ten different time periods. The
data from the three students represent the cross section; that is, the three students provide data
for three different entities, Jim, Peg, and Tim.
Our assignment is to assess the effect of studying on the students’ math quiz scores:
Assignment: Assess the effect of studying on quiz scores.
where
Let us now take a closer look at the data. The math SAT scores are from high school. Jim’s SAT
score in high school equaled a constant 720. Similarly Peg’s is a constant 760 and Tim’s is a
constant 670:
MathSat_t^Jim = 720 for t = 1, 2, . . . , 10
MathSat_t^Peg = 760 for t = 1, 2, . . . , 10
MathSat_t^Tim = 670 for t = 1, 2, . . . , 10
This allows us to simplify the notation. Since the MathSat variable only depends on the student
and does not depend on the week, we can drop the time subscript t for the MathSat variable,
but of course we must retain the individual student superscript i to denote the student:
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
Privacy concerns did not permit the college to release student SAT data. Consequently, you
have no choice but to omit MathSat from your regression.
b. Do high school students who receive high SAT math scores tend to study more or less
than those students who receive low scores?
This chapter does not introduce any new concepts. It instead applies the concepts that we already
learned to a new situation. We begin by reviewing the concepts we will be using. First recall the
standard ordinary least squares (OLS) premises:
• Error term equal variance premise: The variance of the error term’s probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e_1] = Var[e_2] = . . . = Var[e_T] = Var[e]
• Error term/error term independence premise: Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.
• Explanatory variable/error term independence premise: Knowing the value of an observation’s explanatory variable does not help us predict the value of that observation’s error term.
The ordinary least squares (OLS) estimation procedure is the economist’s most widely used estimation procedure (figure 21.1). When contemplating the use of this procedure, we should keep two questions in mind:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
• If independent, the ordinary least squares (OLS) estimation procedure for the coefficient value
will be unbiased, and we should pose the second reliability question.
• If correlated, the ordinary least squares (OLS) estimation procedure for the coefficient value
will be biased, in which case we should consider an alternative procedure in an effort to calculate
better estimates.
Figure 21.1
Ordinary least squares (OLS) bias summary
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms inde-
pendent from each other?
• If satisfied, the ordinary least squares (OLS) estimation procedure calculation of the coeffi-
cient’s standard error, t-statistic, and tails probability will be “sound” and the ordinary least
squares (OLS) estimation procedure is the best linear unbiased estimation procedure (BLUE).
In some sense, we cannot find a better linear estimation procedure and hence we should be
pleased.
• If violated, the ordinary least squares (OLS) estimation procedure calculation of the coeffi-
cient’s standard error, t-statistic, and tails probability will be flawed and the ordinary least
squares (OLS) estimation procedure is not the best linear unbiased estimation procedure (BLUE).
In this case, we can use a generalized least squares (GLS) estimation procedure, tweaking the original model in a way that eliminates the problem, or we can calculate robust standard errors.
In this chapter we apply what we have learned to panel data (also called longitudinal data), situations in which we have time series data for a number of cross sections. We use three artificially
generated examples to show how the use of panel data techniques can mitigate some of the dif-
ficulties encountered when using the ordinary least squares (OLS) estimation procedure. These
examples are designed to illustrate the issues clearly. All three examples involve the score stu-
dents receive in their college classes:
• Math class panel data: Three students comprise the entire enrollment of a math class. Each
week a quiz is given. The quiz score earned by each student is collected from the first ten weeks
of the course. Also the number of minutes each student studied for each week’s quiz and each student’s math SAT score from high school are available.
• Chemistry class panel data: Two students are enrolled in an advanced undergraduate chem-
istry course. Each week a lab report is due. The score earned by each student is collected from
the first ten labs along with the number of minutes each student devoted to the lab. Each week
a different graduate student grades both lab reports submitted by the two students.
• Studio art class panel data: Three students are randomly selected from a heavily enrolled
studio art class. Each week each student submits an art project. The score earned by each student
is collected from the first ten weeks of the course along with the number of minutes each student
devoted to the project.
We have data describing the performance of a number of students over a number of weeks. This
is what we mean by panel data. Cross-sectional and time series information are combined. In
our cases the students comprise the cross sections and the weeks the time series. As we will
learn, the existence of panel data can sometimes allow us to account for omitted variables.
Suppose that there are three college students enrolled in a small math class: Jim, Peg, and Tim.
A quiz is given weekly. Each student’s quiz score for the first ten weeks of the semester is
reported below along with the number of minutes the student studied and his/her math SAT score
from high school.
Math quiz score data: Artificially constructed panel data for 3 students during the first 10 weeks
of a math class (table 21.1):
Table 21.1
Math quiz panel data
Panel data combines time series and cross-sectional information. A time series refers to data
from a single entity in different time periods. A cross section refers to data for multiple entities
in a single time period. In this example, data from the ten weeks represent the time series; that
is, the ten weeks provide data from ten different time periods. The data from the three students
represent the cross section; that is, the three students provide data for three different entities.
Our assignment is to assess the effect of studying on the students’ math quizzes:
Project: Assess the effect of studying on math quiz scores.
1 Traditionally both the cross section and time are identified as subscripts. To reduce the possibility of confusion,
however, we use a superscript to identify the cross section so that there is only a single subscript, the subscript identify-
ing the time period.
• The subscript t denotes time period, the week; that is, t equals 1, 2, . . . , or 10.
• The superscript i denotes the cross section, the individual student; that is, i equals Jim, Peg,
or Tim.
Let us now take a closer look at the data. The math SAT scores are from high school. Jim’s SAT score from high school equaled a constant 720. Similarly Peg’s is a constant 760 and Tim’s is a constant 670:
MathSat_t^Jim = 720, MathSat_t^Peg = 760, and MathSat_t^Tim = 670 for t = 1, 2, . . . , 10
This allows us to simplify the notation. Since the SAT data for our three college students are
from high school, the MathSat variable only depends on the student and does not depend on the
week. We can drop the time subscript t for the MathSat variable, but we must, of course, retain
the individual student superscript i to denote the student:
Theory: The theory concerning how math SAT scores and studying affect quiz scores is
straightforward. Both coefficients should be positive:
βSat > 0: Higher math SAT scores increase a student’s quiz score
βMins > 0: Studying more increases a student’s quiz score
Table 21.2
Math quiz pooled OLS regression results—MathSAT and MathMins explanatory variables
We begin by pooling the data and using the ordinary least squares (OLS) estimation procedure to estimate the parameters of the model. We run a regression in which each week of student data represents one observation. This is called a pooled regression because we merge the weeks and students together. No distinction is made between a specific student (cross section) and a specific week (time). In a pooled regression every week of data for each student is treated the same (table 21.2).
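As a concrete illustration, the pooled regression can be reproduced with any statistical package; a sketch in Python using statsmodels follows. The CSV file name and the Student/Week column layout are hypothetical stand-ins for table 21.1, which is not reproduced here; MathScore, MathMins, and MathSat are the variables defined above.

```python
# Pooled OLS sketch: every student-week is treated as one observation.
# The file name and the Student/Week columns are hypothetical stand-ins for table 21.1.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv")   # columns: Student, Week, MathScore, MathMins, MathSat
X = sm.add_constant(data[["MathSat", "MathMins"]])
pooled = sm.OLS(data["MathScore"], X).fit()
print(pooled.summary())                     # compare with table 21.2
```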
Now apply the estimates to each of the three students:
Figure 21.2
Estimates of math scores with math SAT explanatory variable (the estimated lines for Peg, Jim, and Tim have intercepts 16.14, 11.42, and 5.52, respectively, and a common slope of 0.43 with respect to MathMins)
When we plot the estimated equations the three students have different intercepts, but all have
the same slope (figure 21.2). This is a consequence of our model’s assumption regarding the
effect of studying:
We have implicitly assumed that the effect of studying reflected by the coefficient βMins is the
same for each student.
Now let us introduce a twist. What if privacy concerns did not permit the college to release
student SAT data? In this case we must omit MathSat from our pooled regression (table 21.3).
What are the ramifications of omitting MathSat from this regression?
Table 21.3
Math quiz pooled OLS regression results—MathMins explanatory variable only
The model becomes
MathScore_t^i = β_Const + β_Mins MathMins_t^i + ε_t^i
where ε_t^i = β_Sat MathSat^i + e_t^i. The MathSat term becomes “folded” into the new error term.
Will the ordinary least squares (OLS) estimation procedure for the coefficient value be unbi-
ased? The OLS bias question cited at the start of the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
More specifically, we need to determine whether or not the model’s explanatory variable, MathMins_t^i, and error term, ε_t^i, are correlated. To do so, suppose that MathSat^i rises. Recall the definition of ε_t^i:
ε_t^i = β_Sat MathSat^i + e_t^i
Question: Do high school students who receive high SAT math scores tend to study more or
less than those students who receive low scores?
Typically students earning higher SAT scores tend to study more. Hence MathMins_t^i would typically rise also. The explanatory variable, MathMins_t^i, and the error term, ε_t^i, are positively correlated. The ordinary least squares (OLS) estimation procedure for the value of the coefficient is therefore biased upward (figure 21.3). When the explanatory variable MathSat is omitted and the ordinary least squares (OLS) estimation procedure is used to estimate the value of the MathMins coefficient, upward bias results.
Figure 21.3
Math quiz scores and bias (MathSat^i up → ε_t^i = β_Sat MathSat^i + e_t^i up since β_Sat > 0; typically MathMins_t^i up also; hence MathMins_t^i and ε_t^i are positively correlated)
What can we do? We will now introduce two approaches we can take to address this problem:
• First differences
• Dummy variable/fixed effects
To explain the first differences approach, focus on the first student, Jim. Apply the model to
week t and the previous week, week t − 1:
Next, subtract the second equation from the first. The first two terms on the right-hand side, β_Const and β_Sat MathSat^Jim, subtract out, leaving us with the following expression:
MathScore_t^Jim − MathScore_{t−1}^Jim = β_Mins MathMins_t^Jim − β_Mins MathMins_{t−1}^Jim + e_t^Jim − e_{t−1}^Jim
By computing first differences, we have eliminated the omitted variable, MathSatJim, from the
equation because MathSat Jim is the same for all weeks.
Table 21.4
Math quiz first difference OLS regression results
MathScore_t^Tim − MathScore_{t−1}^Tim = β_Mins MathMins_t^Tim − β_Mins MathMins_{t−1}^Tim + e_t^Tim − e_{t−1}^Tim
We can now generalize this by using the superscript i to represent the students:
MathScore_t^i − MathScore_{t−1}^i = β_Mins (MathMins_t^i − MathMins_{t−1}^i) + e_t^i − e_{t−1}^i
Next we generate two new variables and use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the first differences model (table 21.4):
If the math SAT scores for a student were to vary from week to week, our logic would fail because the MathSat term for that student would not subtract out when we calculated the first differences. In fact, math SAT scores do not vary for a student from week to week because students take their math SATs in high school, not while they are in college. So the critical assumption is satisfied in this case.
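A sketch of the first differences computation follows, reusing the hypothetical math_quiz_panel.csv layout introduced earlier; the column names are assumptions, not the textbook's file.

```python
# First differences sketch: difference MathScore and MathMins within each student,
# then regress the differenced score on the differenced minutes. The constant and
# the MathSat term drop out of the differenced model, so no constant is included.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv").sort_values(["Student", "Week"])
data["dScore"] = data.groupby("Student")["MathScore"].diff()
data["dMins"] = data.groupby("Student")["MathMins"].diff()
diffs = data.dropna(subset=["dScore", "dMins"])   # drop each student's first week
fd = sm.OLS(diffs["dScore"], diffs[["dMins"]]).fit()
print(fd.params)                                  # estimate of beta_Mins; compare with table 21.4
```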
Again, begin by focusing on Jim. Because MathSat^Jim does not vary from week to week, we can “fold” the MathSat^Jim term into Jim’s constant. Letting α_Const^Jim = β_Const + β_Sat × 720,
MathScore_t^Jim = α_Const^Jim + β_Mins MathMins_t^Jim + e_t^Jim
Similarly for Peg and Tim:
MathScore_t^Peg = α_Const^Peg + β_Mins MathMins_t^Peg + e_t^Peg
MathScore_t^Tim = α_Const^Tim + β_Mins MathMins_t^Tim + e_t^Tim
We now have three separate equations: one for Jim, one for Peg, and one for Tim. We can represent the three equations concisely by introducing three dummy variables:
MathScore_t^i = α_Const^Jim DumJim^i + α_Const^Peg DumPeg^i + α_Const^Tim DumTim^i + β_Mins MathMins_t^i + e_t^i
To convince yourself that the “concise” model is equivalent to the three separate equations,
consider each student individually:
Table 21.5
Math quiz OLS regression results—MathMins and cross-sectional dummy variable explanatory variables
Next we use the ordinary least squares (OLS) estimation procedure to estimate the parameters of our “concise” model (table 21.5). Let us plot the estimated equations for each student (figure 21.4). The dummy variable coefficient estimates are just the intercepts of the estimated equations. Jim’s intercept is 11.86, Peg’s 19.10, and Tim’s 7.52.
Statistical software makes it easy for us to do this. See table 21.6.
• Click on MathScore and then while holding the <Ctrl> key down, click on MathMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects Specification box, select Fixed from the Cross Section dropdown box.
• Click OK.
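Equivalently, the dummy variable version of the model can be estimated directly; the sketch below (hypothetical file and Student/Week column names again) builds one dummy per student and suppresses the overall constant, so each dummy coefficient is that student's intercept.

```python
# Cross-sectional dummy variable (fixed effects) sketch: one dummy per student and
# no overall constant, so each dummy coefficient is that student's intercept.
# The file name and Student/Week columns are hypothetical stand-ins.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv")
dummies = pd.get_dummies(data["Student"], dtype=float)   # one column per student
X = pd.concat([dummies, data[["MathMins"]]], axis=1)
fe = sm.OLS(data["MathScore"], X).fit()
print(fe.params)   # the three intercepts plus the common MathMins slope
```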
Figure 21.4
Estimates of math scores with dummy cross-sectional variables (the estimated lines for Peg, Jim, and Tim have intercepts 19.10, 11.86, and 7.52, respectively, and a common slope of 0.33 with respect to MathMins)
The intercept for each group equals the constant from the regression results (table 21.6) plus the
effect value from table 21.7:
These are just the dummy variable coefficient estimates, the intercepts of the estimated
equations.
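For example, the fixed effects reported in table 21.7 together with the intercepts quoted above imply a regression constant of roughly 12.83 in table 21.6 (a value inferred here from those numbers, since the constant itself is not reproduced):
Jim: 12.83 + (−0.97) ≈ 11.86
Peg: 12.83 + 6.27 ≈ 19.10
Tim: 12.83 + (−5.31) ≈ 7.52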
The dummy/fixed effects approach relies on the same critical assumption as did the first dif-
ference approach:
Table 21.6
Math quiz cross-sectional fixed effects regression results
Table 21.7
Math quiz cross-sectional fixed effects
1 (Jim)   −0.966005
2 (Peg)    6.273785
3 (Tim)   −5.307779
Cross-sectional dummy variable/fixed effects critical assumption: For each student (cross
section) the omitted variable must equal the same value in each week (time period). That is,
from week to week:
• MathSatJim does not vary.
• MathSatPeg does not vary.
• MathSatTim does not vary.
If the math SAT scores for a student were to vary from week to week our logic would fail because
the MathSat term for that student could not be folded into that student’s constant. But, since the
SAT scores are from high school, they do not vary.
Next we consider a second scenario in which two students, Ted and Sue, are enrolled in an
advanced undergraduate chemistry course. Each week a lab report is due.
Table 21.8
Chemistry lab panel data
Ted 1 63 83 Sue 1 63 83
Ted 2 64 92 Sue 2 64 92
Ted 3 70 82 Sue 3 70 82
Ted 4 80 95 Sue 4 80 95
Ted 5 71 85 Sue 5 71 85
Ted 6 78 96 Sue 6 78 96
Ted 7 68 86 Sue 7 68 86
Ted 8 67 96 Sue 8 67 96
Ted 9 80 89 Sue 9 80 89
Ted 10 63 90 Sue 10 63 90
Chemistry lab score data: Artificially constructed panel data for 2 students during the first 10
weeks of a chemistry class.
The scores of the two students and the time each devoted to each lab each week are given in
table 21.8.
Each week the lab reports of the two students are graded by one of 10 graduate students in
the small chemistry department. Each week a different graduate student grades the lab reports
of the two undergraduates; the undergraduate students do not know which graduate student will
be doing the grading beforehand. In the first week, both Ted’s and Sue’s lab reports are graded
by one graduate student; in the second week, Ted’s and Sue’s reports are graded by a second
graduate student; and so on. Our assignment is to use this information to assess the effect that
time devoted to the lab each week has on the score that week’s lab report receives.
Project: Assess the effect of time devoted to lab on the lab report score.
where
Recall that in a given week the same graduate student grades each student’s lab report. Therefore
we can drop the student superscript in the GraderGenerosity variable:
. . .
LabScore_10^Sue = β_Const + β_GG GraderGenerosity_10 + β_LabMins LabMins_10^Sue + e_10^Sue
Generalizing, we obtain
LabScore_t^i = β_Const + β_GG GraderGenerosity_t + β_LabMins LabMins_t^i + e_t^i
where
i = Ted, Sue
t = 1, 2, . . . , 10
Table 21.9
Chemistry lab pooled OLS regression results
We begin by using the ordinary least squares (OLS) estimation procedure in a pooled regression
to estimate the parameters. We include all ten weeks for each of the two students in a single
regression; consequently, we include a total of twenty observations. But we have a problem: the
explanatory variable GraderGenerosity is unobservable. We must omit it from the regression
(table 21.9).
What are the ramifications of omitting GraderGenerosity from the regression?
LabScore_t^i = β_Const + β_LabMins LabMins_t^i + ε_t^i
where ε_t^i = β_GG GraderGenerosity_t + e_t^i. The GraderGenerosity_t term becomes “folded” into the new error term, ε_t^i.
Will the ordinary least squares (OLS) estimation procedure for the coefficient value be unbi-
ased? The OLS bias question cited earlier in the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
To answer this question, suppose that the grader in week t is unusually generous. Then GraderGenerosity_t, and hence the new error term, ε_t^i, would rise. Ted and Sue would not know about the grader’s generosity until after the lab report was returned. Consequently the number of minutes devoted to the lab, LabMins_t^i, would be unaffected. The explanatory variable, LabMins_t^i, and the new error term, ε_t^i, are independent. The ordinary least squares (OLS) estimation procedure for the value of the coefficient is unbiased (figure 21.5).
Figure 21.5
Chemistry lab scores and bias (GraderGenerosity_t up → ε_t^i up; LabMins_t^i unaffected; hence LabMins_t^i and ε_t^i are independent and the OLS estimation procedure for the coefficient value is unbiased)
Now let us move on to the OLS reliability question:
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms inde-
pendent of each other?
In fact the error terms are not independent. We would expect Ted’s error term in a particular week to be correlated with Sue’s error term in that week. The reason stems from the fact that a different graduate student grades each week’s lab reports. Naturally some graduate students
will award more partial credit than others. For example, on the one hand, if a generous graduate
student grades the lab reports in the first week we would expect the error terms of both students
to be positive. On the other hand, if the first week’s reports are graded by a very demanding
graduate student, we would expect the error terms of both students to be negative.
How might we get a sense of whether or not this type of correlation is present in this case? Recall that while the error terms are unobservable, we can think of the residuals as the estimated error terms. Table 21.10 reports the residuals. The residuals appear to confirm our suspicions: in each week, Ted’s and Sue’s residuals have the same sign. Figure 21.6 plots a scatter diagram of Ted’s and Sue’s residuals. Each point on the scatter diagram represents one specific week.
The scatter diagram points fall in the first and third quadrants. On the one hand, when the
residual of one student is positive, the residual for the other student is positive also. On the other
hand, when the residual of one student is negative, the residual for the other student is negative.
The scatter diagram suggests that our suspicions about error term/error term correlation are
warranted.
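A quick way to check this with software is to pull the week-by-week residuals from the pooled regression and correlate Ted's with Sue's; a sketch follows, assuming a hypothetical chem_lab_panel.csv layout with columns Student, Week, LabScore, and LabMins.

```python
# Residual correlation check sketch: run the pooled OLS regression, reshape the
# residuals into one column per student, and correlate Ted's residuals with Sue's.
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("chem_lab_panel.csv")   # columns: Student, Week, LabScore, LabMins
X = sm.add_constant(data[["LabMins"]])
pooled = sm.OLS(data["LabScore"], X).fit()
data["resid"] = pooled.resid
wide = data.pivot(index="Week", columns="Student", values="resid")
print(wide.corr())   # a large positive correlation echoes table 21.10 and figure 21.6
```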
Table 21.10
Chemistry lab OLS residuals
Week   Ted’s residual   Sue’s residual
1   3.30   1.30
2   6.48   3.35
3   −6.60   −3.14
4   1.27   2.30
5   −4.11   −3.60
6   −2.01   −2.57
7   −1.57   −1.16
8   8.94   0.27
9   −4.73   −7.03
10   4.99   4.32
Figure 21.6
Scatter diagram of Ted’s and Sue’s chemistry lab OLS residuals (Ted’s residuals on the horizontal axis, Sue’s on the vertical axis; each point represents one week and the points lie in the first and third quadrants)
To understand how period fixed effects can address this issue, recall the original model:
LabScore_t^i = β_Const + β_GG GraderGenerosity_t + β_LabMins LabMins_t^i + e_t^i
Now focus on week 1. We can fold the constant grader generosity term into the constant for that week. Letting α_Const^1 = β_Const + β_GG GraderGenerosity_1,
LabScore_1^i = α_Const^1 + β_LabMins LabMins_1^i + e_1^i
. . .
For each week we have folded the generosity of the grader into the constant. In each week the constant is identical for both students because the same graduate student grades both lab reports. We now have ten new constants, one for each of the ten weeks. The period fixed effects approach estimates the values of these parameters. Statistical software makes it easy to compute these estimates (table 21.11).
Table 21.11
Chemistry lab time period fixed effects regression results
• Click on LabScore and then, while holding the <Ctrl> key down, click on LabMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects specification box, select Fixed from the Period dropdown box.
• Click OK.
Statistical software allows us to obtain the estimates of each week’s constant (table 21.12).
The period fixed effects suggest that the graduate student who graded the lab reports for week
9 was the toughest grader and the graduate student who graded for week 8 was the most
generous.
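In the same spirit as the cross-sectional dummies used for the math class, period fixed effects can be reproduced with one dummy per week. The sketch below (hypothetical chem_lab_panel.csv layout again) suppresses the overall constant, so each week dummy is that week's constant; statistical packages typically report an overall constant plus week effects instead, as in tables 21.11 and 21.12, but the two parameterizations are equivalent.

```python
# Period (week) dummy variable / fixed effects sketch: one dummy per week and no
# overall constant, so each dummy coefficient is that week's constant.
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("chem_lab_panel.csv")
week_dummies = pd.get_dummies(data["Week"], prefix="Week", dtype=float)
X = pd.concat([week_dummies, data[["LabMins"]]], axis=1)
period_fe = sm.OLS(data["LabScore"], X).fit()
print(period_fe.params)   # ten weekly constants plus the LabMins slope
```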
Period dummy variable/fixed effects critical assumption: For each week (time period) the
omitted variable must equal the same value for each student (cross section).
Table 21.12
Chemistry lab period fixed effects
Week   Fixed effect
1   3.171238
2   4.466844
3   −4.948602
4   2.805060
5   −4.082423
6   −3.251531
7   −1.448602
8   4.819041
9   −5.814780
10   4.283755
Thus far we have been considering the cases in which we had data for all the students in the
course. In reality this is not always true, however. For example, in calculating the unemployment
rate, the Bureau of Labor Statistics conducts a survey that acquires data from approximately
60,000 American households. Obviously there are many more than 60,000 households in the
United States.
We will now consider a scenario to gain insights into such cases. Suppose that there are several
hundred students enrolled in a studio art course at a major university. Each week a studio art
project is assigned. At the beginning of the semester, three students were selected randomly from
all those enrolled: Bob, Dan, and Kim.
Art project score data: Artificially constructed panel data for 3 students during the first 10 weeks
of a studio art class.
The scores of the three students and the time each devoted to the project each week are reported
in table 21.13:
Our assignment is to use information from the randomly selected students to assess the effect
that time devoted to the project each week has on the score that week’s project receives:
Project: Assess the effect of time devoted to project on the project score.
Table 21.13
Studio art panel data
Student Week Minutes Score   Student Week Minutes Score
Bob 1 13 35   Dan 1 17 55
Bob 2 17 42   Dan 2 11 57
Bob 3 19 33   Dan 3 21 61
Bob 4 23 45   Dan 4 15 58
Bob 5 13 31   Dan 5 13 54
Bob 6 15 42   Dan 6 17 62
Bob 7 17 35   Dan 7 19 61
Bob 8 13 37   Dan 8 13 53
Bob 9 17 35   Dan 9 11 50
Bob 10 13 34   Dan 10 9 55
Kim 1 27 53   Kim 6 19 43
Kim 2 23 53   Kim 7 25 56
Kim 3 21 49   Kim 8 25 50
Kim 4 23 48   Kim 9 17 44
Kim 5 27 53   Kim 10 19 48
where
The ArtIQi variable, innate artistic talent, requires explanation. Clearly, ArtIQ is an abstract
concept and is unobservable. Therefore, we must omit it from the regression. Nevertheless, we
do know that different students possess different quantities of innate artistic talent. Figure 21.7
illustrates this notion.
Figure 21.7
Probability distribution of the ArtIQ^i random variable (centered at Mean[ArtIQ])
Since our three students were selected randomly, define a random variable, vi, to equal the
amount by which a student’s innate artistic talent deviates from the mean:
ArtIQi = Mean[ArtIQ] + vi
where vi is a random variable. Since the three students were chosen randomly, the mean of vi’s
probability distribution equals 0:
Mean[vi] = 0
Next let us incorporate our specification of ArtIQ^i into the model and rearrange terms:
ArtScore_t^i = α_Const + β_Mins ArtMins_t^i + ε_t^i
where ε_t^i = β_ArtIQ v^i + e_t^i; ε_t^i represents the random influences for student i in week t.
Table 21.14
Studio art pooled OLS regression results
We begin with a pooled regression, using the ordinary least squares (OLS) estimation procedure to estimate the model’s parameters (table 21.14):
bMins = 0.40
We estimate that a ten-minute increase devoted to an art project increases a student’s score by
4 points. Will the ordinary least squares (OLS) estimation procedure for the coefficient value be
unbiased? The OLS bias question cited earlier in the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
where
ε_t^i = β_ArtIQ v^i + e_t^i and ArtIQ^i = Mean[ArtIQ] + v^i
When v^i increases, both innate artistic ability, ArtIQ^i, and the model’s error term, ε_t^i, increase. Therefore the correlation, or lack thereof, between innate artistic ability, ArtIQ^i, and the amount of time devoted to the project, ArtMins_t^i, determines whether the ordinary least squares (OLS) estimation procedure for the coefficient value is biased or unbiased.
Figure 21.8
Studio art projects and bias (v^i up → ArtIQ^i = Mean[ArtIQ] + v^i up and ε_t^i = β_ArtIQ v^i + e_t^i up; the effect on ArtMins_t^i is uncertain)
It is unclear how the amount of time students devote to their studio art projects will be correlated with their innate artistic ability. It could be argued that students with more artistic ability
will be more interested in studio art and hence would devote more time to their art projects.
However, highly talented students may only spend a little time on their projects because they
only need to spend a few minutes to get a good score.
The random effects (RE) estimation procedure is only appropriate when the omitted explanatory variable and the included explanatory variable are independent. Consequently we will now assume that innate artistic ability, ArtIQ^i, and the time devoted to studio art projects, ArtMins_t^i, are independent in order to motivate the rationale of the random effects (RE) estimation procedure.
Since we are assuming independence, we can move on to pose the OLS reliability question:
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms
independent?
In fact the error terms are not independent. To understand why, note that the error term, ε_t^i, in this model is interesting; it has two components: ε_t^i = β_ArtIQ v^i + e_t^i. The first term, β_ArtIQ v^i, reflects the innate artistic talent of each randomly selected student:
• Bob’s deviation from the innate artistic talent mean: vBob
• Dan’s deviation from the innate artistic talent mean: vDan
• Kim’s deviation from the innate artistic talent mean: vKim
The second term, e_t^i, represents the random influences on each student’s weekly project score.
. . .
Kim, week 10:  ε_10^Kim = β_ArtIQ v^Kim + e_10^Kim
Each of Bob’s error terms has a common term, β_ArtIQ v^Bob. Similarly each of Dan’s error terms has a common term, β_ArtIQ v^Dan, and each of Kim’s error terms has a common term, β_ArtIQ v^Kim. Consequently the error terms are not independent. Since the error term/error term independence premise is violated, the standard error calculations made by the ordinary least squares (OLS) estimation procedure are flawed; furthermore the ordinary least squares (OLS) estimation procedure for the coefficient value is not the best linear unbiased estimation procedure (BLUE).
To check our logic, we would like to analyze the error terms to determine if they appear to be correlated, but the error terms are not observable. We can examine the residuals, however. Recall that the residuals can be thought of as the estimated error terms (figure 21.9). The residuals indeed suggest that the error term/error term independence premise is violated.
The random effects estimation procedure exploits this error term pattern to calculate “better”
estimates (table 21.15).
Figure 21.9
Art class ordinary least squares (OLS) residuals (residuals for Bob, Dan, and Kim plotted against observation number)
Table 21.15
Studio art cross-sectional random effects regression results
• Click on ArtScore and then, while holding the <Ctrl> key down, click on ArtMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects Specification box, select Random from the Cross Section dropdown box.
• Click OK.
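A comparable computation can be sketched in Python; the example below assumes the linearmodels package, whose RandomEffects estimator corresponds to the cross-sectional random effects procedure described here, and a hypothetical art_panel.csv layout with columns Student, Week, ArtScore, and ArtMins.

```python
# Cross-sectional random effects sketch using the linearmodels package (an assumed
# third-party library; the textbook uses its own statistical software).
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import RandomEffects

data = pd.read_csv("art_panel.csv")           # columns: Student, Week, ArtScore, ArtMins
data = data.set_index(["Student", "Week"])    # entity/time MultiIndex required by linearmodels
exog = sm.add_constant(data[["ArtMins"]])
re = RandomEffects(data["ArtScore"], exog).fit()
print(re.params)                              # compare with table 21.15
```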
The intuition behind all this is that we can exploit the additional information about the error
terms to improve the estimation procedure. Additional information is a “good thing.” It is worth
noting that we adopted the same strategy when we studied heteroskedasticity and autocorrelation
(chapters 16 and 17). When the error terms are not independent we can exploit that information
to improve our estimate beyond what the ordinary least squares (OLS) estimation procedure
provides. In this case we used the random effects estimation procedure to do so.
Cross-sectional random effects critical assumption: For each student (cross section) the omitted variable must equal the same value in each week (time period). That is, from week to week:
• ArtIQBob does not vary.
• ArtIQDan does not vary.
• ArtIQKim does not vary.
Chapter 21 Review Questions
1. What is the critical assumption that the first differences estimation procedure makes?
2. What is the critical assumption that the cross section fixed effects (FE) estimation procedure
makes?
3. What is the critical assumption that the period fixed effects (FE) estimation procedure makes?
4. What is the critical assumption that the random effects (RE) estimation procedure makes?
5. What is the intuition behind our treatment of heteroskedasticity and autocorrelation? How is
the random effects estimation procedure similar?
Chapter 21 Exercises
Beer consumption data: Panel data of beer consumption, beer prices, and income statistics for fifty states and the District of Columbia from 1999 to 2007:
BeerPC_t^i   Per capita beer consumption in state i during year t (12 oz cans)
Price_t      Real price of beer in year t (1982–84 dollars per 12 oz can)
IncPC_t^i    Per capita real disposable income in state i during year t (thousands of chained 2005 dollars)
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure to estimate the parameters
without fixed or random effects, estimate the value of each coefficient. Interpret the coeffi-
cient estimates. What are the critical results?
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
4. Do not include the variable Year as an explanatory variable. Instead introduce period fixed
effects.
a. Are the coefficient estimates of CapitalHuman, CapitalPhysical, GdpPC, and Auth quali-
tatively consistent with the estimates that you obtained in the previous exercise when the
variable Year was included and period fixed effects were not specified?
b. Examine the estimates of the period fixed effects coefficients. Are they qualitatively con-
sistent with the coefficient estimate for the variable year that you obtained when period fixed
effects were not specified?
Motor fuel consumption data for Arkansas, Massachusetts, and Washington: Panel data relating
to motor fuel consumption for Arkansas, Massachusetts, and Washington from 1973 to 2007.
These three states were chosen randomly from the fifty states and the District of Columbia.
MotorFuelPC_t^i   Per capita motor fuel consumption in state i during year t (gallons)
Price_t^i         Real price of gasoline in state i during year t (1982–84 dollars per gallon)
IncPC_t^i         Per capita real disposable income in state i during year t (thousands of chained 2005 dollars)
PopDen_t^i        Population density in state i during year t (persons per square mile)
UnemRate_t^i      Unemployment rate in state i during year t (percent)
a. Develop a theory regarding how each explanatory variable influences the dependent variable. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure to estimate the parameters
without fixed or random effects, estimate the value of each coefficient. Interpret the coeffi-
cient estimate. What is the critical result?
Chapter 22 Outline
1. This question requires slogging through much high school algebra, so it is not very exciting.
While tedious, it helps us understand simultaneous equations models. Consider the following
two equations that model the demand and supply of beef:
where
Let
Q_t = α_Const^Q + α_FP^Q FeedP_t + α_I^Q Inc_t + ε_t^Q
P_t = α_Const^P + α_FP^P FeedP_t + α_I^P Inc_t + ε_t^P
Compare these two equations for Q_t and P_t with the two equations for Q_t and P_t in problem 1. Express α_FP^Q, α_I^Q, α_FP^P, and α_I^P in terms of the β’s appearing in problem 1:
Since the actual parameters of the model, the β’s, are unobservable, we estimate them. The
estimated parameters are denoted by italicized Roman b’s:
In terms of the estimated coefficients, bx1 and/or bx2, what is the expression for the estimated
change in y?
c. If x1 changes by Δx1 while x2 remains constant: Δy =
d. If x2 changes by Δx2 while x1 remains constant: Δy =
e. Putting parts c and d together, if both x1 and x2 change: Δy =
Equilibrium: Q^D = Q^S = Q
where
Use algebra to solve for the equilibrium price and quantity. That is,
a. Express the equilibrium price, P, in terms of FeedP and Inc.
b. Express the equilibrium quantity, Q, in terms of FeedP and Inc.
These two equations are called the reduced form (RF) equations.
Demand and supply curves are arguably the economist’s most widely used tools. They provide
one example of simultaneous equations models. Unfortunately, as we will shortly show, the
ordinary least squares (OLS) estimation procedure is biased when it is used to estimate the
parameters of these models. To illustrate this, we begin by reviewing the effect that explanatory
variable/error term correlation has on the ordinary least squares (OLS) estimation procedure.
Then we focus on a demand/supply model to explain why the ordinary least squares (OLS)
estimation procedure leads to bias.
Figure 22.1
Explanatory variable and error term correlation (left panel: explanatory variable and error term positively correlated, the best fitting line is steeper than the actual equation line; right panel: negatively correlated, the best fitting line is flatter than the actual equation line)
Explanatory variable/error term correlation (figure 22.1) leads to bias. On the one hand, when
the explanatory variable and error term are positively correlated, the best fitting line is more
steeply sloped than the actual equation line; consequently the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased upward. On the other hand, when the
explanatory variable and error term are negatively correlated, the best fitting line is less steeply
sloped than the actual equation line; consequently, the ordinary least squares (OLS) estimation
procedure for the coefficient value is biased downward.
Consider the market for a good such as food or clothing. The following two equations describe
a standard demand/supply model of the market for the good:
where
The quantity of a good demanded is determined by the good’s own price and other demand
factors such as income, the prices of substitutes, and the prices of complements. Similarly the
quantity of a good supplied is determined by the good’s own price and other supply factors such
as wages and raw material prices.
The market is in equilibrium whenever the quantity demanded equals the quantity supplied:
Q_t^D = Q_t^S = Q_t
Both the quantity, Qt, and the price, Pt, are determined simultaneously as depicted by the famous
demand/supply diagram reproduced in figure 22.2.
Figure 22.2
Demand/supply model (the equilibrium price P and equilibrium quantity Q occur at the intersection of the demand and supply curves)
Endogenous variables: Variables determined “within” the model; namely price and quantity
Exogenous variables: Variables determined “outside” the model; namely other demand and
supply factors
In single equation models, there is only one endogenous variable, the dependent variable itself;
all explanatory variables are exogenous. For example, in the following single equation model the dependent variable is consumption and the explanatory variable is income:
The model only attempts to explain how consumption is determined. The dependent variable,
consumption, is the only endogenous variable. The model does not attempt to explain how
income is determined; that is, the values of income are taken as given. All explanatory variables,
in this case only income, are exogenous.
In a simultaneous equations model, while the dependent variable is endogenous, an explana-
tory variable can be either endogenous or exogenous. In the demand/supply model, quantity, the
dependent variable, is endogenous; quantity is determined “within” the model. Price is both
an endogenous variable and an explanatory variable. Price is determined “within” the model,
and it is used to explain the quantity demanded and the quantity supplied.
What are the consequences of endogenous explanatory variables for the ordinary least squares
(OLS) estimation procedure?
Claim: Whenever an explanatory variable is also an endogenous variable, the ordinary least
squares (OLS) estimation procedure for the value of its coefficient is biased.
We will now use the demand and supply models to justify this claim.
When the ordinary least squares (OLS) estimation procedure is used to estimate the demand
model, the good’s own price and the error term are positively correlated; accordingly, the ordi-
nary least squares (OLS) estimation procedure for the value of the price coefficient will be biased
upward (figure 22.3). Let us now show why.
Figure 22.3
Effect of the demand error term (when e^D rises, the demand curve shifts right and the equilibrium price rises; when e^D falls, the demand curve shifts left and the equilibrium price falls)
We are focusing on the demand model; hence, the Dem radio button is selected. The lists imme-
diately below the Dem radio button specify the demand model. The actual constant equals 30,
the actual price coefficient equals −4, and so forth. XCoef represents an “other demand factor,”
such as income.
Be certain that the Pause checkbox is cleared. Click Start and then after many, many repeti-
tions click Stop. The average of the estimated demand price coefficient values is −2.6, greater
than the actual value, −4.0 (table 22.1). This result suggests that the ordinary least squares (OLS)
estimation procedure for the value of the price coefficient is biased upward. Our Econometrics
Lab confirms our suspicions.
But even though the ordinary least squares (OLS) estimation procedure is biased, it might be
consistent, might it not? Recall the distinction between an unbiased and a consistent estimation
procedure:
Figure 22.4
Simultaneous equations simulation (Econometrics Lab interface: lists for the demand and supply constants, price coefficients, and other-factor coefficients, the exogenous variable values, the error term variance, and the estimation procedure, together with the mean of the estimated price coefficient values from all repetitions)
Table 22.1
Simultaneous equations simulation results—Demand
Table 22.2
Simultaneous equations simulation results—Demand
Unbiased: The estimation procedure does not systematically underestimate or overestimate the
actual value; that is, after many, many repetitions the average of the estimates equals the actual
value.
Consistent but biased: A consistent estimation procedure can be biased. But, as the sample size (the number of observations) grows:
• The magnitude of the bias decreases. That is, the mean of the coefficient estimate’s probability
distribution approaches the actual value.
• The variance of the estimate’s probability distribution diminishes and approaches 0.
How can we use the simulation to investigate this possibility? Just increase the sample size.
If the procedure is consistent, the average of the estimated coefficient values after many, many
repetitions would move closer and closer to −4.0, the actual value, as we increase the sample
size. That is, if the procedure is consistent, the magnitude of the bias would decrease as the
sample size increases. (Also the variance of the estimates would decrease.) So let us increase
the sample size from 20 to 30 and then to 40. Unfortunately, we observe that a larger sample
size does not reduce the magnitude of the bias (table 22.2).
When we estimate the value of the price coefficient in the demand model, we find that the
ordinary least squares (OLS) estimation procedure fails in two respects:
Bad news: The ordinary least squares (OLS) estimation procedure is biased.
Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
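Both failures can be reproduced with a short simulation. The sketch below (parameter values are illustrative assumptions, not the lab's defaults) generates equilibrium prices and quantities from a demand/supply model, applies the ordinary least squares (OLS) estimation procedure to the demand equation, and reports the mean of the price coefficient estimates for several sample sizes; the mean stays above the actual value even as the sample size grows.

```python
# Simultaneous equations sketch: the equilibrium price depends on both error terms,
# so OLS applied to the demand equation is biased upward and not consistent.
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
bD_const, bD_p, bD_i = 30.0, -4.0, 2.0     # demand: Q = bD_const + bD_p*P + bD_i*Inc + eD
bS_const, bS_p, bS_fp = 10.0, 1.0, -1.0    # supply: Q = bS_const + bS_p*P + bS_fp*FeedP + eS

def mean_ols_demand_price_coef(n, reps=2000):
    estimates = []
    for _ in range(reps):
        inc = rng.normal(30.0, 5.0, size=n)
        feedp = rng.normal(30.0, 5.0, size=n)
        eD = rng.normal(0.0, 3.0, size=n)
        eS = rng.normal(0.0, 3.0, size=n)
        # Equilibrium: set quantity demanded equal to quantity supplied and solve for P.
        p = (bD_const - bS_const + bD_i * inc - bS_fp * feedp + eD - eS) / (bS_p - bD_p)
        q = bD_const + bD_p * p + bD_i * inc + eD
        # OLS applied to the demand equation: regress Q on a constant, P, and Inc.
        X = np.column_stack([np.ones(n), p, inc])
        coefs, *_ = np.linalg.lstsq(X, q, rcond=None)
        estimates.append(coefs[1])             # the price coefficient estimate
    return np.mean(estimates)

for n in (20, 30, 40):
    print(f"sample size {n}: mean OLS price coefficient = {mean_ols_demand_price_coef(n):.2f}")
```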
We will now use the same line of reasoning to show that the ordinary least squares (OLS) esti-
mation procedure for the value of the price coefficient in the supply model is also biased (figure
22.5).
Figure 22.5
Effect of the supply error term (when e^S rises, the supply curve shifts right and the equilibrium price falls; when e^S falls, the supply curve shifts left and the equilibrium price rises)
The ordinary least squares (OLS) estimation procedure for the value of the price coefficient in
the supply model should be biased downward. Once again, we will use a simulation to confirm
our logic.
Table 22.3
Simultaneous equations simulation results—Supply
Figure 22.6
Probability distribution of the supply price coefficient estimate, b_P^S (the distribution is centered below zero even though the actual value is 1.0, so Prob[b_P^S < 0] exceeds Prob[b_P^S > 0])
We are now focusing on the supply curve; hence, the Sup radio button is selected. Note that the
actual value of the supply price coefficient equals 1.0. Be certain that the Pause checkbox is
cleared. Click Start and then after many, many repetitions click Stop. The average of the esti-
mated coefficient values is −1.4, less than the actual value, 1.0. This result suggests that the
ordinary least squares (OLS) estimation procedure for the value of the price coefficient is biased
downward, confirming our suspicions.
But might the estimation procedure be consistent? To answer this question increase the sample
size from 20 to 30 and then from 30 to 40. The magnitude of the bias is unaffected. Accordingly,
it appears that the ordinary least squares (OLS) estimation procedure for the value of the price
coefficient is not consistent either (table 22.3).
When estimating the price coefficient’s value in the supply model, the ordinary least squares
(OLS) estimation procedure fails in two respects (figure 22.6):
Bad news: The ordinary least squares (OLS) estimation procedure is biased.
Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
The supply model simulations illustrate a problem even worse than that encountered when
estimating the demand model. In this case the bias can be so severe that the mean of the coef-
ficient estimate’s probability distribution has the wrong sign. To gain more intuition, suppose
that the probability distribution is symmetric. Then the chances that the coefficient estimate
would have the wrong sign are greater than the chances that it would have the correct sign when
using the ordinary least squares (OLS) estimation procedure. This is very troublesome, is it not?
We have used the demand and supply models to illustrate that an endogenous explanatory vari-
able creates a bias problem for the ordinary least squares (OLS) estimation procedure. Whenever
an explanatory variable is also an endogenous variable, the ordinary least squares (OLS) estima-
tion procedure for the value of its coefficient is biased.
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Endogenous variables: Both the quantity of beef and the price of beef, Qt and Pt, are endogenous
variables; they are determined within the model.
Exogenous Variables:
• Disposable income is an “Other demand factor”; disposable income, Inct, is an exogenous
variable that affects demand. Since beef is regarded as a normal good, we expect that households
would demand more beef when income rises.
• The price of chicken is also an “Other demand factor”; the price of chicken, ChickPt, is an
exogenous variable that affects demand. Since chicken is a substitute for beef, we expect that
households would demand more beef when the price of chicken rises.
• The price of cattle feed is an “Other supply factor”; the price of cattle feed, FeedPt, is an
exogenous variable that affects supply. Since cattle feed is an input to the production of beef,
we expect that firms would produce less when the price of cattle feed rises.
Now let us formalize the simultaneous equations demand/supply model that we will
investigate:
Equilibrium: Q_t^D = Q_t^S = Q_t
Let us begin by using the ordinary least squares (OLS) procedure to estimate the parameters.
As reported in table 22.4, the estimate of the demand model’s price coefficient is negative,
−364.4, suggesting that higher prices decrease the quantity demanded. The result is consistent
with economic theory suggesting that the demand curve is downward sloping.
Table 22.4
OLS regression results—Demand model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 22.5
OLS regression results—Supply model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
As reported in table 22.5, the estimate of the supply model’s price coefficient is negative,
−231.5, suggesting that higher prices decrease the quantity supplied. Obviously this result is not
consistent with economic theory. This result suggests that the supply curve is downward sloping
rather than upward sloping. But what have we just learned about the ordinary least squares (OLS) estimation procedure? The ordinary least squares (OLS) estimation procedure for the price coefficient of the supply model will be biased downward. This could explain our result, could it not?
We will now describe an alternative estimation procedure, the reduced form (RF) estimation
procedure. We will show that while this new procedure does not “solve” the bias problem, it
mitigates it. More specifically, while the procedure is still biased, it proves to be consistent. In
this way, the new procedure is “better than” ordinary least squares (OLS). We begin by describ-
ing the mechanics of the reduced form (RF) estimation procedure.
We have argued that the ordinary least squares (OLS) estimation procedure leads to bias
because an endogenous variable, in our case the price, is an explanatory variable. The reduced
form (RF) approach begins by using algebra to express each endogenous variable only in terms
of the exogenous variables. These new equations are called the reduced form (RF) equations.
Intuition: Since bias results from endogenous explanatory variables, algebraically manipulate
the simultaneous equations model to express each endogenous variable only in terms of the
exogenous variables. Then use the ordinary least squares (OLS) estimation procedure to estimate
the parameters of these newly derived equations, rather than the original ones.
Step 1: Derive the reduced form (RF) equations from the original models.
• The reduced form (RF) equations express each endogenous variable in terms of the exogenous
variables only.
• Algebraically solve for the original model’s parameters in terms of the reduced form (RF)
parameters.
Step 2: Use ordinary least squares (OLS) estimation procedure to estimate the parameters of
the reduced form (RF) equations.
Step 3: Calculate coefficient estimates for the original models using the derivations from step
1 and estimates from step 2.
Step 1: Derive the Reduced Form (RF) Equations from the Original Models
We begin with the demand and supply models:
Q_t^D = β_Const^D + β_P^D P_t + β_I^D Inc_t + e_t^D
Q_t^S = β_Const^S + β_P^S P_t + β_FP^S FeedP_t + e_t^S
Equilibrium: Q_t^D = Q_t^S = Q_t
There are six parameters of the demand and supply models: β_Const^D, β_P^D, β_I^D, β_Const^S, β_P^S, and β_FP^S. We wish to estimate the values of these parameters.
The reduced form (RF) equations express each endogenous variable in terms of the exogenous variables. In this case we wish to express Q_t in terms of FeedP_t and Inc_t and P_t in terms of FeedP_t and Inc_t. The appendix at the end of this chapter shows how elementary, yet laborious, algebra can be used to derive the following reduced form (RF) equations for the endogenous variables, Q_t and P_t:

Q_t = (β_P^S β_Const^D − β_P^D β_Const^S)/(β_P^S − β_P^D) − [β_P^D β_FP^S/(β_P^S − β_P^D)] FeedP_t + [β_P^S β_I^D/(β_P^S − β_P^D)] Inc_t + (β_P^S e_t^D − β_P^D e_t^S)/(β_P^S − β_P^D)

P_t = (β_Const^D − β_Const^S)/(β_P^S − β_P^D) − [β_FP^S/(β_P^S − β_P^D)] FeedP_t + [β_I^D/(β_P^S − β_P^D)] Inc_t + (e_t^D − e_t^S)/(β_P^S − β_P^D)
Now let us make an interesting observation about the reduced form (RF) equations. Focus first on the ratio of the feed price coefficients and then on the ratio of the income coefficients. These ratios equal the price coefficients of the original demand and supply models, β_P^D and β_P^S:
Q_t = α_Const^Q + α_FP^Q FeedP_t + α_I^Q Inc_t + ε_t^Q
P_t = α_Const^P + α_FP^P FeedP_t + α_I^P Inc_t + ε_t^P
First, consider the notation we use in the reduced form (RF) equations. Superscripts refer to
the reduced form (RF) equation:
The parameter subscripts refer to the constants and coefficients of each reduced form (RF)
equation:
There are six parameters of the reduced form (RF) equations: α_Const^Q, α_FP^Q, α_I^Q, α_Const^P, α_FP^P, and α_I^P.
By comparing the two sets of reduced form (RF) equations, we can express each of the reduced form (RF) parameters, each α, in terms of the parameters of the original demand and supply models, the β’s. We have six equations:

α_Const^Q = (β_P^S β_Const^D − β_P^D β_Const^S)/(β_P^S − β_P^D),  α_FP^Q = −β_P^D β_FP^S/(β_P^S − β_P^D),  α_I^Q = β_P^S β_I^D/(β_P^S − β_P^D)

α_Const^P = (β_Const^D − β_Const^S)/(β_P^S − β_P^D),  α_FP^P = −β_FP^S/(β_P^S − β_P^D),  α_I^P = β_I^D/(β_P^S − β_P^D)

There are six parameters of the original demand and supply models: β_Const^D, β_P^D, β_I^D, β_Const^S, β_P^S, and β_FP^S. That is, we have six unknowns, the β’s. We have six equations and six unknowns. We can solve for the unknowns by expressing the β’s in terms of the α’s. For example, we can solve for the price coefficients of the original demand and supply models, β_P^D and β_P^S:

β_P^D = α_FP^Q/α_FP^P    β_P^S = α_I^Q/α_I^P
These coefficients reflect the “slopes” of the demand and supply curves.1
Step 2: Use Ordinary Least Squares (OLS) to Estimate the Reduced Form Equations
We use the ordinary least squares (OLS) estimation procedure to estimate the α’s (see tables
22.6 and 22.7).
Estimate of α_FP^Q = −332.00    Estimate of α_I^Q = 17.347
Estimate of α_FP^P = 1.0562     Estimate of α_I^P = 0.018825
1. The coefficients do not equal the slope of the demand curve, but rather the reciprocal of the slope. They are the ratio
of run over rise instead of rise over run. This occurs as a consequence of the economist’s convention of placing quantity
on the horizontal axis and price on the vertical axis. To avoid the awkwardness of using the expression “the reciprocal
of the slope” repeatedly, we will place the word slope within quotes to indicate that it is the reciprocal.
Table 22.6
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 22.7
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
• Feed price reduced form (RF) estimates: Since cattle feed is an input for beef production,
an increase in the feed price shifts the supply curve for beef to the left. As figure 22.7 illustrates,
the equilibrium quantity falls and the equilibrium price rises.
The feed price coefficient estimate in the quantity reduced form (RF) equation is negative, −332.00. The negative estimate suggests that an increase in feed prices reduces the quantity.
Table 22.8
Reduced form (RF) coefficient estimates
Figure 22.7
Demand/supply analysis—An increase in feed price: the supply curve shifts left from S to S′ while the demand curve D stays put.
The feed price coefficient estimate in the price reduced form (RF) equation is positive, 1.0562. This suggests that an increase in the feed price increases the price of beef.
Quantity: falls    Price: rises
The feed price coefficient estimates are consistent with the standard demand/supply analysis.
• Income reduced form (RF) estimates: Since beef is generally regarded as a normal good,
an increase in income shifts the demand curve for beef to the right. As figure 22.8 illustrates,
the equilibrium quantity and price both increase.
The income coefficient estimates in both the quantity and price reduced form (RF) regressions are positive, 17.347 and 0.018825. The positive estimates suggest that an increase in income causes both the quantity and the price of beef to rise.
Figure 22.8
Demand/supply analysis—An increase in income: the demand curve shifts right from D to D′ while the supply curve S stays put.
Income increases
Quantity: rises    Price: rises
The income coefficient estimates are consistent with the standard demand/supply analysis.
We now return to complete step 3 of the reduced form (RF) estimation procedure.
Step 3: Calculate Coefficient Estimates for the Original Model Using the Derivations and
Estimates from Steps 1 and 2
We will use the reduced form (RF) coefficient estimates from step 2 to estimate the price coefficients of the demand and supply models, $\beta^D_P$ and $\beta^S_P$, the “slopes” of the demand and supply curves. To do so, we apply the equations for $\beta^D_P$ and $\beta^S_P$ that we derived in step 1.
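As a quick numerical check of step 3, the short Python sketch below plugs the reduced form (RF) coefficient estimates reported in step 2 into the ratio formulas derived in step 1; nothing here is new, it simply reproduces the arithmetic:

```python
# Reduced form (RF) coefficient estimates from step 2
aQ_FP, aP_FP = -332.00, 1.0562      # feed price coefficients (quantity and price equations)
aQ_I,  aP_I  = 17.347, 0.018825     # income coefficients (quantity and price equations)

bD_P = aQ_FP / aP_FP                # estimated "slope" of the demand curve
bS_P = aQ_I / aP_I                  # estimated "slope" of the supply curve
print(round(bD_P, 1), round(bS_P, 1))   # -314.3 and 921.5
```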
22.3.4 Comparing Ordinary Least Squares (OLS) and Reduced Form (RF) Estimates
We will now compare the ordinary least squares (OLS) and reduced form (RF) estimates of the
price coefficients (table 22.9). The supply curve price coefficient is the most obvious difference.
The ordinary least squares (OLS) estimate is negative while the reduced form (RF) estimate is
positive. In view of our upward sloping supply curve theory, this result is comforting. Unlike
the ordinary least squares (OLS) estimates, the signs of the reduced form (RF) price coefficient
estimates are consistent not only with our theory of demand, but also our theory of supply.
Consequently we will now show that the reduced form (RF) estimation procedure is “better”
than the ordinary least squares (OLS) estimation procedure when estimating simultaneous equa-
tions models.
Previously, we used the simultaneous equations simulation to show that the ordinary least squares
(OLS) estimation procedure was neither unbiased nor consistent when estimating the values of
Table 22.9
Comparing OLS and RF price coefficient estimates
Table 22.10
Simultaneous equations simulation results
the price coefficients. Now, we will use this simulation to investigate the properties of the
reduced form (RF) estimation procedure. It would be wonderful if the reduced form (RF)
approach were unbiased. Failing that, might the reduced form (RF) approach be consistent?
While we could address these issues rigorously, we will avoid the complex mathematics by using
a simulation.
Note that the reduced form (RF), rather than the ordinary least squares (OLS), estimation pro-
cedure is now selected. Also the Dem radio button is selected initially; the demand model is
being analyzed. Be certain that the Pause checkbox is cleared. Click Start and then after many,
many repetitions click Stop. Next select the Sup radio button and repeat the process to analyze
the supply model.
Table 22.10 reports the reduced form (RF) results for a sample size of 20. The results suggest
that reduced form (RF) estimation procedures for the price coefficients are biased. The averages
of the estimated price coefficient values after many, many repetitions do not equal the actual
values for either the demand or supply models. The average of the demand price coefficient
estimates equals −4.3 while the actual value equals −4.0; similarly, the average of the supply price coefficient estimates equals 1.3 while the actual value equals 1.0.
But perhaps, unlike the ordinary least squares (OLS) estimation procedure, the reduced form (RF) approach is consistent. To address this question, we increase the sample size,
first from 20 to 30 and then from 30 to 40 (table 22.11). As the sample size becomes larger, bias
is still present but the magnitude of the bias diminishes for both the demand and supply price
coefficients. Furthermore the variances also fall. The simulation illustrates that while the reduced
form (RF) estimation procedure for the price coefficient value is still biased, it is consistent.
We can conclude that the reduced form (RF) estimation procedure for the coefficient value
of an endogenous explanatory variable provides both good news and bad news:
Bad news: The reduced form (RF) estimation procedure for the coefficient value is biased.
Table 22.11
Simultaneous equations simulation results
Good news: The reduced form (RF) estimation procedure for the coefficient value is
consistent.
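For readers who want to experiment outside the Econometrics Lab, the Python sketch below runs the same kind of experiment. The demand and supply price coefficients are set to the lab's actual values (−4.0 and 1.0); the constants, the other coefficients, and the distributions of the exogenous variables and error terms are illustrative assumptions, so the exact averages will differ from table 22.11 even though the pattern (bias that shrinks as the sample size grows) is the same.

```python
# A Monte Carlo sketch of the reduced form (RF) estimation procedure.
# Price coefficients (-4.0 demand, 1.0 supply) match the lab's actual values;
# every other number below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

bD_const, bD_p, bD_i = 500.0, -4.0, 0.5    # demand: Q = bD_const + bD_p*P + bD_i*Inc + eD
bS_const, bS_p, bS_fp = 100.0, 1.0, -2.0   # supply: Q = bS_const + bS_p*P + bS_fp*FeedP + eS

def rf_estimates(T):
    inc = rng.uniform(500, 1500, T)
    feedp = rng.uniform(10, 50, T)
    eD = rng.normal(0, 50, T)
    eS = rng.normal(0, 50, T)
    # Equilibrium price and quantity implied by the two structural equations
    p = (bD_const - bS_const + bD_i*inc - bS_fp*feedp + eD - eS) / (bS_p - bD_p)
    q = bS_const + bS_p*p + bS_fp*feedp + eS
    # Estimate the two reduced form (RF) equations by ordinary least squares (OLS)
    X = np.column_stack([np.ones(T), feedp, inc])
    aQ = np.linalg.lstsq(X, q, rcond=None)[0]
    aP = np.linalg.lstsq(X, p, rcond=None)[0]
    # Ratios of RF coefficient estimates estimate the structural price coefficients
    return aQ[1] / aP[1], aQ[2] / aP[2]    # demand "slope", supply "slope"

for T in (20, 30, 40, 200):
    est = np.array([rf_estimates(T) for _ in range(5000)])
    print(T, est.mean(axis=0))   # averages move toward (-4.0, 1.0) as T grows
```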
Let us reexamine how we obtained the estimates for the price coefficients of the demand and
supply models:
$$b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-332.00}{1.0562} = -314.3, \qquad b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{17.347}{0.018825} = 921.5$$
These equations for the two price coefficient estimates appear paradoxical at first glance:
• The demand model's price coefficient, $b^D_P$, depends on the reduced form (RF) coefficients of feed price, $a^Q_{FP}$ and $a^P_{FP}$. But $a^Q_{FP}$ and $a^P_{FP}$ tell us something about supply, not demand. They tell us how the feed price, a variable that shifts the supply curve, affects the equilibrium quantity and price.
• Similarly the supply model's price coefficient, $b^S_P$, depends on the reduced form (RF) coefficients of income, $a^Q_I$ and $a^P_I$. But $a^Q_I$ and $a^P_I$ tell us something about demand, not supply. They tell us how income, a variable that shifts the demand curve, affects the equilibrium quantity and price.
22.6.1 Review: Goal of Multiple Regression Analysis and the Interpretation of the Coefficients
• Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the
individual effect that each explanatory variable has on the dependent variable.
Consider a model with two explanatory variables:
$$y_t = \beta_{Const} + \beta_{x1}\,x1_t + \beta_{x2}\,x2_t + e_t$$
Since the actual parameters of the model, the β's, are unobservable, we estimate them:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
The coefficient estimates attempt to separate out the individual effect that each explanatory
variable has on the dependent variable. To explain what this means, focus on the estimate of the
first explanatory variable’s coefficient, bx1. It estimates the change in the dependent variable
resulting from a change in the explanatory variable 1 while all other explanatory variables remain
constant. More formally,
$$\Delta y = b_{x1}\,\Delta x1 \qquad\text{or}\qquad b_{x1} = \frac{\Delta y}{\Delta x1}$$
while all other explanatory variables remain constant. A little algebra explains why. We begin with the equation estimating our model:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
Now increase the explanatory variable 1 by Δx1 while keeping all other explanatory variables
constant. Δy estimates the resulting change in the dependent variable.
From To
Price: x1 → x1 + Δx1
Quantity: Esty → Esty + Δy
All other explanatory variables remain constant. In the equation estimating our model, substitute $Esty + \Delta y$ for $Esty$ and $x1 + \Delta x1$ for $x1$:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2 \qquad\qquad \text{Substituting}$$
$$Esty + \Delta y = b_{Const} + b_{x1}(x1 + \Delta x1) + b_{x2}\,x2 \qquad\qquad \text{Multiplying through by } b_{x1}$$
$$Esty + \Delta y = b_{Const} + b_{x1}\,x1 + b_{x1}\,\Delta x1 + b_{x2}\,x2 \qquad\qquad \text{Subtracting } Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
$$\Delta y = 0 + b_{x1}\,\Delta x1 + 0$$
Simplify:
$$\Delta y = b_{x1}\,\Delta x1 \qquad\text{or}\qquad \frac{\Delta y}{\Delta x1} = b_{x1}$$
while all other explanatory variables remain constant.
Using the same logic, we can interpret the estimate of the second explanatory variable’s coef-
ficient, bx2, analogously:
$$\Delta y = b_{x2}\,\Delta x2 \qquad\text{or}\qquad b_{x2} = \frac{\Delta y}{\Delta x2}$$
while all other explanatory variables remain constant. bx2 allows us to estimate the change in the
dependent variable when explanatory variable 2 changes while all other explanatory variables
remain constant.
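A tiny Python sketch of this interpretation, using made-up coefficient estimates (the numbers carry no meaning; they only illustrate the arithmetic):

```python
# Made-up coefficient estimates for a two-variable linear model
b_const, b_x1, b_x2 = 10.0, 3.0, -2.0
est_y = lambda x1, x2: b_const + b_x1 * x1 + b_x2 * x2

x1, x2 = 5.0, 7.0
dx1, dx2 = 1.5, 2.0

# Change x1 while holding x2 constant: the prediction changes by b_x1 * dx1
print(est_y(x1 + dx1, x2) - est_y(x1, x2), b_x1 * dx1)    # 4.5  4.5

# Change x2 while holding x1 constant: the prediction changes by b_x2 * dx2
print(est_y(x1, x2 + dx2) - est_y(x1, x2), b_x2 * dx2)    # -4.0  -4.0
```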
What happens when both explanatory variables change simultaneously? The total estimated change in the quantity demanded equals the sum of the individual changes:
$$\Delta y = b_{x1}\,\Delta x1 + b_{x2}\,\Delta x2$$
Each term estimates the change in the dependent variable resulting from a change in each individual explanatory variable. We will now apply the interpretation of the coefficient estimates to resolve the paradoxes.
We will resolve the paradoxes by applying the interpretation of the coefficient estimates:
• First, to the original simultaneous equations models.
• Second, to the reduced form (RF) equations.
22.6.2 Paradox: Demand Model Price Coefficient Depends on the Reduced Form (RF) Feed
Price Coefficients
We will first explain why the price coefficient estimate of the demand model, $b^D_P$, is determined by the reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$.
Recall the demand model:
$$EstQ^D = b^D_{Const} + b^D_P\,P + b^D_I\,Inc$$
Interpret the price coefficient estimate, bPD. The price coefficient estimate of the demand model
estimates the change in the quantity of beef demanded when price of beef changes while income
remains constant.
$$\Delta Q^D = b^D_P\,\Delta P \qquad\text{or}\qquad b^D_P = \frac{\Delta Q^D}{\Delta P}$$
while income remains constant. Since income remains constant, the demand curve does not shift; hence, $b^D_P$ is just the estimated “slope” of the demand curve for beef (figure 22.9).
Next consider the reduced form (RF) equations that estimate the quantity and price:
EstQ = 138,726 − 332.00FeedP + 17.347Inc
EstP = 33.027 + 1.0562FeedP + 0.018825Inc
Suppose that the feed price decreases while income remains constant. As shown in figure 22.10,
the decrease in feed prices shifts the supply curve for beef to the right.
Now interpret the feed price coefficients in the reduced form (RF) equations:
• The feed price coefficient of the quantity reduced form (RF) equation estimates the change in the quantity of beef when the feed price changes while income remains constant:
Figure 22.9
“Slope” of demand curve: with income constant, $b^D_P = \Delta Q^D / \Delta P$ along the demand curve D.
Figure 22.10
Feed price decreases and income remains constant: the supply curve shifts from S to S′, moving the equilibrium along the demand curve (ΔQ^D, ΔP).
ΔQ = −332.00ΔFeedP
ΔP = 1.0562ΔFeedP
while income remains constant. Divide ΔQ by ΔP; while income remains constant:
$$\frac{\Delta Q^D}{\Delta P} = \frac{-332.00\,\Delta FeedP}{1.0562\,\Delta FeedP} = \frac{-332.00}{1.0562} = -314.3$$
We now can appreciate why the “slope” of the demand curve for beef is estimated by the
reduced form (RF) feed price coefficients. Changes in the feed price cause the supply curve for
beef to shift. When the demand curve remains stationary, changes in the feed price move the
equilibrium from one point on the demand curve to another point on the same demand curve.
Consequently the feed price coefficients of the reduced form (RF) equations estimate how the
quantity and price change as we move along the demand curve because they are based on the
premise that income remains constant and therefore the demand curve remains stationary. The
reduced form (RF) feed price coefficients provide us with the information we need to calculate
the “slope” of the demand curve for beef.
22.6.3 Paradox: Supply Model Price Coefficient Depends on the Reduced Form (RF) Income
Coefficients
We will use similar logic to explain why the price coefficient estimate of the supply model, $b^S_P$, is determined by the reduced form (RF) income coefficient estimates, $a^Q_I$ and $a^P_I$. Recall the supply model:
$$EstQ^S = b^S_{Const} + b^S_P\,P + b^S_{FP}\,FeedP$$
Interpret the price coefficient estimate, $b^S_P$. The price coefficient estimate of the supply model estimates the change in the quantity of beef supplied when the price of beef changes while the feed price remains constant:
$$\Delta Q^S = b^S_P\,\Delta P$$
Figure 22.11
“Slope” of supply curve: with the feed price constant, $b^S_P = \Delta Q^S / \Delta P$ along the supply curve S.
Figure 22.12
Income increases and feed price remains constant: the demand curve shifts from D to D′, moving the equilibrium along the supply curve (ΔQ, ΔP).
$$b^S_P = \frac{\Delta Q^S}{\Delta P}$$
while the feed price remains constant. Since the feed price is constant, the supply curve does not shift; hence, $b^S_P$ is just the estimated “slope” of the supply curve for beef (figure 22.11).
Once again, consider the reduced form (RF) equations that estimate the quantity and price:
EstQ = 138,726 − 332.00FeedP + 17.347Inc
EstP = 33.027 + 1.0562FeedP + 0.018825Inc
Suppose that income increases and feed price remains constant. As shown in figure 22.12, the
demand curve will shift to the right.
Now interpret the income coefficients in the reduced form (RF) equations:
• The income coefficient of the quantity reduced form (RF) equation estimates the change in
beef quantity when income changes while feed prices remain constant:
ΔQ = 17.347ΔInc
ΔP = 0.018825ΔInc
$$\frac{\Delta Q}{\Delta P} = \frac{17.347\,\Delta Inc}{0.018825\,\Delta Inc} = \frac{17.347}{0.018825} = 921.5$$
while feed price remains constant.
Next recognize that this Q represents the quantity of beef supplied. The change in income
causes the demand curve to shift, but the supply curve remains stationary because the feed price
has remained constant. As figure 22.12 illustrates, we move from one point on the supply curve
to another point on the same supply curve. This movement represents a change in the quantity
of beef supplied, QS:
$$\frac{\Delta Q^S}{\Delta P} = 921.5$$
We can appreciate why the “slope” of the supply curve for beef is determined by the reduced
form (RF) income coefficients. Changes in income cause the demand curve for beef to shift.
When the supply curve remains stationary, changes in income move the equilibrium from one
point on the supply curve to another point on the same supply curve. Consequently the income
coefficients of the reduced form (RF) equations estimate how the quantity and price change as
we move along the supply curve because they are based on the premise that the feed price
remains constant and therefore the supply curve remains stationary. The reduced form (RF)
income coefficients provide us with the information we need to calculate the “slope” of the
supply curve for beef.
We now have developed some intuition regarding why the estimated “slope” of the demand curve
depends on the feed price reduced form (RF) coefficient estimates and why the estimated “slope”
of the supply curve depends on the income reduced form (RF) coefficient estimates. The coef-
ficient interpretation approach provides intuition and also gives us a bonus. The coefficient
interpretation approach provides us with a simple way to derive the relationships between the
estimated “slopes” of the demand and supply curves and the reduced form (RF) estimates.
Compare the algebra we just used to express the estimated “slopes” of the demand and supply
curves with the algebra used in the appendix to this chapter.
5. In a simultaneous equations model, is the reduced form (RF) estimation procedure for the
value of a coefficient for an endogenous explanatory variable
a. unbiased?
b. consistent?
6. What paradoxes arise when using the reduced form (RF) estimation procedure to estimate
the price coefficients of the demand/supply simultaneous equations model? Resolve the
paradoxes.
Chapter 22 Exercises
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
b. Consider the price reduced form (RF) estimates: EstP = 33.027 + 1.0562FeedP +
0.018825Inc
i. What equation estimates the change in the price, ΔP, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. When the “while” condition cited in part a is satisfied, how must the change in income, ΔInc, and the change in feed prices, ΔFeedP, be related? Solve the equation for ΔFeedP.
c. Consider the quantity reduced form (RF) estimates: EstQ = 138,726 − 332.00FeedP +
17.347Inc
i. What equation estimates the change in the quantity, ΔQ, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. Substitute in your answer to part b(ii). Then recall your answer to part a to calculate the numerical value of $b^D_I$.
b. Consider the price reduced form (RF) estimates: EstP = 33.027 + 1.0562FeedP + 0.018825Inc
i. What equation estimates the change in the price, ΔP, when both income changes by
ΔInc and the feed price changes by ΔFeedP?
ii. When the “while” condition cited in part a is satisfied, how must the change in income,
ΔInc, and the change in feed prices, ΔFeedP, be related? Solve the equation for ΔInc.
c. Consider the quantity reduced form (RF) estimates: EstQ = 138,726 − 332.00FeedP +
17.347Inc
i. What equation estimates the change in the quantity, ΔQ, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. Substitute in your answer to part b(ii). Then recall your answer to part a to calculate the numerical value of $b^S_{FP}$.
Now consider a different model describing the beef market: a constant elasticity model. The log
version of this model is
3. What are the reduced form (RF) equations for this model?
4. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the quantity reduced form (RF) regression. Use the regression results to estimate
the change in the log of the quantity, Δlog(Q), when the log of the feed price changes by
Δlog(FeedP) and the log of income changes by Δlog(Inc):
Δlog(Q) =
b. Focus on the price reduced form (RF) regression. Use the regression results to estimate
the change in the log of the price, Δlog(P), when the log of the feed price changes by
Δlog(FeedP) and the log of income changes by Δlog(Inc):
Δlog(P) =
Chicken market data: Monthly time series data relating to the market for chicken from 1980
to 1985.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
6. What are the reduced form (RF) equations for this model?
7. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the quantity reduced form (RF) regression. Use the regression results to estimate
the change in the quantity, ΔQ, when the feed price changes by ΔFeedP and income changes
by ΔInc:
ΔQ =
b. Focus on the price reduced form (RF) regression. Use the regression results to estimate
the change in the price, ΔP, when the feed price changes by ΔFeedP and income changes
by ΔInc:
ΔP =
Crime and police data: Annual time series data of US crime and economic statistics from 1988
to 2007.
Consider the following simultaneous equations model of crime and police expenditures
11. What are the reduced form (RF) equations for this model?
12. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the crimes reduced form (RF) regression. Use the regression results to estimate
the change in the crime rate, ΔCrimes, when the unemployment rate changes by ΔUnemRate
and per capita GDP changes by ΔGdpPC:
b. Focus on the police expenditure reduced form (RF) regression. Use the regression results
to estimate the change in police expenditures, ΔPoliceExp, when the unemployment rate
changes by ΔUnemRate and per capita GDP changes by ΔGdpPC:
Strategy to Derive the Reduced Form (RF) Equation for Pt
Equilibrium: $Q^D_t = Q^S_t = Q_t$
$$Q^D_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q^S_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Substitute
$$Q_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Subtract
$$0 = \beta^D_{Const} - \beta^S_{Const} + \beta^D_P P_t - \beta^S_P P_t - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
Solve
$$\beta^S_P P_t - \beta^D_P P_t = \beta^D_{Const} - \beta^S_{Const} - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
$$(\beta^S_P - \beta^D_P)\,P_t = \beta^D_{Const} - \beta^S_{Const} - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
$$P_t = \frac{\beta^D_{Const} - \beta^S_{Const}}{\beta^S_P - \beta^D_P} - \frac{\beta^S_{FP}}{\beta^S_P - \beta^D_P} FeedP_t + \frac{\beta^D_I}{\beta^S_P - \beta^D_P} Inc_t + \frac{e^D_t - e^S_t}{\beta^S_P - \beta^D_P}$$
Strategy to Derive the Reduced Form (RF) Equation for Qt
• Multiply the new equation for the demand model by $\beta^S_P$ and the new equation for the supply model by $\beta^D_P$.
• Subtract the new equation for the supply model from the new equation for the demand model.
• Solve for Qt.
$$Q^D_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q^S_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Substitute
$$Q_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Multiply
$$\beta^S_P Q_t = \beta^S_P\beta^D_{Const} + \beta^S_P\beta^D_P P_t + \beta^S_P\beta^D_I Inc_t + \beta^S_P e^D_t$$
$$\beta^D_P Q_t = \beta^D_P\beta^S_{Const} + \beta^D_P\beta^S_P P_t + \beta^D_P\beta^S_{FP} FeedP_t + \beta^D_P e^S_t$$
Subtract and solve
$$(\beta^S_P - \beta^D_P)\,Q_t = \beta^S_P\beta^D_{Const} - \beta^D_P\beta^S_{Const} - \beta^D_P\beta^S_{FP} FeedP_t + \beta^S_P\beta^D_I Inc_t + \beta^S_P e^D_t - \beta^D_P e^S_t$$
$$Q_t = \frac{\beta^S_P\beta^D_{Const} - \beta^D_P\beta^S_{Const}}{\beta^S_P - \beta^D_P} - \frac{\beta^D_P\beta^S_{FP}}{\beta^S_P - \beta^D_P} FeedP_t + \frac{\beta^S_P\beta^D_I}{\beta^S_P - \beta^D_P} Inc_t + \frac{\beta^S_P e^D_t - \beta^D_P e^S_t}{\beta^S_P - \beta^D_P}$$
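If you would rather not grind through this algebra by hand, the laborious steps can be checked symbolically. The sketch below uses Python's sympy package; the symbol names are ours, and the check is only a convenience, not part of the text's derivation.

```python
# Solve the two structural equations for the equilibrium P_t and Q_t with sympy
# and compare the result with the reduced form (RF) expressions derived above.
import sympy as sp

P, Q, FeedP, Inc, eD, eS = sp.symbols('P Q FeedP Inc e_D e_S')
bDc, bDp, bDi = sp.symbols('bDc bDp bDi')      # demand: constant, price, income
bSc, bSp, bSfp = sp.symbols('bSc bSp bSfp')    # supply: constant, price, feed price

demand = sp.Eq(Q, bDc + bDp*P + bDi*Inc + eD)
supply = sp.Eq(Q, bSc + bSp*P + bSfp*FeedP + eS)

sol = sp.solve([demand, supply], [P, Q], dict=True)[0]

# The FeedP coefficient of the Q equation should equal -bDp*bSfp/(bSp - bDp)
print(sp.simplify(sol[Q].diff(FeedP)))
# The Inc coefficient of the P equation should equal bDi/(bSp - bDp)
print(sp.simplify(sol[P].diff(Inc)))
```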
23 Simultaneous Equations Models—Identification

Chapter 23 Outline
23.1 Review
23.1.1 Demand and Supply Models
23.1.2 Ordinary Least Squares (OLS) Estimation Procedure
23.1.3 Reduced Form (RF) Estimation Procedure—One Way to Cope with Simultaneous
Equations Models
23.2 Two-Stage Least Squares (TSLS): An Instrumental Variable (IV) Two- Step Approach—A
Second Way to Cope with Simultaneous Equations Models
23.2.1 First Stage: Exogenous Explanatory Variable(s) Used to Estimate the Endogenous
Explanatory Variable(s)
23.2.2 Second Stage: In the Original Model, the Endogenous Explanatory Variable
Replaced with Its Estimate
23.3 Comparison of Reduced Form (RF) and Two-Stage Least Squares (TSLS) Estimates
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the model for the beef market that we used in the last chapter:
1. We will now introduce another estimation procedure for simultaneous equations models, the
two-stage least squares (TSLS) estimation procedure:
First stage: Use the exogenous explanatory variable(s) to estimate the endogenous explanatory
variable(s).
• Explanatory variable(s): All exogenous variables. In this case the exogenous variables are
FeedPt and Inct.
Using the ordinary least squares (OLS) estimation procedure, what equation estimates the
“problem” explanatory variable, the price of beef?
EstP = __________________________________
Generate a new variable, EstP, that estimates the price of beef based on the first stage.
and the second stage of the two-stage least squares (TSLS) estimation procedure:
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variable(s): First-stage estimate of the endogenous explanatory variable and
the relevant exogenous explanatory variables. In this case the estimate of the price of beef and
income, EstPt and Inct.
a. Using the ordinary least squares (OLS) estimation procedure, estimate the EstP coefficient
of the demand model.
b. Compare the two-stage least squares (TSLS) coefficient estimate for the demand model
with the estimate computed using the reduced form (RF) estimation procedure in chapter 22.
and the second stage of the two-stage least squares (TSLS) estimation procedure:
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
a. Focus on the reduced form (RF) estimates for the income coefficients:
i. The reduced form (RF) income coefficient estimates, $a^Q_I$ and $a^P_I$, allowed us to estimate the “slope” of which curve?
ii. If the reduced form (RF) income coefficient estimates were not available, would we
be able to estimate the “slope” of this curve?
b. Focus on the reduced form (RF) estimates for the feed price coefficients:
i. The reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$, allowed us to estimate the “slope” of which curve?
ii. If the reduced form (RF) feed price coefficient estimates were not available, would we
be able to estimate the “slope” of this curve?
23.1 Review
For the economist, arguably the most important example of a simultaneous equations model is
the demand/supply model:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
• Endogenous variables: Variables determined “within” the model: Quantity and Price.
• Exogenous variables: Variables determined “outside” the model.
Figure 23.1
Demand/supply model: Q = equilibrium quantity, P = equilibrium price.
In the last chapter we learned why simultaneous equations cause a problem for the ordinary least
squares (OLS) estimation procedure:
In the demand/supply model, the price is an endogenous explanatory variable. When we used
the ordinary least squares (OLS) estimation procedure to estimate the value of the price coef-
ficient in the demand and supply models, we observed that a problem emerged. In each model,
price and the error term were correlated resulting in bias; the price is the “problem” explanatory
variable (figure 23.2).
So where did we go from here? We explored the possibility that the ordinary least squares
(OLS) estimation procedure might be consistent. After all, is not “half a loaf” better than none?
We took advantage of our Econometrics Lab to address this issue. Recall the distinction between
an unbiased and a consistent estimation procedure:
Unbiased: The estimation procedure does not systematically underestimate or overestimate the
actual value; that is, after many, many repetitions the average of the estimates equals the actual
value.
Consistent but biased: A consistent estimation procedure can be biased. But as the sample size, the number of observations, grows:
Figure 23.2
Correlation of price and error terms. In the demand model, $e^D_t$ up → $P_t$ up and $e^D_t$ down → $P_t$ down: price and the demand error term are positively correlated, and the ordinary least squares (OLS) estimation procedure for the coefficient value is biased upward. In the supply model, $e^S_t$ up → $P_t$ down and $e^S_t$ down → $P_t$ up: price and the supply error term are negatively correlated, and the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward.
• The magnitude of the bias decreases. That is, the mean of the coefficient estimate’s probability
distribution approaches the actual value.
• The variance of the estimate’s probability distribution diminishes and approaches 0.
Unfortunately, the Econometrics Lab illustrates the sad fact that the ordinary least squares (OLS)
estimation procedure is neither unbiased nor consistent.
We then considered an alternative estimation procedure: the reduced form (RF) estimation procedure. Our Econometrics Lab taught us that while the reduced form (RF) estimation procedure is biased, it is consistent. That is, as the sample size grows, the average of the coefficient estimates gets “closer and closer” to the actual value and the variance grows smaller and smaller. Arguably, when choosing between two biased estimation procedures, it is better to use the one that is consistent. This represents the econometrician's pragmatic, “half a loaf is better than none” philosophy. We will now quickly review the reduced form (RF) estimation procedure.
Reduced Form (RF) Estimation Procedure—One Way to Cope with Simultaneous Equations
Models
We begin with the simultaneous equations model and then construct the reduced form (RF) equations:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
We use the ordinary least squares (OLS) estimation procedure to estimate the reduced form (RF)
parameters (tables 23.1 and 23.2) and then use the ratio of the reduced form (RF) estimates to
estimate the “slopes” of the demand and supply curves.
Table 23.1
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.2
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
$$b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-332.00}{1.0562} = -314.3, \qquad b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{17.347}{0.018825} = 921.5$$
23.2 Two-Stage Least Squares (TSLS): An Instrumental Variable (IV) Two-Step Approach—A
Second Way to Cope with Simultaneous Equations Models
Another way to estimate simultaneous equations models is the two-stage least squares (TSLS) estimation procedure. As the name suggests, the procedure involves two stages. As we will see, two-stage least squares (TSLS) uses a strategy that is similar to the instrumental variable (IV) approach.
First stage: Use the exogenous explanatory variable(s) to estimate the endogenous explanatory
variable(s).
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
We will now illustrate the two-stage least squares (TSLS) approach by considering the beef
market.
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the model for the beef market that we used in the last chapter:
Equilibrium: Q Dt = Q St = Qt
The strategy for the first stage is similar to the strategy used by the instrumental variable (IV)
approach. The endogenous explanatory variable is the source of the bias; consequently the
endogenous explanatory variable is the “problem” explanatory variable. In the first stage the
endogenous explanatory variable is the dependent variable. The explanatory variables are all
the exogenous variables. In our example, price is the endogenous explanatory variable; conse-
quently price becomes the dependent variable in the first stage. The exogenous variables, income
and feed price, are the explanatory variables.
23.2.1 First Stage: Exogenous Explanatory Variable(s) Used to Estimate the Endogenous
Explanatory Variable(s)
Table 23.3
OLS regression results—TSLS first-stage equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
• Explanatory variable(s): All exogenous variables. In this case the exogenous variables are
FeedPt and Inct.
Using these regression results we estimate the price of beef based on the exogenous variables,
income and feed price (table 23.3).
The strategy for the second stage is also similar to the instrumental variable (IV) approach.
We return to the original model and replace the endogenous explanatory variable with its estimate
from stage 1. The dependent variable is the original dependent variable, quantity. The explana-
tory variables are stage 1’s estimate of the price and the relevant exogenous variables. In our
example we have two models, one for demand and one for supply; accordingly we first apply
the second stage to demand and then to supply.
23.2.2 Second Stage: In the Original Model, the Endogenous Explanatory Variable Replaced
with Its Estimate
Demand Model
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variables: First-stage estimate of the endogenous explanatory variable and the
relevant exogenous explanatory variables. In this case the estimate of the price of beef and
income, EstPt and Inct.
Table 23.4
OLS regression results—TSLS second-stage demand equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.5
OLS regression results—TSLS second-stage supply equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Supply Model
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variables: First-stage estimate of the “problem” endogenous explanatory variable and the relevant exogenous explanatory variables. In this case the estimate of the price of beef and the feed price, EstPt and FeedPt.
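A minimal Python sketch of the two stages, assuming the beef market data have been loaded into a pandas data frame named `beef` with columns Q, P, FeedP, and Inc (the file name and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

beef = pd.read_csv("beef_market.csv")    # hypothetical file holding the beef market data

# First stage: regress the endogenous explanatory variable, P, on all exogenous variables
X1 = sm.add_constant(beef[["FeedP", "Inc"]])
beef["EstP"] = sm.OLS(beef["P"], X1).fit().fittedvalues

# Second stage, demand model: Q on EstP and Inc
demand = sm.OLS(beef["Q"], sm.add_constant(beef[["EstP", "Inc"]])).fit()

# Second stage, supply model: Q on EstP and FeedP
supply = sm.OLS(beef["Q"], sm.add_constant(beef[["EstP", "FeedP"]])).fit()

print(demand.params["EstP"], supply.params["EstP"])   # estimated price coefficients ("slopes")
```

One caution: the standard errors reported by the second-stage OLS regressions are not the correct two-stage least squares (TSLS) standard errors; dedicated TSLS routines adjust for the fact that EstP is itself an estimate.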
23.3 Comparison of Reduced Form (RF) and Two-Stage Least Squares (TSLS) Estimates
Compare the estimates from the reduced form (RF) approach with the estimates from the two-
stage least squares (TSLS) approach (table 23.6). The estimates are identical. In this case the
Table 23.6
Comparison of reduced form (RF) and two-stage least squares (TSLS) price coefficient estimates
Table 23.7
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP and Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
reduced form (RF) estimation procedure and the two-stage least squares (TSLS) estimation
procedure produce identical results.
Many statistical packages provide an easy way to apply the two-stage least squares (TSLS)
estimation procedure so that we do not need to generate the estimate of the endogenous explana-
tory variable ourselves (tables 23.7 and 23.8).
Table 23.8
TSLS regression results—Supply model
Dependent variable: Q
Instrument(s): FeedP and Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
EViews makes it very easy for us to use the two-stage least squares (TSLS) approach. EViews does most of the work for us, eliminating the need to generate a new variable.
Note that these are the same estimates that we obtained when we generated the estimates of the
price on our own.
Let us step back for a moment to review our beef market model.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
We can use the coefficient interpretation approach to estimate the “slopes” of the demand and supply curves in terms of the reduced form (RF) estimates (figure 23.3).1
Intuition: Critical Role of the Exogenous Variable Absent from the Model
In each model there is one absent exogenous variable and one endogenous explanatory variable. This one-to-one correspondence allows us to estimate the coefficient of the endogenous explanatory variable, price.
1. Again, recall that the coefficients do not equal the slope of the demand curve, but rather the reciprocal of the slope.
They are the ratio of run over rise instead of rise over run. This occurs as a consequence of the economist’s convention
of placing quantity on the horizontal axis and price on the vertical axis. To avoid the awkwardness of using the expres-
sion “the reciprocal of the slope” repeatedly, we will place the word slope within double quotes to indicate that it is the
reciprocal.
Figure 23.3
Reduced form (RF) and coefficient interpretation approach—Identified

Suppose that FeedP increases while Inc remains constant: ΔQ = $a^Q_{FP}\,\Delta FeedP$ and ΔP = $a^P_{FP}\,\Delta FeedP$. Changes in the feed price shift the supply curve, but not the demand curve, so we move from one equilibrium to another on the same demand curve. This movement represents a change in the quantity of beef demanded, $Q^D$. Estimated “slope” of the demand curve:
$$b^D_P = \frac{\Delta Q}{\Delta P} = \frac{a^Q_{FP}\,\Delta FeedP}{a^P_{FP}\,\Delta FeedP} = \frac{a^Q_{FP}}{a^P_{FP}}$$
The exogenous variable absent in the demand model, FeedP, allows us to estimate the “slope” of the demand curve.

Suppose instead that Inc increases while FeedP remains constant: ΔQ = $a^Q_{I}\,\Delta Inc$ and ΔP = $a^P_{I}\,\Delta Inc$. Changes in income shift the demand curve, but not the supply curve, so we move from one equilibrium to another on the same supply curve. This movement represents a change in the quantity of beef supplied, $Q^S$. Estimated “slope” of the supply curve:
$$b^S_P = \frac{\Delta Q}{\Delta P} = \frac{a^Q_{I}\,\Delta Inc}{a^P_{I}\,\Delta Inc} = \frac{a^Q_{I}}{a^P_{I}}$$
The exogenous variable absent in the supply model, Inc, allows us to estimate the “slope” of the supply curve.
Order Condition
The order condition formalizes this relationship:
Compare the number of exogenous explanatory variables absent from the model with the number of endogenous explanatory variables included in the model:
• Less than: the model is underidentified; there is no reduced form (RF) estimate.
• Equal to: the model is identified; there is a unique reduced form (RF) estimate.
• Greater than: the model is overidentified; there are multiple reduced form (RF) estimates.
Our beef market example is identified. For both the demand model and the supply model, the number of exogenous variables absent from the model equals the number of endogenous explanatory variables included in the model.
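The check is mechanical enough to express as a small helper function; the sketch below simply encodes the rule, with names of our own choosing:

```python
def order_condition(n_absent_exog: int, n_endog_explanatory: int) -> str:
    """Compare the number of exogenous variables absent from an equation with the
    number of endogenous explanatory variables the equation includes."""
    if n_absent_exog < n_endog_explanatory:
        return "underidentified: no RF estimate"
    if n_absent_exog == n_endog_explanatory:
        return "identified: unique RF estimate"
    return "overidentified: multiple RF estimates"

# Beef market demand model: FeedP is absent, P is the only endogenous explanatory variable
print(order_condition(1, 1))    # identified: unique RF estimate
```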
23.5.2 Underidentification
We will now illustrate the underidentification problem. Suppose that no income information
was available. Obviously, if we have no income information, we cannot include Inc as an
explanatory variable in our models:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
Let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model. Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model and
• not be able to estimate the coefficient of the endogenous explanatory variable, P, in the supply
model.
The coefficient interpretation approach explains why. We can still estimate the “slope” of the demand curve by calculating the ratio of the reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$, but we cannot estimate the “slope” of the supply curve since we cannot estimate the reduced form (RF) income coefficients. We will use the coefficient interpretation approach to explain this phenomenon and to take advantage of the intuition it provides (figure 23.4).

Figure 23.4
Reduced form (RF) and coefficient interpretation approach—Underidentified: changes in the feed price still shift the supply curve but not the demand curve, so the exogenous variable absent in the demand model, FeedP, still allows us to estimate the “slope” of the demand curve, $b^D_P = a^Q_{FP}/a^P_{FP}$; with no income data there is no exogenous variable left to trace out the supply curve.
There is both good news and bad news when we have feed price information but no income
information:
Good news: On the one hand, since we still have feed price information, we have information
about supply curve shifts. The shifts in the supply curve cause the equilibrium quantity and price
to move along the demand curve. In other words, shifts in the supply curve “trace out” the
demand curve; hence we can still estimate the “slope” of the demand curve.
Bad news: On the other hand, since we have no income information, we have no information
about demand curve shifts. Without knowing how the demand curve shifts we have no idea how
the equilibrium quantity and price move along the supply curve. In other words, we cannot “trace
out” the supply curve; hence we cannot estimate the “slope” of the supply curve.
To use the reduced form (RF) approach to estimate the “slope” of the demand curve, we first
use ordinary least squares (OLS) to estimate the parameters of the reduced form (RF) equations
(tables 23.9 and 23.10).
Table 23.9
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.10
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we can estimate the “slope” of the demand curve by calculating the ratio of the feed price
estimates:
$$\text{Estimated “slope” of the demand curve} = b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-821.85}{0.52464} = -1{,}566.5$$
Now let us use the two-stage least squares (TSLS) estimation procedure to estimate the “slope”
of the demand curve (table 23.11). In both cases the estimated “slope” of the demand curve is
−1,566.5.
When we try to use two-stage least squares (TSLS) to estimate the “slope” of the supply curve
the statistical software will report an error. We are asking the statistical software to do the
impossible.
Similarly an underidentification problem would exist if income information was available, but
feed price information was not.
Table 23.11
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP
Explanatory variable(s): Estimate SE t-Statistic Prob
Equilibrium: $Q^D_t = Q^S_t = Q_t$
Again, let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model:
• Less than: the model is underidentified; there is no reduced form (RF) estimate.
• Equal to: the model is identified; there is a unique reduced form (RF) estimate.
• Greater than: the model is overidentified; there are multiple reduced form (RF) estimates.
Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the supply
model and
• not be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model.
Good news: Since we have income information, we still have information about demand curve
shifts. The shifts in the demand curve cause the equilibrium quantity and price to move along
the supply curve. In other words, shifts in the demand curve “trace out” the supply curve; hence
we can still estimate the “slope” of the supply curve.
Bad news: On the other hand, since we have no feed price information, we have no information
about supply curve shifts. Without knowing how the supply curve shifts we have no idea how
the equilibrium quantity and price move along the demand curve. In other words, we cannot
“trace out” the demand curve; hence we cannot estimate the “slope” of the demand curve.
To use the reduced form (RF) approach to estimate the “slope” of the supply curve, we first
use ordinary least squares (OLS) to estimate the parameters of the reduced form (RF) equations
(tables 23.12 and 23.13).
Figure 23.5
Reduced form (RF) and coefficient interpretation approach—Underidentified: the exogenous variable absent in the supply model, Inc, still allows us to estimate the “slope” of the supply curve.
Table 23.12
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.13
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.14
TSLS regression results—Supply equation
Dependent variable: Q
Instrument(s): Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we can estimate the “slope” of the supply curve by calculating the ratio of the income
estimates:
$$\text{Estimated “slope” of the supply curve} = b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{20.225}{0.009669} = 2{,}091.7$$
Once again, two-stage least squares (TSLS) provides the same estimate (table 23.14). Also, when we try to use two-stage least squares (TSLS) to estimate the “slope” of the demand curve, the statistical software will report an error.
23.5.3 Overidentification
While an underidentification problem arises when too little information is available, an overi-
dentification problem arises when, in some sense, too much information is available. To illus-
trate this, suppose that in addition to the feed price and income information, the price of chicken
is also available. Since beef and chicken are substitutes, the price of chicken would appear as
an exogenous explanatory variable in the demand model. The simultaneous equations model and
the reduced form (RF) estimates would change:
Figure 23.6
Reduced form (RF) and coefficient interpretation approach—Overidentified: with the feed price constant, an increase in income and an increase in the chicken price each shift the demand curve but not the supply curve, so each traces out the supply curve, yielding two reduced form (RF) estimates of the supply “slope”: $b^S_P = a^Q_I/a^P_I$ and $b^S_P = a^Q_{CP}/a^P_{CP}$.

Let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model. Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model and
• encounter some complications when estimating the coefficient of the endogenous explanatory
variable, P, in the supply model. In fact, the reduced form (RF) estimation procedure provides
multiple estimates.
We will now explain why the multiple estimates result (figure 23.6). We have two exogenous
factors that shift the demand curve: income and the price of chicken. Consequently there are
two ways to “trace out” the supply curve. There are now two different ways to use the reduced
form (RF) estimates to estimate the “slope” of the supply curve:
• Ratio of the reduced form (RF) income coefficients: estimated “slope” of the supply curve, $b^S_P = a^Q_{I}/a^P_{I}$.
• Ratio of the reduced form (RF) chicken price coefficients: estimated “slope” of the supply curve, $b^S_P = a^Q_{CP}/a^P_{CP}$.
We will go through the mechanics of the reduced form (RF) estimation procedure to illustrate the overidentification problem. First we use the ordinary least squares (OLS) estimation procedure to estimate the reduced form (RF) parameters (tables 23.15 and 23.16).
Table 23.15
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.16
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we use the reduced form (RF) estimates to compute the estimates for the “slopes” of
the demand and supply curves:
The reduced form (RF) estimation procedure produces two different estimates for the “slope”
for the supply curve. This is what we mean by overidentification.
While the reduced form (RF) estimation procedure cannot resolve the overidentification problem, the two-stage least squares (TSLS) approach can. The two-stage least squares (TSLS) estimation procedure provides a single estimate of the “slope” of the supply curve. The following regression printouts reveal this (tables 23.17 and 23.18).
Table 23.17
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP, Inc, and ChickP
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.18
TSLS regression results—Supply model
Dependent variable: Q
Instrument(s): FeedP, Inc, and ChickP
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.19
Comparison of RF and TSLS estimates
The estimated “slope” of the demand curve is −366.0. This is the same estimate as computed
by the reduced form (RF) estimation procedure. Two-stage least squares (TSLS) provides a
single estimate for the “slope” of the supply curve:
bPS = 893.5
Table 23.19 compares the estimates that result when using the two different estimation proce-
dures. Note that on the one hand, the demand model is not overidentified. Both the reduced form
(RF) estimation procedure and the two-stage least squares (TSLS) estimation procedure provide
the same estimate for the “slope” of the demand curve. On the other hand, the supply model is
overidentified. The reduced form (RF) estimation procedure provides two estimates for the “slope”
of the supply curve; the two-stage least squares (TSLS) estimation procedure provides only one.
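The distinction is easy to reproduce with simulated data. In the Python sketch below every parameter value and distribution is an illustrative assumption; the point is only that the two reduced form (RF) ratios for the supply “slope” differ in a finite sample while two-stage least squares (TSLS) returns a single estimate.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 120
inc = rng.uniform(500, 1500, T)
chickp = rng.uniform(20, 80, T)
feedp = rng.uniform(10, 50, T)
eD, eS = rng.normal(0, 40, T), rng.normal(0, 40, T)

# Assumed structural model: demand Q = 500 - 4P + 0.5Inc + 3ChickP + eD,
#                           supply Q = 100 + 1P - 2FeedP + eS
p = (500 - 100 + 0.5*inc + 3.0*chickp + 2.0*feedp + eD - eS) / (1.0 - (-4.0))
q = 100 + 1.0*p - 2.0*feedp + eS

# Reduced form (RF) regressions of Q and P on all exogenous variables
X = sm.add_constant(np.column_stack([feedp, inc, chickp]))
aQ = sm.OLS(q, X).fit().params
aP = sm.OLS(p, X).fit().params
print(aQ[2]/aP[2], aQ[3]/aP[3])   # two RF estimates of the supply "slope" (near 1.0, but unequal)

# TSLS: first stage for P uses all exogenous variables; second stage for the supply model
estP = sm.OLS(p, X).fit().fittedvalues
print(sm.OLS(q, sm.add_constant(np.column_stack([estP, feedp]))).fit().params[1])   # single estimate
```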
Compare the number of exogenous explanatory variables absent from the model with the number of endogenous explanatory variables included in the model:
• Less than: the model is underidentified. There is no reduced form (RF) estimate, and two-stage least squares (TSLS) reports an error.
• Equal to: the model is identified. There is a unique reduced form (RF) estimate, and the reduced form (RF) and two-stage least squares (TSLS) estimates are identical.
• Greater than: the model is overidentified. The reduced form (RF) estimation procedure provides multiple estimates, while two-stage least squares (TSLS) provides a unique estimate.
Chapter 23 Exercises
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the following constant elasticity model describing the beef market:
1. Suppose that there were no data for the price of chicken and income; that is, while you can
include the variable FeedP in your analysis, you cannot use the variables Inc and ChickP.
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If so, what is (are) the estimate (estimates)?
3. Last suppose that you can use all the variables in your analysis.
Chicken market data: Monthly time series data relating to the market for chicken from 1980
to 1985.
Consider the following constant elasticity model describing the chicken market:
4. Suppose that there were no data for the price of pork and income; that is, while you can
include the variable FeedP in your analysis, you cannot use the variables Inc and PorkP.
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If not, explain why not. If so, does the reduced form (RF) estimation procedure provide a single estimate? What is (are) the estimate (estimates)?
b. Consider the two-stage least squares (TSLS) estimation procedure:
i. Can we estimate the own price elasticity of demand, $\beta^D_P$? If so, what is (are) the estimate (estimates)?
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If so, what is (are) the estimate (estimates)?
In general, compare the reduced form (RF) estimation procedure and the two-stage least squares
(TSLS) estimation procedure.
7. When the reduced form (RF) estimation procedure provides no estimate for a coefficient,
how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
8. When the reduced form (RF) estimation procedure provides a single estimate for a coefficient,
how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
How are the estimates related?
9. When the reduced form (RF) estimation procedure provides multiple estimates for a coeffi-
cient, how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
24 Binary and Truncated Dependent Variables

Chapter 24 Outline
24.1 Introduction
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
Consider the following model explaining the winning party in each state:
1. Devise a theory regarding how the population density of a state affects voting behavior. That is, as a state's population density increases, thereby becoming more urban, will the state become more or less likely to vote Democratic? What does your theory suggest about the sign of the coefficient for PopDen_t?
2. Construct a scatter diagram to illustrate the relationship between PopDen and WinDem1 by plotting PopDen on the horizontal axis and WinDem1 on the vertical axis.
Salaries and hitting performance of American League hitters: Cross-sectional 2011 salary and
2010 performance data for all hitters on Major League rosters at the opening of the 2011 season.
4. Devise a theory regarding how on base percentage affects salary. What does your theory suggest about the sign of the coefficient for OnBasePct_t?
5. Construct a scatter diagram to illustrate the relationship between OnBasePct and Salary by
plotting OnBasePct on the horizontal axis and Salary on the vertical axis.
24.1 Introduction
We will now consider two special problems that the dependent variable can create when using
the ordinary least squares (OLS) estimation procedure:
• The first arises when the dependent variable is binary, that is, when the dependent variable
can only take on two values such as Yes/No or True/False. In this case one of the two possibili-
ties is represented with a 0 and the other with a 1; that is, the dependent variable is a dummy
variable.
• The second problem arises when the dependent variable can never be greater than a specific value and/or less than a specific value. For example, in the United States the wage an employee can legally be paid cannot fall below the federal minimum wage in most states and occupations. Currently the federally mandated minimum wage is $7.25 per hour.
We will now show that whenever either of these problems is present, the ordinary least squares
(OLS) estimation procedure can produce erroneous results.
The number of votes a state receives in the Electoral College equals the number of congress-
people the state sends to Washington: the number of Representatives plus the number of Sena-
tors, two.1 With the exception of two states, Maine and Nebraska, all a state’s electoral votes are
awarded to the presidential candidate receiving the most votes. We will focus on the 2004 presi-
dential election in which the Democrat, John Kerry, challenged the incumbent Republican,
George W. Bush.2
Project: Assess the effect of state population density on state Electoral College winner.
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
Table 24.1 reports on the winner of each state’s electoral votes along with each state’s popula-
tion density. Note that the table ranks the states in order of their population density. As you can
see, the Republicans won all of the eleven least densely populated states. The Democrats won
all of the seven most densely populated states. The Republicans and Democrats split the states
in the middle. This observation allows us to formulate a theory:
Theory: The party winning a state’s Electoral College vote depends on the state’s population
density; as a state becomes more densely populated, the Democrats rather than the Republicans
become more likely to win.
To assess the theory we begin by constructing a scatter diagram (figure 24.1). Each point
represents one state. The dependent variable, WinDem1, equals 1 whenever the Democrats win and 0 whenever the Republicans win; it is a binary variable, a dummy variable, taking on only one of two values, either 0 or 1.
1. The District of Columbia has three votes even though it has no (voting) members of Congress. Consequently the
District of Columbia was not included in our analysis, although its inclusion would not affect our conclusions.
2. While Maine and Nebraska do not have a “winner take all” system, in 2004 all of the state’s electoral votes were
won by a single party in each of these states. The Democrats won all of Maine’s electoral votes and the Republicans
won all of Nebraska’s votes.
Table 24.1
2004 Electoral College winners by state
Columns (repeated in two blocks across the page): State, Population density (persons per sq mi), Winner
The scatter diagram appears to support our theory. As the population density increases, states
tend to support Democrats rather than Republicans. All states whose population density was less
than 35 persons per square mile voted Republican while all states whose population density was
greater than 400 persons per square mile voted Democrat. States whose population density was
between 35 and 400 persons per square mile were split. Next we will formulate a model to assess
the theory.
Figure 24.1
Scatter diagram—Population density versus election winner
The linear probability model is just the “standard” linear specification. The dependent variable, WinDem1_t, is the winning party indicated by a 0 or 1 and the explanatory variable is population density, PopDen_t:
$$WinDem1_t = \beta_{Const} + \beta_{PopDen}\,PopDen_t + e_t$$
Table 24.2
OLS regression results—2004 presidential election
Using this equation let us calculate estimates for some selected states:
These estimates for the probability of the Democrats winning each state appear to be reasonable.
Alaska’s population density is very low, and we estimate that the probability of a Democrat win
is low also, only 0.193. Florida’s population density falls in the middle, between 35 and 400;
we estimate that the probability of a Democrat win in Florida is about half. Maryland’s popula-
tion density is high, above 400; we estimate that the probability of a Democrat win in Maryland
is high also, 0.759.
Let us calculate probability estimates for Massachusetts, Rhode Island, and New Jersey:
Figure 24.2
Scatter diagram—Population density versus winning party with linear best fitting line (PopDen on the horizontal axis, estimated probability on the vertical axis)
The probability estimates for these states are nonsensical. Remember that a probability cannot
be less than 0 or greater than 1. On the one hand, if the probability of an event equals 0, that
event cannot occur. On the other hand, if the probability of an event equals 1, the event will occur
with certainty. An event cannot be more certain than certain. A probability of 1.013 or 1.217 or
1.354 simply does not make sense.
It is easy to understand why the linear probability model produces the nonsensical results;
just graph the best fitting line (figure 24.2). The probability model is linear; consequently, the
slope of the best fitting line is a constant, 0.001. Since the slope is a positive constant, the esti-
mated probability will exceed 1 whenever the population density is large enough.
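To make the problem concrete, suppose, hypothetically, that the estimated constant is roughly 0.19, a value broadly consistent with the Alaska estimate reported above. The estimated probability, 0.19 + 0.001 × PopDen, then reaches 1 once the population density exceeds (1 − 0.19)/0.001, about 810 persons per square mile. Massachusetts, Rhode Island, and New Jersey all lie above such a threshold, which is why their estimates exceed 1.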
24.2.3 Probit Probability Model: Correcting the Linear Model’s Intrinsic Problems
Figure 24.3
Scatter diagram—Population density versus winning party with stretched S-shaped best fitting line
• When the population density is “small,” a change in the population density results in only a
small change in probability. When the population density is low initially we would expect the
probability of a Democratic win to be low already. Therefore any subsequent decrease in
population density must reduce the estimated probability only by a small amount; otherwise, a
nonsensical negative probability would result.
• When the population density is “large,” a change in the population density results in only a
small change in probability; for example, when the population density rises from 800 to 1,000,
the estimated probability of a Democratic win rises, but by only a little. When the population
density is high initially we would expect the probability of a Democratic win to be high already.
Therefore any subsequent increase in the population density must raise the estimated probability
only by a small amount; otherwise, a nonsensical probability exceeding 1 would result.
• When the population density is “moderate,” a change in the population density results in a
large change in probability; for example, the change in the estimated probability of a Democratic
win is large in the 500 to 700 range.
3. Another procedure that is used frequently is logit. While the probit and logit procedures do not produce identical
results, rarely do the results differ in any substantial way.
Table 24.3
Party affiliation—Simple probit example data
State   PopDen (x)   Winner   y (WinDem1)
1       100          Rep      0
2       450          Dem      1
3       550          Rep      0
4       900          Dem      1
Figure 24.4
Scatter diagram—Simplified example
To see how the probit estimation procedure works, consider a simplified example with only four
states. Let xt equal the population density, PopDent, and yt equal the dummy variable representing
the winning party, WinDem1t (table 24.3).
The first state votes Republican and has a population density of 100 persons per square mile.
The second state votes Democratic and has a population density of 450. The third state votes
Republican and has a population density of 550. The fourth state votes Democratic and has a
population density of 900. The scatter diagram appears in Figure 24.4:
Figure 24.5
Scatter diagram—Simplified example with stretched S-shaped best fitting line
The easiest way to understand how the probit model works is to use an example. Let us begin
by considering one possible transformation function: z = −2 + 0.004x. For the moment, do not
worry about why we are using this particular transformation function. We simply wish to show
how the probit estimation procedure constructs its stretched S-shaped lines.
Begin by calculating the probability that state 1 votes Democratic. Its population density, xt,
equals 100; we simply plug it into the transformation function:

z = −2 + 0.004x = −2 + 0.004(100) = −2 + 0.4 = −1.6
Next we turn to the normal distribution to calculate the probability of the Democrats and Repub-
licans winning in the state. We can use the Econometrics Lab to do so (figure 24.6).
For x = 100, z = −1.6; the normal distribution gives Est Prob[Dem] = Esty = 0.0548 and Est Prob[Rep] = 1 − Esty = 0.9452 (figure 24.6).
Figure 24.6
x equals 100
For state 2, x = 450 and z = −2 + 0.004(450) = −0.2; the normal distribution gives Est Prob[Dem] = Esty = 0.4207 and Est Prob[Rep] = 1 − Esty = 0.5793 (figure 24.7).
Figure 24.7
x equals 450
Table 24.4
Simple probit example probability calculations
State   Winning party   Actual y   x     z = −2 + 0.004x   Est Prob[Dem] (Esty)   Est Prob[Rep] (1 − Esty)   Prob of actual y
1       Rep             0          100   −1.6              0.0548                 0.9452                     0.9452
2       Dem             1          450   −0.2              0.4207                 0.5793                     0.5793
3       Rep             0          550   0.2               0.5793                 0.4207                     0.4207
4       Dem             1          900   1.6               0.9452                 0.0548                     0.9452
Figure 24.8
Scatter diagram—Population density versus winning party with stretched S-shaped best fitting line
The probability of the Democrats winning in state 2 equals 0.4207; the probability of the Repub-
licans winning equals 0.5793.
Table 24.4 reports the probabilities for the four states (plotted in figure 24.8). Based on the trans-
formation function z = −2 + 0.004x, the probability that the:
• first state votes Republican equals 0.9452.
• second state votes Democratic equals 0.4207.
• third state votes Republican equals 0.4207.
• fourth state votes Democratic equals 0.9452.
Now let us calculate the probability of the actual result, that is, the probability that the first
state votes Republican and the second state votes Democratic and the third state votes Republican
and the fourth state votes Democratic; this equals the product of the individual probabilities:
Prob[1st is Rep AND 2nd is Dem AND 3rd is Rep AND 4th is Dem]
= Prob[1st Rep] × Prob[2nd Dem] × Prob[3rd Rep] × Prob[4th Dem]
= 0.9452 × 0.4207 × 0.4207 × 0.9452 = 0.1582
How should we choose the transformation function, that is, the constant and coefficient in
z = constant + coefficient × x? The answer is that we choose the equation that maximizes the
likelihood of obtaining the actual result. That is, choose the equation that maximizes the likelihood
that the first state votes Republican, the second Democratic, the third Republican, and the fourth Democratic.
We already calculated this probability for the transformation function z = −2 + 0.004x; it
equals 0.1582. Obviously the search for the “best” transformation function could be very time-
consuming. Our Econometrics Lab can help, however.
We can specify different constants and coefficients by moving the slide bars (figure 24.9). The
simulation quickly calculates the probability of obtaining the actual results for the transformation
functions with different constants and coefficients.
Initially, a constant of −2.00 and a coefficient of 0.004 are selected, the values we used to
explain the construction of the stretched S-shaped line. Note that the probabilities we calculated
and those calculated by the lab are identical. But let us try some other constants and coefficients.
Table 24.5 reports on several different transformation functions:
It looks like a constant of −2.00 and a coefficient of 0.004 are the best. That is, the transforma-
tion function z = −2 + 0.004x maximizes the probability of obtaining the actual results. Fortu-
nately, computational techniques have been devised to estimate the best transformation functions
quickly. Statistical software uses such algorithms (table 24.6).
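To see how such an algorithm might work, here is a minimal sketch in Python (not the book's Econometrics Lab or EViews) that evaluates the probability of the actual result for the four-state example and searches a grid of constants and coefficients; the grid values are illustrative choices, not taken from the text.

import numpy as np
from scipy.stats import norm

x = np.array([100.0, 450.0, 550.0, 900.0])   # population densities of the four states
y = np.array([0, 1, 0, 1])                   # 1 = Democrats win, 0 = Republicans win

def prob_of_actual_result(constant, coefficient):
    z = constant + coefficient * x           # the transformation function
    p_dem = norm.cdf(z)                      # estimated Prob[Dem] from the normal distribution
    p_actual = np.where(y == 1, p_dem, 1 - p_dem)
    return p_actual.prod()                   # product of the four individual probabilities

print(prob_of_actual_result(-2.0, 0.004))    # approximately 0.1582, as calculated above

# Mimic the slide bars: try many (constant, coefficient) pairs and keep the best one.
best = max(
    ((c0, c1, prob_of_actual_result(c0, c1))
     for c0 in np.arange(-3.0, 0.01, 0.25)
     for c1 in np.arange(0.001, 0.0081, 0.0005)),
    key=lambda triple: triple[2],
)
print(best)                                  # on this grid the maximum is at (-2.00, 0.004)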
Constant: −2.00    Coefficient: 0.0040

State   Winner   y   PopDen   z       Prob(Dem)   Prob(Rep)   Prob(y)
1       Rep      0   100      −1.60   0.0548      0.9452      0.9452
2       Dem      1   450      −0.20   0.4207      0.5793      0.5793
3       Rep      0   550      0.20    0.5793      0.4207      0.4207
4       Dem      1   900      1.60    0.9452      0.0548      0.9452
Figure 24.9
Probit Calculation Lab
Table 24.5
Simple probit example—Econometric Lab calculations
Constant Coefficient Prob[1st Rep and 2nd Dem and 3rd Rep and 4th Dem]
Table 24.6
Probit results—Simple probit example
Probit
Dependent variable: y
Explanatory variable(s): Estimate SE z-Statistic Prob
In EViews, after selecting the variables for the regression, y and x, and double clicking on one of the selected variables:
• In the Equation Estimation window, click on the Method dropdown list.
• Select BINARY.
• Click OK.
These results are consistent with our simulation results. The maximum likelihood transformation
is:
z = −2 + 0.004x
Now we will apply the Probit estimation procedure to the 2004 election results using EViews
(table 24.7). Using the probit estimates we can calculate the probabilities for a few selected
states (table 24.8).
Table 24.7
Probit results—2004 presidential election
Probit
Table 24.8
Probit estimates—2004 presidential election
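The same kind of estimates can be produced outside EViews; a minimal sketch using Python's statsmodels, assuming the 2004 data are stored in a hypothetical file electoral2004.csv with columns PopDen and WinDem1:

import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("electoral2004.csv")      # hypothetical file: one row per state
y = data["WinDem1"]                          # 1 if the Democrats won the state, 0 otherwise
X = sm.add_constant(data["PopDen"])          # constant term plus population density

probit_results = sm.Probit(y, X).fit()
print(probit_results.summary())              # estimates, standard errors, z-statistics

# Estimated probability of a Democratic win for each state
print(probit_results.predict(X))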
We will now consider a second example in which the ordinary least squares (OLS) estimation
procedure falters. The problem arises whenever the dependent variable cannot take on a value
that is greater than or less than a specific value. Our example considers the salaries of Major
League Baseball hitters at the beginning of the 2011 season.
Project: Assess the impact of a hitter's previous season's performance on his salary.
Salaries and hitting performance of Major League hitters: Cross-sectional 2011 salary and 2010
performance data for all hitters on Major League rosters at the opening of the 2011 season.
First examine the salaries of the twenty-five highest paid hitters (table 24.9). Next let us
examine the salaries of the twenty-five lowest paid hitters (table 24.10). As table 24.10
shows, no player earns less than $414,000. This is dictated by the collective bargaining
agreement negotiated by Major League Baseball and the Major League Baseball Player’s Union.
The minimum salary a team could pay to a Major League player in 2011 was $414,000. Con-
sequently player salaries are said to be truncated or censored; their value cannot fall below
$414,000.
We will now investigate the theory that a hitter’s performance in 2010 affects his salary in
2011.4 On the one hand, if a hitter does well in 2010, he will be able to negotiate a high salary
in 2011; on the other hand, if his 2010 performance was poor, a low salary would result. More
specifically, we will focus on the effect that on base percentage has on salary:
Theory: An increase in a hitter’s on base percentage in 2010 increases his 2011 salary.
Table 24.9
Salaries of the twenty-five highest paid MLB hitters in 2011
We begin by considering a simple linear model relating on base percentage and salary:

Salaryt = βConst + βOnBasePct OnBasePctt + et

where Salaryt equals hitter t's 2011 salary (in thousands of dollars) and OnBasePctt equals hitter t's 2010 on base percentage.
Table 24.10
Salaries of the twenty-five lowest paid MLB hitters in 2011
The on base percentage of all but two of these players falls below the average on base percentage of all MLB
hitters, 0.323. Consequently it is reasonable to believe that without the minimum salary imposed
by the collective bargaining agreement the salaries of most of these players would fall below
$414,000. In other words, the collective bargaining agreement is truncating salaries. To under-
stand why truncated variables create problems for the ordinary least squares (OLS) estimation
procedure, we begin by estimating the model's parameters using the ordinary least squares (OLS)
estimation procedure (table 24.11).
Figure 24.10
Scatter diagram—On base percentage versus salary
Table 24.11
OLS regression results—MLB salaries
(Figure legend: best fitting line without "truncated" observations; vertical axis: salary in $1,000; horizontal axis: on base percentage.)
Figure 24.11
Scatter diagram—On base percentage versus salary and best fitting lines
The ordinary least squares (OLS) estimates suggest that a 0.001 point increase in on base per-
centage increases salary by $37,960. This in fact understates the actual effect of on base percent-
age, that is, how salaries would be affected by on base percentage if they were not artificially
constrained by the $414,000 minimum salary (figure 24.11). Without baseball's minimum wage,
it is reasonable to believe that the "truncated" points would be lower and, consequently, the best
fitting line would be steeper.
The Tobit estimation procedure accounts for the truncation of the dependent variable. It takes
advantage of all the available information, but treats “truncated” observations differently than
the “nontruncated” ones (table 24.12). While we will not delve into the mathematics underlying
the Tobit estimation procedure, we will show that software packages allow us to apply the pro-
cedure easily.
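Although we do not derive it here, a brief sketch of what the software maximizes may help. Under the usual assumption that the error term is normally distributed with standard deviation σ, the log likelihood for salaries left-censored at 414 (that is, $414,000 measured in thousands of dollars) is

log L = Σ over uncensored observations (Salaryt > 414) of log[(1/σ) φ((Salaryt − βConst − βOnBasePct OnBasePctt)/σ)]
      + Σ over censored observations (Salaryt = 414) of log[Φ((414 − βConst − βOnBasePct OnBasePctt)/σ)]

where φ and Φ denote the standard normal density and cumulative distribution functions. Uncensored observations contribute ordinary regression density terms; censored observations contribute only the probability of falling at or below the minimum salary.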
Table 24.12
Tobit results—MLB salaries
Tobit
Table 24.13
Comparison of ordinary least squares (OLS) and tobit results
Estimation procedure   OnBasePct coefficient estimate
OLS                    37,960
Tobit                  47,800
• As usual, select the dependent and explanatory variables and then double click on one of the
selected variables.
• In the Equation Estimation window, click on the Method dropdown list;
• Select CENSORED.
• By default, the left (lower) censoring value is 0. This value should be changed to 414, the
minimum wage for Major League players measured in thousands of dollars.
• Click OK.
Now let us compare the two estimates (table 24.13). The Tobit estimate of the OnBasePct coef-
ficient is 47,800 as opposed to the ordinary least squares (OLS) estimate of 37,960. This is
consistent with the scatter diagram appearing in figure 24.11.
Chapter 24 Exercises
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
1. Consider the following model explaining the winning party in each state:
a. Develop a theory regarding how population density influences the probability of a Repub-
lican victory. What does your theory imply about the sign of the population density
coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of the
population density coefficient using the 2004 Electoral College data. Interpret the coefficient
estimates. What is the critical result?
e. Compare the ordinary least squares (OLS) equation we computed earlier in the chapter
estimating WinDem1 with the equation you just computed estimating WinRep1. Are these
two equations consistent? That is, do the two equations suggest that the effect of population
density on the winning party in a state is the same? Explain.
2. Use the probit estimation procedure to analyze the effect that PopDen has on WinRep1.
Compare the probit estimates we computed earlier in the chapter estimating WinDem1 with the
probit estimates you just computed estimating WinRep1. Are the estimates consistent?
2008 Electoral College data: Cross-sectional data from the 2008 presidential election for the
fifty states.
3. Use the probit estimation procedure to analyze the effect of population density in the 2008
presidential election. For the moment, assume that population density is the only explanatory
variable affecting the election results.
a. Use the probit estimation procedure to find the maximum likelihood transformation. What
is the critical result?
5. Use the probit estimation procedure to analyze the effect of both the population density and
the unemployment trend in the 2008 presidential election in a single model.
a. Use the probit estimation procedure to find the maximum likelihood transformation. What
is the critical result?
Compare your probit estimates in exercises 3 and 4 with the ones you just computed.
b. Are the critical results the same?
c. How do the estimate values and their significances differ?
Degree day and temperature data for Charlestown, SC: Daily time series data of degree days
and high temperatures for Charlestown, SC, in 2001.
Heating degree days only consider those days in which heat is required; that is, when tempera-
tures are high and cooling rather than heating is required, the value of heating degree days is 0.
Consequently heating degree days is a truncated or censored variable—truncated at 0.
Consider the following model:
6. Devise a theory to explain the number of degree days based on the high temperature. What
does your theory suggest about the sign of the coefficient for HighTemp?
7. Construct a scatter diagram to illustrate the relationship between HighTemp and HeatDe-
gDays by plotting HighTemp on the horizontal axis and HeatDegDays on the vertical axis. Does
the scatter diagram lend support to your theory?
8. Use the ordinary least squares (OLS) estimation procedure to analyze the effect of the high
temperature on heating degree days.
a. Estimate the value of the high temperature coefficient. Interpret the coefficient estimate.
What is the critical result?
b. Formulate the null and alternative hypotheses.
Chapter 25 Outline
25.7 Correlation
25.7.1 Correlated Events
25.7.2 Correlated Random Variables and Covariance
25.8 Independence
25.8.1 Independent Events
25.8.2 Independent Random Variables and Covariance
1. Consider a deck of cards that contains only 3 red cards and 2 black cards.
a. Draw one card from the deck. What is the probability that the card drawn is
i. red?
ii. black?
b. Do not replace the first card drawn. Draw a second card from the deck. If the first card
drawn is red, what is the probability that the second card drawn is
i. red?
ii. black?
c. If the first card drawn is black, what is the probability that the second card drawn is
i. red?
ii. black?
2. Monty Hall was the host of the popular TV game show “Let’s Make a Deal.” Use our lab to
familiarize yourself with the game.
Click Play and follow the instructions. Play the game a dozen times or so. How frequently did
you win?
In chapter 1 we introduced the mean as a measure of the distribution center. While the mean is
the most commonly cited measure of the center, two others are also useful: mode and median.
We will now introduce them by considering the distribution of American family sizes. Table
25.1 provides data for the sizes of American families in 2008. We begin by constructing a his-
togram to illustrate the data visually (figure 25.1).
In total, there were approximately 224,703,000 adult Americans in 2008. The mean family
size for American adults in 2008 was 2.76. Now we will introduce two other measures of the
distribution center, the mode and the median.
Figure 25.1
Histogram of family size—2008
Table 25.1
Family sizes—2008
Family size (persons)   Number of adults (thousands)   Percent of adults (%)
1 51,565 22.9
2 67,347 30.0
3 39,432 17.5
4 36,376 16.2
5 18,074 8.0
6 7,198 3.2
7 2,784 1.2
8 1,028 0.5
9 or more 899 0.4
Source: US Census Bureau Current Population Survey, Annual Social and Economic Supplement, 2008
The mode is the most frequently occurring data value. It is easy to determine the mode from
the histogram; the mode corresponds to the highest bar. In our case, the mode is 2 persons. As
table 25.1 reports, 30.0 percent of all American adults were members of two person families.
The second most frequent was one person families; 22.9 percent of all American adults were
members of one person families. If you chose one American adult at random, he/she would be
more likely to be a member of a two person family than any other family size. Be aware, however,
that while a family size of two would be more likely than any other family size, the probability
that a randomly selected adult would be a member of a two person family would be less than
one-half, only 0.30.
The median is the data value that divides the distribution in the middle, that is, into two “equal”
parts. One way to think of the median is to imagine that all 224,703,000 American adults are
lined up in order of increasing family size. The median family size is the family size of the
112,351,500th American in the line, the American in the middle (figure 25.2).
In 2008 the median family size was 2. At least half (22.9 + 30.0 = 52.9 percent) of all American
adults were members of families including 2 or fewer persons and at least half (30.0 + 17.5 +
16.2 + 13.3 = 77.0 percent) were members of families including 2 or more persons.
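The three measures of the distribution center can also be computed directly from table 25.1; a minimal sketch in Python (treating the "9 or more" category as exactly 9 persons, so the mean is only approximate):

sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
adults = [51565, 67347, 39432, 36376, 18074, 7198, 2784, 1028, 899]   # thousands of adults

total = sum(adults)
mean = sum(size * count for size, count in zip(sizes, adults)) / total
mode = sizes[adults.index(max(adults))]

# Median: the family size of the adult standing in the middle of the line.
cumulative, median = 0, None
for size, count in zip(sizes, adults):
    cumulative += count
    if cumulative >= total / 2:
        median = size
        break

print(round(mean, 2), mode, median)    # approximately 2.76, 2, and 2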
Figure 25.2
American adults lined up in order of family size
Table 25.2
Preferred level of aid assumption
Family size (persons)   Preferred level of aid   Percent of adults (%)
1                       $2,000                   22.9
2                       4,000                    30.0
3                       6,000                    17.5
4                       8,000                    16.2
5                       10,000                   8.0
6                       12,000                   3.2
7                       14,000                   1.2
8                       16,000                   0.5
9 or more               18,000                   0.4
The median voter theorem provides one example of how important the median can be. To
appreciate why, suppose that each family’s preferred level of federal aid for education depends
on its size. To make this illustration more concrete, assume that table 25.2 reports on the preferred
level of Federal aid for each family size. While these preferred aid numbers are hypothetical,
they do attempt to capture one realistic feature of family preferences. That is, as a family has
more children, it typically supports more aid for education because the family will gain more
benefits from that aid.
The median voter theorem states that in a majority rule voting process, the preferences of the
median voter, the voter in the middle, will win whenever the median’s preference is pitted against
any other alternative. In this case, the preferences of the 2 person family, the median, will win.
The preferred aid level of the median voter, $4,000, will win. To understand why, we will con-
sider two elections, one in which $4,000 is pitted against a proposal greater than $4,000 and a
second in which $4,000 is pitted against a proposal that is less than $4,000.
• $4,000 versus a proposal greater than $4,000: Suppose that the median voter’s choice,
$4,000, is pitted against a proposal that is greater than $4,000. Clearly, all adult members of 2
person families will vote for $4,000 since $4,000 is their preferred choice. Although $4,000 is
not the preferred choice of 1 person families, $4,000 is closer to their preferred $2,000 choice
than a proposal that is greater than $4,000. Hence adult members of 1 person families will vote
for $4,000 also. Now let us count the votes. Adult members of 1 and 2 person families will vote
for $4,000 which constitutes a majority of the votes, 52.9 percent to be exact. $4,000 will defeat
any proposal that is greater than $4,000.
• $4,000 versus a proposal less than $4,000: Suppose that the median voter’s choice, $4,000,
is pitted against a proposal that is less than $4,000. As before, all adult members of 2 person
families will vote for $4,000 since $4,000 is their preferred choice. Although $4,000 is not the
preferred choice of 3, 4, 5, 6, 7, 8, and 9 or more person families, $4,000 is closer to their
preferred choice than a proposal that is less than $4,000. Hence adult members of these families
will vote for $4,000 also. Adult members of 2, 3, 4, 5, 6, 7, 8, and 9 or more person families
will vote for $4,000, which constitutes a majority of the votes, 77.0 percent to be exact. $4,000
will defeat any proposal that is less than $4,000.
The median family’s preferred level of aid, $4,000, will defeat any proposal that is greater than
$4,000 and any proposal that is less.
Recall that the mean is the average. The mean describes the average characteristic of the popula-
tion. For example, per capita income describes the income earned by individuals on average;
batting average describes the hits per official at bat a baseball player gets. In our family size
example, the mean equals 2.76. On average, the typical American adult resides in a family of
2.76 persons.
For our family size example, the median, 2, was less than the mean, 2.76. To understand why
this occurs look at the histogram again. Its right-hand tail is longer than its left-hand tail. When
we calculate the median, we find that a family of 4 persons has the same impact as a family of
9. If suddenly quintuples were born to a family of 4, making it a family of 9, the median would
not be affected. However, the mean would be affected. With the birth of the quintuples, the mean
would rise. Consequently, since the right-hand tail of the distribution is longer than the left-hand
tail, the mean is greater than the median because the high values have a greater impact on the
mean than they do on the median.
Event trees are simple but useful tools we can employ to calculate probabilities. We will use
the following experiment to introduce event trees: draw one card from a standard deck of 52 playing cards.
An event tree visually illustrates the mutually exclusive outcomes (events) of a random
process. In figure 25.3 there are two such outcomes: either the card is red or it is black. The
circle represents the event of the random process, the card draw. There are two branches from
the circle: one representing a red card and one a black card. The ends of the two branches rep-
resent mutually exclusive events—two events that cannot occur simultaneously. A card cannot
Figure 25.3
Card draw event tree for one draw
be both red and black. The event tree reports the probabilities of a red or black card. The “stan-
dard” deck of cards contains 13 spades, 13 hearts, 13 diamonds, and 13 clubs. 26 of the 52 cards
are red, the hearts and diamonds; 26 of the 52 cards are black, the spades and clubs.
• What are the chances the card drawn will be red? Since 26 of the 52 cards are red, there are
26 chances in 52 or 1 chance in 2 that the card will be red. The probability that the card will be
red equals 26/52 or 1/2.
• What are the chances the card drawn will be black? Similarly, since 26 of the 52 cards are
black, there are 26 chances in 52 or 1 chance in 2 that the card will be black. The probability
that the card will be black is 26/52 or 1/2.
The probability is 1/2 that we will move along the red branch and 1/2 that we will move along
the black branch.
There are two important features of this event tree that are worth noting. First, we can only
wind up at the end of one of the branches because the card drawn cannot be both red and black;
stated more formally, red and black are mutually exclusive events. Second, we must wind up at
the end of one branch because the card drawn must be either red or black. This means that the
probabilities of the branch ends must sum to 1.0. We have now introduced the general charac-
teristics of event trees:
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.0.
Figure 25.4
Card Draw simulation
In the simulation we select the number of red cards and the number of black cards to include
in our deck. We can also specify the number of cards to be drawn from the deck. In this case,
we include 26 red cards and 26 black cards in the deck; we draw one card from the deck. The
relative frequencies of red and black cards are reported. Since the Pause checkbox is checked,
the simulation will pause after each repetition. Click Start. Was a red or black card drawn? Now
the Start button becomes the Continue button. Click Continue to run the second repetition. Is
the simulation calculating the relative frequency of red and black cards correctly? Click the
Continue button a few more times to convince yourself that the simulation is calculating relative
frequencies correctly. Now uncheck the Pause checkbox and click Continue. After many, many
repetitions, click Stop. Observe that the relative frequencies of red and black cards will be about
0.500, equal to the probabilities. The simulation illustrates the relative frequency interpretation
of probability (figure 25.4).
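The same relative frequency exercise can be run with a few lines of code; a minimal sketch in Python (not the Econometrics Lab itself):

import random

deck = ["red"] * 26 + ["black"] * 26
repetitions = 100000
reds = sum(1 for _ in range(repetitions) if random.choice(deck) == "red")
print(reds / repetitions)              # about 0.500, the probability of drawing a red card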
Figure 25.5
Card draw event tree for one draw
Since 3 of the 5 cards are red, the probability of drawing a red card is 3/5; since 2 of the 5 cards
are black, the probability of drawing a black card is 2/5. Like our first card draw, the new event
tree possesses two properties (figure 25.5):
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.0.
Econometrics Lab 25.2: Card Draw Simulation—Draw One Card from a Deck of 3 Red Cards
and 2 Black Cards
Again, we will use a simulation of this experiment to illustrate the relative frequency notion of
probability.
By default, the deck includes 3 red cards and 2 black cards. Click Start and then after many,
many repetitions click Stop. The relative frequency of red cards is about 0.600 and the relative
frequency of black card is about 0.400. Once again, see that the probabilities we calculated equal
the relative frequencies when the experiment is repeated many times (figure 25.6).
Figure 25.6
Card Draw simulation
Experiment 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards without Replacement
Thus far we have been only drawing one card from the deck. Now consider an experiment in
which we draw two cards from our small deck.
The event tree now looks a little more complicated because it must illustrate both the first and
second draws (figure 25.7).
Since the first card drawn is not replaced, the probabilities of obtaining a red and black card
on the second draw depend on whether a red or black card was drawn on the first draw. If a red
card is drawn on the first draw, two red cards and two black cards remain in the deck; conse-
quently on the second draw
• there are two chances in four of drawing a red card. The probability of drawing a red card is
1/2:
Prob[2nd is red IF 1st is red] = 1/2 = 0.50
Figure 25.7
Card draw event tree for two draws
• there are two chances in four of drawing a black card. The probability of drawing a black card
is 1/2:

Prob[2nd is black IF 1st is red] = 1/2 = 0.50
If a black card is drawn on the first draw, 3 red cards and 1 black card remain in the deck;
consequently on the second draw
• there are three chances in four of drawing a red card. The probability of drawing a red card
is 3/4:
Prob[2nd is red IF 1st is black] = 3/4 = 0.75
• there is one chance in four of drawing a black card. The probability of drawing a black card
is 1/4:

Prob[2nd is black IF 1st is black] = 1/4 = 0.25
After the two draws are complete, there are four possible outcomes (events) as indicated by
the end of each event tree branch:
• A red card in the first draw and a red card in the second draw.
• A red card in the first draw and a black card in the second draw.
• A black card in the first draw and a red card in the second draw.
• A black card in the first draw and a black card in the second draw.
These four outcomes (events) are mutually exclusive. The probability of winding up at the end
of a branch equals the product of the probabilities of each limb of the branch.
For example, consider Prob[1st is red AND 2nd is red] by focusing on the top branch of the
event tree:
Prob[1st is red AND 2nd is red]
= Prob[1st is red] × Prob[2nd is red IF 1st is red]
= 3/5 × 1/2 = 3/10 = 0.30
As figure 25.7 indicates, when the first card is drawn there are 3 chances in 5 that we will move
along the Draw 1 red limb; the probability of drawing a red card on the first draw is 3/5. Since
the first card drawn is not replaced, only 4 cards now remain, 2 of which are red. So there is 1
chance in 2 that we will continue along Draw 2’s red limb; if the first card drawn
is a red card, the probability of drawing a red card on the second draw is 1/2. We will use the
relative frequency interpretation of probability to confirm that the probability of a red card on
the first draw and a red card on the second draw equals the product of these two probabilities.
After many, many repetitions of the experiment:
• In the first draw, a red card will be drawn in 3/5 of the repetitions.
• For these repetitions, the repetitions in which a red card is drawn, a red card will be drawn in
1/2 of the second draws.
• Overall, a red card will be drawn in the first and second draws in 3/5 × 1/2 = 3/10 = 0.30 of the
repetitions.
Next consider Prob[1st is red AND 2nd is black] by focusing on the second branch from the
top.
The probability of a red card in the first draw is 3/5. Of the 4 cards now remaining, 2 are black.
Therefore, the probability of a black card in the second draw is 1/2. The probability of a red
card on the first draw and a black card on the second is the product of the two probabilities.
Using the same logic, we can calculate the probability of winding up at the end of the other
two event tree branches:
Prob[1st is black AND 2nd is red]
= Prob[1st is black] × Prob[2nd is red IF 1st is black]
= 2/5 × 3/4 = 3/10 = 0.30
and
Prob[1st is black AND 2nd is black]
= Prob[1st is black] × Prob[2nd is black IF 1st is black]
= 2/5 × 1/4 = 1/10 = 0.10
Once again, note that our new event tree exhibits the general event tree properties:
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.
Econometrics Lab 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards
and 2 Black Cards without Replacement
We can use our Card Draw simulation to illustrate the relative frequency interpretation of
probability.
The relative frequency of each outcome mirrors the probability of that outcome.
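A minimal sketch in Python of the same experiment (two draws without replacement from a deck of 3 red and 2 black cards), tallying the relative frequency of each branch-end outcome:

import random
from collections import Counter

repetitions = 100000
outcomes = Counter()
for _ in range(repetitions):
    deck = ["red"] * 3 + ["black"] * 2
    random.shuffle(deck)
    first, second = deck[0], deck[1]   # the two cards drawn, without replacement
    outcomes[(first, second)] += 1

for outcome, count in sorted(outcomes.items()):
    print(outcome, count / repetitions)
# relative frequencies near 0.30 (red, red), 0.30 (red, black), 0.30 (black, red), and 0.10 (black, black)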
Event trees facilitate the calculation of the probability of a combination of different outcomes. Since the ends of
each event tree branch represent mutually exclusive events, we add the probabilities of the relevant
outcomes. For example, suppose that we want to calculate the probability that a black card is
drawn on the second draw. As the event tree in figure 25.7 illustrates, there are two different
ways for this event to occur: the first card is red and the second is black, or the first card is black and the second is black.
Focusing on the second and fourth event tree branches from the top:
Prob[2nd is black]
= Prob[1st is red AND 2nd is black] + Prob[1st is black AND 2nd is black]
= 0.30 + 0.10
= 0.40
The probability of drawing a black card on the second draw equals 0.40.
Similarly we can calculate the probability that the second card drawn is red by focusing on
the first and third event tree branches from the top:
Prob[2nd is red]
= Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is red]
= 0.30 + 0.30
= 0.60
The probability of drawing a red card on the second draw equals 0.60.
Similarly suppose that we wish to know the probability of drawing two red cards or two
black cards:

Prob[2 reds OR 2 blacks] = Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is black] = 0.30 + 0.10 = 0.40

The probability of drawing two red cards or two black cards equals 0.40. We simply sum the
probabilities of the appropriate branch ends. We have now encountered three different types of probabilities:
• Conditional probability: The probability that an event will occur if another event occurs.
• Joint probability: The probability that two events will occur together.
• Nonconditional probability: The probability that an event will occur without any informa-
tion about other events.
To understand the distinction better, consider our last experiment and two possible events: the first card drawn is black and the second card drawn is red.
The probability that the second card will be red if the first card is black, Prob[2nd is red IF 1st
is black], is a conditional probability. The probability that first card is black and the second card
is red, Prob[1st is black AND 2nd is red], is a joint probability. The probability of drawing a
red card on the second draw without any additional information, Prob[2nd is red], is a noncon-
ditional probability.
We have already computed these probabilities by using the event tree appearing in figure 25.7:

Conditional probability: Prob[2nd is red IF 1st is black] = 3/4 = 0.75
Joint probability: Prob[1st is black AND 2nd is red] = 3/10 = 0.30
Nonconditional probability: Prob[2nd is red] = 3/5 = 0.60
Event trees are useful because they facilitate the calculation of all three types of probabilities.
First, event trees report the conditional probabilities and then the joint probabilities. Then, since
the ends of the branches represent mutually exclusive events, we can compute the nonconditional
probabilities by summing the joint probabilities. Table 25.3 summarizes all three types of
probabilities.
The joint, conditional, and nonconditional probabilities are related. We have in fact already used
this relationship to calculate the probabilities:
Table 25.3
Conditional, joint, and nonconditional probabilities without replacement
Conditional probability               Joint probability                       Nonconditional probability
Prob[2nd R IF 1st R] = 1/2 = 0.50     Prob[1st R AND 2nd R] = 3/10 = 0.30     Prob[1st R] = 3/5 = 0.60
Prob[2nd B IF 1st R] = 1/2 = 0.50     Prob[1st R AND 2nd B] = 3/10 = 0.30     Prob[2nd R] = 3/5 = 0.60
Prob[2nd R IF 1st B] = 3/4 = 0.75     Prob[1st B AND 2nd R] = 3/10 = 0.30     Prob[1st B] = 2/5 = 0.40
Prob[2nd B IF 1st B] = 1/4 = 0.25     Prob[1st B AND 2nd B] = 1/10 = 0.10     Prob[2nd B] = 2/5 = 0.40
Prob[1st is black AND 2nd is red] = Prob[1st is black] × Prob[2nd is red IF 1st is black]
= 2/5 × 3/4 = 3/10 = 0.30
The probability of drawing a black card on the first draw and a red card on the second draw
equals the probability of drawing a black card on the first draw times the probability of drawing
a red card on the second draw if a black card is drawn first.
We can generalize this relationship between joint and conditional probabilities by specifying
events A and B as follows:
Prob[1st is black AND 2nd is red] = Prob[1st is black] × Prob[2nd is red IF 1st is black]
↓
Prob[A and B] = Prob[A] × Prob[B IF A]
1. Frequently the symbols ∪, ∩, and | are used instead of the words OR, AND, and IF.
To illustrate the value of event trees and the conditional/joint probability relationship, consider
a mathematical controversy that erupted in the popular press during 1990. The controversy
involved the game show “Let’s Make a Deal.” On the show, a contestant is presented with three
closed doors, numbered 1, 2, and 3. One of the doors has a valuable prize behind it. A “dud” is
behind the other two doors. The real prize has been randomly placed behind one of the three
doors. Monty Hall, the emcee, knows where the prize is located. Monty asks the contestant to
choose one of the three doors after which he opens one of the three doors. In deciding which
door to open, Monty adheres to two rules: he never opens the door that the contestant chose, and
he never opens the door that hides the prize. Consequently Monty always opens one of the doors containing a dud. He then gives the contestant the oppor-
tunity to change his/her mind and switch doors. Should the contestant stay with the door he/she
chose initially or should the contestant switch?
In September 1990 Marilyn vos Savant, a columnist for Parade Magazine, wrote about the
contestant’s choice. She claimed that the contestant should always switch. This created a fire-
storm of ridicule from academic mathematicians, some of whom were on the faculty of this
country’s most prestigious institutions. The New York Times even reported the controversy on
the front page of its Sunday, July 21, 1991, edition, stating that several thousand letters criticized
Ms. vos Savant’s advice.2 Two typical responses were:
• Robert Sachs, George Mason University: “You blew it! Let me explain: If one door is shown
to be a loser, that information changes the probability of either remaining choice—neither of
which has any reason to be more likely—to 1/2. As a professional mathematician, I’m very
concerned with the general public’s lack of mathematical skills. Please help by confessing your
error and, in the future, being more careful.”
• E. Ray Bobo, Georgetown University: “How many irate mathematicians are needed to get you
to change your mind?”
Much to the embarrassment of many mathematicians, Ms. vos Savant’s advice was eventually proved
correct. One of them, Dr. Sachs, had the grace to apologize:
I wrote her another letter, telling her that after removing my foot from my mouth, I’m now eating humble
pie. I vowed as penance to answer all the people who wrote to castigate me. It’s been an intense profes-
sional embarrassment.
This incident teaches us a valuable lesson. As Persi Diaconis, a mathematician from Harvard
University, stated: “Our brains are just not wired to do probability problems very well. . . .” That
2. John Tierney, The New York Times, Sunday, July 21, 1991, pp. 1 and 20.
is why event trees are so useful. We will now use event trees to analyze the Monty Hall problem
and show how many mathematicians would have avoided embarrassment had they applied this
simple, yet powerful tool.
Suppose that you are a contestant appearing on the “Let’s Make a Deal” stage. The prize has
already been placed behind one of the doors. There is an equal chance of the prize being behind
each door. The probability that it is behind any one door is one out of three, 1/3. We begin by
drawing the event tree that appears in figure 25.8.
You now choose one of the doors. Suppose you choose door 3. Recall Monty’s
rules: he never opens the door that you chose, and he never opens the door that hides the prize.
Since you chose door 3, we do not have to worry about Monty opening door 3 as a consequence
of Monty’s first rule. Monty will now open either door 1 or door 2. Keeping in mind Monty’s
second rule:
Figure 25.8
Event tree before you choose a door
• If the prize is behind door 1, he must open door 2 (he cannot open door 3, your choice, or door 1, the prize door).
• If the prize is behind door 2, he must open door 1.
• If the prize is behind door 3, he will randomly choose to open either door 1 or door 2; the
chances are 50-50 he will open door 1 and 50-50 he will open door 2:
Prob[Monty opens door 1 IF prize behind door 3] = 1/2
Prob[Monty opens door 2 IF prize behind door 3] = 1/2
Figure 25.9 extends the event tree we drew in figure 25.8 to account for the fact that you
chose door 3. Let us explain how we extended the top branch of the event tree. As shown in
both figures 25.8 and 25.9, the probability that the prize is behind door 1 is 1/3. Now, if the
prize is behind door 1, the probability that Monty will open door 1 is 0, and hence the probability
that he will open door 2 is 1 as indicated in figure 25.9. Using similar logic, we can now extend
the other branches.
Before opening a door Monty pauses for a commercial break so you have time to consider
your strategy. Using the event tree, it is easy to calculate the joint probabilities. From top to
bottom of figure 25.9:
Figure 25.9
Event tree after you choose door 3
Prob[Monty opens door 1 AND Prize behind door 1] = 1/3 × 0 = 0
Prob[Monty opens door 2 AND Prize behind door 1] = 1/3 × 1 = 1/3
Prob[Monty opens door 1 AND Prize behind door 2] = 1/3 × 1 = 1/3
Prob[Monty opens door 2 AND Prize behind door 2] = 1/3 × 0 = 0
Prob[Monty opens door 1 AND Prize behind door 3] = 1/3 × 1/2 = 1/6
Prob[Monty opens door 2 AND Prize behind door 3] = 1/3 × 1/2 = 1/6
Also the event tree allows us to calculate the nonconditional probabilities of which door Monty
will open. Since you chose door 3, Monty will open either door 1 or door 2:
• Prob[Monty opens door 1]: Counting from the top of figure 25.9, focus on the ends of
branches 1, 3, and 5:

Prob[Monty opens door 1] = 0 + 1/3 + 1/6 = 1/2

• Prob[Monty opens door 2]: Counting from the top of figure 25.9, focus on the ends of
branches 2, 4, and 6:

Prob[Monty opens door 2] = 1/3 + 0 + 1/6 = 1/2
Note that these two nonconditional probabilities sum to 1 because we know that Monty always
opens one of the two doors that you did not choose.
Now we are in a position to give you some advice. We know that Monty will open door 1 or
door 2 as soon as the commercial ends. First, consider the possibility that Monty opens door 1.
If so, door 1 will contain a dud. In this case the prize is either behind door 2 or door 3. We can
calculate the probability that the prize is behind door 2 and the probability that the prize is
behind door 3 if Monty were to open door 1 by applying the conditional/joint probability
relationship:
• Prob[Prize behind door 2 IF Monty opens door 1]: Begin with the conditional/joint probability
relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 1 for A and Prize behind door 2 for B gives

Prob[Prize behind door 2 IF Monty opens door 1] = Prob[Monty opens door 1 AND Prize behind door 2] / Prob[Monty opens door 1]

We have already calculated these probabilities with the help of the event tree:

Prob[Monty opens door 1 AND Prize behind door 2] = 1/3
Prob[Monty opens door 1] = 1/2

Now we plug in:

Prob[Prize behind door 2 IF Monty opens door 1] = (1/3)/(1/2) = 2/3
• Prob[Prize behind door 3 IF Monty opens door 1]: We use the same logic. Begin with the
conditional/joint probability relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 1 for A and Prize behind door 3 for B gives

Prob[Prize behind door 3 IF Monty opens door 1] = Prob[Monty opens door 1 AND Prize behind door 3] / Prob[Monty opens door 1]

From the event tree, Prob[Monty opens door 1 AND Prize behind door 3] = 1/6 and Prob[Monty opens door 1] = 1/2, so

Prob[Prize behind door 3 IF Monty opens door 1] = (1/6)/(1/2) = 1/3

If Monty opens door 1, the probability that the prize is behind door 2 is 2/3 while the probability
that the prize is behind door 3 is 1/3.
Therefore, if Monty opens door 1, you should switch from door 3 to door 2.
Next consider the possibility that Monty opens door 2. If so, door 2 will contain a dud. In
this case the prize is either behind door 1 or door 3. We can calculate the probability that the
prize is behind door 1 and the probability that the prize is behind door 3 if Monty were to open
door 2 by applying the conditional/joint probability relationship:
• Prob[Prize behind door 1 IF Monty opens door 2]: Begin with the conditional/joint probability
relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 2 for A and Prize behind door 1 for B, and using Prob[Monty opens door 2 AND Prize behind door 1] = 1/3 and Prob[Monty opens door 2] = 1/2,

Prob[Prize behind door 1 IF Monty opens door 2] = (1/3)/(1/2) = 2/3

• Prob[Prize behind door 3 IF Monty opens door 2]: Using the same relationship with Prob[Monty opens door 2 AND Prize behind door 3] = 1/6,

Prob[Prize behind door 3 IF Monty opens door 2] = (1/6)/(1/2) = 1/3

If Monty opens door 2, the probability that the prize is behind door 1 is 2/3 while the probability
that the prize is behind door 3 is 1/3.
Therefore, if Monty opens door 2, you should switch from door 3 to door 1.
So let us summarize. If Monty opens door 1, you should switch. If Monty opens door 2, you
should switch. Regardless of which door Monty opens, you should switch doors. Ms. vos Savant
is correct and all her academic critics should be eating “humble pie.”
What is the intuition here? Before you make your initial choice, you know that the probability
that the prize lies behind door 3 equals 1/3. Furthermore you know that after you make your
choice, Monty will open neither the door you chose nor the door that contains the prize. There-
fore, when Monty actually opens a door, you will be given no additional information that is
relevant to door 3. Without any additional information about door 3, the opening of the door should not affect the
probability that the prize lies behind door 3. This is precisely what our calculations showed.
We will use a simulation to confirm our conclusion that switching is the better strategy.
Click Start Simulation and then after many, many repetitions click Stop. The simulation reports
the winning percentage for both the no switch and switch strategies. No switch winning fre-
quency equals 0.3333 . . . and the switch winning frequency equals 0.6666. . . . The results are
consistent with the probabilities that we just calculated.
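The same check can be coded directly; a minimal sketch in Python, assuming that you always choose door 3 initially:

import random

repetitions, stay_wins, switch_wins = 100000, 0, 0
for _ in range(repetitions):
    prize = random.choice([1, 2, 3])
    choice = 3
    # Monty opens a door that is neither your choice nor the prize door.
    monty = random.choice([door for door in (1, 2, 3) if door != choice and door != prize])
    switch = next(door for door in (1, 2, 3) if door != choice and door != monty)
    stay_wins += (choice == prize)
    switch_wins += (switch == prize)

print(stay_wins / repetitions, switch_wins / repetitions)   # about 1/3 and 2/3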
25.7 Correlation
We begin with correlated events presenting both verbal and rigorous definitions. Then, we extend
the notion of correlation to random variables.
Definition
Two events are correlated whenever the occurrence of one event helps us predict the other;
more specifically, two events are correlated when the occurrence of one event either increases
or decreases the probability of the other. Formally, events A and B are correlated whenever
Prob[B IF A] ≠ Prob[B]
The conditional probability of an event B does not equal its nonconditional probability. When
events A and B are correlated, event A is providing additional information that causes us to
modify our assessment of event B’s likelihood. To illustrate what we mean by correlation, review
experiment 25.3:
Experiment 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards without Replacement
It is easy to show that the two events, second card drawn is red and first card drawn is black in
experiment 25.3, are correlated. Refer to the event tree appearing in figure 25.7 to compute the
nonconditional probability that the second card drawn is red. Since the ends of each event tree
branch represent mutually exclusive events, we can calculate the combination of different out-
comes by adding the probabilities of ending up at the appropriate branches. As the event tree in
figure 25.7 illustrates, there are two different ways to draw a red card on the second draw:
the first card drawn is red and the second is red, or the first card drawn is black and the second is red.
Consequently the probability of drawing a red card on the second draw equals 0.60:

Prob[2nd is red] = Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is red]
= 0.30 + 0.30
= 0.60
Next recall the conditional probability of drawing a red card on the second draw if the first card
drawn is black. As illustrated in figure 25.7,

Prob[2nd is red IF 1st is black] = 3/4 = 0.75

Since 0.75 does not equal 0.60, the conditional probability does not equal the nonconditional
probability; the events are correlated. This is intuitive, is it not? If we know that we have drawn
a black card on the first draw and we do not replace it, there will be fewer black cards remaining
in the deck. Consequently we will be more likely to draw a red card on the second draw.
We can illustrate this with our example. We know that the events 1st is black and 2nd is red are
correlated. Let A denote the event that the first card drawn is black and B denote the event that
the second card drawn is red. Then Prob[A AND B] = 0.30 while Prob[A] × Prob[B] = 0.40 × 0.60 = 0.24;
when two events are correlated, their joint probability does not equal the product of their nonconditional probabilities.
We will now extend the notions of correlation to random variables. Continuing on with experi-
ment 25.3, let v1 equal the number of black cards drawn on the first draw and v2 equal the number
of black cards drawn on the second draw. v1 can take on two possible values, 0 and 1. Similarly
v2 can take on two possible values, 0 and 1.
Let us modify figure 25.7 by adding v1 and v2 to the event tree describing experiment 25.3 as
shown in figure 25.10. Using the event tree, we can calculate the conditional, joint, and noncon-
ditional probabilities for the random variables v1 and v2 (table 25.4). In the absence of
Figure 25.10
Event tree for two draws without replacement
Table 25.4
Conditional, joint, and nonconditional probabilities without replacement
Conditional probability                  Joint probability                          Nonconditional probability
Prob[v2 = 0 IF v1 = 0] = 1/2 = 0.50      Prob[v1 = 0 AND v2 = 0] = 3/10 = 0.30      Prob[v1 = 0] = 3/5 = 0.60
Prob[v2 = 1 IF v1 = 0] = 1/2 = 0.50      Prob[v1 = 0 AND v2 = 1] = 3/10 = 0.30      Prob[v2 = 0] = 3/5 = 0.60
Prob[v2 = 0 IF v1 = 1] = 3/4 = 0.75      Prob[v1 = 1 AND v2 = 0] = 3/10 = 0.30      Prob[v1 = 1] = 2/5 = 0.40
Prob[v2 = 1 IF v1 = 1] = 1/4 = 0.25      Prob[v1 = 1 AND v2 = 1] = 1/10 = 0.10      Prob[v2 = 1] = 2/5 = 0.40
information about v1, the nonconditional probabilities are relevant: the nonconditional probability
that v2 equals 0 is 0.60 and the nonconditional probability that v2 equals 1 is 0.40. On the one hand,
if we know that v1 equals 0, the probabilities change; the conditional probability that v2 equals 0
becomes 0.50 and the conditional probability that v2 equals 1 becomes 0.50. On the other hand, if
we know that v1 equals 1, the conditional probability that v2 equals 0 becomes 0.75 and the
conditional probability that v2 equals 1 becomes 0.25.

The random variables v1 and v2 are correlated. Knowing the value of v1 helps us predict the
value of v2 because the value of v1 affects v2’s probability distribution. In this case, v1 and v2
are negatively correlated; an increase in the value of v1 from 0 to 1 increases the likelihood that
v2 will be lower:

v1 = 0                                   v1 = 1
Prob[v2 = 0 IF v1 = 0] = 0.50            Prob[v2 = 0 IF v1 = 1] = 0.75
Prob[v2 = 1 IF v1 = 0] = 0.50            Prob[v2 = 1 IF v1 = 1] = 0.25
Now recall that covariance is a measure of correlation. If two variables are correlated, their
covariance does not equal 0. Let us now calculate the covariance of the random variables v1 and
v2 to illustrate this fact. The equation for the covariance is

Cov[v1, v2] = Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1 AND v2]

Since Mean[v1] = 0 × 0.60 + 1 × 0.40 = 0.40 and Mean[v2] = 0 × 0.60 + 1 × 0.40 = 0.40, using the joint probabilities in table 25.4,

Cov[v1, v2] = (0 − 0.4)(0 − 0.4)(0.30) + (0 − 0.4)(1 − 0.4)(0.30) + (1 − 0.4)(0 − 0.4)(0.30) + (1 − 0.4)(1 − 0.4)(0.10)
= 0.048 − 0.072 − 0.072 + 0.036 = −0.06
The covariance is negative because v1 and v2 are negatively correlated. An increase in v1
increases the probability that v2 will be lower.
25.8 Independence
As with correlation we begin with independent events presenting both verbal and rigorous defini-
tions. Then we extend the notion of independence to random variables.
Definition
Two events are independent (uncorrelated) whenever the occurrence of one event does not help
us predict the other. For example, the total points scored in the Super Bowl and the relative
humidity in Santiago, Chile, on Super Bowl Sunday are independent events. Knowing the value
of one would not help us predict the other. Two events are independent when the occurrence of
one event does not affect the likelihood that the other event will occur. Formally, event B is
independent of event A whenever
Prob[B IF A] = Prob[B]
The occurrence of event A does not affect the chances that event B will occur.
When event B is independent of event A, event A is also independent of event B; that is, independence is symmetric. To see why, recall the conditional/joint probability relationship:

Prob[B IF A] = Prob[A AND B] / Prob[A]

Since Prob[B IF A] = Prob[B], the joint probability equals the product of the nonconditional probabilities: Prob[A AND B] = Prob[A] × Prob[B]. Consequently

Prob[A IF B] = Prob[B AND A] / Prob[B]
= Prob[A AND B] / Prob[B]
= Prob[A] × Prob[B] / Prob[B]
= Prob[A]
Two random variables are independent if the probability distribution of each is unaffected by
the value of the other:

Prob[v2 IF v1] = Prob[v2] for all values of v1 and v2

And hence, after applying the logic we used with independent events, the joint probability equals
the product of the nonconditional probabilities:

Prob[v1 AND v2] = Prob[v1] × Prob[v2]
We can show that when two random variables are independent their covariance will equal 0:
Cov[v1, v2] = Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1 AND v2]

Since Prob[v1 AND v2] = Prob[v1] × Prob[v2],

= Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1] × Prob[v2]

Rearranging factors,

= Σ_All v1 (v1 − Mean[v1]) Prob[v1] × Σ_All v2 (v2 − Mean[v2]) Prob[v2]

Now consider the second factor:

Σ_All v2 (v2 − Mean[v2]) Prob[v2] = Σ_All v2 v2 Prob[v2] − Mean[v2] Σ_All v2 Prob[v2]
= Mean[v2] − Mean[v2] × 1
= Mean[v2] − Mean[v2]
= 0

Since the second factor equals 0, the covariance itself equals 0:

Cov[v1, v2] = 0
We have shown that when the random variables v1 and v2 are independent their covariance
equals 0.
To illustrate two independent random variables, let us modify experiment 25.3 by replacing
the card drawn after the first draw:
Experiment 25.4: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards with Replacement
As before, let v1 equal the number of black cards drawn on the first draw and v2 equal the number
of black cards drawn on the second draw. v1 can take on two possible values, 0 and 1. Similarly
v2 can take on two possible values, 0 and 1.
Let us begin by constructing the event tree (figure 25.11). Using the event tree, we can cal-
culate the conditional, joint, and nonconditional probabilities for the random variables v1 and v2
(table 25.5). In the absence of information about v1, the nonconditional probabilities are relevant;
the nonconditional probability that v2 equals 0 is 0.60 and the nonconditional probability that v2
equals 1 is 0.40. But what happens if we know that v1 equals 0? The conditional probabilities tell
us that nothing changes: the probability that v2 equals 0 is still 0.60 and the probability that v2
equals 1 is still 0.40.
Figure 25.11
Event tree for two draws with replacement
Table 25.5
Conditional, joint, and nonconditional probabilities with replacement
Conditional probability                  Joint probability                          Nonconditional probability
Prob[v2 = 0 IF v1 = 0] = 3/5 = 0.60      Prob[v1 = 0 AND v2 = 0] = 9/25 = 0.36      Prob[v1 = 0] = 3/5 = 0.60
Prob[v2 = 1 IF v1 = 0] = 2/5 = 0.40      Prob[v1 = 0 AND v2 = 1] = 6/25 = 0.24      Prob[v2 = 0] = 3/5 = 0.60
Prob[v2 = 0 IF v1 = 1] = 3/5 = 0.60      Prob[v1 = 1 AND v2 = 0] = 6/25 = 0.24      Prob[v1 = 1] = 2/5 = 0.40
Prob[v2 = 1 IF v1 = 1] = 2/5 = 0.40      Prob[v1 = 1 AND v2 = 1] = 4/25 = 0.16      Prob[v2 = 1] = 2/5 = 0.40
Similarly, if we know that v1 equals 1, the probabilities are again unchanged: the probability that
v2 equals 0 is still 0.60 and the probability that v2 equals 1 is still 0.40. The random variables v1
and v2 are independent; knowing the value of v1 does not help us predict the value of v2.
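A minimal sketch in Python that computes the covariance directly from the two joint probability tables confirms both results:

def covariance(joint):
    # joint maps each (v1, v2) pair to Prob[v1 AND v2]
    mean1 = sum(v1 * p for (v1, _), p in joint.items())
    mean2 = sum(v2 * p for (_, v2), p in joint.items())
    return sum((v1 - mean1) * (v2 - mean2) * p for (v1, v2), p in joint.items())

without_replacement = {(0, 0): 0.30, (0, 1): 0.30, (1, 0): 0.30, (1, 1): 0.10}   # table 25.4
with_replacement = {(0, 0): 0.36, (0, 1): 0.24, (1, 0): 0.24, (1, 1): 0.16}      # table 25.5

print(covariance(without_replacement))   # approximately -0.06: negatively correlated
print(covariance(with_replacement))      # approximately 0: independent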
25.9.1 Correlation
Correlated Events
• Definition: Two events are correlated whenever the occurrence of one event helps us predict
the other; more specifically, whenever the occurrence of one event either increases or decreases
the probability of the other:
Prob[B IF A] ≠ Prob[B]
• Correlated events and joint probability: Two events are correlated whenever their joint
probability will not equal the product of the nonconditional probabilities:
Prob[B IF A] ≠ Prob[B]
↓
Prob[A AND B] ≠ Prob[A] × Prob[B]
• Correlated random variables and covariance: Two variables are correlated whenever their
covariance does not equal 0.
Cov[v1, v2] ≠ 0
25.9.2 Independence
Independent Events
• Definition: Two events are independent (uncorrelated) whenever the occurrence of one event
does not help us predict the other; more specifically, whenever the occurrence of one event does
not increase or decrease the probability of the other:
Prob[B IF A] = Prob[B]
• Independent events and joint probability: The joint probability of two independent events
equals the product of the nonconditional probabilities:
Prob[B IF A] = Prob[B]
↓
Prob[A AND B] = Prob[A] × Prob[B]
• Independent events and symmetry: When the probability of event B is unaffected by event
A, the probability of event A is unaffected by event B:
Prob[B IF A] = Prob[B]
↓
Prob[A IF B] = Prob[A]
• Independent random variables and covariance: Two variables are independent whenever
their covariance equals 0:
Cov[v1, v2] = 0
Integral calculus allows us to extend the equations for the mean and variance of discrete random
variables to continuous random variables. Since knowledge of integral calculus is not needed
for most econometric analysis, we include only the definitions for those students who have
been exposed to integral calculus. For a continuous random variable v with probability density function f(v),

Mean[v] = ∫ v f(v) dv    Var[v] = ∫ (v − Mean[v])² f(v) dv

where each integral is taken over all possible values of v.
Chapter 25 Exercises
1. Focus on thirty students who enrolled in an economics course during a previous semester.
Student SAT data: Cross-sectional data of student math and verbal high school SAT scores from
a group of 30 students.
2. Two cab companies, Yellow Cab and Orange Cab, serve a small town. There are 900 Yellow
cabs and 100 Orange cabs. A cab strikes a color-blind pedestrian. After striking the pedestrian,
the cab immediately leaves the scene of the accident. The victim knows that a cab struck him,
but his color blindness makes him unable to report on the hue of the cab.
a. Based on this information, draw an event tree to determine the probability that the guilty
cab was Yellow and the probability that the guilty cab was Orange.
A judge must find one of the cab companies liable for the damage done to the pedestrian.
b. Based on the available information, which cab company should the judge find guilty?
Explain.
3. Reconsider the cab liability issue described in question 2. An eyewitness has just come
forward who reports that he saw the accident. Irrefutable documentation has proven that the
probability of the eyewitness being correct is 0.8 and the probability of being incorrect is 0.2.
a. Extend the event tree you constructed in question 2 to reflect two possibilities: the pos-
sibility that the eyewitness will report that a Yellow cab was guilty and that an Orange cab
was guilty.
b. Using your event tree, determine the following joint probabilities:
Prob[Yellow reported] =
Prob[Orange reported] =
e. Should the judge, a very busy individual, take the time to hear the eyewitness’s testimony?
Explain.
f. Your event tree reflects two pieces of information: the relative number of Yellow and
Orange cabs and the reliability of eyewitness testimony. Intuitively, how do these two pieces
of information explain your results?
4. This problem comes from a “Car Talk Puzzler: Attack of the Bucolic Plague.”3
RAY: This puzzler came to us a while ago—January 1999, to be precise. It’s from Professor
Bruce Robinson at the University of Tennessee in Knoxville. Of course, I had to make a few
modifications . . .
TOM: He won’t even want to be associated with it, once you’re finished.
RAY: I’m sure he’ll send us an email asking to have his name expunged.
Here it is:
• A dreaded new disease is sweeping across the countryside. It’s called “The Bucolic Plague.”
If you’re afflicted with it, you begin wandering around the woods aimlessly, until you finally
collapse and die. The remedy is to lock yourself in the bathroom for two or three days, until the
urge passes.
• A test has been developed that can detect whether you have the disease. The test is 99 percent
accurate. That is, if you have the disease, there is a 99 percent chance that the test will detect
it. If you don’t have the disease, the test will be 99 percent accurate in saying that you don’t.
• In the general population, 0.1 percent of the people have the disease—that’s one-tenth of one
percent.
• You decide to go for the test. You get your results: positive.
Should you lock yourself in the bathroom and ask for a constant supply of magazines, or should
you not be worried? And, the real question is, what is the probability that you actually have the
Bucolic Plague?
a. First, suppose that you have not been tested yet. Consider the fact that 0.1 percent of the
general population has the disease. Assuming that you are typical, what is the probability that
you have the disease? Draw the appropriate event tree to illustrate this.
b. Now, you have been tested, but you have not yet received the test results. Extend your
event tree to account for the possibility of a positive or negative result.
i. Using your event tree, determine the following joint probabilities:
iii. Using the conditional/joint probability relationship, compute the following conditional
probabilities:
5. Suppose that the producer of “Let’s Make a Deal” changes the way in which the “prize door”
is selected. Instead of randomly placing the prize behind one of the three doors, the following
procedure is used:
• First, the contestant chooses two doors rather than one.
• Second, Monty opens one of the two doors the contestant had chosen. The door Monty opens
never contains the prize.
• Third, Monty gives the contestant the opportunity to stay with the unopened door that he/she
initially chose or to switch to the other unopened door.
Suppose that the contestant initially chooses doors 1 and 2. Monty uses the following rules to
decide which door to open:
• If the prize is behind door 1, he would open door 2.
• If the prize is behind door 2, he would open door 1.
• If the prize is behind door 3, he would choose either to open door 1 or door 2 randomly; that
is, if the prize is behind door 3, the chances are 50-50 he will open door 1 and 50-50 he will
open door 2.
a. Draw the event tree describing which door Monty will open.
b. Calculate the following conditional probabilities:
i. Prob[Prize behind door 2 IF Monty opens door 1]
ii. Prob[Prize behind door 3 IF Monty opens door 1]
iii. Prob[Prize behind door 1 IF Monty opens door 2]
iv. Prob[Prize behind door 3 IF Monty opens door 2]
c. After Monty opens a door, would you advise the contestant to stay with the unopened door
he/she chose initially or switch to the other unopened door, door 3?
6. Suppose that the producer of “Let’s Make a Deal” changes the way in which the “prize door”
is selected. Instead of randomly placing the prize behind one of the three doors, the following
procedure is used:
• Thoroughly shuffle a standard deck of fifty-two cards.
• Randomly draw one card, note its color, and replace the card.
Chapter 26 Objectives
26.4 Estimation Procedures: Importance of the Probability Distribution’s Mean (Center) and
Variance (Spread)
26.5 Strategy to Estimate the Variance of the Estimated Mean’s Probability Distribution
26.7 Step 2: Use the Estimated Variance of the Population to Estimate the Variance of the
Estimated Mean’s Probability Distribution
Chapter 26 Prep Questions

Mean[(1/T)(v1 + v2 + … + vT)] = ActMean

whenever Mean[vi] = ActMean for each i; that is, whenever each vi has the same mean, ActMean; and

Var[(1/T)(v1 + v2 + … + vT)] = ActVar/T

whenever
• Var[vi] = ActVar for each i; that is, each vi has the same variance, ActVar, and
• the vi’s are independent; that is, all the covariances equal 0.
3. Consider an estimate’s probability distribution:
a. Why is the mean of the probability distribution important? Explain.
b. Why is the variance of the probability distribution important? Explain.
4. Consider a random variable. When additional uncertainty is present, how is the spread of the
random variable’s probability distribution affected? How is the variance affected?
Last summer our friend Clint was hired by the consumer group to analyze a claim made by the
Key West Tourist Bureau. The tourist bureau claims that the average low temperature in Key
West during the winter months is 65 degrees Fahrenheit (rounded to the nearest degree). Clint
has been hired to assess this claim.
The consumer group has already compiled the high and low temperatures for each winter
day from the winter of 2000–2001 to the winter of 2008–2009, 871 days in total.
Figure 26.1
Clint’s 871 cards
Key West winter weather data: Time series data of daily high and low temperatures in Key
West, Florida, during the winter months (December, January, and February) from the 2000–2001
winter to the 2008–2009 winter.
Clint has recorded the low temperature for each day on a 3 × 5 card (figure 26.1).
Since we can access statistical software, we can use the software to calculate the actual average
low temperature and the actual variance for the winter months. In fact, the Tourist Bureau’s claim
is justified: the average low temperature in Key West was 64.56, which rounds to 65. But Clint
has a problem. He does not have access to any statistical software, and he does not have the time
to sum all 871 observations by hand to calculate the mean. Instead, he
adopts the econometrician’s philosophy to assess the Tourist Bureau’s claim:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint samples the population of all 871 days by performing the following experiment: thoroughly
shuffle the 871 cards; randomly draw one card and record the low temperature written on it;
replace the card, reshuffle, and repeat the process until four cards have been drawn.
Figure 26.2
Four cards Clint randomly selects
Use the average of the lows for the four days sampled to estimate the mean:

EstMean = (v1 + v2 + v3 + v4)/4
Clint draws the following four cards (figure 26.2). Clint uses these four values to estimate the
mean of the population:
EstMean = (69.10 + 66.20 + 54.00 + 55.90)/4 = 245.20/4 = 61.30
By default, a sample size of 4 is selected (figure 26.3) and the Pause checkbox is checked. Click
the Start button and compare the numerical value of the estimated mean with the actual mean
from the first repetition of the experiment. Are they equal? Click the Continue button a few times
and compare the estimated and actual means for each of the subsequent repetitions. We can draw
two conclusions:
Figure 26.3
Opinion Poll simulation (for each repetition, the window reports the numerical value of the estimated mean in that repetition, along with the mean and variance of the estimated means across repetitions)
• We cannot expect the numerical value of the estimated mean to equal the actual population
mean.
• We cannot predict the numerical value of the estimated mean before the experiment is con-
ducted; hence the estimated mean is a random variable.
So where does this leave Clint? He knows that in all likelihood his estimate, 61.30, does not
equal the actual population mean. But perhaps he could get some sense of how likely it is for
his estimate to be “close” to the actual value. Recall that Clint faced the same problem when
assessing the reliability of his opinion poll. He wanted to know how likely it was that his opinion
poll results were close to the actual fraction of the population supporting him for class president.
As with Clint’s opinion poll, we will use the general properties of Clint’s estimation procedure
to assess the reliability of one specific application of the procedure:
EstMean = (v1 + v2 + v3 + v4)/4
Before selection: EstMean is a random variable described by its probability distribution; Mean[EstMean] and Var[EstMean] describe the center and spread of that probability distribution.
After selection: EstMean is an estimate, a specific numerical value; here, EstMean = 61.30.
The estimated mean, EstMean, is a random variable. While we cannot determine the value of
a random variable before the experiment is conducted, we can often describe a random variable’s
probability distribution. Our next goal is to do just that. We will describe the probability distribu-
tion of the estimated mean, EstMean, by deriving the equations for its mean and variance. The
mean describes the probability distribution’s center and the variance describes the probability
distribution’s spread. We will use the same strategy that we used when studying opinion polls.
First we will consider a very simple and unrealistic experiment in which only one card is drawn.
Then we will apply the arithmetic of means and variances to generalize the results for sample
sizes greater than one.
We will now derive the equation for the mean and variance of v’s probability distribution.
Recall the formula for the mean (expected value) of a random variable:

Mean[v] = ∑_{All v} v Prob[v]
To apply this formula to our experiment, we will calculate the probability of drawing a specific
card from Clint’s deck of 871 cards (figure 26.4).
What is the probability of drawing the card for December 1, 2000? Since there are 871 cards
in the well-shuffled deck, there is one chance in 871 of drawing the December 1, 2000, card.
Thus the probability of drawing the December 1, 2000, card is 1/871. What is the probability
of drawing the card for December 2, 2000? By the same logic, the probability of drawing the
December 2, 2000, card is 1/871. Clearly, the probability of drawing the card for any specific
day from a well-shuffled deck of 871 cards is 1/871:
Prob[12-1-2000] = Prob[12-2-2000] = … = Prob[2-28-2009] = 1/871
Figure 26.4
Clint’s 871 cards
Since each of the 871 cards is drawn with probability 1/871, applying the formula gives
Mean[v] = (1/871) × (the sum of all 871 lows), which is just the average of all 871 lows.
In words, the center of the random variable v’s probability distribution, Mean[v], equals the
actual mean of the population, 64.56.
We can use the same strategy to show that the variance of the random variable v will equal the
population variance. Review the formula for the variance of a random variable:

Var[v] = ∑_{All v} (v − Mean[v])² Prob[v]

Applying the formula gives Var[v] = (1/871) × (the sum of the 871 squared deviations from the
population mean), which is just the population variance. In words, the spread of the random
variable v’s probability distribution, Var[v], equals the actual variance of the population, 43.39.
Next consider the general case where T cards are drawn from the deck and then apply the
arithmetic of means and variances:
We can describe the probability distribution of the random variable EstMean by applying the
arithmetic of means and variances to the estimate of the mean:
EstMean = (v1 + v2 + … + vT)/T = (1/T)(v1 + v2 + … + vT)
First we consider the mean. Keep in mind that the mean of each v equals the population mean,
ActMeanAll871:
Mean[EstMean] = Mean[(1/T)(v1 + v2 + … + vT)]

Since Mean[cx] = c Mean[x]:

  = (1/T) Mean[v1 + v2 + … + vT]

Since Mean[x + y] = Mean[x] + Mean[y]:

  = (1/T)(Mean[v1] + Mean[v2] + … + Mean[vT])

Since Mean[vi] = ActMeanAll871 for each i:

  = (1/T)(T × ActMeanAll871)

Simplifying:

  = ActMeanAll871
The terminology can be confusing: Mean[EstMean] is the mean of the estimated mean. To
resolve this confusion, remember that the estimated mean, EstMean, is a random variable.
Therefore, like any random variable, EstMean is described by its probability distribution. So
Mean[EstMean] refers to the mean of EstMean’s probability distribution, the center of EstMean’s
probability distribution. To emphasize this point, we will often follow the word “mean” with the
word “center” in parentheses when referring to Mean[EstMean]; for example, Mean[EstMean]
is the mean (center) of EstMean’s probability distribution.
Next we focus on the variance. Note that the variance of each v equals the population vari-
ance, ActVarAll871. Also note that since each card drawn is replaced, the probability distribution
of v for one draw is not affected by the value of v on any other draw. The v’s are independent;
hence all the covariances equal 0:
Var[EstMean] = Var[(1/T)(v1 + v2 + … + vT)]

Since Var[cx] = c² Var[x]:

  = (1/T²) Var[v1 + v2 + … + vT]

Since Var[x + y] = Var[x] + Var[y] when x and y are independent (the covariances are all 0):

  = (1/T²)(Var[v1] + Var[v2] + … + Var[vT])

Since Var[vi] = ActVarAll871 for each i:

  = (1/T²)(T × ActVarAll871)

Simplifying:

  = ActVarAll871/T
• The mean (center) of the estimated mean’s probability distribution, Mean[EstMean], equals
the actual population mean:

Mean[EstMean] = ActMeanAll871

• The variance (spread) of the estimated mean’s probability distribution, Var[EstMean], equals
the actual population variance divided by the sample size:

Var[EstMean] = ActVarAll871/T
Figure 26.5
Opinion Poll simulation
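As a complement to the lab, the two equations can also be checked with a short Monte Carlo sketch. The code below is an illustration in Python rather than the book's simulation software; because Clint's 871 recorded lows are not reproduced here, it builds a hypothetical population with roughly the same mean and variance and samples from it with replacement.

```python
# A hedged Monte Carlo sketch (not the book's Econometrics Lab). Clint's 871 recorded lows are not
# reproduced here, so a hypothetical population with roughly the same mean and variance stands in.
import random
import statistics

random.seed(1)
population = [random.gauss(64.56, 43.39 ** 0.5) for _ in range(871)]   # hypothetical lows
act_mean = statistics.fmean(population)
act_var = statistics.pvariance(population)           # population (divide-by-N) variance

T = 4                                                 # sample size
reps = 50_000
est_means = []
for _ in range(reps):
    sample = [random.choice(population) for _ in range(T)]   # T cards drawn with replacement
    est_means.append(sum(sample) / T)

print(act_mean, statistics.fmean(est_means))          # Mean[EstMean] ≈ ActMean
print(act_var / T, statistics.pvariance(est_means))   # Var[EstMean] ≈ ActVar/T
```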
Table 26.1
Checking our equations for the mean and variance

                 Equations                              Simulation
Sample size      Mean[EstMean]    Var[EstMean]          Mean        Variance
1                64.56            43.39/1 = 43.39       ≈ 64.56     ≈ 43.39
4                64.56            43.39/4 = 10.85       ≈ 64.56     ≈ 10.85
10               64.56            43.39/10 = 4.34       ≈ 64.56     ≈ 4.34
It is important to keep in mind what we know versus what Clint knows. We know that the average
of all the lows, the actual mean, equals 64.56 and the actual variance equals 43.39. We used a
statistical package to compute these two statistics. Accordingly, we know that the Tourist
Bureau’s claim that the average winter low in Key West is 65 can be justified (at least to the
nearest whole degree). Clint does not have access to a statistical package, however, and does
not have the time to perform the arithmetic calculations needed to calculate the actual mean.
Clint proceeds to estimate the mean winter low temperature by randomly selecting four days
and calculating the average of the lows on these days. What does Clint know about his estimated
mean, EstMean? Let us summarize what Clint knows:
Figure 26.6
Probability distribution of EstMean (centered at ActMeanAll871)
Mean[EstMean] = ActMeanAll871

Var[EstMean] = ActVarAll871/T
Even though Clint does not know the numerical value of the actual mean and actual variance,
ActMeanAll871 and ActVarAll871, he does know that the mean (center) of EstMean’s probability
distribution equals ActMeanAll871 and that the variance equals ActVarAll871/T, whatever those
values are.
26.4 Estimation Procedures: Importance of the Probability Distribution’s Mean (Center) and
Variance (Spread)
Figure 26.7
Probability distribution of estimates—Importance of the mean
• Importance of the probability distribution’s mean: When the estimation procedure is unbiased,
sometimes the estimate will be too high and sometimes the estimate will be too low, but it will be
right on average (figure 26.7). The fact that the mean (center) of EstMean’s probability distribution
equals the population mean is good news for Clint. The procedure he used to estimate the popu-
lation mean is unbiased; it does not systematically underestimate or overestimate the actual
population mean. Since the estimation procedure is unbiased, the variance of the estimate’s
probability distribution plays a critical role.
• Importance of the probability distribution’s variance: When the estimation procedure is
unbiased, the probability distribution’s variance (spread) reveals the estimate’s reliability; the
variance tells us how likely it is that the numerical value of the estimate calculated from one
repetition of the experiment will be close to the actual value (figure 26.8).
26.5 Strategy to Estimate the Variance of the Estimated Mean’s Probability Distribution
The variance of EstMean’s probability distribution is crucial in assessing the reliability of Clint’s
estimate. On the one hand, if the variance is small, Clint can be confident that his estimate is
“close” to the actual population mean. On the other hand, if the variance is large, Clint must be
skeptical. What does Clint know about the variance of EstMean’s probability distribution? He
has already derived the equation for it:
Var[EstMean] = ActVarAll871/T, where T = sample size
Figure 26.8
Probability distribution of estimates—Importance of the variance (when the variance is small the estimate is reliable; when the variance is large the estimate is unreliable)
The sample size equals 4. We know that the actual population variance equals 43.39; hence we
know that the variance of the estimated mean’s probability distribution, Var[EstMean], equals
10.85:
Var[EstMean] = 43.39/4 = 10.85
Clint does not know the actual variance of the population, ActVarAll871, however. While he
has the raw data needed to calculate ActVarAll871, he does not have time to do so—it takes
longer to calculate the variance than the mean. So what should he do? Recall the econometri-
cian’s philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint can estimate the population variance from the available information, his four randomly
selected values for the low temperatures: 69.10, 66.20, 54.00, and 55.90. Then, he can modify
the equation we derived for Var[EstMean] by replacing Var[EstMean] and ActVarAll871 with
their estimated versions:
EstVar[EstMean] = EstVarAll871/T

where EstVarAll871 is Clint’s estimate of the actual population variance, ActVarAll871.
We will now describe three attempts to estimate the population variance using Clint’s four
randomly selected values by calculating the following:
1. The variance of Clint’s four numerical values based on the actual population mean.
2. The variance of Clint’s four numerical values based on the estimated population mean.
3. The “adjusted” variance of Clint’s four numerical values based on the estimated population
mean.
While the first two attempts fail for different reasons, they provide the motivation for the third
attempt, which succeeds. Therefore it is useful to explore the two failed attempts.
26.6.1 First Attempt: Variance of Clint’s Four Numerical Values Based on the Actual
Population Mean
The rationale is very simple. The variance is the average of the squared deviations from the
mean. So why not just calculate the variance of the four values he has, 69.10, 66.20, 54.00, and
55.90, around the actual population mean to estimate the variance of the entire population? Let
us do that now. The deviations from the actual mean, 64.56, are 4.54, 1.64, −10.56, and −8.66;
the squared deviations are 20.6116, 2.6896, 111.5136, and 74.9956; and their sum is 209.8104.
The average of the squared deviations provides an estimate of the population’s variance:

EstVarAll871 = 209.8104/4 ≈ 52.45

Note that the estimate obtained from Clint’s sample, 52.45, does not equal the population vari-
ance, 43.39. This should not surprise us, however. We would never expect any estimate to achieve
perfection. What then is the best we could hope for? We could hope that this estimation procedure
would be unbiased; we could hope that the estimation procedure does not systematically under-
estimate or overestimate the actual value. This estimation procedure is in fact unbiased. We will
use our simulation to illustrate this.
Focus your attention on the lower left corner of the window. For the moment ignore the Divide
By line. In the Use Mean line, note that the Act button is selected. This means that the actual
population mean, 64.56, is used to calculate the deviations. Consequently our estimate of the
variance is based on the actual population mean, just as in our calculations above. Click
Start. The values of the four cards selected are reported in figure 26.9. Calculate the variance
of the four values based on the actual population mean, 64.56. Is the simulation calculating the
estimated variance, EstVar, correctly? Next click Continue. Again, calculate the estimated vari-
ance. Is the simulation calculating it correctly? Also check to see if the simulation has calculated
Figure 26.9
Opinion Poll simulation (the Use Mean buttons select the actual or estimated mean; the Divide by buttons select T or T − 1)
the mean (average) of the variance estimates from the first two repetitions. Now uncheck the
Pause checkbox; after many, many repetitions click Stop. Compare the mean (average) of the
estimated variances with the actual population variance. Both equal 43.39. This suggests that
the estimation procedure for the actual population variance is unbiased.
Does this help Clint? Unfortunately, it does not. Recall what Clint knows versus what we
know. We know that the actual population mean equals 64.56, but Clint does not. Indeed, if he
knew the population mean, he would not have to go through all of this trouble in the first place.
So Clint must now try another tack.
26.6.2 Second Attempt: Variance of Clint’s Four Numerical Values Based on the Estimated
Population Mean
Since Clint does not know the actual population mean, what can he do? He can use the estimate
he calculated, 61.30, from the four randomly selected lows.
The average of the squared deviations based on Clint’s estimated population mean provides an
estimate of the actual population’s variance that Clint can calculate:

EstVarAll871 = 167.3000/4 ≈ 41.83
Hopefully this estimation procedure will be unbiased. Let us use a simulation to find out.
Before clicking Start, note that Est is selected in the Use Mean line. Consequently, instead of
calculating the deviation from the actual mean, the simulation will now calculate the deviation
from the estimated mean. Now click Start and then after many, many repetitions click Stop.
Compare the mean (average) of the estimated variances with the actual population variance.
Unfortunately, they are not equal. The mean of the estimated variances equals 32.54 while the
actual variance equals 43.39. This suggests that the estimation procedure for the actual popula-
tion variance is biased downward; it systematically underestimates the variance of the
population.
To explain why, note that when we use Clint’s estimate of the population mean to calculate
the sum of squared deviations, we obtain a lower sum than we did when we used the actual
population mean:
Sum of squared deviations using actual population mean = 209.8104
Sum of squared deviations using estimated population mean = 167.3000
Is this just a coincidence? No, it is not. To understand why, we will ask a question: What
value would minimize the sum of squared deviations of the 4 sample values? Let vVarMin equal
this value; that is,
vVarMin minimizes (v1 − vVarMin)2 + (v2 − vVarMin)2 + (v3 − vVarMin)2 + (v4 − vVarMin)2
where v1, v2, v3, and v4 equal the four sample values.
With a little calculus we can solve for vVarMin: differentiate the sum of squared deviations with
respect to vVarMin and set the derivative equal to 0:

d[(v1 − vVarMin)² + (v2 − vVarMin)² + (v3 − vVarMin)² + (v4 − vVarMin)²]/dvVarMin
  = −2(v1 − vVarMin) − 2(v2 − vVarMin) − 2(v3 − vVarMin) − 2(v4 − vVarMin) = 0
Now some algebra: solving for vVarMin yields

vVarMin = (v1 + v2 + v3 + v4)/4

What does (v1 + v2 + v3 + v4)/4 equal? It is just the estimated population mean. Using the estimate
of the population mean to calculate the deviations from the mean minimizes the sum of squared
deviations. The two sums are equal only if the estimate of the population mean equals the actual
population mean:
Only if the estimated mean equals the actual mean:
  Sum of squared deviations based on estimated population mean = Sum of squared deviations based on actual population mean
Typically the estimate of the population mean will not equal the actual population mean,
however. Consequently the sum of squared deviations based on the estimate of the population
mean will be less than the sum of squared deviations based on the population mean itself:
Typically the estimated mean will not equal the actual mean, in which case:
  Sum of squared deviations based on estimated population mean < Sum of squared deviations based on actual population mean
Consequently, dividing the sum based on the estimated mean by T systematically underestimates the actual population variance, whereas dividing the sum based on the actual mean by T provides an unbiased estimation procedure for the actual population variance.
Recall that the average of the squared deviations based on the actual population mean provides
an unbiased estimation procedure for the population variance. Consequently, if Clint were to estimate
the population variance by using the deviations from the estimated mean rather than the actual
mean, he would systematically underestimate the variance of the population. So, let us make
one last attempt.
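A quick numerical check of this argument, sketched in Python (not part of the text), uses Clint's four values: the sum of squared deviations is smaller around the estimated mean than around the actual mean, and a grid search confirms that the sample mean itself is the minimizer.

```python
# A quick numerical check (not from the text) using Clint's four sampled lows.
values = [69.10, 66.20, 54.00, 55.90]
est_mean = sum(values) / len(values)                       # 61.30, the estimated population mean

def ssd(center):
    """Sum of squared deviations of the sample values from `center`."""
    return sum((v - center) ** 2 for v in values)

print(ssd(est_mean))    # ≈ 167.30  (deviations taken from the estimated mean)
print(ssd(64.56))       # ≈ 209.81  (deviations taken from the actual population mean)

# A grid search over candidate centers confirms that the sample mean minimizes the sum.
candidates = [c / 100 for c in range(5000, 7501)]          # 50.00, 50.01, ..., 75.00
print(min(candidates, key=ssd))                            # 61.3
```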
26.6.3 Third Attempt: “Adjusted” Variance of Clint’s Four Numerical Values Based on the
Estimated Population Mean
How should Clint proceed? Fortunately, he has a way out. Clearly, Clint has no choice but to
use the estimated population mean to calculate the sum of squared deviations. But if he then
divides the sum by 3 rather than 4, his estimation procedure will be unbiased. More generally, when the actual popula-
tion mean is unknown and the estimated population mean must be used to calculate the deviations
from the mean, we divide the sum of squared deviations by the sample size less 1 rather than
by the sample size itself. In this case the sample size less 1 equals the degrees of freedom; there
are 3 degrees of freedom. For the time being, do not worry about precisely what the degrees of
freedom represent and why they solve the problem of bias. We will motivate the rationale later
in this chapter. We do not wish to be distracted from Clint’s efforts to assess the Tourist Bureau’s
claim at this time. So we will postpone the rationalization for now.
Let us now compute our “adjusted” estimate of the variance. Recall that the sum of squared
deviations based on the estimated population mean equals 167.30. Dividing by the degrees of
freedom rather than the sample size gives

EstVarAll871 = 167.30/3 ≈ 55.77
We will use a simulation to illustrate that the adjusted variance procedure is unbiased.
Before clicking Start, note that Est is selected in the Use Mean line and T − 1 in the Divide By
line. Consequently the simulation will now calculate the deviation from the estimated mean and
then after summing the squared deviations, it will divide by the sample size less one, T−1, rather
than the sample size itself, T.
Now click Start and then after many, many repetitions click Stop. Compare the mean
(average) of the estimated variances with the actual population variance. They are equal. This
suggests that our third estimation procedure for the population variance is unbiased. After many,
many repetitions the adjusted average of the squared deviations equals the actual population
variance.
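The simulation's logic can also be mimicked with a short Monte Carlo sketch, again in Python and with a hypothetical population standing in for Clint's 871 lows (both assumptions): it compares the three estimation procedures and shows that only the third, which divides the squared deviations from the estimated mean by T − 1, averages out to the actual population variance.

```python
# A hedged simulation sketch (not the book's lab software) comparing the three attempts.
# A hypothetical population stands in for Clint's 871 recorded lows.
import random
import statistics

random.seed(1)
population = [random.gauss(64.56, 43.39 ** 0.5) for _ in range(871)]
act_mean = statistics.fmean(population)
act_var = statistics.pvariance(population)

T, reps = 4, 50_000
attempt1, attempt2, attempt3 = [], [], []
for _ in range(reps):
    sample = [random.choice(population) for _ in range(T)]
    est_mean = sum(sample) / T
    attempt1.append(sum((v - act_mean) ** 2 for v in sample) / T)        # actual mean, divide by T
    attempt2.append(sum((v - est_mean) ** 2 for v in sample) / T)        # estimated mean, divide by T
    attempt3.append(sum((v - est_mean) ** 2 for v in sample) / (T - 1))  # estimated mean, divide by T − 1

print(act_var)
print(statistics.fmean(attempt1))   # ≈ actual variance, but Clint cannot compute this one
print(statistics.fmean(attempt2))   # ≈ (3/4) of the actual variance: biased downward
print(statistics.fmean(attempt3))   # ≈ actual variance: unbiased and computable by Clint
```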
26.7 Step 2: Use the Estimated Variance of the Population to Estimate the Variance of the
Estimated Mean’s Probability Distribution
Recall that the standard deviation equals the square root of the variance. Consequently the
estimated standard deviation equals the square root of the estimated variance. Furthermore, the
estimated standard deviation has been given a special name: it is called the standard error:

SE = EstSD = √EstVar
Now let us calculate the standard error, the estimated standard deviation of the estimated mean,
EstMean’s, probability distribution:

EstVar[EstMean] = EstVarAll871/T = 55.77/4 ≈ 13.94
SE = √13.94 ≈ 3.73
Now, at last, Clint is in a position to assess the Tourist Bureau’s claim that the average daily low
temperature during winter was 65 in Key West. Hypothesis testing allows Clint to do so. Recall
the steps involved in hypothesis testing:
Critical result: The estimated mean is 61.30. This evidence, the fact that the estimated mean is
less than 65, suggests that the Tourist Bureau’s claim is not justified.
Step 2: Play the cynic, challenge the evidence, and construct the null and alternative
hypotheses.
Cynic’s view: Despite the results the average low temperature is actually 65.
The null hypothesis adopts the cynical view by challenging the evidence; the cynic always
challenges the evidence. The alternative hypothesis is consistent with the evidence.
H0: ActMeanAll871 = 65 ⇒ Actual mean equals 65; cynic is correct.
H1: ActMeanAll871 < 65 ⇒ Actual mean is less than 65; cynic is incorrect.
Step 3: Formulate the question to assess the null hypothesis and the cynic’s view.
• Generic question: What is the probability that the result would be like the one obtained (or
even stronger), if H0 is true (if the cynic is correct)?
• Specific question: The estimated mean was 61.30. What is the probability of obtaining an
average low of 61.30 or less from four randomly selected days if the actual population mean of
lows were 65 (if H0 is true)?
Step 4: Use the general properties of the estimation procedure, the estimated mean’s probability
distribution, to calculate Prob[Results IF H0 true].
When we assessed Clint’s poll, we used the normal distribution to calculate this probability. Unfor-
tunately, we cannot use the normal distribution now. Instead, we must use a different distribution,
the Student t-distribution. We will now explain why.
Recall that the variable z played a critical role in using the normal distribution:
z = (Value of random variable − Distribution mean)/Distribution standard deviation
  = Number of standard deviations from the mean
In words, z equals the number of standard deviations the value lies from the mean. But Clint
does not know what the actual variance and standard deviation of his probability distribution
Figure 26.10
Normal and Student t-distributions
equals. That is why he had to estimate it. Consequently he cannot use the normal distribution
to calculate probabilities.
When the standard deviation is not known and must be estimated, the Student t-distribution
rather than the normal distribution must be used (figure 26.10). t equals the number of estimated
standard deviations the value lies from the mean:

t = (Value of random variable − Distribution mean)/Estimated distribution standard deviation (SE)
Since estimating the standard deviation introduces an additional element of uncertainty, the
Student t-distribution is more “spread out” than the normal distribution.
Unfortunately, the Student t-table is more cumbersome than the normal distribution table. The
right-hand tail probabilities not only depend on the value of t, the number of estimated standard
deviations from the mean, but also on the degrees of freedom. We will exploit our Econometrics
Lab to calculate the probability. Let us review the relevant information (figure 26.11):
Figure 26.11
Probability distribution of EstMean if H0 is true: Student t-distribution with mean 65, SE = 3.73, and 3 degrees of freedom; the probability that EstMean lies at or below 61.30 equals 0.1972
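For readers without access to the Econometrics Lab, the same probability can be reproduced with a short sketch; the code below assumes Python with the SciPy library, neither of which the text uses.

```python
# A short sketch (not from the text) assuming Python with the SciPy library.
from scipy import stats

est_mean = 61.30     # Clint's estimate
null_mean = 65       # actual mean claimed under H0
se = 3.73            # standard error computed in the previous section
df = 3               # degrees of freedom = sample size − 1

t = (est_mean - null_mean) / se                 # ≈ −0.99 estimated standard errors below 65
prob_results_if_h0_true = stats.t.cdf(t, df)    # left-tail probability under the Student t-distribution

print(round(t, 2), round(prob_results_if_h0_true, 4))   # ≈ -0.99, ≈ 0.197
```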
Now let us return to the fifth and last step in the hypothesis testing procedure.
Prob[Results IF H0 true] equals 0.1972. At the traditional significance levels used in academe
(1, 5, and 10 percent), this probability is too large to reject the null hypothesis that the average
low is 65; the Tourist Bureau’s claim is justified. Consequently Clint fails to reject the null
hypothesis; that is, he fails to reject the Tourist Bureau’s claim that the average winter low
temperature in Key West is 65.
Earlier in this chapter we postponed our explanation of degrees of freedom because it would
have interrupted the flow of our discussion. We will now return to the topic. In this case the
degrees of freedom equal the sample size less one:

Degrees of freedom = Sample size − 1 = 4 − 1 = 3
We will now explain why we divided by the degrees of freedom, the sample size less one, rather
than the sample size itself when estimating the variance. To do so, we will return to the basics
and discuss a calculation we have been making since grade school.
Revisit Amherst precipitation in the twentieth century (table 26.2) and consider how we calculate
the mean precipitation for June.
Table 26.2
Monthly precipitation in Amherst, MA, during the twentieth century
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
Each of the 100 Junes in the twentieth century provides one piece of information that we use to
calculate the average. To calculate an average, we divide the sum by the number of pieces of
information. The key principle is how we calculate a mean or, in everyday language, an average:
Key principle: To calculate a mean or an average, we divide the sum by the number of pieces
of information.
Mean (average) = Sum/Number of pieces of information
Now focus on the variance. Recall that the variance equals the average of the squared devia-
tions from the mean:

Variance = Sum of squared deviations from the mean/Number of pieces of information
In Clint’s case the number of pieces of information available to estimate the variance equals 3,
not 4; that is why we should divide by 3. To understand why, consider the first observation in
isolation. Recall the first card Clint draws:

v1 = 69.10

Based only on the first observation, the estimated population mean would equal 69.10; hence,
when considering only the first observation, the deviation from the estimated mean and the
squared deviation would equal 0. More generally, when we consider only the first observation,
the deviation from the estimated mean and the squared deviation will always equal 0 regardless
of which one of the 871 cards was drawn, because the estimated mean would equal the value
recorded on that card:
Considering first observation only
↓
EstMean = v1
↓
v1 − EstMean = 0
↓
(v1 − EstMean)² = 0
Based on only a single observation, the deviation and squared deviation will equal 0 regardless
of what the actual population variance equals. Consequently the first observation provides no
information when estimating the population variance. Only when the second observation is
introduced would we begin to get some information about the actual population variance:
• On the one hand, if the actual population variance were large, then it would be likely for the
value recorded on the second observation to be far from the first observation value; consequently
the deviations from the estimated mean and squared deviations would be large.
• On the other hand, if the actual population variance were small, then it would be likely for
the value recorded on the second observation to be close to the first observation value; conse-
quently the deviations from the estimated mean and the squared deviations would be small.
Since the first observation in isolation provides no information about the variance, the number
of pieces of information available to estimate the variance equals the sample size less one. The
degrees of freedom equal the number of pieces of information that are available to estimate the
variance.
Chapter 26 Exercises
1. A large manufacturer of laptop computers claims that on average its laptops achieve 7
hours of battery life; that is, the manufacturer claims that the actual mean number of hours its
laptops will operate without the battery being recharged is 7:
A consumer group has challenged the claim, however, asserting that the average is less than 7.0.
You have been asked by the Consumer Protection Agency to investigate this claim. To do so you
conduct the following experiment:
Use the average of the eight laptops sampled to estimate the mean battery life:
EstMean = (v1 + v2 + … + v8)/8
a. What does the sum of the hours, v1 + v2 + … + v8, equal? On average, what does the battery
life of the eight laptops equal? What is the estimated mean, EstMean, for the battery life of
all laptops produced?
b. Show that the sum of squared deviations from the estimated mean, EstMean, of the eight
laptops you tested equals 8.12.
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Argue that the vi’s, the battery life of the laptops you tested, are independent random
variables.
e. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
2. Now, apply the information you compiled in problem 1 to assess the consumer group’s com-
plaint. We will use hypothesis testing to do so.
a. Play the cynic and construct the null and alternative hypotheses, H0 and H1.
b. If the null hypothesis were correct, what would the mean of EstMean’s probability distri-
bution equal?
c. Formulate the question needed to assess the cynic’s view and the null hypothesis; that is,
the question whose answer is Prob[Results IF H0 true].
d. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Our Econometrics
Lab includes software that allows you to calculate this probability easily. Access the lab using
the following link and then fill in the blanks with the appropriate numbers:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
e. Assess the manufacturer’s claim.
3. Now suppose that the sample size were 64, eight times larger. That is, instead of randomly
selecting eight cards, suppose that you randomly selected sixty-four laptops. Furthermore
suppose that the battery life data just replicated the data from the first eight laptops.
a. What would the sum of the hours for the sixty-four laptops, v1 + v2 + … + v64, equal? On
average, what does the battery life of the sixty-four laptops equal? What is the estimated mean,
EstMean, for the battery life of all laptops produced?
b. What would the sum of squared deviation from the estimated mean now equal?
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
e. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Access the lab using
the following link and then fill in the blanks:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
4. Now suppose that the sample size were 120, fifteen times larger than the original eight.
Furthermore suppose that the battery life data just replicated the data from the first eight laptops.
a. What would the sum of the hours for the 120 laptops, v1 + v2 + … + v120, now equal? On
average, what does the battery life of the 120 laptops equal? What is the estimated mean,
EstMean, for the battery life of all laptops produced?
b. What would the sum of squared deviation from the estimated mean now equal?
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
e. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Access the lab using
the following link and then fill in the blanks:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
f. Assess the manufacturer’s claim.
Using your intuition, explain why Prob[Results IF H0 true] changes as the sample size increases.
6. Suppose that you had not learned about degrees of freedom and the Student t-distribution.
Specifically, suppose that you divided by the sample size rather than the degrees of freedom when
estimating the population variance and used the normal distribution rather than the Student
t-distribution when calculating Prob[Results IF H0 true]. Had this been the case, fill in the blanks:
7. Compare your answers to problems 5 and 6. As the sample size increases, does the use of the
degrees of freedom and the Student t-distribution become more or less critical? Explain.
Index
Absence of random influences, 165–68 maximum likelihood estimation procedure and, 780–82
Alternative hypothesis. See Hypothesis testing probit probability model, 774–82
Any Two estimation procedure, 205–208, 597 BLUE. See Best linear unbiased estimation procedure
Artificial models, 384–85 (BLUE)
Attenuation bias, 621 Bounds, upper and lower confidence interval, 488–89
Autocorrelation Breusch–Pagan–Godfrey test, 531–35
accounting for, 561–73 Budget theory of demand, 291–99
covariance and independence and, 549–51
defined, 551 Calculating an average, 242–45, 859–61
error terms and, 553 Causation and correlation, 27, 272–73
generalized least squares and, 561–74 Censored dependent variables, 783–88
mathematics, 554–59 Center. See Mean (center)
model, 551–53 Central Limit Theorem, 100–102
ordinary least squares and, 554–61 Clever algebraic manipulation approach, 340–44, 353
rho, 551–53 Coefficient estimates, 166–71, 187–89
robust standard errors and, 574 autocorrelation and, 554–59
Averages, 3–8 biased vs. unbiased, 239–41
calculating, 242–45, 861–63 heteroskedasticity and, 520–22
dummy variables and, 414, 427 interpretation using logarithms, 304–11
irrelevant explanatory variables, 468–69
Basic regression model, 515–16, 548, 583 mean (center) of, 190–93
Best fitting line, 148–50, 152–53 probability distribution, 190–97
ordinary least squares (OLS) estimation procedure for importance of, 224–25
finding, 156–64 variance estimation, 237–41
stretched S-shaped vs. straight regression, 774–82 reduced form estimation procedure, 714–15
systematic procedure to determine, 153–54 reliability of, 173, 199–204, 258–62
Best linear unbiased estimation procedure (BLUE), 205– range of x’s and, 202–204
208, 517, 538, 549, 585 sample size and, 201–202
Bias variance of error term’s probability distribution and,
attenuation, 621 199–201
biased but consistent estimation procedure, 598–601 Coefficient interpretation approach, 717–25
dilution, 621 paradoxes, 722–24
estimation procedures, 227–35, 599–601 Coefficient of determination, 490–94
explanatory variable/error term correlation and, 588–92 Conditional probability, 807–808
instrumental variable approach and ordinary least Monty Hall problem, 809–16
squares, 628–31 Confidence intervals
omitted explanatory variable, 448–56, 639–43 calculated with statistical software, 489–90
ordinary least squares, 660–61, 667–68, 676–77 consistence with data, 478–79
Binary dependent variables example, 479–89
electoral college example, 770–71 ordinary least squares and, 479–92
linear probability model, 772–74 upper and lower bounds, 488–89
justifying use of, 107–10 best linear unbiased estimation procedure (BLUE),
properties of, 105 205–208, 517, 538, 549, 585
rules of thumb, 110–12 biased and unbiased, 93–96, 125–26, 233, 236, 239–41,
stretched S-shaped line using, 774–82 289, 442–44, 522–26, 703
spread, 2, 8–11, 67–68 biased but consistent, 598–601
Student t-, 374–75, 856–59 embedded within the ordinary least squares estimation
variance, 2, 8–11, 67–68 procedure, 271–72, 516–17, 548–49, 584
Downward sloping demand theory, 286–91, 318–29, general and specific, 123
446–47 generalized least squares (GLS) (see Generalized least
Dummy variables squares (GLS) estimation procedure)
cross-sectional fixed effects and, 670–71 importance of mean in, 124–25, 198
averages and, 413 importance of variance in, 125–26, 198
defined, 414–15 instrumental variable (IV), 602–607, 623–24
example, 414, 428 maximum likelihood, 780–82
fixed effects and, 670–73 Min-Max, 205–208
implicit assumptions and, 424 population mean, 834–38
models, 415–23 opinion polls, 61–66, 89–97, 121–24
period fixed effects and, 673–81 ordinary least squares (OLS) (see Ordinary least
trap, 500–506 squares (OLS) estimation procedure)
probit, 780–82
Elasticity of demand. See Constant elasticity model; reduced form (RF), 708–15, 720–22, 744–45, 747
Demand theory; Price elasticity of demand order condition and, 746–61
Endogenous variables reliability of unbiased, 96–97
vs. exogenous variables, 699, 737 small and large sample properties, 592–601
replaced with its estimate, 743–44 Tobit, 787–88
Errors two-stage least squares (TSLS), 741–46, 760–61
measurement, 613 unbiased and consistent, 595–96
defined, 614 unbiased but inconsistent, 596–97
example, 625–28 Events
modeling, 614 correlated, 816–18, 825–26
ordinary least squares and, 617–23 independent, 820–21, 826
robust standard, 538–41, 574 Event trees, 798–806
type I and type II, 135–38 Monty Hall problem and, 809–16
Error terms, 151–52, 173, 183, 189, 270–71, 539, 558, EViews
559, 583 autocorrelation and, 547
autocorrelation and, 551–61 binary dependent variables, 782
equal variance premise (see Standard ordinary least dummy and interaction variables, 411–12, 415–23
squares (OLS) premises) dummy variable trap and, 504–506
explanatory variables and, 548, 584, 586–88 fixed effects, 680
correlation and bias, 588–92, 696–97 heteroskedasticity and, 540–41
independence premise (see Standard ordinary least ordinary least squares (OLS) estimation procedure and,
squares (OLS) premises) 163–64
heteroskedasticity and, 517–26 probit estimation procedure, 782
importance of, 164–71 random effects, 687
independence premise (see Standard ordinary least scatter diagrams, 515–16
squares (OLS) premises) Tobit estimation procedure, 788
probability distribution variance, 199–201 two-stage least squares, 746
degrees of freedom and, 241–45 Exogenous variables
estimation of, 226–36 absent from demand and supply models, 747–48
random influences and, 171–72 vs. endogenous variables, 699, 737
standard ordinary least squares (OLS) premises (see order condition and, 746–61
Standard ordinary least squares (OLS) premises) Explanatory variables, 151
Estimates dependent variable as linear combination of,
coefficient (see Coefficient estimates) 498–99
interval (see Interval estimates) dummy variables and, 415–23, 500–506
Estimation procedures endogenous, 699, 706
Any Two, 205–208, 597 error terms, 618–19
best fitting line and, 148–50 correlation and bias, 588–92, 696–97
Maximum likelihood estimation procedure, 780–82 linear demand model and, 331–33
Mean (center), 3–8, 27, 29, 839–40, 844–45 probability calculation
coefficient estimate’s probably distribution, 190–93, clever algebraic manipulation, 340–44, 353
224–25 Wald (F-distribution) test, 353–64
distribution, 2, 3–8 Nonconditional probability, 807
estimation of population Nonrandom sampling, 599–601
degrees of freedom, 859–61 Normal distribution, 102–104
estimating variance of, 848–54 example, 105–107
importance of probability distribution’s mean and hypothesis testing and, 128–30
variance, 845–46 interval estimates and, 256–58
normal distribution and Student t-distribution, 856–59 justifying use of, 107–10
procedure, 834–38 properties, 105
estimation procedures and, 93–96, 124–25 rules of thumb, 110–12
importance of, 95–96, 124–25, 198, 845–46 single variable, 374–75
probability distribution, 66–67, 90, 124–25, 190–93 stretched S-shaped line using, 774–82
of random variable, 66–67, 90 vs. Student t-distribution, 856–59
relationship to median, 798 Null hypothesis. See Hypothesis testing
strategy to estimate variance of probability distribution
of, 846–48 Oddities, data, 393–94
unbiased estimation procedure and, 124–25, 190–91, OLS. See Ordinary least squares (OLS) estimation
443 procedure
Measurement error, 613 Omitted explanatory variables, 445–46, 456
explanatory variable, 622–23 bias, 448–56, 639–43
defined, 614 explanatory variable/error term independence premise
dependent variable, 614–17 and, 641–43
example, 625–28 consistency and, 642–43
instrumental variables and, 628–33 direct effect, 448–56
modeling, 614 instrumental variables and, 648–54
Median, 796–98 bad instrument, 653
Min-Max estimation procedure, 205–208 good instrument conditions, 645, 649–53
Mode, 796 ordinary least squares and, 446–54, 642–43
Model specification and development proxy effect, 448–56
data oddities in, 393–94 Omitted variable proxy effect, 449–53
effect of economic conditions on presidential elections, One-tailed hypothesis test, 286–91
391–404 vs. two-tailed tests, 291
iterative process for formulation and assessment, Opinion poll simulation, 63–66, 68–76
395–404 Order condition, 746–49
Ramsey REgression Specification Error Test (RESET), overidentification problem, 757–60
384–91 underidentification problem, 749–56
Monty Hall problem, 809–16 Ordinary least squares (OLS) estimation procedure,
Motivating hypothesis testing, 126–30, 262–68 154–55, 184, 584–86, 660–61
Multicollinearity, 457–66 absence of random influences and, 165–68
earmarks of, 464–66 autocorrelation and, 551–61
highly correlated explanatory variables and, 458–63 as best linear unbiased estimation procedure (BLUE),
perfectly correlated explanatory variables and, 457–58 205–208
Multiple regression analysis bias, 660–61, 667–68, 676–77
constant elasticity demand model and, 333–39 coefficient estimates, 166–71, 187–89
flexibility of, 427 mean (center) of, 190–93
goal of, 318, 325, 448, 475–77, 495, 639–40, 717 reliability, 199–204
linear demand model and, 331–33 confidence intervals and, 479–92
power of, 427 consistency, 601, 622–23
vs. simple, 318 dependent variable measurement error and, 614–16
dummy variables and, 415–23
Natural logarithms, 305 error term/error term independence premise (see
Natural range and correlation coefficient, 23–27 Standard ordinary least squares (OLS) premises)
No money illusion theory, 329–31 equal error term variance premise (see Standard
constant elasticity demand model and, 333–39 ordinary least squares (OLS) premises)
hypothesis testing, 351–53 error terms and, 164–71, 171–72, 183, 270–71
Tails probability, 260–62, 266–67, 289, 299, 398–99 error term correlation and bias, 588–92, 696–97
confidence intervals and, 479–89 error term independence premise (see standard
confidence interval upper and lower bounds and, ordinary least squares (OLS) premises)
488–89 exogenous, 742–44
Tobit estimation procedure, 787–88 having same value for all observations, 495–97
Truncated (censored) dependent variables, 783–88 highly correlated, 458–63
t-tests, 369–74 irrelevant, 466–69
Two-stage least squares (TSLS) estimation procedure, logarithm, 307
741–44 measurement error, 617–23
compared to reduced form, 744–46, 752–53, 755–56, multicollinearity and, 457–66
759–62 omitted, 445–56, 639–40
overidentification and, 760–61, 762 one explanatory variable as linear combination of
underidentification, 753, 756, 762 other, 497–98
Two-tailed confidence intervals perfectly correlated, 457–58
consistence with data, 478–79 instrumental, 602–607, 623–24, 628–31
example, 479–89 interaction, 425–27, 432–34
ordinary least squares and, 479–92 log dependent variable model, 309–10
Two-tailed hypothesis tests, 291–99 omitted explanatory, 445–56
equivalence of Wald tests and, 369–74 ordinary least squares (OLS), bias and consistency,
Two-variable relationships 446–56, 639–43
correlation, 13 instrumental variables, bias, and consistency, 642–43
correlation coefficient, 22–27 example, 646–48
covariance and, 13–19 probability distribution of, 53–58
independence of variables in, 19–21 random, 48–50, 53–58, 66–67, 88–89, 185
scatter diagrams of, 13 continuous, 48, 58–61, 827
Type I and type II errors, 135–38 correlated, 818–20
discrete, 48–58
Unbiased estimation procedures, 93–96, 125–26, 233, F-statistic as, 358–62
236, 239–41, 289, 442–44 independent, 821–25
autocorrelation and, 559–61 interval estimates and, 256–58
consistency and, 595–96 mean of, 66–67, 90
defined, 93–95 variance of, 54–58, 91–92
heteroskedasticity and, 522–26 Variance (spread), 2, 8–12, 27, 29, 67–76, 91–92,
inconsistent, 596–97 125–26, 194–97, 840–44
Uncorrelated variables, 19–21, 444–45 estimating coefficient estimate, 194–97, 224–25,
Underidentification problem, 749–56 237–41
Unit insensitivity and correlation coefficient, 22–23 normal distribution and Student t-distribution, 256–58,
Unrestricted sum of squared residuals, 355–58, 362–63, 856–57
373 strategy for, 225–26, 521–22, 556, 559
F-distribution and, 362–63, 373 estimating error term, 199–201, 227–36, 516, 517–19
Upper confidence interval bounds, 488–89 degrees of freedom and, 241–45
estimating population, 848–54
Variable(s) degrees of freedom, 859–61
correlated and independent, 19–21, 444–45 normal distribution and Student t-distribution,
dependent, 151, 628–31, 648–49, 769 856–59
binary, 770–82 strategy to estimate, 846–48
logarithm, 306 importance of, 96–97, 125–26, 198, 845–46
measurement error, 614–16 random variable, 54–58, 67–76, 91–92
truncated (censored), 783–88 unbiased estimation procedure, 125–23, 198–99, 444,
dummy, 414–24, 670–73 846–47
cross-sectional fixed effects, 670–71
period fixed effects, 673–80, 679–81 Wald (F-distribution) test
trap, 500–506 calculations, 363–65
endogenous vs. exogenous, 699, 706–707, 737 equivalence of two-tailed t-tests and, 369–74
exogenous, 699, 737 restricted and unrestricted sums of squared residuals
order condition and, 746–61 and, 355–58, 362–63, 373
explanatory, 151, 628–31, 648–49 restricted regression, 353–55, 372–73
dependent variable as linear combination of, 498–99 unrestricted regression, 355–56, 372–73, 388–89