An Introduction to Econometrics
A Self-Contained Approach
Frank Westhoff
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information,
please email [email protected] or write to Special Sales Department, The MIT Press, 55 Hayward Street,
Cambridge, MA 02142.
This book was set in Times and Syntax by Toppan Best-set Premedia Limited. Printed and bound in the United States
of America.
Contents
1 Descriptive Statistics 1
Chapter 1 Prep Questions 1
1.1 Describing a Single Data Variable 2
1.1.1 Introduction to Distributions 2
1.1.2 Measure of the Distribution Center: Mean (Average) 3
1.1.3 Measures of the Distribution Spread: Range, Variance, and Standard Deviation 8
1.1.4 Histogram: Visual Illustration of a Data Variable’s Distribution 11
1.2 Describing the Relationship between Two Data Variables 13
1.2.1 Scatter Diagram: Visual Illustration of How Two Data Variables Are Related 13
1.2.2 Correlation of Two Variables 13
1.2.3 Measures of Correlation: Covariance 13
1.2.4 Independence of Two Variables 19
1.2.5 Measures of Correlation: Correlation Coefficient 22
1.2.6 Correlation and Causation 27
1.3 Arithmetic of Means, Variances, and Covariances 27
Chapter 1 Review Questions 29
Chapter 1 Exercises 30
Appendix 1.1: The Arithmetic of Means, Variances, and Covariances 40
4.1.5 Importance of the Variance (Spread) of the Estimate's Probability Distribution for an Unbiased Estimation Procedure 125
4.2 Hypothesis Testing 126
4.2.1 Motivating Hypothesis Testing: The Evidence and the Cynic 126
4.2.2 Formalizing Hypothesis Testing: Five Steps 130
4.2.3 Significance Levels and Standards of Proof 133
4.2.4 Type I and Type II Errors: The Trade-Offs 135
Chapter 4 Review Questions 138
Chapter 4 Exercises 139
8.4 Summary: The Ordinary Least Squares (OLS) Estimation Procedure 270
8.4.1 Regression Model and the Role of the Error Term 270
8.4.2 Standard Ordinary Least Squares (OLS) Premises 271
8.4.3 Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures 271
8.4.4 Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard Ordinary Least Squares (OLS) Premises 272
Chapter 8 Review Questions 273
Chapter 8 Exercises 273
Appendix 8.1: Student t-Distribution Table—Right-Tail Critical Values 278
Appendix 8.2: Assessing the Reliability of a Coefficient Estimate Using the Student t-Distribution Table 280
16 Heteroskedasticity 513
Chapter 16 Prep Questions 513
16.1 Review 515
16.1.1 Regression Model 515
16.1.2 Standard Ordinary Least Squares (OLS) Premises 516
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure 516
16.2 What Is Heteroskedasticity? 517
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The Consequences 519
16.3.1 The Mathematics 519
16.3.2 Our Suspicions 522
16.3.3 Confirming Our Suspicions 522
18 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables 579
Chapter 18 Prep Questions 580
18.1 Review 583
18.1.1 Regression Model 583
18.1.2 Standard Ordinary Least Squares (OLS) Premises 583
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure 584
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure 584
18.3 A Closer Look at the Explanatory Variable/Error Term Independence Premise 586
18.4 Explanatory Variable/Error Term Correlation and Bias 588
18.4.1 Geometric Motivation 588
18.4.2 Confirming Our Logic 590
18.5 Estimation Procedures: Large and Small Sample Properties 592
18.5.1 Unbiased and Consistent Estimation Procedure 595
18.5.2 Unbiased but Inconsistent Estimation Procedure 596
18.5.3 Biased but Consistent Estimation Procedure 598
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency 601
18.7 Instrumental Variable (IV) Estimation Procedure: A Two Regression Procedure 602
18.7.1 Motivation of the Instrumental Variables Estimation Procedure 602
18.7.2 Mechanics 603
18.7.3 The “Good” Instrument Conditions 603
18.7.4 Justification of the Instrumental Variables Estimation Procedure 604
Index 867
How to Use This Textbook
This textbook utilizes many empirical examples and Java simulations that play a critical role.
The empirical examples show you how we use statistical software to analyze real world data.
The Java simulations confirm the algebraic equations that are derived in the chapters, providing
you with a better appreciation of what the equations mean, and also demonstrate important
econometric concepts without delving into complicated mathematics. The simulations are called
Econometrics Labs. The textbook calls your attention to the empirical examples and labs by
denoting them with an icon.
To gain the most benefit from the textbook, you should read the textbook while seated at a
computer to take advantage of the empirical examples and labs. Connect to the following url:
http://mitpress.mit.edu/westhoffeconometrics
This takes you to the first page of our textbook website. You will now be asked to select a
data format for the empirical examples. All the data used in the textbook are available on the
website stored as EViews workfiles, Stata workfiles, and Excel spreadsheets. If you will be using
the EViews statistical software, click on EViews, or if you will be using Stata, click on Stata;
otherwise, click on Excel. After doing so, bookmark this page. In the future, whenever you see
the icon in the textbook, connect to your bookmarked page to avoid specifying your statistical software repeatedly.
Next click on the chapter you are reading. A list of the empirical examples and labs for the
chapter will now appear. Click on the appropriate one. To gain the most from the textbook, you
should perform the empirical analysis and complete the labs for yourself as well as read the
results that are presented in the textbook. The textbook includes many “Getting Started in
EViews” sections to guide you through the empirical examples if you are using EViews. Also
note that the labs are Java applets; consequently the computer you use must have the Java
Runtime Environment installed to run the labs. Each lab may take a few seconds to load. (If you
have trouble viewing the applets, be certain you are running an up-to-date version of Java.)
Shortly thereafter instructions will appear and the lab will pose questions. You can navigate from
question to question clicking Next Question and, if need be, Previous Question. You should
work your way through each empirical example and lab as you read along in the textbook. By
doing so, you will gain a better appreciation of the concepts that are introduced.
1 Descriptive Statistics
Chapter 1 Prep Questions
1. Look at precipitation data for the twentieth century. How would you decide which month of
the year was the wettest?
2. Consider the monthly growth rates of the Dow Jones Industrial Average and the Nasdaq
Composite Index.
a. In most months, would you expect the Nasdaq’s growth rate to be high or low when the
Dow’s growth rate is high?
b. In most months, would you expect the Nasdaq’s growth rate to be high or low when the
Dow’s growth rate is low?
c. Would you describe the Dow and Nasdaq growth rates as being correlated or
uncorrelated?
1.1 Describing a Single Data Variable
1.1.1 Introduction to Distributions
Descriptive statistics allow us to summarize the information inherent in a data variable. The
weather provides many examples of how useful descriptive statistics can be. Every day we hear
people making claims about the weather. “The summer of 2012 was the hottest on record,” “April
is the wettest month of the year,” “Last winter was the coldest ever,” and so on. To judge the
validity of such statements, we need some information, some data.
We will focus our attention on precipitation in Amherst, Massachusetts, during the twentieth
century. Table 1.1 reports the inches of precipitation in Amherst for each month of the twentieth
century.1
What is the wettest month of the summer in Amherst? How can we address this question? While
it is possible to compare the inches of precipitation in June, July, and August by carefully studying the numerical values recorded in table 1.1, it is difficult, if not impossible, to draw any
conclusions. There is just too much information to digest. In some sense, the table includes too
much detail; it overwhelms us. For example, we can see from the table that July was the wettest
summer month in 1996, August was the wettest summer month in 1997, June was the wettest
summer month in 1998, August was again the wettest summer month in 1999, and finally, June
was again the wettest summer month in 2000. We need a way to summarize the information
contained in table 1.1. Descriptive statistics perform this task. By describing the distribution of
the values, descriptive statistics distill the information contained in many observations into single
numbers. Summarizing data in this way has both benefits and costs. Without a summary, we can
easily “lose sight of the forest for the trees.” In the process of summarizing, however, some
information will inevitably be lost.
First, we will discuss the two most important types of descriptive statistics that describe a
single data variable: measures of the distribution center and measures of the distribution
spread. Next we will introduce histograms. A histogram visually illustrates the distribution of
a single data variable.
1. With the exception of two months, March 1950 and October 1994, the data were obtained from NOAA’s National
Climate Data Center. Data for these two months were missing from the NOAA center and were obtained from the Phillip
T. Ives records that are stored in the Amherst College archives.
1.1.2 Measure of the Distribution Center: Mean (Average)
No doubt the most commonly cited descriptive statistic is the mean or average.2 We use the
mean to denote the center of the distribution all the time in everyday life. For example, we use
the mean or average income earned by individuals in states, per capita income, to denote how
much a typical state resident earns. Massachusetts per capita income in 2000 equaled $25,952.
This means that some Massachusetts residents earned more than $25,952 and some less, but
$25,952 lies at the center of the income distribution of Massachusetts residents. A typical or
representative state resident earned $25,952. A baseball player’s batting average is also a mean:
the number of hits the player gets per official at bat.
Since the mean represents the center of the distribution, the representative value, why not
simply calculate the mean amount of precipitation in June, July, and August to decide on the
wettest summer month? The month with the highest mean would be deemed the wettest. To
calculate the mean (average) precipitation for June in the twentieth century, we sum the amount
of precipitation in each June and divide the total by the number of Junes, 100 in this case:
\text{Mean for June} = \frac{0.75 + 4.54 + \cdots + 7.99}{100} = \frac{377.76}{100} = 3.78
The mean precipitation for June is 3.78 inches. More formally, we can let x represent the data
variable for monthly precipitation in June:
\text{Mean}[x] = \bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T} = \frac{\sum_{t=1}^{T} x_t}{T}
where
T = total number of observations
The mean of a data variable is often denoted by a bar above the symbol, \bar{x}, pronounced "x bar." The expression \frac{\sum_{t=1}^{T} x_t}{T} is a concise way to describe the arithmetic used to compute the mean. Let us now "dissect" the summation expression.
2. The median and mode are other measures of the center. They are presented in chapter 25.
Table 1.1
Monthly precipitation in Amherst, Massachusetts, 1901 to 2000 (inches)
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
1903 3.28 4.27 6.40 2.30 0.48 7.79 4.64 4.92 1.66 2.72 2.04 3.95
1904 4.74 2.45 4.48 5.73 4.55 5.35 2.62 4.09 5.45 1.74 1.35 2.75
1905 3.90 1.70 3.66 2.56 1.28 2.86 2.63 6.47 6.26 2.27 2.06 3.15
1906 2.18 2.73 4.90 3.25 4.95 2.82 3.45 6.42 2.59 5.69 1.98 4.49
1907 2.73 1.92 1.82 1.98 4.02 2.61 3.87 1.44 8.74 5.00 4.50 3.89
1908 2.25 3.53 2.86 1.97 4.35 0.76 3.28 4.27 1.73 1.57 1.06 3.05
1909 3.56 5.16 3.01 5.53 3.36 2.24 2.24 3.80 4.99 1.23 1.06 2.95
1910 6.14 5.08 1.37 3.07 2.67 2.65 1.90 4.03 2.86 0.93 3.69 1.72
1911 2.36 2.18 3.80 1.87 1.37 2.02 4.21 5.92 3.41 8.81 3.84 4.42
1912 2.18 3.16 5.70 3.92 4.34 0.77 2.61 3.22 2.52 2.07 4.03 4.04
1913 3.98 2.94 6.30 3.30 4.94 0.90 1.59 2.26 2.56 5.16 2.11 3.38
1914 3.72 3.36 5.52 6.59 3.56 2.32 3.53 5.11 0.52 2.09 2.62 2.89
1915 6.52 7.02 0.12 3.99 1.20 3.00 9.13 8.28 1.37 2.89 2.20 5.86
1916 2.56 5.27 3.97 3.69 3.21 4.97 6.85 2.49 5.08 1.01 3.29 2.85
1917 3.30 1.98 4.08 1.83 4.13 5.27 3.36 7.06 2.42 6.60 0.63 2.56
1918 4.11 2.99 2.91 2.78 2.47 4.01 1.84 2.22 7.00 1.32 2.87 2.95
1919 2.02 2.80 4.22 2.37 6.20 1.09 4.17 4.80 4.45 1.81 6.20 1.48
1920 2.74 4.45 2.90 4.71 3.65 6.26 2.06 3.62 6.74 1.54 4.62 6.02
1921 2.00 2.38 3.57 6.47 4.56 3.87 6.00 2.35 1.84 1.08 6.20 1.90
1922 1.56 3.02 5.34 2.81 5.47 9.68 4.28 4.25 2.27 2.55 1.56 3.15
1923 6.02 1.81 1.98 3.19 3.26 2.24 1.77 2.55 1.89 5.50 5.05 4.23
1924 3.85 2.56 1.05 4.54 2.21 1.28 1.75 3.11 5.87 0.01 2.57 2.16
1925 3.42 3.64 4.12 3.10 2.55 4.28 6.97 1.93 3.09 4.74 3.23 3.56
1926 3.23 5.01 3.95 3.62 1.19 2.03 3.24 3.97 1.50 5.02 5.38 2.78
1927 2.50 2.62 1.96 1.60 4.83 3.37 3.40 5.01 2.79 4.59 8.65 5.66
1928 2.19 2.90 1.17 4.16 3.25 6.97 6.23 8.40 3.07 0.87 1.79 0.97
1929 4.33 3.92 3.20 6.89 4.17 3.06 0.70 1.54 3.62 2.75 2.73 4.05
1930 2.59 1.39 3.95 1.41 3.34 4.47 4.50 1.82 2.08 2.24 3.42 1.63
1931 3.58 1.80 3.79 2.95 7.44 4.24 3.87 6.57 2.50 3.06 1.55 3.83
1932 3.68 2.70 4.24 2.33 1.67 2.62 3.83 2.67 3.96 3.69 6.05 1.99
1933 2.44 3.48 4.79 5.03 1.69 3.68 2.25 6.63 12.34 3.90 1.19 2.81
1934 3.50 2.82 3.60 4.44 3.42 4.67 1.73 3.02 9.54 2.35 3.50 2.99
1935 4.96 2.50 1.48 2.54 2.17 5.50 3.10 0.82 4.67 0.88 4.41 1.05
1936 6.47 2.64 7.04 4.07 1.76 3.28 1.45 4.85 3.80 4.80 2.02 5.96
1937 5.38 2.22 3.38 4.03 6.09 5.72 2.88 4.91 3.24 4.33 4.86 2.44
1938 6.60 1.77 2.00 3.07 3.81 8.45 7.45 2.04 14.55 2.49 3.02 3.95
1939 2.21 3.62 4.49 4.56 2.15 3.21 2.30 3.89 2.97 4.55 0.98 3.89
1940 2.63 2.72 5.58 6.37 5.67 2.46 4.69 1.56 1.53 1.04 6.31 3.01
1941 2.21 1.59 1.63 0.55 2.87 6.13 4.04 1.79 2.88 2.13 4.29 3.82
1942 3.54 1.66 7.89 0.96 2.98 3.63 4.95 2.93 3.94 3.27 6.07 6.03
1943 2.92 1.63 3.07 3.66 5.62 2.38 6.18 2.49 2.40 3.88 4.64 0.58
1944 1.24 2.34 4.36 3.66 1.35 4.70 3.88 4.33 5.31 1.74 4.21 2.18
1945 3.07 3.33 2.16 5.43 6.45 7.67 7.36 2.79 3.57 2.18 3.54 3.91
1946 2.72 3.52 1.60 2.16 5.41 3.30 5.30 4.00 4.88 1.51 0.70 3.51
1947 3.37 1.96 3.29 4.59 4.63 3.22 2.73 1.69 2.84 2.04 5.63 2.33
1948 2.63 2.45 2.92 2.87 5.83 5.67 2.95 3.56 1.92 1.14 5.22 2.87
1949 4.52 2.47 1.67 2.70 4.76 0.72 3.41 3.64 3.55 2.58 1.79 2.44
1950 4.33 3.99 2.67 3.64 2.77 3.65 2.83 2.93 2.24 1.87 6.60 4.64
1951 3.28 4.61 5.13 3.63 2.96 3.05 4.15 3.56 2.63 4.66 4.64 4.35
1952 4.02 1.97 3.17 3.40 4.00 4.97 4.99 3.98 4.05 1.07 0.89 4.10
1953 6.24 2.97 8.24 5.36 6.81 2.41 1.95 1.87 1.88 5.15 2.36 4.53
1954 2.45 1.94 3.93 4.24 4.80 2.68 3.00 3.91 6.14 1.89 5.07 3.19
1955 0.81 3.73 4.39 4.76 3.00 4.06 1.99 16.10 3.80 7.57 4.46 0.79
1956 1.75 3.52 4.94 4.49 2.02 2.86 2.90 2.71 5.55 1.64 3.10 4.83
1957 1.38 1.10 1.55 2.75 3.89 4.50 1.67 0.94 1.57 2.19 5.54 6.39
1958 4.03 2.21 2.62 4.58 2.98 1.64 5.13 5.19 3.90 3.79 3.79 1.57
1959 3.81 2.32 3.84 3.80 1.04 5.65 5.07 6.70 1.03 7.81 4.33 3.85
1960 2.35 3.90 3.32 4.30 3.44 4.73 6.84 3.74 6.75 2.43 3.13 2.71
1961 2.52 3.16 3.00 4.72 3.20 6.05 2.82 2.86 2.02 2.33 3.79 3.27
1962 3.01 3.59 1.84 2.69 2.03 1.06 2.16 3.33 3.74 4.16 2.11 3.30
1963 2.95 2.62 3.61 2.00 1.97 3.98 1.92 2.54 3.56 0.32 3.92 2.19
1964 5.18 2.32 2.71 2.72 0.83 1.84 3.02 3.01 0.94 1.32 1.68 3.98
1965 1.57 2.33 1.10 2.43 2.69 2.41 3.97 3.43 3.68 2.32 2.36 1.88
1966 1.72 3.43 2.93 1.28 2.26 3.30 5.83 0.67 5.14 4.51 3.48 2.22
1967 1.37 2.89 3.27 4.51 6.30 3.61 5.24 3.76 2.12 1.92 2.90 5.14
1968 1.87 1.02 4.47 2.62 3.02 7.19 0.73 1.12 2.64 3.10 5.78 5.08
1969 1.28 2.31 1.97 3.93 2.73 3.52 6.89 5.20 2.94 1.53 5.34 6.30
1970 0.66 3.55 3.52 3.69 4.16 4.97 2.17 5.23 3.05 2.45 3.27 2.37
1971 1.95 3.29 2.53 1.49 3.77 2.68 2.77 4.91 4.12 3.60 4.42 3.19
1972 1.86 3.47 4.85 4.06 4.72 10.25 2.42 2.25 1.84 2.51 6.92 6.81
1973 4.26 2.58 3.45 6.40 5.45 4.43 3.38 2.17 1.83 2.24 2.30 8.77
1974 3.35 2.42 4.34 2.61 5.21 3.40 3.71 3.97 7.29 1.94 2.76 3.67
1975 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
1976 5.23 3.30 2.15 3.40 4.49 2.20 2.20 6.21 2.74 4.31 0.71 2.69
1977 2.24 2.21 5.88 4.91 3.57 3.83 4.04 5.94 7.77 5.81 4.37 5.22
1978 8.16 0.88 2.65 1.48 2.53 2.83 1.81 4.85 0.97 2.19 2.31 3.93
1979 11.01 2.49 3.00 5.37 4.78 0.77 6.67 5.14 4.54 5.79 3.84 4.00
1980 0.50 0.99 6.42 3.84 1.47 3.94 2.26 1.43 2.33 2.23 3.63 0.91
1981 0.49 7.58 0.24 4.48 2.99 3.81 3.11 1.36 3.53 6.10 1.57 4.41
1982 3.92 3.65 2.26 4.39 2.54 8.07 4.20 2.00 2.81 2.29 3.55 1.85
1983 4.82 4.42 4.95 8.99 5.54 2.42 3.10 2.39 1.82 5.47 7.05 6.40
1984 1.75 6.42 3.68 4.30 11.95 1.69 4.66 1.34 1.02 3.13 3.97 2.84
1985 1.73 1.97 2.65 1.55 4.53 3.59 2.16 4.29 2.88 3.50 6.27 1.78
1986 5.86 2.83 3.69 1.43 2.36 5.02 7.32 1.99 1.07 2.43 5.32 5.52
1987 4.32 0.08 4.58 4.76 1.44 4.16 1.51 3.84 7.65 4.16 3.27 2.31
1988 2.40 3.40 2.13 3.59 2.58 1.28 6.37 4.71 2.45 1.72 5.83 1.52
1989 0.94 2.55 2.00 4.29 8.79 5.74 3.81 5.97 5.99 8.10 3.21 1.06
1990 4.32 3.15 3.13 4.35 6.79 1.49 1.70 8.05 1.42 6.40 3.64 5.07
1991 2.37 1.67 4.73 3.66 5.40 2.03 1.39 9.06 7.10 4.21 5.01 3.20
1992 2.12 1.78 3.25 2.95 2.32 3.34 4.28 7.63 2.47 2.18 4.43 3.76
1993 2.18 2.31 5.44 4.69 0.88 2.53 2.99 3.04 4.59 3.79 4.35 3.86
1994 5.76 1.87 5.60 3.19 6.34 2.70 6.87 4.39 3.72 1.34 3.87 5.06
1995 3.66 3.00 1.68 2.15 2.09 2.10 3.75 2.38 3.04 10.93 4.66 2.20
1996 6.68 4.01 2.19 8.30 3.62 4.50 6.94 0.70 6.01 4.11 3.59 6.09
1997 3.56 2.27 3.19 3.68 3.56 1.30 3.99 4.69 1.30 2.27 4.67 1.38
1998 4.19 2.56 4.53 2.79 3.50 8.60 2.06 1.45 2.31 5.70 1.78 1.24
1999 5.67 1.89 4.82 0.87 3.83 2.78 1.65 5.45 13.19 3.48 2.77 1.84
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
• The uppercase Greek sigma, Σ, is an abbreviation for the word summation.
• The t = 1 and T represent the first and last observations of the summation.
• The x_t represents observation t of the data variable.

Consequently the expression \sum_{t=1}^{T} x_t says "calculate the sum of the x_t's from t equals 1 to t equals T"; that is,

\sum_{t=1}^{T} x_t = x_1 + x_2 + \cdots + x_T
Note that the x in Mean[x] is in a bold font. This is done to emphasize the fact that the mean
describes a specific characteristic, the distribution center, of the entire collection of values, the
entire distribution.
Suppose that we want to calculate the precipitation mean for each summer month. We could
use the information in tables and a pocket calculator to compute the means. This would not only
be laborious but also error prone. Fortunately, econometric software provides us with an easy
and reliable alternative. The Amherst weather data are posted on our website.
Amherst precipitation data: Monthly time series precipitation in Amherst, Massachusetts from
1901 to 2000 (inches)
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Group window: Click Sample. In the Sample window: Enter month = 7 in the "If condition (optional)" text area to restrict the sample to July only. Click OK. Descriptive statistics for the 100 Julys appear in the Group window. Record the mean.
• In the Group window: Click Sample. In the Sample window: Enter month = 8 in the “If
condition (optional)” text area to restrict the sample to August only. Click OK. Descriptive
statistics for the 100 Augusts appear in the Group window. Record the mean.
3. Common sample eliminates all observations in which there is one or more missing values in one of the variables;
the individual samples option does not do so. Since no values are missing for June, July, and August, the choice of
common or individual has no impact.
• When you are finished, click Sample again and clear the restriction; otherwise, the restriction, month = 8, will remain in effect if you ask EViews to perform any more computations.
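If you prefer to check the EViews results with a general-purpose language, the following minimal Python sketch computes the same three means. It assumes the Amherst data have been saved from the website as an Excel file named amherst_precipitation.xlsx with Year, Month, and Precip columns; the file and column names are our assumptions, not part of the textbook's posted workfiles.

import pandas as pd

# Load the Amherst precipitation data (assumed file and column names).
data = pd.read_excel("amherst_precipitation.xlsx")

# Mean precipitation for each summer month: 6 = June, 7 = July, 8 = August.
for month in (6, 7, 8):
    mean_precip = data.loc[data["Month"] == month, "Precip"].mean()
    print(month, round(mean_precip, 2))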
Table 1.2 summarizes the information. August has the highest mean. Based on the mean criterion,
August was the wettest summer month in the twentieth century; the mean for August equals
3.96, which is greater than the mean for June or July.
1.1.3 Measures of the Distribution Spread: Range, Variance, and Standard Deviation
While the center of the distribution is undoubtedly important, the spread can be crucial also. On
the one hand, if the spread is small, all the values of the distribution lie close to the center, the
mean. On the other hand, if the spread is large, some of the values lie far below the mean and some lie far above the mean. Farming provides a good illustration of why the spread can be important. Obviously the mean precipitation during the growing season is important to the farmer. But the spread of the precipitation is important also. Most crops grow best when they get a steady amount of moderate rain over the entire growing season. An unusually dry period followed by an unusually wet period, or vice versa, is not welcome news for the farmer. Both
the center (mean) and the spread are important. The years 1951 and 1998 illustrate this well (see
table 1.3).
In reality, 1951 was a better growing season than 1998 even though the mean for 1998 was
a little higher. Precipitation was less volatile in 1951 than in 1998. Arguably, the most straightforward measure of distribution spread is its range. In 1951, precipitation ranged from a minimum of 2.96 to a maximum of 4.15. In 1998, the range was larger: from 1.45 to 8.60.
While the range is the simplest, it is not the most sensitive. The most widely cited measure
of spread is the variance and its closely related cousin, the standard deviation. The variance
equals the average of the squared deviations of the values from the mean. While this definition
may sound a little overwhelming when first heard, it is not as daunting as it sounds. We can use
the following three steps to calculate the variance:
Table 1.2
Mean monthly precipitation for the summer months in Amherst, Massachusetts, 1901 to 2000
Table 1.3
Growing season precipitation in Amherst, Massachusetts, 1951 and 1998
•For each month, calculate the amount by which that month’s precipitation deviates from the
mean.
• Square each month’s deviation.
•Calculate the average of the squared deviations; that is, sum the squared deviations and divide
by the number of months, 5 in this case.
Note that the mean and the variance are expressed in different units; the mean is expressed in
inches and the variance in inches squared. Often it is useful to compare the mean and the measure
of spread directly, in terms of the same units. The standard deviation allows us to do just that.
The standard deviation is the square root of the variance; hence the standard deviation is
expressed in inches, just like the mean:
We can use the same procedure to calculate the variance and standard deviation for 1951:
When the spread is small, as it was in 1951, all observations will be close to the mean. Hence
the deviations will be small. The squared deviations, the variance, and the standard deviation
will also be small. However, if the spread is large, as it was in 1998, some observations must
be far from the mean. Hence some deviations will be large. Some squared deviations, the vari-
ance, and the standard deviation will also be large. Let us summarize:
We can concisely summarize the steps for calculating the variance with the following
equations:
\text{Var}[x] = \frac{(x_1 - \text{Mean}[x])^2 + (x_2 - \text{Mean}[x])^2 + \cdots + (x_T - \text{Mean}[x])^2}{T} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T}

or, using summation notation,

\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \text{Mean}[x])^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}
The standard deviation is the square root of the variance:
\text{SD}[x] = \sqrt{\text{Var}[x]}
Again, let us now "dissect" the summation expressions \sum_{t=1}^{T}(x_t - \text{Mean}[x])^2 and \sum_{t=1}^{T}(x_t - \bar{x})^2:
• The uppercase Greek sigma, Σ, is an abbreviation for the word summation.
• The t = 1 and T represent the first and last observations of the summation.
• The xt represents observation t of the data variable.
\sum_{t=1}^{T}(x_t - \text{Mean}[x])^2 and \sum_{t=1}^{T}(x_t - \bar{x})^2 equal the sum of the squared deviations from the mean.
Note that the x in Var[x] and in SD[x] is in a bold font. This emphasizes the fact that the
variance and standard deviation describe one specific characteristic, the distribution spread, of
the entire distribution.
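The three steps translate directly into a few lines of code. The following Python sketch applies them to five hypothetical growing-season values; the numbers are made up for illustration and are not the table 1.3 figures.

import math

precip = [2.8, 3.5, 8.6, 2.1, 1.5]            # hypothetical growing-season values (inches)
T = len(precip)

mean = sum(precip) / T                         # center of the distribution
deviations = [x - mean for x in precip]        # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]         # step 2: squared deviations
variance = sum(squared) / T                    # step 3: average of the squared deviations
std_dev = math.sqrt(variance)                  # standard deviation, back in inches

print(round(mean, 2), round(variance, 2), round(std_dev, 2))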
1.1.4 Histogram: Visual Illustration of a Data Variable's Distribution
A histogram is a bar graph that visually illustrates how the values of a single data variable are distributed. Figure 1.1 is a histogram for September precipitation in Amherst. Each bar of the histogram reports the number of months in which precipitation fell within the specified range.
Figure 1.1
Histogram—September precipitation in Amherst, Massachusetts, 1901 to 2000 (horizontal axis: inches, in one-inch bins from 0–1 through 14–15; vertical axis: number of months)
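A histogram such as figure 1.1 can also be produced with standard plotting software. The sketch below assumes the September precipitation values have been placed in a Python list; only the first five values from table 1.1 are shown, and the one-inch bins mirror the figure.

import matplotlib.pyplot as plt

# First five September values from table 1.1 (1901-1905); extend with the rest.
september = [3.67, 5.83, 1.66, 5.45, 6.26]

# One-inch bins from 0-1 through 14-15, as in figure 1.1.
plt.hist(september, bins=range(0, 16), edgecolor="black")
plt.xlabel("Inches")
plt.ylabel("Number of months")
plt.title("September precipitation in Amherst, 1901-2000")
plt.show()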
We can use histograms to illustrate the differences between two distributions. For example,
compare the histogram for September (figure 1.1) and the histogram for February precipitation
(figure 1.2):
The obvious difference in the two histograms is that the September histogram has a longer
“right-hand tail.” The center of September’s distribution lies to the right of February’s; consequently we would expect September’s mean to exceed February’s. Also the distribution of
precipitation in September is more “spread out” than the distribution in February; hence we
would expect September’s variance to be larger. Table 1.4 confirms quantitatively what we
observe visually. September has a higher mean: 3.89 for September versus 2.88 for February.
Also the variance for September is greater.
Figure 1.2
Histogram—February precipitation in Amherst, Massachusetts, 1901 to 2000 (horizontal axis: inches, in one-inch bins from 0–1 through 14–15; vertical axis: number of months)
Table 1.4
Means and variances of precipitation for February and September, 1901 to 2000
Mean Variance
1.2 Describing the Relationship between Two Data Variables
1.2.1 Scatter Diagram: Visual Illustration of How Two Data Variables Are Related
We will use the Dow Jones and Nasdaq data appearing in tables 1.5a and 1.5b to introduce
another type of useful graph, the scatter diagram, which visually illustrates the relationship
between two variables.
We will focus on the relationship between the Dow Jones and Nasdaq growth rates. Figure
1.3 depicts their scatter diagram by placing the Dow Jones growth rate on the horizontal axis
and the Nasdaq growth rate on the vertical axis.
On the scatter diagram in figure 1.3, each point illustrates the Dow Jones growth rate and the
Nasdaq growth rate for one specific month. For example, the top left point labeled Feb 2000
represents February 2000 when the Dow fell by 7.42 percent and the Nasdaq grew by 19.19
percent. Similarly the point in the first quadrant labeled Jan 1987 represents January 1987 when
the Dow rose by 13.82 percent and the Nasdaq rose by 12.41 percent.
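A scatter diagram like figure 1.3 can be produced the same way. The sketch below shows only the first few monthly observations from tables 1.5a and 1.5b; with all 192 months included it would reproduce the figure.

import matplotlib.pyplot as plt

dow = [6.21, -0.22, -1.34, -0.69]       # first four Dow Jones growth rates from table 1.5a; extend with the rest
nasdaq = [12.79, 1.97, -1.76, 0.50]     # corresponding Nasdaq growth rates from table 1.5b

plt.scatter(dow, nasdaq)
plt.axhline(0)                           # reference lines through the origin
plt.axvline(0)
plt.xlabel("Dow Jones growth rate (percent)")
plt.ylabel("Nasdaq growth rate (percent)")
plt.show()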
1.2.2 Correlation of Two Variables
The Dow Jones and Nasdaq growth rates appear to be correlated. Two variables are correlated when information about one variable helps us predict the other. Typically, when the Dow Jones growth rate is positive, the Nasdaq growth rate is also positive; similarly, when the Dow Jones growth rate is negative, the Nasdaq growth rate is usually negative. Although there are exceptions (February 2000, for example), knowing one growth rate typically helps us predict the other.
For example, if we knew that the Dow Jones growth rate was positive in one specific month,
we would predict that the Nasdaq growth rate would be positive also. While we would not always
be correct, we would be right most of the time.
1.2.3 Measures of Correlation: Covariance
Covariance quantifies the notion of correlation. We can use the following three steps to calculate the covariance of two data variables, x and y:
1. For each observation, calculate the amount by which variable x deviates from its mean and the amount by which variable y deviates from its mean.
2. For each observation, multiply x's deviation by y's deviation.
3. Calculate the average of the deviation products; that is, sum the products of the deviations and divide by the number of observations, T.
14
Table 1.5a
Monthly percentage growth rate of Dow Jones Industrial Index, 1985 to 2000
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 6.21 − 0.22 −1.34 − 0.69 4.55 1.53 0.90 −1.00 − 0.40 3.44 7.12 5.07
1986 1.57 8.79 6.41 −1.90 5.20 0.85 − 6.20 6.93 − 6.89 6.23 1.94 − 0.95
1987 13.82 3.06 3.63 − 0.79 0.23 5.54 6.35 3.53 −2.50 −23.22 −8.02 5.74
1988 1.00 5.79 − 4.03 2.22 − 0.06 5.45 − 0.61 − 4.56 4.00 1.69 −1.59 2.56
1989 8.01 −3.58 1.56 5.46 2.54 −1.62 9.04 2.88 −1.63 −1.77 2.31 1.73
1990 −5.91 1.42 3.04 −1.86 8.28 0.14 0.85 −10.01 − 6.19 −0.42 4.81 2.89
1991 3.90 5.33 1.10 − 0.89 4.83 −3.99 4.06 0.62 − 0.88 1.73 −5.68 9.47
1992 1.72 1.37 − 0.99 3.82 1.13 −2.31 2.27 − 4.02 0.44 −1.39 2.45 − 0.12
1993 0.27 1.84 1.91 − 0.22 2.91 − 0.32 0.67 3.16 −2.63 3.53 0.09 1.90
1994 5.97 −3.68 −5.11 1.26 2.08 −3.55 3.85 3.96 −1.79 1.69 − 4.32 2.55
1995 0.25 4.35 3.65 3.93 3.33 2.04 3.34 −2.08 3.87 −0.70 6.71 0.84
1996 5.44 1.67 1.85 − 0.32 1.33 0.20 −2.22 1.58 4.74 2.50 8.16 −1.13
1997 5.66 0.95 − 4.28 6.46 4.59 4.66 7.17 −7.30 4.24 − 6.33 5.12 1.09
1998 −0.02 8.08 2.97 3.00 −1.80 0.58 − 0.77 −15.13 4.03 9.56 6.10 0.71
1999 1.93 − 0.56 5.15 10.25 −2.13 3.89 −2.88 1.63 − 4.55 3.80 1.38 5.69
2000 −4.84 −7.42 7.84 − 1.72 −1.97 − 0.71 0.71 6.59 −5.03 3.01 −5.07 3.59
Table 1.5b
Monthly percentage growth rate of Nasdaq Index, 1985 to 2000
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 12.79 1.97 −1.76 0.50 3.64 1.86 1.72 −1.19 −5.84 4.35 7.35 3.47
1986 3.35 7.06 4.23 2.27 4.44 1.32 −8.41 3.10 −8.41 2.88 − 0.33 −3.00
1987 12.41 8.39 1.20 −2.86 − 0.31 1.97 2.40 4.62 −2.35 −27.23 −5.60 8.29
1988 4.30 6.47 2.07 1.23 −2.35 6.59 −1.87 −2.76 2.95 −1.34 −2.88 2.66
1989 5.22 − 0.40 1.75 5.14 4.35 −2.44 4.25 3.42 0.77 −3.66 0.11 − 0.29
1990 −8.58 2.41 2.28 −3.54 9.26 0.72 −5.21 −13.01 −9.63 − 4.27 8.88 4.09
1991 10.81 9.39 6.44 0.50 4.41 −5.97 5.49 4.71 0.23 3.06 −3.51 11.92
1992 5.78 2.14 − 4.69 − 4.16 1.15 −3.71 3.06 −3.05 3.58 3.75 7.86 3.71
1993 2.86 −3.67 2.89 − 4.16 5.91 0.49 0.11 5.41 2.68 2.16 −3.19 2.97
1994 3.05 −1.00 − 6.19 −1.29 0.18 −3.98 2.29 6.02 − 0.17 1.73 −3.49 0.22
1995 0.43 5.10 2.96 3.28 2.44 7.97 7.26 1.89 2.30 − 0.72 2.23 − 0.67
1996 0.73 3.80 0.12 8.09 4.44 − 4.70 −8.81 5.64 7.48 − 0.44 5.82 − 0.12
1997 6.88 −5.13 − 6.67 3.20 11.07 2.98 10.52 − 0.41 6.20 −5.46 0.44 −1.89
1998 3.12 9.33 3.68 1.78 − 4.79 6.51 −1.18 −19.93 12.98 4.58 10.06 12.47
1999 14.28 −8.69 7.58 3.31 −2.84 8.73 −1.77 3.82 0.25 8.02 12.46 21.98
2000 −3.17 19.19 −2.64 −15.57 −11.91 16.62 −5.02 11.66 −12.68 −8.25 −22.90 −4.90
Figure 1.3
Scatter diagram—Dow Jones growth rate (horizontal axis, percent) versus Nasdaq growth rate (vertical axis, percent); the points for Feb 2000 and Jan 1987 are labeled
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}

where \bar{x} and \bar{y} denote the means of x and y and T denotes the total number of observations.
Let us calculate the covariance for the Dow and Nasdaq monthly growth rates. The average
monthly increase for the Dow Jones Industrial average was 1.25 percent and the average increase
for the Nasdaq Composite was 1.43 percent. Their covariance equals 19.61:
Figure 1.4
Scatter diagram—Dow Jones growth rate less its mean (horizontal axis) versus Nasdaq growth rate less its mean (vertical axis); the Feb 2000 and Jan 1987 deviations are labeled
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{(6.21 - 1.25)(12.79 - 1.43) + \cdots + (3.59 - 1.25)(-4.90 - 1.43)}{192} = 19.61
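The covariance calculation is easy to code directly from its definition. The following Python sketch is illustrative only; with the full 192-month series it should return a value close to the 19.61 computed above.

def covariance(x, y):
    """Average of the products of the deviations from the means (divide-by-T)."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    return sum((x[t] - x_bar) * (y[t] - y_bar) for t in range(T)) / T

# First four months of 1985 from tables 1.5a and 1.5b, just to show the call;
# replace the lists with the full 192-month series to reproduce the 19.61 above.
print(covariance([6.21, -0.22, -1.34, -0.69], [12.79, 1.97, -1.76, 0.50]))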
A nonzero covariance suggests that the variables are correlated. To understand why, consider a
scatter diagram of the deviations. As seen in figure 1.4, we place the deviation of the Dow Jones
growth rate from its mean on the horizontal axis and the deviation of the Nasdaq growth rate
from its mean on the vertical axis. This scatter diagram allows us to motivate the relationship
between the covariance and correlation.4
The covariance equation and the scatter diagram are related. The numerator of the covariance
equation equals the sum of the products of each month's deviations, (x_t - \bar{x})(y_t - \bar{y}):
4. The discussion that follows is not mathematically rigorous because it ignores the magnitude of the deviation products.
Nevertheless, it provides valuable insights. Chapter 25 provides a more rigorous discussion of covariance.
Figure 1.5
Scatter diagram—Deviations and covariance terms. Quadrant I: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) > 0, so (x_t - \bar{x})(y_t - \bar{y}) > 0. Quadrant II: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) > 0, so (x_t - \bar{x})(y_t - \bar{y}) < 0. Quadrant III: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) < 0, so (x_t - \bar{x})(y_t - \bar{y}) > 0. Quadrant IV: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) < 0, so (x_t - \bar{x})(y_t - \bar{y}) < 0.
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
What can we say about the sign of each observation's deviations and their product, (x_t - \bar{x})(y_t - \bar{y}), in each quadrant of the scatter diagram (figure 1.5)?
•First quadrant. Dow growth rate is greater than its mean and Nasdaq growth is greater than its
mean. Both deviations are positive; hence the product of the deviations is positive in the first
quadrant:
Figure 1.6
Scatter diagram—Amherst precipitation less its mean (horizontal axis) versus Nasdaq growth rate less its mean (vertical axis)
Compare figures 1.4 and 1.5. In the Dow Jones and Nasdaq deviation scatter diagram (figure
1.4), the points representing most months lie in the first and third quadrants. Consequently the
product of the deviations, (x_t - \bar{x})(y_t - \bar{y}), is positive in most months. This explains why the
covariance is positive.5 A positive covariance means that the variables are positively correlated.
When one variable is above average, the other is typically above average as well. Similarly, when
one variable is below average, the other is typically below average.
1.2.4 Independence of Two Variables
Two variables are independent or uncorrelated when information about one variable does not
help us predict the other. The covariance of two independent (uncorrelated) data variables is
approximately zero. To illustrate two independent variables, consider the precipitation in Amherst
and the Nasdaq growth rate. The scatter diagram in figure 1.6 plots the deviation of Amherst
precipitation from its mean versus the deviation of the Nasdaq growth rate from its mean:
Recall what we know about the sign of the deviation in each quadrant:
5. As mentioned above, we are ignoring how the magnitude of the products affects the sum.
• First quadrant: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) > 0 → (x_t - \bar{x})(y_t - \bar{y}) > 0
• Second quadrant: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) > 0 → (x_t - \bar{x})(y_t - \bar{y}) < 0
• Third quadrant: (x_t - \bar{x}) < 0 and (y_t - \bar{y}) < 0 → (x_t - \bar{x})(y_t - \bar{y}) > 0
• Fourth quadrant: (x_t - \bar{x}) > 0 and (y_t - \bar{y}) < 0 → (x_t - \bar{x})(y_t - \bar{y}) < 0
Since the points are distributed more or less evenly across all four quadrants, the products of
the deviations, (x_t - \bar{x})(y_t - \bar{y}), are positive in about half the months and negative in the other
half.6 Consequently the covariance will be approximately equal to 0. In general, if variables are
independent, the covariance will be about 0. In reality, the covariance of precipitation and the
Nasdaq growth rate is −0.91, approximately 0:
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = -0.91 \approx 0
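To see the same point numerically, we can generate two series that are independent by construction and compute their covariance. The sketch below uses arbitrary simulated data (the seed and sample size are arbitrary choices), so the exact value will vary, but it will typically be small.

import random

random.seed(1)                                   # arbitrary seed so the example is reproducible
x = [random.gauss(0, 1) for _ in range(192)]     # two series generated independently of each other
y = [random.gauss(0, 1) for _ in range(192)]

T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T
cov = sum((x[t] - x_bar) * (y[t] - y_bar) for t in range(T)) / T
print(round(cov, 3))                             # typically close to 0, just as precipitation and the Nasdaq are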
We can use EViews to calculate the covariance. The stock market data are posted on our
website.
Stock market data: Monthly time series growth rates of the Dow Jones Industrial and Nasdaq
stock indexes from January 1985 to December 2000
DJGrowth_t: Monthly growth rate of the Dow Jones Industrial Average based on the monthly close for observation t (percent)
NasdaqGrowth_t: Monthly growth rate of the Nasdaq Composite based on the monthly close for observation t (percent)
Precip_t: Monthly precipitation in Amherst, MA, for observation t (inches)
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
Next instruct EViews to calculate the covariance of Amherst precipitation and the Nasdaq
growth rate:
6. Again, note that this explanation ignores the magnitude of the products.
• In the Workfile window: Highlight precip by clicking on it; then, while depressing <Ctrl>, click on nasdaqgrowth to highlight it also.
Both the variances and the covariances are reported in table 1.6. The variances are reported in
the diagonal cells: the variance for Amherst precipitation is 4.17 and the variance for the Nasdaq
growth rate is 43.10. Their covariance appears in the off diagonal cells: the covariance is −0.91.
Note that the two off-diagonal cells report the same number. This results from a basic arithmetic fact: when we multiply two numbers together, the order of the multiplication does not matter, so (x_t - \bar{x})(y_t - \bar{y}) = (y_t - \bar{y})(x_t - \bar{x}) and hence Cov[x, y] = Cov[y, x].
Table 1.6
Amherst precipitation and Nasdaq growth rate covariance matrix

Covariance matrix    Precip    NasdaqGrowth
Precip               4.17      −0.91
NasdaqGrowth         −0.91     43.10
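Other statistical packages report the same kind of covariance matrix. For instance, assuming the two series are columns of a pandas DataFrame (the column names below are ours), the following sketch prints a matrix with variances on the diagonal and covariances off the diagonal. Note that pandas divides by T − 1 rather than T, so its numbers differ slightly from this chapter's divide-by-T convention.

import pandas as pd

# Assumed DataFrame with one column per series; only the first three months of
# 1985 are shown here, so replace these lists with the full 192 observations.
df = pd.DataFrame({
    "Precip":       [1.73, 1.97, 2.65],
    "NasdaqGrowth": [12.79, 1.97, -1.76],
})

# Variances on the diagonal, covariances off the diagonal (divide-by-(T - 1)).
print(df.cov())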
1.2.5 Measures of Correlation: Correlation Coefficient
There is no natural range for the covariance; its magnitude depends on the units used. To appre-
ciate why, suppose that we measured Amherst precipitation in centimeters rather than inches.
Consequently all precipitation figures appearing in table 1.1 would be multiplied by 2.54 to
convert from inches to centimeters. Now consider the covariance equation:
\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T}
The covariance for Amherst precipitation would rise by a factor of 2.54. To understand why,
let the variable x represent Amherst precipitation and y the Nasdaq growth rate:
Each x_t rises by a factor of 2.54; hence \bar{x} rises by a factor of 2.54, and each deviation (x_t - \bar{x}) rises by a factor of 2.54. In the covariance equation,

\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T}

each term in the numerator therefore rises by a factor of 2.54, so Cov[x, y] rises by a factor of 2.54.
Our choice of which units to use in measuring rainfall, inches or centimeters, is entirely arbitrary.
The arbitrary choice affects the magnitude of the covariance. How, then, can we judge the covari-
ance to be large or small when its size is affected by an arbitrary decision?
Unit Insensitivity
To address this issue, we introduce the correlation coefficient that is not affected by the choice
of units:
\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}}
To appreciate why this resolves the problem, again let x represent Amherst precipitation. We
know that measuring rainfall in centimeters rather than inches causes the covariance to increase
by a factor of 2.54. But how does the use of centimeters affect the variance of precipitation and
its square root?
Again, each x_t rises by a factor of 2.54; hence \bar{x} rises by a factor of 2.54, and each deviation (x_t - \bar{x}) rises by a factor of 2.54. In the variance equation,

\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}

each squared deviation rises by a factor of 2.54^2; hence Var[x] rises by a factor of 2.54^2, and \sqrt{\text{Var}[x]} rises by a factor of 2.54.
We will now consider the correlation coefficient equation. When we use centimeters rather than inches, both Cov[x, y] and \sqrt{\text{Var}[x]} increase by a factor of 2.54; consequently both the numerator and the denominator of the correlation coefficient equation,

\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}},

increase by a factor of 2.54. The factors of 2.54 cancel, and the correlation coefficient is unaffected by our arbitrary choice of units.
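This cancellation is easy to verify numerically. The sketch below uses the first five months of 1985 from tables 1.1 and 1.5b and compares the correlation coefficient computed from inches with the one computed after rescaling by 2.54.

import numpy as np

x = np.array([1.73, 1.97, 2.65, 1.55, 4.53])    # Amherst precipitation in inches, Jan-May 1985
y = np.array([12.79, 1.97, -1.76, 0.50, 3.64])  # Nasdaq growth rates for the same months

def corr_coef(a, b):
    cov = ((a - a.mean()) * (b - b.mean())).mean()                      # divide-by-T covariance
    return cov / (np.sqrt(((a - a.mean()) ** 2).mean()) *
                  np.sqrt(((b - b.mean()) ** 2).mean()))

print(corr_coef(x, y))              # measured in inches
print(corr_coef(2.54 * x, y))       # measured in centimeters: identical correlation coefficient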
Natural Range
The correlation coefficient also has another important property; it must lie between −1.00 and
+1.00. Therefore it provides us with a sense of how strongly two variables are correlated. A
correlation coefficient of +1.00 represents perfect positive correlation and −1.00 represents
perfect negative correlation (figure 1.7).
To understand why, consider the two polar cases of perfect positive and perfect negative
correlation.
Figure 1.7
Range of correlation coefficients: from −1 to +1, centered at 0
\text{Var}[x] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}, \qquad \text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T}

\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
Perfect Positive Correlation Consider an example of perfect positive correlation. Suppose that
two variables are identical; that is, suppose that
yt = xt for each t = 1, 2, . . . , T
In this case the variables exhibit perfect positive correlation. If we know the value of x, we can
perfectly predict the value of y, and vice versa. Let us compute their correlation coefficient. To
do so, first note that x and y have identical means
\bar{y} = \bar{x}
and that each observation's deviation from the means is the same for x and y:
y_t - \bar{y} = x_t - \bar{x} for each t = 1, 2, . . . , T
Consider the equations above for the variances and covariance; both the variance of y and the
covariance equal the variance of x:
\text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(The first equality is the definition of Var[y]; the second uses y_t - \bar{y} = x_t - \bar{x}; the third is the definition of Var[x].)
and
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(Again, the middle step uses y_t - \bar{y} = x_t - \bar{x}.)
Now apply the correlation coefficient equation. The correlation coefficient equals 1.00:
\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}} = \frac{\text{Var}[x]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[x]}} = \frac{\text{Var}[x]}{\text{Var}[x]} = 1.00

(The second equality uses Cov[x, y] = Var[x] and Var[y] = Var[x].)
Perfect Negative Correlation Next consider an example of perfect negative correlation; suppose that
y_t = -x_t for each t = 1, 2, . . . , T
In this case the variables exhibit perfect negative correlation. Clearly, y’s mean is the negative
of x’s:
\bar{y} = -\bar{x}
and y’s deviation from its mean equals the negative of x’s deviation from its mean for each
observation
y_t - \bar{y} = -(x_t - \bar{x}) for each t = 1, 2, . . . , T
The variance of y equals the variance of x and the covariance equals the negative of the variance
of x:
\text{Var}[y] = \frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = \text{Var}[x]

(The middle step uses y_t - \bar{y} = -(x_t - \bar{x}), whose square equals (x_t - \bar{x})^2.)
and
\text{Cov}[x, y] = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T} = \frac{-\sum_{t=1}^{T}(x_t - \bar{x})^2}{T} = -\text{Var}[x]

(The middle step uses y_t - \bar{y} = -(x_t - \bar{x}).)
Applying the correlation coefficient equation, the correlation coefficient equals −1.00:

\text{CorrCoef}[x, y] = \frac{\text{Cov}[x, y]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[y]}} = \frac{-\text{Var}[x]}{\sqrt{\text{Var}[x]}\,\sqrt{\text{Var}[x]}} = \frac{-\text{Var}[x]}{\text{Var}[x]} = -1.00

(The second equality uses Cov[x, y] = −Var[x] and Var[y] = Var[x].)
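Both polar cases can be checked numerically as well. The sketch below applies the correlation coefficient formula to y = x and to y = −x for an arbitrary series (a few June precipitation values from table 1.1).

import numpy as np

def corr_coef(a, b):
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return cov / (np.sqrt(a.var()) * np.sqrt(b.var()))   # numpy's var() divides by T, matching the chapter

x = np.array([0.75, 4.54, 7.79, 5.35, 2.86])   # any nonconstant series works; these are June 1901-1905
print(corr_coef(x, x))      # perfect positive correlation: 1.0
print(corr_coef(x, -x))     # perfect negative correlation: -1.0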
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Workfile window: Highlight precip by clicking on it; then, while depressing <Ctrl>,
click on nasdaqgrowth and djgrowth to highlight them also.
• In the Workfile window: Double click on any of the highlighted variables.
•A new list now pops up: Click Open Group. A spreadsheet including the variables Precip,
NasdaqGrowth, and DJGrowth appears.
• In the Group window: Click View and then click Covariance Analysis. . . .
• In the Covariance Analysis window: Clear the Covariance box and select the Correlation box;
then click OK.
All diagonal elements must equal 1.00. This reflects the fact that when two variables are identical, perfect positive correlation results. Each off-diagonal cell reports the correlation coefficient for the two different variables, as shown in table 1.7.
Table 1.7
Amherst precipitation, Nasdaq growth rate, and Dow Jones growth rate correlation matrix
Correlation matrix
Note that all the correlation coefficients fall within the −1.00 to +1.00 range. Each correlation
coefficient provides us with a sense of how correlated two variables are:
The correlation coefficient for the Dow and Nasdaq growth rate is positive. On the one hand,
this illustrates that they are positively correlated. On the other hand, the correlation coefficient
for Nasdaq growth rate and Amherst precipitation is approximately 0, indicating that the Nasdaq
growth rate and Amherst precipitation are independent (figure 1.8).
1.2.6 Correlation and Causation
The fact that two variables are highly correlated does not necessarily indicate that one variable
is causing the other to rise and fall. For example, the Dow Jones and Nasdaq growth rates are
indeed positively correlated. This does not imply that a rise in the Dow Jones causes the Nasdaq
to rise or that a rise in the Nasdaq causes the Dow Jones to rise, however. It simply means that
when one rises, the other tends to rise, and when one falls, the other tends to fall. One reason
that these two variables tend to move together is that both are influenced by similar factors. For
example, both are influenced by the general health of the economy. On the one hand, when
the economy prospers, both Dow Jones stocks and Nasdaq stocks tend to rise; therefore both
indexes tend to rise. On the other hand, when the economy falters, both indexes tend to fall.
While the indexes are correlated, other factors are responsible for the causation.
1.3 Arithmetic of Means, Variances, and Covariances
Elementary algebra allows us to derive the following relationships for means, variances, and
covariances:7
7. See appendix 1.1 at the end of this chapter for the algebraic proofs.
Figure 1.8
Scatter diagrams—Comparison of correlated and independent variables. Top panel: Dow Jones growth rate deviations versus Nasdaq growth rate deviations (the Feb 2000 and Jan 1987 points are labeled). Bottom panel: Amherst precipitation deviations versus Nasdaq growth rate deviations—independent (uncorrelated): knowing the value of one variable does not help us predict the value of the other; Cov = −0.91 ≈ 0 and CorrCoef = −0.07 ≈ 0.
• Variance of the sum of two independent (uncorrelated) variables: Var[x + y] = Var[x] + Var[y]
The variance of the sum of two independent (uncorrelated) variables equals the sum of the variances of the variables.
• Covariance of the sum of a constant and a variable: Cov[c + x, y] = Cov[x, y]
The covariance of two variables is unaffected when a constant is added to one of the
variables.
• Covariance of the product of a constant and a variable: Cov[cx, y] = c Cov[x, y]
Multiplying one of the variables by a constant increases their covariance by a factor equal to the
constant.
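All of these relationships can be verified numerically. The sketch below checks the two covariance rules, together with the variance-of-a-sum rule proved in appendix 1.1, on a small made-up data set.

import numpy as np

x = np.array([1.0, 4.0, 2.0, 5.0, 3.0])   # made-up data, purely for illustration
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
c = 7.0

def cov(a, b):
    return ((a - a.mean()) * (b - b.mean())).mean()   # divide-by-T covariance

def var(a):
    return cov(a, a)

print(np.isclose(cov(c + x, y), cov(x, y)))                        # adding a constant leaves Cov unchanged
print(np.isclose(cov(c * x, y), c * cov(x, y)))                    # scaling x scales Cov by the constant
print(np.isclose(var(x + y), var(x) + 2 * cov(x, y) + var(y)))     # variance of a sum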
Chapter 1 Exercises
1. Consider the inches of precipitation in Amherst, MA, during 1964 and 1975:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1964 5.18 2.32 2.71 2.72 0.83 1.84 3.02 3.01 0.94 1.32 1.68 3.98
1975 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
a. For each year, record the number of months that fall into the following categories:
2. Consider the inches of precipitation in Amherst, MA, during 1964 and 1975.
a. Focus on the two histograms you constructed in exercise 1. Based on the histograms, in
which of the two years is the
i. center of the distribution greater (further to the right)? ______
ii. spread of the distribution greater? ______
b. For each of the two years, use your statistical software to find the mean and the sum of
squared deviations. Report your answers in the table below:
1964 1975
Mean ________ ________
Sum of squared deviations ________ ________
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
8. Common sample eliminates all observations in which there is one or more missing value in one of the variables; the
individual samples option does not do so. Since no values are missing for 1964 and 1975, the choice of common or
individual has no impact.
c. Using your answers to part b and some simple arithmetic (division), compute the variance
for each year:
1964 1975
Variance ________ ________
d. Are your answers to parts b and c consistent with your answer to part a? Explain.
3. Focus on precipitation in Amherst, MA in 1975. Consider a new variable, TwoPlusPrecip,
which equals two plus each month’s precipitation: TwoPlusPrecip = 2 + Precip.
a. Fill in the blanks in the table below:
1975 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Precip 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
TwoPlusPrecip ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____
b. Construct a histogram for TwoPlusPrecip and compare it to the histogram for Precip in
1975 that you constructed in problem 1.
i. How are the histograms related?
ii. What happened to the distribution center?
iii. What happened to the distribution spread?
c. Consider the equations that describe mean and variance of a constant plus a variable: Mean[c + x] = c + Mean[x] and Var[c + x] = Var[x].
Based on these equations and the mean and variance of Precip in 1975, what is the
i. mean of TwoPlusPrecip in 1975? ______
ii. variance of TwoPlusPrecip in 1975? ______
d. Using your statistical package, generate a new variable: TwoPlusPrecip = 2 + Precip. What
is the
i. mean of TwoPlusPrecip in 1975? ______
ii. sum of squared deviations of TwoPlusPrecip in 1975? ______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
TwoPlusPrecip = 2 + Precip
• Click OK.
Instruct EViews to calculate the mean and sum of squared deviations of TwoPlusPrecip:
• In the Workfile window: Double click on TwoPlusPrecip.
• A spreadsheet displaying the value of TwoPlusPrecip for all the months appears.
• In the Series window: Click View; next click Descriptive Statistics & Tests and then Stats
Table. Descriptive statistics for all the months of the twentieth century now appear.
• In the Series window: Click Sample. In the Sample window: Enter year = 1975 in the “If
condition (optional)” text area to restrict the sample to 1975 only.
• Click OK. Descriptive statistics for 1975 appear in the Group window. Record the mean and
sum of squared deviations for 1975.
iii. Using the sum of squared deviations and a calculator, compute the variance of TwoPlus-
Precip in 1975. ______
e. Are your answers to parts b, c, and d consistent? Explain.
4. Focus on precipitation in Amherst, MA, in 1975. Suppose that we wish to report precipitation
in centimeters rather than inches. To do this, just multiply each month’s precipitation by 2.54.
Consider a new variable, PrecipCm, which equals 2.54 times each month’s precipitation as
measured in inches: PrecipCm = 2.54 × Precip.
a. Consider the equations that describe mean and variance of a constant times a variable: Mean[cx] = c Mean[x] and Var[cx] = c^2 Var[x].
Based on these equations and the mean and variance of Precip in 1975, what is the
i. mean of PrecipCm in 1975? ______
ii. variance of PrecipCm in 1975? ______
b. Using your statistical package, generate a new variable: PrecipCm = 2.54 × Precip. What
is the
i. mean of PrecipCm in 1975? ______
ii. sum of squared deviations of PrecipCm in 1975? ______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
Instruct EViews to calculate the mean and sum of squared deviations of PrecipCm:
Table.
• In the Group window: Click Sample. In the Sample window: Enter year = 1975 in the “If
condition (optional)” text area to restrict the sample to 1975 only.
• Click OK. Descriptive statistics for 1975 appear in the Group window. Record the mean and
sum of squared deviations for 1975.
iii. Using the sum of squared deviations and a calculator, compute the variance of Pre-
cipCm in 1975. ______
c. Are your answers to parts a and b consistent? Explain.
Focus on thirty students who enrolled in an economics course during a previous semester.
Student SAT data: Cross-sectional data of student math and verbal high school SAT scores from
a group of 30 students.
5. Consider the equations that describe the mean and variance of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y] and Var[x + y] = Var[x] + 2 Cov[x, y] + Var[y].
SatMath SatVerbal
Mean _______ _______
Variance _______ _______
Covariance _______
Correlation coefficient _______
Then:
•In the File Download window: Click Open. (Note that different browsers may present you
with a slightly different screen to open the workfile.)
• In the Workfile window: Highlight satmath by clicking on it; then, while depressing <Ctrl>,
click on satverbal to highlight it also.
• In the Workfile window: Double click on any of the highlighted variables.
•A new list now pops up: Click Open Group. A spreadsheet including the variables SatMath
and SatVerbal for all the students appears.
9. Common sample eliminates all observations in which there is one or more missing value in one of the variables; the
individual samples option does not do so. Since no values are missing, the choice of common or individual has no
impact.
Note: Copying and Pasting EViews Text It is often convenient to copy and paste EViews results
into a word processing document such as Microsoft Word. In the long run this can save you
much time because you can reproduce your results quickly and accurately:
• In EViews, highlight the text you wish to copy and paste.
• Right click on the highlighted area.
• Unless you have a good reason to do otherwise, accept the default choice by clicking OK.
• In your word processor: click Paste.
Based on these equations and the mean and variance of SatMath and SatVerbal, what is the
i. mean of SatSum? ______
ii. variance of SatSum? ______
e. Using your statistical package, generate a new variable:
SatSum = SatMath + SatVerbal. What is the
i. mean of SatSum? ______
ii. sum of squared deviations of SatSum? ______
a. If student 1 drops the course, would the mean Math SAT score of the 29 remaining students
increase or decrease?
b. More generally, if a student drops the course, what determines whether the mean Math
SAT score of the remaining students would increase or decrease?
c. If a student adds the course, what determines whether the mean Math SAT would increase
or decrease?
d. Evaluate the following statement:
“A student transfers from College A to College B. The mean Math SAT scores at both col-
leges increase.”
Could this statement possibly be true? If so, explain how; if not, explain why not.
7. Again, focus on the student SAT data. Consider the equation for the mean Math SAT score
of the students:
\text{Mean}[SatMath] = \frac{x_1 + x_2 + \cdots + x_{30}}{30}

Next consider the mean Math SAT score of just the female students and the mean of just the male students. Since students 1 through 10 are female and students 11 through 30 are male:

\text{Mean}[SatMathFemale] = \frac{x_1 + x_2 + \cdots + x_{10}}{10}, \qquad \text{Mean}[SatMathMale] = \frac{x_{11} + x_{12} + \cdots + x_{30}}{20}
a. Using algebra, show that the mean for all students equals the weighted average of the
mean for female students and the mean for male students where the weights equal the propor-
tion of female and male students; that is,
\text{Mean}[SatMath] = \text{WgtFemale} \times \text{Mean}[SatMathFemale] + \text{WgtMale} \times \text{Mean}[SatMathMale]

where

\text{WgtFemale} = \frac{\text{Number of female students}}{\text{Total number of students}} = \text{Weight given to female students}

and

\text{WgtMale} = \frac{\text{Number of male students}}{\text{Total number of students}} = \text{Weight given to male students}
Mean[SatMath] = ______
Mean[SatMathFemale] = ______ Mean[SatMathMale] = ______
8. The following data from the 1995 and 1996 baseball seasons illustrate what is known as
Simpson’s paradox.
a. Compute the batting average for both players in 1995. Fill in the appropriate blanks. In
1995, who had the higher average?
b. Compute the batting average for both players in 1996. Fill in the appropriate blanks. In
1996, who had the higher average?
c. Next combine the hits and at bats for the two seasons. Fill in the appropriate blanks.
Compute the batting average for both players in the combined seasons. Fill in the appropriate
blanks. In the combined seasons, who had the higher average?
d. Explain why the batting average results appear paradoxical.
e. Resolve the paradox. Hint: Jeter’s combined season average can be viewed as a weighted
average of his two seasons, weighted by his at bats. Similarly Justice’s combined season
average can be viewed as a weighted average also. Apply the equation you derived in
problem 7.
Appendix 1.1: The Arithmetic of Means, Variances, and Covariances

\text{Mean}[x] = \bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T} = \frac{\sum_{t=1}^{T} x_t}{T}

\text{Var}[x] = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})^2}{T}

\text{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{T}
The mean of a constant plus a variable equals the constant plus the mean of the variable:
\text{Mean}[c + x] = \frac{(c + x_1) + (c + x_2) + \cdots + (c + x_T)}{T}
= \frac{c + c + \cdots + c}{T} + \frac{x_1 + x_2 + \cdots + x_T}{T}
= \frac{Tc}{T} + \frac{x_1 + x_2 + \cdots + x_T}{T}
= c + \text{Mean}[x] = c + \bar{x}
The mean of a constant times a variable equals the constant times the mean of the variable:
\text{Mean}[cx] = \frac{cx_1 + cx_2 + \cdots + cx_T}{T} = c\,\frac{x_1 + x_2 + \cdots + x_T}{T} = c\,\text{Mean}[x] = c\bar{x}
The mean of the sum of two variables equals the sum of the means of the variables:
\text{Mean}[x + y] = \frac{(x_1 + y_1) + (x_2 + y_2) + \cdots + (x_T + y_T)}{T}
= \frac{(x_1 + x_2 + \cdots + x_T) + (y_1 + y_2 + \cdots + y_T)}{T}
= \frac{x_1 + x_2 + \cdots + x_T}{T} + \frac{y_1 + y_2 + \cdots + y_T}{T}
= \bar{x} + \bar{y}
The variance of a constant plus a variable equals the variance of the variable:

\text{Var}[c + x] = \frac{[(c + x_1) - (c + \bar{x})]^2 + \cdots + [(c + x_T) - (c + \bar{x})]^2}{T} = \frac{(x_1 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T} = \text{Var}[x]
The variance of a constant times a variable equals the constant squared times the variance of the
variable:
42 Chapter 1
T
= c 2 Var[ x]
The variance of the sum of two variables equals the sum of the variances of the variables plus
twice the variables’ covariance:
Var[x + y] = {[(x1 + y1) − (x̄ + ȳ)]² + . . . + [(xT + yT) − (x̄ + ȳ)]²}/T
           = {[(x1 − x̄) + (y1 − ȳ)]² + . . . + [(xT − x̄) + (yT − ȳ)]²}/T
           = {[(x1 − x̄)² + 2(x1 − x̄)(y1 − ȳ) + (y1 − ȳ)²] + . . . + [(xT − x̄)² + 2(xT − x̄)(yT − ȳ) + (yT − ȳ)²]}/T
           = [(x1 − x̄)² + . . . + (xT − x̄)²]/T + 2[(x1 − x̄)(y1 − ȳ) + . . . + (xT − x̄)(yT − ȳ)]/T + [(y1 − ȳ)² + . . . + (yT − ȳ)²]/T
           = Var[x] + 2 Cov[x, y] + Var[y]
Variance of the Sum of Two Independent (Uncorrelated) Variables: Var[x + y] = Var[x] + Var[y]
The variance of the sum of two independent (uncorrelated) variables equals the sum of the vari-
ances of the variables; since Cov[x, y] = 0 when x and y are independent (uncorrelated),
Var[x + y] = Var[x] + 2 Cov[x, y] + Var[y] = Var[x] + Var[y]
The covariance of two variables is unaffected when a constant is added to one of the
variables:
Cov[c + x, y] = {[(c + x1) − (c + x̄)](y1 − ȳ) + [(c + x2) − (c + x̄)](y2 − ȳ) + . . . + [(c + xT) − (c + x̄)](yT − ȳ)}/T
              = {[(c − c) + (x1 − x̄)](y1 − ȳ) + [(c − c) + (x2 − x̄)](y2 − ȳ) + . . . + [(c − c) + (xT − x̄)](yT − ȳ)}/T
              = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + . . . + (xT − x̄)(yT − ȳ)]/T
              = Cov[x, y]
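These identities are easy to check numerically. The following short Python sketch (ours, not part of the text; the data values are made up purely for illustration) verifies each one on a small data set, using the population formulas defined above, which divide by T:

    # A quick numerical check of the arithmetic of means, variances, and covariances.
    import numpy as np

    x = np.array([1.0, 4.0, 2.0, 7.0, 6.0])
    y = np.array([2.0, 3.0, 8.0, 5.0, 2.0])
    c = 10.0

    mean = lambda v: v.mean()
    var = lambda v: v.var()                          # divides by T, as above
    cov = lambda u, v: np.cov(u, v, ddof=0)[0, 1]    # also divides by T

    assert np.isclose(mean(c + x), c + mean(x))              # Mean[c + x] = c + Mean[x]
    assert np.isclose(mean(c * x), c * mean(x))              # Mean[cx] = c Mean[x]
    assert np.isclose(mean(x + y), mean(x) + mean(y))        # Mean[x + y] = Mean[x] + Mean[y]
    assert np.isclose(var(c + x), var(x))                    # Var[c + x] = Var[x]
    assert np.isclose(var(c * x), c**2 * var(x))             # Var[cx] = c^2 Var[x]
    assert np.isclose(var(x + y), var(x) + 2 * cov(x, y) + var(y))   # Var[x + y]
    assert np.isclose(cov(c + x, y), cov(x, y))              # Cov[c + x, y] = Cov[x, y]
    print("All identities hold for this data set.")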
Chapter 2 Outline
2.6 Mean, Variance, and Covariance: Data Variables and Random Variables
(1 − p)²p + p²(1 − p) simplifies to p(1 − p)
2.1.1 Random Process: A Process Whose Outcome Cannot Be Predicted with Certainty
The outcome of a random process is uncertain. Tossing a coin is a random process because
you cannot tell beforehand whether the coin will land heads or tails. A baseball game is a random
process because the outcome of the game cannot be known beforehand, assuming of course that
the game has not been fixed. Drawing a card from a well-shuffled deck of fifty-two cards is a
random process because you cannot tell beforehand whether the card will be the ten of hearts,
the six of diamonds, the ace of spades, and so on.
The probability of an outcome tells us how likely it is for that outcome to occur. The value of
a probability ranges from 0 to 1.0. A probability of 0 indicates that the outcome will never occur;
1.0 indicates that the outcome will occur with certainty. A probability of one-half indicates that
the chances of the outcome occurring equals the chances that it will not. For example, if the
experts believe that a baseball game between two teams, say the Red Sox and Yankees, is a
toss-up, then the experts believe that
• the probability of a Red Sox win (and a Yankee loss) is one-half and
• the probability of a Red Sox loss (and a Yankee win) is also one-half.
We will use a card draw as our first illustration of a random process. While we could use a
standard deck of fifty-two cards as the example, the arithmetic can become cumbersome. Con-
sequently, to keep the calculations manageable, we will use a deck of only four cards, the 2 of
clubs, the 3 of hearts, the 3 of diamonds, and the 4 of hearts:
• Shuffle the four cards thoroughly.
• Draw one card and record its value.
• Replace the card.
This experiment represents one repetition of a random process because we cannot determine
which card will be drawn before the experiment is conducted. Throughout this textbook, we will
continue to use the word experiment to represent one repetition of a random process. It is easy
to calculate the probability of each possible outcome for our card draw experiment. Since each
of the four cards is equally likely to be drawn, the probability of drawing the 2 of clubs is 1/4 and
the probability of drawing the 3 of hearts is 1/4. Similarly the probability of drawing the 3 of
diamonds is 1/4 and the probability of drawing the 4 of hearts is 1/4. To summarize,
Prob[2♣] = 1/4,  Prob[3♥] = 1/4,  Prob[3♦] = 1/4,  Prob[4♥] = 1/4
2.1.3 Random Variable: A Variable That Is Associated with an Outcome of a Random Process
A random variable is a variable that is associated with a random process. The value of a random
variable cannot be determined with certainty before the experiment is conducted. There are two
types of random variables:
• A discrete random variable can only take on a countable number of discrete values.
• A continuous random variable can take on a continuous range of values; that is, a continuous
random variable can take on a continuum of values.
To illustrate a discrete random variable, consider our card draw experiment and define v as the
numerical value of the card drawn. That is, v equals 2, if the 2 of clubs were drawn; 3, if the 3 of
hearts or the 3 of diamonds were drawn; and 4, if the 4 of hearts were drawn.
• v is discrete because it can only take on a countable number of values; v can take on three
values: 2 or 3 or 4.
• v is a random variable because we cannot determine the value of v before the experiment is
conducted.
2.2.1 Probability Distribution Describes the Probability for All Possible Values of a Random
Variable
While we cannot determine v’s value beforehand, we can calculate the probability of each pos-
sible value:
• v equals 2 whenever the 2 of clubs is drawn; since the probability of drawing the 2 of clubs
is 1/4, the probability that v will equal 2 is 1/4.
Table 2.1
Probability distribution of random variable v

Card drawn      v     Prob[v]
2♣              2     1/4 = 0.25
3♥ or 3♦        3     1/4 + 1/4 = 1/2 = 0.50
4♥              4     1/4 = 0.25
Figure 2.1
Probability distribution of the random variable v (bars of height 0.25 at v = 2, 0.50 at v = 3, and 0.25 at v = 4)
• v equals 3 whenever the 3 of hearts or the 3 of diamonds is drawn; since the probability of
drawing the 3 of hearts is 1/4 and the probability of drawing the 3 of diamonds is 1/4, the prob-
ability that v will equal 3 is 1/2.
• v equals 4 whenever the 4 of hearts is drawn; since the probability of drawing the 4 of hearts
is 1/4, the probability that v will equal 4 is 1/4.
Table 2.1 describes the probability distribution of the random variable v. The probability
distribution is sometimes called the probability density function of the random variable or simply
the distribution of the random variable. Figure 2.1 illustrates the probability distribution with a
graph that indicates how likely it is for the random variable to equal each of its possible values.
Note that the probabilities must sum to 1 because one of the four cards must be drawn; v must
equal either 2 or 3 or 4. This illustrates a general principle: The sum of the probabilities of all
possible outcomes must equal 1.
In general, a random variable brings both bad and good news. Before the experiment is
conducted:
Bad news: What we do not know: on the one hand, we cannot determine the numerical value
of the random variable with certainty.
Good news: What we do know: on the other hand, we can often calculate the random variable’s
probability distribution telling us how likely it is for the random variable to equal each of its
possible numerical values.
We can interpret the probability of a particular outcome as the relative frequency of the outcome
after the random process, the experiment, is repeated many, many times. We will illustrate the
relative frequency interpretation of probability using our card draw experiment:
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 2?
Answer: Since one of the four cards is a 2, we would expect to draw a 2 about one-fourth of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 2 should be about 1/4, its probability.
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 3?
Answer: Since two of the four cards are 3’s, we would expect to draw a 3 about one-half of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 3 should be about 1/2, its probability.
Question: If we repeat the experiment many, many times, what portion of the time would we
draw a 4?
Answer: Since one of the four cards is a 4, we would expect to draw a 4 about one-fourth of
the time. That is, when the experiment is repeated many, many times, the relative frequency of
a 4 should be about 1/4, its probability.
We could justify this interpretation of probability “by hand,” but doing so would be a very time-
consuming and laborious process. Computers, however, allow us to simulate the experiment
quickly and easily. The Card Draw simulation in our econometrics lab does so (figure 2.2).
Figure 2.2
Card Draw simulation (the window shows the cards selected to be in the deck, with the 2♣, 3♥, 3♦, and 4♥ checked; the card drawn and its value in this repetition; the repetition count; the mean and variance of the numerical values; the relative frequencies of 2, 3, and 4; and Start, Stop, and Pause controls)
We first specify the cards to include in our deck. By default, the 2♣, 3♥, 3♦, and 4♥ are
included. When we click Start, the simulation randomly selects one of our four cards. The card
drawn and its value are reported. To randomly select a second card, click Continue. A table
reports on the relative frequency of each possible value and a histogram visually illustrates the
distribution of the numerical values.
Click Continue repeatedly to convince yourself that our experiment is indeed a random
process; that is, convince yourself that there is no way to determine which card will be drawn
beforehand. Next uncheck the Pause checkbox and click Continue. The simulation no longer
Figure 2.3
Histogram of the numerical values of v (relative frequencies of approximately 0.25, 0.50, and 0.25 at v = 2, 3, and 4)
pauses after each card is selected. It will now repeat the experiment very rapidly. What happens
as the number of repetitions becomes large? The relative frequency of a 2 is approximately 0.25,
the relative frequency of a 3 is approximately 0.50, and the relative frequency of a 4 is approxi-
mately 0.25 as illustrated by the histogram appearing in figure 2.3. After many, many repetitions
click Stop. Recall the probabilities that we calculated for our random variable v:
Prob[v = 2] = 0.25
Prob[v = 3] = 0.50
Prob[v = 4] = 0.25
When the experiment is repeated many, many times, the relative frequency of each outcome
equals its probability. After many, many repetitions the distribution of the numerical values from
all the repetitions mirrors the probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
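Although we cannot run the lab's Card Draw simulation on the printed page, a few lines of Python (our sketch, not the lab's code) mimic the experiment and show the relative frequencies settling down at the probabilities 0.25, 0.50, and 0.25:

    # Sketch of the Card Draw experiment: draw one card (with replacement) many times
    # and compare each value's relative frequency with its probability.
    import random
    from collections import Counter

    deck = [2, 3, 3, 4]          # the 2 of clubs, 3 of hearts, 3 of diamonds, 4 of hearts
    repetitions = 1_000_000
    counts = Counter(random.choice(deck) for _ in range(repetitions))

    for value in (2, 3, 4):
        # relative frequencies: roughly 0.25, 0.50, and 0.25
        print(value, counts[value] / repetitions)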
2.3.1 Center of the Probability Distribution: Mean (Expected Value) of the Random Variable
We have already defined the mean of a data variable in chapter 1. That is, the mean is the average
of the numerical values. We can extend the notion of the mean to a random variable by applying
the relative frequency interpretation of probability. The mean of a random variable equals the
average of the numerical values after the experiment is repeated many, many times. The mean
is often called the expected value because that is what we would expect the numerical value to
equal on average after many, many repetitions of the experiment.1
Question: On average, what would we expect v to equal if we repeated our experiment many,
many times? About one-fourth of the time v would equal 2, one-half of the time v would equal
3, and one-fourth of the time v would equal 4:
1/4 of the time: v = 2;   1/2 of the time: v = 3;   1/4 of the time: v = 4
Answer: On average, v would equal 3. Consequently the mean of the random variable v
equals 3.
More formally, we can calculate the mean of a random variable using the following two steps:
• Multiply each value by the value’s probability.
• Sum the products.
The mean equals the sum of these products. The following equation describes the steps more
concisely:²
Mean[v] = Σ v Prob[v], where the sum is taken over all possible values of v
In words, it states that for each possible value, multiply the value and its probability; then sum
the products.
Let us now “dissect” the right-hand side of the equation:
1. The mean and expected value of a random variable are synonyms. Throughout this textbook we will be consistent
and always use the term “mean.” You should note, however, that the term “expected value” is frequently used instead
of “mean.”
2. Note that the v in Mean[v] is in a bold font. This is done to emphasize the fact that the mean refers to the entire
probability distribution. Mean[v] refers to the center of the entire probability distribution, not just a single value. When
v does not appear in a bold font, we are referring to a specific value that v can take on.
In our example, v can take on three values: 2, 3, and 4. Applying the equation for the mean
obtains
Mean[v] = 2 × 1/4 + 3 × 1/2 + 4 × 1/4 = 1/2 + 3/2 + 1 = 3
Next let us turn our attention to the variance. Recall from chapter 1 that the variance of a data
variable describes the spread of a data variable’s distribution. The variance equals the average
of the squared deviations of the values from the mean. Just as we used the relative frequency
interpretation of probability to extend the notion of the mean to a random variable, we will now
use it to extend the notion of the variance. The variance of a random variable equals the average
of the squared deviations of the values from its mean after the experiment is repeated many,
many times.
Begin by calculating the deviation from the mean and then the squared deviation for each
possible value of v:

v     Deviation from mean, v − Mean[v]     Squared deviation
2     2 − 3 = −1                            1
3     3 − 3 = 0                             0
4     4 − 3 = 1                             1

If we repeat our experiment many, many times, what would the squared deviations equal on
average?
• About one-fourth of the time v would equal 2, the deviation would equal −1, and the squared
deviation 1.
• About one-half of the time v would equal 3, the deviation would equal 0, and the squared
deviation 0.
• About one-fourth of the time v would equal 4, the deviation would equal 1, and the squared
deviation 1.
Half of the time the squared deviation would equal 1 and half of the time 0. On average, the
squared deviations from the mean would equal 1/2.
More formally, we can calculate the variance of a random variable using the following four
steps:
• For each possible value of the random variable, calculate the deviation from the mean.
• Square each value’s deviation.
• Multiply each value’s squared deviation by the value’s probability.
• Sum the products.
For each possible value, multiply the squared deviation and its probability; then sum the
products:
Var[v] = Σ (v − Mean[v])² Prob[v], where the sum is taken over all possible values of v
In our example there are three possible values for v: 2, 3, and 4:
Var[v] = (2 − 3)² × 1/4 + (3 − 3)² × 1/2 + (4 − 3)² × 1/4 = 1/4 + 0 + 1/4 = 1/2
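The two-step recipe for the mean and the four-step recipe for the variance translate directly into a few lines of Python (our illustration, not the lab's code):

    # Mean and variance of a discrete random variable from its probability distribution.
    distribution = {2: 0.25, 3: 0.50, 4: 0.25}       # value: probability

    mean = sum(v * p for v, p in distribution.items())                    # multiply, then sum
    variance = sum((v - mean) ** 2 * p for v, p in distribution.items())  # squared deviations

    print(mean, variance)   # prints 3.0 and 0.5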
Econometrics Lab 2.2: Card Draw Simulation—Checking the Mean and Variance Calculations
It is useful to use our simulation to check our mean and variance calculations. We will exploit
the relative frequency interpretation of probability to do so. Recall our experiment:
• Shuffle the 2♣, 3♥, 3♦, and 4♥ thoroughly.
• Draw one card and record its value.
• Replace the card.
The relative frequency interpretation of probability asserts that when an experiment is repeated
many, many times, the relative frequency of each outcome equals its probability. After many,
many repetitions the distribution of the numerical values from all the repetitions mirrors the
probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
If our equations are correct, what should we expect when we repeat our experiment many,
many times?
• The mean of a random variable’s probability distribution should equal the average of the
numerical values of the variable obtained from each repetition of the experiment after the experi-
ment is repeated many, many times. Consequently after many, many repetitions the mean should
equal about 3.
• The variance of a random variable’s probability distribution should equal the average of the
squared deviations from the mean obtained from each repetition of the experiment after the
experiment is repeated many, many times. Consequently after many, many repetitions the vari-
ance should equal about 0.5.
Figure 2.4
Card Draw simulation (the same window as figure 2.2: the deck list with the 2♣, 3♥, 3♦, and 4♥ selected; the card drawn and its value in this repetition; the repetition count; and Start, Stop, and Pause controls)
The Card Draw simulation in our econometrics lab allows us to confirm this:
As before, the 2♣, 3♥, 3♦, and 4♥ are selected by default (figure 2.4); so just click Start. Recall
that the simulation now randomly selects one of the four cards. The numerical value of the card
selected is reported. Note that the mean and variance of the numerical values are also reported.
You should convince yourself that the simulation is calculating the mean and variance correctly
by clicking Continue and calculating the mean and variance yourself. You will observe that the
simulation is indeed performing the calculations accurately. If you are still skeptical, click Con-
tinue again and perform the calculations. Do so until you are convinced that the mean and
variance reported by the simulation are indeed correct.
Next uncheck the Pause checkbox and click Continue. The simulation no longer pauses after
each card is selected. It will now repeat the experiment very rapidly. After many, many repeti-
tions click Stop. What happens as the number of repetitions becomes large?
• The mean of the numerical values is about 3. This is consistent with our equation for the mean
of the random variable’s probability distribution:
Mean[v] = 2 × 1/4 + 3 × 1/2 + 4 × 1/4 = 1/2 + 3/2 + 1 = 3
• The variance of the numerical values is about 0.5. This is consistent with our equation for the
variance of the random variable’s probability distribution:
Var[v] = 1 × 1/4 + 0 × 1/2 + 1 × 1/4 = 1/4 + 0 + 1/4 = 1/2 = 0.5
The simulation illustrates that the equations we use to compute the mean and variance are indeed
correct.
A continuous random variable, unlike a discrete random variable, can take on a continuous
range of values, a continuum of values. To learn more about these random variables, consider
the following example. Dan Duffer consistently hits 200 yard drives from the tee. A diagram of
the eighteenth hole appears in figure 2.5. The fairway is 32 yards wide 200 yards from the tee.
While the length of Dan’s drives is consistent (he always drives the ball 200 yards from the tee),
he is not consistent “laterally.” That is, his drives sometimes go to the left of where he aims and
sometimes to the right. Despite all the lessons Dan has taken, his drive can land up to 40 yards
to the left and up to 40 yards to the right of his target point. Suppose that Dan’s target point is
the center of the fairway. Since the fairway is 32 yards wide, there are 16 yards of fairway to
the left of Dan’s target point and 16 yards of fairway to the right.
The probability distribution appearing below the diagram of the eighteenth hole describes the
probability that his drive will go to the left and right of his target point. v equals the lateral
distance from Dan’s target point. A negative v represents a point to the left of the target point
and a positive v a point to the right. Note that v can take an infinite number of values between
−40 and +40: v can equal 10 or 16.002 or −30.127, and so on. v is a continuous rather than a
discrete random variable. The probability distribution at the bottom of figure 2.5 indicates how
likely it is for v to equal each of its possible values.
Figure 2.5
A continuous random variable (the eighteenth hole: a 32-yard-wide fairway 200 yards from the tee, with left and right rough and a lake to the right; below it, the triangular probability distribution of v, rising from 0 at v = −40 to 0.025 at v = 0 and falling back to 0 at v = +40)
What is the area beneath the probability distribution? Applying the equation for the area of a
triangle obtains
Area beneath = 1/2 × 0.025 × 40 + 1/2 × 0.025 × 40 = 0.5 + 0.5 = 1
The area equals 1. This is not accidental. Dan’s probability distribution illustrates the property
that all probability distributions must exhibit:
• The area beneath the probability distribution must equal 1.
Figure 2.6
A continuous random variable—Calculating probabilities (the same hole and probability distribution as figure 2.5, with the area between v = −16 and v = +16, Prob[v between −16 and +16], shaded)
The area equaling 1 simply means that a random variable must always take on one of its pos-
sible values (see figure 2.6). In Dan’s case the area beneath the probability distribution must
equal 1 because Dan’s ball must land somewhere.
Let us now calculate some probabilities:
• What is the probability that Dan’s drive will land in the lake? The shore of the lake lies 16
yards to the right of the target point; hence the probability that his drive lands in the lake equals
the probability that v will be greater than 16:
Prob[Drive in lake] = Prob[v > 16]
This just equals the area beneath the probability distribution that lies to the right of 16. Applying
the equation for the area of a triangle:
Prob[Drive in lake] = 1/2 × 0.015 × 24 = 0.18
• What is the probability that Dan’s drive will land in the left rough? The left rough lies 16 yards
to the left of the target point; hence, the probability that his drive lands in the left rough equals
the probability that v will be less than or equal to −16:
Prob[Drive in left rough] = Prob[v ≤ −16]
This just equals the area beneath the probability distribution that lies to the left of −16:
Prob[Drive in left rough] = 1/2 × 0.015 × 24 = 0.18
• What is the probability that Dan’s drive will land in the fairway? The probability that his drive
lands in the fairway equals the probability that v will be within 16 yards of the target point:
Prob[Drive in fairway] = Prob[−16 ≤ v ≤ 16]
This just equals the area beneath the probability distribution that lies between −16 and 16. We
can calculate this area by dividing the area into a rectangle and triangle:
Prob[Drive in fairway] = 0.015 × 32 + 1/2 × 0.010 × 32
                       = 0.015 × 32 + 0.005 × 32
                       = (0.015 + 0.005) × 32
                       = 0.020 × 32 = 0.64
Prob[Drive in lake] + Prob[Drive in left rough] + Prob[Drive in fairway] = 0.18 + 0.18 + 0.64 = 1.0
The sum equals 1.0, illustrating the fact that Dan’s drive must land somewhere. This example
illustrates how we can use probability distributions to compute probabilities.
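The same probabilities can be obtained by numerically integrating the triangular density. The sketch below is ours, not the text's; it assumes the density 0.025 × (1 − |v|/40) implied by figure 2.5:

    def density(v):
        # Triangular density from figure 2.5: peak 0.025 at v = 0, zero at v = -40 and +40.
        return 0.025 * (1 - abs(v) / 40) if abs(v) <= 40 else 0.0

    def prob_between(a, b, steps=100_000):
        # Midpoint-rule numerical integration of the density from a to b.
        width = (b - a) / steps
        return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

    print(prob_between(-40, 40))    # total area beneath the density: about 1.00
    print(prob_between(16, 40))     # Prob[drive in lake]:        about 0.18
    print(prob_between(-40, -16))   # Prob[drive in left rough]:  about 0.18
    print(prob_between(-16, 16))    # Prob[drive in fairway]:     about 0.64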
We will now apply what we have learned about random variables to gain insights into statistics
that are cited in the news every day. For example, when the Bureau of Labor Statistics calculates
the unemployment rate every month, it does not interview every American, the entire American
population; instead, it gathers information from a subset of the population, a sample. More
specifically, data are collected from interviews with about 60,000 households. Similarly political
pollsters do not poll every American voter to forecast the outcome of an election, but rather they
query only a sample of the voters. In each case a sample of the population is used to draw infer-
ences about the entire population. How reliable are these inferences? To address this question,
we consider an example.
A college student, Clinton Jefferson Williams, is running for president of his student body. On
the day before the election, Clint must decide whether or not to hold a pre-election beer tap
rally:
• If he is comfortably ahead, he will not hold the beer tap rally; he will save his campaign funds
for a future political endeavor (or perhaps a Caribbean vacation in January).
• If he is not comfortably ahead, he will hold the beer tap rally to try to sway some voters.
There is not enough time to interview every member of the student body, however. What should
Clint do? He decides to conduct a poll.
Econometrician’s philosophy
If you lack the information to determine the value directly, estimate the value to the best of your
ability using the information you do have. By conducting the poll, Clint has adopted the philoso-
phy of the econometrician. Clint polls a sample of 16 randomly selected students and uses the
information they provide to draw inferences about the entire student body, the population. Twelve
of the 16, or 75 percent (0.75), of the sample support Clint:
Estimate of the actual population fraction supporting Clint = EstFrac = 12/16 = 3/4 = 0.75
This suggests that Clint leads, does it not? But how confident should Clint be that he is in fact
ahead? Clint faces a dilemma.
Clint’s Dilemma
Should Clint be confident that he has the election in hand and save his funds or should he finance
the beer tap rally?
We will now pursue the following project to help Clint resolve his dilemma.
Project
Use Clint’s opinion poll to assess his election prospects.
In reality, Clint only conducts one poll. How, then, can a simulation of the polling process be
useful? The relative frequency interpretation of probability provides the answer. We can use a
simulation to conduct a poll many, many times. After many, many repetitions the simulation
reveals the probability distribution of the possible outcomes for the one poll that Clint
conducts:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
We will now illustrate how the probability distribution might help Clint decide whether or not
to fund the beer tap rally.
The Opinion Poll simulation in our Econometrics Lab can help Clint address his dilemma. In
the simulation, we can specify the sample size. To mimic Clint’s poll, a sample size of 16 is
selected by default (shown in figure 2.7). Furthermore we can do something in the simulation
that we cannot do in the real world. We can specify the actual fraction of the population that
supports Clint, ActFrac. By default, the actual population fraction is set at 0.5; half of all voters
support Clint and half do not. In other words, we are simulating an election that is a toss-up.
When we click the Start button, the simulation conducts a poll of 16 people and reports the
fraction of those polled that support Clint:
EstFrac = Number for Clint/Sample size = Number for Clint/16
Figure 2.7
Opinion Poll simulation (the window offers a sample size list with 10, 16, 25, and 50, an actual population fraction list, and Start, Stop, and Pause controls)
EstFrac equals the estimated fraction of the population supporting Clint. To conduct a second
poll, click the Continue button. Do this several times. What do you observe? Sometimes the
estimated fraction, EstFrac, may equal the actual population fraction, 0.5, but usually it does
not. Furthermore EstFrac is a random variable; we cannot predict its value with certainty before
the poll is conducted. Next uncheck the Pause checkbox and click Continue. After many, many
repetitions click Stop.
The simulation histogram illustrates that sometimes 12 or more of those polled support Clint
even though only half the population actually supports him. So it is entirely possible that the
election is a toss-up even though 12 of the 16 individuals supported Clint in his poll. In other
words, Clint cannot be completely certain that he is leading, despite the fact that 75 percent of
the 16 individuals polled supported him. And where does Clint stand? The poll results do not
allow him to conclude he is leading with certainty. What conclusions can Clint justifiably draw
from his poll results?
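One way to sharpen this point, though the text does not pursue it here, is to ask how often a poll of 16 would show 12 or more supporters if the election really were a toss-up. The binomial calculation below is our addition:

    # Probability of polling 12 or more Clint supporters out of 16
    # when the actual population fraction is only 0.50.
    from math import comb

    n, p = 16, 0.5
    prob_12_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, n + 1))
    print(prob_12_or_more)   # about 0.038: unlikely, but far from impossible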
Experiment 2.2: Opinion Poll with a Sample Size of 1—An Unrealistic but Instructive
Experiment
Write the name of each student in the college, the population, on a 3 × 5 card; then:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
Figure 2.8
Opinion Poll simulation—Sample size of one (a single individual is drawn from the population into the sample; for Clint?)
Figure 2.9
Probabilities for a sample size of one (the individual polled is for Clint, v = 1, with probability 1/2, and not for Clint, v = 0, with probability 1/2)
To explain how we can do so, assume for the moment that the election is actually a toss-up
as we did in our simulation; that is, assume that half the population supports Clint and half does
not. We make this hypothetical assumption only temporarily because it will help us understand
the polling process. With this assumption we can easily determine v’s probability distribution.
Since the individual is chosen at random, the chances that the individual will support Clint equal
the chances he/she will not (see figure 2.9):
Individual’s response      v     Prob[v]
For Clint                  1     1/2
Not for Clint              0     1/2
We describe v’s probability distribution by calculating its center (mean) and spread (variance).
For each possible value, multiply the value and its probability; then sum the products.
There are two possible values for v, 1 and 0:
Mean[v] = 1 × 1/2 + 0 × 1/2 = 1/2 + 0 = 1/2
This makes sense, does it not? In words, the mean of a random variable equals the average
of the values of the variable after the experiment is repeated many, many times. Recall that we
have assumed that the election is a toss-up. Consequently after the experiment is repeated many,
many times we would expect v to equal
• 1, about half of the time.
• 0, about half of the time.
After many, many repetitions of the experiment, the numerical value of v should average out to
equal 1/2.
Recall the equation and the four steps we used to calculate the variance:
Var[v] = Σ (v − Mean[v])² Prob[v], where the sum is taken over all possible values of v
For each possible value, multiply the squared deviation and its probability; then sum the
products.
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Var[v] = 1/4 × 1/2 + 1/4 × 1/2 = 1/8 + 1/8 = 1/4
We will now use our Opinion Poll simulation to check our mean and variance calculations by
specifying a sample size of 1. In this case the estimated fraction, EstFrac, and v are identical:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
If our calculations for the mean and variance of v’s probability distribution are correct, the mean
of the numerical values should equal about 0.50 and the variance about 0.25 after many, many
repetitions:
Mean of the numerical values → Mean of probability distribution = 1/2 = 0.50
Variance of the numerical values → Variance of probability distribution = 1/4 = 0.25
(after many, many repetitions)
Table 2.2
Opinion Poll simulation results—sample size of one
Actual population fraction = ActFrac = p = 1/2 = 0.50

Equations:   Mean = 1/2 = 0.50    Variance = 1/4 = 0.25
Simulation:  >1,000,000 repetitions    Mean ≈ 0.50    Variance ≈ 0.25
Figure 2.10
Probabilities for a sample size of one (the individual polled is for Clint, v = 1, with probability p, and not for Clint, v = 0, with probability 1 − p)
Table 2.2 confirms that after many, many repetitions the mean numerical value is about 0.50
and the variance about 0.25, which is consistent with our calculations.
Generalization
Thus far we have assumed that the portion of the population supporting Clint is 1/2. Let us now
generalize our analysis by letting p equal the actual fraction of the population supporting Clint’s
candidacy: ActFrac = p. The probability that the individual selected will support Clint just
equals p, the actual fraction of the population supporting Clint (see figure 2.10):
Individual’s response      v     Prob[v]
For Clint                  1     p
Not for Clint              0     1 − p
Now calculate the mean: for each possible value, multiply the value and its probability; then sum
the products. As before, there are two possible values for v, 1 and 0:
Mean[v] = 1 × p + 0 × (1 − p) = p + 0 = p
The mean equals p, the actual fraction of the population supporting Clint.
For each possible value, multiply the squared deviation and its probability, and sum the prod-
ucts—in four steps:
• For each possible value, calculate the deviation from the mean.
• Square each value’s deviation.
• Multiply each value’s squared deviation by the value’s probability.
• Sum the products.
Var[v] = (1 − p)² × p + p² × (1 − p)
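This expression simplifies to p(1 − p), the form used throughout the rest of the chapter. The algebra, written out here for completeness, is just factoring:

$$\mathrm{Var}[v] \;=\; (1-p)^2\,p \;+\; p^2\,(1-p) \;=\; p(1-p)\bigl[(1-p)+p\bigr] \;=\; p(1-p)$$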
Experiment 2.3: Opinion Poll with Sample Size of 2—Another Unrealistic but Instructive
Experiment
In reality, we would never use a poll of only two individuals to estimate the actual fraction of
the population supporting Clint. Nevertheless, analyzing such a case is instructive. Therefore let
us consider an experiment in which two individuals are polled (figure 2.11). Remember, we have
written the name of each student enrolled in the college on a 3 × 5 card.
In the first stage:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer; this yields a specific numeri-
cal value of v1 for the random variable. v1 equals 1 if the first individual polled supports Clint;
0 otherwise.
• Replace the card.
In the second stage:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer; this yields a specific numeri-
cal value of v2 for the random variable. v2 equals 1 if the second individual polled supports Clint;
0 otherwise.
• Replace the card.

Figure 2.11
Opinion Poll simulation—Sample size of two (two individuals are drawn at random from the population, with replacement; is individual 1 for Clint? is individual 2 for Clint?)
The estimated fraction of the population supporting Clint is the fraction of the two individuals
polled who support him:
EstFrac = (v1 + v2)/2
EstFrac is a random variable. We cannot determine with certainty the numerical value of the
estimated fraction, EstFrac, before the experiment is conducted.
The probability distribution reports the likelihood of each possible outcome. We can describe
the probability distribution by calculating its center (mean) and spread (variance).
Mean[EstFrac] = Mean[(1/2)(v1 + v2)]
What do we know that would help us calculate the mean? We know the means of v1 and v2; also
we know about the arithmetic of means.
That is, we already have the means of the random variables v1 and v2:
• The first stage of the experiment is identical to the previous experiment in which only one
card is drawn; consequently
Mean[v1] = Mean[v] = p
• Similarly the second stage of the experiment is identical to the previous experiment; consequently
Mean[v2] = Mean[v] = p
We will focus on
Mean[(1/2)(v1 + v2)]
and apply the arithmetic of means:
Mean[EstFrac] = Mean[(1/2)(v1 + v2)]
since Mean[cx] = c Mean[x]
              = (1/2) Mean[v1 + v2]
since Mean[x + y] = Mean[x] + Mean[y]
              = (1/2)(Mean[v1] + Mean[v2])
since Mean[v1] = Mean[v2] = p
              = (1/2)[p + p]
simplifying
              = (1/2)[2p]
              = p
Var[EstFrac] = Var[(1/2)(v1 + v2)]
That is, we already have the variances of the random variables v1 and v2:
• The first stage of the experiment is identical to the previous experiment in which only one
card was drawn; consequently
Var[v1] = Var[v] = p(1 − p)
• Similarly the second stage is identical; consequently
Var[v2] = Var[v] = p(1 − p)
Now focus on the covariance of v1 and v2, Cov[v1, v2]. The covariance tells us whether the
variables are correlated or independent. On the one hand, when two variables are correlated their
covariance is nonzero; knowing the value of one variable helps us predict the value of the other.
On the other hand, when two variables are independent their covariance equals zero; knowing
the value of one does not help us predict the other.
In this case, v1 and v2 are independent and their covariance equals 0. Let us explain why. Since
the first card drawn is replaced, whether or not the first voter polled supports Clint does not
affect the probability that the second voter will support Clint. Regardless of whether or not the
first voter polled supported Clint, the probability that the second voter will support Clint is p,
the actual population fraction:
Cov[v1, v2] = 0
Focus on
Var[(1/2)(v1 + v2)]
and apply the arithmetic of variances:
Var[EstFrac] = Var[(1/2)(v1 + v2)]
since Var[cx] = c² Var[x]
             = (1/4) Var[v1 + v2]
since Cov[v1, v2] = 0, the variance of the sum equals the sum of the variances
             = (1/4)[Var[v1] + Var[v2]]
since Var[v1] = Var[v2] = p(1 − p)
             = (1/4)[p(1 − p) + p(1 − p)]
simplifying
             = (1/4)[2p(1 − p)]
             = p(1 − p)/2
As before, we can use the simulation to check the equations we just derived by exploiting the
relative frequency interpretation of probability. When the experiment is repeated many, many
times, the relative frequency of each outcome equals its probability. After many, many repetitions
the distribution of the numerical values from all the repetitions mirrors the probability
distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
To check the equations, we will specify a sample size of 2 and select an actual population
fraction of 0.50. Using the equations we derived, the mean of the estimated fraction’s probability
distribution should be 0.50 and the variance should be 0.125:
Mean[EstFrac] = p = 0.50
Var[EstFrac] = p(1 − p)/2 = [(1/2)(1 − 1/2)]/2 = [(1/2) × (1/2)]/2 = (1/4)/2 = 1/8 = 0.125
Be certain the simulation’s Pause checkbox is cleared. Click Start and then, after many, many
repetitions, click Stop.
The simulation results (table 2.3) suggest that our equations are correct. After many, many
repetitions the mean (average) of the numerical values equals the mean of the probability dis-
tribution, 0.50. Similarly, the variance of the numerical values equals the variance of the prob-
ability distribution, 0.125.
Table 2.3
Opinion Poll simulation results—sample size of two
Actual population fraction = ActFrac = p = 1/2 = 0.50

Sample size: 2
Equations: Mean of EstFrac’s probability distribution = 1/2 = 0.50; Variance = 1/8 = 0.125
Simulation (>1,000,000 repetitions): Mean of the numerical values of EstFrac ≈ 0.50; Variance ≈ 0.125
2.6 Mean, Variance, and Covariance: Data Variables and Random Variables
In this chapter we extended the notions of mean, variance, and covariance that we introduced
in chapter 1 from data variables to random variables. The mean and variance describe the dis-
tribution of a single variable. The mean depicts the center of a variable’s distribution; the variance
depicts the distribution spread. In the case of a data variable, the distribution is illustrated by a
histogram; consequently the mean and variance describe the center and spread of the data vari-
able’s histogram. In the case of a random variable, mean and variance describe the center and
spread of the random variable’s probability distribution.
Covariance quantifies the notion of how two variables are related. When two data variables
are uncorrelated, they are independent; the value of one variable does not help us predict the
value of the other. In the case of independent random variables, the value of one variable does
not affect the probability distribution of the other variable and their covariance equals 0.
Chapter 2 Exercises
2♠ 2♥ 2♦ 2♣ 3♠
3♥ 3♦ 4♠ 4♥ 5♦
Using these equations, calculate the mean (expected value) and variance of the random variable
v’s probability distribution.
Use the relative frequency interpretation of probability to check your answers to parts b and
c by using the appropriate simulation in our econometrics lab.
d. After the experiment is repeated many, many times, does the distribution of numerical
values from the experiments mirror the random variable v’s probability distribution?
e. After the experiment is repeated many, many times, how are the mean and variance of the
random variable v’s probability distribution related to the mean and variance of the numerical
values?
2. Consider the following experiment. Using the same deck from question 1,
• Thoroughly shuffle the deck of 10 cards.
• Draw one card and record its value.
• Replace the card drawn.
• Thoroughly shuffle the deck of 10 cards.
• Draw a second card and record its value.
Let
(Note: This experiment differs from the earlier one in that the first card drawn is not replaced.)
As before, let
Next suppose that instead of 10 cards, the deck contains 10,000 cards: 4,000 2’s, 3,000 3’s, 2,000
4’s, and 1,000 5’s. Consider this new experiment:
• Thoroughly shuffle the deck of 10,000 cards.
• Draw one card and record its value.
• Do not replace the card drawn.
• Thoroughly shuffle the remaining 9,999 cards.
• Draw a second card and record its value.
Compare the consequences of not replacing the first card drawn with the deck of 10 cards versus
the deck of 10,000.
d. Compare your answers to parts a and c. As the population size increases (i.e., as the
number of cards in the deck increases), is the probability of drawing a 2 on the second draw
affected more or less by whether or not a 2 is drawn on the first draw?
e. Suppose that a student assumes v1 and v2 to be independent even though the first card is
not replaced. As the population size increases, would this assumption become a better or
worse approximation of reality?
f. In view of the fact that there are more than 100 million American voters, should a profes-
sional pollster of American sentiments worry about “the replacement issue?”
4. A European roulette wheel has 37 slots around its perimeter (see figure 2.12). The slots are
numbered 1 through 36 and 0. You can place a bet on the roulette board.
Figure 2.12
Game of roulette (© Can Stock Photo Inc./RaStudio and © Can Stock Photo Inc./oorka)
Many different types of bets can be made. You can bet on a single number, a row of numbers,
a column of numbers, a set of twelve numbers, all red numbers, all black numbers, all even
numbers, or all odd numbers. (Note: The rows, columns, twelves, reds, blacks, evens, and odds
do not include 0.) Once all bets are placed, the roulette wheel is spun and a ball is dropped into
the spinning wheel. Initially, the ball bounces wildly around the wheel, but eventually, it settles
into one of the 37 slots. If this is a slot that you bet on, you win; the amount of winnings depends
on the type of bet you made:
If the ball does not settle into a slot you bet on, you lose your bet. Suppose that you always
place a $1 bet. Let
v = Your net winnings = Your gross winnings − $1
Roulette wheels are precisely balanced so that the ball is equally likely to land in each of the
37 slots.
a. Suppose that you place a $1 bet on the first set of twelve numbers.
i. If the ball ends up in one of the first 12 slots, what will v equal? What is the probability
of this scenario?
ii. If the ball does not end up in one of the first 12 slots, what will v equal? What is the
probability of this scenario?
iii. In this scenario, what are the mean (expected value) and variance of v?
b. Suppose that you place a $1 bet on red.
i. If the ball ends up in one of the 18 red slots, what will v equal? What is the probability
of this scenario?
ii. If the ball does not end up in one of the 18 red slots, what will v equal? What is the
probability of this scenario?
iii. In this scenario, what are the mean (expected value) and variance of v?
c. Compare the two bets. How are they similar? How are they different?
Figure 2.13
Assignment of archery points (the 80 cm target is divided into ten concentric rings whose radii increase in 4 cm increments, scoring 10 points at the center down to 1 point at the outer ring)
5. The International Archery Federation establishes the rules for archery competitions. The
Federation permits the distance between the competitor and the target as well as the size of the
target to vary from competition to competition. Distance varies from 18 to 90 meters; the size
of the target varies from 40 to 122 centimeters in diameter. Say a friend, Archie, is participating
in a 60-meter contest. At a distance of 60 meters, the Federation specifies a target 80 centimeters
in diameter. At this distance Archie, an excellent archer, always shoots his arrows within 20
centimeters of the target’s center. Figure 2.13 describes how points are assigned:
• 10 points if the arrow strikes within 4 centimeters of the target’s center.
• 9 points if the arrow strikes between 4 and 8 centimeters of the target’s center.
• 8 points if the arrow strikes between 8 and 12 centimeters of the target’s center.
• 7 points if the arrow strikes between 12 and 16 centimeters of the target’s center.
• 6 points if the arrow strikes between 16 and 20 centimeters of the target’s center.
Figure 2.14
Probability distribution of v, the distance of Archie’s arrow from the target center (the density is plotted against v from 0 to 40 centimeters, with heights ranging up to 0.10)
• 5 points if the arrow strikes between 20 and 24 centimeters of the target’s center.
• 4 points if the arrow strikes between 24 and 28 centimeters of the target’s center.
• 3 points if the arrow strikes between 28 and 32 centimeters of the target’s center.
• 2 points if the arrow strikes between 32 and 36 centimeters of the target’s center.
• 1 point if the arrow strikes between 36 and 40 centimeters of the target’s center.
Figure 2.14 describes the probability distribution of v, the distance of Archie’s arrow from the
target’s center.
a. Explain why the area beneath Archie’s probability distribution must equal 1.
b. What is the probability that Archie will score at least 6 points?
c. What is the probability that Archie will score 10 points?
d. What is the probability that Archie will score 9 points?
e. What is the probability that Archie will score 7 or 8 points?
6. Recall our friend Dan Duffer who consistently hits 200 yard drives from the tee. Also recall
that Dan is not consistent “laterally”; his drive can land as far as 40 yards to the left or right of
his target point. Here v is the distance from Dan’s target point (see figure 2.15).
Dan has had a tough day on the course and has lost many golf balls. He only has one ball left
and wants to finish the round. Accordingly he wants to reduce the chances of driving his last
ball into the lake. So, instead of choosing a target point at the middle of the fairway, as indicated
in the figure to the right, he contemplates aiming his drive 8 yards to the left of the fairway
midpoint.
Figure 2.15
Dan Duffer’s eighteenth hole (the same hole as figure 2.5, with a 32-yard-wide fairway 200 yards from the tee and a lake to the right, showing both the fairway midpoint and the contemplated target point 8 yards to its left; below it, the triangular probability distribution of v from −40 to +40)
a. Revise and realign the figure to the right to reflect the new target point that Dan is
contemplating.
Based on this new target point:
b. What is the probability that his drive will land in the lake? ______
c. What is the probability that his drive will land in the left rough? ______
d. What is the probability that his drive will land in the fairway? ______
7. Joe passes through one traffic light on his daily commute to work. The traffic department has
set up the traffic light on a one minute cycle:
Red 30 seconds
Yellow 5 seconds
Green 25 seconds
Joe, a safe driver, decides whether or not to brake for the traffic light when he is 10 yards from
the light. If the light is red or yellow, he brakes; otherwise, he continues on.
a. When Joe makes his brake/continue decision next Monday, what is the probability that
the light will be
i. Red? _____
ii. Yellow? _____
iii. Green? _____
b. What is the probability that his brake/continue decision next Monday will be
i. Brake? _____
ii. Continue? _____
c. What is the probability that Joe will not stop at the light for the next five workdays during
his daily commute to work?
8. To avoid studying, you and your roommate decide to play the following game:
• Thoroughly shuffle your roommate’s standard deck of fifty-two cards: 13 spades, 13 hearts,
13 diamonds, and 13 clubs.
• Draw one card.
• If the card drawn is red, you win $1 from your roommate; if the card drawn is black, you lose
$1.
• Replace the card drawn.
After you play the game once, you both decide to play it again. Let us modify the notation to
reflect this:
TNW = v1 + v2 + . . . + v18
Chapter 3 Outline
3.1 Review
3.1.1 Random Variables
3.1.2 Relative Frequency Interpretation of Probability
3.2 Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
3.2.1 Measure of the Probability Distribution Center: Mean of the Random Variable
3.2.2 Measure of the Probability Distribution Spread: Variance of the Random Variable
3.2.3 Why Is the Mean of the Estimate’s Probability Distribution Important? Biased and
Unbiased Estimation Procedures
3.2.4 Why Is the Variance of the Estimate’s Probability Distribution Important? Reliability
of Unbiased Estimation Procedures
Mean[(1/T)(v1 + v2 + . . . + vT)] = p
whenever Mean[vt] = p for each t; that is, Mean[v1] = Mean[v2] = . . . = Mean[vT] = p.
Var[(1/T)(v1 + v2 + . . . + vT)] = p(1 − p)/T
whenever
• Var[vt] = p(1 − p) for each t; that is, Var[v1] = Var[v2] = . . . = Var[vT] = p(1 − p)
and
• the vt’s are independent; that is, all the covariances equal 0.
6. Would you have more confidence in a poll that queries a small number of individuals or a
poll that queries a large number?
3.1 Review
Remember, random variables bring both bad and good news. Before the experiment is
conducted:
Bad news: What we do not know: on the one hand, we cannot determine the numerical value
of the random variable with certainty.
Good news: What we do know: on the other hand, we can often calculate the random variable’s
probability distribution telling us how likely it is for the random variable to equal each of its
possible numerical values.
After many, many repetitions of the experiment the distribution of the numerical values from
the experiments mirrors the random variable’s probability distribution.
3.2 Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
Polling procedures use information gathered from a sample of the population to draw inferences
about the entire population. In the previous chapter we considered two unrealistic samples sizes,
a sample size of 1 and a sample size of 2. Common sense suggests that such small samples
would not be helpful in drawing inferences about an entire population. We considered these
unrealistic sample sizes to lay the groundwork for realistic ones. We are now prepared to analyze
the general case in which the sample size equals T. Let us return to our friend Clint who is
running for president of his student body. Consider the following experiment:
Write the names of every individual in the population on a card. Perform the following procedure
T times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint; the individual’s answer determines the numerical
value of vt: vt equals 1 if the tth individual polled supports Clint; 0 otherwise.
• Replace the card.
where T = sample size. The estimated fraction of the population supporting Clint,
EstFrac = (v1 + v2 + . . . + vT)/T
is a random variable. We cannot determine the numerical value of the estimated fraction, EstFrac,
with certainty before the experiment is conducted.
3.2.1 Measure of the Probability Distribution Center: Mean of the Random Variable
First, consider the mean. Apply the arithmetic of means and what we know about the vt’s:
Mean[EstFrac] = Mean[(1/T)(v1 + v2 + . . . + vT)]
since Mean[cx] = c Mean[x]
              = (1/T) Mean[v1 + v2 + . . . + vT]
since Mean[x + y] = Mean[x] + Mean[y]
              = (1/T)(Mean[v1] + Mean[v2] + . . . + Mean[vT])
since Mean[vt] = p for each t
              = (1/T)(p + p + . . . + p) = (1/T)(Tp)
Simplifying obtains
              = p
3.2.2 Measure of the Probability Distribution Spread: Variance of the Random Variable
Next, focus on the variance. Apply the arithmetic of variances and what we know about the vt’s:
Var[EstFrac] = Var[(1/T)(v1 + v2 + . . . + vT)]
since Var[cx] = c²Var[x]
             = (1/T²) Var[v1 + v2 + . . . + vT]
since Var[x + y] = Var[x] + Var[y] when x and y are independent; hence the covariances are
all 0.
             = (1/T²)(Var[v1] + Var[v2] + . . . + Var[vT])
since Var[vt] = p(1 − p) for each t
             = (1/T²)[p(1 − p) + p(1 − p) + . . . + p(1 − p)] = (1/T²)[Tp(1 − p)]
Simplifying obtains
             = p(1 − p)/T
To summarize:
Mean[EstFrac] = p,   Var[EstFrac] = p(1 − p)/T
where p = ActFrac = actual fraction of the population supporting Clint and T = sample size.
Once again, we will exploit the relative frequency interpretation of probability to check the
equations for the mean and variance of the estimated fraction’s probability distribution:
Distribution of the numerical values  →  Probability distribution  (after many, many repetitions)
We just derived the mean and variance of the estimated fraction’s probability distribution:
Mean[EstFrac] = p,   Var[EstFrac] = p(1 − p)/T
where p = ActFrac and T = sample size. Consequently after many, many repetitions the mean
of these numerical values should equal approximately p, the actual fraction of the population
that supports Clint, and the variance should equal approximately p(1 − p)/T.
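The claim is easy to check outside the lab as well. The short Python sketch below (ours, not the lab's code) repeats the poll many times for each of several sample sizes and compares the mean and variance of the resulting estimated fractions with p and p(1 − p)/T:

    # Simulate many polls of size T from a population in which a fraction p supports Clint,
    # then compare the estimates' mean and variance with p and p(1-p)/T.
    import random
    import statistics

    def simulate_polls(p=0.5, T=16, repetitions=50_000):
        # Each repetition: poll T randomly chosen voters and record the estimated fraction.
        estimates = [sum(random.random() < p for _ in range(T)) / T
                     for _ in range(repetitions)]
        return statistics.mean(estimates), statistics.pvariance(estimates)

    for T in (1, 2, 25, 100):
        mean, var = simulate_polls(T=T)
        print(T, round(mean, 3), round(var, 5), "theory:", 0.5, 0.25 / T)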
In the simulation we begin by specifying the fraction of the population supporting Clint. We
could choose any actual population fraction; for purposes of illustration we choose 1/2 here.
Actual fraction of the population supporting Clint = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50,   Var[EstFrac] = (1/2)(1 − 1/2)/T = (1/2 × 1/2)/T = (1/4)/T = 1/(4T)
We will now use our simulation to consider different sample sizes, different T’s.
Our simulation results appearing in table 3.1 are consistent with the equations we derived. After
many, many repetitions the means of the numerical values equal the means of the estimated
fraction’s probability distribution. The same is true for the variances.
Public opinion polls use procedures very similar to that described in our experiment. A specific
number of people are asked who or what they support and then the results are reported. We can
think of a poll as one repetition of our experiment. Pollsters use the numerical value of the
estimated fraction from one repetition of the experiment to estimate the actual fraction. But how
reliable is such an estimate? We will now show that the reliability of an estimate depends on the
mean and variance of the estimate’s probability distribution.
Table 3.1
Opinion Poll simulation results with selected sample sizes
(Equations: mean and variance of EstFrac’s probability distribution. Simulation: mean and variance of the numerical values of EstFrac from >1,000,000 repetitions.)

Sample size    Equations: Mean    Equations: Variance      Repetitions     Simulation: Mean    Simulation: Variance
1              1/2 = 0.50         1/4 = 0.25               >1,000,000      ≈0.50               ≈0.25
2              1/2 = 0.50         1/8 = 0.125              >1,000,000      ≈0.50               ≈0.125
25             1/2 = 0.50         1/100 = 0.01             >1,000,000      ≈0.50               ≈0.01
100            1/2 = 0.50         1/400 = 0.0025           >1,000,000      ≈0.50               ≈0.0025
400            1/2 = 0.50         1/1,600 = 0.000625       >1,000,000      ≈0.50               ≈0.000625
3.2.3 Why Is the Mean of the Estimate’s Probability Distribution Important? Biased and
Unbiased Estimation Procedures
Recall Clint’s poll in which 12 of the 16 individuals queried supported him. The estimated frac-
tion, EstFrac, equaled 0.75:
12
EstFrac = = 0.75
16
In chapter 2 we used our Opinion Poll simulation to show that this poll result did not prove with
certainty that the actual fraction of the population supporting Clint exceeded 0.50. In general,
we observed that while it is possible for the estimated fraction to equal the actual population
fraction, it is more likely for the estimated fraction to be greater than or less than the actual
fraction. In other words, we cannot expect the estimated fraction from a single poll to equal the
actual population fraction.
What then can we conclude? We know that the estimated fraction is a random variable. While
we cannot determine its numerical value with certainty before the experiment is conducted, we
can describe its probability distribution. A random variable’s mean describes the center of its
probability distribution. Using a little algebra, we showed that the mean of the estimated frac-
tion’s probability distribution equals the actual fraction of the population supporting Clint.
Whenever the mean of an estimate’s probability distribution equals the actual value, the estima-
tion procedure is unbiased as illustrated in figure 3.1:
Figure 3.1
Probability distribution of EstFrac values (the distribution is centered at Mean[EstFrac] = ActFrac)
When the estimation procedure is unbiased, the average of the numerical values of the estimated
fractions equals the actual population fraction after many, many repetitions. Table 3.1 reports
that this is true.
We can obtain even more intuition about unbiased estimation procedures when the probability
distribution of the estimate is symmetric. In this case the chances that the estimated fraction will
be less than the actual population fraction in one repetition equal the chances that the estimated
fraction will be greater than the actual fraction. We will use a simulation to illustrate this.
Figure 3.2 illustrates the defaults. An actual population fraction of 0.50 and a sample size of
100 are specified. Two new lists appear in the lower left of the window: a From list and a To
list (see figure 3.2).

Figure 3.2
Opinion Poll simulation (in addition to the usual controls, the window reports the mean and variance of the numerical values of the estimated fractions from all repetitions, the From and To lists, and the From–To percent)

By default, a From value of .000 and a To value of .500 are selected. The From–To Percent
line reports the percentage of repetitions in which the estimated fraction lies between the From
value, .000, and the To value, .500.
Check to be certain that the simulation is calculating the From–To Percent correctly by click-
ing Start and then Continue a few times. Then clear the Pause box and click Continue. After
many, many repetitions click Stop. The From–To Percent equals approximately 50 percent. The
estimates in approximately 50 percent of the repetitions are less than 0.5, the actual value;
consequently the estimates in the remaining approximately 50 percent of the repetitions are greater than 0.5. The
chances that the estimated fraction will be less than the actual population fraction in one repeti-
tion equal the chances that the estimated fraction will be greater than the actual fraction.
To summarize, there are two important points to make about Clint’s poll:
Bad news: We cannot expect the estimated fraction from Clint’s poll, 0.75, to equal the actual
population fraction.
Good news: The estimation procedure that Clint used is unbiased. The mean of the estimated
fraction’s probability distribution equals the actual population fraction:
The estimation procedure does not systematically underestimate or overestimate the actual
population fraction. If the probability distribution is symmetric, the chances that the estimated
fraction will be less than the actual population fraction equal the chances that the estimated
fraction will be greater.
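The unbiasedness claim can also be checked outside the textbook's Opinion Poll simulation. The short Python sketch below is only a stand-in for that simulator: it assumes each respondent is an independent draw who supports Clint with probability ActFrac = 0.50 and that 16 individuals are polled in each repetition, then reports the mean and variance of the estimated fractions across many repetitions.

import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only

act_frac = 0.50       # actual population fraction supporting Clint
sample_size = 16      # individuals polled in each repetition
repetitions = 1_000_000

# Each repetition: poll sample_size individuals with replacement and record EstFrac.
supporters = rng.binomial(sample_size, act_frac, size=repetitions)
est_fracs = supporters / sample_size

print(est_fracs.mean())   # approximately 0.50: the average of the estimates equals ActFrac
print(est_fracs.var())    # approximately p(1 - p)/T = 0.25/16 = 0.015625

The first number illustrates the unbiasedness property; the second matches the variance formula p(1 − p)/T used throughout the chapter.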
3.2.4 Why Is the Variance of the Estimate’s Probability Distribution Important? Reliability of
Unbiased Estimation Procedures
We will use the polling simulation to illustrate the importance of the probability distribution’s
variance.
In addition to specifying the actual population fraction and the sample size, the simulation
includes the From–To lists.
As before, the actual population fraction equals 0.50 by default. Select 0.450 from the From list
and 0.550 from the To list. The simulation will now calculate the percent of the repetitions in
which the numerical value of the estimated fraction lies within the 0.450 to 0.550 interval. Since
we have specified the actual population fraction to be 0.50, the simulation will report the percent
of repetitions in which the numerical value of the estimated fraction lies within 0.05 of the actual
fraction. Initially a sample size of 25 is selected. Note that the Pause checkbox is cleared. Click
Start and then after many, many repetitions click Stop. Next consider sample sizes of 100 and
400. Table 3.2 reports the results for the three sample sizes:

Table 3.2
Opinion Poll From–To simulation results

Sample size            25       100      400
From–To percent        ≈39%     ≈69%     ≈95%

Figure 3.3
Histograms of estimated fraction numerical values
When the sample size is 25, the numerical value of the estimated fraction falls within 0.05 of
the actual population fraction in about 39 percent of the repetitions. When the sample size is
100, the numerical value of the estimated fraction falls within 0.05 of the actual population
fraction in about 69 percent of the repetitions. When the sample size is 400, the numerical value
of the estimated fraction falls within 0.05 of the actual population fraction in about 95 percent
of the repetitions (see figure 3.3).
The variance plays the key role here. On the one hand, when the variance is large, the distri-
bution is “spread out”; the numerical value of the estimated fraction falls within 0.05 of the
actual fraction relatively infrequently. On the other hand, when the variance is small, the distri-
bution is tightly “cropped” around the actual population fraction, 0.50; consequently the numeri-
cal value of the estimated fraction falls within 0.05 of the actual population fraction more
frequently.
We can now exploit the relative frequency interpretation of probability to obtain a quantitative
sense of how much confidence we should have in the results of a single opinion poll. We do so
by considering the following interval estimate question:
Interval estimate question: What is the probability that the numerical value of the estimated
fraction, EstFrac, from one repetition of the experiment lies within ___ of the actual population
fraction, ActFrac? ______
Since we are focusing on the interval from 0.450 to 0.550 and the actual population fraction is
specified as 0.50, we can enter 0.05 in the first blank:
Interval estimate question: What is the probability that the numerical value of the estimated
fraction, EstFrac, from one repetition of the experiment lies within 0.05 of the actual population
fraction, ActFrac? ______
Begin by focusing on a sample size of 25. In view of what we just learned from the simula-
tion, we can now answer the interval estimate question. After many, many repetitions of the
experiment, the numerical value of the estimated fraction falls within 0.05 of the actual value
about 39 percent of the time. Now apply the relative frequency interpretation of probability.
When the experiment is repeated many, many times, the relative frequency of each outcome
equals its probability. Consequently, when the sample size is 25, the probability that the numeri-
cal value of the estimated fraction in one repetition of the experiment falls within 0.05 of the
actual value is about 0.39. By the same logic, when the sample size is 100, the probability that
the numerical value of the estimated fraction in one repetition of the experiment will fall within
0.05 of the actual value is about 0.69. When the sample size is 400, the probability that the
numerical value of the estimated fraction in one repetition of the experiment will fall within 0.05
of the actual value is about 0.95 (see table 3.3 and figure 3.4).
As the sample size becomes larger, it becomes more likely that the estimated fraction resulting
from a single poll will be close to the actual population fraction. This is consistent with our
intuition, is it not? When more people are polled, we have more confidence that the estimated
fraction will be close to the actual value.
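The interval estimate question can also be answered by relative frequency in a few lines of Python. This is only a rough stand-in for the textbook's simulator: it models each poll as a binomial draw with ActFrac = 0.50, so, because the estimated fraction takes only a discrete grid of values, the exact percentages need not match table 3.3; the pattern, however, is the same, with the relative frequency of landing within 0.05 of the actual fraction rising toward 1 as the sample size grows.

import numpy as np

rng = np.random.default_rng(1)
act_frac = 0.50
repetitions = 1_000_000

for sample_size in (25, 100, 400):
    est_fracs = rng.binomial(sample_size, act_frac, size=repetitions) / sample_size
    # Relative frequency of repetitions with EstFrac within 0.05 of ActFrac
    within = np.mean(np.abs(est_fracs - act_frac) <= 0.05)
    print(sample_size, round(float(within), 2))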
We can now generalize what we just learned (shown in figure 3.5). When an estimation pro-
cedure is unbiased, the variance of the estimate’s probability distribution is important because
it determines the likelihood that the estimate will be close to the actual value. When the probabil-
ity distribution’s variance is large, it is unlikely that the estimated fraction from one poll will be
close to the actual population fraction; consequently the estimated fraction is an unreliable
estimate of the actual population fraction. However, when the probability distribution’s variance
is small, it is likely that the estimated fraction from one poll will be close to the actual population
fraction; in this case the estimated fraction is a reliable estimate of the actual population fraction.

Table 3.3
Interval estimate question and Opinion Poll simulation results

Sample size                                                        25        100       400
Probability that EstFrac lies within 0.05 of ActFrac              ≈0.39     ≈0.69     ≈0.95

Figure 3.4
Probability distribution of estimated fraction values

Figure 3.5
Probability distribution of estimated fraction values
You might have noticed that the distributions of the numerical values produced by the simulation
for the samples of 25, 100, and 400 look like bell-shaped curves. Although we do not provide
a proof, it can be shown that as the sample size increases, the distribution gradually approaches
what mathematicians and statisticians call the normal distribution. Formally, this result is known
as the Central Limit Theorem.
We will illustrate the Central Limit Theorem by using our Opinion Poll simulation. Again, let
the actual population fraction equal 0.50 and consider three different sample sizes: 25, 100, and
400. In each case we will use our simulation to calculate interval estimates for 1, 2, and 3 stan-
dard deviations around the mean.
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/25 = (1/4)/25 = 1/100
SD[EstFrac] = √Var[EstFrac] = √(1/100) = 1/10 = 0.10
When the sample size equals 25, the standard deviation is 0.10. Since the distribution mean
equals the actual population fraction, 0.50, 1 standard deviation around the mean would be from
0.400 to 0.600, 2 standard deviations from 0.300 to 0.700, and 3 standard deviations from 0.200
to 0.800. In each case specify the appropriate From–To values and be certain that the Pause
checkbox is cleared. Click Start, and then after many, many repetitions click Stop. The simula-
tion results are reported in table 3.4.
Table 3.4
Interval percentages for a sample size of 25

Interval: standard deviations
within the random variable's mean       From       To       Simulation: percent of repetitions within interval
Table 3.5
Interval percentages for a sample size of 100

Interval: standard deviations
within the random variable's mean       From       To       Simulation: percent of repetitions within interval
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/100 = (1/4)/100 = 1/400
SD[EstFrac] = √Var[EstFrac] = √(1/400) = 1/20 = 0.05
When the sample size equals 100, the standard deviation is 0.05. Since the distribution mean
equals the actual population fraction, 0.50, 1 standard deviation around the mean would be from
0.450 to 0.550, 2 standard deviations from 0.400 to 0.600, and 3 standard deviations from 0.350
to 0.650.
Table 3.6
Interval percentages for a sample size of 400

Interval: standard deviations
within the distribution mean            From       To       Simulation: percent of repetitions within interval
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/400 = (1/4)/400 = 1/1,600
SD[EstFrac] = √Var[EstFrac] = √(1/1,600) = 1/40 = 0.025
When the sample size equals 400, the standard deviation is 0.025. Since the distribution mean
equals the actual population fraction, 0.5, 1 standard deviation around the mean would be from
0.475 to 0.525; 2 standard deviations from 0.450 to 0.550, and 3 standard deviations from 0.425
to 0.575.
Let us summarize the simulation results in a single table (table 3.7). Clearly, standard devia-
tions play a crucial and consistent role here. Regardless of the sample size, approximately 68
or 69 percent of the repetitions fall within one standard deviation of the mean, approximately
95 or 96 percent within two standard deviations, and more than 99 percent within three. The
normal distribution exploits the key role played by standard deviations.
The normal distribution is a symmetric, bell-shaped curve with the midpoint of the bell occur-
ring at the distribution mean (figure 3.6). The total area lying beneath the curve is 1.0. As
mentioned before, it can be proved rigorously that as the sample size increases, the probability
distribution of the estimated fraction approaches the normal distribution. This fact allows us to
use the normal distribution to estimate probabilities for interval estimates.
Nearly every econometrics and statistics textbook includes a table that describes the normal
distribution. We will now learn how to use the table to estimate the probability that a random
variable will lie between any two values. The table is based on the “normalized value” of the
random variable. By convention, the normalized value is denoted by the letter z (figure 3.7):
z = (Value of the random variable − Distribution mean) / Distribution standard deviation

Table 3.7
Summary of interval percentages results, by sample size

Figure 3.6
Normal distribution

Figure 3.7
Normal distribution right-tail probabilities. The right-tail probability is the probability of being more than z standard deviations above the distribution mean.
In words, z tells us by how many standard deviations the value lies from the mean. If the value
of the random variable equals the mean, z equals 0.0; if the value is one standard deviation above
the mean, z equals 1.0; if the value is two standard deviations above the mean, z equals 2.0; and
so on.
The equation that describes the normal distribution is complicated (see appendix 3.1). Fortu-
nately, we can avoid using the equation because tables are available that describe the distribution.
The entire normal distribution table appears in appendix 3.1; an abbreviated portion appears in
table 3.8.
In the normal distribution table, the row specifies the z value's whole number and its tenths digit;
the column specifies its hundredths digit. The numbers within the body of the table estimate the
probability that the random variable lies more than z standard deviations above its mean.
Table 3.8
Right-tail probabilities for the normal distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
...
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
...
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
...
For purposes of illustration, suppose that we want to use the normal distribution to calculate the
probability that the estimated fraction from one repetition of the experiment would fall between
0.525 and 0.575 when the actual population fraction was 0.50 and the sample size was 100
(figure 3.8). We begin by calculating the probability distribution’s mean and standard
deviation:
Sample size = T = 100, Actual population fraction = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/100 = (1/4)/100 = 1/400
SD[EstFrac] = √Var[EstFrac] = √(1/400) = 1/20 = 0.05
To calculate the probability that the estimated fraction lies between 0.525 and 0.575, we first
calculate the z-values for 0.525 and 0.575; that is, we calculate the number of standard deviations
Figure 3.8
Interval estimate from 0.525 to 0.575: Prob[EstFrac between 0.525 and 0.575]
that 0.525 and 0.575 lie from the mean. Since the mean equals 0.500 and the standard deviation
equals 0.05,
• z-value for 0.525 equals 0.50:

z = (0.525 − 0.500)/0.05 = 0.025/0.05 = 0.50

0.525 lies one-half of a standard deviation above the mean.

• z-value for 0.575 equals 1.50:

z = (0.575 − 0.500)/0.05 = 0.075/0.05 = 1.50

0.575 lies one and a half standard deviations above the mean.
Next consider the right-tail probabilities for the normal distribution in table 3.9. When we use
this table we implicitly assume that the normal distribution accurately describes the estimated
fraction’s probability distribution. For the moment, assume that this is true. The entry corre-
sponding to z equaling 0.50 is 0.3085; this tells us that the probability that the estimated fraction
lies above 0.525 is 0.3085 (figure 3.9a).
Table 3.9
Selected right-tail probabilities for the normal distribution

z        0.00       0.01
0.5      0.3085     0.3050
1.5      0.0668     0.0655
The entry corresponding to z equaling 1.50 is 0.0668; this tells us that the probability that the
estimated fraction lies above 0.575 is 0.0668 (figure 3.9b).
It is now easy to calculate the probability that the estimated fraction will lie between 0.525
and 0.575: just subtract the probability that the estimated fraction will be greater than 0.575 from
the probability that the estimated fraction will be greater than 0.525:

Prob[EstFrac between 0.525 and 0.575] = 0.3085 − 0.0668 = 0.2417
With a sample size of 100, the probability that EstFrac will lie between 0.525 and 0.575 equals
0.2417. This, of course, assumes that the normal distribution describes EstFrac’s probability
distribution accurately.
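For readers who prefer to check the arithmetic in software, the short scipy calculation below reproduces the same answer; scipy is an assumption here, since the text itself works only from the printed right-tail table.

from scipy.stats import norm

mean, sd = 0.50, 0.05

# Right-tail probability above 0.525 minus right-tail probability above 0.575
prob = norm.sf(0.525, loc=mean, scale=sd) - norm.sf(0.575, loc=mean, scale=sd)
print(round(prob, 4))   # 0.2417, the same value obtained from the table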
To justify using the normal distribution to calculate the probabilities, reconsider our simulations
in which we calculated the percentages of repetitions that fall within one, two, and three standard
deviations of the mean after many, many repetitions. Now use the normal distribution to calculate
these percentages.
We can now calculate the probability of being within one, two, and three standard deviations
of the mean by reviewing two important properties of the normal distribution:
• The normal distribution is symmetric about its mean.
• The area beneath the normal distribution equals 1.0.
Figure 3.9a
Probability of EstFrac greater than 0.525: 0.3085

Figure 3.9b
Probability of EstFrac greater than 0.575: 0.0668
Table 3.10
Right-tail probabilities for the normal distribution

z         Right-tail probability
1.00      0.1587
2.00      0.0228
3.00      0.0013

Figure 3.10
Normal distribution calculations: Prob[within 1 SD] = 0.6826
We begin with the one standard deviation (SD) case. Table 3.10 reports that the right-hand tail
probability for z = 1.00 equals 0.1587:
Prob[more than 1 SD above the mean] = 0.1587

We will now use that to calculate the probability of being within one standard deviation of the
mean, as illustrated in figure 3.10:

• Since the normal distribution is symmetric, the probability of being more than one standard
deviation above the mean equals the probability of being more than one standard deviation below
the mean:

Prob[more than 1 SD below the mean] = Prob[more than 1 SD above the mean] = 0.1587

• Since the area beneath the normal distribution equals 1.0, the probability of being within one
standard deviation of the mean equals 1.0 less the sum of the probability of being more than
one standard deviation above the mean and the probability of being more than one standard
deviation below the mean:

Prob[within 1 SD] = 1.0 − (Prob[1 SD below] + Prob[1 SD above]) = 1.0 − (0.1587 + 0.1587) = 1.0 − 0.3174 = 0.6826
• Two standard deviations. As table 3.10 reports, the right-hand tail probability for z = 2.00
equals 0.0228; by symmetry, so does the left-hand tail probability:

Prob[within 2 SDs] = 1.0 − (Prob[2 SDs below] + Prob[2 SDs above]) = 1.0 − (0.0228 + 0.0228) = 1.0 − 0.0456 = 0.9544
• Three standard deviations. As table 3.10 reports, the right-hand tail probability for z = 3.00
equals 0.0013; by symmetry, so does the left-hand tail probability:

Prob[within 3 SDs] = 1.0 − (Prob[3 SDs below] + Prob[3 SDs above]) = 1.0 − (0.0013 + 0.0013) = 1.0 − 0.0026 = 0.9974
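These three calculations can also be verified directly; the sketch below uses scipy (again an assumption, not something the text relies on) and reproduces the figures above up to the rounding in the printed table.

from scipy.stats import norm

for k in (1, 2, 3):
    # Probability of being within k standard deviations of the mean:
    # 1 minus the two equal tail probabilities.
    within = 1.0 - 2.0 * norm.sf(k)
    print(k, round(float(within), 4))   # 0.6827, 0.9545, 0.9973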
Table 3.11 compares the percentages calculated from our simulations with the percentages that
would be predicted by the normal distribution.
Table 3.11 reveals that the normal distribution percentages are good approximations of the
simulation percentages. Furthermore, as the sample size increases, the percentages of repetitions
within each interval get closer and closer to the normal distribution percentages. This is pre-
cisely what the Central Limit Theorem states. We use the normal distribution to calculate interval
estimates because it provides estimates that are close to the actual values.
Table 3.12 illustrates what are sometimes called the normal distribution’s “rules of thumb.” In
round numbers, the probability of being within one standard deviation of the mean is 0.68, the
Table 3.11
Interval percentages results and normal distribution percentages

Interval: standard deviations       Simulation: percent of repetitions               Normal distribution
within the distribution mean        within interval (sample sizes 25, 100, 400)      percentages
Table 3.12
Normal distribution rules of thumb

Standard deviations within       Probability of being
the distribution mean            within the interval
1                                ≈0.68
2                                ≈0.95
3                                >0.99
probability of being within two standard deviations is 0.95, and the probability of being within
three standard deviations is more than 0.99.
Econometrician’s philosophy: If you lack the information to determine the value directly,
estimate the value to the best of your ability using the information you do have.
More specifically, Clint wrote the name of each student on a 3 × 5 card and repeated the following
procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
After conducting his poll, Clint learns that 12 of the 16 students polled support him. That is, the
estimated fraction of the population supporting Clint is 0.75:
Estimated fraction of the population supporting Clint: EstFrac = 12/16 = 3/4 = 0.75
Based on the results of the poll, it looks like Clint is ahead. But how confident should he be that
this is in fact true? We will address this question in the next chapter.
Chapter 3 Exercises
1. During the 1994 to 1995 academic year, the mean Math and Verbal SAT scores in Ohio were
515 and 460. The standard deviation for both scores was 100. Consider the following two
variables: the sum of a student's two scores, SatSum = SatMath + SatVerbal, and the difference, SatDiff = SatMath − SatVerbal.
d. Assume that the correlation coefficient for SatMath and SatVerbal equals 0.50. What is
the variance of
i. SatSum? _____
ii. SatDiff? _____
e. Assume that the correlation coefficient for SatMath and SatVerbal equals −0.50. What is
the variance of
i. SatSum? _____
ii. SatDiff? _____
f. Using your knowledge of the real world, which of the following do you find most likely?
That is, would you expect the correlation coefficient for SatMath and SatVerbal to be
0.0_____ 1.0_____ between 0.0 and 1.0_____ less than 0.0 _____ Explain.
2. Assume that the correlation coefficient for Math and Verbal SAT scores in Ohio is 0.5.
Suppose that an Ohio student is randomly chosen. What is the probability that his/her
a. SAT sum, SatSum, exceeds 1,000? ______
b. SAT Math and Verbal difference, SatDiff, exceeds 100? ______
Hint: Apply the normal distribution.
3. During the 1994 to 1995 academic year the mean Math SAT score for high school students
in Alaska was 489; in Michigan, the mean was 549. The standard deviation in both states equaled
100. A college admission officer must decide between one student from Alaska and one from
Michigan. Both students have taken the SAT, but the admission office has lost their scores. All
else being equal, the admission officer would like to admit the Alaska student for reasons of
geographic diversity, but he/she is a little concerned that the average math SAT score in Alaska
is lower.
a. Would knowledge of the Alaskan student’s Math SAT score help you predict the Michigan
student’s score, and vice versa?
b. Are the Alaskan student’s and Michigan student’s Math SAT scores independent?
c. The admission officer asks you to calculate the probability that the student from Michigan
has a higher score than the student from Alaska. Assuming that the applicants from each state
mirror that state’s Math SAT distribution, what is this probability?
4. The Wechsler Adult Intelligence Scale is a well-known IQ test. The test results are scaled so
that the mean score is 100 and the standard deviation is 15. There is no systematic difference
between the IQs of men and women. A dating service is matching male and female subscribers
whose IQs mirror the population as a whole.
a. What is the probability that a male subscriber will have an IQ exceeding 110? _____
b. What is the probability that a female subscriber will have an IQ exceeding 110? _____
c. Assume that the dating service does not account for IQ when matching its subscribers;
consequently the IQs of the men and women who are matched are independent. Consider a
couple that has been matched by the dating service. What is the probability that both the male
and female will have an IQ exceeding 110? _____
d. Suppose instead that the dating service does consider IQ; the service tends to match high
IQ men with high IQ women, and vice versa. Qualitatively, how would that affect your answer
to part c? _____
5. Consider the automobiles assembled at a particular auto plant. Even though the cars are the
same model and have the same engine size, they obtain slightly different gas mileages. Presently
the mean is 32 miles per gallon with a standard deviation of 4.
a. What portion of the cars obtains at least 30 miles per gallon? Hint: Apply the normal
distribution.
A rental car company has agreed to purchase several thousand cars from the plant. The contract
demands that at least 90 percent of the autos achieve at least 30 miles per gallon. Engineers
report that there are two ways in which the plant can be modified to achieve this goal:
Approach 1: Increase the mean miles per gallon leaving the standard deviation unaffected; the
cost of increasing the mean is $100,000 for each additional mile.
Approach 2: Decrease the standard deviation leaving the mean unaffected; the cost of decreas-
ing the standard deviation is $200,000 for each mile reduction.
b. If approach 1 is used to achieve the objective, by how much must the mean be increased?
c. If approach 2 is used to achieve the objective, by how much must the standard deviation
be decreased?
d. Assuming that the plant owner wishes to maximize profits, which approach should
be used?
6. Recall the game described in the problems for chapter 2 that you and your roommate played:
• Thoroughly shuffle your roommate's standard deck of fifty-two cards: 13 spades, 13 hearts,
13 diamonds, and 13 clubs.
• Draw one card.
• If the card drawn is red, you win $1 from your roommate; if the card drawn is black, you
lose $1.
• Replace the card drawn.
TNW equals your total net winnings after you played the game eighteen times:
TNW = v1 + v2 + . . . + v18
where vi = your net winnings from the ith repetition of the game. Recall that the mean of TNW’s
probability distribution equals 0 and the variance equals 18. Use the normal distribution to
estimate the probability of the following (one way to set up the calculation is sketched after part d):
a. winning something: TNW greater than 0. ______
b. winning more than $2: TNW greater than 2. ______
c. losing more than $6: TNW less than −6. ______
d. losing more than $12: TNW less than −12. ______
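One way to set up these calculations is sketched below; this is a sketch of the method only, not the unique intended solution. It treats TNW as approximately normal with mean 0 and standard deviation √18, as the hint suggests; TNW is actually a discrete random variable, so the resulting figures are approximations.

from math import sqrt
from scipy.stats import norm

mean, sd = 0.0, sqrt(18.0)

print(norm.sf(0, loc=mean, scale=sd))     # a. Prob[TNW greater than 0]
print(norm.sf(2, loc=mean, scale=sd))     # b. Prob[TNW greater than 2]
print(norm.cdf(-6, loc=mean, scale=sd))   # c. Prob[TNW less than -6]
print(norm.cdf(-12, loc=mean, scale=sd))  # d. Prob[TNW less than -12]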
7. Suppose that you and a friend each decide to play roulette fifty times, each time placing a
$1 bet. Focus on the total net winnings of you and your friend after you played the game fifty
times:
TNW = v1 + v2 + . . . + v50
where vi = Your net winnings from the ith spin of the roulette wheel.
a. You decide to always bet on the first set of twelve numbers.
i. Calculate the mean and variance of TNW’s probability distribution.
Using the normal distribution, estimate the probability that in net, you will
ii. win $10 or more. _____
iii. lose $10 or more. _____
b. Your friend decides to always bet on red.
i. Calculate the mean and variance of TNW’s probability distribution.
Using the normal distribution, estimate the probability that in net, he will
ii. win $10 or more. _____
iii. lose $10 or more. _____
c. A risk averse individual attempts to protect him/herself from losses. Who would be using
a more risk averse strategy, you or your friend?
Focus on the number of individuals polled. Let us do some “back of the envelope” calculations.
For the calculations, consider only two major candidates, Bush and Kerry, and assume that the
election is a tossup; that is,
ActFrac = p = 1/2 = 0.50
b. Compare the numbers in the table to the margins of error. What do you suspect that the
margin of error equals? ____________
Hint: Round off your “table numbers” to the nearest percent.
c. Recall that the polling procedure is unbiased. Using the normal distribution’s rules of
thumb interpret the margin of error.
Appendix 3.1

Figure 3.11
Right-tail probability for the normal distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
Normal distribution probability density function:

f(x) = [1 / (SD[x] √(2π))] e^(−(1/2) ((x − Mean[x]) / SD[x])²)
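A quick numerical check of this formula (the use of scipy is an assumption; the text never needs it): evaluate the density by hand at one point and compare it with a library implementation of the normal pdf.

from math import exp, pi, sqrt
from scipy.stats import norm

mean, sd, x = 0.50, 0.05, 0.525   # values from the worked example earlier in the chapter

# Density evaluated directly from the formula above
by_hand = (1.0 / (sd * sqrt(2.0 * pi))) * exp(-0.5 * ((x - mean) / sd) ** 2)
print(by_hand, norm.pdf(x, loc=mean, scale=sd))   # the two numbers agree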
Estimation Procedures, Estimates, and Hypothesis Testing
4
Chapter 4 Outline
2. After collecting evidence from a crime scene, the police identified a suspect. The suspect
provides the police with a statement claiming innocence. The district attorney is deciding
whether or not to charge the suspect with a crime. The district attorney asks a forensic expert
to examine the evidence and compare it to the suspect’s personal statement. After the expert
completes his/her work, the district attorney poses the following question to the expert:
Question: What is the probability that similar evidence would have arisen IF the suspect were
in fact innocent?
Initially, the forensic expert assesses this probability to be 0.50. A week later, however, more
evidence is uncovered and the expert revises the probability to 0.01. In light of the new evidence,
is it more or less likely that the suspect is telling the truth?
3. The police charge a seventeen-year-old male with a serious crime. History teaches us that no
evidence can ever prove that a defendant is guilty beyond all doubt. In this case, however, the
police do have strong evidence against the young man suggesting that he is guilty, although the
possibility that he is innocent cannot be completely ruled out. You have been impaneled on a
jury to decide this case. The judge instructs you and your fellow jurors to find the young man
guilty if you determine that he committed the crime “beyond a reasonable doubt.”
For each scenario, indicate whether the jury would be correct or incorrect.
b. Consider each scenario in which the jury errs. In each of these cases, what are the conse-
quences (the “costs”) of the error to the young man and/or to society?
4. Suppose that two baseball teams, Team RS and Team Y, have played 185 games against each
other in the last decade. Consider the following statement made by Mac Carver, a self-described
baseball authority:
Carver’s view: “Over the last decade, Team RS and Team Y have been equally strong.”
Now consider two hypothetical scenarios:
We will now return to Clint’s dilemma. The election is tomorrow and Clint must decide whether
or not to hold a pre-election beer tap rally designed to entice more students to vote for him. On
the one hand, if Clint is comfortably ahead, he could save his money by not holding the beer
tap rally. On the other hand, if the election is close, the beer tap rally could prove critical. Ideally
Clint would like to poll each member of the student body, but time does not permit this. Con-
sequently Clint decides to conduct an opinion poll by selecting 16 students at random. Clint
adopts the philosophy of econometricians:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint wrote the name of each student on a 3 × 5 card and repeated the following procedure 16
times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
Twelve of the 16 students polled support Clint. That is, the estimated fraction of the population
supporting him is 0.75:
Estimated fraction of population supporting Clint: EstFrac = 12/16 = 3/4 = 0.75
Based on the results of the poll, it looks like Clint is ahead. But how confident should Clint be
that he is in fact ahead? Clint faces a dilemma:
Clint’s dilemma: Should Clint be confident that he has the election in hand and save his funds
or should he finance the beer tap rally?
Our project is to use the poll to help Clint resolve his dilemma:
Our Opinion Poll simulation taught us that while the numerical value of the estimated fraction
from one poll could equal the actual population fraction, it typically does not. The simulations
showed that in most cases the estimated fraction will be either greater than or less than the actual
population fraction. Accordingly Clint must accept the fact that the actual population fraction
probably does not equal 0.75. So Clint faces a crucial question:
Crucial question: How much confidence should Clint have in his estimate? More to the point,
how confident should Clint be in concluding that he is actually leading?
To address the confidence issue, it is important to distinguish between the general properties of
Clint’s estimation procedure and the one specific application of that procedure, the poll Clint
conducted.
4.1.3 Taking Stock and Our Strategy to Assess the Reliability of Clint’s Poll Results
Let us briefly review what we have done thus far. We have laid the groundwork required to
assess the reliability of Clint’s poll results by focusing on what we know before the poll is
conducted; that is, we have focused on the general properties of the estimation procedure, the
probability distribution of the estimate. In chapter 3 we derived the general equations for the
mean and variance of the estimated fraction’s probability distribution algebraically and then
checked our algebra by exploiting the relative frequency interpretation of probability in our
Opinion Poll simulation:

Mean[EstFrac] = ActFrac = p,  Var[EstFrac] = p(1 − p)/T

where p denotes the actual population fraction and T the sample size.
Let us review the importance of the mean and variance of the estimated fraction’s probability
distribution.
Clint’s estimation procedure is unbiased because the mean of the estimated fraction’s probability
distribution equals the actual fraction of the population supporting Clint (figure 4.1):

Mean[EstFrac] = ActFrac
His estimation procedure does not systematically underestimate or overestimate the actual value.
If the probability distribution is symmetric, the chances that the estimated fraction will be too
high in one poll equal the chances that it will be too low.
We used our Opinion Poll simulation to illustrate the unbiased nature of Clint’s estimation
procedure by exploiting the relative frequency interpretation of probability. After the experiment
is repeated many, many times, the average of the estimates obtained from each repetition of the
experiment equals the actual fraction of the population supporting Clint.

Figure 4.1
Probability distribution of EstFrac, estimated fraction values—Importance of the mean
4.1.5 Importance of the Variance (Spread) of the Estimate’s Probability Distribution for an
Unbiased Estimation Procedure
How confident should Clint be that his estimate is close to the actual population fraction? Since
the estimation procedure is unbiased, the answer to this question depends on the variance of the
estimated fraction’s probability distribution (see figure 4.2). As the variance decreases, the likeli-
hood of the estimate being “close to” the actual value increases; that is, as the variance decreases,
the estimate becomes more reliable.
Figure 4.2
Probability distribution of EstFrac, estimated fraction values—Importance of variance. When the variance is large, there is only a small probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac; the estimate is unreliable. When the variance is small, there is a large probability that EstFrac from one repetition will be close to ActFrac; the estimate is reliable.
Now we will apply what we have learned about the estimate’s probability distribution, the esti-
mation procedure’s general properties, to assess how confident Clint should be in concluding
that he is ahead.
The results, published in the prestigious scientific magazine Nature . . . showed a match between Jefferson
and Eston Hemings, Sally’s last child. The chances of such a match occurring randomly are less than one
in a thousand.
We will motivate the rationale behind hypothesis testing by considering a cynical view.
Cynic’s view: Despite the poll results, the election is actually a toss-up.
Could the cynic be correct? Actually we have already shown that the cynic could be correct
when we introduced our Opinion Poll simulation. Nevertheless, we will do so again for
emphasis.
The Opinion Poll simulation clearly shows that 12 or even more of the 16 students selected
could support Clint in a single poll when the election is a toss-up. Accordingly we cannot simply
dismiss the cynic’s view as nonsense. We must take the cynic seriously. To assess his view, we
pose the following question. It asks how likely it would be to obtain a result like the one that
actually occurred if the cynic is correct.
Question for the cynic: What is the probability that the result from a single poll would be like
the one actually obtained (or even stronger), if the cynic is correct and the election is a
toss-up?
More specifically,
Question for the cynic: What is the probability that the estimated fraction supporting Clint
would equal 0.75 or more in one poll of 16 individuals, if the cynic is correct (i.e., if the election
is actually a toss-up and the fraction of the actual population supporting Clint equals 0.50)?
Prob[Results IF cynic correct] = Probability that the result from a single poll would be like the one actually obtained (or even stronger), IF the cynic is correct (if the election is a toss-up)
When the probability is small, it would be unlikely that the election is a toss-up, and hence we
could be confident that Clint actually leads. When the probability is large, it is likely that the
election is a toss-up even though the poll suggests that Clint leads:
Assessing the Cynic’s View Using the Normal Distribution: Prob[Results IF cynic correct]
How can we answer the question for the cynic? That is, how can we calculate this probability,
Prob[Results IF cynic correct]? To understand how, recall Clint’s estimation procedure, his poll:
Write the names of every individual in the population on a separate card, then perform the
following procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
• Calculate the fraction of those polled supporting Clint.
If the cynic is correct and the election is a toss-up, the actual fraction of the population support-
ing Clint would equal 1/2 or 0.50. Based on this premise, apply the equations we derived to
calculate the mean and variance of the estimated fraction’s probability distribution:
Sample size = T = 16, Actual population fraction = ActFrac = 1/2 = 0.50
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/16 = (1/4)/16 = 1/64
SD[EstFrac] = √Var[EstFrac] = √(1/64) = 1/8 = 0.125
Since the standard deviation is 0.125, the result of Clint’s poll, 0.75, is two standard deviations
above the mean, 0.50 (figure 4.3).
Next recall the normal distribution’s rules of thumb (as listed in table 4.1).
The rules of thumb tell us that the probability of being within two standard deviations of the
random variable’s mean is approximately 0.95. Recall that the area beneath the normal distribu-
tion equals 1.00. Since the normal distribution is symmetric, the probability of being more than
two standard deviations above the mean is 0.025 as shown in figure 4.3:
Figure 4.3
Probability distribution of EstFrac—Calculating Prob[Results IF cynic correct]. The probability of lying within two standard deviations of the mean, between 0.25 and 0.75, is approximately 0.95; the probability of lying above 0.75 is 0.025.
Table 4.1
Normal distribution rules of thumb

Standard deviations within       Probability of being
the distribution mean            within the interval
1                                ≈0.68
2                                ≈0.95
3                                >0.99
If the cynic is actually correct (if the election is actually a toss-up), the probability that the frac-
tion supporting Clint would equal 0.75 or more in one poll of 16 individuals equals 0.025, that
is, 1 chance in 40. Clint must now make a decision. He must decide whether or not he is willing
to live with the odds of a 1 in 40 chance that the election is actually a toss-up. If he is willing
to do so, he will not fund the beer tap rally; otherwise, he will.
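As a cross-check of the 1-in-40 figure (this is an addition, not the text's method): under the assumption that each respondent is an independent draw who supports Clint with probability 0.50, the exact binomial probability of 12 or more supporters out of 16 can be computed alongside the normal right-tail probability. Because EstFrac is discrete and the normal curve is only an approximation, the two numbers differ somewhat, but both point to the same qualitative conclusion.

from scipy.stats import binom, norm

# Exact binomial: Prob[12 or more of 16 respondents support Clint | ActFrac = 0.50]
exact = binom.sf(11, 16, 0.5)

# Normal approximation: Prob[more than 2 standard deviations above the mean]
approx = norm.sf(2.0)

print(round(float(exact), 4), round(float(approx), 4))   # roughly 0.038 and 0.023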
The following five steps describe how we can formalize hypothesis testing.
Step 1: Collect evidence.

Clint polls 16 students selected randomly; 12 of the 16 support him. The estimated fraction
of the population supporting Clint is 0.75 or 75 percent:
EstFrac = 12/16 = 3/4 = 0.75
Critical result: 75 percent of those polled support Clint. This evidence, the fact that more than
half of those polled support him, suggests that Clint is ahead.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results the election is actually a toss-up; that is, the actual fraction
of the population supporting Clint is 0.50.
The null hypothesis adopts the cynical view by challenging the evidence; the cynic always chal-
lenges the evidence. By convention, the null hypothesis is denoted as H0. The alternative hypoth-
esis is consistent with the evidence; the alternative hypothesis is denoted as H1.
H0: ActFrac = 0.50 ⇒ Election is a toss-up; cynic is correct and the evidence is misleading
H1: ActFrac > 0.50 ⇒ Clint leads; cynic is incorrect and the evidence is correct
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
Question: What is the probability that the result from a single poll would be like the one actually obtained (or even stronger), if H0 were true? This probability is denoted Prob[Results IF H0 true].

The magnitude of this probability determines whether we reject or do not reject the null hypoth-
esis; that is, the magnitude of this probability determines the likelihood that the cynic is correct
and H0 is true:
1. Traditionally this probability is called the p-value. We will use the more descriptive term, however, to emphasize
what it actually represents. Nevertheless, you should be aware that this probability is typically called the p-value.
Step 4: Use the general properties of the estimation procedure, the estimated fraction’s prob-
ability distribution, to calculate Prob[Results IF H0 true].
Prob[Results IF H0 true] equals the probability that 0.75 or more of the 16 individuals polled
would support Clint if H0 is true (if the cynic is correct and the actual population fraction actu-
ally equaled 0.50); more concisely,

Prob[Results IF H0 true] = Prob[EstFrac ≥ 0.75 IF ActFrac = 0.50]
We will use the normal distribution to compute this probability. First calculate the mean and
variance of the estimated fraction’s probability distribution based on the premise that the null
hypothesis is true; that is, calculate the mean and variance based on the premise that the actual
fraction of the population supporting Clint is 0.50:
Mean[EstFrac] = p = 1/2 = 0.50
Var[EstFrac] = p(1 − p)/T = (1/2 × 1/2)/16 = (1/4)/16 = 1/64
SD[EstFrac] = √Var[EstFrac] = √(1/64) = 1/8 = 0.125
Recall that z equals the number of standard deviations that the value lies from the mean:
z = (Value of random variable − Distribution mean) / Distribution standard deviation
The value of the random variable equals 0.75 (from Clint’s poll); the mean equals 0.50, and the
standard deviation 0.125:
z = (0.75 − 0.50)/0.125 = 0.25/0.125 = 2.00
Next consider the table of right-tail probabilities for the normal distribution. Table 4.2, an abbre-
viated form of the normal distribution table, provides the probability (see also figure 4.4):
Table 4.2
Selected right-tail probabilities for the normal distribution

z        0.00       0.01
2.0      0.0228     0.0222
Figure 4.4
Probability distribution of EstFrac—Calculating Prob[Results IF H0 true]. The right-tail probability above 0.75, two standard deviations above the mean of 0.50, equals 0.0228.
Prob[Results IF H0 true] = 0.0228
Clint must now decide whether he considers a probability of 0.0228 to be small or large. The
significance level is the dividing line between the probability being small and the probability
being large. The significance level Clint chooses implicitly establishes his standard of proof;
that is, the significance level establishes what constitutes “proof beyond a reasonable doubt.”
If the Prob[Results IF H0 true] is less than the significance level Clint adopts, he would judge
the probability to be “small.” Clint would conclude that it is unlikely for the null hypothesis to
be true, unlikely that the election is a tossup. He would consider the poll results in which 75
percent of those polled support him to be “proof beyond a reasonable doubt” that he is leading.
If instead the probability exceeds Clint’s significance level, he would judge the probability to
133 Estimation Procedures, Estimates, and Hypothesis Testing
be large. Clint would conclude that it is likely for the null hypothesis to be true, likely that the
election is a toss-up. In this case he would consider the poll results as not constituting “proof
beyond a reasonable doubt.”
Prob[Results IF H0 true] = 0.0228

Now consider two different significance levels that are often used in academe: 5 percent and 1
percent.

Figure 4.5
Significance levels and Clint's election: the significance level is the dividing line between a small probability and a large probability
If Clint adopts a 5 percent significance level, he would reject the null hypothesis; Clint
would conclude that he leads and would not fund the beer tap rally. If instead he adopts a 1
percent significance level, he would not reject the null hypothesis; Clint would not conclude that he
is leading and so would fund the beer tap rally. A 1 percent significance level con-
stitutes a higher standard of proof than a 5 percent significance level; a lower significance level
makes it more difficult for Clint to conclude that he is leading (figure 4.5).
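The five-step logic can be summarized in a short function. This is only a sketch (scipy and the function name are assumptions, not part of the text): compute the probability of results like those observed if the null hypothesis were true, then compare it with the chosen significance level.

from math import sqrt
from scipy.stats import norm

def prob_results_if_h0_true(est_frac, h0_frac, sample_size):
    """Right-tail probability of observing est_frac or more when ActFrac = h0_frac."""
    sd = sqrt(h0_frac * (1.0 - h0_frac) / sample_size)
    z = (est_frac - h0_frac) / sd
    return norm.sf(z)

p = prob_results_if_h0_true(est_frac=0.75, h0_frac=0.50, sample_size=16)
print(round(float(p), 4))   # 0.0228
print(p < 0.05)             # True: reject H0 at the 5 percent significance level
print(p < 0.01)             # False: do not reject H0 at the 1 percent significance level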
Now let us generalize. The significance level is the dividing line between what we consider
a small and large probability:
As we reduce the significance level, we make it more difficult to reject the null hypothesis; we
make it more difficult to conclude that Clint is leading. Consequently the significance level and
standard of proof are intimately related; as we reduce the significance level, we are implicitly
adopting a higher standard of proof:
What is the appropriate standard of proof for Clint? That is, what significance level should
he use? There is no definitive answer; only Clint can decide. The significance level Clint
chooses, his standard of proof, depends on a number of factors. In part, it depends on the impor-
tance he attaches to winning the election. If he attaches great importance to winning, he would
set a very low significance level, making it difficult to reject the null hypothesis. In this case he
would be setting a very high standard of proof; much proof would be required for him to reject
the notion that the election is a toss-up. Also Clint’s choice would depend on how “paranoid”
he is. If Clint is a “worrywart” who always focuses on the negative, he would no doubt adopt a
low significance level. He would require a very high standard of proof before concluding that
he is leading. On the other hand, if Clint is a carefree optimist, he would adopt a higher signifi-
cance level and thus a lower standard of proof.
Traditionally significance levels of 1 percent, 5 percent, and 10 percent are used in academic
papers. It is important to note, however, that there is nothing “sacred” about any of these per-
centages. There is no mechanical way to decide on the appropriate significance level. We can
nevertheless address the general factors that should be considered. We will use a legal example
to illustrate this point.
Suppose that the police charge a seventeen-year-old male with a serious crime. Strong evi-
dence against him exists. The evidence suggests that he is guilty. But a word of caution is now
in order; no evidence can ever prove guilt beyond all doubt. Even confessions do not provide
indisputable evidence. There are many examples of an individual confessing to a crime that he/
she did not commit.
Again, let us play the cynic. The cynic always challenges the evidence:
Cynic’s view: Sure, there is evidence suggesting that the young man is guilty, but the evidence
results from the “luck of the draw.” The evidence is just coincidental. In fact the young man is
innocent.
The null hypothesis, H0, reflects the cynic’s view. We cannot simply dismiss the null hypothesis
as crazy. Many individuals have been convicted on strong evidence when they were actually
innocent. Every few weeks we hear about someone who, after being convicted years ago, was
released from prison as a consequence of DNA evidence indicating that he/she could not have
been guilty of the crime.
Now suppose that you are a juror charged with deciding the fate of the young man. Criminal
trials in the United States require the prosecution to prove that the defendant is guilty “beyond
a reasonable doubt.” The judge instructs you to find the defendant guilty if you believe the
evidence meets the “beyond the reasonable doubt” criterion. You and your fellow jurors must
now decide what constitutes “proof beyond a reasonable doubt.” To help you make this decision,
we will make two sets of observations. We will first express each in simple English and then
“translate” the English into “hypothesis-testing language”; in doing so, remember the null
hypothesis asserts that the defendant is innocent:
• Type I error: Jury finds the defendant guilty when he is actually innocent; in terms of
hypothesis-testing language, the jury rejects the null hypothesis when the null hypothesis is
actually true.

Cost of type I error: Type I error means that an innocent young man is incarcerated; this is a
cost incurred not only by the young man, but also by society.
• Type II error: Jury finds the defendant innocent when he is actually guilty; in terms of
hypothesis-testing language, the jury does not reject the null hypothesis when the null hypothesis
is actually false.
Cost of type II error: Type II error means that a criminal is set free; this can be costly to society
because the criminal is free to continue his life of crime.
Table 4.3
Four possible scenarios
Table 4.4
Costs of type I and type II errors
Question: Suppose that the prosecutor decides to try the seventeen-year-old as an adult rather
than a juvenile. How should the jury’s standard of proof be affected?
In this case the costs of incarcerating an innocent man (type I error) would increase because the
conditions in a prison are more severe than the conditions in a juvenile detention center. Since
the costs of incarcerating an innocent man (type I error) are greater, the jury should demand a
higher standard of proof, thereby making a conviction more difficult:
Now review the relationship between the significance level and the standard of proof; a lower
significance level results in a higher standard of proof:
Figure 4.6
Significance levels and the standard of proof
To make it more difficult to reject the null hypothesis, to demand a higher standard of proof,
the jury should adopt a lower significance level:
The choice of the significance level involves trade-offs, a “tightrope act,” in which we balance
the relative costs of type I and type II error (see figure 4.6). There is no automatic, mechanical
way to determine the appropriate significance level. It depends on the circumstances.
Chapter 4 Exercises
The results, published in the prestigious scientific magazine Nature . . . showed a match between Jefferson
and Eston Hemings, Sally’s last child. The chances of such a match occurring randomly are less than one
in a thousand.
The DNA evidence suggests that a relationship existed between Thomas Jefferson and Sally
Hemings.
a. Play the cynic. What is the cynic’s view?
b. Formulate the null and alternative hypotheses.
c. What does Prob[Results IF H0 true] equal?
3. During 2003 the Texas legislature was embroiled in a partisan dispute to redraw the state’s
US congressional districts. Texas Republicans charged that the districts were drawn unfairly so
as to increase the number of Texas Democrats sent to the US House of Representatives. The
Republican position was based on a comparison of the statewide popular vote for House candi-
dates in the 2002 election and the number of Democratic and Republican congressmen who
were elected:
2002 statewide vote for Congress (total of all votes in the 32 Texas congressional
districts)
Democratic votes 1,885,178
Republican votes 2,290,723
2002 Representatives elected
Democratic representatives 17
Republican representatives 15
a. What is the fraction of voters statewide who cast ballots for a Democratic candidate? Call
this fraction DemVoterFrac:
DemVoterFrac = ________
To assess the cynic’s view, consider the following experiment: First write the names of each
citizen who voted in the Texas election on a card along with the party for whom he/she voted.
1,885,178 of these cards have the name of a Democratic voter and 2,290,723 have the name of
a Republican voter. Repeat the following 32 times:

• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemCongressFrac.
There is no “unfair districting” present in this experiment; that is, there is no gerrymandering
present. Every Texas voter has an equal chance of being chosen. Consequently any discrepancy
between the portion of voters who are Democrats and DemCongressFrac is just a random occur-
rence as the cynic contends.
H0: _____________________________________________________________
H1: _____________________________________________________________
4. The Electoral College became especially controversial after the 2000 presidential election
when Al Gore won the popular vote but lost the Electoral vote to George W. Bush.
To assess the cynic’s view, suppose that the following experiment was used to determine the
makeup of the Electoral College: First write the names of each citizen who voted in the 2000
presidential election on a card along with the party for whom he/she voted. Repeat the following
537 times:

• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemElectColFrac.
There is no unfairness present in this experiment; that is, every voter has an equal chance of
being chosen for the Electoral College. Consequently any discrepancy between the portion of
voters who are Democrats and DemElectColFrac is just a random occurrence as the cynic
contends.
e. Formulate the null and alternative hypotheses.
H0: ________________________________________________________________
H1: ________________________________________________________________
a. What fraction of the popular vote was cast for the Democratic candidate? Call this fraction
DemVoterFrac:
DemVoterFrac = _______
b. What fraction of the Electoral votes was cast for the Democratic candidate?
c. Do your answers to parts a and b suggest, at least the possibility of, Electoral College
unfairness? If so, which party, Democratic or Republican, appears to be favored?
d. Play the cynic. What is the cynic’s view?
To assess the cynic’s view, suppose that the following experiment was used to determine the
makeup of the Electoral College: First write the names of each citizen who voted in the 2008
Presidential election on a card along with the party for whom he/she voted. Repeat the following
537 times:
• Thoroughly shuffle the cards.
• Select one card at random.
• Record the party for whom the citizen voted.
• Replace the card.
Then calculate the fraction of the voters drawn who voted for the Democratic candidate; call
this fraction DemElectColFrac.
There is no unfairness present in this experiment; that is, every voter has an equal chance of
being chosen for the Electoral College. Consequently any discrepancy between the portion of
voters who are Democrats and DemElectColFrac is just a random occurrence as the cynic
contends.
H0: ____________________________________________________________
H1: ____________________________________________________________
TNW equals your total net winnings after you played the game eighteen times:
TNW = v1 + v2 + . . . + v18
where vi = your net winnings from the ith repetition of the game. Recall that the mean of TNW’s
probability distribution equals 0 and the variance equals 18 when the game is played 18 times.
After you finish playing the game eighteen times, you won three times and your roommate
won fifteen times; you have lost a total of $12, your TNW equals −12.
a. Considering your losses, might you be a little suspicious that your roommate’s deck of
cards might not be a standard deck containing 26 red cards and 26 black cards? Explain why
or why not.
b. Play the cynic. What is the cynic’s view?
c. Formulate the null and alternative hypotheses. Express Prob[Results IF H0 true] in words
and in terms of TNW.
d. What does Prob[Results IF H0 true] equal?
7. Recall the game of roulette that we described in the problems of chapters 2 and 3. While
playing roulette, you notice that the girlfriend of the casino’s manager is also playing roulette.
She always bets $1 on the first set of twelve numbers. You observe that after she has played fifty
times, she has won 35 times and lost 15 times; that is, in net she has won $20, her TNW equals
20. Recall that the mean of TNW’s probability distribution equals −1.35 and the variance equals
98.60 when someone bets on the first set of twelve numbers for fifty spins of the wheel.
a. Considering her winnings, might you be a little suspicious that everything was on the “up
and up”? Explain why or why not.
b. Play the cynic. What is the cynic’s view?
c. Formulate the null and alternative hypotheses. Express Prob[Results IF H0 true] in words
and in terms of TNW.
d. What does Prob[Results IF H0 true] equal?
Ordinary Least Squares Estimation Procedure—The Mechanics
5
Chapter 5 Prep Questions
1. The following table reports the (disposable) income earned by Americans and their total
savings between 1950 and 1975 in billions of dollars:
a. Construct a scatter diagram for income and savings. Place income on the horizontal axis
and savings on the vertical axis.
b. Economic theory teaches that savings increases with income. Do these data tend to support
this theory?
c. Using a ruler, draw a straight line through these points to estimate the relationship between
savings and income. What equation describes this line?
d. Using the equation, estimate by how much savings will increase if income increases by
$1 billion.
2. Three students are enrolled in Professor Jeff Lord’s 8:30 am class. Every week, he gives a
short quiz. After returning the quiz, Professor Lord asks his students to report the number of
minutes they studied; the students always respond honestly. The minutes studied and the quiz
scores for the first quiz appear in the table below:1
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
1. NB: These data are not “real.” Instead, they were constructed to illustrate important pedagogical points.
a. Construct a scatter diagram for minutes studied and quiz scores. Place minutes on the horizontal
axis and score on the vertical axis.
b. Ever since first grade, what have your parents and teachers been telling you about the
relationship between studying and grades? For the most part, do these data tend to support
this theory?
c. Using a ruler, draw a straight line through these points to estimate the relationship between
minutes studied and quiz scores. What equation describes this line?
d. Using the equation, estimate by how much a student’s quiz score would increase if that
student studies one additional minute.
3. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
4. What is the relative frequency interpretation of probability?
5. Calculus problem: Consider the following equation:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²

Differentiate SSR with respect to bConst and set the derivative equal to 0:

dSSR/dbConst = 0

Show that this implies

bConst = ȳ − bx x̄

where

ȳ = (y1 + y2 + y3)/3    and    x̄ = (x1 + x2 + x3)/3

Next, reconsider the equation for SSR:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²

Let bConst = ȳ − bx x̄. Substitute this expression for bConst into the equation for SSR. Show that after the substitution:

SSR = [(y1 − ȳ) − bx(x1 − x̄)]² + [(y2 − ȳ) − bx(x2 − x̄)]² + [(y3 − ȳ) − bx(x3 − x̄)]²
Table 5.1
US annual income and savings data, 1950 to 1975
Recall the income and savings data we introduced in the chapter preview questions. Annual time
series data of US disposable income and savings from 1950 to 1975 are shown in table 5.1.
Economic theory suggests that as American households earn more income, they will save more.
The data appear to support the theory: as income increases, savings generally increase.
Question: How can we estimate the relationship between income and savings?
Answer: Draw a line through the points that best fits the data; then use the equation for the
best fitting line to estimate the relationship (figure 5.2).2
2. In reality this example exhibits a time series phenomenon requiring the use of sophisticated techniques beyond the
scope of an introductory textbook. Nevertheless, it does provide a clear way to motivate the notion of a best fitting line.
Consequently this example is a useful pedagogical tool even though more advanced statistical techniques are required
to analyze the data properly.
Figure 5.1
Income and savings scatter diagram (income on the horizontal axis, savings on the vertical axis; the observations run from 1950 to 1975)
Figure 5.2
Income and savings scatter diagram with best fitting line
By choosing two points on this line, we can solve for the equation of the best fitting line. It
looks like the points (200, 15) and (1200, 155) are more or less on the line. Let us use these two
points to estimate the slope:
Slope = Rise/Run = (155 − 15)/(1200 − 200) = 140/1000 = 0.14

Using the point (200, 15) and the slope:

y − 15 = 0.14(x − 200) = 0.14x − 28
y = 0.14x − 13
This equation suggests that if Americans earn an additional $1 of income, savings will rise by
an estimated $0.14; or equivalently, we estimate that a $1,000 increase in income causes a $140
increase in savings. Since the slope is positive, the data appear to support our theory; additional
income appears to increase savings.
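For readers who want to check this two-point arithmetic with software, here is a minimal Python sketch (the coordinates are the two points read off the figure above; the variable names are ours):

# Fit a line through the two points (200, 15) and (1200, 155) read off figure 5.2
x1, y1 = 200, 15
x2, y2 = 1200, 155
slope = (y2 - y1) / (x2 - x1)         # rise over run = 140/1000 = 0.14
intercept = y1 - slope * x1           # 15 - 0.14*200 = -13
print(slope, intercept)               # 0.14 -13.0
print(f"y = {intercept} + {slope}x")  # y = -13.0 + 0.14x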
Consider a second example. Three students are enrolled in Professor Jeff Lord’s 8:30 am class.
Every week, he gives a short quiz. After returning the quiz, Professor Lord asks his students to
report the number of minutes they studied; the students always respond honestly. The minutes
studied and the quiz scores for the first quiz appear in table 5.2.
The theory suggests that a student’s score on the quiz depends on the number of minutes he/
she studied:
Also it is generally believed that Professor Lord, a very generous soul, awards students some
points just for showing up for a quiz so early in the morning. Our friend Clint has been assigned
the problem of assessing the theory. Clint’s assignment is to use the data from Professor Lord’s
first quiz to assess the theory:
Table 5.2
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz
scores.
The following equation allows us to use the simple regression model to assess the theory:
yt = βConst + βxxt + et
where
yt, the quiz score, is called the dependent variable and xt, the minutes studied, the explanatory
variable. The value of the dependent variable depends on the value of the explanatory variable.
Or, putting it differently, the value of the explanatory variable explains the value of the dependent
variable.
βConst and βx, the constant and coefficient of the equation, are called the parameters of the
model. To interpret the parameters recall the following:
• It is generally believed that Professor Lord gives students some points just for showing up for
the quiz.
• The theory postulates that studying more will improve a student’s score.
Using these observations, we can interpret the parameters, βConst and βx:
• βConst represents the number of points Professor Lord gives students just for showing up.
• βx represents the number of additional points earned for an additional minute of studying.
et is the error term. The error term reflects all the random influences on student t’s quiz score,
yt. For example, if, on the one hand, Professor Lord were in an unusually bad humor when he
graded one student’s quiz, that student’s quiz score might be unusually low; this would be
reflected by a negative error term. If, on the other hand, Professor Lord were in an unusually
good humor, the student’s score might be unusually high and a positive error term would result.
Professor Lord’s disposition is not the only source of randomness. For example, a particular
student could have just “lucked out” by correctly anticipating the questions Professor Lord asked.
In this case the student’s score would be unusually high, his/her error term would be positive.
All such random influences are accounted for by the error term. The error term accounts for all
the factors that cannot be determined or anticipated beforehand.
The word simple is used to describe the model because the model includes only a single explana-
tory variable. Obviously many other factors influence a student’s quiz score; the number of
minutes studied is only one factor. However, we must start somewhere. We will begin with the
simple regression model. Later we will move on and introduce multiple regression models to
analyze more realistic scenarios in which two or more explanatory variables are used to explain
the dependent variable.
Question: How can Clint use the data to assess the effect of studying on quiz scores?
Answer: He begins by drawing a scatter diagram using the data appearing in table 5.2 (plotted
in figure 5.3).
The data appear to confirm the “theory.” As minutes studied increase, quiz scores tend to
increase.
Question: How can Clint estimate the relationship between minutes studied and the quiz score
more precisely?
Answer: Draw a line through the points that best fits the data; then use the best fitting line’s
equation to estimate the relationship.
Clint’s effort to “eyeball” the best fitting line appears in figure 5.4. By choosing two points
on this line, Clint can solve for the equation of his best fitting line. It looks like the points (0,
60) and (20, 90) are more or less on the line. He can use these two points to estimate the slope:
Slope = Rise/Run = (90 − 60)/(20 − 0) = 30/20 = 1.5
Next Clint can use a little algebra to derive the equation for the line:
(y − 60)/(x − 0) = 1.5
y − 60 = 1.5x
y = 60 + 1.5x
Figure 5.3
Minutes and scores scatter diagram (minutes studied on the horizontal axis, quiz score on the vertical axis)
This equation suggests that an additional minute of studying increases a student’s score by 1.5
points.
Let us compare the two examples we introduced. In the income–savings case, the points were
clustered tightly around our best fitting line (figure 5.2). Two individuals might not “eyeball”
the identical “best fitting line,” but the difference would be slight. In the minutes–scores case,
however, the points are not clustered nearly so tightly (figure 5.3). Two individuals could
“eyeball” the “best fitting line” very differently; therefore two individuals could derive substan-
tially different equations for the best fitting line and would then report very different
estimates of the effect that studying has on quiz scores. Consequently we need a systematic
procedure to determine the best fitting line. Furthermore, once we determine the best fitting line,
we need to decide how confident we should be in the theory. We will now address two issues:
Figure 5.4
Minutes and scores scatter diagram with Clint’s eyeballed best fitting line
• What systematic procedure should we use to determine the best fitting line for the data?
• In view of the best fitting line, how much confidence should we have in the theory’s
validity?
The ordinary least squares (OLS) estimation procedure is the most widely used estimation
procedure to determine the equation for the line that “best fits” the data. Its popularity results
from two factors:
• The procedure is computationally straightforward; it provides us (and computer software) with
a relatively easy way to estimate the regression model’s parameters, the constant and slope of
the best fitting line.
• The procedure possesses several desirable properties when the error term meets certain
conditions.
This chapter focuses on the computational aspects of the ordinary least squares (OLS) estimation
procedure. In chapter 6 we turn to the properties of the estimation procedure.
We begin our study of the ordinary least squares (OLS) estimation procedure by introducing
a little notation. We must distinguish between the actual values of the parameters and the esti-
mates of the parameters. We have used the Greek letter beta, β, to denote the actual values.
Recall the original model:
yt = βConst + βxxt + et
βConst denotes the actual constant and βx the actual coefficient.
We will use Roman italicized b’s to denote the estimates. bConst denotes the estimate of the
constant for the best fitting line and bx denotes the estimate of the coefficient for the best fitting
line. That is, the equation for the best fitting line is
y = bConst + bxx
The constant and slope of the best fitting line, bConst and bx, estimate the values of βConst and βx.3
The ordinary least squares (OLS) estimation procedure chooses bConst and bx so as to minimize
the sum of the squared residuals. We will now use our example to illustrate precisely what this
means. We begin by introducing an equation for each student’s estimated score: Esty1, Esty2, and
Esty3.
Esty1, Esty2, and Esty3 estimate the scores received by students 1, 2, and 3 based on the estimated
constant, bConst, the estimated coefficient, bx, and the number of minutes each student studies, x1,
x2, and x3.
The difference between a student’s actual score, yt, and his/her estimated score, Estyt, is called
the residual, Rest:

Rest = yt − Estyt
3. There is another convention that is often used to denote the parameter estimates, the “beta-hat” convention. The
estimate of the constant is denoted by β̂Const and the coefficient by β̂x. While the Roman italicized b’s estimation conven-
tion will be used throughout this textbook, be aware that you will come across textbooks and articles that use the beta-hat
convention. The b’s and β̂’s denote the same thing; they are interchangeable.
Next we square each residual and add them together to compute the sum of squared residuals,
SSR:

SSR = Res1² + Res2² + … + ResT² = Σ_{t=1}^{T} Rest²

where T = sample size. bConst and bx are chosen to minimize the sum of squared residuals. The
following equations for bConst and bx accomplish this:
bConst = ȳ − bx x̄,    bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]

For the three observation case:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²
First focus on bConst. Differentiate the sum of squared residuals, SSR, with respect to bConst and
set the derivative equal to 0:

dSSR/dbConst = −2(y1 − bConst − bx x1) − 2(y2 − bConst − bx x2) − 2(y3 − bConst − bx x3) = 0

Dividing by −2:

(y1 − bConst − bx x1) + (y2 − bConst − bx x2) + (y3 − bConst − bx x3) = 0

simplifying:

(y1 + y2 + y3) − 3bConst − bx(x1 + x2 + x3) = 0

dividing by 3:

(y1 + y2 + y3)/3 − bConst − bx (x1 + x2 + x3)/3 = 0
Since (y1 + y2 + y3)/3 equals the mean of y, ȳ, and (x1 + x2 + x3)/3 equals the mean of x, x̄:

ȳ − bConst − bx x̄ = 0
Our first equation, our equation for bConst, is now justified. To minimize the sum of squared
residuals, the following relationship must be met:
ȳ = bConst + bx x̄    or    bConst = ȳ − bx x̄

As illustrated in figure 5.5, this equation simply says that the best fitting line must pass through
the point (x̄, ȳ), the point representing the mean of x, minutes studied, and the mean of y, the
quiz scores.
Figure 5.5
Minutes and scores scatter diagram with OLS best fitting line, y = bConst + bx x, passing through (x̄, ȳ) = (15, 81)
The best fitting line passes through the point (15, 81).
Next we will justify the equation for bx. Reconsider the equation for the sum of squared
residuals and substitute ȳ − bx x̄ for bConst:

SSR = (y1 − bConst − bx x1)² + (y2 − bConst − bx x2)² + (y3 − bConst − bx x3)²
    = [y1 − (ȳ − bx x̄) − bx x1]² + [y2 − (ȳ − bx x̄) − bx x2]² + [y3 − (ȳ − bx x̄) − bx x3]²
Switching the “bx terms” within each of the three squared terms:

SSR = [(y1 − ȳ) − bx(x1 − x̄)]² + [(y2 − ȳ) − bx(x2 − x̄)]² + [(y3 − ȳ) − bx(x3 − x̄)]²
To minimize the sum of squared residuals, differentiate SSR with respect to bx and set the deriva-
tive equal to 0:
dSSR/dbx = −2[(y1 − ȳ) − bx(x1 − x̄)](x1 − x̄) − 2[(y2 − ȳ) − bx(x2 − x̄)](x2 − x̄) − 2[(y3 − ȳ) − bx(x3 − x̄)](x3 − x̄) = 0
dividing by −2 and rearranging:

(y1 − ȳ)(x1 − x̄) + (y2 − ȳ)(x2 − x̄) + (y3 − ȳ)(x3 − x̄) = bx[(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

Solving for bx and writing the result for the general case of T observations:

bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]
Now, for each student, calculate the deviation of y from its mean and the deviation of x from
its mean:

Student   yt   ȳ    yt − ȳ   xt   x̄    xt − x̄
1         66   81   −15      5    15   −10
2         87   81   6        15   15   0
3         90   81   9        25   15   10
Next, for each student, calculate the products of the y and x deviations and squared x
deviations:
Student   (yt − ȳ)(xt − x̄)      (xt − x̄)²
1         (−15)(−10) = 150      (−10)² = 100
2         (6)(0) = 0            (0)² = 0
3         (9)(10) = 90          (10)² = 100
          Sum = 240             Sum = 200

Figure 5.6
Minutes and scores scatter diagram with OLS best fitting line: y = bConst + bx x = 63 + 1.2x, passing through (x̄, ȳ) = (15, 81)
bx equals the sum of the products of the y and x deviations divided by the sum of the squared x
deviations:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2
To calculate bConst recall that the best fitting line passes through the point representing the
average value of x and y, (x̄, ȳ) (see figure 5.6):

ȳ = bConst + bx x̄
bConst = ȳ − bx x̄
We just learned that bx equals 6/5. The average of the x’s, x̄, equals 15 and the average of the
y’s, ȳ, equals 81. Substituting, we have

bConst = 81 − (6/5) × 15 = 81 − 18 = 63
Using the ordinary least squares (OLS) estimation procedure, we have the best fitting line for
Professor Lord’s first quiz as
y = 63 + (6/5)x = 63 + 1.2x
Consequently the least squares estimates for βConst and βx are 63 and 1.2. These estimates suggest
that Professor Lord gives each student 63 points just for showing up; each minute studied earns
the student 1.2 additional points. Based on the regression we estimate that:
• 1 additional minute studied increases the quiz score by 1.2 points.
• 2 additional minutes studied increase the quiz score by 2.4 points.
• And so on.
Let us now quickly calculate the sum of squared residuals for the best fitting line:
Student   xt   yt   Estyt = 63 + 1.2xt       Rest = yt − Estyt   Rest²
1         5    66   63 + 1.2 × 5 = 69        66 − 69 = −3        9
2         15   87   63 + 1.2 × 15 = 81       87 − 81 = 6         36
3         25   90   63 + 1.2 × 25 = 93       90 − 93 = −3        9
                                                                 SSR = 9 + 36 + 9 = 54
The sum of squared residuals for the best fitting line is 54.
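The entire calculation can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic (the variable names are ours), not part of the text’s EViews workflow:

# OLS estimates for Professor Lord's first quiz, computed from the formulas above
x = [5, 15, 25]          # minutes studied
y = [66, 87, 90]         # quiz scores

x_bar = sum(x) / len(x)  # 15
y_bar = sum(y) / len(y)  # 81

num = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y))  # 240
den = sum((xt - x_bar) ** 2 for xt in x)                        # 200

b_x = num / den                # 1.2
b_const = y_bar - b_x * x_bar  # 63.0

ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))
print(b_const, b_x, ssr)       # 63.0 1.2 54.0 (up to floating point rounding)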
Econometrics Lab 5.1: Finding the Ordinary Least Squares (OLS) Estimates
We can use our Econometrics Lab to emphasize how the ordinary least squares (OLS) estimation
procedure determines the best fitting line by accessing the Best Fitting Line simulation
(figure 5.7).
Figure 5.7
Best Fitting Line simulation—Data
By default the data from Professor Lord’s first quiz are specified: the values of x and y for
the first student are 5 and 66, for the second student 15 and 87, and for the third student 25
and 90.
Click Go. A new screen appears as shown in figure 5.8 with two slider bars, one slide bar for
the constant and one for the coefficient.
By default the constant and coefficient values are 63 and 1.2, the ordinary least squares (OLS)
estimates. Also the arithmetic used to calculate the sum of squared residuals is displayed. When
the constant equals 63 and the coefficient equals 1.2, the sum of squared residuals equals 54.00;
this is just the value that we calculated.
Next experiment with different values for the constant and coefficient values by moving the
two sliders. Convince yourself that the equations we used to calculate the estimate for the con-
stant and coefficient indeed minimize the sum of squared residuals.
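The same experiment can be mimicked outside the lab. The following Python sketch (ours, with an arbitrarily chosen grid of candidate values) searches over constants and coefficients and confirms that the sum of squared residuals is smallest near (63, 1.2):

# Grid search for the (constant, coefficient) pair that minimizes SSR
x = [5, 15, 25]
y = [66, 87, 90]

def ssr(b_const, b_x):
    return sum((yt - b_const - b_x * xt) ** 2 for xt, yt in zip(x, y))

best = None
for i in range(521):              # candidate constants 50.00, 50.05, ..., 76.00
    b_const = 50 + 0.05 * i
    for j in range(61):           # candidate coefficients 0.00, 0.05, ..., 3.00
        b_x = 0.05 * j
        value = ssr(b_const, b_x)
        if best is None or value < best[0]:
            best = (value, b_const, b_x)

print(best)                       # approximately (54.0, 63.0, 1.2)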
Software and the ordinary least squares (OLS) estimation procedure: Fortunately, we do not have
to trudge through the laborious arithmetic to compute the ordinary least squares (OLS) estimates.
Statistical software can do the work for us.
Professor Lord’s first quiz data: Cross-sectional data of minutes studied and quiz scores in the
first quiz for the three students enrolled in Professor Lord’s class (table 5.3).
Figure 5.8
Best Fitting Line simulation—Parameter estimates
Table 5.3
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
We can use the statistical package EViews to perform the calculations. After opening the workfile
in EViews:
• In the Workfile window: Click on the dependent variable, y, first; and then, click on the
explanatory variable, x, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
This window previews the regression that will be run; note that the dependent variable, “y,”
is the first variable listed followed by two expressions representing the explanatory variable, “x,”
and the constant “c.”
Do not forget to close the workfile.
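Any regression package reports the same numbers. For readers without EViews, here is a rough Python equivalent using statsmodels (a sketch under the assumption that the data are typed in by hand rather than read from the workfile):

import statsmodels.api as sm

x = [5, 15, 25]   # minutes studied
y = [66, 87, 90]  # quiz scores

X = sm.add_constant(x)          # adds the constant term "c"
results = sm.OLS(y, X).fit()    # ordinary least squares

print(results.params)           # constant approximately 63, coefficient approximately 1.2
print(results.ssr)              # sum of squared residuals approximately 54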
Table 5.4
OLS first quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 5.4 reports the values of the coefficient and constant for the best fitting line. Note that the
sum of squared residuals for the best fitting line is also included.
yt = βConst + βx xt + et

where yt equals the quiz score of student t, xt equals the minutes student t studied, and et is the error term.
The parameters of the model, the values of the constant, βConst, and the coefficient, βx, represent
the actual number of
• points Professor Lord gives students just for showing up, βConst;
• additional points earned for each minute of study, βx.
Obviously the parameters of the model play an important role, but what about the error term,
et? To illustrate the importance of the error term, suppose that somehow we know the values of
βConst and βx. For the moment, suppose that βConst, the actual constant, equals 50 and βx, the actual
coefficient, equals 2. In words, this means that Professor Lord gives each student 50 points for
showing up; furthermore each minute of study provides the student with two additional points.
Consequently the regression model is
yt = 50 + 2xt + et
Note: In the real world, we never know the actual values of the constant and coefficient. We are
assuming that we do here, just to illustrate the importance of the error term.
The error term reflects all the factors that cannot be anticipated or determined before the quiz
is given; that is, the error term represents all random influences. In the absence of random influ-
ences, the error terms would equal 0.
Assume, only for the moment, that there are no random influences; consequently each error term
would equal 0 (figure 5.9). While this assumption is unrealistic, it allows us to appreciate the
important role played by the error term. Focus on the first student taking Professor Lord’s first
Figure 5.9
Best fitting line with no error term (the line y = 50 + 2x passes through all three points)
quiz. The first student studies for 5 minutes. In the absence of random influences (that is, if e1
equaled 0), what score would the first student receive on the quiz? The answer is 60:
y1 = 50 + 2 × 5 + 0 = 50 + 10 = 60
Next consider the second student. The second student studies for 15 minutes. In the absence of
random influences, the second student would receive an 80 on the quiz:
y2 = 50 + 2 × 15 + 0 = 50 + 30 = 80
The third student would receive a 100:
y3 = 50 + 2 × 25 + 0 = 50 + 50 = 100
All three points lie on the line y = βConst + βx x = 50 + 2x.
In sum, in the absence of random influences, the error term of each student equals 0 and the
best fitting line fits the data perfectly. The slope of this line equals 2, the actual coefficient, and
the vertical intercept of the line equals 50, the actual constant. Without random influences, it is
easy to determine the actual constant and coefficient by applying a little algebra. We will now
use a simulation to emphasize this point (figure 5.10).
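Before turning to the lab, we can preview the point with a short Python sketch (ours): when the error terms all equal 0, the OLS formulas recover the actual constant and coefficient exactly.

# With no random influences the data lie exactly on y = 50 + 2x,
# and OLS recovers the actual parameter values.
beta_const, beta_x = 50, 2
x = [5, 15, 25]
y = [beta_const + beta_x * xt + 0 for xt in x]   # error terms all equal 0

x_bar, y_bar = sum(x) / 3, sum(y) / 3
b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum((xt - x_bar) ** 2 for xt in x)
b_const = y_bar - b_x * x_bar
print(b_const, b_x)   # 50.0 2.0, the actual values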
Econometrics Lab 5.2: Coefficient Estimates When Random Influences Are Absent
The Coefficient Estimate simulation allows us to do something we cannot do in the real world.
It allows us to specify the actual values of the constant and coefficient in the model; that is, we
can select βConst and βx. We can specify the number of:
• Points Professor Lord gives students just for showing up, βConst; by default, βConst is set at 50.
• Additional points earned for an additional minute of study, βx; by default, βx is set at 2.
Table 5.5
Quiz results with no random influences (no error term)
Student   Minutes (x)   Score (y) in the absence of random influences
1         5             60
2         15            80
3         25            100
Figure 5.10
Coefficient Estimate simulation (controls for the actual coefficient βx and the error term; each repetition reports the estimated coefficient value bx = [Σ_{t=1}^{T}(yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T}(xt − x̄)²])
yt = 50 + 2xt + et
Each repetition of the simulation represents a quiz from a single week. In each repetition the
simulation does the following:
• Calculates the score for each student based on the actual constant (βConst), the actual coefficient
(βx), and the number of minutes the student studied; then, to be realistic, the simulation can add
a random influence in the form of the error term, et. An error term is included whenever the Err
Term checkbox is checked.
• Applies the ordinary least squares (OLS) estimation procedure to estimate the coefficient.
When the Pause box is checked the simulation stops after each repetition; when it is cleared,
quizzes are simulated repeatedly until the “Stop” button is clicked.
We can eliminate random influences by clearing the Err Term box. After doing so, click Start
and then Continue a few times. We discover that in the absence of random influences the esti-
mate of the coefficient value always equals the actual value, 2 (see table 5.6).
Table 5.6
Simulation results with no random influences (no error term)
Repetition   Coefficient estimate (no error term)
1            2.0
2            2.0
3            2.0
4            2.0
Table 5.7
Quiz results with random influences (with error term)
Student   Minutes (x)   Score (y) with random influences included
1         5             66
2         15            87
3         25            90
This is precisely what we concluded earlier from the scatter diagram. In the absence of random
influences, the best fitting line fits the data perfectly. The best fitting line’s slope equals the
actual value of the coefficient.
The real world is not that simple, however; random influences play an important role. In the real
world, random influences are inevitably present. In figure 5.11 the actual scores on the first quiz
have been added to the scatter diagram. As a consequence of the random influences, students 1
and 2 overperform while student 3 underperforms (table 5.7).
As illustrated in figure 5.12, when random influences are present, we cannot expect the inter-
cept and slope of the best fitting line to equal the actual constant and the actual coefficient. The
intercept and slope of the best fitting line, bConst and bx, are affected by the random influences.
Consequently the intercept and slope of the best fitting line, bConst and bx, are themselves random
variables. Even if we knew the actual constant and slope, that is, if we knew the actual values
of βConst and βx, we could not predict the values of the constant and slope of the best fitting line,
bConst and bx, with certainty before the quiz was given.
Econometrics Lab 5.3: Coefficient Estimates When Random Influences Are Present
We will now use the Coefficient Estimate simulation to emphasize this point. We will show that
in the presence of random influences, the coefficient of the best fitting line is a random
variable.
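A quick Python sketch (ours; it uses normally distributed error terms with variance 500, matching the lab’s default) makes the same point: with random influences the coefficient estimate changes from quiz to quiz.

import random

random.seed(1)                       # any seed; results vary from run to run
beta_const, beta_x = 50, 2
x = [5, 15, 25]
x_bar = sum(x) / 3

for repetition in range(5):          # five simulated quizzes
    e = [random.gauss(0, 500 ** 0.5) for _ in x]             # error terms, Var[e] = 500
    y = [beta_const + beta_x * xt + et for xt, et in zip(x, e)]
    y_bar = sum(y) / 3
    b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) \
          / sum((xt - x_bar) ** 2 for xt in x)
    print(round(b_x, 2))             # typically not equal to 2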
Figure 5.11
Scatter diagram with error term (the actual line y = 50 + 2x together with the three observed scores)
Note that the Error Term checkbox is now checked to include the error term. Be certain that the
Pause checkbox is checked and then click Start. When the simulation computes the best fitting
line, the estimated value of the coefficient typically is not 2 despite the fact that the actual value
of the coefficient is 2. Click the Continue button a few more times to simulate each successive
week’s quiz. What do you observe? We simply cannot expect the coefficient estimate to equal
the actual value of the coefficient. In fact, when random influences are present, the coefficient
estimate almost never equals the actual value of the coefficient. Sometimes the estimate is less
than the actual value, 2, and sometimes it is greater than the actual value. When random influ-
ences are present, the coefficient estimates are random variables.
While your coefficient estimates will no doubt differ from the estimates in table 5.8, one thing
is clear. Even if we know the actual value of the coefficient, as we do in the simulation, we
cannot predict with certainty the value of the estimate from one repetition. Our last two
Figure 5.12
OLS best fitting line with error term (the actual line y = 50 + 2x and the observed scores)
Table 5.8
Simulation results with random influences (with error term)
Repetition   Coefficient estimate (with error term)
1            1.8
2            1.6
3            3.2
4            1.9
simulations illustrate a critical point: the coefficient estimate is a random variable as a conse-
quence of the random influences introduced by each student’s error term.
We will now use a simulation to gain insights into random influences and error terms. As we
know, random influences are those factors that cannot be anticipated or determined beforehand.
Sometimes random influences lead to a higher quiz score, and other times they lead to a lower
score. The error terms embody these random influences:
• Sometimes the error term is positive, indicating that the score is higher than “usual.”
• Other times the error term is negative indicating that the score is lower than “usual.”
If the random influences are indeed random, they should be a “wash” after many, many quizzes.
That is, random influences should not systematically lead to higher or lower quiz scores. In other
words, if the error terms truly reflect random influences, they should average out to 0 “in the
long run.”
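This “wash out” claim is easy to illustrate with a small Python sketch (ours; it draws error terms from a normal distribution with mean 0 and variance 500) before turning to the lab’s own check:

import random

random.seed(2)
draws = [random.gauss(0, 500 ** 0.5) for _ in range(100_000)]   # many, many error terms

mean = sum(draws) / len(draws)
positive_share = sum(d > 0 for d in draws) / len(draws)

print(round(mean, 2))            # close to 0
print(round(positive_share, 2))  # close to 0.5: positive about half the time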
Econometrics Lab 5.4: Error Terms When Random Influences Are Present
Let us now check to be certain that the simulations are capturing random influences properly
by accessing the Error Term simulation (figure 5.13).
Initially, the Pause checkbox is checked and the error term variance is 500. Now click Start
and observe that the simulation reports the numerical value of the error term for each of the three
students. Record these three values. Also note that the simulation constructs a histogram for each
student’s error term and also reports the mean and variance. Click Continue again to observe
the numerical values of the error terms for the second quiz. Confirm that the simulation is cal-
culating the mean and variance of each student’s error terms correctly. Click Continue a few
more times. Note that the error terms are indeed random variables. Before the quiz is given, we
Figure 5.13
Error Term simulation
Figure 5.14
Error Term simulation results
cannot predict the numerical value of a student’s error term. Each student’s histogram shows
that sometimes the error term for that student is positive and sometimes it is negative. Next clear
the Pause checkbox and click Continue. After many, many repetitions click Stop.
After many, many repetitions, the mean (average) of each student’s error terms equals about
0 (figure 5.14). Consequently each student’s error term truly represents a random influence; it
does not systematically influence the student’s quiz score. It is also instructive to focus on each
student’s histogram. For each student, the numerical value of the error term is positive about
half the time and negative about half the time after many, many repetitions.
In sum, the error terms represent random influences; consequently the error terms have no
systematic effect on quiz scores, the dependent variable:
• Sometimes the error term is positive, indicating that the score is higher than “usual.”
• Other times the error term is negative indicating that the score is lower than “usual.”
What can we say about the students’ error terms beforehand, before the next quiz? We can
describe their probability distribution. The chances that a student’s error term will be positive
are the same as the chances that it will be negative. For any one quiz, the mean of each student’s
error term’s probability distribution equals 0:

Mean[e1] = Mean[e2] = Mean[e3] = 0
Initially, we will make some strong assumptions regarding the explanatory variables and the
error terms:
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = Var[e3] = Var[e]
• Error term/error term independence premise: The error terms are independent; that is,
Cov[et, ej] = 0 whenever t ≠ j.
We call these premises the standard ordinary least squares (OLS) premises. They make the
analysis as straightforward as possible. In part IV of this textbook we relax these premises to
study more general cases. Our strategy is to start with the most straightforward case and then
move on to more complex ones. While we only briefly cite the premises here, we will return to
them in the fourth part of the textbook to study their implications.
Recall Clint’s assignment. He must assess the effect of studying on quiz scores by using Profes-
sor Lord’s first quiz as evidence. Clint can apply the ordinary least squares (OLS) estimation
procedure; the OLS estimate for the value of the coefficient is 1.2. But we now know that the
estimate is a random variable. We cannot expect the coefficient estimate from the one quiz, 1.2,
to equal the actual value of the coefficient, the actual impact that studying has on a student’s
quiz score. We will proceed by dividing Clint’s assignment into two related parts:
• Reliability of the coefficient estimate: How reliable is the coefficient estimate calculated
from the results of the first quiz? That is, how confident should Clint be that the coefficient
estimate, 1.2, will be close to the actual value?
• Assessment of the theory: In view of the fact that Clint’s estimate of the coefficient equals
1.2, how confident should Clint be that the theory is correct, that additional studying increases
quiz scores?
1. What criterion does the ordinary least squares (OLS) estimation procedure apply when deriv-
ing the best fitting line?
2. How are random influences captured in the simple regression model?
3. When applying the ordinary least squares (OLS) estimation procedure, what type of variables
are the parameter estimates as a consequence of random influences?
4. What are the standard ordinary least squares (OLS) premises?
Chapter 5 Exercises
1. A colleague of Professor Lord is teaching another course in which three students are enrolled.
The number of minutes each student studied and his/her score on the quiz are reported below:
Regression example data: Cross-sectional data of minutes studied and quiz scores from a course
taught by Professor Lord’s colleague.
Minutes Quiz
Student studied (x) score (y)
1 5 14
2 10 44
3 30 80
a. On a sheet of graph paper, plot a scatter diagram of the data. Then, using a ruler, draw a
straight line that, by sight, best fits the data.
b. Using a calculator and the equations we derived in class, apply the least squares estimation
procedure to find the best fitting line by filling in the blanks:
First, calculate the means:
Means: x̄ = _____________ = _______
       ȳ = _____________ = _______
Second, for each student calculate the deviation of x from its mean and the deviation of y
from its mean:
Student   yt   ȳ       yt − ȳ   xt   x̄       xt − x̄
1         14   _____   _____    5    _____   _____
2         44   _____   _____    10   _____   _____
3         80   _____   _____    30   _____   _____
Third, calculate the products of the y and x deviations and squared x deviations for each
student; then calculate the sums:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = _____ / _____ = _______

Finally, calculate the estimate for the constant:

bConst = ȳ − bx x̄ = _______
2. Reconsider the data from problem 1, this time using statistical software. After opening the file, use the following steps to run the regression:
• In the Workfile window: Click on the dependent variable, y, first; and then, click on the
explanatory variable, x, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
a. Are the calculations you made in problem 1 consistent with those provided by the
software?
b. Based on the regression results, what equation estimates the effect of minutes studied on
quiz scores?
c. Estimate the effect of minutes studied on quiz scores:
i. 1 additional minute results in ____ additional points.
ii. 2 additional minutes result in ____ additional points.
iii. 5 additional minutes result in ____ additional points.
iv. 1 fewer minute results in ____ fewer points.
v. 2 fewer minutes result in ____ fewer points.
3. Consider crude oil production in the United States.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
a. What does economic theory teach us about how the real price of crude oil should affect
US crude oil production?
b. Using statistical software, estimate the effect that the real price of oil has on US crude oil
production.
Labor supply data: Cross-sectional data of hours worked and wages for the 92 married workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who earned bachelor’s degrees but no advanced degrees.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from
1990 to 1999.
Year   Real price ($ per gal)   Gasoline consumption (millions of gals)
1990   1.43                     303.9
1991   1.35                     301.9
1992   1.31                     305.3
1993   1.25                     314.0
1994   1.23                     319.2
1995   1.25                     327.1
1996   1.31                     331.4
1997   1.29                     336.7
1998   1.10                     346.7
1999   1.19                     354.1
a. What does economic theory teach us about how the real price of gasoline should affect
US gasoline consumption?
b. Using statistical software, estimate the effect that the real price of gasoline has on US
gasoline consumption.
iv. How would a $1.00 decrease in the real price affect US gasoline consumption?
___________
v. How would a $2.00 decrease in the real price affect US gasoline consumption?
___________
6. Consider cigarette smoking data in the United States.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
Conventional wisdom suggests that high school drop outs are more likely to smoke cigarettes
than those who graduate.
a. Using statistical software, estimate the effect that the completion of high school has on
per capita cigarette consumption.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
a. What is an earmark?
It has been alleged that since the Congress was controlled by Democrats, Democratic members
received more solo earmarks than their non-Democratic colleagues.
b. Using statistical software, estimate the effect that the political party of a member of Con-
gress has on the dollars of earmarks received.
Ordinary Least Squares Estimation Procedure—The Properties
6
Chapter 6 Outline
6.2 Review
6.2.1 Regression Model
6.2.2 The Error Term
6.2.3 Ordinary Least Squares (OLS) Estimation Procedure
6.2.4 The Estimates, bConst and bx, Are Random Variables
6.5 New Equation for the Ordinary Least Squares (OLS) Coefficient Estimate
Chapter 6 Prep Questions
1. Run the Distribution of Coefficient Estimates simulation in the Econometrics Lab by clicking
the following link:
Note: You must click the Next Problem button to get to the simulation’s problem 1.
Clint’s assignment is to assess the theory that additional studying increases quiz scores. To do
so, he must use data from Professor Lord’s first quiz, the number of minutes studied, and the
quiz score for each of the three students in the course (table 6.1).
Table 6.1
First quiz results
Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz
scores.
6.2 Review
yt = βConst + βxxt + et
where
βConst and βx are the parameters of the model. Let us review their interpretation:
• βConst reflects the number of points Professor Lord gives students just for showing up.
• βx reflects the number of additional points earned for each additional minute of studying.
The error term, et, plays a crucial role in the model. The error term represents random influences.
The mean of the error term’s probability distribution for each student equals 0:

Mean[e1] = Mean[e2] = Mean[e3] = 0

Consequently the error terms have no systematic effect on quiz scores. Sometimes the error term
will be positive and sometimes it will be negative, but after many, many quizzes each student’s
error terms will average out to 0. When the probability distribution of the error term is symmetric,
the chances that a student will score better than “usual” on one quiz equal the chances that the
student will do worse than “usual.”
As a consequence of the error terms (random influences) we can never determine the actual
values of βConst and βx; that is, Clint has no choice but to estimate the values. The ordinary least
squares (OLS) estimation procedure is the most commonly used procedure for doing this:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]

bConst = ȳ − bx x̄
Using the results of the first quiz, Clint estimates the values of the coefficient and constant:
Student   yt   ȳ    yt − ȳ   xt   x̄    xt − x̄
1         66   81   −15      5    15   −10
2         87   81   6        15   15   0
3         90   81   9        25   15   10
Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = 240      Σ_{t=1}^{T} (xt − x̄)² = 200

bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 63
In the previous chapter we used the Econometrics Lab to show that the estimates for the constant
and coefficient, bConst and bx, are random variables. As a consequence of the error terms (random
influences) we could not determine the numerical value of the estimates for the constant and
coefficient, bConst and bx, before we conducted the experiment, even if we knew the actual values
of the constant and coefficient, βConst and βx. Furthermore we can never expect the estimates to
equal the actual values. Consequently we must assess the reliability of the estimates. We will
focus on the coefficient estimate.
Estimate reliability: How reliable is the coefficient estimate calculated from the results of the
first quiz? That is, how confident can Clint be that the coefficient estimate, 1.2, will be close to
the actual value of the coefficient?
Clint faced a similar problem when he polled a sample of the student population to estimate the
fraction of students supporting him. Twelve of the 16 randomly selected students polled, 75
percent, supported Clint, thereby suggesting that he was leading. But we then observed that it
was possible for this result to occur even if the election was actually a toss-up. In view of this,
we asked how confident Clint should be in the results of his single poll. To address this issue,
we turned to the general properties of polling procedures to assess the reliability of the estimate
Clint obtained from his single poll:
While we could not determine the numerical value of the estimated fraction, EstFrac, before
the poll was conducted, we could describe its probability distribution. Using algebra, we derived
the general equations for the mean and variance of the estimated fraction’s, EstFrac’s, probability
distribution. Then we checked our algebra with a simulation by exploiting the relative frequency
interpretation of probability: after many, many repetitions, the distribution of the numerical
values mirrors the probability distribution for one repetition.
The estimated fraction’s probability distribution allowed us to assess the reliability of Clint’s
poll.
Using the ordinary least squares (OLS) estimation procedure we estimated the value of the coef-
ficient to be 1.2. This estimate is based on a single quiz. The fact that the coefficient estimate
is positive suggests that additional studying increases quiz scores. But how confident can we be
that the coefficient estimate is close to the actual value? To address the reliability issue we will
focus on the general properties of the ordinary least squares (OLS) estimation procedure:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²] = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 63
Mean[bx] = ?
Var[bx] = ?
↓
Mean and variance describe the center and spread of the estimate’s probability distribution
While we cannot determine the numerical value of the coefficient estimate before the quiz is
given, we can describe its probability distribution. The probability distribution tells us how likely
it is for the coefficient estimate based on a single quiz to equal each of the possible values. Using
algebra, we will derive the general equations for the mean and variance of the coefficient esti-
mate’s probability distribution. Then we will check our algebra with a simulation by exploiting
the relative frequency interpretation of probability: after many, many repetitions the distribution
of the numerical values mirrors the probability distribution for one repetition.
The coefficient estimate’s probability distribution will allow us to assess the reliability of the
coefficient estimate calculated from Professor Lord’s quiz.
To derive the equations for the mean and variance of the coefficient estimate’s probability dis-
tribution, we will apply the standard ordinary least squares (OLS) regression premises. As we
mentioned in chapter 5, these premises make the analysis as straightforward as possible. In later
chapters we will relax these premises to study more general cases. In other words, we will start
with the most straightforward case and then move on to more complex ones later.
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = Var[e3] = Var[e]
• Error term/error term independence premise: The error terms are independent; that is,
Cov[et, ej] = 0 whenever t ≠ j.
To keep the algebra manageable, we will assume that the explanatory variables are constants in
the derivations that follow. This assumption allows us to apply the arithmetic of means and
variances easily. While this simplifies our algebraic manipulations, it does not affect the validity
of our conclusions.
6.5 New Equation for the Ordinary Least Squares (OLS) Coefficient Estimate
In chapter 5 we derived an equation that expressed the OLS coefficient estimate in terms of the
x’s and y’s:
bx = [Σ_{t=1}^{T} (yt − ȳ)(xt − x̄)] / [Σ_{t=1}^{T} (xt − x̄)²]
It is advantageous to use a different equation to derive the equations for the mean and variance
of the coefficient estimate’s probability distribution, however; we will use an equivalent equation
that expresses the coefficient estimate in terms of the x’s, e’s, and βx rather than in terms of the
x’s and y’s:1
bx = βx + [Σ_{t=1}^{T} (xt − x̄)et] / [Σ_{t=1}^{T} (xt − x̄)²]
To keep the notation as straightforward as possible, we will focus on the 3 observation case. The
logic for the general case is identical to the logic for the 3 observation case:
bx = βx + [(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
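Appendix 6.1 derives this equivalence algebraically; a short Python sketch (ours, with arbitrary made-up error terms) can also confirm numerically that the two expressions for bx agree when the y’s are generated by the model:

# Check that bx computed from the data equals
# beta_x + sum((xt - xbar)*et) / sum((xt - xbar)**2)
beta_const, beta_x = 50, 2
x = [5, 15, 25]
e = [7.0, -4.0, 12.0]                         # arbitrary error terms for the check
y = [beta_const + beta_x * xt + et for xt, et in zip(x, e)]

x_bar, y_bar = sum(x) / 3, sum(y) / 3
den = sum((xt - x_bar) ** 2 for xt in x)

bx_from_data = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / den
bx_from_errors = beta_x + sum((xt - x_bar) * et for xt, et in zip(x, e)) / den

print(bx_from_data, bx_from_errors)           # the two values agree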
To calculate the mean of bx’s probability distribution, review the arithmetic of means:
• Mean of a constant times a variable: Mean[cx] = c Mean[x]
• Mean of a constant plus a variable: Mean[c + x] = c + Mean[x]
• Mean of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y]
1. Appendix 6.1 appearing at the end of this chapter shows how we can derive the second equation for the coefficient
estimate, bx, from the first.
Now we apply the arithmetic of means to the new equation for the coefficient estimate, bx:

Mean[bx] = Mean[βx + ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Mean[c + x] = c + Mean[x]:

= βx + Mean[(1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3)]

Applying Mean[cx] = c Mean[x]:

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) Mean[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3]

Applying Mean[x + y] = Mean[x] + Mean[y]:

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) [Mean[(x1 − x̄)e1] + Mean[(x2 − x̄)e2] + Mean[(x3 − x̄)e3]]

Applying Mean[cx] = c Mean[x] (the x’s are treated as constants):

= βx + (1 / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)) [(x1 − x̄)Mean[e1] + (x2 − x̄)Mean[e2] + (x3 − x̄)Mean[e3]]

Since Mean[e1] = Mean[e2] = Mean[e3] = 0, the last term equals 0. So, as shown in figure 6.1, we have
Mean[bx] = βx
Consequently the ordinary least squares (OLS) estimation procedure for the value of the coef-
ficient is unbiased. In any one repetition of the experiment, the mean (center) of the probability
distribution equals the actual value of the coefficient. The estimation procedure does not sys-
tematically overestimate or underestimate the actual coefficient value, βx. If the probability
distribution is symmetric, the chances that the estimate calculated from one quiz will be too high
equal the chances that it will be too low.
We can use the Distribution of Coefficient Estimates simulation in our Econometrics Lab to
replicate the quiz many, many times. But in reality, Clint only has information from one quiz,
the first quiz. How then can a simulation be useful? The relative frequency interpretation of
Figure 6.1
Probability distribution of coefficient estimates (centered at Mean[bx] = βx)
probability provides the answer. The relative frequency interpretation of probability tells us that
the distribution of the numerical values after many, many repetitions of the experiments mirrors
the probability distribution of one repetition. Consequently repeating the experiment many, many
times reveals the probability distribution for the one quiz:
Distribution of the numerical values after many, many repetitions ↔ Probability distribution of one repetition
We can use the simulation to check the algebra we used to derive the equation for the mean of
the coefficient estimate’s probability distribution:
Mean[bx] = βx
If our algebra is correct, the mean (average) of the estimated coefficient values should equal the
actual value of the coefficient, βx, after many, many repetitions (see figure 6.2).
Figure 6.2
Distribution of coefficient estimates simulation
Recall that a simulation allows us to do something that we cannot do in the real world. In the
simulation, we can specify the actual values of the constant and coefficient, βConst and βx. The
default setting for the actual coefficient value is 2. Be certain that the Pause checkbox is checked.
Click Start. Record the numerical value of the coefficient estimate for the first repetition. Click
Continue to simulate the second quiz. Record the value of the coefficient estimate for the second
repetition and calculate the mean and variance of the numerical estimates for the first two repeti-
tions. Note that your calculations agree with those provided by the simulation. Click Continue
again to simulate the third quiz. Calculate the mean and variance of the numerical estimates for
the first three repetitions. Once again, note that your calculations and the simulation’s calcula-
tions agree. Continue to click Continue until you are convinced that the simulation is calculating
the mean and variance of the numerical values for the coefficient estimates correctly.
Now clear the Pause checkbox and click Continue. The simulation no longer pauses after
each repetition. After many, many repetitions click Stop.
Question: What does the mean (average) of the coefficient estimates equal?
Answer: It equals about 2.0.
This lends support to the equation for the mean of the coefficient estimate’s probability distribu-
tion that we just derived (table 6.2). Now change the actual coefficient value from 2 to 4. Click
Start, and then after many, many repetitions click Stop. What does the mean (average) of the
estimates equal? Next, change the actual coefficient value to 6 and repeat the process.
Note that in all cases the mean (average) of the estimates for the coefficient value equals the
actual value of the coefficient after many, many repetitions (figure 6.3).
The simulations confirm our algebra. The estimation procedure does not systematically under-
estimate or overestimate the actual value of the coefficient. The ordinary least squares (OLS)
estimation procedure for the coefficient value is unbiased.
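The lab’s check can be reproduced with a small Monte Carlo sketch in Python (ours; it uses normally distributed error terms with variance 500, the lab’s default, and far fewer repetitions than the table below):

import random

random.seed(3)
beta_const, beta_x = 50, 2
x = [5, 15, 25]
x_bar = sum(x) / 3
den = sum((xt - x_bar) ** 2 for xt in x)      # 200

estimates = []
for _ in range(100_000):                      # many, many repetitions
    y = [beta_const + beta_x * xt + random.gauss(0, 500 ** 0.5) for xt in x]
    y_bar = sum(y) / 3
    estimates.append(sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / den)

print(round(sum(estimates) / len(estimates), 2))   # close to 2.0, the actual value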
Table 6.2
Distribution of Coefficient Estimates simulation results

Actual value, βx   Equation: mean of coef estimate prob dist, Mean[bx]   Simulation repetitions   Simulation: mean (average) of estimated coef values, bx, from the experiments
2                  2                                                     >1,000,000               ≈ 2.0
4                  4                                                     >1,000,000               ≈ 4.0
6                  6                                                     >1,000,000               ≈ 6.0
Figure 6.3
Histogram of coefficient value estimates (centered at the actual value, 2)
Next we turn our attention to the variance of the coefficient estimate’s probability distribution.
To derive the equation for the variance, begin by reviewing the arithmetic of variances:
• Variance of a constant times a variable: Var[cx] = c2 Var[x]
• Variance of the sum of a variable and a constant: Var[c + x] = Var[x]
• Variance of the sum of two independent variables: Var[x + y] = Var[x] + Var[y]
Focus on the first two standard ordinary least squares (OLS) premises:
• Error term equal variance premise: Var[e1] = Var[e2] = Var[e3] = Var[e].
• Error term/error term independence premise: The error terms are independent; that is, Cov[et,
ej] = 0 whenever t ≠ j.
Therefore

Var[bx] = Var[βx + ((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Var[c + x] = Var[x]:

= Var[((x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3) / ((x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²)]

Applying Var[cx] = c² Var[x]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) Var[(x1 − x̄)e1 + (x2 − x̄)e2 + (x3 − x̄)e3]

Error term/error term independence premise, Var[x + y] = Var[x] + Var[y]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [Var[(x1 − x̄)e1] + Var[(x2 − x̄)e2] + Var[(x3 − x̄)e3]]

Applying Var[cx] = c² Var[x]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [(x1 − x̄)² Var[e1] + (x2 − x̄)² Var[e2] + (x3 − x̄)² Var[e3]]

Error term equal variance premise, Var[e1] = Var[e2] = Var[e3] = Var[e]:

= (1 / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]²) [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²] Var[e]
Simplifying:

= Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

That is,

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
The variance of the coefficient estimate’s probability distribution equals the variance of the error
term’s probability distribution divided by the sum of squared x deviations.
We will now use the Distribution of Coefficient Estimates simulation to check the equation that
we just derived for the variance of the coefficient estimate’s probability distribution (figure 6.4).
The simulation automatically spreads the x values uniformly between 0 and 30. We will continue
to consider three observations; accordingly, the x values are 5, 15, and 25. To convince yourself
of this, be certain that the Pause checkbox is checked. Click Start and then Continue a few
times to observe that the values of x are always 5, 15, and 25.
Figure 6.4
Distribution of Coefficient Estimates simulation
Next recall the equation we just derived for the variance of the coefficient estimate’s probabil-
ity distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
By default, the variance of the error term’s probability distribution is 500; therefore the numerator
equals 500. Let us turn our attention to the denominator, the sum of squared x deviations. We
have just observed that the x values are 5, 15, and 25. Their mean is 15 and their sum of squared
deviations from the mean is 200:
x̄ = (x1 + x2 + x3)/3 = (5 + 15 + 25)/3 = 45/3 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         5    15   −10      (−10)² = 100
2         15   15   0        (0)² = 0
3         25   15   10       (10)² = 100
                             Sum = 200
Σ_{t=1}^{T} (xt − x̄)² = 200

That is,

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
To show that the simulation confirms this, be certain that the Pause checkbox is cleared
and click Continue. After many, many repetitions click Stop. Indeed, after many, many repeti-
tions of the experiment the variance of the numerical values is about 2.50. The simulation
confirms the equation we derived for the variance of the coefficient estimate’s probability
distribution.
6.7 Estimation Procedures and the Estimate’s Probability Distribution: Importance of the Mean
(Center) and Variance (Spread)
Let us review what we learned about estimation procedures when we studied Clint’s opinion
poll in chapter 3:
• Importance of the probability distribution’s mean: Formally, an estimation procedure is
unbiased whenever the mean (center) of the estimate’s probability distribution equals the actual
value. The relative frequency interpretation of probability provides intuition: If the experiment
were repeated many, many times the average of the numerical values of the estimates will equal
the actual value. An unbiased estimation procedure does not systematically underestimate or
overestimate the actual value. If the probability distribution is symmetric, the chances that the
estimate calculated from one repetition of the experiment will be too high equal the chances the
estimate will be too low (figure 6.5).
• Importance of the probability distribution’s variance: When the estimation procedure is
unbiased, the variance of the estimate’s probability distribution’s variance (spread) reveals the
estimate’s reliability; the variance tells us how likely it is that the numerical value of the estimate
calculated from one repetition of the experiment will be close to the actual value (figure 6.6).
When the estimation procedure is unbiased, the variance of the estimate’s probability distribu-
tion determines reliability.
• On the one hand, as the variance decreases, the probability distribution becomes more tightly
cropped around the actual value making it more likely for the estimate to be close to the actual
value.
• On the other hand, as the variance increases, the probability distribution becomes less tightly
cropped around the actual value making it less likely for the estimate to be close to the actual
value.
Figure 6.5
Probability distribution of estimates—Importance of the mean (the distribution is centered on the actual value)
Figure 6.6
Probability distribution of estimates—Importance of the variance (left panel: large variance, estimate is unreliable; right panel: small variance, estimate is reliable)
We will focus on the variance of the coefficient estimate’s probability distribution to explain
what influences its reliability. We will consider three factors:
• Variance of the error term’s probability distribution
• Sample size
• Range of the x’s
6.8.1 Estimate Reliability and the Variance of the Error Term’s Probability Distribution
What is our intuition here? The error term represents the random influences. It is the error term
that introduces uncertainty into the mix. On the one hand, as the variance of the error term’s
probability distribution increases, uncertainty increases; consequently the available information
becomes less reliable, and we would expect the coefficient estimate to become less reliable. On
the other hand, as the variance of the error term’s probability distribution decreases, the available
information becomes more reliable, and we would expect the coefficient estimate to become
more reliable.
To justify this intuition, recall the equation for the variance of the coefficient estimate’s prob-
ability distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]
The variance of the coefficient estimate’s probability distribution is directly proportional to the
variance of the error term’s probability distribution.
We will use the Distribution of Coefficient Estimates simulation to confirm the role played by
the variance of the error term’s probability distribution. To do so, check the From–To checkbox.
Two lists now appear: a From list and a To list. Initially, 1.0 is selected in the From list and 3.0
in the To list. Consequently the simulation will report the percent of repetitions in which the
coefficient estimate falls between 1.0 and 3.0. Since the default value for the actual coefficient,
βx, equals 2.0, the simulation reports on the percent of repetitions in which the coefficient esti-
mate falls within 1.0 of the actual value. The simulation reports the percent of repetitions in
which the coefficient estimate is “close to” the actual value where “close to” is considered to
be within 1.0.
By default, the variance of the error term’s probability distribution equals 500 and the sample
size equals 3. Recall that the sum of the squared x deviations equals 200; therefore the variance of the coefficient estimate's probability distribution equals 2.50:

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
Be certain that the Pause checkbox is cleared. Click Start, and then after many, many repetitions,
click Stop. As table 6.3 reports, the coefficient estimate lies within 1.0 of the actual coefficient
value in 47.3 percent of the repetitions.
Now reduce the variance of the error term's probability distribution from 500 to 50. The variance of the coefficient estimate's probability distribution now equals 0.25:

Var[bx] = 50/200 = 0.25
Click Start, and then after many, many repetitions click Stop. The histogram of the coefficient
estimates is now more closely cropped around the actual value, 2.0. The percent of repetitions
Table 6.3
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
in which the coefficient estimate lies within 1.0 of the actual coefficient value rises from 47.3
percent to 95.5 percent.
Why is this increase important? The variance measures the spread of the probability distribu-
tion. This is important when the estimation procedure is unbiased. As the variance decreases,
the probability distribution becomes more closely cropped around the actual coefficient value
and the chances that the coefficient estimate obtained from one quiz will lie close to the actual
value increase. The simulation confirms this; after many, many repetitions the percent of repeti-
tions in which the coefficient estimate lies between 1.0 and 3.0 increases from 47.3 percent to
95.5 percent. Consequently, as the error term’s variance decreases, we can expect the estimate
from one quiz to be more reliable. As the variance of the error term’s probability distribution
decreases, the estimate is more likely to be close to the actual value. This is consistent with our
intuition, is it not?
Next we will investigate the effect of the sample size, the number of observations used to cal-
culate the estimate. Increase the sample size from 3 to 5. What does our intuition suggest? As
we increase the number of observations, we will have more information. With more information
the estimate should become more reliable; that is, with more information the variance of the
coefficient estimate’s probability distribution should decrease. Using the equation, let us now
calculate the variance of the coefficient estimate’s probability distribution when there are 5
observations. With 5 observations the x values are spread uniformly at 3, 9, 15, 21, and 27; the
mean (average) of the x’s, x– , equals 15 and the sum of the squared x deviations equals 360:
x̄ = (x1 + x2 + x3 + x4 + x5)/5 = (3 + 9 + 15 + 21 + 27)/5 = 75/5 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         3    15   −12      (−12)² = 144
2         9    15   −6       (−6)² = 36
3         15   15   0        (0)² = 0
4         21   15   6        (6)² = 36
5         27   15   12       (12)² = 144
                             Sum = 360
Σ_{t=1}^{T} (xt − x̄)² = 360
Applying the equation for the variance of the coefficient estimate's probability distribution obtains
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)² + (x4 − x̄)² + (x5 − x̄)²]

= 50 / [(3 − 15)² + (9 − 15)² + (15 − 15)² + (21 − 15)² + (27 − 15)²]

= 50 / [(−12)² + (−6)² + (0)² + (6)² + (12)²] = 50 / (144 + 36 + 0 + 36 + 144) = 50/360 = 0.1388 . . . ≈ 0.14
The variance of the coefficient estimate’s probability distribution falls from 0.25 to 0.14. The
smaller variance suggests that the coefficient estimate will be more reliable.
Are our intuition and calculations supported by the simulation? The answer is in fact yes.
Note that the sample size has increased from 3 to 5. Click Start, and then after many, many
repetitions click Stop (table 6.4).
After many, many repetitions the percent of repetitions in which the coefficient estimate lies
between 1.0 and 3.0 increases from 95.5 percent to 99.3 percent. As the sample size increases,
we can expect the estimate from one quiz to be more reliable. As the sample size increases, the
estimate is more likely to be close to the actual value.
Let us again begin by appealing to our intuition. As the range of x’s becomes smaller, we are
basing our estimates on less variation in the x’s, less diversity; accordingly we are basing our
Table 6.4
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
estimates on less information. As the range becomes smaller, the estimate should become less
reliable, and consequently the variance of the coefficient estimate’s probability distribution
should increase. To confirm this, increase the minimum value of x from 0 to 10 and decrease
the maximum value from 30 to 20. The five x values are now spread uniformly between 10 and
20 at 11, 13, 15, 17, and 19; the mean (average) of the x’s, x– , equals 15 and the sum of the
squared x deviations equals 40:
x̄ = (x1 + x2 + x3 + x4 + x5)/5 = (11 + 13 + 15 + 17 + 19)/5 = 75/5 = 15
Student   xt   x̄    xt − x̄   (xt − x̄)²
1         11   15   −4       (−4)² = 16
2         13   15   −2       (−2)² = 4
3         15   15   0        (0)² = 0
4         17   15   2        (2)² = 4
5         19   15   4        (4)² = 16
                             Sum = 40
Σ_{t=1}^{T} (xt − x̄)² = 40
Applying the equation for the variance of the coefficient estimate's probability distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = Var[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)² + (x4 − x̄)² + (x5 − x̄)²]

= 50 / [(11 − 15)² + (13 − 15)² + (15 − 15)² + (17 − 15)² + (19 − 15)²]

= 50 / [(−4)² + (−2)² + (0)² + (2)² + (4)²] = 50 / (16 + 4 + 0 + 4 + 16) = 50/40 = 5/4 = 1.25
Table 6.5
Distribution of Coefficient Estimates simulation reliability results (actual values, the probability distribution implied by the equations, and the simulated estimated coefficient values, bx)
The variance of the coefficient estimate’s probability distribution increases from about 0.14
to 1.25.
After changing the minimum value of x to 10 and the maximum value to 20, click the Start,
and then after many, many repetitions click Stop.
After many, many repetitions the percent of repetitions in which the coefficient estimate lies
between 1.0 and 3.0 decreases from 99.3 percent to 62.8 percent (table 6.5). An estimate from
one repetition will be less reliable. As the range of the x’s decreases, the estimate is less likely
to be close to the actual value.
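The reliability numbers reported above can be approximated without running the lab, under the assumption that the coefficient estimate's probability distribution is normal (which holds when the error terms themselves are normally distributed). The sketch below, with helper names of our own choosing, computes Var[bx] = Var[e]/Σ_{t=1}^{T}(xt − x̄)² for each setting discussed and the implied chance that bx lands within 1.0 of the actual value.

import math
import numpy as np

def var_bx(x_values, var_e):
    # Variance of the coefficient estimate: Var[e] divided by the sum of squared x deviations
    x = np.asarray(x_values, dtype=float)
    return var_e / ((x - x.mean()) ** 2).sum()

settings = [
    ("Var[e] = 500, x = 5, 15, 25",          [5, 15, 25],          500.0),
    ("Var[e] = 50,  x = 5, 15, 25",          [5, 15, 25],          50.0),
    ("Var[e] = 50,  x = 3, 9, 15, 21, 27",   [3, 9, 15, 21, 27],   50.0),
    ("Var[e] = 50,  x = 11, 13, 15, 17, 19", [11, 13, 15, 17, 19], 50.0),
]

for label, x, var_e in settings:
    v = var_bx(x, var_e)
    # If bx is normally distributed around the actual value, P(|bx - actual| < 1.0) = erf(1/sqrt(2v))
    within = math.erf(1.0 / math.sqrt(2.0 * v))
    print(f"{label}: Var[bx] = {v:.2f}, chance within 1.0 of actual = {within:.1%}")

These come out to roughly 2.50 and 47 percent, 0.25 and 95 percent, 0.14 and 99 percent, and 1.25 and 63 percent, in line with the simulation results reported above.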
Our simulation results illustrate relationships between information, the variance of the coefficient
estimate’s probability distribution, and the reliability of an estimate:
More and/or more reliable information → variance of the coefficient estimate's probability distribution smaller → estimate more reliable; more likely the estimate is "close to" the actual value.

Less and/or less reliable information → variance of the coefficient estimate's probability distribution larger → estimate less reliable; less likely the estimate is "close to" the actual value.
In chapter 5 we introduced the mechanics of the ordinary least squares (OLS) estimation pro-
cedure and in this chapter we analyzed the procedure’s properties. Why have we devoted so
much attention to this particular estimation procedure? The reason is straightforward. When the
standard ordinary least squares (OLS) premises are satisfied, no other linear estimation procedure
produces more reliable estimates. In other words, the ordinary least squares (OLS) estimation
procedure is the best linear unbiased estimation procedure (BLUE). Let us now explain this
more carefully.
If an estimation procedure is the best linear unbiased estimation procedure (BLUE), it must
exhibit three properties:
• The estimate must be a linear function of the dependent variable, the yt’s.
• The estimation procedure must be unbiased; that is, the mean of the estimate’s probability
distribution must equal the actual value.
• No other linear unbiased estimation procedure can be more reliable; that is, the variance of
the estimate’s probability distribution when using any other linear unbiased estimation procedure
cannot be less than the variance when the best linear unbiased estimation procedure is used.
The Gauss–Markov theorem proves that the ordinary least squares (OLS) estimation procedure
is the best linear unbiased estimation procedure.2 We will illustrate the theorem by describing
two other linear unbiased estimation procedures that while unbiased, are not as reliable as the
ordinary least squares (OLS) estimation procedure. Note that while we would never use either
of these estimation procedures to do serious analysis, they are useful pedagogical tools. They
allow us to illustrate what we mean by the best linear unbiased estimation procedure.
We will now consider the Any Two and the Min–Max estimation procedures:
• Any Two estimation procedure: Choose any two points on the scatter diagram (figure 6.7);
draw a straight line through the points. The coefficient estimate equals the slope of this line.
• Min–Max estimation procedure: Choose two specific points on the scatter diagram (figure
6.8); the point with the smallest value of x and the point with the largest value of x; draw a
straight line through the two points. The coefficient estimate equals the slope of this line.
Figure 6.7
Any Two estimation procedure (scatter diagram with x values 3, 9, 15, 21, and 27; a line is drawn through two arbitrarily chosen points)
Figure 6.8
Min–Max estimation procedure (scatter diagram with x values 3, 9, 15, 21, and 27; a line is drawn through the points with the smallest and largest values of x)
Table 6.6
BLUE simulation results, sample size = 5 (actual values and the simulated estimated coefficient values, bx, for each estimation procedure)
Econometrics Lab 6.6: Comparing the Ordinary Least Squares (OLS), Any Two, and Min–Max
Estimation Procedures
We will now use the BLUE simulation in our Econometrics Lab to justify our emphasis on the
ordinary least squares (OLS) estimation procedure.
By default, the sample size equals 5 and the variance of the error term’s probability distribution
equals 500. The From–To values are specified as 1.0 and 3.0 (table 6.6).
Initially the ordinary least squares (OLS) estimation procedure is specified. Be certain that
the Pause checkbox is cleared. Click Start, and then after many, many repetitions click Stop.
For the OLS estimation procedure, the average of the estimated coefficient values equals about
2.0 and the variance 1.4; 60.4 percent of the estimates lie within 1.0 of the actual value. Next
select the Any Two estimation procedure instead of OLS. Click Start, and then after many, many
repetitions click Stop. For the Any Two estimation procedure, the average of the estimated coef-
ficient values equals about 2.0 and the variance 14.0; 29.0 percent of the estimates lie within
1.0 of the actual value. Repeat the process one last time after selecting the Min–Max estimation
procedure; the average equals about 2.0 and the variance 1.7; 55.2 percent of the estimates lie
within 1.0 of the actual value.
Let us summarize:
• In all three cases the average of the coefficient estimates equals 2.0, the actual value; after many,
many repetitions the mean (average) of the estimates equals the actual value. Consequently all
three estimation procedures for the coefficient value appear to be unbiased.
• The variance of the coefficient estimate’s probability distribution is smallest when the ordinary
least squares (OLS) estimation procedure is used. Consequently the ordinary least squares (OLS)
estimation procedure produces the most reliable estimates.
What we have just observed can be generalized. When the standard ordinary least squares (OLS)
regression premises are met, the ordinary least squares (OLS) estimation procedure is the best
linear unbiased estimation procedure because no other linear unbiased estimation procedure
produces estimates that are more reliable.
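The BLUE simulation itself lives in the Econometrics Lab, but a rough Monte Carlo sketch of the same comparison can be written directly. It is ours, not the lab's: it assumes normally distributed error terms, an arbitrary constant of 50, an actual coefficient of 2, Var[e] = 500, and the five x values 3, 9, 15, 21, and 27.

import numpy as np

rng = np.random.default_rng(1)

x = np.array([3.0, 9.0, 15.0, 21.0, 27.0])    # sample size 5
beta_const, beta_x, var_e = 50.0, 2.0, 500.0  # beta_const is an arbitrary choice
reps = 100_000

ols = np.empty(reps)
any_two = np.empty(reps)
min_max = np.empty(reps)
dev_x = x - x.mean()

for r in range(reps):
    y = beta_const + beta_x * x + rng.normal(0.0, np.sqrt(var_e), size=x.size)
    ols[r] = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
    i, j = rng.choice(x.size, size=2, replace=False)       # any two distinct observations
    any_two[r] = (y[j] - y[i]) / (x[j] - x[i])
    min_max[r] = (y[-1] - y[0]) / (x[-1] - x[0])            # smallest and largest x

for name, est in [("OLS", ols), ("Any Two", any_two), ("Min-Max", min_max)]:
    close = np.mean(np.abs(est - beta_x) <= 1.0)
    print(f"{name}: mean {est.mean():.2f}, variance {est.var():.2f}, within 1.0: {close:.1%}")

All three procedures average about 2.0, while the variances come out near 1.4 for OLS, 14 for Any Two, and 1.7 for Min–Max, echoing the simulation results summarized above.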
Chapter 6 Exercises
1. Assume that the standard ordinary least squares (OLS) premises are met. Let (xi, yi) and (xj,
yj) be the values of the explanatory and dependent variables from two different observations.
Also let
bSlope = slope of the straight line connecting the two points representing these two
observations
a. Express bSlope in terms of xi, yi, xj, and yj.
Consider the simple regression model and two different observations, i and j:
yi = βConst + βxxi + ei
yj = βConst + βxxj + ej
b. Using the simple regression model, substitute for yi and yj in the expression for bSlope.
(Assume that xi does not equal xj.)
c. What does the mean of bSlope’s probability distribution, Mean[bSlope], equal?
d. What does the variance of bSlope’s probability distribution, Var[bSlope], equal?
2. Assume that the standard ordinary least squares (OLS) premises are met. Consider the Min–
Max estimation procedure that we simulated in Econometrics Lab 6.6. Assume that:
• The actual coefficient equals 2 (βx = 2).
• The variance of the error term’s probability distribution equals 500 (Var[e] = 500).
• The sample size equals 5, and the values of the x’s equal: 3, 9, 15, 21, and 27.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
4. Using statistical software, generate a new variable that expresses crude oil production in
thousands of gallons per day rather than thousands of barrels per day. Call the new variable
OilProdGallons. Note that there are 42 gallons in 1 barrel.
OilProdGallons = OilProdBarrels*42
Based on this OilProdGallons regression, estimate the effect of a $1 increase in price on the
gallons of oil produced.
c. Do the units in which the dependent variable is measured influence the estimate of how
the explanatory variable affects the dependent variable?
5. Using statistical software, generate a new variable that adds the constant 1,000 to OilProd-
Barrels in every year. Call the new variable OilProdBarrels1000.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999.

GasConst          US gasoline consumption in year t (millions of gallons per day)
PriceDollarst     Real price of gasoline in year t (2000 dollars per gallon)
6. Using statistical software, generate a new variable that expresses the price of gasoline in cents
rather than dollars. Call this new variable PriceCents.
Based on this PriceCents regression, estimate the effect that a 1 cent increase in price has on
the gallons of gasoline demanded.
c. Do the units in which the explanatory variable is measured influence the estimate of how
the explanatory variable affects the dependent variable?
7. Generate a new variable that equals the dollar price of gasoline, PriceDollars, plus 2. Call
this new variable PriceDollarsPlus2.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)²
bx is expressed in terms of the x’s and y’s. We wish to express bx in terms of the x’s, e’s, and βx.
Strategy: Focus on the numerator of the expression for bx and substitute for the y’s to express
the numerator in terms of the x’s, e’s, and βx. As we will shortly show, once we do this, our goal
will be achieved.
We begin with the numerator, Σ_{t=1}^{T} (yt − ȳ)(xt − x̄), and substitute βConst + βx xt + et for yt:

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = Σ_{t=1}^{T} (βConst + βx xt + et − ȳ)(xt − x̄)

Rearranging terms:

= Σ_{t=1}^{T} (βConst − ȳ + βx xt + et)(xt − x̄)

= Σ_{t=1}^{T} (βConst + βx x̄ − ȳ + βx xt − βx x̄ + et)(xt − x̄)

Simplifying:

= Σ_{t=1}^{T} [(βConst + βx x̄ − ȳ) + βx(xt − x̄) + et](xt − x̄)

= Σ_{t=1}^{T} (βConst + βx x̄ − ȳ)(xt − x̄) + Σ_{t=1}^{T} βx(xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

= (βConst + βx x̄ − ȳ) Σ_{t=1}^{T} (xt − x̄) + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et
Focus on the sum Σ_{t=1}^{T} (xt − x̄) that appears in the first term:

Σ_{t=1}^{T} (xt − x̄) = Σ_{t=1}^{T} xt − Σ_{t=1}^{T} x̄

Replacing Σ_{t=1}^{T} x̄ with Tx̄:

= Σ_{t=1}^{T} xt − Tx̄

Since x̄ = Σ_{t=1}^{T} xt / T:

= Σ_{t=1}^{T} xt − T(Σ_{t=1}^{T} xt / T)

Simplifying:

= Σ_{t=1}^{T} xt − Σ_{t=1}^{T} xt = 0
Next return to the expression for the numerator, Σ_{t=1}^{T} (yt − ȳ)(xt − x̄):

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = (βConst + βx x̄ − ȳ) Σ_{t=1}^{T} (xt − x̄) + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

Since Σ_{t=1}^{T} (xt − x̄) = 0, the first term vanishes:

= 0 + βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et

Therefore

Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) = βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et
Substituting this expression for the numerator into the equation for bx:

bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)²

= [βx Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et] / Σ_{t=1}^{T} (xt − x̄)²

= βx Σ_{t=1}^{T} (xt − x̄)² / Σ_{t=1}^{T} (xt − x̄)² + Σ_{t=1}^{T} (xt − x̄)et / Σ_{t=1}^{T} (xt − x̄)²

= βx + Σ_{t=1}^{T} (xt − x̄)et / Σ_{t=1}^{T} (xt − x̄)²
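As a quick sanity check on this algebra, the following sketch, our own with arbitrary parameter values and normally distributed errors, computes bx both ways for one simulated sample; the two expressions agree up to rounding error.

import numpy as np

rng = np.random.default_rng(2)

x = np.array([5.0, 15.0, 25.0])
beta_const, beta_x = 50.0, 2.0            # arbitrary actual values
e = rng.normal(0.0, 10.0, size=x.size)    # one draw of the error terms
y = beta_const + beta_x * x + e

dev_x = x - x.mean()
b_x_from_data = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
b_x_from_errors = beta_x + (dev_x * e).sum() / (dev_x ** 2).sum()

print(b_x_from_data, b_x_from_errors)     # identical up to floating point rounding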
Gauss–Markov theorem: When the standard ordinary least squares (OLS) premises are satisfied, the ordinary least squares (OLS) estimation procedure is the best linear unbiased estimation procedure.
Proof: Let bxOLS denote the ordinary least squares (OLS) estimate:

bxOLS = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²

where

wtOLS = (xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²
3. To reduce potential confusion, the summation index in the denominator has been changed from t to i.
Let wtOLS equal the ordinary least squares (OLS) "linear weights"; more specifically,

bxOLS = Σ_{t=1}^{T} wtOLS (yt − ȳ)

The weights exhibit two properties:

• Σ_{t=1}^{T} wtOLS = 0

• Σ_{t=1}^{T} wtOLS (xt − x̄) = 1
First, Σ_{t=1}^{T} wtOLS = 0:

Σ_{t=1}^{T} wtOLS = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²]

= Σ_{t=1}^{T} (xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt − Σ_{t=1}^{T} x̄] / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt − Tx̄] / Σ_{i=1}^{T} (xi − x̄)²

and since x̄ = Σ_{t=1}^{T} xt / T,

= [Σ_{t=1}^{T} xt − T(Σ_{t=1}^{T} xt / T)] / Σ_{i=1}^{T} (xi − x̄)²

Simplifying:

= [Σ_{t=1}^{T} xt − Σ_{t=1}^{T} xt] / Σ_{i=1}^{T} (xi − x̄)² = 0
Second, Σ_{t=1}^{T} wtOLS (xt − x̄) = 1:

Σ_{t=1}^{T} wtOLS (xt − x̄) = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²](xt − x̄)

Simplifying:

= Σ_{t=1}^{T} (xt − x̄)² / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} (xt − x̄)²] / Σ_{i=1}^{T} (xi − x̄)² = 1
Next consider a new linear estimation procedure whose weights are wtOLS + w′t. Only when each w′t equals 0 will this procedure be identical to the ordinary least squares (OLS) estimation procedure. Let b′x equal the coefficient estimate calculated using this new linear estimation procedure:

b′x = Σ_{t=1}^{T} (wtOLS + w′t)yt

Substituting yt = βConst + βx xt + et:

= Σ_{t=1}^{T} (wtOLS + w′t)(βConst + βx xt + et)
Multiplying through:

= Σ_{t=1}^{T} (wtOLS + w′t)βConst + Σ_{t=1}^{T} (wtOLS + w′t)βx xt + Σ_{t=1}^{T} (wtOLS + w′t)et

Factoring out βConst from the first term and βx from the second:

= βConst[Σ_{t=1}^{T} wtOLS + Σ_{t=1}^{T} w′t] + βx[Σ_{t=1}^{T} wtOLS xt + Σ_{t=1}^{T} w′t xt] + Σ_{t=1}^{T} (wtOLS + w′t)et

since Σ_{t=1}^{T} wtOLS = 0 and Σ_{t=1}^{T} wtOLS xt = 1:

Therefore

b′x = βConst Σ_{t=1}^{T} w′t + βx[1 + Σ_{t=1}^{T} w′t xt] + Σ_{t=1}^{T} (wtOLS + w′t)et

Now calculate the mean of the new estimate's probability distribution, Mean[b′x]. Focusing on the last term, since the error terms represent random influences, Mean[et] = 0; hence the last term contributes nothing to the mean. For the new procedure to be unbiased, the mean of its probability distribution must equal the actual value:

Mean[b′x] = βx
Therefore

Σ_{t=1}^{T} w′t = 0 and Σ_{t=1}^{T} w′t xt = 0
Next turn to the variance of the new estimate's probability distribution. The first two terms of b′x are constants and the error terms are independent, so

Var[b′x] = Σ_{t=1}^{T} Var[(wtOLS + w′t)et] = Σ_{t=1}^{T} (wtOLS + w′t)² Var[e] = [Σ_{t=1}^{T} (wtOLS)² + 2 Σ_{t=1}^{T} wtOLS w′t + Σ_{t=1}^{T} (w′t)²] Var[e]
Now focus on the cross product terms, Σ_{t=1}^{T} wtOLS w′t:

Σ_{t=1}^{T} wtOLS w′t = Σ_{t=1}^{T} [(xt − x̄) / Σ_{i=1}^{T} (xi − x̄)²] w′t
= Σ_{t=1}^{T} (xt − x̄)w′t / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt w′t − Σ_{t=1}^{T} x̄ w′t] / Σ_{i=1}^{T} (xi − x̄)²

= [Σ_{t=1}^{T} xt w′t − x̄ Σ_{t=1}^{T} w′t] / Σ_{i=1}^{T} (xi − x̄)²

since Σ_{t=1}^{T} xt w′t = 0 and Σ_{t=1}^{T} w′t = 0,

= (0 − 0) / Σ_{i=1}^{T} (xi − x̄)² = 0
Therefore, since Σ_{t=1}^{T} wtOLS w′t = 0, the cross product terms drop out:

Var[b′x] = [Σ_{t=1}^{T} (wtOLS)² + Σ_{t=1}^{T} (w′t)²] Var[e] = Var[bxOLS] + Σ_{t=1}^{T} (w′t)² Var[e]

The variance of any other linear unbiased estimation procedure can be no less than the variance of the ordinary least squares (OLS) procedure; the two variances are equal only when every w′t equals 0, that is, only when the new procedure is the ordinary least squares (OLS) procedure itself.
Chapter 7 Outline
7.1 Review
7.1.1 Clint’s Assignment
7.1.2 General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
7.1.3 Importance of the Coefficient Estimate’s Probability Distribution
7.2 Strategy to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
7.3 Step 1: Estimate the Variance of the Error Term’s Probability Distribution
7.3.1 First Attempt: Variance of the Error Term’s Numerical Values
7.3.2 Second Attempt: Variance of the Residual’s Numerical Values
7.3.3 Third Attempt: “Adjusted” Variance of the Residual’s Numerical Values
7.4 Step 2: Use the Estimated Variance of the Error Term’s Probability Distribution to Estimate
the Variance of the Coefficient Estimate’s Probability Distribution
b. In reality, we do not know the actual value of the constant and coefficient. We used the
ordinary least squares (OLS) estimation procedure to estimate their values. The estimated
constant was 63 and the estimated value of the coefficient was 6/5. Fill in the blanks below
to calculate each student’s residual and the residual squared. Then, compute the sum of the
squared residuals.
bConst = 63, bx = 6/5 = 1.2, Rest = yt − (bConst + bxxt)

Student   xt   yt   Estyt = 63 + (6/5)xt      Rest 1st quiz   Res²t 1st quiz
1         5    66   63 + (6/5) × ___ = ____   _____           _____
2         15   87   63 + (6/5) × ___ = ____   _____           _____
3         25   90   63 + (6/5) × ___ = ____   _____           _____
                                                              Sum = _____
c. Compare the sum of squared errors with the sum of squared residuals.
d. In general, when applying the ordinary least squares (OLS) estimation procedure could
the sum of squared residuals ever exceed the sum of squared errors? Explain.
3. Suppose that student 2 had missed Professor Lord’s quiz.
Student xt yt
1 5 66 xt = minutes studied
3 25 90 yt = quiz score
7.1 Review
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz scores.
7.1.2 General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
An estimate’s probability distribution describes the general properties of the estimation proce-
dure. In the last chapter we showed that when the standard ordinary least squares (OLS) premises
are met, the mean of the coefficient estimate’s probability distribution equals the actual value,
βx, and the variance equals the variance of the error term’s probability distribution divided by
the sum of the squared x deviations, Var[e] / Σ_{t=1}^{T} (xt − x̄)²:
Mean[bx] = βx,   Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
Let us now review the importance of the mean and variance. In general, the mean and variance
of the coefficient estimate’s probability distribution play important roles:
• Mean: When the mean of the estimate’s probability distribution equals the actual value the
estimation procedure is unbiased. An unbiased estimation procedure does not systematically
underestimate or overestimate the actual value.
• Variance: When the estimation procedure is unbiased, the variance of the estimate’s prob-
ability distribution determines the reliability of the estimate. As the variance decreases, the
probability distribution becomes more tightly cropped around the actual value making it more
likely for the coefficient estimate to be close to the actual value:
We can apply these general concepts to the ordinary least squares (OLS) estimation procedure.
The mean of the coefficient estimate’s probability distribution, Mean[bx], equals the actual value
of the coefficient, βx; consequently the ordinary least squares (OLS) estimation procedure is
unbiased. The variance of the coefficient estimate’s probability distribution is now important;
the variance determines the reliability of the estimate. What does the variance equal? We derived
the equation for the variance:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
But neither Clint nor we know the variance of the error term’s probability distribution, Var[e].
How then can the variance of the coefficient estimate's probability distribution be calculated?
How can Clint proceed? When Clint was faced with a similar problem before,
what did he do? Clint used the econometrician’s philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
What information does Clint have? Clint has the data from Professor Lord’s first quiz
(table 7.1).
Table 7.1
First quiz results

Student   Minutes studied (x)   Quiz score (y)
1         5                     66
2         15                    87
3         25                    90
How can Clint use this information to estimate the variance of the coefficient estimate’s probabil-
ity distribution?
7.2 Strategy to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
Clint needs a procedure to estimate the variance of the coefficient estimate’s probability distribu-
tion. Ideally this procedure should be unbiased. That is, it should not systematically underesti-
mate or overestimate the actual variance. His approach will be based on the relationship between
the variance of the coefficient estimate’s probability distribution and the variance of the error
term’s probability distribution that we derived in chapter 6:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
Clint’s strategy is to replace the actual variances in this equation with estimated variances:
EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²
where
Step 1: Clint estimates the variance of the error term’s probability distribution.
Step 2: Clint uses the estimate for the variance of the error term’s probability distribution to
estimate the variance for the coefficient estimate’s probability distribution.
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between the
probability distribution from the available variances of coefficient estimate’s and
information—data from the first quiz error term’s probability distributions
↓
Var[ e]
Var[ bx ] =
∑
T
EstVar[e] t =1
( xt − x ) 2
EstVar[ e]
EstVar[ bx ] =
∑
T
t =1
( xt − x ) 2
7.3 Step 1: Estimating the Variance of the Error Term’s Probability Distribution
The data from Professor Lord’s first quiz is the only available information that Clint can use to
estimate the variance of the error term’s probability distribution, Var[e].
We will now describe three attempts to estimate the variance using the results of Professor
Lord’s first quiz by calculating the following:
1. Variance of the error term’s numerical values from the first quiz.
2. Variance of the residual’s numerical values from the first quiz.
3. “Adjusted” variance of the residual’s numerical values from the first quiz.
In each case we will use simulations to assess these attempts by exploiting the relative frequency
interpretation of probability:
Relative frequency interpretation of probability: After many, many repetitions of the experiment,
the distribution of the numerical values from the experiments mirrors the random variable’s
probability distribution; the two distributions are identical:
After many, many repetitions: distribution of the numerical values = probability distribution
The first two attempts fail. Nevertheless, they provide the motivation for the third attempt which
succeeds. Even though the first two attempts fail, it is instructive to explore them.
7.3.1 First Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
Variance of the Error Term’s Numerical Values from the First Quiz
In reality Clint cannot observe the actual parameters, βConst and βx, but for the moment, assume
that we know them. If we were privy to the actual parameters, we would be able to calculate
the actual numerical values of the error terms for each of our three students from the first quiz:
Student 1’s error term Student 2’s error term Student 3’s error term
↓ ↓ ↓
e1 = y1 − (βConst + βxx1) e2 = y2 − (βConst + βxx2) e3 = y3 − (βConst + βxx3)
How could we use these three numerical values for the error terms from the first quiz to
estimate the variance of the error term’s probability distribution? Why not calculate the variance
of the numerical values of the three error terms and then use that variance to estimate the vari-
ance of the error term’s probability distribution? That is,
Recall that the variance is the average of the squared deviations from the mean:
We address this question by using the Estimating Variances simulation in our Econometrics Lab (figure 7.1):
Figure 7.1
Variance of the error term’s probability distribution simulation
By selecting “Err” in the “Use” line and “T” in the “Divide by” line, the simulation mimics the
procedure that we just described to estimate the variance of the error term’s probability distribu-
tion. Also note that the actual variance of the error term’s probability distribution equals 500 by
default.
Be certain that the Pause checkbox is checked and click Start. The simulation reports the sum
of squared errors (SSE) and the estimate for variance of the error term’s probability distribution
(Error Var Est) based on the data for the first repetition:
EstVar[e] = Var[e1, e2, and e3 for 1st repetition] = Sum of squared errors for 1st repetition / T
Convince yourself that the simulation is calculating EstVar[e] correctly by applying the proce-
dure we just outlined. Then click Continue to simulate a second quiz. The simulation now
reports on the estimate for variance of the error term’s probability distribution (Error Var Est)
based on the data for the second repetition:
EstVar[e] = Var[e1, e2, and e3 for 2nd repetition] = Sum of squared errors for 2nd repetition / T
Again, convince yourself that the simulation is calculating EstVar[e] by applying the procedure
we outlined. Also the simulation calculates the mean (average) of the two variance estimates;
the mean of the variance estimates is reported in the Mean line directly below Error Var Est.
Convince yourself that the simulation is calculating the mean of the variance estimates
correctly.
Click Continue a few more times. Note that for some repetitions the estimated variance is
less than the actual variance and sometimes the estimate is greater than the actual. Does this
estimation procedure for the variance systematically underestimate or overestimate the actual
variance or is the estimation procedure unbiased? We can apply the relative frequency
interpretation of probability to address this question by comparing the mean (average) of the
variance estimates with the actual variance after many, many repetitions. If the estimation pro-
cedure is unbiased, the mean of the variance estimates will equal the actual variance of the error
term’s probability distribution, 500 in this case, after many, many repetitions:
Clear the Pause checkbox and click Continue; after many, many repetitions click Stop. The
mean of the estimates for the error term’s variance equals about 500, the actual variance. Next
change the actual variance to 200; click Start, and then after many, many repetitions click Stop.
Again, the mean of the estimates approximately equals the actual value. Finally, change the
actual variance to 50 and repeat the procedure (table 7.2).
The simulation illustrates that this estimation procedure does not systematically underestimate
or overestimate the actual variance; that is, this estimation procedure for the variance of the error
term’s probability distribution is unbiased. But does this help Clint? Unfortunately, it does not.
To calculate the error terms we must know the actual value of the constant, βConst, and the actual
value of the coefficient, βx. In a simulation we can specify the actual values of the parameters,
βConst and βx, but neither Clint nor we know the actual values for Professor Lord’s quiz. After
all, if Clint knew the actual value of the coefficient, he would not need to go through the trouble
of estimating it, would he? The whole problem is that Clint will never know what the actual
value equals, that is why he must estimate it. Consequently this estimation procedure does not
help Clint; he lacks the information to perform the calculations. So, what should he do?
Table 7.2
Error Term Variance simulation results—First attempt
7.3.2 Second Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
Variance of the Residual’s Numerical Values from the First Quiz
Clint cannot calculate the actual values of the error terms because he does not know the actual
values of the parameters, βConst and βx. So he decides to do the next best thing. He has already
used the data from the first quiz to estimate the values of βConst and βx.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)² = 240/200 = 6/5 = 1.2

bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
Clint’s estimate of βConst is 63 and βx is 1.2. Consequently, why not use these estimated values
for the constant and coefficient to estimate the numerical values of error terms for the three
students? In other words, just use the residuals to estimate the error terms:
Then use the variance of the three numerical values of the residuals to estimate the variance of
the error term’s probability distribution:
Recall that the variance equals the average of the squared deviations from the mean:
Clint can easily calculate the variance of the estimated errors when using the residuals to do so:
EstVar[e] = Var[Res1, Res2, and Res3 for 1st quiz]

= [(Res1 − Mean[Res])² + (Res2 − Mean[Res])² + (Res3 − Mean[Res])²] / 3

= (Res1² + Res2² + Res3²) / 3

= Sum of squared residuals for 1st quiz / 3 = SSR for 1st quiz / 3 = 54/3 = 18
The good news is that Clint can indeed perform these calculations. He can calculate the
residuals and therefore can estimate the variance of the error term’s probability distribution using
this procedure. Unfortunately, there is also some bad news. This estimation procedure is biased;
it systematically underestimates the variance of the error term’s probability distribution.
1. Using a little algebra, we can in fact show that the mean of the residuals must always equal 0 when we use the
ordinary least squares (OLS) estimation procedure.
Note that the “Res” is selected in the “Use” line, indicating that the variance of the residuals
rather than the error terms will be used to estimate the variance of the error term’s probability
distribution. As before, the actual variance of the error term’s probability distribution is specified
as 500 by default. Be certain that the Pause checkbox is cleared; click Start and after many,
many repetitions click Stop. The mean (average) of the estimates for the variance equals about
167 while the actual variance of the error term is 500. Next select a variance of 200 and then
50 and repeat the process. Convince yourself that this procedure consistently underestimates the
variance.
The mean of the estimates is less than the actual values; this estimation procedure is biased
downward (table 7.3). This estimation procedure systematically underestimates the variance of
the error term’s probability distribution.
Econometrics Lab 7.3: Comparing the Sum of Squared Residuals and the Sum of Squared Errors
To understand why this estimation procedure is biased downward, we will return to the Estimat-
ing Variances simulation.
This time, be certain that the Pause checkbox is checked and then click Start. Note that both
the sum of squared errors and the sum of squared residuals are reported. Which is less in the
first repetition? Click the Continue button to run the second repetition. Which sum is less in
the second repetition? Continue to do this until you recognize the pattern that is emerging. The
sum of squared residuals is always less than the sum of squared errors. Why?
Table 7.3
Error Term Variance simulation results—Second attempt
Recall how bConst and bx were chosen. They were chosen so as to minimize the sum of squared
residuals:
SSR = Res1² + Res2² + Res3² = (y1 − bConst − bxx1)² + (y2 − bConst − bxx2)² + (y3 − bConst − bxx3)²
The sum of squared residuals, Res1² + Res2² + Res3², would equal the actual sum of squared errors, e1² + e2² + e3², only if bConst equaled βConst and bx equaled βx:

Only if bConst = βConst and bx = βx
↓
Res1² + Res2² + Res3² = e1² + e2² + e3²
As a consequence of random influences we can never expect the estimates to equal the actual
values, however. That is, we must expect the sum of squared residuals to be less than the sum
of squared errors:
Typically bConst ≠ βConst and bx ≠ βx
↓
Res1² + Res2² + Res3² < e1² + e2² + e3²
Divide both sides of the inequality by 3 to compare the variance of the Res’s and e’s:
(Res1² + Res2² + Res3²) / 3 < (e1² + e2² + e3²) / 3
↓
Var[Res1, Res2, and Res3] < Var[e1, e2, and e3]
The variance of the residuals will be less than the variance of the actual errors. Recall our
first attempt to estimate the variance of the error term’s probability distribution. When we used
the variance of the actual errors, the procedure was unbiased:
(Res1² + Res2² + Res3²) / 3 < (e1² + e2² + e3²) / 3
↓
Var[Res1, Res2, and Res3]        <        Var[e1, e2, and e3]
↓                                          ↓
Systematically underestimates              Unbiased estimation
the variance                               procedure
Using the variance of the residuals leads to bias because it systematically underestimates the
variance of the error term’s numerical values. So now, what can Clint do?
7.3.3 Third Attempt to Estimate the Variance of the Error Term’s Probability Distribution:
“Adjusted” Variance of the Residual’s Numerical Values from the First Quiz
While we will not provide a mathematical proof, Clint can correct for this bias by calculating
what we will call the “adjusted” variance of the residuals. Instead of dividing the sum of squared
residuals by the sample size, Clint can calculate the adjusted variance by dividing by what are
called the degrees of freedom:
EstVar[ e] = AdjVar[ Res1, Res2, and Res3 for 1st quiz]
Sum of squared residuals for 1st quiz
=
Degrees of freedom
where
The degrees of freedom equal the sample size less the number of estimated parameters. For the
time being, do not worry about precisely what the degrees of freedom represent and why they
solve the problem of bias. We will motivate the rationale later in this chapter. We do not wish
to be distracted from Clint’s efforts to estimate the variance of the error term’s probability dis-
tribution at this time. So let us postpone the rationalization for now. For the moment we will
accept that fact that the degrees of freedom equal 1 in this case:
We subtract 2 because we are estimating the values of 2 parameters: the constant, βConst, and the
coefficient, βx.
Clint has the information necessary to perform the calculations for the adjusted variance of
the residuals. Recall that we have already calculated the sum of squared residuals:
First quiz: bConst = 63, bx = 6/5 = 1.2

Student   xt   yt   Estyt = bConst + bxxt = 63 + (6/5)xt   Rest = yt − Estyt   Res²t
1         5    66   63 + (6/5) × 5 = 69                    66 − 69 = −3        (−3)² = 9
2         15   87   63 + (6/5) × 15 = 81                   87 − 81 = 6         (6)² = 36
3         25   90   63 + (6/5) × 25 = 93                   90 − 93 = −3        (−3)² = 9
                                                           Sum = 0             Sum = 54
So we need only divide the sum, 54, by the degrees of freedom to use the adjusted variance to
estimate the variance of the error term’s probability distribution:
EstVar[e] = AdjVar[Res1, Res2, and Res3 for 1st quiz] = Sum of squared residuals for 1st quiz / Degrees of freedom = 54/1 = 54
We will use the Estimating Variances simulation to illustrate that this third estimation procedure
is unbiased.
In the “Divide by” line, select “T−2” instead of “T.” Since we are estimating two parameters,
the simulation will be dividing by the degrees of freedom instead of the sample size. Initially
the variance of the error term’s probability distribution is specified as 500. Be certain that the
Pause checkbox is cleared; click Start, and then after many, many repetitions click Stop. The
mean (average) of the variance estimates equals about 500, the actual variance. Next repeat the
process by selecting a variance of 200 and then 50. Table 7.4 gives the results. In each case the
mean of the estimates equals the actual value after many, many repetitions. This estimation
procedure proves to be unbiased.
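A short Monte Carlo sketch, ours rather than the lab's, makes the same comparison between the second and third attempts. It assumes normally distributed error terms with Var[e] = 500, an arbitrary constant of 50, an actual coefficient of 2, and the x values 5, 15, and 25.

import numpy as np

rng = np.random.default_rng(3)

x = np.array([5.0, 15.0, 25.0])
beta_const, beta_x, var_e = 50.0, 2.0, 500.0   # beta_const is an arbitrary choice
reps = 200_000
dev_x = x - x.mean()

est_T = np.empty(reps)     # second attempt: divide SSR by the sample size
est_df = np.empty(reps)    # third attempt: divide SSR by the degrees of freedom
for r in range(reps):
    y = beta_const + beta_x * x + rng.normal(0.0, np.sqrt(var_e), size=x.size)
    b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
    b_const = y.mean() - b_x * x.mean()
    res = y - (b_const + b_x * x)
    ssr = (res ** 2).sum()
    est_T[r] = ssr / 3
    est_df[r] = ssr / (3 - 2)

print("Mean of SSR/T estimates:      ", est_T.mean())   # about 167: biased downward
print("Mean of SSR/(T - 2) estimates:", est_df.mean())  # about 500: unbiased

The SSR/T average settles near 167 while the SSR/(T − 2) average settles near 500, the actual variance, mirroring the simulation results reported above.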
Table 7.4
Error Term Variance simulation results—Third attempt
7.4 Step 2: Use the Estimate for the Variance of the Error Term’s Probability Distribution to
Estimate the Variance of the Coefficient Estimate’s Probability Distribution
At last Clint has found an unbiased estimation procedure for the variance of the error term’s
probability distribution:
But why did he need this estimate in the first place? He needs it to estimate the variance of the
coefficient estimate’s probability distribution in order to assess the reliability of the coefficient
estimate. Recall his two-step strategy:
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between
probability distribution from the available the variances of coefficient estimate’s and
information—data from the first quiz error term’s probability distributions
↓
Var[ e]
Var[ bx ] =
∑
T
EstVar[e] t =1
( xt − x ) 2
EstVar[ e]
EstVar[ bx ] =
∑
T
t =1
( xt − x ) 2
A little arithmetic allows Clint to estimate the variance of the coefficient estimate’s probability
distribution:
EstVar[bx] = EstVar[e] / [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]

= 54 / [(5 − 15)² + (15 − 15)² + (25 − 15)²]

= 54 / [(−10)² + (0)² + (10)²] = 54 / (100 + 0 + 100) = 54/200 = 0.27
Recall that the standard deviation is the square root of the variance; hence we can calculate the estimated standard deviation by computing the square root of the estimated variance:

Estimated standard deviation of bx = √EstVar[bx] = √0.27 ≈ 0.52
Step 1: Clint estimates the variance of the error term’s probability distribution.
Step 2: Clint uses the estimate for the variance of the error term’s probability distribution to
estimate the variance for the coefficient estimate’s probability distribution.
Step 1: Estimate the variance of the error term’s Step 2: Apply the relationship between the
probability distribution from the available variances of coefficient estimate’s and error
information—data from the first quiz term’s probability distributions
↓ ↓
EstVar[ e] = AdjVar[ Res’s]
Var[ e]
SSR 54 Var[ bx ] =
∑
T
= = = 54 ( xt − x ) 2
Degrees of freedom 1 t =1
EstVar[ e] 54
EstVar[ bx ] = = = 0.27
∑
T
t =1
( xt − x ) 2 200
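The same two steps can be carried out with a few lines of code. The sketch below simply reproduces Clint's arithmetic on the first-quiz data; the variable names are ours.

import numpy as np

# Professor Lord's first quiz: minutes studied and quiz scores for the three students
x = np.array([5.0, 15.0, 25.0])
y = np.array([66.0, 87.0, 90.0])

dev_x = x - x.mean()
b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()   # 240/200 = 1.2
b_const = y.mean() - b_x * x.mean()                         # 81 - 1.2*15 = 63

res = y - (b_const + b_x * x)                               # residuals: -3, 6, -3
ssr = (res ** 2).sum()                                      # 54
degrees_of_freedom = x.size - 2                             # 3 observations, 2 estimated parameters
est_var_e = ssr / degrees_of_freedom                        # 54
est_var_bx = est_var_e / (dev_x ** 2).sum()                 # 54/200 = 0.27
se_bx = np.sqrt(est_var_bx)                                 # about 0.52

print(b_const, b_x, ssr, est_var_e, est_var_bx, se_bx)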
We have already used a simulation to show that step 1 is justified; that is, we have shown that
the estimation procedure for the variance of the error term’s probability distribution is unbiased.
Now we will justify the step 2 by showing that the estimation procedure for the variance of the
coefficient estimate’s probability distribution is also unbiased. To do so, we will once again
exploit the relative frequency interpretation of probability:
After many, many repetitions: distribution of the numerical values = probability distribution
An estimation procedure is unbiased whenever the mean (average) of the estimated numerical
values equals the actual value after many, many repetitions:
Econometrics Lab 7.5: Is the Estimation Procedure for the Variance of the Coefficient Estimate's
Probability Distribution Unbiased?
We will use our Estimating Variance simulation in the Econometrics Lab to show that this two-
step estimation procedure for the variance of the coefficient estimate’s probability distribution
is unbiased (figure 7.2).
By default, the actual variance of the error term’s probability distribution is 500 and the sample
size is 3. We can now calculate the variance of the coefficient estimate’s probability
distribution:
Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)²
From before recall that the sum of x squared deviations equals 200:
Σ_{t=1}^{T} (xt − x̄)² = 200
and that when the variance of the error term's probability distribution is specified as 500 and the sum of the squared x deviations equals 200, the variance of the coefficient estimate's probability distribution equals 2.50:

Var[bx] = Var[e] / Σ_{t=1}^{T} (xt − x̄)² = 500/200 = 2.50
Let us begin by confirming that the simulation is performing the calculations correctly. Be
certain that the Pause button is checked, and then click Start. The sum of squared residuals and
the sum of squared x deviations are reported for the first repetition. Use this information along
with a pocket calculator to compute the estimate for the variance of the coefficient estimate’s
probability distribution, EstVar[bx]:
The simulation reports, for each repetition, EstVar[e] = SSR / Degrees of freedom and EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²; it also reports the mean (average) of the coefficient variance estimates from all repetitions, which addresses whether the estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased.
Figure 7.2
Variance of the coefficient estimate’s probability distribution simulation
EstVar[bx] = EstVar[e] / Σ_{t=1}^{T} (xt − x̄)²

where

EstVar[e] = SSR / Degrees of freedom
Compare your calculation with the simulation’s estimate. You will discover that they are
identical. Next click Continue and perform the same calculation for the second repetition. Again,
you will discover that the simulation has calculated the estimate for the variance of the coefficient
estimate’s probability distribution correctly. Also confirm that the simulation is computing the
mean of the variance estimates correctly by taking the average of the coefficient variance esti-
mates from the first two repetitions.
Click Continue a few more times. The variance estimate should be less than the actual value,
2.50, in some of the repetitions and greater than the actual value in others. Now the critical
question:
Critical question: After many, many repetitions, will the mean (average) of the variance esti-
mates equal the actual variance of the coefficient estimate’s probability distribution?
If the answer is yes, the variance estimation procedure is unbiased; the procedure is not system-
atically overestimating or underestimating the actual variance. If instead the answer is no, the
variance estimation procedure is biased. To answer this question, clear the Pause checkbox
and click Continue. After many, many repetitions click Stop. What do you observe? After
many, many repetitions the average of the coefficient’s variance estimates indeed equals
about 2.50.
Repeat this process after you change the error term variance to 200 and then to 50. As reported
above, the answer to the critical question is yes in all cases. The estimation procedure for the
variance of the coefficient estimate’s probability distribution is unbiased.
7.5.1 Reviewing Our Second and Third Attempts to Estimate the Variance of the Error Term’s
Probability Distribution
Earlier in this chapter we postponed our explanation of degrees of freedom because it would
have interrupted the flow of our discussion. We will now return to the topic by reviewing Clint’s
efforts to estimate the variance of the error term’s probability distribution. Since Clint can never
observe the actual constant, βConst, and the actual coefficient, βx, he cannot calculate the actual
values of the error terms. He can, however, use his estimates for the constant, bConst, and coef-
ficient, bx, to estimate the errors by calculating the residuals:
We can think of the residuals as the estimated “error terms.” Now let us briefly review our second
and third attempts to estimate the variance of the error term’s probability distribution.
In our second attempt we used the variance of the residuals (“estimated errors”) to estimate
the variance of the error term’s probability distribution. The variance is the average of the squared
deviations from the mean:
Since the residuals are the “estimated errors,” it seemed natural to divide the sum of squared
residuals by the sample size, 3 in Clint’s case. Furthermore, since the Mean[Res] = 0,
But we showed that this procedure was biased; the Estimating Variance simulation revealed that
it systematically underestimated the error term’s variance.
We then modified the procedure; instead of dividing by the sample size, we divided by the
degrees of freedom, the sample size less the number of estimated parameters: EstVar[e] = SSR / Degrees of freedom = SSR/(T − 2).
The Estimating Variances simulation illustrated that this modified procedure was unbiased.
Why does dividing by 1 rather than 3 “work?” That is, why do we subtract 2 from the sample
size when calculating the average of the squared residuals (“estimated errors”)? To provide some
intuition, we will briefly revisit Amherst precipitation in the twentieth century (table 7.5).
Calculating the mean for June obtains

Mean (average) for June = (0.75 + 4.54 + . . . + 7.99) / 100 = 377.76/100 = 3.78
Table 7.5
Monthly precipitation in Amherst, MA, during the twentieth century
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
. . .
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
Each of the 100 Junes in the twentieth century provides one piece of information that we use to
calculate the average. To calculate an average, we divide the sum by the number of pieces of
information.
Key principle: To calculate a mean (an average), we divide the sum by the number of pieces
of information:
Mean (average) = Sum / Number of pieces of information
Hence, to calculate the average of the squared deviations, the variance, we must divide by the
number of pieces of information.
Now let us return to our efforts to estimate the variance of the error term’s probability
distribution:
Claim: The degrees of freedom equal the number of pieces of information that are available to
estimate the variance of the error term’s probability distribution.
To justify this claim, suppose that the sample size were 2. Plot the scatter diagram (figure 7.3):
• With only two observations, we only have two points.
• The best fitting line passes directly through each of the two points on the scatter diagram.
• Consequently the two residuals, “the two estimated errors,” for each observation must always
equal 0 when the sample size is 2 regardless of what the actual variance of the error term’s
probability distribution equals:
The first two residuals, “the first two estimated errors,” provide no information about the actual
variance of the error term’s probability distribution because the line fits the data perfectly—both
residuals equal 0. Only with the introduction of a third observation do we get some sense of the
error term’s variance (figure 7.4).
Figure 7.3
Degrees of freedom—Two observations
Figure 7.4
Degrees of freedom—Three observations (left panel: large residuals suggest a large error term variance; right panel: small residuals suggest a small error term variance)
To summarize:
• The first two observations provide no information about the error term; stated differently, the
first two observations provide “zero” information about the error term’s variance.
• The third observation provides the first piece of information about the error term’s variance.
This explains why Clint should divide by 1 to calculate the “average” of the squared devia-
tions. In general, the degrees of freedom equal the number of pieces of information that we have
to estimate the variance of the error term’s probability distribution:
Degrees of freedom = Sample size − Number of estimated parameters
To calculate the average of the sum of squared residuals, we should divide the sum of squared
residuals by the degrees of freedom, the number of pieces of information.
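To see the two-observation case concretely, here is a tiny sketch of our own, using two arbitrary data points: the fitted line passes through both points, so both residuals equal 0 and provide no information about the error term's variance.

import numpy as np

# With only two observations the best fitting line passes through both points,
# so both residuals equal 0 regardless of how large Var[e] actually is.
x = np.array([5.0, 25.0])
y = np.array([66.0, 90.0])     # any two points would do

dev_x = x - x.mean()
b_x = ((y - y.mean()) * dev_x).sum() / (dev_x ** 2).sum()
b_const = y.mean() - b_x * x.mean()
res = y - (b_const + b_x * x)

print(res)   # [0., 0.] up to floating point rounding: no information about Var[e]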
The ordinary least squares (OLS) estimation procedure actually includes three procedures; a
procedure to estimate the following:
• Regression parameters
• Variance of the error term’s probability distribution
• Variance of the coefficient estimate’s probability distribution
All three estimation procedures are unbiased. (Recall that we are assuming that the standard
ordinary least squares (OLS) premises are met. We will address the importance of these premises
in part IV of the textbook.) We will now review the calculations and then show that statistical
software performs all these calculations for us.
bx = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)² = 240/200 = 6/5 = 1.2,   bConst = ȳ − bx x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
We estimated the variance of the error term's probability distribution, EstVar[e], by dividing the sum of squared residuals by the degrees of freedom:

EstVar[e] = AdjVar[Res1, Res2, Res3] = SSR / Degrees of freedom = 54/1 = 54

The square root of this estimated variance is typically called the standard error of the regression:

SE of regression = √EstVar[e] = √54 ≈ 7.348

Note that the term standard error always refers to the square root of an estimated variance. Next, we applied the relationship between the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution to estimate the former:

EstVar[bx] = EstVar[e] / Σ(xt − x̄)² = 54/200 = 0.27

The square root of this estimated variance of the coefficient estimate's probability distribution is called the standard error of the coefficient estimate, SE[bx]:

SE[bx] = √EstVar[bx] = √0.27 ≈ 0.5196
We illustrated that these estimation procedures have nice properties. When the standard ordi-
nary least squares (OLS) premises are satisfied:
• Each of these procedures is unbiased.
• The procedure to estimate the value of the parameters is the best linear unbiased estimation
procedure (BLUE).
In reality, we did not have to make all these laborious calculations. Statistical software performs
these calculations for us thereby saving us the task of performing the arithmetic (table 7.6):
Professor Lord’s first quiz data: Cross-sectional data of minutes studied and quiz scores in
the first quiz for the three students enrolled in Professor Lord’s class.
Table 7.6
Quiz scores’ regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
We previously noted the regression results report the parameter estimates and the sum of squared
residuals. While statistical software typically does not report the estimated variance of the error
term’s probability distribution, it does report the standard error of the regression, SE of regres-
sion, which is just the square root of the estimated variance of the error term’s probability dis-
tribution. We can easily calculate the estimated variance of the error term’s probability distribution
from the regression results by squaring the standard error of the regression:
EstVar[e] = 7.348469² = 54
Similarly, while the statistical software does not report the estimated variance of the coefficient estimate's probability distribution, it does report its standard error. We can easily calculate the estimated variance of the coefficient estimate's probability distribution from the regression results by squaring the standard error of the coefficient estimate:

EstVar[bx] = SE[bx]² = 0.5196² ≈ 0.27
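For readers who want to verify the arithmetic themselves, the following short Python sketch (an illustration, not part of the textbook) reproduces the calculations summarized above for the first-quiz data: the coefficient and constant estimates, the estimated variance of the error term's probability distribution, the standard error of the regression, and the standard error of the coefficient estimate.

import numpy as np

x = np.array([5.0, 15.0, 25.0])      # minutes studied
y = np.array([66.0, 87.0, 90.0])     # quiz scores

bx = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)   # 240/200 = 1.2
bconst = y.mean() - bx * x.mean()                                            # 81 - 1.2*15 = 63

ssr = np.sum((y - (bconst + bx * x)) ** 2)             # sum of squared residuals = 54
df = len(x) - 2                                        # sample size less number of estimated parameters = 1
est_var_e = ssr / df                                   # EstVar[e] = 54
se_regression = np.sqrt(est_var_e)                     # standard error of the regression, about 7.348
est_var_bx = est_var_e / np.sum((x - x.mean()) ** 2)   # EstVar[bx] = 54/200 = 0.27
se_bx = np.sqrt(est_var_bx)                            # SE[bx], about 0.5196

print(bx, bconst, est_var_e, se_regression, est_var_bx, se_bx)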
Chapter 7 Review Questions

1. Consider the ordinary least squares (OLS) estimation procedure. How is the variance of the
coefficient estimate’s probability distribution related to the variance of the error term’s probabil-
ity distribution?
2. What strategy have we used to estimate the variance of the coefficient estimate’s probability
distribution?
3. Consider our first attempt to estimate the variance of the error term’s probability
distribution:
EstVar[e] = Var[e1, e2, . . . , eT] = (e1² + e2² + . . . + eT²) / Sample size = SSE / Sample size
4. Consider our second attempt to estimate the variance of the error term’s probability
distribution:
EstVar[e] = Var[Res1, Res2, . . . , ResT] = (Res1² + Res2² + . . . + ResT²) / Sample size = SSR / Sample size
This attempt succeeded. Explain why it is appropriate to divide by the degrees of freedom rather
than the sample size.
Chapter 7 Exercises
Recall Professor Lord’s colleague who is teaching another course in which three students are
enrolled.
Regression example data: Cross-sectional data of minutes studied and quiz scores from a course
taught by Professor Lord’s colleague.
Student    Minutes studied (x)    Quiz score (y)
1          5                      14
2          10                     44
3          30                     80
yt = βConst + βxxt + et
Using a calculator and the equations we derived in class, apply the least squares estimation
procedure to find the best fitting line by filling in the blanks:
First, calculate the means:
Means:  x̄ = _______________ = _______________
        ȳ = _______________ = _______________

Second, for each student calculate the deviation of x from its mean and the deviation of y from its mean:

Student    yt     ȳ        yt − ȳ     xt     x̄        xt − x̄
1          14     _____    _____      5      _____    _____
2          44     _____    _____      10     _____    _____
3          80     _____    _____      30     _____    _____
Third, calculate the products of the y and x deviations and squared x deviations for each student;
then calculate the sums:
bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)² = _____ / _____ = _________
3. Finally, use the quiz data to estimate the variance and standard deviation of the coefficient
estimate’s probability distribution.
4. Check your answers to exercises 1, 2, and 3 using statistical software.
a. Can you estimate the variance of the error term’s probability distribution, EstVar[e]?
b. Can you estimate the variance of the coefficient estimate’s probability distribution,
EstVar[bx]?
c. Can you calculate Σ(xt − x̄)²?
d. Can you calculate Σ(yt − ȳ)(xt − x̄)?
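One way to "check your answers using statistical software" outside of EViews is the Python statsmodels package. The sketch below is only an illustration (it is not the textbook's procedure); it reports the constant and coefficient estimates, their standard errors, EstVar[e], and the two sums used in the hand calculations.

import numpy as np
import statsmodels.api as sm

x = np.array([5.0, 10.0, 30.0])      # minutes studied
y = np.array([14.0, 44.0, 80.0])     # quiz scores

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.params)                                # bConst and bx
print(results.bse)                                   # standard errors of the estimates
print(results.mse_resid)                             # SSR divided by the degrees of freedom = EstVar[e]
print(np.sum((x - x.mean()) ** 2))                   # sum of squared x deviations
print(np.sum((y - y.mean()) * (x - x.mean())))       # sum of products of the y and x deviations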
Chapter 8 Outline
1. Run the following simulation and answer the questions posed. Summarize your answers by
filling in the following blanks:
Cynic’s view: Studying has no impact on a student’s quiz score; the positive coefficient estimate
obtained from the first quiz was just “the luck of the draw.” In fact, studying does not affect quiz
scores.
b. If the cynic were correct and studying has no impact on quiz scores, what would the actual
coefficient, βx, equal?
c. Is it possible that the cynic is correct? To help you answer this question, run the following
simulation:
We will begin by taking stock of where Clint stands. Recall the theory he must assess:
Project: Use data from Professor Lord’s first quiz to assess the effect of studying on quiz scores.
Clint uses a simple regression model to assess the theory. Quiz score is the dependent variable
and number of minutes studied is the explanatory variable:
yt = βConst + βxxt + et
where
βConst and βx are the model’s parameters. They incorporate the view that Professor Lord awards
each student some points just for showing up; subsequently, the number of additional points
each student earns depends on how much he/she studied:
• βConst represents the number of points Professor Lord gives a student just for showing up.
• βx represents the number of additional points earned for each additional minute of study.
Since the values of βConst and βx are not observable, Clint adopted the econometrician’s
philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint used the results of the first quiz to estimate the values of βConst and βx by applying the ordinary least squares (OLS) estimation procedure to find the best fitting line:

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90

bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)² = 240/200 = 1.2        bConst = ȳ − bx·x̄ = 81 − (6/5) × 15 = 81 − 18 = 63
Clint’s estimates suggest that Professor Lord gives each student 63 points for showing up; sub-
sequently, each student earns 1.2 additional points for each additional minute studied.
Clint realizes that he cannot expect the coefficient estimate to equal the actual value; in fact,
he is all but certain that it will not. So now Clint must address two related issues:
• Estimate reliability: How reliable is the coefficient estimate, 1.2, calculated from the first
quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to
the actual value?
• Theory assessment: How confident should Clint be that the theory is correct, that studying
improves quiz scores?
We will address both of these issues in this chapter. First, we consider estimate reliability.
Interval estimate question: What is the probability that the estimate, 1.20, lies within ____ of
the actual value? ____
The general properties of the ordinary least squares (OLS) estimation procedure allow us to
address this question. It is important to distinguish between the general properties and one spe-
cific application. Recall that the general properties refer to what we know about the estimation
procedure before the quiz is given; the specific application refers to the numerical values of the
estimates calculated from the results of the first quiz:
Mean and variance describe the center and spread of the estimate's probability distribution.
The estimates are random variables and a quiz can be viewed as an experiment. We cannot
determine the numerical value of an estimate with certainty before the experiment (quiz) is
conducted. What then do we know beforehand? We can describe the probability distribution of
the estimate. We know that the mean of the coefficient estimate's probability distribution equals the actual value of the coefficient and its variance equals the variance of the error term's probability distribution divided by the sum of squared x deviations:

Mean[bx] = βx        Var[bx] = Var[e] / Σ(xt − x̄)²
Both the mean and variance of the coefficient estimate’s probability distribution play a
crucial role:
• Since the mean of the coefficient estimate’s probability distribution, Mean[bx], equals the actual
value of the coefficient, βx, the estimation procedure is unbiased; the estimation procedure does
not systematically underestimate or overestimate the actual coefficient value.
• When the estimation procedure for the coefficient value is unbiased, the variance of the esti-
mate’s probability distribution, Var[bx], determines the reliability of the estimate; as the variance
decreases, the probability distribution becomes more tightly cropped around the actual value;
consequently it becomes more likely for the coefficient estimate to be close to the actual coef-
ficient value.
To assess his estimate’s reliability, Clint must consider the variance of the coefficient esti-
mate’s probability distribution. But we learned that Clint can never determine the actual variance
of the error term’s probability distribution, Var[e]. Instead, Clint employs a two step strategy for
estimating the variance of the coefficient estimate’s probability distribution:
Step 1: Estimate the variance of the error term's probability distribution from the available information, the data from the first quiz:

EstVar[e] = AdjVar[Res's] = SSR / Degrees of freedom = 54/1 = 54

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:

Var[bx] = Var[e] / Σ(xt − x̄)²

EstVar[bx] = EstVar[e] / Σ(xt − x̄)² = 54/200 = 0.27
Unfortunately, there is one last complication before we can address the interval estimate
question.
8.2.1 Normal Distribution versus the Student t-Distribution: One Last Complication
We begin by reviewing the normal distribution. Recall that the variable z played a critical role
in using the normal distribution:
z = (Value of random variable − Distribution mean) / Distribution standard deviation
  = Number of standard deviations from the mean
In words, z equals the number of standard deviations the value lies from the mean. But Clint
does not know what the variance and standard deviation of the coefficient estimate’s probability
distribution equal. That is why he must estimate them. Consequently he cannot use the normal
distribution to calculate probabilities.
When the standard deviation is not known and must be estimated, the Student t-distribution
must be used. The variable t is similar to the variable z; instead of equaling the number of stan-
dard deviations the value lies from the mean, t equals the number of estimated standard devia-
tions the value lies from the mean:
t = (Value of random variable − Distribution mean) / Estimated distribution standard deviation
  = Number of estimated standard deviations from the mean
Recall that the estimated standard deviation is called the standard error; hence
t = (Value of random variable − Distribution mean) / Standard error
  = Number of standard errors from the distribution mean
Like the normal distribution, the t-distribution is symmetric about its mean. Since estimating
the standard deviation introduces an additional element of uncertainty, the Student t-distribution
is more "spread out" than the normal distribution, as illustrated in figure 8.1.

Figure 8.1
Normal and Student t-distributions
How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value? We pose the interval estimate question to address this issue:
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
____ of the actual coefficient value? ____
We begin by filling in the first blank, choosing our “close to” value. The value we choose
depends on how demanding we are; that is, our “close to” value depends on the range that we
consider to be “close to” the actual value. For purposes of illustration, we will choose 1.5; so
we write 1.5 in the first blank.
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
1.5 of the actual coefficient value? ____
Figure 8.2 illustrates the probability distribution of the coefficient estimate and the probability
that we wish to calculate. The estimation procedure we used to calculate the coefficient estimate,
the ordinary least squares (OLS) estimation procedure is unbiased:
Mean[bx] = βx
Consequently we place the actual coefficient value, βx, at the center of the probability
distribution.
As discussed above, we must use the Student t-distribution rather than the normal distribution
since we must estimate the standard deviation of the probability distribution. The regression
results from Professor Lord’s first quiz provide the estimate (table 8.1).
The standard error equals the estimated standard deviation. t equals the number of standard
errors (estimated standard deviations) that the value lies from the distribution mean:
Figure 8.2
Probability distribution of coefficient estimate—"Close to" value equals 1.5 (the interval extends from 1.5 below to 1.5 above the actual value βx)
Table 8.1
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Since the distribution mean equals the actual value, we can “translate” 1.5 below and above the
actual value into t’s. Since the standard error equals 0.5196, 1.5 below and above the actual
value translates into 2.89 standard errors below and above the actual value:
Figure 8.3
Probability distribution of coefficient estimate—"Close to" value equals 1.5 (βx − 1.5 and βx + 1.5 lie 2.89 standard errors, t = ±2.89, from the actual value βx)
To summarize:
We can now use the Econometrics Lab to calculate the probability that the estimate is within
1.5 of the actual value by computing the left and right tail probabilities.1
1. Appendix 8.2 shows how we can use the Student t-distribution table to address the interval estimate question. Since
the table is cumbersome, we will use the Econometrics Lab to do so.
Since the Student t-distribution is symmetric, both the left and right tail probabilities equal
0.11 (figure 8.4). Hence, the probability that the estimate is within 1.5 of the actual value
equals 0.78:
Figure 8.4
Probability distribution of coefficient estimate—Applying the Student t-distribution (each tail probability equals 0.11; the probability that the estimate lies within 1.5, that is, 2.89 standard errors, of the actual value equals 0.78)
We can now fill in the second blank in the interval estimate question:
Interval estimate question: What is the probability that the coefficient estimate, 1.2, lies within
1.5 of the actual coefficient value? 0.78
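The 0.78 figure can also be reproduced without the Econometrics Lab. The sketch below (an illustration, not the lab itself) uses the Student t-distribution in scipy to compute the tail probabilities for a "close to" value of 1.5.

from scipy.stats import t

se = 0.5196                # standard error of the coefficient estimate
df = 1                     # degrees of freedom
close_to = 1.5             # the "close to" value chosen above

t_value = close_to / se                    # about 2.89 standard errors
tail = t.sf(t_value, df)                   # right-tail probability, about 0.106 (the text rounds to 0.11)
prob_within = 1 - 2 * tail                 # about 0.79, essentially the 0.78 reported above
print(t_value, tail, prob_within)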
Hypothesis testing allows Clint to assess how much confidence he should have in the theory.
We begin by motivating hypothesis testing using the same approach as we took with Clint’s
opinion poll. We will play the role of the cynic. Then we will formalize the process.
Recall that the “theory” suggests that a student’s score on the quiz depends on the number of
minutes he/she studies:
yt = βConst + βxxt + et
The theory suggests that βx is positive. Review the regression results for the first quiz (table 8.2).
Table 8.2
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
The estimate for βx, 1.2, is positive. We estimate that an additional minute of studying
increases a student’s quiz score by 1.2 points. This lends support to Clint’s theory. But, how
much confidence should Clint have in the theory? Does this provide definitive evidence that
Clint’s theory is correct, or should we be skeptical? To answer this question, recall our earlier
hypothesis-testing discussion and play the cynic. What would a cynic’s view of our theory and
the regression results be?
Cynic’s view: Studying has no impact on a student’s quiz score; the positive coefficient estimate
obtained from the first quiz was just “the luck of the draw.” In fact, studying has no effect on
quiz scores; the actual coefficient, βx, equals 0.
In the simulation, the default actual coefficient value is 0. Check the From–To checkbox. Also
0 is specified in the From list. In the To list, no value is specified; consequently there is no upper
From–To bound. The From–To Percent box will report the percent of repetitions in which the
coefficient estimate equals 0 or more. Be certain that the “Pause” checkbox is cleared. Click
Start, and then after many, many repetitions click Stop. In about half of the repetitions the
coefficient estimate is positive; that is, when the actual coefficient, βx, equals 0, the estimate is
positive about half the time. The histogram illustrates this. Now, we can apply the relative fre-
quency interpretation of probability. If the actual coefficient were 0, the probability of obtaining
a positive coefficient from one quiz would be about one-half as illustrated in figure 8.5.
Consequently we cannot dismiss the cynic’s view as absurd.
To assess the cynic’s view, we pose the following question:
Question for the cynic: What is the probability that the result would be like the one obtained
(or even stronger), if studying actually has no impact on quiz scores? That is, what is the prob-
ability that the coefficient estimate from the first quiz would be 1.2 or more, if studying had no
impact on quiz scores (if the actual coefficient, βx, equals 0)?
Answer: Prob[Results IF cynic correct].
Figure 8.5
Probability distribution of coefficient estimate—Could the cynic be correct? (If βx = 0, Prob[bx > 0] ≈ 0.50.)
The magnitude of the probability determines the likelihood that the cynic is correct, the likeli-
hood that studying has no impact on quiz scores:
To compute this probability, let us review what we know about the probability distribution of
the coefficient estimate:
Question for the cynic: What is the probability that the coefficient estimate from the first quiz
would be 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, βx,
equaled 0)?
Econometrics Lab 8.3: Using the Econometrics Lab to Calculate Prob[Results IF cynic correct]
Mean: 0
Standard error: 0.5196
Value: 1.2
Degrees of freedom: 1
Click Calculate. The probability that the estimate lies in the right tail equals 0.13. The answer
to the question for the cynic is 0.13 (figure 8.6):
In fact there is an even easier way to compute the probability. We do not even need to use the Econometrics Lab because the statistical software calculates this probability automatically.
To illustrate this, we will first calculate the t-statistic based on the premise that the cynic is
correct, based on the premise that the actual value of the coefficient equals 0:
t = (Value of random variable − Distribution mean) / Standard error = (1.2 − 0) / 0.5196 = 2.309
  = Number of standard errors from the distribution mean
1.2 lies 2.309 standard errors from 0. Next return to the regression results (table 8.3) and focus
attention on the row corresponding to the coefficient and on the “t-Statistic” and “Prob” columns.
Figure 8.6
Probability distribution of coefficient estimate—Prob[Results IF cynic correct] (Student t-distribution with mean 0, SE 0.5196, and 1 degree of freedom; the probability that the estimate lies at or above 1.2 equals 0.13)
Table 8.3
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
• First, the t-Statistic column equals 2.309. This is just the value we calculated above: the t-statistic based on the premise that the cynic is correct and the actual coefficient equals 0. The t-Statistic column reports the number of standard errors the coefficient estimate lies from 0, based on the premise that the actual coefficient equals 0.
• Second, the Prob column equals 0.2601. This is just twice the probability we just calculated
using the Econometrics Lab:
2 × 0.13 = 0.26
The Prob column is based on the premise that the actual coefficient equals 0 and then focuses
on the two tails of the probability distribution where each tail begins 1.2 (the numerical value
of the coefficient estimate) from 0. As figure 8.7 illustrates, the value in the Prob column equals
the probability of lying in the tails; the probability that the estimate resulting from one week’s
quiz lies at least 1.2 from 0 assuming that the actual coefficient, βx, equals 0. That is, the Prob
column reports the tails probability:
Tails probability: The probability that the coefficient estimate, bx, resulting from one regression
would lie at least 1.2 from 0 based on the premise that the actual coefficient, βx, equals 0.
Consequently we do not need to use the Econometrics Lab to answer the question that we
pose for the cynic:
Figure 8.7
Probability distribution of coefficient estimate—Tails probability (Student t-distribution with mean 0, SE 0.5196, and 1 degree of freedom; each tail lying 1.2 or more from 0 has probability 0.2601/2)

Figure 8.8
Probability distribution of coefficient estimate—Prob[Results IF cynic correct] (the right tail at or above 1.2 has probability 0.2601/2)
Question for the cynic: What is the probability that the coefficient estimate from the first quiz
is 1.2 or more, if studying had no impact on quiz scores (if the actual coefficient, βx, equals 0)?
Answer: Prob[Results IF cynic correct]
We can use the regression results to answer this question. From the Prob column we know that the tails probability equals 0.2601. As figure 8.8 shows, however, we are only interested in the right tail: the probability that the coefficient estimate will equal 1.2 or more if the actual coefficient equals 0.
Since the Student t-distribution is symmetric, the probability of lying in one of the tails is 0.2601/2. The answer to the question we posed to assess the cynic's view is 0.13:

Prob[Results IF cynic correct] = Tails probability / 2 = 0.2601 / 2 ≈ 0.13
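The numbers appearing in the t-Statistic and Prob columns can be reproduced directly. The sketch below (an illustration, not the statistical software's own code) computes the t-statistic, the tails probability, and the right-tail probability for the first-quiz results.

from scipy.stats import t

estimate = 1.2             # coefficient estimate for minutes studied
se = 0.5196                # its standard error
df = 1                     # degrees of freedom

t_stat = (estimate - 0) / se               # about 2.309, the t-Statistic column
tails_prob = 2 * t.sf(t_stat, df)          # about 0.26, the Prob column
right_tail = tails_prob / 2                # about 0.13 = Prob[Results IF cynic correct]
print(t_stat, tails_prob, right_tail)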
We formalized hypothesis testing in chapter 4 when we considered Clint’s public opinion poll.
We will follow the same steps here, with one exception. We add a step 0 to construct an appro-
priate model to assess the theory.
yt = βConst + βxxt + et
where
yt = quiz score
xt = minutes studied
βConst = points for showing up
βx = points for each minute studied
Step 1: Collect data, run the regression, and interpret the estimates (table 8.4).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, studying has no impact on quiz scores. The results were just
“the luck of the draw.”
Table 8.4
Quiz scores regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Now we construct the null and alternative hypotheses. Like the cynic, the null hypothesis challenges the evidence; the alternative hypothesis is consistent with the evidence:

H0: βx = 0    Studying has no impact on quiz scores; the cynic is correct
H1: βx > 0    Additional studying increases quiz scores; the cynic is incorrect
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
We have already calculated this probability. First, we did so using the Econometrics Lab. Then,
we noted that the statistical software had done so automatically. We need only divide the tails
probability, as reported in the Prob column of the regression results, by 2:
Prob[Results IF H0 true] = 0.2601 / 2 ≈ 0.13
The probability that the coefficient estimate in one regression would be 1.2 or more if H0 were
actually true (if the actual coefficient, βx, equals 0) is 0.13.
The significance level is the dividing line between the probability being small and the prob-
ability being large.
Recall that the traditional significance levels used in academia are 1, 5, and 10 percent. Obviously
0.13 is greater than 0.10. Consequently Clint would not reject the null hypothesis that studying
has no impact on quiz scores even with a 10 percent significance level.
Let us sum up what we have learned about the ordinary least squares (OLS) estimation
procedure:
yt = βConst + βxxt + et
where
yt = dependent variable
et = error term
xt = explanatory variable
t = 1, 2, . . . , T
T = sample size
The error term is a random variable; it represents random influences. The mean of each error term's probability distribution equals 0:

Mean[et] = 0 for t = 1, 2, . . . , T
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:

Var[et] = Var[e] for t = 1, 2, . . . , T
8.4.3 Ordinary Least Squares (OLS) Estimation Procedure: Three Important Estimation Procedures
There are three important estimation procedures embedded within the ordinary least squares (OLS) estimation procedure:
• A procedure to estimate the values of the regression parameters, βx and βConst:
bx = Σ(yt − ȳ)(xt − x̄) / Σ(xt − x̄)²        and        bConst = ȳ − bx·x̄
• A procedure to estimate the variance of the error term’s probability distribution, Var[e]:
EstVar[e] = SSR / Degrees of freedom
• A procedure to estimate the variance of the coefficient estimate's probability distribution, Var[bx]:

EstVar[bx] = EstVar[e] / Σ(xt − x̄)²
8.4.4 Properties of the Ordinary Least Squares (OLS) Estimation Procedure and the Standard
Ordinary Least Squares (OLS) Premises
When the standard ordinary least squares (OLS) premises are met:
• Each estimation procedure is unbiased; each estimation procedure does not systematically
underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
yt = βConst + βxxt + et
where
yt = quiz score
xt = minutes studied
et = error term
βConst = points for showing up
βx = points for each minute studied
Correlation results whenever a causal relationship describes the reality accurately. That is, when
additional studying indeed increases quiz scores, studying and quiz scores will be (positively)
correlated:
• Knowing the number of minutes a student studies allows us to predict his/her quiz score.
• Knowing a student’s quiz score helps us predict the number of minutes he/she has studied.
More generally, a causal model that describes reality accurately implies correlation:
Chapter 8 Exercises
Petroleum consumption data for Nebraska: Annual time series data of petroleum consumption
and prices for Nebraska from 1990 to 1999
where
b. Estimate the parameters of the model. Interpret bP, the estimate for βP.
Consider the reliability of the coefficient estimate.
2. Again, consider Nebraska’s petroleum consumption data in the 1990’s and the model cited
in question 1.
a. What does economic theory teach us about how the real price of petroleum should affect
Nebraska petroleum consumption?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999
a. What does economic theory teach us about how the real price of gasoline should affect
US gasoline consumption?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia
Conventional wisdom suggests that high school dropouts are more likely to smoke cigarettes
than those who graduate.
a. Apply the hypothesis-testing approach that we developed to assess the conventional
wisdom.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the 451 House members of the 110th Congress.
It has been alleged that since the Congress was controlled by Democrats, Democratic members
received more solo earmarks than their non-Democratic colleagues.
a. Apply the hypothesis-testing approach that we developed to assess the allegation.
Wage and age data: Cross-sectional data of wages and ages for 190 union members included
in the March 2007 Current Population Survey who have earned high school degrees, but have
not had any additional education.
Many believe that unions strongly support the seniority system. Some union contracts require
employers to pay workers who have been on the job for many years more than newly hired
workers. Consequently older workers should typically be paid more than younger workers:
Use data from the March 2007 Current Population Survey to investigate the seniority theory.
a. Apply the hypothesis-testing approach that we developed to assess the seniority theory.
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
a. What does economic theory teach us about how the real price of crude oil should affect
US crude oil production?
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Figure 8.9
Student t-distribution—Right-tail probabilities (α denotes the right-tail probability)
Table 8.5
Right-tail critical values for the Student t-distribution

Table 8.6
Right-tail critical values for the Student t-distribution
Appendix 8.2 Assessing the Reliability of a Coefficient Estimate Using the Student
t-Distribution Table
We begin by describing the Student t-distribution table; a portion of it appears in table 8.6.
The first column represents the degrees of freedom. The numbers in the body of the table are
called the “critical values.” A critical value equals the number of standard errors a value lies
from the mean. The top row specifies the value of α, the "right-tail probability." Figure 8.10 helps us understand the table.
Since the t-distribution is symmetric, the "left-tail probability" also equals α. The probability of lying between the tails, in the center of the distribution, is 1 − 2α. This no doubt sounds con-
fusing, but everything should become clear after we show how Clint can use this table to answer
the interval estimate question.
Interval estimate question: What is the probability that the estimate, 1.2, lies within ____ of
the actual value? ____
Figure 8.10
Student t-distribution—Illustrating the probabilities (each tail lying more than one critical value × SE from the distribution mean has probability α; the center of the distribution has probability 1 − 2α)
Let us review the regression results from Professor Lord’s first quiz:
Next we will modify figure 8.10 to reflect our specific example. Focus on figure 8.11:
• We are interested in the coefficient estimate; consequently we replace the horizontal axis label
by substituting bx for estimate.
• Also we know that the estimation procedure Clint uses, the ordinary least squares (OLS) esti-
mation procedure, is unbiased; hence the distribution mean equals the actual value. We can
replace the distribution mean with the actual coefficient value, βx.
Now let us help Clint fill in the blanks. When using the table we begin by filling in the second
blank rather than the first.
Clint must choose a value for α. As we will see, the value he chooses depends on how demand-
ing he is. For example, suppose that Clint believes that a 0.80 probability of the estimate lying
in the center of the distribution, close to the mean, is good enough. He would then choose an α
equal to 0.10. To understand why, note that when α equals 0.10, the probability of the estimate
lying in the right tail would be 0.10. Since the t-distribution is symmetric, the probability of the
estimate lying in the left tail would be 0.10 also. Therefore the probability that the estimate lies
in the center of the distribution would be 0.80; accordingly we write 0.80 in the second blank.
What is the probability that the estimate, 1.2, lies within _____ of the actual value? 0.80
Figure 8.11
Student t-distribution—Illustrating the probabilities for the coefficient estimate (the horizontal axis is now bx; each tail has probability α and the center 1 − 2α)
Table 8.7
Right-tail critical values for the Student t-distribution—α equals 0.10 and degrees of freedom equals 1
The first blank quantifies what “close to” means. The standard error and the Student t-distri-
bution table allow us to fill in the first blank. To do so, we begin by calculating the degrees of
freedom. Recall that the degrees of freedom equal 1:
Degrees of freedom = Sample size − Number of estimated parameters = 3 − 2 = 1
Clint chose a value of α equal to 0.10 (figure 8.12). Table 8.7 indicates that the critical value
for α = 0.10 with one degree of freedom is 3.078. The probability that the estimate falls within
3.078 standard errors of the mean is 0.80. Next the regression results report that the standard
error equals 0.5196:
SE[bx] = 0.5196
Figure 8.12
Student t-distribution—Calculations for an α equal to 0.10 (each tail has probability 0.10; the center, within one critical value × SE of the mean, has probability 0.80)
After multiplying the critical value given in the table, 3.078, by the standard error, 0.5196, we
can fill in the first blank:
What is the probability that the estimate, 1.2, lies within 1.6 of the actual value? 0.80
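The critical value can be computed rather than looked up in the table. The short sketch below (an illustration, not part of the appendix) obtains the 3.078 critical value for α = 0.10 with one degree of freedom and the resulting "close to" value of roughly 1.6.

from scipy.stats import t

alpha = 0.10               # right-tail probability
df = 1                     # degrees of freedom
se = 0.5196                # standard error of the coefficient estimate

critical_value = t.ppf(1 - alpha, df)      # about 3.078, matching the table entry
close_to = critical_value * se             # about 1.6
print(critical_value, close_to)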
9 One-Tailed Tests, Two-Tailed Tests, and Logarithms
Chapter 9 Outline
1. Suppose that the following equation describes how Q and P are related: Q = βConst·P^βP.
a. What does dQ/dP equal?
b. Focus on the ratio of P to Q; that is, focus on P/Q. Substitute βConst·P^βP for Q and show that P/Q equals 1/(βConst·P^(βP−1)).
c. Show that (dQ/dP)(P/Q) equals βP.
2. We would like to express the percent changes algebraically. To do so, we begin with an
example. Suppose that X increases from 200 to 220.
a. In percentage terms by how much has X increased?
b. Argue that you have implicitly used the following equation to calculate the percent change:
Percent change in X = (ΔX / X) × 100
3. Suppose that a household spends $1,000 of its income on a particular good every month.
a. What does the product of the good’s price, P, and the quantity of the good purchased by
the household each month, Q, equal?
b. Solve for Q.
c. Consider the function Q = βConst·P^βP. What would
i. βConst equal?
ii. βP equal?
5. What is the expression for a derivative of a natural logarithm? That is, what does d log(z)/dz
equal?1
Microeconomic theory tells us that the demand curve is typically downward sloping. In introduc-
tory economics and again in intermediate microeconomics we present sound logical arguments
justifying the shape of the demand curve. History has taught us many times, however, that just because a theory sounds sensible does not necessarily mean that it is true. We must test this theory
to determine if it is supported by real world evidence. We will focus on gasoline consumption
in the United States during the 1990s to test the downward sloping demand theory.
1. Be aware that sometimes natural logarithms are denoted as ln(z) rather than log(z). We will use the log(z) notation
for natural logarithms throughout this textbook.
Gasoline consumption data: Annual time series data of US gasoline consumption and prices from 1990 to 1999.
Theory: A higher price decreases the quantity demanded; the demand curve is downward
sloping.
Project: Assess the effect of gasoline prices on gasoline consumption.
Step 0: Formulate a model reflecting the theory to be tested.
where
The theory suggests that βP should be negative. A higher price decreases the quantity demanded; the demand curve is downward sloping.
Step 1: Collect data, run the regression, and interpret the estimates.
The gasoline consumption data can be accessed by clicking within the box below.
While the regression results (table 9.1) indeed support the theory, remember that we can never
expect an estimate to equal the actual value; sometimes the estimate will be greater than the
actual value and sometimes less. The fact that the estimate of the price coefficient is negative,
−151.7, is comforting, but it does not prove that the actual price coefficient, βP, is negative. In
fact we do not have and can never have indisputable evidence that the theory is correct. How
do we proceed?
Table 9.1
Gasoline demand regression results
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: The price actually has no effect on the quantity of gasoline demanded; the nega-
tive coefficient estimate obtained from the data was just “the luck of the draw.” The actual
coefficient, βP, equals 0.
Now, we construct the null and alternative hypotheses:
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is
consistent with the evidence.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the price actually has no impact?
• Specific question: The regression’s coefficient estimate was −151.7: What is the probability
that the coefficient estimate in one regression would be −151.7 or less, if H0 were actually true
(if the actual coefficient, βP, equals 0)?
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
If the null hypothesis were true, the actual price coefficient would equal 0. Since the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, the mean of the
probability distribution for the coefficient estimates would be 0. The regression results provide
us with the standard error of the coefficient estimate. The degrees of freedom equal 8: the number
of observations, 10, less the number of parameters we are estimating, 2 (the constant and the
coefficient).
We now have the information needed to calculate Prob[Results IF H0 true], the probability of a result like the one obtained (or even stronger) if the null hypothesis, H0, were true. We could
use the Econometrics Lab to compute this probability, but in fact the statistical software has
already done this for us (table 9.2).
Recall that the Prob column reports the tails probability:
Tails probability: The probability that the coefficient estimate, bP, resulting from one regression
would lie at least 151.7 from 0, if the actual coefficient, βP, equals 0.
The tails probability reports the probability of lying in the two tails (figure 9.1). We are only interested in the probability that the coefficient estimate will be −151.7 or less; that is, we are only interested in the left tail. Since the Student t-distribution is symmetric, we divide the tails probability by 2 to calculate Prob[Results IF H0 true]:

Prob[Results IF H0 true] = 0.0128 / 2 = 0.0064
Table 9.2
Gasoline demand regression results
Figure 9.1
Probability distribution of the linear model's coefficient estimate (Student t-distribution with mean 0, SE 47.6, and 8 degrees of freedom; each tail lying 151.7 or more from 0 has probability 0.0128/2)
The traditional significance levels in academe are 1, 5, and 10 percent. In this case, the
Prob[Results IF H0 true] equals 0.0064, less than 0.01. So, even with a 1 percent significance
level, we would reject the null hypothesis that price has no impact on the quantity. This result
supports the theory that the demand curve is downward sloping.
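Given only the coefficient estimate and its standard error, the same one-tailed probability can be recovered directly from the Student t-distribution. The sketch below (an illustration using the values reported above, not the textbook's lab) computes the t-statistic and the left-tail and two-tailed probabilities for the linear gasoline demand model.

from scipy.stats import t

estimate = -151.7          # price coefficient estimate
se = 47.6                  # its standard error (figure 9.1)
df = 8                     # degrees of freedom: 10 observations less 2 estimated parameters

t_stat = estimate / se                     # about -3.19
left_tail = t.cdf(t_stat, df)              # about 0.0064 = Prob[Results IF H0 true]
tails_prob = 2 * left_tail                 # about 0.0128, the Prob column value
print(t_stat, left_tail, tails_prob)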
Thus far we have considered only one-tailed tests because the theories we have investigated
suggest that the coefficient was greater than a specific value or less than a specific value:
• Quiz score theory: The theory suggested that studying increases quiz scores, that the coef-
ficient of minutes studied was greater than 0.
• Demand curve theory: The theory suggested that a higher price decreases the quantity demanded, that the coefficient of price was less than 0.
In these cases, we were only concerned with one side or one tail of the distribution, either the
right tail or the left tail. Some theories, however, suggest that the coefficient equals a specific
value. In these cases, both sides (both tails) of the distribution are relevant and two-tailed tests
are appropriate. We will now investigate one such theory, the budget theory of demand.
The budget theory of demand postulates that households first decide on the total number of
dollars to spend on a good. Then, as the price of the good fluctuates, households adjust the
quantity they purchase to stay within their budgets. We will focus on gasoline consumption to
assess this theory:
Budget theory of demand: Expenditures for gasoline are constant. That is, when gasoline prices
change, households adjust the quantity demanded so as to keep their gasoline expenditures
constant. Expressing this mathematically, the budget theory of demand postulates that the price,
P, times the quantity, Q, of the good demanded equals a constant:
P × Q = BudAmt

where BudAmt equals the total number of dollars the household budgets for the good; BudAmt is a constant.
As we will learn, the price elasticity of demand is critical in assessing the budget theory of
demand. Consequently we will now review the verbal definition of the price elasticity of demand
and show how we can make it mathematically rigorous.
Verbal definition: The price elasticity of demand equals the percent change in the quantity demanded resulting from a 1 percent change in price.
To convert the verbal definition into a mathematical one, we start with the verbal definition:
Price elasticity of demand = Percent change in quantity demanded resulting from a 1 percent
change in price
X: 200 → 220
Percent change in X = (220 − 200)/200 × 100 = (20/200) × 100 = 0.1 × 100 = 10 percent. We
can generalize this:
Percent change in X = (ΔX / X) × 100
Substituting for the percent changes:

Price elasticity of demand = [(ΔQ/Q) × 100] / [(ΔP/P) × 100]

Simplifying:

Price elasticity of demand = (ΔQ/ΔP)(P/Q)

Taking limits as ΔP approaches 0:

Price elasticity of demand = (dQ/dP)(P/Q)
There always exists a potential confusion surrounding the numerical value for the price elasticity
of demand. Since the demand curve is downward sloping, dQ/dP is negative. Consequently the
price elasticity of demand will be negative. Some textbooks, in an effort to avoid negative
numbers, refer to price elasticity of demand as an absolute value. This can lead to confusion,
however. Accordingly we will adopt the more straightforward approach: our elasticity of demand
will be defined so that it is negative.
Now we are prepared to embark on the hypothesis-testing process.
Q = βConst·P^βP
Before doing anything else, however, let us now explain why this model indeed exhibits
constant price elasticity. We start with the mathematical definition of the price elasticity of
demand:
Price elasticity of demand = (dQ/dP)(P/Q)

Since Q = βConst·P^βP, the derivative equals dQ/dP = βP·βConst·P^(βP−1), while the ratio equals P/Q = P/(βConst·P^βP) = 1/(βConst·P^(βP−1)). Substituting and simplifying:

Price elasticity of demand = βP·βConst·P^(βP−1) × [1/(βConst·P^(βP−1))] = βP
The price elasticity of demand just equals the value of βP, the exponent of the price, P.
A little algebra allows us to show that the budget theory of demand postulates that the price
elasticity of demand, βP, equals −1. First start with the budget theory of demand:
P × Q = BudAmt

Dividing both sides by P:

Q = BudAmt × P^(−1)

Compare this with the constant price elasticity model:

Q = βConst·P^βP

Clearly, βConst corresponds to BudAmt and βP equals −1.
Q = βConst·P^βP

Taking logarithms of both sides yields the log form of the constant price elasticity model:

LogQ = c + βP·LogP

where

LogQ = log(Q)
c = log(βConst)
LogP = log(P)
Step 1: Collect data, run the regression, and interpret the estimates.
Recall that we are using US gasoline consumption data to assess the theory.
Gasoline consumption data: Annual time series data for US gasoline consumption and prices
from 1990 to 1999.
We must generate the two variables: the logarithm of quantity and the logarithm of price:
• LogQt = log(GasConst)
• LogPt = log(PriceDollarst)
logq = log(gascons)
• Click OK.
logp = log(pricedollars)
• Click OK.
Now we can use EViews to run a regression with logq, the logarithm of quantity, as the dependent
variable and logp, the logarithm of price, as the explanatory variable.
•In the Workfile window: Click on the dependent variable, logq, first, and then click on the
explanatory variable, logp, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
• Do not forget to close the workfile.
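For readers not using EViews, the same regression can be run in Python with statsmodels. The sketch below is hypothetical: it assumes the gasoline data have been saved in a file named gasoline_1990_1999.csv with columns named gascons and pricedollars; the file name is an assumption for illustration, not part of the textbook's materials.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical file name; adjust to wherever the gasoline data are stored.
data = pd.read_csv("gasoline_1990_1999.csv")

data["logq"] = np.log(data["gascons"])           # logarithm of quantity
data["logp"] = np.log(data["pricedollars"])      # logarithm of price

results = sm.OLS(data["logq"], sm.add_constant(data["logp"])).fit()
print(results.summary())                         # the coefficient on logp estimates the price elasticity of demand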
Note that the estimate for the price elasticity of demand equals −0.586 (table 9.3). Since the budget
theory of demand postulates that the price elasticity of demand equals −1.0, the critical result is
not whether the estimate is above or below −1.0. Instead, the critical result is that the estimate
does not equal −1.0; more specifically, the estimate is 0.414 from −1.0. Had the estimate been
−1.414 rather than −0.586, the results would have been just as troubling as far as the budget
theory of demand is concerned (see figure 9.2).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
The cynic always challenges the evidence. The regression results suggest that the price elastic-
ity of demand does not equal −1.0 since the coefficient estimate equals −0.586. Accordingly, the
cynic challenges the evidence by asserting that it does equal −1.0.
Table 9.3
Budget theory of demand regression results
Figure 9.2
Number line illustration of the critical result (the theory places the price elasticity of demand at −1.0; the evidence, the estimate of −0.586, lies 0.414 from −1.0)
Cynic's view: Sure, the coefficient estimate from the regression suggests that the price elasticity of demand does not equal −1.0, but this is just "the luck of the draw." The actual price elasticity of demand equals −1.0.
Question: Can we dismiss the cynic’s view as absurd?
Answer: No, as a consequence of random influences. Even if the actual price elasticity equals
−1.0, we could never expect the estimate to equal precisely −1.0. The effect of random influences
is captured formally by the “statistical significance question:”
Statistical significance question: Is the estimate of −0.586 statistically different from −1.0?
More precisely, if the actual value equals −1.0, how likely would it be for random influences to
cause the estimate to be 0.414 or more from −1.0?
We will now construct the null and alternative hypotheses to address this question:
H0: βP = −1.0 Cynic’s view is correct; actual price elasticity of demand equals −1.0.
H1: βP ≠ −1.0 Cynic’s view is incorrect; actual price elasticity of demand does not
equal −1.0.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
The magnitude of this probability determines whether we reject the null hypothesis:
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
If the null hypothesis were true, the actual coefficient would equal −1.0. Since the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, the mean of the probability distribution of coefficient estimates would be −1.0. The regression results provide us with
the standard error of the coefficient estimate. The degrees of freedom equal 8: the number of
observations, 10, less the number of parameters we are estimating, 2 (the constant and the
coefficient).
Can we use the “tails probability” as reported in the regression results to compute Prob[Results
IF H0 true]? Unfortunately, we cannot. The tails probability appearing in the Prob column of the
regression results is based on the premise that the actual value of the coefficient equals 0. Our
null hypothesis claims that the actual coefficient equals −1.0, not 0. Accordingly the regression
results appearing in table 9.3 do not report the probability we need.
We can, however, use the Econometrics Lab to compute the probability.
Econometrics Lab 9.1: Using the Econometrics Lab to Calculate Prob[Results IF H0 True]

We are interested in the probability that the estimate lies 0.414 or more below −1.0 (at or below −1.414) plus the probability that it lies 0.414 or more above −1.0; that is, the probability that the estimate lies at or above −0.586. With a mean of −1.0, a standard error of 0.183, and 8 degrees of freedom, each of these tail probabilities equals 0.027 (figure 9.3), so Prob[Results IF H0 true] ≈ 0.054.
Figure 9.3
Probability distribution of the constant elasticity model's coefficient estimate (Student t-distribution with mean −1.0, SE 0.183, and 8 degrees of freedom; each tail lying 0.414 or more from −1.0, that is, at or below −1.414 or at or above −0.586, has probability 0.027)
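The same calculation can be done with scipy's Student t-distribution. The sketch below (an illustration, not the Econometrics Lab) centers the distribution at the value claimed by the null hypothesis, −1.0, and uses the standard error and degrees of freedom reported above.

from scipy.stats import t

estimate = -0.586          # estimated price elasticity of demand
se = 0.183                 # its standard error (figure 9.3)
df = 8                     # degrees of freedom
null_value = -1.0          # value claimed by the null hypothesis

t_stat = (estimate - null_value) / se      # about 2.26: the estimate lies 0.414, or 2.26 standard errors, above -1.0
right_tail = t.sf(t_stat, df)              # about 0.027
prob_if_h0_true = 2 * right_tail           # about 0.054, adding both tails
print(t_stat, right_tail, prob_if_h0_true)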
Recall why we could not use the tails probability appearing in the regression results to calculate this probability: the regression's tails probability is based on the premise that the value of the actual coefficient equals 0. Our null hypothesis, however, is based on the premise that the value of the actual coefficient equals −1.0. So the regression results do not report the probability we need.
It is very convenient to use the regression results to calculate the probabilities, however. In fact
we can do so by being clever. Since the results report the tails probability based on the premise
that the actual coefficient equals 0, we can cleverly define a new coefficient that equals 0 when-
ever the price elasticity of demand equals −1.0. The following definition accomplishes this:
βClever = βP + 1.0
The critical property of βClever’s definition is that the price elasticity of demand, βP, equals −1.0
if and only if βClever equals 0:
βP = −1.0 ⇔ βClever = 0
Next recall the log form of the constant price elasticity model:
LogQt = c + βPLogPt
where
LogQt = log(GasConst)
LogPt = log(Pricet)
Let us now perform a little algebra. Since βClever = βP + 1.0, βP = βClever − 1.0. Let us substitute
for βP:
LogQt = c + βP·LogPt = c + (βClever − 1.0)·LogPt

Adding LogPt to both sides:

LogQt + LogPt = c + βClever·LogPt

where the dependent variable is now LogQt + LogPt, the sum of the logarithm of quantity and the logarithm of price.
We can now express the hypotheses in terms of βClever. Recall that βP = −1.0 if and only if
βClever = 0:
H0: βP = −1.0 ⇔ H0: βClever = 0    Actual price elasticity of demand equals −1.0.
H1: βP ≠ −1.0 ⇔ H1: βClever ≠ 0    Actual price elasticity of demand does not equal −1.0.
Now we can use EViews to run a regression with yclever as the dependent variable and logp as
the explanatory variable.
• In the Workfile window: Click on the dependent variable, logqpluslogp, first; and then click
on the explanatory variable, logp, while depressing the <Ctrl> key.
• In the Workfile window: Double click on a highlighted variable.
• In the Workfile window: Click Open Equation.
• In the Equation Specification window: Click OK.
• Do not forget to close the workfile.
Table 9.4
Budget theory of demand regression results with clever algebra
First let us compare the estimates for βP in table 9.3 and βClever in table 9.4
• Estimate for βP, bP, equals −0.586;
• Estimate for βClever, bClever, equals 0.414.
This is consistent with the definition of βClever. By definition, βClever equals βP plus 1.0:
βClever = βP + 1.0
bClever = bP + 1.0
= −0.586 + 1.0
= 0.414
Figure 9.4
Probability distribution of the constant elasticity model's coefficient estimate—Clever approach (Student t-distribution with mean 0, SE 0.183, and 8 degrees of freedom; each tail lying 0.414 or more from 0 has probability 0.0538/2)
This is the same value for the probability that we computed when we used the Econometrics
Lab. By a clever algebraic manipulation, we can get the statistical software to perform the prob-
ability calculations. Now we turn to the final hypothesis-testing step.
The significance level is the dividing line between the probability being small and the prob-
ability being large.
Prob[Results IF H0 true] less than significance level → Prob[Results IF H0 true] small → unlikely that H0 is true → reject H0
Prob[Results IF H0 true] greater than significance level → Prob[Results IF H0 true] large → likely that H0 is true → do not reject H0
At a 1 or 5 percent significance level, we do not reject the null hypothesis that the elasticity of
demand equals −1.0, thereby supporting the budget theory of demand. That is, at a 1 or 5 percent
significance level, the estimate of −0.586 is not statistically different from −1.0.
The theory that we are testing determines whether we should use a one-tailed or a two-tailed
test. When the theory suggests that the actual value of a coefficient is greater than or less than
a specific constant, a one-tailed test is appropriate. Most economic theories fall into this category.
In fact most economic theories suggest that the actual value of the coefficient is either greater
than 0 or less than 0 (see figure 9.5). For example, economic theory teaches that the price should
have a negative influence on the quantity demanded; similarly theory teaches that the price
should have a positive influence on the quantity supplied. In most cases economists use one-
tailed tests. However, some theories suggest that the coefficient equals a specific value; in these
cases a two-tailed test is required.
Figure 9.5
One-tailed and two-tailed tests—A comparison (each panel depicts the probability distribution of the estimate b centered at c when H0: β = c is true; the alternative hypothesis H1: β > c or H1: β < c calls for a one-tailed test, while H1: β ≠ c calls for a two-tailed test; in each case Prob[Results IF H0 true] equals the probability of obtaining results like those we actually got, or even stronger, if H0 is true)

The constant price elasticity model is just one example of how logarithms can be a useful econometric tool. Generally, logarithms provide a very convenient way to test hypotheses that
are expressed in terms of percentages rather than “natural” units. To see how, we will first review
three concepts:
• The interpretation of the coefficient estimate
• The differential approximation
• The derivative of a logarithm
Let x increase by Δx: x → x + Δx. Consequently the estimated value of y will increase by Δy:
Esty → Esty + Δy:
Esty + Δy = bConst + bx·x + bx·Δx

Reconsider the original equation:

Esty = bConst + bx·x

Subtracting the second equation from the first:

Δy = bx·Δx
In words, bx estimates the unit change in the dependent variable y resulting from a one unit
change in explanatory variable x.
In words, the derivative tells us by approximately how much y changes when x changes by a
small amount; that is, the derivative equals the change in y caused by a one (small) unit change
in x.
The derivative of the natural logarithm of z with respect to z equals 1 divided by z.2 We have
already considered the case in which both the dependent variable and explanatory variable are
logarithms. Now we will consider two cases in which only one of the two variables is a
logarithm:
• Dependent variable is a logarithm.
• Explanatory variable is a logarithm.
Δz/z ≈ bx·Δx

Multiplying both sides of the equation by 100:

(Δz/z) × 100 ≈ (bx × 100)·Δx

Since (Δz/z) × 100 equals the percent change in z:

Percent change in z ≈ (bx × 100)·Δx
In words, when the dependent variable is a logarithm, bx × 100 estimates the percent change
in the dependent variable resulting from a one unit change in the explanatory variable, which is
the percent change in y resulting from a one (natural) unit change in x.
2. The log notation refers to the natural logarithm (logarithm base e), not the logarithm base 10.
In words, when the explanatory variable is a logarithm, bx/100 estimates the (natural) unit change in the dependent variable resulting from a 1 percent change in the explanatory variable, which is the unit change in y resulting from a 1 percent change in z.
To illustrate the usefulness of logarithms, consider the effect of a worker's education on his/her wage. Economic theory (and common sense) suggests that a worker's wage is influenced
by the number of years of education he/she completes:
To assess this theory we will focus on the effect of high school education; we consider workers
who have completed the ninth, tenth, eleventh, or twelfth grades and have not continued on to
college or junior college. We will use data from the March 2007 Current Population Survey. In
the process we can illustrate the usefulness of logarithms. Logarithms allow us to fine-tune our
hypotheses by expressing them in terms of percentages.
Wage and education data: Cross-sectional data of wages and education for 212 workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who have completed the ninth, tenth, eleventh, or twelfth grades, but have not
continued on to college or junior college.
We can consider four models that capture the theory in somewhat different ways:
• Linear model
• Log dependent variable model
• Log explanatory variable model
• Log-log (constant elasticity) model
The linear model includes no logarithms. Wage is expressed in dollars and education
in years.
Table 9.5
Wage regression results with linear model

As table 9.5 reports, we estimate that an additional year of high school increases the wage by about $1.65 per hour. It is very common to express wage increases in this way; all the time we hear people say that they received a $1.00 per hour raise or a $2.00 per hour raise. It is also very common to hear raises expressed in percentage terms. When the results of new labor contracts are announced, the wage increases are typically expressed in percentage terms; management agreed to give workers a 2 percent increase or a 3 percent increase. This observation leads us to our next model: the log dependent variable model.
LogWage = log(Wage)
The dependent variable (LogWage) is expressed in terms of the logarithm of dollars; the explana-
tory variable (HSEduc) is expressed in years (table 9.6).
Let us compare the estimates derived by our two models:
• The linear model implicitly assumes that the impact of one additional year of high school
education is the same for each worker in terms of dollars. We estimate that a worker’s wage
increases by $1.65 per hour for each additional year of high school (table 9.5).
• The log dependent variable model implicitly assumes that the impact of one additional year of high school education is the same for each worker in terms of percentages. We estimate that a worker's wage increases by 11.4 percent for each additional year of high school (table 9.6).
The estimates each model provides differ somewhat. For example, consider two workers, the first earning $10.00 per hour and the second earning $20.00.

Table 9.6
Wage regression results with log dependent variable model

On the one hand, the linear model estimates that an additional year of high school would increase the wage of each worker by $1.65 per hour. On the other hand, the log dependent variable model estimates that an additional year of high school would increase the wage of the first worker by 11.4 percent of $10.00, or $1.14, and the wage of the second worker by 11.4 percent of $20.00, or $2.28.
As we will see, the last two models (the log explanatory variable and log-log models) are not particularly natural in the context of this example. We seldom express differences in education as percentage differences. Nevertheless, the log explanatory variable and log-log models are appropriate in many other contexts. Therefore we will apply them to our wage and education data even though the interpretations will sound unusual.
LogHSEduc = log(HSEduc)
The dependent variable (Wage) is expressed in terms of dollars; the explanatory variable
(LogHSEduc) is expressed in terms of the log of years. As mentioned above, this model is not
particularly appropriate for this example because we do not usually express education differences
in percentage terms. Nevertheless, the example does illustrate how we interpret the coefficient
in a log explanatory variable model. The regression results estimate that a 1 percent increase in
high school education increases the wage by about $.17 per hour (table 9.7).
Table 9.7
Wage regression results with log explanatory variable model
Table 9.8
Wage regression results with constant elasticity model
Both the dependent and explanatory variables are expressed in terms of logs. This is just the
constant elasticity model that we discussed earlier. The regression results estimate that a
1 percent increase in high school education increases the wage by 1.2 percent (table 9.8).
While the log-log model is not particularly appropriate in this case, we have already seen that
it can be appropriate in other contexts. For example, this was the model we used to assess the
budget theory of demand earlier in this chapter.
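For concreteness, a minimal sketch of how the four wage models might be estimated with statsmodels; the file name and the column names Wage and HSEduc are hypothetical stand-ins for the Current Population Survey extract described above, not something the text specifies:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("cps_wage_education.csv")   # hypothetical file name

    formulas = {
        "linear":                   "Wage ~ HSEduc",
        "log dependent variable":   "np.log(Wage) ~ HSEduc",
        "log explanatory variable": "Wage ~ np.log(HSEduc)",
        "log-log":                  "np.log(Wage) ~ np.log(HSEduc)",
    }
    for name, formula in formulas.items():
        results = smf.ols(formula, data=df).fit()
        # The slope estimate is read differently in each model: dollars per year,
        # percent per year (after multiplying by 100), dollars per 1 percent
        # (after dividing by 100), or an elasticity.
        print(name, results.params.iloc[1])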
Chapter 9 Review Questions

1. Consider the general structure of the theory, the null hypothesis, and the alternative hypothesis.
When is a
a. One-tailed hypothesis appropriate?
Theory: ___________________
H0: ___________
H1: ___________
b. Two-tailed hypothesis appropriate?
Theory: ___________________
H0: ___________
H1: ___________
2. How should the coefficient estimate be interpreted when the dependent and explanatory
variables are specified as:
a. Dependent variable: y and explanatory variable: x
b. Dependent variable: log(y) and explanatory variable: x
c. Dependent variable: y and explanatory variable: log(x)
d. Dependent variable: log(y) and explanatory variable: log(x)
Chapter 9 Exercises
Petroleum consumption data for Nebraska: Annual time series data of petroleum consumption
and prices for Nebraska from 1990 to 1999.
2. Consider the budget theory of demand in the context of per capita petroleum consumption:
where
Crude oil production data: Annual time series data of US crude oil production and prices from
1976 to 2004.
Gasoline consumption data: Annual time series data for US gasoline consumption and prices
from 1990 to 1999.
a. Would you theorize the price elasticity of demand for gasoline to be elastic or inelastic?
Explain.
b. Apply the hypothesis-testing approach that we developed to assess the theory.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
a. Would you theorize that the price elasticity of demand for cigarettes would be elastic or
inelastic? Explain.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the price elasticity
of demand.
c. Does your estimate for the price elasticity of demand support your theory? Explain.
Wage and age data: Cross section data of wages and ages for 190 union members included in
the March 2007 Current Population Survey who have earned high school degrees, but have not
had any additional education.
We often describe wage increases in terms of percent changes. Apply the hypothesis-testing
approach that we developed to assess this “percent increase version” of the seniority theory.
a. Apply the hypothesis-testing approach that we developed to assess the seniority theory.
7. Revisit the Current Population Survey labor supply data.
Labor supply data: Cross-sectional data of hours worked and wages for the 92 married workers
included in the March 2007 Current Population Survey residing in the Northeast region of the
United States who earned bachelor, but no advanced, degrees.
Consider the theory that labor supply is inelastic; that is, that the wage elasticity of labor supply is less than 1.
a. Apply the hypothesis-testing approach that we developed to assess the inelastic labor
supply theory.
Chapter 10 Outline
Q = βConst P^βP I^βI ChickP^βCP
where
Q = βConst (P/ChickP)^βP (I/ChickP)^βI
b. If βCP = −βP − βI, what happens to the quantity of beef demanded when the price of beef
(the good’s own price, P), income (I), and the price of chicken (ChickP) all double?
c. If βP + βI + βCP = 0, what happens to the quantity of beef demanded when the price of
beef (the good’s own price, P), income (I), and the price of chicken (ChickP) all double?
Q = βConst P^βP I^βI ChickP^βCP
Thus far we have focused our attention on simple regression analysis where the model assumes
that only a single explanatory variable affects the dependent variable. In the real world, however,
a dependent variable typically depends on many explanatory variables. For example, while
economic theory teaches that the quantity of a good demanded depends on the good's own price, theory also tells us that the quantity depends on other factors: income, the price of other goods, and so on. Multiple regression analysis allows us to assess such theories.
• Multiple regression analysis attempts to sort out the individual effect of each explanatory
variable.
• An explanatory variable’s coefficient estimate allows us to estimate the change in the dependent
variable resulting from a change in that particular explanatory variable while all other explana-
tory variables remain constant.
Figure 10.1
Downward sloping demand curve
Graphically, the theory is illustrated by a downward sloping demand curve (figure 10.1). When we draw a demand curve for a good, we implicitly assume that all factors relevant to demand other than that good's own price remain constant.
We will focus on the demand for a particular good, beef, to illustrate the importance of mul-
tiple regression analysis. We now apply the hypothesis-testing steps.
We will use a linear demand model to test the theory. Naturally the quantity of beef demanded
depends on its own price, the price of beef. Furthermore we postulate that the quantity of beef
demanded also depends on income and the price of chicken. In other words, our model proposes
that the factors relevant to the demand for beef, other than beef’s own price, are income and the
price of chicken.
where
The theory suggests that when income and the price of chicken remain constant, an increase in the price of beef (the good's own price) decreases the quantity of beef demanded (figure 10.2); similarly, when income and the price of chicken remain constant, a decrease in the price of beef (the good's own price) increases the quantity of beef demanded:
Figure 10.2
Downward sloping demand curve for beef
Economic theory teaches that the sign of coefficients for the explanatory variables other than
the good’s own price may be positive or negative. Their signs depend on the particular good in
question:
• The sign of βI depends on whether beef is a normal or inferior good. Beef is generally regarded
as a normal good; consequently we would expect βI to be positive: an increase in income results
in an increase in the quantity of beef demanded.
Step 1: Collect data, run the regression, and interpret the estimates.
Beef consumption data: Monthly time series data of beef consumption, beef prices, income,
and chicken prices from 1985 and 1986 (table 10.1).
We now use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 10.2).
To interpret these estimates, let us for the moment replace the numerical value of each estimate with the italicized lowercase Roman letter b that we use to denote an estimate. That is, replace the estimated:
• Constant, 159,032, with bConst
• Price coefficient, −549.5, with bP
• Income coefficient, 24.25, with bI
• Chicken price coefficient, 287.4, with bCP
Table 10.1
Monthly beef demand data from 1985 and 1986
Table 10.2
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
The coefficient estimates attempt to separate out the individual effect that each explanatory
variable has on the dependent variable. To justify this, focus on the estimate of the beef price
coefficient, bP. It estimates by how much the quantity of beef changes when the price of beef
(the good’s own price) changes while income and the price of chicken (all other explanatory
variables) remain constant. More formally, when all other explanatory variables remain
constant:
ΔQ = bP ΔP or bP = ΔQ/ΔP
where
A little algebra explains why. We begin with the equation estimating our model:
Now increase the price of beef (the good’s own price) by ΔP while keeping all other explanatory
variables constant. ΔQ estimates the resulting change in quantity of beef demanded.
From To
Price: P → P + ΔP
Quantity: EstQ → EstQ + ΔQ
while all other explanatory variables remain constant; that is, while I and ChickP remain
constant.
In the equation estimating our model, substitute EstQ + ΔQ for EstQ and P + ΔP for P:

EstQ + ΔQ = bConst + bP(P + ΔP) + bI I + bCP ChickP

Subtracting the original equation, EstQ = bConst + bP P + bI I + bCP ChickP, term by term:

ΔQ = 0 + bP ΔP + 0 + 0

Simplifying,
ΔQ = bP ΔP

Dividing through by ΔP:

ΔQ/ΔP = bP while all other explanatory variables remain constant
To summarize,

ΔQ = bP ΔP or bP = ΔQ/ΔP

ΔQ = bI ΔI or bI = ΔQ/ΔI
"Slope" = bP
ΔQ
D
Q
Figure 10.3
Demand curve “slope”
while all other explanatory variables (P and ChickP) remain constant. bI estimates the change
in quantity when income changes while all other explanatory variables (the price of beef and
the price of chicken) remain constant.
ΔQ = bCP ΔChickP or bCP = ΔQ/ΔChickP
while all other explanatory variables (P and I) remain constant. bCP estimates the change in
quantity when the price of chicken changes while all other explanatory variables (the price of
beef and income) remain constant.
What happens when the price of beef (the good’s own price), income, and the price of chicken
change simultaneously? The total estimated change in the quantity of beef demanded just equals
the sum of the individual changes; that is, the total estimated change in the quantity of beef
demanded equals the change resulting from the change in
• the price of beef (the good’s own price)
plus
• income
plus
• the price of chicken:

ΔQ = bP ΔP + bI ΔI + bCP ΔChickP

Each term estimates the change in the dependent variable, the quantity of beef demanded, resulting from a change in each individual explanatory variable.
The estimates achieve the goal:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
ΔQ = bP ΔP = −549.5 ΔP
Interpretation: The ordinary least squares (OLS) estimate of the price coefficient equals −549.5;
that is, we estimate that if the price of beef increases by 1 cent while income and the price of
chicken remain unchanged, the quantity of beef demanded decreases by about 549.5 million
pounds.
ΔQ = bIΔI = 24.25ΔI
Interpretation: The ordinary least squares (OLS) estimate of the income coefficient equals
24.25; that is, we estimate that if disposable income increases by 1 billion dollars while the price
of beef and the price of chicken remain unchanged, the quantity of beef demanded increases by
about 24.25 million pounds.
ΔQ = bCPΔChickP = 287.4ΔChickP
Interpretation: The ordinary least squares (OLS) estimate of the chicken price coefficient equals
287.4; that is, we estimate that if the price of chicken increases by 1 cent while the price of beef
and income remain unchanged, the quantity of beef demanded increases by about 287.4 million
pounds.
We estimate that the total change in the quantity of beef demanded equals −549.5 times the
change in the price of beef (the good’s own price) plus 24.25 times the change in disposable
income plus 287.4 times the change in the price of chicken.
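A small numerical sketch of this "sum of the individual effects" interpretation, using the coefficient estimates just reported and purely hypothetical changes in the explanatory variables:

    # Coefficient estimates reported in the text (table 10.2)
    b_P, b_I, b_CP = -549.5, 24.25, 287.4

    # Hypothetical changes: beef price +2 cents, income +$10 billion, chicken price +1 cent
    dP, dI, dChickP = 2.0, 10.0, 1.0

    dQ = b_P * dP + b_I * dI + b_CP * dChickP
    print(dQ)   # -569.1: estimated change in quantity demanded, millions of pounds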
Recall that the sign of the estimate for the good’s own price coefficient, bP, determines whether
or not the data support the downward sloping demand theory. bP estimates the change in the
quantity of beef demanded when the price of beef (the good’s own price) changes while the
other explanatory variables, income and the price of chicken, remain constant. The theory pos-
tulates that an increase in the good’s own price decreases the quantity of beef demanded. The
negative price coefficient estimate lends support to the theory.
Critical result: The own price coefficient estimate is −549.5. The negative sign of the coefficient estimate suggests that an increase in the price decreases the quantity of beef demanded. This evidence supports the downward sloping demand theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
The cynic is skeptical of the evidence supporting the view that the actual price coefficient,
βP, is negative; that is, the cynic challenges the evidence and hence the downward sloping
demand theory:
Cynic’s view: Sure, the price coefficient estimate from the regression suggests that the demand
curve is downward sloping, but this is just “the luck of the draw.” The actual price coefficient,
βP, equals 0.
H0: βP = 0 Cynic is correct: The price of beef (the good’s own price) has no effect on
quantity of beef demanded.
H1: βP < 0 Cynic is incorrect: An increase in the price decreases quantity of beef demanded.
The null hypothesis, like the cynic, challenges the evidence: an increase in the price of beef has
no effect on the quantity of beef demanded. The alternative hypothesis is consistent with the
evidence: an increase in the price decreases the quantity of beef demanded.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we obtained
(or even stronger), if the cynic is correct and the price of beef actually has no impact?
• Specific question: What is the probability that the coefficient estimate, bP, in one regression would be −549.5 or less, if H0 were true (if the actual price coefficient, βP, equals 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true] (figure 10.5).
Figure 10.4
Probability distribution of coefficient estimate for the beef price

Figure 10.5
Calculating Prob[Results IF H0 true] (Student t-distribution: mean = 0, SE = 130.3, DF = 20; the left-tail probability at bP = −549.5 is 0.0004/2 = 0.0002)
We can now calculate Prob[Results IF H0 true]. The easiest way is to use the regression results.
Recall that the tails probability is reported in the Prob column. The tails probability is .0004;
therefore, to calculate Prob[Results IF H0 true] we need only divide 0.0004 by 2 (table 10.3).
Prob[Results IF H0 true] = 0.0004/2 ≈ 0.0002
Step 5: Decide on the standard of proof, a significance level.

The significance level is the dividing line between the probability being small and the probability being large.
Table 10.3
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
We can reject the null hypothesis at the traditional significance levels of 1, 5, and 10 percent.
Consequently the data support the downward sloping demand theory.
We will now consider a second theory regarding demand. Microeconomic theory teaches that
there is no money illusion; that is, if all prices and income change by the same proportion, the
quantity of a good demanded will not change. The basic rationale of this theory is clear. Suppose
that all prices double. Every good would be twice as expensive. If income also doubles, however,
consumers would have twice as much to spend. When all prices and income double, there is no
reason for a consumer to change his/her spending patterns; that is, there is no reason for a con-
sumer to change the quantity of any good he/she demands.
We can use indifference curve analysis to motivate this more formally.1 Recall the household’s
utility maximizing problem:
1. If you are not familiar with indifference curve analysis, please skip to the Linear Demand Model and Money Illusion
Theory section and accept the fact that the no money illusion theory is well grounded in economic theory.
Figure 10.6
Utility maximization (the budget constraint, with intercepts I/PX and I/PY and slope −PX/PY, and the highest indifference curve touching it at the solution)
A household chooses the bundle of goods that maximizes its utility subject to its budget con-
straint. How can we illustrate the solution to the household’s problem? First, we draw the budget
constraint. To do so, let us calculate its intercepts.
PX X + PY Y = I

X-intercept (Y = 0): PX X = I, so X = I/PX
Y-intercept (X = 0): PY Y = I, so Y = I/PY
Next, to maximize utility, we find the highest indifference curve that still touches the budget
constraint as illustrated in figure 10.6.
Now suppose that all prices and income double:
Before: max Utility = U(X, Y) subject to PX X + PY Y = I
After (PX → 2PX, PY → 2PY, I → 2I): max Utility = U(X, Y) subject to 2PX X + 2PY Y = 2I
How is the budget constraint affected? To answer this question, calculate the intercepts after all
prices and income have doubled and then compare them to the original ones:
2PX X + 2PY Y = 2I

X-intercept (Y = 0): 2PX X = 2I, so X = 2I/2PX = I/PX
Y-intercept (X = 0): 2PY Y = 2I, so Y = 2I/2PY = I/PY
Since the intercepts have not changed, the budget constraint line has not changed; hence, the
solution to the household’s constrained utility maximizing problem will not change.
In sum, the no money illusion theory is based on sound logic. But remember, many theories
that appear to be sensible turn out to be incorrect. That is why we must test our theories.
Project: Use the beef demand data to assess the no money illusion theory.
Can we use our linear demand model to do so? Unfortunately, the answer is no. The linear
demand model is inconsistent with the proposition of no money illusion. We will now explain
why.
The linear demand model is inconsistent with the no money illusion proposition because it implicitly assumes that the "slope" of the demand curve equals a constant value, βP, unaffected by income or the price of chicken.2 To understand why, consider the linear model and recall that when we draw a demand curve, income and the price of chicken remain constant. Consequently, for a demand curve:

Q = QIntercept + βP P

where
2. Again, recall that because quantity is plotted on the horizontal axis and price is plotted on the vertical axis, the slope of the demand curve is actually the reciprocal of βP, 1/βP. That is why we place the word "slope" within quotes. This does not affect the validity of our argument, however. The important point is that the linear model implicitly assumes that the "slope" of the demand curve is constant, unaffected by changes in other factors relevant to demand.
Figure 10.7
Demand curve for beef (quantities Q0, Q1, and Q2 demanded at prices P0, P1, and P2)
This is just an equation for a straight line; βP equals the “slope” of the demand curve.
Now consider three different beef prices and the quantity of beef demanded at each of the
prices while income and chicken prices remain constant:
When the price of beef is P0, Q0 units of beef are demanded; when the price of beef is P1, Q1
units of beef are demanded; and when the price of beef is P2, Q2 units of beef are demanded
(figure 10.7).
Now, suppose that income and the price of chicken double. When there is no money illusion:
• Q0 units of beef would still be demanded if the price of beef rises from P0 to 2P0.
• Q1 units of beef would still be demanded if the price of beef rises from P1 to 2P1.
• Q2 units of beef would still be demanded if the price of beef rises from P2 to 2P2.
"Slope" = βP "Slope" = βP
"Slope" = βP
P0
2P1
P1
2P2
P2
Q Q Q
Q0 Q1 Q2
Figure 10.8
Demand curve for beef and no money illusion
To test the theory of no money illusion, we need a model of demand that can be consistent with
it. The constant elasticity demand model is such a model:
Q = βConst P^βP I^βI ChickP^βCP
The three exponents equal the elasticities. The beef price exponent equals the own price elasticity
of demand, the income exponent equals the income elasticity of demand, and the exponent of
the price of chicken equals the cross price elasticity of demand:
"Slope" = βP
Initial
Figure 10.9
Two demand curves for beef—Before and after income and the price of chicken doubling
A little algebra allows us to show that the constant elasticity demand model is consistent with the no money illusion theory whenever the exponents sum to 0. Let βP + βI + βCP = 0 and solve for βCP:
βP + βI + βCP = 0
↓
βCP = −βP − βI
Q = βConst P^βP I^βI ChickP^βCP
  = βConst P^βP I^βI ChickP^(−βP−βI)
  = βConst P^βP I^βI ChickP^(−βP) ChickP^(−βI)
  = βConst (P^βP/ChickP^βP)(I^βI/ChickP^βI)

Simplifying:

Q = βConst (P/ChickP)^βP (I/ChickP)^βI
What happens to the two fractions whenever the price of beef (the good’s own price), income,
and the price of chicken change by the same proportion? Both the numerators and denominators
increase by the same proportion; hence the fractions remain the same. Therefore the quantity of
beef demanded remains the same. This model of demand is consistent with our theory whenever
the exponents sum to 0.
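A quick numerical sketch of this scale-invariance argument, with purely hypothetical parameter values chosen only so that the exponents sum to 0:

    # Hypothetical parameter values; only the property betaP + betaI + betaCP = 0 matters
    bConst, bP, bI = 1.0, -0.5, 0.4
    bCP = -bP - bI                       # forces the exponents to sum to 0

    def quantity_demanded(P, I, ChickP):
        return bConst * P**bP * I**bI * ChickP**bCP

    print(quantity_demanded(100, 500, 80))      # original price, income, chicken price
    print(quantity_demanded(200, 1000, 160))    # everything doubled: same quantity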
Let us begin the hypothesis testing process. We have already completed step 0.
Q = βConst P^βP I^βI ChickP^βCP
No money illusion theory: The elasticities sum to 0: βP + βI + βCP = 0.
Step 1: Collect data, run the regression, and interpret the estimates.
Natural logarithms convert the original equation for the constant elasticity demand model into its "linear" form:

LogQt = log(βConst) + βP LogPt + βI LogIt + βCP LogChickPt + et
Table 10.4
Beef demand regression results—Constant elasticity model
To apply the ordinary least squares (OLS) estimation procedure we must first generate the
logarithms:
where
LogQt = log(Qt)
LogPt = log(Pt)
LogIt = log(It)
LogChickPt = log(ChickPt)
Next we run a regression with the log of the quantity of beef demanded as the dependent variable; the log of the price of beef (the good's own price), the log of income, and the log of the price of chicken are the explanatory variables (table 10.4).
What happens when the price of beef (the good’s own price), income, and the price of chicken
increase by 1 percent simultaneously? The total estimated percent change in the quantity of beef
demanded equals sum of the individual changes. That is, the total estimated percent change in
the quantity of beef demanded equals the estimated percent change in the quantity demanded
resulting from
• a 1 percent change in the price of beef (the good’s own price)
plus
• a 1 percent change in income
plus
• a 1 percent change in the price of chicken.
The estimated percent change in the quantity demanded equals the sum of the elasticity esti-
mates. We can express this succinctly:
Estimated percent change in Q resulting from a 1 percent increase in the price of beef, income, and the price of chicken:

bP + bI + bCP = −0.41 + 0.51 + 0.12 = 0.22
A 1 percent increase in all prices and income results in a 0.22 percent increase in quantity of
beef demanded, suggesting that money illusion is present. As far as the no money illusion theory
is concerned, the sign of the elasticity estimate sum is not critical. The fact that the estimated
sum is +0.22 is not crucial; a sum of −0.22 would be just as damning. What is critical is that
the sum does not equal 0, as claimed by the no money illusion theory.
Critical result: The sum of the elasticity estimates equals 0.22. The sum does not equal 0; the
sum is 0.22 from 0. This evidence suggests that money illusion is present and the no money
illusion theory is incorrect.
Since the critical result is that the sum lies 0.22 from 0, a two-tailed test, rather than a one-
tailed test is appropriate.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Sure, the elasticity estimates do not sum to 0 suggesting that money illusion
exists, but this is just “the luck of the draw.” In fact money illusion is not present; the sum of
the actual elasticities equals 0.
The cynic claims that the 0.22 elasticity estimate sum results simply from random influences.
A more formal way of expressing the cynic’s view is to say that the 0.22 estimate for the elastic-
ity sum is not statistically different from 0. An estimate is not statistically different from 0
whenever its nonzero value results solely from random influences.
Let us now construct the null and alternative hypotheses:
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is con-
sistent with the evidence. Can we dismiss the cynic’s view as nonsense?
We will use a simulation to show that the cynic could indeed be correct. In this simulation, Coef1,
Coef2, and Coef3 denote the coefficients for the three explanatory variables. By default, the actual
values of the coefficients are −0.5, 0.4, and 0.1. The actual values sum to 0 (figure 10.10).
Figure 10.10
Sum of the elasticity estimates and random influences (simulation screenshot: the actual coefficient values and the coefficient estimates and their sum for each repetition)
Be certain that the Pause checkbox is checked. Click Start. The coefficient estimates for each
of the three coefficients and their sum are reported:
• The coefficient estimates do not equal their actual values.
• The sum of the coefficient estimates does not equal 0 even though the sum of the actual coef-
ficient values equals 0.
Click Continue a few more times. As a consequence of random influences we could never expect
the estimate for an individual coefficient to equal its actual value. Therefore we could never
expect a sum of coefficient estimates to equal the sum of their actual values. Even if the actual
elasticities sum to 0, we could never expect the sum of their estimates to equal precisely 0.
Consequently we cannot dismiss the cynic’s view as nonsense.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually obtained (or even stronger), if the cynic is correct and money illusion was not present?
• Specific question: The sum of the coefficient estimates is 0.22 from 0. What is the probabil-
ity that the sum of the coefficient estimates in one regression would be 0.22 or more from 0, if
H0 were true (if the sum of the actual elasticities equaled 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true].
How can we calculate this probability? We will explore three approaches that can be used:
• Clever algebraic manipulation
• Wald (F-distribution) test
• Letting statistical software do the work
We begin with the clever algebraic manipulation approach. This approach exploits the tails prob-
ability reported in the regression printout. Recall that the tails probability is based on the premise
that the actual value of the coefficient equals 0. Our strategy takes advantage of this:
• First, cleverly define a new coefficient that equals 0 when the null hypothesis is true.
• Second, reformulate the model to incorporate the new coefficient.
• Third, use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the new model.
• Last, focus on the estimate of the new coefficient. Use the new coefficient estimate’s tails
probability to calculate Prob[Results IF H0 true].
Now cleverly define a new coefficient so that the null hypothesis is true when the new coefficient
equals 0:
βClever = βP + βI + βCP
Clearly, βClever equals 0 if and only if the elasticities sum to 0 and no money illusion exists; that
is, βClever equals 0 if and only if the null hypothesis is true.
Now we will use algebra to reformulate the constant elasticity of demand model to incorporate
βClever:
βClever = βP + βI + βCP, so βCP = βClever − βP − βI

Substitute βClever − βP − βI for βCP in the log form of the model:

LogQt = log(βConst) + βP LogPt + βI LogIt + (βClever − βP − βI)LogChickPt + et

Rearrange terms:

LogQt = log(βConst) + βP (LogPt − LogChickPt) + βI (LogIt − LogChickPt) + βClever LogChickPt + et
     = log(βConst) + βP LogPLessLogChickPt + βI LogILessLogChickPt + βClever LogChickPt + et
where
LogQt = log(Qt)
LogPLessLogChickPt = log(Pt) − log(ChickPt)
LogILessLogChickPt = log(It) − log(ChickPt)
LogChickPt = log(ChickPt)
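A minimal sketch of how this reformulated regression might be run with statsmodels; the file name and the raw column names (Q, P, I, ChickP) are hypothetical stand-ins for the monthly beef demand data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                     # hypothetical file name
    df["LogQ"] = np.log(df["Q"])
    df["LogPLessLogChickP"] = np.log(df["P"]) - np.log(df["ChickP"])
    df["LogILessLogChickP"] = np.log(df["I"]) - np.log(df["ChickP"])
    df["LogChickP"] = np.log(df["ChickP"])

    results = smf.ols("LogQ ~ LogPLessLogChickP + LogILessLogChickP + LogChickP", data=df).fit()
    # The coefficient on LogChickP estimates beta_Clever; its two-tailed ("tails")
    # probability is exactly the Prob[Results IF H0 true] needed for the test.
    print(results.params["LogChickP"], results.pvalues["LogChickP"])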
Step 1: Collect data, run the regression, and interpret the estimates.
Now, use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of this model (table 10.5).
It is important to note that the estimates of the reformulated model are consistent with the
estimates of the original model (table 10.4):
• The estimate of the price coefficient is the same in both cases, −0.41.
• The estimate of the income coefficient is the same in both cases, 0.51.
• In the reformulated model, the estimate of βClever equals 0.22, which equals the sum of the elasticity estimates from the original model: −0.41 + 0.51 + 0.12 = 0.22.
Step 2: Play the cynic and challenge the results; reconstruct the null and alternative
hypotheses.
Cynic’s view: Sure, bClever, the estimate for the sum of the actual elasticities, does not equal 0,
suggesting that money illusion exists, but this is just “the luck of the draw.” In fact money illu-
sion is not present; the sum of the actual elasticities equals 0.
Table 10.5
Beef demand regression results—Constant elasticity model
Figure 10.11
Probability distribution of the clever coefficient estimate (centered at 0 under H0; the observed estimate lies 0.22 from 0)
We have already shown that we cannot dismiss the cynic’s view as nonsense. As a consequence
of random influences we could never expect the estimate for βClever to equal precisely 0, even if
the actual elasticities sum to 0.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis (figure 10.11).
Figure 10.12
Calculating Prob[Results IF H0 true] (Student t-distribution: mean = 0, SE = 0.2759, DF = 20; each tail beyond 0.22 from 0 has probability 0.4325/2)
• Generic question: What is the probability that the results would be like those we obtained
(or even stronger), if the cynic is correct and no money illusion was present?
• Specific question: What is the probability that the coefficient estimate in one regression,
bClever, would be at least 0.22 from 0, if H0 were true (if the actual coefficient, βClever, equals 0)?
Step 4: Use the general properties of the estimation procedure, the probability distribution of
the estimate, to calculate Prob[Results IF H0 true] (figure 10.12).
The software automatically computes the tails probability based on the premise that the actual
value of the coefficient equals 0. This is precisely what we need, is it not? The regression printout
reports that the tails probability equals 0.4325. Consequently
Prob[Results IF H0 true] = 0.4325.
Step 5: Decide on the standard of proof, a significance level.
The significance level is the dividing line between the probability being small and the probability
being large.
The probability exceeds the traditional significance levels of 1, 5, and 10 percent. Based on the traditional significance levels, we would not reject the null hypothesis. We would conclude that the estimate of βClever, the sum of the elasticities, is not statistically different from 0, thereby supporting the no money illusion theory.
In the next chapter we will explore two other ways to calculate Prob[Results IF H0 true].
Chapter 10 Review Questions

1. How does multiple regression analysis differ from simple regression analysis?
2. Consider the following constant elasticity demand model:

Q = βConst P^βP I^βI ChickP^βCP
Chapter 10 Exercises
Agricultural production data: Cross-sectional agricultural data for 140 nations in 2000 that
cultivated more than 10,000 square kilometers of land.
a. What is your theory regarding how the quantity of labor, land, and machinery affects
agricultural value added?
b. What does your theory imply about the signs of the model’s coefficients?
c. What are the appropriate hypotheses?
d. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
Definition: The value-added function exhibits constant returns to scale if and only if increasing
each input by the same factor will increase ValueAdded by that same factor.
For example, suppose that the value-added function exhibits constant returns to scale. If twice
as much labor, land, and machinery are used, then value added will double also.
a. Begin with the equation for the constant elasticity model for value added.
ValueAdded = βConst Labor^βLabor Land^βLand Machinery^βMachinery
Then double labor, land, and machinery; that is, in the equation replace
• Labor with 2Labor
• Land with 2Land
• Machinery with 2Machinery
Derive the expression for NewValueAdded in terms of ValueAdded:
NewValueAdded = _____________________
b. If the value-added function exhibits constant returns to scale, how must the new expression for value added, NewValueAdded, be related to the original expression for value added, ValueAdded?
c. If the value-added function exhibits constant returns to scale, what must the sum of the
exponents, βLabor + βLand + βMachinery, equal?
3. Consider the constant elasticity model for value added
b. Incorporate the “clever” coefficient into the log form of the constant elasticity model for
value added.
c. Estimate the parameters of the equation that incorporates the new coefficient.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
f. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
6. Again, revisit the cigarette consumption data.
a. Instead of a linear model, consider a constant elasticity model to capture the impact that
the price of cigarettes and income per capita have on cigarette consumption. What equation
depicts your model?
b. What does your theory imply about the sign of the coefficients?
c. What are the appropriate hypotheses?
d. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
e. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
7. Again, revisit the cigarette consumption data.
a. What is your theory regarding how
i. The price of cigarettes affects the youth smoking rate?
ii. Per capita income affects the youth smoking rate?
b. Based on your theory, construct a linear regression model. What equation depicts your
model?
c. What does your theory imply about the signs of the model’s coefficients?
d. What are the appropriate hypotheses?
e. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
f. Is a one-tail or a two-tail test appropriate to assess your theories? Explain. What can you
conclude about your theories?
11 Hypothesis Testing and the Wald Test

Chapter 11 Outline
2. Review how the ordinary least squares (OLS) estimation procedure determines the value of
the parameter estimates. What criterion does this procedure use to determine the value of the
parameter estimates?
3. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
4. Focus on our beef consumption data:
Beef consumption data: Monthly time series data of beef consumption, beef prices, income,
and chicken prices from 1985 and 1986.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the constant elasticity demand model.
The no money illusion theory contends that whenever all prices and income change by the same proportion, the quantity demanded is unaffected. In terms of elasticities, this means that a good's elasticities (own price, income, and cross price) sum to 0. Let us briefly review the steps that we undertook in the last chapter to assess this theory.
Since the linear demand model is intrinsically inconsistent with the no money illusion theory,
we cannot use it to assess the theory. The constant elasticity demand model can be used, however:

Q = βConst P^βP I^βI ChickP^βCP

where
Next we converted the constant elasticity demand model into a linear relationship by taking natural logarithms:

LogQt = log(βConst) + βP LogPt + βI LogIt + βCP LogChickPt + et
If all prices and income increase by 1 percent, the quantity of beef demanded would increase
by 0.22 percent. The sum of the elasticity estimates does not equal 0; more specifically, the sum
lies 0.22 from 0. The nonzero sum suggests that money illusion exists (table 11.1).
However, as a consequence of random influences, we could never expect the sum of the elasticity estimates to equal precisely 0, even if the sum of the actual elasticities did equal 0. Consequently we followed the hypothesis testing procedure. We played the cynic in
order to construct the null and alternative hypotheses. Finally, we needed to calculate the prob-
ability that the results would be like those we obtained (or even stronger), if the cynic is correct
and null hypothesis is actually true; that is, we needed to calculate Prob[Results IF H0 true].
Table 11.1
Beef demand regression results—Constant elasticity model
In the last chapter we explored one way to calculate this probability, the clever algebraic manipu-
lation approach. First we cleverly defined a new coefficient that equals 0 if and only if the null
hypothesis is true:
βClever = βP + βI + βCP
We then reformulated the null and alternative hypotheses in terms of the new coefficient, βClever:
After incorporating the new coefficient into the model, we used the ordinary least squares (OLS)
estimation procedure to estimate the value of the new coefficient. Since the null hypothesis is
now expressed as the new, clever coefficient equaling 0, the new coefficient’s tails probability
reported in the regression printout is the probability that we need:
βP + βI + βCP = 0
We now incorporate this restriction into the constant elasticity demand model. Substituting −βP − βI for βCP in the log form:

LogQt = log(βConst) + βP LogPt + βI LogIt + (−βP − βI)LogChickPt + et

Rearranging terms:

LogQt = log(βConst) + βP LogPLessLogChickPt + βI LogILessLogChickPt + et
where
LogQt = log(Qt)
LogPLessLogChickPt = log(Pt) − log(ChickPt)
LogILessLogChickPt = log(It) − log(ChickPt)
LogChickPt = log(ChickPt)
To compute the cross price elasticity estimate, we must remember that the restricted regression
is based on the premise that the sum of the elasticities equals 0. Hence
bP + bI + bCP = 0

and therefore

bCP = −bP − bI = −(−0.47) − 0.30 = 0.17
For future reference, note in the restricted regression the sum of squared residuals equals
0.004825 and the degrees of freedom equal 21:
Table 11.2
Beef demand regression results—Restricted model
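A minimal sketch of how the restricted regression might be run with statsmodels, continuing with the same hypothetical file and column names as before; imposing the restriction amounts to regressing LogQ on the two difference variables and omitting LogChickP:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                     # hypothetical file name
    df["LogQ"] = np.log(df["Q"])
    df["LogPLessLogChickP"] = np.log(df["P"]) - np.log(df["ChickP"])
    df["LogILessLogChickP"] = np.log(df["I"]) - np.log(df["ChickP"])

    restricted = smf.ols("LogQ ~ LogPLessLogChickP + LogILessLogChickP", data=df).fit()
    b_P = restricted.params["LogPLessLogChickP"]
    b_I = restricted.params["LogILessLogChickP"]
    b_CP = -b_P - b_I               # the restriction pins down the cross price elasticity
    print(restricted.ssr, restricted.df_resid)   # restricted SSR and degrees of freedom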
Let us review the regression printout (table 11.3). Record the sum of squared residuals and the
degrees of freedom in the unrestricted regression:
Comparing the Restricted Sum of Squared Residuals and the Unrestricted Sum of Squared
Residuals: The F-Statistic
Next we compare the sum of squared residuals for the restricted and unrestricted regressions:
Table 11.3
Beef demand regression results—Unrestricted model
Table 11.4
Comparison of parameter estimates

           Restricted regression    Unrestricted regression
bP         −0.47                    −0.41
bI         0.30                     0.51
bChickP    0.17                     0.12
bConst     9.50                     11.37
SSR        0.004825                 0.004675
The parameter estimates of the restricted and unrestricted regressions differ (table 11.4). Recall
that the estimates of the constant and coefficients are chosen so as to minimize the sum of
squared residuals. In the unrestricted regression no restrictions are placed on the coefficient
estimates; when bP equals −0.41, bI equals 0.51, and bChickP equals 0.12, the sum of squared
residuals is minimized. The estimates of the unrestricted regression minimize the sum of squared
residuals. The estimates of the restricted regression do not equal the estimates of the unrestricted
regression. Hence the restricted sum of squared residuals is greater than the unrestricted sum.
More generally:
• The unrestricted equation places no restrictions on the estimates.
• Enforcing a restriction impedes our ability to make the sum of squared residuals as small as possible.
• A restriction can only increase the sum of squared residuals; a restriction cannot reduce the
sum:
SSRR ≥ SSRU
Figure 11.1
Restricted and unrestricted sums of squared residuals simulation (screenshot: the actual coefficient values, the imposed restriction that the coefficients sum to 0, and the restricted and unrestricted coefficient estimates and sums of squared residuals, SSRR and SSRU, for each repetition)
Econometrics Lab 11.1: The Restricted and Unrestricted Sums of Squared Residuals
This simulation emphasizes the point. It mimics the problem at hand by including three explana-
tory variables whose coefficients are denoted as Coef1, Coef2, and Coef3. By default the actual
values of the three coefficients are −0.5, 0.4, and 0.1, respectively. The simulation allows us
to specify a restriction on the coefficient sum. By default a coefficient sum of 0 is imposed
(figure 11.1).
Be certain the Pause checkbox is checked and click Start. The first repetition is now performed. The simulation calculates the parameter estimates for both the restricted and unrestricted equations. The sum of the restricted coefficient estimates equals 0; the sum of the unrestricted coefficient estimates does not equal 0. If our logic is correct, the restricted sum of squared residuals will be greater than the unrestricted sum. Check the two sums. Indeed, the restricted sum is greater. Click Continue a few times. Each time, the restricted sum is greater than the unrestricted sum, confirming our logic.
Now let us consider a question:
Question: Since the imposition of a restriction can only make the sum of squared residuals larger, how much larger should we expect it to be?
The answer to this question depends on whether or not the restriction is actually true. On the
one hand, if in reality the restriction is not true, we would expect the sum of squared residuals
to increase by a large amount. On the other hand, if the restriction is actually true, we would
expect the sum of squared residuals to increase only modestly.
How do we decide whether the restricted sum of squared residuals is much larger or just a little larger than the unrestricted sum? For reasons that we will not delve into, we compare the magnitudes of the restricted and unrestricted sums of squared residuals by calculating what statisticians call the F-statistic:

F = [(SSRR − SSRU)/(DFR − DFU)] / (SSRU/DFU)

where SSRR and SSRU are the restricted and unrestricted sums of squared residuals and DFR and DFU are the restricted and unrestricted degrees of freedom.
On the one hand, when the restricted sum is much larger than the unrestricted sum,
• SSRR − SSRU is large
and
• the F-statistic is large.
On the other hand, when the restricted sum is only a little larger than the unrestricted sum,
• SSRR − SSRU is small
and
• the F-statistic is small.
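As a check on the arithmetic, a small sketch that plugs in the sums of squared residuals and degrees of freedom reported for the restricted regression (SSRR = 0.004825, 21 degrees of freedom) and the unrestricted regression (SSRU = 0.004675, 20 degrees of freedom) and reproduces the F-statistic used below:

    SSR_R, DF_R = 0.004825, 21       # restricted regression (table 11.2)
    SSR_U, DF_U = 0.004675, 20       # unrestricted regression (table 11.3)

    F = ((SSR_R - SSR_U) / (DF_R - DF_U)) / (SSR_U / DF_U)
    print(round(F, 2))               # approximately 0.64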
Note that since the restricted sum of squared residuals (SSRR) cannot be less than the unrestricted sum (SSRU), the F-statistic can never be negative (figure 11.2):

F ≥ 0
Furthermore the F-statistic is a random variable. The claim is based on the fact that:
• Since the parameter estimates for both the restricted and unrestricted equations are random variables, both the restricted and unrestricted sums of squared residuals are random variables.
• Since both the restricted and unrestricted sums of squared residuals are random variables, the F-statistic is a random variable.
Figure 11.2
Sums of squared residuals and the F-statistic simulation (screenshot: the actual coefficient values, the restriction that the coefficients sum to 0, the restricted and unrestricted coefficient estimates and sums of squared residuals, and the F-statistic for each repetition)
Again, let us use a simulation to illustrate that the F-statistic is a random variable. Be certain
the Pause checkbox is checked and click Start. Then click Continue a few times. We cannot
predict the sums of squared residuals or the F-statistic beforehand. Clearly, the sums of squared
residuals and the F-statistic are random variables.
Let us now put this all together:
Cynic’s view: Sure, the F-statistic is 0.64, but the F-statistic will always be positive because
the restricted sum of squared residuals (SSRR) will always be greater than the unrestricted sum
(SSRU). An F-statistic of 0.64 results from “the luck of the draw.”
We must calculate Prob[Results IF H0 true]. Before doing so, however, recall that the F-
statistic is a random variable. Recall what we have learned about random variables:
• The bad news is that we cannot predict the value of a random variable beforehand.
• The good news is that in this case we can describe its probability distribution.
The F-distribution describes the probability distribution of the F-statistic. As figure 11.3 shows, the F-distribution looks very different from the normal and Student t-distributions. The normal and Student t-distributions are symmetric, bell-shaped curves. The F-distribution is neither
Figure 11.3
F-distribution
symmetric nor bell shaped. Since the F-statistic can never be negative, the F-distribution begins
at F equals 0. Its precise shape depends on the numerator’s and the denominator’s degrees of
freedom, the degrees of freedom of the restricted and unrestricted regressions.
Econometrics Lab 11.3: The Restricted and Unrestricted Sums of Squared Residuals and
the F-Distribution
Now we will use the simulation to calculate Prob[Results IF H0 true] (figure 11.4):
Figure 11.4
F-distribution simulation (screenshot: the actual coefficient values, the imposed restriction that the coefficients sum to 0, the restricted and unrestricted estimates and sums of squared residuals, and the F-statistic for each repetition)
• By default, the actual values of the three coefficients (Coef1, Coef2, and Coef3) are −0.5, 0.4,
and 0.1, respectively. The actual values of the coefficients sum to 0. Hence the premise of the
null hypothesis is met; that is, H0 is true.
• Also the At Least F-Value is set at 0.64; this is the value of the F-statistic that we just calcu-
lated for the restricted and unrestricted beef demand regressions. Click Continue a few more
times. Sometimes the F-statistic is less than 0.64; other times it is greater than 0.64. Note the
At Least Percent line; the simulation is calculating the percent of repetitions in which the
F-statistic is equal to or greater than 0.64.
• Clear the Pause checkbox, click Start, and then after many, many repetitions click Stop. The
F-statistic equals 0.64 or more in about 43 percent of the repetitions.
We can now apply the relative frequency interpretation of probability; in one repetition of the
experiment, the probability that the F-statistic would be 0.64 or more when the null hypothesis
is true equals 0.43:
There is another way to calculate this probability that does not involve a simulation. Just as
there are tables that describe the normal and Student t-distributions, there are tables describing
the F-distribution. Unfortunately, F-distribution tables are even more cumbersome than Student
t-tables. Fortunately, we can use our Econometrics Lab to perform the calculation instead.
We wish to calculate the probability that the F-statistic from one pair of regressions would be
0.64 or more, if H0 were true (if there is no money illusion, if actual elasticities sum to 0),
Prob[Results IF H0 true] (figure 11.5).
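The same probability can also be computed directly from the F-distribution with statistical software; a minimal sketch using SciPy (an assumption on our part, since the text performs the calculation with the Econometrics Lab):

    from scipy.stats import f

    # P(F >= 0.64) with 1 numerator and 20 denominator degrees of freedom
    prob = f.sf(0.64, 1, 20)
    print(round(prob, 2))    # approximately 0.43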
Figure 11.5
Calculating Prob[Results IF H0 true]—Using a simulation (F-distribution with DFNum = 1 and DFDem = 20; the probability of an F-statistic of 0.64 or more is 0.43)
Click Calculate:
Many statistical software packages can be used to conduct a Wald test automatically.
First estimate the unrestricted regression (table 11.5). Then choose the Wald test option and
impose the appropriate restriction that the coefficients sum to 0.
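A minimal sketch of this automatic approach using statsmodels (the file and column names remain hypothetical stand-ins for the beef demand data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("beef_demand.csv")                 # hypothetical file name
    for col in ["Q", "P", "I", "ChickP"]:
        df["Log" + col] = np.log(df[col])

    unrestricted = smf.ols("LogQ ~ LogP + LogI + LogChickP", data=df).fit()
    # Wald (F) test of the restriction that the three elasticities sum to 0
    wald = unrestricted.f_test("LogP + LogI + LogChickP = 0")
    print(wald.fvalue, wald.pvalue)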
Table 11.5
Beef demand regression results—Unrestricted model
Prob[Results IF H0 true]: The F-statistic equals 0.64 (table 11.6). The probability that the
F-statistic from one pair of regressions would be 0.64 or more, if H0 were true (if there is no
money illusion, if actual elasticities sum to 0) equals 0.43.
We have now described three ways to calculate Prob[Results IF H0 true]. Let us compare the
results (table 11.7).
While the methods use different approaches, they produce identical conclusions. In fact it can
be shown rigorously that the methods are equivalent.
Table 11.6
Beef demand regression results—Wald test of No Money Illusion theory
Wald test
Degrees of freedom
Table 11.7
Comparison of the methods to calculate Prob[Results IF H0 true]
Next we will consider a set of null and alternative hypotheses that assess the entire model:
On the one hand, if the null hypothesis were true, none of the explanatory variables would affect
the dependent variable, and consequently the model would be seriously deficient. On the other
hand, if the alternative hypothesis were true, at least one of the explanatory variables would be
influencing the dependent variable.
We will use the restricted and unrestricted regressions approach to calculate Prob[Results IF
H0 true]. We begin with estimating the restricted and unrestricted regressions.
Table 11.8
Beef demand regression results—No explanatory variables
Table 11.9
Beef demand regression results—Unrestricted model
Prob[Results IF H0 true]: What is the probability that the F-statistic from one pair of regressions would be 19.4 or more, if H0 were true (i.e., if both prices and income have no effect on the quantity of beef demanded, if each of the actual coefficients, βP, βI, and βCP, equals 0)?
Using the Econometrics Lab, we conclude that the probability of obtaining results like those we obtained if the null hypothesis were true is less than 0.0001:
Also we could let a statistical package do the work by using it to run the Wald test.
After running the unrestricted regression, choose the Wald test option and impose the restriction
that all the coefficients equal 0.
Note that even though the Wald test printout reports the probability to be 0.0000 (table 11.10),
it is not precisely 0 because the printout reports the probability only to four decimals. To empha-
size this fact, we report that Prob[Results IF H0 true] is less than 0.0001.
In fact most statistical packages automatically report this F-statistic and the probability when
we estimate the unrestricted model (table 11.11).
The values appear in the F-statistic and Prob[F-statistic] rows:
F-statistic = 19.4
Using a significance level of 1 percent, we would conclude that Prob[Results IF H0 true] is small.
Consequently we would reject the null hypothesis that none of the explanatory variables included
in the model has an effect on the dependent variable.
Table 11.10
Demand regression results—Wald test of entire model
Wald test
Degrees of freedom
Table 11.11
Beef demand regression results—Unrestricted model
A two-tailed t-test is equivalent to a Wald test. We will use the constant elasticity demand model
to illustrate this:
Focus on the coefficient of the price of chicken, βCP. Consider the following two-tailed
hypotheses:
H0: βCP = 0 ⇒ Price of chicken has no effect on the quantity of beef demanded
H1: βCP ≠ 0 ⇒ Price of chicken has an effect on the quantity of beef demanded
We will first calculate Prob[Results IF H0 true] using a two-tailed t-test and then using a Wald
test.
Table 11.12
Beef demand regression results—Unrestricted model
Figure 11.6
Calculating Prob[Results IF H0 true]—Using a t-test (Student t-distribution: mean = 0, SE = 0.0714, DF = 20; each tail beyond 0.12 from 0 has probability 0.0961/2)
Prob[Results IF H0 true]: What is the probability that the coefficient estimate in one regression,
bCP, would be at least 0.12 from 0, if H0 were true (if the actual coefficient, βCP, equals 0)?
Since the tails probability is based on the premise that the actual coefficient value equals 0, the
tails probability reported in the regression printout is just what we are looking for (figure 11.6):
Next we turn to a Wald test. Let us review the rationale behind the Wald test:
• The null hypothesis enforces the restriction and the alternative hypothesis does not:
H0: βCP = 0 ⇒ Price of chicken has no effect on the quantity of beef demanded
H1: βCP ≠ 0 ⇒ Price of chicken has an effect on the quantity of beef demanded
To run the restricted regression, we just drop the price of chicken, ChickP, as an explanatory variable because its coefficient is specified as 0 (table 11.13):
Table 11.13
Beef demand regression results—Restricted model
Table 11.14
Beef demand regression results—Unrestricted model
Figure 11.7
Calculating Prob[Results IF H0 true]—Using an F-test (F-distribution with DFNum = 1 and DFDem = 20; the probability of an F-statistic of 3.05 or more)
Using these two regressions, we can now calculate the F-statistic (figure 11.7):
Prob[Results IF H0 true]: What is the probability that the F-statistic from one pair of regressions
would be 3.05 or more, if the H0 were true (if the actual coefficient, βCP, equals 0; that is, if the
price of chicken has no effect on the quantity of beef demanded)?
Alternatively we could use statistical software to calculate the probability. After running the unrestricted regression, choose the Wald test option and impose the restriction that the chicken price coefficient equals 0.
C(3) = 0
• Click OK.
Using either method, we conclude that based on a Wald test, Prob[Results IF H0 true] equals 0.0961 (table 11.15).
Now compare the values of Prob[Results IF H0 true] calculated for the two-tailed t-test and the Wald test:
The probabilities are identical. This is not a coincidence. It can be shown rigorously that a two-
tailed t-test is a special case of the Wald test.
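To make the equivalence concrete, here is a minimal sketch in Python using the statsmodels package with simulated data (the numbers and variable names are made up for illustration; they are not the book's beef demand data). The Wald F-statistic for a single zero restriction equals the square of the coefficient's t-statistic, and the two probabilities agree:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a small data set (hypothetical numbers, purely for illustration).
rng = np.random.default_rng(0)
n = 24
log_p_beef = rng.normal(0.0, 0.2, n)
log_income = rng.normal(0.0, 0.2, n)
log_p_chicken = rng.normal(0.0, 0.2, n)
log_q = 10 - 0.5 * log_p_beef + 0.8 * log_income + 0.3 * log_p_chicken + rng.normal(0.0, 0.1, n)

# Unrestricted regression: constant plus the three explanatory variables.
X = sm.add_constant(np.column_stack([log_p_beef, log_income, log_p_chicken]))
results = sm.OLS(log_q, X).fit()

# Two-tailed t-test on the last coefficient versus a Wald test of the single
# restriction that the same coefficient equals 0.
t_prob = results.pvalues[3]
wald = results.f_test([0, 0, 0, 1])   # restriction: 1 * (last coefficient) = 0

print("two-tailed t-test probability:", t_prob)
print("Wald test:", wald)                              # reports F-statistic and probability
print("t-statistic squared:", results.tvalues[3] ** 2)  # equals the Wald F-statistic
```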
We have introduced three distributions that are used to assess theories: Normal, Student-t, and
F (figure 11.8).
• Theories involving a single variable: Normal distribution and Student t-distribution. The
normal distribution is used whenever we know the standard deviation of the distribution; the
normal distribution is described by its mean and standard deviation.
Often we do not know the standard deviation of the distribution, however. In these cases we turn
to the Student t-distribution; it is described by its mean, estimated standard deviation (standard
error), and the degrees of freedom. The Student t-distribution is more “spread out” than the
normal distribution because an additional element of uncertainty is added when the standard
deviation is not known and must be estimated.
Table 11.15
Beef demand regression results—Wald test of LogChickP coefficient
Figure 11.8
Normal distribution, Student t-distribution, and F-distribution
• Theories involving several variables: F-Distribution. The F-distribution can be used to assess
relationships among two or more estimates. We compute the F-statistic by using the sum of
squared residuals and the degrees of freedom in the restricted and unrestricted regressions:
The F-distribution is described by the degrees of freedom in the numerator and denominator,
DFR − DFU and DFU.
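For reference, the computation described above can be written out explicitly; using SSR_R and SSR_U for the restricted and unrestricted sums of squared residuals and DF_R and DF_U for their degrees of freedom, the F-statistic is

\[
F = \frac{(SSR_R - SSR_U)/(DF_R - DF_U)}{SSR_U / DF_U}
\]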
Chapter 11 Review Questions
1. Consider a Wald test. Can the restricted sum of squared residuals be less than the unrestricted
sum of squared residuals? Explain.
2. How is a Wald test F-statistic related to the restricted and unrestricted sum of squared
residuals?
3. How is a two-tailed t-test related to the Wald test?
4. What are the three important probability distributions we have introduced? When is it appro-
priate to use each of them?
Chapter 11 Exercises
Agricultural production data: Cross-sectional agricultural data for 140 nations in 2000 that
cultivated more than 10,000 square kilometers of land.
1. Focus on the log form of the constant elasticity value added model:
+ βMachinerylog(Machineryt) + et
2. Focus on the log form of the constant elasticity value added model:
+ βMachinerylog(Machineryt) + et
Assess the constant returns to scale theory using the Wald approach.
a. Consider the unrestricted regression.
i. Estimate the parameters of the unrestricted regression.
3. Assess the constant returns to scale theory using the Wald approach the “easy way” with
statistical software.
4. Compare the Prob[Results IF H0 true] that has been calculated in three ways: clever algebra,
Wald test using the Econometrics Lab, and Wald test using statistical software.
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
We, as consumers, naturally think of the price of cigarettes as what we must pay to purchase a pack of cigarettes, PriceConsumer. The seller of cigarettes, however, must pass the cigarette tax on to the government. From the supplier's standpoint, therefore, the price received equals the price paid by the consumer less the tax:
PriceSupplier = PriceConsumer − Tax
Convince yourself that the values of PriceConsumer, PriceSupplier, and Tax for our cigarette consumption data are actually related in this way.
5. Consider the following model:
This model raises the possibility that consumers of cigarettes react differently to the price
received by the supplier and the tax received by the government even though they both affect
the price paid by consumers in the same way.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients of
PriceSuppliert and Taxt.
b. Interpret the coefficient estimates.
6. Continue using cigarette consumption data and focus on the following null and alternative
hypotheses:
a. In words, what does the null hypothesis suggest? What does the alternative hypothesis
suggest?
b. Use the cigarette consumption data and a clever algebraic manipulation to calculate
Prob[Results IF H0 true].
7. Continue using the cigarette consumption data; use the Wald test to calculate
Prob[Results IF H0 true].
a. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the unrestricted regression. What do the unrestricted sum of squared residuals and degrees
of freedom equal?
b. Derive the equation that describes the restricted regression.
c. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the restricted regression. What do the restricted sum of squared residuals and degrees of
freedom equal?
d. Compute the F-statistic for the Wald test.
e. Using the Econometrics Lab, compute Prob[Results IF H0 true].
8. Continue using the cigarette consumption data in order to calculate Prob[Results IF H0 true]
the “easy way.”
Model Specification and Development
12
Chapter 12 Outline
Chapter 12 Prep Questions
1. Consider a multiple regression model. When a particular explanatory variable has no effect
on the dependent variable, what does the actual value of its coefficient equal?
2. The 1992 Clinton presidential campaign focused on the economy and made the phrase “It’s
the economy stupid” famous. Bill Clinton and his political advisors relied on the theory that
voters hold the President and his party responsible for the state of the economy. When the
economy performs well, the President’s party gets credit; when the economy performs poorly,
the President’s party takes the blame.
“It’s the economy stupid” theory: The American electorate is sensitive to economic conditions.
Good economic conditions increase the vote for the President’s party; bad economic conditions
decrease the vote for the President’s party.
where
VotePresPartyt = percent of the popular vote received by the incumbent President’s party in
year t
UnemPriorAvgt = average unemployment rate in the three years prior to election, that is, three
years prior to year t
a. Assuming that the “It’s the economy stupid” theory is correct, would βUnemPriorAvg be posi-
tive, negative or zero?
b. For the moment assume that when you run the appropriate regression, the sign of the
coefficient estimate agrees with your answer to part a. Formulate the null and alternative
hypotheses for this model.
3. Again focus on the “It’s the economy stupid” theory. Consider a second model:
where
UnemTrendt = unemployment rate change from previous year; that is, the unemployment rate
trend in year t (Note: If the unemployment rate is rising, the trend will be a positive number; if
the unemployment rate is falling, the trend will be a negative number.)
a. Assuming that the theory is correct, would βUnemTrend be positive, negative or zero?
b. For the moment assume that when you run the appropriate regression, the sign of the
coefficient estimate agrees with your answer to part a. Formulate the null and alternative
hypotheses for this model.
5. The following table reports the percent of the popular vote received by the Democrats,
Republicans, and third parties for every presidential election since 1892.
Both models use the same information to explain the quantity of beef demanded: the price of
beef (the good’s own price), income, and the price of chicken. The models use this information
differently, however. That is, the two models specify two different ways in which the
quantity of beef demanded is related to the price of beef (the good’s own price), income, and
the price of chicken. We will now explore how we might decide whether or not a particular
specification of the model can be improved. The RESET test is designed to do just this. In the
test we modify the original model to construct an artificial model. An artificial model is not
designed to test a theory, but rather it is designed to assess the original model.
To explain the RESET test, we begin with the general form of the simple linear regression
model: y is the dependent variable and x is the explanatory variable:
yt = βConst + βxxt + et
We use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters:
• bConst estimates βConst
• bx estimates βx
Using these estimates we compute the estimated value of the dependent variable, Estyt = bConst + bxxt, and then construct an artificial model that adds the square of this estimated value, Esty2t, as an additional explanatory variable:
yt = γConst + γxxt + γEsty2Esty2t + et
Critical point: The artificial model adds no new information. It is just using the same information in a different form.
Question: Can this new form of the information in the artificial model help us explain the
dependent variable significantly better? The coefficient of Esty2 provides the answer to this
question. If γEsty2, the coefficient of Esty2, equals 0, the new form of the information is adding
no explanatory power; if γEsty2 does not equal 0, the new form adds power. We now construct the
appropriate null and alternative hypotheses:
H0: γEsty2 = 0 New form of the information adds no explanatory power
H1: γEsty2 ≠ 0 New form of the information adds explanatory power
We will now consider the linear model of beef demand to illustrate the RESET test:
First run the regression to estimate the parameters of the original model (table 12.1):
Next construct the artificial model:
Step 1: Collect data, run the regression, and interpret the estimates.
EstQSquared = EstQ^2
Then we use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 12.2):
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the new form of the information adds no explanatory power.
The coefficient of EstQ2, γEstQ2, actually equals 0.
Table 12.1
Beef demand regression results—Linear model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 12.2
Beef demand regression results—Artificial model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
H0: γEstQ2 = 0 Cynic is correct: New form of the information adds NO explanatory power;
there is no compelling reason to consider a new specification of the original
model.
H1: γEstQ2 ≠ 0 Cynic is incorrect: New form of the information adds explanatory power; there
is reason to consider a new specification of the original model.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the new form of the information adds NO
explanatory power?
• Specific question: The regression’s estimate of γEstQ2 was 0.0000579. What is the probability
that the estimate of γEstQ2 from one regression would be at least 0.0000579 from 0, if H0 were
true (i.e., if γEstQ2 actually equaled 0, if the different form of the information did not improve the
regression)?
Table 12.3
Beef demand regression results—Linear model RESET test
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Steps 4 and 5: The tails probability reported in the regression results is the probability that
we need:
We would reject the null hypothesis at the 5 percent significance level although not at the 1
percent significance level. This suggests that it may be prudent to investigate an alternative
specification of the original model.
Fortunately, statistical software provides a very easy way to run a RESET test by generating
the new variable automatically (table 12.3).
Our calculations and those provided by the statistical software are essentially the same. The
slight differences that do emerge result from the fact that we rounded off some decimal places
from the parameter estimates of the original model when we generated the estimated value of
Q, EstQ.
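As a supplement to the statistical-software discussion above, here is a minimal sketch of the same RESET calculation in Python using statsmodels with simulated data (the variable names and numbers are hypothetical, not the book's beef demand data): estimate the original model, square its fitted values, add the square as an extra explanatory variable, and read off the tails probability on that new variable.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data standing in for the original model's variables (hypothetical).
rng = np.random.default_rng(1)
n = 24
price = rng.uniform(1.0, 5.0, n)
income = rng.uniform(20.0, 60.0, n)
q = 200 - 15 * price + 3 * income + rng.normal(0.0, 5.0, n)

# Step 1: estimate the original model and save the estimated (fitted) values.
X = sm.add_constant(np.column_stack([price, income]))
original = sm.OLS(q, X).fit()
est_q = original.fittedvalues

# Artificial model: same explanatory variables plus the square of the estimate.
X_artificial = sm.add_constant(np.column_stack([price, income, est_q ** 2]))
artificial = sm.OLS(q, X_artificial).fit()

# The coefficient on EstQ squared and its tails probability are what the RESET
# test examines: a small probability suggests rethinking the specification.
print("estimate of the EstQ-squared coefficient:", artificial.params[3])
print("tails probability:", artificial.pvalues[3])
```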
Next consider a different specification of the model, a constant elasticity demand model:
We then estimate its parameters using the ordinary least squares (OLS) estimation procedure
(table 12.4).
Step 1: Collect data, run the regression, and interpret the estimates.
We will estimate the artificial model using statistical software (table 12.5).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the new form of the information adds no explanatory power.
Table 12.4
Beef demand regression results—Constant elasticity model
Table 12.5
Beef demand regression results—Constant elasticity Model RESET test
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
H0: γEstQ2 = 0 Cynic is correct: New form of the information adds NO explanatory power;
there is no compelling reason to consider a new specification of the original
model.
H1: γEstQ2 ≠ 0 Cynic is incorrect: New form of the information adds explanatory power; there
is reason to consider a new specification of the original model.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the new form of the information adds NO
explanatory power?
• Specific question: The regression’s estimate of γEstQ2 was 10.7. What is the probability that
the estimate of γEstQ2 from one regression would be at least 10.7 from 0, if H0 were true (that is,
if γEstQ2 actually equaled 0, if the different form of the information did not improve the
regression)?
The size of this probability determines whether we reject the null hypothesis:
Using the traditional significance levels of 1 or 5 percent, we do not reject the null hypothesis
and conclude that there is no compelling reason to specify a new model.
The 1992 Clinton presidential campaign focused on the economy and made the phrase “It’s the
economy stupid” famous. Bill Clinton, the Democratic challenger, and his political advisors
relied on the theory that voters hold the Republican President, George H. W. Bush, and his party
responsible for the state of the economy. When the economy performs well, the President’s party
gets credit; when the economy performs poorly, the President’s party takes the blame:
“It’s the economy stupid” theory: The American electorate is sensitive to economic conditions.
Good economic conditions increase the vote for the President’s party; bad economic conditions
decrease the vote for the President’s party.
Project: Assess the effect of economic conditions on presidential elections.
Clearly, we need data to test this theory. Fortunately, we have already collected some data. Data
from 1890 to 2008 can be easily accessed:
Presidential election data: Annual time series data of US presidential election and economic
statistics from 1890 to 2008.
First note that the data does not include the variable that we are trying to explain: the vote
received by the incumbent President’s party:
Fortunately, we can generate it from the variables that we have. We have data reporting the
percent of the popular vote received by the Democratic and Republican candidates, VotePartyDemt and VotePartyRept. Another variable, PresPartyR1t, indicates the incumbent President’s party. Focus attention on these three variables; we generate the new variable as
VotePresPartyt = PresPartyR1t × VotePartyRept + (1 − PresPartyR1t) × VotePartyDemt
To show that this new variable indeed equals the vote received by the President’s party, consider
the two possibilities:
• When the Republicans are occupying the White House, PresPartyR1t equals 1, so the new
variable VotePresPartyt will equal the vote received by the Republican candidate:
VotePresPartyt = 1 × VotePartyRept + 0 × VotePartyDemt
= VotePartyRept
• When the Democrats are occupying the White House, PresPartyR1t equals 0, so the new
variable VotePresPartyt will equal the vote received by the Democratic candidate:
VotePresPartyt = 0 × VotePartyRept + 1 × VotePartyDemt
= VotePartyDemt
Table 12.6
Checking generated variables
After generating any new variable, it is important to check to be certain that it is generated cor-
rectly. The first few elections are reported in table 12.6. Everything looks fine. When Republicans
hold the White House (when PresPartyR1t equals 1), the new variable, VotePresPartyt, equals
the vote received by the Republican candidate (VotePartyRept). Alternatively, when Democrats
hold the White House (when PresPartyR1t equals 0), VotePresPartyt equals the vote received by
the Democratic candidate (VotePartyDemt).
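A short sketch of this generate-and-check step in Python with pandas (the column names follow the variables above, but the rows are made up for illustration rather than taken from the actual data file):

```python
import pandas as pd

# Hypothetical rows, not the actual presidential election data.
df = pd.DataFrame({
    "PresPartyR1":  [1, 0, 1],
    "VotePartyRep": [48.3, 51.0, 51.7],
    "VotePartyDem": [46.0, 46.7, 45.5],
})

# Generate the new variable: the incumbent President's party's share of the vote.
df["VotePresParty"] = (df["PresPartyR1"] * df["VotePartyRep"]
                       + (1 - df["PresPartyR1"]) * df["VotePartyDem"])

# Check the generated variable against the underlying columns, as in table 12.6.
print(df)
```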
Next let us look at our voting data to investigate the possibility of data oddities (table 12.7). In
all but a handful of elections, third (minor) parties captured only a small percent of the total
vote. In some years third parties received a substantial fraction, however. The election of 1912
is the most notable example.
In the 1912 election more than a third of the votes were siphoned off from the Republicans and
Democrats. How should we deal with this? One approach is just to focus on those elections that
were legitimate two-party elections. In this approach, we might ignore all elections in which
third (minor) parties received at least 10 percent or perhaps 15 percent of the votes cast. If we
were to pursue this approach, however, we would be discarding information. Econometricians
never like to throw away information. Another approach would be to focus just on the two major
parties by expressing the percent of votes in terms of only those votes cast for the Republican
and Democratic candidates. Let us call this variable VotePresPartyTwot:
As always, it is important to be certain that the new variable has been generated correctly
(table 12.8). Undoubtedly there are other ways to account for third parties. In this chapter,
however, we will do so by focusing on the variable VotePresPartyTwo.
Table 12.7
Checking for data oddities
Table 12.8
Checking generated variables
Now we will illustrate the iterative process that econometricians use to develop their models.
There is no “cookbook” procedure we can follow. Common sense and inventiveness play critical
roles in model development:
Model formulation: Formulate a specific model describing the general theory
        ↓                                        ↑
Model assessment: Apply econometric techniques to assess the model; incorporate insights from the assessment to refine the specific model describing the general theory
Gradually we refine the specific details of the model using an iterative process: model formula-
tion and model assessment. In a real sense this is as much of an art as a science.
We will describe specific models that attempt to explain the percent of the vote received by the
President’s party. In doing so, we will illustrate how the iterative process of model formulation
and model assessment leads us from one model to the next. We begin by observing that the
unemployment rate is the most frequently cited economic statistic. Every month the Bureau of Labor
Statistics announces the previous month’s unemployment rate. The announcement receives
headline attention in the newspapers and on the evening news broadcasts. Consequently it seems
natural to begin with models that focus on the unemployment rate. We will eventually refine our
model by extending our focus to another important economic variable, inflation.
Model 1: Past performance—Electorate is sensitive to how well the economy has performed
in the three years prior to the election.
The first model implicitly assumes that voters conscientiously assess economic conditions over
the three previous years of the President’s administration. If conditions have been good, the
President and his party are rewarded with more votes. If conditions have been bad, fewer votes
would be received. More specifically, we use the average unemployment rate in the three years
prior to the election to quantify economic conditions over the three previous years of the Presi-
dent’s administration.
where
UnemPriorAvgt = average unemployment rate in the three years prior to election; that is, three
years prior to year t
Theory: A high average unemployment rate during the three years prior to the election will
decrease the votes for the incumbent President’s party; a low average unemployment rate will
increase the votes. The actual value of the coefficient, βUnemPriorAvg, is negative:
βUnemPriorAvg < 0
Step 1: Collect data, run the regression, and interpret the estimates.
After generating the variable UnemPriorAvg we use the ordinary least squares (OLS) estima-
tion procedure to estimate the model’s parameters (table 12.9).
The coefficient estimate is 0.33. Because the estimate is positive, it directly contradicts our
theory. Accordingly we will abandon this model, go “back to the drawing board,” and consider
another model.
Model 2: Present performance—Electorate is sensitive to economic conditions in the election year itself.
Our analysis of the first model suggests that voters may not have a long memory; accordingly,
the second model postulates that voters are myopic; voters judge the President’s party only on the
current economic climate; they do not care what has occurred in the past. More specifically, we
use the current unemployment rate to assess economic conditions.
Table 12.9
Election regression results—Past performance model
where
UnemCurrentt = unemployment rate in the election year, year t
Theory: A high unemployment rate in the election year itself will decrease the votes for the
incumbent President’s party; a low unemployment rate will increase the votes. The actual value
of the coefficient, βUnemCurrent, is negative:
βUnemCurrent < 0
Step 1: Collect data, run the regression, and interpret the estimates.
We use the ordinary least squares (OLS) estimation procedure to estimate the second model’s
parameters (table 12.10).
The coefficient estimate is −0.12. This is good news. The evidence supports our theory. Now
we will continue on to determine how confident we should be in our theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the current unemployment rate does not affect the votes
received by the incumbent President’s party.
H0: βUnemCurrent = 0 Cynic is correct: Current unemployment rate has no effect on votes
H1: βUnemCurrent < 0 Cynic is incorrect: High unemployment rate reduces votes for the incum-
bent President’s party
Table 12.10
Election regression results—Present performance model
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question: What is the probability that the results would be like those we actually
obtained (or even stronger), if the cynic is correct and the current unemployment rate actually
has no impact?
• Specific question: The regression’s coefficient estimate was −0.12. What is the probability
that the coefficient estimate in one regression would be −0.12 or less, if H0 were actually true
(if the actual coefficient, βUnemCurrent, equals 0)?
The size of this probability determines whether we reject the null hypothesis:
Steps 4 and 5: Use the EViews regression printout to calculate Prob[Results IF H0 true]
(figure 12.2).
The tails probability answers the following question:
Question: If the actual value of the coefficient were 0, what is the probability that the estimate
would be at least 0.12 from 0?
Answer: 0.6746
Figure 12.1
Probability distribution of coefficient estimate (bUnemCurrent; Prob[Results IF H0 true] is the left-tail area at and below −0.12)
Figure 12.2
Probability distribution of coefficient estimate (bUnemCurrent; the tails probability of 0.6746 is split equally between the two tails, 0.12 above and 0.12 below 0)
The probability of being in the left-hand tail equals the tails probability divided by 2:
Prob[Results IF H0 true] = 0.6746/2 ≈ 0.34
This is not good news. By the traditional standards, a significance level of 1, 5, or 10 percent,
this probability is large; we cannot reject the null hypothesis, which asserts that the current
unemployment rate has no effect on votes.
Model 2 provides both good and bad news. The coefficient sign supports the theory suggesting
that we are on the right track. Voters appear to have a short memory; they appear to be more
concerned with present economic conditions than the past. The bad news is that the coefficient
for the current unemployment rate does not meet the traditional standards of significance.
Model 3: Present trend—Electorate is sensitive to the current trend, whether economic condi-
tions are improving or deteriorating during the election year.
The second model suggests that we may be on the right track by just focusing on the election
year. The third model speculates that voters are concerned with the trend in economic conditions
during the election year. If economic conditions are improving, the incumbent President’s party
is rewarded with more votes. On the other hand, if conditions are deteriorating, fewer votes
would be received. We use the trend in the unemployment rate to assess the trend in economic
conditions.
where
UnemTrendt = unemployment rate change from previous year; that is, the unemployment rate
trend in year t
Theory: A rising unemployment rate during the election year will decrease the votes of the
incumbent President’s party; a falling unemployment rate will increase votes. The actual value
of the coefficient, βUnemTrend, is negative:
βUnemTrend < 0
Step 1: Collect data, run the regression, and interpret the estimates.
After generating the variable UnemTrend, we use the ordinary least squares (OLS) estimation
procedure to estimate the model’s parameters (table 12.11).
The coefficient estimate is −0.75. This is good news. The evidence supports our theory. Now
we will continue on to determine how confident we should be in our theory.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the unemployment rate trend does not affect the votes
received by the incumbent President’s party.
Table 12.11
Election regression results—Present trend model
H0: βUnemTrend = 0 Cynic is correct: Unemployment rate trend has no effect on votes
H1: βUnemTrend < 0 Cynic is incorrect: A rising unemployment rate (a positive value for
UnemTrend) decreases the vote for the incumbent President’s party; a
falling unemployment rate trend (a negative value for UnemTrend)
increases the vote.
Steps 3, 4, and 5: We will now calculate Prob[Results IF H0 true]. Having done this several
times now, we know that since we are conducting a one-tailed test, Prob[Results IF H0 true]
equals half the tails probability:
Prob[Results IF H0 true] = 0.1965/2 ≈ 0.10
While this probability is still considered large at the 5 percent significance level, we appear to
be on the right track. We will shortly consider a fourth model; it postulates that when judging
economic conditions, the electorate considers not only the unemployment rate trend but also the
trend in prices, the inflation rate.
Before moving on to model 4, however, let us illustrate the subtle difference between models
2 and 3 by using each to estimate the vote received by the President’s party in 2008. For model
2 we only need the unemployment rate for 2008 to calculate the estimate; for model 3 we not
only need the unemployment rate in 2008 but also the unemployment rate in the previous year,
2007:
Unemployment rate in 2008 = 5.81% Unemployment rate in 2007 = 4.64%
Model 2: In 2008, UnemCurrent = 5.81; the estimated vote for the President’s party equals
52.7 − 0.12 × 5.81 = 52.7 − 0.7
= 52.0
Model 2’s estimate depends only on the unemployment rate in the current year, 2008 in this
case. The unemployment rate for 2007 is irrelevant. The estimate for 2008 would be the same
regardless of what the unemployment rate for 2007 equaled.
Model 3: In 2008, UnemTrend = 5.81 − 4.64 = 1.17; the estimated vote for the President’s party equals
52.0 − 0.75 × 1.17 = 52.0 − 0.9
= 51.1
Model 3’s estimate depends on the change in the unemployment rate; consequently the unem-
ployment rates in both years are important.
Model 4: Present trend II—Electorate is sensitive not only to the unemployment rate trend, but
also the trend in prices, the inflation rate.
The fourth model, like the third, theorizes that voters are concerned with the trend. If economic
conditions are improving, the incumbent President’s party is rewarded with more votes. If conditions
are deteriorating, fewer votes would be received. The fourth model postulates that voters
are not only concerned with the trend in the unemployment rate but also the trend in prices. The
inflation rate measures the trend in prices. A 2 percent inflation rate means that prices are on
average rising by 2 percent, a 3 percent inflation rate means that prices are rising by 3 percent,
and so on.
where
Theory:
• A rising unemployment rate during the election year will decrease the votes of the incumbent
President’s party; a falling unemployment rate will increase votes. The actual value of the Unem-
Trend coefficient, βUnemTrend, is negative:
βUnemTrend < 0
• An increase in the inflation rate during the election year will decrease the votes of the incum-
bent President’s party; a decrease in the inflation rate will increase votes. The actual value of
the InflCpiCurrent coefficient, βInflCpiCurrent, is negative:
βInflCpiCurrent < 0
Table 12.12
Election regression results—Present trend model
Step 1: Collect data, run the regression, and interpret the estimates (table 12.12).
Both coefficient estimates are negative, suggesting that deteriorating economic conditions decrease
the votes received by the President’s party while improving economic conditions increase them.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view of unemployment rate trend: Despite the results, the unemployment trend has no
effect.
Cynic’s view of inflation rate: Despite the results, the trend in prices has no effect.
Steps 3, 4, and 5: Using the tails probabilities reported in the regression printout, we can easily
compute Prob[Results IF H0 true] for each of our theories:
At the 5 percent significance level both of these probabilities are small. Hence, at the 5 percent
significance level, we can reject the null hypotheses that the unemployment trend and inflation
have no effect on the vote for the incumbent President’s party. This supports the notion that “it’s
the economy stupid.”
This example illustrates the model formulation and assessment process. As mentioned before,
the process is as much of an art as a science. There is no routine “cookbook” recipe that we can
apply. It cannot be emphasized enough that we must use our common sense and
inventiveness.
Chapter 12 Exercises
Presidential election data: Annual time series data of US presidential election and economic
statistics from 1890 to 2008.
Consider the following factors that may or may not influence the vote for the incumbent Presi-
dent’s party:
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
a. Focus on the cigarette tax rate, Tax. Which state has the highest tax rate and which state
has the lowest tax rate?
b. Consider the following linear model that attempts to explain the cigarette tax rate:
What rationale might justify this model? That is, devise a theory explaining why a state’s
tobacco production should affect the state’s tax on cigarettes. What does your theory suggest
about the sign of the coefficient, βTobProd?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the model’s param-
eters. Interpret the coefficient estimate.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
3. Revisit the cigarette consumption data.
a. Perform a RESET test on the linear model explaining the cigarette tax rate. What do you
conclude?
b. Apply the hypothesis testing approach that we developed to assess this model.
SqrtTobProdPC = sqr(TobProdPC)
or
SqrtTobProdPC = TobProdPC^.5
In EViews, the term sqr is the square root function and the character ^ represents an exponent.
c. Perform a RESET test for the nonlinear model. What do you conclude?
4. Revisit the cigarette consumption data.
Consider how the price of cigarettes paid by consumers and per capita income affect the adult
smoking rate.
a. Formulate a theory explaining how each of these factors should affect the adult smoking
rate.
b. Present a linear model incorporating these factors. What do your theories imply about the
sign of each coefficient?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
f. Perform a RESET test for your model. What do you conclude?
5. Revisit the cigarette consumption data.
Focus your attention on explaining the youth smoking rate. Choose the variables that you believe
should affect the youth smoking rate.
a. Formulate a theory explaining how each of these factors should affect the youth smoking
rate.
b. Present a linear model incorporating these factors. What do your theories imply about the
sign of each coefficient?
c. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
d. Formulate the null and alternative hypotheses.
e. Calculate Prob[Results IF H0 true] and assess your theory.
f. Perform a RESET test for your model. What do you conclude?
Dummy and Interaction Variables
13
Chapter 13 Outline
Chapter 13 Prep Questions
Student    Minutes Studied (x)    Quiz Score (y)
1          5                      66
2          15                     87
3          25                     90
Consider the most simple of all possible models, one that does not include even a single explana-
tory variable:
Model: yt = βConst + et
SSR = Res1² + Res2² + Res3² = (y1 − bConst)² + (y2 − bConst)² + (y3 − bConst)²
Using calculus, derive the equation for bConst that minimizes the sum of squared residuals by
expressing bConst in terms of y1, y2, and y3.
Faculty salary data: Artificially generated cross section salary data and characteristics for 200
faculty members.
Salary = βConst + et
To estimate the model, you must “trick” EViews into running the appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight one other
variable, say SexM1.
• In the Workfile window: double click a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete SexM1 so that the line specifying the equation
looks like this:
salary c
• Click OK.
e. Run the appropriate regression to estimate the values of the constant and coefficient. What
is the estimated salary for men? What is the estimated salary for women?
f. Compare your answers to d and e with your answers to a, b, and c. What conclusions can
you draw concerning averages and the regression estimates?
3. Consider the following model explaining Internet use in various countries:
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP Gdpt + βIntAuth Autht + eIntt
where
b. Develop a theory that explains how each explanatory variable affects Internet use. What
do your theories suggest about the sign of each coefficient?
4. Consider a similar model explaining television use in various countries:
LogUsersTVt = βTVConst + βTVYear Yeart + βTVCapHum CapitalHumant + βTVCapPhy CapitalPhysicalt + βTVGDP Gdpt + βTVAuth Autht + eTVt
where
Model: yt = βConst + et
Estimates: Estyt = bConst
Residuals: Rest = yt − Estyt
To minimize the sum of squared residuals, differentiate with respect to bConst and set the derivative equal to 0:
dSSR/dbConst = −2(y1 − bConst) − 2(y2 − bConst) − 2(y3 − bConst) = 0
Divide by −2:
(y1 − bConst) + (y2 − bConst) + (y3 − bConst) = 0
Rearrange terms:
y1 + y2 + y3 = 3bConst
Divide by 3:
(y1 + y2 + y3)/3 = bConst
Since (y1 + y2 + y3)/3 equals the mean of y, ȳ:
ȳ = bConst
We have just shown that when a regression includes only a constant the ordinary least squares
(OLS) estimate of the constant equals the average value of the dependent variable, y.
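A quick numerical check of this result, sketched in Python with statsmodels (using the three quiz scores from the prep questions; any numbers would do):

```python
import numpy as np
import statsmodels.api as sm

# Regress a variable on a constant alone; the estimate reproduces the sample mean.
y = np.array([66.0, 87.0, 90.0])
res = sm.OLS(y, np.ones_like(y)).fit()
print(res.params[0], y.mean())   # both print 81.0
```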
Now we consider faculty salary data. It is important to keep in mind that these data were arti-
ficially generated; the data are not “real.” Artificially generated, rather than real, data are used
as a consequence of privacy concerns.
Faculty salary data: Artificially generated cross-sectional salary data and characteristics for 200
faculty members.
On average, males earn nearly $30,000 more than females. This certainly raises the possibility
that gender discrimination exists, does it not?
A dummy variable separates the observations into two disjoint groups; a dummy variable equals
1 for one group and 0 for the other group. The variable SexM1 is a dummy variable; SexM1
denotes whether a faculty member is male or female; SexM1 equals 1 if the faculty member
is male and 0 if female. We will now show that dummy variables prove very useful in exploring
the possibility of discrimination by considering three types of models:
13.2.3 Models
Type 1 Models: A Constant and No Explanatory Variables
Model: Salaryt = βConst + et
Since this model includes only a constant, we are theorizing that except for random influences
each faculty member earns the same salary. That is, this model attributes all variations in income
to random influences.
Step 1: Collect data, run the regression, and interpret the estimates.
To estimate the model, you must “trick” EViews into running the appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight one other
variable, say SexM1.
• In the Workfile window: double click a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete SexM1 so that the window looks like this:
salary c
• Click OK.
Table 13.1 confirms the fact that when a regression only includes a constant, the ordinary least
squares (OLS) estimate of the constant is just the average of the dependent variable. To empha-
size this fact, we will now run two more regressions with only a constant: one regression includ-
ing only men (table 13.2) and one including only women (table 13.3).
Table 13.1
Discrimination regression results—All observations
Table 13.2
Discrimination regression results—Males only
Table 13.3
Discrimination regression results—Females only
Tables 13.1, 13.2, and 13.3 illustrate the important lesson that type 1 models teach us. In a
regression that includes only a constant, the ordinary least squares (OLS) estimate of the constant
is the average of the dependent variable. Next let us consider a slightly more complicated model.
Type 2 Models: A Constant and a Single Dummy Explanatory Variable Denoting Sex
Discrimination theory: Women are discriminated against in the job market; hence, men earn
higher salaries than women. Since SexM1 equals 1 for males and 0 for females, βSexM1 should be
positive indicating that men will earn more than women: βSexM1 > 0.
Step 1: Collect data, run the regression, and interpret the estimates.
Using the ordinary least squares (OLS) estimation procedure, we estimate the parameters
(table 13.4):
For emphasis, let us apply the estimated equation to men and then to women by plugging in
their values for SexM1:
We can now compute the estimated salary for men and women:
Next note something very interesting by comparing the regression results to the salary
averages:
Table 13.4
Discrimination regression results—Male sex dummy
An ordinary least squares (OLS) regression that includes only a constant and a dummy variable
is equivalent to comparing averages. The conclusions are precisely the same: men earn $28,693
more than women. The dummy variable’s coefficient estimate equals the difference of the
averages.
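A minimal sketch of this equivalence in Python with statsmodels, using made-up salary numbers rather than the book's faculty salary data:

```python
import numpy as np
import statsmodels.api as sm

# Six hypothetical salaries: the first three are men (SexM1 = 1), the rest women.
salary = np.array([50_000.0, 62_000.0, 71_000.0, 45_000.0, 48_000.0, 52_000.0])
sex_m1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

res = sm.OLS(salary, sm.add_constant(sex_m1)).fit()

female_avg = salary[sex_m1 == 0].mean()
male_avg = salary[sex_m1 == 1].mean()
print(res.params[0], female_avg)              # constant = average salary of the 0 group
print(res.params[1], male_avg - female_avg)   # dummy coefficient = difference of averages
```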
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, there is no discrimination.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
• Generic question for discrimination hypothesis: What is the probability that the results
would be like those we obtained (or even stronger), if the cynic is correct and no discrimination
were present?
• Specific question for discrimination hypothesis: What is the probability that the coefficient
estimate, bSexM1, in one regression would be 28,693 or more, if H0 were true (if the actual coefficient,
βSexM1, equals 0)?
Steps 4 and 5: To calculate the Prob[Results IF H0 true], use the tails probability reported in
the regression printout. This is easy to do. Since this is a one-tailed test, we divide the tails
probability by 2:²
Prob[Results IF H0 true] = <0.0001/2 < 0.0001
Clearly, the Prob[Results IF H0 true] is very small. We can reject the null hypothesis which
asserts that no discrimination exists.
Before we continue, let us point out that our dummy variable, SexM1, assigned 1 to males
and 0 to females. This was an arbitrary choice. We could just as easily have assigned 0 to males and
1 to females, could we not? To see what happens when we switch the assignments, generate a
new variable, SexF1:
new variable, SexF1:
2. Note that even though the tails probability is reported as 0.0000, the probability can never precisely equal 0. It will
always exceed 0. Consequently, instead of writing 0.0000, we write < 0.0001 to emphasize the fact that the probability
can never equal precisely 0.
SexF1 = 1 − SexM1
Discrimination theory: Women are discriminated against in the job market; hence women earn
lower salaries than men. Since SexF1 equals 1 for females and 0 for males, βSexF1 should be
negative indicating that women will earn less than men: βSexF1 < 0.
Step 1: Collect data, run the regression, and interpret the estimates.
After we generate the new dummy variable, SexF1, we can easily run the regression (table 13.5).
Let us apply this estimated equation to men and then to women by plugging in their values
for SexF1:
The results are precisely the same as before. This is reassuring. The decision to assign 1 to one
group and 0 to the other group is completely arbitrary. It would be very discomforting if this
Table 13.5
Discrimination regression results—Female sex dummy
arbitrary decision affected our conclusions. The fact that the arbitrary decision does not affect
the results is crucial.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, there is no discrimination.
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is
consistent with the evidence.
Steps 3, 4, and 5: It is easy to calculate the Prob[Results IF H0 true] by using the tails probabil-
ity reported in the regression printout. Since this is a one-tailed test, we divide the tails probabil-
ity by 2:
Prob[Results IF H0 true] = <0.0001/2 < 0.0001
Since the probability is so small, we reject the null hypothesis that no discrimination exists.
Bottom line:
• Our choice of the base group for the dummy variable (i.e., the group that is assigned a value
of 0 for the dummy variable) does not influence the results.
• Type 2 models, models that include only a constant and a dummy variable, are equivalent to
comparing averages.
On the other hand, what implicit assumption is this discrimination model making? The model
implicitly assumes that the only relevant factor in determining faculty salaries is gender. Is this
reasonable? Well, very few individuals contend that gender is the only factor. Many individuals
believe that gender is one factor, perhaps an important factor, affecting salaries, but they believe
that other factors such as education and experience also play a role.
Type 3 Models: A Constant, a Dummy Explanatory Variable Denoting Sex, and Other
Explanatory Variable(s)
While these models allow the possibility of gender discrimination, they also permit us to explore
the possibility that other factors affect salaries too. To explore such models, let us include both
a dummy variable and the number of years of experience as explanatory variables.
Theories:
• Discrimination: As before, we theorize that women are discriminated against: βSexF1 < 0.
• Experience: It is generally believed that in most occupations, employees with more experi-
ence earn more than employees with less experience. Consequently we theorize that the experi-
ence coefficient should be positive: βExper > 0.
Step 1: Collect data, run the regression, and interpret the estimates.
We can now compute the estimated salary for men and women (table 13.6):
Table 13.6
Discrimination regression results—Female sex dummy and experience
For men, SexF1 = 0:
EstSalary = 42,238 + 2,447Experience
For women, SexF1 = 1:
EstSalary = 39,998 + 2,447Experience
We can illustrate the estimated salaries of men and women graphically (figure 13.1).
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
• Cynic’s view on discrimination: Despite the results, there is no discrimination.
• Cynic’s view on experience: Despite the results, experience does not increase salary.
Figure 13.1
Salary discrimination (EstSalaryMen = 42,238 + 2,447Experience and EstSalaryWomen = 39,998 + 2,447Experience; both lines have slope 2,447 and intercepts 2,240 apart)
The null hypothesis, like the cynic, challenges the evidence. The alternative hypothesis is con-
sistent with the evidence. We will proceed by focusing on discrimination.
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
•Generic question for discrimination hypothesis: What is the probability that the results
would be like those we obtained (or even stronger), if the cynic is correct and no discrimination
were present?
• Specific question for discrimination hypothesis: The regression’s coefficient estimate was
−2,240. What is the probability that the coefficient estimate in one regression would be −2,240
or less, if H0 were true (if the actual coefficient, βSexF1, equals 0; i.e., if no discrimination existed)?
Prob[Results IF H0 true] = 0.4638/2 ≈ 0.23
At the traditional significance levels of 1, 5, and 10 percent, we cannot reject the null hypothesis
that no discrimination exists. What should we make of this dramatic change?
Figure 13.2
Probability distribution of coefficient estimate (bSexF1; the left-tail probability at and below −2,240 equals 0.4638/2 = 0.2319)
Focus on our last model: Salaryt = βConst + βSexF1SexF1t + βExper Experiencet + et.
Implicit assumption: One year of added experience increases the salary of men and women by
equal amounts.
In other words, this model implicitly assumes that women start behind men by a certain amount
and then remain behind men by that same amount for each level of experience. We will call this
“lump sum” discrimination. Figure 13.3 illustrates this well; the slopes of the lines representing
the estimated salaries for men and women are equal.
Might gender discrimination take another form? Yes. Experience could affect the salaries of
men and women differently. It is possible for a man to receive more for an additional year of
experience than a woman. In other words, could men be more highly rewarded for experience
than women? Our last model excludes this possibility because it implicitly assumes that a year
of added experience increases the salary of men and women by equal amounts. To explore the
possibility of this second type of discrimination, we will introduce interaction variables. We will
refer to this type of discrimination as “raise” discrimination.
Figure 13.3
Estimated discrimination equations with “lump sum” discrimination (EstSalaryMen = 42,238 + 2,447Experience and EstSalaryWomen = 39,998 + 2,447Experience; both lines have slope 2,447 and intercepts 2,240 apart)
An interaction variable allows us to explore the possibility that one explanatory variable influ-
ences the effect that a second explanatory variable has on the dependent variable. We generate
an interaction variable by multiplying the two variables together. We will focus on the interaction
of Experience and SexF1 by generating the variable Exper_SexF1:
Exper_SexF1t = Experiencet × SexF1t
We will now add the interaction variable, Exper_SexF1, to our last model.
Theories:
• “Lump sum” discrimination: As before, we theorize that women are discriminated against:
βSexF1 < 0.
• Experience: As before, we theorize that the experience coefficient should be positive:
βExper > 0.
• “Raise” discrimination: One year of additional experience should increase the salary of
women by less than their male counterparts. Hence we theorize that the coefficient of the inter-
action variable is negative: βExper_SexF1 < 0. (If it is not clear why you should expect this coefficient
to be negative, be patient. It should become clear shortly.)
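Before turning to the estimates, here is a minimal sketch in Python with statsmodels of how the interaction variable is generated and used; the data are simulated and the coefficient values invented for illustration, so only the mechanics, not the numbers, mirror the faculty salary example:

```python
import numpy as np
import statsmodels.api as sm

# Simulated faculty-like data (hypothetical; not the book's data set).
rng = np.random.default_rng(2)
n = 200
experience = rng.uniform(0.0, 30.0, n)
sex_f1 = (rng.uniform(size=n) < 0.5).astype(float)   # 1 = female, 0 = male
salary = (40_000 + 8_000 * sex_f1 + 2_500 * experience
          - 1_000 * experience * sex_f1 + rng.normal(0.0, 3_000.0, n))

# Generate the interaction variable by multiplying the two variables together.
exper_sexf1 = experience * sex_f1

# Model with a constant, the sex dummy, experience, and the interaction variable.
X = sm.add_constant(np.column_stack([sex_f1, experience, exper_sexf1]))
res = sm.OLS(salary, X).fit()
print(res.params)   # constant, SexF1, Experience, and Exper_SexF1 estimates
```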
Step 1: Collect data, run the regression, and interpret the estimates (table 13.7).
Table 13.7
Discrimination regression results—Female sex dummy, experience, and Female sex dummy−Experience
interaction variable
For men, SexF1 = 0 and Exper_SexF1 = 0:
EstSalary = 37,595 + 10,970 × 0 + 2,676Experience − 1,135 × 0
= 37,595 + 2,676Experience
For women, SexF1 = 1 and Exper_SexF1 = Experience:
EstSalary = 37,595 + 10,970 + 2,676Experience − 1,135Experience
= 48,565 + 1,541Experience
Plot the estimated salary for men and women (figure 13.4). We can use this regression to assess
the possibility of two different types of discrimination. One of the estimates is a little
surprising:
Figure 13.4
Estimated discrimination equations with “lump sum” and “raise” discrimination (the women’s line starts higher, at 48,565 versus 37,595, but has the flatter slope)
• “Lump sum” discrimination: As before, the coefficient of the sex dummy variable, SexF1,
assesses the possibility of “lump sum” discrimination. The coefficient estimate is positive. This
is unexpected. It suggests that when faculty members are hired from graduate school with no
experience, women receive about $10,970 more than men. The positive coefficient estimate
suggests that reverse discrimination exists at the entry level.
• “Raise” discrimination: The coefficient of the interaction variable, Exper_SexF1, assesses
the possibility of this more subtle type of discrimination, “raise” discrimination. The coefficient
estimate is negative. It suggests that a woman receives $1,135 less than a man for an additional
year of experience. The negative coefficient estimate suggests that women receive smaller annual
raises than their male counterparts.
These regression results paint a more complex picture of possible discrimination than is often
contemplated. Again, recall that as a consequence of privacy concerns these data were artificially
generated. Consequently, do not conclude that the conclusions we have suggested here neces-
sarily reflect the “real world.” This example was used because it illustrates how multiple regres-
sion analysis can exploit dummy variables and interaction variables to investigate important
issues, such as the presence of discrimination.
13.2.6 Conclusions
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP GdpPCt + βIntAuth Autht + eIntt
LogUsersTVt = βTVConst + βTVYear Yeart + βTVCapHum CapitalHumant + βTVCapPhy CapitalPhysicalt + βTVGDP GdpPCt + βTVAuth Autht + eTVt
The dependent variable in both the Internet and television models is the logarithm of users. This
is done so that the coefficients can be interpreted as percentages.
The theory behind the effect of human capital, physical capital, and per capita GDP on both
Internet and television use is straightforward: Additional human capital, physical capital, and
per capita GDP should stimulate both Internet and television use.
We postulate that the impact of time and political factors should be different for the two media,
however:
• On the one hand, since the Internet is an emerging technology, we theorize that there should be
substantial growth of Internet use over time, even after accounting for all the other factors that may affect
Internet use. Television, on the other hand, is a mature technology. After accounting for all the
other factors, time should play little or no role in explaining television use.
• We postulate that the political factors should affect Internet and television use differently. On
the one hand, since authoritarian nations control the content of television, we would expect
authoritarian nations to promote television; television provides the authoritarian nation the means
to get the government’s message out. On the other hand, since it is difficult to control Internet
content, we would expect authoritarian nations to suppress Internet use.
Table 13.8 summarizes our theories and presents the appropriate null and alternative hypoth-
eses. As table 13.8 reports, all the hypothesis tests are one-tailed tests with the exception of the
Year coefficient in the television use model.
Let us begin by focusing on Internet use.
Step 1: Collect data, run the regression, and interpret the estimates.
Since the dependent variables are logarithms, we interpret the coefficient estimates in terms
of percentages (table 13.9). The signs of all the coefficient estimates support our theories.
Table 13.8
Theories and hypotheses for Internet and television use

                    LogUsersInternet                              LogUsersTV
Year                Theory: βIntYear > 0; H0: βIntYear = 0        Theory: βTVYear = 0; H0: βTVYear = 0
CapitalPhysical     Theory: βIntCapPhy > 0; H0: βIntCapPhy = 0    Theory: βTVCapPhy > 0; H0: βTVCapPhy = 0
Auth                Theory: βIntAuth < 0; H0: βIntAuth = 0        Theory: βTVAuth > 0; H0: βTVAuth = 0
Table 13.9
Internet regression results
Note that all the results support the theories and all the coefficients except for the Year coefficient
in the television regression are significant at the 1 percent level. It is noteworthy that the regres-
sion results suggest that the impact of Year and Auth differ for the two media just as we postu-
lated. Our results suggest that after accounting for all other explanatory variables:
• Internet use grows by an estimated 45 percent per year whereas the annual growth rate of
television use does not differ significantly from 0.
• An increase in the authoritarian index results in a significant decrease in Internet use, but a
significant increase in television use.
Table 13.10
Television regression results
Table 13.11
Coefficient estimates and Prob[Results IF H0 true]
LogUsersInternet LogUsersTV
Question: Does per capita GDP have a greater impact on Internet use in authoritarian nations
than nonauthoritarian ones?
Some argue that the answer to this question is yes; that is, that per capita GDP has a greater
impact on Internet use in authoritarian nations. Their rationale is based on the following logic:
• In authoritarian nations, citizens have few sources of uncensored information. There are few,
if any, uncensored newspapers, news magazines, etc. available. The only source of uncensored
information is the Internet. Consequently the effect of per capita GDP on Internet use will be
large.
• In nonauthoritarian nations, citizens have many sources of uncensored information. Higher per
capita GDP will no doubt stimulate Internet use, but it will also stimulate the purchase of uncen-
sored newspapers, news magazines, etc. Consequently the effect on Internet use will be modest.
An authoritarian index–GDP interaction variable can be used to explore this issue. To do so,
generate the interaction variable Auth_GdpPC, the product of the authoritarian index and per
capita GDP:
Auth_GdpPCt = Autht × GdpPCt
We then add the interaction variable to the Internet use model:
LogUsersInternett = βIntConst + βIntYear Yeart + βIntCapHum CapitalHumant + βIntCapPhy CapitalPhysicalt + βIntGDP GdpPCt + βIntAuth Autht + βIntAuth_GDP Auth_GdpPCt + eIntt
If the theory regarding the interaction of authoritarianism and per capita GDP is correct, the
coefficient of the interaction variable, Auth_GdpPC, should be positive: βIntAuth_GDP > 0. (If you are
not certain why, it should become clear shortly.) The null and alternative hypotheses are
H0: βIntAuth_GDP = 0
H1: βIntAuth_GDP > 0
Step 1: Collect data, run the regression, and interpret the estimates.
Focus attention on the estimated effect of GDP. To do so, consider both the GDP and Auth_
GDP terms in the estimated equation (table 13.12):
Table 13.12
Internet regression results—With interaction variable
Table 13.13
Interaction variable estimate calculations
We will now estimate the impact of GDP for several values of the authoritarian index
(table 13.13).
Recall that as the authoritarian index increases, the level of authoritarianism rises. Therefore
the estimates suggest that as a nation becomes more authoritarian, a $1,000 increase in per capita
GDP increases Internet use by larger amounts. This supports the position of those who believe
that citizens of all nations seek out uncensored information. In authoritarian nations, citizens
have few sources of uncensored information; therefore, as per capita GDP rises, they embrace
the uncensored information the Internet provides more enthusiastically than do citizens of
nonauthoritarian nations, in which other sources of uncensored information are available.
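The arithmetic behind a table like 13.13 is simple enough to sketch directly; the two coefficient values below are invented placeholders, not the regression's actual estimates:

```python
# With both GdpPC and Auth_GdpPC in the estimated equation, the estimated impact
# of a one-unit ($1,000) increase in per capita GDP depends on the authoritarian
# index: impact = b_GDP + b_Auth_GDP * Auth.
b_gdp = 0.05          # hypothetical estimate of the GdpPC coefficient
b_auth_gdp = 0.02     # hypothetical estimate of the Auth_GdpPC coefficient

for auth in [1, 3, 5, 7]:
    impact = b_gdp + b_auth_gdp * auth
    print(f"Auth = {auth}: estimated impact of a $1,000 GdpPC increase = {impact:.2f}")
```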
Chapter 13 Exercises
Faculty salary data: Artificially constructed cross section salary data and characteristics for 200
faculty members.
1. Reconsider the faculty salary data and add the number of articles each faculty member has
published to the model:
c. Formulate the null and alternative hypotheses regarding the effect of published articles.
d. Assess the effect of published articles.
2. Again, reconsider the faculty salary data and add an article–sex interaction variable to the
faculty salary model:
+ βArt_SexF1Articles_Sext + et
Allegation: Women receive less credit for their publications than do their male colleagues.
What does the allegation suggest about the sign of the Articles_Sex coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the published articles–sex interaction coefficient estimate.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
3. Revisit the House earmark data and consider the following model:
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the coefficients.
Interpret the coefficient estimates.
+ βLib_Dem Lib_Demt + et
Allegation: Liberal Democrats receive more earmarks than their non-Democratic liberal
colleagues.
What does the allegation suggest about the sign of the Lib_Dem coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the Lib_Dem interaction coefficient estimate. What do you conclude
about the allegation?
+ βNERegionNortheastt + et
Allegation: Members of Congress from the Northeast receive more earmarks than their col-
leagues from other parts of the country.
What does the allegation suggest about the sign of the RegionNortheast coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model. Interpret the Northeast region coefficient estimate.
c. Formulate the null and alternative hypotheses regarding the allegation.
d. Calculate Prob[Results IF H0 true] and assess the allegation.
14 Omitted Explanatory Variables, Multicollinearity, and Irrelevant Explanatory Variables
Chapter 14 Outline
14.1 Review
14.1.1 Unbiased Estimation Procedures
14.1.2 Correlated and Independent (Uncorrelated) Variables
14.3 Multicollinearity
14.3.1 Perfectly Correlated Explanatory Variables
14.3.2 Highly Correlated Explanatory Variables
14.3.3 “Earmarks” of Multicollinearity
1. Review the goal of multiple regression analysis. In words, explain what multiple regression analysis attempts to do.
2. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
3. Consider an estimate’s probability distribution. Review the importance of its mean and
variance:
Baseball data: Panel data of baseball statistics for the 588 American League games played
during the summer of 1996.
Attendance depends not only on the ticket price but also on the salary of the home team.
i. Devise a theory explaining the effect that home team salary should have on attendance.
What does your theory suggest about the sign of the HomeSalary coefficient, βHomeSalary?
ii. Use the ordinary least squares (OLS) estimation procedure to estimate both of the
model’s coefficients. Interpret the regression results.
c. What do you observe about the estimates for the PriceTicket coefficients in the two
models?
7. Again, focus on the baseball data and consider the following two variables:
Generate a new variable, PriceCents, to express the price in terms of cents rather than dollars:
Run the regression to estimate the parameters of this model. You will get an “unusual” result.
Explain this by considering what multiple regression analysis attempts to do.
8. The following are excerpts from an article appearing in the New York Times on September 1,
2008:
Doubts Grow over Flu Vaccine in Elderly by Brenda Goodman
The influenza vaccine, which has been strongly recommended for people over 65 for more than four
decades, is losing its reputation as an effective way to ward off the virus in the elderly.
A growing number of immunologists and epidemiologists say the vaccine probably does not work very
well for people over 70 . . .
The latest blow was a study in The Lancet last month that called into question much of the statistical
evidence for the vaccine’s effectiveness. . . .
The study found that people who were healthy and conscientious about staying well were the most likely
to get an annual flu shot. . . . [others] are less likely to get to their doctor’s office or a clinic to receive the
vaccine.
Dr. David K. Shay of the Centers for Disease Control and Prevention, a co-author of a commentary that
accompanied Dr. Jackson’s study, agreed that these measures of health . . . “were not incorporated into
early estimations of the vaccine’s effectiveness” and could well have skewed the findings.
a. Does being healthy and conscientious about staying well increase or decrease the chances
of getting flu?
b. According to the article, are those who are healthy and conscientious about staying well
more or less likely to get a flu shot?
c. The article alleges that previous studies did not incorporate health and conscientiousness in judging the effectiveness of flu shots. If the allegation is true, have previous studies overestimated or underestimated the effectiveness of flu shots?
d. Suppose that you were the director of your community's health department. You are considering whether or not to subsidize flu vaccines for the elderly. Would you find the previous studies useful? That is, would a study that did not incorporate health and conscientiousness in judging the effectiveness of flu shots help you decide whether your department should spend its limited budget to subsidize flu vaccines? Explain.
14.1 Review
Estimates are random variables. Consequently there is both good news and bad news. Before
the data are collected and the parameters are estimated:
• Bad news: On the one hand, we cannot determine the numerical values of the estimates with
certainty (even if we knew the actual values).
• Good news: On the other hand, we can often describe the probability distribution of the
estimate telling us how likely it is for the estimate to equal its possible numerical values.
Accordingly we can apply the relative frequency interpretation of probability. In one repetition,
the chances that the estimate will be greater than the actual value equal the chances that the
estimate will be less.
Figure 14.1
Probability distribution of an estimate—Unbiased estimation procedure
Figure 14.2
Probability distribution of an estimate—Importance of variance (left panel: estimate is unreliable; right panel: estimate is reliable)
Scatter Diagrams
The Dow Jones and Nasdaq growth rates are positively correlated. Most of the scatter diagram
points lie in the first and third quadrants (figure 14.3). When the Dow Jones growth rate is high,
the Nasdaq growth rate is usually high also. Similarly, when the Dow Jones growth rate is low,
the Nasdaq growth rate is usually low also. Knowing one growth rate helps us predict the other.
Amherst precipitation and the Nasdaq growth rate are independent, uncorrelated. The scatter
diagram points are spread rather evenly across the graph. Knowing the Nasdaq growth rate does
not help us predict Amherst precipitation, and vice versa.
Figure 14.3
Scatter diagrams, correlation, and independence
Correlation Coefficient
The correlation coefficient indicates the degree to which two variables are correlated; the correla-
tion coefficient ranges from −1 to +1:
• = 0 = Independent (uncorrelated): Knowing the value of one variable does not help us
predict the value of the other.
• > 0 = Positive correlation: Typically, when the value of one variable is high, the value of
the other variable will be high.
• < 0 = Negative correlation: Typically, when the value of one variable is high, the value of
the other variable will be low.
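To make the review concrete, here is a minimal sketch showing how a correlation coefficient can be computed from two data series; the growth rate figures are made up purely for illustration (NumPy is used here rather than the Econometrics Lab or EViews).

    import numpy as np

    # Made-up monthly growth rates (percent), for illustration only
    dow = np.array([2.1, -1.3, 0.8, 3.0, -2.4, 1.1])
    nasdaq = np.array([2.8, -0.9, 1.5, 3.6, -3.1, 0.7])

    # np.corrcoef returns the 2-by-2 correlation matrix; the off-diagonal entry is the
    # correlation coefficient, which always lies between -1 and +1.
    r = np.corrcoef(dow, nasdaq)[0, 1]
    print(f"Correlation coefficient: {r:.2f}")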
We will consider baseball attendance data to study the omitted variable phenomenon.
Let us begin our analysis by focusing on the price of tickets. Consider the following two models
that attempt to explain game attendance:
The first model has a single explanatory variable, ticket price, PriceTicket:
Downward sloping demand theory: This model is based on the economist's downward sloping demand theory. An increase in the price of a good decreases the quantity demanded. Higher ticket prices should reduce attendance; hence the PriceTicket coefficient should be negative:
βPrice < 0
We will use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters (table 14.1):
Table 14.1
Baseball attendance regression results—Ticket price only
The estimated coefficient for the ticket price is positive, suggesting that higher prices lead to an increase in quantity demanded. This contradicts the downward sloping demand theory, does it not?
In the second model, we include not only the price of tickets, PriceTicket, as an explanatory
variable, but also the salary of the home team, HomeSalary:
We can justify the salary explanatory variable on the grounds that fans like to watch good players. We will call this the star theory. Presumably a high salary team has better players, more stars, on its roster and accordingly will draw more fans.

Star theory: Teams with higher salaries will have better players, which will increase attendance. The HomeSalary coefficient should be positive:

βHomeSalary > 0
Now use the ordinary least squares (OLS) estimation procedure to estimate the parameters (table
14.2). These coefficient estimates lend support to our theories.
The two models produce very different results concerning the effect of the ticket price on
attendance. More specifically, the coefficient estimate for ticket price changes drastically from
1,897 to −591 when we add home team salary as an explanatory variable. This is a disquieting
puzzle. We will solve this puzzle by reviewing the goal of multiple regression analysis and then
explaining when omitting an explanatory variable will prevent us from achieving the goal.
Table 14.2
Baseball attendance regression results—Ticket price and home team salary
Multiple regression analysis attempts to sort out the individual effect of each explanatory vari-
able. The estimate of an explanatory variable’s coefficient allows us to assess the effect that an
individual explanatory variable has on the dependent variable. An explanatory variable’s coef-
ficient estimate estimates the change in the dependent variable resulting from a change in that
particular explanatory variable while all other explanatory variables remain constant.
In model 1 we estimate that a $1.00 increase in the ticket price increases attendance by nearly 2,000 per game, whereas in model 2, we estimate that a $1.00 increase decreases attendance by about 600 per game. The two models suggest that the individual effect of the ticket price is very different. The omitted variable phenomenon allows us to resolve this puzzle.
Claim: Omitting an explanatory variable from a regression will bias the estimation procedure
whenever two conditions are met. Bias results if the omitted explanatory variable
• influences the dependent variable;
• is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable
is a composite of two effects, the influence that the
• included explanatory variable itself has on the dependent variable (direct effect);
• omitted explanatory variable has on the dependent variable because the included explanatory
variable also acts as a proxy for the omitted explanatory variable (proxy effect).
Figure 14.4
Omitted variable simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
Since the goal of multiple regression analysis is to sort out the individual effect of each explana-
tory variable we want to capture only the direct effect.
We can now use the Econometrics Lab to justify our claims concerning omitted explanatory variables. The following regression model with two explanatory variables is used (figure 14.4):

yt = βConst + βx1 x1t + βx2 x2t + et
The simulation provides us with two options; we can either include both explanatory variables in the regression, "Both Xs," or just one, "Only X1." By default the "Only X1" option is selected; consequently, the second explanatory variable is omitted. That is, x1t is the included explanatory
variable and x2t is the omitted explanatory variable. For simplicity, assume that x1’s coefficient,
βx1, is positive. We will consider three cases to illustrate when bias does and does not result:
• Case 1: The coefficient of the omitted explanatory variable is positive and the two explana-
tory variables are independent (uncorrelated).
• Case 2: The coefficient of the omitted explanatory variable equals zero and the two explana-
tory variables are positively correlated.
• Case 3: The coefficient of the omitted explanatory variable is positive and the two explana-
tory variables are positively correlated.
We will now show that only in the last case does bias result because only in the last case is the proxy effect present.
Case 1: The coefficient of the omitted explanatory variable is positive and the two explanatory
variables are independent (uncorrelated).
Will bias result in this case? Since the two explanatory variables are independent (uncorrelated),
an increase in the included explanatory variable, x1t, typically will not affect the omitted explana-
tory variable, x2t. Consequently the included explanatory variable, x1t, will not act as a proxy
for the omitted explanatory variable, x2t. Bias should not result.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (independent of x1t): x2t unaffected → yt unaffected (even though βx2 > 0). No proxy effect.
We will use our lab to confirm this logic. By default, the actual coefficient for the included explanatory variable, x1t, equals 2 and the actual coefficient for the omitted explanatory variable, x2t, is nonzero; it equals 5. Their correlation coefficient, Corr X1&X2, equals 0.00; hence the
two explanatory variables are independent (uncorrelated). Be certain that the Pause checkbox is
cleared. Click Start and after many, many repetitions, click Stop. Table 14.3 reports that the
average value of the coefficient estimates for the included explanatory variable equals its actual
value. Both equal 2.0. The ordinary least squares (OLS) estimation procedure is unbiased.
The ordinary least squares (OLS) estimation procedure captures the individual influence that
the included explanatory variable itself has on the dependent variable. This is precisely the effect
that we wish to capture. The ordinary least squares (OLS) estimation procedure is unbiased; it
is doing what we want it to do.
Table 14.3
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50

Table 14.4
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50
2            0            0.30         ≈ 2.0                   ≈ 50                   ≈ 50
Case 2: The coefficient of the omitted explanatory variable equals zero and the two explanatory
variables are positively correlated.
In the second case the two explanatory variables are positively correlated; when the included
explanatory variable, x1t, increases, the omitted explanatory variable, x2t, will typically increase
also. But the actual coefficient of the omitted explanatory variable, βx2, equals 0; hence, the
dependent variable, yt, is unaffected by the increase in x2t. There is no proxy effect because the
omitted variable, x2t, does not affect the dependent variable; hence bias should not result.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (positively correlated with x1t): x2t up → yt unaffected (since βx2 = 0). No proxy effect.
To confirm our logic with the simulation, be certain that the actual coefficient for the omitted
explanatory variable equals 0 and the correlation coefficient equals 0.30. Click Start and then
after many, many repetitions, click Stop. Table 14.4 reports that the average value of the coef-
ficient estimates for the included explanatory variable equals its actual value. Both equal 2.0.
The ordinary least squares (OLS) estimation procedure is unbiased.
Again, the ordinary least squares (OLS) estimation procedure captures the influence that the
included explanatory variable itself has on the dependent variable. Again, there is no proxy effect
and all is well.
Case 3: The coefficient of the omitted explanatory variable is positive and the two explanatory
variables are positively correlated.
As with case 2, the two explanatory variables are positively correlated; when the included explanatory variable, x1t, increases, the omitted explanatory variable, x2t, will typically increase also. But now the actual coefficient of the omitted explanatory variable, βx2, is no longer 0; it is positive. Hence an increase in the omitted explanatory variable, x2t, increases the dependent variable. In addition to having a direct effect on the dependent variable, the included explanatory variable, x1t, also acts as a proxy for the omitted explanatory variable, x2t. There is a proxy effect.
Typically:
• Included variable: x1t up → yt up (since βx1 > 0). Direct effect.
• Omitted variable (positively correlated with x1t): x2t up → yt up (since βx2 > 0). Proxy effect.
In the simulation, the actual coefficient of the omitted explanatory variable, βx2, once again equals 5. The two explanatory variables are positively correlated; the correlation coefficient equals 0.30. Click Start and then after many, many repetitions click Stop. Table 14.5 reports that the average value of the coefficient estimates for the included explanatory variable, 3.5, exceeds its actual value, 2.0. The ordinary least squares (OLS) estimation procedure is biased upward.
Now we have a problem. The ordinary least squares (OLS) estimation procedure overstates
the influence of the included explanatory variable, the effect that the included explanatory vari-
able itself has on the dependent variable.
Table 14.5
Omitted variables simulation results

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates   Percent below actual   Percent above actual
2            5            0.00         ≈ 2.0                   ≈ 50                   ≈ 50
2            0            0.30         ≈ 2.0                   ≈ 50                   ≈ 50
2            5            0.30         ≈ 3.5                   ≈ 28                   ≈ 72
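The Econometrics Lab is interactive, but the logic of these three cases can also be reproduced with a short Monte Carlo sketch of our own. In the sketch below the coefficient values and correlation mirror the lab's case 3 defaults, while the sample size, error variance, and other data-generating details are assumptions chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    beta_x1, beta_x2, corr = 2.0, 5.0, 0.30    # case 3: nonzero omitted coefficient, positive correlation
    n_obs, n_reps = 50, 5000                   # sample size and repetitions are assumptions

    estimates = []
    for _ in range(n_reps):
        # Draw the two explanatory variables with the chosen correlation
        x1, x2 = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n_obs).T
        y = beta_x1 * x1 + beta_x2 * x2 + rng.normal(0.0, 10.0, size=n_obs)

        # Omit x2: regress y on a constant and x1 alone
        X = np.column_stack([np.ones(n_obs), x1])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

    print(f"Mean estimate of beta_x1: {np.mean(estimates):.2f} (actual value: {beta_x1})")
    # With corr = 0.30 and beta_x2 = 5 the mean is roughly 2 + 5(0.30) = 3.5, the upward bias
    # reported in table 14.5; setting corr = 0.00 or beta_x2 = 0 removes the bias.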
Figure 14.5
Probability distribution of an estimate—Upward bias (the distribution of bx1 is centered at 3.5, above the actual value of 2, yet estimates with bx1 < 2 still occur)
Let us now take a brief aside. Case 3 provides us with the opportunity to illustrate what bias
does and does not mean.
• What bias does mean: Bias means that the estimation procedure systematically overesti-
mates or underestimates the actual value. In this case, upward bias is present. The average of
the estimates is greater than the actual value after many, many repetitions.
• What bias does not mean: Bias does not mean that the value of the estimate in a single
repetition must be less than the actual value in the case of downward bias or greater than the
actual value in the case of upward bias. Focus on the last simulation. The ordinary least squares
(OLS) estimation procedure is biased upward as a consequence of the proxy effect. Despite the
upward bias, however, the estimate of the included explanatory variable is less than the actual
value in many of the repetitions as shown in figure 14.5.
Upward bias does not guarantee that in any one repetition the estimate will be greater than the
actual value. It just means that it will be greater “on average.” If the probability distribution is
symmetric, the chances of the estimate being greater than the actual value exceed the chances
of being less.
Now we return to our three omitted variable cases by summarizing them (table 14.6).
Question: Is the estimation procedure biased or unbiased when both explanatory variables are
included in the regression?
Table 14.6
Omitted variables simulation summary

Case   Omitted variable influences dependent variable   Omitted variable correlated with included variable   Estimation procedure
1      Yes                                              No                                                   Unbiased
2      No                                               Yes                                                  Unbiased
3      Yes                                              Yes                                                  Biased
Table 14.7
Omitted variables simulation results—No omitted variables

Actual βx1   Actual βx2   Corr X1&X2   Mean of βx1 estimates
2            5            0.30         ≈ 2.0
To address this question, “Both Xs” is now selected. This means that both explanatory variables,
x1t and x2t, will be included in the regression. Both explanatory variables affect the dependent
variable and they are correlated. As we saw in case 3, if one of the explanatory variables is
omitted, bias will result. To see what occurs when both explanatory variables are included, click
Start and after many, many repetitions, click Stop. When both variables are included the ordi-
nary least squares (OLS) estimation procedure is unbiased (table 14.7).
Conclusion: To avoid omitted variable bias, all relevant explanatory variables should be
included in a regression.
The ticket price coefficient estimate is affected dramatically by the presence of home team salary; in model 1 the estimate is much higher: 1,897 versus −591. Why?
We will now argue that when ticket price is included in the regression and home team salary is omitted, as in model 1, there is reason to believe that the estimation procedure for the ticket price coefficient will be biased. We just learned that omitted variable bias results when the following two conditions are met; when an omitted explanatory variable:
• influences the dependent variable
and
• is correlated with an included explanatory variable.
Model 1 omits home team salary, HomeSalaryt. Are the two omitted variable bias conditions
met?
• It certainly appears reasonable to believe that the omitted explanatory variable, HomeSalaryt,
affects the dependent variable, Attendancet. The club owner who is paying the high salaries
certainly believes so. The owner certainly hopes that by hiring better players more fans will
attend the games. Consequently it appears that the first condition required for omitted variable
bias is met.
• We can confirm the correlation by using statistical software to calculate the correlation matrix
(table 14.8).
Table 14.8
Ticket price and home team salary correlation matrix

Correlation matrix   PriceTicket   HomeSalary
PriceTicket          1.00          0.78
HomeSalary           0.78          1.00
The correlation coefficient between PriceTickett and HomeSalaryt is 0.78; the variables are
positively correlated. The second condition required for omitted variable bias is met.
We have reason to suspect bias in model 1. When the included variable, PriceTickett, increases, the omitted variable, HomeSalaryt, typically increases also. An increase in the omitted variable, HomeSalaryt, increases the dependent variable, Attendancet. In addition to having a direct effect on the dependent variable, the included explanatory variable, PriceTickett, also acts as a proxy for the omitted explanatory variable, HomeSalaryt. There is a proxy effect, and upward bias results. This provides us with an explanation of why the ticket price coefficient estimate in model 1 is greater than the estimate in model 2.
Omitting an explanatory variable from a regression biases the estimation procedure whenever
two conditions are met. Bias results if the omitted explanatory variable:
• influences the dependent variable;
• is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable is a composite of two effects; the coefficient estimate of the included explanatory variable reflects two influences:
• The included explanatory variable, which has an effect on the dependent variable (direct
effect).
• The omitted explanatory variable, which has an effect on the dependent variable because the
included explanatory variable also acts as a proxy for the omitted explanatory variable (proxy
effect).
The bad news is that the proxy effect leads to bias. The good news is that we can eliminate the
proxy effect and its accompanying bias by including the omitted explanatory variable. But now,
we will learn that if two explanatory variables are highly correlated a different problem can
emerge.
14.3 Multicollinearity
The phenomenon of multicollinearity occurs when two explanatory variables are highly cor-
related. Recall that multiple regression analysis attempts to sort out the influence of each indi-
vidual explanatory variable. But what happens when we include two explanatory variables in a
single regression that are perfectly correlated? Let us see.
In our baseball attendance workfile, ticket prices, PriceTickett, are reported in terms of dollars. Generate a new variable, PriceCentst, reporting ticket prices in terms of cents rather than dollars:

PriceCentst = 100 × PriceTickett
Note that the variables PriceTickett and PriceCentst are perfectly correlated. If we know one,
we can predict the value of the other with complete accuracy. Just to confirm this, use statistical
software to calculate the correlation matrix (table 14.9).
The correlation coefficient of PriceTickett and PriceCentst equals 1.00. The variables are
indeed perfectly correlated. Now run a regression with Attendance as the dependent variable and
both PriceTicket and PriceCents as explanatory variables.
Your statistical software will report a diagnostic. Different software packages provide different
messages, but basically the software is telling us that it cannot run the regression.
Why does this occur? The reason is that the two variables are perfectly correlated. Knowing the value of one allows us to predict the value of the other with complete accuracy. Both explanatory variables contain precisely the same information. Multiple regression analysis attempts to sort out the influence of each individual explanatory variable. But if both variables contain precisely the same information, it is impossible to do this. How can we possibly separate out each variable's individual effect when the two variables contain identical information? We are asking statistical software to do the impossible.
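A tiny sketch makes the mechanical reason for the diagnostic visible: when one explanatory variable is an exact multiple of another, the matrix of explanatory variables is rank deficient, so the least squares normal equations have no unique solution. The ticket prices below are made up for illustration.

    import numpy as np

    price_ticket = np.array([8.0, 10.0, 12.0, 15.0, 9.0])   # made-up dollar ticket prices
    price_cents = 100.0 * price_ticket                      # perfectly correlated with price_ticket

    # Explanatory variable matrix: constant, PriceTicket, PriceCents
    X = np.column_stack([np.ones(len(price_ticket)), price_ticket, price_cents])

    print("Columns in X:", X.shape[1])                                     # 3
    print("Rank of X:   ", np.linalg.matrix_rank(X))                       # 2: one column is redundant
    print("Correlation: ", np.corrcoef(price_ticket, price_cents)[0, 1])   # 1.0
    # Because X'X is singular, the normal equations have no unique solution;
    # this is the "diagnostic" the statistical software reports.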
Table 14.9
EViews dollar and cent ticket price correlation matrix

Correlation matrix   PriceTicket   PriceCents
PriceTicket          1.00          1.00
PriceCents           1.00          1.00
Explanatory variables perfectly correlated
↓
Knowing the value of one explanatory variable allows us to predict perfectly the value of the other
↓
Both variables contain precisely the same information
↓
Impossible to separate out the individual effect of each variable
Next we consider a case in which the explanatory variables are highly, although not perfectly,
correlated.
To investigate the problems created by highly correlated explanatory variables, we will use our baseball data to investigate a model that includes four explanatory variables:
+ βHomeNW HomeNetWinst + βHomeGB HomeGamesBehindt + et
where
Table 14.10
2009 final season standings—AL East
The variable HomeGamesBehindt captures the home team’s standing in its divisional race. For
those who are not baseball fans, note that all teams that win their division automatically qualify
for the baseball playoffs. Ultimately the two teams that win the American and National League playoffs meet in the World Series. Since it is the goal of every team to win the World Series,
each team strives to win its division. Games behind indicates how close a team is to winning its
division. To explain how games behind are calculated, consider the final standings of the Ameri-
can League Eastern Division in 2009 (table 14.10).
The Yankees had the best record; the games behind value for the Yankees equals 0. The Red
Sox won eight fewer games than the Yankees; hence the Red Sox were 8 games behind. The
Rays won 19 fewer games than the Yankees; hence the Rays were 19 games behind. Similarly
the Blue Jays were 28 games behind and the Orioles 39 games behind.1 During the season if a
team’s games behind becomes larger, it becomes less likely the team will win its division, less
likely for that team to qualify for the playoffs, and less likely for that team to eventually win
the World Series. Consequently, if a team's games behind becomes larger, we would expect home team fans to become discouraged, resulting in lower attendance.
We use the terms team quality and division race to summarize our theories regarding home
net wins and home team games behind:
• Team quality theory: More net wins increase attendance. βHomeNW > 0.
• Division race theory: More games behind decreases attendance. βHomeGB < 0.
1. In this example all teams have played the same number of games. When a different number of games have been
played, the calculation becomes a little more complicated. Games behind for a non–first place team equals
[(Wins of first-place team − Wins of trailing team) + (Losses of trailing team − Losses of first-place team)] / 2
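A minimal sketch of the footnote's formula, using a made-up pair of win-loss records:

    def games_behind(first_wins, first_losses, trail_wins, trail_losses):
        # Footnote formula: ((wins of first - wins of trailing) + (losses of trailing - losses of first)) / 2
        return ((first_wins - trail_wins) + (trail_losses - first_losses)) / 2

    # Made-up records: first-place team 90-60, trailing team 82-68
    print(games_behind(90, 60, 82, 68))   # (8 + 8) / 2 = 8.0 games behind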
Table 14.11
HomeNetWins and HomeGamesBehind correlation matrix

Correlation matrix   HomeNetWins   HomeGamesBehind
HomeNetWins          1.000         −0.962
HomeGamesBehind      −0.962        1.000
Table 14.12
Attendance regression results
Table 14.11 reports that the correlation coefficient for HomeGamesBehindt and HomeNetWinst
equals −0.962. Recall that the correlation coefficient must lie between −1 and +1. When two
variables are perfectly negatively correlated their correlation coefficient equals −1. While Home-
GamesBehindt and HomeNetWinst are not perfectly negatively correlated, they come close; they
are highly negatively correlated.
We use the ordinary least squares (OLS) estimation procedure to estimate the model’s param-
eters (table 14.12).
The sign of each estimate supports the theories. Focus on the two new variables included in
the model: HomeNetWinst and HomeGamesBehindt. Construct the null and alternative
hypotheses.
While the signs of the coefficient estimates are encouraging, some of the results are disappointing:
• The coefficient estimate for HomeNetWinst is positive supporting our theory, but what about
the Prob[Results IF H0 true]? What is the probability that the estimate from one regression would
equal 60.53 or more, if the H0 were true (i.e., if the actual coefficient, βHomeNW, equals 0, if home
team quality has no effect on attendance)? Using the tails probability, we can easily calculate
the probability:

Prob[Results IF H0 true] = 0.4778/2 ≈ 0.24
We cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10 percent,
suggesting that it is quite possible for the null hypothesis to be true, quite possible that home
team quality has no effect on attendance.
• Similarly the coefficient estimate for HomeGamesBehindt is negative supporting our theory,
but what about the Prob[Results IF H0 true]? What is the probability that the estimate from one
regression would equal −84.39 or less, if the H0 were true (i.e., if the actual coefficient, βHomeGB,
equals 0, if games behind has no effect on attendance)? Using the tails probability, we can easily
calculate the probability:

Prob[Results IF H0 true] = 0.6138/2 ≈ 0.31
Again, we cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10
percent, suggesting that it is quite possible for the null hypothesis to be true, quite possible that
games behind has no effect on attendance.
Should we abandon our “theory” as a consequence of these regression results?
Let us perform a Wald test (table 14.13) to assess the proposition that both coefficients equal 0:
Table 14.13
EViews Wald test results
Prob[Results IF H0 true]: What is the probability that the F-statistic would be 111.4 or more, if
the H0 were true (i.e., if both βHomeNW and βHomeGB equal 0, if both team quality and games behind
have no effect on attendance)?
We can reject the null hypothesis at a 1 percent significance level; it is unlikely that both team
quality and games behind have no effect on attendance.
There appears to be a paradox when we compare the t-tests and the Wald test:
Individually, neither team quality nor games behind appears to influence attendance significantly,
but taken together by asking if team quality and/or games behind influence attendance, we
conclude that they do.
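Readers who want to replicate a Wald (F) test of a joint null hypothesis outside of EViews can do so with statsmodels; the sketch below uses simulated stand-in data rather than the baseball workfile, so the variable names and numbers are assumptions.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in data with two highly (negatively) correlated explanatory variables
    rng = np.random.default_rng(1)
    n = 200
    net_wins = rng.normal(0.0, 10.0, n)
    games_behind = -0.9 * net_wins + rng.normal(0.0, 3.0, n)
    attendance = 25000 + 60 * net_wins - 80 * games_behind + rng.normal(0.0, 6000.0, n)
    df = pd.DataFrame({"Attendance": attendance,
                       "HomeNetWins": net_wins,
                       "HomeGamesBehind": games_behind})

    results = smf.ols("Attendance ~ HomeNetWins + HomeGamesBehind", data=df).fit()

    # Wald (F) test of the joint null hypothesis that both coefficients equal 0
    print(results.f_test("HomeNetWins = 0, HomeGamesBehind = 0"))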
Next let us run two regressions, each of which includes only one of the two troublesome explanatory variables (tables 14.14 and 14.15). When only a single explanatory variable is included, the coefficient is significant.
Table 14.14
EViews attendance regression results—HomeGamesBehind omitted
Table 14.15
EViews attendance regression results—HomeNetWins omitted
How can we explain this? Recall that multiple regression analysis attempts to sort out the influ-
ence of each individual explanatory variable. When two explanatory variables are perfectly
correlated, it is impossible for the ordinary least squares (OLS) estimation procedure to separate
out the individual influences of each variable. Consequently, if two variables are highly corre-
lated, as team quality and games behind are, it may be very difficult for the ordinary least squares
(OLS) estimation procedure to separate out the individual influence of each explanatory variable.
This difficulty evidences itself in the variance of the coefficient estimates' probability distributions. When two highly correlated variables are included in the same regression, the variance of each estimate's probability distribution is large. This explains our t-test results.
Figure 14.6
Multicollinearity simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
By default the actual value of the coefficient for the first explanatory variable equals 2 and the actual value for the second equals 5. Note that the "Both Xs" option is selected; both explanatory variables are included in the regression. Initially, the correlation coefficient is specified as 0.00; that is,
initially the explanatory variables are independent. Be certain that the Pause checkbox is cleared
and click Start. After many, many repetitions click Stop. Next repeat this process for a correla-
tion coefficient of 0.30, a correlation coefficient of 0.60, and a correlation coefficient of 0.90
(table 14.16).
Table 14.16
Multicollinearity simulation results
An irrelevant explanatory variable is a variable that does not influence the dependent variable.
Including an irrelevant explanatory variable can be viewed as adding “noise,” an additional
element of uncertainty, into the mix. An irrelevant explanatory variable adds a new random
influence to the model. If our logic is correct, irrelevant explanatory variables should lead to
both good news and bad news:
• Good news: Random influences do not cause the ordinary least squares (OLS) estimation
procedure to be biased. Consequently the inclusion of an irrelevant explanatory variable should
not lead to bias.
• Bad news: The additional uncertainty added by the new random influence means that the
coefficient estimate is less reliable; the variance of the coefficient estimate’s probability distribu-
tion should rise when an irrelevant explanatory variable is present.
Figure 14.7
Irrelevant explanatory variable simulation (Econometrics Lab interface: settings for the actual coefficient values and the correlation coefficient for X1 and X2, a "Both Xs"/"Only X1" option, and displays of the mean and variance of the coefficient estimates and the percent of estimates above and below the actual value)
By default the first explanatory variable, x1t, is the relevant explanatory variable; the default
value of its coefficient is 2. The second explanatory variable, x2t, is the irrelevant one (figure
14.7). An irrelevant explanatory variable has no effect on the dependent variable; consequently
the actual value of its coefficient, βx2, equals 0.
Initially the “Only X1” option is selected indicating that only the relevant explanatory vari-
able, x1t, is included in the regression; the irrelevant explanatory variable, x2t, is not included.
Click Start and then after many, many repetitions click Stop. Since the irrelevant explanatory
variable is not included in the regression, correlation between the two explanatory variables will
have no impact on the results. Confirm this by changing correlation coefficients from 0.00 to
0.30 in the “Corr X1&X2” list. Click Start and then after many, many repetitions click Stop.
Similarly show that the results are unaffected when the correlation coefficient is 0.60 and 0.90.
Subsequently investigate what happens when the irrelevant explanatory variable is included
by selecting the “Both Xs” option; the irrelevant explanatory, x2t, will now be included in the
Table 14.17
Irrelevant explanatory variable simulation results
regression. Be certain that the correlation coefficient for the relevant and irrelevant explanatory
variables initially equals 0.00. Click Start and then after many, many repetitions click Stop.
Investigate how correlation between the two explanatory variables affects the results when the
irrelevant explanatory variable is included by selecting correlation coefficient values of 0.30,
0.60, and 0.90. For each case click Start and then after many, many repetitions click Stop. Table
14.17 reports the results of the lab.
The results reported in table 14.17 are not surprising; the results support our intuition. On the one hand, when only the relevant explanatory variable (variable 1) is included:
• The mean of the coefficient estimate for the relevant explanatory variable, x1t, equals 2, the actual
value; consequently the ordinary least squares (OLS) estimation procedure for the coefficient
estimate is unbiased.
• Naturally, the variance of the coefficient estimate is not affected by correlation between the
relevant and irrelevant explanatory variables because the irrelevant explanatory variable is not
included in the regression.
On the other hand, when both relevant and irrelevant variables (variables 1 and 2) are included:
• The mean of the coefficient estimates for the relevant explanatory variable, x1t, still equals 2, the
actual value; consequently, the ordinary least squares (OLS) estimation procedure for the coef-
ficient estimate is unbiased.
• The variance of the coefficient estimate is greater whenever the irrelevant explanatory variable is included, even when the two explanatory variables are independent (when the correlation coefficient equals 0.00). This occurs because the irrelevant explanatory variable adds a new random influence to the model.
• As the correlation between the relevant and irrelevant explanatory variables increases, it becomes more difficult for the ordinary least squares (OLS) estimation procedure to separate out the individual influence of each explanatory variable. As we saw with multicollinearity, this difficulty evidences itself in the variance of the coefficient estimates' probability distributions. As the two explanatory variables become more correlated, the variance of the coefficient estimate's probability distribution increases.
The simulation illustrates the effect of including an irrelevant explanatory variable in a model.
While it does not cause bias, it does make the coefficient estimate of the relevant explanatory
variable less reliable by increasing the variance of its probability distribution.
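As with the earlier labs, the experiment can be sketched in a few lines of Python; the sample size, error variance, and correlation below are assumptions chosen only to mimic the lab's setup.

    import numpy as np

    rng = np.random.default_rng(2)
    beta_x1, corr, n_obs, n_reps = 2.0, 0.60, 50, 5000   # assumptions chosen for illustration

    with_x2, without_x2 = [], []
    for _ in range(n_reps):
        x1, x2 = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n_obs).T
        y = beta_x1 * x1 + rng.normal(0.0, 10.0, size=n_obs)   # x2 is irrelevant: its coefficient is 0

        X_only = np.column_stack([np.ones(n_obs), x1])
        X_both = np.column_stack([np.ones(n_obs), x1, x2])
        without_x2.append(np.linalg.lstsq(X_only, y, rcond=None)[0][1])
        with_x2.append(np.linalg.lstsq(X_both, y, rcond=None)[0][1])

    print("Only x1 : mean = %.2f, variance = %.3f" % (np.mean(without_x2), np.var(without_x2)))
    print("Both Xs : mean = %.2f, variance = %.3f" % (np.mean(with_x2), np.var(with_x2)))
    # Both means are close to 2 (no bias); the variance is larger when the irrelevant variable
    # is included, and the gap widens as the correlation rises.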
Chapter 14 Exercises
Cigarette consumption data: Cross section of per capita cigarette consumption and prices in
fiscal year 2008 for the 50 states and the District of Columbia.
+ βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βIIncPCt + βTobProdTobProdPCt + et
a. Develop a theory that explains how each explanatory variable affects per capita cigarette
consumption. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
House earmark data: Cross-sectional data of proposed earmarks in the 2009 fiscal year for the
451 House members of the 110th Congress.
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
+ βDemPartyDemocratt + et
a. Develop a theory that explains how each explanatory variable affects the number of solo
earmarks. What do your theories suggest about the sign of each coefficient?
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the model.
15 Other Regression Statistics and Pitfalls
Chapter 15 Outline
15.3 Pitfalls
15.3.1 Explanatory Variable Has the Same Value for All Observations
15.3.2 One Explanatory Variable Is a Linear Combination of Other Explanatory Variables
15.3.3 Dependent Variable Is a Linear Combination of Explanatory Variables
15.3.4 Outlier Observations
15.3.5 Dummy Variable Trap
1. A friend believes that the Internet is displacing television as a source of news and entertainment. The friend theorizes that after accounting for other factors, television usage is falling by 1 percent annually:
−1.0 Percent growth rate theory: After accounting for all other factors, the annual growth rate
of television users is negative, −1.0 percent.
LogUsersTVt = β^TV_Const + β^TV_Year Yeart + β^TV_CapHum CapitalHumant + β^TV_CapPhy CapitalPhysicalt + β^TV_GDP GdpPCt + β^TV_Auth Autht + e^TV_t
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the model’s
parameters.
R² = Explained squared deviations from the mean / Actual squared deviations from the mean = Σ(t=1 to T) (Estyt − ȳ)² / Σ(t=1 to T) (yt − ȳ)²
Calculate the R-squared for Professor Lord’s first quiz by filling in the following blanks:
Σ(t=1 to T) (yt − ȳ)² = ______    Σ(t=1 to T) yt = ____    Σ(t=1 to T) (Estyt − ȳ)² = ______
3. Students frequently experience difficulties when analyzing data. To illustrate some of these
pitfalls, we first review the goal of multiple regression analysis:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
Baseball data: Panel data of baseball statistics for the 588 American League games played
during the summer of 1996.
a. Explanatory variable has the same value for all observations. Run the following
regression:
i. What happens?
ii. What is the value of DHt for each of the observations?
iii. Why is it impossible to determine the effect of an explanatory variable if the explana-
tory variable has the same value for each observation? Explain.
b. One explanatory variable is a linear combination of other explanatory variables. Generate
a new variable, the ticket price in terms of cents:
i. What happens?
ii. Is it possible to sort out the effect of two explanatory variables when they contain
redundant information?
c. One explanatory variable is a linear combination of other explanatory variables—another
example. Generate a new variable, the total salaries of the two teams playing:
i. What happens?
ii. Is it possible to sort out the effect of explanatory variables when they are linear com-
binations of each other and therefore contain redundant information?
d. Dependent variable is a linear combination of explanatory variables. Run the following
regression:
What happens?
e. Outlier observations. First, run the following regression:
iii. Look at the first observation. What is the value of HomeSalary for the first observa-
tion? Is the value that was entered correctly?
Run the following regression:
Faculty salary data: Artificially constructed cross section salary data and characteristics for 200
faculty members.
As we did in chapter 13, generate the dummy variable SexF1, which equals 1 for a woman and
0 for a man. Run the following three regressions specifying Salary as the dependent variable:
To estimate the third model (part c) using EViews, you must “fool” EViews into running the
appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl> highlight SexF1,
For each regression, what is the equation that estimates the salary for
i. men?
ii. women?
Last, run one more regression specifying Salary as the dependent variable:
d. Explanatory variables: SexF1, SexM1, and Experience—but with a constant. What
happens?
5. Consider a system of 2 linear equations in 3 unknowns. Can you solve for all three unknowns?
15.1.1 Confidence Interval Approach: Which Theories Are Consistent with the Data?
Our approach thus far has been to present a theory first and then use data to assess the
theory:
• First, we presented a theory.
• Second, we analyzed the data to determine whether or not the data were consistent with the
theory.
In other words, we have started with a theory and then decided whether or not the data were
consistent with the theory.
The confidence interval approach reverses this process. Confidence intervals indicate the
range of theories that are consistent with the data.
In other words, the confidence interval approach starts with the data and then decides what theo-
ries are compatible.
Hypothesis testing plays a key role in both approaches. Consequently we must choose a significance level. A confidence interval's "size" and the significance level are intrinsically related:
Two-tailed confidence interval + Significance level = 100%
Since the traditional significance levels are 10, 5, and 1 percent, the three most commonly used
confidence intervals are 90, 95, and 99 percent:
• For a 90 percent confidence interval, the significance level is 10 percent.
• For a 95 percent confidence interval, the significance level is 5 percent.
• For a 99 percent confidence interval, the significance level is 1 percent.
A theory is consistent with the data if we cannot reject the null hypothesis at the confidence
interval’s significance level. No doubt this sounds confusing, so let us work through an example
using our international television data:
Project: Which growth theories are consistent with the international television data?
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
We begin by specifying the "size" of the confidence interval. Let us use a 95 percent confidence interval, which means that we are implicitly choosing a significance level of 5 percent.
The following two steps formalize the procedure to decide whether a theory lies within the two-
tailed 95 percent confidence interval:
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
Step 2: Consider a specific theory. Is the theory consistent with the data? Does the theory lie
within the confidence interval?
• Step 2a: Based on the theory, construct the null and alternative hypotheses. The null hypoth-
esis reflects the theory.
• Step 2b: Compute Prob[Results IF H0 true].
• Step 2c: Do we reject the null hypothesis?
Yes: Reject the theory. The data are not consistent with the theory. The theory does not lie
within the confidence interval.
No: The data are consistent with the theory. The theory does lie within the confidence
interval.
Recall that we decided to use a 95 percent confidence interval and consequently a 5 percent
significance level:
We will illustrate the steps by focusing on four growth rate theories postulating what the
growth rate of television use equals after accounting for other relevant factors:
• 0.0 percent growth rate theory
• −1.0 percent growth rate theory
• 4.0 percent growth rate theory
• 6.0 percent growth rate theory
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
We will apply the same model to explain television use that we used previously:
Model: LogUsersTVt = β^TV_Const + β^TV_Year Yeart + β^TV_CapHum CapitalHumant + β^TV_CapPhy CapitalPhysicalt + β^TV_GDP GdpPCt + β^TV_Auth Autht + e^TV_t
Step 2: 0.0 Percent growth rate theory. Focus on the effect of time. Is a 0.0 percent growth
theory consistent with the data? Does the theory lie within the confidence interval?
0.0 Percent growth rate theory: After accounting for all other explanatory variables, time has no effect on television use; that is, after accounting for all other explanatory variables, the annual growth rate of television use equals 0.0 percent. Accordingly the actual coefficient of Year, β^TV_Year, equals 0.000.
Table 15.1
Television regression results
• Step 2a: Based on the theory, construct the null and alternative hypotheses.
H0: β^TV_Year = 0.000
H1: β^TV_Year ≠ 0.000
• Step 2b: Compute Prob[Results IF H0 true].
Prob[Results IF H0 true] = Probability that the coefficient estimate would be at least 0.023 from 0.000, if H0 were true (if the actual coefficient, β^TV_Year, equals 0.000).
We can use the Econometrics Lab to calculate the probability of obtaining the results if the null
hypothesis is true. Remember that we are conducting a two-tailed test.
Question: What is the probability that the estimate lies at or above 0.023?
Answer: 0.0742 (figure 15.1)
Figure 15.1
Probability distribution of coefficient estimate—0.0 Percent growth rate theory (Student t-distribution: mean = 0.000, SE = 0.0159, DF = 736; the probability in each tail beyond 0.023 from the mean is 0.0742)
Question: What is the probability that the estimate lies at or below −0.023?
Answer: 0.0742
The Prob[Results IF H0 true] equals the sum of the right-tail and left-tail probabilities: 0.0742 + 0.0742 ≈ 0.148.
We will not provide justification for any of these theories. The confidence interval approach
does not worry about justifying the theory. The approach is pragmatic; the approach simply asks
whether or not the data support the theory.
Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate
the model’s parameters.
Step 2: −1.0 Percent growth rate theory. Is the theory consistent with the data? Does the theory
lie within the confidence interval?
• Step 2a: Based on the theory, construct the null and alternative hypotheses.
H0: β^TV_Year = −0.010
H1: β^TV_Year ≠ −0.010
• Step 2b: Compute Prob[Results IF H0 true].
To compute Prob[Results IF H0 true], we first pose a question:
Question: How far is the coefficient estimate, 0.023, from the value of the coefficient specified
by the null hypothesis, −0.010?
Answer: 0.033
Accordingly,

Prob[Results IF H0 true] = Probability that the coefficient estimate would be at least 0.033 from −0.010, if H0 were true (if the actual coefficient, β^TV_Year, equals −0.010).
We can use the Econometrics Lab to calculate the probability of obtaining the results if the null
hypothesis is true. Once again, remember that we are conducting a two-tailed test:
Figure 15.2
Probability distribution of coefficient estimate—−1.0 Percent growth rate theory (Student t-distribution: mean = −0.010, SE = 0.0159, DF = 736; the probability in each tail beyond 0.033 from the mean is 0.0191)
Question: What is the probability that the estimate lies 0.033 or more above −0.010, at or above
0.023?
Answer: 0.0191
Question: What is the probability that the estimate lies 0.033 or more below −0.010, at or
below −0.043?
Answer: 0.0191
The Prob[Results IF H0 true] equals the sum of the two probabilities: 0.0191 + 0.0191 ≈ 0.038.
Yes, we do reject the null hypothesis at a 5 percent significance level; Prob[Results IF H0 true]
equals 0.038, which is less than 0.05. The theory is not consistent with the data; hence −0.010
does not lie within the 95 percent confidence interval.
We do not reject the null hypothesis at a 5 percent significance level. The theory is consistent
with the data; hence 0.040 does lie within the 95 percent confidence interval.
We do reject the null hypothesis at a 5 percent significance level. The theory is not consistent
with the data; hence 0.060 does not lie within the 95 percent confidence interval.
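Each of these probabilities comes from a Student t-distribution centered at the hypothesized value with SE = 0.0159 and 736 degrees of freedom. A few lines of Python (using scipy rather than the Econometrics Lab) reproduce the same figures:

    from scipy import stats

    estimate, se, df = 0.023, 0.0159, 736

    for hypothesized in [-0.010, 0.000, 0.040, 0.060]:
        t_stat = abs(estimate - hypothesized) / se
        prob = 2 * stats.t.sf(t_stat, df)     # right tail plus the (equal) left tail
        print(f"H0: beta_Year = {hypothesized:+.3f}   Prob[Results IF H0 true] = {prob:.3f}")
    # Prints approximately 0.038, 0.148, 0.285, and 0.020, matching table 15.2.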
We summarize the four theories in figure 15.3 and table 15.2.
Figure 15.3
Probability distribution of coefficient estimate—Comparison of growth rate theories. Each panel is a Student t-distribution with SE = 0.0159 and DF = 736 centered at the hypothesized value: −1.0 percent growth theory (H0: β^TV_Year = −0.010, tail probabilities of 0.0191 each, Prob[Results IF H0 true] ≈ 0.038); 0.0 percent growth theory (H0: β^TV_Year = 0.000, tails of 0.0742, Prob ≈ 0.148); 4.0 percent growth theory (H0: β^TV_Year = 0.040, tails of 0.1427, Prob ≈ 0.285); 6.0 percent growth theory (H0: β^TV_Year = 0.060, tails of 0.0101, Prob ≈ 0.020).
Table 15.2
Growth rate theories and the 95 percent confidence interval

Theory   Null hypothesis        Alternative hypothesis   Prob[Results IF H0 true]   Within 95 percent confidence interval?
−1%      β^TV_Year = −0.010     β^TV_Year ≠ −0.010       ≈ 0.038                    No
0%       β^TV_Year = 0.000      β^TV_Year ≠ 0.000        ≈ 0.148                    Yes
4%       β^TV_Year = 0.040      β^TV_Year ≠ 0.040        ≈ 0.285                    Yes
6%       β^TV_Year = 0.060      β^TV_Year ≠ 0.060        ≈ 0.020                    No
Figure 15.4
Lower and upper confidence interval bounds. Prob[Results IF H0 true] plotted against the growth rate theory: the −1.0 percent (≈ 0.038) and 6.0 percent (≈ 0.020) theories fall below the 0.050 threshold, so their null hypotheses are rejected; the 0.0 percent (≈ 0.148) and 4.0 percent (≈ 0.285) theories lie above 0.050 and fall within the 95 percent confidence interval, between β^LB_Year and β^UB_Year.
Question: What is the lowest growth rate theory that is consistent with the data; that is, what is the lower bound of the confidence interval, β^LB_Year?
• The 4.0 percent growth rate theory lies within the confidence interval, but the 6.0 percent
theory does not (figure 15.4).
Question: What is the highest growth rate theory that is consistent with the data; that is, what is the upper bound of the confidence interval, β^UB_Year?
Figure 15.5 answers these questions visually by illustrating the lower and upper bounds. The
Prob[Results IF H0 true] equals 0.05 for both lower and upper bound growth theories because
our calculations are based on a 95 percent confidence interval:
Figure 15.5
Probability distribution of coefficient estimate—Lower and upper confidence intervals. Lower bound growth theory (H0: β^TV_Year = β^LB_Year, H1: β^TV_Year ≠ β^LB_Year): the Student t-distribution is centered at β^LB_Year with SE = 0.0159 and DF = 736, and the coefficient estimate 0.023 lies at the right-tail 0.025 border, so Prob[Results IF H0 true] = 0.05. Upper bound growth theory (H0: β^TV_Year = β^UB_Year, H1: β^TV_Year ≠ β^UB_Year): the distribution is centered at β^UB_Year, and 0.023 lies at the left-tail 0.025 border, so Prob[Results IF H0 true] = 0.05.
• The lower bound growth theory postulates a growth rate that is less than that estimated. Hence
the coefficient estimate, 0.023, marks the right-tail border of the lower bound.
• The upper bound growth theory postulates a growth rate that is greater than that estimated.
Hence the coefficient estimate, 0.023, marks the left-tail border of the upper bound.
We can use the Econometrics Lab to calculate the lower and upper bounds:
• Calculating the lower bound, β^LB_Year: For the lower bound, the right-tail probability equals 0.025. Mean: −0.0082. Hence β^LB_Year = −0.0082.
• Calculating the upper bound, β^UB_Year: For the upper bound, the left-tail probability equals 0.025. Accordingly the right-tail probability will equal 0.975. Mean: 0.0542. Hence β^UB_Year = 0.0542.
In this case −0.0082 and 0.0542 mark the bounds of the two-tailed 95 percent confidence
interval:
• For any growth rate theory between −0.82 percent and 5.42 percent:
Prob[Results IF H0 true] > 0.05 → Do not reject H0 at the 5 percent significance level.
• For any growth rate theory below −0.82 percent or above 5.42 percent:
Prob[Results IF H0 true] < 0.05 → Reject H0 at the 5 percent significance level.
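The same bounds can be reproduced directly from the regression output. The following minimal sketch, which assumes the SciPy library is available, uses only the coefficient estimate (0.023), its standard error (0.0159), and the degrees of freedom (736) reported above.

    # A minimal sketch: the 95 percent confidence interval from the regression output.
    from scipy import stats

    estimate = 0.023      # coefficient estimate, bYear
    se = 0.0159           # standard error of the estimate
    df = 736              # degrees of freedom

    t_crit = stats.t.ppf(0.975, df)          # critical value leaving 0.025 in the right tail
    lower = estimate - t_crit * se
    upper = estimate + t_crit * se
    print(round(lower, 4), round(upper, 4))  # approximately -0.0082 and 0.0542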
Fortunately, statistical software provides us with an easy and convenient way to compute confidence intervals. The software does all the work for us.
Table 15.3
95 Percent confidence interval calculations
Table 15.3 reports that the lower and upper bounds for the 95 percent confidence interval are
−0.0082 and 0.0542. These are the same values that we calculated using the Econometrics Lab.
All statistical packages report the coefficient of determination, the R-squared, in their regression
printouts. The R-squared seeks to capture the “goodness of fit.” It equals the portion of the depen-
dent variable’s squared deviations from its mean that is explained by the parameter estimates:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}
\]
To explain how the coefficient of determination is calculated, we will revisit Professor Lord’s
first quiz (table 15.4). Recall the theory, the model, and our analysis:
Theory: An increase in the number of minutes studied results in an increased quiz score.
Model: yt = βConst + βxxt + et
Table 15.4
First quiz data

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90
where
yt = quiz score received by student t
xt = minutes studied by student t
Theory: βx > 0
We used the ordinary least squares (OLS) estimation procedure to estimate the model's parameters (table 15.5).
Table 15.5
First quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
Next we formulated the null and alternative hypotheses to determine how much confidence we should have in the theory:
H0: βx = 0  Studying has no impact on quiz scores
H1: βx > 0  Additional studying increases quiz scores
Table 15.6
R-squared calculations for first quiz
Σ yt = 243        Σ(yt − ȳ)² = 342        Σ(Estyt − ȳ)² = 288
ȳ = 243/3 = 81    R-squared = 288/342 = 0.84
We then calculated Prob[Results IF H0 true], the probability of obtaining results like those we actually obtained (or even stronger) if studying in fact had no impact on quiz scores. The tails probability reported in the regression printout allows us to calculate this easily. Since a one-tailed test is appropriate, we divide the tails probability by 2:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.2601}{2} \approx 0.13
\]
We cannot reject the null hypothesis that studying has no impact even at the 10 percent signifi-
cance level.
The regression printout reports that the R-squared equals about .84; this means that 84 percent
of the dependent variable’s squared deviations from its mean are explained by the parameter
estimates. Table 15.6 shows the calculations required to compute the R-squared:
The R-squared equals Σ(Estyt − ȳ)² divided by Σ(yt − ȳ)²:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2} = \frac{288}{342} = 0.84
\]
Note that 84 percent of the y’s squared deviations are explained by the estimated constant and
coefficient. Our calculation of the R-squared agrees with the regression printout.
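As a quick check, the R-squared for the first quiz can be reproduced with a few lines of code. The following minimal sketch (assuming the NumPy library is available) applies the ordinary least squares (OLS) formulas and computes the ratio of explained to actual squared deviations.

    # A minimal sketch: OLS estimates and the R-squared for the first quiz data.
    import numpy as np

    x = np.array([5.0, 15.0, 25.0])    # minutes studied
    y = np.array([66.0, 87.0, 90.0])   # quiz scores

    b_x = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_const = y.mean() - b_x * x.mean()
    est_y = b_const + b_x * x

    explained = np.sum((est_y - y.mean()) ** 2)   # 288
    actual = np.sum((y - y.mean()) ** 2)          # 342
    r_squared = explained / actual
    print(round(b_x, 2), round(b_const, 2), round(r_squared, 2))   # 1.2  63.0  0.84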
While the R-squared is always calculated and reported by all statistical software, it is not
useful in assessing theories. We will justify this claim by considering a second quiz that Profes-
sor Lord administered. Each student studies the same number of minutes and earns the same
score in the second quiz as he/she did in the first quiz (table 15.7).
Before we run another regression that includes the data from both quizzes, let us apply our
intuition:
Table 15.7
Second quiz data

Student    Minutes studied (x)    Quiz score (y)
1          5                      66
2          15                     87
3          25                     90
Table 15.8
First and second quiz regression results
Dependent variable: y
Explanatory variable(s): Estimate SE t-Statistic Prob
• Begin by focusing on only the first quiz. Taken in isolation, the first quiz suggests that studying improves quiz scores. We cannot be very confident of this, however, since we cannot reject the null hypothesis even at a 10 percent significance level.
• Next consider only the second quiz. Since the data from the second quiz is identical to the
data from the first quiz, the regression results would be identical. Hence, taken in isolation, the
second quiz suggests that studying improves quiz scores.
Each quiz in isolation suggests that studying improves quiz scores. Now consider both quizzes
together. The two quizzes taken together reinforce each other; this should make us more confi-
dent in concluding that studying improves quiz scores, should it not?
If our intuition is correct, how should the Prob[Results IF H0 true] be affected when we con-
sider both quizzes together? Since we are more confident in concluding that studying improves
quiz scores, the probability should be less. Let us run a regression using data from both the first
and second quizzes to determine whether or not this is true (table 15.8).
Table 15.9
R-squared calculations for first and second quizzes
Σ yt = 486        Σ(yt − ȳ)² = 684        Σ(Estyt − ȳ)² = 576
ȳ = 486/6 = 81    R-squared = 576/684 = 0.84
As a consequence of the second quiz, the probability has fallen from 0.13 to 0.005; clearly, our
confidence in the theory rises. We can now reject the null hypothesis that studying has no impact
at the traditional significance levels of 1, 5, and 10 percent. Our calculations confirm our
intuition.
Next consider the R-squared for the last regression that includes both quizzes. The regression
printout reports that the R-squared has not changed; the R-squared is still 0.84. Table 15.9
explains why:
\[
R^2 = \frac{\text{Explained squared deviations from the mean}}{\text{Actual squared deviations from the mean}} = \frac{\sum_{t=1}^{T}(Esty_t - \bar{y})^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2} = \frac{576}{684} = 0.84
\]
The R-squared still equals 0.84. Both the actual and explained squared deviations have doubled;
consequently their ratio, the R-squared, remains unchanged. Clearly, the R-squared does not help
us assess our theory. We are now more confident in the theory, but the value of the R-squared
has not changed. The bottom line is that if we are interested in assessing our theories we should
focus on hypothesis testing, not on the R-squared.
15.3 Pitfalls
Frequently econometrics students using statistical software encounter pitfalls that are frustrating.
We will now discuss several of these pitfalls and describe the warning signs that accompany
them. We begin by reviewing the “goal” of multiple regression analysis:
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect of each explanatory variable. An explanatory variable’s coefficient estimate allows
us to estimate the change in the dependent variable resulting from a change in that particular
explanatory variable while all other explanatory variables remain constant.
We will illustrate the first four pitfalls by revisiting our baseball attendance data, which reports on every game played in the American League during the summer of the 1996 season.
We begin with a model that we have studied before in which attendance, Attendance, depends
on two explanatory variables, ticket price, PriceTicket, and home team salary, HomeSalary:
15.3.1 Explanatory Variable Has the Same Value for All Observations
One common pitfall is to include an explanatory variable in a regression that has the same value
for each observation. To illustrate this, consider the variable DH:
Table 15.10
Baseball attendance regression results
Our baseball data includes only American League games in 1996. Since interleague play did
not begin until 1997 and all American League games allowed designated hitters, the variable
DHt equals 1 for each observation. Let us try to use the ticket price, PriceTicket, home team
salary, HomeSalary, and the designated hitter dummy variable, DH, to explain attendance,
Attendance:
The statistical software issues a diagnostic. While the verbiage differs from software package
to software package, the message is the same: the software cannot perform the calculations that
we requested. That is, the statistical software is telling us that it is being asked to do the
impossible.
What is the intuition behind this? To determine how a dependent variable is affected by an
explanatory variable, we must observe how the dependent variable changes when the explanatory
variable changes. The intuition is straightforward:
• On the one hand, if the dependent variable tends to rise when the explanatory variable rises,
the explanatory variable affects the dependent variable positively suggesting a positive
coefficient.
• On the other hand, if the dependent variable tends to fall when the explanatory variable rises,
the explanatory variable affects the dependent variable negatively suggesting a negative
coefficient.
The evidence of how the dependent variable changes when the explanatory variable changes is essential. In the case of our baseball example, however, there is no variation in the designated hitter explanatory variable; DHt equals 1 for each observation. We have no way to assess the effect that the designated hitter has on attendance. We are asking our statistical software to do the impossible. While we have attendance information when the designated hitter was used, we have no attendance information when the designated hitter was not used. How then can we expect the software to assess the impact of the designated hitter on attendance?
We have already seen one example of this when we discussed multicollinearity in the previous chapter. We included both the ticket price in terms of dollars and the ticket price in terms of cents as explanatory variables. The ticket price in terms of cents is a linear combination of the ticket price in terms of dollars:
PriceCentst = 100 × PriceTickett
Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, and the ticket price in terms of cents, PriceCents, to explain attendance, Attendance:
When both measures of the price are included in the regression, our statistical software issues a diagnostic indicating that it is being asked to do the impossible. Statistical software cannot separate out the individual influence of the two explanatory variables, PriceTicket and PriceCents, because they contain precisely the same information; the two explanatory variables are redundant. We are asking the software to do the impossible.
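To see why the calculation is impossible, consider what happens when one explanatory variable is an exact linear function of another. The minimal sketch below uses made-up ticket prices (purely illustrative) and shows that the matrix the software must invert becomes singular.

    # A minimal sketch with hypothetical data: a redundant explanatory variable
    # makes the X'X matrix singular, so the OLS coefficients cannot be computed.
    import numpy as np

    price_ticket = np.array([10.0, 12.0, 15.0, 18.0, 20.0])   # dollars (made up)
    price_cents = 100.0 * price_ticket                         # exactly 100 times dollars

    # Design matrix: constant, PriceTicket, PriceCents
    X = np.column_stack([np.ones(5), price_ticket, price_cents])

    print(np.linalg.matrix_rank(X))          # 2, not 3: the columns are linearly dependent
    print(np.linalg.det(X.T @ X))            # 0 (up to rounding): X'X cannot be inverted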
In fact any linear combination of explanatory variables produces this problem. To illustrate
this, we consider two regressions. The first specifies three explanatory variables: ticket price,
home team salary, and visiting team salary (table 15.11).
Table 15.11
Baseball attendance
TotalSalary is a linear combination of HomeSalary and VisitSalary: TotalSalaryt = HomeSalaryt + VisitSalaryt. Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, visiting team salary, VisitSalary, and total salary, TotalSalary, to explain attendance, Attendance:
Our statistical software will issue a diagnostic indicating that it is being asked to do the
impossible.
The information contained in TotalSalary is already included in HomeSalary and VisitSalary.
Statistical software cannot separate out the individual influence of the three explanatory variables
because they contain redundant information. We are asking the software to do the impossible.
Suppose that the dependent variable is a linear combination of the explanatory variables. The
following regression illustrates this scenario. TotalSalary is by definition the sum of HomeSalary
and VisitSalary. Total salary, TotalSalary, is the dependent variable; home team salary, HomeSal-
ary, and visiting team salary, VisitSalary, are the explanatory variables (table 15.12).
The estimates of the constant and coefficients reveal the definition of TotalSalary:
Table 15.12
Total salary
Furthermore the standard errors are very small, approximately 0. In fact they should be precisely 0; they are not reported as 0's only as a consequence of how digital computers round numbers. We can think of these very small standard errors as telling us that we are dealing with an "identity" here, something that is true by definition.
We should be aware of the possibility of "outliers" because the ordinary least squares (OLS) estimation procedure is very sensitive to them. An outlier can occur for many reasons: one observation could have a unique characteristic, or it could simply contain a mundane typo.
To illustrate the effect that an outlier may have, once again consider the games played in the
summer of the 1996 American League season.
The first observation reports the game played in Milwaukee on June 1, 1996: the Cleveland
Indians visited the Milwaukee Brewers. The salary for the home team, the Brewers, totaled
20.232 million dollars in 1996:
Observation    Month    Day    Home team    Visiting team    Home team salary
1              6        1      Milwaukee    Cleveland        20.23200
2              6        1      Oakland      New York         19.40450
3              6        1      Seattle      Boston           38.35453
4              6        1      Toronto      Kansas City      28.48671
5              6        1      Texas        Minnesota        35.86999
Table 15.13
Baseball attendance regression with correct data
Table 15.14
Baseball attendance regression with an outlier
Suppose that when the data were entered, a typo was made: the home team salary for the first observation was entered incorrectly rather than as 20.23200. All the other values were entered correctly. You can access the data, including this "outlier," in table 15.14.
Even though only a single value has been altered, the estimates of both coefficients change dramatically. The estimate of the ticket price coefficient changes from about −591 to 1,896 and
the estimate of the home salary coefficient changes from 783.0 to −0.088. This illustrates how
sensitive the ordinary least squares (OLS) estimation procedure can be to an outlier. Conse-
quently we must take care to enter data properly and to check to be certain that we have generated
any new variables correctly.
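The sensitivity of OLS to a single bad value is easy to reproduce. The sketch below uses small made-up numbers (not the baseball data) and shows how corrupting one observation moves both estimates; the values are purely illustrative.

    # A minimal sketch with hypothetical data: one mis-entered value can move
    # the OLS estimates dramatically.
    import numpy as np

    def ols(x, y):
        """Return (intercept, slope) from the OLS formulas."""
        slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        return y.mean() - slope * x.mean(), slope

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # roughly y = 2x

    print(ols(x, y))                               # close to (0, 2)

    y_typo = y.copy()
    y_typo[0] = 210.0                              # a "mundane typo": 2.1 entered as 210
    print(ols(x, y_typo))                          # both estimates change dramatically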
To illustrate the dummy variable trap, we will revisit our faculty salary data:
Project: Assess the possibility of discrimination in academe.
Faculty salary data: Artificially constructed cross-sectional salary data and characteristics for
200 faculty members.
We will investigate models that include only dummy variables and years of teaching experience.
More specifically, we will consider four cases:
SexF1 = 1 − SexM1
Now we will estimate the parameters of the four models (table 15.15). First, model 1.
Model 1: Salaryt = βConst + βSexF1SexF1t + βEExperiencet + et
Table 15.15
Faculty salary regression
We calculate the estimated salary equation for men and women. For men, SexF1 = 0:
EstSalary = 42,238 + 2,447 Experience
For women, SexF1 = 1; the intercept for women equals $39,998 and the slope equals 2,447:
EstSalary = 39,998 + 2,447 Experience
It is easy to plot the estimated salary equations for men and women (figure 15.6). Both plotted lines have the same slope, 2,447. The intercepts differ, however. The intercept for men is 42,238 while the intercept for women is 39,998:
[Figure 15.6 plots the two estimated salary equations against Experience: a line for men with intercept 42,238 and a line for women with intercept 39,998, both with slope 2,447; the vertical gap between the two lines is 2,240.]
Figure 15.6
Estimated salary equations for men and women
Model 2: Salaryt = βConst + βSexM1SexM1t + βEExperiencet + et
Let us attempt to calculate the second model's estimated constant and the estimated male sex dummy coefficient, bConst and bSexM1, using the intercepts from model 1. In model 2 the intercept for men equals bConst + bSexM1 and the intercept for women equals bConst:
42,238 = bConst + bSexM1
39,998 = bConst
We have two equations and two unknowns, bConst and bSexM1. It is easy to solve for the unknowns. The second equation tells us that bConst equals 39,998:
bConst = 39,998
Substituting into the first equation, bSexM1 = 42,238 − 39,998 = 2,240. Using the estimates from model 1, we compute that model 2's estimate of the constant should be 39,998 and its estimate of the male sex dummy coefficient should be 2,240.
Let us now run the regression. The regression confirms our calculations (table 15.16).
Model 3:
Salaryt = βSexF1SexF1t + βSexM1SexM1t + βEExperiencet + et
Again, let us attempt to calculate the third model’s estimated female sex dummy coefficient and
its male sex dummy coefficient, bSexF1 and bSexM1, using the intercepts from model 1.
Table 15.16
Faculty salary regression
In model 3 the intercept for men equals bSexM1 and the intercept for women equals bSexF1:
42,238 = bSexM1
39,998 = bSexF1
We have two equations and two unknowns, bSexF1 and bSexM1; the solution is immediate. Using the estimates from model 1, we compute that model 3's estimate of the male sex dummy coefficient should be 42,238 and its estimate of the female sex dummy coefficient should be 39,998.
Let us now run the regression:
To estimate the third model (part c) using EViews, you must “fool” EViews into running the
appropriate regression:
• In the Workfile window: highlight Salary and then while depressing <Ctrl>, highlight SexF1, SexM1, and Experience.
• In the Workfile window: double click on a highlighted variable.
• Click Open Equation.
• In the Equation Specification window delete c so that the window looks like this:
salary sexf1 sexm1 experience.
• Click OK.
Table 15.17
Faculty salary regression
Model 4: Salaryt = βConst + βSexF1SexF1t + βSexM1SexM1t + βEExperiencet + et
Question: Can we calculate the fourth model's bConst, bSexF1, and bSexM1 using model 1's intercepts? In model 4 the intercept for men equals bConst + bSexM1 and the intercept for women equals bConst + bSexF1:
42,238 = bConst + bSexM1
39,998 = bConst + bSexF1
We have two equations and three unknowns, bConst, bSexF1, and bSexM1. We have more unknowns than equations. We cannot solve for the three unknowns. It is impossible. This is called the dummy variable trap:
Dummy variable trap: A model in which there are more parameters representing the intercepts than there are intercepts.
There are three parameters, bConst, bSexF1, and bSexM1, estimating the two intercepts.
Let us try to run the regression:
Our statistical software will issue a diagnostic telling us that it is being asked to do the impossible.
In some sense, the software is being asked to solve for three unknowns with only two equations.
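The trap is just another case of redundant explanatory variables: the constant equals SexF1 plus SexM1 for every observation. The minimal sketch below, using a few made-up observations, shows that the design matrix containing the constant and both dummies is rank deficient.

    # A minimal sketch with hypothetical data: the dummy variable trap.
    # Const = SexF1 + SexM1 for every observation, so the columns are redundant.
    import numpy as np

    sex_m1 = np.array([1, 0, 1, 0, 1])               # 1 if male (made-up observations)
    sex_f1 = 1 - sex_m1                               # 1 if female
    experience = np.array([3.0, 5.0, 10.0, 2.0, 7.0])

    X = np.column_stack([np.ones(5), sex_f1, sex_m1, experience])

    print(np.linalg.matrix_rank(X))                   # 3, not 4: one column is redundant
    print(np.linalg.det(X.T @ X))                     # 0 (up to rounding): X'X is singular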
Chapter 15 Review Questions
1. Explain in words how the confidence interval approach differs from the approach we have taken thus far.
2. If you wish to assess a theory should you be concerned with the coefficient of determination,
the R-squared?
3. What is the goal of multiple regression analysis?
4. In each of the following cases, what issue arises for multiple regression analysis? Explain in words why it arises:
a. Explanatory variable has the same value for all observations.
b. One explanatory variable is a linear combination of other explanatory variables.
c. Dependent variable is a linear combination of explanatory variables.
d. Outlier observations.
e. Dummy variable trap.
Chapter 15 Exercises
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
LogUsersInternett = βConst^Int + βYear^Int Yeart + βCapHum^Int CapitalHumant + βCapPhy^Int CapitalPhysicalt + βGDP^Int GdpPCt + βAuth^Int Autht + et^Int
a. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters.
b. Compute the two-tailed 95 percent confidence interval for the coefficient estimate of Year.
Petroleum consumption data for Massachusetts and Nebraska: Panel data of petroleum consumption and prices for two states, Massachusetts and Nebraska, from 1990 to 1999.
3. Estimate this model for Massachusetts by restricting your sample to Massachusetts observa-
tions only. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters
of the model.
a. What equation describes estimated per capita petroleum consumption in Massachusetts,
EstPetroConsPCMass?
• To include only the Massachusetts data, enter Mass1 = 1 in the If condition window.
• Click OK.
NB: Do not forget that the Sample option behaves like a toggle switch. It remains on until
you turn it off.
Therefore, before proceeding, in the Workfile window:
• Click Sample.
4. Estimate the model for Nebraska by restricting your sample to Nebraska observations only.
Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the
model.
a. What equation describes estimated per capita petroleum consumption in Nebraska,
EstPetroConsPCNeb?
• To include only the Nebraska data, enter Mass1 = 0 in the If condition window.
• Click OK.
NB: Do not forget that the Sample option behaves like a toggle switch. It remains on until
you turn it off.
Therefore, before proceeding, in the Workfile window:
• Click Sample.
5.
a. Consider the following new model:
PetroConsPCt = βMassMass1t + βNebNeb1t + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
where
Neb1t = 1 − Mass1t
PriceReal_Masst = PriceRealt × Mass1t
PriceReal_Nebt = PriceRealt × Neb1t
Let bMass, bNeb, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bMassMass1 + bNebNeb1 + bPMassPriceReal_Mass + bPNebPriceReal_Neb
b. Consider the following model:
PetroConsPCt = βConst + βMassMass1t + βNebNeb1t + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
Let bConst, bMass, bNeb, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bConst + bMassMass1 + bNebNeb1 + bPMassPriceReal_Mass + bPNebPriceReal_Neb
c. Consider the following model:
PetroConsPCt = βMassMass1t + βNebNeb1t + βPPriceRealt + βPMassPriceReal_Masst + βPNebPriceReal_Nebt + et
Let bMass, bNeb, bP, bPMass, and bPNeb equal the ordinary least squares (OLS) estimates of the parameters:
EstPetroConsPC = bMassMass1 + bNebNeb1 + bPPriceReal + bPMassPriceReal_Mass + bPNebPriceReal_Neb
Heteroskedasticity
16
Chapter 16 Outline
16.1 Review
16.1.1 Regression Model
16.1.2 Standard Ordinary Least Squares (OLS) Premises
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
16.3.1 The Mathematics
16.3.2 Our Suspicions
16.3.3 Confirming Our Suspicions
Chapter 16 Prep Questions
2. In chapter 6 we showed that the ordinary least squares (OLS) estimation procedure for the
coefficient value was unbiased; that is, we showed that
Mean[bx] = βx
Review the algebra. What role, if any, did the first standard ordinary least squares (OLS) premise,
the error term equal variance premise, play?
3. In chapter 6 we showed that the variance of the coefficient estimate’s probability distribution
equals the variance of the error term’s probability distribution divided by the sum of the squared
x deviations; that is, we showed that
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
Review the algebra. What role, if any, did the first standard ordinary least squares (OLS) premise,
the error term equal variance premise, play?
1992 Internet data: Cross-sectional data of Internet use and gross domestic product for 29
countries in 1992.
a. What is your theory concerning how per capita GDP should affect Internet use? What does
your theory suggest about the sign of the GdpPC coefficient, βGDP?
b. Run the appropriate regression. Do the data support your theory?
e. Based on the scatter diagram, what do you conclude about the variance of the residuals
as per capita GDP increases?
5. Again, consider the following model:
Assume that the variance of the error term’s probability distribution is proportional to each
nation’s per capita GDP:
Var[et] = V × GdpPCt
where V is a constant.
Now divide both sides of the equation that specifies the model by the square root of per capita GDP, √GdpPCt. Let
\[
\varepsilon_t = \frac{e_t}{\sqrt{GdpPC_t}}
\]
16.1 Review
yt = βConst + βxxt + et
where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
The error term is a random variable that represents random influences: Mean[et] = 0
We will now focus our attention on the standard ordinary least squares (OLS) regression
premises:
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
16.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation
Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:
\[
b_x = \frac{\sum_{t=1}^{T}(y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
In this chapter we will focus on the first standard ordinary least squares (OLS) premise, the
error term equal variance premise. We begin by examining precisely what the premise means.
Subsequently we investigate what problems do and do not emerge when the premise is violated
and finally what can be done to address the problems that do arise.
16.2 What Is Heteroskedasticity?
Heteroskedasticity refers to the variances of the error terms' probability distributions. The prefix "hetero" means different; "skedasticity" refers to the spread of the distribution.
Heteroskedasticity means that the spread of the error term’s probability distribution differs from
observation to observation. Recall the error term equal variance premise:
• Error term equal variance premise: The variance of the error term's probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
The presence of heteroskedasticity violates the error term equal variance premise.
We begin by illustrating the effect of heteroskedasticity on the error terms. Consider the three
students in Professor Lord’s class who must take a quiz every Tuesday morning:
[Figure 16.1 shows the simulation window, with a list for the error term variance (50, 200, 350, 500), a list for the heteroskedasticity factor Heter (−2.0, −1.0, 0.0, 1.0), and the Repetition, Pause, and Start controls.]
Figure 16.1
Heteroskedasticity simulation
The error terms, the et’s, represent random influences; that is, the error terms have no systematic
effect on the dependent variable yt. Consequently the mean of each error term's probability distribu-
tion, Mean[et], equals 0. In other words, if the experiment were repeated many, many times, the error
term would average out to be 0. When the distribution is symmetric, half the time the error term
would be positive leading to a higher than normal value of yt and half the time it would be negative
leading to a lower than normal value of yt. We will use a simulation to illustrate this (figure 16.1).
The list labeled Heter is the “heteroskedasticity factor.” Initially, Heter is specified as 0, meaning
that no heteroskedasticity is present. Click Start and then Continue a few times to note that the
distribution of each student’s error terms is illustrated in the three histograms at the top of the
window. Also the mean and variance of each student’s error terms are computed. Next uncheck
the Pause checkbox and click Continue; after many, many repetitions of the experiment click
Stop. The mean of each student’s error term is approximately 0, indicating that the error terms
truly represent random influences; the error terms have no systematic effect on a student's quiz
score. Furthermore the spreads of each student’s error term distribution appear to be nearly
identical; the variance of each student’s error term is approximately the same. Hence the error
term equal variance premise is satisfied (figure 16.2).
Next change the value of the Heter. When a positive value is specified, the distribution spread
increases as we move from student 1 to student 2 to student 3; when a negative value is speci-
fied, the spread decreases. Specify 1 instead of 0. Note that when you do this, the title of the
variance list changes to Mid Err Var. This occurs because heteroskedasticity is now present and
the variances differ from student to student. The list specifies the variance of the middle student’s,
student 2’s, error term probability distribution. By default student 2’s variance is 500. Now, click
Start and then after many, many repetitions of the experiment click Stop. The distribution
spreads of each student’s error terms are not identical (figure 16.3).
Figure 16.2
Error term probability distributions—Error term equal variance premise satisfied
Figure 16.3
Error term probability distributions—Error term equal variance premise violated
The error term equal variance premise is now violated. What might cause this discrepancy?
Suppose that student 1 tries to get a broad understanding of the material and hence reads all of the assignment, albeit quickly. However, student 3 guesses what material will be covered on the
quiz and spends his/her time thoroughly studying only that material. When student 3 guesses
right, he/she will do very well on the quiz, but when he/she guesses wrong, he/she will do very
poorly. Hence we would expect student 3’s quiz scores to be more volatile than student 1’s. This
volatility is reflected by the variance of the error terms. The variance of student 3’s error term
distribution would be greater than student 1’s.
16.3 Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
Now let us explore the consequences of heteroskedasticity. We will focus on two of the three estimation procedures embedded within the ordinary least squares (OLS) estimation procedure:
• the estimation procedure for the coefficient value;
• the estimation procedure for the variance of the coefficient estimate's probability distribution.
Question: Are these estimation procedures still unbiased when heteroskedasticity is present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously we showed that the estimation procedure
for the coefficient value was unbiased by
• applying the arithmetic of means;
and
• recognizing that the means of the error terms’ probability distributions equal 0 (since the error
terms represent random influences).
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1 + (x_2-\bar{x})e_2 + (x_3-\bar{x})e_3}{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2}
\]
Taking means of both sides:
\[
\text{Mean}[b_x] = \beta_x + \text{Mean}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]
1. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so,
we can apply the arithmetic of means easily. Our results are unaffected by this assumption.
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\,\text{Mean}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[\text{Mean}[(x_1-\bar{x})e_1]+\text{Mean}[(x_2-\bar{x})e_2]+\text{Mean}[(x_3-\bar{x})e_3]\bigr]
\]
\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[(x_1-\bar{x})\text{Mean}[e_1]+(x_2-\bar{x})\text{Mean}[e_2]+(x_3-\bar{x})\text{Mean}[e_3]\bigr]
\]
Since Mean[e1] = Mean[e2] = Mean[e3] = 0, it follows that Mean[bx] = βx.
What is the critical point here? We have not relied on the error term equal variance premise
to show that the estimation procedure for the coefficient value is unbiased. Consequently we
suspect that the estimation procedure for the coefficient value should still be unbiased in the
presence of heteroskedasticity.
Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient
Estimate’s Probability Distribution
Next consider the estimation procedure for the variance of the coefficient estimate’s probability
distribution used by the ordinary least squares (OLS) estimation procedure. The strategy involves
two steps:
• First, we used the adjusted variance to estimate the variance of the error term’s probability
distribution: EstVar[e] = SSR/Degrees of freedom.
• Second, we applied the equation relating the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution:
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
Step 1: Estimate the variance of the error term's probability distribution from the available information—data from the first quiz:
\[
\text{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]
Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions:
\[
\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2} \quad\Rightarrow\quad \text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T}(x_t - \bar{x})^2}
\]
This strategy is grounded on the premise that the variance of each error term's probability distribution is the same, the error term equal variance premise:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
Unfortunately, when heteroskedasticity is present, the error term equal variance premise is vio-
lated because there is not a single Var[e]. The variance differs from observation to observation.
When heteroskedasticity is present, the strategy used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution is based on a faulty premise. The ordinary least squares (OLS) estimation procedure is trying to estimate something that does not exist, a single Var[e]. Consequently we should be suspicious of the procedure.
So, where do we stand? We suspect that when heteroskedasticity is present, the ordinary least
squares (OLS) estimation procedure for the
• coefficient value will still be unbiased;
• variance of the coefficient estimate’s probability distribution may be biased.
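Before turning to the Econometrics Lab, we can check these suspicions with a small Monte Carlo exercise of our own. The sketch below (assuming NumPy; all data-generating values are made up) draws heteroskedastic error terms many times, estimates the coefficient each time, and compares the OLS variance estimate with the variance actually observed across repetitions.

    # A minimal Monte Carlo sketch with made-up values: heteroskedastic errors,
    # the OLS coefficient estimates, and the OLS estimate of the coefficient's variance.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([5.0, 15.0, 25.0])          # explanatory variable (constants)
    beta_const, beta_x = 50.0, 2.0           # actual parameter values (made up)
    sd = np.array([5.0, 10.0, 20.0])         # error spread rises with x: heteroskedasticity

    b_estimates, var_estimates = [], []
    for _ in range(10000):
        e = rng.normal(0.0, sd)                                    # different variances
        y = beta_const + beta_x * x + e
        b_x = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b_const = y.mean() - b_x * x.mean()
        res = y - (b_const + b_x * x)
        est_var_e = np.sum(res ** 2) / 1.0                         # SSR / degrees of freedom (3 - 2)
        b_estimates.append(b_x)
        var_estimates.append(est_var_e / np.sum((x - x.mean()) ** 2))

    print(np.mean(b_estimates))    # close to 2.0: the coefficient estimates remain unbiased
    print(np.var(b_estimates))     # actual variance of the estimates (about 1.06 for this setup)
    print(np.mean(var_estimates))  # mean of the OLS variance estimates (about 0.69): biased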
Econometrics Lab 16.2: Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation
Procedure
The simulation (figure 16.4) allows us to address the two critical questions:
• Question 1: Is the estimation procedure for the coefficient’s value unbiased; that is, does the
mean of the coefficient estimate’s probability distribution equal the actual coefficient value? The
relative frequency interpretation of probability allows us to address this question by using the
simulation. After many, many repetitions the distribution of the estimated values mirrors the
probability distribution. Therefore we need only compare the mean of the estimated coefficient
values with the actual coefficient values. If the two are equal after many, many repetitions, the
estimation procedure is unbiased.
[Figure 16.4 annotates the simulation window: EstVar[e] = SSR/(Degrees of freedom) and EstVar[bx] = EstVar[e]/Σ(xt − x̄)² are computed each repetition; the Coef Var Est box reports the estimate of the variance of the coefficient estimate's probability distribution calculated from the current repetition, and the Mean box reports the mean (average) of the variance estimates from all repetitions, addressing whether the estimation procedure for the variance is unbiased.]
Figure 16.4
Heteroskedasticity simulation
[Diagram: By the relative frequency interpretation of probability, after many, many repetitions the mean (average) of the estimated coefficient values reflects the mean of the coefficient estimate's probability distribution; comparing it with the actual coefficient value (= or ≠) tells us whether the estimation procedure is unbiased or biased.]
• Question 2: Is the estimation procedure for the variance of the coefficient estimate’s prob-
ability distribution unbiased? Again, the relative frequency interpretation of probability allows
us to address this question by using the simulation. We need only compare the variance of the
estimated coefficient values and estimates for the variance after many, many repetitions. If the
two are equal, the estimation procedure is unbiased.
[Diagram: By the relative frequency interpretation of probability, after many, many repetitions the variance of the estimated coefficient values reflects the variance of the coefficient estimate's probability distribution; comparing it with the mean (average) of the variance estimates (= or ≠) tells us whether the estimation procedure for the variance is unbiased or biased.]
Note that the “Heter” list now appears in this simulation. This list allows us to investigate the
effect of heteroskedasticity (figure 16.5). Initially, 0 is specified as a benchmark, meaning that
no heteroskedasticity is present. Click Start and then after many, many repetitions click Stop.
The simulation results appear in table 16.1. In the absence of heteroskedasticity both estimation
procedures are unbiased:
• The estimation procedure for the coefficient value is unbiased. The mean (average) of the
coefficient estimates equals the actual coefficient value; both equal 2.
[The Heter list offers the values −2.0, −1.0, 0.0, 1.0, and 2.0.]
Figure 16.5
Heteroskedasticity factor list
Table 16.1
Heteroskedasticity simulation results
• The estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased. The mean (average) of the variance estimates equals the actual variance of the coefficient estimates; both equal 2.5.
When the standard ordinary least squares (OLS) premises are satisfied, both estimation proce-
dures are unbiased.
Next we will investigate the effect of heteroskedasticity by selecting 1.0 from the “Heter” list.
Heteroskedasticity is now present. Click Start and then after many, many repetitions click Stop.
• The estimation procedure for the coefficient value is unbiased. The mean (average) of the
coefficient estimates equals the actual coefficient value; both equal 2.
• The estimation procedure for the variance of the coefficient estimate's probability distribution is biased. The mean (average) of the estimated variances equals 2.9, while the actual variance equals 3.6.
The simulation results confirm our suspicions. When heteroskedasticity is present there is
some good news, but also some bad news:
• Good news: The ordinary least squares (OLS) estimation procedure for the coefficient value
is still unbiased.
• Bad news: The ordinary least squares (OLS) estimation procedure for the variance of the
coefficient estimate’s probability distribution is biased.
When the estimation procedure for the variance of the coefficient estimate’s probability dis-
tribution is biased, all calculations based on the estimate of the variance will be flawed also;
that is, the standard errors, t-statistics, and tail probabilities appearing on the ordinary least
squares (OLS) regression printout are unreliable. Consequently we will use an example to
explore how we can account for the presence of heteroskedasticity.
We will illustrate this approach by considering the effect of per capita GDP on Internet use.
To assess the theory we construct a simplified model with a single explanatory variable, per
capita GDP. Previously we showed that several other factors proved important in explaining
Internet use. We include only per capita GDP here for pedagogical reasons: to provide a simple
illustration of how we can account for the presence of heteroskedasticity.
LogUsersInternett = βConst + βGDPGdpPCt + et
where
LogUsersInternett = log of Internet users per 1,000 persons in nation t
GdpPCt = per capita GDP of nation t
The theory suggests that the model's coefficient, βGDP, is positive. To keep the exposition clear,
we will use data from a single year, 1992, to test this theory:
Table 16.2
Internet regression results
1992 Internet data: Cross-sectional data of Internet use and gross domestic product for 29
countries in 1992.
Using statistical software, we run a regression with the log of Internet use as the dependent
variable and the per capita GDP as the explanatory variable (table 16.2).
Since the evidence appears to support the theory, we construct the null and alternative
hypotheses:
H0: βGDP = 0 Per capita GDP does not affect Internet use
H1: βGDP > 0 Higher per capita GDP increases Internet use
As always, the null hypothesis challenges the evidence; the alternative hypothesis is consistent
with the evidence. Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the GdpPC estimate from one repetition
of the experiment will be 0.101 or more, if H0 were true (i.e., if the per capita GDP has no effect
on the Internet use, if βGDP actually equals 0)?
To emphasize that the Prob[Results IF H0 true] depends on the standard error we will use the
Econometrics Lab to calculate the probability. The following information has been entered in
the lab:
Click Calculate.
We use the standard error provided by the ordinary least squares (OLS) regression results to
compute the Prob[Results IF H0 true].
We can also calculate Prob[Results IF H0 true] using the tails probability reported in the
regression printout. Since this is a one-tailed test, we divide the tails probability by 2:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.0046}{2} \approx 0.0023
\]
Based on the 1 percent significance level, we would reject the null hypothesis. We would reject the hypothesis that per capita GDP has no effect on Internet use.
There may be a problem with this, however. The equation used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution assumes that the error term equal variance premise is satisfied. Our simulation revealed that when heteroskedasticity is present and the error term equal variance premise is violated, the ordinary least squares (OLS) estimation procedure for the variance of the coefficient estimate's probability distribution is flawed. Recall that the standard error equals the square root of the estimated variance. Consequently, if heteroskedasticity is present, we may have entered the wrong value for the standard error into the Econometrics Lab when we calculated Prob[Results IF H0 true]. When heteroskedasticity is present the ordinary least squares (OLS) estimation procedure bases its computations on a faulty premise, resulting in flawed standard errors, t-Statistics, and tails probabilities. Consequently we should move on to the next step.
Question: Could the variance of the error term's probability distribution differ from nation to nation, increasing with per capita GDP? Intuition leads us to suspect that the answer is yes. When the per capita GDP is low, individuals
have little to spend on any goods other than the basic necessities. In particular, individuals have
little to spend on Internet use and consequently Internet use will be low. This will be true for
all countries in which per capita GDP is low. In contrast, when per capita GDP is high,
individuals can afford to purchase more goods. Naturally, consumer tastes vary from nation to
nation. In some high per capita GDP nations, individuals will opt to spend much on Internet use
while in other nations individuals will spend little. A scatter diagram of per capita GDP and
Internet use appears to confirm our intuition (figure 16.6).
[Figure 16.6 plots LogUsersInternet (vertical axis, 0 to 3) against GdpPC (horizontal axis, 0 to 35).]
Figure 16.6
Scatter diagram: GdpPC versus LogUsersInternet
As per capita GDP rises we observe a greater variance for the log of Internet use per 1,000
persons. In nations with low levels of per capita GDP (less than $15,000), the log varies between about 0 and 1.6, whereas in nations with high levels of per capita GDP (more than $15,000), the log varies between about 0 and 3.20. What does this suggest about the error term in our model:
LogUsersInternett = βConst + βGDPGdpPCt + et
Two nations with virtually the same level of per capita GDP have quite different rates of Internet
use. The error term in the model would capture these differences. Consequently, as per capita
GDP increases, we would expect the variance of the error term’s probability distribution to
increase.
Of course, we can never observe the error terms themselves. We can, however, think of the
residuals as the estimated error terms:
Since the residuals are observable we can plot a scatter diagram of the residuals, the estimated
errors, and per capita GDP to illustrate how they are related (figure 16.7).
[Figure 16.7 plots the residuals (vertical axis, −0.4 to 0.4) against GdpPC (horizontal axis, 0 to 35).]
Figure 16.7
Scatter diagram: GdpPC versus Residuals
Our suspicions appear to be borne out. The residuals in nations with high per capita GDP are
more spread out than in nations with low per capita GDP. It appears that heteroskedasticity could
be present. The error term equal variance premise may be violated. Consequently we must be
suspicious of the standard errors and probabilities appearing in the regression printout; the
ordinary least squares (OLS) estimation procedure is calculating these values based on what
could be an invalid premise, the error term equal variance premise.
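A residual plot like figure 16.7 is easy to produce yourself. The sketch below (assuming NumPy and Matplotlib; the arrays are hypothetical stand-ins for the 1992 data) fits the simple regression and plots the residuals against per capita GDP.

    # A minimal sketch with hypothetical data: plot the residuals against GdpPC.
    import numpy as np
    import matplotlib.pyplot as plt

    gdp_pc = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])     # made up
    log_users = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])       # made up

    dev = gdp_pc - gdp_pc.mean()
    b_gdp = np.sum(dev * (log_users - log_users.mean())) / np.sum(dev ** 2)
    b_const = log_users.mean() - b_gdp * gdp_pc.mean()
    residuals = log_users - (b_const + b_gdp * gdp_pc)

    plt.scatter(gdp_pc, residuals)    # does the spread widen as GdpPC rises?
    plt.axhline(0.0)
    plt.xlabel("GdpPC")
    plt.ylabel("Residuals")
    plt.show()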
Since the scatter diagram suggests that our fears may be warranted, we now test for heteroskedasticity more formally. While there are several different approaches, we will focus on the
Breusch–Pagan–Godfrey test, which utilizes an artificial regression based on the following
model:
Heteroskedasticity model:
(et − Mean[et])² = αConst + αGDPGdpPCt + vt
The model suggests that as GdpPCt increases, the squared deviation of the error term from its mean increases. Based on the scatter diagram appearing in figure 16.7, we suspect that αGDP is positive:
Theory: αGDP > 0
We can simplify this model by recognizing that the error term represents random influences; hence the mean of its probability distribution equals 0. Therefore,
(et − Mean[et])² = et²
and the model becomes
et² = αConst + αGDPGdpPCt + vt
Table 16.3
Breusch–Pagan–Godfrey results
Of course, we can never observe the error terms themselves. We can, however, think of the
residuals as the estimates of the error terms. We substitute the residual squared for the error term
squared:
Heteroskedasticity model:
Rest² = αConst + αGDPGdpPCt + vt
Next formulate the null and alternative hypotheses for the artificial regression model:
H0: αGDP = 0 Per capita GDP does not affect the squared deviation of the residual
H1: αGDP > 0 Higher per capita GDP increases the squared deviation of the residual
and compute Prob[Results IF H0 true] from the tails probability reported in the regression
printout:
\[
\text{Prob[Results IF } H_0 \text{ true]} = \frac{0.0118}{2} \approx 0.0059
\]
We reject the null hypothesis at the traditional significance levels of 1, 5, and 10 percent. Our
formal test reinforces our suspicion that heteroskedasticity is present. Furthermore note that the
estimate of the constant is not statistically significantly different from 0 even at the 10 percent
significance level. We will exploit this to simplify the mathematics that follow. We assume that
the variance of the error term’s probability distribution is directly proportional to per capita GDP:
Var[et] = V × GdpPCt
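The artificial regression at the heart of the Breusch–Pagan–Godfrey approach is easy to carry out by hand. The sketch below (assuming NumPy and SciPy) simulates made-up data in which the error variance is proportional to GdpPC, runs the original regression, and then regresses the squared residuals on GdpPC; the helper function and all numbers are hypothetical, and the printed values vary with the random draw.

    # A minimal sketch of the Breusch-Pagan-Godfrey artificial regression
    # with simulated (hypothetical) data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    gdp_pc = rng.uniform(1.0, 35.0, 29)                    # 29 made-up per capita GDPs
    e = rng.normal(0.0, np.sqrt(0.005 * gdp_pc))           # variance proportional to GdpPC
    log_users = 0.1 + 0.1 * gdp_pc + e                     # made-up "original" model

    def ols(x, y):
        """Return the slope, its standard error, and the intercept."""
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        res = y - (a + b * x)
        est_var_e = np.sum(res ** 2) / (len(y) - 2)
        se_b = np.sqrt(est_var_e / np.sum((x - x.mean()) ** 2))
        return b, se_b, a

    # Step 1: original regression and its residuals
    b_gdp, _, a_const = ols(gdp_pc, log_users)
    res = log_users - (a_const + b_gdp * gdp_pc)

    # Step 2: artificial regression, Res^2 = alpha_Const + alpha_GDP * GdpPC + v
    alpha_gdp, se_alpha, _ = ols(gdp_pc, res ** 2)
    t_stat = alpha_gdp / se_alpha
    prob = 1.0 - stats.t.cdf(t_stat, len(res) - 2)         # one-tailed Prob[Results IF H0 true]
    print(alpha_gdp, prob)                                  # estimate of alpha_GDP and its probability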
Strategy: Algebraically manipulate the original model so that the problem of heteroskedasticity is eliminated in the new model. That is, tweak the original model so that the variance of each nation's error term's probability distribution is the same. We can accomplish this with just a little algebra.
Based on our scatter diagram and the Breusch–Pagan–Godfrey test, we assume that the vari-
ance of the error term’s probability distribution is proportional to per capita GDP:
Var[et] = V × GdpPCt
Original model:
LogUsersInternett = βConst + βGDPGdpPCt + et
Tweaked model: divide both sides by √GdpPCt and let εt = et/√GdpPCt:
LogUsersInternett/√GdpPCt = βConst(1/√GdpPCt) + βGDP(GdpPCt/√GdpPCt) + εt
The variance of the tweaked error term is the same for every observation:
\[
\text{Var}[\varepsilon_t] = \text{Var}\!\left[\frac{e_t}{\sqrt{GdpPC_t}}\right]
\]
Applying the arithmetic of variances, Var[cx] = c²Var[x]:
\[
= \frac{1}{GdpPC_t}\,\text{Var}[e_t]
\]
Since Var[et] = V × GdpPCt, where V equals a constant:
\[
= \frac{1}{GdpPC_t} \times V \times GdpPC_t = V
\]
We divided the original model by √GdpPCt so that the variance of the error term's probability
distribution in the tweaked model equals V for each observation. Consequently the error term
equal variance premise is satisfied in the tweaked model. Therefore the ordinary least squares
(OLS) estimation procedure computations of the estimates for the variance of the error term’s
probability distribution will not be flawed in the tweaked model.
The dependent and explanatory variables in the new tweaked model are:
Tweaked dependent variable: AdjLogUsersInternett = LogUsersInternett/√GdpPCt
Tweaked explanatory variables: AdjConstt = 1/√GdpPCt and AdjGdpPCt = √GdpPCt
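The tweak amounts to a weighted (generalized) least squares regression. The sketch below, with hypothetical arrays standing in for the 1992 data, constructs the adjusted variables and runs OLS on them with no separate intercept; statistical packages let you specify such weights directly.

    # A minimal sketch: the "tweaked" (generalized least squares) regression.
    # gdp_pc and log_users are hypothetical stand-ins for the 1992 data.
    import numpy as np

    gdp_pc = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])        # made up
    log_users = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])          # made up

    w = np.sqrt(gdp_pc)
    adj_log_users = log_users / w          # tweaked dependent variable
    adj_const = 1.0 / w                    # tweaked "constant"
    adj_gdp_pc = w                         # tweaked GdpPC (= GdpPC / sqrt(GdpPC))

    # OLS on the tweaked model: the intercept's role is played by adj_const.
    X = np.column_stack([adj_const, adj_gdp_pc])
    b_const, b_gdp = np.linalg.lstsq(X, adj_log_users, rcond=None)[0]
    print(b_const, b_gdp)                  # estimates of beta_Const and beta_GDP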
Table 16.4
Tweaked Internet regression results
Table 16.5
Comparison of Internet regression results
Now let us compare the tweaked regression results for βGDP with the original ones (table 16.5). The most striking differences are the calculations that are based on the estimated variance of the coefficient estimate's probability distribution: the standard error, the t-Statistic, and the Prob values. This is hardly surprising. The ordinary least squares (OLS) regression calculations are based on the error term equal variance premise. Our analysis suggests that this premise is violated in the original regression, however. Consequently the standard error, t-Statistic, and Prob calculations will be flawed when we use the ordinary least squares (OLS) estimation procedure. The generalized least squares (GLS) regression corrects for this.
Recall the purpose of our analysis in the first place: Assess the effect of per capita GDP on
Internet use. Recall our theory and associated hypotheses:
Theory: Higher per capita GDP increases Internet use.
H0: βGDP = 0 Per capita GDP does not affect Internet use
H1: βGDP > 0 Higher per capita GDP increases Internet use
We see that the value of the tails probability decreases from 0.0046 to 0.0002. Since a one-tailed
test is appropriate, the Prob[Results IF H0 true] declines from 0.0023 to 0.0001. Accounting for
heteroskedasticity has an impact on the analysis.
We will now use a simulation to illustrate the following properties of the generalized least squares (GLS) estimation procedure when heteroskedasticity is present:
• The estimation procedure for the coefficient value is unbiased.
• The estimation procedure for the variance of the coefficient estimate's probability distribution is unbiased.
• The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
A new drop down box now appears in figure 16.8. We can specify the estimation procedure:
either ordinary least squares (OLS) or generalized least squares (GLS). Initially, OLS is specified
indicating that the ordinary least squares (OLS) estimation procedure is being used. Also note
that by default 0.0 is specified in the Heter list, which means that no heteroskedasticity is present.
Recall our previous simulations illustrating that the ordinary least squares (OLS) estimation
procedure to estimate the coefficient value and the ordinary least squares (OLS) procedure to
estimate the variance of the coefficient estimate’s probability distribution were both unbiased
when no heteroskedasticity is present. To review this, click Start and then after many, many
repetitions click Stop.
Next introduce heteroskedasticity by selecting 1.0 from the “Heter” list. Recall that while the
ordinary least squares (OLS) estimation procedure for the coefficient’s value was still unbiased,
the ordinary least squares (OLS) estimation procedure for the variance of the coefficient esti-
mate’s probability distribution was biased. To review this, click Start and then after many, many
repetitions click Stop.
Finally, select the generalized least squares (GLS) estimation procedure instead of the ordinary
least squares (OLS) estimation procedure. Click Start and then after many, many repetitions
click Stop. The generalized least squares (GLS) results are reported in the last row of table 16.6.
When heteroskedasticity is present and the generalized least squares (GLS) estimation procedure
is used, the variance of the estimated coefficient values from each repetition of the experiment
equals the average of the estimated variances. This suggests that the generalized least squares
(GLS) procedure indeed provides an unbiased estimation procedure for the variance. Also note
that when heteroskedasticity is present, the variance of the estimated values resulting from
generalized least squares (GLS) is less than ordinary least squares (OLS), 2.3 versus 2.9. What
does this suggest? The lower variance suggests that the generalized least squares (GLS) proce-
dure provides more reliable estimates when heteroskedasticity is present. In fact it can be shown
that the generalized least squares (GLS) procedure is indeed the best linear unbiased estimation
(BLUE) procedure when heteroskedasticity is present.
[Figure 16.8 annotates the simulation window: each repetition computes bx, EstVar[e] = SSR/(Degrees of freedom), and EstVar[bx] = EstVar[e]/Σ(xt − x̄)²; the window reports the variance of the estimated coefficient values from all repetitions, the Coef Var Est (the estimate of the variance of the coefficient estimate's probability distribution from the current repetition), and the Mean (the average of the variance estimates from all repetitions).]
Figure 16.8
Heteroskedasticity simulation
Table 16.6
Heteroskedasticity simulation results
Let us summarize:

                                                        Standard premises    Heteroskedasticity
Is the estimation procedure:                            OLS                  OLS        GLS
an unbiased estimation procedure for the
coefficient's value?                                    Yes                  Yes        Yes
an unbiased estimation procedure for the variance
of the coefficient estimate's probability
distribution?                                           Yes                  No         Yes
for the coefficient value the best linear unbiased
estimation procedure (BLUE)?                            Yes                  No         Yes
Robust standard errors address the first issue and are particularly appropriate when the sample size is large. White standard errors constitute one such approach. We will not provide a rigorous justification of this approach; the mathematics is too complex. We will, however, provide the motivation by taking a few liberties. Begin by reviewing our derivation of the variance of the coefficient estimate's probability distribution, Var[bx], presented in chapter 6:
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]
\[
\text{Var}[b_x] = \text{Var}\!\left[\frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\right]
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\,\text{Var}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]
Error term/error term independence premise: the error terms are independent, so Var[x + y] = Var[x] + Var[y]:
\[
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\bigl[\text{Var}[(x_1-\bar{x})e_1]+\text{Var}[(x_2-\bar{x})e_2]+\text{Var}[(x_3-\bar{x})e_3]\bigr]
\]
\[
= \frac{1}{\bigl[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\bigr]^2}\bigl[(x_1-\bar{x})^2\text{Var}[e_1]+(x_2-\bar{x})^2\text{Var}[e_2]+(x_3-\bar{x})^2\text{Var}[e_3]\bigr]
\]
\[
= \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2\,\text{Var}[e_t]}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Focus on Var[et] and recall that the variance equals the average of the squared deviations from the mean:
Var[et] = et²
While the error terms are not observable, we can think of the residuals as the estimated error terms. Consequently we will use Rest² to estimate et²:
et² → Rest²
Table 16.7
Internet regression results—Robust standard errors
Applying this to the equation for the variance of the coefficient estimate's probability distribution obtains
\[
\text{Var}[b_x] = \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2\,\text{Var}[e_t]}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Substituting et² for Var[et]:
\[
= \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2 e_t^2}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
Using the residuals as estimated error terms, et² → Rest²:
\[
\text{EstVar}[b_x] = \frac{\sum_{t=1}^{T}(x_t-\bar{x})^2 Res_t^2}{\bigl[\sum_{t=1}^{T}(x_t-\bar{x})^2\bigr]^2}
\]
The White robust standard error is the square root of the estimated variance.2 Statistical software
makes it easy to compute robust standard errors (table 16.7).
2. While it is beyond the scope of this textbook, it can be shown that although this estimation procedure is biased, the
magnitude of the bias diminishes and approaches zero as the sample size approaches infinity.
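The formula above is easy to apply directly. The sketch below computes the White robust standard error for a simple regression with hypothetical data and compares it with the ordinary OLS standard error; packages such as statsmodels report heteroskedasticity-consistent standard errors of this kind automatically.

    # A minimal sketch with hypothetical data: the White robust standard error for b_x.
    import numpy as np

    x = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 33.0])     # made up
    y = np.array([0.3, 0.5, 1.1, 1.3, 2.0, 2.6, 2.4])          # made up

    dev = x - x.mean()
    b_x = np.sum(dev * (y - y.mean())) / np.sum(dev ** 2)
    b_const = y.mean() - b_x * x.mean()
    res = y - (b_const + b_x * x)

    # Ordinary OLS standard error: assumes a single Var[e]
    se_ols = np.sqrt((np.sum(res ** 2) / (len(y) - 2)) / np.sum(dev ** 2))

    # White robust standard error: EstVar[b_x] = sum(dev^2 * Res^2) / (sum(dev^2))^2
    se_white = np.sqrt(np.sum(dev ** 2 * res ** 2) / np.sum(dev ** 2) ** 2)

    print(se_ols, se_white)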
Chapter 16 Exercises
Judicial data: Cross-sectional data of judicial and economic statistics for the fifty states in 2000.
JudExpt State and local expenditures for the judicial system per 100,000 persons in state t
CrimesAllt Crimes per 100,000 persons in state t
GdpPCt Real per capita GDP in state t (2000 dollars)
Popt Population in state t (persons)
UnemRatet Unemployment rate in state t (percent)
Statet Name of state t
Yeart Year
1. We wish to explain state and local judicial expenditures. To do so, consider the following
linear model:
3. Apply the generalized least squares (GLS) estimation procedure to the judicial expenditure
model. To simplify the mathematics, use the following equation to model the variance of the
error term’s probability distribution:
Var[et] = V × GdpPCt
where V equals a constant.
Burglary and poverty data: Cross-sectional data of burglary and economic statistics for the fifty states in 2002.
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of each
coefficient using the burglary and poverty data. Interpret the coefficient estimates. What are
the critical results?
• The unemployment rate is calculated from a sample of the population. The sample size in each state is approximately proportional to the state's population.
• Is the reported unemployment rate in a state with a large population more or less reliable than in a state with a small population?
• Would you expect the variance of the error term's probability distribution in a large state to be more or less than the variance in a small state?
b. Consider the ordinary least squares (OLS) estimates of the parameters that you computed
in the previous question. Plot the residuals versus population.
i. Does your graph appear to confirm your suspicions concerning the presence of
heteroskedasticity?
ii. If so, does the variance appear to be directly or inversely proportional to population?
c. Based on your suspicions, formulate a model of heteroskedasticity.
d. Use the Breusch–Pagan–Godfrey approach to test for the presence of heteroskedasticity.
7. Apply the generalized least squares (GLS) estimation procedure to the judicial expenditure
model. To simplify the mathematics use the following equation to model the variance of the
error term’s probability distribution:
Var[et] = V / Popt
8. How, if at all, does accounting for heteroskedasticity affect the assessment of your theories?
17 Autocorrelation (Serial Correlation)
Chapter 17 Outline
17.1 Review
17.1.1 Regression Model
17.1.2 Standard Ordinary Least Squares (OLS) Premises
17.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
17.1.4 Covariance and Independence
17.3 Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
17.3.1 The Mathematics
17.3.2 Our Suspicions
17.3.3 Confirming Our Suspicions
2. In chapter 6 we showed that the ordinary least squares (OLS) estimation procedure for the
coefficient value was unbiased; that is, we showed that
Mean[bx] = βx
Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
3. In chapter 6 we showed that the variance of the coefficient estimate’s probability distribution
equals the variance of the error term’s probability distribution divided by the sum of the squared
x deviations; that is, we showed that
\[
\mathrm{Var}[b_x] = \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Review the algebra. What role, if any, did the second ordinary least squares (OLS) premise, the error term/error term independence premise, play?
Consumer durable data: Monthly time series data of consumer durable production and income
statistics 2004 to 2009.
d. If the residual is positive in one month, is it usually positive in the next month?
e. If the residual is negative in one month, is it usually negative in the next month?
7. Consider the following equations:
yt = βConst + βxxt + et
et = ρet−1 + vt
Rest = yt − Estyt
Start with the last equation, the equation for Rest. Using algebra and the other equations,
show that
yt = βConst + βxxt + et
et = ρet−1 + vt
Multiply the yt−1 equation by ρ. Then subtract it from the yt equation. Using algebra and the et
equation show that
17.1 Review

17.1.1 Regression Model

yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
The error term is a random variable that represents random influences: Mean[et] = 0
Again, we begin by focusing our attention on the standard ordinary least squares (OLS) regres-
sion premises:
• Error term equal variance premise: The variance of the error term’s probability distribu-
tion for each observation is the same; all the variances equal Var[e]:
17.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation
Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:

\[
b_x = \frac{\sum_{t=1}^{T}(y_t-\bar{y})(x_t-\bar{x})}{\sum_{t=1}^{T}(x_t-\bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]

• Variance of the error term's probability distribution:

\[
\mathrm{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]

• Variance of the coefficient estimate's probability distribution:

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
In chapter 16 we focused on the first standard ordinary least squares (OLS) premise. We will
now turn our attention to the second, error term/error term independence premise. We begin by
examining precisely what the premise means. Subsequently, we investigate what problems do
and do not emerge when the premise is violated and finally what can be done to address the
problems that do arise.
We introduced covariance to quantify the notions of correlation and independence. On the one
hand, if two variables are correlated, their covariance is nonzero. On the other hand, if two
variables are independent their covariance is 0. A scatter diagram allows us to illustrate how
covariance is related to independence and correlation. To appreciate why, consider the equation
we use to calculate covariance:
\[
\mathrm{Cov}[x, y] = \frac{(x_1-\bar{x})(y_1-\bar{y})+(x_2-\bar{x})(y_2-\bar{y})+\cdots+(x_N-\bar{x})(y_N-\bar{y})}{N} = \frac{\sum_{t=1}^{N}(x_t-\bar{x})(y_t-\bar{y})}{N}
\]
[Figure 17.1: Scatter diagram and covariance. Axes: (xi − x̄) horizontal, (yi − ȳ) vertical. Quadrant I: (xi − x̄) > 0 and (yi − ȳ) > 0, so the product (xi − x̄)(yi − ȳ) > 0. Quadrant II: (xi − x̄) < 0 and (yi − ȳ) > 0, so the product is negative. Quadrant III: (xi − x̄) < 0 and (yi − ȳ) < 0, so the product is positive. Quadrant IV: (xi − x̄) > 0 and (yi − ȳ) < 0, so the product is negative.]
Focus on one term in the numerator of the covariance, the deviation product (xi − x̄)(yi − ȳ), and consider its sign in each of the four quadrants (see figure 17.1).
• First quadrant: the Dow growth rate is greater than its mean and the Nasdaq growth rate is greater than its mean, so the product of the deviations is positive in the first quadrant.
Recall that we used precipitation in Amherst, the Nasdaq growth rate, and the Dow Jones growth
rate to illustrate independent and correlated variables in chapter 1.
[Figure 17.2: Precipitation versus Nasdaq growth, plotted as deviations from their means. The points are spread across all four quadrants; Cov ≈ 0.9 ≈ 0.]
Precipitation in Amherst and the Nasdaq growth rate are independent; knowing one does not
help us predict the other. Figure 17.2 shows that the scatter diagram points are distributed rela-
tively evenly throughout the four quadrants thereby suggesting that the covariance is approxi-
mately 0. However, the Dow Jones growth rate and the Nasdaq growth rate are not independent;
they are correlated. Most points on figure 17.3 are located in the first and third quadrants; con-
sequently most of the covariance terms are positive resulting in a positive covariance.
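A quick numerical sketch may help fix the idea; the series below are simulated stand-ins (Python assumed), not the actual Amherst precipitation, Nasdaq, and Dow Jones data.

```python
# A small illustration of covariance with made-up series (Python/numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
precip = rng.normal(4, 1, 120)                   # unrelated to the market series
nasdaq = rng.normal(1, 5, 120)
dow    = 0.7 * nasdaq + rng.normal(0, 2, 120)    # built to be positively correlated with nasdaq

def cov(x, y):
    # Average of the products of the deviations from the means, as in the formula above.
    return ((x - x.mean()) * (y - y.mean())).mean()

print(cov(precip, nasdaq))   # near 0: deviation products spread over all four quadrants
print(cov(dow, nasdaq))      # positive: most deviation products lie in quadrants I and III
```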
Autocorrelation (serial correlation) is present whenever the value of one observation’s error
term allows us to predict the value of the next. When this occurs, one observation’s error term
is correlated with the next observation’s; the error terms are correlated, and the second premise,
the error term/error term independence premise, is violated. The following equation models
autocorrelation:
Autocorrelation model:
et = ρet−1 + vt,
[Figure 17.3: Dow Jones growth versus Nasdaq growth, plotted as deviations from their means (labeled points: January 1987 and February 2000). Most points lie in quadrants I and III; Cov = 19.5.]
The Greek letter “rho” is the traditional symbol that is used to represent autocorrelation. When
rho equals 0, no autocorrelation is present; when rho equals 0, the ρet−1 term disappears and the
error terms, the e’s, are independent because the vt’s are independent. However, when rho does
not equal 0, autocorrelation is present.
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
We can use a simulation to illustrate autocorrelation. We begin by selecting 0.0 in the Rho list (figure 17.4). Focus on the et−1 versus et scatter diagram (figure 17.5). You will observe that this scatter diagram looks very much like the Amherst precipitation–Nasdaq scatter diagram (figure 17.2), indicating that the two error terms are independent; that is, knowing et−1 does not help us predict et. Next specify rho to equal 0.9. Now the scatter diagram (figure 17.6) looks much more like the Dow Jones–Nasdaq scatter diagram (figure 17.3), suggesting that for the most part, when et−1 is positive, et will be positive also, and when et−1 is negative, et will be negative also; this illustrates positive autocorrelation.
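The lab's experiment can also be mimicked with a few lines of simulation; the sketch below (Python assumed, with arbitrary parameter values of our own) generates error terms from the autocorrelation model and measures how strongly et−1 and et move together.

```python
# A sketch of the rho experiment (assumptions: Python/numpy; parameter values are arbitrary).
import numpy as np

def simulate_errors(rho, T=200, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(0, 10, T)          # independent random influences
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]  # autocorrelation model: e_t = rho*e_(t-1) + v_t
    return e

for rho in (0.0, 0.9):
    e = simulate_errors(rho)
    corr = np.corrcoef(e[:-1], e[1:])[0, 1]
    print(f"rho = {rho}: correlation between e(t-1) and e(t) = {corr:.2f}")
# With rho = 0 the correlation is near 0 (figure 17.5); with rho = 0.9 it is strongly positive (figure 17.6).
```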
[Figure 17.4: Rho list (−0.9, −0.6, −0.3, 0.0, 0.3, ...)]

[Figure 17.5: et−1 versus et scatter diagram, ρ = 0]

[Figure 17.6: et−1 versus et scatter diagram, ρ = 0.9]
17.3 Autocorrelation and the Ordinary Least Squares (OLS) Estimation Procedure: The
Consequences
Now let us explore the consequences of autocorrelation. Just as with heteroskedasticity, we will
focus on two of the three estimation procedures embedded within the ordinary least squares
(OLS) estimation procedure:
• Value of the coefficient.
• Variance of the coefficient estimate’s probability distribution.
Question: Are these estimation procedures still unbiased when autocorrelation is present?
Ordinary Least Squares (OLS) Estimation Procedure for the Coefficient Value
Begin by focusing on the coefficient value. Previously we showed that the estimation procedure
for the coefficient value was unbiased by
• applying the arithmetic of means
and
• recognizing that the means of the error terms’ probability distributions equal 0 (since the error
terms represent random influences).
\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]

\[
\mathrm{Mean}[b_x] = \beta_x + \mathrm{Mean}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]

Applying Mean[cx] = c Mean[x],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\,\mathrm{Mean}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]

Applying Mean[x + y] = Mean[x] + Mean[y],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[\mathrm{Mean}[(x_1-\bar{x})e_1]+\mathrm{Mean}[(x_2-\bar{x})e_2]+\mathrm{Mean}[(x_3-\bar{x})e_3]\bigr]
\]

Applying Mean[cx] = c Mean[x],

\[
= \beta_x + \frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl[(x_1-\bar{x})\mathrm{Mean}[e_1]+(x_2-\bar{x})\mathrm{Mean}[e_2]+(x_3-\bar{x})\mathrm{Mean}[e_3]\bigr]
\]

Since Mean[e1] = Mean[e2] = Mean[e3] = 0,

\[
= \beta_x
\]

1. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so, we can easily apply the arithmetic of means. Our results are unaffected by this assumption.
What is the critical point here? We have not relied on the error term/error term independence
premise to show that the estimation procedure for the coefficient value is unbiased. Consequently
we suspect that the estimation procedure for the coefficient value will continue to be unbiased
in the presence of autocorrelation.
Ordinary Least Squares (OLS) Estimation Procedure for the Variance of the Coefficient
Estimate’s Probability Distribution
Next consider the estimation procedure for the variance of the coefficient estimate’s probability
distribution used by the ordinary least squares (OLS) estimation procedure:
The strategy involves two steps:
• First, we used the adjusted variance to estimate the variance of the error term's probability distribution: EstVar[e] = SSR/Degrees of freedom.
• Second, we applied the equation relating the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution: Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)².

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Unfortunately, when autocorrelation is present, the second step is not justified. To understand why, recall the arithmetic of variances:

Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]

Since the covariance of independent variables equals 0, we can simply ignore the covariance term when calculating the variance of a sum of independent variables. However, if two variables are not independent, their covariance does not equal 0. Consequently, when calculating the variance of the sum of two variables that are not independent, we cannot ignore their covariance.
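A one-minute numerical check (Python assumed; the two correlated series are fabricated for illustration) shows how large the neglected covariance term can be:

```python
# Illustrating Var[x + y] = Var[x] + Var[y] + 2 Cov[x, y] (Python/numpy assumed; made-up data).
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 100_000)
y = 0.8 * x + rng.normal(0, 0.6, 100_000)        # x and y are positively correlated

print(np.var(x + y))                             # the full variance of the sum
print(np.var(x) + np.var(y))                     # ignoring the covariance understates it
print(np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1])   # matches the first line
```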
Next apply this to the error terms when autocorrelation is absent and when it is present:
We will now review our derivation of the relationship between the variance of the coefficient estimate's probability distribution and the variance of the error term's probability distribution, Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)², to illustrate the critical role played by the error term/error term independence premise. We began with the equation for the coefficient estimate:

Equation for coefficient estimate:

\[
b_x = \beta_x + \frac{\sum_{t=1}^{T}(x_t-\bar{x})e_t}{\sum_{t=1}^{T}(x_t-\bar{x})^2} = \beta_x + \frac{(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]
2. Recall that to keep the algebra straightforward, we assume that the explanatory variables are constants. By doing so,
we can apply the arithmetic of variances easily. Our results are unaffected by this assumption.
\[
\mathrm{Var}[b_x] = \mathrm{Var}\!\left[\frac{1}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}\bigl((x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr)\right]
\]

Error term/error term independence premise: The error terms are independent, Var[x + y] = Var[x] + Var[y]:

\[
= \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\bigl[\mathrm{Var}[(x_1-\bar{x})e_1]+\mathrm{Var}[(x_2-\bar{x})e_2]+\mathrm{Var}[(x_3-\bar{x})e_3]\bigr]
\]

Error term equal variance premise: The error term variance is identical, Var[e1] = Var[e2] = Var[e3] = Var[e].

Simplifying,

\[
= \frac{\mathrm{Var}[e]}{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}
\]

Generalizing,

\[
= \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
Focus on the fourth step. When the error term/error term independence premise is satisfied,
that is, when the error terms are independent, we can ignore the covariance terms when calculat-
ing the variance of a sum of variables:
\[
\mathrm{Var}[b_x] = \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\,\mathrm{Var}\bigl[(x_1-\bar{x})e_1+(x_2-\bar{x})e_2+(x_3-\bar{x})e_3\bigr]
\]

Error term/error term independence premise: The error terms are independent: Var[x + y] = Var[x] + Var[y].
When autocorrelation is present, however, the error terms are not independent and the covariance
terms cannot be ignored. Therefore, when autocorrelation is present, the fourth step is invalid:
\[
\mathrm{Var}[b_x] = \frac{1}{\left[(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2\right]^2}\bigl[\mathrm{Var}[(x_1-\bar{x})e_1]+\mathrm{Var}[(x_2-\bar{x})e_2]+\mathrm{Var}[(x_3-\bar{x})e_3]\bigr]
\]
Consequently, in the presence of autocorrelation, the equation we used to describe the relationship between the variance of the error term's probability distribution and the variance of the coefficient estimate's probability distribution is no longer valid:

\[
\mathrm{Var}[b_x] = \frac{\mathrm{Var}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
The procedure used by the ordinary least squares (OLS) estimation procedure to estimate the variance of the coefficient estimate's probability distribution is therefore flawed:

Step 1: Estimate the variance of the error term's probability distribution from the available information:

EstVar[e] = SSR / Degrees of freedom

Step 2: Apply the relationship between the variances of the coefficient estimate's and error term's probability distributions, Var[bx] = Var[e] / Σ_{t=1}^{T}(xt − x̄)²:

EstVar[bx] = EstVar[e] / Σ_{t=1}^{T}(xt − x̄)²
The equation that the ordinary least squares (OLS) estimation procedure uses to estimate the
variance of the coefficient estimate’s probability distribution is flawed when autocorrelation is
present. Consequently, how can we have faith in the variance estimate?
Let us summarize. After reviewing the algebra, we suspect that when autocorrelation is present, the ordinary least squares (OLS) estimation procedure for the coefficient value remains unbiased, but the procedure it uses to estimate the variance of the coefficient estimate's probability distribution is flawed.
We will use a simulation to confirm our suspicions (shown in table 17.1 and figure 17.7).
Econometrics Lab 17.2: The Ordinary Least Squares (OLS) Estimation Procedure and Autocorrelation
Table 17.1
Autocorrelation simulation results

[Figure 17.7: Specifying rho—the Rho list offers values from −0.9 to 0.9]
• Bad news: The ordinary least squares (OLS) estimation procedure for the variance of the coefficient estimate's probability distribution is biased. The actual variance of the estimated coefficient values equals 1.11, while the average of the estimated variances equals 0.28. Just as we feared, when autocorrelation is present, the ordinary least squares (OLS) calculations to estimate the variance of the coefficient estimates are flawed.
When the estimation procedure for the variance of the coefficient estimate’s probability dis-
tribution is biased, all calculations based on the estimate of the variance will be flawed also;
that is, the standard errors, t-statistics, and tail probabilities appearing on the ordinary least
squares (OLS) regression printout are unreliable. Consequently we will use an example to
explore how we account for the presence of autocorrelation.
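The same experiment is easy to replicate outside the lab. The sketch below (Python/statsmodels assumed; the parameter values are our own, so the numbers will not reproduce table 17.1 exactly) repeatedly generates autocorrelated data, runs OLS, and compares the actual variance of the coefficient estimates with the average of the variances that OLS reports.

```python
# A rough Monte Carlo check (assumptions: Python, numpy, statsmodels; arbitrary parameters).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T, rho, beta_x = 30, 0.6, 2.0
x = np.linspace(1, 30, T)
X = sm.add_constant(x)

slopes, reported_vars = [], []
for _ in range(2000):
    v = rng.normal(0, 20, T)
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + v[t]          # autocorrelated error terms
    y = 10 + beta_x * x + e
    res = sm.OLS(y, X).fit()
    slopes.append(res.params[1])
    reported_vars.append(res.bse[1] ** 2)     # the variance OLS itself reports

print("mean of coefficient estimates:   ", np.mean(slopes))         # near 2.0: still unbiased
print("actual variance of the estimates:", np.var(slopes))
print("average reported variance:       ", np.mean(reported_vars))  # noticeably smaller: biased downward
```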
Step 1: Apply the ordinary least squares (OLS) estimation procedure. Estimate the model’s
parameters with the ordinary least squares (OLS) estimation procedure.
Step 2: Consider the possibility of autocorrelation.
• Ask whether there is reason to suspect that autocorrelation may be present.
• Use the ordinary least squares (OLS) regression results to “get a sense” of whether autocorrela-
tion is a problem by examining the residuals.
• Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
• Estimate the value of the autocorrelation parameter, ρ.
Step 3: Apply the generalized least squares (GLS) estimation procedure.
• Apply the model of autocorrelation and algebraically manipulate the original model to derive
a new, tweaked model in which the error terms do not suffer from autocorrelation.
• Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the
tweaked model.
Time series data often exhibit autocorrelation. We will consider monthly consumer durables
data:
Consumer durable data: Monthly time series data of consumer durable consumption and
income statistics 2004 to 2009.
ConsDurt Consumption of durables in month t (billions of 2005 chained dollars)
Const Consumption in month t (billions of 2005 chained dollars)
Inct Disposable income in month t (billions of 2005 chained dollars)
Project: Assess the effect of disposable income on the consumption of consumer durables.
These particular start and end dates were chosen to illustrate the autocorrelation phenomenon
clearly.
Economic theory suggests that higher levels of disposable income increase the consumption of
consumer durables:
Theory:
βI > 0
Step 1: Apply the ordinary least squares (OLS) estimation procedure (table 17.2).
H0: βI = 0 Higher disposable income does not affect the consumption of durables
H1: βI > 0 Higher disposable income increases the consumption of durables
As always, the null hypothesis challenges the evidence; the alternative hypothesis is consistent
with the evidence. Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the Inc coefficient estimate from one repetition of the experiment will be 0.087 or more, if H0 were true (i.e., if disposable income actually has no effect on the consumption of durables, if βI actually equals 0)?
Table 17.2
OLS consumer durable regression results
To emphasize that the Prob[Results IF H0 true] depends on the standard error we will use the
Econometrics Lab to calculate the probability. The following information has already been
entered:
Click Calculate.
We use the standard error provided by the ordinary least squares (OLS) regression results to
compute the Prob[Results IF H0 true].
We can also calculate the Prob[Results IF H0 true] by using the tails probability reported in
the regression printout. Since this is a one-tailed test, we divide the tails probability by 2:
Prob[Results IF H0 true] = (<0.0001)/2 < 0.0001
Based on the 1 percent significance level, we would reject the null hypothesis; that is, we would reject the hypothesis that disposable income has no effect on the consumption of consumer durables.
There may be a problem with this, however. The equation used by the ordinary least squares
(OLS) estimation procedure to estimate the variance of the coefficient estimate’s probability
distribution assumes that the error term/error term independence premise is satisfied. Our simula-
tion revealed that when autocorrelation is present and the error term/error term independence
premise is violated, the ordinary least squares (OLS) estimation procedure estimating the vari-
ance of the coefficient estimate’s probability distribution can be flawed. Recall that the standard
error equals the square root of the estimated variance. Consequently, if autocorrelation is present,
we may have entered the wrong value for the standard error into the Econometrics Lab when
we calculated Prob[Results IF H0 true]. When autocorrelation is present the ordinary least
squares (OLS) estimation procedure bases its computations on a faulty premise, resulting in
flawed standard errors, t-statistics, and tails probabilities. Consequently we should move on to
the next step.
Unfortunately, there is reason to suspect that autocorrelation may be present. We would expect the consumption of durables to be influenced not only by disposable income but also by the business cycle:
• When the economy is strong, consumer confidence tends to be high; consumers spend more
freely and purchase more than “usual.” When the economy is strong the error term tends to be
positive.
• When the economy is weak, consumer confidence tends to be low; consumers spend less
freely and purchase less than “usual.” When the economy is weak the error term tends to be
negative.
We know that business cycles tend to last for many months, if not years. When the economy is
strong, it remains strong for many consecutive months; hence, when the economy is strong we
would expect consumers to spend more freely and for the error term to be positive for many
consecutive months. On the other hand, when the economy is weak, we would expect consumers
to spend less freely and the error term to be negative for many consecutive months.
We can think of the residuals as the estimated errors. Since the residuals are observable we use
the residuals as proxies for the error terms. Figure 17.8 plots the residuals.
The residuals are plotted consecutively, one month after another. As we can easily see, a posi-
tive residual is typically followed by another positive residual; a negative residual is typically
followed by a negative residual. “Switchovers” do occur, but they are not frequent. This suggests
that positive autocorrelation is present. Most statistical software provides a very easy way to
look at the residuals.
[Figure 17.8: Plot of the residuals, month by month from January 2004 to July 2009. The residuals range from roughly −100 to 100.]
It is also instructive to construct a scatter diagram (figure 17.9) of the residuals versus the
residuals lagged one month. Most of the scatter diagram points lie in the first and third quadrants.
The residuals are positively correlated.
Since the residual plots suggest that our fears are warranted, we now test the autocorrelation
model more formally. While there are many different approaches, we will focus on the Lagrange
multiplier (LM) approach, which uses an artificial regression to test for autocorrelation.3 We will
proceed by reviewing a mathematical model of autocorrelation.
3. The Durbin–Watson statistic is the traditional method of testing for autocorrelation. Unfortunately, the distribution
of the Durbin–Watson statistic depends on the distribution of the explanatory variable. This makes hypothesis testing
with the Durbin–Watson statistic more complicated than with the Lagrange multiplier test. Consequently we will focus
on the Lagrange multiplier test.
[Figure 17.9: Scatter diagram of the residuals versus the residuals lagged one month. Both axes run from −100 to 100.]
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
In this case we believe that ρ is positive. A positive rho provides the error term with inertia. A
positive error term tends to follow a positive error term and a negative error term tends to follow
a negative term. But also note that there is a second term, vt. The vt’s are independent; they
represent random influences that affect the error term also. It is the vt’s that “switch” the sign
of the error term.
Now we combine the original model with the autocorrelation model:
Rearranging terms,

Rest = (βConst − bConst) + (βx − bx)xt + ρet−1 + vt

Since we cannot observe et−1, we use Rest−1 in its place. NB: Since the vt's are independent, we need not worry about autocorrelation here.
Most statistical software allows us to easily assess this model (table 17.3).
Critical result: The Resid(−1) coefficient estimate equals 0.8394. The positive sign of the coef-
ficient estimate suggests that an increase in last period’s residual increases this period’s residual.
This evidence suggests that autocorrelation is present.
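For readers working outside EViews, a rough sketch of the artificial regression follows (Python/statsmodels assumed; y and x are hypothetical stand-ins for the ConsDur and Inc series, which are not reproduced here). Most statistical packages also provide a ready-made version of this test.

```python
# A sketch of the Lagrange multiplier artificial regression (assumptions: Python/statsmodels;
# y and x are numpy arrays standing in for the dependent and explanatory variables).
import numpy as np
import statsmodels.api as sm

def lm_autocorrelation_test(y, x):
    resid = np.asarray(sm.OLS(y, sm.add_constant(x)).fit().resid)
    # Artificial regression: residual on the explanatory variable and the lagged residual.
    art_X = sm.add_constant(np.column_stack([x[1:], resid[:-1]]))
    art = sm.OLS(resid[1:], art_X).fit()
    lagged_res_coef = art.params[-1]        # the counterpart of the Resid(-1) estimate
    lm_stat = art.nobs * art.rsquared       # T*R-squared, one common form of the LM statistic
    return lagged_res_coef, lm_stat
```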
Now we formulate the null and alternative hypotheses:
Table 17.3
Lagrange multiplier test results
The null hypothesis challenges the evidence by asserting that no autocorrelation is present. The
alternative hypothesis is consistent with the evidence.
Next we calculate Prob[Results IF H0 true].
Prob[Results IF H0 true]: What is the probability that the coefficient estimate from one regres-
sion would be 0.8394 or more, if the H0 were true (i.e., if no autocorrelation were actually
present, if ρ actually equals 0)?
ρ=0 ρ≠0
↓ ↓
et = vt et depends on et−1
↓ ↓
No autocorrelation Autocorrelation present
In practice there are a variety of ways to estimate ρ. We will discuss what is perhaps the most
straightforward. Since the error terms are unobservable, we “replace” the error terms with the
residuals:
Table 17.4
Regression results—Estimating ρ
Model: et = ρet−1 + vt   →   Rest = ρRest−1 + vt

where the vt's are independent. Note that there is no constant in this model (table 17.4).
• Run the original regression; EViews automatically calculates the residuals and places them in the variable resid.
• EViews automatically modifies resid every time a regression is run. Consequently we will now generate two new variables before running the next regression to prevent a "clash":
residual = resid
residuallag = residual(−1)
• Now specify residual as the dependent variable and residuallag as the explanatory variable; do not forget to "delete" the constant. (A rough sketch of the equivalent computation outside EViews follows this list.)
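Here is a rough sketch of the same two regressions (Python/statsmodels assumed; y and x are hypothetical stand-ins for the actual series):

```python
# A sketch of estimating rho (assumptions: Python/statsmodels; y and x are numpy arrays).
import numpy as np
import statsmodels.api as sm

def estimate_rho(y, x):
    # Step 1: run the original regression and keep the residuals.
    resid = np.asarray(sm.OLS(y, sm.add_constant(x)).fit().resid)
    # Step 2: regress the residual on the lagged residual, with no constant: Res_t = rho*Res_(t-1) + v_t.
    return sm.OLS(resid[1:], resid[:-1]).fit().params[0]
```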
We can accomplish this with a little algebra. We begin with the original model and then apply
the autocorrelation model:
Original model: yt = βConst + βxxt + et
Original model for period t − 1: yt−1 = βConst + βxxt−1 + et−1

Multiplying by ρ: ρyt−1 = ρβConst + ρβxxt−1 + ρet−1

Subtracting: yt − ρyt−1 = βConst(1 − ρ) + βx(xt − ρxt−1) + (et − ρet−1) = βConst(1 − ρ) + βx(xt − ρxt−1) + vt
Critical point: In the tweaked model, vt’s are independent; hence we need not be concerned
about autocorrelation in the tweaked model.
Now let us run the tweaked regression for our example; using the estimate of ρ, we generate
two new variables:
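A sketch of the quasi-differencing step appears below (Python/statsmodels assumed; cons_dur, inc, and rho_hat are hypothetical names, not the book's):

```python
# Generating the "tweaked" variables and running the GLS regression (assumptions: Python/statsmodels).
import numpy as np
import statsmodels.api as sm

def gls_ar1(y, x, rho_hat):
    y_adj = y[1:] - rho_hat * y[:-1]     # y_t - rho*y_(t-1)
    x_adj = x[1:] - rho_hat * x[:-1]     # x_t - rho*x_(t-1)
    # OLS on the tweaked model; the estimated constant corresponds to bConst*(1 - rho).
    return sm.OLS(y_adj, sm.add_constant(x_adj)).fit()

# results = gls_ar1(cons_dur, inc, rho_hat)   # cons_dur, inc, and rho_hat are hypothetical
```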
Table 17.5
GLS regression results—Accounting for autocorrelation
H0: βI = 0 Higher disposable income does not affect the consumption of durables
H1: βI > 0 Higher disposable income increases the consumption of durables
Prob[Results IF H0 true] = 0.1545/2 ≈ 0.0772
After accounting for autocorrelation, we cannot reject the null hypothesis at the 1 or 5 percent
significance levels.
Let us now compare the disposable income coefficient estimate in the last regression, the generalized least squares (GLS) regression that accounts for autocorrelation, with the disposable income coefficient estimate in the ordinary least squares (OLS) regression that does not account for autocorrelation (table 17.6). The most striking differences are the calculations that are based on the estimated variance of the coefficient estimate's probability distribution: the coefficient's standard error, t-statistic, and tails probability. The standard error nearly doubles when we account for autocorrelation. This is hardly surprising. The ordinary least squares (OLS) regression calculations are based on the premise that the error terms are independent. Our analysis suggests that this is not true. The generalized least squares (GLS) regression accounts for error term correlation.
Table 17.6
Coefficient estimate comparison
Table 17.7
Autocorrelation simulation results
The standard error, t-statistic, and tails probability in the generalized least squares (GLS) regression differ substantially.
We will now use a simulation to illustrate that the generalized least squares (GLS) estimation procedure indeed provides "better" estimates than the ordinary least squares (OLS) estimation procedure. While both procedures provide unbiased estimates of the coefficient's value, only the generalized least squares (GLS) estimation procedure provides an unbiased estimate of the variance.
As before, choose a rho of 0.6; by default the ordinary least squares (OLS) estimation procedure
is chosen. Click Start and then after many, many repetitions click Stop. When the ordinary least
squares (OLS) estimation procedure is used, the variance of the estimated coefficient values
equals about 1.11. Now specify the generalized least squares (GLS) estimation procedure by
clicking GLS. Click Start and then after many, many repetitions click Stop. When the general-
ized least squares (GLS) estimation procedure is used, the variance of the estimated coefficient
values is less, 1.01. Consequently the generalized least squares (GLS) estimation procedure
provides more reliable estimates (table 17.7).
Table 17.8
OLS regression results—Robust standard errors
As before, robust standard errors address the first issue arising when autocorrelation is present.
Newey–West standard errors provide one such approach that is suitable for both autocorrelation
and heteroskedasticity. This approach applies the same type of logic that we used to motivate
the White approach for heteroskedasticity, but it is more complicated. Consequently we will not
attempt to motivate the approach here. Statistical software makes it easy to compute Newey–
West robust standard errors (table 17.8).4
4. While it is beyond the scope of this textbook, it can be shown that while this estimation procedure is biased, the
magnitude of the bias diminishes and approaches zero as the sample size approaches infinity.
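A sketch of how such standard errors are requested in practice follows (Python/statsmodels assumed; the data are simulated and the lag length of 4 is an arbitrary illustration, not a recommendation from the text):

```python
# Newey-West (HAC) standard errors versus conventional ones (assumptions: Python/statsmodels).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 72
x = np.linspace(1, 10, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + rng.normal(0, 1)      # autocorrelated error term
y = 3 + 0.5 * x + e

X = sm.add_constant(x)
conventional = sm.OLS(y, X).fit()
newey_west = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(conventional.bse[1], newey_west.bse[1])     # the robust standard error is typically larger here
```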
Chapter 17 Exercises
Petroleum consumption data for Massachusetts: Annual time series data of petroleum con-
sumption and prices for Massachusetts from 1970 to 2004.
a. Time series data often exhibit autocorrelation. Consequently, plot the residuals. Does the plot of the residuals suggest the possible presence of autocorrelation?
b. Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
c. Estimate the value of the autocorrelation parameter, ρ.
a. Apply the model of autocorrelation and algebraically manipulate the original model to
derive a new, tweaked model in which the error terms do not suffer from autocorrelation.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the tweaked model.
4. How, if at all, does accounting for autocorrelation affect the assessment of your theory?
Crime data for California: Annual time series data of crime and economic statistics for
California from 1989 to 2008.
a. Time series data often exhibit autocorrelation. Consequently, plot the residuals. Does the plot of the residuals suggest the possible presence of autocorrelation?
b. Use the Lagrange multiplier approach by estimating an artificial regression to test for the
presence of autocorrelation.
c. Estimate the value of the autocorrelation parameter, ρ.
a. Apply the model of autocorrelation and algebraically manipulate the original model to
derive a new, tweaked model in which the error terms do not suffer from autocorrelation.
b. Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of
the tweaked model.
8. How, if at all, does accounting for autocorrelation affect the assessment of your theories?
18 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables
Chapter 18 Outline
18.1 Review
18.1.1 Regression Model
18.1.2 Standard Ordinary Least Squares (OLS) Premises
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency
yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
Suppose that the actual constant equals 6 and the actual coefficient equals 1/2:
1
βConst = 6, β x =
2
Also suppose that the sample size is 6. The following table reports the value of the explanatory
variable and the error term for each of the six observations:
Observation xt et
1 2 4
2 6 2
3 10 3
4 14 −2
5 18 −1
6 22 −4
a. On a sheet of graph paper place x on the horizontal axis and e on the vertical axis.
i. Plot a scatter diagram of x and e.
ii. As xt increases, does et typically increase or decrease?
iii. Is et positively or negatively correlated with xt?
b. Immediately below this graph construct a second graph with x on the horizontal axis and
y on the vertical axis.
i. Plot the line depicting the actual equation, the line representing the actual constant and
the actual coefficient:
y = 6 + (1/2)x
Observation xt et yt
1 2 4 _____
2 6 2 _____
3 10 3 _____
4 14 −2 _____
5 18 −1 _____
6 22 −4 _____
iii. Plot the x and y values for each of the six observations.
c. Based on the points you plotted in your second graph, “eyeball” the best fitting line and
sketch it in.
d. How are the slope of the line representing the actual equation and the slope of the best
fitting line related?
2. Recall the poll Clint conducted to estimate the fraction of the student population that supported him in the upcoming election for class president. He used the following approach:
Random sample technique: Write the name of each individual in the population on a 3 × 5 card and select the sample by drawing cards at random.
Nonrandom sample technique:
• Leave Clint's dorm room and ask the first 16 people you run into whether he or she is voting for Clint.
• Calculate the fraction of the sample supporting Clint.
Use the Econometrics Lab to simulate the two sampling techniques (figure 18.1).
Figure 18.1
Opinion Poll simulation
a. Answer the questions posed in the lab, and then fill in the following blanks:
b. What happens to the mean of the estimated fraction as the sample size increases?
c. Explain why your answer to part b “makes sense.” To do so, consider the following
questions:
i. Compared to the general student population, are the students who live near Clint more
likely to be Clint’s friends?
ii. Compared to the general student population, are the students who live near Clint more
likely to vote for him?
iii. Would the nonrandom sampling technique bias the poll in Clint’s favor?
iv. What happens to the magnitude of the bias as the sample size increases? Explain.
d. As the sample size increases, what happens to the variance of the estimates?
18.1 Review

18.1.1 Regression Model

yt = βConst + βxxt + et   t = 1, 2, ..., T

where
yt = dependent variable
et = error term
xt = explanatory variable
T = sample size
Mean[et] = 0
Error term/error term independence premise: Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.
Explanatory variable/error term independence premise: Knowing the value of an observation's explanatory variable does not help us predict the value of that observation's error term.
18.1.3 Estimation Procedures Embedded within the Ordinary Least Squares (OLS)
Estimation Procedure
The ordinary least squares (OLS) estimation procedure includes three important estimation
procedures:
• Values of the regression parameters, βx and βConst:
\[
b_x = \frac{\sum_{t=1}^{T}(y_t-\bar{y})(x_t-\bar{x})}{\sum_{t=1}^{T}(x_t-\bar{x})^2} \qquad \text{and} \qquad b_{Const} = \bar{y} - b_x\bar{x}
\]

• Variance of the error term's probability distribution:

\[
\mathrm{EstVar}[e] = \frac{SSR}{\text{Degrees of freedom}}
\]

• Variance of the coefficient estimate's probability distribution:

\[
\mathrm{EstVar}[b_x] = \frac{\mathrm{EstVar}[e]}{\sum_{t=1}^{T}(x_t-\bar{x})^2}
\]
When the standard ordinary least squares (OLS) regression premises are met:
• Each estimation procedure is unbiased; that is, each estimation procedure does not systemati-
cally underestimate or overestimate the actual value.
• The ordinary least squares (OLS) estimation procedure for the coefficient value is the best
linear unbiased estimation procedure (BLUE).
Crucial point: When the ordinary least squares (OLS) estimation procedure performs its calcu-
lations, it implicitly assumes that the standard ordinary least squares (OLS) regression premises
are satisfied.
18.2 Taking Stock and a Preview: The Ordinary Least Squares (OLS) Estimation Procedure
The ordinary least squares (OLS) estimation procedure is economists' most widely used estimation procedure. When contemplating the use of this procedure, we should keep two issues in
mind: Is the ordinary least squares (OLS) estimation procedure for the coefficient value unbi-
ased? If unbiased, is the ordinary least squares (OLS) estimation procedure reliable in the fol-
lowing two ways:
• Can the calculations for the standard errors be trusted?
• Is the ordinary least square (OLS) estimation procedure for the coefficient value the most
reliable, the best linear unbiased estimation procedure (BLUE)?
In the previous two chapters we showed that violation of the first two standard ordinary least
squares (OLS) premises, the error term equal variance premise and the error term/error term
independence premise, does not cause the ordinary least squares (OLS) estimation procedure
for the coefficient value to be biased. This was good news. We then focused on the reliability
issue. We learned that the standard error calculations could not be trusted and that the ordinary
least squares (OLS) estimation procedure was not the best linear unbiased estimation procedure
(BLUE). In this chapter we turn our attention to the third premise, explanatory variable/error
term independence. Unfortunately, violation of the third premise does cause the ordinary least
squares (OLS) estimation procedure for the coefficient value to be biased. The explanatory vari-
able/error term independence premise determines whether the ordinary least squares (OLS) estimation procedure is unbiased or biased. Figure 18.2 summarizes the roles played by
the three standard premises.
[Figure 18.2: OLS bias and reliability flow diagram. One branch poses the OLS reliability question: are the error term equal variance and error term/error term independence premises satisfied or violated?]
This chapter begins by explaining why bias results when the explanatory variable/error term independence premise is violated. Next we introduce a new property that is used to describe estimation procedures: consistency. Typically, consistency is considered less desirable than being unbiased, but estimation procedures that are biased sometimes meet the consistency standard. We close the chapter by introducing one such procedure: the instrumental variables (IV) estimation procedure.
Initially the explanatory variable/error term correlation coefficient equals 0. Be certain that the
Pause checkbox is checked. Then click the Start button. Note that the blue points indicate the
observations with low x values, the black points the observations with medium x values, and the red
points the observations with high x values. Click the Start button a few more times to convince
yourself that this is always true. Now clear the Pause checkbox and click Start. After many,
many repetitions, click Stop. Note that the scatter diagram points are distributed more or less
evenly across the graph as shown in figure 18.3.
Since the points are spread evenly, knowing the value of the explanatory variable, xt, does not
help us predict the value of the error term, et. The explanatory variable and the error term are
independent: the explanatory variable/error term independence premise is satisfied. The value
of x, low, medium, or high, does not affect the mean of the error terms. The mean is approxi-
mately 0 in each case (figure 18.4).
Next we select 0.60 in the Corr X&E list. Consequently the explanatory variable and error
term are now positively correlated. After many, many repetitions we observe that the explanatory
variable and the error term are no longer independent. The scatter diagram points are no longer
spread evenly; a pattern emerges. As illustrated in figure 18.5, as the value of explanatory vari-
able rises, the error term tends to rise also:
•When the value of the explanatory variable is low, the error term is typically negative. The
mean of the low x value error terms is negative (figure 18.6).
[Figure 18.3: Scatter diagram of et versus xt—Corr X&E = 0]

[Figure 18.4: Error term probability distributions for low, medium, and high x values—Corr X&E = 0]
• When the value of the explanatory variable is high, the error term is typically positive. The mean of the high x value error terms is positive (figure 18.6).
Last, we select −0.60 in the Corr X&E list. Again, the scatter diagram points are not spread
evenly (figure 18.7). The explanatory variable and error term are now negatively correlated. As
the value of explanatory variable rises, the error term falls:
•When the value of the explanatory variable is low, the error term is typically positive. The
mean of the low x value error terms is positive.
•When the value of the explanatory variable is high, the error term is typically negative. The
mean of the high x value error terms is negative.
[Figure 18.5: Scatter diagram of et versus xt—Corr X&E = 0.6]

[Figure 18.6: Error term probability distributions—Corr X&E = 0.6. Low x values: Mean = −24, Variance = 500; medium x values: Mean = 0, Variance = 500; high x values: Mean = 24, Variance = 500.]
We will proceed by explaining geometrically why correlation between the explanatory variable and the error terms biases the ordinary least squares (OLS) estimation procedure for the coefficient value. Then we will use a simulation to confirm our logic.

Focus attention on figure 18.8. The lines in the lower two graphs represent the actual relationship between the dependent variable, yt, and the explanatory variable, xt:

yt = βConst + βxxt
[Figure 18.7: Scatter diagram of et versus xt—Corr X&E = −0.6]

[Figure 18.8: Explanatory variable/error term correlation. Left panels: explanatory variable and error term positively correlated; right panels: negatively correlated. The lower panels show the actual equation line in the xt–yt scatter diagram.]
βConst is the actual constant and βx the actual coefficient. Now we will examine the left and right panels:
• Left panels of figure 18.8: The explanatory variable, xt, and error term, et, are positively correlated, as illustrated in the top left scatter diagram. The et tends to be low for low values of xt and high for high values of xt. Now consider the bottom left scatter diagram in which the xt's and yt's are plotted. When the explanatory variable and the error term are positively correlated, the scatter diagram points tend to lie below the actual equation line for low values of xt and above the actual equation line for high values of xt.
• Right panels of figure 18.8: The explanatory variable, xt, and error term, et, are negatively correlated, as illustrated in the top right scatter diagram. The et tends to be high for low values of xt and low for high values of xt. Now consider the bottom right scatter diagram in which the xt's and yt's are plotted. When the explanatory variable and the error term are negatively correlated, the scatter diagram points tend to lie above the actual equation line for low values of xt and below the actual equation line for high values of xt.
In figure 18.9 we have added the best fitting line for each of the two panels:
• Left panels of figure 18.9: When the explanatory variable and error terms are positively correlated, the best fitting line is more steeply sloped than the actual equation line; consequently the ordinary least squares (OLS) estimation procedure for the coefficient value is biased upward.
• Right panels of figure 18.9: When the explanatory variable and error terms are negatively correlated, the best fitting line is less steeply sloped than the actual equation line; consequently the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward.
Based on our logic, we would expect the ordinary least squares (OLS) estimation procedure for the coefficient value to be biased whenever the explanatory variable and the error term are correlated.
Econometrics Lab 18.2: Ordinary Least Squares (OLS) and Explanatory Variable/Error Term
Correlation
We can confirm our logic using a simulation. As a base case, we begin with 0.00 specified in
the Corr X&E list; the explanatory variables and error terms are independent. Click Start and
then after many, many repetitions click Stop. The simulation confirms that no bias results when-
ever the explanatory variable/error term independence premise is satisfied.
Explanatory variable and error term positively correlated: for low xt's, the et's are low and yt lies below the actual equation line; for high xt's, the et's are high and yt lies above the actual equation line.

Explanatory variable and error term negatively correlated: for low xt's, the et's are high and yt lies above the actual equation line; for high xt's, the et's are low and yt lies below the actual equation line.

[Figure 18.9: Explanatory variable/error term correlation with best fitting line]
Now, specify 0.30 in the Corr X&E list; the explanatory variable and error terms are positively
correlated. Click Start and then after many, many repetitions click Stop. The average of the
estimated coefficient values, 6.1, exceeds the actual value, 2.0; the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased upward whenever the explanatory vari-
able and error terms are positively correlated. By selecting −0.6 from the “Corr X&E” list, we
can show that downward bias results whenever the explanatory variable and error terms are
negatively correlated. The average of the estimated coefficient values, −2.1, is less than the actual
value, 2.0 (table 18.1).
Table 18.1
Explanatory variable/error term correlation—Simulation results
Explanatory variable/error term correlation creates a problem for the ordinary least squares
(OLS) estimation procedure. Positive correlation causes upward bias and negative correlation
causes downward bias. What can we do in these cases? Econometricians respond to this question
very pragmatically by adopting the philosophy that “half a loaf is better than none.” In general,
we use different estimation procedures that, while still biased, may meet an arguably less
demanding criterion called consistency. In most cases, consistency is not as desirable as is being
unbiased; nevertheless, if we cannot find an unbiased estimation procedure, consistency proves
to be better than nothing. After all, “half a loaf is better than none.” To explain the notion of
consistency, we begin by reviewing what it means for an estimation procedure to be unbiased
(figure 18.10).
Unbiased: An estimation procedure is unbiased whenever the mean (center) of the estimate’s
probability distribution equals the actual value.
Mean of the estimate’s probability distribution = Actual value
Mean (average) of the estimates = Actual value after many, many repetitions
[Figure 18.10: Unbiased estimation procedure—the estimate's probability distribution is centered at the actual value]
Being unbiased is a small sample property because the size of the sample plays no role in
determining whether or not an estimation procedure is unbiased.
Consistent: Consistency is a large sample property; the sample size plays a critical role here. Both the mean and the variance of the estimate's probability distribution are important when deciding whether an estimation procedure is consistent.

Mean of the estimate's probability distribution: Consistency requires the mean either to
• equal the actual value, or
• approach the actual value as the sample size approaches infinity; that is, the magnitude of the bias must diminish as the sample size becomes larger.

Variance of the estimate's probability distribution: Consistency requires the variance to approach 0 as the sample size approaches infinity:

Variance[Est] → 0 as Sample size → ∞
Figure 18.11 illustrates the relationship between the two properties of estimation procedures.
Figure 18.12 provides a flow diagram, a “roadmap,” that we can use to determine the properties
of an estimation procedure.
To illustrate the distinction between these two properties of estimation procedures we will
consider three examples:
• Unbiased and consistent.
• Unbiased but not consistent.
• Biased and consistent.
[Figure 18.11: Unbiased and consistent estimation procedures]

[Figure 18.12: Determining the properties of an estimation procedure—a flow diagram: Is the procedure unbiased or biased? Does Var[Est] → 0 as the sample size → ∞?]
When the standard ordinary least squares (OLS) premises are met the ordinary least squares
(OLS) estimation procedure is not only unbiased, but also consistent. We will use our Econo-
metrics Lab to illustrate this.
This estimation procedure is unbiased and consistent (table 18.2). After many, many
repetitions:
• The average of the estimated coefficient values equals the actual value, 2.0, suggesting that
the estimation procedure is unbiased.
• The variance of the estimated coefficient values appears to be approaching 0 as the sample
size increases.
Table 18.2
Unbiased and consistent estimation procedure
[Figure 18.13: OLS estimation procedure—probability distributions of the estimates for small and large samples]
When the standard ordinary least squares (OLS) premises are met, the ordinary least squares
(OLS) estimation procedure provides us with the best of all possibilities; it is both unbiased and
consistent (figure 18.13).
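A rough sketch of that check follows (Python/statsmodels assumed; the parameter values are our own, so only the pattern, not the specific numbers, should be compared with table 18.2):

```python
# When the standard premises hold, estimates center on the actual value and their variance
# shrinks as the sample size grows (assumptions: Python, numpy, statsmodels).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
actual_coef = 2.0
for T in (50, 100, 150):
    estimates = []
    for _ in range(1000):
        x = rng.uniform(0, 30, T)
        y = 10 + actual_coef * x + rng.normal(0, 20, T)
        estimates.append(sm.OLS(y, sm.add_constant(x)).fit().params[1])
    print(T, np.mean(estimates), np.var(estimates))
# The mean stays near 2.0 at every sample size; the variance falls as T increases.
```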
The Any Two estimation procedure that we introduced in chapter 6 provides us with an example
of an estimation procedure that is unbiased but not consistent. Let us review the Any Two esti-
mation procedure. First we construct a scatter diagram plotting the explanatory variable on the
horizontal axis and the dependent variable on the vertical axis. Then we choose any two points
at random and draw a straight line connecting these points. The coefficient estimate equals the
slope of this line (figure 18.14).
[Figure 18.14: Any Two estimation procedure—a line drawn through two randomly chosen points in the xt–yt scatter diagram]
Table 18.3
Any Two estimation procedure
As table 18.3 reports, the Any Two estimation procedure is unbiased but not consistent. After many, many repetitions:
• The average of the estimated coefficient values equals the actual value, 2.0, suggesting that the estimation procedure is unbiased.
• The variance of the estimated coefficient values increases as the sample size increases (figure 18.15); consequently the estimation procedure is not consistent.
[Figure 18.15: Any Two estimation procedure—probability distributions of the estimates for small and large samples]
To illustrate an estimation procedure that is biased, but consistent, we will revisit the opinion
poll conducted by Clint. Recall that Clint used a random sampling procedure to poll the
population.
Calculate the fraction of the sample supporting Clint. This estimation procedure proved to be
unbiased.
But now consider an alternative approach. Suppose that you are visiting Clint in his dorm
room and he asks you to conduct the poll. Instead of taking the time to write the name of each
individual on a 3 × 5 card, you simply leave Clint’s room and ask the first 16 people you run
into how he/she will vote.
Why do we call this a nonrandom sampling technique? Compared to the general student
population:
• Are the students who live near Clint more likely to be a friend of Clint?
• Consequently, are the students who live near Clint more likely to vote for Clint?
Since your starting point is Clint’s dorm room, it is likely that you will poll students who are
Clint’s friends. They will probably be more supportive of Clint than the general student popula-
tion, will they not? Consequently we would expect this nonrandom polling technique to be biased
in favor of Clint. We will use a simulation to test our logic.
Observe that you can select the sampling technique by checking or clearing the Nonrandom
Sample checkbox (see figure 18.16). Begin by clearing the Nonrandom Sample checkbox to
choose the random sampling technique; this provides us with a benchmark. Click Start and then
after many, many repetitions click Stop. As before, we observe that the estimation procedure is
unbiased. Convince yourself that the random sampling technique is also consistent by increasing
the sample size from 16 to 25 and to 100.
Next specify the nonrandom technique that we just introduced by checking the “Nonrandom
Sample” checkbox. You walk out of Clint’s dorm room and poll the first 16 people you run into.
Click Start and then after many, many repetitions click Stop. The simulation results confirm
our logic. The nonrandom polling technique biases the poll results in favor of Clint. But now
what happens as we increase the sample size from 16 to 25 and then to 100?
We observe that while the nonrandom sampling technique is still biased, the magnitude of the
bias declines as the sample size increases (table 18.4). As the sample size increases from 16 to
25 to 100, the magnitude of the bias decreases from 0.06 to 0.04 to 0.01. This makes sense, does
it not? As the sample size becomes larger, you will be farther and farther from Clint’s dorm
room, which means that you will be getting a larger and larger portion of your sample from the
general student population rather than Clint’s friends. Furthermore the variance of the estimates
also decreases as the sample size increases. This estimation procedure is biased but consistent.
After many, many repetitions:
[Figure 18.16: Opinion Poll simulation—Nonrandom sample checkbox; is Clint's estimation procedure unbiased?]

Table 18.4
Opinion Poll simulation—Random and nonrandom samples

[Figure 18.17: Nonrandom sample estimation procedure—probability distributions of the estimates for small and large samples]
• The average of the estimates appears to be approaching the actual value, 0.5.
• The variance of the estimated coefficient values appears to be approaching 0 as the sample
size increases (figure 18.17).
18.6 The Ordinary Least Squares (OLS) Estimation Procedure, and Consistency
We have shown that when the explanatory variable/error term independence premise is violated,
the ordinary least squares (OLS) estimation procedure for the coefficient estimate is biased. But
might it be consistent?
Econometrics Lab 18.6: Ordinary Least Squares (OLS) Estimation Procedure and Consistency
Clearly, the magnitude of the bias does not diminish as the sample size increases (table 18.5).
The simulation demonstrates that when the explanatory variable/error term independence premise
is violated, the ordinary least squares (OLS) estimation procedure is neither unbiased nor con-
sistent. This leads us to a new estimation procedure, the instrumental variable (IV) estimation
procedure. Like ordinary least squares, the instrumental variables (IV) estimation procedure will prove to be biased when
the explanatory variable/error term independence premise is violated, but it has an advantage:
under certain conditions, the instrumental variable (IV) estimation procedure is consistent.
Table 18.5
Explanatory variable/error term correlation—Simulation results
Original model: yt = βConst + βxxt + εt,   t = 1, 2, ..., T

where yt = dependent variable, xt = explanatory variable, εt = error term, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.

[Figure 18.18: The "problem" explanatory variable]
In some situations the instrumental variable estimation procedure can mitigate, but not com-
pletely remedy, cases where the explanatory variable and the error term are correlated (figure
18.18). When an explanatory variable, xt, is correlated with the error term, εt, we will refer to
the explanatory variable as the “problem” explanatory variable. The correlation of the explana-
tory variable and the error term creates the bias problem for the ordinary least squares (OLS)
estimation procedure.
We begin by searching for another variable called an instrument. Traditionally, we denote the
instrument by the lower case Roman letter z, zt. An effective instrument must possess two prop-
erties. A “good” instrument, zt, must be
• correlated with the “problem” explanatory variable, xt, and
• independent of the error term, εt.
We use the instrument to provide us with an estimate of the “problem” explanatory variable.
Then this estimate is used as a surrogate for the “problem” explanatory variable. The estimate
of the “problem” explanatory variable, rather than the “problem” explanatory variable itself, is
used to explain the dependent variable.
603 Explanatory Variable/Error Term Independence Premise, Consistency, and Instrumental Variables
18.7.2 Mechanics
Instrumental Variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental Variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
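The two regressions can be carried out with any ordinary least squares routine. The following Python sketch (ours, not part of the text) spells out the mechanics for a single set of hypothetical data; the helper function ols and the particular numbers used to generate y, x, and z are illustrative assumptions only.

import numpy as np

def ols(y, x):
    # OLS regression of y on a constant and a single regressor x; returns (constant, slope)
    X = np.column_stack([np.ones_like(x), x])
    const, slope = np.linalg.lstsq(X, y, rcond=None)[0]
    return const, slope

rng = np.random.default_rng(1)
z = rng.normal(size=200)                       # instrument
e = rng.normal(size=200)                       # error term
x = 0.8 * z + 0.5 * e + rng.normal(size=200)   # "problem" explanatory variable: correlated with e
y = 1.0 + 2.0 * x + e                          # actual model, coefficient = 2.0

# IV Regression 1: regress the "problem" explanatory variable x on the instrument z
a_const, a_z = ols(x, z)
est_x = a_const + a_z * z                      # Estxt, the surrogate for xt

# IV Regression 2: regress the original dependent variable y on the surrogate Estxt
b_const, b_x = ols(y, est_x)
print("IV estimate of the coefficient: ", round(b_x, 3))
print("OLS estimate, for comparison:   ", round(ols(y, x)[1], 3))

With a single simulated sample the two estimates will vary from run to run, but the OLS estimate is systematically pulled upward by the correlation between x and e, while the IV estimate centers near 2.0 as the sample grows.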
Let us now provide the intuition behind why a “good” instrument, zt, must satisfy the two condi-
tions mentioned above.
The estimate, Estxt, will be a good surrogate only if it is a good predictor of the “problem”
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise, the
instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
As we will see, while instrumental variable estimation procedure will not solve the problem of
bias, it can mitigate it. We will use a simulation to illustrate that while the instrumental variable
(IV) estimation procedure is still biased, it is consistent when “good” instrument conditions are
satisfied (figure 18.19).
Econometrics Lab 18.7: Instrumental Variables (IV) Estimation Procedure and Consistency
Two new correlation lists appear in this simulation: Corr X&Z and Corr Z&E. The two new lists
reflect the two conditions required for a good instrument:
• The Corr X&Z list specifies the correlation coefficient for the explanatory variable and the
instrument. To be a “good” instrument the explanatory variable and the instrument must be cor-
related. The default value is 0.50.
• The Corr Z&E specifies the correlation coefficient for the instrument and error term. To be a
“good” instrument the instrument and error term must be independent. The default value is 0.00;
that is, the instrument and error term are independent.
Figure 18.19
Instrumental Variables simulation
Table 18.6
IV estimation procedure—"Good" instrument conditions satisfied
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Table 18.7
IV estimation procedure—A better instrument
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Initially, the sample size equals 50. Click Start and then after many, many repetitions click Stop.
Subsequently we increase the sample size from 50 to 100 and then again from 100 to 150. Table
18.6 reports the simulation results.
Both bad news and good news emerge:
Bad news: The instrumental variable (IV) estimation procedure is biased. The mean of the estimates for the
coefficient of the explanatory variable does not equal the actual value we specified, 2.0.
Good news: As we increase the sample size,
• the mean of the coefficient estimates gets closer to the actual value
and
• the variance of the coefficient estimates becomes smaller.
This illustrates the fact that the instrumental variable (IV) estimation procedure is consistent.
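The pattern in table 18.6 can be imitated with a small Monte Carlo sketch of our own. It generates an explanatory variable that is correlated with the error term together with an instrument that is correlated with the explanatory variable but independent of the error term; the particular correlation strengths and the number of repetitions are arbitrary choices, not the lab's exact settings.

import numpy as np

rng = np.random.default_rng(2)
ACTUAL_COEF = 2.0
REPS = 10_000

def iv_estimate(y, x, z):
    # IV Regression 1: regress x on z; IV Regression 2: regress y on the surrogate Estx
    slope1, const1 = np.polyfit(z, x, 1)
    est_x = const1 + slope1 * z
    slope2, _ = np.polyfit(est_x, y, 1)
    return slope2

for n in (50, 100, 150):
    estimates = np.empty(REPS)
    for rep in range(REPS):
        z = rng.normal(size=n)                       # instrument, independent of the error term
        e = rng.normal(size=n)                       # error term
        x = z + 0.5 * e + rng.normal(size=n)         # explanatory variable correlated with both
        y = 1.0 + ACTUAL_COEF * x + e
        estimates[rep] = iv_estimate(y, x, z)
    print(f"sample size = {n:3d}   mean of coef ests = {estimates.mean():.3f}   "
          f"variance of coef ests = {estimates.var():.4f}")

As the sample size rises, the mean of the coefficient estimates should drift toward 2.0 and their variance should shrink, the two hallmarks of consistency; repeating the exercise with ordinary least squares in place of iv_estimate leaves the bias intact.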
Next we will use the lab to illustrate the importance of the “good” instrument conditions.
First, let us see what happens when we improve the instrument by making it more highly cor-
related with the problem explanatory variable. We do this by increasing the correlation coeffi-
cient of the explanatory variable and the instrument from 0.50 to 0.75 in the Corr X&Z list
(table 18.7).
The magnitude of the bias decreases, and the variance of the coefficient estimates also
decreases. A more highly correlated instrument provides a better estimate of the "problem"
explanatory variable in IV Regression 1 and hence is a better instrument.
Table 18.8
IV estimation procedure—Instrument correlated with error term
(Columns: estimation procedure; correlation coefficients X&Z, Z&E, and X&E; sample size; actual coef; mean of coef ests; magnitude of bias; variance of coef ests)
Last, let us use the lab to illustrate the important role that the independence of the error term
and the instrument plays by specifying 0.10 from the Corr Z&E list; the instrument and the error
term are no longer independent (table 18.8). As we increase the sample size from 50 to 100 to
150, the magnitude of the bias does not decrease. The instrumental variable (IV) estimation
procedure is no longer consistent when the instrument is correlated with the error term; the
explanatory variable/error term independence premise is violated in IV Regression 2.
1. What are the ramifications for the ordinary least squares (OLS) estimation procedure for the
value of the coefficient if the explanatory variable and error term are
a. positively correlated?
b. negatively correlated?
c. independent?
2. How does the problem resulting from explanatory variable/error term correlation differ from
the problems caused by heteroskedasticity or autocorrelation?
3. When is an estimation procedure unbiased?
4. When is an estimation procedure consistent?
5. Must an unbiased estimation procedure be consistent? Explain.
6. Must a consistent estimation procedure be unbiased? Explain.
7. What are the two “good” instrument conditions? Why is each important?
Chapter 18 Exercises
1. The Nonrandom Sample box is cleared; hence the random sampling procedure described above
will be used. By default the actual fraction supporting Clint, ActFrac, equals 0.5; also the From–
To values are specified as 0.45 and 0.55. Click the Start button and then after many, many,
repetitions click Stop.
2. The Nonrandom Sample box is checked; hence the nonrandom sampling procedure described
above will be used. As in problem 1, the sample size equals 16, the actual fraction supporting
Clint, ActFrac, equals 0.5, and the From–To values are specified as 0.45 and 0.55. Click the
Start button and then after many, many, repetitions click Stop.
a. What does the mean of the estimates equal?
b. What is the magnitude of the bias?
c. What does the variance of the estimates equal?
d. What percent of the repetitions fall within 0.05 of the actual fraction, ActFrac?
3. Compare your answers to problems 1 and 2. When the sample size is the same which sampling
procedure is more reliable?
4. Clearly, the nonrandom polling procedure requires less “setup.” It is not necessary to write
the name of each student on a separate card, and so forth. Consequently, with the nonrandom
procedure, there is time to poll more students. In the following simulation the sample size has
been raised from 16 to 25 to account for this.
Click the Start button and then after many, many, repetitions click Stop.
a. What does the mean of the estimates equal?
b. What is the magnitude of the bias?
c. What does the variance of the estimates equal?
d. What percent of the repetitions fall within .05 of the actual fraction, ActFrac?
5. Compare your answers to problems 1 and 4. Is an unbiased estimation procedure always
better than a biased one? Explain.
19 Measurement Error and the Instrumental Variables Estimation Procedure
Chapter 19 Outline
19.2 The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable
Measurement Error
19.3 The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable
Measurement Error
19.3.1 Summary: Explanatory Variable Measurement Error Bias
19.3.2 Explanatory Variable Measurement Error: Attenuation (Dilution) Bias
19.3.3 Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
1. Suppose that a physics assignment requires you to measure the amount of time it takes a one
pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop
watch to measure how long it takes the weight to fall.
a. Even though you are very careful and conscientious, would you expect the stop watch to
report precisely the same amount of time on each trial? Explain.
Suppose that the following equation describes the relationship between the measured elapsed
time and the actual elapsed time:
yMeasuredt = yActualt + vt
where yMeasuredt = the measured elapsed time and yActualt = the actual elapsed time,
and where vt is a random variable. vt represents the random influences that cause your measure-
ment of the elapsed time to deviate from the actual elapsed time. The random influences cause
you to click the stop watch a little early or a little late.
b. Recall that you are careful and conscientious in attempting to measure the elapsed time.
i. In approximately what portion of the trials would you overestimate the elapsed time;
that is, in approximately what portion of the trials would you expect vt to be positive?
ii. In approximately what portion of the trials would you underestimate the elapsed time;
that is, in approximately what portion of the trials would you expect vt to be negative?
iii. Approximately what would the mean (average) of vt equal?
2. Economists distinguish between permanent income and annual income. Loosely speaking,
permanent income equals what a household earns per year “on average;” that is, permanent
income can be thought of as the “average” of annual income over an entire lifetime. In some
years, a household's annual income is more than its permanent income, but in other years, it is less. The dif-
ference between the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where IncTranst = transitory income, IncAnnt = annual income, and IncPermt = permanent income,
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what a household earns “on average,” the mean of transitory
income equals 0. Microeconomic theory teaches that households base their consumption deci-
sions on their “permanent” income.
Theory: Additional permanent income increases consumption.
When we attempt to gather data to assess this theory, we immediately encounter a difficulty.
Permanent income cannot be observed. Only annual income data are available to assess the
theory. So, while we would like to specify permanent income as the explanatory variable, we
have no choice. We must use annual disposable income.
a. Can you interpret transitory income as measurement error? Hint: What is the mean
(average) of transitory income?
b. Now represent transitory income, IncTranst, by ut:
IncAnnt = IncPermt + ut
We will argue that dependent variable measurement error does not lead to bias. However, when-
ever explanatory variable measurement error exists, the explanatory variable and error term will be
correlated, resulting in bias. We consider dependent variable measurement error first. Before
doing so, we will describe precisely what we mean by measurement error.
Suppose that a physics assignment requires you to measure the amount of time it takes a one
pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop
watch to measure how long it takes the weight to fall.
Question: Will your stop watch report the same amount of time on each trial?
Answer: No. Sometimes reported times will be lower than other reported times. Sometimes
you will be a little premature in clicking the stop watch button. Other times you will be a little
late.
It is humanly impossible to measure the actual elapsed time perfectly. No matter how careful
you are, sometimes the measured value will be a little low and other times a little high. This
phenomenon is called measurement error.
yMeasuredt = yActualt + vt
Recall that yActualt equals the actual amount of time elapsed and yMeasuredt equals the mea-
sured amount of time; vt represents measurement error. Sometimes vt will be positive when you
are a little too slow in clicking the stop watch button; other times vt will be negative when you
click the button a little too soon. vt is a random variable; we cannot predict the numerical value
of vt beforehand. What can we say about vt? We can describe its distribution. Since you are
conscientious in measuring the elapsed time, the mean of vt’s probability distribution equals 0:
Mean[vt] = 0
Measurement error does not systematically increase or decrease the measured value of yt. The
measured value of yt, yMeasuredt, will not systematically overestimate or underestimate the actual
value.
19.2 The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable
Measurement Error
We begin with the equation specifying the actual relationship between the dependent and
explanatory variables:
yActualt = βConst + βxxActualt + et
But now suppose that as a consequence of measurement error, the actual value of the dependent
variable, yActualt, is not observable. You have no choice but to use the measured value, yMea-
suredt. Recall that the measured value equals the actual value plus the measurement error random
variable, vt:
yMeasuredt = yActualt + vt
where vt is a random variable with mean 0: Mean[vt] = 0. Solving for yActualt:
yActualt = yMeasuredt − vt
Substituting this into the equation describing the actual relationship and rearranging:
yMeasuredt = βConst + βxxActualt + et + vt = βConst + βxxActualt + εt, where εt = et + vt
εt represents the error term in the regression that you will actually be running. Will this result
in bias? To address this issue consider the following question:
Question: Are the explanatory variable, xActualt, and the error term, εt, correlated?
To answer the question, suppose that the measurement error term, vt, were to increase:
vt up
εt = et + vt
xActualt unaffected ↔ εt up
The value of the explanatory variable, xActualt, is unchanged while the error term, εt, increases.
Hence the explanatory variable and error term εt are independent; consequently no bias should
result.
We use a simulation to confirm our logic (figure 19.1). First we consider our base case, the no
measurement error case. The YMeas Err checkbox is cleared indicating that no dependent vari-
able measurement error is present. Consequently no bias should result (figure 19.1).
Figure 19.1
Dependent variable measurement error simulation
Be certain that the Pause checkbox is cleared and click Start. After many, many repetitions, click Stop. The ordinary least
squares (OLS) estimation procedure is unbiased in this case; the average of the estimated coef-
ficient values and the actual coefficient value both equal 2.0. When no measurement error is
present, all is well.
Now we will introduce dependent variable measurement error by checking the YMeas Err
checkbox. The YMeas Var list now appears with 20.0 selected; the variance of the measurement
error’s probability distribution, Var[vt], equals 20.0. Click Start and then after many, many
repetitions click Stop. Again, the average of the estimated coefficient values and the actual coef-
ficient value both equal 2.0. Next increase the measurement error variance from 20.0 to 50.0 and then to 80.0 in the YMeas Var list and
repeat the process.
The simulation confirms our logic (table 19.1). Even when dependent variable measurement
error is present, the average of the estimated coefficient values equals the actual coefficient value.
Dependent variable measurement error does not lead to bias.
What are the ramifications of dependent variable measurement error? The last column of table
19.1 reveals the answer. As measurement error variance increases, the variance of the estimated
coefficient values and hence the variance of the coefficient estimate’s probability distribution
increases. As the variance of the dependent variable measurement error term increases, we
introduce “more uncertainty” into the process and hence, the ordinary least squares (OLS) esti-
mates become less reliable.
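Both points, unbiasedness and reduced reliability, can be checked with a short Monte Carlo sketch of our own; the measurement error variances 0, 20, 50, and 80 echo the YMeas Var settings above, while the sample size of 10, the fixed x values, and the other variances are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(3)
N, REPS, ACTUAL_COEF = 10, 10_000, 2.0
x = np.linspace(1.0, 10.0, N)                        # explanatory variable, held fixed

for meas_var in (0.0, 20.0, 50.0, 80.0):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        e = rng.normal(0.0, np.sqrt(10.0), N)        # the model's error term
        y_actual = 5.0 + ACTUAL_COEF * x + e
        v = rng.normal(0.0, np.sqrt(meas_var), N)    # dependent variable measurement error
        y_measured = y_actual + v                    # only the measured value is observed
        slopes[rep] = np.polyfit(x, y_measured, 1)[0]
    print(f"Var[v] = {meas_var:4.0f}   mean of coef ests = {slopes.mean():.3f}   "
          f"variance of coef ests = {slopes.var():.3f}")

The mean of the estimates stays at roughly 2.0 for every setting, while the variance of the estimates grows with Var[v], matching the pattern described for table 19.1.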
Table 19.1
Dependent variable measurement error simulation results
Sample size = 10
19.3 The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable
Measurement Error
To investigate explanatory variable measurement error we again begin with the equation that
describes the actual relationship between the dependent and explanatory variables:
yt = βConst + βxActualxActualt + et
Next suppose that we cannot observe the actual value of the explanatory variable; we can only
observe the measured value. The measured value equals the actual value plus the measurement
error random variable, ut:
xMeasuredt = xActualt + ut
where ut is a random variable with mean 0: Mean[ut] = 0. Solving for xActualt:
xActualt = xMeasuredt − ut
Substituting this into the equation describing the actual relationship:
yt = βConst + βxActual(xMeasuredt − ut) + et = βConst + βxActualxMeasuredt + εt, where εt = et − βxActualut
Recall what we learned about correlation between the explanatory variable and the error term
(figure 19.2):
Are the explanatory variable, xMeasuredt, and the error term, εt, correlated? The answer to the
question depends on the sign of the actual coefficient. Consider the three possibilities:
• βxActual > 0: When the actual coefficient is positive, negative correlation exists; consequently,
the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased
downward. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual > 0
xMeasuredt up ↔ εt down
• βxActual < 0: When the actual coefficient is negative, positive correlation exists; consequently,
the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased
upward. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual < 0
xMeasuredt up ↔ εt up
Figure 19.2
Explanatory variable measurement error simulation
• βxActual = 0: When the actual coefficient equals 0, no correlation exists; consequently no bias
results. To understand why, suppose that ut increases:
ut up
xMeasuredt = xActualt + ut    εt = et − βxActualut    βxActual = 0
xMeasuredt up ↔ εt unaffected
No explanatory variable/error term correlation
↓
OLS unbiased
We will use a simulation to check our logic. This time we check the XMeas Err checkbox. The
XMeas Var list now appears with 20.0 selected; the variance of the measurement error’s probabil-
ity distribution, Var[ut], equals 20.0. Then we select various values for the actual coefficient. In
each case, click Start and then after many, many repetitions click Stop. The simulation results
are reported in table 19.2.
The simulation results confirm our logic. When the actual coefficient is positive and explanatory
variable measurement error is present, the ordinary least squares (OLS) estimation procedure for
the coefficient value is biased downward. When the actual coefficient is negative and explanatory
variable measurement error is present, upward bias results. Last, when the actual coefficient is
zero, no bias results even in the presence of explanatory variable measurement error.
Table 19.2
Explanatory variable measurement error simulation results
Sample size = 40
Figure 19.3
Effect of explanatory variable measurement error (bias toward 0 whether βxActual < 0 or βxActual > 0)
The simulations reveal an interesting pattern. While explanatory variable measurement error
leads to bias, the bias never appears to be strong enough to change the sign of the mean of the
coefficient estimates. In other words, explanatory variable measurement error biases the ordinary
least squares (OLS) estimation procedure for the coefficient value toward 0. This type of bias
is called attenuation or dilution bias (figure 19.3).
Why does explanatory variable measurement error cause attenuation bias? Even more basic,
why does explanatory variable measurement error cause bias at all? After all, the chances that
the measured value of the explanatory variable will be too high equal the chances it will be too
low. Why should this lead to bias? To appreciate why, suppose that the actual value of the coef-
ficient, βxActual, is positive. When the measured value of the explanatory variable, xMeasuredt,
rises it can do so for two reasons:
• the actual value of explanatory variable, xActualt, rises
or
• the value of the measurement error term, ut, rises.
xActualt up → xMeasuredt up and yActualt up
or
ut up → xMeasuredt up and yActualt unchanged
In the first case the dependent variable rises along with the measured explanatory variable; in the second case it does not. Taking into account both cases, we conclude that the estimation procedure will understate the
effect that the actual value of the explanatory variable has on the dependent variable. Overall,
the estimation procedure will understate the actual value of the coefficient.
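The attenuation story can be seen in another short Monte Carlo sketch of ours; the variances and the sample size of 40 merely mirror the settings mentioned above, and the rest is an arbitrary illustration.

import numpy as np

rng = np.random.default_rng(4)
N, REPS = 40, 10_000

for actual_coef in (2.0, 0.0, -2.0):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        x_actual = rng.normal(0.0, np.sqrt(20.0), N)
        u = rng.normal(0.0, np.sqrt(20.0), N)        # explanatory variable measurement error
        e = rng.normal(0.0, np.sqrt(10.0), N)        # the model's error term
        y = 5.0 + actual_coef * x_actual + e
        x_measured = x_actual + u                    # only the measured value is observed
        slopes[rep] = np.polyfit(x_measured, y, 1)[0]
    print(f"actual coef = {actual_coef:+.1f}   mean of coef ests = {slopes.mean():+.3f}")

With these (arbitrary) variances the mean of the estimates is pulled about halfway toward 0, yet it never crosses 0: positive coefficients stay positive, negative coefficients stay negative, and a zero coefficient remains unbiased.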
19.3.3 Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
We have already shown that when explanatory variable measurement error is present and the
actual coefficient is nonzero, the ordinary least squares (OLS) estimation procedure for the coef-
ficient value is biased. But perhaps it is consistent. Let us see by increasing the sample size
(table 19.3).
The bias does not lessen as the sample size is increased. Unfortunately, when explanatory
variable measurement error is present and the actual coefficient is nonzero, the ordinary least
squares (OLS) estimation procedure for the coefficient value provides only bad news:
• Bad news: The ordinary least squares (OLS) estimation procedure is biased.
• Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
Table 19.3
OLS estimation procedure, measurement error, and consistency
Original model: yt = βConst + βxxt + εt, where yt = dependent variable, xt = explanatory variable, εt = error term, t = 1, 2, . . ., T, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.
Figure 19.4
The "problem" explanatory variable
Recall that the instrumental variable estimation procedure addresses situations in which the
explanatory variable and the error term are correlated (figure 19.4).
When an explanatory variable, xt, is correlated with the error term, εt, we will refer to the
explanatory variable as the “problem” explanatory variable. The correlation of the explanatory
variable and the error term creates the bias problem for the ordinary least squares (OLS) estima-
tion procedure. The instrumental variable estimation procedure can mitigate, but not completely
remedy the problem. Let us briefly review the procedure and motivate it.
19.4.1 Mechanics
Instrumental variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
Let us now provide the intuition behind why a “good” instrument, zt, must satisfy the two condi-
tions: instrument/”problem” explanatory variable correlation and instrument/error term
independence.
The estimate, Estxt, will be a good surrogate only if it is a good predictor of the “problem”
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise in IV
Regression 2, the instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
Economists distinguish between permanent income and annual income. Loosely speaking, per-
manent income equals what a household earns per year “on average;” that is, permanent income
can be thought of as the “average” of annual income over an entire lifetime. In some years, the
household’s annual income is more than its permanent income, but in other years, it is less. The
difference between the household’s annual income and permanent income is called transitory
income:
IncTranst = IncAnnt − IncPermt
where IncTranst = transitory income, IncAnnt = annual income, and IncPermt = permanent income,
or equivalently,
IncAnnt = IncPermt + IncTranst
Since permanent income equals what the household earns “on average,” the mean of transitory
income equals 0.
Microeconomic theory teaches that households base their consumption decisions on their
“permanent” income. We are going to apply the permanent income consumption theory to health
insurance coverage:
Theory: Additional permanent per capita disposable income within a state increases health
insurance coverage within the state.
Project: Assess the effect of permanent income on health insurance coverage.
Coveredt = βConst + βIncPermIncPermPCt + et
where
Coveredt = percent of adults (25 and older) covered by health insurance in state t
IncPermPCt = per capita permanent disposable income in state t (thousands of dollars)
When we attempt to gather data to assess this theory, we immediately encounter a difficulty.
Permanent income cannot be observed. Only annual income data are available to assess the
theory.
Health insurance data: Cross-sectional data of health insurance coverage, education, and
income statistics from the 50 states and the District of Columbia in 2007.
Coveredt Adults (25 and older) covered by health insurance in state t (percent)
IncAnnPCt Per capita annual disposable income in state t (thousands of dollars)
HSt Adults (25 and older) who completed high school in state t (percent)
Collt Adults (25 and older) who completed a four year college in state t (percent)
AdvDegt Adults (25 and older) who have an advanced degree in state t (percent)
While we would like to specify permanent income as the explanatory variable, permanent income
is unobservable. We have no choice. We must use annual disposable income as the explanatory
variable. We use the ordinary least squares (OLS) estimation procedure to estimate the parameters
(table 19.4).
Table 19.4
Health insurance OLS regression results
Now construct the null and alternative hypotheses:
H0: βIncPerm = 0 Additional permanent income has no effect on health insurance coverage
H1: βIncPerm > 0 Additional permanent income increases health insurance coverage
Since the null hypothesis is based on the premise that the actual value of the coefficient equals
0, we can calculate the Prob[Results IF H0 true] using the tails probability reported in the regres-
sion printout:
Prob[Results IF H0 true] = 0.0352/2 = 0.0176
19.5.2 Might the Ordinary Least Squares (OLS) Estimation Procedure Suffer from a Serious
Econometric Problem?
Might this regression suffer from a serious econometric problem, however? Yes. Annual income
equals permanent income plus transitory income; transitory income can be viewed as measure-
ment error. Sometimes transitory income is positive, sometimes it is negative, on average it
is 0:
IncPermPCt = IncAnnPCt − ut
As a consequence of explanatory variable measurement error the ordinary least squares (OLS)
estimation procedure for the coefficient will be biased downward. To understand why we begin
with our model and then do a little algebra:
Coveredt = βConst + βIncPermIncPermPCt + et = βConst + βIncPerm(IncAnnPCt − ut) + et
= βConst + βIncPermIncAnnPCt + εt, where εt = et − βIncPermut
where βIncPerm > 0. Theory suggests that βIncPerm is positive; consequently we expect the new error
term, εt, and the explanatory variable, IncAnnPCt, to be negatively correlated.
ut up
IncAnnPCt = IncPermPCt + ut εt = et − βIncPermut βIncPerm > 0
IncAnnPCt up ↔ εt down
IncAnnPCt is the “problem” explanatory variable because it is correlated with the error term, εt.
The ordinary least squares (OLS) estimation procedure for the coefficient value is biased toward
0. We will now show how we can use the instrumental variable (IV) estimation procedure to
mitigate the problem.
Choose an instrument: In this example we use percent of adults who completed high school,
HSt, as our instrument. In doing so, we believe that it satisfies the two “good” instrument condi-
tions. We believe that high school education, HSt,
• is positively correlated with the “problem” explanatory variable, IncAnnPCt
and
• is uncorrelated with the error term, εt.
We can motivate IV Regression 1 by devising a theory to explain permanent income. Our theory
is very straightforward: state per capita permanent income depends on the percent of state residents
who are high school graduates:
IncPermPCt = αConst + αHSHSt + et
where
HSt = percent of adults (25 and over) who completed high school in state t
Theory: As a state has a greater percent of high school graduates, its per capita permanent income
increases; hence αHS > 0.
Table 19.5
Health insurance IV Regression 1 results
But, again, we note that permanent income is not observable, only annual income is. Conse-
quently we have no choice but to use annual per capita income as the dependent variable
(table 19.5).
What are the ramifications of using annual per capita income as the dependent variable? We can
view annual per capita income as permanent per capita income with measurement error. What
do we know about dependent variable measurement error? Dependent variable measurement error does not lead to
bias; only explanatory variable measurement error creates bias. Since annual income is the
dependent variable in IV Regression 1, the ordinary least squares (OLS) estimation procedure
for the regression parameters will not be biased.
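For readers who want to organize the two regressions with standard software, here is one possible Python sketch. The file name health_insurance_2007.csv and its column names are hypothetical stand-ins for the health insurance data described above; pandas and statsmodels are used only as generic data-handling and OLS tools, and this is not the text's own workflow.

import pandas as pd
import statsmodels.api as sm

# hypothetical file holding the 2007 state-level data described in the text
data = pd.read_csv("health_insurance_2007.csv")       # columns: Covered, IncAnnPC, HS, ...

# IV Regression 1: the "problem" explanatory variable IncAnnPC regressed on the instrument HS
reg1 = sm.OLS(data["IncAnnPC"], sm.add_constant(data["HS"])).fit()
data["EstIncAnnPC"] = reg1.fittedvalues                # surrogate for per capita income

# IV Regression 2: the dependent variable Covered regressed on the surrogate EstIncAnnPC
reg2 = sm.OLS(data["Covered"], sm.add_constant(data["EstIncAnnPC"])).fit()
print(reg1.params)
print(reg2.params)

One caveat: dedicated two-stage least squares routines also correct the standard errors in the second regression, which this two-regression shortcut does not, so the coefficient estimates are the quantities to compare with tables 19.5 and 19.6.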
Table 19.6
Health insurance IV Regression 2 results
Table 19.7
Comparison of OLS and IV Regression results
19.6.2 Comparison of the Ordinary Least Squares (OLS) and the Instrumental
Variables (IV) Approaches
Now review the two approaches that we used to estimate the effect of permanent income on
health insurance coverage: the ordinary least squares (OLS) estimation procedure and the instru-
mental variable (IV) estimation procedure.
• First, we used annual disposable income as the explanatory variable and applied the ordinary
least squares (OLS) estimation procedure. We estimated that a $1,000 increase in per capita
disposable income increases health insurance coverage by 0.23 percentage points. But we believe
that an explanatory variable measurement error problem is present here.
• Second, we used an instrumental variable (IV) approach, which resulted in a higher estimate
for the impact of permanent income. We estimated that a $1,000 increase in per capita disposable
income increases health insurance coverage by 1.39 percentage points.
These results are consistent with the notion that the ordinary least squares (OLS) estimation
procedure for the coefficient value is biased downward whenever explanatory variable measure-
ment error is present (table 19.7).
We are using the instrument to create a surrogate for the "problem" explanatory variable in IV
Regression 1:
EstIncAnnPCt = aConst + aHSHSt
The estimate, EstIncAnnPCt, will be a “good” surrogate only if the instrument, HSt, is correlated
with the “problem” explanatory variable, IncAnnPCt; that is, only if the estimate is a good pre-
dictor of the “problem” explanatory variable.
The sign of the HSt coefficient is positive, supporting our view that annual income and high
school education are positively correlated. Furthermore, the coefficient is significant at the 5
percent level and nearly significant at the 1 percent level. So it is reasonable to judge that the
instrument meets the first condition.
Next focus on the second “good” instrument condition:
Instrument/error term independence: The instrument, HS, and the error term, εt, must be inde-
pendent. Otherwise, the explanatory variable/error term independence premise would be violated
in IV Regression 2.
The explanatory variable/error term independence premise will be satisfied only if the instru-
ment, HSt, and the new error term, εt, are independent. If they are correlated, then we have gone
“from the frying pan into the fire.” It was the violation of this premise that created the problem
in the first place. There is no obvious reason to believe that they are correlated. Unfortunately,
there is no way to confirm this empirically, however. This can be the “Achilles heel” of the
instrumental variable (IV) estimation procedure. Finding a good instrument can be very tricky.
Claim: While the instrumental variable (IV) estimation procedure for the coefficient value in
the presence of measurement error is biased, it is consistent.
Econometrics Lab 19.4: Consistency and the Instrumental Variable (IV) Estimation Procedure
While this claim can be justified rigorously, we will avoid the mathematics by using a
simulation.
Focus your attention on figure 19.5. Since we wish to investigate the properties of the instru-
mental variable (IV) estimation procedure, IV is selected in the estimation procedure box. Next
note the XMeas Var list. Explanatory variable measurement error is present.
Figure 19.5
Instrumental variable measurement error simulation
Table 19.8
Measurement error, IV estimation procedure, and consistency
By default, the
variance of the probability distribution for the measurement error term, Var[ut], equals 20.0. In
the Corr X&Z list .50 is selected; the correlation coefficient between the explanatory variable
and the instrument is .50.
Initially, the sample size is 40. Click Start and then after many, many repetitions click Stop.
The average of the estimated coefficient values equals 2.24. Next increase the sample size from
40 to 50 and repeat the process. Do the same for a sample size of 60. As table 19.8 reports, the
average of the estimated coefficient values never equals the actual value; consequently the
instrumental variable (IV) estimation procedure for the coefficient value is biased. But also note
that the magnitude of the bias decreases as the sample size increases. Also the variance of the
estimates declines as the sample size increases.
Table 19.8 suggests that when explanatory variable measurement error is present, the instru-
mental variable (IV) estimation procedure for the coefficient value provides both good news and
bad news:
• Bad news: The instrumental variable (IV) estimation procedure for the coefficient value is
still biased; the average of the estimated coefficient values does not equal the actual value.
• Good news: The instrumental variable (IV) estimation procedure for the coefficient value is
consistent.
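A Monte Carlo sketch of our own, parallel to the one above but with explanatory variable measurement error built in, gives the same flavor as table 19.8; the sample sizes 40, 50, and 60 follow the text, while the variances and correlation strengths are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(5)
ACTUAL_COEF, REPS = 2.0, 10_000

def iv_estimate(y, x, z):
    # IV Regression 1: regress the measured x on the instrument z; IV Regression 2: y on Estx
    slope1, const1 = np.polyfit(z, x, 1)
    slope2, _ = np.polyfit(const1 + slope1 * z, y, 1)
    return slope2

for n in (40, 50, 60):
    estimates = np.empty(REPS)
    for rep in range(REPS):
        z = rng.normal(size=n)                         # instrument
        x_actual = z + rng.normal(size=n)              # actual explanatory variable, correlated with z
        u = rng.normal(0.0, np.sqrt(2.0), n)           # measurement error, independent of z
        e = rng.normal(size=n)                         # error term, independent of z
        y = 1.0 + ACTUAL_COEF * x_actual + e
        x_measured = x_actual + u                      # the "problem" explanatory variable
        estimates[rep] = iv_estimate(y, x_measured, z)
    print(f"sample size = {n:3d}   mean of coef ests = {estimates.mean():.3f}   "
          f"variance of coef ests = {estimates.var():.4f}")

The mean of the estimates does not equal 2.0 at any of these sample sizes, but it moves toward 2.0, and the variance shrinks, as the sample size grows: biased but consistent, just as the text claims.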
Chapter 19 Exercises
where
Suppose that we wish to publish the results of our analysis along with the data. As a consequence
of privacy concerns, we wish to prevent “outsiders” from connecting an individual student’s
exam, problem set, and SAT scores.
1. Randomize the student SATScores. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s SATScores.
• If the coin lands tails, subtract 10 points from that student’s SATScores.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
2. Randomize the student ProbScores. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s ProbScores.
• If the coin lands tails, subtract 10 points from that student’s ProbScores.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
3. Randomize the student ExamScore. More specifically, for each student flip a coin:
• If the coin lands heads, add 10 points to that student’s ExamScore.
• If the coin lands tails, subtract 10 points from that student’s ExamScore.
Use the randomized values in the analysis instead of the actual values. What are the econometric
consequences of this approach?
Health insurance data: Cross-sectional data of health insurance coverage, education, and
income statistics from the 50 states and the District of Columbia in 2007.
Coveredt Percent of adults (25 and older) with health insurance in state t
IncAnnPCt Per capita annual disposable income in state t (thousands of dollars)
HSt Percent of adults (25 and older) who completed high school in state t
Collt Percent of adults (25 and older) who completed a four year college in state t
AdvDegt Percent of adults (25 and older) who have an advanced degree in state t
4. Consider the same theory and model that we used in the chapter:
Theory: Additional permanent per capita disposable income within a state increases health
insurance coverage within the state.
where
Repeat part of what was done in this chapter: use the ordinary least squares (OLS) estimation
procedure to estimate the value of the IncPermPC coefficient. You have no choice but to use
IncAnnPCt as the explanatory variable since IncPermPCt is not observable. Does the sign of the
coefficient lend support for your theory?
Judicial data: Cross-sectional data of judicial and economic statistics for the fifty states in 2000.
JudExpt State and local expenditures for the judicial system per 100,000 persons in state t
CrimesAllt Crimes per 100,000 persons in state t
GdpPCt Real per capita GDP in state t (2000 dollars)
Popt Population in state t (persons)
UnemRatet Unemployment rate in state t (percent)
Statet Name of state t
Yeart Year
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of each
coefficient using the Judicial Data. Interpret the coefficient estimates. What are the critical
results?
8. Many believe that measurement error is present in crime rate statistics. Use the instrumental
variable (IV) estimation procedure to account for the measurement error.
Chapter 20 Outline
20.2 The Ordinary Least Squares Estimation Procedure, Omitted Explanatory Variable Bias,
and Consistency
1. Consider a model with two explanatory variables, x1t and x2t:
yt = βConst + βx1x1t + βx2x2t + et
and a second model in which x2t is omitted:
yt = βConst + βx1x1t + εt
a. Express the second model's error term, εt, as a function of the first model's terms.
Assume that
• the coefficient βx2 is positive
and
• the explanatory variables, x1t and x2t, are positively correlated.
b. Will the explanatory variable, x1t, and the second model’s error term, εt, be correlated? If
so, how?
c. Focus on the second model. Suppose that the ordinary least squares (OLS) estimation
procedure were used to estimate the parameters of the second model. Would the ordinary
least squares (OLS) estimation procedure for the value of βx1 be biased? If so,
how?
2. Consider the following model explaining the vote received by the Democratic Party in the
2008 presidential election:
The variable Liberalt reflects the “liberalness” of the electorate in state t. On the one hand, if
the electorate is by nature liberal in state t, Liberalt would be high; on the other hand, if the
electorate is conservative, Liberalt would be low.
a. Express the second model's error term, εt, as a function of the first model's terms.
Assume that
• the coefficient βLib is positive
and
• the explanatory variables PopDent and Liberalt are positively correlated.
b. Will the explanatory variable PopDent and the second model’s error term, εt, be correlated?
If so, how?
c. Focus on the second model. Suppose that the ordinary least squares (OLS) estimation
procedure were used to estimate the parameters of the second model. Would the ordinary
least squares (OLS) estimation procedure for the value of βLib be biased? If so,
how?
3. What does the correlation coefficient of PopDent and Collt equal?
We will briefly review our previous discussion of omitted explanatory variables that appears in
chapter 14. Then we will show that the omitted explanatory variable phenomenon can also be ana-
lyzed in terms of explanatory variable/error term correlation.
In chapter 14 we argued that omitting an explanatory variable from a regression will bias the
ordinary least squares (OLS) estimation procedure for the coefficient value whenever two condi-
tions are met. Bias results if the omitted variable
• influences the dependent variable;
• is correlated with an included variable.
When these two conditions are met, the ordinary least squares (OLS) procedure to estimate the
coefficient of the included explanatory variable captures two effects:
• Direct effect: The effect that the included explanatory variable actually has on the dependent
variable.
• Proxy effect: The effect that the omitted explanatory variable has on the dependent variable
because the included variable is acting as a proxy for the omitted variable.
Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the indi-
vidual effect that each explanatory variable has on the dependent variable.
Consequently we want the coefficient estimate of the included variable to capture only the direct
effect and not the proxy effect. Unfortunately, the ordinary least squares (OLS) estimation pro-
cedure fails to do this when the omitted variable influences the dependent variable and when it
is also correlated with an included variable.
To illustrate this, we considered a model with two explanatory variables, x1 and x2:
yt = βConst + βx1x1t + βx2x2t + et
What happens when we omit the explanatory variable x2t from the regression?
The two conditions necessary for the omitted variable bias are satisfied:
• Since βx2 is positive, the omitted variable influences the dependent variable.
• Since x1t and x2t are positively correlated, the omitted variable is correlated with an included
variable.
An increase in x1t directly affects yt, causing yt to increase; this is the direct effect we want to
capture. But the story does not end here when x2t is omitted. Since the two explanatory variables
are positively correlated, an increase in x1t is typically accompanied by an increase in x2t, which
in turn leads to an additional increase in yt:
Included variable: x1t up → (βx1 > 0) → yt up → Direct effect
Positive correlation: typically, x2t up as well
Omitted variable: x2t up → (βx2 > 0) → yt up → Proxy effect
When the explanatory variable x2t is omitted from a regression, the ordinary least squares (OLS)
estimation procedure for the value of x1t’s coefficient, βx1, is biased upward because it reflects
not only the impact of x1t itself (direct effect) but also the impact of x2t (proxy effect).
20.1.2 Omitted Explanatory Variable Bias and the Explanatory Variable/Error Term
Independence Premise
We can also use what we learned in chapter 18 about correlation between the explanatory vari-
able and error term to explain why bias occurs. When we omit the explanatory variable x2t from
the regression, the error term of the new equation, εt, includes not only the original error term,
et, but also the "omitted variable term," βx2x2t:
yt = βConst + βx1x1t + βx2x2t + et = βConst + βx1x1t + εt, where εt = βx2x2t + et
The new error term, εt, includes the “omitted variable term,” βx2x2t. Therefore the included
explanatory variable, x1t, and the new error term, εt, are positively correlated:
• Since x1t and x2t are positively correlated, when x1t increases, x2t typically increases also.
• Since βx2 is positive, when x2t increases, the "omitted variable term," βx2x2t, and hence the new
error term, εt, will typically increase also. The sketch below illustrates the resulting upward bias.
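A compact Monte Carlo sketch of our own makes the upward bias concrete. The coefficient values 2.0 and 4.0 and the correlation coefficient of 0.60 echo the lab defaults described below; treating x1 and x2 as having unit variances is an extra simplifying assumption of the sketch.

import numpy as np

rng = np.random.default_rng(6)
COEF1, COEF2, RHO, REPS = 2.0, 4.0, 0.60, 10_000

for n in (50, 100, 150):
    slopes = np.empty(REPS)
    for rep in range(REPS):
        x1 = rng.normal(size=n)
        # x2 is positively correlated with x1 (correlation coefficient RHO)
        x2 = RHO * x1 + np.sqrt(1.0 - RHO**2) * rng.normal(size=n)
        e = rng.normal(size=n)
        y = 1.0 + COEF1 * x1 + COEF2 * x2 + e
        slopes[rep] = np.polyfit(x1, y, 1)[0]          # x2 omitted from the regression
    print(f"sample size = {n:3d}   mean of coef1 ests = {slopes.mean():.2f}")

With unit variances the proxy effect adds roughly COEF2 times RHO, about 2.4, to the direct effect of 2.0, so the mean of the estimates hovers around 4.4 at every sample size, in line with the value reported by the lab simulation below; the bias neither disappears nor shrinks as the sample grows.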
20.2 The Ordinary Least Squares Estimation Procedure, Omitted Explanatory Variable Bias,
and Consistency
When an omitted explanatory variable causes the ordinary least squares (OLS) estimation pro-
cedure to be biased, might the procedure still be consistent? We will use a simulation to address
this question (figure 20.1).
Econometrics Lab 20.1: Ordinary Least Squares, Omitted Variables, and Consistency
By default, the actual value of x1t coefficient, Coef1, equals 2.0 and the actual value of the x2t
coefficient, Coef2, equals 4.0. The correlation coefficient for the explanatory variables x1t and
x2t equals 0.60; the explanatory variables are positively correlated. Furthermore the Only X1
option is selected; the explanatory variable x2t is omitted. The included explanatory variable,
x1t, will be positively correlated with the error term. x1t becomes a “problem” explanatory
variable.
Initially, the sample size equals 50. Click Start and then after many, many repetitions click
Stop. The mean of the coefficient estimates for the explanatory variable x1 equals 4.4. Our logic
is confirmed; upward bias results. Nevertheless, to determine if the ordinary least squares (OLS)
estimation procedure might be consistent, we increase the sample size from 50 to 100 and once
more from 100 to 150. As table 20.1 reports, the mean of the coefficient estimates remains
at 4.4.
Unfortunately, the ordinary least squares (OLS) estimation procedure proves to be not only
biased but also not consistent whenever an explanatory variable is omitted that
• affects the dependent variable
and
• is correlated with an included variable.
Figure 20.1
OLS omitted variable simulation
Table 20.1
Ordinary least squares: Bias and consistency
Original model: yt = βConst + βxxt + εt, where yt = dependent variable, xt = explanatory variable, εt = error term, t = 1, 2, . . ., T, and T = sample size. When xt and εt are correlated, xt is the "problem" explanatory variable.
Figure 20.2
"Problem" explanatory variable
The instrumental variable (IV) estimation procedure can deal with situations when the explana-
tory variable and the error term are correlated (figure 20.2). When an explanatory variable, xt,
is correlated with the error term, εt, we refer to the explanatory variable as the “problem”
explanatory variable. The correlation of the explanatory variable and the error term creates the
bias problem for the ordinary least squares (OLS) estimation procedure. The instrumental vari-
able estimation procedure can mitigate, but not completely remedy these cases. Let us briefly
review the procedure and motivate it.
20.3.1 Mechanics
Choose a “good” instrument: A “good” instrument, zt, must have two properties:
• Correlated with the “problem” explanatory variable, xt, and
• Uncorrelated with the error term, εt.
Instrumental variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the
problem explanatory variable, xt.
• Dependent variable: “Problem” explanatory variable, xt.
• Explanatory variable: Instrument, zt.
• Estimate of the “problem” explanatory variable: Estxt = aConst + azzt, where aConst and az are the
estimates of the constant and coefficient in this regression, IV Regression 1.
Instrumental variables (IV) Regression 2: In the original model, replace the “problem” explana-
tory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable
provided by the instrument, zt, from IV Regression 1.
• Dependent variable: Original dependent variable, yt.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from
IV Regression 1, Estxt.
Let us again provide the intuition behind why a “good” instrument, zt, must satisfy the two
conditions:
Instrument/"problem" explanatory variable correlation: The estimate, Estxt, will be a good surrogate only if it is a good predictor of the "problem"
explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem”
explanatory variable, xt.
Instrument/error term independence: The instrument, zt, must be independent of the error term,
εt. Focus on IV Regression 2. We begin with the original model and then replace the “problem”
explanatory variable, xt, with its surrogate, Estxt:
yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
= βConst + βxEstxt + εt
Consequently, to avoid violating the explanatory variable/error term independence premise the
instrument, zt, and the error term, εt, must be independent.
yt = βConst + βxEstxt + εt
↓
Estxt = aConst + azzt
2008 presidential election data: Cross-sectional data of election, population, and economic
statistics from the 50 states and the District of Columbia in 2008.
The variable Liberalt reflects the “liberalness” of the electorate in state t. On the one hand, if
the electorate is by nature liberal in state t, Liberalt would be high; on the other hand, if the
electorate is conservative, Liberalt would be low. The theories described below suggest that the
coefficients of both Liberalt and PopDent would be positive:
• Population density theory: States with high population densities have large urban areas
which are more likely to vote for the Democratic candidate, Obama; hence βPopDen > 0.
• “Liberalness” theory: Since the Democratic party is more liberal than the Republican party,
a high “liberalness” value would increase the vote of the Democratic candidate, Obama; hence
βLib > 0.
Unfortunately, we do not have any data to quantify the "liberalness" of a state; accordingly, Liberal
must be omitted from the regression (table 20.2).
Question: Might the ordinary least squares (OLS) estimation procedure suffer from a serious
econometric problem?
Since the “liberalness” variable is unobservable and must be omitted from the regression, the
explanatory variable/error term premise would be violated if the included variable, PopDent, is
correlated with the new error term, εt.
Table 20.2
Democratic vote OLS regression results
Figure 20.3
PopDen—A "problem" explanatory variable
where εt = βLibLiberalt + et. We have good reason to believe that they will be correlated because
we would expect PopDent and Liberalt to be correlated. States that tend to elect liberal repre-
sentatives and senators tend to have high population densities. That is, we suspect that PopDent
and Liberalt are positively correlated (figure 20.3). Consequently the included explanatory vari-
able, PopDent, and the error term, εt, will be positively correlated. The ordinary least squares
(OLS) estimation procedure for the value of the coefficient will be biased upward.
To summarize, when the explanatory variable Liberal is omitted, as it must be, PopDen
becomes a "problem" explanatory variable because it is correlated with the error term, εt. Now
we will apply the instrumental variable (IV) estimation procedure to understand how it can
address the omitted variable problem.
Choose an instrument: In this example we will use the percent of college graduates, Collt, as
our instrument. In doing so, we believe that it satisfies the two “good” instrument conditions;
that is, we believe that the percentage of college graduates, Collt, is
• positively correlated with the “problem” explanatory variable, PopDent
and
• uncorrelated with the error term, εt = βLibLiberalt + et. Consequently we believe that the instru-
ment, Collt, is uncorrelated with the omitted variable, Liberalt.
Table 20.3
Democratic vote IV Regression 1 results
Table 20.4
Democratic vote IV Regression 2 results
Table 20.5
Correlation matrix: Coll and PopDen
           Coll     PopDen
Coll       1.00     0.60
PopDen     0.60     1.00
The estimate, EstPopDent, will be a “good” surrogate only if the instrument, Collt, is correlated
with the “problem” explanatory variable, PopDent; that is, only if the estimate is a good predictor
of the “problem” explanatory variable. Table 20.5 reports that the correlation coefficient for Collt
and PopDent equals 0.60. Furthermore the IV Regression 1 results appearing in table 20.3 suggest
that the instrument, Collt, will be a good predictor of the “problem” explanatory variable,
PopDent. Clearly, the coefficient estimate is significant at the 1 percent level. So it is reasonable
to judge that the instrument meets the first condition.
Instrument/error term independence: The instrument, Collt, and the error term, εt, must be
independent. Otherwise, the explanatory variable/error term independence premise would be
violated in IV Regression 2.
The explanatory variable/error term independence premise will be satisfied only if the surrogate,
EstPopDent, and the error term, εt, are independent. EstPopDent is a linear function of Collt and
εt is a linear function of PopDent:
• EstPopDent = −560.9 + 28.13Collt
and
• εt = βLibLiberalt + et
Hence the explanatory variable/error term independence premise will be satisfied only if the
instrument, Collt, and the omitted variable, Liberalt, are independent. If they are correlated, then
we have gone “from the frying pan into the fire.” It was the violation of this premise that created
the problem in the first place. Unless we were to believe that liberals are better educated than
conservatives, or vice versa, it is not unreasonable to believe that education and political lean-
ings are independent. Many liberals are highly educated and many conservatives are highly
educated. Unfortunately, there is no way to confirm this empirically. This can be the “Achilles
heel” of the instrumental variable (IV) estimation procedure. When we choose an instrument, it
must be uncorrelated with the omitted variable. Since there is no way to assess this empirically,
we are implicitly assuming that the second good instrument condition is satisfied when we use
the instrumental variables estimation procedure to address the omitted explanatory variables
problem.
We claim that while the instrumental variable (IV) estimation procedure for the coefficient value
is still biased when an omitted explanatory variable problem exists, it will be consistent when
we use a “good” instrument.
Figure 20.4
IV omitted variable simulation
While this claim can be justified rigorously, we will avoid the complicated mathematics by using
a simulation (figure 20.4).
The simulation is based on the following model:
yt = βConst + βx1x1t + βx2x2t + et
The Only X1 button is selected; hence, only the first explanatory variable will be included in
the analysis. The model becomes
yt = βConst + βx1x1t + εt, where εt = βx2x2t + et
The Corr X1&Z list specifies the correlation coefficient of the included explanatory variable,
x1, and the instrument, z. This correlation indicates how good a surrogate the instrument will
be. An increase in correlation means that the instrument should become a better surrogate. By
default, this correlation coefficient equals .50. The Corr X2&Z list specifies the correlation coef-
ficient of the omitted explanatory variable, x2, and the instrument, z. Recall how the omitted
variable, x2, and the error term, εt, are related:
yt = βConst + βx1x1t + εt
where εt = βx2 x2t + et. By default, .00 is selected from the Corr X2&Z list. Hence the instrument,
z, and the error term, εt, are independent. The second condition required for a “good” instrument
is also satisfied. Initially, the sample size equals 50. Then we increase from 50 to 100 and sub-
sequently from 100 to 150. Table 20.6 reports results from this simulation:
Both bad news and good news emerge:
• Bad news: The instrumental variable estimation is biased. The mean of the estimates for the
coefficient of the first explanatory variable, x1, does not equal the actual value we specified, 2.
• Good news: As we increase the sample size, the mean of the coefficient estimates gets closer
to the actual value and the variance of the coefficient estimates becomes smaller. This illustrates
the fact that the instrumental variable (IV) estimation procedure is consistent.
Table 20.6
IV estimation procedure—Biased but consistent
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 ests; magnitude of bias; variance of coef1 ests)
Table 20.7
IV estimation procedure—Improved instrument
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 ests; magnitude of bias; variance of coef1 ests)
Next let us see what happens when we improve the instrument by making it more correlated
with the included “problem” explanatory variable. We do this by increasing the correlation coef-
ficient of the included explanatory variable, x1, and the instrument, z, from 0.50 to 0.75 when
the sample size equals 150 (table 20.7). The magnitude of the bias decreases and the variance
of the coefficient estimates also decreases. We now have a better instrument.
Last, let us use the lab to illustrate the important role that the independence of the error term, ε_t, and the instrument, z, plays:
ε_t = β_x2 x_2t + e_t
By selecting 0.10 from the Corr X2&Z list, the error term, ε_t, and the instrument, z, are no longer independent (table 20.8). As we increase the sample size from 50 to 100 to 150, the magnitude of the bias does not decrease. The instrumental variable (IV) estimation procedure is no longer consistent. This illustrates the “Achilles heel” of the instrumental variable (IV) estimation procedure.
Table 20.8
IV estimation procedure—Instrument correlated with omitted variable
(Columns: estimation procedure; correlation coefficients X1&Z, X2&Z, and X1&X2; sample size; actual coef1; mean of coef1 estimates; magnitude of bias; variance of coef1 estimates)
Chapter 20 Review Questions
1. In general, what are the two conditions that a “good” instrument must meet?
2. More specifically, when an omitted variable issue arises, a “good” instrument must satisfy
two conditions.
a. The instrument and the included “problem” explanatory variable:
i. How must the instrument be related to the included “problem” explanatory variable?
Explain.
ii. Can we determine whether this condition is met? If so, how?
b. The instrument and the omitted variable:
i. How must the instrument be related to the omitted explanatory variable? Explain.
ii. Can we determine whether this condition is met? If so, how?
Chapter 20 Exercises
Consider the following model explaining the vote received by the Democratic Party in the
2008 presidential election:
where
where
6.
a. First focus on the growth rate of real GDP.
i. How do you believe a state’s GDP growth rate affected the Democrat vote in the 2008
presidential election?
ii. What does this suggest about the sign of βGdpGth?
Chapter 21 Outline
OLS bias question: Is the explanatory variable/error term independence premise satisfied or violated?
Satisfied: Explanatory variable and error term independent
Violated: Explanatory variable and error term correlated
Is the OLS estimation procedure for the value of the coefficient unbiased or biased? __________ __________
2. Suppose that there are three college students enrolled in a small math class: Jim, Peg, and
Tim. A quiz is given weekly. Each student’s quiz score for the first ten weeks of the semester
is reported below along with the number of minutes the student studied and his/her math SAT
score from high school.
Panel data (also called longitudinal data) combines time series and cross-sectional information.
A time series refers to data for a single entity in different time periods. A cross section refers
to data for multiple entities in a single time period. In this example, data from the ten weeks
represent the time series; that is, the ten weeks provide data from ten different time periods. The
data from the three students represent the cross section; that is, the three students provide data
for three different entities, Jim, Peg, and Tim.
Our assignment is to assess the effect of studying on the students’ math quiz scores:
Assignment: Assess the effect of studying on quiz scores.
where
Let us now take a closer look at the data. The math SAT scores are from high school. Jim’s SAT
score in high school equaled a constant 720. Similarly Peg’s is a constant 760 and Tim’s is a
constant 670:
MathSat_t^Jim = 720 for t = 1, 2, . . . , 10
MathSat_t^Peg = 760 for t = 1, 2, . . . , 10
MathSat_t^Tim = 670 for t = 1, 2, . . . , 10
This allows us to simplify the notation. Since the MathSat variable only depends on the student
and does not depend on the week, we can drop the time subscript t for the MathSat variable,
but of course we must retain the individual student superscript i to denote the student:
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
Privacy concerns did not permit the college to release student SAT data. Consequently, you
have no choice but to omit MathSat from your regression.
b. Do high school students who receive high SAT math scores tend to study more or less
than those students who receive low scores?
This chapter does not introduce any new concepts. It instead applies the concepts that we already
learned to a new situation. We begin by reviewing the concepts we will be using. First recall the
standard ordinary least squares (OLS) premises:
• Error term equal variance premise: The variance of the error term’s probability distribution for each observation is the same; all the variances equal Var[e]:
Var[e_1] = Var[e_2] = . . . = Var[e_T] = Var[e]
• Error term/error term independence premise: Knowing the value of the error term from one observation does not help us predict the value of the error term for any other observation.
• Explanatory variable/error term independence premise: Knowing the value of an observation’s explanatory variable does not help us predict the value of that observation’s error term.
The ordinary least squares (OLS) estimation procedure is the economist’s most widely used estimation procedure (figure 21.1). When contemplating the use of this procedure, we should keep two questions in mind:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
• If independent, the ordinary least squares (OLS) estimation procedure for the coefficient value
will be unbiased, and we should pose the second reliability question.
• If correlated, the ordinary least squares (OLS) estimation procedure for the coefficient value
will be biased, in which case we should consider an alternative procedure in an effort to calculate
better estimates.
Figure 21.1
Ordinary least squares (OLS) bias summary
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms inde-
pendent from each other?
• If satisfied, the ordinary least squares (OLS) estimation procedure calculation of the coeffi-
cient’s standard error, t-statistic, and tails probability will be “sound” and the ordinary least
squares (OLS) estimation procedure is the best linear unbiased estimation procedure (BLUE).
In some sense, we cannot find a better linear estimation procedure and hence we should be
pleased.
• If violated, the ordinary least squares (OLS) estimation procedure calculation of the coeffi-
cient’s standard error, t-statistic, and tails probability will be flawed and the ordinary least
squares (OLS) estimation procedure is not the best linear unbiased estimation procedure (BLUE).
In this case, we can use a generalized least squares (GLS) estimation procedure, tweaking the original model in a way that eliminates the problem, or we can calculate robust standard errors.
In this chapter we apply what we have learned to panel data (also called longitudinal data), situations in which we have time series data for a number of cross sections. We use three artificially
generated examples to show how the use of panel data techniques can mitigate some of the dif-
ficulties encountered when using the ordinary least squares (OLS) estimation procedure. These
examples are designed to illustrate the issues clearly. All three examples involve the score stu-
dents receive in their college classes:
• Math class panel data: Three students comprise the entire enrollment of a math class. Each
week a quiz is given. The quiz score earned by each student is collected from the first ten weeks
of the course. Also the number of minutes each student studied for each week’s quiz and each student’s math SAT score from high school are available.
• Chemistry class panel data: Two students are enrolled in an advanced undergraduate chem-
istry course. Each week a lab report is due. The score earned by each student is collected from
the first ten labs along with the number of minutes each student devoted to the lab. Each week
a different graduate student grades both lab reports submitted by the two students.
• Studio art class panel data: Three students are randomly selected from a heavily enrolled
studio art class. Each week each student submits an art project. The score earned by each student
is collected from the first ten weeks of the course along with the number of minutes each student
devoted to the project.
We have data describing the performance of a number of students over a number of weeks. This
is what we mean by panel data. Cross-sectional and time series information are combined. In
our cases the students comprise the cross sections and the weeks the time series. As we will
learn, the existence of panel data can sometimes allow us to account for omitted variables.
Suppose that there are three college students enrolled in a small math class: Jim, Peg, and Tim.
A quiz is given weekly. Each student’s quiz score for the first ten weeks of the semester is
reported below along with the number of minutes the student studied and his/her math SAT score
from high school.
Math quiz score data: Artificially constructed panel data for 3 students during the first 10 weeks
of a math class (table 21.1):
Table 21.1
Math quiz panel data
Panel data combines time series and cross-sectional information. A time series refers to data
from a single entity in different time periods. A cross section refers to data for multiple entities
in a single time period. In this example, data from the ten weeks represent the time series; that
is, the ten weeks provide data from ten different time periods. The data from the three students
represent the cross section; that is, the three students provide data for three different entities.
Our assignment is to assess the effect of studying on the students’ math quizzes:
Project: Assess the effect of studying on math quiz scores.
1 Traditionally both the cross section and time are identified as subscripts. To reduce the possibility of confusion,
however, we use a superscript to identify the cross section so that there is only a single subscript, the subscript identify-
ing the time period.
• The subscript t denotes time period, the week; that is, t equals 1, 2, . . . , or 10.
• The superscript i denotes the cross section, the individual student; that is, i equals Jim, Peg,
or Tim.
Let us now take a closer look at the data. The math SAT scores are from high school. Jim’s SAT score from high school equaled a constant 720. Similarly Peg’s is a constant 760 and Tim’s is a constant 670:
MathSat_t^Jim = 720, MathSat_t^Peg = 760, and MathSat_t^Tim = 670 for t = 1, 2, . . . , 10
This allows us to simplify the notation. Since the SAT data for our three college students are
from high school, the MathSat variable only depends on the student and does not depend on the
week. We can drop the time subscript t for the MathSat variable, but we must, of course, retain
the individual student superscript i to denote the student:
Theory: The theory concerning how math SAT scores and studying affect quiz scores is
straightforward. Both coefficients should be positive:
βSat > 0: Higher math SAT scores increase a student’s quiz score
βMins > 0: Studying more increases a student’s quiz score
Table 21.2
Math quiz pooled OLS regression results—MathSAT and MathMins explanatory variables
We begin by pooling the data and using the ordinary least squares (OLS) estimation procedure to estimate the parameters of the model. We run a regression in which each week of student data represents one observation. This is called a pooled regression because we merge the weeks and students together. No distinction is made between a specific student (cross section) and a specific week (time). In a pooled regression every week of data for each student is treated the same (table 21.2).
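As a concrete illustration, the pooled regression can be reproduced with any statistical package; a sketch in Python using statsmodels follows. The CSV file name and the Student/Week column layout are hypothetical stand-ins for table 21.1, which is not reproduced here; MathScore, MathMins, and MathSat are the variables defined above.

```python
# Pooled OLS sketch: every student-week is treated as one observation.
# The file name and the Student/Week columns are hypothetical stand-ins for table 21.1.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv")   # columns: Student, Week, MathScore, MathMins, MathSat
X = sm.add_constant(data[["MathSat", "MathMins"]])
pooled = sm.OLS(data["MathScore"], X).fit()
print(pooled.summary())                     # compare with table 21.2
```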
Now apply the estimates to each of the three students:
Figure 21.2
Estimates of math scores with math SAT explanatory variable (the estimated lines for Peg, Jim, and Tim have intercepts 16.14, 11.42, and 5.52, respectively, and a common slope of 0.43 with respect to MathMins)
When we plot the estimated equations the three students have different intercepts, but all have
the same slope (figure 21.2). This is a consequence of our model’s assumption regarding the
effect of studying:
We have implicitly assumed that the effect of studying reflected by the coefficient βMins is the
same for each student.
Now let us introduce a twist. What if privacy concerns did not permit the college to release
student SAT data? In this case we must omit MathSat from our pooled regression (table 21.3).
What are the ramifications of omitting MathSat from this regression?
Table 21.3
Math quiz pooled OLS regression results—MathMins explanatory variable only
The model becomes
MathScore_t^i = β_Const + β_Mins MathMins_t^i + ε_t^i
where ε_t^i = β_Sat MathSat^i + e_t^i. The MathSat term becomes “folded” into the new error term.
Will the ordinary least squares (OLS) estimation procedure for the coefficient value be unbi-
ased? The OLS bias question cited at the start of the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
More specifically, we need to determine whether or not the model’s explanatory variable, MathMins_t^i, and error term, ε_t^i, are correlated. To do so, suppose that MathSat^i rises. Recall the definition of ε_t^i:
ε_t^i = β_Sat MathSat^i + e_t^i
Question: Do high school students who receive high SAT math scores tend to study more or
less than those students who receive low scores?
Typically students earning higher SAT scores tend to study more. Hence MathMins_t^i would typically rise also. The explanatory variable, MathMins_t^i, and the error term, ε_t^i, are positively correlated. The ordinary least squares (OLS) estimation procedure for the value of the coefficient is therefore biased upward (figure 21.3). When the explanatory variable MathSat is omitted and the ordinary least squares (OLS) estimation procedure is used to estimate the value of the MathMins coefficient, upward bias results.
Figure 21.3
Math quiz scores and bias (MathSat^i up → ε_t^i = β_Sat MathSat^i + e_t^i up since β_Sat > 0; typically MathMins_t^i up also; hence MathMins_t^i and ε_t^i are positively correlated)
What can we do? We will now introduce two approaches we can take to address this problem:
• First differences
• Dummy variable/fixed effects
To explain the first differences approach, focus on the first student, Jim. Apply the model to
week t and the previous week, week t − 1:
Next, subtract the second equation from the first. The first two terms on the right-hand side, β_Const and β_Sat MathSat^Jim, subtract out, leaving us with the following expression:
MathScore_t^Jim − MathScore_{t−1}^Jim = β_Mins MathMins_t^Jim − β_Mins MathMins_{t−1}^Jim + e_t^Jim − e_{t−1}^Jim
By computing first differences, we have eliminated the omitted variable, MathSatJim, from the
equation because MathSat Jim is the same for all weeks.
Table 21.4
Math quiz first difference OLS regression results
MathScore_t^Tim − MathScore_{t−1}^Tim = β_Mins MathMins_t^Tim − β_Mins MathMins_{t−1}^Tim + e_t^Tim − e_{t−1}^Tim
We can now generalize this by using the superscript i to represent the students:
MathScore_t^i − MathScore_{t−1}^i = β_Mins (MathMins_t^i − MathMins_{t−1}^i) + e_t^i − e_{t−1}^i
Next we generate two new variables and use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the first differences model (table 21.4):
If the math SAT scores for a student were to vary from week to week, our logic would fail because the MathSat term for that student would not subtract out when we calculated the first differences. In fact, math SAT scores do not vary for a student from week to week because students take their math SATs in high school, not while they are in college. So the critical assumption is satisfied in this case.
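A sketch of the first differences computation follows, reusing the hypothetical math_quiz_panel.csv layout introduced earlier; the column names are assumptions, not the textbook's file.

```python
# First differences sketch: difference MathScore and MathMins within each student,
# then regress the differenced score on the differenced minutes. The constant and
# the MathSat term drop out of the differenced model, so no constant is included.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv").sort_values(["Student", "Week"])
data["dScore"] = data.groupby("Student")["MathScore"].diff()
data["dMins"] = data.groupby("Student")["MathMins"].diff()
diffs = data.dropna(subset=["dScore", "dMins"])   # drop each student's first week
fd = sm.OLS(diffs["dScore"], diffs[["dMins"]]).fit()
print(fd.params)                                  # estimate of beta_Mins; compare with table 21.4
```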
Again, begin by focusing on Jim. Because MathSat^Jim does not vary from week to week, we can “fold” the MathSat^Jim term into Jim’s constant. Letting α_Const^Jim = β_Const + β_Sat × 720,
MathScore_t^Jim = α_Const^Jim + β_Mins MathMins_t^Jim + e_t^Jim
Similarly for Peg and Tim:
MathScore_t^Peg = α_Const^Peg + β_Mins MathMins_t^Peg + e_t^Peg
MathScore_t^Tim = α_Const^Tim + β_Mins MathMins_t^Tim + e_t^Tim
We now have three separate equations: one for Jim, one for Peg, and one for Tim. We can represent the three equations concisely by introducing three dummy variables:
MathScore_t^i = α_Const^Jim DumJim^i + α_Const^Peg DumPeg^i + α_Const^Tim DumTim^i + β_Mins MathMins_t^i + e_t^i
To convince yourself that the “concise” model is equivalent to the three separate equations,
consider each student individually:
Table 21.5
Math quiz OLS regression results—MathMins and cross-sectional dummy variable explanatory variables
Next we use the ordinary least squares (OLS) estimation procedure to estimate the parameters of our “concise” model (table 21.5). Let us plot the estimated equations for each student (figure 21.4). The dummy variable coefficient estimates are just the intercepts of the estimated equations. Jim’s intercept is 11.86, Peg’s 19.10, and Tim’s 7.52.
Statistical software makes it easy for us to do this. See table 21.6.
• Click on MathScore and then while holding the <Ctrl> key down, click on MathMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects Specification box, select Fixed from the Cross Section dropdown box.
• Click OK.
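Equivalently, the dummy variable version of the model can be estimated directly; the sketch below (hypothetical file and Student/Week column names again) builds one dummy per student and suppresses the overall constant, so each dummy coefficient is that student's intercept.

```python
# Cross-sectional dummy variable (fixed effects) sketch: one dummy per student and
# no overall constant, so each dummy coefficient is that student's intercept.
# The file name and Student/Week columns are hypothetical stand-ins.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("math_quiz_panel.csv")
dummies = pd.get_dummies(data["Student"], dtype=float)   # one column per student
X = pd.concat([dummies, data[["MathMins"]]], axis=1)
fe = sm.OLS(data["MathScore"], X).fit()
print(fe.params)   # the three intercepts plus the common MathMins slope
```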
Figure 21.4
Estimates of math scores with dummy cross-sectional variables (the estimated lines for Peg, Jim, and Tim have intercepts 19.10, 11.86, and 7.52, respectively, and a common slope of 0.33 with respect to MathMins)
The intercept for each group equals the constant from the regression results (table 21.6) plus the
effect value from table 21.7:
These are just the dummy variable coefficient estimates, the intercepts of the estimated
equations.
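For example, the fixed effects reported in table 21.7 together with the intercepts quoted above imply a regression constant of roughly 12.83 in table 21.6 (a value inferred here from those numbers, since the constant itself is not reproduced):
Jim: 12.83 + (−0.97) ≈ 11.86
Peg: 12.83 + 6.27 ≈ 19.10
Tim: 12.83 + (−5.31) ≈ 7.52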
The dummy/fixed effects approach relies on the same critical assumption as did the first dif-
ference approach:
Table 21.6
Math quiz cross-sectional fixed effects regression results
Table 21.7
Math quiz cross-sectional fixed effects
1 (Jim)   −0.966005
2 (Peg)    6.273785
3 (Tim)   −5.307779
Cross-sectional dummy variable/fixed effects critical assumption: For each student (cross
section) the omitted variable must equal the same value in each week (time period). That is,
from week to week:
• MathSatJim does not vary.
• MathSatPeg does not vary.
• MathSatTim does not vary.
If the math SAT scores for a student were to vary from week to week our logic would fail because
the MathSat term for that student could not be folded into that student’s constant. But, since the
SAT scores are from high school, they do not vary.
Next we consider a second scenario in which two students, Ted and Sue, are enrolled in an
advanced undergraduate chemistry course. Each week a lab report is due.
Table 21.8
Chemistry lab panel data
Ted 1 63 83 Sue 1 63 83
Ted 2 64 92 Sue 2 64 92
Ted 3 70 82 Sue 3 70 82
Ted 4 80 95 Sue 4 80 95
Ted 5 71 85 Sue 5 71 85
Ted 6 78 96 Sue 6 78 96
Ted 7 68 86 Sue 7 68 86
Ted 8 67 96 Sue 8 67 96
Ted 9 80 89 Sue 9 80 89
Ted 10 63 90 Sue 10 63 90
Chemistry lab score data: Artificially constructed panel data for 2 students during the first 10
weeks of a chemistry class.
The scores of the two students and the time each devoted to each lab each week are given in
table 21.8.
Each week the lab reports of the two students are graded by one of 10 graduate students in
the small chemistry department. Each week a different graduate student grades the lab reports
of the two undergraduates; the undergraduate students do not know which graduate student will
be doing the grading beforehand. In the first week, both Ted’s and Sue’s lab reports are graded
by one graduate student; in the second week, Ted’s and Sue’s reports are graded by a second
graduate student; and so on. Our assignment is to use this information to assess the effect that
time devoted to the lab each week has on the score that week’s lab report receives.
Project: Assess the effect of time devoted to lab on the lab report score.
where
Recall that in a given week the same graduate student grades each student’s lab report. Therefore
we can drop the student superscript in the GraderGenerosity variable:
. . .
LabScore_10^Sue = β_Const + β_GG GraderGenerosity_10 + β_LabMins LabMins_10^Sue + e_10^Sue
Generalizing, we obtain
LabScore_t^i = β_Const + β_GG GraderGenerosity_t + β_LabMins LabMins_t^i + e_t^i
where
i = Ted, Sue
t = 1, 2, . . . , 10
Table 21.9
Chemistry lab pooled OLS regression results
We begin by using the ordinary least squares (OLS) estimation procedure in a pooled regression
to estimate the parameters. We include all ten weeks for each of the two students in a single
regression; consequently, we include a total of twenty observations. But we have a problem: the
explanatory variable GraderGenerosity is unobservable. We must omit it from the regression
(table 21.9).
What are the ramifications of omitting GraderGenerosity from the regression?
LabScore_t^i = β_Const + β_LabMins LabMins_t^i + ε_t^i
where ε_t^i = β_GG GraderGenerosity_t + e_t^i. The GraderGenerosity_t term becomes “folded” into the new error term, ε_t^i.
Will the ordinary least squares (OLS) estimation procedure for the coefficient value be unbi-
ased? The OLS bias question cited earlier in the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
To answer this question, suppose that the grader in week t is unusually generous. Then GraderGenerosity_t, and hence the new error term, ε_t^i, would rise. Ted and Sue would not know about the grader’s generosity until after the lab report was returned. Consequently the number of minutes devoted to the lab, LabMins_t^i, would be unaffected. The explanatory variable, LabMins_t^i, and the new error term, ε_t^i, are independent. The ordinary least squares (OLS) estimation procedure for the value of the coefficient is unbiased (figure 21.5).
Figure 21.5
Chemistry lab scores and bias (GraderGenerosity_t up → ε_t^i up; LabMins_t^i unaffected; hence LabMins_t^i and ε_t^i are independent and the OLS estimation procedure for the coefficient value is unbiased)
Now let us move on to the OLS reliability question:
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms inde-
pendent of each other?
In fact the error terms are not independent. We would expect Ted’s error term in a particular week to be correlated with Sue’s error term in that week. The reason stems from the fact that a different graduate student grades each week’s lab reports. Naturally some graduate students
will award more partial credit than others. For example, on the one hand, if a generous graduate
student grades the lab reports in the first week we would expect the error terms of both students
to be positive. On the other hand, if the first week’s reports are graded by a very demanding
graduate student, we would expect the error terms of both students to be negative.
How might we get a sense of whether or not this type of correlation is present in this case? Recall that while the error terms are unobservable, we can think of the residuals as the estimated error terms. Table 21.10 reports the residuals. The residuals appear to confirm our suspicions: in each week, Ted’s and Sue’s residuals have the same sign. Figure 21.6 plots a scatter diagram of Ted’s and Sue’s residuals. Each point on the scatter diagram represents one specific week.
The scatter diagram points fall in the first and third quadrants. On the one hand, when the
residual of one student is positive, the residual for the other student is positive also. On the other
hand, when the residual of one student is negative, the residual for the other student is negative.
The scatter diagram suggests that our suspicions about error term/error term correlation are
warranted.
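A quick way to check this with software is to pull the week-by-week residuals from the pooled regression and correlate Ted's with Sue's; a sketch follows, assuming a hypothetical chem_lab_panel.csv layout with columns Student, Week, LabScore, and LabMins.

```python
# Residual correlation check sketch: run the pooled OLS regression, reshape the
# residuals into one column per student, and correlate Ted's residuals with Sue's.
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("chem_lab_panel.csv")   # columns: Student, Week, LabScore, LabMins
X = sm.add_constant(data[["LabMins"]])
pooled = sm.OLS(data["LabScore"], X).fit()
data["resid"] = pooled.resid
wide = data.pivot(index="Week", columns="Student", values="resid")
print(wide.corr())   # a large positive correlation echoes table 21.10 and figure 21.6
```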
Table 21.10
Chemistry lab OLS residuals
Week   Ted’s residual   Sue’s residual
1   3.30   1.30
2   6.48   3.35
3   −6.60   −3.14
4   1.27   2.30
5   −4.11   −3.60
6   −2.01   −2.57
7   −1.57   −1.16
8   8.94   0.27
9   −4.73   −7.03
10   4.99   4.32
Figure 21.6
Scatter diagram of Ted’s and Sue’s chemistry lab OLS residuals (Ted’s residuals on the horizontal axis, Sue’s on the vertical axis; each point represents one week and the points lie in the first and third quadrants)
To understand how period fixed effects can address this issue, recall the original model:
LabScore_t^i = β_Const + β_GG GraderGenerosity_t + β_LabMins LabMins_t^i + e_t^i
Now focus on week 1. We can fold the constant grader generosity term into the constant for that week. Letting α_Const^1 = β_Const + β_GG GraderGenerosity_1,
LabScore_1^i = α_Const^1 + β_LabMins LabMins_1^i + e_1^i
. . .
For each week we have folded the generosity of the grader into the constant. In each week the constant is identical for both students because the same graduate student grades both lab reports. We now have ten new constants, one for each of the ten weeks. The period fixed effects approach estimates the values of these parameters. Statistical software makes it easy to compute these estimates (table 21.11).
Table 21.11
Chemistry lab time period fixed effects regression results
• Click on LabScore and then, while holding the <Ctrl> key down, click on LabMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects specification box, select Fixed from the Period dropdown box.
• Click OK.
Statistical software allows us to obtain the estimates of each week’s constant (table 21.12).
The period fixed effects suggest that the graduate student who graded the lab reports for week
9 was the toughest grader and the graduate student who graded for week 8 was the most
generous.
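In the same spirit as the cross-sectional dummies used for the math class, period fixed effects can be reproduced with one dummy per week. The sketch below (hypothetical chem_lab_panel.csv layout again) suppresses the overall constant, so each week dummy is that week's constant; statistical packages typically report an overall constant plus week effects instead, as in tables 21.11 and 21.12, but the two parameterizations are equivalent.

```python
# Period (week) dummy variable / fixed effects sketch: one dummy per week and no
# overall constant, so each dummy coefficient is that week's constant.
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("chem_lab_panel.csv")
week_dummies = pd.get_dummies(data["Week"], prefix="Week", dtype=float)
X = pd.concat([week_dummies, data[["LabMins"]]], axis=1)
period_fe = sm.OLS(data["LabScore"], X).fit()
print(period_fe.params)   # ten weekly constants plus the LabMins slope
```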
Period dummy variable/fixed effects critical assumption: For each week (time period) the
omitted variable must equal the same value for each student (cross section).
Table 21.12
Chemistry lab period fixed effects
Week   Fixed effect
1   3.171238
2   4.466844
3   −4.948602
4   2.805060
5   −4.082423
6   −3.251531
7   −1.448602
8   4.819041
9   −5.814780
10   4.283755
Thus far we have been considering the cases in which we had data for all the students in the
course. In reality this is not always true, however. For example, in calculating the unemployment
rate, the Bureau of Labor Statistics conducts a survey that acquires data from approximately
60,000 American households. Obviously there are many more than 60,000 households in the
United States.
We will now consider a scenario to gain insights into such cases. Suppose that there are several
hundred students enrolled in a studio art course at a major university. Each week a studio art
project is assigned. At the beginning of the semester, three students were selected randomly from
all those enrolled: Bob, Dan, and Kim.
Art project score data: Artificially constructed panel data for 3 students during the first 10 weeks
of a studio art class.
The scores of the three students and the time each devoted to the project each week are reported
in table 21.13:
Our assignment is to use information from the randomly selected students to assess the effect
that time devoted to the project each week has on the score that week’s project receives:
Project: Assess the effect of time devoted to project on the project score.
Table 21.13
Studio art panel data
Student Week Minutes Score   Student Week Minutes Score
Bob 1 13 35   Dan 1 17 55
Bob 2 17 42   Dan 2 11 57
Bob 3 19 33   Dan 3 21 61
Bob 4 23 45   Dan 4 15 58
Bob 5 13 31   Dan 5 13 54
Bob 6 15 42   Dan 6 17 62
Bob 7 17 35   Dan 7 19 61
Bob 8 13 37   Dan 8 13 53
Bob 9 17 35   Dan 9 11 50
Bob 10 13 34   Dan 10 9 55
Kim 1 27 53   Kim 6 19 43
Kim 2 23 53   Kim 7 25 56
Kim 3 21 49   Kim 8 25 50
Kim 4 23 48   Kim 9 17 44
Kim 5 27 53   Kim 10 19 48
where
The ArtIQi variable, innate artistic talent, requires explanation. Clearly, ArtIQ is an abstract
concept and is unobservable. Therefore, we must omit it from the regression. Nevertheless, we
do know that different students possess different quantities of innate artistic talent. Figure 21.7
illustrates this notion.
Figure 21.7
Probability distribution of the ArtIQ^i random variable (centered at Mean[ArtIQ])
Since our three students were selected randomly, define a random variable, vi, to equal the
amount by which a student’s innate artistic talent deviates from the mean:
ArtIQi = Mean[ArtIQ] + vi
where vi is a random variable. Since the three students were chosen randomly, the mean of vi’s
probability distribution equals 0:
Mean[vi] = 0
Next let us incorporate our specification of ArtIQ^i into the model and rearrange terms:
ArtScore_t^i = α_Const + β_Mins ArtMins_t^i + ε_t^i
where ε_t^i = β_ArtIQ v^i + e_t^i; ε_t^i represents the random influences for student i in week t.
Table 21.14
Studio art pooled OLS regression results
We begin with a pooled regression, using the ordinary least squares (OLS) estimation procedure to estimate the model’s parameters (table 21.14):
bMins = 0.40
We estimate that a ten-minute increase devoted to an art project increases a student’s score by
4 points. Will the ordinary least squares (OLS) estimation procedure for the coefficient value be
unbiased? The OLS bias question cited earlier in the chapter provides the answer:
OLS bias question: Is the ordinary least squares (OLS) explanatory variable/error term inde-
pendence premise satisfied; that is, are the model’s error term and explanatory variable indepen-
dent or correlated?
where
ε_t^i = β_ArtIQ v^i + e_t^i and ArtIQ^i = Mean[ArtIQ] + v^i
When v^i increases, both innate artistic ability, ArtIQ^i, and the model’s error term, ε_t^i, increase. Therefore the correlation, or lack thereof, between innate artistic ability, ArtIQ^i, and the amount of time devoted to the project, ArtMins_t^i, determines whether the ordinary least squares (OLS) estimation procedure for the coefficient value is biased or unbiased.
Figure 21.8
Studio art projects and bias (v^i up → ArtIQ^i = Mean[ArtIQ] + v^i up and ε_t^i = β_ArtIQ v^i + e_t^i up; the effect on ArtMins_t^i is uncertain)
It is unclear how the amount of time students devote to their studio art projects will be correlated with their innate artistic ability. It could be argued that students with more artistic ability
will be more interested in studio art and hence would devote more time to their art projects.
However, highly talented students may only spend a little time on their projects because they
only need to spend a few minutes to get a good score.
The random effects (RE) estimation procedure is only appropriate when the omitted explanatory variable and the included explanatory variable are independent. Consequently we will now assume that innate artistic ability, ArtIQ^i, and the time devoted to studio art projects, ArtMins_t^i, are independent in order to motivate the rationale of the random effects (RE) estimation procedure.
Since we are assuming independence, we can move on to pose the OLS reliability question:
OLS reliability question: Are the ordinary least squares (OLS) error term equal variance premise
and the error term/error term independence premises satisfied; that is, is the variance of the
probability distribution for each observation’s error term the same and are the error terms
independent?
In fact the error terms are not independent. To understand why, note that the error term, ε_t^i, in this model is interesting; it has two components: ε_t^i = β_ArtIQ v^i + e_t^i. The first term, β_ArtIQ v^i, reflects the innate artistic talent of each randomly selected student:
• Bob’s deviation from the innate artistic talent mean: vBob
• Dan’s deviation from the innate artistic talent mean: vDan
• Kim’s deviation from the innate artistic talent mean: vKim
The second term, e_t^i, represents the random influences on each student’s weekly project score.
. . .
Kim, week 10:  ε_10^Kim = β_ArtIQ v^Kim + e_10^Kim
Each of Bob’s error terms has a common term, β_ArtIQ v^Bob. Similarly each of Dan’s error terms has a common term, β_ArtIQ v^Dan, and each of Kim’s error terms has a common term, β_ArtIQ v^Kim. Consequently the error terms are not independent. Since the error term/error term independence premise is violated, the standard error calculations made by the ordinary least squares (OLS) estimation procedure are flawed; furthermore the ordinary least squares (OLS) estimation procedure for the coefficient value is not the best linear unbiased estimation procedure (BLUE).
To check our logic, we would like to analyze the error terms to determine if they appear to be correlated, but the error terms are not observable. We can examine the residuals, however. Recall that the residuals can be thought of as the estimated error terms (figure 21.9). The residuals indeed suggest that the error term/error term independence premise is violated.
The random effects estimation procedure exploits this error term pattern to calculate “better”
estimates (table 21.15).
Figure 21.9
Art class ordinary least squares (OLS) residuals (residuals for Bob, Dan, and Kim plotted against observation number)
Table 21.15
Studio art cross-sectional random effects regression results
• Click on ArtScore and then, while holding the <Ctrl> key down, click on ArtMins.
• Double click the highlighted area.
• Click the Panel Options tab.
• In the Effects Specification box, select Random from the Cross Section dropdown box.
• Click OK.
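A comparable computation can be sketched in Python; the example below assumes the linearmodels package, whose RandomEffects estimator corresponds to the cross-sectional random effects procedure described here, and a hypothetical art_panel.csv layout with columns Student, Week, ArtScore, and ArtMins.

```python
# Cross-sectional random effects sketch using the linearmodels package (an assumed
# third-party library; the textbook uses its own statistical software).
# The file name and Student/Week columns are hypothetical.
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import RandomEffects

data = pd.read_csv("art_panel.csv")           # columns: Student, Week, ArtScore, ArtMins
data = data.set_index(["Student", "Week"])    # entity/time MultiIndex required by linearmodels
exog = sm.add_constant(data[["ArtMins"]])
re = RandomEffects(data["ArtScore"], exog).fit()
print(re.params)                              # compare with table 21.15
```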
The intuition behind all this is that we can exploit the additional information about the error
terms to improve the estimation procedure. Additional information is a “good thing.” It is worth
noting that we adopted the same strategy when we studied heteroskedasticity and autocorrelation
(chapters 16 and 17). When the error terms are not independent we can exploit that information
to improve our estimate beyond what the ordinary least squares (OLS) estimation procedure
provides. In this case we used the random effects estimation procedure to do so.
Cross-sectional random effects critical assumption: For each student (cross section) the omitted variable must equal the same value in each week (time period). That is, from week to week:
• ArtIQBob does not vary.
• ArtIQDan does not vary.
• ArtIQKim does not vary.
Chapter 21 Review Questions
1. What is the critical assumption that the first differences estimation procedure makes?
2. What is the critical assumption that the cross section fixed effects (FE) estimation procedure
makes?
3. What is the critical assumption that the period fixed effects (FE) estimation procedure makes?
4. What is the critical assumption that the random effects (RE) estimation procedure makes?
5. What is the intuition behind our treatment of heteroskedasticity and autocorrelation? How is
the random effects estimation procedure similar?
Chapter 21 Exercises
Beer consumption data: Panel data of beer consumption, beer prices, and income statistics for fifty states and the District of Columbia from 1999 to 2007:
BeerPC_t^i   Per capita beer consumption in state i during year t (12 oz cans)
Price_t      Real price of beer in year t (1982–84 dollars per 12 oz can)
IncPC_t^i    Per capita real disposable income in state i during year t (thousands of chained 2005 dollars)
a. Develop a theory regarding how each explanatory variable influences the dependent vari-
able. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure to estimate the parameters
without fixed or random effects, estimate the value of each coefficient. Interpret the coeffi-
cient estimates. What are the critical results?
Internet and TV data: Panel data of Internet, TV, economic, and political statistics for 208
countries from 1995 to 2002.
4. Do not include the variable Year as an explanatory variable. Instead introduce period fixed
effects.
a. Are the coefficient estimates of CapitalHuman, CapitalPhysical, GdpPC, and Auth quali-
tatively consistent with the estimates that you obtained in the previous exercise when the
variable Year was included and period fixed effects were not specified?
b. Examine the estimates of the period fixed effects coefficients. Are they qualitatively con-
sistent with the coefficient estimate for the variable year that you obtained when period fixed
effects were not specified?
Motor fuel consumption data for Arkansas, Massachusetts, and Washington: Panel data relating
to motor fuel consumption for Arkansas, Massachusetts, and Washington from 1973 to 2007.
These three states were chosen randomly from the fifty states and the District of Columbia.
MotorFuelPC_t^i   Per capita motor fuel consumption in state i during year t (gallons)
Price_t^i         Real price of gasoline in state i during year t (1982–84 dollars per gallon)
IncPC_t^i         Per capita real disposable income in state i during year t (thousands of chained 2005 dollars)
PopDen_t^i        Population density in state i during year t (persons per square mile)
UnemRate_t^i      Unemployment rate in state i during year t (percent)
a. Develop a theory regarding how each explanatory variable influences the dependent variable. What does your theory imply about the sign of each coefficient?
b. Using the ordinary least squares (OLS) estimation procedure to estimate the parameters
without fixed or random effects, estimate the value of each coefficient. Interpret the coeffi-
cient estimate. What is the critical result?
Chapter 22 Outline
1. This question requires slogging through much high school algebra, so it is not very exciting.
While tedious, it helps us understand simultaneous equations models. Consider the following
two equations that model the demand and supply of beef:
where
Let
Q_t = α_Const^Q + α_FP^Q FeedP_t + α_I^Q Inc_t + ε_t^Q
P_t = α_Const^P + α_FP^P FeedP_t + α_I^P Inc_t + ε_t^P
Compare these two equations for Q_t and P_t with the two equations for Q_t and P_t in problem 1. Express α_FP^Q, α_I^Q, α_FP^P, and α_I^P in terms of the β’s appearing in problem 1:
Since the actual parameters of the model, the β’s, are unobservable, we estimate them. The
estimated parameters are denoted by italicized Roman b’s:
In terms of the estimated coefficients, bx1 and/or bx2, what is the expression for the estimated
change in y?
c. If x1 changes by Δx1 while x2 remains constant: Δy =
d. If x2 changes by Δx2 while x1 remains constant: Δy =
e. Putting parts c and d together, if both x1 and x2 change: Δy =
Equilibrium: Q^D = Q^S = Q
where
Use algebra to solve for the equilibrium price and quantity. That is,
a. Express the equilibrium price, P, in terms of FeedP and Inc.
b. Express the equilibrium quantity, Q, in terms of FeedP and Inc.
These two equations are called the reduced form (RF) equations.
Demand and supply curves are arguably the economist’s most widely used tools. They provide
one example of simultaneous equations models. Unfortunately, as we will shortly show, the
ordinary least squares (OLS) estimation procedure is biased when it is used to estimate the
parameters of these models. To illustrate this, we begin by reviewing the effect that explanatory
variable/error term correlation has on the ordinary least squares (OLS) estimation procedure.
Then we focus on a demand/supply model to explain why the ordinary least squares (OLS)
estimation procedure leads to bias.
Figure 22.1
Explanatory variable and error term correlation (left panel: explanatory variable and error term positively correlated, the best fitting line is steeper than the actual equation line; right panel: negatively correlated, the best fitting line is flatter than the actual equation line)
Explanatory variable/error term correlation (figure 22.1) leads to bias. On the one hand, when
the explanatory variable and error term are positively correlated, the best fitting line is more
steeply sloped than the actual equation line; consequently the ordinary least squares (OLS)
estimation procedure for the coefficient value is biased upward. On the other hand, when the
explanatory variable and error term are negatively correlated, the best fitting line is less steeply
sloped than the actual equation line; consequently, the ordinary least squares (OLS) estimation
procedure for the coefficient value is biased downward.
Consider the market for a good such as food or clothing. The following two equations describe
a standard demand/supply model of the market for the good:
where
The quantity of a good demanded is determined by the good’s own price and other demand
factors such as income, the prices of substitutes, and the prices of complements. Similarly the
quantity of a good supplied is determined by the good’s own price and other supply factors such
as wages and raw material prices.
The market is in equilibrium whenever the quantity demanded equals the quantity supplied:
Q_t^D = Q_t^S = Q_t
Both the quantity, Qt, and the price, Pt, are determined simultaneously as depicted by the famous
demand/supply diagram reproduced in figure 22.2.
Figure 22.2
Demand/supply model (the equilibrium price P and equilibrium quantity Q occur at the intersection of the demand and supply curves)
Endogenous variables: Variables determined “within” the model; namely price and quantity
Exogenous variables: Variables determined “outside” the model; namely other demand and
supply factors
In single equation models, there is only one endogenous variable, the dependent variable itself;
all explanatory variables are exogenous. For example, in the following single equation model the dependent variable is consumption and the explanatory variable is income:
The model only attempts to explain how consumption is determined. The dependent variable,
consumption, is the only endogenous variable. The model does not attempt to explain how
income is determined; that is, the values of income are taken as given. All explanatory variables,
in this case only income, are exogenous.
In a simultaneous equations model, while the dependent variable is endogenous, an explana-
tory variable can be either endogenous or exogenous. In the demand/supply model, quantity, the
dependent variable, is endogenous; quantity is determined “within” the model. Price is both
an endogenous variable and an explanatory variable. Price is determined “within” the model,
and it is used to explain the quantity demanded and the quantity supplied.
What are the consequences of endogenous explanatory variables for the ordinary least squares
(OLS) estimation procedure?
Claim: Whenever an explanatory variable is also an endogenous variable, the ordinary least
squares (OLS) estimation procedure for the value of its coefficient is biased.
We will now use the demand and supply models to justify this claim.
When the ordinary least squares (OLS) estimation procedure is used to estimate the demand
model, the good’s own price and the error term are positively correlated; accordingly, the ordi-
nary least squares (OLS) estimation procedure for the value of the price coefficient will be biased
upward (figure 22.3). Let us now show why.
Figure 22.3
Effect of the demand error term (when e^D rises, the demand curve shifts right and the equilibrium price rises; when e^D falls, the demand curve shifts left and the equilibrium price falls)
We are focusing on the demand model; hence, the Dem radio button is selected. The lists imme-
diately below the Dem radio button specify the demand model. The actual constant equals 30,
the actual price coefficient equals −4, and so forth. XCoef represents an “other demand factor,”
such as income.
Be certain that the Pause checkbox is cleared. Click Start and then after many, many repeti-
tions click Stop. The average of the estimated demand price coefficient values is −2.6, greater
than the actual value, −4.0 (table 22.1). This result suggests that the ordinary least squares (OLS)
estimation procedure for the value of the price coefficient is biased upward. Our Econometrics
Lab confirms our suspicions.
But even though the ordinary least squares (OLS) estimation procedure is biased, it might be
consistent, might it not? Recall the distinction between an unbiased and a consistent estimation
procedure:
Figure 22.4
Simultaneous equations simulation (Econometrics Lab interface: lists for the demand and supply constants, price coefficients, and other-factor coefficients, the exogenous variable values, the error term variance, and the estimation procedure, together with the mean of the estimated price coefficient values from all repetitions)
Table 22.1
Simultaneous equations simulation results—Demand
Table 22.2
Simultaneous equations simulation results—Demand
Unbiased: The estimation procedure does not systematically underestimate or overestimate the
actual value; that is, after many, many repetitions the average of the estimates equals the actual
value.
Consistent but biased: A consistent estimation procedure can be biased. But, as the sample size (the number of observations) grows:
• The magnitude of the bias decreases. That is, the mean of the coefficient estimate’s probability
distribution approaches the actual value.
• The variance of the estimate’s probability distribution diminishes and approaches 0.
How can we use the simulation to investigate this possibility? Just increase the sample size.
If the procedure is consistent, the average of the estimated coefficient values after many, many
repetitions would move closer and closer to −4.0, the actual value, as we increase the sample
size. That is, if the procedure is consistent, the magnitude of the bias would decrease as the
sample size increases. (Also the variance of the estimates would decrease.) So let us increase
the sample size from 20 to 30 and then to 40. Unfortunately, we observe that a larger sample
size does not reduce the magnitude of the bias (table 22.2).
When we estimate the value of the price coefficient in the demand model, we find that the
ordinary least squares (OLS) estimation procedure fails in two respects:
Bad news: The ordinary least squares (OLS) estimation procedure is biased.
Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
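Both failures can be reproduced with a short simulation. The sketch below (parameter values are illustrative assumptions, not the lab's defaults) generates equilibrium prices and quantities from a demand/supply model, applies the ordinary least squares (OLS) estimation procedure to the demand equation, and reports the mean of the price coefficient estimates for several sample sizes; the mean stays above the actual value even as the sample size grows.

```python
# Simultaneous equations sketch: the equilibrium price depends on both error terms,
# so OLS applied to the demand equation is biased upward and not consistent.
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
bD_const, bD_p, bD_i = 30.0, -4.0, 2.0     # demand: Q = bD_const + bD_p*P + bD_i*Inc + eD
bS_const, bS_p, bS_fp = 10.0, 1.0, -1.0    # supply: Q = bS_const + bS_p*P + bS_fp*FeedP + eS

def mean_ols_demand_price_coef(n, reps=2000):
    estimates = []
    for _ in range(reps):
        inc = rng.normal(30.0, 5.0, size=n)
        feedp = rng.normal(30.0, 5.0, size=n)
        eD = rng.normal(0.0, 3.0, size=n)
        eS = rng.normal(0.0, 3.0, size=n)
        # Equilibrium: set quantity demanded equal to quantity supplied and solve for P.
        p = (bD_const - bS_const + bD_i * inc - bS_fp * feedp + eD - eS) / (bS_p - bD_p)
        q = bD_const + bD_p * p + bD_i * inc + eD
        # OLS applied to the demand equation: regress Q on a constant, P, and Inc.
        X = np.column_stack([np.ones(n), p, inc])
        coefs, *_ = np.linalg.lstsq(X, q, rcond=None)
        estimates.append(coefs[1])             # the price coefficient estimate
    return np.mean(estimates)

for n in (20, 30, 40):
    print(f"sample size {n}: mean OLS price coefficient = {mean_ols_demand_price_coef(n):.2f}")
```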
We will now use the same line of reasoning to show that the ordinary least squares (OLS) esti-
mation procedure for the value of the price coefficient in the supply model is also biased (figure
22.5).
Figure 22.5
Effect of the supply error term (when e^S rises, the supply curve shifts right and the equilibrium price falls; when e^S falls, the supply curve shifts left and the equilibrium price rises)
The ordinary least squares (OLS) estimation procedure for the value of the price coefficient in
the supply model should be biased downward. Once again, we will use a simulation to confirm
our logic.
Table 22.3
Simultaneous equations simulation results—Supply
Figure 22.6
Probability distribution of the supply price coefficient estimate, b_P^S (the distribution is centered below zero even though the actual value is 1.0, so Prob[b_P^S < 0] exceeds Prob[b_P^S > 0])
We are now focusing on the supply curve; hence, the Sup radio button is selected. Note that the
actual value of the supply price coefficient equals 1.0. Be certain that the Pause checkbox is
cleared. Click Start and then after many, many repetitions click Stop. The average of the esti-
mated coefficient values is −1.4, less than the actual value, 1.0. This result suggests that the
ordinary least squares (OLS) estimation procedure for the value of the price coefficient is biased
downward, confirming our suspicions.
But might the estimation procedure be consistent? To answer this question increase the sample
size from 20 to 30 and then from 30 to 40. The magnitude of the bias is unaffected. Accordingly,
it appears that the ordinary least squares (OLS) estimation procedure for the value of the price
coefficient is not consistent either (table 22.3).
When estimating the price coefficient’s value in the supply model, the ordinary least squares
(OLS) estimation procedure fails in two respects (figure 22.6):
Bad news: The ordinary least squares (OLS) estimation procedure is biased.
Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
The supply model simulations illustrate a problem even worse than that encountered when
estimating the demand model. In this case the bias can be so severe that the mean of the coef-
ficient estimate’s probability distribution has the wrong sign. To gain more intuition, suppose
that the probability distribution is symmetric. Then the chances that the coefficient estimate
would have the wrong sign are greater than the chances that it would have the correct sign when
using the ordinary least squares (OLS) estimation procedure. This is very troublesome, is it not?
We have used the demand and supply models to illustrate that an endogenous explanatory vari-
able creates a bias problem for the ordinary least squares (OLS) estimation procedure. Whenever
an explanatory variable is also an endogenous variable, the ordinary least squares (OLS) estima-
tion procedure for the value of its coefficient is biased.
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Endogenous variables: Both the quantity of beef and the price of beef, Qt and Pt, are endogenous
variables; they are determined within the model.
Exogenous Variables:
• Disposable income is an “Other demand factor”; disposable income, Inct, is an exogenous
variable that affects demand. Since beef is regarded as a normal good, we expect that households
would demand more beef when income rises.
• The price of chicken is also an “Other demand factor”; the price of chicken, ChickPt, is an
exogenous variable that affects demand. Since chicken is a substitute for beef, we expect that
households would demand more beef when the price of chicken rises.
• The price of cattle feed is an “Other supply factor”; the price of cattle feed, FeedPt, is an
exogenous variable that affects supply. Since cattle feed is an input to the production of beef,
we expect that firms would produce less when the price of cattle feed rises.
Now let us formalize the simultaneous equations demand/supply model that we will
investigate:
Equilibrium: Q_t^D = Q_t^S = Q_t
Let us begin by using the ordinary least squares (OLS) procedure to estimate the parameters.
As reported in table 22.4, the estimate of the demand model’s price coefficient is negative,
−364.4, suggesting that higher prices decrease the quantity demanded. The result is consistent
with economic theory suggesting that the demand curve is downward sloping.
Table 22.4
OLS regression results—Demand model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 22.5
OLS regression results—Supply model
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
As reported in table 22.5, the estimate of the supply model’s price coefficient is negative,
−231.5, suggesting that higher prices decrease the quantity supplied. Obviously this result is not
consistent with economic theory. This result suggests that the supply curve is downward sloping
rather than upward sloping. But what have we just learned about the ordinary least squares (OLS) estimation procedure? The ordinary least squares (OLS) estimation procedure for the price coefficient of the supply model will be biased downward. This could explain our result, could it not?
We will now describe an alternative estimation procedure, the reduced form (RF) estimation
procedure. We will show that while this new procedure does not “solve” the bias problem, it
mitigates it. More specifically, while the procedure is still biased, it proves to be consistent. In
this way, the new procedure is “better than” ordinary least squares (OLS). We begin by describ-
ing the mechanics of the reduced form (RF) estimation procedure.
We have argued that the ordinary least squares (OLS) estimation procedure leads to bias
because an endogenous variable, in our case the price, is an explanatory variable. The reduced
form (RF) approach begins by using algebra to express each endogenous variable only in terms
of the exogenous variables. These new equations are called the reduced form (RF) equations.
Intuition: Since bias results from endogenous explanatory variables, algebraically manipulate
the simultaneous equations model to express each endogenous variable only in terms of the
exogenous variables. Then use the ordinary least squares (OLS) estimation procedure to estimate
the parameters of these newly derived equations, rather than the original ones.
Step 1: Derive the reduced form (RF) equations from the original models.
• The reduced form (RF) equations express each endogenous variable in terms of the exogenous
variables only.
• Algebraically solve for the original model’s parameters in terms of the reduced form (RF)
parameters.
Step 2: Use ordinary least squares (OLS) estimation procedure to estimate the parameters of
the reduced form (RF) equations.
Step 3: Calculate coefficient estimates for the original models using the derivations from step
1 and estimates from step 2.
Step 1: Derive the Reduced Form (RF) Equations from the Original Models
We begin with the demand and supply models:
Q_t^D = β_Const^D + β_P^D P_t + β_I^D Inc_t + e_t^D
Q_t^S = β_Const^S + β_P^S P_t + β_FP^S FeedP_t + e_t^S
Equilibrium: Q_t^D = Q_t^S = Q_t
There are six parameters of the demand and supply models: β_Const^D, β_P^D, β_I^D, β_Const^S, β_P^S, and β_FP^S. We wish to estimate the values of these parameters.
The reduced form (RF) equations express each endogenous variable in terms of the exogenous variables. In this case we wish to express Q_t in terms of FeedP_t and Inc_t and P_t in terms of FeedP_t and Inc_t. The appendix at the end of this chapter shows how elementary, yet laborious, algebra can be used to derive the following reduced form (RF) equations for the endogenous variables, Q_t and P_t:

Q_t = (β_P^S β_Const^D − β_P^D β_Const^S)/(β_P^S − β_P^D) − [β_P^D β_FP^S/(β_P^S − β_P^D)] FeedP_t + [β_P^S β_I^D/(β_P^S − β_P^D)] Inc_t + (β_P^S e_t^D − β_P^D e_t^S)/(β_P^S − β_P^D)

P_t = (β_Const^D − β_Const^S)/(β_P^S − β_P^D) − [β_FP^S/(β_P^S − β_P^D)] FeedP_t + [β_I^D/(β_P^S − β_P^D)] Inc_t + (e_t^D − e_t^S)/(β_P^S − β_P^D)
Now let us make an interesting observation about the reduced form (RF) equations. Focus first on the ratio of the feed price coefficients and then on the ratio of the income coefficients. These ratios equal the price coefficients of the original demand and supply models, β_P^D and β_P^S:
Q_t = α_Const^Q + α_FP^Q FeedP_t + α_I^Q Inc_t + ε_t^Q
P_t = α_Const^P + α_FP^P FeedP_t + α_I^P Inc_t + ε_t^P
First, consider the notation we use in the reduced form (RF) equations. Superscripts refer to
the reduced form (RF) equation:
The parameter subscripts refer to the constants and coefficients of each reduced form (RF)
equation:
There are six parameters of the reduced form (RF) equations: α_Const^Q, α_FP^Q, α_I^Q, α_Const^P, α_FP^P, and α_I^P.
By comparing the two sets of reduced form (RF) equations, we can express each of the reduced form (RF) parameters, each α, in terms of the parameters of the original demand and supply models, the β’s. We have six equations:

α_Const^Q = (β_P^S β_Const^D − β_P^D β_Const^S)/(β_P^S − β_P^D),  α_FP^Q = −β_P^D β_FP^S/(β_P^S − β_P^D),  α_I^Q = β_P^S β_I^D/(β_P^S − β_P^D)

α_Const^P = (β_Const^D − β_Const^S)/(β_P^S − β_P^D),  α_FP^P = −β_FP^S/(β_P^S − β_P^D),  α_I^P = β_I^D/(β_P^S − β_P^D)

There are six parameters of the original demand and supply models: β_Const^D, β_P^D, β_I^D, β_Const^S, β_P^S, and β_FP^S. That is, we have six unknowns, the β’s. We have six equations and six unknowns. We can solve for the unknowns by expressing the β’s in terms of the α’s. For example, we can solve for the price coefficients of the original demand and supply models, β_P^D and β_P^S:

β_P^D = α_FP^Q/α_FP^P    β_P^S = α_I^Q/α_I^P
These coefficients reflect the “slopes” of the demand and supply curves.1
Step 2: Use Ordinary Least Squares (OLS) to Estimate the Reduced Form Equations
We use the ordinary least squares (OLS) estimation procedure to estimate the α’s (see tables
22.6 and 22.7).
Estimate of α_FP^Q = −332.00    Estimate of α_I^Q = 17.347
Estimate of α_FP^P = 1.0562     Estimate of α_I^P = 0.018825
1. The coefficients do not equal the slope of the demand curve, but rather the reciprocal of the slope. They are the ratio
of run over rise instead of rise over run. This occurs as a consequence of the economist’s convention of placing quantity
on the horizontal axis and price on the vertical axis. To avoid the awkwardness of using the expression “the reciprocal
of the slope” repeatedly, we will place the word slope within quotes to indicate that it is the reciprocal.
Table 22.6
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 22.7
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
• Feed price reduced form (RF) estimates: Since cattle feed is an input for beef production,
an increase in the feed price shifts the supply curve for beef to the left. As figure 22.7 illustrates,
the equilibrium quantity falls and the equilibrium price rises.
The feed price coefficient estimate in the quantity reduced form (RF) equation is negative, −332.00. The negative estimate suggests that an increase in feed prices reduces the quantity.
Table 22.8
Reduced form (RF) coefficient estimates
Figure 22.7
Demand/supply analysis—An increase in feed price: the supply curve shifts left from S to S′ while the demand curve D stays put.
The feed price coefficient estimate in the price reduced form (RF) equation is positive, 1.0562. This suggests that an increase in the feed price increases the price of beef.
Quantity: falls    Price: rises
The feed price coefficient estimates are consistent with the standard demand/supply analysis.
• Income reduced form (RF) estimates: Since beef is generally regarded as a normal good,
an increase in income shifts the demand curve for beef to the right. As figure 22.8 illustrates,
the equilibrium quantity and price both increase.
The income coefficient estimates in both the quantity and price reduced form (RF) regressions are positive, 17.347 and 0.018825. The positive estimates suggest that an increase in income causes both the quantity and the price of beef to rise.
Figure 22.8
Demand/supply analysis—An increase in income: the demand curve shifts right from D to D′ while the supply curve S stays put.
Income increases
Quantity: rises    Price: rises
The income coefficient estimates are consistent with the standard demand/supply analysis.
We now return to complete step 3 of the reduced form (RF) estimation procedure.
Step 3: Calculate Coefficient Estimates for the Original Model Using the Derivations and
Estimates from Steps 1 and 2
We will use the reduced form (RF) coefficient estimates from step 2 to estimate the price coefficients of the demand and supply models, $\beta^D_P$ and $\beta^S_P$, the “slopes” of the demand and supply curves. To do so, we apply the equations for $\beta^D_P$ and $\beta^S_P$ that we derived in step 1.
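As a quick numerical check of step 3, the short Python sketch below plugs the reduced form (RF) coefficient estimates reported in step 2 into the ratio formulas derived in step 1; nothing here is new, it simply reproduces the arithmetic:

```python
# Reduced form (RF) coefficient estimates from step 2
aQ_FP, aP_FP = -332.00, 1.0562      # feed price coefficients (quantity and price equations)
aQ_I,  aP_I  = 17.347, 0.018825     # income coefficients (quantity and price equations)

bD_P = aQ_FP / aP_FP                # estimated "slope" of the demand curve
bS_P = aQ_I / aP_I                  # estimated "slope" of the supply curve
print(round(bD_P, 1), round(bS_P, 1))   # -314.3 and 921.5
```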
22.3.4 Comparing Ordinary Least Squares (OLS) and Reduced Form (RF) Estimates
We will now compare the ordinary least squares (OLS) and reduced form (RF) estimates of the
price coefficients (table 22.9). The supply curve price coefficient is the most obvious difference.
The ordinary least squares (OLS) estimate is negative while the reduced form (RF) estimate is
positive. In view of our upward sloping supply curve theory, this result is comforting. Unlike
the ordinary least squares (OLS) estimates, the signs of the reduced form (RF) price coefficient
estimates are consistent not only with our theory of demand, but also our theory of supply.
Consequently we will now show that the reduced form (RF) estimation procedure is “better”
than the ordinary least squares (OLS) estimation procedure when estimating simultaneous equa-
tions models.
Previously, we used the simultaneous equations simulation to show that the ordinary least squares
(OLS) estimation procedure was neither unbiased nor consistent when estimating the values of
Table 22.9
Comparing OLS and RF price coefficient estimates
Table 22.10
Simultaneous equations simulation results
the price coefficients. Now, we will use this simulation to investigate the properties of the
reduced form (RF) estimation procedure. It would be wonderful if the reduced form (RF)
approach were unbiased. Failing that, might the reduced form (RF) approach be consistent?
While we could address these issues rigorously, we will avoid the complex mathematics by using
a simulation.
Note that the reduced form (RF), rather than the ordinary least squares (OLS), estimation pro-
cedure is now selected. Also the Dem radio button is selected initially; the demand model is
being analyzed. Be certain that the Pause checkbox is cleared. Click Start and then after many,
many repetitions click Stop. Next select the Sup radio button and repeat the process to analyze
the supply model.
Table 22.10 reports the reduced form (RF) results for a sample size of 20. The results suggest
that reduced form (RF) estimation procedures for the price coefficients are biased. The averages
of the estimated price coefficient values after many, many repetitions do not equal the actual
values for either the demand or supply models. The average of the demand price coefficient
estimates equals −4.3 while the actual value equals −4.0; similarly, the average of the supply price coefficient estimates equals 1.3 while the actual value equals 1.0.
But perhaps, unlike the ordinary least squares (OLS) estimation procedure, the reduced form (RF) approach is consistent. To address this question, we increase the sample size,
first from 20 to 30 and then from 30 to 40 (table 22.11). As the sample size becomes larger, bias
is still present but the magnitude of the bias diminishes for both the demand and supply price
coefficients. Furthermore the variances also fall. The simulation illustrates that while the reduced
form (RF) estimation procedure for the price coefficient value is still biased, it is consistent.
We can conclude that the reduced form (RF) estimation procedure for the coefficient value
of an endogenous explanatory variable provides both good news and bad news:
Bad news: The reduced form (RF) estimation procedure for the coefficient value is biased.
Table 22.11
Simultaneous equations simulation results
Good news: The reduced form (RF) estimation procedure for the coefficient value is
consistent.
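For readers who want to experiment outside the Econometrics Lab, the Python sketch below runs the same kind of experiment. The demand and supply price coefficients are set to the lab's actual values (−4.0 and 1.0); the constants, the other coefficients, and the distributions of the exogenous variables and error terms are illustrative assumptions, so the exact averages will differ from table 22.11 even though the pattern (bias that shrinks as the sample size grows) is the same.

```python
# A Monte Carlo sketch of the reduced form (RF) estimation procedure.
# Price coefficients (-4.0 demand, 1.0 supply) match the lab's actual values;
# every other number below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

bD_const, bD_p, bD_i = 500.0, -4.0, 0.5    # demand: Q = bD_const + bD_p*P + bD_i*Inc + eD
bS_const, bS_p, bS_fp = 100.0, 1.0, -2.0   # supply: Q = bS_const + bS_p*P + bS_fp*FeedP + eS

def rf_estimates(T):
    inc = rng.uniform(500, 1500, T)
    feedp = rng.uniform(10, 50, T)
    eD = rng.normal(0, 50, T)
    eS = rng.normal(0, 50, T)
    # Equilibrium price and quantity implied by the two structural equations
    p = (bD_const - bS_const + bD_i*inc - bS_fp*feedp + eD - eS) / (bS_p - bD_p)
    q = bS_const + bS_p*p + bS_fp*feedp + eS
    # Estimate the two reduced form (RF) equations by ordinary least squares (OLS)
    X = np.column_stack([np.ones(T), feedp, inc])
    aQ = np.linalg.lstsq(X, q, rcond=None)[0]
    aP = np.linalg.lstsq(X, p, rcond=None)[0]
    # Ratios of RF coefficient estimates estimate the structural price coefficients
    return aQ[1] / aP[1], aQ[2] / aP[2]    # demand "slope", supply "slope"

for T in (20, 30, 40, 200):
    est = np.array([rf_estimates(T) for _ in range(5000)])
    print(T, est.mean(axis=0))   # averages move toward (-4.0, 1.0) as T grows
```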
Let us reexamine how we obtained the estimates for the price coefficients of the demand and
supply models:
$$b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-332.00}{1.0562} = -314.3, \qquad b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{17.347}{0.018825} = 921.5$$
These equations for the two price coefficient estimates appear paradoxical at first glance:
• The demand model's price coefficient, $b^D_P$, depends on the reduced form (RF) coefficients of feed price, $a^Q_{FP}$ and $a^P_{FP}$. But $a^Q_{FP}$ and $a^P_{FP}$ tell us something about supply, not demand. They tell us how the feed price, a variable that shifts the supply curve, affects the equilibrium quantity and price.
• Similarly the supply model's price coefficient, $b^S_P$, depends on the reduced form (RF) coefficients of income, $a^Q_I$ and $a^P_I$. But $a^Q_I$ and $a^P_I$ tell us something about demand, not supply. They tell us how income, a variable that shifts the demand curve, affects the equilibrium quantity and price.
22.6.1 Review: Goal of Multiple Regression Analysis and the Interpretation of the Coefficients
• Goal of multiple regression analysis: Multiple regression analysis attempts to sort out the
individual effect that each explanatory variable has on the dependent variable.
Consider a model with two explanatory variables:
$$y_t = \beta_{Const} + \beta_{x1}\,x1_t + \beta_{x2}\,x2_t + e_t$$
Since the actual parameters of the model, the β's, are unobservable, we estimate them:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
The coefficient estimates attempt to separate out the individual effect that each explanatory
variable has on the dependent variable. To explain what this means, focus on the estimate of the
first explanatory variable’s coefficient, bx1. It estimates the change in the dependent variable
resulting from a change in the explanatory variable 1 while all other explanatory variables remain
constant. More formally,
$$\Delta y = b_{x1}\,\Delta x1 \qquad\text{or}\qquad b_{x1} = \frac{\Delta y}{\Delta x1}$$
while all other explanatory variables remain constant. A little algebra explains why. We begin with the equation estimating our model:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
Now increase the explanatory variable 1 by Δx1 while keeping all other explanatory variables
constant. Δy estimates the resulting change in the dependent variable.
From To
Price: x1 → x1 + Δx1
Quantity: Esty → Esty + Δy
All other explanatory variables remain constant. In the equation estimating our model, substitute $Esty + \Delta y$ for $Esty$ and $x1 + \Delta x1$ for $x1$:
$$Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2 \qquad\qquad \text{Substituting}$$
$$Esty + \Delta y = b_{Const} + b_{x1}(x1 + \Delta x1) + b_{x2}\,x2 \qquad\qquad \text{Multiplying through by } b_{x1}$$
$$Esty + \Delta y = b_{Const} + b_{x1}\,x1 + b_{x1}\,\Delta x1 + b_{x2}\,x2 \qquad\qquad \text{Subtracting } Esty = b_{Const} + b_{x1}\,x1 + b_{x2}\,x2$$
$$\Delta y = 0 + b_{x1}\,\Delta x1 + 0$$
Simplify:
$$\Delta y = b_{x1}\,\Delta x1 \qquad\text{or}\qquad \frac{\Delta y}{\Delta x1} = b_{x1}$$
while all other explanatory variables remain constant.
Using the same logic, we can interpret the estimate of the second explanatory variable’s coef-
ficient, bx2, analogously:
$$\Delta y = b_{x2}\,\Delta x2 \qquad\text{or}\qquad b_{x2} = \frac{\Delta y}{\Delta x2}$$
while all other explanatory variables remain constant. bx2 allows us to estimate the change in the
dependent variable when explanatory variable 2 changes while all other explanatory variables
remain constant.
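A tiny Python sketch of this interpretation, using made-up coefficient estimates (the numbers carry no meaning; they only illustrate the arithmetic):

```python
# Made-up coefficient estimates for a two-variable linear model
b_const, b_x1, b_x2 = 10.0, 3.0, -2.0
est_y = lambda x1, x2: b_const + b_x1 * x1 + b_x2 * x2

x1, x2 = 5.0, 7.0
dx1, dx2 = 1.5, 2.0

# Change x1 while holding x2 constant: the prediction changes by b_x1 * dx1
print(est_y(x1 + dx1, x2) - est_y(x1, x2), b_x1 * dx1)    # 4.5  4.5

# Change x2 while holding x1 constant: the prediction changes by b_x2 * dx2
print(est_y(x1, x2 + dx2) - est_y(x1, x2), b_x2 * dx2)    # -4.0  -4.0
```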
What happens when both explanatory variables change simultaneously? The total estimated change in the quantity demanded equals the sum of the individual changes:
$$\Delta y = b_{x1}\,\Delta x1 + b_{x2}\,\Delta x2$$
Each term estimates the change in the dependent variable resulting from a change in each individual explanatory variable. We will now apply the interpretation of the coefficient estimates to resolve the paradoxes.
We will resolve the paradoxes by applying the interpretation of the coefficient estimates:
• First, to the original simultaneous equations models.
• Second, to the reduced form (RF) equations.
22.6.2 Paradox: Demand Model Price Coefficient Depends on the Reduced Form (RF) Feed
Price Coefficients
We will first explain why the price coefficient estimate of the demand model, $b^D_P$, is determined by the reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$.
Recall the demand model:
$$EstQ^D = b^D_{Const} + b^D_P\,P + b^D_I\,Inc$$
Interpret the price coefficient estimate, bPD. The price coefficient estimate of the demand model
estimates the change in the quantity of beef demanded when price of beef changes while income
remains constant.
$$\Delta Q^D = b^D_P\,\Delta P \qquad\text{or}\qquad b^D_P = \frac{\Delta Q^D}{\Delta P}$$
while income remains constant. Since income remains constant, the demand curve does not shift; hence, $b^D_P$ is just the estimated “slope” of the demand curve for beef (figure 22.9).
Next consider the reduced form (RF) equations that estimate the quantity and price:
EstQ = 138,726 − 332.00FeedP + 17.347Inc
EstP = 33.027 + 1.0562FeedP + 0.018825Inc
Suppose that the feed price decreases while income remains constant. As shown in figure 22.10,
the decrease in feed prices shifts the supply curve for beef to the right.
Now interpret the feed price coefficients in the reduced form (RF) equations:
• The feed price coefficient of the quantity reduced form (RF) equation estimates the change in the quantity of beef when the feed price changes while income remains constant:
Figure 22.9
“Slope” of demand curve: with income constant, $b^D_P = \Delta Q^D / \Delta P$ along the demand curve D.
Figure 22.10
Feed price decreases and income remains constant: the supply curve shifts from S to S′, moving the equilibrium along the demand curve (ΔQ^D, ΔP).
ΔQ = −332.00ΔFeedP
ΔP = 1.0562ΔFeedP
while income remains constant. Divide ΔQ by ΔP; while income remains constant:
$$\frac{\Delta Q^D}{\Delta P} = \frac{-332.00\,\Delta FeedP}{1.0562\,\Delta FeedP} = \frac{-332.00}{1.0562} = -314.3$$
We now can appreciate why the “slope” of the demand curve for beef is estimated by the
reduced form (RF) feed price coefficients. Changes in the feed price cause the supply curve for
beef to shift. When the demand curve remains stationary, changes in the feed price move the
equilibrium from one point on the demand curve to another point on the same demand curve.
Consequently the feed price coefficients of the reduced form (RF) equations estimate how the
quantity and price change as we move along the demand curve because they are based on the
premise that income remains constant and therefore the demand curve remains stationary. The
reduced form (RF) feed price coefficients provide us with the information we need to calculate
the “slope” of the demand curve for beef.
22.6.3 Paradox: Supply Model Price Coefficient Depends on the Reduced Form (RF) Income
Coefficients
We will use similar logic to explain why the price coefficient estimate of the supply model, $b^S_P$, is determined by the reduced form (RF) income coefficient estimates, $a^Q_I$ and $a^P_I$. Recall the supply model:
$$EstQ^S = b^S_{Const} + b^S_P\,P + b^S_{FP}\,FeedP$$
Interpret the price coefficient estimate, $b^S_P$. The price coefficient estimate of the supply model estimates the change in the quantity of beef supplied when the price of beef changes while the feed price remains constant:
$$\Delta Q^S = b^S_P\,\Delta P$$
Figure 22.11
“Slope” of supply curve: with the feed price constant, $b^S_P = \Delta Q^S / \Delta P$ along the supply curve S.
Figure 22.12
Income increases and feed price remains constant: the demand curve shifts from D to D′, moving the equilibrium along the supply curve (ΔQ, ΔP).
$$b^S_P = \frac{\Delta Q^S}{\Delta P}$$
while the feed price remains constant. Since the feed price is constant, the supply curve does not shift; hence, $b^S_P$ is just the estimated “slope” of the supply curve for beef (figure 22.11).
Once again, consider the reduced form (RF) equations that estimate the quantity and price:
EstQ = 138,726 − 332.00FeedP + 17.347Inc
EstP = 33.027 + 1.0562FeedP + 0.018825Inc
Suppose that income increases and feed price remains constant. As shown in figure 22.12, the
demand curve will shift to the right.
Now interpret the income coefficients in the reduced form (RF) equations:
• The income coefficient of the quantity reduced form (RF) equation estimates the change in
beef quantity when income changes while feed prices remain constant:
ΔQ = 17.347ΔInc
ΔP = 0.018825ΔInc
$$\frac{\Delta Q}{\Delta P} = \frac{17.347\,\Delta Inc}{0.018825\,\Delta Inc} = \frac{17.347}{0.018825} = 921.5$$
while feed price remains constant.
Next recognize that this Q represents the quantity of beef supplied. The change in income
causes the demand curve to shift, but the supply curve remains stationary because the feed price
has remained constant. As figure 22.12 illustrates, we move from one point on the supply curve
to another point on the same supply curve. This movement represents a change in the quantity
of beef supplied, QS:
$$\frac{\Delta Q^S}{\Delta P} = 921.5$$
We can appreciate why the “slope” of the supply curve for beef is determined by the reduced
form (RF) income coefficients. Changes in income cause the demand curve for beef to shift.
When the supply curve remains stationary, changes in income move the equilibrium from one
point on the supply curve to another point on the same supply curve. Consequently the income
coefficients of the reduced form (RF) equations estimate how the quantity and price change as
we move along the supply curve because they are based on the premise that the feed price
remains constant and therefore the supply curve remains stationary. The reduced form (RF)
income coefficients provide us with the information we need to calculate the “slope” of the
supply curve for beef.
We now have developed some intuition regarding why the estimated “slope” of the demand curve
depends on the feed price reduced form (RF) coefficient estimates and why the estimated “slope”
of the supply curve depends on the income reduced form (RF) coefficient estimates. The coef-
ficient interpretation approach provides intuition and also gives us a bonus. The coefficient
interpretation approach provides us with a simple way to derive the relationships between the
estimated “slopes” of the demand and supply curves and the reduced form (RF) estimates.
Compare the algebra we just used to express the estimated “slopes” of the demand and supply
curves with the algebra used in the appendix to this chapter.
5. In a simultaneous equations model, is the reduced form (RF) estimation procedure for the
value of a coefficient for an endogenous explanatory variable
a. unbiased?
b. consistent?
6. What paradoxes arise when using the reduced form (RF) estimation procedure to estimate
the price coefficients of the demand/supply simultaneous equations model? Resolve the
paradoxes.
Chapter 22 Exercises
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
b. Consider the price reduced form (RF) estimates: EstP = 33.027 + 1.0562FeedP +
0.018825Inc
i. What equation estimates the change in the price, ΔP, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. When the “while” condition cited in part a is satisfied, how must the change in income, ΔInc, and the change in feed prices, ΔFeedP, be related? Solve the equation for ΔFeedP.
c. Consider the quantity reduced form (RF) estimates: EstQ = 138,726 − 332.00FeedP +
17.347Inc
i. What equation estimates the change in the quantity, ΔQ, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. Substitute in your answer to part b(ii). Then recall your answer to part a to calculate the numerical value of $b^D_I$.
b. Consider the price reduced form (RF) estimates: EstP = 33.027 + 1.0562FeedP + 0.018825Inc
i. What equation estimates the change in the price, ΔP, when both income changes by
ΔInc and the feed price changes by ΔFeedP?
ii. When the “while” condition cited in part a is satisfied, how must the change in income,
ΔInc, and the change in feed prices, ΔFeedP, be related? Solve the equation for ΔInc.
c. Consider the quantity reduced form (RF) estimates: EstQ = 138,726 − 332.00FeedP +
17.347Inc
i. What equation estimates the change in the quantity, ΔQ, when both income changes by ΔInc and the feed price changes by ΔFeedP?
ii. Substitute in your answer to part b(ii). Then recall your answer to part a to calculate the numerical value of $b^S_{FP}$.
Now consider a different model describing the beef market: a constant elasticity model. The log
version of this model is
3. What are the reduced form (RF) equations for this model?
4. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the quantity reduced form (RF) regression. Use the regression results to estimate
the change in the log of the quantity, Δlog(Q), when the log of the feed price changes by
Δlog(FeedP) and the log of income changes by Δlog(Inc):
Δlog(Q) =
b. Focus on the price reduced form (RF) regression. Use the regression results to estimate
the change in the log of the price, Δlog(P), when the log of the feed price changes by
Δlog(FeedP) and the log of income changes by Δlog(Inc):
Δlog(P) =
Chicken market data: Monthly time series data relating to the market for chicken from 1980
to 1985.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
6. What are the reduced form (RF) equations for this model?
7. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the quantity reduced form (RF) regression. Use the regression results to estimate
the change in the quantity, ΔQ, when the feed price changes by ΔFeedP and income changes
by ΔInc:
ΔQ =
b. Focus on the price reduced form (RF) regression. Use the regression results to estimate
the change in the price, ΔP, when the feed price changes by ΔFeedP and income changes
by ΔInc:
ΔP =
Crime and police data: Annual time series data of US crime and economic statistics from 1988
to 2007.
Consider the following simultaneous equations model of crime and police expenditures
11. What are the reduced form (RF) equations for this model?
12. Estimate the parameters for the reduced form (RF) equations.
a. Focus on the crimes reduced form (RF) regression. Use the regression results to estimate
the change in the crime rate, ΔCrimes, when the unemployment rate changes by ΔUnemRate
and per capita GDP changes by ΔGdpPC:
b. Focus on the police expenditure reduced form (RF) regression. Use the regression results
to estimate the change in police expenditures, ΔPoliceExp, when the unemployment rate
changes by ΔUnemRate and per capita GDP changes by ΔGdpPC:
Strategy to Derive the Reduced Form (RF) Equation for Pt
Equilibrium: $Q^D_t = Q^S_t = Q_t$
$$Q^D_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q^S_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Substitute
$$Q_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Subtract
$$0 = \beta^D_{Const} - \beta^S_{Const} + \beta^D_P P_t - \beta^S_P P_t - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
Solve
$$\beta^S_P P_t - \beta^D_P P_t = \beta^D_{Const} - \beta^S_{Const} - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
$$(\beta^S_P - \beta^D_P)\,P_t = \beta^D_{Const} - \beta^S_{Const} - \beta^S_{FP} FeedP_t + \beta^D_I Inc_t + e^D_t - e^S_t$$
$$P_t = \frac{\beta^D_{Const} - \beta^S_{Const}}{\beta^S_P - \beta^D_P} - \frac{\beta^S_{FP}}{\beta^S_P - \beta^D_P} FeedP_t + \frac{\beta^D_I}{\beta^S_P - \beta^D_P} Inc_t + \frac{e^D_t - e^S_t}{\beta^S_P - \beta^D_P}$$
Strategy to Derive the Reduced Form (RF) Equation for Qt
• Multiply the new equation for the demand model by $\beta^S_P$ and the new equation for the supply model by $\beta^D_P$.
• Subtract the new equation for the supply model from the new equation for the demand model.
• Solve for Qt.
$$Q^D_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q^S_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Substitute
$$Q_t = \beta^D_{Const} + \beta^D_P P_t + \beta^D_I Inc_t + e^D_t$$
$$Q_t = \beta^S_{Const} + \beta^S_P P_t + \beta^S_{FP} FeedP_t + e^S_t$$
Multiply
$$\beta^S_P Q_t = \beta^S_P\beta^D_{Const} + \beta^S_P\beta^D_P P_t + \beta^S_P\beta^D_I Inc_t + \beta^S_P e^D_t$$
$$\beta^D_P Q_t = \beta^D_P\beta^S_{Const} + \beta^D_P\beta^S_P P_t + \beta^D_P\beta^S_{FP} FeedP_t + \beta^D_P e^S_t$$
Subtract and solve
$$(\beta^S_P - \beta^D_P)\,Q_t = \beta^S_P\beta^D_{Const} - \beta^D_P\beta^S_{Const} - \beta^D_P\beta^S_{FP} FeedP_t + \beta^S_P\beta^D_I Inc_t + \beta^S_P e^D_t - \beta^D_P e^S_t$$
$$Q_t = \frac{\beta^S_P\beta^D_{Const} - \beta^D_P\beta^S_{Const}}{\beta^S_P - \beta^D_P} - \frac{\beta^D_P\beta^S_{FP}}{\beta^S_P - \beta^D_P} FeedP_t + \frac{\beta^S_P\beta^D_I}{\beta^S_P - \beta^D_P} Inc_t + \frac{\beta^S_P e^D_t - \beta^D_P e^S_t}{\beta^S_P - \beta^D_P}$$
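If you would rather not grind through this algebra by hand, the laborious steps can be checked symbolically. The sketch below uses Python's sympy package; the symbol names are ours, and the check is only a convenience, not part of the text's derivation.

```python
# Solve the two structural equations for the equilibrium P_t and Q_t with sympy
# and compare the result with the reduced form (RF) expressions derived above.
import sympy as sp

P, Q, FeedP, Inc, eD, eS = sp.symbols('P Q FeedP Inc e_D e_S')
bDc, bDp, bDi = sp.symbols('bDc bDp bDi')      # demand: constant, price, income
bSc, bSp, bSfp = sp.symbols('bSc bSp bSfp')    # supply: constant, price, feed price

demand = sp.Eq(Q, bDc + bDp*P + bDi*Inc + eD)
supply = sp.Eq(Q, bSc + bSp*P + bSfp*FeedP + eS)

sol = sp.solve([demand, supply], [P, Q], dict=True)[0]

# The FeedP coefficient of the Q equation should equal -bDp*bSfp/(bSp - bDp)
print(sp.simplify(sol[Q].diff(FeedP)))
# The Inc coefficient of the P equation should equal bDi/(bSp - bDp)
print(sp.simplify(sol[P].diff(Inc)))
```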
23 Simultaneous Equations Models—Identification

Chapter 23 Outline
23.1 Review
23.1.1 Demand and Supply Models
23.1.2 Ordinary Least Squares (OLS) Estimation Procedure
23.1.3 Reduced Form (RF) Estimation Procedure—One Way to Cope with Simultaneous
Equations Models
23.2 Two-Stage Least Squares (TSLS): An Instrumental Variable (IV) Two- Step Approach—A
Second Way to Cope with Simultaneous Equations Models
23.2.1 First Stage: Exogenous Explanatory Variable(s) Used to Estimate the Endogenous
Explanatory Variable(s)
23.2.2 Second Stage: In the Original Model, the Endogenous Explanatory Variable
Replaced with Its Estimate
23.3 Comparison of Reduced Form (RF) and Two-Stage Least Squares (TSLS) Estimates
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the model for the beef market that we used in the last chapter:
1. We will now introduce another estimation procedure for simultaneous equations models, the
two-stage least squares (TSLS) estimation procedure:
First stage: Use the exogenous explanatory variable(s) to estimate the endogenous explanatory
variable(s).
• Explanatory variable(s): All exogenous variables. In this case the exogenous variables are
FeedPt and Inct.
Using the ordinary least squares (OLS) estimation procedure, what equation estimates the
“problem” explanatory variable, the price of beef?
EstP = __________________________________
Generate a new variable, EstP, that estimates the price of beef based on the first stage.
and the second stage of the two-stage least squares (TSLS) estimation procedure:
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variable(s): First-stage estimate of the endogenous explanatory variable and
the relevant exogenous explanatory variables. In this case the estimate of the price of beef and
income, EstPt and Inct.
a. Using the ordinary least squares (OLS) estimation procedure, estimate the EstP coefficient
of the demand model.
b. Compare the two-stage least squares (TSLS) coefficient estimate for the demand model
with the estimate computed using the reduced form (RF) estimation procedure in chapter 22.
and the second stage of the two-stage least squares (TSLS) estimation procedure:
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
a. Focus on the reduced form (RF) estimates for the income coefficients:
i. The reduced form (RF) income coefficient estimates, $a^Q_I$ and $a^P_I$, allowed us to estimate the “slope” of which curve?
ii. If the reduced form (RF) income coefficient estimates were not available, would we
be able to estimate the “slope” of this curve?
b. Focus on the reduced form (RF) estimates for the feed price coefficients:
i. The reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$, allowed us to estimate the “slope” of which curve?
ii. If the reduced form (RF) feed price coefficient estimates were not available, would we
be able to estimate the “slope” of this curve?
23.1 Review
For the economist, arguably the most important example of a simultaneous equations model is
the demand/supply model:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
• Endogenous variables: Variables determined “within” the model: Quantity and Price.
• Exogenous variables: Variables determined “outside” the model.
Figure 23.1
Demand/supply model: Q = equilibrium quantity, P = equilibrium price.
In the last chapter we learned why simultaneous equations cause a problem for the ordinary least
squares (OLS) estimation procedure:
In the demand/supply model, the price is an endogenous explanatory variable. When we used
the ordinary least squares (OLS) estimation procedure to estimate the value of the price coef-
ficient in the demand and supply models, we observed that a problem emerged. In each model,
price and the error term were correlated resulting in bias; the price is the “problem” explanatory
variable (figure 23.2).
So where did we go from here? We explored the possibility that the ordinary least squares
(OLS) estimation procedure might be consistent. After all, is not “half a loaf” better than none?
We took advantage of our Econometrics Lab to address this issue. Recall the distinction between
an unbiased and a consistent estimation procedure:
Unbiased: The estimation procedure does not systematically underestimate or overestimate the
actual value; that is, after many, many repetitions the average of the estimates equals the actual
value.
Consistent but biased: A consistent estimation procedure can be biased. But as the sample size, the number of observations, grows:
Figure 23.2
Correlation of price and error terms. In the demand model, $e^D_t$ up → $P_t$ up and $e^D_t$ down → $P_t$ down: price and the demand error term are positively correlated, and the ordinary least squares (OLS) estimation procedure for the coefficient value is biased upward. In the supply model, $e^S_t$ up → $P_t$ down and $e^S_t$ down → $P_t$ up: price and the supply error term are negatively correlated, and the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward.
• The magnitude of the bias decreases. That is, the mean of the coefficient estimate’s probability
distribution approaches the actual value.
• The variance of the estimate’s probability distribution diminishes and approaches 0.
Unfortunately, the Econometrics Lab illustrates the sad fact that the ordinary least squares (OLS)
estimation procedure is neither unbiased nor consistent.
We then considered an alternative estimation procedure: the reduced form (RF) estimation procedure. Our Econometrics Lab taught us that while the reduced form (RF) estimation procedure is biased, it is consistent. That is, as the sample size grows, the average of the coefficient estimates gets “closer and closer” to the actual value and the variance grows smaller and smaller. Arguably, when choosing between two biased estimation procedures, it is better to use the one that is consistent. This represents the econometrician's pragmatic, “half a loaf is better than none” philosophy. We will now quickly review the reduced form (RF) estimation procedure.
Reduced Form (RF) Estimation Procedure—One Way to Cope with Simultaneous Equations
Models
We begin with the simultaneous equations model and then construct the reduced form (RF) equations:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
We use the ordinary least squares (OLS) estimation procedure to estimate the reduced form (RF)
parameters (tables 23.1 and 23.2) and then use the ratio of the reduced form (RF) estimates to
estimate the “slopes” of the demand and supply curves.
Table 23.1
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.2
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
$$b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-332.00}{1.0562} = -314.3, \qquad b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{17.347}{0.018825} = 921.5$$
23.2 Two-Stage Least Squares (TSLS): An Instrumental Variable (IV) Two-Step Approach—A
Second Way to Cope with Simultaneous Equations Models
Another way to estimate simultaneous equations models is the two-stage least squares (TSLS) estimation procedure. As the name suggests, the procedure involves two stages. As we will see, two-stage least squares (TSLS) uses a strategy that is similar to the instrumental variable (IV) approach.
First stage: Use the exogenous explanatory variable(s) to estimate the endogenous explanatory
variable(s).
Second stage: In the original model, replace the endogenous explanatory variable with its
estimate.
We will now illustrate the two-stage least squares (TSLS) approach by considering the beef
market.
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the model for the beef market that we used in the last chapter:
Equilibrium: Q Dt = Q St = Qt
The strategy for the first stage is similar to the strategy used by the instrumental variable (IV)
approach. The endogenous explanatory variable is the source of the bias; consequently the
endogenous explanatory variable is the “problem” explanatory variable. In the first stage the
endogenous explanatory variable is the dependent variable. The explanatory variables are all
the exogenous variables. In our example, price is the endogenous explanatory variable; conse-
quently price becomes the dependent variable in the first stage. The exogenous variables, income
and feed price, are the explanatory variables.
23.2.1 First Stage: Exogenous Explanatory Variable(s) Used to Estimate the Endogenous
Explanatory Variable(s)
Table 23.3
OLS regression results—TSLS first-stage equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
• Explanatory variable(s): All exogenous variables. In this case the exogenous variables are
FeedPt and Inct.
Using these regression results we estimate the price of beef based on the exogenous variables,
income and feed price (table 23.3).
The strategy for the second stage is also similar to the instrumental variable (IV) approach.
We return to the original model and replace the endogenous explanatory variable with its estimate
from stage 1. The dependent variable is the original dependent variable, quantity. The explana-
tory variables are stage 1’s estimate of the price and the relevant exogenous variables. In our
example we have two models, one for demand and one for supply; accordingly we first apply
the second stage to demand and then to supply.
23.2.2 Second Stage: In the Original Model, the Endogenous Explanatory Variable Replaced
with Its Estimate
Demand Model
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variables: First-stage estimate of the endogenous explanatory variable and the
relevant exogenous explanatory variables. In this case the estimate of the price of beef and
income, EstPt and Inct.
Table 23.4
OLS regression results—TSLS second-stage demand equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.5
OLS regression results—TSLS second-stage supply equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Supply Model
• Dependent variable: Original dependent variable. In this case the original dependent variable is the quantity of beef, Qt.
• Explanatory variables: First-stage estimate of the “problem” endogenous explanatory variable and the relevant exogenous explanatory variables. In this case the estimate of the price of beef and the feed price, EstPt and FeedPt.
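A minimal Python sketch of the two stages, assuming the beef market data have been loaded into a pandas data frame named `beef` with columns Q, P, FeedP, and Inc (the file name and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

beef = pd.read_csv("beef_market.csv")    # hypothetical file holding the beef market data

# First stage: regress the endogenous explanatory variable, P, on all exogenous variables
X1 = sm.add_constant(beef[["FeedP", "Inc"]])
beef["EstP"] = sm.OLS(beef["P"], X1).fit().fittedvalues

# Second stage, demand model: Q on EstP and Inc
demand = sm.OLS(beef["Q"], sm.add_constant(beef[["EstP", "Inc"]])).fit()

# Second stage, supply model: Q on EstP and FeedP
supply = sm.OLS(beef["Q"], sm.add_constant(beef[["EstP", "FeedP"]])).fit()

print(demand.params["EstP"], supply.params["EstP"])   # estimated price coefficients ("slopes")
```

One caution: the standard errors reported by the second-stage OLS regressions are not the correct two-stage least squares (TSLS) standard errors; dedicated TSLS routines adjust for the fact that EstP is itself an estimate.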
23.3 Comparison of Reduced Form (RF) and Two-Stage Least Squares (TSLS) Estimates
Compare the estimates from the reduced form (RF) approach with the estimates from the two-
stage least squares (TSLS) approach (table 23.6). The estimates are identical. In this case the
Table 23.6
Comparison of reduced form (RF) and two-stage least squares (TSLS) price coefficient estimates
Table 23.7
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP and Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
reduced form (RF) estimation procedure and the two-stage least squares (TSLS) estimation
procedure produce identical results.
Many statistical packages provide an easy way to apply the two-stage least squares (TSLS)
estimation procedure so that we do not need to generate the estimate of the endogenous explana-
tory variable ourselves (tables 23.7 and 23.8).
Table 23.8
TSLS regression results—Supply model
Dependent variable: Q
Instrument(s): FeedP and Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
EViews makes it very easy for us to use the two-stage least squares (TSLS) approach. EViews does most of the work for us, eliminating the need to generate a new variable.
Note that these are the same estimates that we obtained when we generated the estimates of the
price on our own.
Let us step back for a moment to review our beef market model.
Equilibrium: $Q^D_t = Q^S_t = Q_t$
We can use the coefficient interpretation approach to estimate the “slopes” of the demand and supply curves in terms of the reduced form (RF) estimates (figure 23.3).1
Intuition: Critical Role of the Exogenous Variable Absent from the Model
In each model there is one absent exogenous variable and one endogenous explanatory variable. This one-to-one correspondence allows us to estimate the coefficient of the endogenous explanatory variable, price.
1. Again, recall that the coefficients do not equal the slope of the demand curve, but rather the reciprocal of the slope.
They are the ratio of run over rise instead of rise over run. This occurs as a consequence of the economist’s convention
of placing quantity on the horizontal axis and price on the vertical axis. To avoid the awkwardness of using the expres-
sion “the reciprocal of the slope” repeatedly, we will place the word slope within double quotes to indicate that it is the
reciprocal.
Figure 23.3
Reduced form (RF) and coefficient interpretation approach—Identified

Suppose that FeedP increases while Inc remains constant: ΔQ = $a^Q_{FP}\,\Delta FeedP$ and ΔP = $a^P_{FP}\,\Delta FeedP$. Changes in the feed price shift the supply curve, but not the demand curve, so we move from one equilibrium to another on the same demand curve. This movement represents a change in the quantity of beef demanded, $Q^D$. Estimated “slope” of the demand curve:
$$b^D_P = \frac{\Delta Q}{\Delta P} = \frac{a^Q_{FP}\,\Delta FeedP}{a^P_{FP}\,\Delta FeedP} = \frac{a^Q_{FP}}{a^P_{FP}}$$
The exogenous variable absent in the demand model, FeedP, allows us to estimate the “slope” of the demand curve.

Suppose instead that Inc increases while FeedP remains constant: ΔQ = $a^Q_{I}\,\Delta Inc$ and ΔP = $a^P_{I}\,\Delta Inc$. Changes in income shift the demand curve, but not the supply curve, so we move from one equilibrium to another on the same supply curve. This movement represents a change in the quantity of beef supplied, $Q^S$. Estimated “slope” of the supply curve:
$$b^S_P = \frac{\Delta Q}{\Delta P} = \frac{a^Q_{I}\,\Delta Inc}{a^P_{I}\,\Delta Inc} = \frac{a^Q_{I}}{a^P_{I}}$$
The exogenous variable absent in the supply model, Inc, allows us to estimate the “slope” of the supply curve.
Order Condition
The order condition formalizes this relationship:
Compare the number of exogenous explanatory variables absent from the model with the number of endogenous explanatory variables included in the model:
• Less than: the model is underidentified; there is no reduced form (RF) estimate.
• Equal to: the model is identified; there is a unique reduced form (RF) estimate.
• Greater than: the model is overidentified; there are multiple reduced form (RF) estimates.
Our beef market example is identified. For both the demand model and the supply model, the number of exogenous variables absent from the model equals the number of endogenous explanatory variables included in the model.
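The check is mechanical enough to express as a small helper function; the sketch below simply encodes the rule, with names of our own choosing:

```python
def order_condition(n_absent_exog: int, n_endog_explanatory: int) -> str:
    """Compare the number of exogenous variables absent from an equation with the
    number of endogenous explanatory variables the equation includes."""
    if n_absent_exog < n_endog_explanatory:
        return "underidentified: no RF estimate"
    if n_absent_exog == n_endog_explanatory:
        return "identified: unique RF estimate"
    return "overidentified: multiple RF estimates"

# Beef market demand model: FeedP is absent, P is the only endogenous explanatory variable
print(order_condition(1, 1))    # identified: unique RF estimate
```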
23.5.2 Underidentification
We will now illustrate the underidentification problem. Suppose that no income information
was available. Obviously, if we have no income information, we cannot include Inc as an
explanatory variable in our models:
Equilibrium: $Q^D_t = Q^S_t = Q_t$
Let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model. Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model and
• not be able to estimate the coefficient of the endogenous explanatory variable, P, in the supply
model.
The coefficient interpretation approach explains why. We can still estimate the “slope” of the demand curve by calculating the ratio of the reduced form (RF) feed price coefficient estimates, $a^Q_{FP}$ and $a^P_{FP}$, but we cannot estimate the “slope” of the supply curve since we cannot estimate the reduced form (RF) income coefficients. We will use the coefficient interpretation approach to explain this phenomenon and to take advantage of the intuition it provides (figure 23.4).

Figure 23.4
Reduced form (RF) and coefficient interpretation approach—Underidentified: changes in the feed price still shift the supply curve but not the demand curve, so the exogenous variable absent in the demand model, FeedP, still allows us to estimate the “slope” of the demand curve, $b^D_P = a^Q_{FP}/a^P_{FP}$; with no income data there is no exogenous variable left to trace out the supply curve.
There is both good news and bad news when we have feed price information but no income
information:
Good news: On the one hand, since we still have feed price information, we have information
about supply curve shifts. The shifts in the supply curve cause the equilibrium quantity and price
to move along the demand curve. In other words, shifts in the supply curve “trace out” the
demand curve; hence we can still estimate the “slope” of the demand curve.
Bad news: On the other hand, since we have no income information, we have no information
about demand curve shifts. Without knowing how the demand curve shifts we have no idea how
the equilibrium quantity and price move along the supply curve. In other words, we cannot “trace
out” the supply curve; hence we cannot estimate the “slope” of the supply curve.
To use the reduced form (RF) approach to estimate the “slope” of the demand curve, we first
use ordinary least squares (OLS) to estimate the parameters of the reduced form (RF) equations
(tables 23.9 and 23.10).
Table 23.9
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.10
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we can estimate the “slope” of the demand curve by calculating the ratio of the feed price
estimates:
$$\text{Estimated “slope” of the demand curve} = b^D_P = \frac{a^Q_{FP}}{a^P_{FP}} = \frac{-821.85}{0.52464} = -1{,}566.5$$
Now let us use the two-stage least squares (TSLS) estimation procedure to estimate the “slope”
of the demand curve (table 23.11). In both cases the estimated “slope” of the demand curve is
−1,566.5.
When we try to use two-stage least squares (TSLS) to estimate the “slope” of the supply curve
the statistical software will report an error. We are asking the statistical software to do the
impossible.
Similarly an underidentification problem would exist if income information was available, but
feed price information was not.
Table 23.11
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP
Explanatory variable(s): Estimate SE t-Statistic Prob
Equilibrium: $Q^D_t = Q^S_t = Q_t$
Again, let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model:
• Less than: the model is underidentified; there is no reduced form (RF) estimate.
• Equal to: the model is identified; there is a unique reduced form (RF) estimate.
• Greater than: the model is overidentified; there are multiple reduced form (RF) estimates.
Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the supply
model and
• not be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model.
Good news: Since we have income information, we still have information about demand curve
shifts. The shifts in the demand curve cause the equilibrium quantity and price to move along
the supply curve. In other words, shifts in the demand curve “trace out” the supply curve; hence
we can still estimate the “slope” of the supply curve.
Bad news: On the other hand, since we have no feed price information, we have no information
about supply curve shifts. Without knowing how the supply curve shifts we have no idea how
the equilibrium quantity and price move along the demand curve. In other words, we cannot
“trace out” the demand curve; hence we cannot estimate the “slope” of the demand curve.
To use the reduced form (RF) approach to estimate the “slope” of the supply curve, we first
use ordinary least squares (OLS) to estimate the parameters of the reduced form (RF) equations
(tables 23.12 and 23.13).
Figure 23.5
Reduced form (RF) and coefficient interpretation approach—Underidentified: the exogenous variable absent in the supply model, Inc, still allows us to estimate the “slope” of the supply curve.
Table 23.12
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.13
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.14
TSLS regression results—Supply equation
Dependent variable: Q
Instrument(s): Inc
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we can estimate the “slope” of the supply curve by calculating the ratio of the income
estimates:
$$\text{Estimated “slope” of the supply curve} = b^S_P = \frac{a^Q_{I}}{a^P_{I}} = \frac{20.225}{0.009669} = 2{,}091.7$$
Once again, two-stage least squares (TSLS) provides the same estimate (table 23.14). Also, when we try to use two-stage least squares (TSLS) to estimate the “slope” of the demand curve, the statistical software will report an error.
23.5.3 Overidentification
While an underidentification problem arises when too little information is available, an overi-
dentification problem arises when, in some sense, too much information is available. To illus-
trate this, suppose that in addition to the feed price and income information, the price of chicken
is also available. Since beef and chicken are substitutes, the price of chicken would appear as
an exogenous explanatory variable in the demand model. The simultaneous equations model and
the reduced form (RF) estimates would change:
Figure 23.6
Reduced form (RF) and coefficient interpretation approach—Overidentified: with the feed price constant, an increase in income and an increase in the chicken price each shift the demand curve but not the supply curve, so each traces out the supply curve, yielding two reduced form (RF) estimates of the supply “slope”: $b^S_P = a^Q_I/a^P_I$ and $b^S_P = a^Q_{CP}/a^P_{CP}$.

Let us now apply the order condition by comparing the number of absent exogenous variables and endogenous explanatory variables in each model. Doing so, we find that we will:
• be able to estimate the coefficient of the endogenous explanatory variable, P, in the demand
model and
• encounter some complications when estimating the coefficient of the endogenous explanatory
variable, P, in the supply model. In fact, the reduced form (RF) estimation procedure provides
multiple estimates.
We will now explain why the multiple estimates result (figure 23.6). We have two exogenous
factors that shift the demand curve: income and the price of chicken. Consequently there are
two ways to “trace out” the supply curve. There are now two different ways to use the reduced
form (RF) estimates to estimate the “slope” of the supply curve:
• Ratio of the reduced form (RF) income coefficients: estimated “slope” of the supply curve, $b^S_P = a^Q_{I}/a^P_{I}$.
• Ratio of the reduced form (RF) chicken price coefficients: estimated “slope” of the supply curve, $b^S_P = a^Q_{CP}/a^P_{CP}$.
We will go through the mechanics of the reduced form (RF) estimation procedure to illustrate the overidentification problem. First we use the ordinary least squares (OLS) estimation procedure to estimate the reduced form (RF) parameters (tables 23.15 and 23.16).
Table 23.15
OLS regression results—Quantity reduced form (RF) equation
Dependent variable: Q
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.16
OLS regression results—Price reduced form (RF) equation
Dependent variable: P
Explanatory variable(s): Estimate SE t-Statistic Prob
Then we use the reduced form (RF) estimates to compute the estimates for the “slopes” of
the demand and supply curves:
The reduced form (RF) estimation procedure produces two different estimates for the “slope”
for the supply curve. This is what we mean by overidentification.
While the reduced form (RF) estimation procedure cannot resolve the overidentification problem, the two-stage least squares (TSLS) approach can. The two-stage least squares (TSLS) estimation procedure provides a single estimate of the “slope” of the supply curve. The following regression printouts reveal this (tables 23.17 and 23.18).
Table 23.17
TSLS regression results—Demand model
Dependent variable: Q
Instrument(s): FeedP, Inc, and ChickP
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.18
TSLS regression results—Supply model
Dependent variable: Q
Instrument(s): FeedP, Inc, and ChickP
Explanatory variable(s): Estimate SE t-Statistic Prob
Table 23.19
Comparison of RF and TSLS estimates
The estimated “slope” of the demand curve is −366.0. This is the same estimate as computed
by the reduced form (RF) estimation procedure. Two-stage least squares (TSLS) provides a
single estimate for the “slope” of the supply curve:
bPS = 893.5
Table 23.19 compares the estimates that result when using the two different estimation proce-
dures. Note that on the one hand, the demand model is not overidentified. Both the reduced form
(RF) estimation procedure and the two-stage least squares (TSLS) estimation procedure provide
the same estimate for the “slope” of the demand curve. On the other hand, the supply model is
overidentified. The reduced form (RF) estimation procedure provides two estimates for the “slope”
of the supply curve; the two-stage least squares (TSLS) estimation procedure provides only one.
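The distinction is easy to reproduce with simulated data. In the Python sketch below every parameter value and distribution is an illustrative assumption; the point is only that the two reduced form (RF) ratios for the supply “slope” differ in a finite sample while two-stage least squares (TSLS) returns a single estimate.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 120
inc = rng.uniform(500, 1500, T)
chickp = rng.uniform(20, 80, T)
feedp = rng.uniform(10, 50, T)
eD, eS = rng.normal(0, 40, T), rng.normal(0, 40, T)

# Assumed structural model: demand Q = 500 - 4P + 0.5Inc + 3ChickP + eD,
#                           supply Q = 100 + 1P - 2FeedP + eS
p = (500 - 100 + 0.5*inc + 3.0*chickp + 2.0*feedp + eD - eS) / (1.0 - (-4.0))
q = 100 + 1.0*p - 2.0*feedp + eS

# Reduced form (RF) regressions of Q and P on all exogenous variables
X = sm.add_constant(np.column_stack([feedp, inc, chickp]))
aQ = sm.OLS(q, X).fit().params
aP = sm.OLS(p, X).fit().params
print(aQ[2]/aP[2], aQ[3]/aP[3])   # two RF estimates of the supply "slope" (near 1.0, but unequal)

# TSLS: first stage for P uses all exogenous variables; second stage for the supply model
estP = sm.OLS(p, X).fit().fittedvalues
print(sm.OLS(q, sm.add_constant(np.column_stack([estP, feedp]))).fit().params[1])   # single estimate
```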
Compare the number of exogenous explanatory variables absent from the model with the number of endogenous explanatory variables included in the model:
• Less than: the model is underidentified. There is no reduced form (RF) estimate, and two-stage least squares (TSLS) reports an error.
• Equal to: the model is identified. There is a unique reduced form (RF) estimate, and the reduced form (RF) and two-stage least squares (TSLS) estimates are identical.
• Greater than: the model is overidentified. The reduced form (RF) estimation procedure provides multiple estimates, while two-stage least squares (TSLS) provides a unique estimate.
Chapter 23 Exercises
Beef market data: Monthly time series data relating to the market for beef from 1977 to 1986.
Consider the following constant elasticity model describing the beef market:
1. Suppose that there were no data for the price of chicken and income; that is, while you can
include the variable FeedP in your analysis, you cannot use the variables Inc and ChickP.
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If so, what is (are) the estimate (estimates)?
3. Last suppose that you can use all the variables in your analysis.
Chicken market data: Monthly time series data relating to the market for chicken from 1980
to 1985.
Consider the following constant elasticity model describing the chicken market:
4. Suppose that there were no data for the price of pork and income; that is, while you can
include the variable FeedP in your analysis, you cannot use the variables Inc and PorkP.
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If not, explain why not. If so, does the reduced form (RF) estimation procedure provide a single estimate? What is (are) the estimate (estimates)?
b. Consider the two-stage least squares (TSLS) estimation procedure:
i. Can we estimate the own price elasticity of demand, $\beta^D_P$? If so, what is (are) the estimate (estimates)?
ii. Can we estimate the own price elasticity of supply, $\beta^S_P$? If so, what is (are) the estimate (estimates)?
In general, compare the reduced form (RF) estimation procedure and the two-stage least squares
(TSLS) estimation procedure.
7. When the reduced form (RF) estimation procedure provides no estimate for a coefficient,
how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
8. When the reduced form (RF) estimation procedure provides a single estimate for a coefficient,
how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
How are the estimates related?
9. When the reduced form (RF) estimation procedure provides multiple estimates for a coeffi-
cient, how many estimates does the two-stage least squares (TSLS) estimation procedure provide?
24 Binary and Truncated Dependent Variables

Chapter 24 Outline
24.1 Introduction
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
Consider the following model explaining the winning party in each state:
1. Devise a theory regarding how the population density of a state affects voting behavior. That is, as a state's population density increases, thereby becoming more urban, will the state become more or less likely to vote Democratic? What does your theory suggest about the sign of the coefficient for PopDen_t?
2. Construct a scatter diagram to illustrate the relationship between PopDen and WinDem1 by plotting PopDen on the horizontal axis and WinDem1 on the vertical axis.
Salaries and hitting performance of American League hitters: Cross-sectional 2011 salary and
2010 performance data for all hitters on Major League rosters at the opening of the 2011 season.
4. Devise a theory regarding how on base percentage affects salary. What does your theory suggest about the sign of the coefficient for OnBasePct_t?
5. Construct a scatter diagram to illustrate the relationship between OnBasePct and Salary by
plotting OnBasePct on the horizontal axis and Salary on the vertical axis.
24.1 Introduction
We will now consider two special problems that the dependent variable can create when using
the ordinary least squares (OLS) estimation procedure:
• The first arises when the dependent variable is binary, that is, when the dependent variable
can only take on two values such as Yes/No or True/False. In this case one of the two possibili-
ties is represented with a 0 and the other with a 1; that is, the dependent variable is a dummy
variable.
• The second problem arises when the dependent variable can never be greater than a specific value and/or less than a specific value. For example, in the United States the wage an employee can legally be paid cannot fall below the federal minimum wage in most states and occupations. Currently the federally mandated minimum wage is $7.25 per hour.
We will now show that whenever either of these problems is present, the ordinary least squares
(OLS) estimation procedure can produce erroneous results.
The number of votes a state receives in the Electoral College equals the number of congress-
people the state sends to Washington: the number of Representatives plus the number of Sena-
tors, two.1 With the exception of two states, Maine and Nebraska, all a state’s electoral votes are
awarded to the presidential candidate receiving the most votes. We will focus on the 2004 presi-
dential election in which the Democrat, John Kerry, challenged the incumbent Republican,
George W. Bush.2
Project: Assess the effect of state population density on state Electoral College winner.
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
Table 24.1 reports on the winner of each state’s electoral votes along with each state’s popula-
tion density. Note that the table ranks the states in order of their population density. As you can
see, the Republicans won all of the eleven least densely populated states. The Democrats won
all of the seven most densely populated states. The Republicans and Democrats split the states
in the middle. This observation allows us to formulate a theory:
Theory: The party winning a state’s Electoral College vote depends on the state’s population
density; as a state becomes more densely populated, the Democrats rather than the Republicans
become more likely to win.
To assess the theory we begin by constructing a scatter diagram (figure 24.1). Each point
represents one state. The dependent variable, WinDem1, equals 1 whenever the Democrats win and 0 whenever the Republicans win; it is a binary variable, a dummy variable, taking on only one of two values, either 0 or 1.
1. The District of Columbia has three votes even though it has no (voting) members of Congress. Consequently the
District of Columbia was not included in our analysis, although its inclusion would not affect our conclusions.
2. While Maine and Nebraska do not have a “winner take all” system, in 2004 all of the state’s electoral votes were
won by a single party in each of these states. The Democrats won all of Maine’s electoral votes and the Republicans
won all of Nebraska’s votes.
Table 24.1
2004 Electoral College winners by state
Columns (repeated in two blocks across the page): State, Population density (persons per sq mi), Winner
The scatter diagram appears to support our theory. As the population density increases, states
tend to support Democrats rather than Republicans. All states whose population density was less
than 35 persons per square mile voted Republican while all states whose population density was
greater than 400 persons per square mile voted Democrat. States whose population density was
between 35 and 400 persons per square mile were split. Next we will formulate a model to assess
the theory.
Figure 24.1
Scatter diagram—Population density versus election winner
The linear probability model is just the “standard” linear specification. The dependent variable, WinDem1_t, is the winning party indicated by a 0 or 1 and the explanatory variable is population density, PopDen_t:
$$WinDem1_t = \beta_{Const} + \beta_{PopDen}\,PopDen_t + e_t$$
Table 24.2
OLS regression results—2004 presidential election
Using this equation let us calculate estimates for some selected states:
These estimates for the probability of the Democrats winning each state appear to be reasonable.
Alaska’s population density is very low, and we estimate that the probability of a Democrat win
is low also, only 0.193. Florida’s population density falls in the middle, between 35 and 400;
we estimate that the probability of a Democrat win in Florida is about half. Maryland’s popula-
tion density is high, above 400; we estimate that the probability of a Democrat win in Maryland
is high also, 0.759.
Let us calculate probability estimates for Massachusetts, Rhode Island, and New Jersey:
Figure 24.2
Scatter diagram—Population density versus winning party with linear best fitting line (PopDen on the horizontal axis, estimated probability on the vertical axis)
The probability estimates for these states are nonsensical. Remember that a probability cannot
be less than 0 or greater than 1. On the one hand, if the probability of an event equals 0, that
event cannot occur. On the other hand, if the probability of an event equals 1, the event will occur
with certainty. An event cannot be more certain than certain. A probability of 1.013 or 1.217 or
1.354 simply does not make sense.
It is easy to understand why the linear probability model produces the nonsensical results;
just graph the best fitting line (figure 24.2). The probability model is linear; consequently, the
slope of the best fitting line is a constant, 0.001. Since the slope is a positive constant, the esti-
mated probability will exceed 1 whenever the population density is large enough.
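To make the problem concrete, suppose, hypothetically, that the estimated constant is roughly 0.19, a value broadly consistent with the Alaska estimate reported above. The estimated probability, 0.19 + 0.001 × PopDen, then reaches 1 once the population density exceeds (1 − 0.19)/0.001, about 810 persons per square mile. Massachusetts, Rhode Island, and New Jersey all lie above such a threshold, which is why their estimates exceed 1.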
24.2.3 Probit Probability Model: Correcting the Linear Model’s Intrinsic Problems
Figure 24.3
Scatter diagram—Population density versus winning party with stretched S-shaped best fitting line
• When the population density is “small,” a change in the population density results in only a
small change in probability. When the population density is low initially we would expect the
probability of a Democratic win to be low already. Therefore any subsequent decrease in
population density must reduce the estimated probability only by a small amount; otherwise, a
nonsensical negative probability would result.
• When the population density is “large,” a change in the population density results in only a
small change in probability; for example, when the population density rises from 800 to 1,000,
the estimated probability of a Democratic win rises, but by only a little. When the population
density is high initially we would expect the probability of a Democratic win to be high already.
Therefore any subsequent increase in the population density must raise the estimated probability
only by a small amount; otherwise, a nonsensical probability exceeding 1 would result.
• When the population density is “moderate,” a change in the population density results in a
large change in probability; for example, the change in the estimated probability of a Democratic
win is large in the 500 to 700 range.
3. Another procedure that is used frequently is logit. While the probit and logit procedures do not produce identical
results, rarely do the results differ in any substantial way.
Table 24.3
Party affiliation—Simple probit example data
State   PopDen (x)   Winner   y (WinDem1)
1       100          Rep      0
2       450          Dem      1
3       550          Rep      0
4       900          Dem      1
Figure 24.4
Scatter diagram—Simplified example
To see how the probit estimation procedure works, consider a simplified example with only four
states. Let xt equal the population density, PopDent, and yt equal the dummy variable representing
the winning party, WinDem1t (table 24.3).
The first state votes Republican and has a population density of 100 persons per square mile.
The second state votes Democratic and has a population density of 450. The third state votes
Republican and has a population density of 550. The fourth state votes Democratic and has a
population density of 900. The scatter diagram appears in Figure 24.4:
Figure 24.5
Scatter diagram—Simplified example with stretched S-shaped best fitting line
The easiest way to understand how the probit model works is to use an example. Let us begin
by considering one possible transformation function: z = −2 + 0.004x. For the moment, do not
worry about why we are using this particular transformation function. We simply wish to show
how the probit estimation procedure constructs its stretched S-shaped lines.
Begin by calculating the probability that state 1 votes Democratic. Its population density, xt,
equals 100; we simply plug it into the transformation function:

z = −2 + 0.004x = −2 + 0.004(100) = −2 + 0.4 = −1.6
Next we turn to the normal distribution to calculate the probability of the Democrats and Repub-
licans winning in the state. We can use the Econometrics Lab to do so (figure 24.6).
For x = 100, z = −1.6; the normal distribution gives Est Prob[Dem] = Esty = 0.0548 and Est Prob[Rep] = 1 − Esty = 0.9452 (figure 24.6).
Figure 24.6
x equals 100
For state 2, x = 450 and z = −2 + 0.004(450) = −0.2; the normal distribution gives Est Prob[Dem] = Esty = 0.4207 and Est Prob[Rep] = 1 − Esty = 0.5793 (figure 24.7).
Figure 24.7
x equals 450
Table 24.4
Simple probit example probability calculations
State   Winning party   Actual y   x     z = −2 + 0.004x   Est Prob[Dem] (Esty)   Est Prob[Rep] (1 − Esty)   Prob of actual y
1       Rep             0          100   −1.6              0.0548                 0.9452                     0.9452
2       Dem             1          450   −0.2              0.4207                 0.5793                     0.5793
3       Rep             0          550   0.2               0.5793                 0.4207                     0.4207
4       Dem             1          900   1.6               0.9452                 0.0548                     0.9452
Figure 24.8
Scatter diagram—Population density versus winning party with stretched S-shaped best fitting line
The probability of the Democrats winning in state 2 equals 0.4207; the probability of the Repub-
licans winning equals 0.5793.
Table 24.4 reports the probabilities for the four states (plotted in figure 24.8). Based on the trans-
formation function z = −2 + 0.004x, the probability that the:
• first state votes Republican equals 0.9452.
• second state votes Democratic equals 0.4207.
• third state votes Republican equals 0.4207.
• fourth state votes Democratic equals 0.9452.
Now let us calculate the probability of the actual result, that is, the probability that the first
state votes Republican and the second state votes Democratic and the third state votes Republican
and the fourth state votes Democratic; this equals the product of the individual probabilities:
Prob[1st is Rep AND 2nd is Dem AND 3rd is Rep AND 4th is Dem]
= Prob[1st Rep] × Prob[2nd Dem] × Prob[3rd Rep] × Prob[4th Dem]
= 0.9452 × 0.4207 × 0.4207 × 0.9452 = 0.1582
How should we choose the transformation function, that is, the constant and coefficient in
z = constant + coefficient × x? The answer is that we choose the equation that maximizes the
likelihood of obtaining the actual result. That is, choose the equation that maximizes the likelihood
that the first state votes Republican, the second Democratic, the third Republican, and the fourth Democratic.
We already calculated this probability for the transformation function z = −2 + 0.004x; it
equals 0.1582. Obviously the search for the “best” transformation function could be very time-
consuming. Our Econometrics Lab can help, however.
We can specify different constants and coefficients by moving the slide bars (figure 24.9). The
simulation quickly calculates the probability of obtaining the actual results for the transformation
functions with different constants and coefficients.
Initially, a constant of −2.00 and a coefficient of 0.004 are selected, the values we used to
explain the construction of the stretched S-shaped line. Note that the probabilities we calculated
and those calculated by the lab are identical. But let us try some other constants and coefficients.
Table 24.5 reports on several different transformation functions:
It looks like a constant of −2.00 and a coefficient of 0.004 are the best. That is, the transforma-
tion function z = −2 + 0.004x maximizes the probability of obtaining the actual results. Fortu-
nately, computational techniques have been devised to estimate the best transformation functions
quickly. Statistical software uses such algorithms (table 24.6).
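To see how such an algorithm might work, here is a minimal sketch in Python (not the book's Econometrics Lab or EViews) that evaluates the probability of the actual result for the four-state example and searches a grid of constants and coefficients; the grid values are illustrative choices, not taken from the text.

import numpy as np
from scipy.stats import norm

x = np.array([100.0, 450.0, 550.0, 900.0])   # population densities of the four states
y = np.array([0, 1, 0, 1])                   # 1 = Democrats win, 0 = Republicans win

def prob_of_actual_result(constant, coefficient):
    z = constant + coefficient * x           # the transformation function
    p_dem = norm.cdf(z)                      # estimated Prob[Dem] from the normal distribution
    p_actual = np.where(y == 1, p_dem, 1 - p_dem)
    return p_actual.prod()                   # product of the four individual probabilities

print(prob_of_actual_result(-2.0, 0.004))    # approximately 0.1582, as calculated above

# Mimic the slide bars: try many (constant, coefficient) pairs and keep the best one.
best = max(
    ((c0, c1, prob_of_actual_result(c0, c1))
     for c0 in np.arange(-3.0, 0.01, 0.25)
     for c1 in np.arange(0.001, 0.0081, 0.0005)),
    key=lambda triple: triple[2],
)
print(best)                                  # on this grid the maximum is at (-2.00, 0.004)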
Constant: −2.00    Coefficient: 0.0040

State   Winner   y   PopDen   z       Prob(Dem)   Prob(Rep)   Prob(y)
1       Rep      0   100      −1.60   0.0548      0.9452      0.9452
2       Dem      1   450      −0.20   0.4207      0.5793      0.5793
3       Rep      0   550      0.20    0.5793      0.4207      0.4207
4       Dem      1   900      1.60    0.9452      0.0548      0.9452
Figure 24.9
Probit Calculation Lab
Table 24.5
Simple probit example—Econometric Lab calculations
Constant Coefficient Prob[1st Rep and 2nd Dem and 3rd Rep and 4th Dem]
Table 24.6
Probit results—Simple probit example
Probit
Dependent variable: y
Explanatory variable(s): Estimate SE z-Statistic Prob
In EViews, after selecting the variables for the regression, y and x, and double clicking on one of the selected variables:
• In the Equation Estimation window, click on the Method dropdown list.
• Select BINARY.
• Click OK.
These results are consistent with our simulation results. The maximum likelihood transformation
is:
z = −2 + 0.004x
Now we will apply the Probit estimation procedure to the 2004 election results using EViews
(table 24.7). Using the probit estimates we can calculate the probabilities for a few selected
states (table 24.8).
Table 24.7
Probit results—2004 presidential election
Probit
Table 24.8
Probit estimates—2004 presidential election
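The same kind of estimates can be produced outside EViews; a minimal sketch using Python's statsmodels, assuming the 2004 data are stored in a hypothetical file electoral2004.csv with columns PopDen and WinDem1:

import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("electoral2004.csv")      # hypothetical file: one row per state
y = data["WinDem1"]                          # 1 if the Democrats won the state, 0 otherwise
X = sm.add_constant(data["PopDen"])          # constant term plus population density

probit_results = sm.Probit(y, X).fit()
print(probit_results.summary())              # estimates, standard errors, z-statistics

# Estimated probability of a Democratic win for each state
print(probit_results.predict(X))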
We will now consider a second example in which the ordinary least squares (OLS) estimation
procedure falters. The problem arises whenever the dependent variable cannot take on a value
that is greater than or less than a specific value. Our example considers the salaries of Major
League Baseball hitters at the beginning of the 2011 season.
Project: Assess the impact of a hitter's previous season's performance on his salary.
Salaries and hitting performance of Major League hitters: Cross-sectional 2011 salary and 2010
performance data for all hitters on Major League rosters at the opening of the 2011 season.
First examine the salaries of the twenty-five highest paid hitters (table 24.9). Next let us
examine the salaries of the twenty-five lowest paid hitters (table 24.10). As table 24.10
shows, no player earns less than $414,000. This is dictated by the collective bargaining
agreement negotiated by Major League Baseball and the Major League Baseball Player’s Union.
The minimum salary a team could pay to a Major League player in 2011 was $414,000. Con-
sequently player salaries are said to be truncated or censored; their value cannot fall below
$414,000.
We will now investigate the theory that a hitter’s performance in 2010 affects his salary in
2011.4 On the one hand, if a hitter does well in 2010, he will be able to negotiate a high salary
in 2011; on the other hand, if his 2010 performance was poor, a low salary would result. More
specifically, we will focus on the effect that on base percentage has on salary:
Theory: An increase in a hitter’s on base percentage in 2010 increases his 2011 salary.
Table 24.9
Salaries of the twenty-five highest paid MLB hitters in 2011
We begin by considering a simple linear model relating on base percentage and salary:

Salaryt = βConst + βOnBasePct OnBasePctt + et

where Salaryt equals hitter t's 2011 salary (in thousands of dollars) and OnBasePctt equals hitter t's 2010 on base percentage.
Table 24.10
Salaries of the twenty-five lowest paid MLB hitters in 2011
The on base percentage of all but two of these players falls below the average on base percentage of all MLB
hitters, 0.323. Consequently it is reasonable to believe that without the minimum salary imposed
by the collective bargaining agreement the salaries of most of these players would fall below
$414,000. In other words, the collective bargaining agreement is truncating salaries. To under-
stand why truncated variables create problems for the ordinary least squares (OLS) estimation
procedure, we begin by estimating the model's parameters using the ordinary least squares (OLS)
estimation procedure (table 24.11).
Figure 24.10
Scatter diagram—On base percentage versus salary
Table 24.11
OLS regression results—MLB salaries
(Figure legend: best fitting line without "truncated" observations; vertical axis: salary in $1,000; horizontal axis: on base percentage.)
Figure 24.11
Scatter diagram—On base percentage versus salary and best fitting lines
The ordinary least squares (OLS) estimates suggest that a 0.001 point increase in on base per-
centage increases salary by $37,960. This in fact understates the actual effect of on base percent-
age, that is, how salaries would be affected by on base percentage if they were not artificially
constrained by the $414,000 minimum salary (figure 24.11). Without baseball's minimum wage,
it is reasonable to believe that the "truncated" points would be lower and, consequently, the best
fitting line would be steeper.
The Tobit estimation procedure accounts for the truncation of the dependent variable. It takes
advantage of all the available information, but treats “truncated” observations differently than
the “nontruncated” ones (table 24.12). While we will not delve into the mathematics underlying
the Tobit estimation procedure, we will show that software packages allow us to apply the pro-
cedure easily.
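Although we do not derive it here, a brief sketch of what the software maximizes may help. Under the usual assumption that the error term is normally distributed with standard deviation σ, the log likelihood for salaries left-censored at 414 (that is, $414,000 measured in thousands of dollars) is

log L = Σ over uncensored observations (Salaryt > 414) of log[(1/σ) φ((Salaryt − βConst − βOnBasePct OnBasePctt)/σ)]
      + Σ over censored observations (Salaryt = 414) of log[Φ((414 − βConst − βOnBasePct OnBasePctt)/σ)]

where φ and Φ denote the standard normal density and cumulative distribution functions. Uncensored observations contribute ordinary regression density terms; censored observations contribute only the probability of falling at or below the minimum salary.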
Table 24.12
Tobit results—MLB salaries
Tobit
Table 24.13
Comparison of ordinary least squares (OLS) and tobit results
Estimation procedure   OnBasePct coefficient estimate
OLS                    37,960
Tobit                  47,800
• As usual, select the dependent and explanatory variables and then double click on one of the
selected variables.
• In the Equation Estimation window, click on the Method dropdown list;
• Select CENSORED.
• By default, the left (lower) censoring value is 0. This value should be changed to 414, the
minimum wage for Major League players measured in thousands of dollars.
• Click OK.
Now let us compare the two estimates (table 24.13). The Tobit estimate of the OnBasePct coef-
ficient is 47,800 as opposed to the ordinary least squares (OLS) estimate of 37,960. This is
consistent with the scatter diagram appearing in figure 24.11.
Chapter 24 Exercises
2004 Electoral College data: Cross-sectional data from the 2004 presidential election for the
fifty states.
1. Consider the following model explaining the winning party in each state:
a. Develop a theory regarding how population density influences the probability of a Repub-
lican victory. What does your theory imply about the sign of the population density
coefficient?
b. Using the ordinary least squares (OLS) estimation procedure, estimate the value of the
population density coefficient using the 2004 Electoral College data. Interpret the coefficient
estimates. What is the critical result?
e. Compare the ordinary least squares (OLS) equation we computed earlier in the chapter
estimating WinDem1 with the equation you just computed estimating WinRep1. Are these
two equations consistent? That is, do the two equations suggest that the effect of population
density on the winning party in a state is the same? Explain.
2. Use the probit estimation procedure to analyze the effect that PopDen has on WinRep1.
Compare the probit estimates we computed earlier in the chapter estimating WinDem1 with the
probit estimates you just computed estimating WinRep1. Are the estimates consistent?
2008 Electoral College data: Cross-sectional data from the 2008 presidential election for the
fifty states.
3. Use the probit estimation procedure to analyze the effect of population density in the 2008
presidential election. For the moment, assume that population density is the only explanatory
variable affecting the election results.
a. Use the probit estimation procedure to find the maximum likelihood transformation. What
is the critical result?
5. Use the probit estimation procedure to analyze the effect of both the population density and
the unemployment trend in the 2008 presidential election in a single model.
a. Use the probit estimation procedure to find the maximum likelihood transformation. What
is the critical result?
Compare your probit estimates in exercises 3 and 4 with the ones you just computed.
b. Are the critical results the same?
c. How do the estimate values and their significances differ?
Degree day and temperature data for Charlestown, SC: Daily time series data of degree days
and high temperatures for Charlestown, SC, in 2001.
Heating degree days only consider those days in which heat is required; that is, when tempera-
tures are high and cooling rather than heating is required, the value of heating degree days is 0.
Consequently heating degree days is a truncated or censored variable—truncated at 0.
Consider the following model:
6. Devise a theory to explain the number of degree days based on the high temperature. What
does your theory suggest about the sign of the coefficient for HighTemp?
7. Construct a scatter diagram to illustrate the relationship between HighTemp and HeatDe-
gDays by plotting HighTemp on the horizontal axis and HeatDegDays on the vertical axis. Does
the scatter diagram lend support to your theory?
8. Use the ordinary least squares (OLS) estimation procedure to analyze the effect of the high
temperature on heating degree days.
a. Estimate the value of the high temperature coefficient. Interpret the coefficient estimate.
What is the critical result?
b. Formulate the null and alternative hypotheses.
Chapter 25 Outline
25.7 Correlation
25.7.1 Correlated Events
25.7.2 Correlated Random Variables and Covariance
25.8 Independence
25.8.1 Independent Events
25.8.2 Independent Random Variables and Covariance
1. Consider a deck of cards that contains only 3 red cards and 2 black cards.
a. Draw one card from the deck. What is the probability that the card drawn is
i. red?
ii. black?
b. Do not replace the first card drawn. Draw a second card from the deck. If the first card
drawn is red, what is the probability that the second card drawn is
i. red?
ii. black?
c. If the first card drawn is black, what is the probability that the second card drawn is
i. red?
ii. black?
2. Monty Hall was the host of the popular TV game show “Let’s Make a Deal.” Use our lab to
familiarize yourself with the game.
Click Play and follow the instructions. Play the game a dozen times or so. How frequently did
you win?
In chapter 1 we introduced the mean as a measure of the distribution center. While the mean is
the most commonly cited measure of the center, two others are also useful: mode and median.
We will now introduce them by considering the distribution of American family sizes. Table
25.1 provides data for the sizes of American families in 2008. We begin by constructing a his-
togram to illustrate the data visually (figure 25.1).
In total, there were approximately 224,703,000 adult Americans in 2008. The mean family
size for American adults in 2008 was 2.76. Now we will introduce two other measures of the
distribution center, the mode and the median.
Figure 25.1
Histogram of family size—2008
Table 25.1
Family sizes—2008
Family size (persons)   Number of adults (thousands)   Percent of adults (%)
1 51,565 22.9
2 67,347 30.0
3 39,432 17.5
4 36,376 16.2
5 18,074 8.0
6 7,198 3.2
7 2,784 1.2
8 1,028 0.5
9 or more 899 0.4
Source: US Census Bureau Current Population Survey, Annual Social and Economic Supplement, 2008
The mode is the most frequently occurring data value. It is easy to determine the mode from
the histogram; the mode corresponds to the highest bar. In our case, the mode is 2 persons. As
table 25.1 reports, 30.0 percent of all American adults were members of two person families.
The second most frequent was one person families; 22.9 percent of all American adults were
members of one person families. If you chose one American adult at random, he/she would be
more likely to be a member of a two person family than any other family size. Be aware, however,
that while a family size of two would be more likely than any other family size, the probability
that a randomly selected adult would be a member of a two person family would be less than
one-half, only 0.30.
The median is the data value that divides the distribution in the middle, that is, into two “equal”
parts. One way to think of the median is to imagine that all 224,703,000 American adults are
lined up in order of increasing family size. The median family size is the family size of the
112,351,500th American in the line, the American in the middle (figure 25.2).
In 2008 the median family size was 2. At least half (22.9 + 30.0 = 52.9 percent) of all American
adults were members of families including 2 or fewer persons and at least half (30.0 + 17.5 +
16.2 + 13.3 = 77.0 percent) were members of families including 2 or more persons.
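The three measures of the distribution center can also be computed directly from table 25.1; a minimal sketch in Python (treating the "9 or more" category as exactly 9 persons, so the mean is only approximate):

sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
adults = [51565, 67347, 39432, 36376, 18074, 7198, 2784, 1028, 899]   # thousands of adults

total = sum(adults)
mean = sum(size * count for size, count in zip(sizes, adults)) / total
mode = sizes[adults.index(max(adults))]

# Median: the family size of the adult standing in the middle of the line.
cumulative, median = 0, None
for size, count in zip(sizes, adults):
    cumulative += count
    if cumulative >= total / 2:
        median = size
        break

print(round(mean, 2), mode, median)    # approximately 2.76, 2, and 2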
Figure 25.2
American adults lined up in order of family size
Table 25.2
Preferred level of aid assumption
Family size (persons)   Preferred level of aid   Percent of adults (%)
1                       $2,000                   22.9
2                       4,000                    30.0
3                       6,000                    17.5
4                       8,000                    16.2
5                       10,000                   8.0
6                       12,000                   3.2
7                       14,000                   1.2
8                       16,000                   0.5
9 or more               18,000                   0.4
The median voter theorem provides one example of how important the median can be. To
appreciate why, suppose that each family’s preferred level of federal aid for education depends
on its size. To make this illustration more concrete, assume that table 25.2 reports on the preferred
level of Federal aid for each family size. While these preferred aid numbers are hypothetical,
they do attempt to capture one realistic feature of family preferences. That is, as a family has
more children, it typically supports more aid for education because the family will gain more
benefits from that aid.
The median voter theorem states that in a majority rule voting process, the preferences of the
median voter, the voter in the middle, will win whenever the median’s preference is pitted against
any other alternative. In this case, the preferences of the 2 person family, the median, will win.
The preferred aid level of the median voter, $4,000, will win. To understand why, we will con-
sider two elections, one in which $4,000 is pitted against a proposal greater than $4,000 and a
second in which $4,000 is pitted against a proposal that is less than $4,000.
• $4,000 versus a proposal greater than $4,000: Suppose that the median voter’s choice,
$4,000, is pitted against a proposal that is greater than $4,000. Clearly, all adult members of 2
person families will vote for $4,000 since $4,000 is their preferred choice. Although $4,000 is
not the preferred choice of 1 person families, $4,000 is closer to their preferred $2,000 choice
than a proposal that is greater than $4,000. Hence adult members of 1 person families will vote
for $4,000 also. Now let us count the votes. Adult members of 1 and 2 person families will vote
for $4,000 which constitutes a majority of the votes, 52.9 percent to be exact. $4,000 will defeat
any proposal that is greater than $4,000.
• $4,000 versus a proposal less than $4,000: Suppose that the median voter’s choice, $4,000,
is pitted against a proposal that is less than $4,000. As before, all adult members of 2 person
families will vote for $4,000 since $4,000 is their preferred choice. Although $4,000 is not the
preferred choice of 3, 4, 5, 6, 7, 8, and 9 or more person families, $4,000 is closer to their
preferred choice than a proposal that is less than $4,000. Hence adult members of these families
will vote for $4,000 also. Adult members of 2, 3, 4, 5, 6, 7, 8, and 9 or more person families
will vote for $4,000, which constitutes a majority of the votes, 77.0 percent to be exact. $4,000
will defeat any proposal that is less than $4,000.
The median family’s preferred level of aid, $4,000, will defeat any proposal that is greater than
$4,000 and any proposal that is less.
Recall that the mean is the average. The mean describes the average characteristic of the popula-
tion. For example, per capita income describes the income earned by individuals on average;
batting average describes the hits per official at bat a baseball player gets. In our family size
example, the mean equals 2.76. On average, the typical American adult resides in a family of
2.76 persons.
For our family size example, the median, 2, was less than the mean, 2.76. To understand why
this occurs look at the histogram again. Its right-hand tail is longer than its left-hand tail. When
we calculate the median, we find that a family of 4 persons has the same impact as a family of
9. If suddenly quintuples were born to a family of 4, making it a family of 9, the median would
not be affected. However, the mean would be affected. With the birth of the quintuples, the mean
would rise. Consequently, since the right-hand tail of the distribution is longer than the left-hand
tail, the mean is greater than the median because the high values have a greater impact on the
mean than they do on the median.
Event trees are simple but useful tools we can employ to calculate probabilities. We will use
the following experiment to introduce event trees: draw one card from a standard deck of 52 playing cards.
An event tree visually illustrates the mutually exclusive outcomes (events) of a random
process. In figure 25.3 there are two such outcomes: either the card is red or it is black. The
circle represents the event of the random process, the card draw. There are two branches from
the circle: one representing a red card and one a black card. The ends of the two branches rep-
resent mutually exclusive events—two events that cannot occur simultaneously. A card cannot
Figure 25.3
Card draw event tree for one draw
be both red and black. The event tree reports the probabilities of a red or black card. The “stan-
dard” deck of cards contains 13 spades, 13 hearts, 13 diamonds, and 13 clubs. 26 of the 52 cards
are red, the hearts and diamonds; 26 of the 52 cards are black, the spades and clubs.
• What are the chances the card drawn will be red? Since 26 of the 52 cards are red, there are
26 chances in 52 or 1 chance in 2 that the card will be red. The probability that the card will be
red equals 26/52 or 1/2.
• What are the chances the card drawn will be black? Similarly, since 26 of the 52 cards are
black, there are 26 chances in 52 or 1 chance in 2 that the card will be black. The probability
that the card will be black is 26/52 or 1/2.
The probability is 1/2 that we will move along the red branch and 1/2 that we will move along
the black branch.
There are two important features of this event tree that are worth noting. First, we can only
wind up at the end of one of the branches because the card drawn cannot be both red and black;
stated more formally, red and black are mutually exclusive events. Second, we must wind up at
the end of one branch because the card drawn must be either red or black. This means that the
probabilities of the branch ends must sum to 1.0. We have now introduced the general charac-
teristics of event trees:
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.0.
Figure 25.4
Card Draw simulation
In the simulation we select the number of red cards and the number of black cards to include
in our deck. We can also specify the number of cards to be drawn from the deck. In this case,
we include 26 red cards and 26 black cards in the deck; we draw one card from the deck. The
relative frequencies of red and black cards are reported. Since the Pause checkbox is checked,
the simulation will pause after each repetition. Click Start. Was a red or black card drawn? Now
the Start button becomes the Continue button. Click Continue to run the second repetition. Is
the simulation calculating the relative frequency of red and black cards correctly? Click the
Continue button a few more times to convince yourself that the simulation is calculating relative
frequencies correctly. Now uncheck the Pause checkbox and click Continue. After many, many
repetitions, click Stop. Observe that the relative frequencies of red and black cards will be about
0.500, equal to the probabilities. The simulation illustrates the relative frequency interpretation
of probability (figure 25.4).
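The same relative frequency exercise can be run with a few lines of code; a minimal sketch in Python (not the Econometrics Lab itself):

import random

deck = ["red"] * 26 + ["black"] * 26
repetitions = 100000
reds = sum(1 for _ in range(repetitions) if random.choice(deck) == "red")
print(reds / repetitions)              # about 0.500, the probability of drawing a red card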
Figure 25.5
Card draw event tree for one draw
Since 3 of the 5 cards are red, the probability of drawing a red card is 3/5; since 2 of the 5 cards
are black, the probability of drawing a black card is 2/5. Like our first card draw, the new event
tree possesses two properties (figure 25.5):
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.0.
Econometrics Lab 25.2: Card Draw Simulation—Draw One Card from a Deck of 3 Red Cards
and 2 Black Cards
Again, we will use a simulation of this experiment to illustrate the relative frequency notion of
probability.
By default, the deck includes 3 red cards and 2 black cards. Click Start and then after many,
many repetitions click Stop. The relative frequency of red cards is about 0.600 and the relative
frequency of black card is about 0.400. Once again, see that the probabilities we calculated equal
the relative frequencies when the experiment is repeated many times (figure 25.6).
Figure 25.6
Card Draw simulation
Experiment 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards without Replacement
Thus far we have been only drawing one card from the deck. Now consider an experiment in
which we draw two cards from our small deck.
The event tree now looks a little more complicated because it must illustrate both the first and
second draws (figure 25.7).
Since the first card drawn is not replaced, the probabilities of obtaining a red and black card
on the second draw depend on whether a red or black card was drawn on the first draw. If a red
card is drawn on the first draw, two red cards and two black cards remain in the deck; conse-
quently on the second draw
• there are two chances in four of drawing a red card. The probability of drawing a red card is
1/2:
Prob[2nd is red IF 1st is red] = 1/2 = 0.50
Figure 25.7
Card draw event tree for two draws
• there are two chances in four of drawing a black card. The probability of drawing a black card
is 1/2:

Prob[2nd is black IF 1st is red] = 1/2 = 0.50
If a black card is drawn on the first draw, 3 red cards and 1 black card remain in the deck;
consequently on the second draw
• there are three chances in four of drawing a red card. The probability of drawing a red card
is 3/4:
Prob[2nd is red IF 1st is black] = 3/4 = 0.75
• there is one chance in four of drawing a black card. The probability of drawing a black card
is 1/4:

Prob[2nd is black IF 1st is black] = 1/4 = 0.25
After the two draws are complete, there are four possible outcomes (events) as indicated by
the end of each event tree branch:
• A red card in the first draw and a red card in the second draw.
• A red card in the first draw and a black card in the second draw.
• A black card in the first draw and a red card in the second draw.
• A black card in the first draw and a black card in the second draw.
These four outcomes (events) are mutually exclusive. The probability of winding up at the end
of a branch equals the product of the probabilities of each limb of the branch.
For example, consider Prob[1st is red AND 2nd is red] by focusing on the top branch of the
event tree:
Prob[1st is red AND 2nd is red]
= Prob[1st is red] × Prob[2nd is red IF 1st is red]
= 3/5 × 1/2 = 3/10 = 0.30
As figure 25.7 indicates, when the first card is drawn there are 3 chances in 5 that we will move
along the Draw 1 red limb; the probability of drawing a red card on the first draw is 3/5. Since
the first card drawn is not replaced, only 4 cards now remain, 2 of which are red. So there is 1
chance in 2 that we will continue along Draw 2’s red limb; if the first card drawn
is a red card, the probability of drawing a red card on the second draw is 1/2. We will use the
relative frequency interpretation of probability to confirm that the probability of a red card on
the first draw and a red card on the second draw equals the product of these two probabilities.
After many, many repetitions of the experiment:
• In the first draw, a red card will be drawn in 3/5 of the repetitions.
• For these repetitions, the repetitions in which a red card is drawn, a red card will be drawn in
1/2 of the second draws.
• Overall, a red card will be drawn in the first and second draws in 3/5 × 1/2 = 3/10 = 0.30 of the
repetitions.
Next consider Prob[1st is red AND 2nd is black] by focusing on the second branch from the
top.
The probability of a red card in the first draw is 3/5. Of the 4 cards now remaining, 2 are black.
Therefore, the probability of a black card in the second draw is 1/2. The probability of a red
card on the first draw and a black card on the second is the product of the two probabilities.
Using the same logic, we can calculate the probability of winding up at the end of the other
two event tree branches:
Prob[1st is black AND 2nd is red]
= Prob[1st is black] × Prob[2nd is red IF 1st is black]
= 2/5 × 3/4 = 3/10 = 0.30
and
Prob[1st is black AND 2nd is black]
= Prob[1st is black] × Prob[2nd is black IF 1st is black]
= 2/5 × 1/4 = 1/10 = 0.10
Once again, note that our new event tree exhibits the general event tree properties:
• We cannot wind up at the end of more than one event tree branch; consequently the ends of
each event tree branch represent mutually exclusive events.
• We must wind up at the end of one event tree branch; consequently the sum of the probabilities
of the event tree branches equals 1.
Econometrics Lab 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards
and 2 Black Cards without Replacement
We can use our Card Draw simulation to illustrate the relative frequency interpretation of
probability.
The relative frequency of each outcome mirrors the probability of that outcome.
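A minimal sketch in Python of the same experiment (two draws without replacement from a deck of 3 red and 2 black cards), tallying the relative frequency of each branch-end outcome:

import random
from collections import Counter

repetitions = 100000
outcomes = Counter()
for _ in range(repetitions):
    deck = ["red"] * 3 + ["black"] * 2
    random.shuffle(deck)
    first, second = deck[0], deck[1]   # the two cards drawn, without replacement
    outcomes[(first, second)] += 1

for outcome, count in sorted(outcomes.items()):
    print(outcome, count / repetitions)
# relative frequencies near 0.30 (red, red), 0.30 (red, black), 0.30 (black, red), and 0.10 (black, black)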
Event trees facilitate the calculation of the probability of a combination of different outcomes. Since the ends of
each event tree branch represent mutually exclusive events, we add the probabilities of the relevant
outcomes. For example, suppose that we want to calculate the probability that a black card is
drawn on the second draw. As the event tree in figure 25.7 illustrates, there are two different
ways for this event to occur: the first card is red and the second is black, or the first card is black and the second is black.
Focusing on the second and fourth event tree branches from the top:
Prob[2nd is black]
= Prob[1st is red AND 2nd is black] + Prob[1st is black AND 2nd is black]
= 0.30 + 0.10
= 0.40
The probability of drawing a black card on the second draw equals 0.40.
Similarly we can calculate the probability that the second card drawn is red by focusing on
the first and third event tree branches from the top:
Prob[2nd is red]
= Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is red]
= 0.30 + 0.30
= 0.60
The probability of drawing a red card on the second draw equals 0.60.
Similarly suppose that we wish to know the probability of drawing two red cards or two
black cards:

Prob[2 reds OR 2 blacks] = Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is black] = 0.30 + 0.10 = 0.40

The probability of drawing two red cards or two black cards equals 0.40. We simply sum the
probabilities of the appropriate branch ends. We have now encountered three different types of probabilities:
• Conditional probability: The probability that an event will occur if another event occurs.
• Joint probability: The probability that two events will occur together.
• Nonconditional probability: The probability that an event will occur without any informa-
tion about other events.
To understand the distinction better, consider our last experiment and two possible events: the first card drawn is black and the second card drawn is red.
The probability that the second card will be red if the first card is black, Prob[2nd is red IF 1st
is black], is a conditional probability. The probability that first card is black and the second card
is red, Prob[1st is black AND 2nd is red], is a joint probability. The probability of drawing a
red card on the second draw without any additional information, Prob[2nd is red], is a noncon-
ditional probability.
We have already computed these probabilities by using the event tree appearing in figure 25.7:

Conditional probability: Prob[2nd is red IF 1st is black] = 3/4 = 0.75
Joint probability: Prob[1st is black AND 2nd is red] = 3/10 = 0.30
Nonconditional probability: Prob[2nd is red] = 3/5 = 0.60
Event trees are useful because they facilitate the calculation of all three types of probabilities.
First, event trees report the conditional probabilities and then the joint probabilities. Then, since
the ends of the branches represent mutually exclusive events, we can compute the nonconditional
probabilities by summing the joint probabilities. Table 25.3 summarizes all three types of
probabilities.
The joint, conditional, and nonconditional probabilities are related. We have in fact already used
this relationship to calculate the probabilities:
Table 25.3
Conditional, joint, and nonconditional probabilities without replacement
Conditional probability               Joint probability                       Nonconditional probability
Prob[2nd R IF 1st R] = 1/2 = 0.50     Prob[1st R AND 2nd R] = 3/10 = 0.30     Prob[1st R] = 3/5 = 0.60
Prob[2nd B IF 1st R] = 1/2 = 0.50     Prob[1st R AND 2nd B] = 3/10 = 0.30     Prob[2nd R] = 3/5 = 0.60
Prob[2nd R IF 1st B] = 3/4 = 0.75     Prob[1st B AND 2nd R] = 3/10 = 0.30     Prob[1st B] = 2/5 = 0.40
Prob[2nd B IF 1st B] = 1/4 = 0.25     Prob[1st B AND 2nd B] = 1/10 = 0.10     Prob[2nd B] = 2/5 = 0.40
Prob[1st is black AND 2nd is red] = Prob[1st is black] × Prob[2nd is red IF 1st is black]
= 2/5 × 3/4 = 3/10 = 0.30
The probability of drawing a black card on the first draw and a red card on the second draw
equals the probability of drawing a black card on the first draw times the probability of drawing
a red card on the second draw if a black card is drawn first.
We can generalize this relationship between joint and conditional probabilities by specifying
events A and B as follows:
Prob[1st is black AND 2nd is red] = Prob[1st is black] × Prob[2nd is red IF 1st is black]
↓
Prob[A and B] = Prob[A] × Prob[B IF A]
1. Frequently the symbols ∪, ∩, and | are used instead of the words OR, AND, and IF.
To illustrate the value of event trees and the conditional/joint probability relationship, consider
a mathematical controversy that erupted in the popular press during 1990. The controversy
involved the game show “Let’s Make a Deal.” On the show, a contestant is presented with three
closed doors, numbered 1, 2, and 3. One of the doors has a valuable prize behind it. A “dud” is
behind the other two doors. The real prize has been randomly placed behind one of the three
doors. Monty Hall, the emcee, knows where the prize is located. Monty asks the contestant to
choose one of the three doors after which he opens one of the three doors. In deciding which
door to open, Monty adheres to two rules: he never opens the door that the contestant chose, and
he never opens the door that hides the prize. Consequently Monty always opens one of the doors containing a dud. He then gives the contestant the oppor-
tunity to change his/her mind and switch doors. Should the contestant stay with the door he/she
chose initially or should the contestant switch?
In September 1990 Marilyn vos Savant, a columnist for Parade Magazine, wrote about the
contestant’s choice. She claimed that the contestant should always switch. This created a fire-
storm of ridicule from academic mathematicians, some of whom were on the faculty of this
country’s most prestigious institutions. The New York Times even reported the controversy on
the front page of its Sunday, July 21, 1991, edition, stating that several thousand letters criticized
Ms. vos Savant’s advice.2 Two typical responses were:
• Robert Sachs, George Mason University: “You blew it! Let me explain: If one door is shown
to be a loser, that information changes the probability of either remaining choice—neither of
which has any reason to be more likely—to 1/2. As a professional mathematician, I’m very
concerned with the general public’s lack of mathematical skills. Please help by confessing your
error and, in the future, being more careful.”
• E. Ray Bobo, Georgetown University: “How many irate mathematicians are needed to get you
to change your mind?”
Much to the embarrassment of many mathematicians, Ms. vos Savant’s advice was eventually proved
correct. One of them, Dr. Sachs, had the grace to apologize:
I wrote her another letter, telling her that after removing my foot from my mouth, I’m now eating humble
pie. I vowed as penance to answer all the people who wrote to castigate me. It’s been an intense profes-
sional embarrassment.
This incident teaches us a valuable lesson. As Persi Diaconis, a mathematician from Harvard
University, stated: “Our brains are just not wired to do probability problems very well. . . .” That
2. John Tierney, The New York Times, Sunday, July 21, 1991, pp. 1 and 20.
is why event trees are so useful. We will now use event trees to analyze the Monty Hall problem
and show how many mathematicians would have avoided embarrassment had they applied this
simple, yet powerful tool.
Suppose that you are a contestant appearing on the “Let’s Make a Deal” stage. The prize has
already been placed behind one of the doors. There is an equal chance of the prize being behind
each door. The probability that it is behind any one door is one out of three, 1/3. We begin by
drawing the event tree that appears in figure 25.8.
You now choose one of the doors. Suppose you choose door 3. Recall Monty’s
rules: he never opens the door that you chose, and he never opens the door that hides the prize.
Since you chose door 3, we do not have to worry about Monty opening door 3 as a consequence
of Monty’s first rule. Monty will now open either door 1 or door 2. Keeping in mind Monty’s
second rule:
Figure 25.8
Event tree before you choose a door
• If the prize is behind door 1, he must open door 2 (he cannot open door 3, your choice, or door 1, the prize door).
• If the prize is behind door 2, he must open door 1.
• If the prize is behind door 3, he will randomly choose to open either door 1 or door 2; the
chances are 50-50 he will open door 1 and 50-50 he will open door 2:
Prob[Monty opens door 1 IF prize behind door 3] = 1/2
Prob[Monty opens door 2 IF prize behind door 3] = 1/2
Figure 25.9 extends the event tree we drew in figure 25.8 to account for the fact that you
chose door 3. Let us explain how we extended the top branch of the event tree. As shown in
both figures 25.8 and 25.9, the probability that the prize is behind door 1 is 1/3. Now, if the
prize is behind door 1, the probability that Monty will open door 1 is 0, and hence the probability
that he will open door 2 is 1 as indicated in figure 25.9. Using similar logic, we can now extend
the other branches.
Before opening a door Monty pauses for a commercial break so you have time to consider
your strategy. Using the event tree, it is easy to calculate the joint probabilities. From top to
bottom of figure 25.9:
Figure 25.9
Event tree after you choose door 3
Prob[Monty opens door 1 AND Prize behind door 1] = 1/3 × 0 = 0
Prob[Monty opens door 2 AND Prize behind door 1] = 1/3 × 1 = 1/3
Prob[Monty opens door 1 AND Prize behind door 2] = 1/3 × 1 = 1/3
Prob[Monty opens door 2 AND Prize behind door 2] = 1/3 × 0 = 0
Prob[Monty opens door 1 AND Prize behind door 3] = 1/3 × 1/2 = 1/6
Prob[Monty opens door 2 AND Prize behind door 3] = 1/3 × 1/2 = 1/6
Also the event tree allows us to calculate the nonconditional probabilities of which door Monty
will open. Since you chose door 3, Monty will open either door 1 or door 2:
• Prob[Monty opens door 1]: Counting from the top of figure 25.9, focus on the ends of
branches 1, 3, and 5:

Prob[Monty opens door 1] = 0 + 1/3 + 1/6 = 1/2

• Prob[Monty opens door 2]: Counting from the top of figure 25.9, focus on the ends of
branches 2, 4, and 6:

Prob[Monty opens door 2] = 1/3 + 0 + 1/6 = 1/2
Note that these two nonconditional probabilities sum to 1 because we know that Monty always
opens one of the two doors that you did not choose.
Now we are in a position to give you some advice. We know that Monty will open door 1 or
door 2 as soon as the commercial ends. First, consider the possibility that Monty opens door 1.
If so, door 1 will contain a dud. In this case the prize is either behind door 2 or door 3. We can
calculate the probability that the prize is behind door 2 and the probability that the prize is
behind door 3 if Monty were to open door 1 by applying the conditional/joint probability
relationship:
• Prob[Prize behind door 2 IF Monty opens door 1]: Begin with the conditional/joint probability
relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 1 for A and Prize behind door 2 for B gives

Prob[Prize behind door 2 IF Monty opens door 1] = Prob[Monty opens door 1 AND Prize behind door 2] / Prob[Monty opens door 1]

We have already calculated these probabilities with the help of the event tree:

Prob[Monty opens door 1 AND Prize behind door 2] = 1/3
Prob[Monty opens door 1] = 1/2

Now we plug in:

Prob[Prize behind door 2 IF Monty opens door 1] = (1/3)/(1/2) = 2/3
• Prob[Prize behind door 3 IF Monty opens door 1]: We use the same logic. Begin with the
conditional/joint probability relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 1 for A and Prize behind door 3 for B gives

Prob[Prize behind door 3 IF Monty opens door 1] = Prob[Monty opens door 1 AND Prize behind door 3] / Prob[Monty opens door 1]

From the event tree, Prob[Monty opens door 1 AND Prize behind door 3] = 1/6 and Prob[Monty opens door 1] = 1/2, so

Prob[Prize behind door 3 IF Monty opens door 1] = (1/6)/(1/2) = 1/3

If Monty opens door 1, the probability that the prize is behind door 2 is 2/3 while the probability
that the prize is behind door 3 is 1/3.
Therefore, if Monty opens door 1, you should switch from door 3 to door 2.
Next consider the possibility that Monty opens door 2. If so, door 2 will contain a dud. In
this case the prize is either behind door 1 or door 3. We can calculate the probability that the
prize is behind door 1 and the probability that the prize is behind door 3 if Monty were to open
door 2 by applying the conditional/joint probability relationship:
• Prob[Prize behind door 1 IF Monty opens door 2]: Begin with the conditional/joint probability
relationship

Prob[B IF A] = Prob[A AND B] / Prob[A]

Substituting Monty opens door 2 for A and Prize behind door 1 for B, and using Prob[Monty opens door 2 AND Prize behind door 1] = 1/3 and Prob[Monty opens door 2] = 1/2,

Prob[Prize behind door 1 IF Monty opens door 2] = (1/3)/(1/2) = 2/3

• Prob[Prize behind door 3 IF Monty opens door 2]: Using the same relationship with Prob[Monty opens door 2 AND Prize behind door 3] = 1/6,

Prob[Prize behind door 3 IF Monty opens door 2] = (1/6)/(1/2) = 1/3

If Monty opens door 2, the probability that the prize is behind door 1 is 2/3 while the probability
that the prize is behind door 3 is 1/3.
Therefore, if Monty opens door 2, you should switch from door 3 to door 1.
So let us summarize. If Monty opens door 1, you should switch. If Monty opens door 2, you
should switch. Regardless of which door Monty opens, you should switch doors. Ms. vos Savant
is correct and all her academic critics should be eating “humble pie.”
What is the intuition here? Before you make your initial choice, you know that the probability
that the prize lies behind door 3 equals 1/3. Furthermore you know that after you make your
choice, Monty will open neither the door you chose nor the door that contains the prize. There-
fore, when Monty actually opens a door, you will be given no additional information that is
relevant to door 3. Without any additional information about door 3, the opening of the door should not affect the
probability that the prize lies behind door 3. This is precisely what our calculations showed.
We will use a simulation to confirm our conclusion that switching is the better strategy.
Click Start Simulation and then after many, many repetitions click Stop. The simulation reports
the winning percentage for both the no switch and switch strategies. No switch winning fre-
quency equals 0.3333 . . . and the switch winning frequency equals 0.6666. . . . The results are
consistent with the probabilities that we just calculated.
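The same check can be coded directly; a minimal sketch in Python, assuming that you always choose door 3 initially:

import random

repetitions, stay_wins, switch_wins = 100000, 0, 0
for _ in range(repetitions):
    prize = random.choice([1, 2, 3])
    choice = 3
    # Monty opens a door that is neither your choice nor the prize door.
    monty = random.choice([door for door in (1, 2, 3) if door != choice and door != prize])
    switch = next(door for door in (1, 2, 3) if door != choice and door != monty)
    stay_wins += (choice == prize)
    switch_wins += (switch == prize)

print(stay_wins / repetitions, switch_wins / repetitions)   # about 1/3 and 2/3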
25.7 Correlation
We begin with correlated events presenting both verbal and rigorous definitions. Then, we extend
the notion of correlation to random variables.
Definition
Two events are correlated whenever the occurrence of one event helps us predict the other;
more specifically, two events are correlated when the occurrence of one event either increases
or decreases the probability of the other. Formally, events A and B are correlated whenever
Prob[B IF A] ≠ Prob[B]
The conditional probability of an event B does not equal its nonconditional probability. When
events A and B are correlated, event A is providing additional information that causes us to
modify our assessment of event B’s likelihood. To illustrate what we mean by correlation, review
experiment 25.3:
Experiment 25.3: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards without Replacement
It is easy to show that the two events, second card drawn is red and first card drawn is black in
experiment 25.3, are correlated. Refer to the event tree appearing in figure 25.7 to compute the
nonconditional probability that the second card drawn is red. Since the ends of each event tree
branch represent mutually exclusive events, we can calculate the combination of different out-
comes by adding the probabilities of ending up at the appropriate branches. As the event tree in
figure 25.7 illustrates, there are two different ways to draw a red card on the second draw:
the first card drawn is red and the second is red, or the first card drawn is black and the second is red.
Consequently the probability of drawing a red card on the second draw equals 0.60:

Prob[2nd is red] = Prob[1st is red AND 2nd is red] + Prob[1st is black AND 2nd is red]
= 0.30 + 0.30
= 0.60
Next recall the conditional probability of drawing a red card on the second draw if the first card
drawn is black. As illustrated in figure 25.7,

Prob[2nd is red IF 1st is black] = 3/4 = 0.75

Since 0.75 does not equal 0.60, the conditional probability does not equal the nonconditional
probability; the events are correlated. This is intuitive, is it not? If we know that we have drawn
a black card on the first draw and we do not replace it, there will be fewer black cards remaining
in the deck. Consequently we will be more likely to draw a red card on the second draw.
We can illustrate this with our example. We know that the events 1st is black and 2nd is red are
correlated. Let A denote the event that the first card drawn is black and B denote the event that
the second card drawn is red. Then Prob[A AND B] = 0.30 while Prob[A] × Prob[B] = 0.40 × 0.60 = 0.24;
when two events are correlated, their joint probability does not equal the product of their nonconditional probabilities.
We will now extend the notions of correlation to random variables. Continuing on with experi-
ment 25.3, let v1 equal the number of black cards drawn on the first draw and v2 equal the number
of black cards drawn on the second draw. v1 can take on two possible values, 0 and 1. Similarly
v2 can take on two possible values, 0 and 1.
Let us modify figure 25.7 by adding v1 and v2 to the event tree describing experiment 25.3 as
shown in figure 25.10. Using the event tree, we can calculate the conditional, joint, and noncon-
ditional probabilities for the random variables v1 and v2 (table 25.4). In the absence of
Figure 25.10
Event tree for two draws without replacement
Table 25.4
Conditional, joint, and nonconditional probabilities without replacement
Conditional probability                  Joint probability                          Nonconditional probability
Prob[v2 = 0 IF v1 = 0] = 1/2 = 0.50      Prob[v1 = 0 AND v2 = 0] = 3/10 = 0.30      Prob[v1 = 0] = 3/5 = 0.60
Prob[v2 = 1 IF v1 = 0] = 1/2 = 0.50      Prob[v1 = 0 AND v2 = 1] = 3/10 = 0.30      Prob[v2 = 0] = 3/5 = 0.60
Prob[v2 = 0 IF v1 = 1] = 3/4 = 0.75      Prob[v1 = 1 AND v2 = 0] = 3/10 = 0.30      Prob[v1 = 1] = 2/5 = 0.40
Prob[v2 = 1 IF v1 = 1] = 1/4 = 0.25      Prob[v1 = 1 AND v2 = 1] = 1/10 = 0.10      Prob[v2 = 1] = 2/5 = 0.40
information about v1, the nonconditional probabilities are relevant: the nonconditional probability
that v2 equals 0 is 0.60 and the nonconditional probability that v2 equals 1 is 0.40. On the one hand,
if we know that v1 equals 0, the probabilities change; the conditional probability that v2 equals 0
becomes 0.50 and the conditional probability that v2 equals 1 becomes 0.50. On the other hand, if
we know that v1 equals 1, the conditional probability that v2 equals 0 becomes 0.75 and the
conditional probability that v2 equals 1 becomes 0.25.

The random variables v1 and v2 are correlated. Knowing the value of v1 helps us predict the
value of v2 because the value of v1 affects v2’s probability distribution. In this case, v1 and v2
are negatively correlated; an increase in the value of v1 from 0 to 1 increases the likelihood that
v2 will be lower:

v1 = 0                                   v1 = 1
Prob[v2 = 0 IF v1 = 0] = 0.50            Prob[v2 = 0 IF v1 = 1] = 0.75
Prob[v2 = 1 IF v1 = 0] = 0.50            Prob[v2 = 1 IF v1 = 1] = 0.25
Now recall that covariance is a measure of correlation. If two variables are correlated, their
covariance does not equal 0. Let us now calculate the covariance of the random variables v1 and
v2 to illustrate this fact. The equation for the covariance is

Cov[v1, v2] = Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1 AND v2]

Since Mean[v1] = 0 × 0.60 + 1 × 0.40 = 0.40 and Mean[v2] = 0 × 0.60 + 1 × 0.40 = 0.40, using the joint probabilities in table 25.4,

Cov[v1, v2] = (0 − 0.4)(0 − 0.4)(0.30) + (0 − 0.4)(1 − 0.4)(0.30) + (1 − 0.4)(0 − 0.4)(0.30) + (1 − 0.4)(1 − 0.4)(0.10)
= 0.048 − 0.072 − 0.072 + 0.036 = −0.06
The covariance is negative because v1 and v2 are negatively correlated. An increase in v1
increases the probability that v2 will be lower.
25.8 Independence
As with correlation we begin with independent events presenting both verbal and rigorous defini-
tions. Then we extend the notion of independence to random variables.
Definition
Two events are independent (uncorrelated) whenever the occurrence of one event does not help
us predict the other. For example, the total points scored in the Super Bowl and the relative
humidity in Santiago, Chile, on Super Bowl Sunday are independent events. Knowing the value
of one would not help us predict the other. Two events are independent when the occurrence of
one event does not affect the likelihood that the other event will occur. Formally, event B is
independent of event A whenever
Prob[B IF A] = Prob[B]
The occurrence of event A does not affect the chances that event B will occur.
When event B is independent of event A, event A is also independent of event B; that is, independence is symmetric. To see why, recall the conditional/joint probability relationship:

Prob[B IF A] = Prob[A AND B] / Prob[A]

Since Prob[B IF A] = Prob[B], the joint probability equals the product of the nonconditional probabilities: Prob[A AND B] = Prob[A] × Prob[B]. Consequently

Prob[A IF B] = Prob[B AND A] / Prob[B]
= Prob[A AND B] / Prob[B]
= Prob[A] × Prob[B] / Prob[B]
= Prob[A]
Two random variables are independent if the probability distribution of each is unaffected by
the value of the other:

Prob[v2 IF v1] = Prob[v2] for all values of v1 and v2

And hence, after applying the logic we used with independent events, the joint probability equals
the product of the nonconditional probabilities:

Prob[v1 AND v2] = Prob[v1] × Prob[v2]
We can show that when two random variables are independent their covariance will equal 0:
Cov[v1, v2] = Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1 AND v2]

Since Prob[v1 AND v2] = Prob[v1] × Prob[v2],

= Σ_All v1 Σ_All v2 (v1 − Mean[v1])(v2 − Mean[v2]) Prob[v1] × Prob[v2]

Rearranging factors,

= Σ_All v1 (v1 − Mean[v1]) Prob[v1] × Σ_All v2 (v2 − Mean[v2]) Prob[v2]

Now consider the second factor:

Σ_All v2 (v2 − Mean[v2]) Prob[v2] = Σ_All v2 v2 Prob[v2] − Mean[v2] Σ_All v2 Prob[v2]
= Mean[v2] − Mean[v2] × 1
= Mean[v2] − Mean[v2]
= 0

Since the second factor equals 0, the covariance itself equals 0:

Cov[v1, v2] = 0
We have shown that when the random variables v1 and v2 are independent their covariance
equals 0.
To illustrate two independent random variables, let us modify experiment 25.3 by replacing
the card drawn after the first draw:
Experiment 25.4: Card Draw Simulation—Draw Two Cards from a Deck of 3 Red Cards and 2
Black Cards with Replacement
As before, let v1 equal the number of black cards drawn on the first draw and v2 equal the number
of black cards drawn on the second draw. v1 can take on two possible values, 0 and 1. Similarly
v2 can take on two possible values, 0 and 1.
Let us begin by constructing the event tree (figure 25.11). Using the event tree, we can cal-
culate the conditional, joint, and nonconditional probabilities for the random variables v1 and v2
(table 25.5). In the absence of information about v1, the nonconditional probabilities are relevant;
the nonconditional probability that v2 equals 0 is 0.60 and the nonconditional probability that v2
equals 1 is 0.40. But what happens if we know that v1 equals 0? The conditional probabilities tell
us that nothing changes: the probability that v2 equals 0 is still 0.60 and the probability that v2
equals 1 is still 0.40.
Figure 25.11
Event tree for two draws with replacement
Table 25.5
Conditional, joint, and nonconditional probabilities with replacement
Conditional probability                  Joint probability                          Nonconditional probability
Prob[v2 = 0 IF v1 = 0] = 3/5 = 0.60      Prob[v1 = 0 AND v2 = 0] = 9/25 = 0.36      Prob[v1 = 0] = 3/5 = 0.60
Prob[v2 = 1 IF v1 = 0] = 2/5 = 0.40      Prob[v1 = 0 AND v2 = 1] = 6/25 = 0.24      Prob[v2 = 0] = 3/5 = 0.60
Prob[v2 = 0 IF v1 = 1] = 3/5 = 0.60      Prob[v1 = 1 AND v2 = 0] = 6/25 = 0.24      Prob[v1 = 1] = 2/5 = 0.40
Prob[v2 = 1 IF v1 = 1] = 2/5 = 0.40      Prob[v1 = 1 AND v2 = 1] = 4/25 = 0.16      Prob[v2 = 1] = 2/5 = 0.40
Similarly, if we know that v1 equals 1, the probabilities are again unchanged: the probability that
v2 equals 0 is still 0.60 and the probability that v2 equals 1 is still 0.40. The random variables v1
and v2 are independent; knowing the value of v1 does not help us predict the value of v2.
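A minimal sketch in Python that computes the covariance directly from the two joint probability tables confirms both results:

def covariance(joint):
    # joint maps each (v1, v2) pair to Prob[v1 AND v2]
    mean1 = sum(v1 * p for (v1, _), p in joint.items())
    mean2 = sum(v2 * p for (_, v2), p in joint.items())
    return sum((v1 - mean1) * (v2 - mean2) * p for (v1, v2), p in joint.items())

without_replacement = {(0, 0): 0.30, (0, 1): 0.30, (1, 0): 0.30, (1, 1): 0.10}   # table 25.4
with_replacement = {(0, 0): 0.36, (0, 1): 0.24, (1, 0): 0.24, (1, 1): 0.16}      # table 25.5

print(covariance(without_replacement))   # approximately -0.06: negatively correlated
print(covariance(with_replacement))      # approximately 0: independent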
25.9.1 Correlation
Correlated Events
• Definition: Two events are correlated whenever the occurrence of one event helps us predict
the other; more specifically, whenever the occurrence of one event either increases or decreases
the probability of the other:
Prob[B IF A] ≠ Prob[B]
• Correlated events and joint probability: Two events are correlated whenever their joint
probability will not equal the product of the nonconditional probabilities:
Prob[B IF A] ≠ Prob[B]
↓
Prob[A AND B] ≠ Prob[A] × Prob[B]
• Correlated random variables and covariance: Two variables are correlated whenever their
covariance does not equal 0.
Cov[v1, v2] ≠ 0
25.9.2 Independence
Independent Events
• Definition: Two events are independent (uncorrelated) whenever the occurrence of one event
does not help us predict the other; more specifically, whenever the occurrence of one event does
not increase or decrease the probability of the other:
Prob[B IF A] = Prob[B]
• Independent events and joint probability: The joint probability of two independent events
equals the product of the nonconditional probabilities:
Prob[B IF A] = Prob[B]
↓
Prob[A AND B] = Prob[A] × Prob[B]
• Independent events and symmetry: When the probability of event B is unaffected by event
A, the probability of event A is unaffected by event B:
Prob[B IF A] = Prob[B]
↓
Prob[A IF B] = Prob[A]
• Independent random variables and covariance: Two variables are independent whenever
their covariance equals 0:
Cov[v1, v2] = 0
Integral calculus allows us to extend the equations for the mean and variance of discrete random
variables to continuous random variables. Since knowledge of integral calculus is not needed
for most econometric analysis, we include only the definitions for those students who have
been exposed to integral calculus. For a continuous random variable v with probability density function f(v),

Mean[v] = ∫ v f(v) dv    Var[v] = ∫ (v − Mean[v])² f(v) dv

where each integral is taken over all possible values of v.
Chapter 25 Exercises
1. Focus on thirty students who enrolled in an economics course during a previous semester.
Student SAT data: Cross-sectional data of student math and verbal high school SAT scores from
a group of 30 students.
2. Two cab companies, Yellow Cab and Orange Cab, serve a small town. There are 900 Yellow
cabs and 100 Orange cabs. A cab strikes a color-blind pedestrian. After striking the pedestrian,
the cab immediately leaves the scene of the accident. The victim knows that a cab struck him,
but his color blindness makes him unable to report on the hue of the cab.
a. Based on this information, draw an event tree to determine the probability that the guilty
cab was Yellow and the probability that the guilty cab was Orange.
A judge must find one of the cab companies liable for the damage done to the pedestrian.
b. Based on the available information, which cab company should the judge find guilty?
Explain.
3. Reconsider the cab liability issue described in question 2. An eyewitness has just come
forward who reports that he saw the accident. Irrefutable documentation has proven that the
probability of the eyewitness being correct is 0.8 and the probability of being incorrect is 0.2.
a. Extend the event tree you constructed in question 2 to reflect two possibilities: the pos-
sibility that the eyewitness will report that a Yellow cab was guilty and that an Orange cab
was guilty.
b. Using your event tree, determine the following joint probabilities:
Prob[Yellow reported] =
Prob[Orange reported] =
e. Should the judge, a very busy individual, take the time to hear the eyewitness’s testimony?
Explain.
f. Your event tree reflects two pieces of information: the relative number of Yellow and
Orange cabs and the reliability of eyewitness testimony. Intuitively, how do these two pieces
of information explain your results?
4. This problem comes from a “Car Talk Puzzler: Attack of the Bucolic Plague.”3
RAY: This puzzler came to us a while ago—January 1999, to be precise. It’s from Professor
Bruce Robinson at the University of Tennessee in Knoxville. Of course, I had to make a few
modifications . . .
TOM: He won’t even want to be associated with it, once you’re finished.
RAY: I’m sure he’ll send us an email asking to have his name expunged.
Here it is:
• A dreaded new disease is sweeping across the countryside. It’s called “The Bucolic Plague.”
If you’re afflicted with it, you begin wandering around the woods aimlessly, until you finally
collapse and die. The remedy is to lock yourself in the bathroom for two or three days, until the
urge passes.
• A test has been developed that can detect whether you have the disease. The test is 99 percent
accurate. That is, if you have the disease, there is a 99 percent chance that the test will detect
it. If you don’t have the disease, the test will be 99 percent accurate in saying that you don’t.
• In the general population, 0.1 percent of the people have the disease—that’s one-tenth of one
percent.
• You decide to go for the test. You get your results: positive.
Should you lock yourself in the bathroom and ask for a constant supply of magazines, or should
you not be worried? And, the real question is, what is the probability that you actually have the
Bucolic Plague?
a. First, suppose that you have not been tested yet. Consider the fact that 0.1 percent of the
general population has the disease. Assuming that you are typical, what is the probability that
you have the disease? Draw the appropriate event tree to illustrate this.
b. Now, you have been tested, but you have not yet received the test results. Extend your
event tree to account for the possibility of a positive or negative result.
i. Using your event tree, determine the following joint probabilities:
iii. Using the conditional/joint probability relationship, compute the following conditional
probabilities:
5. Suppose that the producer of “Let’s Make a Deal” changes the way in which the “prize door”
is selected. Instead of randomly placing the prize behind one of the three doors, the following
procedure is used:
• First, the contestant chooses two doors rather than one.
• Second, Monty opens one of the two doors the contestant had chosen. The door Monty opens
never contains the prize.
• Third, Monty gives the contestant the opportunity to stay with the unopened door that he/she
initially chose or to switch to the other unopened door.
Suppose that the contestant initially chooses doors 1 and 2. Monty uses the following rules to
decide which door to open:
• If the prize is behind door 1, he would open door 2.
• If the prize is behind door 2, he would open door 1.
• If the prize is behind door 3, he would choose either to open door 1 or door 2 randomly; that
is, if the prize is behind door 3, the chances are 50-50 he will open door 1 and 50-50 he will
open door 2.
a. Draw the event tree describing which door Monty will open.
b. Calculate the following conditional probabilities:
i. Prob[Prize behind door 2 IF Monty opens door 1]
ii. Prob[Prize behind door 3 IF Monty opens door 1]
iii. Prob[Prize behind door 1 IF Monty opens door 2]
iv. Prob[Prize behind door 3 IF Monty opens door 2]
c. After Monty opens a door, would you advise the contestant to stay with the unopened door
he/she chose initially or switch to the other unopened door, door 3?
6. Suppose that the producer of “Let’s Make a Deal” changes the way in which the “prize door”
is selected. Instead of randomly placing the prize behind one of the three doors, the following
procedure is used:
• Thoroughly shuffle a standard deck of fifty-two cards.
• Randomly draw one card, note its color, and replace the card.
Chapter 26 Objectives
26.4 Estimation Procedures: Importance of the Probability Distribution’s Mean (Center) and
Variance (Spread)
26.5 Strategy to Estimate the Variance of the Estimated Mean’s Probability Distribution
26.7 Step 2: Use the Estimated Variance of the Population to Estimate the Variance of the
Estimated Mean’s Probability Distribution
Chapter 26 Prep Questions

Mean[(1/T)(v1 + v2 + … + vT)] = ActMean

whenever Mean[vi] = ActMean for each i; that is, whenever each vi has the same mean, ActMean; and

Var[(1/T)(v1 + v2 + … + vT)] = ActVar/T

whenever
• Var[vi] = ActVar for each i; that is, each vi has the same variance, ActVar, and
• the vi’s are independent; that is, all the covariances equal 0.
3. Consider an estimate’s probability distribution:
a. Why is the mean of the probability distribution important? Explain.
b. Why is the variance of the probability distribution important? Explain.
4. Consider a random variable. When additional uncertainty is present, how is the spread of the
random variable’s probability distribution affected? How is the variance affected?
Last summer our friend Clint was hired by the consumer group to analyze a claim made by the
Key West Tourist Bureau. The tourist bureau claims that the average low temperature in Key
West during the winter months is 65 degrees Fahrenheit (rounded to the nearest degree). Clint
has been hired to assess this claim.
The consumer group has already compiled the high and low temperatures for each winter
day from the winter of 2000–2001 to the winter of 2008–2009, 871 days in total.
Figure 26.1
Clint’s 871 cards
Key West winter weather data: Time series data of daily high and low temperatures in Key
West, Florida, during the winter months (December, January, and February) from the 2000–2001
winter to the 2008–2009 winter.
Clint has recorded the low temperature for each day on a 3 × 5 card (figure 26.1).
Since we can access statistical software, we can use the software to calculate the actual average
low temperature and the actual variance for the winter months. In fact, the Tourist Bureau’s claim
is justified: the average low temperature in Key West was 64.56, which rounds to 65. But Clint
has a problem. He does not have access to any statistical software, and he does not have the time
to sum all 871 observations by hand to calculate the mean. Instead, he
adopts the econometrician’s philosophy to assess the Tourist Bureau’s claim:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint samples the population of all 871 days by performing the following experiment: thoroughly
shuffle the 871 cards; randomly draw one card and record the low temperature written on it;
replace the card, reshuffle, and repeat the process until four cards have been drawn.
Figure 26.2
Four cards Clint randomly selects
Use the average of the lows for the four days sampled to estimate the mean:

EstMean = (v1 + v2 + v3 + v4)/4
Clint draws the following four cards (figure 26.2). Clint uses these four values to estimate the
mean of the population:
EstMean = (69.10 + 66.20 + 54.00 + 55.90)/4 = 245.20/4 = 61.30
By default, a sample size of 4 is selected (figure 26.3) and the Pause checkbox is checked. Click
the Start button and compare the numerical value of the estimated mean with the actual mean
from the first repetition of the experiment. Are they equal? Click the Continue button a few times
and compare the estimated and actual means for each of the subsequent repetitions. We can draw
two conclusions:
Figure 26.3
Opinion Poll simulation (for each repetition, the window reports the numerical value of the estimated mean in that repetition, along with the mean and variance of the estimated means across repetitions)
• We cannot expect the numerical value of the estimated mean to equal the actual population
mean.
• We cannot predict the numerical value of the estimated mean before the experiment is con-
ducted; hence the estimated mean is a random variable.
So where does this leave Clint? He knows that in all likelihood his estimate, 61.30, does not
equal the actual population mean. But perhaps he could get some sense of how likely it is for
his estimate to be “close” to the actual value. Recall that Clint faced the same problem when
assessing the reliability of his opinion poll. He wanted to know how likely it was that his opinion
poll results were close to the actual fraction of the population supporting him for class president.
As with Clint’s opinion poll, we will use the general properties of Clint’s estimation procedure
to assess the reliability of one specific application of the procedure:
EstMean = (v1 + v2 + v3 + v4)/4
Before selection: EstMean is a random variable described by its probability distribution; Mean[EstMean] and Var[EstMean] describe the center and spread of that probability distribution.
After selection: EstMean is an estimate, a specific numerical value; here, EstMean = 61.30.
The estimated mean, EstMean, is a random variable. While we cannot determine the value of
a random variable before the experiment is conducted, we can often describe a random variable’s
probability distribution. Our next goal is to do just that. We will describe the probability distribu-
tion of the estimated mean, EstMean, by deriving the equations for its mean and variance. The
mean describes the probability distribution’s center and the variance describes the probability
distribution’s spread. We will use the same strategy that we used when studying opinion polls.
First we will consider a very simple and unrealistic experiment in which only one card is drawn.
Then we will apply the arithmetic of means and variances to generalize the results for sample
sizes greater than one.
We will now derive the equation for the mean and variance of v’s probability distribution.
Recall the formula for the mean (expected value) of a random variable:

Mean[v] = ∑_{All v} v Prob[v]
To apply this formula to our experiment, we will calculate the probability of drawing a specific
card from Clint’s deck of 871 cards (figure 26.4).
What is the probability of drawing the card for December 1, 2000? Since there are 871 cards
in the well-shuffled deck, there is one chance in 871 of drawing the December 1, 2000, card.
Thus the probability of drawing the December 1, 2000, card is 1/871. What is the probability
of drawing the card for December 2, 2000? By the same logic, the probability of drawing the
December 2, 2000, card is 1/871. Clearly, the probability of drawing the card for any specific
day from a well-shuffled deck of 871 cards is 1/871:
Prob[12-1-2000] = Prob[12-2-2000] = … = Prob[2-28-2009] = 1/871
Figure 26.4
Clint’s 871 cards
Since each of the 871 cards is drawn with probability 1/871, applying the formula gives
Mean[v] = (1/871) × (the sum of all 871 lows), which is just the average of all 871 lows.
In words, the center of the random variable v’s probability distribution, Mean[v], equals the
actual mean of the population, 64.56.
We can use the same strategy to show that the variance of the random variable v will equal the
population variance. Review the formula for the variance of a random variable:

Var[v] = ∑_{All v} (v − Mean[v])² Prob[v]

Applying the formula gives Var[v] = (1/871) × (the sum of the 871 squared deviations from the
population mean), which is just the population variance. In words, the spread of the random
variable v’s probability distribution, Var[v], equals the actual variance of the population, 43.39.
Next consider the general case where T cards are drawn from the deck and then apply the
arithmetic of means and variances:
We can describe the probability distribution of the random variable EstMean by applying the
arithmetic of means and variances to the estimate of the mean:
EstMean = (v1 + v2 + … + vT)/T = (1/T)(v1 + v2 + … + vT)
First we consider the mean. Keep in mind that the mean of each v equals the population mean,
ActMeanAll871:
Mean[EstMean] = Mean[(1/T)(v1 + v2 + … + vT)]

Since Mean[cx] = c Mean[x]:

  = (1/T) Mean[v1 + v2 + … + vT]

Since Mean[x + y] = Mean[x] + Mean[y]:

  = (1/T)(Mean[v1] + Mean[v2] + … + Mean[vT])

Since Mean[vi] = ActMeanAll871 for each i:

  = (1/T)(T × ActMeanAll871)

Simplifying:

  = ActMeanAll871
The terminology can be confusing: Mean[EstMean] is the mean of the estimated mean. To
resolve this confusion, remember that the estimated mean, EstMean, is a random variable.
Therefore, like any random variable, EstMean is described by its probability distribution. So
Mean[EstMean] refers to the mean of EstMean’s probability distribution, the center of EstMean’s
probability distribution. To emphasize this point, we will often follow the word “mean” with the
word “center” in parentheses when referring to Mean[EstMean]; for example, Mean[EstMean]
is the mean (center) of EstMean’s probability distribution.
Next we focus on the variance. Note that the variance of each v equals the population vari-
ance, ActVarAll871. Also note that since each card drawn is replaced, the probability distribution
of v for one draw is not affected by the value of v on any other draw. The v’s are independent;
hence all the covariances equal 0:
Var[EstMean] = Var[(1/T)(v1 + v2 + … + vT)]

Since Var[cx] = c² Var[x]:

  = (1/T²) Var[v1 + v2 + … + vT]

Since Var[x + y] = Var[x] + Var[y] when x and y are independent (the covariances are all 0):

  = (1/T²)(Var[v1] + Var[v2] + … + Var[vT])

Since Var[vi] = ActVarAll871 for each i:

  = (1/T²)(T × ActVarAll871)

Simplifying:

  = ActVarAll871/T
• The mean (center) of the estimated mean’s probability distribution, Mean[EstMean], equals
the actual population mean:

Mean[EstMean] = ActMeanAll871

• The variance (spread) of the estimated mean’s probability distribution, Var[EstMean], equals
the actual population variance divided by the sample size:

Var[EstMean] = ActVarAll871/T
Figure 26.5
Opinion Poll simulation
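As a complement to the lab, the two equations can also be checked with a short Monte Carlo sketch. The code below is an illustration in Python rather than the book's simulation software; because Clint's 871 recorded lows are not reproduced here, it builds a hypothetical population with roughly the same mean and variance and samples from it with replacement.

```python
# A hedged Monte Carlo sketch (not the book's Econometrics Lab). Clint's 871 recorded lows are not
# reproduced here, so a hypothetical population with roughly the same mean and variance stands in.
import random
import statistics

random.seed(1)
population = [random.gauss(64.56, 43.39 ** 0.5) for _ in range(871)]   # hypothetical lows
act_mean = statistics.fmean(population)
act_var = statistics.pvariance(population)           # population (divide-by-N) variance

T = 4                                                 # sample size
reps = 50_000
est_means = []
for _ in range(reps):
    sample = [random.choice(population) for _ in range(T)]   # T cards drawn with replacement
    est_means.append(sum(sample) / T)

print(act_mean, statistics.fmean(est_means))          # Mean[EstMean] ≈ ActMean
print(act_var / T, statistics.pvariance(est_means))   # Var[EstMean] ≈ ActVar/T
```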
Table 26.1
Checking our equations for the mean and variance

                 Equations                              Simulation
Sample size      Mean[EstMean]    Var[EstMean]          Mean        Variance
1                64.56            43.39/1 = 43.39       ≈ 64.56     ≈ 43.39
4                64.56            43.39/4 = 10.85       ≈ 64.56     ≈ 10.85
10               64.56            43.39/10 = 4.34       ≈ 64.56     ≈ 4.34
It is important to keep in mind what we know versus what Clint knows. We know that the average
of all the lows, the actual mean, equals 64.56 and the actual variance equals 43.39. We used a
statistical package to compute these two statistics. Accordingly, we know that the Tourist
Bureau’s claim that the average winter low in Key West is 65 can be justified (at least to the
nearest whole degree). Clint does not have access to a statistical package, however, and does
not have the time to perform the arithmetic calculations needed to calculate the actual mean.
Clint proceeds to estimate the mean winter low temperature by randomly selecting four days
and calculating the average of the lows on these days. What does Clint know about his estimated
mean, EstMean? Let us summarize what Clint knows:
Figure 26.6
Probability distribution of EstMean (centered at ActMeanAll871)
Mean[EstMean] = ActMeanAll871

Var[EstMean] = ActVarAll871/T
Even though Clint does not know the numerical value of the actual mean and actual variance,
ActMeanAll871 and ActVarAll871, he does know that the mean (center) of EstMean’s probability
distribution equals ActMeanAll871 and that the variance equals ActVarAll871/T, whatever those
values are.
26.4 Estimation Procedures: Importance of the Probability Distribution’s Mean (Center) and
Variance (Spread)
Figure 26.7
Probability distribution of estimates—Importance of the mean
• Importance of the probability distribution’s mean: When the estimation procedure is unbiased,
sometimes the estimate will be too high and sometimes the estimate will be too low, but it will be
right on average (figure 26.7). The fact that the mean (center) of EstMean’s probability distribution
equals the population mean is good news for Clint. The procedure he used to estimate the popu-
lation mean is unbiased; it does not systematically underestimate or overestimate the actual
population mean. Since the estimation procedure is unbiased, the variance of the estimate’s
probability distribution plays a critical role.
• Importance of the probability distribution’s variance: When the estimation procedure is
unbiased, the probability distribution’s variance (spread) reveals the estimate’s reliability; the
variance tells us how likely it is that the numerical value of the estimate calculated from one
repetition of the experiment will be close to the actual value (figure 26.8).
26.5 Strategy to Estimate the Variance of the Estimated Mean’s Probability Distribution
The variance of EstMean’s probability distribution is crucial in assessing the reliability of Clint’s
estimate. On the one hand, if the variance is small, Clint can be confident that his estimate is
“close” to the actual population mean. On the other hand, if the variance is large, Clint must be
skeptical. What does Clint know about the variance of EstMean’s probability distribution? He
has already derived the equation for it:
Var[EstMean] = ActVarAll871/T, where T = sample size
Figure 26.8
Probability distribution of estimates—Importance of the variance (when the variance is small the estimate is reliable; when the variance is large the estimate is unreliable)
The sample size equals 4. We know that the actual population variance equals 43.39; hence we
know that the variance of the estimated mean’s probability distribution, Var[EstMean], equals
10.85:
Var[EstMean] = 43.39/4 = 10.85
Clint does not know the actual variance of the population, ActVarAll871, however. While he
has the raw data needed to calculate ActVarAll871, he does not have time to do so—it takes
longer to calculate the variance than the mean. So what should he do? Recall the econometri-
cian’s philosophy:
Econometrician’s philosophy: If you lack the information to determine the value directly, esti-
mate the value to the best of your ability using the information you do have.
Clint can estimate the population variance from the available information, his four randomly
selected values for the low temperatures: 69.10, 66.20, 54.00, and 55.90. Then, he can modify
the equation we derived for Var[EstMean] by replacing Var[EstMean] and ActVarAll871 with
their estimated versions:
EstVar[EstMean] = EstVarAll871/T

where EstVarAll871 is Clint’s estimate of the actual population variance, ActVarAll871.
We will now describe three attempts to estimate the population variance using Clint’s four
randomly selected values by calculating the following:
1. The variance of Clint’s four numerical values based on the actual population mean.
2. The variance of Clint’s four numerical values based on the estimated population mean.
3. The “adjusted” variance of Clint’s four numerical values based on the estimated population
mean.
While the first two attempts fail for different reasons, they provide the motivation for the third
attempt, which succeeds. Therefore it is useful to explore the two failed attempts.
26.6.1 First Attempt: Variance of Clint’s Four Numerical Values Based on the Actual
Population Mean
The rationale is very simple. The variance is the average of the squared deviations from the
mean. So why not just calculate the variance of the four values he has, 69.10, 66.20, 54.00, and
55.90, around the actual population mean to estimate the variance of the entire population? Let
us do that now. The deviations from the actual mean, 64.56, are 4.54, 1.64, −10.56, and −8.66;
the squared deviations are 20.6116, 2.6896, 111.5136, and 74.9956; and their sum is 209.8104.
The average of the squared deviations provides an estimate of the population’s variance:

EstVarAll871 = 209.8104/4 ≈ 52.45

Note that the estimate obtained from Clint’s sample, 52.45, does not equal the population vari-
ance, 43.39. This should not surprise us, however. We would never expect any estimate to achieve
perfection. What then is the best we could hope for? We could hope that this estimation procedure
would be unbiased; we could hope that the estimation procedure does not systematically under-
estimate or overestimate the actual value. This estimation procedure is in fact unbiased. We will
use our simulation to illustrate this.
Focus your attention on the lower left corner of the window. For the moment ignore the Divide
By line. In the Use Mean line, note that the Act button is selected. This means that the actual
population mean, 64.56, is used to calculate the deviations. Consequently our estimate of the
variance is based on the actual population mean, just as in our calculations above. Click
Start. The values of the four cards selected are reported in figure 26.9. Calculate the variance
of the four values based on the actual population mean, 64.56. Is the simulation calculating the
estimated variance, EstVar, correctly? Next click Continue. Again, calculate the estimated vari-
ance. Is the simulation calculating it correctly? Also check to see if the simulation has calculated
Figure 26.9
Opinion Poll simulation (the Use Mean buttons select the actual or estimated mean; the Divide by buttons select T or T − 1)
the mean (average) of the variance estimates from the first two repetitions. Now uncheck the
Pause checkbox; after many, many repetitions click Stop. Compare the mean (average) of the
estimated variances with the actual population variance. Both equal 43.39. This suggests that
the estimation procedure for the actual population variance is unbiased.
Does this help Clint? Unfortunately, it does not. Recall what Clint knows versus what we
know. We know that the actual population mean equals 64.56, but Clint does not. Indeed, if he
knew the population mean, he would not have to go through all of this trouble in the first place.
So Clint must now try another tack.
26.6.2 Second Attempt: Variance of Clint’s Four Numerical Values Based on the Estimated
Population Mean
Since Clint does not know the actual population mean, what can he do? He can use the estimate
he calculated, 61.30, from the four randomly selected lows.
The average of the squared deviations based on Clint’s estimated population mean provides an
estimate of the actual population’s variance that Clint can calculate:

EstVarAll871 = 167.3000/4 ≈ 41.83
Hopefully this estimation procedure will be unbiased. Let us use a simulation to find out.
Before clicking Start, note that Est is selected in the Use Mean line. Consequently, instead of
calculating the deviation from the actual mean, the simulation will now calculate the deviation
from the estimated mean. Now click Start and then after many, many repetitions click Stop.
Compare the mean (average) of the estimated variances with the actual population variance.
Unfortunately, they are not equal. The mean of the estimated variances equals 32.54 while the
actual variance equals 43.39. This suggests that the estimation procedure for the actual popula-
tion variance is biased downward; it systematically underestimates the variance of the
population.
To explain why, note that when we use Clint’s estimate of the population mean to calculate
the sum of squared deviations, we obtain a lower sum than we did when we used the actual
population mean:
Sum of squared deviations using actual population mean = 209.8104
Sum of squared deviations using estimated population mean = 167.3000
Is this just a coincidence? No, it is not. To understand why, we will ask a question: What
value would minimize the sum of squared deviations of the 4 sample values? Let vVarMin equal
this value; that is,
vVarMin minimizes (v1 − vVarMin)2 + (v2 − vVarMin)2 + (v3 − vVarMin)2 + (v4 − vVarMin)2
where v1, v2, v3, and v4 equal the four sample values.
With a little calculus we can solve for vVarMin: differentiate the sum of squared deviations with
respect to vVarMin and set the derivative equal to 0:

d[(v1 − vVarMin)² + (v2 − vVarMin)² + (v3 − vVarMin)² + (v4 − vVarMin)²]/dvVarMin
  = −2(v1 − vVarMin) − 2(v2 − vVarMin) − 2(v3 − vVarMin) − 2(v4 − vVarMin) = 0
Now some algebra: solving for vVarMin yields

vVarMin = (v1 + v2 + v3 + v4)/4

What does (v1 + v2 + v3 + v4)/4 equal? It is just the estimated population mean. Using the estimate
of the population mean to calculate the deviations from the mean minimizes the sum of squared
deviations. The two sums are equal only if the estimate of the population mean equals the actual
population mean:
Only if the estimated mean equals the actual mean:
  Sum of squared deviations based on estimated population mean = Sum of squared deviations based on actual population mean
Typically the estimate of the population mean will not equal the actual population mean,
however. Consequently the sum of squared deviations based on the estimate of the population
mean will be less than the sum of squared deviations based on the population mean itself:
Typically the estimated mean will not equal the actual mean, in which case:
  Sum of squared deviations based on estimated population mean < Sum of squared deviations based on actual population mean
Consequently, dividing the sum based on the estimated mean by T systematically underestimates the actual population variance, whereas dividing the sum based on the actual mean by T provides an unbiased estimation procedure for the actual population variance.
Recall that the average of the squared deviations based on the actual population mean provides
an unbiased estimation procedure for the population variance. Consequently, if Clint were to estimate
the population variance by using the deviations from the estimated mean rather than the actual
mean, he would systematically underestimate the variance of the population. So, let us make
one last attempt.
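A quick numerical check of this argument, sketched in Python (not part of the text), uses Clint's four values: the sum of squared deviations is smaller around the estimated mean than around the actual mean, and a grid search confirms that the sample mean itself is the minimizer.

```python
# A quick numerical check (not from the text) using Clint's four sampled lows.
values = [69.10, 66.20, 54.00, 55.90]
est_mean = sum(values) / len(values)                       # 61.30, the estimated population mean

def ssd(center):
    """Sum of squared deviations of the sample values from `center`."""
    return sum((v - center) ** 2 for v in values)

print(ssd(est_mean))    # ≈ 167.30  (deviations taken from the estimated mean)
print(ssd(64.56))       # ≈ 209.81  (deviations taken from the actual population mean)

# A grid search over candidate centers confirms that the sample mean minimizes the sum.
candidates = [c / 100 for c in range(5000, 7501)]          # 50.00, 50.01, ..., 75.00
print(min(candidates, key=ssd))                            # 61.3
```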
26.6.3 Third Attempt: “Adjusted” Variance of Clint’s Four Numerical Values Based on the
Estimated Population Mean
How should Clint proceed? Fortunately, he has a way out. Clearly, Clint has no choice but to
use the estimated population mean to calculate the sum of squared deviations. But if he then
divides the sum by 3 rather than 4, his estimation procedure will be unbiased. More generally, when the actual popula-
tion mean is unknown and the estimated population mean must be used to calculate the deviations
from the mean, we divide the sum of squared deviations by the sample size less 1 rather than
by the sample size itself. In this case the sample size less 1 equals the degrees of freedom; there
are 3 degrees of freedom. For the time being, do not worry about precisely what the degrees of
freedom represent and why they solve the problem of bias. We will motivate the rationale later
in this chapter. We do not wish to be distracted from Clint’s efforts to assess the Tourist Bureau’s
claim at this time. So we will postpone the rationalization for now.
Let us now compute our “adjusted” estimate of the variance. Recall that the sum of squared
deviations based on the estimated population mean equals 167.30. Dividing by the degrees of
freedom rather than the sample size gives

EstVarAll871 = 167.30/3 ≈ 55.77
We will use a simulation to illustrate that the adjusted variance procedure is unbiased.
Before clicking Start, note that Est is selected in the Use Mean line and T − 1 in the Divide By
line. Consequently the simulation will now calculate the deviation from the estimated mean and
then after summing the squared deviations, it will divide by the sample size less one, T−1, rather
than the sample size itself, T.
Now click Start and then after many, many repetitions click Stop. Compare the mean
(average) of the estimated variances with the actual population variance. They are equal. This
suggests that our third estimation procedure for the population variance is unbiased. After many,
many repetitions the adjusted average of the squared deviations equals the actual population
variance.
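The simulation's logic can also be mimicked with a short Monte Carlo sketch, again in Python and with a hypothetical population standing in for Clint's 871 lows (both assumptions): it compares the three estimation procedures and shows that only the third, which divides the squared deviations from the estimated mean by T − 1, averages out to the actual population variance.

```python
# A hedged simulation sketch (not the book's lab software) comparing the three attempts.
# A hypothetical population stands in for Clint's 871 recorded lows.
import random
import statistics

random.seed(1)
population = [random.gauss(64.56, 43.39 ** 0.5) for _ in range(871)]
act_mean = statistics.fmean(population)
act_var = statistics.pvariance(population)

T, reps = 4, 50_000
attempt1, attempt2, attempt3 = [], [], []
for _ in range(reps):
    sample = [random.choice(population) for _ in range(T)]
    est_mean = sum(sample) / T
    attempt1.append(sum((v - act_mean) ** 2 for v in sample) / T)        # actual mean, divide by T
    attempt2.append(sum((v - est_mean) ** 2 for v in sample) / T)        # estimated mean, divide by T
    attempt3.append(sum((v - est_mean) ** 2 for v in sample) / (T - 1))  # estimated mean, divide by T − 1

print(act_var)
print(statistics.fmean(attempt1))   # ≈ actual variance, but Clint cannot compute this one
print(statistics.fmean(attempt2))   # ≈ (3/4) of the actual variance: biased downward
print(statistics.fmean(attempt3))   # ≈ actual variance: unbiased and computable by Clint
```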
26.7 Step 2: Use the Estimated Variance of the Population to Estimate the Variance of the
Estimated Mean’s Probability Distribution
Recall that the standard deviation equals the square root of the variance. Consequently the
estimated standard deviation equals the square root of the estimated variance. Furthermore, the
estimated standard deviation has been given a special name: it is called the standard error:

SE = EstSD = √EstVar
Now let us calculate the standard error, the estimated standard deviation of the estimated mean,
EstMean’s, probability distribution:

EstVar[EstMean] = EstVarAll871/T = 55.77/4 ≈ 13.94
SE = √13.94 ≈ 3.73
Now, at last, Clint is in a position to assess the Tourist Bureau’s claim that the average daily low
temperature during winter was 65 in Key West. Hypothesis testing allows Clint to do so. Recall
the steps involved in hypothesis testing:
Critical result: The estimated mean is 61.30. This evidence, the fact that the estimated mean is
less than 65, suggests that the Tourist Bureau’s claim is not justified.
Step 2: Play the cynic, challenge the evidence, and construct the null and alternative
hypotheses.
Cynic’s view: Despite the results the average low temperature is actually 65.
The null hypothesis adopts the cynical view by challenging the evidence; the cynic always
challenges the evidence. The alternative hypothesis is consistent with the evidence.
H0: ActMeanAll871 = 65 ⇒ Actual mean equals 65; cynic is correct.
H1: ActMeanAll871 < 65 ⇒ Actual mean is less than 65; cynic is incorrect.
Step 3: Formulate the question to assess the null hypothesis and the cynic’s view.
• Generic question: What is the probability that the result would be like the one obtained (or
even stronger), if H0 is true (if the cynic is correct)?
• Specific question: The estimated mean was 61.30. What is the probability of obtaining an
average low of 61.30 or less from four randomly selected days if the actual population mean of
lows were 65 (if H0 is true)?
Step 4: Use the general properties of the estimation procedure, the estimated mean’s probability
distribution, to calculate Prob[Results IF H0 true].
When we assessed Clint’s poll, we used the normal distribution to calculate this probability. Unfor-
tunately, we cannot use the normal distribution now. Instead, we must use a different distribution,
the Student t-distribution. We will now explain why.
Recall that the variable z played a critical role in using the normal distribution:
z = (Value of random variable − Distribution mean)/Distribution standard deviation
  = Number of standard deviations from the mean
In words, z equals the number of standard deviations the value lies from the mean. But Clint
does not know what the actual variance and standard deviation of his probability distribution
Figure 26.10
Normal and Student t-distributions
equals. That is why he had to estimate it. Consequently he cannot use the normal distribution
to calculate probabilities.
When the standard deviation is not known and must be estimated, the Student t-distribution
rather than the normal distribution must be used (figure 26.10). t equals the number of estimated
standard deviations the value lies from the mean:

t = (Value of random variable − Distribution mean)/Estimated distribution standard deviation (SE)
Since estimating the standard deviation introduces an additional element of uncertainty, the
Student t-distribution is more “spread out” than the normal distribution.
Unfortunately, the Student t-table is more cumbersome than the normal distribution table. The
right-hand tail probabilities not only depend on the value of t, the number of estimated standard
deviations from the mean, but also on the degrees of freedom. We will exploit our Econometrics
Lab to calculate the probability. Let us review the relevant information (figure 26.11):
Figure 26.11
Probability distribution of EstMean if H0 is true: Student t-distribution with mean 65, SE = 3.73, and 3 degrees of freedom; the probability that EstMean lies at or below 61.30 equals 0.1972
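For readers without access to the Econometrics Lab, the same probability can be reproduced with a short sketch; the code below assumes Python with the SciPy library, neither of which the text uses.

```python
# A short sketch (not from the text) assuming Python with the SciPy library.
from scipy import stats

est_mean = 61.30     # Clint's estimate
null_mean = 65       # actual mean claimed under H0
se = 3.73            # standard error computed in the previous section
df = 3               # degrees of freedom = sample size − 1

t = (est_mean - null_mean) / se                 # ≈ −0.99 estimated standard errors below 65
prob_results_if_h0_true = stats.t.cdf(t, df)    # left-tail probability under the Student t-distribution

print(round(t, 2), round(prob_results_if_h0_true, 4))   # ≈ -0.99, ≈ 0.197
```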
Now let us return to the fifth and last step in the hypothesis testing procedure.
Prob[Results IF H0 true] equals 0.1972. At the traditional significance levels used in academe
(1, 5, and 10 percent), this probability is too large to reject the null hypothesis that the average
low is 65; the Tourist Bureau’s claim is justified. Consequently Clint fails to reject the null
hypothesis; that is, he fails to reject the Tourist Bureau’s claim that the average winter low
temperature in Key West is 65.
Earlier in this chapter we postponed our explanation of degrees of freedom because it would
have interrupted the flow of our discussion. We will now return to the topic. In this case the
degrees of freedom equal the sample size less one:

Degrees of freedom = Sample size − 1 = 4 − 1 = 3
We will now explain why we divided by the degrees of freedom, the sample size less one, rather
than the sample size itself when estimating the variance. To do so, we will return to the basics
and discuss a calculation we have been making since grade school.
Revisit Amherst precipitation in the twentieth century (table 26.2) and consider how we calculate
the mean precipitation for June.
Table 26.2
Monthly precipitation in Amherst, MA, during the twentieth century
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
Each of the 100 Junes in the twentieth century provides one piece of information that we use to
calculate the average. To calculate an average, we divide the sum by the number of pieces of
information. The key principle is how we calculate a mean or, in everyday language, an average:
Key principle: To calculate a mean or an average, we divide the sum by the number of pieces
of information.
Mean (average) = Sum/Number of pieces of information
Now focus on the variance. Recall that the variance equals the average of the squared devia-
tions from the mean:

Variance = Sum of squared deviations from the mean/Number of pieces of information
In Clint’s case the number of pieces of information available to estimate the variance equals 3,
not 4; that is why we should divide by 3. To understand why, consider the first observation in
isolation. Recall the first card Clint draws:

v1 = 69.10

Based only on the first observation, the estimated population mean would equal 69.10; hence,
when considering only the first observation, the deviation from the estimated mean and the
squared deviation would equal 0. More generally, when we consider only the first observation,
the deviation from the estimated mean and the squared deviation will always equal 0 regardless
of which one of the 871 cards was drawn, because the estimated mean would equal the value
recorded on that card:
Considering first observation only
↓
EstMean = v1
↓
v1 − EstMean = 0
↓
(v1 − EstMean)² = 0
Based on only a single observation, the deviation and squared deviation will equal 0 regardless
of what the actual population variance equals. Consequently the first observation provides no
information when estimating the population variance. Only when the second observation is
introduced would we begin to get some information about the actual population variance:
• On the one hand, if the actual population variance were large, then it would be likely for the
value recorded on the second observation to be far from the first observation value; consequently
the deviations from the estimated mean and squared deviations would be large.
• On the other hand, if the actual population variance were small, then it would be likely for
the value recorded on the second observation to be close to the first observation value; conse-
quently the deviations from the estimated mean and the squared deviations would be small.
Since the first observation in isolation provides no information about the variance, the number
of pieces of information available to estimate the variance equals the sample size less one. The
degrees of freedom equal the number of pieces of information that are available to estimate the
variance.
Chapter 26 Exercises
1. A large manufacturer of laptop computers claims that on average its laptops achieve 7
hours of battery life; that is, the manufacturer claims that the actual mean number of hours its
laptops will operate without the battery being recharged is 7:
A consumer group has challenged the claim, however, asserting that the average is less than 7.0.
You have been asked by the Consumer Protection Agency to investigate this claim. To do so you
conduct the following experiment:
Use the average of the eight laptops sampled to estimate the mean battery life:
EstMean = (v1 + v2 + … + v8)/8
a. What does the sum of the hours, v1 + v2 + … + v8, equal? On average, what does the battery
life of the eight laptops equal? What is the estimated mean, EstMean, for the battery life of
all laptops produced?
b. Show that the sum of squared deviations from the estimated mean, EstMean, of the eight
laptops you tested equals 8.12.
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Argue that the vi’s, the battery life of the laptops you tested, are independent random
variables.
e. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
2. Now, apply the information you compiled in problem 1 to assess the consumer group’s com-
plaint. We will use hypothesis testing to do so.
a. Play the cynic and construct the null and alternative hypotheses, H0 and H1.
b. If the null hypothesis were correct, what would the mean of EstMean’s probability distri-
bution equal?
c. Formulate the question needed to assess the cynic’s view and the null hypothesis; that is,
the question whose answer is Prob[Results IF H0 true].
d. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Our Econometrics
Lab includes software that allows you to calculate this probability easily. Access the lab using
the following link and then fill in the blanks with the appropriate numbers:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
e. Assess the manufacturer’s claim.
3. Now suppose that the sample size were 64, eight times larger. That is, instead of randomly
selecting eight cards, suppose that you randomly selected sixty-four laptops. Furthermore
suppose that the battery life data just replicated the data from the first eight laptops.
a. What would the sum of the hours for the sixty-four laptops, v1 + v2 + … + v64, equal? On
average, what does the battery life of the sixty-four laptops equal? What is the estimated mean,
EstMean, for the battery life of all laptops produced?
b. What would the sum of squared deviation from the estimated mean now equal?
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
e. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Access the lab using
the following link and then fill in the blanks:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
4. Now suppose that the sample size were 120, fifteen times larger than the original eight.
Furthermore suppose that the battery life data just replicated the data from the first eight laptops.
a. What would the sum of the hours for the 120 laptops, v1 + v2 + … + v120, now equal? On
average, what does the battery life of the 120 laptops equal? What is the estimated mean,
EstMean, for the battery life of all laptops produced?
b. What would the sum of squared deviation from the estimated mean now equal?
c. Estimate the variance of the battery life of all laptops produced by the manufacturer (the
population).
d. Estimate the variance of EstMean’s probability distribution. What is the estimated standard
deviation; that is, what is the standard error?
e. Using the Student t-distribution, calculate Prob[Results IF H0 true]. Access the lab using
the following link and then fill in the blanks:
Mean: ______
Standard error: ______
Value: ______
Degrees of freedom: ______
Prob[Results IF H0 true] = ______
f. Assess the manufacturer’s claim.
Using your intuition, explain why Prob[Results IF H0 true] changes as the sample size increases.
6. Suppose that you had not learned about degrees of freedom and the Student t-distribution.
Specifically, suppose that you divided by the sample size rather than the degrees of freedom when
estimating the population variance and used the normal distribution rather than the Student
t-distribution when calculating Prob[Results IF H0 true]. Had this been the case, fill in the blanks:
7. Compare your answers to problems 5 and 6. As the sample size increases, does the use of the
degrees of freedom and the Student t-distribution become more or less critical? Explain.
Index
Absence of random influences, 165–68 maximum likelihood estimation procedure and, 780–82
Alternative hypothesis. See Hypothesis testing probit probability model, 774–82
Any Two estimation procedure, 205–208, 597 BLUE. See Best linear unbiased estimation procedure
Artificial models, 384–85 (BLUE)
Attenuation bias, 621 Bounds, upper and lower confidence interval, 488–89
Autocorrelation Breusch–Pagan–Godfrey test, 531–35
accounting for, 561–73 Budget theory of demand, 291–99
covariance and independence and, 549–51
defined, 551 Calculating an average, 242–45, 859–61
error terms and, 553 Causation and correlation, 27, 272–73
generalized least squares and, 561–74 Censored dependent variables, 783–88
mathematics, 554–59 Center. See Mean (center)
model, 551–53 Central Limit Theorem, 100–102
ordinary least squares and, 554–61 Clever algebraic manipulation approach, 340–44, 353
rho, 551–53 Coefficient estimates, 166–71, 187–89
robust standard errors and, 574 autocorrelation and, 554–59
Averages, 3–8 biased vs. unbiased, 239–41
calculating, 242–45, 861–63 heteroskedasticity and, 520–22
dummy variables and, 414, 427 interpretation using logarithms, 304–11
irrelevant explanatory variables, 468–69
Basic regression model, 515–16, 548, 583 mean (center) of, 190–93
Best fitting line, 148–50, 152–53 probability distribution, 190–97
ordinary least squares (OLS) estimation procedure for importance of, 224–25
finding, 156–64 variance estimation, 237–41
stretched S-shaped vs. straight regression, 774–82 reduced form estimation procedure, 714–15
systematic procedure to determine, 153–54 reliability of, 173, 199–204, 258–62
Best linear unbiased estimation procedure (BLUE), 205– range of x’s and, 202–204
208, 517, 538, 549, 585 sample size and, 201–202
Bias variance of error term’s probability distribution and,
attenuation, 621 199–201
biased but consistent estimation procedure, 598–601 Coefficient interpretation approach, 717–25
dilution, 621 paradoxes, 722–24
estimation procedures, 227–35, 599–601 Coefficient of determination, 490–94
explanatory variable/error term correlation and, 588–92 Conditional probability, 807–808
instrumental variable approach and ordinary least Monty Hall problem, 809–16
squares, 628–31 Confidence intervals
omitted explanatory variable, 448–56, 639–43 calculated with statistical software, 489–90
ordinary least squares, 660–61, 667–68, 676–77 consistence with data, 478–79
Binary dependent variables example, 479–89
electoral college example, 770–71 ordinary least squares and, 479–92
linear probability model, 772–74 upper and lower bounds, 488–89
justifying use of, 107–10 best linear unbiased estimation procedure (BLUE),
properties of, 105 205–208, 517, 538, 549, 585
rules of thumb, 110–12 biased and unbiased, 93–96, 125–26, 233, 236, 239–41,
stretched S-shaped line using, 774–82 289, 442–44, 522–26, 703
spread, 2, 8–11, 67–68 biased but consistent, 598–601
Student t-, 374–75, 856–59 embedded within the ordinary least squares estimation
variance, 2, 8–11, 67–68 procedure, 271–72, 516–17, 548–49, 584
Downward sloping demand theory, 286–91, 318–29, general and specific, 123
446–47 generalized least squares (GLS) (see Generalized least
Dummy variables squares (GLS) estimation procedure)
cross-sectional fixed effects and, 670–71 importance of mean in, 124–25, 198
averages and, 413 importance of variance in, 125–26, 198
defined, 414–15 instrumental variable (IV), 602–607, 623–24
example, 414, 428 maximum likelihood, 780–82
fixed effects and, 670–73 Min-Max, 205–208
implicit assumptions and, 424 population mean, 834–38
models, 415–23 opinion polls, 61–66, 89–97, 121–24
period fixed effects and, 673–81 ordinary least squares (OLS) (see Ordinary least
trap, 500–506 squares (OLS) estimation procedure)
probit, 780–82
Elasticity of demand. See Constant elasticity model; reduced form (RF), 708–15, 720–22, 744–45, 747
Demand theory; Price elasticity of demand order condition and, 746–61
Endogenous variables reliability of unbiased, 96–97
vs. exogenous variables, 699, 737 small and large sample properties, 592–601
replaced with its estimate, 743–44 Tobit, 787–88
Errors two-stage least squares (TSLS), 741–46, 760–61
measurement, 613 unbiased and consistent, 595–96
defined, 614 unbiased but inconsistent, 596–97
example, 625–28 Events
modeling, 614 correlated, 816–18, 825–26
ordinary least squares and, 617–23 independent, 820–21, 826
robust standard, 538–41, 574 Event trees, 798–806
type I and type II, 135–38 Monty Hall problem and, 809–16
Error terms, 151–52, 173, 183, 189, 270–71, 539, 558, EViews
559, 583 autocorrelation and, 547
autocorrelation and, 551–61 binary dependent variables, 782
equal variance premise (see Standard ordinary least dummy and interaction variables, 411–12, 415–23
squares (OLS) premises) dummy variable trap and, 504–506
explanatory variables and, 548, 584, 586–88 fixed effects, 680
correlation and bias, 588–92, 696–97 heteroskedasticity and, 540–41
independence premise (see Standard ordinary least ordinary least squares (OLS) estimation procedure and,
squares (OLS) premises) 163–64
heteroskedasticity and, 517–26 probit estimation procedure, 782
importance of, 164–71 random effects, 687
independence premise (see Standard ordinary least scatter diagrams, 515–16
squares (OLS) premises) Tobit estimation procedure, 788
probability distribution variance, 199–201 two-stage least squares, 746
degrees of freedom and, 241–45 Exogenous variables
estimation of, 226–36 absent from demand and supply models, 747–48
random influences and, 171–72 vs. endogenous variables, 699, 737
standard ordinary least squares (OLS) premises (see order condition and, 746–61
Standard ordinary least squares (OLS) premises) Explanatory variables, 151
Estimates dependent variable as linear combination of,
coefficient (see Coefficient estimates) 498–99
interval (see Interval estimates) dummy variables and, 415–23, 500–506
Estimation procedures endogenous, 699, 706
Any Two, 205–208, 597 error terms, 618–19
best fitting line and, 148–50 correlation and bias, 588–92, 696–97
Maximum likelihood estimation procedure, 780–82 linear demand model and, 331–33
Mean (center), 3–8, 27, 29, 839–40, 844–45 probability calculation
coefficient estimate’s probably distribution, 190–93, clever algebraic manipulation, 340–44, 353
224–25 Wald (F-distribution) test, 353–64
distribution, 2, 3–8 Nonconditional probability, 807
estimation of population Nonrandom sampling, 599–601
degrees of freedom, 859–61 Normal distribution, 102–104
estimating variance of, 848–54 example, 105–107
importance of probability distribution’s mean and hypothesis testing and, 128–30
variance, 845–46 interval estimates and, 256–58
normal distribution and Student t-distribution, 856–59 justifying use of, 107–10
procedure, 834–38 properties, 105
estimation procedures and, 93–96, 124–25 rules of thumb, 110–12
importance of, 95–96, 124–25, 198, 845–46 single variable, 374–75
probability distribution, 66–67, 90, 124–25, 190–93 stretched S-shaped line using, 774–82
of random variable, 66–67, 90 vs. Student t-distribution, 856–59
relationship to median, 798 Null hypothesis. See Hypothesis testing
strategy to estimate variance of probability distribution
of, 846–48 Oddities, data, 393–94
unbiased estimation procedure and, 124–25, 190–91, OLS. See Ordinary least squares (OLS) estimation
443 procedure
Measurement error, 613 Omitted explanatory variables, 445–46, 456
explanatory variable, 622–23 bias, 448–56, 639–43
defined, 614 explanatory variable/error term independence premise
dependent variable, 614–17 and, 641–43
example, 625–28 consistency and, 642–43
instrumental variables and, 628–33 direct effect, 448–56
modeling, 614 instrumental variables and, 648–54
Median, 796–98 bad instrument, 653
Min-Max estimation procedure, 205–208 good instrument conditions, 645, 649–53
Mode, 796 ordinary least squares and, 446–54, 642–43
Model specification and development proxy effect, 448–56
data oddities in, 393–94 Omitted variable proxy effect, 449–53
effect of economic conditions on presidential elections, One-tailed hypothesis test, 286–91
391–404 vs. two-tailed tests, 291
iterative process for formulation and assessment, Opinion poll simulation, 63–66, 68–76
395–404 Order condition, 746–49
Ramsey REgression Specification Error Test (RESET), overidentification problem, 757–60
384–91 underidentification problem, 749–56
Monty Hall problem, 809–16 Ordinary least squares (OLS) estimation procedure,
Motivating hypothesis testing, 126–30, 262–68 154–55, 184, 584–86, 660–61
Multicollinearity, 457–66 absence of random influences and, 165–68
earmarks of, 464–66 autocorrelation and, 551–61
highly correlated explanatory variables and, 458–63 as best linear unbiased estimation procedure (BLUE),
perfectly correlated explanatory variables and, 457–58 205–208
Multiple regression analysis bias, 660–61, 667–68, 676–77
constant elasticity demand model and, 333–39 coefficient estimates, 166–71, 187–89
flexibility of, 427 mean (center) of, 190–93
goal of, 318, 325, 448, 475–77, 495, 639–40, 717 reliability, 199–204
linear demand model and, 331–33 confidence intervals and, 479–92
power of, 427 consistency, 601, 622–23
vs. simple, 318 dependent variable measurement error and, 614–16
dummy variables and, 415–23
Natural logarithms, 305 error term/error term independence premise (see
Natural range and correlation coefficient, 23–27 Standard ordinary least squares (OLS) premises)
No money illusion theory, 329–31 equal error term variance premise (see Standard
constant elasticity demand model and, 333–39 ordinary least squares (OLS) premises)
hypothesis testing, 351–53 error terms and, 164–71, 171–72, 183, 270–71
Tails probability, 260–62, 266–67, 289, 299, 398–99 error term correlation and bias, 588–92, 696–97
confidence intervals and, 479–89 error term independence premise (see standard
confidence interval upper and lower bounds and, ordinary least squares (OLS) premises)
488–89 exogenous, 742–44
Tobit estimation procedure, 787–88 having same value for all observations, 495–97
Truncated (censored) dependent variables, 783–88 highly correlated, 458–63
t-tests, 369–74 irrelevant, 466–69
Two-stage least squares (TSLS) estimation procedure, logarithm, 307
741–44 measurement error, 617–23
compared to reduced form, 744–46, 752–53, 755–56, multicollinearity and, 457–66
759–62 omitted, 445–56, 639–40
overidentification and, 760–61, 762 one explanatory variable as linear combination of
underidentification, 753, 756, 762 other, 497–98
Two-tailed confidence intervals perfectly correlated, 457–58
consistence with data, 478–79 instrumental, 602–607, 623–24, 628–31
example, 479–89 interaction, 425–27, 432–34
ordinary least squares and, 479–92 log dependent variable model, 309–10
Two-tailed hypothesis tests, 291–99 omitted explanatory, 445–56
equivalence of Wald tests and, 369–74 ordinary least squares (OLS), bias and consistency,
Two-variable relationships 446–56, 639–43
correlation, 13 instrumental variables, bias, and consistency, 642–43
correlation coefficient, 22–27 example, 646–48
covariance and, 13–19 probability distribution of, 53–58
independence of variables in, 19–21 random, 48–50, 53–58, 66–67, 88–89, 185
scatter diagrams of, 13 continuous, 48, 58–61, 827
Type I and type II errors, 135–38 correlated, 818–20
discrete, 48–58
Unbiased estimation procedures, 93–96, 125–26, 233, F-statistic as, 358–62
236, 239–41, 289, 442–44 independent, 821–25
autocorrelation and, 559–61 interval estimates and, 256–58
consistency and, 595–96 mean of, 66–67, 90
defined, 93–95 variance of, 54–58, 91–92
heteroskedasticity and, 522–26 Variance (spread), 2, 8–12, 27, 29, 67–76, 91–92,
inconsistent, 596–97 125–26, 194–97, 840–44
Uncorrelated variables, 19–21, 444–45 estimating coefficient estimate, 194–97, 224–25,
Underidentification problem, 749–56 237–41
Unit insensitivity and correlation coefficient, 22–23 normal distribution and Student t-distribution, 256–58,
Unrestricted sum of squared residuals, 355–58, 362–63, 856–57
373 strategy for, 225–26, 521–22, 556, 559
F-distribution and, 362–63, 373 estimating error term, 199–201, 227–36, 516, 517–19
Upper confidence interval bounds, 488–89 degrees of freedom and, 241–45
estimating population, 848–54
Variable(s) degrees of freedom, 859–61
correlated and independent, 19–21, 444–45 normal distribution and Student t-distribution,
dependent, 151, 628–31, 648–49, 769 856–59
binary, 770–82 strategy to estimate, 846–48
logarithm, 306 importance of, 96–97, 125–26, 198, 845–46
measurement error, 614–16 random variable, 54–58, 67–76, 91–92
truncated (censored), 783–88 unbiased estimation procedure, 125–23, 198–99, 444,
dummy, 414–24, 670–73 846–47
cross-sectional fixed effects, 670–71
period fixed effects, 673–80, 679–81 Wald (F-distribution) test
trap, 500–506 calculations, 363–65
endogenous vs. exogenous, 699, 706–707, 737 equivalence of two-tailed t-tests and, 369–74
exogenous, 699, 737 restricted and unrestricted sums of squared residuals
order condition and, 746–61 and, 355–58, 362–63, 373
explanatory, 151, 628–31, 648–49 restricted regression, 353–55, 372–73
dependent variable as linear combination of, 498–99 unrestricted regression, 355–56, 372–73, 388–89