QM 2 Linear Regression
Linear Regression
• Topics
• Linear regression model
• Testing estimated coefficients in the model
• Notes:
• One-sided tests
• Heteroskedastic error terms
1. Linear regression model
• We want to estimate the causal effect of one variable X on another variable Y.
• Linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $u_i$ is the error term
[Figure: Distribution of CRRA in DK (Risk_Field.dta); x-axis: crra]
[Figure: Scatterplot of CRRA and age; x-axis: Age, y-axis: CRRA]
The OLS estimator
How can we estimate the unknown parameters β0 and β1 in the linear model?
Notation:
• β0 and β1 refer to the true (unknown) population coefficients
• $\hat\beta_0$ and $\hat\beta_1$ refer to the OLS estimators of β0 and β1
• b0 and b1 refer to the estimated values of β0 and β1 in a given sample
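As a sketch of how the estimates b0 and b1 are obtained, the closed-form OLS formulas $\hat\beta_1 = \sum_i (X_i - \bar X)(Y_i - \bar Y) / \sum_i (X_i - \bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$ can be written in plain Python. The function name `ols_fit` and the four data points are hypothetical illustrations, not the Risk_Field.dta data.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the simple regression y = b0 + b1*x + u, estimated by OLS."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Covariance numerator and variance denominator (the 1/n factors cancel)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is not identified")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line y = 2 + 0.5 * x
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 3.0, 3.5, 4.0]
b0, b1 = ols_fit(x, y)
print(b0, b1)
```

Because the hypothetical points lie exactly on a line, OLS recovers the intercept and slope of that line.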
Least squares assumption (i)
For any given value of X, the mean of u is zero: E(u | X = x) = 0.
[Figure: Scatterplot of CRRA against Age]
Example with linear regression model
• Linear regression model with age as only independent variable
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0109613 .0018155 -6.04 0.000 -.0145248 -.0073978
_cons | 1.14752 .0857724 13.38 0.000 .9791665 1.315873
------------------------------------------------------------------------------
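To read the age coefficient, a small sketch can plug the estimates from the output above into the fitted line $\widehat{crra} = b_0 + b_1 \cdot age$. The helper name `predict_crra` and the chosen ages are illustrative assumptions; the coefficients are copied from the table.

```python
# Coefficients copied from the Stata output for crra on age
B0_AGE = 1.14752     # _cons
B1_AGE = -0.0109613  # age

def predict_crra(age):
    """Predicted CRRA from the fitted line crra_hat = b0 + b1 * age."""
    return B0_AGE + B1_AGE * age

for age in (20, 40, 60, 80):
    print(age, round(predict_crra(age), 3))

# A 40-year age difference changes predicted CRRA by 40 * b1
print(round(predict_crra(60) - predict_crra(20), 3))
```

Each additional year of age lowers predicted CRRA by about 0.011, so a 40-year gap corresponds to a difference of roughly 0.44.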
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .0587697 .052734 1.11 0.265 -.0447355 .1622749
_cons | .6227924 .0374645 16.62 0.000 .5492579 .6963268
------------------------------------------------------------------------------
Predicted crra values
. regress crra student
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
student | .435723 .088916 4.90 0.000 .2612007 .6102454
_cons | .611252 .0273426 22.36 0.000 .5575845 .6649194
------------------------------------------------------------------------------
Predicted crra values
. regress crra young
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
young | .3585009 .0685339 5.23 0.000 .223984 .4930178
_cons | .5901624 .0285679 20.66 0.000 .5340898 .6462349
------------------------------------------------------------------------------
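With a 0/1 regressor the fitted line takes only two values: $b_0$ for observations with the dummy equal to 0 and $b_0 + b_1$ for those equal to 1. The slope $b_1$ is therefore the difference in mean CRRA between the two groups. The sketch below copies the coefficients from the `student` and `young` outputs above; the dictionary and function names are illustrative.

```python
# (_cons, dummy coefficient) copied from the Stata outputs above
MODELS = {
    "student": (0.611252, 0.435723),
    "young": (0.5901624, 0.3585009),
}

def predicted_values(b0, b1):
    """Fitted crra for dummy = 0 and for dummy = 1."""
    return b0, b0 + b1

for name, (b0, b1) in MODELS.items():
    pred_0, pred_1 = predicted_values(b0, b1)
    print(f"{name}=0: {pred_0:.4f}  {name}=1: {pred_1:.4f}  difference: {pred_1 - pred_0:.4f}")
```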
• We want to:
• quantify the sampling error associated with $\hat\beta_1$
• test hypotheses such as β1 = 0
• construct a confidence interval for β1
Distribution of the sample average
• Sampling distribution of sample mean
• Yi are i.i.d. from the population
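• Sample mean: $\bar Y = (Y_1 + Y_2 + \dots + Y_n)/n = \frac{1}{n}\sum_{i=1}^{n} Y_i$, with $E(\bar Y) = \mu_Y$
• Variance of the sample mean: $var(\bar Y) = \sigma^2_{\bar Y} = \sigma^2_Y / n$
[Figure: Sampling distribution, sample size = 100 observations, ~ N(200, 5)]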
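The two facts $E(\bar Y) = \mu_Y$ and $var(\bar Y) = \sigma^2_Y / n$ can be checked by simulation. This is a minimal sketch under our own assumptions: a normal population with mean 200 and standard deviation 50, so that with n = 100 the standard deviation of $\bar Y$ should be $50/\sqrt{100} = 5$. These population values are chosen for illustration and are not taken from the figure's data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

MU, SIGMA = 200.0, 50.0   # assumed population mean and standard deviation
N, REPS = 100, 5000       # sample size and number of simulated samples

# Draw REPS samples of size N and record each sample mean
means = [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

print("mean of sample means:", round(statistics.fmean(means), 2))  # close to MU
print("sd of sample means:  ", round(statistics.stdev(means), 2))  # close to SIGMA / sqrt(N) = 5
```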
• What is $E(\hat\beta_1)$?
• If $E(\hat\beta_1) = \beta_1$, then the mean (expected value) of the sampling distribution equals the true population parameter, and OLS is unbiased.
• What is $var(\hat\beta_1)$?
• We need the variance (standard deviation) of the sampling distribution to carry out hypothesis tests. The estimated standard deviation of the sampling distribution of the estimator is called the standard error, $SE(\hat\beta_1)$.
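The same idea applies to $\hat\beta_1$: drawing many samples from a known model shows that the OLS slope estimates center on the true β1 (unbiasedness), and that their spread matches the large-sample formula $\sigma_u / \sqrt{n \, \sigma^2_X}$ under homoskedasticity. The model below ($\beta_0 = 1.15$, $\beta_1 = -0.011$, X uniform on [20, 80], normal errors with standard deviation 0.5) is a hypothetical choice made only for this sketch.

```python
import math
import random
import statistics

random.seed(2)  # reproducible draws

# Assumed population model: Y = B0 + B1 * X + u, with u ~ N(0, SIGMA_U^2)
B0, B1, SIGMA_U = 1.15, -0.011, 0.5
N, REPS = 200, 2000

def ols_slope(x, y):
    """OLS slope estimate for a simple regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

slopes = []
for _ in range(REPS):
    x = [random.uniform(20, 80) for _ in range(N)]  # var(X) = 60^2 / 12 = 300
    y = [B0 + B1 * xi + random.gauss(0, SIGMA_U) for xi in x]
    slopes.append(ols_slope(x, y))

# Large-sample standard deviation of b1 under homoskedasticity
theory_sd = SIGMA_U / math.sqrt(N * 300)
print("mean of b1:", statistics.fmean(slopes))  # close to B1
print("sd of b1:  ", statistics.stdev(slopes), " theory:", theory_sd)
```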
General setup
The null hypothesis assigns a specific value, $\beta_{1,0}$, to the population parameter β1: $H_0: \beta_1 = \beta_{1,0}$.
• To test $H_0$, compute $t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$
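Applying the t-statistic with $\beta_{1,0} = 0$ to the estimates and standard errors reported in the Stata outputs reproduces the `t` column. The sketch below copies those numbers; the names `ESTIMATES` and `t_statistic` are illustrative.

```python
# (coefficient, standard error, t reported by Stata) copied from the outputs above
ESTIMATES = {
    "age": (-0.0109613, 0.0018155, -6.04),
    "female": (0.0587697, 0.052734, 1.11),
    "student": (0.435723, 0.088916, 4.90),
    "young": (0.3585009, 0.0685339, 5.23),
}

def t_statistic(b1, se, beta_null=0.0):
    """t = (b1 - beta_null) / SE(b1)."""
    return (b1 - beta_null) / se

for name, (b1, se, t_stata) in ESTIMATES.items():
    print(f"{name:8s} t = {t_statistic(b1, se):6.2f}   Stata: {t_stata:6.2f}")
```

Setting `beta_null` to a nonzero value tests other null hypotheses with the same formula.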
Standard normal distribution
[Figure: Standard normal distribution; x-axis: z-value, y-axis: Density]
Transformation of normal distributions
[Figure: Normal distributions; x-axis: z-value, y-axis: Density]
Student t-distribution with large degrees of freedom
[Figure: Student t-distribution; x-axis: t-value, y-axis: Density]
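Test statistic: Student t-distribution
• Construct the t-statistic $t = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2_{\hat\beta_1}}} = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$
• You reject the null hypothesis at the 5% significance level if the p-value < 0.05.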
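The p-value rule can be sketched with the large-sample normal approximation, two-sided p = Pr(|Z| > |t|). Stata's `P>|t|` column uses the t-distribution with n − 2 degrees of freedom, so this approximation can differ in the third decimal for small samples. The t-values are recomputed from the age and female outputs; the function name is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def two_sided_p(t):
    """Large-sample two-sided p-value: Pr(|Z| > |t|)."""
    return 2 * Z.cdf(-abs(t))

# t-statistics computed as coefficient / standard error from the outputs above
for name, t in [("age", -0.0109613 / 0.0018155), ("female", 0.0587697 / 0.052734)]:
    p = two_sided_p(t)
    print(f"{name}: t = {t:.2f}, p = {p:.3f}, reject H0 at 5%: {p < 0.05}")
```

For female this gives p ≈ 0.265, matching the reported value, so H0: β1 = 0 is not rejected. For age, p is far below 0.05.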
T-test and 95% confidence interval
. regress crra student
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
student | .435723 .088916 4.90 0.000 .2612007 .6102454
_cons | .611252 .0273426 22.36 0.000 .5575845 .6649194
------------------------------------------------------------------------------
T-test and 95% confidence interval
. regress crra young
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
young | .3585009 .0685339 5.23 0.000 .223984 .4930178
_cons | .5901624 .0285679 20.66 0.000 .5340898 .6462349
------------------------------------------------------------------------------
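A 95% confidence interval is $b_1 \pm c \cdot SE(b_1)$. This sketch uses the large-sample critical value c ≈ 1.96 from the standard normal. Stata uses a slightly larger t critical value with n − 2 degrees of freedom, so the endpoints differ from the table only in the fourth decimal. The helper name `conf_int_95` is illustrative.

```python
from statistics import NormalDist

def conf_int_95(b, se):
    """Large-sample 95% confidence interval: b +/- 1.96 * SE."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return b - z * se, b + z * se

for name, b, se in [("student", 0.435723, 0.088916), ("young", 0.3585009, 0.0685339)]:
    lo, hi = conf_int_95(b, se)
    print(f"{name}: [{lo:.4f}, {hi:.4f}], excludes 0: {lo > 0 or hi < 0}")
```

Both intervals exclude 0, which is consistent with rejecting H0: β1 = 0 at the 5% level.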
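One-sided tests of continuous independent variable
• Linear regression model with age as only independent variable
------------------------------------------------------------------------------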
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0109613 .0018155 -6.04 0.000 -.0145248 -.0073978
_cons | 1.14752 .0857724 13.38 0.000 .9791665 1.315873
------------------------------------------------------------------------------
• H0: β1 = 0
• H1: β1 > 0
• Critical t-value = 1.64 at 5% level
• p-value: Pr(Z > −6.04) = 1 − Pr(Z < −6.04) > 0.9999, so H0 is not rejected (−6.04 < 1.64)
• H1: β1 < 0
• Critical t-value = −1.64 at 5% level
• p-value: Pr(Z < −6.04) < 0.0001, so H0 is rejected (−6.04 < −1.64)
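The one-sided calculations for age can be sketched with the standard normal: the 5% one-sided critical value is the 95th percentile, about 1.645 (rounded to 1.64 above). The p-value is the area in the tail named by H1. The t-value is recomputed from the coefficient and standard error in the output.

```python
from statistics import NormalDist

Z = NormalDist()
t = -0.0109613 / 0.0018155  # age: coefficient / standard error, about -6.04

crit = Z.inv_cdf(0.95)  # one-sided 5% critical value, about 1.645
p_greater = 1 - Z.cdf(t)  # H1: beta1 > 0, so p = Pr(Z > t)
p_less = Z.cdf(t)         # H1: beta1 < 0, so p = Pr(Z < t)

print(f"critical value: {crit:.3f}")
print(f"H1: beta1 > 0  p = {p_greater:.4f}  reject: {t > crit}")
print(f"H1: beta1 < 0  p = {p_less:.2e}  reject: {t < -crit}")
```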
One-sided tests of discrete independent variable
• Linear regression model with female as only independent variable
------------------------------------------------------------------------------
crra | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .0587697 .052734 1.11 0.265 -.0447355 .1622749
_cons | .6227924 .0374645 16.62 0.000 .5492579 .6963268
------------------------------------------------------------------------------
• H0: β1 = 0
• H1: β1 > 0
• Critical t-value = 1.64 at 5% level
• p-value: Pr(Z > 1.11) = 1 − Pr(Z < 1.11) = 1 − 0.8665 = 0.1335, so H0 is not rejected (1.11 < 1.64)
• H1: β1 < 0
• Critical t-value = −1.64 at 5% level
• p-value: Pr(Z < 1.11) = 0.8665, so H0 is not rejected (1.11 > −1.64)
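Stata's `P>|t|` column is a two-sided p-value. Because the t- and z-distributions are symmetric, a one-sided p-value can be read off it. It equals half the two-sided value when the estimate has the sign predicted by H1, and one minus that half otherwise. The sketch applies this to the female output. The function name is illustrative, and the small gap from 0.1335 / 0.8665 above comes from rounding t to 1.11.

```python
def one_sided_from_two_sided(p_two, estimate, h1_positive):
    """Convert a two-sided p-value into a one-sided p-value.

    If the estimate has the sign predicted by H1, p = p_two / 2;
    otherwise p = 1 - p_two / 2 (valid for symmetric t or z distributions).
    """
    half = p_two / 2
    sign_matches = (estimate > 0) == h1_positive
    return half if sign_matches else 1 - half

b_female, p_two_female = 0.0587697, 0.265  # from the Stata output (P>|t| column)
print(one_sided_from_two_sided(p_two_female, b_female, h1_positive=True))   # H1: beta1 > 0
print(one_sided_from_two_sided(p_two_female, b_female, h1_positive=False))  # H1: beta1 < 0
```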
z-distribution
Standard normal distribution:
• Referred to as the z-distribution
• The cumulative standard normal distribution is tabulated in Table 1 of S&W
• z-test: testing the mean of a population against a null hypothesis (hypothesized value)
• One- and two-tailed tests
• Can use the z-distribution for any n if the population distribution is
normal and the standard deviation is known
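When σ is known, the z-test statistic is $z = (\bar Y - \mu_0) / (\sigma / \sqrt{n})$. The one- and two-tailed p-values come from the standard normal. A minimal sketch follows. The function name and the numbers (sample mean 209.8, n = 100, σ = 50, H0: μ = 200) are hypothetical and chosen so that z = 1.96.

```python
import math
from statistics import NormalDist

def z_test(y_bar, mu_0, sigma, n):
    """z-test of H0: mu = mu_0 when the population standard deviation sigma is known."""
    z = (y_bar - mu_0) / (sigma / math.sqrt(n))
    Z = NormalDist()
    return {
        "z": z,
        "p_two_sided": 2 * Z.cdf(-abs(z)),
        "p_greater": 1 - Z.cdf(z),  # H1: mu > mu_0
        "p_less": Z.cdf(z),         # H1: mu < mu_0
    }

# Hypothetical numbers: sample mean 209.8 from n = 100 draws, sigma = 50, H0: mu = 200
result = z_test(209.8, 200.0, 50.0, 100)
print(result)
```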
t-distribution
Student t-distribution:
• Referred to as the t-distribution
• Used when the standard deviation of the population distribution is not known
• t-test: testing the mean of a population against a null hypothesis (hypothesized value)
• One- and two-tailed tests
• The t-distribution approaches the z-distribution as n grows large
• Can use the t-test for small samples (n < 30) if the population distribution is approximately normal
• Critical values for two-sided and one-sided tests using the t-distribution are given in Table 2 of S&W
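In the t-test the sample standard deviation s replaces the unknown σ, giving $t = (\bar Y - \mu_0) / (s / \sqrt{n})$. The result is compared with a t critical value for n − 1 degrees of freedom. This sketch uses a hypothetical sample of 10 observations and H0: μ = 0.6. It hard-codes the two-sided 5% critical value for 9 degrees of freedom, 2.262, taken from a standard t table. That value is larger than the z-distribution's 1.96.

```python
import math
import statistics

# Hypothetical sample of n = 10 observations; H0: mu = 0.6
sample = [0.8, 1.2, 0.5, 0.9, 1.4, 0.7, 1.1, 0.6, 1.0, 0.8]
mu_0 = 0.6

n = len(sample)
s = statistics.stdev(sample)  # sample sd (divisor n - 1) replaces the unknown sigma
t = (statistics.fmean(sample) - mu_0) / (s / math.sqrt(n))

T_CRIT_DF9 = 2.262  # two-sided 5% critical value for 9 degrees of freedom (t table)
print(f"t = {t:.2f}, reject H0 at 5%: {abs(t) > T_CRIT_DF9}")
```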
Note: Heteroskedasticity
• Homoskedasticity
• var(u|X=x) is constant – the variance of the conditional
distribution of u given X does not depend on X
• Heteroskedasticity
• var(u|X=x) is not constant – the variance of the conditional
distribution of u given X depends on X
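The difference between the two cases can be made concrete by simulating errors whose standard deviation is either constant or proportional to X. We then compare the variance of u among small-X and large-X observations. The error model (sd = 1, or sd = 0.2·x) and the X ranges are hypothetical choices for this sketch.

```python
import random
import statistics

random.seed(3)  # reproducible draws

# Hypothetical X values in two ranges: small X and large X
X_LOW = [random.uniform(1, 3) for _ in range(5000)]
X_HIGH = [random.uniform(8, 10) for _ in range(5000)]

def error_sd(x, heteroskedastic):
    """Assumed sd of u given X = x: constant, or proportional to x."""
    return 0.2 * x if heteroskedastic else 1.0

def conditional_variances(heteroskedastic):
    """Variance of simulated u among small-X and among large-X observations."""
    u_low = [random.gauss(0, error_sd(x, heteroskedastic)) for x in X_LOW]
    u_high = [random.gauss(0, error_sd(x, heteroskedastic)) for x in X_HIGH]
    return statistics.pvariance(u_low), statistics.pvariance(u_high)

for hetero in (False, True):
    low, high = conditional_variances(hetero)
    print(f"heteroskedastic={hetero}: var(u|X small) = {low:.2f}, var(u|X large) = {high:.2f}")
```

Under homoskedasticity both conditional variances are close to 1. Under heteroskedasticity the variance for large X is many times larger.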
Example of heteroskedasticity
We implicitly allow for heteroskedasticity
Recall the three least squares assumptions:
1. E(u|X=x) = 0
2. (Xi,Yi), i=1,…,n, are i.i.d.
3. Large outliers are rare
• The formula for the variance of $\hat\beta_1$ (and hence the OLS standard error) simplifies under homoskedasticity:
$var(\hat\beta_1) = \frac{var[(X_i - \mu_X) u_i]}{n(\sigma_X^2)^2}$ (general formula)
$var(\hat\beta_1) = \frac{\sigma_u^2}{n \sigma_X^2}$ (simplification if u is homoskedastic)
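The sample analogues of the general and homoskedastic variance formulas give the heteroskedasticity-robust and the homoskedasticity-only standard errors. This sketch computes both for simulated data in which the error standard deviation grows with |X|. In that setting the homoskedasticity-only SE understates the sampling uncertainty. The function name and data-generating process are hypothetical. To our understanding, the robust formula's n/(n − 2) scaling corresponds to what Stata reports with `regress y x, vce(robust)` in a simple regression, but this sketch is not that implementation.

```python
import math
import random

random.seed(4)  # reproducible draws

def ols_with_ses(x, y):
    """Simple OLS: return b1, homoskedasticity-only SE, heteroskedasticity-robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: s_u^2 / sum (X_i - X_bar)^2
    s2_u = sum(r * r for r in resid) / (n - 2)
    se_homo = math.sqrt(s2_u / sxx)

    # Robust: (1/n) * [1/(n-2) * sum dx^2 u^2] / [(1/n) * sum dx^2]^2
    num = sum(d * d * r * r for d, r in zip(dx, resid)) / (n - 2)
    se_robust = math.sqrt(num / n / (sxx / n) ** 2)
    return b1, se_homo, se_robust

# Hypothetical heteroskedastic data: the error sd grows with |X|
n = 2000
x = [random.uniform(-1, 1) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 0.1 + abs(xi)) for xi in x]

b1, se_homo, se_robust = ols_with_ses(x, y)
print(f"b1 = {b1:.3f}, homoskedastic SE = {se_homo:.4f}, robust SE = {se_robust:.4f}")
```

Under homoskedastic errors the two standard errors would be close. The robust formula remains valid in both cases, which is why it is the safer default.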