CH 1 and 2
Uploaded by Milkessa Seyoum

Jeffrey M. Wooldridge: Introductory Econometrics: A Modern Approach, 2nd Edition

Chapter 1
The Nature of Econometrics and Economic Data

1.1 What Is Econometrics?


Econometrics has evolved as a separate discipline from mathematical statistics because the
former focuses on the problems inherent in collecting and analyzing non-experimental economic
data. Nonexperimental data are not accumulated through controlled experiments on individuals,
firms, or segments of the economy. (Nonexperimental data are sometimes called observational
data to emphasize the fact that the researcher is a passive collector of the data.) Experimental
data are often collected in laboratory environments in the natural sciences, but they are much
more difficult to obtain in the social sciences. While some social experiments can be devised, it
is often impossible, prohibitively expensive, or morally repugnant to conduct the kinds of
controlled experiments that would be needed to address economic issues.

Naturally, econometricians have borrowed from mathematical statisticians whenever possible.


The method of multiple regression analysis is the mainstay in both fields, but its focus and
interpretation can differ markedly. In addition, economists have devised new techniques to deal
with the complexities of economic data and to test the predictions of economic theories.

1.2 Steps in Empirical Economic Analysis


Econometric methods are relevant in virtually every branch of applied economics. They come
into play either when we have an economic theory to test or when we have a relationship in mind
that has some importance for business decisions or policy analysis. An empirical analysis uses
data to test a theory or to estimate a relationship.

Formulation of the question of interest: The first step in any empirical analysis is the
careful formulation of the question of interest. The question might deal with testing a certain
aspect of an economic theory, or it might pertain to testing the effects of a government policy. In
principle, econometric methods can be used to answer a wide range of questions.

Economic Model Development: An economic model consists of mathematical equations


that describe various relationships. Economists are well-known for their building of models to
describe a vast array of behaviors. Formal economic modeling is sometimes the starting point for
empirical analysis, but it is more common to use economic theory less formally, or even to rely
entirely on intuition.

Econometric Model Development: After we specify an economic model, we need to turn it


into what we call an econometric model.


Hypotheses Formulation: Once an econometric model has been specified, various


hypotheses of interest can be stated in terms of the unknown parameters.

Empirical analysis: An empirical analysis, by definition, requires data. After data on the
relevant variables have been collected, econometric methods are used to estimate the parameters
in the econometric model and to formally test hypotheses of interest.

Inference and Prediction: In some cases, the econometric model is used to make predictions
in either the testing of a theory or the study of a policy’s impact.

1.3 The Structure of Economic Data


Economic data sets come in a variety of types.

Cross-Sectional Data:
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a given point in time. An important feature of
cross-sectional data is that we can often assume that they have been obtained by random
sampling from the underlying population.

Sometimes random sampling is not appropriate as an assumption for analyzing cross-sectional


data. For example, suppose we are interested in studying factors that influence the accumulation
of family wealth. We could survey a random sample of families, but some families might refuse
to report their wealth. If, for example, wealthier families are less likely to disclose their wealth,
then the resulting sample on wealth is not a random sample from the population of all families.
This is an illustration of a sample selection problem.

Another violation of random sampling occurs when we sample from units that are large relative
to the population, particularly geographical units. The potential problem in such cases is that the
population is not large enough to reasonably assume the observations are independent draws.

Cross-sectional data are widely used in economics and other social sciences. In economics, the
analysis of cross-sectional data is closely aligned with the applied microeconomics fields, such
as labor economics, state and local public finance, industrial organization, urban economics,
demography, and health economics. Data on individuals, households, firms, and cities at a given
point in time are important for testing microeconomic hypotheses and evaluating economic
policies.


Time Series Data:


A time series data set consists of observations on a variable or several variables over time.
Examples of time series data include stock prices, money supply, consumer price index, gross
domestic product, annual homicide rates, and automobile sales figures. Because past events can
influence future events and lags in behavior are prevalent in the social sciences, time is an
important dimension in a time series data set. Unlike the arrangement of cross-sectional data, the
chronological ordering of observations in a time series conveys potentially important
information.

A key feature of time series data that makes it more difficult to analyze than cross-sectional data
is the fact that economic observations can rarely, if ever, be assumed to be independent across
time. Most economic and other time series are related, often strongly related, to their recent
histories. For example, knowing something about the gross domestic product from last quarter
tells us quite a bit about the likely range of the GDP during this quarter, since GDP tends to
remain fairly stable from one quarter to the next. While most econometric procedures can be
used with both cross-sectional and time series data, more needs to be done in specifying
econometric models for time series data before standard econometric methods can be justified.

Another feature of time series data that can require special attention is the data frequency at
which the data are collected. In economics, the most common frequencies are daily, weekly,
monthly, quarterly, and annually. Many weekly, monthly, and quarterly economic time series
display a strong seasonal pattern, which can be an important factor in a time series analysis. For
example, monthly data on housing starts differs across the months simply due to changing
weather conditions.

Pooled Cross Sections:


Some data sets have both cross-sectional and time series features. For example, suppose that two
cross-sectional household surveys are taken in the United States, one in 1985 and one in 1990. In
1985, a random sample of households is surveyed for variables such as income, savings, family
size, and so on. In 1990, a new random sample of households is taken using the same survey
questions. In order to increase our sample size, we can form a pooled cross section by combining
the two years. Because random samples are taken in each year, it would be a fluke if the same
household appeared in the sample during both years. This important factor distinguishes a pooled
cross section from a panel data set.

Pooling cross sections from different years is often an effective way of analyzing the effects of a
new government policy. The idea is to collect data from the years before and after a key policy
change. A pooled cross section is analyzed much like a standard cross section, except that we
often need to account for secular differences in the variables across time. In fact, in addition
to increasing the sample size, the point of a pooled cross-sectional analysis is often to see how a
key relationship has changed over time.

Panel or Longitudinal Data:


A panel data (or longitudinal data) set consists of a time series for each cross-sectional member
in the data set. As an example, suppose we have wage, education, and employment history for a
set of individuals followed over a ten-year period. Or we might collect information, such as
investment and financial data, about the same set of firms over a five-year time period. Panel
data can also be collected on geographical units.

The key feature of panel data that distinguishes it from a pooled cross section is the fact that the
same cross-sectional units (individuals, firms, or counties) are followed over a given time period.

Because panel data require replication of the same units over time, panel data sets, especially
those on individuals, households, and firms, are more difficult to obtain than pooled cross
sections. Not surprisingly, observing the same units over time leads to several advantages over
cross-sectional data or even pooled cross-sectional data. The benefit that we will focus on in this
text is that having multiple observations on the same units allows us to control for certain
unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than
one observation can facilitate causal inference in situations where inferring causality would be
very difficult if only a single cross section were available. A second advantage of panel data is
that it often allows us to study the importance of lags in behavior or the result of decision
making. This information can be significant since many economic policies can be expected to
have an impact only after some time has passed.

1.4 Causality and the Notion of Ceteris Paribus in Econometric Analysis


In most tests of economic theory, and certainly for evaluating public policy, the economist’s goal
is to infer that one variable has a causal effect on another variable (such as crime rate or worker
productivity). Simply finding an association between two or more variables might be suggestive,
but unless causality can be established, it is rarely compelling.

The notion of ceteris paribus—which means “other (relevant) factors being equal”—plays an
important role in causal analysis. Most economic questions are ceteris paribus by nature. For
instance, in the theory of demand, if other factors are not held fixed, then we cannot know the
causal effect of a price change on quantity demanded. Holding other factors fixed is critical for
policy analysis as well.

While this may seem pretty simple, even at this early stage it should be clear that, except in very
special cases, it will not be possible to literally hold all else equal. The key question in most
empirical studies is: Have enough other factors been held fixed to make a case for causality?
Rarely is an econometric study evaluated without raising this issue. In most serious applications,
the number of factors that can affect the variable of interest is immense, and the isolation of any
particular variable may seem like a hopeless effort. However, we will eventually see that, when
carefully applied, econometric methods can simulate a ceteris paribus experiment. Even when
economic theories are not most naturally described in terms of causality, they often have
predictions that can be tested using econometric methods.


Chapter 2
The Simple Regression Model

The simple regression model can be used to study the relationship between two variables.

2.1 Definition of the Simple Regression Model


Much of applied econometric analysis begins with the following premise: y and x are two
variables, representing some population, and we are interested in “explaining y in terms of x,” or
in “studying how y varies with changes in x.” In writing down a model that will “explain y in
terms of x,” we must confront three issues.

 First, since there is never an exact relationship between two variables, how do we allow
for other factors to affect y?
 Second, what is the functional relationship between y and x? And
 Third, how can we be sure we are capturing a ceteris paribus relationship between y and
x (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation (the population regression) relating y
to x. A simple equation is

y = β0 + β1x + u.

Such a mathematical equation relating two variables in a population is known as

 the simple linear regression model or


 the two-variable linear regression model or
 bivariate linear regression model because it relates the two variables x and y.

The variables y and x have several different names used interchangeably. The terms “dependent
variable” and “independent variable” are frequently used in econometrics. But be aware that the
label “independent” here does not refer to the statistical notion of independence between random
variables. In statistics, random variables X and Y are said to be independent if the probability
of occurrence of X does not affect the probability of occurrence of Y, and vice versa.


(Other names for x include the explanatory variable, control variable, regressor, and covariate;
other names for y include the explained variable, response variable, and regressand.)

In the population regression function

y = β0 + β1x + u,

the variable u, called the error term or disturbance (not the residual), represents
factors other than x that affect y. A simple regression analysis effectively treats all factors
affecting y other than x as being unobserved. You can usefully think of u as standing for
“unobserved.” If the other factors in u are held fixed, so that the change in u is zero, ∆u = 0, then
x has a linear effect on y:

∆y = β1∆x  if  ∆u = 0.

Thus, the change in y is simply β1 multiplied by the change in x. This means that β1 is the slope
parameter in the relationship between y and x holding the other factors in u fixed. The linearity
implies that a one-unit change in x has the same effect on y, regardless of the initial value of x.
This is unrealistic for many economic applications. For example, in a wage-education function,
we might want to allow for increasing returns: the next year of education has a larger effect on
wages than did the previous year.

The intercept parameter β0 also has its uses, although it is rarely central to an analysis.

The most difficult issue to address is whether the model

y = β0 + β1x + u

really allows us to draw ceteris paribus conclusions about how x affects y. We just saw that β1
does measure the effect of x on y, holding all other factors (in u) fixed. Is this the end of the
causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus
effect of x on y, holding other factors fixed, when we are ignoring all those other factors?

We are only able to get reliable estimators of β0 and β1 from a random sample of data when we
make an assumption restricting how the unobservable u is related to the explanatory variable x.
Without such a restriction, we will not be able to estimate the ceteris paribus effect, β1,
because u and x are random variables. As long as the intercept β0 is included in the equation,
nothing is lost by assuming that the average value of u in the population is zero:

E(u) = 0.

This is a harmless normalization of the intercept; it says nothing about the shape of the
distribution of u.

A natural measure of the association between two random variables is the correlation coefficient.
The correlation coefficient between x and u is given by

Corr(x, u) = Cov(x, u) / [sd(x) sd(u)].

If u and x are uncorrelated, Corr(x, u) = 0, then, as random variables, they are not linearly
related (they may still be related, just not linearly). It is possible for u to be uncorrelated
with x while being correlated with functions of x, such as x².

Because correlation captures only linear dependence, we rely on the conditional expectation
E(u|x), not just the unconditional expectation E(u). Because u and x are random variables, we can
define the conditional distribution of u given any value of x. In particular, for any x, we can
obtain the expected (or average) value of u for that slice of the population described by the
value of x. If u and x are unrelated (the average of u does not depend on the value of x), then

E(u|x) = E(u) = 0,

which is the zero conditional mean assumption.

The zero conditional mean assumption has another interpretation that is often useful. Taking the
expected value of y = β0 + β1x + u conditional on x, and using E(u|x) = 0, we get

E(y|x) = β0 + β1x,

the population regression function. The equation shows that E(y|x) is a linear function of x. For
example, if x is education and y is wage, E(y|x) gives the average wage at each level of
education. (Graphically, the population regression function is a straight line with slope β1.)


When E(u|x) = E(u) = 0, we can break y into two components:

1) β0 + β1x is sometimes called the systematic part of y, that is, the part of y
explained by x; and
2) u is called the unsystematic part, or the part of y not explained by x.

2.2 Deriving the Ordinary Least Squares Estimates

In the population regression function y = β0 + β1x + u, how are β0 and β1 estimated? They are
estimated from a sample. For sample observation i we can write

yi = β0 + β1xi + ui.

How do we use the sample data to obtain estimates of the intercept and slope in the population
regression? One way of finding the parameters is the OLS estimator. In the population regression,
β0 and β1 are the parameters that minimize the expected square of the error term (disturbance);
in the sample regression line, the estimates β̂0 and β̂1 are the values that minimize the sum of
squared residuals.


Or, using expectations, the two population moment conditions are

E(u) = 0  and  E(xu) = 0,

which can be written as E(y − β0 − β1x) = 0 and E[x(y − β0 − β1x)] = 0.


So, using the method of moments approach to estimation, we can solve for β̂0 and β̂1 from the
sample analogues of the two moment conditions. Using the summation operator,

(1/n) Σ (yi − β̂0 − β̂1xi) = 0
(1/n) Σ xi(yi − β̂0 − β̂1xi) = 0.

The first equation gives β̂0 = ȳ − β̂1x̄. Plugging this into the second equation and rearranging,

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)².

So the slope estimate β̂1 is simply the sample covariance between x and y divided by the sample
variance of x. If x and y are positively correlated in the sample, then β̂1 is positive; and if x
and y are negatively correlated, then β̂1 is negative.
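As a quick illustration (with made-up numbers, not data from the text), the covariance/variance formulas above can be computed directly:

```python
# Sketch: computing the OLS slope and intercept "by hand" on toy data,
# using beta1_hat = sample cov(x, y) / sample var(x) and
# beta0_hat = ybar - beta1_hat * xbar.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
var_x = sum((xi - xbar) ** 2 for xi in x)

beta1_hat = cov_xy / var_x            # slope estimate
beta0_hat = ybar - beta1_hat * xbar   # intercept estimate

print(beta0_hat, beta1_hat)  # approximately 0.14 and 1.96
```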

The ordinary least squares estimators using matrix notation

Suppose there is only one independent variable, x, with β0 and β1 the intercept and slope
parameters. Stacking the observations,

y=Xβ+u

Residual Sum of Squares (RSS) is


RSS = u'u = (y − Xβ)'(y − Xβ)

RSS = (y' − β'X')(y − Xβ)      [transposing a product reverses the order: (Xβ)' = β'X']

RSS = y'y − y'Xβ − β'X'y + β'X'Xβ.

To find the estimators, we differentiate RSS with respect to β and set the derivative to zero:

∂RSS/∂β = −X'y − X'y + 2X'Xβ = 0      [y'Xβ is a scalar, so y'Xβ = β'X'y; 0 is a vector]

−2X'y + 2X'Xβ = 0

2X'Xβ = 2X'y

X'Xβ = X'y

(X'X)⁻¹X'Xβ = (X'X)⁻¹X'y

Iβ = (X'X)⁻¹X'y

β = (X'X)⁻¹X'y,      where X'X is assumed to be non-singular (invertible).

What is the variance of β?

Let A be a non-stochastic (constant) matrix and z a random vector. Then Var(Az) = A Var(z) A'.
Recall also that the transpose of a product reverses the order, (XY)' = Y'X', and that the
inverse of a transpose is the transpose of the inverse.

Var(β) = Var[(X'X)⁻¹X'y]      [the regressor matrix X is treated as fixed: we condition on X]

Var(β) = (X'X)⁻¹X' Var(y) X(X'X)⁻¹,      where Var(y) = σ²I

Var(β) = (X'X)⁻¹X' σ²I X(X'X)⁻¹      [the scalar σ² can be moved to the front]

Var(β) = σ² (X'X)⁻¹X'X(X'X)⁻¹

Var(β) = σ² (X'X)⁻¹
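A minimal pure-Python sketch of β = (X'X)⁻¹X'y for the two-parameter case (intercept plus one regressor), on the same made-up data as before; the 2×2 inverse is written out explicitly so no linear-algebra library is needed:

```python
# Matrix OLS: beta = (X'X)^{-1} X'y, where X = [1, x] column-wise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

# Build X'X (2x2) and X'y (2x1) from sums.
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sy = sum(y)
XtX = [[n, sx], [sx, sxx]]
Xty = [sy, sxy]

# Invert the 2x2 matrix X'X (assumed non-singular, i.e. x is not constant).
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]

# beta_hat = (X'X)^{-1} X'y
beta_hat = [inv[0][0] * Xty[0] + inv[0][1] * Xty[1],
            inv[1][0] * Xty[0] + inv[1][1] * Xty[1]]
print(beta_hat)  # approximately [0.14, 1.96], matching the covariance/variance formulas
```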


The OLS regression line, fitted-values line, or sample regression function is given by

ŷ = β̂0 + β̂1x,

where the hats show that the values are estimates.

 β̂0, the intercept estimate, is the predicted value of y when x is zero.

 β̂1, the slope estimate, shows by how much ŷ changes when x changes by one unit.

2.3 Mechanics of OLS


Here we cover some algebraic properties of the fitted OLS regression line. These properties are
features of OLS for a particular sample of data. They can be contrasted with the statistical
properties of OLS, which require deriving features of the sampling distributions of the
estimators.

Fitted Values and Residuals


The intercept and slope estimates β̂0 and β̂1 have been obtained for the given sample of data.

Given β̂0 and β̂1, we can obtain the fitted value ŷi = β̂0 + β̂1xi for each observation.

Each fitted value ŷi is on the OLS regression line. The OLS residual associated with
observation i is the difference between yi and its fitted value: ûi = yi − ŷi.

 If ûi is positive, the line under-predicts yi;

 if ûi is negative, the line over-predicts yi.
 The ideal case for observation i is ûi = 0, but in practice the residuals are rarely all
exactly zero.

Algebraic Properties of OLS Statistics


1. The sum, and therefore the sample average, of the OLS residuals is zero:

Σ ûi = 0.

It follows immediately from the OLS first-order conditions.

The OLS estimates are chosen to make the residuals add up to zero (for any data set).

2. The sample covariance between the regressor and the OLS residuals is zero:

Σ xiûi = 0.

This also follows from the first-order conditions, which can be written in terms of the residuals.

3. The point (x̄, ȳ) is always on the OLS regression line.

In other words, if we plug x̄ in for x in ŷ = β̂0 + β̂1x, then the predicted value is ȳ. This is
exactly what the first-order condition β̂0 = ȳ − β̂1x̄ shows.

Now we can write each yi as its fitted value plus its residual:

yi = ŷi + ûi.

 From property (1) above, the average of the residuals is zero; equivalently, the sample
average of the fitted values ŷi is the same as the sample average of the yi, namely ȳ.
 Further, properties (1) and (2) can be used to show that the sample covariance between ŷi
and ûi is zero (ŷi is a linear function of xi, and the sample covariance between xi and ûi is
zero).
 Thus, we can view OLS as decomposing each yi into two parts, a fitted value ŷi and a
residual ûi. The fitted values and residuals are uncorrelated in the sample.
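A quick numerical check of properties (1)-(3), again on made-up data: fit OLS, then confirm the residuals sum to zero, are uncorrelated with x, and that (x̄, ȳ) lies on the fitted line.

```python
# Verify the algebraic properties of OLS on toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.5, 4.0, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print(abs(sum(resid)) < 1e-9)                               # property 1: residuals sum to zero
print(abs(sum(xi * ui for xi, ui in zip(x, resid))) < 1e-9)  # property 2: sum of x_i * u_hat_i is zero
print(abs((b0 + b1 * xbar) - ybar) < 1e-9)                   # property 3: (xbar, ybar) is on the line
```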

The total variation of the yi from their mean is

Σ (yi − ȳ)².

This does not change if we add and subtract ŷi inside the square:

Σ (yi − ȳ)² = Σ [(yi − ŷi) + (ŷi − ȳ)]² = Σ [ûi + (ŷi − ȳ)]².

Squaring and summing, the cross term 2 Σ ûi(ŷi − ȳ) drops out, because the sample covariance
between the residuals and the fitted values is zero, leaving

Σ (yi − ȳ)² = Σ ûi² + Σ (ŷi − ȳ)².

Total sum of squares (TSS):

TSS = Σ (yi − ȳ)²

is a measure of the total sample variation in the yi; that is, it measures how spread out the yi
are in the sample. If we divide TSS by n − 1, we obtain the sample variance of y.

Explained sum of squares (ESS):

ESS = Σ (ŷi − ȳ)².

If we divide ESS by n − 1, we get the sample variation in the ŷi. The explained sum of squares is
sometimes called the “regression sum of squares” or “model sum of squares.”

Residual sum of squares (RSS):

RSS = Σ ûi².

If we divide RSS by n − 1, we measure the sample variation in the ûi. The residual sum of squares
is often called the “error sum of squares,” but that name is not ideal because the errors and the
residuals are different quantities.

The total variation in y can always be expressed as the sum of the explained variation and the
unexplained variation:

TSS = ESS + RSS.


Goodness-of-Fit
Goodness-of-fit measures how well the explanatory or independent variable, x, explains the
dependent variable, y. It is often useful to compute a number that summarizes how well the OLS
regression line fits the data. In the following discussion, remember that we assume an intercept
is estimated along with the slope.

Assuming that the total sum of squares, TSS, is not equal to zero (which is true except in the
very unlikely event that all the yi equal the same value), we can divide both sides of
TSS = ESS + RSS by TSS.

The R-squared of the regression (the coefficient of determination) is defined as

R² = ESS/TSS = 1 − RSS/TSS.

It is the fraction of the sample variation in y that is explained by x.

R² is between zero and one. When interpreting it, we usually multiply by 100 to convert it to a
percentage: 100·R² is the percentage of the sample variation in y that is explained by x.

If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case,
R² = 1. A value of R² that is nearly zero indicates a poor fit of the OLS line: very little of
the variation in the yi is captured by the variation in the ŷi (which all lie on the OLS
regression line). In fact, it can be shown that R² is equal to the square of the sample
correlation coefficient between yi and ŷi. This is where the term “R-squared” comes from, because
R denotes a correlation coefficient.
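This claim is easy to verify numerically. A sketch on toy data: compute R² = 1 − RSS/TSS and confirm it equals the squared sample correlation between y and the fitted values.

```python
import math

# Fit OLS on toy data, compute R^2, and compare with Corr(y, yhat)^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.5, 4.0, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

tss = sum((yi - ybar) ** 2 for yi in y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
r2 = 1 - rss / tss

# Squared sample correlation between y and yhat.
yhbar = sum(yhat) / n
num = sum((yi - ybar) * (fi - yhbar) for yi, fi in zip(y, yhat))
den = math.sqrt(sum((yi - ybar) ** 2 for yi in y) *
                sum((fi - yhbar) ** 2 for fi in yhat))

print(abs(r2 - (num / den) ** 2) < 1e-9)  # True: R^2 = Corr(y, yhat)^2
```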

2.4 Units of Measurement and Functional Form


Two important issues in applied economics are

(1) understanding how changing the units of measurement of the dependent and/or
independent variables affects OLS estimates and


(2) knowing how to incorporate popular functional forms used in economics into regression
analysis.

The Effects of Changing Units of Measurement on OLS Statistics


OLS estimates change in entirely expected ways when the units of measurement of the
dependent and independent variables change.

When only the dependent variable unit of measurement changes: Generally, it is easy to figure
out what happens to the intercept and slope estimates when the dependent variable changes units
of measurement. If the dependent variable is multiplied by the constant c—which means each
value in the sample is multiplied by c—then the OLS intercept and slope estimates are also
multiplied by c. (This assumes nothing has changed about the independent variable.)

When the independent variable’s units of measurement change: Generally, if the independent
variable is divided or multiplied by some nonzero constant c, then the OLS slope coefficient is
multiplied or divided by c, respectively. Changing the units of measurement of only the
independent variable does not affect the intercept, because the intercept is the predicted value
of y at x = 0, and rescaling x leaves the point x = 0 unchanged.

Change in the goodness of fit when the measurement of the dependent or independent variable
changes: the goodness-of-fit (R²) of the model does not depend on the units of measurement of
our variables.
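These rescaling rules can be checked numerically. A sketch on toy data: multiplying the dependent variable by c = 100 (say, converting dollars to cents) multiplies both OLS estimates by 100, while R² is unchanged.

```python
# Fit OLS before and after rescaling y by c = 100 and compare the results.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1 - rss / tss

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.5, 4.0, 6.0]

b0, b1, r2 = ols(x, y)
c0, c1, r2c = ols(x, [100 * yi for yi in y])  # dependent variable rescaled by c = 100

# Both coefficients scale by 100; R^2 is (numerically) unchanged.
print(round(c0 / b0, 6), round(c1 / b1, 6), abs(r2 - r2c) < 1e-12)
```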

Incorporating Nonlinearities in Simple Regression


So far we have focused on linear relationships between the dependent and independent variables.
Linear relationships are not nearly general enough for all economic applications. Fortunately, it
is rather easy to incorporate many nonlinearities into simple regression analysis by appropriately
defining the dependent and independent variables. Here we will cover two possibilities that often
appear in applied work.

When the dependent and independent variables are in level form: Example,
wage = β0 + β1educ + u. In this equation the effect of one additional year of education is the
same at every level of education: one more year of education increases wage by β1, regardless of
how many years of education came before.

When only the dependent variable is in logarithmic form: Example,
log(wage) = β0 + β1educ + u. Taking the change in the variables due to a one-unit change in
education gives

∆log(wage) = β1∆educ,   so   %∆wage ≈ (100·β1)∆educ.


This specification gives a constant percentage return to education: each additional year of
education raises the wage by (approximately) the same percentage, so the increase in the wage in
level terms grows with the level of the wage. Since the dependent variable is in logarithmic
form, to graph wage against education we convert back to level form by taking the exponential.

When both the dependent and the independent variables are in logarithmic form (the constant
elasticity model): log(y) = β0 + β1log(x) + u. This model falls under the simple regression model
and is estimated by OLS. It has a percentage interpretation. Taking the change in the variables,

∆log(y) = β1∆log(x),   so   %∆y ≈ β1(%∆x).

Units of measurement and the logarithmic form: What happens to the intercept and slope
estimates if we change the units of measurement of the dependent variable when it appears in
logarithmic form? Because the change to logarithmic form approximates a proportionate change,
it makes sense that nothing happens to the slope. We can see this by writing the rescaled variable
as c1yi for each observation i. The original equation is

log(yi) = β0 + β1xi + ui.


If we add log(c1) to both sides, we get

log(c1yi) = [log(c1) + β0] + β1xi + ui.

(Remember that the sum of the logs is equal to the log of their product: log(c1) + log(yi) =
log(c1yi).) Therefore, the slope is still β1, but the intercept is now log(c1) + β0.

Similarly, if the independent variable is log(x), and we change the units of measurement of x
before taking the log, the slope remains the same but the intercept changes.

Summary of functional form

The model with y as the dependent variable and x as the independent variable is called the level-
level model, because each variable appears in its level form. The model with log(y) as the
dependent variable and x as the independent variable is called the log-level model. We will not
explicitly discuss the level-log model here, because it arises less often in practice.

If we have the coefficient of a log-transformed dependent variable and want a level
interpretation, we “back out” the result by taking the exponential of the coefficient. For the
log-level model:

 change y to log(y), so the model is log(y) = β0 + β1x + u;

 run this like a regular OLS regression; then “back out” the results:

the exact percentage change in y from a one-unit increase in x is

%∆y = 100·[exp(β1) − 1],

which is close to the usual approximation 100·β1, since e^β ≈ 1 + β for small β.
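The approximation e^β ≈ 1 + β is good only when β is small. A short sketch comparing the exact percentage change, 100·(exp(β) − 1), with the approximate one, 100·β, for a few illustrative (made-up) coefficient values:

```python
import math

# Exact vs. approximate percentage change in y for a log-level model.
# The gap widens as the coefficient grows.
for beta in (0.01, 0.08, 0.30):
    exact = 100 * (math.exp(beta) - 1)   # exact percent change
    approx = 100 * beta                  # usual approximation
    print(f"beta={beta}: exact {exact:.2f}% vs approx {approx:.2f}%")
```

For beta = 0.01 the two are nearly identical; for beta = 0.30 the exact effect is noticeably larger than the approximation.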

The Meaning of “Linear” Regression


The simple regression model that we have studied in this chapter is also called the simple linear
regression model. Yet, as we have just seen, the general model also allows for certain nonlinear
relationships. So what does “linear” mean here? It is when the equation is linear in the
parameters, β0 and β1.

E.g., y = β0 + β1x + u and log(y) = β0 + β1log(x) + u are both linear in the parameters, even
though the second is nonlinear in the variables x and y.

While the mechanics of simple regression do not depend on how y and x are defined, the
interpretation of the coefficients does depend on their definitions.

2.5 Expected Values and Variances of the OLS Estimators


The population model is given by

y = β0 + β1x + u.

We now view β̂0 and β̂1 as estimators for the parameters β0 and β1 that appear in the population
model. This means that we will study properties of the distributions of β̂0 and β̂1 over different
random samples from the population.

Unbiasedness of OLS
(SLR stands for simple linear regression.)

Assumption SLR.1: Linear in parameters

In the population model, the dependent variable y is related to the independent variable x and the
error (or disturbance) u as

y = β0 + β1x + u.


To be realistic, y, x, and u are all viewed as random variables in stating the population model.
We are interested in using data on y and x to estimate the parameters β 0 and, especially, β1. We
assume that our data were obtained as a random sample.

Assumption SLR.2: Random sampling

We have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population model. We can
write the population equation in terms of the random sample as

yi = β0 + β1xi + ui,

where ui is the error or disturbance for observation i. Thus, ui contains the unobservables for
observation i that affect yi. The errors ui should not be confused with the residuals ûi. Later
on, we will explore the relationship between the errors and the residuals.

Assumption SLR.3: Zero conditional mean of the error term

For a random sample, this assumption implies that , for all i = 1,2,…,n.

In order to obtain unbiased estimators of β0 and β1, we need to impose the zero conditional mean
assumption. Conditioning on the sample values of the independent variable is the same as
treating the xi as fixed in repeated samples. This process involves several steps. We first choose n
sample values for x1, x2, …, xn (These can be repeated.). Given these values, we then obtain a
sample on y (effectively by obtaining a random sample of the ui). Next, another sample of y is
obtained, using the same values for x1, …, xn. Then another sample of y is obtained, again using
the same xi. And so on.

The fixed in repeated samples scenario is not very realistic in nonexperimental contexts. For
instance, in sampling individuals for the wage-education example, it makes little sense to think
of choosing the values of educ ahead of time and then sampling individuals with those particular
levels of education. Random sampling, where individuals are chosen randomly and their wage
and education are both recorded, is representative of how most data sets are obtained for
empirical analysis in the social sciences. Once we assume that E(u/x) = 0, and we have random
sampling, nothing is lost in derivations by treating the xi as nonrandom. The danger is that the
fixed-in-repeated-samples assumption always implies that ui and xi are independent.

Assumption SLR.4: Sample variation in the independent variable

In the sample, the independent variables xi, i = 1,2,…,n, are not all equal to the same constant.
This requires some variation in x in the population.

This assumption is needed because the total sample variation in x, Σ(xi − x̄)², appears in the
denominator of the slope estimator and is zero if there is no variation in x.

Because we are now interested in the behavior of β̂1 across all possible samples, β̂1 is properly
viewed as a random variable.

We can write β̂1 in terms of the population coefficients and errors by substituting
yi = β0 + β1xi + ui into

β̂1 = Σ(xi − x̄)yi / Σ(xi − x̄)² = Σ(xi − x̄)yi / SSTx.

Since Σ(xi − x̄) = 0 and Σ(xi − x̄)xi = Σ(xi − x̄)² = SSTx, the numerator becomes

Σ(xi − x̄)(β0 + β1xi + ui) = β1·SSTx + Σ(xi − x̄)ui,

so

β̂1 = β1 + Σ(xi − x̄)ui / SSTx.


We now see that the estimator β̂1 equals the population slope β1, plus a term that is a linear
combination of the errors {u1,u2,…,un}. Conditional on the values of xi, the randomness in β̂1 is
due entirely to the errors in the sample. The fact that these errors are generally different from
zero is what causes β̂1 to differ from β1.
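The decomposition above is easy to verify numerically. The following sketch (simulated data with made-up population values β0 = 1 and β1 = 2) computes β̂1 both from the OLS formula and as β1 + Σ(xi − x̄)ui/SSTx; the two agree to machine precision:

```python
import random

random.seed(42)
beta0, beta1 = 1.0, 2.0
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

# OLS slope computed from the data alone
b1_ols = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sst_x

# The same number via the decomposition b1 = beta1 + sum((xi-xbar)*ui)/SSTx,
# which uses the (normally unobserved) errors u
b1_decomp = beta1 + sum((xi - xbar) * ui for xi, ui in zip(x, u)) / sst_x

print(b1_ols, b1_decomp)
```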

Theorem 2.1: Unbiasedness of the OLS estimators

Using Assumptions SLR.1 through SLR.4,

E(β̂0) = β0 and E(β̂1) = β1.

Proof for β1

In this proof, the expected values are conditional on the sample values of the independent
variable. Since SSTx and the deviations (xi − x̄) in

β̂1 = β1 + Σ(xi − x̄)ui / SSTx

are functions only of the xi, they are nonrandom in the conditioning. Therefore,

E(β̂1) = β1 + (1/SSTx) Σ(xi − x̄)E(ui) = β1,

where we have used the fact that the expected value of each ui (conditional on {x1,x2,...,xn}) is
zero under Assumptions SLR.2 and SLR.3.

Proof for β0

From the OLS estimator first order condition, β̂0 = ȳ − β̂1x̄. Averaging yi = β0 + β1xi + ui
across i gives ȳ = β0 + β1x̄ + ū, so

β̂0 = β0 + β1x̄ + ū − β̂1x̄ = β0 + (β1 − β̂1)x̄ + ū.

Conditional on the xi, E(β̂0) = β0 + x̄[β1 − E(β̂1)] + E(ū) = β0, since E(β̂1) = β1 and
E(ū) = 0.


Remember that unbiasedness is a feature of the sampling distributions of β̂0 and β̂1, which says
nothing about the estimate that we obtain for a given sample. We hope that, if the sample we
obtain is somehow “typical,” then our estimate should be “near” the population value.
Unfortunately, it is always possible that we could obtain an unlucky sample that would give us a
point estimate far from β1, and we can never know for sure whether this is the case.
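Unbiasedness can be illustrated with a small Monte Carlo sketch (made-up population values; the xi are held fixed across replications, matching the fixed-in-repeated-samples thought experiment). Each sample's β̂1 misses β1, but the average over many samples is very close to it:

```python
import random
import statistics

random.seed(0)
beta0, beta1 = 1.0, 0.5
x = list(range(1, 51))            # fixed regressor values, n = 50
xbar = statistics.mean(x)
sst_x = sum((xi - xbar) ** 2 for xi in x)

b1_draws = []
for _ in range(2000):             # 2000 independent samples of y
    u = [random.gauss(0, 1) for _ in x]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    ybar = statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b1_draws.append(b1)

# Each individual b1 misses beta1, but the average over samples is close to it
print(statistics.mean(b1_draws))
```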

Unbiasedness generally fails if any of our four assumptions fail.

Using simple regression when u contains factors affecting y that are also correlated with x can
result in spurious correlation: that is, we find a relationship between y and x that is really due to
other unobserved factors that affect y and also happen to be correlated with x. In addition to
omitted variables, there are other reasons for x to be correlated with u in the simple regression
model.

Variances of the OLS Estimators


In addition to knowing that the sampling distribution of β̂1 is centered about β1 (β̂1 is unbiased),
it is important to know how far we can expect β̂1 to be away from β1 on average. Among other
things, this allows us to choose the best estimator among all, or at least a broad class of, the
unbiased estimators. The measure of spread in the distribution of β̂1 (and β̂0) that is easiest to
work with is the variance or its square root, the standard deviation.

It turns out that the variance of the OLS estimators can be computed under Assumptions SLR.1
through SLR.4. However, these expressions would be somewhat complicated. Instead, we add an
assumption that is traditional for cross-sectional analysis. This assumption states that the
variance of the unobservable, u, conditional on x, is constant. This is known as the
homoskedasticity or “constant variance” assumption.

Assumption SLR.5: Homoskedasticity

Var(u/x) = σ².

We must emphasize that the homoskedasticity assumption is quite distinct from the zero
conditional mean assumption, E(u/x) = 0. Assumption SLR.3 involves the expected value of u,
while Assumption SLR.5 concerns the variance of u (both conditional on x). Recall that we
established the unbiasedness of OLS without Assumption SLR.5: the homoskedasticity
assumption plays no role in showing that β̂0 and β̂1 are unbiased. We add Assumption SLR.5
because it simplifies the variance calculations for β̂0 and β̂1 and because it implies that ordinary
least squares has certain efficiency properties. If we were to assume that u and x are independent,
then the distribution of u given x does not depend on x, and so E(u/x) = E(u) = 0 and
Var(u/x) = Var(u) = σ². But independence is sometimes too strong of an assumption.

Because Var(u/x) = E(u²/x) − [E(u/x)]² and E(u/x) = 0, we have σ² = E(u²/x) = E(u²),
which means σ² is also the unconditional expectation of u² (the unconditional variance of u). σ²
is often called the error variance or disturbance variance. The square root of σ², σ, is the standard
deviation of the error. A larger σ means that the distribution of the unobservables affecting y is
more spread out.

It is often useful to write Assumptions SLR.3 and SLR.5 in terms of the conditional mean and
conditional variance of y:

E(y/x) = β0 + β1x and Var(y/x) = σ².


In other words, the conditional expectation of y given x is linear in x, but the variance of y given
x is constant. This situation is graphed in the figure below, where E(y/x) = β0 + β1x and
Var(y/x) = σ².

When Var(u/x) depends on x, the error term is said to exhibit heteroskedasticity (or nonconstant
variance). Since Var(u/x) = Var(y/x), heteroskedasticity is present whenever Var(y/x) is a
function of x. See the figure below for heteroskedasticity.

Theorem 2.2: The sampling variances of the OLS estimators are

Var(β̂1) = σ²/Σ(xi − x̄)² = σ²/SSTx

and

Var(β̂0) = σ²(n⁻¹Σxi²)/SSTx,

where these are conditional on the sample values {x1,…,xn}. These formulas are the “standard”
formulas for simple regression analysis, which are invalid in the presence of heteroskedasticity.
This will be important when we turn to confidence intervals and hypothesis testing in multiple
regression analysis.

Under Assumptions SLR.1 through SLR.5 we can prove these.

Proof for β1

Start from β̂1 = β1 + Σ(xi − x̄)ui / SSTx. Since β1 is just a constant, and we are conditioning on
the xi, SSTx is also nonrandom. Furthermore, because the ui are independent random variables
across i (by random sampling), the variance of the sum is the sum of the variances. Using these
facts, we have

Var(β̂1) = (1/SSTx)² Σ(xi − x̄)² Var(ui) = (1/SSTx)² σ² Σ(xi − x̄)² = σ²/SSTx.

Var(β̂1) depends on the error variance, σ², and the total variation in {x1,x2,…,xn}, SSTx. First,
the larger the error variance, the larger is Var(β̂1). This makes sense since more variation in the
unobservables affecting y makes it more difficult to precisely estimate β1. On the other hand,
more variability in the independent variable is preferred: as the variability in the xi increases, the
variance of β̂1 decreases. This also makes intuitive sense since the more spread out is the sample
of independent variables, the easier it is to trace out the relationship between E(y/x) and x. That
is, the easier it is to estimate β1. If there is little variation in the xi, then it can be hard to pinpoint
how E(y/x) varies with x. As the sample size increases, so does the total variation in the xi.
Therefore, a larger sample size results in a smaller variance for β̂1. For the purposes of
constructing confidence intervals and deriving test statistics, we will need to work with the
standard deviations of β̂1 and β̂0, sd(β̂1) = σ/√SSTx and sd(β̂0) = σ√(n⁻¹Σxi²/SSTx).
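The formula Var(β̂1) = σ²/SSTx can be checked by simulation (a sketch with made-up values; the xi are held fixed across replications): the variance of β̂1 across many samples should be close to σ²/SSTx:

```python
import random
import statistics

random.seed(7)
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = list(range(1, 51))            # fixed regressor values, n = 50
xbar = statistics.mean(x)
sst_x = sum((xi - xbar) ** 2 for xi in x)

b1_draws = []
for _ in range(4000):
    u = [random.gauss(0, sigma) for _ in x]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    ybar = statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b1_draws.append(b1)

theory = sigma ** 2 / sst_x                  # Var(b1) = sigma^2 / SSTx
empirical = statistics.variance(b1_draws)    # Monte Carlo variance across samples
print(theory, empirical)
```

Doubling σ quadruples both numbers, while adding more spread-out x values shrinks them, exactly as the discussion above predicts.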

The proof for Var(β̂0) is similar and is left as an exercise.

Variance of the OLS estimators using matrices

In matrix form, writing the model as y = Xβ + u, where X is the n×2 matrix whose ith row is
(1, xi), the OLS estimator is β̂ = (X′X)⁻¹X′y and, under Assumptions SLR.1 through SLR.5,

Var(β̂) = σ²(X′X)⁻¹.
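As a sketch of the matrix formula Var(β̂) = σ²(X′X)⁻¹ for the simple regression case (made-up data; the 2×2 inverse is written out by hand), the diagonal elements reproduce the two scalar variance formulas:

```python
# Var(beta_hat) = sigma^2 * (X'X)^{-1} for simple regression, where each row
# of X is (1, xi). Written out for the 2x2 case without any matrix library.
sigma2 = 4.0
x = [2.0, 3.0, 5.0, 7.0, 11.0]
n = len(x)

sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
xbar = sum_x / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

# X'X = [[n, sum_x], [sum_x, sum_x2]]; invert the 2x2 matrix directly.
det = n * sum_x2 - sum_x ** 2          # note: det = n * SSTx
inv = [[sum_x2 / det, -sum_x / det],
       [-sum_x / det, n / det]]

var_b0 = sigma2 * inv[0][0]            # equals sigma^2 * (sum_x2 / n) / SSTx
var_b1 = sigma2 * inv[1][1]            # equals sigma^2 / SSTx

print(var_b0, var_b1)
```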

Estimating the Error Variance


The variances Var(β̂0) and Var(β̂1) depend on the variance of the error term, σ², which is
unknown and must be estimated. Now let us emphasize the difference between the errors (or
disturbances) and the residuals. The distinction is crucial for constructing an estimator of σ².


The equation yi = β0 + β1xi + ui shows how to write the population model in terms of a randomly
sampled observation, where ui is the error for observation i. We can also express yi in terms of its
fitted value and residual as yi = ŷi + ûi = β̂0 + β̂1xi + ûi. Comparing these two equations, we see
that the error shows up in the equation containing the population parameters, β0 and β1. On the
other hand, the residuals show up in the estimated equation with β̂0 and β̂1. The errors are never
observable, while the residuals are computed from the data.

We can write the residuals as a function of the errors by substituting the population model
yi = β0 + β1xi + ui into ûi = yi − β̂0 − β̂1xi:

ûi = (β0 + β1xi + ui) − β̂0 − β̂1xi = ui − (β̂0 − β0) − (β̂1 − β1)xi.

Although the expected value of β̂0 equals β0, and similarly E(β̂1) = β1, ûi is not the same as ui.
The difference between them does have an expected value of zero.

An unbiased “estimator” of σ² is n⁻¹ Σ ui², whose degrees of freedom is n.

Unfortunately, this is not a true estimator, because we do not observe the errors ui. But we do
have estimates of the errors: the OLS residuals ûi.

If we replace the errors with the OLS residuals, we have n⁻¹ Σ ûi² = SSR/n. This is a true
estimator, because it gives a computable rule for any sample of data on x and y. One slight
drawback to this estimator is that it turns out to be biased (although for large n the bias is small).
Since it is easy to compute an unbiased estimator, we use that instead.

The estimator SSR/n is biased essentially because it does not account for two restrictions that
must be satisfied by the OLS residuals. These restrictions are given by the two OLS first order
conditions:

Σ ûi = 0 and Σ xiûi = 0.


These restrictions mean that if we know n − 2 of the residuals, we can always get the other two
residuals by using the restrictions implied by the first order conditions. Thus, there are only
n − 2 degrees of freedom in the OLS residuals (as opposed to n degrees of freedom in the errors).
So the unbiased estimator of σ² that we will use makes a degrees-of-freedom adjustment:

σ̂² = SSR/(n − 2) = (1/(n − 2)) Σ ûi².

Theorem 2.3: Unbiased estimation of σ². Under Assumptions SLR.1 through SLR.5, E(σ̂²) = σ².

Proof

If we average ûi = ui − (β̂0 − β0) − (β̂1 − β1)xi across all i and use the fact that the OLS
residuals average out to zero, we have 0 = ū − (β̂0 − β0) − (β̂1 − β1)x̄. Subtracting this average
from the first equation gives

ûi = (ui − ū) − (β̂1 − β1)(xi − x̄).

Therefore,

ûi² = (ui − ū)² + (β̂1 − β1)²(xi − x̄)² − 2(ui − ū)(β̂1 − β1)(xi − x̄).

Summing across all i gives

Σ ûi² = Σ(ui − ū)² + (β̂1 − β1)² SSTx − 2(β̂1 − β1) Σ ui(xi − x̄),

where we used Σ ū(xi − x̄) = 0. We now take the expected value of each term.

The expected value of the first term is E[Σ(ui − ū)²] = (n − 1)σ², since for a sample the degrees
of freedom is n − 1.

The expected value of the second term is E[(β̂1 − β1)²] SSTx = σ², substituting
Var(β̂1) = σ²/SSTx.

Finally, the third term can be written as 2(β̂1 − β1)² SSTx, because β̂1 − β1 = Σ(xi − x̄)ui / SSTx
implies Σ ui(xi − x̄) = (β̂1 − β1) SSTx. Taking expectation gives 2σ².

Putting these three terms together gives

E(Σ ûi²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ²,

so that E[SSR/(n − 2)] = σ².
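The role of the n − 2 adjustment is easy to see in a simulation sketch (made-up values; n is kept small so the bias of SSR/n is visible): SSR/(n − 2) averages out to σ², while SSR/n averages to about (n − 2)/n · σ²:

```python
import random
import statistics

random.seed(3)
beta0, beta1, sigma = 1.0, 0.5, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
n = len(x)
xbar = statistics.mean(x)
sst_x = sum((xi - xbar) ** 2 for xi in x)

biased, unbiased = [], []
for _ in range(5000):
    u = [random.gauss(0, sigma) for _ in x]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    ybar = statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    biased.append(ssr / n)            # tends to (n-2)/n * sigma^2 = 0.8 here
    unbiased.append(ssr / (n - 2))    # tends to sigma^2 = 1

print(statistics.mean(biased), statistics.mean(unbiased))
```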

If σ̂² is plugged into the parameter variance formulas, then we have unbiased estimators of
Var(β̂0) and Var(β̂1). The natural estimator of σ is σ̂ = √σ̂² and is called the standard error of
the regression (SER). Other names for σ̂ are the standard error of the estimate and the root mean
squared error. Although σ̂ is not an unbiased estimator of σ, we can show that it is a consistent
estimator of σ.

The estimate σ̂ is interesting since it is an estimate of the standard deviation in the
unobservables affecting y; equivalently, it estimates the standard deviation in y after the effect of
x has been taken out. Most regression packages report the value of σ̂ along with the R-squared,
intercept, slope, and other OLS statistics.

Substituting σ̂ for σ, we get

se(β̂1) = σ̂/√SSTx

and

se(β̂0) = σ̂√(n⁻¹Σxi²/SSTx).

These estimated standard deviations of the parameters are called the standard errors of β̂1 and
β̂0.

Note that se(β̂0) and se(β̂1) are viewed as random variables when we think of running OLS
over different samples of y; this is because σ̂ varies with different samples. For a given sample,
se(β̂0) and se(β̂1) are numbers, just as β̂0 and β̂1 are simply numbers when we compute them
from the given data.
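Putting the pieces together, this sketch (made-up data for illustration) computes σ̂² = SSR/(n − 2), the SER, and the standard errors se(β̂1) = σ̂/√SSTx and se(β̂0) = σ̂√(n⁻¹Σxi²/SSTx) from a single sample:

```python
import math
import statistics

# Small made-up data set for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
n = len(x)

xbar, ybar = statistics.mean(x), statistics.mean(y)
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)   # SSR/(n-2)
ser = math.sqrt(sigma2_hat)                              # standard error of the regression

se_b1 = ser / math.sqrt(sst_x)
se_b0 = ser * math.sqrt(sum(xi ** 2 for xi in x) / n / sst_x)

print(f"b0={b0:.3f} b1={b1:.3f} SER={ser:.3f} se(b0)={se_b0:.3f} se(b1)={se_b1:.3f}")
```

This is the same arithmetic a regression package performs when it reports coefficient standard errors alongside the SER.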

The error variance estimator using matrices

In matrix notation, σ² = E(u′u)/n would be the natural target; however, we do not observe u. So
we replace the errors with the residuals û = y − Xβ̂. However, the unbiased degrees of freedom is
n − 2, not n, giving

σ̂² = û′û/(n − 2).

2.6 Regression through the Origin


In rare cases, we wish to impose the restriction that, when x = 0, the expected value of y is zero.
There are certain relationships for which this is reasonable. For example, if income (x) is zero,
then income tax revenues (y) must also be zero. In addition, there are problems where a model
that originally has a nonzero intercept is transformed into a model without an intercept.

Formally, we now choose a slope estimator, which we call β̃1, and a line of the form

ỹ = β̃1x,

where the tildes over β̃1 and ỹ are used to distinguish this problem from the much more common
problem of estimating an intercept along with a slope. This model is called regression through
the origin because the line passes through the point (0, 0). To obtain the slope estimate, we still
rely on the method of ordinary least squares, which in this case minimizes the sum of squared
residuals Σ(yi − β̃1xi)².

Using calculus, β̃1 must solve the first order condition

Σ xi(yi − β̃1xi) = 0.

Solving for β̃1 gives

β̃1 = Σ xiyi / Σ xi²,

provided that not all the xi are zero, a case we rule out.


The estimate through the origin and the estimate with intercept are the same if, and only if,
x̄ = 0 or the usual OLS intercept estimate β̂0 equals zero.
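A quick numerical sketch (made-up data) of the through-the-origin estimator and the equality condition: when the xi are demeaned so that x̄ = 0, the through-the-origin slope coincides with the usual OLS slope:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.1, 2.8, 4.1, 4.9]

# Regression through the origin: b1_tilde = sum(x*y) / sum(x^2)
b1_tilde = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Usual OLS slope with an intercept
xbar, ybar = statistics.mean(x), statistics.mean(y)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)

# Demean x so that xbar = 0: now the two slope estimates coincide
xd = [xi - xbar for xi in x]
b1_tilde_demeaned = sum(xi * yi for xi, yi in zip(xd, y)) / sum(xi ** 2 for xi in xd)

print(b1_tilde, b1_hat, b1_tilde_demeaned)
```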
