Chapters 1 and 2

Chapter 1
Formulation of the question of interest: The first step in any empirical analysis is the
careful formulation of the question of interest. The question might deal with testing a certain
aspect of an economic theory, or it might pertain to testing the effects of a government policy. In
principle, econometric methods can be used to answer a wide range of questions.
Jeffrey M. Wooldridge: Introductory Econometrics: A Modern Approach 2nd Edition
Empirical analysis: An empirical analysis, by definition, requires data. After data on the
relevant variables have been collected, econometric methods are used to estimate the parameters
in the econometric model and to formally test hypotheses of interest.
Inference and Prediction: In some cases, the econometric model is used to make predictions
in either the testing of a theory or the study of a policy’s impact.
Cross-Sectional Data:
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a given point in time. An important feature of
cross-sectional data is that we can often assume that they have been obtained by random
sampling from the underlying population.
Random sampling can also fail when we sample from units that are large relative to the population, particularly geographical units. The potential problem in such cases is that the population is not large enough to reasonably assume the observations are independent draws.
Cross-sectional data are widely used in economics and other social sciences. In economics, the
analysis of cross-sectional data is closely aligned with the applied microeconomics fields, such
as labor economics, state and local public finance, industrial organization, urban economics,
demography, and health economics. Data on individuals, households, firms, and cities at a given
point in time are important for testing microeconomic hypotheses and evaluating economic
policies.
Time Series Data:
A key feature of time series data that makes it more difficult to analyze than cross-sectional data
is the fact that economic observations can rarely, if ever, be assumed to be independent across
time. Most economic and other time series are related, often strongly related, to their recent
histories. For example, knowing something about the gross domestic product from last quarter
tells us quite a bit about the likely range of the GDP during this quarter, since GDP tends to
remain fairly stable from one quarter to the next. While most econometric procedures can be
used with both cross-sectional and time series data, more needs to be done in specifying
econometric models for time series data before standard econometric methods can be justified.
Another feature of time series data that can require special attention is the frequency at which the data are collected. In economics, the most common frequencies are daily, weekly, monthly, quarterly, and annually. Many weekly, monthly, and quarterly economic time series display a strong seasonal pattern, which can be an important factor in a time series analysis. For example, monthly data on housing starts differ across the months simply because of changing weather conditions.
Pooled Cross Sections:
Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy. The idea is to collect data from the years before and after a key policy change. A pooled cross section is analyzed much like a standard cross section, except that we often need to account for secular differences in the variables across time. In fact, in addition
to increasing the sample size, the point of a pooled cross-sectional analysis is often to see how a
key relationship has changed over time.
Panel Data:
The key feature of panel data that distinguishes it from a pooled cross section is the fact that the same cross-sectional units (individuals, firms, or counties) are followed over a given time period.
Because panel data require replication of the same units over time, panel data sets, especially
those on individuals, households, and firms, are more difficult to obtain than pooled cross
sections. Not surprisingly, observing the same units over time leads to several advantages over
cross-sectional data or even pooled cross-sectional data. The benefit that we will focus on in this
text is that having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than
one observation can facilitate causal inference in situations where inferring causality would be
very difficult if only a single cross section were available. A second advantage of panel data is
that it often allows us to study the importance of lags in behavior or the result of decision
making. This information can be significant since many economic policies can be expected to
have an impact only after some time has passed.
The notion of ceteris paribus—which means “other (relevant) factors being equal”—plays an
important role in causal analysis. Most economic questions are ceteris paribus by nature. For
instance, in the theory of demand, if other factors are not held fixed, then we cannot know the causal effect of a price change on quantity demanded. Holding other factors fixed is critical for
policy analysis as well.
While this may seem pretty simple, even at this early stage it should be clear that, except in very
special cases, it will not be possible to literally hold all else equal. The key question in most
empirical studies is: Have enough other factors been held fixed to make a case for causality?
Rarely is an econometric study evaluated without raising this issue. In most serious applications,
the number of factors that can affect the variable of interest is immense, and the isolation of any
particular variable may seem like a hopeless effort. However, we will eventually see that, when
carefully applied, econometric methods can simulate a ceteris paribus experiment. Even when
economic theories are not most naturally described in terms of causality, they often have
predictions that can be tested using econometric methods.
Chapter 2
The Simple Regression Model
The simple regression model can be used to study the relationship between two variables.
First, since there is never an exact relationship between two variables, how do we allow
for other factors to affect y?
Second, what is the functional relationship between y and x? And
Third, how can we be sure we are capturing a ceteris paribus relationship between y and
x (if that is a desired goal)?
We can resolve these ambiguities by writing down an equation (population regression) relating y
to x. A simple equation is

y = β0 + β1x + u.

Such a mathematical equation relating two variables in a population is known as the simple (two-variable) linear regression model.
The variables y and x have several different names used interchangeably. The terms “dependent
variable” and “independent variable” are frequently used in econometrics. But be aware that the
label “independent” here does not refer to the statistical notion of independence between random
variables. In statistics, random variables X and Y are said to be independent if the probability of occurrence of X does not affect the probability of occurrence of Y, and vice versa.
Other common names for x are the explanatory variable, control variable, regressor, or covariate; y is also called the explained or response variable.
The variable u, called the error term or disturbance (not the residual) in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. You can usefully think of u as standing for "unobserved." If the other factors in u are held fixed, so that the change in u is zero (∆u = 0), then x has a linear effect on y:

∆y = β1∆x.
Thus, the change in y is simply β1 multiplied by the change in x. This means that β1 is the slope
parameter in the relationship between y and x holding the other factors in u fixed. The linearity
implies that a one-unit change in x has the same effect on y, regardless of the initial value of x.
This is unrealistic for many economic applications. For example, in a wage-education function,
we might want to allow for increasing returns: the next year of education has a larger effect on
wages than did the previous year.
The intercept parameter β0 also has its uses, although it is rarely central to an analysis.
A key question is whether the simple regression model really allows us to draw ceteris paribus conclusions about how x affects y. We just saw that β1 does measure the effect of x on y, holding all other factors (in u) fixed. Is this the end of the
causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus
effect of x on y, holding other factors fixed, when we are ignoring all those other factors?
We are only able to get reliable estimators of β0 and β1 from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x. Without such a restriction, we will not be able to estimate the ceteris paribus effect, β1, because
u and x are random variables. As long as the intercept β0 is included in the equation, nothing is lost by assuming that the average value of u in the population is zero: E(u) = 0. This assumption says nothing about the shape of u's distribution (in particular, it does not imply that u is normally distributed); it is simply a normalization made possible by the presence of the intercept.
A natural measure of the association between two random variables is the correlation coefficient. If u and x are uncorrelated, Corr(x, u) = 0, then, as random variables, they are not linearly related (they may still be related, just not linearly). It is possible for u to be uncorrelated with x while being correlated with functions of x, such as x².
Because correlation measures only linear dependence, assuming that u and x are uncorrelated is not quite enough; instead we work with the conditional expectation E(u|x) rather than only the unconditional expectation E(u). Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x. The crucial assumption is that u does not depend on the value of x:

E(u|x) = E(u) = 0.

This is the zero conditional mean assumption. It has another interpretation that is often useful. Taking the expected value of y conditional on x, we get the population regression function

E(y|x) = β0 + β1x.

The equation shows that E(y|x) is a linear function of x. For example, if x is education and y is wage, E(y|x) gives the average wage at each level of education. See the graph below.
1) The piece β0 + β1x is sometimes called the systematic part of y—that is, the part of y explained by x—and
2) u is called the unsystematic part, or the part of y not explained by x.
How do we use the sample data to obtain estimates of the intercept and slope in the population regression? One way of finding the parameters is the ordinary least squares (OLS) estimator. In the population regression, β0 and β1 are the parameters that minimize the expected squared error (disturbance); in the sample regression line, the estimates β̂0 and β̂1 are the values that minimize the sum of squared residuals.
Or, using expectations: the zero conditional mean assumption gives the two population moment conditions

E(y − β0 − β1x) = 0 and E[x(y − β0 − β1x)] = 0.
So, using the method of moments approach to estimation, we can solve for β0 and β1 from the sample analogues of the moment conditions:

(1/n) Σᵢ (yᵢ − β̂0 − β̂1xᵢ) = 0 and (1/n) Σᵢ xᵢ(yᵢ − β̂0 − β̂1xᵢ) = 0.

The first condition gives β̂0 = ȳ − β̂1x̄. Plugging this into the second condition and solving yields

β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)².
So the estimate β̂1 is simply the sample covariance between x and y divided by the sample variance of x. If x and y are positively correlated in the sample, then β̂1 is positive; if they are negatively correlated, β̂1 is negative.

In matrix form, let there be only one independent variable x, with β0 and β1 the intercept and slope parameters. Stacking the n observations, the model is
y=Xβ+u
RSS = u'u = (y − Xβ)'(y − Xβ)
RSS = u'u = (y' − β'X')(y − Xβ)   [transposing a product reverses the order: (Xβ)' = β'X']
RSS = u'u = y'y − y'Xβ − β'X'y + β'X'Xβ

To find the estimator, we differentiate RSS with respect to β and set the derivative to zero:

−2X'y + 2X'Xβ̂ = 0
2X'Xβ̂ = 2X'y
X'Xβ̂ = X'y
(X'X)⁻¹X'Xβ̂ = (X'X)⁻¹X'y
β̂ = (X'X)⁻¹X'y
Var(β̂) = Var[(X'X)⁻¹X'y]; here X is treated as nonrandom because we condition on the sample values of the regressors (the "fixed in repeated samples" view discussed below).
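As a quick numerical check, here is a minimal NumPy sketch (simulated data; the coefficients, seed, and sample size are invented for illustration) showing that the matrix formula β̂ = (X'X)⁻¹X'y reproduces the covariance/variance formulas for the simple regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(5.0, 2.0, n)
u = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * x + u                      # population: beta0 = 1, beta1 = 0.5

# Matrix OLS: beta_hat = (X'X)^(-1) X'y, with a column of ones for the intercept
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Moment formulas for the simple regression case
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # sample cov / sample var
b0 = y.mean() - b1 * x.mean()

print(beta_hat, b0, b1)   # the two routes agree
```

The two computations coincide exactly (up to floating-point error), because the normal equations X'Xβ̂ = X'y are just the two moment conditions written in matrix form.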
The OLS regression line, fitted-value line, or sample regression function is given by

ŷ = β̂0 + β̂1x.

The slope estimate β̂1 shows by how much ŷ changes when x changes by one unit. Each fitted value ŷᵢ is on the OLS regression line. The OLS residual associated with observation i is the difference between yᵢ and its fitted value:

ûᵢ = yᵢ − ŷᵢ.
1. The OLS estimates are chosen to make the residuals add up to zero (for any data set): Σᵢ ûᵢ = 0.
2. The sample covariance between the regressors and the OLS residuals is zero: Σᵢ xᵢûᵢ = 0. This also follows from the first order conditions, which can be written in terms of the residuals.
3. The point (x̄, ȳ) is always on the OLS regression line. In other words, if we plug the sample average x̄ in for x, the predicted value is ȳ. This is exactly what the first order condition β̂0 = ȳ − β̂1x̄ shows.
From property (1) above, the average of the residuals is zero; equivalently, the sample average of the fitted values ŷᵢ is the same as the sample average of the yᵢ. Further, properties (1) and (2) can be used to show that the sample covariance between ŷᵢ and ûᵢ is zero (because ŷᵢ is a linear function of xᵢ, and the sample covariance between xᵢ and ûᵢ is zero). Thus, we can view OLS as decomposing each yᵢ into two parts, a fitted value and a residual: yᵢ = ŷᵢ + ûᵢ. The fitted values and residuals are uncorrelated in the sample.
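These algebraic properties are easy to verify numerically. A small NumPy sketch on simulated data (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 - 0.3 * x + rng.normal(size=n)

# OLS via the moment formulas
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

# (1) residuals sum to zero; (2) zero sample covariance with x;
# hence fitted values and residuals are uncorrelated, and mean(fitted) = mean(y)
print(resid.sum())
print(np.cov(x, resid, ddof=1)[0, 1])
print(np.cov(fitted, resid, ddof=1)[0, 1])
```

All three printed values are zero up to floating-point error, for any data set, because they are forced by the first order conditions rather than by any property of the data.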
Define the following sums of squares.

Total sum of squares (TSS): TSS = Σᵢ (yᵢ − ȳ)². It is a measure of the total sample variation in the yᵢ; that is, it measures how spread out the yᵢ are in the sample. If we divide TSS by n − 1, we obtain the sample variance of y.

Explained sum of squares (ESS): ESS = Σᵢ (ŷᵢ − ȳ)². If we divide ESS by n − 1, we get the sample variation in the fitted values ŷᵢ.

Residual sum of squares (RSS): RSS = Σᵢ ûᵢ². Dividing RSS by n − 1 measures the sample variation in the residuals ûᵢ.

The total variation in y can always be expressed as the sum of the explained variation and the unexplained variation:

TSS = ESS + RSS.

The decomposition uses the fact that the sample covariance between the residuals and the fitted values is zero.
Goodness-of-Fit
It is measuring how well the explanatory or independent variable, x, explains the dependent
variable, y. It is often useful to compute a number that summarizes how well the OLS regression
line fits the data. In the following discussion, be sure to remember that we assume that an
intercept is estimated along with the slope.
Assuming that the total sum of squares, TSS, is not equal to zero—which is true except in the very unlikely event that all the yᵢ equal the same value—we can divide both sides of TSS = ESS + RSS by TSS and define

R² = ESS/TSS = 1 − RSS/TSS.

R² is between zero and one. When interpreting it, we usually multiply by 100 to convert it to a percent: 100·R² is the percentage of the sample variation in y that is explained by x. If the data points all lie on the same line, OLS provides a perfect fit to the data; in this case, R² = 1. A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the yᵢ is captured by the variation in the ŷᵢ (which all lie on the OLS regression line). In fact, it can be shown that R² is equal to the square of the sample correlation coefficient between yᵢ and ŷᵢ. This is where the term "R-squared" comes from, because R denotes a correlation coefficient.
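The three equivalent routes to R² (ESS/TSS, 1 − RSS/TSS, and the squared correlation between yᵢ and ŷᵢ) can be confirmed in a few lines of NumPy (simulated data, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

tss = ((y - y.mean()) ** 2).sum()
ess = ((fitted - y.mean()) ** 2).sum()
rss = (resid ** 2).sum()

r2 = ess / tss
r2_alt = 1 - rss / tss
r2_corr = np.corrcoef(y, fitted)[0, 1] ** 2   # squared sample correlation

print(r2, r2_alt, r2_corr)   # all three agree
```

The agreement of the first two follows from TSS = ESS + RSS; the third equality is the source of the name "R-squared."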
Two additional issues are important in applied work:
(1) understanding how changing the units of measurement of the dependent and/or independent variables affects OLS estimates, and
(2) knowing how to incorporate popular functional forms used in economics into regression
analysis.
When only the dependent variable unit of measurement changes: Generally, it is easy to figure
out what happens to the intercept and slope estimates when the dependent variable changes units
of measurement. If the dependent variable is multiplied by the constant c—which means each
value in the sample is multiplied by c—then the OLS intercept and slope estimates are also
multiplied by c. (This assumes nothing has changed about the independent variable.)
When the independent variable's unit of measurement changes: Generally, if the independent variable is divided or multiplied by some nonzero constant c, then the OLS slope coefficient is multiplied or divided by c, respectively. Changing the units of measurement of only the independent variable does not affect the intercept, because the intercept is the predicted value of y when x = 0, and rescaling x leaves the point x = 0 unchanged.
Change in goodness of fit when the measurement of the dependent or independent variable changes: the goodness-of-fit (R²) of the model does not depend on the units of measurement of our variables.
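These scaling rules can be demonstrated directly. A NumPy sketch (data and the rescaling constant c = 1000 are invented for illustration):

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope, R^2) for a simple OLS regression."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return b0, b1, r2

rng = np.random.default_rng(3)
x = rng.uniform(1, 20, 100)
y = 3.0 + 0.7 * x + rng.normal(size=100)

b0, b1, r2 = ols(x, y)
c = 1000.0

# Rescale y by c: both intercept and slope are multiplied by c; R^2 unchanged
b0y, b1y, r2y = ols(x, c * y)

# Rescale x by c: slope divided by c; intercept and R^2 unchanged
b0x, b1x, r2x = ols(c * x, y)

print(b1y / b1, b1x / b1, r2y - r2, r2x - r2)
```

Multiplying y by c scales both coefficients by c, multiplying x by c divides only the slope by c, and R² is identical in every case.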
When the dependent and independent variables are in level (linear) form: for example,

wage = β0 + β1·educ + u.

In this equation, the effect of one additional year of education is the same regardless of the years of education completed before: one more year of education increases wage by β1.
When the dependent variable is in logarithmic form (the log-level model), for example log(wage) = β0 + β1·educ + u, the return to education depends on the level of education attained before: each additional year of education raises wage by approximately a constant percentage (100·β1 percent), so the level of wage increases at an increasing rate. Since the dependent variable is in logarithmic form, to draw its graph in levels we take the exponential.
When both the dependent and the independent variables are in logarithmic form (the constant elasticity model):

log(y) = β0 + β1·log(x) + u.

This model still falls under the simple regression model and is estimated by OLS. β1 has a percentage (elasticity) interpretation: taking changes, %∆y ≈ β1·%∆x.
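A short simulation illustrates the constant elasticity interpretation (the data-generating process and all numbers are invented for illustration): generate y = e^0.2 · x^1.5 · e^u, so the true elasticity is 1.5, and regress log(y) on log(x).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = np.exp(rng.normal(0.0, 0.5, n))        # positive regressor
u = rng.normal(0.0, 0.1, n)
y = np.exp(0.2) * x ** 1.5 * np.exp(u)     # log(y) = 0.2 + 1.5*log(x) + u

ly, lx = np.log(y), np.log(x)
b1 = np.cov(lx, ly, ddof=1)[0, 1] / np.var(lx, ddof=1)  # estimated elasticity
b0 = ly.mean() - b1 * lx.mean()
print(b0, b1)   # close to 0.2 and 1.5
```

The slope from the log-log regression recovers the elasticity directly: a 1 percent increase in x is associated with about a β̂1 percent increase in y.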
Units of measurement and the logarithmic form: What happens to the intercept and slope estimates if we change the units of measurement of the dependent variable when it appears in logarithmic form? Because the change to logarithmic form approximates a proportionate change, it makes sense that nothing happens to the slope. We can see this by writing the rescaled variable as c1·yᵢ for each observation i. The original equation is log(yᵢ) = β0 + β1xᵢ + uᵢ.
Remember that the log of a product is the sum of the logs: log(c1·yᵢ) = log(c1) + log(yᵢ) = [log(c1) + β0] + β1xᵢ + uᵢ. Therefore, the slope is still β1, but the intercept becomes log(c1) + β0. Similarly, if the independent variable is log(x) and we change the units of measurement of x before taking the log, the slope remains the same but the intercept changes.
The model with y as the dependent variable and x as the independent variable is called the level-
level model, because each variable appears in its level form. The model with log(y) as the
dependent variable and x as the independent variable is called the log-level model. We will not
explicitly discuss the level-log model here, because it arises less often in practice.
The percentage interpretation of a coefficient β in a log model rests on the approximation e^β ≈ 1 + β (equivalently, log(1 + β) ≈ β), which is accurate for small β; for larger changes, the exact percentage effect is 100·(e^β − 1).
While the mechanics of simple regression do not depend on how y and x are defined, the
interpretation of the coefficients does depend on their definitions.
We now view β̂0 and β̂1 as estimators for the parameters β0 and β1 that appear in the population model. This means that we will study the properties of the distributions of β̂0 and β̂1 over different random samples from the population.
Unbiasedness of OLS
(In the assumption labels below, SLR stands for simple linear regression.)

Assumption SLR.1 (linear in parameters): in the population model, the dependent variable y is related to the independent variable x and the error (or disturbance) u as

y = β0 + β1x + u.
To be realistic, y, x, and u are all viewed as random variables in stating the population model.
We are interested in using data on y and x to estimate the parameters β 0 and, especially, β1. We
assume that our data were obtained as a random sample.
Assumption SLR.2 (random sampling): we use a random sample of size n, {(xᵢ, yᵢ): i = 1, 2, …, n}, from the population model. We can write the population equation in terms of the random sample as

yᵢ = β0 + β1xᵢ + uᵢ,

where uᵢ is the error or disturbance for observation i. Thus, uᵢ contains the unobservables for observation i that affect yᵢ. The errors uᵢ should not be confused with the residuals ûᵢ; later on, we will explore the relationship between the errors and the residuals.
Assumption SLR.3 (zero conditional mean): E(u|x) = 0. For a random sample, this assumption implies that E(uᵢ|xᵢ) = 0 for all i = 1, 2, …, n. In order to obtain unbiased estimators of β0 and β1, we need to impose this zero conditional mean assumption. Conditioning on the sample values of the independent variable is the same as treating the xᵢ as fixed in repeated samples. This process involves several steps. We first choose n
sample values for x1, x2, …, xn (These can be repeated.). Given these values, we then obtain a
sample on y (effectively by obtaining a random sample of the ui). Next another sample of y is
obtained, using the same values for x1, …, xn. Then another sample of y is obtained, again using
the same xi. And so on.
The fixed in repeated samples scenario is not very realistic in nonexperimental contexts. For
instance, in sampling individuals for the wage-education example, it makes little sense to think
of choosing the values of educ ahead of time and then sampling individuals with those particular
levels of education. Random sampling, where individuals are chosen randomly and their wage
and education are both recorded, is representative of how most data sets are obtained for
empirical analysis in the social sciences. Once we assume that E(u|x) = 0, and we have random sampling, nothing is lost in derivations by treating the xᵢ as nonrandom. The danger is that the fixed-in-repeated-samples assumption always implies that uᵢ and xᵢ are independent.
Assumption SLR.4 (sample variation in the independent variable): in the sample, the independent variables xᵢ, i = 1, 2, …, n, are not all equal to the same constant. This requires some variation in x in the population. The assumption is needed because the denominator of the slope estimator, Σᵢ (xᵢ − x̄)², would be zero if there were no variation in x.
Because we are now interested in the behavior of β̂1 across all possible samples, β̂1 is properly viewed as a random variable. We can write β̂1 in terms of the population coefficients and errors by substituting yᵢ = β0 + β1xᵢ + uᵢ into

β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)².

Since Σᵢ (xᵢ − x̄)(β0 + β1xᵢ + uᵢ) = β1 Σᵢ (xᵢ − x̄)xᵢ + Σᵢ (xᵢ − x̄)uᵢ, and since Σᵢ (xᵢ − x̄)xᵢ = Σᵢ (xᵢ − x̄)², we get

β̂1 = β1 + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)².
We now see that the estimator β̂1 equals the population slope β1, plus a term that is a linear combination of the errors {u1, u2, …, un}. Conditional on the values of xᵢ, the randomness in β̂1 is due entirely to the errors in the sample; the fact that these errors are generally different from zero is what causes β̂1 to differ from β1.

Proof for β1: In this proof, the expected values are conditional on the sample values of the independent variables. Taking expectations of β̂1 = β1 + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)²,

E(β̂1) = β1 + Σᵢ (xᵢ − x̄)E(uᵢ) / Σᵢ (xᵢ − x̄)² = β1,

where we have used the fact that the expected value of each uᵢ (conditional on {x1, x2, …, xn}) is zero under Assumptions SLR.2 and SLR.3.
Proof for β0: using β̂0 = ȳ − β̂1x̄ and averaging the population equation, ȳ = β0 + β1x̄ + ū, we have β̂0 = β0 + (β1 − β̂1)x̄ + ū. Taking expectations conditional on the xᵢ gives E(β̂0) = β0, since E(β̂1) = β1 and E(ū) = 0.
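Unbiasedness can be illustrated by simulating the fixed-in-repeated-samples scenario described above: hold the xᵢ fixed, draw fresh errors repeatedly, and average the slope estimates (a NumPy sketch with invented parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 5000
x = rng.uniform(0, 10, n)        # sample values of x, held fixed across replications

slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, 2.0, n)  # fresh errors each replication, E(u|x) = 0
    y = 1.0 + 0.5 * x + u        # true beta1 = 0.5
    slopes[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(slopes.mean())             # averages out to about the true slope 0.5
```

Individual estimates scatter around 0.5, but their average across replications matches the population slope, which is exactly what E(β̂1) = β1 asserts.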
Using simple regression when u contains factors affecting y that are also correlated with x can
result in spurious correlation: that is, we find a relationship between y and x that is really due to
other unobserved factors that affect y and also happen to be correlated with x. In addition to
omitted variables, there are other reasons for x to be correlated with u in the simple regression
model.
It is important to know how far we can expect β̂1 to be away from β1 on average. Among other things, this allows us to choose the best estimator among all, or at least a broad class of, unbiased estimators. The measure of spread in the distribution of β̂1 (and β̂0) that is easiest to work with is the variance or its square root, the standard deviation.
It turns out that the variance of the OLS estimators can be computed under Assumptions SLR.1 through SLR.4. However, these expressions would be somewhat complicated. Instead, we add an assumption that is traditional for cross-sectional analysis. Assumption SLR.5 (homoskedasticity) states that the variance of the unobservable u, conditional on x, is constant: Var(u|x) = σ².
We must emphasize that the homoskedasticity assumption is quite distinct from the zero conditional mean assumption, E(u|x) = 0. Assumption SLR.3 involves the expected value of u, while Assumption SLR.5 concerns the variance of u (both conditional on x). Recall that we established the unbiasedness of OLS without Assumption SLR.5: the homoskedasticity assumption plays no role in showing that β̂0 and β̂1 are unbiased. We add Assumption SLR.5 because it simplifies the variance calculations for β̂0 and β̂1, and because it implies that ordinary least squares has certain efficiency properties. If we were to assume that u and x are independent, then the distribution of u given x would not depend on x, and so E(u|x) = E(u) = 0 and Var(u|x) = Var(u) = σ². But independence is sometimes too strong an assumption.
Because Var(u|x) = E(u²|x) − [E(u|x)]² and E(u|x) = 0, homoskedasticity can equivalently be written as E(u²|x) = σ². This means σ² is also the unconditional expectation of u² (the unconditional variance of u). σ² is often called the error variance or disturbance variance. The square root of σ², σ, is the standard deviation of the error. A larger σ means that the distribution of the unobservables affecting y is more spread out.
It is often useful to write Assumptions SLR.3 and SLR.5 in terms of the conditional mean and conditional variance of y:

E(y|x) = β0 + β1x and Var(y|x) = σ².
In other words, the conditional expectation of y given x is linear in x, but the variance of y given x is constant. This situation is graphed in the figure below. When Var(u|x) depends on x, the error term is said to exhibit heteroskedasticity (or nonconstant variance). Since Var(u|x) = Var(y|x), heteroskedasticity is present whenever Var(y|x) is a function of x. See the figure below for heteroskedasticity.
Proof for the variance of β1: starting from β̂1 = β1 + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)² and using homoskedasticity (conditional on the xᵢ),

Var(β̂1) = σ² / Σᵢ (xᵢ − x̄)².

Var(β̂1) depends on the error variance, σ², and the total variation in {x1, x2, …, xn}. First, the larger the error variance, the larger is Var(β̂1). This makes sense, since more variation in the unobservables affecting y makes it more difficult to precisely estimate β1. On the other hand, more variability in the independent variable is preferred: as the variability in the xᵢ increases, the variance of β̂1 decreases. This also makes intuitive sense, since the more spread out the sample of independent variables is, the easier it is to trace out the relationship between E(y|x) and x. That
is, the easier it is to estimate β1. If there is little variation in the xᵢ, then it can be hard to pinpoint how E(y|x) varies with x. As the sample size increases, so does the total variation in the xᵢ. Therefore, a larger sample size results in a smaller variance for β̂1. For the purposes of constructing confidence intervals and deriving test statistics, we will need to work with the standard deviations of the estimators.
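The variance formula Var(β̂1) = σ²/Σᵢ(xᵢ − x̄)² can be checked by simulation (a NumPy sketch; parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma = 40, 10000, 1.5
x = rng.uniform(0, 5, n)                            # fixed regressors
theory = sigma**2 / ((x - x.mean()) ** 2).sum()     # Var(b1) = sigma^2 / SST_x

slopes = np.empty(reps)
for r in range(reps):
    y = 2.0 + 1.0 * x + rng.normal(0.0, sigma, n)
    slopes[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(theory, slopes.var())   # simulated variance is close to the formula
```

Doubling σ quadruples the variance, while spreading out the xᵢ (or raising n) shrinks it, matching the two comparative statics discussed above.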
The model yᵢ = β0 + β1xᵢ + uᵢ shows how to write the population model in terms of a randomly sampled observation, where uᵢ is the error for observation i. We can also express yᵢ in terms of its fitted value and residual: yᵢ = ŷᵢ + ûᵢ = β̂0 + β̂1xᵢ + ûᵢ. Comparing these two equations, we see that the error uᵢ shows up in the equation containing the population parameters β0 and β1. On the other hand, the residual ûᵢ shows up in the estimated equation with β̂0 and β̂1. The errors are never observable, while the residuals are computed from the data.

We can write the residuals as a function of the errors by substituting the population model for yᵢ:

ûᵢ = yᵢ − β̂0 − β̂1xᵢ = uᵢ − (β̂0 − β0) − (β̂1 − β1)xᵢ.

Although the expected value of β̂0 equals β0, and similarly E(β̂1) = β1, ûᵢ is not the same as uᵢ. The difference between them does, however, have an expected value of zero.
A natural candidate for estimating σ² is the average of the squared errors, (1/n) Σᵢ uᵢ². Unfortunately, this is not a true estimator, because we do not observe the errors uᵢ. But we do have the OLS residuals ûᵢ. If we replace the errors with the OLS residuals, we have RSS/n = (1/n) Σᵢ ûᵢ². This is a true estimator, because it gives a computable rule for any sample of data on x and y. One slight drawback to this estimator is that it turns out to be biased (although for large n the bias is small). Since it is easy to compute an unbiased estimator, we use that instead.
The estimator RSS/n is biased essentially because it does not account for two restrictions that must be satisfied by the OLS residuals. These restrictions are given by the two OLS first order conditions:

Σᵢ ûᵢ = 0 and Σᵢ xᵢûᵢ = 0.
These restrictions mean that if we know n − 2 of the residuals, we can always get the other two residuals by using the restrictions implied by the first order conditions. Thus, there are only n − 2 degrees of freedom in the OLS residuals (as opposed to n degrees of freedom in the errors). So the unbiased estimator of σ² that we will use makes a degrees-of-freedom adjustment:

σ̂² = RSS/(n − 2) = (1/(n − 2)) Σᵢ ûᵢ².
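A simulation makes the degrees-of-freedom adjustment visible: with a small n, RSS/n systematically underestimates σ² while RSS/(n − 2) does not (a NumPy sketch, all parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, sigma2 = 10, 20000, 4.0     # small n makes the bias of RSS/n visible
x = rng.uniform(0, 10, n)            # fixed regressors

biased, unbiased = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 1.0 + 0.5 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    rss = ((y - b0 - b1 * x) ** 2).sum()
    biased[r], unbiased[r] = rss / n, rss / (n - 2)

# E[RSS] = (n - 2)*sigma^2, so RSS/n centers on (n-2)/n * sigma^2 = 3.2,
# while RSS/(n - 2) centers on sigma^2 = 4.0
print(biased.mean(), unbiased.mean())
```

With n = 10 the downward bias of RSS/n is a full 20 percent, exactly the factor (n − 2)/n predicted by E(RSS) = (n − 2)σ².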
Proof: if we average the equation ûᵢ = uᵢ − (β̂0 − β0) − (β̂1 − β1)xᵢ across all i and use the fact that the OLS residuals average to zero, we get 0 = ū − (β̂0 − β0) − (β̂1 − β1)x̄. Therefore, subtracting,

ûᵢ = (uᵢ − ū) − (β̂1 − β1)(xᵢ − x̄).
Substituting into the sum of squared residuals and expanding,

Σᵢ ûᵢ² = Σᵢ (uᵢ − ū)² + (β̂1 − β1)² Σᵢ (xᵢ − x̄)² − 2(β̂1 − β1) Σᵢ (xᵢ − x̄)(uᵢ − ū).

Taking expectations (conditional on the xᵢ): E[Σᵢ (uᵢ − ū)²] = (n − 1)σ², E[(β̂1 − β1)² Σᵢ (xᵢ − x̄)²] = σ², and E[(β̂1 − β1) Σᵢ (xᵢ − x̄)(uᵢ − ū)] = σ². So that

E(Σᵢ ûᵢ²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ²,

and hence E[RSS/(n − 2)] = σ².
If σ̂² = RSS/(n − 2) is plugged into the parameter variance formulas, then we have unbiased estimators of Var(β̂1) and Var(β̂0). Substituting, we get the standard error of the slope,

se(β̂1) = σ̂ / [Σᵢ (xᵢ − x̄)²]^(1/2),

and an analogous expression for se(β̂0).
Note that se(β̂1) is viewed as a random variable when we think of running OLS over different samples of y; this is because σ̂ varies with different samples. For a given sample, se(β̂1) is a number, just as β̂1 is a number once it has been computed from the data.
Regression through the Origin
In rare cases, we wish to impose the restriction that when x = 0, the expected value of y is zero. For example, if income (x) is zero, then income tax revenues (y) must also be zero. In addition, there are problems where a model that originally has a nonzero intercept is transformed into a model without an intercept.
Formally, we now choose a slope estimator, which we call β̃1, and a line of the form

ỹ = β̃1x,

where the tildes over β1 and y are used to distinguish this problem from the much more common problem of estimating an intercept along with a slope. This model is called regression through the origin because the line passes through the point (x = 0, ỹ = 0). To obtain the slope estimate, we still rely on the method of ordinary least squares, which in this case minimizes the sum of squared residuals, Σᵢ (yᵢ − β̃1xᵢ)².
Solving the first order condition for β̃1 gives

β̃1 = Σᵢ xᵢyᵢ / Σᵢ xᵢ²,

provided that not all the xᵢ are zero, a case we rule out.
The estimate through the origin and the estimate with an intercept are the same if, and only if, x̄·β̂0 = 0; that is, when either the sample mean of x is zero or the usual intercept estimate β̂0 is exactly zero.
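A final NumPy sketch contrasts the through-the-origin slope β̃1 = Σxᵢyᵢ/Σxᵢ² with the usual OLS slope (simulated data, invented coefficients; here the true model really does pass through the origin):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(0, 10, n)
y = 0.8 * x + rng.normal(size=n)     # true model has no intercept

# Regression through the origin: slope = sum(x*y) / sum(x^2)
b1_origin = (x * y).sum() / (x ** 2).sum()

# Usual OLS with an intercept, for comparison
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

print(b1_origin, b1, b0)   # both slopes near 0.8; intercept near 0
```

When the true intercept is zero, both estimators recover the slope; if the true intercept were nonzero, forcing the line through the origin would bias β̃1.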
===========================//================================