Econometrics II

The document is a compilation of notes on Econometrics II from Bonga University, covering topics such as regression analysis with qualitative information, time series econometrics, simultaneous equation models, and panel data. It discusses the use of binary (dummy) variables in regression models, the linear probability model, and the logit model, highlighting their applications and limitations. The content is structured into chapters, each focusing on different econometric methods and concepts relevant to economic analysis.


BONGA UNIVERSITY

COLLEGE OF BUSINESS AND ECONOMICS


DEPARTMENT OF ECONOMICS
Econometrics II

Compiled by Firehun Jemal

March, 2023
Bonga, Ethiopia
Table of Contents
CHAPTER ONE
Regression Analysis with Qualitative Information: Binary (Dummy) Variables
1.1 Describing Qualitative Information
1.2 A Single Dummy Independent Variable
1.3 The Logit Model
1.4 The Probit Model
CHAPTER TWO: TIME SERIES ECONOMETRICS
2.1 STOCHASTIC PROCESSES
2.1.1 Stationary Stochastic Processes
2.1.2 Non-stationary Stochastic Processes
2.2 UNIT ROOT STOCHASTIC PROCESS
2.3 TREND STATIONARY (TS) AND DIFFERENCE STATIONARY (DS)
2.4 INTEGRATED STOCHASTIC PROCESSES
CHAPTER THREE
INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1 Complete Simultaneous Equation Model
3.2 Simultaneity Bias (Inconsistency of OLS Estimators under SEM)
3.2.1 Identification of Structural Equations (Order and Rank Conditions) (without proof)
CHAPTER FOUR:
Introduction to PANEL DATA
4.1 Pooled Data
4.1.2 Panel Data / Longitudinal Data
4.2 Estimation of Panel Data Regression Models
4.3 Comparison of FEM vs. ECM

CHAPTER ONE
Regression Analysis with Qualitative Information:
Binary (Dummy Variables)
1.1 Describing Qualitative Information
Qualitative data are non-numerical observations that can be recorded through methods such as direct
observation, one-to-one interviews, and focus groups. In statistics, qualitative data are also known as
categorical data: data that can be arranged into categories based on the attributes and properties of a thing
or a phenomenon.
Qualitative factors often come in the form of binary information: a person is female or male; a person does
or does not own a personal computer; a firm offers a certain kind of employee pension plan or it does not;
a state administers capital punishment or it does not. In all of these examples, the relevant information can
be captured by defining a binary variable or a zero-one variable. In econometrics, binary variables are
most commonly called dummy variables, although this name is not especially descriptive.
In defining a dummy variable, we must decide which event is assigned the value one and which is assigned
the value zero. For example, in a study of individual wage determination, we might define female to be a
binary variable taking on the value one for females and the value zero for males. The name in this case
indicates the event with the value one. The same information is captured by defining male to be one if the
person is male and zero if the person is female. Either of these is better than using gender because this
name does not make it clear when the dummy variable is one: does gender = 1 correspond to male or
female? What we call our variables is unimportant for getting regression results, but it always helps to
choose names that clarify equations and expositions.
Why do we use the values zero and one to describe qualitative information?
In a sense, these values are arbitrary: any two different values would do. The real benefit of capturing
qualitative information using zero-one variables is that it leads to regression models where the
parameters have very natural interpretations, as we will see now.

1.2 A Single Dummy Independent Variable
How do we incorporate binary information into regression models? In the simplest case, with only a single
dummy explanatory variable, we just add it as an independent variable in the equation. For example,
consider the following simple model of hourly wage determination:
wage = β0 + δ0 female + β1 educ + u ……………………………………………………………… (1.1)

We use 𝛿0 as the parameter on female in order to highlight the interpretation of the parameters multiplying
dummy variables; later, we will use whatever notation is most convenient. In model (1.1), only two
observed factors affect wage: gender and education. Because female=1 when the person is female, and
female = 0 when the person is male, the parameter 𝛿0 has the following interpretation: 𝛿0 is the difference
in hourly wage between females and males, given the same amount of education (and the same error term
u). Thus, the coefficient 𝛿0 determines whether there is discrimination against women: if 𝛿0 < 0, then, for
the same level of other factors, women earn less than men on average.
Why don't we also include in (1.1) a dummy variable, say male, which is one for males and zero for
females? This would be redundant: in (1.1), the intercept for males is 𝛽0, and the intercept for females is
𝛽0 + 𝛿0 . Because there are just two groups, we only need two different intercepts. This means that, in
addition to 𝛽0, we need to use only one dummy variable; we have chosen to include the dummy variable
for females. Using two dummy variables would introduce perfect collinearity because female + male = 1,
which means that male is a perfect linear function of female.
Including dummy variables for both genders is the simplest example of the so-called dummy variable trap,
which arises when too many dummy variables describe a given number of groups.

In (1.1), we have chosen males to be the base group or benchmark group, that is, the group against which
comparisons are made. This is why 𝛽0 is the intercept for males, and 𝛿0 is the difference in intercepts
between females and males. We could choose females as the base group by writing the model as
𝑤𝑎𝑔𝑒 = 𝛼0 + 𝛾0 𝑚𝑎𝑙𝑒 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢
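The interpretation of δ0 can be checked with a short simulation. The sketch below is an illustration added here, not part of the original notes: it generates hypothetical wage data with assumed "true" parameters β0 = 1.5, δ0 = −0.8, and β1 = 0.5, then recovers them by OLS with NumPy. The coefficient on female estimates the female-male intercept gap at any given level of education.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated data: years of education and a 0/1 female dummy.
educ = rng.uniform(8, 16, size=n)
female = rng.integers(0, 2, size=n).astype(float)

# Assumed "true" parameters, used only to generate the simulation:
# beta0 = 1.5, delta0 = -0.8 (female intercept shift), beta1 = 0.5
wage = 1.5 - 0.8 * female + 0.5 * educ + rng.normal(0, 1, size=n)

# OLS with regressor matrix [1, female, educ], as in equation (1.1)
X = np.column_stack([np.ones(n), female, educ])
b0, d0, b1 = np.linalg.lstsq(X, wage, rcond=None)[0]

# d0 estimates the female-male intercept difference at any educ level
print(b0, d0, b1)
```

With 500 observations the estimates land close to the assumed values, and the sign of the estimated δ0 reproduces the wage gap built into the simulated data.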
A Binary Dependent Variable
A binary dependent variable can be estimated with the following models.
I. The Linear Probability Model
A linear probability model (LPM) is a regression model where the outcome variable (regressand) is a
binary variable, and one or more explanatory variables are used to predict the outcome. Explanatory
variables can themselves be binary, or be continuous.
To fix ideas, consider the following linear probability regression model:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋𝑖 + 𝑢𝑖 …………………… (1)
Where X = family income and
Y = 1 if the family owns a house, and
Y = 0 if it does not own a house.
The conditional expectation of Yi given Xi, E(Yi | Xi), can be interpreted as the conditional probability
that the event will occur given Xi, that is, Pr(Yi = 1 | Xi). Thus, in our example, E(Yi | Xi) gives the
probability that a family whose income is the given amount Xi owns a house.
The justification of the name LPM for models like (1) can be seen as follows: Assuming E(ui)= 0, as usual
(to obtain unbiased estimators), we obtain
E(Yi | Xi) = β1 + β2Xi ……………………………………………………………………………… (2)
Now, if Pi = probability that Yi = 1 (that is, the event occurs), and (1 − Pi) = probability that Yi=0 (that is,
that the event does not occur), the variable Yi has the following (probability) distribution.
Yi       Probability
0        1 − Pi
1        Pi
Total    1

That is, Yi follows the Bernoulli probability distribution.


Now, by the definition of mathematical expectation, we obtain:
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ………………………………..………………………………………(3)
Comparing (2) with (3), we can equate
E(Yi | Xi) = β1 + β2Xi = Pi ……………………………………………………………………………(4)
That is, the conditional expectation of the model (1) can, in fact, be interpreted as the conditional
probability of Yi . In general, the expectation of a Bernoulli random variable is the probability that the
random variable equals 1. In passing note that if there are n independent trials, each with a probability p
of success and probability (1 − p) of failure, and X of these trials represent the number of successes, then
X is said to follow the binomial distribution. The mean of the binomial distribution is np and its variance
is np (1 − p). The term success is defined in the context of the problem.
Since the probability Pi must lie between 0 and 1, we have the restriction
0 ≤ E(Yi | Xi) ≤ 1 ……………………………………………………………………………..(5)
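A quick numerical check of equation (3) and the binomial facts above; this is an added NumPy sketch, with p = 0.3 and n = 10 chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A Bernoulli variable with P(Y = 1) = p has expectation p.
p = 0.3
y = rng.binomial(1, p, size=100_000)
print(y.mean())   # close to p = 0.3

# With n independent trials, the number of successes X is binomial,
# with mean n*p and variance n*p*(1 - p).
n_trials = 10
x = rng.binomial(n_trials, p, size=100_000)
print(x.mean(), x.var())   # close to 3.0 and 2.1
```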

that is, the conditional expectation (or conditional probability) must lie between 0 and 1.
From the preceding discussion it would seem that OLS can be easily extended to binary dependent variable
regression models. However, the LPM poses several problems, which are as follows:

Non-Normality of the Disturbances ui


Although OLS does not require the disturbances (ui) to be normally distributed, we assumed them to be
so distributed for the purpose of statistical inference. But the assumption of normality for ui is not tenable
for the LPMs because, like Yi, the disturbances ui also take only two values; that is, they also follow the
Bernoulli distribution.
Obviously, ui cannot be assumed to be normally distributed; they follow the Bernoulli distribution.
But the nonfulfillment of the normality assumption may not be so critical as it appears because we know
that the OLS point estimates still remain unbiased (recall that, if the objective is point estimation, the
normality assumption is not necessary). Besides, as the sample size increases indefinitely, statistical theory
shows that the OLS estimators tend to be normally distributed generally. As a result, in large samples the
statistical inference of the LPM will follow the usual OLS procedure under the normality assumption.
Limitation of LPM
As we have seen, the LPM is plagued by several problems, such as
1) non-normality of ui,
2) heteroscedasticity of ui,
3) possibility of Ŷi lying outside the 0–1 range, and
4) The generally lower R2 values.
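Limitation 3 is easy to demonstrate by simulation. The sketch below is illustrative only (the income range and true ownership probabilities are assumptions made for this example): it fits an LPM to hypothetical home-ownership data by OLS and shows fitted "probabilities" escaping the [0, 1] interval.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Hypothetical data: income (in $1000s) and home ownership (0/1).
income = rng.uniform(10, 100, size=n)
p_true = np.clip(0.2 + 0.012 * income, 0, 1)   # true P(own) rises with income
own = rng.binomial(1, p_true)

# Fit the LPM Yi = b1 + b2*Xi by OLS.
X = np.column_stack([np.ones(n), income])
b = np.linalg.lstsq(X, own, rcond=None)[0]
fitted = X @ b

# Problem 3 in action: some fitted "probabilities" exceed 1.
print(fitted.min(), fitted.max())
print(((fitted < 0) | (fitted > 1)).sum(), "fitted values outside [0, 1]")
```

Because the true probability flattens at 1 for high incomes while the LPM forces a straight line, the line overshoots 1 at the top of the income range.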
1.3 The Logit Model
We will continue with our home ownership example to explain the basic ideas underlying the logit model.
Recall that in explaining home ownership in relation to income, the LPM was
Pi = E(Y = 1 | Xi) = β1 + β2Xi ………………………………………….................................................. (1)
Where X is income and Y = 1 means the family owns a house. But now consider the following
representation of home ownership:
Pi = E(Y = 1 | Xi) = 1 / (1 + e^(−(β1 + β2Xi))) ……………………………………………………… (2)

For ease of exposition, we write (2) as


Pi = 1 / (1 + e^(−Zi)) = e^(Zi) / (1 + e^(Zi)) ………………………………………………………… (3)

where Zi = β1 + β2Xi.
Equation (3) represents what is known as the (cumulative) logistic distribution function.

It is easy to verify that as Zi ranges from −∞ to +∞, Pi ranges between 0 and 1 and that Pi is nonlinearly
related to Zi(i.e., Xi), thus satisfying the two requirements considered earlier. But it seems that in satisfying
these requirements, we have created an estimation problem because Pi is nonlinear not only in X but also
in the β’s as can be seen clearly from (2). This means that we cannot use the familiar OLS procedure to
estimate the parameters. But this problem is more apparent than real because (2) can be linearized, which
can be shown as follows.
If Pi, the probability of owning a house, is given by (3), then (1 − Pi ), the probability of not owning a
house, is
1 − Pi = 1 / (1 + e^(Zi)) ………………………………………………………………………………… (4)

Therefore, we can write


Pi / (1 − Pi) = (1 + e^(Zi)) / (1 + e^(−Zi)) = e^(Zi) …………………………………………………… (5)

Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house—the ratio of the probability that a
family will own a house to the probability that it will not own a house. Thus, if Pi = 0.8, it means that
odds are 4 to 1 in favor of the family owning a house.
Now if we take the natural log of (5), we obtain a very interesting result, namely,
Li = ln(Pi / (1 − Pi)) = Zi = β1 + β2Xi ………………………………………………………………… (6)
that is, L, the log of the odds ratio, is not only linear in X, but also (from the estimation viewpoint) linear
in the parameters. L is called the logit, and hence the name logit model for models like (6). In this
situation we may have to resort to the maximum likelihood (ML) method to estimate the parameters rather
than OLS.
Notice these features of the logit model.
1) As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit L goes from −∞ to +∞. That is,
although the probabilities (of necessity) lie between 0 and 1, the logits are not so bounded.
2) Although L is linear in X, the probabilities themselves are not. This property is in contrast with the
LPM model (1) where the probabilities increase linearly with X.
3) Although we have included only a single X variable, or regressor, in the preceding model, one can
add as many regressors as may be dictated by the underlying theory.
4) If L, the logit, is positive, it means that when the value of the regressor(s) increases, the odds that
the regressand equals 1 (meaning some event of interest happens) increases. If L is negative, the
odds that the regressand equals 1 decreases as the value of X increases. To put it differently, the

logit becomes negative and increasingly large in magnitude as the odds ratio decreases from 1 to
0 and becomes increasingly large and positive as the odds ratio increases from 1 to infinity.
5) More formally, the interpretation of the logit model given in (6) is as follows: β2, the slope,
measures the change in L for a unit change in X, that is, it tells how the log-odds in favor of owning
a house change as income changes by a unit, say, $1000. The intercept β1 is the value of the log
odds in favor of owning a house if income is zero. Like most interpretations of intercepts, this
interpretation may not have any physical meaning.
6) Given a certain level of income, say, X*, if we actually want to estimate not the odds in favor of
owning a house but the probability of owning a house itself, this can be done directly from (3)
once the estimates of β1 and β2 are available. This, however, raises the most important question:
How do we estimate β1 and β2 in the first place? The answer is given in the next section.
7) Whereas the LPM assumes that Pi is linearly related to Xi, the logit model assumes that the log of
the odds ratio is linearly related to Xi. Using the estimated Pi, we can obtain the estimated logit as:
L̂i = ln(P̂i / (1 − P̂i)) = β̂1 + β̂2Xi
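The features listed above can be verified numerically. The following sketch (added here for illustration, using NumPy) evaluates the logistic CDF in (3), confirms that the log of the odds ratio recovers Z exactly, and reproduces the odds computation for Pi = 0.8.

```python
import numpy as np

def logistic(z):
    """Cumulative logistic function: P = 1 / (1 + e**(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# However large |Z| gets, P stays strictly between 0 and 1 (feature 1).
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = logistic(z)
print(p)

# The logit (log of the odds ratio) recovers Z, so the log-odds is
# linear in the parameters: L = Z = beta1 + beta2*X.
L = np.log(p / (1 - p))
print(np.allclose(L, z))   # True

# Example from the text: P = 0.8 gives odds of about 4 to 1.
print(0.8 / (1 - 0.8))
```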

1.4 The Probit Model


As we have noted, to explain the behavior of a dichotomous dependent variable we have to use a
suitably chosen CDF. The logit model uses the cumulative logistic function, but in principle one could
substitute the normal CDF in its place. The estimating model that emerges from the normal CDF is
popularly known as the probit model, although sometimes it is also known as the normit model.

CHAPTER TWO:
TIME SERIES ECONOMETRICS
A time series data set consists of observations on a variable or several variables over time.
In economics examples of time series data include stock prices, money supply, consumer price index,
gross domestic product, exchange rates, exports, etc.
In such a time series data set, time is an important dimension because past as well as current events
influence future events; that is, lags matter in time series analysis. Unlike the arrangement of
cross-sectional data, the chronological ordering of observations (variables) in a time series conveys
potentially important information.
Thus, a key feature of time series data that makes it more difficult to analyze is the fact that economic
observations can rarely be assumed to be independent across time. Therefore, in general a time series data
is a sequence of numerical data in which each variable is associated with a particular instant in time.
2.1. STOCHASTIC PROCESSES
A random or stochastic process is a collection of random variables ordered in time. If we let Y denote a
random variable, and if it is continuous, we denote it as Y(t), but if it is discrete, we denote it as Yt. An
example of the former is an electrocardiogram, and an example of the latter is GDP, PDI, etc.
2.1.1 Stationary Stochastic Processes
A type of stochastic process that has received a great deal of attention and scrutiny by time series analysts
is the so-called stationary stochastic process. Broadly speaking, a stochastic process is said to be
stationary if its mean and variance are constant over time and the value of the covariance between the two
time periods depends only on the distance or gap or lag between the two time periods and not the actual
time at which the covariance is computed.
In the time series literature, such a stochastic process is known as a weakly stationary, or covariance
stationary, or second-order stationary, or wide-sense stationary stochastic process.
To explain weak stationarity, let Yt be a stochastic time series with these properties:
Mean: E(Yt) = μ
Variance: var (Yt) = E(Yt − μ)2 = σ2
Covariance: γk = E[(Yt − μ)(Yt+k − μ)]
where γk, the covariance (or autocovariance) at lag k, is the covariance between the values of Yt and Yt+k,
that is, between two Y values k periods apart. If k = 0, we obtain γ0, which is simply the variance of Y (=
σ2); if k = 1, γ1 is the covariance between two adjacent values of Y. Now suppose we shift the origin of Y from

Yt to Yt+m (say, from the first quarter of 1970 to the first quarter of 1975 for our GDP data). Now if Yt is
to be stationary, the mean, variance, and autocovariances of Yt+m must be the same as those of Yt. In
short, if a time series is stationary, its mean, variance, and autocovariances (at various lags) remain the
same no matter at what point we measure them; that is, they are time invariant. Such a time series will
tend to return to its mean (called mean reversion) and fluctuations around this mean (measured by its
variance) will have a broadly constant amplitude. If a time series is not stationary in the sense just defined,
it is called a nonstationary time series (keep in mind we are talking only about weak stationarity). In
other words, a nonstationary time series will have a time varying mean or a time-varying variance or both.
A time series is strictly stationary if all the moments of its probability distribution and not just the first
two (i.e., mean and variance) are invariant over time. If, however, the stationary process is normal, the
weakly stationary stochastic process is also strictly stationary, for the normal stochastic process is fully
specified by its two moments, the mean and the variance.
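These definitions can be checked by simulation. The sketch below is an added illustration; the AR(1) process Yt = 0.5 Yt−1 + ut is an assumed example of a weakly stationary series. It computes the sample mean, variance, and lag-1 autocovariance on the two halves of a long simulated series and finds them roughly equal, i.e. time invariant, as the definition requires.

```python
import numpy as np

rng = np.random.default_rng(3)

def autocov(y, k):
    """Sample autocovariance at lag k: mean of (Yt - mu)(Yt+k - mu)."""
    mu = y.mean()
    return np.mean((y[:-k] - mu) * (y[k:] - mu)) if k > 0 else y.var()

# Simulate a weakly stationary AR(1) process: Yt = 0.5*Yt-1 + ut.
n, rho = 20_000, 0.5
y = np.zeros(n)
for t in range(1, n):
    y[t] = rho * y[t - 1] + rng.normal()

# Mean, variance, and lag-1 autocovariance are (roughly) the same in
# both halves of the sample: they do not depend on where we measure.
a, b = y[: n // 2], y[n // 2 :]
print(a.mean(), b.mean())            # both near 0
print(a.var(), b.var())              # both near 1/(1 - rho**2) = 1.33
print(autocov(a, 1), autocov(b, 1))  # both near rho/(1 - rho**2) = 0.67
```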
Why are stationary time series so important?
Because if a time series is non-stationary,
 We can study its behavior only for the time period under consideration.
 Each set of time series data will therefore be for a particular episode.
 As a consequence, it is not possible to generalize it to other time periods.
Therefore, for the purpose of forecasting, such (non-stationary) time series may be of little practical value.
How do we know that a particular time series is stationary?
We will consider several tests of stationarity.
Before we move on, we mention a special type of stochastic process (or time series), namely, a purely
random, or white noise, process. We call a stochastic process purely random if it has፡
 zero mean,
 constant variance σ2,
 and is serially uncorrelated.
If it is also independent, such a process is called strictly white noise.
2.1.2 Non-stationary Stochastic Processes
Although our interest is in stationary time series, one often encounters non-stationary time series, the
classic example being the random walk model (RWM).
It is often said that asset prices, such as stock prices or exchange rates, follow a random walk; that is, they
are non-stationary. We distinguish two types of random walks:
(1) Random walk without drift (i.e., no constant or intercept term)
(2) Random walk with drift (i.e., a constant term is present).
The term random walk is often compared with a drunkard’s walk. Leaving a bar, the drunkard moves a
random distance ut at time t, and, continuing to walk indefinitely, will eventually drift farther and farther
away from the bar. The same is said about stock prices. Today’s stock price is equal to yesterday’s stock
price plus a random shock.
Random Walk without Drift.
Suppose ut is a white noise error term with mean 0 and variance σ2. Then the series Yt is said to be a
random walk if
Yt = Yt−1 + ut
In the random walk model, the value of Y at time t is equal to its value at time (t − 1) plus a random shock;
thus it is an AR(1) model, since it is a regression of Y at time t on its value lagged one period. Believers
in the efficient capital market hypothesis argue that stock prices are essentially random and therefore there
is no scope for profitable speculation in the stock market: if one could predict tomorrow's price on the
basis of today's price, we would all be millionaires.
Now from (Yt = Yt−1 + ut ) we can write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
In general, if the process started at some time 0 with a value of Y0, we have
Yt = Y0 + u1 + u2 + ··· + ut = Y0 + ∑ut
Therefore,
E(Yt) = E(Y0 + ∑ut) = Y0 (why?)
In like fashion, it can be shown that
var (Yt) = tσ2
As the preceding expression shows, the mean of Y is equal to its initial, or starting, value, which is constant,
but as t increases, its variance increases indefinitely, thus violating a condition of stationarity. In short, the
RWM without drift is a non-stationary stochastic process. In practice Y0 is often set at zero, in which case
E(Yt) = 0. An interesting feature of the RWM is the persistence of random shocks (i.e., random errors): Yt is
the sum of the initial Y0 plus the sum of random shocks. As a result, the impact of a particular shock does not
die away. For example, if u2 = 2 rather than u2 = 0, then all Yt ’s from Y2 onward will be 2 units higher
and the effect of this shock never dies out. That is why the random walk is said to have an infinite memory:
as Kerry Patterson notes, the random walk remembers the shock forever.

Interestingly, if you write (Yt = Yt−1 + ut) as
(Yt − Yt−1) = ∆Yt = ut
where ∆ is the first difference operator, it is easy to show that, while Yt is nonstationary, its first difference
is stationary. In other words, the first differences of a random walk time series are stationary.
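Both claims above, var(Yt) = tσ2 and stationarity of the first difference, can be illustrated by simulating many walks. This is a NumPy sketch added here, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate many independent random walks Yt = Yt-1 + ut, Y0 = 0, sigma = 1.
n_paths, T = 5_000, 200
u = rng.normal(size=(n_paths, T))
y = u.cumsum(axis=1)

# Across paths, var(Yt) = t * sigma**2 grows with t: nonstationarity.
print(y[:, 49].var(), y[:, 199].var())    # near 50 and 200

# The first difference dYt = ut is white noise, hence stationary.
dy = np.diff(y, axis=1)
print(dy[:, 49].var(), dy[:, 198].var())  # both near 1
```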
Random Walk with Drift.
Let us modify (Yt = Yt−1 + ut ) as follows:
Yt = δ + Yt−1 + ut
Where δ is known as the drift parameter.
The name drift comes from the fact that if we write the preceding equation as Yt − Yt−1 = ∆Yt = δ + ut,
it shows that Yt drifts upward or downward, depending on δ being positive or negative. Note that model
(Yt = δ + Yt−1 + ut) is also an AR (1) model.
For the random walk with drift model (Yt = δ + Yt−1 + ut)
E(Yt) = Y0 + t · δ
var (Yt) = tσ2
As you can see, for RWM with drift the mean as well as the variance increases over time, again violating
the conditions of (weak) stationarity. In short, RWM, with or without drift, is a non-stationary stochastic
process.
2.2. UNIT ROOT STOCHASTIC PROCESS
Let us write the RWM (Yt = Yt−1 + ut) as:
Yt = ρYt−1 + ut, where −1 ≤ ρ ≤ 1
This model resembles the Markov first-order autoregressive model that we discussed in the chapter on
autocorrelation. If ρ = 1, the model becomes a RWM (without drift). If ρ is in fact 1, we face what
is known as the unit root problem, that is, a situation of non-stationary; we already know that in this case
the variance of Yt is not stationary. The name unit root is due to the fact that ρ = 1. Thus the terms
nonstationarity, random walk, and unit root can be treated as synonymous.
If, however, |ρ| < 1, that is, if the absolute value of ρ is strictly less than one, then it can be shown that the
time series Yt is stationary in the sense we have defined it.
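The contrast between |ρ| < 1 and ρ = 1 shows up clearly in simulation. In the sketch below (illustrative; ρ = 0.9 is an assumed example), the cross-path variance of the stationary AR(1) settles at σ2/(1 − ρ2), while the variance of the unit root process keeps growing with t.

```python
import numpy as np

rng = np.random.default_rng(5)

def ar1_paths(rho, n_paths=4_000, T=400):
    """Simulate Yt = rho*Yt-1 + ut from Y0 = 0 for many paths."""
    y = np.zeros((n_paths, T + 1))
    u = rng.normal(size=(n_paths, T))
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + u[:, t - 1]
    return y

stat = ar1_paths(0.9)   # |rho| < 1: stationary
unit = ar1_paths(1.0)   # rho = 1: unit root (random walk)

# With |rho| < 1 the cross-path variance settles at 1/(1 - rho**2) = 5.26;
# with rho = 1 it keeps growing linearly in t.
print(stat[:, 100].var(), stat[:, 400].var())
print(unit[:, 100].var(), unit[:, 400].var())
```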
2.3 TREND STATIONARY (TS) AND DIFFERENCE STATIONARY (DS)
The distinction between stationary and non-stationary stochastic processes (or time series) has a crucial
bearing on the nature of the trend (the slow long-run evolution of the time series under consideration).
Broadly speaking, if the trend in a time series is completely predictable and not variable, we call it a deterministic

trend, whereas if it is not predictable, we call it a stochastic trend. To make the definition more formal,
consider the following model of the time series Yt .
Yt = β1 + β2t + β3Yt−1 + ut
where ut is a white noise error term and where t is time measured chronologically.
Now we have the following possibilities:
Pure random walk: If in (Yt = β1 + β2t + β3Yt−1 + ut ) β1 = 0, β2 = 0, β3 = 1, we get
Yt = Yt−1 + ut, which is nothing but a RWM without drift and is therefore nonstationary.
But note that, if we write (Yt = Yt−1 + ut) as
∆Yt = (Yt − Yt−1) = ut
It becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary process
(DSP).
Random walk with drift: If in (Yt = β1 + β2t + β3Yt−1 + ut) β1 ≠ 0, β2 = 0, β3 = 1, we get
Yt = β1 + Yt−1 + ut
which is a random walk with drift and is therefore nonstationary. If we write it as
(Yt − Yt−1) = ∆Yt = β1 + ut
this means Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is called a stochastic trend.
Equation ((Yt − Yt−1) = ∆Yt = β1 + ut) is a DSP process because the nonstationarity in Yt can be eliminated
by taking first differences of the time series.
Deterministic trend: If in (Yt = β1 + β2t + β3Yt−1 + ut), β1 ≠ 0, β2 ≠ 0, β3 = 0, we obtain
Yt = β1 + β2t + ut, which is called a trend stationary process (TSP). Although the
mean of Yt is β1 + β2t, which is not constant, its variance ( = σ2) is. Once the values of β1 and β2 are
known, the mean can be forecast perfectly. Therefore, if we subtract the mean of Yt from Yt, the resulting
series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic)
trend is called detrending.
Random walk with drift and deterministic trend: If in (Yt = β1 + β2t + β3Yt−1 + ut), β1 ≠ 0, β2 ≠ 0,
β3 = 1, we obtain:
Yt = β1 + β2t + Yt−1 + ut, that is, a random walk with drift and a deterministic trend,
which can be seen if we write this equation as ∆Yt = β1 + β2t + ut which means that Yt is non-stationary.
Deterministic trend with stationary AR(1) component: If in (Yt = β1 + β2t + β3Yt−1 + ut)
β1 ≠ 0, β2 ≠ 0, β3 < 1, then we get
Yt = β1 + β2t + β3Yt−1 + ut which is stationary around the deterministic trend. To see the
difference between stochastic and deterministic trends, consider the figure below. The series named stochastic

in this figure is generated by an RWM: Yt = 0.5 + Yt−1 + ut , where 500 values of ut were generated from
a standard normal distribution and where the initial value of Y was set at 1. The series named deterministic
is generated as follows: Yt = 0.5t + ut , where ut were generated as above and where t is time measured
chronologically.
As you can see from the figure, in the case of the deterministic trend, the deviations from the trend line
(which represents its nonstationary mean) are purely random and they die out quickly; they do not contribute
to the long-run development of the time series, which is determined by the trend component. In the case
of the stochastic trend, on the other hand, the random component ut affects the long-run course of the
series Yt .
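The two series described above are easy to regenerate. The sketch below (an added NumPy illustration of the same experiment, here with 500 shocks drawn from a standard normal) shows that the deviations of the deterministic-trend series from the line 0.5t are just ut and stay small, while the deviations of the stochastic series accumulate and wander.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 500
t = np.arange(1, T + 1)

# Stochastic trend: random walk with drift, Yt = 0.5 + Yt-1 + ut, Y0 = 1.
# (Y_t = Y_0 + 0.5*t + cumulative sum of the shocks.)
stoch = 1 + 0.5 * t + rng.normal(size=T).cumsum()

# Deterministic trend: Yt = 0.5*t + ut.
determ = 0.5 * t + rng.normal(size=T)

# Deviations from the trend line 0.5*t:
dev_d = determ - 0.5 * t        # just ut: shocks die out immediately
dev_s = stoch - 1 - 0.5 * t     # accumulated shocks: they never die out
print(dev_d.var(), dev_s.var())  # the stochastic deviations are far larger
```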
2.4 INTEGRATED STOCHASTIC PROCESSES
The random walk model is but a specific case of a more general class of stochastic processes known as
integrated processes. Recall that the RWM without drift is non-stationary, but its first difference is
stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as I (1). Similarly,
if a time series has to be differenced twice (i.e., take the first difference of the first differences) to make it
stationary, we call such a time series integrated of order 2, denoted as I (2). In general, if a (non-stationary) time series
has to be differenced d times to make it stationary, that time series is said to be integrated of order d.
A time series Yt integrated of order d is denoted as Yt ∼ I (d). If a time series Yt is stationary to begin with
(i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by Yt ∼ I (0).
Thus, we will use the terms “stationary time series” and “time series integrated of order zero” to mean the
same thing.
Most economic time series are generally I (1); that is, they generally become stationary only after taking
their first differences.
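This behavior is easy to see in a short simulation. The sketch below (using NumPy; the series and sample size are illustrative) generates a random walk, an I(1) series, and shows that one differencing recovers the white-noise shocks:

```python
import numpy as np

rng = np.random.default_rng(42)
u = rng.standard_normal(500)   # white-noise shocks u_t
y = np.cumsum(u)               # random walk Y_t = Y_{t-1} + u_t, an I(1) series
dy = np.diff(y)                # first difference: Delta Y_t = u_t, an I(0) series

# The level wanders (its variance grows with t), while the differenced
# series looks roughly the same in every subsample.
print("variance, first vs second half of Y:", y[:250].var(), y[250:].var())
print("same comparison for the differences:", dy[:250].var(), dy[250:].var())
```

The two subsample variances of the level typically differ sharply, while those of the differenced series are both close to 1.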
Properties of Integrated Series
The following properties of integrated time series may be noted: Let Xt , Yt , and Zt be three time series.
1. If Xt ∼ I(0) and Yt ∼ I(1), then Zt = (Xt + Yt) = I(1); that is, a linear combination or sum of stationary
and nonstationary time series is nonstationary.
2. If Xt ∼ I(d), then Zt = (a + bXt) = I(d), where a and b are constants. That is, a linear combination of an
I(d) series is also I(d). Thus, if Xt ∼ I(0), then Zt = (a + bXt) ∼ I(0).
3. If Xt ∼ I(d1) and Yt ∼ I(d2), then Zt = (aXt + bYt) ∼ I(d2), where d1 < d2.
4. If Xt ∼ I(d) and Yt ∼ I(d), then Zt = (aXt + bYt) ∼ I(d*); d* is generally equal to d, but in some cases
d* < d.
TESTS OF STATIONARITY

By now the reader probably has a good idea about the nature of stationary stochastic processes and their
importance. In practice we face two important questions:
(1) How do we find out if a given time series is stationary?
(2) If we find that a given time series is not stationary, is there a way that it can be made stationary?
THE UNIT ROOT TEST
A test of stationarity (or non-stationarity) that has become widely popular over the past several years is
the unit root test. We will first explain it, then illustrate it and then consider some limitations of this test.
We start with
Yt = ρYt−1 + ut, where −1 ≤ ρ ≤ 1 and ut is a white noise error term.
We know that if ρ = 1, that is, in the case of a unit root, Yt = ρYt−1 + ut becomes a random
walk model without drift, which we know is a non-stationary stochastic process. Therefore, why not
simply regress Yt on its (one-period) lagged value Yt−1 and find out if the estimated ρ is statistically equal
to 1? If it is, then Yt is non-stationary. This is the general idea behind the unit root test of stationarity.
For theoretical reasons, we manipulate (Yt = ρYt−1 + ut) as follows: Subtract Yt−1 from both sides of (Yt
= ρYt−1 + ut ) to obtain:
Yt − Yt−1 = ρYt−1 − Yt−1 + ut
= (ρ − 1)Yt−1 + ut which can be alternatively written as:
∆Yt = δYt−1 + ut where δ = (ρ − 1) and ∆, as usual, is the first-difference operator.
In practice, therefore, instead of estimating Yt = ρYt−1 + ut, we estimate ∆Yt = δYt−1 + ut and test the
(null) hypothesis that δ = 0. If δ = 0, then ρ = 1, that is we have a unit root, meaning the time series under
consideration is non-stationary.
Before we proceed to estimate ∆Yt = δYt−1 + ut, it may be noted that if δ = 0, it will become
∆Yt = (Yt − Yt−1) = ut
Since ut is a white noise error term, it is stationary, which means that the first differences of a random
walk time series are stationary, a point we have already made before. Now let us turn to the estimation of
∆Yt = δYt−1 + ut. This is simple enough; all we have to do is to take the first differences of Yt, regress
them on Yt−1, and
see if the estimated slope coefficient in this regression (= δ̂) is zero or not. If it is zero, we conclude that
Yt is nonstationary. But if it is negative, we conclude that Yt is stationary. The only question is which test
we use to find out if the estimated coefficient of Yt−1 in ∆Yt = δYt−1 + ut is zero or not. You might be
tempted to say, why not use the usual t test? Unfortunately, under the null hypothesis that δ = 0 (i.e., ρ = 1),
the t value of the estimated coefficient of Yt−1 does not follow the t distribution even in large samples;
that is, it does not have an asymptotic normal distribution.

The Augmented Dickey–Fuller (ADF) Test


In conducting the DF test in its various forms (with no constant, with a constant, and with a constant and trend), it was assumed that the error term ut was
uncorrelated. But in case the ut are correlated, Dickey and Fuller have developed a test, known as the
augmented Dickey–Fuller (ADF) test. This test is conducted by “augmenting” the preceding three
equations by adding the lagged values of the dependent variable ∆Yt.
Power of Test.
Most tests of the DF type have low power; that is, they tend to accept the null of unit root more frequently
than is warranted. That is, these tests may find a unit root even when none exists. There are several reasons
for this.
First, the power depends more on the (time) span of the data than on the mere size of the sample. For a given
sample size n, the power is greater when the span is large. Thus, unit root test(s) based on 30 observations
over a span of 30 years may have more power than that based on, say, 100 observations over a span of
100 days.
Second, if ρ ≈ 1 but not exactly 1, the unit root test may declare such a time series non-stationary.
Third, these types of tests assume a single unit root; that is, they assume that the given time series is I(1).
But if a time series is integrated of order higher than 1, say, I(2), there will be more than one unit root. In
the latter case one may use the Dickey–Pantula test. Fourth, if there are structural breaks in a time series
(see the chapter on dummy variables) due to, say, the OPEC oil embargoes, the unit root tests may not
catch them.

SUMMARY AND CONCLUSIONS


1. Regression analysis based on time series data implicitly assumes that the underlying time series are
stationary. The classical t tests, F tests, etc. are based on this assumption.
2. In practice most economic time series are non-stationary.
3. A stochastic process is said to be weakly stationary if its mean, variance, and auto co variances are
constant over time (i.e., they are time invariant).

4. At the informal level, weak stationarity can be tested by the correlogram of a time series, which is a
graph of autocorrelation at various lags. For stationary time series, the correlogram tapers off quickly,
whereas for non-stationary time series it dies off gradually. For a purely random series, the
autocorrelations at all lags 1 and greater are zero.
5. At the formal level, stationarity can be checked by finding out if the time series contains a unit root. The
Dickey–Fuller (DF) and augmented Dickey–Fuller (ADF) tests can be used for this purpose.
6. An economic time series can be trend stationary (TS) or difference stationary (DS). A TS time series
has a deterministic trend, whereas a DS time series has a variable, or stochastic, trend. The common
practice of including the time or trend variable in a regression model to detrend the data is justifiable only
for TS time series. The DF and ADF tests can be applied to determine whether a time series is TS or DS.
7. Regression of one time series variable on one or more time series variables often can give nonsensical
or spurious results. This phenomenon is known as spurious regression. One way to guard against it is to
find out if the time series are cointegrated.
8. Cointegration means that despite being individually nonstationary, a linear combination of two or
more time series can be stationary. The EG, AEG, and CRDW tests can be used to find out if two or more
time series are cointegrated.
9. Cointegration of two (or more) time series suggests that there is a long-run, or equilibrium, relationship
between them.
10. The error correction mechanism (ECM) developed by Engle and Granger is a means of reconciling
the short-run behavior of an economic variable with its long-run behavior.
11. The field of time series econometrics is evolving. The established results and tests are in some cases
tentative and a lot more work remains. An important question that needs an answer is why some economic
time series are stationary and some are nonstationary.

CHAPTER THREE
INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
Simultaneous equation model: A model with a single dependent variable and one or more
explanatory variables is called a single equation model. On the other hand, a system of
equations representing a set of relationships among variables, or describing the joint dependence of
variables, is called a simultaneous equation model. In such models there is more than one equation for the
mutually or jointly dependent (endogenous) variables.
Example: Let us consider the following demand–supply model.
Demand function: Qtd = α0 + α1Pt + α2Yt + u1t,   α1 < 0, α2 > 0
Supply function: Qts = β0 + β1Pt + β2Zt + u2t,   β1 > 0, β2 < 0
Equilibrium condition: Qtd = Qts
Where, Qtd = quantity demanded
Qts = quantity supplied
P = price
Y = disposable income
Z = cost of main raw materials
t = time
This is an example of a simultaneous equation model.

3.1 Complete simultaneous equation model :


A simultaneous equation model is complete if the number of endogenous
variables is equal to the number of equations.

Example :consider the model of income determination .


Consumption function: Ct = β0 + β1Yt + ut
Income identity: Yt = Ct + It
Where, C = consumption
Y = income
I = investment
t = time
u = error term

Here the number of endogenous variables is equal to the number of equations. Hence this is a complete
simultaneous equation model.

Define or distinguish between endogenous and exogenous variable :

Endogenous variable: In the context of SEM, the jointly dependent variables are called endogenous
variables; that is, the variables whose values are determined within the model are called endogenous
variables.
The endogenous variables are regarded as stochastic.

Exogenous/predetermined variable:
A variable whose values are given from outside the model is called an exogenous variable. In SEM, the
predetermined variables, that is, the variables whose values are determined outside the model, are called
exogenous variables. They are also treated as non-stochastic, or nonrandom, variables.

What are the differences between a single equation model and a simultaneous equation model?

Single equation model:
1. A model representing a single relationship among variables is called a single equation model.
2. In such a model there is only one equation.
3. Example: consumption function: Ct = β0 + β1Yt + ut
4. The OLS method of estimation may be used.
5. One can estimate the parameters of a single equation.

Simultaneous equation model:
1. A model representing joint relationships among variables is called a simultaneous equation model.
2. In such a model there is more than one equation.
3. Example: Ct = β0 + β1Yt + ut;  Yt = Ct + It
4. The classical OLS method may not be applied.
5. One may not estimate the parameters of a single equation in isolation.

Define structural model/behavioral model.


Ans. Structural model: A complete system of equations, which describes or represents the structure of
the relationships among economic variables, is called a structural model. The endogenous variables are
expressed as functions of the other endogenous variables, predetermined variables & disturbances.

Example:
Ct = β0 + β1Yt + ut
Yt = Ct + It

Where, Ct & Yt = endogenous variables

It = predetermined or exogenous variable
ut = stochastic disturbance term
t = time period
β = coefficient of an endogenous variable
γ = coefficient of an exogenous variable
The β's & γ's are known as structural parameters or coefficients.

Define reduced form of a model with example.

Reduced form of a model: The reduced-form model is that model in which the endogenous variables are
expressed as explicit functions of the predetermined variables. In other words, a reduced-form equation is
one that expresses an endogenous variable solely in terms of the predetermined variables & the stochastic
disturbances.
Example: Let us consider the Keynesian model of income determination as
Consumption function: Ct = β0 + β1Yt + ut ………..(1)
Income identity: Yt = Ct + It ……………….(2)
Where, C = consumption
Y = income
I = investment
t = time, u = error term
Here, Ct & Yt are the endogenous variables & It is an exogenous variable.
Now, if we put the value of Ct from equation (1) into equation (2),
we get,
Yt = β0 + β1Yt + ut + It
⇒ Yt(1 − β1) = β0 + It + ut
⇒ Yt = β0/(1 − β1) + It/(1 − β1) + ut/(1 − β1) = π0 + π1It + wt ………..(3)
where π0 = β0/(1 − β1), π1 = 1/(1 − β1) and wt = ut/(1 − β1).
So equation (3) is called the reduced-form equation; it expresses the endogenous variable Yt solely as a function of
the exogenous variable It & the stochastic disturbance term wt; π0 and π1 are the reduced-form parameters.
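The algebra of the reduced form can be checked numerically. A small sketch with hypothetical values β0 = 10 and β1 = 0.8:

```python
# Keynesian model: C_t = b0 + b1*Y_t + u_t and Y_t = C_t + I_t.
# Substitution gives Y_t = b0/(1 - b1) + I_t/(1 - b1) + u_t/(1 - b1).
b0, b1 = 10.0, 0.8            # hypothetical structural parameters
pi0 = b0 / (1 - b1)           # reduced-form intercept (≈ 50)
pi1 = 1 / (1 - b1)            # reduced-form investment multiplier (≈ 5)

I_t, u_t = 5.0, 0.0           # an illustrative investment level and a zero shock
Y_t = pi0 + pi1 * I_t + u_t / (1 - b1)
C_t = b0 + b1 * Y_t + u_t

# The solution satisfies both structural equations simultaneously:
print(f"Y_t = {Y_t:.4f},  C_t + I_t = {C_t + I_t:.4f}")   # both ≈ 75
```

Note that π1 = 1/(1 − β1) is the familiar Keynesian investment multiplier.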

Simultaneous equation bias:


The bias arising from the application of classical least squares to an equation belonging to a system of
simultaneous relations is called simultaneous equation bias.

Reason for arising:
One or more explanatory variables are correlated with the disturbance term, and the estimators
thus obtained are inconsistent.

Consequences:
This creates several problems. First, there arises the problem of identification of the parameters of
individual relationships. Second, there arise problems of estimation.

Problem: consider the Keynesian model & show that,


1. Yt & ut are correlated.
2. The OLS estimate of β1 is biased & inconsistent.

3.2 Simultaneity bias (Inconsistency of OLS Estimators under SEM)
In a simultaneous equation model, if an explanatory variable is determined simultaneously with the
dependent variable, it is generally correlated with the error term and applying OLS will result in biased
and inconsistent estimates. That is, the least squares estimator of parameters in a structural simultaneous
equation is biased and inconsistent because of the correlation between the random error and the
endogenous variables on the right-hand side of the equation.
3.2.1 Identification of Structural Equation (Order and rank conditions) (without proof)
By identification it is to mean whether numerical estimates of the parameters of a structural equation can
be obtained from the estimated reduced-form coefficients. Thus, when we say that an equation is
identified, it means we can estimate the parameters of a structural equation from the estimated reduced-form
coefficients. Identification is a concern of model formulation, not of estimation, as the latter depends
upon the empirical data and the form of the model.

Formally speaking, there are two rules/conditions which must be fulfilled for an equation to be
identified.
The order condition for identification

This condition is based on a counting rule of the exogenous and endogenous variables included
and excluded in the SEM. This condition states that an equation in any SEM satisfies the order
condition for identification if the number of excluded exogenous variables from the equation is
at least as large as the number of right-hand side endogenous variables in the equation.
The rank condition for identification
The rank condition states that in an SEM containing G equations any particular equation is
identified if and only if it is possible to construct at least one non-zero determinant of order (G-
1) from the coefficients of the variables excluded from that particular equation but contained in
the other equations of the model. The sufficient condition for identification is called the rank
condition.

Remember from your linear algebra course that, the term rank refers to the rank of a matrix
and is given by the largest-order square matrix (contained in the given matrix) whose
determinant is nonzero. Alternatively, the rank of a matrix is the largest number of linearly
independent rows or columns of a matrix.

To understand the order and rank conditions, let‟s introduce the following notations:

Let, M = number of endogenous variables in the model

m = number of endogenous variables in a given equation

K = number of exogenous variables in the model including the intercept

k = number of exogenous variables in a given equation


Order Condition
 In a model of M simultaneous equations in order for an equation to be identified, it
must exclude at least M −1 variables (endogenous as well as exogenous) appearing
in the model. If it excludes exactly M − 1 variables, the equation is just identified.
If it excludes more than M−1 variables, it is over-identified.

 In a model of M simultaneous equations, in order for an equation to be identified,


the number of exogenous variables excluded from the equation must not be less than
the number of endogenous variables included in that equation less 1, that is,
K − k ≥ m− 1
If K − k = m − 1, the equation is just identified, but if K − k > m − 1, it is over-identified.
Rank condition
 In a model containing M equations in M endogenous variables, an equation is
identified if and only if at least one nonzero determinant of order (M − 1) × (M − 1)
can be constructed from the coefficients of the variables (both endogenous and
predetermined) excluded from that particular equation but included in the other
equations of the model.
 In a model containing M simultaneous equations:
- If K − k > m − 1 and the rank of the matrix A (the matrix of coefficients of the
variables excluded from the equation) is M − 1, the equation is over-identified.
- If K − k = m − 1 and the rank of the matrix A is M − 1, the equation is
exactly identified.
- If K − k ≥ m − 1 and the rank of the matrix A is less than M − 1, the
equation is under-identified.
- If K − k < m − 1, the structural equation is unidentified. The rank of the matrix A
in this case is bound to be less than M − 1. (Why?)
Steps of checking rank condition
1. Bring all items of each equation, except the error term, to the left of the equal sign
2. Put all the endogenous and exogenous variables in a row

3. Put the corresponding coefficients of each variable beneath each variable
4. Construct a matrix from excluded variables (both exogenous and endogenous) and
check for its rank
5. If we can form more than one (M-1) by (M-1) matrix with a non-zero determinant, the
equation is said to be over-identified. If we can form exactly one (M-1) by (M-1) matrix
with a non-zero determinant, the equation is said to be just identified. If we cannot
form at least one (M-1) by (M-1) matrix with a non-zero determinant, the equation is said to be
under-identified.
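Step 4 can be done mechanically with linear algebra software. A sketch using NumPy on a hypothetical coefficient matrix (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical 3-equation system (M = 3). Suppose the variables excluded from
# the first equation appear in equations 2 and 3 with the coefficients below
# (rows = the other equations, columns = the excluded variables).
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])

M = 3
rank = np.linalg.matrix_rank(A)
print("rank(A) =", rank, "; rank condition satisfied:", rank == M - 1)
```

Here rank(A) = 2 = M − 1, so this hypothetical first equation passes the rank condition.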
Earlier we have said that SEM, unlike the linear regression model, cannot be estimated directly using the OLS
technique, for it will give us biased and inconsistent estimates. Rather, there are other estimation
techniques, like Instrumental Variable (IV) estimation or the two-stage least squares (2SLS) estimation method.
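The logic of 2SLS can be sketched by hand on simulated data: first project the endogenous regressor on the instruments, then run OLS with the fitted values (in applied work a packaged routine would be used; all names and numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
z = rng.standard_normal(n)                        # exogenous instrument
e = rng.standard_normal(n)                        # structural error
x = 0.8 * z + 0.5 * e + rng.standard_normal(n)    # endogenous regressor (correlated with e)
y = 1.0 + 2.0 * x + e                             # structural equation, true slope = 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Stage 1: project X on the instruments; Stage 2: OLS of y on the fitted values.
x_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(x_hat, y, rcond=None)[0]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS slope:", beta_ols[1], " 2SLS slope:", beta_2sls[1])
```

Because x is correlated with the structural error, the OLS slope is biased upward, while the 2SLS slope is close to the true value of 2.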

CHAPTER FOUR:
INTRODUCTION TO PANEL DATA

So far, you have covered regression analysis using either cross-sectional or time-series data alone.
Although these two cases arise often in applications, cross-sectional data observed across time (a situation
where the data set has both cross-sectional and time-series dimensions) are being used more and
more often in empirical research. We know that multiple regression is a powerful tool for
controlling for the effect of variables on which we have data. If data is not available for some of
the variables, however, they cannot be included in the regression and the OLS estimators of the
regression coefficients could have omitted variables bias.

This chapter describes a method for controlling for some types of omitted variables without
actually observing them. This method requires a specific type of data, called panel data, in which
each observational unit, or entity, is observed at two or more time periods. By studying changes in
the dependent variable over time, it is possible to eliminate the effect of omitted variables that
differ across entities but are constant over time.

Basically, sampling cross-sectional data across time involves two kinds of data sets: panel data and
pooled data. Panel data (also known as longitudinal or cross-sectional time-series data) is a
dataset in which the behavior of entities is observed across time. These entities could be states,
companies, individuals, countries, etc.

The structure of a panel dataset looks like the following.
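Since the original table is not reproduced here, the typical layout can be sketched with a small hypothetical dataset in pandas (entity and year identifiers plus the observed variables):

```python
import pandas as pd

# A hypothetical balanced panel: 2 firms observed over 3 years.
panel = pd.DataFrame({
    "firm":   ["A", "A", "A", "B", "B", "B"],
    "year":   [2019, 2020, 2021, 2019, 2020, 2021],
    "invest": [12.0, 14.5, 13.8, 40.2, 41.0, 43.5],        # dependent variable
    "value":  [100.0, 110.0, 108.0, 300.0, 305.0, 320.0],  # regressor
})
print(panel.set_index(["firm", "year"]))
```

Each row is one entity in one period; the same entities repeat across years, which is exactly the cross-section-plus-time-series structure described above.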


Panel data allows you to control for variables you cannot observe or measure like cultural factors or
difference in business practices across companies; or variables that change over time but not across entities
(i.e. national policies, federal regulations, international agreements, etc.).
That is, it accounts for individual heterogeneity.

4.1 Pooled data

This involves sampling randomly from a large population at different points in time. Samples
drawn in different times may not be the same. The advantage here is samples consist of
independently sampled observations. This was also a key aspect in our analysis of cross-sectional
data: among other things, it rules out correlation in the error terms across different
observations. Pooling is helpful only if the relationship between the dependent variable and at
least some of the independent variables remain constant over time.

4.1.2. Panel data/Longitudinal data


In panel data the same cross-sectional units are surveyed over time. The problem here is if we lose any
observation for whatever reason (e.g. because of death), we can no longer use panel data.
Despite the existence of some variations, both pooled data and panel data essentially connote movement
over time of cross-sectional units. We will, therefore, use the term panel data in a generic sense to include
one or more of these terms.
Panel Data Model Examples
 Labor economics: effect of education on income, with data across time and individuals.
 Economics: effects of income on savings, with data across years and countries.
Panel data characteristics
1. Panel data provide information on individual behavior, both across individuals and over
time – they have both cross-sectional and time-series dimensions.
2. Panel data include N individuals observed at T regular time periods.
3. Panel data can be balanced, when all individuals are observed in all time periods
(𝑇𝑖 = 𝑇 for all i), or unbalanced, when some individuals are not observed in all time periods (𝑇𝑖
≠ 𝑇).
4. We assume correlation (clustering) over time for a given individual, with independence
over individuals.
 Example: the income for the same individual is correlated over time but it is
independent across individuals.

Panel data types
 Short panel: many individuals and few time periods (we use this case in class)
 Long panel: many time periods and few individuals
 Both: many time periods and many individuals

4.1.2. Advantages of using panel data

1. The techniques of panel data estimation can take into account heterogeneity relating to
firms, states, countries, etc., over time, explicitly by allowing for individual-specific
variables.

2. Increases the precision of estimators with more power of test statistics: By combining time
series of cross-section observations, panel data give “more informative data, more
variability, less collinearity among variables, more degrees of freedom and more
efficiency.”

3. By studying the repeated cross section of observations, panel data are better suited to
study the dynamics of change. Example: Spells of unemployment, job turnover, and
labor mobility.

4. Panel data can better detect and measure effects that simply cannot be observed in pure
cross-section, or pure time series data. Ex: the effects of minimum wage laws on
employment and earnings.

5. Panel data enables us to study more complicated behavioral models. For example,
phenomena such as economies of scale and technological change.

6. By making data available for several thousand units, panel data can minimize the bias
that might result if we aggregate individuals or firms into broad aggregates.

4.2 Estimation of Panel Data Regression Models

4.1.3. The Fixed Effects Approach

 The fixed effects model allows the individual-specific effects to be correlated with the
regressors x.
 We include 𝑎𝑖 as intercepts.

 Each individual has a different intercept term and the same slope parameters.

𝑦𝑖𝑡 = 𝑎𝑖 + 𝑥𝑖𝑡𝛽 + 𝑢𝑖𝑡

 We can recover the individual-specific effects after estimation as: âᵢ = ȳᵢ − x̄ᵢβ̂.

In other words, the individual-specific effects are the leftover variation in the dependent
variable that cannot be explained by the regressors.
 Time dummies can be included in the regressors x.
There are several possibilities/assumptions

Case 1: The intercept and slope coefficients are constant across time and space and the
error term captures differences over time and individuals

Case 2: The slope coefficients are constant but the intercept varies over individuals.
Case 3: The slope coefficients are constant but the intercept varies over individuals and
time.

Case 4: Both the intercept as well as slope coefficients vary over individuals.
Case 5: Both the intercept as well as slope coefficients vary over individuals and time.
Note that complexity and reality increase as we move from case 1 to case 5.
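Case 2 is what the within (demeaning) transformation estimates: subtracting each individual's time average removes the fixed effect 𝑎𝑖. A sketch on simulated data (all names and parameter values are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_id, n_t, beta = 100, 5, 1.5
ids = np.repeat(np.arange(n_id), n_t)
a = np.repeat(rng.normal(0.0, 2.0, n_id), n_t)   # individual effects a_i
x = a + rng.standard_normal(n_id * n_t)          # regressor correlated with a_i
y = a + beta * x + rng.standard_normal(n_id * n_t)

df = pd.DataFrame({"id": ids, "x": x, "y": y})

# Pooled OLS ignores a_i and is biased because x is correlated with it.
xc, yc = df["x"] - df["x"].mean(), df["y"] - df["y"].mean()
beta_pooled = (xc @ yc) / (xc @ xc)

# Within transformation: demeaning by individual wipes out a_i.
dm = df.groupby("id")[["x", "y"]].transform(lambda s: s - s.mean())
beta_fe = (dm["x"] @ dm["y"]) / (dm["x"] @ dm["x"])
print(f"pooled OLS: {beta_pooled:.2f},  within (FE): {beta_fe:.2f}")
```

The within estimate is close to the true slope of 1.5, while pooled OLS is pulled upward by the omitted individual effects; this demeaning approach is numerically equivalent to the LSDV (dummy variable) regression.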
Problems of Fixed Effects Approach, or LSDV model
i. Incorporating too many dummy variables will erode the degrees of freedom.
This reduces the precision of estimation.
ii. The existence of too many dummy variables is likely to cause a multicollinearity problem.

iii. The error term may exhibit different behavior for different units. For example, it may be
autocorrelated for KTF, whereas it may not be for MCI, etc.
iv. Problems with the existence of time-invariant variables

4.1.4. The Random Effects Approach
 The random-effects model assumes that the individual-specific effects 𝑎𝑖 are
distributed independently of the regressors.
 We include 𝑎𝑖 in the error term.
 Each individual has the same slope parameters and a composite error term 𝜀𝑖𝑡 = 𝑎𝑖 +
𝑒𝑖𝑡, so that 𝑦𝑖𝑡 = 𝑥𝑖𝑡𝛽 + (𝑎𝑖 + 𝑒𝑖𝑡)
As was said earlier, the fixed effects (covariance) model is straightforward to apply, but requires the loss
of a large number of degrees of freedom in the presence of many cross-sectional units. More specifically, if
the dummy variables do in fact represent a lack of knowledge about the (true) model, we can
express this ignorance through the disturbance term, and this is what the error components
model (ECM) or random effects model (REM) suggests. Let's start with one of the previous
models:

𝑌𝑖𝑡 = 𝛼0𝑖 + 𝛼1𝑋1𝑖𝑡 + 𝛼2𝑋2𝑖𝑡 + 𝑢𝑖𝑡 … … … … … … … … … … … … … … … … (4.9)


The random effects model assumes that, instead of treating 𝛼0𝑖 as fixed, it is a random
variable with a mean value of 𝛼0 (no subscript i here). And the intercept value for an individual
company can be expressed as:

𝛼0𝑖 = 𝛼0 + 𝜀𝑖 … … … … … … … … … … … … … … … … … … … … … … (4.10)

Where, εi is a random error term with a mean value of zero and variance σ²ε.

Substitute equation (4.10) in to (4.9),

𝑌𝑖𝑡 = 𝛼0 + 𝜀𝑖 + 𝛼1𝑋1𝑖𝑡 + 𝛼2𝑋2𝑖𝑡 + 𝑢𝑖𝑡

𝑌𝑖𝑡 = 𝛼0 + 𝛼1𝑋1𝑖𝑡 + 𝛼2𝑋2𝑖𝑡 + 𝑣𝑖𝑡 … … … … … … … … … … … … … … … … … (4.11)


Where 𝑣𝑖𝑡 = 𝜀𝑖 + 𝑢𝑖𝑡; 𝜀𝑖 accounts for the cross-section, or individual-specific, error component, and

𝑢𝑖𝑡 accounts for the combined time-series and cross-section error components. Both 𝜀𝑖 and 𝑢𝑖𝑡 are
assumed to fulfill the basic assumptions of the classical linear regression model (CLRM). In
addition, the correlation between them should be zero.

In FEM each cross-sectional unit has its own (fixed) intercept value, in all N such values
for N cross-sectional units. In REM (ECM), on the other hand, the intercept 𝛼0 represents the
mean value of all the (cross-sectional) intercepts and the error component εi represents the
(random) deviation of individual intercept from this mean value. Note that εi is not directly
observable, and is known as an unobservable, or latent, variable.

Let's go back to the previous example. This model says that the four firms included in our
sample are drawn from a much larger population of such companies and that they have a
common mean value for the intercept (= 𝛼0) and the individual differences in the intercept
values of each company are reflected in the error term 𝜀𝑖.

Note also that 𝑉𝑎𝑟(𝑣𝑖𝑡) = 𝜎²𝑢 + 𝜎²𝜀 (remember this from the previous semester). Hence, if
𝜎²𝜀 = 0, there is no difference between models (4.1) and (4.9). But, though
homoscedastic, 𝑣𝑖𝑡 is autocorrelated. Unless we account for this problem, the estimates, though
unbiased, will be inefficient. To this end, the generalized least squares (GLS) method is mostly used.
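The autocorrelation that GLS corrects for follows directly from the variance components: for t ≠ s, Cov(𝑣𝑖𝑡, 𝑣𝑖𝑠) = 𝜎²𝜀, since the individual effect 𝜀𝑖 is shared across periods. A worked sketch with hypothetical variances:

```python
# Hypothetical variance components (illustrative numbers).
sigma2_eps = 4.0   # variance of the individual effect eps_i
sigma2_u   = 1.0   # variance of the idiosyncratic error u_it

var_v = sigma2_eps + sigma2_u    # Var(v_it) = sigma2_eps + sigma2_u
# For t != s, Cov(v_it, v_is) = sigma2_eps, so the intra-individual
# correlation of the composite error is:
rho = sigma2_eps / (sigma2_eps + sigma2_u)
print(var_v, rho)   # 5.0 and 0.8
```

With these numbers 80% of the composite error variance comes from the time-invariant individual effect, so errors for the same individual are strongly correlated across periods.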

4.3 Comparison of FEM Vs ECM
 Which is better?
To decide between fixed or random effects you can run a Hausman test,
where the null hypothesis is that the preferred model is random effects
and the alternative is fixed effects. It basically tests whether the unique
errors (𝜀𝑖) are correlated with the regressors; the null hypothesis is that they
are not.

Run a fixed effects model and save the estimates, then run a random model and
save the estimates, then perform the test. If the p-value is significant (for example
<0.05) then use fixed effects, if not use random effects.
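The Hausman statistic itself is a quadratic form in the difference between the two estimators, H = (b_FE − b_RE)′[V_FE − V_RE]⁻¹(b_FE − b_RE), which is chi-square under the null. A sketch with hypothetical estimates and covariance matrices (in practice the software computes this):

```python
import numpy as np
from scipy import stats

# Hypothetical estimates and covariance matrices for two slope coefficients.
b_fe = np.array([1.20, 0.45])    # fixed-effects estimates
b_re = np.array([1.05, 0.40])    # random-effects estimates
V_fe = np.array([[0.010, 0.001], [0.001, 0.008]])
V_re = np.array([[0.006, 0.000], [0.000, 0.005]])

d = b_fe - b_re
H = d @ np.linalg.inv(V_fe - V_re) @ d     # Hausman statistic
pvalue = stats.chi2.sf(H, df=len(d))       # chi-square with k = 2 degrees of freedom
print(f"H = {H:.2f}, p-value = {pvalue:.3f}")
```

With these hypothetical numbers H ≈ 5.68 and p ≈ 0.058: a borderline case that rejects random effects at the 10% level but not at the 5% level.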

If it is assumed that εi and the X's are uncorrelated, ECM may be appropriate, whereas if
εi and the X's are correlated, FEM may be appropriate.
Which one has to be chosen?
If the number of time series data is large and the number of cross-sectional units is
small, there is likely to be little difference in the values of the parameters estimated
by FEM and ECM. Hence the choice here is based on computational convenience.
On this score, FEM may be preferable.

If the number of time-series observations is small and the number of cross-sectional units is
large, the estimates obtained by the two methods can differ significantly, and FEM is appropriate. Note that
𝛼0𝑖 depends on the number of cross-sectional units.

If the number of time series data is small and the number of cross-sectional units is
large, and if the assumptions underlying ECM hold, ECM estimators are more
efficient than FEM estimators.

