Econometrics Notes
INTRODUCTION TO ECONOMETRICS
What is Econometrics?
Methodology of Econometrics
The overall methodology of econometrics can be described in the following eight steps:
Example: c=a+bY+e
4: Obtaining data: To estimate the econometric model, that is, to obtain the
numerical values of a and b, we need data.
5: Estimation of the Econometric Model: Based on the available data, we can estimate
the parameters in econometric models. The numerical estimates of the parameters give
empirical content to the economic theory. The statistical technique of regression
analysis is the main tool used to obtain the estimates.
6: Hypothesis Testing: The estimation based on the sample data needs hypothesis
testing for generalization of the estimate. The confirmation or refutation of economic
theories on the basis of sample evidence is based on a branch of statistical theory
known as statistical inference (hypothesis testing).
7: Forecasting or Prediction: If the estimated parameters are statistically significant, we
can use the outcome for forecasting or prediction purposes.
8: Using the Model for Policy purpose: The outcomes derived from econometrics
operations can be used for various policy purposes.
Before moving into the core topics of econometrics, some prerequisite knowledge about the types of data and the measurement scales of data will strengthen the understanding of econometric operations.
Types of Data
● Time series data: A time series is a set of observations on the values that a
variable takes at different times. Such data may be collected at regular intervals
such as daily, weekly, monthly, quarterly, annually.
● Cross-section data: Data on one or more variables collected at the same point in time. For example: data of SGDP of all states in India during the year 2021.
● Pooled data: Pooled, or combined, data contain elements of both time series and cross-section data.
● Panel data: This is a special type of pooled data in which the same
cross-sectional unit is surveyed over time.
For example: data of monthly bond prices of 100 companies for five years.
Measurement Scales of Data
1. Nominal Scale. Nominal variables can be placed into categories. They do not have a numeric value and so cannot be added, subtracted, divided or multiplied. They also have no order. For example, Gender.
2. Ordinal Scale. The ordinal scale contains things that you can place in order. For
example, Rank obtained by students. Basically, if you can rank data by 1st, 2nd,
3rd place (and so on), then you have data that’s on an ordinal scale.
3. Interval Scale. An interval scale is one where there is order and the difference
between two values is meaningful. Examples of interval variables include:
temperature, year, etc.
4. Ratio Scale. An extension of the interval scale, here both differences and ratios are meaningful. Most quantitative variables come under this head. For example, GDP.
REGRESSION ANALYSIS
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
The term dependent variable can be denoted using other terminologies such as Explained variable, Regressand, Endogenous variable, and Controlled variable. Likewise, the term explanatory variable also has other terminologies such as Independent variable, Regressor, Exogenous variable, and Control variable.
E(Y | Xi) = β1 + β2Xi, where β1 and β2 are unknown but fixed parameters known as the regression coefficients. This equation itself is known as the linear population regression function (PRF).
Yi = β̂1 + β̂2Xi + ûi (the sample regression function, SRF)
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Usually, one has only a sample of observations from the population. Therefore, the error term is an inevitable element in models based on samples, and one has to use the stochastic sample regression function (SRF) to estimate the PRF.
OLS METHOD
This method works on the criterion of minimizing the residual sum of squares. Under this method, the parameter values are chosen so that the sum of squared errors is a minimum.
∑ûi² = ∑(Yi − Ŷi)² should be minimum
Hence, as per the least-squares criterion, β̂1 and β̂2 are chosen to minimize ∑(Yi − β̂1 − β̂2Xi)². Setting the first-order derivatives with respect to the β's equal to zero yields the estimates of the regression coefficients:
β̂2 = ∑xiyi / ∑xi²
β̂1 = Ȳ − β̂2X̄
Here, X̄ = mean of X, Ȳ = mean of Y, xi = Xi − X̄, and yi = Yi − Ȳ.
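As an illustration, here is a minimal sketch (in Python, with hypothetical data) that applies the deviation-form formulas above to obtain the OLS estimates:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.0, 9.0, 11.0, 16.0, 19.0])

# Deviations from the sample means
x = X - X.mean()   # xi = Xi - X-bar
y = Y - Y.mean()   # yi = Yi - Y-bar

# OLS estimates from the formulas above
beta2_hat = np.sum(x * y) / np.sum(x ** 2)   # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept

print(f"beta1_hat = {beta1_hat:.4f}, beta2_hat = {beta2_hat:.4f}")
```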
ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)
1: The regression model is linear in the parameters. Keep in mind that the regressand Y and the regressor X themselves may be nonlinear.
2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be
nonstochastic.
3: Zero mean value of disturbance ui. Given the value of X, the mean, or expected,
value of the random disturbance term ui is zero. Technically, the conditional mean value
of ui is zero. Symbolically, we have
E(ui |Xi) = 0
8: Variability in X values. The X values in a given sample must not all be the same. If all the X values are identical, then Xi = X̄ and the denominator of the formula for β̂2 will be zero, making it impossible to estimate β2 and therefore β1.
10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.
Given the assumptions of the classical linear regression model, the least-squares
estimates possess some ideal or optimum properties. These properties are contained in
the well-known Gauss–Markov theorem.
1. OLS estimator is linear, that is, a linear function of a random variable.
2. It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
3. OLS estimates give us minimum variance (best).
An unbiased estimator with the least variance is known as an efficient estimator.
These properties apply to models based on small samples (finite-sample properties). For large-sample models, an additional property of consistency is also included (asymptotic properties). An estimator is said to be consistent if its value approaches the actual, true (population) parameter value as the sample size increases.
COEFFICIENT OF DETERMINATION
• The overall goodness of fit of the regression model is measured by the coefficient of determination, r². It tells what proportion of the variation in the dependent variable is explained by the explanatory variable.
• The coefficient of determination is the square of the correlation (r) between the predicted Y values and the actual Y values. This r² lies between 0 and 1; the closer it is to 1, the better the fit.
• An r² between 0 and 1 indicates the extent to which the dependent variable is predictable. An r² of 0.10 means that 10 percent of the variance in Y is predictable from X; an r² of 0.20 means that 20 percent is predictable; and so on.
To compute this r², we proceed as follows. Recall that
Yi = Ŷi + ûi
In deviation form: yi = ŷi + ûi
Squaring and summing over the sample gives TSS = ESS + RSS, so that
1 = ESS/TSS + RSS/TSS, and hence r² = ESS/TSS = 1 − RSS/TSS.
TSS = Total sum of squares: total variation of the actual Y values about their sample mean.
ESS = Explained sum of squares: variation of the estimated Y values about their mean.
RSS = Residual sum of squares: residual or unexplained variation of the Y values about the regression line.
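The decomposition can be verified numerically. A minimal sketch, continuing with the hypothetical data from the OLS example above:

```python
import numpy as np

# Same hypothetical data as in the OLS sketch above
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.0, 9.0, 11.0, 16.0, 19.0])

x = X - X.mean()
beta2 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
beta1 = Y.mean() - beta2 * X.mean()

Y_hat = beta1 + beta2 * X                # fitted values
u_hat = Y - Y_hat                        # residuals

TSS = np.sum((Y - Y.mean()) ** 2)        # total variation of Y
ESS = np.sum((Y_hat - Y.mean()) ** 2)    # explained variation
RSS = np.sum(u_hat ** 2)                 # unexplained (residual) variation

r2 = ESS / TSS
print(f"TSS = {TSS:.3f}, ESS + RSS = {ESS + RSS:.3f}, r^2 = {r2:.3f}")
```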
MULTIPLE REGRESSION MODEL
Regression models with more than one independent variable are called multiple regression models.
𝑌𝑖 = β1 + β2𝑋2𝑖 + β3𝑋3𝑖 + 𝑢𝑖
• The coefficients β2 and β3 are called the partial regression coefficients, which
means β2 measures the change in the mean value of Y, per unit change in X2,
holding the value of X3 constant.
• R-squared, or R², is used to find the goodness of fit of a multiple regression model. So, if R² is 0.8, it means 80% of the variation in the output variable is explained by the input variables. In simple terms, the higher the R², the more variation is explained by your input variables and hence the better is your model.
R² and Adjusted R²
However, the problem with R² is that it will either stay the same or increase with the addition of more variables, even if they have no relationship with the output variable. This is where Adjusted R² comes to help. Adjusted R² penalizes you for adding variables which do not improve your existing model.
• Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R² to judge the goodness of the model. In case you have only one input variable, R² and Adjusted R² would be exactly the same.
• Typically, the more non-significant variables you add to the model, the wider the gap between R² and Adjusted R² becomes.
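A minimal sketch of this comparison, assuming the usual adjustment formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k) as implemented by statsmodels (the data and the added irrelevant regressor are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)                   # irrelevant regressor
Y = 1.0 + 2.0 * X2 + rng.normal(size=n)   # Y depends only on X2

X_small = sm.add_constant(X2)
X_big = sm.add_constant(np.column_stack([X2, X3]))

res_small = sm.OLS(Y, X_small).fit()
res_big = sm.OLS(Y, X_big).fit()

# R-squared never falls when a regressor is added; adjusted R-squared can fall
print("R2:     ", round(res_small.rsquared, 4), "->", round(res_big.rsquared, 4))
print("Adj R2: ", round(res_small.rsquared_adj, 4), "->", round(res_big.rsquared_adj, 4))
```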
HYPOTHESIS TESTING
• Null Hypothesis (H0): The null hypothesis is stated for the purpose of testing or verifying its validity. It assumes that there is no difference between the population parameter and the sample statistic.
• Type I Error: H0 is rejected when H0 is actually true.
• Type II Error: H0 is accepted when H0 is actually false.
• Two Tailed Test: In this, the critical region lies on both sides. It does not tell us
whether the value is less than or greater than the desired value. The rejection
region under this test is taken on both the sides of the distribution.
• One Tailed Test: Under this, H1 can either be greater than or less than the
desired value. The rejection region, under this test, is taken only on one side of
the distribution.
T-test
It was given by William Gosset in 1908 and is also known as Student's t-test. It can be used when the standard deviation of the population is unknown and the number of observations is less than 30.
Uses:
t = (X̄ − μ) / (s/√n), where s = sample standard deviation and n = sample size.
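A minimal sketch of a one-sample t-test on hypothetical data, computing the statistic both by hand and with scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (n < 30, population sigma unknown)
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
mu0 = 12.0                                # hypothesized population mean

s = sample.std(ddof=1)                    # sample standard deviation
t_manual = (sample.mean() - mu0) / (s / np.sqrt(len(sample)))

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
print(f"t (manual) = {t_manual:.4f}, t (scipy) = {t_stat:.4f}, p = {p_value:.4f}")
```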
Z-test
The test was given by Fisher and is used when the population standard deviation is known. The test is used when we need to identify whether two samples are from the same population or not. The Z test can be considered as an alternative to the t-test, and it follows certain assumptions, like:
Assumptions:
Z = (X̄ − μ0) / s
F-test
• The test was given by Fisher in 1920’s and is closely related with ANOVA. It is
also known as Variance ratio test.
F = s1² / s2²
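A minimal sketch of the variance-ratio test on two hypothetical samples, with a p-value taken from the F distribution in scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(loc=0.0, scale=2.0, size=25)   # hypothetical sample 1
sample2 = rng.normal(loc=0.0, scale=1.0, size=20)   # hypothetical sample 2

s1_sq = sample1.var(ddof=1)
s2_sq = sample2.var(ddof=1)
F = s1_sq / s2_sq                                   # variance ratio
df1, df2 = len(sample1) - 1, len(sample2) - 1

# Two-sided p-value from the F distribution
p_value = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))
print(f"F = {F:.3f}, p = {p_value:.4f}")
```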
χ² (Chi-square) test
It is a non-parametric test and does not make any assumptions about population from
which samples are drawn. It was first used by Karl Pearson in 1900.
1. It is a test of independence.
3. It is used to test the discrepancies between the observed frequencies and the
expected frequencies.
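A minimal sketch of a chi-square goodness-of-fit test on hypothetical frequencies (observed die-roll counts against equal expected counts):

```python
from scipy import stats

# Hypothetical observed frequencies for 60 rolls of a die
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]   # equal frequencies under H0 (fair die)

chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2_stat:.3f}, p = {p_value:.4f}")
```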
DUMMY VARIABLE
ASSUMPTION VIOLATIONS
We have already mentioned the CLRM assumptions, but in some cases these assumptions may be violated. In this section we discuss the major assumption violations, the causes of such violations, their consequences, detection methods, and remedial measures for the respective violations.
MULTICOLLINEARITY
1. There is a tendency for economic variables to move together over time. In time series data, growth and trend factors are the main causes of multicollinearity.
3. An overdetermined model. This happens when the model has more explanatory
variables than the number of observations. This could happen in medical
research where there may be a small number of patients about whom information
is collected on a large number of variables.
Consequences of Multicollinearity
Recall the meaning of β̂2: It gives the rate of change in the average value of Y as
X2 changes by a unit, holding X3 constant. But if X3 and X2 are perfectly collinear,
there is no way X3 can be kept constant: As X2 changes, so does X3 by the factor λ.
What it means, then, is that there is no way of disentangling the separate influences
of X2 and X3 from the given sample.
• Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
• Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily
• Although the t ratio of one or more coefficients is statistically insignificant, 𝑅2, the
overall measure of goodness of fit, can be very high.
• The OLS estimators and their standard errors can be sensitive to small changes
in the data.
Detection of Multicollinearity
1. If a model has a high R² but few significant t ratios, there is a higher chance of multicollinearity.
4. Auxiliary Regression: Under this, we take one explanatory variable at a time and regress it on the remaining explanatory variables to estimate the corresponding R².
5. Variance Inflation Factor (VIF) = 1 / (1 − r23²), where r23 is the correlation coefficient between the explanatory variables X2 and X3.
Tolerance (TOL) = 1 / VIF
If TOL is close to zero, multicollinearity is present; if it is closer to one, the degree of multicollinearity is lower.
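A minimal sketch computing VIF and TOL with statsmodels' variance_inflation_factor on deliberately collinear hypothetical data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=n)         # deliberately collinear with x2

X = sm.add_constant(np.column_stack([x2, x3]))   # column 0 is the constant

for idx, name in zip([1, 2], ["X2", "X3"]):
    vif = variance_inflation_factor(X, idx)
    print(f"{name}: VIF = {vif:.2f}, TOL = {1 / vif:.3f}")
```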
Remedial Measures
2. Dropping a variable: If some of the variables in the model are collinear, dropping one of them can remove the multicollinearity from the model.
HETEROSCEDASTICITY
The term Heteroscedasticity means that the variance of error term is not
constant, that is, different error terms have different variances.
var(ui | Xi) = σi²
(Figure: scatter patterns illustrating heteroscedastic vs. homoscedastic error variance)
Causes of Heteroscedasticity
1. All the models pertaining to learning skills or error learning models exhibit
heteroscedasticity.
4. Data collection techniques improve with time. As a result, constant variance is not found for those economic variables whose data collection techniques are changing very fast.
Consequences of heteroscedasticity
1. The estimators are still linear and unbiased, and their consistency does not change.
2. However, the estimators are no longer BLUE because they are no longer efficient;
therefore, they are not the best estimators. The estimators do not have a constant
variance.
Detection of Heteroscedasticity
• Park test
• Glejser test
• Goldfeld–Quandt Test: This test is applicable only when the sample size is greater than or equal to 30.
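A minimal sketch of the Goldfeld–Quandt test using statsmodels on hypothetical data whose error variance grows with X:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(3)
n = 60
x = np.sort(rng.uniform(1, 10, size=n))               # ordered by X, as the test requires
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n) # error variance grows with x

X = sm.add_constant(x)
f_stat, p_value, _ = het_goldfeldquandt(y, X)
print(f"Goldfeld-Quandt F = {f_stat:.3f}, p = {p_value:.4f}")
```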
Remedial Measures
The problem of heteroscedasticity can be corrected both when the error variances are known and when they are unknown. In either case, we transform the model in such a way as to obtain a functional form in which the transformed error term has a constant variance.
AUTOCORRELATION
By the term auto-correlation we mean that the error terms are related to each other over time or over space. If the value of the error term in one period is correlated with its value in another period, the errors are said to be serially correlated or auto-correlated.
❑ Positive auto-correlation occurs when successive values of the error term tend to move in the same direction.
Causes of Auto-correlation
4. Lags: When an economic phenomenon depends on the values of variables from the previous time period, the problem of auto-correlation is likely to surface in such models.
Consequence of Auto-correlation
• In the presence of autocorrelation the OLS estimators are still linear unbiased as
well as consistent and asymptotically normally distributed, but they are no longer
efficient (i.e., minimum variance).
• t and F tests are no longer valid and may provide misleading results.
• R² is likely to be over-estimated.
Detecting Auto-correlation
• Graphical Method
• Runs Test
• Durbin–Watson d Test
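A minimal sketch computing the Durbin–Watson d statistic on OLS residuals built from hypothetical AR(1) errors (d near 2 suggests no first-order autocorrelation, d near 0 positive autocorrelation, d near 4 negative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)

# Build AR(1) errors: u_t = 0.8 * u_{t-1} + e_t  (positive autocorrelation)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]

y = 1.0 + 2.0 * x + u
res = sm.OLS(y, sm.add_constant(x)).fit()

d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.3f}")   # expected to be well below 2 here
```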
Remedial Measures
1. If the source of autocorrelation is omitted variables, then the remedy is to include
those omitted variables in the set of explanatory variables.
If auto-correlation exists because of reasons other than those mentioned above, then it is a case of pure auto-correlation.
SIMULTANEOUS-EQUATION MODELS
A system describing the joint dependence of variables is called a system of
simultaneous equations or simultaneous equations model. A unique feature of
simultaneous-equation models is that the endogenous variable (i.e., regressand) in one
equation may appear as an explanatory variable (i.e., regressor) in another equation of
the system.
IDENTIFICATION PROBLEM
Q = D (P, X, Ud ) (demand)
• If the model is not identified, then the parameters cannot be estimated.
In econometric theory there are two possible situations of identifiability: an equation may be identified (either exactly identified or over-identified) or not identified (under-identified).
Rules of Identification
1) Order Condition: It states that the total number of variables (endogenous and
exogenous) excluded from it must be equal to or greater than the number of
endogenous variables in the model less one. It is necessary condition but not the
sufficient condition for identification.
K − M ≥ G − 1
where K = total number of variables in the model (endogenous and exogenous), M = number of variables in the equation to be identified, and G = number of equations (endogenous variables) in the model.
2) Rank Condition: The rank condition states that in a system of G equations any
particular equation is identified if and only if it is possible to construct at least one non
zero determinant of order (G – 1) from the coefficients of the variables excluded from
that particular equation but contained in the other equations of the model.
INDIRECT LEAST SQUARES (ILS)
The Indirect Least Squares (ILS) method applies OLS to the reduced-form equations and derives the structural coefficients from the reduced-form coefficients. Its conditions include:
1. The structural equation must be exactly identified.
2. There must be full information about all equations in the model.
3. The error term of the reduced-form equations should be independently and identically distributed.
The ILS coefficients inherit all asymptotic properties like consistency and
efficiency; but the small sample properties such as unbiasedness do not
generally hold true.
The 2SLS method was introduced by Henri Theil and Robert Basmann and is mostly
used in equations which are over-identified. Under this method, the OLS is applied
twice. The method is used to obtain the proxy or instrumental variable for some
explanatory variable correlated with error term. 2SLS purifies the stochastic explanatory
variables from the influence of stochastic disturbance or random term.
Features of 2SLS
1. The method can be applied to an individual equation in the system without taking
into account the other equations.
2. This method is suitable for over-identified equations because it gives one
estimate per parameter.
3. This method can also be applied to exactly identified equations, but in that case the ILS and 2SLS estimates will be equal.
4. If the R² values in the reduced-form (first-stage) regressions are very high, the OLS and 2SLS estimates will be very close. If the R² values in the first-stage regressions are low, the 2SLS estimates will be meaningless.
RECURSIVE MODELS
Recursive Models: A system is recursive if its structural equations can be arranged in such a way that the first equation contains only predetermined variables as explanatory variables, the second equation contains the predetermined variables and the first endogenous variable, the third contains the predetermined variables and the first two endogenous variables, and so on. The model is based on the assumption that the explanatory variables are uncorrelated with the disturbance term in the same equation.
y1 = x1 + u1
y2 = y1 + x1 + u2
y3 = y1 + y2 + x1 + u3
This system above is called a recursive system. It is also called a triangular system as
coefficients of endogenous variables form a triangular form. The main advantage of
recursive model is that OLS can be applied directly on each equation to estimate
parameters and hence OLS will be best and unbiased.
TIME SERIES ECONOMETRICS
• A stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the gap or lag between the two periods.
The fluctuations of a time series can be classified into four basic types of variations, which are often called the components or elements of a time series. They are:
1. Secular Trend: The secular trend is the main component of a time series which
results from long term effect of socio-economic and political factors. This trend
may show the growth or decline in a time series over a long period.
2. Seasonal Variations (Seasonal Trend): These are short term movements
occurring in a data due to seasonal factors. The short term is generally
considered as a period in which changes occur in a time series with variations in
weather or festivities.
3. Cyclical Variations: These are long term oscillations occurring in a time series. These oscillations are mostly observed in economic data, and the periods of such oscillations generally extend from five to twelve years or more.
4. Irregular Variations: These are sudden changes occurring in a time series
which are unlikely to be repeated, it is that component of a time series which
cannot be explained by trend, seasonal or cyclic movements.
Measurement of Trend
A non-stationary time series will have a time-varying mean or a time-varying variance, or both.
Yt = α + Yt−1 + ut (random walk with drift)
In a random walk with drift, the mean and variance increase over time, again violating the conditions of weak stationarity.
Yt = Yt−1 + ut (random walk without drift)
Yt = ρYt−1 + ut, −1 ≤ ρ ≤ 1 ……… (1)
If ρ = 1, then equation (1) becomes the random walk model without drift, which is a non-stationary stochastic process.
Subtracting Yt−1 from both sides:
ΔYt = (ρ − 1)Yt−1 + ut
ΔYt = δYt−1 + ut, where δ = ρ − 1
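A minimal sketch of a unit-root check based on this regression, using the augmented Dickey–Fuller test from statsmodels on simulated series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
n = 200
random_walk = np.cumsum(rng.normal(size=n))   # Y_t = Y_{t-1} + u_t (rho = 1)
white_noise = rng.normal(size=n)              # stationary series

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    adf_stat, p_value, *_ = adfuller(series)
    print(f"{name}: ADF statistic = {adf_stat:.3f}, p = {p_value:.4f}")
```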
CO-INTEGRATION
Stationarity is a crucial property for time series modeling. The problem is, in practice,
very few phenomena are actually stationary in their original form. The trick is to employ
the right technique for reframing the time series into a stationary form. One such
technique leverages a statistical property called co-integration. Co-integration forms a
synthetic stationary series from a linear combination of two or more non-stationary
series.
When two time series variables X and Y do not individually hang around a constant value but some combination of them (usually linear) does hang around a constant, the series are said to be co-integrated. This is sometimes interpreted as a long-term relationship between the said variables.
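A minimal sketch of an Engle–Granger style co-integration test using statsmodels on two simulated series that share a common stochastic trend:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(6)
n = 300
trend = np.cumsum(rng.normal(size=n))            # common stochastic trend (non-stationary)
X = trend + rng.normal(scale=0.5, size=n)
Y = 2.0 * trend + rng.normal(scale=0.5, size=n)  # Y and X share the trend, so they co-integrate

t_stat, p_value, crit_values = coint(Y, X)
print(f"Engle-Granger t = {t_stat:.3f}, p = {p_value:.4f}")
```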
METHODS OF FORECASTING
The most important aspect of time series is Forecasting. There are two methods of
forecasting which are popularly used. They are:
QUESTION DISCUSSION
(D) For an adequate model the residual (û) will be zero for all sample data points
(Answer: A) (2013 sep iii 71)
Explanation: Only the errors follow a normal distribution (which implies the conditional
probability of Y given X is normal too). You do need distributional assumptions about the
response variable in order to make inferences (e.g, confidence intervals), but it is not
necessary that the response variable be normally distributed.
C. SE = σ²/n    D. SE = σ/√n
Explanation: The standard error (SE) of a statistic is the approximate standard deviation
of a statistical sample population. Therefore, the relationship between the standard
error of the mean and the standard deviation is such that, for a given sample size, the
standard error of the mean equals the standard deviation divided by the square root of
the sample size.
▪ Which of the following is true in the context of statistical test of hypotheses for
two variable linear regression model:
A. t² > F    B. t² < F    C. t² = F    D. t = F    (Answer: C) (2015 June III 68)
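The equality t² = F for the slope of a two-variable model can be checked numerically; a minimal sketch with statsmodels on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=40)
y = 3.0 + 1.5 * x + rng.normal(size=40)

res = sm.OLS(y, sm.add_constant(x)).fit()

t_slope = res.tvalues[1]   # t ratio of the slope coefficient
print(f"t^2 = {t_slope ** 2:.4f}, F = {res.fvalue:.4f}")   # the two values agree
```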
Y = 20 + 2X, SE = 0.46
To test H0: β1 = 2.1 against H1: β1 ≠ 2.1, the test statistic |t| is equal to |2 − 2.1| / 0.46 ≈ 0.22.
Codes :
(a) Generalized least square (GLS) method is capable of providing BLUE in situations
where OLS fails
(b) GLS is OLS on the transformed variables that satisfy the standard least squares
assumptions
D. a is not correct but b and c are correct (Answer: A) (2019 dec 48)
▪ If a Durbin–Watson statistic takes a value close to zero, what will be the value of the first-order autocorrelation coefficient?
(C) Close to minus one (D) Close to either minus or plus one
d ≈ 2(1 − ρ̂), where −1 ≤ ρ̂ ≤ 1 (ρ̂ is simply the first-order correlation coefficient of the residuals). Hence, if d is close to zero, ρ̂ is close to plus one.
C. Test a hypothesis
D. Estimate probability
The Granger causality test is a statistical hypothesis test for determining whether
one time series is useful in forecasting another, first proposed in
1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued
that causality in economics could be tested for by measuring the ability to predict the
future values of a time series using prior values of another time series
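A minimal sketch of the Granger causality test using statsmodels on simulated data in which lagged x helps predict y:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on lagged x, so x "Granger-causes" y
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# The test checks whether the second column helps predict the first column
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)
print(results[1][0]["ssr_ftest"])   # (F statistic, p-value, df_denom, df_num) at lag 1
```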
❑ Coefficient of determination – R²
The method relies on the key assumption that forecasts from a group are
generally more accurate than those from individuals. The aim of the Delphi
method is to construct consensus forecasts from a group of experts in a
structured iterative manner.
The Wald test can tell you which model variables are contributing something significant.
▪ Given the two regression lines estimated from given data as under:
Y = 4 + 0.4X
X = −2 + 0.9Y
Here r = √(0.4 × 0.9) = √0.36 = 0.6 (positive, since both regression coefficients are positive).
• Use correlation for a quick and simple summary of the direction and strength of
the relationship between two or more numeric variables. Use regression when
you’re looking to predict, optimize, or explain a number response between the
variables (how x influences y).
• r = √(bxy × byx)
• The coefficient of determination (r²) is a measurement used to explain how much of the variability of one factor can be explained by its relationship to another related factor.