
ECONOMETRICS NOTES

INTRODUCTION TO ECONOMETRICS

What is Econometrics?

“Econometrics is concerned with the empirical determination of economic laws” (Theil, 1971).

As the definition suggests, Econometrics is a tool used to empirically evaluate economic laws and theories.

Methodology of Econometrics

The overall methodology of Econometrics can be described in the following eight steps:

1: Economic theory: Since Econometrics is concerned with the empirical determination of economic theories, econometric analysis starts with a statement of theory or hypothesis.

Example: Fundamental theory of consumption.

2: Specification of the mathematical model of the theory: Every economic theory has an equivalent mathematical model. For example, C = a + bY.

3: Specification of the econometric model of the theory: Mathematical models are concerned with exact or deterministic relationships among variables, but relationships between economic variables are inexact. To allow for this inexactness, an econometric model is specified. Econometric models provide a more realistic expression of the relationship between variables by incorporating an error term.

Example: C = a + bY + e

4: Obtaining data: To estimate the econometric model, that is, to obtain the
numerical values of a and b, we need data.

5: Estimation of the Econometric Model: Based on the available data, we can estimate
the parameters in econometric models. The numerical estimates of the parameters give
empirical content to the economic theory. The statistical technique of regression
analysis is the main tool used to obtain the estimates.
6: Hypothesis Testing: The estimation based on the sample data needs hypothesis
testing for generalization of the estimate. The confirmation or refutation of economic
theories on the basis of sample evidence is based on a branch of statistical theory
known as statistical inference (hypothesis testing).
7: Forecasting or Prediction: If the estimated parameters are statistically significant, we
can use the outcome for forecasting or prediction purposes.
8: Using the Model for Policy purpose: The outcomes derived from econometrics
operations can be used for various policy purposes.

Before moving into the core topics of Econometrics, some prerequisite knowledge about types of data and measurement scales of variables will strengthen the understanding of econometric operations.

Types of Data

●​ Time series data: A time series is a set of observations on the values that a
variable takes at different times. Such data may be collected at regular intervals
such as daily, weekly, monthly, quarterly, annually.

For example: data of GDP growth for the last 10 years.

● Cross-sectional data: Cross-section data are data on one or more variables collected at the same point in time.

For example: data of SGDP of all states in India during the year 2021.

● Pooled data: Pooled, or combined, data contain elements of both time series and cross-sectional data.

●​ Panel data: This is a special type of pooled data in which the same
cross-sectional unit is surveyed over time.

For example: data of monthly bond prices of 100 companies for five years.

Measurement scale of variables

1. Nominal Scale. Nominal variables can be placed into categories. They do not have numeric values and so cannot be added, subtracted, multiplied, or divided. They also have no order. For example, gender.

2.​ Ordinal Scale. The ordinal scale contains things that you can place in order. For
example, Rank obtained by students. Basically, if you can rank data by 1st, 2nd,
3rd place (and so on), then you have data that’s on an ordinal scale.

3.​ Interval Scale. An interval scale is one where there is order and the difference
between two values is meaningful. Examples of interval variables include:
temperature, year, etc.

4. Ratio Scale. An extension of the interval scale, here both differences and ratios are meaningful. Most quantitative variables come under this head. For example, GDP.

THE SIMPLE LINEAR REGRESSION MODEL


As we mentioned in the methodology, the process of econometric analysis starts with an econometric model. An econometric model can be constructed using a regression equation; therefore, regression is considered the main tool of econometrics.

Regression analysis is concerned with the study of the dependence of one variable,
the dependent variable, on one or more other variables, the explanatory variables, with
a view to estimating and/or predicting the (population) mean or average value of the
former in terms of the known or fixed (in repeated sampling) values of the latter.

The term dependent variable can be denoted using other terminologies such as Explained variable, Regressand, Endogenous variable, and Controlled variable. Likewise, the term explanatory variable also has other terminologies such as Independent variable, Regressor, Exogenous variable, and Control variable.

E(Y | Xi) = β1 + β2Xi

where β1 and β2 are unknown but fixed parameters known as the regression coefficients. This equation is known as the linear population regression function (PRF).

●​ THE SAMPLE REGRESSION FUNCTION (SRF):

Yi = β̂1 + β̂2Xi + ûi

The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest; usually, one has a sample of observations from the population. Therefore, the error term is an inevitable element of models based on samples, and one has to use the stochastic sample regression function (SRF) to estimate the PRF.

OLS METHOD

The estimation of the econometric model is a crucial step in econometric analysis. The method of ordinary least squares (OLS) is attributed to Carl Friedrich Gauss, a German mathematician. Under certain assumptions, the method of least squares has some very attractive statistical properties that have made it one of the most powerful and popular methods for estimating the unknown parameters in a linear regression model.

This method works on the criterion of minimizing the error term: the parameter values are selected at which the sum of squared errors is minimum.

∑ûi² = ∑(Yi − Ŷi)² should be minimum

Hence, as per the least squares criterion, setting the first-order derivatives of

∑(Yi − β̂1 − β̂2Xi)²

with respect to β̂1 and β̂2 equal to zero yields the estimates of the regression coefficients:

β̂2 = ∑xiyi / ∑xi²

β̂1 = Ȳ − β̂2X̄

Here, X̄ = mean of X, Ȳ = mean of Y, xi = Xi − X̄, and yi = Yi − Ȳ.
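As an illustration, here is a minimal sketch of these formulas in Python with NumPy; the income and consumption figures are purely hypothetical:

```python
import numpy as np

# Hypothetical data: income (X) and consumption (Y)
X = np.array([80.0, 100.0, 120.0, 140.0, 160.0])
Y = np.array([70.0, 85.0, 90.0, 110.0, 115.0])

# Deviations from the means: xi = Xi - X-bar, yi = Yi - Y-bar
x = X - X.mean()
y = Y - Y.mean()

beta2 = (x * y).sum() / (x ** 2).sum()  # slope: sum(xi*yi) / sum(xi^2)
beta1 = Y.mean() - beta2 * X.mean()     # intercept: Y-bar - beta2 * X-bar

print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```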

ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL

1: The regression model is linear in the parameters. Keep in mind that the
regressand Y and the regressor X themselves may be nonlinear.

2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be
nonstochastic.

3: Zero mean value of disturbance ui. Given the value of X, the mean, or expected,
value of the random disturbance term ui is zero. Technically, the conditional mean value
of ui is zero. Symbolically, we have
E(ui |Xi) = 0

4: Homoscedasticity or equal variance of ui. Given the value of X, the variance of ui is the same for all observations. That is, the conditional variances of ui are identical. Symbolically, we have

Var(ui) = σ²

5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ui and uj (i ≠ j) is zero. Symbolically,

Cov(ui, uj) = 0

6: Zero covariance between ui and Xi, or E(ui, Xi) = 0.


7: The number of observations n must be greater than the number of parameters
to be estimated.

8: Variability in X values. The X values in a given sample must not all be the same. If all the X values are identical, then Xi = X̄, the denominator ∑xi² of the slope formula will be zero, and it becomes impossible to estimate β2 and therefore β1.

9: The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in empirical analysis.

10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.

PROPERTIES OF LEAST SQUARE ESTIMATORS

Given the assumptions of the classical linear regression model, the least-squares
estimates possess some ideal or optimum properties. These properties are contained in
the well-known Gauss–Markov theorem.

Gauss–Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE.

Best Linear Unbiased Estimators (BLUE):

1. The OLS estimator is linear, that is, a linear function of a random variable.
2. It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
3.​ OLS estimates give us minimum variance (best).
An unbiased estimator with the least variance is known as an efficient estimator.

These properties are specific to small-sample models (finite-sample properties). For large-sample models, an additional property, consistency, is also included (asymptotic properties). An estimator is said to be consistent if its value approaches the actual, true population parameter value as the sample size increases.

COEFFICIENT OF DETERMINATION

• The overall goodness of fit of the regression model is measured by the coefficient of determination, r². It tells what proportion of the variation in the dependent variable is explained by the explanatory variable.

• The coefficient of determination is the square of the correlation (r) between predicted Y scores and actual Y scores. This r² lies between 0 and 1; the closer it is to 1, the better the fit.

• An r² between 0 and 1 indicates the extent to which the dependent variable is predictable. An r² of 0.10 means that 10 percent of the variance in Y is predictable from X; an r² of 0.20 means that 20 percent is predictable; and so on.
To compute this r², we proceed as follows. Recall that

Yi = Ŷi + ûi

In deviation form: yi = ŷi + ûi

TSS = ESS + RSS

1 = ESS/TSS + RSS/TSS

TSS = Total sum of squares: total variation of the actual Y values about their sample mean.
ESS = Explained sum of squares: variation of the estimated Y values about their mean.
RSS = Residual sum of squares: residual or unexplained variation of the Y values about the regression line.
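A small sketch of this decomposition on hypothetical data; the fitted line comes from NumPy's least-squares polyfit:

```python
import numpy as np

X = np.array([80.0, 100.0, 120.0, 140.0, 160.0])  # hypothetical regressor
Y = np.array([70.0, 85.0, 90.0, 110.0, 115.0])    # hypothetical regressand

b2, b1 = np.polyfit(X, Y, 1)   # least-squares slope and intercept
Y_hat = b1 + b2 * X            # estimated Y values

TSS = ((Y - Y.mean()) ** 2).sum()      # total variation about the mean
ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained variation
RSS = ((Y - Y_hat) ** 2).sum()         # unexplained (residual) variation

print(TSS, ESS + RSS)          # TSS = ESS + RSS (up to rounding)
print("r^2 =", ESS / TSS)      # proportion of variation explained
```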
MULTIPLE REGRESSION MODEL

Regression models with more than one independent variable are called multiple regression models.

Yi = β1 + β2X2i + β3X3i + ui

• The coefficients β2 and β3 are called partial regression coefficients: β2 measures the change in the mean value of Y per unit change in X2, holding the value of X3 constant.

• R-squared (R²) is used to measure the goodness of fit of a multiple regression model. If R² is 0.8, 80% of the variation in the output variable is explained by the input variables. In simple terms, the higher the R², the more variation is explained by the input variables, and hence the better the model.
R² and Adjusted R²

However, the problem with R² is that it will either stay the same or increase with the addition of more variables, even if they have no relationship with the output variable. This is where Adjusted R² comes to help: Adjusted R² penalizes you for adding variables which do not improve the existing model.

• Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R² to judge the goodness of the model. If you have only one input variable, R² and Adjusted R² would be exactly the same.

• Typically, the more non-significant variables you add to the model, the wider the gap between R² and Adjusted R² becomes.
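The sketch below illustrates this behaviour with simulated data and the statsmodels library: adding an irrelevant regressor nudges R² up but can pull Adjusted R² down.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)                    # irrelevant regressor
Y = 1.0 + 2.0 * X2 + rng.normal(size=n)    # Y depends only on X2

small = sm.OLS(Y, sm.add_constant(X2)).fit()
big = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()

print(small.rsquared, big.rsquared)          # R^2 never falls when X3 is added
print(small.rsquared_adj, big.rsquared_adj)  # Adjusted R^2 can fall
```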
HYPOTHESIS TESTING

The purpose of statistical inference is to draw conclusions about a population on the basis of data obtained from a sample of that population. Hypothesis testing is a process used to evaluate the strength of evidence from the sample and provide a method for understanding how reliably one can extrapolate observed findings in a sample under study to the larger population from which the sample was drawn.

• Hypothesis: It is a general statement made about any relationship. In other words, it is a tentative assumption made about a relationship or population parameter.

• Null Hypothesis (Ho): The null hypothesis is stated for the purpose of testing or verifying its validity. It assumes that there is no difference between the population parameter and the sample statistic.

• Alternative Hypothesis (H1): It includes any admissible hypothesis other than the null hypothesis. The alternative hypothesis is accepted when the null hypothesis is rejected.

Error     Decision on Ho    When…
Type I    Rejected          Ho is true
Type II   Accepted          Ho is false

Some concepts related to hypothesis testing:

•​ Level of Significance: It is the probability of committing Type I error.

•​ Power of a test: It is the probability of avoiding a Type II error.

• Degrees of freedom: The degrees of freedom of an estimate is the number of independent pieces of information that went into calculating the estimate.

•​ Two Tailed Test: In this, the critical region lies on both sides. It does not tell us
whether the value is less than or greater than the desired value. The rejection
region under this test is taken on both the sides of the distribution.
•​ One Tailed Test: Under this, H1 can either be greater than or less than the
desired value. The rejection region, under this test, is taken only on one side of
the distribution.

Steps of Testing Hypothesis:​


1. Set your hypothesis.​
2. Choose a level of significance​
3. Choose a test​
4. Make calculations or computations using the test.​
5. Decision making stage

T-test

It was developed by William Gosset, who published it in 1908 under the pen name “Student”; hence it is also known as Student’s t-test. It can be used when the standard deviation of the population is unknown and the number of observations is less than 30.

Uses:

To test the significance of the mean of a random sample (and, in regression analysis, to test the significance of regression coefficients):

t = (X̄ − µ0) / S.E., where S.E. = s/√n

X̄ = sample average

µ0 = population mean

s = sample standard deviation

S.E. = standard error
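A minimal one-sample t-test sketch with SciPy, on a made-up sample and an assumed hypothesized mean µ0:

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
mu0 = 12.0                                  # hypothesized population mean

t_stat, p_value = stats.ttest_1samp(sample, mu0)

# The same statistic by hand: t = (X-bar - mu0) / (s / sqrt(n))
se = sample.std(ddof=1) / np.sqrt(len(sample))
t_manual = (sample.mean() - mu0) / se

print(t_stat, t_manual, p_value)
```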

Z-test

The test was given by Fisher and is used when the population standard deviation is known. It is used when we need to identify whether two samples are from the same population or not. The Z-test can be considered an alternative to the t-test, and it rests on certain assumptions:

Assumptions:

a. Sample size is large, that is, n > 30.

b. Population variance is known.

c. Population is normally distributed.

Z = (X̄ − µ0) / (σ/√n)

X̄ = sample average; µ0 = population mean; σ = population standard deviation

F-test
• The test was given by Fisher in the 1920s and is closely related to ANOVA. It is also known as the variance ratio test.

F = s1² / s2²

• The F-test is used to measure the overall significance of a regression model and is also a test of the significance of R².

F = (ESS/df) / (RSS/df) = (ESS/(k−1)) / (RSS/(n−k))
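Since ESS = R²·TSS and RSS = (1 − R²)·TSS, the F statistic can also be computed from R² alone. A tiny sketch, with assumed values of R², n, and k:

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic: (ESS/(k-1)) / (RSS/(n-k)), expressed via R^2.

    k counts all parameters including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Example: R^2 = 0.8 from a model with k = 3 parameters and n = 30 observations
print(overall_f(0.8, n=30, k=3))  # a large F suggests the overall regression is significant
```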

χ² (Chi-Square) test

It is a non-parametric test and does not make any assumptions about the population from which the samples are drawn. It was first used by Karl Pearson in 1900.

Application of Chi-Square tests:

1. It is a test of independence.

2. It is a test of goodness of fit.

3. It is used to test the discrepancies between the observed frequencies and the
expected frequencies.

4. It is used to determine the association between two or more attributes.
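As a sketch of the test-of-independence use, SciPy's chi2_contingency compares observed and expected frequencies in a contingency table; the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = gender, columns = preference
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)   # a small p-value => reject independence of the attributes
```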

DUMMY VARIABLE

Econometric analysis is not only for quantitative variables; we can incorporate qualitative variables too. Dummy variables are used to represent qualitative variables in a regression model, e.g., gender, religion, nationality. Variables that assume the values 0 and 1 are called dummy variables.

If a qualitative variable has m categories, introduce only (m − 1) dummy variables. For example, the qualitative variable gender has two categories, male and female, which can be represented using a single dummy variable: a value of 1 indicates that the respondent is male, and a value of 0 indicates that the respondent is female.
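A small pandas sketch of the (m − 1) rule on hypothetical data: get_dummies with drop_first=True keeps a single dummy for the two gender categories.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

# drop_first=True keeps m - 1 = 1 dummy column for the m = 2 categories,
# avoiding perfect multicollinearity (the "dummy variable trap")
dummies = pd.get_dummies(df["gender"], drop_first=True)
print(dummies)   # single column "male": 1/True = male, 0/False = female
```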

ASSUMPTION VIOLATIONS
We have already mentioned the CLRM assumptions, but in some cases these assumptions may be violated. In this section we discuss the major assumption violations: their causes, their consequences, detection methods, and the remedial measures for each violation.

MULTICOLLINEARITY

The term multicollinearity was coined by Ragnar Frisch. Simply put, multicollinearity arises when two or more explanatory variables in a regression are highly related to one another, such that they do not provide unique and/or independent information to the regression.

Multicollinearity is perfect when rx1x2 = 1.

Reasons for the Problem of Multicollinearity

1. There is a tendency for economic variables to move together over time. In time series data, growth and trend factors are the main causes of multicollinearity.

2. Model specification, for example, adding polynomial terms to a regression model, especially when the range of the X variable is small.

3.​ An overdetermined model. This happens when the model has more explanatory
variables than the number of observations. This could happen in medical
research where there may be a small number of patients about whom information
is collected on a large number of variables.

Consequences of Multicollinearity

1. When Multicollinearity is perfect:

•​ Least Square estimators are indeterminate.

•​ Variance and co-variances of the estimators become infinitely large.

yi = β̂2x2i + β̂3x3i + ûi

Recall the meaning of β̂2: it gives the rate of change in the average value of Y as X2 changes by a unit, holding X3 constant. But if X3 and X2 are perfectly collinear (say X3 = λX2), there is no way X3 can be kept constant: as X2 changes, so does X3, by the factor λ. What this means is that there is no way of disentangling the separate influences of X2 and X3 from the given sample.

2. When Multicollinearity is less than perfect:

•​ Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
• Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the “zero null hypothesis” (i.e., that the true population coefficient is zero) more readily.

•​ Although the t ratio of one or more coefficients is statistically insignificant, 𝑅2, the
overall measure of goodness of fit, can be very high.

•​ The OLS estimators and their standard errors can be sensitive to small changes
in the data.

Detection of Multicollinearity
1. If a model has a high R² but few significant t ratios, there is a higher chance of multicollinearity.

2. High pair-wise correlations among regressors denote the presence of multicollinearity in the model.

3.​ Examination of partial correlations.

4. Auxiliary regressions: Under this, we regress each explanatory variable on the other explanatory variables and examine the resulting R².

5. Variance Inflation Factor (VIF) = 1 / (1 − r₂₃²)

Tolerance (TOL) = 1 / VIF

If TOL is close to zero, multicollinearity is present; if it is closer to one, the degree of multicollinearity is lower.
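A sketch of the auxiliary-regression route to VIF and TOL, using statsmodels on simulated collinear regressors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X3 = rng.normal(size=100)
X2 = 0.9 * X3 + 0.1 * rng.normal(size=100)   # X2 nearly collinear with X3

# Auxiliary regression: X2 on the other regressor(s)
aux = sm.OLS(X2, sm.add_constant(X3)).fit()

vif = 1.0 / (1.0 - aux.rsquared)   # VIF = 1 / (1 - r^2)
tol = 1.0 / vif                    # TOL = 1 / VIF
print(vif, tol)                    # large VIF, TOL near zero => multicollinearity
```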

Remedial Measures

1. Using a priori information

2. Dropping a variable: If some variables in the model are collinear, dropping one of them can resolve the multicollinearity.

3.​ Transformation of a variable.

4. Additional or new data: Since multicollinearity is a sample phenomenon, it can be reduced by taking another sample or by increasing the sample size.

5.​ Combining cross-section and time series data.

HETEROSCEDASTICITY
The term heteroscedasticity means that the variance of the error term is not constant, that is, different error terms have different variances:

var(ui | Xi) = σi²

[Figure: scatter plots contrasting heteroscedastic and homoscedastic error variances]

Causes of Heteroscedasticity

1. All the models pertaining to learning skills or error learning models exhibit
heteroscedasticity.

2. Economic variables such as income and wealth also show heteroscedasticity because, as income or wealth increases, so does the discretion in how it is used.

3. Some economic variables exhibit skewness in their distribution.

4. Data collection techniques improve with time. As a result, constant variance is not found for those economic variables whose data collection techniques are changing very fast.

5. Specification errors in the models also lead to heteroscedasticity.

6. Outliers in the data may result in the problem of heteroscedasticity.

Consequences of heteroscedasticity

1. The estimators are still linear and unbiased, and their consistency does not change.

2. However, the estimators are no longer BLUE because they are no longer efficient;
therefore, they are not the best estimators. The estimators do not have a constant
variance.

Detection of Heteroscedasticity
•​ Park test

•​ Glejser test

•​ Spearman’s Rank Correlation Test

• Goldfeld-Quandt Test: This test is applicable only when the sample size is greater than or equal to 30.

•​ White’s General Heteroscedasticity Test
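A sketch of two of these tests via statsmodels' diagnostic module, on simulated data whose error spread grows with X:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 2 + 3 * x + rng.normal(scale=x)   # error variance increases with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

bp_lm, bp_pval, _, _ = het_breuschpagan(res.resid, X)
w_lm, w_pval, _, _ = het_white(res.resid, X)
print(bp_pval, w_pval)   # small p-values => reject homoscedasticity
```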

Remedial Measures

The problem of heteroscedasticity can be corrected both when the error variances are known and when they are unknown.

Case1: When variances are known:

• Generalized Least Squares (GLS): GLS is a procedure of transforming the original variables of the model in such a way that the transformed variables satisfy the assumptions of the classical model, and then applying OLS to them.

•​ Weighted Least Square (WLS)

Case 2: When variances are unknown:

In such a case, we transform the model in such a way so as to obtain a functional form
in which transformed error term has a constant variance.

AUTOCORRELATION

By the term autocorrelation we mean that the error terms are related to each other over time or over space. If the value of the error term in one period is correlated with its value in another period, the error terms are said to be serially correlated or autocorrelated.

⮚​ Spatial Auto- correlation: Correlation between cross-sectional units

⮚​ Serial Correlation: Correlation between error terms over a period of time

❑​ Positive auto-correlation is when the variables are going in the same direction.

❑​ Negative auto-correlation is when variables are going in different directions.

Causes of Auto-correlation

1. More prevalent in time series data.

2. Specification bias: Whenever we exclude an important variable or use an incorrect functional form, there is a possibility of autocorrelation in the model.
3. Cobweb Model: In case of cobweb models or whenever economic phenomenon
reflects cobweb phenomenon, that is, supply reacts to the price with a lag of one time
period, then we say that problem of auto-correlation arises in such data.

4. Lags: Whichever economic phenomenon is showing impact of the variables from the
previous time period, problem of auto-correlation is likely to surface in such models.

Consequence of Auto-correlation

•​ In the presence of autocorrelation the OLS estimators are still linear unbiased as
well as consistent and asymptotically normally distributed, but they are no longer
efficient (i.e., minimum variance).

•​ Confidence interval is likely to be wider than normal circumstances.

• t and F-tests are no longer valid and can provide misleading results.

• R² is likely to be over-estimated.

Detecting Auto-correlation

•​ Graphical Method

•​ Breusch–Godfrey test (LM Test)

•​ Runs Test

•​ Durbin–Watson d Test
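A sketch of the Durbin-Watson statistic via statsmodels, on a regression with simulated AR(1) errors; a d well below 2 signals positive autocorrelation:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)

e = np.zeros(n)                       # AR(1) errors: e_t = 0.8 e_{t-1} + v_t
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()

y = 1 + 2 * x + e
res = sm.OLS(y, sm.add_constant(x)).fit()

print(durbin_watson(res.resid))       # near 2 => no autocorrelation;
                                      # here it comes out well below 2
```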

Remedial Measures

1.​ If the source of autocorrelation is omitted variables, then the remedy is to include
those omitted variables in the set of explanatory variables.

2. If autocorrelation is because of misspecification of the mathematical form, then we need to change the form.

If auto-correlation exists because of reasons other than the above mentioned two, then
it is the case of pure auto-correlation.

● If ϱ (the autocorrelation coefficient) is known, then we apply GLS, transforming the original data so as to produce a model whose random variables satisfy the assumptions of classical least squares.

● If ϱ is not known, we try to estimate it, for example by using the Durbin-Watson d statistic.

SIMULTANEOUS-EQUATION MODELS
A system describing the joint dependence of variables is called a system of
simultaneous equations or simultaneous equations model. A unique feature of
simultaneous-equation models is that the endogenous variable (i.e., regressand) in one
equation may appear as an explanatory variable (i.e., regressor) in another equation of
the system.

Simultaneous Equation Bias

The violation of the OLS assumption E(ui Xi) = 0 (zero covariance between the independent variable and the error term) creates simultaneous equation bias. The direct application of the OLS method to a simultaneous equation creates the following problems:

i. The problem of identification of the parameters of individual relationship

ii. There arise problem of estimation

iii. The OLS estimates are biased and inconsistent

IDENTIFICATION PROBLEM

Variables can be either exogenous or endogenous in Structural equation:

Q = D (P, X, Ud ) (demand)

P = MC (Q, Z,Us ) (supply)

Solving for equilibrium yields the reduced form relations

P = p (Z, X,Us ,Ud )

Q = q (Z, X,Us ,Ud ).

A reduced-form equation is one that expresses an endogenous variable solely in terms of the predetermined variables and the stochastic disturbances.

• By the identification problem we mean whether numerical estimates of the parameters of a structural equation can be obtained from the estimated reduced-form coefficients.

• A model is said to be identified if it has a unique statistical form enabling unique estimates of the parameters from the sample.

• If the model is not identified, then the parameters cannot be estimated.

In econometric theory there are two possible situations of identifiability:

1. Equation under-identified: If an equation is under-identified, it is impossible to estimate all its parameters with any econometric technique.

2. Equation identified: If an equation has a unique statistical form, we say it is identified.

a. Exactly identified: An equation is exactly identified if unique numerical values of the structural parameters can be obtained.

b. Over-identified: An equation is over-identified if more than one numerical value can be obtained for some of the parameters of the structural equations.
Rules of Identification

The rules or conditions of identification are known as:

1) Order Condition: It states that the total number of variables (endogenous and exogenous) excluded from an equation must be equal to or greater than the number of endogenous variables in the model less one. It is a necessary but not a sufficient condition for identification.

K – M ≥ G – 1

K = total number of variables in the structural model (endogenous and exogenous)

M = number of variables in the particular equation of the model which is to be identified

G = total number of equations (equal to the total number of endogenous variables)

2) Rank Condition: The rank condition states that, in a system of G equations, any particular equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G – 1) from the coefficients of the variables excluded from that particular equation but contained in the other equations of the model.

⮚​ K – M = G – 1, the equation is exactly identified

⮚​ K – M < G – 1, the equation is under-identified

⮚​ K – M > G – 1, the equation is over-identified
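A tiny helper that applies these order-condition inequalities (necessary only; the rank condition must still be checked):

```python
def identification_status(K: int, M: int, G: int) -> str:
    """Order-condition check for one equation of a simultaneous system.

    K: total variables in the model, M: variables in the equation,
    G: number of equations (= number of endogenous variables)."""
    if K - M == G - 1:
        return "exactly identified"
    if K - M > G - 1:
        return "over-identified"
    return "under-identified"

# Example: K = 4 variables in the model, M = 2 in this equation, G = 2 equations
print(identification_status(4, 2, 2))   # "over-identified", since 2 > 1
```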

METHODS TO ESTIMATE PARAMETERS

Indirect Least Squares (ILS) Method:

▪ The method of ILS is used in the case of just/exactly identified equations. It is the method of obtaining estimates of the structural coefficients from the OLS estimates of the reduced-form coefficients.

Assumptions in ILS are:

1.​ Equation is Just/Exactly identified.

2.​ There must be full information about all equations in the model.

3. The error terms of the reduced-form equations should be independently and identically distributed.

4.​ Equations must be linear.

5. There should be no multicollinearity among the pre-determined variables of the reduced-form equations.

Properties of ILS Coefficient:

The ILS coefficients inherit the asymptotic properties, such as consistency and efficiency, but the small-sample properties, such as unbiasedness, do not generally hold.

Two Stage Least Squares (2SLS) Method

The 2SLS method was introduced by Henri Theil and Robert Basmann and is mostly used for equations which are over-identified. Under this method, OLS is applied twice. The method obtains a proxy or instrumental variable for an explanatory variable that is correlated with the error term; 2SLS thus purifies the stochastic explanatory variable of the influence of the stochastic disturbance or random term.

Features of 2SLS

1.​ The method can be applied to an individual equation in the system without taking
into account the other equations.

2.​ This method is suitable for over-identified equations because it gives one
estimate per parameter.

3. This method can also be applied to exactly identified equations, but in that case the ILS estimates will be equal to the 2SLS estimates.

4. If the R² values in the reduced-form (first-stage) regressions are very high, the OLS and 2SLS estimates will be very close. If the R² values in the first-stage regressions are low, the 2SLS estimates will be meaningless.
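A manual two-stage sketch on simulated data, to make the "OLS applied twice" idea concrete. In practice a dedicated IV/2SLS routine should be used, since the second-stage standard errors below are not the correct 2SLS standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # regressor correlated with u
y = 1 + 2 * x + u                             # true slope = 2

# Stage 1: regress the endogenous x on the instrument, keep fitted values
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues

# Stage 2: regress y on the purified (fitted) values of x
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(stage2.params)    # slope near 2; plain OLS of y on x would be biased
```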

RECURSIVE MODELS
Recursive Models: The equations of the structural form can be arranged in such a way that the first equation contains only predetermined variables as explanatory variables; the second equation contains predetermined variables and the endogenous variable of the first equation; and so on. The model is based on the assumption that the explanatory variables are uncorrelated with the disturbance term in the same equation.

y1 = x1 + u1

y2 = y1 + x1 + u2

y3 = y1 + y2 + x1 + u3

The system above is called a recursive system. It is also called a triangular system, as the coefficients of the endogenous variables form a triangular pattern. The main advantage of the recursive model is that OLS can be applied directly to each equation to estimate the parameters, and OLS will then be best and unbiased.

TIME SERIES ANALYSIS

⮚ A time series is simply a sequence of numbers collected at regular intervals over a period of time.

⮚ A time series is a collection of random variables (Yt); such a collection of random variables ordered in time is called a random or stochastic process.

A stochastic process can be stationary or non stationary:

• A stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the gap or lag between the two periods.

Components of Time Series

The fluctuations of a time series can be classified into four basic types of variations, often called the components or elements of a time series. They are:

1.​ Secular Trend: The secular trend is the main component of a time series which
results from long term effect of socio-economic and political factors. This trend
may show the growth or decline in a time series over a long period.

2.​ Seasonal Variations (Seasonal Trend): These are short term movements
occurring in a data due to seasonal factors. The short term is generally
considered as a period in which changes occur in a time series with variations in
weather or festivities.
3. Cyclical Variations: These are long-term oscillations occurring in a time series. These oscillations are mostly observed in economic data, and the periods of such oscillations generally extend from five to twelve years or more.

4. Irregular Variations: These are sudden changes occurring in a time series which are unlikely to be repeated. This is the component of a time series which cannot be explained by trend, seasonal, or cyclical movements.

Measurement of Trend

Trend can be measured using the below given methods:

⮚ Moving Average Method: A moving average is a technique to get an overall idea of the trends in a data set. Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the series. Then the subset is modified by "shifting forward", that is, excluding the first number of the series and including the next value in the subset (a short code sketch follows this list).

⮚ Method of Least Squares: The least squares method is a statistical technique to determine the line of best fit for a model, specified by an equation with certain parameters fitted to the observed data. This method is extensively used in regression analysis and estimation.
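The promised moving-average sketch, using pandas' rolling mean on a hypothetical series:

```python
import pandas as pd

s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17, 19, 21])  # hypothetical series

# 3-period moving average: each value is the mean of the current and
# previous two observations; the window then "shifts forward" by one
ma3 = s.rolling(window=3).mean()
print(ma3)
```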

Random Walk Models

A non-stationary time series will have a time-varying mean or a time-varying variance, or both.

•​ Random Walk with Drift:

𝑌𝑡 = 𝛼 + 𝑌𝑡−1 + 𝑢𝑡

In a random walk with drift, the mean and variance increase over time, again violating the conditions of weak stationarity.

•​ Random Walk without Drift:

𝑌𝑡 = 𝑌𝑡−1 + 𝑢𝑡

The random walk without drift is a non-stationary stochastic process.

•​ Unit Root Stochastic Process

Yt = ρYt−1 + ut, −1 ≤ ρ ≤ 1

If ρ = 1, the equation exhibits the unit root problem, that is, a situation of non-stationarity. Thus the terms non-stationarity, random walk, and unit root are synonymous.
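A simulation sketch of the two random walk models above; each walk is just a cumulative sum of the shocks:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.normal(size=200)          # white-noise shocks u_t

rw = np.cumsum(u)                 # without drift: Y_t = Y_{t-1} + u_t
rw_drift = np.cumsum(0.5 + u)     # with drift:    Y_t = 0.5 + Y_{t-1} + u_t

# rw wanders with growing variance; rw_drift also trends upward over time
```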
Tests of Stationarity

The Unit Root Test:

Yt = ρYt−1 + ut ………. (1)

If ρ = 1, then equation (1) becomes the random walk model without drift, which is a non-stationary stochastic process. Subtracting Yt−1 from both sides:

∆Yt = (ρ − 1)Yt−1 + ut

∆Yt = δYt−1 + ut

where δ = ρ − 1

When δ = 0, ρ = 1; that is, we have a unit root present.
• Dickey Fuller Test
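A sketch of the (augmented) Dickey-Fuller test from statsmodels on a simulated random walk; under the unit-root null, a large p-value means we cannot reject non-stationarity:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
walk = np.cumsum(rng.normal(size=300))   # random walk: has a unit root

adf_stat, p_value, *_ = adfuller(walk)
print(adf_stat, p_value)   # p-value typically large => unit root not rejected
```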

CO-INTEGRATION

Stationarity is a crucial property for time series modeling. The problem is, in practice,
very few phenomena are actually stationary in their original form. The trick is to employ
the right technique for reframing the time series into a stationary form. One such
technique leverages a statistical property called co-integration. Co-integration forms a
synthetic stationary series from a linear combination of two or more non-stationary
series.

When two time series variables X and Y do not individually hover around a constant value, but some combination of them (possibly linear) does, the series are said to be co-integrated. This is sometimes interpreted as a long-run relationship between the variables.
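A sketch of the Engle-Granger-style co-integration test in statsmodels, on two simulated non-stationary series that share a stationary linear combination:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(size=300))      # non-stationary random walk
y = 2.0 * x + rng.normal(size=300)       # y - 2x is stationary => co-integrated

t_stat, p_value, crit_values = coint(y, x)
print(p_value)   # a small p-value => reject the null of no co-integration
```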

METHODS OF FORECASTING

The most important aspect of time series is Forecasting. There are two methods of
forecasting which are popularly used. They are:

• Box-Jenkins Methodology (BJ Methodology): technically known as the ARIMA methodology.

• Vector Auto-regression (VAR).

QUESTION DISCUSSION

▪ Which of the following statements is true concerning the standard regression model?

(A) Y has a probability distribution

(B) X has a probability distribution

(C) The disturbance term is assumed to be correlated with X

(D) For an adequate model the residual (û) will be zero for all sample data points

(Answer: A)(2013 sep iii 71)

Explanation: Only the errors follow a normal distribution (which implies that the conditional distribution of Y given X is normal too). You do need distributional assumptions about the response variable in order to make inferences (e.g., confidence intervals), but it is not necessary that the response variable be normally distributed.

▪ The Standard Error (SE) of a large sample of size n from a population whose variance is σ² is:

A. SE = σ²/n B. SE = σ/n

C. SE = σ²/√n D. SE = σ/√n

(Answer: D)(2015 june ii 44)

Explanation: The standard error (SE) of a statistic is the approximate standard deviation
of a statistical sample population. Therefore, the relationship between the standard
error of the mean and the standard deviation is such that, for a given sample size, the
standard error of the mean equals the standard deviation divided by the square root of
the sample size.

▪ Which of the following is true in the context of statistical tests of hypotheses for the two-variable linear regression model?

A. t² > F B. t² < F

C. t² = F D. t = F (Answer: C)(2015 june iii 68)

Explanation: Relationship between ANOVA and t-tests: for two independent samples, either t or F can be used; they always result in the same decision, since F = t².

▪ For the regression model given below

Y = βo + β1X + u

estimated as Y = 20 + 2X with SE(β̂1) = 0.46,

to test Ho: β1 = 2.1 against H1: β1 ≠ 2.1 (not equal to 2.1), the test statistic |t| is equal to

A. 4.609 B. 0.217

C. 4.34 D. 0.33 (Answer: B)(2012 june iii 60)

Explanation: The t statistic for the slope of the regression line is

t = (β̂1 − β1,Ho) / SE = (2 − 2.1) / 0.46 = −0.217, so |t| = 0.217.

▪ The Generalized Least Squares Method is suitable for estimation of parameters in a General Linear Model when dealing with the problem of

(i) Measurement Errors (ii) Auto-correlated Disturbances

(iii) Multi-collinearity (iv) Heteroscedasticity

Select the correct answer from codes given below :

Codes :

(A) (i), (ii), (iii), (iv) (B) (i), (iii), (iv)

(C) (ii), (iv) (D) (i), (iii)

(Answer: C)(2016 july iii 73)

▪​ Three statements are given as under

(a) Generalized least square (GLS) method is capable of providing BLUE in situations
where OLS fails

(b) GLS is OLS on the transformed variables that satisfy the standard least squares
assumptions

(c) GLS method is suitable for dealing with multicollinearity problem

Choose the correct option:


A. a and b are correct but c is not correct

B. a and c are correct but b is not correct

C. a is correct but b and c are not correct

D. a is not correct but b and c are correct (Answer: A) (2019 dec 48)

Explanation: The generalized least squares (GLS) estimator of the coefficients of a linear regression is a generalization of the ordinary least squares (OLS) estimator. It is used to deal with situations in which the OLS estimator is not BLUE (best linear unbiased estimator) because one of the main assumptions of the Gauss-Markov theorem, namely homoscedasticity and absence of serial correlation, is violated. In such situations, provided that the other assumptions of the Gauss-Markov theorem are satisfied, the GLS estimator is BLUE.

▪​ If a Durbin - Watson statistics takes value close to zero, what will be the value of
first order autocorrelation coefficient ?

(A) Close to zero (B) Close to plus one

(C) Close to minus one (D) Close to either minus or plus one

(Answer: B)(2017 nov iii, 73)

Explanation: Durbin–Watson d Test:

d ≈ 2(1 − ρ̂), where −1 ≤ ρ̂ ≤ +1 (ρ̂ is simply the first-order correlation coefficient of the residuals)

A d close to zero therefore implies ρ̂ close to +1.

▪ Multiplicative Decomposition model is used to

A. Deseasonalise the time series data

B. Build cost of living index

C. Test a hypothesis

D. Estimate probability

(Answer: A)(2019 june 75)

Explanation: The decomposition of a time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns.

▪​ Boot strapping technique is used to:

A. Test for specification bias

B. Obtain sampling distribution of parameters of interest

C. Test for auto-correlation

D. Test for normality of error terms (Answer: B)(2019 dec 76)

Explanation: Bootstrapping methods are a numerical approach to generating confidence intervals that use either resampled data or simulated data to estimate the sampling distribution of the maximum likelihood parameter estimates.

❑​ Dickey Fuller test – Stationarity

❑​ Box-Jenkins test – Forecasting

❑​ Glejser test – Heteroscedasticity

❑​ Granger test – Causality

The Granger causality test is a statistical hypothesis test for determining whether
one time series is useful in forecasting another, first proposed in
1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued
that causality in economics could be tested for by measuring the ability to predict the
future values of a time series using prior values of another time series

❑​ Goldfeld-Quandt test – Heteroscedasticity


❑​ Gauss-Markov – BLUE

❑ H0: β2 = 0, H1: β2 ≠ 0 – Two-sided test

❑ Coefficient of determination – R²

❑​ Durbin Watson test – Autocorrelation

❑​ Delphi Method – Forecasting

The method relies on the key assumption that forecasts from a group are
generally more accurate than those from individuals. The aim of the Delphi
method is to construct consensus forecasts from a group of experts in a
structured iterative manner.

❑ Student's t-test – Testing significance of regression coefficients


❑ Contingency table – χ² test

Frequency tables of two variables presented simultaneously are called contingency tables. Hypothesis tests on contingency tables are based on a statistic called chi-square.

❑​ Unit Root Test – Stationarity

❑​ Fisher’s F test – Test of significance of over-all regression

❑ Koyck's Approach – Lagged variables

Koyck's approach to econometric analysis deals with relationships involving lagged explanatory variables.

❑ Farrar-Glauber Test – Multicollinearity

The Farrar-Glauber test is one of the statistical tests used to detect multicollinearity.

❑​ Wald-Bartlett test – Errors in variables

The Wald test can tell you which model variables are contributing something significant.

❑​ Dummy variables – Qualitative variables

❑​ Identification – Rank Condition

▪ In a two-variable regression, Y is the dependent variable and X is the independent variable. The correlation coefficient between Y and X is 0.6. Which of the following results is correct?

(A) 36% of the variations in Y are explained by X.

(B) 60% variations in Y are explained by X.

(C) 6% variations in Y are explained by X.

(D) None of the above. (Answer: A)(2013 dec iii 71)

▪​ Given the two regression lines estimated from given data as under :

Y = 4 + 0.4X

X = –2 + 0.9Y

Then coefficient of correlation between X and Y will be

(A) 0.4 (B) 0.5

(C) 0.6 (D) 0.8 (Answer: C)(2013 dec ii 47)

Correlation v/s Regression

•​ Use correlation for a quick and simple summary of the direction and strength of
the relationship between two or more numeric variables. Use regression when
you’re looking to predict, optimize, or explain a number response between the
variables (how x influences y).

Relationship between correlation and regression:

• r = √(bxy × byx). For the question above, r = √(0.4 × 0.9) = √0.36 = 0.6.
• The coefficient of determination (r²) is a measurement used to explain how much of the variability of one factor can be caused by its relationship to another related factor.

Bibliography
Theil, H. (1971). Principles of Econometrics. New York: John Wiley & Sons, p. 1.

Gujarati, D. N. (2003). Basic Econometrics. McGraw-Hill.

Davis, R. B., & Mukamal, K. J. (2006, September 5). Hypothesis testing. Circulation.
