Econometrics Notes
INTRODUCTION TO ECONOMETRICS
What is Econometrics?
Methodology of Econometrics
The overall methodology of econometrics can be described in the following eight steps:
Example: c=a+bY+e
4: Obtaining data: To estimate the econometric model, that is, to obtain the
numerical values of a and b, we need data.
5: Estimation of the Econometric Model: Based on the available data, we can estimate
the parameters in econometric models. The numerical estimates of the parameters give
empirical content to the economic theory. The statistical technique of regression
analysis is the main tool used to obtain the estimates.
6: Hypothesis Testing: The estimation based on the sample data needs hypothesis
testing for generalization of the estimate. The confirmation or refutation of economic
theories on the basis of sample evidence is based on a branch of statistical theory
known as statistical inference (hypothesis testing).
7: Forecasting or Prediction: If the estimated parameters are statistically significant, we
can use the outcome for forecasting or prediction purposes.
8: Using the Model for Policy purpose: The outcomes derived from econometrics
operations can be used for various policy purposes.
Before moving into the core topics of econometrics, some prerequisite knowledge about the types of data and the measurement scales of data will strengthen the understanding of econometric operations.
Types of Data
● Time series data: A time series is a set of observations on the values that a
variable takes at different times. Such data may be collected at regular intervals
such as daily, weekly, monthly, quarterly, annually.
● Cross-section data: Data on one or more variables collected at the same point in time. For example: data of SGDP of all states in India during the year 2021.
● Pooled data: Pooled, or combined, data contain elements of both time series and cross-section data.
● Panel data: This is a special type of pooled data in which the same
cross-sectional unit is surveyed over time.
For example: data of monthly bond prices of 100 companies for five years.
Measurement Scales of Data
1. Nominal Scale. Nominal variables can be placed into categories. They do not have a numeric value and so cannot be added, subtracted, divided or multiplied. They also have no order. For example, Gender.
2. Ordinal Scale. The ordinal scale contains things that you can place in order. For
example, Rank obtained by students. Basically, if you can rank data by 1st, 2nd,
3rd place (and so on), then you have data that’s on an ordinal scale.
3. Interval Scale. An interval scale is one where there is order and the difference
between two values is meaningful. Examples of interval variables include:
temperature, year, etc.
4. Ratio Scale. An extension of the interval scale, here both differences and ratios are meaningful. Most quantitative variables come under this head. For example, GDP.
REGRESSION ANALYSIS
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
The term dependent variable can be denoted using other terminologies such as Explained variable, Regressand, Endogenous variable, and Controlled variable. Likewise, the term explanatory variable also has other terminologies such as Independent variable, Regressor, Exogenous variable, and Control variable.
E(Y | Xi) = β1 + β2Xi, where β1 and β2 are unknown but fixed parameters known as the regression coefficients. This equation itself is known as the linear population regression function (PRF).
Yi = β̂1 + β̂2Xi + ûi (the sample regression function, SRF)
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Usually, one has only a sample of observations from the population. Therefore, the error term is an inevitable element in models based on samples, and one has to use the stochastic sample regression function (SRF) to estimate the PRF.
OLS METHOD
This method works on the criterion of minimizing the residual sum of squares. Under this method, the parameter values are chosen so that the sum of squared errors is a minimum.
∑ûi² = ∑(Yi − Ŷi)² should be minimum
Hence, as per the least-squares criterion, β̂1 and β̂2 are chosen to minimize ∑(Yi − β̂1 − β̂2Xi)². Setting the first-order derivatives with respect to the β's equal to zero yields the estimates of the regression coefficients:
β̂2 = ∑xiyi / ∑xi²
β̂1 = Ȳ − β̂2X̄
Here, X̄ = mean of X, Ȳ = mean of Y, xi = Xi − X̄, and yi = Yi − Ȳ.
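As an illustration, here is a minimal sketch (in Python, with hypothetical data) that applies the deviation-form formulas above to obtain the OLS estimates:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.0, 9.0, 11.0, 16.0, 19.0])

# Deviations from the sample means
x = X - X.mean()   # xi = Xi - X-bar
y = Y - Y.mean()   # yi = Yi - Y-bar

# OLS estimates from the formulas above
beta2_hat = np.sum(x * y) / np.sum(x ** 2)   # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept

print(f"beta1_hat = {beta1_hat:.4f}, beta2_hat = {beta2_hat:.4f}")
```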
ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)
1: The regression model is linear in the parameters. Keep in mind that the regressand Y and the regressor X themselves may be nonlinear.
2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be
nonstochastic.
3: Zero mean value of disturbance ui. Given the value of X, the mean, or expected,
value of the random disturbance term ui is zero. Technically, the conditional mean value
of ui is zero. Symbolically, we have
E(ui |Xi) = 0
8: Variability in X values. The X values in a given sample must not all be the same. If all the X values are identical, then Xi = X̄ and the denominator of the formula for β̂2 will be zero, making it impossible to estimate β2 and therefore β1.
10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.
Given the assumptions of the classical linear regression model, the least-squares
estimates possess some ideal or optimum properties. These properties are contained in
the well-known Gauss–Markov theorem.
1. OLS estimator is linear, that is, a linear function of a random variable.
2. It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
3. OLS estimates give us minimum variance (best).
An unbiased estimator with the least variance is known as an efficient estimator.
These properties apply to models based on small samples (finite-sample properties). For large-sample models, an additional property of consistency is also included (asymptotic properties). An estimator is said to be consistent if its value approaches the actual, true (population) parameter value as the sample size increases.
COEFFICIENT OF DETERMINATION
• The overall goodness of fit of the regression model is measured by the coefficient of determination, r². It tells what proportion of the variation in the dependent variable is explained by the explanatory variable.
• The coefficient of determination is the square of the correlation (r) between the predicted Y values and the actual Y values. This r² lies between 0 and 1; the closer it is to 1, the better the fit.
• An r² between 0 and 1 indicates the extent to which the dependent variable is predictable. An r² of 0.10 means that 10 percent of the variance in Y is predictable from X; an r² of 0.20 means that 20 percent is predictable; and so on.
To compute this r², we proceed as follows. Recall that
Yi = Ŷi + ûi
In deviation form: yi = ŷi + ûi
Squaring and summing over the sample gives TSS = ESS + RSS, so that
1 = ESS/TSS + RSS/TSS, and hence r² = ESS/TSS = 1 − RSS/TSS.
TSS = Total sum of squares: total variation of the actual Y values about their sample mean.
ESS = Explained sum of squares: variation of the estimated Y values about their mean.
RSS = Residual sum of squares: residual or unexplained variation of the Y values about the regression line.
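The decomposition can be verified numerically. A minimal sketch, continuing with the hypothetical data from the OLS example above:

```python
import numpy as np

# Same hypothetical data as in the OLS sketch above
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.0, 9.0, 11.0, 16.0, 19.0])

x = X - X.mean()
beta2 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
beta1 = Y.mean() - beta2 * X.mean()

Y_hat = beta1 + beta2 * X                # fitted values
u_hat = Y - Y_hat                        # residuals

TSS = np.sum((Y - Y.mean()) ** 2)        # total variation of Y
ESS = np.sum((Y_hat - Y.mean()) ** 2)    # explained variation
RSS = np.sum(u_hat ** 2)                 # unexplained (residual) variation

r2 = ESS / TSS
print(f"TSS = {TSS:.3f}, ESS + RSS = {ESS + RSS:.3f}, r^2 = {r2:.3f}")
```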
MULTIPLE REGRESSION MODEL
Regression models with more than one independent variable are called multiple regression models.
𝑌𝑖 = β1 + β2𝑋2𝑖 + β3𝑋3𝑖 + 𝑢𝑖
• The coefficients β2 and β3 are called the partial regression coefficients, which
means β2 measures the change in the mean value of Y, per unit change in X2,
holding the value of X3 constant.
• R-squared, or R², is used to find the goodness of fit of a multiple regression model. So, if R² is 0.8, it means 80% of the variation in the output variable is explained by the input variables. In simple terms, the higher the R², the more variation is explained by your input variables and hence the better is your model.
R² and Adjusted R²
However, the problem with R² is that it will either stay the same or increase with the addition of more variables, even if they have no relationship with the output variable. This is where Adjusted R² comes to help. Adjusted R² penalizes you for adding variables which do not improve your existing model.
• Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R² to judge the goodness of the model. In case you have only one input variable, R² and Adjusted R² would be exactly the same.
• Typically, the more non-significant variables you add to the model, the wider the gap between R² and Adjusted R² becomes.
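A minimal sketch of this comparison, assuming the usual adjustment formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k) as implemented by statsmodels (the data and the added irrelevant regressor are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)                   # irrelevant regressor
Y = 1.0 + 2.0 * X2 + rng.normal(size=n)   # Y depends only on X2

X_small = sm.add_constant(X2)
X_big = sm.add_constant(np.column_stack([X2, X3]))

res_small = sm.OLS(Y, X_small).fit()
res_big = sm.OLS(Y, X_big).fit()

# R-squared never falls when a regressor is added; adjusted R-squared can fall
print("R2:     ", round(res_small.rsquared, 4), "->", round(res_big.rsquared, 4))
print("Adj R2: ", round(res_small.rsquared_adj, 4), "->", round(res_big.rsquared_adj, 4))
```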
HYPOTHESIS TESTING
• Null Hypothesis (H0): The null hypothesis is stated for the purpose of testing or verifying its validity. It assumes that there is no difference between the population parameter and the sample statistic.
• Type I Error: H0 is rejected when H0 is actually true.
• Type II Error: H0 is accepted when H0 is actually false.
• Two Tailed Test: In this, the critical region lies on both sides. It does not tell us
whether the value is less than or greater than the desired value. The rejection
region under this test is taken on both the sides of the distribution.
• One Tailed Test: Under this, H1 can either be greater than or less than the
desired value. The rejection region, under this test, is taken only on one side of
the distribution.
T-test
It was given by William Gosset in 1908 and is also known as Student's t-test. It can be used when the standard deviation of the population is unknown and the number of observations is less than 30.
Uses:
t = (X̄ − μ) / (s/√n), where s = sample standard deviation and n = sample size.
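A minimal sketch of a one-sample t-test on hypothetical data, computing the statistic both by hand and with scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (n < 30, population sigma unknown)
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
mu0 = 12.0                                # hypothesized population mean

s = sample.std(ddof=1)                    # sample standard deviation
t_manual = (sample.mean() - mu0) / (s / np.sqrt(len(sample)))

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
print(f"t (manual) = {t_manual:.4f}, t (scipy) = {t_stat:.4f}, p = {p_value:.4f}")
```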
Z-test
The test was given by Fisher and is used when the population standard deviation is known. The test is used when we need to identify whether two samples are from the same population or not. The Z test can be considered as an alternative to the t-test, and it follows certain assumptions, like:
Assumptions:
Z = (X̄ − μ0) / s
F-test
• The test was given by Fisher in 1920’s and is closely related with ANOVA. It is
also known as Variance ratio test.
F = s1² / s2²
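A minimal sketch of the variance-ratio test on two hypothetical samples, with a p-value taken from the F distribution in scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(loc=0.0, scale=2.0, size=25)   # hypothetical sample 1
sample2 = rng.normal(loc=0.0, scale=1.0, size=20)   # hypothetical sample 2

s1_sq = sample1.var(ddof=1)
s2_sq = sample2.var(ddof=1)
F = s1_sq / s2_sq                                   # variance ratio
df1, df2 = len(sample1) - 1, len(sample2) - 1

# Two-sided p-value from the F distribution
p_value = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))
print(f"F = {F:.3f}, p = {p_value:.4f}")
```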
χ² (Chi-square) test
It is a non-parametric test and does not make any assumptions about population from
which samples are drawn. It was first used by Karl Pearson in 1900.
1. It is a test of independence.
3. It is used to test the discrepancies between the observed frequencies and the
expected frequencies.
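A minimal sketch of a chi-square goodness-of-fit test on hypothetical frequencies (observed die-roll counts against equal expected counts):

```python
from scipy import stats

# Hypothetical observed frequencies for 60 rolls of a die
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]   # equal frequencies under H0 (fair die)

chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2_stat:.3f}, p = {p_value:.4f}")
```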
DUMMY VARIABLE
ASSUMPTION VIOLATIONS
We have already mentioned the CLRM assumptions, but in some cases these assumptions may be violated. In this section we discuss the major assumption violations, the causes of such violations, their consequences, detection methods, and remedial measures for the respective violations.
MULTICOLLINEARITY
1. There is a tendency for economic variables to move together over time. In time series data, growth and trend factors are the main causes of multicollinearity.
3. An overdetermined model. This happens when the model has more explanatory
variables than the number of observations. This could happen in medical
research where there may be a small number of patients about whom information
is collected on a large number of variables.
Consequences of Multicollinearity
Recall the meaning of β̂2: It gives the rate of change in the average value of Y as
X2 changes by a unit, holding X3 constant. But if X3 and X2 are perfectly collinear,
there is no way X3 can be kept constant: As X2 changes, so does X3 by the factor λ.
What it means, then, is that there is no way of disentangling the separate influences
of X2 and X3 from the given sample.
• Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
• Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily
• Although the t ratio of one or more coefficients is statistically insignificant, 𝑅2, the
overall measure of goodness of fit, can be very high.
• The OLS estimators and their standard errors can be sensitive to small changes
in the data.
Detection of Multicollinearity
1. If a model has a high R² but few significant t ratios, there is a higher chance of multicollinearity.
4. Auxiliary Regression: Under this, we take one explanatory variable at a time and regress it on the remaining explanatory variables to estimate the corresponding R².
5. Variance Inflation Factor (VIF) = 1 / (1 − r23²), where r23 is the correlation coefficient between the explanatory variables X2 and X3.
Tolerance (TOL) = 1 / VIF
If TOL is close to zero, multicollinearity is present; if it is closer to one, the degree of multicollinearity is lower.
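A minimal sketch computing VIF and TOL with statsmodels' variance_inflation_factor on deliberately collinear hypothetical data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=n)         # deliberately collinear with x2

X = sm.add_constant(np.column_stack([x2, x3]))   # column 0 is the constant

for idx, name in zip([1, 2], ["X2", "X3"]):
    vif = variance_inflation_factor(X, idx)
    print(f"{name}: VIF = {vif:.2f}, TOL = {1 / vif:.3f}")
```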
Remedial Measures
2. Dropping a variable: If some of the variables in the model are collinear, dropping one of them can remove the multicollinearity from the model.
HETEROSCEDASTICITY
The term Heteroscedasticity means that the variance of error term is not
constant, that is, different error terms have different variances.
var(ui | Xi) = σi²
(Figure: scatter patterns illustrating heteroscedastic vs. homoscedastic error variance)
Causes of Heteroscedasticity
1. All the models pertaining to learning skills or error learning models exhibit
heteroscedasticity.
4. Data collection techniques improve with time. As a result, constant variance is not found for those economic variables whose data collection techniques are changing very fast.
Consequences of heteroscedasticity
1. The estimators are still linear and unbiased, and their consistency does not change.
2. However, the estimators are no longer BLUE because they are no longer efficient;
therefore, they are not the best estimators. The estimators do not have a constant
variance.
Detection of Heteroscedasticity
• Park test
• Glejser test
• Goldfeld–Quandt Test: This test is applicable only when the sample size is greater than or equal to 30.
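A minimal sketch of the Goldfeld–Quandt test using statsmodels on hypothetical data whose error variance grows with X:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(3)
n = 60
x = np.sort(rng.uniform(1, 10, size=n))               # ordered by X, as the test requires
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n) # error variance grows with x

X = sm.add_constant(x)
f_stat, p_value, _ = het_goldfeldquandt(y, X)
print(f"Goldfeld-Quandt F = {f_stat:.3f}, p = {p_value:.4f}")
```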
Remedial Measures
The problem of heteroscedasticity can be corrected both when the error variances are known and when they are unknown. In either case, we transform the model in such a way as to obtain a functional form in which the transformed error term has a constant variance.
AUTOCORRELATION
By the term auto-correlation we mean that the error terms are related to each other over time or over space. If the value of the error term in one period is correlated with its value in another period, the errors are said to be serially correlated or auto-correlated.
❑ Positive auto-correlation occurs when successive values of the error term tend to move in the same direction.
Causes of Auto-correlation
4. Lags: When an economic phenomenon depends on the values of variables from the previous time period, the problem of auto-correlation is likely to surface in such models.
Consequence of Auto-correlation
• In the presence of autocorrelation the OLS estimators are still linear unbiased as
well as consistent and asymptotically normally distributed, but they are no longer
efficient (i.e., minimum variance).
• t and F tests are no longer valid and may provide misleading results.
• R² is likely to be over-estimated.
Detecting Auto-correlation
• Graphical Method
• Runs Test
• Durbin–Watson d Test
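A minimal sketch computing the Durbin–Watson d statistic on OLS residuals built from hypothetical AR(1) errors (d near 2 suggests no first-order autocorrelation, d near 0 positive autocorrelation, d near 4 negative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)

# Build AR(1) errors: u_t = 0.8 * u_{t-1} + e_t  (positive autocorrelation)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]

y = 1.0 + 2.0 * x + u
res = sm.OLS(y, sm.add_constant(x)).fit()

d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.3f}")   # expected to be well below 2 here
```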
Remedial Measures
1. If the source of autocorrelation is omitted variables, then the remedy is to include
those omitted variables in the set of explanatory variables.
If auto-correlation exists because of reasons other than those mentioned above, then it is a case of pure auto-correlation.
SIMULTANEOUS-EQUATION MODELS
A system describing the joint dependence of variables is called a system of
simultaneous equations or simultaneous equations model. A unique feature of
simultaneous-equation models is that the endogenous variable (i.e., regressand) in one
equation may appear as an explanatory variable (i.e., regressor) in another equation of
the system.
IDENTIFICATION PROBLEM
Q = D (P, X, Ud ) (demand)
• If the model is not identified, then the parameters cannot be estimated.
In econometric theory there are two possible situations of identifiability: an equation may be identified (either exactly identified or over-identified) or not identified (under-identified).
Rules of Identification
1) Order Condition: It states that the total number of variables (endogenous and
exogenous) excluded from it must be equal to or greater than the number of
endogenous variables in the model less one. It is necessary condition but not the
sufficient condition for identification.
K − M ≥ G − 1
where K = total number of variables in the model (endogenous and exogenous), M = number of variables in the equation to be identified, and G = number of equations (endogenous variables) in the model.
2) Rank Condition: The rank condition states that in a system of G equations any
particular equation is identified if and only if it is possible to construct at least one non
zero determinant of order (G – 1) from the coefficients of the variables excluded from
that particular equation but contained in the other equations of the model.
INDIRECT LEAST SQUARES (ILS)
The Indirect Least Squares (ILS) method applies OLS to the reduced-form equations and derives the structural coefficients from the reduced-form coefficients. Its conditions include:
1. The structural equation must be exactly identified.
2. There must be full information about all equations in the model.
3. The error term of the reduced-form equations should be independently and identically distributed.
The ILS coefficients inherit all asymptotic properties like consistency and
efficiency; but the small sample properties such as unbiasedness do not
generally hold true.
The 2SLS method was introduced by Henri Theil and Robert Basmann and is mostly
used in equations which are over-identified. Under this method, the OLS is applied
twice. The method is used to obtain the proxy or instrumental variable for some
explanatory variable correlated with error term. 2SLS purifies the stochastic explanatory
variables from the influence of stochastic disturbance or random term.
Features of 2SLS
1. The method can be applied to an individual equation in the system without taking
into account the other equations.
2. This method is suitable for over-identified equations because it gives one
estimate per parameter.
3. This method can also be applied to exactly identified equations, but in that case the ILS and 2SLS estimates will be equal.
4. If the R² values in the reduced-form (first-stage) regressions are very high, the OLS and 2SLS estimates will be very close. If the R² values in the first-stage regressions are low, the 2SLS estimates will be meaningless.
RECURSIVE MODELS
Recursive Models: A system is recursive if its structural equations can be arranged in such a way that the first equation contains only predetermined variables as explanatory variables, the second equation contains the predetermined variables and the first endogenous variable, the third contains the predetermined variables and the first two endogenous variables, and so on. The model is based on the assumption that the explanatory variables are uncorrelated with the disturbance term in the same equation.
y1 = x1 + u1
y2 = y1 + x1 + u2
y3 = y1 + y2 + x1 + u3
This system above is called a recursive system. It is also called a triangular system as
coefficients of endogenous variables form a triangular form. The main advantage of
recursive model is that OLS can be applied directly on each equation to estimate
parameters and hence OLS will be best and unbiased.
TIME SERIES ECONOMETRICS
• A stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the gap or lag between the two periods.
The fluctuations of a time series can be classified into four basic types of variations, which are often called the components or elements of a time series. They are:
1. Secular Trend: The secular trend is the main component of a time series which
results from long term effect of socio-economic and political factors. This trend
may show the growth or decline in a time series over a long period.
2. Seasonal Variations (Seasonal Trend): These are short term movements
occurring in a data due to seasonal factors. The short term is generally
considered as a period in which changes occur in a time series with variations in
weather or festivities.
3. Cyclical Variations: These are long term oscillations occurring in a time series. These oscillations are mostly observed in economic data, and the periods of such oscillations generally extend from five to twelve years or more.
4. Irregular Variations: These are sudden changes occurring in a time series
which are unlikely to be repeated, it is that component of a time series which
cannot be explained by trend, seasonal or cyclic movements.
Measurement of Trend
A non-stationary time series will have a time-varying mean or a time-varying variance, or both.
Yt = α + Yt−1 + ut (random walk with drift)
In a random walk with drift, the mean and variance increase over time, again violating the conditions of weak stationarity.
Yt = Yt−1 + ut (random walk without drift)
Yt = ρYt−1 + ut, −1 ≤ ρ ≤ 1 ……… (1)
If ρ = 1, then equation (1) becomes the random walk model without drift, which is a non-stationary stochastic process.
Subtracting Yt−1 from both sides:
ΔYt = (ρ − 1)Yt−1 + ut
ΔYt = δYt−1 + ut, where δ = ρ − 1
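A minimal sketch of a unit-root check based on this regression, using the augmented Dickey–Fuller test from statsmodels on simulated series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
n = 200
random_walk = np.cumsum(rng.normal(size=n))   # Y_t = Y_{t-1} + u_t (rho = 1)
white_noise = rng.normal(size=n)              # stationary series

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    adf_stat, p_value, *_ = adfuller(series)
    print(f"{name}: ADF statistic = {adf_stat:.3f}, p = {p_value:.4f}")
```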
CO-INTEGRATION
Stationarity is a crucial property for time series modeling. The problem is, in practice,
very few phenomena are actually stationary in their original form. The trick is to employ
the right technique for reframing the time series into a stationary form. One such
technique leverages a statistical property called co-integration. Co-integration forms a
synthetic stationary series from a linear combination of two or more non-stationary
series.
When two time series variables X and Y do not individually hang around a constant value but some combination of them (usually linear) does hang around a constant, the series are said to be co-integrated. This is sometimes interpreted as a long-term relationship between the said variables.
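A minimal sketch of an Engle–Granger style co-integration test using statsmodels on two simulated series that share a common stochastic trend:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(6)
n = 300
trend = np.cumsum(rng.normal(size=n))            # common stochastic trend (non-stationary)
X = trend + rng.normal(scale=0.5, size=n)
Y = 2.0 * trend + rng.normal(scale=0.5, size=n)  # Y and X share the trend, so they co-integrate

t_stat, p_value, crit_values = coint(Y, X)
print(f"Engle-Granger t = {t_stat:.3f}, p = {p_value:.4f}")
```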
METHODS OF FORECASTING
The most important aspect of time series is Forecasting. There are two methods of
forecasting which are popularly used. They are:
QUESTION DISCUSSION
(D) For an adequate model the residual (û) will be zero for all sample data points
(Answer: A) (2013 sep iii 71)
Explanation: Only the errors follow a normal distribution (which implies the conditional
probability of Y given X is normal too). You do need distributional assumptions about the
response variable in order to make inferences (e.g, confidence intervals), but it is not
necessary that the response variable be normally distributed.
C. SE = σ²/n    D. SE = σ/√n
Explanation: The standard error (SE) of a statistic is the approximate standard deviation
of a statistical sample population. Therefore, the relationship between the standard
error of the mean and the standard deviation is such that, for a given sample size, the
standard error of the mean equals the standard deviation divided by the square root of
the sample size.
▪ Which of the following is true in the context of statistical test of hypotheses for
two variable linear regression model:
A. t² > F    B. t² < F    C. t² = F    D. t = F    (Answer: C) (2015 June III 68)
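The equality t² = F for the slope of a two-variable model can be checked numerically; a minimal sketch with statsmodels on hypothetical data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=40)
y = 3.0 + 1.5 * x + rng.normal(size=40)

res = sm.OLS(y, sm.add_constant(x)).fit()

t_slope = res.tvalues[1]   # t ratio of the slope coefficient
print(f"t^2 = {t_slope ** 2:.4f}, F = {res.fvalue:.4f}")   # the two values agree
```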
Y = 20 + 2X, SE = 0.46
To test H0: β1 = 2.1 against H1: β1 ≠ 2.1, the test statistic |t| is equal to |2 − 2.1| / 0.46 ≈ 0.22.
Codes :
(a) Generalized least square (GLS) method is capable of providing BLUE in situations
where OLS fails
(b) GLS is OLS on the transformed variables that satisfy the standard least squares
assumptions
D. a is not correct but b and c are correct (Answer: A) (2019 dec 48)
▪ If a Durbin–Watson statistic takes a value close to zero, what will be the value of the first-order autocorrelation coefficient?
(C) Close to minus one (D) Close to either minus or plus one
d ≈ 2(1 − ρ̂), where −1 ≤ ρ̂ ≤ 1 (ρ̂ is simply the first-order correlation coefficient of the residuals). Hence, if d is close to zero, ρ̂ is close to plus one.
C. Test a hypothesis
D. Estimate probability
The Granger causality test is a statistical hypothesis test for determining whether
one time series is useful in forecasting another, first proposed in
1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued
that causality in economics could be tested for by measuring the ability to predict the
future values of a time series using prior values of another time series
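A minimal sketch of the Granger causality test using statsmodels on simulated data in which lagged x helps predict y:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on lagged x, so x "Granger-causes" y
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# The test checks whether the second column helps predict the first column
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)
print(results[1][0]["ssr_ftest"])   # (F statistic, p-value, df_denom, df_num) at lag 1
```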
❑ Coefficient of determination – R²
The method relies on the key assumption that forecasts from a group are
generally more accurate than those from individuals. The aim of the Delphi
method is to construct consensus forecasts from a group of experts in a
structured iterative manner.
The Wald test can tell you which model variables are contributing something significant.
▪ Given the two regression lines estimated from given data as under:
Y = 4 + 0.4X
X = −2 + 0.9Y
Here r = √(0.4 × 0.9) = √0.36 = 0.6 (positive, since both regression coefficients are positive).
• Use correlation for a quick and simple summary of the direction and strength of
the relationship between two or more numeric variables. Use regression when
you’re looking to predict, optimize, or explain a number response between the
variables (how x influences y).
• r = √(bxy × byx)
• The coefficient of determination (r²) is a measurement used to explain how much of the variability of one factor can be explained by its relationship to another related factor.