Econometrics I
INTRODUCTION TO ECONOMETRICS
• Econometrics combines two Greek words, “oikonomia”
(economics) and “metron” (measure).
• Its literal meaning is ‘measurement in economics’.
• It is a branch of economics that combines mathematics,
statistics and economic theory: the application of mathematical
and statistical methods to the analysis of economic data.
• Econometrics is the statistical and mathematical analysis of economic
relationships; the science of empirically verifying economic theories; a
set of tools for forecasting future values of economic variables; and the
science of making quantitative policy recommendations in government.
• Econometrics was recognised as a branch of economics only in
the 1930s, with the foundation of the ‘Econometric Society’ by Ragnar
Frisch and Irving Fisher.
• The term ‘econometrics’ was first used by Paweł Ciompa as
early as 1910, but Ragnar Frisch is credited with coining the term and
establishing it as a subject.
• In the history of econometrics, R. J. Epstein viewed Henry Moore (1869-
1958) as the father of modern econometrics because of his attempt in
1911 to provide statistical evidence for the marginal productivity theory.
Classification of Econometrics
• Theoretical econometrics is concerned with the development of
appropriate methods for measuring economic relationships specified
by econometric models.
• In applied econometrics we use the tools of theoretical econometrics to
study some special field(s) of economics and business, such as the
production function, investment function, demand and supply
functions, portfolio theory, etc.
Uses of Econometrics
• In testing economic theories
• In formulation and evaluation of economic policy
• Prediction of macroeconomic variables
• In finding macroeconomic relationships
• Microeconomic relationship
• Finance
• In private and public sector decision making.
Types of Data
1. Cross-Section Data: a one-dimensional data set collected on many units
(firms, households, regions, countries, etc.) at a single point in time,
disregarding differences in time.
2. Time-Series Data: data collected for the same entity at different points
in time (tick-by-tick, daily, weekly, monthly, yearly, etc.).
3. Panel Data: time series over different cross-section units; data
collected for several entities or subjects (people, countries, the same
set of stocks) at different points in time.
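The three data types above can be sketched as plain Python structures. This is a minimal illustration with hypothetical numbers, not data from the example used later in these notes:

```python
# 1. Cross-section data: many units, one point in time
#    (weekly consumption of four hypothetical families in one week)
cross_section = {"family_A": 70, "family_B": 65, "family_C": 90, "family_D": 95}

# 2. Time-series data: one unit observed at many points in time
#    (one family's weekly consumption over five weeks)
time_series = [70, 72, 69, 75, 74]

# 3. Panel data: many units, each observed at many points in time
#    (two families, each followed for three weeks)
panel = {
    "family_A": [70, 72, 69],
    "family_B": [65, 64, 66],
}
```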
Regression and Causation
• The term regression was coined by Francis Galton.
• In correlation analysis, the primary objective is to measure the
strength or degree of linear association between two random
variables.
• Regression analysis tries to estimate or predict the average
value of one variable (the dependent variable) on the basis of the
fixed values of other variables (the independent variables).
• In the econometrics literature the dependent variable and the
explanatory variable are described by various pairs of terms:
explained/explanatory, regressand/regressor, predictand/predictor,
endogenous/exogenous.
Simple Linear Regression (Two Variable)
• Consider a hypothetical example;
• Total population is 60 families
• Y = weekly family consumption expenditure
• X = weekly disposable family income
• The 60 families were divided into 10 groups of families with approximately the
same level of income, namely:
80,100,120,140,160,180,200,220,240,260
• The detailed table is given in the next page.
• From the table we can understand that average weekly expenditure in each income group
is;
65,77,89,101,113,125,137,149,161,173
• We have 10 fixed values of X, and the corresponding Y values against each of the X
values.
• We call these mean values Conditional (Expected) Mean Values, E(Y/X): the expected
value of Y for a given value of X.
• If we add the weekly consumption expenditure of all the 60 families and divide
this by 60, ie, (∑Y/n) = 7272/60 = 121.20, we get what is called the
Unconditional Mean, or expected weekly consumption expenditure, E(Y).
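The distinction between conditional and unconditional means can be sketched by grouping. The mini data set below is hypothetical (it is not the actual 60-family table), chosen so that the first two conditional means match the 65 and 77 quoted above:

```python
from statistics import mean

# Hypothetical families: (weekly income X, weekly consumption Y).
# Within each income group the Y values average to the group's
# conditional mean E(Y/X).
families = [
    (80, 55), (80, 60), (80, 65), (80, 70), (80, 75),   # E(Y/X=80) = 65
    (100, 70), (100, 74), (100, 87),                    # E(Y/X=100) = 77
]

# Conditional means E(Y/X): average Y within each income group
groups = {}
for x, y in families:
    groups.setdefault(x, []).append(y)
conditional_means = {x: mean(ys) for x, ys in groups.items()}

# Unconditional mean E(Y): average over all families, ignoring income
unconditional_mean = mean(y for _, y in families)
```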
• The above diagram shows the conditional distribution of expenditure for various
levels of income.
• There are considerable variations in weekly consumption expenditure in each income
group.
• Despite the variability in consumption expenditure within each income group, the
average weekly consumption expenditure increases as income increases.
• If we join these conditional mean values, we obtain what is known as the
Population Regression Line (PRL), or Population Regression Curve
(PRC).
• More simply, it is the regression of Y on X.
• A population regression curve is simply the locus of the conditional
means of the dependent variable for the fixed values of the explanatory
variable.
• The above figure shows that for each X there is a range of Y values
spread around the conditional mean of those Y values.
• These Y values are distributed symmetrically around their respective
means.
• The regression line passes through these conditional mean values.
Population Regression Function (PRF)
• Each conditional mean E(Y/Xi) is a function of Xi (the value of X):
E(Y/Xi) = f(Xi)
• The expected value of Y for a given Xi is functionally related to Xi.
• It tells how the average response of Y varies with X.
• In our example we know that consumption is a linear function of
income, and hence we have a linear population regression function;
E(Y/Xi) = f(Xi) = β0 + β1Xi
• In regression, we are estimating PRF, ie, estimating the values of the
unknowns β0 and β1 on the basis of the observation of Y and X.
• By contrast, a model such as Yi = β0 + β1²Xi is a non-linear regression
function, because it is non-linear in the parameters; Yi = β0 + β1Xi² is
non-linear only in the variable X and still counts as a linear regression
model.
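The ten conditional means quoted earlier (65, 77, …, 173 against incomes 80, 100, …, 260) happen to lie exactly on a straight line, so the PRF of this example can be recovered directly. A minimal sketch using the closed-form two-variable least-squares formulas:

```python
# Income levels X and conditional mean expenditures E(Y/X)
# quoted in the example above.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Two-variable least-squares formulas: slope and intercept
beta1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
            sum((x - x_bar) ** 2 for x in X)
beta0_hat = y_bar - beta1_hat * x_bar

# Because the conditional means are exactly collinear, the PRF of this
# example is E(Y/X) = 17 + 0.6 X: each extra unit of income raises
# average consumption by 0.6.
```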
Stochastic Specification of PRF
• Yi = E(Y/Xi)+ui
• Yi = β0 + β1Xi+ui
• ui = Y – E(Y/Xi)
• Yi = E(Y/Xi)+ui has two components: a fixed (systematic) component and a
random component.
• E(Y/Xi) says that the consumption expenditure of a given family
depends on the mean consumption expenditure of all families with the
same level of income; this part is systematic and deterministic.
• ui is purely random: a proxy for all the omitted and neglected variables
that may affect Y but are not included in the regression model; it is
the non-systematic component.
• ui is known as the stochastic disturbance or stochastic error term.
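The decomposition Yi = E(Y/Xi) + ui can be made concrete within a single income group. The five consumption figures below are hypothetical (illustrative, not from the actual table), chosen so the group mean is 65 as in the example:

```python
from statistics import mean

# One hypothetical income group (X = 80): five families with the same
# income but different consumption expenditures.
Y = [55, 60, 65, 70, 75]
conditional_mean = mean(Y)          # systematic part: E(Y/X=80) = 65

# Stochastic part: u_i = Y_i - E(Y/X), one disturbance per family
u = [y - conditional_mean for y in Y]

# Each observation splits into fixed + random, Y_i = E(Y/X) + u_i,
# and the disturbances average out to zero within the group.
```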
Sample Regression Function (SRF)
• Estimate the PRF from the sample data.
• We take two samples from our earlier example of income and
consumption;
• plotting them gives the following diagram;
• An estimator, also known as a (sample) statistic, is simply a rule, formula or
method that tells how to estimate the population parameter from the
information provided by the sample at hand. A particular numerical value
obtained by the estimator in an application is known as an estimate.
• So, the Population Regression Function is Yi = β1 + β2Xi + ui
• and the Sample Regression Function is Yi = β̂1 + β̂2Xi + ûi
• The primary objective of regression analysis is to estimate the PRF on the
basis of the SRF.
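The estimator/estimate distinction above can be sketched in code: an estimator is a rule (a function of the sample), while an estimate is the number that rule produces for one particular sample. The two samples below are hypothetical:

```python
# An estimator is a rule; here, the sample-mean rule for estimating E(Y).
def mean_estimator(sample):
    return sum(sample) / len(sample)

# Two hypothetical samples from the same population: the same rule
# applied to different samples yields different numerical estimates.
sample_1 = [70, 65, 90, 95]
sample_2 = [55, 80, 75, 70]

estimate_1 = mean_estimator(sample_1)   # one estimate from sample 1
estimate_2 = mean_estimator(sample_2)   # another estimate from sample 2
```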
Estimation of Regression Equation
• There are various methods to estimate the values of the unknown
parameters, such as;
• Ordinary Least Squares Method
• Maximum Likelihood Method
• Method of Moments
• The Ordinary Least Squares Method is the most popular and widely used
method of estimation.
Ordinary Least Squares
• OLS was introduced by Carl Friedrich Gauss, the German mathematician.
• Our population regression function (PRF) is Yi = β1 + β2Xi +ui
• The PRF is not directly observable, so we estimate it from the SRF.
• Yi = β̂1 + β̂2Xi +ûi
= Ŷi + ûi
where Ŷi is the estimated (conditional mean) value of Yi
• ûi = Yi − Ŷi
= Yi − β̂1 − β̂2Xi which shows that the ûi (the residuals) are simply the
differences between the actual and estimated Y values.
• Now given ‘n’ pairs of observations on Y and X, we would like to
determine the SRF in such a manner that it is as close as possible to the
actual Y.
• A first thought is to choose the SRF in such a way that the sum of the
residuals, ie, ∑ûi = ∑(Yi − Ŷi), is as small as possible.
• In the above figure we can see that the algebraic sum of the ûi may be small (even zero)
although the ûi are widely scattered about the SRF.
• The method of least squares therefore chooses β̂1 and β̂2 in such a manner that, for a
given sample or set of data, ∑ûi² is as small as possible.
• For a given sample, the method of least squares provides us with unique estimates of
β1 and β2 that give the smallest possible value of ∑ûi².
• The estimators obtained through this method are known as the least-squares
estimators.
• We select the values of β̂1 and β̂2 in such a way that the errors are as small as possible.
• But positive errors can exactly balance negative errors, so ∑ûi may be zero even for a
poor fit.
• We overcome this by adopting the least-squares criterion: minimising the sum of
squared residuals (RSS).
• Squaring the residuals does two things;
• 1) It avoids the possibility that large positive residuals and large
negative residuals could offset each other and still lead to a small or
even zero value of the sum of squared residuals.
• 2) It implicitly assigns a larger weight to numerically large residuals,
regardless of whether they are positive or negative.
• The smaller the sum of squared residuals (RSS), the better the fit; if RSS
were zero we would have a perfectly fitting line, though in practice this
never happens.
RSS = û1² + û2² + û3² + … + ûn² = ∑ûi²
• Since we are minimising the sum of squared residuals, the method is
called Ordinary Least Squares (OLS).
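The two points above can be demonstrated on a small hypothetical data set: a line whose residuals sum to zero can still fit badly, while the OLS line both has residuals summing to zero and attains the smallest RSS. A minimal sketch:

```python
# Hypothetical data: Y roughly increases with X.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 8, 11]

def rss(b1, b2):
    """Residual sum of squares for the line Y_hat = b1 + b2*X."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS closed-form estimates for the two-variable model
b2_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
         sum((x - x_bar) ** 2 for x in X)
b1_hat = y_bar - b2_hat * x_bar

# OLS residuals: they sum to zero AND minimise the RSS
residuals = [y - (b1_hat + b2_hat * x) for x, y in zip(X, Y)]

# A flat line through y_bar also has residuals summing to zero,
# but a much larger RSS than the OLS line.
flat_rss = rss(y_bar, 0.0)
ols_rss = rss(b1_hat, b2_hat)
```

Minimising ∑ûi alone would not distinguish these two lines; minimising ∑ûi² does.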
Interpretation of a Linear Regression Equation
Yi = β̂1 + β̂2Xi +ûi
• Suppose Y and X are expressed in natural units (no log or other
transformation).
• A one-unit increase in X is associated with a β̂2-unit change in Y.
• The constant β̂1 gives the value of Y when X = 0; it may or may not have
a plausible meaning, depending on the context.
Assumptions of the Classical Linear Regression Model
• The Gaussian, standard, or classical linear regression model
(CLRM), which is the cornerstone of most econometric theory,
makes 10 assumptions.
• Yi = β1 + β2Xi +ui
• The parameters β1 and β2 must appear with power one and must not be
multiplied or divided by any other parameter (linearity in the
parameters).
• Linearity also implies that a one-unit change in X has the same effect on
Y irrespective of the initial value of X.
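Linearity in the parameters means that even a model like Y = β1 + β2X² can be estimated by OLS: define the transformed regressor Z = X² and regress Y on Z. A sketch with hypothetical data generated exactly by β1 = 2, β2 = 3:

```python
# Hypothetical data generated exactly by Y = 2 + 3*X**2: non-linear in X,
# but linear in the parameters, so OLS on the transformed variable Z = X**2
# recovers beta1 = 2 and beta2 = 3.
X = [1, 2, 3, 4, 5]
Y = [2 + 3 * x ** 2 for x in X]

Z = [x ** 2 for x in X]           # transformed regressor
n = len(Z)
z_bar, y_bar = sum(Z) / n, sum(Y) / n

# Two-variable OLS formulas applied to (Z, Y)
beta2_hat = sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y)) / \
            sum((z - z_bar) ** 2 for z in Z)
beta1_hat = y_bar - beta2_hat * z_bar
```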