MECE-001: Econometric Methods
Assignment
Note: Answer all the questions. While questions in Section A carry 20 marks each, those in Section B carry 12 marks each.
Section A
Section B
3. What is the underlying idea behind the probit model? Explain how parameters are
estimated in the probit model.
4. What is meant by a dynamic model? Explain how the following model can be estimated:
Yt = α + βXt + γYt–1 + ut
where |γ| < 1 and ut = ρut–1 + εt. In the above model εt is the usual stochastic error term with mean zero and variance σ², and |ρ| < 1.
5. Suppose the error terms in a regression model are correlated. What are its consequences
on the OLS estimators? How do you detect autocorrelation?
6. Suppose the explanatory variable in a regression model is measured with error. What are
its consequences? What steps will you take to solve the problem?
www.findyourbooks.in | E-Books | Solved Assignments | Ignou Guide Books | Literature & Lot More
ASSIGNMENT SOLUTIONS GUIDE (2017-2018)
M.E.C.E-1
Econometric Methods
Disclaimer/Special Note: These are just samples of the answers/solutions to some of the questions given in the assignments. These sample answers/solutions are prepared by private teachers/tutors/authors for the help and guidance of the student, to give an idea of how he/she can answer the questions given in the assignments. We do not claim 100% accuracy of these sample answers, as they are based on the knowledge and capability of the private teacher/tutor. Sample answers may be seen as a guide/help for reference in preparing answers to the questions given in the assignment. As these solutions and answers are prepared by a private teacher/tutor, the chance of error or mistake cannot be denied. Any omission or error is highly regretted, though every care has been taken while preparing these sample answers/solutions. Please consult your own teacher/tutor before you prepare a particular answer, and for up-to-date and exact information, data and solutions. Students must read and refer to the official study material provided by the university.
Note: Answer all the questions. While questions in Section-A carry 20 marks each, those in Section-B
carry 12 marks each.
SECTION-A
Q. 1. (a) What is simultaneity bias? Explain the conditions required for identification of parameters in a
simultaneous equation model.
Ans. Simultaneity bias is the bias that arises when an explanatory variable is correlated with the regression error term (sometimes called the residual disturbance term) because of simultaneity. It is so similar to omitted-variables bias that the distinction between the two is often unclear, and in fact both types of bias can be present in the same equation.
The standard way to deal with this type of bias is with instrumental variables regression (e.g. two stage least
squares).
A model is identified if it has a UNIQUE STATISTICAL FORM.
This enables UNIQUE ESTIMATES of its PARAMETERS to be made.
D = b0 + b1 P + u
S = a0 + a1 P + v
D = S
Q = f(P)
A function belonging to a system of simultaneous equations is identified if it has a unique statistical form. This
means that there must be no other model in the system, or formed by algebraic manipulation of other equations
within the system, which contains the same variables as the function in question.
D = b0 + b1 P + u
S = a0 + a1 P + v
D = S
D = a0 + a1 P + v
The RANK and ORDER conditions.
1. Equation under-identified.
2. Equation identified: Exactly identified, Over-identified.
An equation is UNDER-IDENTIFIED if its statistical form is not unique. A system is UNDER-IDENTIFIED if
one or more of its equations is under-identified.
An equation which has a unique statistical form is IDENTIFIED. A system is IDENTIFIED if all of its equations
are IDENTIFIED.
Implications of Identification
If an equation (model) is under-identified it is impossible to estimate all its parameters with any econometric
technique.
If an equation (model) is identified in general its coefficients can be estimated. The appropriate estimation
technique will depend upon whether it is exactly identified or over-identified.
The Order Condition for Identification
For an equation to be identified the total number of variables excluded from it must be equal to or greater than
the number of endogenous variables in the model less one.
Or:
For an equation to be identified the total number of variables excluded from it but included in other equations
must be at least as great as the number of equations in the system less one.
G = total number of equations (= total number of endogenous variables);
K = total number of variables in the model (endogenous and pre-determined);
M = number of variables, endogenous and pre-determined, in a particular equation.
K – M ≥ G – 1
The order condition is NECESSARY for identification but it is not SUFFICIENT.
THE RANK CONDITION FOR IDENTIFICATION.
In a system of G equations any particular equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G – 1) from the coefficients of the variables excluded from that particular equation but contained in the other equations of the model.
Example:
y1 = 3y2 – 2 x1 + x2 + u1
y 2 = y 3 + x 3 + u2
y 3 = y1 – y2 – 2x3 + u3
– y1 + 3y2 + 0y3 – 2x1 + x2 + 0x3 + u1 = 0
0y1 – y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0
y1 – y2 – y3 + 0x1 + 0x2 – 2x3 + u3 = 0

Table of structural coefficients:
        y1   y2   y3   x1   x2   x3
Eq. 1:  –1    3    0   –2    1    0
Eq. 2:   0   –1    1    0    0    1
Eq. 3:   1   –1   –1    0    0   –2
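The rank condition above can be checked mechanically. The following is a minimal sketch (numpy assumed; the helper name `rank_condition` is my own, not from the text) that, for each equation of the example system, searches for a non-zero (G – 1)×(G – 1) determinant among the coefficients of its excluded variables in the other equations:

```python
import numpy as np
from itertools import combinations

# Coefficient matrix of the three-equation example system above
#              y1    y2    y3    x1    x2    x3
A = np.array([
    [-1.0,  3.0,  0.0, -2.0,  1.0,  0.0],   # equation 1
    [ 0.0, -1.0,  1.0,  0.0,  0.0,  1.0],   # equation 2
    [ 1.0, -1.0, -1.0,  0.0,  0.0, -2.0],   # equation 3
])
G = A.shape[0]  # number of equations (= number of endogenous variables)

def rank_condition(A, eq):
    """Rank condition: some (G-1)x(G-1) determinant built from the
    coefficients of variables EXCLUDED from `eq`, taken from the OTHER
    equations, must be non-zero."""
    excluded = [j for j in range(A.shape[1]) if A[eq, j] == 0.0]
    others = [i for i in range(G) if i != eq]
    sub = A[np.ix_(others, excluded)]          # (G-1) x (#excluded)
    if sub.shape[1] < G - 1:
        return False                           # order condition already fails
    # try every (G-1)-column selection for a non-zero determinant
    return any(abs(np.linalg.det(sub[:, list(c)])) > 1e-10
               for c in combinations(range(sub.shape[1]), G - 1))

for eq in range(G):
    print(f"equation {eq + 1}: identified = {rank_condition(A, eq)}")
```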
(b) In the following two-equation system check the identification status of both the equations.
y1 = α1 + α2 y2 + u1
y2 = β1 + β2 y1 + β3 Z1 + β4 Z2 + u2
Sol.
y1 = α1 + α2 y2 + u1   ...(i)
y2 = β1 + β2 y1 + β3 Z1 + β4 Z2 + u2   ...(ii)
Consider the estimation of equation (i). If y2 were independent of u1, we would have
Cov (y2, u1) = 0.
However, y2 is not independent of u1, since u1 enters the determination of y2 through equation (ii). Hence
Cov (y2, u1) ≠ 0,
and OLS applied to equation (i) gives biased and inconsistent estimates. The exogenous variables Z1 and Z2 are independent of u1 and are excluded from equation (i), so they can serve as instrumental variables for y2:
Cov (Z1, u1) = Cov (Z2, u1) = 0.
By the order condition, equation (i) is over-identified: two exogenous variables are excluded from it, against the required G – 1 = 1. Equation (ii), in contrast, excludes no variable of the system (K – M = 0 < G – 1 = 1) and is therefore not identified. The instrumental-variable estimates of equation (i) solve the moment conditions
(1/N) Σ Z1i (y1i – α̂1 – α̂2 y2i) = 0
(1/N) Σ Z2i (y1i – α̂1 – α̂2 y2i) = 0
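To illustrate the point numerically, here is a hedged simulation sketch (all parameter values below are illustrative choices of mine, not from the text): OLS on equation (i) is biased because y2 is correlated with u1, while two-stage least squares using Z1 and Z2 as instruments recovers α2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical "true" parameters of the two-equation system (illustrative)
a1, a2 = 1.0, 0.5                       # equation (i):  y1 = a1 + a2*y2 + u1
b1, b2, b3, b4 = 2.0, 0.3, 1.0, -0.8    # equation (ii)

Z1, Z2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)

# Solve the simultaneous system for its reduced form
y2 = (b1 + b2 * a1 + b2 * u1 + b3 * Z1 + b4 * Z2 + u2) / (1 - b2 * a2)
y1 = a1 + a2 * y2 + u1

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# OLS on equation (i) is inconsistent: y2 is correlated with u1
beta_ols = ols(np.column_stack([np.ones(n), y2]), y1)

# 2SLS: first stage projects y2 on the instruments Z1, Z2;
# second stage uses the fitted values in place of y2
Zmat = np.column_stack([np.ones(n), Z1, Z2])
y2_hat = Zmat @ ols(Zmat, y2)
beta_2sls = ols(np.column_stack([np.ones(n), y2_hat]), y1)

print("OLS  a2 estimate:", beta_ols[1])    # noticeably above 0.5
print("2SLS a2 estimate:", beta_2sls[1])   # close to 0.5
```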
Q. 2. What are the assumptions of a classical regression model? Derive Ordinary Least Squares (OLS)
estimators for the model. Show that the OLS estimators are Best Linear Unbiased Estimators (BLUE).
Ans. Classical Assumption For OLS: Under certain assumptions, the method of least square has some very
attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis.
The assumptions of the Ordinary Least Squares (OLS) model are as follows:
1. The regression model is linear in the parameters:
Yi = α + βXi + εi
2. X is assumed to be non-stochastic.
3. The conditional mean value of εi is zero:
E(εi | Xi) = 0
4. The conditional variance of εi is identical for all i (homoscedasticity):
Var (εi | Xi) = σ²
where ‘Var’ stands for variance.
5. The correlation between any two εi and εj (i ≠ j) is zero:
Cov (εi, εj | Xi, Xj) = 0
6. The covariance between εi and Xi is equal to zero:
Cov (εi, Xi) = 0.
7. The number of observations n must be greater than the number of explanatory variables.
8. Var (X) must be a finite positive value.
9. There is no specification bias or error in the model used in empirical analysis.
10. There is no perfect linear relationship among the explanatory variables.
Least Squares Method of Estimation
The method of least squares is attributed to Carl Friedrich Gauss, a German mathematician. Under certain assumptions the method of least squares has some very attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis.
We want to estimate the relationship between Y and X from the sample observations above such that:
Ŷi = α̂ + β̂ Xi
where α̂ and β̂ are estimates of the unknown parameters α and β, and Ŷ is the estimated value of Y. The deviations between the observed and estimated values of Y are called residuals ε̂i, i.e.,
ε̂i = Yi – Ŷi = Yi – α̂ – β̂ Xi
The estimated equation will be the best-fitting line on the least-squares criterion. Therefore,
Σ ε̂i² = Σ (Yi – Ŷi)² = Σ (Yi – α̂ – β̂ Xi)²,   i = 1, 2, ..., n,
is to be made as small as possible, where ε̂i² are the squared residuals. Differentiating with respect to α̂ and β̂ and setting the derivatives equal to zero, we have
∂Σε̂i²/∂α̂ = –2 Σ (Yi – α̂ – β̂ Xi) = 0
∂Σε̂i²/∂β̂ = –2 Σ Xi (Yi – α̂ – β̂ Xi) = 0
which give
Σ Yi = n α̂ + β̂ Σ Xi
Σ Xi Yi = α̂ Σ Xi + β̂ Σ Xi²
where n is the sample size. These simultaneous equations are known as the Normal Equations. Solving the normal equations, we get the values of α̂ and β̂:
β̂ = [n Σ Xi Yi – Σ Xi Σ Yi] / [n Σ Xi² – (Σ Xi)²]
  = Σ (Xi – X̄)(Yi – Ȳ) / Σ (Xi – X̄)²
and α̂ = Ȳ – β̂ X̄
where Ȳ = (1/n) Σ Yi and X̄ = (1/n) Σ Xi.
The value of β̂ can also be written in terms of deviations of Y and X from their means. Using xi = Xi – X̄ and yi = Yi – Ȳ, we get
β̂ = Σ xi yi / Σ xi²
In deviation form the model is yi = β xi + εi, so
β̂ = Σ xi (β xi + εi) / Σ xi² = β Σ xi² / Σ xi² + Σ xi εi / Σ xi²
β̂ = β + Σ xi εi / Σ xi²   ...(i)
If we take the expected value of equation (i), we find that:
E(β̂) = β + Σ xi E(εi) / Σ xi² = β,   since E(εi) = 0.
Hence, β̂ is an unbiased estimator of β. This, along with the linearity property, shows that β̂ is a linear unbiased estimator of β.
Next we prove that the variance of β̂ is the smallest among the class of linear unbiased estimators, i.e., β̂ has the smallest variance. Let us first obtain the variance of β̂.
Var (β̂) = E(β̂ – β)² = E [Σ xi εi / Σ xi²]²
= [1/(Σ xi²)²] E [Σ xi² εi² + 2 ΣΣ(i<j) xi xj εi εj]
= [1/(Σ xi²)²] [Σ xi² E(εi²) + 2 ΣΣ(i<j) xi xj E(εi εj)]
Since E(εi²) = σ² and E(εi εj) = 0 for i ≠ j,
Var (β̂) = σ² Σ xi² / (Σ xi²)²
Var (β̂) = σ² / Σ xi²   ...(ii)
To prove that β̂ has the smallest variance of all linear unbiased estimators, we define another linear estimator:
β* = Σ ci Yi
where ci = xi / Σ xi² + di,
with di an arbitrary set of constants. For β* to be unbiased we need Σ ci xi = 1. Now
Σ ci xi = Σ xi² / Σ xi² + Σ di xi = 1 + Σ di xi
Thus E(β*) = β only if the following condition is satisfied:
Σ di xi = 0
Now, the variance of β*:
Var (β*) = E(β* – β)² = E [Σ ci εi]² = σ² Σ ci²
= σ² Σ (xi / Σ xi² + di)²
= σ² / Σ xi² + σ² Σ di²
(the cross term vanishes because Σ di xi = 0). Since σ² Σ di² is zero or larger, Var (β*) is at least as large as Var (β̂) = σ² / Σ xi². This proves that β̂ has the smallest variance of all linear unbiased estimators. Hence, β̂ is a Best Linear Unbiased Estimator (BLUE).
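The unbiasedness of β̂ and the variance formula σ²/Σxi² derived above can be checked by simulation. A minimal numpy sketch (the "true" parameter values α, β, σ below are illustrative assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, sigma = 2.0, 1.5, 1.0       # illustrative "true" parameters
X = rng.uniform(0, 10, size=500)         # fixed (non-stochastic) regressor

# Compute beta_hat with the closed-form OLS formula over many replications
# to check unbiasedness and the variance formula sigma^2 / sum(x_i^2)
reps = 2000
betas = np.empty(reps)
for r in range(reps):
    Y = alpha + beta * X + rng.normal(0, sigma, size=X.size)
    x, y = X - X.mean(), Y - Y.mean()            # deviations from means
    betas[r] = (x * y).sum() / (x ** 2).sum()    # beta_hat = Σx_i y_i / Σx_i²

var_formula = sigma ** 2 / ((X - X.mean()) ** 2).sum()   # σ² / Σx_i²
print("mean of beta_hat:", betas.mean())   # close to 1.5 (unbiasedness)
print("empirical var:   ", betas.var())
print("sigma^2/sum x^2: ", var_formula)    # matches the empirical variance
```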
SECTION-B
Q. 3. What is the underlying idea behind the probit model? Explain how parameters are estimated in the
probit model.
Ans. The Probit Model: To explain the behaviour of a dichotomous dependent variable we have to use a suitably chosen CDF. The logit model uses the cumulative logistic function, but this is not the only CDF that one can use. In some applications the normal CDF has been found useful. The estimating model that emerges from the normal CDF is popularly known as the probit model, although it is also sometimes known as the normit model. In principle one could substitute the normal CDF in place of the logistic CDF. We now present the probit model based on utility theory, or the rational-choice perspective on behaviour, as developed by McFadden.
To motivate the probit model, assume that in our home-ownership example the decision of the ith family to own a house or not depends on an unobservable utility index Ui (also known as a latent variable) that is determined by one or more explanatory variables, say income Xi, in such a way that the larger the value of the index Ui, the greater the probability of the family owning a house. We express the index Ui as:
Ui = β1 + β2 Xi   ...(i)
where Xi is the income of the ith family.
How is the (unobservable) index related to the actual decision to own a house? As before, let Y = 1 if the family owns a house and Y = 0 if it does not. Now it is reasonable to assume that there is a critical or threshold level of the index, call it Ui*, such that if Ui exceeds Ui*, the family will own a house; otherwise it will not. The threshold Ui*, like Ui, is not observable, but if we assume that it is normally distributed with the same mean and variance, it is possible not only to estimate the parameters of the index given in (i) but also to get some information about the unobservable index itself. This calculation is as follows:
Given the assumption of normality, the probability that Ui* is less than or equal to Ui can be computed from the standardized normal CDF as:
Pi = P(Y = 1 | X) = P(Ui* ≤ Ui) = P(Zi ≤ β1 + β2 Xi) = F(β1 + β2 Xi)   ...(ii)
where P(Y = 1 | X) means the probability that the event occurs given the value(s) of the explanatory variable(s) X, and where Zi is the standard normal variable, i.e., Z ~ N(0, 1). F is the standard normal CDF, which written explicitly in the present context is:
F(Ui) = (1/√(2π)) ∫ from –∞ to Ui of e^(–z²/2) dz
     = (1/√(2π)) ∫ from –∞ to (β1 + β2 Xi) of e^(–z²/2) dz   ...(iii)
Since Pi represents the probability that the event will occur, here the probability of owning a house, it is measured by the area under the standard normal curve from –∞ to Ui, as shown in the figure.
Now to obtain information on Ui, the utility index, as well as on β1 and β2, we take the inverse of (ii) to obtain:
Ui = F⁻¹(Pi) = β1 + β2 Xi
where F⁻¹ is the inverse of the normal CDF. What all this means can be made clear from the figure. In panel (I) we obtain from the ordinate the (cumulative) probability of owning a house given Ui* ≤ Ui, whereas in panel (II) we obtain from the abscissa the value of Ui given the value of Pi, which is simply the reverse of the former.
Fig.: Probit model: (I) given Ui, read Pi from the ordinate; (II) given Pi, read Ui = F⁻¹(Pi) from the abscissa.
The function φ(Xi, β) is a commonly used notation for the standard normal probability density function.
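The parameters β1 and β2 are estimated by maximum likelihood. As a hedged sketch (the data-generating values and the Fisher-scoring implementation details below are my own illustrative choices, not from the text), the probit log-likelihood can be maximized with only numpy and math.erf:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 5000
b1, b2 = -1.0, 0.8                     # illustrative "true" index parameters
X = rng.normal(size=n)                 # stand-in for income
Z = rng.normal(size=n)                 # latent-utility noise
Y = (b1 + b2 * X + Z > 0).astype(float)   # own a house iff index exceeds threshold

def Phi(z):   # standard normal CDF via the error function
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def phi(z):   # standard normal density
    return np.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# Fisher scoring (Newton-type) on the probit log-likelihood
W = np.column_stack([np.ones(n), X])
beta = np.zeros(2)
for _ in range(25):
    eta = W @ beta
    p = np.clip(Phi(eta), 1e-10, 1 - 1e-10)
    d = phi(eta)
    score = W.T @ (d * (Y - p) / (p * (1 - p)))       # gradient of log-lik
    wts = d ** 2 / (p * (1 - p))                      # expected information weights
    info = (W * wts[:, None]).T @ W
    beta = beta + np.linalg.solve(info, score)

print("estimated (beta1, beta2):", beta)   # close to (-1.0, 0.8)
```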
Q. 4. What is meant by a dynamic model? Explain how the following model can be estimated:
yt = α + βxt + γyt–1 + ut
where |γ| < 1 and ut = ρut–1 + εt. In the above model εt is the usual stochastic error term with mean zero and variance σ², and |ρ| < 1.
Ans. Dynamic or Autoregressive Models: If the model includes one or more lagged values of the dependent variable among its explanatory variables, it is called an autoregressive model. Thus,
Yt = α1 + α2 Xt + α3 Xt–1 + α4 Xt–2 + εt
represents a distributed-lag model, whereas
Yt = α1 + α2 Xt + α3 Yt–1 + εt
is an example of an autoregressive model. The latter are also known as dynamic models since they portray the time path of the dependent variable in relation to its past value(s).
Consider the dynamic model with the lagged dependent variable on the right-hand side of the equation along with the usual explanatory variables.
Yt = γ Yt–1 + β Xt + εt
(1 – γL) Yt = β Xt + εt
Yt = [β/(1 – γL)] Xt + [1/(1 – γL)] εt
Expanding the lag polynomial, the OLS estimate of γ is the coefficient of Yt–1 in
Yt = β (Xt + γ Xt–1 + γ² Xt–2 + ...) + (εt + γ εt–1 + γ² εt–2 + ...)
and, when the errors are autocorrelated, it is in fact different from the true γ. Remember that γ represents the system dynamics and helps in estimating the speed of the response or adjustment. Thus, misspecification may lead to wrong inferences about the speed of the response.
Assume that the error term εt follows the nth-order autoregressive AR(n) scheme:
εt = ρ1 εt–1 + ρ2 εt–2 + ... + ρn εt–n + Ut
where Ut is a white-noise error term. The null hypothesis H0 to be tested is
H0: ρ1 = ρ2 = ... = ρn = 0,
i.e., there is no serial correlation of any order. The LM test involves the following steps:
1. Estimate the equation by OLS and obtain the residuals ε̂t.
2. Regress ε̂t on the original regressors and the lagged residuals:
ε̂t = α1 + α2 Xt + ρ1 ε̂t–1 + ρ2 ε̂t–2 + ... + ρn ε̂t–n + Ut
3. Obtain R² from this auxiliary regression.
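The LM-test steps above can be sketched as follows (a hedged illustration with simulated AR(1) errors; all parameter values and helper names are my own assumptions). The statistic (T – p)·R² from the auxiliary regression is asymptotically chi-square with p degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(7)
T, rho = 500, 0.6

# Simulate y_t = 1 + 0.5*x_t + e_t with AR(1) errors e_t = rho*e_{t-1} + u_t
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

def ols_resid(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

# Step 1: OLS on the original equation, keep residuals
X = np.column_stack([np.ones(T), x])
res = ols_resid(X, y)

# Step 2: auxiliary regression of residuals on X and p lagged residuals
p = 2
lags = np.column_stack([res[p - k: T - k] for k in range(1, p + 1)])
ya = res[p:]
resid_aux = ols_resid(np.column_stack([X[p:], lags]), ya)
r2 = 1.0 - resid_aux.var() / ya.var()

# Step 3: LM statistic, compared with the chi-square critical value
lm = (T - p) * r2
print("LM statistic:", lm, " (5% chi2(2) critical value is about 5.99)")
```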
Q. 5. Suppose the error terms in a regression model are correlated. What are its consequences on the OLS
estimators? How do you detect autocorrelation?
Ans. When the assumption of the classical linear regression model that the errors or disturbances εt entering into the Population Regression Function (PRF) are random or uncorrelated is violated, the problem of serial or autocorrelation arises.
Autocorrelation can arise for several reasons, such as inertia or sluggishness of economic time series, specification bias resulting from excluding important variables from the model or using an incorrect functional form, the cobweb phenomenon, data massaging and data transformation. As a result, it is useful to distinguish between pure autocorrelation and “induced” autocorrelation caused by one or more of the factors just discussed. Although in the presence of autocorrelation the OLS estimators remain unbiased, consistent and asymptotically normally distributed, they are no longer efficient.
The problem of autocorrelation primarily arises in the case of time-series data and not for cross-sectional data. (Panel data involves a different set of tools and is not covered in this course.) For time-series data there is usually considerable inertia; most macroeconomic time series follow a cyclical behaviour due to business cycles. For instance, past consumption (for the last period and/or the period before that) may be used as an explanatory variable or regressor in the MRM. This almost always results in autocorrelated error terms, and adequate steps must be taken to correct this. Therefore, one needs to be very cautious while using any kind of time-series data because of the problem of autocorrelation.
In the following figures we show different error terms, the true values. We know the true values of the error terms are not observable; they are given here for illustrative purposes. Now we visualize some of the plausible patterns of autocorrelation and non-autocorrelation.
In reality we will observe only the estimated residuals from the MRM. The first figure shows no autocorrelation, the second shows positive autocorrelation in the error structure, and the third shows negative autocorrelation. In the first figure the error terms are scattered around zero, which is expected since each is a normal variable drawn independently with mean zero; there should be no correlation between consecutive error terms, so a plot should not reveal any pattern at all. In the second figure, on the other hand, there is substantial correlation between the error terms in consecutive periods: there are several periods when the error is high followed by several periods when it is low, so a cyclical pattern is observed in the plot of residuals. This is true for most time series obtained from the real world, i.e., this is the most common pattern. The third figure shows negative autocorrelation: period-to-period fluctuations, i.e., if this period the error is highly positive, the next period it will be highly negative and the period after that highly positive again. In other words, consecutive terms are correlated but the sign changes from positive to negative and back and so on.
Fig.: Error Term with No Autocorrelation
Fig.: Error Term with Positive Autocorrelation
Fig.: Error Term with Negative Autocorrelation
Implications of Autocorrelation
As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators are still linear and unbiased as well as consistent and asymptotically normally distributed, but they are no longer efficient (i.e., minimum-variance). Since the variance of the OLS estimator is larger than that of the alternatives, the confidence intervals for the estimated coefficients in the MRM are likely to be wider. Again, as in the case of heteroscedasticity, we distinguish two cases. For pedagogical purposes we will continue to work with the two-variable model, although the following discussion can be extended to multiple regression without much trouble. Recall that confidence intervals are constructed using the estimated coefficients plus/minus a constant depending on the level of confidence chosen (e.g., 95%) times the estimated standard error of the estimate. If we use the incorrect OLS confidence interval, we may wrongly accept the null hypothesis that a coefficient is zero, or wrongly reject it, because the interval itself is misstated.
To establish confidence intervals and to test hypotheses one should use GLS and not OLS, even though the estimators derived from the latter are unbiased and consistent.
Problems Arising from Autocorrelation: The situation is potentially very serious if we not only use β̂2 but also continue to use var (β̂2) = σ²/Σxt², which completely disregards the problem of autocorrelation, i.e., we mistakenly believe that the usual assumptions of the classical model hold true. Errors will arise because this formula misstates the true variance of β̂2 under (first-order) autocorrelation — the discrepancy is non-zero as long as ρ ≠ 0 — even though the OLS estimator is in any case inefficient compared to var (β̂2)GLS. Therefore, the usual t and F tests of significance are no longer valid, and if applied are likely to give seriously misleading conclusions about the statistical significance of the estimated regression coefficients. In particular, it is more likely that we falsely find a relationship insignificant.
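A common device for detecting first-order autocorrelation is the Durbin-Watson d statistic, d = Σ(ε̂t – ε̂t–1)²/Σε̂t² ≈ 2(1 – ρ̂), so d near 2 suggests no autocorrelation, d near 0 positive, and d near 4 negative autocorrelation. A minimal sketch on simulated data (all parameter values are illustrative assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300

def durbin_watson(res):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2) ~ 2*(1 - rho_hat)."""
    return np.sum(np.diff(res) ** 2) / np.sum(res ** 2)

def resid(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

x = rng.normal(size=T)

# (a) independent errors: d should be near 2
y0 = 1 + 2 * x + rng.normal(size=T)

# (b) positively autocorrelated AR(1) errors: d well below 2
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y1 = 1 + 2 * x + e

print("DW, independent errors:", durbin_watson(resid(x, y0)))
print("DW, AR(1) rho=0.8:    ", durbin_watson(resid(x, y1)))
```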
Q. 6. Suppose the explanatory variable in a regression model is measured with error. What are its
consequences? What steps will you take to solve the problem?
Ans. Consequences of Errors in Variables: Measurement error is exactly what it says: either the dependent variable or the regressors are measured with error. Thinking about the way economic data are reported, measurement error is probably quite prevalent. For example, estimates of growth of GDP, inflation, etc., are commonly revised several times.
When these data (containing errors) are used, one of the assumptions of the classical least-squares method is violated. In this case, the classical least-squares estimator will be biased even as the sample size increases, which is alternatively known as asymptotic bias or inconsistency.
Under the classical assumptions the OLS estimators are Best Linear Unbiased Estimators (BLUE). One of the major underlying assumptions is the independence of the regressors from the disturbance term. If this condition does not hold, then the Ordinary Least Squares (OLS) estimators are biased and inconsistent. This statement may be illustrated by a simple errors-in-variables model. We discuss below the consequences when the error appears in the measurement of the dependent variable, the independent variables, or both.
1. Measurement Error in Y: Here we consider that only the dependent variable contains errors of measurement. Let us assume the true regression model:
Yi = β Xi + εi   ...(i)
where εi is the stochastic disturbance term. Since Yi is not directly measurable, we may use an observable variable Yi*, such that
Yi* = Yi + Ui   ...(ii)
where Ui denotes the error of measurement in Yi, which is not associated with the regressor; thus we have
cov (Ui, Xi) = 0
The regression model is estimated with Yi* as the dependent variable, with no account being taken of the fact that Yi* is not an accurate measure of Yi. Therefore, instead of estimating equation (i), we estimate
Yi* = Yi + Ui
    = (β Xi + εi) + Ui
    = β Xi + (εi + Ui)
Yi* = β Xi + Vi   ...(iii)
where Vi = εi + Ui is a composite error term, containing the population disturbance term (which may be called the equation error term) and the measurement error term Ui.
For simplicity assume that:
E(εi) = E(Ui) = 0,
cov (Xi, εi) = 0
(which is the assumption of the classical linear regression),
cov (Xi, Ui) = 0,
i.e., the errors of measurement in Yi are uncorrelated with Xi, and
cov (εi, Ui) = 0,
i.e., the equation error and the measurement error are uncorrelated. With these assumptions, it can be seen that β estimated from either equation (i) or equation (iii) will be an unbiased estimator of the true β; i.e., the errors of measurement in the dependent variable Yi do not destroy the unbiasedness property of the OLS estimators. However, the variance and standard error of β̂ estimated from equation (i) and equation (iii) will be different because, employing the usual formulae, we obtain:
From equation (i):
var (β̂) = σε² / Σ xi²   ...(iv)
From equation (iii):
var (β̂) = σV² / Σ xi² = (σε² + σU²) / Σ xi²   ...(v)
Obviously the latter variance is larger than the former. Therefore, although the errors of measurement in the dependent variable still give unbiased estimates of the parameters, the estimated variances are now larger than in the case where there are no such errors of measurement.
2. Measurement Error in X: We take the true regression model in deviation form to be:
Yi = β Xi + εi   ...(i)
Now let us assume that the explanatory variable Xi is measured with error and the observed value becomes Xi*, such that
Xi* = Xi + wi
Xi = Xi* – wi   ...(ii)
where wi represents the error of measurement in Xi*. Therefore, instead of estimating equation (i), we estimate
Yi = β (Xi* – wi) + εi
   = β Xi* – β wi + εi
   = β Xi* + (εi – β wi)
Yi = β Xi* + Zi   ...(iii)
where Zi = εi – β wi,
a compound of equation and measurement errors. Now even if we assume that wi has zero mean, is serially independent and is uncorrelated with εi, we can no longer assume that the composite error term Zi is independent of the explanatory variable Xi*, because [assuming E(Zi) = 0]:
cov (Zi, Xi*) = E [Zi – E(Zi)] [Xi* – E(Xi*)]
            = E [(εi – β wi) wi]
            = E [εi wi – β wi²]
            = – β σw²,
which is non-zero as long as β ≠ 0.
The OLS estimator of β from equation (iii) is
β̂ = Σ xi* yi / Σ xi*² = Σ (xi + wi) yi / Σ (xi + wi)²
From the above we find that β̂ is a biased estimator, since E(β̂) ≠ β. Let us examine the asymptotic properties of β̂. Since wi and εi are stochastic and uncorrelated with each other as well as with Xi, we can say that
plim β̂ = β var (X) / [var (X) + σw²]   [xi = Xi – X̄ by definition]
      = β σx² / (σx² + σw²)
      = β · 1 / (1 + σw² / σx²)   ...(v)
Thus, β̂ is biased even for an infinite sample, and β̂ is an inconsistent estimator of β.
3. Measurement Error in both X and Y: Now let us assume that both X and Y have errors of measurement. The true model is as before:
Yi = β Xi + εi
Since Xi and Yi have errors of measurement, we observe Xi* and Yi* instead of Xi and Yi, such that
Yi* = Yi + Ui
Xi* = Xi + wi
where Ui and wi represent the errors in the values of Yi and Xi respectively. We make the following assumptions about the error terms:
(i) There is no correlation between an error term and the corresponding variable, i.e.,
E (Yi Ui) = 0
E (Xi wi) = 0
(ii) There is no correlation between the error of one variable and the measurement of the other variable, i.e.,
E (Ui Xi) = 0
E (wi Yi) = 0
(iii) There is no correlation between the errors in measurement of the two variables, i.e.,
E (Ui wi) = 0
On the basis of the above assumptions our estimated regression equation will be
Yi = β Xi + εi
Yi* – Ui = β (Xi* – wi) + εi
Yi* = β Xi* – β wi + Ui + εi
Yi* = β Xi* + (εi + Ui – β wi)   ...(i)
Equation (i) shows that if we model Yi* as a function of Xi*, the transformed disturbance contains the measurement errors Ui and wi. We can then write the OLS estimator β̂ as:
β̂ = Σ xi* yi* / Σ xi*²
  = Σ (xi + wi)(yi + ui) / Σ (xi + wi)²
Taking probability limits, the cross-products involving the mutually uncorrelated errors vanish, and
plim β̂ = β var (X) / [var (X) + σw²]
      = β σx² / (σx² + σw²)
      = β · 1 / (1 + σw² / σx²)
Thus β̂ will not be a consistent estimator of β. The presence of measurement error of the type in question will lead to an underestimate of the true regression parameter if ordinary least squares is used.
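The attenuation result in (v) is easy to verify by simulation. A hedged numpy sketch (β, σx and σw below are illustrative assumptions of mine): the OLS slope on the error-ridden regressor converges to β/(1 + σw²/σx²), not to β.

```python
import numpy as np

rng = np.random.default_rng(11)
n, beta = 100_000, 2.0
sigma_x, sigma_w = 1.0, 0.5              # signal and measurement-error std devs

x = rng.normal(0, sigma_x, size=n)       # true regressor
eps = rng.normal(size=n)
y = beta * x + eps
x_star = x + rng.normal(0, sigma_w, size=n)   # observed, error-ridden X*

# deviation form: center both series before applying the slope formula
x_star = x_star - x_star.mean()
y = y - y.mean()

beta_hat = (x_star * y).sum() / (x_star ** 2).sum()
attenuation = 1.0 / (1.0 + sigma_w ** 2 / sigma_x ** 2)   # factor from (v)

print("beta_hat:                 ", beta_hat)
print("beta * attenuation factor:", beta * attenuation)   # both near 1.6
```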
Instrumental Variables Method
One suggested remedy is the use of instrumental or proxy variables that, although highly correlated with the original X variables, are uncorrelated with the equation and measurement error terms (i.e., εi and wi). As a general rule we tend to pass over the problem of measurement error, hoping that the errors are small enough not to destroy the validity of the estimation procedure. But we can solve the measurement-error problem with the technique of instrumental-variables estimation; we discuss the concept of instrumental variables because it is likely to be useful with measurement errors.
Suppose we can use an instrumental variable Z in place of an explanatory variable X that is correlated with the error term; Z must at the same time be uncorrelated with the error term in the equation as well as with the errors of measurement of both variables.
We are concerned with the consistency of the parameter estimates and therefore concentrate on the relationship between the variable Z and the remaining variables in the model as the sample size gets large. We define the random variable Z to be an instrument if:
(i) the correlations between Z and ε, U and w each approach zero as the sample size gets large; and
(ii) as the sample size gets large, the correlation between Z and X is non-zero.
We simply select the instrument (or combination of instruments) that has the highest correlation with the X variable. Assuming such a variable can be found, we can alter the least-squares regression procedure to obtain estimated parameters that are consistent; unfortunately, there is no guarantee that the estimation process will yield unbiased parameter estimates.
Let us consider the case of measurement error in the independent variable, i.e.,
Yi = β Xi + εi   ...(i)
where only X is measured with error (as Xi* = Xi + wi), so that the estimated equation is
Yi = β Xi* + εi*   ...(ii)
with εi* = εi – β wi, and Z the instrumental variable.
The instrumental-variables estimator of the regression slope in the above model is:
β̂ = Σ Yi Zi / Σ Xi* Zi   ...(iii)
The choice of this particular slope formula is made so that the resulting estimator will be consistent. To see this, we derive the relationship between the instrumental-variables estimator and the true slope parameter:
β̂ = Σ Yi Zi / Σ Xi* Zi
  = Σ (β Xi* + εi*) Zi / Σ Xi* Zi
  = β + Σ εi* Zi / Σ Xi* Zi
It is clear that the choice of Z as an instrument guarantees that β̂ will approach β as the sample size gets large [cov (Z, ε*) approaches zero], and it will therefore be a consistent estimator of β. Note that the variable Xi* in equation (iii) was not replaced by Zi in the denominator of the instrumental-variables estimator, as the estimator Σ Yi Zi / Σ Zi² does not yield a consistent estimator of β:
β̃ = Σ(βX_i* + ε_i*) Z_i / ΣZ_i²
β̃ = (β ΣX_i* Z_i + Σε_i* Z_i) / ΣZ_i²
β̃ = β ΣX_i* Z_i / ΣZ_i² + Σε_i* Z_i / ΣZ_i²
Plim β̃ = β Plim (ΣX_i* Z_i / ΣZ_i²) = β cov(X_i*, Z_i) / var(Z_i)
Since cov(X*, Z)/var(Z) need not equal one, β̃ does not converge to β in general.
The technique of instrumental variables thus provides a simple solution to a difficult problem: it is an estimation
technique that yields consistent estimates.
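As a rough illustration of these results, the following simulated sketch (not part of the original text; all variable names and variances are assumed) compares OLS on the error-ridden regressor, the inconsistent estimator ΣY_i Z_i / ΣZ_i², and the IV estimator of equation (iii):

```python
# Simulated sketch (assumed setup): measurement error in X biases OLS,
# while the instrumental-variables estimator remains consistent.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0

X = rng.normal(size=n)        # true regressor (unobserved)
w = rng.normal(size=n)        # measurement error
eps = rng.normal(size=n)      # equation error
Xstar = X + w                 # observed regressor X* = X + w
Y = beta * X + eps
Z = X + rng.normal(size=n)    # instrument: correlated with X, not with eps or w

beta_ols = np.sum(Y * Xstar) / np.sum(Xstar**2)   # attenuated toward zero
beta_bad = np.sum(Y * Z) / np.sum(Z**2)           # Z**2 denominator: inconsistent
beta_iv = np.sum(Y * Z) / np.sum(Xstar * Z)       # equation (iii): consistent

print(round(beta_ols, 2), round(beta_bad, 2), round(beta_iv, 2))
```

With these unit variances, both beta_ols and beta_bad converge to about β/2 = 1.0, while beta_iv converges to the true value 2.0.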
Conclusions
(i) The OLS estimation technique is actually a special case of instrumental variables. This follows because,
in the classical regression model, X is uncorrelated with the error term and X is perfectly correlated with
itself, so X serves as its own instrument.
(ii) If we generalize the measurement error problem to errors in more than one independent variable, one instrument
is needed to replace each of the designated independent variables.
(iii) Finally, we repeat that instrumental variable estimation guarantees consistent estimation but does not guarantee
unbiased estimation.
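Conclusion (i) can be seen directly from the formulas: choosing Z to be X itself collapses the IV slope formula into the OLS slope. A minimal check (all names assumed):

```python
# With Z = X, the IV estimator sum(Y*Z)/sum(X*Z) is exactly the OLS slope.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
y = 3.0 * x + rng.normal(size=1_000)

beta_ols = np.sum(y * x) / np.sum(x * x)  # OLS slope (no intercept)
Z = x                                     # the regressor as its own instrument
beta_iv = np.sum(y * Z) / np.sum(x * Z)   # IV formula with Z = x

assert beta_iv == beta_ols                # identical expressions
```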
Q. 7. Write short notes on the following:
(a) Factor Analysis.
Ans. Factor analysis is a method for investigating whether a number of variables of interest Y_1, Y_2, ..., Y_l are
linearly related to a smaller number of unobservable factors F_1, F_2, ..., F_k.
The fact that the factors are not observable disqualifies regression and other methods previously examined. We
shall see, however, that under certain conditions the hypothesized factor model has certain implications, and these
implications in turn can be tested against the observations. Exactly what these conditions and implications are, and
how the model can be tested, must be explained with some care.
Factor analysis is a collection of methods used to examine how underlying constructs influence the responses
on a number of measured variables.
There are basically two types of factor analysis: exploratory and confirmatory.
Exploratory Factor Analysis (EFA) attempts to discover the nature of the constructs influencing a set of
responses.
Confirmatory Factor Analysis (CFA) tests whether a specified set of constructs is influencing responses
in a predicted way.
Both types of factor analysis are based on the Common Factor Model, illustrated in the figure below. This model
proposes that each observed response (Measure 1 through Measure 5) is influenced partially by underlying
common factors (Factor 1 and Factor 2) and partially by underlying unique factors (E1 through E5). The
strength of the link between each factor and each measure varies, such that a given factor influences some
measures more than others. This is the same basic model as is used for LISREL analysis.
Fig. Common factor model: Factors 1 and 2 influence Measures 1-5; each measure also has its own unique factor (E1-E5).
Factor analysis is performed by examining the pattern of correlations (or covariances) between the observed
measures. Measures that are highly correlated (either positively or negatively) are likely influenced by the
same factors, while those that are relatively uncorrelated are likely influenced by different factors.
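To make this point concrete, here is a small simulated sketch (the loadings and names are invented for illustration, not taken from the text): five measures are generated from two common factors plus unique errors, and measures sharing a factor come out highly correlated:

```python
# Toy common-factor model: 5 measures, 2 common factors, unique errors E1-E5.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
F = rng.normal(size=(2, n))               # common factors F1, F2
E = 0.5 * rng.normal(size=(5, n))         # unique factors E1..E5

# Rows: measures 1-5; columns: loadings on F1, F2
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.0],
                     [0.5, 0.5],
                     [0.0, 0.8],
                     [0.0, 0.9]])
M = loadings @ F + E                      # observed measures

R = np.corrcoef(M)
print(np.round(R, 2))   # measures 1-2 correlate strongly; 1 and 5 do not
```

Measures 1 and 2 (both loading on F1) show a high correlation, while measures 1 and 5 (loading on different factors) are essentially uncorrelated, which is exactly the pattern factor analysis exploits.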
(b) Varimax solution.
Ans. Varimax Rotation and Simple Structure Concepts: When the first factor solution does not reveal the
hypothesized structure of the loadings, it is customary to apply rotation in an effort to find another set of loadings
that fit the observations equally well but can be more easily interpreted. As it is impossible to examine all such
rotations, computer programs carry out rotations satisfying certain criteria.
Perhaps the most widely used of these is the varimax criterion. It seeks the rotated loadings that maximize the
variance of the squared loadings for each factor; the goal is to make some of these loadings as large as possible, and the
rest as small as possible in absolute value. The varimax method encourages the detection of factors each of which is
related to few variables. It discourages the detection of factors influencing all variables.
The quartimax criterion, on the other hand, seeks to maximize the variance of the squared loadings for each
variable, and tends to produce factors with high loadings for all variables.
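A compact sketch of the varimax criterion in code, using the standard Kaiser/SVD iteration (this implementation and its function and variable names are mine, not from the text):

```python
# Varimax rotation of a loadings matrix L (variables x factors):
# iteratively finds an orthogonal matrix R that maximizes the
# variance of squared loadings within each factor.
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like matrix for the varimax criterion
        G = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        crit_new = np.sum(s)
        if crit_new - crit_old < tol:
            break
        crit_old = crit_new
    return L @ R, R

L = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8],
              [0.3, 0.7]])
L_rot, R = varimax(L)
```

After rotation each factor's squared loadings are more spread out (some near zero, some large), which is the "simple structure" varimax seeks; R is orthogonal, so the rotated loadings fit the data exactly as well as the original ones.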
Rotation Method: Varimax

Orthogonal Transformation Matrix
              1          2
    1    0.99974    0.02264
    2   -0.02264    0.99974

Rotated Factor Pattern
          Factor 1    Factor 2
    FIN    0.00723     0.99993
    MKT    0.99572    -0.05900
    POL    0.99471     0.07393

Variance explained by each factor
    Factor 1: 1.980964    Factor 2: 1.008805

Final Communality Estimates: Total = 2.989769
    FIN: 0.999910    MKT: 0.994940    POL: 0.994920

Fig. SAS output continued, data of Table-1
The figure shows the output produced by the SAS program when instructed to apply the varimax rotation to the
first set of loadings shown earlier.
The output is translated and interpreted in Table-2.
The estimates of the communality of each variable and of the total communality are the same as in the first table, but
the contributions of each factor differ slightly. In this example, rotation did not alter appreciably the first estimates of
the loadings or the proportions of the observed variances explained by the two factors.
Table: Varimax Rotation, Data of Table (Standardized Variables)

Standardized    Observed        Loadings on          Communality       Percent
Variable Y_i'   Variance S_i'²  F_1 (b_i1), F_2 (b_i2)  b_i1² + b_i2²  explained