Econometrics Module
Department of Economics
College of Business and Economics
Arsi University
January 2019
Assela, Ethiopia
PREFACE
In recent times, decision-making has become more research oriented. This demands the availability of data
and information collected and analyzed in a scientific manner using relevant analytical methodology. To
this end, econometric techniques play a great role. It is becoming common to see econometric applica-
tions in economic planning, research, and policy making. Furthermore, tremendous advancement in
computer technology and data science has made econometrics a handy tool for explaining the complex
realities of the actual world. Accordingly, researchers and students should be equipped with basic theoretical
knowledge of econometrics and its application issues (software usage and data analysis).
This teaching material on “Introduction to Econometrics (Econ2061)” is prepared to equip students and
researchers with basic knowledge of econometrics, as it gives a sound introduction to modern
econometrics for beginners and intermediate learners. It also provides the student with a thorough under-
standing of the central ideas and their practical application, along with examples. Besides, the material uses
empirical data and examples for practical exercises with common econometrics software, STATA, to
extend knowledge of the subject. It will also help students while doing research.
There are few materials that combine theoretical knowledge with a data analysis package (software) and
make the results available to students. This material is designed to fill the gap in the harmonized curriculum
and the shortage of relevant reference materials in the library. The material uses three ingredients: lectures
that discuss econometric models with illustrative examples; computer sessions that give hands-on practical
experience with statistical software using real data; and exercise sessions treating selected exercises to
practice econometric methods on paper. Students can gain further knowledge by working on the empirical
exercises at the end of each chapter.
To better understand the subject, solid knowledge of the basic principles and theories of econometrics,
along with a sufficient background in statistics, calculus, and algebra, is required.
ACKNOWLEDGEMENT
The timely and successful completion of this material would hardly have been possible without the help and
support of individuals and organizations. I take this opportunity to thank all of them who helped
me, directly or indirectly, during this important work.
First of all, I would like to express my sincere gratitude to Arsi University, College of Business and Eco-
nomics, for funding the material in view of the urgent need for it. I would also like to thank the office of the
Vice-Dean for Research and Community Service for guidance and facilitation in writing the teaching mate-
rial. I am also thankful to the College of Business and Economics and the Department of Economics staff for
their comments and support.
My special thanks go to Dr. Endalu FuFa, the director of Arsi University educational quality assurance,
for his unreserved comments and advice on the writing and modality of the material. He not only rendered
help educationally but also emboldened my spirit to carry my work forward. I am immensely grateful to
Tesfaye Abera for his valuable guidance, continuous encouragement, and positive support. I would
like to appreciate Mesay Mesfin, who edited the entire material, for always showing keen interest in
my queries and providing important suggestions and edits. Finally, I express my gratitude to
Beteliem Abera, who wrote half of the script.
I owe a lot to my wife for her constant love and support. She always encouraged me to have positive and
independent thinking, which really matters in my life. I would like to thank her very much and share
this moment of happiness with her. I also express wholehearted thanks to my friends for their care and
moral support. Furthermore, I thank my mother and sister for their invaluable moral support and concern
for my work.
Mohammed Beshir (Ato)
Lecturer and Head of Department
Department of Economics
College of Business and Economics
Arsi University, Ethiopia
Overview of the teaching materials
The approaches applied in this teaching material are carefully designed and take into consideration
the existing Ethiopian reality. The material furnishes students with knowledge along with hypothetical
examples to support it. I believe that this approach will enable students to learn the science of
econometrics and, at the same time, catch a glimpse of the country’s data and how it can be used in re-
search.
The material is divided into two major parts comprising nine chapters altogether. The first part of the ma-
terial covers four chapters. The first chapter introduces students to basic concepts of econometrics such as
the definition and scope of econometrics and gives a brief explanation of the methodological stages of
econometric research. Furthermore, it reviews regression and correlation.
The second chapter deals with simple linear regression models and the estimation techniques used to esti-
mate their parameters, along with an explanation of the desirable statistical properties. Chapter three extends
the lessons of simple linear regression models to multiple linear regression models. Chapter four relaxes
the assumptions of linear regression models and looks at some of the violations along with remedial mea-
sures.
Part two considers models with dummy variables, simultaneous equations, time series, and panel data
econometrics. More specifically, chapter five is concerned with qualitative independent variables treated
as dummies. Chapter six extends the dummy variable concept to the case of the dependent variable; we go
through the linear probability, probit, logit, and tobit models. Chapter seven is about simultaneous
equation models along with estimation techniques such as indirect least squares, two-stage least
squares, and instrumental variable models. Chapter eight is about time series data analysis and forecasting.
Lastly, the ninth chapter is concerned with panel data econometrics, which has the characteristics of both
cross-sectional and time series data.
Since its coverage is comprehensive (from simple to intermediate) and contains almost all topics prescribed
in the new modular syllabus, instructors at the university level will also find it helpful. Suggestions
for further improvement of the material from students and fellow teachers will be
heartily welcomed.
Mohammed Beshir (Ato)
Table of Contents
PREFACE
ACKNOWLEDGEMENT
Overview of the teaching materials
Table of Contents
List of Figures
List of Tables
Chapter One: Basic Concepts in Econometrics
1.1. Introduction
1.2. Definition of Econometrics
1.3. Characteristics of Econometrics
1.4. Goals of Econometrics
1.4.1. Analysis: Testing of Economic Theories
1.4.2. Policy and/or Decision Making
1.4.3. Forecasting
1.5. Divisions of Econometrics
1.6. Methodology of Econometrics
1.6.1. Questions to be analyzed, economic theory and past experiences
1.6.2. Formulating mathematical models
1.6.3. Formulating statistical and econometric models
1.6.3.1. Statistical model
1.6.3.2. Econometric model and its specification
1.6.4. Obtaining data
1.6.5. Estimation of the model
1.6.6. Evaluation of the estimates / hypothesis testing
1.6.7. Forecasting or prediction power of the model
1.6.8. Using the model for control or policy purposes
1.6.9. Presenting the findings of the analysis
1.7. Correlation and regression analysis
1.7.1. Goal
1.7.2. Types of relationships between/among variables
1.7.2.1. Covariance
1.7.2.2. Correlation
1.7.2.3. Regression
Empirical Example
Summary
Key terms
Reference
Review Questions
Chapter Two: Simple Linear Regression Model
2.0. Introduction
2.1. Economic theory of two-variable regressions
2.2. Economic models (non-stochastic relationships)
2.3. Statistical model and/or econometric (stochastic) model
2.4. Data and areas of application
2.4.1. Data types and hypothetical example
2.4.2. Population Regression Function (PRF)
2.4.3. Sample Regression Function (SRF)
2.5. Estimation of the Classical Linear Regression Model
2.5.1. Assumptions of the classical linear stochastic regression model
2.5.2. Methods of estimation
2.5.2.1. The ordinary least squares (OLS)
2.5.2.2. Method of moments (MM)
2.5.2.3. Maximum likelihood principle
2.6. Evaluation of estimates: statistical properties of least squares estimators
2.6.1. Theoretical a priori criteria (economic criteria)
2.6.2. The econometric criteria: properties of OLS estimators
2.6.2.1. Desired properties of OLS estimators
2.6.2.2. Normality assumption and the probability distribution of the disturbance term, ui
2.6.2.3. Distribution of the dependent variable (Y) under the normality assumption
2.6.2.4. Normality assumption and corresponding properties of OLS estimates
2.6.3. Statistical tests of the OLS estimators (first-order tests)
2.6.3.1. Tests of the ‘goodness of fit’ with R2
2.6.3.2. The correlation coefficient (r)
2.6.3.3. Precision of the estimators / the distribution of the random variable
2.6.4. Special distributions
2.7. Interval Estimation and Hypothesis Testing
2.7.1. Interval estimation and distribution
2.7.2. Hypothesis testing and its approaches
2.7.2.1. The standard error test of the least squares estimates
2.7.2.2. Test of significance approach
2.7.2.3. The confidence interval approach to hypothesis testing
2.7.2.4. The p-value approach to hypothesis testing
2.7.2.5. χ2 test of significance approach
4.5. Model Specification and Data Issues
4.5.1. Overview
4.5.2. The endogenous regressors assumption and its problems: E(εi|Xj) ≠ 0
4.5.3. Causes of misspecification and data issues
4.5.3.1. Functional form misspecification (specification errors)
4.5.4. Consequences of the misspecification problem
4.5.5. Tests of specification errors
4.5.6. Remedial measures for model misspecification and data issues
Summary
Reference
Review questions
List of Figures
Figure 1.1: Econometrics as a multifaceted discipline
Figure 1.2: Demand curve
Figure 1.3: Interplay of econometrics with computer applications
Figure 1.4: Branches of econometrics based on type of analysis
Figure 1.5: Decomposing econometrics based on data type
Figure 1.6: Flow chart for the steps of an empirical study
Figure 1.7: Scatter plot of lung cancer and smoking
Figure 1.8: Hypothetical distribution of sons’ heights corresponding to given heights of fathers
Figure 2.1: Regression line and the scatter diagram
Figure 2.2: Regression line and error term
Figure 2.3: Conditional distribution of expenditure for various levels of income
Figure 2.4: Population and sample regression lines
Figure 2.5: Regression lines based on two different samples
Figure 2.6: Homoscedastic variance and its distribution
Figure 2.7: Error term and fitted SRF
Figure 2.8: Decomposing the variation of the dependent variable (Y)
Figure 2.9: Tendency of the t-distribution to the Z distribution
List of Tables
Table 1.1: Abebe's demand schedule for oranges
Table 1.2: Variable types and naming
Table 1.3: Rectangular data set for Y, X, and Z
Table 1.4: Consumption expenditure and income relation
Table 1.5: Cigarettes and lung capacity
Table 1.6: Cigarettes and lung capacity computation
Table 1.8: Correlation coefficient computation
Table 1.9: Data on supply and price
Table 1.10: Road accident and consumption coefficients
Table 1.11: Hypothetical data on consumption expenditure (Y) and income (Xi), BPG heteroscedasticity test
Table 2.1: Household income (X) and consumption expenditure (Y)
Table 2.2: Weekly family consumption expenditure and conditional mean income
Table 2.3: Random samples from the population of Table 2.1
Table 2.4: Different samples and corresponding errors
Table 2.5: Sample weekly family consumption expenditure (Y) and weekly family income (X)
Table 2.6: Computation of OLS components based on the sample in Table 2.5
Table 2.7: Advertising expenditure and sales revenue in thousands of dollars
Table 2.8: Detailed computation of OLS components for the Y and X variables
Table 2.9: Family income and expenditure
Table 2.10: Demand for apples and prices paid by families
Table 2.11: Computation of sums of squares from the supply function
Table 2.11: Ten families' income and expenditure per week
Table 2.12: Decision rule for the t-test of significance
Table 2.13: Regression result of income and expenditure
Table 2.14: A summary of the χ2 test
Table 2.15: ANOVA table
Table 2.16: ANOVA example
Table 2.17: Regression result between sales and advertising
Table 2.18: Hypothetical products and their corresponding prices
Table 2.19: Gross national product (X) and expenditure on food (Y)
Table 4.1: The assumptions of the CLRM and their violations
Table 4.2: Consumption expenditure (D) and disposable income (ID) for 30 families
Table 4.3: Predicted consumption expenditure and residuals
Table 4.4: Data on quantity demanded, price of the commodity, and income of consumers
Table 4.5: Rank correlation test of heteroscedasticity
Table 4.6: Hypothetical data on consumption expenditure and income
Table 4.7: Regression results of two subgroups for Goldfeld–Quandt
Table 4.8: Consumption and income over time
Table 4.9: Regression result of consumption and income
Table 4.10: Residuals from the above regression
Table 4.11: Investment and value of outstanding shares for three companies, 1935–1953
Table 4.12: Estimated regression equation of Y on X and autocorrelation
Table 4.13: Hypothetical values of X and Y
Table 4.14: Data on three variables Y, X1, X2
Table 4.16: Consequences of violation of the CLRM assumptions
Chapter One: Basic Concepts in Econometrics
1.1. Introduction
Economics is mainly concerned with relationships among economic variables. For instance, the quantity de-
manded of a product depends on its price, the prices of related goods, tastes and preferences, etc., and the
quantity supplied of a good depends on its price and other factors. Aggregate demand is a function of con-
sumption, investment, government expenditure, and net exports. Furthermore, the consumption function
relates aggregate consumption expenditure to the level of aggregate disposable income. We can find many
more such relationships among economic variables.
The purpose of looking at the relationships among economic variables is to understand and answer
many economic questions about the real world we live in. For example, we may be interested in raising
questions such as: Is there any relationship between two or more variables? If so, what type of relationship
exists between them? Given the value of one variable, can we forecast or predict the corresponding value of
another? If one variable changes by a certain magnitude, by how much will another variable change? How
can we influence certain variables in a desired way (public or other)? What are the possible instruments for
correcting macroeconomic problems? How have economic theories of fiscal and monetary policy helped
governments reduce the effect of the business cycle on the economy? Responses to these and other questions
are possible through intensive quantitative research with the aid of econometric tools.
Econometric tools enable us to apply economic theory to real problems and test its applicability. Fur-
thermore, econometrics also helps us design relevant policies based on research findings. Without being
able to forecast the growth of the economy, central bankers could not know when to stimulate the econ-
omy and when to cool it off. Without measurements of production and cost, economists could not
identify industries with competitive conditions likely to benefit from deregulation or other ob-
jectives. Without knowledge of consumer income and preferences, businesses could not predict the prof-
itability of their products.
To this end, we collect data from the population at hand to answer those questions and/or check
whether the theory is confirmed. If empirical data verify the relationship proposed by economic the-
ory, we accept the theory as valid. If the theory is incompatible with the observed behavior, we either
reject it or modify it in the light of the empirical evidence. Econometrics also enables us to extend the
frontier of economic knowledge by developing new theories.
Accordingly, given all those advantages, we are interested in knowing the subject. We raise many ques-
tions related to econometrics, such as: What is econometrics? How is it related to economic models? How
is it related to statistics? We will look at these and other related questions. This demands a brief
overview of the basic concepts of econometrics and econometric analysis that will pave the way for subse-
quent chapters. First, we put forward a definition of econometrics, and then we describe the roles and objec-
tives of econometric analysis. Then we deal with the methodology of econometrics. At the end, we
look at regression and correlation.
Formally, many definitions have been put forward, but all of them revolve around the same issue/
conclusion. The most common and comprehensive definition can be presented as follows:
“Econometrics is the science which integrates economic theory, economic statistics, and math-
ematical economics to bring empirical support to the general schematic laws established by
economic theory.”
Figure 1.1: Econometrics as a multifaceted discipline
The following sections give deeper insight into how econometrics is related to other fields.
A. Economic theories
Economic theory is a simplified representation of economic reality, as the world is complex, consisting
of different agents (firms, households, etc.) participating in economic activities. Theory makes statements
that are mostly qualitative in nature under certain assumptions. For instance, demand for a product is in-
fluenced by many factors such as prices, tastes and preferences, income, etc. But theory does not state the
extent or strength of the relationship between quantity demanded and such variables. Personal consumption
expenditure depends on income, the marginal propensity to consume, the level of saving, autonomous con-
sumption, and so on.
Econometrics gives empirical content to most economic theory. This is well captured in the definition:
“Econometrics is the positive interaction between data and ideas about the way the economy works.”
We quantitatively measure the relationships among economic variables and estimate the parameters they
involve. By doing so, economic theories can be checked against empirical realities using data obtained
from the real world. If empirical data verify the relationship proposed by economic theory, we accept
the theory as valid and reject it otherwise. If the theory is incompatible with the observed behavior, we either
reject the theory or modify it in the light of the empirical evidence.
B. Mathematical Economics
To better understand economic relationships, forecast them, and use them as a guide to economic policy mak-
ing, we need to know the quantitative relationships between different economic variables. Mathemati-
cal economics is a field which expresses economic theories and ideas in quantitative form without empiri-
cal verification of the theory. A mathematical relationship can be expressed using a schedule, a graph, or an
equation. For instance, we can express the above theoretical relationship between demand for a product and
its own price in mathematical form as follows:
Let us consider Abebe to be our representative consumer of oranges.
Now, take any two pairs of prices and quantities from Abebe's demand schedule and derive the line through
them:
P = -2Q + 18
Hence, P = -2Q + 18 is Abebe's inverse demand function, and his direct demand function is:
Q = 9 – 0.5P ……………………………………………………………..1.2
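As a check on equation 1.2, the slope and intercept can be recovered from any two points on the schedule. The two points used here are hypothetical (chosen only to be consistent with the function above):
Take (Q1, P1) = (1, 16) and (Q2, P2) = (2, 14). Then
slope = (P2 - P1)/(Q2 - Q1) = (14 - 16)/(2 - 1) = -2,
and from 16 = -2(1) + b the intercept is b = 18, giving P = -2Q + 18.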
The same concept can be expressed graphically, as depicted in Figure 1.2, with a downward-sloping de-
mand curve explaining the inverse relationship between quantity demanded and price.
There is no essential difference between mathematical economics and economic theory; both state the same
relationships. However, mathematical economics is quantitative rather than qualitative: while economic
theory uses verbal exposition, mathematical economics makes use of mathematical symbols and forms,
expressing relationships in an exact or deterministic form. Neither mathematical economics nor economic
theory allows for random elements which might affect the relationship and make it stochastic. Furthermore,
they do not provide numerical values for the coefficients of economic relationships.
Econometrics is mainly interested in empirical verification of theory using data, which is not the
case with mathematical economics. Furthermore, econometrics enables us to derive numerical values for
the coefficients of economic relationships. Maddala (1992) explains how econometrics is
related to mathematical economics: “Econometrics is a special type of economic analysis and research
in which the general economic theories, formulated in mathematical terms, are combined with empirical
measurement of economic phenomena.” Besides, an econometric relationship presupposes a stochastic re-
lation, which assumes random relationships among economic variables. In other words, econometric
methods are designed to take into account random disturbances which represent deviations from the exact
behavioral patterns suggested by economic theory and mathematical economics.
C. Economic Statistics
Economic statistics is a descriptive aspect of economics mainly concerned with collecting, processing,
and presenting economic data in a readily understandable form. Moreover, it studies
characteristics of a population and/or sample, such as measures of central tendency, measures of dispersion,
and measures of correlation and covariance of data. Furthermore, it looks at probability distributions, estima-
tion, and hypothesis testing. Economic statistics is neither concerned with using the collected data to test
economic theories nor with providing explanations of the development of various variables. Econometrics
tries to relate economic statistics and economic theory. According to T. Haavelmo (1944), “Econometrics
is a conjunction of economic theory and actual measurements, using the theory and technique of statist-
ical inference as a bridge pier.” Econometrics takes the equations of mathematical economics and,
confronting them with economic data, seeks to use techniques of statistical inference
to give quantitative form to these equations.
In general, econometrics uses economic theory, mathematical economics, and economic statistics. It
uses insights from economics and business relationships (deterministic and random) in selecting relevant
variables and models. It employs statistical methods of data collection to measure variables empirically.
Mathematics helps to develop econometric methods that are appropriate for the data and the
problem at hand. Econometrics provides empirical values for the coefficients of economic relationships,
which can be used to draw conclusions.
Econometrics does not only integrate different disciplines; it also relates different activities. Economet-
rics uses insights from economics and business (in selecting relevant variables and models) and develops
econometric methods appropriate for the data and the problem at hand within a statistical and
mathematical framework. It uses computer applications and statistical software to process data and es-
timate the relationships anticipated by econometric models. The interplay of these disciplines in econo-
metric modelling is summarized in Figure 1.3.
Figure 1.3: Interplay of econometrics with computer applications (nodes: Econometrics, Mathematics, Statistics, Computer application)
1.4.1. Analysis: Testing of Economic Theories
Economics is a science that studies social matters (the behavior of institutions and individuals) via scientific
methods, subject to scrutiny in both a logical and an empirical manner. Economists also aim pri-
marily at both building theories and working out their ramifications. For most of its history as an academic
discipline, economics had little to do with statistics. Only in the 1940s did economists begin to use statistical
methods to evaluate economic ideas, which marked the birth of econometrics. Since then, econometrics
has been used as a way of testing economic theories and as a method by which statistical techniques are ap-
plied to economic problems. As a result, economists have gained not only a deeper knowledge of how the
economy actually works but also better insight into how economic policy should be designed in the
future. In this case, the purpose of economic analysis is obtaining empirical evidence to test the ex-
planatory power of economic theory. Such testing uses both deductive and inductive logic.
I. Deductive Logic
In the early stages of the development of economic theory, so-called armchair economists were concerned
with formulating basic economic principles using verbal explanation, applying a deductive procedure
(from the general to the particular). In this period no attempts were made to examine whether the theories ade-
quately fit actual economic behavior or not. Rather, economists derived, by pure logical reasoning,
some general conclusions (laws) concerning the working of the economic system.
Practical Exercise 1.3
What kinds of questions are of concern both to economists who build theories and to those who engage
in data analysis and testing?
1.4.3. Forecasting
Forecasting means estimating the future values of economic magnitudes using the numerical values of the
coefficients of economic relationships already obtained. Forecasters also judge whether it is necessary to take
any measure in order to influence the future values of economic variables. Forecasting is used in both de-
veloped and developing countries in different ways: while developed countries use it for the regulation of
their economies, developing countries use it for planning purposes.
For instance, suppose the Ethiopian government wants to make policy related to expenditure on imported
goods for the coming five or ten years. To formulate such a policy, the government has to know (among
other things) the forecasted level of personal disposable income, current expenditure on imported goods,
the inflation rate, and other factors. Accordingly, time series data for the years 1985-1995 were collected
on those variables, mainly on personal disposable income and expenditure on imported goods. The estimated
result for the Ethiopian economy for the years 1985-1995 is found to be
………………………………………………………….1.4
Knowing the future values of expenditure on imported goods and services, the government can take any
measure to increase or cut imports using these numerical values.
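To make the forecasting step concrete, the following minimal STATA sketch estimates an import-expenditure equation of this kind and computes a point forecast. All figures are invented for illustration; they are not the estimates behind equation 1.4.
* hypothetical annual data (invented figures)
clear
input year imports income
1985  8.1 52
1986  8.9 55
1987  9.4 57
1988 10.2 61
1989 11.0 64
1990 11.9 68
end
regress imports income
* point forecast of imports for an assumed future disposable income of 80
display _b[_cons] + _b[income]*80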
relationships are not exact. The econometric methods used in theoretical econometrics may
be classified into two:
A. Single-equation techniques, which show a one-sided relationship between variables at a time.
We have only one-sided causation when, say, quantity demanded depends upon the price of the com-
modity but not vice versa. For example:
Qd = B0 + B1P + u ------------------ dd equation ………………….……1.5
B. Simultaneous equation models: when there is two-sided causation we have a simultaneous rela-
tionship. For instance, equation (1.5) says that quantity demanded depends on the price of
the commodity; but if the price of the commodity in turn depends on the quantity supplied,
as indicated in equation (1.6), we have two-sided causation.
P = a0 + a1Qs + v ………………………………...……………..1.6
C. Equilibrium condition: if one imposes a restriction such that one side of the system is
equal to the other, it is termed an equilibrium condition. In this case we apply economet-
ric techniques simultaneously to all three equations at a time:
Qd = B0 + B1P + u ------------------------ dd equation
Qs = a0 + a1P + v ------------------------ ss equation
Qd = Qs ------------------------------- identity …………..…………..1.7
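Estimation techniques for such systems (indirect least squares, two-stage least squares, instrumental variables) are treated in chapter seven. As a preview, here is a minimal STATA sketch, on simulated data with assumed parameter values and an assumed supply shifter z serving as instrument, of estimating the demand equation by two-stage least squares:
* simulate a market in which price is endogenous (all numbers are assumed)
clear
set obs 200
set seed 1
generate z = rnormal()                     // supply shifter, used as the instrument
generate u = rnormal()                     // demand disturbance
generate p = 5 + 2*z + 0.8*u + rnormal()   // price is correlated with u: endogenous
generate q = 10 - 1.5*p + u                // "true" demand, negative own-price effect
ivregress 2sls q (p = z)                   // 2SLS recovers the demand slope
Here ordinary least squares of q on p would be inconsistent because p is correlated with the disturbance u; the instrument z shifts price without entering the demand equation.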
II. Applied Econometrics: This is the application of theoretical econometric methods to specific
areas for verification and forecasting of economic relationships such as demand, cost, supply, produc-
tion, investment, consumption, and other related fields of economic theory. Applied econometrics
is the economist's wind tunnel: the place where we test our stories about how the economy
works.
Figure 1.4: Branches of econometrics based on type of analysis (Econometrics → Theoretical, Applied)
Econometrics can also be classified as cross-sectional, time series, and panel data econometrics. These
categories are discussed under the methodology part in a subsequent section.
Figure 1.5: Decomposing econometrics based on data type (Econometrics → cross-sectional, time series, panel data)
The methodology of econometric research can be presented graphically as a flow chart (Figure 1.6: Flow chart for the steps of an empirical study).
A. Questions to be analyzed
These are the challenges, problems, questions, and theoretical dilemmas of society in any given situation.
These questions arise from dissatisfaction with the status quo. The question to be answered by an
econometrics project must be a positive and not a normative one. An economic question is normative
if its answer is subjective, incapable of being proved right or wrong. It de-
pends on value judgments on which two reasonable economists might differ. One cannot answer sub-
jective questions with a measurement that provides the answer objectively. For in-
stance, the question “should we reform welfare?” is normative. Its answer depends on subjective
value judgments about whether the benefits of welfare reform (increased labor force participation,
reduced government spending) are more important than the costs (elimination of the social safety net,
increased poverty). This is a poor question for econometrics, as there is no evidence or data by which to judge it.
On the other hand, a question is positive if its answer is objective and capable of being proved right
or wrong. Such a question depends only on facts on which any two reasonable economists should agree.
A question such as “how much would welfare spending be reduced if we reformed welfare?” is posi-
tive. Its answer is some number that might be difficult to predict in advance but could easily be veri-
fied by comparing spending before and after welfare reform. An econometrics project might be designed
to identify the likely consequences of welfare reform on workforce participation by welfare recipi-
ents. Objective knowledge about the spending consequences of welfare reform is helpful in forming
a subjective opinion about whether or not reform is a good idea.
Econometric research questions should be framed well. First, a question needs to be both
general (the economic problem to be addressed) and specific, so that its answer says something interesting
and useful about the topic. Second, the question has to be feasible, in the sense that it admits an objective
answer. Furthermore, questions have to be practical: it has to be possible to set up an econometric model,
gather data, and analyze those data within the time available for the project.
B. Economic theory (Statement of theory or hypothesis)
Once the questions are defined, it is important to have some central economic idea or theory that
binds the variables (questions) together and gives direction. Without economic theory, the project can
descend into mere numerical calculations that lead nowhere and provide no useful economic
knowledge. Economic theories, hypotheses, or propositions provide a way of identifying and thinking
about relationships among economic variables. They are a way of making qualitative predictions about
outcomes. For example, an investigator may start with the famous Keynesian theory of the marginal
propensity to consume, which states that consumption increases as income increases, but not by as much as
the increase in income; that is, the marginal propensity to consume (MPC) for a unit change in income
is greater than zero but less than one (Keynes, 1936). Hence, the methodology starts by stating this pos-
tulate. Similarly, the law of demand says that as the price of a commodity increases, its quantity demanded
will decrease; there is an inverse relationship between the two.
C. Previous Studies and Past Experiences
Different studies on the topic may have been conducted elsewhere which may give clues
to the research concerning methodology, models previously used, and data and their measurement, all of
which can have relevance for the current study. Furthermore, one can learn from the problems faced, and
the remedial measures taken, by earlier researchers. In an econometric research project, a review of previous
research that might point to related gaps, dilemmas, inconsistencies, similarities, and differ-
ences is of paramount importance.
The above relationships can be summarized as in the example below:
Problem: Poverty
Questions: What is the extent of poverty? What are the factors triggering poverty? Which factor plays a key role in household poverty?
Economic theory: Poverty theories and models
Previous studies: Many studies on poverty in LDCs
A model is a simplified representation of an actual phenomenon (an actual system or process) which
gives us a framework of analysis for our econometric project and helps us answer our questions.
Most economic theories do not explicitly state the relationships among economic variables. A model helps us
clarify and abstract relationships, explain their extent, learn their type (cause and effect), and pre-
dict. It is a mechanism that lists the variables to be included in the econometric model, defines how they are
related, and states what the consequences of changes in one variable will be for the model as a whole.
There will never be just one universally applicable model. As economists tell many stories about the as-
pects of the economy that interest them, there are many economic models. They take different
forms, abstracting away the details of the real world. Modelling, the art of model building, in-
volves balancing the often competing goals of realism and manageability. A model should be realistic in
incorporating the main elements of the phenomena being represented and specifying the interrelationships
among the constituent elements of the system. At the same time, it should be manageable, eliminat-
ing extraneous influences and simplifying processes so as to ensure it yields insights or conclusions
not obtainable from direct observation of the real-world system.
B. Types of Economic Models
There are alternative ways of representing a real-world system. The most important forms or types
of models are verbal/logical, physical, geometric, and algebraic. Verbal/logical models use verbal analo-
gies, sometimes called paradigms, to represent phenomena. Physical models represent the real-world
system by a physical entity. Geometric models use diagrams to show relationships among variables. Al-
gebraic models, the most important type for purposes of econometrics, represent a real-world system by
means of algebraic relations. Usually economic models are expressed mathematically, but sometimes an
adequately rigorous formulation can be given diagrammatically or, less often, descriptively.
Table 1.2: Variable types and naming
Endogenous variables (jointly called dependent variables): variables determined by the model; they are simultaneously determined by the system of equations.
Exogenous (or independent) variables: variables determined outside the system which influence the values of the endogenous variables; they affect the system but are not in turn affected by it.
In simple consumption theory, consumption expenditure is the dependent variable and income is an independent variable,
which can be specified as:
C = f(Y) …………………………………………………..1.8
Economic theory postulates that the demand for a commodity depends on its own price (P), the prices of
substitutes (Ps), the price of complements (Pc), consumers' income (Y), and tastes and preferences (T). The
quantity demanded of a commodity (Qd) can be written as:
Qd = f(P, Ps, Pc, Y, T) ………………..…………..………1.9
Along with this general form, there are specific models presented with particular functional forms.
II. Determine the theoretical sign and values of parameters
An economic model that expresses the relationship between variables involves questions concerning the
signs and magnitudes (values) of the unknown parameters to be estimated. The information about the pa-
rameters (β0, β1), the magnitudes of the parameters, and their signs (i.e., negative or positive relationships
between variables) is based on theoretical and/or a priori expectations. For our examples so far, we can
have the following signs or directions of relationships between variables.
i. In simple consumption theory, consumption expenditure is the dependent variable and income an
independent variable, which can be specified as
C = B0 + B1Y ……………………………………….……1.10
where B0 and B1 are the unknown parameters connecting consumption and income, B0 being the
constant and B1 the marginal propensity to consume. The likely signs of both B0 and B1 are positive.
ii. The economic model of demand above can be expressed with parameters as follows:
Qd = B0 + B1P + B2Ps + B3Pc + B4Y ………………………………….………..1.11
where B0 is the constant and B1, B2, B3, B4 are the unknown and as-yet-unsigned parameters (coefficients)
of own price, price of substitutes, price of complements, and income, respectively.
…………………………………..1.13
According to the general theory of demand, the demand function is expected to have a negative own-price coefficient.
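A minimal STATA sketch of confronting such a priori signs with data, using invented consumption-income figures:
* hypothetical data (all values invented for illustration)
clear
input consum income
 75 100
 85 120
 95 140
110 160
120 180
135 200
end
regress consum income
* a priori expectation: B0 > 0 and 0 < B1 (the MPC) < 1
display "estimated MPC = " _b[income]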
II. Non-linear function: the consumption expenditure of an individual at time t is the dependent variable,
with income and the future interest rate as explanatory variables. The economic model of the consumption
function can be written as
Ct = B0 (Yt)^B1 (rt+1)^B2 ………………………………………………..…………1.14
Equations such as 1.12 and 1.13 are linear equations, but equation 1.14 is a nonlinear one. The coefficients
of the variables (B1, B2) give the likely magnitudes of their effects. In equations 1.12 and 1.13 the coefficient repre-
sents a marginal magnitude, such as the marginal propensity to consume, which lies between 0 and 1: it shows
that if income increases by 1 birr, consumption will on average increase by a certain amount. The
coefficients of equation 1.14, in contrast, are elasticities: B1 explains that if income increases
by 1%, consumption will on average increase by B1%, while B2 shows that if the rate of interest increases
by 1%, consumption will on average be cut down by B2%. The magnitudes, or numerical values, of these
coefficients are obtained by estimation from data.
The above economic relationship between Y and X is an exact abstraction from reality. That is, all variation
in Y is due solely to changes in X, and there are no other factors affecting the variation in Y. How-
ever, in reality such an exact relationship may not exist; rather, we find a stochastic relationship. The rela-
tionship between X and Y is said to be stochastic if for a particular value of X there is a whole proba-
bility distribution of values of Y. The deviation of an observation from the predicted line is captured by the
error term. Hence, the true relationship which connects the variables is divided into two parts: a part
represented by a line and a part represented by the random term ‘u’. Accordingly,
Actual = Systematic + Random error term.
The inclusion of ui in the mathematical model transforms the economic model into a statistical
model, which accounts in part for:
- wrong specification (mis-specification) of the model;
- omitted factors and measurement error (errors in measuring variables);
- imperfections and looseness of statements in economic theories;
- limitations of our knowledge of the factors which are operative in any particular case;
- formidable obstacles presented by data requirements in the estimation of large models;
- omission of some important variables and/or equations from the functions (for example, in si-
multaneous equations models);
- inclusion of irrelevant explanatory variables, etc.
The values of this random variable cannot actually be observed like the values of the other explanatory
variables. We thus have to guess the pattern of the values of u by making some plausible assumptions
about their distribution. The inclusion of such a stochastic disturbance term in the economic model is what
makes the basic tools of statistical inference applicable to estimating the parameters of the model.
A second change that we make to the equation when going from the economic model to the statisti-
cal model concerns observations. Since we are using a sample, many observations are
included in the analysis. To differentiate one observation from another, one should attach a subscript
to each observation. Since we will be collecting data from households (observations) on Y and X, the
subscript i is introduced to describe the i-th observation on each variable. We use Yi to denote
consumption expenditure for the i-th household, Xi for income of the i-th household, and ui for the unob-
servable random variable capturing all other factors influencing consumption expenditure for the i-th
household. In some textbooks t is used as the subscript, referring to the t-th observation or time period
for time series data.
III. Sampling process
The third issue in extending an economic model to a statistical model is consideration of the process by which
the data were generated (from a sample or a population) and the probability distributions involved, along with
the error term. Stated differently, the statistical model should specify the sampling process and the probability
distribution of the data (the observed values of the variables) and of the error terms. The values taken by the
dependent variable are not known with certainty; rather, they can be considered random drawings from a
probability distribution with certain assumed moments. Random variables have an assumed probability
distribution with defined properties such as mean, variance, and covariance. That is, the sampling process
underlying the observed Yi is directly related to the assumptions made about the random variable ui and
its probability distribution.
Let the consumption expenditure on a given product be determined by income. Illustrate the distinction
between stochastic and non-stochastic relationships with the help of a consumption function.
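One way to work this exercise is to simulate both versions of the function in STATA with assumed parameter values (the intercept 20, slope 0.75, and error spread 8 below are arbitrary):
* exact versus stochastic consumption function (assumed parameters)
clear
set obs 50
set seed 101
generate income = 40 + 4*_n                          // hypothetical income levels
generate c_exact = 20 + 0.75*income                  // non-stochastic: one C per income
generate c_stoch = 20 + 0.75*income + rnormal(0,8)   // stochastic: C scatters around the line
twoway (line c_exact income) (scatter c_stoch income)
For each income level the exact model gives a single consumption value, while the stochastic model gives a whole distribution of values around the line.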
Given those issues of statistical and econometric models, the general framework of an econometric model
can be presented as follows:
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui ……………………………1.15
Having access to only one explanatory variable, we may write the complete model in the fol-
lowing way for a given household:
Yi = β0 + β1Xi + (all other factors) ……………………………………1.16
If everything left unaccounted for is summarized in the error term, u, the equality of the above equa-
tion holds true. Therefore, the system can be written as
Yi = β0 + β1Xi + ui ……………………………………………………….1.17
because variation in Y not fully explained by variation in X is captured by the error
term, ui.
Accordingly, the above consumption expenditure and demand models can be modified as:
One variable: Ci = β0 + β1Yi + ui
Many variables: Qdi = β0 + β1Pi + β2Psi + β3Pci + β4Yi + ui ………..………….…1.18
For the linear specifications above, we might specify our statistical model for the above example for the n-th ob-
servation as follows.
One variable: Qdn = β0 + β1Pn + un ……………………..………1.19
where un stands for the random factors which affect the quantity demanded.
It can also assume a nonlinear form, as specified in equation (1.20):
……………………..………1.20
Each of these functional forms represents a hypothesis about the form of the relationship, and our interest might center on trying to determine which hypothesis is compatible with the data.
The number of variables to be included in the model depends on the nature of the phenomenon being studied and the purpose of the research. Usually we introduce explicitly only the most important (four or five) explanatory variables; the influences of less important factors are taken into account by introducing a random variable into the model.
All the data we consider in any particular analysis constitute a data set. A variable is a measurable characteristic of interest on which data are collected. Variables are typed as being discrete or continuous, and the categorization of a variable as discrete or continuous depends on the situation. A discrete variable usually results from counting, and its values are integers, even when the number of observations in the data set is quite large. The number of persons in a family (family size), the age of students in completed years, etc. are usually measured with the discrete values 0, 1, 2, 3, …. A continuous variable usually results from measuring, possibly with great precision, and its values can be presented over a range or specified in intervals. For example, the income of families might be measured very precisely using a continuous variable; in a survey of 100 families there might well be 100 different values of the variable, so one may group them into specific intervals. Likewise, family size can be treated as continuous when grouped into intervals such as 0–2 and 2–5.
Variables are often given symbolic names such as X, Y, Z. Depending on the nature of the study, we have data on many observations of such variables. In other words, the data can be thought of as being arranged in a rectangular array, in which each row is an observation and each column contains the values of a variable, as in Table 1.3 below. Suppose that we have n observations on the variables X, Y, and Z. For example, consumption expenditure is related to income, family size, number of children, and so on. In reporting the results of data analysis there are two approaches to presenting variables: one approach is to use plain words if the descriptions of the variables are short. Alternatively, it is common, and often makes sense, to use variable names that are mnemonics for the corresponding characteristics (CONSUM, INCOME, FAMSIZE); these are short forms, or abbreviations, used for manageability.
Each element in the array is symbolized by the variable name with a subscript indicating the observation number. A typical observation, which we denote as the i-th, consists of the values Yi, Xi, and Zi.
Table 1.3: Rectangular data set for Y, X, and Z
Observation number    Y     X     Z
1                     Y1    X1    Z1
.                     .     .     .
.                     .     .     .
i                     Yi    Xi    Zi
.                     .     .     .
n                     Yn    Xn    Zn
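As a minimal sketch of such a rectangular array (illustrative Python using pandas; the numbers are hypothetical, while the mnemonic names CONSUM, INCOME, and FAMSIZE come from the text), each row is one observation and each column one variable:

import pandas as pd

# Hypothetical rectangular data set: rows = observations, columns = variables
data = pd.DataFrame({
    "CONSUM":  [55, 65, 70, 80],    # consumption expenditure (hypothetical)
    "INCOME":  [80, 100, 85, 110],  # household income (hypothetical)
    "FAMSIZE": [3, 4, 2, 5],        # family size (hypothetical)
})

print(data)         # the whole rectangular array
print(data.loc[1])  # one observation (row i = 1) with its Y, X, Z analogues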
A data variable is one generated from actual observation and is different from a random variable, whose values are assigned by a probabilistic mechanism. The distinction between these types is drawn precisely in probability theory, which deals with random variables.
The data could be collected from either primary or secondary sources. Furthermore, the data used in the estimation of an econometric model may be of various types. The most important data structures encountered in applied work are:
I. Cross-sectional data
II. Time series data
III. Pooled data
IV. Panel data
I. Cross-sectional data
Cross-sectional data represent measurements at a given point in time: data collected on one or more variables at a particular period. They arise when the observations are of different entities (such as persons, firms, or nations) for which a common set of variables is measured at a point in time. For example, the results of a survey in which various people are interviewed regarding their labor force activity and earnings constitute a cross-sectional data set; so does an enrollment survey recording the number of children registered in all K.G. schools of Adama in 2010 E.C. by sex, age, religion, etc. For use throughout this book, a small cross-sectional data set has been selected from the responses to a survey of agricultural characteristics of consumers by CSA.
The data set consists of 100 observations on 12 variables. For convenience, the number of observations was reduced to 100 by random selection. Normally one would use as many observations as feasible, as long as they are appropriate for the estimation at hand. These observations were selected from among families headed by a male aged 25–54 who was not predominantly self-employed, in order to produce data relevant for examining an earnings function. Also, in order to examine normal behavior, families with wealth greater than 100,000 were not included (see the appendix to this chapter).
II. Time series data
Time series data consist of observations on one or more variables over successive periods of time, as in the following data set on X and Y for 1980–1991.
Year    X         Y
1980    2447.1    3776.3
1981    2476.9    3843.1
1982    2503.7    3760.3
1983    2619.4    3906.6
1984    2746.1    4148.5
1985    2865.8    4279.8
1986    2969.1    4404.5
1987    3052.2    4539.9
1988    3162.4    4718.6
1989    3223.3    4838.0
1990    3260.4    4877.5
1991    3240.8    4821.0
The interval, or periodicity, of a time series may be annual, quarterly, or monthly, depending on whether one wishes to account for annual, quarterly, or monthly changes. The type of data available often dictates the periodicity of the data gathered.
The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, or counties in the above examples) are followed over a given time period. Because panel data require repeated observation of the same individuals, households, or firms over time, they are more difficult to obtain than pooled cross sections.
Not surprisingly, observing the same units over time leads to several advantages over cross-sectional data or even pooled cross-sectional data. One benefit is that having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than one observation can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available. A second advantage of panel data is that they often allow us to study the importance of lags in behavior or in the results of decision making. This information can be significant, since many economic policies can be expected to have an impact only after some time has passed.
Most books at the undergraduate level do not contain a discussion of econometric methods for panel data. However, economists now recognize that some questions are difficult, if not impossible, to answer satisfactorily without panel data. As we introduce the concept, we can make considerable progress with simple panel data analysis, a method which is not much more difficult than dealing with a standard cross-sectional data set.
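A minimal sketch of the long-format layout commonly used for panel data (illustrative Python with pandas; the households, years, and figures are hypothetical) shows how the same units reappear in several periods:

import pandas as pd

# Hypothetical panel: each household is observed in two consecutive years
panel = pd.DataFrame({
    "household": [1, 1, 2, 2, 3, 3],
    "year":      [2018, 2019, 2018, 2019, 2018, 2019],
    "income":    [100, 110, 80, 85, 120, 118],
    "consum":    [70, 75, 60, 66, 90, 88],
})

# Following the same unit over time: all rows for household 2
print(panel[panel["household"] == 2])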
principles which justify the predictive ability of the model. The preliminary estimation of an economic model does not always give satisfactory results. The results may meet the desired criteria, which helps us justify what we have done. Sometimes, however, the results might surprise the investigator: variables thought a priori to be important may, once empirically tested, turn out to be unimportant, or effects may go in unexpected directions (wrong sign). In such cases we have several remedial options: we may reject the results, or reformulate the models and perhaps re-estimate them with different techniques.
For this purpose we have three common criteria (conditions) for evaluating the reliability of the estimates:
I. Economic a priori criteria / economic interpretation of the results;
II. Statistical criteria (first-order tests) / statistical interpretation of the results;
III. Econometric criteria (second-order tests) / tests of the econometric assumptions.
The coefficient β measures the marginal propensity to consume (MPC); its sign has to be positive and its magnitude (size) should be between zero and one (0 < β < 1). Given sample data collected from the population, suppose the estimated consumption expenditure equation is
Ŷi = α̂ + 0.203Xi ………………………………………….1.21
The result shows that if income increases by 1 birr, consumption expenditure will increase on average by less than one birr, i.e. by 0.203 birr. The value of β̂ is less than one and greater than zero, which is in line with economic theory, i.e. it satisfies the a priori economic criterion.
But estimation of the same model using other data may give results such as
Ŷi = α̂ + β̂Xi with β̂ < 0 and |β̂| > 1 ………………………………………………1.22
These estimated results contradict, or do not confirm, economic theory, as the sign of β̂ is negative and its magnitude is greater than one. This is evidence against the model. In most cases, deficiencies of the empirical data used for estimation are responsible for wrong signs and/or sizes of the estimated parameters. The deficiency of the empirical data can be due to problems in the sampling procedure, unrepresentative sample observations, inadequate data, or violation of some of the assumptions of the method employed.
Step B – First-order tests, or the statistical criterion: If the model passes the a priori economic criterion, then the reliability of the estimates of the parameters is evaluated using statistical criteria. Confirmation or refutation of economic theories based on sample evidence is the object of statistical inference. These tests aim at evaluating the statistical reliability of the parameter estimates and are determined by statistical theory. Since the estimated values are obtained from a sample of observations taken from the population, statistical tests of the estimated values help us find out how accurate these estimates are (how accurately they describe the population). To this end, some of the most commonly used statistical tests are the correlation coefficient test, the standard error test, the t-test, the F-test, and the R²-test. The coefficient of determination (R² or r²) gives the percentage of the total variation of the dependent variable explained by the explanatory variables. The standard error (S.E.) measures the dispersion of the sample estimates around the true population parameters: the lower the S.E., the higher the reliability of the estimates (the sample estimates are closer to the population parameters), and vice versa. Similarly, the t-ratio or F-test is used for testing the statistical reliability of the estimates. In our example: is MPC < 1 statistically significant? If so, it may support Keynes' theory.
Step C – Second-order tests, or the econometric criterion: These aim at investigating whether the assumptions of the econometric method employed are satisfied in the particular case. In other words, they are about detecting violations of the assumptions, i.e. checking the validity or reliability of the estimates. They help us establish whether the estimates have the desirable properties of unbiasedness, consistency, efficiency, sufficiency, etc. If any one of the econometric assumptions is violated, the estimates of the parameters cease to possess some of the desirable properties, and the statistical criteria above lose their validity and become unreliable.
To solve such problems the researcher has to re-specify the model, making adjustments such as introducing additional variables into the model, omitting some variables, or transforming the original variables. After re-specifying the model, re-estimation and re-application of all the tests (a priori, statistical, and econometric) continue until the estimates satisfy all of them.
At this stage the estimated model has to be evaluated for its forecasting power before being used for predictions. The estimated model may be economically meaningful and statistically and econometrically correct for the sample period, yet still have poor forecasting power. This may be due to inaccuracy of the explanatory variables, deficiency of the data used in obtaining the estimates, and/or rapid change in the structural parameters of the relationship in the real world. Therefore, this stage involves investigating the stability of the estimates and their sensitivity to changes in the size of the sample. Consequently, we must establish whether the estimated model performs adequately outside the sample of data, i.e. we must test the extra-sample performance of the model.
To do this, the estimated (i.e. forecasted) values are compared with the actual realized values of the relevant dependent variable. The difference between the actual and forecasted values is tested statistically. If the difference is statistically significant, we conclude that the forecasting power of the model is poor; if it is statistically insignificant, the forecasting power of the model is good.
If the forecasting power of the model is found to be good, we use the model for predictions: given future value(s) of X, what is the future value(s) of Y? Say GDP = $600 billion in 1994; what is the forecasted consumption expenditure?
The next issue is to use the model for policy purposes, i.e. to attain a desired target level. Suppose we have the estimated consumption function given in (1.3.3), and the government believes that consumer expenditure of about 4,000 will keep the unemployment rate at its current level of about 4.2 percent. What level of income will guarantee the target amount of consumption expenditure?
Y = 4000 = −231.8 + 0.7194X  ⇒  X ≈ 5882 ……………………………..1.23
Given MPC = 0.72, an income of about $5,882 billion will produce an expenditure of about $4,000 billion. Accordingly, the government can manipulate the control variable X to attain the desired level of the target variable Y through fiscal and monetary policy.
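As a quick check of this policy calculation, the following minimal Python sketch inverts the estimated consumption function (using the intercept −231.8 and MPC 0.7194 from the text) to find the income that delivers a target expenditure:

# Estimated consumption function: Y = -231.8 + 0.7194 * X  (from the text)
alpha_hat = -231.8
beta_hat = 0.7194   # estimated MPC

target_consumption = 4000.0

# Solve Y* = alpha_hat + beta_hat * X for X
required_income = (target_consumption - alpha_hat) / beta_hat
print(f"Required income: {required_income:.0f}")   # about 5882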
3. Analysis type: It is customary to include descriptive statistics, giving among other things the mean and standard deviation of every variable, so that the reader can get a sense of what the data look like. The research should clearly state the unit of observation, the number of observations in the sample, and the reason that particular sample was chosen to estimate the model. Furthermore, it should present the statistical findings along with their economic meaning. Do not make bare reports like "there is a negative relationship between changes in unemployment and GDP growth". That is true, but it is not very interesting; what matters is the economic interpretation of the relationship. The important point is to give your readers enough information to convince them.
4. General writing type: use good writing techniques in presenting the results; outline the paper; take care with grammar, vocabulary, and organization; and show each step of the argument to your reader.
There are various methods of measuring relationships among variables, of which three are common: covariance, correlation, and regression.
1.7.2.1. Covariance
Covariance measures how variables behave, or vary, together. To study the relationship between cigarette smoking and lung capacity, data on smoking habits and lung capacities have been collected from a group of people, as presented in Table 1.5.
Table 1.5: Cigarettes and lung capacity
Cigarettes smoked (X)    Lung capacity (Y)
0                        45
5                        42
10                       33
15                       31
20                       29
(Scatter plot: lung capacity (Y) plotted against smoking (X).)
The sample covariance: instead of averaging by dividing by N, we divide by N − 1 because we are using sample data. The resulting formula is
Sxy = Σ(Xi − X̄)(Yi − Ȳ)/(N − 1) …………………………………………….1.25
Cigarettes (X)   (Xi − X̄)   (Yi − Ȳ)   (Xi − X̄)(Yi − Ȳ)   Lung capacity (Y)
0                −10         +9          −90                 45
5                −5          +6          −30                 42
10               0           −3          0                   33
15               +5          −5          −25                 31
20               +10         −7          −70                 29
With X̄ = 10 and Ȳ = 36, the sum of the cross-products is −215, so we obtain
Sxy = (1/4)(−215) = −53.75
There is also a special form of covariance called variance: the variance of a variable is its covariance with itself. Indeed, variance is a special case of covariance.
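A minimal Python sketch (using only the smoking and lung-capacity figures from Table 1.5) reproduces this hand computation of the sample covariance:

x = [0, 5, 10, 15, 20]        # cigarettes smoked (X), from Table 1.5
y = [45, 42, 33, 31, 29]      # lung capacity (Y), from Table 1.5

n = len(x)
x_bar = sum(x) / n            # 10.0
y_bar = sum(y) / n            # 36.0

# Sample covariance: divide the sum of cross-products by n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
print(s_xy)                   # -53.75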
1.7.2.2. Correlation
Correlation is a statistical technique which measures the degree, or extent, to which two or more variables fluctuate with reference to one another. It measures the degree and direction of linear association between two variables. Correlation is just a number that reveals whether larger values of one variable tend to go with larger (or with smaller) values of the other; it merely indicates that the numbers seem to go together in some way. For example, we may be interested in finding the correlation (coefficient) between scores on statistics and mathematics examinations, between high school grades and college grades, and so on.
One can get a rough idea of the correlation of two variables (X, Y) from a scatter plot, which also helps to gauge the strength of the relationship between them. If the points lie close to a line, the correlation is strong; a greater dispersion of points about the line implies a weaker correlation.
For a precise quantitative measurement of the degree of correlation between X and Y one can use the correlation coefficient. For a population it is denoted by the Greek letter ρ (rho), with the variables whose correlation it measures as subscripts: ρXY refers to the correlation of all the values of the population of X and Y. The sample correlation coefficient, the corresponding sample statistic, is denoted by r with the relevant subscripts. The correlation between X and Y for the population is
ρXY = Cov(X, Y)/(σXσY) ………………………….1.26
The sample correlation coefficient rXY estimates ρXY from a particular sample:
rXY = [NΣXiYi − (ΣXi)(ΣYi)] / √{[NΣXi² − (ΣXi)²][NΣYi² − (ΣYi)²]} ……………..……………………1.27
For the smoking and lung-capacity data of Table 1.5 (N = 5, ΣX = 50, ΣY = 180, ΣXY = 1585, ΣX² = 750, ΣY² = 6680):
rXY = [(5)(1585) − (50)(180)] / √{[(5)(750) − 50²][(5)(6680) − 180²]}
    = (7925 − 9000) / √{(3750 − 2500)(33400 − 32400)}
    = −1075 / √((1250)(1000)) = −0.9615
Equivalently, in deviation form, with xi = Xi − X̄ and yi = Yi − Ȳ,
rXY = Σxiyi / √(Σxi²Σyi²) ………………….…………………………………..1.28
The numerator is (up to division by N − 1) the covariance of X and Y, and the denominator is the product of the standard deviations. Sometimes it is easier to use correlation than covariance: both represent the same information, but correlation presents it in a more accessible form, showing whether the relationship is positive or negative and how strong it is.
Time period (days)   Quantity supplied, Yi (tons)   Price, Xi (birr)
1                    10                              2
2                    20                              4
3                    50                              6
4                    40                              8
5                    50                              10
6                    60                              12
7                    80                              14
8                    90                              16
9                    90                              18
10                   120                             20
Determine the type of correlation that exists between these two variables.
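One way to check the answer is to compute the sample correlation coefficient directly; the sketch below (illustrative Python, with the data taken from the table above) applies formula 1.27:

import math

qty = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]   # quantity supplied (tons)
price = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]      # price (birr)

n = len(price)
sum_x, sum_y = sum(price), sum(qty)
sum_xy = sum(x * y for x, y in zip(price, qty))
sum_x2 = sum(x * x for x in price)
sum_y2 = sum(y * y for y in qty)

# Sample correlation coefficient (formula 1.27)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))   # strongly positive, close to +1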
1.7.2.3. Regression
There are two perspectives on the definition of regression: its historical origin and its modern interpretation.
A. Historical origin of the term “Regression”
The term "regression" was introduced by Francis Galton to mean "regression to mediocrity", the movement or tendency of certain relationships toward the average. He illustrated his theory using family likeness in stature, in which the average height of children moves, or regresses, towards the average height of the population: there is a "tendency for tall parents to have tall children and for short parents to have short children, but the average height of children born of parents of a given height tends to move (or regress) toward the average height in the population as a whole" (F. Galton, "Family Likeness in Stature").
Galton's law was confirmed by Karl Pearson: the average height of sons of a group of tall fathers was less than their fathers' height, while the average height of sons of a group of short fathers was greater than their fathers' height. Thus, "regressing", both tall and short sons tend toward the average height of all men (K. Pearson and A. Lee, "On the Laws of Inheritance").
B. Modern interpretation of regression analysis
Regression is a mathematical way of expressing the average relationship between two or more variables in terms of the original units of the data. In a regression analysis there are two types of variables: the variable to be predicted or influenced, called the dependent variable, and the variable used for prediction, which influences the values of the dependent variable and is called the independent variable. In regression analysis we are concerned with statistical dependence among variables (not functional or deterministic dependence), so we essentially deal with random variables with certain probability distributions. The dependent variable is assumed to be statistical, random, or stochastic, with a certain probability distribution.
Our concern is with predicting the average height of sons knowing the height of their fathers. If Y = son's height and X = father's height, we ask how the heights of sons are related to the heights of their fathers. In Figure 1.8 we assume that the variable father's height is fixed at given levels and that measurements of sons' heights are obtained at these levels.
Figure 1.8: Hypothetical distribution of sons' heights corresponding to given heights of fathers
One can present many examples of regression analysis with economic variables. To mention some:
We may want to know or predict the average score on a statistics examination knowing a student's score on a mathematics examination.
Y = personal consumption expenditure and X = personal disposable income;
Y = demand and X = price;
Y = rate of change of wages and X = unemployment rate;
Y = money/income and X = inflation rate;
Y = % change in demand and X = % change in the advertising budget;
Y = crop yield and Xs = temperature, rainfall, sunshine, fertilizer.
The same holds true for other relationships.
C. Regression equation
The regression equation of Y on X is an algebraic expression that describes the variation in the values of Y for given changes in X. The most common regression equation is that of Y on X, expressed, together with its regression line, as
Y = a + bX …………………………………………………………….1.30
The regression equation of X on Y is expressed as
X = c + dY ………………………………………………………………1.31
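For reference, both slopes can be linked to the correlation coefficient through a standard identity from elementary statistics (stated here for completeness; it is not derived in the text):

\[
\hat{b}_{YX} = r_{XY}\,\frac{s_Y}{s_X}, \qquad
\hat{d}_{XY} = r_{XY}\,\frac{s_X}{s_Y}, \qquad
\hat{b}_{YX}\,\hat{d}_{XY} = r_{XY}^{2},
\]

where \(s_X\) and \(s_Y\) are the sample standard deviations of X and Y. The product of the two slopes equals the squared correlation coefficient, which again shows that regression and correlation carry closely related information.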
D. Regression vs. correlation or causation
Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. A statistical relationship cannot by itself logically imply causation: "a statistical relationship, however strong and suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other". To ascribe causality, one must appeal to a priori or theoretical considerations. For instance, economic theory says that consumption expenditure depends on income; in this case a change in income causes consumption expenditure to change.
Closely related to, but conceptually very different from, regression is correlation analysis. Regression and correlation have some fundamental differences that are worth mentioning.
In regression analysis there is a clear distinction between the dependent and the independent variable. In correlation analysis we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. After all, the correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations.
Correlation cannot explain why two variables are associated. One possible basis for correlation without causation is that some hidden, unobserved third factor makes one of the variables seem to cause the other when in fact each is being caused by the missing variable; this is termed spurious correlation. For example, you might find a high correlation between hiring new managers and building new facilities. Are the newly hired managers causing new plant investments? Or does the act of constructing new buildings cause new managers to be hired? Probably there is a third factor, namely high long-term demand for the firm's products, causing both.
Correlation does not necessarily imply a cause-and-effect relationship. Even when there are grounds to believe a causal relationship exists, correlation does not tell us which variable is the cause and which the effect. It is reasonable that when one thing causes another, the two tend to be associated and therefore correlated, as with environment and productivity, investment and return, etc. However, there can be correlation without causation: the demand for a commodity and its price will generally be found to be correlated, but the question whether demand depends on price, or vice versa, is not answered by correlation.
Moreover, in correlation analysis both variables are assumed to be random. As we shall see, most of correlation theory is based on the assumption of randomness of the variables, whereas most of regression theory is conditional upon the assumption that the dependent variable is stochastic while the explanatory variables are fixed, or non-stochastic.
Like all statistical summaries, the correlation coefficient is both helpful and limited. It provides an excellent summary of a relationship when the relationship is linear, but not when it is non-linear. If there are problems such as non-linearity, unequal variability, clustering, or outliers in the data, the correlation can be misleading.
Empirical example
Suppose that one wants to know the relationship between road accidents and consumption of beer in a certain town Z. Data have been collected and presented as in Table 1.10. Based on this information,
a. Calculate the correlation coefficient between the two series and interpret the result.
b. Is there any cause-and-effect relationship between the two?
Table 1.10: Road accidents and consumption of beer
Summary
Recent trends in economic analysis demand more rigorous techniques, and econometrics plays a great role in this regard. Econometrics is a science of data analysis in which measurement is an essential aspect; literally, it means "measurement in economics". Formally, it can be defined as "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference" (P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, 1954). Econometric methods are appropriate for the measurement of economic relationships that are stochastic. Econometrics may be considered as the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships and verifying economic theories.
Furthermore, econometrics aims primarily at the verification of economic theories, at policy and/or decision making, and at forecasting. Econometric research or inquiry generally proceeds in stages: economic theory, the mathematical economic model, the econometric (statistical) model of the theory, data, estimation of the econometric model, hypothesis testing, forecasting or prediction, and use of the model for control or policy purposes.
One of the most distinctive features of econometrics is that it contains a random term which is not reflected in mathematical economics and economic theory. The adjustment consists primarily in specifying the stochastic (random) elements that are supposed to operate in the real world and enter into the determination of the observed data. The applicability of economic theories to real-world problems has to be checked against data obtained from the real world, but economic data are error-laden. Though plenty of data are available for research purposes, the quality of the data matters in arriving at good results, and data quality may be poor for various reasons.
Among the techniques for analyzing relationships between economic variables are regression and correlation. In regression analysis we try to estimate or predict the average value of one variable (the dependent variable, assumed to be stochastic) on the basis of the fixed values of other variables (the independent variables, non-stochastic), whereas correlation analysis deals with the degree of relationship between variables. Most economic relationships are inexact due to many factors, even though the mathematical model shows exact relationships among economic variables.
Key terms
Econometric model    Ordinary least squares    Method of moments
Maximum likelihood method    Regression    Correlation
Reference
Greene, W. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley Private Ltd.
Gujarati, D. (1999). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: The Riverside Printing Co. Ltd.
Maddala, G. (1992). Introduction to Econometrics. 3rd ed. New York: Macmillan.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Theil, H. (1957). Specification Errors and the Estimation of Economic Relationships. Review of the International Statistical Institute, vol. 25, pp. 41–51.
Wooldridge, J. (2000). Introductory Econometrics: A Modern Approach. 2nd ed. New York: South-Western Publishers.
Review Questions
Part I: Choose the best answer and write it in the space provided.
1. Which one of the following definitions of econometrics is correct?
a. measurement in economics
b. empirical determination of economic laws
c. a science which integrates economic theory, economic statistics, and mathematical economics
d. all
e. none
2. Econometrics is a science that integrates:
a. economic theory
b. mathematical economics
c. economic statistics
d. mathematical statistics
e. all
f. none
3. Which one of the following is true about econometrics?
a. econometrics gives empirical content to most economic theory
b. econometric methods provide numerical values for the coefficients of economic relationships
c. econometrics helps us to infer relationships among variables
d. econometrics is an interdisciplinary field
e. all
f. none
4. Which relationship and flow is logically correct?
a. economic theory → economic model → econometric model
b. economic model → economic theory → econometric model
c. econometric model → economic theory → economic model
d. economic theory → econometric model → economic model
e. all
f. none
5. Which one of the following is not true about the aims of econometric analysis?
a. testing economic theories
b. policy making
c. forecasting
d. evaluation
e. all
f. none
6. Which one of the following is among the common data structures used in econometric analysis?
a. cross-sectional data  b. time series data  c. pooled data  d. panel data  e. all  f. none
7. Correlation:
a. expresses the strength of relationships among economic variables
b. makes no distinction as to whether a variable is dependent or independent
c. the estimated value of the linear correlation coefficient ranges from −1 to 1
d. is different from causation
e. all  f. none
8. Regression analysis:
a. is concerned with the study of the dependence of one variable on one or more other variables
b. is concerned with statistical dependence among variables
c. estimates or predicts the average value of one variable on the basis of the fixed values of the other variables
d. all of the above  e. all except b  f. none
9. Which one of the following is true about the evaluation of estimates?
a. economic criteria test whether the estimates are in line with theory and experience
b. statistical criteria deal with the statistical reliability of the estimates of the parameters of the model
c. econometric criteria aim at investigating whether the assumptions of the econometric method employed are satisfied in the particular case
d. all  e. none
Part II: Briefly answer the following questions.
1. Define econometrics. How does it differ from mathematical economics and mathematical statistics?
2. What is the difference between theoretical and applied econometrics?
3. What is the difference between a model linear in parameters and one linear in variables?
4. Explain the stages in the methodology of econometrics.
5. Discuss regression, correlation, and covariance.
Part III: Work out the following questions, showing your steps clearly.
1. Given the theory by the well-known monetary economist Milton Friedman that "the demand for money has a strong positive relationship with price and income but no relationship with the rate of interest":
a. Write the mathematical relationship.
b. Formulate the econometric relationship.
c. What will be the sign and magnitude of the relationship between the dependent and independent variables?
2. Assume that data are collected from 30 different families on their consumption expenditure and disposable income. Estimate the relationship between consumption expenditure and disposable income based on the data in Table 1.11.
Table 1.11: Hypothetical data on consumption expenditure (Y) and income (X)
Family    Y    X
1    55    80
2 65 100
3 70 85
4 80 110
5 79 120
6 84 115
7 98 130
8 95 140
9 90 125
10 75 90
11 74 105
12 110 160
13 113 150
14 125 165
15 108 145
16 115 180
17 140 225
18 120 200
19 145 240
20 130 185
21 152 220
22 144 210
23 175 245
24 180 260
25 135 190
26 140 205
27 178 265
28 191 270
29 137 230
30 189 250
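For readers who want to check their hand computation, here is a minimal Python sketch (illustrative, not an official solution) that estimates the consumption function from the Table 1.11 data using the standard OLS formulas:

# Consumption expenditure (Y) and disposable income (X) from Table 1.11
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]

n = len(Y)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS slope: sum of cross-deviations over sum of squared X deviations
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

print(f"Estimated consumption function: Y_hat = {alpha_hat:.2f} + {beta_hat:.4f} X")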
2.0. Introduction
The simplest relationship is one in which there is one dependent and one independent variable. The statistical technique appropriate when it is believed that the values of one variable are systematically determined by the values of just one other variable is termed the two-variable regression model or, more commonly, the simple linear regression model. Although the model is simple, and hence unrealistic in most real-world situations, a good understanding of its basic underpinnings will go a long way towards helping you understand more complex models.
In this chapter we describe general techniques, or a methodology, for analyzing the relationship between two economic variables. To this end, we follow the econometric methodology discussed in the previous chapter. In other words, we try to answer questions such as:
Given an economic problem, how do we specify the research question and the economic theory?
Given the economic theory, how do we specify the economic model?
Given an economic model involving a relationship between two economic variables, how do we go about specifying the corresponding statistical model?
Given the statistical model and sample data on two economic variables, how do we use them to estimate the unknown parameters?
The kind of relation that we study in economics is a behavioral relation, as it is based on behavior, as in the case of a consumption function, or on technology, as in the case of a production function. We can think of such a relation as an economic process: the inputs and outputs of the process are observable, but the actual operation of the process is not.
In developing an economic model that specifies how expenditure relates to income, we label household expenditure Y and the corresponding household income X. We may express the relationship between them in general form as indicated in equation 2.1:
Y = f(X) -------------------------------------------------------2.1
We need to know the precise relationship between the dependent and independent variables (linear or non-linear). In practice we never know the exact functional form of the relationship; we often use economic theory or the information contained in the data to help us choose a reasonable one. Assuming a linear relationship between the economic variables, the algebraic form of the economic model is specified as
Y = β0 + β1X ………………………………2.2
In such a model we are interested in measuring the rate of change of the dependent variable for a unit change in the independent variable, the slope:
β1 = ΔY/ΔX ………………………………2.3
Practical Exercise 2.1
Assuming that the supply of a certain commodity depends on its price (other determinants taken to be constant) and that the function is linear, present the model with justifications.
A. Observations
Since we collect data on Y and X from different observations, the subscript i is introduced to describe the i-th observation on each variable. Let us illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function. If there were an exact relationship between supply (Y) and price (X), each value would lie on a straight line. If the model is a stochastic one, the dependent variable is determined not only by the explanatory variable(s) included in the model but also by other factors which are not included; all those factors are accounted for by a 'disturbance' term measuring the departure from the exact linear relationship assumed to exist between X and Y. For example, if we gather observations on the quantities actually supplied in the market at various prices and plot them on a diagram, we see that they do not fall on a straight line. Likewise, if we take a random sample of households and record average expenditure (Y) and average income (X) for each household, it would be unrealistic to expect each observed pair (Y, X) to lie exactly on a straight line, as pictured in Figure 2.1.
We theorize that the value of the variable Y for each observation is determined by the equation. In such a case, for any given value of X, the dependent variable Y assumes some specific value only with some probability. The scattered observations represent the true relationship between Y and X. Accordingly, we observe Y1, Y2, …, Yn corresponding to X1, X2, …, Xn, whereas we would observe points on the line, such as Y′1, Y′2, …, Y′n, corresponding to the same X1, X2, …, Xn. The line represents the exact part of the relationship, and the deviation of each observation from the line represents the random component (random disturbance) of the relationship.
Figure 2.2: Deviations of the observations (u1, u2, u3, …) from the regression line
The observed Y thus has two parts. The first is the part of Y explained by changes in X; it represents the exact relationship given by the line. The second is the part of Y not explained by X, that is, the change in Y due to the random influence of ui. This can be seen in Figure 2.2, in which u1, u2, u3, etc. are the random deviations of the observations from the line, which may be attributed to several factors:
1. Omission of variables from the function: In economic reality each variable is influenced by a very large number of factors, and not every one of them may be included in the function because:
a. Some of the factors may not be known.
b. Even if we know some of the factors, they may not be statistically measurable; for example, psychological factors (tastes, preferences, expectations, etc.) are not measurable.
c. Some factors are random, appearing in unpredictable ways and times; for example, epidemics, earthquakes, etc.
d. Some factors may be omitted because of their small influence on the dependent variable.
e. Even if all factors are known, the available data may not be adequate to measure all the factors influencing the relationship.
One can represent this economic reality, in which a very large number of factors influence the variable, as
Y = f(X1, X2, …, Xn) ……………………………………………..2.5
However, not all the factors influencing a certain variable can be included in the function, for the various reasons above.
(2) Random behavior of human beings: Human behavior may deviate from the normal situation to a certain extent in unpredictable ways. For example, on a moment's whim a consumer may change his expenditure pattern although income and prices did not change. Some of the scatter of points around the line may thus be attributed to an intrinsic randomness inherent in individual human behavior that cannot be explained no matter how hard we try.
(Figure: a linear line approximating a non-linear relationship.)
The error term also captures any approximation error that arises because a linear functional form is used and/or some equations are left out of the model.
The first four sources of error render the form of the equation wrong, and they are usually referred to as error in the equation or error of omission. The fifth source of error is called error of measurement or error of observation. In order to take the above sources of error into account, we introduce a random variable u into econometric functions.
C. Sampling distributions
To make the statistical model complete, some mechanism for modeling Ui is required. That is, the sampling process underlying the observed Yi is directly related to the assumptions made about the random variable Ui and about Y, as well as their probability distributions. Among these assumptions are that ui is random, with mean zero, E(ui) = 0, and constant variance, Var(ui) = σ².
Building an econometric model depends on a set of assumptions put forward by the classical econometricians, termed the Classical Linear Regression Model Assumptions (CLRMA). They are the cornerstone of most econometric theory. These assumptions amount to a set of statements about the population distribution, the distribution of the error term (its mean, variance, covariance), and the relationship between the error term and other variables. We first study the set of assumptions which should hold and then go beyond them, studying what happens when they go wrong and how we may test whether they have gone wrong.
Assumption 1: Ui is a random real variable.
The value which u may assume in any one period depends on chance; it may be positive, negative, or zero. Every value has a certain probability of being assumed by u in any particular instance.
Assumption 2: The mean value of the random variable (U) in any particular period is zero.
The random variable u may assume various values, some greater than zero and some less than zero; but for any given value of X, considering all the positive and negative values of u, their average is zero. In other words, the positive and negative values of u cancel each other, so that E(ui) = 0.
In a nutshell, this assumption implies that the factors not explicitly included in the model, and therefore subsumed in ui, do not systematically affect the mean value of Y.
Assumption 3: The variance of the random variable (U) is constant in each period (homoscedasticity).
The variance of ui about its mean is constant at all values of X. In other words, for all values of X the u's will show the same dispersion around their mean. That is, the average variation around the regression line is the same across the X values; it neither increases nor decreases as X varies (a big ui is not more likely to occur when Xi is big, or vice versa). The ui will tend to be distributed evenly around the line at all levels of Xi. This assumption implies that the values of Y corresponding to various values of X have constant variance. Mathematically,
Var(Ui) = E[Ui − E(Ui)]² = E(Ui)² = σ²  (since E(Ui) = 0) …………2.16
For the joint relationship: for all values of X, the u's will show the same dispersion around their mean. In Figure 2.6 this assumption is denoted by the fact that the values that u can take lie within the same limits irrespective of the value of X: for X1, u can assume any value within the range AB; for X2, u can assume any value within the range CD, which is equal to AB; and so on.
Assumption 4: The random variable U has a normal distribution.
This assumption is important because hypothesis testing requires the distribution of ui. The commonly used (assumed) distribution for Ui is the normal distribution: the values of u (for each X) have a bell-shaped, symmetrical distribution about their zero mean and constant variance σ², i.e.
Ui ~ N(0, σ²) ………………………………………..……2.17
Assumption 5: The random terms of different observations are independent (no autocorrelation).
Given any two X values Xi and Xj (i ≠ j), the correlation between any two ui and uj is zero. Algebraically,
Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(uiuj) = 0  (since E(ui) = E(uj) = 0) …………………………..….2.18
That is, the choice of one household does not influence the choice of another household.
B. Assumptions about the relationship between Ui and the explanatory variables
Assumption 6: Xi is non-stochastic in repeated sampling.
This says that Xi assumes a set of fixed values in the hypothetical repeated sampling which underlies the linear regression model. That is, in taking a large number of samples on Y and X, the Xi values are the same in all samples.
This assumption simplifies our analysis. Given a fixed value of Xi, the model indicates that any Yi is generated as a function of the single random variable ui and the non-random term α + βXi. Hence Yi is a function of one rather than two random variables, and its expected value and variance are particularly simple to evaluate:
E(Yi) = α + βXi,  Var(Yi) = σ² ……………………………………………….……..2.19
Furthermore, the fact that Yi is a linear function of the normal random variable ui allows us to summarize the distribution of Yi as normal.
The alternative is to treat X as a random variable, with expectation and variance evaluated conditionally; this is relevant when X and Ui may be correlated. The fixed-X conception is appropriate for the experimental sciences, in which the explanatory variable is under the control of the researcher. It is obviously not the case for most problems in economics, where data are non-experimental and observations on the explanatory variables are outcomes of a random sampling process.
Assumption 7: Variability in the X values.
X assumes different values in a given sample, though fixed values in hypothetical repeated samples. Without this assumption it would be impossible to estimate the parameters and conduct regression analysis. For example, if there is little variation in family income, we will not be able to explain much of the variation in the consumption expenditure of families. Technically, Var(X) must be a finite positive number.
Assumption 8: The random variable (U) is independent of the explanatory variables.
There is no correlation between the random variable and the explanatory variables in the model; they are distributed independently of each other. If two variables are unrelated, their covariance is zero: Cov(Xi, ui) = 0.
C. Assumptions about the relationships between the explanatory variables themselves, and about the functional form of the relationship between the variables
Assumption 9: The model is linear in parameters.
There are two forms of linearity: linearity in the variables and linearity in the parameters.
A. Linearity in the variables
The first, and perhaps more "natural", meaning of linearity is that the conditional expectation of Y is a linear function of Xi; geometrically, the regression curve is a straight line. A model specified as E(Y | Xi) = β1 + β2Xi is linear in the variables, as opposed to a regression function such as E(Y | Xi) = β1 + β2Xi², which is not linear in the variables because X appears with a power (index) of 2.
B. Linearity in the parameters
The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xi), is a linear function of the parameters, the β's; it may or may not be linear in the variable X. For instance, the regression model E(Y | Xi) = β1 + β2Xi² is linear in the parameters, and it is linearity in the parameters that the classical assumptions are concerned with. Consider the following models:
i. Y = α + βX + u is linear in both the parameters and the variables, so it satisfies the assumption.
ii. ln Y = α + β ln X + u is linear only in the parameters, so it also satisfies the assumption.
iii. ln Y² = α + β ln X² + Ui is linear in the parameters.
iv. Yi = √(α + βXi) + Ui is non-linear in the parameters.
Linearity of the above model implies that a one-unit change in X has the same effect on Y regardless of the initial value of X. This is unrealistic for many economic applications, such as returns to education. To utilize such a model, one can sometimes transform a form that is non-linear in the variables into one that is linear in the parameters, depending on the situation, for example by taking logs of both sides.
The classical econometricians assumed linearity in the parameters, regardless of whether the explanatory and dependent variables enter linearly or not, because it is difficult to estimate parameters that enter non-linearly. Therefore, from now on the term "linear" regression will always mean a regression that is linear in the parameters, the β's (that is, the parameters are raised to the first power only); it may or may not be linear in the explanatory variables, the X's.
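As a minimal sketch of such a transformation (illustrative Python; the constant-elasticity form Y = A·X^β·e^u is a standard textbook example and the parameter values are assumptions, not taken from the text), taking logs turns a model that is non-linear in the variables into one that is linear in the parameters:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical constant-elasticity model: Y = A * X**beta * exp(u)
A, beta = 2.0, 0.6
X = np.linspace(1.0, 50.0, 40)
u = rng.normal(0.0, 0.1, size=X.size)
Y = A * X**beta * np.exp(u)

# After taking logs: ln Y = ln A + beta * ln X + u  -- linear in the parameters
ln_x, ln_y = np.log(X), np.log(Y)
slope, intercept = np.polyfit(ln_x, ln_y, 1)   # OLS fit of ln Y on ln X
print(f"estimated beta = {slope:.3f}, estimated ln A = {intercept:.3f}")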
Assumption 10: The explanatory variables are measured without error.
a. The regressors are error-free, while U absorbs the influence of omitted variables and possibly errors of measurement in the Y's; hence Y is measured correctly.
b. It follows from the assumptions above that
Yi ~ N(α + βXi, σ²) ………………………………..…………2.22
Proof. Mean: E(Yi) = E(α + βXi + ui) = α + βXi, since E(ui) = 0.
Variance: Var(Yi) = E(Yi − E(Yi))² = E(α + βXi + ui − (α + βXi))² = E(ui)² = σ², since E(ui)² = σ².
∴ Var(Yi) = σ² ………………….……….2.23
The shape of the distribution of Yi is determined by the shape of the distribution of ui, which is normal by assumption 4. Since α and β are constants, they do not affect the distribution of Yi. Furthermore, the values of the explanatory variable Xi are a set of fixed values by assumption 6 and therefore do not affect the shape of the distribution of Yi.
∴ Yi ~ N(α + βXi, σ²)
c. Successive values of the dependent variable are independent, i.e. Cov(Yi, Yj) = 0 for i ≠ j.
variables in the model, which will make the validity of the regression result and its interpretation questionable.
Y ($)    X ($)
70       80
65       100
90       120
95       140
110      160
115      180
120      200
140      220
155      240
150      260
The model of the functional relationship between the variables can be written as
Yi = α + βXi + ui ……………………………………………………….2.6
B. Conditional relationship
An economic relationship is not a one-to-one correspondence; rather, it is a joint relationship. For each Xi there are many Y values, varying between Y1 and Yn, and hence many ui; many Y values such as Y1, Y2, Y3, …, Yn are related to a fixed value of X. To understand this, consider the data given in Table 2.2 on a total population of 60 families in a hypothetical community, their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260), and the weekly expenditures of each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of them, with considerable variation in weekly consumption expenditure within each income group.
How do we look at the relationship between X and the various values of Y? We use the conditional expected value to get a unique value of Y corresponding to each fixed value of Xi. It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). The unconditional expected value, or unconditional mean, is obtained by summing Yi over the entire population and dividing by the number of families. Adding the weekly consumption expenditures of all families and dividing by the 60 families in the population, we get an unconditional mean of $121.20 ($7272/60). It is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.
The conditional expectation is obtained by finding the mean of Y corresponding to each Xi; the various conditional expected values of Y given in Table 2.3 correspond to the different Xi. The expected value of the weekly consumption expenditure of a family whose weekly income is, say, $140 is $101: it is a conditional mean because it is the average of Y given an income of $140.
Table 2.3: Weekly family consumption expenditure and conditional mean expenditure by income level
We now have the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Corresponding to the weekly income level of $80, the mean consumption expenditure is $65; corresponding to the income level of $200, it is $137. In all we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. There is considerable variation in weekly consumption expenditure within each income group, as can be seen clearly from Figure 2.1; but the general picture is that, despite this variability, on average weekly consumption expenditure increases as income increases.
For constructing the regression line we take the mean value of Y, E(Y | X), as the unique Y that represents each group. Regression analysis is largely concerned with estimating and/or predicting the population mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable. This can be specified as
Yi = E(Y | Xi) + ui
Geometrically, a population regression curve (line) is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s).
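A minimal sketch of this idea (illustrative Python; the income groups and expenditures are hypothetical, in the spirit of Table 2.2) computes conditional means E(Y | X) by grouping observations on X:

from collections import defaultdict

# Hypothetical (income, expenditure) pairs: several Y values per fixed X
data = [(80, 55), (80, 60), (80, 65), (80, 70), (80, 75),
        (100, 65), (100, 70), (100, 74), (100, 80), (100, 85), (100, 88),
        (120, 79), (120, 84), (120, 90), (120, 94), (120, 98)]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# Conditional mean E(Y | X = x) for each income level x
for x in sorted(groups):
    cond_mean = sum(groups[x]) / len(groups[x])
    print(f"E(Y | X = {x}) = {cond_mean:.1f}")

The locus of these conditional means, plotted against the income levels, is exactly the population regression line described above.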
Figure 2.4: Conditional distribution of expenditure for various levels of income
Figure 2.4 indicates that, for a given income level Xi, an individual family's consumption expenditure is clustered around the average consumption of all families at that level, E(Y | X). It is clear that as family income increases, family consumption expenditure on average increases too. But an individual family's consumption expenditure does not necessarily increase with income. For example, from Table 2.2 we observe that corresponding to the income level of $100 there is one family whose consumption expenditure is only $65. But notice that on average the consumption expenditure of families with a weekly income of $100 is greater than that of families with a weekly income of $80 ($77 versus $65).
Our objective in regression analysis is to estimate the coefficients and make decisions, using data collected from the population and/or a sample. Hence, before moving to further analysis, it is better to clarify the population and sample regression functions (or lines). The population regression function is the regression equation developed from data on the entire population. For simple data on Y and X, the simple linear relationship model with a stochastic term can be specified as:
Yi = α + βXi + ui ………………………….………………………………….2.8
When we work with jointly distributed variables, we take the expected value of Yi given X, the conditional mean E(Y | Xi), as a function of Xi. When E(Y | Xi) is a linear function of Xi, it is known as the conditional expectation function (CEF), the population regression function (PRF), or the population regression (PR).
of β1; conceptually ûi is analogous to ui, and β̂1 is an estimator of β1. Note that an estimator, also known as a (sample) statistic, is simply a rule, formula, or method that tells how to estimate the population parameter from the information provided by the sample at hand. A particular numerical value obtained by the estimator in an application is known as an estimate.
The predicted Y (Ŷi) is unlikely to coincide with the actual Y; that is, not all the points in the scatter diagram will lie on the fitted sample regression line. Just as we expressed the PRF in equivalent forms, we can write the SRF in its stochastic form as
Yi = β̂0 + β̂1Xi + ûi …………………………………………………………………2.13
The sample counterpart of Eq. (2.10), the fitted line, may then be written as
Ŷi = β̂0 + β̂1Xi ---------------------- sample regression line -------------------- (2.14)
More specifically, if Xi is given we can predict Yi as Ŷi = β̂0 + β̂1Xi. Figure 2.4 below indicates that the PRF is given by Yi = β0 + β1Xi, while the sample regression function (SRF) is given by Ŷi = β̂0 + β̂1Xi; one can find the corresponding difference between the error terms ui and ûi.
The question is: can we predict the average weekly consumption expenditure (Y) in the population using sample data? In other words, can we estimate the PRF from the sample data? As one surely suspects, we may not be able to estimate the PRF "accurately": the population regression line differs from the sample regression line. The reason is that sampling fluctuation produces many sample regression lines; as the sample drawn changes, so does the SRF. To see this, suppose we draw another random sample from the population of Table 2.1, as presented in Table 2.3-B.
Sample A Sample B
Y X Y X
70 80 55 80
65 100 88 100
90 120 90 120
95 140 80 140
110 160 118 160
115 180 120 180
120 200 145 200
140 220 135 220
155 240 145 240
150 260 175 260
Plotting the data of samples A and B, we obtain the scattergram given in Figure 2.2. In the scattergram two sample regression lines are drawn so as to "fit" the scatters reasonably well: SRF1 is based on the first sample, and SRF2 is based on the second sample. Which of the two regression lines represents the "true" population regression line? There is no way we can be absolutely sure that either of the regression lines shown in Figure 2.2 represents the true population regression line (or curve).
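The point can be made concrete with a minimal Python sketch (illustrative; it fits a least-squares line via numpy.polyfit to the Sample A and Sample B data listed above), showing that the two samples yield two different SRFs:

import numpy as np

# Sample A and Sample B from the table above (Y values for common X values)
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y_a = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
y_b = np.array([55, 88, 90, 80, 118, 120, 145, 135, 145, 175], dtype=float)

# Least-squares fit of Y on X for each sample
slope_a, intercept_a = np.polyfit(x, y_a, 1)
slope_b, intercept_b = np.polyfit(x, y_b, 1)

print(f"SRF 1 (Sample A): Y_hat = {intercept_a:.2f} + {slope_a:.4f} X")
print(f"SRF 2 (Sample B): Y_hat = {intercept_b:.2f} + {slope_b:.4f} X")

Neither fitted line can be declared the "true" PRF; each is only an estimate based on one particular sample.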
Before going directly to estimation, let us see some of the underlying theoretical framework for the estimation of coefficients and the related model specification. We can obtain estimates of the ui after the estimation of the regression line, from the deviations of the observations from that line.
The parameters of the simple linear regression model can be estimated by various methods. Three of the commonly used methods are:
1. The ordinary least squares method (OLS)
2. The maximum likelihood method (MLM)
3. The method of moments (MM)
Here we will deal with the derivations and mechanics of the OLS, MM, and MLM methods of estimation.
is called the true relationship between Y and X with their respective population pa-
rameters ( ). But it is difficult to obtain the population value of Y and X because of technical
(unmanageability of census, depth of analysis) or economic (material, human and financial costs) rea-
sons. So we are forced to take the sample value of Y and X.
Therefore, we have to obtain a sample of observed values of Y and X, specify the distribution of the error (U) and try to get satisfactory estimates of the true parameters of the relationship. This is done by fitting a regression line (SRF), which is considered an approximation to the true line. The parameters estimated using the sample values of Y and X are called the estimators of the true parameters β0 and β1 and are symbolized as ^β 0 and ^β 1 for the sample counterparts. Accordingly, the model Yi = ^β 0 + ^β 1Xi + u^ i is used to represent the sample counterpart of the population regression function between Y and X.
Our statistical experiment is concerned with repeated sampling. In a repeated sampling process there are many regression lines, differing in ^β 0 and ^β 1; each SRF line has unique values. Then, which of these lines should we choose? Generally we look for the SRF which is very close to, or if possible exactly fits, the (unknown) PRF. We need a rule that makes the SRF as close as possible to the observed data points. But how can we devise such a rule? Equivalently, how can we choose the best technique to estimate the parameters of interest, β0 and β1? We use the method that minimizes the sum of squared error terms: OLS.
Steps in developing the OLS estimator
Step-1: Write the SRF and fitted regression line corresponding to the population regression function; the line is a good estimator if the error terms are as small as possible.

PRF: Yi = β0 + β1Xi + Ui …………………………………….2.25
SRF: Yi = ^β 0 + ^β 1Xi + u^ i ………………………………….2.26
SRL (fitted line): Y^ i = ^β 0 + ^β 1Xi, then Yi = Y^ i + u^ i

Step-2: Using the SRF and SRL (fitted line), derive the sample error term

u^ i = Yi − Y^ i = Yi − ^β 0 − ^β 1Xi --------------------------------------------------------------2.27
Given our observations on Y and X, we would like to determine the SRF in such a way that it is as close as possible to the actual Y. Intuitively, this is achieved when the deviations from the line are small. One might think of choosing the SRF so that the sum of the residuals

Σu^ i = Σ(Yi − Y^ i) ……………………..……..……………….2.28

is as small as possible. This approach is not appropriate, no matter how intuitively appealing it may be. The reason is that the minimization of Σu^ i gives equal weight to different deviations, no matter how large or small they may be; i.e., it attaches equal importance to all u^ i's, no matter how close or how widely scattered the individual observations are around the SRF.
Figure 2.7: Error term and fitted SRF
Consequently, the algebraic sum of the u^ i can be small (even zero) although the individual u^ i are widely scattered about the SRF. For instance, if u^ 1 and u^ 4 are positive while u^ 2 and u^ 3 are equally negative, the sum u^ 1 + u^ 2 + u^ 3 + u^ 4 = 0. Can we minimize a quantity which is zero by definition and so solve the above problem? No. This means that minimization of Σu^ i does not necessarily produce a good fit. Instead, we minimize the sum of squared residuals:

Σu^ i² = Σ(Yi − Y^ i)² …………………………………..2.29

This approach gives more weight to residuals with wider dispersion around the line than to those with closer dispersion. Accordingly, we give more weight to residuals such as u^ 1 and u^ 4 (larger deviations from the sample line) and less to u^ 2 and u^ 3. From the estimated relationship Y^ i = ^β 0 + ^β 1Xi, we obtain:

u^ i = Yi − Y^ i ……………………………….…….……..(2.30)
Σu^ i² = Σ(Yi − ^β 0 − ^β 1Xi)² ………………………..………..……(2.31)
Step-3: Express the sum of squared residuals as a function of the estimators
The sum of squared residuals is a function of the estimators ^β 0 and ^β 1. For any given data set, choosing different values for the coefficients gives different u^ i's and hence a different Σu^ i². We find the line that minimizes Σu^ i², and thus best estimates the population parameters, by minimizing

Σu^ i² = f( ^β 0, ^β 1) = Σ(Yi − ^β 0 − ^β 1Xi)² ……………………2.32
Step-4: Take partial derivatives and minimize the equation
The minimization problem can be handled with differential calculus by taking the partial derivatives of Σu^ i² (a composite function of the coefficients) with respect to ^β 0 and ^β 1 and using the optimization conditions (first- and second-order conditions); that is, set each first-order partial derivative equal to zero.

A. Partial derivative for the first coefficient ( ^β 0):

∂Σu^ i²/∂ ^β 0 = −2Σ(Yi − ^β 0 − ^β 1Xi) = 0 ………………………………..2.34

B. Partial derivative for the second coefficient ( ^β 1), by the chain rule:

∂Σu^ i²/∂ ^β 1 = −2ΣXi(Yi − ^β 0 − ^β 1Xi) = 0 …………………………2.35

Note that since u^ i = Yi − ^β 0 − ^β 1Xi, equation 2.35 says ΣXiu^ i = 0, which is related to our assumption that X and the error term are uncorrelated. It indicates that when the pairwise products of Xi and u^ i are summed over the full sample, the total is zero. Together, the first-order conditions are

Σu^ i = 0 and ΣXiu^ i = 0 -------------------------------------2.36
Step-5: Solve the normal equations simultaneously
We have two normal equations, from equations 2.34 and 2.35:

ΣYi = n ^β 0 + ^β 1ΣXi ----------------------------------------2.38
ΣXiYi = ^β 0ΣXi + ^β 1ΣXi² ……………………………………..2.39

Dividing the first normal equation by n and solving for ^β 0 gives

^β 0 = Ȳ − ^β 1X̄ ………………………………2.40

Plugging the value of ^β 0 from 2.40 into the second normal equation and solving for ^β 1 yields

^β 1 = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²) ………………………………2.41

Rearranging the above equations, the intercept can equivalently be written entirely in terms of sums:

^β 0 = (ΣYiΣXi² − ΣXiΣXiYi) / (nΣXi² − (ΣXi)²) …………………………………2.42
Deviation form
The above formulas for ^β 0 and ^β 1 can be expressed in terms of deviations from the mean values. Dividing the first normal equation by n, we have

Ȳ = ^β 0 + ^β 1X̄ ………………………………..2.43

The sample regression line therefore passes through the means of the data, in such a way that the residual sum is zero: some residuals are positive, others negative, and by the first normal equation the positive residuals exactly cancel the negative ones.

Substituting ^β 0 = Ȳ − ^β 1X̄ into the second normal equation gives

ΣXY − nX̄Ȳ = ^β 1(ΣXi² − nX̄²) …………………………………………………..…….(2.44)

Equation (2.44) can be rewritten in a somewhat different way (deviation form) as follows:

Σ(X − X̄)(Y − Ȳ) = Σ(XY − XȲ − X̄Y + X̄Ȳ)
= ΣXY − ȲΣX − X̄ΣY + nX̄Ȳ
= ΣXY − nȲX̄ − nX̄Ȳ + nX̄Ȳ
= ΣXY − nX̄Ȳ ………………………………………………..2.45

Similarly, Σ(X − X̄)² = ΣX² − nX̄² ………………………………………...…………….2.46

Substituting (2.45) and (2.46) in (2.44), and denoting (Xi − X̄) as xi and (Yi − Ȳ) as yi, we get

^β 1 = Σxiyi / Σxi²  and  ^β 0 = Ȳ − ^β 1X̄
Example one: Using the sample data of Table 2.1, reproduced for convenience as Table 2.5, estimate the relationship between consumption expenditure (Y) and income (X) as a test of the Keynesian consumption function.
Table 2.5: Sample weekly family consumption expenditure (Y ) and weekly family income (X)
Y($) X($)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
Solution
Table 2.6: Computation of OLS components based on the sample in Table 2.5

Yi     Xi     YiXi     Xi²      xi=Xi−X̄  yi=Yi−Ȳ  xi²     xiyi    Y^ i        u^ i=Yi−Y^ i
(1)    (2)    (3)      (4)      (5)      (6)      (7)     (8)     (9)         (10)
70     80     5600     6400     -90      -41      8100    3690    65.1818     4.8181
65     100    6500     10000    -70      -46      4900    3220    75.3636     -10.3636
90     120    10800    14400    -50      -21      2500    1050    85.5454     4.4545
95     140    13300    19600    -30      -16      900     480     95.7272     -0.7272
110    160    17600    25600    -10      -1       100     10      105.9090    4.0909
115    180    20700    32400    10       4        100     40      116.0909    -1.0909
120    200    24000    40000    30       9        900     270     125.2727    -6.2727
140    220    30800    48400    50       29       2500    1450    136.4545    3.5454
155    240    37200    57600    70       44       4900    3080    145.6363    8.3636
150    260    39000    67600    90       39       8100    3510    156.8181    -6.8181
Sum    1110   1700     205500   322000   0        0       33000   16800       1110.0    0
Mean   111    170      Nc       Nc       0        0       Nc      Nc          111       0

(Column (9) sums to 1109.9995 ≈ 1110.0 because of rounding; Nc = not computed.)
From the table, ^β 1 = Σxiyi/Σxi² = 16800/33000 = 0.5091 and ^β 0 = Ȳ − ^β 1X̄ = 111 − (0.5091)(170) = 24.4545. The estimated regression line therefore is Y^ i = 24.4545 + 0.5091Xi. The regression line is interpreted as follows: each point on the regression line gives an estimate of the expected or mean value of Y corresponding to the chosen value of X. The value of ^β 1 = 0.5091, which measures the slope of the line, shows that, within the sample range of X between $80 and $260 per week, as X increases by, say, $1, the estimated increase in average weekly consumption expenditure amounts to about 51 cents. The value of ^β 0 = 24.4545, the intercept of the line, indicates the average level of weekly consumption expenditure when weekly income is zero (autonomous consumption expenditure).
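The computations in Table 2.6 can be reproduced in a few lines of code. The following sketch (plain Python, standard library only) applies the deviation-form formulas ^β 1 = Σxiyi/Σxi² and ^β 0 = Ȳ − ^β 1X̄ to the data of Table 2.5:

    # OLS by hand using the deviation-form formulas (data from Table 2.5)
    Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]      # weekly consumption
    X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]   # weekly income
    n = len(Y)
    x_bar, y_bar = sum(X) / n, sum(Y) / n                   # 170, 111
    Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # 16800
    Sxx = sum((x - x_bar) ** 2 for x in X)                      # 33000
    b1 = Sxy / Sxx             # slope: 0.5091 (marginal propensity to consume)
    b0 = y_bar - b1 * x_bar    # intercept: 24.4545 (autonomous consumption)
    print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")   # b0 = 24.4545, b1 = 0.5091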
Example Two: A researcher wants to see the relationship between sales and advertising expenditure. The data collected are presented in Table 2.7. Using these data, find the coefficients of the relationship.
Table 2.7: Advertising expenditure and sales revenue, in thousands of dollars
Solution
Table 2.8: Detailed computation of the OLS components for the Y and X variables
[Table 2.8 lists, for each firm i, advertising expense (Xi), sales (Yi), the deviations xi and yi, and xi², yi², xiyi, computed as in Table 2.6, together with their column totals.]
From the data of a household expenditure and income survey, the following results were obtained for 25 observations (households). Calculate the three building blocks of the two-variable regression line.
Answer
Given Ȳ = 163.29, X̄ = 163.2 and ^β 1 = 0.82, the intercept can be computed as ^β 0 = Ȳ − ^β 1X̄ = 163.29 − (0.82)(163.2) ≈ 29.47.
B. Estimation of elasticities from an estimated regression line
After estimating the parameters ^β 0 and ^β 1, we can estimate the elasticity from the estimated regression line. The equation of a line whose intercept is ^β 0 and whose slope is ^β 1 is

Y^ i = ^β 0 + ^β 1Xi ………………………………………………………….. (2.49)

The coefficient ^β 1 is the derivative of Y^ with respect to X:

^β 1 = dY^/dX …………………….…………………………………………….. (2.50)

Equation 2.50 shows the rate of change in Y^ as X changes by a very small amount. It should be noted that the estimated function is linear. The coefficient ^β 1 is therefore not the price elasticity itself, but a component of the elasticity, which is defined by the formula

ηp = (dY/Y)/(dX/X) = (dY/dX)·(X/Y) ………………..………………..…………………. (2.51)

where ηp = price elasticity, Y = quantity (demanded or supplied) and X = price. Clearly ^β 1 is the dY/dX component. From an estimated function we obtain an average elasticity

η^ p = ^β 1·(X̄/Ȳ^) = ^β 1·(X̄/Ȳ) ……………………………………..……………………. (2.52)

where
X̄ = the average price in the sample,
Ȳ^ = the average regressed value of the quantity, i.e., the mean of the estimated (fitted) values of the quantity in the sample, and
Ȳ = the average value of the quantity in the sample.

Note that Ȳ^ = Ȳ; that is, the mean of the estimated values of Y is equal to the mean of the actual (sample) values of Y. This is because

Y^ = ^β 0 + ^β 1X …………………………………………………….…. (2.53)
Ȳ^ = ^β 0 + ^β 1X̄ = (Ȳ − ^β 1X̄) + ^β 1X̄ = Ȳ ………………………………..(2.54)

Alternatively, the value of ^β 1 can be obtained using the deviations from the means as follows:

^β 1 = Σ(Yi − Ȳ)(Xi − X̄) / Σ(Xi − X̄)² = Σxiyi / Σxi² = 156/48 = 3.25
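As a numerical sketch in Python (the average price X̄ below is an illustrative placeholder, not a value given in the text; Ȳ = 63 matches the mean of the fitted supply values reported in Table 2.11):

    # Average price elasticity from the estimated linear supply function
    Sxy, Sxx = 156, 48          # deviation sums given for the supply function
    b1 = Sxy / Sxx              # 3.25
    x_bar, y_bar = 24.0, 63.0   # x_bar is hypothetical; y_bar = 63 from Table 2.11
    eta = b1 * x_bar / y_bar    # elasticity evaluated at the sample means
    print(f"b1 = {b1}, elasticity at the means = {eta:.2f}")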
In some cases economic theory suggests that the relationship between Y and X has no constant term. Hence, it is desired to fit the line Yi = β1Xi + Ui, subject to the restriction β0 = 0. To estimate β1, the problem is put in the form of a restricted minimization problem and the Lagrange method is applied.

We minimize: Σu^ i² = Σ(Yi − ^β 0 − ^β 1Xi)² …………………………..………………………..2.55
Subject to: ^β 0 = 0

Form the composite function for constrained minimization by incorporating the multiplier λ. The problem becomes

Z = Σ(Yi − ^β 0 − ^β 1Xi)² − λ ^β 0 ……………………………………..(2.56)

Minimizing with respect to ^β 1 gives ^β 1 = ΣXiYi / ΣXi². This formula involves the actual values (observations) of the variables, not their deviation forms. For example, production functions of manufactured products should normally have a zero intercept, because output is zero when factor inputs are zero. If the specific functional form is linear with labor as the only input, then since without labor input no output can be produced, L = 0 implies output = 0.
1. The following data refer to family income and food expenditure per week for ten families.

Families            A   B   C   D   E   F   G   H   I   J
Family income      20  30  33  40  15  13  26  38  35  43
Family expenditure  7   9   8  11   5   4   8  10   9  10

Estimate the regression line of food expenditure on income and interpret your results.
2. The following data refer to the demand for apples (Y), in kg, and the price of apples (X), in birr per kg, in 10 different markets.

Table 2.10: Demand for apples and prices by families

Families  A   B   C   D   E   F   G   H   I   J
Y        99  91  70  79  60  55  70 101  81  67
X        22  24  23  26  27  24  25  23  22  26

Find the OLS estimates of the regression coefficients.

3. Consider the model Yi = α + βXi + Ui. Show that the OLS estimate of β is unbiased.
2.5.2.2. Method of moments (MM)
This estimation method is based on two common assumptions of the CLRM:
A. E(Ui) = 0
B. Cov(Xi, Ui) = E(XiUi) = 0 ……………………………………………… 2.57

The method replaces these population moment conditions by their sample counterparts. Specifically, let ^β 0 and ^β 1 (beta hats) denote the sample estimates of β0 and β1. Then the estimate of the error term is obtained as

u^ i = Yi − ^β 0 − ^β 1Xi …………………………………………………… 2.58

The sample counterpart of assumption A is

(1/n)Σu^ i = (1/n)Σ(Yi − ^β 0 − ^β 1Xi) = 0 …………………………………………………..……..2.59

but (1/n)ΣYi = Ȳ and (1/n)ΣXi = X̄. Using this result, we get the following equation:

Ȳ = ^β 0 + ^β 1X̄ …………………………………………. 2.60

The straight line Y^ = ^β 0 + ^β 1X is the estimated line and is known as the sample regression line or the fitted straight line. We see from equation 2.60 above that the sample regression line passes through the point of sample means (X̄, Ȳ).

The second condition for obtaining ^β 1 is provided by assumption B, that E(XiUi) = 0. The sample statistic corresponding to it is

(1/n)ΣXiu^ i …………………………………………2.61

Setting this to zero we obtain the condition

(1/n)ΣXi(Yi − ^β 0 − ^β 1Xi) = 0 ……………………….2.62

Taking this summation term by term, noting that ^β 0 and ^β 1 can be taken out of the summations, and cancelling n, we get

ΣXiYi = ^β 0ΣXi + ^β 1ΣXi² …………………………… 2.63

Equations 2.59 and 2.62 are known as the normal equations (no connection with the normal distribution). From equation 2.60, ^β 0 = Ȳ − ^β 1X̄. Plugging this value into the second normal equation and solving gives

^β 1 = Σxiyi / Σxi² ……..…………………………………..2.64

which is identical to the OLS estimator.
2.5.2.3. Maximum likelihood method (MLM)
Maximum likelihood (ML) is a method of point estimation with some stronger theoretical properties than OLS. It is based on the idea that different populations generate different samples, and any given sample is more likely to have come from some populations than from others. The principle of maximum likelihood rests on the intuitive notion that “an event occurred because it was most likely to.” In other words, we compute the probability (called the likelihood in the context of estimation) of the observed outcomes under the different alternatives being considered, and choose the alternative for which the probability of observing the sample is at a maximum. The belief is that the observed sample values are more likely to have come from this population than from any other.
More specifically, the general principle of maximum likelihood is as follows. Let Yi be a random variable whose probability distribution depends on the unknown parameter θ. A random sample of independent observations Y1, Y2, Y3, …, Yn is drawn, with corresponding probabilities f(Y1; θ), f(Y2; θ), …, f(Yn; θ). The likelihood function is then the joint density of the sample, i.e., the product of the individual probabilities. It can be written as

L(θ) = f(Y1, Y2, …, Yn; θ) …………………....….……..2.66
     = f(Y1; θ)·f(Y2; θ) ⋯ f(Yn; θ) ………………..….…...…..2.67

The ML estimator maximizes the likelihood function L, the product of the individual probabilities taken over the entire n observations. If the possible values of θ are discrete, the procedure is to evaluate L for each possible value under consideration and choose the value for which L is the maximum. If L(θ) is differentiable, maximize it over the range of permissible values of θ. This can be done using the first- and second-order conditions; the first-order condition is

∂L(θ)/∂θ = 0 ………………………………………..2.68
Let us extend the concept to parameter estimation in the simple linear regression developed from the original equation and its error term.

Step-1: Assume the observations (X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn) are drawn independently, with Yi normally distributed with mean β0 + β1Xi and variance σ². The general function relating the two variables is

Yi = β0 + β1Xi + Ui, Ui ~ N(0, σ²) ……2.69

Since our central variable is the error term, in the sense that we try to minimize the errors and increase precision,

Ui = Yi − β0 − β1Xi …………………….……………………………….………….2.70

Then the density of a single observation, and the likelihood (joint density) function of U1, U2, …, Un, are

f(Ui) = (1/√(2πσ²))·exp(−Ui²/(2σ²)) ……………….…………………2.71

L(β0, β1, σ²) = (2πσ²)^(−n/2)·exp(−Σ(Yi − β0 − β1Xi)²/(2σ²)) ……………….…………..2.72
Our aim is to maximize this likelihood function L with respect to the parameters β0, β1 and σ². The ML estimator of a parameter is the value of the parameter which would most likely generate the observed sample observations Y1, Y2, …, Yn.

To do this, it is more convenient to maximize the logarithm of the likelihood function, which is equivalent to maximizing L because the logarithm is a monotonically increasing transformation (that is, if a > b, then ln(a) > ln(b)). The log-likelihood function is

lnL = −(n/2)ln(2π) − (n/2)lnσ² − (1/(2σ²))Σ(Yi − β0 − β1Xi)² …………………………….2.73

The only place where β0 and β1 appear is in the SSE term, Σ(Yi − β0 − β1Xi)². Because of the negative sign before SSE, maximizing lnL with respect to the betas is therefore equivalent to minimizing SSE, which gives the least squares estimators. Hence the least squares estimators are also the ML estimators, provided the Ui's are identically and independently distributed as N(0, σ²). The partial derivatives of lnL are

∂lnL/∂β0 = (1/σ²)Σ(Yi − β0 − β1Xi) ………..……..2.74
∂lnL/∂β1 = (1/σ²)ΣXi(Yi − β0 − β1Xi) ………...……2.75
∂lnL/∂σ² = −n/(2σ²) + (1/(2σ⁴))Σ(Yi − β0 − β1Xi)² ……….……..2.76
For bases other than e (i.e., log_b x), the differentiation rule is d(log_b x)/dx = 1/(x·ln b); if b = e, then ln b = 1, so d(ln x)/dx = 1/x, which is the rule used in taking the derivatives above.
Setting these equations equal to zero (the first-order condition for optimization) and letting ^β 0, ^β 1 and σ^ ² denote the ML estimators, we obtain

(1/ σ^ ²)Σ(Yi − ^β 0 − ^β 1Xi) = 0 ………………………………….2.77
(1/ σ^ ²)ΣXi(Yi − ^β 0 − ^β 1Xi) = 0 ………………...………………2.78
−n/(2 σ^ ²) + (1/(2 σ^ ⁴))Σ(Yi − ^β 0 − ^β 1Xi)² = 0 ……………………..………2.79

Upon rearrangement of equations 2.77 and 2.78 we get the normal equations 2.80 and 2.81:

ΣYi = n ^β 0 + ^β 1ΣXi ………………………………………….……….2.80
ΣXiYi = ^β 0ΣXi + ^β 1ΣXi² ……………………………………….……..2.81

After simplifying we obtain estimator formulas identical to the OLS ones, as in equations 2.82 and 2.83. Therefore, the ML estimators of the parameters are identical to the OLS estimators:

^β 1 = Σxiyi / Σxi² …………………………...…2.82
^β 0 = Ȳ − ^β 1X̄ ………………………………………….……2.83
To obtain the MLE of σ², we differentiate lnL partially with respect to σ² and set the result to zero:

∂lnL/∂σ² = −n/(2 σ^ ²) + (1/(2 σ^ ⁴))Σ(Yi − ^β 0 − ^β 1Xi)² = 0 ……….….….2.84

Rearranging the equation, we have n σ^ ² = Σ(Yi − ^β 0 − ^β 1Xi)², but Σ(Yi − ^β 0 − ^β 1Xi)² = Σu^ i² = SSE. Solving for σ^ ² we get σ^ ² = SSE/n; since SSE depends on β0 and β1, we use their estimates:

σ^ ²ML = (1/n)Σ(Yi − ^β 0 − ^β 1Xi)² = Σu^ i²/n ……………………………………………………2.85

This ML estimator of σ² differs from the OLS estimator σ^ ² = Σu^ i²/(n − 2), which was shown to be an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The magnitude of the bias can easily be determined by taking the mathematical expectation of both sides:

E( σ^ ²ML) = ((n − 2)/n)σ² = σ² − (2/n)σ² ………………………………………….2.86

which shows that σ^ ²ML is biased downward (i.e., it underestimates the true σ²) in small samples. But notice that as n (the sample size) increases indefinitely, the second term in (2.86), the bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ^ ²ML is unbiased too: as n increases indefinitely, it converges to the true σ². ML estimators are consistent and asymptotically efficient, as are the OLS estimators.
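A quick simulation illustrates this bias. The sketch below uses plain Python with the standard library's random module; the sample size, coefficients and replication count are illustrative choices, not values from the text:

    import random

    random.seed(0)
    beta0, beta1, sigma2, n, reps = 2.0, 0.5, 4.0, 10, 5000
    X = list(range(1, n + 1))

    ml_avg, ols_avg = 0.0, 0.0
    for _ in range(reps):
        Y = [beta0 + beta1 * x + random.gauss(0, sigma2 ** 0.5) for x in X]
        xb, yb = sum(X) / n, sum(Y) / n
        b1 = sum((x - xb) * (y - yb) for x, y in zip(X, Y)) / sum((x - xb) ** 2 for x in X)
        b0 = yb - b1 * xb
        sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
        ml_avg += sse / n / reps           # ML estimator SSE/n: biased downward
        ols_avg += sse / (n - 2) / reps    # OLS estimator SSE/(n-2): unbiased

    print(f"average SSE/n     = {ml_avg:.2f} (theory: {(n - 2) / n * sigma2:.1f})")
    print(f"average SSE/(n-2) = {ols_avg:.2f} (theory: {sigma2:.1f})")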
2.6. Criteria for evaluating the estimates
We divide the available criteria into three groups: the theoretical a priori criteria, the statistical criteria, and the econometric criteria.

2.6.1. The economic a priori criteria
In regression analysis, priority should be given to the fulfillment of the economic a priori criteria (sign and size of the estimates); only when the economic criteria are satisfied should one proceed with the application of the first-order and second-order tests of significance. This stage consists of deciding whether the estimates of the parameters are theoretically meaningful, i.e., accord with the expectations of theory, and statistically satisfactory. It is all about checking whether the result obtained is in line with theoretical or conceptual arguments or previous empirical studies; if not, there should be a logical justification for the result obtained.
On the basis of the a priori economic criteria, the sign of the coefficient that represents the marginal propensity to consume has to be positive, and its magnitude (size) should be between zero and one (0 < β1 < 1). If the sign and magnitude of the parameter do not conform to the economic relationship between the variables posited by economic theory, the model will be rejected, because the a priori theoretical criteria are not satisfied by the estimates; the estimates should then be considered unsatisfactory. But if there is a good reason to accept the model anyway, the reason should be clearly stated.
Given sample data collected from the population, suppose the estimated result for the above consumption expenditure function has a slope of ^β 1 = 0.203. The result says that if income increases by 1 birr, consumption will increase on average by less than one birr, i.e., by 0.203 birr. The value of ^β 1 is then less than one and greater than zero, which is in line with economic theory, i.e., it satisfies the a priori economic criterion.
But if estimation of the same model using other data gives results in which the sign of ^β 1 is negative or its magnitude is greater than one, the estimated results contradict (do not conform to) economic theory, and we reject the model, or subject it to further investigation. In most cases, deficiencies of the empirical data used for the estimation of the model are responsible for the occurrence of a wrong sign and/or size of the estimated parameters. The deficiency of the empirical data can be due to problems of the sampling procedure, an unrepresentative sample of observations from the population, inadequate data, or violation of some assumption of the method employed.
2.6.2. The econometric criteria: Properties of OLS estimators
These criteria evaluate the estimated result using the desired properties of the OLS estimators, the distribution of the dependent variable (Y), and the distribution of the random error term, as measures of the precision of the estimates. For instance, once we have derived the estimators ^β 0 and ^β 1 (sample estimates) of the parameters of our model, we should test their reliability and their relationship to the true unknown values.
Our objective is to find an estimate which is as close as possible to the value of our population parameter; closeness is measured by goodness of fit. We need some criteria for judging the ‘goodness’ of an estimate. How are we to choose, among the different econometric methods, the one that gives ‘good’ estimates?
From their statistical properties we can get some measure of how dependable the estimators are in economic applications. This can be done using certain defined econometric criteria, or properties of OLS. Closeness of the estimate to the population parameter is measured by the mean and variance (or standard deviation) of the sampling distribution of the estimates from the different econometric methods.
A good estimate should have the properties of unbiasedness, consistency, efficiency and sufficiency, or a combination of such properties. If one method gives an estimate which possesses more of these desirable characteristics than the estimates from other methods, that technique will be selected. The ideal or optimum properties that the OLS estimates should possess may be summarized by the well-known Gauss-Markov theorem. The theorem states: “Given the assumptions of the classical linear regression model, the OLS estimators, in the class of linear unbiased estimators, have minimum variance.” The theorem is sometimes referred to as the BLUE theorem (Best, Linear, Unbiased Estimator). The detailed proof and logical framework are given in the appendix; the core idea can be summarized as follows.
a. Linear: the estimator is a linear function of a random variable, such as the dependent variable Y. An estimator is linear if the equation that defines it is linear with respect to the dependent variable Y; otherwise it is non-linear.
Proposition: The least squares estimator is linear, because we can rewrite its formula as a weighted sum of the Yi: ^β 1 = Σxiyi/Σxi² = ΣkiYi, where ki = xi/Σxi².
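The linearity claim follows from one line of algebra, written here as a small LaTeX derivation (standard notation, with the k_i weights as defined above):

    \hat{\beta}_1 = \frac{\sum_i x_i y_i}{\sum_j x_j^2}
                  = \frac{\sum_i x_i (Y_i - \bar{Y})}{\sum_j x_j^2}
                  = \sum_i \frac{x_i}{\sum_j x_j^2}\, Y_i
                  = \sum_i k_i Y_i,
    \qquad k_i = \frac{x_i}{\sum_j x_j^2}
    % the \bar{Y} term drops out because \sum_i x_i \bar{Y} = \bar{Y}\sum_i x_i = 0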
b. Unbiased: the estimator's average or expected value is equal to the true population parameter. An estimator ^β is unbiased if its expected value equals the true value of the parameter it estimates, i.e., E( ^β ) = β; otherwise it is biased.
An unbiased estimator is clearly desirable because, on average, the estimated values will equal the true value, even though in a particular sample this may not be so. Put another way, if an estimator is unbiased, then the expected difference between ^β and β is zero: the estimates do not tend to be too high or too low. If it is biased, then it tends to give an answer that is either too high (if the expected difference is positive) or too low (if the expected difference is negative).
c. Minimum variance: an estimator has minimum variance in the class of linear unbiased estimators if, out of that class of estimators of β0 and β1, it possesses the smallest sampling variance. To show this, we first obtain the variances of ^β 0 and ^β 1 and then establish that each has the minimum variance in comparison with other linear unbiased estimators obtained by econometric methods other than OLS. The estimator with minimum variance is said to be an efficient estimator.
Consider any other linear estimator of β1, say b1 = Σ(ki + ci)Yi; since it must be unbiased, the weights ci must satisfy Σci = 0 and ΣciXi = 0. Furthermore, it can be shown that

var(b1) = var( ^β 1) + σ²Σci² ……………………..………..2.89

Since σ²Σci² > 0, var(b1) ≥ var( ^β 1); the corresponding argument for the intercept yields an extra term σ²X̄²Σci² > 0. Therefore, the variance of the OLS estimates is the minimum.
The least squares estimates (for instance ^β 0 and ^β 1) are obtained from a sample of observations on Y and X. Since the method of OLS does not by itself make any assumption about the probability distribution or nature of ui, it is of little help for the purpose of drawing inferences about the PRF from the SRF. Besides, sampling errors are inevitable in all estimates, so it is necessary to apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of the estimates. This gap can be filled if we make an assumption about the probability distribution of the error term ui; specifically, that the error term follows some probability distribution, more specifically the normal distribution.
Adding the normality assumption for ui to the assumptions of the classical linear regression model (CLRM) yields what is known as the classical normal linear regression model (CNLRM). The probability distributions of ^β 0 and ^β 1 therefore depend on the assumption made about the distribution of ui. We use this assumption to test the significance of the estimators; hence, before we discuss the various testing methods, it is important to see whether the parameters are normally distributed or not.
Recall our classical linear regression model assumptions about the error term ui:
a. E(ui) = 0
b. var(ui) = σ² (constant for all i)
c. cov(ui, uj) = 0 for i ≠ j
When the normality assumption is added to assumptions (a) and (b), the error term is distributed with mean zero and variance σ², which can be compactly written as

ui ~ N(0, σ²)

where the symbol ~ means “distributed as” and N stands for the normal distribution, the terms in the parentheses representing the two parameters of the normal distribution, namely the mean and the variance. When two normally distributed variables are uncorrelated, meaning independently distributed, their covariance is zero. Therefore, we can write the errors as normally and independently distributed (NID):

ui ~ NID(0, σ²) ……………………………………………………….2.90

The normality assumption can be motivated as follows.
A. As pointed out earlier, ui represents the combined effect of all the independent variables that are not explicitly introduced in the model. The central limit theorem (CLT) provides a theoretical justification for the assumption of normality of ui: as the number of such omitted influences increases indefinitely, the distribution of their sum, and hence of the error term, tends to the normal distribution. A variant of the CLT states that, even if the number of variables is not very large, or the variables are not strictly independent, their sum may still be normally distributed.
B. With the normality assumption, the probability distributions of the OLS estimators can easily be derived, because any linear function of normally distributed variables is itself normally distributed. As shown below, the OLS estimators are linear functions of ui, which makes the task of hypothesis testing straightforward.
C. The normal distribution is a comparatively simple distribution involving only two parameters (mean and variance), and its theoretical properties have been extensively studied in mathematical statistics.
D. If we are dealing with a small or finite sample size (say, data of fewer than 100 observations), the normality assumption plays a critical role. It not only helps us to derive the exact probability distributions of the OLS estimators, but also enables us to use the t, F and χ² statistical distributions, since the sampling distributions of ^β 0 and ^β 1 (both normal) and of σ^ ² (related to the chi-square) are then known exactly. This simplifies the task of hypothesis testing.
The sampling distributions of the estimators are

^β 0 ~ N(β0, var( ^β 0)),  ^β 1 ~ N(β1, var( ^β 1)) ……………………2.91

One can further elaborate these distributions by deriving the mean and variance of Yi.
I. The mean of the dependent variable is:

E(Yi) = E(β0 + β1Xi + ui) …………………..…………………….……………….2.92

By assumption E(ui) = 0, and the Xi's are assumed to be fixed values in the repeated sampling process, so that E(β0 + β1Xi) = β0 + β1Xi. Hence

E(Yi) = β0 + β1Xi ------------------------------------------------------------------- (2.93)

II. Variance of the dependent variable
The variance of the dependent variable is given by

var(Yi) = E[Yi − E(Yi)]² ---------------------------------------------- (2.94)

Substituting into the definition of variance,

var(Yi) = E[(β0 + β1Xi + ui) − (β0 + β1Xi)]² = E(ui²) = σ² ---------------------(2.95)

Since the Xi's are a set of constant values, they do not affect the shape of the distribution of Yi. Therefore

Yi ~ N(β0 + β1Xi, σ²u) ----------------------------------------------------------------- (2.96)
Because ^β 0, ^β 1 and σ^ ² will change from sample to sample, these estimators are random variables. Since they are random variables, we would like to find out how close ^β 0 is to β0, ^β 1 to β1, and σ^ ² to σ², along with the probability distributions relating the sample values to their true values. The OLS estimates have the following properties.

1. Linear function
We make use of the property of the normal distribution which says that “any linear function of a normally distributed variable is itself normally distributed.” Consider ^β 1 and its formula derived so far:

^β 1 = Σxiyi/Σxi² = ΣkiYi, where ki = xi/Σxi² …………..…………………………2.97

Therefore, Eq. (2.97) shows that ^β 1 is a linear function of Yi, which is random by assumption. Substituting Yi = β0 + β1Xi + ui and using Σki = 0 and ΣkiXi = 1, we obtain

^β 1 = β1 + Σkiui …………..2.99

Because ki, the betas, and Xi are all fixed, ^β 1 is ultimately a linear function of the random variable ui, which is random by assumption. Similar logic holds for ^β 0, which is also linear in Y. We have thus established that

^β 1 ~ N(β1, var( ^β 1)), and
^β 0 ~ N(β0, var( ^β 0)).
1. Suppose σ² is the population variance of the error term and σ^ ² is an estimator of σ². Show that the maximum likelihood σ^ ² is a biased estimator of the true σ² for the model Yi = α + βXi + Ui.
2. In the model Yi = α + βXi + Ui, show that α^ = Ȳ − ^β X̄ possesses minimum variance.
3. Using the assumptions of the simple regression model, show that
a. Y ~ N(α + βXi, σ²)
b. Cov( α^ , ^β ) = −X̄·Var( ^β )
4. For the model Yi = α + βXi + Ui
The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary measure that tells how well the sample regression line fits the data. The r² shows the percentage of the total variation of the dependent variable explained by the changes in the explanatory variable(s). If all the observations were to lie on the regression line, we would obtain a “perfect” fit, but this is rarely the case. The closer the observations are to the line, the better the goodness of fit. To elaborate, let us draw a horizontal line corresponding to the mean value of the dependent variable, Ȳ (see Figure 2.8 below).
Step-1: Decompose the observed Y into a fitted part and a residual

Yi = ^β 0 + ^β 1Xi + ei ……………………………………………………2.100
Y^ i = ^β 0 + ^β 1Xi …………………………………………………………2.101

Then one can write the above equations as

⇒ Yi = Y^ i + ei ……………………………………….………………………(2.102)
⇒ ei = Yi − Y^ i ……………………………………………….………………(2.103)

Step-2: Find the mean of Y
Summing (2.102) over the sample and dividing by n, and noting that Σei = 0, gives Ȳ = Ȳ^.

Step-3: Express the decomposition in deviation form

⇒ yi = y^ i + ei ………………………….………………..…………(2.106)

Step-4: Find the sum of squared deviations
Squaring and summing both sides, we obtain the following expression:

Σyi² = Σ( y^ i + ei)² = Σ y^ i² + Σei² + 2Σ y^ iei

But Σ y^ iei = 0, because the fitted values are uncorrelated with the residuals (this follows from the normal equations Σei = 0 and ΣXiei = 0). Therefore:

Σyi² = Σ y^ i² + Σei², that is, TSS = ESS + RSS.
Σ y^ i² is the part of the total variation of Yi which is explained by the regression line: the explained sum of squares (ESS). The total sum of squares, TSS = Σyi² = Σ(Yi − Ȳ)², is the total variation of the dependent variable around its mean. We have also defined the residual ei as the difference between the actual and fitted values, ei = Yi − Y^ i. Hence, the residual sum of squares, RSS = Σei² = Σ(Yi − Y^ i)², gives the total unexplained variation of the dependent variable Y, attributed to the existence of the disturbance term.
Such a decomposition of the total variation in Y into a fitted part and an unexplained part is only possible for OLS. Even though common sense suggests that the total variation in Y should always be the sum of the explained and unexplained variation, this does not hold for methods of estimation other than OLS. The three sums of squares computed from our supply function are shown in Table 2.11 below.
Table 2.11: The computation of the sums of squares from the supply function

Mean of Y^ i = 63   Mean of ei = 0   ESS = 507   RSS = 387   TSS = 894

It can be seen from this table that the mean of Y^ i is 63, equal to Ȳ, the mean of Y.
The explained variation (goodness of fit), r², is expressed as a percentage (ratio) of the total variation explained by the model. Mathematically, the explained variation as a proportion of the total variation is

r² = ESS/TSS = Σ y^ ²/Σy² ………………………………….…………….(2.109)

But y^ i = ^β 1xi and ^β 1 = Σxy/Σx²; substituting these into (2.109) gives r² = ^β 1²Σx²/Σy² = (Σxy)²/(Σx²Σy²). On the other hand, the sample correlation coefficient is

r = Cov(X,Y)/(σx·σy) = (Σxy/n)/(σx·σy) = Σxy/(Σx²·Σy²)^(1/2) ………(2.113)

Squaring (2.113) results in:

r² = (Σxy)²/(Σx²·Σy²) ………….(2.114)

Comparing (2.114) and (2.109), we see exactly the same expressions. Therefore, the coefficient of determination equals the square of the correlation coefficient between X and Y.
Interpretation of r²
r² measures the proportion of the variation in Y explained by the variation in X; it is therefore known as the coefficient of determination. For example, if r² = 0.90, the regression line explains 90% of the total variation in Y; the remaining 10% is unaccounted for by the regression line and is attributed to the factors included in the disturbance term. Using the values of ESS and TSS from Table 2.11, one can compute the coefficient of determination (goodness of fit) for the supply function. The TSS and RSS are 894 and 387 respectively. Then

r² = 1 − RSS/TSS = 1 − 387/894 = 1 − 0.4329 = 0.5671

which indicates that 57% of the variation in Y (supply) is explained by the variation in X (price), while 43% of the variation in Y is attributed to changes in other factors not included in our regression function and captured by the disturbance term.
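The same goodness-of-fit computation can be checked in code. Continuing the consumption-income example of Table 2.5 in Python, the sketch below reproduces the r² of 0.9621 reported later in this section:

    # r^2 for the consumption-income data of Table 2.5
    Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
    X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
    n = len(Y)
    xb, yb = sum(X) / n, sum(Y) / n
    Sxy = sum((x - xb) * (y - yb) for x, y in zip(X, Y))   # 16800
    Sxx = sum((x - xb) ** 2 for x in X)                    # 33000
    Syy = sum((y - yb) ** 2 for y in Y)                    # 8890
    r2 = Sxy ** 2 / (Sxx * Syy)     # (sum xy)^2 / (sum x^2 * sum y^2)
    print(f"r^2 = {r2:.4f}")        # r^2 = 0.9621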
If the regression has no constant term, i.e., if ^β 0 = 0, the sum-of-squares decomposition given above no longer holds; that is, TSS ≠ ESS + RSS. The value of R² obtained in that case is not comparable to the R² obtained earlier, because the denominators are different. A third way of expressing r² is

r² = ^β 1·Σxiyi/Σyi²

Consider that the values in our supply function are Σxiyi = 156 and Σxi² = 48, so that ^β 1 = 3.25. Therefore,

r² = ^β 1·(Σxiyi)/(Σyi²) = 3.25 × 156/894 = 0.5671
The limits of r²
r² is a nonnegative quantity whose limits lie between zero and one, i.e., 0 ≤ r² ≤ 1. An r² of 1 indicates a perfect fit, that is, Y^ i = Yi for each i; it implies that the variation in Y is perfectly explained by the changes in X, which rarely happens in reality, since a number of factors affect any given variable. On the other hand, an r² of zero means that there is no (linear) relationship whatsoever between the regressand and the regressor (i.e., ^β 1 = 0); it implies that the variation in Y is not at all explained by the explanatory variable through the regression line Y^ i = ^β 0 + ^β 1Xi, in which case Y^ i = Ȳ. This in turn implies that Σei²/Σyi² = 1, or Σei² = Σyi².
One can derive r from the r² derived so far as follows:

r = ±√r² ---------------------------------------------------------------------------------- (2.119)

If r² = 0.9621 then r = 0.9809. The value of r² of 0.9621 means that about 96 percent of the variation in weekly consumption expenditure is explained by income. The coefficient of correlation of 0.9809 shows that the two variables, consumption expenditure and income, are highly positively correlated. Further, the process of deriving the goodness of fit (r²) by squaring the formula for the correlation coefficient (r) establishes the relationship between r² and r:

r² = (Σxiyi)²/((Σxi²)(Σyi²)) = [(Σxiyi)/(Σxi²)]·[(Σxiyi)/(Σyi²)] ----------- (2.120)

Moreover, R² = (r ŷ,y)² = (r x,y)², where r ŷ,y and r x,y are the coefficients of correlation between Y^ and Y, and between X and Y, respectively. This result holds true even if there are more explanatory variables, provided the regression has a constant term.
Some of the properties of r are as follows:
1. It is a measure of linear association or linear dependence only; it has no meaning for describing nonlinear relations such as Y = X².
2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ +1. It can be positive, zero or negative, the sign depending on the sign of the term Σxiyi in the numerator. For example, r = +0.832 between income and consumption expenditure indicates a strong and direct linear association. This does not necessarily imply any causal link between the two variables.
3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y (rXY) is the same as that between Y and X (rYX).
Table 2.12: Ten families' income and expenditure per week

Families            A   B   C   D   E   F   G   H   I   J
Family income      20  30  33  40  15  13  26  38  35  43
Family expenditure  7   9   8  11   5   4   8  10   9  10

Compute and interpret r² and r.
We have seen that the OLS estimates possess a number of desirable properties. The next step is to measure the precision of the estimates. In the estimation above, we assume the usual process of repeated sampling, in which we obtain a very large number of samples of size n and compute the estimates ^β from each sample, given the relevant distribution. Since the data are likely to change from sample to sample, so do the estimates. We therefore need some measure of the reliability or precision of the estimators: we compare the means (expected values) and the variances of these distributions and choose, among the alternative estimates, the one whose distribution is concentrated as closely as possible around the population parameter.
a. Variance
We know from probability theory that the variance of a random variable is a measure of its dispersion around the mean. The smaller the variance, the closer, on average, individual values are to the mean. When dealing with confidence intervals, we know that the smaller the variance of the random variable, the narrower the confidence interval for a parameter. The variances of the estimators are thus indicators of the precision of the estimates. It is therefore of considerable interest to compute the variances of ^β 0 and ^β 1, as they depend on the Y's, which in turn depend on the random error term.
You may observe that the variances of the OLS estimates involve σ², the population variance of the random disturbance term. It is difficult to obtain population data on the disturbance term for technical and economic reasons. The formulas for the variances of ^β 0 and ^β 1 involve the variance of the random term u, σ²u. How do we estimate σ²? Since we do not have the population σ², we estimate it using its sample counterpart; that is, we may obtain an unbiased estimate of σ² from

σ^ ² = Σei²/(n − 2) -------------------------------------------------------------- (2.124)

To use σ^ ² in the expressions for the variances of ^β 0 and ^β 1, we have to prove that σ^ ² is the unbiased estimator of σ², i.e., that E( σ^ ²) = σ².
In σ^ ² = Σei²/(n − 2), Σei² is the sum of the squared residuals (the residual sum of squares, RSS) and n − 2 is the number of degrees of freedom (df). The term “number of degrees of freedom” means the total number of observations in the sample (n) less the number of independent (linear) constraints or restrictions put on them; in other words, it is the number of independent observations out of a total of n observations. The normal equations impose two conditions on the residuals, Σei = 0 and ΣXiei = 0, corresponding to the two estimates ^β 0 and ^β 1. These two estimates therefore put two restrictions on the RSS, so there are n − 2, not n, independent observations with which to compute the RSS. Following this logic, in the three-variable regression the RSS will have n − 3 df, and for the k-variable model it will have n − k df. With this divisor,

E( σ^ ²) = E(Σei²/(n − 2)) = σ² …………………..……………………(2.126)

Thus, σ^ ² is an unbiased estimate of the true variance of the error term (σ²). Substituting σ^ ² for σ² in the expressions for the variances, together with the assumption that the mean of the error term is zero, gives the working formulas

var( ^β 1) = σ^ ²/Σxi² ………………..……………(2.127)
se( ^β 1) = √( σ^ ²/Σxi²) -------------------------------------------------- (2.128)
var( ^β 0) = σ^ ²ΣXi²/(nΣxi²) …………………..……………(2.129)
se( ^β 0) = √( σ^ ²ΣXi²/(nΣxi²)) ………………..………(2.130)

In passing, note that the positive square root of σ^ ², i.e., σ^ , is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the “goodness of fit” of the estimated regression line.
c. Example
Referring to Example one above, one finds the following variance and standard error estimates:

^β 0 = 24.4545   var( ^β 0) = 41.1370   se( ^β 0) = 6.4138
^β 1 = 0.5091    var( ^β 1) = 0.0013    se( ^β 1) = 0.0357
cov( ^β 0, ^β 1) = −0.2172   σ^ ² = 42.1591
r² = 0.9621   r = 0.9809   df = 8
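These quantities follow mechanically from the sums computed earlier; the short Python sketch below reproduces them for the Table 2.5 data:

    # Standard errors for the consumption-income regression (Table 2.5)
    Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
    X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
    n = len(Y)
    xb, yb = sum(X) / n, sum(Y) / n
    Sxx = sum((x - xb) ** 2 for x in X)                      # 33000
    b1 = sum((x - xb) * (y - yb) for x, y in zip(X, Y)) / Sxx
    b0 = yb - b1 * xb
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
    s2 = rss / (n - 2)                                       # 42.1591
    var_b1 = s2 / Sxx                                        # 0.0013
    var_b0 = s2 * sum(x * x for x in X) / (n * Sxx)          # 41.1370
    print(f"se(b0) = {var_b0 ** 0.5:.4f}, se(b1) = {var_b1 ** 0.5:.4f}")
    # se(b0) = 6.4138, se(b1) = 0.0357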
d. Properties of the variances of ^β 0 and ^β 1
I. The variance of ^β 1 is directly proportional to σ² but inversely proportional to Σxi². That is, given σ², the larger the variation in the X values, the smaller the variance of ^β 1 and the more precisely β1 can be estimated.
II. The variance of ^β 0 is directly proportional to σ² and ΣXi², but inversely proportional to Σxi² and the sample size n.
III. Since ^β 0 and ^β 1 are estimators, they not only vary from sample to sample but, in a given sample, are likely to be dependent on each other, this dependence being measured by the covariance between them:

cov( ^β 0, ^β 1) = −X̄·var( ^β 1) = −X̄·σ²/Σxi² ……………..2.131
There is no general agreement among econometricians as to which of the two statistical criteria, a high r² or low standard errors of the estimates, is better. Some attribute great importance to r² in accepting the parameter estimates, while others suggest that acceptance or rejection of the estimates depends on the aim of the model in the particular situation and on the standard errors. Despite this controversy, the majority of writers seem to agree that r² is the more important criterion when the model is to be used for forecasting, while the standard errors acquire greater importance when the purpose of the research is the explanation and reliability (analysis) of economic phenomena. A high r² has clear merit only when combined with significant estimates (low standard errors). When a high r² and low standard errors are not both obtained in a particular piece of research, the researcher should be very careful in interpreting and accepting the results. Priority should always be given to the fulfillment of the economic a priori criteria (sign and size of the estimates); only when the economic criteria are satisfied should one proceed with the application of the first-order and second-order tests of significance.
There are many distributions commonly used for statistical inference, but the most common are the standard normal distribution, the t-distribution, and the chi-square distribution.

D. Normal and standard normal distributions
A random variable X which follows the normal distribution is distributed with mean (μ) and variance (σ²): X ~ N(μ, σ²). A normal distribution can be standardized, or transformed into the standard normal distribution, by transforming the variable to the standard normal variable Z, with zero mean and unit variance:

Z = (X − μ)/σ ~ N(0, 1) …………………..…….2.132

Since the variables of interest here are the OLS estimates of the parameters β0 and β1, they can be transformed to Z as follows:

Z = ( ^β i − βi)/σ( ^β i) ~ N(0, 1)
E. The t-distribution
The same logic holds when the number of observations is less than or equal to 30, except that the relevant distribution is the Student's t-distribution. The procedure for constructing a confidence interval with the t-distribution is similar to the one outlined for the Z-distribution, the main difference being that the t-distribution takes into account the degrees of freedom. From our simple linear regression with two coefficients, the t-statistic of an OLS estimate is

t = ( ^β i − βi)/se( ^β i), with n − 2 degrees of freedom,

where se( ^β i) is the estimated standard error. Since the true variances are unknown, we estimate them with σ^ ² (using df = n minus the number of parameters estimated) and thereby obtain estimates of the standard errors of the coefficients, se( ^β 0) and se( ^β 1).
Clearly, as n increases, the t-distribution approaches the standard normal distribution Z ~ N(0, 1).

F. The chi-square distribution
The variance estimator is related to the χ² distribution: (n − 2) σ^ ²/σ² follows a chi-square distribution with n − 2 degrees of freedom.
The theory of estimation consists of two parts: point estimation and interval estimation. We have discussed point estimation (the OLS method of point estimation), along with the reliability measures (standard errors) and probability distributions of the estimators, thoroughly in the previous sections. Instead of relying on the point estimates alone, we must establish limiting values within which the true parameter is expected to lie with a certain “degree of confidence”. To be more specific, we want to find out how “close”, say, ^β 1 is to β1, with 1 − α degree of confidence, given the standard error that helps us measure the degree of precision. The reason is that although in repeated sampling the mean of the sample estimates is expected to equal the true value, a single estimate is likely to differ from the true value due to sampling fluctuation. In this part we first consider interval estimation and then take up the issue of hypothesis testing, a topic intimately related to interval estimation.
To this end, we try to find two positive numbers δ and α (with α lying between 0 and 1), such that

P( ^β 1 − δ ≤ β1 ≤ ^β 1 + δ) = 1 − α ………………………………………………..2.134

Such an interval, if it exists, is known as a confidence interval; 1 − α is known as the confidence level; and α is known as the level of significance. The symbol α is also known as the size of the (statistical) test. The endpoints of the confidence interval are known as the confidence limits (also known as critical values), ^β 1 − δ being the lower confidence limit and ^β 1 + δ the upper confidence limit. In practice, α and 1 − α are often expressed in percentage form as 100α and 100(1 − α) percent. For example, if α = 0.05, or 5 percent, equation 2.134 reads: the probability that the (random) interval shown includes the true β1 is 0.95, or 95%. Commonly, δ is related to the standard error as a measure of precision, say two or three standard errors on either side of the estimate.
We choose a probability in advance and refer to it as the confidence level (confidence coefficient). It is customary in econometrics to choose the 95 percent confidence level. In repeated sampling, the confidence limits computed from the samples would include the true population parameter in 95 percent of the cases, while in 5 percent of the cases the population parameter would fall outside the confidence limits. Equation 2.134 thus defines an interval estimator with a specified probability, 1 − α, of including the true value of the parameter within its limits.
It is very important to understand the following aspects of interval estimation:
Equation 2.134 does not say that the probability of β1 (the true value) lying between the stated limits is 1 − α; β1, although unknown, is fixed. What is random is the interval itself.
Interval 2.134 is random as long as ^β 1 has not yet been computed. But once we have a specific sample and have computed the interval, it is a fixed interval that either includes the true β1 or does not, so the probability is either 1 or 0. Thus, for our hypothetical consumption-income example, if the 95% confidence interval has been obtained, we cannot say the probability is 95% that this particular interval includes the true β1: that probability is either 1 or 0.
How are confidence intervals constructed? If the sampling probability distributions of the estimators are known, one can construct confidence intervals for the estimates. Under the assumption of normality of the disturbance term ui, the OLS estimators are themselves normally distributed with the means and variances given above. The normality assumption thus permits us to derive confidence intervals using the relevant sampling distributions of ^β 0, ^β 1 and σ^ ². We can construct confidence intervals based on the Z-distribution, the t-distribution, and the chi-square distribution.
In econometric applications the population variance of the error term, and hence the true variance of ^β i, is unknown. When the population variance is unknown but the sample is sufficiently large (n > 30), we use the standard normal distribution and perform the Z-test (approximately), since the sample estimate of the variance is then a satisfactory approximation to the unknown variance. Likewise, if the sample standard deviation σ( ^β i) is known to be a reasonably good estimate of the unknown population standard deviation, the Z-distribution can be used. As n increases, the standard deviation of the sampling distribution decreases; that is, when n is large (we have more information), the sample mean can be expected to be closer to the population mean. Hence the sample mean becomes a more reliable estimate of the population mean, and the sample estimate of the variance is a good estimate of σ².
We choose a confidence coefficient 1 − α and find the interval of Z lying between −Z α/2 and Z α/2:

P(−Z α/2 < Z < Z α/2) = 1 − α …………………………………………………2.135

P(−Z α/2 < ( ^β i − βi)/σ( ^β i) < Z α/2) = 1 − α

βi = ^β i ± Z α/2 · σ( ^β i) ………………………………………………2.136

The confidence interval says that the unknown population parameter βi will lie within the defined limits (1 − α)·100 times out of 100.
Example 1:
Construct a 95% confidence interval for an estimator, given that Z lies between −1.96 and 1.96. This may be written as follows:

P{−1.96 < Z < +1.96} = 0.95
P{−1.96 < ( ^β i − βi)/σ( ^β i) < 1.96} = 0.95
P{ ^β i − 1.96·σ( ^β i) < βi < ^β i + 1.96·σ( ^β i)} = 0.95
Example 2:
Given an estimated regression function, construct a 95% confidence interval and interpret it. An interval such as (0.26, 1.24) contains the true unknown (true population) parameter β1 in 95 out of 100 cases in the long run.
In a two-tail test at the α level of significance, the probability of obtaining the specific t-value on either side (−tc or tc) is α/2 at n − 2 degrees of freedom. The probability of obtaining any value of t between them, where t = ( ^β − β)/SE( ^β ) at n − 2 degrees of freedom, is 1 − (α/2 + α/2) = 1 − α. Then the confidence interval for the t-distribution will be

P(−t α/2 < t < t α/2) = 1 − α ……………………………………………(2.137)

but t = ( ^β − β)/SE( ^β ). Substituting this into (2.137), we obtain

P( ^β − t α/2·SE( ^β ) < β < ^β + t α/2·SE( ^β )) = 1 − α …………………………………………..(2.138)

The limits within which the true β lies at the (1 − α)% degree of confidence are

^β ± tc·SE( ^β ), where tc is the critical value of t at the α/2 level and n − 2 degrees of freedom.

Notice that the width of the confidence interval is proportional to the standard error of the estimator: the larger the standard error, the larger the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater the uncertainty in estimating the true value of the unknown parameter. Thus, the standard error of an estimator is a measure of the precision of the estimator.
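For the consumption-income regression estimated earlier ( ^β 1 = 0.5091, se = 0.0357, df = 8), a 95% t-based interval can be computed as in the Python sketch below; the critical value 2.306 is t 0.025 with 8 df, taken from a standard t table:

    # 95% confidence interval for the slope of the consumption function
    b1, se_b1 = 0.5091, 0.0357
    t_crit = 2.306                  # t(0.025, df=8) from the t table
    lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
    print(f"95% CI for b1: ({lo:.4f}, {hi:.4f})")   # (0.4268, 0.5914)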
Example: Construct 95% confidence intervals for the OLS estimators of the hypothetical relationship estimated from a sample of size n = 20:

Y^ i = 89 + 2.88Xi
se     (38.4) (0.85)

Solution: The probability of t lying between −t 0.025 and +t 0.025 is 0.95, with n − k = 20 − 2 = 18 degrees of freedom. From the t table, the value of t 0.025 at 18 degrees of freedom is 2.10. Given this information, the 95% confidence intervals for the OLS parameters are:

For ^β 0 = 89 ± (2.10)(38.4) = 89 ± 80.6, i.e., (8.4, 169.6)
For ^β 1 = 2.88 ± (2.10)(0.85) = 2.88 ± 1.79, i.e., (1.09, 4.67)

In the long run, in 95 out of 100 cases, intervals like (8.4, 169.6) will contain the true β0. But, as warned earlier, we cannot say that the probability is 95 percent that this specific interval contains the true β0, because the interval is now fixed and no longer random; β0 either lies in it or does not, so the probability that the specified fixed interval includes the true β0 is either 1 or 0.
A confidence interval can also be constructed for the error variance σ²:

P(χ² 1−α/2 < χ² < χ² α/2) = 1 − α

where the χ² value in the middle of this double inequality is as given below, and where χ² 1−α/2 and χ² α/2 are the two critical χ² values obtained from the chi-square table for n − 2 df.

But χ² = (n − 2) σ^ ²/σ² ……………………………………………………………..2.139

Substituting χ² = (n − 2) σ^ ²/σ² and rearranging the terms, we obtain

P[(n − 2) σ^ ²/χ² α/2 < σ² < (n − 2) σ^ ²/χ² 1−α/2] = 1 − α ………………………………….2.140

which gives the 100(1 − α)% confidence interval for σ².
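For the consumption-income regression ( σ^ ² = 42.1591, df = 8), a 95% interval for σ² follows directly; the critical values 2.180 and 17.535 are χ² 0.975 and χ² 0.025 with 8 df, taken from a standard chi-square table:

    # 95% confidence interval for the error variance sigma^2
    sigma2_hat, df = 42.1591, 8
    chi2_hi, chi2_lo = 17.535, 2.180     # chi2(0.025, 8) and chi2(0.975, 8)
    lower = df * sigma2_hat / chi2_hi    # about 19.2
    upper = df * sigma2_hat / chi2_lo    # about 154.7
    print(f"95% CI for sigma^2: ({lower:.1f}, {upper:.1f})")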
2.8. Which distribution (t or Z) is relevant, and why?
2.9. Construct a 95% confidence interval and interpret it.
The aim of statistical inference is to draw conclusions about the population parameters from the sample statistics. The concept of hypothesis testing may be stated simply as follows: is a given observation or finding compatible with some stated hypothesis or not? The word “compatible” here means “sufficiently close” to the hypothesized value that we do not reject the stated hypothesis.
To this end, we start with a hypothesized value for the estimator, drawn from prior experience or theory. The form in which one expresses the null and alternative hypotheses is important in defining the rejection (or acceptance) region, the critical region, of the test. The hypothesis which we wish to test (on the basis of the evidence of our sample estimate) is called the null hypothesis (H0), whereas its counterpart proposition is called the alternative hypothesis (H1). The two hypotheses are opposites of each other, and each may take one of the following forms: two-tailed or one-tailed. The test simply differs depending on whether we use a one- or a two-tailed hypothesis.
If we know the hypothesized value of the parameter, the rule to accept or reject the null hypothesis is developed on the basis of a test statistic obtained from the data. This can be done using various tests, the most common of which are:
i) Standard error test
ii) Confidence interval (Z-test, Student's t-test)
iii) Test of significance
iv) P-value test
Given the hypotheses, the next step is to test their statistical reliability by applying some rule which enables us to decide whether to accept or reject our estimates. To make such decisions we would compare the sample estimate with the true value of the population parameter; however, the population parameter is unknown. The basic question is thus: how are we to decide whether to accept or reject the sample estimate, given that we do not have the appropriate yardstick (the population parameter) for the required comparison? To bypass this difficulty, we make an assumption about the value of the population parameter and use our sample estimate to decide whether that assumption is acceptable or not.
Generally, the acceptance or rejection of the null hypothesis has a definite economic meaning. Acceptance of the null hypothesis (β1 = 0) implies that the explanatory variable (X) does not influence the dependent variable Y and should not be included in the function: the conducted test does not provide enough evidence to reject H0, so there is no detected relationship between X and Y, and changes in X leave Y unaffected. Rejection of the null hypothesis does not mean that our estimates ^β 0 and ^β 1 are the correct estimates of the true population parameters β0 and β1; it simply says that the estimates come from a sample drawn from a population whose parameter differs from the hypothesized value.
The statistical significance of the estimates is reported in one of the following three ways:
i. The estimates are significantly different from zero.
ii. The estimates are statistically significant or insignificant.
iii. We reject or accept the null hypothesis.
In statistics, when we reject the null hypothesis, we say that our finding is statistically significant. On
the other hand, when we do not reject the null hypothesis, we say that our finding is not statistically sig-
nificant.
i) The standard error test
This test uses the standard errors of the estimates as the test statistic to decide on the rejection or acceptance of the null hypothesis. It uses a two-tailed hypothesis and helps us decide whether the estimates ^β 0 and ^β 1 are significantly different from zero or not. The standard error test is an approximate test (approximated from the Z-test and t-test) and implies a two-tail test conducted at the 5% level of significance. The steps to be followed are:
Step 1: Formally state the null hypothesis against the alternative for the slope parameter: H0: βi = 0 and H1: βi ≠ 0.
Step 2: Compute the estimates ^β i and their standard errors SE( ^β i).
Step 3: Compare the standard deviations (errors) with the values of the estimates.
Step 4: Make a decision.
Decision rule
If SE( ^β i) > (1/2)| ^β i|, accept the null hypothesis and reject the alternative hypothesis: we conclude that ^β i is statistically insignificant. In other words, acceptance of H0 implies that there is no evidence from the sample data of a relationship between Y and X; the relationship can be taken as βi = 0.
If SE( ^β i) < (1/2)| ^β i|, reject the null hypothesis and accept the alternative hypothesis: we conclude that ^β i is statistically significant.
Example:
Suppose that, using data collected from a sample of size n = 30, we estimated a supply function. Test the significance of the slope parameter at the 5% level of significance using the standard error test.
Solution:
Comparing the standard error with half the estimate, we find SE( ^β i) < (1/2)| ^β i|. Thus ^β i is statistically significant at the 5% level of significance, indicating that there is a relationship between price and quantity supplied.
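The decision rule is mechanical, as the small Python sketch below shows (illustrated with the consumption-function estimates from the earlier example, since the supply function's numbers are not reproduced in the text):

    def standard_error_test(b_hat: float, se: float) -> str:
        # Approximate two-tail test at the 5% level: significant if SE < |b|/2
        return "significant" if se < abs(b_hat) / 2 else "insignificant"

    print(standard_error_test(0.5091, 0.0357))   # significant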
There are two mutually complementary approaches for devising such rules, namely the confidence-interval approach and the test-of-significance approach. In the confidence-interval procedure we try to establish a range, or interval, that has a certain probability of including the true but unknown βi, and see whether the hypothesized value falls inside the interval or not. In the test-of-significance approach, we hypothesize some value for βi and try to see whether the computed ^β i lies within reasonable (confidence) limits around the hypothesized value. Both approaches presume that the estimator under consideration has some probability distribution, and that hypothesis testing involves making statements or assertions about the values of the parameters of that distribution.
I. Test of significance
Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence, and assess the validity of the estimates. If we invoke the assumption that ui ~ N(0, σ²), then we can use the t-test (or Z-test) to test a hypothesis about any individual regression coefficient of the two-variable model specified earlier. The procedure for testing a hypothesis using the test-of-significance approach involves the following steps.
Step 1: State the null hypothesis (H0) and the alternative hypothesis (H1).
Steps 2 and 3: Compute the relevant test statistic.
a. Z-test: when the sample size exceeds 30, the statistic Z = ( ^β − β)/SE( ^β ) follows (approximately) the standard normal distribution.
b. t-test: when the sample size is less than 30, we derive the t-statistic of the OLS estimates, t = ( ^β − β)/SE( ^β ). Since we have two parameters in the simple linear regression with a nonzero intercept, our degrees of freedom are n − 2; a test statistic derived in this way can be shown to follow a t-distribution with n − 2 degrees of freedom.
c. The same logic holds for the F-distribution.
Step 4: Select (specify) a suitable significance level (α). In making a decision one can never be 100% sure of being right. We choose a level of significance for deciding whether to accept or reject our hypothesis. We are liable to commit one of the following types of error:
Type I error: we reject the null hypothesis when it is actually true.
Type II error: we accept the null hypothesis when it is actually false.
In accepting or rejecting a null hypothesis you are taking an α percent chance of being wrong; that is, α is the probability of committing a Type I error, the risk of rejecting the hypothesis when it is actually true. It is customary in econometrics to choose the 5% or 1% level of significance. The logical interpretation is that in making our decision we allow (tolerate) being "wrong", i.e. rejecting H0 when it is actually true, five times in a hundred. Since we want the probability of being wrong to be small, we assign a low value and call it the level of significance of our test.
Step 5: Identify the relevant distribution and choose the location of the critical region.
After defining the relevant distribution, one should compute the range of values (confidence interval) of the test statistic which results in a decision to accept or reject the hypothesis with probability 100(1 − α)%. These are theoretical (tabular) values that define the boundary of the critical region or the acceptance region for the test of the OLS estimators at a certain degree of confidence. That is, we need some tabulated distribution with which to compare the estimated test statistic, and it is common to look it up in the standard table of the relevant distribution.
Obtain the critical value from the t-table or z-table. For a two-tail test the probability α is divided into two. Accordingly, obtain the critical value of z, called zc, at α/2 from the z-table if the number of observations is more than 30. Similarly, t-values with which to compare the test statistic can be obtained from the relevant t-table: the critical value tc = t(α/2) with n − 2 degrees of freedom for a two-tail test, from the t-table entry for n − 2 degrees of freedom at the given level of significance (the z-table is given in Appendix 2.1 and the t-table in Appendix 2.2). If we choose the 5% level of significance (α = 0.05), then each tail will include the area (probability) 0.025 (α/2). In applied econometrics it is customary to perform a two-tail test.
Figure 2.11: One-tailed test
The one-tailed test can be used when we have a strong a priori or theoretical expectation regarding the sign of the coefficient of the economic relation (or expectations based on some previous empirical work), so that the alternative hypothesis is one-sided or unidirectional rather than two-sided. In that case we use, for instance, the t-table entry in Annex 2.2 for n − 2 degrees of freedom at the given level of significance α (rather than α/2).
Step 6: Make a decision and draw a conclusion. The decision to accept or reject the hypothesis is made by comparing the test statistic with the critical region. For the z-test, this means comparing the value of z obtained from the sample estimates with the critical value of z at the given significance level: if the empirical value of z falls in the critical region, we reject the null hypothesis, because the probability of observing such a z value, if our hypothesis were true, is very small. For the t-test we likewise compare t* (the computed value of t) and tc (the critical value of t).
Two-tailed test: if the calculated t-statistic (t*) falls in the critical region, that is, if |t*| > tc, reject the null hypothesis and accept H1; the conclusion is that β̂ is statistically significant. Otherwise accept H0.
One-tailed test: if t* < tc, accept H0 and reject H1; the conclusion is that β̂ is statistically insignificant. But if the calculated t-statistic (t*) falls in the shaded area of the figure (the critical region), that is, if t* > tc, reject H0 and accept H1. We conclude that β is statistically greater than 0 (if the alternative had been β > 0).
For the z-distribution: compare zc (the critical value of z) and z* (the computed value of z). If z* > zc, reject H0 and accept H1; the conclusion is that β̂ is statistically significant. If z* < zc, accept H0 and reject H1; the conclusion is that β̂ is statistically insignificant.
i. Two tailed
The procedure for two-sided alternative testing is as follows.
Step 1: State H0: βi = β* against H1: βi ≠ β*.
Steps 2–4: Compute the test statistic and obtain the critical value tc = t(α/2) with n − 2 degrees of freedom, such that the area to the right of it is one half the level of significance.
Step 5: Reject H0 if |t*| > tc and conclude that β̂ is significantly different from the hypothesized value at the α level; accept H0 otherwise.
ii. One tailed
The procedure for one-sided alternative testing is the same except for the critical value.
Step 1: State H0: βi = β* against the one-sided alternative H1: βi > β* (or βi < β*).
Steps 2–4: Compute the test statistic and obtain the critical value tc = t(α) with n − 2 degrees of freedom, the point such that the area to the right of it is equal to the level of significance.
Step 5: Reject H0 if t* > tc and conclude that β̂ is significantly different from the hypothesized value at the α level; accept H0 otherwise.
Example:
Suppose that from a sample of size n = 20 observations we estimated the following consumption function:
Ĉ = 100 + 0.70Y
      (75.5)  (0.21)
The values in the brackets are standard errors. Given the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0, test the statistical significance of the slope at the 5% level of significance.
a. The t-value for the test statistic is:
t* = (β̂ − 0)/SE(β̂) = 0.70/0.21 = 3.3
b. Since the alternative hypothesis (H1) is stated by an inequality sign (≠), it is a two-tail test; we divide α = 0.05 by 2 to obtain the critical value of t at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, tc at the 0.025 level of significance with 18 df is 2.10.
c. Since t* = 3.3 and tc = 2.1, t* > tc, which implies that β̂ is statistically significant.
d. Decision: we reject H0 and accept H1.
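As a quick check, the critical value and the computed statistic above can be reproduced in Stata (a sketch using the example's own numbers):

display 0.70/0.21               // computed t* = 3.33
display invttail(18, 0.025)     // two-tail 5% critical value with 18 df, 2.10
display 2*ttail(18, 0.70/0.21)  // the corresponding two-sided p-value, about 0.004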
In practice, there is no need to estimate the confidence interval explicitly. One can compute the test statistic and see whether it lies within the acceptance or rejection (critical) region. We can summarize the t-test-of-significance approach for hypothesis testing as shown in Table 2.12.

Table 2.12: Decision rules for the t test of significance
Type of hypothesis    H0          H1          Reject H0 if
Two-tail              β2 = β2*    β2 ≠ β2*    |t*| > t(α/2), n − 2 df
Right-tail            β2 = β2*    β2 > β2*    t* > t(α), n − 2 df
Left-tail             β2 = β2*    β2 < β2*    t* < −t(α), n − 2 df
Notes: β2* is the hypothesized numerical value of β2; |t| means the absolute value of t; t(α) or t(α/2) means the critical t value at the α or α/2 level of significance with (n − 2) degrees of freedom for the two-variable model. The same procedure holds to test hypotheses about β1 and to undertake the z-test.
Example-7: Suppose that we have estimated a supply function from a sample of 700 observations, and wish to test H0: β0 = 0 against the alternative H1: β0 ≠ 0 for the intercept. Since the sample is large, the test statistic is
z* = (β̂i − βi)/SE(β̂i)
which in our case, with βi = 0 under H0, becomes z* = β̂i/SE(β̂i).
2. Show that if a coefficient is significant at the 1% level, it will also be significant at any level higher than that.
3. Show that if a coefficient is insignificant at the 10% level, it will not be significant at any level lower than that.
In the confidence-interval procedure we try to establish a range or an interval that has a certain probability of including the true but unknown βi, and check whether the interval includes the hypothesized value of the parameter. Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed. For instance, suppose we estimated a parameter and found it to be, say, 0.93, with a "95% confidence interval" of (0.77, 1.09). We are then 95% confident that the interval contains the true (but unknown) value of the parameter.
With probability 1 − α, βi will lie within the defined limits, the confidence interval β̂i ± t(α/2)·se(β̂i) with n − K degrees of freedom. The decision rule is: if the hypothesized value of βi in the null hypothesis is within the confidence interval, accept H0 and reject H1; the implication is that β̂i is statistically insignificant, i.e. the estimate is not statistically different from zero. If the hypothesized value of βi in the null hypothesis is outside the interval, reject H0 and accept H1.
Numerical Example:
Suppose we have estimated the following regression line from a sample of 20 observations:
Ŷ = 128.5 + 2.88X + e
       (38.2)   (0.85)
The values in the brackets are standard errors. Then:
a. Construct a 95% confidence interval for the slope parameter.
b. Test the significance of the slope parameter using the constructed confidence interval.
Solution:
a. The limits within which the true β lies at the 95% confidence level are β̂ ± SE(β̂)·tc, with β̂ = 2.88, SE(β̂) = 0.85, and tc = 2.10 at the 0.025 level of significance and 18 degrees of freedom:
β̂ ± SE(β̂)·tc = 2.88 ± 2.10(0.85) = 2.88 ± 1.79
The confidence interval is (1.09, 4.67).
b. The value of β in the null hypothesis is zero, which lies outside the confidence interval. Hence we reject H0: the slope parameter is statistically significant.
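The interval itself is easy to reproduce; a minimal sketch in Stata, using the estimate and standard error from the example:

display 2.88 - invttail(18, 0.025)*0.85   // lower limit, about 1.09
display 2.88 + invttail(18, 0.025)*0.85   // upper limit, about 4.67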
The p-value approach involves two steps. First, calculate the probability that the random variable t is greater than the observed t*, that is, p-value = P(t > t*). This probability is the same as the probability of a Type I error, the probability of erroneously rejecting a true hypothesis. A high value for this probability implies that the consequences of rejecting a true H0 are severe. A low p-value implies that the consequences of erroneously rejecting a true H0 are not very severe; that is, the probability of making a Type I error is low, and hence we are "safe" in rejecting H0.
Second, compare it with the selected level of significance (α) and make a decision. The decision rule is therefore to accept H0 (that is, not reject it) if the p-value is too high, say more than 0.10, 0.05, or 0.01, and reject it otherwise. In other words, if the p-value is higher than the specified level of significance (say α), we conclude that the regression coefficient is not significantly greater than zero, or not significantly different from the null-hypothesis value, at the α level of significance. If the p-value is less than α, we reject H0 and conclude that β is significantly greater than zero.
How high or low is too high or too low is determined by the investigator. The modified steps for the p-value approach are as follows:
i. One tailed
Step 1: State H0 and the one-sided alternative H1.
Steps 2–4: Compute the test statistic and its p-value, P(t > t*).
Step 5: Reject H0 if the p-value is less than α, the level of significance; that is, conclude that the coefficient is statistically significant if the p-value is less than the given level of significance (α).
ii. Two tailed
Step 1: State H0 and the two-sided alternative H1.
Steps 2–4: Compute the test statistic and the two-sided p-value, 2P(t > |t*|).
Step 5: Reject H0 if the two-sided p-value is less than α, the level of significance.
Example: Using our data in Table 2.13 on consumption expenditure and income, the following result is obtained:
. reg Y X
The test based on the p-values is that both the intercept and the slope coefficient are statistically significant, each with a p-value of approximately 0.000.
Note that the t-test and the p-value approach are equivalent. We note from Figure 3.5 that if the p-value is less than the level α, i.e. P(t > t*) < α, then the point corresponding to t* must necessarily lie to the right of tc = t(α/2) with n − 2 degrees of freedom. This means that t* falls in the rejection region. Similarly, if the p-value exceeds α, then t* falls in the acceptance region.
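The equivalence is easy to verify numerically; a sketch for the earlier consumption example (t* = 3.3, 18 df), assuming a 5% level:

display invttail(18, 0.025)   // critical value 2.10; t* = 3.3 exceeds it
display 2*ttail(18, 3.3)      // two-sided p-value, about 0.004, below 0.05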
Although a test for the significance of the error variance σ² is not common, we present it here for completeness.
Step 1: State the null hypothesis H0: σ² = σ0² against the alternative.
Step 2: Compute the test statistic χ²* = df·σ̂²/σ0².
Step 3: From the chi-square table in Appendix 2.3.C, look up the critical value χ² with n − 2 degrees of freedom at level α.
Step 4: Reject H0 at the α level if χ²* exceeds the critical value.
The decision rules can be summarized as follows:
H0           H1           Reject H0 if
σ² = σ0²     σ² > σ0²     df·σ̂²/σ0² > χ²(α), df
σ² = σ0²     σ² < σ0²     df·σ̂²/σ0² < χ²(1 − α), df
σ² = σ0²     σ² ≠ σ0²     df·σ̂²/σ0² > χ²(α/2), df, or < χ²(1 − α/2), df
α is also known as the probability of committing a Type I error. A Type I error consists in rejecting a true hypothesis, whereas a Type II error consists in accepting a false hypothesis.
Note: σ0² is the value of σ² under the null hypothesis. The first subscript on χ² in the last column is the level of significance, and the second subscript is the degrees of freedom. These are critical chi-square values.
Example 4: If σ̂² = 42.1591 and df = 8, construct a confidence interval for σ², taking α = 5%, and interpret it. Suppose we postulate H0: σ² = 85 against H1: σ² ≠ 85. The equation for χ² in 2.90 provides the test statistic for H0. Substituting the appropriate values, it can be found that under H0, χ²* = 8(42.1591)/85 ≈ 3.97. If we take α = 5%, the critical values are 2.1797 and 17.5346. Since the computed χ²* lies between these limits, the data support the null hypothesis and we do not reject it. This test procedure is called the chi-square test of significance.
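The computations in Example 4 can be verified with Stata's chi-square functions (a sketch using the numbers above):

display 8*42.1591/85        // test statistic, about 3.97
display invchi2(8, 0.025)   // lower critical value, 2.1797
display invchi2(8, 0.975)   // upper critical value, 17.5346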
The overall significance of the regression can be judged by studying the components of the total variation. A study of these components of TSS is known as analysis of variance (ANOVA) from the regression viewpoint. It introduces the learner to an illuminating and complementary way of looking at statistical inference problems. In the previous section (equation 2.131) we developed the identity
TSS = ESS + RSS
Σyi² = β̂1²Σxi² + Σei² ………2.141
(total variation = explained variation + unexplained variation).
The F-test is used to judge the overall (joint) significance of the parameters: it tests the null hypothesis H0: β0 = β1 = 0 against the alternative that H0 is not true. The F-ratio provides a test statistic for this null hypothesis.
Each sum of squares has associated with it a number of degrees of freedom, the number of independent observations on which it is based. For the regression with two variables, TSS has n − 1 degrees of freedom, as it loses one degree of freedom in computing the sample mean Ȳ; ESS has k − 1 = 1 degree of freedom, since only one slope parameter lies behind the explained variation; and RSS has n − 2 degrees of freedom. All that needs to be done is to compute the F-ratio and compare it with the critical F value obtained from the F-table at the chosen level of significance, or to obtain the p-value of the computed F statistic. To this end we compare the calculated F-value (F*) with the tabulated F value (FT). The F* ratio (the observed variance ratio) is obtained by dividing the two mean squares, ESS (between the means) and RSS (within the sample), each by its degrees of freedom; these mean squares appear in the last column of the analysis of variance (ANOVA) table.
Let us arrange the various sums of squares and their associated degrees of freedom in the following table, which is the standard form of the ANOVA table.
Table 2.15: ANOVA table
Source of variation        SS          df       MSS
Due to regression (ESS)    β̂1²Σxi²     k − 1    β̂1²Σxi²/(k − 1)
Due to residuals (RSS)     Σei²        n − k    Σei²/(n − k)
Total (TSS)                Σyi²        n − 1
Symbolically,
F*(k − 1, n − k) = [ESS/(k − 1)] / [RSS/(n − k)] ………2.142
Suppose you are given the following regression line and intermediate results for 25 sample observations:
Ŷ = 89 + 2.88X
Se    (38.4)  (0.85)
r² = 0.76, Σei² = 135
In order to compile the ANOVA table based on the information given, we need to get ESS (between the means) and TSS as follows. Since
r² = 1 − Σei²/Σyi²
we have 1 − 0.76 = 0.24 = 135/Σyi², so that 0.24·Σyi² = 135. Thus the TSS is obtained as
TSS = Σyi² = 135/0.24 = 562.5
and, since ESS = TSS − RSS,
ESS = 562.5 − 135 = 427.5
To appraise the findings from the regression analysis, construct the ANOVA table, obtain F*, and compare the value of F* with the tabulated F value as follows.
Table 2.16: ANOVA example
Source of variation   SS      df    MSS
ESS                   427.5   1     427.5
RSS                   135     23    5.87
TSS                   562.5   24
F* = 427.5/5.87 ≈ 72.8
The decision rule for the F test is to reject the null hypothesis if F* > F0.05. Since the tabulated F value F0.05(1, 23) is about 4.28 and F* is about 72.8, we reject the null hypothesis. This implies that the parameter estimates are statistically significant.
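A sketch verifying the F computation and critical value in Stata (numbers taken from the example):

display (562.5 - 135)/(135/23)              // F* = 72.8
display invFtail(1, 23, 0.05)               // 5% critical value F(1,23) = 4.28
display Ftail(1, 23, (562.5-135)/(135/23))  // p-value of F*, essentially zero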
If the null hypothesis is true, ESS should be small in relation to TSS and the computed value of the statistic should fall near the center of the distribution. A value in the right tail of the distribution arises when ESS is large relative to RSS and is evidence against the null hypothesis. H0 is rejected at significance level α if F* ≥ F(α)(1, n − 2). Notice that although the alternative hypothesis β1 ≠ 0 is two-sided, the rejection region is one-tailed. It would not make sense to reject H0 on the basis of a value of the statistic in the left tail of the distribution: a small ESS relative to RSS is evidence consistent with β1 = 0, not against it.
Figure 4.5: The F distribution, with the tabulated value cutting off a tail area of α
The F* ratio, F* = [Σŷ²/(K − 1)] / [Σe²/(n − K)], can equivalently be expressed in terms of r² between Y and X as F* = [r²/(K − 1)] / [(1 − r²)/(n − K)].
The F and t tests are formally equivalent. For the null hypothesis β = 0, the t test is based on the computed statistic t = β̂/s(β̂). Using previously established results and notation,
F(1, v) = [ESS/1] / [RSS/(n − 2)] = β̂²Σxi²/s² = β̂² / (s²/Σxi²) = β̂²/s²(β̂) = t²(v) ………2.144
It is always the case that F(1, v) = t²(v). Furthermore, a similar relationship exists between the tabulated values of the two distributions for v degrees of freedom. This means that the outcomes of the two test procedures are always consistent, given a chosen level of significance.
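The relationship between the tabulated values is easy to confirm numerically; for example, with v = 23 degrees of freedom at the 5% level:

display invttail(23, 0.025)^2   // squared two-tail t critical value, 4.28
display invFtail(1, 23, 0.05)   // equals the F(1,23) critical value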
5. Regression analysis is more powerful than ANOVA when analyzing market data, which are not experimental. Regression analysis provides all the information which we may obtain from the method of ANOVA. Furthermore, it provides numerical estimates of the influence of each explanatory variable, whereas the ANOVA approach shows only the addition to the explanation of the total variation obtained by introducing an additional variable into the relationship.
6. It is often argued that the ANOVA approach is more appropriate for the study of the influence of qualitative variables, because qualitative variables (like profession, sex, and religion) do not have numerical values and hence their influence cannot be measured by regression analysis, while the ANOVA technique depends solely on the values of Y and does not require knowledge of the values of the X's. This argument, however, lost much of its merit with the expansion of dummy variables in regression analysis.
2.8. Prediction/forecasting
One of the main objectives of a regression analysis is prediction. A regression equation is used to predict values of the dependent variable given the value of an independent/exogenous variable, i.e. E(Y|X). For instance, using a sample from the population we have estimated the sample regression line for consumption expenditure as
Ŷ = 30.71 + 0.817X
where Y is consumption expenditure and X is income. If X = 1000, then Ŷ = 30.71 + 0.817(1000) = 847.71. That is, the estimated average expenditure when income is 1,000 birr is 847.71 birr.
There are two types of predictions: mean prediction and individual prediction. Given the CLRMA, it is possible to construct confidence intervals for such predictions. A confidence interval for the regression prediction provides a way of assessing the quality of the prediction.
The difference between the actual and the predicted value is the forecast error. To assess this error we need to find the sampling distribution of Ŷ0. It is also common to distinguish the individual response from the average response; the difference is related to the variance of individuals about the average response.
Let X0 be the given value of X. For the mean prediction we are interested in predicting the mean of Y given X0, that is, E(Y|X0) = β0 + β1X0, which we predict from the sample regression line by
Ŷ0 = β̂0 + β̂1X0
Variance: because β̂0 and β̂1 are estimated with imprecision, the predictor Ŷ0 is also subject to error. To take this into account,
var(Ŷ0) = var(β̂0) + X0²·var(β̂1) + 2X0·cov(β̂0, β̂1) = σ²[1/n + (X0 − X̄)²/Σxi²]
The variance increases as the deviation of X0 from the mean X̄ increases. The confidence interval for the mean prediction is
Pr[(β̂0 + β̂1X0) − SE(Ŷ0)·t(α/2) ≤ E(Y|X0) ≤ (β̂0 + β̂1X0) + SE(Ŷ0)·t(α/2)] = 1 − α ………2.147
where t(α/2) is the critical value of the t-distribution obtained earlier. Note that when X0 is farther away from the mean X̄, the variance is larger and the corresponding confidence interval is wider. This means that if a forecast is made too far outside the sample range, the reliability of the forecast decreases.
Individual prediction concerns the individual Y value corresponding to X0. Suppose we have a sample regression equation corresponding to the population regression equation specified as follows:
PRF: Yi = β0 + β1Xi + ui, for i = 1, 2, 3, …, n, with E(Y|Xi) = β0 + β1Xi
SRF: Ŷ0 = β̂0 + β̂1X0 ………2.146
The sample regression function corresponding to the population regression function gives the predicted value of Y. The predicted value Ŷ0 will not normally coincide with the true value Y0. The difference between the predicted Y and the actual Y is termed the prediction error or forecast error:
f = Y0 − Ŷ0 = (β0 + β1X0 + u0) − (β̂0 + β̂1X0) = (β0 − β̂0) + (β1 − β̂1)X0 + u0
In the equation above, (β0 − β̂0) and (β1 − β̂1) are the sampling errors made in estimating the unknown parameters β0 and β1; these errors tend to decline as we increase the sample size. Under the classical assumptions the estimators β̂0, β̂1 and the disturbance ui are all normally distributed, so the forecast error, being a linear function of them, is also normally distributed, with:
a. Mean: since E(β̂0) = β0, E(β̂1) = β1 and E(u0) = 0, it follows that
E(f) = (β0 − E(β̂0)) + (β1 − E(β̂1))X0 + E(u0) = 0
The mean of the sampling distribution of the forecast error is zero: the OLS prediction is an unbiased predictor of Y, i.e. E(Ŷ0 − Y0) = 0.
b. Variance: the variance of the forecast error decomposes as
var(f) = E(Y0 − Ŷ0)² = E[Ŷ0 − E(Y0)]² + σ²
where the first term is the variance of the predictor about its mean,
E[Ŷ0 − E(Y0)]² = σ²[1/n + (X0 − X̄)²/Σxi²]
so that
σf² = var(f) = σ²[1 + 1/n + (X0 − X̄)²/Σxi²] ………2.147
Alternatively, the variance of the prediction error can be derived directly:
var(Ŷ0 − Y0) = var(β̂0 − β0) + X0²·var(β̂1 − β1) + 2X0·cov(β̂0 − β0, β̂1 − β1) + var(u0)
But var(β̂0 − β0) = σ²ΣXi²/(nΣxi²), var(β̂1 − β1) = σ²/Σxi², and cov(β̂0 − β0, β̂1 − β1) = −X̄σ²/Σxi², so that
var(Ŷ0 − Y0) = σ²[1 + 1/n + (X0 − X̄)²/Σxi²] ………2.148
The variance increases as the deviation of X0 from the mean of the observations, on the basis of which β̂0 and β̂1 have been computed, increases. That is, prediction is more precise for values nearer to the mean (as compared to extreme values). This shows that under the classical assumptions the forecast error is normally distributed:
f/σf = (Y0 − Ŷ0)/σf follows the N(0, 1) distribution.
But we do not have σ²; replacing it by its unbiased estimator S², we obtain an unbiased estimator of the variance of the forecast error,
Sf² = S²[1 + 1/n + (X0 − X̄)²/Σxi²] ………2.149
and f/Sf = (Y0 − Ŷ0)/Sf follows the Student t distribution with n − 2 degrees of freedom.
A general confidence interval can then be constructed using t-tests, both for within-sample prediction (interpolation), when X0 lies within the range of the sample observations on X, and for out-of-sample prediction (extrapolation), when X0 lies outside that range; the latter, however, is not recommended. On the basis of the above distribution we are able to construct a confidence interval for the unknown Y0.
Example: Using our example on consumption expenditure:
A. Mean prediction: when X0 = 100, we predict E(Y|X0 = 100):
Ŷ0 = β̂0 + β̂1X0 = 75.3645
var(Ŷ0) = σ̂²[1/n + (X0 − X̄)²/Σxi²] = 42.159[1/10 + (100 − 170)²/33,000] = 10.476
so SE(Ŷ0) = 3.2366. Therefore the 95% confidence interval for E(Y|X0) is
Pr[(β̂0 + β̂1X0) − SE(Ŷ0)·t(α/2) ≤ E(Y|X0) ≤ (β̂0 + β̂1X0) + SE(Ŷ0)·t(α/2)] = 0.95
Pr[75.3645 − 2.306(3.2366) ≤ E(Y|X0 = 100) ≤ 75.3645 + 2.306(3.2366)] = 0.95
Pr[67.90 ≤ E(Y|X0 = 100) ≤ 82.83] = 0.95
Example: Given an estimated sales equation, where Yi is the average value of sales for firms and Xi is advertising expense, predict (a) the value of sales for a firm, and (b) the average value of sales for firms, with an advertising expense of six hundred Birr, given n = 10, X̄ = 8, Σxi² = 28 and σ̂ = 1.35.
Solution:
a. For an individual firm, the relevant standard error includes the disturbance term:
se(ŶP*) = σ̂·√[1 + 1/n + (X0 − X̄)²/Σxi²] = 1.35·√[1 + 1/10 + (6 − 8)²/28] = 1.35(1.115) ≈ 1.505
b. For the average value of sales (mean prediction), the disturbance term drops out:
se(ŶP) = σ̂·√[1/n + (X0 − X̄)²/Σxi²] = 1.35·√[1/10 + (6 − 8)²/28] ≈ 0.665
Interval prediction: the 95% confidence interval for the mean is [6.56, 9.64].
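In Stata the two standard errors correspond to the stdp (mean prediction) and stdf (individual forecast) options of predict. A hypothetical sketch on the built-in auto data (variable names are illustrative, not from the example):

sysuse auto, clear
regress price mpg
predict yhat                  // point prediction
predict se_mean, stdp         // SE of the mean prediction
predict se_ind, stdf          // SE of the individual forecast (includes u)
gen lo = yhat - invttail(e(df_r), 0.025)*se_ind
gen hi = yhat + invttail(e(df_r), 0.025)*se_ind
list price yhat lo hi in 1/5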
Now that we have presented the results of the regression analysis of our consumption–income example above, we would like to question the adequacy of the fitted model. How "good" is the fitted model? We need some criteria with which to answer this question. First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? A priori, β1, the marginal propensity to consume (MPC) in the consumption function, should be positive; in the present example it is. Second, if theory says that the relationship should be positive, and we found it to be so, is it statistically significant? The MPC is not only positive but also statistically significantly different from zero; the p-value of the estimated t value is extremely small. The same comments apply to the intercept coefficient. Third, how well does the regression model explain the variation in consumption expenditure? One can use R² to answer this question. In the present example R² is about 0.96, which is a very high value considering that R² can be at most 1.
Thus, the model we have chosen for explaining consumption expenditure behavior seems quite good. We should also see whether our model satisfies the assumptions of the CNLRM. We will not look at the various assumptions now because the model is patently simple. But there is one assumption that we would like to check, namely, the normality of the disturbance term ui. Recall that the t and F tests used require that the error term follow the normal distribution; otherwise, the testing procedure will not be valid in small, or finite, samples.
Normality tests: although several tests of normality are discussed in the literature, we will consider just three: the histogram of residuals, the normal probability plot (NPP), and the Jarque–Bera (JB) test.
1. Histogram of residuals
A histogram of residuals is a simple graphic device used to learn something about the shape of the probability density function (PDF) of a random variable. On the horizontal axis, we divide the values of the variable of interest (the OLS residuals) into suitable intervals, and over each class interval we erect rectangles equal in height to the number of observations (frequency) in that interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether the normal (PDF) approximation may be appropriate. It is always good practice to plot the histogram of the residuals as a rough-and-ready method of testing the normality assumption.
2. Normal probability plot (NPP)
A comparatively simple graphic device to study the shape of the probability density function (PDF) of a random variable is the normal probability plot (NPP), which makes use of normal probability paper. On such specially designed graph paper, the horizontal (x) axis shows values of the variable of interest (say the OLS residuals ûi) and the vertical (y) axis shows the expected value of this variable if it were normally distributed. Therefore, if the variable is in fact from a normal population, the NPP will be approximately a straight line. Stated the other way, if the fitted line in the NPP is approximately a straight line, one can conclude that the variable of interest is normally distributed.
3. Jarque–Bera (JB) test
The JB test of normality is an asymptotic, or large-sample, test, and is also based on the OLS residuals. The test first computes the skewness (S) and kurtosis (K) of the OLS residuals and uses the following test statistic:
JB = n[S²/6 + (K − 3)²/24] ………2.150
Example: using the consumption data, the estimated result may be summarized and presented as follows:
Ŷ = 128.5 + 2.88X
       (38.2)   (0.85)    R² = 0.93, n = 25
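All three checks are available after estimation in Stata; a sketch on hypothetical data (sktest is Stata's skewness–kurtosis normality test, in the same spirit as Jarque–Bera):

sysuse auto, clear
regress price mpg
predict uhat, residuals
histogram uhat, normal   // 1. histogram with a normal curve overlaid
pnorm uhat               // 2. standardized normal probability plot
sktest uhat              // 3. skewness-kurtosis test of normality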
Alternative approach
When the only restrictions likely to be of interest are β0 = 0 and/or β1 = 0, researchers sometimes report the values of the t-ratios β̂0/se(β̂0) and β̂1/se(β̂1) in parentheses under the coefficients, rather than the standard errors. In this way the reader can observe immediately whether the t-ratio exceeds the critical value corresponding to some chosen significance level:
Ŷ = β̂0 + β̂1X
      (t0)   (t1)    R² = __, n = __, d.f. = __, F = __ ………2.151
Sometimes the result is presented with more information, including the estimated coefficients, the corresponding standard errors, the p-values, and some other indicators, in tabular form:
Ŷ = β̂0 + β̂1X
      (s.e.0)    (s.e.1)    R² = __, n = __, d.f. = __, F = __
      (t0)       (t1)
      (p-value)  (p-value) ………2.152
For example, the previous consumption expenditure–income regression result is reported as
Ŷ = 24.45 + 0.5091X
(s.e.)  (6.4138)  (0.0357)    R² = 0.9621, d.f. = 8, F(1, 8) = 202.87
(t)     (3.8128)  (14.2605)
(p)     (0.0026)  (0.0000) ………2.153
In Eq. (2.153) the figures in the first set of parentheses are the estimated standard errors of the regression coefficients; in the second set are the estimated t values, computed under the null hypothesis that the true population value of each regression coefficient individually is zero; and in the third set are the estimated p-values. Accordingly, for each coefficient they are computed as
t0 = (24.45 − 0)/6.4138 = 3.8128
t1 = (0.5091 − 0)/0.0357 = 14.2605
One can compare these p-values with the chosen level of significance. Usually, the "2-t" rule of thumb is used for the 95% level of confidence.
By presenting the p-values of the estimated t coefficients, we can see at once the exact level of significance of each estimated t value. Thus, under the null hypothesis that the true population intercept value is zero, the exact probability (i.e., the p-value) of obtaining a t value of 3.8128 or greater is only about 0.0026 for 8 df, which is less than 0.05 (the level of significance). We reject H0, in the sense that the sample evidence enables us to reject the claim that there is no relationship between the two variables. If we reject this null hypothesis, the probability of committing a Type I error is about 26 in 10,000, a very small probability indeed. For β1, the probability of obtaining a t-value of 14.2605 or greater is about 0.000003, which is very small indeed.
If the true MPC were in fact zero, our chances of obtaining an MPC of 0.5091 would be practically zero. Our data say that the p-value is close to zero, i.e. the probability of a Type I error is very small. Hence we can reject the null hypothesis that the true MPC is zero: it is statistically significant.
Earlier we showed the intimate connection between the F and t statistics, namely F(1, k) = t²(k). Under the null hypothesis that the true β1 = 0, the F value is 202.87 (for 1 numerator and 8 denominator df) and the t value is about 14.24 (8 df); as expected, the former value is the square of the latter, except for rounding errors.
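The reported t-ratios and p-values in Eq. (2.153) can be reproduced directly (a sketch using the reported coefficients and standard errors):

display 24.45/6.4138             // t0 = 3.8128
display 0.5091/0.0357            // t1 = 14.2605
display ttail(8, 24.45/6.4138)   // P(t >= 3.8128) with 8 df, about 0.0026
display ttail(8, 0.5091/0.0357)  // P(t >= 14.2605), essentially zero
display (0.5091/0.0357)^2        // t1 squared, about 203, close to F = 202.87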
In computing elasticities from the regression line we use the estimated coefficients together with the sample means of the variables.
2.11. Applications
Using the data in Table 1.4 we have estimated the model and found the result indicated in the table below.
. reg Y X
Answers
a.
b. All the signs are in line with economic theory, but the coefficient should lie between zero and one; in this case it is above one.
c. Both the constant and the coefficient are statistically significant.
d. The independent variable has a relationship with the dependent variable.
A researcher wants to see the relationship between sales and advertising expenditure presented in Table 2.7; the following regression result is obtained.
Table 2.17: Regression result between sales and advertising
Summary
We have seen the simple linear regression model, in which there is a relationship between one dependent variable (say Y) and one independent variable (say X). The econometric model is specified as Yi = β0 + β1Xi + ui. The model has two components: a deterministic part, β0 + β1Xi, and a stochastic part, ui. The basic reasons for the random part are omission of variables from the regression, the random behavior of human beings, imperfect specification of the mathematical form of the model, errors of aggregation, unavailability of data, errors of measurement, poor proxy variables, the principle of parsimony, etc. Although the model is simplistic, and hence unrealistic in most real-world situations, a good understanding of its basic underpinnings will go a long way towards helping you understand more complex models.
The basic objectives are to estimate the relationship, if it exists, between variables, to forecast economic outcomes, and to use the model for policy purposes. To estimate the model we would in principle use population data, but in most practical situations we do not have full information on the population, owing to the unmanageability of the population and resource constraints on data collection. Hence, we use the sample regression function to estimate the population regression function.
To this end, it is best to follow the econometric methodology stated in chapter one. We start with the theoretical framework of an economic relationship and extend it to an econometric model with a set of assumptions related to the error term and its distribution. The next step is estimation of the numerical values of the parameters of the economic relationship. The three most commonly used methods of estimation are the ordinary least squares method (OLS), the maximum likelihood method (MLM), and the method of moments (MM). Having estimated the model, it should be evaluated. There are three criteria for evaluation: economic criteria, statistical criteria, and econometric criteria. Then hypothesis testing follows, in which we start with a hypothesized value for the estimator from prior experience or theory.
The aim of statistical inference is to draw conclusions about the population parameters from the sample statistics. Given the hypothesis, we obtain estimates of the parameters of the economic relationship and then test their statistical reliability by applying some rule which enables us to decide whether to accept or reject the estimates. There are several such rules: the standard error approach, the test of significance with confidence intervals, and the p-value approach. There are two mutually complementary approaches for devising such rules, namely, the confidence interval and the test of significance.
Key terms
Review exercise
Part I: choose the best answer and write it on the space provided
2. Regression analysis
a. Is concerned with the correlation of one variable with one or more other variables
b. Is concerned with statistical dependency among variables
c. predicts the average value of one variable on the basis of the fixed values of the other vari-
ables
d. B and C
e. All
f. None
3. Which one of the following is true about the desired properties of estimators as per the Gauss–Markov theorem?
a. linear
b. unbiased
c. minimum variance
d. all
e. none
4. Which one of the following is among the estimation methods for the coefficients of the simple linear model?
a. ordinary least square
b. method of moments
c. maximum likelihood method
d. all
e. none
5. Why is OLS the most powerful and popular method of estimation?
a. the estimators obtained using OLS method have some optimal properties
b. simple computational procedures
c. easily understandable
d. essential components of most econometrics technique
e. all
f. none
6. As per the OLS method:
a. the sum of the error terms is minimized
b. the sum of squares of the error terms is minimized
c. the sum of deviations of the error terms from their mean is minimized
d. all
e. none
7. What does R² measure?
a. it measures goodness of fit
b. the extent to which the regression model captures the total variation in Y
c. correlation between two variables
d. covariance between two variables
e. A and B
f. all
g. none
Part II: Do the following questions clearly
1. The following data have been obtained from a sample of 12 observations on the value of sales (Y) of a firm and the corresponding prices (X):
N   1    2    3    4    5    6    7    8    9    10   11   12
X   9    12   6    10   9    10   7    8    12   6    11   8
Y   69   76   52   56   57   77   58   55   67   53   72   64
Then:
a. Compute ΣY², ΣX², ΣXY, Σy², Σx², Σxy.
b. Assuming Y = B0 + B1Xi + Ui, obtain the OLS estimators of B0 and B1.
c. Write the fitted model and interpret the results.
d. Compute the variances of B̂0 and B̂1.
e. Compute R².
3. Given the simple linear regression model Y = B0 + B1Xi + Ui and all the classical linear regression assumptions, show that the OLS estimators satisfy the BLUE property; specifically, show that
a) B̂1 is linear;
b) B̂1 is unbiased;
c) B̂0 has minimum variance.
4. The following table gives the gross national product (X) and the expenditure on food (Y), measured in arbitrary units, in an underdeveloped country over the ten-year period 1960–69.
Table 2.19: Gross national product (X) and expenditure on food (Y)
Year   1960  1961  1962  1963  1964  1965  1966  1967  1968  1969
Y      6     7     8     10    8     9     10    9     11    10
X      50    52    55    59    57    58    62    65    68    70
a. Estimate the food function Y = β 1+ β 2X + u
b. What is the economic meaning of your results?
c. Compute the coefficient of determination and find the explained and unexplained varia-
tion in the food expenditure.
d. Compute the standard errors of the regression estimates and conduct tests of significance
at the 5 percent level of significance.
e. Find the 99 percent confidence interval for the population parameters.
5. Given the following estimated consumption function
Ĉt = 5,000 + 0.8Yt        r² = 0.95
        (500)    (0.09)       n = 15
Where C = consumption expenditure; Y = income.
a. Evaluate the above-estimated function on the basis of (i) the available economic theory,
(ii) statistical criteria r2 and t tests on the 𝛽’s. (use 𝛼 = 5 percent).
b. Estimate the savings function.
c. Estimate the MPC and MPS.
d. Interpret the constant intercepts of the consumption and saving functions.
e. Forecast the level of consumption and the level of saving for 1980, if in that year income
is $200,000.
Appendix 2.1: Proof of the BLUE property
The detailed proofs of these properties are presented below.
2.5.2.1. Linearity
Definition: An equation is linear with respect to a set of variables Z if it can be written as a weighted sum of those variables, with weights that do not depend on Z; otherwise it is non-linear with respect to Z.
Definition: An estimator is linear if the equation that defines it is linear with respect to the dependent variable Y; otherwise it is non-linear.
Proposition: The least squares estimator is linear, because we can rewrite its formula as a weighted sum of the Yi.
i. Linearity of β̂1: writing ki = xi/Σxi², we have β̂1 = Σxiyi/Σxi² = ΣkiYi, which is linear in Y, since the ki are fixed (the X's are non-stochastic).
ii. Is β̂0 linear in Y? Substituting β̂1 = ΣkiYi into β̂0 = Ȳ − β̂1X̄, we obtain β̂0 = Σ(1/n − X̄ki)Yi, which is likewise linear in Y.
2.5.2.2. Unbiasedness
E(θ̂) − θ is the amount of bias, and if θ̂ is an unbiased estimator of θ then the bias is 0, i.e. E(θ̂) − θ = 0, so that E(θ̂) = θ.
Unbiasedness is clearly desirable because it means that, on average, the estimated values equal the true value, even though in a particular sample this may not be so. Put another way, if an estimator is unbiased then the expected difference between θ̂ and θ is zero: the estimates do not tend to be too high or too low. In other words, if an estimator is unbiased, then although it probably does not take exactly the true value, it does not tend to be off in one particular direction or the other. If it is biased, then it tends to give an answer that is either too high (if the expected difference is positive) or too low (if the expected difference is negative).
A. Unbiasedness of β̂1
We know that β̂1 = ΣkiYi = Σki(β0 + β1Xi + ui) = β0Σki + β1ΣkiXi + Σkiui.
First,
Σki = Σxi/Σxi² = Σ(Xi − X̄)/Σxi² = (ΣXi − nX̄)/Σxi² = 0
because ΣXi = nX̄. Second, ΣkiXi = ΣxiXi/Σxi² = 1, since the numerator ΣxiXi equals Σxi². To prove that the numerator is equal to Σxi², follow the procedure below:
Σxi² = Σ(Xi − X̄)²
     = Σ(Xi² − 2XiX̄ + X̄²)
     = ΣXi² − 2X̄ΣXi + nX̄²
     = ΣXi² − 2X̄ΣXi + X̄ΣXi    (since nX̄ = ΣXi)
     = ΣXi² − X̄ΣXi
     = Σ(Xi − X̄)Xi = ΣxiXi
Therefore, equation 2.87 becomes
β̂1 = β1 + Σkiui
and, taking expectations (the ki are fixed and E(ui) = 0),
E(β̂1) = β1 + ΣkiE(ui) = β1 ………(2.90)
Equation 2.90 is thus read as: the mean of the OLS estimator is equal to the true value of the parameter β1. Therefore, β̂1 is an unbiased estimator of β1.
B. Unbiasedness of β̂0
From the proof of the linearity property above, we know that β̂0 = Σ(1/n − X̄ki)Yi. Substituting Yi = β0 + β1Xi + ui and using Σki = 0 and ΣkiXi = 1,
β̂0 − β0 = Σ(1/n − X̄ki)ui ………(A2:6)
so that E(β̂0) = β0 + Σ(1/n − X̄ki)E(ui) = β0: β̂0 is an unbiased estimator of β0.
2.5.2.3. Consistency
Definition: an estimator is consistent if, as the sample gets arbitrarily large, the difference between the estimated value and the true value gets arbitrarily small; otherwise it is inconsistent. This property assures us that as the data become large enough, the probability of a damagingly large error shrinks and becomes as close to 0 as we need it to be. It gives us some assurance that the estimate is in some sense close to the true value.
In intuitive terms, consistency is the property that the estimator converges to the true value as the sample size increases indefinitely. However, this property has its limits as well, because we never have an infinite amount of data, or even as large an amount as we would like to have. For finite samples of data, the probability of being in any given range around the true value is less than one, and we do not know exactly how much less. We would like to use an estimator that is as likely as possible to be close to the true value; even if that is not very likely, it is the best option we have if our estimator has a better chance than any other estimator. (An estimator that does not use the sample data at all, by contrast, is clearly not consistent, because increasing the sample size does not affect it.)
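Unbiasedness and consistency are easy to visualize by simulation. A minimal Monte Carlo sketch in Stata, assuming a hypothetical true slope of 0.5 (all names and values here are illustrative, not from the text):

capture program drop olssim
program define olssim, rclass
    clear
    set obs 50
    gen x = rnormal()
    gen y = 2 + 0.5*x + rnormal()   // hypothetical model; true slope is 0.5
    regress y x
    return scalar b = _b[x]
end
set seed 101
simulate b = r(b), reps(500) nodots: olssim
summarize b    // the mean of b is close to 0.5: the estimator is unbiased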
2.5.2.4. Efficiency (minimum variance of β̂0 and β̂1)
Now we have to establish that, out of the class of linear and unbiased estimators of β0 and β1, the OLS estimators possess the smallest sampling variances. For this, we shall first obtain the variances of β̂0 and β̂1, and then establish that each has the minimum variance in comparison with the variances of other linear and unbiased estimators obtained by any econometric method other than OLS.
I. Deriving the variances
First note that
Σki² = Σ(xi/Σxi²)² = Σxi²/(Σxi²)² = 1/Σxi²
so that
var(β̂1) = var(ΣkiYi) = Σki²·var(Yi) = σ²Σki² = σ²/Σxi² ………(A2:8)
For the intercept, recall that
β̂0 = Ȳ − β̂1X̄ = Σ(1/n − X̄ki)Yi ………(A2.9)
Therefore,
var(β̂0) = var[Σ(1/n − X̄ki)Yi] = Σ(1/n − X̄ki)²·var(Yi) ………(A2.12)
But var(Yi) = E[Yi − E(Yi)]² = E(ui²) = σu², so expanding the square,
var(β̂0) = σu²·Σ(1/n² − 2X̄ki/n + X̄²ki²)
        = σu²·(1/n − (2X̄/n)Σki + X̄²Σki²)
Since Σki = 0 and Σki² = 1/Σxi², we obtain
var(β̂0) = σu²·(1/n + X̄²/Σxi²) = σu²·(Σxi² + nX̄²)/(nΣxi²) ………(A2.13)
Since Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄², the numerator is ΣXi² − nX̄² + nX̄² = ΣXi², and therefore
var(β̂0) = σu²·ΣXi²/(nΣxi²) ………(A2.15)
To establish that β̂0 and β̂1 possess the minimum variance property, we compare their variances with the variances of some other alternative linear and unbiased estimators of β0 and β1, say β* and α*. We want to prove that any other linear and unbiased estimator of the true population parameter obtained from any other econometric method has a larger variance than the OLS estimators.
1. Minimum variance of β̂1
Let β* = ΣwiYi ………(A2.16)
where wi ≠ ki; write wi = ki + ci. Since Yi = α + βXi + Ui, unbiasedness of β* requires Σwi = 0 and ΣwiXi = 1. But
Σwi = Σ(ki + ci) = Σki + Σci, so Σci = 0, since Σki = Σwi = 0.
Again, ΣwiXi = Σ(ki + ci)Xi = ΣkiXi + ΣciXi, and since ΣwiXi = 1 and ΣkiXi = 1, it follows that ΣciXi = 0 (and hence Σcixi = 0).
To check whether β* has minimum variance, let us compute var(β*) and compare it with var(β̂1):
var(β*) = var(ΣwiYi) = Σwi²·var(Yi) = σ²Σwi², since var(Yi) = σ².
But Σwi² = Σ(ki + ci)² = Σki² + 2Σkici + Σci², and Σkici = Σcixi/Σxi² = 0, so Σwi² = Σki² + Σci². Therefore,
var(β*) = σ²Σki² + σ²Σci² = var(β̂1) + σ²Σci² ………(A2.17)
Given that the ci are arbitrary constants not all zero, σ²Σci² is positive, i.e. greater than zero. Thus var(β*) > var(β̂1). This proves that β̂1 possesses the minimum variance property. In a similar way we can prove that the least squares estimate of the constant intercept possesses minimum variance.
2. Minimum variance of α̂ (the intercept)
We take a new estimator α*, which we assume to be a linear and unbiased estimator of α. The least squares estimator α̂ is given by α̂ = Σ(1/n − X̄ki)Yi. By analogy with the proof of the minimum variance property of β̂, let us use the weights wi = ki + ci and consider
α* = Σ(1/n − X̄wi)Yi
Since we want α* to be an unbiased estimator of the true α, substituting Yi = α + βXi + ui shows that unbiasedness again requires Σwi = 0 and ΣwiXi = 1.
As in the case of β̂, we need to compute var(α*) and compare it with var(α̂):
var(α*) = var[Σ(1/n − X̄wi)Yi]
        = Σ(1/n − X̄wi)²·var(Yi)
        = σ²·Σ(1/n² − 2X̄wi/n + X̄²wi²)
        = σ²·(1/n + X̄²Σwi²), since Σwi = 0
But Σwi² = Σki² + Σci², so
var(α*) = σ²·(1/n + X̄²Σki²) + σ²X̄²Σci² = σ²·ΣXi²/(nΣxi²) + σ²X̄²Σci² ………(A2.18)
Since σ²X̄²Σci² > 0, we have var(α*) > var(α̂). Therefore, we have proved that the least squares estimators of the linear regression model are the best linear unbiased (BLU) estimators.
Appendix 2.2: Proof of the variance of the error term
There are three steps in the proof.
First, write
Y = Ŷ + ei ………(A2.19)
Second, express yi and ŷi in deviation form, as derived below. From the PRF, Yi = β0 + β1Xi + Ui, and averaging over the sample, Ȳ = β0 + β1X̄ + Ū. Subtracting,
yi = (Yi − Ȳ) = β1(Xi − X̄) + (Ui − Ū) = β1xi + (Ui − Ū) ………(A2.23)
We assumed earlier that E(u) = 0, i.e. over a very large number of samples we expect Ū to have a mean value of zero, but in any particular single sample Ū is not necessarily zero.
Similarly, from Ŷi = β̂0 + β̂1Xi and Ȳ = β̂0 + β̂1X̄ we get, by subtraction,
ŷi = Ŷi − Ȳ = β̂1(Xi − X̄), i.e.
ŷi = β̂1xi ………(A2.24)
Third, using ei = yi − ŷi = −(β̂1 − β1)xi + (ui − ū), square and sum over the sample:
Σei² = (β̂1 − β1)²Σxi² − 2(β̂1 − β1)Σxi(ui − ū) + Σ(ui − ū)²
and take the expectation of each term in turn, as follows.
a. The expectation of the last term:
E[Σ(ui − ū)²] = E(Σui² − nū²) = ΣE(ui²) − nE(ū²) = nσu² − n(σu²/n) = σu²(n − 1) ………(A2.26)
using E(uiuj) = 0 for i ≠ j, so that E(ū²) = (1/n²)·E(Σui² + Σ(i≠j) uiuj) = σu²/n.
b. The expectation of the first term:
E[(β̂1 − β1)²Σxi²] = Σxi²·E(β̂1 − β1)²
Given that the X's are fixed in all samples, E(β̂1 − β1)² = var(β̂1) = σu²/Σxi². Hence
Σxi²·E(β̂1 − β1)² = Σxi²·σu²/Σxi² = σu² ………(A2.27)
c. The expectation of the middle term:
−2E[(β̂1 − β1)Σxi(ui − ū)] = −2E[(β̂1 − β1)(Σxiui − ūΣxi)] = −2E[(β̂1 − β1)Σxiui], since Σxi = 0.
But from (2.22), β̂1 − β1 = Σkiui with ki = xi/Σxi², so
−2E[(β̂1 − β1)Σxiui] = −2E[(Σxiui)²/Σxi²]
                     = −(2/Σxi²)·E(Σxi²ui² + 2Σ(i≠j) xixjuiuj)
                     = −(2/Σxi²)·Σxi²·σu²    (given E(uiuj) = 0)
                     = −2σu² ………(2.119)
Consequently, combining (A2.26), (A2.27) and (2.119),
E(Σei²) = (n − 1)σu² + σu² − 2σu² = (n − 2)σu² ………(2.42)
From which we get
E[Σei²/(n − 2)] = E(σ̂u²) = σu² ………(2.43)
where Σei² is the sum of the squared residuals (the residual sum of squares), so that σ̂u² = Σei²/(n − 2) is an unbiased estimate of the true variance of the error term.
Here n − 2 is the number of degrees of freedom (df). The term "number of degrees of freedom" means the total number of observations in the sample (n) less the number of independent (linear) constraints or restrictions put on them; in other words, it is the number of independent observations out of a total of n observations. There are two conditions imposed by the normal equations in estimating β̂0 and β̂1, namely
Σei = 0 and ΣXiei = 0
so two degrees of freedom are lost before computing the RSS. Thus σ̂² = Σei²/(n − 2) is an unbiased estimate of the true variance of the error term. The conclusion we can draw from the above proof is that we can substitute σ̂² for σ² in the variance formulas:
Var(β̂1) = σ̂²/Σxi² = Σûi²/[(n − 2)Σxi²] ………(2.120)
Var(β̂0) = σ̂²·ΣXi²/(nΣxi²) = Σûi²·ΣXi²/[n(n − 2)Σxi²] ………(2.121)
Hence, the expected value of the estimated variance is equal to the true variance, and Σei² can be computed as
Σei² = Σyi² − β̂1Σxiyi
Appendix A

Appendix B
Chapter Three: The Multiple Linear Regression Model
In our consumption–income example, besides income, a number of other variables such as the wealth of consumers, family size, social norms, etc., are also likely to affect consumption expenditure. Therefore, we need to extend our simple two-variable regression model to cover more than two variables. Adding more variables leads us to the discussion of multiple regression models.
In this chapter we extend the general methodology of econometric analysis used in chapter two to the case with many variables. Accordingly, we start our discussion with the multiple linear regression framework with two and three explanatory variables, and then extend it to k explanatory variables. We then look at the assumptions of multiple regression, data, estimation, hypothesis testing, prediction and reporting. Furthermore, we proceed to generalize the multiple regression model using matrix algebra.
One easily extends the model to the multiple linear regression case by adding explanatory variables. Let us start with the simplest form of the theory of demand. Economic theory postulates that the quantity demanded (say Y) of a given commodity depends on its price (X1) and on consumers' income (X2). That is,
Y = f(X1, X2) ………3.2
Assuming that the relationship between Y, and X1 and X2, is linear, the mathematical model will be:
Y = β0 + β1X1 + β2X2 ………3.3
Equation 3.3 shows that the relationship between Y and the explanatory variables X1 and X2 is exact, in the sense that the variation in the quantity demanded of Y is fully explained by changes in price (X1) and income (X2).
B. Three explanatory variables
One can extend the logic in a similar manner to a model with four variables, in which there are one explained and three explanatory variables: the quantity demanded (say Y) of a given commodity depends on its own price (X1), consumers' income (X2), and the price of a substitute/complement good (X3). This can be written as
Y = f(X1, X2, X3) ………3.4
The corresponding mathematical model is
Yi = β0 + β1X1 + β2X2 + β3X3 ………3.5
C. k explanatory variables
Adding more variables leads us to the discussion of general multiple regression models:
Economic model: Yi = f(X1, X2, X3, …, Xk) ………3.6
Mathematical model: Yi = β0 + β1X1 + β2X2 + β3X3 + … + βkXk ………3.7
In reality, however, the relationship is not exact: unobserved factors also influence Y. For two explanatory variables, the mathematical model of equation 3.3 is therefore extended to the econometric model
Yi = β0 + β1X1 + β2X2 + Ui ………3.9
where Yi is quantity demanded, X1 is own price and X2 is consumers' income. In the three-variable case, Yi is quantity demanded, X1 is own price, X2 is consumers' income, X3 is the price of a substitute/complement good, the β's are unknown parameters and ui is the disturbance term.
However, the most logical way of writing the variables is to use two subscripts, Xij, where i indicates the row (observation) and j the column (variable). The coefficients should be linear of degree one. The representation with two subscripts in the equations below is designed to be converted into matrix form (the matrix form of the econometric model is relegated to the last part):
Two-variable case: Yi = β0 + β1X1i + β2X2i + ui ………3.12
Three-variable case: Yi = β0 + β1X1i + β2X2i + β3X3i + ui ………3.13
k-variable case (commonly used): Yi = β0 + β1X1i + β2X2i + β3X3i + … + βkXki + Ui ………3.14
Alternatively, if our data set has many observations with k explanatory variables, the general representation can be
Two-variable case: Yi = β0 + β1Xi1 + β2Xi2 + ui ………3.15
Three-variable case: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + ui ………3.16
k-variable case: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + … + βkXik + Ui ………3.17
where the Xj (j = 1, 2, 3, …, k) are the explanatory variables (in the case of time series data the subscript t denotes time), i denotes the ith observation, Yi is the dependent variable, the βj (j = 0, 1, 2, …, k) are unknown parameters and ui is the disturbance term.
The relationship among the econometric variables is not captured by one equation alone; writing one equation per observation gives the model in expanded form:
Y1 = β0 + β1X11 + β2X21 + β3X31 + … + βkXk1 + U1
Y2 = β0 + β1X12 + β2X22 + β3X32 + … + βkXk2 + U2
Y3 = β0 + β1X13 + β2X23 + β3X33 + … + βkXk3 + U3 ………3.18
 .        .         .         .                .
Yn = β0 + β1X1n + β2X2n + β3X3n + … + βkXkn + Un
The reasons why there is a need for more than one predictor variable may be many. First, if we add more factors to our model, more of the variation in Y can be explained; thus, multiple regression analysis can be used to build a better ground for predicting the dependent variable, and we want to know the individual effects, or the contribution of each independent variable, in explaining the variation in the response variable. Secondly, the predictors may themselves be correlated. In order to understand the nature of multiple regression analysis easily, we start our analysis with the case of two explanatory variables and then extend it to the case of k explanatory variables.
where the βi are the population parameters: β0 is referred to as the intercept, and β1 and β2 are also sometimes known as partial regression coefficients of the model. In words, (3.21) gives the conditional mean or expected value of Y conditional upon the given or fixed values of X1 and X2. β2, for example, measures the effect of a unit change in X2 on E(Y) when X1 is held constant. Therefore, as in the two-variable case, multiple regression analysis is conditional upon the fixed values of the regressors: what we obtain is the average or mean value of Y, the mean response of Y, for the given values of the regressors.
Since the population regression equation is unknown to any investigator, it has to be estimated from sample data. The sample counterpart of the above population regression function is
Yi = β̂0 + β̂1X1i + β̂2X2i + ei ………3.22
As usual, we can find the conditional mean of Y given the explanatory variables by taking the conditional expectation of both sides:
E(Yi | X1i, X2i) = β0 + β1X1i + β2X2i ………3.23
This gives the mean value of Yi for fixed values of the regressors.
190
3.3.3. Interpretation of the multiple regression equation
3.4. Estimation
3.4.1. Assumptions of the multiple regression model
In order to specify our multiple linear regression model and proceed with our analysis, some assumptions are compulsory. We continue to operate within the framework of the classical linear regression model (CLRMA) introduced in chapter two, with the additional compulsory assumption of no perfect multicollinearity. More specifically, these assumptions are:
1. Linearity of the model in the parameters: the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are themselves linear or not.
2. Randomness of the error term: the variable u is a real random variable.
3. Zero mean of the error term: the random variable ui has a zero mean for each value of Xi; that is, E(ui) = 0.
4. Homoscedasticity: the variance of each ui is the same for all values of Xi, i.e. E(ui²) = σu² (constant).
No perfect multicollinearity is also assumed: there is no exact linear relationship among the explanatory variables. Suppose, for example, that X2 = 2X1 exactly. Estimating the above regression would then yield only the combined effect of X1 and X2, represented by α = (β1 + 2β2), with no possibility of separating their individual effects β1 and β2.
This assumption does not guarantee that there will be no correlations among the explanatory variables; it only means that the correlations are not exact or perfect, as it is not impossible to find two or more (economic) variables that are correlated to some extent. Likewise, the assumption does not rule out non-linear relationships among the X's.
10. Correct model specification: the model has no specification error, in that all important explanatory variables appear explicitly in the function, and the mathematical form is correctly defined (linear or non-linear form, and the number of equations in the model).
11. No errors of measurement: the explanatory variables are measured without error.
We cannot exhaustively list all the assumptions, but the above are some of the basic assumptions that enable us to proceed with our analysis.
1. Which assumptions of the classical linear regression model are common to the simple linear regression model and the multiple regression model?
2. Why is the assumption of no perfect multicollinearity so important when dealing with multiple regression models?
Given the above assumption one can estimate the regression using OLS. To this end let’s estimate the
parameters of the three variable regression model specified above. Suppose that the sample data has been
used to estimate the population regression equation.
A. Actual approach
We leave the method of estimation unspecified for the present and merely assume that equation (3.23) has been estimated by the sample regression equation, which we write as:

$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$ ……………………………………………….(3.24)

where the $\hat{\beta}_j$ are estimates of the $\beta_j$ and $\hat{Y}$ is known as the predicted value of Y. Given sample observations on Y, X1 and X2, we estimate the model using the method of least squares (OLS):

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i$ …………………………………………….(3.25)
where $e_i = Y_i - \hat{Y}_i$. To obtain expressions for the least squares estimators, we partially differentiate $\sum e_i^2$ with respect to $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ and set the derivatives equal to zero:

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2\sum\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$ ……………………. (3.26)

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2\sum X_{1i}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$ ………………….. (3.27)

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2\sum X_{2i}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$ …………………. (3.28)

Expanding the summations in (3.26) to (3.28),

$\sum Y_i - n\hat{\beta}_0 - \hat{\beta}_1\sum X_{1i} - \hat{\beta}_2\sum X_{2i} = 0$

$\sum X_{1i}Y_i - \hat{\beta}_0\sum X_{1i} - \hat{\beta}_1\sum X_{1i}^2 - \hat{\beta}_2\sum X_{1i}X_{2i} = 0$

$\sum X_{2i}Y_i - \hat{\beta}_0\sum X_{2i} - \hat{\beta}_1\sum X_{1i}X_{2i} - \hat{\beta}_2\sum X_{2i}^2 = 0$
Rearranging equations 3.26, 3.27 and 3.28 produces the three normal equations:

$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1\sum X_{1i} + \hat{\beta}_2\sum X_{2i}$ ………………………………….….(3.30)

$\sum X_{1i}Y_i = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i}$ ………………………….….(3.31)

$\sum X_{2i}Y_i = \hat{\beta}_0\sum X_{2i} + \hat{\beta}_1\sum X_{1i}X_{2i} + \hat{\beta}_2\sum X_{2i}^2$ ………………………....(3.32)
From (3.30) we obtain $\hat{\beta}_0$:

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2$ ----------------------------------------------------------- (3.33)

Substituting (3.33) into (3.31) gives

$\sum X_{1i}Y_i - n\bar{X}_1\bar{Y} = \hat{\beta}_1\left(\sum X_{1i}^2 - n\bar{X}_1^2\right) + \hat{\beta}_2\left(\sum X_{1i}X_{2i} - n\bar{X}_1\bar{X}_2\right)$ ------- (3.34)

We know that, in deviation form,

$\sum x_i y_i = \sum X_iY_i - n\bar{X}\bar{Y}$ ......................3.34a

so we can rewrite equation 3.34, together with the analogous equation obtained from (3.32), in deviation form as

$\sum x_1 y = \hat{\beta}_1\sum x_1^2 + \hat{\beta}_2\sum x_1 x_2$ …………………………………………...…(3.35)

$\sum x_2 y = \hat{\beta}_1\sum x_1 x_2 + \hat{\beta}_2\sum x_2^2$ ………………………………………..……(3.36)

Thus the normal equations (3.30 to 3.32) can be written in deviation form, bringing (3.35) and (3.36) together, as

$\sum x_1 y = \hat{\beta}_1\sum x_1^2 + \hat{\beta}_2\sum x_1 x_2$ ……………………………………………….(3.37)

$\sum x_2 y = \hat{\beta}_1\sum x_1 x_2 + \hat{\beta}_2\sum x_2^2$ ……………………………………………….(3.38)

One can easily solve equations 3.37 and 3.38 for $\hat{\beta}_1$ and $\hat{\beta}_2$ using matrices. We can rewrite the two equations in matrix form as follows:

$\begin{bmatrix} \sum x_1^2 & \sum x_1x_2 \\ \sum x_1x_2 & \sum x_2^2 \end{bmatrix}\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_1y \\ \sum x_2y \end{bmatrix}$
B. Deviation form
Alternatively, it is customary to express these formulas in terms of the deviations of the sample observations from their mean values. Hoping that interested students can derive the formulas of the OLS estimates in deviation form, we present the procedure used in chapter 2 as follows. To express the OLS estimates in deviation form (deviations of the sample observations of the variables from their means), we first subtract the mean relation from the fitted relation:

$Y - \bar{Y} = y = (\hat{\beta}_0 - \hat{\beta}_0) + \hat{\beta}_1(X_1 - \bar{X}_1) + \hat{\beta}_2(X_2 - \bar{X}_2) + e_i$, i.e. $y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$ ………………………….….3.44
Next, the first-order conditions for minimization require

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = 0$ ……………….…….3.45

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = 0$ ……….…….….. 3.46

Rearranging, we obtain the normal system of equations with two explanatory variables, written in deviation form:

$\sum x_{1i}y_i = \hat{\beta}_1\sum x_{1i}^2 + \hat{\beta}_2\sum x_{1i}x_{2i}$ …………………………………..3.49

$\sum x_{2i}y_i = \hat{\beta}_1\sum x_{1i}x_{2i} + \hat{\beta}_2\sum x_{2i}^2$ …………………………………..3.50

The sums of squares and cross-products are the 'knowns', computed from the sample observations, while $\hat{\beta}_1$ and $\hat{\beta}_2$ are the unknowns. In matrix form the system may be written as

$\begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{bmatrix}$, i.e. $A\hat{\beta} = b$ ..................3.51

with determinant

$|A| = \begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix} = \sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2$ ..................3.52
Finally, by using Cramer's rule, the formulas for $\hat{\beta}_1$ and $\hat{\beta}_2$, in which the variables are expressed as deviations from their means, are

$\hat{\beta}_1 = \dfrac{\begin{vmatrix} \sum x_{1i}y_i & \sum x_{1i}x_{2i} \\ \sum x_{2i}y_i & \sum x_{2i}^2 \end{vmatrix}}{|A|} = \dfrac{\sum x_{1i}y_i\sum x_{2i}^2 - \sum x_{2i}y_i\sum x_{1i}x_{2i}}{\sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$

$\hat{\beta}_2 = \dfrac{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}y_i \\ \sum x_{1i}x_{2i} & \sum x_{2i}y_i \end{vmatrix}}{|A|} = \dfrac{\sum x_{2i}y_i\sum x_{1i}^2 - \sum x_{1i}y_i\sum x_{1i}x_{2i}}{\sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$

and, from (3.33), $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2$.
As mentioned earlier, the regression coefficients β1 and β2 are known as partial regression or partial slope coefficients. The meaning of a partial regression coefficient is as follows: β1 measures the change in the mean value of Y, E(Y), per unit change in X1, holding the value of X2 constant. Put differently, it gives the "direct" or "net" effect of a unit change in X1 on the mean value of Y, net of any effect that X2 may have on mean Y. Likewise, β2 measures the change in the mean value of Y per unit change in X2, holding the value of X1 constant; that is, it gives the "direct" or "net" effect of a unit change in X2 on the mean value of Y, net of any effect that X1 may have on mean Y.
The multistep procedure just outlined is merely for pedagogic purposes, to drive home the meaning of "partial" regression coefficient. Fortunately, we do not have to do that, for the same job can be accomplished fairly quickly and routinely by the OLS procedure discussed in the next section.
3.5. Data and Example
One can use different data sets depending on the situation. The following examples show how one can use OLS to estimate and interpret the coefficients.
Example-1:
Consider data on a sample of five randomly drawn persons from a large firm, collected to see how their annual salaries relate to years of education past high school and years of experience with the firm they work for.
Then
a. compute $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
b. compute $\sum x_1^2$, $\sum x_2^2$, $\sum x_1y$, $\sum x_2y$ and $\sum x_1x_2$
c. compute $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$
Table 3.2: Coefficient estimation of the annual salary, years of education, and years of experience relationship

Observation | Y | X1 | X2 | y | x1 | x2 | x1² | x2² | x1y | x2y | x1x2 | y²
1 | 30 | 4 | 10 | 0 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
2 | 20 | 3 | 8 | -10 | -2 | -2 | 4 | 4 | 20 | 20 | 4 | 100
3 | 36 | 6 | 11 | 6 | 1 | 1 | 1 | 1 | 6 | 6 | 1 | 36
4 | 24 | 4 | 9 | -6 | -1 | -1 | 1 | 1 | 6 | 6 | 1 | 36
5 | 40 | 8 | 12 | 10 | 3 | 2 | 9 | 4 | 30 | 20 | 6 | 100
Sum | 150 | 25 | 50 | 0 | 0 | 0 | 16 | 10 | 62 | 52 | 12 | 272
Mean | 30 | 5 | 10 | | | | | | | | |

where Y = annual salary, X1 = years of education, X2 = years of experience, and y = Y − Ȳ, x1 = X1 − X̄1, x2 = X2 − X̄2.
Solution
a. $\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{150}{5} = 30$, $\bar{X}_1 = \dfrac{\sum X_1}{n} = \dfrac{25}{5} = 5$, $\bar{X}_2 = \dfrac{\sum X_2}{n} = \dfrac{50}{5} = 10$

b. $\sum x_1^2 = 16$, $\sum x_2^2 = 10$, $\sum x_1y = 62$, $\sum x_2y = 52$, $\sum x_1x_2 = 12$, $\sum y^2 = 272$

c.
Substituting into the deviation-form normal equations $\hat{\beta}_1\sum x_{1i}^2 + \hat{\beta}_2\sum x_{1i}x_{2i} = \sum x_{1i}y_i$ and $\hat{\beta}_1\sum x_{1i}x_{2i} + \hat{\beta}_2\sum x_{2i}^2 = \sum x_{2i}y_i$:

$16\hat{\beta}_1 + 12\hat{\beta}_2 = 62$

$12\hat{\beta}_1 + 10\hat{\beta}_2 = 52$

Solving (the determinant is $16(10) - 12^2 = 16$) gives $\hat{\beta}_1 = \dfrac{62(10) - 52(12)}{16} = -0.25$ and $\hat{\beta}_2 = \dfrac{52(16) - 62(12)}{16} = 5.5$, and hence $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2 = 30 + 0.25(5) - 5.5(10) = -23.75$.

Interpretation
$\hat{\beta}_1 = -0.25$ says that as years of education past high school increase by one year, annual salary falls by 0.25, holding years of experience constant. Likewise, as years of experience increase by one year, salary increases by 5.5 ($\hat{\beta}_2 = 5.5$), holding years of schooling constant.

The regression equation will be $\hat{Y} = -23.75 - 0.25X_1 + 5.5X_2$.
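The computations of Example 1 can be verified end to end from the raw data of Table 3.2. A minimal Python sketch (illustrative only, assuming numpy is available):

```python
import numpy as np

# Raw data from Table 3.2
Y  = np.array([30.0, 20.0, 36.0, 24.0, 40.0])   # annual salary
X1 = np.array([ 4.0,  3.0,  6.0,  4.0,  8.0])   # years of education past high school
X2 = np.array([10.0,  8.0, 11.0,  9.0, 12.0])   # years of experience

# Deviations from the means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Normal equations in deviation form and their solution
A = np.array([[x1 @ x1, x1 @ x2],
              [x1 @ x2, x2 @ x2]])
b = np.array([x1 @ y, x2 @ y])
b1, b2 = np.linalg.solve(A, b)                    # -0.25, 5.5
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # -23.75

# Goodness of fit: R^2 = ESS/TSS = (b1*sum(x1*y) + b2*sum(x2*y)) / sum(y^2)
R2 = (b1 * (x1 @ y) + b2 * (x2 @ y)) / (y @ y)
print(b0, b1, b2, R2)   # -23.75 -0.25 5.5 ~0.994
```

The R² value computed here is used again in the goodness-of-fit example later in this section.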
Example 2
The table below contains observations on the quantity demanded (Y) of a certain commodity, its price (X1) and consumers' income (X2). Fit a linear regression to these observations and test the overall goodness of fit (with R²) as well as the statistical reliability of the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$.
Then
a. compute $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
b. compute $\sum x_1^2$, $\sum x_2^2$, $\sum x_1y$, $\sum x_2y$ and $\sum x_1x_2$
c. compute $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$
Y | X1 | X2
50 | 15 | 8
15 | 17 | 3
35 | 19 | 5
47 | 18 | 12
40 | 21 | 5
22 | 14 | 10
36 | 16 | 8
55 | 15 | 9

Solution (working with the following summary values):

$n = 10$; $\sum Y_i = 800$, so $\bar{Y} = 80$; $\sum X_{1i} = 600$, so $\bar{X}_1 = 60$; $\sum X_{2i} = 8000$, so $\bar{X}_2 = 800$.

In deviations $y = Y - \bar{Y}$, $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$:

$\sum y^2 = 3450$, $\sum x_1^2 = 30$, $\sum x_2^2 = 1{,}580{,}000$
$R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum(\hat{Y} - \bar{Y})^2}{\sum(Y - \bar{Y})^2} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2}$ ……………………………………………….3.59
One can alternatively work with deviations from the mean. Accordingly, since

$y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$ and $\hat{y}_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$,

we have $y_i = \hat{y}_i + e_i$. Squaring and summing both sides, and using the fact that $\sum \hat{y}_i e_i = 0$, gives

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$, so that $ESS = \sum \hat{y}_i^2 = \hat{\beta}_1\sum x_{1i}y_i + \hat{\beta}_2\sum x_{2i}y_i$.
As in simple regression, R² is viewed as a measure of the predictive ability of the model over the sample observations, or as a measure of how well the estimated regression fits the data. The value of R² is also equal to the squared sample correlation coefficient between $\hat{Y}_t$ and $Y_t$. A higher R² indicates a close association between the values of $Y_t$ and the values predicted by the model, $\hat{Y}_t$; in this case, the model is said to "fit" the data well. If R² is low, there is little or no association between the values of $Y_t$ and the values predicted by the model, and the model does not fit the data well.
Like r², the value of R² ranges from 0 to 1 (i.e., 0 ≤ R² ≤ 1). R² = 1 implies that the fitted regression line explains 100% of the variation in the dependent variable, while R² = 0 implies that the model does not explain any of the variation. In practice, R² lies between these two extremes; the higher the value of R², the greater the percentage of the variation of Y explained by the regression plane.
The general formula for R² can be developed by inspecting the formula of R² for the two-variable and three-variable models. Recall the R² for a model with one explanatory variable (sometimes known as the two-variable model) studied in chapter two:
$Y = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + e_i$, or in deviations $y = \hat{\beta}_1 x_{1i} + e_i$, giving

$R^2_{y,x_1} = \dfrac{ESS}{TSS} = \dfrac{\hat{\beta}_1\sum x_{1i}y_i}{\sum y^2}$ ................................................................3.62
For the three-variable case,

$R^2_{y,x_1,x_2} = \dfrac{\hat{\beta}_1\sum x_{1i}y_i + \hat{\beta}_2\sum x_{2i}y_i}{\sum y^2}$

which, compared with the two-variable formula, has one extra term formed from the product of $\hat{\beta}_2$ and $\sum x_{2i}y_i$. Generally, by inspecting the formula of R² for the three-variable model given above, we see that for each additional explanatory variable the formula of R² includes an additional term in the numerator, where the additional term is formed from the product of the estimate of the parameter corresponding to the new variable and the sum of the products of the deviations of the new variable and the dependent variable.
Example: Refer to Example 1 above; determine R², and what percent remains unexplained?

$R^2_{y,x_1,x_2} = \dfrac{ESS}{TSS} = \dfrac{\hat{\beta}_1\sum x_{1i}y_i + \hat{\beta}_2\sum x_{2i}y_i}{\sum y^2} = \dfrac{-0.25(62) + 5.5(52)}{272} = \dfrac{270.5}{272} \approx 0.99$

The result shows that about 99% of the variation in annual salary is explained by the variation in years of education and experience, and about 1% remains unexplained.
More generally, for $Y = \hat{\beta}_0 + \hat{\beta}_1X_{1i} + \hat{\beta}_2X_{2i} + \cdots + \hat{\beta}_KX_{Ki} + e_i$, i.e. $y = \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} + e_i$ in deviation form, the numerator of R² contains one such term for each regressor.
An important property of R² is that it is a non-decreasing function of the number of explanatory variables in the model: as the number of explanatory variables increases, R² almost invariably increases and never decreases. Furthermore, the term $\sum y_i^2$ in the formula of R² is independent of the number of explanatory variables, whereas $\sum e_i^2$ depends on the number of regressors in the model. Intuitively, it is clear that as the number of explanatory variables increases, $\sum e_i^2$ is likely to decrease (at least it will not increase); hence R² will increase. This suggests that in comparing two regression models with the same dependent variable but differing numbers of explanatory variables, one should be wary of simply choosing the model with the highest R².
B. Adjusted Coefficient of Determination ($\bar{R}^2$)
One difficulty with R² is that it can be made large by adding more and more variables, even if the variables added have no economic justification. As variables are added, the residual sum of squares (RSS) goes down (or, rarely, remains unchanged) and thus R² goes up. If the model contains n − 1 variables, R² is nearly one. Manipulating the model just to obtain a high R² is not wise. An alternative measure of goodness of fit, called the adjusted R² and often symbolized as $\bar{R}^2$, is usually reported by regression programs. Generally, to compare two R² terms one must take into account the number of explanatory variables in the model and adjust R² for degrees of freedom. It is computed as:
$\bar{R}^2 = 1 - \dfrac{\sum e_i^2/(n-k)}{\sum y^2/(n-1)} = 1 - (1 - R^2)\dfrac{n-1}{n-k}$ -------------------------------- (3.65)

whereas the unadjusted measure is $R^2 = 1 - \dfrac{\sum e_i^2}{\sum y^2}$.
Note: the term "adjusted" means that $\bar{R}^2$ is obtained by adjusting R² for the degrees of freedom (df) associated with the sums of squares $\sum e_i^2$ and $\sum y^2$ entering equation 3.59. Since

$\bar{R}^2 = 1 - \dfrac{\sum e_i^2}{\sum y^2}\cdot\dfrac{n-1}{n-k}$ and $\dfrac{\sum e_i^2}{\sum y^2} = 1 - R^2$,

it follows that

$\bar{R}^2 = 1 - (1 - R^2)\dfrac{n-1}{n-k}$ ……………………………………………….3.66
Equivalently, $\bar{R}^2$ can be written in terms of sample variances: $\bar{R}^2 = 1 - \hat{\sigma}_e^2/\hat{\sigma}_y^2$, where $\hat{\sigma}_e^2 = \sum e_i^2/(n-k)$ and $\hat{\sigma}_y^2 = \sum y^2/(n-1)$.
This measure does not always go up as more variables are included, because of the degrees-of-freedom term n − k in the denominator. As the number of variables k increases, RSS goes down, but so does n − k; whether $\bar{R}^2$ rises or falls depends on which effect dominates. It is immediately apparent from equation 3.66 that, for k > 1, $\bar{R}^2$ is less than R², and $\bar{R}^2$ can assume negative values even though R² is necessarily non-negative. In case $\bar{R}^2$ turns out to be negative in an application, its value is taken as zero. While this solves one problem related to goodness of fit, it unfortunately introduces another, related to interpretation: $\bar{R}^2$ is no longer the percent of variation explained.
This modified $\bar{R}^2$ is sometimes used, and misused, as a device for selecting the appropriate set of explanatory variables. Which R² should be used in practice? There is no consensus among scholars. Theil notes that it is good practice to use $\bar{R}^2$ rather than R², because R² tends to give an overly optimistic picture of the fit when the number of explanatory variables is not small compared with the number of observations. But this reasoning is not universally shared, and there is no strong theoretical justification for the superiority of $\bar{R}^2$. Goldberger, for instance, argues that a "modified R²",

modified $R^2 = (1 - k/n)R^2$,

will do as well. His advice is to report R² along with n and k and let the researcher decide which measure to use. Despite this advice, it is the adjusted $\bar{R}^2$, as given in 3.66, that is reported by most statistical packages along with the conventional R².
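As a small numerical illustration of equation 3.66 (a sketch only; the inputs R² ≈ 0.9945, n = 5 and k = 3 are taken from Example 1 above):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: R_bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k),
    where k is the number of estimated parameters (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Example 1 above: R^2 ~ 0.9945 with n = 5 observations and k = 3 parameters
print(adjusted_r2(0.9945, 5, 3))   # ~0.989, slightly below the unadjusted R^2
```

The adjusted value is below the unadjusted one, as equation 3.66 implies it must be whenever k > 1.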
The simple correlation coefficient, r, measures the degree of linear association between two variables. For the three-variable model we can compute three correlation coefficients: $r_{y,x_1}$, $r_{y,x_2}$ and $r_{x_1,x_2}$. That is:

$r_{y,x_1}$ = correlation coefficient between Y and X1

$r_{y,x_2}$ = correlation coefficient between Y and X2

$r_{x_1,x_2}$ = correlation coefficient between X1 and X2
These correlation coefficients are called gross or simple correlation coefficients, or coefficients of zero order. Does the simple correlation coefficient, say $r_{y,x_1}$, measure the "true" degree of (linear) association between Y and X1 when a second explanatory variable X2, associated with both Y and X1, is not held constant? No. To see why, suppose the true sample regression model is

$Y = \hat{\beta}_0 + \hat{\beta}_1X_{1i} + \hat{\beta}_2X_{2i} + e_i$ ………………………………………….3.67

but suppose for some reason we omit X2 from the model and regress Y on X1 alone. Will the estimated coefficient on X1 reflect the true degree of association between Y and X1 in the presence of X2? It is likely to give a false impression of the nature of the association between Y and X1. Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of X2 on X1 and Y. Such a correlation coefficient is known as a partial correlation coefficient (or first-order correlation coefficient). The partial correlation coefficient is determined in terms of the simple correlation coefficients among the various variables involved in a multiple relationship. For example, $r_{y,x_1\cdot x_2}$ is the partial correlation coefficient between Y and X1, holding X2 constant.
Accordingly, we are requested to measure three partial correlation coefficients:

i. the partial correlation between Y and X1, keeping the effect of X2 constant, is given by

$r_{y,x_1\cdot x_2} = \dfrac{r_{yx_1} - r_{yx_2}r_{x_1x_2}}{\sqrt{(1 - r_{yx_2}^2)(1 - r_{x_1x_2}^2)}}$ ………………………………………3.69

ii. the partial correlation between Y and X2, keeping the effect of X1 constant, is given by

$r_{y,x_2\cdot x_1} = \dfrac{r_{yx_2} - r_{yx_1}r_{x_1x_2}}{\sqrt{(1 - r_{yx_1}^2)(1 - r_{x_1x_2}^2)}}$ ………………………………………3.70

iii. the partial correlation between X1 and X2, keeping the effect of Y constant, is given by

$r_{x_1,x_2\cdot y} = \dfrac{r_{x_1x_2} - r_{x_1y}r_{x_2y}}{\sqrt{(1 - r_{x_1y}^2)(1 - r_{x_2y}^2)}}$ ……………………………..………..3.71
The correlation coefficients given in equations 3.69 to 3.71 are also called first-order correlation coefficients. By the order of a correlation coefficient we mean the number of secondary variables, i.e. the number of variables held constant, when we analyze the effect of one variable on another.
Interpretation of the partial correlation coefficient carries certain implications. In the two-variable case, r had a straightforward meaning: it measured the degree of linear association (and not causation) between the dependent variable Y and the single explanatory variable X. But once we go beyond the two-variable case, we need to pay careful attention to the interpretation of the simple correlation coefficient. From (3.69), for example, we observe that even if $r_{yx_1} = 0$, $r_{y,x_1\cdot x_2}$ will not be zero unless $r_{yx_2}$ or $r_{x_1x_2}$ (or both) are zero.
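Once the three zero-order correlations are known, the first-order partial correlation is a one-line computation. A minimal Python sketch of equation 3.69 (illustrative only; the input values below are made up for demonstration):

```python
import numpy as np

def partial_corr(r_yx1: float, r_yx2: float, r_x1x2: float) -> float:
    """First-order partial correlation r_{y,x1.x2} (equation 3.69):
    association between y and x1 holding x2 constant."""
    return (r_yx1 - r_yx2 * r_x1x2) / np.sqrt((1 - r_yx2**2) * (1 - r_x1x2**2))

# Even if r_yx1 = 0, the partial correlation is nonzero
# whenever r_yx2 and r_x1x2 are both nonzero:
print(partial_corr(0.0, 0.6, 0.5))   # about -0.43
```

This illustrates the point just made: a zero simple correlation between Y and X1 does not imply a zero partial correlation.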
The properties of OLS estimators of the multiple regression model parallel those of the two-variable model. More specifically:
I. The regression line passes through the means $\bar{Y}$, $\bar{X}_2$ and $\bar{X}_3$. This property holds generally for the k-variable linear regression model [a regressand and (k − 1) regressors],
$Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \cdots + \beta_kX_{ki} + u_i$ …………………………………………….3.73

where small letters indicate values of the variables as deviations from their respective means. Summing both sides of the fitted equation over the sample values and dividing through by the sample size, n, gives $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1\bar{X}_1 + \cdots + \hat{\beta}_k\bar{X}_k$, since the residuals sum to zero.
II. The residuals have zero mean and are uncorrelated with the regressors, so the mean of the fitted values equals the mean of the actual $Y_i$.
III. Given the assumptions of the classical linear regression model, the OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance in the class of all linear unbiased estimators.
A. Mean
The mean of the estimates of the parameters in the three-variable model is derived in the same way as in the two-variable model. The estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are unbiased estimates of the true parameters of the relationship between Y, X1 and X2; that is, their expected value, or average, is equal to the respective true parameter itself:

$E(\hat{\beta}_0) = \beta_0$, $E(\hat{\beta}_1) = \beta_1$, $E(\hat{\beta}_2) = \beta_2$ …………………………..3.74
B. Variances of the parameter estimates and their formulas
In the preceding chapter we developed the variances of the estimates for the model with one explanatory variable:

$\operatorname{var}(\hat{\beta}_0) = \dfrac{\sigma_u^2\sum X_i^2}{n\sum x_i^2}$, $\operatorname{var}(\hat{\beta}_1) = \dfrac{\sigma_u^2}{\sum x_i^2}$ ..................3.75

We need standard errors for two main purposes: to establish confidence intervals and to test statistical hypotheses. The variances of the parameter estimates with two explanatory variables are obtained by the following formulas:

$\operatorname{var}(\hat{\beta}_0) = \hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{X}_1^2\sum x_{2i}^2 + \bar{X}_2^2\sum x_{1i}^2 - 2\bar{X}_1\bar{X}_2\sum x_{1i}x_{2i}}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i}x_{2i}\right)^2}\right]$ …………………………3.76

$\operatorname{var}(\hat{\beta}_1) = \dfrac{\hat{\sigma}^2\sum x_{2i}^2}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i}x_{2i}\right)^2}$ ……………………….………..……….3.77

$\operatorname{var}(\hat{\beta}_2) = \dfrac{\hat{\sigma}^2\sum x_{1i}^2}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i}x_{2i}\right)^2}$ ……………..………………..………3.78

where $\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-K)$, K being the total number of parameters which are estimated.
One can derive the variance formulas from the normal equations presented in (3.51). For better understanding, let us reproduce them as follows:
$\begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{bmatrix}$, i.e. $A\hat{\beta} = b$

The variance of each parameter is $\sigma^2$ multiplied by the ratio of the minor determinant associated with that parameter to the complete determinant $|A|$:

$\operatorname{var}(\hat{\beta}_1) = \sigma^2\cdot\dfrac{\sum x_{2i}^2}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \dfrac{\sigma^2\sum x_{2i}^2}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i}x_{2i}\right)^2}$ …..........…3.79

$\operatorname{var}(\hat{\beta}_2) = \sigma^2\cdot\dfrac{\sum x_{1i}^2}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \dfrac{\sigma^2\sum x_{1i}^2}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i}x_{2i}\right)^2}$ ……….…3.80
Therefore, inspection of the above equations tells us that the variance of each estimate can be computed from the ratio of two determinants:
a. the determinant appearing in the numerator is the minor formed after striking out the row and column of the term corresponding to the coefficient whose variance is being computed;
b. the determinant appearing in the denominator is the complete determinant on the right-hand side of the normal equations,

$|A| = \begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix}$
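The determinant-ratio rule above is equivalent to multiplying σ̂² by the diagonal elements of the inverse of A. A Python sketch (illustrative only, reusing the Example 1 data; here σ̂² is estimated by Σeᵢ²/(n − 3)):

```python
import numpy as np

Y  = np.array([30.0, 20.0, 36.0, 24.0, 40.0])
X1 = np.array([ 4.0,  3.0,  6.0,  4.0,  8.0])
X2 = np.array([10.0,  8.0, 11.0,  9.0, 12.0])
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

A = np.array([[x1 @ x1, x1 @ x2],
              [x1 @ x2, x2 @ x2]])
b1, b2 = np.linalg.solve(A, np.array([x1 @ y, x2 @ y]))

# Residuals and the unbiased estimator of sigma^2 (n = 5, K = 3 parameters)
e = y - b1 * x1 - b2 * x2
sigma2_hat = (e @ e) / (len(Y) - 3)

# var(b1) = sigma^2 * sum(x2^2)/|A|, var(b2) = sigma^2 * sum(x1^2)/|A|,
# i.e. sigma^2 times the diagonal of A^{-1} (equations 3.79 and 3.80)
var_b = sigma2_hat * np.diag(np.linalg.inv(A))
se_b = np.sqrt(var_b)
print(sigma2_hat, se_b)
```

The standard errors obtained here are the square roots of the variances in (3.79) and (3.80).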
There is a relationship between R², the variances and the partial regression coefficients. In the multiple regression model,

$\operatorname{var}(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_{ji}^2}\cdot\dfrac{1}{1 - R_j^2}$ …….……………..……….…….3.81

where $\hat{\beta}_j$ is the partial regression coefficient of regressor $X_j$ and $R_j^2$ is the R² in the regression of $X_j$ on the remaining (k − 2) regressors.
In the two-regressor case $R_1^2 = r_{12}^2$, where $r_{12}$ is the correlation coefficient between X1 and X2. Thus the variance of $\hat{\beta}_1$ increases as $r_{12}$ increases, and it is inversely proportional to $\sum x_{1i}^2$: the greater the variation in the sample values of X1, the smaller the variance of $\hat{\beta}_1$, and therefore β1 can be estimated more precisely. A similar statement can be made about the variance of $\hat{\beta}_2$.
iv. We can also express $\hat{\beta}_1$ and $\hat{\beta}_2$ in terms of the covariances and variances of Y, X1 and X2:

$\hat{\beta}_1 = \dfrac{\operatorname{Cov}(X_1,Y)\operatorname{Var}(X_2) - \operatorname{Cov}(X_1,X_2)\operatorname{Cov}(X_2,Y)}{\operatorname{Var}(X_1)\operatorname{Var}(X_2) - \left[\operatorname{Cov}(X_1,X_2)\right]^2}$ ……………………..3.84
This section extends the ideas of interval estimation and hypothesis testing to models involving three or more variables. Although in many ways the concepts developed in chapter 2 apply straightforwardly to multiple regression models, a few additional features unique to such models will receive more attention in this part.
With the normality assumption, the estimators $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are themselves normally distributed with means equal to the true β0, β1 and β2. The OLS estimators of the partial regression coefficients are best linear unbiased estimators (BLUE), each distributed with mean $\beta_i$ and variance $\operatorname{var}(\hat{\beta}_i)$. Upon replacing σ² by its unbiased estimator $\hat{\sigma}^2$ in the computation of the standard errors, each of the variables

$t = \dfrac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)}$ ……………..3.86

follows the t distribution with n − 3 df. Therefore, depending on the sample size, the t or Z distribution can be used to establish confidence intervals as well as to test statistical hypotheses about the true population partial regression coefficients. Similarly, the χ² distribution can be used to test hypotheses about the true σ²: $(n-3)\hat{\sigma}^2/\sigma^2$ follows the χ² distribution with n − 3 df (the proofs follow the two-variable case discussed in chapter 2).
Hypothesis testing about individual regression coefficients
The procedure for testing the significance of the partial regression coefficients is the same as that discussed for the two-variable case. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence and measure the validity of these estimates. Formally, we test the null hypothesis $H_0: \beta_i = 0$ against the alternative hypothesis $H_1: \beta_i \neq 0$. The logic of hypothesis testing in multiple linear regression is an extension of the concepts discussed in chapter two for simple linear regression. This can be done using various tests; the most common ones are:
i) Standard error test
ii) Test of significance and confidence interval (Z-test, Student's t-test)
iii) P-value test
iv) F-test
Acceptance of the null hypothesis ($H_0: \beta_i = 0$) implies that the explanatory variable to which the estimate relates does not in fact influence the dependent variable (Y) and should not be included in the function, because the test provided evidence that changes in X leave Y unaffected, which may mean that there is no relationship between X and Y. Rejection of the null hypothesis simply means that our estimate comes from a sample drawn from a population whose parameter is different from zero; it does not mean that our estimate is the correct estimate of the true population parameter.
Consider $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, and $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$. The null hypothesis $H_0: \beta_1 = 0$ states that, holding X2 constant, X1 has no (linear) influence on Y. Similarly, $H_0: \beta_2 = 0$ states that, holding X1 constant, X2 has no influence on the dependent variable $Y_i$.
$SE(\hat{\beta}_1) = \sqrt{\operatorname{var}(\hat{\beta}_1)} = \sqrt{\dfrac{\hat{\sigma}^2\sum x_{2i}^2}{\sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}}$,

$SE(\hat{\beta}_2) = \sqrt{\operatorname{var}(\hat{\beta}_2)} = \sqrt{\dfrac{\hat{\sigma}^2\sum x_{1i}^2}{\sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}}$ …………………3.87
Referring to Example 3.1, to compute the standard errors of the slopes, the estimate of the variance of the random term is

$\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-3} = \dfrac{1400}{12-3} \approx 155.6$
Step 2: Compare the standard errors obtained in step one with the numerical values of the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$.
The acceptance or rejection of the null hypothesis has a definite economic meaning.
If $SE(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|$, accept the null hypothesis and reject the alternative hypothesis; we conclude that $\hat{\beta}_i$ is statistically insignificant.
If $SE(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|$, reject the null hypothesis and accept the alternative hypothesis; we conclude that $\hat{\beta}_i$ is statistically significant.
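The decision rule can be written compactly in code. A minimal Python sketch (illustrative only; the inputs below are the estimates from the example that follows):

```python
def standard_error_test(beta_hat: float, se: float) -> str:
    """Standard error test: the estimate is significant when SE < |beta_hat|/2."""
    if se < abs(beta_hat) / 2:
        return "statistically significant (reject H0: beta = 0)"
    return "statistically insignificant (do not reject H0)"

print(standard_error_test(0.6, 0.025))    # significant: 0.025 < 0.3
print(standard_error_test(0.33, 0.035))   # significant: 0.035 < 0.165
```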
Example: suppose that from a sample of size n = 30 we estimate the following supply function:

$\hat{Y} = 20 + 0.6X_1 + 0.33X_2$
SE: (1.7) (0.025) (0.035)

Test the significance of the slope parameters at the 5% level of significance using the standard error test.
Solution:
$SE(\hat{\beta}_1) = 0.025$ while $\frac{1}{2}\hat{\beta}_1 = 0.3$. This implies that $SE(\hat{\beta}_1) < \frac{1}{2}\hat{\beta}_1$, so $\hat{\beta}_1$ is statistically significant at the 5% level. Similarly, $SE(\hat{\beta}_2) = 0.035 < \frac{1}{2}\hat{\beta}_2 = 0.165$, so $\hat{\beta}_2$ is also significant. By contrast, referring to example 3.2 and applying the standard error test to X1 and X2 in summarized tabular form, the test there shows that both coefficients are statistically insignificant, implying that the evidence does not indicate a relationship between the variables.
If we invoke the assumption that $u_i \sim N(0, \sigma^2)$, then we can use the t test to test a hypothesis about any individual partial regression coefficient (for large samples, the Z test may be used instead). The tests differ only according to whether the hypothesis is one-tailed or two-tailed. The model we are going to test is specified as

$Y = \hat{\beta}_0 + \hat{\beta}_1X_1 + \hat{\beta}_2X_2 + e_i$
I. Test of significance
A. Two tailed
The general procedure for two-sided hypothesis testing concerning the value of a population parameter involves the following steps:
Step-1: state the null hypothesis (H0) and the alternative hypothesis (H1). That is,

$H_0: \beta_i = 0$ vs. $H_1: \beta_i \neq 0$

Step-2: compute the test statistic, depending on the sample size and the relevant distribution (Z, t, chi-square or F):

A. Z-test: if $\hat{\beta}_i \sim N\left(\beta_i, \operatorname{var}(\hat{\beta}_i)\right)$, then $Z = \dfrac{\hat{\beta}_i - \beta_i}{\sigma(\hat{\beta}_i)}$.

B. t-test: $t = \dfrac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \sim t_{n-k}$. If we have 3 parameters, the degrees of freedom will be n − 3; so when $\beta_2 = 0$, the statistic becomes $t = \dfrac{\hat{\beta}_2}{SE(\hat{\beta}_2)}$, which follows the t distribution with n − 3 df, where SE = standard error and k = number of parameters in the model.

C. Chi-square test: the χ² distribution with n − k df (chi-square is a special form of the gamma distribution) is used to test hypotheses about the true σ², via $(n-3)\hat{\sigma}^2/\sigma^2$.
Step-3: select (specify) a suitable significance level (α). It is customary in econometrics to choose the 5% or 1% level of significance.
Step-4: from the t-table, with n − 3 degrees of freedom and the given level of significance, find the critical value $t_c = t_{\alpha/2}$ such that the area to the right of it is one half the level of significance, i.e. $P(t > t_{\alpha/2}) = \alpha/2$.
Step-5: to decide on the rejection or acceptance of the null hypothesis, compare the test statistic computed from the sample estimates with the critical value at the given significance level. If the Z test is used and the computed $|Z|$ exceeds the critical Z, reject the null hypothesis, implying that the coefficient is statistically significant; accept H0 otherwise. If the t distribution is used, compare the computed t with the critical $t_c$: reject H0 if $t < -t_c$ or $t > t_c$, and conclude that $\hat{\beta}_i$ is significantly different from zero at the α level; accept H0 otherwise.
B. One tailed
The procedure for a one-sided alternative follows these steps:
Step-1: $H_0: \beta_i = 0$ vs. $H_1: \beta_i > 0$ (or $H_1: \beta_i < 0$)
Step-2: compute the test statistic using the relevant type of distribution
Step-3: define the level of significance (α)
Step-4: from the t-table, with n − 3 degrees of freedom and the given level of significance, find the critical point $t_c$ such that the area to the right of it equals the level of significance, i.e. $P(t > t_c) = \alpha$
Step-5: compare the computed t with the critical $t_c$: reject H0 if $t > t_c$ (for $H_1: \beta_i > 0$)

II. Confidence interval approach
A. State the null and alternative hypotheses as above
B. Estimate $\hat{\beta}_0$, $se(\hat{\beta}_0)$, $\hat{\beta}_1$ and $se(\hat{\beta}_1)$ in the usual way
C. Choose a significance level, α. This is equivalent to choosing a (1 − α)100% confidence interval; e.g. a 5% significance level corresponds to a 95% confidence interval.
D. Identify the relevant distribution depending on the sample size.
E. Choose the location of the critical region: after identifying the relevant distribution, compute the range of values of the test statistic that results in a decision to accept or reject the null hypothesis. That is, establish the confidence interval such that the probability the test statistic falls within it is 100(1 − α)%. These theoretical (tabular) values define the boundary between the critical region and the acceptance region for the OLS estimators at the chosen degree of confidence; the interval is constructed in similar fashion to that in chapter two. The decision of whether to choose a one-tail or a two-tail critical region depends on the form in which the alternative hypothesis is expressed. Most of the time a two-tailed test is used, and interval ranges can be constructed depending on the type of distribution:
i. Z-distribution:

$P\left\{-Z_{\alpha/2} < \dfrac{\hat{\beta}_i - \beta_i}{\sigma(\hat{\beta}_i)} < Z_{\alpha/2}\right\} = 1 - \alpha$ …………………………………………………..……………..3.88

$P\left\{\hat{\beta}_i - Z_{\alpha/2}\,\sigma(\hat{\beta}_i) < \beta_i < \hat{\beta}_i + Z_{\alpha/2}\,\sigma(\hat{\beta}_i)\right\} = 1 - \alpha$

Thus the (1 − α)100 percent confidence interval for $\beta_i$ is

$\hat{\beta}_i - Z_{\alpha/2}\,\sigma(\hat{\beta}_i) < \beta_i < \hat{\beta}_i + Z_{\alpha/2}\,\sigma(\hat{\beta}_i)$, or $\beta_i = \hat{\beta}_i \pm Z_{\alpha/2}\,\sigma(\hat{\beta}_i)$
ii. t-distribution: $t = \dfrac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)} \sim t_{n-K}$, where K = 𝓀 + 1 is the number of parameters. In a two-tail test at level of significance α, the probability of obtaining a value beyond $-t_{\alpha/2}$ or $t_{\alpha/2}$ is α, so that

$\Pr\left\{-t_{\alpha/2} < t < t_{\alpha/2}\right\} = 1 - \alpha$ ………………………….(3.89)

Substituting $t = \dfrac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)}$ into (3.89), we obtain

$\Pr\left\{-t_{\alpha/2} < \dfrac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} < t_{\alpha/2}\right\} = 1 - \alpha$ …………………………………..………(3.90)

Interchanging terms,

$\Pr\left\{\hat{\beta}_i - SE(\hat{\beta}_i)\,t_{\alpha/2} < \beta_i < \hat{\beta}_i + SE(\hat{\beta}_i)\,t_{\alpha/2}\right\} = 1 - \alpha$

Then the 100(1 − α) percent confidence interval for $\beta_i$ will be given by

$\beta_i = \hat{\beta}_i \pm t_{\alpha/2,\,n-K}\,se(\hat{\beta}_i) \quad \forall i \in [1, K]$ ………………………………..……3.91

The unknown population parameter $\beta_i$ will lie within the confidence interval $\hat{\beta}_i \pm t_{\alpha/2}\,se(\hat{\beta}_i)$, with n − K degrees of freedom, (1 − α)100 times out of 100.
iii. Chi-square distribution: for the variance, since $(n-3)\hat{\sigma}^2/\sigma^2$ follows the χ² distribution with n − 3 df,

$P\left\{\chi^2_{1-\alpha/2} \le (n-3)\dfrac{\hat{\sigma}^2}{\sigma^2} \le \chi^2_{\alpha/2}\right\} = 1 - \alpha$

where $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are the two critical χ² values obtained from the chi-square table for n − 3 df. Rearranging the terms, we obtain

$P\left\{(n-3)\dfrac{\hat{\sigma}^2}{\chi^2_{\alpha/2}} < \sigma^2 < (n-3)\dfrac{\hat{\sigma}^2}{\chi^2_{1-\alpha/2}}\right\} = 1 - \alpha$ ……………………………………………………..3.93
F. Make a decision and draw a conclusion: the decision to accept or reject the hypothesis is made by comparing the hypothesized value with the confidence region. If the value of $\beta_i$ under the null hypothesis ($H_0: \beta_i = \beta^*$) lies within the confidence interval (the region of acceptance), do not reject H0: the estimate is not statistically different from $\beta^*$. If the hypothesized value of $\beta_i$ lies outside the limits, reject H0 and accept H1. The test can be illustrated graphically.
Figure 3.1: Two tailed confidence interval for normal distribution
To illustrate the mechanics with an example, suppose the demand for a certain commodity is assumed to be a function of income (X2) and price (X1):

$Y = \hat{\beta}_0 + \hat{\beta}_1X_1 + \hat{\beta}_2X_2 + e_i$

The null hypothesis states that, with X2 (income) held constant, X1 (price) has no (linear) influence on Y (quantity demanded). Suppose the regression result is found to be

$\hat{Y} = 20 - 0.0056X_1 + 0.85X_2$

Noting that $\beta_1 = 0$ under the null hypothesis, and that $SE(\hat{\beta}_1) = 0.0020$, we obtain

$t = \dfrac{-0.0056}{0.0020} = -2.8187$ ....................................................3.94
Notice that we have 64 observations; therefore the degrees of freedom in this example are 61. The t table has no entry for 61 df; the closest is 60 df. Using these df and a level of significance α of 5 percent, the critical t value is 2.0 for a two-tail test (look up $t_{\alpha/2}$ for 60 df) or 1.671 for a one-tail test (look up $t_\alpha$ for 60 df).
We use the two-tail t value. Since the computed t value of 2.8187 (in absolute terms) exceeds the critical t value of 2, we can reject the null hypothesis that price has no effect on quantity demanded.
For our example, the 95% confidence interval for β1 is

$\hat{\beta}_1 \pm t_{\alpha/2}\,SE(\hat{\beta}_1) = -0.0056 \pm 2.0(0.0020)$

that is,

$-0.0096 \le \beta_1 \le -0.0016$ ...............................................................3.96

The interval −0.0096 to −0.0016 includes the true β1 coefficient with a 95% confidence coefficient. That is, if 100 samples of size 64 were selected and 100 confidence intervals like this one constructed, we would expect 95 of them to contain the true population parameter. Since the interval does not include the null-hypothesized value of zero, we can reject the null hypothesis that the true β1 is zero with 95% confidence.
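The worked t test and confidence interval above can be reproduced numerically. A Python sketch (illustrative only; it assumes scipy is available and uses the rounded β̂₁ = −0.0056, SE = 0.0020 and 60 df from the example):

```python
from scipy import stats

beta_hat, se, df = -0.0056, 0.0020, 60
alpha = 0.05

t_stat = beta_hat / se                     # -2.8 with these rounded inputs
                                           # (the text reports -2.8187 from unrounded values)
t_crit = stats.t.ppf(1 - alpha / 2, df)    # ~2.00

reject = abs(t_stat) > t_crit              # True: price matters
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print(t_stat, t_crit, reject, ci)          # CI ~ (-0.0096, -0.0016)
```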
Thus, whether we use the t test of significance or the confidence interval estimation, we reach the same conclusion. This should not be surprising, in view of the close connection between confidence interval estimation and hypothesis testing.
The p-value test uses the exact (observed) level of significance,

$P\text{-value} = P\left(|t| > |t_c|\right)$ ………………………………….3.97

where $t_c$ is the computed test statistic.
This probability is the same as the probability of a type I error, the probability of rejecting a true hypothesis. A high value of this probability implies that the consequences of erroneously rejecting a true H0 would be severe. A low p-value implies that the consequences of erroneously rejecting a true H0 are not very severe (that is, the probability of making a type I error is low), and hence we are "safe" in rejecting H0. The decision rule is therefore to accept H0 (that is, not reject it) if the p-value is too high, say more than 0.10, 0.05 or 0.01. In other words, if the p-value is higher than the specified level of significance α, we conclude that the regression coefficient is not significantly different from 0 at the α level of significance; if the p-value is less than α, we reject H0 and conclude that the coefficient is significantly different from 0.
The t-test and the p-value approach are equivalent. If $P(t > t_c)$ is less than the level α, then $t_c$ must lie to the right of the critical point $t^*_{n-k}(\alpha)$; that is, $t_c$ falls in the rejection region and we reject H0. Similarly, if $P(t > t_c)$ is greater than α, then $t_c$ lies to the left of the critical value and falls in the acceptance region, so we accept H0. If $P(t > t_c)$ exactly equals α, the test is not conclusive.
Example: the p-value in the example above is 0.0065. The interpretation of this p-value (i.e. the exact level of significance) is that, if the null hypothesis were true, the probability of obtaining a t value of as much as 2.8187 or greater (in absolute terms) would be only 0.0065, or 0.65 percent, which is indeed a small probability, much smaller than the conventionally adopted α = 5%. Since the p-value of 0.0065 is less than the 5% level, we reject H0.
Y X1 X2
49 35 53
40 35 53
41 38 50
46 40 64
52 40 70
59 42 68
53 44 59
61 46 73
55 50 59
64 50 71
Throughout the previous section we were concerned with testing the significance of the estimated partial regression coefficients individually, that is, under the separate hypothesis that each true population partial regression coefficient is zero. There is also a joint test of significance, in which the null hypothesis is that β1 and β2 are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, of whether Y is linearly related to both X1 and X2. It can be stated as:

$H_0: \beta_1 = \beta_2 = 0$ …………………………………………..3.98
Can the joint hypothesis be tested by testing the significance of $\hat{\beta}_1$ and $\hat{\beta}_2$ individually? The answer is no, because in testing the individual significance of an observed partial regression coefficient, we assumed implicitly that each test of significance was based on a different (i.e., independent) sample. Thus, in testing the significance of $\hat{\beta}_2$ under the hypothesis that β2 = 0, it was assumed tacitly that the testing was based on a different sample from the one used for testing the significance of $\hat{\beta}_1$ under the null hypothesis that β1 = 0.
But if we use the same sample data to test the joint hypothesis on two or more coefficients, we shall be violating the assumptions underlying that test procedure. Although each confidence-interval statement is individually true, in the sense derived above, the statements may not hold jointly in an overall test of significance. That is, the probability 1 − α attaches to each interval separately; for $\hat{\beta}_1$,

$\Pr\left\{\hat{\beta}_1 - SE(\hat{\beta}_1)t_{\alpha/2} \le \beta_1 \le \hat{\beta}_1 + SE(\hat{\beta}_1)t_{\alpha/2}\right\} = 1 - \alpha$ ……………………………………………..3.99
The analysis of variance decomposes the total sum of squares, TSS = $\sum y_i^2$ (with n − 1 df), into the explained sum of squares (ESS) and the residual sum of squares (RSS). When K is the number of coefficients in the model, the number of restrictions imposed under H0 is K − 1. Accordingly, ESS has 2 df, since it is a function of $\hat{\beta}_1$ and $\hat{\beta}_2$; RSS has n − 3 df; and TSS has n − 1 df. The F statistic with 2 and n − 3 df, under the assumption of normally distributed disturbances, is

$F = \dfrac{ESS/2}{RSS/(n-3)}$ ……………………………….. 3.102
It can be proved that, under the assumption that $u_i \sim N(0, \sigma^2)$,

$E\left(\dfrac{RSS}{n-3}\right) = E\left(\dfrac{\sum e_i^2}{n-3}\right) = \sigma^2$ …………………………………….………... 3.103

$E\left(\dfrac{ESS}{2}\right) = \sigma^2 + \dfrac{\beta_1^2\sum x_{1i}^2 + 2\beta_1\beta_2\sum x_{1i}x_{2i} + \beta_2^2\sum x_{2i}^2}{2}$ ………………………………….……… 3.104
Therefore, if the null hypothesis is true, both (3.103) and (3.104) give identical estimates of the true σ². This implies that if there is only a trivial relationship between Y and X1 and X2, the sole source of variation in Y is the random forces represented by $u_i$. If, however, the null hypothesis is false, that is, X1 and X2 definitely influence Y, the equality between (3.103) and (3.104) will not hold: the ESS will be relatively larger than the RSS, taking due account of their respective df. Therefore, the F value of (3.102) provides a test of the null hypothesis that the true slope coefficients are simultaneously zero. If the F value computed from (3.102) exceeds the critical F(k − 1, n − k) value from the F table at the α percent level of significance, reject H0: the parameters of the model are jointly significant, i.e. the dependent variable Y is linearly related to the independent variables included in the model; otherwise accept H0. Alternatively, if the p-value of the observed F is sufficiently low, we can reject H0.
Example: Given the ANOVA data in table 3.8 for certain hypothetical data:
3. what is the F statistic?
4. if the tabulated F with (3, 143) df at α = 0.05 is 6.63, what will be your overall significance test decision?
R² and the ANOVA F statistic are different quantities, but there is an intimate relationship between the coefficient of determination R² and the F test used in the analysis of variance. Assuming a normal distribution for the disturbance term $u_i$ and the null hypothesis that $\beta_1 = \beta_2 = 0$, we have seen that

$F = \dfrac{ESS/2}{RSS/(n-3)}$ …………………….………………………….........3.106

is distributed as the F distribution with 2 and n − 3 df.
Sources of variation | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MSS)
Due to regression (ESS) | $R^2\sum y_i^2$ | 2 | $\dfrac{R^2\sum y_i^2}{2}$
Due to residual (RSS) | $(1-R^2)\sum y_i^2$ | n − 3 | $\dfrac{(1-R^2)\sum y_i^2}{n-3}$
Total (TSS) | $\sum y_i^2$ | n − 1 |

For the three-variable case, F becomes

$F = \dfrac{R^2\sum y_i^2/2}{(1-R^2)\sum y_i^2/(n-3)} = \dfrac{R^2/2}{(1-R^2)/(n-3)}$ .............................3.107
One advantage of the F test expressed in terms of R² is its ease of computation: all one needs to know is the R² value. Therefore, the overall F test of significance given in equation 3.98 can be computed in terms of R², as shown in the table above (Table 3.10).
Consider now the general k-variable model

$Y_i = \beta_0 + \beta_1X_{1i} + \cdots + \beta_KX_{Ki} + u_i$

There are K parameters to be estimated (K = 𝓀 + 1). Clearly, the system of normal equations will consist of K equations, in which the unknowns are the parameters $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_K$ and the known terms are the sums of squares and the sums of products of all the variables in the structural equation. The sample counterpart, or estimated relationship, of the above general equation is

$Y_i = \hat{\beta}_0 + \hat{\beta}_1X_{1i} + \cdots + \hat{\beta}_KX_{Ki} + \hat{u}_i$ …………………………3.109
In order to derive the K normal equations without the formal differentiation procedure, we make use of the following assumptions:

$\sum \hat{u}_i = 0$ and $\sum \hat{u}_iX_{ji} = 0$, where j = 1, 2, 3, …, K.

The normal equations for a model with any number of explanatory variables may thus be derived in a mechanical way, without recourse to differentiation. We introduce a practical rule of thumb, derived by inspection of the normal equations of the two-variable and three-variable models, and then extend it to k explanatory variables. We begin by rewriting these normal equations.
i. Model with one explanatory variable
Structural form: $Y = \beta_0 + \beta_1X_{1i} + u_i$ ………………………………………………3.110
Using $\sum \hat{u}_i = 0$ and $\sum \hat{u}_iX_{1i} = 0$ ……………………………………………3.111
rearrangement of 3.110 and 3.111 yields the normal equations:

$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1\sum X_{1i}$

$\sum Y_iX_{1i} = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2$
ii. Model with two explanatory variables
Structural form: $Y = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + u_i$

$\sum Y = n\hat{\beta}_0 + \hat{\beta}_1\sum X_{1i} + \hat{\beta}_2\sum X_{2i}$

$\sum YX_{1i} = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i}$

$\sum YX_{2i} = \hat{\beta}_0\sum X_{2i} + \hat{\beta}_1\sum X_{1i}X_{2i} + \hat{\beta}_2\sum X_{2i}^2$
We can generalize the procedure above to find the Kth normal equation for the K-variable model, which may be obtained by multiplying the estimated form of the model by $X_{Ki}$ and then summing over all sample observations. The estimated form of the model is

$Y_i = \hat{\beta}_0 + \hat{\beta}_1X_{1i} + \hat{\beta}_2X_{2i} + \cdots + \hat{\beta}_KX_{Ki} + \hat{u}_i$ …………………………3.112

so that

$\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1X_{1i} - \cdots - \hat{\beta}_KX_{Ki}$ ………………….………..3.113

Multiplication by $X_{Ki}$ yields

$Y_iX_{Ki} = \hat{\beta}_0X_{Ki} + \hat{\beta}_1X_{1i}X_{Ki} + \hat{\beta}_2X_{2i}X_{Ki} + \cdots + \hat{\beta}_KX_{Ki}^2 + \hat{u}_iX_{Ki}$ …………………………..3.115

and summation over the n sample observations, using the assumption $\sum \hat{u}_iX_{Ki} = 0$, gives the required Kth normal equation.
The variance of any coefficient in the K-variable model follows the same determinant-ratio rule:

$\operatorname{var}(\hat{\beta}_K) = \sigma^2\cdot\dfrac{\text{minor of the element }\sum x_{Ki}^2}{\begin{vmatrix} \sum x_{2i}^2 & \sum x_{2i}x_{3i} & \cdots & \sum x_{2i}x_{Ki} \\ \sum x_{2i}x_{3i} & \sum x_{3i}^2 & \cdots & \sum x_{3i}x_{Ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{2i}x_{Ki} & \sum x_{3i}x_{Ki} & \cdots & \sum x_{Ki}^2 \end{vmatrix}}$ …........3.117
1. Model with one explanatory variable: $R^2_{Y.X_1} = \dfrac{\hat{\beta}_1\sum y_ix_{1i}}{\sum y_i^2}$

2. Model with two explanatory variables: $R^2_{Y.X_1X_2} = \dfrac{\hat{\beta}_1\sum y_ix_{1i} + \hat{\beta}_2\sum y_ix_{2i}}{\sum y_i^2}$

3. Accordingly, the formula of the coefficient of multiple determination for the K-variable model is

$R^2_{Y.X_1\ldots X_K} = \dfrac{\hat{\beta}_1\sum y_ix_{1i} + \hat{\beta}_2\sum y_ix_{2i} + \cdots + \hat{\beta}_K\sum y_ix_{Ki}}{\sum y_i^2}$ ………………………….3.118
For each additional explanatory variable the formula of the R-squared includes an additional
term in the numerator formed by the estimate of the parameter corresponding to the new vari-
able multiplied by the sum of products of the deviations of the new variable and the depen-
dent one.
$\bar{R}^2 = 1 - (1 - R^2)\dfrac{n-1}{n-K}$ or $\bar{R}^2 = 1 - \dfrac{\sum \hat{u}_i^2/(n-K)}{\sum y_i^2/(n-1)}$ …………………………….3.119
where R² is the unadjusted coefficient of multiple determination, n is the number of sample observations and K is the number of parameters estimated from the sample. If n is large, R² and $\bar{R}^2$ will not differ much. But with small samples, if the number of regressors (X's) is large in relation to the sample observations, $\bar{R}^2$ will be much smaller than R² and can even assume negative values, in which case $\bar{R}^2$ should be interpreted as being equal to zero.
Other criteria besides R² and adjusted R² are often used to judge the adequacy of a regression model; Akaike's information criterion and Amemiya's prediction criterion, for example, are used to select between competing models.
For the general model, the joint null hypothesis is that all the slope coefficients are simultaneously equal to zero; a test of such a hypothesis is again called a test of the overall significance of the estimated regression. As noted above, this joint hypothesis cannot be tested by testing the individual significance of the $\hat{\beta}_i$'s: in testing the individual significance of an observed partial regression coefficient, we implicitly assume that each test is based on a different (i.e., independent) sample. In testing the significance of $\hat{\beta}_1$ under the hypothesis that β1 = 0, it is tacitly assumed that the testing is based on a different sample from the one used in testing the significance of $\hat{\beta}_2$ under the null hypothesis that β2 = 0. Using the same sample for the joint hypothesis in this way would violate the assumptions underlying the test procedure.
The test procedure for any set of joint hypotheses can be based on a comparison of the restricted residual sum of squares (RRSS), the sum of squared errors in the model obtained by assuming that the null hypothesis is true, and the unrestricted residual sum of squares (URSS), the sum of squared errors of the original, unrestricted model. If the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters, so there will be little change in the sum of squared errors; if these sums of squared errors are substantially different, the data most likely do not support the null hypothesis. In other words, if the null hypothesis is not true, the difference between RRSS and URSS becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. It is always true that RRSS − URSS ≥ 0.
Here $e_i = Y_i - \hat{Y}_i$. Suppose we set the joint hypothesis that all slope coefficients are zero:

$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$

Under this null hypothesis the restricted model is $Y_i = \beta_0 + u_i$, and applying OLS gives

$\hat{\beta}_0 = \dfrac{\sum Y_i}{n} = \bar{Y}$ …………………………………………….3.120

so that $e = Y - \hat{\beta}_0 = Y - \bar{Y}$. The sum of squared errors when the null hypothesis is assumed to be true is called the restricted residual sum of squares (RRSS), and it is equal to the total sum of squares (TSS).
The F statistic, with k − 1 and n − k degrees of freedom for the numerator and denominator respectively, can be computed as the ratio

$F_{(k-1,\,n-k)} = \dfrac{(RRSS - URSS)/(k-1)}{URSS/(n-k)}$ ………………….…………… 3.121

where $URSS = \sum e_i^2 = \sum y^2 - \hat{\beta}_1\sum yx_1 - \hat{\beta}_2\sum yx_2 - \cdots - \hat{\beta}_k\sum yx_k$ and RRSS = TSS. Since RRSS − URSS = TSS − RSS = ESS, it follows that

$F_{(k-1,\,n-k)} = \dfrac{ESS/(k-1)}{URSS/(n-k)}$ ……………………………………….………………. 3.122

follows the F distribution with k − 1 and n − k df. Needless to say, in the three-variable case (Y and X1, X2) k is 3, in the four-variable case k is 4, and so on. Most regression packages routinely calculate the F value (given in the analysis of variance table) along with the usual regression output, such as the estimated coefficients, their standard errors and t values.
The decision is made by comparing the calculated F with the critical value of F that leaves probability α in the upper tail of the F distribution with k − 1 and n − k degrees of freedom. If the computed value of F is greater than the critical value F(k − 1, n − k), then the parameters of the model are jointly significant, i.e. the dependent variable Y is linearly related to the independent variables included in the model: if F > Fα,(k−1,n−k), we reject H0; otherwise we accept H0. Alternatively, if the p-value of the F obtained is sufficiently low (below α), reject H0.
Alternatively, the overall significance test can be carried out using R², as an extension of 3.102. Let us manipulate the equation as follows:
$F = \dfrac{\dfrac{ESS}{TSS}\Big/(k-1)}{\dfrac{RSS}{TSS}\Big/(n-k)} = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ ……………………………………………..(3.123)

Alternatively,

$F = \dfrac{ESS/(k-1)}{RSS/(n-k)} = \dfrac{(n-k)}{(k-1)}\cdot\dfrac{ESS}{TSS - ESS} = \dfrac{(n-k)R^2}{(k-1)(1-R^2)} = F_{k-1,\,n-k}$ ……………………... (3.124)
where R² = ESS/TSS. Equation (3.124) shows how F and R² are related: the two vary directly. When R² = 0, F is zero ipso facto; the larger the R², the greater the F value; and in the limit, when R² = 1, F is infinite. This implies that the computed value of F can be calculated either from the ratio of ESS to RSS or from the ratio of R² to 1 − R². Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R².
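A numerical sketch of the overall F test computed from R² (illustrative only; it assumes scipy, and the inputs R² ≈ 0.9945, n = 5, k = 3 are again those of Example 1):

```python
from scipy import stats

def overall_f_test(r2: float, n: int, k: int, alpha: float = 0.05):
    """Overall significance: F = [(n-k) R^2] / [(k-1)(1 - R^2)],
    compared with the critical F(k-1, n-k) value at level alpha."""
    f_stat = (n - k) * r2 / ((k - 1) * (1.0 - r2))
    f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)
    return f_stat, f_crit, f_stat > f_crit

print(overall_f_test(0.9945, 5, 3))   # F ~ 180.8 exceeds the critical value: reject H0
```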
Summary
In this chapter we extended the econometrics of the simple linear regression model to the case of many variables, in a multistep process starting with the three-variable linear regression model (one dependent variable and two explanatory variables) and moving to the k-variable case. Although in many ways a straightforward extension of the two-variable linear regression model, the three-variable model introduced several new concepts, such as partial regression coefficients and the adjusted and unadjusted multiple coefficients of determination.
There are two common ways of representation: the algebraic and the matrix approach. The matrix approach is preferred as the number of variables increases, because the model can then be written in compact form. We can use OLS, the method of moments, or maximum likelihood as estimation techniques, and we have seen how OLS can be used in both the algebraic and the matrix approach. The properties of the OLS estimators of the multiple regression model parallel those of the two-variable model. More specifically, the regression line passes through the means ($\bar{Y}$, $\bar{X}_2$ and $\bar{X}_3$); the mean of the residuals is zero; and the residuals are uncorrelated with X1 and X2. Furthermore, the mean value of the estimated $\hat{Y}_i$ is equal to the mean value of the actual $Y_i$. Given the assumptions of the classical linear regression model, the OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance.
Although in many ways the concepts developed for the simple linear regression model apply straightforwardly to multiple regression models, a few additional features are unique to such models. Having considered the general framework and estimation techniques of multiple regression, we turned to hypothesis testing. In multiple regression models we undertake many tests of significance, and there are several kinds of hypothesis testing one may encounter in this context: testing hypotheses about an individual partial regression coefficient; testing the overall significance of the estimated multiple regression model, that is, finding out whether all the partial slope coefficients are simultaneously equal to zero; testing whether two or more coefficients are equal to one another, or testing linear equality restrictions; testing whether the partial regression coefficients satisfy certain restrictions; testing the stability of the estimated regression model over time or across different cross-sectional units; and testing the functional form of regression models.
These tests of significance are the same as those discussed for the simple regression model. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence and measure the validity of the estimates. This can be done using various tests, such as the standard error test, the confidence interval and test of significance (Z-test, Student's t-test), the p-value test, and the F-test. Using a test of significance, if the hypothesized value of $\beta_i$ in the null hypothesis is outside the confidence limits, we reject H0 and accept H1, which indicates that $\hat{\beta}_i$ is statistically significant; if the hypothesized value of $\beta_i$ is within the confidence interval, we do not reject H0, and $\hat{\beta}_i$ is statistically insignificant.
Key terms
References
Greene, W. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley Pvt. Ltd.
Gujarati, D. (1999). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric Methods with Applications in Business and Economics. New York: Oxford University Press.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Maddala, G. (1992). Introduction to Econometrics. 3rd ed. New York: Macmillan.
Theil, H. (1957). Specification errors and the estimation of economic relationships. Review of the International Statistical Institute, 25: 41–51.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: The Riverside Printing Co. Ltd.
Verbeek, M. (2004). A Guide to Modern Econometrics. 2nd ed. England: John Wiley & Sons Ltd.
Wooldridge, J. (2000). Introductory Econometrics: A Modern Approach. 2nd ed. New York: South-Western Publishers.
Review Question
a. $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$
b. $H_0: \beta_1 \neq 0$; $H_1: \beta_1 = 0$
c. $H_0: \beta_0 = \beta_0^*$; $H_1: \beta_0 \neq \beta_0^*$
d. $H_0: \beta_0 \neq \beta_0^*$; $H_1: \beta_0 = \beta_0^*$
e. all
f. none
b. Compute the variance of $\hat{\beta}_2$
c. Test the significance of the slope parameter β2 at the 5% level of significance
d. Compute R² and $\bar{R}^2$ and interpret the results
e. Test the overall significance of the model
3. The following data are provided, where X1 and X2 are independent variables and Y is the dependent variable.
Y X1 X2
50 15 8
15 17 3
35 19 5
47 18 12
40 21 5
22 14 10
36 16 8
55 15 9
4.0 Introduction
In both the simple and multiple regression models, we made important assumptions about the distribution of $Y_t$, $X_i$ and the error term $u_t$. It was on the basis of those assumptions that we estimated and tested the significance of the models. Violation of those assumptions leads to many problems related to functional form, error terms and hypothesis testing. The question, then, is what the implications would be if some or all of these assumptions were violated. For better understanding, the assumptions, along with the corresponding violations, are summarized in Table 4.1 below.
Table 4.1: The assumptions of the CLRM and their violations

Assumption | Formal statement | Violation
Serial independence of error terms | $\operatorname{cov}(u_i, u_j) = 0$ | Autocorrelation: a situation where the error terms are correlated
Normality of $u_i$ | $u_i \sim N(0, \sigma^2)$ | Non-normality and outliers, which create hypothesis-testing problems
X is non-stochastic | $\operatorname{cov}(X_i, u_i) = 0$ | Endogeneity: regressors are correlated with the error term, so the OLS estimator is biased
Linearity | $Y = \alpha + \beta x + u$ | Non-linearity and/or wrong regressors or wrong functional form
In the subsequent sections, focus is given to the problems of heteroskedasticity, autocorrelation, multicollinearity, non-normality and model misspecification. For each of these cases, we look at the nature of the violation along with examples, the reasons for the problem, the consequences of the violation for the least squares estimators, ways of detecting the presence of the problem, and remedial measures.
4.1. Heteroscedasticity
4.1.1 The nature of Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability distribution of the disturbance term remains the same over all observations of X; that is, the variation of each $u_i$ around its zero mean does not depend on the value of the explanatory variable. This feature of homogeneity of variance is known as homoscedasticity. To make the concept clear, assume the two-variable model $Y_i = \beta_1 + \beta_2X_i + u_i$, in which the variance of each $u_i$ remains the same irrespective of the values of the explanatory variable. Symbolically,

$\operatorname{var}(u_i) = E(u_i^2) = \sigma_u^2$ (a constant) ……………………4.1

If the variance of u were the same at every point, or for all values of X, it could be plotted in three dimensions as presented in figure 4.1(a); this holds if the observations of the error terms are drawn from identical distributions. In applied work, however, disturbance terms often do not have the same variance: the variance of the error term differs across observations, as presented in figure 4.1(b).
This condition of non-constant variance, or non-homogeneity of variance, is known as heteroscedasticity. That is, under homoscedasticity $\operatorname{var}(u_i) = \sigma_u^2$ (a constant), whereas under heteroscedasticity $\operatorname{var}(u_i) = \sigma_{u_i}^2$ (a value that varies across observations). If $\sigma_{u_i}^2$ is not constant but increases with X, then the conditional variance of $Y_i$ (which is in fact the variance of $u_i$) increases as X increases.
Generally, we can encounter three types of heteroscedasticity:
i. Increasing heteroscedasticity: the variance of the stochastic term is (monotonically) increasing; that is, as X increases, so does the variance of u. This is the form most commonly assumed in econometric applications.
ii. Decreasing heteroscedasticity: as X assumes higher values, the deviation of the observations from the regression line decreases; the variance of the random variable changes in the opposite direction to the explanatory variable.
iii. Cyclical heteroscedasticity: the variance of u decreases initially as X assumes higher values, but after a certain level of X, the variance of u increases with X.
These types are depicted diagrammatically in figures 4.2. In panel (a), $\sigma_u^2$ seems to increase with X; in panel (b), the error variance appears greater in the middle range of the X's, tapering off toward the extremes; and in panel (c), the variance of the error term is greater for low values of X, declining and leveling off rapidly as X increases.
Ideally we would like to know the relationship between $u_i^2$ and X, but the $u_i$'s are not observable. In applied research, therefore, we make convenient assumptions about the form of heteroscedasticity, for instance that the error variance is proportional to the square of the explanatory variable, $E(u_i^2) = \sigma^2X_i^2$, or proportional to its level, $E(u_i^2) = \sigma^2X_i$, etc.
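Such assumed forms are easy to visualize by simulation. A Python sketch (illustrative only) that generates errors whose variance grows with X, i.e. $E(u_i^2) = \sigma^2X_i^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
X = np.linspace(1.0, 10.0, n)

# Homoscedastic errors: constant variance sigma^2
u_homo = rng.normal(0.0, 1.0, n)

# Increasing heteroscedasticity: var(u_i) = sigma^2 * X_i^2,
# so the standard deviation is proportional to X_i
u_hetero = rng.normal(0.0, 1.0, n) * X

# The spread of u_hetero widens with X while u_homo stays roughly uniform
print(u_homo.std(), u_hetero[:50].std(), u_hetero[-50:].std())
```

Plotting u_hetero against X would reproduce the familiar "fan" shape of panel (a) in figure 4.2.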
Practical Exercise 4.1
Mention the three types of heteroscedasticity, along with examples.
4.1.2. Examples and types of data prone to heteroskedasticity
4.1.2.1. Examples of heteroscedastic functions
I. Consumption Function:
Suppose we are to study consumption expenditure from a given cross-sectional sample of family budgets, specified as C_i = α + βY_i + U_i, where C_i is the consumption expenditure of the ith household and Y_i is the disposable income of the ith household. At low levels of income, average consumption expenditure is low, and variation below this level is limited: consumption cannot fall too far below a certain critical level because this might mean starvation, and it cannot rise too far above a certain level because income does not permit it. Such constraints may not bind at higher income levels. Thus, consumption patterns are more regular at lower income levels than at higher levels. This implies that at high income levels the variance of the u's will be high, while at low incomes it will be small. Hence, there is heteroscedasticity. Symbolically,
E ( u 2i ) =σ 2i …………………………………….………… 4.2
Notice the subscript on σ², which reminds us that the conditional variances of u_i (= conditional variances of Y_i) are no longer constant. The assumption of a constant variance of the u's therefore does not hold when estimating the consumption function from a cross-section of family budgets: the variances of Y_i are not the same.
ii. Production Function: Suppose we are required to estimate the production function, specified in general form as Y = f(K, L), from a cross-sectional random sample of firms. The disturbance term in the production function stands for many factors, such as entrepreneurship, technological differences, selling and purchasing procedures, and differences in organization, other than the labor (L) and capital (K) considered explicitly in the production function. These omitted factors show considerably more variability in large firms than in small ones, which leads to a breakdown of our assumption of homogeneous variance.
iii. Wages and firm size
Average wages rise with the size of the firm. In smaller firms pay is low and its variability tends to be small; larger firms pay more on average, but there is more variability in their wages. The variance of wages thus increases as firm size increases, and hence there is heteroscedasticity.
iv. Savings
Savings increase with income, and so does the variability of savings. As incomes grow, people have more discretionary income and hence more scope for choice about how to dispose of it and how much to save. Higher-income families on average save more than lower-income families, but with greater variability. Estimating the savings function from a cross-section of family budgets under the assumption of a constant variance of the u's is therefore not appropriate.
It should be noted that the problem of heteroscedasticity is likely to be more common in cross-sectional
than in time series data. In cross-sectional data, one usually deals with members of a population at a
given point in time, such as individual consumers or their families, firms, industries, or geographical
subdivisions such as state, county, city, etc. Moreover, these members may be of different sizes, such as
small, medium, or large firms or low, medium, or high income. In time series data, on the other hand,
the variables tend to be of similar orders of magnitude because one generally collects the data for the
same entity over a period of time. Examples are GNP, consumption expenditure, savings, or employ -
ment over some period of time.
Practical Exercise 4. 2
Why is heteroscedasticity more common in cross-sectional data?
There are several reasons why the variance of u_i may vary across observations:
1. Error learning model: it states that as people learn and gain experience, their errors of behavior become smaller over time. For example, the number of typing errors varies with typing practice: as the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.
2. As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, σ_i² is likely to increase with income. For instance, in a regression of savings on income, one is likely to find σ_i² increasing with income, because high-income families show much greater variability in their saving behavior than low-income families do. High-income families tend to stick to a certain standard of living, and when their income falls they cut down their savings rather than their consumption expenditure. Low-income families, on the other hand, save for specific purposes, so their saving patterns are more regular. This implies that at high incomes the variance of the u's will be high, while at low incomes it will be small.
3. Heteroskedasticity may arise because of changes in sampling procedures and measurement. Since the error term (u_i) partly expresses the effect of measurement errors on the dependent variable, there is a cogent reason to expect the variance of u_i to vary in most cases. For example, as Y increases, errors of measurement tend to increase, because it becomes more difficult to collect data and check their consistency and reliability; furthermore, errors of measurement tend to be cumulative over time, so their size tends to grow. In this case the variance of u increases with increasing values of X. Conversely, as sampling techniques and data collection methods improve over time, errors of measurement may decrease. For example, banks that have sophisticated data-processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying observation,
or outlier, is an observation much different (either very small or very large) in relation to other
observations in the sample (i.e., extreme values compared to the majority of a variable). The in-
clusion or exclusion of such an observation, especially if the sample size is small, can substan-
tially alter the results of regression analysis. With outliers it would be hard to maintain the as-
sumption of homoscedasticity.
5. Heteroskedasticity may be due to the omission of some explanatory variable (missing variable) and/or a wrong functional form. Many of the variables omitted from the model tend to change in the same direction as X; they cause an increase in the variation of the observations of the dependent variable around the regression line, which implies an increase in the variance of u_i as the values of X increase. In such a situation, the residuals obtained from the regression may give the distinct impression that the error variance is not constant. For example, in the demand function for a commodity, if we do not include the prices of commodities complementary to or competing with the commodity in question, the residuals obtained from the regression may give that impression; but if the omitted variables are included in the model, the impression may disappear.
6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors included in the model. For example, economic variables such as income, wealth, and education typically have skewed distributions: it is well known that the distribution of income and wealth in most societies is uneven, with the bulk of income and wealth owned by a few at the top.
Practical Exercise 4. 3
What are the reasons for heteroscedasticity?
4.1.3. Consequences of heteroscedasticity
i. The OLS estimators of the regression coefficients are still statistically unbiased and consistent, even if the u's are heteroscedastic. The unbiasedness property of the least squares estimates does not require that the variance of the u's be constant, and the coefficient estimates remain linear.
For the model Y_i = α + βX_i + u_i,
β̂ = Σx_iy_i/Σx_i² = β + Σx_iu_i/Σx_i²
E(β̂) = β + Σx_iE(u_i)/Σx_i² = β ...........................................4.3
Similarly, α̂ = Ȳ − β̂X̄ = α + (β − β̂)X̄ + Ū, so that
E(α̂) = α + (β − E(β̂))X̄ + E(Ū) = α .............................4.4
i.e., the least squares estimators are unbiased even under the condition of heteroscedasticity.
ii. The variances of the OLS coefficients are inefficient. Writing β̂ = β + Σk_iu_i with k_i = x_i/Σx_i²,
var(β̂) = E(Σk_iu_i)² = Σk_i²E(u_i²) + ΣΣ(i≠j) k_ik_jE(u_iu_j)
Since E(u_iu_j) = 0 for i ≠ j, under heteroscedasticity (E(u_i²) = σ_i²) this becomes
var(β̂)_het = Σx_i²σ_i²/(Σx_i²)² .................................................................4.5
whereas under homoscedasticity it reduces to the familiar var(β̂)_homo = σ²/Σx_i².
These two variances are different. If u is heteroscedastic, the OLS estimators do not have the smallest variance in the class of unbiased estimators; they are not efficient in either small or large samples. To compare the two variances, assume that the heteroscedastic variance is proportional to σ²:
E(u_i²) = σ_i² = k_iσ²
where the k_i are some non-stochastic constant weights. This assumption merely states how the heteroscedastic variances relate to a common scale σ². Substituting into 4.5,
var(β̂)_het = var(β̂)_homo · (Σk_ix_i²/Σx_i²) .............................................................4.6
Thus it can be seen that the two variances, var(β̂)_het and var(β̂)_homo, will be equal only if k_i = 1 for all i, that is, only if the error is homoskedastic.
If Σk_ix_i²/Σx_i² < 1, the conventional OLS formula will overestimate the variance of β̂;
if Σk_ix_i²/Σx_i² > 1, the conventional OLS formula will underestimate the variance of β̂.
A similar comparison can be made for var(α̂) ...................................................4.7
In particular, if x_i² and k_i are positively correlated, the ratio in (4.6) is greater than 1, and var(β̂) under heteroscedasticity will be greater than its variance under homoscedasticity. As a result, the true standard error of β̂ is underestimated by the conventional formula.
iii. Conventional OLS hypothesis testing is invalid: the standard confidence-interval formulas and hypothesis-testing procedures use these variance estimators and hence the standard errors. When there is heteroskedasticity, the usual formulas for the variances of the coefficients are not appropriate for conducting tests of significance or constructing confidence intervals. E(u_i²) is no longer a finite constant; rather, it tends to change over the range of values of X and hence cannot be taken out of the summation. As a result, the t-value associated with a coefficient may be overestimated, which might lead to the conclusion that β̂ in a specific case at hand is statistically significant (which in fact may not be true). OLS can either overestimate or underestimate the correct variance, and can be only slightly inaccurate or grossly so. Moreover, if we proceed with our model under the false belief of homoscedasticity of the error variance, our inference and prediction about the population coefficients will be incorrect. Hence, conclusions and confidence intervals (CIs) based on biased standard errors will be wrong, and the t and F tests will be invalid and unreliable.
iv. The prediction of Y for a given value of X based on OLS estimates from the original data will have a high variance; hence the prediction is inefficient. The OLS estimated variances (and thus the standard errors of the coefficients) are biased. However, it is not possible to say anything in general about the direction or size of this bias.
Practical Exercise 4. 4
What are the consequences of heteroscedasticity?
4.1.4. Detecting heteroscedasticity
B. Graphical Method: This is a test based on the appearance of a graph. If there is no a priori or empirical information about the nature of the heteroscedasticity, in practice one can run the regression on the assumption that there is no heteroscedasticity and then examine the squared residuals û_i² to see whether they exhibit a systematic pattern. Although the û_i² are not the same thing as the u_i², they can be used as proxies, especially if the sample size is sufficiently large. An examination of the û_i² may reveal patterns such as those shown in Figure 4.3.
Figure 4.3: Scattergram of estimated squared residuals against X
To check whether a given data set exhibits heteroscedasticity, we look at whether there is a systematic relation between the squared residuals e_i² and the mean value of Y or with X_i. Furthermore, the û_i² can be plotted against Ŷ_i, the estimated Y_i from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residuals. To do this, first run OLS and plot the squared residuals against Ŷ (or against each X); in Figure 4.3 the e_i² are plotted against Ŷ (or X_i). Second, the graph may show some relationship (linear, quadratic, ...), providing clues as to the nature of the problem and a possible remedy.
In Figure 4.3a, it can be seen that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figure 4.3b, however, exhibits a definite pattern; Figure 4.3c suggests a linear relationship, whereas Figures 4.3d and 4.3e indicate a quadratic relationship between û_i² and Ŷ_i. Using such knowledge, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity. Instead of plotting û_i² against Ŷ_i, one may also plot them against one of the explanatory variables; a similar pattern is expected, which is useful as a cross-check.
We can illustrate the above using a hypothetical example on consumption expenditure and disposable income, given in Table 4.2.
Table 4.2: Consumption expenditure (C) and disposable income (ID) for 30 families
Family  Consumption  Disposable income (ID)
1 10600 12000
2 10800 12000
3 11100 12000
4 11400 13000
5 11700 13000
6 12100 13000
7 12300 14000
8 12600 14000
9 13200 15000
10 13000 15000
11 13300 15000
12 13600 15000
13 13800 16000
14 14000 16000
15 14200 16000
16 14400 17000
17 14900 17000
18 15300 17000
19 15000 18000
20 15700 18000
21 16400 18000
22 15900 19000
23 16500 19000
24 16900 19000
25 16900 20000
26 17500 20000
27 18100 20000
28 17200 21000
29 17800 21000
30 18500 21000
Solution
Regressing C on ID we obtain the following results:
Ĉ = 1480 + 0.788·ID
se (449.61) (0.026845)
t (3.29) (29.37)
Using this model we can produce intermediate results (fitted values and residuals) that help to draw the residual plot.
Now, if we plot the squared residuals against the X values, we obtain the pattern shown in figure 4.4. The plot shows a wedge shape that widens to the right. This can be interpreted as an increasing variance of the random term in the model, i.e., the presence of heteroscedasticity. If we instead plot the residuals against the predicted (fitted) values of C, we observe similar results and reach the same interpretation.
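The same plots can be produced in STATA with a few commands. The following is a minimal sketch, assuming the data of Table 4.2 are loaded with variables named c (consumption) and id (disposable income); these variable names are chosen here for illustration only.

    * regress consumption on disposable income
    regress c id
    * save the residuals and square them
    predict uhat, residuals
    generate usq = uhat^2
    * save the fitted values for the alternative plot
    predict chat, xb
    * plot squared residuals against income and against the fitted values
    scatter usq id
    scatter usq chat

A wedge-shaped scatter that widens with id (or with chat) points to an increasing error variance, as described above.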
Figure 4.4: Squared residuals (usquare) plotted against disposable income and against the fitted values
Example-2: Table 4.4 gives data on the quantity demanded of a commodity (Y), its price (X1) and consumer income (X2) over a period of 10 months. Fit the regression of Y on X1 and X2 and check for the problem of heteroscedasticity in the model.
Table 4.4: Quantity demanded, price of the commodity and income of consumer data
NO. Y X1 X2
1 100 5 1000
2 75 7 600
3 80 6 1200
4 70 6 500
5 50 8 300
6 65 7 400
7 90 5 1300
8 100 4 1100
9 110 3 1300
10 60 9 300
Solution
Ŷ = 111.6918 − 7.18824X1 + 0.014297X2   R² = 0.89, adjusted R² = 0.86
se (23.53081) (2.555331) (0.011135)
t (4.75) (−2.81) (1.28)
Practical Exercise 4.05
1. Produce the residual graph and test whether heteroscedasticity exists.
2. Interpret the results of the above example.
A. Park Test: Park formalizes the graphical method by suggesting that the variance of the disturbance, σ_i², is some function of the explanatory variable X_i. The specific functional form he suggested was
σ_i² = σ²X_i^β e^(v_i), or
ln σ_i² = ln σ² + β ln X_i + v_i ..................................4.8
Since σ_i² is generally not known, Park suggests using e_i² as a proxy. Accordingly, Park suggests running the regression
ln e_i² = ln σ² + β ln X_i + v_i = α + β ln X_i + v_i ..................................4.9
If β turns out to be statistically significant, this suggests that heteroscedasticity is present in the data.
Example-1: Suppose that from a sample of size 100 we estimate the relation between compensation of workers (Y_i) and productivity (X_i).
To illustrate the Park approach, the following regression function is used
Yi = βo + β1Xi + Ui ...............................................4.10
Suppose data on Y and X come up with the following result
Ŷ = 1992.342 + 0.2329X_i .................................................4.11
SE (936.479) (0.0998)
t (2.1275) (2.333) R² = 0.4375
The results reveal that the estimated slope coefficient is significant at the 5% level of significance on the basis of a one-tail t-test. The equation shows that as labor productivity increases by one birr, labor compensation on average increases by about 23 cents.
The residuals obtained from regression (4.11) were then regressed on X_i as suggested by equation (4.9), giving the following result:
ln e_i² = 35.817 − 2.8099 ln X_i ...................................4.12
SE (38.319) (4.216)
t (0.934) (−0.667) R² = 0.0595
The above result reveals that the slope coefficient is statistically insignificant, implying that there is no statistically significant relationship between the two variables.
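The Park regression of equation (4.9) can be run in STATA as follows. This is an illustrative sketch assuming a dependent variable y and a regressor x (hypothetical names).

    * first-stage regression and residuals
    regress y x
    predict e, residuals
    * second-stage Park regression of ln(e^2) on ln(x)
    generate lne2 = ln(e^2)
    generate lnx = ln(x)
    regress lne2 lnx

A statistically significant slope in the second regression suggests heteroscedasticity of the form assumed by Park.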
Practical Exercise 4.5
Consider a relationship between compensation (Y) and productivity (X). To illustrate the Park approach, the following regression function is used:
Yi = βo + β1Xi + Ui .....................................................................4.13
Suppose data on Y and X come up with the following result:
Ŷ = 1992.34 + 0.23Xi………………………….…….……….4.14
s.e. = (936.48) (0.10)
t = (2.13) (2.33) r² = 0.44
Suppose that the residuals obtained from the above regression were regressed on Xi as suggested by the Park test, giving the following results:
ln e_i² = 35.82 − 2.81 lnXi ……………………………………….4.15
s.e. = (38.32) (4.22)
t = (0.93) (−0.67) r² = 0.46
Interpret the result.
Limitation
Following the Park test above, one may conclude that there is no heteroscedasticity in the error variance. Although empirically appealing, the Park test has some problems. Goldfeld and Quandt have argued that the error term v_i entering ln e_i² = ln σ² + β ln X_i + v_i may not satisfy the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly exploratory method, one may use the Park test.
B. Glejser test:
The Glejser test is similar in spirit to the Park test. After obtaining the residuals e_i from the OLS regression, Glejser suggests regressing the absolute values of e_i on the X variable thought to be associated with σ_i², using various functional forms, such as
|e_i| = α + βX_i + v_i, |e_i| = α + β√X_i + v_i, |e_i| = α + β(1/X_i) + v_i, etc.
The Glejser test thus gives information on the particular way or form of heteroscedasticity in which u_i² is connected with X. This information is crucial for the correction of the heteroscedasticity of the disturbance term.
Limitation
Goldfeld and Quandt point out that the expected value of the error term v_i is non-zero, that it is serially correlated and that, ironically, it is heteroscedastic. An additional difficulty with the Glejser method is that models such as |e_i| = √(α + βX_i) + v_i and |e_i| = √(α + βX_i²) + v_i are non-linear in the parameters and therefore cannot be estimated with the usual OLS procedure. As a practical matter, the Glejser technique may be used in large samples, and in small samples strictly as a qualitative device to learn something about heteroscedasticity.
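A minimal STATA sketch of the Glejser procedure, again assuming variables named y and x (hypothetical names), is shown below; only linear-in-parameters forms are used, since the non-linear forms mentioned above cannot be estimated by OLS.

    * obtain the residuals and their absolute values
    regress y x
    predict e, residuals
    generate abse = abs(e)
    * try alternative Glejser functional forms
    generate sqrtx = sqrt(x)
    generate invx = 1/x
    regress abse x
    regress abse sqrtx
    regress abse invx

A significant slope in any of these regressions indicates heteroscedasticity and suggests which transformation of the model may remove it.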
Practical Exercise 4.6
Compare and contrast the Park test and the Glejser test.
Which one gives a stronger basis for testing heteroscedasticity?
C. Spearman's rank correlation test
This is the simplest method of testing for heteroscedasticity. It can be applied to small samples as well as large samples. The steps to conduct this test are outlined as follows.
Step 1: Formulate the null hypothesis of homoscedasticity against the alternative hypothesis of heteroscedasticity as H0: ρ_s = 0 and H1: ρ_s ≠ 0, where ρ_s is the population rank correlation coefficient and r_s is its sample counterpart.
Step 2: Regress Y on X as shown below:
Y_i = β0 + β1X_i + U_i ………………………… 4.16
Then obtain the residuals e_i, which are the estimates of the u's from equation 4.16.
Step 3: Order the e's (ignoring their sign) and the X values in ascending or descending order and compute the rank correlation coefficient between e and X. The Spearman rank correlation coefficient r_s is given as:
r_s = 1 − 6ΣD_i²/[n(n² − 1)] ...............................................................................4.17
where D_i is the difference between the ranks of corresponding pairs of e and X, and n is the number of observations in the sample. If we have a model with more than one explanatory variable, we may compute the Spearman rank correlation coefficient between e and each one of the explanatory variables separately.
Step 4: Compute the calculated value of t from equation 4.18. Assuming that the null hypothesis is true (i.e., the population rank correlation coefficient is zero) and n > 8, the significance of the sample rank correlation coefficient r_s can be tested by a t-test. The formula used to obtain the value of t for this purpose is:
t = r_s√(n − 2)/√(1 − r_s²) ...................................................................4.18
Step 5: Make a decision using the following rule: reject the null hypothesis of homoscedasticity if the value of t obtained from equation 4.18 (t calculated) is greater than the critical t value for n − 2 degrees of freedom, and conclude that there is a problem of heteroscedasticity.
Example: Consider the regression Yi = β0 + β1Xi to illustrate the rank correlation test using 10 observations. Ranking |Û_i| and X_i in ascending order gives ΣD_i = 0 and ΣD_i² = 110.
Applying formula (4.17) we obtain: r_s = 1 − 6(110)/[10(100 − 1)] = 0.33
Applying the t-test given in (4.18), we obtain: t = 0.33√8/√(1 − 0.11) = 0.99
The critical t-value for 8 (= 10 − 2) df is 1.856, indicating that the test is not significant even at the 10% level of significance. Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which suggests that there is no heteroscedasticity.
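The rank correlation test can be carried out in STATA with the spearman command; the sketch below assumes variables y and x (hypothetical names). The command reports r_s and a p-value directly, and the t statistic of equation 4.18 can be recovered from the stored results.

    regress y x
    predict e, residuals
    generate abse = abs(e)
    * Spearman rank correlation between |e| and x
    spearman abse x
    * t statistic of equation 4.18 from the stored results
    display r(rho)*sqrt((r(N)-2)/(1-r(rho)^2))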
D. Goldfeld-Quandt test
This method is applicable if one assumes that the heteroscedastic variance σ_i² is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:
Y_i = α + βX_i + U_i …………………………4.19
Suppose σ_i² is positively related to X_i as:
σ_i² = σ²X_i² .................................................4.20, where σ² is a constant.
If this specification is appropriate, σ_i² will be larger the larger the value of X_i. If that turns out to be the case, heteroscedasticity is most likely present in the model.
To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Formulate the hypotheses to be tested: the null hypothesis of homoscedasticity (the variances of the two groups defined below are equal) against the alternative of heteroscedasticity.
Step 2: Order or rank the observations according to the values of X_i, beginning with the lowest X value.
Step 3: Select arbitrarily a certain number, c, of central observations to be omitted from the analysis. Generally, for samples larger than 30 (n > 30), the optimum number of central observations to be omitted is approximately a quarter of the total observations: for example, we omit 8 central observations for n = 30, 16 central observations for n = 60, and so on. The number c is specified a priori and divides the remaining (n − c) observations into two groups of (n − c)/2 observations each, where one group includes the small values of X and the other the large values of X.
Step 4: Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS1 and RSS2: RSS1 from the regression corresponding to the smaller X_i values (the small-variance group) and RSS2 from that corresponding to the larger X_i values (the large-variance group). Each of these RSS has
(n − c)/2 − k, equivalently (n − c − 2k)/2, degrees of freedom,
where k is the total number of parameters in the model, including the intercept term.
Step 5: Compute
λ = (RSS2/df)/(RSS1/df) = [Σu2²/((n − c)/2 − k)]/[Σu1²/((n − c)/2 − k)] ...................................4.21
If the U_i are assumed to be normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then λ follows the F distribution with numerator and denominator df each equal to (n − c − 2k)/2:
λ = [RSS2/((n − c − 2k)/2)]/[RSS1/((n − c − 2k)/2)] ~ F, with v1 = v2 = (n − c)/2 − k ...................................4.22
If the two variances are equal (i.e., if the u's are homoscedastic), the value of F in the equation above will tend to 1. On the other hand, if the two variances differ, F will take a large value, since Σu2² > Σu1².
Step 6: Make a decision: compare the observed value of F obtained from equation 4.21 with the theoretical (tabulated) value of F for v1 = v2 = (n − c)/2 − k degrees of freedom at the chosen level of significance. Reject the null hypothesis of homoscedasticity (H0: σ1² = σ2², i.e., no difference between the variances of the two groups) if F_cal > F_tab, and conclude that there is a heteroscedasticity problem in the model, i.e. that heteroscedasticity is very likely. The higher the observed value of F, the stronger the heteroscedasticity problem.
Example: To illustrate the Goldfeld-Quandt test, we present in table 4.6 data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure Y ($) is linearly related to income X ($), but we suspect that heteroscedasticity is present in the data. The reordering of the data needed for the application of the test is also presented in table 4.6.
Table 4.6: Hypothetical data on consumption expenditure and income
Original data (Y, X)    Data ranked by X values (Y, X)
55 80 55 80
65 100 70 85
70 85 75 90
80 110 65 100
79 120 74 105
84 115 80 110
98 130 84 115
95 140 79 120
90 125 90 125
75 90 98 130
74 105 95 140
110 160 108 145
113 150 113 150
125 165 110 160
108 145 125 165
115 180 115 180
140 225 130 185
120 200 135 190
145 240 120 200
130 185 140 205
152 220 144 210
144 210 152 220
175 245 140 225
180 260 137 230
135 190 145 240
140 205 175 245
178 265 189 250
191 270 180 260
137 230 178 265
189 250 191 270
Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13 observations, and their associated residual sums of squares, are shown in table 4.7 (standard errors in parentheses).
Table 4.7: Regression results of the two sub-groups for the Goldfeld-Quandt test
Regression based on the first 13 observations | Regression based on the last 13 observations
The critical F-value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F (λ) value exceeds this critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1%, we may not reject the assumption of homoscedasticity, since the p-value of the observed λ is about 0.014.
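For data such as Table 4.6, the Goldfeld-Quandt computation can be sketched in STATA as follows (variable names y and x are assumptions; 30 observations with the middle four dropped, as in the example).

    * order the observations by x
    sort x
    * regression on the first 13 observations (small-variance group)
    regress y x in 1/13
    scalar rss1 = e(rss)
    * regression on the last 13 observations (large-variance group)
    regress y x in 18/30
    scalar rss2 = e(rss)
    * lambda of equation 4.21; the equal degrees of freedom cancel
    scalar lambda = rss2/rss1
    display lambda
    display Ftail(11, 11, lambda)

Ftail() returns the upper-tail p-value of the F distribution, which can be compared with the chosen level of significance.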
E. Breusch–Pagan–Godfrey Test
The success of the Goldfeld–Quandt test depends not only on the value of c (the number of central observations to be omitted) but also on identifying the correct X variable with which to order the observations. This limitation can be avoided by using the Breusch–Pagan–Godfrey (BPG) test. To illustrate this test, consider the k-variable linear regression model
To illustrate this test, consider the k-variable linear regression model
Yi = 0 + 1X1i + … + kXki + Ui ….. ..........................................4.23
Some or all of the X’s can serve as Z’s. Assume that the error variance i2 is described as
i2 = f(1 + 2Z2i + … + mZmi)………..........................................4.24
i2 is some function of the non-stochastic variables Z’s. Specifically, assume that
i2 = 0 + 1Z1i + … + mZmi …...................................................4.25
i2 is a linear function of the Z’s . If 1 = 2 = … = m = 0, i2 = 0 which is constant. Therefore, to test
whether i2 is homoscedastic is to test the hypothesis that 1 = 2 = … = m = 0. This actual test proce-
dure is as follows.
Step 1. Estimate Yi = β0 + β1X1i + … + βkXki + Ui by OLS and obtain the residuals Û1, Û2, …, Ûn.
Step 2. Obtain σ̃² = ΣÛi²/n. This is the maximum likelihood estimator of σ². (Recall from unit two that the OLS estimator divides by n − k rather than n.)
Step 3. Construct the variables p_i defined as p_i = Ûi²/σ̃² ………………………………..4.26
that is, each squared residual divided by σ̃².
Step 4. Regress p_i on the Z's as p_i = α1 + α2Z2i + … + αmZmi + vi ………………………..4.27
where vi is the residual term of this regression.
Step 5. Obtain the ESS (explained sum of squares) from equation 4.27 and define
Θ = ½(ESS) ………….………..………………………………..4.28
Assuming the Ui are normally distributed, one can show that if there is homoscedasticity and the sample size n increases indefinitely, then Θ ~ χ²(m − 1) asymptotically, that is, Θ follows the chi-square distribution with (m − 1) degrees of freedom. Therefore, if the computed Θ (= χ²) exceeds the critical χ² value at the chosen level of significance, one can reject the hypothesis of homoscedasticity; otherwise one does not reject it.
Example: Suppose we have 30 observations on Y and X that gave the following regression results.
Step 1: Ŷ = 9.29 + 0.64Xi …………………………………………………....………..4.29
s.e. = (5.2) (0.03) RSS = 2361.15
Step 2: σ̃² = ΣÛi²/30 = 2361.15/30 = 78.71
Step 3: p_i = Ûi²/78.71, i.e., each squared residual û² obtained from the regression in step 1 is divided by 78.71 to construct the variable p_i.
Step 4: Assuming the p_i are linearly related to Xi (= Zi), we obtain the following regression result:
p_i = −0.74 + 0.01Xi   ESS = 10.42
Step 5: Θ = ½(ESS) = 5.21……………………………………………..…..4.30
From the chi-square table we find that for 1 df the 5% critical chi-square value is 3.84. Thus the observed chi-square value is significant at the 5% level of significance, and we reject the hypothesis of homoscedasticity. Note that the BPG test is asymptotic, that is, a large-sample test; in small samples it is sensitive to the assumption that the disturbances vi are normally distributed.
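In STATA a version of the BPG test is available after regress through the estat hettest postestimation command. A minimal sketch, assuming variables y and x:

    regress y x
    * by default the test uses the fitted values of y as the Z variable
    estat hettest
    * alternatively, name the Z variable(s) explicitly
    estat hettest x

The command reports a chi-square statistic and its p-value, which are compared with the critical value as in step 5 above.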
Practical Exercise 4.7
Compare and contrast the above tests of heteroscedasticity.
Which one gives the strongest basis for testing heteroscedasticity?
4.1.5. Remedial measures for heteroscedasticity
As shown above, heteroscedasticity does not destroy the unbiasedness of the OLS estimators, but it leads to biased variance estimates and inefficient results. Therefore, remedial measures concentrate on the variance of the error term, the sources of the problem and its treatment. This should be done in steps.
To this end, first check for specification errors (omitted variables, wrong functional form) and make the necessary corrections, because heteroskedasticity could be a symptom of other problems, e.g. omitted variables and/or a wrong functional form. Second, if the problem persists even after correcting for such specification errors, scholars recommend two main approaches. One approach is to stick to OLS but use heteroskedasticity-robust standard errors. The second is to use OLS after transforming the model.
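The robust standard error approach requires only one option in STATA; a minimal sketch, assuming variables y and x (hypothetical names):

    * OLS coefficients with heteroskedasticity-robust (Huber/White) standard errors
    regress y x, vce(robust)

The coefficient estimates are identical to ordinary OLS; only the standard errors, t statistics and confidence intervals are corrected.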
Since the transformed data no longer possess heteroscedasticity, it can be shown that the estimates of the transformed model are more efficient (i.e., they possess smaller variance).
A. When σ_i² = E(u_i²) is known
Assume that our original model is Y_i = α + βX_i + U_i, where u_i satisfies all the assumptions except that it is heteroscedastic. That is,
Y_i = α + βX_i + U_i, with var(u_i) = σ_i², E(u_i) = 0, E(u_iu_j) = 0 ………………..4.31
E(u_i²) = σ_i² = f(k_i) …………………………………….………………………4.32
If we apply OLS to the above model, the estimators are not BLUE. To make them BLUE we have to transform the model using the variance. When the population variance σ_i² is known, we use it for the transformation; the form of the variance differs depending on the situation. The transformation is done by dividing the model through by the standard error σ_i, i.e. the square root of the variance σ_i². Dividing the above model by σ_i:
Y_i/σ_i = α(1/σ_i) + β(X_i/σ_i) + U_i/σ_i ......................................................................4.33
The variance of the transformed error term is constant:
var(u_i/σ_i) = (1/σ_i²)E(u_i²) = (1/σ_i²)σ_i² = 1, a constant …………….4.34
The transformed parameters are BLUE, because all the assumptions, including homoscedasticity, are satisfied. Writing the transformed residual of (4.33) as u_i/σ_i = (1/σ_i)(Y_i − α − βX_i), we minimize
Σ(u_i/σ_i)² = Σ[(Y_i − α − βX_i)/σ_i]² ................................................4.35
Let w_i = 1/σ_i². The method of GLS (WLS) then minimizes the weighted residual sum of squares
Σw_iû_i² = Σw_i(Y_i − α̂ − β̂X_i)² ……………………………………………4.36
First-order condition with respect to α̂: ∂(Σw_iû_i²)/∂α̂ = −2Σw_i(Y_i − α̂ − β̂X_i) = 0
Σw_i(Y_i − α̂ − β̂X_i) = 0, i.e. Σw_iY_i = α̂Σw_i + β̂Σw_iX_i
First-order condition with respect to β̂: ∂(Σw_iû_i²)/∂β̂ = −2Σw_i(Y_i − α̂ − β̂X_i)X_i = 0
Σw_i(Y_i − α̂ − β̂X_i)X_i = 0
Σw_i(X_iY_i − α̂X_i − β̂X_i²) = 0, i.e. Σw_iX_iY_i = α̂Σw_iX_i + β̂Σw_iX_i²
Solving these two normal equations yields the weighted least squares estimators.
Thus, in GLS we minimize a weighted sum of squared residuals with w_i = 1/σ_i² acting as the weights, whereas in OLS we minimize an unweighted (equally weighted) RSS. The weight assigned to each observation is inversely proportional to its σ_i: observations coming from a population with a larger σ_i get relatively smaller weight, and those from a population with a smaller σ_i get proportionately larger weight, in minimizing the RSS in (4.36). Since (4.36) minimizes a weighted RSS, the method is known as weighted least squares (WLS), and the estimators thus obtained are known as WLS estimators. WLS is just a special case of the more general estimating technique, GLS. Note that if w_i = w, a constant for all i, the WLS estimator β* is identical with the OLS estimator β̂.
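In STATA, WLS can be implemented with analytic weights. The sketch below assumes variables y and x and, purely for illustration, that var(u_i) is proportional to x², so that w_i = 1/x².

    * construct the weights (assumed form of the variance)
    generate w = 1/(x^2)
    * weighted least squares: aweights are proportional to 1/sigma_i^2
    regress y x [aweight = w]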
B. When σ_i² = E(u_i²) is not known: assume σ_i² = k²f(X_i) (different forms)
It is quite rare for the population variance to be known. If the population variance is not known, we make an assumption about its form, which may take different functional forms. Let us assume σ_i² is some function of X, i.e. E(u_i²) = σ_i² = k²f(X_i). The transformed version of the model is then obtained by dividing the original model through by √f(X_i).
(u i ) 2 i2 k 2 X i2
Case I. Suppose the heteroscedasticity is of the form . This equation implies that
ui X i2
the variance of increases proportional with , Then, solving for the constant factor of proportional-
w2
k2
2
ity, k , from equation above, we get X i 2 . If Y X i U i where var(u i ) i2 K i2 X i2
.
This suggests that the appropriate transformation of the original model consists of dividing the original model by the transforming variable √(X_i²) = X_i. The transformed model is:
Y_i/X_i = α/X_i + β + U_i/X_i .............................................................4.40
Let Y* = Y_i/X_i and U* = U_i/X_i. Then
Y* = β + α(1/X_i) + U* .............................................................4.41
and the fitted transformed model is
Ŷ* = β̂ + α̂(1/X_i) ..................................................................4.42
Note that U* = U_i/X_i in the transformed model above is homoscedastic:
var(U*) = var(U_i/X_i) = (1/X_i²)E(U_i²) = (1/X_i²)k²X_i² = k², a constant.
Hence OLS can be applied to the transformed model.
i
U
i
OLS to the transformed version of the model X i X i . Note that in this transformation the position
1
of the coefficients has changed: the parameter of the variable X i in the transformed model is the con-
stant intercept of the original model, while the constant of term of the transformed model is the parame-
ter of the explanatory variable X in the original model. Therefore, to get back to the original model, we
K
shall have to multiply the estimated regression by i .
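A sketch of this Case I transformation in STATA, assuming variables y and x (hypothetical names):

    * transformed variables: Y/X and 1/X
    generate ystar = y/x
    generate invx = 1/x
    regress ystar invx
    * the coefficient on invx estimates the original intercept (alpha),
    * while the constant of this regression estimates the original slope (beta)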
Case II. Suppose the heteroscedasticity is of the form E(u_i²) = σ_i² = k²X_i. Then the transformation of the original model consists of dividing the original relationship Y_i = β0 + β1X_i + U_i by the transforming variable √X_i:
Y_i/√X_i = β0/√X_i + β1√X_i + U_i/√X_i ................................................4.43
Note that the transformed random term U_i/√X_i in equation 4.43 is homoscedastic, with constant variance equal to k²:
var(U_i/√X_i) = (1/X_i)E(U_i²) = (1/X_i)k²X_i = k² ........................................................................4.44
Since k² is a constant, we can conclude that with the above transformation we have solved the problem of heteroscedasticity, and hence we can estimate the transformed model with OLS. There is no intercept term in the transformed model; therefore, one will have to use the 'regression through the origin' model to estimate β0 and β1. In this case, to get back to the original model, we shall have to multiply the estimated regression through by √X_i.
Case III. Suppose the heteroscedasticity is of the form E(u_i²) = σ_i² = k²[E(Y_i)]². Then the transformation consists of dividing the original model by E(Y_i):
Y_i/E(Y_i) = β0/E(Y_i) + β1X_i/E(Y_i) + U_i/E(Y_i) .............................................................4.46
Proof: the stochastic term of the transformed model given in the equation above has constant variance, since
var(U_i/E(Y_i)) = (1/[E(Y_i)]²)E(u_i²) = (1/[E(Y_i)]²)k²[E(Y_i)]² = k² ..............................................4.47
Hence, with the above transformation we have solved the problem of heteroscedasticity, and we can estimate the transformed model with OLS. The transformed model described above is, however, not operational as it stands, because the values of β0 and β1, and hence E(Y_i), are not known. But since we can obtain Ŷ_i = β̂0 + β̂1X_i, the transformation can be made through the following two steps.
1st: run the usual OLS regression, disregarding the heteroscedasticity problem in the data, and obtain Ŷ. Then, using the estimated Ŷ, transform the model as follows:
Y_i/Ŷ_i = β0(1/Ŷ_i) + β1(X_i/Ŷ_i) + U_i/Ŷ_i ....................................................................................4.48
2nd: estimate the transformed model (4.48) by OLS.
Assumption IV: A log transformation such as
ln Y_i = β0 + β1 ln X_i + U_i ..................................................................................4.49
very often reduces heteroscedasticity when compared with the regression Y_i = β0 + β1X_i + U_i. This result arises because the log transformation compresses the scales in which the variables are measured: for example, it reduces a ten-fold difference between two values (such as between 8 and 80) into roughly a two-fold difference (because ln 80 ≈ 4.38 and ln 8 ≈ 2.08).
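The log transformation is equally simple in STATA; a sketch assuming strictly positive variables y and x (hypothetical names):

    generate lny = ln(y)
    generate lnx = ln(x)
    regress lny lnx

The slope is then interpreted as an elasticity, and the compression of scale described above often stabilizes the error variance.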
Concluding Remark
The remedial measures based on transformation point out that we are essentially speculating about the nature of σ_i². Note also that the OLS estimators obtained from a correctly transformed equation are BLUE. Which transformation to use depends on the nature of the problem and the severity of the heteroscedasticity. Moreover, in a multiple regression model we may not know a priori which of the X variables should be chosen for transforming the data. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Finally, the t-tests, F-tests, etc. are strictly valid only in large samples when the regression is conducted on transformed variables.
Practical Exercise 4.8
What are the remedial measures for heteroscedasticity?
Briefly present how GLS works and the logic behind the method.
4.2 Autocorrelation
4.2.1 The nature of Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of the classical linear regression model is that cov(u_i, u_j) = E(u_iu_j) = 0 for i ≠ j, which implies that successive values of the disturbance term U are temporally independent. That is, a disturbance occurring at one point of observation is not related to any other disturbance, and when observations are made over time, the effect of a disturbance occurring in one period does not carry over into another period. Sometimes this assumption is not satisfied, and there is autocorrelation of the random variables. Autocorrelation is defined as a 'correlation' between members of a series of observations ordered in time (as in time series data) or space (as in cross-sectional data), i.e. spatial autocorrelation. That is, autocorrelation is present if the value of U in any particular period is correlated with its own preceding value(s).
There is a difference between 'correlation' and autocorrelation. Autocorrelation is a special case of correlation that refers to the relationship between successive values of the same variable, while correlation may also refer to the relationship between two or more different variables. Autocorrelation is also sometimes called serial correlation, although some economists distinguish between the two terms. According to G. Tintner, autocorrelation is the lag correlation of a given series with itself, lagged by a number of time units, while serial correlation is the lag correlation between two different series. Thus, correlation between two series such as u1, u2, …, u10 and u2, u3, …, u11, where the latter is the same series lagged by one time period, is autocorrelation; whereas correlation between series such as u1, u2, …, u10 and v2, v3, …, v11, where U and V are two different time series, is called serial correlation. Although the distinction between the two terms may be useful, we shall treat them synonymously in our subsequent discussion.
Just as heteroskedasticity is generally associated with cross-sectional data, autocorrelation is usually associated with time series data (i.e. data ordered in temporal sequence). We take a look at this in order to answer the following questions: What is the nature of autocorrelation? What are its consequences (theoretical and applied)? How does one detect autocorrelation? How does one remedy the problem of autocorrelation?
Figure 4.5: Error terms and their patterns over time
Figures (a)-(d) above show systematic patterns among the U's, indicating autocorrelation: figure (a) shows a cyclical pattern, (b) suggests an upward linear trend, (c) a downward linear trend, and (d) a quadratic trend in the disturbance terms. Figure (e) shows no systematic pattern, supporting the assumption of no autocorrelation.
On the other hand, one can present the relationship between successive error terms as indicated in figures 4.6 f-g: autocorrelation is shown graphically by plotting successive values of the random disturbance term against each other, u_i vertically and u_j horizontally.
There are several reasons why serial or autocorrelation arises. Some of these are:
a. Cyclical fluctuations
Time series such as GNP, price indices, production, employment and unemployment exhibit business cycles. For instance, starting at the bottom of a recession, when economic recovery begins, most of these series move upward; in this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a momentum built into them, and it continues until something happens (e.g. an increase in interest rates or taxes) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
b. Interpolation in statistical observations (data manipulation)
In empirical analysis, raw data are often 'massaged': most published time series involve some interpolation and 'smoothing' processes, in which averages of the values are taken over successive time periods. Consequently, the successive values of U become interrelated and exhibit an autocorrelation pattern. For example, in time series regressions involving quarterly data, such data are often derived from monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces 'smoothness' into the data by dampening the fluctuations in the monthly data, and this smoothness can itself lead to a systematic pattern in the disturbances, thereby introducing autocorrelation.
c. The cobweb phenomenon
The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement (the gestation period). Thus, the crop-planting decisions of farmers at the beginning of this year are influenced by the price prevailing last year, so that their supply function is
Supply_t = β0 + β1P_t−1 + U_t
Suppose that at the end of period t the price P_t turns out to be lower than P_t−1. In period t+1 farmers may then decide to produce less than they did in period t. Obviously, in this situation the disturbances U_t are not expected to be random: if the farmers overproduce in year t, they are likely to underproduce in year t+1, and so on, leading to a cobweb pattern.
d. Misspecification of the true random term, U
It may well be that the successive values of the true U are correlated by the very nature of the data or the system. This case of autocorrelation may be called 'true autocorrelation', because its root lies in the U term itself. Sometimes purely random factors (such as wars, droughts, storms, strikes, etc.) impose influences that are spread over more than one period of time. These factors result in serially dependent values of the disturbance term U, which violates the assumption of no autocorrelation.
e. Specification bias
Autocorrelation may arise as a result of specification biases, which arise from the exclusion of variables from the regression model, an incorrect functional form, or neglecting lagged terms in the regression model. Let us see one by one how these specification biases cause autocorrelation.
i. Exclusion of variables: One source is the exclusion of variable(s) from the model, which produces a systematic pattern in the errors. Most economic variables tend to be autocorrelated; if an autocorrelated variable has been excluded from the set of explanatory variables, its influence will obviously be reflected in the random variable U. This case may be called 'quasi-autocorrelation', since it is due to the autocorrelated pattern of omitted explanatory variables and not to the behavioral pattern of the values of the true U.
For example, suppose the 'correct' demand model is
y_t = β0 + β1x1t + β2x2t + β3x3t + U_t .............................................................4.50
where y = quantity of beef demanded, x1 = price of beef, x2 = consumer income, x3 = price of pork, and t = time. Now, suppose we run the following regression in lieu of (4.50):
y_t = β0 + β1x1t + β2x2t + V_t .............................................................4.51
If equation 4.50 is the 'correct' model, running equation 4.51 (the incorrect model) is tantamount to letting V_t = β3x3t + U_t. To the extent that the price of pork affects the consumption of beef, the disturbance term V will reflect a systematic pattern, thus creating autocorrelation. A simple test of this would be to run both equation 4.50 and equation 4.51 and see whether any autocorrelation observed in equation 4.51 disappears when equation 4.50 is run. The actual mechanics of detecting autocorrelation will be discussed later.
ii. Incorrect functional form: This is another source of autocorrelation of the error term. If we have adopted a mathematical form that differs from the true form of the relationship, the values of U may show serial correlation. Suppose the 'true' or correct model in a cost-output study is
Marginal cost_i = β0 + β1output_i + β2output_i² + U_i ......................................................4.52
but we fit the linear model
Marginal cost_i = α0 + α1output_i + V_i ......................................................4.53
As the figure shows, between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas outside these points it will consistently underestimate it. This result is to be expected, because the disturbance term V_i is in fact equal to β2(output)² + u_i, and hence will catch the systematic effect of the (output)² term on marginal cost. In this case, V_i will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting lagged terms in the model: If the dependent variable of a regression model is affected by the lagged value of itself or of the explanatory variable and this lagged value is not included in the model, the error term of the incorrect model will reflect a systematic pattern, indicating autocorrelation. Suppose the correct model for consumption expenditure is:
C_t = β1Y_t + β2Y_t−1 + U_t ...............................................................................4.54
but we estimate
C_t = β1Y_t + V_t ..........................................................................................4.55
where V_t = β2Y_t−1 + U_t. Hence V_t shows a systematic pattern, reflecting autocorrelation.
Practical Exercise 4. 10
What are the possible reasons for autocorrelation?
The simplest and most commonly assumed pattern is that the disturbance in period t is a function of its own previous value:
u_t = f(u_t−1) ....................................................................................4.56
This form of autocorrelation is called a first-order autoregressive scheme (or first-order Markov process). If u_t depends on the two preceding values,
u_t = f(u_t−1, u_t−2) .................................................................................................4.57
the form is called a second-order autoregressive scheme, and so on.
Then the complete form of the first-order Markov process (i.e. the pattern of autocorrelation for all the values of U) can be developed as follows:
u_t = f(u_t−1) = ρu_t−1 + v_t
u_t−1 = f(u_t−2) = ρu_t−2 + v_t−1
u_t−2 = f(u_t−3) = ρu_t−3 + v_t−2
u_t−3 = f(u_t−4) = ρu_t−4 + v_t−3
…
u_t−r = f(u_t−(r+1)) = ρu_t−(r+1) + v_t−r
Generally, the simple first-order linear form of autocorrelation, u_t = f(u_t−1), can be presented formally as
u_t = ρu_t−1 + v_t …………………………………………………….4.58
where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic assumptions of ordinary least squares:
E(v) = 0, E(v²) = σ_v², and E(v_iv_j) = 0 for i ≠ j
Note that equation 4.58 represents a linear relationship between u_t and u_t−1 in which ρ is the slope coefficient and the constant intercept is equal to zero. In other words, it is a simple linear regression model with a suppressed constant term.
Practical Exercise 4. 11
We know that the autocorrelation coefficient ρ between u_t and u_t−1 is given as:
ρ(u_t, u_t−1) = Σu_tu_t−1/√(Σu_t²·Σu_t−1²) ......................................................................4.59
Equation 4.59 is a special form of the correlation coefficient between any two variables X and Y, which is defined as
r(x, y) = Σx_iy_i/√(Σx_i²·Σy_i²)
For large samples, Σu_t² ≈ Σu_t−1² ........................................................................................4.60
so that
ρ(u_t, u_t−1) = Σu_tu_t−1/Σu_t−1² .............................................................................4.61
Thus, in order to define the error term in any particular period t for the first-order Markov process, we start from the autoregressive relation at period t, given in equation 4.58 as u_t = ρu_t−1 + v_t. One can then develop the formula for the AR(1) process by successive substitution.
u1 = ρu0 + v1 .....................................................................................................4.62
u2 = ρu1 + v2 = ρ(ρu0 + v1) + v2 = ρ²u0 + ρv1 + v2
and so on. For this process to be stable, ρ must be less than one in absolute value, that is, |ρ| < 1. Then, as the power of ρ increases to infinity, the term with the lagged u (i.e. ρ^t·u0) tends to zero. Thus u_t can be expressed as:
u_t = v_t + ρv_t−1 + ρ²v_t−2 + ρ³v_t−3 + … = Σ(j=0 to ∞) ρ^j·v_t−j ..............................................................4.65
Thus, equation 4.65 gives the value of the error term when it is autocorrelated with a first-order autoregressive scheme. Note that ρ^j, with |ρ| < 1, keeps decreasing as j increases: the effect of the recent past is significant and keeps diminishing as we go further back (called fading memory). Finally, we will establish the mean, variance and covariance of u_t when it is autocorrelated with a first-order autoregressive scheme.
i. The mean of u_t is
E(u_t) = E(Σ(j=0 to ∞) ρ^j·v_t−j) = Σ(j=0 to ∞) ρ^j·E(v_t−j) = 0 ................4.66
since by the assumptions on the distribution of v we have E(v_t−j) = 0 for all j. It is evident from equation 4.66 that the mean of an autocorrelated stochastic term u_t is zero when the autocorrelation follows a simple first-order autoregressive scheme. (Recall that for |a| < 1 the sum of the infinite series Σ(j=0 to ∞) a^j is 1/(1 − a).)
ii. The variance of u_t is
var(u_t) = E[u_t − E(u_t)]² = E(u_t²), since E(u_t) = 0 ...........................4.67
Expanding,
E(u_t²) = E(v_t + ρv_t−1 + ρ²v_t−2 + …)² = Σ(j=0 to ∞) ρ^2j·E(v²_t−j) + E(cross products)
But E(cross products) = 0, because E(v_t−j·v_t−i) = 0 for i ≠ j, while E(v²_t−j) = σ_v². Then
var(u_t) = σ_v²·Σ(j=0 to ∞) ρ^2j = σ_v²(1 + ρ² + ρ⁴ + ρ⁶ + …) .......................................4.68
The expression in brackets is the sum of a geometric progression of infinite terms whose first term is 1 and whose common ratio is ρ². Generally, the summation rule of a geometric progression states that the sum of n terms of a geometric progression with first term a and common ratio λ is given by the formula
S = a(1 − λ^n)/(1 − λ) ...............................................................4.69
For an infinite series (i.e. n → ∞) with |λ| < 1, S reduces to a/(1 − λ). Therefore, using this rule, equation 4.68 becomes
var(u_t) = E(u_t²) = σ_u² = σ_v²/(1 − ρ²) (for |ρ| < 1) ............................................................4.70
iii. The covariance of u_t and u_t−1: given that
u_t = v_t + ρv_t−1 + ρ²v_t−2 + … and u_t−1 = v_t−1 + ρv_t−2 + ρ²v_t−3 + … ……………….……4.71
we obtain
cov(u_t, u_t−1) = E(u_t·u_t−1) = ρσ_v² + ρ³σ_v² + ρ⁵σ_v² + …
= ρσ_v²(1 + ρ² + ρ⁴ + …)
= ρσ_v²/(1 − ρ²)
= ρσ_u² ...................................................................................4.73
In general, cov(u_t, u_t−s) = E(u_t·u_t−s) = ρ^s·σ_u². Thus the relationship between the distributions of the u's at different dates depends on the value of the parameter ρ. From the above results we have
cov(u_t, u_t−1) = E(u_t·u_t−1) = ρσ_u²
so that
ρ = cov(u_t, u_t−1)/√(var(u_t)·var(u_t−1)) = ρσ_u²/σ_u², since var(u_t) = var(u_t−1) = σ_u² ………………………………4.74
cov(u_t, u_t−2) = E(u_t·u_t−2) = ρ²σ_u²
cov(u_t, u_t−3) = E(u_t·u_t−3) = ρ³σ_u²
. . .
cov(u_t, u_t−r) = E(u_t·u_t−r) = ρ^r·σ_u² (for r < t)
From this pattern we can conclude that as the lag r increases (i.e., as the lag gets farther from the current period), the (spill-over) effect between u_t and u_t−r declines. This is because |ρ| < 1, and hence ρ^r tends to zero as r grows.
The above relationships state the simplest possible form of autocorrelation. If we apply OLS to the model given in (4.58), we obtain:
ρ̂ = Σ(t=2 to n) u_tu_t−1 / Σ(t=2 to n) u²_t−1 .........................................................................4.76
Note that for large samples Σu_t² ≈ Σu²_t−1, so the coefficient of autocorrelation represents a simple correlation coefficient r:
ρ̂ = Σ(t=2 to n) u_tu_t−1 / Σ(t=2 to n) u²_t−1 ≈ Σu_tu_t−1/√(Σu_t²·Σu²_t−1) = r(u_t, u_t−1) ...........................................4.77 (Why?)
and hence −1 ≤ ρ̂ ≤ 1, since −1 ≤ r ≤ 1.
This proves the statement that says "we can treat autocorrelation in the same way as correlation in general". From our chapter one discussion, if the value of r is 1 we call it perfect positive correlation; if r is −1 there is perfect negative correlation; and if the value of r is 0, there is no correlation. By the same analogy, if the value of ρ̂ is 1 there is perfect positive autocorrelation; if ρ̂ is −1 it is called perfect negative autocorrelation; and if ρ̂ = 0 in u_t = ρu_t−1 + v_t, then u_t is not autocorrelated.
Practical Exercise 4. 12
Why do autocorrelation and correlation coefficients range between −1 and +1?
We have seen that the ordinary least squares technique is based on assumptions about the mean, variance and covariance of the disturbance term. If these assumptions do not hold, the estimators derived by the OLS procedure may not be efficient. Similarly, autocorrelation has certain effects on the OLS estimators. Some of these effects are:
1. The OLS estimators remain unbiased: we proved that E(β̂) = β + Σk_iE(u_i), and since E(u_i) = 0,
E(β̂) = β ....................................................4.78
2. The variance of the OLS estimates is inefficient: the variance of β̂ in the simple regression model will be biased downwards (i.e. underestimated) when the u's are autocorrelated. This can be shown as follows. From the unbiasedness derivation above we know that β̂ − β = Σk_iu_i.
var(β̂) = E(β̂ − β)² = E(Σk_iu_i)²
= E(k1u1 + k2u2 + … + k_nu_n)²
= Σk_i²E(u_i²) + 2ΣΣ(i<j) k_ik_jE(u_iu_j)
If there is no autocorrelation, E(u_iu_j) = 0 for i ≠ j, and the last term disappears, so that var(β̂) = σ_u²/Σx². However, we proved that under AR(1), E(u_t·u_t−s) is not zero but equal to ρ^s·σ_u². Substituting k_i = x_i/Σx² and E(u_iu_j) = ρ^|i−j|·σ_u² gives
var(β̂)_AR(1) = σ_u²/Σx_t² + (2σ_u²/Σx_t²)·[ρΣx_tx_t+1/Σx_t² + ρ²Σx_tx_t+2/Σx_t² + ρ³Σx_tx_t+3/Σx_t² + …] ...............4.79
that is,
var(β̂)_auto = var(β̂)_nonauto + 2σ_u²·ΣΣρ^s·x_ix_j/(Σx_i²)² ...............................................4.80
If ρ is positive (ρ > 0) and X_t is positively correlated with X_t+1, X_t+2, …, the second term on the right-hand side will usually be positive, so that var(β̂)_auto > var(β̂)_nonauto. The implication is that the conventional formula var(β̂) = σ²/Σx² is wrong if used while the data are autocorrelated: var(β̂) is then underestimated. The true variance is
σ_u²/Σx² + 2σ_u²·ΣΣρ^s·x_ix_j/(Σx_i²)², not σ_u²/Σx² ...................................................................................4.81
To see how the underestimation of var(u_i) occurs, recall that if there is no autocorrelation

E(Σe²) = σ²(n − 2), or σ̂² = Σe²/(n − 2)

With autocorrelated disturbances, however,

E(Σe²) = σ² [ n − 2 − 2ρ (Σ_{i=1}^{n−1} x_i x_{i+1} / Σ_{i=1}^{n} x_i²) − 2ρ² (Σ_{i=1}^{n−2} x_i x_{i+2} / Σ_{i=1}^{n} x_i²) − … ] ……………………4.82
If both u and X are positively autocorrelated, the OLS formula will most likely underestimate σ_u² because it ignores the expression in parentheses, which is almost certainly positive. In the real world the successive values of most explanatory variables (the X's) are positively correlated, and this tends to produce underestimation (sometimes substantial) of the variance of u and of the variance of the estimates of the β's if OLS is applied.

In case the explanatory variable X of the model is random, the covariance of its successive values is zero (Σx_i x_j = 0); under such circumstances the bias in var(β̂) will not be serious even though u is autocorrelated.
3. Wrong testing procedures, which lead to wrong prediction and inference about the characteristics of the population: if the values of u are correlated, predictions based on ordinary least squares estimates will be inefficient, in the sense that they have larger variances than predictions based on estimates obtained from other econometric methods. If var(β̂) is underestimated, SE(β̂) is also underestimated, which inflates the t-ratio. This large t-ratio may make β̂ appear statistically significant when it is not. Thus, if the errors are autocorrelated and we keep on using OLS, the variances of the regression coefficients will be underestimated, leading to narrow confidence intervals, high values of R² and inflated t-ratios.
Practical Exercise 4.13
What are the effects of autocorrelation on the OLS estimators? Substantiate your answer.
There are two methods that are commonly used to detect the existence of autocorrelation. These are:
1. Graphic method
Recall from section 4.2.2 that autocorrelation can be detected using graphs as a rough indicator. Detection of autocorrelation using graphs is done in two ways: by plotting the regression residuals against their own lagged values, or against time. Given data on economic variables, autocorrelation can be detected using graphs by the following two procedures.
a. Plotting the regression residuals against their own lagged values. To this end, first run the regression on the given data using OLS. Then obtain the error terms. To see whether there is autocorrelation or not, plot e_t horizontally and e_{t−1} vertically for successive values of the errors, i.e. plot the pairs (e₁, e₂), (e₂, e₃), (e₃, e₄), …, (e_{n−1}, e_n). If it is found that most of the points fall in quadrants I and III, as shown in fig (a) below, we say that the given data are autocorrelated and the type of autocorrelation is positive. If most of the points fall in quadrants II and IV, as shown in fig (b) below, the autocorrelation is said to be negative. But if the points are scattered equally in all the quadrants, as shown in fig (c) below, then we say there is no autocorrelation in the given data.
Figure 4.8: Graphic pattern of autocorrelation
b. Plotting the regression residuals against time: if the values of the regression residuals in successive periods show a regular pattern, we conclude that the error term is autocorrelated. Specifically, if the successive values of the regression residuals change sign frequently, there is negative autocorrelation. On the contrary, if the regression residuals do not change their sign frequently, so that several positive values are followed by several negative values, it can be a signal of positive autocorrelation. If they show no pattern, it can be concluded that there is no autocorrelation (see part 4.2.2).
Example: regress Ethiopian expenditure on imported goods (Y) on personal disposable income (X), 1968–1987 (in ten million birr), and find the residuals of the regression.
Solution:
Table 4.9: Regression result of consumption and income
Variable | Estimated coefficient | Standard error | t-ratio | P-value
Incidentally, such a plot is called a time-series (sequence) plot. An examination of Table 4.10 shows that the residuals u_i do not seem to be randomly distributed. As a matter of fact, they exhibit a distinct behaviour: initially they are generally positive, then they become negative, and after that they turn positive again, indicating positive autocorrelation.
Practical Exercise 4.13
For the above data the regression result has been found and the residuals have been computed. Draw the figure of the residuals against their lagged values as per this model and test for autocorrelation.
The F-statistic is significant at the 1% level. This indicates that the model is adequate.
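These plots are easy to produce with STATA. A minimal sketch, assuming hypothetical variables Y and X and a yearly time variable year:

    * minimal sketch: residual plots for the graphic method
    tsset year
    regress Y X
    predict ehat, residuals
    generate ehat_1 = L.ehat                 // e(t-1)
    scatter ehat ehat_1, yline(0) xline(0)   // quadrant plot of e(t) against e(t-1)
    twoway line ehat year, yline(0)          // time-sequence plot of the residuals

Points concentrated in quadrants I and III of the first plot suggest positive autocorrelation, as in fig (a) above.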
2. Formal testing method: because this approach relies on formal statistical tests rather than the visual inspection of the above, it is called the formal testing method. Different econometricians and statisticians suggest different testing techniques. In this teaching material, the testing methods most frequently and widely used by researchers are the following: the Run test, the Durbin–Watson test, and the Breusch–Godfrey (BG) test. Let us see them in some detail.
A. Run test: Before going into the detailed analysis of this method, let us define what a run is in this context. A run is an uninterrupted sequence of identical signs (+ or −) of the error terms, arranged in sequence according to the values of the explanatory variable (or time), as in
"++++++++-------------++++++++------------++++++"
By examining how runs behave, one can derive a test of the randomness of runs. If there are too many runs, that is if the û's change sign frequently, it is an indication of negative serial correlation. Similarly, if there are too few runs, this may suggest positive autocorrelation. Now let: n = total number of observations = n₁ + n₂; n₁ = number of + symbols; n₂ = number of − symbols; and k = number of runs, i.e. the number of turns of positive and negative signs in the data set. The decision is made using the Swed and Eisenhart special table of critical values for the number of runs expected in a strictly random sequence of n observations, attached in the appendices.
Using the above example of the consumption expenditure regression given in Table 4.10: the last column gives the signs of the various residuals. We can write these signs in a slightly different form as follows: (+++++++) (−) (+++) (−−−−−−) (++++).
There are thus 5 runs (K = 5): a run of positives followed by a negative, then positives, and so on. The number of consecutive positive or negative signs shows the length of a run; for instance, 7 plus signs form a run of length 7, one minus sign a run of length 1, and so on. For K = 5, n₁ = 14 and n₂ = 6 the critical value of the runs is 5. If the actual number of runs is equal to or less than this critical value of 5, we reject the null hypothesis of randomness, implying there is autocorrelation; otherwise we do not reject H₀ of random e_i.
Note that the Swed–Eisenhart table covers at most 40 observations (20 pluses and 20 minuses). If the actual sample size is larger we cannot use this table; instead we resort to the normal distribution.
Under the null hypothesis that successive outcomes (here, residuals) are independent, and assuming that n₁ > 10 and n₂ > 10, the number of runs is distributed (asymptotically) normally with:

Mean: E(k) = 2n₁n₂/(n₁ + n₂) + 1 ……………………4.83

Variance: σ_k² = 2n₁n₂(2n₁n₂ − n₁ − n₂) / [(n₁ + n₂)²(n₁ + n₂ − 1)] ……………………4.84

Decision rule: do not reject the null hypothesis of randomness (independence) with 95% confidence if E(k) − 1.96σ_k ≤ k ≤ E(k) + 1.96σ_k; reject the null hypothesis if the estimated k lies outside these limits.
Practical Exercise 4.14
For n₁ = 12, n₂ = 16 and k = 5, test for autocorrelation using the run test.
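A sketch of the normal-approximation computation for this exercise, using STATA scalars (the numbers n₁ = 12, n₂ = 16 and k = 5 are those given in the exercise):

    * minimal sketch: normal approximation to the run test (Exercise 4.14)
    scalar n1 = 12
    scalar n2 = 16
    scalar k  = 5
    scalar Ek = 2*n1*n2/(n1 + n2) + 1
    scalar sk = sqrt(2*n1*n2*(2*n1*n2 - n1 - n2)/((n1 + n2)^2*(n1 + n2 - 1)))
    display "E(k) = " Ek "   95% limits: " Ek - 1.96*sk " to " Ek + 1.96*sk
    display "reject randomness? " ((k < Ek - 1.96*sk) | (k > Ek + 1.96*sk))

On actual residuals, STATA's runtest command (e.g. runtest ehat, threshold(0)) performs this test directly.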
B. The Durbin–Watson or d test: the most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin–Watson or simply d statistic, which is defined as:

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t² ……………………4.85
In the numerator of the d statistic the number of observations is n − 1 because one observation is lost in taking successive differences. The d test rests on the following assumptions:
1. The regression model includes an intercept term. If such a term is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.
2. The explanatory variables, the X's, are non-stochastic, or fixed in repeated sampling.
3. The disturbances U_t are generated by the first-order autoregressive scheme: u_t = ρu_{t−1} + v_t.
4. The regression model does not include lagged value(s) of the dependent variable Y as one of the explanatory variables. Thus the test is inapplicable to models of the following type:

Y_t = β₁ + β₂X_{2t} + β₃X_{3t} + … + β_kX_{kt} + γY_{t−1} + U_t

where Y_{t−1} is the one-period lagged value of Y; such models are known as autoregressive models. If the d test is applied mistakenly, the value of d in such cases will often be around 2, which is the value of d in the absence of first-order autocorrelation. Durbin developed the so-called h-statistic to test serial correlation in such autoregressive models.
5. There are no missing observations in the data.

From equation 4.85, the value of

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

= [Σe_t² + Σe²_{t−1} − 2Σe_t e_{t−1}] / Σe_t² ……………………4.86

However, for large samples Σ_{t=2}^{n} e_t² ≈ Σ_{t=2}^{n} e²_{t−1}, because in both cases one observation is lost. Thus,

d ≈ [2Σe_t² − 2Σe_t e_{t−1}] / Σe_t² = 2[1 − Σe_t e_{t−1}/Σe_t²]

but Σe_t e_{t−1}/Σe_t² = ρ̂ from equation 4.76, so

d ≈ 2(1 − ρ̂) ……………………4.87
Since −1 ≤ ρ̂ ≤ 1, this implies that 0 ≤ d ≤ 4:

if ρ̂ = 0, d = 2
if ρ̂ = 1, d = 0
if ρ̂ = −1, d = 4
Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the null hypothesis; if it is close to zero or four, we reject the null hypothesis that there is no autocorrelation.
It should be clear that in the Durbin–Watson test the null hypothesis of zero autocorrelation, ρ = 0, is tested indirectly, by testing the equivalent hypothesis d = 2. Therefore, after formulating the hypotheses, we use the sample residuals and compute the empirical value of the Durbin–Watson d statistic. Finally, we compare the empirical d with the theoretical values of d that define the critical region of the test.
However, because the exact distribution of d is not known, there exist ranges of values within which we can either accept or reject the null hypothesis. We have d_L (lower bound) and d_U (upper bound), which are appropriate to test the hypothesis of zero first-order autocorrelation against the alternative hypothesis of positive first-order autocorrelation. Durbin and Watson tabulated these upper and lower values at the 5% and 1% levels of significance. The tables assume that the u's are normal, homoscedastic and not autocorrelated, and that the X's are truly exogenous.

The test compares the empirical value of d with d_L (lower boundary) and d_U (upper boundary) from the Durbin–Watson tables, and with their transformed values (4 − d_L) and (4 − d_U). The comparison using d_L and d_U investigates the possibility of positive autocorrelation, while the comparison with (4 − d_L) and (4 − d_U) investigates the possibility of negative autocorrelation.
For the two-tailed Durbin–Watson test we have five regions for the values of d. The mechanics of the D.W. test are as follows, assuming that the assumptions underlying the test are fulfilled:
Obtain the computed value of d using the formula given in equation 4.85.
For the given sample size and given number of explanatory variables, find the critical d_L and d_U values.
1. If d is less than d_L or greater than (4 − d_L), we reject the null hypothesis of no autocorrelation in favor of the alternative, which implies the existence of autocorrelation.
2. If d lies between d_U and (4 − d_U), we accept the null hypothesis of no autocorrelation.
3. If, however, the value of d lies between d_L and d_U, or between (4 − d_U) and (4 − d_L), the D.W. test is inconclusive.
Example 1. Suppose for a hypothetical model Y = α + βX + U_i we found d* = 0.1380, with d_L = 1.37 and d_U = 1.50. Based on these values, test for autocorrelation.
Solution: First compute (4 − d_L) = 2.63 and (4 − d_U) = 2.50, and compare the computed value of d with d_L, d_U, (4 − d_L) and (4 − d_U). Since d* = 0.1380 is less than d_L = 1.37, we reject the null hypothesis of no autocorrelation: there is evidence of positive first-order autocorrelation.
Example 2. Consider the model Y_t = α + βX_t + U_t with the following observations on X and Y:
Table 4.13: Hypothetical values of X and Y
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Solution:
1. Regress Y on X, i.e. Y_t = α + βX_t + U_t. The summary quantities are:

Σxy = 255, Ȳ = 7, Σ(e_t − e_{t−1})² = 60.213, Σx² = 280, X̄ = 8, Σe_t² = 41.767, Σy² = 274

Then

β̂ = Σxy/Σx² = 255/280 = 0.91,  α̂ = Ȳ − β̂X̄ = 7 − 0.91(8) = −0.29

Ŷ = −0.29 + 0.91X, R² = 0.85

2. Compute the d statistic:

d = Σ(e_t − e_{t−1})²/Σe_t² = 60.213/41.767 = 1.442

3. The values of d_L and d_U at the 5% level of significance, with n = 15 and one explanatory variable, are d_L = 1.08 and d_U = 1.36, so (4 − d_U) = 2.64. Since d* = 1.442 lies between d_U = 1.36 and (4 − d_U) = 2.64, we accept H₀. This implies the data are not autocorrelated.
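The same d statistic can be obtained in STATA. A minimal sketch, assuming the Table 4.13 observations are stored in variables Y and X:

    * minimal sketch: Durbin-Watson d for the Table 4.13 data
    generate t = _n        // observation counter 1,...,15 used as "time"
    tsset t
    regress Y X
    estat dwatson          // reports d; compare with dL and dU from the tables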
Although the D.W. test is extremely popular, the d test has one great drawback: when the test statistic falls in the inconclusive zone (region), one cannot conclude whether autocorrelation does or does not exist. Several authors have proposed modifications of the D.W. test.
In many situations, however, it has been found that the upper limit d_U is approximately the true significance limit. Thus the modified D.W. test is based on d_U: in case the estimated d value lies in the inconclusive zone, one can use the following modified d test procedure. Given the level of significance α: (a) for H₀: ρ = 0 against H₁: ρ > 0, if the estimated d < d_U, reject H₀, i.e. there is statistically significant positive autocorrelation; (b) for H₀: ρ = 0 against H₁: ρ < 0, if (4 − d) < d_U, reject H₀, i.e. there is statistically significant negative autocorrelation.
Practical Exercise 4.15
Suppose for a hypothetical model Y = α + βX + U_i we found d* = 1.1380, with d_L = 1.37 and d_U = 1.40. Based on these values, test for autocorrelation.
C. Breusch–Godfrey (BG) test: this test allows for higher-order autoregressive schemes. Suppose the disturbances of the model Y_t = α + βX_t + u_t follow the p-th order autoregression u_t = ρ₁u_{t−1} + ρ₂u_{t−2} + … + ρ_pu_{t−p} + v_t, where v_t fulfils all the assumptions of the CLRM. The null hypothesis to be tested is:

H₀: ρ₁ = ρ₂ = … = ρ_p = 0

Steps:
1. Estimate the model Y_t = α + βX_t + u_t and obtain the residuals ε̂_t.
2. Find the auxiliary regression by regressing ε̂_t on X_t and ε̂_{t−1}, ε̂_{t−2}, …, ε̂_{t−p}; that is,

ε̂_t = α + βX_t + ρ̂₁ε̂_{t−1} + ρ̂₂ε̂_{t−2} + … + ρ̂_pε̂_{t−p} + v_t ……………………4.89

3. Compute (n − p)R² from the auxiliary regression; under H₀ it follows (asymptotically) the χ² distribution with p degrees of freedom. Reject H₀, concluding that there is autocorrelation, if the computed statistic exceeds the critical χ²_p value; accept otherwise.
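STATA computes this test after an OLS regression. A minimal sketch, with hypothetical variables Y, X and a time variable year, testing up to two lags:

    * minimal sketch: Breusch-Godfrey test
    tsset year
    regress Y X
    estat bgodfrey, lags(1 2)   // H0: no serial correlation up to the stated lag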
Assume that, after applying one or more of the diagnostic tests of autocorrelation discussed in the previous section, we find that there is autocorrelation. Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to seek remedial measures. The remedy, however, depends on what knowledge one has about the nature of the interdependence among the disturbances. Different options are recommended as remedial measures. First, check for misspecification, in the sense that the model may have excluded some important variable or may have an incorrect functional form. Second, check whether the autocorrelation is pure or not. If it is pure autocorrelation, one can use an appropriate transformation of the original model so that in the transformed model we do not have the problem of pure autocorrelation. As in the case of heteroskedasticity, we will have to use some form of generalized least squares (GLS) method. This depends on whether the coefficient of autocorrelation ρ is known or not.
A. When ρ is known: when the structure of the autocorrelation is known, i.e. ρ is known, the appropriate corrective procedure is to transform the original model or data so that the error term of the transformed model is not autocorrelated. When we transform, we wipe out the effect of ρ. Suppose that our model is

Y_t = α + βX_t + U_t  and  U_t = ρU_{t−1} + V_t, |ρ| < 1 ……………………4.90

Equation (4.90) indicates the existence of autocorrelation. If ρ is known, we transform equation (4.90) into one whose error is not autocorrelated. The procedure of transformation is given below. Take the lagged form of the equation and multiply through by ρ:

ρY_{t−1} = ρα + ρβX_{t−1} + ρU_{t−1} ……………………4.91

Subtracting (4.91) from (4.90):

Y_t − ρY_{t−1} = α(1 − ρ) + β(X_t − ρX_{t−1}) + (U_t − ρU_{t−1}) ……………………4.92

Since V_t = U_t − ρU_{t−1},

Y_t − ρY_{t−1} = α(1 − ρ) + β(X_t − ρX_{t−1}) + v_t ……………………4.93

Let: Y_t* = Y_t − ρY_{t−1}, a = α(1 − ρ), X_t* = X_t − ρX_{t−1}, so that

Y_t* = a + βX_t* + v_t ……………………4.94
It may be noted that in transforming equation (4.90) into (4.94) one observation is lost because of the lagging and subtracting in (4.92). We can apply OLS to the transformed relation in (4.94) to obtain â and β̂ for our two parameters α and β, where

α̂ = â/(1 − ρ), and it can be shown that var(α̂) = [1/(1 − ρ)]² var(â) ……………………4.95

because α̂ is perfectly and linearly related to â. Again, since v_t satisfies all the standard assumptions, the variances of α̂ and β̂ are given by our standard OLS formulae:

var(α̂) = σ_u² ΣX_t*² / [n Σ(X_t* − X̄*)²],  var(β̂) = σ_u² / Σ(X_t* − X̄*)² ……………………4.96

The estimators obtained from the transformed model in this way are BLUE. One basic problem is that the above transformation requires knowledge of the value of ρ; thus we need to estimate it.
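When ρ is known (or assumed), the quasi-differencing above is a one-line data transformation. A minimal STATA sketch, with a purely illustrative value ρ = 0.6 and hypothetical variables Y, X, year:

    * minimal sketch: GLS by quasi-differencing with known rho
    tsset year
    scalar rho = 0.6                  // assumed value, for illustration only
    generate Ystar = Y - rho*L.Y
    generate Xstar = X - rho*L.X
    regress Ystar Xstar               // a-hat and beta-hat of eq. 4.94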
When ρ is not known, we describe the methods through which the coefficient of autocorrelation can be estimated: methods based on a priori information on ρ, and methods in which ρ is estimated from the data (for instance from the d-statistic).

i. Method I: A priori information on ρ
Many times an investigator makes a reasonable guess about the value of the autoregressive coefficient by using his knowledge or intuition about the relationship under study. Many researchers usually assume that ρ = 1 or ρ = −1. Under this method, the process of transformation is the same as when ρ is known. When ρ = 1, the transformed model becomes:
(Y_t − Y_{t−1}) = β(X_t − X_{t−1}) + V_t, where V_t = U_t − U_{t−1} ……………………4.97

The constant term is suppressed in the above. β̂ is obtained by merely taking the first differences of the variables and fitting a line that passes through the origin. Suppose instead that one assumes ρ = −1 rather than ρ = 1, the case of perfect negative autocorrelation. In such a case the transformed model becomes:

Y_t + Y_{t−1} = 2α + β(X_t + X_{t−1}) + v_t

or (Y_t + Y_{t−1})/2 = α + β(X_t + X_{t−1})/2 + v_t/2 ……………………4.98

This model is called the two-period moving average regression model because we are actually regressing one moving average, (Y_t + Y_{t−1})/2, on another, (X_t + X_{t−1})/2. This method of first differencing is quite popular in applied research for its simplicity. But the method rests on the assumption that there is either perfect positive or perfect negative autocorrelation in the data.
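The ρ = 1 case is simply OLS on first differences through the origin. A minimal STATA sketch (hypothetical Y, X, year):

    * minimal sketch: first-difference transformation assuming rho = 1
    tsset year
    regress D.Y D.X, noconstant    // eq. 4.97: regression through the origin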
ii. Method II: Durbin's method. Run the regression of Y_t on Y_{t−1}, X_t and X_{t−1}:

Y_t = α(1 − ρ) + ρY_{t−1} + βX_t − ρβX_{t−1} + u_t ……………………4.99

An estimator of ρ is the estimated coefficient of Y_{t−1}. The above relationship is reliable only for large samples. For small samples, Theil and Nagar have suggested the following relation:

ρ̂ = [n²(1 − d/2) + k²] / (n² − k²) ……………………4.100

where n = total number of observations, d = the Durbin–Watson statistic, and k = the number of coefficients (including the intercept term). Using this value of ρ̂ we can perform the above transformation to remove autocorrelation from the model.
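Feasible GLS estimators in this spirit are built into STATA. A minimal sketch of the Cochrane–Orcutt iterative procedure, a close relative of the methods just described (hypothetical Y, X, year):

    * minimal sketch: feasible GLS when rho is unknown
    tsset year
    prais Y X, corc    // Cochrane-Orcutt: iterates on rho-hat, drops first observation
    prais Y X          // Prais-Winsten variant: retains the first observation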
Practical Exercise 4.16
What is the logical reason for using a ρ based on a priori information? And for using estimated values ρ̂?
4.3 Multicollinearity
4.3.1 Definition and nature of Multicollinearity
So far we have implicitly assumed that there is no linear relationship among the explanatory variables. But in practice variables are often correlated in some sense; many social research studies use a large number of predictors, and problems arise when the various predictors are highly and linearly related. Multicollinearity is the existence of a "perfect", or exact, linear relationship among some or all explanatory variables of a regression model. For a regression involving the explanatory variables X₁, X₂, …, X_k, an exact linear relationship is said to exist if the following condition is satisfied:

λ₁X₁ + λ₂X₂ + … + λ_kX_k = 0 ……………………4.101

where λ₁, λ₂, …, λ_k are constants such that not all of them are simultaneously zero. Perfect multicollinearity occurs when one (or more) of the regressors in a model (e.g., X_k) is a linear function of the other(s) (X_i, i = 1, 2, …, k−1).
But in practice neither of the above extreme cases is often met. There is some degree of interrelation among the explanatory variables due to the interdependence of many economic variables over time. The simple correlation coefficient for each pair of explanatory variables will have a value between zero and unity. As this value approaches unity, multicollinearity gets stronger and it impairs the accuracy and stability of the parameter estimates. Finally, multicollinearity is not a condition that either exists or does not exist in economic functions; rather, it is a phenomenon inherent in most relationships due to the nature of economic magnitudes.

The nature of multicollinearity can be illustrated using Y, X₁ and X₂ in the figures below, which represent the variation in Y (the dependent variable) and in X₁ and X₂ (the explanatory variables). The degree of collinearity can be measured by the extent of overlap (shaded area) between X₁ and X₂. In fig. (a) below there is no overlap between X₁ and X₂, and hence no collinearity. In figs. (b) through (e) there is "low" to "high" collinearity. In the extreme, if X₁ and X₂ were to overlap completely (or if X₁ were completely inside X₂, or vice versa), collinearity would be perfect.
To see the problem concretely, consider the normal equations of the regression of Y on X₁ and X₂:

ΣY_i = nβ₀ + β₁ΣX_{1i} + β₂ΣX_{2i}
ΣY_iX_{1i} = β₀ΣX_{1i} + β₁ΣX²_{1i} + β₂ΣX_{1i}X_{2i}
ΣY_iX_{2i} = β₀ΣX_{2i} + β₁ΣX_{1i}X_{2i} + β₂ΣX²_{2i} ……………………4.103

But substituting 2X₁ for X₂ in the third equation yields (twice) the second equation; i.e., one of the normal equations is in fact redundant. Thus we have only two independent equations but three unknowns (β's) to estimate. As a result, the normal equations reduce to:

β̂₁ + 2β̂₂ = [ΣY_iX_{1i} − nX̄₁Ȳ] / [ΣX²_{1i} − nX̄₁²] ……………………4.105

β̂₀ = Ȳ − [β̂₁ + 2β̂₂]X̄₁ ……………………4.106

Only the linear combination β̂₁ + 2β̂₂ is estimable; the individual coefficients are indeterminate.
Practical Exercise 4.17
Using the normal equations above, show that under perfect multicollinearity the individual OLS estimators are indeterminate.
There are several sources of multicollinearity. One reason is the data collection method employed (e.g. sampling over a limited range of X values). For example, if a regression is conducted using a small sample of the population, there may be multicollinearity; if we could take all the possible values, it might not show multicollinearity. A second reason for multicollinearity is a constraint on the model or on the population being sampled. For example, in the regression of electricity consumption expenditure on income (X₁) and house size (X₂), there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes. Third, multicollinearity can be due to an overdetermined model: this happens when the model has more explanatory variables than the number of observations. It could happen in medical research, where there may be a small number of patients about whom information is collected on a large number of variables.
Fourth, adding many polynomial terms to a model, especially if the range of the X variable is small; including (almost) the same variable twice in the model; or including a variable computed from other variables in the model (e.g. using family income, mother's income and father's income together) can be causes of multicollinearity. In such cases the variables are highly correlated. Fifth, multicollinearity may be due to the inherent nature of the data, i.e. the tendency of economic variables to move together over time. Especially in time-series data, the regressors included in the model may share a common trend: they all increase or decrease over time. For example, in the regression of consumption expenditure on income, wealth, population, etc., the regressors may all be growing over time at more or less the same rate, leading to collinearity among these variables. Sixth, using lagged values of some explanatory variables as separate independent factors in the relationship. In addition, improper use of dummy variables can be cited.
Practical Exercise 4.18
What are the possible reasons for multicollinearity? Substantiate your answer with examples.
Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂₂. Likewise, from our discussion of the multiple regression model, the variance of β̂₁ is given by:

var(β̂₁) = σ²Σx₂² / [Σx₁²Σx₂² − (Σx₁x₂)²]

Substituting x₂ = λx₁ in this variance formula, we get:

var(β̂₁) = σ²λ²Σx₁² / [λ²(Σx₁²)² − λ²(Σx₁²)²] = σ²λ²Σx₁² / 0 → ∞ (infinite) ……………………4.111

These are the consequences of perfect multicollinearity.
2. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression coefficients are determinate. Consider the two-explanatory-variable model above in deviation form. If we assume x₂ = λx₁, there is perfect correlation between x₁ and x₂ because the change in x₂ is completely captured by the change in x₁. If instead we have x₂ = λx₁ + v, with λ ≠ 0 and v a stochastic error term such that Σx₁v = 0, there is high but not perfect multicollinearity (for example, X₂ = 3 − 5X₁ + u_i). In this case x₂ is not only determined by x₁ but is also affected by other influences captured by v (the stochastic error term). Substituting x₂ = λx₁ + v into the variance formula

var(β̂₁) = σ²Σx₂² / [Σx₁²Σx₂² − (Σx₁x₂)²] ……………………4.113

and multiplying the numerator and the denominator by 1/Σx₂², we get

var(β̂₁) = σ² / [Σx₁² − (Σx₁x₂)²/Σx₂²] = σ² / [Σx₁²(1 − r₁₂²)] ……………………4.114

where r₁₂² = (Σx₁x₂)²/(Σx₁²Σx₂²) is the square of the correlation coefficient between x₁ and x₂.
3. Large variances and standard errors of the OLS estimators: if x₂ = λx₁ + v, what happens to the variance of β̂₁ as r₁₂ rises? As r₁₂ tends to 1, i.e. as collinearity increases, the variance of the estimator increases, and in the limit, when r₁₂ = 1, the variance of β̂₁ becomes infinite.
4. Invalid hypothesis testing: because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider and the computed t-ratios very small. Inflated standard errors of the β̂_i, along with small t-ratios, lead one or more of the coefficients to be statistically insignificant when tested individually, leading to the acceptance of the "zero null hypothesis" and to wide confidence intervals (rejecting H₀: β_j = 0 becomes very rare). In cases of high collinearity it is thus possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t-test.
5. Although the t-ratios of one or more of the coefficients are very small (which makes the coefficients statistically insignificant individually), R², the overall measure of goodness of fit, can be very high (say, in excess of 0.9), indicating not much individual variation in the X's but a lot of common variation. In other words, on the basis of the F-test one can convincingly reject the hypothesis that β₁ = β₂ = … = β_k = 0. Indeed, this combination is one of the signals of multicollinearity.
6. The implication of the indeterminacy of regression coefficients in the case of less-than-perfect multicollinearity is that it is difficult to observe the separate influences of x₁ and x₂. But such an extreme case is not very frequent in practical applications; most data exhibit less-than-perfect multicollinearity. If there is inexact but strong multicollinearity, it is difficult to isolate the effect of each of the highly collinear X's on Y, as the collinear regressors explain the same variation in Y. Furthermore, the estimated coefficients change radically depending on the inclusion or exclusion of other predictor(s); the regression tends to be very shaky from one sample to another.
Practical Exercise 4.19
1. Show that cov(β̂₁, β̂₂) = −r₁₂σ² / [(1 − r₁₂²)√(Σx₁² Σx₂²)].
2. How will confidence intervals be affected when there is multicollinearity?
c. High correlation coefficients (r's) among the explanatory variables X_i and X_j
d. VIF and TOL
e. Large standard errors and small t-ratios of the regression parameters
Note that none of these symptoms by itself is a satisfactory indicator of multicollinearity, because large standard errors may arise for various reasons and not only because of the presence of linear relationships among the explanatory variables, and a high r_{x_i x_j} is only a sufficient, not a necessary, condition for the existence of multicollinearity: multicollinearity can exist even if the correlation coefficients are low. However, the combination of all these criteria should help the detection of multicollinearity.
Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other explanatory variables is to regress each X_i on the remaining X variables and compute the corresponding R², which we designate R_i²; each of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between F and R² established to test the overall significance of a regression:

F_i = [R²_{x_i·x₂,x₃,…,x_k} / (k − 2)] / [(1 − R²_{x_i·x₂,x₃,…,x_k}) / (n − k + 1)] ~ F(k−2, n−k+1) ……………………4.115

where n is the number of observations and k is the number of parameters including the intercept.

If the computed F exceeds the critical F at the chosen level of significance, it means that the particular X_i is collinear with the other X's; if it does not exceed the critical F, we say that it is not collinear with the other X's, in which case we may retain the variable in the model. If F_i is statistically significant, we will have to decide whether the particular X_i should be dropped from the model. According to Klein's rule of thumb, multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R², that is, the one obtained from the regression of Y on all the regressors.
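A minimal STATA sketch of one auxiliary regression, for hypothetical variables y, x1, x2, x3:

    * minimal sketch: auxiliary regression and Klein's rule
    quietly regress y x1 x2 x3
    display "overall R2   = " e(r2)
    quietly regress x1 x2 x3           // auxiliary regression of x1 on the rest
    display "auxiliary R2 = " e(r2)    // trouble if this exceeds the overall R2

The same auxiliary regression is repeated for each regressor in turn.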
II. Test of multicollinearity using eigenvalues and the condition index:
Using eigenvalues we can derive a number called the condition number k as follows:

k = maximum eigenvalue / minimum eigenvalue ……………………4.116

In addition, using these values we can derive the condition index (CI), defined as

CI = √(maximum eigenvalue / minimum eigenvalue) = √k ……………………4.117

Decision rule: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if CI (= √k) is between 10 and 30 there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity. Example: if k = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.
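A sketch of this computation in STATA, extracting eigenvalues from the regressors' correlation matrix (hypothetical regressors x1, x2, x3):

    * minimal sketch: condition number and condition index
    quietly correlate x1 x2 x3
    matrix R = r(C)                          // correlation matrix of the regressors
    matrix symeigen vecs vals = R            // eigenvalues, largest first
    scalar kcond = vals[1,1]/vals[1,colsof(vals)]
    scalar CI = sqrt(kcond)
    display "condition number k = " kcond "   condition index CI = " CI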
III. Test of multicollinearity using tolerance and the variance inflation factor
The variance inflation factor for regressor X_j is defined as

VIF_j = 1/(1 − R_j²)

where R_j² is the R² in the auxiliary regression of X_j on the remaining (k − 2) regressors. The larger the value of VIF_j, the more "troublesome" or collinear the variable X_j. However, how high should the VIF be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds 10 (which happens if R_j² exceeds 0.9), the variable is said to be highly collinear. Other authors use the measure of tolerance to detect multicollinearity, defined as

TOL_j = 1/VIF_j = (1 − R_j²)

Clearly, TOL_j = 1 if X_j is not correlated with the other regressors, whereas it is zero if X_j is perfectly related to the other regressors.
One can derive the VIF from the variances and covariances of the estimates as indicated by equation 4.114:

var(β̂₁) = σ² / [Σx₁²(1 − r₁₂²)], where r₁₂² is the square of the correlation coefficient between x₁ and x₂. Similarly,

cov(β̂₁, β̂₂) = −r₁₂σ² / [(1 − r₁₂²)√(Σx₁² Σx₂²)]

As r₁₂ increases toward one, the covariance of the two estimators increases in absolute value. The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as:

VIF = 1/(1 − r₁₂²)

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As r₁₂² approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit the variance becomes infinite. As can be seen, if there is no multicollinearity between x₁ and x₂, VIF will be 1. Using this definition we can express var(β̂₁) and var(β̂₂) in terms of the VIF:

var(β̂₁) = (σ²/Σx₁²)·VIF  and  var(β̂₂) = (σ²/Σx₂²)·VIF ……………………4.119

which shows that the variances of β̂₁ and β̂₂ are directly proportional to the VIF.
Limitation: the VIF (or tolerance) as a measure of collinearity is not free of criticism. As we have seen earlier, var(β̂_j) = (σ²/Σx_j²)·VIF depends on three factors: σ², Σx_j² and the VIF. A high VIF can be counterbalanced by a low σ² or a high Σx_j². To put it differently, a high VIF is neither necessary nor sufficient for high variances and high standard errors. Therefore high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors.
Table 4.14: Data on three variables Y (consumption), X1 (income), X2 (wealth)
Y    X1    X2
70   80    810
65   100   1009
90   120   1273
95   140   1425
Solution
A. The fitted regression is:

Ŷ = 24.34 + 0.87X₁ − 0.035X₂
se = (6.286) (0.31) (0.03)   R² = 0.968, adj. R² = 0.959
t = (3.875) (2.772) (−1.16)
P = (0.006) (0.027) (0.284)
Hence X₁ is significant at 0.05 but X₂ is not. The correlation coefficient between income and wealth in this example is found to be 0.9937, which indicates that multicollinearity is a threat in the model. If we want to test the multicollinearity with the VIF value, we regress X₁ on X₂ and obtain the following model:

X̂₁ = 2.43 + 0.095X₂
se = (7.03) (0.004)   R² = 0.988
t = (0.346) (25.253)

The VIF is then found to be

VIF = 1/(1 − R_j²) = 1/(1 − 0.988) = 83

This value is obviously greater than 10. Therefore we can say that there is multicollinearity in the model.
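The same check is available directly in STATA. A minimal sketch for the Table 4.14 data:

    * minimal sketch: built-in VIF after OLS
    regress Y X1 X2
    estat vif          // rule of thumb: VIF > 10 signals high collinearity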
a. Use of extraneous (a priori) information: if we have information on the exact value of β₁ or β₂ from an extraneous source, we can make use of such information in estimating the influence of the remaining variable of the model. The estimation procedure is then equivalent to correcting the dependent variable for the influence of those explanatory variables whose coefficients are known (from the extraneous source of information) and regressing this residual on the remaining explanatory variables.
b. Methods of transforming variables or functional form: this method is used when the relationship between certain parameters is known a priori.
i. Using ΔX instead of X (where the cause may be the X's moving in the same direction over time); similarly, using (X_j − X̄_j) instead of X_j tends to reduce collinearity in polynomial regressions.
ii. Using logs tends to reduce collinearity between variables by a greater factor than using levels. Suppose that we want to estimate the production function expressed in the form Q = AL^α K^β e^u, where Q is the quantity produced, L the labor input and K the input of capital. It is required to estimate α and β. Upon logarithmic transformation, the function becomes:

ln Q = ln A + α ln L + β ln K + u ……………………4.121

Q* = A* + αL* + βK* + U ……………………4.122

The asterisk indicates logs of the variables. Suppose it is observed that K and L move together so closely that it is difficult to separate the effect of changing quantities of labor input on output from the effect of variation in the use of capital. Now assume that, on the basis of information from some other source, the industry is characterized by constant returns to scale, so that α + β = 1. Substituting β = 1 − α into (4.122) gives (Q* − K*) = A* + α(L* − K*) + U; regressing the log output–capital ratio on the log labor–capital ratio then yields α̂, and β̂ = 1 − α̂.
The methods described above are not sure ways to get rid of the problem of multicollinearity. The rule to be used in practice will depend on the nature of the data under investigation and the severity of the multicollinearity problem.
…the error term has no systematic effect on Y and is independent of the X's. Thus the effect of the error term equals zero on average, and E(Y|X) is linear in stable parameters (the β's).
The two versions of the exogeneity assumption are:
i. E(ε_i) = 0 with fixed (non-stochastic) X's. The assumption E(ε_i) = 0 amounts to: "we do not systematically over- or under-estimate the PRF", or: the overall impact of all the excluded variables is random/unpredictable. If E(ε_i) = μ (a constant but ≠ 0) and the X's are fixed, the estimators of all β's except β₀ will still be all right.
ii. E(ε_iX_j) = 0 or E(ε_i|X_j) = 0 with stochastic X's. This assumption cannot be tested directly, as the residuals will always have zero mean if the model has an intercept; if there is no intercept, some information can be obtained by plotting the residuals.
If the assumption E(ε_i|X_j) = 0 is violated, or if u is for whatever reason correlated with the explanatory variable X_j, then we say that X_j is an endogenous explanatory variable, and the OLS estimators will be biased and inconsistent. Yet assuming exogenous regressors is unrealistic in many situations.
A. Omission of a relevant variable
Suppose that instead of the true model one adopts

Y_i = α₀ + α₁X_{i2} + ε_{2i} ……………………4.125

Since we assumed the model in equation 4.124 to be correct, adopting 4.125 would constitute a specification error, namely omission of the relevant variable X_{i3}. The error term of 4.125 then absorbs the omitted influence:

ε_{2i} = β₃X_{i3} + ε_{1i} ……………………4.126
B. Inclusion of an unnecessary variable(s)
Sometimes, instead of the correct model in equation 4.124, one may use a model which includes an unnecessary or irrelevant variable. To see this type of error, suppose that a researcher uses the following model:

Y_i = β₀ + β₂X_{i2} + β₃X_{i3} + β₄X_{i4} + ε_{1i} ……………………4.128

Here β₄ = 0 in the true model given in equation 4.124, but X_{i4} is unnecessarily included in this model as if it were a key variable.
C. Adopting the wrong functional form (error in the algebraic form of the relationship)
Apart from omitting functions of independent variables, a model can suffer from a misspecified functional form. A regression model suffers from functional form misspecification (specification bias) when it does not properly account for the relationship between the dependent and the observed explanatory variables, i.e. there is an error in the functional form. These types of errors are usually committed at the stage of representing economic relationships in mathematical form. Sometimes a researcher may wrongly use a linear model to represent non-linear relationships, for instance using a linear functional form when the true relationship is logarithmic (log-log) or semi-logarithmic (lin-log or log-lin). For example, a researcher may use the equation below to represent the cubic relationship between cost of production (Y) and output produced (X) given in equation 4.130:

ln Y_i = β₀ + β₂X_{i2} + β₃X_{i3} + ε_{4i} ……………………4.130

But if we should use Y instead of ln Y, and/or the power of X is ¼ rather than one, then we will not obtain unbiased or consistent estimators of the partial effects.
There are two causes of such problems: stochastic regressors and errors of measurement.
A. Stochastic regressors and incorrect specification of the stochastic error term
Many economic variables are stochastic, and it is only for convenience that we assumed fixed X's. What matters is not whether the X's are stochastic or fixed, but the nature of the correlation between the X's and ε. In general, stochastic regressors may or may not be correlated with the error term. If X and ε are independently distributed, then E(ε|X) = 0, and OLS retains all its desirable properties even if the X's are stochastic.
A related problem is incorrect specification of the stochastic error term itself: errors in the way the stochastic error term u_i enters the regression model affect its estimation and interpretation. Consider the following regression model without an intercept term:

Y_i = βX_iU_i ……………………4.131

In this model the stochastic error term enters multiplicatively and does not satisfy the usual assumptions about the stochastic term. Suppose, however, that the true model is:

Y_i = βX_i + U_i ……………………4.132

In equation 4.132 the error term enters additively. Although the variables are the same in the two models, the improper stochastic specification of the error term in equation 4.131 constitutes a specification error. More generally, if X and U are correlated only at leads and lags but not contemporaneously [E(u_t X_{t±s}) ≠ 0 for s = 1, 2, … while E(u_t X_t) = 0], OLS retains its large-sample properties: the estimators are biased in small samples but consistent and asymptotically efficient. If, however, X and U are contemporaneously correlated, the OLS estimators are biased and inconsistent.
A good example is the marginal income tax rate facing a family. The marginal rate may be hard to obtain or summarize as a single number for all income levels. Instead, we might compute the average tax rate based on total income and tax payments; the average tax rate is then used as a proxy for the marginal income tax rate, and it may be a poor one.
i. Measurement Error in the Dependent Variable
We begin with the case where only the dependent variable is measured with error. Let Y* denote the variable that we would like to explain; for example, Y* could be annual family savings. The regression model that satisfies the Gauss–Markov assumptions has the usual form

Y* = β₀ + β₁X₁ + … + β_kX_k + u ……………………4.134

Let Y represent the observable measure of Y*. In the savings case, Y is reported annual savings. Unfortunately, families are not perfect in their reporting of annual family savings; it is easy to leave out categories or to overestimate the amount contributed to a fund. Generally, we can expect Y and Y* to differ, at least for some subset of families in the population.
The measurement error (in the population) is defined as the difference between the observed value and
the actual value:
e₀ = Y − Y* ……………………4.135

For a random draw i from the population we can write e_{i0} = Y_i − Y_i*, but the important thing is how the measurement error in the population is related to other factors. To obtain an estimable model, we write Y_i* = Y_i − e_{i0}, plug this into equation (4.134), and rearrange:

Y = β₀ + β₁X₁ + … + β_kX_k + e₀ + u ……………………4.136

The error term in equation 4.136 is e₀ + u. Because Y, X₁, X₂, …, X_k are observed, we can estimate this model by OLS. Since the original model (4.134) satisfies the Gauss–Markov assumptions, u has zero mean and is uncorrelated with each X_j, so it is only natural to assume that the measurement error also has zero mean. The usual assumption is that the measurement error is a random reporting error that is statistically independent of each explanatory variable. Then OLS applied to eq. 4.136 yields unbiased and consistent estimators, and the usual OLS inference procedures (t, F, and LM statistics) are valid.
Provided there is no systematic relationship between the measurement error e₀ and the explanatory variables X_j, the only cost is that measurement error in the dependent variable results in a larger error variance than when there is no error. Such measurement error can cause biases in OLS only if it is systematically related to one or more of the explanatory variables.
ii. Measurement Error in an Explanatory Variable
Traditionally, measurement error in an explanatory variable has been considered a much more important problem than measurement error in the dependent variable. We begin with the simple regression model

Y = β₀ + β₁X₁* + u ……………………4.137

which satisfies (at least) the first four Gauss–Markov assumptions. Estimation of equation (4.137) by OLS would produce unbiased and consistent estimators of β₀ and β₁.
The problem is that X₁* is not observed. Instead, we have a measure of X₁*; call it X₁. There may be a difference between X₁* and X₁: for example, X₁* could be actual income and X₁ reported income, and the two deviate, as households usually have a tendency to hide actual income. The measurement error is simply

e₁ = X₁ − X₁* ……………………4.138

We assume that the average measurement error in the population is zero: E(e₁) = 0. This is natural and, in any case, it does not affect the important conclusions that follow. A maintained assumption is that u is uncorrelated with both X₁* and X₁.
In conditional expectation terms we can write this as E(Y|X₁*, X₁) = E(Y|X₁*), which just says that X₁ does not affect Y after X₁* has been controlled for; we used the same assumption in the proxy-variable case. If we simply replace X₁* with X₁ and run the regression of Y on X₁, we want to know the properties of OLS. They depend crucially on the assumptions we make about the measurement error. There are two polar extreme assumptions in the econometrics literature in this regard. The first assumption is that e₁ is uncorrelated with the observed measure X₁:

Cov(X₁, e₁) = 0 ……………………4.139
From the relationship in (4.138), if assumption (4.139) is true, then e₁ must be correlated with the unobserved variable X₁*. To determine the properties of OLS in this case, we write X₁* = X₁ − e₁ and plug this into the original equation:

Y = β₀ + β₁X₁ + (u − β₁e₁) ……………………4.140

Because we have assumed that u and e₁ both have zero mean and are uncorrelated with X₁, the composite error (u − β₁e₁) has zero mean and is uncorrelated with X₁. It follows that OLS estimation with X₁ in place of X₁* produces a consistent estimator of β₁ (and also of β₀). Since u is uncorrelated with e₁, the variance of the error in (4.140) is

Var(u − β₁e₁) = σ_u² + β₁²σ_{e₁}² > σ_u²

Thus, except when β₁ = 0, measurement error increases the error variance. But this does not affect any of the OLS properties (except that the variances of the β̂_j will be larger than if we observed X₁* directly). The assumption that e₁ is uncorrelated with X₁ is analogous to the proxy-variable assumption. Since this assumption implies that OLS has all of its nice properties, this is not usually what econometricians have in mind when they refer to measurement error in an explanatory variable. Under the opposite (classical errors-in-variables) assumption, e₁ is instead uncorrelated with the unobserved X₁*; then the composite error term is correlated with the mismeasured independent variable, violating the Gauss–Markov assumptions, and OLS becomes biased and inconsistent.
The measurement error problems discussed in the previous section can be viewed as data problems, alongside missing data, nonrandom samples, and outlying observations.
1. Missing Data
Literally, missing data refers to a situation in which we cannot obtain data on some variables of interest. Missing data can arise in a variety of forms. Often we collect a random sample of people, schools, cities, and so on, and then discover later that information is missing on some key variables for several units in the sample. If data are missing for an observation on either the dependent variable or one of the independent variables, then the observation cannot be used in a standard multiple regression analysis. Ignoring the observations with missing information reduces the sample size available for the regression; although this makes the estimators less precise, it does not by itself introduce any bias.
2. Nonrandom Samples
Nonrandom-sample data problems violate the random sampling assumption. Under the Gauss–Markov assumptions, it turns out that the sample can be chosen on the basis of the independent variables without causing any statistical problems; this is called sample selection based on the independent variables. Missing data are more problematic when they result in a nonrandom sample from the population. Suppose that we are estimating a saving function, where annual saving depends on income, age, family size, and perhaps some other factors; a simple model in these variables is

saving = β₀ + β₁income + β₂age + β₃size + u ……………………4.141

Suppose that our data set is based on a survey of people over 35 years of age, thereby leaving us with a nonrandom sample of all adults. We can still get unbiased and consistent estimators of the parameters of the population model (4.141) using this nonrandom sample, because selection is based on an explanatory variable (age). Selection on the basis of the independent variables is not a serious problem, provided there is enough variation in the independent variables in the subpopulation.
For example, suppose we wish to estimate the relationship between individual wealth and several other factors in the population of all adults:

wealth = β₀ + β₁x₁ + β₂x₂ + … + β_kx_k + u ……………………4.142

Suppose that only people with wealth below $250,000 are included in the sample. This is a nonrandom sample from the population of interest, selected on the basis of the dependent variable. Using a sample of people with wealth below $250,000 will result in biased and inconsistent estimators of the parameters in (4.142). Briefly, this occurs because the population regression function is not the same as the expected value conditional on wealth being less than $250,000.
Stratified sampling is a fairly obvious form of nonrandom sampling. It is a common method of data collection in which the population is divided into nonoverlapping, exhaustive groups, or strata; some groups are then sampled more frequently than is dictated by their population representation, and some groups are sampled less frequently. For example, some surveys purposely oversample minority groups or low-income groups. Whether special methods are needed again hinges on whether the stratification is exogenous (based on exogenous explanatory variables) or endogenous (based on the dependent variable).
For example, suppose a sample is selected on the basis of someone's decision to work. Since the decision to work might be related to unobserved factors, selection might be endogenous, and this can result in sample selection bias in the OLS estimators. In such cases of endogenous selection, nonrandom sampling causes the OLS estimators to be biased and inconsistent.
3. Outliers and Influential Observations
In some applications, especially with small data sets, the OLS estimates can be influenced by one or several observations. Such observations are called outliers or influential observations. Loosely, an observation is an outlier if dropping it from a regression analysis makes the OLS estimates change by a practically "large" amount. OLS is susceptible to outlying observations because it minimizes the sum of squared residuals: large residuals (positive or negative) receive a lot of weight in the least squares minimization problem. If the estimates change by a practically large amount when we slightly modify our sample because of outliers, we should be concerned.
When statisticians and econometricians study the problem of outliers theoretically, sometimes the data are viewed as a random sample from a given population, and sometimes the outliers are assumed to come from a different population. From a practical perspective, outlying observations can occur for two reasons: errors in entering the data, and sampling from a small population. The easiest case is when a mistake has been made in entering the data: adding extra zeros to a number or misplacing a decimal point can throw off the OLS estimates, especially in small samples. It is always a good idea to compute summary statistics, especially minimums and maximums, in order to catch mistakes in data entry. Unfortunately, incorrect entries are not always obvious.
Outliers can also arise when sampling from a small population, if one or several members of the population are very different in some relevant aspect from the rest of the population. The decision to keep or drop such observations in a regression analysis can be a difficult one, and the statistical properties of the resulting estimators are complicated. Outlying observations can provide important information by increasing the variation in the explanatory variables (which reduces standard errors), but OLS results should probably be reported both with and without outlying observations in cases where one or several data points substantially change the results.
In some cases, certain observations are suspected at the outset of being fundamentally different from the rest of the sample. This often happens when we use data at very aggregated levels, such as the city, county, or state level. Sometimes statistical formulas can be used to detect such influential observations. Another approach to dealing with influential observations is to use an estimation method that is less sensitive to outliers than OLS. For most economic variables, the logarithmic transformation significantly narrows the range of the data and also yields functional forms, such as constant elasticity models, that can explain a broader range of data. One method that is becoming more and more popular among applied econometricians is least absolute deviations (LAD). The LAD estimator minimizes the sum of the absolute deviations of the residuals, rather than the sum of squared residuals. LAD is designed to estimate the effects of the explanatory variables on the conditional median, rather than the conditional mean, of the dependent variable. Because the median is not affected by large changes in the extreme observations, the parameter estimates obtained by LAD are resilient to outlying observations. Least absolute deviations is a special case of what is often called robust regression. Unfortunately, the way "robust" is used here can be confusing: in the statistics literature, a robust regression estimator is one that is relatively insensitive to extreme observations; effectively, observations with large residuals are given less weight than in least squares.
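In STATA, LAD is available as median (quantile) regression. A minimal sketch with hypothetical variables y, x1, x2:

    * minimal sketch: LAD versus OLS
    regress y x1 x2    // OLS: minimizes the sum of squared residuals
    qreg y x1 x2       // LAD: minimizes the sum of absolute residuals

Comparing the two sets of estimates is a quick informal check for influential observations.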
We have seen that the misspecification problem may involve omitting relevant variables, including irrelevant variables, or errors of functional form. These problems have consequences for the BLUE properties of OLS, and the effect differs depending on the type of error. If a relevant variable is omitted and it is correlated with one or more of the included variables (i.e., if the pairwise correlation coefficient between the two is nonzero), then the estimates will be biased and inconsistent; the usual confidence-interval and hypothesis-testing procedures are likely to give misleading conclusions about the statistical significance of the estimated parameters; and forecasts based on the incorrect model, together with their confidence intervals, will be unreliable.
The OLS estimators of the parameters of a model that is "incorrect" due to the inclusion of irrelevant variables are all unbiased and consistent, and the usual confidence-interval and hypothesis-testing procedures remain valid. The estimates of the parameters will, however, generally be inefficient, in that their variances will be larger than those obtained from the true model.
If a relevant variable is excluded as a regressor, the error term becomes correlated with the included variables, and the OLS estimators will be biased and inconsistent. The estimates of the parameters will also generally be inefficient, as their variances will be larger than those obtained from the true model.
If we have data on all the necessary variables for obtaining a functional relationship that fits the data well, the problem is relatively minor. But when a key variable on which we cannot collect data is omitted, misspecifying the functional form of a model can certainly have serious consequences. In general, the effects of functional form misspecification are the same as those of omitting relevant variables.
A. Examination of residuals: In previous parts we have seen how to use the residuals of a model to examine autocorrelation and heteroscedasticity problems. These residuals can also be examined, especially in cross-sectional data, for model specification errors, such as omission of an important variable or incorrect functional form. To test for misspecification errors in this way, first estimate the model using OLS and obtain the residuals; then plot them and inspect the patterns, or plot the residuals against the fitted values. We suspect our model of such errors if the plot of the residuals exhibits distinct and noticeable patterns; there is no such problem if the residuals are randomly scattered around zero. This approach can also be used to take a quick glance at problems like nonlinearity.
B. Different statistical tests
However, suppose that the researcher is not sure whether a variable X_k really belongs in the model. How will the researcher find this out? One can do so using the t-test, the F-test, the Durbin–Watson test, the RESET test, etc.
I. t-test, F test
Given a specific model, we can find out whether one or more regressors are really relevant by the usual t and F tests. The simplest way is to test the significance of the estimated βk with the usual t test. But note that the t and F tests should not be used to build a model iteratively. In other words, we should not say that initially Y is related to X2 only because its coefficient is statistically significant, then expand the model to include X3 and decide to keep that variable if its coefficient turns out to be statistically significant, and so on. The decision rule is not to eliminate variables from a model based solely on insignificance implied by t tests. In particular, do not drop two or more variables at once on the basis of t tests, even if each has |t| < 1, because the t statistic of one regressor (Xj) may change radically once another (Xi) is dropped. If the researcher is not sure about the relevance of more than one variable (say X3 and X4), their joint relevance can easily be ascertained by the F test, as illustrated below.
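In STATA, such a joint test can be carried out with the test command after estimation (a sketch with hypothetical regressors x3 and x4 whose relevance is in doubt):

    * estimate the full model
    regress y x1 x2 x3 x4
    * F test of H0: the coefficients on x3 and x4 are jointly zero
    test x3 x4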
In a previous part we saw that the Durbin–Watson test can be used to test for autocorrelation in the regression model. The test can also be used to detect specification errors. To use the Durbin–Watson test for detecting model specification error(s), we proceed as follows:
Step 1. From the assumed model, obtain the OLS residuals.
Step 2. If it is believed that the assumed model is mis-specified because it excludes a relevant explanatory variable, say Xi, order the residuals obtained in step 1 in ascending order of the Xi values.
Step 3. Compute the d statistic from the residuals ordered in step 2 by the usual formula developed so far, namely

d = Σ(eₜ − eₜ₋₁)² / Σeₜ²

where the numerator sum runs from t = 2 to n and the denominator sum from t = 1 to n.
Step 4. From the Durbin–Watson tables, if the estimated d value is significant, one can accept the hypothesis of model misspecification. If the test leads us to accept that hypothesis, the remedial measure is to include Xi in the model. A minimal STATA sketch of these steps is given below.
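The sketch assumes y and x are the model variables and z is the suspected omitted variable; all names are hypothetical:

    * Step 1: estimate the assumed model and obtain the residuals
    quietly regress y x
    predict uhat, residuals
    * Step 2: order the residuals in ascending order of z
    sort z
    * Step 3: compute the d statistic from the ordered residuals
    generate double num = (uhat - uhat[_n-1])^2
    generate double den = uhat^2
    quietly summarize num
    scalar dnum = r(sum)
    quietly summarize den
    scalar dden = r(sum)
    display "d = " dnum/dden
    * Step 4: compare d with the Durbin-Watson critical values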
The tests for misspecification due to omitted variables or a wrong functional form can be done using RESET (the regression specification error test) proposed by Ramsey (1969). The idea behind RESET is fairly simple. If the original model

Y = β₀ + β₁X₁ + ⋯ + βkXk + u    (4.143)

satisfies the assumptions of the CLRMA, then no nonlinear functions of the independent variables should be significant when added to it. In order to implement RESET, we must decide how many functions of the fitted values to include in an expanded regression. There is no single right answer to this question, but the squared and cubed terms have proven useful in most applications.
Let Ŷ denote the OLS fitted values from estimating equation 4.143. Consider the expanded equation

Y = β₀ + β₁X₁ + ⋯ + βkXk + δ₁Ŷ² + δ₂Ŷ³ + error    (4.144)

This equation seems a little odd, because functions of the fitted values from the initial estimation now appear as explanatory variables. In fact, we will not be interested in the estimated parameters from equation 4.144; we only use this equation to test whether equation 4.143 has missed important nonlinearities. The null hypothesis is that equation 4.143 is correctly specified. Thus, RESET is the F statistic for testing H₀: δ₁ = 0, δ₂ = 0 in the expanded model 4.144. A significant F statistic suggests some sort of functional form problem. The distribution of the F statistic is approximately F(2, n − k − 3) in large samples under the null hypothesis (and the Gauss–Markov assumptions): the degrees of freedom in the expanded equation 4.144 are n − k − 1 − 2 = n − k − 3.
To illustrate the simplest version of this test, let us assume that cost is a linear function of output, as given below:

Yᵢ = β₁ + β₂Xᵢ + uᵢ    (4.145)

1. Estimate equation 4.145 by OLS and obtain the fitted values Ŷᵢ.
2. Rerun the original model after introducing Ŷᵢ in some form as an additional regressor or regressors (i.e., Ŷᵢ², or Ŷᵢ² and Ŷᵢ³, and so on); that is, regress Y on the X's, Ŷ² and Ŷ³.
3. In other words, run the following model:

Yᵢ = β₁ + β₂Xᵢ + β₃Ŷᵢ² + β₄Ŷᵢ³ + vᵢ    (4.146)

4. Obtain R² from equations 4.145 and 4.146 separately, call them R²old and R²new respectively, and calculate the F value:

F = [(R²new − R²old)/K] / [(1 − R²new)/(n − J)]    (4.147)

where K is the number of new regressors, n is the sample size and J is the number of parameters in the new model (equation 4.146). Then find out whether the increase in R² is statistically significant using this F test. To test for irrelevant variables, use F tests based on the restricted and unrestricted residual sums of squares (RRSS and URSS).
Decision rule: if β₃ and β₄ (the coefficients on Ŷᵢ² and Ŷᵢ³) are jointly significant by the F test, reject H₀ and conclude that there is misspecification. Stated differently, if the computed F value from equation 4.147 is significant at the chosen significance level, accept the hypothesis that the original model 4.145 is mis-specified. RESET thus adds polynomials in the OLS fitted values to equation 4.145 to detect general kinds of functional form misspecification, and it does so with a tool we already have: the F test for joint exclusion restrictions.
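In STATA, RESET is available directly after estimation through estat ovtest, which adds powers of the fitted values in the spirit of equation 4.146; the manual version below mirrors steps 1 to 4 (y and x are hypothetical variable names):

    * built-in Ramsey RESET test after OLS
    quietly regress y x
    estat ovtest
    * manual version following the four steps above
    quietly regress y x
    predict yhat, xb
    generate double yhat2 = yhat^2
    generate double yhat3 = yhat^3
    regress y x yhat2 yhat3
    * F test of H0: the coefficients on yhat2 and yhat3 are zero
    test yhat2 yhat3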
Limitation
Some have argued that RESET is a very general test for model misspecification, including unobserved omitted variables and heteroskedasticity. One advantage of RESET is that it is easy to apply, for it does not require one to specify an alternative model. A drawback is that it provides no real direction on how to proceed if the model is rejected: knowing that a model is mis-specified does not necessarily help us in choosing a better alternative.
Unfortunately, the use of RESET as a general misspecification test is largely misguided. It can be shown that RESET has no power for detecting omitted variables whenever their expectations are linear in the included independent variables. Furthermore, if the functional form is properly specified, RESET has no power for detecting heteroskedasticity. The bottom line is that RESET is a functional form test, and nothing more. In general, model misspecification due to the inclusion of irrelevant variables is less serious than that due to the omission of relevant variables.
A further possibility is to test a given specification directly against a competing, non-nested model. This approach was suggested by Mizon and Richard (1986). Another approach has been suggested by Davidson and MacKinnon (1981); that is left for you as a reading assignment.
Summary
We have seen several violations of the classical linear regression model assumptions: heteroskedasticity, autocorrelation, multicollinearity, non-normality and misspecification of the model. The violations, their consequences and the corresponding solutions are summarized in Table 4.16 below.
Table 4.16: Consequences of violation of CLRMA

Problem                Solution
Heteroskedasticity     Transform the model so that the disturbance has constant variance (WLS/GLS), or use robust standard errors
Autocorrelation        Use generalized least squares adjusted for autocorrelation (e.g., the Cochrane–Orcutt transformation)
Multicollinearity      Use additional data or a priori information, or drop/combine the collinear regressors
Misspecification       Correct the functional form; include necessary and omit unnecessary variables; use proxy variables
One of the violations relates to the variance of the disturbance. Heteroskedasticity may arise from error-learning models, the nature of the data, the sampling procedure, measurement errors or the presence of outliers. Its consequence falls on the variances: the OLS estimators (β̂₀ and β̂₁) remain unbiased and consistent, but they are no longer efficient. Furthermore, prediction based on a model with heteroskedasticity is unreliable, and the conventional OLS hypothesis tests are invalid: the lack of efficiency makes the usual testing procedure of dubious value, and the t and F tests are invalid. The appropriate solution is to transform the original model in such a way as to obtain a form in which the transformed disturbance term has constant variance. One can correct the problem using generalized least squares.
Another problem is autocorrelation, a situation in which the error terms of different observations are correlated. Some of the reasons for autocorrelation are cyclical fluctuations, the cobweb phenomenon, misspecification of the true random term and specification bias. The problem leads to inefficient estimates and unreliable standard errors, which in turn invalidate the usual hypothesis-testing procedures.
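With the data declared as a time series, the standard checks and the GLS remedy can be run in STATA as follows (a sketch; t, y and x are hypothetical):

    * declare the time variable
    tsset t
    quietly regress y x
    * Durbin-Watson d statistic
    estat dwatson
    * Breusch-Godfrey test for autocorrelation up to order 2
    estat bgodfrey, lags(2)
    * GLS remedy: Cochrane-Orcutt estimation
    prais y x, corc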
Multicollinearity is the existence of a perfect or exact linear relationship among the explanatory variables. It may arise from sampling over a limited range of X values, an overdetermined model, improper use of dummy variables or including the same variable twice. Its consequences differ with its severity. If multicollinearity is perfect, the regression coefficients of the variables (Xᵢ) are indeterminate; if it is less than perfect, the coefficients are determinate, but the OLS estimators have large variances and covariances, and confidence intervals tend to be much wider because the standard errors are large.
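The usual diagnostics can be inspected in STATA as follows (hypothetical regressors x1, x2 and x3):

    * pairwise correlations among the regressors
    correlate x1 x2 x3
    * variance inflation factors after OLS; a VIF above 10 signals trouble
    quietly regress y x1 x2 x3
    estat vif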
Another common problem is model misspecification, which can result from endogeneity, functional form misspecification, measurement error and data problems. An endogenous explanatory variable exists if u is, for whatever reason, correlated with the explanatory variable Xj. Some of the mechanisms for detecting model misspecification are the Durbin–Watson test and the RESET test. Some of the remedial measures are correcting the functional form where possible, omitting unnecessary variables, including necessary variables, and using proxy variables for unobserved explanatory variables.
Reference
Davidson, R., & MacKinnon, J. G. (1981). Several tests for model specification in the presence of alternative hypotheses. Econometrica, 49: 781–793.
Greene, W. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley Pvt. Ltd.
Gujarati, D. (1999). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric Methods with Applications in Business and Economics. New York: Oxford University Press.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Maddala, G. (1992). Introduction to Econometrics. 3rd ed. New York: Macmillan.
Mizon, G. E., & Richard, J.-F. (1986). The encompassing principle and its application to testing non-nested hypotheses. Econometrica, 54: 657–678.
Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B, 31: 350–371.
Theil, H. (1957). Specification errors and the estimation of economic relationships. Review of the International Statistical Institute, 25: 41–51.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: Riverside Printing Co. Ltd.
Verbeek, M. (2004). A Guide to Modern Econometrics. 2nd ed. England: John Wiley & Sons Ltd.
Wooldridge, J. (2000). Introductory Econometrics: A Modern Approach. 2nd ed. New York: South-Western Publishers.
Review questions
Choose the best answer and circle your choice.
1. Which one of the following is true about heteroskedasticity?
A. the variance of the error term is not constant
B. it is a violation of the CLRMA
C. it can be increasing, decreasing or cyclical
D. σᵢ² = f(Xᵢ)
E. all
F. none
2. Which one of the following is a cause of heteroskedasticity?
A. error-learning models
B. the nature of economic variables
C. sampling procedure and measurement errors
D. presence of outliers
E. all
F. none
3. Which one of the following is a consequence of heteroskedasticity?
A. the estimators (β̂₀ and β̂₁) remain unbiased and consistent
B. the estimators are no longer efficient
C. prediction based on a model with heteroskedasticity is unreliable
D. the conventional OLS hypothesis tests are invalid
E. all
F. none
4. Which one of the following is a way of detecting heteroskedasticity?
A. informally, using graphs
B. informally, using the nature of the economic variables
C. formally, using the Park test
D. formally, using the Spearman rank-correlation test
E. all
F. none
5. A regression model has been estimated on two subsamples of 13 observations each, and heteroskedasticity has been tested using the Goldfeld–Quandt test as follows.

Regression based on the first 13 observations:
Yᵢ = 3.4094 + 0.6968Xᵢ + eᵢ
     (8.7049)  (0.0744)        R² = 0.8887, RSS₁ = 377.17, df = 11

Regression based on the last 13 observations:
Yᵢ = −28.0272 + 0.7941Xᵢ + eᵢ
     (30.6421)  (0.1319)       R² = 0.7681, RSS₂ = 1536.8, df = 11

If the critical F value for 11 numerator and 11 denominator df at the 5% level is 2.82, which one of the following is true?
A. the calculated F is 4.07
B. the critical F is less than the calculated F
C. there is heteroskedasticity in the error variance
D. there is no heteroskedasticity in the error variance
E. all except "D"
F. none
6. Which one of the following is among the remedial measures for heteroskedasticity?
A. use generalized least squares (GLS) rather than OLS
B. stick to OLS but use robust standard errors
C. use weighted least squares (WLS) rather than OLS
D. transform the original model by dividing it through by σᵢ
E. all
F. none
7. Which one of the following is an appropriate transformation of the original model Yᵢ = α + βXᵢ + Uᵢ, with var(uᵢ) = σᵢ², E(uᵢ) = 0 and E(uᵢuⱼ) = 0, into a model adjusted for heteroskedasticity?
A. if the variance is E(uᵢ²) = σᵢ² = σ²Xᵢ², then Yᵢ/Xᵢ = α(1/Xᵢ) + β + Uᵢ/Xᵢ
B. if the variance is var(uᵢ) = σᵢ² = k²σ²Xᵢ², then Yᵢ/Xᵢ = α(1/Xᵢ) + β + Uᵢ/Xᵢ
C. if E(uᵢ²) = σᵢ² = k²Xᵢ², then Yᵢ/Xᵢ = α(1/Xᵢ) + β + Uᵢ/Xᵢ
D. if E(vᵢ²) = (a₀ + a₁Xᵢ)², where E(Yᵢ) = a₀ + a₁Xᵢ, then Yᵢ/(a₀ + a₁Xᵢ) = α/(a₀ + a₁Xᵢ) + βXᵢ/(a₀ + a₁Xᵢ) + Uᵢ/(a₀ + a₁Xᵢ)
E. all
F. none
8. Autocorrelation
A. is the typical problem that arises when two error terms are correlated
B. is a special type of correlation referring to the relationship between successive values of the same variable
C. is associated with time-series data
D. is a correlation of the error terms with lagged values of the explanatory variables
E. all except D
F. none
9. Reasons for autocorrelation include
A. cyclical fluctuations
B. the cobweb phenomenon
C. misspecification of the true random term
D. specification bias
E. all
F. none
10. Which one of the following is true about the order and coefficient of autocorrelation?
A. uₜ = f(uₜ₋₁) is first-order autoregressive
B. uₜ = f(uₜ₋₁, uₜ₋₂) is second-order autoregressive
C. for uₜ = ρuₜ₋₁ + εₜ, the estimate is ρ̂ = Σuₜuₜ₋₁ / Σuₜ₋₁²
D. the autocorrelation coefficient ρ ranges from −1 to +1
E. all
F. none
11. Which one of the following is an effect of autocorrelation on the OLS estimators?
A. the estimators are unbiased and consistent
B. the variance of the error term is underestimated and OLS is not BLUE
C. the usual hypothesis-testing procedures give wrong conclusions
D. the variances of the β̂ᵢ are overestimated
E. all except D
F. none
12. Which one of the following indicates autocorrelation?
A. when the plot of the regression residuals against their own lagged values shows a systematic pattern
B. when the plot of the regression residuals against time shows a regular pattern
C. when the run test shows statistically significant results
D. when the Breusch–Godfrey (BG) test statistic (T − p)R² exceeds the tabulated value of the χ² distribution with p degrees of freedom
E. all
F. none
13. The Durbin–Watson test
A. d = Σ(eₜ − eₜ₋₁)² / Σeₜ², with the numerator sum from t = 2 to n and the denominator sum from t = 1 to n
B. d ≈ 2(1 − ρ̂)
C. when d = 0, there is perfect positive autocorrelation
D. when ρ = 0, there is no autocorrelation and d = 2
E. all
F. none
14. Given the results of the regression of Y on X, i.e. Yₜ = α + βXₜ + Uₜ, we can compute the following values:

β̂ = Σxy/Σx² = 255/280 = 0.91
α̂ = Ȳ − β̂X̄ = 7 − 0.91(8) = −0.29

so that Ŷ = −0.29 + 0.91X, and

d = Σ(eₜ − eₜ₋₁)²/Σeₜ² = 60.213/41.767 = 1.442

The values of d_L and d_U at the 5% level of significance, with n = 15 and one explanatory variable, are d_L = 1.08 and d_U = 1.36.
A. d lies outside the range (d_L, d_U)
B. d lies in the interval d_U < d < 4 − d_U (1.36 < d < 2.64)
C. the data imply autocorrelation
D. there is no indication of autocorrelation
E. all except "D"
F. none
15. Which one of the following is among the remedial measures for autocorrelation?
A. using generalized least squares adjusted for autocorrelation
B. supposing the model is Yₜ = α + βXₜ + Uₜ with Uₜ = ρUₜ₋₁ + Vₜ and |ρ| < 1: take the lagged form of the equation, multiply it through by ρ, and subtract it from the original equation to obtain Yₜ − ρYₜ₋₁ = α(1 − ρ) + β(Xₜ − ρXₜ₋₁) + (Uₜ − ρUₜ₋₁)
C. using OLS
D. all
E. all except C
F. none
16. Which one of the following is a mechanism for estimating ρ?
A. ρ̂ ≈ 1 − d/2
B. prior information on ρ
C. a given (assumed) value of ρ̂
D. the Theil–Nagar estimate ρ̂ = [n²(1 − d/2) + k²] / (n² − k²)
E. all
F. none
17. Multicollinearity
A. is the existence of a perfect or exact linear relationship among explanatory variables
B. does not refer to non-linear relationships
C. reduces the number of usable explanatory variables when the OLS estimator is applied
D. is handled with the same approach as a model without multicollinearity
E. all
F. none
18. Which one of the following indicates the presence of multicollinearity?
A. high pairwise correlations among the explanatory variables
B. the condition number (K) is between 100 and 1000
C. the VIF of a variable exceeds 10
D. high R² but few significant t ratios
E. large standard errors but small t ratios
F. all
G. none