
ECONOMETRICS

Part I: Teaching material I


(Econ 2061)

Prepared By: Mohammed Beshir

Department of Economics
College of Business and Economics
Arsi University

January 2019
Assela, Ethiopia
PREFACE
In recent times, decision-making has become more research oriented. This demands the availability of data and information collected and analyzed in a scientific manner using relevant analytical methodology. To this end, econometric techniques play a great role. It is becoming common to see econometric applications in economic planning, research, and policy making. Furthermore, tremendous advancement in computer technology and data science has made econometrics a handy tool for explaining the complex realities of the actual world. Accordingly, researchers and students should be equipped with both the basic theoretical knowledge of econometrics and its application issues (software usage and data analysis).

This teaching material on “Introduction to Econometrics (Econ2061)” is prepared to equip students and researchers with the basic knowledge of econometrics, offering a sound introduction to modern econometrics for beginners and intermediate learners. It also provides the student with a thorough understanding of the central ideas and their practical application, along with examples. Besides, the material uses empirical data and examples for practical exercises in STATA, a common econometrics software package, to extend knowledge of the subject. It will also help students while doing research.

There are few materials that combine theoretical knowledge with a data analysis package (software) and make the results available to students. This material is designed to fill that gap in the harmonized curriculum and to address the shortage of relevant reference materials in the library. The material uses three ingredients: lectures that discuss econometric models with illustrative examples; computer sessions that give hands-on practical experience with statistical software using real data; and exercise sessions treating selected exercises to train econometric methods on paper. Students can gain further knowledge by working on the empirical exercises at the end of each chapter.

To better understand the subject, solid knowledge of the basic principles and theories of econometrics, along with a sufficient background in statistics, calculus and algebra, is required.

ACKNOWLEDGEMENT
The timely and successful completion of this material would hardly have been possible without the help and support of individuals and organizations. I take this opportunity to thank all of those who helped me, directly or indirectly, during this important work.

First of all, I would like to express my sincere gratitude to Arsi University, College of Business and Economics, for funding the material in consideration of the urgent need for it. I would also like to thank the office of the Vice-Dean for Research and Community Service for guidance and facilitation in writing the teaching material. I am also thankful to the College of Business and Economics and the Department of Economics staff for their comments and support.

My special thanks go to Dr. Endalu FuFa, director of educational quality assurance at Arsi University, for his unreserved comments and advice on the writing and modality of the material. He not only rendered help educationally but also emboldened my spirit to carry my work forward. I am immensely grateful to Tesfaye Abera for his valuable guidance, continuous encouragement and positive support. I would like to appreciate Mesay Mesfin, who edited the entire material, for always showing keen interest in my queries and providing important suggestions and editing. Finally, I express my gratitude to Beteliem Abera, who wrote half of the script.

I owe a lot to my wife for her constant love and support. She always encouraged me to have positive and independent thinking, which really matters in my life. I would like to thank her very much and share this moment of happiness with her. I also express wholehearted thanks to my friends for their care and moral support. Furthermore, I thank my mother and sister for their invaluable moral support and concern for my work.
Mohammed Beshir (Ato)
Lecturer and Head of Department
Department of Economics
College of Business and Economics
Arsi University, Ethiopia

Overview of the teaching materials
The approaches applied in these teaching materials are carefully designed and take into consideration the existing Ethiopian reality. The material furnishes students with knowledge along with hypothetical examples to support them. I believe that this approach will enable students to learn the science of econometrics and, at the same time, take a glimpse of the country’s data and how it can be used in research.
The material is divided into two major parts comprising 9 chapters altogether. The first part of the material covers four chapters. The first chapter introduces students to the basic concepts of econometrics, such as the definition and scope of econometrics, and gives a brief explanation of the methodological stages of econometric research. Furthermore, it reviews regression and correlation. The second chapter deals with the simple linear regression model and the techniques used to estimate its parameters, along with an explanation of the desirable statistical properties of the estimators. Chapter three extends the lessons of the simple linear regression model to multiple linear regression models. Chapter four relaxes the assumptions of linear regression models and looks at some of the violations, along with remedial measures.

In part two we consider models with dummy variables, simultaneous equations, time series and panel data econometrics. More specifically, chapter five is concerned with qualitative independent variables treated as dummies. Chapter six extends the dummy variable concept to the case of the dependent variable; we go through the linear probability, probit, logit, and tobit models. Chapter seven is about simultaneous equation models, along with estimation techniques such as indirect least squares, two-stage least squares, and instrumental variable methods. Chapter eight is about time series data analysis and forecasting. Lastly, the ninth chapter is concerned with panel data econometrics, which has the characteristics of both cross-sectional and time series data.

Since its coverage is comprehensive (from simple to intermediate) and contains almost all topics prescribed in the new modular syllabus, instructors at the university level will also find it helpful. Suggestions for further improvement from students and fellow teachers will be heartily welcomed.
Mohammed Beshir (Ato)

Table of Contents
PREFACE..................................................................................................................................................i
ACKNOWLEDGEMENT.........................................................................................................................ii
Overview of the teaching materials..........................................................................................................iii
Table of Contents.....................................................................................................................................iv
List of Figures.........................................................................................................................ix
List of Tables.............................................................................................................................................x
Chapter One: Basic Concept in Econometrics.......................................................................................11
1.1. Introduction......................................................................................................................................11
1.2. Definition of Econometrics................................................................................................................2
1.3. Characteristics of Econometrics.........................................................................................................2
1.4. Goals of Econometrics........................................................................................................................7
1.4.1 Analysis: Testing of Economic Theories......................................................................................8
1.4.2. Policy and/or Decision Making...................................................................................................9
1.4.3. Forecasting...................................................................................................................................9
1.5. Division of Econometrics.................................................................................................................10
1.6. Methodology of Econometrics.........................................................................................................12
1.6.1. Question to be analyzed, economic theory and past experiences..............................................13
1.6.2. Formulating mathematical models............................................................................................15
1.6.3. Formulating Statistical and Econometric models......................................................................19
1.6.3.1. Statistical model.................................................................................................................19
1.6.3.2. Econometric model and its specification............................................................................21
1.6.4. Obtaining Data...........................................................................................................................22
1.6.5: Estimation of the model.............................................................................................................27
1.6.6. Evaluation of the estimates/ Hypothesis Testing.......................................................................27
1.6.7. Forecasting or Prediction power of the model.........................................................................29
1.6.8. Using model for control or policy purposes..............................................................................30
1.6.9. Presenting the findings of analysis............................................................................................30
1.7. Correlation and regression analysis.............................................................................................32
1.7.1. Goal.......................................................................................................................................32
1.7.2. Types of relationship between /among variables..................................................................32
1.7.2.1. Covariance..........................................................................................................................32
1.7.2.2. Correlation..........................................................................................................................34
1.7.2.3. Regression..........................................................................................................................36
Empirical Example..................................................................................................................................39
Summary..................................................................................................................................................40
Key terms.................................................................................................................................................41
Reference.............................................................................................................................................41
Review Questions....................................................................................................................................42
Chapter Two: Simple Linear Regression Model.................................................................................45
2.0. Introduction......................................................................................................................................45
2.1. Economic theory of two variable regressions...................................................................................45
2.2. Economic Models (Non-stochastic Relationships)...........................................................................46
2.3. Statistical model and/or Econometric model or stochastic model....................................................47
2.4. Data and area of applications............................................................................................................51
2.4.1. Data type and Hypothetical Example........................................................................................51
2.4.2. Population Regression Function (PRF).....................................................................................54
2.4.3. Sample Regression Functions (SRF).........................................................................................55
2.5. Estimation of the Classical Linear Regression Model......................................................................57
2.5.1. Assumptions of the Classical Linear Stochastic Regression Model..........................................57
2.5.2. Methods of Estimation...............................................................................................................64
2.5.2.1. The ordinary least square (OLS)........................................................................................64
2.5.2.2. Method of moments (MM).................................................................................................78
2.5.2.3 . Maximum likelihood principle.....................................................................................80
2.6. Evaluation of estimates: Statistical Properties of Least Square Estimators................................86
2.6.1. Theoretical a priori criteria (Economic criteria)........................................................................86
2.6.2. The econometric criteria: Properties of OLS Estimators.....................................................87
2.6.2.1. Desired properties of OLS estimators................................................................................87
2.6.2.2. Normality assumption, and probability distribution of disturbance term, ui.....................89
2.6.2.3. Distribution of dependent variable (Y) under normality assumption.................................91
2.6.2.2. Normality assumption, and corresponding properties of OLS estimates...........................92
2.6.3. Statistical test of the OLS Estimators (First Order tests)...........................................................94
2.6.3.1.Tests of the ‘goodness of fit’ with R2..................................................................................94
2.6.3.2. The correlation coefficient (r)...........................................................................................100
2.6.3.3. Precision of the estimators/ the distribution of random variable......................................102
2.6.4. Special Distributions............................................................................................................106
2.7. Interval Estimation and Hypothesis Testing..............................................................................108
2.7.2. Interval estimation and Distribution....................................................................................108
2.7.2. Hypothesis testing and its approaches.................................................................................114
2.7.2.1 The standard error test of the least square estimates.........................................................115
2.7.2.2 Test of significance approach............................................................................................117
2.7.2.3 The Confidence Interval Approach to Hypothesis Testing...............................................124

2.7.2.4. The P-value approach to hypothesis testing.....................................................................126
2.7.2.5. χ2 test of significance approach / Testing σ2....................................................128
2.7.2.6. Test of goodness of fit or overall test: The F test.............................................129
2.8. Prediction/forecasting.....................................................................................................................134
2.8.1. Mean prediction.......................................................................................................................134
2.8.2. Individual prediction................................................................................................................136
2.9. Evaluating the results of regression...........................................................................................140
2.10. Reporting the Results of Regression Analysis.............................................................................142
2.10. Non-linear relationship..............................................................................................................144
2.11. Applications...............................................................................................................................144
Summary................................................................................................................................................147
Key terms...............................................................................................................................................148
Reference...........................................................................................................................................148
Review exercise.....................................................................................................................................149
Appendix 2.1:........................................................................................................................................152
Chapter Three: The Multiple Linear Regression Model.......................................................................168
3.1. Economic theory and Empirical experiences.................................................................................168
3.2. Multiple regressions Model: Notation and modeling...............................................................168
3.3 Econometric model..........................................................................................................................169
3.3.1. Modelling.................................................................................................................................169
3.3.2. Population and sample regression functions...........................................................................172
3.3.3. Interpretation of multiple regression equation.........................................................................172
3.4. Estimation.......................................................................................................................................173
3.4.1. Assumptions of Multiple Regression Model..........................................................................173
3.4.2. Estimation of parameters of two-explanatory variables model...........................................175
3.5. Data and Example...........................................................................................................................180
3.6. Evaluation of model and estimates...............................................................................................183
3.6.1 The coefficient of determination (R2) and adjusted R2..............................................183
3.6.2. Simple, Partial and Multiple Correlation Coefficients............................................................188
3.6.3. Properties of OLS estimators...................................................................................................190
3.7. Hypothesis testing in multiple linear regression........................................................................195
3.7.1.The normality assumption once again......................................................................................195
3.7.2. HYPOTHESIS TESTING IN MULTIPLE REGRESSIONS.................................................196
3.7.2.1. The standard error test of the least square estimates........................................................197
3.7.2.2. Confidence interval and test of significance.....................................................................199
3.7.2.3. The P-value approach to hypothesis testing......................................................204
3.7.2.4. Testing the overall significance of the sample regression................................................206
3.7.2.5 . Important Relationship between R2 and F......................................................................209
3.8. The general linear regression model...............................................................................................210
3.8.1. Derivations of the normal equations........................................................................................210
3.8.2. Variance for general k-variables..............................................................................................212
3.8.3. Generalization of the formulas for R2 and adjusted R2............................................213
3.8.4. Overall test of significance......................................................................................................214
Summary................................................................................................................................................218
Key terms...............................................................................................................................................219
Reference...............................................................................................................................................219
Review Question....................................................................................................................................191
Chapter Four: Violations of basic Classical Assumptions....................................................................192
4.0 Introduction.....................................................................................................................................192
4.1. Heteroscedasticity...........................................................................................................................193
4.1.1 The nature of Heteroscedasticity...............................................................................193
4.1.2. Examples and Types of data heteroscedasticity.......................................................194
4.1.2.1. Examples of heteroscedastic functions.............................................................194
4.1.2.2. Types of data and heteroscedasticity................................................................195
4.1.3. Reasons/sources for Heteroscedasticity....................................................................196
4.1.4. Consequences of Heteroscedasticity for the Least Squares estimates......................197
4.1.5. Detecting Heteroscedasticity...................................................................................................200
4.1.6. Remedial measures for the problems of heteroscedasticity.....................................................214
4.1.6.1. The method of generalized (weighted) Least Squares......................................215
4.2 Autocorrelation................................................................................................................................220
4.2.1 The nature of Autocorrelation..................................................................................................220
4.2.2. Graphical representation of Autocorrelation...........................................................................221
4.2.3 Reasons for Autocorrelation.....................................................................................................223
4.2.4. The coefficient of autocorrelation...........................................................................................225
4.2.5. Effect of autocorrelation on OLS Estimators..........................................................................232
4.2.6. Detection (Testing) of Autocorrelation...................................................................................234
4.2.7. Remedial Measures for the problems of Autocorrelation.....................................................245
4.3 Multicollinearity............................................................................................................................249
4.3.1 Definition and nature of Multicollinearity................................................................................249
4.3.2. Reasons for Multicollinearity..............................................................................................251
4.3.3 Consequences of multicollinearity...........................................................................................252
4.3.4 Detection of Multicollinearity..................................................................................................254
4.3.5. Remedial measures..................................................................................................................257
4.4. Non-Normality of error terms....................................................................................................259

4.5. Model Specification and Data issues..............................................................................................260
4.5.1. Over view................................................................................................................................260
4.5.2. Endogenous regressors assumption and Its problems: E(ɛi|Xj) ≠ 0.......................................260
4.5.3. Causes of misspecification and data issues.............................................................................261
4.5.3.1. Functional Form Misspecification (Specification Errors)................................................261
4.5.4. Consequences of misspecification problem........................................................................269
4.5.5. TESTS OF SPECIFICATION ERRORS............................................................................269
4.5.6. Remedial measures model misspecification and data issues.................................................274
Summary................................................................................................................................................275
Reference...............................................................................................................................................276
Review questions...................................................................................................................................277

List of Figures
FIGURE 1.1: ECONOMETRICS AS MULTIFACETED DISCIPLINE..................................................................................................................3
FIGURE 1.2: DEMAND CURVE.........................................................................................................................................................4
FIGURE 1:3 INTERPLAY OF ECONOMETRICS WITH COMPUTER APPLICATIONS.............................................................................................7
FIGURE 1.4: BRANCHES OF ECONOMETRICS BASED ON TYPE OF ANALYSIS............................................................................................11
FIGURE 1.5: DECOMPOSING ECONOMETRICS BASED ON DATA TYPE....................................................................................................12
FIGURE 1.6: FLOW CHART FOR THE STEPS OF AN EMPIRICAL STUDY...................................................................................................13
FIGURE 1.7: SCATTER PLOT OF LUNG CANCER AND SMOKING..............................................................................................................32
FIGURE 1.8: HYPOTHETICAL DISTRIBUTION OF SON’S HEIGHTS CORRESPONDING TO GIVEN HEIGHTS OF FATHERS...........................................37
FIGURE 2.1: REGRESSION LINE AND THE SCATTER DIAGRAM...............................................................................................................47
FIGURE 2.2: REGRESSION LINE AND ERROR TERM............................................................................................................................48
FIGURE 2.3: CONDITIONAL DISTRIBUTION OF EXPENDITURE FOR VARIOUS LEVELS OF INCOME.......................................................................54
FIGURE 2.4: POPULATION AND SAMPLE REGRESSION LINE.................................................................................................................56
FIGURE 2.5: REGRESSION LINES BASED ON TWO DIFFERENT SAMPLES..................................................................................................57
FIGURE 2.6: HOMOSCEDASTIC VARIANCE AND ITS DISTRIBUTION.........................................................................................................59
FIGURE 2.7: ERROR TERM AND FITTED SRF....................................................................................................................................66
FIGURE 2.8: DECOMPOSING THE VARIATION OF THE DEPENDENT VARIABLE (Y)................................................................................95
FIGURE 2.9: TENDENCY OF T-DISTRIBUTION TO Z DISTRIBUTION.......................................................................................................107

FIGURE 2.10: TWO-TAILED TEST FOR  AGAINST ..........................................................119
FIGURE 2.11: ONE-TAILED TEST FOR ................................................................................120


FIGURE 4.1: ERROR VARIANCE AND THEIR TYPES............................................................................................................................193
FIGURE 4.2: PATTERNS OF HETEROSCEDASTICITY.............................................................................................................194
FIGURE 4.3: SCATTERGRAM OF ESTIMATED SQUARED RESIDUALS AGAINST X...........................................................................200
FIGURE 4.4: GRAPH OF UI2 AND XI.............................................................................................................................................204
FIGURE 4.5: ERROR TERMS AND ITS PATTERN OVERTIME..................................................................................................................221
FIGURE 4.6: AUTOCORRELATION AMONG ERROR TERMS..................................................................................................................222
FIGURE 4.7: PROBLEM OF INCORRECT FUNCTIONAL FORM...............................................................................................................225
FIGURE 4.8: GRAPHIC PATTERN OF AUTOCORRELATION...................................................................................................................235
FIGURE 4.9: EXTENT OF MULTICOLLINEARITY AMONG EXPLANATORY VARIABLES...................................................................................250

List of Tables
TABLE 1.1: ABEBE'S DEMAND SCHEDULE FOR AN ORANGE..................................................................................4
TABLE 1.2: VARIABLE TYPES AND NAMING.....................................................................................................17
TABLE 1.3: RECTANGULAR DATA SET FOR Y ,X, AND Z......................................................................................................................23
TABLE 1.4: CONSUMPTION EXPENDITURE AND INCOME RELATION......................................................................................................25
TABLE 1.5: CIGARETTES AND LUNG CAPACITY..................................................................................................................................32
TABLE 1.6: CIGARETTES AND LUNG CAPACITY COMPUTATION.............................................................................................................33
TABLE 1.8: CORRELATION COEFFICIENT COMPUTATION......................................................................................................................34
TABLE 1.9: DATA OF SUPPLY AND PRICE........................................................................................................................................36
TABLE 1.10: ROAD ACCIDENT AND CONSUMPTION COEFFICIENTS........................................................................................................39
TABLE 1.11: HYPOTHETICAL DATA ON CONSUMPTION EXPENDITURE (Y) AND INCOME (XI) FOR THE BPG HETEROSCEDASTICITY TEST.........................................44
TABLE 2.1: HOUSEHOLD INCOME(X) AND CONSUMPTION EXPENDITURE(Y)...........................................................................................52
TABLE 2.2: WEEKLY FAMILY CONSUMPTIONS EXPENDITURE AND CONDITIONAL MEAN INCOME..................................................................53
TABLE 2.3: RANDOM SAMPLES FROM THE POPULATION OF TABLE 2.1..................................................................................................56
TABLE 2.4: DIFFERENT SAMPLE AND CORRESPONDING ERRORS...........................................................................................................60
TABLE 2.5: SAMPLE WEEKLY FAMILY CONSUMPTION EXPENDITURE (Y ) AND WEEKLY FAMILY INCOME (X).................................................72
TABLE 2.6: COMPUTATION OLS COMPONENTS BASED ON SAMPLE IN TABLE 2.5.................................................................................72
TABLE 2.7: ADVERTISING EXPENDITURE AND SALES REVENUE IN THOUSANDS OF DOLLAR........................................................................73
TABLE 2.8: DETAIL COMPUTATION OF OLS COMPONENT ON Y AND X VARIABLES..................................................................................73
TABLE 2.9: FAMILY INCOME AND EXPENDITURE..............................................................................................................................77
TABLE 2.10: DEMAND FOR APPLES AND PRICES BY FAMILIES...............................................................................................................77
TABLE 2.11: THE COMPUTATION OF SUM OF SQUARES FROM THE SUPPLY FUNCTION..............................................................................97
TABLE 2.11: TEN FAMILY INCOME AND EXPENDITURE PER WEEK.......................................................................................................101
TABLE 2.12: DECISION RULE FOR T-TEST OF SIGNIFICANCE...............................................................................................................123
TABLE 2.13: REGRESSION RESULT OF INCOME AND EXPENDITURE......................................................................................................128
TABLE 2.14 : A SUMMARY OF THE 𝜒2 TEST....................................................................................................................................129
TABLE 2.15: ANOVA TABLE......................................................................................................................................................131
TABLE 2.16: ANOVA EXAMPLE................................................................................................................................................. 132
TABLE 2.17: REGRESSION RESULT BETWEEN SALES AND ADVERTISING.................................................................................................146
TABLE 2.18: HYPOTHETICAL PRODUCT AND IT CORRESPONDING PRICES..............................................................................................150
TABLE 2.19: GROSS NATIONAL PRODUCT (X) AND THE EXPENDITURE ON FOOD (Y)...............................................................................151
TABLE 4.1: THE ASSUMPTION OF CLRMA AND VIOLATION..............................................................................................................192
TABLE 4.2: CONSUMPTION EXPENDITURE ( D) AND DISPOSABLE INCOME(ID) FOR 30 FAMILIES..............................................................202
TABLE 4.3: PREDICTED CONSUMPTION EXPENDITURE AND RESIDUALS.................................................................................................203
TABLE 4.4: QUANTITY DEMANDED, PRICE OF THE COMMODITY AND INCOME OF CONSUMER DATA..........................................................204
TABLE 4.5: RANK CORRELATION TEST OF HETEROSCEDASTICITY..........................................................................................209
TABLE 4.6: HYPOTHETICAL DATA ON CONSUMPTION EXPENDITURE AND INCOME..................................................................................212
TABLE 4.7: REGRESSION RESULT OF TWO SUB GROUP FOR GOLDFELD-QUANDT..................................................................................212
TABLE 4.8: CONSUMPTION AND INCOME OVERTIME.......................................................................................................................236
TABLE 4.9: REGRESSION RESULT OF CONSUMPTION AND INCOME......................................................................................................236
TABLE 4.10: RESIDUAL FROM THE ABOVE REGRESSION...................................................................................................................237
TABLE 4.11: INVESTMENT AND VALUE OF OUTSTANDING SHARES FOR THREE FIRMS, 1935-1953...............................238
TABLE 4.12: ESTIMATED REGRESSION EQUATION OF Y ON X AND AUTOCORRELATION...........................................................238
TABLE 4.13: HYPOTHETICAL VALUES OF X AND Y...........................................................................................................................243
TABLE 4.14: DATA ON THREE VARIABLES Y, X1, X2.........................................................................................................................256
TABLE 4.16: CONSEQUENCES OF VIOLATION OF CLRMA.................................................................................................................275

Chapter One: Basic Concept in Econometrics

1.1. Introduction
Economics is mainly concerned with relationships among economic variables. For instance, the quantity demanded of a product depends on its price, the prices of related goods, tastes and preferences, etc., and the quantity supplied of a good depends on its price and other factors. Aggregate demand is a function of consumption, investment, government expenditure and net exports. Furthermore, the consumption function relates aggregate consumption expenditure to the level of aggregate disposable income. We can find many more such relationships among economic variables.

The purpose of looking at the relationships among economic variables is to understand and answer many economic questions about the real world we live in. For example, we may be interested in raising questions such as: Is there any relationship between two or more variables? If so, what type of relationship exists between them? Given the value of one variable, can we forecast or predict the corresponding value of another? If one variable changes by a certain magnitude, by how much will another variable change? How can we influence certain variables in a desired way (public or otherwise)? What are the possible instruments for correcting macroeconomic problems? How have economic theories of fiscal and monetary policy helped governments reduce the effect of the business cycle on the economy? Responses to these and other questions are possible through intensive quantitative research with the aid of econometric tools.

Econometric tools enable us to apply economic theory to real problems and test its applicability. Furthermore, econometrics helps us design relevant policies based on research findings. Without being able to forecast the growth of the economy, central bankers could not know when to stimulate the economy and when to cool it off. Without measurement of production and cost, economists could not identify industries whose competitive conditions make them likely to benefit from deregulation. Without knowledge of consumer income and preferences, businesses could not predict the profitability of their products.

To this end, we collect data from the population at hand to answer those questions and/or check whether the theory is confirmed. If empirical data verify the relationship proposed by economic theory, we accept the theory as valid. If the theory is incompatible with the observed behavior, we either reject it or modify it in light of the empirical evidence. Econometrics also enables us to extend the frontier of economic knowledge through developing new theories.

Accordingly, with all those advantages in mind, we are interested in knowing the subject. We may raise many questions related to econometrics, such as: What is econometrics? How is it related to economic models? How is it related to statistics? We will look at these and other related questions in this chapter. This demands a brief overview of the basic concepts of econometrics and econometric analysis that will pave the way for subsequent chapters. First, we put forward a definition of econometrics; then we describe the roles and objectives of econometric analysis. Next we deal with the methodology of econometrics. At the end we look at regression and correlation.

1.2. Definition of Econometrics


The term econometrics is derived from two Greek words, “econo”, meaning economy, and “metrics”, meaning measurement. So it literally means “measurement in economics” or “economic measurement”, and hence econometrics is basically concerned with measuring relationships between economic variables. For example, we measure the relationship between quantity demanded and price, or between gross domestic product and consumption expenditure, investment, employment, money supply, the export index, etc. However, the scope of econometrics is much broader than measurement and includes analysis and the decision-making process.

Formally, many definitions have been put forward, but all of them revolve around the same conclusion. The most common and comprehensive definition can be presented as follows:
        “Econometrics is the science which integrates economic theory, economic statistics, and mathematical economics to bring empirical support to the general schematic laws established by economic theory.”

Practical exercise 1.1


What features can we learn from the above definition?

1.3. Characteristics of Econometrics


The above definition reflects that econometrics is an interdisciplinary field which integrates economic theory, mathematical economics, economic statistics and economic models. The definition of econometrics by Arthur S. Goldberger (1964), “the social science which applies economics, mathematics and statistical inference to the analysis of economic phenomena”, reflects the subject’s relationship with other fields. This can be presented diagrammatically in the following manner.

Figure 1.1: Econometrics as multifaceted discipline
The following sections give deeper insight into how econometrics is related to other fields.
A. Economic theories
Economic theory is a simplified representation of economic reality, as the world is complex, consisting of different agents (firms, households, etc.) participating in economic activities. Theory makes statements that are mostly qualitative in nature under certain assumptions. For instance, demand for a product is influenced by many factors such as prices, tastes and preferences, income, etc. But theory does not indicate the extent or strength of the relationship between quantity demanded and such variables. Similarly, personal consumption expenditure depends on income, the marginal propensity to consume, the level of saving, autonomous consumption and so on.

Econometrics gives empirical content to most economic theory. This is well founded in the definition “Econometrics is the positive interaction between data and ideas about the way the economy works”. We quantitatively measure the relationships among economic variables and estimate the parameters they involve. By doing so, economic theories can be checked against empirical realities using data obtained from the real world. If empirical data verify the relationship proposed by economic theory, we accept the theory as valid; otherwise, if the theory is incompatible with the observed behavior, we either reject it or modify it in light of the empirical evidence.

B. Mathematical Economics

To better understand economic relationships, forecast them, and use them as a guide to economic policy making, we need to know the quantitative relationships between different economic variables. Mathematical economics is a field which expresses economic theory and ideas in quantitative form without empirical verification of the theory. A mathematical relationship can be expressed using a schedule, a graph or an equation. For instance, we can express the theoretical relationship between demand for a product and its own price in mathematical form as follows. Let us consider Abebe as our representative consumer of oranges.

Abebe’s Demand Curve


Table 1.1: Abebe's demand schedule for oranges

Price (Birr per Kg)    Quantity Demanded (Kg per month)
0                      18
1                      16
2                      14
3                      12
4                      10
5                      8
6                      6
7                      4
8                      2
9                      0

[Figure: downward-sloping demand curve plotting P against Q from the schedule above]

Figure 1.2: Demand curve

Equation form: Qd = f(P) ………….…………1.1

Now, take any two price-quantity pairs from Abebe's demand schedule, say (Q1, P1) = (10, 4) and (Q2, P2) = (8, 5).

Applying the point-slope formula P = m(Q - Q1) + P1, where the slope is m = (P2 - P1)/(Q2 - Q1), we get:

m = (5 - 4)/(8 - 10) = -0.5
P = -0.5(Q - 10) + 4 = -0.5Q + 5 + 4
P = -0.5Q + 9

Hence, P = 9 - 0.5Q is Abebe's inverse demand function, and his direct demand function is:

Q = 18 - 2P ……………………………………………………………..1.2

The same concept is expressed graphically in figure 1.2, where the downward-sloping demand curve depicts the inverse relationship between quantity demanded and price.
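
The same line can be recovered computationally. The short sketch below is illustrative only (the module's computer sessions use STATA; Python is used here just to show the arithmetic): it takes two points from the hypothetical schedule in Table 1.1, recovers the slope and intercept of the direct demand function, and checks the result against the whole schedule.

```python
# Abebe's hypothetical demand schedule from Table 1.1: price -> quantity
schedule = {0: 18, 1: 16, 2: 14, 3: 12, 4: 10, 5: 8, 6: 6, 7: 4, 8: 2, 9: 0}

# Pick any two (P, Q) pairs to recover the linear demand function Q = a + b*P
(p1, q1), (p2, q2) = (4, 10), (5, 8)
b = (q2 - q1) / (p2 - p1)   # slope dQ/dP = (8 - 10) / (5 - 4) = -2
a = q1 - b * p1             # intercept = 10 - (-2)(4) = 18

print(f"Direct demand:  Q = {a:g} {b:+g}P")           # Q = 18 -2P
print(f"Inverse demand: P = {-a / b:g} {1 / b:+g}Q")  # P = 9 -0.5Q

# The recovered line reproduces every point of the schedule exactly
assert all(a + b * p == q for p, q in schedule.items())
```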

There is no essential difference between mathematical economics and economic theory: both state the same relationships. However, mathematical economics is quantitative rather than qualitative. While economic theory uses verbal exposition, mathematical economics makes use of mathematical symbols and forms, expressing relationships in an exact or deterministic way. Neither mathematical economics nor economic theory allows for the random elements which might affect the relationship and make it stochastic. Furthermore, they do not provide numerical values for the coefficients of economic relationships.

Econometrics, by contrast, is mainly interested in the empirical verification of theory using data, which is not the case with mathematical economics. Furthermore, econometrics enables us to derive numerical values for the coefficients of economic relationships. The definition by Maddala (1992) explains how econometrics is related to mathematical economics: “Econometrics is a special type of economic analysis and research in which the general economic theories formulated in mathematical terms are combined with empirical measurement of economic phenomena”. Besides, an econometric relationship presupposes a stochastic relation, which assumes random relationships among economic variables. In other words, econometric methods are designed to take into account the random disturbances which reflect deviations from the exact behavioral patterns suggested by economic theory and mathematical economics.

C. Economic Statistics

Economic statistics is a descriptive aspect of economics, mainly concerned with collecting, processing and presenting economic data and expressing them in a readily understandable form. Moreover, it studies characteristics of a population and/or sample, such as measures of central tendency, measures of dispersion, and measures of the correlation and covariance of data. Furthermore, it looks at probability distributions, estimation and hypothesis testing. Economic statistics is neither concerned with using the collected data to test economic theories nor with providing explanations of the development of the various variables. Econometrics tries to relate economic statistics and economic theory. According to T. Haavelmo (1944), “Econometrics is a conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier”. Econometrics takes the equations of mathematical economics and, confronting them with economic data, seeks to use the techniques of statistical inference to give quantitative form to those equations.

D. Mathematical (or Inferential) Statistics

Econometrics is sometimes defined as the “application of mathematical statistics to economic data in order to lend empirical support to economic mathematical models and obtain numerical results” (Gerhard Tintner, 1968). Mathematical (or inferential) statistics deals with methods of measurement developed on the basis of controlled experiments. But statistical methods of measurement based on controlled experiments are not appropriate for a number of economic relationships. This is basically due to the fact that the nature of relationships among economic variables is stochastic or random, so a carefully planned and controlled experiment cannot usually be designed. Mathematical statistics provides many tools for economic studies, but econometrics supplements them with many special methods of quantitative analysis suited to the nature of economic data.

In general, econometrics uses economic theory, mathematical economics and economic statistics. It uses insights from economic and business relationships (deterministic and random) in selecting relevant variables and models. It employs statistical methods of data collection to measure variables empirically. Mathematics helps to develop econometric methods that are appropriate for the data and the problem at hand. Econometrics provides empirical values for the coefficients of relationships, which can then be used to draw conclusions.

Econometrics does not only integrate different disciplines; it also relates different activities. Econometrics uses insights from economics and business (in selecting relevant variables and models) and develops econometric methods appropriate for the data and the problem at hand within the framework of statistics and mathematics. It uses computer applications and statistical software to process data and estimate the relationships anticipated by econometric models. The interplay of these disciplines in econometric modelling is summarized in Figure 1.3.

[Diagram: Econometrics at the center, drawing on Economics and Business, Mathematics, Statistics, and Computer Applications]

Figure 1.3: Interplay of econometrics with computer applications


As the preceding definitions suggest, econometrics is an amalgam of economic theory, mathematical economics, economic statistics, and mathematical statistics, but it is completely distinct from each of these branches of science for the following reasons:
• Economic theory makes statements that are mostly qualitative in nature, while econometrics gives empirical content to most economic theory.
• Mathematical economics expresses economic theory in mathematical form without empirical verification of the theory, while econometrics is mainly interested in the latter.
• Economic statistics is mainly concerned with collecting, processing and presenting economic data; it is not concerned with using the collected data to test economic theories, whereas econometrics is.
• Mathematical statistics provides many tools for economic studies, but econometrics supplies the latter with many special methods of quantitative analysis based on economic data.

Practical exercise 1.2


How do you relate econometrics to statistics and economic statistics?

1.4. Goals of Econometrics


What can economists do with the results of econometric analysis? Econometrics has a wide variety of functions, which fall into the following three general categories:
1. Testing of economic theories (analysis)
2. Policy making (estimation of the parameters of economic relationships)
3. Forecasting the future values of economic magnitudes
Therefore, a successful econometric application should include a combination of all the above goals.

1.4.1 Analysis: Testing of Economic Theories
Economics is a science that studies social matters (the behavior of institutions and individuals) via scientific methods, subject to scrutiny in both a logical and an empirical manner. Economists aim primarily both at theory building and at its ramifications. For most of its history as an academic discipline, economics had little to do with statistics. Only in the 1940s did economists begin to use statistical methods to evaluate economic ideas, which marked the birth of econometrics. Since then econometrics has been used as a way of testing economic theories and as a method by which statistical techniques are applied to economic problems. As a result, economists gained not only deeper knowledge about how the economy actually works but also better insight into how economic policy should be designed in the future. In this case the purpose of econometric analysis is to obtain empirical evidence to test the explanatory power of economic theory. Such testing uses both deductive and inductive logic.
I. Deductive Logic
In the early stages of the development of economic theory, the so-called armchair economists were concerned with formulating basic economic principles using verbal explanation and applying deductive procedure (from the general to the particular). In this period no attempts were made to examine whether the theories adequately fitted actual economic behavior. Rather, economists derived by pure logical reasoning some general conclusions (laws) concerning the working of the economic system.
II. Inductive Logic


Nowadays no theory, regardless of its elegance of exposition or its sound logical consistency, can be established and generally accepted without some empirical testing. Such testing confronts economic theories with economic reality: the explanatory power of a theory is judged by empirical results, the outcome of data analysis. Each theory predicts that a particular relationship exists, and the econometrician can tell whether the data support the theory or not. For instance, the economic problems of the 1930s could not be explained by classical theories, which laid the foundation for a new perspective in economic thinking: modern macroeconomics. Furthermore, the economic slowdown of the 1970s was not consistent with the standard macroeconomic models of that time. Likewise, econometric analysis of economic conditions in the 1980s was able to show that some theories could not be reconciled with actual economic performance, whereas others better explained what had happened and the results of economic policy decisions. A great deal of effort consequently went into creating new theoretical explanations of macroeconomic performance. The number of questions underlying decisions or choices regarding our own economic welfare or that of others is seemingly infinite.

Practical Exercise 1.3
What kinds of questions are of concern both to economists who build theories and to those who engage in data analysis and testing?

1.4.2. Policy and/or Decision Making


This goal deals with obtaining numerical estimates of economic relationships for policy simulation or decision making. It involves estimating the parameters (coefficients) of economic relationships as efficiently as possible. Using these numerical values, decisions can be made by different economic agents depending on their area of concern. For example, the decision of a government about devaluing the currency of a country may depend on the volume of imports (I), the volume of exports (Ex), the price of imports (PI), the price of exports (PEx) and other factors. Assuming α is a constant, with β1 the marginal propensity to import, β2 the marginal propensity to export, and β3 and β4 the coefficients on import and export prices respectively, a simplified econometric model of devaluation can be:

D = α + β1I + β2Ex + β3PI + β4PEx + Ui ………………..1.3

On the basis of this model, the magnitudes (numerical values) of the coefficients of the variables will determine whether devaluation will eliminate the country's deficit or not. Furthermore, government most often creates incentives for people to change their behavior, either to earn a reward from the government or to avoid a penalty.

Practical Exercise 1.4


Assume the federal government gives a tax credit to people who save for their children's college educations, and that the tax depends on income, economic sector, capital, government policies, etc. Present a linear model for such a system that could make this policy effective.

1.4.3. Forecasting
Forecasting means estimating the future values of economic magnitudes using the numerical values of the coefficients of economic relationships that have been obtained. Forecasts also help policymakers judge whether it is necessary to take measures in order to influence the future values of economic variables. Forecasting is used by both developed and developing countries, though in different ways: developed countries use it mainly for regulating their economies, while developing countries use it for planning purposes.

For instance, suppose the Ethiopian government wants to make policy related to expenditure on imported goods for the coming five or ten years. To formulate such a policy the government has to know (among other things) the forecasted level of personal disposable income, current expenditure on imported goods, the inflation rate and other factors. Accordingly, time series data for the years 1985-1995 were collected on these variables, mainly on personal disposable income and expenditure on imported goods. The estimated relationship for the Ethiopian economy for 1985-1995 is found to be

………………………………………………………….1.4

where Yi = Ethiopian expenditure on imported goods and Xi = personal disposable income.

On the basis of this result the government can forecast its expenditure in any year after 1995: if disposable income (Xi) is 1 million birr in 1999, the predicted expenditure on imported goods in that year is obtained by substituting this value into the estimated equation. Since the government knows the future values of expenditure on imported goods and services, it can take measures to increase or cut imports using these numerical values.
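To fix ideas, here is a minimal sketch (in Python, with hypothetical numbers rather than the actual 1985-1995 series) of how such a forecast could be produced with the statsmodels library: fit expenditure on disposable income by ordinary least squares, then predict for a new income level.

# A minimal sketch with hypothetical data: fit expenditure on imported
# goods (Y) on disposable income (X) by OLS, then forecast Y for a new X.
import numpy as np
import statsmodels.api as sm

income = np.array([120.0, 135.0, 150.0, 170.0, 185.0, 200.0])   # Xi
imports = np.array([30.0, 34.0, 36.0, 41.0, 44.0, 48.0])        # Yi

X = sm.add_constant(income)           # adds the intercept column
model = sm.OLS(imports, X).fit()      # estimates the coefficients

new_income = 210.0
forecast = model.predict([[1.0, new_income]])   # [constant, Xi]
print(model.params, forecast)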

Practical Exercise 1.5


Given the above information, one can raise many questions, such as the following:
 What is the current situation of expenditure on imported goods, given current disposable income?
 What will be the expenditure on imported goods, say, in ten years' time if no measure is taken by the government?
 What policy should be designed for certain public problems?
 To what extent are those policies effective in meeting their objectives?
 What is the impact of the federal budget deficit on the level of interest rates and the rate of inflation?
 How does the trade deficit affect the level of employment and the bargaining position of labor unions? etc.

1.5. Division of Econometrics


Like many subjects, econometrics can be decomposed in several ways. Based on the type of study, it is divided into two branches: theoretical and applied econometrics.
I. Theoretical Econometrics: this is the development of appropriate econometric methods for measuring economic relationships among variables. The data used for measurement are observations from the real world and are not derived from controlled experiments; hence, econometric relationships are not exact. The econometric methods used in theoretical econometrics may be classified into two groups.
A. Single-equation techniques, which capture a one-sided relationship between variables at a time. We have one-sided causation when, say, quantity demanded depends upon the price of the commodity but not vice versa. For example (the equation is reconstructed from the surrounding discussion):

Qd = β0 + β1P + Ui ------------------dd equation………………….……1.5

B. Simultaneous-equation techniques: when there is two-sided causation we find a simultaneous relationship. For instance, equation (1.5) says that quantity demanded depends on the price of the commodity; but if the price of the commodity in turn depends on the quantity supplied, as in equation (1.6), we have two-sided causation.

P = α0 + α1Qs + Vi ………………………………...……………..1.6

C. Equilibrium condition: if one imposes a restriction such that one side of the system equals the other, it is termed an equilibrium condition. In this case we apply econometric techniques simultaneously to all three equations at a time (a numerical sketch follows the system below):

Qd = β0 + β1P + Ui ------------------------dd equation

Qs = α0 + α1P + Vi ------------------------ss equation

Qd = Qs -------------------------------identity…………..…………..1.7
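As a quick numerical illustration of the equilibrium condition (a sketch with made-up coefficients, not estimates from data):

# A sketch with made-up coefficients: impose the identity Qd = Qs on the
# demand and supply equations above and solve for equilibrium P and Q.
# Demand: Qd = 100 - 2P;  Supply: Qs = 10 + 4P (error terms set to zero).
b0, b1 = 100.0, -2.0   # demand intercept and slope
a0, a1 = 10.0, 4.0     # supply intercept and slope

# Qd = Qs  =>  b0 + b1*P = a0 + a1*P  =>  P* = (b0 - a0) / (a1 - b1)
p_star = (b0 - a0) / (a1 - b1)
q_star = b0 + b1 * p_star
print(p_star, q_star)   # 15.0 and 70.0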
II. Applied Econometrics: this is the application of theoretical econometric methods to specific areas for verifying and forecasting economic relationships such as demand, cost, supply, production, investment, consumption and other fields of economic theory. Applied econometrics is the economist's wind tunnel: the place where we test our stories about how the economy works.

[Diagram: econometrics divides into theoretical and applied branches, each further divided into classical and Bayesian approaches]

Figure 1.4: Branches of econometrics based on type of analysis

Econometrics can also be classified as cross-sectional, time series, and panel data econometrics; these categories are discussed under the methodology part in a subsequent section.

[Diagram: econometrics decomposed by data type into cross-sectional, time series, and panel data econometrics]

Figure 1.5: Decomposing econometrics based on data type

Practical Exercise 1.7


Assume you want to study the consumption expenditure and income relationship over time for 25 years. Which type of econometrics is more relevant, and why?

1.6. Methodology of Econometrics


Econometrics can provide helpful techniques for making various kinds of decision. Research involving econometrics addressed in project form is termed an econometric research project. Such a project involves many activities, such as defining the purpose, collecting data, selecting a model to organize the data, and selecting methods of analysis and testing. These activities are better managed in logical steps or phases termed econometric methodologies. Several schools of thought have proposed their respective econometric methodologies. However, emphasis will be given to the classical (traditional) methodology, which still dominates empirical research in economics and other branches of science. Econometric research or inquiry generally proceeds along the following lines/stages:
1. Economic theory, past experience, other studies, and questions
2. Mathematical economic model
3. Econometric model of the theory / statistical model
4. Data
5. Estimation of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

It can be presented graphically as follows:

[Flow chart omitted]

Figure 1.6: Flow chart of the steps of an empirical study


In the following section we see steps in conducting an econometric research project in detail

1.6.1. Question to be analyzed, economic theory and past experiences


The starting point of any econometric study can be societal questions or challenges or gaps in previous
empirical studies and related issues. We see such issues in some detail.
A. Question to be analyzed

These are challenges, problems, questions, or theoretical dilemmas of society in a given situation. Such questions arise out of dissatisfaction with the status quo. The question to be answered by an econometrics project must be a positive one, not a normative one. An economic question is normative if its answer is subjective, in which case it is not capable of being proved right or wrong. It depends on value judgments on which two reasonable economists might differ. One cannot answer subjective questions by making measurements that provide objective answers. For instance, the question "should we reform welfare?" is normative. Its answer depends on subjective value judgments about whether the benefits of welfare reform (increased labor force participation, reduced government spending) are more important than the costs (elimination of the social safety net, increased poverty). This is a poor research question, as there is no evidence or data by which to judge it.

On the other hand, a question is positive if its answer is objective and capable of being proved right or wrong. Such a question depends only on facts on which any two reasonable economists should agree. A question such as "how much would welfare spending be reduced if we reformed welfare?" is positive. Its answer is some number that might be difficult to predict in advance but could easily be verified by comparing spending before and after the reform. An econometrics project might be designed to identify the likely consequences of welfare reform for workforce participation by welfare recipients. Objective knowledge about the spending consequences of welfare reform is helpful in forming a subjective opinion about whether or not reform is a good idea.

Econometric research questions should be framed well. First, a question needs to be both general (addressing an economic problem) and specific enough that its answer will say something interesting and useful about the topic. Second, a question has to be feasible, in the sense that an objective answer exists. Furthermore, questions have to be practical: it must be possible to set up an econometric model, gather data, and analyze those data within the time available for the project.
B. Economic theory (Statement of theory or hypothesis)

Once the questions are defined, it is important to have some central economic idea or theory that binds the variables (questions) together and gives direction. Without economic theory, the project can descend into mere numerical calculation that leads nowhere and provides no useful economic knowledge. Economic theories, hypotheses or propositions provide a way of identifying and thinking about relationships among economic variables. They are a way of making qualitative predictions about outcomes. For example, an investigator may start with the famous Keynesian theory of the marginal propensity to consume, which states that consumption increases as income increases, but not by as much as the increase in income; that is, the marginal propensity to consume (MPC) for a unit change in income is greater than zero but less than one (Keynes, 1936). Hence, the methodology starts by stating this postulate. Similarly, the law of demand says that as the price of a commodity increases, its quantity demanded will decrease: there is an inverse relationship between the two.
C. Previous Studies and Past Experiences

Different studies on the topic may have been conducted elsewhere and may give clues for the current research concerning methodology, models previously used, and data and their measurement. Furthermore, one can learn from the problems faced, and the remedial measures taken, by earlier researchers. In an econometric research project, a review of previous research, which might reveal related gaps, dilemmas, inconsistencies, similarities and differences, is of paramount importance.
The above relationships can be summarized as in the example below:

Problem: Poverty
Questions: What is the extent of poverty? What are the factors triggering poverty? Which factor plays the key role in household poverty?
Economic theory: Poverty theories and models
Previous studies: Many studies on poverty in LDCs

Practical Exercise 1.8

1. Why are econometric questions so important in a study?
2. What makes a research question good?
3. For the consumption expenditure and income relationship, discuss how problems, questions, economic theories, and previous studies are related.

1.6.2. Formulating mathematical models


A. Economic Model - definition
Stories or theories which can be rigorously formulated and precisely expressed in terms of a paradigm are called economic models. Just as a manufacturer tests alternative designs before putting them into production, every analysis of an economic system should be based on some underlying logical structure (known as a model) that describes the behavior of the agents. Just as a model airplane abstracts from the complexity of a real airplane and yet captures many of its essentials, including the ability to fly, so economists believe that we can learn a great deal about the economy by constructing models.

A model is a simplified representation of an actual phenomenon (an actual system or process) which gives us a framework of analysis for our econometric project and helps us answer our questions. Most economic theories do not explicitly state relationships among economic variables. A model helps us clarify and abstract relationships so as to explain the extent of a relationship, learn its type (cause and effect) and predict. It is a mechanism that lists the variables to be included in the econometric model, defines how they are related, and states what the consequences of changes in one variable will be for the model as a whole.

There will never be just one universally applicable model. As economists tell many stories about aspects of the economy that interest them, there are many economic models. They take different forms, each abstracting from different details of the real world. Modelling, or the art of model building, involves balancing the often competing goals of realism and manageability. A model should be realistic in incorporating the main elements of the phenomenon being represented and in specifying the interrelationships among the constituent elements of the system. It should, however, at the same time be manageable, eliminating extraneous influences and simplifying processes so as to ensure that it yields insights or conclusions not obtainable from direct observation of the real-world system.
B. Types of Economic Models

There are alternative ways of representing a real-world system. The most important forms or types of models are verbal/logical, physical, geometric, and algebraic. Verbal/logical models use verbal analogies, sometimes called paradigms, to represent phenomena. Physical models represent the real-world system by a physical entity. Geometric models use diagrams to show relationships among variables. Algebraic models, the most important type for purposes of econometrics, represent a real-world system by means of algebraic relations. Usually economic models are expressed mathematically, but sometimes an adequately rigorous formulation can be given diagrammatically or, less often, descriptively.

C. Component and Specification of the Model


A model reflects the known or conjectured relationships among economic variables (prices, quantities, income, etc.) and provides a basis for classifying the variables and identifying the relevant, hopefully testable, hypotheses. Economic models consist of the following three basic structural elements:
1. A set of variables: X, Y, Z, W
2. A number of strategic coefficients: β, α, etc.
3. A list of fundamental relationships: direct and indirect relationships
Let’s see each of them in some detail.
I. A set of variables (selecting variables): X, Y, Z, W
The system of equations involves certain variables in economic relationships which can be categorized into two groups: the dependent (endogenous or explained) variable and the independent (exogenous or explanatory) variables. Different names are used for both types of variables, as presented in Table 1.2 below.

Table 1.2: Variable types and their names

Endogenous variables are jointly called dependent variables. These are variables determined within the model and are simultaneously determined by the system of equations. Exogenous (or independent) variables are variables determined outside the system which influence the values of the endogenous variables; they affect the system but are not in turn affected by it.

Dependent variable ......... Explanatory variable(s)
Explained variable ......... Explanatory variable(s)
Predictand ................. Predictor(s)
Regressand ................. Regressor(s)
Response ................... Stimulus or control variable(s)

One can specify the general representation as Y = f(X), read as "Y is a function of X", where Y is the dependent variable and X the independent variable. In simple consumption theory, consumption expenditure is the dependent variable and income the independent variable, which can be specified as:

C = f(Y) …………………………………………………..1.8

Economic theory postulates that the demand for a commodity depends on its own price (P), the prices of substitutes (Ps), the price of complements (Pc), consumers' income (Y) and tastes and preferences (T). The quantity demanded of some commodity (Qd) can thus be written as:

Qd = f(P, Ps, Pc, Y, T) ………………..…………..………1.9

Along with this general form, there are specific models presented with particular functional forms.
II. Determine the theoretical sign and values of parameters

An economic model that expresses the relationship between variables involves questions concerning the signs and the magnitudes (values) of the unknown parameters to be estimated. Information about the parameters (β0, β1), their magnitudes and their signs (i.e. negative or positive relationships between variables) is based on theoretical and/or a priori expectations. For our examples so far we can state the following expected signs or directions of relationship between the variables.
i. In simple consumption theory, consumption expenditure is the dependent variable and income the independent variable, which can be specified as

C = β0 + β1Y ……………………………………….……1.10

where β0 and β1 are the unknown parameters connecting consumption and income, with β0 the constant and β1 the marginal propensity to consume. The likely signs of both β0 and β1 are positive.
ii. The economic model of demand above can be expressed with parameters as follows:

Qd = β0 + β1P + β2Ps + β3Pc + β4Y ………………………………….………..1.11

where β0 is the constant and β1, β2, β3, β4 are the unknown, as yet unsigned parameters (coefficients) on own price, the price of substitutes, the price of complements, and income respectively.

III. Fundamental relationships (specification of the model)


This is the stage where we set out the fundamental functional form of the model. It clarifies the relationships between the dependent and independent variables on the basis of economic theory, along with the expected parameters. The relationship between economic variables may be explained using linear or non-linear equations, and using single-equation or simultaneous-equation models.
I. Linear function: there is a degree-one relationship between the variables. Let us specify our previous theoretical relationships:

 Consumption function: Ci = β0 + β1Yi ……………………………….. …1.12

where Ci = aggregate consumption expenditure and Yi = aggregate income.
 The demand model can be written mathematically, assuming a linear relationship, as:

Qd = β0 + β1P + β2Ps + β3Pc + β4Y …………………………………..1.13

According to the general theory of demand, the own-price coefficient β1 is expected to be negative.
II. Non-linear function: suppose the consumption expenditure of an individual at time t depends on income and the future interest rate. An economic model of the consumption function can then be written as

Ct = β0 Yt^β1 rt^β2 ………………………………………………..…………1.14

where Ct is consumption at time t, Yt is income at time t, and rt is the future rate of interest.

All the above equations are single-equation models. More specifically, equations 1.12 and 1.13 are linear equations, but equation 1.14 is a nonlinear equation. The coefficients of the variables (β0, β1, β2) carry the likely magnitudes of the relationships. In equations 1.12 and 1.13 the coefficient β1 represents a marginal magnitude, such as the marginal propensity to consume, which lies between 0 and 1: it shows that if income increases by 1 birr, consumption will on average increase by a certain amount. In equation 1.14, by contrast, the coefficients are elasticities: β1 explains that if income increases by 1%, consumption will on average increase by β1%, while a negative β2 (say β2 = -2) shows that if the rate of interest increases by 1%, consumption will on average be cut by 2%. The magnitudes (numerical values) of the coefficients of the variables are determined from empirical observation, i.e. data.

1.6.3. Formulating Statistical and Econometric models


1.6.3.1. Statistical model
An economic model gives only a logical description that expresses an exact relationship. But real economic relations are not exact, since there are many factors explaining the given variables, as well as functional-form errors. In line with the functional form of the economic model, the statistical model considers three additional issues: introducing an error term; introducing observations; and the sampling process.

I. Introducing error term

The above economic relationship between Y and X is an exact abstraction from reality: all variation in Y is due solely to changes in X, and there are no other factors affecting Y. In reality, however, such exact relationships do not exist; rather we find stochastic relationships. The relationship between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y. The deviation of an observation from the predicted line is captured by the error term. Hence, the true relationship connecting the variables is divided into two parts: a part represented by a line and a part represented by the random term u. Accordingly,
Actual = Systematic + Random error term.
The inclusion of Ui in the mathematical model transforms the economic model into a statistical model, which accounts in part for:
 wrong specification (mis-specification) of the model;
 omitted factors and measurement error (errors in measuring variables);
 imperfections and looseness of statements in economic theories;
 limitations of our knowledge of the factors which are operative in any particular case;

 formidable obstacles presented by data requirements in the estimation of large models;
 omission of some important variables and/or equations from the functions (for example, in simultaneous-equations models);
 inclusion of irrelevant explanatory variables, etc.

The values of this random variable cannot actually be observed like the values of the other explanatory variables. We thus have to guess the pattern of the values of u by making some plausible assumptions about their distribution. The inclusion of such a stochastic disturbance term in the economic model is what makes the basic tools of statistical inference applicable to estimating the parameters of the model.
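To make the stochastic idea concrete, the following sketch (with made-up parameter values) simulates the relationship: for one and the same income level, repeated draws of the disturbance produce a whole distribution of consumption values.

# A sketch with made-up parameters: for a fixed income X, repeated draws
# of the error term u yield a distribution of consumption values Y,
# illustrating a stochastic rather than an exact relationship.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 50.0, 0.8          # assumed intercept and MPC
income = 1000.0                   # one fixed value of X

u = rng.normal(loc=0.0, scale=25.0, size=5)   # random disturbances
consumption = beta0 + beta1 * income + u      # systematic part + error
print(consumption)   # five different Y values for the same X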

II. Introducing observation

A second change that we make when going from the economic model to the statistical model is the concept of observations. Since we are using a sample, many observations are included in the analysis. To differentiate one observation from another, we attach a subscript to each observation. Since we will be collecting data from households (the observations) on Y and X, the subscript i is introduced to denote the i-th observation on each variable. We use Yi to denote consumption expenditure of the i-th household, Xi for income of the i-th household, and ui for the unobservable random variable capturing all other factors influencing consumption expenditure of the i-th household. In some textbooks t is used as the subscript, referring to the t-th observation or the t-th time period for time series data.

III.Sampling process
The third issue in extending economic model to statistical model consideration of the process by which
the data were generated (from sample or population) and probabilities distribution along with error term.
Stated differently, statistical model should explain sampling process and probability distribution of the
data or observed value of the variables and error terms. The values taken by dependent variable are not
known with certainty. Rather, they can be considered as random drawings from a probability distribu -
tion with certain assumed moments. Random variables have certain assumed probability distribution
with defined properties such as mean, variance, and covariance. That is, the sampling process underlying
the observed Yi is directly related to the mechanism made about the random variable U i . It also
based on their probabilities distribution.

Practical Exercise 1.9

Suppose the consumption expenditure for a given product is determined by income. Illustrate the distinction between stochastic and non-stochastic relationships with the help of a consumption function.

1.6.3.2. Econometric model and its specification


A. Definition and model
Econometric models are generally stochastic algebraic models which include random variables. An econometric model is a statistical model together with the economic rationale for the postulated relationships between the variables. The relationships measured with econometric techniques are relationships in which some variables are postulated as causes of the variation of other variables. These are the models used for estimating parameters and for statistically testing the hypotheses that the economic model proposes, using empirical data.

Given these issues of the statistical and econometric model, the general framework of an econometric model can be presented as follows:

Y = f(X1, X2, …, Xk) + u ……………………………1.15

 Having access to only one explanatory variable, we may write the complete model in the following way for a given household:

Yi = β0 + β1Xi + (effect of all other factors) ……………………………………1.16

 If everything left unaccounted for is summarized in the error term u, the equality in the above equation holds true. Therefore, the system can be written as

Yi = β0 + β1Xi + ui ……………………………………………………….1.17

 because variation in Y not fully explained by variation in X is captured by including the error term ui.

Accordingly, the consumption expenditure and demand models above can be modified as

One variable: C = β0 + β1Y + u

Many variables: Qd = β0 + β1P + β2Ps + β3Pc + β4Y + u ………..………….…1.18

For the linear specification above, we might specify our statistical model for the i-th observation in the above examples as follows.

One variable: Ci = β0 + β1Yi + ui

Many variables: Qdi = β0 + β1Pi + β2Psi + β3Pci + β4Yi + ui ……….…….……1.19

where ui stands for the random factors which affect the quantity demanded.
It can also assume a nonlinear form, as specified in equation (1.20):

Ct = β0 Yt^β1 rt^β2 e^ut ……………………..………1.20

 Each of these functional forms represents a hypothesis about the form of the relationship. Our interest might center on trying to determine which hypothesis is compatible with the data.
The number of variables to be included in the model depends on the nature of the phenomenon being studied and the purpose of the research. Usually we introduce explicitly only the most important (four or five) explanatory variables; the influences of less important factors are taken into account by introducing a random variable into the model.

Practical Exercise 1.10


1. The purely mathematical model of the consumption function given above is of limited interest to the econometrician. Why? What should the relationship look like?
2. Present the relevant economic model, statistical model, and econometric model for a linear consumption function.

1.6.4. Obtaining Data


Given an econometric model that suggests relationships between variables, we need data (empirical evidence) on the variables included in the model in order to estimate numerical values for the parameters in the equations. Data (the plural of datum) are the quantitative facts or experience which present information about a set of cases or instances and/or years of the occurrence of a process. These cases, years or instances are observations, and the nature of the cases or instances defines the unit of observation. For example, at the aggregate level a consumption function focuses on an aggregate figure as the unit of observation, and the observations may be based on years of experience. At the microeconomic level the study of consumption behavior usually focuses on the family as the unit of observation. If we consider families over time, the observations may consist of data for the same family during different periods of time or for different families.

All the data we consider in any particular analysis constitute a data set. A variable is a measurable characteristic of interest on which data are collected. Variables are typed as being discrete or continuous, and the categorization of a variable as discrete or continuous depends on the situation. A discrete variable usually results from counting, and its values are integers, even when the number of observations in the data set is quite large: the number of persons in a family (family size), the age of students in years, etc. are usually measured using the discrete numbers 0, 1, 2, 3, …. A continuous variable usually results from measuring, possibly with great precision, and its values can be presented as a range or specified in intervals. For example, the income of families might be measured very precisely as a continuous variable; in a survey of 100 families there might well be 100 different values for the variable, and because of this one may group them into specific intervals. Likewise, family size in grouped form (0-2, 3-5, …) can be treated as interval data.

Variables are often given symbolic names such as X, Y, Z. Depending on the nature of the study we have data consisting of many observations on such variables. In other words, the data can be thought of as arranged in a rectangular array, with each row being an observation and each column containing the values of a variable, as in Table 1.3 below. Suppose that we have n observations on the variables X, Y, and Z; for example, consumption expenditure related to income, family size, number of children and so on. In reporting the results of data analysis there are two approaches to presenting variables: one approach is to use plain words, if the descriptions of the variables are short. Alternatively, it is common, and often makes sense, to use variable names that are mnemonics for the corresponding characteristics (CONSUM, INCOME, FAMSIZE); these are short forms or abbreviations used for manageability.

Each element in the array is symbolized by the variable name with a subscript indicating the observation number. A typical observation, which we denote as the i-th, consists of the values Yi, Xi and Zi.

Table 1.3: Rectangular data set for Y ,X, and Z

Observation number Y X Z

1 Y1 X1 Z1
. . . .
. . . .
. . . .
. . . .
i Yi Xi Zi
. . . .

N Yn Xn Zn

A data variable is one generated from original measurement, and it is different from a random variable, which can be assigned artificially; the distinction between these types is drawn precisely in probability theory, which deals with random variables.
The data could be collected from either primary or secondary sources. Furthermore, the data used in the estimation of an econometric model may be of various types. The most important data structures encountered in applied work are:
I. Cross-sectional data
II. Time series data
III. Pooled data
IV. Panel data
I. Cross-sectional data
Cross-sectional data represent measurements at a given point in time: data collected on one or more variables at a particular period of time. They arise when the observations are of different entities (such as persons, firms, or nations) for which a common set of variables is measured at a point in time. For example, the results of a survey in which various people are interviewed regarding their labor force activity and earnings constitute a cross-sectional data set; so does an enrollment survey recording the number of children registered in all kindergarten schools of Adama in 2010 E.C. by sex, age, religion, etc. For use throughout this material, a small cross-sectional data set has been selected from the responses to the survey of agricultural characteristics of consumers by CSA.

The data set consists of 100 observations on 12 variables. For convenience, the number of observations was reduced to 100 by random selection. Normally one would use as many observations as feasible, as long as they are appropriate for the estimation at hand. These observations were selected from among families headed by a male aged 25-54 who was not predominantly self-employed, in order to produce data relevant for examining an earnings function. Also, in order to examine normal behavior, families with wealth greater than 100,000 were not included (see the appendix to this chapter).

Practical Exercise 1.11

See the data on the demand for a product and its price relationship in Table 1.1 above. What type of data is it?

II. Time series data


Time series data arise when the characteristics of the observations are recorded for the same entity over time, i.e. over different periods of time. In other words, the data relate to a sequence of observations over time on an individual, a group of individuals, etc. For example, records of a person's employment and earnings in each year of his life constitute a time series data set, as does the national income accounting of a given economy compiled over time for macroeconomic analysis. Table 1.4 below presents macroeconomic data on consumption, where Y = personal consumption expenditure and X = gross domestic product, both in millions of birr.
Table 1.4: Consumption expenditure and income relation

Year    Y        X
1980    2447.1   3776.3
1981    2476.9   3843.1
1982    2503.7   3760.3
1983    2619.4   3906.6
1984    2746.1   4148.5
1985    2865.8   4279.8
1986    2969.1   4404.5
1987    3052.2   4539.9
1988    3162.4   4718.6
1989    3223.3   4838.0
1990    3260.4   4877.5
1991    3240.8   4821.0

The interval or periodicity of the time series may be annual, quarterly, or monthly, depending on whether one wishes to account for annual, quarterly, or monthly changes. The type of data available often dictates the periodicity of the data gathered.

Practical Exercise 1.12

Assume macroeconomic data of Ethiopia for the years 1974-2019. What type of data is it?

III. Pooled data:


While the econometric analysis of time series uses many of the same tools as cross-sectional analysis, it is more complicated due to the trending, highly persistent nature of macroeconomic time series. To this end we employ time series econometrics, whose methodology is covered in detail in chapter five of the second part. Pooled data combine time series and cross-sectional data: this type of data collection results in time series data for a cross-section of households or events. An example is the Demographic and Health Survey (DHS) conducted by CSA in 2000, 2005 and 2010, drawing on different households (mainly women) at different times.

IV. Panel data:


A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the data set. It results from repeated surveys of a single (cross-sectional) sample in different periods of time. For example, we might collect information on wage, education, and employment history for a set of individuals (the same persons) over a ten-year period. Panel data can also be collected on geographical units: for example, we can collect data for the same set of regions (Oromia, Amhara, Tigray) in Ethiopia on output, tax revenue, wage rates, government expenditures, etc., for the years 2000, 2005, and 2010.

The key feature of panel data that distinguishes it from a pooled cross section is the fact that the same
cross-sectional units (individuals, firms, or counties in the above examples) are followed over a given
time period. Because panel data require replication of the same units over time on individuals, house-
holds, and firms, it is more difficult to obtain than pooled cross sections.

Not surprisingly, observing the same units over time leads to several advantages over cross-sectional data or even pooled cross-sectional data. One benefit is that having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than one observation can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available. A second advantage of panel data is that they often allow us to study the importance of lags in behavior or in the results of decision making. This information can be significant, since many economic policies can be expected to have an impact only after some time has passed.

Most books at the undergraduate level do not contain a discussion of econometric methods for panel
data. However, economists now recognize that some questions are difficult, if not impossible, to answer
satisfactorily without panel data. To introduce the concept, we can make considerable progress with
simple panel data analysis, a method which is not much more difficult than dealing with a standard
cross-sectional data set.
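As a rough illustration of how these data structures differ in practice, the sketch below (using the pandas library and made-up numbers) builds a tiny panel: the same regions observed in several years. Dropping the year dimension gives a cross section; dropping the region dimension gives a time series.

# A sketch with made-up numbers: a panel is indexed by both the
# cross-sectional unit (region) and the time period (year).
import pandas as pd

panel = pd.DataFrame(
    {
        "region": ["Oromia", "Oromia", "Amhara", "Amhara"],
        "year":   [2005, 2010, 2005, 2010],
        "output": [120.0, 150.0, 95.0, 110.0],
    }
).set_index(["region", "year"])

print(panel)                        # one row per (region, year) pair
print(panel.xs("Oromia"))           # the time series for one region
print(panel.xs(2005, level="year")) # the cross section for one year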

1.6.5: Estimation of the model


After the model has been specified (formulated), the econometrician proceeds to obtain numerical estimates of the coefficients of the model. This is a purely technical stage which requires knowledge of the various estimation methods, their assumptions, and the economic implications for the estimates of the parameters. There are three main estimation methods: the method of least squares, the method of moments, and the maximum likelihood method. The nature of the problem and the model specified usually dictate the procedure used. There are commonly three stages in estimation: checking the model for different problems, making any necessary transformations, and selecting the appropriate method of estimation.
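To give a flavor of the least squares method (the formulas themselves are developed in the next chapter), the sketch below computes the slope and intercept of a two-variable model from made-up data:

# A sketch of the method of least squares with made-up data:
# b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2),  b0 = Ybar - b1*Xbar
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([11.0, 19.0, 32.0, 42.0, 53.0])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b0, b1)   # estimated intercept and slope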

Practical Exercise 1.13

Explain the three methods of estimation.

1.6.6. Evaluation of the estimates/ Hypothesis Testing


The estimated equation may then be subjected to testing, called hypothesis testing. Hypothesis testing is done to check the validity and reliability of the results against a body of theory and against the statistical and econometric principles which justify the predictive ability of the model. The preliminary estimation of an economic model does not always give satisfactory results. The results may meet the desired criteria, which helps us justify what we have done. Sometimes, however, results surprise the investigator: variables that were thought a priori to be important may, once empirically tested, appear unimportant, or effects may go in unexpected directions (wrong signs). In such cases we have several options as remedial measures: we may reject the results, or we may reformulate the model and perhaps re-estimate it with different techniques.

For this purpose we have three common criteria (conditions) for evaluating the reliability of the estimates:
I. Economic a priori criteria (economic interpretation of the results)
II. Statistical criteria (first-order tests; statistical interpretation of the results)
III. Econometric criteria (second-order tests)

Step A. Economic a priori criterion:

This stage consists of deciding whether the estimates of the parameters are theoretically meaningful, i.e. whether they accord with the expectations of the theory. In evaluating the estimates of the model we should take into consideration the signs and magnitudes of the estimated coefficients. If the sign and magnitude of a parameter do not confirm the economic relationship described by economic theory, the model will be rejected, because the a priori theoretical criteria are not satisfied by the estimates; in that case the estimates should be considered unsatisfactory. But if there is a good reason to accept the model nonetheless, that reason should be clearly stated.

For example, consider the consumption function Ct = β0 + β1Yt + ut, where Ct is consumption expenditure and Yt is income. On the basis of a priori economic criteria, β1 represents the marginal propensity to consume; its sign has to be positive and its magnitude (size) should lie between zero and one (0 < β1 < 1). Given sample data collected from the population, suppose the estimated consumption function has an estimated MPC of 0.203:

Ĉt = β̂0 + 0.203Yt ………………………………………….1.21

The result shows that if income increases by 1 birr, consumption expenditure will increase on average by less than one birr, i.e. by 20.3 cents. The value of β̂1 is thus less than one and greater than zero, which is in line with economic theory and satisfies the a priori economic criterion.

But estimation of the same model using other data gives the following results:

………………………………………………1.22

Here the estimated results contradict, or do not confirm, the economic theory: the sign of β̂1 is negative and its magnitude is greater than one. This is evidence for rejecting the model. In most cases, deficiencies in the empirical data utilized for the estimation of the model are responsible for the occurrence of a wrong sign and/or size of the estimated parameters. The deficiency of the empirical data can be due to problems in the sampling procedure, an unrepresentative sample of observations from the population, inadequate data, or violation of some of the assumptions of the method employed.

Step B. First-order tests or statistical criteria: if the model passes the a priori economic criterion, the reliability of the estimates of the parameters is evaluated using statistical criteria. Confirmation or refutation of economic theories on the basis of sample evidence is the object of statistical inference. These tests aim at evaluating the statistical reliability of the estimates of the parameters of the model, as determined by statistical theory. Since the estimated values are obtained from a sample of observations taken from the population, statistical tests of the estimated values help us find out how accurate these estimates are (how accurately they describe the population). To this end, some of the most commonly used statistical tests are the correlation coefficient test, the standard error test, the t-test, the F-test, and the R²-test. The coefficient of determination (R²/r²) explains the percentage of the total variation in the dependent variable that is explained by the explanatory variables. The standard error (S.E.) measures the dispersion of the sample estimates around the true population parameters: the lower the S.E., the higher the reliability of the estimates (the closer the sample estimates are to the population parameters), and vice versa. Similarly, the t-ratio or F-test is used for testing the statistical reliability of the estimates. In our example: is MPC < 1 statistically significant? If so, it supports Keynes' theory.
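A sketch (with simulated data) of where these statistics appear in practice: fitting the model with the statsmodels library prints R², standard errors, t-ratios and the F-statistic in a single summary table.

# A sketch with simulated data: the regression summary reports R-squared,
# standard errors, t-statistics and the F-statistic used in the
# first-order (statistical) tests of the estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(100, 1000, size=50)
consumption = 50 + 0.8 * income + rng.normal(0, 30, size=50)

results = sm.OLS(consumption, sm.add_constant(income)).fit()
print(results.summary())   # R-squared, std err, t, P>|t|, F-statistic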

Step C. Second-order tests (econometric criteria): these aim at investigating whether the assumptions of the econometric method employed are satisfied in the particular case. In other words, this is all about detecting violations of the assumptions, i.e. checking the validity or reliability of the estimates. These tests help us establish whether the estimates have the desirable properties of unbiasedness, consistency, efficiency, sufficiency, etc. If any one of the econometric assumptions is violated, the estimates of the parameters cease to possess some of these desirable properties, and the criteria above lose their validity and become unreliable.

To solve such a problem the researcher has to re-specify the model already utilized, making adjustments such as introducing an additional variable into the model, omitting some variables from the model, or transforming the original variables. After re-specifying the model, re-estimation and re-application of all the tests (a priori, statistical and econometric) continue until the estimates satisfy all the tests.

Practical Exercise 1.15

Mention the three conditions for evaluating econometric results.

1.6.7. Forecasting or Prediction power of the model

At this stage estimated model has to be evaluated for its forecasting power and use it for predictions.
The estimated model may be economically meaningful, statistically & econometrically correct for the
sample period, but it may not have a good power of forecasting. This may be due to the inaccuracy of
the explanatory variables & deficiency of the data used in obtaining the estimated values and /or rapid
change in the structural parameters of the relationship in the real world. Therefore, this stage involves
the investigation of the stability of the estimates and their sensitivity to changes in the size of the sam-
ple. Consequently, we must establish whether the estimated model performs adequately outside the
sample of data or not. i.e. we must test an extra sample performance the model.

If this happens the estimated value (i.e. forecasted) should be compared with the actual realized value
of the relevant dependent variable. The difference between the actual & forecasted value is tested statis-
tically. If the difference is statistically significant, we concluded that the forecasting power of the model
is poor. If it is statistically insignificant the forecasting power of the model is good.

If the forecasting power of the model is found to be good, we use the model for predictions: given future value(s) of X, what are the future value(s) of Y? Say GDP = $6,000 billion in 1994; using the estimated equation Ŷ = -231.8 + 0.7194X, the forecasted consumption expenditure in that year is Ŷ = -231.8 + 0.7194(6000) ≈ 4084.6. The income multiplier is m = 1/(1 - MPC) = 1/(1 - 0.72) ≈ 3.57. That is, a decrease (increase) of $1 in income will eventually lead to a $3.57 decrease (increase) in consumption expenditure.

1.6.8. Using model for control or policy purposes

The next task is to use the model for policy purposes, i.e. to influence the desired value of a target variable. Suppose we have the estimated consumption function above and the government believes that consumer expenditure of about 4000 will keep the unemployment rate at its current level of about 4.2 percent. What level of income will guarantee the target amount of consumption expenditure?

Y = 4000 = -231.8 + 0.7194X  ⇒  X ≈ 5882 ……………………………..1.23

Given MPC = 0.72, an income of $5,882 billion will produce an expenditure of about $4,000 billion. Accordingly, the government can manipulate the control variable X to achieve the desired level of the target variable Y through fiscal and monetary policy.
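A small sketch tying the last two stages together, using the estimated equation above: predict Y for a given X, and invert the equation to find the X that delivers a target Y.

# Forecast and control with the estimated equation above:
# Y-hat = -231.8 + 0.7194 X
b0, b1 = -231.8, 0.7194   # estimated intercept and MPC

def forecast(x):
    """Predicted consumption expenditure for income x."""
    return b0 + b1 * x

def required_income(target_y):
    """Income level needed to hit a target consumption level."""
    return (target_y - b0) / b1

print(forecast(6000))          # approximately 4084.6
print(required_income(4000))   # approximately 5882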

1.6.9. Presenting the findings of analysis


As with papers in any discipline, the quality of your writing will affect the ability of other people to learn from your work; therefore, good writing is essential. Grammar, vocabulary, argument and organization are the key elements to which one should give due attention while writing or presenting the report. There are a number of things you should do whenever you write an econometrics paper. Some of these are:
1. Problem and objective definition: state clearly the question that your analysis will answer. Stating the question prepares the reader to understand the reasons for choosing the model and data. The paper must present a question to the reader and propose an answer. A good paper not only clearly and convincingly demonstrates the proposed answer to the reader, but also explains the importance of that answer.
2. Variables of interest and model: explain what economic variables you think are related, and why you believe those relationships exist. For example, we may believe that prices are related to the incomes of consumers because increases in income tend to increase demand for the products consumers want, and an increase in demand for a product raises its price. Furthermore, explain the variables in the data set along with their units of measurement: is GDP in millions or in billions of dollars; is it nominal or real; is unemployment the number of unemployed workers or the percentage of the labor force unemployed? For example, if you use an economic model that includes the price level as a variable, you must think about whether the consumer price index or the producer price index is the more appropriate way to measure the economic concept of the price level with real data. The better choice will probably depend on the exact nature of your project and research questions. If you look at several variations of your basic model, identify the one you believe is best and use that one to draw your conclusions.

3. Analysis: it is customary to include descriptive statistics giving, among other things, the mean and standard deviation of every variable, so that the reader can get a sense of what the data look like. The report should clearly state the unit of observation, the number of observations in the sample, and the reason why you chose that particular sample to estimate the model. Furthermore, it should present the statistical findings of your analysis along with their economic meaning. Do not make a bare report like "there is a negative relationship between changes in unemployment and GDP growth". That is true, but it is not very interesting; what matters is the economic interpretation of the relationship. The important point is to give your readers enough information to convince them.

4. General writing: use good writing techniques in presenting the results; outline the paper; take care with grammar, vocabulary and organization; and show each step of your argument to the reader.

1.7. Correlation and regression analysis


1.7.1. Goal
We have seen that econometrics is mainly concerned with relationships among economic variables: the relationship between attitude and productivity, between quantity and cost, between sales and earnings, and so on. There are three major goals to keep in mind when studying relationships in bivariate data:
I. Describing and understanding the relationship, which provides background information
II. Forecasting and predicting a new observation based on available information
III. Designing policy interventions, after which adjustment and control of the process can be carried out

1.7.2. Types of relationship between/among variables

There are various methods of measuring relationships among variables, of which three are common: covariance, correlation and regression.

1.7.2.1. Covariance
Covariance measures how variables behave or vary together. To study the relationship between cigarette smoking and lung capacity, data from a group of people about their smoking habits and their lung capacities have been collected, as presented in Table 1.5.

Table 1.5: Cigarettes and lung capacity

Cigarettes (X)   Lung Capacity (Y)
0                45
5                42
10               33
15               31
20               29

[Scatter plot omitted: lung capacity (vertical axis) against smoking (horizontal axis)]

Figure 1.7: Scatter plot of lung capacity and smoking


We can see easily from the graph (Figure 1.7) that the two variables covary in opposite directions: as smoking goes up, lung capacity tends to go down. When two variables covary in opposite directions, their values tend to be on opposite sides of their group means; that is, when smoking is above its group mean, lung capacity tends to be below its group mean. Consequently, by averaging the products of the deviation scores, we can obtain a measure of how the variables vary together. A computational formula for the population covariance is

σxy = (1/N) Σ (Xi − μx)(Yi − μy) …………………………………………….1.24

The sample covariance: instead of averaging (dividing) by N, we divide by N − 1, since we are using sample data. The resulting formula is

Sxy = [1/(n − 1)] Σ (Xi − X̄)(Yi − Ȳ) …………………………………………….1.25

Example: calculate the covariance of the data in Table 1.5.

Table 1.6: Cigarettes and lung capacity computation

X     Y     (Xi − X̄)   (Yi − Ȳ)   (Xi − X̄)(Yi − Ȳ)
0     45    −10         +9         −90
5     42    −5          +6         −30
10    33    0           −3         0
15    31    +5          −5         −25
20    29    +10         −7         −70
                             Sum:  −215

With X̄ = 10 and Ȳ = 36, we obtain

Sxy = (1/4)(−215) = −53.75
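A quick check of this arithmetic (a numpy sketch; np.cov divides by n − 1 by default, matching the sample formula 1.25):

# Verifying the sample covariance of the smoking / lung capacity data.
import numpy as np

x = np.array([0, 5, 10, 15, 20])      # cigarettes
y = np.array([45, 42, 33, 31, 29])    # lung capacity

print(np.cov(x, y)[0, 1])   # -53.75, the sample covariance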

Practical Exercise 1.16

Interpret the above result for the sample covariance.

There is also a special case of covariance called variance: the variance of a variable is its covariance with itself. Indeed, variance is a special case of covariance.

1.7.2.2. Correlation
Correlation is a statistical technique which measures the degree or extent to which two or more variables fluctuate with reference to one another. It measures the degree and direction of linear association between two variables. The correlation coefficient is just a number that reveals whether larger values of one variable tend to go with larger (or with smaller) values of the other; it indicates that the numbers seem to go together in some way. For example, we may be interested in finding the correlation (coefficient) between scores on statistics and mathematics examinations, between high school grades and college grades, and so on.

One can assess the correlation of two variables (X, Y) using a scatter plot, though a diagram gives only a rough idea of the relationship between X and Y. It also helps us gauge the strength of the relationship between the two variables: if the points lie close to a line, the correlation is strong, while a greater dispersion of points about the line implies weaker correlation.

For a precise quantitative measurement of the degree of correlation between X and Y one can use the correlation coefficient. The population correlation coefficient is denoted by the Greek letter ρ (rho), with the variables whose correlation it measures as subscripts: ρxy refers to the correlation of all the values of the population of X and Y. The sample correlation coefficient, the corresponding sample statistic, is denoted by rxy. The population correlation of X and Y, ρxy, is given by

ρxy = Cov(X, Y)/(σx σy) = Σ(Xi − μx)(Yi − μy) / √[Σ(Xi − μx)² Σ(Yi − μy)²] ………………………….1.26

Table 1.8: Correlation coefficient computation

X     Y     X²     Y²      XY
0     45    0      2025    0
5     42    25     1764    210
10    33    100    1089    330
15    31    225    961     465
20    29    400    841     580
50    180   750    6680    1585

rxy = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
    = [(5)(1585) − (50)(180)] / √{[(5)(750) − 50²][(5)(6680) − 180²]}
    = (7925 − 9000) / √[(3750 − 2500)(33400 − 32400)]
    = −1075 / √[(1250)(1000)]
    = −0.9615

The sample correlation coefficient rxy estimates ρxy and is given by

rxy = Sxy/(Sx Sy) ……………..……………………1.27

Note that Sx and Sy are the sample standard deviations, calculated with the n − 1 divisor.

If the sample size is large enough, the sample correlation coefficient closely approximates the population correlation, and it can be computed directly in deviation form as

rxy = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² Σ(Yi − Ȳ)²] ………………….…………………………………..1.28

(the n − 1 factors in the numerator and the denominator cancel).

Note that Sxy, the covariance of X and Y, is used only occasionally; it is often easier to use the correlation. Both correlation and covariance represent the same information, but correlation presents that information in a more accessible form: it shows whether the relationship is positive or negative and how strong it is.
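The same correlation can be checked in one line (a numpy sketch):

# Verifying the correlation of the smoking / lung capacity data.
import numpy as np

x = np.array([0, 5, 10, 15, 20])
y = np.array([45, 42, 33, 31, 29])

print(np.corrcoef(x, y)[0, 1])   # approximately -0.9615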

Practical Exercise 1.17

The following Table 1.9 shows the quantity supplied of a commodity with the corresponding price.

Table 1.9: Data on supply and price

Time period (in days)   Quantity supplied Yi (in tons)   Price Xi (in birr)
1                        10                               2
2                        20                               4
3                        50                               6
4                        40                               8
5                        50                               10
6                        60                               12
7                        80                               14
8                        90                               16
9                        90                               18
10                       120                              20

Determine the type of correlation that exists between these two variables.

1.7.2.3. Regression
There are two perspectives on the definition of regression: its historical origin and its modern interpretation.
A. Historical origin of the term "regression"
The term "regression" was introduced by Francis Galton to mean "regression to mediocrity", i.e. the movement or tendency of certain relationships toward the average. He illustrated his theory using family likeness in stature, in which the average height of children moves, or regresses, toward the average height of the population: there is a tendency for tall parents to have tall children and for short parents to have short children, but the average height of children born to parents of a given height tends to move (or regress) toward the average height of the population as a whole (F. Galton, "Family Likeness in Stature"). Galton's law was confirmed by Karl Pearson: the average height of sons of a group of tall fathers is less than their fathers' height, while the average height of sons of a group of short fathers is greater than their fathers' height. Thus, "regressing", tall and short sons alike tend toward the average height of all men (K. Pearson and A. Lee, "On the Law of Inheritance").
B. Modern interpretation of regression analysis
Regression is a mathematical way of expressing the average of relationship between two or more vari-
ables in terms of the original units of the data. In a regression analysis there are two types of variables;
the variable to be predicted/ influenced called dependent variable and the variable used for prediction or
which influences the values of dependent variable which is called independent variable. In regression
analysis we are concerned with statistical dependence among variables (not Functional or Determin-
istic), where we essentially deal with random variables with certain probability distributions. The de-
pendent variable is assumed have to be statistical, random, or stochastic with certain probability distri-
bution.

Our concern is with predicting the average height of sons knowing the height of their fathers. If Y = son’s height and X = father’s height, the question is how the heights of sons are related to the heights of their fathers. In Figure 1.8 the variable father’s height is assumed fixed at given levels, and sons’ heights were measured at these levels.

Figure 1.8: Hypothetical distribution of sons’ heights corresponding to given heights of fathers
One can present many examples of regression analysis in economic variables. To mention some:
 We may want to know or to predict the average score on a statistics examination by knowing a
student’s score on a mathematics examination.
 Y = Personal Consumption Expenditure and X = Personal Disposable Income;
 Y = Demand; X = Price
 Y = Rate of Change of Wages and X = Unemployment Rate
 Y = Money/Income; X = Inflation Rate
 Y = % Change in Demand; X = % Change in the advertising budget
 Y = Crop yield; Xs = temperature, rainfall, sunshine, fertilizer
The same holds true for other relationships

C. Regression equation
The regression equation of Y on X is an algebraic expression that describes the variation in the values of Y for given changes in X. The most common regression equation is that of Y on X, expressed, together with its regression line, as

$$Y = a + bX \quad \ldots\ldots 1.30$$

The regression equation of X on Y is expressed as

$$X = c + dY \quad \ldots\ldots 1.31$$
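For reference, the slopes of these two fitted lines are tied to the correlation coefficient of section 1.7.2.2 (a standard result, stated here with b and d denoting the slopes of equations 1.30 and 1.31):

$$b = r\,\frac{s_Y}{s_X}, \qquad d = r\,\frac{s_X}{s_Y}, \qquad b \cdot d = r^2$$

so the two regression lines coincide only when |r| = 1.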
D. Regression vs. correlation or causation:
Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. A statistical relationship cannot logically imply causation: “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.” To ascribe causality, one must appeal to a priori or theoretical considerations. For instance, economic theory says that consumption expenditure depends on income; in this case a change in income causes consumption expenditure to change.
Closely related to, but conceptually very different from, regression is correlation analysis. Regression and correlation have some fundamental differences that are worth mentioning.
 In regression analysis there is a clear distinction between the dependent and the independent variable. In correlation analysis we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. After all, the correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations.
 Correlation cannot explain why the two variables are associated. One possible basis for correlation without causation is that there is some hidden, unobserved third factor that makes one of the variables seem to cause the other when in fact each is being caused by the missing variable; this is termed spurious correlation. For example, you might find a high correlation between hiring new managers and constructing new facilities. Are the newly hired managers causing the new plant investments? Or does the act of constructing new buildings cause new managers to be hired? Probably there is a third factor, namely high long-term demand for the firm’s products, that is causing both.
 Correlation does not necessarily imply a cause-and-effect relationship. Even when there are grounds to believe that a causal relationship exists, correlation does not tell us which variable is the cause and which is the effect. It is reasonable that when one thing causes another, the two tend to be associated and therefore correlated, as with environment and productivity, or investment and return. However, there can be correlation without causation: the demand for a commodity and its price will generally be found to be correlated, but the question whether demand depends on price, or vice versa, is not answered by correlation.
 Moreover, in correlation analysis both variables are assumed to be random. As we shall see, most of correlation theory is based on the assumption of randomness of variables, whereas most of regression theory is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.
 Like all statistical summaries, the correlation is both helpful and limited. It provides an excellent summary of the relationship only when that relationship is linear; if there are problems such as a non-linear relationship, unequal variability, clustering, or outliers in the data, the correlation can be misleading.

Practical Exercise 2.18

Discuss the relationship between regression, correlation, and covariance.

Empirical Example

Suppose that one wants to know the relationship between road accidents and consumption of beer in a certain town Z. Data have been collected and presented in Table 1.10. Based on this information,
a. Calculate the correlation coefficient between the two series and interpret the result.
b. Is there any cause-and-effect relationship between the two?
Table 1.10: Road accidents and consumption of beer

Year Road accident Consumption of Beer


1961 155 70
1962 150 63
1963 180 72
1964 135 60
1965 156 66
1966 168 70
1967 178 74
1968 160 65
1969 132 62
1970 145 67
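As a computational hint for part (a), the same calculation can be scripted. The sketch below (Python with numpy, an illustrative choice of tool) computes the sample correlation for Table 1.10:

```python
# Correlation between road accidents and beer consumption (Table 1.10).
import numpy as np

accidents = np.array([155, 150, 180, 135, 156, 168, 178, 160, 132, 145])
beer = np.array([70, 63, 72, 60, 66, 70, 74, 65, 62, 67])

r = np.corrcoef(accidents, beer)[0, 1]
print(round(float(r), 2))  # about 0.88: a strong positive association,
                           # which by itself is no evidence of causation
```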

Summary
Recent trends in economic analysis demand more rigorous techniques, and econometrics plays a great role in this regard. Econometrics is a science of data analysis in which measurement is a central concern; it can be literally defined as “measurement in economics”. Formally, it can be defined as “the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference” (P. A. Samuelson, T. C. Koopmans and J. R. N. Stone, 1954). Econometric methods are appropriate for the measurement of economic relationships that are stochastic. Econometrics may be considered as the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships and verifying economic theories.
Furthermore, econometrics aims primarily at the verification of economic theories, at policy and/or decision making, and at forecasting. Econometric research or inquiry generally proceeds in stages: economic theory, the mathematical model, the econometric (statistical) model of the theory, data collection, estimation of the econometric model, hypothesis testing, forecasting or prediction, and use of the model for control or policy purposes.
One of the most distinctive features of econometrics is that it contains a random term, which is not reflected in mathematical economics and economic theory. The adjustment consists primarily in specifying the stochastic (random) elements that are supposed to operate in the real world and enter into the determination of the observed data. The applicability of economic theories to real-world problems has to be checked against data obtained from the real world, but economic data are error laden. Though plenty of data are available for research purposes, the quality of the data matters in arriving at a good result, and data quality may be poor for different reasons.
Among the techniques for analyzing the relationship between economic variables are regression and correlation. In regression analysis we try to estimate or predict the average value of one variable (the dependent variable, assumed to be stochastic) on the basis of the fixed values of other variables (the independent variables, assumed non-stochastic), whereas correlation analysis deals with the degree of relationship between those variables. Most economic relationships are inexact due to many factors, even though the mathematical model shows exact relationships among economic variables.

Key terms

Policy making		Forecasting		Evaluation
Economic theories		Mathematical economics		Economic statistics
Mathematical statistics		Economic model		Economic theory
Econometric model		Ordinary least squares		Method of moments
Maximum likelihood method		Regression		Correlation

Reference

Greene, W. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley.
Gujarati, D. (1999). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: Riverside Printing Co. Ltd.
Maddala, G. (1992). Introduction to Econometrics. 3rd ed. New York: Macmillan.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Theil, H. (1957). Specification Errors and the Estimation of Economic Relationships. Review of the International Statistical Institute, 25: 41–51.
Wooldridge, J. (2000). Introductory Econometrics: A Modern Approach. 2nd ed. New York: South-Western.

Review Questions
Part I: Choose the best answer and write it in the space provided.
1. Which one of the following definitions of econometrics is correct?
a. measurement in economics
b. empirical determination of economic laws
c. a science which integrates economic theory, economic statistics, and mathematical economics
d. all
e. none
2. Econometrics is a science that integrates
a. economic theories
b. mathematical economics
c. economic statistics
d. mathematical statistics
e. all
f. none
3. Which one of the following is true about econometrics?
a. econometrics gives empirical content to most economic theory
b. econometric methods provide numerical values of the coefficients of economic relationships
c. econometrics helps us to infer relationships among variables
d. econometrics is an interdisciplinary field
e. all
f. none
4. Which relationship and flow is logically correct?
a. economic theory → economic model → econometric model
b. economic model → economic theory → econometric model
c. econometric model → economic theory → economic model
d. economic theory → econometric model → economic model
e. all
f. none
5. Which one of the following is not true about the aims of econometric analysis?
a. testing economic theories
b. policy making
c. forecasting
d. evaluation
e. all
f. none
6. Which one of the following is among the common data structures used in econometric analysis?
a. cross-sectional data b. time series data c. pooled data d. panel data e. all f. none
7. Correlation
a. expresses the strength of the relationship among economic variables
b. makes no distinction as to whether a variable is dependent or independent
c. has an estimated value that, for linear correlation, ranges from -1 to 1
d. is different from causation
e. all f. none
8. Regression analysis
a. Is concerned with the study of the dependence of one variable on one or more other variables
b. Is concerned with statistical dependency among variables
c. Estimates or predicts the average value of one variable on the basis of the fixed values of the
other variables
d. All of the above e. All except B f. None
9. Which one of the following is true about evaluation of estimates
a. Economic criteria which test whether it is in line with the theory and experience
b. Statistical criteria which deals with statistical reliability of the estimates of the parameters of
the model
c. Econometric criteria aim at the investigation of whether the assumptions of the econometric
method employed are satisfied or not in any particular case
d. All e. none
Part II: Briefly answer the following questions
1. Define econometrics. How does it differ from mathematical economics and mathematical statistics?
2. What is the difference between theoretical and applied econometrics?
3. What is the difference between a model linear in parameters and one linear in variables?
4. Explain the stages in the methodology of econometrics.
5. Discuss regression, correlation, and covariance.

Part III: While answering the following questions, show your steps clearly (Workout)
1. Consider the theory of the well-known monetary economist Milton Friedman: “the demand for money has a strong positive relationship with price and income but has no relationship with the rate of interest”.
a. Write the mathematical relationship.
b. Formulate the econometric relationship.
c. What will be the sign and magnitude of the relationship between the dependent and independent variables?
2. Assume that data are collected from 30 different families on their consumption expenditure and disposable income. Estimate the relationship between consumption expenditure and disposable income based on the data in Table 1.11.
Table 1.11: Hypothetical data on consumption expenditure (Y) and income (X)
Family	Y	X
1 55 80
2 65 100
3 70 85
4 80 110
5 79 120
6 84 115
7 98 130
8 95 140
9 90 125
10 75 90
11 74 105
12 110 160
13 113 150
14 125 165
15 108 145
16 115 180
17 140 225
18 120 200
19 145 240
20 130 185
21 152 220
22 144 210
23 175 245
24 180 260
25 135 190
26 140 205
27 178 265
28 191 270
29 137 230
30 189 250
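For readers who want to check their hand computation with software (the module's computer sessions use a statistical package; the sketch below uses Python with numpy as an illustrative stand-in), the OLS estimates for Table 1.11 can be obtained as:

```python
# OLS estimates for the consumption (Y) - income (X) data of Table 1.11.
import numpy as np

y = np.array([55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
              115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189])
x = np.array([80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
              180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()  # slope
b0 = y.mean() - b1 * x.mean()                                             # intercept
print(f"Y-hat = {b0:.2f} + {b1:.3f} X")
```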

Chapter Two: Simple Linear Regression Model

2.0. Introduction
The simplest relationship is the case in which there is one dependent and one independent variable. The statistical technique appropriate when it is believed that the values of one variable are systematically determined by the values of just one other variable is termed the two-variable regression model or, more commonly, the simple linear regression model. Although the model is simple, and hence unrealistic in most real-world situations, a good understanding of its basic underpinnings will go a long way towards helping you understand more complex models.

In this chapter we describe general techniques and methodology for analyzing the relationship between two economic variables. To this end, we follow the econometric methodology discussed in the previous chapter. In other words, we try to answer questions such as:
 Given an economic problem, how do we specify the research question and the economic theory?
 Given the economic theory, how do we specify the economic model?
 Given an economic model involving a relationship between two economic variables, how do we go about specifying the corresponding statistical model?
 Given the statistical model and sample data on the two economic variables, how do we use this information to obtain estimates of the unknown parameters, i.e., $\beta_0$ and $\beta_1$?
 Given the value of one variable, how can we forecast/predict the value of the corresponding variable?
 Given the relationship and the estimates, if one variable changes in a certain way, by how much does the other variable change?
Let us start with the econometric model discussed in chapter one.

2.1. Economic theory of two variable regressions


We have many economic issues/theories with two variables, designed to simplify the relationship and identify its direction. For instance, we may be interested in studying the relationship between the quantity demanded of a commodity and its price, the income of the consumer, or the prices of competing commodities. The law of demand provides the rationale for an inverse relationship between the quantity demanded of a given product and its own price. Consumption expenditure depends on income, taste and preference, family size, religion, etc.; more specifically, consumption expenditure is positively related to income. In all these examples there is some underlying theory that specifies why we would expect one variable to be related to one or more other variables.

The kind of relation that we study in economics is a behavioral relation, as it is based on behavior, as in the case of a consumption function, or on technology, as in the case of a production function. We can think of such a relation as an economic process: the input and output of the process are observable, but the actual operation of the process is not.

2.2. Economic Models (Non-stochastic Relationships)


There are two types of relationships between economic variables: deterministic and stochastic. A mathematical economic model is a non-stochastic or deterministic way of dealing with economic theory or practical experience involving two variables. For notational uniformity, let Y represent the dependent variable and X the independent variable. A relationship between X and Y, characterized as Y = f(X), is said to be deterministic or non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y). The general form of the model can be presented as:

$$Y = f(X) \quad \ldots\ldots 2.1$$

In developing an economic model that specifies how expenditure relates to income, we label household expenditure Y and the corresponding household income X, and express the relationship between them in general form as in equation 2.1.

We need to know the precise relationship between the dependent and independent variables (linear or non-linear). In practice we never know the exact functional form of the relationship; we often use economic theory or the information contained in the data to help us choose a reasonable one. Assuming a linear relationship between the economic variables, in algebraic form the economic model is specified as

$$Y = \beta_0 + \beta_1 X \quad \ldots\ldots 2.2$$

In such a model we are interested in measuring the rate of change of the dependent variable for a unit change in the independent variable, the slope:

$$\beta_1 = \frac{\Delta Y}{\Delta X} \quad \ldots\ldots 2.3$$
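As a hypothetical numerical illustration (the values here are assumed, not taken from data): if the consumption function is $Y = 20 + 0.8X$, then the slope is $\beta_1 = \Delta Y/\Delta X = 0.8$, meaning that a one-birr increase in income raises consumption expenditure by 0.8 birr, at every level of income.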
Practical Exercise 2.1
Assuming that the supply of a certain commodity depends on its price (other determinants taken as constant) and that the function is linear, present the model with justifications.

2.3. Statistical model and/or Econometric model or stochastic model


2.3.1. Statistical model
When moving from the economic model to the statistical model we consider three issues: the error term, the observations, and the sampling process. The true relationship connecting the variables (Y and X) is split into two parts: a part represented by a line and a part represented by the random term ‘u’.

A. Observations

Since we collect data from different observations on Y and X, the subscript i is introduced to denote the i-th observation on each variable. Let us illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function. If there were an exact relationship between supply (Y) and price (X), each value would lie on a straight line. If the model is a stochastic one, the dependent variable is determined not only by the explanatory variable(s) included in the model but also by other factors which are not included. All those factors are accounted for by the ‘disturbance’ term, the deviation from the exact linear relationship assumed to exist between X and Y. For example, if we gather observations on the quantities actually supplied in the market at various prices and plot them on a diagram, we see that they do not fall on a straight line. Likewise, if we take a random sample of households and record average expenditure (Y) and average income (X) for each household, it would be unrealistic to expect each observed pair (Y, X) to lie exactly on a straight line, as pictured in Figure 2.1.

Figure 2.1: Regression line and the scatter diagram

We theorize that the value of the variable Y for each observation is determined by the equation. In such a case, for any given value of X, the dependent variable Y assumes some specific value only with some probability. The scatter of observations represents the true relationship between Y and X. Accordingly, we observe $Y_1, Y_2, \ldots, Y_n$ corresponding to $X_1, X_2, \ldots, X_n$, whereas we would observe points on the line, $Y'_1, Y'_2, \ldots, Y'_n$, corresponding to the same $X_1, X_2, \ldots, X_n$. The line represents the exact part of the relationship, and the deviation of each observation from the line represents the random component (random disturbance) of the relationship. These points diverge from the regression line by $u_1, u_2, \ldots, u_n$.


Figure 2.2: Regression line and error term


The stochastic relationship with one explanatory variable is called the simple linear regression model. Assuming a simple linear relationship between X and Y, the model can be specified as

$$Y_i = \alpha + \beta X_i + u_i \quad \ldots\ldots 2.4$$

where α and β are the parameters/coefficients of the regression model and $u_i$ (or ‘ε’) is the disturbance term, also called the error term, random disturbance, stochastic term, or stochastic error term.
B. Error term
The relationship between the variables is thus split into two parts. The first component, $\alpha + \beta X_i$, is the part of Y explained by changes in X and represents the exact relationship captured by the line. The second is the part of Y not explained by X, that is, the change in Y due to the random influence of $u_i$. This can be seen in Figure 2.2, in which $u_1, u_2, u_3$, etc. are the random deviations. The deviations of the observations from the line may be attributed to several factors:
1. Omission of variables from the function: In economic reality each variable is influenced by a very large number of factors, and not every factor can be included in the function because:
a. Some of the factors may not be known.
b. Even if we know some of the factors, they may not be statistically measurable; for example, psychological factors (taste, preferences, expectations, etc.) are not measurable.
c. Some factors are random, appearing in an unpredictable way and at unpredictable times, for example epidemics, earthquakes, etc.
d. Some factors may be omitted because of their small influence on the dependent variable.
e. Even if all the factors are known, the available data may not be adequate to measure them all.

One can present a model of economic reality in which a very large number of factors influence the variable as

$$Y = f(X_1, X_2, \ldots, X_n) \quad \ldots\ldots 2.5$$

However, not all the factors influencing a certain variable can be included in the function, for the reasons given above.
(2) Random behavior of human beings
Human behavior may deviate from the normal situation to a certain extent in an unpredictable way. For example, on a moment’s whim a consumer may change his expenditure pattern although income and prices did not change. The scatter of points around the line may thus be attributed to an intrinsic randomness inherent in human behavior, an individual component of Y that cannot be explained no matter how hard we try.

(3) Imperfect specification of the mathematical form of the model

Even if we have the theoretically correct variables to explain a phenomenon, we do not know the exact functional form of the relationship and may specify it wrongly. For example, we may use a linear function, say $Y = \alpha + \beta X + u$, for a relationship that is in fact non-linear, or we may use a single-equation model for simultaneously determined relationships. The error term captures any approximation error that arises because of the linear functional form used and/or because some equations are left out of the model.

(4) Errors of aggregation

Much economic data is available only in aggregate form, in which we add magnitudes referring to individuals whose behavior is dissimilar. We often use aggregate data on consumption, income, government expenditure, etc. Aggregation of data introduces error into the relationship, as variables expressing individual peculiarities are missing.
(5) Unavailability of data
Even if we know what some of the excluded variables are and consider a multiple regression rather than a simple one, we may not have quantitative information about these variables. It is a common experience in empirical analysis that the data we would ideally like to have often are not available. For example, in principle we could introduce family wealth as an explanatory variable, in addition to the income variable, to explain family consumption expenditure. But, unfortunately, information on family wealth generally is not available, so we may be forced to omit the wealth variable from our model despite its great theoretical relevance in explaining consumption expenditure.
(6) Errors of measurement
Although we assume that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement arising from the methods of collecting and processing statistical information.
(7) Poor proxy variables
Some variables may not be measured accurately for psychological reasons; for instance, people do not accurately report their income, so expenditure is used as a proxy variable for income in poverty studies. While using such a proxy, we may not measure the variable correctly. Consider Milton Friedman’s well-known theory of the consumption function. He regards permanent consumption (Yp) as a function of permanent income (Xp). But since data on these variables are not directly observable, in practice we use proxy variables, such as current consumption (Y) and current income (X), which are observable. Since the observed Y and X may not equal Yp and Xp, there is a problem of errors of measurement; the disturbance term may in this case represent the measurement error.

(8) Principle of parsimony

We would like to keep our regression model as simple as possible. If we can explain the behavior of Y “substantially” with two or three explanatory variables, and if our theory is not strong enough to suggest what other variables might be included, there is no need to include more variables: we let $u_i$ represent all the other variables not included in the model. Of course, we should not exclude relevant and important variables just to keep the regression model simple.

The first five sources of error render the form of the equation wrong, and they are usually referred to as error in the equation or error of omission. The sixth source of error is called error of measurement or error of observation. In order to take the above sources of error into account, we introduce into econometric functions a random variable u.

C. Sampling distributions
To make the statistical model complete, some mechanism for modeling $u_i$ is required. That is, the sampling process underlying the observed $Y_i$ is directly tied to the assumptions made about the random variable $u_i$ (and hence about $Y_i$) and their probability distributions. Typical assumptions are that $u_i$ is random with zero mean and constant variance $\sigma^2$.

Practical Exercise 2.2


1. Assume that you want to study the relationship between consumption expenditure and income.
a. Present the non-stochastic model.
b. What are the possible sources of error in the relationship among the variables?

2.3.2. Econometric model

Building an econometric model depends on a set of assumptions put forward by the classical school, termed the Classical Linear Regression Model Assumptions (CLRMA). They are the cornerstone of most econometric theory. These assumptions amount to a set of statements about the population distribution, the distribution of the error term (its mean, variance, and covariance), and the relationship between the error term and the other variables. We study the set of assumptions which should hold, and then go beyond them: we will at least study what happens when they go wrong and how we may test whether they have gone wrong.

The assumptions fall mainly into three groups:

a) some relate to the error terms, their distributions, and their correlations;
b) some concern the relationship between the $u_i$’s and the explanatory variables;
c) some refer to the relationship between the explanatory variables themselves, or to the functional form of the relationship between the variables.
The most important of these assumptions are discussed below.
a) Assumptions related to the error terms, their correlations, and their distributions

Assumption 1: $u_i$ is a random real variable
The value which u may assume in any one period depends on chance; it may be positive, negative, or zero, and every value has a certain probability of being assumed by u in any particular instance.
Assumption 2: The mean value of the random variable u in any particular period is zero
The random variable u may assume various values, some greater than zero and some less than zero; but if we consider all the positive and negative values of u for any given value of X, they have an average value equal to zero. In other words, the positive and negative values of u cancel each other.

Mathematically, $E(u_i) = 0$ ……………………………………………..….(2.15)

Furthermore, given the value of X, the mean value of the random disturbance term $u_i$ is zero; technically, the conditional mean value of $u_i$ is zero. Symbolically, $E(u_i \mid X_i) = 0$.

In a nutshell, this assumption implies that the factors not explicitly included in the model, and therefore subsumed in $u_i$, do not systematically affect the mean value of Y; i.e., the positive values cancel out the negative values so that their average effect on Y is zero. This assumption leads to the fact that, given $E(u_i \mid X_i) = 0$,

$$E(Y_i \mid X_i) = \alpha + \beta X_i$$

Assumption 3: The variance of the random variable u is constant in each period

The variance of $u_i$ about its mean is constant at all values of X. In other words, for all values of X the $u_i$ show the same dispersion around their mean: the average variation around the regression line is the same across the X values. It neither increases nor decreases as X varies (a big $u_i$ is not more likely to occur when $X_i$ is big, or vice versa); the $u_i$ tend to be distributed evenly around the line at all levels of $X_i$. This assumption implies that the values of Y corresponding to various values of X have constant variance.
Mathematically,

$$Var(u_i) = E[u_i - E(u_i)]^2 = E(u_i^2) = \sigma^2 \quad (\text{since } E(u_i) = 0) \quad \ldots\ldots 2.16$$

and, stated jointly with X, $Var(u_i \mid X_i) = \sigma^2$.

For all values of X, the u’s will show the same dispersion around their mean. In Figure 2.3 this assumption is represented by the fact that the values u can assume lie within the same limits irrespective of the value of X: for $X_1$, u can assume any value within the range AB; for $X_2$, u can assume any value within the range CD, which is equal to AB; and so on.

Figure 2.3: Homoscedastic variance and its distribution

This constant-variance assumption is called the homoscedasticity assumption, and the variance itself is called homoscedastic variance.
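A small simulation makes the assumption concrete. The sketch below (Python with numpy; all parameter values are assumptions chosen only for illustration) draws errors with the same variance at every fixed X and checks that each group of errors shows roughly the same dispersion:

```python
# Homoscedasticity: Var(u | X) = sigma^2 at every level of X.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0                 # assumed "true" parameters
x = np.repeat(np.array([80, 140, 200, 260]), 250)  # fixed X values, repeated sampling
u = rng.normal(0.0, sigma, size=x.size)            # same variance for all X
y = alpha + beta * x + u

for xv in np.unique(x):
    print(xv, round(float(u[x == xv].var()), 3))   # each group's variance is close to 1
```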

Assumption 4: The random variable u has a normal distribution

This assumption is important because hypothesis testing requires the distribution of $u_i$, and the commonly used (assumed) distribution is the normal. The values of u (for each X) have a bell-shaped, symmetrical distribution about their zero mean and constant variance $\sigma^2$, i.e.,

$$u_i \sim N(0, \sigma^2) \quad \ldots\ldots 2.17$$

Assumption 5: The random terms of different observations ($u_i$, $u_j$) are independent

This is the assumption of no autocorrelation: no error terms are correlated with one another. The value which the random term assumes in one period does not depend on the value it assumed in any other period. Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between the corresponding $u_i$ and $u_j$ is zero. Algebraically,

$$Cov(u_i, u_j) = E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_i u_j) = 0, \quad \text{since } E(u_i) = E(u_j) = 0 \quad \ldots\ldots 2.18$$

That is, the choice of one household does not influence the choice of another household.
B. Assumptions about the relationship between the $u_i$’s and the explanatory variables
Assumption 6: $X_i$ is non-stochastic in repeated sampling

This says that $X_i$ assumes a set of fixed values in the hypothetical repeated sampling which underlies the linear regression model. That is, in taking a large number of samples on Y and X, the $X_i$ values are the same in all repeated samples, but the $u_i$ values differ from sample to sample, and so do the values of $Y_i$. Consider an experiment that studies the response of crop yield to fertilizer application: the amount of fertilizer applied to the different fields (plots of land) is set by the researcher and held constant across samples.
Table 2.1: Different samples and corresponding errors

	Sample-1	Sample-2	Sample-3	Sample-4
X	X1 = 80	X1 = 80	X1 = 80	X1 = 80
Y	Y11 = 60	Y12 = 70	Y13 = 65	Y14 = 75
u	u11	u12	u13	u14

With X fixed, the yield in each sample then depends on the random term: it is the draw of u that makes Y differ from sample to sample.

This assumption simplifies our analysis. Given a fixed value of $X_i$, the model indicates that any $Y_i$ is generated as a function of the single random variable $u_i$ and the non-random term $\alpha + \beta X_i$; hence $Y_i$ is a function of one rather than two random variables, and its expected value and variance are particularly simple to evaluate:

$$E(Y_i) = \alpha + \beta X_i, \qquad Var(Y_i) = \sigma^2 \quad \ldots\ldots 2.19$$

Furthermore, the fact that $Y_i$ is a linear function of the normal random variable $u_i$ allows us to summarize the distribution of $Y_i$ as normal.
The alternative is to treat X as a random variable, with expectation and variance evaluated conditional on X. This conception arises when X and $u_i$ may be correlated. The fixed-X view is appropriate for the experimental sciences, in which the explanatory variable is under the control of the researcher; it is not obviously the case for most problems in economics, where the data are non-experimental and the observations on the explanatory variables are outcomes of a random sampling process.
Assumption 7: Variability in X values
X must assume different values in a given sample, even though it is fixed in hypothetical repeated samples. Without this assumption it would be impossible to estimate the parameters and conduct regression analysis. For example, if there is little variation in family income, we will not be able to explain much of the variation in the consumption expenditure of the families. Technically, Var(X) must be a finite positive number.
Assumption 8: The random variable u is independent of the explanatory variables
There is no correlation between the random variable and the explanatory variable in the model; they are distributed independently of each other. If two variables are unrelated, their covariance is zero. Hence

$$Cov(X_i, u_i) = 0 \quad \ldots\ldots 2.20$$

Proof:
$$Cov(X_i, u_i) = E\{[X_i - E(X_i)][u_i - E(u_i)]\}$$
$$= E[(X_i - E(X_i))\,u_i], \quad \text{given } E(u_i) = 0$$
$$= E(X_i u_i) - E(X_i)E(u_i) = E(X_i u_i) = X_i E(u_i) = 0, \quad \text{given that the } X_i \text{ are fixed}$$

If the error term is correlated with any of the independent variables, OLS estimation is biased; this is called endogeneity bias, and the correlated explanatory variable is called an endogenous explanatory variable. The term “endogenous explanatory variable” has evolved to cover any case where an explanatory variable may be correlated with the error. This can arise from omitted-variable bias, functional-form misspecification, and/or the presence of measurement error. Measurement error makes the independent variable stochastic: since the measurement error appears both in the new disturbance term and in the regressor (the incorrectly measured independent variable), the estimated equation has a disturbance that is correlated with a regressor.

C. Assumptions about the relationship between the explanatory variables themselves, or the functional form of the relationship between the variables
Assumption 9: The model is linear in parameters
There are two forms of linearity: linearity in the variables and linearity in the parameters.
A. Linearity in the Variables
The first and perhaps more “natural” meaning of linearity is that the conditional expectation of Y is a linear function of $X_i$; geometrically, the regression curve is a straight line. A model specified as $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ is linear in the variables, whereas a regression function such as $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is not a linear function, because the variable X appears with a power or index of 2.
B. Linearity in the Parameters
The second interpretation of linearity is that the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the parameters, the β’s; it may or may not be linear in the variable X. For instance, the regression model $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is linear in the parameters. The following models illustrate the assumption, since the classical assumptions are concerned with the parameters:
i. $Y = \alpha + \beta X + u$ is linear in both the parameters and the variables, so it satisfies the assumption;
ii. $\ln Y = \alpha + \beta \ln X + u$ is linear only in the parameters;
iii. $\ln Y^2 = \alpha + \beta \ln X^2 + u_i$ is linear only in the parameters;
iv. $Y_i = \sqrt{\alpha + \beta X_i} + u_i$ is non-linear in the parameters.

Linearity of the above model implies that a one-unit change in X has the same effect on Y regardless of the initial value of X. But this is unrealistic for many economic applications, such as the returns to education. To utilize such a model, one can transform a form that is non-linear in the variables into one that is linear in the parameters by taking logs of both sides, depending on the situation. For example (a standard illustration):
ter depending on the situations by taking log both side . For example

The classical assumption requires linearity in the parameters, regardless of whether the explanatory and dependent variables enter linearly or not, because it is difficult to estimate parameters that enter non-linearly. Therefore, from now on the term “linear” regression will always mean a regression that is linear in the parameters, the β’s (that is, the parameters are raised to the first power only); it may or may not be linear in the explanatory variables, the X’s.

Assumption 10: The explanatory variables are measured without error
a. The regressors are error free, while u absorbs the influence of omitted variables and, possibly, errors of measurement in the Y’s; hence Y itself is taken to be measured correctly.

b. The dependent variable $Y_i$ is normally distributed, i.e.,

$$Y_i \sim N(\alpha + \beta X_i,\; \sigma^2) \quad \ldots\ldots 2.22$$

Proof.
Mean: $E(Y_i) = E(\alpha + \beta X_i + u_i) = \alpha + \beta X_i$, since $E(u_i) = 0$.
Variance: $Var(Y_i) = E[Y_i - E(Y_i)]^2 = E[\alpha + \beta X_i + u_i - (\alpha + \beta X_i)]^2 = E(u_i^2) = \sigma^2$, since $E(u_i^2) = \sigma^2$.

$$\therefore\; Var(Y_i) = \sigma^2 \quad \ldots\ldots 2.23$$

The shape of the distribution of $Y_i$ is determined by the shape of the distribution of $u_i$, which is normal by Assumption 4. Since α and β are constants, they do not affect the distribution of $Y_i$. Furthermore, the values of the explanatory variable $X_i$ are a set of fixed values by Assumption 6 and therefore do not affect the shape of the distribution of $Y_i$. Therefore $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$.
c. Successive values of the dependent variable are independent, i.e.,

$$Cov(Y_i, Y_j) = 0, \quad i \neq j \quad \ldots\ldots 2.24$$

Proof. Since $Y_i = \alpha + \beta X_i + u_i$ and $Y_j = \alpha + \beta X_j + u_j$,
$$Cov(Y_i, Y_j) = E\{[Y_i - E(Y_i)][Y_j - E(Y_j)]\} = E[(\alpha + \beta X_i + u_i - \alpha - \beta X_i)(\alpha + \beta X_j + u_j - \alpha - \beta X_j)]$$
$$= E(u_i u_j) = 0 \quad \text{(from equation 2.18)}$$

Therefore $Cov(Y_i, Y_j) = 0$.
Assumption 11: The regression model is correctly specified:
The mathematical form of the model is correctly specified and all important explanatory variables are
included in it. In other words, there is no specification bias or error in the model used in empirical ana-
lysis. Unfortunately, in practice one rarely specifies the correct model. Hence, an econometrician would
use some judgment in choosing the correct model based on some a priori or theoretical grounds. Viola-
tions of this assumption are due to omitted variables, wrong functional form, and inclusion of irrelevant

variables in the model, which will make the validity of the regression results and their interpretation questionable.

Practical Exercise 2.4


What are the basic rationales for including the random error term?

2.4. Data and area of applications


Much has been said about data in chapter one. In this chapter we focus on using cross-sectional data for the simple regression model.

2.4.1. Data type and Hypothetical Example


Given data on a certain population, one can examine the relationship between two variables in two senses: a simple and a conditional one. The reader may very well question the assumption that all other variables that influence expenditure are observable. What about the price of food, household size, and the age-sex composition of the household? Since the information is collected from a random sample of households taken at a given point in time, it is reasonable to assume that the price faced by all households is the same, but age and sex composition are a different matter. If we had data on these variables, it would be good to include them in the expenditure function. For the sake of simplicity we use a model of two variables.
A. Simple relationship
A simple relationship indicates a one-to-one correspondence between X and Y. Suppose the weekly consumption expenditure of a household at different income levels is given as follows; there is a unique relationship between X and Y.

Table 2.2: Household income(X) and consumption expenditure(Y)

Y($) X($)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260

The functional relationship between the variables can then be written as

$$Y_i = f(X_i) \quad \ldots\ldots 2.6$$
B. Conditional relationship
Economic relationships are not as such one-to-one correspondences; rather, there are joint relationships. For each $X_i$ there are many Y values, varying between $Y_1$ and $Y_n$, and hence many $u_i$; the same holds for the $X_i$, which range over $X_1, \ldots, X_n$. Many Y values, such as $Y_1, Y_2, Y_3, \ldots, Y_n$, relate to a fixed value of X. To understand this, consider the data given in Table 2.3 on a total population of 60 families in a hypothetical community, their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260), and the weekly expenditures of each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of the X values; there is considerable variation in weekly consumption expenditure within each income group.

How do we look at the relationship between X and the various values of Y? We use the conditional expected value to get a unique value of Y corresponding to each fixed value of $X_i$. It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). The unconditional expected value, or unconditional mean, is obtained by summing the $Y_i$ over the entire population and dividing by the number of observations. Adding the weekly consumption expenditures of all families and dividing by the 60 families in the population, we get the unconditional mean of $121.20 ($7272/60). It is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.

The conditional expectation is obtained by finding the mean of the $Y_i$ corresponding to each $X_i$. Obviously, the various conditional expected values of Y given in Table 2.3 correspond to different $X_i$. The expected value of the weekly consumption expenditure of a family whose weekly income is, say, $140 is $101; this is a conditional mean, as it is the average of the Y values corresponding to an income of $140.
Table 2.3: Weekly family consumption expenditure and its conditional mean, by income level

From the table we have the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly income level of $80 the mean consumption expenditure is $65, while corresponding to the income level of $200 it is $137. In all we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. There is considerable variation in weekly consumption expenditure within each income group, which can be seen clearly from Figure 2.4; but the general picture shows that, despite this variability, on average weekly consumption expenditure increases as income increases.

For the construction of the regression line we take the mean value of Y, E(Y|X), as the unique Y value that represents each group. Regression analysis is largely concerned with estimating and/or predicting the population mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable. This can be specified as
Yi = E(Y | Xi) + ui

Geometrically, a population regression curve (line) is simply the locus of the conditional means of the
dependent variable for the fixed values of the explanatory variable(s).

Figure 2.4: Conditional distribution of expenditure for various levels of income
Figure 2.4 indicates that, given the income level $X_i$, an individual family’s consumption expenditure is clustered around the average consumption of all families at that income, E(Y|X). It is clear that as family income increases, family consumption expenditure on average increases too. But an individual family’s consumption expenditure does not necessarily increase with the income level. For example, from Table 2.2 we observe that corresponding to the income level of $100 there is one family whose consumption expenditure, $65, is below even that of the family earning $80 ($70). But notice that the average consumption expenditure of families with a weekly income of $100 is greater than the average consumption expenditure of families with a weekly income of $80 ($77 versus $65).

2.4.2. Population Regression Function (PRF)

Our objective in regression analysis is to estimate the coefficients and make decisions. This can be done using data collected from the population and/or from a sample. Hence, before we move to further analysis, it is better to clarify the population and sample regression functions (lines). The population regression function is the regression equation developed from data collected on the population. For simple data on Y and X, the linear relationship with a stochastic term can be specified as:

$$Y_i = \beta_0 + \beta_1 X_i + u_i \quad \ldots\ldots 2.8$$

When we are working with jointly distributed variables, we take the expected value of Y given X, the conditional mean E(Y | Xi), as a function of Xi. When E(Y | Xi) is a linear function of Xi, it is known as the conditional expectation function (CEF) or population regression function (PRF).

Symbolically, E(Y | Xi) = f (Xi) ------------------------------------------------------------------- (2.9)

where f (Xi) denotes some function of the explanatory variable X. Equation (2.9) states merely that the expected value of Y given Xi is functionally related to Xi; it tells us how the mean or average response of Y varies with X. As a first approximation, or a working hypothesis, the non-stochastic population regression function (true PRF) is taken to be a linear function of Xi, specified as
E(Y | Xi) = β0 + β1 Xi ------------------ population regression line ------------ (2.10)
The corresponding stochastic model can be written as Yi = E(Y | Xi) + ui, i.e.,
Yi = β0 + β1 Xi + ui ----------------- population regression function --------- (2.11)
where β0 and β1 are unknown but fixed parameters known as the regression coefficients.
How do we interpret (2.11)? We can say that the expenditure of an individual family, given its income level, can be expressed as the sum of two components: (1) E(Y | Xi), the mean consumption expenditure of all families with the same level of income, known as the systematic, or deterministic, component; and (2) ui, the random, or nonsystematic, component.

2.4.3. Sample Regression Functions (SRF)


However, in most practical situations we do not have full information on the population, owing to the unmanageability of a census and to resource constraints on data collection. That is, what we have is but a sample of Y values corresponding to some fixed X’s. Therefore, our primary objective in regression analysis is to estimate the PRF on the basis of the sample information. Analogous to the PRF that underlies the population regression line, we can develop the concept of the sample regression function (SRF). Accordingly, on the basis of the PRF specified in equation (2.11), the sample counterpart of the regression function (SRF) is

$$Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i \quad \ldots\ldots 2.12$$

where $\hat Y_i$ (read “Y-hat” or “Y-cap”) is the predicted or fitted Y and $\hat u_i$ denotes the (sample) residual term, so named because once a sample regression line has been obtained it is possible to compute a predicted level of Y. Accordingly, $\hat Y_i$ is the estimator of E(Y | Xi), $\hat\beta_0$ the estimator of $\beta_0$, $\hat\beta_1$ the estimator of $\beta_1$, and $\hat u_i$, conceptually analogous to $u_i$, the estimator of $u_i$. Note that an estimator, also known as a (sample) statistic, is simply a rule, formula, or method that tells us how to estimate the population parameter from the information provided by the sample at hand; a particular numerical value obtained by the estimator in an application is known as an estimate.
The predicted $\hat Y_i$ is unlikely to coincide with the actual $Y_i$; that is, not all the points in the scatter diagram will lie on the fitted sample regression line. Now, just as we expressed the PRF in equivalent forms, we can write the SRF in its stochastic form,

$$Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i \quad \ldots\ldots 2.13$$

and the sample counterpart of equation (2.10) may be written as

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i \quad \text{(sample regression line)} \quad \ldots\ldots 2.14$$

More specifically, if $X_i$ is given we can predict $Y_i$ as $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$. Figure 2.4 below indicates that the PRF is given by $Y_i = \beta_0 + \beta_1 X_i$ and the SRF by $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$; one can find the corresponding difference in the error terms ($u_i$ versus $\hat u_i$).

Figure 2.4: Population and sample regression line


The population of Table 2.3 is not known to us; the only information we have is a randomly selected sample of Y values for the fixed X’s, as given in Table 2.2. Unlike Table 2.3, we now have only one Y value corresponding to each given X; each Y (given $X_i$) in Table 2.2 is chosen randomly from the similar Y’s corresponding to the same $X_i$.

The question is: can we predict the average weekly consumption expenditure (Y) of the population using the sample data? In other words, can we estimate the PRF from the sample data? As one surely suspects, we may not be able to estimate the PRF “accurately”; the population regression line differs from the sample regression line because of sampling fluctuation: as the sample drawn changes, so does the SRF, and hence there are many SRFs. To see this, suppose we draw another random sample from the population of Table 2.3, presented as Sample B in Table 2.4.

Table 2.4: Two random samples from the population of Table 2.3

Sample A Sample B
Y X Y X
70 80 55 80
65 100 88 100
90 120 90 120
95 140 80 140
110 160 118 160
115 180 120 180
120 200 145 200
140 220 135 220
155 240 145 240
150 260 175 260

Plotting the data of Samples A and B of Table 2.4, we obtain the scattergram given in Figure 2.5. In the scattergram two sample regression lines are drawn so as to “fit” the scatters reasonably well: SRF1 is based on the first sample, and SRF2 is based on the second sample. Which of the two regression lines represents the “true” population regression line? There is no way we can be absolutely sure that either of the regression lines shown in Figure 2.5 represents the true population regression line (or curve).

Figure 2.5: Regression lines based on two different samples

The regression lines in Figure 2.5 are known as sample regression lines. They are approximations of the true PRL. In general, we would get N different SRFs for N different samples, and these SRFs are not likely to be the same. Which sample regression line best fits, or comes closest to, the population regression line? This question takes us to estimation.
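The point can be verified numerically. The sketch below (Python with numpy, an illustrative tool choice) fits an SRF to each of the two samples of Table 2.4; sampling fluctuation alone makes the two estimated lines differ (slopes of roughly 0.51 versus 0.58):

```python
# Two SRFs estimated from the two samples of Table 2.4.
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
y_a = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150])   # Sample A
y_b = np.array([55, 88, 90, 80, 118, 120, 145, 135, 145, 175])   # Sample B

for name, y in (("SRF1", y_a), ("SRF2", y_b)):
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
    b0 = y.mean() - b1 * x.mean()
    print(name, round(float(b0), 3), round(float(b1), 4))
```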

Practical Exercise 2.3


Explain the difference between the sample regression function and the population regression function.

2.5. Estimation of the Classical Linear Regression Model


2.5.1. Methods of Estimation
Now assume that we have completed the work involved in the first four stages of the econometric methodology discussed in chapter one, namely economic theory, the mathematical and econometric model along with its assumptions, and collection of the required data. Given a theoretical model explaining how X is related to Y, the next step is the estimation of the numerical values of the parameters of the economic relationship. Given our stochastic model for simple linear regression specified above, our objective is to estimate the coefficients ($\beta_0$ and $\beta_1$) based on sample information on X and Y. Estimation is the process of using sample data and statistical procedures to compute a statistic (sample estimate) and to infer the value of an unknown population parameter. When we calculate estimators of population parameters using sample data, we use the sample regression model specified as $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$.

Before going directly to estimation, let us see some of the underlying theoretical framework for the estimation of the coefficients and the related model specification. We can get estimates of the $u_i$ after the estimation of the regression line, through the deviations of the observations from this line.

The parameters of the simple linear regression model can be estimated by various methods. Three of the commonly used methods are:
1. Ordinary least squares method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
Here we will deal with the derivations and mechanics of the OLS, MM, and MLM methods of estimation.

2.5.2.1. The ordinary least squares (OLS) method


OLS is the most powerful and popular method in regression analysis for many reasons. First, OLS is an essential component of most other econometric techniques. Second, the mechanics of OLS are simple to understand. Third, the computational procedures of OLS are fairly simple compared with other econometric techniques. Fourth, the data requirements of OLS are not excessive. Fifth, parameter estimates obtained by OLS have some optimal properties.

How does OLS work?


OLS is a rule that chooses the line that best fits the data points in a well-defined sense, that is, the line that best expresses the population of concern and its economic relationships. To this end, the method finds sample estimates of the population parameters through minimization of the sum of squared errors. The model $Y_i = \beta_0 + \beta_1 X_i + u_i$ is called the true relationship between Y and X, with population parameters $\beta_0$ and $\beta_1$. But it is difficult to obtain the population values of Y and X, for technical reasons (the unmanageability of a census, the depth of analysis) or economic reasons (material, human, and financial costs), so we are forced to use sample values of Y and X.

Therefore, we have to obtain a sample of observed values of Y and X, specify the distribution of the error (u), and try to get satisfactory estimates of the true parameters of the relationship. This is done by fitting a regression line (SRF) which is considered an approximation to the true line. The parameters estimated from the sample values of Y and X are called the estimators of the true parameters and are symbolized as $\hat\beta_0$ and $\hat\beta_1$. Accordingly, the model $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$ is used to represent the sample counterpart of the population regression function between Y and X.

Our statistical experiment is concerned with repeated sampling. In the repeated sampling process there are many regression lines, differing in $\hat\beta_0$ and $\hat\beta_1$; each SRF has its own unique values of the estimates. Which of these lines should we choose? Generally, we look for the SRF which is as close as possible to (if possible, exactly coincident with) the unknown PRF. We need a rule that makes the SRF as close as possible to the observed data points. But how can we devise such a rule? Equivalently, how can we choose the best technique to estimate the parameters of interest, $\beta_0$ and $\beta_1$? We use the method that minimizes the sum of squared error terms: OLS.
Steps in developing OLS estimator
Step-1: find SRF and fitted regression line corresponding to the population regression function and line
can be best estimator if error terms are as small as possible.

PRF : …………………………………….2.25
SRF: ,………………………………….2.26
SRL (Fitted line): then
Step -2: using SRF and SRL (fitted line) derive sample error term

--------------------------------------------------------------2.27
Given our observations on Y and X, we would like to determine the SRF in such a way that it is as close
as possible to the actual Y. Intuitively this is achieved when the deviations from the line are small.
One might try to choose the SRF in such a manner that the sum of the residuals

Σûi = Σ(Yi − Ŷi) ……………………..……..……………….2.28

is as small as possible. This approach is not appropriate, no matter how intuitively appealing it may be.
The reason is that the minimization of Σûi gives equal weight to different deviations, no matter how
large or small they may be; i.e., it attaches equal importance to every ûi no matter how close or how
widely scattered the individual observations are from the SRF.

Figure 2.7: Error term and fitted SRF

Consequently, the algebraic sum of the ûi can be small (even zero) although the individual ûi are widely
scattered about the SRF. For instance, if û1 and û4 are positive whereas û2 and û3 are negative, the
sum û1 + û2 + û3 + û4 may equal zero. Can we minimize a quantity which is zero by definition and
thereby solve the above problem? No. This means that minimizing Σûi does not necessarily imply
that the individual deviations (the ûi's) are minimized.


Step-3: Square these deviations and sum them
To overcome this problem, we adopt the least squares criterion: square the deviations and minimize the
sum of squares. This criterion requires the regression line to be drawn (its parameters to be chosen) in
such a way as to minimize the sum of the squared deviations of the observations from the line. It
enables us to choose the line that makes the sum of the squared vertical distances from each point to
the line as small as possible.

Σûi² = Σ(Yi − Ŷi)² …………………………………..2.29

This approach gives more weight to residuals with wider dispersion around the line than to those with
closer dispersion. Accordingly, we give more weight to residuals such as û1 and û4 (larger deviations
from the sample line) and less to û2 and û3. From the estimated relationship Ŷi = β̂0 + β̂1Xi, we obtain:

ûi = Yi − Ŷi ……………………………….…….……..(2.30)

Σûi² = Σ(Yi − β̂0 − β̂1Xi)² ………………………..………..……(2.31)

The sum of squared residuals is a function of the estimators β̂0 and β̂1: for any given data set, choosing
different values for the coefficients gives different ûi's and hence a different Σûi². We find the line
that minimizes Σûi² and best estimates the population parameters by solving

min Σûi² = Σ(Yi − β̂0 − β̂1Xi)² ……………………2.32
Step-4: Take partial derivatives and minimize the equation
The minimization problem can be handled with differential calculus by taking the partial derivatives of
∑e²i with respect to β̂0 and β̂1 and using the optimization conditions (first and second order conditions).
That is, set the first order partial derivatives equal to zero.

A. The first order condition for β̂0 is

∂Σûi²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) = 0 ………..….2.33

(composite function, chain rule). Multiplying both sides by −1/2 gives Σ(Yi − β̂0 − β̂1Xi) = 0, so that

ΣYi = nβ̂0 + β̂1ΣXi ………………………………..2.34

B. The partial derivative with respect to the second coefficient (β̂1) is

∂Σûi²/∂β̂1 = −2ΣXi(Yi − β̂0 − β̂1Xi) = 0 …………………………2.35

(chain rule). Multiplying either side by −1/2 gives ΣXi(Yi − β̂0 − β̂1Xi) = ΣXiûi = 0.

Note that since ûi = Yi − β̂0 − β̂1Xi, this equation says ΣXiûi = 0, which is related to our assumption
Cov(Xi, Ui) = 0. It indicates that when the pairwise products of Xi and ûi are summed over the full
sample the total is zero. Rearranging,

ΣXiYi = β̂0ΣXi + β̂1ΣXi² -------------------------------------2.36
Step-5: Solve the normal equations simultaneously
From equations 2.34 and 2.36 we have the two normal equations:

ΣYi = nβ̂0 + β̂1ΣXi ----------------------------------------2.37

ΣXiYi = β̂0ΣXi + β̂1ΣXi² ----------------------------------------2.38

Solving the normal equations simultaneously, we obtain estimates for β̂0 and β̂1.

i. From equation 2.37 we can compute

β̂0 = (ΣYi − β̂1ΣXi)/n ……………………………………..2.39

Plugging this value of β̂0 into equation 2.38, making a common denominator, moving the terms without
β̂1 to one side and the terms with β̂1 to the other, and multiplying through by −1, we obtain

β̂1 = (nΣXiYi − ΣXiΣYi)/(nΣXi² − (ΣXi)²) ………………………………2.40

ii. Solving for β̂0: dividing equation 2.39 through by n term by term gives

β̂0 = Ȳ − β̂1X̄ ………………………………2.41

Plugging 2.40 into 2.41, rearranging, and using nΣXi² − (ΣXi)² as the common denominator yields the
intercept in terms of the raw observations:

β̂0 = (ΣYiΣXi² − ΣXiΣXiYi)/(nΣXi² − (ΣXi)²) …………………………………2.42

Deviation form

The above formulas for β̂0 and β̂1 can be expressed in terms of deviations from the mean values.
From equation 2.34 we have ΣYi = nβ̂0 + β̂1ΣXi; dividing by n,

Ȳ = β̂0 + β̂1X̄ ………………………………..2.43

The sample regression line passes through the means of the data in such a way that the residual sum is
zero. Some residuals are positive, others are negative, so that positive residuals tend to cancel with
negative ones. From this normal equation the cancellation is exact.

Substituting the value β̂0 = Ȳ − β̂1X̄ into equation 2.38 and dividing the resulting terms through by n/n,
we get:

ΣXY − nX̄Ȳ = β̂1(ΣXi² − nX̄²) …………………………………………………..…….(2.44)

Equation (2.44) can be rewritten in a somewhat different way (deviation form) as follows:

Σ(X − X̄)(Y − Ȳ) = Σ(XY − XȲ − X̄Y + X̄Ȳ)
= ΣXY − ȲΣX − X̄ΣY + nX̄Ȳ
= ΣXY − nȲX̄ − nX̄Ȳ + nX̄Ȳ
= ΣXY − nX̄Ȳ ………………………………………………..2.45

Σ(X − X̄)² = ΣX² − nX̄² ………………………………………...…………….2.46

Substituting (2.45) and (2.46) in (2.44), we get

β̂1 = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)²

Now, denoting (Xi − X̄) as xi, and (Yi − Ȳ) as yi, we get

β̂1 = Σxiyi/Σxi² ……………………………………(2.47)

The expression in (2.47) estimates the parameter coefficient in deviation form.
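To make the mechanics concrete, here is a minimal illustrative sketch in Python (not part of the original
material) that implements the deviation-form formulas (2.41) and (2.47) for any pair of data lists.

```python
# Minimal OLS sketch using the deviation-form formulas (illustrative only).
def simple_ols(x, y):
    """Return (b0_hat, b1_hat) for the model y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # beta1_hat = sum(x_i * y_i) / sum(x_i^2) in deviations (eq. 2.47)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # beta0_hat = y_bar - beta1_hat * x_bar (eq. 2.41)
    b0 = y_bar - b1 * x_bar
    return b0, b1
```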

Example one: Using the sample data of Table 2.1, which for convenience is reproduced as Table 2.5,
estimate the relationship between consumption expenditure (Y) and income (X) as a test of the Keyne-
sian consumption function.

Table 2.5: Sample weekly family consumption expenditure (Y ) and weekly family income (X)

Y($) X($)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260

Solution
Table 2.6: Computation of OLS components based on the sample in Table 2.5

Yi (1)	Xi (2)	YiXi (3)	Xi² (4)	xi = Xi − X̄ (5)	yi = Yi − Ȳ (6)	xi² (7)	xiyi (8)	Ŷi (9)	ûi = Yi − Ŷi (10)
70 80 5600 6400 -90 -41 8100 3690 65.1818 4.8181
65 100 6500 10000 -70 -46 4900 3220 75.3636 -10.3636
90 120 10800 14400 -50 -21 2500 1050 85.5454 4.4545
95 140 13300 19600 -30 -16 900 480 95.7272 -0.7272
110 160 17600 25600 -10 -1 100 10 105.9090 4.0909
115 180 20700 32400 10 4 100 40 116.0909 -1.0909
120 200 24000 40000 30 9 900 270 125.2727 -6.2727
140 220 30800 48400 50 29 2500 1450 136.4545 3.5454
155 240 37200 57600 70 44 4900 3080 145.6363 8.3636
150 260 39000 67600 90 39 8100 3510 156.8181 -6.8181
Sum	1110	1700	205500	322000	0	0	33000	16800	1109.9995 ≈ 1110.0	0
Mean	111	170	nc	nc	0	0	nc	nc	111	0

Notes: ≈ symbolizes “approximately equal to”; nc means “not computed.”

From these raw data, the following calculations are obtained:

β̂1 = Σxiyi/Σxi² = 16800/33000 = 0.5091
β̂0 = Ȳ − β̂1X̄ = 111 − (0.5091)(170) = 24.4545

The estimated regression line therefore is Ŷi = 24.4545 + 0.5091Xi. The estimated regression line is
interpreted as follows: each point on the regression line gives an estimate of the expected or mean value
of Y corresponding to the chosen value of X. The value of β̂1 = 0.5091, which measures the slope of the
line, shows that, within the sample range of X between $80 and $260 per week, as X increases, say, by
$1, the estimated increase in mean weekly consumption expenditure amounts to about 51 cents. The
value of β̂0 = 24.4545, which is the intercept of the line, indicates the average level of weekly consump-
tion expenditure when weekly income is zero (autonomous consumption expenditure).
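As a quick cross-check, the same figures can be reproduced in a few lines of Python. This is an
illustrative sketch, not part of the original material; it recomputes the key sums of Table 2.6 and the two
coefficients.

```python
# Reproducing the Table 2.5/2.6 computations (illustrative only).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n                       # 170 and 111
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # 16800
sxx = sum((x - x_bar) ** 2 for x in X)                      # 33000
b1 = sxy / sxx            # 0.5091 (marginal propensity to consume)
b0 = y_bar - b1 * x_bar   # 24.4545 (autonomous consumption)
print(b0, b1)
```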
Example Two: A researcher wants to examine the relationship between sales and advertising expendi-
ture. The data collected are presented in Table 2.7. Using these data, find the coefficients of the relationship.
Table 2.7: Advertising Expenditure and Sales Revenue in thousands of Dollar

Firm(i) Sales(Yi) Advertising (Xi)


1 11 10
2 10 7
3 12 10
4 6 5
5 10 8
6 7 8
7 9 6
8 10 7
9 11 9
10 10 10

Solution
Table 2.8: Detailed computation of OLS components for the Y and X variables

Firm (i)	Sales (Yi)	Advertising (Xi)	yi = Yi − Ȳ	xi = Xi − X̄	xiyi	yi²	xi²	Ŷi	ûi = Yi − Ŷi	ûi²
1	11	10	1.4	2	2.8	1.96	4	11.10	-0.10	0.01
2	10	7	0.4	-1	-0.4	0.16	1	8.85	1.15	1.3225
3	12	10	2.4	2	4.8	5.76	4	11.10	0.90	0.81
4	6	5	-3.6	-3	10.8	12.96	9	7.35	-1.35	1.8225
5	10	8	0.4	0	0	0.16	0	9.60	0.40	0.16
6	7	8	-2.6	0	0	6.76	0	9.60	-2.60	6.76
7	9	6	-0.6	-2	1.2	0.36	4	8.10	0.90	0.81
8	10	7	0.4	-1	-0.4	0.16	1	8.85	1.15	1.3225
9	11	9	1.4	1	1.4	1.96	1	10.35	0.65	0.4225
10	10	10	0.4	2	0.8	0.16	4	11.10	-1.10	1.21
Sum	96	80	0	0	21	30.4	28	96	0	14.65

With Ȳ = 96/10 = 9.6 and X̄ = 80/10 = 8, the coefficients are

β̂1 = Σxiyi/Σxi² = 21/28 = 0.75, β̂0 = Ȳ − β̂1X̄ = 9.6 − (0.75)(8) = 3.6

so the estimated line is Ŷi = 3.6 + 0.75Xi.

Practical Exercise 2.5

From the data of a household expenditure and income survey, the following results were obtained for
25 observations/households.

Then calculate the three building blocks of the two-variable regression line.

Answer

Given β̂1, β̂0 can be computed as β̂0 = Ȳ − β̂1X̄ = 163.29 − (0.82)(163.2) = 30.71
B. Estimation of elasticities from an estimated regression line

After estimating the parameters β̂0 and β̂1, we can estimate the elasticity from the estimated regres-
sion line. The equation of the line whose intercept is β̂0 and whose slope is β̂1 is

Ŷi = β̂0 + β̂1Xi ………………………………………………………….. (2.49)

The coefficient β̂1 is the derivative of Ŷ with respect to X:

β̂1 = dŶ/dX …………………….…………………………………………….. (2.50)

Equation (2.50) shows the rate of change in Ŷ as X changes by a very small amount. It should be
noted that the estimated function is a linear function. The coefficient β̂1 is not the price elasticity but a
component of the elasticity, which is defined by the formula:

ηP = (dY/Y)/(dX/X) = (dY/dX)·(X/Y) ………………..………………..…………………. (2.51)

where ηP = price elasticity, Y = quantity (demanded or supplied) and X = price. Clearly β̂1 is the
dY/dX component. From an estimated function we obtain an average elasticity

ηP = β̂1·(X̄/Ŷ̄) = β̂1·(X̄/Ȳ) ……………………………………..……………………. (2.52)

where
X̄ = the average price in the sample
Ŷ̄ = the average of the regressed (fitted) values of the quantity in the sample
Ȳ = the average value of the quantity in the sample

Note that Ŷ̄ = Ȳ, that is, the mean of the estimated values of Y is equal to the mean of the actual
(sample) values of Y. This is because

Ŷ = β̂0 + β̂1X …………………………………………………….…. (2.53)

Ŷ̄ = β̂0 + β̂1X̄ = (Ȳ − β̂1X̄) + β̂1X̄ = Ȳ ………………………………..(2.54)

Alternatively, the value of β̂1 can be obtained using the deviations from the means as follows:

β̂1 = Σ(Yi − Ȳ)(Xi − X̄)/Σ(Xi − X̄)² = Σxiyi/Σxi² = 156/48 = 3.25

With Ȳ = 63 and X̄ = 9, the intercept is β̂0 = Ȳ − β̂1X̄ = 63 − (3.25)(9) = 33.75. Finally, the estimated
supply function is expressed as

Ŷi = 33.75 + 3.25Xi

Therefore, the price elasticity of our earlier function is

ηP = (3.25)·(X̄/Ȳ) = (3.25)·(9/63) = 0.46
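To make the arithmetic explicit, here is a tiny illustrative Python computation of the average elasticity
in (2.52); it is not part of the original text.

```python
# Average price elasticity of eq. (2.52), using the supply-function estimates.
b1 = 3.25      # estimated slope
x_bar = 9      # mean price in the sample
y_bar = 63     # mean quantity (equal to the mean of the fitted values)
eta = b1 * x_bar / y_bar   # 3.25 * 9 / 63, about 0.46
print(round(eta, 2))
```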
C. Estimation of a function with zero intercept
In some cases economic theory may postulate that the relationship between two variables is linear
without a constant term. Hence, it is desired to fit the line Ŷi = β̂0 + β̂1Xi subject to the restriction
β̂0 = 0. To estimate β̂1, the problem is put in the form of a restricted minimization problem and the
Lagrange method is applied.

We minimize: Σûi² = Σ(Yi − β̂0 − β̂1Xi)² …………………………..………………………..2.55

Subject to: β̂0 = 0

Form the composite function for constrained minimization by incorporating λ. The above form becomes

Z = Σ(Yi − β̂0 − β̂1Xi)² + λβ̂0

where λ is a Lagrange multiplier.

We minimize the function with respect to β̂0, β̂1 and λ:

(i) ∂Z/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) + λ = 0
(ii) ∂Z/∂β̂1 = −2ΣXi(Yi − β̂0 − β̂1Xi) = 0
(iii) ∂Z/∂λ = β̂0 = 0

Substituting (iii) in (ii) and rearranging we obtain:

β̂1 = ΣXiYi/ΣXi² ……………………………………..(2.56)

This formula involves the actual values (observations) of the variables and not their deviation forms,
as in the case of the unrestricted estimator of β̂1.

For example, production functions of manufactured products should normally have a zero intercept be-
cause output is zero when the factor inputs are zero.

If labor (L) is the only input and there is a linear relationship between input and output, the specific
functional form is Qi = β1Li + ui. Since without labor input no output can be produced (when L = 0,
output = 0), the model can be estimated using regression without an intercept.
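A minimal sketch (not from the original text) of the restricted formula in (2.56), with hypothetical
labor-output data used purely for illustration:

```python
# Regression through the origin: beta1_hat = sum(X*Y) / sum(X^2), eq. (2.56).
def ols_through_origin(x, y):
    """Restricted OLS slope using raw values, not deviations."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Hypothetical labor-output data for illustration:
L = [1, 2, 3, 4]
Q = [2.1, 3.9, 6.2, 7.8]
print(ols_through_origin(L, Q))  # about 1.99
```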

Practical exercise 2.6


1. A random sample of ten families had the following income and food expenditure (in $ per week)
Table 2.9: Family income and expenditure

Families A B C D E F G H I J
Family income 20 30 33 40 15 13 26 38 35 43
Family expenditure 7 9 8 11 5 4 8 10 9 10
Estimate the regression line of food expenditure on income and interpret your results.
2. The following data refer to the demand for apples (Y) in kg and the price of apples (X) in birr
per kg in 10 different markets.
Table 2.10: Demand for apples and prices by families

Families A B C D E F G H I J
Y 99 91 70 79 60 55 70 101 81 67
X 22 24 23 26 27 24 25 23 22 26
Find

3. Consider the model Yi = β0 + β1Xi + Ui. Show that the OLS estimate of β1 is unbiased.

2.5.2.2. Method of moments (MM)
This estimation method is based on common assumptions of the CLRM. These are:

A. E(ui) = 0
B. Cov(Xi, ui) = E(Xiui) = 0
C. Var(ui) = E(ui²) = σ² ……………………………………………… 2.57

The simple linear regression model has three unknown parameters: β0, β1 and σ². To estimate β0 and β1
we need two equations. These can be derived from the first two premises/assumptions.

Assumption 1 states that the errors average out to zero in the population: E(ui) = 0.

Assumption 2, Cov(Xi, ui) = 0 or E(Xiui) = 0, states that Xi and ui are uncorrelated in the population.

A. First normal equation
Assuming similar conditions to be satisfied in the sample, we can derive the sample estimates. More
specifically, let β̂0 (beta hat) denote the sample estimate of β0 and β̂1 the estimate of β1. Then the
estimate of the error term is obtained as

ûi = Yi − β̂0 − β̂1Xi …………………………………………………… 2.58

The sample analogue of the first condition is (1/n)Σûi = 0; that is, the sample mean of ûi is zero. Thus,
we obtain the following condition:

(1/n)Σ(Yi − β̂0 − β̂1Xi) = 0 …………………………………………………..……..2.59

Taking the summation term by term, and noting that (1/n)ΣYi = Ȳ and (1/n)ΣXi = X̄, we get the
following equation:

Ȳ = β̂0 + β̂1X̄ …………………………………………. 2.60

The straight line Ŷ = β̂0 + β̂1X is the estimated line and is known as the sample regression line
or the fitted straight line. We see from equation 2.60 above that the sample regression line passes
through the mean point (X̄, Ȳ).

B. Second equation

The second condition for obtaining β̂0 and β̂1 is provided by the assumption that E(Xiui) = 0. The
sample statistic corresponding to E(Xiui) is

(1/n)ΣXiûi = (1/n)ΣXi(Yi − β̂0 − β̂1Xi) …………………………………………2.61

Setting this to zero we obtain the condition

(1/n)ΣXi(Yi − β̂0 − β̂1Xi) = 0 ……………………….2.62

Taking this summation term by term, noting that β̂0 and β̂1 can be taken out of the summations, and
cancelling n, we get

ΣXiYi = β̂0ΣXi + β̂1ΣXi² …………………………… 2.63

Equations 2.60 and 2.63 are known as the normal equations (no connection with the normal dis-
tribution). We can proceed to the actual calculation and solution for β̂0 and β̂1.

From the first normal equation we can get the value

β̂0 = Ȳ − β̂1X̄

Plugging this value into the second normal equation and rearranging gives

β̂1 = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) = Σxiyi/Σxi² ……..…………………………………..2.64

From equation 2.60 we can then recover β̂0 = Ȳ − β̂1X̄, so the method of moments reproduces the OLS
estimators.
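As an illustrative check (not from the original text), the fitted residuals from the earlier consumption
example can be verified to satisfy the two sample moment conditions used above:

```python
# Check the sample moment conditions: sum(u_hat) = 0 and sum(X*u_hat) = 0.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
b0, b1 = 24.4545, 0.5091   # estimates from the earlier consumption example
u = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
print(sum(u))                             # approximately 0 (up to rounding)
print(sum(x * e for x, e in zip(X, u)))   # approximately 0 (up to rounding)
```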

Practical Exercise 2.7


1. What are the building blocks of the method of moments estimation?
2. What are the basic differences between OLS and the method of moments?

2.5.2.3. Maximum likelihood principle

Maximum likelihood (ML) is a method of point estimation with some stronger theoretical
properties than OLS. It is based on the idea that different populations generate different samples, and
any given sample is more likely to have come from some populations than from others. The principle
of maximum likelihood rests on the intuitive notion that "an event occurred because it was most
likely to". In other words, we compute the probability (called the likelihood in the context of
estimation) of the observed outcomes under the different alternatives under consideration, and choose
the alternative for which the probability of the observed sample is the maximum. The belief is that the
observed sample values are more likely to have come from this population than from any other.

Steps for deriving maximum likelihood estimators are straightforward:

• First, define n independent random sample values Y1, Y2, …, Yn and their corresponding
probabilities.
• Second, set up the likelihood function relating the joint density function of the observations to
the unknown parameters as the product of the probabilities.
• Third, maximize the function by partially differentiating the logarithm of the likelihood func-
tion with respect to each unknown parameter and setting the result to zero.
• Fourth, solve the resulting first order conditions for the ML estimators.

More specifically, the general principle of maximum likelihood is as follows. Let Yi be a random
variable whose probability distribution depends on the unknown parameter θ. A random sample of
independent observations Y1, Y2, Y3, …, Yn is drawn with corresponding probabilities f(Y1; θ),
f(Y2; θ), …, f(Yn; θ). The likelihood function is denoted by L(θ; Y1, Y2, …, Yn). The functional form of
the likelihood function is the joint density of the sample, i.e., the product of the probabilities. It can be
written as

L(θ) = f(Y1; θ)·f(Y2; θ)·…·f(Yn; θ) …………………....….……..2.66

L(θ) = ∏ f(Yi; θ) ………………..….…...…..2.67

The ML estimator maximizes the likelihood function, L, which is the product of the individual probabil-
ities taken over the entire n observations. If the possible values of θ are discrete, the procedure is
to evaluate L(θ) for each possible value under consideration and choose the value for which
L is the maximum. If L(θ) is differentiable, maximize it over the range of permissible values of θ.
This can be done using the first and second order conditions

dL(θ)/dθ = 0 ………………………………………..2.68

Let us extend the concept to parameter estimation in the simple linear regression developed from the
original equation and its error term.

Step-1: Assume observations (X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn) are an independently drawn
sample, and there is a relationship between Xi and Yi specified as Yi = β0 + β1Xi + ui, where β0, β1 and
σ² are not known. If the ui are normally and independently distributed with mean 0 and variance σ²,
then Yi is normally and independently distributed as Yi ~ N(β0 + β1Xi, σ²), and the likelihood function
is defined as

L(β0, β1, σ²) = ∏ f(Yi; β0 + β1Xi, σ²) ……2.69

Since our central variable is the error term, in the sense that we try to minimize the errors and increase
precision, we relate Yi to ui. Corresponding to each Yi value there is an error term ui = Yi − β0 − β1Xi,
where

ui ~ N(0, σ²) …………………….……………………………….………….2.70

Then the density function, or likelihood function, or joint density function of u1, u2, …, un is de-
noted L(β0, β1, σ²), which can be written as

L(β0, β1, σ²) = ∏ (1/(σ√(2π))) exp(−ui²/(2σ²)) ……………….…………………2.71

L(β0, β1, σ²) = (1/((2π)^(n/2)σⁿ)) exp(−Σ(Yi − β0 − β1Xi)²/(2σ²)) ……………….…………..2.72

Our aim is to maximize this likelihood function L with respect to the parameters β0, β1 and σ². The ML
estimator of a parameter is the value which would most likely generate the observed sample
observations Y1, Y2, …, Yn.

To do this, it is more convenient to maximize the logarithm of the likelihood function, which is
equivalent to maximizing L because the logarithm is a monotonically increasing transformation (that
is, if a > b, then ln(a) > ln(b)). The function is

lnL = −(n/2)ln(2π) − (n/2)lnσ² − (1/(2σ²))Σ(Yi − β0 − β1Xi)² …………………………….2.73

The only place where β0 and β1 appear is in the SSE. Therefore, maximizing lnL is equivalent to mini-
mizing SSE, as there is a negative sign before the SSE term, which gives the least squares estimators.
Therefore, the least squares estimators are also MLE provided the ui's are identically and independently
distributed as N(0, σ²).

Differentiating the equation above with respect to β0, β1 and σ², we obtain the following equations:

∂lnL/∂β0 = (1/σ²)Σ(Yi − β0 − β1Xi) ………..……..2.74

∂lnL/∂β1 = (1/σ²)ΣXi(Yi − β0 − β1Xi) ………...……2.75

∂lnL/∂σ² = −(n/(2σ²)) + (1/(2σ⁴))Σ(Yi − β0 − β1Xi)² ……….……..2.76

Recall the rules of derivatives for power and logarithmic functions:

• For any number n, the derivative of the power function xⁿ is nxⁿ⁻¹.
• For bases other than e (i.e., log_b x), the rule is d(log_b x)/dx = 1/(x ln b).
• If b = e then d(ln x)/dx = 1/x.

Setting these equations equal to zero (the first order condition for optimization) and letting β̃0, β̃1 and σ̃²
denote the ML estimators, we obtain:

Σ(Yi − β̃0 − β̃1Xi) = 0 ………………………………….2.77

ΣXi(Yi − β̃0 − β̃1Xi) = 0 ………………...………………2.78

−(n/(2σ̃²)) + (1/(2σ̃⁴))Σ(Yi − β̃0 − β̃1Xi)² = 0 ……………………..………2.79

Upon rearrangement of equations 2.77 and 2.78 we get the normal equations 2.80 and 2.81:

ΣYi = nβ̃0 + β̃1ΣXi ………………………………………….……….2.80

ΣXiYi = β̃0ΣXi + β̃1ΣXi² ……………………………………….……..2.81
After simplifying we obtain estimator formulas similar to those derived using OLS, as in equations
2.82 and 2.83. Therefore, the ML estimators of the parameters are identical to the OLS estimators:

β̃1 = Σxiyi/Σxi² …………………………...…2.82

β̃0 = Ȳ − β̃1X̄ ………………………………………….……2.83

To obtain the MLE of σ², we differentiate lnL partially with respect to σ² and set the result to zero:

∂lnL/∂σ² = −(n/(2σ²)) + (1/(2σ⁴))Σ(Yi − β0 − β1Xi)² = 0 ……….….….2.84

Rearranging the equation, we then have n/(2σ²) = SSE/(2σ⁴), where SSE = Σ(Yi − β0 − β1Xi)². Solving
this for σ² we get σ² = SSE/n, but SSE depends on β0 and β1, so we use their estimates β̃0 and β̃1. We
therefore get the MLE of the variance of ui as

σ̃² = Σ(Yi − β̃0 − β̃1Xi)²/n = Σûi²/n ……………………………………………………2.85

From the above equation the ML estimator σ̃² = Σûi²/n differs from the OLS estimator Σûi²/(n − 2),
which was shown to be an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The
magnitude of the bias can easily be determined by taking the mathematical expectation of σ̃²:

E(σ̃²) = ((n − 2)/n)σ² = σ² − (2/n)σ² ………………………………………….2.86

which shows that σ̃² is biased downward (i.e., it underestimates the true σ²) in small samples. But
notice that as n (the sample size) increases indefinitely, the second term in (2.86), the bias factor,
tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃² is unbiased too. That is,
E(σ̃²) → σ² as n → ∞. It can further be proved that σ̃² is also a consistent estimator; that is, as n
increases indefinitely, σ̃² converges to its true value σ². Because ML esti-
mators are consistent and asymptotically efficient, so are the OLS estimators.
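A small numeric sketch (illustrative, not from the original) contrasting the two variance estimators on
the earlier consumption example:

```python
# ML variance estimator RSS/n versus the unbiased estimator RSS/(n-2).
rss = 337.27   # residual sum of squares from the consumption example
n = 10
sigma2_ml = rss / n          # about 33.7: biased downward in small samples
sigma2_ols = rss / (n - 2)   # about 42.2: unbiased
print(sigma2_ml, sigma2_ols)
# As n grows, the bias factor 2*sigma^2/n vanishes and the two agree.
```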

Practical Exercise 2.8


1. What similarities and differences exist between the estimator formulas of OLS and the maximum
likelihood method?
2. How do you reconcile the difference between the OLS and maximum likelihood variance esti-
mators? Which one is biased and which is unbiased? What happens as the sample size
increases?

2.6. Evaluation of estimates: Statistical Properties of Least Squares Estimators

We divide the available criteria into three groups: the theoretical a priori criteria, the statistical criteria,
and the econometric criteria.

2.6.1. Theoretical a priori criteria (Economic criteria)


In evaluating the estimates of the model we should take into consideration the signs and magnitudes
of the estimated coefficients. For example, suppose we have the consumption function

Ct = β0 + β1Yt + ut, where Ct is consumption expenditure and Yt is income. We would like to question the

adequacy of the fitted model. How "good" is the fitted model? Are the signs of the estimated coeffi-
cients in accordance with theoretical or prior expectations? We need some criteria with which to answer
these questions.

In regression analysis priority should be given to the fulfillment of the economic a priori criteria (sign
and size of the estimates). Only when the economic criteria are satisfied should one proceed with the
application of first-order and second-order tests of significance. This stage consists of deciding whether
the estimates of the parameters are theoretically meaningful, i.e., accord with the expectations of the
theory, and statistically satisfactory. It is all about checking whether the result obtained is in line with
theoretical or conceptual arguments or previous empirical studies. If not, there should be a logical
justification for the result obtained.

On the basis of the a priori economic criteria, the sign of β1, which represents the marginal propensity
to consume, has to be positive and its magnitude (size) should lie between zero and one (0 < β1 < 1). If
the sign and magnitude of the parameter do not conform to the economic relationship between the
variables as explained by economic theory, the model will be rejected, because the a priori theoretical
criteria are not satisfied by the estimates, and the estimates should be considered unsatisfactory. But if
there is a good reason to accept the model, then the reason should be clearly stated.

Given sample data collected from the population, suppose the estimated slope of the above consumption
function is β̂1 = 0.203. The result says that if your income increases by 1 birr, your consumption will
increase on average by less than one birr, i.e., by 0.203 birr. The value of β̂1 is then less than one and
greater than zero, which is in line with economic theory and satisfies the a priori economic criterion.

But if estimation of the same model using other data gives a result in which the sign of β̂1 is negative
or its magnitude is greater than one, the estimated results contradict or do not conform to economic
theory, and we reject the model in line with the data or subject it to further investigation. In most cases,
deficiencies of the empirical data utilized for the estimation of the model are responsible for the
occurrence of a wrong sign and/or size of the estimated parameters. The deficiency of the empirical
data can be due to problems of the sampling procedure, an unrepresentative sample of observations
from the population, inadequate data, or violation of some assumption of the method employed.

Practical exercise 2.9


Assume that you have collected data from a certain social group and the regression result found is as follows:

Evaluate the result using the economic criteria.

2.6.2. The econometric criteria: Properties of OLS Estimators
These criteria evaluate the estimated result in terms of the desired properties of the OLS estimators, the
distribution of the dependent variable (Y), and the distribution of the random term as a measure of the
precision of the estimators.

2.6.2.1. Desired properties of OLS estimators


The ability to estimate the slope and intercept of economic relations is tremendously useful. It is
even more useful if we know something about their reliability and relationship to the true unknown
values. The estimates are random variables and are not guaranteed to take any particular values. We can
talk about what values they are likely to take and about their reliability if we understand the distribution
function of the estimates. For instance, once we have derived the estimators β̂0 and β̂1 (sample
estimates) of the parameters of our model, we should test their reliability and relationship to the true
unknown values. Our objective is to find an estimate which is as close as possible, or fits, the value of
our population parameter; closeness is a measure of goodness of fit. We need some criteria for judging
the 'goodness' of an estimate. How are we to choose, among the different econometric methods, the one
that gives 'good' estimates?

From their statistical properties we can get some measure of how dependable they are in
economic applications. This can be done using certain defined econometric criteria or properties of
OLS. Closeness of the estimate to the population parameter is measured by the mean and variance or
standard deviation of the sampling distribution of the estimates of the different econometric methods.
A good estimate should have the properties of unbiasedness, consistency, efficiency and sufficiency, or
a combination of such properties. If one method gives an estimate which possesses more of these
desirable characteristics than the estimates from other methods, then that technique will be selected.
The ideal or optimum properties that the OLS estimates should possess may be summarized by the
well-known Gauss-Markov theorem. The theorem states: "Given the assumptions of the classical
linear regression model, the OLS estimators are, in the class of linear and unbiased estimators, those
with minimum variance." The theorem is sometimes referred to as the BLUE (Best, Linear, Unbiased
Estimator) theorem. The detailed proof and logical framework are given in the appendix; the core idea
can be summarized as follows.
a. Linear: a linear function of a random variable, such as the dependent variable Y.
An estimator is linear if the equation that defines it is linear with respect to the dependent variable
Y. Otherwise it is non-linear.

Proposition: The least squares estimator is linear because we can rewrite the formula for the least
squares estimator in linear form as β̂1 = Σxiyi/Σxi² = ΣkiYi, where ki = xi/Σxi² are weights which do
not depend on Y (see the illustrative check below).
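As an illustrative check (not in the original text), the weights ki can be computed from the consumption
data and shown to reproduce β̂1 as a weighted sum of the Yi, with weights built only from X:

```python
# beta1_hat as the weighted sum sum(k_i * Y_i) with k_i = x_i / sum(x_i^2).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
x_bar = sum(X) / len(X)
x = [xi - x_bar for xi in X]
s_xx = sum(v ** 2 for v in x)
k = [xi / s_xx for xi in x]                # fixed weights, no Y involved
b1 = sum(ki * yi for ki, yi in zip(k, Y))  # 0.5091, same as before
print(b1)
```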

b. Unbiased: its average or expected value is equal to the true population parameter. An estimator
is unbiased if its expected value is equal to the true value of the parameter it estimates; otherwise it is
biased. Algebraically, E(β̂0) = β0 and E(β̂1) = β1.

An estimator is clearly desirable, or unbiased, if on average the estimated values equal the true
value, even though in a particular sample this may not be so. Put another way, if an estimator β̂
is unbiased then the expected difference between β̂ and β is zero; the estimates do not tend
to be too high or too low. If it is biased, then it tends to give an answer that is either too high (if
the expected difference is positive) or too low (if the expected difference is negative).

c. Minimum variance: an estimator has minimum variance in the class of linear and unbiased estima-
tors if, out of the class of linear and unbiased estimators of β0 and β1, the estimators β̂0 and β̂1 possess
the smallest sampling variances. For this, we first obtain the variances of β̂0 and β̂1,

var(β̂1) = σ²/Σxi² and var(β̂0) = σ²ΣXi²/(nΣxi²),

and then establish that each has the minimum variance in comparison with other linear and unbiased
estimators obtained by econometric methods other than OLS. The estimator with minimum variance is
said to be an efficient estimator.

The argument for the slope runs as follows. Any other linear unbiased estimator of β1 can be written as
β1* = ΣwiYi with weights wi = ki + ci, where the ci are the deviations of the alternative weights from
the OLS weights ki. Unbiasedness requires Σci = 0 and ΣciXi = 0, from which it follows that Σkici = 0,
and hence

var(β1*) = σ²Σwi² = σ²Σki² + σ²Σci² = var(β̂1) + σ²Σci² ……………………..………..2.89

Since σ²Σci² ≥ 0, var(β1*) ≥ var(β̂1). An analogous argument applies to β̂0, where the additional term
involves σ²X̄²Σci² > 0. Therefore, the variance of the OLS estimates is the minimum.

2.6.2.2. Normality assumption, and probability distribution of disturbance term, u i

The least squares estimates (for instance β̂0 and β̂1) are obtained from a sample of observations on Y
and X. Since the method of OLS does not make any assumption about the probability distribution or
nature of ui, it is of little help for the purpose of drawing inferences about the PRF based on the SRF.
Besides, sampling errors are inevitable in all estimates, so it is necessary to apply tests of significance in
order to measure the size of the error and determine the degree of confidence in the validity of the estim-
ates. This gap can be filled if we make an assumption about the probability distribution of the error
term, ui. The error term should follow some probability distribution, more specifically the normal
distribution. Adding the normality assumption for ui to the assumptions of the classical linear regression
model (CLRM) yields what is known as the classical normal linear regression
model (CNLRM). Therefore, the probability distributions of β̂0 and β̂1 will depend on the assumption
made about the distribution of ui.

We use this assumption to test the significance of the estimators. Hence, before we discuss various test-
ing methods, it is important to see whether the parameters are normally distributed or not.
Recall our classical linear regression model assumptions related to the error term ui:

a. E(ui) = 0

b. Var(ui) = σ²

c. Cov(ui, uj) = 0 for i ≠ j

When the normality assumption is added to the above assumptions (a and b), we can write the error
term as normally distributed with mean zero and variance σ², which can be compactly written as
ui ~ N(0, σ²)

where the symbol ~ means "distributed as" and N stands for the normal distribution, the terms in the
parentheses representing the two parameters of the normal distribution, namely the mean and the vari-
ance. When two normally distributed variables are uncorrelated, meaning independently distributed,
their covariance is zero. Therefore, we can write the errors as normally and independently distributed
(NID), specified as

ui ~ NID(0, σ²) ……………………………………………………….2.90

Some of the reasons for which we employ the normality assumption are:

A. As pointed out earlier, ui represents the combined effect of all other independent variables
that are not explicitly introduced in the model. The central limit theorem (CLT) provides a theoreti-
cal justification for the assumption of normality of ui. The CLT argues that as the number of obser-
vations increases indefinitely, the distribution of the error term tends to a normal distribution. A
variant of the CLT states that, even if the number of variables is not very large or these
variables are not strictly independent, their sum may still be normally distributed.

B. With the normality assumption, the probability distributions of the OLS estimators can be easily
derived, as any linear function of normally distributed variables is itself normally distributed.
The OLS estimators are linear functions of the ei, which makes our task of hypothesis test-
ing very straightforward.

C. The normal distribution is a comparatively simple distribution involving only two
parameters (mean and variance), and its theoretical properties have been extensively studied
in mathematical statistics.

D. If we are dealing with a small or finite sample size (say, data of fewer than 100 observations),
the normality assumption plays a critical role. It not only helps us to derive the exact prob-
ability distributions of the OLS estimators but also enables us to use the t, F and χ² statisti-
cal tests for regression models.

The important point to note is that the normality assumption enables us to derive the probability, or
sampling, distributions of β̂0 and β̂1 (both normal) and of σ̂² (related to the chi-square distribution).
This simplifies the task of establishing confidence intervals and testing (statistical) hypotheses.

Practical Exercise 2.10

1. Why is the normality assumption so important?

2.6.2.3. Distribution of dependent variable (Y) under normality assumption


With the assumption that ui ~ N(0, σ²), Yi, being a linear function of ui, is also normally distributed
with mean and variance given by

E(Yi) = β0 + β1Xi, var(Yi) = σ² ……………………2.91

One can further elaborate the distribution by deriving the mean and variance of Yi.
I. The mean of the dependent variable is:

E(Yi) = E(β0 + β1Xi + ui) …………………..…………………….……………….2.92

Given Yi = β0 + β1Xi + ui, taking the expected value we find, by assumption, E(ui) = 0. The Xi's are
assumed to be fixed values in repeated sampling, so the covariance of the fixed values Xi and the ui is
zero; that is, Cov(Xi, ui) = 0 and E(β0 + β1Xi) = β0 + β1Xi. Then

E(Yi) = β0 + β1Xi ------------------------------------------------------------------- (2.93)

II. Variance of the dependent variable
The variance of the dependent variable is given by

var(Yi) = E(Yi − E(Yi))² ---------------------------------------------- (2.94)

Substituting Yi = β0 + β1Xi + ui and E(Yi) = β0 + β1Xi into the definition of the variance gives

var(Yi) = E(β0 + β1Xi + ui − β0 − β1Xi)² = E(ui²) = σ² ---------------------(2.95)

From our CLRM assumptions the variance of ui is constant (homoscedastic), or var(ui) = σ². Therefore,
the distribution of Yi is normal, as the shape of the distribution of Yi is determined by the shape of the
distribution of ui, which is normal. Clearly, β0 and β1, being constants, do not affect the distribution of
Yi. Furthermore, the Xi's are a set of constant values and hence do not affect the distribution of Yi.
Thus, the distribution of Y will be

Yi ~ N(β0 + β1Xi, σ²) ----------------------------------------------------------------- (2.96)

2.6.2.4. Normality assumption, and corresponding properties of OLS estimates


In regression analysis our objective is not only to estimate the sample regression function (SRF), but
also to draw inferences about the population regression function (PRF). Since the values of these
estimators change from sample to sample, the estimators are random variables. As β̂0, β̂1 and σ̂² are
random variables, we would like to know how close β̂0 is to β0, β̂1 to β1, and σ̂² to σ², along with the
probability distributions relating the sample values to their true values. The OLS estimates of β0 and β1
are obtained from a sample of observations on Y and X.

1. Linear function

We need to make use of the property of the normal distribution which says "... any linear function of a
normally distributed variable is itself normally distributed." Consider β̂1 and its formula derived so far:

β̂1 = Σxiyi/Σxi² = ΣkiYi, with ki = xi/Σxi² …………..…………………………2.97

Therefore, Eq. (2.97) shows that β̂1 is a linear function of Yi, which is random by assumption. Since
Yi = β0 + β1Xi + ui, (2.97) can be written as

β̂1 = Σki(β0 + β1Xi + ui) = β1 + Σkiui ……………………………2.98
Because the ki, the betas, and the Xi are all fixed, β̂1 is ultimately a linear function of the random
variable ui, which is random by assumption.

Similar logic holds for β̂0, which is also linear in Y. We have established that β̂0 = Ȳ − β̂1X̄.
Substituting β̂1 = ΣkiYi and Ȳ = (1/n)ΣYi, and taking Yi as a common factor, we may write

β̂0 = Σ(1/n − X̄ki)Yi …………..2.99

We proceed to find the probability distributions of the OLS estimators:

2. They are normally distributed, being linear functions of ui. That is,

β̂1 ~ N(β1, var(β̂1)), with var(β̂1) = σ²/Σxi²

and, more compactly,

β̂0 ~ N(β0, var(β̂0)), with var(β̂0) = σ²ΣXi²/(nΣxi²)

3. The joint distribution of β̂0 and β̂1 is bivariate normal; they are not in general independent, their
dependence being measured by the covariance cov(β̂0, β̂1) (see equation 2.131 below).
Practical Exercise 2.11

1. Suppose σ² is the population variance of the error term and σ̂² is an estimator of σ². Show that
the maximum likelihood σ̂² is a biased estimator of the true σ² for the model Yi = α + βXi + Ui.
2. In the model Yi = α + βXi + Ui, show that α̂ = Ȳ − β̂X̄ possesses minimum variance.
3. Using the assumptions of the simple regression model, show that
a. Y ~ N(α + βXi, σ²)
b. Cov(α̂, β̂) = −X̄ Var(β̂)
4. For the model Yi = α + βXi + Ui

2.6.3. Statistical test of the OLS Estimators (First Order tests)


Thus far we have been concerned with the estimation of regression coefficients, the determination of
the least squares regression line, their standard errors, and some of their properties. Next, we need to
judge our model using statistical criteria. If the model passes the a priori economic criteria, the
reliability of the estimates of the parameters is evaluated using statistical criteria (first order tests).
These are determined by statistical theory and aim at the evaluation of the statistical reliability of the
estimates of the parameters of the model. The most commonly used first order tests in statistical
analysis are:
i. The coefficient of determination (the square of the correlation coefficient, i.e., R²), or the good-
ness of fit. This test is used for judging the explanatory power of the independent variable(s).
ii. The standard error test of the estimators (S.E.). This test is used for judging the statistical relia-
bility of the estimates of the regression coefficients. It measures the dispersion of the sample
estimates around the true population parameters. The lower the S.E., the higher the reliability
(the sample estimates are closer to the population parameters) of the estimates, and vice versa.
iii. The t-ratio or t-test of the estimates: since the estimated values are obtained from a sample of obser-
vations taken from the population, statistical tests of the estimated values help to find
out how accurate these estimates are (how accurately they explain the population).
The two most commonly used measures in statistical analysis are r² and the standard errors of the
parameter estimates.

2.6.3.1. Tests of the 'goodness of fit' with R²

The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary
measure that tells how well the sample regression line fits the data. The r² shows the percentage of the
total variation of the dependent variable explained by changes in the explanatory variable(s). If all the
observations were to lie on the regression line, we would obtain a "perfect" fit, but this is rarely the
case. The closer the observations are to the line, the better the goodness of fit. To elaborate this, let us
draw a horizontal line corresponding to the mean value of the dependent variable, Ȳ (see figure 2.8 below).

Figure 2.8: Decomposing the variation of the dependent variable (Y)


Steps in deriving r²
Step-1: Rewrite the sample regression function and the predicted regression equation

Yi = β̂0 + β̂1Xi + ei ……………………………………………………2.100

Ŷi = β̂0 + β̂1Xi …………………………………………………………2.101

Then one can write the above equations as
⇒ Y = Ŷ + ei ……………………………………….………………………(2.102)
⇒ ei = Yi − Ŷ ……………………………………………….………………(2.103)

Step-2: Find the mean of Y
Summing (2.102) will result in

ΣYi = ΣŶi, since Σei = 0

Dividing both sides of the above equation by 'n' will give us

ΣY/n = ΣŶi/n, i.e., Ȳ = Ŷ̄ …………………….…………………..(2.104)

Step-3: Find the deviations from the mean of Y
Subtracting (2.104) from (2.102) we get
Y = Ŷ + e
Ȳ = Ŷ̄
⇒ (Y − Ȳ) = (Ŷ − Ŷ̄) + e …………………………..………………….(2.105)

Let:
ei = Yi − Ŷ = the deviation of the observation Yi from the regression line;
yi = Y − Ȳ = the deviation of Y from its mean, i.e., the total variation in Y;
ŷ = Ŷ − Ȳ = the deviation of the regressed (predicted) value Ŷ from the mean, the part of the
variation in Y that can be attributed to the influence of X; the remainder, Y − Ŷ, is the residual
variation (see the figure above).

⇒ yi = ŷi + e ………………………….………………..…………(2.106)

Step-4: Find the sum of squared deviations
Squaring and summing both sides, we obtain the following expression:

Σyi² = Σ(ŷi + ei)² = Σŷi² + Σei² + 2Σŷiei

But Σŷiei = 0, since the residuals are uncorrelated with the fitted values. Therefore,

Σyi² = Σŷi² + Σei²
(total variation) = (explained variation) + (unexplained variation) …………………………….…………...(2.107)

i.e., TSS = ESS + RSS …………………………………………...(2.108)


A. Total variation (TSS)
We can compute the total variation in Y as

TSS = Σyi² = Σ(Yi − Ȳ)²

It is also possible to define the deviations of the regressed values (estimated from the line), the Ŷi's,
from the mean value as ŷi = (Ŷi − Ȳ).

B. Explained variation (ESS)
This is the part of the total variation of Yi which is explained by the regression line. Hence, the sum of
the squared deviations in the total variation of the dependent variable explained by the explanatory
variable is expressed as

ESS = Σŷi² = Σ(Ŷi − Ȳ)²

C. Unexplained variation (RSS)
This is the part of the variation of the dependent variable which is not explained by the regression line
and is attributed to the existence of the disturbance term. We have also defined the residual ei as the
difference between Yi and Ŷi. Hence, the sum of the squared residuals (RSS) gives the total unexplained
variation of the dependent variable Y around its mean. That is,

RSS = Σei² = Σ(Yi − Ŷi)²

Such a decomposition of the total variation in Y into a fitted part and an unexplained part is only
possible for OLS. Even though common sense suggests that the total variation in Y should always be
the sum of the explained and unexplained variation, the decomposition does not hold for methods of
estimation other than OLS. The three sums of squares computed from our supply function are shown
in Table 2.11 below.
Table 2.11: The computation of sum of squares from the supply function

Ŷi = β̂0 + β̂1Xi (1)	ei = Yi − Ŷi (2)	ei² (3)	yi² = (Yi − Ȳ)² (4)	ŷi = Ŷi − Ȳ (5)	ŷi² (6)	ŷiei (7)=(2)×(5)
63	6	36	36	0	0	0
72.75	3.25	10.5625	169	9.75	95.0625	31.6875
53.25	-1.25	1.5625	121	-9.75	95.0625	12.1875
66.25	-10.25	105.0625	49	3.25	10.5625	-33.3125
63	-6	36	36	0	0	0
66.25	10.75	115.5625	196	3.25	10.5625	34.9375
56.5	1.5	2.25	25	-6.5	42.25	-9.75
59.75	-4.75	22.5625	64	-3.25	10.5625	15.4375
72.75	-5.75	33.0625	16	9.75	95.0625	-56.0625
53.25	-0.25	0.0625	100	-9.75	95.0625	2.4375
69.5	2.5	6.25	81	6.5	42.25	16.25
59.75	4.25	18.0625	1	-3.25	10.5625	-13.8125
ΣŶi = 756	Σei = 0	Σei² = 387 (RSS)	Σyi² = 894 (TSS)	Σŷi = 0	Σŷi² = 507 (ESS)	Σŷiei = 0

It can be seen from this table that the mean of the Ŷi is 63, equal to Ȳ, the mean of Y, and that the residuals average to zero.

Then the explained variation (goodness of fit), or r², is expressed as a percentage (ratio) of the total
variation explained by the model. Mathematically, the explained variation as a percentage of the total
variation is expressed as:

ESS/TSS = Σŷ²/Σy² ………………………………….…………….(2.109)

But ŷ = β̂1x (in deviation form, the fitted line is ŷi = β̂1xi). Squaring and summing both sides gives us

Σŷ² = β̂1²Σx² ……………………………………(2.110)

We can substitute (2.110) in (2.109) and obtain:

ESS/TSS = β̂1²Σx²/Σy² ………………………………..……(2.111)

= (Σxy/Σx²)²·(Σx²/Σy²), since β̂1 = Σxiyi/Σxi²

= (Σxy/Σx²)·(Σxy/Σy²) ………………………………….………(2.112)

Comparing (2.112) with the formula of the correlation coefficient

r = Cov(X, Y)/(σxσy) = (Σxy/n)/(σxσy) = Σxy/(Σx²Σy²)^(1/2) ………(2.113)

and squaring (2.113), which results in

r² = (Σxy)²/(Σx²Σy²) ………….(2.114)

we see exactly the same expressions in (2.112) and (2.114). Therefore, r² = ESS/TSS.

Further, we can derive the above relationship in another form:

Σyi²/Σyi² = Σŷi²/Σyi² + Σei²/Σyi² ---------------------------------------------------- (2.115)

TSS/TSS = ESS/TSS + RSS/TSS

Alternatively, from (2.108), ESS = TSS − RSS, and R² becomes

R² = (TSS − RSS)/TSS = 1 − RSS/TSS = 1 − Σei²/Σy² ………………………….…………(2.116)

From equation (2.116) we can derive Σei² = (1 − r²)Σyi².

Interpretation of R²
r² measures the proportion of the variation in Y explained by the variation in X. It is sometimes
known as the coefficient of determination. For example, if r² = 0.90, this means that the regression
line explains 90% of the total variation in Y; the remaining 10%, unaccounted for by the regression
line, is attributed to factors included in the disturbance term. Using our data in Table 2.11 on the values
of ESS and TSS, one can compute the coefficient of determination (goodness of fit) for the supply func-
tion. The TSS and RSS are 894 and 387 respectively. Then

r² = 1 − 387/894 = 1 − 0.4329 = 0.5671

which indicates that 57% of the variation in Y (supply) is explained by the variation in X (price), while
43% of the variation in Y is attributed to changes in other factors not included in our regression
function, which are captured by the disturbance term.
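For completeness, the same figure can be reproduced from the table sums in one short illustrative
snippet (not in the original):

```python
# Goodness of fit for the supply function from the sums in Table 2.11.
tss = 894.0   # total sum of squares
rss = 387.0   # residual (unexplained) sum of squares
ess = tss - rss        # 507, the explained sum of squares
r2 = 1 - rss / tss     # 0.5671
print(ess, round(r2, 4))
```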

If the regression has no constant term, i.e., if β̂0 = 0, the sum of squares decomposition given
above no longer holds; that is, TSS ≠ ESS + RSS. The value of R² obtained as 1 − Σei²/ΣYi² is then not
comparable to the R² obtained earlier because the denominators are different. A third way of expressing
r², which establishes the relationship between r² and β̂1, starts from the original expression

r² = β̂1·(Σxiyi/Σyi²)

Recall that their values in our supply function are Σxiyi = 156 and Σxi² = 48. Therefore,
156/48 = 3.25, which is also the value of β̂1. Therefore,

r² = β̂1·(Σxiyi)/(Σyi²) = 3.25·(156/894) = 0.5671

The limits of R²
It is a nonnegative quantity whose limits are between zero and one, i.e., 0 ≤ r² ≤ 1. An r² of 1 indicates a
perfect fit, that is, Ŷi = Yi for each i. It implies that the variation in Y is perfectly explained by changes
in the explanatory variable (X), i.e., Yi = β̂0 + β̂1Xi exactly; it also indicates that Σei²/Σyi² = 0.
However, this rarely happens in reality, for there are a number of factors that cause a given variable to
change. On the other hand, an r² of zero means that there is no linear relationship between the
regressand and the regressor whatsoever (i.e., β̂1 = 0). It implies that the variation in Y is not at all
explained by the explanatory variable (the regression line Ŷi = β̂0 + β̂1Xi). This in turn implies that
Σei²/Σyi² = 1, or Σei² = Σyi².

2.6.3.2. The correlation coefficient (r)

A quantity closely related to, but conceptually very different from, r² is the coefficient of correla-
tion, which is a measure of the degree of linear association between two variables. It simply measures
the extent to which the variables move together and says nothing about causation: nothing can be
inferred about the direction of causation between the variables. The formula for the correlation
coefficient (r) between X and Y is given by

r = Σxiyi/√(Σxi²·Σyi²)

or, in terms of the raw observations,

r = (nΣXiYi − ΣXiΣYi)/√([nΣXi² − (ΣXi)²][nΣYi² − (ΣYi)²])
One can also derive r from the r² derived so far, as

r = ±√r² ---------------------------------------------------------------------------------- (2.119)

If r² = 0.9621 then r = 0.9809. The value of r² of 0.9621 means that about 96 percent of the variation
in weekly consumption expenditure is explained by income. The coefficient of correlation of 0.9809
shows that the two variables, consumption expenditure and income, are highly positively correlated.

Further, the process of obtaining the goodness of fit (r²) by squaring the formula for the correlation
coefficient (r) establishes the relationship between r² and r. Squaring the above equation gives

r² = (Σxiyi)²/((Σxi²)(Σyi²)) = (Σxiyi/Σxi²)·(Σxiyi/Σyi²) ----------- (2.120)

⇒ R² = (rŷ,y)² = (rx,y)²

where rŷ,y and rx,y are the coefficients of correlation between Ŷ and Y, and between X and Y,
respectively. This result holds true even if there are more explanatory variables, provided the
regression has a constant term.
Some of the properties of r are as follows:
1. It is a measure of linear association or linear dependence only; it has no meaning for describing
nonlinear relations like Y = X².
2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ +1. It can be positive, zero or negative, the
sign depending on the sign of the term in the numerator. For example, r = +0.832 between income
and consumption expenditure indicates a strong and direct linear association. This does not necessar-
ily imply any causal link between the two variables.
3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y (rXY) is the same as
that between Y and X (rYX).
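As an illustrative check (not from the original text), r for the consumption-income data of Table 2.5 can
be computed with the deviation-form formula above:

```python
# Correlation coefficient for the consumption-income data.
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4), round(r ** 2, 4))   # approximately 0.9809 and 0.9621
```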

Practical Exercise 2.12


1. What type of relationship exists between R² and r?
2. Compute R² and r using the data given in Table 2.8 and interpret the result.
3. A random sample of ten families had the following income and food expenditure (in $ per
week).

Table 2.12: Ten families' income and expenditure per week

Families	A	B	C	D	E	F	G	H	I	J
Family income	20	30	33	40	15	13	26	38	35	43
Family expenditure	7	9	8	11	5	4	8	10	9	10

Compute and interpret R² and r.

2.6.3.3. Precision of the estimators / the distribution of the random variable

We have seen that the OLS estimates possess a number of nice properties. The next step is to measure
the precision of the estimates. In the estimation above, we assume the usual process of repeated
sampling in which we get a very large number of samples of size 'n' and compute the estimates β̂ from
each sample, given the relevant distribution. Since the data are likely to change from sample to sample,
so do the estimates. We need some measure of the reliability or precision of the estimators. What we
have to do is to compare the mean (expected value) and the variances of these distributions and choose,
among the alternative estimates, the one whose distribution is concentrated around, or as close as
possible to, the population parameter.
a. Variance
We know from the theory of probability that the variance of a random variable is a measure of its
degree of dispersion around the mean. The smaller the variance, the closer, on average, individual
values are to the mean. When dealing with confidence intervals, we know that the smaller the variance
of the random variable, the narrower the confidence interval for a parameter. The variances of the esti-
mators are thus an indicator of the precision of the estimates. It is therefore of considerable interest to
compute the variances of β̂0 and β̂1, as they depend on the Y's, which in turn depend on the random
error term ui. They are also random variables with associated distributions:

var(β̂1) = σ²/Σxi²

var(β̂0) = σ²ΣXi²/(nΣxi²)


2
You may observe that the variances of the OLS estimates involve σ , which is the population variance
of the random disturbance term. It is difficult to obtain the population data of the disturbance term be-

cause of technical and economic reasons. The formula of the variance of involves the

variance of the random term μ , . How do we estimate ? Since we did not have population ,

we estimate it using sample counterpart. That is, we may obtain an unbiased estimate of from the
sample counterpart

-------------------------------------------------------------- (2.124)

Where, , n is number of observation and , k the number of parameters in the


regression function. In the case of simple linear regression model, k =2 (i.e. )
2 Σe2i
^
σ u=
n−2 …………………………………………………..…..…(2.125)

σ^ 2 in the expressions for the variances of , we have to prove whether σ^ is the unbi-
2

To use

2
ased estimator of σ , i.e.,

Since where being the sum of the residual square or residual sum of square ,
is the number of degree of freedom (df). The term number degree of freedom means the total
number of observations in the sample ( n) less the number of independent (linear) constraints or re-
strictions put on them. In other words, it is the number of independent observations out of a total of n

observations. There are two condition given by the normal equation for and from

119
2
Thus, is unbiased estimate of the true variance of the error term (σ ).

Hence the formulas for the variances of β̂0 and β̂1 become:

var(β̂0) = σ̂²ΣXi²/(nΣxi²) …………………..……………………(2.126)

var(β̂1) = σ̂²/Σxi² ………………..……………(2.127)

It is assumed that the expected value of the estimated variance is equal to the true variance, so that σ̂²
is an unbiased estimate of σ²; that is, E(σ̂²) = σ².

The expression for the variance, var(ui) = σ², together with the assumption that the mean of the error
term is zero [E(ui) = 0], gives the assumption that the variable ui has a normal distribution. That is,

ui ~ N(0, σ²) -------------------------------------------------------- (2.128)

Σei² can be computed as Σei² = Σyi² − β̂1Σxiyi.

b. Standard error
The standard error is the standard deviation of the sampling distribution of the estimator, and the sam-
pling distribution of an estimator is simply a probability or frequency distribution of the estimator.
Sampling distributions are used to draw inferences about the values of the population parameters on
the basis of the values of the estimators calculated from one or more samples. In other words, the
variances of β̂0 and β̂1 must first be obtained using OLS; the standard errors are then their positive
square roots:

se(β̂0) = √(σ̂²ΣXi²/(nΣxi²)) …………………..……………(2.129)

se(β̂1) = √(σ̂²/Σxi²) ………………..………(2.130)

The two estimates β̂0 and β̂1 therefore put two restrictions on the RSS, so there are n − 2 indepen-
dent observations with which to compute the RSS. Following this logic, in the three-variable regression
the RSS will have n − 3 df, and for the k-variable model it will have n − k df.

In passing, note that the positive square root of σ̂², i.e., σ̂, is known as the standard error of esti-
mate or the standard error of the regression (se). It is simply the standard deviation of the Y val-
ues about the estimated regression line and is often used as a summary measure of the "goodness
of fit" of the estimated regression line.

c. Example
Referring to Example One above, one can find the variance and standard error estimates:
β̂0 = 24.4545, var(β̂0) = 41.1370 and se(β̂0) = 6.4138
β̂1 = 0.5091, var(β̂1) = 0.0013 and se(β̂1) = 0.0357
cov(β̂0, β̂1) = −0.2172, σ̂² = 42.1591
r² = 0.9621, r = 0.9809, df = 8
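These figures can be verified with a short illustrative computation (not part of the original text):

```python
# Verifying the variance and standard error figures of Example One.
import math

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
b0, b1 = 24.4545, 0.5091
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
sigma2 = rss / (n - 2)                      # approximately 42.1591
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)      # 33000
var_b1 = sigma2 / sxx                       # approximately 0.0013
var_b0 = sigma2 * sum(x * x for x in X) / (n * sxx)   # approximately 41.1370
print(round(sigma2, 4), round(math.sqrt(var_b0), 4), round(math.sqrt(var_b1), 4))
```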

d. Properties of the variances of β̂0 and β̂1

I. The variance of β̂1 is directly proportional to σ² but inversely proportional to Σxi². That is,
given σ², the larger the variation in the X values, the smaller the variance of β̂1 and the
greater the precision with which β1 can be estimated.

II. The variance of β̂0 is directly proportional to σ² and ΣXi² but inversely proportional to Σxi²
and the sample size n.

III. Since β̂0 and β̂1 are estimators, they will not only vary from sample to sample but in a given
sample they are likely to be dependent on each other, this dependence being measured by
the covariance between them:

cov(β̂0, β̂1) = −X̄·var(β̂1) ……………..2.131
There is no general agreement among econometricians as to which of the two statistical criteria, a high
r² or low standard errors of the estimates, is better. Some tend to attribute great importance to r² and ac-
cept the parameter estimates on that basis, while others suggest that acceptance or rejection of the
estimates depends on the aim of the model in the particular situation and on the standard errors. Despite
this controversy, the majority of writers seem to agree that r² is the more important criterion when the
model is to be used for forecasting, while the standard errors acquire greater importance when the
purpose of the research is the explanation and reliability (analysis) of an economic phenomenon. A high
r² has a clear merit only when combined with significant estimates (low standard errors). When a high
r² and low standard errors are not obtained in a particular piece of research, the researcher should be
very careful in his/her interpretation and acceptance of the results. Priority should always be given to
the fulfillment of the economic a priori criteria (sign and size of the estimates); only when the economic
criteria are satisfied should one proceed with the application of first-order and second-order tests of
significance.

2.6.4. Special Distributions

There are many distributions commonly used for statistical inference, but the most common are the
standard normal distribution, the t-distribution, and the chi-square distribution.
A. Normal and standard normal distributions
A random variable which follows the normal distribution has a distribution with mean μ and variance
σ², written X ~ N(μ, σ²).

Such a normal distribution can be standardized, or transformed into the standard normal distribution,
by transforming the variable into the standard normal variable Z with zero mean and unit variance,
Z ~ N(0, 1). This can be done using the formula

Z = (X − μ)/σ …………………..…….2.132

Since the variables of interest here are the OLS estimates of the parameters β0 and β1, they can be
transformed to Z as follows:

Z = (β̂1 − β1)/√var(β̂1) ~ N(0, 1), and similarly for β̂0.
B. The t-distribution
The same logic holds true if the number of observations is less than or equal to 30; the relevant distribu-
tion is then termed the Student's or t-distribution. The procedure for constructing a confidence interval
for the t-distribution is similar to the one outlined for the Z distribution earlier, with the main difference
being that the t-distribution takes into account the degrees of freedom. The t statistic for β̂ is

t = (β̂ − β)/SE(β̂), with n − k degrees of freedom

where SE is the standard error and k is the number of parameters in the model.

For our simple linear regression with two coefficients, the t-distributions of the OLS estimates are as
follows:

t = (β̂0 − β0)/se(β̂0) and t = (β̂1 − β1)/se(β̂1), each with n − 2 df

where se(·) is the estimated standard error. Since the true variances of the estimates, var(β̂0) and
var(β̂1), are unknown, we use the unbiased estimate σ̂² = Σûi²/(n − K) (K is the number of parameters
estimated) and obtain estimates of the standard errors of the coefficients, se(β̂0) and se(β̂1).

Clearly, as n increases, the t-distribution approaches the standard normal distribution Z ~ N(0, 1).

Figure 2.9: Tendency of the t-distribution toward the Z (standard normal) distribution


F. Chi-square distribution
As pointed out above (in section 2.2), under the normality assumption the variance of the error term follows the χ² distribution with n − 2 df. The estimated variance σ̂² follows the chi-square distribution with n − 2 degrees of freedom as

(n − 2)σ̂²/σ² ~ χ²(n − 2)

The t-distribution, in turn, can be defined as the ratio of a standard normal variable to the square root of a chi-square variable (divided by its degrees of freedom) that is independent of it.

Practical Exercise 2.13

1. Mention some special distributions that are important under the normality assumption and that help us in constructing confidence intervals.

2.7. Interval Estimation and Hypothesis Testing


2.7.1. Interval Estimation and Distribution
The method of least squares with the added assumption of normality of uᵢ provides us with all the tools necessary for both estimation and hypothesis testing of the linear regression model. The theory of estimation consists of two parts: point and interval estimation. We have discussed point estimation (the OLS method of point estimation), along with the reliability measures (standard errors) and probability distributions of the estimators, thoroughly in previous sections. Instead of relying on the point estimates alone, we must establish limiting values within which the true parameter is expected to lie with a certain "degree of confidence". To be more specific, we want to find out how "close", say, β̂₁ is to β₁ with 1 − α degree of confidence, given the standard error that helps us measure the degree of precision. The reason is that although in repeated sampling the mean of the sample estimates is expected to equal the true value, a single estimate is likely to differ from the true value due to sampling fluctuation. In this part we first consider interval estimation and then take up the issue of hypothesis testing, a topic intimately related to interval estimation.

To this end, we try to find two positive numbers δ and α (α lying between 0 and 1), such that the probability that the random interval (β̂₁ − δ, β̂₁ + δ) contains the true β₁ is 1 − α.

Symbolically,

P(β̂₁ − δ ≤ β₁ ≤ β̂₁ + δ) = 1 − α ………………………………………………..2.134
Such an interval, if it exists, is known as a confidence interval; 1 − α is known as the confidence level; and α is known as the level of significance. The symbol α is also known as the size of the (statistical) test. The endpoints of the confidence interval are known as the confidence limits (also known as critical values), β̂₁ − δ being the lower confidence limit and β̂₁ + δ the upper confidence limit. In practice α and 1 − α are often expressed in percentage form as 100α and 100(1 − α) percent. For example, if α = 0.05, or 5 percent, equation 2.134 would read: the probability that the (random) interval shown includes the true β₁ is 0.95, or 95%. Commonly, δ is related to the standard error as a measure of precision, say two or three standard errors on either side of the estimate, which has roughly a 95% probability of including the true parameter value.

We choose a probability in advance and refer to it as the confidence level (confidence coefficient). It is customary in econometrics to choose the 95 percent confidence level. In repeated sampling the confidence limits computed from the sample would include the true population parameter in 95 percent of the cases, and in 5 percent of the cases the population parameter would fall outside the confidence limits. Equation 2.134 shows an interval estimator with a specified probability, 1 − α, of including the true value of the parameter within its specified limits.
It is very important to know the following aspects of interval estimation:

 Equation 2.134 does not say that the probability of β₁ (the true value) lying between the given limits is 1 − α. Although β₁ is unknown, it is assumed to be some fixed number which either lies in the interval or not.

 The interval 2.134 is a random interval which will vary from sample to sample, because it is based on β̂₁, which is random.

 Since the confidence interval is random, the probability statements attached to it should be understood in the long-run sense, or in repeated sampling. More specifically, in the long run, or in repeated sampling, on average such intervals as specified in 2.134 will enclose the true value of the parameter in 100(1 − α)% of the cases.

 Interval 2.134 is random only before β̂₁ is computed. But once we have a specific sample and obtain a specific numerical value of β̂₁, the interval is no longer random; it is fixed. In this case we cannot make the probabilistic statement (2.134); we cannot say that the probability is 1 − α that a given fixed interval includes the true β₁. In this situation, β₁ is either in the fixed interval or outside it; the probability is therefore either 1 or 0. Thus, for our hypothetical consumption-income example, if a 95% confidence interval were obtained, we cannot say the probability is 95% that this interval includes the true β₁. The probability is either 1 or 0.

How are confidence intervals constructed? If the sampling probability distributions of the estimators are known, one can construct confidence intervals for the estimates. With the assumption of normality of the disturbance term uᵢ, the OLS estimators are themselves normally distributed with the means and variances given earlier. The normality assumption permits us to derive confidence intervals using the relevant sampling distributions of β̂₀, β̂₁ and σ̂². We can construct confidence intervals based on the Z-distribution, the t-distribution, and the chi-square distribution.

A. Confidence interval of estimates from the standard normal distribution (Z-distribution)

In econometric applications the population variance of the estimator is usually unknown. When the population variance is unknown and the sample with which we work is sufficiently large (n > 30), we use the standard normal distribution and perform the Z test (approximately), since the sample estimate of the variance is then a satisfactory approximation to the unknown variance. Furthermore, if we know that the sample standard deviation σ(β̂ᵢ) is a reasonably good estimate of the unknown population standard deviation, then the Z-distribution can be used. As n increases, the standard deviation of the sampling distribution decreases. That is, when n is large (we have more information), the sample mean can be expected to be closer to the population mean. Hence, the sample mean will be a more reliable (unbiased) estimate of the population mean, and the sample estimate σ̂² is a good estimate of σ².

We choose a significance level designated by α and find the interval of Z lying between −Z_(α/2) and Z_(α/2) with probability 1 − α. This may be written as follows:

P(−Z_(α/2) < Z < Z_(α/2)) = 1 − α …………………………………………………2.135

Substituting Z = (β̂ᵢ − βᵢ)/σ(β̂ᵢ) and rearranging slightly, we get

P(−Z_(α/2) < (β̂ᵢ − βᵢ)/σ(β̂ᵢ) < Z_(α/2)) = 1 − α

Upon further rearrangement, we have

P(β̂ᵢ − Z_(α/2)·σ(β̂ᵢ) < βᵢ < β̂ᵢ + Z_(α/2)·σ(β̂ᵢ)) = 1 − α

Thus the (1 − α)100 percent confidence interval for βᵢ is

β̂ᵢ − Z_(α/2)·σ(β̂ᵢ) < βᵢ < β̂ᵢ + Z_(α/2)·σ(β̂ᵢ), or

βᵢ = β̂ᵢ ± Z_(α/2)·σ(β̂ᵢ) ………………………………………………2.136

The confidence interval says that the unknown population parameter βᵢ will lie within the defined limits (1 − α)100 times out of 100.
Example 1:
Construct a 95% confidence interval for an estimator, given that Z lies between −1.96 and 1.96. This may be written as follows:

P(−1.96 < Z < +1.96) = 0.95

Substituting Z = (β̂ᵢ − βᵢ)/σ(β̂ᵢ) and rearranging slightly, we get

P(−1.96 < (β̂ᵢ − βᵢ)/σ(β̂ᵢ) < 1.96) = 0.95

P(β̂ᵢ − 1.96·σ(β̂ᵢ) < βᵢ < β̂ᵢ + 1.96·σ(β̂ᵢ)) = 0.95

Thus the 95% confidence interval for βᵢ is

β̂ᵢ − 1.96·σ(β̂ᵢ) < βᵢ < β̂ᵢ + 1.96·σ(β̂ᵢ), or β̂ᵢ ± 1.96·σ(β̂ᵢ)

It says that the confidence interval stated above will include the unknown population parameter βᵢ in 95 cases out of a hundred.

Example 2:
Given an estimated slope coefficient β̂₁ = 0.75 with standard error σ(β̂₁) = 0.25, construct a 95% confidence interval and interpret it.

Solution: the confidence interval for β₁ is

β₁ = 0.75 ± 1.96(0.25)

or 0.75 − 1.96(0.25) < β₁ < 0.75 + 1.96(0.25)

and 0.26 < β₁ < 1.24

An interval like (0.26, 1.24) contains the true unknown (true population) parameter β₁ in 95 out of 100 cases in the long run.
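The interval arithmetic is easy to automate. Below is a minimal Python sketch (assuming the SciPy library is available; the variable names are illustrative, not from the text) that reproduces the Z-based interval of Example 2:

from scipy.stats import norm

# Estimated slope and its standard error (from Example 2)
beta_hat, se = 0.75, 0.25
alpha = 0.05                      # 5% significance level

z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value, ~1.96
lower = beta_hat - z_crit * se
upper = beta_hat + z_crit * se
print(f"{1 - alpha:.0%} CI for beta_1: ({lower:.2f}, {upper:.2f})")
# -> 95% CI for beta_1: (0.26, 1.24)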

B. Confidence interval from the Student's t distribution

Another common distribution is the Student's, or t, distribution. It is applicable when the population variance is unknown and the sample with which we work is small (n < 30), provided that the population of the parameter is approximately normal. The t distribution is always symmetric, with mean equal to zero and a variance which approaches unity when n is large.
In a two-tail test with α level of significance, the probability of obtaining a t-value beyond the critical value on either side (−t_c or t_c) is α/2 at n − 2 degrees of freedom. The probability of obtaining a value of t = (β̂ − β)/SE(β̂) between the two critical values, at n − 2 degrees of freedom, is therefore 1 − (α/2 + α/2) = 1 − α. Then the confidence interval for the t distribution will be

P(−t_(α/2) < t < t_(α/2)) = 1 − α ……………………………………………(2.137)

but t = (β̂ − β)/SE(β̂)

Substituting the t statistic into (2.137), we obtain the following expression:

P(β̂ − t_(α/2)·SE(β̂) < β < β̂ + t_(α/2)·SE(β̂)) = 1 − α …………………………………………..(2.138)

The limits within which the true β lies at the (1 − α)% level of confidence are therefore:

β̂ ± t_c·SE(β̂), where t_c is the critical value of t at the α/2 significance level and n − 2 degrees of freedom.

Notice that the width of the confidence interval is proportional to the standard error of the estimator. That is, the larger the standard error, the larger is the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater is the uncertainty in estimating the true value of the unknown parameter. Thus, the standard error of an estimator is a measure of the precision of the estimator.
Example: Construct a 95% confidence interval for the OLS estimators for a sample size of 25 in a hypothetical relationship.

Solution: The probability of t lying between −t₀.₀₂₅ and +t₀.₀₂₅ (with n − K degrees of freedom), with n = 25, can be written as

P(−t₀.₀₂₅ < t < +t₀.₀₂₅) = 0.95

Substituting t = (β̂ᵢ − βᵢ)/SE(β̂ᵢ) and rearranging slightly, we get

P(β̂ᵢ − t₀.₀₂₅·SE(β̂ᵢ) < βᵢ < β̂ᵢ + t₀.₀₂₅·SE(β̂ᵢ)) = 0.95

Thus the 95% confidence interval for βᵢ is

β̂ᵢ ± t₀.₀₂₅·SE(β̂ᵢ), with (n − K) degrees of freedom.

It can be interpreted as follows: the confidence interval stated above has a 95% probability of including the true value of the population parameter; that is, in 95 out of 100 cases the interval includes the true population parameter.
Example 2: Suppose we have estimated the following regression line from a sample of 20 observations (the intercept and slope values are implied by the intervals computed below):

Ŷᵢ = 89 + 2.88Xᵢ
se   (38.4)  (0.85)

From the t table we get the value of t₀.₀₂₅ at 18 degrees of freedom equal to 2.10. Given the above information, the 95% confidence intervals for the OLS parameters are:

 For β₀: 89 ± (2.10)(38.4) = 89 ± 80.6, i.e. 8.4 and 169.6

 For β₁: 2.88 ± (2.10)(0.85) = 2.88 ± 1.79, i.e. 1.09 and 4.67

In the long run, in 95 out of 100 cases intervals like (8.4, 169.6) will contain the true β₀. But, as warned earlier, we cannot say that the probability is 95 percent that the specific interval above contains the true β₀, because this interval is now fixed and no longer random; β₀ therefore either lies in it or does not. The probability that the specified fixed interval includes the true β₀ is either 1 or 0. The same logic holds true for β₁.
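As a companion to this example, here is a small Python sketch (again assuming SciPy; the names are illustrative) that computes both t-based intervals. It uses the exact critical value, so the limits differ from the rounded textbook figures only in the second decimal:

from scipy.stats import t

n, k = 20, 2                     # observations and estimated parameters
df = n - k                       # 18 degrees of freedom
t_crit = t.ppf(0.975, df)        # ~2.10 for a 95% two-sided interval

for name, b, se in [("beta_0", 89.0, 38.4), ("beta_1", 2.88, 0.85)]:
    half = t_crit * se
    print(f"95% CI for {name}: ({b - half:.2f}, {b + half:.2f})")
# -> close to the textbook values (8.4, 169.6) and (1.09, 4.67),
#    which use t_c rounded to 2.10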


C. Confidence interval for σ²

Under the normality assumption, the variance of the error term follows the χ² distribution with n − 2 df. Therefore, we can use the χ² distribution to establish a confidence interval for σ²:

P(χ²₍₁₋α/₂₎ < χ² < χ²₍α/₂₎) = 1 − α

where the χ² value in the middle of this double inequality is as given in equation 2.139 below, and where χ²₍₁₋α/₂₎ and χ²₍α/₂₎ are two values of χ² (the critical χ² values) obtained from the chi-square table for n − 2 df.

But χ² = (n − 2)σ̂²/σ² ……………………………………………………………..2.139

Substituting χ² = (n − 2)σ̂²/σ² into the double inequality and rearranging the terms, we obtain

P((n − 2)σ̂²/χ²₍α/₂₎ < σ² < (n − 2)σ̂²/χ²₍₁₋α/₂₎) = 1 − α ………………………………….2.140

which gives the 100(1 − α)% confidence interval for σ².
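For readers who want to verify such an interval numerically, the following Python sketch (assuming SciPy) applies equation 2.140, borrowing for illustration the values σ̂² = 42.1591 and df = 8 that appear in a later example:

from scipy.stats import chi2

sigma2_hat, df = 42.1591, 8            # estimated error variance and its df
alpha = 0.05

chi_lo = chi2.ppf(alpha / 2, df)       # ~2.18
chi_hi = chi2.ppf(1 - alpha / 2, df)   # ~17.53

lower = df * sigma2_hat / chi_hi
upper = df * sigma2_hat / chi_lo
print(f"95% CI for sigma^2: ({lower:.2f}, {upper:.2f})")
# -> roughly (19.2, 154.7)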

Practical Exercise 2.13

For a sample of 50 observations a regression is estimated and its results presented. Then:
1. Which distribution (t or Z) is relevant and why?
2. Construct a 95% confidence interval and interpret it.

2.7.2. Hypothesis testing and its approaches

The aim of statistical inference is to draw conclusions about the population parameters from the sample statistics. The concept of hypothesis testing may be stated simply as follows: is a given observation or finding compatible with some stated hypothesis or not? The word "compatible", as used here, means "sufficiently" close to the hypothesized value that we do not reject the stated hypothesis.

To this end we start with a hypothesized value for the estimator, drawn from prior experience or theory. The form in which one expresses the null and alternative hypotheses is important in defining the rejection (acceptance) region, or critical region, of the test. The hypothesis which we wish to test (on the basis of the evidence of our sample estimate) is called the null hypothesis (H₀), whereas its counterpart proposition is the alternative hypothesis (H₁). The two hypotheses are opposite to each other, and the tests simply differ depending on whether we use a one-tailed or a two-tailed hypothesis. The hypotheses may take one of the following forms:

 Two-tailed: H₀: βᵢ = β* against H₁: βᵢ ≠ β*
 One-tailed: H₀: βᵢ ≤ β* against H₁: βᵢ > β* (or H₀: βᵢ ≥ β* against H₁: βᵢ < β*)

where β* is the hypothesized value of βᵢ (often zero).
In this case the rule to accept or reject the null hypothesis is developed on the basis of a test statistic obtained from the data. This can be done using various tests, the most common ones being:
i) Standard error test
ii) Confidence interval (Z-test, Student's t-test)
iii) Test of significance
iv) P-value test

Given the hypotheses, the next step is to test their statistical reliability by applying some rule which will enable us to decide whether to accept or reject our estimates. To make such decisions we compare the sample estimate with the true value of the population parameter. However, the population parameter is unknown. The basic question, then, is how we are going to decide whether to accept or reject the sample estimate, given that we do not have the appropriate yardstick (the population parameter) for making the required comparison. To bypass this difficulty we make some assumption about the value of the population parameter and use our sample estimate in order to decide whether that assumption is acceptable or not.

Generally, the acceptance or rejection of the null hypothesis has a definite economic meaning. Acceptance of the null hypothesis (H₀: β = 0) implies that the explanatory variable X does not influence the dependent variable Y and should not be included in the function, because the conducted test does not provide enough evidence to reject H₀: there is no relationship between X and Y, or changes in X leave Y unaffected. Rejection of the null hypothesis does not mean that our estimates β̂₀ and β̂₁ are the correct estimates of the true population parameters β₀ and β₁. It simply says that our estimates come from a sample drawn from a population whose parameter β is different from zero.

The statistical significance of the estimates is presented/reported in one of the following three ways:
i. The estimates are significantly different from zero
ii. The estimates are statistically significant or insignificant
iii. We reject or accept the null hypothesis
In statistics, when we reject the null hypothesis, we say that our finding is statistically significant. On
the other hand, when we do not reject the null hypothesis, we say that our finding is not statistically sig-
nificant.

2.7.2.1 The standard error test of the least square estimates

This test uses the standard error of the estimates as a test statistic to decide on the rejection or acceptance of the null hypothesis. It uses a two-tailed hypothesis, which helps us decide whether the estimates β̂₀ and β̂₁ are significantly different from zero or not. The standard error test is an approximate test (approximated from the Z-test and t-test) and implies a two-tail test conducted at the 5% level of significance. The steps to be followed are:

Step 1: State the null vs alternative hypothesis

Formally, we test the null hypothesis H₀: β₁ = 0 against the alternative H₁: β₁ ≠ 0 for the slope parameter, and H₀: β₀ = 0 against the alternative H₁: β₀ ≠ 0 for the intercept.

Step 2: Obtain the standard errors of the estimates. We have already derived that the standard errors of the OLS estimates β̂₁ and β̂₀ are given as:

SE(β̂₁) = √(σ̂²/Σxᵢ²),  SE(β̂₀) = √(σ̂²·ΣXᵢ²/(n·Σxᵢ²))

Step 3: Compare the standard errors with the numerical values of the estimates and make a decision.

Decision rule:
 If SE(β̂ᵢ) > ½|β̂ᵢ|, accept the null hypothesis and reject the alternative hypothesis. We conclude that β̂ᵢ is statistically insignificant. In other words, acceptance of H₀ implies that there is no evidence from the sample data of a relationship between Y and X.

 If SE(β̂ᵢ) < ½|β̂ᵢ|, reject the null hypothesis and accept the alternative hypothesis. We conclude that β̂ᵢ is statistically significant.
Example:
Suppose that, using data collected from a sample of size n = 30, we estimated a supply function and obtained the slope estimate β̂ and its standard error. Test the significance of the slope parameter at the 5% level of significance using the standard error test.

Solution:
Comparing the two values, we find SE(β̂) < ½|β̂|. Thus β̂ is statistically significant at the 5% level of significance. This indicates that there is a relationship between price and quantity supplied.

Practical Exercise 2.14

Suppose that, using data collected from a sample of size n = 300, we estimated a supply function. Test the significance of its coefficients using the standard error test.

2.7.2.2 Test of significance approach

There are two mutually complementary approaches for devising such decision rules, namely, the confidence interval approach and the test of significance approach. In the confidence-interval procedure we try to establish a range or an interval with a certain probability of including the true but unknown βᵢ, and see whether the hypothesized value lies in the interval or not. In the test-of-significance approach, by contrast, we hypothesize some value for βᵢ and try to see whether the computed β̂ᵢ lies within reasonable (confidence) limits around the hypothesized value. Both of these approaches presuppose that the variable (estimator) under consideration has some probability distribution and that hypothesis testing involves making statements or assertions about the value(s) of the parameters of such a distribution.

I. Test of significance

Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence, and measure the validity of the estimates. If we invoke the assumption that uᵢ ~ N(0, σ²), then we can use the t-test (or Z-test) to test a hypothesis about any individual regression coefficient of the model Yᵢ = β₀ + β₁Xᵢ + uᵢ.

The procedure for testing a hypothesis using the test of significance approach involves the following steps:

Step 1: State the null hypothesis (H₀) and the alternative hypothesis (H₁).

Step 2: Estimate β̂₀, β̂₁ and their standard errors in the usual way.

Step 3: Compute the test statistic. Depending on the sample size and the relevant distribution (Z, t, chi-square or F), compute the test statistic of the estimators. The test statistic is the value computed from the sample data. Some of these test statistics are:

a. Z-test: when the sample size is above 30 and the population variance is known, one can compute the test statistic using the Z-statistic. Since the variables of interest here are the OLS estimates of the parameters β₀ and β₁, the Z statistic is

Z = (β̂ᵢ − βᵢ)/σ(β̂ᵢ), which follows the standard normal distribution.

b. t-test: when the sample size is less than 30, we can derive the t-statistic of the OLS estimates as

t = (β̂ᵢ − βᵢ)/SE(β̂ᵢ)

Since we have two parameters in the simple linear regression with an intercept, our degrees of freedom are n − 2. A test statistic derived in this way can be shown to follow a t-distribution with n − 2 degrees of freedom.

c. The same logic holds true for the F-distribution.
Step 4: Select (specify) a suitable significance level (α). In making a decision one can never be 100% sure of being right. We choose a level of significance for deciding whether to accept or reject our hypothesis, since we are liable to commit one of the following types of error:
 Type I Error: we reject the null hypothesis when it is actually true
 Type II Error: we accept the null hypothesis when it is actually false
In accepting or rejecting a null hypothesis, the chance of being wrong is the α percent level of significance, that is, the probability of committing a Type I error, or the risk of rejecting the hypothesis when it is actually true. It is customary in econometrics to choose the 5% or 1% level of significance. The logical interpretation is that in making our decision we allow (tolerate) being "wrong", i.e. rejecting H₀ when it is actually true, five times out of a hundred. Since we want the probability of being wrong to be small, we assign a low value and call it the level of significance of our test.

Step 5: Identify the relevant distribution and choose the location of the critical region.

After defining the relevant distribution, one should compute the range of values (confidence interval) of the test statistic which results in a decision to accept or reject the hypothesis with probability 100(1 − α)%. These are theoretical (tabular) value(s) that define the boundary of the critical region or the acceptance region for the test of the OLS estimators with a certain degree of confidence. This is some tabulated distribution with which to compare the estimated test statistic; it is common to look it up in the standard table of the relevant distribution.

i. Two-tailed critical region

The decision of whether to choose a one-tail or a two-tail critical region depends on the form in which the alternative hypothesis is expressed. If the sign in the alternative hypothesis is "not equal" (≠), a two-tail test should be adopted. This is done by dividing the chosen level of significance by two to obtain the critical value from the t-table or Z-table; the probability of the test is now divided in two. Accordingly, obtain the critical value of Z, called Z_c, at α/2 from the Z-table if the number of observations is more than 30. Similarly, t-values with which to compare the test statistic can be obtained from the relevant t-tables: that is, a critical value t_(α/2) such that P(t > t_(α/2)) = α/2 and P(t < −t_(α/2)) = α/2, with n − 2 degrees of freedom, for a two-tail test. See the Z-table in Appendix 2.1 and the t-table in Appendix 2.2, in the entry for n − 2 degrees of freedom at the given level of significance (say α).

Figure 2.10: Two-tailed test of H₀: β = β₀ against H₁: β ≠ β₀

We use a two-sided hypothesis test when we do not have a strong a priori or theoretical expectation about the direction in which the alternative hypothesis should move from the null hypothesis. For example, if we choose the 5% level of significance (α), then each tail will contain the area (probability) 0.025 (α/2) used to obtain the critical value from the relevant distribution. Furthermore, in applied econometrics it has become customary to perform a two-tail test.

ii. One-tailed test of significance

If the inequality sign in the alternative hypothesis is either greater than (>) or less than (<), it indicates a one-tail test. There is no need to divide the chosen level of significance by two to obtain the critical value from the t-table or Z-table. The critical region may be chosen either at the right end (right tail) or at the left tail of the distribution of the variable. The test can be illustrated graphically.

Figure 2.11: One-tailed test (95% non-rejection region and 5% rejection region in one tail)

This approach can be used when we have a strong a priori or theoretical expectation regarding the sign of the coefficient of the economic relation (or expectations based on some previous empirical work), so that the alternative hypothesis is one-sided or unidirectional rather than two-sided. For instance, in the t-table in Appendix 2.2, for n − 2 degrees of freedom at the given level of significance (say α), find the point t_c such that P(t > t_c) = α.

Step 6: Make a decision, or draw a conclusion. The decision to accept or reject the hypothesis is made by comparing the test statistic with the critical region. This test compares the value of the statistic obtained from the sample estimates with the critical value at the given significance level. If the empirical value of the statistic falls in the critical region, we reject the null hypothesis, because the probability of observing such an empirical value (if our hypothesis were true) is very small.

That is, compare t* (the computed value of t) with t_c (the critical value of t).

 Two-tailed test: if the calculated t-statistic (t*) falls in the critical region, i.e. if t* > t_c or t* < −t_c, reject H₀ and conclude that β̂ is significantly different from β₀ at the level α. The two-tail test is illustrated graphically in figure 2.10; the degrees of freedom here also are n − 2.

 One-tailed test: if t* < t_c, accept H₀ and reject H₁; the conclusion is that β̂ is statistically insignificant. But if the calculated t-statistic falls in the shaded area (critical region), i.e. t* > t_c, then reject H₀ and accept H₁. That is, if the computed value lies outside the acceptance interval, reject H₀; we conclude that β̂ is statistically greater than β₀ (if the alternative had been β > β₀). For the Z distribution, compare z_c (the critical value of z) with z* (the computed value of z): if z* > z_c, reject H₀ and accept H₁, concluding that β̂ is statistically significant; if z* < z_c, accept H₀ and reject H₁, concluding that β̂ is statistically insignificant.

In short, the steps are:

i. Two-tailed
The procedure for testing a two-sided alternative is as follows:
Step 1: H₀: β = β₀ against H₁: β ≠ β₀
Step 2: Compute the test statistic t* = (β̂ − β₀)/SE(β̂)
Step 3: Define the level of significance (α)
Step 4: From the t-table, with n − 2 degrees of freedom and the given level of significance, find t_(α/2) such that the area to the right of it is one half the level of significance, i.e. P(t > t_(α/2)) = α/2
Step 5: Reject H₀ if t* > t_(α/2) or t* < −t_(α/2), and conclude that β̂ is significantly different from β₀ at the level α; accept otherwise.

ii. One-tailed
The procedure for testing a one-sided alternative is as follows:
Step 1: H₀: β ≤ β₀ against H₁: β > β₀
Step 2: Compute the test statistic t* = (β̂ − β₀)/SE(β̂)
Step 3: Define the level of significance (α)
Step 4: From the t-table, with n − 2 degrees of freedom and the given level of significance, find the point t_α such that the area to the right of it is equal to the level of significance, i.e. P(t > t_α) = α
Step 5: Reject H₀ if t* > t_α and conclude that β̂ is significantly greater than β₀ at the level α; accept otherwise.

Example:
Suppose that from a sample of size n = 20 observations we estimated the following consumption function:

Ĉ = 100 + 0.70Y
      (75.5)  (0.21)

The values in the brackets are standard errors. Given the null hypothesis H₀: βᵢ = 0 against the alternative H₁: βᵢ ≠ 0, test statistical significance at the 5% level of significance.

a. The t-value for the test statistic is:

t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 = 3.3

b. Since the alternative hypothesis (H₁) is stated with an inequality sign (≠), it is a two-tail test; we divide α = 0.05 by two to obtain the critical value of t at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, t_c at the 0.025 level of significance with 18 df is 2.10.

c. Since t* = 3.3 and t_c = 2.1, t* > t_c, which implies that β̂ is statistically significant.

d. Decision: we reject H₀ and accept H₁.
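The same test can be checked numerically. A short Python sketch (assuming SciPy; the variable names are illustrative) reproducing the example:

from scipy.stats import t

# Estimates from the consumption function example
beta_hat, se, n, k = 0.70, 0.21, 20, 2

t_star = beta_hat / se               # test statistic under H0: beta = 0
t_crit = t.ppf(1 - 0.05 / 2, n - k)  # two-tailed 5% critical value, ~2.10
p_val = 2 * t.sf(abs(t_star), n - k)

print(f"t* = {t_star:.2f}, critical t = {t_crit:.2f}, p-value = {p_val:.4f}")
print("reject H0" if abs(t_star) > t_crit else "do not reject H0")
# -> t* = 3.33, critical t = 2.10, reject H0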

Practical Exercise 2.15

Given the above example, test the significance of the intercept β̂₀.

In practice, there is no need to estimate the confidence interval explicitly. One can compute the test
statistic and see whether it lies within the acceptance or rejection (critical) region. We can summarize
the t test of significance approach for hypothesis testing as shown in Table 2.12.

Table 2.12: Decision rule for the t-test of significance

Type of hypothesis   H₀           H₁           Reject H₀ if
Two-tail             β₂ = β₂*     β₂ ≠ β₂*     |t*| > t_(α/2),df
Right-tail           β₂ ≤ β₂*     β₂ > β₂*     t* > t_α,df
Left-tail            β₂ ≥ β₂*     β₂ < β₂*     t* < −t_α,df

Notes: β₂* is the hypothesized numerical value of β₂; |t| means the absolute value of t. t_α or t_(α/2) means the critical t value at the α or α/2 level of significance, with (n − 2) degrees of freedom for the two-variable model. The same procedure holds to test hypotheses about β₁ and to undertake the Z-test.

The "Zero" Null Hypothesis and the "2-t" Rule of Thumb

A null hypothesis commonly tested in empirical work is H₀: βᵢ = 0, which says that the slope coefficient is zero. The objective of the "zero" null hypothesis is to find out whether Y is related to X, the explanatory variable. This null hypothesis can be easily tested by the confidence interval or the t-test approach discussed in the preceding sections. A critical value of t_c ≈ 2 is often used as a rule of thumb to judge the significance of a t-statistic at the 5% level (for a two-tailed test). It is common to use this shortcut by adopting the "2-t" rule of significance, which may be stated as follows: if the number of degrees of freedom is 20 or more and if α, the level of significance, is set at 0.05, then the null hypothesis βᵢ = 0 can be rejected if the t value, t = β̂ᵢ/se(β̂ᵢ), computed from the sample exceeds 2 in absolute value.

Example 7: Suppose that we have estimated the following supply function from a sample of 700 observations (the coefficient and standard error values are implied by the Z-statistics computed below):

Ŷᵢ = 100 + 4Xᵢ
se    (20)   (1.5)

Conduct the Z-test for the hypothesis H₀: β₁ = 0. As with the standard error test, we formally undertake the test in the following steps:

Step 1: State the hypotheses: H₀: β₁ = 0 against the alternative H₁: β₁ ≠ 0 for the slope parameter, and H₀: β₀ = 0 against the alternative H₁: β₀ ≠ 0 for the intercept.

Step 2: Compute the test statistic Z*, taking the value of βᵢ from the null hypothesis. In our case βᵢ = 0, so Z* becomes Z* = (β̂ᵢ − 0)/SE(β̂ᵢ) = β̂ᵢ/SE(β̂ᵢ):

Z*(β̂₁) = (4 − 0)/1.5 = 2.67    Z*(β̂₀) = (100 − 0)/20 = 5

Step 3: Choose the level of significance to be 5%.

Step 4: Obtain the critical value of Z, called Z_c, at α/2, as it is a two-tail test. If the level of significance is 5%, divide it by two to obtain the critical value of Z from the Z-table: Z₀.₀₂₅ = 1.96.

Step 5: Compare Z* (the computed value of Z) with Z_c (the critical value of Z):
 If Z* > Z_c, reject H₀ and accept H₁; the conclusion is that β̂ is statistically significant.
 If Z* < Z_c, accept H₀ and reject H₁; the conclusion is that β̂ is statistically insignificant.
Here both Z*(β̂₁) = 2.67 and Z*(β̂₀) = 5 exceed 1.96, so both coefficients are statistically significant.

Practical exercise 2.16

1. For the above consumption function, test the hypothesis H₀: β = 0.1 against H₁: β ≠ 0.1 at the levels of significance 0.005 and 0.001.
2. Show that if a coefficient is significant at the 1% level, it will also be significant at any level higher than that.
3. Show that if a coefficient is insignificant at the 10% level, it will not be significant at any level lower than that.

2.7.2.3 The Confidence Interval Approach to Hypothesis Testing

In the confidence-interval procedure we try to establish a range or an interval that has a certain probability of including the true but unknown βᵢ, and check whether the interval includes the hypothesized value of the parameter. Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed. For instance, suppose we estimated a parameter to be 0.93, and a "95% confidence interval" to be (0.77, 1.09). The result shows that the interval, constructed so as to contain the true (but unknown) value of β with 95% confidence, includes the hypothesized parameter value.

How to carry out a hypothesis test using confidence intervals:

1. Estimate β̂₀, β̂₁ and se(β̂₀), se(β̂₁) in the usual way.
2. Choose a significance level, α (again the convention is 5%). This is equivalent to choosing a (1 − α)100% confidence interval, i.e. a 5% significance level = a 95% confidence interval.
3. Use the t-tables to find the appropriate critical value, which will again have n − 2 degrees of freedom.
4. The confidence interval is given by β̂ᵢ ± t_(α/2)·se(β̂ᵢ). The unknown population parameter βᵢ will lie within these limits, with n − K degrees of freedom, (1 − α)100 times out of 100.
5. Perform the test: if the hypothesized value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*; otherwise do not reject the null. For instance, if the hypothesized value of β in the null hypothesis is within the confidence interval, accept H₀ and reject H₁; the implication is that the estimate is not statistically different from the hypothesized value. If the hypothesized value of β in the null hypothesis is outside the limits, reject H₀ and accept H₁; this indicates that β̂ is statistically significant.

Numerical Example:
Suppose we have estimated the following regression line from a sample of 20 observations:

Ŷ = 128.5 + 2.88X + e
       (38.2)   (0.85)

The values in the brackets are standard errors. Then:
a. Construct a 95% confidence interval for the slope parameter.
b. Test the significance of the slope parameter using the constructed confidence interval.

Solution:
a. The limits within which the true β lies at the 95% confidence level are β̂ ± SE(β̂)·t_c, with β̂ = 2.88 and SE(β̂) = 0.85. t_c at the 0.025 level of significance and 18 degrees of freedom is 2.10. Hence β̂ ± SE(β̂)·t_c = 2.88 ± 2.10(0.85) = 2.88 ± 1.79, and the confidence interval is (1.09, 4.67).

b. The value of β in the null hypothesis is zero, which lies outside the confidence interval. Hence β is statistically significant.

Practical Exercise 2.16

Suppose β̂₂ = 29.48 and σ(β̂₂) = 36.0. Test the hypothesis H₀: β₂ = 25.0.

2.7.2.4. The P-value approach to hypothesis testing

The t-test can also be carried out in an equivalent way by calculating the probability value (P-value) of the test statistic. The traditional method of hypothesis testing requires one to look up the tabulated critical value t_c. The alternative method is called the p-value approach; it requires a computer program that calculates the area in the tail beyond any given value of t*, so no table of critical values is needed. How do we work with the p-value approach?

First, calculate the probability that the random variable t is greater than the observed t*, that is, P-value = P(t > t*). This probability is the same as the probability of a Type I error if H₀ were rejected at exactly this point, i.e. the probability of rejecting a true hypothesis. A high value of this probability implies that the consequences of rejecting a true H₀ would be severe. A low p-value implies that the consequences of erroneously rejecting a true H₀ are not very severe; that is, the probability of making a Type I error is low, and hence we are "safe" in rejecting H₀.

Second, compare it with the selected level of significance (α) and make a decision. The decision rule is therefore to accept H₀ (that is, not reject it) if the p-value is too high, say more than 0.10, 0.05, or 0.01, and reject it otherwise. In other words, if the P-value is higher than the specified level of significance (α), we conclude that the regression coefficient is not significantly greater than β₀, or not significantly different from the null hypothesis value, at the α level of significance. If the P-value is less than α, we reject H₀ and conclude that β is significantly different from β₀.

How high or low the threshold is, is determined by the investigator. The modified steps for the p-value approach are as follows:

i. One-tailed
The procedure for a one-sided alternative follows these steps:
Step 1: H₀: β ≤ β₀ against H₁: β > β₀
Step 2: Compute the test statistic t* = (β̂ − β₀)/SE(β̂)
Step 3: Define the level of significance (α)
Step 4: Calculate the probability (P-value) that t is greater than t*, that is, compute the area to the right of the calculated t*.
Step 5: Reject H₀ if the p-value, P(t > t*), is less than α, the level of significance, and conclude that the coefficient is statistically significant.

ii. Two-tailed
The procedure for a two-sided alternative follows these steps:
Step 1: H₀: β = β₀ against H₁: β ≠ β₀
Step 2: Compute the test statistic t* = (β̂ − β₀)/SE(β̂)
Step 3: Define the level of significance (α)
Step 4: Calculate the p-value as P-value = 2·P(t > |t*|), because of the symmetry of the t-distribution around the origin.
Step 5: Reject H₀ if the two-sided p-value, 2·P(t > |t*|), is less than α, the level of significance.

Example: Using our data on consumption expenditure and income, the regression result shown in Table 2.13 is obtained.

Table 2.13: Regression result of income and expenditure

. reg Y X

Source SS df MS Number of obs = 12


F( 1, 10) = 1094.16
Model 2063428.34 1 2063428.34 Prob > F = 0.0000
Residual 18858.572 10 1885.8572 R-squared = 0.9909
Adj R-squared = 0.9900
Total 2082286.91 11 189298.81 Root MSE = 43.426

Y Coef. Std. Err. t P>|t| [95% Conf. Interval]

X 1.377394 .0416407 33.08 0.000 1.284613 1.470175


_cons 358.4541 120.6035 2.97 0.014 89.73284 627.1754

Based on the p-value, the slope coefficient is statistically significant, with a p-value of 0.000.
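Such p-values can be reproduced from the coefficient and its standard error alone. A small Python sketch (assuming SciPy), using the slope figures from Table 2.13:

from scipy.stats import t

# Slope estimate and standard error from the Stata output in Table 2.13
coef, se, n, k = 1.377394, 0.0416407, 12, 2

t_star = coef / se                      # t = 33.08 in the output
p_value = 2 * t.sf(abs(t_star), n - k)  # two-sided tail probability
print(f"t = {t_star:.2f}, p-value = {p_value:.4f}")
# -> t = 33.08, p-value = 0.0000 (matching the P>|t| column)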

Practical Exercise 2.17

Test the statistical significance of the intercept for the above result.

Note that the t-test and the p-value approaches are equivalent. If the p-value is less than the level α, i.e. P(t > t*) < α, then the point corresponding to t* must necessarily lie to the right of t_c(n − 2); this means that t* falls in the rejection region. Similarly, if the p-value P(t > t*) > α, then t* must lie to the left of t_c and hence falls into the acceptance region.

2.7.2.5. χ² test of significance approach / Testing σ²

Although a test for the significance of the error variance σ² is not common, we present it here for completeness. The steps are:

 Step 1: H₀: σ² = σ₀² against H₁: σ² ≠ σ₀²

 Step 2: Compute the test statistic χ²_c = (n − 2)σ̂²/σ₀². If χ²_c is large, we would suspect that σ² is probably not equal to σ₀².

 Step 3: From the chi-square table (Appendix 2.3.C), look up the value χ²_(n−2)(α) such that the area to the right of it is α.

 Step 4: Reject H₀ at the level α if χ²_c > χ²_(n−2)(α).
The χ² test of significance approach to hypothesis testing is summarized in Table 2.14.

Table 2.14: A summary of the χ² test

Null hypothesis H₀   Alternative hypothesis H₁   Critical region: reject H₀ if
σ² ≤ σ₀²             σ² > σ₀²                    df·(σ̂²)/σ₀² > χ²_α,df
σ² ≥ σ₀²             σ² < σ₀²                    df·(σ̂²)/σ₀² < χ²_(1−α),df
σ² = σ₀²             σ² ≠ σ₀²                    df·(σ̂²)/σ₀² > χ²_(α/2),df or < χ²_(1−α/2),df

Note: σ₀² is the value of σ² under the null hypothesis. The first subscript on χ² in the last column is the level of significance, and the second subscript is the degrees of freedom. These are critical chi-square values. Recall that α is also the probability of committing a Type I error; a Type I error consists in rejecting a true hypothesis, whereas a Type II error consists in accepting a false hypothesis.
Example 4: If σ̂² = 42.1591 and df = 8, construct a confidence interval for σ², taking α = 5%, and interpret it. Suppose we postulate that H₀: σ² = 85 vs H₁: σ² ≠ 85. The χ² statistic defined above (equation 2.139) provides the test statistic for H₀. Substituting the appropriate values, we find that under H₀, χ²_c = 8(42.1591)/85 ≈ 3.97. If we take α = 5%, the critical χ² values for 8 df are 2.1797 and 17.5346. Since the computed χ²_c lies between these limits, the data support the null hypothesis and we do not reject it. This test procedure is called the chi-square test of significance.
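A short Python sketch (assuming SciPy) of the same chi-square test:

from scipy.stats import chi2

sigma2_hat, df, sigma2_0 = 42.1591, 8, 85.0  # values from Example 4
alpha = 0.05

chi_stat = df * sigma2_hat / sigma2_0        # ~3.97
lo = chi2.ppf(alpha / 2, df)                 # ~2.18
hi = chi2.ppf(1 - alpha / 2, df)             # ~17.53

print(f"chi2 = {chi_stat:.2f}, acceptance region ({lo:.4f}, {hi:.4f})")
print("do not reject H0" if lo < chi_stat < hi else "reject H0")
# -> chi2 = 3.97, do not reject H0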

2.7.2.6. Test of goodness of fit or overall test: The F test

So far we have seen how individual tests are done. In addition to individual tests, an overall test of significance is recommended to see whether the regression model as a whole explains the behavior of Y. This is done using a test of significance called the F test, which can be carried out by studying the components of the regression. A study of these components of TSS is known as Analysis of Variance (ANOVA) from the regression viewpoint. It introduces learners to an illuminating and complementary way of looking at statistical inference problems. In the previous section (equation 2.131) we developed the following identity:

TSS = ESS + RSS
Σyᵢ² = Σŷᵢ² + Σeᵢ² = β̂₁²Σxᵢ² + Σeᵢ² ………………………………………2.141
(Total variation = Explained variation + Unexplained variation)
It is used to judge the overall (joint) significance of the parameters. It tests the null hypothesis H₀: β₁ = 0 against H₁: β₁ ≠ 0; the F-ratio provides a test statistic for the null hypothesis that the true β₁ is zero. Associated with any sum of squares is its degrees of freedom (df), the number of independent observations on which it is based. For the regression with two variables, TSS has n − 1 degrees of freedom, as it loses 1 degree of freedom in computing the sample mean Ȳ; ESS has k − 1 degrees of freedom (here 1, since we estimate two parameters); and RSS has n − 2 degrees of freedom. All that needs to be done is to compute the F-ratio and compare it with the critical F value obtained from the F-table at the chosen level of significance, or obtain the p-value of the computed F statistic. To this end we compare the calculated F-value (F*) with the tabulated F value (F_T). The F* ratio (the observed variance ratio) is obtained by dividing the two mean squares, ESS/(k − 1) (between the means) and RSS/(n − k) (within the sample), which appear in the mean-square column of the Analysis of Variance (ANOVA) table.

Let us arrange the various sums of squares and their associated degrees of freedom in the following table, which is the standard form of the ANOVA table.
Table 2.15: ANOVA table

Source of variation              Sum of squares (SS)   Degrees of freedom   Mean square (MS)   F*
Due to regression (ESS)          Σŷ²                   v₁ = k − 1 = 1       Σŷ²/(k − 1)        F* = [ESS/(k − 1)]/[RSS/(n − k)]
Due to residuals (RSS)           Σeᵢ²                  v₂ = n − k           Σeᵢ²/(n − k)
Total (TSS)                      Σyᵢ²                  n − 1

Symbolically,

F*(k − 1, n − k) = [ESS/(k − 1)]/[RSS/(n − k)] …………………………………2.142

Equivalently, F* = [r²/(k − 1)]/[(1 − r²)/(n − k)].

Suppose you are given the following regression line and intermediate results for 25 sample observations:

Ŷ = 89 + 2.88X
se   (38.4)  (0.85)

r² = 0.76,  Σeᵢ² = 135

In order to compile the ANOVA table based on the information given, we need to get TSS and ESS (between the means) as follows. We know that

r² = 1 − Σeᵢ²/Σy²

With r² = 0.76 and Σeᵢ² = 135, we find Σy²:

0.76 = 1 − 135/Σy²
1 − 0.76 = 0.24 = 135/Σy²
0.24·Σy² = 135
Σy² = 135/0.24 = 562.5

Thus TSS = 562.5, and since ESS = TSS − RSS,

ESS = 562.5 − 135 = 427.5

To appraise the findings from the regression analysis, construct the ANOVA table, obtain F*, and compare the value of F* with the tabulated F value as follows.

Table 2.16: ANOVA example

Source of variation          Sum of squares   Degrees of freedom   Mean square
Due to regression (ESS)      427.5            v₁ = k − 1 = 1       427.5/1 = 427.5
Due to residuals (RSS)       135              v₂ = n − k = 23      135/23 = 5.87
Total (TSS)                  562.5            n − 1 = 24

F* = 427.5/5.87 ≈ 72.8

The decision rule for the F test is to reject the null hypothesis if F* exceeds the tabulated F value. Since the tabulated value F₀.₀₅(1, 23) is 4.28 and F* ≈ 72.8, we reject the null hypothesis. This implies that the parameter estimate is statistically significant.
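The ANOVA computation can be verified with a few lines of Python (assuming SciPy); the sketch below also shows the equivalent F* formula based on r²:

from scipy.stats import f

ess, rss, n, k = 427.5, 135.0, 25, 2

f_star = (ess / (k - 1)) / (rss / (n - k))  # ~72.8
f_crit = f.ppf(0.95, k - 1, n - k)          # F_0.05(1, 23) ~ 4.28

# Equivalent computation from r-squared
r2 = ess / (ess + rss)                      # 0.76
f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f"F* = {f_star:.1f} (from r2: {f_from_r2:.1f}), critical F = {f_crit:.2f}")
print("reject H0" if f_star > f_crit else "do not reject H0")
# -> F* = 72.8, reject H0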

The F distribution is shown in figure 2.12. Under the null hypothesis β₁ = 0, the value of ESS will be small in relation to TSS, and the computed value of the statistic should fall near the center of the distribution. A value in the right tail of the distribution arises when ESS is large relative to RSS and is evidence against the null hypothesis. H₀ is rejected at a significance level of α if F* > F_α(1, n − 2). Notice that although the alternative hypothesis β ≠ 0 is two-sided, the rejection region is one-tailed. It would not make sense to reject H₀ on the basis of a value of the statistic in the left tail of the distribution: a small value of ESS relative to RSS is evidence consistent with β = 0, not against it.

Figure 2.12: The F distribution, with the tabulated value cutting off a tail area of α

Comparison of regression analysis and ANOVA:

1. In both approaches the variation in Y is split into two additive components.
2. The test performed in regression analysis is a test concerning the explanatory power of the regression as measured by the goodness of fit (r²). The F* ratio is a test of the significance of r². That is,

F* = [Σŷ²/(K − 1)]/[Σeᵢ²/(n − K)] = [r²/(K − 1)]/[(1 − r²)/(n − K)] …………………..2.143

3. In both methods we may compute the F ratio from the ANOVA table and use it for testing hypotheses related to the aim of the study.
4. For an individual regression coefficient the t and F tests are formally equivalent. For the null hypothesis β = 0, the t test is based on the computed statistic t = β̂/s_β̂. Using previously established results and notation,

F(1, v) = [ESS/1]/[RSS/(n − 2)] = β̂₁²Σxᵢ²/s² = (β̂₁/s_β̂₁)² = t²(v) …………….2.144

It is always the case that F(1, v) = t²(v). Furthermore, a similar relationship exists between the tabulated values of the two distributions for v degrees of freedom. This means that the outcomes of the two test procedures are always consistent, given a chosen level of significance.
5. Regression analysis is more powerful than ANOVA when analyzing market data which are not experimental. Regression analysis provides all the information which we may obtain from the method of ANOVA; furthermore, it provides numerical estimates of the influence of each explanatory variable. The ANOVA approach, by contrast, only shows the addition to the explanation of the total variation which is obtained by introducing an additional variable into the relationship.
6. It is often argued that the ANOVA approach is more appropriate for the study of the influence of qualitative variables, because qualitative variables (like profession, sex, and religion) do not have numerical values and hence their influence cannot be measured directly by regression analysis, whereas the ANOVA technique depends solely on the values of Y and does not require knowledge of the values of the X's. This argument, however, has lost much of its merit with the expansion of dummy variables in regression analysis.

2.8. Prediction/forecasting

One of the main objectives of a regression analysis is prediction. A regression equation is used to predict values of the dependent variable given values of an independent/exogenous variable, i.e. E(Y|X). For instance, using a sample from the population we may have estimated the sample regression line for consumption expenditure as Ŷ = 30.71 + 0.817X, where Y is consumption expenditure and X is income. If X = 1000, then Ŷ = 30.71 + 0.817(1000) = 847.71. That is, the estimated average expenditure when income is 1000 birr is 847.71 birr.

There are two types of predictions: mean prediction and individual prediction. Given the CLRM assumptions, it is possible to construct confidence intervals for such predictions. The confidence interval for a regression prediction provides a way of assessing the quality of that prediction.

2.8.1. Mean prediction

Mean prediction concerns the prediction of the conditional mean value of Y corresponding to a chosen X, say X₀, that is, E(Y|X₀). One can explain the concept using the sample regression equation: the estimated regression equation Ŷ₀ = β̂₀ + β̂₁X₀ is used for predicting the (average) value of Y for the given value of X. Since the sample estimate Ŷ₀ = β̂₀ + β̂₁X₀ is likely to differ from the true value, the difference between the two (actual and predicted) values gives some idea about the forecast and the forecast error. To assess this error we need to find the sampling distribution of Ŷ₀.

It is common to distinguish the individual response from the average response; the difference is related to the variance of individuals about the average response.

Let X₀ be the given value of X. We are interested in predicting the mean of Y given X₀; that is, we predict E(Y|X₀) = β₀ + β₁X₀ by the corresponding point on the sample regression line, Ŷ₀ = β̂₀ + β̂₁X₀.

Variance

Because β̂₀ and β̂₁ are estimated with imprecision, the predictor Ŷ₀ is also subject to error. To take account of this, we compute a standard error and a confidence interval.
The prediction error is: YP  YP  (  0   0 )  ( 1  1 ) X P . Then, the variance of the prediction error is:
2
 
 
Y  E (Y ) 
E   .

   
var(YˆP  YP )  var(  0   0 )  X 02 var( 1  1 )  2 X 0 cov(  0   0 , 1  1 )

Substituting the var( ), var( ) and cov( ,

153
The variance increases as the deviation in value of X0 is from mean increases.

Confidence interval for the mean predictor

The t-distribution can be used to derive a confidence interval for the true E(Y|X₀) and to test hypotheses in the usual manner. The confidence interval for the mean forecast is given by

Pr[(β̂₀ + β̂₁X₀) − t_(α/2)·SE(Ŷ₀) ≤ β₀ + β₁X₀ ≤ (β̂₀ + β̂₁X₀) + t_(α/2)·SE(Ŷ₀)] = 1 − α ………2.145

where t_(α/2) is the critical value of the t-distribution obtained earlier. Note that the farther X₀ is from the mean X̄, the larger SE(Ŷ₀) is and the wider the corresponding confidence interval becomes. This means that if a forecast is made too far outside the sample range, the reliability of the forecast decreases.

2.8.2. Individual prediction

Individual prediction concerns the individual Y value corresponding to X₀. Suppose we have a sample regression equation corresponding to the population regression equation specified as follows:

Yᵢ = β₀ + β₁Xᵢ + uᵢ for i = 1, 2, 3, …, n  (PRF)
E(Y) = β₀ + β₁Xᵢ for i = 1, 2, 3, …, n

The sample regression function corresponding to the population regression function, Ŷ₀ = β̂₀ + β̂₁X₀, gives the predicted value of Y. The predicted value Ŷ₀ will not normally coincide with the true value Y₀, which is given by Y₀ = β₀ + β₁X₀ + u₀; we have errors.

The difference between the predicted Y and the actual Y is termed the prediction error, or forecast error, f:

f = Y₀ − Ŷ₀ = (β₀ − β̂₀) + (β₁ − β̂₁)X₀ + u₀

In the equation above, (β₀ − β̂₀) and (β₁ − β̂₁) are the sampling errors in estimating the unknown parameters β₀ and β₁. These errors tend to decline as we increase the sample size. Since under the classical assumptions the estimators β̂₀, β̂₁ and uᵢ are all normally distributed, the forecast error, being a linear function of these variables, is also normally distributed. That is:

a. Mean: E(β̂₀) = β₀, E(β̂₁) = β₁ and E(uᵢ) = 0. It follows that

E(f) = (β₀ − E(β̂₀)) + (β₁ − E(β̂₁))X₀ + E(u₀) = 0

The mean of the sampling distribution of the forecast error is zero: the OLS prediction is an unbiased predictor of Y.

b. Prediction error: the prediction error is

Ŷ₀ − Y₀ = (β̂₀ − β₀) + (β̂₁ − β₁)X₀ − u₀, so that E(Ŷ₀ − Y₀) = 0, with Ŷ₀ = β̂₀ + β̂₁X₀ …………………………………………………2.146

c. Efficiency: the derivation of the variance of the forecast error is somewhat lengthy. The forecast-error variance combines the variance of the disturbance with the variance of the predictor about its mean:

σ_f² = σ² + E[Ŷ₀ − E(Y₀)]²

The variance of the predictor about its mean is

E[Ŷ₀ − E(Y₀)]² = σ²[1/n + (X₀ − X̄)²/Σxᵢ²]

so that, from the above,

σ_f² = σ²[1 + 1/n + (X₀ − X̄)²/Σxᵢ²] ……………………………..2.147
Alternatively, the variance of the prediction error is

var(Ŷ₀ − Y₀) = var(β̂₀) + X₀²·var(β̂₁) + 2X₀·cov(β̂₀, β̂₁) + var(u₀)

But var(β̂₁) = σ²/Σxᵢ², var(β̂₀) = σ²ΣXᵢ²/(n·Σxᵢ²), and cov(β̂₀, β̂₁) = −X̄σ²/Σxᵢ². Substituting these in, we get

var(Ŷ₀ − Y₀) = σ²ΣXᵢ²/(n·Σxᵢ²) + X₀²σ²/Σxᵢ² − 2X₀X̄σ²/Σxᵢ² + σ²

var(Ŷ₀ − Y₀) = σ²[1 + 1/n + (X₀ − X̄)²/Σxᵢ²] ………………………………………….2.148

The variance increases as the distance of X₀ from the mean of the observations (on the basis of which β̂₀ and β̂₁ have been computed) increases. That is, prediction is more precise for values nearer to the mean (as compared to extreme values). This shows that under the classical assumptions the forecast error has a sampling distribution of the form f ~ N(0, σ_f²).


But since we do not have σ_f², we replace σ² by its unbiased estimator S² and obtain an unbiased estimator of the variance of the forecast error:

S_f² = S²[1 + 1/n + (X₀ − X̄)²/Σxᵢ²] …………2.149

While f/σ_f = (Y₀ − Ŷ₀)/σ_f follows the N(0, 1) distribution, the feasible ratio f/S_f = (Y₀ − Ŷ₀)/S_f follows the Student t distribution with n − 2 degrees of freedom.

The general expression for a confidence interval can then be constructed using t-tests, both for within-sample prediction (interpolation), when X₀ lies within the range of the sample observations on X, and for out-of-sample prediction (extrapolation), when X₀ lies outside the range of the sample observations. The latter, however, is not recommended!

On the basis of the above distribution we are able to construct a confidence interval for the unknown Y₀.

Example: Using our consumption expenditure example, predict the mean value of Y at X₀ = 100.

A. Mean prediction: when X₀ = 100, we predict E(Y|X₀ = 100) by

Ŷ₀ = β̂₀ + β̂₁X₀ = 24.4545 + 0.5091(100) = 75.3645

The variance of the mean prediction is

var(Ŷ₀) = σ̂²[1/n + (X₀ − X̄)²/Σxᵢ²] = 42.159[1/10 + (100 − 170)²/33,000] = 10.4759

so se(Ŷ₀) = 3.2366.

Therefore, the 95% confidence interval for E(Y|X₀) = β₀ + β₁X₀ is

Pr[(β̂₀ + β̂₁X₀) − t_(α/2)·SE(Ŷ₀) ≤ β₀ + β₁X₀ ≤ (β̂₀ + β̂₁X₀) + t_(α/2)·SE(Ŷ₀)] = 1 − α

Pr[75.3645 − 2.306(3.2366) ≤ E(Y|X₀ = 100) ≤ 75.3645 + 2.306(3.2366)] = 0.95

Pr[67.90 ≤ E(Y|X₀ = 100) ≤ 82.83] = 0.95
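A Python sketch (assuming SciPy; the inputs are the example's own figures) that reproduces this mean-prediction interval:

import math
from scipy.stats import t

# Values from the consumption-income example
b0, b1 = 24.4545, 0.5091
sigma2_hat, n, x_bar, sum_x2 = 42.159, 10, 170.0, 33_000.0
x0 = 100.0

y_hat = b0 + b1 * x0                                   # 75.3645
var_mean = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sum_x2)
se_mean = math.sqrt(var_mean)                          # ~3.2366

t_crit = t.ppf(0.975, n - 2)                           # ~2.306
print(f"95% CI for E(Y|X=100): "
      f"({y_hat - t_crit * se_mean:.2f}, {y_hat + t_crit * se_mean:.2f})")
# -> roughly (67.90, 82.83)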

Example: Given the estimated regression Ŷᵢ = 3.6 + 0.75Xᵢ, where Yᵢ is the value of sales for a firm (in thousands of Birr) and Xᵢ is advertising expense (in hundreds of Birr), predict (a) the value of sales for a firm, and (b) the average value of sales for firms, with an advertising expense of six hundred Birr. (Here n = 10, X̄ = 8, Σxᵢ² = 28 and σ̂ = 1.35.)

Solution:

Point prediction: Ŷ = 3.6 + 0.75(6) = 8.1; that is, the predicted sales value for an advertising expense of 600 Birr is 8,100 Birr.

a. Individual prediction: the standard error of the individual forecast is

se(Ŷ*₀) = σ̂√[1 + 1/n + (X₀ − X̄)²/Σxᵢ²] = 1.35√[1 + 1/10 + (6 − 8)²/28] = 1.35(1.115) ≈ 1.505

Hence the 95% CI is 8.1 ± (2.306)(1.505) ≈ [4.63, 11.57].

b. Mean prediction: [average sales | advertising of 600 Birr] = 8,100 Birr. The standard error of the mean forecast is

se(Ŷ₀) = σ̂√[1/n + (X₀ − X̄)²/Σxᵢ²] = 1.35√[1/10 + (6 − 8)²/28] = 1.35(0.493) ≈ 0.665

Hence the 95% CI is 8.1 ± (2.306)(0.665) ≈ [6.57, 9.63].
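The contrast between the individual and the mean forecast intervals can be seen in the following Python sketch (assuming SciPy; the inputs are the example's figures). The individual interval is wider because of the extra 1 in the variance term:

import math
from scipy.stats import t

# Values from the sales-advertising example
b0, b1 = 3.6, 0.75
sigma_hat, n, x_bar, sum_x2 = 1.35, 10, 8.0, 28.0
x0 = 6.0

y_hat = b0 + b1 * x0                       # 8.1 (thousand Birr)
core = 1 / n + (x0 - x_bar) ** 2 / sum_x2  # shared variance term

se_ind = sigma_hat * math.sqrt(1 + core)   # individual forecast, ~1.505
se_mean = sigma_hat * math.sqrt(core)      # mean forecast, ~0.665

t_crit = t.ppf(0.975, n - 2)               # ~2.306
for label, se in [("individual", se_ind), ("mean", se_mean)]:
    print(f"95% CI ({label}): "
          f"({y_hat - t_crit * se:.2f}, {y_hat + t_crit * se:.2f})")
# -> individual ~ (4.63, 11.57), mean ~ (6.57, 9.63)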

2.9. Evaluating the results of regression

Now that we have presented the results of the regression analysis of our consumption-income example above, we would like to question the adequacy of the fitted model. How "good" is the fitted model? We need some criteria with which to answer this question. First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? A priori, β₁, the marginal propensity to consume (MPC) in the consumption function, should be positive; in the present example it is. Second, if theory says that the relationship should be positive, and we found it, is it statistically significant? The MPC is not only positive but also statistically significantly different from zero; the p value of the estimated t value is extremely small. The same comments apply to the intercept coefficient. Third, how well does the regression model explain the variation in consumption expenditure? One can use R² to answer this question. In the present example R² is about 0.96, which is a very high value considering that R² can be at most 1.

Thus, the model we have chosen for explaining consumption expenditure behavior seems quite good.
We should also see whether our model satisfies the assumptions of CNLRM. We will not look at the
various assumptions now because the model is patently so simple. But there is one assumption that we
would like to check, namely, the normality of the disturbance term, ui. Recall that the t and F tests used
require that the error term follow the normal distribution. Otherwise, the testing procedure will not be
valid in small, or finite, samples.

Normality tests: although several tests of normality are discussed in the literature, we will consider just
three: Histogram of residuals, Normal probability plot (NPP), and Jarque-Bera test(JB)
1. Histogram of residuals
A histogram of residuals is a simple graphical device used to learn something about the shape of the probability density function (PDF) of a random variable. On the horizontal axis we divide the values of the variable of interest (the OLS residuals) into suitable intervals, and over each class interval we erect a rectangle equal in height to the number of observations (frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether the normal (PDF) approximation may be appropriate. It is always good practice to plot the histogram of the residuals as a rough-and-ready method of testing the normality assumption.
2. Normal probability plot (NPP)
A comparatively simple graphical device for studying the shape of the probability density function (PDF) of a random variable is the normal probability plot (NPP), which makes use of specially designed normal probability paper. On the horizontal (X) axis we plot values of the variable of interest (say, the OLS residuals ûi), and on the vertical (Y) axis we show the expected value of this variable if it were normally distributed. Therefore, if the variable is in fact from a normal population, the NPP will be approximately a straight line. Stated the other way round, if the fitted line in the NPP is approximately a straight line, one can conclude that the variable of interest is normally distributed.
3. Jarque-Bera test (JB)
The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. The test first computes the skewness and kurtosis of the OLS residuals and uses the following test statistic:

  JB = n[ S²/6 + (K − 3)²/24 ] ..................................................................2.150

where n = sample size, S = skewness coefficient and K = kurtosis coefficient.

For a normally distributed variable, S = 0 and K = 3. Therefore, the JB test of normality is a test of the joint hypothesis that S and K are 0 and 3, respectively; in that case the value of the JB statistic is expected to be 0. Under the null hypothesis that the residuals are normally distributed, the JB statistic given in 2.150 follows the chi-square distribution with 2 df asymptotically (i.e., in large samples), as Jarque and Bera showed. If the computed p value of the JB statistic in an application is sufficiently low, which will happen if the value of the statistic is very different from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p value is reasonably high, which will happen if the statistic is close to zero, we do not reject the normality assumption.
The sample size in our consumption-income example is rather small; hence, strictly speaking, one should not use the JB statistic. If we mechanically apply the JB formula to our example, the JB statistic turns out to be 0.7768. The p value obtained from the chi-square distribution with 2 df is about 0.68, which is quite high. In other words, we may not reject the normality assumption for our example. Of course, bear in mind the warning about the sample size.
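
In STATA the three checks take only a few lines. A minimal sketch, assuming the consumption-income variables are named Y and X; note that sktest is STATA's small-sample-adjusted skewness-kurtosis test, close in spirit to (but not identical with) the JB statistic in 2.150:

  quietly regress Y X
  predict e, residuals
  histogram e, normal       // histogram of residuals with a normal curve overlaid
  pnorm e                   // standardized normal probability plot
  sktest e                  // joint test based on skewness and kurtosis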

2.10. Reporting the Results of Regression Analysis


The key regression results can be presented very briefly, and there are various ways of reporting them. It has become customary to present the estimated equation (the coefficient estimates) along with the standard errors (se) and/or t values placed in parentheses below the estimated parameter values. These results should be supplemented by R² (to the right of the regression equation). In addition, determining the appropriate degrees of freedom for the t test requires sample size information. A conventional way of summarizing this information is as follows:

  Ŷ = β̂0 + β̂1X
      (s0)   (s1)        R² = __   n = __   d.f. = __   F = __

Example: using consumption data, the estimated result may be summarized and presented as follows:

  Ŷ = 128.5 + 2.88X
      (38.2)  (0.85)     R² = 0.93,  n = 25
Alternative approach

When the only restrictions likely to be of interest are β0 = 0 and/or β1 = 0, researchers sometimes report the values of the t ratios β̂0/se(β̂0) and β̂1/se(β̂1) in parentheses under the coefficients, rather than the standard errors. In this way the reader can observe immediately whether each t ratio exceeds the critical value corresponding to some chosen significance level.

  Ŷ = β̂0 + β̂1X
      (t0)   (t1)        R² = __   n = __   d.f. = __   F = __   ……………………2.151
Sometimes the result is presented with more information: the estimated coefficients, the corresponding standard errors, the p values, and some other indicators are reported together.

  Ŷ = β̂0 + β̂1X
      (s.e0)      (s.e1)       R² = __   n = __   d.f. = __   F = __
      (t0)        (t1)
      (P-value)   (P-value)    ……………2.152

For example, the previous consumption expenditure-income regression result is reported as

  Ŷ = 24.45 + 0.5091X
  (s.e.)  (6.4138)   (0.0357)    R² = 0.9621   d.f. = 8   F(1,8) = 202.87
  (t)     (3.8128)   (14.2605)
  (P-v)   (0.0026)   (0.0000)   --- 2.153

In Eq. (2.153) the figures in the first set of parentheses are the estimated standard errors of the regression coefficients, in the second set are the estimated t values computed under the null hypothesis that the true population value of each regression coefficient individually is zero, and in the third set are the estimated p values. Accordingly, for each coefficient they are computed as

  t = (β̂0 − 0)/se(β̂0) = (24.45 − 0)/6.4138 = 3.8128

  t = (β̂1 − 0)/se(β̂1) = (0.5091 − 0)/0.0357 = 14.2605
One can compare each computed t value with the tabulated value at the reported level of significance; usually the "2-t" rule of thumb is used for the 95% level of confidence.

By presenting the p values of the estimated t coefficients, we can see at once the exact level of significance of each estimated t value. Thus, under the null hypothesis that the true population intercept value is zero, the exact probability (i.e., the p value) of obtaining a t value of 3.8128 or greater is only about 0.0026 for 8 df, which is less than 0.05 (the level of significance); we reject H0, in the sense that the sample evidence enables us to reject the claim that the true intercept is zero. Put differently, if we reject this null hypothesis, the probability of committing a Type I error is about 26 in 10,000, a very small probability indeed. For β̂1, the probability of obtaining a t value of 14.2605 or greater is about 0.000003, which is very small indeed.

If the true MPC were in fact zero, our chances of obtaining an MPC of 0.5091 would be practically zero. Our data give a p value close to zero, which says that the probability of a Type I error is very small. Hence we can reject the null hypothesis that the true MPC is zero: it is statistically significant.

Earlier we showed the intimate connection between the F and t statistics, namely F(1,k) = t²(k). Under the null hypothesis that the true β1 = 0 (see equ. 2.30), the F value is 202.87 (for 1 numerator and 8 denominator df) and the t value is about 14.24 (8 df); as expected, the former is the square of the latter, except for rounding errors.
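
All the ingredients of a report like 2.152 are stored by STATA after estimation. A minimal sketch, again assuming variables named Y and X as in the chapter's example:

  quietly regress Y X
  display "b1 = " _b[X] "   se = " _se[X] "   t = " _b[X]/_se[X]
  display "R2 = " e(r2) "   n = " e(N) "   d.f. = " e(df_r) "   F = " e(F)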

2.11. Non-linear relationships

Sometimes we are interested in the percentage change in Y brought about by a percentage change in X, which is called an elasticity. In computing the elasticity from the regression line we use the estimates β̂0 and β̂1 together with the sample means of the data. Since the slope of the fitted line is dŶ/dX = β̂1, evaluating at the point of sample means (X̄, Ȳ) gives the elasticity

  η̂ = (dŶ/dX)·(X̄/Ȳ) = β̂1·(X̄/Ȳ)
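
After any linear regression this elasticity at the means can be obtained directly in STATA. A minimal sketch, assuming the variables are named Y and X as in the consumption-income example:

  quietly regress Y X
  margins, eyex(X) atmeans     // reports the elasticity dlnY/dlnX at the sample means

Because the OLS line passes through the point of means, the fitted value at X̄ equals Ȳ, so the reported figure equals β̂1(X̄/Ȳ).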

2.12. Applications

Using the data in Table 1.4 we estimated the model; the result is reported below.
. reg Y X

      Source |       SS       df       MS              Number of obs =      12
-------------+------------------------------           F(  1,    10) = 1094.16
       Model |  2063428.34     1  2063428.34           Prob > F      =  0.0000
    Residual |   18858.572    10   1885.8572           R-squared     =  0.9909
-------------+------------------------------           Adj R-squared =  0.9900
       Total |  2082286.91    11   189298.81           Root MSE      =  43.426

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   1.377394   .0416407    33.08   0.000     1.284613    1.470175
       _cons |   358.4541   120.6035     2.97   0.014     89.73284    627.1754
------------------------------------------------------------------------------

a. Formally present the result:
   i. In the conventional way, using t values
   ii. In the conventional way, using standard errors
   iii. In full form (including p values)
b. Do the signs of the estimated coefficients agree with your intuition?
c. Test the statistical significance of the model.
d. Interpret the coefficients.

Answers

a.
   i.   Ŷ = 358.45 + 1.377X
             (2.97)   (33.08)      R² = 0.9909,  n = 12
   ii.  Ŷ = 358.45 + 1.377X
             (120.60) (0.0416)     R² = 0.9909,  n = 12
   iii. Ŷ = 358.45 + 1.377X
        (s.e.) (120.60) (0.0416)   R² = 0.9909   d.f. = 10   F(1,10) = 1094.16
        (t)    (2.97)   (33.08)
        (P-v)  (0.014)  (0.000)
b. All the signs are in line with economic theory; however, if the slope is interpreted as a marginal propensity (as in a consumption function) it should lie between zero and one, whereas here it is above one.
c. Both the constant and the slope coefficient are statistically significant: their p values (0.014 and 0.000) are below 0.05.
d. A one-unit increase in the independent variable raises the dependent variable by about 1.38 units on average; when X is zero, the predicted Y is about 358.45.

Practical Exercise 2.20

A researcher wants to examine the relationship between sales and advertising expenditure using the data presented in Table 2.7; the following regression result was obtained.

Table 2.17: Regression result, sales on advertising

. reg sales adver

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =    8.60
       Model |       15.75     1       15.75           Prob > F      =  0.0189
    Residual |       14.65     8     1.83125           R-squared     =  0.5181
-------------+------------------------------           Adj R-squared =  0.4579
       Total |        30.4     9  3.37777778           Root MSE      =  1.3532

------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       adver |        .75   .2557377     2.93   0.019     .1602677    1.339732
       _cons |        3.6   2.090177     1.72   0.123    -1.219956    8.419956
------------------------------------------------------------------------------

a. Formally present the result:
   i. In the conventional way, using t values
   ii. In the conventional way, using standard errors
   iii. In full form (including p values)
b. Do the signs of the estimated coefficients agree with your intuition?
c. Test the statistical significance of the model.
d. Interpret the coefficients.

Summary
We have seen the simple linear regression model, in which there is a relationship between one dependent variable (say Y) and one independent variable (say X). The econometric model is specified as Yi = β0 + β1Xi + Ui. The model has two components: a deterministic part, β0 + β1Xi, and a stochastic part, Ui. The basic reasons for the random part are: omission of variables from the regression, the random behavior of human beings, imperfect specification of the mathematical form of the model, errors of aggregation, unavailability of data, errors of measurement, poor proxy variables, the principle of parsimony, etc. Although the model is simplistic, and hence unrealistic in most real-world situations, a good understanding of its basic underpinnings will go a long way towards helping you understand more complex models.

The basic objectives are to estimate the relationship, if it exists, between the variables, to forecast economic outcomes, and to use the model for policy purposes. To estimate the model we would use population data in principle, but in most practical situations we do not have full information on the population, owing to its unmanageable size and the resource cost of data collection. Hence we use the sample regression function to estimate the population regression function.

To this end, it is best to follow the econometric methodology stated in chapter one. We start with the theoretical framework of an economic relationship and extend it to an econometric model with a set of assumptions about the error term and its distribution. The next step is estimation of the numerical values of the parameters of the economic relationship. Three of the most commonly used methods of estimation are the ordinary least squares method (OLS), the maximum likelihood method (MLM) and the method of moments (MM). Having estimated the model, it should be evaluated; there are three criteria for evaluation: economic criteria, statistical criteria, and econometric criteria. Hypothesis testing then follows, in which we start with a hypothesized value for an estimator drawn from prior experience or theory. The aim of statistical inference is to draw conclusions about the population parameters from the sample statistics. Given the hypothesis, we obtain estimates of the parameters of the economic relationship and then test their statistical reliability by applying some rule which enables us to decide whether to accept or reject the estimates. There are several such rules: the standard error approach, the test of significance with confidence intervals, and the p-value approach. The two mutually complementary approaches for devising such rules are the confidence interval and the test of significance.

Key terms

Linearity                          Population regression line
OLS assumptions                    Sample regression line
Economic criteria                  p-value
Econometric criteria               t-value
Statistical criteria               Regression sum of squares
Method of moments                  Goodness of fit
Maximum likelihood method          Simple linear regression model
Best linear unbiased estimator     Total sum of squares
Confidence interval                Test statistic
Normal equations                   Statistically significant


Review exercise
Part I: Choose the best answer and write it on the space provided.

1. The error term in a linear regression model of the form Yi = β0 + β1Xi + Ui is included:
   a. to capture any element of randomness in human behavior in each individual
   b. to take into account a wrong functional form
   c. to keep our regression model as simple as possible
   d. to take into account errors in measurement
   e. all
   f. none

2. Regression analysis
   a. is concerned with the correlation of one variable with one or more other variables
   b. is concerned with statistical dependence among variables
   c. predicts the average value of one variable on the basis of the fixed values of the other variables
   d. B and C
   e. all
   f. none
3. Which of the following is true about the desired properties of estimators as per the Gauss-Markov theorem?
   a. linear
   b. unbiased
   c. minimum variance
   d. all
   e. none
4. Which of the following is among the methods of estimating the coefficients of the simple linear model?
   a. ordinary least squares
   b. method of moments
   c. maximum likelihood method
   d. all
   e. none
5. Why is OLS the most powerful and popular method of estimation?
   a. the estimators obtained using the OLS method have some optimal properties
   b. simple computational procedures
   c. easily understandable
   d. it is an essential component of most econometric techniques
   e. all
   f. none
6. As per the OLS method,
   a. the sum of the error terms is minimized
   b. the sum of the squares of the error terms is minimized
   c. the sum of the deviations of the error terms from their mean is minimized
   d. all
   e. none
7. What does R² measure?
   a. it measures goodness of fit
   b. the extent to which the regression model captures the variation in Y
   c. the correlation between two variables
   d. the covariance between two variables
   e. A and B
   f. all
   g. none
Part II: Answer the following questions clearly.
1. The following results have been obtained from a sample of 11 observations on the value of sales (Y) of a firm and the corresponding prices (X):

   X̄ = 519.18   Ȳ = 217.82
   ΣXi² = 3,134,543   ΣXiYi = 1,296,836   ΣYi² = 539,512

   (i) Estimate the regression line of sales on price and interpret the results.
   (ii) What is the part of the variation in sales which is not explained by the regression line?
   (iii) Estimate the price elasticity of sales at the average values of sales and price.
2. The supply of a hypothetical product and its corresponding prices are given as follows for 12 observations.

Table 2.18: Hypothetical product supply and corresponding prices

   N   1   2   3   4   5   6   7   8   9  10  11  12
   X   9  12   6  10   9  10   7   8  12   6  11   8
   Y  69  76  52  56  57  77  58  55  67  53  72  64

then:
   a. compute ΣY², ΣX², ΣXY, Σy², Σx², Σxy
   b. assuming Y = β0 + β1Xi + Ui, obtain the OLS estimators of β0 and β1
   c. write the fitted model and interpret the results
   d. compute the variances of β̂0 and β̂1
   e. compute R²
Y  Bo  B1 X i  U i
3. Given simple linear regression model of the form and all classical linear re-
gression assumptions, show that OLS estimator satisfy BLUE property specifically

B1
a) is linear
168

B1
b) is unbiased

c)
 0 has minimum variance

4. The following table gives the gross national product (X) and the expenditure on food (Y), measured in arbitrary units, in an underdeveloped country over the ten-year period 1960-69.

Table 2.19: Gross national product (X) and expenditure on food (Y)

        1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
   Y      6    7    8   10    8    9   10    9   11   10
   X     50   52   55   59   57   58   62   65   68   70

   a. Estimate the food function Y = β1 + β2X + u.
   b. What is the economic meaning of your results?
   c. Compute the coefficient of determination and find the explained and unexplained variation in food expenditure.
   d. Compute the standard errors of the regression estimates and conduct tests of significance at the 5 percent level of significance.
   e. Find the 99 percent confidence interval for the population parameters.
5. Given the following estimated consumption function

   Ĉt = 5,000 + 0.8Yt        r² = 0.95
        (500)   (0.09)       n = 15

   where C = consumption expenditure and Y = income:
   a. Evaluate the estimated function on the basis of (i) the available economic theory, (ii) the statistical criteria r² and t tests on the β's (use α = 5 percent).
   b. Estimate the savings function.
   c. Estimate the MPC and MPS.
   d. Interpret the constant intercepts of the consumption and saving functions.
   e. Forecast the level of consumption and the level of saving for 1980, if in that year income is $200,000.
Appendix 2.1: Proof of the BLUE property
This appendix presents the detailed proofs of the BLUE properties stated in section 2.5.2.
2.5.2.1. Linearity
Definition: An equation is linear with respect to a set of variables Z if it can be written in the form Σai Zi, where the ai do not depend on Z; otherwise it is non-linear. For instance, an equation such as W = aY + bZ² is linear with respect to Y, because it can be written as W − bZ² = aY, where a does not depend on Y; but it is non-linear with respect to Z, because the term bZ² cannot be written as cZ with c independent of Z.

Definition: An estimator is linear if the equation that defines it is linear with respect to the dependent variable Y; otherwise it is non-linear.

Proposition: The least squares estimator is linear, because we can rewrite the formula for the least squares estimator in the linear form β̂1 = ΣkiYi, where the weights ki = xi/Σxi² do not depend on Y.

Proof

i. Linearity of β̂1:
From (2.17) the OLS estimator of β1 is given by

  β̂1 = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi²

but Σxi = Σ(Xi − X̄) = ΣXi − nX̄ = nX̄ − nX̄ = 0, so that

  β̂1 = ΣxiYi/Σxi²

Now let ki = xi/Σxi² (i = 1, 2, …, n); then β̂1 = ΣkiYi, which is linear in Y.

ii. Linearity of β̂0:
We have established that β̂1 = ΣkiYi. Substituting this into β̂0 = Ȳ − β̂1X̄ we obtain

  β̂0 = Σ(1/n)Yi − X̄ΣkiYi

Taking Yi as a common factor we may write

  β̂0 = Σ(1/n − X̄ki)Yi

which is again linear in Y.
2.5.2.2. Unbiasedness

Proposition: β̂0 and β̂1 are the unbiased estimators of the true parameters β0 and β1.

From your statistics course, you may recall that if θ̂ is an estimator of θ then E(θ̂) − θ is the amount of bias, and if θ̂ is the unbiased estimator of θ then the bias is 0, i.e.

  E(θ̂) − θ = 0  ⇒  E(θ̂) = θ

Definition: An estimator is unbiased if its expected value is equal to the true value of the parameter it estimates; otherwise it is biased. Algebraically, E(β̂) = β.

An unbiased estimator is clearly desirable because it means that, on average, the estimated value will be the true value, even though in a particular sample this may not be so. Put another way, if an estimator β̂ is unbiased then the expected difference between β̂ and β is zero: the estimates do not tend to be too high or too low. In other words, if an estimator is unbiased, then we know that, although it probably will not take the true value exactly, it does not tend to be off in one particular direction or the other. If it is biased, then it tends to give an answer that is either too high (if the expected difference is positive) or too low (if the expected difference is negative).

A. Proof (1): prove that β̂1 is unbiased, i.e. E(β̂1) = β1.

We know that β̂1 = ΣkiYi = Σki(β0 + β1Xi + ui), so that

  β̂1 = β0Σki + β1ΣkiXi + Σkiui

Now,

  Σki = Σxi/Σxi² = Σ(Xi − X̄)/Σxi² = (ΣXi − nX̄)/Σxi² = 0

because ΣXi = nX̄. Moreover, ΣkiXi = ΣxiXi/Σxi² = 1, since the numerator ΣxiXi equals Σxi². To prove that the numerator equals Σxi², follow the procedure below:

  Σxi² = Σ(Xi − X̄)²
       = Σ(Xi² − 2XiX̄ + X̄²)
       = ΣXi² − 2X̄ΣXi + nX̄²
       = ΣXi² − 2X̄ΣXi + X̄ΣXi   (since nX̄² = X̄·nX̄ = X̄ΣXi)
       = ΣXi² − X̄ΣXi
       = ΣxiXi

Therefore the expression for β̂1 reduces to

  β̂1 = β1 + Σkiui = β1 + Σxiui/Σxi²

Taking the expected values and recalling that the Xi's are fixed, we obtain

  E(β̂1) = E(β1) + ΣxiE(ui)/Σxi²

Since β1, the true parameter, is constant, E(β1) = β1. Furthermore, since the mean of ui is zero (E(ui) = 0), the second term on the right-hand side vanishes and we have

  Mean of β̂1 = E(β̂1) = β1 -------------------------------------- (A2.5)

Equation (A2.5) is thus read as: the mean of the OLS estimator β̂1 is equal to the true value of the parameter β1. Therefore, β̂1 is an unbiased estimator of β1.

B. Proof (2): prove that β̂0 is unbiased, i.e. E(β̂0) = β0.

From the proof of the linearity property above, we know that

  β̂0 = Σ(1/n − X̄ki)Yi

Substituting Yi = β0 + β1Xi + ui and using Σki = 0 and ΣkiXi = 1,

  β̂0 = β0Σ(1/n − X̄ki) + β1Σ(1/n − X̄ki)Xi + Σ(1/n − X̄ki)ui
     = β0(1 − X̄Σki) + β1(X̄ − X̄ΣkiXi) + Σ(1/n − X̄ki)ui
     = β0 + Σ(1/n − X̄ki)ui …………………………………… (A2.6)

Taking expectations, E(β̂0) = β0 + Σ(1/n − X̄ki)E(ui) = β0. Hence β̂0
is an unbiased estimator of β0.
2.5.2.3. Consistency

Definition: An estimator is consistent if, as the sample gets arbitrarily large, the difference between the estimated value and the true value gets arbitrarily small; otherwise it is inconsistent. This property assures us that, as the data set becomes large enough, the probability of a damagingly large error is reduced and becomes as close to 0 as we need. It gives us some assurance that β̂ is, in some sense, close to the true β.

In intuitive terms, consistency is the property that the estimator converges to the true value as the sample size is increased indefinitely. However, this property has its limits as well, because we never have an infinite amount of data, or even as large an amount as we would like to have. For finite samples of data, the probability of being in any given range around the true value is less than one, and we do not know exactly how much less. We would like to use an estimator that is as likely as possible to be close to the true value; even if that likelihood is not very high, it is the best option we have if our estimator has a better chance than any other estimator.

Property: The least squares estimators are consistent provided that, as n grows, (1/n)Σxi² converges to a finite non-zero limit and (1/n)Σxiui converges to zero. Conversely, an estimator that is unaffected by an increase in the sample size is clearly not consistent.

2.5.2.4. Efficiency (minimum variance of β̂0 and β̂1)

Now we have to establish that, out of the class of linear and unbiased estimators of β0 and β1, the OLS estimators β̂0 and β̂1 possess the smallest sampling variances. For this we shall first obtain the variances of β̂0 and β̂1, and then establish that each has the minimum variance in comparison with the variances of other linear and unbiased estimators obtained by any econometric method other than OLS.

I. Deriving the variances of β̂0 and β̂1

a. Variance of β̂1: it can be proved that var(β̂1) = σ²/Σxi².

Recall the following in order to proceed with the proof:

  Σki = 0
  ΣkiXi = 1
  Σki² = Σ(xi/Σxi²)² = Σxi²/(Σxi²)² = 1/Σxi²

Therefore β̂1 = ΣkiYi = β1 + Σkiui, and we have already proved that E(β̂1) = β1, i.e. β̂1 is unbiased. Thus

  var(β̂1) = E[β̂1 − E(β̂1)]² = E(β̂1 − β1)² --------------------------------------------------------------------- (A2.8)

  var(β̂1) = E(Σkiui)² ----------------------------------------------------(A2.9)

Expanding the right-hand side of equation (A2.9) gives

  E(Σkiui)² = E[k1²u1² + k2²u2² + … + kn²un² + 2k1k2u1u2 + … + 2kn−1knun−1un]
            = E(Σki²ui²) + E(Σkikjuiuj),  i ≠ j
            = Σki²E(ui²) + 2ΣkikjE(uiuj)
            = σ²Σki²   (since E(uiuj) = 0 for i ≠ j and E(ui²) = σ²)

and, since Σki² = 1/Σxi²,

  var(β̂1) = σ²/Σxi² ------------------------------------------------(A2.10)
b. Variance of β̂0: it can also be proved that var(β̂0) = σ²ΣXi²/(nΣxi²).

Recall first that β̂0 = Ȳ − β̂1X̄, with β̂1 = ΣkiYi, so that

  β̂0 = Ȳ − X̄ΣkiYi = Σ(1/n − X̄ki)Yi

Therefore,

  var(β̂0) = var[Σ(1/n − X̄ki)Yi] ------------------------------------------(A2.12)
           = Σ(1/n − X̄ki)² var(Yi)

But var(Yi) = E[Yi − E(Yi)]² = E(ui²) = σ², so

  var(β̂0) = σ² Σ(1/n − X̄ki)²
           = σ² Σ(1/n² − 2X̄ki/n + X̄²ki²)
           = σ² [1/n − (2X̄/n)Σki + X̄²Σki²]

Since Σki = 0 and Σki² = 1/Σxi², we obtain

  var(β̂0) = σ² [1/n + X̄²/Σxi²] = σ² [(Σxi² + nX̄²)/(nΣxi²)] -------------- (A2.13)

Since Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄², the numerator Σxi² + nX̄² reduces to ΣXi², and the above expression becomes

  var(β̂0) = σ² [ΣXi²/(nΣxi²)] ------------------------------- (A2.14)

Equivalently, the same result follows directly from

  1/n + X̄²/Σxi² = (Σxi² + nX̄²)/(nΣxi²) = ΣXi²/(nΣxi²) …………………………………………(A2.15)

II. Establishing that the variances of the OLS estimators are the minimum

We have computed the variances of the OLS estimators. Now it is time to check whether these variances possess the minimum variance property compared with the variances of any other estimators of the true β0 and β1 other than β̂0 and β̂1.

To establish that β̂0 and β̂1 possess the minimum variance property, we compare their variances with those of some other alternative linear and unbiased estimators of β0 and β1, say β0* and β1*. That is, we want to prove that any other linear and unbiased estimator of the true population parameter, obtained from any other econometric method, has a larger variance than the OLS estimator.

Let us first show the minimum variance of β̂1 and then that of β̂0.

1. Minimum variance of β̂1

Suppose β1* is an alternative linear and unbiased estimator of β1, and let

  β1* = ΣwiYi ………………………………(A2.16)

where, in general, wi ≠ ki; say wi = ki + ci.

Since Yi = β0 + β1Xi + ui,

  β1* = Σwi(β0 + β1Xi + ui) = β0Σwi + β1ΣwiXi + Σwiui

so that, since E(ui) = 0,

  E(β1*) = β0Σwi + β1ΣwiXi

Since β1* is assumed to be an unbiased estimator of β1, it must be true that Σwi = 0 and ΣwiXi = 1 in the above equation.

But wi = ki + ci, so Σwi = Σ(ki + ci) = Σki + Σci; therefore Σci = 0, since Σki = Σwi = 0. Again, ΣwiXi = Σ(ki + ci)Xi = ΣkiXi + ΣciXi; since ΣwiXi = 1 and ΣkiXi = 1, it follows that ΣciXi = 0.

From these values we can derive Σcixi = 0, where xi = Xi − X̄:

  Σcixi = Σci(Xi − X̄) = ΣciXi − X̄Σci = 0

since ΣciXi = 0 and Σci = 0. Thus, from the above calculations we can summarize the following results:

  Σwi = 0,  ΣwiXi = 1,  Σci = 0,  Σcixi = 0

To check whether β1* has minimum variance or not, let us compute var(β1*) to compare with var(β̂1):

  var(β1*) = var(ΣwiYi) = Σwi² var(Yi) = σ²Σwi²   (since var(Yi) = σ²)

But Σwi² = Σ(ki + ci)² = Σki² + 2Σkici + Σci², and

  Σkici = Σcixi/Σxi² = 0

so Σwi² = Σki² + Σci². Therefore,

  var(β1*) = σ²(Σki² + Σci²) = var(β̂1) + σ²Σci² ……………………A2.17

Given that ci is an arbitrary constant, σ²Σci² is positive (it is greater than zero unless every ci is zero). Thus var(β1*) ≥ var(β̂1). This proves that β̂1 possesses the minimum variance property. In a similar way we
can prove that the least squares estimate of the constant intercept (β̂0) possesses minimum variance.

2. Minimum variance of β̂0

We take a new estimator β0*, which we assume to be a linear and unbiased estimator of β0. The least squares estimator β̂0 is given by

  β̂0 = Σ(1/n − X̄ki)Yi

By analogy with the proof of the minimum variance property of β̂1, let us use the weights wi = ki + ci and define

  β0* = Σ(1/n − X̄wi)Yi

Since we want β0* to be an unbiased estimator of the true β0, that is E(β0*) = β0, we substitute Yi = β0 + β1Xi + ui into β0* and find the expected value of β0*. For β0* to be an unbiased estimator of the true β0, the following must hold:

  Σwi = 0 and ΣwiXi = 1

These conditions again imply that Σci = 0 and ΣciXi = 0.

As in the case of β̂1, we need to compute var(β0*) to compare with var(β̂0):

  var(β0*) = var[Σ(1/n − X̄wi)Yi]
           = Σ(1/n − X̄wi)² var(Yi)
           = σ² Σ(1/n − X̄wi)²
           = σ² Σ(1/n² − 2X̄wi/n + X̄²wi²)
           = σ² [1/n − (2X̄/n)Σwi + X̄²Σwi²]
           = σ² (1/n + X̄²Σwi²)   (since Σwi = 0)

But Σwi² = Σki² + Σci², so

  var(β0*) = σ² (1/n + X̄²Σki²) + σ²X̄²Σci²
           = σ² [ΣXi²/(nΣxi²)] + σ²X̄²Σci²

The first term is var(β̂0); hence

  var(β0*) = var(β̂0) + σ²X̄²Σci² …………………………..A2.18

and, since σ²X̄²Σci² > 0 (unless every ci is zero), var(β0*) > var(β̂0).

Therefore, we have proved that the least squares estimators of the linear regression model are the best linear unbiased (BLU) estimators.
Appendix 2.2: Proof of the variance of the error term
There are four steps in the proof.

I. Write the equation in deviation form

To prove the result we have to compute Σei² from the expressions for Y, Ŷ and y. We have

  Yi = Ŷi + ei …………………………………………………………….(A2.19)

Summing (A2.19) gives

  ΣYi = ΣŶi + Σei = ΣŶi, since Σei = 0 …………………………(A2.20)

Dividing both sides by n gives Ȳ = Ŷ̄. Subtracting this from (A2.19),

  (Yi − Ȳ) = (Ŷi − Ŷ̄) + ei
  yi = ŷi + ei …………………………….…………..……………(A2.21)

so that

  ei = yi − ŷi ………………………………………..….……………..(A2.22)

where the y's are in deviation form.

II. Compute yi

Now we express yi as follows. From the true model

  Yi = β0 + β1Xi + ui

and its sample average

  Ȳ = β0 + β1X̄ + ū

we get, by subtraction,

  yi = (Yi − Ȳ) = β1(Xi − X̄) + (ui − ū) = β1xi + (ui − ū) ………………(A2.23)

We assumed earlier that E(u) = 0, i.e. in taking a very large number of samples we expect ū to have a mean value of zero, but in any particular single sample ū is not necessarily zero.

III. Derive ŷi

Similarly, from

  Ŷi = β̂0 + β̂1Xi  and  Ȳ = β̂0 + β̂1X̄

we get, by subtraction,

  Ŷi − Ȳ = β̂1(Xi − X̄)
  ŷi = β̂1xi …………………………………………….…………………..(A2.24)

IV. Combining the three

Substituting (A2.23) and (A2.24) into (A2.22) we get

  ei = yi − ŷi = β1xi + (ui − ū) − β̂1xi = (ui − ū) − (β̂1 − β1)xi

The summation of the squared residuals over the n sample values yields

  Σei² = Σ[(ui − ū) − (β̂1 − β1)xi]²
       = Σ(ui − ū)² + (β̂1 − β1)²Σxi² − 2(β̂1 − β1)Σxi(ui − ū)

Taking expected values we have

  E(Σei²) = E[Σ(ui − ū)²] + E[(β̂1 − β1)²Σxi²] − 2E[(β̂1 − β1)Σxi(ui − ū)] ………(A2.25)

The right-hand-side terms of (A2.25) may be evaluated as follows.

a.  E[Σ(ui − ū)²] = E(Σui² − nū²)
                  = ΣE(ui²) − nE(ū²)
                  = nσ² − (1/n)E[(u1 + u2 + … + un)²]
                  = nσ² − (1/n)E(Σui² + 2Σuiuj),  i ≠ j
                  = nσ² − (1/n)(nσ²)   (given E(uiuj) = 0)
                  = (n − 1)σ² …………………………………………….…..(A2.26)

b.  E[(β̂1 − β1)²Σxi²] = Σxi²·E(β̂1 − β1)²
Given that the X's are fixed in all samples, and knowing that E(β̂1 − β1)² = var(β̂1) = σ²/Σxi²,

  Σxi²·E(β̂1 − β1)² = Σxi²·σ²/Σxi² = σ² ………………………….……………………(A2.27)

c.  −2E[(β̂1 − β1)Σxi(ui − ū)] = −2E[(β̂1 − β1)(Σxiui − ūΣxi)]
                               = −2E[(β̂1 − β1)Σxiui],  since Σxi = 0

But from the unbiasedness proof, β̂1 − β1 = Σkiui = Σxiui/Σxi², and substituting this into the above expression gives

  −2E[(Σxiui/Σxi²)(Σxiui)] = −2E[(Σxiui)²/Σxi²]
    = −(2/Σxi²)·E[Σxi²ui² + 2Σxixjuiuj],  i ≠ j
    = −(2/Σxi²)·[Σxi²E(ui²) + 2ΣxixjE(uiuj)]
    = −(2/Σxi²)·σ²Σxi²   (given E(uiuj) = 0)
    = −2σ² ……………………………………………..……….(A2.28)

Consequently, equation (A2.25) can be written in terms of (A2.26), (A2.27) and (A2.28) as

  E(Σei²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ² ………………..………….(A2.29)

from which we get

  E[Σei²/(n − 2)] = E(σ̂²) = σ² ………………………………………………..(A2.30)

where σ̂² = Σei²/(n − 2), Σei² being the residual sum of squares (RSS) and n − 2 the number of degrees of freedom (df). The term "number of degrees of freedom" means the total number of observations in the sample (n) less the number of independent (linear) constraints or restrictions put on them; in other words, it is the number of independent observations out of a total of n observations. Here two conditions, given by the normal equations for β̂0 and β̂1, are imposed,

  (1/n)Σei = 0  and  (1/n)ΣXiei = 0

before the RSS is computed. Thus σ̂² = Σei²/(n − 2) is an unbiased estimate of the true variance of the error term (σ²). The conclusion that we can draw from the above proof is that we can substitute σ̂² = Σei²/(n − 2) for σ² in the variance expressions of β̂0 and β̂1, since E(σ̂²) = σ². Hence the formulas for the variances of β̂0 and β̂1 become

  var(β̂1) = σ̂²/Σxi² = Σei²/[(n − 2)Σxi²] …………………..……………………(A2.31)

  var(β̂0) = σ̂²[ΣXi²/(nΣxi²)] = (Σei²·ΣXi²)/[n(n − 2)Σxi²] ………………..……………(A2.32)

It is thus the expected value of the estimated variance that equals the true variance, E(σ̂²) = σ², so the estimate is unbiased. This, together with the assumption that the mean of the error term is zero, E(ui) = 0, gives the assumption that the disturbance has a normal distribution:

  u ~ N(0, σ²) -------------------------------------------------------- (A2.33)

Finally, Σei² itself can be computed as Σei² = Σyi² − β̂1Σxiyi.
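
As a quick numerical check of (A2.30), σ̂² is exactly the square of the Root MSE that STATA reports after a regression. A minimal sketch, assuming variables named Y and X are in memory:

  quietly regress Y X
  predict e, residuals
  generate e2 = e^2
  quietly summarize e2
  display r(sum)/(e(N)-2) "  =  " e(rmse)^2   // RSS/(n-2) equals Root MSE squared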

Chapter Three: The Multiple Linear Regression Model

3.1. Economic theory and Empirical experiences


In simple regression we study the relationship between a dependent variable and a single independent variable. But it is rarely the case that economic relationships involve just two variables; rather, a dependent variable (Y) can depend on a whole series of explanatory variables or regressors. For instance, the quantity demanded of a good depends not only on the price of the good, but also on the prices of substitute goods and the consumer's income. In our consumption-income example, besides income, a number of other variables such as the wealth of consumers, family size, social norms, etc. are also likely to affect consumption expenditure. Therefore, we need to extend our simple two-variable regression model to cover more than two variables. Adding more variables leads us to the discussion of multiple regression models.

In this chapter we extend the general methodology of econometric analysis used in chapter two to the case with many variables. Accordingly, we start our discussion with the multiple linear regression theory and framework with two and three explanatory variables, and then we extend it to k explanatory variables. Then we look at the assumptions of multiple regression, data, estimation, hypothesis testing, prediction and reporting. Furthermore, we will proceed to generalize the multiple regression model using matrix algebra.

3.2. Multiple regression model: Notation and modeling

The multiple regression model is an immediate and appealing extension of simple linear regression to the case in which there is more than one explanatory variable, or a set of explanatory variables. It is more amenable to the ceteris paribus assumption, which allows us to explicitly control for many other factors that simultaneously affect the dependent variable. For better understanding, we extend our simple two-variable regression model to cover models involving more than two variables.

A. The two-explanatory-variable model

Recall our simple linear regression model

  Yi = β0 + β1Xi ……………..…………………….…………… 3.1

One easily extends this to the multiple linear regression model by adding explanatory variables. Let us start with the simplest form of the theory of demand. Economic theory postulates that the quantity demanded (say Y) of a given commodity depends on its price (X1) and consumers' income (X2). That is,

  Y = f(X1, X2) ……………………………………………………………..3.2

Assuming that the relationship between Y, X1 and X2 is linear, the mathematical model will be

  Y = β0 + β1X1 + β2X2 …………………….…………………..………………..3.3

Equation 3.3 shows that the relationship between Y and the explanatory variables X1 and X2 is exact, in the sense that the variations in the quantity demanded of Y are fully explained by changes in price (X1) and income (X2).

B. The three-explanatory-variable model
One can extend the logic, in a similar manner, to a model with four variables in which there is one explained and three explanatory variables. The quantity demanded (say Y) of a given commodity depends on its price (X1), consumers' income (X2) and the price of a substitute/complement good (X3). This can be written as

  Y = f(X1, X2, X3) ……………………….………………………………….3.4

The corresponding mathematical model is

  Yi = β0 + β1X1 + β2X2 + β3X3 ……………………..… …………………..…….3.5

C. The k-explanatory-variable case
Adding more variables leads us to the general multiple regression model.

  Economic model:      Yi = f(X1, X2, X3, …, Xk) …………….……………….….. …….3.6

  Mathematical model:  Yi = β0 + β1X1 + β2X2 + β3X3 + … + βkXk ………………… 3.7

3.3 Econometric model

3.3.1. Modelling
The above mathematical economic models can be extended to econometric models by including a disturbance term. The disturbance term is of a similar nature to that of the simple regression model, reflecting the basic random nature of human responses, errors of aggregation, errors of measurement, errors in the specification of the mathematical form of the model and any other (minor) factors, other than the Xi, that might influence Y. Accordingly, we include an error term in the models above.

A. For the two-explanatory-variable case, recall first the simple econometric model

  Yi = β0 + β1X + Ui ……………..……………...……..…………….…………… 3.8

If we gather observations on these variables and plot them in a diagram, we will observe that not all of them lie on a plane: some will lie on it, but others will lie above or below it. This scatter is due to the reasons we discussed in chapter 2. Hence we need to take the influence of such factors into account by introducing a random variable U. We also need some assumptions about the random variable Ui; the assumptions used here are the same as those used in the simple linear regression model.

For two explanatory variables the mathematical model of equation 3.3 is extended as

  Yi = β0 + β1X1 + β2X2 + Ui ………….……………..……………………….……3.9

where Yi is quantity demanded, X1 is own price and X2 is consumers' income. On a priori grounds we would expect the coefficient β1 to have a negative sign, given the law of demand. Since for normal commodities the quantity demanded changes in the same direction as income, β2 is expected to be positive.

For three explanatory variables the econometric model can be specified as

  Yi = β0 + β1X1 + β2X2 + β3X3 + Ui ………….……………..……………….……3.10

where Yi is quantity demanded, X1 is own price, X2 is consumers' income, X3 is the price of a substitute/complement good, the β's are unknown parameters and Ui is the disturbance term.

In the case of many explanatory variables, the econometric model is

  Yi = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ui …………………..3.11

However, the most logical way of writing the variables is with two subscripts, Xij, where i indicates the row (observation) and j the column (variable); the coefficients remain linear of degree one. This two-subscript representation is designed so that the model can be converted into matrix form (the matrix form of the econometric model is taken up later).

  Two-variable case:    Yi = β0 + β1X1i + β2X2i + ui ………… ……………..……………..….3.12
  Three-variable case:  Yi = β0 + β1X1i + β2X2i + β3X3i + ui ……………….. …..….………..3.13
  k-variable case:      Yi = β0 + β1X1i + β2X2i + β3X3i + … + βkXki + Ui (commonly used) …….…...........…3.14

Alternatively, if our data set contains a large number of observations with k explanatory variables, the general representation can be

  Two-variable case:    Yi = β0 + β1Xi1 + β2Xi2 + ui …………… ………………………...3.15
  Three-variable case:  Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + ui ……………. …..…..……...3.16
  k-variable case:      Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + … + βkXik + Ui ..………..3.17

where the Xj (j = 1, 2, 3, …, k) are the explanatory variables (in the case of time-series data the subscript t denotes time and i the i-th observation), Yi is the dependent variable, the βj (j = 0, 1, 2, …, k) are the unknown parameters and ui is the disturbance term.

The relationship among econometric variables is not explained by a single observation alone; written out observation by observation, the model is the system

  Y1 = β0 + β1X11 + β2X21 + β3X31 + … + βkXk1 + U1
  Y2 = β0 + β1X12 + β2X22 + β3X32 + … + βkXk2 + U2
  Y3 = β0 + β1X13 + β2X23 + β3X33 + … + βkXk3 + U3 ……………………3.18
   .       .        .        .                .
  Yn = β0 + β1X1n + β2X2n + β3X3n + … + βkXkn + Un

Note that the coefficients are linear of degree one, and that the representation with two subscripts in the equations above is designed so that the system can be converted to a matrix in its transposed form.

There may be many reasons why more than one predictor variable is needed. First, if we add more factors to our model, more of the variation in Y can be explained; thus multiple regression analysis can be used to build a better ground for predicting the dependent variable, and we want to know the individual effect, or contribution, of each independent variable in explaining the variation in the response variable. Secondly, the predictors may themselves be correlated. In order to understand the nature of multiple regression analysis easily, we start our analysis with the case of two explanatory variables and then extend this to the case of k explanatory variables.

3.3.2. Population and sample regression functions

Recall that corresponding to each PRF we have an SRF in the simple regression model; we extend the same concept to the multiple regression model.

  PRF:  Yi = β0 + β1X1i + β2X2i + Ui ………………………………….…..3.19

The expected value of the above model is called the population regression equation, i.e.

  E(Y) = β0 + β1X1 + β2X2, since E(Ui) = 0 …………………................... 3.20

Given the assumptions of the classical regression model, taking the conditional expectation of Y on both sides of (3.19), we obtain

  E(Yi | X1i, X2i) = β0 + β1X1i + β2X2i ………………………………………3.21

where the βj are the population parameters: β0 is referred to as the intercept, and β1 and β2 are also sometimes known as partial regression coefficients of the model. In words, (3.21) gives the conditional mean or expected value of Y conditional upon the given or fixed values of X1 and X2; β2, for example, measures the effect of a unit change in X2 on E(Y) when X1 is held constant. Therefore, as in the two-variable case, in multiple regression analysis, conditional upon the fixed values of the regressors, what we obtain is the average or mean value of Y, the mean response of Y for the given values of the regressors.

Since the population regression equation is unknown to any investigator, it has to be estimated from sample data. The sample counterpart of the above population regression function is

  Ŷi = β̂0 + β̂1X1i + β̂2X2i …………………………………………………3.22

As usual, we can find the conditional mean of Y given the explanatory variables by taking the conditional expectation of Y on both sides:

  Ê(Yi | X1i, X2i) = β̂0 + β̂1X1i + β̂2X2i ……………….………………………….3.23

It gives the estimated mean value of Yi for given fixed values of the regressors.

3.3.3. Interpretation of the multiple regression equation

The above coefficients β̂0, β̂1 and β̂2 are partial coefficients. β̂1 measures the change in the mean value E(Yi | X1i, X2i) per unit change in X1i, holding X2i constant: it is the slope of the sample regression equation with respect to X1 holding X2 constant, and it gives the direct or net effect of a unit change in X1 on the mean value of Y, keeping X2 constant. In a similar manner, β̂2 measures the change in the mean value E(Yi | X1i, X2i) per unit change in X2i, holding X1i constant: it is the slope of the sample regression equation with respect to X2 holding X1 constant, and it gives the direct or net effect of a unit change in X2 on the mean value of Y, keeping X1 constant.

Practical Exercise 3.2

Assume that our demand function is one with four independent variables: income, taste and preference, price of a substitute and price of a complement. Construct the population regression function and the corresponding sample regression econometric model, and give the ideal interpretation.

3.4. Estimation
3.4.1. Assumptions of the multiple regression model
In order to specify our multiple linear regression model and proceed with our analysis, some assumptions are compulsory. We continue to operate within the framework of the classical linear regression model (CLRM) introduced in chapter two, with the additional compulsory assumption of no perfect multicollinearity. More specifically, these assumptions are:
1. Linearity of the model in the parameters: the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are linear or not.
2. Randomness of the error term: the variable u is a real random variable.
3. Zero mean of the error term: the random variable ui has a zero mean for each value of Xi, that is, E(ui) = 0.
4. Homoscedasticity: the variance of each ui is the same for all values of the X's, i.e. E(ui²) = σu² (constant).
5. Normality of u: the values of each ui are normally distributed, i.e. ui ~ N(0, σ²).
6. No autocorrelation or serial correlation: the values of ui (corresponding to Xi) are independent of the values of any other uj (corresponding to Xj) for i ≠ j, i.e. E(uiuj) = 0 for i ≠ j.
7. The explanatory variables X1i, X2i, X3i, … are non-stochastic variables (fixed in repeated sampling).
8. Independence of ui and Xi: every disturbance term is independent of the explanatory variables, i.e. E(uiX1i) = E(uiX2i) = 0. This condition is automatically fulfilled if we assume that the values of the X's are sets of fixed numbers in all (hypothetical) samples.
9. No perfect multicollinearity (no exact linear relationship): the explanatory variables are not perfectly linearly correlated. The assumption of no collinearity is a new one and means the absence of the possibility of one of the explanatory variables being expressed as a linear combination of the others. The existence of exact linear dependence between X1 and X2 would mean that we effectively have only one independent variable in our model rather than two. If such a regression is estimated, there is no way to estimate the separate influences of X1 (β1) and X2 (β2) on Y, since such a regression gives us only the combined influence of X1 and X2 on Y. To see this, suppose X2i = 2X1i; then
   Yi = β0 + β1X1i + β2X2i + ui
   Yi = β0 + β1X1i + β2(2X1i) + ui
   Yi = β0 + (β1 + 2β2)X1i + ui
   Yi = β0 + αX1i + ui,  where α = (β1 + 2β2)
Estimating the above regression yields only the combined effect of X1 and X2, as represented by α = (β1 + 2β2); there is no possibility of separating their individual effects, β1 and β2 (see the sketch after this list).
This assumption does not guarantee that there will be no correlations among the explanatory variables; it only means that the correlations are not exact or perfect, as it is not impossible to find two or more (economic) variables that are correlated to some extent. Likewise, the assumption does not rule out non-linear relationships among the X's either.
10. Correct model specification: the model has no specification error, in that all important explanatory variables appear explicitly in the function and the mathematical form is correctly defined (linear or non-linear form, and the number of equations in the model).
11. No errors of measurement: the explanatory variables are measured without error.

We cannot exhaustively list all the assumptions, but the above are some of the basic assumptions that enable us to proceed with our analysis.
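
The identification problem behind assumption 9 is easy to reproduce. A minimal STATA sketch, assuming hypothetical variables y and x1 are already in memory:

  generate x2 = 2*x1        // x2 is an exact linear function of x1
  regress y x1 x2           // STATA omits one of the collinear regressors:
                            // only the combined effect (beta1 + 2*beta2) is identified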

Practical Exercise 3.3

1. Which assumptions of the classical linear regression model are common to the simple linear regression model and the multiple regression model?
2. Why is the assumption of no perfect multicollinearity so important when dealing with multiple regression models?

3.4.2. Estimation of the parameters of the two-explanatory-variable model

Given the above assumptions, one can estimate the regression using OLS. To this end, let us estimate the parameters of the three-variable regression model specified above, supposing that sample data have been used to estimate the population regression equation.

A. Actual approach
We leave the method of estimation unspecified for the present and merely assume that equation (3.23) has been estimated by the sample regression equation, which we write as

  Ŷ = β̂0 + β̂1X1 + β̂2X2 ……………………………………………….(3.24)

where the β̂j are estimates of the βj and Ŷ is known as the predicted value of Y.

Given sample observations on Y, X1 and X2, we estimate the model using the method of least squares (OLS):

  Yi = β̂0 + β̂1X1i + β̂2X2i + ei …………………………………………….(3.25)

Since Ŷi = β̂0 + β̂1X1i + β̂2X2i, the residual is

  ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i …………………………………..(3.26)

The method involves obtaining the values β̂0, β̂1 and β̂2 for which Σei² is a minimum. To obtain expressions for the least squares estimators, we partially differentiate Σei² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero:

  ∂(Σei²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………………………. (3.27)
  ∂(Σei²)/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ……………………. (3.28)
  ∂(Σei²)/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………… …………..(3.29)

The partial derivative with respect to β̂0 gives

  ΣYi − nβ̂0 − β̂1ΣX1i − β̂2ΣX2i = 0
  ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i

The partial derivative with respect to β̂1 gives

  ΣX1iYi − β̂0ΣX1i − β̂1ΣX1i² − β̂2ΣX1iX2i = 0
  ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i

The partial derivative with respect to β̂2 gives

  ΣX2iYi − β̂0ΣX2i − β̂1ΣX1iX2i − β̂2ΣX2i² = 0
  ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i²

Equations (3.27), (3.28) and (3.29) thus produce the three normal equations:

  ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ………………………………….….(3.30)
  ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ………………………….….(3.31)
  ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ………………………....(3.32)
From (3.30) we obtain

  β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 ----------------------------------------------------------- (3.33)

Substituting (3.33) into (3.31), we get

  ΣX1iYi = (Ȳ − β̂1X̄1 − β̂2X̄2)ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i
  ΣX1iYi − nȲX̄1 = β̂1(ΣX1i² − nX̄1²) + β̂2(ΣX1iX2i − nX̄1X̄2) ------- (3.34)

We know that

  Σxiyi = ΣXiYi − nX̄Ȳ ......................(3.34-a)
  Σxi² = ΣXi² − nX̄² ........................(3.34-b)

so we can rewrite equation (3.34) in deviation form, using (3.34-a) and (3.34-b), as

  Σx1y = β̂1Σx1² + β̂2Σx1x2 …………………………………………...…(3.35)

Using the same procedure, substituting (3.33) into (3.32) we get

  Σx2y = β̂1Σx1x2 + β̂2Σx2² ………………………………….…….…..(3.36)

The normal equations (3.30) to (3.32) can thus be written in deviation form, bringing (3.35) and (3.36) together, as

  Σx1y = β̂1Σx1² + β̂2Σx1x2 ……………………………………………….(3.37)
  Σx2y = β̂1Σx1x2 + β̂2Σx2² …………………………………..…….…….(3.38)

One can easily solve equations (3.37) and (3.38) for β̂1 and β̂2 using matrices. We can rewrite the two equations in matrix form as

  [ Σx1²     Σx1x2 ] [β̂1]   [Σx1y]
  [ Σx1x2    Σx2²  ] [β̂2] = [Σx2y]

Using Cramer's rule we obtain

  β̂1 = (Σx1y·Σx2² − Σx1x2·Σx2y) / (Σx1²·Σx2² − (Σx1x2)²) …………………………..…………….. (3.39)

  β̂2 = (Σx2y·Σx1² − Σx1x2·Σx1y) / (Σx1²·Σx2² − (Σx1x2)²) ………………….……………………… (3.40)

B. Deviation form
Alternatively, it is customary to express these formulas in terms of the deviations of the sample observations from their mean values. Following the procedure used in chapter 2, we derive the OLS estimates in deviation form as follows.

First, deduct the fitted value from Y. Recall from equation (3.24) that the predicted value of Y is

  Ŷ = β̂0 + β̂1X1 + β̂2X2 ………………………… ……………………..3.41

so that Yi = Ŷi + ei and ei = Yi − Ŷi. Averaging over the sample,

  Ȳ = β̂0 + β̂1X̄1 + β̂2X̄2 ………………………………………………3.42

Subtracting (3.42) from (3.41) gives the fitted equation in deviation form:

  ŷi = β̂1x1i + β̂2x2i,  where x1i = X1i − X̄1 and x2i = X2i − X̄2 ……………3.43

and since ei = yi − ŷi, with yi = Yi − Ȳ,

  ei = yi − β̂1x1i − β̂2x2i

Secondly, the sum of squared residuals is

  Σei² = Σ(yi − β̂1x1i − β̂2x2i)² ………………………….….3.44

Thirdly, the first-order conditions for minimization require

  ∂(Σei²)/∂β̂1 = 0 ……………….…….3.45
  ∂(Σei²)/∂β̂2 = 0 ……….…….….. 3.46

Carrying out the differentiation in both equations, we obtain

  −2Σ(yi − β̂1x1i − β̂2x2i)(x1i) = 0 ………………………………….3.47
  −2Σ(yi − β̂1x1i − β̂2x2i)(x2i) = 0 ………………………..…………3.48

Rearranging, we obtain the normal system of equations with two explanatory variables, written in deviation form:

  Σx1iyi = β̂1Σx1i² + β̂2Σx1ix2i …………………………………..3.49
  Σx2iyi = β̂1Σx1ix2i + β̂2Σx2i² ………………………….………....3.50

The sums in these equations are the 'knowns', computed from the sample observations, while β̂1 and β̂2 are the unknowns. The system may be written as

  β̂1Σx1i² + β̂2Σx1ix2i = Σx1iyi
  β̂1Σx1ix2i + β̂2Σx2i² = Σx2iyi ……………………………….………..3.51

or, in matrix form,

  [ Σx1i²      Σx1ix2i ] [β̂1]   [Σx1iyi]
  [ Σx1ix2i    Σx2i²   ] [β̂2] = [Σx2iyi]  ..................3.52

  Aβ̂ = b,  with determinant |A| = Σx1i²·Σx2i² − (Σx1ix2i)²

Finally, by using Cramer's rule, the formulas for β̂0, β̂1 and β̂2, with the variables expressed as deviations from their means, are given by

  β̂1 = | Σx1iyi    Σx1ix2i |  ÷  |A|
       | Σx2iyi    Σx2i²   |

  β̂2 = | Σx1i²     Σx1iyi  |  ÷  |A|
       | Σx1ix2i   Σx2iyi  |

Thus

  β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 ……………………..……………………….……3.53

  β̂1 = (Σx1y·Σx2² − Σx1x2·Σx2y) / (Σx1²·Σx2² − (Σx1x2)²) …………………………..……….…….. 3.54

  β̂2 = (Σx2y·Σx1² − Σx1x2·Σx1y) / (Σx1²·Σx2² − (Σx1x2)²) ………………….………………….…… 3.55

C. The meaning of partial regression coefficients

As mentioned earlier, the regression coefficients β1 and β2 are known as partial regression or partial
slope coefficients. The meaning of partial regression coefficient is as follows: β1 measures the change in
the mean value of Y, E(Y), per unit change in X1, holding the value of X2 constant. Put differently, it gives
the “direct” or the “net” effect of a unit change in X1 on the mean value of Y, holding the value of X1
constant. Likewise, β2 measures the change in the mean value of Y per unit change in X2, holding the
value of X1 constant. That is, it gives the “direct” or “net” effect of a unit change in X2 on the mean value
of Y, net of any effect that X2 may have on mean Y.

The procedure just outlined is merely for pedagogic purposes, to drive home the meaning of a "partial" regression coefficient. Fortunately, we do not have to carry it out step by step, for the same job is accomplished fairly quickly and routinely by the OLS procedure discussed in the next section.

Practical Exercise 3.4

1. Derive the formulas for the OLS estimates of a model with three independent variables.

2. How does one interpret a partial regression coefficient?

3.5. Data and Example
One can use different data sets depending on the situation. The following examples show how one can compute OLS estimates and interpret the coefficients.
Example 1:
Consider data on a sample of five persons randomly drawn from a large firm, collected to see how their annual salaries are related to years of education past high school and years of experience with the firm they work for.

Table 3.1: Annual salary, years of education, and experience

Observation | Annual salary (Y) | Years of education (X1) | Years of experience (X2)
1 | 30 | 4 | 10
2 | 20 | 3 | 8
3 | 36 | 6 | 11
4 | 24 | 4 | 9
5 | 40 | 8 | 12
Sum | 150 | 25 | 50
Mean | 30 | 5 | 10

Then
a. compute $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
b. compute $\sum x_1^2$, $\sum x_2^2$, $\sum y x_1$, $\sum y x_2$, $\sum x_1 x_2$
c. compute $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$

Table 3.2: Coefficient estimation for the annual salary, education, and experience relationship

Obs. | Y | X1 | X2 | y = Y−Ȳ | x1 = X1−X̄1 | x2 = X2−X̄2 | x1² | x2² | x1y | x2y | x1x2 | y²
1 | 30 | 4 | 10 | 0 | −1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
2 | 20 | 3 | 8 | −10 | −2 | −2 | 4 | 4 | 20 | 20 | 4 | 100
3 | 36 | 6 | 11 | 6 | 1 | 1 | 1 | 1 | 6 | 6 | 1 | 36
4 | 24 | 4 | 9 | −6 | −1 | −1 | 1 | 1 | 6 | 6 | 1 | 36
5 | 40 | 8 | 12 | 10 | 3 | 2 | 9 | 4 | 30 | 20 | 6 | 100
Sum | 150 | 25 | 50 | 0 | 0 | 0 | 16 | 10 | 62 | 52 | 12 | 272
Mean | 30 | 5 | 10 | | | | | | | | |

Solution

a. $\bar{Y} = \frac{\sum Y}{n} = \frac{150}{5} = 30$, $\bar{X}_1 = \frac{\sum X_1}{n} = \frac{25}{5} = 5$, $\bar{X}_2 = \frac{\sum X_2}{n} = \frac{50}{5} = 10$

b. $\sum x_1^2 = 16$, $\sum x_2^2 = 10$, $\sum y x_1 = 62$, $\sum y x_2 = 52$, $\sum x_1 x_2 = 12$, $\sum y^2 = 272$

c. Substituting into the normal equations

$\hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i} x_{2i} = \sum x_{1i} y_i$
$\hat{\beta}_1 \sum x_{1i} x_{2i} + \hat{\beta}_2 \sum x_{2i}^2 = \sum x_{2i} y_i$

gives

$16\hat{\beta}_1 + 12\hat{\beta}_2 = 62$
$12\hat{\beta}_1 + 10\hat{\beta}_2 = 52$

Solving the two equations simultaneously yields $\hat{\beta}_1 = -0.25$ and $\hat{\beta}_2 = 5.5$.

Then $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 = 30 - (-0.25)(5) - 5.5(10) = -23.75$

Interpretation
$\hat{\beta}_1 = -0.25$ says that, holding years of experience constant, a one-year increase in education past high school decreases annual salary by 0.25. Likewise, holding years of schooling constant, a one-year increase in experience raises salary by 5.5 birr ($\hat{\beta}_2 = 5.5$).
The regression equation will be $\hat{Y} = -23.75 - 0.25 X_1 + 5.5 X_2$
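The hand computation above can be checked with a matrix-based least squares routine. A minimal sketch, assuming numpy is available, using the data of Table 3.1:

    import numpy as np

    Y = np.array([30.0, 20.0, 36.0, 24.0, 40.0])    # annual salary
    X1 = np.array([4.0, 3.0, 6.0, 4.0, 8.0])        # years of education
    X2 = np.array([10.0, 8.0, 11.0, 9.0, 12.0])     # years of experience

    X = np.column_stack([np.ones(5), X1, X2])       # design matrix with intercept
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    print(beta)                                     # approx. [-23.75, -0.25, 5.5]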

Example 2
The table below contains observations on the quantity demanded (Y) of a certain commodity, its price (X1) and consumers' income (X2) over a period of 7 years. Fit a linear regression to these observations and test the overall goodness of fit (with R2) as well as the statistical reliability of the estimates $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$.

Table 3.3: Quantity demanded, price, and income

Year | Quantity demanded (Y) | Price (X1) | Income (X2)
2001 2 12 24
2002 34 24 12
2003 12 32 68
2004 45 56 46
2005 67 12 78
2006 8 2 24
2007 45 14 68

Then
a. compute $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
b. compute $\sum x_1^2$, $\sum x_2^2$, $\sum y x_1$, $\sum y x_2$, $\sum x_1 x_2$
c. compute $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$
d. the standard errors of the coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$
e. the calculated t statistic (t cal) of the coefficient of the X2 term
f. the coefficient of determination


The summary computations give:

$n = 10$, $\bar{Y} = \frac{\sum Y_i}{n} = \frac{800}{10} = 80$, $\bar{X}_1 = \frac{\sum X_1}{n} = \frac{600}{10} = 60$, $\bar{X}_2 = \frac{\sum X_2}{n} = \frac{8000}{10} = 800$

In deviation form, $y = Y_i - \bar{Y}$, $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$, with

$\sum y^2 = 3450$, $\sum x_1^2 = 30$, $\sum x_2^2 = 1{,}580{,}000$, $\sum y x_1 = -300$, $\sum x_1 x_2 = -5900$, $\sum y x_2 = 65{,}000$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 = 80 - 60\hat{\beta}_1 - 800\hat{\beta}_2$ ……………………..…………………………….…..… (3.56)

$\hat{\beta}_1 = \frac{\sum x_1 y \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2} = \frac{(-300)(1{,}580{,}000) - (-5900)(65{,}000)}{(30)(1{,}580{,}000) - (-5900)^2} \approx -7.19$ ……..….. (3.57)

$\hat{\beta}_2 = \frac{\sum x_2 y \cdot \sum x_1^2 - \sum x_1 x_2 \cdot \sum x_1 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2} = \frac{(65{,}000)(30) - (-5900)(-300)}{(30)(1{,}580{,}000) - (-5900)^2} \approx 0.0143$ ………..….… (3.58)

so that $\hat{\beta}_0 \approx 80 - 60(-7.19) - 800(0.0143) \approx 499.9$.

Practical Exercise 3.5

1. The following data are provided, where X1 and X2 are independent variables and Y is the dependent variable.

Table 3.4: Data on Y, X1 and X2

Y | X1 | X2
50 | 15 | 8
15 | 17 | 3
35 | 19 | 5
47 | 18 | 12
40 | 21 | 5
22 | 14 | 10
36 | 16 | 8
55 | 15 | 9

Compute:
a. the constant term of the regression model
b. the coefficient of X2
c. $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$
d. $\sum x_1^2$, $\sum x_2^2$, $\sum y x_1$, $\sum y x_2$, $\sum x_1 x_2$
e. $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$

3.6. Evaluation of the model and estimates

3.6.1 The coefficient of determination (R2) and adjusted R2
A. The coefficient of determination (R2)
In the simple linear regression model, we saw that r2 measures the goodness of fit of the regression equation, that is, the percentage of the total variation in the dependent variable (Y) explained by the explanatory variable X. This definition of r2 extends easily to multiple linear regression models. Likewise, R2 (a capital letter in this case, as opposed to the small r2) measures the proportion of the variation in Y explained jointly by all the explanatory variables in the multiple linear regression model. The coefficient of determination is formally written as:

$R^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}^2}{\sum y^2} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$

$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2} = \frac{\sum y_i^2 - \sum e_i^2}{\sum y_i^2}$ ………………………………………………. (3.59)
One can alternatively work with deviations from the mean, using $\hat{\beta}_1 x_{1i}$, $\hat{\beta}_2 x_{2i}$ and $e_i$. Accordingly,

$y = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$
$\hat{y} = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$
$y = \hat{y} + e_i$, so $e_i = y - \hat{y}$

$\sum e_i^2 = \sum e_i e_i = \sum e_i (y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i})$
$= \sum e_i y_i - \hat{\beta}_1 \sum x_{1i} e_i - \hat{\beta}_2 \sum e_i x_{2i}$
$= \sum e_i y_i$   (since $\sum e_i x_{1i} = \sum e_i x_{2i} = 0$)
$= \sum y_i (y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i})$

i.e. $\sum e_i^2 = \sum y^2 - \hat{\beta}_1 \sum x_{1i} y_i - \hat{\beta}_2 \sum x_{2i} y_i$

Rearranging,

$\underbrace{\sum y^2}_{\text{total sum of squares (total variation)}} = \underbrace{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}_{\text{explained sum of squares (explained variation)}} + \underbrace{\sum e_i^2}_{\text{residual sum of squares (unexplained variation)}}$ -------------------- (3.60)

$\Rightarrow R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}{\sum y^2}$ ------------------------------------- (3.61)

As in simple regression, R2 is also viewed as a measure of the predictive ability of the model over the sample observations, or as a measure of how well the estimated regression fits the data. The value of R2 is also equal to the squared sample correlation coefficient between $\hat{Y}_t$ and $Y_t$. A higher R2 indicates a close association between the actual values $Y_t$ and the values $\hat{Y}_t$ predicted by the model; in this case, the model is said to "fit" the data well. If R2 is low, there is little or no association between the values of $Y_t$ and the values predicted by the model, $\hat{Y}_t$, and the model does not fit the data well.

Like r2, the value of R2 ranges from 0 to 1 (i.e., 0 ≤ R2 ≤ 1), where R2 = 1 implies that the fitted regression line explains 100% of the variation in the dependent variable, while R2 = 0 implies that the model does not explain any of the variation in the dependent variable. In practice, R2 usually lies between these two extremes. The higher the value of R2, the greater the percentage of the variation in Y explained by the regression plane.

The general formula for R2 can be developed by inspecting the formula of R2 for the two-variable and three-variable models. Recall R2 for a model with one explanatory variable (sometimes known as the two-variable model) studied in chapter two:

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + e_i$, $\hat{y} = \hat{\beta}_1 x_{1i}$

$R^2_{y,x_1} = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i}{\sum y^2}$ ................................................................ (3.62)

For the three-variable case,

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i$, $\hat{y} = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$

$R^2_{y,x_1,x_2} = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}{\sum y^2}$ ……………………………. (3.63)

The equation above shows that the formula of R2 involves one additional term, $\hat{\beta}_2 \sum x_{2i} y_i$, in the numerator compared with equation (2.106) in chapter two. This implies that R2 for a model with one additional explanatory variable $x_{2i}$ has one extra term formed from the product of $\hat{\beta}_2$ and $\sum x_{2i} y_i$. Generally, by inspecting the formula of R2 for the three-variable model given above, we see that for each additional explanatory variable the formula of R2 includes an additional term in the numerator, where the additional term is formed from the product of the estimate of the parameter corresponding to the new variable and the sum of the products of the deviations of the new variable and the dependent variable.

Example: Refer to Example 1 above. Determine R2. What percentage remains unexplained?

$R^2_{y,x_1,x_2} = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}{\sum y^2} = \frac{(-0.25)(62) + 5.5(52)}{272} = \frac{270.5}{272} \approx 0.994$

The result shows that about 99.4% of the variation in annual salary is explained by the variation in years of education and years of experience, and only about 0.6% remains unexplained by these variables.

Consequently, the formula of R2 for K variables becomes:

$R^2_{y,x_1,x_2,\ldots,x_K} = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i + \cdots + \hat{\beta}_K \sum x_{Ki} y_i}{\sum y^2}$ ............. (3.64)

where $Y = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_K X_{Ki} + e_i$ and $\hat{y} = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_K x_{Ki}$
An important property of R2 is that it is a non-decreasing function of the number of explanatory variables in the model: as the number of explanatory variables increases, the value of R2 almost invariably increases and never decreases. Furthermore, $\sum y_i^2$ in the formula of R2 is independent of the number of explanatory variables, but $\sum e_i^2$ depends on the number of regressors in the model. Intuitively, it is clear that as the number of explanatory variables increases, the residual term $\sum e_i^2$ is likely to decrease (at least it will not increase); hence R2 will increase. This suggests that in comparing two regression models with the same dependent variable but different numbers of explanatory variables, one should be wary of choosing the model with the highest R2.

B. Adjusted coefficient of determination ($\bar{R}^2$)

One difficulty with R2 is that it can be made large by adding more and more variables, even if the added variables have no economic justification. As variables are added, the sum of squared errors (RSS) goes down (and rarely remains unchanged) and thus R2 goes up. If the model contains n−1 variables, then R2 is nearly one. Manipulating the model just to obtain a high R2 is not wise. An alternative measure of goodness of fit, called the adjusted R2 and often symbolized as $\bar{R}^2$, is usually reported by regression programs. Generally, to compare two R2 terms one must take into account the number of explanatory variables in the model and adjust R2 for degrees of freedom. It is computed as:

$\bar{R}^2 = 1 - \frac{\sum e_i^2 / (n-k)}{\sum y^2 / (n-1)} = 1 - (1 - R^2)\left(\frac{n-1}{n-k}\right)$ -------------------------------- (3.65)

Recall from equation 3.59 that

$R^2 = 1 - \frac{\sum e_i^2}{\sum y^2}$, so that $\frac{\sum e_i^2}{\sum y^2} = 1 - R^2$

Note: the term "adjusted" means that $\bar{R}^2$ is obtained by adjusting R2 for the degrees of freedom (df) associated with the sums of squares $\sum e_i^2$ and $\sum y^2$ entering equation 3.59: $\sum e_i^2$ has n−k df and $\sum y^2$ has n−1 df. Dividing each sum of squares by its df,

$\bar{R}^2 = 1 - \frac{\sum e_i^2 / (n-k)}{\sum y^2 / (n-1)} = 1 - \frac{\sum e_i^2}{\sum y^2}\left(\frac{n-1}{n-k}\right)$

$\bar{R}^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k}\right)$ …………………………………………… (3.66)

$\bar{R}^2$ can also be written in terms of sample variances:

$\bar{R}^2 = 1 - \frac{\hat{\sigma}^2}{S_y^2}$, where $\hat{\sigma}^2 = \sum e_i^2/(n-k)$ and $S_y^2 = \sum y^2/(n-1)$

This measure does not always go up as the number of included variables increases, because of the degrees-of-freedom term, n−k, in the denominator: as the number of variables k increases, RSS goes down, but so does n−k. It is immediately apparent from equations 3.66 and 3.61 that the adjusted R2 increases by less than the unadjusted R2. Moreover, $\bar{R}^2$ can assume negative values, although R2 is necessarily non-negative; in case $\bar{R}^2$ turns out to be negative in an application, its value is taken as zero. While this solves one problem related to goodness of fit, it unfortunately introduces another, related to a loss of interpretation: $\bar{R}^2$ is no longer the percentage of variation explained.

2
This modified R is sometimes used and misused as a device for selecting the appropriate set of explana-
tory variables. Which R2 should be used in practices? There is no consensus among scholars. As Thiel
2
(2001) notes that it is good practice use R than R2. Because R2 tend to give optimistic picture of
the goodness of fit when the number of explanatory variables are not small compared to the num -
ber of observations. But such reason is not universally shared and no theoretical justification for its su-
2
periority. Furthermore, Goldberger (2005) argues that the modified R ( R ) will do well.
2

mod ified R 2  (1  k / n) R 2

206
His advice is to report R2 along with n and k and let the researcher decide which R 2 to take. Despite this
advice, it is the adjusted R2, as given in 3.66 reported by most statistical packages along with the
convectional R2.
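To make the distinction concrete, the sketch below (ours, in plain Python) computes R2 by equation 3.61 and the adjusted R2 by equation 3.66 for the data of Example 1:

    # deviation sums from Table 3.2 and the estimates of Example 1
    s1y, s2y, syy = 62.0, 52.0, 272.0
    b1, b2 = -0.25, 5.5
    n, k = 5, 3                                   # observations, estimated parameters

    ESS = b1 * s1y + b2 * s2y                     # explained sum of squares, eq (3.61)
    R2 = ESS / syy                                # about 0.994
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)     # eq (3.66), about 0.989
    print(R2, R2_adj)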

Practical Exercise 3.6

1. Given the information under practical exercise 3.5:
a. compute the coefficient of determination and the adjusted R2
b. interpret the result
2. The following estimated equation was obtained by ordinary least squares regression using quarterly data for 1960 to 1979 inclusive (n = 80), with standard errors in parentheses. The explained sum of squares was 112.5 and the residual sum of squares was 19.5.

$\hat{y}_i = 2.20 + 0.104 x_{1i} + 3.48 x_{2i} + 0.34 x_{3i}$
       (3.4)  (0.005)   (2.2)    (0.15)

a. compute R2 and adjusted R2
b. which slope coefficient(s) are statistically significantly different from zero at the 5% significance level?

3.6.2. Simple, Partial and Multiple Correlation Coefficients

We saw the elementary concept of the correlation coefficient in chapter one. The correlation coefficient measures the degree of linear association between two variables. For the three-variable model, we can compute three correlation coefficients: $r_{y,x_1}$, $r_{y,x_2}$ and $r_{x_1,x_2}$. That is:

$r_{y,x_1}$ = correlation coefficient between Y and X1

$r_{y,x_2}$ = correlation coefficient between Y and X2

$r_{x_1,x_2}$ = correlation coefficient between X1 and X2

These correlation coefficients are called gross or simple correlation coefficients, or correlation coefficients of zero order. Does the simple correlation coefficient, say $r_{y,x_1}$, measure the "true" degree of (linear) association between Y and X1 when a second explanatory variable X2, associated with both of them, is not held constant? No. To see why, suppose the true sample regression model is:

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i$ …………………………………………. (3.67)

but, for some reason, we omit X2 from the model and regress Y on X1 alone:

$Y = \hat{b}_0 + \hat{b}_1 X_{1i} + e_i$ …………………………………………. (3.68)

Then we can put the above question in other words: will the coefficient $\hat{\beta}_1$ in equation 3.67 be equal to the coefficient $\hat{b}_1$ in equation 3.68? The answer is no. In general, $r_{y,x_1}$ is not likely to reflect the true degree of association between Y and X1 in the presence of X2; it is likely to give a false impression of the nature of the association between Y and X1. Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of X2 on X1 and Y. Such a correlation coefficient is known as a partial correlation coefficient (or first-order correlation coefficient). The partial correlation coefficient is determined in terms of the simple correlation coefficients among the various variables involved in the multiple relationship. For example, $r_{y,x_1 \cdot x_2}$ is the partial correlation coefficient between Y and X1, holding X2 constant.

Accordingly, we can measure three partial correlation coefficients:

i. the partial correlation coefficient between Y and X1, keeping the effect of X2 constant:

$r_{y x_1 \cdot x_2} = \frac{r_{y x_1} - r_{y x_2} r_{x_1 x_2}}{\sqrt{(1 - r_{y x_2}^2)(1 - r_{x_1 x_2}^2)}}$ ……………………………………… (3.69)

ii. the partial correlation between Y and X2, keeping the effect of X1 constant:

$r_{y x_2 \cdot x_1} = \frac{r_{y x_2} - r_{y x_1} r_{x_1 x_2}}{\sqrt{(1 - r_{y x_1}^2)(1 - r_{x_1 x_2}^2)}}$ ……………………………………… (3.70)

iii. the partial correlation between X1 and X2, keeping the effect of Y constant:

$r_{x_1 x_2 \cdot y} = \frac{r_{x_1 x_2} - r_{x_1 y} r_{x_2 y}}{\sqrt{(1 - r_{x_1 y}^2)(1 - r_{x_2 y}^2)}}$ ……………………………..……….. (3.71)

The correlation coefficients given in equations 3.69 to 3.71 are also called first-order correlation coefficients. By the order of a correlation coefficient we mean the number of secondary variables, that is, the number of variables held constant when we analyze the effect of one variable on another.

The interpretation of partial correlation coefficients carries certain implications. In the two-variable (one explanatory variable) case, r had a straightforward meaning: it measured the degree of linear association (and not causation) between the dependent variable Y and the single explanatory variable X. But once we go beyond the two-variable case, we need to pay careful attention to the interpretation of the simple correlation coefficient. From (3.69), for example, we observe the following: even if $r_{y x_1} = 0$, $r_{y x_1 \cdot x_2}$ will not be zero unless $r_{y x_2}$ or $r_{x_1 x_2}$, or both, are zero.
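Equations 3.69 to 3.71 translate directly into code. A minimal sketch in plain Python (the function name partial_r is ours):

    from math import sqrt

    def partial_r(r_yx1, r_yx2, r_x1x2):
        """First-order partial correlation between Y and X1, holding X2 constant (eq 3.69)."""
        return (r_yx1 - r_yx2 * r_x1x2) / sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))

    # illustration of the remark above: even when r_yx1 = 0,
    # the partial correlation is not zero unless r_yx2 or r_x1x2 is zero
    print(partial_r(0.0, 0.5, 0.4))    # about -0.25, not zero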

3.6.3. Properties of OLS estimators

The properties of the OLS estimators of the multiple regression model parallel those of the two-variable model. More specifically:

I. The regression plane passes through the means $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$. This property holds generally for the k-variable linear regression model [a regressand and (k−1) regressors]:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$

II. The residuals sum to zero, $\sum \hat{u}_i = 0$, and the residuals $e_i$ are uncorrelated with X1 and X2, so that $\sum \hat{u}_i x_{1i} = \sum \hat{u}_i x_{2i} = 0$.

III. The mean value of the estimated $Y_i$ ($\hat{Y}_i$) is equal to the mean value of the actual $Y_i$:

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$ ……………………………………………. (3.73)

Summing both sides over the sample values and dividing by the sample size, n, gives $\bar{\hat{Y}} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \hat{\beta}_2 \bar{X}_2 = \bar{Y}$. Subtracting this from (3.73), the SRF can be expressed in deviation form as $\hat{y}_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$, where the small letters indicate values of the variables as deviations from their respective means.

IV. Given the assumptions of the classical linear regression model, the OLS estimators of the partial regression coefficients are not only linear and unbiased, but also have minimum variance in the class of all linear unbiased estimators.
A. Mean
The mean of the estimates of the parameters in the three-variable model is derived in the same way as in the two-variable model. The estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are unbiased estimates of the true parameters of the relationship between Y, X1 and X2; that is, their expected value, or average, is equal to the respective true parameter:

$E(\hat{\beta}_0) = \beta_0$, $E(\hat{\beta}_1) = \beta_1$, $E(\hat{\beta}_2) = \beta_2$ ………………………….. (3.74)
B. Variances of the parameter estimates and their formulas

In the preceding chapter we developed the formulas for the variances of the estimates in the model with one explanatory variable. We need standard errors for two main purposes: to establish confidence intervals and to test statistical hypotheses. For the model with a single explanatory variable, the variances of the parameter estimates are obtained from

$var(\hat{\beta}_0) = \hat{\sigma}^2 \frac{\sum X_i^2}{n \sum x_i^2}$, $var(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{\sum x_i^2}$ .................. (3.75)

The variances of the parameter estimates with two explanatory variables are obtained by the following formulas:

$var(\hat{\beta}_0) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{X}_1^2 \sum x_{2i}^2 + \bar{X}_2^2 \sum x_{1i}^2 - 2\bar{X}_1 \bar{X}_2 \sum x_{1i} x_{2i}}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2} \right]$ ………………………… (3.76)

$var(\hat{\beta}_1) = \hat{\sigma}^2 \frac{\sum x_{2i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$ ……………………….………..………. (3.77)

$var(\hat{\beta}_2) = \hat{\sigma}^2 \frac{\sum x_{1i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$ ……………..………………..……… (3.78)

where $\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-K)$, K being the total number of parameters which are estimated.

One can derive the variance formulas from the normal equations presented in eq. 3.51 and eq. 3.52. For better understanding, let us reproduce the normal equations:

$\hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i} x_{2i} = \sum x_{1i} y_i$
$\hat{\beta}_1 \sum x_{1i} x_{2i} + \hat{\beta}_2 \sum x_{2i}^2 = \sum x_{2i} y_i$

$\begin{pmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{pmatrix}$, i.e. $A\hat{\beta} = b$

The computation of the variances is facilitated by the use of determinants. The variance of each parameter is the product of $\hat{\sigma}^2$ and the ratio of the minor determinant associated with that parameter (the minor is formed by the elements of the determinant left after striking out the row and column including the parameter) to the complete determinant. Thus

$var(\hat{\beta}_1) = \hat{\sigma}^2 \cdot \frac{\sum x_{2i}^2}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \hat{\sigma}^2 \cdot \frac{\sum x_{2i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$ …..........… (3.79)

$var(\hat{\beta}_2) = \hat{\sigma}^2 \cdot \frac{\sum x_{1i}^2}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \hat{\sigma}^2 \cdot \frac{\sum x_{1i}^2}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$ ……….… (3.80)

Therefore, inspection of the above equations tells us that the variance of each estimate can be computed as the ratio of two determinants:
a. the determinant appearing in the numerator is the minor formed after striking out the row and column of the term corresponding to the coefficient whose variance is being computed;
b. the determinant appearing in the denominator is the complete determinant of the known terms appearing on the right-hand side of the normal equations:

$|A| = \begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}$
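A sketch of the computation implied by equations 3.79 and 3.80 (ours; the deviation sums and the estimate of sigma squared are taken as inputs):

    from math import sqrt

    def slope_variances(sigma2_hat, s11, s22, s12):
        """var(b1) and var(b2) from eqs (3.79)-(3.80); the denominator is |A|."""
        det_A = s11 * s22 - s12 ** 2
        return sigma2_hat * s22 / det_A, sigma2_hat * s11 / det_A

    # with the deviation sums of Example 1 and a hypothetical sigma2_hat = 1.0
    v1, v2 = slope_variances(1.0, 16.0, 10.0, 12.0)
    se1, se2 = sqrt(v1), sqrt(v2)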

There is a relationship between R2, the variances, and the partial regression coefficients in the multiple regression model, as indicated below:

$var(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \cdot \frac{1}{1 - R_j^2}$

where $\hat{\beta}_j$ is the partial regression coefficient of regressor $X_j$ and $R_j^2$ is the R2 in the regression of $X_j$ on the remaining (k−2) regressors.

For the two-regressor model, one can also write the variances in terms of the simple correlation coefficient $r_{12}$ between X1 and X2:

$var(\hat{\beta}_0) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{X}_1^2 \sum x_{2i}^2 + \bar{X}_2^2 \sum x_{1i}^2 - 2\bar{X}_1 \bar{X}_2 \sum x_{1i} x_{2i}}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2} \right]$ …….……………..……….……. (3.81)

$var(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)}$ …….……………..……….……. (3.82)

$var(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{12}^2)}$ …….……………..……….……. (3.83)

From these expressions we observe the following:

- As $r_{12}$, the correlation coefficient between X1 and X2, increases toward one, the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ increase for given values of $\sigma^2$ and $\sum x_1^2$ or $\sum x_2^2$.
- Similarly, for given values of $\sigma^2$ and $r_{12}$, the variance of $\hat{\beta}_1$ is inversely proportional to $\sum x_1^2$. That is, the greater the variation in the sample values of X1, the smaller the variance of $\hat{\beta}_1$, and therefore $\beta_1$ can be estimated more precisely. A similar statement can be made about the variance of $\hat{\beta}_2$.
- Furthermore, for given values of $r_{12}$ and $\sum x_1^2$ or $\sum x_2^2$, the variances of the OLS estimators are directly proportional to $\sigma^2$; that is, they increase as $\sigma^2$ increases.
iv. We can also express $\hat{\beta}_1$ and $\hat{\beta}_2$ in terms of the covariances and variances of Y, X1 and X2:

$\hat{\beta}_1 = \frac{Cov(X_1, Y) \cdot Var(X_2) - Cov(X_1, X_2) \cdot Cov(X_2, Y)}{Var(X_1) Var(X_2) - [Cov(X_1, X_2)]^2}$ …………………… (3.84)

$\hat{\beta}_2 = \frac{Cov(X_2, Y) \cdot Var(X_1) - Cov(X_1, X_2) \cdot Cov(X_1, Y)}{Var(X_1) Var(X_2) - [Cov(X_1, X_2)]^2}$ …………………… (3.85)

Practical Exercise 3.7

Given the information under practical exercise 3.5, compute:
a. the standard error of the coefficient of the constant term
b. the calculated t-value of X2

3.7. Hypothesis testing in multiple linear regression

This section extends the ideas of interval estimation and hypothesis testing to models involving three or
more variables. Although in many ways the concepts developed in Chapter 2 can be applied straightfor-
wardly to the multiple regression models, a few additional features which are unique to such models will
receive more attention in this part.

3.7.1. The normality assumption once again

If our sole objective is point estimation of the parameters of the regression model, ordinary least squares (OLS), which makes no assumption about the probability distribution of the disturbances ui, will suffice. But if our objective is estimation as well as inference, we need to assume that the ui follow some probability distribution. One of our basic assumptions in chapter two is that ui follows the normal distribution with zero mean and constant variance σ2. We continue to make the same assumption for multiple regression models. With the normality assumption, the estimators $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are themselves normally distributed with means equal to the true β0, β1 and β2, and the OLS estimators of the partial regression coefficients are best linear unbiased estimators (BLUE). Moreover, each estimator is distributed with mean $\beta_i$ and variance $\sigma^2_{\hat{\beta}_i}$. Upon replacing σ2 by its unbiased estimator $\hat{\sigma}^2$ in the computation of the standard errors, each of the following variables follows the t distribution with n−3 df:

$t = \frac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)}$, $i = 0, 1, 2$ …………….. (3.86)

Therefore, depending on the sample size, the t or Z distribution can be used to establish confidence intervals as well as to test statistical hypotheses about the true population partial regression coefficients.

The df are now n−3 because in computing $\sum \hat{u}_i^2$ we put three restrictions on the residual sum of squares (RSS), as we need to estimate the three partial regression coefficients.

Similarly, the χ2 distribution can be used to test hypotheses about the true σ2: $(n-3)\frac{\hat{\sigma}^2}{\sigma^2}$ follows the χ2 distribution with n−3 df (the proofs follow the two-variable case discussed in chapter 2).

3.7.2. HYPOTHESIS TESTING IN MULTIPLE REGRESSIONS


Having considered the general framework and the estimation technique of multiple linear regression, we turn to hypothesis testing. In multiple regression models we will undertake many tests of significance. Some of the several interesting forms are the following:
1. Testing hypotheses about an individual partial regression coefficient
2. Testing the overall significance of the estimated multiple regression model, that is, finding
out if all the partial slope coefficients are simultaneously equal to zero
3. Testing that two or more coefficients are equal to one another or testing linear equality re-
strictions
4. Testing that the partial regression coefficients satisfy certain restrictions
5. Testing the stability of the estimated regression model over time or in different cross-sec-
tional units
6. Testing the functional form of regression models
These tests of significance are analogous to those discussed for the simple regression model. Since tests of one or more of these types occur so commonly in empirical analysis, we devote a section to the first three types.

Hypothesis testing about individual regression coefficients
The procedure for testing the significance of the partial regression coefficients is the same as that discussed for the two-variable case. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence, and measure the validity of the estimates. Formally, we test the null hypothesis $H_0: \beta_i = 0$ against the alternative hypothesis $H_1: \beta_i \neq 0$. The logic of hypothesis testing in multiple linear regression is an extension of the concepts discussed in chapter two for simple linear regression. The testing can be done using various approaches. The most common ones are:
i) the standard error test
ii) tests of significance and confidence intervals (Z-test, Student's t-test)
iii) the p-value test
iv) the F-test

Acceptance of the null hypothesis ($H_0: \beta_i = 0$) implies that the explanatory variable to which this estimate relates does not in fact influence the dependent variable (Y) and should not be included in the function, because the test provides evidence that changes in X leave Y unaffected, which may mean that there is no relationship between X and Y. Rejection of the null hypothesis simply means that our estimate comes from a sample drawn from a population whose parameter β is different from zero; it does not mean that our estimate is necessarily the correct estimate of the true population parameter.

3.7.2.1. The standard error test of the least squares estimates

This test uses the standard error of the estimates as a test statistic to decide on the rejection or acceptance of the null hypothesis, i.e., whether the sample from which they have been estimated has come from a population whose true parameters are zero ($\beta_0 = 0$, $\beta_1 = 0$, and/or $\beta_2 = 0$) or not.

To illustrate, consider the following model:

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + e_i$ …………………………………….…… (3.87)

Step 1: State the null vs. alternative hypotheses, which can be written as follows:

$H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \neq 0$
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
$H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$

The null hypothesis $H_0: \beta_1 = 0$ states that, holding X2 constant, X1 has no (linear) influence on Y. Similarly, $H_0: \beta_2 = 0$ states that, holding X1 constant, X2 has no influence on the dependent variable Yi.

Step 2: Obtain the standard errors of the estimates.

We have already derived the variances and standard errors of the OLS estimates. The standard errors of the slope estimates are obtained by the following formulas:

$SE(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \sqrt{\frac{\hat{\sigma}^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_1 x_2)^2}}$

$SE(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)} = \sqrt{\frac{\hat{\sigma}^2 \sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_1 x_2)^2}}$

where $\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-K)$, K being the total number of parameters to be estimated. In the three-variable case K = 3, so $\hat{\sigma}^2 = \frac{\sum e_i^2}{n-3}$.

Referring to Example 3.1 and the computed standard errors of the slopes, the estimate of the variance of the random term is

$\hat{\sigma}^2 = \frac{\sum e_i^2}{n-3} = \frac{140}{12-3} \approx 15.6$
Step 3: Compare the standard errors obtained in step two with the numerical values of the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$, computed from

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$

$\hat{\beta}_1 = \frac{\sum x_1 y \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2}$

$\hat{\beta}_2 = \frac{\sum x_2 y \cdot \sum x_1^2 - \sum x_1 x_2 \cdot \sum x_1 y}{\sum x_1^2 \cdot \sum x_2^2 - (\sum x_1 x_2)^2}$

Step 4: Make a decision.

The acceptance or rejection of the null hypothesis has a definite economic meaning.
If $SE(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|$, accept the null hypothesis and reject the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically insignificant.
If $SE(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|$, reject the null hypothesis and accept the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically significant.

Example: suppose that from a sample of size n = 30 we estimate the following supply function:

$\hat{Y} = 20 + 0.6 X_1 + 0.33 X_2$
SE: (1.7) (0.025) (0.035)

Test the significance of the slope parameters at the 5% level of significance using the standard error test.

Solution:

$SE(\hat{\beta}_1) = 0.025$, $\hat{\beta}_1 = 0.6$, $\frac{1}{2}\hat{\beta}_1 = 0.3$

This implies that $SE(\hat{\beta}_1) < \frac{1}{2}\hat{\beta}_1$. The implication is that $\hat{\beta}_1$ is statistically significant at the 5% level of significance. The tests for the coefficients of X1 and X2 are summarized in Table 3.5.

Table 3.5: Summary of standard error tests

Coefficient | Hypothesis to be tested | Estimate | Estimate divided by two | Standard error (SE) | Conclusion with reasons
$\beta_1$ | $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ | 0.6 | 0.3 | 0.025 | Reject H0, since SE is less than half of the coefficient
$\beta_2$ | $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$ | 0.33 | 0.165 | 0.035 | Reject H0, since SE is less than half of the coefficient

Thus, the tests show that both coefficients are statistically significant, implying that the evidence indicates a relationship between the variables.
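The decision rule of the standard error test is easy to mechanize. A minimal sketch in plain Python (the function name se_test is ours):

    def se_test(beta_hat, se):
        """Standard error test: significant when SE < |estimate| / 2."""
        return "significant" if se < abs(beta_hat) / 2 else "insignificant"

    print(se_test(0.60, 0.025))    # significant
    print(se_test(0.33, 0.035))    # significant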

Practical Exercise 3.8

Given the information under practical exercise 3.5:
i. test the significance of the coefficients using the standard error test
ii. compute the calculated t-values of the coefficients

3.7.2.2. Confidence interval and test of significance

If we invoke the assumption that ui ~ N(0, σ2), then we can use the t test to test a hypothesis about any individual partial regression coefficient (or the Z test for large samples). The tests differ only depending on whether we use a one-tailed or a two-tailed hypothesis. The model we are going to test is specified as follows:

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$

I. Test of significance
A. Two-tailed tests
The general procedure for two-sided hypothesis testing concerning the value of a population parameter involves the following steps.

Step 1: State the null hypothesis (H0) and the alternative hypothesis (H1). That is,
$H_0: \beta = 0$ vs. $H_1: \beta \neq 0$

Step 2: Compute the test statistic. Depending on the sample size and the relevant distribution (Z, t, chi-square or F), compute the test statistic of the estimates.

A. Z-test: $Z = \frac{\hat{\beta}_i - \beta_i}{\sigma(\hat{\beta}_i)}$, since $\hat{\beta}_i \sim N(\beta_i, \sigma^2(\hat{\beta}_i))$.

B. t-test: $t^* = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \sim t_{n-k}$. If we have 3 parameters, the degrees of freedom will be n−3. When $\beta_2 = 0$ under the null, $t^* = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)}$ follows the t distribution with n−3 df, where SE = standard error and k = number of parameters in the model.

C. Chi-square test: the $\chi^2_{n-k}$ distribution is used to test hypotheses about the true σ2 (the chi-square is a special form of the gamma distribution): $(n-3)\hat{\sigma}^2/\sigma^2$ follows the χ2 distribution with n−3 df.

Step 3: Select (specify) a suitable significance level (α). It is customary in econometrics to choose the 5% or 1% level of significance.

Step 4: From the t-table, with n−3 degrees of freedom and the given level of significance, find the critical value $t_{n-3}(\alpha/2)$ such that the area to the right of it is one half the level of significance, i.e. $P(t > t_{\alpha/2}) = \alpha/2$.

Step 5: Decide on the rejection or acceptance of the null hypothesis by comparing the value obtained from the sample estimates with the critical value at the given significance level. If the Z test is used and |Zc| > Zt (the critical value), reject the null hypothesis, implying that the estimate is statistically significant; accept H0 otherwise. If the t distribution is used, compare tc (the computed value of t) with t* (the critical value of t): reject H0 if tc > t* or tc < −t*, and conclude that $\hat{\beta}$ is significantly different from β at the level α; accept otherwise.

B. One-tailed tests
The procedure for a one-sided alternative follows these steps:
Step 1: $H_0: \beta = 0$ vs. $H_1: \beta > 0$ (or $H_1: \beta < 0$)
Step 2: compute the test statistic using the relevant type of distribution.
Step 3: define the level of significance (α).
Step 4: From the t-table, with n−3 degrees of freedom and the given level of significance, find the point $t^*_{n-3}(\alpha)$ such that the area to the right of t* is equal to the level of significance, i.e. $P(t > t^*) = \alpha$.
Step 5: compare t* (the critical value of t) and tc (the computed value of t): reject H0 if tc falls in the rejection region, and conclude that $\hat{\beta}$ is significantly different from zero at the level α. On the contrary, if tc < t*, accept H0 and reject H1; the conclusion is that $\hat{\beta}$ is statistically insignificant.

II. Confidence intervals

The steps to carry out a hypothesis test using confidence intervals are:
A. Define the null vs. alternative hypotheses: $H_0: \beta_i = 0$ vs. $H_1: \beta_i \neq 0$, $i = 0, 1, 2, 3$
B. Estimate $\hat{\beta}_0$, $se(\hat{\beta}_0)$, $\hat{\beta}_1$ and $se(\hat{\beta}_1)$ in the usual way.
C. Choose a significance level, α. This is equivalent to choosing a (1−α)100% confidence interval, i.e. a 5% significance level = a 95% confidence interval.
D. Identify the relevant distribution, depending on the sample size.
E. Choose the location of the critical region. After identifying the relevant distribution, compute the range of values of the test statistic that results in a decision to accept or reject the null hypothesis. That is, establish the confidence interval within which the test statistic lies with probability 100(1−α)%. These are the theoretical (tabular) values that define the boundary of the critical region, or the acceptance region, of the OLS estimators with a certain degree of confidence. It is constructed in a similar fashion to that in chapter two. The decision of whether to choose a one-tailed or a two-tailed critical region depends on the form in which the alternative hypothesis is expressed. Most of the time a two-tailed test is used, and interval ranges can be constructed depending on the type of distribution:

i. Z-distribution: substituting $Z = (\hat{\beta}_i - \beta_i)/\sigma(\hat{\beta}_i)$ and rearranging slightly, we get

$P\left\{ -Z_{\alpha/2} < \frac{\hat{\beta}_i - \beta_i}{\sigma(\hat{\beta}_i)} < Z_{\alpha/2} \right\} = 1 - \alpha$ …………………………………………………..…………….. (3.88)

Upon rearrangement, we have

$P\left\{ \hat{\beta}_i - Z_{\alpha/2}\,\sigma(\hat{\beta}_i) < \beta_i < \hat{\beta}_i + Z_{\alpha/2}\,\sigma(\hat{\beta}_i) \right\} = 1 - \alpha$

Thus the (1−α)100 percent confidence interval for $\beta_i$ is

$\hat{\beta}_i - Z_{\alpha/2}\,\sigma(\hat{\beta}_i) < \beta_i < \hat{\beta}_i + Z_{\alpha/2}\,\sigma(\hat{\beta}_i)$, or $\beta_i = \hat{\beta}_i \pm Z_{\alpha/2}\,\sigma(\hat{\beta}_i)$

ii. t-distribution: $t = \frac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)} \sim t_{n-K}$, K = 𝓀 + 1 = the number of parameters. In a two-tailed test at the α level of significance, the probability of obtaining a t-value beyond −tc or beyond tc is α/2 in each tail, with n−K degrees of freedom. The confidence interval is given by $\beta_i = \hat{\beta}_i \pm t_{\alpha/2, n-K}\, se(\hat{\beta}_i)$, i.e.

$\Pr\left\{ -t_{\alpha/2} \le t^* \le t_{\alpha/2} \right\} = 1 - \alpha$, where $t^* = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)}$ …………………………. (3.89)

Substituting the expression for t* into (3.89), we obtain

$\Pr\left\{ -t_{\alpha/2} \le \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \le t_{\alpha/2} \right\} = 1 - \alpha$ …………………………………..……… (3.90)

$\Pr\left\{ \hat{\beta}_i - SE(\hat{\beta}_i)\, t_{\alpha/2} \le \beta_i \le \hat{\beta}_i + SE(\hat{\beta}_i)\, t_{\alpha/2} \right\} = 1 - \alpha$   (interchanging)

Then, the 100(1−α) percent confidence interval for $\beta_i$ will be given by

$\beta_i = \hat{\beta}_i \pm t_{\alpha/2, n-K}\, se(\hat{\beta}_i)$, $\forall i \in [1, K]$ ………………………………..…… (3.91)

The unknown population parameter $\beta_i$ will lie within the defined limits, the confidence interval $\hat{\beta}_i \pm t_{\alpha/2}\, se(\hat{\beta}_i)$ with n−K degrees of freedom, (1−α)100 times out of 100.

iii. χ2 distribution: we can use the χ2 distribution to establish a confidence interval for σ2:

$P\{\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\} = 1 - \alpha$ ……………………………………………………………………… (3.92)

where the χ2 value in the middle of this double inequality is as given above, and where $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are two values of χ2 (the critical χ2 values) obtained from the chi-square table for n−3 df. Substituting $\chi^2 = (n-3)\frac{\hat{\sigma}^2}{\sigma^2}$ and rearranging the terms, we obtain

$P\left\{ (n-3)\frac{\hat{\sigma}^2}{\chi^2_{\alpha/2}} < \sigma^2 < (n-3)\frac{\hat{\sigma}^2}{\chi^2_{1-\alpha/2}} \right\} = 1 - \alpha$ …………………………………………………….. (3.93)
F. Make a decision and draw a conclusion. The decision to accept or reject the hypothesis is usually made by comparing the test statistic with the critical region. If the hypothesized value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*; otherwise do not reject the null. Rejection indicates that $\hat{\beta}_i$ is statistically significant. But if the value of the parameter under the null hypothesis ($H_0: \beta_i = 0$) lies in this confidence region, the region of acceptance, do not reject the null hypothesis; the implication is that $\hat{\beta}_i$ is statistically insignificant, that is, the estimate is not statistically different from zero. If the hypothesized value of $\beta_i$ in the null hypothesis is outside the limits, reject H0 and accept H1. The above test can be illustrated graphically.
Figure 3.1: Two-tailed confidence interval for the normal distribution
To illustrate the mechanics with an example, suppose the demand for a certain commodity is assumed to be a function of income (X2) and price (X1): $Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + e_i$.

Let us postulate that $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$.

The null hypothesis states that, with X2 (income) held constant, X1 (price) has no (linear) influence on Y (quantity demanded). Suppose the regression result, from a sample of 64 observations, is found to be

$\hat{Y} = 20 - 0.0056 X_1 + 0.85 X_2$, with $se(\hat{\beta}_1) = 0.0020$

For our illustrative example, noting that β1 = 0 under the null hypothesis, we obtain

$t = \frac{-0.0056}{0.0020} = -2.8$ .................................................... (3.94)

Since we have 64 observations, the degrees of freedom in this example are 61. If you refer to the t table, there is no entry for 61 df; the closest we have is 60 df. If we use these df and assume α, the level of significance, is 5 percent, the critical t value is 2.0 for a two-tail test (look up tα/2 for 60 df) or 1.671 for a one-tail test (look up tα for 60 df).

We use the two-tail t value. Since the computed t value of 2.8 (in absolute terms) exceeds the critical t value of 2, we can reject the null hypothesis that price has no effect on quantity demanded.

For our example, the 95% confidence interval for β1 is

$\hat{\beta}_1 - 2\, se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + 2\, se(\hat{\beta}_1)$ ............................................................... (3.95)

which in our example becomes

$-0.0056 - 2(0.0020) \le \beta_1 \le -0.0056 + 2(0.0020)$

that is,

$-0.0096 \le \beta_1 \le -0.0016$ ............................................................... (3.96)

The interval from −0.0096 to −0.0016 includes the true β1 coefficient with a 95% confidence coefficient. That is, if 100 samples of size 64 are selected and 100 confidence intervals like the one above are constructed, we expect 95 of them to contain the true population parameter β1. Since the interval above does not include the null-hypothesized value of zero, we can reject the null hypothesis that the true β1 is zero with 95% confidence. Thus, whether we use the t test of significance or the confidence interval estimation, we reach the same conclusion. However, this should not be surprising in view of the close connection between confi-
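The computations of this example can be reproduced as follows, in a sketch assuming the scipy library is available (the exact critical value for 61 df replaces the tabulated approximation of 2.0):

    from scipy import stats

    b1, se1, n, k = -0.0056, 0.0020, 64, 3
    df = n - k                                     # 61
    t = b1 / se1                                   # -2.8, as in eq (3.94)
    t_crit = stats.t.ppf(0.975, df)                # about 2.0
    ci = (b1 - t_crit * se1, b1 + t_crit * se1)    # eq (3.95)
    p = 2 * stats.t.sf(abs(t), df)                 # two-sided p-value, about 0.007
    print(t, ci, p)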

Practical Exercise 3.9

A production function is estimated as

$\hat{Y} = 4.0 + 0.7 X_1 + 0.2 X_2$,  $R^2 = 0.86$
     (0.78)  (0.102)  (0.102),  n = 23

where X1 = labor, X2 = capital, and Y = output.
Test the hypotheses $\beta_1 = 0$ and $\beta_2 = 0$ at α = 5% using the test-of-significance and confidence interval approaches.

3.7.2.3. The p-value approach to hypothesis testing

In practice, one does not have to assume a particular value of α to conduct hypothesis testing. The t-test can also be carried out in an equivalent way using the p-value. First, calculate the probability that the random variable t is greater than the observed tc, that is,

$p\text{-value} = P(t > t_c)$ …………………………………. (3.97)

This probability is the same as the probability of a type I error, the probability of rejecting a true hypothesis. A high value for this probability implies that the consequences of rejecting the true H0 are severe. A low p-value implies that the consequences of erroneously rejecting a true H0 are not very severe (that is, the probability of making a type I error is low); hence we are "safe" in rejecting H0. The decision rule is therefore to accept H0 (that is, not reject it) if the p-value is too high, say more than 0.10, 0.05, or 0.01. In other words, if the p-value is higher than the specified level of significance (say α), we conclude that the regression coefficient is not significantly different from 0 at significance level α. If the p-value is less than α, we reject H0 and conclude that β is significantly different from 0.

The t-test and the p-value approach are equivalent. If $P(t > t_c)$ is less than the α level, then tc must lie to the right of the critical point $t^*_{n-3}(\alpha)$; this means tc falls in the rejection region. Similarly, if $P(t > t_c) > \alpha$, then tc must lie to the left of the critical point and hence falls into the acceptance region, and we accept H0.

Example: the p-value for the example above is 0.0065. The interpretation of this p-value (i.e., the exact level of significance) is that, if the null hypothesis were true, the probability of obtaining a t value of as much as 2.8187 or greater (in absolute terms) would be only 0.0065, or 0.65 percent, which is indeed a small probability, much smaller than the conventionally adopted value of α = 5%. Hence, since the p-value of 0.0065 is less than the 5% level, we reject H0.

Practical Exercise 3.10

1. Given the data below on Y and two independent variables (X1 and X2), test the hypotheses using the p-value approach.

Table 3.6: Hypothetical data on Y, X1 and X2

Y X1 X2
49 35 53
40 35 53
41 38 50
46 40 64
52 40 70
59 42 68
53 44 59
61 46 73
55 50 59
64 50 71
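One way to carry out this exercise is sketched below (assuming numpy and scipy): fit the regression through the normal equations and read off the two-sided p-values.

    import numpy as np
    from scipy import stats

    Y = np.array([49, 40, 41, 46, 52, 59, 53, 61, 55, 64], dtype=float)
    X1 = np.array([35, 35, 38, 40, 40, 42, 44, 46, 50, 50], dtype=float)
    X2 = np.array([53, 53, 50, 64, 70, 68, 59, 73, 59, 71], dtype=float)

    X = np.column_stack([np.ones_like(Y), X1, X2])
    beta = np.linalg.solve(X.T @ X, X.T @ Y)           # OLS estimates
    resid = Y - X @ beta
    n, K = X.shape
    sigma2 = resid @ resid / (n - K)                   # unbiased estimate of sigma^2
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - K)
    print(np.round(p_values, 4))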

3.7.2.4. Testing the overall significance of the sample regression

Throughout the previous section we were concerned with testing the significance of the estimated partial regression coefficients individually, that is, under the separate hypothesis that each true population partial regression coefficient was zero. There is also a joint test of significance, in which the null hypothesis is that β1 and β2 are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, of whether Y is linearly related to both X1 and X2. It can be stated as:

$H_0: \beta_1 = \beta_2 = 0$ ………………………………………….. (3.98)

Can the joint hypothesis be tested by testing the significance of $\hat{\beta}_1$ and $\hat{\beta}_2$ individually? The answer is no. In testing the individual significance of an observed partial regression coefficient above, we assumed implicitly that each test of significance was based on a different (i.e., independent) sample. Thus, in testing the significance of $\hat{\beta}_2$ under the hypothesis that β2 = 0, we assumed tacitly that the testing was based on a different sample from the one used for testing the significance of $\hat{\beta}_1$ under the null hypothesis that β1 = 0.

But if we use the same sample data to test a joint hypothesis about two or more coefficients, we shall be violating the assumptions underlying the test procedure. Although the confidence interval statements are individually true, in the sense that each is derived as if from its own sample, they need not hold jointly in an overall test of significance. That is, the probability 1−α that the interval contains β1 is

$\Pr[\hat{\beta}_1 - SE(\hat{\beta}_1)\, t_{\alpha/2} \le \beta_1 \le \hat{\beta}_1 + SE(\hat{\beta}_1)\, t_{\alpha/2}] = 1 - \alpha$ …………………………………………….. (3.99)

The same logic holds true for $\hat{\beta}_2$:

$\Pr[\hat{\beta}_2 - SE(\hat{\beta}_2)\, t_{\alpha/2} \le \beta_2 \le \hat{\beta}_2 + SE(\hat{\beta}_2)\, t_{\alpha/2}] = 1 - \alpha$ ……… (3.100)

But the probability that the intervals $[\hat{\beta}_1 \pm t_{\alpha/2} SE(\hat{\beta}_1)]$ and $[\hat{\beta}_2 \pm t_{\alpha/2} SE(\hat{\beta}_2)]$ simultaneously include β1 and β2 is not 1−α, because the intervals may not be independent when the same data are used to derive them: in a joint test of several hypotheses, any single hypothesis is "affected" by the information in the other hypotheses. For instance, if in equation (3.99) we established a 95% confidence interval for β1 and then used the same sample data to establish a confidence interval for β2 (say, also with a confidence coefficient of 95%), we could not assert that both β1 and β2 lie in their respective confidence intervals with the stated probability: if the intervals were independent, the joint probability would be (1−α)(1−α) = (0.95)(0.95), not the usual 95%. This violates the logic of confidence interval estimation. The upshot of the preceding argument is that, for a given sample, only one confidence interval or only one test of significance can be obtained, and we cannot use the usual t test to test the joint hypothesis that the true partial slope coefficients are simultaneously zero.
It is common to use the Analysis of Variance (ANOVA) approach, or F test, to test the overall significance of an observed multiple regression. ANOVA is a statistical method derived by Fisher for the analysis of experimental data. The F-test works by decomposing the sample variation, which leads to an alternative approach to testing hypotheses. It also provides numerical values for the influence of the explanatory factors on the dependent variable, in addition to the information from breaking down the total variation of Y into additive components: the explained sum of squares (ESS) and the unexplained variation (RSS). We test whether the null hypothesis (H0) of variation around a constant population mean ($\bar{Y}$) is rejected in favor of the regression model. This amounts to asking whether the systematic component, measured by the ESS, is significantly large in relation to the unsystematic component (RSS). This can be demonstrated using Table 3.7, whose purpose is to test the significance of the ESS.

Table 3.7: ANOVA for the three-variable regression model

Source of variation | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MSS)
Due to regression (ESS) | $\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i$ | k−1 = 2 | ESS/2
Due to residual (RSS) | $\sum e_i^2$ | n−3 | RSS/(n−3)
Total (TSS) | $\sum y_i^2$ | n−1 |

Recall the identity: TSS = ESS + RSS …………………. (3.101)

When K is the number of coefficients in the model, the number of restrictions imposed is k−1, which gives the degrees of freedom of the ESS. Accordingly, ESS has 2 df, since it is a function of $\hat{\beta}_1$ and $\hat{\beta}_2$; RSS has n−3 df; and TSS has n−1 df. Now, under the assumption of a normal distribution for ui and under the null hypothesis, the F statistic with 2 and n−3 df is

$F = \frac{ESS/2}{RSS/(n-3)}$ ……………………………….. (3.102)
It can be proved that, under the assumption $u_i \sim N(0, \sigma^2)$,

$E\left[\frac{RSS}{n-3}\right] = E\left[\frac{\sum e_i^2}{n-3}\right] = \sigma^2$ …………………………………….………... (3.103)

With the additional assumption that β1 = β2 = 0, it can be shown that

$E\left[\frac{ESS}{2}\right] = \sigma^2$ ………………………………….……… (3.104)

Therefore, if the null hypothesis is true, both (3.103) and (3.104) give identical estimates of the true σ2. This implies that if the relationship between Y and X1 and X2 is trivial, the sole source of variation in Y is the random forces represented by ui. If, however, the null hypothesis is false, that is, X1 and X2 definitely influence Y, the equality between (3.103) and (3.104) will not hold: the ESS will be relatively large compared with the RSS, taking due account of their respective df. Therefore, the F value in (3.102) provides a test of the null hypothesis that the true slope coefficients are simultaneously zero. If the F value computed from (3.102) exceeds the critical F value from the F table at the α percent level of significance, reject H0; otherwise accept it. That is, if the computed F is greater than the critical F(k−1, n−k), the parameters of the model are jointly significant, or equivalently the dependent variable Y is linearly related to the independent variables included in the model. Alternatively, if the p value of the observed F is sufficiently low, we can reject H0.

Example: given the ANOVA data in Table 3.8 for certain hypothetical data, compute the F test.

Table 3.8: Hypothetical ANOVA data (regression mean square = 128,681 with 2 df; residual mean square = 1,742.88 with 61 df)

Using equation 3.102, we obtain

$F = \frac{128{,}681}{1742.88} = 73.83$ ………………………………………………………….. (3.105)

The p value of obtaining an F value of 73.83 or greater is almost zero, leading to the rejection of the null hypothesis that together X1 and X2 have no effect on Y. If you were to use the conventional 5 percent level of significance, the critical F value for 2 df in the numerator and 60 df in the denominator (the actual df, 61, are not tabulated) is about 3.15, or about 4.98 if you were to use the 1 percent level of significance. Obviously, the observed F of about 74 far exceeds either of these critical values.
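The p-value quoted for this F statistic can be verified with scipy (a sketch; 2 and 61 df as in the example):

    from scipy import stats

    F, df1, df2 = 73.83, 2, 61
    print(stats.f.sf(F, df1, df2))      # P(F' > 73.83): essentially zero
    print(stats.f.ppf(0.95, 2, 60))     # 5% critical value, about 3.15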

Practical Exercise 3.11

Given the ANOVA table below, answer the following questions.

Table 3.9: The ANOVA table for the regression

Source of variation | Sum of squares | Degrees of freedom | Mean square
Due to regression | 51190.39 | 3 | 17063.46
Due to error | 5232.47 | 143 | 36.591
Total | 56422.86 | 146 |

a. What is the F statistic?
b. If the tabulated F with (3, 143) df at α = 0.05 is $F_{0.05}(3,143) = 6.63$, what will be your overall significance test decision?
3.7.2.5. Important relationship between R2 and F

There are differences between R2 and ANOVA, but there is an intimate relationship between the coefficient of determination R2 and the F test used in the analysis of variance. Assuming a normal distribution for the disturbance term ui and the null hypothesis that β1 = β2 = 0, we have seen that

$F = \frac{ESS/2}{RSS/(n-3)}$ …………………….…………………………......... (3.106)

is distributed as the F distribution with 2 and n−3 df.

Table 3.10: ANOVA table in terms of R2

Source of variation | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MSS)
Due to regression (ESS) | $R^2 (\sum y_i^2)$ | 2 | $R^2 (\sum y_i^2)/2$
Due to residual (RSS) | $(1 - R^2)(\sum y_i^2)$ | n−3 | $(1 - R^2)(\sum y_i^2)/(n-3)$
Total (TSS) | $\sum y_i^2$ | n−1 |

For the three-variable case, the F statistic therefore becomes

$F = \frac{R^2 (\sum y_i^2)/2}{(1 - R^2)(\sum y_i^2)/(n-3)} = \frac{R^2/2}{(1 - R^2)/(n-3)}$ ............................. (3.107)

One advantage of the F test expressed in terms of R2 is its ease of computation: all one needs to know is the R2 value. Therefore, the overall F test of significance given in equation (3.102) can be computed in terms of R2 as shown in Table 3.10.
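Equation 3.107 generalizes to k−1 numerator and n−k denominator degrees of freedom, which a one-line function makes explicit (a sketch; the illustrative numbers use practical exercise 3.9, where n = 23, k = 3 and R2 = 0.86):

    def f_from_r2(R2, n, k):
        """Overall F statistic computed from R-squared (generalizing eq 3.107)."""
        return (R2 / (k - 1)) / ((1 - R2) / (n - k))

    print(f_from_r2(0.86, 23, 3))    # about 61.4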

Practical Exercise 3.12

For the data in practical exercise 3.10, conduct the overall test of significance in two ways: using R2 and using the ANOVA (sum of squares) components.

3.8. The general linear regression model

In this section we extend the method of least squares to models with k explanatory variables. There are rules of thumb by which we can derive (a) the normal equations, (b) the coefficient of multiple determination, and (c) the variances of the coefficients for relationships including any number of explanatory variables.

3.8.1. Derivation of the normal equations

The general linear regression model with 𝓀 explanatory variables can be written as

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_K X_{Ki} + u_i$ ……………. (3.108)

There are K parameters to be estimated (K = 𝓀 + 1). Clearly, the system of normal equations will consist of K equations, in which the unknowns are the parameters $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_K$, and the known terms will be the sums of squares and the sums of products of all the variables in the structural equation.

The sample counterpart, or estimated relationship, of the above general equation is

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_K X_{Ki} + \hat{u}_i$ ………………………… (3.109)

In order to derive the K normal equations without the formal differentiation procedure, we make use of the following properties:

$\sum \hat{u}_i = 0$ and $\sum \hat{u}_i X_j = 0$, where j = 1, 2, 3, …, K

The normal equations for a model with any number of explanatory variables may thus be derived in a mechanical way, without recourse to differentiation. We introduce a practical rule of thumb, derived by inspection of the normal equations of the two-variable and three-variable models, and then extend it to 𝓀 explanatory variables. We begin by rewriting these normal equations.

i. Model with one explanatory variable
Structural form: $Y = \beta_0 + \beta_1 X_{1i} + u_i$

Then, using $\sum \hat{u}_i = 0$,

$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i}$ ……………………………………………… (3.110)

and, using $\sum \hat{u}_i X_{1i} = 0$,

$\sum Y_i X_{1i} = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2$ …………………………………………… (3.111)

Equations 3.110 and 3.111 are the normal equations of the model with one explanatory variable.

ii. Model with two explanatory variables
Structural form: $Y = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

In a similar manner, using $\sum \hat{u}_i = 0$ and $\sum \hat{u}_i X_{ji} = 0$, we derive the normal equations:

$\sum Y = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i}$
$\sum Y X_{1i} = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i}$
$\sum Y X_{2i} = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{1i} X_{2i} + \hat{\beta}_2 \sum X_{2i}^2$

We can generalize the above procedure to find the Kth normal equation for the K-variable model, which may be obtained by multiplying the estimated form of the K-variable model by XKi and then summing over all sample observations. The estimated form of the model is

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_K X_{Ki} + \hat{u}_i$ ………………………… (3.112)

The residual from the above is

$\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \cdots - \hat{\beta}_K X_{Ki}$ ………………….……….. (3.113)

Summation over the residuals, using $\sum \hat{u}_i = 0$, gives

$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \cdots + \hat{\beta}_K \sum X_{Ki}$ ……….…….. (3.114)

By the assumption $\sum \hat{u}_i X_i = 0$, multiplication by XKi yields

$Y_i X_{Ki} = \hat{\beta}_0 X_{Ki} + \hat{\beta}_1 X_{1i} X_{Ki} + \hat{\beta}_2 X_{2i} X_{Ki} + \cdots + \hat{\beta}_K X_{Ki}^2 + \hat{u}_i X_{Ki}$ ………………………….. (3.115)

and summation over the n sample observations, using $\sum \hat{u}_i X_{Ki} = 0$, gives the required Kth normal equation:

$\sum Y_i X_{Ki} = \hat{\beta}_0 \sum X_{Ki} + \hat{\beta}_1 \sum X_{1i} X_{Ki} + \hat{\beta}_2 \sum X_{2i} X_{Ki} + \cdots + \hat{\beta}_K \sum X_{Ki}^2$ ………………… (3.116)
3.8.2. Variance for general k-variables


We may generalize the above expressions of the variances of the coefficient estimates in equation 3.77
and 3.78 to k-varibles. The variances of the estimates of the model including 𝓀-explanatory variables
can be computed by the ratio of two determinants: the determinant appearing in the numerator is the mi -
233
nor formed after striking out the row and column of the terms corresponding to the coefficient whose
variance is being computed; the determinant appearing in the denominator is the complete determinants
of the known terms appearing on the rihgt-hand side of the normal equations. For example, the variance
of ^β K is given by the following expression.

$var(\hat{\beta}_K) = \hat{\sigma}^2 \cdot \frac{\begin{vmatrix} \sum x_{2i}^2 & \sum x_{2i} x_{3i} & \cdots \\ \sum x_{2i} x_{3i} & \sum x_{3i}^2 & \cdots \\ \vdots & \vdots & \ddots \end{vmatrix}}{\begin{vmatrix} \sum x_{2i}^2 & \sum x_{2i} x_{3i} & \cdots & \sum x_{2i} x_{Ki} \\ \sum x_{2i} x_{3i} & \sum x_{3i}^2 & \cdots & \sum x_{3i} x_{Ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{2i} x_{Ki} & \sum x_{3i} x_{Ki} & \cdots & \sum x_{Ki}^2 \end{vmatrix}}$ …........ (3.117)

where the numerator is the minor obtained from the denominator determinant by striking out the row and column corresponding to $\hat{\beta}_K$.

3.8.3. Generalization of the formulas for R2 and adjusted R2

A. R2
The generalization of the formula of the coefficient of multiple determination may be derived by inspection of the formulas of R2 for the two-variable and three-variable models.

i. Model with one explanatory variable: $R^2_{Y.X_1} = \frac{\hat{\beta}_1 \sum y_i x_{1i}}{\sum y_i^2}$

ii. Model with two explanatory variables: $R^2_{Y.X_1 X_2} = \frac{\hat{\beta}_1 \sum y_i x_{1i} + \hat{\beta}_2 \sum y_i x_{2i}}{\sum y_i^2}$

iii. Accordingly, the formula of the coefficient of multiple determination for the K-variable model is

$R^2_{Y.X_1 \ldots X_K} = \frac{\hat{\beta}_1 \sum y_i x_{1i} + \hat{\beta}_2 \sum y_i x_{2i} + \cdots + \hat{\beta}_K \sum y_i x_{Ki}}{\sum y_i^2}$ …………………………. (3.118)

For each additional explanatory variable, the formula of R2 includes an additional term in the numerator, formed by the estimate of the parameter corresponding to the new variable multiplied by the sum of products of the deviations of the new variable and the dependent one.

B. The adjusted coefficient of determination: R̄²


The inclusion of additional explanatory variables in the function can never reduce the coefficient of multiple determination and will usually raise it: by introducing a new regressor we increase the value of the numerator of the expression for R², while the denominator (∑yi², the total variation of Yi, which is given in any particular sample) remains the same. To correct this defect we adjust R² by taking into account the degrees of freedom, which clearly decrease as new regressors are introduced into the function. The expression for the adjusted coefficient of multiple determination is

R̄² = 1 − (1 − R²)·(n − 1)/(n − K)   or   R̄² = 1 − [∑ûi²/(n − K)] / [∑yi²/(n − 1)] …………………………….3.119

where R² is the unadjusted coefficient of multiple determination, n is the number of sample observations and K is the number of parameters estimated from the sample. If n is large, R² and R̄² will not differ much. But with small samples, if the number of regressors (X's) is large in relation to the sample observations, R̄² will be much smaller than R² and can even assume negative values, in which case R̄² should be interpreted as being equal to zero.

Besides R² and adjusted R̄² as goodness-of-fit measures, other criteria are often used to judge the adequacy of a regression model, such as Akaike's information criterion and Amemiya's prediction criterion, which are used to select between competing models.
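The following minimal Python sketch (synthetic data and names of our own; not part of the original text) computes R² and the adjusted R̄² of equation 3.119, and illustrates the defect discussed above: adding an irrelevant regressor never lowers R², while R̄² can fall:

import numpy as np

def r2_stats(X, Y):
    # Return (R-squared, adjusted R-squared) for an OLS fit of Y on X.
    n, K = X.shape                                # K = parameters incl. intercept
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    rss = ((Y - X @ b) ** 2).sum()
    tss = ((Y - Y.mean()) ** 2).sum()
    r2 = 1 - rss / tss
    return r2, 1 - (1 - r2) * (n - 1) / (n - K)   # eq. 3.119

rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

print(r2_stats(X, Y))                             # base model
junk = rng.normal(size=(n, 1))                    # an irrelevant regressor
print(r2_stats(np.hstack([X, junk]), Y))          # R2 never falls; adjusted R2 may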

3.8.4. Overall test of significance


We now extend the idea of a joint test of the relevance of all explanatory variables to the many-variable case.

Now consider the following: Y = β0 + β1X1 + β2X2 + … + βkXk + Ui

H0: β1 = β2 = β3 = … = βk = 0
H1: at least one of the βk is non-zero

This null hypothesis is a joint hypothesis that the slope coefficients are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression; that is, a test of whether Y is linearly related to X1, X2, …, Xk. The joint hypothesis cannot be tested by testing the individual significance of the β̂i's as above. In testing the individual significance of an observed partial regression coefficient, we assume implicitly that each test of significance is based on a different (i.e. independent) sample. In testing the significance of β̂1 under the hypothesis that β1 = 0, it was assumed tacitly that the testing was based on a different sample from the one used in testing the significance of β̂2 under the null hypothesis that β2 = 0. But in testing the joint hypothesis with the same sample, coefficient by coefficient, we would be violating the assumption underlying the test procedure.

The test procedure for any set of joint hypotheses can be based on a comparison of the restricted residual sum of squares (RRSS), the sum of squared errors in the model obtained assuming that the null hypothesis is true, with the unrestricted residual sum of squares (URSS), the sum of squared errors of the original unrestricted model. If the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters, so there would be little change in the sum of squared errors. If the two sums of squared errors are substantially different, the data most likely do not support the null hypothesis. In other words, if the null hypothesis is not true, the difference between RRSS and URSS (TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. It is always true that RRSS − URSS ≥ 0.

Consider Y = β̂0 + β̂1X1 + β̂2X2 + … + β̂kXk + ei.
We know that: Ŷ = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki
Yi = Ŷi + ei
ei = Yi − Ŷi
∑ei² = ∑(Yi − Ŷi)²

This sum of squared errors is called the unrestricted residual sum of squares (URSS). This is the case when the null hypothesis is not true, i.e. at least one of the βk is different from zero.

If we set the joint hypothesis that all slope coefficients are zero, the test of the joint hypothesis is:
H0: β1 = β2 = β3 = … = βk = 0
H1: at least one of the βk is different from zero.

When all the slope coefficients are zero, the model becomes
Y = β̂0 + ei
β̂0 = ∑Yi/n = Ȳ (applying OLS) …………………………………………….3.120
ei = Yi − β̂0, but β̂0 = Ȳ
ei = Yi − Ȳ
∑ei² = ∑(Yi − Ȳ)² = ∑y² = TSS

The sum of squared errors when the null hypothesis is assumed to be true is called the restricted residual sum of squares (RRSS), and it is equal to the total sum of squares (TSS).

The F statistic, which follows the F distribution with k − 1 and n − k degrees of freedom for the numerator and the denominator respectively, can be computed as the ratio:

F(k−1, n−k) = [(RRSS − URSS)/(k − 1)] / [URSS/(n − k)] ………………….…………… 3.121

Since RRSS = TSS and URSS = ∑ei² = ∑y² − β̂1∑yx1 − β̂2∑yx2 − … − β̂k∑yxk = RSS, it follows that

F(k−1, n−k) = [(TSS − URSS)/(k − 1)] / [URSS/(n − k)] = [ESS/(k − 1)] / [URSS/(n − k)] ……………………………………….………………. 3.122

Needless to say, in the three-variable case (Y and X2, X3) k is 3, in the four-variable case k is 4, and so on. Most regression packages routinely calculate the F value (given in the analysis-of-variance table) along with the usual regression output, such as the estimated coefficients, their standard errors, t values, etc.

The decision can be made by comparing the calculated F with the critical value of F, which leaves a probability of α in the upper tail of the F distribution with k − 1 and n − k degrees of freedom. If the computed value of F is greater than the critical value F(k−1, n−k), then the parameters of the model are jointly significant, i.e. the dependent variable Y is linearly related to the independent variables included in the model: if F > Fα,(k−1, n−k), we reject H0; otherwise we accept H0. Alternatively, if the p value of the obtained F is sufficiently low (below α), reject H0.
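A hedged Python sketch of this overall F test follows (synthetic data, illustrative names; scipy assumed available; not part of the original text). It computes equation 3.121 directly from the restricted and unrestricted residual sums of squares, together with the corresponding p value:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 60, 4                                    # k = parameters incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.8, 0.0, -0.4]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
urss = ((Y - X @ b) ** 2).sum()                 # unrestricted RSS
rrss = ((Y - Y.mean()) ** 2).sum()              # restricted RSS = TSS

F = ((rrss - urss) / (k - 1)) / (urss / (n - k))   # eq. 3.121
p = stats.f.sf(F, k - 1, n - k)                    # upper-tail probability
print(F, p)                                        # reject H0 if p < alpha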

Alternatively, the overall significance test can be done using R², which is an extension of 3.102. Let us manipulate equation 3.122 as follows. If we divide its numerator and denominator by ∑y² = TSS, then:

F = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)] ……………………………………………..3.123

Alternatively,

F = [ESS/(k − 1)] / [RSS/(n − k)] = [(n − k)/(k − 1)]·(ESS/RSS)
  = [(n − k)/(k − 1)]·[(ESS/TSS)/((TSS − ESS)/TSS)] = (n − k)R² / [(k − 1)(1 − R²)]

so that

F(k−1, n−k) = [R²/(k − 1)] / [(1 − R²)/(n − k)] ……………………... 3.124
where R² = ESS/TSS. Equation (3.124) shows how F and R² are related: the two vary directly. When R² = 0, F is zero ipso facto. The larger R², the greater the F value; in the limit, when R² = 1, F is infinite. This implies that the computed value of F can be calculated either from the ratio of ESS to RSS or from R² and 1 − R². Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R².
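For example (an illustrative calculation of our own, not from the original text): with R² = 0.90, k = 3 and n = 20, F = (0.90/2)/(0.10/17) = 0.45/0.00588 ≈ 76.5, which far exceeds the 5% critical value F(2, 17) ≈ 3.59, so H0 would be rejected.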

Practical Exercise 3.12


Let our model have four variables (say Y, X1, X2, X3). Then:
 Present the general model involving these variables
 Present the method of estimation using OLS
 Derive the variances of the estimators for the general three-regressor case
 Derive R² and adjusted R²
 Conduct the overall test of significance

Summary

In this chapter we extended the econometrics of the simple linear regression model to the case of many variables in a multi-step process, starting with the three-variable linear regression model (one dependent variable and two explanatory variables) and moving to the n-variable case. Although in many ways a straightforward extension of the two-variable linear regression model, the three-variable model introduced several new concepts, such as partial regression coefficients and the adjusted and unadjusted multiple coefficient of determination.

There are two common ways of representation: the algebraic and the matrix approach. The matrix approach is the preferred one because, as the number of variables increases, it allows the model to be written in compact form. We can use OLS, the method of moments, or maximum likelihood as estimation techniques, and we have seen how OLS can be applied in both the algebraic and the matrix approach. The properties of the OLS estimators of the multiple regression model parallel those of the two-variable model. More specifically: the regression line passes through the means (Ȳ, X̄2 and X̄3); the mean of the residuals is zero; and the residuals are uncorrelated with the regressors. Furthermore, the mean of the estimated Yi is equal to the mean of the actual Yi. Given the assumptions of the classical linear regression model, the OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance.

Although in many ways the concepts developed for the simple linear regression model can be applied straightforwardly to multiple regression models, a few additional features unique to such models receive more attention in this part.

Having considered the general framework and estimation techniques of the multiple regression model, we turned to hypothesis testing. In multiple regression models we undertake many tests of significance. There are several kinds of hypothesis testing that one may encounter in the context of a multiple regression model: testing hypotheses about an individual partial regression coefficient; testing the overall significance of the estimated multiple regression model, that is, finding out whether all the partial slope coefficients are simultaneously equal to zero; testing that two or more coefficients are equal to one another, or testing linear equality restrictions; testing that the partial regression coefficients satisfy certain restrictions; testing the stability of the estimated regression model over time or across different cross-sectional units; and testing the functional form of regression models.

These tests of significance are the same as those discussed for the simple regression model: the procedure for testing the significance of the partial regression coefficients is the same as that discussed for the two-variable case. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence and measure the validity of the estimates. This can be done using various tests, such as the standard error test, confidence intervals (Z-test, Student's t-test), the p-value test and the F-test. Using a test of significance, if the hypothesized value of βi in the null hypothesis is outside the confidence limits, we reject H0 and accept H1, indicating that β̂i is statistically significant. If the hypothesized value of βi in the null hypothesis is within the confidence interval, or if t* < tc, we accept H0 and reject H1; the implication is that β̂i is statistically insignificant.

Key terms

Economic criteria                 p-value
Econometric criteria              t-value
Statistical criteria              Regression sum of squares
Method of moments                 Goodness of fit
Maximum likelihood methods        Multiple linear regression model
Best linear unbiased estimator    Total sum of squares
Confidence interval               Test statistic
Sample regression line            Statistically significant

Reference
Greene, W. H. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley Private Ltd.
Gujarati, D. N. (2004). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric Methods with Applications in Business and Economics. New York: Oxford University Press.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Maddala, G. S. (1992). Introduction to Econometrics. 2nd ed. New York: Macmillan.
Theil, H. (1957). Specification errors and the estimation of economic relationships. Review of the International Statistical Institute, 25, 41-51.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: Riverside Printing Co. Ltd.
Verbeek, M. (2004). A Guide to Modern Econometrics. 2nd ed. England: John Wiley & Sons Ltd.
Wooldridge, J. M. (2000). Introductory Econometrics: A Modern Approach. New York: South-Western Publishers.

Review Question

Part I: Multiple choices


1. Which one of the following is true about the confidence interval when one states a non-directional null hypothesis with level of significance α?
a. Pr[β̂ − SE(β̂)t(α/2, n−k) ≤ β ≤ β̂ + SE(β̂)t(α/2, n−k)] = 1 − α
b. Pr[β̂ − SE(β̂)Z(α/2) ≤ β ≤ β̂ + SE(β̂)Z(α/2)] = 1 − α
c. Pr[β̂ − SE(β̂)Z(α/2, n−k) ≤ β ≤ β̂ + SE(β̂)Z(α/2, n−k)] = 1 − α
d. Pr[β̂ − SE(β̂)F(α/2, n−k) ≤ β ≤ β̂ + SE(β̂)F(α/2, n−k)] = 1 − α
e. A and B
f. none
2. The p-value:
a. is the observed significance level of a statistical test, the smallest value of α for which H0 can be rejected
b. is the smallest value of α for which a test result is considered statistically significant
c. is the actual risk of committing a Type I error
d. is simple as compared to the confidence interval
e. all
f. none
3. Which one of the following is true?
a. there are two types of error when making a decision, namely Type I and Type II errors
b. a Type I error arises from rejecting H0 when it is true
c. a Type II error arises when H0 is not rejected when it is false
d. the probability of a Type I error is the level of significance
e. all
f. none
4. Which one of the following is true about distributions?
a. the Z distribution is appropriate when the population variance is known and n is greater than 30
b. the t distribution is used if the population variance is unknown and the sample is small
c. test statistic = (sample statistic − hypothesized value of the population parameter)/standard deviation (standard error)
d. a non-directional hypothesis for Y = β0 + β1Xi + Ui is H0: β0 = 0 vs. H1: β0 ≠ 0
e. all
f. none
5. Linearity is one assumption of the classicists in simple regression analysis. Which of the following satisfies this assumption?
a. lnY = α + β1X1i + β2X2i + Ui
b. Y = α + (1/β1)X1i + β2X2i + Ui
c. Y = α + β1X1i² + β2X2i + Ui
d. lnY = α + β1lnX1i + β2lnX2i + Ui
e. Yi = α + β2Xi + Ui
f. A, C, and D
g. all
h. none
6. Which one of the following is true?
a. multiple linear regression is an extension of simple linear regression
b. the assumptions of the simple linear regression model are similar to those of multiple linear regression
c. for two independent variables the multiple regression model is specified as Y = β0 + β1X1i + β2X2i + Ui
d. it assumes no exact linear relationship between X1i and X2i
e. all
f. none
7. Which one of the following is true about Y = β0 + β1X1i + β2X2i + Ui?
a. β1 and β2 are partial regression coefficients
b. one can conduct individual as well as overall (F-test) coefficient tests for such a model
c. for the overall test the null hypothesis is defined as H0: β1 = β2 = 0
d. β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2
e. all
f. none
8. The F-test in multiple linear regression:
a. F = [ESS/(n − 1)] / [RSS/(n − k)]
b. is concerned with joint tests of the estimators
c. is the overall test of significance
d. F = [R²/(n − 1)] / [(1 − R²)/(n − k)]
e. all
f. none
9. An estimator is said to be ____________ if it uses all the information about the population parameter contained in the sample.
a. unbiased  b. sufficient  c. reliable  d. efficient  e. none
10. Which representation of hypotheses is the likely setup for Y = β0 + β1Xi + Ui?
a. H0: β1 = 0; H1: β1 ≠ 0
b. H0: β1 ≠ 0; H1: β1 = 0
c. H0: β0 = β̄0; H1: β0 ≠ β̄0
d. H0: β0 ≠ β̄0; H1: β0 = β̄0
e. all
f. none

Part-II: Short answer

1. State the assumptions of multiple regression.
2. What is the main reason for using matrix (vector) notation in the formulation of multiple linear regression models?
3. Mention the steps of hypothesis testing using the standard error test.

Part III: Workout


1. Given: Ŷi = 23.75 + 0.25X1i + 5.5X2i, R² = 0.998
(0.041) (0.612) (0.007)
where Y is annual salary in thousands, X1 is years of education past high school and X2 is years of experience. Figures in parentheses are p-values. Answer the following!
a. Interpret the coefficient of X1.
b. Is the interpretation above valid based on the data provided? (That is, do we have sufficient evidence to support the statement or interpretation you gave in (a) above?) Explain!
c. Interpret the coefficient of X2. Is there sufficient evidence supporting the inference you made in your interpretation?
d. What is the effect of a simultaneous change in X1 and X2 on Y?
e. Suppose the dependent variable was measured in logarithms. How would this change your interpretation in (a) above?
2. On the basis of the information given below, answer the following questions related to the model specified as Y = α + β1X1i + β2X2i + Ui:
∑X1² = 3200, ∑X1X2 = 4300, ∑X2 = 400, ∑X2² = 7300, ∑X1Y = 8400, ∑X2Y = 13500, ∑Y = 800, ∑X1 = 250, n = 25, ∑Yi² = 28,000
a. Find the OLS estimates of the slope coefficients
b. Compute the variance of β̂2
c. Test the significance of the slope parameter β2 at the 5% level of significance
d. Compute R² and R̄² and interpret the results
e. Test the overall significance of the model

3. The following data are provided, where X1 and X2 are independent variables and Y is the dependent variable. Compute:
Y X1 X2
50 15 8
15 17 3
35 19 5
47 18 12
40 21 5
22 14 10
36 16 8
55 15 9

a. the constant term of the regression model
b. the coefficient of X2
c. the standard error of the coefficient of the constant term
d. the calculated t-value of X2
e. the coefficient of determination
4. The following data are provided, where X1 and X2 are independent variables and Y is the dependent variable.
Y X1 X2
50 150 8
15 170 3
35 190 5
47 188 12
40 210 5
22 145 10
36 160 8
55 150 9

a. calculate the multiple regression equation and interpret it
b. calculate the standard error of the estimate
c. predict the value of Y when X1 = 200 and X2 = 15
d. calculate the total variation, explained variation, unexplained variation and the coefficient of multiple determination
e. using the standard errors of the estimated parameters, conduct tests of significance
f. construct 95% confidence interval estimates of the intercept and slopes
g. find the adjusted R²
h. test whether the regression is significant at the 0.07 level of significance

Chapter Four: Violations of basic Classical Assumptions

4.0 Introduction
In both the simple and multiple regression models, we made important assumptions about the distribution of Yt, Xi, and the error term ut. It was on the basis of those assumptions that we estimated and tested the significance of the model. Violation of those assumptions leads to many problems related to functional form, error terms and hypothesis testing. The question is what the implications would be if some or all of these assumptions were violated. For better understanding, the assumptions along with the corresponding violations are summarized in Table 4.1 below.
Table 4.1: The assumptions of the CLRM and their violations

Assumption | Mathematical expression | Violation and its implications
Zero mean of the error term | E(Ui) = 0 | Leads to a biased intercept
Homoskedasticity | var(ui) = σ² | Heteroskedasticity: the variance varies across observations
Serial independence of error terms | cov(ui, uj) = 0 | Autocorrelation: the error terms are correlated
Normality of ui | Ui ~ N(0, σ²) | Non-normality, outliers, which create hypothesis-testing problems
X is a variable | var(X) is not 0 | Errors in variables
X is non-stochastic | cov(Xi, ui) = 0 | Endogeneity: regressors are correlated with the error term, so the OLS estimator is biased
Serial independence of explanatory variables | cov(Xi, Xj) = 0 | Multicollinearity among the explanatory variables
Linearity | Y = α + βX + u | Non-linearity and/or wrong regressors or wrong functional form

In the subsequent sections, focus will be given to the problems of heteroskedasticity, autocorrelation, multicollinearity, non-normality and model specification. For each of these cases, we look at the nature of the violation along with examples; the reasons for such problems; the consequences of the violation for the least squares estimators; ways of detecting the presence of the problem; and remedial measures.

4.1. Heteroscedasticity
4.1.1 The nature of heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability distribution of the disturbance term ui remains the same over all observations of X; that is, the variation of each ui around its zero mean does not depend on the value of the explanatory variable. This feature of homogeneity of variance is known as homoscedasticity. To make the concept clear, assume the two-variable model Yi = β1 + β2Xi + ui. The variance of each ui remains the same irrespective of the values of the explanatory variable Xi. Mathematically, σu² is not a function of X, i.e. σu² ≠ f(Xi). Symbolically,

var(ui) = E[ui − E(ui)]² = E(ui²) = σu² ……………………4.1

If the variance of U were the same at every point, or for all values of X, it could be plotted in three dimensions as presented in figure 4.1(a). This holds if the observations of the error terms are drawn from identical distributions. However, in applied work disturbance terms often do not have the same variance: the variance of the error term differs across observations, as presented in figure 4.1(b).

(a) Homoscedastic error variance (b) Heteroscedastic error variance

Figure 4.1: Error variances and their types

This condition of non-constant variance, or non-homogeneity of variance, is known as heteroscedasticity. That is, var(ui) ≠ σu² (a constant) but var(ui) = σui² (a value that varies). If σu² is not constant but its value depends on the value of X, then σui² = f(Xi). This is indicated in figure 4.1(b), which shows that the conditional variance of Yi (which in fact is that of ui) increases as X increases.
Generally, we can encounter three types of Heteroscedasticity:
i. Increasing heteroscedasticity: a case where the variance of the stochastic term is (monotonically) increasing; that is, as X increases, so does the variance of u. This is the most common form of heteroscedasticity assumed in econometric applications.
ii. Decreasing heteroscedasticity: As X assumes higher values the deviation of the observations
from the regression line decreases. The variance of the random variable changes in the op-
posite direction to the explanatory variable.
iii. Cyclical heteroscedasticity: The variance of u decreases initially as X assumes higher values,
but after a certain level of X , the variance of u increases with X .

The above types are depicted diagrammatically in figure 4.2. In panel (a), σu² seems to increase with X. In panel (b), the error variance appears greater in the middle range of the X's, tapering off toward the extremes. Finally, in panel (c), the variance of the error term is greater for low values of X, declining and leveling off rapidly as X increases.

Figure 4.2: Patterns of heteroskedasticity


The pattern of heteroscedasticity depends on the signs and values of the coefficients of the relationship between σui² and X, but the ui's are not observable. In applied research we therefore make convenient assumptions about the specific functional form of σui² = f(Xi). The most commonly assumed forms are:
i. σui² = K²Xi²   ii. σui² = K²Xi   iii. σui² = K²/Xi ….etc.
Practical Exercise 4. 1
Mention three types of heteroscedasticity along with example?
4.1.2. Examples and types of data prone to heteroskedasticity
4.1.2.1. Examples of heteroscedastic functions
I. Consumption function:
Suppose we are to study consumption expenditure from a given cross-sectional sample of family budgets, specified as Ci = α + βYi + Ui, where Ci is the consumption expenditure of the ith household and Yi is the disposable income of the ith household. At low levels of income, average consumption expenditure is low, and variation below this level is limited: consumption cannot fall too far below a certain critical level, because this might mean starvation; on the other hand, it cannot rise too far above that level, because money income does not permit it. Such constraints may not be found at higher income levels. Thus, consumption patterns are more regular at lower income levels than at higher levels. This implies that at high income levels the u's will be large, while at low incomes the u's will be small. Hence, there is heteroscedasticity. Symbolically,

E(ui²) = σi² …………………………………….………… 4.2

Notice the subscript of σ², which reminds us that the conditional variances of ui (= conditional variances of Yi) are no longer constant. The assumption of constant variance of the u's does not hold when estimating the consumption function from a cross-section of family budgets; here, the variances of Yi are not the same.
ii. Production function: Suppose we are required to estimate the production function, specified in general form as Y = f(K, L), from a cross-sectional random sample of firms. The disturbance term in the production function stands for many factors, such as entrepreneurship, technological differences, selling and purchasing procedures and differences in organization, other than the labor (L) and capital (K) considered in the production function. These omitted factors show considerably more variability in large firms than in small ones, which leads to a breakdown of our assumption of homogeneity of the error variance.
iii. Wages and firm size
Average wages rise with the size of the firm. In smaller firms pay is low and its variability tends to be small, while larger firms pay more on average and there is more variability in their wages. The wage variance increases as firm size increases, and hence heteroskedasticity.
iv. Savings
Savings increase with income, and so does the variability of savings. As incomes grow, people have more discretionary income and hence more scope for choice about how to dispose of it and how much to save. Higher-income families on average save more than lower-income families, but with greater variability. Estimating the savings function from a cross-section of family budgets under the assumption of constant variance of the u's is therefore not appropriate.

4.1.2.2. Type of data and heteroskedasticity

It should be noted that the problem of heteroscedasticity is likely to be more common in cross-sectional
than in time series data. In cross-sectional data, one usually deals with members of a population at a
given point in time, such as individual consumers or their families, firms, industries, or geographical
subdivisions such as state, country, city, etc. Moreover, these members may be of different sizes, such as
small, medium, or large firms or low, medium, or high income. In time series data, on the other hand,
the variables tend to be of similar orders of magnitude because one generally collects the data for the
same entity over a period of time. Examples are GNP, consumption expenditure, savings, or employ -
ment over some period of time.
Practical Exercise 4. 2
Why is heteroscedasticity more common in cross-sectional data?

4.1.3. Reasons/sources for heteroscedasticity


There are several reasons why the variances of ui may vary. Some of these are:

1. Error learning models: as people learn and gain experience, their errors of behavior tend to become smaller over time. In this case σi² is expected to decrease. For example, the number of typing errors varies with typing practice: as the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.
2. As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income, so σi² is likely to increase with income. For instance, in the regression of savings on income, one is likely to find σi² increasing with income because people have more choices about their savings behavior: high-income families show a much greater variability in their saving behaviour than do low-income families. High-income families tend to stick to a certain standard of living, and when their income falls they cut down their savings rather than their consumption expenditure, whereas low-income families save for specific purposes, so their saving patterns are more regular. This implies that at high incomes the u's will be large, while at low incomes the u's will be small.
3. Heteroskedasticity may arise because of changes in data collection and sampling procedures. Since the error term ui also captures the effect of measurement errors on the dependent variable, there is a cogent reason to expect its variance to vary over observations. For example, as Y increases, errors of measurement tend to increase, because it becomes more difficult to collect data and to check their consistency and reliability; furthermore, errors of measurement tend to be cumulative over time, so their size tends to grow. In this case the variance of u increases with increasing values of X. Conversely, as sampling techniques and data collection methods continuously improve over time, errors of measurement may decrease: for example, banks that have sophisticated data-processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying observation, or outlier, is an observation much different (either very small or very large) in relation to the other observations in the sample (i.e., an extreme value compared to the majority of a variable). The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of the regression analysis. With outliers it would be hard to maintain the assumption of homoscedasticity.
5. Heteroskedasticity may be due to the omission of an explanatory variable (a missing variable) and/or a wrong functional form. Many of the variables omitted from the model tend to change in the same direction as X; they cause an increase in the variation of the observations of the dependent variable around the regression line, which implies an increase in the variance of ui as the values of X increase. In such a situation the residuals obtained from the regression may give the distinct impression that the error variance is not constant. For example, in the demand function for a commodity, if we do not include the prices of complementary or competing commodities, the residuals obtained from the regression may give the distinct impression that the error variance is not constant; but if the omitted variables are included in the model, that impression may disappear.
6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors included in the model. For example, economic variables such as income, wealth and education typically have skewed distributions: it is well known that the distribution of income and wealth in most societies is uneven, with the bulk of income and wealth owned by a few at the top, so it is skewed to one side.
Practical Exercise 4. 3
What are the reasons for heteroscedasticity?

4.1.4. Consequences of heteroscedasticity for the least squares estimates


What happens when we apply the ordinary least squares procedure to a model with heteroscedastic disturbance terms?
i. The OLS estimators of the regression coefficients are still statistically unbiased and consistent, albeit the u's are heteroscedastic. The unbiasedness property of the least squares estimates does not require that the variance of the u's be constant, and the coefficient estimates are still linear.

Yi = α + βXi + ui
β̂ = ∑xy/∑x² = β∑xi²/∑xi² + ∑xiui/∑x² = β + ∑xiui/∑x²
E(β̂) = β + ∑xiE(ui)/∑x² = β ...........................................4.3
Similarly, α̂ = Ȳ − β̂X̄ = (α + βX̄ + Ū) − β̂X̄
E(α̂) = α + βX̄ + E(Ū) − E(β̂)X̄ = α .............................4.4

i.e., the least squares estimators are unbiased even under the condition of heteroscedasticity.
ii. The variances of the OLS coefficients will be inefficient

Under homoscedasticity the OLS slope estimator has the constant minimum variance var(β̂) = σ²∑ki² = σ²/∑x², where ki = xi/∑xi² are the OLS weights. Under the heteroscedastic assumption it no longer has minimum variance (it is not the most efficient). We have:

var(β̂)het = E(β̂ − β)² = E(∑xiui/∑xi²)²
= E[∑xi²ui² + ∑∑(i≠j) xixjuiuj] / (∑xi²)²
= [∑xi²E(ui²) + ∑∑(i≠j) xixjE(uiuj)] / (∑xi²)²
= ∑xi²σi² / (∑xi²)², since E(uiuj) = 0 for i ≠ j .................................................................4.5

These two variances are different. If u is heteroscedastic, the OLS estimators do not have the smallest variance in the class of unbiased estimators; they are not efficient, in small as well as in large samples.
To see the consequence of using the homoscedastic formula var(β̂) = σ²/∑x² instead of (4.5), let us assume that:

σui² = ki σ²

where the ki are some non-stochastic constant weights. This assumption merely states that the heteroscedastic variances are proportional to σ², ki being the factor of proportionality. Substituting this value of σui² into (4.5), we obtain:

var(β̂)het = σ²∑kixi² / [(∑xi²)(∑xi²)] = (σ²/∑xi²)·(∑kixi²/∑xi²)
          = var(β̂)homo · (∑kixi²/∑xi²) .............................................................4.6

Thus it can be seen that the two variances, var(β̂)homo and var(β̂)het, will be equal only if ∑kixi²/∑xi² = 1 (for instance, if ki = 1 for all i, that is, if the error is homoskedastic). Otherwise:
 If ∑kixi²/∑xi² < 1, then OLS will overestimate the variance of β̂;
 If ∑kixi²/∑xi² > 1, then OLS will underestimate the variance of β̂.

More generally, since β̂ = ∑kiYi,

var(β̂) = ∑ki² var(Yi) = ∑ki² E(ui²) = ∑xi²σi² / (∑xi²)² ...................................................4.7

That is, if xi² and ki are positively correlated, the second term of (4.6) is greater than 1, and var(β̂) under heteroscedasticity will be greater than its variance under homoscedasticity; as a result the true standard error of β̂ will be underestimated.
iii. Conventional OLS hypothesis testing is invalid: standard confidence-interval formulae and hypothesis-testing procedures use these variance estimators and hence the standard errors. When there is heteroskedasticity, the usual formulae for the variances of the coefficients are not appropriate for conducting tests of significance and constructing confidence intervals: σui² is no longer a finite constant, but tends to change over the range of values of X, and hence cannot be taken out of the summation (∑) sign. As a result, the t-value computed from the conventional formula can be overestimated, which might lead to the conclusion that in a specific case β̂ is statistically significant (which in fact may not be true). Hence OLS can either overestimate or underestimate the correct variance, and may be only slightly inaccurate or grossly so. Moreover, if we proceed with our model under the false belief of homoscedasticity of the error variance, our inference and prediction about the population coefficients would be incorrect: conclusions and confidence intervals based on biased standard errors will be wrong, and the t and F tests will be invalid and unreliable.

iv. The prediction (of Y for a given value of X) based on OLS estimates from the original data would have a high variance, and hence the prediction would be inefficient. The OLS estimated variances, and thus the standard errors of the coefficients, are biased:

var(β̂) = var(∑xiYi/∑xi²) = ∑(xi/∑xi²)² var(Yi),

which no longer reduces to the ordinary formula σ²/∑x². However, it is not possible to say anything in general about the direction or size of this bias.
Practical Exercise 4. 4
What are the consequences of heteroscedasticity?

4.1.5. Detecting Heteroscedasticity


We have seen the consequences of heteroscedasticity for the OLS estimates. There are two broad approaches to testing or detecting heteroscedasticity: informal methods and formal methods.
i. Informal methods
These methods are called informal because they do not use formal testing procedures such as the t-test, the F-test and the like. They include two approaches: the nature of the problem and graphical methods.
A. Nature of the problem: Very often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be encountered. As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception. For example, the residual variance around the regression of consumption on income typically increases with income.

B. Graphical method: This is a test based on the nature of the graph. If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one can do the regression analysis on the assumption that there is no heteroscedasticity and then examine the squared residuals ûi² to see whether they exhibit a systematic pattern. Although the ûi² are not the same thing as the ui², they can be used as proxies, especially if the sample size is sufficiently large. An examination of the ûi² may reveal patterns such as those shown in figure 4.3.

Figure 4.3: Scattergram of estimated squared residuals against X
To check whether a given data set exhibits heteroscedasticity, we look at whether there is a systematic relation between the squared residuals ei² and the mean value of Y, or with Xi². Furthermore, the ûi² can be plotted against Ŷi, the estimated Yi from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residuals. To do this, first run OLS and plot the squared residuals against Ŷ (or against each Xi). Second, the graph may show some relationship (linear, quadratic, …), providing clues as to the nature of the problem and a possible remedy.

In figure 4.3a there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures 4.3b to 4.3e, however, exhibit definite patterns: figure 4.3c suggests a linear relationship, whereas figures 4.3d and 4.3e indicate a quadratic relationship between ûi² and Ŷi. Using such knowledge, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity. Instead of plotting ûi² against Ŷi, one may also plot them against one of the explanatory variables; this is useful as a cross-check, especially if plotting ûi² against Ŷi results in the pattern shown in figure 4.3a.
We can illustrate the above using a hypothetical example of consumption expenditure and disposable income, shown in table 4.2.

Table 4.2: Consumption expenditure (C) and disposable income (ID) for 30 families

Family | Consumption | Disposable income (ID)
1 10600 12000
2 10800 12000
3 11100 12000
4 11400 13000
5 11700 13000
6 12100 13000
7 12300 14000
8 12600 14000
9 13200 15000
10 13000 15000
11 13300 15000
12 13600 15000
13 13800 16000
14 14000 16000
15 14200 16000
16 14400 17000
17 14900 17000
18 15300 17000
19 15000 18000
20 15700 18000
21 16400 18000
22 15900 19000
23 16500 19000
24 16900 19000
25 16900 20000
26 17500 20000
27 18100 20000
28 17200 21000
29 17800 21000
30 18500 21000
Solution
Regressing C on ID, we obtain the following results:
Ĉ = 1480 + 0.788·ID
se = (449.61) (0.026845)
t = (3.29) (29.37)
Using this model we produce the following intermediate results, which help to draw the residual plot.

Table 4.3: Predicted consumption expenditure and residuals

Family | Consumption | Disposable income (ID) | Predicted C | Residuals
1 10600 12000 10941.81 -341.82
2 10800 12000 10941.82 -141.82
3 11100 12000 10941.82 158.18
4 11400 13000 11730.3 -330.3
5 11700 13000 11730.3 -30.3
6 12100 13000 11730.3 369.7
7 12300 14000 12518.79 -218.79
8 12600 14000 12518.79 81.21
9 13200 15000 12518.79 681.21
10 13000 15000 13307.27 -303.27
11 13300 15000 13307.27 -7.27
12 13600 15000 13307.27 292.73
13 13800 16000 14095.76 -295.76
14 14000 16000 14095.76 -95.76
15 14200 16000 14095.76 104.24
16 14400 17000 14884.24 -484.24
17 14900 17000 14884.24 15.76
18 15300 17000 14884.24 415.76
19 15000 18000 15672.73 -672.73
20 15700 18000 15672.73 27.27
21 16400 18000 15672.73 727.27
22 15900 19000 16461.21 -561.21
23 16500 19000 16461.21 38.79
24 16900 19000 16461.21 438.79
25 16900 20000 17249.7 -349.7
26 17500 20000 17249.7 250.3
27 18100 20000 17249.7 850.3
28 17200 21000 18038.18 -838.18
29 17800 21000 18038.18 -238.18
30 18500 21000 18038.18 461.82

Now, if we plot the squared residuals against the X values, we obtain the pattern shown in figure 4.4. The plot in figure 4.4 shows a wedge shape that widens to the right. This can be interpreted as an increasing variance of the random term in the model, i.e. the presence of heteroskedasticity. If we instead draw the plot of the squared residuals against the predicted (fitted) values of C, we observe similar results and reach the same interpretation.
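The plot can be reproduced with a few lines of Python (a minimal sketch under our own naming; matplotlib assumed available; not part of the original text). Passing the income and consumption columns of Table 4.2 to the function below reproduces the widening wedge of figure 4.4:

import numpy as np
import matplotlib.pyplot as plt

def residual_square_plot(X, Y):
    # Fit Y on X by simple OLS, then plot the squared residuals against X.
    b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()
    u2 = (Y - b0 - b1 * X) ** 2           # squared residuals
    plt.scatter(X, u2)
    plt.xlabel("disposable income (ID)")
    plt.ylabel("squared residuals")
    plt.show()                            # a widening wedge suggests heteroscedasticity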

Figure 4.4: Plot of squared residuals (usquare) against disposable income (ID)
Example 2: Table 4.4 gives data on quantity demanded (Y), the price of the commodity (X1) and consumer income (X2) for a period of 10 months. Fit the regression equation of Y on X1 and X2 and check for the problem of heteroskedasticity in the model.

Table 4.4: Quantity demanded, price of the commodity and income of consumer data

NO. Y X1 X2
1 100 5 1000
2 75 7 600
3 80 6 1200
4 70 6 500
5 50 8 300
6 65 7 400

7 90 5 1300
8 100 4 1100
9 110 3 1300
10 60 9 300

Solution
Ŷ = 111.6918 − 7.18824X1 + 0.014297X2, R² = 0.89, R̄² = 0.86
se = (23.53081) (2.555331) (0.011135)
t = (4.75) (−2.81) (1.28)
Practical Exercise 4.5
1. Produce the residual graph and test whether heteroscedasticity exists.
2. Interpret the result of the above example.

ii. Formal methods

There are several formal methods of testing for heteroscedasticity. Some of the formal tests are the Park test, the Glejser test, the Spearman rank-correlation test, the Goldfeld-Quandt test and the Breusch-Pagan-Godfrey test.
A. Park test
Park formalizes the graphical method of testing heteroskedasticity by suggesting that the variance of the random disturbance, σi², is some function of the explanatory variable Xi. The specific functional form he suggested was:

σi² = σ²Xi^β e^vi, or ln σi² = ln σ² + β ln Xi + vi ..................................4.8

where vi is the stochastic disturbance term.

Since σi² is generally not known, Park suggests using ei² as a proxy. Accordingly, Park suggests running the following regression:

ln ei² = ln σ² + β ln Xi + vi = α + β ln Xi + vi ................................................4.9

since σ² is constant.
2

The Park test is a three-stage testing procedure:
 In the first stage, we run the OLS regression disregarding the heteroscedasticity question and obtain the residuals ei from this regression.
 In the second stage, we run the regression in equation (4.9) above. Equation (4.9) indicates that we test for heteroscedasticity by testing whether there is a significant relation between ln ei² and ln Xi.
 In the third stage, we set the hypotheses H0: β = 0 against H1: β ≠ 0. If β turns out to be statistically significant, this suggests that heteroscedasticity is present in the data; if it turns out to be insignificant, we may accept the assumption of homoscedasticity.

Example 1: Suppose that from a sample of size 100 we estimate the relation between compensation of workers (Yi) and productivity (Xi).
To illustrate the Park approach, the following regression function is used:
Yi = β0 + β1Xi + Ui ...............................................4.10
Suppose the data on Y and X give the following result:
Ŷ = 1992.342 + 0.2329Xi .................................................4.11
SE = (936.479) (0.0998)
t = (2.1275) (2.333), R² = 0.4375
The results reveal that the estimated slope coefficient is significant at the 5% level of significance on the basis of a one-tail t-test. The equation shows that as labor productivity increases by one birr, labor compensation on average increases by about 23 cents.
The residuals obtained from regression (4.11) were then regressed on Xi as suggested by equation (4.9), giving the following result:
ln ei² = 35.817 − 2.8099 ln Xi ...................................4.12
SE = (38.319) (4.216)
t = (0.934) (−0.667), R² = 0.0595
The above result reveals that the slope coefficient is statistically insignificant, implying there is no statistically significant relationship between the two variables.

Practical Exercise 4.6
Consider a relationship between compensation (Y) and productivity (X). To illustrate the Park approach, the following regression function is used:
Yi = β0 + β1Xi + Ui .....................................................................4.13
Suppose data on Y and X come up with the following result:
Ŷ = 1992.34 + 0.23Xi ………………………….…….……….4.14
s.e. = (936.48) (0.09)
t = (2.13) (2.33), r² = 0.44
Suppose that the residuals obtained from the above regression were regressed on Xi in the manner of equation (4.9), giving the following results:
ln ei² = 35.82 − 2.81 ln Xi ……………………………………….4.15
s.e. = (38.32) (4.22)
t = (0.93) (−0.67), r² = 0.46
Interpret the result.
Limitation
Following the Park test above, one may conclude that there is no heteroscedasticity in the error variance. Although empirically appealing, however, the Park test has some problems. Goldfeld and Quandt have argued that the error term vi entering ln ei² = α + β ln Xi + vi may not satisfy the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly exploratory method, one may still use the Park test.
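As a sketch of the mechanics (simulated data, seed and names of our own; statsmodels assumed available; not part of the original text), the two stages of the Park test can be run as follows:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
X = rng.uniform(1, 10, n)
Y = 2 + 0.5 * X + rng.normal(size=n) * X        # error spread grows with X

stage1 = sm.OLS(Y, sm.add_constant(X)).fit()    # stage 1: ordinary OLS
ln_e2 = np.log(stage1.resid ** 2)
stage2 = sm.OLS(ln_e2, sm.add_constant(np.log(X))).fit()   # stage 2: eq. 4.9
print(stage2.params, stage2.pvalues)            # significant slope -> heteroscedasticity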
B. Glejser test
The Glejser test is similar in spirit to the Park test. After obtaining the residuals ei from the OLS regression, Glejser regresses the absolute value of the residuals, |ei|, on the Xi with which σi² is assumed to be closely associated. Glejser suggested or used the following functional forms in his experiments:

|ei| = α + βXi + vi          |ei| = α + β(1/Xi) + vi
|ei| = α + β√Xi + vi         |ei| = α + β(1/√Xi) + vi
|ei| = √(α + βXi) + vi       |ei| = √(α + βXi²) + vi

where vi is the error term.
Glejser found that the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity for large samples. To detect the possible existence of heteroscedasticity using the Glejser test, we commonly use the following steps.
Step 1: Regress the dependent variable, Y, on all the explanatory variables and compute the regression residuals, the e's.
Step 2: Regress the absolute values of the e's, |ei|, on the explanatory variable with which σui² is thought, on a priori grounds, to be associated. The actual form of this regression is usually not known, so one may experiment with various formulations containing various powers of X. For example, we can regress:
|ûi| = a0 + a1Xi²
|ûi| = a0 + a1Xi⁻¹ = a0 + a1(1/Xi)
|ûi| = a0 + a1Xi^(1/2) = a0 + a1√Xi, and so on.
Finally, among these regressions, we choose the model that gives the best fit in the light of the correlation coefficient and the statistical significance of the coefficients a0 and a1 based on the standard errors. In other words, we perform the usual standard tests of significance for these coefficients, and if they are found to be significantly different from zero we reject the null hypothesis of homoscedasticity. Then, using the coefficients (a0 and a1) of the chosen model, we can decide on the existence of the heteroscedasticity problem as follows:
i. If a0 = 0, while a1 ≠ 0, there is pure heteroscedasticity.
ii. If both a0 ≠ 0 and a1 ≠ 0, there is mixed heteroscedasticity.

The Glejser test gives information on the particular way or form in which σui² is connected with X. This information is crucial for the correction of the heteroscedasticity of the disturbance term.
Limitation
Goldfeld and Quandt point out that the expected value of the error term vi is non-zero, that it is serially correlated and that, ironically, it is itself heteroscedastic. An additional difficulty with the Glejser method is that models such as |ei| = √(α + βXi) + vi and |ei| = √(α + βXi²) + vi are non-linear in the parameters and therefore cannot be estimated with the usual OLS procedure. As a practical matter, the Glejser technique may be used for large samples, and in small samples strictly as a qualitative device to learn something about heteroscedasticity.
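A minimal Glejser-style sketch follows (simulated data and illustrative transformations of our own; statsmodels assumed available; not the author's code). It regresses |ei| on several candidate forms of X and reports the slope a1 and its p value for each:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
X = rng.uniform(1, 10, n)
Y = 2 + 0.5 * X + rng.normal(size=n) * X          # variance grows with X

e = sm.OLS(Y, sm.add_constant(X)).fit().resid     # residuals from the main regression
for Z, label in [(X, "X"), (np.sqrt(X), "sqrt(X)"), (1 / X, "1/X")]:
    fit = sm.OLS(np.abs(e), sm.add_constant(Z)).fit()
    print(label, fit.params[1], fit.pvalues[1])   # a1 and its p value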
Practical Exercise 4.7
Compare and contrast the Park test and the Glejser test.
Which one gives a stronger basis for testing heteroskedasticity?

C. The Spearman rank-correlation test

This is the simplest test of heteroscedasticity. It can be applied to small samples as well as large samples. The steps to conduct this test are outlined as follows.
Step 1: Formulate the null hypothesis of homoscedasticity against the alternative hypothesis of heteroscedasticity as H0: ρs = 0 and H1: ρs ≠ 0, where ρs is the population rank-correlation coefficient and rs is its sample counterpart.
Step 2: Regress Y on X as shown below:
Yi = β0 + β1Xi + Ui ………………………… 4.16
Then obtain the residuals ei, which are the estimates of the u's from equation 4.16.
Step 3: Order the e's (ignoring their sign) and the X values in ascending or descending order and compute the rank-correlation coefficient between e and X. The Spearman rank-correlation coefficient rs is given as:

rs = 1 − 6∑Di² / [n(n² − 1)] ...............................................................................4.17

where Di is the difference between the ranks of corresponding pairs of e and X, and n is the number of observations in the sample. If we have a model with more than one explanatory variable, we may compute the Spearman rank-correlation coefficient between e and each one of the explanatory variables separately.

Step 4: Compute the value of t (t calculated) from equation 4.18. Assuming that the null hypothesis is true (i.e., the population rank-correlation coefficient ρs is zero) and n > 8, the significance of the sample rank-correlation coefficient, rs, can be tested by using the t-test. The formula used to obtain the value of t for this purpose is:

t calculated = rs√(n − 2) / √(1 − rs²) ...................................................................4.18

Step 5: Make the decision using the following rule: reject the null hypothesis of homoscedasticity if the value of t obtained from equation 4.18 (t calculated) is greater than the critical value of t from the t tables for n − 2 degrees of freedom, and conclude that there is a problem of heteroscedasticity.
Example: Consider the regression Yi = β0 + β1Xi to illustrate the rank-correlation test using 10 observations. The following table applies the rank-correlation approach to test the hypothesis of heteroscedasticity. Notice that columns 6 and 7 rank |Ûi| and Xi in ascending order.

Table 4.5: Rank-correlation test of heteroscedasticity

Observation | Y | X | Ŷ | Û = (Y − Ŷ) | Rank of |Ûi| | Rank of Xi | d (difference between the two rankings) | d²

1 12.4 12.1 11.37 1.03 9 4 5 25

2 14.4 21.4 15.64 1.24 10 9 1 1

3 14.6 18.4 14.4 0.20 4 7 -3 9

4 16 21.7 15.78 0.22 5 10 -5 25

5 11.3 12.5 11.56 0.26 6 5 1 1

6 10.0 10.4 10.59 0.59 7 2 5 25

7 16.2 20.8 15.37 0.83 8 8 0 0

8 10.4 10.2 10.50 0.10 3 1 2 4

9 13.1 16.0 13.16 0.06 2 6 -4 16

10 11.3 12.0 11.33 0.03 1 3 -2 4

TOTAL 0 110

Applying formula (4.17), we obtain: rs = 1 − 6[110 / (10(100 − 1))] = 0.33

Applying the t-test given in (4.18), we obtain: t = 0.33√8 / √(1 − 0.11) = 0.99

The critical t-value for 8 (= 10 − 2) df is 1.86, indicating that the test statistic is not significant even at the 10% level of significance. Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute value of the residuals, which suggests that there is no heteroscedasticity.
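The arithmetic of this example can be checked with a short Python sketch (the values are taken from Table 4.5; scipy is assumed available and used only for the critical value):

import numpy as np
from scipy import stats

n, sum_d2 = 10, 110                            # from Table 4.5
rs = 1 - 6 * sum_d2 / (n * (n**2 - 1))         # eq. 4.17 -> 0.33
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)   # eq. 4.18 -> the text's 0.99 up to rounding
t_crit = stats.t.ppf(0.95, n - 2)              # one-tail 5% critical value, about 1.86
print(rs, t, t_crit)                           # t < t_crit -> no heteroscedasticity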

D. Goldfeld-Quandt test

This method is applicable if one assumes that the heteroscedastic variance σi² is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:

Yi = β0 + β1Xi + Ui …………………………4.19

Suppose σi² is positively related to Xi as:

σi² = σ²Xi² .................................................4.20, where σ² is a constant.

If equation 4.20 is appropriate, it would mean that σi² is larger for larger values of Xi. If that turns out to be the case, heteroscedasticity is most likely to be present in the model.
To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Formulate the hypotheses to be tested. The customary hypotheses for the Goldfeld-Quandt test are H0: the u's are homoskedastic (σ1² = σ2²) and H1: the u's are heteroscedastic (σ1² ≠ σ2²).
Step 2: Order or rank the observations according to the values of Xi, beginning with the lowest X value.
Step 3: Select arbitrarily a certain number, c, of central observations to be omitted from the analysis. Generally, for samples larger than 30 (n > 30), the optimum number of central observations to be omitted is approximately a quarter of the total observations: for example, we omit 8 central observations for n = 30, 16 central observations for n = 60, and so on. The number c is specified a priori, and dividing the remaining n − c observations gives two groups of (n − c)/2 observations each, where one includes the small values of X and the other includes the large values of X.
Step 4: Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS1 and RSS2, where RSS1 represents the RSS from the regression corresponding to the smaller Xi values (the small-variance group) and RSS2 that from the larger Xi values (the large-variance group). Let ∑u1² be the sum of squared residuals obtained from the sub-sample of low values of X, with (n − c)/2 − k degrees of freedom, and ∑u2² be the sum of squared residuals from the sub-sample of high values of X, with the same degrees of freedom, where k is the total number of parameters in the model. These RSS each have (n − c)/2 − k, or (n − c − 2k)/2, degrees of freedom, where k is the number of parameters to be estimated, including the intercept term, and df denotes the degrees of freedom.
n c
RSS 2 / df  u 22 / 
 2 
k
  ...................................4.21
RSS1 / df n c
 u 1 /  2   k
2

Step 5: compute
U
If i are assumed to be normally distributed (which we usually do), and if the assumption of ho -
moscedasticity is valid, then  follows F distribution with numerator and denominator df each (n-c-
2k)/2.
RSS 2 /(n  c  2 K ) / 2
 ~F ...................................4.22
RSS1 /(n  c  2 K ) / 2  (n-c)
2
K ,
(n-c)
2

K 

 nc
v1  v2   k
has an F- distribution with  2  degrees of freedom. If the two variances are equal (i.e.,
if the u' s are homoscedastic) the value of F in equation above will tend to 1. On the other hand, if the

two variances differ, then the F will have a large value, given by the sign of the test 
u2 2   u12
.
Step 6: Make the decision. To decide on the status of the null hypothesis, compare the observed value of F obtained from equation 4.22 with the theoretical (tabulated) value of F for ν1 = ν2 = (n − c)/2 − k degrees of freedom and the chosen level of significance. Reject the null hypothesis H0 of homoscedasticity, σ1² = σ2² (i.e., no difference in the variances of the two groups), if Fcal > Ftab; otherwise conclude that there is no heteroscedasticity problem in the model. If the computed λ (F) is greater than the tabulated F at the chosen level of significance, we can reject the hypothesis of homoscedasticity, i.e. we can say that heteroscedasticity is very likely. The higher the observed value of F, the stronger the heteroscedasticity problem.
Example: To illustrate the Goldfeld–Quandt test, we present in table 4.6 data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure Y ($) is linearly related to income X ($), but we suspect that heteroscedasticity may be present in the data. The reordering of the data needed for the application of the test is also presented in table 4.6.
Table 4.6: Hypothetical data on consumption expenditure and income

Original data          Data ranked by X values
Y      X               Y      X
55 80 55 80
65 100 70 85
70 85 75 90
80 110 65 100
79 120 74 105
84 115 80 110
98 130 84 115
95 140 79 120
90 125 90 125
75 90 98 130
74 105 95 140
110 160 108 145
113 150 113 150
125 165 110 160
108 145 125 165
115 180 115 180
140 225 130 185
120 200 135 190
145 240 120 200
130 185 140 205
152 220 144 210
144 210 152 220
175 245 140 225
180 260 137 230
135 190 145 240
140 205 175 245
178 265 189 250
191 270 180 260
137 230 178 265
189 250 191 270

Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13 observations and their associated sums of squares are shown below (standard errors in parentheses).
Table 4.7: Regression results of the two sub-groups for the Goldfeld–Quandt test

Regression based on the first 13 observations    Regression based on the last 13 observations
Ŷᵢ = 3.4094 + 0.6968Xᵢ                           Ŷᵢ = −28.0272 + 0.7941Xᵢ
     (8.7049)  (0.0744)                               (30.6421)  (0.1319)
R² = 0.8887                                      R² = 0.7681
RSS₁ = 377.17                                    RSS₂ = 1536.8
df = 11                                          df = 11
From these results we obtain:

λ = (RSS₂/df)/(RSS₁/df) = (1536.8/11)/(377.17/11) = 4.07

The critical F value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F (= λ) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1%, we may not reject the assumption of homoscedasticity, since the p-value of the observed λ is 0.014.
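The mechanics of the test are easy to script. Below is a minimal sketch in Python using numpy; the function name and the choice c = 4, which mirrors the worked example above, are our own illustrative assumptions rather than part of the original test description.

```python
import numpy as np

def goldfeld_quandt(y, x, c):
    """Goldfeld-Quandt test for a simple regression y = a + b*x + u.
    Sorts by x, drops c central observations, fits OLS to each half,
    and returns the ratio (RSS2/df)/(RSS1/df) and the common df."""
    order = np.argsort(x)                  # step 2: rank by X
    y, x = y[order], x[order]
    n = len(y)
    h = (n - c) // 2                       # step 3: drop c central obs

    def rss(ys, xs):
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ beta
        return e @ e

    rss1 = rss(y[:h], x[:h])               # small-variance group
    rss2 = rss(y[-h:], x[-h:])             # large-variance group
    df = h - 2                             # k = 2 parameters per half
    return (rss2 / df) / (rss1 / df), df
```

With the 30 observations of table 4.6 and c = 4, a script like this reproduces λ ≈ 4.07 with df = 11, which is then compared with the tabulated F(11, 11) value.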
E. Breusch–Pagan–Godfrey Test.

The success of the Goldfeld–Quandt test depends not only on the value of c (the number of central observations to be omitted) but also on identifying the correct X variable with which to order the observations. These limitations can be avoided if we consider the Breusch–Pagan–Godfrey (BPG) test. To illustrate this test, consider the k-variable linear regression model

Yᵢ = β₀ + β₁X₁ᵢ + … + βₖXₖᵢ + Uᵢ ..................4.23

Assume that the error variance σᵢ² is described as

σᵢ² = f(α₁ + α₂Z₂ᵢ + … + αₘZₘᵢ) ..................4.24

that is, σᵢ² is some function of the non-stochastic variables Z; some or all of the X's can serve as Z's. Specifically, assume that

σᵢ² = α₁ + α₂Z₂ᵢ + … + αₘZₘᵢ ..................4.25

that is, σᵢ² is a linear function of the Z's. If α₂ = α₃ = … = αₘ = 0, then σᵢ² = α₁, which is a constant. Therefore, to test whether σᵢ² is homoscedastic is to test the hypothesis that α₂ = α₃ = … = αₘ = 0. The actual test procedure is as follows.

Step 1. Estimate Yᵢ = β₀ + β₁X₁ᵢ + … + βₖXₖᵢ + Uᵢ by OLS and obtain the residuals Û₁, Û₂, …, Ûₙ.

Step 2. Obtain σ̃² = ΣÛᵢ²/n. This is the maximum likelihood estimator of σ². (Recall from the previous discussion that the OLS estimator is σ̂² = ΣÛᵢ²/(n − k).)

Step 3. Construct the variables pᵢ defined as

pᵢ = Ûᵢ²/σ̃² ..................4.26

which is simply each squared residual divided by σ̃².

Step 4. Regress pᵢ on the Z's as

pᵢ = α₁ + α₂Z₂ᵢ + … + αₘZₘᵢ + Vᵢ ..................4.27

where Vᵢ is the residual term of this regression.

Step 5. Obtain the ESS (explained sum of squares) from the above equation and define

Θ = ½(ESS) ..................4.28

Assuming the Uᵢ are normally distributed, one can show that if there is homoscedasticity and if the sample size (n) increases indefinitely, then Θ ~ χ²(m − 1); that is, Θ asymptotically follows the chi-square distribution with (m − 1) degrees of freedom. Therefore, if the computed Θ (= χ²) exceeds the critical χ² value at the chosen level of significance, one can reject the hypothesis of homoscedasticity; otherwise one does not reject it.
Example: Suppose we have 30 observations on Y and X that gave us the following regression result.

Step 1: Ŷ = 9.29 + 0.64Xᵢ ..................4.29
        s.e. = (5.2) (0.03), RSS = 2361.15

Step 2: σ̃² = ΣÛᵢ²/30 = 2361.15/30 = 78.71

Step 3: pᵢ = Ûᵢ²/σ̃², i.e., each squared residual ûᵢ² obtained from the regression in step 1 is divided by 78.71 to construct the variable pᵢ.

Step 4: Assuming that the pᵢ are linearly related to Xᵢ (= Zᵢ), we obtain the following regression result:
        p̂ᵢ = −0.74 + 0.01Xᵢ, ESS = 10.42

Step 5: Θ = ½(ESS) = 5.21 ..................4.30

From the chi-square table we find that for 1 df the 5% critical chi-square value is 3.84. Thus, the observed chi-square value is significant at the 5% level of significance, and we reject the hypothesis of homoscedasticity. Note that the BPG test is asymptotic, that is, a large-sample test. The test is sensitive in small samples with regard to the assumption that the disturbances Vᵢ are normally distributed.
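As a companion sketch, the BPG steps can be scripted as follows (Python/numpy; the function name and the use of the X columns themselves as the Z variables are assumptions made for illustration, matching the example above):

```python
import numpy as np

def breusch_pagan_godfrey(y, X):
    """BPG test. X is the n-by-p regressor matrix without the constant;
    the same columns are used as the Z variables. Returns the statistic
    theta = ESS/2, asymptotically chi-square with p df under H0."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u = y - Xc @ beta                      # step 1: OLS residuals
    sigma2_ml = (u @ u) / n                # step 2: ML variance estimate
    p = u**2 / sigma2_ml                   # step 3: scaled squared residuals
    gamma, *_ = np.linalg.lstsq(Xc, p, rcond=None)
    fitted = Xc @ gamma                    # step 4: regress p on Z (= X)
    ess = np.sum((fitted - p.mean())**2)   # explained sum of squares
    return ess / 2                         # step 5: theta = ESS/2
```

The returned statistic is then compared with the critical χ² value (3.84 at the 5% level when there is a single Z variable, as in the example).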

Practical Exercise 4.7
Compare and contrast the above tests of heteroskedasticity.
Which one gives a strong basis for testing heteroskedasticity?

4.1.6. Remedial measures for the problems of heteroscedasticity

Having identified a problem of heteroscedasticity in a given model on the basis of any of the tests discussed above, the next issue is how to deal with the model. The usual OLS method does not make use of the "information" contained in the unequal variability of the dependent variable and hence leads to inefficient estimates and invalid standard errors. Therefore, remedial measures concentrate on the variance of the error term, the sources of the problem and its treatment, and should be applied in steps.
To this end, first check for specification errors (omitted variables, wrong functional form) and make the necessary corrections, because heteroskedasticity could be a symptom of other problems, e.g. omitted variables and/or a wrong functional form. Second, if it persists even after correcting other specification errors, scholars recommend different approaches. One approach is to stick to OLS but use heteroskedasticity-robust standard errors. The second is to apply OLS after transforming the model.

4.1.6.1. The method of generalized (weighted) least squares

This remedial measure recommends transforming the model so that the transformed model satisfies all the assumptions of the classical regression model, including homoscedasticity. If we apply OLS to the original model, the result is inefficient parameter estimates, since var(uᵢ) is not constant. The procedure of transforming the original variables in such a way that the transformed variables satisfy the standard least-squares assumptions, and then applying OLS to them, is known as the method of generalized least squares (GLS, also WLS/FGLS). The estimators thus obtained are known as GLS estimators. In short, GLS is OLS on transformed variables that satisfy the standard least-squares assumptions.
It should therefore be clear that in order to adopt the necessary corrective measure we must have information on the form of heteroscedasticity. The transformation of the model depends on the particular form of heteroscedasticity, i.e., on the form of the relationship between the variance of uᵢ, σᵤᵢ², and the values of the explanatory variables (σᵤᵢ² = f(X)).
Some of the recommended forms are:
a. Assume σᵢ² known: E(uᵢ)² = σᵢ²
b. Assume σᵢ² not known: E(uᵢ)² = σᵢ² = kᵢ f(Xᵢ) (different functional forms)
Since the transformed data no longer possess heteroscedasticity, it can be shown that the estimates of the transformed model are more efficient (i.e. they possess smaller variance).
A. Assume σᵢ² known: E(uᵢ)² = σᵢ²

Assume that our original model is Y = α + βXᵢ + Uᵢ, where uᵢ satisfies all the assumptions except that it is heteroscedastic. That is,

Y = α + βXᵢ + Uᵢ, var(uᵢ) = σᵢ², E(uᵢ) = 0, E(uᵢuⱼ) = 0 ..................4.31

If we apply OLS to the above model, the estimators are not BLUE. To make them BLUE we have to transform the above model using the variance. When the population variance σᵢ² is known, we use it for the transformation. The form of the variance differs depending on the situation. Let us assume σᵢ² is known, with

E(uᵢ)² = σᵢ² ..................4.32

Transforming the above model is done by dividing the model through by the standard error, the square root of the variance: √σᵢ² = σᵢ. Now divide the above model by σᵢ:

Y/σᵢ = α/σᵢ + βXᵢ/σᵢ + Uᵢ/σᵢ ..................4.33

The variance of the transformed error term is constant, i.e.

var(uᵢ/σᵢ) = E(uᵢ/σᵢ)² = (1/σᵢ²)E(uᵢ)² = σᵢ²/σᵢ² = 1, a constant ..................4.34

The transformed parameters are BLUE, because all the assumptions including homoscedasticity are satisfied for (4.33). Since uᵢ/σᵢ = (1/σᵢ)(Yᵢ − α − βXᵢ), we can now apply OLS to the transformed model.


Σ(uᵢ/σᵢ)² = Σ(1/σᵢ²)(Yᵢ − α − βXᵢ)² ..................4.35

Let wᵢ = 1/σᵢ². Then the weighted residual sum of squares is

Σwᵢûᵢ² = Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)² ..................4.36

The method of GLS (WLS) minimizes this weighted residual sum of squares.

First order condition with respect to α̂: ∂(Σwᵢûᵢ²)/∂α̂ = −2Σwᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0

Σwᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
ΣwᵢYᵢ − α̂Σwᵢ − β̂ΣwᵢXᵢ = 0
ΣwᵢYᵢ = α̂Σwᵢ + β̂ΣwᵢXᵢ

α̂ = ΣwᵢYᵢ/Σwᵢ − β̂(ΣwᵢXᵢ/Σwᵢ) = Y* − β̂X* ..................4.37

where Y* is the weighted mean of Y and X* is the weighted mean of X, which are different from the ordinary means.

First order condition with respect to β̂: ∂(Σwᵢûᵢ²)/∂β̂ = −2Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)(Xᵢ) = 0

Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)(Xᵢ) = 0
Σwᵢ(YᵢXᵢ − α̂Xᵢ − β̂Xᵢ²) = 0
ΣwᵢYᵢXᵢ = α̂ΣwᵢXᵢ + β̂ΣwᵢXᵢ² ..................4.38

Substituting α̂ = Y* − β̂X* in the above equation, we get

ΣwᵢYᵢXᵢ = (Y* − β̂X*)ΣwᵢXᵢ + β̂ΣwᵢXᵢ²
ΣwᵢYᵢXᵢ − Y*ΣwᵢXᵢ = β̂(ΣwᵢXᵢ² − X*ΣwᵢXᵢ)
ΣwᵢYᵢXᵢ − Y*X*Σwᵢ = β̂(ΣwᵢXᵢ² − X*²Σwᵢ)

β̂ = (ΣwᵢYᵢXᵢ − Y*X*Σwᵢ)/(ΣwᵢXᵢ² − X*²Σwᵢ) = Σwᵢxᵢ*yᵢ*/Σwᵢxᵢ*² ..................4.39

where xᵢ* = Xᵢ − X* and yᵢ* = Yᵢ − Y* are weighted deviations. These parameters are now BLUE.

Thus, in GLS we minimize a weighted sum of squared residuals with wᵢ = 1/σᵢ² acting as the weights, whereas in OLS we minimize an unweighted (equally weighted) RSS. The weight assigned to each observation is inversely proportional to its σᵢ; that is, observations coming from a population with larger σᵢ get relatively smaller weight and those from a population with smaller σᵢ get proportionately larger weight in minimizing the RSS (4.36). Since (4.36) minimizes a weighted RSS, the method is known as weighted least squares (WLS), and the estimators thus obtained, given in (4.37) and (4.39), are known as WLS estimators. WLS is just a special case of the more general estimating technique, GLS. Note that if wᵢ = w, a constant for all i, the WLS estimator β̂* is identical with the OLS estimator β̂.
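A minimal WLS sketch in Python (numpy) may help fix ideas. It assumes the error variances sigma2 are known, which is the Case A situation just discussed; the function name is an illustrative assumption.

```python
import numpy as np

def wls(y, x, sigma2):
    """Weighted least squares for y = a + b*x + u with known var(u_i).
    Implements equations (4.37) and (4.39): weighted means, then the
    slope from weighted deviations."""
    w = 1.0 / sigma2                       # weights w_i = 1/sigma_i^2
    y_star = np.sum(w * y) / np.sum(w)     # weighted mean of Y
    x_star = np.sum(w * x) / np.sum(w)     # weighted mean of X
    dev_x, dev_y = x - x_star, y - y_star  # weighted deviations
    b_hat = np.sum(w * dev_x * dev_y) / np.sum(w * dev_x**2)
    a_hat = y_star - b_hat * x_star        # intercept from (4.37)
    return a_hat, b_hat
```

Equivalently, one can divide y, x and the constant column by σᵢ and run ordinary OLS on the transformed variables; both routes give the same estimates.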

B. Assume σᵢ² not known: E(uᵢ)² = σᵢ² = kᵢ f(Xᵢ) (different functional forms)

It is quite rare for the population variance to be known. If the population variance is not known, we proceed by assuming a functional form for it. The variance may take different functional forms; let us assume it is some function of X, i.e. E(uᵢ)² = σᵢ² = kᵢ f(Xᵢ). The transformed version of the model is then obtained by dividing the original model throughout by √f(Xᵢ).

Case I. Suppose the heteroscedasticity is of the form E(uᵢ)² = σᵢ² = k²Xᵢ². This equation implies that the variance of uᵢ increases proportionally with Xᵢ². Solving for the constant factor of proportionality k² from the equation above, we get k² = σᵤ²/Xᵢ². So if Y = α + βXᵢ + Uᵢ with var(uᵢ) = σᵢ² = k²Xᵢ², this suggests that the appropriate transformation of the original model consists of dividing the original model by the transforming variable √Xᵢ² = Xᵢ. The transformed model is:

Y/Xᵢ = α/Xᵢ + βXᵢ/Xᵢ + Uᵢ/Xᵢ ..................4.40
     = α/Xᵢ + β + Uᵢ/Xᵢ

Let Y* = Yᵢ/Xᵢ and U* = uᵢ/Xᵢ. Then

Y* = α(1/Xᵢ) + β + U* ..................4.41

Ŷ/Xᵢ = α̂(1/Xᵢ) + β̂ ..................4.42

Note that Uᵢ* in the transformed model above is homoscedastic.

Proof: The variance of Uᵢ* can be given as E(uᵢ/Xᵢ)² = (1/Xᵢ²)E(uᵢ²) = k²Xᵢ²/Xᵢ² = k², a constant, which proves that the new random term in the model has a finite constant variance (k²). We can therefore apply OLS to the transformed version of the model Y/Xᵢ = α(1/Xᵢ) + β + Uᵢ/Xᵢ. By contrast, if OLS is applied directly to the original heteroscedastic model, β̂ = β + Σxᵢuᵢ/Σxᵢ², and its variance becomes var(β̂) = Σxᵢ²E(uᵢ)²/(Σxᵢ²)² = k²Σxᵢ²Xᵢ²/(Σxᵢ²)², which no longer equals the usual OLS expression σᵤ²/Σxᵢ² and is not minimal.

Note that in this transformation the position of the coefficients has changed: the parameter of the variable 1/Xᵢ in the transformed model is the constant intercept of the original model, while the constant term of the transformed model is the parameter of the explanatory variable X in the original model. Therefore, to get back to the original model, we shall have to multiply the estimated regression through by Xᵢ.
Case II. Suppose the heteroscedasticity is of the form E(uᵢ²) = σᵢ² = k²Xᵢ. Then the transformation of the original model consists of dividing the original relationship Y = β₀ + β₁Xᵢ + Uᵢ by the transforming variable √Xᵢ. The transformed model is:

Y/√Xᵢ = β₀/√Xᵢ + β₁Xᵢ/√Xᵢ + Uᵢ/√Xᵢ ..................4.43
      = β₀(1/√Xᵢ) + β₁√Xᵢ + Uᵢ/√Xᵢ

Note that the transformed random term Uᵢ/√Xᵢ in equation 4.43 is homoscedastic with constant variance equal to k². The variance of the stochastic term in the transformed model can be given as

E(Uᵢ/√Xᵢ)² = (1/Xᵢ)E(Uᵢ)² = (1/Xᵢ)k²Xᵢ = k² ..................4.44

Since k² is a constant, we can conclude that with the above transformation we have solved the problem of heteroscedasticity, and hence we can estimate the transformed model with OLS. There is no intercept term in the transformed model; therefore, one will have to use the 'regression through the origin' model to estimate β₀ and β₁. In this case, to get back to the original model, we shall have to multiply the estimated regression through by √Xᵢ.
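A minimal sketch of this Case II transformation in Python (numpy; the arrays y and x are hypothetical data with x > 0, and the function name is our own) makes the mechanics explicit:

```python
import numpy as np

def transform_case2(y, x):
    """Transform y = b0 + b1*x + u with var(u) = k^2 * x by dividing
    through by sqrt(x), as in equation (4.43). OLS through the origin
    on the transformed variables recovers b0 and b1."""
    s = np.sqrt(x)
    y_t = y / s                                     # transformed response
    Z = np.column_stack([1.0 / s, s])               # columns carry b0 and b1
    beta, *_ = np.linalg.lstsq(Z, y_t, rcond=None)  # no intercept added
    b0_hat, b1_hat = beta
    return b0_hat, b1_hat
```

Note that no constant column is added: the regression is through the origin, and the coefficient on 1/√x is the original intercept β₀, exactly as described above.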

Case III. Suppose the heteroscedasticity is of the form E(uᵢ²) = σᵢ² = k²(E(Yᵢ))² = k²(α + βXᵢ)². Then the appropriate transformation is to divide the original model by the transforming variable

E(Yᵢ) = α + βXᵢ ..................4.45

Y/(α + βXᵢ) = α/(α + βXᵢ) + βXᵢ/(α + βXᵢ) + Uᵢ/(α + βXᵢ) ..................4.46

Proof: We can show that the stochastic term of the transformed model given in the equation above has constant variance:

E(Uᵢ/(α + βXᵢ))² = (1/(α + βXᵢ)²)E(uᵢ)² = k²(E(Yᵢ))²/(E(Yᵢ))² = k² ..................4.47

Hence, with the above transformation we have solved the problem of heteroscedasticity, and we can estimate the transformed model with OLS. The transformation described in 4.46 above is, however, not operational in this form, because the values of α and β are not known. But since we can obtain Ŷ = α̂ + β̂Xᵢ, the transformation can be made through the following two steps.

1st: Run the usual OLS regression disregarding the heteroscedasticity problem in the data and obtain Ŷ. Then, using the estimated Ŷ, transform the model as follows:

Y/Ŷ = α(1/Ŷ) + β(Xᵢ/Ŷ) + Uᵢ/Ŷ ..................4.48

2nd: Apply OLS to the transformed model (4.48) to obtain the final estimates.
Case IV: A log transformation such as

ln Yᵢ = β₀ + β₁ ln Xᵢ + Uᵢ ..................4.49

very often reduces heteroscedasticity when compared with the regression Yᵢ = β₀ + β₁Xᵢ + Uᵢ. This result arises because the log transformation compresses the scales in which the variables are measured. For example, a log transformation reduces a ten-fold difference between two values (such as between 8 and 80) into roughly a two-fold difference (because ln 80 ≈ 4.38 and ln 8 ≈ 2.08).

Concluding Remark
The remedial measures explained above through transformation point out that we are essentially speculating about the nature of σᵢ². Note also that the OLS estimators obtained from the correctly transformed equation are BLUE. The transformation chosen will depend on the nature of the problem and the severity of the heteroscedasticity. Moreover, in a multiple regression model we may not know a priori which of the X variables should be chosen for transforming the data. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Besides, the use of t-tests, F-tests, etc. is valid only in large samples when the regression is conducted on transformed variables.
Practical Exercise 4.8
What are the remedial measures for heteroskedasticity?
Briefly present how GLS works and the logic behind the method.
4.2 Autocorrelation
4.2.1 The nature of Autocorrelation
In our discussion of simple and multiple regression models, one of the assumptions of the classical linear regression model is that cov(uᵢ, uⱼ) = E(uᵢuⱼ) = 0, which implies that successive values of the disturbance term U are temporally independent. That is, a disturbance occurring at one point of observation is not related to any other disturbance; when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period. Sometimes this assumption is not satisfied, and there is autocorrelation of the random variables. Autocorrelation is defined as a 'correlation' between members of a series of observations ordered in time (as in time series data) or space (as in cross-sectional data, i.e. spatial autocorrelation). That is, the value of U in any particular period is correlated with its own preceding value(s).
There is a difference between 'correlation' and autocorrelation. Autocorrelation is a special case of correlation which refers to the relationship between successive values of the same variable, while correlation may also refer to the relationship between two or more different variables. Autocorrelation is also sometimes called serial correlation, though some economists distinguish between the two terms. According to G. Tintner, autocorrelation is the lag correlation of a given series with itself, lagged by a number of time units, whereas serial correlation is the "lag correlation between two different series." Thus, correlation between two series such as u₁, u₂, …, u₁₀ and u₂, u₃, …, u₁₁, where the latter is the former series lagged by one time period, is autocorrelation; whereas correlation between time series such as u₁, u₂, …, u₁₀ and v₂, v₃, …, v₁₁, where U and V are two different time series, is called serial correlation. Although the distinction between the two terms may be useful, we shall treat them synonymously in our subsequent discussion.
Just as heteroskedasticity is generally associated with cross-sectional data, autocorrelation is usually associated with time series data (i.e. data ordered in temporal sequence). We take a look at this in order to answer the following questions: What is the nature of autocorrelation? What are its consequences (theoretical and applied)? How does one detect autocorrelation? How does one remedy the problem of autocorrelation?

4.2.2. Graphical representation of Autocorrelation

Since autocorrelation is correlation between members of a series of observations ordered in time, we can examine the behaviour of the random variable graphically by plotting time horizontally and the random variable (Uᵢ) vertically. Consider the following figures.

Figure 4.5: Error terms and their patterns over time

Figures (a)-(d) above show systematic patterns among the U's, indicating autocorrelation: (a) shows a cyclical pattern, (b) suggests an upward and (c) a downward linear trend, and (d) indicates a quadratic trend in the disturbance terms. Figure (e) shows no systematic pattern, supporting the assumption of no autocorrelation.

On the other hand, one can present the successive relationship between error terms as indicated in figure 4.6 (f)-(h). Autocorrelation is shown graphically by plotting successive values of the random disturbance term: uₜ vertically and uₜ₋₁ horizontally.

Figure 4.6: Autocorrelation among error terms

The assumption of non-autocorrelation is most frequently violated in relations estimated from time series data. Figures (f) and (g) indicate positive and negative autocorrelation respectively, while (h) indicates no autocorrelation. In general, if the disturbance terms follow a systematic pattern as in (f) and (g), there is autocorrelation or serial correlation; but if there is no systematic pattern, as shown in figure (h), there is no autocorrelation. There are many conditions under which the errors are autocorrelated (AC). In such cases we have

cov(uₜ, uₜ₋ₛ) = E(uₜuₜ₋ₛ) ≠ 0 for s ≠ 0

For example, in a study of the relationship between output and inputs of a firm or industry from monthly observations, one may think that there is no autocorrelation of the disturbances, which would imply that the effect of a machine breakdown is strictly temporary in the sense that only the current month's output is affected. But in practice, the effect of a machine breakdown in one month may affect the current month's output as well as the output of subsequent months.
Similarly, in a study of the relationship between quantity demanded and price using monthly observations, the effect of a price change in a certain month will affect the consumption behaviour of households (firms) in subsequent months (that is, the effect will be felt for months to come). Thus, the assumption of non-autocorrelation does not seem plausible here.
Practical Exercise 4.9
Give three practical examples of autocorrelation.
Why is autocorrelation common in time series data?

4.2.3 Reasons for Autocorrelation

There are several reasons why serial or autocorrelation arises. Some of these are:

a. Cyclical fluctuations

Time series such as GNP, price indices, production, employment and unemployment, and the business cycle exhibit cycles. For instance, a business cycle starts at the bottom of a recession; when economic recovery starts, most of these series move upward. In this upswing, the value of a series at one point in time is greater than its previous value. Thus, there is a momentum built into them, and it continues until something happens (e.g. an increase in interest rates or taxes) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.

b. Interpolations in statistical observations (data manipulation)

In empirical analysis raw data are often "massaged". Most published time series data involve some interpolation and "smoothing" processes, in which averages of the values of the disturbance term over successive time periods are taken. Consequently, the successive values of U become interrelated and exhibit an autocorrelation pattern. For example, in time series regression involving quarterly data, such data are often derived from monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces "smoothness" into the data by dampening the fluctuations in the monthly data. This smoothness can itself lend a systematic pattern to the disturbances, thereby introducing autocorrelation.
c. The cobweb phenomenon
The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement (the gestation period). Thus, the crop planting decisions of farmers at the beginning of this year are influenced by the price prevailing last year, so that their supply function is

Supplyₜ = β₀ + β₁Pₜ₋₁ + Uₜ

Suppose at the end of period t, the price Pₜ turns out to be lower than Pₜ₋₁. Therefore, in period (t+1) farmers decide to produce less than they did in period t. Obviously, in this situation the disturbances Uₜ are not expected to be random, for if the farmers overproduce in year t, they are likely to underproduce in year (t+1), and so on, leading to a cobweb pattern.
d. Misspecification of the true random term, U
It may well be expected that the successive values of the true U are correlated by the very nature of the data or system. This case of autocorrelation may be called "true autocorrelation" because its root lies in the U term itself. Sometimes, purely random factors (such as wars, droughts, storms, strikes, etc.) impose influences that are spread over more than one period of time. These factors result in serially dependent values of the disturbance term U, which violates the assumption of no autocorrelation.

e. Specification bias

Autocorrelation may arise as a result of specification biases, which arise from the exclusion of variables from the regression model, an incorrect functional form of the model, or neglecting lagged terms in the regression model. Let us see one by one how these specification biases cause autocorrelation.

i. Exclusion of variables: One source is the exclusion from the model of variable(s) whose movement imparts a systematic pattern to the error. It is customary that most economic variables tend to be autocorrelated. If an autocorrelated variable has been excluded from the set of explanatory variables, obviously its influence will be reflected in the random variable U. This case may be called "quasi-autocorrelation", since it is due to the autocorrelated pattern of the omitted explanatory variables and not due to the behavioural pattern of the values of the true U.

For example, suppose the correct demand model is given by:

yₜ = α + β₁x₁ₜ + β₂x₂ₜ + β₃x₃ₜ + Uₜ ..................4.50

where y = quantity of beef demanded, x₁ = price of beef, x₂ = consumer income, x₃ = price of pork and t = time. Now, suppose we run the following regression in lieu of (4.50):

yₜ = α + β₁x₁ₜ + β₂x₂ₜ + Vₜ ..................4.51

Now, if equation 4.50 is the 'correct' model, running equation 4.51 (the incorrect model) is tantamount to letting Vₜ = β₃x₃ₜ + Uₜ. To the extent that the price of pork affects the consumption of beef, the disturbance term V will reflect a systematic pattern, thus creating autocorrelation. A simple test of this would be to run both equation 4.50 and equation 4.51 and see whether the autocorrelation, if any, observed in equation 4.51 disappears when we run equation 4.50. The actual mechanics of detecting autocorrelation will be discussed later.
ii. Incorrect functional form: This is another source of autocorrelation in the error term. If we have adopted a mathematical form that differs from the true form of the relationship, the values of U may show serial correlation. Suppose the 'true' or correct model in a cost-output study is specified as

Marginal costᵢ = β₀ + β₁outputᵢ + β₂outputᵢ² + Uᵢ ..................4.52

However, we incorrectly fit the following model:

Marginal costᵢ = α₁ + α₂outputᵢ + Vᵢ ..................4.53

The non-linear marginal cost curve corresponding to the 'true' model is shown in figure 4.7 below, along with the 'incorrect' linear cost curve.

Figure 4.7: Problem of incorrect functional form

As the figure shows, between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas outside these points it will consistently underestimate it. This result is to be expected because the disturbance term Vᵢ is, in fact, equal to β₂outputᵢ² + uᵢ, and hence will catch the systematic effect of the output² term on marginal cost. In this case, Vᵢ will reflect autocorrelation because of the use of an incorrect functional form.

iii. Neglecting lagged terms from the model: If the dependent variable of a regression model is affected by the lagged value of itself or of an explanatory variable, and this lag is not included in the model, the error term of the incorrect model will reflect a systematic pattern indicating autocorrelation. Suppose the correct model for consumption expenditure is:

Cₜ = α + β₁Yₜ + β₂Yₜ₋₁ + Uₜ ..................4.54

but again for some reason we incorrectly regress:

Cₜ = α + β₁Yₜ + Vₜ ..................4.55

where Vₜ = β₂Yₜ₋₁ + Uₜ. Hence, Vₜ shows systematic change, reflecting autocorrelation.
Practical Exercise 4.10
What are the possible reasons for autocorrelation?

4.2.4. The coefficient of autocorrelation

This discussion of the sources of autocorrelation makes it obvious that the assumption of temporal independence of the values of U can easily be violated in practice. Hence, a logical question follows: what form will the autocorrelation among the values of the error term take?
Autocorrelation is a kind of lag correlation between successive values of the same variable. Thus, we treat autocorrelation in the same way as correlation, and one can develop many relationships along this line. The simplest case, a linear relation with the immediately preceding value, is termed autocorrelation of the first order. That is, if the value of U in any particular period depends on its own value in the preceding period, we say that the U's follow a first-order autoregressive scheme AR(1) (or first-order Markov scheme), i.e.

uₜ = f(uₜ₋₁) ..................4.56

If uₜ depends on the values of the two previous periods, then:

uₜ = f(uₜ₋₁, uₜ₋₂) ..................4.57

This form of autocorrelation is called a second-order autoregressive scheme, and so on.

The complete form of the first-order Markov process (i.e. the pattern of autocorrelation for all the values of U) can be developed as follows:

uₜ = f(uₜ₋₁) = ρuₜ₋₁ + εₜ
uₜ₋₁ = f(uₜ₋₂) = ρuₜ₋₂ + εₜ₋₁
uₜ₋₂ = f(uₜ₋₃) = ρuₜ₋₃ + εₜ₋₂
uₜ₋₃ = f(uₜ₋₄) = ρuₜ₋₄ + εₜ₋₃
⋮
uₜ₋ᵣ = f(uₜ₋ᵣ₋₁) = ρuₜ₋ᵣ₋₁ + εₜ₋ᵣ

Generally, the simple first-order linear form of autocorrelation can be presented formally as uₜ = f(uₜ₋₁):

uₜ = ρuₜ₋₁ + εₜ ..................4.58

where ρ is the coefficient of autocorrelation and ε is a random variable satisfying all the basic assumptions of ordinary least squares:

E(ε) = 0, E(ε²) = σε², and E(εᵢεⱼ) = 0 for i ≠ j

Note that equation 4.58 represents a linear relationship between uₜ and uₜ₋₁, where ρ is the slope coefficient and the constant intercept is equal to zero. In other words, the equation is a simple linear regression model with a suppressed constant term.
Practical Exercise 4.11

Why do you think ρ is used as the coefficient of uₜ₋₁ in equation 4.58 above?

ut 1 and ut
We know that the autocorrelation coefficient, ρ, between is given as:

u u t t 1
 u ,u 
t t 1
......................................................................4.59
u u 2
t
2
t 1

Equation 4.59 is a special form of correlation coefficients between any two variables (X and Y), which
are defined as

x y i i
rx , y 
x y 2
i
2
i

Note that for large sample size (i.e. for large t ),

u 2
t   u 2t 1........................................................................................4.60

Then, equation 4.59 can be rewritten as:

u u t t 1 u u t t 1
 u ,u 
t t 1
 .............................................................................4.61
 u t 1
2 2
 
  u t 1 
2

 
Thus, in order to define the error term in any particular period t for the first order Markov process, we
u   ut 1   t
start from the autocorrelation function at period t which is given equation .4. 58 as t One
.
can develop the formula for the AR1
280
u₁ = ρu₀ + ε₁ ..................4.62

u₂ = ρu₁ + ε₂ = ρ(ρu₀ + ε₁) + ε₂ = ρ²u₀ + ρε₁ + ε₂

u₃ = ρu₂ + ε₃ = ρ(ρ²u₀ + ρε₁ + ε₂) + ε₃ = ρ³u₀ + ρ²ε₁ + ρε₂ + ε₃ ..................4.63

⋮

uₜ = ρuₜ₋₁ + εₜ = ρᵗu₀ + ρᵗ⁻¹ε₁ + ρᵗ⁻²ε₂ + … + ρεₜ₋₁ + εₜ

uₜ = ρᵗu₀ + Σⱼ₌₀ᵗ⁻¹ ρʲεₜ₋ⱼ ..................4.64

For this process to be stable, ρ must be less than one in absolute value, that is, |ρ| < 1. Then, as the power of ρ increases to infinity, the term with the lagged u (i.e. ρᵗu₀) tends to zero. In other words, as t → ∞, ρᵗ → 0 and hence ρᵗu₀ → 0. Thus uₜ can be expressed as:

uₜ = Σⱼ₌₀^∞ ρʲεₜ₋ⱼ = εₜ + ρεₜ₋₁ + ρ²εₜ₋₂ + ρ³εₜ₋₃ + … ..................4.65

Thus, equation 4.65 gives the value of the error term when it is autocorrelated with a first-order autoregressive scheme. Note that for |ρ| < 1, ρʲ keeps on decreasing as j keeps on increasing. This means that the effect of the recent past is significant and keeps on diminishing as we go further back (called fading memory). Finally, we will establish the mean, variance and covariance of uₜ when it is autocorrelated with a first-order autoregressive scheme.

i. The mean of uₜ is

E(uₜ) = E(Σⱼ₌₀^∞ ρʲεₜ₋ⱼ) = Σⱼ₌₀^∞ ρʲE(εₜ₋ⱼ) = 0, since by assumption E(εₜ) = 0 ..................4.66

It is evident from equation 4.66 that the mean of an autocorrelated stochastic term uₜ is zero if the autocorrelation is a simple first-order autoregressive scheme.

(Recall the sum of an infinite geometric series: if |a| < 1, then Σⱼ₌₀^∞ aʲ = 1/(1 − a).)
ii. The variance of uₜ is

σᵤ² = var(uₜ) = E[uₜ − E(uₜ)]² = E(uₜ)², since E(uₜ) = 0 ..................4.67

Substituting the value of uₜ from equation 4.65, we get

var(uₜ) = E(Σⱼ₌₀^∞ ρʲεₜ₋ⱼ)²
        = E(Σⱼ ρ²ʲε²ₜ₋ⱼ + Σᵢ≠ⱼ ρʲρⁱεₜ₋ⱼεₜ₋ᵢ)
        = Σⱼ ρ²ʲE(ε²ₜ₋ⱼ) + Σᵢ≠ⱼ ρʲρⁱE(εₜ₋ⱼεₜ₋ᵢ)

where E(ε²ₜ₋ⱼ) = σε² and E(εₜ₋ⱼεₜ₋ᵢ) = 0 for i ≠ j, so all the cross-product terms vanish. Equivalently, writing the expansion out,

var(uₜ) = E(εₜ + ρεₜ₋₁ + ρ²εₜ₋₂ + …)²
        = E(ε²ₜ + ρ²ε²ₜ₋₁ + ρ⁴ε²ₜ₋₂ + … + 2(cross products)), with E(cross products) = 0
        = σε² Σⱼ₌₀^∞ ρ²ʲ = σε²(1 + ρ² + ρ⁴ + ρ⁶ + …) ..................4.68

The expression Σρ²ʲ is the sum of a geometric progression of infinitely many terms whose first term is 1 and whose common ratio is ρ². Generally, the summation rule of a geometric progression states that the sum of n terms of a geometric progression with first term "a" and common ratio "λ" is given by the formula

S = a(1 − λⁿ)/(1 − λ) ..................4.69

For an infinite series (i.e. n → ∞) with |λ| < 1, S reduces to a/(1 − λ). Therefore, using this rule, equation 4.68 becomes

E(uₜ²) = σε² (1/(1 − ρ²)), for |ρ| < 1

or

σᵤ² = var(uₜ) = σε²/(1 − ρ²) ..................4.70
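These moment results can be checked by simulation. The following minimal Python sketch (numpy; the values ρ = 0.7, σε = 1 and the sample size are arbitrary assumptions) generates a long AR(1) series and compares its sample variance with σε²/(1 − ρ²) from equation 4.70:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma_eps, n = 0.7, 1.0, 100_000      # assumed parameter values

eps = rng.normal(0.0, sigma_eps, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]         # u_t = rho*u_{t-1} + eps_t

print("sample mean:", u.mean())            # close to 0, as in (4.66)
print("sample var :", u.var())             # close to the theoretical value
print("theory var :", sigma_eps**2 / (1 - rho**2))   # equation (4.70)
```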

iii. The covariance between uₜ and uₜ₋₁ is derived as follows. Given that

uₜ = εₜ + ρεₜ₋₁ + ρ²εₜ₋₂ + …  and  uₜ₋₁ = εₜ₋₁ + ρεₜ₋₂ + ρ²εₜ₋₃ + … ..................4.71

the covariance function is

cov(uₜ, uₜ₋₁) = E{[uₜ − E(uₜ)][uₜ₋₁ − E(uₜ₋₁)]} = E(uₜuₜ₋₁)
             = E[(εₜ + ρεₜ₋₁ + ρ²εₜ₋₂ + …)(εₜ₋₁ + ρεₜ₋₂ + ρ²εₜ₋₃ + …)]

Multiplying out, every cross product of the form E(εₜ₋ⱼεₜ₋ᵢ) with i ≠ j has zero expectation, and only the squared terms survive:

cov(uₜ, uₜ₋₁) = ρE(ε²ₜ₋₁) + ρ³E(ε²ₜ₋₂) + ρ⁵E(ε²ₜ₋₃) + …
             = ρσε² + ρ³σε² + ρ⁵σε² + …
             = ρσε²(1 + ρ² + ρ⁴ + …)
             = ρσε²/(1 − ρ²)
             = ρσᵤ² ..................4.73

In general, cov(uₜ, uₜ₋ₛ) = E(uₜuₜ₋ₛ) = ρˢσᵤ². Thus, the relationship between the disturbances depends on the value of the parameter ρ. From the above result we have

cov(uₜ, uₜ₋₁) = E(uₜuₜ₋₁) = ρσᵤ²

ρ = cov(uₜ, uₜ₋₁)/σᵤ² = cov(uₜ, uₜ₋₁)/√(var(uₜ) · var(uₜ₋₁)), since var(uₜ) = var(uₜ₋₁) = σᵤ² ..................4.74

One can write the above formula in terms of sample moments as follows:

ρ(uₜ, uₜ₋₁) = Σₜ₌₂ᵀ uₜuₜ₋₁ / √(Σₜ₌₂ᵀ uₜ² · Σₜ₌₂ᵀ uₜ₋₁²) ..................4.75

Thus, ρ is nothing but the coefficient of correlation between uₜ and uₜ₋₁, and it is estimated from the residuals of the OLS regression.

Similarly, it can be shown that:

cov(uₜ, uₜ₋₂) = E(uₜuₜ₋₂) = ρ²σᵤ²
cov(uₜ, uₜ₋₃) = E(uₜuₜ₋₃) = ρ³σᵤ²
⋮
cov(uₜ, uₜ₋ᵣ) = E(uₜuₜ₋ᵣ) = ρʳσᵤ² (for r < t)

From this pattern we can conclude that as the lag, say r, increases (i.e., as the lag gets farther from the current period), the (spill-over) effect of uₜ₋ᵣ on uₜ declines. This is due to the fact that |ρ| < 1, and hence ρʳ declines and approaches zero as r approaches infinity.
The above relationship states the simplest possible form of autocorrelation; if we apply OLS on the
model given in ( 4.61) we obtain:
n

u u t t 1
ˆ  t 2
n
.........................................................................4.76
u
t 2
2
t 1

284
, we observe that coefficient of autocorrelation  represents a
u t2  u t21
Note that: for large samples:
simple correlation coefficient r.
n n n

 ut ut 1  ut ut 1 u u t t 1
ˆ  t 2
n
 t 2
 t 2
 rut ut 1 ...........................................4.77
2
ut2 ut21
u
t 2
2
t 1
 n

  u t 1 
2

 t 2  (Why?)

 1  ˆ  1 since  1  r  1

This proves the statement that says “we can treat autocorrelation in the same way as correlation in gen-
eral”. From our chapter one discussion if the value of r is 1 we call it perfect positive correlation, if r is
-1 there is perfect negative correlation and if the value of r is 0 ,there is no correlation. By the same

analogy if the value of ̂ is 1 there is perfect positive autocorrelation, if ̂ is -1 it is called perfect neg-
u  u t  1  v t
ative autocorrelation and if ̂ =0 in t
u
i.e. t is not correlated.
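A minimal sketch of this estimator in Python (numpy; u is assumed to be the vector of OLS residuals in time order, and the function name is our own):

```python
import numpy as np

def rho_hat(u):
    """Estimate the first-order autocorrelation coefficient of the
    residual series u, as in equation (4.76)."""
    u_t, u_lag = u[1:], u[:-1]             # u_t and u_{t-1}
    return np.sum(u_t * u_lag) / np.sum(u_lag**2)
```

Applied to the simulated AR(1) series shown earlier, this returns a value close to the true ρ.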

Practical Exercise 4.12

Why do autocorrelation and correlation coefficients range between −1 and +1?

4.2.5. Effect of autocorrelation on OLS Estimators

We have seen that the ordinary least squares technique is based on assumptions about the mean, variance and covariance of the disturbance term. If these assumptions do not hold true, the estimators derived by the OLS procedure may not be efficient. Autocorrelation has certain effects on the OLS estimators. Some of these effects are:

1. OLS estimates are unbiased. We know that

β̂ = β + Σkᵢuᵢ

We proved that E(β̂) = β + ΣkᵢE(uᵢ), and E(uᵢ) = 0. Therefore,

E(β̂) = β ..................4.78

Note that this result uses only E(uᵢ) = 0, so unbiasedness is preserved even under autocorrelation.

2. The variance of OLS estimates is inefficient: the conventional variance of the estimate β̂ in the simple regression model will be biased downwards (i.e. underestimated) when the u's are autocorrelated. It can be shown as follows. From the unbiasedness condition above we know that β̂ − β = Σkᵢuᵢ, so

var(β̂) = E(β̂ − β)² = E(Σkᵢuᵢ)²
       = E(k₁u₁ + k₂u₂ + … + kₙuₙ)²
       = E(k₁²u₁² + k₂²u₂² + … + kₙ²uₙ² + 2k₁k₂u₁u₂ + … + 2kₙ₋₁kₙuₙ₋₁uₙ)
       = E(Σkᵢ²uᵢ² + 2Σᵢ<ⱼ kᵢkⱼuᵢuⱼ)
       = Σkᵢ²E(uᵢ²) + 2Σᵢ<ⱼ kᵢkⱼE(uᵢuⱼ)

If there is no autocorrelation, E(uᵢuⱼ) = 0 and the last term disappears, so that var(β̂) = σᵤ²/Σx². However, we proved above that under AR(1) E(uₜuₜ₋ₛ) is not 0 but equal to ρˢσᵤ². Hence

var(β̂) = σᵤ²/Σxᵢ² + 2σᵤ² Σᵢ<ⱼ ρʲ⁻ⁱ xᵢxⱼ/(Σxᵢ²)² ..................4.79

Writing this out in the presence of AR(1) autocorrelation,

var(β̂)AR(1) = E(β̂ − β)² = E(Σxₜuₜ/Σxₜ²)²
            = (1/(Σxₜ²)²) E(Σxₜ²uₜ² + 2Σₛ<ₜ xₜxₛuₜuₛ)
            = (1/(Σxₜ²)²) [Σxₜ²E(uₜ²) + 2Σₛ<ₜ xₜxₛE(uₜuₛ)], with E(uₜuₛ) = ρᵗ⁻ˢσᵤ²
            = (σᵤ²/Σxₜ²) [1 + 2ρ Σxₜxₜ₊₁/Σxₜ² + 2ρ² Σxₜxₜ₊₂/Σxₜ² + …]

so that

var(β̂)auto = var(β̂)nonauto + 2σᵤ² Σᵢ<ⱼ ρʲ⁻ⁱ xᵢxⱼ/(Σxᵢ²)² ..................4.80
If ρ is positive (ρ > 0) and Xₜ is positively correlated with Xₜ₊₁, Xₜ₊₂, …, the second term on the right-hand side will usually be positive. The implication is that the formula var(β̂) = σ²/Σx² is wrongly used when the data are autocorrelated: var(β̂)auto > var(β̂)nonauto, so var(β̂) is underestimated if the data are autocorrelated. The true variance is

σᵤ²/Σxᵢ² + 2σᵤ² Σᵢ<ⱼ ρʲ⁻ⁱ xᵢxⱼ/(Σxᵢ²)², not σ²/Σx² ..................4.81

To see how the underestimation of var(uᵢ) occurs, recall that if there is no autocorrelation

E(Σe²) = (n − 2)σ², or σ̂² = Σe²/(n − 2)

If we assume a first-order autoregressive scheme, it can be shown that

E(Σe²) = σ² [(n − 2) − 2(ρ Σᵢ₌₁ⁿ⁻¹ xᵢxᵢ₊₁/Σxᵢ² + ρ² Σᵢ₌₁ⁿ⁻² xᵢxᵢ₊₂/Σxᵢ² + … + ρⁿ⁻¹ x₁xₙ/Σxᵢ²)] ..................4.82

If both u and X are positively autocorrelated, then the OLS formula will most likely underestimate σᵤ², because it ignores the expression in parentheses, which is almost certainly positive. In the real world, the successive values of most explanatory variables (the Xᵢ's) are positively correlated, and this would tend to produce an underestimation (sometimes substantial) of the variance of u and of the variances of the estimates of the βᵢ's when OLS is applied.

In case the explanatory variable X of the model is random, the covariance of successive values is zero (Σxᵢxⱼ = 0); under such circumstances the bias in var(β̂) will not be serious even though u is autocorrelated.

3. Wrong testing procedures, which lead to wrong prediction and inference about the characteristics of the population. If the values of u are correlated, predictions based on ordinary least squares estimates will be inefficient, in the sense that they will have larger variance than predictions based on estimates obtained from other econometric methods. Moreover, if var(β̂) is underestimated, SE(β̂) is also underestimated, which makes the t-ratio large. This large t-ratio may make β̂ appear statistically significant when it is not. Thus, if the errors are autocorrelated and yet we keep on using OLS, the variances of the regression coefficients will be underestimated, leading to narrow confidence intervals, high values of R² and inflated t-ratios.
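A small Monte Carlo experiment makes this understatement concrete. The Python sketch below (numpy; the choices ρ = 0.8, a trending regressor and 2000 replications are arbitrary assumptions) compares the average conventional OLS standard error of β̂ with the actual spread of β̂ across replications:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.8, 50, 2000
x = np.linspace(1.0, 10.0, n)              # positively autocorrelated regressor
X = np.column_stack([np.ones(n), x])

betas, conv_se = [], []
for _ in range(reps):
    eps = rng.normal(size=n)
    u = np.zeros(n)
    for t in range(1, n):                  # AR(1) errors
        u[t] = rho * u[t - 1] + eps[t]
    y = 1.0 + 2.0 * x + u
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (n - 2)                   # conventional sigma^2 estimate
    se_b = np.sqrt(s2 / np.sum((x - x.mean())**2))
    betas.append(b[1]); conv_se.append(se_b)

print("avg conventional SE:", np.mean(conv_se))   # too small
print("actual sd of beta  :", np.std(betas))      # larger spread
```

With positively autocorrelated errors and a positively autocorrelated regressor, the reported conventional standard error comes out well below the actual standard deviation of the slope estimates, which is exactly the underestimation described above.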

Practical Exercise 4.13

What are the possible consequences of autocorrelation?

4.2.6. Detection (Testing) of Autocorrelation

There are two methods that are commonly used to detect the existence of autocorrelation. These are:

1. Graphic method

Recall from section 4.2.2 that autocorrelation can be detected using graphs as a rough indicator. Detection of autocorrelation using graphs can be done in two ways: by plotting the regression residuals against their own lagged values, or against time. Given data on economic variables, autocorrelation can be detected graphically by the following two procedures.

a. Plotting the regression residuals against their own lagged values. To this end, first run an OLS regression on the given data and obtain the error terms. To see whether there is autocorrelation or not, plot eₜ₋₁ horizontally and eₜ vertically for successive values of the errors, i.e. plot the observations (e₁, e₂), (e₂, e₃), (e₃, e₄), …, (eₙ₋₁, eₙ). If most of the points fall in quadrants I and III, as shown in fig (a) below, we say that the given data are autocorrelated and the type of autocorrelation is positive. If most of the points fall in quadrants II and IV, as shown in fig (b) below, the autocorrelation is said to be negative. But if the points are scattered equally in all the quadrants, as shown in fig (c) below, then we say there is no autocorrelation in the given data.

Figure 4.8: Graphic patterns of autocorrelation

b. Plotting the regression residuals against time: If the values of the regression residuals in successive periods show a regular pattern, we conclude that the error term is autocorrelated. Specifically, if the successive values of the regression residuals change sign frequently, there will be negative autocorrelation. On the contrary, if the regression residuals do not change sign frequently, so that several positive values are followed by several negative values, this can be a signal of positive autocorrelation. If they show no pattern, then it can be concluded that there is no autocorrelation (see section 4.2.2).

Example: Regress Ethiopian expenditure on imported goods (Y) on personal disposable income (X), 1968-1987 (in ten million birr), and find the residuals for the regression.

Table 4.8: Consumption and income over time
Year Y X
1968 135.7 1551.3
1969 144.6 1599.8
1970 150.9 1668.1
1971 166.2 1728.4
1972 190.7 1797.4
1973 218.2 1916.3
1974 211.8 1896.9
1975 187.9 1931.7
1976 229.4 2000.1
1977 259.4 2066.6
1978 274.1 2167.4
1979 277.9 2212.6
1980 253.6 2214.3
1981 258.7 2248.6
1982 249.5 2261.5
1983 282.2 2331.9
1984 351.1 2469.8
1985 367.9 2542.8
1986 412.3 2640.9
1987 439 2686.3

Solution:
Table 4.9: Regression result of consumption on income

Variable    Estimated coefficient   Standard error   t-ratio    P-value
X           0.24523                 0.014759         16.616     0.000
Constant    -261.09                 31.327           -8.3345    0.000

Table 4.10: Residuals from the above regression

Residual    Uₜ         Uₜ₋₁        D = Uₜ − Uₜ₋₁    D²          Uₜ²         Sign of Uₜ
U1 16.3642 - - - 267.787 +
U2 13.3705 16.3642 -2.9937 8.96224 178.787 +
U3 2.9212 13.3705 -10.4493 109.1879 8.533409 +
U4 3.4338 2.9212 0.5126 0.262759 11.79098 +
U5 11.0128 3.4338 7.579 57.44124 121.2818 +
U6 9.3548 11.0128 -1.658 2.74896 87.51228 +
U7 7.7123 9.3548 -1.6425 2.697806 59.47957 +
U8 -24.7218 7.7123 -32.4341 1051.971 611.1674 -
U9 0.2837 -24.7218 25.0055 625.275 0.080486 +
U10 13.6966 0.2837 13.4129 179.9059 187.5969 +
U11 3.6772 13.6966 -10.0194 100.3884 13.5218 +
U12 -3.6072 3.6772 -7.2844 53.06248 13.01189 -
U13 -28.3241 -3.6072 -24.7169 610.9251 802.2546 -
U14 -31.6355 -28.3241 -3.3114 10.96537 1000.805 -
U15 -43.999 -31.6355 -12.3635 152.8561 1935.912 -
U16 -28.5633 -43.999 15.4357 238.2608 815.8621 -
U17 6.5193 -28.5633 35.0826 1230.789 42.50127 +
U18 5.4174 6.5193 -1.1019 1.214184 29.34822 +
U19 25.7603 5.4174 20.3429 413.8336 663.5931 +
U20 41.3268 25.7603 15.5665 242.3159 1707.904 +
5093.063 8558.714

Incidentally, such a plot is called a time-series sequence plot. An examination of Table 4.10 shows that the residuals ûᵢ do not seem to be randomly distributed. As a matter of fact, they exhibit a distinct behaviour: initially they are generally positive, then they become negative, and after a given turn they become positive again, indicating positive autocorrelation.
Practical Exercise 4.14
For the above data, the regression result has been found and the residuals have been computed. Draw a figure of the residuals against their lagged values as per this method and test for autocorrelation.

C. Test based on the partial autocorrelation function (PACF) of OLS residuals

Plot the PACF of the OLS residuals. If the function at lag one lies outside the 95% upper or lower confidence limits, this is an indication that the errors follow an AR(1) process. Higher-order error processes can be detected similarly. An illustrative example is presented using the data in table 4.11.

Table 4.11: Investment and value of outstanding shares, 1935-1953

Year Investment(Y) Valsh(X)


1935 317.6 3078.5
1936 391.8 4661.7
1937 410.6 5387.1
1938 257.7 2792.2
1939 330.8 4313.2
1940 461.2 4643.9
1941 512 4551.2
1942 448 3244.1
1943 499.6 4053.7
1944 547.5 4379.3
1945 561.2 4840.9
1946 688.1 4900.9
1947 568.9 3526.5
1948 529.2 3254.7
1949 555.1 3700.2
1950 642.9 3755.6
1951 755.9 4833
1952 891.2 4924.9
1953 1304.4 6241.7

Table 4.12: Estimated regression equation of Y on X

               Unstandardized coefficients        t         Sig.
               B            Std. error
Constant       -186.160     216.292               -0.861    0.401
X (Valsh)      0.175        0.050                 3.527     0.003

R² = 0.422, F = 12.437 (p-value = 0.003), DW-stat = 0.553

The F-statistic is significant at the 1% level. This indicates that the model is adequate. The low Durbin-Watson statistic (0.553), however, points to positive autocorrelation in the residuals.

2. Formal testing methods

Generally, the graphical method gives only a rough idea about the existence and pattern of autocorrelation, which makes the use of other methods necessary. These methods are called formal because the test is based on a formal testing procedure: the z-test, t-test, F-test or χ² test. If a test applies any of the above, it is called a formal testing method. Different econometricians and statisticians suggest different testing techniques. In this teaching material, the testing methods most frequently and widely used by researchers are the run test, the Durbin-Watson test, and the Breusch-Godfrey (BG) test. Let us see them in some detail.

A. Run test: Before going into the details of this method, let us define what a run is in this context. A run is an uninterrupted sequence of identical signs of the error terms, arranged according to the values of the explanatory variable, like "++++++++-------------++++++++------------++++++".

By examining how runs behave, one can derive a test of the randomness of runs. If there are too many runs, that is, if the û's change sign frequently, it is an indication of negative serial correlation. Similarly, if there are too few runs, this may suggest positive autocorrelation. Now let n = total number of observations = n₁ + n₂, where n₁ = number of + symbols, n₂ = number of − symbols, and k = number of runs, i.e. the number of uninterrupted sequences of positive and negative signs in our data set. The decision is made as per the Swed and Eisenhart special table of critical values of the runs expected in a strictly random sequence of n observations, attached in the appendices.

Using the above example of the consumption expenditure regression given in table 4.10, the last column gives the signs of the various residuals. We can write these signs in a slightly different form as follows: (+++++++) (−) (+++) (−−−−−) (++++).

Here there are 5 runs (k = 5): positives, followed by a negative, then positives, and so on. The number of consecutive positive or negative signs shows the length of a run; for instance, 7 plus signs show a run of length 7, and one minus sign a run of length 1, and so on. For k = 5, n₁ = 14 and n₂ = 6, the critical value of the runs at the 5% level is 5. If the actual number of runs is equal to or less than the critical value, we reject the null hypothesis of randomness. Since k = 5 here, we reject H₀ of random eᵢ, implying that the residuals are autocorrelated.

Note that the Swed-Eisenhart table covers at most 40 observations (20 pluses and 20 minuses). If the actual sample size is greater, we cannot use this table; instead we resort to the normal distribution.

Under the null hypothesis that successive outcomes (here, residuals) are independent, and assuming that n₁ > 10 and n₂ > 10, the number of runs is distributed (asymptotically) normally with:

Mean: E(k) = 2n₁n₂/(n₁ + n₂) + 1 ..................4.83

Variance: σₖ² = 2n₁n₂(2n₁n₂ − n₁ − n₂) / [(n₁ + n₂)²(n₁ + n₂ − 1)] ..................4.84

Decision rule: do not reject the null hypothesis of randomness or independence with 95% confidence if E(k) − 1.96σₖ ≤ k ≤ E(k) + 1.96σₖ; reject the null hypothesis if the estimated k lies outside these limits.

In a hypothetical example with n₁ = 14, n₂ = 18 and k = 5 we obtain:

E(k) = 16.75, σₖ² = 7.49395, σₖ = 2.7375

Hence the 95% confidence interval is:

16.75 ± 1.96(2.7375) = [11.3845, 22.1155]

Since k = 5 clearly falls outside this interval, we can reject the hypothesis that the observed sequence of residuals is random (independent) with 95% confidence.
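A minimal sketch of the large-sample run test in Python (numpy; resid is assumed to be the residual vector, and the function name is our own):

```python
import numpy as np

def run_test(resid):
    """Large-sample run test on the signs of the residuals.
    Returns the number of runs k, E(k), sigma_k and the z statistic;
    |z| > 1.96 rejects randomness at the 5% level."""
    signs = np.sign(resid)
    signs = signs[signs != 0]                 # drop exact zeros
    n1 = np.sum(signs > 0)
    n2 = np.sum(signs < 0)
    k = 1 + np.sum(signs[1:] != signs[:-1])   # sign changes + 1
    n = n1 + n2
    mean_k = 2 * n1 * n2 / n + 1                                  # (4.83)
    var_k = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n**2 * (n - 1))  # (4.84)
    z = (k - mean_k) / np.sqrt(var_k)
    return k, mean_k, np.sqrt(var_k), z
```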

Practical Exercise 4.15
For n₁ = 12, n₂ = 16 and k = 5, test for autocorrelation using the run test.

B. The Durbin-Watson or d test: The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson, or simply d, statistic, which is defined as:

d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ² ..................4.85

In the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences.

It is important to note the assumptions underlying the d statistic:

1. The regression model includes an intercept term. If such a term is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.

2. The explanatory variables, the X's, are non-stochastic, or fixed in repeated sampling.

3. The disturbances Uₜ are generated by the first-order autoregressive scheme uₜ = ρuₜ₋₁ + εₜ.

4. The regression model does not include lagged value(s) of the dependent variable Y as explanatory variables. Thus, the test is inapplicable to models of the following type:

Yₜ = β₁ + β₂X₂ₜ + β₃X₃ₜ + … + βₖXₖₜ + γYₜ₋₁ + Uₜ

where Yₜ₋₁ is the one-period lagged value of Y. Such models are known as autoregressive models. If the d test is mistakenly applied, the value of d in such cases will often be around 2, which is the value of d in the absence of first-order autocorrelation. Durbin developed the so-called h-statistic to test serial correlation in such autoregressive models.

5. There are no missing observations in the data.

From equation 4.85, squaring out the numerator, we obtain

d = (Σeₜ² + Σeₜ₋₁² − 2Σeₜeₜ₋₁) / Σeₜ² ..................4.86

However, for large samples Σₜ₌₂ⁿ eₜ² ≈ Σₜ₌₂ⁿ eₜ₋₁² ≈ Σeₜ², because in each case only one observation is lost. Thus

d ≈ (2Σeₜ² − 2Σeₜeₜ₋₁) / Σeₜ² = 2(1 − Σeₜeₜ₋₁/Σeₜ²)

but Σeₜeₜ₋₁/Σeₜ² = ρ̂ from equation 4.76, so that

d ≈ 2(1 − ρ̂) ..................4.87

Since −1 ≤ ρ̂ ≤ 1, this implies that 0 ≤ d ≤ 4.

From the above relation, therefore

if ρ̂ = 0, then d = 2
if ρ̂ = 1, then d = 0
if ρ̂ = −1, then d = 4

Thus we obtain the following important conclusions:

i. The value of d lies between 0 and 4.

ii. If there is no autocorrelation (ρ̂ = 0), then d ≈ 2.

iii. If ρ̂ = 1, d = 0: perfect positive autocorrelation. Therefore, if 0 < d* < 2 there is some degree of positive autocorrelation, which is stronger the closer d* is to zero.

iv. If ρ̂ = −1, d = 4: perfect negative autocorrelation. Therefore, if 2 < d* < 4 there is some degree of negative autocorrelation, which is stronger the closer d* is to 4.

Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the null hypothesis; if it is close to zero or four, we reject the null hypothesis of no autocorrelation. It should be clear that in the Durbin-Watson test the null hypothesis of zero autocorrelation, ρ = 0, is tested indirectly, through the equivalent hypothesis d = 2. After formulating the hypotheses, we use the sample residuals to compute the empirical value of the Durbin-Watson d statistic, and finally we compare the empirical d with the theoretical values of d that define the critical region of the test. However, because the exact sampling distribution of d is not known, there exist ranges of values within which we can either accept or reject the null hypothesis. We have a lower bound d_L and an upper bound d_U, which are appropriate for testing the hypothesis of zero first-order autocorrelation against the alternative hypothesis of positive first-order autocorrelation. Durbin and Watson have tabulated these upper and lower values at the 5% and 1% levels of significance. The tables assume that the u's are normal, homoscedastic and not autocorrelated, and that the X's are truly exogenous.
The test compares the empirical value of d with d_L (lower boundary) and d_U (upper boundary) in the Durbin-Watson tables and with their transformed values, (4 − d_L) and (4 − d_U). The comparison using d_L and d_U investigates the possibility of positive autocorrelation, while the comparison with (4 − d_L) and (4 − d_U) investigates the possibility of negative autocorrelation.

For the two-tailed Durbin Watson test, we have five regions for the values of d. The mechanisms of the
D.W test are as follows, assuming that the assumptions underlying the tests are fulfilled.

 Run the OLS regression and obtain the residuals

 Obtain the computed value of d using the formula given in equation 4.87

 For the given sample size and given number of explanatory variables, find the critical d_L and d_U values.

 Now follow the decision rules given below.

1. If d is less than d_L or greater than (4 − d_L), we reject the null hypothesis of no autocorrelation in favor of the alternative, which implies the existence of autocorrelation.

2. If d lies between d_U and (4 − d_U), accept the null hypothesis of no autocorrelation.

3. If, however, the value of d lies between d_L and d_U, or between (4 − d_U) and (4 − d_L), the D.W test is inconclusive.

Example 1. Suppose for a hypothetical model Y = α + βX + U_i we found d = 0.1380, d_L = 1.37 and d_U = 1.50. Based on these values, test for autocorrelation.

Solution: First compute (4 − d_L) and (4 − d_U), and compare the computed value of d with d_L, d_U, (4 − d_L) and (4 − d_U):

(4 − d_L) = 4 − 1.37 = 2.63 and (4 − d_U) = 4 − 1.5 = 2.50

Since d is less than d_L, we reject the null hypothesis of no autocorrelation.

Example 2. Consider the model Y_t = α + βX_t + U_t with the following observations on X and Y.
Table 4.13: Hypothetical values of X and Y

X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11

Test for autocorrelation using the Durbin-Watson method.

Solution:

1. Regress Y on X using the data above, i.e. Y_t = α + βX_t + U_t, which gives:

Σxy = 255, Ȳ = 7, Σ(e_t − e_{t−1})² = 60.213, Σx² = 280, X̄ = 8, Σe_t² = 41.767, Σy² = 274

Then:

β̂ = Σxy/Σx² = 255/280 = 0.91, α̂ = Ȳ − β̂X̄ = 7 − 0.91(8) = −0.29

Ŷ = −0.29 + 0.91X, R² = 0.85

d = Σ(e_t − e_{t−1})²/Σe_t² = 60.213/41.767 = 1.442

2. The values of d_L and d_U at the 5% level of significance, with n = 15 and one explanatory variable, are d_L = 1.08 and d_U = 1.36, so that (4 − d_U) = 2.64. Since d* = 1.442 lies between d_U and (4 − d_U), i.e. 1.36 < 1.442 < 2.64, we accept H0. This implies there is no evidence of autocorrelation in the data.
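The hand computation above can be checked with software. The sketch below, assuming Python's statsmodels package is available, fits the regression on the data of Table 4.13 and computes d:

```python
# Illustrative check of Example 2: OLS of Y on X and the Durbin-Watson d.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

X = np.arange(1, 16)                              # X = 1, 2, ..., 15 (Table 4.13)
Y = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11])

res = sm.OLS(Y, sm.add_constant(X)).fit()         # Y = alpha + beta*X + u
print(res.params)                                 # approx. (-0.29, 0.91)
print(durbin_watson(res.resid))                   # approx. 1.44; compare with d_L, d_U
```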

Although the D.W test is extremely popular, it has one great drawback: when the computed d falls in the inconclusive zone or region, one cannot conclude whether autocorrelation does or does not exist. Several authors have proposed modifications of the D.W test.

In many situations, however, it has been found that the upper limit d_U is approximately the true significance limit. The modified D.W test is therefore based on d_U: in case the estimated d value lies in the inconclusive zone, one can use the following modified d test procedure. Given the level of significance α:

 H0: ρ = 0 versus H1: ρ > 0: if the estimated d < d_U, reject H0 at level α; that is, there is statistically significant positive autocorrelation.

 H0: ρ = 0 versus H1: ρ ≠ 0: if the estimated d < d_U or (4 − d) < d_U, reject H0 at level 2α; statistically, there is significant evidence of autocorrelation, positive or negative.

Practical Exercise 4. 15

Suppose for a hypothetical model Y = α + βX + U_i we found d = 1.1380, d_L = 1.37 and d_U = 1.40. Based on these values, test for autocorrelation.

C. Breusch-Godfrey (BG) Test


As an alternative to the DW test one can use the Breusch-Godfrey (BG) test. This test has three advantages over the DW test: it is always conclusive; it remains valid when lagged values of the dependent variable appear as regressors; and it is valid for higher-order AR schemes, not just for the AR(1) error scheme. Assume that the error term follows the autoregressive scheme of order p (AR(p)) given by:

ε_t = ρ_1ε_{t−1} + ρ_2ε_{t−2} + ...... + ρ_pε_{t−p} + u_t ........................................................4.88

where u_t fulfils all the assumptions of the CLRM. The null hypothesis to be tested is:
H0: ρ_1 = ρ_2 = ..... = ρ_p = 0.

Steps:

1. Estimate the model Y_t = α + βX_t + ε_t and obtain the residuals ε̂_t.

2. Find the auxiliary regression by regressing ε̂_t on X_t and on ε̂_{t−1}, ε̂_{t−2}, ......, ε̂_{t−p}:

ε̂_t = α + βX_t + ρ̂_1ε̂_{t−1} + ρ̂_2ε̂_{t−2} + ...... + ρ̂_pε̂_{t−p} + v_t ……………………………4.89

3. Obtain the coefficient of determination R² from the auxiliary regression.

4. If the sample size n is large, Breusch and Godfrey have shown that (n − p)R² follows the chi-square (χ²) distribution with p degrees of freedom.

5. The decision criterion is to reject the null hypothesis of no autocorrelation if (n − p)R² exceeds the tabulated value from the χ² distribution with p degrees of freedom at the given level of significance α; accept otherwise.
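The steps above can be carried out with a built-in routine. Here is a sketch using Python's statsmodels on simulated data (the AR(1) error process and the choice p = 2 are assumptions for illustration):

```python
# Breusch-Godfrey test on a model with simulated AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(0)
n, rho = 200, 0.6
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # generate e_t = rho*e_{t-1} + u_t
    e[t] = rho * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
lm, lm_pval, f, f_pval = acorr_breusch_godfrey(res, nlags=2)   # AR(p) with p = 2
print(lm, lm_pval)                    # a small p-value rejects H0 of no autocorrelation
```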

4.2.7. Remedial Measures for the problems of Autocorrelation

Assume after applying one or more of the diagnostic tests of autocorrelation discussed in the previous
section, we find that there is autocorrelation. Since in the presence of serial correlation the OLS estima -
tors are inefficient, it is essential to seek for remedial measures. The remedy however depends on what

knowledge one has about the nature of interdependence among the disturbances. Different options are
recommended as remedial measures. First, check for misspecification, in the sense that the model may have excluded some important variable or used an incorrect functional form. Second, check whether the autocorrelation is pure or not. If it is pure autocorrelation, one can use an appropriate transformation of the original model so that the transformed model does not have the problem of pure autocorrelation. As in the case of heteroskedasticity, we will have to use some form of generalized least squares (GLS) method, depending on whether the coefficient of autocorrelation is known or not.

A. When ρ is known: When the structure of autocorrelation is known, i.e. ρ is known, the appropriate corrective procedure is to transform the original model or data so that the error term of the transformed model is not autocorrelated. The transformation wipes out the effect of ρ. Suppose that our model is:

Y_t = α + βX_t + U_t and U_t = ρU_{t−1} + V_t, |ρ| < 1 ................................4.90

Equation (4.90) indicates the existence of autocorrelation. If ρ is known, we can transform equation (4.90) into one whose error is not autocorrelated. The procedure of transformation is given below. Take the lagged form of equation (4.90) and multiply through by ρ:

ρY_{t−1} = ρα + ρβX_{t−1} + ρU_{t−1} ......................................................................4.91

Subtracting (4.91) from (4.90), we have:

Y_t − ρY_{t−1} = (α − ρα) + (βX_t − ρβX_{t−1}) + (U_t − ρU_{t−1}) ................................4.92

By rearranging the terms in (4.92), and noting that V_t = U_t − ρU_{t−1}, we obtain:

Y_t − ρY_{t−1} = (α − ρα) + β(X_t − ρX_{t−1}) + v_t ...............................................4.93

Let: Y_t* = Y_t − ρY_{t−1}

a = α − ρα = α(1 − ρ)

X_t* = X_t − ρX_{t−1}

Equation (4.93) may then be written as:

Y_t* = a + βX_t* + v_t ...................................................................................4.94

It may be noted that in transforming equation (4.90) into (4.94) one observation is lost because of the lagging and subtracting in (4.92). We can apply OLS to the transformed relation in (4.94) to obtain â and β̂, from which estimates of the two parameters α and β follow.

α̂ = â/(1 − ρ), and it can be shown that var(α̂) = [1/(1 − ρ)]² var(â) ....................................4.95

because α̂ is perfectly and linearly related to â. Again, since v_t satisfies all the standard assumptions, the variances of α̂ and β̂ are given by our standard OLS formulae:

 u2 X t2 *  u2
var(ˆ )  n
, var( ˆ )  n
................................4.96
n ( X  X )
*
t
2
 ( X *
t X ) * 2
t
ti

The above transformation is known as the Cochrane-Orcutt transformation. Since the error v_t of the transformed model fulfils all the assumptions of the CLRM, we can apply OLS to equation (4.94) to get estimators that are BLUE. The basic problem is that the transformation requires knowledge of the value of ρ; thus we need to estimate it.

B. When ρ is not known

When ρ is not known, we describe the methods through which the coefficient of autocorrelation can be estimated. There are two approaches: using a priori information on ρ, and estimating ρ from the d-statistic or the residuals.


i. Method I: A priori information on ρ

Many times an investigator makes a reasonable guess about the value of the autoregressive coefficient by using his knowledge or intuition about the relationship under study. Researchers often assume ρ = 1 or −1. Under this method, the process of transformation is the same as when ρ is known. When ρ = 1, the transformed model becomes:

(Y_t − Y_{t−1}) = β(X_t − X_{t−1}) + V_t, where V_t = U_t − U_{t−1} .................................4.97

The constant term is suppressed in the above. β̂ is obtained by merely taking the first differences of the variables and fitting a line that passes through the origin. Suppose instead that one assumes ρ = −1, the case of perfect negative autocorrelation. In such a case, the transformed model becomes:

Yt  Yt 1 ( X  X t 1 ) vt
Yt  Yt 1  2   ( X t  X t 1 )  vt    t  .....................4.98
Or 2 2 2

This model is then called two period moving average regression model because actually we are regress -
 Yt  Yt 1   ( X t  X t 1 ) 
   
ing the value of one moving average  2  on another  2  . This method of first differ-
ence in quite popular in applied research for its simplicity. But the method rests on the assumption that
either there is perfect positive or perfect negative autocorrelation in the data.

Method II: Estimating ρ from the d-statistic and the residuals:

a) Using the Durbin-Watson statistic: From equation (4.87) we have d ≈ 2(1 − ρ̂). Given the calculated d-value for a given data set, we can therefore estimate ρ̂ as ρ̂ = 1 − d/2. It can be shown that as T (the sample size) gets larger, the DW statistic d approaches 2(1 − ρ). This estimator is, however, highly inaccurate if the sample size is small.
b) From OLS residuals: regress the OLS residuals e_t on e_{t−1} without a constant term: e_t = ρ̂e_{t−1} + u_t. An estimate of ρ is the estimated coefficient of e_{t−1}.

Yt on Yt 1 , X t and X t 1 :
c) Durbin’s method: Run the regression of
Yt    Yt 1   X t   X t 1  ut
……………………………………..4.99

An estimator of  is the estimated coefficient of Y t-1. The above relationship is true only for large sam-
ples. For small samples, Theil and Nagar have suggested the following relation:

n 2 (1  d 2 )  k 2
ˆ  ...........................................................................4.100
n2  k 2

where n=total number of observation; d= Durbin Watson statistic ; k=number of coefficients (including
intercept term). Using this value of ̂ we can perform the above transformation to avoid autocorrelation
from the model.
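The residual-based estimates of ρ and the generalized difference transformation of equation 4.94 can be sketched as follows in Python (the helper name and data handling are our own assumptions; statsmodels is assumed available):

```python
# Estimating rho from the residuals and applying the transformation of eq. 4.94.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

def transform_with_estimated_rho(y, x):
    e = sm.OLS(y, sm.add_constant(x)).fit().resid
    rho_from_d = 1 - durbin_watson(e) / 2              # rho-hat = 1 - d/2
    rho_from_e = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])  # e_t on e_{t-1}, no constant
    rho = rho_from_e
    y_star = y[1:] - rho * y[:-1]                      # Y*_t = Y_t - rho*Y_{t-1}
    x_star = x[1:] - rho * x[:-1]                      # X*_t = X_t - rho*X_{t-1}
    res = sm.OLS(y_star, sm.add_constant(x_star)).fit()
    a_hat, beta_hat = res.params
    alpha_hat = a_hat / (1 - rho)                      # recover alpha from a = alpha(1-rho)
    return rho_from_d, rho_from_e, alpha_hat, beta_hat
```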

Practical Exercise 4. 16

What is the logical justification for using a value of ρ based on a priori information? And for using estimated values ρ̂?

4.3 Multicollinearity
4.3.1 Definition and nature of Multicollinearity
So far we have implicitly assumed that there is no linear relationship among the explanatory variables. But in practice variables are often correlated in some way; many social research studies use a large number of predictors, and problems arise when the various predictors are highly and linearly related. Multicollinearity is the existence of a "perfect", or exact, linear relationship among some or all explanatory variables of a regression model. For a regression involving the explanatory variables X_1, X_2, ..........., X_k, an exact linear relationship is said to exist if the following condition is satisfied:

λ_1X_1 + λ_2X_2 + .......... + λ_kX_k = 0 .......................................................4.101

where λ_1, λ_2, ..... λ_k are constants such that not all of them are simultaneously zero. Perfect multicollinearity occurs when one (or more) of the regressors in a model (e.g., X_k) is an exact linear function of the other(s) (X_i, i = 1, 2, …, k − 1).

But, in practice neither of the above cases is often met. There is some degree of interrelation among the
explanatory variables due to the interdependence of many economic variables over time. Simple correl-
ation coefficient for each pair of explanatory variables will have a value between zero and unity. As this
value approaches unity, multicollinearity gets stronger and it impairs the accuracy and stability of the
parameter estimates. Finally, multicollinearity is not a condition that either exists or does not exist in
economic functions, but rather it is a phenomenon inherent in most relationships due to the nature of
economic magnitudes.
The nature of multicollinearity can be illustrated using the figures below, which represent the variation in Y (the dependent variable) and in X_1 and X_2 (the explanatory variables). The degree of collinearity can be measured by the extent of overlap (shaded area) between X_1 and X_2. In fig. (a) below there is no overlap between X_1 and X_2 and hence no collinearity. In figs. (b) through (e), there is a "low" to "high" degree of collinearity. In the extreme, if X_1 and X_2 were to overlap completely (or if X_1 were completely inside X_2, or vice versa), collinearity would be perfect.

Figure 4.9: Extent of multicollinearity among explanatory variables


Multicollinearity refers only to linear relationships among the X variables. It does not rule out non-linear relationships among them. For example:

Y_i = β_0 + β_1x_i + β_2x_i² + β_3x_i³ + v_i ........................................................4.102

Where: Y-Total cost and X-output.


The variables x_i² and x_i³ are obviously functionally related to x_i, but the relationship is non-linear. Strictly, therefore, models such as (4.102) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficients will show x_i, x_i² and x_i³ to be highly correlated, which, as we shall show, makes it difficult to estimate the parameters with precision (i.e. with small standard errors).
For instance, suppose the PRF is Y_i = β_0 + β_1X_{1i} + β_2X_{2i} + u_i with X_{2i} = 2X_{1i}; then there is perfect (exact) multicollinearity between X_1 and X_2. The OLS technique yields three normal equations:

ΣY_i = nβ̂_0 + β̂_1ΣX_{1i} + β̂_2ΣX_{2i}

ΣY_iX_{1i} = β̂_0ΣX_{1i} + β̂_1ΣX²_{1i} + β̂_2ΣX_{1i}X_{2i}

ΣY_iX_{2i} = β̂_0ΣX_{2i} + β̂_1ΣX_{1i}X_{2i} + β̂_2ΣX²_{2i} ...........................................4.103

But substituting 2X_{1i} for X_{2i} in the third equation yields twice the second equation, i.e. one of the normal equations is in fact redundant. Thus we have only two independent equations but three unknowns (β's) to estimate. As a result, the normal equations reduce to:

ΣY_i = nβ̂_0 + (β̂_1 + 2β̂_2)ΣX_{1i}

ΣY_iX_{1i} = β̂_0ΣX_{1i} + (β̂_1 + 2β̂_2)ΣX²_{1i} ......................................................4.104

The number of β's to be estimated is greater than the number of independent equations. So, if two or more X's are perfectly correlated, it is not possible to find estimates for all the β's: we cannot find β_1 and β_2 separately, but only the combination β_1 + 2β_2:

β̂_1 + 2β̂_2 = [ΣY_iX_{1i} − nX̄_1Ȳ] / [ΣX²_{1i} − nX̄_1²] ...............................................................4.105

β̂_0 = Ȳ − (β̂_1 + 2β̂_2)X̄_1 .......................................................................4.106
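The breakdown of the normal equations under perfect collinearity is easy to demonstrate numerically. In the Python sketch below (simulated data, our own illustration), X_2 = 2X_1 makes X'X singular:

```python
# With X2 = 2*X1, the matrix X'X is singular and the normal equations
# cannot be solved for all three coefficients.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = 2 * x1                                    # exact linear dependence
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))              # 2, not 3: one normal equation is redundant
# np.linalg.solve(XtX, X.T @ y) would raise numpy.linalg.LinAlgError: Singular matrix
```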

Practical Exercise 4. 17

What is multicollinearity in the strict sense? Explain it using an example.

4.3.2. Reasons for Multicollinearity

One of the reasons for multicollinearity is the data collection method employed (e.g. sampling over a limited range of X values). For example, if a regression is estimated using a small sample of values from the population, there may be multicollinearity, whereas if we took all the possible values it might not appear. A second reason for multicollinearity is a constraint on the model or the population being sampled. For example, in the regression of electricity consumption expenditure on income (X_1) and house size (X_2), there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes. Third, multicollinearity can be due to an overdetermined model: this happens when the model has more explanatory variables than the number of observations. It could occur in medical research, where information may be collected on a large number of variables for a small number of patients.

Fourth, adding many polynomial terms to a model, especially when the range of the X variable is small; including (almost) the same variable twice in the model; or including a variable computed from other variables in the model (e.g. using family income, mother's income and father's income together) can all cause multicollinearity, since in these cases the variables are highly correlated. Fifth, multicollinearity may be due to the inherent nature of the data, especially in time series where the regressors included in the model share a common trend, all increasing or decreasing over time. For example, in the regression of consumption expenditure on income, wealth and population, the regressors may all be growing over time at more or less the same rate, leading to collinearity among these variables. Sixth, using lagged values of some explanatory variables as separate independent factors in the relationship. In addition, improper use of dummy variables can be cited.

Practical Exercise 4. 18

What are the possible reasons for multicollinearity? Substantiate your answer with examples.

4.3.3 Consequences of multicollinearity


The classical linear regression model assumes no multicollinearity among the X's precisely to avoid the effects of such a problem. Some of the consequences of multicollinearity for the OLS estimators are as follows.

1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite. Consider a multiple regression model with two explanatory variables, where the dependent and independent variables are given in deviation form:

y_i = β̂_1x_{1i} + β̂_2x_{2i} + e_i

Recall the formulas of β̂_1 and β̂_2 from our discussion in chapter three:

x1 y( x1 ) 2  x1 yx1 x1


ˆ1  ..................................................4.107
x12i ( x2 ) 2  (x1 x2 ) 2
x2 y( x1 ) 2  x1 yx1 x2
ˆ2  ................................................4.108
x12i ( x2 i ) 2  (x1 x2 ) 2
x2   x1..................................................................................................................4.109
Assume
ˆ ˆ
Where  is non-zero constants. Substitute 4.109 in the above  1 and  2 formula:
x1 y(x1 ) 2  x1 yx1x1
ˆ1 
x12i (x1 ) 2  (x1 x1 ) 2
x1 y 2 x1  x1 yx1
2 2
 0

 (x )   (x1 )
2 2
1
2 2 2 2
0  indeterminate………4.110

Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂_2. Likewise, from our discussion of the multiple regression model, the variance of β̂_1 is given by:

var(β̂_1) = σ²Σx_2² / [Σx_1²Σx_2² − (Σx_1x_2)²]

Substituting x_2 = λx_1 in this variance formula, we get:

var(β̂_1) = σ²λ²Σx_1² / [λ²(Σx_1²)² − λ²(Σx_1²)²] = σ²λ²Σx_1² / 0 → ∞ (infinite) …………………4.111

These are the consequences of perfect multicollinearity.
2. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression coefficients are determinate. Consider the two-explanatory-variable model above in deviation form. If we assume x_2 = λx_1, there is perfect correlation between x_1 and x_2 because the change in x_2 is completely captured by the change in x_1. Suppose instead that x_{2i} = λx_{1i} + v_i, with λ ≠ 0 and v_i a stochastic error term such that Σx_{1i}v_i = 0 (for example, x_{2i} = 3 − 5x_{1i} + v_i); then there is high but not perfect multicollinearity: x_2 is not only determined by x_1 but is also affected by other factors captured in v_i. Substituting x_{2i} = λx_{1i} + v_i into the formula for β̂_1 above:

β̂_1 = [Σx_1y(λ²Σx_1² + Σv_i²) − (λΣx_1y + Σy_iv_i)λΣx_1²] / [Σx_1²(λ²Σx_1² + Σv_i²) − λ²(Σx_1²)²] ≠ 0/0, determinate ……….….4.112

This proves that if we have less than perfect multicollinearity the OLS coefficients are determinate.
3. Although still BLUE, the OLS estimators have large variances and covariances. Recall that:

var(β̂_1) = σ²Σx_2² / [Σx_1²Σx_2² − (Σx_1x_2)²] …………………..…………….……………4.113

Multiplying the numerator and the denominator by 1/Σx_2²:

var(β̂_1) = σ² / [Σx_1² − (Σx_1x_2)²/Σx_2²] = σ² / [Σx_1²(1 − r_12²)] ..................................................4.114

where r_12² is the square of the correlation coefficient between x_1 and x_2.
If x_{2i} = λx_{1i} + v_i, what happens to the variance of β̂_1 as r_12 rises? As r_12 tends to 1, i.e. as collinearity increases, the variance of the estimator increases, and in the limit when r_12² = 1 the variance of β̂_1 becomes infinite.
4. Invalid hypothesis testing: Because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider and the computed t-ratios very small. Inflated standard errors of the β̂_i, along with small t-ratios, lead one or more of the coefficients to appear statistically insignificant when tested individually, leading to the acceptance of the "zero null hypothesis" (rejecting H0: β_j = 0 becomes very rare) and to wide confidence intervals. In cases of high collinearity, it is thus possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t-test.
5. Although the t-ratios of one or more of the coefficients are very small (which makes those coefficients statistically insignificant individually), R², the overall measure of goodness of fit, can be very high (say, in excess of 0.9), indicating little separate variation in the X's but a lot of common variation. In other words, on the basis of the F-test one can convincingly reject the hypothesis that β_1 = β_2 = ...... = β_k = 0. Indeed, this combination is one of the signals of multicollinearity.
6. The implication of the indeterminacy of the regression coefficients in the case of perfect multicollinearity is that it is not possible to observe the separate influences of x_1 and x_2. But such an extreme case is not very frequent in practical applications; most data exhibit less than perfect multicollinearity. If there is inexact but strong multicollinearity, it is difficult to isolate the effect of each of the highly collinear X's on Y, as the collinear regressors explain the same variation in Y. Furthermore, the estimated coefficients change radically depending on the inclusion or exclusion of other predictors, and the regression tends to be very shaky from one sample to another.

Practical Exercise 4. 19

1. Show that cov(β̂_1, β̂_2) = −r_12σ² / [(1 − r_12²)√(Σx_1²Σx_2²)].

2. How will confidence intervals be affected when there is multicollinearity?

4.3.4 Detection of Multicollinearity


Multicollinearity can be detected using one of the following mechanisms;
a. Auxiliary regressions
b. High coefficient of determination ( R2) but few significant t- ratios

c. High correlation coefficients (r_{x_i x_j}'s) among the explanatory variables
d. VIF and TOL
e. Large standard errors and smaller t-ratio of the regression parameters

Note that none of these symptoms by itself is a satisfactory indicator of multicollinearity, because large standard errors may arise for various reasons and not only because of the presence of linear relationships among the explanatory variables, and a high r_{x_i x_j} is a sufficient but not a necessary condition for the existence of multicollinearity: multicollinearity can exist even if the pairwise correlation coefficients are low. The combination of all these criteria, however, should help the detection of multicollinearity.

I. Test Based on Auxiliary Regressions:

Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one may find out which X variable is related to the other explanatory variables. To do this, regress each X_i on the remaining X variables and compute the corresponding R², designated R_i²; each of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between F and R² established for testing the overall significance of a regression:

F_i = [R²_{x_i·x_1,x_2,...,x_k} / (k − 2)] / [(1 − R²_{x_i·x_1,x_2,...,x_k}) / (n − k + 1)] ~ F(k − 2, n − k + 1) ............................4.115

where n is the number of observations and k is the number of parameters including the intercept.
If the computed F exceeds the critical F at the chosen level of significance, the particular X_i is collinear with the other X's; if it does not exceed the critical F, we say it is not collinear with the other X's, in which case we may retain the variable in the model. If F_i is statistically significant, we still have to decide whether the particular X_i should be dropped from the model. According to Klein's rule of thumb, multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R², that is, the one obtained from the regression of Y on all the regressors.
II. Test of multicollinearity using Eigen values and condition index:

Using eigenvalues we can derive a number called the condition number k as follows:

k = maximum eigenvalue / minimum eigenvalue ....................................................................4.116

In addition, using these values we can derive the condition index (CI), defined as:

CI = √(maximum eigenvalue / minimum eigenvalue) = √k ..................................................4.117

Decision rule: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if CI (= √k) is between 10 and 30 there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity. Example: if k = 123,864 and CI = 352, this suggests the existence of multicollinearity.
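The following Python sketch computes k and CI from the eigenvalues of X'X; the simulated regressors are our own illustrative assumption:

```python
# Condition number k (eq. 4.116) and condition index CI (eq. 4.117).
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # highly, but not perfectly, collinear
X = np.column_stack([x1, x2])

eig = np.linalg.eigvalsh(X.T @ X)              # eigenvalues of the symmetric matrix X'X
k = eig.max() / eig.min()                      # condition number
ci = np.sqrt(k)                                # condition index
print(k, ci)                                   # CI above 30 suggests severe multicollinearity
```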
III. Test of multicollinearity using Tolerance and variance inflation factor

Some authors suggest using the VIF as an indicator of multicollinearity.


ˆ 2  1  2 1
var( 1 )  2  2 
 2 VIF ; VIF  .....................................4.118
x1  1  Ri  xi 1  Ri2

where R_i² is the R² in the auxiliary regression of X_i on the remaining (k − 2) regressors and VIF is the variance inflation factor. The larger the value of VIF_i, the more "troublesome" or collinear is the variable X_i. How high should the VIF be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds 10 (which happens if R_i² exceeds 0.9), the variable is said to be highly collinear.
Other authors use the measure of tolerance to detect multicollinearity. It is defined as TOL_i = (1 − R_i²) = 1/VIF. Clearly, TOL_i = 1 if X_i is not correlated with the other regressors, whereas it is zero if it is perfectly related to the other regressors.

One can relate the VIF to the variances and covariances of the estimates. As indicated by equation 4.114, var(β̂_1) = σ² / [Σx_1²(1 − r_12²)], where r_12² is the square of the correlation coefficient between x_1 and x_2. Similarly:

cov(β̂_1, β̂_2) = −r_12σ² / [(1 − r_12²)√(Σx_1²Σx_2²)]

As r_12 increases toward one, the covariance of the two estimators increases in absolute value. The speed with which the variances and covariances increase can be seen with the variance-inflating factor (VIF), defined as:

VIF = 1/(1 − r_12²)

The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As r_12² approaches 1, the VIF approaches infinity: as the extent of collinearity increases, the variance of an estimator increases, and in the limit the variance becomes infinite. If there is no multicollinearity between x_1 and x_2, the VIF will be 1. Using this definition, we can express var(β̂_1) and var(β̂_2) in terms of the VIF:

var(β̂_1) = (σ²/Σx_1²)VIF and var(β̂_2) = (σ²/Σx_2²)VIF …………………………………4.119

which shows that the variances of β̂_1 and β̂_2 are directly proportional to the VIF.
Limitation: The VIF (or tolerance) as a measure of collinearity is not free of criticism. As we have seen earlier, var(β̂_i) = (σ²/Σx_i²)(VIF) depends on three factors: σ², Σx_i² and the VIF. A high VIF can be counterbalanced by a low σ² or a high Σx_i². To put it differently, a high VIF is neither necessary nor sufficient for high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors.
Table 4.14: Data on three variables Y, X1, X2

Y     X1    X2
70    80    810
65    100   1009
90    120   1273
95    140   1425
110   160   1633
115   180   1876
120   200   2253
140   220   2201
155   240   2435
150   260   2686

A. Find the multiple linear regression of Y on X1 and X2.

B. Is there collinearity in the model?

Solution

A. The fitted regression is:

Ŷt = 24.34 + 0.87X1 − 0.035X2
se = (6.286) (0.31) (0.03), R² = 0.968, adj. R² = 0.959
t = (3.875) (2.772) (−1.16)
p = (0.006) (0.027) (0.284)
Hence, X1 is significant at the 0.05 level but X2 is not. The correlation coefficient between income (X1) and wealth (X2) in this example is found to be 0.9937, which indicates that multicollinearity is a threat in the model. If we want to test multicollinearity with the VIF value, we regress X1 on X2 and obtain the following model:

X̂1 = 2.43 + 0.095X2
se = (7.03) (0.004), R² = 0.988
t = (0.346) (25.253)

The VIF is then found to be:

VIF = 1/(1 − R_j²) = 1/(1 − 0.988) ≈ 83

This value is obviously greater than 10. Therefore, we can say that there is multicollinearity in the model.
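The steps of this example can be reproduced with a few lines of Python/statsmodels. This is our own illustrative translation of the hand calculation; the printed figures should match those reported above up to rounding:

```python
# Main regression of Y on X1, X2 and the auxiliary regression of X1 on X2
# for the data of Table 4.14; VIF = 1/(1 - R^2) from the auxiliary regression.
import numpy as np
import statsmodels.api as sm

Y  = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150])
X1 = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
X2 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2253, 2201, 2435, 2686])

main = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2]))).fit()
aux = sm.OLS(X1, sm.add_constant(X2)).fit()     # auxiliary regression X1 on X2
vif = 1 / (1 - aux.rsquared)                    # approx. 83 in this example
print(main.params, aux.rsquared, vif)
```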

4.3.5. Remedial measures


It is more difficult to deal with models indicating the existence of multicollinearity than to detect the problem. Different remedial measures have been suggested by econometricians, depending on the severity of the problem, the available data and the importance of the variables found to be multicollinear in the model. Some suggest that a minor degree of multicollinearity can be tolerated, although one should be a bit careful while interpreting the model under such conditions. Others suggest removing the variables that show multicollinearity if they are not important in the model; by doing so, however, the desired characteristics of the model may be affected. The following corrective procedures have been suggested for cases where the problem of multicollinearity is found to be serious.
1. Increase the size of the sample: it is suggested that multicollinearity may be avoided or reduced if the size of the sample is increased, since the variances and covariances of the estimators are inversely related to the sample size. But we should remember that this will help only when the intercorrelation happens to exist in the sample but not in the population of the variables. If the variables are collinear in the population, increasing the sample size will not reduce multicollinearity.
2. Introduce an additional equation in the model: the problem of multicollinearity may be overcome by expressing explicitly the relationship between the multicollinear variables. Such a relation, in the form of an equation, may then be added to the original model. The addition of the new equation transforms our single-equation (original) model into a simultaneous-equation model. The reduced-form method (which is usually applied for estimating simultaneous-equation models) can then be applied to avoid multicollinearity.
3. Use extraneous information: extraneous information is information obtained from any source outside the sample being used for the estimation. Extraneous information may be available from economic theory or from empirical studies already conducted in the field in which we are interested. There are three methods through which extraneous information is utilized in order to deal with the problem of multicollinearity.
a. Method of using prior information: Suppose that the correct specification of the model is Y = α + β_1X_1 + β_2X_2 + U_i, and that X_1 and X_2 are found to be collinear. If it is possible to obtain the exact value of β_1 or β_2 from an extraneous source, we can make use of that information in estimating the influence of the remaining variable of the model in the following way. Suppose β_2* is known a priori; then Y − β_2*X_2 = α + β_1X_1 + U. Applying the OLS method:

β̂_1 = Σx_1(y − β̂_2*x_2)/Σx_1² = [Σx_1y − β̂_2*Σx_1x_2]/Σx_1² ............................................4.120

i.e. β̂_1 is the OLS estimator of the slope of the regression of (Y − β_2*X_2) on X_1.

Thus, the estimation procedure described is equivalent to correcting the dependent variable for the influ-
ence of those explanatory variables with known coefficients (from extraneous source of information)
and regressing this residual on the remaining explanatory variables.
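A minimal sketch of the estimator in eq. 4.120 (the helper name is our own; the data and the known β_2* are assumptions for illustration):

```python
# Prior-information estimator: with beta2* known, regress (Y - beta2*·X2) on X1.
import numpy as np

def beta1_given_beta2(y, x1, x2, beta2_star):
    y_adj = y - beta2_star * x2            # correct Y for the known influence of X2
    x1d = x1 - x1.mean()                   # deviations from the mean
    yd = y_adj - y_adj.mean()
    return (x1d @ yd) / (x1d @ x1d)        # slope of (Y - beta2*·X2) on X1, eq. 4.120
```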
b. Methods of transforming variables or functional form: This method is used when the relationship
between certain parameters is known a priori.

i. Using ΔX instead of X (where the cause may be the X's moving in the same direction over time); similarly, using deviations from the mean instead of the raw X_j tends to reduce collinearity in polynomial regressions.
ii. Using log forms reduces the correlation between variables more than using levels. Suppose that we want to estimate the production function Q = AL^αK^β e^u, where Q is the quantity produced, L the labor input and K the input of capital, and that it is required to estimate α and β. Upon logarithmic transformation, the function becomes:

ln Q = ln A + α ln L + β ln K + u …………………………………4.121

Q* = A* + αL* + βK* + u ………………….……………………..4.122

where the asterisks indicate logs of the variables. Suppose it is observed that K and L move together so closely that it is difficult to separate the effect of changing quantities of labor inputs on output from the effect of variation in the use of capital. Assume further that, on the basis of information from some other source, the industry is known to be characterized by constant returns to scale. This implies α + β = 1, so we can substitute β = (1 − α) in the transformed function and estimate (Q* − K*) = A* + α(L* − K*) + u; β̂ is then recovered as 1 − α̂.

iii. Pooling cross-sectional and time-series data. In the classic demand example, price P_t and income Y_t are often collinear in time-series data. The income coefficient β_2* can first be estimated from cross-section data and then imposed when the price coefficient is estimated from the time series, giving:

D̂_t = α̂ + β̂_1P_t + β_2*Y_t + û_t …………………………………………..4.123

where β̂_1 is derived from the time-series data and β_2* is obtained by using the cross-section data. By this pooling technique, we reduce the problem of multicollinearity between income and price.

iv. Dropping one of the collinear predictors. But this may lead to omitted variable bias.

The methods described above are not sure-fire cures for the problem of multicollinearity. Which rule works in practice will depend on the nature of the data under investigation and on the severity of the multicollinearity problem.

4.4. Non-Normality of error terms


A. Definition: The CLRM merely requires the error terms to be independently and identically distributed (IID). Normality of the errors is required only for valid hypothesis testing, especially for the validity of the t- and F-tests. There is no obligation on the X's to be normally distributed, and normality is not required for the BLUE property of the β̂'s. Hence, non-normality affects hypothesis testing only.
B. Reasons for non-normality of error: The assumption of IID errors is violated if a (simple) random
sampling cannot be assumed. Specifically, the assumption of IID errors fails if:
I. errors are not identically distributed, i.e., if var(εi) varies with observation, and hence hetero-
scedasticity
II. errors are not independently distributed, i.e., if εi's are correlated to each other, and hence
serial correlation
III. errors are both heteroscedastic & auto-correlated
C. Non-normality is common in panel and time-series data.
D. Consequence: For large samples it is not a big problem, because of the law of large numbers and the central limit theorem: with large samples (due to the CLT), hypothesis testing is (asymptotically) valid even if the distribution of the errors deviates from normality. In small samples, if the errors are not normally distributed, the estimated coefficients will not follow the normal distribution, which complicates inference.
E. Detecting mechanism: A formal test of normality is the Shapiro-Wilk test, whose null hypothesis is that the errors are normally distributed (see the sketch after this list).
F. Ways of correcting: If H0 is rejected, transforming the regressand or re-specifying the functional form of the model may help.
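A sketch of the Shapiro-Wilk test applied to OLS residuals, using Python's scipy and statsmodels (the simulated heavy-tailed errors are an assumption for illustration):

```python
# Shapiro-Wilk normality test on OLS residuals.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=100)  # heavy-tailed (non-normal) errors

res = sm.OLS(y, sm.add_constant(x)).fit()
w, p_value = stats.shapiro(res.resid)
print(w, p_value)           # a small p-value rejects H0 that the errors are normal
```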

4.5. Model Specification and Data issues


4.5.1. Overview
So far, we have dealt with some failures of the Gauss-Markov assumptions: heteroskedasticity, autocorrelation, multicollinearity and non-normality. We did all this under the assumption that the regression model used in the analysis is "correctly" specified given the CLRM assumptions. A much more serious problem in practice arises when this assumption is violated, giving rise to model specification bias and measurement errors. Such problems may be due to endogeneity, functional form misspecification, measurement error and data problems. An endogenous explanatory variable exists if u is, for whatever reason, correlated with the explanatory variable Xj. In this section we provide a more detailed discussion of model specification and its problems.

4.5.2. Endogenous regressors assumption and Its problems: E(ɛ i|Xj) ≠ 0


A key assumption maintained in the previous lessons is that the model was correctly specified. The model Y = β0 + β1Xi + ui is correctly specified if the error enters the model with an additively separable effect on Y and is independent of the X's. Thus, the effect of the error term equals zero on average, and E(Y|X) is linear in stable parameters (the β's).
The two versions of the exogeneity assumption:
i. E(ɛi) = 0 and X’s are fixed (non-stochastic), The assumption E(εi) = 0 amounts to: “We do not
systematically over-or-under estimate the PRF,” or the overall impact of all the excluded vari -
ables is random/unpredictable.

ii. E(ɛiXj) = 0 or E(ɛi|Xj) = 0 with stochastic X’s. This assumption cannot be tested as residuals will
always have zero-mean if the model has an intercept. If there is no intercept, some information
can be obtained by plotting the residuals. If E(ɛ i) = μ (a constant but ≠ 0) & X's are fixed, the es-
timators of all β's, except β0, will be OK.
If the assumption E(εi|Xj) = 0 is violated or if u is for whatever reason correlated with the explanatory
variable Xj, then we say that Xj is an endogenous explanatory variable. Then the OLS estimators will be
biased & inconsistent. However, assuming exogenous regressors is unrealistic in many situations.

4.5.3. Causes of misspecification and data issues


Possible sources of endogeneity are: specification errors, stochastic regressors & measurement error;
nonlinearity & instability of parameters; and bidirectional link between the X's and Y.

4.5.3.1. Functional Form Misspecification (Specification Errors)


The decision to include or exclude variables should be guided by economic theory and reasoning. Nevertheless, specification errors are common. Model misspecification may result from: omission of relevant variable(s), use of a wrong functional form, or inclusion of irrelevant variable(s). Consider the familiar example of the cubic total cost function given as follows:
Y_i = β_0 + β_1X_i + β_2X_i² + β_3X_i³ + ε_{1i} ................................................4.124

Where Y = total cost of production and X = output


Suppose that on the basis of the criteria discussed above, we have verified that this model is accepted as
a good model. Now, let us see the possible types of specification errors in turn in light of the model
given under equation 4.124
A. Omission of a relevant variable(s)
Sometimes there may be a problem of an omitted (explanatory) variable in the model, which leads to misspecification. Omitting a key variable can cause correlation between the error term and some of the explanatory variables, which generally leads to bias and inconsistency of the OLS estimators. Now, suppose for some reason a researcher decided to use the model in equation 4.125 instead of model 4.124.

Y_i = α_0 + α_1X_i + α_2X_i² + ε_{2i} .........................................................................4.125

Since we assumed the model in equation 4.124 to be correct, adopting 4.125 would constitute a specification error arising from omitting a relevant variable (X_i³). Consequently, the error term ε_{2i} becomes:

ε_{2i} = β_3X_i³ + ε_{1i} ...........................................................................................4.126
B. Inclusion of an unnecessary variable(s)
Sometimes, instead of the correct model in equation 4.124, one may use a model which includes an unnecessary or irrelevant variable. To see this type of error, suppose that a researcher uses the following model:

Y_i = β_0 + β_1X_i + β_2X_i² + β_3X_i³ + β_4X_i⁴ + ε_{1i} ......................................4.128

Hence, the error term is in fact:

ε_{3i} = ε_{1i} − β_4X_i⁴ ...................................................................4.129

because β_4 = 0 in the true model given in equation 4.124, yet X_i⁴ is unnecessarily included as a key variable in this model.

C. Adopting the wrong functional form (Error in the algebraic form of the relationship)
Apart from omitting functions of independent variables, a model can suffer from a misspecified functional form. A regression model suffers from functional form misspecification (specification bias) when it does not properly account for the relationship between the dependent variable and the observed explanatory variables. These types of errors are usually committed at the stage of representing economic relationships in mathematical form. Sometimes a researcher may wrongly use a linear model to represent non-linear relationships, for instance using a linear functional form when the true relationship is logarithmic (log-log) or semi-logarithmic (lin-log or log-lin). For example, a researcher may use the equation below, given as equation 4.130, to represent the cubic relationship between the cost of production (Y) and output produced (X):
ln Y = β_0 + β_1X + β_2X_i² + β_3X_i³ + ε_{4i} ......................................4.130

But if we use Y instead of ln Y, and/or the power of X is ¼ rather than one, then we will not obtain unbiased or consistent estimators of the partial effects.
There are two further causes of such problems: stochastic regressors and errors of measurement.
A. Stochastic Regressors: incorrect specification of the stochastic error term.
Many economic variables are stochastic, and it is only for ease that we assumed fixed X's. It is not
whether X's are stochastic or fixed that matters, but the nature of correlation between X's & ɛ. In gen-
eral, stochastic regressors may or may not be correlated with the error term. If X & ɛ are independently
distributed, then E(ɛ|X) = 0, OLS retains all its desirable properties even if X's are stochastic.
If Xi are contemporaneously correlated with error term, there is problem of incorrect specification of the
stochastic error term. These errors determine the way the stochastic error term ui enters the regression
model, its estimation and interpretation. Consider the following regression model without the intercept
term:
Y_i = βX_iU_i ................................................................................................4.131

In this model the stochastic error term enters multiplicatively, and it does not satisfy the usual assumptions about the stochastic term. However, suppose that the true model is as given below:

Y_i = βX_i + U_i ..........................................................................................4.132

In equation 4.132 the error term enters additively. Although the variables are the same in the two models, the improper stochastic specification of the error term in equation 4.131 constitutes a specification error. Regarding the correlation between the regressors and the error term: if X and U are correlated only non-contemporaneously (asymptotically), i.e. E(u_i|X_{i±s}) ≠ 0 for s = 1, 2, … but E(u_i|X_i) = 0, OLS retains its large-sample properties: the estimators are biased in small samples but consistent and asymptotically efficient. If X and U are contemporaneously correlated, however, OLS is biased and inconsistent.

B. Errors in measurement (Measurement Error)


These errors occur due to problems in measuring variables. Measurement error is an issue when the variables for which the econometrician can collect data differ from the variables that influence decisions by individuals, families, firms, and so on. Sometimes, in economic applications, we cannot collect data on the variable that truly affects economic behavior and must use a proxy variable instead. When we use an imprecise measure of an economic variable in a regression model, our model contains measurement error. Suppose that a researcher used the following model instead of the model given in equation 4.130:
Y_i* = β_0* + β_1*X_i* + β_2*X_i*² + β_3*X_i*³ + ε_{1i}* ................................................4.133

where Y_i* = Y_i + ε_i and X_i* = X_i + w_i, with ε_i and w_i being the errors of measurement. This model shows that instead of using the true values of the variables, the researcher used their proxies, Y_i* and X_i*, which may contain errors of measurement. The researcher is therefore bound to commit measurement error bias. Measurement errors can arise in the dependent and/or the independent variables.

A good example is the marginal income tax rate facing a family. The marginal rate may be hard to obtain or to summarize as a single number for all income levels. Instead, we might compute the average tax rate based on total income and tax payments; the average tax rate is then only a poor proxy for the marginal income tax rate.

i. Measurement Error in the Dependent Variable

We begin with the case where only the dependent variable is measured with error. Let y* denote the
variable that we would like to explain. For example, y* could be annual family savings. The regression
model satisfies the Gauss-Markov assumptions has the usual form
Y* = β_0 + β_1X_1 + ....... + β_kX_k + u ............................................................4.134

Let Y represent the observable measure of Y*. In the savings case, Y is reported annual savings. Unfortu-
nately, families are not perfect in their reporting of annual family savings; it is easy to leave out cate -
gories or to overestimate the amount contributed to a fund. Generally, we can expect Y and Y* to differ,
at least for some subset of families in the population.

The measurement error (in the population) is defined as the difference between the observed value and the actual value:

e_0 = y − y* ..........................................................................................4.135

For a random draw i from the population, we can write e_{i0} = y_i − y_i*, but the important thing is how the measurement error in the population is related to other factors. To obtain an estimable model, we write y_i* = y_i − e_{i0}, plug this into equation (4.134), and then rearrange:

y = β_0 + β_1X_1 + ....... + β_kX_k + u + e_0 ...................................................................4.136

The error term in equation 4.136 is u + e_0. Because y, X_1, X_2, ..., X_k are observed, we can estimate this model by OLS. Since the original model 4.134 satisfies the Gauss-Markov assumptions, u has zero mean and is uncorrelated with each X_j, and it is natural to assume that the measurement error also has zero mean. The usual assumption is that the measurement error is a random reporting error that is statistically independent of each explanatory variable. In that case, OLS applied to eq. 4.136 yields unbiased and consistent estimators, and the usual OLS inference procedures (t, F, and LM statistics) are valid.

If the zero-mean assumption does not hold, we simply get a biased estimator of the intercept β_0; the slope estimators are unaffected. What matters is our assumption about the relationship between the measurement error e_0 and the explanatory variables X_j. Measurement error in the dependent variable results in a larger error variance than when there is no error; it causes bias in the OLS slope estimators only if it is systematically related to one or more of the explanatory variables.
ii. Measurement Error in an Explanatory Variable

Traditionally, measurement error in an explanatory variable has been considered a much more important problem than measurement error in the dependent variable. We begin with the simple regression model

y = β_0 + β_1x_1* + u ............................................................................4.137

which satisfies the first four Gauss-Markov assumptions. Estimation of equation (4.137) by OLS would produce unbiased and consistent estimators of β_0 and β_1.

The problem is that x_1* is not observed; instead, we have a measure of x_1*, call it x_1. For example, x_1* could be actual income and x_1 reported income, and there are deviations between the two as households usually have a tendency to hide actual income. The measurement error is simply

e_1 = x_1 − x_1* ..........................................................................................4.138

which can be positive, negative, or zero.

We assume that the average measurement error in the population is zero: E(e_1) = 0. This is natural and, in any case, it does not affect the important conclusions that follow. A maintained assumption is that u is uncorrelated with x_1* and x_1.

In conditional expectation terms, we can write this as E(y|x_1*, x_1) = E(y|x_1*), which just says that x_1 does not affect y after x_1* has been controlled for. We used the same assumption in the proxy variable case. If we simply replace x_1* with x_1 and run the regression of y on x_1, the properties of OLS depend crucially on the assumptions we make about the measurement error. There are two polar extreme assumptions in the econometrics literature. The first assumption is that e_1 is uncorrelated with the observed measure, x_1:

Cov(x_1, e_1) = 0 ..................................................................................4.139

From the relationship in (4.138), if assumption (4.139) is true, then e_1 must be correlated with the unobserved variable x_1*. To determine the properties of OLS in this case, we write x_1* = x_1 − e_1 and plug this into the original equation:

y = β_0 + β_1x_1 + (u − β_1e_1) ..................................................................4.140
X 1 , u  1e1
Because we have assumed that u and e1 both have zero mean and are uncorrelated with has
X1 *
zero mean and is uncorrelated with X1. It follows that OLS estimation with X1 in place of produces
319
a consistent estimator of
1 (and also 0 ) . Since u is uncorrelated with e , the variance of the error in
1

(4.140) is
Var (u  B1e1 )   u 2  12 u 2
. Thus, except when
1  0 , measurement error increases the error

j
variance. But this does not affect any of the OLS properties (except that the variances of the will be
X1 *
larger than if we observe directly). The assumption that e1 is uncorrelated with X1 is analogous to
the proxy variable assumption. Since this assumption implies that OLS has all of its nice properties, this
is not usually what econometricians have in mind when they refer to measurement error in an explana-
tory variable.
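This first polar case is easy to verify by simulation. In the Python sketch below (all parameter values are assumptions for illustration), e_1 is generated independently of the observed x_1, so OLS stays consistent while the residual variance rises to σ_u² + β_1²σ_{e1}²:

```python
# Simulation of measurement error that is uncorrelated with the observed x1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, b0, b1 = 10_000, 1.0, 2.0
x1 = rng.normal(size=n)                  # observed measure
e1 = rng.normal(scale=0.5, size=n)       # measurement error, independent of x1
x1_star = x1 - e1                        # unobserved true regressor (x1 = x1* + e1)
u = rng.normal(size=n)
y = b0 + b1 * x1_star + u                # true model in terms of x1*

res = sm.OLS(y, sm.add_constant(x1)).fit()
print(res.params)                        # close to (1.0, 2.0): consistent
print(res.mse_resid)                     # close to 1 + (2.0**2)(0.5**2) = 2.0
```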

iii. Causes of measurement errors

The measurement error problem discussed in the previous section can be viewed as a data problem arising from missing data, nonrandom samples, and outlying observations.
1. Missing Data
Literally, missing data refers to a situation in which we cannot obtain data on the variables of interest. Missing data can arise in a variety of forms. Often, we collect a random sample of people, schools, cities, and so on, and then discover later that information is missing on some key variables for several units in the sample. If data are missing for an observation on either the dependent variable or one of the independent variables, that observation cannot be used in a standard multiple regression analysis. (Recall also, from the previous section, that a composite error term correlated with a mismeasured independent variable violates the Gauss-Markov assumptions.) Ignoring the observations that have missing information reduces the sample size available for the regression; although this makes the estimators less precise, it does not by itself introduce any bias.
2. Nonrandom Samples
Under the Gauss-Markov assumptions, it turns out that the sample can be chosen on the basis of the in-
dependent variables without causing any statistical problems. This is called sample selection based on
the independent variables. Nonrandom-sample data problems, by contrast, can violate the random sampling assumption. Missing data is more problematic when it is the result of a nonrandom sample from the population. Suppose that we are estimating a saving function, where annual saving depends on income, age, family size, and perhaps some other factors. A simple model is:

saving = β_0 + β_1income + β_2age + β_3size + u ..............................................4.141
Suppose that our data set was based on a survey of people over 35 years of age, thereby leaving us with
a nonrandom sample of all adults. We can still get unbiased and consistent estimators of the parameters
in the population model (4.141), using the nonrandom sample. Selection on the basis of the independent
variables is not a serious problem provided there is enough variation in the independent variables in the
subpopulation.

For example, suppose we wish to estimate the relationship between individual wealth and several other factors in the population of all adults:

wealth = β_0 + β_1educ + β_2exper + β_3age + u ..............................................4.142
Suppose that only people with wealth below $250,000 are included in the sample. This is a nonrandom
sample from the population of interest. Using a sample on people with wealth below $250,000 will re-
sult in biased and inconsistent estimators of the parameters in (4.142). Briefly, this occurs because the
population regression is not the same as the expected value conditional on wealth being less than
$250,000.
Stratified sampling is a fairly obvious form of nonrandom sampling. A common method of data collec-
tion is stratified sampling, in which the population is divided into nonoverlapping, exhaustive groups,
or strata. Then, some groups are sampled more frequently than is dictated by their population representa-
tion, and some groups are sampled less frequently. For example, some surveys purposely oversample
minority groups or low-income groups. Whether special methods are needed again hinges on whether
the stratification is exogenous (based on exogenous explanatory variables) or endogenous (based on the
dependent variable).
Endogenous selection is more serious. For example, if we observe wages only for people who work, the sample is selected based on someone's decision to work. Since the decision to work might be related to unobserved factors, selection might be endogenous, and this can result in a sample selection bias in the OLS estimators. In such cases, nonrandom sampling causes the OLS estimators to be biased and inconsistent.
3. Outliers and Influential Observations

In some applications, especially, with small data sets, the OLS estimates are influenced by one or sev-
eral observations. Such observations are called outliers or influential observations. An observation is
an outlier if dropping it from a regression analysis makes the OLS estimates change by a practically

“large” amount. OLS is susceptible to outlying observations because it minimizes the sum of squared
residuals: large residuals (positive or negative) receive a lot of weight in the least squares minimization
problem. If the estimates change by a practically large amount when we slightly modify our sample, it is likely because of outliers, and we should be concerned.

When statisticians and econometricians study the problem of outliers theoretically, sometimes the data
are viewed as being from a random sample of a given population. Sometimes the outliers are assumed to
come from a different population. From a practical perspective, outlying observations can occur for two reasons: mistakes made when entering the data, and sampling from a small population. The easiest case of an outlier is when a
mistake has been made in entering the data. Adding extra zeros to a number or misplacing a decimal
point can throw off the OLS estimates, especially in small sample sizes. It is always a good idea to com -
pute summary statistics, especially minimums and maximums, in order to catch mistakes in data entry.
Unfortunately, incorrect entries are not always obvious.

Outliers can also arise when sampling from a small population if one or several members of the popula -
tion are very different in some relevant aspect from the rest of the population. The decision to keep or
drop such observations in a regression analysis can be a difficult one, and the statistical properties of the
resulting estimators are complicated. Outlying observations can provide important information by in-
creasing the variation in the explanatory variables (which reduces standard errors). But OLS results
should probably be reported with and without outlying observations in cases where one or several data
points substantially change the results.

In some cases, certain observations are suspected at the outset of being fundamentally different from the
rest of the sample. This often happens when we use data at very aggregated levels, such as the city,
county, or state level. Sometimes statistical formulas can be used to detect such influential observations. Another approach to dealing with influential observations is to use an estimation method that is less sensitive to outliers than OLS. For most economic variables, the logarithmic transformation significantly narrows the range of the data and also yields functional forms, such as constant elasticity models, that can explain a broader range of data. One such method, which is becoming more and more popular
among applied econometricians, is called least absolute deviations (LAD). The LAD estimator mini-
mizes the sum of the absolute deviations of the residuals, rather than the sum of squared residuals. It is
known that LAD is designed to estimate the effects of explanatory variables on the conditional median,
rather than the conditional mean, of the dependent variable. Because the median is not affected by large
changes in extreme observations, the parameter estimates obtained by LAD are resilient to outlying observations. Least absolute deviations is a special case of what is often called robust regression. Unfortu-
nately, the way “robust” is used here can be confusing. In the statistics literature, a robust regression es-
timator is relatively insensitive to extreme observations. Effectively, observations with large residuals
are given less weight than in least squares.
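As a quick screen and robustness check, one might proceed along the following lines in Stata (a sketch with hypothetical variable names):

* Catch data-entry outliers via minimums and maximums
summarize y x1 x2, detail
* Compare OLS with least absolute deviations (median regression)
regress y x1 x2
qreg y x1 x2                  // LAD: minimizes the sum of absolute residuals

Reporting both sets of estimates makes it transparent how much the outlying observations drive the OLS results.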

4.5.4. Consequences of misspecification problem

We have seen that the misspecification problem can take the form of omission of relevant variables, inclusion of irrelevant variables, and functional form problems. These problems have consequences for the BLUE properties of OLS. The effect of misspecification differs depending on which variables are included or excluded. If a relevant variable is omitted and it is correlated with one or more of the included
variables (i.e., if the pair wise correlation coefficient between the two is nonzero), then the estimates will
be biased and inconsistent. The usual confidence interval and hypothesis testing procedures are likely to
give misleading conclusions about the statistical significance of the estimated parameters. The forecasts
based on the incorrect model and confidence intervals will be unreliable.

The OLS estimators of the parameters of the “incorrect” model that includes irrelevant variables are all unbiased and consistent, and the usual confidence interval and hypothesis-testing procedures remain valid. However, the estimates of the parameters will generally be inefficient, in that their variances will be larger than those obtained from the true model.

If a relevant variable is excluded as a regressor, so that the error is correlated with the included variables, then the OLS estimators will be biased and inconsistent. The estimates of the parameters will also be generally inefficient, as their variances will generally be larger than those obtained from the true model.
If we have data on all the necessary variables for obtaining a functional relationship that fits the data
well, the problem is minor. But when a key variable is omitted on which we cannot collect data, mis-
specifying the functional form of a model can certainly have serious consequences. The effects of func-
tional form misspecification are the same as those of omitting relevant variables.

4.5.5. Tests of specification errors


In regression analysis, we develop a model that we believe captures the essence of the subject under
study based on theory or prior empirical work. We then subject the model to empirical testing. After we
obtain the results, we begin the postmortem, keeping in mind the criteria of a good model discussed
earlier. It is at this stage that we come to know if the chosen model is adequate. To determine adequacy
of a model, we look at some features of the results, such as the R2 value, the estimated t ratios, the signs
of the estimated coefficients in relation to their prior expectations, the Durbin–Watson statistic, and the
like. If these diagnostics are reasonably good, we claim that the chosen model is a fair representation of
reality. Nonetheless, if the results do not look encouraging, for example if the R2 value is too low; very
few coefficients are statistically significant or have the correct signs; the Durbin–Watson d is too low;
then we begin to worry about model adequacy and look for remedies. Maybe we have omitted an important variable, or have used the wrong functional form, or have not first differenced the time series (to remove serial correlation), and so on. One can test for the above problems through examination of residuals, Ramsey's RESET test, the Durbin–Watson test, etc.

A. Examination of Residuals: Graphic tests

In previous parts we have seen how to use the residuals of a model to examine autocorrelation and heteroscedasticity problems. But these residuals can also be examined, especially in cross-sectional data, for model specification errors, such as omission of an important variable or incorrect functional form. To test for misspecification errors, first estimate the model using OLS and obtain the residuals. Then plot them, and inspect the patterns of the plots or the plot of residuals versus fitted values. We suspect our model of such errors if the plot of the residuals exhibits distinct and noticeable patterns; there is no such problem if the residuals are randomly scattered around zero. This approach is also used to get a quick glance at problems like nonlinearity.
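In Stata, for example, such plots can be produced along the following lines (a sketch; y, x1 and x2 are hypothetical variables):

regress y x1 x2
rvfplot, yline(0)             // residuals versus fitted values
predict uhat, residuals       // store the residuals
scatter uhat x1, yline(0)     // residuals against a single regressor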
B. Different statistical tests

Suppose a researcher develops a k-variable model to explain a certain phenomenon as follows:
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + … + βₖXₖᵢ + uᵢ
However, suppose that the researcher is not sure whether variable Xk really belongs in the model. How can the researcher find this out? One can do this using the t-test, the F test, the Durbin–Watson test, the RESET test, etc.
I. t-test, F test
Given a specific model, we can find out whether one or more regressors are really relevant by the usual t and F tests. The simplest way to find this out is to test the significance of the estimated βk with the usual t
test. But note that the t and F tests should not be used to build a model iteratively. In other words, we
should not say that initially Y is related to X2 only because its coefficient is statistically significant and
then expand the model to include X3 and decide to keep that variable in the model if its coefficient turns
out to be statistically significant, and so on. The decision rule is: do not eliminate variables from a model based solely on insignificance implied by individual t-tests. In particular, do not drop two or more variables at once on the basis of their t-tests, even if each has |t| < 1, because the t-statistic corresponding to one regressor (Xj) may change radically once another (Xi) is dropped. However, if the researcher is not sure about the relevance of more than one variable (say X3 and X4), their joint relevance can easily be ascertained by the F test.
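In Stata, for instance, the joint relevance of X3 and X4 can be checked with the test command after estimation (hypothetical variable names):

regress y x2 x3 x4
test x3 x4                    // joint F test of H0: both coefficients are zero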

II. The Durbin–Watson test

In a previous part we have seen that the Durbin–Watson test can be used to test for the autocorrelation prob-
lem in the regression model. This test can also be used to detect specification errors. To use the Durbin–
Watson test for detecting model specification error(s), we proceed as follows:
• Step 1: From the assumed model, obtain the OLS residuals.
• Step 2: If it is believed that the assumed model is mis-specified because it excludes a relevant explanatory variable, say Xi, order the residuals obtained in step 1 in ascending order of the Xi values.
• Step 3: Compute the d statistic from the residuals ordered in step 2 by the usual d formula that we developed so far, namely

d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ²

• Step 4: From the Durbin–Watson tables, if the estimated d value is significant, then one can accept the hypothesis of model misspecification. If this test leads us to accept the hypothesis of misspecification, then inclusion of Xi in our model will be the remedial measure.
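A minimal Stata sketch of these steps, assuming y and x1 are in the model and x2 is the suspected excluded variable (all names hypothetical):

quietly regress y x1
predict e, residuals
sort x2                            // step 2: order residuals by the excluded variable
generate de2 = (e - e[_n-1])^2     // squared successive differences
generate e2 = e^2
quietly summarize de2
scalar num = r(sum)
quietly summarize e2
scalar den = r(sum)
display "d = " num/den             // step 3: compare with the Durbin-Watson tables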

III. Ramsey’s regression equation specification error test (RESET)

The tests for misspecification due to omitted variables or a wrong functional form can be done using a
RESET (regression specification error test) proposed by Ramsey (1969). The idea behind RESET is
fairly simple. If the original model
Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + u .......... (4.143)
satisfies assumptions of CLRMA, then no nonlinear functions of the independent variables should be
significant. In order to implement RESET, we must decide how many functions of the fitted values to
include in an expanded regression. There is no right answer to this question, but the squared and cubed
terms have proven to be useful in most applications.
Let Ŷ denote the OLS fitted values from estimating (4.143). Consider the expanded equation

Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + δ₁Ŷ² + δ₂Ŷ³ + error .......... (4.144)

This equation seems a little odd, because functions of the fitted values from the initial estimation now appear as explanatory variables. In fact, we will not be interested in the estimated parameters from equation (4.144); we only use this equation to test whether equation (4.143) has missed important nonlinearities.

Note that Ŷ² and Ŷ³ are just nonlinear functions of the Xⱼ.

The null hypothesis is that equation (4.143) is correctly specified. Thus, RESET is the F statistic for testing H₀: δ₁ = 0, δ₂ = 0 in the expanded model specified in equation (4.144). A significant F statistic suggests some sort of functional form problem. The distribution of the F statistic is approximately F(2, n − k − 3) in large samples under the null hypothesis (and the Gauss-Markov assumptions), since the degrees of freedom in the expanded equation (4.144) are n − k − 1 − 2 = n − k − 3.

To illustrate the simplest version of this test, let us assume a cost function that is linear in output, as given below:

Yᵢ = β₁ + β₂Xᵢ + uᵢ .......... (4.145)

where Y = total cost and X = output.


Steps in Ramsey’s RESET test
1. Regress Y on the X's and, from the chosen model (in this case equation 4.145), obtain the fitted values Ŷᵢ and the residuals ûᵢ. Some clue about the appropriate functional form can be obtained by plotting ûᵢ against Ŷᵢ. Suppose the plot suggests a curvilinear relationship between ûᵢ and Ŷᵢ.
2. Rerun your original model after introducing Ŷᵢ in some form as additional regressor(s) (i.e., Ŷᵢ² or Ŷᵢ³ and so on). That is, regress Y on the X's, Ŷᵢ² and Ŷᵢ³.
3. We run the following model:

Yᵢ = β₁ + β₂Xᵢ + β₃Ŷᵢ² + β₄Ŷᵢ³ + uᵢ .......... (4.146)
4. Obtain R² from equations 4.145 and 4.146 independently, call them R²old and R²new respectively, and calculate the F value as given below:

F = [(R²new − R²old)/K] / [(1 − R²new)/(n − J)] .......... (4.147)

where K is the number of new regressors, n is the sample size and J is the number of parameters in the new model (equation 4.146). Then, find out if the increase in R² is statistically significant by using this F test. To test for irrelevant variables, use F-tests (based on RRSS and URSS).

Decision rule: If Ŷᵢ² and Ŷᵢ³ are significant (using the F test), then reject H₀ and conclude that there is misspecification. Stated differently, if the computed F value obtained from equation 4.147 is significant at a given significance level, then accept the hypothesis that the model 4.145 is mis-specified. RESET thus adds polynomials in the OLS fitted values to the original model (4.145) to detect general kinds of functional form misspecification, and it relies on a very powerful tool we already have for detecting misspecified functional form: the F test for joint exclusion restrictions.
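In Stata, RESET is available directly as estat ovtest after regress, which uses powers of the fitted values; the manual construction in the steps above can also be mimicked (hypothetical variable names):

regress y x
estat ovtest                  // Ramsey RESET based on powers of the fitted values

* Manual version of the steps above
predict yhat                  // fitted values
generate yhat2 = yhat^2
generate yhat3 = yhat^3
regress y x yhat2 yhat3
test yhat2 yhat3              // joint F test on the added terms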
Limitation
Some have argued that RESET is a very general test for model misspecification, including unobserved
omitted variables and heteroskedasticity. A drawback with RESET is that it provides no real direction on
how to proceed if the model is rejected. One advantage of RESET is that it is easy to apply, for it does
not require one to specify what the alternative model is. But knowing that a model is mis-specified does not necessarily help us in choosing a better alternative model.

Unfortunately, such use of RESET is largely misguided. It can be shown that RESET has no power for detecting omitted variables whenever their expectations are linear in the included independent variables. Furthermore, if the functional form is properly
specified, RESET has no power for detecting heteroskedasticity. The bottom line is that RESET is a
functional form test, and nothing more. In general, model misspecification due to the inclusion of irrelevant variables is less serious than that due to the omission of relevant variables.

IV. Tests against Nonnested Alternatives


Tests for other kinds of functional form misspecification can be based on nonnested alternatives, for example when we are trying to decide whether the independent variables should appear in level or logarithmic form. This can be tested using the following approach: we compare the model

Y = β₀ + β₁X₁ + β₂X₂ + u .......... (4.148)

against the model

Y = β₀ + β₁log(X₁) + β₂log(X₂) + u .......... (4.149)

and vice versa.


However, these are nonnested models and so we cannot simply use a standard F test. Two different ap-
proaches have been suggested. The first is to construct a comprehensive model that contains each model
as a special case and then to test the restrictions that led to each of the models. In the current example,
the comprehensive model is

Y = γ₀ + γ₁X₁ + γ₂X₂ + γ₃log(X₁) + γ₄log(X₂) + u

We can first test H₀: γ₃ = 0, γ₄ = 0 as a test of (4.148). We can also test H₀: γ₁ = 0, γ₂ = 0 as a test of (4.149).

This approach was suggested by Mizon and Richard (1986). Another approach has been suggested by Davidson and MacKinnon (1981); that one is left for you as a reading assignment.
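A sketch of the Mizon–Richard comprehensive-model approach in Stata (x1 and x2 are hypothetical regressors, assumed strictly positive so the logs exist):

generate lx1 = ln(x1)
generate lx2 = ln(x2)
regress y x1 x2 lx1 lx2       // comprehensive model
test lx1 lx2                  // H0: the level model (4.148) is adequate
test x1 x2                    // H0: the log model (4.149) is adequate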

4.5.6. Remedial measures for model misspecification and data issues


A. Correcting functional form if possible
B. Omitting unnecessary variables, including necessary variables
C. Using Proxy Variables for Unobserved Explanatory Variables. One possibility is to obtain a
proxy variable for the omitted variable. A proxy variable is something that is related to the un-
observed variable that we would like to control for in our analysis.
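For example, a proxy regression might look as follows in Stata, using an IQ score as a stand-in for unobserved ability in a wage equation (an illustrative specification, not taken from the text):

* lwage = log wage; IQ proxies the unobserved ability variable
regress lwage educ exper IQ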

Summary
We have seen some violations of the classical linear regression model assumptions, such as heteroskedasticity, autocorrelation, multicollinearity, non-normality, and misspecification of the model. The violations, the problems they cause, their consequences and solutions are summarized in Table 4.16 below.
Table 4.16: Consequences of violation of CLRMA

Problem | Biased b | Biased standard error (SE) | Invalid t & F tests | High var(b) [inefficient] | Solution
Heteroscedasticity | No | Yes | Yes | Yes | Transform Y, WLS
Non-normal ε | No | No | Yes (bad p-values) | Yes |
Autocorrelation | No | Yes | Yes | Yes |
X correlated with ε | Yes | Yes | Yes | – |
Omit relevant X variable | Yes | Yes | Yes | – |
X measured with error | Yes | Yes | Yes | – | Nothing
Include irrelevant X | No | No | No | Yes |
Multicollinearity | No | No | No (individual p-values inflated) | Yes | Remove an X
Nonlinear relationship | Yes | Yes | Yes | – |

One of the violations is related to the variance of the error term. Heteroskedasticity arises from error learning models, the nature of our data, sampling procedures and measurement errors, and the presence of outliers. Heteroskedasticity affects the variances, i.e., the OLS estimators are no longer efficient, but it leaves the estimators (β̂₀ and β̂₁) unbiased and consistent. Furthermore, prediction based on a model with heteroskedasticity is not reliable, and conventional OLS hypothesis testing is invalid: the lack of efficiency makes the usual testing procedures of dubious value, and the t and F tests will be invalid. The appropriate solution is to transform the original model in such a way as to obtain a form in which the transformed disturbance term has constant variance; one can correct the problem using generalized least squares.
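In Stata, the two common fixes look like this (a sketch; the weight below assumes var(uᵢ) proportional to xᵢ², an assumption that must be justified case by case):

regress y x, vce(robust)      // keep OLS, use heteroskedasticity-robust SEs
generate w = 1/(x^2)          // assumed variance function: var(u) proportional to x^2
regress y x [aweight = w]     // weighted least squares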

Another problem is autocorrelation, a situation in which the error terms of different observations are correlated. Some of the reasons for autocorrelation are cyclical fluctuations, the cobweb phenomenon, misspecification of the true random term and specification bias. Such problems lead to inefficient estimators and invalid standard errors, which in turn invalidate the usual hypothesis testing procedures.
Multicollinearity refers to the existence of a perfect or exact linear relationship among the explanatory variables. Multicollinearity may arise because of sampling over a limited range of X values, an overdetermined model, improper use of dummy variables, or including the same variable twice. The consequences of multicollinearity differ depending on the situation. If it is perfect, the regression coefficients of the variables (Xᵢ) are indeterminate; if the problem is less than perfect, the regression coefficients are determinate, but the OLS estimators have large variances and covariances, and confidence intervals tend to be much wider as standard errors are large.
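In Stata, a quick multicollinearity check after OLS (hypothetical variable names):

regress y x1 x2 x3
estat vif                     // variance inflation factors; values above 10 signal trouble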
Another common problem is model misspecification, which can result from endogeneity, functional form misspecification, measurement error and data problems. An endogenous explanatory variable exists if u is, for whatever reason, correlated with the explanatory variable Xj. Some of the mechanisms for detecting model misspecification are the Durbin–Watson test and the RESET test. Some of the remedial measures are correcting the functional form if possible, omitting unnecessary variables, including necessary variables, and using proxy variables for unobserved explanatory variables.

References
Greene, W. H. (2003). Econometric Analysis. 5th ed. India: Dorling Kindersley Private Ltd.
Gujarati, D. N. (1999). Basic Econometrics. 4th ed. New York: McGraw-Hill.
Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric Methods with Applications in Business and Economics. New York: Oxford University Press Inc.
Johnston, J. (1984). Econometric Methods. 3rd ed. New York: McGraw-Hill.
Maddala, G. S. (1992). Introduction to Econometrics. 3rd ed. New York: Macmillan.
Theil, H. (1957). Specification errors and the estimation of economic relationships. Review of the International Statistical Institute, 25, 41–51.
Thomas, R. L. (1997). Modern Econometrics: An Introduction. 1st ed. UK: The Riverside Printing Co. Ltd.
Verbeek, M. (2004). A Guide to Modern Econometrics. 2nd ed. England: John Wiley & Sons Ltd.
Wooldridge, J. (2000). Introductory Econometrics: A Modern Approach. 2nd ed. New York: South-Western Publishers.
Review questions
Choose the best answer and circle your choice
1. Which one of the following is true about heteroskedasticity?
A. the variance of the error term is not constant
B. it is a violation of CLRMA
C. it can be increasing, decreasing or cyclical
D. σᵢ² = f(Xᵢ)
E. all
F. none
2. Which one of the following is a cause of heteroskedasticity?
A. error learning models
B. the nature of economic variables
C. sampling procedures and measurement errors
D. presence of outliers
E. all
F. none
3. Which one of the following is a consequence of heteroskedasticity?
A. the estimators (β̂₀ and β̂₁) are unbiased and consistent
B. the OLS estimators are not efficient
C. prediction based on a model with heteroskedasticity is not reliable
D. conventional OLS hypothesis testing is invalid
E. all
F. none
4. Which one of the following is a way of detecting heteroskedasticity?
A. informally, using graphs
B. informally, using the nature of the economic variables
C. formally, using the Park test
D. formally, using the Spearman rank-correlation test
E. all
F. none
5. A regression model has been estimated and heteroskedasticity has been tested using the Goldfeld–Quandt test, running separate regressions on the first 13 and the last 13 observations, as follows:
• Regression based on the first 13 observations:
Ŷᵢ = 3.4094 + 0.6968Xᵢ
(8.7049) (0.0744)  R² = 0.8887, RSS₁ = 377.17, df = 11
• Regression based on the last 13 observations:
Ŷᵢ = −28.0272 + 0.7941Xᵢ
(30.6421) (0.1319)  R² = 0.7681, RSS₂ = 1536.8, df = 11
If the critical F-value for 11 numerator and 11 denominator df at the 5% level is 2.82, then which one of the following is true?
A. the calculated F is 4.07
B. the critical F is less than the calculated F
C. there exists heteroskedasticity in the error variance
D. there is no heteroskedasticity in the error variance
E. all except “D”
F. none
6. Which one of the following is among the remedial measures for the problem of heteroskedasticity?
A. use generalized least squares (GLS) rather than OLS
B. stick to OLS but use robust standard errors
C. use weighted least squares (WLS) rather than OLS
D. transform the original model by dividing it by the standard error σᵢ
E. all
F. none
7. Which one of the following is an appropriate transformation of the original model Y = β₀ + β₁Xᵢ + Uᵢ, with var(uᵢ) = σᵢ², E(uᵢ) = 0 and E(uᵢuⱼ) = 0, into a model adjusted for heteroskedasticity?
A. if the variance is E(uᵢ²) = σᵢ², then Y/σᵢ = β₀(1/σᵢ) + β₁(Xᵢ/σᵢ) + Uᵢ/σᵢ
B. if the variance is var(uᵢ) = σᵢ² = k²Xᵢ², then Y/Xᵢ = β₀(1/Xᵢ) + β₁ + Uᵢ/Xᵢ
C. if E(uᵢ²) = σᵢ² = k²Xᵢ, then Y/√Xᵢ = β₀(1/√Xᵢ) + β₁(Xᵢ/√Xᵢ) + Uᵢ/√Xᵢ
D. if E(uᵢ²) = σ²[E(Yᵢ)]² = σ²(β₀ + β₁Xᵢ)², then Y/(β₀ + β₁Xᵢ) = β₀/(β₀ + β₁Xᵢ) + β₁Xᵢ/(β₀ + β₁Xᵢ) + Uᵢ/(β₀ + β₁Xᵢ)
E. all
F. none
8. Autocorrelation
A. is a typical problem encountered when two error terms are correlated
B. is a special type of correlation referring to the relationship between successive values of the same variable
C. is associated with time series data
D. is a correlation of the error terms with lagged values of the explanatory variables
E. all except D
F. none
9. Reasons for autocorrelation include
A. cyclical fluctuations
B. the cobweb phenomenon
C. misspecification of the true random term
D. specification bias
E. all
F. none
10. Which one of the following is true about the order and coefficient of autocorrelation?
A. uₜ = f(uₜ₋₁) is first-order autoregressive
B. uₜ = f(uₜ₋₁, uₜ₋₂) is second-order autoregressive
C. for uₜ = ρuₜ₋₁ + εₜ, the coefficient is ρ̂(uₜ, uₜ₋₁) = Σuₜuₜ₋₁ / Σuₜ₋₁²
D. the autocorrelation coefficient ranges from −1 to +1
E. all
F. none
11. Which one of the following is an effect of autocorrelation on the OLS estimators?
A. the estimators are unbiased and consistent
B. the variance of the error term is underestimated and OLS is not BLUE
C. wrong hypothesis testing procedures
D. the variances of the βᵢ's are overestimated
E. all except D
F. none
12. Which one of the following indicates autocorrelation?
A. when a plot of the regression residuals against their own lagged values shows systematic patterns
B. when a plot of the regression residuals against time shows a regular pattern
C. when the run test shows statistically significant results
D. when the Breusch–Godfrey (BG) test shows that (T − p)R² exceeds the tabulated value from the χ² distribution with p degrees of freedom
E. all
F. none
13. The Durbin–Watson test
A. d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ²
B. d ≈ 2(1 − ρ̂)
C. when d = 1, there is perfect positive autocorrelation
D. when ρ = 0, there is no autocorrelation
E. all
F. none
14. Given the result of the regression of Y on X, i.e. Yₜ = α + βXₜ + Uₜ, we can compute the following values:

Σxy = 255,  Ȳ = 7,  Σ(eₜ − eₜ₋₁)² = 60.213
Σx² = 280,  X̄ = 8,  Σeₜ² = 41.767
Σy² = 274

β̂ = Σxy/Σx² = 255/280 = 0.91
α̂ = Ȳ − β̂X̄ = 7 − 0.91(8) = −0.28
Ŷ = −0.28 + 0.91X,  R² = 0.85
d = Σ(eₜ − eₜ₋₁)²/Σeₜ² = 60.213/41.767 = 1.442

The values of d_L and d_U at the 5% level of significance, with n = 15 and one explanatory variable, are d_L = 1.08 and d_U = 1.36.
A. d lies outside the range
B. the interval for d is d_U < d < 4 − d_U (1.36 < d < 2.64)
C. the data imply autocorrelation
D. there is no indication of autocorrelation
E. all except “D”
F. none
15. Which one of the following is among the remedial measures for autocorrelation?
A. using generalized least squares adjusted for autocorrelation
B. supposing that our model is Yₜ = α + βXₜ + Uₜ with Uₜ = ρUₜ₋₁ + Vₜ, |ρ| < 1, taking the lagged form of the equation, multiplying through by ρ, and subtracting from the original equation: Yₜ − ρYₜ₋₁ = α(1 − ρ) + β(Xₜ − ρXₜ₋₁) + (Uₜ − ρUₜ₋₁)
C. using OLS
D. all
E. all except C
F. none

16. Which one of the following is a mechanism for estimating ρ?
A. ρ̂ = 1 − d/2
B. prior information on ρ̂
C. a given value of ρ̂
D. ρ̂ = [n²(1 − d/2) + k²] / (n² − k²)
E. all
F. none
17. Multicollinearity
A. is the existence of a perfect or exact linear relationship among the explanatory variables
B. does not refer to non-linear relationships
C. means the number of explanatory variables is reduced when the OLS estimator is used
D. means the same approach is used as with a model without multicollinearity
E. all
F. none

18. Multicollinearity may be because of

A. sampling over a limited range of X values
B. over determined model
C. improper use of dummy variables
D. including the same variable twice
E. all
F. none

19. Consequences of multicollinearity

A. if it is perfect, the regression coefficients of the variables (Xᵢ) are indeterminate
B. if the problem is less than perfect, the regression coefficients are determinate
C. the OLS estimators have large variances and covariances
D. confidence intervals tend to be much wider as standard errors are large
E. all
F. none
20. As per the rule, there is no multicollinearity problem if
A. Fᵢ = [R²ᵢ/(k − 2)] / [(1 − R²ᵢ)/(n − k + 1)] ~ F(k−2, n−k+1), where R²ᵢ is the R² of the auxiliary regression of Xᵢ on the remaining regressors (x₂, x₃, …, xₖ), exceeds the critical F at the chosen level of significance
B. the condition number (K) is between 100 and 1000
C. the VIF of a variable exceeds 10
D. there is a high R² but few significant t-ratios
E. there are large standard errors but small t-ratios
F. all
G. none

Part II: Answer the following questions


21. Mention the steps for correcting heteroskedasticity using GLS.
22. State, with brief reasons, whether the following statements are true, false, or uncertain, and justify your answer.
A. In the presence of heteroscedasticity, OLS estimators are biased as well as inefficient.
B. If heteroscedasticity is present, the conventional t and F tests are invalid.
23. State three consequences of autocorrelation.
Part III: Do the following questions
24. Suppose that you have data on personal saving and personal income for Ethiopia for a 31-year period. Assume that graphical inspection suggests that the Uᵢ's are heteroscedastic, so you want to employ the Goldfeld–Quandt test. Suppose you ordered the observations in ascending order of income and omitted the nine central observations. Applying OLS to each subset, you obtained the following results:
a) For subset I: Ŝ₁ = −738.84 + 0.008Iᵢ, ΣÛ₁² = 144,771.5
b) For subset II: Ŝ₂ = 1141.07 + 0.029Iᵢ, ΣÛ₂² = 769,899.2
Is there any evidence of heteroscedasticity?
