Lab - Lecture 1
Department of Economics
ECN-311: Basic Econometrics (Lab Lecture-1)
Course Teacher: Nazrul Islam
This lecture is a sequel to your lecture sessions on the theory of CLRM, the method and
properties of OLS estimation (BLUE) and hypothesis testing. In this lecture,
you will have a hands-on computer exercise in estimating simple (bivariate) linear regression
models, interpreting regression results, testing hypotheses and carrying out basic
post-regression diagnostics. The latter include statistical tests to detect whether the assumptions of
CLRM actually hold in a given example. The above-mentioned elements constitute what we
call statistical inference.
We start with a hypothetical data set on weekly family income and weekly family consumption
expenditure. Theory suggests that as income increases, individuals tend to increase their
consumption expenditure, but not by as much as the increase in income. Although theory does
not specify the functional form of this relationship, it does suggest that the marginal propensity
to consume (MPC) is less than one. If we are willing to assume a linear relationship, the
econometric model would look like equation (1):
𝑌𝑖 = 𝛽1 + 𝛽2 𝑋𝑖 + 𝑢𝑖 (1)
where Y is weekly consumption and X is weekly income. Our objective is to estimate the
population parameters 𝛽1 and 𝛽2 based on a random sample drawn from the population.
Before running any regression, however, it is important to examine the relationship between
the two variables graphically and determine if a linear model would fit the data very well. To
do so we use a simple scatter plot. A scatter plot can be drawn in STATA using the graphics
facilities.
Click <Graphics> <Twoway Plots> and select ‘Scatter’ as the plot type. Then
choose ‘income’ for the X variable and ‘consm’ for the Y variable. Alternatively, you can type
scatter consm income
in the command window.
[Figure-1: Scatter plot of consm (vertical axis) against income (horizontal axis)]
The graph suggests that there is a linear relationship between the two variables and a linear
regression model will be appropriate.
In addition to the scatter plot, you can supplement your understanding of the data by looking
at the summary statistics (the mean, the median, the inter-quartile range, etc.).
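The summary statistics mentioned above can be obtained with a couple of standard commands. The following is a minimal sketch, assuming the lab data set is already loaded and the variables are named consm and income as in the scatter command.

```stata
* Assumes the lab data set is in memory with variables consm and income
* Full summary: mean, standard deviation and percentiles (the 50th is the median)
summarize consm income, detail
* Compact table with only the mean, median and inter-quartile range
tabstat consm income, statistics(mean median iqr)
```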
Estimation
Running a regression in STATA is simple and straightforward. We use the regress command.
regress consm income (reg for short) requests STATA to run a regression of the dependent
variable consumption on the explanatory (predictor) variable income.
To begin with, we can write the estimated model as: $\widehat{Consumption} = 70.21 + 0.245\, income$
This result suggests that for a one dollar increase in income, average consumption rises by
about 0.245 dollars (24.5 cents), which upon initial inspection appears to support the Keynesian theory of
consumption stated above. In the model, the estimated intercept is $\hat{\beta}_1 = 70.21$. Often the
intercept does not carry much economic meaning, but in this case it can be viewed as
autonomous consumption, i.e., the average consumption when income is zero.
The reported 𝑅2 suggests that income explains 91.25% of the total variation in consumption.
In other words, our model explains 91.25% of the interpersonal differences in consumption
expenditure. In the regression results reported above, SS stands for sum of squares. Therefore,
the Model SS, i.e., 54084.973, stands for the ESS (Explained Sum of Squares) and the Residual
SS, which equals 5185.46934, corresponds to the RSS (Residual Sum of Squares); these are measures
which are discussed in the lecture sessions. The Total SS (TSS) is the sum of the ESS and RSS. As
you already know, 𝑅2 = 𝐸𝑆𝑆⁄𝑇𝑆𝑆.
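The relationship between the sums of squares and 𝑅2 can be checked directly from the results that regress stores. This is a small sketch, assuming regress consm income has just been run; e(mss), e(rss) and e(r2) are the stored Model SS, Residual SS and R-squared.

```stata
* Assumes regress consm income has just been run, so its stored results are available
* e(mss) is the Model SS (ESS) and e(rss) is the Residual SS (RSS)
display "TSS = " e(mss)+e(rss)
display "R-squared = ESS/TSS = " e(mss)/(e(mss)+e(rss))
* This should agree with the R-squared shown in the regression output
display "Stored R-squared = " e(r2)
```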
Hypothesis Testing
The reported t-statistic, which equals 17.09, is based on the null hypothesis that the coefficient
of income in this model is equal to zero, i.e., 𝐻0 : 𝛽2 = 0 against 𝐻1 : 𝛽2 ≠ 0.
We know that $t = \frac{\hat{\beta}_2 - E(\hat{\beta}_2)}{se(\hat{\beta}_2)} = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}$
Under the null hypothesis 𝛽2 = 0, so the t-statistic for the coefficient of income reported above is the ratio of the
coefficient to its own standard error, i.e., t=0.2453/0.0144=17.09 (except for rounding error).
This is the calculated t value.
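The same ratio can be reproduced from the stored estimates rather than typed in by hand. A minimal sketch, assuming regress consm income has just been run:

```stata
* Assumes regress consm income has just been run
* _b[income] is the estimated slope and _se[income] is its standard error
display "t = " _b[income]/_se[income]
```

Working from the stored values avoids the rounding error that arises when the displayed coefficient and standard error are divided manually.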
To test the stated hypothesis at the 5% level of significance, you need to look at the t-distribution
table.
Since this is a two-tailed test, you need to use the top row and look for 0.025 (½𝛼, where 𝛼 =
0.05) with degrees of freedom equal to 28. The degrees of freedom is the total number of
observations less the number of coefficients estimated (here 30 − 2 = 28). That is, the critical value is $t_{\alpha/2,\,28} = 2.048$.
Since the calculated t value of 17.09 exceeds this critical value, we reject the null hypothesis that 𝛽2 = 0.
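If a printed t-table is not at hand, the critical value can also be looked up with STATA's built-in distribution functions. The sketch below uses invttail() and ttail(), which work with the upper tail of the t-distribution. The inputs (28 degrees of freedom, 𝛼 = 0.05, t = 17.09) are taken from the text above.

```stata
* Two-tailed critical value: upper-tail area alpha/2 = 0.025 with 28 degrees of freedom
* (about 2.048, as in the t-table)
display invttail(28, 0.025)
* Two-tailed p-value for the calculated t of 17.09
display 2*ttail(28, 17.09)
```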
Although the preceding test procedures explain the practice of hypothesis testing, they do not
constitute a test of the Keynesian theory of consumption. The Keynesian theory does not
question whether the coefficient on income is zero or not (i.e., whether there is a relationship
between consumption and income or not). Rather, its concern is whether the marginal propensity to
consume is less than one or not. In that sense, the hypothesis should be expressed as:
𝐻0 : 𝛽2 = 1 against 𝐻1 : 𝛽2 ≠ 1
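Accordingly, the t-statistic should be calculated as follows:
$|t| = \left|\frac{\hat{\beta}_2 - E(\hat{\beta}_2)}{se(\hat{\beta}_2)}\right| = \left|\frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}\right| = \left|\frac{0.245 - 1}{0.01435}\right| = 52.61$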
Once again the calculated value is greater than the critical value of 2.048, so we reject the null
hypothesis that the MPC equals one. Since the estimated coefficient (0.245) lies below one, our
regression result supports the Keynesian theory of consumption.
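This test can also be carried out in STATA after the regression. The following is a sketch, assuming regress consm income has just been run. The test command reports an F statistic, which in this single-restriction case is the square of the t-statistic.

```stata
* Assumes regress consm income has just been run
* Wald test of H0: coefficient on income = 1 (the reported F equals t squared here)
test income = 1
* The same t-statistic computed by hand from the stored estimates
display "t = " (_b[income]-1)/_se[income]
```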
Drawing the regression line
Our estimated model, as already indicated, is $Y_i = 70.21 + 0.245X_i + \hat{u}_i$,
where Y is consumption and X is income. The expected value of this model is the systematic
component, which is often represented as $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 70.21 + 0.245X_i$
We call this the predicted value and we use STATA’s predict command to calculate it.
predict yhat
This command generates a new variable yhat which contains the predicted value of
consumption for each value of income. To see the regression line, we return to the
graphics command.
You can also type the following command to get the same result.
twoway (scatter consm income) (line yhat income)
[Figure-2: Scatter plot of consm against income with the fitted regression line (yhat)]
Such a graph allows us to inspect how well the regression line represents the data points. We
can see that the regression line is a good fit. Once you know how the regression line is
obtained based on the predicted value, there is an even easier way to get the same graph in
STATA.
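The handout does not show which command it has in mind here. One shortcut that produces this kind of graph, offered as an assumption rather than as the intended answer, is the lfit plot type, which computes and draws the OLS fitted line without requiring yhat to be created first.

```stata
* Assumes consm and income are in memory
* lfit fits consm on income by OLS and overlays the fitted line on the scatter plot
twoway (scatter consm income) (lfit consm income)
```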
To obtain the residuals, type:
predict resid, residual
Here resid is the variable name we chose to give the residual
generated by this command. The option residual, after the comma, tells STATA to generate the
residual. This is to distinguish it from the command used to generate the fitted value.
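A quick way to see the difference between the two predict calls is to confirm that the fitted value and the residual add up to observed consumption. This is a minimal sketch, assuming yhat and resid were created as described above. The variable name check is a hypothetical name chosen for illustration.

```stata
* Assumes regress consm income has been run and that yhat and resid already exist
* With an intercept in the model, OLS residuals have a mean of approximately zero
summarize resid
* Fitted value plus residual should reproduce observed consumption
generate double check = yhat + resid
list consm yhat resid check in 1/5
```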
In this exercise you are required to repeat the preceding regression on transformed variables,
i.e., on the logarithms of consumption and income. First you need to generate the logarithmic
values:
gen lnconsm=ln(consm)
gen lnincome=ln(income)
Your Task:
i) Run a regression of log-consumption on log-income and comment on your results.
ii) Draw a graph with scatter plot overlaid with the fitted value based on the log-
linear model. Does the regression line fit the data very well?
iii) Construct a 90% confidence interval for the slope coefficient.
iv) Test the null hypothesis that the income elasticity of consumption equals one,
i.e., test for unitary elasticity. Use the confidence interval approach at the 90%
confidence level.
v) Use the t-test at 1% level of significance to test the same null hypothesis that
income elasticity of consumption equals one.
vi) Test the hypothesis that the log-linear model is a regression through the origin.
Use the 5% level of significance.
vii) Test if the normality assumption holds in this regression using graphical methods.