
Lab - Lecture 1

This document provides instructions for students to conduct a hands-on computer exercise to estimate a simple linear regression model using STATA. It uses hypothetical weekly income and consumption expenditure data to estimate a regression model and interpret the results. Key steps covered include generating a scatter plot to assess linear fit, running the regression, interpreting coefficients and goodness of fit, conducting hypothesis tests on coefficients, and overlaying the regression line on the scatter plot.

Uploaded by

Sohel Rana Sohag

Hajee Mohammad Danesh Science & Technology University, Dinajpur

Department of Economics
ECN-311: Basic Econometrics (Lab Lecture-1)
Course Teacher: Nazrul Islam

This lecture is a sequel to your lecture sessions on the theory of the CLRM, the method and
properties of OLS estimation (BLUE) and hypothesis testing. In this lecture,
you will have a hands-on computer exercise in estimating simple (bivariate) linear regression
models, interpreting regression results, testing hypotheses and carrying out basic
post-regression diagnostics. The latter include statistical tests to detect whether the assumptions of
the CLRM actually hold in a given example. Together, these elements constitute what we
call statistical inference.

We start with a hypothetical data set on weekly family income and weekly family consumption
expenditure. Theory suggests that as income increases, individuals tend to increase their
consumption expenditure, but not by as much as the increase in income. Although theory does
not specify the functional form of this relationship, it does suggest that the marginal propensity
to consume (MPC) is less than one. If we are willing to assume a linear relationship, the
econometric model would look like equation (1):

$Y_i = \beta_1 + \beta_2 X_i + u_i$    (1)

where Y is weekly consumption and X is weekly income. Our objective is to estimate the
population parameters 𝛽1 and 𝛽2 based on a random sample drawn from the population.

Before running any regression, however, it is important to examine the relationship between
the two variables graphically and determine if a linear model would fit the data very well. To
do so we use a simple scatter plot. A scatter plot can be drawn in STATA using the graphics
facilities.

Click <Graphics> <Twoway Plots> and select ‘Scatter’ as the plot type. Then
choose income as the X variable and consm (consumption) as the Y variable. Alternatively, type
scatter consm income in the command window.

Nazrul Islam, LECTURER, DEPT. OF ECONOMICS, RABINDRA UNIVERSITY, SIRAJGANJ-6770


Figure-1: Regression of weekly consumption on income. (Scatter plot with consm on the vertical axis and income on the horizontal axis.)

The graph suggests that there is a linear relationship between the two variables and a linear
regression model will be appropriate.

In addition to the scatter plot, you can supplement your understanding of the data by looking
at the summary statistics (the mean, the median, the inter-quartile range, etc.).
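
The summary statistics mentioned above can be requested from the command window. The following is a minimal sketch, assuming the data set containing the variables consm and income is already loaded in STATA:

```stata
* Detailed summary statistics (mean, standard deviation, percentiles including the median)
summarize consm income, detail

* Compact table with only the mean, the median and the inter-quartile range
tabstat consm income, statistics(mean median iqr)
```

The first command gives the full picture for each variable; the second is convenient when only a few statistics are needed side by side.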

Estimation
Running a regression in STATA is simple and straightforward. We use the regress command.

regress consm income (reg for short) asks STATA to run a regression of the dependent
variable consumption (consm) on the explanatory (predictor) variable income. STATA commands are case-sensitive, so type the command in lower case.



STATA reports three blocks of results, each providing information on different aspects of the
regression model. It is not just the coefficients and their significance that a researcher should
look at; the entire set of information is important for a complete interpretation of the results.

To begin with, we can write the estimated model as: $\widehat{Consumption} = 70.21 + 0.245 \, income$

This result suggests that for a one dollar increase in income, average consumption rises by
0.245 dollars (about 24.5 cents), which upon initial inspection appears to support the Keynesian theory of
consumption stated above. In the model, the estimate of 𝛽1 is 70.21, which is the intercept. Often the
intercept does not carry much economic meaning, but in this case it can be viewed as
autonomous consumption, i.e., the average consumption when income is zero.

The reported R² suggests that income explains 91.25% of the total variation in consumption.
In other words, our model explains 91.25% of the differences in consumption
expenditure across families. In the regression results reported above, SS stands for sum of squares. Therefore,
the Model SS, i.e., 54084.973, is the ESS (Explained Sum of Squares) and the Residual
SS, which equals 5185.46934, is the RSS (Residual Sum of Squares); both measures
are discussed in the lecture sessions. The Total SS is the sum of the ESS and the RSS. As
you already know, R² = ESS/TSS = 54084.973/59270.442 ≈ 0.9125.
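
The relationship between the sums of squares and R² can be checked directly from the results STATA stores after a regression. This is a small sketch, assuming the variables consm and income are in memory; e(mss), e(rss) and e(r2) are the stored results that regress leaves behind:

```stata
* Run the regression first so that the stored results are available
regress consm income

* e(mss) is the Model (explained) SS and e(rss) is the Residual SS
display "TSS = " e(mss) + e(rss)
display "R-squared from ESS/TSS = " e(mss) / (e(mss) + e(rss))

* Compare with the R-squared that STATA stores itself
display "Stored R-squared = " e(r2)
```

The last two lines should print the same number, which is the R² shown in the regression output.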

Hypothesis Testing

The reported t-statistic which equals 17.09 is based on the null hypothesis that the coefficient
of income in this model is equal to zero, i.e., 𝐻0 : 𝛽2 = 0 against 𝐻1 : 𝛽2 ≠ 0.

We know $t = \frac{\hat{\beta}_2 - E(\hat{\beta}_2)}{se(\hat{\beta}_2)} = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}$

Under the null hypothesis 𝛽2 = 0, so the t-statistic for the coefficient of income reported above is simply the ratio of the
coefficient to its own standard error, i.e., t = 0.2453/0.0144 = 17.09 (except for rounding error).
This is the calculated t value.

To test the stated hypothesis at the 5% level of significance, you need to look at the t-distribution
table.
Since this is a two-tailed test, you need to use the top row and look for 0.025 (i.e., 𝛼/2, where 𝛼 =
0.05) with 28 degrees of freedom. The degrees of freedom equal the total number of
observations less the number of coefficients estimated. The critical value is $t_{\alpha/2,\,28} = 2.048$.
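
Instead of the printed table, the calculated and critical t values can also be obtained inside STATA. The sketch below assumes the variables consm and income are in memory; the scalar names tcalc, tcrit and pval are illustrative choices, not part of the original exercise:

```stata
regress consm income

* Calculated t for H0: beta2 = 0 (coefficient divided by its standard error)
scalar tcalc = _b[income] / _se[income]

* Critical value for a two-tailed test at the 5% level; e(df_r) is the residual degrees of freedom
scalar tcrit = invttail(e(df_r), 0.025)

* Two-tailed p-value for the calculated t
scalar pval = 2 * ttail(e(df_r), abs(tcalc))

display "calculated t = " tcalc "   critical t = " tcrit "   p-value = " pval
```

With 28 degrees of freedom, invttail(28, 0.025) returns the same critical value of about 2.048 that the table gives.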



The critical value is 2.048 which is far below the calculated t-statistic 17.09. We say that the
test is significant at 5% and we can thus reject the null hypothesis in favour of the alternative
hypothesis.
Notice that the 95% confidence interval confirms the same test result. If the null hypothesis
were true, we would expect the 95% confidence interval to contain the hypothesized value, i.e.,
zero. It does not, and hence we can reject the null hypothesis at the 5% level of
significance. The 95% refers to the procedure: in repeated samples, 95% of the intervals constructed this way
would contain the true 𝛽2, so there is only a 5% chance of rejecting a null hypothesis that is true.
The p-value gives another look at the test of significance. It is the lowest significance level at which the
null hypothesis can be rejected; here it tells us that the chance of
committing a Type-I error is very low, less than 1 in a thousand.
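
The confidence interval reported by STATA can be reproduced by hand from the coefficient, its standard error and the critical t value. This is a minimal sketch under the assumption that consm and income are in memory; tcrit is an illustrative scalar name:

```stata
* level(95) is the default and is written out here only for clarity
regress consm income, level(95)

* Manual 95% interval: estimate minus/plus critical t times the standard error
scalar tcrit = invttail(e(df_r), 0.025)
display "lower bound = " _b[income] - tcrit * _se[income]
display "upper bound = " _b[income] + tcrit * _se[income]
```

The two bounds should match the interval printed in the coefficient table, and the hypothesis 𝛽2 = 0 is rejected when zero lies outside them.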

Although the preceding test procedures explain the practice of hypothesis testing, they do not
constitute a test of the Keynesian theory of consumption. The Keynesian theory does not
question whether the coefficient on income is zero or not (i.e., whether there is a relationship
between consumption and income or not). Rather, its concern is whether the marginal propensity to
consume is less than one or not. In that sense, the hypothesis should be expressed as:

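𝐻0 : 𝛽2 = 1 against 𝐻1 : 𝛽2 ≠ 1
Accordingly, the t-statistic should be calculated as follows:

$|t| = \left| \frac{\hat{\beta}_2 - E(\hat{\beta}_2)}{se(\hat{\beta}_2)} \right| = \left| \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)} \right| = \left| \frac{0.245 - 1}{0.01435} \right| = 52.61$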

Once again the calculated value is greater than the critical value of 2.048, so we reject the null
hypothesis that 𝛽2 = 1. Since the estimated MPC of 0.245 lies below one, our
regression result supports the Keynesian theory of consumption.
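
The test of 𝛽2 = 1 does not have to be computed by hand. The following sketch, which assumes consm and income are in memory, shows three equivalent ways of carrying it out after the regression:

```stata
regress consm income

* Wald test of H0: beta2 = 1 (reported as an F-statistic, which equals t squared here)
test income = 1

* t-statistic for the same hypothesis, built from the stored coefficient and standard error
display "t = " (_b[income] - 1) / _se[income]

* lincom reports the estimate of (beta2 - 1) with its t-statistic and confidence interval
lincom income - 1
```

The sign of the t-statistic is negative because the estimate lies below one; its absolute value is what is compared with the critical value.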
Drawing the regression line
Our estimated model, as already indicated, is $Y_i = 70.21 + 0.245 X_i + \hat{u}_i$,
where Y is consumption, X is income and $\hat{u}_i$ is the residual. The estimated expected value is the systematic
component, which is often represented as $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 70.21 + 0.245 X_i$

We call this the predicted value and we use STATA’s predict command to calculate it.

predict yhat

This command generates a new variable yhat which contains the predicted value of
consumption for each value of income. To see the regression line, we return to the
graphics command.
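
To see what predict actually computes, the fitted values can be rebuilt by hand from the stored coefficients. This is a small sketch assuming consm and income are in memory; the variable names yhat_xb and yhat_manual are illustrative and chosen so that they do not clash with yhat:

```stata
regress consm income

* Fitted values from predict (xb, the linear prediction, is the default)
predict yhat_xb, xb

* The same values built by hand: intercept plus slope times income
generate yhat_manual = _b[_cons] + _b[income] * income

* The two columns should agree up to rounding
list income consm yhat_xb yhat_manual in 1/5
```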



Click <Graphics> <Overlaid twoway>
This option allows you to combine scatter plots with line graphs and many other plot types. Here we
will take the scatter plot we have already seen and overlay it with the regression line. All
we need to do is choose Scatter plot for plot 1 and Line graph for plot 2. In both plots we
use income as the X-variable. In plot 1 the Y-variable will be consm (consumption) and in plot 2 the
Y-variable will be yhat.

You can also type the following command to get the same result.
twoway (scatter consm income) (line yhat income)
Figure-2: Scatter plot of weekly consumption (consm) against income, overlaid with the fitted values. Dots indicate weekly consumption and the solid line indicates fitted values.

Such a graph allows us to inspect how well the regression line represents the data points. We
can see that the regression line is a good fit. Once you know how the regression line is
obtained from the predicted values, there is an even easier way to get a similar graph in
STATA.

Click <Graphics> <Regression Diagnostics> <Component-plus-Residual> and select the
independent variable, which in this case is income.
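
The same kind of graph can be requested from the command window without first generating yhat. The sketch below assumes consm and income are in memory. Note that lfit reproduces the scatter-plus-line graph directly, while cprplot is the command behind the Component-plus-Residual menu item and draws a closely related, but not identical, plot:

```stata
regress consm income

* Scatter plot with the least-squares line drawn directly by lfit
twoway (scatter consm income) (lfit consm income)

* Command-line counterpart of the Component-plus-Residual menu item
cprplot income
```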



Testing Normality
One of the key assumptions in regression analysis is that the error terms are normally
distributed.
There are several ways to test this assumption. In this exercise we focus on two approaches:
a) Using Histograms
b) Normal Probability plot or Quantile-Normal plot
These two are visual (informal) methods of checking normality. For these informal checks we
need to generate the residuals from our regression. After running the regression, you can get
the residuals by using the predict command:

predict resid, residual

Here resid is the variable name we chose for the residuals generated by this command. The option
residual after the comma tells STATA to generate the residuals. This distinguishes it from the
command used to generate the fitted values.

hist resid, bin(9)
draws a histogram of the residuals with nine bins.

hist resid, bin(9) norm
superimposes the normal curve on the histogram.

qnorm resid
draws a normal probability plot. Here we compare the quantiles of a variable
(in this case resid) against the quantiles of a theoretical normal distribution which has the
same mean and variance as resid. If resid is normally distributed, the points in the normal probability plot
lie close to the straight reference line; systematic departures from the line indicate non-normality.
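
The residual checks described above can be collected into one short sequence of commands. This is a sketch assuming consm and income are in memory; the graph names hist_resid and qnorm_resid are illustrative, and the name() option is used only so that the second graph does not replace the first:

```stata
regress consm income

* Remove any earlier copy of resid so that predict does not fail
capture drop resid
predict resid, residuals

* Histogram of the residuals with the normal curve superimposed
histogram resid, bin(9) normal name(hist_resid, replace)

* Quantile-normal plot of the residuals
qnorm resid, name(qnorm_resid, replace)
```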



Exercise A

In this exercise you are required to repeat the preceding regression on transformed variables,
i.e., on the logarithms of consumption and income. First you need to generate the logarithmic
values:
gen lnconsm=ln(consm)
gen lnincome=ln(income)

Your Task:
i) Run a regression of log-consumption on log-income and comment on your results.
ii) Draw a graph with scatter plot overlaid with the fitted value based on the log-
linear model. Does the regression line fit the data very well?
iii) Construct a 90% confidence interval for the slope coefficient.
iv) Test the null hypothesis that the income elasticity of consumption equals one,
i.e., test for unitary elasticity. Use the confidence interval approach at the 90%
confidence level.
v) Use the t-test at 1% level of significance to test the same null hypothesis that
income elasticity of consumption equals one.
vi) Test the hypothesis that the log-linear model is a regression through the origin.
Use the 5% level of significance.
vii) Test if the normality assumption holds in this regression using graphical methods.
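
As a starting point for the tasks above, the commands used earlier in this lecture carry over to the log-linear model. The following is only a sketch of the commands, assuming lnconsm and lnincome have already been generated as shown; the names lnyhat and lnresid are illustrative, and the interpretation of the output is left to you:

```stata
* (i), (iii): log-log regression reported with 90% confidence intervals
regress lnconsm lnincome, level(90)

* (iv), (v): t-statistic for H0: elasticity = 1 and the two-tailed critical value at 1%
display "t = " (_b[lnincome] - 1) / _se[lnincome]
display "critical t (1%) = " invttail(e(df_r), 0.005)

* (vi): Wald test of H0: intercept = 0 (regression through the origin)
test _cons = 0

* (ii): fitted line over the scatter of the logged variables
predict lnyhat, xb
twoway (scatter lnconsm lnincome) (line lnyhat lnincome, sort)

* (vii): graphical normality checks on the residuals
predict lnresid, residuals
histogram lnresid, bin(9) normal
qnorm lnresid
```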

