
Growth Project Empirical Econometrics:

Tutorial Assignments

Block Period 2 2024-2025

Getting Started in R
You will analyze your data in R. R is a programming language for statistical computing. You can find a
manual on http://www.r-project.org/, following “Manuals” to “An Introduction to R”, or you can consult
an interactive introductory course on https://www.datacamp.com.
Throughout the tutorial assignments, initial help will be provided to appropriately analyze your data in
R. !! Importantly, we will provide initial help for a specific function the first time that you need to make
use of it. If a later assignment (in the same tutorial or in a subsequent tutorial) again requires a function
that was introduced earlier, the help will not be repeated !!

Software Access
Start by downloading R from https://cran.r-project.org. A user-friendly interface is available via RStudio,
to be downloaded from https://www.rstudio.com/products/rstudio/download. Install R and RStudio
preferably before the first lecture, but definitely before handing in the first tutorial assignment in week 1,
as you will need them!
When you open RStudio, the screen is divided into three main windows, as shown in Figure 1. First, the
console (left), where you can enter commands directly and where output is returned. Below the default text
shown in this window, you can see the “>” sign, followed by the text cursor. This is called the
“prompt”, through which R indicates that it is ready to execute a new command. Second, the environment
window (top right) gives an overview of all objects in memory. Third, in the bottom right window, plots are
returned, help files can be accessed and packages can be downloaded.

First Steps in R
R can be used as a simple calculator. Try to enter
1350 + 6750
in the console. R will give you the following answer:
[1] 8100

If you want to re-execute your last command, you do not have to type it in all over again. Just press
the up arrow key on your keyboard and the last command will reappear. If you want, you can then
adjust this command using the left and right arrow keys.

Creating objects. You can also create objects in R. You could consider an object to be a “box” to which
you give a name and in which you store data such as numbers, vectors, matrices, etc. Suppose that we want
to create the object “x” to which we want to attribute the value “5”. You can do this by executing the
command
x <- 5
where you attribute, through the command <-, the value 5 to the object x.

Figure 1: RStudio and its different windows.

If you would now want to ask R what the object “x” contains, you can simply give the command
x
and R will reply with
[1] 5

Similarly, you can attribute a vector to “x” through the function c() where the elements in the vector
are separated by a comma:
x <- c(3,7,10)
If you now ask R what the object “x” contains, R will reply with
[1] 3 7 10
You can also access individual elements in a vector through the brackets [ ]. For instance: if you would
like to find out what the second element of “x” is, you can give the command
x[2]
and R will reply with
[1] 7

When assigning objects, please take into account that you cannot use object names that belong to R’s
internal vocabulary. For example, you cannot create an object with the name “sqrt”, as this is reserved for
computing the square root of a number. Furthermore, R is case sensitive! “x” and “X” are thus two different
objects!

Logarithm and differences. Throughout the course, we will often make use of logarithmic and/or dif-
ference transformations to make our time series stationary. Let “x” again be defined as the vector described
previously:
x <- c(3,7,10)
You can now create a new object called “log x” that represents the natural logarithm of “x”
log_x <- log(x)
To see the value of “log x”, type in
log_x
in the console and you get
[1] 1.098612 1.945910 2.302585

Similarly, you can define “d x” as the first differences of “x”


d_x <- diff(x)

Figure 2: RStudio and R scripts.

d_x
[1] 4 3
which indeed returns the differences between the consecutive elements in the vector (i.e. 7 − 3 = 4 and
10 − 7 = 3). Finally, let us compute “dlog x” as the first differences of the log-transformed “x”
dlog_x <- diff(log(x))
dlog_x
[1] 0.8472979 0.3566749
Computing such a log-difference transformation will turn out to be useful to obtain growth rates of our time
series, as you will learn during the tutorials.

If you ever need some further documentation on one of R’s functions, for example on the log function,
you can use the question mark functionality:
?log
after which the function documentation pops up in the bottom right window of RStudio.
If you would like to execute a function, for example taking the logarithm of a number, but you do not
know the exact name of the function, you can try:
??logarithm
and several documentations files will be suggested.

R Scripts
Suppose that you have been working in R for several hours, but it is getting late and you want to continue
your work tomorrow. If you would now close R, all of your work would be gone! To avoid this problem, we
will not give R commands directly through the R console, but save them in an R script. Hence, for your
own records, and with a view to preparing your final paper, it is a good idea to keep a systematic record of
your entire workflow through R scripts.
You can open a new R script by clicking in the menu bar (at the top of RStudio) on “File”, “New File”,
“R Script”. The left panel then gets divided into two windows, see Figure 2: The top one is your R script,
the bottom one is the console we have been using until now. In the R script, you can now enter commands
like we did before and execute them by first selecting the command and then clicking “Run” (keyboard
shortcuts are available but depend on your system). When you are finished working in R, save the script (“File”
and then “Save as”). You can later re-open these R scripts in RStudio to continue working with them.
When you write R scripts, the code can become long and hard to read. To clarify your work, you can
use comment lines in R to annotate your code with additional information. Such comment lines should always
start with the “#” sign. You could even include output of your analysis as comments in your R script.
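As a minimal illustration (reusing the example vector from “First Steps in R”), a commented script fragment could look as follows; the header text is of course just a suggestion:

```r
# --- EBC2090, Tutorial 1: first steps (illustrative fragment) ---
# Comment lines start with "#" and are ignored by R.

x <- c(3, 7, 10)   # create a small example vector
log_x <- log(x)    # natural logarithm of each element

log_x
# Output pasted back into the script as a comment:
# [1] 1.098612 1.945910 2.302585
```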

Data Source
The data sets provided to you come from the Penn World Table (PWT) version 10.0. PWT is a secondary
data source, conveniently and freely accessible through the website of the Groningen Growth and
Development Centre (GGDC), where you will also find detailed and up-to-date documentation:
www.rug.nl/ggdc/productivity/pwt.
When using these data, please refer to the following paper: Feenstra, Robert C., Robert Inklaar and
Marcel P. Timmer (2015), “The Next Generation of the Penn World Table”, American Economic Review,
105(10), 3150-3182, available for download at www.ggdc.net/pwt.

Introduction to the Data


Recall the basic GDP identity from macroeconomics:

Y = C + G + I + X − M

where GDP (represented by Y) is decomposed into four major categories: private consumption (C), government
consumption (G), investment (I), and international trade (exports X minus imports M). This identity is
the major foundation for national accounts and holds both in current and in constant prices (apart from an
unavoidable statistical discrepancy):

YU = CU + GU + IU + XU − MU,
YO = CO + GO + IO + XO − MO.

The notation is conventional: YU (YO) for GDP in current (constant) prices; CU (CO) for private consumption
expenditures in current (constant) prices; GU (GO) for government expenditures in current (constant)
prices; IU (IO) for investment expenditures in current (constant) prices; XU (XO) for exports in current
(constant) prices; and MU (MO) for imports in current (constant) prices.
You will use annual data on these macroeconomic aggregates for one specific country and for the longest
time period available, both in current prices (actual values) and in constant prices. Constant prices means
that the value of the macroeconomic aggregate has been recalculated using price levels of some fixed base
year. As a rule, national account data are available for the main OECD countries starting sometime between
1950 and 1970. For developing countries, or in the case of regime changes, data start later. Table 1 provides
an overview of the key variables contained in your data set, together with the acronyms we will use in the
tutorial assignments to refer to a particular variable.
Note that successive observations of a time series are distinguished by a time index t, as in CUt, IUt, YUt.
The index t runs from the first to the last time period (in our case years), and this is written as t = 1, . . . , T.
This notation is only needed in algebraic equations. Computer programs like R never lose track of the time
index but keep it implicit. In the tutorial assignments, we may omit the time indexes when convenient.
In addition to System of National Accounts (SNA) data, you might need variables like the population
size. There are of course many more variables that could potentially be useful. Some main candidates that
are included in the data files are listed in Table 2. Interested students are encouraged to go to the website of
the Penn World Table project and download the raw data files. The constructed files on Canvas are created from
the data sets “pwt100” and “pwt100-na-data”. For every variable we use in our data set, make sure to trace
it back to the appropriate variable in the raw data files.1
While most tutorial assignments only require you to use the variables in the provided data files, more
examples of potentially useful variables are representative interest rates (short or long term, accounting for
both the cost of borrowing and wealth effects); an unemployment rate (to reflect job insecurity); a stock
market index (tracking wealth effects and the economic climate); etc. You may gather more potentially
interesting variables, but make sure to take note of (a) the sources, (b) the definitions, and (c) the units of
measurement of any data that you look up.
If you collect additional data, go ahead and prepare a coherent data set. An Excel spreadsheet can be
useful at this stage, but must be used very carefully so as to ensure reproducibility. When ready this data
set will have to be loaded into R.
1 You should be able to directly retrieve all variables except for the variable KO, which you can construct as
KO = rkna ∗ q gdp/rgdpna; please see www.ggdc.net/pwt for extensive documentation.

Table 1: Overview of key GDP components and identifier variables.

Variable Name Description

COUNTRY Country name (country identifier)
COUNTRYCODE 3-letter ISO country code (country identifier)
CUR Currency unit
YEAR Year (time identifier with equally spaced, unique and recognisable values)

YU GDP (or GNP) at actual, cUrrent national prices


= nominal GDP (or nominal GNP) in macroeconomics texts
YO GDP (or GNP) at base year (2017), cOnstant national prices
= real GDP (or real GNP) in macroeconomics texts
CU Private Consumption valued at actual, cUrrent national prices
= nominal consumption expenditures in macroeconomics texts
CO Private Consumption valued at cOnstant, base year national prices
= real consumption expenditures in macroeconomics texts
GU Government expenditures valued at actual, cUrrent national prices
= nominal Government expenditures (or government consumption)
GO Government expenditures valued at base year, cOnstant national prices
= real Government expenditures (or real government consumption)
IU Investments (Gross Capital Formation) valued at actual, cUrrent national prices
= nominal investment expenditures
IO Investments (Gross Capital Formation) valued at base year, cOnstant national prices
= real investment expenditures
XU eXports of goods and services valued at actual, cUrrent national prices
= nominal exports in macroeconomics texts
XO eXports of goods and services valued in cOnstant, base year national prices
= real exports in macroeconomics texts (also “exports volume”)
MU iMports of goods and services valued at actual, cUrrent national prices
= nominal imports in macroeconomics texts
MO iMports of goods and services valued in cOnstant, base year national prices
= real imports in macroeconomics texts (also “imports volume”)

Table 2: Additional candidate variables.

Variable Name Description

POP Population size (total number of inhabitants, in millions)
KO Physical capital stock valued at base year, cOnstant prices
(constructed by accumulating past investments after depreciation)
EMPL Number of persons engaged/EMPLoyed (in millions)
HCAP Human CAPital index, based on years of schooling and returns to education
EXR Exchange Rate (national currency/USD, market or estimated)

Tutorial 1: Exploring R and Reviewing Regression Analysis
In this tutorial, you will learn to work with R, you will inspect your data through plots and you will review
the basics of regression analysis.

1. Getting started in R.
(a) Read the section “Getting Started in R” at the start of this document to get you started with
this tutorial.
(b) Start by creating a directory on your computer, for instance, “tutorialsEBC2090”. This directory
should contain all files we use in this tutorial assignment.
(c) Go to Canvas and download the data file of your country. This is an .RData file. For instance,
the data file for the Netherlands is “NLD data.RData”. Save the RData file of your country in
the directory on your computer that you have just created (i.e. tutorialsEBC2090 in my case).
(d) Open RStudio and open a new R script. Give it an appropriate name, for instance, “EBC2090-
tutorial1” and save the file. To do this, click (in the menu bar at the top of RStudio) on “File” and
then “Save as”. Enter EBC2090-tutorial1 (or another file name) under “File name” and navigate
to the directory of your choosing (tutorialsEBC2090 in my case) to save the file there. You will
see that the file will be saved as an .R file.
(e) It is good practice to start your R script by clearing your environment in R. This can be done by
typing the following line into your R script
rm(list=ls())
and then pressing “Run” to execute it. To get more information on this function, remember that
you can execute the code ?rm and consult the corresponding documentation in the help-window.
(f) Next, you need to tell R the location of your working directory. This is the directory “tutori-
alsEBC2090” where we will save all the files we use in this tutorial. You can do this by clicking on
“Session”, “Set Working Directory”, and finally “Choose Directory”. Now scroll to the location of
your directory tutorialsEBC2090 and click on Open. You will see that this executes the command
in the form of
setwd("C:/..../tutorialsEBC2090")
in the R console to set the working directory in R. Note that the “...” in the command above will
not appear literally: the specific path depends on the location of the directory on your
computer, and is hence different for everyone. Copy this command from the console into your R script
(on a new line) so that you can re-execute it later!
2. Importing the data.
(a) We are now ready to import our data into R. To load your data set into R, you can type the
command
load("NLD_data.RData")
into your R script and execute it. Naturally, if your country is not the Netherlands, you need to
write the appropriate name of your RData file here but also in the remainder of the exercises! You
should notice that in the environment window (top right panel of RStudio), the object “NLD data”
is now listed. If so, then you have successfully imported your data into R!
(b) Let us now inspect all variables that are included in your data file. To do this, type the command
View(NLD_data)
into your R script and execute it. This opens up a new window (a new tab will appear next
to your R script) with a spreadsheet-type view of your data. Scroll through your data set to
inspect it.
(c) If you want to know the names given to your variables in your data set, you can use
names(NLD_data)
By giving R the command
attach(NLD_data)
you can now address your variables with the names that were given to them!

3. Time series plots. Visually inspect your data. That is the way to get to know your data and to trace
data errors. We start by making time series (line) plots.

(a) R (like any other software package) does not assume by default that your variable is a time
series; instead it treats it as an ordinary numerical variable. We thus need to explicitly declare
that the variable Y U is a time series. In R, create a new time series object, “YU ts”, by using
the function ts:
YU_ts <- ts(NLD_data$YU, start = 1950, frequency = 1)
where we tell R that the data set starts in year 1950 (start = 1950; ! check this, as this may be
different for your country!) and the data are annual (frequency = 1).
(b) Now make a time series plot by using the command
ts.plot(YU_ts)
Discuss the properties of the time series. Note that there are many additional arguments in the
plot function, to change the axis labels, to make the line thicker, etc. You can explore these on
your own via the documentation provided in R.
(c) Optional Tip: It is convenient to save your plots as separate files, such that you can show them in
class or later include them in your paper. To save a figure in, for instance, .pdf format, you first
tell R to open a pdf file, where you give the file a name (time-plot-YU), and also set the width
and the height of the file. On the next line you then write the command for the figure you want
to plot. Then R will fill in the .pdf file with the figure (possibly even several figures on subsequent
pages!). You need to finally tell R (on the third line below) that it should close the .pdf file, as
you do not want to further add content to it:
pdf(file = "time-plot-YU.pdf", width = 6, height = 6)
ts.plot(YU_ts)
dev.off()
If you successfully created and closed the pdf file, it should appear in your working directory and
you can open the pdf file to inspect your plot! Final note: if you want to overwrite the content of
your pdf file (for instance, if you run the code again after noticing a mistake), make sure that
the pdf file is closed on your laptop, otherwise R cannot (over-)write the file!
(d) Now let us plot two time series on the same graph: namely Y U and Y O. You should start by
declaring the latter also as a time series (see instructions above!); give it the name “YO ts”. To
plot several time series on the same plot, you may use:
ts.plot(YU_ts, YO_ts, ylab = "YU versus YO", col = c("blue", "black"))
where we now specified what R should use as label on the vertical axis (via ylab), and where
we indicate that the first series should be visualized in blue, the second in black; you can choose
different colors! To add a legend to your graph, you can execute the following command after
your plot command:
legend("topleft", legend = c("YU", "YO"), col = c("blue", "black"), lty = c(1,1))
where you first indicate the position of the legend; the argument legend then specifies the text
to be displayed, followed by the colors of the lines and the line type. Here lty=c(1,1)
indicates that both lines (hence you use a vector) correspond to line type 1, namely a solid
line. Note that you can also add the lty argument in the ts.plot function; try what happens if you
use lty=c(2,2)...
Discuss the figure. Where do the series cross, and why? Are the series trending over time or do
they fluctuate around a constant mean?
(e) Now repeat the same exercise, hereby plotting Y U , Y O, IU , IO all on one graph! Discuss the
figure as you did above.

4. Scatter plots. Now inspect scatter plots, relating one variable to another. Make a scatter plot of IO
against Y O: the first variable on the vertical axis, the second on the horizontal axis, via the command
plot(x = YO, y = IO)

(a) Describe what a dot in this scatter plot represents.

(b) Observe whether or not a relationship seems to emerge, and whether or not it could be approxi-
mated by a linear function.

5. Simple Regressions. Run a simple regression of investment (in constant prices) on output (in constant
prices) and a constant term:
IOt = β0 + β1 Y Ot + ut . (1)

In R, use the function lm to estimate a linear regression model. You can name the object however you
want, here we give the following name:
fit_IO_on_YO <- lm(IO~YO)
Note that an intercept is included by default. Then ask for a summary of your fitted linear regression
model:
summary(fit_IO_on_YO)

(a) What are the values of the estimates β̂0 and β̂1 ?
(b) Interpret the coefficient β̂1 .
(c) Is the variable Y Ot significant at the 5% significance level? Answer this question in three different
ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 . Note that
the first two are displayed in the output, but you need to compute the 95% confidence interval
manually!
(d) You may also ask R to compute the 95% confidence interval:
confint(fit_IO_on_YO, parm= "YO", level = 0.95)
Verify your manual computation against the R output!
(e) Inspect the overall goodness-of-fit in terms of the R2 . Give its value and interpret it. Do you
notice something unusual?
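For parts (c) and (d), a sketch of the manual confidence interval computation is given below, checked against R's built-in confint. The toy data and the names YO_toy, IO_toy and fit_toy are hypothetical, purely for illustration; in the assignment you would work with your own fitted object fit_IO_on_YO instead.

```r
# Manual 95% CI for the slope, verified against confint().
set.seed(42)
YO_toy <- 1:40
IO_toy <- 5 + 0.25 * YO_toy + rnorm(40)   # hypothetical data
fit_toy <- lm(IO_toy ~ YO_toy)

est <- summary(fit_toy)$coefficients      # Estimate, Std. Error, t, p
b1  <- est["YO_toy", "Estimate"]
se1 <- est["YO_toy", "Std. Error"]

tcrit <- qt(0.975, df = fit_toy$df.residual)  # n - 2 degrees of freedom
ci_manual <- c(b1 - tcrit * se1, b1 + tcrit * se1)

# R's built-in computation gives the same interval:
ci_r <- confint(fit_toy, parm = "YO_toy", level = 0.95)
all.equal(as.numeric(ci_r), ci_manual)        # TRUE
```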

6. Residual Inspection. Closely inspect the residuals of your estimated regression of IOt on Y Ot . In R,
the residuals of the regression are saved in your fitted regression object fit IO on YO (together with a
lot of other useful information). To see what information is saved in the fitted object, ask for:
names(fit_IO_on_YO)
you will notice that there is a slot “residuals” which indeed contains the residuals of the estimated
regression. You can access any output slot in the fitted regression object through the dollar symbol,
hence to plot the residuals, use
plot(fit_IO_on_YO$residuals, type = "l")
where the type argument indicates that you want to make a line graph.

(a) Examine the time pattern of the residuals. Does it seem that the residual series is distributed in
accordance with the assumption of random sampling?
(b) Plot the residual series against Y Ot in a scatter plot (the former on the vertical axis, the latter
on the horizontal axis). Does a visual inspection of the residual series suggest that they satisfy
the assumption of constant variance (homoskedasticity)?
(c) Plot a histogram summarising the frequency distribution of the residuals using the command
hist(fit_IO_on_YO$residuals)
Does the assumption of normality seem plausible?
(d) Formally test whether the residuals are normally distributed using the Jarque-Bera test, which
tests the null of normality. It measures how much the skewness (asymmetry) and the kurtosis
(curvature and tail thickness) of the residual series differ from those of a Normal distribution.
This test is contained in a specific R library, namely the library tseries. So first we need to
install this library in R. You can do so by going to the bottom right panel in RStudio, clicking
in the menu bar on “Packages”, then “Install”. Type the name of the package, namely
tseries, and click on “Install”. R will install the package for you. Once this is done, you

need to load the library into R, such that you can access the functions in this library. To load the
library in R, use the command
library(tseries)
You can now perform the Jarque-Bera test:
jarque.bera.test(fit_IO_on_YO$residuals)
What is your conclusion?

Important note on R libraries: Installing a library only needs to be done once, but in case you
would like to make use of functions in the library, you need to load the library in every R session!

7. Log-Log Specification. Consider the simple log-log regression model:

ln IOt = β0 + β1 ln Y Ot + ut . (2)

(a) Generate the variable ln IOt which is the natural logarithm of the variable IOt :
lnIO_ts <- log(IO_ts)
Do the same for ln Y Ot .
(b) Make a time series plot of ln IOt , ln Y Ot . How does this plot compare/differ to the one you made
of IOt , Y Ot ? Can you think of reasons to apply the log-transformation?
(c) Make a scatter plot of ln IOt , ln Y Ot . Do you see a relationship emerging, and could it be
approximated by a linear function?
(d) Estimate the simple regression model in equation (2).
(e) What are the values of the estimates β̂0 and β̂1 ?
(f) Interpret the coefficient β̂1 . Be careful!, it has a different interpretation than in regression (1)!
(g) Is the variable ln Y Ot significant at the 5% significance level? Answer this question in three different
ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 .
(h) Give the value of the R2 and interpret it.
(i) Inspect the residuals of the log-log model as you did in part 6. What are your conclusions?

8. Specification in Log-Differences. Consider the regression model in log-differences:

∆ ln IOt = β0 + β1 ∆ ln Y Ot + ut . (3)

where ∆ ln IOt = ln IOt − ln IOt−1 . The term “log-difference” is short for (first) logarithmic difference.
A log-difference should be interpreted as a rate of change (or growth rate). Log-differences are often
preferable to ordinary percentage changes because, unlike percentage changes, they are additive and
symmetric. It is important to understand the properties of logarithms! (See e.g. Appendix A.4 of
Wooldridge.)
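The additivity and symmetry claims can be seen in a small numerical sketch. The series below is entirely hypothetical: it rises by 10% and then falls back to 1% below its starting level.

```r
# Log-differences versus ordinary percentage changes.
x <- c(100, 110, 99)                  # made-up series for illustration

pct_change <- diff(x) / head(x, -1)   # ordinary percentage changes
log_diff   <- diff(log(x))            # log-differences

# Additivity: log-differences sum to the total log change
sum(log_diff)                         # equals log(99/100), about -0.01005
log(x[3] / x[1])

# Percentage changes do not: +10% followed by -10% sums to zero,
# even though the series actually ended 1% below where it started
sum(pct_change)                       # 0
(x[3] - x[1]) / x[1]                  # -0.01
```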

(a) Generate the variable dlnIO which is the first difference of lnIO using the command:
dlnIO_ts <- diff(log(IO_ts))
Do the same for ∆ ln Y Ot .
(b) How many observations are available for the variable IOt and how many are available for ∆ ln IOt ?
Explain the difference!
(c) Make a time series plot of ∆ ln IOt , ∆ ln Y Ot . How does this plot compare/differ to the one you
made of ln IOt , ln Y Ot ?
(d) Make a scatter plot of ∆ ln IOt , ∆ ln Y Ot . Do you see a relationship emerging, and could it be
approximated by a linear function?
(e) Estimate the simple regression model in equation (3).
(f) What are the values of the estimates β̂0 and β̂1 ?

(g) Interpret the coefficient β̂1 .
(h) Is the variable ∆ ln Y Ot significant at the 5% significance level? Answer this question in three different
ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 .
(i) Give the value of the R2 and interpret it. Do you observe a difference compared to the earlier
regressions you ran?
(j) Inspect the residuals of the model in log-differences as you did in part 6. What are your conclu-
sions?

Tutorial 2: Basic Time Series Regressions
In this tutorial, you will discuss basic time series concepts and time series regressions while revising how to
perform a joint hypothesis test. By default, we work with a significance level of 5% in this tutorial and in
the next ones!
!! Important Reminder: We will provide initial help for a specific function only the first time that you
need to make use of it. Hence, functions introduced in the previous tutorial assignment(s) will not be
explained again here. You can always look back at the previous tutorial assignments if you no longer
remember which functions to use !!

1. Setting up R. Set up your R script as you did for the first tutorial: set your working directory and
import your data into R.
2. Visual Inspection of Stationarity.
(a) Make a time series plot of ln IOt and ln Y Ot . Are these time series stationary? Discuss.
(b) Make a correlogram of ln IOt and ln Y Ot . Discuss the values of the autocorrelations at the first
couple of lags.
In R, use the command
acf(lnIO_ts)
to display the correlogram of a particular time series (assuming you created a time series object
for the log-transformed IO variable).
(c) Do the same for ∆ ln IOt and ∆ ln Y Ot : discuss stationarity based on the time series plot and
discuss the values of the autocorrelations.

3. Autoregressive Model for ln IOt . Consider the AutoRegression of order 1, denoted AR(1), for ln IOt :

ln IOt = β0 + β1 ln IOt−1 + ut . (4)

(a) To generate the response and predictor variable in regression model (4), you can make use of the
function embed:
lags_lnIO <- embed(lnIO, dimension = 2)
which generates a new matrix where the response ln IOt is contained in the first column and the
predictor ln IOt−1 is contained in the second column. More lags can be obtained by adjusting the
argument dimension.
(b) Inspect the newly created matrix lags lnIO via the function View. How many observations (rows)
does the matrix lags lnIO have?
(c) You can then access the first column in a matrix via [, 1], and the second column via [, 2] to
generate the response and predictor needed for estimating model (4):
lnIO_0 <- lags_lnIO[, 1]
lnIO_1 <- lags_lnIO[, 2]
where we use the suffix x to denote the xth lag of a certain variable (so 0 for the contemporaneous
value and 1 for the first lag).
(d) Estimate the AR(1) model in equation (4) using the function lm.
(e) Interpret the value of the estimate β̂1 .
(f) What does the value of the estimate β̂1 tell you about the stationarity of the series ln IOt ?
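Steps (a) through (d) can be sketched on a simulated series whose AR(1) coefficient we know in advance. The series z and the names below are hypothetical; in the assignment you apply the same steps to lnIO.

```r
# AR(1) estimation via embed() and lm(), on a simulated series
# with known autoregressive coefficient 0.8.
set.seed(3)
z <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))

lags_z <- embed(z, dimension = 2)   # column 1: z_t, column 2: z_{t-1}
z_0 <- lags_z[, 1]                  # response
z_1 <- lags_z[, 2]                  # predictor (first lag)

fit_ar1 <- lm(z_0 ~ z_1)
coef(fit_ar1)["z_1"]                # estimate should be close to 0.8
```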

4. Autoregressive Model for ∆ ln IOt . Consider the AR(1) model for ∆ ln IOt :

∆ ln IOt = β0 + β1 ∆ ln IOt−1 + ut . (5)

(a) Generate the variable dlnIO (the first differences of lnIO). Then generate the variables dlnIO 0
and dlnIO 1 (respectively the response and predictor in equation (5)) via the function embed.

(b) Estimate the AR(1) model in equation (5). How many observations are used to estimate this
model?
(c) Interpret the value of the estimate β̂1 .
(d) What does the value of the estimate β̂1 tell you about the stationarity of the series ∆ ln IOt ?
We will return to the topic of stationarity, unit roots and unit root tests in Tutorial 3! In this tutorial,
let us consider static time series regression models, finite distributed lag models and autoregressive
distributed lag models for ln IOt .

5. Static Model. Return to the log-log regression model for ln IOt :

ln IOt = β0 + β1 ln Y Ot + ut . (6)

(a) Explain what a static time series regression means and why the regression in equation (6) is one.
(b) Estimate model (6). Make a (i) line plot of the residuals, as well as a (ii) correlogram of the
residuals. Are the residuals autocorrelated?
(c) In case the residuals are autocorrelated: does this cause the OLS estimator to be biased?
(d) In case the residuals are autocorrelated: does this cause problems for inference (t-statistics, p-
values, ...)? If so, can you think of solutions to circumvent this problem?

6. Finite Distributed Lag Model. Estimate the Finite Distributed Lag model of order one, denoted as
FDL(1):
ln IOt = β0 + β1 ln Y Ot + β5 ln Y Ot−1 + ut . (7)

Note: It will become clear later on in the assignment why we use β5 and not β2 in front of ln Y Ot−1 .
(a) Explain what a dynamic time series regression means and why the regression in equation (7) is
one.
(b) Generate the variables lnYO 0 and lnYO 1 by using the function embed.
(c) Estimate the FDL(1) model in equation (7). What would happen if you execute the following
code in R:
fit_FDL <- lm(lnIO ~ lnYO_0 + lnYO_1)
(d) To remove the first observation from a vector, you may use the notation [-1]. Generate the new
variable:
lnIO_0 <- lnIO[-1]
Discuss why the following regression will give you the desired outcome:
fit_FDL <- lm(lnIO_0 ~ lnYO_0 + lnYO_1)
Note: we have now over-written the variable lnIO_0 since we also generated this variable in
Assignment 3(c) above. But in fact, the definition of lnIO_0 here and the one given in Assignment
3(c) give you exactly the same result. Discuss!
(e) Based on the regression output, (manually) draw a picture of the lag distribution: summarizing
the effect of ln Y O on ln IO at lag zero, one and two.
(f) What is the value of the estimated impact multiplier?
(g) What is the value of the estimated long-run multiplier?
(h) Test the joint null hypothesis H0 : β1 = β5 = 0 versus the alternative that at least one of the two
betas is different from zero.
In R, this joint hypothesis test, where we test the joint nullity of all regression parameters (apart
from the intercept), is by default reported in the summary output of your lm object, namely on
the last line. What is the value of the F -statistic? What is the corresponding p-value? What do
you conclude?
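The sample alignment of parts (b)–(d) can be rehearsed on simulated stand-in series (the names lnIO and lnYO below mirror the tutorial's conventions but the data are artificial):

```r
set.seed(1)
n <- 50
lnYO <- cumsum(rnorm(n))              # simulated log output
lnIO <- 2 + 0.5 * lnYO + rnorm(n)     # simulated log investment

E <- embed(lnYO, 2)
lnYO_0 <- E[, 1]                      # ln YO_t for t = 2, ..., n
lnYO_1 <- E[, 2]                      # ln YO_{t-1}
lnIO_0 <- lnIO[-1]                    # drop the first observation to align samples

# the two ways of dropping the first observation coincide exactly:
stopifnot(identical(lnIO[-1], embed(lnIO, 2)[, 1]))

fit_FDL <- lm(lnIO_0 ~ lnYO_0 + lnYO_1)
summary(fit_FDL)$fstatistic           # the joint F-test reported on the last line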

7. Recap Multiple Hypothesis Testing. Consider the choice between current and constant prices. Extend-
ing the investment function with price indexes allows a formal comparison between nominal and real
specifications, by means of statistical hypothesis tests. We will work with the implicit price deflators
P Yt = Y Ut /Y Ot    and    P It = IUt /IOt .
(a) Run the extended regression of real investment on a constant, current and lagged real output,
both price indexes, and one lagged price index:

ln IOt = β0 + β1 ln Y Ot + β2 ln P It + β3 ln P Yt + β4 ln P It−1 + β5 ln Y Ot−1 + ut . (8)

You need to generate all your variables first. Assume that you name the fitted regression model
in (8) fit_lnIO_ur. Then present your regression output and test the separate hypotheses that
each price coefficient (β2 , β3 , β4 ) is in fact zero.
(b) Now consider the hypothesis that the price coefficients in equation (8) are all three zero:

H0 : β 2 = β 3 = β 4 = 0

(versus the alternative that at least one is different from zero). Note that if the hypothesis is true,
then regression (8) reduces to the regression (7).
Give the formula in Wooldridge to test this joint hypothesis.
(c) We will start by computing the F -statistic manually in R. You will have to compute the sum of
squared residuals (SSR) of two regression models. Which regression models? You can use the
following code to obtain the SSR of, for instance, regression model (8):
SSR_ur <- sum(fit_lnIO_ur$residuals^2)
What is the value of the F -statistic? What are the degrees of freedom? Do you reject the null
hypothesis H0 : β2 = β3 = β4 = 0 or not?
Note that you can compute the critical values of the F-distribution in R via the function qf. Use
the documentation in R to appropriately fill in the arguments in the function to compute the
critical value for your country!
(d) We can also opt to directly perform the F -test in R. To this end, we can use the linearHypothesis
function in the R package car. Start by installing the package car. You can then use the following
command to directly obtain the F -test:
linearHypothesis(fit_lnIO_ur, c("lnPI_0=0", "lnPY_0=0", "lnPI_1=0"), test="F")
where you simply write out the restrictions under the null hypothesis by referring to the variable
names of the corresponding parameters. The argument test="F" ensures that you compute the
F -test. Does the output provided by R match with your manual computation? Interpret the
output.
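The manual F computation of part (c) can be sketched end-to-end; the data and fitted objects below are simulated stand-ins (not your actual fit_lnIO_ur), with three truly irrelevant regressors mimicking the three zero restrictions under H0:

```r
# Simulated stand-in data: y depends on x1 and x5 only; x2, x3, x4 are
# irrelevant, mimicking the three zero restrictions of H0
set.seed(42)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n); x5 <- rnorm(n)
y  <- 1 + 0.8 * x1 + 0.4 * x5 + rnorm(n)

fit_ur <- lm(y ~ x1 + x2 + x3 + x4 + x5)   # unrestricted model, cf. eq. (8)
fit_r  <- lm(y ~ x1 + x5)                  # restricted model,   cf. eq. (7)

SSR_ur <- sum(fit_ur$residuals^2)
SSR_r  <- sum(fit_r$residuals^2)
q   <- 3                                   # number of restrictions under H0
df2 <- fit_ur$df.residual                  # n - k - 1
F_stat <- ((SSR_r - SSR_ur) / q) / (SSR_ur / df2)
crit   <- qf(0.95, df1 = q, df2 = df2)     # 5% critical value via qf
p_val  <- pf(F_stat, q, df2, lower.tail = FALSE)
```

As a cross-check, anova(fit_r, fit_ur) reports the same F-statistic and p-value.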

(e) Next, an important hypothesis is that of price homogeneity, i.e., the theory that absolute price
levels are unimportant and instead only relative prices matter. Price homogeneity is a theoretical
property considered as desirable in economic models, at least in the long run. It implies the
absence of money illusion. Define the relative price of investment goods and services, P IRt :
P IRt ≡ P It /P Yt .
Under strict price homogeneity, the three price indexes in regression (8) may be replaced by
the single relative price, P IRt :

ln IOt = γ0 + γ1 ln Y Ot + γ2 ln P IRt + γ5 ln Y Ot−1 + ut . (9)

Write down the null hypothesis of strict price homogeneity in terms of the regression coefficients
of (8) (so in terms of the βs!). To this end, start from equation (9) and plug in the definition of P IRt .
Which restrictions on the regression coefficients of (8) arise then?

(f) Test the null hypothesis of strict price homogeneity using the linearHypothesis function in R.
What do you conclude?

(g) A weaker version of price homogeneity would allow for short-run deviations from strict homogeneity.
One way to do this is to introduce an effect of investment price inflation dlnPIt = ∆ ln P It next
to relative prices:

ln IOt = γ0 + γ1 ln Y Ot + γ2 ln P IRt + γ3 ∆ ln P It + γ5 ln Y Ot−1 + ut . (10)

Here absolute price levels still play no role, but the rhythm of price changes does; price homo-
geneity holds only in the longer run.
Write down the hypothesis of weak price homogeneity in terms of the regression coefficients of (8)
(so in terms of the βs!). To this end, start from equation (10) and plug in the definition of P IRt
and ∆ ln P It . Which restrictions on the regression coefficients of (8) arise then?
(h) Test the null hypothesis of weak price homogeneity. What do you conclude?

(i) Finally, test the hypothesis that (8) simplifies to a simple relation between nominal investment
and nominal output:
ln IUt = γ0 + γ1 ln Y Ut + ut . (11)

This hypothesis too implies joint coefficient restrictions, and you need to find out precisely what
these restrictions are. Start from equation (11) and plug in the definitions of IUt and Y Ut . Which
restrictions on the regression coefficients of (8) arise then (so in terms of the βs!)?
(j) Test the null hypothesis you derived in part (i). What do you conclude?

8. AutoRegressive Distributed Lag Model. Consider the ARDL(1,1) model

ln IOt = β0 + β1 ln Y Ot + β2 ln Y Ot−1 + β3 ln IOt−1 + ut . (12)

(a) Explain why the regression in equation (12) is a dynamic time series regression.
(b) Explain in words (so intuitively) the difference between the FDL(1) in equation (7) and the
ARDL(1,1) in equation (12).
(c) Estimate the ARDL(1,1) model in R using the lm function.
(d) What is the value of the impact multiplier?
(e) What is the value of the long-run multiplier? To this end, start from the equilibrium model (see
the model with ∗ notation on the lecture slides) and solve for the coefficient in front of ln Y O.
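Once the ARDL(1,1) is estimated, both multipliers can be read off its coefficients. A sketch on simulated data (the object name fit_ARDL and the variable names below are assumptions mirroring the tutorial's conventions, not your actual series):

```r
# Simulated ARDL(1,1) data standing in for your country's series
set.seed(7)
n <- 80
lnYO <- cumsum(rnorm(n))                      # simulated log output
u <- rnorm(n, sd = 0.1)
lnIO <- numeric(n)
for (t in 2:n)                                # true coefficients 0.2, 0.5, 0.1, 0.3
  lnIO[t] <- 0.2 + 0.5 * lnYO[t] + 0.1 * lnYO[t - 1] + 0.3 * lnIO[t - 1] + u[t]

lnIO_0 <- lnIO[-1]; lnIO_1 <- lnIO[-n]        # y_t and y_{t-1}
lnYO_0 <- lnYO[-1]; lnYO_1 <- lnYO[-n]        # x_t and x_{t-1}
fit_ARDL <- lm(lnIO_0 ~ lnYO_0 + lnYO_1 + lnIO_1)

b <- coef(fit_ARDL)
impact   <- b[["lnYO_0"]]                                          # (d) beta_1
long_run <- (b[["lnYO_0"]] + b[["lnYO_1"]]) / (1 - b[["lnIO_1"]])  # (e)
```

The long-run formula is the coefficient of ln Y O you obtain after solving the equilibrium model, as asked in part (e).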

Tutorial 3: Unit Roots, Trends, Unit Root Tests and Spurious
Regressions
In this tutorial, you will learn how to perform unit root tests, how to determine the order of integration of
a series, how to recognize spurious regressions and you will dive into one of its solutions: ARDL models.
1. Setting up R. Set-up your R script as you did for the previous tutorials, so: set your working directory,
and import your data into R.
2. Visual Inspection of Stationarity (Recap)
(a) Make a time series plot of ln IOt and ln Y Ot . Are these time series likely to be stationary?
Discuss.
(b) Make a time series plot of ∆ ln IOt and ∆ ln Y Ot . Are these time series likely to be stationary?
Discuss.

3. Visual Inspection of Trends. Consider the regression model for ln IOt with a trend:

ln IOt = β0 + β1 t + ut . (13)

For the trend t you can generate a variable trend in R via the commands:
n <- length(lnIO)
trend <- 1:n
where the function length returns the length of the variable (hence it gives you the sample size n),
and the command 1:n simply returns you a sequence of numbers from 1 to n in steps of one.
(a) Estimate regression model (13) using the lm function and carefully interpret the estimated coef-
ficient β̂1 .
(b) Save the residual of model (13). What do the residuals intuitively represent?
(c) Make a time series plot of the residual series. Do you think it is likely that ln IOt has a deter-
ministic or a stochastic trend? Explain the difference between both in your answer!
(d) If a series is trend stationary, what does this mean? Does it then have a deterministic or a
stochastic trend?

(e) Repeat the same exercise for ln Y Ot . What is your conclusion: is ln Y Ot likely to have a deter-
ministic or a stochastic trend?

4. Dickey-Fuller Unit Root Test (with constant and trend). To formally test whether a series has a
stochastic or a deterministic trend, we need to perform a unit root test with constant and trend.

(a) What is the null hypothesis of this unit root test? What is the alternative hypothesis?
(b) We start by running the Dickey-Fuller (DF) test (with constant and trend) for ln IOt .
In R, start by installing the package bootUR, which offers a wide range of unit root tests. After
loading the library, you can then use the commands
df_lnIO <- adf(lnIO, deterministics = "trend", max_lag = 0)
df_lnIO
to perform a Dickey-Fuller unit root test with a constant and trend term included (deterministics
= "trend") and with zero lagged first-difference terms (max_lag = 0). Present the output of the
unit root test for ln IOt . How should you interpret it?
(c) What is your conclusion for ln IOt : does it have a stochastic or a deterministic trend?
(d) How to proceed in case of a stochastic trend? How to proceed in case of a deterministic trend?

(e) Repeat the same exercise for ln Y Ot : does it have a stochastic or a deterministic trend?

5. Augmented Dickey-Fuller Unit Root Test (with constant and trend). Now consider the “Augmented”
Dickey-Fuller (ADF) unit root test.
(a) How does the ADF unit root test differ from the DF test? Why is the augmentation needed?
(b) Run the ADF test for ln IOt .
In R, use the commands
adf_lnIO <- adf(lnIO, deterministics = "trend")
adf_lnIO
This function automatically includes lagged difference terms in the test equation, using the
Akaike Information Criterion (AIC) to determine how many of these terms should be added.
Present the output of the unit root test for ln IOt . What is your conclusion for ln IOt : does it
have a stochastic or a deterministic trend?

(c) Repeat the same exercise for ln Y Ot . What do you conclude?

6. Bootstrap union of rejection test. In the previous exercise, we used the ADF test as a unit root test,
which is by far the most popular unit root test. Still, the ADF test requires us to specify which
deterministic components to include in the test equation (a constant and a trend in case the series
displays a trend; a constant only when the series displays no trend). To relieve the user of making
this choice (in case it is not so clear cut), you may use the union of rejections test instead. The null
hypothesis and alternative hypothesis stay the same as before.
(a) Run the test via the command:
union_lnIO = boot_union(lnIO)
Present the output of the test for ln IOt . What is your conclusion for ln IOt : does it have a
stochastic or a deterministic trend?
(b) Repeat the same exercise for ln Y Ot . What do you conclude?

7. Unit Root Test on the series in log-differences. The series ln IOt or ln Y Ot will never be stationary (at
most trend-stationary). (Remind yourself why this is the case!). We now test whether the series in
log-differences are stationary.

(a) Perform the union of rejections test on ∆ ln IOt . What is the null hypothesis? What is the
alternative hypothesis? Present your output of the test. How should you interpret it?
(b) After having run the unit root tests on ln IOt and ∆ ln IOt , what do you conclude about the order
of integration of ln IOt ?
Explain the difference between a series that is I(1) (“integrated of order one”) and one that is
I(0) (“integrated of order zero”) in your answer!

(c) Repeat the same exercise for ∆ ln Y Ot .

8. Static Regression for the series in log-levels (revisited) and Spurious Regressions. Re-consider the static
regression model for the series in log-levels:

ln IOt = β0 + β1 ln Y Ot + ut . (14)

(a) Given the outcome of your unit root tests, is the static regression model (14) possibly a spurious
regression?
Explain what a spurious regression means and what drives this!
(b) Is it “safe” to interpret the regression output of model (14)?
(c) Re-inspect the value of the R2 . Is it spurious? Should we interpret it?
(d) What are solutions to the spurious regression problem?

(e) Which solutions have we considered already in earlier tutorials, which haven’t we considered yet?

9. Static Regression for the series in first differences and Spurious Regressions. Re-consider the static
regression model for the series in first differences:

∆ ln IOt = β0 + β1 ∆ ln Y Ot + ut . (15)

(a) Given the outcome of your unit root tests, is the static regression model (15) possibly a spurious
regression?
(b) Is it “safe” to interpret the regression output of model (15)?
(c) Re-inspect the value of the R2 . Is it spurious? Should we interpret it?
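The contrast between the levels regression (14) and the first-difference regression (15) can be illustrated with two independent random walks; the data below are purely simulated and unrelated to your series:

```r
set.seed(12)
n <- 200
x <- cumsum(rnorm(n))   # two independent random walks:
y <- cumsum(rnorm(n))   # y is generated without any reference to x

r2_levels <- summary(lm(y ~ x))$r.squared              # often deceptively large
r2_diff   <- summary(lm(diff(y) ~ diff(x)))$r.squared  # typically close to zero
```

Even though y and x are unrelated by construction, the levels regression tends to produce a sizeable R² and seemingly significant t-statistics; the regression in first differences does not. This is exactly the spurious-regression phenomenon these exercises ask you to diagnose.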

10. ARDL models: short-run and long-run effects. In this tutorial, we will zoom into one of the solutions
for spurious regression problems, namely ARDL models. Consider the ARDL model

yt = β0 + β1 xt + β2 xt−1 + β3 yt−1 + ut (16)

where you may take yt ≡ ln IOt and xt ≡ ln Y Ot .


(a) Revisit the assumptions needed for OLS to be unbiased or consistent. Can we still rely on strict
exogeneity of the regressors in ARDL models? Why (not)?
(b) Estimate the ARDL(1,1) model. Note that you have estimated this model already in Tutorial 2!
Below, we further investigate the estimation output.

We now examine how a permanent rise in xt (a “permanent shock”) affects the conditional mean
of yt in the following years. Define three time horizons: the same year as the shock (short-run),
one year later (medium-run), and many years later (long-run), with corresponding effects analyzed
below.
(c) Short-run. Define the same-year effect, known as the “impact multiplier”, as

θ1 ≡ ∂/∂xt E (yt | xt , yt−1 , xt−1 , . . .) . (17)

Finding the impact multiplier for the ARDL(1,1) model should be easy (you did this already in
Tutorial 2)! It is simply the instantaneous partial derivative:

θ1 ≡ ∂/∂xt E (yt | xt , . . .) = β1 .

What is the value of the impact multiplier for the ARDL(1,1) model you estimated?
(d) Medium-run. Define the cumulative effect after two years, known as the “two-year (interim)
multiplier”, as
 
θ2 ≡ (∂/∂xt + ∂/∂xt−1 ) E (yt | xt , yt−1 , xt−1 , . . .)

   = θ1 + ∂/∂xt−1 E (yt | xt , yt−1 , xt−1 , . . .) . (18)
This is the sum of the impact multiplier and the second-year partial effect of the shock.
To obtain the two-year (interim) multiplier for model (16), start by substituting away yt−1 as
follows:

yt = β0 + β1 xt + β2 xt−1 + β3 (β0 + β1 xt−1 + β2 xt−2 + β3 yt−2 + ut−1 ) + ut


= β0 (1 + β3 ) + β1 xt + (β2 + β3 β1 ) xt−1 + β3 (β2 xt−2 + β3 yt−2 + ut−1 ) + ut .

From this expression, you can easily obtain the second-year partial effect as the partial derivative

∂/∂xt−1 E (yt | xt , . . .) = β2 + β3 β1 ,

i.e., the coefficient in front of xt−1 . The two-year multiplier is found as the sum of these two partials,

θ2 ≡ β1 + β2 + β3 β1 .

What is the value of the two-year multiplier for the ARDL(1,1) model you estimated?
(e) Long-run. Define the cumulative long-run effect, known as the “total multiplier” as

θ∞ ≡ ∑∞i=0 ∂/∂xt−i E (yt | xt , yt−1 , xt−1 , . . .) . (19)

This is the sum of all partial effects, at impact and in the entire sequel of years.
To determine long-run effects in a model, we establish whether the model admits a state where all
variables have converged to some static “equilibrium” level. See what happens when you drop all
time subscripts and replace them by stars (to indicate constant equilibrium values), then solve the
resulting relationship for the dependent variable. For instance, the ARDL(1,1) model becomes

y∗ = β0 + β1 x∗ + β2 x∗ + β3 y∗ + u∗ .

Setting u∗ = 0, we can solve for y∗ :


y∗ = β0 /(1 − β3 ) + ((β1 + β2 )/(1 − β3 )) x∗ = β0 /(1 − β3 ) + θ∞ x∗ . (20)

This is a stationary state (assuming |β3 | < 1) which can be viewed as a hypothetical long-run
equilibrium of the model. The coefficient of x∗ , here denoted as θ∞ = (β1 + β2 )/(1 − β3 ), is the
long-run effect on y∗ of shocks in the explanatory variable (cf. Wooldridge § 10.2, Problem 10.3).
What is the value of the long-run multiplier for the ARDL(1,1) model you estimated?

(f) Now, let us investigate whether the impact, two-year and long-run multipliers are significantly
different from zero.
Obtain the standard error of the estimated impact multiplier, this should be easy. Is the impact
multiplier significantly different from zero?
(g) Obtaining the standard error for the two-year and long-run multipliers is more difficult. Let us
consider the two-year multiplier.
To obtain a standard error for the two-year multiplier estimate θ̂2 , you need to apply the reshuffling
or “theta trick” (Wooldridge § 4.4). In this example, the reshuffling trick is to substitute out one
of the β 0 s, say β1 , from the estimating equation (16) in favor of θ2 , using
β1 = (θ2 − β2 )/(1 + β3 ),

and then to rearrange the terms in (16) so as to estimate θ2 :


yt = β0 + ((θ2 − β2 )/(1 + β3 )) xt + β2 xt−1 + β3 yt−1 + ut .

The above equation is nonlinear in its coefficients. To implement the last nonlinear regression in
R, you need to use the function nls and enter it as an explicit algebraic equation in your software:
nls_theta2 = nls(lnIO_0 ~ beta0 + ((theta2 - beta2)/(1 + beta3))*lnYO_0 + beta2*lnYO_1 +
beta3*lnIO_1, start = list(beta0 = 1, theta2 = 1, beta2 = 1, beta3 = 1))
The estimated coefficient “theta2” is a direct estimate of the two-year multiplier θ2 and its standard
error is reported along with the estimate! Note that we provide some starting values (the values
in the list), since nonlinear estimation procedures are iterative. In the code above,
we initialize all parameters at one, but you can use their actual values since you have computed
these before! So you can use these as starting values to ensure faster convergence of the non-linear
least squares estimation. As a double check: verify that the estimated values of the betas and the
thetas in the output of the nls estimation coincide with the values you obtained above, as they
should!
Implement this procedure to get the standard error for θ̂2 . Is the two-year multiplier significant?
(h) Implement a similar “theta” trick to get the standard error for θ̂∞ . Start by re-expressing β1 in
favor of θ∞ . Which expression do you get? Run the non-linear regression to get the standard
error. Is the long-run multiplier significant?
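The theta trick of part (g) can be rehearsed end-to-end on simulated data; the series and true coefficients below are artificial, and with your own data you would replace them by lnIO and lnYO:

```r
# Simulated ARDL(1,1) data (true coefficients 0.2, 0.5, 0.1, 0.3 are arbitrary)
set.seed(9)
n <- 100
lnYO <- cumsum(rnorm(n))
u <- rnorm(n, sd = 0.1)
lnIO <- numeric(n)
for (t in 2:n)
  lnIO[t] <- 0.2 + 0.5 * lnYO[t] + 0.1 * lnYO[t - 1] + 0.3 * lnIO[t - 1] + u[t]

lnIO_0 <- lnIO[-1]; lnIO_1 <- lnIO[-n]
lnYO_0 <- lnYO[-1]; lnYO_1 <- lnYO[-n]

# linear ARDL(1,1) fit supplies starting values and the algebraic theta_2
b <- coef(lm(lnIO_0 ~ lnYO_0 + lnYO_1 + lnIO_1))
theta2_lin <- b[["lnYO_0"]] * (1 + b[["lnIO_1"]]) + b[["lnYO_1"]]

# nonlinear reparametrization: theta2 is estimated directly, with its SE
nls_theta2 <- nls(lnIO_0 ~ beta0 + ((theta2 - beta2)/(1 + beta3)) * lnYO_0 +
                    beta2 * lnYO_1 + beta3 * lnIO_1,
                  start = list(beta0 = b[[1]], theta2 = theta2_lin,
                               beta2 = b[["lnYO_1"]], beta3 = b[["lnIO_1"]]))
summary(nls_theta2)   # the "theta2" row gives the estimate and its std. error
```

Using the linear-fit values as starting values, as the tutorial suggests, makes the nls estimates coincide with the algebraic ones, which is the double check described above.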
