
Time Series Linear Models

Assoc. Prof. Woraphon Yamaka


CEE and ECON
Two Objectives of Time Series Models
• Forecasting Future Values: Time series models aim to predict future
values based on historical data. By identifying patterns and trends in
past observations, these models can generate forecasts that help in
decision-making and planning.
• Understanding Temporal Relationships: Another goal is to uncover
and understand the underlying temporal relationships within the
data. This involves analyzing how different variables interact over
time and identifying the factors driving changes in the observed time
series.
The differences between statistical modeling
and machine learning
• Statistical modeling: a formalization of relationships between variables in the form of mathematical equations ($Y = X\beta + u$). Machine learning: an algorithm that can learn from the data without relying on rule-based programming.
• Statistical modeling requires a model distribution assumption. Machine learning does not need to assume an underlying shape, as machine learning algorithms can learn complex patterns automatically from the provided data.
• Statistical modeling performs various diagnostics of the parameters, such as p-values, confidence intervals, and t-tests. Machine learning does not perform statistical significance tests.
• Statistical modeling draws on a statistics and mathematics background; machine learning draws on computer science.
Forecasting Future Values

• 1. Collection of data.
• 2. Data preparation and missing/outlier treatment.
• 3. Data analysis and feature engineering: the data need to be analyzed in order to find any hidden patterns, relations between variables, and so on.
• 4. Train the algorithm on training and validation data: the data are divided into three subsets (training, validation, and test data) to guarantee forecasting validity; see the sketch under the next heading.
• 5. Test the algorithm on test data: once the model has shown good enough performance on the training and validation data, its performance is checked against unseen test data. If the performance is still good enough, we can proceed to the next and final step.
Training, validation, and test data
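A minimal sketch of the three-subset split from the previous list, using simulated data (for time series, the split is usually chronological rather than random):

set.seed(1)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
idx <- sample(seq_len(n))                 # shuffle row indices
train <- dat[idx[1:700], ]                # 70% training
valid <- dat[idx[701:850], ]              # 15% validation
test  <- dat[idx[851:1000], ]             # 15% test
fit <- lm(y ~ x, data = train)            # train on the training set
mean((valid$y - predict(fit, valid))^2)   # compare candidate models on validation MSE
mean((test$y - predict(fit, test))^2)     # report final performance on test MSE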

Machine learning utilizes optimization for tuning all the parameters of various algorithms.
Hence, it is a good idea to know some basics about optimization.
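As a small, self-contained illustration (a sketch, not from the slides) of what such an optimizer does: gradient descent on the least-squares objective recovers the regression coefficients that lm would give.

set.seed(1)
X <- cbind(1, rnorm(100))            # design matrix with an intercept column
y <- X %*% c(1, 2) + rnorm(100)      # true coefficients are (1, 2)
b  <- c(0, 0)                        # starting values
lr <- 0.01                           # learning rate (step size)
for (i in 1:2000) {
  grad <- -2 * t(X) %*% (y - X %*% b) / nrow(X)  # gradient of the mean squared error
  b <- b - lr * grad                             # take a step downhill
}
b                                    # close to coef(lm(y ~ X[, 2]))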
Example of Econometrics and Machine
learning models
• 1. Regression
• 2. Decision Tree
• 3. Random Forest
• 4. Support Vector Machine
• 5. Neural network
1. Regression
• Linear regression is a basic and commonly used type of predictive analysis. The overall idea of
regression is to examine two things:
• (1) Does a set of predictor variables do a good job in explaining an outcome (dependent) variable?
• (2) Which variables in particular are significant predictors of the outcome variable, and in what way (indicated by the magnitude and sign of the beta estimates) do they impact the outcome variable?
• These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.

$y = X\beta + u$
Dynamic regression model (Autoregressive (AR) model)
• AR (AutoRegressive) model is a type of statistical model used for analyzing and forecasting time
series data. In an AR model, the current value of a variable is expressed as a linear combination of
its previous values, plus a random error term. The general form of an AR model of order p (AR(p))
is:

$Y_t = \mu + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$

$Y_t = \mu + \sum_{j=1}^{p} \phi_j Y_{t-j} + \varepsilon_t$
Dynamic regression model (Moving Average (MA)
model)
• MA (Moving Average) model is a type of time series model that represents the current value of a
variable as a linear combination of past error terms (or shocks). Unlike the AR (AutoRegressive)
model, which relies on past values of the series itself, the MA model uses past forecast errors to
model the current value. The general form of an MA model of order q (MA(q)) is:

$Y_t = \mu + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$

$Y_t = \mu + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$
Dynamic regression model (Autoregressive
Moving Average (ARMA) model)
• ARMA (AutoRegressive Moving Average) model is a combination of two types of time series
models: the AutoRegressive (AR) model and the Moving Average (MA) model. It is used for
analyzing and forecasting stationary time series data, capturing both the relationships between
past values of the series and the past forecast errors.
• The general form of an ARMA(p, q) model is:

$Y_t = \mu + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$

$Y_t = \mu + \sum_{j=1}^{p} \phi_j Y_{t-j} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t$
Other extension of ARMA
• An ARIMA (AutoRegressive Integrated Moving Average) model is a widely used time series
forecasting method that combines three components: AutoRegressive (AR), Integrated (I), and
Moving Average (MA). ARIMA is particularly useful for non-stationary time series data, where the
data's statistical properties, such as the mean and variance, change over time. The ARIMA model is denoted as ARIMA(p, d, q), where:
• p is the order of the AutoRegressive (AR) component.
• d is the order of differencing needed to make the series stationary.
• q is the order of the Moving Average (MA) component.
• The ARIMA(p, d, q) model can be expressed as:

$\Delta^d Y_t = \mu + \phi_1 \Delta^d Y_{t-1} + \cdots + \phi_p \Delta^d Y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$

$\Delta^d Y_t = \mu + \sum_{j=1}^{p} \phi_j \Delta^d Y_{t-j} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t$
Other extension of ARMA
• A SARIMA (Seasonal AutoRegressive Integrated Moving Average) model is an extension of the ARIMA model
that explicitly deals with seasonality in time series data. SARIMA is particularly useful for data that exhibit
regular patterns or cycles over a fixed period, such as monthly sales figures, quarterly GDP, or daily
temperatures.
• The SARIMA model is denoted as ARIMA(p, d, q)(P, D, Q)[s], where:
• p: Order of the non-seasonal AutoRegressive (AR) part.
• d: Order of non-seasonal differencing.
• q: Order of the non-seasonal Moving Average (MA) part.
• P: Order of the seasonal AutoRegressive (SAR) part.
• D: Order of seasonal differencing.
• Q: Order of the seasonal Moving Average (SMA) part.
• s: Length of the seasonal cycle (e.g., 12 for monthly data with yearly seasonality).

$\Phi_P(B^s)\,\phi_p(B)\,(1 - B^s)^D (1 - B)^d Y_t = \mu + \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t$

where $B$ is the backshift operator, $\phi_p(B)$ and $\theta_q(B)$ are the non-seasonal AR and MA polynomials, and $\Phi_P(B^s)$ and $\Theta_Q(B^s)$ are their seasonal counterparts.
2. Decision Tree

• Decision trees are a popular model, used in operations research, strategic planning, and machine learning. Each square in the tree diagram is called a node, and the more nodes you have, the more accurate your decision tree will be (generally). The last nodes of the decision tree, where a decision is made, are called the leaves of the tree. Decision trees are intuitive and easy to build, but fall short when it comes to accuracy.
2. Decision Tree
3. Random Forest

• Random forests are an ensemble learning technique that builds off of decision trees.
Random forests involve creating multiple decision trees using bootstrapped datasets of the
original data and randomly selecting a subset of variables at each step of the decision tree.
The model then selects the mode of all of the predictions of each decision tree. What’s the
point of this? By relying on a “majority wins” model, it reduces the risk of error from an
individual tree.
4. Support Vector Machine
• The support vector machine (SVM) is a supervised learning model with associated
learning algorithms that analyze data used for classification and regression
analysis.
• Let’s assume that there are two classes of data. A support vector machine will
find a hyperplane or a boundary between the two classes of data that maximizes
the margin between the two classes (see below). There are many planes that can
separate the two classes, but only one plane can maximize the margin or distance
between the classes.
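A minimal R sketch with the e1071 package (also used in the regression example later): a linear SVM separating two classes of the built-in iris data.

library(e1071)
two_class <- subset(iris, Species != "setosa")        # keep two of the three classes
two_class$Species <- droplevels(two_class$Species)
svc <- svm(Species ~ Petal.Length + Petal.Width,
           data = two_class, kernel = "linear", cost = 1)
table(predicted = predict(svc), actual = two_class$Species)  # confusion table
plot(svc, two_class, Petal.Length ~ Petal.Width)      # boundary and support vectors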
4. Support Vector Machine : classification

[Figure series: finding the separating hyperplane and maximizing the margin, including examples with low regularization and high regularization]
4. Support Vector Machine : regression

$f(x) = W^{\top} \varphi(X) + b,$

where $W$ is the weight parameter, $\varphi$ is a nonlinear transformation function, and $b$ is the threshold or bias.
5. Neural Network
• An ANN is a network of artificial neurons, which can receive inputs, change their internal states according to the inputs, and then compute outputs based on the inputs and internal states. These artificial neurons have weights that can be modified by a process called learning. The ANN model can be presented as

$y_t = g^{O}\left( g^{I}\left( x_i w^{I} + b^{I} \right) w^{O} + b^{O} \right)$

where $g^{I}$ and $g^{O}$ are the input and output activation functions, respectively; $y_t$ and $x_i$ are the output and input, respectively; $b^{I}$ and $b^{O}$ are the bias terms of the input and output layers, respectively; and $w^{I}$ and $w^{O}$ are the weight vectors between the input layer and the hidden layer, and between the hidden layer and the output layer, respectively.
5. Neural Network
• Neural Network Architecture
Let's Practice
1. Regression
# Simulation
set.seed(1)
e=rnorm(100)
x=rnorm(100)
y=1+2*x+e
# Estimation
linear=lm(y~x)
summary(linear)
# Prediction
pred=predict(linear)
plot(ts(y), col="blue", lwd=2, lty=2)
lines(pred, col="red", lwd=2)
legend("bottomleft", legend=c("Pred", "True"),col=c("red", "blue"), lty=1:2, cex=1)
1. Regression : Out-of-sample Predict

[Figure: plot of ts(y) (blue, dashed, legend "True") with predictions (red, legend "Pred") against Time = 0–100]
1.1 ARIMA and SARIMA
library(forecast)
arima1 = Arima(y, order=c(0,1,1), seasonal=list(order=c(0,0,0), period=1))
sarima1 = Arima(y, order=c(0,1,1), seasonal=list(order=c(1,1,0), period=1))
# In-sample fit
inpred=fitted(arima1)
# Out-of-sample forecast (5 steps ahead)
outpred=predict(sarima1, 5)

plot(ts(y), col="blue", lwd=2, lty=2)
lines(inpred, col="red", lwd=2)
legend("bottomleft", legend=c("Pred", "True"), col=c("red", "blue"), lty=1:2, cex=1)
2. Decision Tree
• For this part, you will use the Boston housing data to explore forecasting. The dataset is located in the MASS package. It gives housing values and other statistics in each of 506 suburbs of Boston based on a 1970 census.
This data frame contains the following columns:
crim: per capita crime rate by town.
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town.
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox: nitrogen oxides concentration (parts per 10 million).
rm: average number of rooms per dwelling.
age: proportion of owner-occupied units built prior to 1940.
dis: weighted mean of distances to five Boston employment centres.
rad: index of accessibility to radial highways.
tax: full-value property-tax rate per $10,000.
ptratio: pupil-teacher ratio by town.
black: 1000(Bk − 0.63)^2, where Bk is the proportion of Black residents by town.
lstat: lower status of the population (percent).
medv: median value of owner-occupied homes in $1000s.
2. Decision Tree
library(MASS)
library("ISLR")
library(tree)
data(Boston)
boston<-Boston
# Training
set.seed(101)
train = sample(1:nrow(boston), 300)
treefit= tree(medv~crim+rm, boston, subset=train)
plot(treefit)
text(treefit)
tree.pred = predict(treefit, boston[-train,c("crim","rm")])
plot(tree.pred, lwd=2)
true=boston[-train,"medv"]
points(true, col="red", lwd=2)
2. Decision Tree : Tree

[Figure: fitted regression tree from plot(treefit); the root split is rm < 6.92, with further splits on crim and rm, and leaf values ranging from 13.70 to 46.61]
2. Decision Tree : Predict

[Figure: tree.pred (black) with actual medv (red) across test-set Index = 0–200]
3. Random Forest
library(MASS)
library(randomForest)
data(Boston)
boston <- Boston

# Training
set.seed(101)
train = sample(1:nrow(boston), 300)
rf.boston = randomForest(medv~crim+rm, data = boston, subset = train)

# Out-of-sample prediction and compare with actual value


pred=predict(rf.boston , boston[-train, c("crim","rm")])
true=boston[-train,"medv"]

plot(pred, lwd=2)
points(true, col="red", lwd=2)
3. Random Forest : Out-of-sample Predict

[Figure: random forest predictions pred (black) with actual medv (red) across test-set Index = 0–200]
4. Support Vector Machine : regression

library(e1071)
# Training
svmfit = svm(medv~ crim+rm, data = boston[train,], kernel = "linear", cost = 1, scale = FALSE)
print(svmfit)
# Out-of-sample prediction and compare with actual value
pred=predict(svmfit, boston[-train, c("crim","rm")])
true=boston[-train,"medv"]
plot(pred, lwd=2)
points(true, col="red", lwd=2)
4. Support Vector Machine : regression : Out-of-sample Predict

[Figure: SVM predictions pred (black) with actual medv (red) across test-set Index = 0–200]
5. Neural network

library(neuralnet)
# One hidden layer with 2 neurons; the activation function is tanh
# Training
nn <- neuralnet(medv~crim+rm, data=boston[train,], hidden=c(2), act.fct = "tanh", linear.output=TRUE, threshold=0.01)
nn$result.matrix
plot(nn)

## Out-of-sample forecast
nn.results <- compute(nn, boston[-train, c("crim","rm")])  # pass only the predictors used in training
pred=nn.results$net.result
plot(pred, lwd=2)
points(true, col="red", lwd=2)
5. Neural network : Architecture

[Figure: network diagram from plot(nn) showing inputs crim and rm, the hidden neurons, the estimated weights on each connection, and the output medv; Error: 11501.180183, Steps: 143]

Two layers with 2 and 1 neuron, respectively. The activation function is tanh.
5. Neural network : Out-of-sample Predict

[Figure: neural network predictions pred (black) with actual medv (red) across test-set Index = 0–200]
Understanding Temporal Relationships
• In addition to linear regression, several models can be used to understand temporal relationships:
• ECM
• ARDL
• Quantile regression
ARDL
• Instead of working with differences of Y and X, which have unit roots,
• you may wish to estimate the ARDL model. We can add more lag terms in the ARDL as

$Y_t = \mu + \sum_{p=1}^{P} \phi_p Y_{t-p} + \sum_{q=1}^{Q} \beta_q X_{t-q} + v_t$

• Thus, we can convert this ARDL to an ECM as

$\Delta Y_t = \mu + \sum_{p=1}^{P} \phi_p \Delta Y_{t-p} + \sum_{q=1}^{Q} \beta_q \Delta X_{t-q} + \lambda u_{t-1} + v_t$
Cointegration test
• As I mentioned earlier, if a linear combination of I(1) variables is stationary, or I(0), then the variables are said to be cointegrated.
• In other words, if Y and X are I(1), then u becomes I(0).
• Thus, it is simple to check whether the error term u is stationary.
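A minimal R sketch of this residual-based check (the Engle-Granger two-step idea, with simulated cointegrated data; strictly, the residual ADF test should be read against Engle-Granger critical values rather than the standard ones):

library(urca)
set.seed(1)
x <- cumsum(rnorm(200))            # an I(1) series (random walk)
y <- 1 + 0.5 * x + rnorm(200)      # cointegrated with x by construction
u <- residuals(lm(y ~ x))          # estimated long-run residual
summary(ur.df(u, type = "none", selectlags = "AIC"))  # stationary u => cointegration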
Cointegration test
• Bound test
• The ARDL / Bounds Testing methodology of Pesaran and Shin (1999) and
Pesaran et al. (2001) has a number of features that many researchers feel give it
some advantages over conventional cointegration testing
• Recall that the ECM is

$\Delta Y_t = \mu + \sum_{p=1}^{P} \phi_p \Delta Y_{t-p} + \sum_{q=1}^{Q} \beta_q \Delta X_{t-q} + \lambda u_{t-1} + v_t$

• Or, in ARDL-ECM form, Pesaran et al. (2001) rewrite it as

$\Delta Y_t = \mu + \sum_{p=1}^{P} \phi_p \Delta Y_{t-p} + \sum_{q=1}^{Q} \beta_q \Delta X_{t-q} + \theta_1 Y_{t-1} + \theta_2 X_{t-1} + v_t$

We might call this equation an "unrestricted ECM", or an "unconstrained ECM". Pesaran et al. (2001) call this a "conditional ECM".
Cointegration test
• All that we're going to do is perform an "F-test" of the hypothesis H0: θ1 = θ2 = 0, against the alternative that H0 is not true.
• A rejection of H0 implies that we have a long-run relationship.
• There is a practical difficulty that has to be addressed when we
conduct the F-test.
• The distribution of the test statistic is totally non-standard (and also
depends on a "nuisance parameter", the cointegrating rank of the
system) even in the asymptotic case where we have an infinitely large
sample size.
Cointegration test
• Exact critical values for the F-test aren't available for an arbitrary mix
of I(0) and I(1) variables. However, Pesaran et al. (2001)
supply bounds on the critical values for the asymptotic distribution of
the F-statistic.
Cointegration test
• For various situations (e.g., different numbers of variables, (k + 1)), they
give lower and upper bounds on the critical values. In each case, the lower
bound is based on the assumption that all of the variables are I(0), and the
upper bound is based on the assumption that all of the variables are I(1). In
fact, the truth may be somewhere in between these two polar extremes.
If the computed F-statistic falls below the lower bound we would conclude
that the variables are I(0), so no cointegration is possible, by definition. If
the F-statistic exceeds the upper bound, we conclude that we have
cointegration. Finally, if the F-statistic falls between the bounds, the test is
inconclusive.
• If cointegration based on the ARDL is confirmed, the mixture of I(1) and I(0) variables in the regression model becomes valid.
ARDL code
library(ARDL)
library(urca)
data(denmark)
head(denmark)
attach(data.frame(denmark))
## Step 1 Unit root test
u1=ur.df(LRM ,type="drift",selectlags = "AIC")
u2=ur.df(IDE,type="drift",selectlags = "AIC")
u3=ur.df(LRY ,type="drift",selectlags = "AIC")
u4=ur.df(IBO,type="drift",selectlags = "AIC")
summary(u1)
summary(u2)
summary(u3)
summary(u4)
## Step 2 Find the best ARDL order --------------------------------------------
model1 <- auto_ardl(LRM ~ LRY + IBO + IDE, data = denmark,max_order = c(5,4,4,4))
model1$top_orders
ARDL code
## Step 3 Estimate best ARDL
ardl_3132 <- ardl(LRM ~ LRY + IBO + IDE,data = denmark, order = c(3,1,3,2))
# Step 4 Estimate the ARDL-ECM
uecm_3132 <- uecm(ardl_3132)
summary(uecm_3132)
# Step 5 Estimate the ECM
recm_3132 <- recm(ardl_3132, case=3)
summary(recm_3132)
# Step 6 Bounds test from Pesaran et al. (2001)
bounds_f_test(ardl_3132, case = 3)
# step 7 Long run equation
multipliers(ardl_3132 , type = "lr", vcov_matrix = NULL)
Panel regression
Panel regression model is used to examine the relationship between the dependent variable and a
set of independent variables across multiple entities (e.g., countries, firms) over time. Panel data,
which combines cross-sectional and time series data, allows us to control for individual
heterogeneity, detect and measure effects that are not observable in pure cross-sectional or time
series data, and improve the efficiency of econometric estimates.

$Y_{it} = \beta_1 X_{1,it} + \beta_2 X_{2,it} + \beta_3 X_{3,it} + \alpha_i + \lambda_t + \varepsilon_{it}$
Estimators of Panel regression
• Pooled Ordinary Least Squares (Pooled OLS): Assumes that there are no unique attributes of
individuals or time periods, and the data can be pooled without accounting for individual or time-
specific effects. This method ignores the panel structure.

$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \beta_2 X_{2,it} + \beta_3 X_{3,it} + \varepsilon_{it}$

• Fixed Effects Model (FE): Accounts for individual-specific effects that may correlate with the
independent variables. The model assumes these effects are constant over time and focuses on
within-entity variation. The model is specified as:

$Y_{it} = \beta_1 X_{1,it} + \beta_2 X_{2,it} + \beta_3 X_{3,it} + \alpha_i + \varepsilon_{it}$
Estimators of Panel regression
• Random Effects Model (RE): Assumes that the individual-specific effects are random and
uncorrelated with the independent variables. The model is specified as:

$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \beta_2 X_{2,it} + \beta_3 X_{3,it} + u_i + \varepsilon_{it}$

• To compare the FE and RE estimators, the Hausman test is suggested.

• Null hypothesis (H0): the random effects model is appropriate (i.e., the random effects are uncorrelated with the regressors).
• Alternative hypothesis (H1): the fixed effects model is appropriate (i.e., the random effects are correlated with the regressors).
Programming
• Rcode
• Stata
• Data : Panel Data ICT 10 countries from 1990-2020
R code: Panel regression
library(plm)
#Step 1 Import Data
data=read.csv(file.choose(),head=TRUE)
head(data)
#Step 2 Convert file to be Panel data
panel <- pdata.frame(data,c("country","year"))
#Step 3 Run Panel regression (Fixed effect)
# 3.1 (Fixed effect) #effect = c("individual", "time", "twoways")
fe <- plm( GINI ~ IU+ FT, model = "within", effect = "individual", data=panel)
summary(fe)
# 3.2 (Random effect) #effect = c("individual", "time", "twoways")
re <- plm( GINI ~ IU+ FT, model = "random", effect = "individual",data=panel)
summary(re)
# 3.3 (Pooled OLS)
pool <- plm( GINI ~ IU+ FT, model = "pooling", data=panel)
summary(pool)
R code: Hausman Test

# Hausman Test (Compare only Random and Fixed )


phtest(fe, re)

STATA
• This program is highly popular for conducting Panel
Regression in the present day and offers more comprehensive
testing compared to EViews. Therefore, we can use the STATA
program for estimating Panel Regression.
• We will perform the following estimations:
5.1 Importing and setting up the data
5.2 Estimating the Fixed Effects model
5.3 Estimating the Random Effects model
5.4 Hausman Test.

STATA: STEP 1 Import data
[Screenshot: menu path to the Data Editor]
STATA: STEP 1 Import data
1) A window for entering data will appear, similar to Excel. You can copy the data from the Excel file and paste it into this window.
2) Copy → Paste and select "Treat first row as variable names."
3) The data will appear as shown in the image.
STATA: STEP 1 Set up Panel data
[Screenshot: menu path to declare the panel data structure]
STATA: STEP 1 Set up Panel data

[Screenshot of the panel-setup dialog: select the column for the panel id, the column for time (year), and choose the frequency]
STATA: STEP 1 : Set up Panel data

[Screenshot: confirm the settings and click OK]
STATA: STEP 1 SET UP Complete

[Screenshot: panel setup complete]
STATA: STEP 2 Run Fixed effects
[Screenshot: menu path to the panel regression dialog]
STATA: STEP 2 Run Fixed effects
[Screenshot of the dialog: enter the dependent variable and independent variables, select fixed effects, and click OK]

$GINI_{it} = \beta_0 + \beta_1 IU_{it} + \beta_2 FT_{it} + \alpha_i + \varepsilon_{it}$
STATA: Fixed effects results

[Screenshot: fixed effects estimation results]
STATA: STEP 3 Run Random effects
Similar to fixed effects:
[Screenshot of the dialog: enter the dependent variable and independent variables, select random effects, and click OK]

$GINI_{it} = \beta_0 + \beta_1 IU_{it} + \beta_2 FT_{it} + u_i + \varepsilon_{it}$
STATA: Random effects results

[Screenshot: random effects estimation results]
STATA: Hausman Test (Stata Code)
Command
xtreg gini iu ft, fe
estimates store fix
xtreg gini iu ft, re
estimates store random
hausman random fix

STATA: Hausman Test Result

[Screenshot: Hausman test output]
P-value = 0.000, so we reject H0: the fixed effects model is appropriate.
R-code : Check Unit root test

library(plm)
#Step 1 Import Data
data=read.csv(file.choose(),head=TRUE)
head(data)
#Step 2 Convert file to be Panel data
panel <- pdata.frame(data,c("country","year"))
#Step 3 Panel unit root test (Levin-Lin-Chu)
LLC <- purtest(panel$GINI, test = "levinlin", lags = "AIC", pmax = 1)
summary(LLC)
STATA: check unit root test
[Screenshot: menu path to the panel unit root test dialog]
STATA: check unit root test
Levin, Lin and Chu unit root test
[Screenshot of the dialog: choose the variable, tick the option, set the lag, and click OK]
Panel unit root test result

[Screenshot: panel unit root test output]
Dynamic Panel regression
• Panel or longitudinal data enables accounting for unobserved unit-
specific heterogeneity and modeling dynamic adjustment or feedback
processes.
• Instrumental Variables (IV) and Generalized Method of Moments
(GMM) are the predominant estimation techniques for handling
models with endogenous variables, particularly when dealing with
lagged dependent variables in short time horizons.
• The model takes the form

$Y_{it} = \delta Y_{i,t-1} + \beta_1 X_{1,it} + \beta_2 X_{2,it} + \beta_3 X_{3,it} + \alpha_i + \varepsilon_{it}$
Some Stata milestones
December 15, 2000: xtabond command for the Arellano and Bond
(1991) difference GMM (diff-GMM) estimation.
November 26, 2003: xtabond2 command for Arellano and Bover (1995)
and Blundell and Bond (1998) system GMM (sys-GMM) estimation.
June 25, 2007: xtdpdsys command is used for system-GMM
estimation. Both xtabond and xtdpdsys are built on the xtdpd
command, offering different approaches to dynamic panel data
estimation.
June 1, 2017: xtdpdgmm estimates a linear (dynamic) panel data model with the generalized method of moments (GMM). The main value added of the new command is that it allows combining the traditional linear moment conditions with the nonlinear moment conditions suggested by Ahn and Schmidt (1995) under the assumption of serially uncorrelated idiosyncratic errors.
Generalized method of moments (GMM)

Z can be lags of Y and X, or other instrumental variables.


Generalized method of moments (GMM)
In the GMM procedure, you use m(b) and W together to find the parameter estimates that best
satisfy the moment conditions, ensuring the model fits the data as closely as possible (One
step GMM estimation). Or we can use two-step GMM
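As a compact way to see what the one-step and two-step procedures optimize (standard GMM, added here as a sketch): with stacked sample moments $m(b) = \frac{1}{N}\sum_{i} Z_i' \hat{u}_i(b)$, the GMM estimator solves

$\hat{b} = \arg\min_b \; m(b)'\, W\, m(b)$

One-step GMM fixes an initial weighting matrix $W$ (for example, one proportional to $(Z'Z)^{-1}$); two-step GMM then re-estimates $W$ as the inverse of the estimated variance of the moment conditions computed from the first-step residuals, $\hat{W} = \hat{S}^{-1}$, and minimizes the objective again.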
Diff-GMM
The key idea of Diff-GMM is to first-difference the data to remove any time-invariant
unobserved effects (also known as individual-specific effects). Then, it uses lagged levels of
the variables as instruments for the differenced equations.
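A hedged R sketch of this idea with plm::pgmm (an R counterpart of Stata's xtabond; the variable names GINI, IU, FT, GDP and the country/year index are assumed from the ICT panel used earlier):

library(plm)
# Assumed: the ICT panel csv used earlier, with columns country, year, GINI, IU, FT, GDP
panel <- pdata.frame(read.csv(file.choose(), head = TRUE), c("country", "year"))
# Diff-GMM: transformation = "d" first-differences the equation, and
# lag(GINI, 2:99) supplies lagged levels as instruments for the lagged dependent variable
ab <- pgmm(GINI ~ lag(GINI, 1) + IU + FT + GDP | lag(GINI, 2:99),
           data = panel, effect = "individual",
           model = "onestep", transformation = "d")
summary(ab)   # includes the Sargan test and AR(1)/AR(2) tests, as in Stata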
Diff-GMM

STATA
• Data : Panel Data ICT 10 countries from 1990-2020

$GINI_{it} = \beta_0 + \beta_1 IU_{it} + \beta_2 FT_{it} + \beta_3 GDP_{it} + \alpha_i + \varepsilon_{it}$
Estimation with Stata

Select the estimation
[Screenshots: Arellano-Bond diff-GMM estimation and results; Arellano and Bover (1995) and Blundell and Bond (1998) system GMM (sys-GMM) results]
Workshop on real data application
Paper 1 : ARIMA and SARIMA
ARIMA forecasting of primary energy demand
by fuel in Turkey
• Study: In this study, they used the Autoregressive Integrated Moving Average (ARIMA) and seasonal ARIMA (SARIMA) methods to estimate the future primary energy demand of Turkey from 2005 to 2020.
• Data: Primary energy demand of Turkey between 1950 and 2004 (however, this example data covers only 1965-2005).
https://ourworldindata.org/energy/country/turkey
• Method: ARIMA and SARIMA
R code
# Load the readxl package
library(readxl)
library(forecast)
# Read the data from sheet "paper1"
data <- read_excel(file.choose(), sheet = "paper1")
# Convert to data frame
data=data.frame(data)
# Display the first few rows of the imported data
head(data)
attach(data)
# Set time series data
energy=ts( data[,2], start=1965, freq=1)
arima1 = Arima(energy,order=c(1,1,1),seasonal=list(order=c(0,0,0),period=1))
sarima1 = Arima(energy,order=c(1,1,1),seasonal=list(order=c(1,2,1),period=1))
# In-sample forecast
inpred=fitted(arima1 )
# Out of sample forecast
outpred1=predict(arima1, 15)$pred
outpred2=predict(sarima1, 15)$pred
# Combine actual data with forecasted values
extended_energy <- ts(c(energy, outpred2), start = 1965, frequency = 1)

# Plotting the actual data and forecasts


plot(extended_energy, col = "blue", lwd = 2, lty = 2, main = "Energy Time Series Forecast", ylab = "Energy", xlab = "Year")
lines(outpred1, col = "red", lwd=2)
lines(energy, col = "black", lwd=2)

# Add legend
legend("topleft", legend = c("Actual", "Out-of-sample Forecast arima", "Out-of-sample Forecast sarima"),
col = c("black", "red", "blue"), lty = c(1, 1, 1), lwd = 2, cex = 1)
Paper 2 : ARDL model
Economic growth and biomass energy
• Study: This paper investigates the short-run and long-run causality analysis
between biomass energy consumption and economic growth in the selected 10
developing and emerging countries by using the Autoregressive Distributed Lag
bounds testing (ARDL) approach
• Data: Argentina, Bolivia, Cuba, Costa Rica, El Salvador, Jamaica, Nicaragua,
Panama, Paraguay and Peru. bc represents the biomass energy consumption log
(bct), and py represents the logarithm of real GDP. Data were taken from World
Bank and the International Energy Agency. Data cover 1980-2009.
# Bio energy consumption
Energy Statistics Data Browser – Data Tools – IEA
# REAL gdp
https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?locations=AR-BO
• Method : ARDL
Methodology
• Methodology
The ARDL model for the standard log-linear functional specification of the long-run relationship:
Methodology
• Methodology
The Error Correction model used to analyze relationships between the
variables was constructed as follows:
ARDL r code ( Argentina case)
# Load the packages
library(ARDL)
library(urca)
library(readxl)
# Read the data from sheet "paper2"
data <- read_excel(file.choose(), sheet = "paper2")
# Convert to data frame
data=data.frame(data)
# Display the first few rows of the imported data
head(data)
attach(data)
# transform to log
logBioArg=log(BioArgen)
logGDPArg=log(GDPArgen)
## Step 1 Unit root test I(0)
u1=ur.df(logBioArg,type="drift",selectlags = "AIC")
u2=ur.df(logGDPArg,type="drift",selectlags = "AIC")
summary(u1)
summary(u2)
## Unit root test I(1)
u3=ur.df(diff(logBioArg),type="drift",selectlags = "AIC")
u4=ur.df(diff(logGDPArg),type="drift",selectlags = "AIC")
summary(u3)
summary(u4)
## Step 2 Find the best ARDL order --------------------------------------------
data1=data.frame(data,logBioArg,logGDPArg)
model1 <- auto_ardl(logGDPArg~ logBioArg, data = data1,max_order = c(4,4))
model1$top_orders
ARDL code
## Step 3 Estimate best ARDL
ardl_argen <- ardl(logGDPArg~ logBioArg,data = data1, order = c(1,1))
# Step 4 Estimate the ARDL-ECM
uecm_argen <- uecm(ardl_argen)
summary(uecm_argen )
# Step 5 Estimate the ECM
recm_argen <- recm(ardl_argen , case=3)
summary(recm_argen )
# Step 6 Bounds test from Pesaran et al. (2001)
bounds_f_test(ardl_argen , case = 3)
# step 7 Long run
multipliers(ardl_argen, type = "lr", vcov_matrix = NULL)

Now let's do the ARDL in which biomass is the dependent variable and GDP is the independent variable.
Paper 3 : Panel regression
• Study: This paper revisits the renewable energy-economic growth nexus in
seven European countries for the 34-year period of 1985–2018.
• Data: seven OECD countries in Europe (Germany, Italy, Netherlands, Poland, Spain, Turkey, and United Kingdom), spanning the period of 1985–2018. All data are taken in logarithms.
• Renewable energy consumption (RE) and electricity generation shares are
derived from BP’s 2019 Statistical Review of World Energy data file. We
obtain the OECD Europe price indexes for coal and natural gas from the IEA
Energy Prices and Taxes database.
• The data for real GDP (Y) is acquired from the International Monetary
Fund. Finally, the fixed gross capital formation (K) and labour force (L) data
are from World Bank’s World Development Indicators databank.
• Method : Panel regression ( Pooled mean group)
Methodology
• Model

• Panel cointegration test


Stata
Step 1 : Import data and declare data as panel
Stata
• Step 2 : Unit root test
Panel unit root test results

[Screenshot: test output] The variables are not stationary in levels (they are I(1)), so we transform the data to first differences and check the panel unit root test again.
Code to generate first diff
generate Dre=re-re[_n-1]
Panel unit root test at first diff
Panel cointegration (Pedroni) test
Panel cointegration (Pedroni) test result
Panel regression ( Pooled mean group)
• Stata code (you need to install the xtpmg package before using the xtpmg function)
xtpmg d.re d.gdp d.coal_p d.gas_p , lr(l.re gdp coal_p gas_p) ec(ec)
replace pmg

This separation maintains the full expression of the PMG regression model,
capturing both the error correction term (long-run relationship) and the short-
run dynamics
PMG results
Paper 4 : Dynamic panel model
Do shareholder coalitions affect agency
costs? Evidence from Italian-listed companies
Study : This study investigates the relationship between agency costs and
ownership structure for a sample of listed Italian companies to determine the
impact of shareholder coalitions on agency costs.
Data: Using a balanced panel dataset of 163 Italian firm-year observations for the
period 2002–2013
- available data on ownership structure for the entire study period; information
acquired from the Consob (Commissione Nazionale per le Società e la Borsa,
2014) website; and individual company reports on corporate governance.
- available data on firm-level indicators (debt-to-capital ratio, size, age of the firm,
industry sector) for all companies in the sample. Data were collected from
Datastream, Bloomberg, Calepino dell’Azionista (Mediobanca, 2014), and
obtained manually from the financial statements of the individual companies.
Methodology: Dynamic panel data model involving a two-step system-GMM estimator.
Methodology
Stata : DPD SYSTEM-GMM (2 steps)
Autocorrelation and Sargan test
Thank you
