Econometric Methods
The seasonal component reveals a consistent and repeating pattern every four quarters, indicating the
presence of strong and regular seasonal effects. Each year, specific quarters systematically show increases
or decreases in profit levels, likely due to underlying economic or fiscal cycles. This seasonal structure
remains stable throughout the entire period, suggesting that the seasonality is not evolving over time.
Lastly, the random or irregular component accounts for short-term, unpredictable fluctuations that are not
explained by either trend or seasonality. These residual variations are relatively small in magnitude
compared to the trend and seasonal components, indicating that the majority of the variability in the time
series is systematic and predictable.
Overall, the classical decomposition reveals that the "Profits" series is driven primarily by a combination
of a strong upward trend and stable seasonality, with limited noise from random effects. This structure
supports the use of time series models such as ARIMA or Seasonal ARIMA for effective forecasting.
To determine whether the "Profits" time series is stationary, we applied the Augmented Dickey-Fuller
(ADF) test, which checks for the presence of a unit root, a characteristic of non-stationary data. The test was
first applied to the original series. The result yielded a Dickey-Fuller test statistic of –2.7963 with a p-value
of 0.2487. Since the p-value is greater than the conventional 0.05 threshold, we fail to reject the null
hypothesis of a unit root. This indicates that the original series is non-stationary, and therefore unsuitable
for direct modeling with ARIMA.
To address this, we applied a first-order difference to the series and re-ran the ADF test on the differenced
data. This time, the test produced a Dickey-Fuller statistic of –3.9945 and a p-value of 0.01356. As this
p-value is less than 0.05, we reject the null hypothesis and conclude that the differenced series is stationary.
These results confirm that the "Profits" series requires first-order differencing (d = 1) to achieve stationarity,
which is a necessary condition for building a valid ARIMA model.
ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function)
To forecast the "Profits" variable using the ARIMA model, the first step in the Box-Jenkins methodology is to
identify the appropriate orders of the autoregressive (p), differencing (d), and moving average (q)
components. The original time series was first checked for stationarity using the Augmented Dickey-Fuller
(ADF) test, which indicated that the series was non-stationary and therefore required differencing. After
first-order differencing was applied, the ADF test confirmed that the differenced series was stationary, so
the order of differencing was selected as d = 1. The sample ACF and PACF of the differenced series were then
examined to choose the AR and MA orders, leading to the tentative selection of p = 1 and q = 1.
After tentatively identifying the ARIMA (1,1,1) model, we estimated its parameters using maximum
likelihood estimation. The fitted model produced an autoregressive coefficient (ar1) of 0.4066 and a moving
average coefficient (ma1) of –0.1311, with standard errors of 0.5823 and 0.6533, respectively. These
relatively large standard errors suggest that the individual coefficients are not highly significant, though the
model does capture some of the dynamics of the series. The model’s estimated variance (σ²) was 80.63, and
it had a log-likelihood of –314.45, resulting in an AIC of 634.91. The training set error measures, including
RMSE = 8.93, MAE = 6.58, and MAPE = 5.19%, indicate a moderate level of forecasting error.
In contrast, the automated model selected by auto.arima() suggested a more complex seasonal ARIMA
model: ARIMA (2,1,2) (0,0,1) [4], which includes two autoregressive terms, two moving average terms,
and one seasonal moving average term. This model had more statistically significant coefficients, with low
standard errors, and achieved a better overall fit. It yielded a lower estimated variance (σ² = 72.32), a higher
log-likelihood (–308.13), and a lower AIC of 628.25; its BIC of 643.05, however, is marginally higher than that
of the simpler model (642.30), since BIC penalizes the additional parameters more heavily. Furthermore, the
error metrics improved relative to the simpler model, with RMSE = 8.21, MAE = 6.13, and MAPE = 4.86%,
indicating enhanced predictive accuracy.
Overall, the ARIMA (2,1,2) (0,0,1) [4] model appears to provide a better fit to the data based on its lower AIC
and error measures, and its parameters are estimated with greater statistical significance.
Step 3: Diagnostic Checking (Residual Analysis)
Box-Ljung test
After fitting the ARIMA (1,1,1) model, the next step in the Box-Jenkins methodology is to evaluate whether
the model's residuals behave like white noise. This means the residuals should have a mean of zero, constant
variance, and no significant autocorrelation. A residual diagnostic plot was generated, which included the
residual time series, the ACF of the residuals, and the PACF of the residuals. From the residual plot, we
observe that the residuals fluctuate randomly around zero without any visible trend, suggesting the absence
of systematic patterns. Additionally, the ACF and PACF plots show that most of the autocorrelations fall
within the 95% confidence bounds, indicating no significant autocorrelation remaining in the residuals.
To statistically verify this observation, we conducted the Box-Ljung test, which tests the null hypothesis
that the residuals are independently distributed. The test produced a Chi-squared value of 41.396 with 20
degrees of freedom and a p-value of 0.0033. Since the p-value is less than the conventional significance
level of 0.05, we reject the null hypothesis. This suggests that the residuals of the ARIMA (1,1,1) model
are not purely random and still exhibit some autocorrelation.
Therefore, although the residual plots appear mostly acceptable visually, the Box-Ljung test indicates a
potential lack of complete model adequacy. This suggests that a more complex model (such as ARIMA
(2,1,2) (0,0,1) [4]) may provide a better fit, as seen in Step 2, where it achieved a lower AIC and better
error metrics.
Step 4: Forecasting
In the final step of the Box-Jenkins methodology, forecasts were generated using both the manually specified
ARIMA (1,1,1) model and the auto-selected ARIMA (2,1,2) (0,0,1) [4] model. The forecasts cover the next 8
quarters, providing short-term projections for the "Profits" variable.
Overall, both models predict stability in future profits; however, the auto ARIMA model provides a more
statistically reliable forecast with tighter confidence bands. Based on forecasting accuracy, model
diagnostics, and residual behavior, the auto ARIMA model is preferred for making predictions in this case.
Model Comparison
Model        ARIMA Order                  df   AIC        BIC        R²
model_111    ARIMA (1,1,1)                 3   634.9059   642.3036   0.965311
model_auto   ARIMA (2,1,2) (0,0,1) [4]     6   628.2536   643.0490   0.9706773
To determine the best-fitting model for forecasting the "Profits" variable, we compared the manually
specified ARIMA (1,1,1) model with the automatically selected ARIMA (2,1,2) (0,0,1) [4] model using two
statistical criteria: AIC (Akaike Information Criterion) and R² (coefficient of determination).
The AIC for ARIMA (1,1,1) was 634.91 with 3 degrees of freedom, while the AIC for the auto ARIMA
model was 628.25 with 6 degrees of freedom. A lower AIC value indicates a better trade-off between model
fit and complexity, suggesting that the auto ARIMA model provides a superior fit despite its higher
complexity.
Additionally, the R² value for ARIMA (1,1,1) was 0.9653, whereas the auto ARIMA model achieved a
slightly higher R² of 0.9707. This indicates that the auto ARIMA model explains a slightly larger proportion
of the variance in the data, further supporting its superior performance.
In summary, both the lower AIC and higher R² values suggest that the auto ARIMA model fits the data
better and should be preferred for forecasting. This conclusion is consistent with the residual diagnostics
and forecasting accuracy observed in earlier steps.