(SEHS - 2020) Template
Abstract
Monthly confirmed dengue fever (DF) cases for 2011 to 2018 from Kuantan, Pahang were retrieved and analysed using a time series analysis model. This model could provide useful information to facilitate the planning of public health interventions to minimise the frequency of DF outbreaks. The objectives of this study were to: (i) analyse the trend of
Moving Average (SARIMA); (ii) test the accuracy of the model parameters by forecasting monthly DF cases in 2018 using DF cases from 2011 to 2017 and comparing the forecasts with the actual monthly DF cases in 2018; and (iii) construct a SARIMA model, by adopting the Box-Jenkins method, to forecast the monthly DF cases in 2019. Monthly confirmed DF cases from 2011 to 2018 were used to fit the model, while the prediction was validated using epidemiological data from January 2018 to December 2018. The study concluded that the SARIMA (0,1,0)(3,0,2)12 model was the best fit and could be used to extrapolate case
was relatively close to the actual number of monthly DF cases and fell within the confidence interval (CI). Therefore, the SARIMA model developed in this study is capable of accurately forecasting future DF cases. This can help improve existing intervention programmes, which are an integral component of minimising the burden of the disease in Kuantan.
Dengue fever (DF) is a burgeoning public health problem in Malaysia (Bujang et al.,
are vital to reduce the likelihood of new infections as well as the burden of the disease in the country. The proposed strategies would be more effective if supported by accurate statistical and scientific data. Time series analysis is one of the statistical techniques that is used to
continuous sequence of numerical data points. In the investment industry, for instance, a time series is represented by the movement of specific data points, such as the price of a commodity, over a specified period with regularly reported data points. No minimum or maximum time period is required, which allows policymakers or analysts to gather the data in whatever way provides the information they seek. Often, time series analysis is related to trend analysis, cyclical fluctuation analysis and issues of seasonality. For example, diseases such as malaria or tuberculosis can be analysed on a daily, weekly or monthly basis. Time series analysis will also show whether the disease is seasonal by evaluating whether it goes through peaks and troughs at specific times of the year. The
Penang (Skae, 1902). The first outbreak of epidemic proportions was reported in 1973 and resulted in a total of 969 confirmed cases and 54 deaths (Wallace et al., 1980). The situation continued to deteriorate with the rampant spread of the disease in urban populations. Subsequent outbreaks in the following years resulted in 1487 cases and 54 deaths in 1973, 2200 cases and 104 deaths in 1974, and 3006 cases and 35 deaths in 1982 (Mudin, 2015). The number of confirmed DF cases has continued to increase since 2000, with the highest
According to Ler et al. (2011), DF can be caused by any of the four genetically related but antigenically distinct dengue virus (DENV) serotypes, which are DENV-1, DENV-2,
DENV-1, -2 and -3 identified in Negeri Sembilan (Ahmad Nizal et al., 2012), multiple entries of DENV-2 and -4 in Sarawak (Holmes et al., 2009) and cases in Kuala Lumpur and Selangor predominantly from DENV-4 (Chew et al., 2012). Although each DENV serotype has a distinct clinical and epidemiological profile, accurately identifying each serotype’s clinical characteristics proves to be a challenge. Studies indicate that DENV-2 and -3 have more severe disease outcomes while DENV-4 is the mildest (Nisalak et al., 2016; Vaughn et al., 2000). All genders and ethnicities have been found to be equally vulnerable. Severe cases of dengue haemorrhagic fever (DHF) and dengue shock syndrome (DSS) affect paediatric patients between the ages of 2 and 15 years throughout Southeast Asia (Bhatia et al., 2013).
tourists travelling into dengue-endemic areas has increased. Imported DF cases can further spread in non-endemic areas when competent vectors, such as Aedes albopictus and Aedes aegypti mosquitoes, are present. Following disease importation in recent years, autochthonous DF outbreaks have been reported in several non-endemic countries such as France, Croatia,
In Malaysia, nearly all age groups are vulnerable to the disease. The most vulnerable age group is between 15 and 49 years old (Mudin, 2015). DF is considered a highly contagious health threat with a growing trend of infection in Malaysia. Between 2000 and 2010, the number of DF cases and DF-related deaths increased by an average of 14% and 8%, respectively, each year (Mia et al., 2013). Malaysia suffered a 151% increase in cases in
[Figure: bar chart with y-axis "No of dengue cases" (0 to 140,000)]
This study used systematically sampled data of confirmed DF cases reported by the Vector-borne Disease Sector, Disease Control Division, Pahang State Health Department at the Ministry of Health, Malaysia’s real-time national database of dengue cases that is
confirmed cases from 2011 to 2018 was downloaded and stored in a dedicated folder in Microsoft Office format. The saved data were updated using Microsoft Excel and statistically analysed using the Statistical Package for the Social Sciences, Version 20 (SPSS Version 20.0). Box et al. (2015) and Boudrioua and Boudrioua (2020) expressed the SARIMA model as follows:
$$\Phi_P(B^s)\,\phi_p(B)\,\nabla_s^D\,\nabla^d x_t = \Theta_Q(B^s)\,\theta_q(B)\,w_t.$$
The model combined non-seasonal terms (p, d, q), identified through autocorrelation analysis, with seasonal terms (P, D, Q) to develop a model capable of predicting dengue cases.
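As an illustration, the general SARIMA form can be written out for the order that this study later selects, SARIMA (0,1,0)(3,0,2)12. The sketch below assumes a seasonal period of s = 12 and the Box-Jenkins sign convention (minus signs in both polynomials); some software reports moving-average coefficients with the opposite sign. The symbols \Phi_1, \Phi_2, \Phi_3, \Theta_1 and \Theta_2 are placeholders for the seasonal coefficients, not estimates reported by the study.

```latex
% SARIMA(0,1,0)(3,0,2)_{12}: p = q = 0, d = 1, P = 3, D = 0, Q = 2, s = 12
\left(1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \Phi_3 B^{36}\right)(1 - B)\,x_t
  = \left(1 - \Theta_1 B^{12} - \Theta_2 B^{24}\right) w_t

% Writing y_t = x_t - x_{t-1} for the first-differenced series:
y_t = \Phi_1 y_{t-12} + \Phi_2 y_{t-24} + \Phi_3 y_{t-36}
      + w_t - \Theta_1 w_{t-12} - \Theta_2 w_{t-24}
```

In words, under these assumptions the month-to-month change depends only on the changes observed one, two and three years earlier, plus the current shock and the shocks from one and two years earlier.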
against stationarity as the alternative is tested with the augmented Dickey-Fuller (ADF) test. The hypotheses were H0: Xt is non-stationary and H1: Xt is stationary. The regression formula
$$\Delta X_t = \alpha_0 + \beta t + \phi_1 X_{t-1} + \sum_{i=1}^{p} \gamma_i\,\Delta X_{t-i} + \varepsilon_t$$
(Cryer and Chan, 2008) can test both and, if necessary, the underlying assumption can be fulfilled via differencing before forecasting using the Box-Jenkins method.
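The log transformation and differencing used to meet the stationarity assumption can be sketched in a few lines of Python. This is a minimal illustration rather than the SPSS procedure used in the study; the function name `difference` and the sample counts are hypothetical.

```python
import math


def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly counts, for illustration only (not the Kuantan data).
cases = [40, 52, 61, 75, 90, 110]

# The log transform stabilises the variance; first differencing removes the trend.
log_cases = [math.log(c) for c in cases]
first_diff = difference(log_cases, lag=1)

# A seasonal difference for monthly data would use lag=12 on a longer series.
print(len(first_diff))
```

Each differencing pass shortens the series by `lag` observations, which is why the differenced series has one value fewer than the input.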
Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots were generated to measure the degree of correlation between observations in the time series. Both ACF and PACF were compared to determine their characteristic and theoretical behaviours. The model was evaluated using mean squared error (MSE), mean absolute percentage error (MAPE), mean absolute error (MAE) and root-mean-square error (RMSE). The final model’s goodness of fit was tested using the Bayesian information criterion (BIC). To obtain a forecast with minimal errors, a seasonal ARIMA model must possess good features. The model should be parsimonious (fewest coefficients), stationary and have constant mean and variance values, while its coefficients must be significant and its residuals must be white noise. Lastly, the residuals of the time series model should be distributed normally to appropriately fit the forecast
3. Results
Figure 2 shows 8005 confirmed DF cases between 2011 and 2018 in Kuantan. An increasing trend of DF cases, beginning in 2011 and peaking in 2015, was observed during this period. Cases increased by 216%, from 541 cases in 2011 to 1711 cases in 2015. The number of cases subsequently decreased to 1684 in 2016, and a further decrease in DF cases was observed, to 963 in
[Figure 2 plot: yearly DF cases against Year (2011 to 2018), with fitted linear trend f(x) = 63.107x + 716.64]
Figure 2: Time series plot of yearly dengue cases in Kuantan from 2011–2018
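As a worked check, the 216% rise quoted for 2011 to 2015 follows directly from the two yearly totals given in the text (541 and 1711 cases). The helper name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


# Yearly totals quoted in the Results text for Kuantan.
cases_2011 = 541
cases_2015 = 1711

rise = percent_change(cases_2011, cases_2015)
print(round(rise))  # 216
```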
A pattern of short-term changes was observed in the data, indicating the existence of seasonal fluctuations. The decomposition method estimated the trend while the moving average method calculated the seasonal fluctuations. This produced a single figure that showed the original series (observed), trend, seasonal effects, and random elements (Figure 3). The additive model seemed more appropriate than the multiplicative model because, over time, the frequency of the seasonal fluctuations and the pattern did not vary. The increasing variance of the random element meant that a log transformation was more suitable for the series.
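The additive decomposition described above can be sketched as follows. This is a minimal classical decomposition (centred moving average for the trend, monthly means of the detrended series for the seasonal effect), written under the assumption of an even period such as 12 and at least two full years of data; it is not the routine used by the authors, and all names and the synthetic series are hypothetical.

```python
def centered_moving_average(series, period=12):
    """2 x period centred moving average; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # End points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def additive_decompose(series, period=12):
    """Split series into trend, seasonal and remainder (observed = sum of the three)."""
    trend = centered_moving_average(series, period)
    detrended = [x - m if m is not None else None for x, m in zip(series, trend)]
    means = []
    for k in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == k]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period  # centre the seasonal effects on zero
    seasonal = [means[i % period] - offset for i in range(len(series))]
    remainder = [x - m - s if m is not None else None
                 for x, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


# Synthetic series: linear trend plus a fixed zero-sum monthly pattern.
pattern = [5, -3, 2, -4, 0, 1, 6, -2, -5, 3, -1, -2]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, remainder = additive_decompose(series)
```

On this noise-free synthetic series the recovered seasonal effects equal `pattern` and the remainder is zero up to rounding, which is the behaviour expected when the additive model holds.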
A natural logarithm transformation and first differencing were carried out to stabilise the variance of the time series. Furthermore, stationarity testing of the time series was carried out using the
test for the monthly DF cases in Kuantan. For the ADF test, the hypotheses were H0: Xt is non-stationary and Ha: Xt is stationary. For the KPSS test, the hypotheses were H0: Xt is level or trend stationary and Ha: Xt is non-stationary.
As shown in Table 1(a), the ADF test p-value of 0.513 (greater than α = 0.05) indicated that the original time series was non-stationary. This non-stationarity was also supported by the KPSS test p-value of 0.02417 (less than α = 0.05). Therefore, differencing was used to make the original time series stationary. First differencing of the original time series detrended and stabilised it. Table 1(b) shows an ADF test p-value of 0.01 (less than α = 0.05), indicating rejection of the null hypothesis of non-stationarity and demonstrating the success of differencing the time series. The KPSS test p-value of 0.1 (greater than α = 0.05) indicated that the null hypothesis of stationarity in the time series was
Table 1: (a) Unit root test before differencing, (b) unit root test after first differencing.
Unit Root Test    t-statistic    P-value
The dependence structure of the series was examined using ACF and PACF analyses. The ACF and PACF plots in Figures 4(b) and 4(c) indicated that both non-seasonal (p, d, q) and seasonal (P, D, Q) parameters were required in the model design. Clear cut-offs were observed at lags 1 and 12 on the ACF and PACF plots after non-seasonal differencing, as shown in Figures 4(e) and 4(f). The ACF and PACF analyses suggested that the p and q values should be equal to 0 or 1.
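The sample autocorrelations behind an ACF plot can be computed directly, as in the minimal sketch below. The function name `sample_acf` and the synthetic seasonal series are hypothetical; the study itself produced its plots with statistical software.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic monthly series with a pure 12-month cycle (96 points, as in 8 years).
seasonal_series = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(96)]
acf = sample_acf(seasonal_series, max_lag=24)

# A strong positive value at lag 12 is the signature of yearly seasonality.
print(round(acf[12], 3))
```

A series with a yearly cycle shows a large positive autocorrelation at lag 12 and a large negative one at lag 6, which is the kind of pattern that motivates adding seasonal terms to the model.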
Figure 4: (A) Natural series of dengue fever cases, (B) ACF plot of the natural series, (C) PACF plot of the natural series, (D) natural logarithm with first differencing of the dengue fever series, (E) ACF plot of the natural logarithm with first differencing, (F) PACF plot of the natural logarithm with first differencing
Table 2 shows the BIC, RMSE, MAE and MAPE values of the developed SARIMA models for different p, d and q parameters. Of these models, the SARIMA (0,1,0)(3,0,2)12 model had the lowest BIC, RMSE, MAPE and MAE values and the highest coefficient of determination (R2) value, which made it the most appropriate, compatible, and best-fit model for DF cases. The parameters were estimated using maximum likelihood estimation (MLE), which was considered the most appropriate method of estimation. A Ljung-Box test of the SARIMA (0,1,0)(3,0,2)12 model, using the values below, had a p-value greater than α = 0.05, which indicated that the model was appropriate.
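The error measures used to compare candidate models can be sketched as below. The definitions are the standard ones for MSE, RMSE, MAE and MAPE; the function name `forecast_errors` and the example numbers are hypothetical and are not values from Table 2.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical monthly counts, for illustration only.
metrics = forecast_errors(actual=[100, 200], predicted=[110, 180])
print(metrics)
```

Lower values indicate a closer fit on all four measures. MAPE is scale-free, which makes it convenient when comparing months with very different case counts.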
Table 2: Tentative SARIMA models for dengue fever cases in Kuantan
(3,0,2)12 model data. For all the time lags, the plot showed that the ACF values fell within the 95% confidence interval (CI) and were close to zero, indicating that the residual series can be considered white noise. The normality plot, shown in Figure 5(b), revealed that the residuals were distributed normally. The p-value of 0.844, shown in Table 3, indicated that the null hypothesis of normality could not be rejected, which is consistent with normally distributed residuals.
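The white-noise check on the residuals can be illustrated with the approximate 95% bounds for a residual ACF plot and the Ljung-Box Q statistic. This is a sketch under the usual large-sample approximations; the function names are hypothetical, and the p-value (a chi-square tail probability) is deliberately not computed here.

```python
import math


def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for sample autocorrelations of white noise."""
    return z / math.sqrt(n)


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# With 96 monthly observations the plotted bounds sit at roughly +/- 0.2.
bound_96 = white_noise_bound(96)

# Strongly autocorrelated toy residuals give a large Q (evidence against white noise).
q_toy = ljung_box_q([1, -1] * 5, max_lag=1)
```

Q is compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of fitted ARMA parameters. A small Q, and hence a p-value above 0.05, is what supports treating the residuals as white noise.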
Figure 5: (a) Residual plot of SARIMA (0,1,0)(3,0,2)12, (b) normality plot of residuals
The statistics shown in Table 3 are from the Shapiro-Wilk and Kolmogorov-Smirnov tests. The Shapiro-Wilk test is used for datasets with fewer than 2000 observations; otherwise, the Kolmogorov-Smirnov test is used. Since there were only 96 observations, the Shapiro-Wilk test was used. If the residuals were distributed
Table 3: Tests of Normality (Kolmogorov-Smirnov and Shapiro-Wilk)
After diagnostic testing of the time series, the model was tested using actual DF case values from January 2011 to December 2018, which were designated the training dataset. The SARIMA (0,1,0)(3,0,2)12 model was run on this dataset to forecast DF cases in 2018. The results were validated by comparing them with the actual DF case numbers in 2018 using BIC, R2, RMSE, MAPE and MAE, which allowed for an objective view of the strengths and weaknesses of each model. The validation
Figure 6: Validation of the SARIMA (0,1,0)(3,0,2)12 model with actual dengue fever cases from
(0,1,0)(3,0,2)12 model. Later, the model was used to forecast monthly DF cases for 2019 (Figure 7). The blue line shows the predicted DF case numbers from January 2011 to December 2018. The results showed that the SARIMA (0,1,0)(3,0,2)12 model’s forecast was reasonably accurate, with the predicted DF case pattern for 2019 being almost identical to the actual DF case pattern and falling within the 95% CI, confirming that the forecast was adequate.
Figure 7: Plot of predicted dengue cases in 2019 using SARIMA (0,1,0)(3,0,2)12 with
4. Discussion
To assess the risk of an outbreak, particularly of DF, an early prediction tool is necessary. Rather than merely controlling the disease, early detection allows for both early intervention and prevention. Therefore, an early warning system must be established to identify and quantify the threat of DF in the population. The existing system of DF outbreak prediction focuses solely on various entomological indices while ignoring epidemiological indices. The SARIMA model is a useful tool for tracking and interpreting data. It has great potential as a public health decision-making tool to improve contingency planning and mitigation initiatives (Dom et al., 2013). The SARIMA model developed in this study closely mimicked the pattern of DF cases in Kuantan. The model was tested by forecasting DF case numbers for 2019 through autoregression and moving average parameters. Therefore, using multi-month trend extrapolation, this model can successfully forecast the
This study focused on forecasting DF cases in Kuantan using a SARIMA model. Of all the models developed in this study, the SARIMA (0,1,0)(3,0,2)12 model was the most appropriate and parsimonious, with the lowest BIC, RMSE, MAE and MAPE values and the highest R2 value. It was found to accurately predict the number of DF cases months ahead of time, indicating that the method could be used to predict DF case numbers for 2019 in Kuantan. The model forecasted a total of 814 DF cases in 2019, with the highest number of cases (111, or 14%) occurring in November and the lowest (48, or 6%) occurring in February (Table 4). A SARIMA model evaluated with the same BIC, RMSE, MAE and MAPE criteria was used to predict DF case numbers in Selangor and was found to closely reflect the actual number of DF cases (Thiruchelvam et al., 2018). Several other studies have reported similar findings using SARIMA models developed from secondary data and assessed with the Akaike information criterion (AIC), RMSE, MAE and MAPE (Phuthomdee et al., 2018; Dom et al., 2013). This model was able to consistently predict and tally with actual DF case numbers. Many studies also consider and discuss climate change impacts, such as precipitation, temperature, and
Table 4: Summary of the forecasted values with the lower and upper 95% CI, SARIMA (0,1,0)(3,0,2)12 model, 2019
            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Forecast     53   48   59   53   52   57   76   59   68   83  111   95
UCL          79   83  115  113  120  139  198  162  196  252  350  313
LCL          34   25   27   21   18   18   21   15   16   18   21   17
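The summary figures quoted in the text (814 forecast cases, a November peak of about 14%, a February low of about 6%, and a 41% rise over 2018) can be reproduced from the Forecast row of Table 4. In the short check below, the 2018 total of 578 is the figure reported in the Conclusion; the variable names are hypothetical.

```python
# Monthly point forecasts for 2019, copied from the Forecast row of Table 4.
forecast_2019 = {
    "Jan": 53, "Feb": 48, "Mar": 59, "Apr": 53, "May": 52, "Jun": 57,
    "Jul": 76, "Aug": 59, "Sep": 68, "Oct": 83, "Nov": 111, "Dec": 95,
}

total_2019 = sum(forecast_2019.values())
peak_month = max(forecast_2019, key=forecast_2019.get)
low_month = min(forecast_2019, key=forecast_2019.get)
peak_share = 100.0 * forecast_2019[peak_month] / total_2019
low_share = 100.0 * forecast_2019[low_month] / total_2019

# Reported 2018 total, taken from the Conclusion.
cases_2018 = 578
increase = total_2019 - cases_2018
increase_pct = 100.0 * increase / cases_2018

print(total_2019, peak_month, round(peak_share), low_month, round(low_share))
print(increase, round(increase_pct))
```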
5. Conclusion
In conclusion, every objective of this study was successfully achieved. Based on our prediction model, Kuantan can expect an increase of 236 dengue fever (DF) cases (41%) in 2019 from the 578 cases reported in 2018. This model is a good fit for these cases only because they arise from localised transmission (peri-domestic infection). DF transmission is very complex, with the risk of transmission varying between locations and from season to season. The disease cycle depends on seasonal conditions, immunity, and changes in hyper-
Due to the intrinsically complex nature of these processes, time series prediction is a
of both systems, is nearly impossible to tell. More importantly, it is unknown to what degree a non-linear deterministic system preserves its properties when distorted by white noise. White noise may influence a model in various ways even though the model’s equations remain deterministic. Since there is no single accurate statistical measure of chaos, it is crucial to combine multiple tests, especially when working with small and noisy data sets such as
Ideally, this model can be used to track and predict the occurrence of DF in Kuantan. This is in line with the need to develop DF monitoring and prediction strategies to reduce not only local and national cases but regional cases as well. Therefore, the SARIMA model can accurately forecast DF cases, thereby enhancing the current intervention programme by allowing vector control measures to be implemented a few months ahead of DF seasons.
References
Ahmad, N., Rozita, H., Mazrura, S., Zainudin, M. A., Hidayatulfathi, O., Faridah, M. A., & Artika, E. A. N. (2012). Dengue infections and circulating serotypes in Negeri Sembilan, Malaysia. Malaysian Journal of Public Health Medicine, 12(1), 21-30.
Bhatia, R., Dash, A. P., & Sunyoto, T. (2013). Changing epidemiology of dengue in South-East Asia. WHO South-East Asia Journal of Public Health, 2(1), 23-27.
Boudrioua, M. S., & Boudrioua, A. (2020). Modeling and Forecasting the Algerian Stock Exchange Using the Box-Jenkins Methodology. Journal of Economics, Finance and Accounting Studies, 1-15.
Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
Bujang, M. A., Mudin, R. N., Haniff, J., Sidik, T. M. I. T. A. B., & Nordin, N. A. M. (2017). Trend of dengue infection in Malaysia and the forecast up until year 2040. International Medical Journal, 24(6), 438-441.
Chew, M. H., Rahman, M. M., Jelip, J., Hassan, M. R., & Isahak, I. (2012). All serotypes of dengue viruses circulating in Kuala Lumpur, Malaysia. Current Research Journal of Biological Sciences, 4(2), 229-234.
Cryer, J. D., & Chan, K. S. (2008). Time series analysis: with applications in R. Springer Science & Business Media.
Dom, N. C., Hassan, A. A., Abd Latif, Z., & Ismail, R. (2013). Generating temporal model using climate variables for the prediction of dengue cases in Subang Jaya, Malaysia. Asian Pacific Journal of Tropical Disease, 3(5), 352-361.
Gjenero-Margan, I., Aleraj, B., Krajcar, D., Lesnikar, V., Klobučar, A., & Pem-Novosel, L. (2011). Autochthonous dengue fever in Croatia, August-September 2010. Euro Surveill.
Holmes, E. C., Tio, P. H., Perera, D., Muhi, J., & Cardosa, J. (2009). Importation and co-circulation of multiple serotypes of dengue virus in Sarawak, Malaysia. Virus Research, 143(1), 1-5.
Ler, T. S., Ang, L. W., Yap, G. S. L., Ng, L. C., Tai, J. C., James, L., & Goh, K. T. (2011). Epidemiological characteristics of the 2005 and 2007 dengue epidemics in Singapore–similarities and distinctions. Western Pacific Surveillance and Response Journal: WPSAR, 2(2), 24.
Mia, M. S., Begum, R. A., Er, A. C., Abidin, R. D., & Pereira, J. J. (2013). Trends of dengue infections in Malaysia, 2000–2010. Asian Pacific Journal of Tropical Medicine, 6(6), 462-6.
Mudin, R. N. (2015). Dengue incidence and the prevention and control program in Malaysia. IIUM Medical Journal Malaysia, 14(1).
Nisalak, A., Clapham, H. E., Kalayanarooj, S., Klungthong, C., Thaisomboonsuk, B., Fernandez, S., ... & Cummings, D. A. (2016). Forty years of dengue surveillance at a tertiary pediatric hospital in Bangkok, Thailand, 1973–2012. The American Journal of Tropical Medicine and Hygiene, 94(6), 1342-1347.
Phuthomdee, S., Soontornpipit, P., Viwatwongkasem, C., & Sillabutra, J. (2018). Dengue Forecasting Model using SARIMA model to predict the Incidence of Dengue in Thailand. Current Applied Science and Technology, 18(2), 58-65.
Skae, F. M. T. (1902). Dengue fever in Penang. British Medical Journal, 2(2185), 1581.
Thiruchelvam, L., Dass, S. C., Zaki, R., Yahya, A., & Asirvadam, V. S. (2018). Correlation analysis of air pollutant index levels and dengue cases across five different zones in Selangor, Malaysia. Geospatial Health, 13(1).
Vaughn, D. W., Green, S., Kalayanarooj, S., Innis, B. L., Nimmannitya, S., Suntayakorn, S., & Nisalak, A. (2000). Dengue viremia titer, antibody response pattern, and virus serotype correlate with disease severity. The Journal of Infectious Diseases, 181(1), 2-9.
Wallace, H. G., Lim, T. W., Rudnick, A., Knudsen, A. B., Cheong, W. H., & Chew, V. (1980). Dengue hemorrhagic fever in Malaysia: the 1973 epidemic. The Southeast Asian Journal of Tropical Medicine and Public Health, 11(1), 1-13.