A Data-driven Approach to Forecasting Ground-level Ozone Concentration
Abstract

The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers wishing to reduce the impact of pollution on public health through active measures (e.g. temporary traffic closures). In this study, we present a machine learning approach applied to forecasts of the day-ahead maximum value of ozone concentration for several geographical locations in southern Switzerland. Due to the low density of measurement stations and to the complex orography of the use-case terrain, we adopted feature selection methods instead of explicitly restricting relevant features to a neighborhood of the prediction sites, as is common in spatio-temporal forecasting methods. We then used Shapley values to assess the explainability of the learned models in terms of feature importance and feature interactions in relation to ozone predictions. Our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables. Finally, we show how weighting observations helps to increase the accuracy of the forecasts for specific ranges of ozone's daily peak values.

Keywords: Shapley values; Genetic algorithms; Environmental forecasting; Evaluating forecasts; Multivariate time series
© 2021 The Author(s). Published by Elsevier B.V. on behalf of International Institute of
Forecasters. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1016/j.ijforecast.2021.07.008
D. Marvin, L. Nespoli, D. Strepparava et al. International Journal of Forecasting 38 (2022) 970–987
bring to this work. Section 2 introduces the dataset and the nomenclature we use in the paper to refer to the different features, and Section 3 presents the forecasting problem peculiarities and the problem formulation. Section 4 describes the two feature selection methods that were tested, namely a custom genetic algorithm and a feature selection method based on Shapley values. Section 5 outlines the regression algorithms used to perform the analysis. Section 6 introduces the deterministic and probabilistic key performance indicators (KPIs) that we used to evaluate the different forecasters. Section 7.1 presents the results of the two tested feature selection algorithms. In Section 7.2 we study how different features and feature interactions affect the final predictions of the forecast, using Shapley values. In Section 7.3, we show the numerical results for the tested forecasting algorithms, while in Section 7.4 we focus on predictions of extreme events. Finally, Section 8 concludes the paper with a summary of our main findings.

1.1. Related works

Tropospheric ozone concentration has been the subject of several studies, both for prediction (the task of finding a map from a set of covariates to a target) and for forecasting (predicting the values of the target in advance, in future time steps). In Al Abri, Edirisinghe, Nawadha, and Kingdom (2015) different non-parametric models from the WEKA toolkit were tested to derive the ozone concentration from a set of eight different gaseous chemicals and atmospheric conditions measured at a single location. Similarly, WEKA was used in Mohan and Saranya (2019) to adapt models representing atmospheric conditions to the ozone concentration at ground level, which showed that even summer ozone peaks can be accurately predicted if the atmospheric conditions are known. In Feng, Zhang, Sun, and Zhang (2011), meteorological data from a site near Beijing were used to predict the hourly ozone concentration at that point by using a neural network whose weights were trained using a genetic algorithm. In addition, different models were adapted for different times of the day. In Sheta, Faris, Rodan, Kovač-Andrić, and Al-Zoubi (2018), a nonlinear state-space model using PM10, temperature, wind speed, and relative humidity as inputs was identified by using a neural network to predict ozone concentration. The model was then compared with linear models and a multilayer perceptron. In Siwek and Osowski (2016) the authors used a dataset of 55 characteristics (meteorological conditions and their statistical transformations) collected in Warsaw to predict various air pollutants. They showed that by reducing the number of features with a pre-selection step, the final accuracy of the prediction could be increased. Two pre-selection methods were compared: a genetic algorithm and a stepwise greedy strategy for linear models.

The task of forecasting PM2.5 and ozone concentrations for three large Chinese cities was considered in Lv, Cobourn, and Bai (2016). Like in our study, the authors considered multiple monitoring stations, but the final values of the relevant atmospheric variables were weighted averages of neighbors of the target cities. The forecasts were obtained by fitting knowledge-based empirical formulae using historical data. No systematic investigation of the interaction of variables was carried out. The authors showed how the maximum daily temperature is the single most relevant variable in predicting (and forecasting) ozone concentration. They consistently found a strong correlation between the numerical weather prediction forecast error of this value and the error for ozone prediction. In Eslami, Choi, Lops, and Sayeed (2019) the authors proposed a deep convolutional neural network (CNN) to forecast the hourly ozone concentrations for the day ahead, over 25 monitoring sites. Despite the ability of the CNN to predict daily ozone trends correctly, the authors found that it under-predicted high ozone peaks during the summer. In Gong and Ordieres-Meré (2016), the authors focused on forecasts of extreme ozone concentrations, which are also the most useful to predict. Forecasting extreme events is, in fact, more complicated than predicting them, as demonstrated in Mohan and Saranya (2019). When one is mostly interested in predicting these tail events, sampling techniques can be applied in order to mitigate the class imbalance problem (rare events are under-represented in the training data). The authors of this study applied different sampling methods to increase the classification accuracy of ozone concentration, considering three different classes. They found that under-sampling can indeed increase the classification performance. Unfortunately, a drawback of this technique is that several data of the most represented class are discarded, which could lead to a lack of generalization of the model, due to over-fitting or a reduction of cross-learning (learning patterns from data in a given class, which are also present in a second unobserved class, which could increase the prediction accuracy).

1.2. Contributions

In the presence of a high number of relevant features, the task of forecasting the next-day peak in the concentration of ozone becomes highly challenging, due to the low number of observations on which a forecasting algorithm can be trained. In fact, having a dataset consisting of a few years of observations could result in having a number of features higher than the number of observations, as in the presented case. On the other hand, observations further back in time may not be representative of the current situation, as the mixture of nitrogen oxides in the air has changed over time following vehicle fleet renewal. As a consequence, due to the scarce number of instances, we could not apply under-sampling techniques, as done in Mohan and Saranya (2019). The only effective way to train a model is by applying dimensionality reduction techniques. Our first contribution consists of evaluating two different methods to perform feature selection. First, we tested a genetic algorithm, as was done in Siwek and Osowski (2016) for the pollutant prediction task. In this case, we crafted custom mutation and crossover functions tailored to the forecasting task. The second approach we tested is based on Shapley values (Lundberg & Lee, 2017). We then evaluated and compared the two feature selection methods. To show that the feature selection step is beneficial in increasing the accuracy of
the predictive algorithm, we compared our models with two control cases: one in which the model uses all the available features and one in which we pick the features entirely at random. Our second contribution is to compare the performance of different popular learning algorithms trained on the selected features. Third, we investigate the effect of imposing weights on the observations with the highest daily ozone concentration on the algorithms' forecasting quality of extreme values. Our final contribution is an a posteriori explanation of feature importance. We investigate the more relevant feature interactions in predicting the ozone peak and explain our findings in terms of atmospheric physics.

Table 1
Geographic context of air quality and weather monitoring stations.

Station     Code   Altitude [m a.s.l.]   Context   Main O3 source
Locarno     l1     200                   Urban     Industry
Brione      l2     486                   Suburb    Valley floor
Bioggio     l3     314                   Suburb    Industry
Tesserete   l4     518                   Rural     Valley floor
Chiasso     l5     230                   Urban     Industry
Mendrisio   l6     354                   Suburb    Industry
Sagno       l7     704                   Rural     Valley floor

Table 2
Geographic context of weather forecasting locations.

Location      Code   Altitude [m a.s.l.]   Context
Comprovasco   p1     575                   Rural
Matro         p2     2171                  Mountain
Bioggio       p3     518                   Suburb
Tesserete     p4     626                   Rural
Chiasso       p5     240                   Urban
Sagno         p6     704                   Suburb

Table 3
Dataset description. The symbol † denotes a variable that is both measured and forecasted by the NWP service, while the symbol ‡ indicates a signal that is only forecasted.

Signal                                Symbol   Unit
Nitrogen oxide                        NO       [µg/m³]
Nitrogen dioxide                      NO2      [µg/m³]
Generic nitrogen oxides               NOx      [ppb]
Ozone                                 O3       [µg/m³]
† Global irradiance                   G        [W/m²]
† Atmospheric pressure                P        [hPa]
† Precipitation                       Prec     [mm]
† Relative humidity                   RH       [%]
† Temperature                         T        [°C]
‡ Dew point                           TD       [°C]
† Wind direction, vectorial average   Wd       [°]
† Wind speed, vectorial average       Ws       [m/s]
† Cloud cover                         CN       [–]
2. Dataset

2.1. Geographical context and data acquisition

In this study, we focused on the Canton of Ticino, the southernmost canton of Switzerland. In this region, the concentration of air pollutants is generally higher than in the rest of the country and is influenced by both the orography and the level of urbanization and industrialization. The natural shield provided by the Alps makes Ticino the region with the highest solar radiation rate in Switzerland. Ticino is characterized by a densely populated and heavily trafficked southern region, and by a sparsely populated and more mountainous northern region. It borders Lombardy to the south, the most industrialized region in Italy.

We used data acquired from several air quality and weather stations distributed in the region. In addition, the numerical weather prediction (NWP) service of the Swiss Federal Office of Meteorology and Climatology MeteoSwiss provided weather forecasts for some of these locations. Fig. 1 shows the position of the monitoring stations and the locations for which the weather forecasts are available. Tables 1 and 2 describe the geographical context for the monitoring stations and the weather forecasting locations, respectively.

Due to its photochemical origin, O3 shows a strong seasonal pattern, with higher concentrations in summer. For this reason we focused our analysis only on the period from May to September in the years between 2015 and 2019. This period was chosen to take into account a significant number of measurements, i.e. enough to train the algorithms correctly, while simultaneously avoiding the use of previous years, when the emissions of the precursors NO, NO2, and NOx in Switzerland were more intense than today. We used data from the first four years to train the forecasting algorithms and data from 2019 to test them.

A variety of signals covering weather and air quality with hourly resolution were considered for the model, as shown in Table 3. Most monitoring stations record the full set of signals specified in Table 3 on site. The few exceptions, where some signals were not collected locally, were managed with data from the nearest available stations. When training the forecasting algorithms, we considered data measured up to 72 h in the past (except for the ARIMAX model) and weather forecasts up to 33 h in the future. The value of 72 h was chosen based on preliminary results, in which we considered history lengths of up to seven days and systematically shortened them. As we did not experience significant improvements in accuracy from using a history length greater than 72 h, we fixed this value for all the experiments.

For various reasons, such as station maintenance, data transmission failure, or power outages, some of the data in the time series of the years from 2015 to 2019 are missing. During the training period, data completeness is above 99%, but two stations have substantial data holes in 2019: in Tesserete and Sagno, respectively, 50 and 15 full days of measurements are missing during the test period. All the missing data in the training set were filled using a random forest with surrogate splits, trained to predict the missing data using the station itself and its neighbors.

2.2. Feature engineering
Fig. 1. Map of Ticino with the locations of stations used in the study. Red shows the stations where air quality and weather measurements are
collected, and blue shows the locations for which weather forecasts are available. The maps were originally downloaded from d-maps-1 (2020a)
and d-maps-2 (2020b). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
the overall number of features and minimize the computational effort, we partly replaced the hourly values of the measured and forecasted signals with basic statistical aggregations, i.e. minimum, maximum, and average values over a longer time period, as illustrated in Table 4.

Based on suggestions from experts in the field of atmospheric physics,² we further manipulated some of the signals available in the dataset to create additional features. The engineered features are listed in Table 4. In addition, we included a categorical feature, called RHW, which describes the general weather situation in Switzerland for the prediction day using 12 weather types.

² Environment Observatory of Southern Switzerland (OASI).

For each location, separate forecasting models are trained using a subset of the matrix of all features. This subset contains data specific to the location and information from the neighboring stations. For NWP data, hourly values are used for the specific location, while bins are used for the data of the neighboring stations, as summarized in Table 4. For example, the dataset of Chiasso contains the hourly NWP data from Chiasso itself and bin aggregations from Sagno.

Given the different number of stations involved each time, the number of features for each model varies between 1700 and 2100.

2.3. Nomenclature

The ozone forecast at any station for any given day D is computed twice: first at 18:30 (16:30 UTC) on the previous day D − 1, which we call the EVE forecast, and second at 06:30 (04:30 UTC) on the same day D, here called the MOR forecast. This is because the weather forecasts issued by the NWP services are published twice a day, at 05:00 and at 14:00 local time. Fig. 2 illustrates the time window for a generic day. For each station, we tested eight different prediction methods at both prediction times, EVE and MOR, for a total of 16 models per station.

When labeling the aggregated data in Table 4, we use the following conventions. Measured quantities are denoted by the letter m, and weather forecasts provided by NWP services are denoted by the letter s. The index is the difference in hours between the last available data point and the acquisition time. Following this convention, m0 refers to the last measured data point available, i.e. the value measured at 06:00 for the MOR forecast and at 18:00 for the EVE forecast. Likewise, m1 refers to the value measured at 05:00 and 17:00, and so on up to m23. The same temporal indexing applies to the values provided by NWP services. For MOR we call s0 the forecasted value at 06:00, s1 the value for 07:00, and so on. In the EVE case we call s0 the predicted value at 18:00, s1 the predicted
Table 4
Summary of all the features used in this study. More information about the MOR and EVE cases is given in Section 2.3.

Signal kind                           Time interval                                  Code                    Aggregation
All measured data                     Past 24 h (m0, ..., m23)                       mi                      Hourly values
                                      From 0 to 24 h before                          24h                     Mean
                                      From 0 to 48 h before                          48h                     Mean
                                      From 0 to 72 h before                          72h                     Mean
All forecasts (same station)          MOR: from s0 to s32; EVE: from s0 to s29       si                      Hourly values
All forecasts (neighboring station)   MOR: from s0 to s7; EVE: from s0 to s6         b1                      Minimum, maximum, and
                                      MOR: from s8 to s16; EVE: from s7 to s13       b2                      average of every bin bi
                                      MOR: from s17 to s24; EVE: from s14 to s19     b3
                                      MOR: from s25 to s32; EVE: from s20 to s29     b4
Measured NOx                          MOR: previous afternoon (m6 to m18);           NOx12h                  Mean
                                      EVE: previous morning (m6 to m18)
Forecasted T                          MOR: upcoming afternoon (s6 to s18);           T̂PM, T̂PM,squared        Mean and squared mean
                                      EVE: upcoming afternoon (s18 to s29)
Forecasted T                          MOR: all hourly values, from s0 to s32;        T̂max                    Maximum
                                      EVE: all hourly values, from s0 to s29
Forecasted TD                         MOR: all hourly values, from s0 to s32;        T̂Dmax, T̂Dmax,transf     Maximum, (Maximum + 20)³
                                      EVE: all hourly values, from s0 to s29
Forecasted G                          MOR: upcoming morning (s0 to s6);              ĜAM, ĜPM                Mean
                                      MOR: upcoming afternoon (s6 to s18);
                                      EVE: upcoming morning (s6 to s18);
                                      EVE: upcoming afternoon (s18 to s29)
Forecasted CN                         MOR: upcoming morning (s0 to s6);              ĈNAM, ĈNPM              Mean
                                      MOR: upcoming afternoon (s6 to s18);
                                      EVE: upcoming morning (s6 to s18);
                                      EVE: upcoming afternoon (s18 to s29)
Forecasted Prec                       Upcoming 24 h (s0 to s23)                      P̂rec24h,sum             Sum
Measured O3                           O3 measurements of the previous day            YO3                     Maximum
Forecasted RHW                        One categorical value for the prediction day   RHW                     –
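As a concrete illustration, the bin aggregation applied to forecasts from neighboring stations (rows b1–b4 of Table 4, MOR case) can be sketched as follows; the helper name is ours, not from the paper:

```python
import numpy as np

# Bin edges for the MOR case, from Table 4: b1 = s0..s7, b2 = s8..s16,
# b3 = s17..s24, b4 = s25..s32 (slice ends are exclusive).
MOR_BINS = [(0, 8), (8, 17), (17, 25), (25, 33)]

def bin_features(hourly_forecast, bins=MOR_BINS):
    """Replace an hourly forecast vector from a neighboring station with
    the minimum, maximum, and average of each aggregation bin."""
    feats = {}
    for i, (lo, hi) in enumerate(bins, start=1):
        chunk = np.asarray(hourly_forecast[lo:hi], dtype=float)
        feats[f"b{i}_min"] = chunk.min()
        feats[f"b{i}_max"] = chunk.max()
        feats[f"b{i}_mean"] = chunk.mean()
    return feats

# 33 hourly values s0..s32, as in the MOR case
feats = bin_features(np.arange(33.0))
```

For the EVE case only the bin edges change (s0–s6, s7–s13, s14–s19, s20–s29), yielding the same 12 features per signal and neighboring station.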
Fig. 2. Time window of input and output data. Times are given in local time (CEST).
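The timing conventions of Fig. 2 and the m/s indexing of Section 2.3 can be made concrete with a small helper (local-time arithmetic only; the function names are ours):

```python
from datetime import datetime, timedelta

# Anchor hours of the two daily forecasts (local time, Section 2.3):
# for MOR the last measurement m0 is the 06:00 value, for EVE the 18:00
# value, and s0 is the forecast for that same hour.
ANCHOR = {"MOR": 6, "EVE": 18}

def measured_hour(i, case, day=datetime(2019, 7, 1)):
    """Wall-clock time of the measured value m_i (i hours back)."""
    return day.replace(hour=ANCHOR[case]) - timedelta(hours=i)

def forecast_hour(i, case, day=datetime(2019, 7, 1)):
    """Wall-clock time of the forecasted value s_i (i hours ahead)."""
    return day.replace(hour=ANCHOR[case]) + timedelta(hours=i)
```

For instance, measured_hour(1, "MOR") falls at 05:00, and forecast_hour(10, "EVE") falls at 04:00 on the following day, matching the examples given in Section 2.3.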
value at 19:00, and so on. The structure of the aggregation bins is shown in Table 4.

To refer to each specific component of the models, we denote the features based on the location of the measurement and the time to which it refers, combining the codes of Tables 3, 4, 1, and 2. For example, G_{m1}^{l1} designates the global irradiance measured in Locarno at 05:00 in the MOR model and at 17:00 in the EVE model. Similarly, T̂_{s10}^{p3} is the forecasted temperature in Bioggio at 16:00 in the MOR model, and at 04:00 on the following day in the EVE model. NO_{72h}^{l2} is the mean value of all the measured NO concentrations up to 72 h before the prediction, in Brione.

3. Problem formulation

The problem of forecasting the daily maximum ozone signal presents the following characteristics:

1. The signal is strongly seasonal, due to the presence of annual patterns in both the anthropogenic and non-anthropogenic processes governing ozone generation.
2. The signal is non-stationary, since its variance is subject to inter- and intra-annual fluctuations.

3. The forecasts' dependence on the features is non-linear, as described in the literature and as further detailed in Section 7.2.

4. The forecasted values are physically bounded by the photo-chemistry and advective phenomena regulating the formation and transport of ozone in the troposphere and atmosphere.

5. In our use case, the monitoring stations providing measurements of features relevant for ozone forecasting, such as temperature and past ozone and NOx values, are not sufficiently dense (nor at similar distances from the prediction points) to provide a regular mesh, as can be seen in Fig. 1. In this case, the use of spatio-temporal Gaussian processes (Kupilik & Witmer, 2018), Gaussian Markov random fields (Cameletti, Lindgren, Simpson, & Rue, 2013), or other graph-based spatio-temporal techniques (Carrillo et al., 2020) can lead to poor results.

Given the above considerations, we chose to model the daily ozone maxima using separate predictors for each location, while still taking into account relevant features from nearby locations. The neighboring stations for each prediction point are illustrated in Fig. 1, where the whole region is divided into three macro-zones. In any case, we let the feature selection processes described in Section 4 discriminate whether a given measurement station is relevant. Calling n the number of observations and k the number of features, we define a training and a test dataset, Dtr = {xtr, ytr} and Dte = {xte, yte}, respectively, where xtr ∈ R^(ntr×k) and xte ∈ R^(nte×k) are matrices of features, and ytr ∈ R^ntr and yte ∈ R^nte are the target vectors, containing the maximum hourly ozone concentration of the same days. In this paper, ntr and nte are equal to 587 and 151, respectively, that is, the number of available days between May and September for the 2015–2018 period and for 2019. On the other hand, the number of features, k, is fixed at 30 for all the numerical experiments and raised to 100 for the study of predicting high peaks, described in Section 7.4. These numbers were chosen experimentally, by systematically increasing them and choosing the k value beyond which the predictors' performance no longer increased significantly.

We train a model f(xtr, Θ), where Θ is the set of the model's parameters, in order to produce the forecasts for unseen data xte ∈ R^(nte×k):

    ŷte = f(xte, Θ).    (1)

In order to compare the results across the different approaches, we used regression-specific key performance indicators (KPIs), as classification scores can only be compared while using the same bins for the choice of the classes. Different values for the bins' edges are used in the ozone prediction literature, since those are typically chosen based on the local legislation. As such, we trained the model f(xtr, Θ) minimizing the L2 loss:

    Θ* = argmin_Θ ‖ytr − f(xtr, Θ)‖₂².    (2)

We highlight that this notation must be slightly adapted for the ARIMAX model introduced in Section 5; in this case the model can be described as f(xtr, ytr, Θ), where the endogenous input signal ytr is opportunely shifted with the use of the backshift operator, as further explained in Section 5.

4. Feature selection methods

Given the large number of features in each model, if we were to train the prediction algorithms using all the variables, whose number largely exceeds the number of available observations, we could potentially incur numerical problems of solution non-uniqueness and multicollinearity that would corrupt the prediction process. Moreover, even if the dataset contained a proportional number of observations, an excessive number of features would still result in a long computational time, which is justified only if the forecasting performance is better than that of an algorithm trained on a subset of the features.

We decided to perform feature selection using a custom implementation of a genetic algorithm (GA), as well as using a procedure from game theory that exploits Shapley values. The effectiveness of these two approaches is compared in Section 7.1 against a model composed of features picked at random and a model composed of all the available features.

4.1. Feature selection using a genetic algorithm

In our implementation of the GA, an individual A is defined as a subset of the entire feature set F with cardinality k:

    A ⊂ F,  |A| = k    (3)

where k is the number of retained features. As anticipated in Section 3, in this study we set k = 30.

We defined a crossover function that ensures that the offspring that emerges from it still retains k features from its parents, with no repetitions. Formally, the offspring C is a subset of the union of the sets of its parents A and B, with cardinality k:

    C ⊂ (A ∪ B),  |C| = k    (4)

We defined a custom mutation function so that each feature of the offspring C is either the original feature of its parent A with probability 95%, or a new feature from B ⊂ (F \ A) with probability 5%, and such that C has only unique features. In practice, we generate two sequences a and b from the sets A and B, by randomly fixing their order, and iterate on them to generate the new set C, which is composed of the elements of the sequence c, where the ith element of c is defined as:

    ci = ai 1(u ≥ 0.05) + bi 1(u < 0.05),  u ∼ U[0, 1]    (5)

The fitness function is defined as the out-of-bag mean squared error (MSE) of a random forest composed of 30 bootstrap-aggregated (bagged) decision trees, trained on the k active features of the individual. We selected a population size of 10k individuals, a crossover fraction of 80%, and an elite count of 5% of the population size. The GA stops after 100 stall generations, and the feature set of the best individual is selected.
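The crossover and mutation rules of Eqs. (3)–(5) can be sketched as follows; this is a minimal illustration under our own naming, and it omits the out-of-bag random-forest fitness evaluation:

```python
import random

def crossover(parent_a, parent_b, k):
    """Eq. (4): the offspring keeps k unique features drawn from the
    union of its parents' feature sets."""
    return set(random.sample(sorted(parent_a | parent_b), k))

def mutate(individual, feature_set, p=0.05):
    """Eq. (5): each feature is kept with probability 1 - p, or replaced
    by an unused feature drawn from F \\ A, so the offspring stays unique."""
    kept = list(individual)
    pool = random.sample(sorted(set(feature_set) - individual), len(kept))
    return {a if random.random() >= p else b for a, b in zip(kept, pool)}

F = set(range(2000))                   # ~2000 candidate features (Section 2.2)
A = set(random.sample(sorted(F), 30))  # an individual with k = 30 features
B = set(random.sample(sorted(F), 30))
child = mutate(crossover(A, B, 30), F)
```

Because the replacement pool is drawn from unused features and sampled without replacement, the offspring always keeps exactly k distinct features, as required by Eqs. (3) and (4).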
4.2. Feature selection using Shapley values

Another method to assess the importance of each feature is presented in Lundberg and Lee (2017). This method assigns feature importance scores using Shapley values, which originated in the field of game theory, where they are used to estimate the contribution of various agents in increasing the welfare of a community. These are expressed as:

    φi(f, xtr) = Σ_{z ⊆ xtr} [ |z|! (k − |z| − 1)! / k! ] [ f(z, Θ) − f(z\i, Θ) ]    (6)

where f(xtr, Θ) is a regression model, in our specific case the NGBoost algorithm, which is introduced in Section 5; xtr is the feature set on which the model has been trained; Θ is a model-specific set of parameters; k is the number of variables in the training set xtr; and z\i denotes the minus set operation, that is, the subtraction of the ith feature from the reduced dataset z. The authors in Lundberg and Lee (2017) showed that such coefficients have highly desirable properties that favorably affect their ability in the (local) explanation of the models, and have been shown to be consistent and more robust with respect to other, more widespread methods for the evaluation of feature importance. Furthermore, the authors in Lundberg et al. (2020) recently proposed a computationally efficient algorithm specifically tailored for tree-based models, and made it available through the shap Python package. The shap package provides an exact computation of Shapley value explanations for tree-based models. This provides local explanations with theoretical guarantees of local accuracy and consistency, which increase the robustness of the method, since it does not rely on the random samplings that would be required to find the Shapley values using approximate algorithms.

5. Regression models

After the k features that best explain the data are selected, we use them to create the regression matrix and produce the test forecasts. For this work, we studied the output of several parametric models, such as linear regression, ridge regression, LASSO, and ARIMAX, as well as more complex non-parametric tree-based algorithms, such as random forests, XGBoost, NGBoost, and LSBoost, as described below.

Penalized linear regression algorithms. Ridge regression is a method designed to avoid collinearity issues and near-singular matrix inversions when solving linear regression problems, especially in the case in which the number of features is large compared to the number of observations. In this case, the regression coefficients β ∈ R^k are quadratically penalized with parameter λ, such that the closed-form solution becomes:

    β̂R = (xtrᵀ xtr + λ Ik)⁻¹ xtrᵀ ytr,    (7)

where Ik is an identity matrix of size k. In this work, λ is tuned in cross-validation on the training set. Instead of penalizing β using the L2 norm, least absolute shrinkage and selection operator (LASSO) regression (Tibshirani, 1996) penalizes β using the L1 norm, such that some of the elements of β̂ can be set to zero. Unlike ridge regression, LASSO does not have a closed-form solution, but rather must be approximated through numerical methods.

ARIMAX. The well-known autoregressive integrated moving average with explanatory variables (ARIMAX) model is defined as:

    ŷt = β xtr + Σ_{i=1}^{p} φi B^i(y′t) − Σ_{i=1}^{q} θi εt−i + εt,    (8)

where B is the backshift operator, i.e. B^n(zt) = zt−n; εt is additive white Gaussian noise; and the time series y′ is the result of differencing ytr d times. The matrix xtr contains the daily values of the selected features up to the day before the prediction, and β denotes the regression coefficients as usual.

The models are created and calculated with statsmodels's SARIMAX function, and the parameters p, q, and d are chosen via grid search and fitted using maximum likelihood estimation; we considered only d = 0, since we do not have important trends, while we set 7 and 3 as the maximum values for p and q, respectively. We stress that, even if the features contained in xtr refer to the last 72 h, the ARIMAX model has been left free to extend the endogenous signal's influence on the forecast up to seven previous days, that is, p = 7. However, for all the considered locations, the grid search returned p ≤ 3.

Random forests and quantile random forests. The random forest (RF) algorithm independently fits several decision trees, each trained on a different dataset, created from the original one through random re-sampling of the observations and keeping only a fraction of the overall features, chosen at random (Hastie, Tibshirani, & Friedman, 2009). The final prediction of the RF is then a (possibly weighted) average of the trees' responses. One important variant of RF algorithms is quantile regression forests (QRF). The main difference from RF is that QRF keeps the value of all the observations in the fitted trees' nodes, not just their mean, and assesses the conditional distribution based on this information. Here, we used the Matlab TreeBagger class, which implements the QRF algorithm described in Meinshausen (2014).

Tree-based boosting algorithms. Boosting algorithms employ additive training: starting from a constant model, at each iteration a new tree or any other so-called ''weak learner'' hk(x) is added to the overall model Fk(x), so that Fk+1(x) = Fk(x) + η hk(x), where η ≤ 1 is a hyper-parameter denoting the learning rate, which helps reduce over-fitting. The least-squares gradient boosting (LSBoost) algorithm applies boosting in functional space: each weak learner h tries to learn the gradient (with respect to the previous model Fk(x)) of the least-squares loss function. In other words, hk is fitted on the overall prediction error at iteration k − 1.

A different approach is used by the XGBoost algorithm (Chen & Guestrin, 2016), which fits the additive
model Fk(x) in parameter space, that is, using a second-order approximation of the loss as a function of the parameters of the weak learners (decision trees). This approximation and other techniques used by XGBoost (like an approximate histogram search for selecting splitting points in the trees) result in a speedup of the training process, with respect to LSBoost or RF algorithms, without sacrificing accuracy. At the same time, the algorithm introduces quadratic penalization on the parameters' values and on the overall complexity of the trees, which can be tuned to further mitigate over-fitting.

In addition to the QRF algorithm, we used a second algorithm that is able to assess the conditional probability distribution of the predictions: natural gradient boosting (NGBoost) (Duan et al., 2019). While none of the previous algorithms introduced assumptions on the probability distribution of the observations, NGBoost explicitly fits the parameters of a parametric probability distribution on each observation. This is made possible by exploiting the tree structure of the underlying weak learner, since observations in the same leaves share the same probability distribution parameters. The algorithm is fitted in functional space, but instead of directly learning the maximum likelihood gradient, the authors propose to cor-

percentage error, S denotes forecast skill, and A denotes accuracy. RMSEpers is the RMSE of the persistence model, i.e. the model where the prediction at day D + 1 is equal to the measured value at day D. C(yi) is the function that associates every measured or forecasted value to the respective class, explicitly given by

    C(yi) = 1 if 0 < yi ≤ 60,
            2 if 60 < yi ≤ 120,
            3 if 120 < yi ≤ 135,
            4 if 135 < yi ≤ 180,
            5 if 180 < yi ≤ 240,
            6 if 240 < yi.    (14)

These values are the thresholds of classes of increasing severity of air pollution, as indicated by the Swiss Society of Air Protection Officers (Cercl'Air) (Swiss Society of Air Protection Officers, 2019). Class 3 is especially narrow compared to the other classes; as a result, it will be harder for the regression forecasting algorithms to correctly predict this class.

Finally, we evaluated those algorithms which also returned conditional distributions—that is, QRF and
rect it with the Fisher information. This results in fitting
NGBoost—using two additional KPIs. The first KPI is the
the so-called natural gradient, which makes the learning
reliability (Pinson, McSharry, & Madsen, 2010), defined as
process invariant to reparametrization of the underlying
probability distribution. n
1∑
We fitted LSBoost models using Matlab’s fitrensem- R(τ ) = 1{yi < ŷi,τ }, (15)
n
ble function, tuning its hyper-parameters via Bayesian i=1
optimization and using five-fold cross-validation. The XG-
where ŷi,τ is the quantile predicted by the algorithm at
Boost and NGBoost algorithms were fitted using their
the level τ ∈ [0, 1]. This KPI calculates how many of the
official xgboost and ngboost Python packages, respec-
tively, and hyper-parameters were selected using a grid total number of measured values are indeed lower than
search, always using a five-fold cross-validation strat- the quantile predicted on the same observations. If the
egy. We highlight that tuning the hyper-parameters in forecasting algorithm were perfect, the R(τ ) curve would
cross-validation mitigates over-fitting issues with the re- lie on the bisector of the first quadrant.
gression algorithms. The second probabilistic KPI is the average quantile
loss function, also known as pinball loss (Bentzien &
6. Key performance indicators Friederichs, 2014):
n
The performance of the forecasting algorithms intro- 1∑
ρ̄ (τ ) = ρτ (yi − ŷi,τ ), (16)
duced in Section 5 was evaluated using to the following n
i=1
standard performance indicators:
n where the function ρτ (x) is defined as
1 ∑
τ |x| x ≥ 0,
{
RMSE = √ (ŷi − yi )2 (9) if
n ρτ (x) = (17)
i=1 (1 − τ ) |x| if x < 0,
n
1 ∑⏐ ⏐ This KPI measures how narrow the predicted probability
MAE = ⏐ŷi − yi ⏐ (10)
n density function is around the observations. It can be
i=1
shown that this loss is minimized, independently of the
n ⏐ ⏐
100% ∑ ⏐ yi − ŷi ⏐ underlying distribution which generated the data, when
MAPE = ⏐ ⏐ (11)
n ⏐ yi ⏐ the predicted quantiles are the true ones. It should be
i=1
noted that for τ = 0.5, the corresponding value ρ̄ (τ ) is
RMSE
S=1− (12) half the value of the MAE statistic. To further evaluate the
RMSEpers performance of the quantiles as a single score, we inte-
n
100% ∑ { } grate ρ̄ (τ ) over the [0, 1] interval, as outlined in Gneiting
A= 1 C (ŷi ) = C (yi ) (13) and Raftery (2007). Thus we define
n
i=1 ∫ 1
where RMSE is the root mean squared error, MAE is Q-score = ρ̄ (τ ) dτ . (18)
the mean absolute error, MAPE is the mean absolute 0
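As a concrete illustration, the deterministic KPIs of Eqs. (9)-(13) and the probabilistic KPIs of Eqs. (15)-(17) can be written in a few lines of Python. This is our own sketch rather than the code used in the study, and all function names are ours:

```python
import math

# Severity-class thresholds of Eq. (14) (Cercl'Air, in µg/m3).
BOUNDS = [60, 120, 135, 180, 240]

def severity_class(y):
    """Map an ozone value to its severity class 1..6."""
    for k, bound in enumerate(BOUNDS, start=1):
        if y <= bound:
            return k
    return 6

def rmse(y, yhat):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(p - t) for t, p in zip(y, yhat)) / len(y)

def mape(y, yhat):
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y, yhat)) / len(y)

def skill(y, yhat, y_pers):
    # Eq. (12): improvement over the day-D persistence forecast y_pers.
    return 1.0 - rmse(y, yhat) / rmse(y, y_pers)

def accuracy(y, yhat):
    # Eq. (13): share of forecasts landing in the correct severity class.
    hits = sum(severity_class(p) == severity_class(t) for t, p in zip(y, yhat))
    return 100.0 * hits / len(y)

def reliability(y, yhat_tau):
    # Eq. (15): empirical coverage of the tau-quantile forecasts.
    return sum(t < q for t, q in zip(y, yhat_tau)) / len(y)

def pinball(y, yhat_tau, tau):
    # Eqs. (16)-(17): average quantile (pinball) loss at level tau.
    def rho(x):
        return tau * abs(x) if x >= 0 else (1.0 - tau) * abs(x)
    return sum(rho(t - q) for t, q in zip(y, yhat_tau)) / len(y)
```

For τ = 0.5 the pinball loss reduces to half the MAE, as noted above, and the Q-score of Eq. (18) follows by integrating the pinball loss over a grid of τ values, for instance with a trapezoidal rule.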
977
D. Marvin, L. Nespoli, D. Strepparava et al. International Journal of Forecasting 38 (2022) 970–987
7. Results
Fig. 5. Nemenyi test for comparing the RMSE performance of the four feature selection methods across all locations.
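For reference, the procedure behind Fig. 5 ranks the four feature selection methods by RMSE at each location and declares two methods significantly different when their average ranks differ by more than the critical distance CD = q_α √(k(k+1)/(6N)), where k is the number of methods and N the number of locations. A minimal sketch (our own illustration, not the authors' code; ties are ignored for brevity, and q_α = 2.569 is the tabulated value for k = 4 methods at α = 0.05):

```python
import math

def average_ranks(rmse_by_location):
    """rmse_by_location: one dict {method: RMSE} per location.
    Lower RMSE gets the better (smaller) rank; ties are ignored."""
    methods = list(rmse_by_location[0])
    totals = dict.fromkeys(methods, 0.0)
    for scores in rmse_by_location:
        for rank, m in enumerate(sorted(methods, key=scores.get), start=1):
            totals[m] += rank
    return {m: totals[m] / len(rmse_by_location) for m in methods}

def nemenyi_critical_distance(k, n_locations, q_alpha=2.569):
    """Two methods differ significantly if their average ranks
    differ by more than this distance."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_locations))
```

With the 14 location/case pairs of this study and k = 4, the critical distance evaluates to roughly 1.25 average-rank units.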
Table 5
Comparison between different feature selections for the best, average, and worst stations.

            BIO MOR                      CHI MOR                      MEN EVE
            SHAP   GA     All    Rand    SHAP   GA     All    Rand    SHAP   GA     All    Rand
LSBoost
  MAE       13.46  14.17  14.41  15.58   14.50  15.18  15.27  19.02   16.69  17.24  16.90  19.84
  RMSE      17.68  18.38  18.98  20.35   19.36  21.25  20.03  24.21   21.59  22.61  21.70  24.92
  MAPE      10.34  11.06  11.29  12.32   10.65  11.05  11.33  14.42   12.75  13.57  13.52  15.59
  S         0.430  0.408  0.388  0.344   0.419  0.362  0.399  0.273   0.333  0.302  0.330  0.231
  Accuracy  71.43  68.71  71.43  69.39   71.43  71.43  70.75  60.54   65.07  60.96  59.59  59.59
XGBoost
  MAE       14.05  14.32  14.63  15.55   15.84  15.66  17.34  21.13   16.78  17.69  17.54  21.43
  RMSE      18.58  19.01  19.44  21.09   20.87  21.26  21.58  26.27   22.23  22.80  22.27  26.94
  MAPE      10.91  11.17  11.67  12.24   11.77  11.54  12.95  16.56   13.05  13.66  14.01  16.88
  S         0.401  0.387  0.373  0.320   0.374  0.362  0.352  0.212   0.314  0.296  0.313  0.169
  Accuracy  68.71  72.79  65.99  70.07   70.07  70.07  69.39  57.14   65.75  62.33  60.27  56.85
NGBoost
  MAE       13.85  14.38  15.02  15.64   15.04  15.88  15.85  18.72   16.12  17.05  17.54  20.14
  RMSE      18.50  18.70  20.04  20.60   19.90  21.64  21.11  23.58   21.32  22.79  22.76  25.46
  MAPE      10.73  11.13  11.96  12.36   11.06  11.85  11.98  14.26   12.54  13.29  14.12  16.04
  S         0.404  0.397  0.354  0.336   0.403  0.351  0.367  0.292   0.342  0.297  0.298  0.214
  Accuracy  68.71  68.71  68.71  68.03   69.39  70.07  67.35  59.86   66.44  60.27  58.90  57.53
Table 7
Main results of the study with Shapley values feature selection. Values in boldface indicate the lowest RMSE for the corresponding location.

            BIO            CHI            MEN            LOC            BRI            SAG            TES
            MOR    EVE     MOR    EVE     MOR    EVE     MOR    EVE     MOR    EVE     MOR    EVE     MOR    EVE
RF
  MAE       13.81  15.28   15.20  15.82   15.70  16.10   14.08  14.43   14.52  14.71   14.88  15.37   13.64  15.03
  RMSE      18.07  19.82   20.21  21.66   20.54  21.30   18.33  19.01   19.03  20.20   19.48  20.62   19.41  21.63
  MAPE      10.74  11.83   11.13  11.69   12.35  12.65   11.78  12.36   12.49  12.80   11.93  12.25   12.08  13.82
  S         0.418  0.364   0.394  0.352   0.364  0.342   0.394  0.374   0.404  0.370   0.295  0.256   0.201  0.114
  Accuracy  69.39  67.81   66.67  65.75   65.99  62.33   68.03  62.33   66.67  67.12   61.76  62.22   66.33  65.98
  Q-score   4.987  5.396   5.452  5.765   5.590  5.749   5.133  5.239   5.208  5.364   5.259  5.662   4.494  4.961
LSBoost
  MAE       13.46  13.96   14.50  15.56   14.66  16.69   14.20  14.30   14.25  14.50   15.08  14.59   13.37  14.86
  RMSE      17.68  18.18   19.36  21.03   19.50  21.59   18.58  19.03   18.78  20.53   19.39  19.73   18.89  20.73
  MAPE      10.34  10.78   10.65  11.47   11.54  12.75   11.71  11.98   12.03  12.53   12.09  11.53   11.71  13.44
  S         0.430  0.416   0.419  0.371   0.396  0.333   0.386  0.373   0.412  0.360   0.298  0.288   0.222  0.151
  Accuracy  71.43  67.81   71.43  67.12   66.67  65.07   67.35  69.18   68.03  69.18   61.76  61.48   70.41  64.95
XGBoost
  MAE       14.05  15.10   15.84  16.80   16.01  16.78   14.81  15.84   15.57  15.18   15.92  16.96   13.73  15.31
  RMSE      18.58  19.58   20.87  22.04   21.45  22.23   18.85  22.21   19.58  21.31   20.25  22.44   19.33  21.27
  MAPE      10.91  11.73   11.77  12.30   12.35  13.05   12.17  13.83   13.17  13.19   12.66  13.73   11.92  13.77
  S         0.401  0.371   0.374  0.341   0.336  0.314   0.377  0.268   0.387  0.335   0.266  0.190   0.204  0.129
  Accuracy  68.71  73.29   70.07  67.12   65.31  65.75   66.67  70.55   68.03  65.07   61.76  56.30   65.31  64.95
NGBoost
  MAE       13.85  14.83   15.04  16.23   14.84  16.12   14.96  14.02   15.18  15.09   19.57  19.77   19.34  21.76
  RMSE      18.50  19.56   19.90  21.27   19.62  21.32   18.92  18.84   19.78  20.37   25.36  26.44   26.12  28.92
  MAPE      10.73  11.32   11.06  11.94   11.68  12.54   12.42  12.06   12.83  13.08   16.93  16.86   18.37  20.90
  S         0.404  0.372   0.403  0.364   0.393  0.342   0.375  0.379   0.381  0.365   0.082  0.046   −0.07  −0.18
  Accuracy  68.71  69.18   69.39  65.75   65.99  66.44   61.90  67.12   65.31  69.18   47.79  50.37   59.18  52.58
  Q-score   5.947  6.298   6.298  7.485   6.429  7.671   6.453  6.333   6.241  5.979   12.14  12.50   25.94  27.36
LM
  MAE       15.27  15.64   17.46  17.49   16.55  26.42   16.12  27.26   15.32  73.43   15.14  14.67   13.01  171.7
  RMSE      21.67  20.32   23.03  23.37   21.74  109.9   20.84  144.2   20.41  693.6   19.78  19.99   19.03  945.3
  MAPE      12.01  12.32   13.38  13.62   12.92  21.12   13.67  22.76   12.95  62.56   12.22  11.60   11.25  154.0
  S         0.301  0.347   0.309  0.301   0.327  −2.39   0.311  −3.75   0.361  −20.6   0.284  0.279   0.217  −37.7
  Accuracy  68.71  65.75   66.67  61.64   65.31  61.64   65.31  65.07   68.71  66.44   63.24  64.44   67.35  17.53
LASSO
  MAE       15.76  15.91   17.54  17.68   17.94  17.56   15.33  16.26   15.38  18.11   15.59  14.87   12.95  14.18
  RMSE      21.13  20.43   22.99  23.58   23.22  22.89   20.18  21.07   20.73  23.82   20.11  20.32   18.83  20.64
  MAPE      12.30  12.40   13.31  13.45   14.12  14.00   12.99  13.87   13.27  15.79   12.52  11.67   11.35  12.92
  S         0.319  0.344   0.310  0.295   0.281  0.293   0.333  0.306   0.351  0.257   0.272  0.267   0.225  0.155
  Accuracy  65.99  64.38   63.27  61.64   58.50  59.59   65.31  63.70   68.03  62.33   58.82  63.70   65.31  64.95
Ridge
  MAE       15.23  15.63   17.46  17.47   16.51  26.42   16.12  27.26   15.16  71.56   15.13  14.67   12.97  168.5
  RMSE      21.61  20.32   23.03  23.32   21.70  109.9   20.84  144.2   20.23  670.5   19.77  19.99   19.00  914.6
  MAPE      11.97  12.31   13.38  13.59   12.88  21.12   13.67  22.76   12.83  60.96   12.22  11.61   11.21  151.1
  S         0.304  0.347   0.309  0.302   0.328  −2.39   0.311  −3.75   0.367  −19.9   0.284  0.279   0.218  −36.4
  Accuracy  68.71  66.44   66.67  61.64   65.31  61.64   65.31  65.07   69.39  65.75   64.71  64.44   67.35  16.49
ARIMAX
  MAE       17.57  16.87   17.12  17.72   16.57  18.05   14.8   14.74   16.3   14.94   16.25  14.38   12.89  13.32
  RMSE      23.34  22.28   23.21  22.77   21.95  23.42   20.39  19.79   22.76  19.73   21.35  19.07   19.17  20.88
  MAPE      13.42  13.05   12.92  13.26   12.44  13.94   12.33  12.43   13.41  12.81   13.09  11.44   10.89  11.6
  S         0.25   0.28    0.3    0.32    0.32   0.28    0.32   0.35    0.29   0.38    0.21   0.29    0.71   0.69
  Accuracy  60.54  63.7    63.95  61.38   63.95  57.24   63.95  68.49   65.31  69.18   56.46  63.7    53.06  58.9
Figs. 11, 12, and 13 graphically illustrate the results for the best (Bioggio MOR), average (Chiasso MOR), and worst (Mendrisio EVE) cases. Each figure is composed of four plots. The top one shows a comparison between the main forecasting algorithms and the measured values. The second plot shows the prediction intervals at levels 20%, 40%, 60%, and 80% issued by the RF algorithm, as well as the RF prediction. The third and fourth plots further investigate the goodness of fit of the quantiles of the RF and NGBoost algorithms.

7.4. High peaks prediction

The analysis presented so far focused on the dataset in its entirety, aiming to provide the best KPIs over all the data, independently of the air pollution severity class. However, when predicting ozone concentrations, it is generally more important to be able to correctly predict high concentrations, as they can pose a health risk. For this reason, we decided to prompt the predictor to focus more on high concentrations, in our case classes 4, 5, and 6 as defined in Eq. (14), by introducing weighted training. We assigned different weights to the observations depending on the severity class they were in. We found that it was beneficial to assign a weight w1 ∈ [20, 200] to observations in class 6, w2 ∈ [20, 200] to observations in class 5, w3 ∈ [10, 20] to observations in class 4, and a fixed weight of 1 to all the other observations. The key idea is to find the optimal set of weights w1, w2, w3 for each location and case with improved forecasting quality at high concentrations. The same set of weights is used during feature selection with SHAP and NGBoost and to train the prediction algorithm with the selected features. Applying weights during feature selection should help select the most important features to recognize the highest ozone concentrations.
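The class-dependent weighting just described can be sketched as follows. This is our own illustration: the actual pipeline feeds these weights to the SHAP-based feature selection and to NGBoost, both omitted here, and `sample_weights` and `weight_grid` are names we introduce:

```python
# Class-dependent observation weights: w1 for class 6, w2 for class 5,
# w3 for class 4, and a fixed weight of 1 for all other classes (Eq. (14)).

def severity_class(y, bounds=(60, 120, 135, 180, 240)):
    for k, bound in enumerate(bounds, start=1):
        if y <= bound:
            return k
    return 6

def sample_weights(y_train, w1, w2, w3):
    per_class = {6: w1, 5: w2, 4: w3}
    return [per_class.get(severity_class(y), 1.0) for y in y_train]

def weight_grid():
    # The grid searched in the study: w1, w2 in {20, 40, ..., 200}, w3 in {10, 20}.
    for w1 in range(20, 201, 20):
        for w2 in range(20, 201, 20):
            for w3 in (10, 20):
                yield w1, w2, w3
```

Each candidate triple would be passed as per-sample weights when fitting the regressor and scored by the class accuracy on the target subset of observations.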
Fig. 11. Result plots of the station yielding the best results, Bioggio MOR.
Fig. 12. Result plots of a station yielding average results, Chiasso MOR.
Fig. 13. Result plots of the station yielding the worst results, Mendrisio EVE.
We sought to increase the classification accuracy of three particular subsets of our data using weighted training, by evaluating the best combination of weights to optimize prediction accuracy for the following:

• Values in classes 4, 5, and 6 (O3 > 135 µg/m3)
• Values in classes 5 and 6 (O3 > 180 µg/m3)
• Values in class 6 (O3 > 240 µg/m3)

We analyzed the four stations with the highest number of extreme measurements in 2019: Bioggio, Mendrisio, Locarno, and Chiasso. All these stations registered at least one event of class 6 and many events of class 5, as shown in Table 8. In particular, Chiasso registered 27 measurements above 180 µg/m3, of which four were above 240 µg/m3.

Table 8
Number of particular events registered in the analyzed stations in 2019.

Station     Values above 135 µg/m3   Values above 180 µg/m3   Values above 240 µg/m3
            (cl. 4, 5, 6)            (cl. 5, 6)               (cl. 6)
Chiasso     46                       23                       4
Bioggio     47                       17                       1
Mendrisio   70                       22                       2
Locarno     41                       9                        1

In contrast to what we observed with unweighted training, we noticed that when using weighted training, increasing the number of selected features above 30 improved the prediction accuracy at high ozone concentrations. Therefore, we increased to 100 the number of features that the algorithm can use to perform its prediction. We used SHAP as the feature selection method and NGBoost as the regressor. We calculated the KPIs of the models for each combination of the weights w1, w2 ∈ {20, 40, 60, ..., 200} and w3 ∈ {10, 20}.

Fig. 14 shows the aggregated accuracy of the prediction for the three different classes of interest. Figs. 14(a) and 14(b) illustrate the distribution of the accuracy when considering only the ozone values measured above 135 µg/m3. Similarly, Figs. 14(c), 14(d) and 14(e), 14(f) show the accuracy when restricting ourselves to values above 180 and 240 µg/m3, respectively. The continuous iso-lines are obtained with cubic interpolation. It is difficult to infer which weights give the best results when trying to maximize the accuracy of observations above 135 µg/m3; for the other two cases, a high w2 and a variable w1 give the best results for observations above 180 µg/m3, whereas a high w1 and a low w2 appear to be the best combination of weights for correctly predicting observations above 240 µg/m3.

Fig. 14. Plot of the results of the weights grid search with 100 selected features, aggregated across the stations of Chiasso, Bioggio, Mendrisio, and Locarno.

Table 9 shows the results of weighted training for the considered stations. We report the KPIs and the three sets of weights that gave the best accuracy for each fraction of the dataset, compared to the results obtained when no weights are applied. We can see that the introduction of weights does not unduly affect the KPIs, and in fact improves them in some cases. For Chiasso MOR, we also show in Fig. 15 the complete confusion matrices obtained. It can be seen that when actively trying to enhance the recognition of observations above 135 µg/m3, the correct classification of these values increased from about 50%–75% to about 80%–85%. Similarly, for observations above 180 µg/m3 the correct recognition rate increases from 50%–70% to 80%–90% in all stations but Locarno, where it stops at 70%. Finally, in Mendrisio and Chiasso, we could correctly predict all values above 240 µg/m3, which was not achieved in the unweighted analysis. This is not the case for Bioggio and Locarno, where the only class 6 value is never recognized.

Table 9
Results of the weighted analysis.

                              w1   w2   w3    cl. [4,5,6]  cl. [5,6]  cl. 6    MAE    RMSE   Tot. Acc.
BIO MOR
  No weights                  –    –    –     72.72        61.11      0.00     13.28  17.66  70.95
  Max. accuracy cl. [4,5,6]   140  80   20    85.07        77.78      0.00     13.03  17.52  73.68
  Max. accuracy cl. [5,6]     120  200  10    79.10        88.89      0.00     13.61  18.25  71.05
  Max. accuracy cl. 6         180  100  10    77.61        72.22      0.00     12.62  16.18  70.39
BIO EVE
  No weights                  –    –    –     69.69        61.11      0.00     14.22  18.50  68.71
  Max. accuracy cl. [4,5,6]   40   100  20    84.84        77.78      0.00     14.59  18.47  73.51
  Max. accuracy cl. [5,6]     60   180  10    83.33        83.33      0.00     15.08  19.27  69.53
  Max. accuracy cl. 6         80   40   10    83.33        72.22      0.00     14.24  18.31  73.50
CHI MOR
  No weights                  –    –    –     72.60        55.56      0.00     14.61  18.79  71.52
  Max. accuracy cl. [4,5,6]   180  20   10    80.82        74.07      75.00    13.75  19.54  75.50
  Max. accuracy cl. [5,6]     140  200  20    79.45        81.48      50.00    15.51  21.33  70.86
  Max. accuracy cl. 6         200  40   10    75.34        70.37      100.00   13.53  19.14  71.52
CHI EVE
  No weights                  –    –    –     65.75        48.15      0.00     15.28  20.43  67.35
  Max. accuracy cl. [4,5,6]   80   20   20    82.19        74.07      75.00    14.90  19.96  71.52
  Max. accuracy cl. [5,6]     200  200  10    75.34        85.19      75.00    15.26  20.13  73.51
  Max. accuracy cl. 6         180  20   20    75.34        70.37      75.00    15.26  20.09  67.55
MEN MOR
  No weights                  –    –    –     77.14        68.18      50.00    14.69  19.49  71.62
  Max. accuracy cl. [4,5,6]   120  40   20    84.29        81.82      100.00   14.17  19.02  71.43
  Max. accuracy cl. [5,6]     180  60   10    80.00        86.36      100.00   14.56  19.40  67.35
  Max. accuracy cl. 6         200  40   10    77.14        81.82      100.00   14.34  19.23  69.39
MEN EVE
  No weights                  –    –    –     74.28        72.72      50.00    15.29  19.38  68.02
  Max. accuracy cl. [4,5,6]   120  120  20    82.86        81.82      50.00    16.13  21.13  67.81
  Max. accuracy cl. [5,6]     200  60   10    74.29        90.91      100.00   16.53  21.06  64.38
  Max. accuracy cl. 6         180  40   10    81.43        81.82      100.00   15.54  19.75  67.81
LOC MOR
  No weights                  –    –    –     54.90        40.00      0.00     14.12  18.45  64.86
  Max. accuracy cl. [4,5,6]   180  100  20    80.39        60.00      0.00     14.43  18.52  72.11
  Max. accuracy cl. [5,6]     40   140  10    72.55        70.00      0.00     15.06  19.65  68.03
  Max. accuracy cl. 6         60   180  10    70.59        70.00      0.00     13.92  17.97  70.07
LOC EVE
  No weights                  –    –    –     52.94        40.00      0.00     14.09  18.84  65.99
  Max. accuracy cl. [4,5,6]   40   80   20    80.39        50.00      0.00     14.37  18.61  69.18
  Max. accuracy cl. [5,6]     60   200  10    76.47        70.00      0.00     14.71  19.83  67.12
  Max. accuracy cl. 6         100  40   10    66.67        50.00      0.00     13.93  17.75  67.81

8. Conclusions

In this study, we forecasted the day-ahead maximum ground-level ozone concentration during the summer of 2019 in seven localities in southern Switzerland, using a physics-agnostic, data-driven approach. Due to the high number of signals potentially affecting the predictions, we performed preliminary feature selection using two methods, which we compared. The selected features were then used to train different state-of-the-art forecasting algorithms. Analyzing feature importance and interactions using Shapley values suggested that the models trained through our learning pipeline effectively learned explanatory cross-dependencies among atmospheric variables described in the ozone photochemistry literature.

Our analysis showed that gradient boosting algorithms, and in particular least-squares boosting and natural gradient boosting, consistently outperformed the other tested forecasting methods. Where possible, we further compared our results with those of other papers, and we were able to conclude that our results are similar to previous analyses and, in some cases, even better.

We then evaluated the effect of weighted training to increase the accuracy of predictions for high ozone concentrations. Our analysis showed that this method is
feasible, as it increases forecast accuracy without compromising overall forecast quality. Future directions for this work include the formulation of probabilistic techniques for robust estimation of annual ozone concentration peaks, which are the most difficult events to predict due to their scarcity in the training set. In this view, training forecasters with ad-hoc-generated adversarial examples could result in a better forecast of the conditional probability distributions.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was funded by the Environment Observatory of Southern Switzerland (OASI, www.ti.ch/oasi) of the Department of Territory of Canton Ticino (DT).

References

Al Abri, E. S., Edirisinghe, E. A., & Nawadha, A. (2015). Modelling ground-level ozone concentration using ensemble learning algorithms. In International conference on data mining (DMIN) (No. x) (pp. 148–154). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing.

Bentzien, S., & Friederichs, P. (2014). Decomposition and graphical portrayal of the quantile score. Quarterly Journal of the Royal Meteorological Society, 140(683), 1924–1934. http://dx.doi.org/10.1002/qj.2284.

Calvert, J. G., Orlando, J. J., Stockwell, W. R., & Wallington, T. J. (2015). The mechanisms of reactions influencing atmospheric ozone. Oxford University Press.

Cameletti, M., Lindgren, F., Simpson, D., & Rue, H. (2013). Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Advances in Statistical Analysis, 97(2), 109–131. http://dx.doi.org/10.1007/s10182-012-0196-3.

Carrillo, R. E., Leblanc, M., Schubnel, B., Langou, R., Topfel, C., & Alet, P.-J. (2020). High-resolution PV forecasting from imperfect data: A graph-based solution. Energies, 13(21), 5763. http://dx.doi.org/10.3390/en13215763.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). http://dx.doi.org/10.1145/2939672.2939785.

Crutzen, P. J., Lawrence, M. G., & Pöschl, U. (1998). On the background photochemistry of tropospheric ozone. Tellus, Series B (Chemical and Physical Meteorology), 51(1). http://dx.doi.org/10.3402/tellusb.v51i1.16264.

d-maps-1 (2020a). Map of Europe. d-maps.com, https://d-maps.com/carte.php?num_car=2232&lang=en. (Accessed 24 April 2020).

d-maps-2 (2020b). Map of Canton Ticino. d-maps.com, https://d-maps.com/carte.php?num_car=10350&lang=en. (Accessed 24 April 2020).

Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(1), 1–30.
Duan, T., Avati, A., Ding, D. Y., Thai, K. K., Basu, S., Ng, A. Y., et al. (2019). NGBoost: Natural gradient boosting for probabilistic prediction. http://arxiv.org/abs/1910.03225.

Eslami, E., Choi, Y., Lops, Y., & Sayeed, A. (2019). A real-time hourly ozone prediction system using deep convolutional neural network. Neural Computing and Applications. https://doi.org/10.1007/s00521-019-04282-x.

Feng, Y., Zhang, W., Sun, D., & Zhang, L. (2011). Ozone concentration forecast method based on genetic algorithm optimized back propagation neural networks and support vector machine data classification. Atmospheric Environment, 45(11), 1979–1985. http://dx.doi.org/10.1016/j.atmosenv.2011.01.022.

Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. http://dx.doi.org/10.1198/016214506000001437.

Gong, B., & Ordieres-Meré, J. (2016). Prediction of daily maximum ozone threshold exceedances by preprocessing and ensemble artificial intelligence techniques. Environmental Modelling & Software, 84, 290–303. http://dx.doi.org/10.1016/j.envsoft.2016.06.020.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer. http://dx.doi.org/10.1007/b94608.

Hollander, M., & Wolfe, D. (1999). Nonparametric statistical methods (2nd ed.). Wiley Series in Probability and Mathematical Statistics.

Kourentzes, N., Svetunkov, I., & Schaer, O. (2020). https://github.com/trnnick/tsutils/.

Kupilik, M., & Witmer, F. (2018). Spatio-temporal violent event prediction using Gaussian process regression. Journal of Computational Social Science, 1(2), 437–451. http://dx.doi.org/10.1007/s42001-018-0024-y.

Lu, X., Zhang, L., & Shen, L. (2019). Meteorology and climate influences on tropospheric ozone: a review of natural sources, chemistry, and transport patterns. Current Pollution Reports, 5(4), 238–260. http://dx.doi.org/10.1007/s40726-019-00118-3.

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence. http://dx.doi.org/10.1038/s42256-019-0138-9.

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems.

Lv, B., Cobourn, W. G., & Bai, Y. (2016). Development of nonlinear empirical models to forecast daily PM2.5 and ozone levels in three large Chinese cities. Atmospheric Environment, 147, 209–223. http://dx.doi.org/10.1016/j.atmosenv.2016.10.003.

Meinshausen, N. (2014). Quantile regression forests. Journal of Machine Learning Research, 131, 65–78. http://dx.doi.org/10.1016/j.jmva.2014.06.005.

Mohan, S., & Saranya, P. (2019). A novel bagging ensemble approach for predicting summertime ground-level ozone concentration. Journal of the Air and Waste Management Association, 69(2), 220–233. http://dx.doi.org/10.1080/10962247.2018.1534701.

Monks, P. S., Archibald, A. T., Colette, A., Cooper, O., Coyle, M., Derwent, R., et al. (2015). Tropospheric ozone and its precursors from the urban to the global scale from air quality to short-lived climate forcer. Atmospheric Chemistry and Physics, 15(15), 8889–8973. http://dx.doi.org/10.5194/acp-15-8889-2015.

Pinson, P., McSharry, P., & Madsen, H. (2010). Reliability diagrams for non-parametric density forecasts of continuous variables: Accounting for serial correlation. Quarterly Journal of the Royal Meteorological Society, 136(646), 77–90. http://dx.doi.org/10.1002/qj.559.

Pusede, S. E., Gentner, D. R., Wooldridge, P. J., Browne, E. C., Rollins, A. W., Min, K. E., et al. (2014). On the temperature dependence of organic reactivity, nitrogen oxides, ozone production, and the impact of emission controls in San Joaquin Valley, California. Atmospheric Chemistry and Physics, 14(7), 3373–3395. http://dx.doi.org/10.5194/acp-14-3373-2014.

Pusede, S. E., Steiner, A. L., & Cohen, R. C. (2015). Temperature and recent trends in the chemistry of continental surface ozone. Chemical Reviews, 115(10), 3898–3918. http://dx.doi.org/10.1021/cr5006815.

Sheta, A., Faris, H., Rodan, A., Kovač-Andrić, E., & Al-Zoubi, A. M. (2018). Cycle reservoir with regular jumps for forecasting ozone concentrations: Two real cases from the east of Croatia. Air Quality, Atmosphere and Health, 11(5), 559–569. http://dx.doi.org/10.1007/s11869-018-0561-9.

Siwek, K., & Osowski, S. (2016). Data mining methods for prediction of air pollution. International Journal of Applied Mathematics and Computer Science, 26(2), 467–478. http://dx.doi.org/10.1515/amcs-2016-0033.

Stewart, D. R., Saunders, E., Perea, R. A., Fitzgerald, R., Campbell, D. E., & Stockwell, W. R. (2017). Linking air quality and human health effects models: An application to the Los Angeles air basin. Environmental Health Insights, 11. http://dx.doi.org/10.1177/1178630217737551.

Swiss Society of Air Protection Officers (2019). Indice de pollution de l'air à court terme IPC [Short-term air pollution index] (in French). URL: https://cerclair.ch/assets/pdf/27a_2019_08_28_F_Indice_de_pollution_de_lair_court_terme.pdf.

The Swiss Federal Council (1985). Ordinance on Air Pollution Control (OAPC). URL: https://www.admin.ch/opc/en/classified-compilation/19850321/index.html.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.

Walcek, C. J., & Yuan, H. H. (1995). Calculated influence of temperature-related factors on ozone formation rates in the lower troposphere. Journal of Applied Meteorology. http://dx.doi.org/10.1175/1520-0450(1995)034<1056:CIOTRF>2.0.CO;2.

World Health Organization (2003). Health aspects of air pollution with particulate matter, ozone and nitrogen dioxide: Report on a WHO working group (p. 95). Bonn, Germany: WHO Regional Office for Europe. http://www.euro.who.int/_data/assets/pdf_file/0005/112199/E79097.pdf.