
A Deep Learning Model Integrating a Wind Direction-based Dynamic Graph Network for Ozone Prediction

The study presents a novel deep learning model, WDDSTG-Net, for predicting hourly ozone concentrations using a dynamic graph network that incorporates wind direction data. This model outperforms traditional static models by effectively capturing evolving spatial relationships and integrating meteorological predictions, achieving a mean absolute error of 6.69 μg/m3 for 1-hour predictions. The findings highlight the significance of dynamic modeling in improving ozone prediction accuracy and its potential application for other airborne pollutants.

Science of the Total Environment 946 (2024) 174229


A deep learning model integrating a wind direction-based dynamic graph network for ozone prediction
Shiyi Wang a,1, Yiming Sun a,1, Haonan Gu a, Xiaoyong Cao a,b, Yao Shi a, Yi He a,b,c,*
a College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
b Institute of Zhejiang University-Quzhou, Quzhou 324000, China
c Department of Chemical Engineering, University of Washington, Seattle 98915, USA

HIGHLIGHTS

• The WDDSTG-Net uses changing wind direction data to construct dynamic directed graph.
• The WDDSTG-Net uses the graph attention mechanism to assign dynamic weights and aggregate information.
• Simultaneous prediction of meteorological features improves ozone prediction performance.
• The WDDSTG-Net exhibits better performance than other models including the static model.

ARTICLE INFO

Editor: Meng Gao

Keywords: Ozone prediction; Dynamic graph structure; Graph neural network; Deep learning

ABSTRACT

Ozone pollution is an important environmental issue in many countries. Accurate forecasting of ozone concentration enables relevant authorities to enact timely policies to mitigate adverse impacts. This study develops a novel hybrid deep learning model, named wind direction-based dynamic spatio-temporal graph network (WDDSTG-Net), for hourly ozone concentration prediction. The model uses a dynamic directed graph structure based on hourly changing wind direction data to capture evolving spatial relationships between air quality monitoring stations. It applies the graph attention mechanism to compute dynamic weights between connected stations, thereby aggregating neighborhood information adaptively. For temporal modeling, it uses a sequence-to-sequence model with an attention mechanism to extract long-range temporal dependencies. Additionally, it integrates meteorological predictions to guide the ozone forecasting. The model achieves mean absolute errors of 6.69 μg/m³ and 18.63 μg/m³ for 1-h and 24-h prediction, respectively, outperforming several classic models. The model's IAQI prediction accuracy is above 75 % at all stations, with a maximum of 81.74 %. It also exhibits strong capabilities in predicting severe ozone pollution events, with a 24-h true positive rate of 0.77. Compared to traditional static graph models, WDDSTG-Net demonstrates the importance of incorporating short-term wind fluctuations and transport dynamics for data-driven air quality modeling. In principle, it may serve as an effective data-driven approach for the concentration prediction of other airborne pollutants.

* Corresponding author at: College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China.
E-mail address: [email protected] (Y. He).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.scitotenv.2024.174229
Received 5 February 2024; Received in revised form 11 June 2024; Accepted 21 June 2024
Available online 23 June 2024
0048-9697/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

In recent decades, ozone pollution has emerged as an urgent environmental problem in many countries (Gao et al., 2017). Prolonged exposure to elevated ozone concentrations has been associated with adverse cardiovascular and respiratory impacts, along with decreased crop yields (Cao et al., 2020; Wang et al., 2017a). Since 2013, China has established more than 1300 air quality monitoring stations, enabling real-time monitoring of major air pollutants (Wang et al., 2022a). With the continuous efforts of the government, five of the six major air pollutants (PM2.5, PM10, SO2, NO2, and CO) have been controlled to a certain extent. However, tropospheric ozone concentrations persistently remain high (Chen et al., 2020; Fan et al., 2020; Maji et al., 2019). Therefore, it is necessary to develop reliable methods for predicting ozone concentration so that timely measures can be taken to mitigate its pollution.

Ozone prediction has long been a challenging task, owing to the complex photochemical reaction mechanism involved in O3 formation, the intricate non-linear relationships between O3 and its precursors NOx and volatile organic compounds (VOCs), and the involvement of multiple meteorological conditions in the reaction process. Consequently, O3 exhibits complex dynamic spatiotemporal processes. Currently, models for ozone prediction are mainly divided into two types: numerical driven models and data-driven models. Numerical driven models, which have been widely used in air pollutant prediction for the past few decades, include the Community Multi-scale Air Quality (CMAQ) model (Foley et al., 2010) and the Weather Research and Forecasting Model with Chemistry (WRF-Chem) (Chuang et al., 2011; Wang et al., 2020b). However, numerical models depend heavily on parameter settings, and their prediction accuracy can be greatly influenced by the incompleteness of the dataset and insufficient understanding of ozone-related chemical formation mechanisms (Chen et al., 2021; Wang et al., 2020a; Zhan et al., 2018). Data-driven models can achieve predictions without substantial prior knowledge and can be divided into three subcategories: statistical models, shallow machine learning (SML) models, and deep learning (DL) models (Zhang et al., 2022). Statistical models, such as the autoregressive moving average (ARMA) model (Zhu and Lu, 2016), the autoregressive integrated moving average (ARIMA) model (Wang et al., 2017b), and the multiple linear regression (MLR) model (Chelani, 2019), can capture linear relationships but cannot handle complex nonlinear problems. Owing to their inherent nonlinear mapping capabilities, SML models such as random forest (RF) (Feng et al., 2019; Song et al., 2021), extreme gradient boosting (XGBoost) (Ma et al., 2020), and support vector machine (SVM) (He et al., 2018) have experienced exponential development. However, as the volume and dimensions of data have increased substantially, SML models have struggled to handle enormous, high-dimensional datasets effectively. As an emerging form of SML models, DL models have quickly become an efficient way to process complex multi-dimensional data and have exhibited state-of-the-art performance in air quality prediction (Kim et al., 2021; Li et al., 2019). Examples include the recurrent neural network (RNN) (Freeman et al., 2018), long short-term memory recurrent network (LSTM) (Li et al., 2017), gated recurrent network (GRU) (Cheng et al., 2021), sequence-to-sequence (Seq2Seq) model (Cho et al., 2014), and convolutional neural network (CNN) (Eslami et al., 2020). Spatiotemporal hybrid models, which can extract both temporal and spatial features (Chen et al., 2022; Dun et al., 2022; Le et al., 2020; Mao et al., 2022; Wang et al., 2022a, 2022b, 2022c), have also achieved favorable prediction performance. Hu et al. constructed a CNN-BiLSTM-GRU model to forecast the hourly ozone concentration at stations in Beijing. The predictive performance of the model was superior to that of standalone models such as BiLSTM (Hu et al., 2023). Pak et al. employed a CNN-LSTM model, utilizing CNNs and LSTMs to extract spatial and temporal dependencies, respectively. They used data from 12 meteorological sites and air quality monitoring sites in Beijing as model inputs to predict the next day's 8-h average O3 concentration. The results showed that the root mean square error (RMSE) was reduced by approximately 35 % compared to the LSTM model (Pak et al., 2018). Wang et al. adopted a DNN and an attention-based Seq2Seq model for spatial modeling of geographic information and temporal modeling of historical pollutant and meteorological information, respectively, to predict the 24-h O3 concentration in Beijing. The results demonstrated that the mean absolute error (MAE) was reduced by at least 5.8 % relative to the compared models (Wang et al., 2020a). Although the above CNN-RNN models can extract the spatiotemporal dependencies between historical data from monitoring stations, two significant problems remain. Firstly, RNN variants have certain limitations in processing long-term input data (Hao et al., 2019). Secondly, CNN is only suitable for Euclidean space, but the monitoring network belongs to non-Euclidean space and has complex topological relationships. Using CNNs may result in the loss of topological information and a decrease in prediction accuracy (Wei et al., 2020; Zhang et al., 2020).

Artificial intelligence technology has continued to develop and has achieved milestone results in many fields, and a variety of emerging models can learn key information from various types of data, thus promoting the development of scientific research (Xu et al., 2023). Graph neural networks (GNN) have emerged as a novel model for extracting and mapping intricate topological relationships in non-Euclidean spaces (Lin et al., 2018; Ouyang et al., 2021; Qi et al., 2019; Wu et al., 2021). Yu et al. proposed the Graph Interpolation Attention Recursive Network (GinAR), which employs interpolation attention and adaptive graph convolution to accurately reconstruct spatiotemporal dependencies, thereby achieving prediction and advancing the development of GNN to a certain extent (Yu et al., 2024). Liu et al. utilized GCN-LSTM and GCN-GRU as main predictors and employed a Q-learning algorithm to realize the ensemble of the two predictors, ultimately developing the GCN-LSTM-GRU-Q model, which demonstrated excellent performance in pollutant forecasting (Liu et al., 2021). Wu et al. employed a residual neural network (ResNet) to adaptively learn the deep spatial correlations among monitoring sites, utilized a graph convolutional network (GCN) to capture the topological information of the entire monitoring site network, and used a BiLSTM to extract the temporal correlations of auxiliary information and meteorological data. They achieved hourly O3 concentration prediction for Shanghai, demonstrating improved prediction performance compared to other models (Wu et al., 2023). In the current GNN-based research, the spatial relationships among graph nodes are computed and constructed based on distances between stations, so the graph structure remains static. In ozone prediction, ozone is more likely to spread downwind and more difficult to diffuse upwind, leading to anisotropic ozone behavior under wind effects. Considering the horizontal transport and diffusion caused by wind speed and wind direction, ozone concentration will be dynamically influenced by the transport of pollutants from neighboring stations. Therefore, it is necessary to construct a dynamic graph structure based on the changing wind direction, instead of a static graph structure, to model the monitoring network dynamically, which can in theory improve the prediction performance. However, to the best of our knowledge, no studies have considered a dynamic graph structure in ozone prediction.

To this end, a novel spatiotemporal hybrid deep learning model called wind direction-based dynamic spatio-temporal graph network (WDDSTG-Net) was proposed to predict hourly ozone concentration.


Fig. 1. The study area and geographical distribution of monitoring stations.

Specifically, in the spatial feature extraction module, a dynamic directed graph structure was constructed for the monitoring network based on the changing wind direction to achieve dynamic evolution processing, and the dynamic weights between stations were calculated based on the graph attention mechanism, which allowed for comprehensive extraction and aggregation of dynamic spatial neighborhood information. For temporal modeling, a sequence-to-sequence model with attention was used to focus on the crucial information and extract long-range dependencies by assigning different weights to different moments when processing time series. Additionally, meteorological predictions were incorporated to guide the forecasting.

2. Materials and method

2.1. Study area and dataset description

Hangzhou was selected as the study area in this work. Hangzhou ranges from 29.18° to 30.55° N and 118.35° to 120.50° E. It is located in East China, downstream of the Qiantang River, in northern Zhejiang. Fig. 1 shows the geographical location of the study area and the distribution of monitoring stations, including 10 air quality monitoring stations and 1 meteorological monitoring station. The latitude and longitude coordinates of each station are shown in Table S1. In 2012, after the Ministry of Environmental Protection of the People's Republic of China (MEP) published new air quality standards, various cities across the country gradually established environmental monitoring stations. The air pollution dataset was derived from the China National Environmental Monitoring Center (CNEMC) for the 10 stations from 2016 to 2022. The meteorological dataset for the same period was obtained from the U.S. National Climate Data Center. The specific data format and data description of the two datasets are detailed in Table S2. Additionally, histograms representing the data distribution of key pollutant features (Fig. S1) and meteorological features (Fig. S2) are provided, facilitating a visual inspection of the distribution of the original data. It should be noted that the Zhaohuiwuqu station has no data record since January 1, 2021, with an hourly ozone concentration missing rate as high as 31.3 %. Therefore, Zhaohuiwuqu station is not considered. For meteorological data, there is only one meteorological station in Hangzhou. In fact, it is common that there are no matching meteorological stations near many air quality monitoring stations. This is because the air quality monitoring stations are close to one another, making meteorological data from nearby stations highly similar. Therefore, the Xiaoshan meteorological station is selected to represent the meteorological features of the entire study area.

2.2. Data preprocessing process

2.2.1. Outlier processing

Outliers can arise due to various reasons, including machine abnormalities, sudden power outages, or manual counting errors. Boxplots of each pollutant were used to remove outliers to prevent adverse effects. A boxplot typically includes five values; from bottom to top these are the lower limit, the first quartile, the median, the third quartile, and the upper limit. Data above the upper limit or below the lower limit are considered outliers and are treated as null values.

2.2.2. Missing value processing

Different strategies were used for missing value processing, depending on the duration of the continuous data gap. If the value is missing continuously for <3 h, linear interpolation is used. If the value is missing continuously for >3 h, cubic spline interpolation is used. The whole interpolation process ensures that the interpolated data lie between the upper threshold and the lower threshold.

2.2.3. Normalization

Because the meaning and numerical ranges of different features differ significantly, normalization is essential before the data are input into the model. Normalized data accelerate model convergence and improve model prediction performance. This study uses linear normalization, mapping the data into a numerical space ranging from 0 to 1. The formula is as follows (1):

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

where $x_{\text{scaled}}$ is the normalized value, $x$ is the original data, $x_{\min}$ is the minimum value in the original data, and $x_{\max}$ is the maximum value in the original data.
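The three preprocessing steps of Section 2.2 can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code: the 1.5 × IQR whisker used for the boxplot limits, the function names, and the sample readings are all assumptions, and gaps of three hours or more are simply left as nulls here, where the paper applies cubic spline interpolation.

```python
import statistics


def boxplot_limits(values, whisker=1.5):
    """Return (lower, upper) boxplot limits; the 1.5 * IQR whisker is an assumption."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def mask_outliers(series, lower, upper):
    """Replace readings outside [lower, upper] with None (treated as null)."""
    return [v if v is not None and lower <= v <= upper else None for v in series]


def fill_short_gaps(series, lower, upper, max_gap=3):
    """Linearly interpolate interior gaps shorter than max_gap hours.

    Longer gaps stay None; a cubic spline would be fitted there instead.
    Interpolated values are clipped to [lower, upper].
    """
    out = list(series)
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1
        # The gap is out[i:j]; it needs a valid neighbour on both sides.
        if i > 0 and j < n and (j - i) < max_gap:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                frac = (k - i + 1) / (j - i + 1)
                value = left + frac * (right - left)
                out[k] = min(max(value, lower), upper)
        i = j
    return out


def min_max_scale(values):
    """Eq. (1): map observed values to [0, 1]; None entries are passed through."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    if hi == lo:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


# Hypothetical hourly ozone readings with one missing value and one spike.
raw = [40.0, 42.0, None, 46.0, 500.0, 50.0, 52.0, 48.0, 44.0]
lower, upper = boxplot_limits([v for v in raw if v is not None])
masked = mask_outliers(raw, lower, upper)
filled = fill_short_gaps(masked, lower, upper)
scaled = min_max_scale(filled)
```

In this toy series the spike of 500 falls above the upper limit and becomes a null, after which both one-hour gaps are filled linearly before scaling.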


Fig. 2. The proposed model WDDSTG-Net architecture diagram.

2.3. Data analysis and variable selection

2.3.1. Spatiotemporal correlation analysis

As an example, the concentration trends of six air pollutants over the past five years at Binjiang station are plotted (Fig. S3). The plot shows that the ozone concentration exhibits a distinct yearly periodicity. As a secondary pollutant, ozone is generated through photochemical reactions under sunlight and higher temperatures. Therefore, sunlight and temperature indirectly determine the ozone concentration in the atmosphere, resulting in a daily cyclic trend in ozone concentration, along with distinct seasonal characteristics. The autocorrelation curve for ozone concentration is shown in Fig. S4. Peaks appear at lag times such as 24 h and 48 h, and the autocorrelation coefficient is >0.5, which also confirms the daily cyclical trend of ozone. In addition, the weekend effect also needs to be considered. Nitrogen oxides are important precursors for the formation of ozone and significantly impact ozone concentration. Compared to weekdays, human activities are more diverse and extensive during weekends. Increased travel during this period indirectly leads to higher emissions of nitrogen oxides, subsequently resulting in a sustained increase in ozone concentration. Based on the above analysis, auxiliary features are incorporated into the original dataset, including hour, day, day_of_week (day of the week), weekend (1 for yes, 0 for no), month, season, and year.

To capture the variation trends of ozone concentration across multiple stations over a period, correlation analysis is employed to analyze the degree of correlation between variables. Pearson correlation analysis is suitable for correlation analysis between continuous variables. The formula of the Pearson correlation coefficient is as follows (2):

$$\rho_{A,B} = \frac{\operatorname{cov}(A, B)}{\sigma_A \cdot \sigma_B} = \frac{E\left[(A - \mu_A)(B - \mu_B)\right]}{\sigma_A \cdot \sigma_B} \tag{2}$$

where $A$ and $B$ represent two variable sequences, $\sigma_A$ and $\sigma_B$ denote the standard deviations of $A$ and $B$, $\mu_A$ and $\mu_B$ are the means of $A$ and $B$, $\operatorname{cov}(A, B)$ denotes the covariance between $A$ and $B$, and $E$ denotes the mean. The correlation of ozone concentration between monitoring stations was calculated based on this formula (Fig. S5). It is evident that ozone has a strong spatial correlation between different monitoring stations and that the ozone concentration at a specific station is closely related to that at other stations. This analysis indicates that ozone has a strong diffusion and transport effect between stations, and that it is necessary to consider potential influences from neighboring stations.

2.3.2. Input variable selection

The selection of input variables is a crucial prerequisite for model performance. The performance of the model depends on supplying the most effective information with the least redundant information. Insufficient input variables result in a decline in the model's generalization performance and a sharp decrease in prediction performance due to the absence of considerable relevant information. Conversely, excessive input variables lead to a complex model structure, introducing substantial redundant information and noise, thereby adversely affecting the model and increasing the risk of overfitting. Therefore, selecting variables with higher correlation with the ozone series as inputs and excluding irrelevant input variables is expected to yield the optimal combination of input variables, which saves computational resources while maximizing the performance of the model. The hourly concentrations were selected as the concentration feature of each pollutant to avoid multicollinearity. Furthermore, considering the highly nonlinear characteristics of the ozone generation process (Hong et al., 2023), mutual information (MI) was employed for subsequent feature selection. MI is suitable for complex nonlinear data and can mine the correlation and nonlinear relationships between variables, so it is widely used in research on predicting air quality (Russo et al., 2013). The formula of MI is as follows:

$$I(x, y) = \int_x \int_y p(x, y) \log\left(\frac{p(x, y)}{p(x) \cdot p(y)}\right) dx\, dy \tag{3}$$

where $x$ and $y$ represent two continuous random variables, $p(x, y)$ represents the joint probability density function of $x$ and $y$, $p(x)$ and $p(y)$ are the marginal probability density functions of $x$ and $y$ respectively, and $I(x, y)$ is the MI. According to the above formula, the histogram of MI between the hourly ozone concentration and other input variables is shown in Fig. S6. As observed in the graph, variables such as "day_of_week" and "weekend" exhibit MI values close to zero. This implies a weak correlation, which indicates that these variables are insignificant and should be eliminated. Consequently, following the variable selection process, 18 features, including the ozone concentration, were used as the input sequence for each monitoring station.
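To make the two relevance measures concrete, the sketch below computes the Pearson coefficient of Eq. (2) and a binned estimate of the mutual information of Eq. (3) with the Python standard library only. The paper does not state which MI estimator was used, so the equal-width histogram estimator, the bin count, and all function names here are assumptions chosen for illustration.

```python
import math
from collections import Counter


def pearson(a, b):
    """Eq. (2): covariance of a and b divided by the product of their standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    return cov / (std_a * std_b)


def _bin_index(value, lo, hi, bins):
    """Equal-width bin index in [0, bins - 1]; a constant variable falls in bin 0."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(x, y, bins=8):
    """Histogram estimate of Eq. (3), in nats (assumed estimator, not the paper's)."""
    n = len(x)
    lo_x, hi_x, lo_y, hi_y = min(x), max(x), min(y), max(y)
    bx = [_bin_index(v, lo_x, hi_x, bins) for v in x]
    by = [_bin_index(v, lo_y, hi_y, bins) for v in y]
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


# Hypothetical data: a symmetric nonlinear dependence and an uninformative feature.
x = [float(v) for v in range(-10, 11)]
y_nonlinear = [v * v for v in x]
y_constant = [1.0 for _ in x]

r_nonlinear = pearson(x, y_nonlinear)
mi_nonlinear = mutual_information(x, y_nonlinear)
mi_constant = mutual_information(x, y_constant)
```

The toy data illustrate why MI suits nonlinear processes: the quadratic feature has a Pearson coefficient of essentially zero yet a clearly positive MI, while the constant feature scores zero MI and would be dropped, in the same way the text drops "day_of_week" and "weekend".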


Fig. 3. Two dynamic directed graph construction strategies. (a) Cartesian directed graph. (b) Angle-based directed graph.

2.4. Architecture of the proposed model

Fig. 2 shows the detailed architecture of WDDSTG-Net. The model consists of four main parts: (1) Meteorological prediction module. LSTM, a variant of the RNN structure, is used for prediction of meteorological features. (2) Spatial feature extraction module. Based on hourly changing wind direction data from the meteorological station, the dynamic directed graph structure of the monitoring network is constructed, and dynamic neighborhood information and topological information are extracted and integrated to achieve dynamic evolution processing. (3) Temporal feature extraction module. The above output is fed into the Seq2Seq model with an attention mechanism to further extract the temporal information and long-term dependence features. (4) Fusion module. The output of the extracted temporal and spatial dependence is combined with predicted meteorological features and input to the fully connected layer to produce the final output. These components form the core architecture of the WDDSTG-Net model, enabling it to effectively capture and integrate spatiotemporal dependencies for enhanced prediction performance.

2.4.1. Problem definition

WDDSTG-Net was used to predict ozone concentration based on relevant input variables such as air quality data, meteorological data, and auxiliary data. Assuming that the current time is $t$, the predicted result can be defined as $Y = [y_{t+1}, y_{t+2}, y_{t+3}, \ldots, y_{t+n}]$, representing $n$ prediction steps from $t + 1$ to $t + n$, and the historical time series used for prediction ranges from $t - r$ to $t$. More specifically, the input can be divided into historical air quality data $X_a^{t-r \sim t}$, historical meteorological data $X_m^{t-r \sim t}$, and historical auxiliary data $X_e^{t-r \sim t}$. Additionally, since the model involves constructing graph structures, the inputs should also contain the historical graph structures $G^{t-r \sim t}$. The proposed model structure can be simply considered as a mapping function $F(\cdot)$, thus the entire problem can be understood as follows:

$$F\left(X_a^{t-r \sim t}, X_m^{t-r \sim t}, X_e^{t-r \sim t}, G^{t-r \sim t}\right) = Y \tag{4}$$

2.4.2. Meteorological prediction module

Meteorological factors (such as temperature) are considered to be important factors that affect ozone concentration (Baklanov et al., 2008). Therefore, this module is intended to guide the ozone prediction of the entire model through prediction of future meteorological factors. As a variant of RNN, LSTM can reduce the risk of vanishing gradients to some extent. It mines long-term dependencies in historical data through three internal gates (input gate, forget gate and output gate), and can flexibly and effectively decide whether to forget the old state or accept new input. The formula for LSTM is as follows:

$$i_t = \sigma\left(W_{hi} h_{t-1} + W_{xi} x_t + b_i\right) \tag{5}$$

$$f_t = \sigma\left(W_{hf} h_{t-1} + W_{xf} x_t + b_f\right) \tag{6}$$

$$o_t = \sigma\left(W_{ho} h_{t-1} + W_{xo} x_t + b_o\right) \tag{7}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{hg} h_{t-1} + W_{xg} x_t + b_g\right) \tag{8}$$

$$h_t = o_t \odot \tanh(c_t) \tag{9}$$

where $i_t$, $f_t$, $o_t$ represent the input gate, forget gate, and output gate at time $t$, respectively. $c_t$ and $h_t$ respectively denote the cell state and hidden state, $\sigma$ and $\tanh$ are activation functions, $\odot$ represents the Hadamard product, $W_{hi}$, $W_{xi}$, $W_{hf}$, $W_{xf}$, $W_{ho}$, $W_{xo}$, $W_{hg}$, $W_{xg}$ are trainable weight matrices, and $b_i$, $b_f$, $b_o$, $b_g$ are the bias terms.

Since there is only one meteorological station in the study area, only the temporal dependencies of meteorological features need to be extracted. This is exactly what RNN-based models are good at, using meteorological data $X_m^{t-r \sim t}$ to predict weather data in the future.

In this module, to ensure the accuracy of meteorological prediction, three kinds of meteorological prediction strategies were proposed in advance (Fig. S7), and the multi-model strategy was selected to forecast meteorological conditions based on specific pre-experimental results (Fig. S8). The multi-model strategy requires training multiple models, each of which predicts one meteorological feature and outputs the predictions for all time steps at once.

2.4.3. Spatial feature extraction module

Ozone has obvious regional characteristics, showing spatial dependence. The degree of spatial dependence between monitoring stations differs and shows strong variation and instability over time. The technical regulation for selection of ambient air quality monitoring stations (on trial) dictates the placement of air quality stations (https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/jcffbz/201309/t20130925_260810.htm). Overall, these stations exhibit an irregular spatial distribution, representing a typical non-Euclidean spatial data pattern. There are limitations when using traditional CNN-based models to process this type of data, whereas graph neural networks can effectively extract complex topological relationships within the monitoring network.

The primary key to using graph neural networks is to convert regular sequence data into a graph topology structure that the model can handle effectively. For a given moment, the internal relationships within the monitoring network can be abstractly represented as a graph structure, denoted as $G_t = (V_t, E_t)$, where $V_t = [v_1^t, v_2^t, \ldots, v_n^t]$ is a series of nodes
representing air quality monitoring stations at time t, n is the total ( )


180
number of monitoring stations, Et represents the set of all edges between βc,a = β* + 360 %360 (18)
connected nodes indicating the adjacency relationships between moni­ π
toring stations at time t. It is noteworthy that it refers to a directed path
( ) where % represents the modulus operation, atan2() is used to calculate
starting from node i and ending at node j for a specific edge v1i , v2j ∈ Et . the angle between the line from point (x, y) in the two-dimensional
Therefore, graph structures at each moment denoted as coordinate system to (0,0) and the positive half of the x-axis. Its value
[Gt− r , Gt− r+1 , Gt− r+2 , …, Gt ]. range is [ − π , π], hence requiring the conversion of radians to degrees to
Two strategies are considered to construct dynamic directed graphs ensure the azimuth angle βc,a falls within the range of [0,360◦ ]. θ rep­
based on the wind direction, which are called the cartesian directed resents the angle between the wind direction and the directed path be­
graph and the angle-based directed graph. The target station A is used as tween stations. If θ falls within the range of − 90◦ to 90◦ , the wind
an example to elaborate on the construction of the spatial relationship. direction can be decomposed into components parallel to the directed
For the cartesian directed graph, as illustrated in Fig. 3(a), if the wind path between stations and components perpendicular to this path. In this
direction at the previous moment was northeasterly, a two-dimensional Cartesian coordinate system is established with station A as the central point. The wind direction is decomposed into two wind components: a horizontal component aligned with the east and a vertical component aligned with the north. Consequently, only station E, located in the first quadrant, theoretically will not exert an influence on the target station A, and stations in other quadrants have the potential to influence station A. Therefore, directed paths are created from all stations except station E towards station A, enabling the construction of the spatial relationship network of the target station A at this moment through this strategy.

For the angle-based directed graph, the wind direction angle is the angle between the wind direction and due north, that is, the angle of clockwise rotation from due north to the wind direction. In Fig. 3(b), the wind direction angle is denoted as $\alpha$, which corresponds to the wind direction provided in the dataset, with values in $[0^\circ, 360^\circ]$. Taking any other station in the space as a starting point (station C is taken as an example) and station A as an endpoint, a directed line segment is drawn, representing the azimuth angle between the stations, denoted as $\beta_{c,a}$. The calculation of the angle $\beta_{c,a}$ is based on the latitude and longitude coordinates of the two stations. Let the coordinates of station A and station C be $(x_a, y_a)$ and $(x_c, y_c)$, where $(x_i, y_i)$ represent the latitude and longitude of the station. Initially, the coordinates of both stations need to be converted into radians.

$$ r_{x_a} = x_a \cdot \pi / 180 \tag{10} $$

$$ r_{y_a} = y_a \cdot \pi / 180 \tag{11} $$

$$ r_{x_c} = x_c \cdot \pi / 180 \tag{12} $$

$$ r_{y_c} = y_c \cdot \pi / 180 \tag{13} $$

where $r_{x_a}$ and $r_{y_a}$ represent the radian expressions of the latitude and longitude of station A, respectively, and $r_{x_c}$ and $r_{y_c}$ represent the radian expressions of the latitude and longitude of station C, respectively. The calculation is then performed according to the following equations.

$$ \Delta r = r_{y_c} - r_{y_a} \tag{14} $$

$$ y = \sin \Delta r - r_{y_a} \tag{15} $$

$$ x = \cos r_{x_a} \cdot \sin r_{x_c} - \sin r_{x_a} \cdot \cos r_{x_c} \cdot \cos \Delta r \tag{16} $$

where $\Delta r$ represents the difference in longitude between the starting point C and the endpoint A, and $x$ and $y$ are respectively the horizontal and vertical coordinates of the two-dimensional coordinate system after transformation. After this transformation, the magnitude of the azimuth angle is altered to the angle between $(x, y)$ and the positive half of the x-axis in the two-dimensional space.

$$ \beta = \operatorname{atan2}(y, x) \tag{17} $$

situation, station C is considered to influence the target station A under the influence of wind, demonstrating a spatial correlation such that a directed edge from C to A should be constructed.

$$ \langle v_c, v_a \rangle = \begin{cases} \text{exists}, & -90^\circ < \theta < 90^\circ \\ \text{not exists}, & \text{otherwise} \end{cases} \tag{19} $$

where $\theta$ can be calculated from the wind direction angle $\alpha$ and the azimuth angle $\beta_{c,a}$. The condition $\theta \in (-90^\circ, 90^\circ)$ can be converted into the following three formulas. If any of these conditions is satisfied, it is considered that a directed edge exists:

$$ \beta_{c,a} + 360 - \alpha < 90 \tag{20} $$

$$ -90 < \beta_{c,a} - \alpha < 90 \tag{21} $$

$$ \alpha + 360 - \beta_{c,a} < 90 \tag{22} $$

Through the above steps, it can be determined whether a directed path exists from station C to station A. However, further calculations and assessments are required to determine if directed paths exist from all stations except A to the target station A. In the example of Fig. 3(b), station D and station E do not meet the conditions of directed edges, so they are ignored. This process achieves the construction of the spatial relationship network, represented as an angle-based directed graph, for station A at this moment.

Based on the above two strategies, the specific spatial relationship needs to be calculated when other stations are target stations (the same calculation process as station A). Through the above steps, two different graph structures can be obtained, which essentially represent the adjacency matrix in the graph neural network to reflect the spatial connection relationship of the entire monitoring system. Since the wind direction data is in hourly intervals and varies at each moment, it is essential to conduct the graph construction process at all moments. This implies that each piece of data corresponds to a unique graph structure, demonstrating the dynamic characteristic of directed graphs. Ultimately, the construction of the dynamic directed graph structure is realized.

For a target station, the spatial dependences on different stations at the same time are different, and the spatial dependences on the same station at different times are also different. This indicates the dynamic characteristic of spatial correlations among monitoring stations. To obtain dynamic dependency weights among connected stations in a dynamic directed graph, the graph attention mechanism is used for calculation. The graph attention mechanism quantifies the influence of different stations on the target station, facilitating the aggregation of features from connected stations and updating the internal state of the target station. Assuming the current time is $t$, each station within the spatial domain has its unique features $x_i^t \in \mathbb{R}^M$, where $M$ represents the total number of input features. For station $i$, the correlation coefficients of the neighboring stations are calculated in the graph structure at time $t$.

$$ e_{i,j}^t = a^T \left[ W x_i^t \,\Vert\, W x_j^t \right], \quad j \in N_i^t \tag{23} $$
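The edge test in Eqs. (20) to (22) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the example azimuths are hypothetical, and both angles are assumed to be given in degrees within [0, 360].

```python
from __future__ import annotations


def has_directed_edge(alpha: float, beta: float) -> bool:
    """Return True if any of conditions (20)-(22) holds.

    alpha: wind direction angle in degrees (assumed in [0, 360]).
    beta:  azimuth angle beta_{c,a} in degrees (assumed in [0, 360]).
    """
    cond_20 = beta + 360 - alpha < 90
    cond_21 = -90 < beta - alpha < 90
    cond_22 = alpha + 360 - beta < 90
    return cond_20 or cond_21 or cond_22


def neighbours_of_target(alpha: float, azimuths: dict[str, float]) -> list[str]:
    """Stations that get a directed edge towards the target at this hour."""
    return [name for name, beta in azimuths.items()
            if has_directed_edge(alpha, beta)]


# Hypothetical azimuths (degrees) of four stations relative to target A.
example_azimuths = {"B": 350.0, "C": 40.0, "D": 150.0, "E": 200.0}
example_neighbours = neighbours_of_target(10.0, example_azimuths)
print(example_neighbours)
```

Taken together, the three inequalities amount to requiring that the circular difference between the two angles is below 90 degrees, which is why the wrap-around cases (20) and (22) are needed next to (21). Repeating the call for every target station and every hour would give one adjacency structure per time step.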
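The azimuth calculation of Eqs. (10) to (17) can be sketched as follows. Two points are assumptions and not taken from the text: the sketch uses the standard great-circle form y = sin(Δr)·cos(r_xc), which differs from Eq. (15) as printed, and it maps the atan2 result to degrees in [0, 360) so that it is comparable with the wind direction angle. The function name `azimuth_deg` is hypothetical.

```python
import math


def azimuth_deg(lat_a, lon_a, lat_c, lon_c):
    """Azimuth of station C relative to station A, degrees clockwise from north."""
    rxa, rya = math.radians(lat_a), math.radians(lon_a)  # Eqs. (10)-(11)
    rxc, ryc = math.radians(lat_c), math.radians(lon_c)  # Eqs. (12)-(13)
    dr = ryc - rya                                        # Eq. (14)
    # ASSUMPTION: standard initial-bearing term, not Eq. (15) as printed.
    y = math.sin(dr) * math.cos(rxc)
    # Eq. (16)
    x = (math.cos(rxa) * math.sin(rxc)
         - math.sin(rxa) * math.cos(rxc) * math.cos(dr))
    beta = math.atan2(y, x)                               # Eq. (17), radians
    # ASSUMPTION: convert to degrees and wrap into [0, 360).
    return math.degrees(beta) % 360.0


# A station one degree of latitude north of the target lies at azimuth 0.
print(azimuth_deg(30.0, 120.0, 31.0, 120.0))
```

With this convention a station due east of the target comes out close to 90 degrees and one due west close to 270 degrees, matching the clockwise-from-north definition used for the wind direction angle.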

where $e_{i,j}^t$ represents the correlation coefficient between station $i$ and station $j$ at time $t$, $a \in \mathbb{R}^{2M'}$ is a single-layer feedforward neural network, $W \in \mathbb{R}^{M' \times M}$ is a trainable weight matrix enabling a linear transformation from input feature dimension $M$ to output feature dimension $M'$, $N_i^t$ represents the set of all stations connected to station $i$ at time $t$, and $\Vert$ represents the concatenation operation. To facilitate easier computation and comparison of attention coefficients for the same station, the softmax function is employed for normalization.

$$ \alpha_{i,j}^t = \frac{\exp\left(\mathrm{LeakyReLU}\left(e_{i,j}^t\right)\right)}{\sum_{k \in N_i^t} \exp\left(\mathrm{LeakyReLU}\left(e_{i,k}^t\right)\right)} \tag{24} $$

where $\mathrm{LeakyReLU}(\cdot)$ represents an activation function, and the obtained $\alpha_{i,j}^t$ represents the degree of influence of neighbor station $j$ on target station $i$ at time $t$.

The above process is applied to all neighboring stations of the target station to obtain their respective normalized attention coefficients. Subsequently, these coefficients are used to aggregate all neighborhood spatial features, followed by concatenation with the target station's initial information to facilitate the fusion process. This results in the update of the target station's internal features.

$$ c_i^t = \left( \mathrm{FC}\left( \sum_{j \in N_i^t} \alpha_{i,j}^t \cdot W x_j^t \right) \Bigg\Vert\; x_i^t \right) \tag{25} $$

where $c_i^t$ represents the updated state feature of node $i$ after utilizing the graph attention mechanism to aggregate neighborhood information. Since the node states (variable feature states) of the neighboring stations and the target station change at each time step, the attention weight coefficients will also vary according to the node states of the stations, which reflects the dynamism of each station. This process effectively distinguishes the degree of influence of each station and reflects the dynamic influence from all neighboring stations. These weights are utilized to aggregate neighborhood information and update target station states, allowing the model to adaptively extract dynamically varying spatial characteristics.

2.4.4. Temporal feature extraction module

The Seq2Seq with attention mechanism is used for temporal modeling. GRU is chosen as the decoder, which can also avoid vanishing gradients but uses fewer parameters compared to LSTM. Bi-LSTM is selected as the encoder. As an extension of LSTM, Bi-LSTM uses two LSTM networks to process time sequences in both forward and backward directions. This allows the model to effectively leverage information from both directions simultaneously, demonstrating exceptional performance in simulating long-term sequences with temporal dependencies (Bahdanau et al., 2016). For Bi-LSTM, the hidden state obtained from the forward LSTM cell is the forward hidden state $\overrightarrow{h_t}$, while the hidden state obtained from the backward LSTM cell is referred to as the backward hidden state $\overleftarrow{h_t}$. The concatenation of these two components constitutes the entire hidden state of the Bi-LSTM at time $t$, denoted as:

$$ h_t = \left[ \overrightarrow{h_t}, \overleftarrow{h_t} \right] \tag{26} $$

After using a dynamic graph neural network to extract spatial dependencies from time series, the sequence corresponding to the target station is concatenated and input into the Bi-LSTM as the encoder. In the encoder, the main work is to iteratively extract the temporal characteristics of the input sequence, compressing spatiotemporal information into an intermediate variable $s_0$, while also generating hidden states at each historical time step. The GRU, as the decoder, receives the intermediate variable $s_0$ as the initial input, uses the attention mechanism in the decoder to adaptively select the hidden states generated in the encoder, and models the dynamic temporal dependences of the source time series and the generated sequence. This process, combined with weather data predicted in the meteorological prediction module, leads to the output. As an example, the future time $t'$ is taken to introduce the calculation process of attention.

$$ e_{t't} = v_e^T \tanh\left( W_1 s_{t'-1} + W_2 h_t + b \right) \tag{27} $$

$$ \alpha_{t't} = \frac{\exp\left(e_{t't}\right)}{\sum_{t=1}^{r} \exp\left(e_{t't}\right)} \tag{28} $$

where $W_1$, $W_2$, $v_e$ and $b$ are trainable parameters, $r$ represents the input time step, $e_{t't}$ represents the correlation between the previous decoder hidden state $s_{t'-1}$ and the encoder's hidden state $h_t$ at time $t$, and $\alpha_{t't}$ represents the normalized relevance weight derived from $e_{t't}$. The context vector at time $t'$ can be obtained by the weighted summation of $\alpha_{t'}$.

$$ c_{t'} = \sum_{t=1}^{l} \alpha_{t't} h_t \tag{29} $$

Further, the next hidden state and the output of this time step can be calculated.

$$ s_{t'} = \mathrm{GRU}\left( s_{t'-1}, m_{t'}, c_{t'} \right) \tag{30} $$

$$ y_{t'} = \tanh\left( \mathrm{FC}\left( s_{t'}, m_{t'}, c_{t'} \right) \right) \tag{31} $$

where $m_{t'}$ represents the meteorological data predicted by the meteorological prediction module at time $t'$, and $y_{t'}$ represents the output at time $t'$.

2.4.5. Fusion module

After extracting spatial and temporal dependencies from the time series, the output is concatenated with meteorological data output by the meteorological prediction module. The concatenated data is then input into a fully connected layer:

$$ y^i = \delta\left( W [O_1 : O_2] + b \right) \tag{32} $$

where $W$ and $b$ are trainable parameters, $:$ represents the concatenation operation, $\delta$ is a non-linear activation function, and $O_1$ and $O_2$ represent the outputs of the meteorological prediction module and the time feature extraction module, respectively. Finally, through the collaboration and guidance of the above modules, the predicted hourly ozone concentrations for the target station are obtained as a sequence $y^i = \left[ y_1^i, y_2^i, \ldots, y_n^i \right]$.

2.5. Process details

Based on the specific model architecture of WDDSTG-Net described above, the detailed procedure for ozone prediction is summarized (Fig. S9). Generally, it contains the following parts.

(1) Data collection and preprocessing. Meteorological data and air quality data in Hangzhou from 2016 to 2022 were collected. Boxplots were used to set manual thresholds for outlier removal, missing data were filled in according to the duration of continuous missing data, and the dataset was segmented to construct input-output pairs for prediction, completing the data cleansing process.
(2) Feature selection. Data analysis techniques such as autocorrelation analysis, Pearson correlation analysis, and mutual information were used to select the combinations of input features.
(3) Model training and parameter tuning. The WDDSTG-Net model was constructed, using the training set for model training. The hyperparameters in the model were fine-tuned using data from the validation set to achieve the optimal model structure.
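The decoder attention of Eqs. (27) to (29) can be illustrated with a small pure-Python sketch for a single decoder step. It is only meant to make the shapes and the order of operations concrete: the function name, the toy parameter values and the max-subtraction inside the softmax (a standard numerical-stability step) are assumptions, not details given in the paper.

```python
import math


def attention_context(s_prev, H, W1, W2, b, v):
    """Additive attention for one decoder step.

    s_prev: previous decoder state s_{t'-1}, length ds
    H:      encoder hidden states h_t, each of length dh
    W1: d x ds matrix, W2: d x dh matrix, b and v: vectors of length d
    """
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    ws = matvec(W1, s_prev)
    scores = []
    for h in H:
        wh = matvec(W2, h)
        hidden = [math.tanh(p + q + c) for p, q, c in zip(ws, wh, b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))  # Eq. (27)

    m = max(scores)                      # stability shift, cancels in the ratio
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]        # Eq. (28)

    dim = len(H[0])
    context = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(dim)]  # Eq. (29)
    return alpha, context


# Toy values: three encoder steps, 2-dimensional hidden states, d = 1.
alpha, context = attention_context(
    s_prev=[0.0],
    H=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    W1=[[0.0]], W2=[[1.0, 1.0]], b=[0.0], v=[1.0],
)
print(alpha, context)
```

In the full model the context vector would then be passed, together with the predicted meteorological features, to the GRU and FC layers of Eqs. (30) and (31).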
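The graph attention update of Eqs. (23) to (25) can be sketched for one target station and one time step as below. This is a hedged illustration: the helper names are hypothetical, the LeakyReLU slope of 0.2 is an assumed value, and the FC layer of Eq. (25) is replaced by an identity stand-in because its weights are not specified in the text.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):  # slope is an assumed value
    return z if z > 0 else slope * z


def gat_update(x, neighbours, i, W, a, fc=lambda vec: vec):
    """One attention update for target station i (needs at least one neighbour).

    x:          dict station -> feature list of length M
    neighbours: stations with a directed edge towards i (the set N_i^t)
    W:          M' x M matrix, a: vector of length 2M'
    fc:         stand-in for the FC layer in Eq. (25), identity by default
    """
    wxi = matvec(W, x[i])
    wx = {j: matvec(W, x[j]) for j in neighbours}
    # Eq. (23): e_ij = a^T [W x_i || W x_j]
    e = {j: sum(ak * vk for ak, vk in zip(a, wxi + wx[j])) for j in neighbours}
    # Eq. (24): softmax over LeakyReLU(e_ij)
    scores = {j: math.exp(leaky_relu(e[j])) for j in neighbours}
    total = sum(scores.values())
    alpha = {j: s / total for j, s in scores.items()}
    # Eq. (25): weighted aggregation, FC, then concatenation with x_i
    agg = [sum(alpha[j] * wx[j][k] for j in neighbours) for k in range(len(wxi))]
    return alpha, fc(agg) + list(x[i])


# Toy features for target A and two connected stations B and C.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
weights, updated = gat_update(
    features, ["B", "C"], "A",
    W=[[1.0, 0.0], [0.0, 1.0]], a=[0.1, 0.1, 0.5, 0.5],
)
print(weights, updated)
```

Because the neighbour set comes from the wind-dependent graph and the features change hourly, both the set of terms in the softmax and the resulting weights differ from one time step to the next.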


(4) The optimally trained model was tested using the data from the testing set. The results were compared with other models based on various evaluation metrics to complete the ozone prediction.

2.6. Experimental settings

PyTorch is used as the deep learning framework to build the models. The time range of data in the dataset is from 0:00 on January 1, 2016, to 23:00 on December 31, 2022. The dataset is divided into 70 % for training, 15 % for validation, and 15 % for testing to ensure that each subset contains at least one complete year of data. The input time step is set to 24, which is optimized for better model performance (Fig. S10). The learning rate is 0.00001 (Fig. S11), and the batch size is set to 64 (Table S3). Models are trained for 200 epochs and the early stopping strategy is used to prevent overfitting and improve training efficiency (patience is set to 5). Adaptive Moment Estimation (Adam) is selected as the optimizer and HuberLoss is selected as the loss function.

Three model evaluation metrics, including root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2), are used to evaluate the ozone prediction performance of models. The specific formulas are as follows:

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \mathrm{Obs}_i - \mathrm{Pre}_i \right| \tag{33} $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{Obs}_i - \mathrm{Pre}_i \right)^2 } \tag{34} $$

$$ R^2 = 1 - \frac{ \sum_{i=1}^{n} \left( \mathrm{Obs}_i - \mathrm{Pre}_i \right)^2 }{ \sum_{i=1}^{n} \left( \mathrm{Obs}_i - \overline{\mathrm{Obs}} \right)^2 } \tag{35} $$

where $\mathrm{Obs}_i$ is the observed value of hourly ozone concentration, $\mathrm{Pre}_i$ is the predicted value of hourly ozone concentration, $n$ represents the number of samples in the test set, and $\overline{\mathrm{Obs}}$ represents the average value of $\mathrm{Obs}_i$.

To further assess the level of ozone pollution in practice, the individual air quality index (IAQI) is computed and its accuracy is evaluated as a secondary evaluation of the model's ability to address real-world ozone pollution. The IAQI represents the air quality score for individual pollutants and is calculated based on the technical regulation on ambient air quality index (https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/jcffbz/201203/t20120302_224166.shtml). According to the document, the IAQI for ozone, based on the hourly ozone concentration, can be calculated using the following formula:

$$ \mathrm{IAQI}_{ozone} = \frac{ \mathrm{IAQI}_{high} - \mathrm{IAQI}_{low} }{ \mathrm{BP}_{high} - \mathrm{BP}_{low} } \left( C_{ozone} - \mathrm{BP}_{low} \right) + \mathrm{IAQI}_{low} \tag{36} $$

where $\mathrm{IAQI}_{ozone}$ represents the IAQI for ozone, $C_{ozone}$ represents the hourly ozone concentration, $\mathrm{BP}_{high}$ and $\mathrm{BP}_{low}$ respectively represent the upper and lower bounds of the ozone concentration interval containing $C_{ozone}$, and $\mathrm{IAQI}_{high}$ and $\mathrm{IAQI}_{low}$ represent the corresponding IAQI values for $\mathrm{BP}_{high}$ and $\mathrm{BP}_{low}$. The specific ozone concentration intervals and their corresponding IAQI data are summarized in Table S4.

The accuracy evaluation method is derived from the technical regulations for forecast and evaluation of urban ambient air quality index (AQI) (http://www.cnemc.cn/zzjj/jgsz/ybzx/gzdt_ybzx/202112/t20211203_962949.shtml). If it satisfies the following formula, the IAQI prediction is considered to be accurate.

$$ \begin{cases} \left| \mathrm{IAQI}_{pred} - \mathrm{IAQI}_{true} \right| \le 10, & \text{when } \mathrm{IAQI}_{pred} \le 50 \\ \left| \mathrm{IAQI}_{pred} - \mathrm{IAQI}_{true} \right| \le 0.2 \cdot \mathrm{IAQI}_{pred}, & \text{otherwise} \end{cases} \tag{37} $$

where $\mathrm{IAQI}_{pred}$ represents the predicted IAQI for ozone and $\mathrm{IAQI}_{true}$ represents the true IAQI for ozone. If the above conditions are not met, the prediction is considered inaccurate. Furthermore, based on the magnitude relationship between $\mathrm{IAQI}_{pred}$ and $\mathrm{IAQI}_{true}$, two types of inaccuracies are further categorized: overestimation or underestimation of IAQI. The accuracy, underestimation rate, and overestimation rate are calculated using the following formula, where $k$ represents the number of accurate predictions, underestimated predictions and overestimated predictions respectively, and $N$ represents the total number of samples.

$$ \mathrm{Percent} = \frac{k}{N} \times 100\% \tag{38} $$

To further analyze the predictive ability of the model for high ozone concentrations, three evaluation metrics are introduced for an in-depth evaluation. These metrics are the true positive rate (TPR), false acceptance rate (FAR), and false positive rate (FPR). When using these metrics, a threshold needs to be set. Values above the threshold are regarded as positive samples of high ozone concentration, while values below the threshold are considered negative samples of low ozone concentration. TPR, also known as recall rate, reflects the ability of the model to correctly predict positive samples, representing the proportion of correctly predicted positive samples in the total positive samples. FAR, also known as false alarm rate, reflects the model's tendency to incorrectly identify predictions as positive samples, representing the proportion of negative samples incorrectly predicted as positive samples among all predicted positive samples. FPR, also known as the false recognition rate, reflects the tendency of the model to incorrectly predict negative samples, signifying the proportion of negative samples incorrectly predicted as positive samples among all actual negative samples. The relevant formulas are shown below:

$$ \mathrm{TPR} = 1 - \mathrm{Rate}_{miss} = \frac{TP}{TP + FN} \tag{39} $$

$$ \mathrm{FAR} = 1 - \mathrm{Rate}_{precision} = \frac{FP}{TP + FP} \tag{40} $$

$$ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{41} $$

where TP represents true positive, indicating samples correctly predicted as positive by the model, TN represents true negative, indicating samples correctly predicted as negative, FP represents false positive, indicating negative samples incorrectly predicted as positive, and FN represents false negative, indicating positive samples incorrectly predicted as negative.

2.7. Compared model

In order to verify the prediction capability of the proposed model, the following models are used as the baselines:

• MLP: Multilayer Perceptron, also called artificial neural network, which contains an input layer, a hidden layer and an output layer.
• LSTM: LSTM is a classic variant of the RNN model, which completes time series prediction by capturing time dependencies.
• GRU: GRU is another classic variant of the RNN model. Similar to LSTM, prediction is achieved by extracting temporal features, and it requires fewer calculations.
• CNN-LSTM: CNN-LSTM is a classic model that implements spatiotemporal feature extraction in air pollutant concentration prediction. The model includes CNN and LSTM to realize the mining of spatial characteristics and temporal characteristics, respectively.
• DSSTG-Net: The full name is distance-based static spatio-temporal graph network (DSSTG-Net). DSSTG-Net relies on the geographical distances between monitoring stations to establish static station
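The split ratios and the early stopping rule from the settings above can be made concrete with a short framework-free sketch. Several things here are assumptions rather than statements from the paper: the split is taken to be chronological, the sample count simply equals the number of hours in 2016 to 2022 (ignoring the 24-step input window), and the `EarlyStopping` class is a hypothetical helper that monitors validation loss.

```python
def chronological_split(n_samples, train=0.70, val=0.15):
    """Index ranges for a time-ordered 70/15/15 split (assumed chronological)."""
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n_samples))


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# 2557 days between 2016-01-01 and 2022-12-31, 24 hourly records per day.
n_hours = 2557 * 24
train_idx, val_idx, test_idx = chronological_split(n_hours)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under these assumptions the validation and test subsets each hold a little more than 8784 hourly records, which is consistent with the statement that each subset covers at least one complete year.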
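Eqs. (33) to (35) translate directly into code. The sketch below uses plain Python lists and made-up observation and prediction values purely to show the arithmetic; the function names are illustrative.

```python
import math


def mae(obs, pre):
    """Mean absolute error, Eq. (33)."""
    return sum(abs(o - p) for o, p in zip(obs, pre)) / len(obs)


def rmse(obs, pre):
    """Root mean square error, Eq. (34)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pre)) / len(obs))


def r2(obs, pre):
    """Coefficient of determination, Eq. (35)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pre))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Made-up hourly ozone values in ug/m3, for illustration only.
obs = [60.0, 80.0, 100.0, 120.0]
pre = [62.0, 77.0, 104.0, 115.0]
print(mae(obs, pre), rmse(obs, pre), r2(obs, pre))
```

For these toy values the absolute errors are 2, 3, 4 and 5, so MAE is 3.5, RMSE is the square root of 13.5, and R2 is 1 - 54/2000 = 0.973.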
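The high-ozone metrics of Eqs. (39) to (41) can be computed from thresholded observations and predictions as in the sketch below. The threshold of 160 and the sample values are hypothetical, chosen only for illustration, and the function assumes that each denominator is non-zero.

```python
def high_ozone_rates(obs, pre, threshold):
    """TPR, FAR and FPR for 'value above threshold' treated as the positive class."""
    tp = fp = tn = fn = 0
    for o, p in zip(obs, pre):
        actual, predicted = o > threshold, p > threshold
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return {
        "TPR": tp / (tp + fn),  # Eq. (39)
        "FAR": fp / (tp + fp),  # Eq. (40)
        "FPR": fp / (fp + tn),  # Eq. (41)
    }


# Hypothetical concentrations (ug/m3) and a hypothetical threshold.
observed = [170.0, 150.0, 180.0, 90.0, 165.0, 120.0]
predicted = [165.0, 162.0, 150.0, 80.0, 170.0, 100.0]
rates_high = high_ozone_rates(observed, predicted, threshold=160.0)
print(rates_high)
```

In this toy case there are two true positives, one missed peak, one false alarm and two true negatives, so all of the denominators are well defined.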
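The IAQI interpolation of Eq. (36), the accuracy rule of Eq. (37) and the rates of Eq. (38) can be sketched together. The breakpoint table below is an assumption used for illustration; the intervals actually applied in the study are those of Table S4, which is not reproduced here. Function names and the sample pairs are hypothetical.

```python
# (BP_low, BP_high, IAQI_low, IAQI_high) for hourly ozone in ug/m3.
# ASSUMED illustrative breakpoints; the authoritative values are in Table S4.
O3_BREAKPOINTS = [
    (0, 160, 0, 50),
    (160, 200, 50, 100),
    (200, 300, 100, 150),
    (300, 400, 150, 200),
    (400, 800, 200, 300),
]


def iaqi(c, table=O3_BREAKPOINTS):
    """Linear interpolation within the interval containing c, Eq. (36)."""
    for bp_lo, bp_hi, i_lo, i_hi in table:
        if bp_lo <= c <= bp_hi:
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (c - bp_lo) + i_lo
    raise ValueError("concentration outside the breakpoint table")


def classify(iaqi_pred, iaqi_true):
    """Label one prediction as accurate, over or under, following Eq. (37)."""
    tolerance = 10 if iaqi_pred <= 50 else 0.2 * iaqi_pred
    if abs(iaqi_pred - iaqi_true) <= tolerance:
        return "accurate"
    return "over" if iaqi_pred > iaqi_true else "under"


def rates(pairs):
    """Percentages of accurate, under- and overestimated predictions, Eq. (38)."""
    labels = [classify(p, t) for p, t in pairs]
    return {k: 100.0 * labels.count(k) / len(labels)
            for k in ("accurate", "under", "over")}


# Hypothetical (predicted IAQI, true IAQI) pairs.
sample_rates = rates([(25, 40), (75, 70), (120, 90), (40, 45)])
print(iaqi(180), sample_rates)
```

Note that the tolerance in Eq. (37) depends on the predicted IAQI, not the true one, so the rule is not symmetric in its two arguments.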


Table 1
Comparison of results of two directed construction strategies.

Monitoring station   Cartesian directed graph      Angle-based directed graph
                     R2       MAE     RMSE         R2       MAE     RMSE
Xixi                 0.940    8.31    11.22        0.952    6.69     9.96
Yunqi                0.937    8.26    11.64        0.946    7.41    10.76
Wolongqiao           0.930    9.16    12.67        0.946    7.77    11.16
Hemuxiaoxue          0.938    8.46    12.26        0.945    8.03    11.60
Zhedanongda          0.952    7.54    10.91        0.951    7.64    11.02
Binjiang             0.951    7.40    10.61        0.953    7.19    10.43
Linpingzhen          0.952    7.79    10.85        0.958    7.19    10.27
Chengxiangzhen       0.923    9.81    12.83        0.940    8.32    11.30
Xiasha               0.947    8.52    12.01        0.947    8.60    11.98

The bold text indicates better results for the same metric.
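The percentage changes quoted in the text for Table 1 (1.81 %, 17.91 % and 13.54 % for Chengxiangzhen, 24.22 % for Xixi) can be reproduced from the table entries. The convention used below, dividing the difference by the angle-based value rather than by the Cartesian value, is inferred from the numbers and is not stated explicitly in the paper; the function name is illustrative.

```python
def relative_change(cartesian, angle):
    """Absolute difference relative to the angle-based value, in percent."""
    return abs(cartesian - angle) / angle * 100.0


# (Cartesian, angle-based) values copied from Table 1.
chengxiangzhen = {"R2": (0.923, 0.940), "MAE": (9.81, 8.32), "RMSE": (12.83, 11.30)}
changes = {metric: round(relative_change(cart, ang), 2)
           for metric, (cart, ang) in chengxiangzhen.items()}
xixi_mae_change = round(relative_change(8.31, 6.69), 2)

print(changes)          # {'R2': 1.81, 'MAE': 17.91, 'RMSE': 13.54}
print(xixi_mae_change)  # 24.22
```

Dividing by the Cartesian value instead would give slightly different figures (for example about 15.19 % for the Chengxiangzhen MAE), so the convention matters when comparing against other studies.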

connection relationships, constructing a conventional static graph structure as the input for the graph neural network. The remaining components are identical to WDDSTG-Net.

Fig. 4. 24-h prediction performance comparison chart of meteorological features. (a) Air temperature. (b) Dew point temperature. (c) Sea level pressure.

3. Results and discussions

3.1. Performance of meteorological features prediction

In the meteorological prediction module, the multi-model strategy was applied to several typical machine learning and deep learning models to further observe and compare their predictive performance. Fig. 4(a) shows the 24 h prediction results of air temperature. The R2, MAE and RMSE of LSTM are 0.943, 1.789 μg/m3 and 2.345 μg/m3 respectively. Fig. 4(b) shows the 24 h prediction results of dew point temperature. The R2, MAE and RMSE of LSTM are 0.933, 1.761 μg/m3 and 2.425 μg/m3 respectively. Fig. 4(c) shows the 24 h prediction results of sea level pressure. The R2, MAE, and RMSE of LSTM are 0.948, 1.586 μg/m3, and 2.068 μg/m3 respectively. Detailed results of the meteorological prediction performance at each time step can be seen in Table S5. The LSTM exhibited the highest scores in all evaluation metrics compared to other models for predicting meteorological features. This indicates that the LSTM model effectively captures and explores the deep non-linear relationships of the meteorological time series data. Therefore, the LSTM combined with the multi-model strategy is employed as the foundational model for meteorological prediction, aiming to generate future meteorological data. This will assist in achieving more accurate ozone prediction within the model.

3.2. Comparison of directed graph construction strategies

According to the two proposed strategies for constructing graphs based on changeable wind direction, dynamic directed graphs were individually constructed, which were used as inputs for the spatiotemporal model to make predictions, yielding results for different monitoring stations. Table 1 summarizes the results for predicting 1-h ozone concentration when using the two strategies respectively. For convenience of explanation, station IDs S1, S2…S9 are used to represent each station, in the order from top to bottom in Table 1.

According to the results, it can be seen that for some stations (such as Xiasha), the performance of the angle-based directed graph strategy is close to that of the cartesian directed graph strategy. For Zhedanongda, the angle-based directed graph strategy is slightly worse. This situation arises due to the dispersed distribution of stations, leading to potential differences in the surrounding terrain conditions for each station. Consequently, a specific directed graph construction strategy may not yield the optimal result for all stations. However, it can be seen that for the vast majority of stations, the angle-based directed graph strategy significantly outperforms the cartesian directed graph strategy. Specifically, for the Chengxiangzhen station, employing the angle-based directed graph strategy resulted in an approximately 1.81 % enhancement in R2 compared to the cartesian directed graph strategy, and MAE and RMSE are reduced by roughly 17.91 % and 13.54 %, respectively. Furthermore, for the Xixi station, the angle-based directed graph strategy even led to a remarkably high decrease in MAE of 24.22 %.

Ozone prediction performance was further explored and analyzed over a longer time range, taking the Chengxiangzhen station as an example, as shown in Fig. S12. As the prediction length increases, both graph construction strategies exhibit a continuous decrease in R2 and an increase in MAE and RMSE. However, it is evident that the angle-based directed graph strategy significantly outperforms the cartesian directed graph strategy for a 24-h prediction. Notably, for a 12-h prediction, the angle-based directed graph strategy shows the greatest improvement compared with the cartesian directed graph strategy, with a 10.21 % increase in R2, and reductions of 15.43 % and 12.67 % in MAE and RMSE, respectively.

Based on the above results, it can be considered that the angle-based directed graph strategy has more significant prediction advantages and


Fig. 5. The MAE and RMSE of different models in 24-h prediction. (a) Chengxiangzhen station. (b) Zhedanongda station.

stronger applicability in this study area, and also has better performance in longer-term predictions. Therefore, the angle-based directed graph strategy is used to construct a unified dynamic graph structure for the monitoring network with topological relationships.

3.3. Model prediction performance for ozone concentration

The WDDSTG-Net was used to predict the ozone concentration for each station in the next hour, and the results reported in the literature review (Zhang et al., 2022) were summarized as baselines (Table S6). A comparative analysis reveals that WDDSTG-Net exhibits significantly superior performance in all metrics compared to statistical models and deterministic models. It also closely approaches the performance of DL models, which were reported as the best performing in the literature review, and achieves lower MAE in the predictions for most stations. Achieving such outstanding performance results is still commendable and promising.

Furthermore, prediction results from other studies were collected and compared with the proposed model. Taking the Xixi station as an example, the detailed results are summarized in Table S7. For 1-h prediction, the WDDSTG-Net exhibited relatively better fitting and prediction performance than Seq2Seq (Jia et al., 2021) and Res-GCN-BiLSTM (Wu et al., 2023), with higher R2 and lower MAE and RMSE, suggesting its potential advantage and prospects in short-term ozone forecasting to some extent. Based on the results, the performance of all monitoring stations was summarized (Fig. S13). It is evident that the trend lines fitted for each station closely align with y = x, visualizing the proximity and strong correlation between predicted and observed values, highlighting the excellence of prediction results.

In order to afford the government more policy response time in addressing potentially severe ozone-related problems, long-term prediction of ozone concentrations is crucial. Therefore, the prediction horizon was extended from the next hour to the next natural day (24 h). Taking Chengxiangzhen station and Zhedanongda station as examples, the 24-h prediction results are shown in Fig. 5. For all models, as the time step increases, both MAE and RMSE continuously increase and tend to level off, indicating that long-term prediction is more challenging and less accurate than short-term prediction. This suggests an accumulation of prediction errors over time. For Chengxiangzhen station, WDDSTG-Net performs similarly to other models for the first two hours. After 2 h, its MAE and RMSE are the lowest among all models, and MLP performs the worst, indicating its limitations in capturing inherent nonlinear relationships. For Zhedanongda station, WDDSTG-Net consistently outperforms excellent time-series models at all times, indicating its superior predictive performance. It can also be observed that WDDSTG-Net consistently outperforms DSSTG-Net at each time step, indicating that the proposed wind-direction-based dynamic directed graph structure exhibits superior predictive performance compared to the traditional static graph structure based on geographical adjacency relationships. Such outstanding performance shows that the WDDSTG-Net model can more effectively capture complex nonlinear relationships in longer-term ozone prediction, and has the potential to play a more important role in long-term ozone prediction.

The results of the 24-h prediction of all stations are shown in Table S8. MLP performs the worst in the majority of stations, confirming its weaker ability to capture nonlinear relationships and process time series. Both GRU and LSTM exhibit comparable and relatively good predictive performance across stations, reflecting relatively thorough exploration of deep temporal information within time series. However, CNN-LSTM, which acts as a spatiotemporal hybrid model, does not show significantly better predictive performance than models such as GRU and LSTM, which only extract temporal features. This might be attributed to its complex internal structure and potential overfitting due to the relatively small dataset size. It is also possible that


Fig. 6. (a) Distribution of R2 of different models at each station. (b) Distribution of mean value of ozone at each monitoring station in the study area and distribution
of MAE of different models.

the dataset has relatively complex internal features, leading to the
extraction of excessive redundant features during spatiotemporal
feature extraction. These redundant features introduce considerable
noise during the prediction process, resulting in a worse predictive
performance compared to LSTM and GRU. DSSTG-Net uses a graph
neural network that extracts spatial topological relationships, enabling
more efficient handling of non-Euclidean spatial data compared to CNN.
Consequently, it exhibits superior predictive performance relative to
other temporal models and spatiotemporal hybrid models. Remarkably,
the proposed WDDSTG-Net achieves a secondary performance boost by
improving the static graph based on geographical adjacency relation­
ships in DSSTG-Net into a dynamic directed graph, and obtains the best
predictions at all monitoring stations. The performance shows that the
dynamic directed graph structure is a more effective and feasible graph
construction method, implying the high rationality of the model archi­
tecture design process and the strong applicability for ozone prediction.
Taking Linpingzhen station as an example, WDDSTG-Net achieves an R2
of 0.687, MAE of 20.86 μg/m3 and RMSE of 27.59 μg/m3 in the 24-h
ozone prediction, displaying outstanding performance. Compared to
other models, the improvements in R2 are 17.64 %, 4.41 %, 5.05 %, 6.35
%, and 2.84 %. Similarly, the improvements in MAE are 15.13 %, 4.75
%, 5.44 %, 6.04 %, and 3.52 %, and in RMSE are 13.07 %, 4.60 %, 4.93
%, 5.77 %, and 3.06 %, reflecting a remarkable improvement.
Table S9 summarizes the comprehensive 24-h O3 concentration
prediction results of the proposed model in this study and other research
models (Wang et al., 2022c; Zang et al., 2021). It is worth noting that in
the DCRNN study, predictions were made by dividing the data into
seasons, and thus the prediction results from different seasons were
averaged as the final prediction result of their model. It can be observed that the WDDSTG-Net also demonstrates a certain prediction advantage, with the RMSE decreasing by 5.15 % and 11.13 % compared to the other models, respectively.

Fig. 7. The distribution of multiple indicators in different months. (a) The mean value of ozone. (b) The MAE result of ozone prediction. (c) The RMSE result of ozone prediction.

Fig. 6 combines the mean hourly ozone concentrations for each station. It is evident that the mean concentrations of the nine stations show minimal variation, ranging between 55 μg/m3 and 70 μg/m3. This suggests that there is sufficient spatial transmission between stations and a strong correlation with each other. At stations such as S5, S6, S7, S8 and S9, which were defined as high ozone areas with relatively higher ozone concentrations, models like MLP demonstrate poorer predictive performance. This indicates the limited predictive capability of


Fig. 8. IAQI accuracy of each monitoring station.

conventional deep learning models in high ozone areas, making it challenging to effectively capture ozone peaks. For WDDSTG-Net, the predictions remain relatively stable in high ozone areas. This demonstrates the model's robust predictive capability for high ozone concentration levels.

Ozone variations exhibit not only daily periodicity but also distinct seasonality. Therefore, a detailed analysis was conducted from a monthly perspective. Fig. 7(a) presents the concentrations of ozone in different months. It is evident that months with higher ozone concentrations mainly span from April to September, peaking in August with a mean ozone concentration of 95.80 μg/m3. From October to March, ozone levels are lower, reaching their minimum in December with a median concentration of 28.99 μg/m3. Ozone concentrations vary widely, with generally high concentrations in summer and autumn and low concentrations in spring and winter. This variation is attributed to seasonal differences in meteorological factors such as temperature, solar radiation, and precipitation. Typically, summer and autumn have more sunlight and longer daylight hours, conditions conducive to the photochemical reactions involved in secondary pollutants like ozone. Thus, the climate characteristics of summer and autumn determine the prevalence of elevated ozone levels to a certain extent. In Fig. 7(b) and Fig. 7(c), the worst result occurred in July, a month within the high ozone concentration range, with median MAE and RMSE of 25.47 μg/m3 and 33.98 μg/m3, respectively. The best fitting was observed in December, the month with the lowest ozone concentration, with median MAE and RMSE of 13.17 μg/m3 and 17.53 μg/m3, respectively. The results indicate that high ozone concentrations are often associated with lower prediction accuracy, denoted by relatively higher MAE and RMSE.

Taking the runtime of the GRU model as a benchmark, the computational cost for multiple models including WDDSTG-Net was computed. The specific results and efficiency analysis are summarized in Table S10.

In summary, the proposed WDDSTG-Net demonstrates significant application advantages compared with other models in predicting short-term or longer-term ozone concentration. Its remarkable sensitivity and detection capabilities towards extreme concentrations are commendable. During periods of severe ozone pollution, the model's

in the proposed WDDSTG-Net structure, including the meteorological prediction module, spatial feature extraction module, and temporal feature extraction module, as detailed in Table S11. It is evident that removing any of the three modules leads to a certain degree of reduction in the model's predictive accuracy. This means that each module plays an important role in mining specific deep feature relationships. These modules mutually interact and influence each other, coupling internally in certain dimensions, compensating for the limitations of a singular model in capturing missed key information. They constitute the comprehensive and outstanding performance of the WDDSTG-Net model, helping to achieve optimal prediction results. All these results provide insights into the architectural design of spatiotemporal hybrid models for ozone concentration prediction, contributing to the development of more efficient and accurate prediction models for processing data with spatiotemporal features.

3.5. Analysis of IAQI accuracy and other metrics

Models were applied at a real-world level, extending the evaluation from theoretical numerical proximity to ozone pollution levels based on policy documents in real society, thereby measuring the model's practical significance. According to the IAQI calculation formula, the predicted IAQI was calculated based on the 24-h ozone concentration prediction results, and the IAQI accuracy was calculated as shown in Fig. 8. The IAQI accuracy of all stations can reach >75 %, and the accuracy of S2 and S8 can even reach >80 %. This shows that WDDSTG-Net can efficiently quantify ozone pollution levels and shows promising practical value. Additionally, it is noteworthy that the underestimation rates for each station are higher than the overestimation rates, with underestimation rates ranging between 10 % and 14 % and overestimation rates between 7 % and 11 %. This suggests that WDDSTG-Net tends to underestimate the ozone pollution levels when the prediction is not accurate enough. The model's ability to detect extreme weather still needs to be further strengthened.

A detailed analysis of the IAQI accuracy at each hour was conducted for the Chengxiangzhen station and Zhedanongda station (Fig. S14). MLP achieves the worst prediction for almost all hours. The changes in
predictive capacity remains impressive, allowing a substantial capture
accuracy for LSTM and GRU show remarkable similarity, while CNN-
of the complex diurnal cycle and seasonal variation patterns of ozone in
LSTM has overall higher accuracy rates than both LSTM and GRU,
high dimensions to a great extent.
although it slightly underperforms for specific hours. However, the ac­
curacy of DSSTG-Net at each time step is slightly higher than CNN-
3.4. Ablation study LSTM. Among all models, WDDSTG-Net achieves the best prediction
results. For the initial two hours, WDDSTG-Net achieves accuracy rates
Ablation studies were conducted to verify the rationality of each part exceeding 90 %, while achieving over 80 % accuracy for the first six

12
S. Wang et al. Science of the Total Environment 946 (2024) 174229

Fig. 9. Performance of WDDSTG-Net on other metrics.

hours. Even for the platform with the worst accuracy, it maintains rates BiLSTM. Overall, the proposed model provides an effective data-driven
above 77 % and 73 %, respectively. approach for hourly ozone concentration prediction. It exhibited po­
Other metrics were also calculated to further inspect the perfor­ tentials for the prediction of other airborne pollutants.
mance of the model in predicting severe ozone pollution or extreme
peaks, setting 100 μg/m3 as the critical threshold for high ozone con­ CRediT authorship contribution statement
centration. Fig. 9. shows the calculation of various metrics at each sta­
tion in the 24-h prediction. TPR consistently remains above 0.69, Shiyi Wang: Writing – original draft, Visualization, Methodology,
reaching a maximum of 0.77, indicating a reasonable ability of the Data curation, Conceptualization. Yiming Sun: Software, Investigation.
model to predict high ozone concentrations. FAR remains mostly below Haonan Gu: Validation. Xiaoyong Cao: Formal analysis. Yao Shi:
0.28, indicating a low tendency for incorrect prediction. FPR is below Resources. Yi He: Writing – review & editing, Supervision, Project
0.06, suggesting a minimal likelihood of incorrectly predicting in the administration.
absence of high ozone concentrations. Taking FAR and FPR together,
these two metrics indicate that the model does not tend to overestimate Declaration of competing interest
actual concentrations. Based on the results of multiple indicators,
WDDSTG-Net demonstrates sensitivity to high ozone concentration and The authors declare that they have no known competing financial
false warnings is also maintained at a low level. However, there is still interests or personal relationships that could have appeared to influence
room for improvement and enhancement in predicting extreme the work reported in this paper.
scenarios.
Data availability
4. Conclusion
Data will be made available on request.
A hybrid model WDDSTG-Net was proposed and used to predict
hourly ozone concentration. For both 1-h prediction and 24-h predic­ Acknowledgements
tion, this model outperformed several classic data-driven models which
were used as benchmarks in the study of Hangzhou. The MAE of This work is supported by the National Key Research and Develop­
WDDSTG-Net was 6.69 μg/m3 and 18.63 μg/m3, respectively. Daily ment Program of China (grant number 2022YFE0106100), and the
change analysis, specific station analysis and monthly analysis showed National Natural Science Foundation of China (grant number 22178299,
that the model can effectively applied in different concentration time 51933009).
scales and spatial scales. Furthermore, ablation studies at monitoring
stations reflected the rationality of the overall structure of the model. Appendix A. Supplementary data
The IAQI accuracy calculated for each station shows a high accuracy of
81.74 %. In addition, for the prediction ability of the WDDSTG-Net for Supplementary data to this article can be found online at https://doi.
high ozone concentrations, results show that TPR reaches 0.77, and FAR org/10.1016/j.scitotenv.2024.174229.
and FPR are as low as 0.21 and 0.05, showing the ability of the model in
achieving accurate peak predictions. References
The model demonstrated the importance of considering three major
components simultaneously, including the dynamic spatial correlation, Bahdanau, D., Cho, K., Bengio, Y., 2016. Neural Machine Translation by Jointly Learning
temporal correlation and meteorological predictions. The key to to Align and Translate. https://doi.org/10.48550/arXiv.1409.0473.
Baklanov, A., Korsholm, U., Mahura, A., Petersen, C., Gross, A., 2008. ENVIRO-HIRLAM:
implement the dynamic spatial correlation was to use directed graphs
on-line coupled modelling of urban meteorology and air pollution. Adv. Sci. Res. 2,
dynamically based on wind direction by applying the angle-based 41–46. https://doi.org/10.5194/asr-2-41-2008.
strategy. In addition, the graph attention mechanism was used to Cao, Y., Qiao, X., Hopke, P.K., Ying, Q., Zhang, Y., Zeng, Y., Yuan, Y., Tang, Y., 2020.
Ozone pollution in the West China rain zone and its adjacent regions, southwestern
assign dynamic weights to each station in the dynamic directed graphs.
China: concentrations, ecological risk, and sources. Chemosphere 256, 127008.
Moreover, LSTM was used to implement to achieve meteorological https://doi.org/10.1016/j.chemosphere.2020.127008.
prediction and attained the higher R2 in the prediction of the three Chelani, A.B., 2019. Estimating PM2.5 concentration from satellite derived aerosol optical
meteorological factors than other classic methods such as MLP, GRU and depth and meteorological variables using a combination model. Atmos. Pollut. Res.
10, 847–857. https://doi.org/10.1016/j.apr.2018.12.013.

Chen, S., Wang, H., Lu, K., Zeng, L., Hu, M., Zhang, Y., 2020. The trend of surface ozone in Beijing from 2013 to 2019: indications of the persisting strong atmospheric oxidation capacity. Atmos. Environ. 242, 117801. https://doi.org/10.1016/j.atmosenv.2020.117801.
Chen, D., Wang, G., Xinyue, Z., Liu, Q., Liu, X., 2021. A hybrid CNN-LSTM model for predicting PM2.5 in Beijing based on spatiotemporal correlation. Environ. Ecol. Stat. 28, 503–522. https://doi.org/10.1007/s10651-021-00501-8.
Chen, Y., Chen, X., Xu, A., Sun, Q., Peng, X., 2022. A hybrid CNN-transformer model for ozone concentration prediction. Air Qual. Atmos. Health 15, 1533–1546. https://doi.org/10.1007/s11869-022-01197-w.
Cheng, Y., He, L.-Y., Huang, X.-F., 2021. Development of a high-performance machine learning model to predict ground ozone pollution in typical cities of China. J. Environ. Manage. 299, 113670. https://doi.org/10.1016/j.jenvman.2021.113670.
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. https://doi.org/10.48550/arXiv.1406.1078.
Chuang, M.-T., Zhang, Y., Kang, D., 2011. Application of WRF/Chem-MADRID for real-time air quality forecasting over the southeastern United States. Atmos. Environ. 45, 6241–6250. https://doi.org/10.1016/j.atmosenv.2011.06.071.
Dun, A., Yang, Y., Lei, F., 2022. A novel hybrid model based on spatiotemporal correlation for air quality prediction. Mob. Inf. Syst. 2022, e9759988. https://doi.org/10.1155/2022/9759988.
Eslami, E., Choi, Y., Lops, Y., Sayeed, A., 2020. A real-time hourly ozone prediction system using deep convolutional neural network. Neural Comput. & Applic. 32, 8783–8797. https://doi.org/10.1007/s00521-019-04282-x.
Fan, Y., Ding, X., Hang, J., Ge, J., 2020. Characteristics of urban air pollution in different regions of China between 2015 and 2019. Build. Environ. 180, 107048. https://doi.org/10.1016/j.buildenv.2020.107048.
Feng, R., Zheng, H., Gao, H., Zhang, A., Huang, C., Zhang, J., Luo, K., Fan, J., 2019. Recurrent neural network and random forest for analysis and accurate forecast of atmospheric pollutants: a case study in Hangzhou, China. J. Clean. Prod. 231, 1005–1015. https://doi.org/10.1016/j.jclepro.2019.05.319.
Foley, K.M., Roselle, S.J., Appel, K.W., Bhave, P.V., Pleim, J.E., Otte, T.L., Mathur, R., Sarwar, G., Young, J.O., Gilliam, R.C., Nolte, C.G., Kelly, J.T., Gilliland, A.B., Bash, J.O., 2010. Incremental testing of the community multiscale air quality (CMAQ) modeling system version 4.7. Geosci. Model Dev. 3, 205–226. https://doi.org/10.5194/gmd-3-205-2010.
Freeman, B.S., Taylor, G., Gharabaghi, B., Thé, J., 2018. Forecasting air quality time series using deep learning. J. Air Waste Manage. Assoc. 68, 866–886. https://doi.org/10.1080/10962247.2018.1459956.
Gao, J., Woodward, A., Vardoulakis, S., Kovats, S., Wilkinson, P., Li, L., Xu, L., Li, J., Yang, J., Li, J., Cao, L., Liu, X., Wu, H., Liu, Q., 2017. Haze, public health and mitigation measures in China: a review of the current evidence for further policy response. Sci. Total Environ. 578, 148–157. https://doi.org/10.1016/j.scitotenv.2016.10.231.
Hao, S., Lee, D.-H., Zhao, D., 2019. Sequence to sequence learning with attention mechanism for short-term passenger flow prediction in large-scale metro system. Transportation Research Part C: Emerging Technologies 107, 287–300. https://doi.org/10.1016/j.trc.2019.08.005.
He, H., Li, M., Wang, W., Wang, Z., Xue, Y., 2018. Prediction of PM2.5 concentration based on the similarity in air quality monitoring network. Build. Environ. 137, 11–17. https://doi.org/10.1016/j.buildenv.2018.03.058.
Hong, F., Ji, C., Rao, J., Chen, C., Sun, W., 2023. Hourly ozone level prediction based on the characterization of its periodic behavior via deep learning. Process Saf. Environ. Prot. 174, 28–38. https://doi.org/10.1016/j.psep.2023.03.059.
Hu, J., Chen, Y., Wang, W., Zhang, S., Cui, C., Ding, W., Fang, Y., 2023. An optimized hybrid deep learning model for PM2.5 and O3 concentration prediction. Air Qual. Atmos. Health 16, 857–871. https://doi.org/10.1007/s11869-023-01317-0.
Jia, P., Cao, N., Yang, S., 2021. Real-time hourly ozone prediction system for Yangtze River Delta area using attention based on a sequence to sequence model. Atmos. Environ. 244, 117917. https://doi.org/10.1016/j.atmosenv.2020.117917.
Kim, J., Wang, X., Kang, C., Yu, J., Li, P., 2021. Forecasting air pollutant concentration using a novel spatiotemporal deep learning model based on clustering, feature selection and empirical wavelet transform. Sci. Total Environ. 801, 149654. https://doi.org/10.1016/j.scitotenv.2021.149654.
Le, V.-D., Bui, T.-C., Cha, S.-K., 2020. Spatiotemporal deep learning model for citywide air pollution interpolation and prediction. In: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 55–62. https://doi.org/10.1109/BigComp48618.2020.00-99.
Li, X., Peng, L., Yao, X., Cui, S., Hu, Y., You, C., Chi, T., 2017. Long short-term memory neural network for air pollutant concentration predictions: method development and evaluation. Environ. Pollut. 231, 997–1004. https://doi.org/10.1016/j.envpol.2017.08.114.
Li, L.-L., Wen, S.-Y., Tseng, M.-L., Wang, C.-S., 2019. Renewable energy prediction: a novel short-term prediction model of photovoltaic output power. J. Clean. Prod. 228, 359–375. https://doi.org/10.1016/j.jclepro.2019.04.331.
Lin, Y., Mago, N., Gao, Y., Li, Y., Chiang, Y.-Y., Shahabi, C., Ambite, J.L., 2018. Exploiting Spatiotemporal Patterns for Accurate Air Quality Forecasting Using Deep Learning. In: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '18. Association for Computing Machinery, New York, NY, USA, pp. 359–368. https://doi.org/10.1145/3274895.3274907.
Liu, X., Qin, M., He, Y., Mi, X., Yu, C., 2021. A new multi-data-driven spatiotemporal PM2.5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 12, 101197. https://doi.org/10.1016/j.apr.2021.101197.
Ma, J., Ding, Y., Cheng, J.C.P., Jiang, F., Tan, Y., Gan, V.J.L., Wan, Z., 2020. Identification of high impact factors of air quality on a national scale using big data and machine learning techniques. J. Clean. Prod. 244, 118955. https://doi.org/10.1016/j.jclepro.2019.118955.
Maji, K.J., Ye, W.-F., Arora, M., Nagendra, S.M.S., 2019. Ozone pollution in Chinese cities: assessment of seasonal variation, health effects and economic burden. Environ. Pollut. 247, 792–801. https://doi.org/10.1016/j.envpol.2019.01.049.
Mao, W., Jiao, L., Wang, W., 2022. Long time series ozone prediction in China: a novel dynamic spatiotemporal deep learning approach. Build. Environ. 218, 109087. https://doi.org/10.1016/j.buildenv.2022.109087.
Ouyang, X., Yang, Y., Zhang, Y., Zhou, W., 2021. Spatial-temporal dynamic graph convolution neural network for air quality prediction. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. https://doi.org/10.1109/IJCNN52387.2021.9534167.
Pak, U., Kim, C., Ryu, U., Sok, K., Pak, S., 2018. A hybrid model based on convolutional neural networks and long short-term memory for ozone concentration prediction. Air Qual. Atmos. Health 11, 883–895. https://doi.org/10.1007/s11869-018-0585-1.
Qi, Y., Li, Q., Karimian, H., Liu, D., 2019. A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 664, 1–10. https://doi.org/10.1016/j.scitotenv.2019.01.333.
Russo, A., Raischel, F., Lind, P.G., 2013. Air quality prediction using optimal neural networks with stochastic variables. Atmos. Environ. 79, 822–830. https://doi.org/10.1016/j.atmosenv.2013.07.072.
Song, X.-Y., Gao, Y., Peng, Y., Huang, S., Liu, C., Peng, Z.-R., 2021. A machine learning approach to modelling the spatial variations in the daily fine particulate matter (PM2.5) and nitrogen dioxide (NO2) of Shanghai, China. Environment and Planning B: Urban Analytics and City Science 48, 467–483. https://doi.org/10.1177/2399808320975031.
Wang, T., Xue, L., Brimblecombe, P., Lam, Y.F., Li, L., Zhang, L., 2017a. Ozone pollution in China: a review of concentrations, meteorological influences, chemical precursors, and effects. Sci. Total Environ. 575, 1582–1596. https://doi.org/10.1016/j.scitotenv.2016.10.081.
Wang, P., Zhang, H., Qin, Z., Zhang, G., 2017b. A novel hybrid-Garch model based on ARIMA and SVM for PM2.5 concentrations forecasting. Atmos. Pollut. Res. 8, 850–860. https://doi.org/10.1016/j.apr.2017.01.003.
Wang, H.-W., Li, X.-B., Wang, D., Zhao, J., He, H., Peng, Z.-R., 2020a. Regional prediction of ground-level ozone using a hybrid sequence-to-sequence deep learning approach. J. Clean. Prod. 253, 119841. https://doi.org/10.1016/j.jclepro.2019.119841.
Wang, P., Qiao, X., Zhang, H., 2020b. Modeling PM2.5 and O3 with aerosol feedbacks using WRF/Chem over the Sichuan Basin, southwestern China. Chemosphere 254, 126735. https://doi.org/10.1016/j.chemosphere.2020.126735.
Wang, Sichen, Huo, Y., Mu, X., Jiang, P., Xun, S., He, B., Wu, W., Liu, L., Wang, Y., 2022a. A high-performance convolutional neural network for ground-level ozone estimation in eastern China. Remote Sens. (Basel) 14, 1640. https://doi.org/10.3390/rs14071640.
Wang, Shun, Qiao, L., Fang, W., Jing, G., Sheng, V., Zhang, Y., 2022b. Air pollution prediction via graph attention network and gated recurrent unit. CMC 73, 673–687. https://doi.org/10.32604/cmc.2022.028411.
Wang, Dongsheng, Wang, H.-W., Lu, K.-F., Peng, Z.-R., Zhao, J., 2022c. Regional prediction of ozone and fine particulate matter using diffusion convolutional recurrent neural network. Int. J. Environ. Res. Public Health 19, 3988. https://doi.org/10.3390/ijerph19073988.
Wei, X., Yu, R., Sun, J., 2020. View-GCN: view-based graph convolutional network for 3D shape analysis. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1847–1856. https://doi.org/10.1109/CVPR42600.2020.00192.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S., 2021. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 4–24. https://doi.org/10.1109/TNNLS.2020.2978386.
Wu, C., He, H., Song, R., Zhu, X., Peng, Z., Fu, Q., Pan, J., 2023. A hybrid deep learning model for regional O3 and NO2 concentrations prediction based on spatiotemporal dependencies in air quality monitoring network. Environ. Pollut. 320, 121075. https://doi.org/10.1016/j.envpol.2023.121075.
Xu, Y., Wang, F., An, Z., Wang, Q., Zhang, Z., 2023. Artificial intelligence for science—bridging data to wisdom. The Innovation 4, 100525. https://doi.org/10.1016/j.xinn.2023.100525.
Yu, C., Wang, F., Shao, Z., Qian, T., Zhang, Z., Wei, W., Xu, Y., 2024. GinAR: An end-to-end multivariate time series forecasting model suitable for variable missing. https://doi.org/10.48550/arXiv.2405.11333.
Zang, Z., Guo, Y., Jiang, Y., Zuo, C., Li, D., Shi, W., Yan, X., 2021. Tree-based ensemble deep learning model for spatiotemporal surface ozone (O3) prediction and interpretation. Int. J. Appl. Earth Obs. Geoinf. 103, 102516. https://doi.org/10.1016/j.jag.2021.102516.
Zhan, Y., Luo, Y., Deng, X., Grieneisen, M.L., Zhang, M., Di, B., 2018. Spatiotemporal prediction of daily ambient ozone levels across China using random forest for human exposure assessment. Environ. Pollut. 233, 464–473. https://doi.org/10.1016/j.envpol.2017.10.029.

Zhang, J., Chen, F., Guo, Y., Li, X., 2020. Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit. IET Intell. Transp. Syst. 14, 1210–1217. https://doi.org/10.1049/iet-its.2019.0873.
Zhang, B., Rong, Y., Yong, R., Qin, D., Li, M., Zou, G., Pan, J., 2022. Deep learning for air pollutant concentration prediction: a review. Atmos. Environ. 290, 119347. https://doi.org/10.1016/j.atmosenv.2022.119347.
Zhu, H., Lu, X., 2016. The prediction of PM2.5 value based on ARMA and improved BP neural network model. In: 2016 International Conference on Intelligent Networking and Collaborative Systems (INCoS), pp. 515–517. https://doi.org/10.1109/INCoS.2016.81.
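
Supplementary sketch for the threshold-based evaluation in Section 3.5. The snippet below shows one way TPR, FAR and FPR could be computed from paired observed and predicted hourly ozone values at the 100 μg/m3 threshold. It assumes the standard contingency-table definitions (TPR = TP/(TP+FN), FAR = FP/(TP+FP), FPR = FP/(FP+TN)) and treats values at or above the threshold as high. Both choices are assumptions, since the exact formulas are not restated in this section. The function name, the dataclass and the sample numbers are hypothetical and are not taken from the authors' code or data.

```python
from dataclasses import dataclass
from math import nan
from typing import Sequence

# Critical threshold for "high ozone" quoted in Section 3.5 (ug/m3).
HIGH_OZONE_THRESHOLD = 100.0


@dataclass(frozen=True)
class ExceedanceScores:
    """Hypothetical container for the three threshold-based metrics."""
    tpr: float  # true positive rate: TP / (TP + FN)
    far: float  # false alarm ratio: FP / (TP + FP)
    fpr: float  # false positive rate: FP / (FP + TN)


def _safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def exceedance_scores(
    observed: Sequence[float],
    predicted: Sequence[float],
    threshold: float = HIGH_OZONE_THRESHOLD,
) -> ExceedanceScores:
    """Score predicted threshold exceedances against observed ones."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    tp = fp = fn = tn = 0
    for obs, pred in zip(observed, predicted):
        # Assumption: ">=" marks a high-ozone hour; the paper may use ">".
        obs_high = obs >= threshold
        pred_high = pred >= threshold
        if obs_high and pred_high:
            tp += 1   # high ozone predicted and observed
        elif pred_high:
            fp += 1   # high ozone predicted but not observed (false alarm)
        elif obs_high:
            fn += 1   # high ozone observed but missed by the prediction
        else:
            tn += 1   # neither predicted nor observed

    return ExceedanceScores(
        tpr=_safe_ratio(tp, tp + fn),
        far=_safe_ratio(fp, tp + fp),
        fpr=_safe_ratio(fp, fp + tn),
    )


# Made-up values for illustration only (not data from the study).
observed = [120.0, 95.0, 130.0, 60.0, 101.0, 80.0]
predicted = [110.0, 105.0, 90.0, 55.0, 115.0, 70.0]
scores = exceedance_scores(observed, predicted)
print(f"TPR={scores.tpr:.2f} FAR={scores.far:.2f} FPR={scores.fpr:.2f}")
```

Under these definitions, FAR is normalized by the number of predicted exceedances and FPR by the number of observed non-exceedances, so the two can differ considerably when high-ozone hours are rare.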
