Spatiotemporal distributions of surface ozone levels in China from 2005 to 2017 a machine learning approach

This study employs a machine learning approach using the XGBoost algorithm to predict surface ozone levels in China from 2005 to 2017, addressing the scarcity of in-situ ozone measurements prior to 2013. The model demonstrates high accuracy in predicting ozone concentrations, identifying pollution hotspots in major urban regions, and revealing a significant increasing trend in the Beijing-Tianjin-Hebei area. The findings highlight the public health risks associated with ozone pollution, with over a quarter of the population living in areas exceeding national air quality standards.

Environment International 142 (2020) 105823

Contents lists available at ScienceDirect

Environment International
journal homepage: www.elsevier.com/locate/envint

Spatiotemporal distributions of surface ozone levels in China from 2005 to 2017: A machine learning approach

Riyang Liu a, Zongwei Ma a,b,⁎, Yang Liu c, Yanchuan Shao a, Wei Zhao a, Jun Bi a,b,⁎

a State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China
b Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China
c Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA

ARTICLE INFO

Keywords: Surface ozone; MDA8; XGBoost; Spatiotemporal patterns

ABSTRACT

In recent years, ground-level ozone has become a severe ambient pollutant in major urban areas of China, which has adverse impacts on population health. However, in-situ measurements of the ozone concentration before 2013 in China are quite scarce, which hinders the assessment of the long-term trends and effects of ozone pollution. In this study, we used daily maximum 8-hour average (MDA8) ozone observations from 2013 to 2017 combined with concurrent ozone retrievals, aerosol reanalysis, meteorological parameters, and land-use data to establish a nationwide MDA8 prediction model based on the eXtreme Gradient Boosting (XGBoost) algorithm. The model achieves high prediction accuracy compared with other studies, with R² values for the by-year, site-based, and sample-based cross-validation (CV) schemes of 0.61, 0.64, and 0.78, respectively, at the daily level. External testing with regional measurements from 2005 to 2012 and nationwide data in 2018 shows that the model is robust and reliable for historical data prediction, with external model testing R² values ranging from 0.60 to 0.87 at the month level in different years. Using the final estimator, we obtained nationwide monthly mean ozone concentrations from 2005 to 2012 and daily MDA8 ozone concentrations from 2013 to 2017 at a resolution of 0.1° × 0.1°. According to the average number of days exceeding the standard and the average of the 90th percentile of the MDA8 ozone concentrations, the Beijing-Tianjin-Hebei (BTH), the Yangtze River Delta, the Pearl River Delta, the Jianghan Plain, the Sichuan Basin, and the Northeast Plain regions were identified as pollution hotspots. During the research period, the overall ozone levels fluctuated slightly, and their trends were not spatially continuous. There was a significant increasing trend in the BTH region of 1.37 (95% CI: 0.46, 2.29) μg/m³/year between 2013 and 2017. In 2017, 26.24% of the population lived in areas exceeding the Chinese grade II national air quality standard, which shows that ozone pollution has posed an obvious threat to population health in China. Our products will provide reliable support for future long-term nationwide health impact studies and policy-making for pollution control and prevention.

1. Introduction

Ground-level ozone has become a severe ambient pollutant in major urban areas of China, as PM2.5 levels have decreased dramatically in recent years (Ma et al. 2019). According to official reports (MEE 2018; MEP 2014), from 2013 to 2017, the average 90th percentile concentration of the O3 daily maximum 8-hour average (MDA8) of 74 key cities increased by 20.14%, from 139 μg/m³ to 167 μg/m³. In the Pearl River Delta (PRD) region, the Yangtze River Delta (YRD) region and the Beijing-Tianjin-Hebei (BTH) region, the percentages of days with O3 as the primary pollutant among the nonattainment days were 70.6%, 50.4%, and 41%, respectively, in 2017. Epidemiological studies have found that ground-level ozone is associated with adverse effects on human health. Yin et al. found that a 10-μg/m³ increase in the MDA8 ozone concentration was associated with a 0.24% higher daily mortality from all non-accidental causes based on evidence from 272 Chinese cities from 2013 to 2015 (Yin et al. 2017). The expected premature deaths due to O3 pollution varied from 28,000 to 74,000 deaths depending on the selected metric in 2015 according to another study (Feng et al. 2019). Thus, it is necessary to investigate the long-term spatiotemporal pattern in surface ozone to carry out environmental health studies.

However, it is difficult to capture the historical variation in surface ozone due to the lack of observations. Monitoring and publication for the whole country had not been carried out until a monitoring network covering major cities across China was completed at the end of 2012


⁎ Corresponding authors at: State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China.

https://doi.org/10.1016/j.envint.2020.105823
Received 1 March 2020; Received in revised form 8 May 2020; Accepted 18 May 2020
Available online 07 June 2020
0160-4120/ © 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

(Wang et al. 2017a). To address the defects of long-term ground observations, satellite remote sensing data have been utilized as an alternative tool to investigate trends in ozone pollution. However, because the instrument has a weak retrieval sensitivity to ground-level ozone (Bowman 2013), the day-level correlations between satellite-retrieved ozone and surface ozone were relatively weak, with the correlation coefficient (R) ranging from 0.3 to 0.6 for sites in southern China and from 0.1 to 0.3 for sites in northern China during the summertime from 2013 to 2017 (Shen et al. 2019). Therefore, the satellite data approach alone can hardly obtain accurate long-term ground-level ozone estimates. Besides, chemical transport models (CTMs), which take meteorological fields and atmospheric pollutant emission inventory as inputs, have been widely used to simulate atmospheric chemical processes and acquire ground-level ozone concentrations (Sharma et al. 2017). The CTMs can acquire results with high accuracy; for example, changes in surface O3 levels over central-eastern China (CEC) between July 2003 and July 2015 were obtained using the Goddard Earth Observing System Chemical Transport Model (GEOS-Chem) with a grid resolution of 0.5° × 0.625°. The model results were highly correlated with the monthly observations (coefficient of determination R² = 0.79) (Sun et al. 2019). However, the use of CTMs is often constrained by their high computational cost, relatively coarse resolution, and high uncertainty in emission inventory as well as the model assumptions (Sharma et al. 2017).

It was found that the formation and fate of ground-level ozone are associated with elevation, land-use type, atmospheric components and some meteorological factors such as temperature, wind, sunshine, humidity, and precipitation (Fu and Tai 2015; Gaudel et al., 2018; Li et al. 2019a; Shen and Mickley 2017; Tu et al. 2007; Wang et al. 2001). Therefore, statistical models can be used to capture the associations between surface ozone concentration and explanatory variables directly; as a result, the associations have the potential to predict the ozone concentration when and where monitoring points are deficient based on the temporal and spatial variations in these factors. Since the 1990s, some scholars have researched city-scale real-time forecasting methods and compared the effects of different nonlinear fitting methods on a small scale (Burrows et al. 1995; Feister and Balzer 1991; Feng et al. 2011). In recent years, some scholars have tried to build models on a larger spatial scale. Zhan et al. introduced a random forest algorithm using multi-source data including meteorological parameters, altitude, anthropogenic emission inventory, land use, normalized difference vegetation index, and road density to estimate the 2015 MDA8 ozone concentration at a resolution of 0.1° × 0.1°. The site-based cross-validation (CV) results show that R² is 0.71 and the root mean square error (RMSE) is 19 μg/m³ for the month-level analysis (Zhan et al. 2017). However, similar studies on ozone in China are still scarce, and few of them involve historical concentration estimation. To fill this gap, two scientific questions need to be answered: how to establish a stable relationship between pollutants and explanatory variables that will not change dramatically over time and how to conduct reliable validation on historical estimates.

We chose eXtreme gradient boosting (XGBoost) trees to obtain the complex nonlinear relationship between ozone concentration and explanatory variables. The XGBoost algorithm is an optimized library under the gradient boosting framework that attempts to accurately predict a target variable by combining the estimates of a set of simpler, weaker models. It is more interpretable and computationally efficient than deep learning models such as neural networks (Hu et al. 2017) and needs fewer restrictive assumptions about the samples than the traditional regression method (Hu et al. 2017; Reid et al. 2015). The XGBoost algorithm achieves superiority in both speed and performance and has won numerous data science competitions (Chen and Guestrin 2016). It has already been used in previous similar studies (Li et al. 2020; Xiao et al. 2018).

Historical estimates depend on the important assumption that the relationship between ozone concentration and explanatory variables is stable in different years. Due to the deficiency of historical observations, this hypothesis cannot be proven perfectly. In previous studies on historical PM2.5 estimates, external validation was limited either to a single year (Ma et al. 2016) or to just a few cities (Xiao et al. 2018; Xue et al. 2019) based on data availability. In our study, we applied a more integrated validation strategy that aimed to enhance the representativeness of validation.

In this work, we developed a machine learning model based on the XGBoost algorithm to estimate the long-term surface ozone across China from 2005 to 2017 at a spatial resolution of 0.1° × 0.1°. Ozone retrievals, aerosol reanalysis, meteorological observations, and land-use data were used as predictors. We used the datasets from 2013 to 2017 as the training datasets. An annual-based cross-validation technique that we call the by-year CV technique was introduced as the hyperparameter-tuning strategy. The model generalization performance was tested with observations beyond the modeling datasets from several regions during the 2005–2012 period and nationwide in 2018. Further spatiotemporal pattern analysis and exposure assessment were conducted based on long-term, high accuracy, nationwide ozone estimates derived in this study, combined with gridded population datasets.

2. Material and methods

2.1. Data preparation

2.1.1. Ground-level ozone monitoring data

Measurements of ozone were published by the China National Environmental Monitoring Center. We downloaded the hourly ozone concentration of mainland China for January 2013 to December 2018 from Qingyue Open Environment Data Center (https://data.epmap.org) and BeijingAir (http://beijingair.sinaapp.com/). The hourly monitoring data of O3 concentration in the Hong Kong Special Administrative Region and Taiwan Province from January 2005 to December 2018 were downloaded from the Hong Kong Environmental Protection Agency (http://cd.epic.epd.gov.hk/EPICDI/air/station/?Lang=zh) and the Taiwan Air Quality Publishing Platform (http://taqm.epa.gov.tw/qm/tw/YearlyDataload.aspx). There were 1713 sites in total; 1618 were in mainland China, 16 were in Hong Kong and 79 were in Taiwan, as Fig. 1 shows. Additional observations before 2013 on PRD, Xi'an city, and Mount Tai were obtained with the help of authors of several previous studies (Sun et al. 2016; Wang et al. 2012; Yang et al. 2019). The unit of monitoring data in Taiwan Province was transformed from ppb to μg/m³ with the formula 1.96 μg/m³ = 1 ppb mentioned in a previous study (Yin et al. 2017). The missing and problematic data were removed based on the data control method introduced by another study (Lu et al. 2018) before calculating the MDA8 ozone concentration.

2.1.2. The reanalysis data

We obtained the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) data during the period of 2005–2018 from the National Aeronautics and Space Administration (NASA) (https://disc.gsfc.nasa.gov/datasets?Project=MERRA-2). The MERRA-2 includes assimilations of aerosols, ozone, and meteorological fields. They are assimilated from multiple sources, such as model simulations, ground observations, and satellite observations (McCarty et al. 2016). Specifically, aerosols, which consist of dust, sea salt, black and organic carbon, and sulfate, were simulated with a radiatively coupled version of the Goddard Chemistry, Aerosol, Radiation, and Transport model based on the database of emissions (Randles et al. 2017). The reason for including them is that they can represent the effects of anthropogenic factors to some extent through participating in ozone-related chemical reactions (Li et al. 2019b) and affecting the photochemical reaction rate (Li et al. 2017). The input observations of ozone consist of stratospheric profiles from the Microwave Limb Sounder (MLS) instrument and total ozone column observations from


Fig. 1. Spatial distribution of the ozone monitoring sites in China in 2018. Note that some clustered sites overlap because of their proximity. The areas within the red lines are three megacity clusters in China: BTH, YRD, and PRD. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

the Ozone Monitoring Instrument (OMI) (McCarty et al. 2016). The horizontal spatial resolution of the data is 0.625° × 0.5° (approximately 50 km × 50 km). NASA provides instantaneous and average products every six hours, every three hours and every hour (Gelaro et al. 2017). The daily (Beijing Standard Time, UTC + 8) mean values of the above data were calculated in the original grid after downloading.

In total, 34 parameters were collected (for details, see Table S1). Not all of these parameters were included in the final model, and the selection process is described in detail in section 2.2.2.

2.1.3. Ground meteorological observations

Due to the lack of sunshine duration data in the reanalysis data, we derived meteorological ground observations as a supplement from the Resource and Environment Science Data Center of the Chinese Academy of Sciences (RESDC) (http://www.resdc.cn/data.aspx?DATAID=230). We extracted the temperature, precipitation, and sunshine duration data collected from 2554 meteorological stations from 2005 to 2018 (for the details, see Table S2; temperature and precipitation data were included for further selection). Then, outliers of these 7 parameters were removed according to the requirements of the QX/T 118–2010 Quality Control Standard for Surface Meteorological Observations (CMA 2010).

2.1.4. Other geographical data

The nationwide land-use data at a resolution of 1 km × 1 km were also derived from RESDC (http://www.resdc.cn/data.aspx?DATAID=184) to provide information on the spatial distribution of different anthropogenic activities. Six primary types of land use were contained: cultivated land, forestland, grassland, water bodies, construction land, and unused land. According to the study period, we selected four years of data, namely, 2005, 2010, 2015, and 2018.

We obtained continuous population distribution data at approximately 1 km (30″ × 30″) spatial resolution from the Oak Ridge National Laboratory (https://landscan.ornl.gov/landscan-datasets) covering the entire study period (2005–2017) (Rose et al. 2018). We extracted data on China from these global datasets.

Besides, the nationwide elevation data came from RESDC (http://www.resdc.cn/data.aspx?DATAID=284). The dataset is a 90-m data product based on the latest Shuttle Radar Topography Mission (SRTM) V4.1 data, whose raw data were collected in 2000.

2.1.5. Data integration

We constructed a 0.1° × 0.1° modeling grid across China for data integration. The ground-level O3 monitoring data were spatially joined with the modeling grid. Mean values were calculated if a grid cell had more than 1 monitoring site. MERRA-2 reanalysis data and meteorological ground observations were interpolated by inverse distance weighting (IDW) to the modeling grid as in previous studies (Xiao et al. 2018; Xue et al. 2019). The land-use data were used to calculate the area proportion of each land-use type in each grid cell. Since the years of land-use data were discrete (i.e., 2005, 2010, 2015, and 2018), the data of other years were obtained by linear interpolation. The LandScan datasets were aggregated into the gridded map.

All the data (including the ground-level ozone measurements, reanalysis data, meteorological observations, land-use data, and elevation data) were matched by their grid cell ID and the day of the year (DOY) for model development (we refer to this model as a grid-based model in this study) and validation.

Some previous studies' modeling datasets were matched by station ID and DOY (Qian et al. 2016) (we refer to this model as a station-based model in this study). We compared the performance of these two approaches in choosing the optimal setting.

2.2. Model development and validation

2.2.1. Spatiotemporal terms

The relationship between ozone concentration and covariates changes over time and space, which is often referred to as spatiotemporal heterogeneity. Previous studies have shown that distance fields can account for spatial non-stationarity and spatial autocorrelation and provide results comparable to conventional spatial models such as regression kriging and geographically weighted regression (Behrens et al. 2018; Wei et al. 2020). We constructed spatial terms to


solve this problem similarly. The spatial information was represented by the geographic distances to the four corners and the center of the rectangle surrounding our modeling grid (73.2°E-135.3°E, 18.1°N-53.8°N) using the Haversine approach, whose formula is shown in Equations 1–4. Since it was indicated that the influence of each meteorological parameter on ozone levels varied significantly across seasons and years (Chen et al. 2019), we used the order of the weeks in a year as the temporal term, as Equation (5) shows. Some researchers have argued that the hypothesis that the relationship between pollutant concentration and covariates remains unchanged on the same day in different years is unrealistic. Therefore, the prediction based on this hypothesis may lead to a large deviation (Xiao et al. 2018), so we used a weaker assumption.

$$P_S(i,j,t) = f\left(Lon_{(i,j)}, Lat_{(i,j)}\right) = \mathrm{haversine}(\alpha_2 - \alpha_1) + \cos(\alpha_1)\cos(\alpha_2)\,\mathrm{haversine}(\beta_2 - \beta_1) \tag{1}$$

$$\alpha = Lon \times \pi/180 \tag{2}$$

$$\beta = Lat \times \pi/180 \tag{3}$$

$$\mathrm{haversine}(\alpha) = \sin^2\left(\frac{\alpha}{2}\right) = [1 - \cos(\alpha)]/2 \tag{4}$$

$$P_T(i,j,t) = WOY_t \tag{5}$$

where P_S(i,j,t) represents the distance from pixel (i,j) in the gridded map to a corner or the center on date t, Lon(i,j) represents the longitude of pixel (i,j), Lat(i,j) represents the latitude of pixel (i,j), and WOY represents the week of the year.

2.2.2. Feature selection

Although machine learning methods do not make too many assumptions about the correlation between variables, the correlation between variables will still have an impact from the perspective of model interpretability and model overfitting. When variables are correlated, different variables can replace each other (Reid et al. 2015), resulting in an unstable model pattern. Taking too many variables into consideration may also lead to overfitting. Therefore, it is necessary to control the number of variables within a reasonable range. We selected our features in two stages. The first stage was based on the Pearson correlation coefficient and its p-value for each pair of variables. Among highly correlated variables, we dropped those with a smaller correlation coefficient to ozone concentration (for details, see Figure S1). Then, we excluded less important variables according to the feature importance scores of all the selected variables in pre-trained models (for details, see Figure S2).

The final features are as follows, and the abbreviations are in parentheses: daily maximum temperature (temp_max), sunshine duration (SSD), total precipitable liquid water (TQL), zero-plane displacement height (DISPH), 10-meter northward wind (V10M), 10-meter eastward wind (U10M), 2-meter specific humidity (QV2M), total column ozone (TO3), ozone mass mixing ratio (o3d), carbon monoxide (CO), area proportion of cultivated land (crop_ratio), forestland (forest_ratio), grassland (grass_ratio), waters (water_ratio) and construction land (constr_ratio) in each grid cell, altitude, black carbon surface mass concentration (BCSMASS), SO4 surface mass concentration (SO4SMASS), dust surface mass concentration of PM2.5 (DUSMASS25), geographic distances to the corners (C1H, C2H) and center (CCH) of a rectangle around the modeling grid, and the week of the year (week).

2.2.3. XGBoost modeling

We tuned hyperparameters to acquire the optimal form of our model. The range of four hyperparameters, namely, the depth of the tree (max_depth), the sample weight of the smallest leaf node (min_child_weight), the proportion of random sampling for features (colsample), and the subsample from random sampling for each tree, is specified as the search space (Schratz 2018). Referring to a previous study (Xiao et al. 2018), the range of values of the above four hyperparameters was set to 4–10, 3–8, 0.5–0.9, 0.5–0.9, and the fractional values are available for the last three hyperparameters. To save time and improve the performance, the model-based optimization (MBO, also known as Bayesian optimization) algorithm (Bischl et al.) was adopted as the search strategy, and the maximum estimated amount of optimization was set to 100. To avoid overfitting, only hyperparameter sets with "gamma" larger than 0.1 were accepted in this model; gamma describes the minimum loss reduction required to make a further partition on a leaf node of the tree.

2.2.4. Cross-validation

To fulfill our goal of acquiring a reliable historical estimator, we evaluated the performance of all the models with the same target-oriented strategy called the by-year CV as in previous studies on historical PM2.5 estimates (He et al. 2020; Xue et al. 2019), which is illustrated in Figure S3. We separated our datasets from 2013 to 2017 into 5 groups by calendar year. In each iteration, the samples in 4 groups were used as training data, while the remaining samples were held out as the "historical" year for validation. The above modeling and validation process was repeated 5 times until each year's data had been validated. With such a cross-validation technique, we can select the model that best reflects the commonality of the associations between surface ozone levels and explanatory variables in different years, thereby improving the extrapolation capacity of the model.

However, the model's extrapolation performance may not remain constant outside the spatial domain of modeling, so we derived a site-based CV scheme (also referred to as a "spatial CV" scheme in a previous study (Xiao et al. 2018)) to detect potential spatial overfitting. Many studies used a sample-based CV scheme as the standard CV approach, so we also derived the sample-based CV method for comparison. These three cross-validation strategies differ mainly in the resampling method. In a site-based CV scheme, the data samples are separated by their locations, while a sample-based CV scheme divides the datasets randomly. Both the site-based and sample-based CV schemes are 10-fold in our study.

Statistical indicators, such as R² and RMSE, were calculated and compared among models at the daily, monthly, and seasonal scales (in this study, spring is March, April, and May, summer is June, July, and August, autumn is September, October, and November, and winter is December, January, and February). We also calculated the correlation coefficient (R) for comparison with CTMs.

2.2.5. Model testing using data beyond the years of the final model

We established an integrated external validation method in this study to test the extrapolation capability of the models independently, which is composed of two parts: a test of the long time series and a test covering the entire study area. Although historical (2005–2012) MDA8 ozone concentrations for mainland China were scarce, routine monitoring and publication were carried out in Hong Kong, Taiwan, and PRD (Yang et al. 2019), and there were some field measurements in Xi'an city (Wang et al. 2012) and Mount Tai (Sun et al. 2016). Therefore, we estimated the MDA8 ozone concentrations throughout the study period in these regions using the model established for 2013–2017 (the final model) and compared them with the in situ measurements. Although these regions are located in different parts of China, the testing results only covered a relatively small area, which may not represent the situation across the country. Therefore, we implemented another nationwide test with the measurements from 2018. Based on the results, we explored how model performance changes spatially. The testing results were also calculated at daily, monthly, and seasonal scales.
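
As a rough illustration of the spatiotemporal terms in Section 2.2.1, the following Python sketch computes great-circle distances from a grid cell to the four corners and the center of the bounding rectangle, plus a week-of-year term. It is not the authors' code: the Earth radius, the function and dictionary key names, the use of the ISO week number, and the conversion from the haversine value to kilometres are all assumptions made here for illustration. The sketch also uses the conventional haversine form, with latitudes in the cosine terms.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not stated in the paper

# Bounding rectangle of the modeling grid, as given in Section 2.2.1.
LON_MIN, LON_MAX = 73.2, 135.3
LAT_MIN, LAT_MAX = 18.1, 53.8

# Hypothetical names for the five reference points (four corners and the center).
REFERENCE_POINTS = {
    "corner_sw": (LON_MIN, LAT_MIN),
    "corner_se": (LON_MAX, LAT_MIN),
    "corner_nw": (LON_MIN, LAT_MAX),
    "corner_ne": (LON_MAX, LAT_MAX),
    "center": ((LON_MIN + LON_MAX) / 2.0, (LAT_MIN + LAT_MAX) / 2.0),
}


def hav(theta):
    """Haversine of an angle in radians: sin^2(theta / 2)."""
    return math.sin(theta / 2.0) ** 2


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = hav(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * hav(lam2 - lam1)
    # Clamp to 1.0 to guard against rounding before taking the arcsine.
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def spatiotemporal_terms(lon, lat, day):
    """Return distance terms (km) and a week-of-year term for one grid cell."""
    terms = {
        name: haversine_km(lon, lat, ref_lon, ref_lat)
        for name, (ref_lon, ref_lat) in REFERENCE_POINTS.items()
    }
    # ISO week number is an assumption; the paper only says "week of the year".
    terms["week"] = day.isocalendar()[1]
    return terms
```

For a grid cell near Beijing, `spatiotemporal_terms(116.4, 39.9, date(2017, 7, 1))` returns five distances in kilometres and a `week` value of 26.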

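
The by-year CV of Section 2.2.4 amounts to leave-one-year-out resampling. The sketch below is a minimal, model-agnostic Python illustration, not the authors' implementation: `fit` and `predict` are hypothetical callables standing in for whatever learner is used (for instance, a wrapper around an XGBoost regressor), and R² is computed here as one minus the ratio of the residual to the total sum of squares, which is an assumption because the paper does not spell out its formula.

```python
import math
from collections import defaultdict


def by_year_splits(years):
    """Yield (held_out_year, train_idx, test_idx), holding out one year at a time."""
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = sorted(
            i for year, idxs in by_year.items() if year != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def by_year_cv(X, y, years, fit, predict):
    """Pool the held-out predictions of every fold and score them together."""
    obs, pred = [], []
    for _, train_idx, test_idx in by_year_splits(years):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        pred.extend(predict(model, [X[i] for i in test_idx]))
        obs.extend(y[i] for i in test_idx)
    return r_squared(obs, pred), rmse(obs, pred)
```

With samples labelled 2013 to 2017, `by_year_splits` produces five folds, each training on four years and validating on the fifth. Grouping by site ID instead of by year would give a splitter in the spirit of the site-based scheme, although the paper uses 10 folds there rather than one fold per site.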

2.3. Spatiotemporal trend analysis

2.3.1. Spatial pattern recognition

We hindcasted the historical MDA8 ozone concentration with the final estimator. The seasonal and annual metrics of ozone pollution in the study period were derived according to the method described in a previous study (Lu et al. 2018), including exceedance160, exceedance100, the annual mean of the MDA8 ozone concentration, and the 90th percentile of the MDA8 ozone concentration (the shorthand is "Perc 90") in a year or a season. Exceedance160 was defined as the number of days when MDA8 was greater than 160 μg/m³ (the Chinese grade II national air quality standard (MEP and AQSIQ 2012)), and exceedance100 was defined as the number of days when MDA8 was greater than 100 μg/m³ (the standard recommended by the WHO (WHO, 2006)). Maps were created based on these metrics.

2.3.2. Temporal trend analysis

We calculated the annual and warm-season average MDA8 ozone anomalies from 2005 to 2017 as a representation of the overall temporal variations. The warm season is April to September, which often witnesses severe ozone pollution and has received special attention in previous studies (Ding et al. 2019; Lu et al. 2018).

We also analyzed the trends in the ozone concentration during the whole period based on the monthly average to avoid the relatively high uncertainty in the daily estimates. The ozone anomalies were derived by subtracting the long-term averages in the same month of different years from the monthly values to remove the seasonal effect. We then calculated the linear trends for each grid cell and region (all of China and the three megacity clusters) with the least-squares approach as in previous studies (Li et al. 2019a). Some previous studies have shown great concern about the changes in ozone pollution since the implementation of the Air Pollution Prevention and Control Action Plan (APPCAP) in 2013 (Li et al. 2019a; Li et al. 2019b). Therefore, this study calculates the trends in the ozone concentration for 2005–2017, 2005–2012, and 2013–2017.

2.3.3. Exposure assessment

Combining the annual mean MDA8 ozone concentrations with concurrent LandScan population distribution datasets, we calculated the number of people exposed to different concentrations, and the intervals were determined according to the standards recommended by the WHO (WHO, 2006) and the National Ambient Air Quality Standard of China (MEP and AQSIQ 2012).

Then, we calculated the annual and warm-season population-weighted average MDA8 ozone anomalies from 2005 to 2017. The detailed formulation of the population-weighted O3 level is summarized as follows:

$$C_{PW} = \sum (P_i \times C_i) / \sum P_i \tag{6}$$

between the by-year CV results and site-based CV results indicates that there is no obvious spatial overfitting in this model. The model performance at the month and season levels was better than that at the day level, indicating that averaging over time reduces the errors. Compared to the final day-level model, the R² for the by-year CV results and the sample-based CV results of the station-based model fell slightly to 0.60 and 0.75, respectively, while that for the site-based CV results rose to 0.66. Detailed CV results for the model based on stations are given in the Supplemental Information (Figure S4).

Overall, the model based on grids achieved a slightly better performance than the model based on stations. The latter performed better in the results of the site-based CV scheme partly because the spatial distances between the samples were shorter; as a result, these samples were more similar according to the first law of geography.

We drew a time series diagram for testing the models' ability to reproduce ozone trends in the 2013–2017 period. As shown in Figure S5, the monthly mean MDA8 ozone estimates matched well with the ground-observed values in all three megacity clusters, which shows that there are no obvious anomalies in the performance of our model over time.

3.2. External testing for historical estimates

Table 1 presents the results of grid-based model testing for the historical estimates. By comparing the testing results of different years, it is found that the prediction accuracy is only attenuated in small increments with the increase in the time interval from the years of the training datasets (the day-level R² values range from 0.52 to 0.74, and the day-level RMSEs range from 23.78 to 31.72 μg/m³), indicating that the model established in this study is relatively stationary when used for extrapolation on a long time scale. Similar to the CV results, the model performance at the month and season levels was also better. The R² values at the month level ranged from 0.60 to 0.87, and the RMSEs ranged from 12.94 to 18.41 μg/m³, showing the relatively high accuracy of the estimates. Among all the regions, the R² values at the month level were larger than 0.81 except in PRD and Hong Kong from 2005 to 2012 (see details in Table S3).

We drew a time series diagram for testing the models' ability to reproduce ozone trends in the 2005–2017 period in Hong Kong and Taiwan (for details, see the Supplementary Information, Figure S6). The time series of the estimated and measured monthly mean MDA8 ozone concentrations are quite similar, yet with noticeable overestimation in Hong Kong from 2005 to 2008. Thus, the performance of the model is generally stable in the time series, while the uncertainty is obvious in certain regions and certain years.

The model performance in 2018 was better than that of other historical estimates at the day level, reflecting the applicability of the model on a national scale. Figure S7 illustrates the spatial distribution of the interpolated R² and RMSE values between observed and esti-
mated ozone concentrations in 2018. Most of the areas have R2 values
where CPW is the population-weighted MDA8 ozone concentration for a greater than 0.5, and the areas with lower R2 values were not densely
certain region, Pi denotes the population in grid cell i, and Ci denotes the populated, which showed that the uncertainty might not have serious
estimated MDA8 level in the same grid cell. negative effects on the population exposure assessments. The best
All the analyses were implemented in the R environment (Core performance is achieved in North China, which is even better than that
2015). The R packages mlr and xgboost (GPU version) were used to in Taiwan, where the model has been proven to be reliable throughout
train the XGBoost model. the study period.

3. Results 3.3. Spatial patterns

3.1. Results of model cross-validation Fig. 3 demonstrates the spatial patterns in ozone pollution from
2005 to 2017. According to Fig. 3 (a), the average number of days
Fig. 2 shows the cross-validation results of the final model (the grid- exceeding the standard of 160 μg/m3 is 3.7 days, and the standard
based model). At the daily level, the R2 values for the by-year CV, site- deviation is 8.6 days, with an average exceedance160 not larger than
based CV, and sample-based CV results are 0.61, 0.64, and 0.78, re- 7 days per year in most areas of China. The areas with a large amount of
spectively, and the corresponding RMSEs are 28.34 μg/m3, 27.27 μg/ exceedance were mainly concentrated in the BTH, YRD, PRD, the
m3 and 21.47 μg/m3. The minor difference in the statistical metrics Jianghan Plain (JHP), the Sichuan Basin (SCB), and the Northeast Plain

R. Liu, et al. Environment International 142 (2020) 105823

Fig. 2. Density scatterplots of the cross-validation results for the final estimator. From top to bottom, the three rows are the results at the day, month, and season
levels. From left to right, the three columns were derived from the by-year, site-based, and sample-based CV schemes.
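
The day-level and month-level metrics reported for the cross-validation can be illustrated with a minimal sketch. It is written in Python rather than the R environment used in the study, and it assumes that R2 is the squared Pearson correlation between observed and estimated values (this section does not state the exact definition). The record layout, the function names and the toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def rmse(obs, pred):
    """Root-mean-square error between observed and estimated values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation (an assumed definition of R2)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def monthly_means(records):
    """Average daily (site, 'YYYY-MM-DD', obs, pred) records by site and month."""
    groups = defaultdict(list)
    for site, date, obs, pred in records:
        groups[(site, date[:7])].append((obs, pred))
    obs_m, pred_m = [], []
    for key in sorted(groups):
        pairs = groups[key]
        obs_m.append(sum(o for o, _ in pairs) / len(pairs))
        pred_m.append(sum(p for _, p in pairs) / len(pairs))
    return obs_m, pred_m


# Hypothetical daily MDA8 values (ug/m3) at two sites; errors are +/-10 each day.
records = [
    ("A", "2015-06-01", 100.0, 110.0),
    ("A", "2015-06-02", 140.0, 130.0),
    ("A", "2015-07-01", 160.0, 150.0),
    ("A", "2015-07-02", 180.0, 190.0),
    ("B", "2015-06-01", 60.0, 70.0),
    ("B", "2015-06-02", 80.0, 70.0),
]
daily_obs = [r[2] for r in records]
daily_pred = [r[3] for r in records]
daily_rmse = rmse(daily_obs, daily_pred)

monthly_obs, monthly_pred = monthly_means(records)
monthly_rmse = rmse(monthly_obs, monthly_pred)
```

In this toy case the daily errors cancel within each month, so the monthly RMSE is lower than the daily RMSE. This is the same mechanism by which temporal averaging can improve the month-level and season-level statistics.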

Table 1
The model testing results of the final estimator at the day, month, and season levels.

Year   Sample size   MDA8 (daily)     Monthly average   Seasonal average
                     R2     RMSE      R2     RMSE       R2     RMSE
2005    24,046       0.60   26.16     0.82   13.13      0.86   11.88
2006    24,498       0.57   26.76     0.74   14.42      0.77   13.06
2007    29,070       0.55   31.72     0.69   18.10      0.70   16.46
2008    29,509       0.56   30.04     0.71   16.36      0.72   14.74
2009    24,589       0.62   24.80     0.80   12.94      0.86    9.92
2010    29,080       0.52   30.10     0.60   16.97      0.60   15.40
2011    29,135       0.57   30.94     0.67   18.41      0.67   16.48
2012    28,758       0.62   27.63     0.74   15.18      0.74   13.59
2018   368,176       0.74   23.78     0.87   14.31      0.87   12.62

Note. The unit for the RMSE is μg/m3.

(NEP) regions, indicating that these areas suffered the most severe ozone pollution in China. Northwest and South China had relatively fewer exceedance days than other regions, but the average 90th percentile concentrations were relatively high, as shown in Fig. 3 (b). All the hotspot areas in our results, except Northwest China, were densely populated and had many anthropogenic emission sources.
When using the standard recommended by the WHO, as shown in Fig. 3 (c), more than half of the country had over 100 nonattainment days each year, which indicates that the overall level of ozone pollution across the country is relatively high. Fig. 3 (d) shows a similar pattern.
In terms of the seasonal differences, Fig. 4 shows that the area with high pollution was the smallest in winter and the largest in summer. Pollution in winter was mainly concentrated in the coastal areas of southern China and some megacities. In spring, the pollution was mainly concentrated in the BTH, YRD, JHP, and NEP regions and the southern part of Yunnan Province. The air quality in northern Northeast China and most of southern China except the PRD region was better in summer. The pollution in southern China was more serious in autumn, which might be related to Asian summer monsoon, tropical cyclones and the land-sea breeze according to previous studies (Chen et al. 2020; Li et al. 2014; Maji et al. 2019).
Figure S8 and S9 show the spatial distribution of the annual mean MDA8 ozone concentrations and the warm-season mean MDA8 ozone concentrations from 2013 to 2017 in detail. Our model reproduced the original spatial patterns of the monitoring data well, and in general, these patterns were stable over time.

3.4. Temporal trends

Fig. 5 shows that both the annual and warm-season average MDA8 anomalies fluctuated within a relatively small range (2.9 μg/m3 for the former and 4.2 μg/m3 for the latter) between 2005 and 2017. The fluctuations in the annual average are smaller than those of the warm season, indicating that there may be differences in the concentration changes between different seasons of the same year. From the results of the time series analysis of regional averages in Figure S10, most megacity clusters, except for the BTH region, show a small range of fluctuations, without any obvious trend. The results of the least-squares method in Table S4 show that the ozone concentration in the BTH region increased at a rate of 1.37 (95% CI: 0.46, 2.29) μg/m3/year between 2013 and 2017, indicating that ozone pollution was intensified in the BTH region. These findings also show that there are distinctions in the concentration changes between different regions.


Fig. 3. Spatial distribution of (a) thirteen-year (2005–2017) average exceedance160, (b) thirteen-year average 90th percentile of the MDA8 ozone concentration, (c) thirteen-year average exceedance100 and (d) thirteen-year average of the annual mean MDA8 ozone concentrations. The means and standard deviations over the grids are shown in the insets.
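
The per-grid metrics mapped in Fig. 3 can be sketched for a single grid cell as follows. This is an illustrative Python example, not the authors' R code. It assumes that an exceedance day is one with MDA8 strictly above the threshold and that the 90th percentile is taken with linear interpolation between order statistics; the daily values are hypothetical.

```python
def exceedance_days(mda8, threshold):
    """Count days whose MDA8 value is strictly above the threshold (assumption)."""
    return sum(1 for value in mda8 if value > threshold)


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# Hypothetical daily MDA8 values (ug/m3) for one grid cell.
mda8 = [40.0, 60.0, 80.0, 90.0, 95.0, 105.0, 120.0, 150.0, 165.0, 170.0, 200.0]

exceedance160 = exceedance_days(mda8, 160.0)  # days above the Chinese grade II limit
exceedance100 = exceedance_days(mda8, 100.0)  # days above the WHO guideline
perc90 = percentile(mda8, 90.0)               # 90th percentile of MDA8
```

Applying the same functions to each year and each grid cell, then averaging over years, would give maps of the kind summarized in the figure.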

Fig. 6 illustrates the spatial distribution of the monthly mean ozone anomaly trends for 2005–2017 and 2013–2017 at the grid scale. During the entire study period, ozone pollution in most parts of China showed only a slight trend. North China and Southwest China showed an upward trend, and most parts of South China showed a downward trend. However, both trends seldom exceeded ± 1 μg/m3/year. From 2013 to 2017, the concentration of ozone in North China and Southwest China showed a rather rapid increase of up to over 3 μg/m3/year, while some regions in South China showed a more obvious downward trend.
It is worth mentioning that the characteristics of the variation in the ozone concentration are not spatially continuous, even in a small spatial scope. The trends in the grids are more obvious than those in the regions such as the three megacity clusters (see Supplemental

Fig. 4. Thirteen-year (2005–2017) mean of the 90th percentile of the MDA8 ozone concentration in each season.


Fig. 5. (a) Annual and (b) warm-season average MDA8 ozone anomalies from 2005 to 2017.

Fig. 6. Spatial distribution of the monthly mean ozone anomaly trends for 2005–2017 (left) and 2013–2017 (right) at the grid scale. The white areas in these two
figures indicate the significance level p ≥ 0.05.
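
The anomaly trends mapped in Fig. 6 follow the two steps described in Section 2.3.2: remove the long-term mean of each calendar month, then fit a least-squares line to the anomalies. A minimal Python sketch for one grid cell is given below. It is an illustration under stated assumptions rather than the study's R implementation: the synthetic series, the mid-month time axis and the function names are hypothetical, and the confidence interval and significance test reported in the paper are omitted.

```python
def monthly_anomalies(series):
    """series maps (year, month) to a monthly mean; returns [(time_in_years, anomaly)]."""
    by_month = {}
    for (_, month), value in series.items():
        by_month.setdefault(month, []).append(value)
    # Long-term average of each calendar month across all years.
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    points = []
    for year, month in sorted(series):
        t = year + (month - 0.5) / 12.0  # mid-month time axis (assumption)
        points.append((t, series[(year, month)] - climatology[month]))
    return points


def least_squares_slope(points):
    """Ordinary least-squares slope of y on t."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    return sxy / sxx


# Synthetic monthly series for 2013-2017: seasonal cycle plus 1.5 ug/m3 per year.
seasonal = [0, 5, 15, 30, 45, 55, 60, 55, 40, 25, 10, 0]
series = {
    (y, m): 60.0 + seasonal[m - 1] + 1.5 * (y - 2013 + (m - 0.5) / 12.0)
    for y in range(2013, 2018)
    for m in range(1, 13)
}

anomalies = monthly_anomalies(series)
trend = least_squares_slope(anomalies)  # ug/m3 per year
```

On this synthetic series the fitted slope comes out close to, but slightly below, the imposed 1.5 μg/m3/year, because subtracting the monthly climatology also removes the within-year part of the trend.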

Information, Table S4). For example, in the YRD region, many grids exhibited significantly upward trends between 2013 and 2017, but the trend in the whole region was not statistically significant.

3.5. Population exposure

Fig. 7 shows the total number of people exposed to specific ranges of ozone concentration by year. The number of people living where Perc90 is greater than 160 μg/m3 has changed dramatically in the past 5 years, with the highest ratio being 26.24% in 2017, the lowest ratio being 8.54% in 2013, and the average being 12.10%. The changes in the number of people living in areas with an ozone concentration above 100 μg/m3 were smaller, ranging from 92.72% in 2014 to 95.56% in 2006, with an average of 94.43%. This finding shows that ozone pollution has posed a severe threat to population health in China.
Fig. 8 shows that both annual and warm-season population-weighted average MDA8 anomalies fluctuated within relatively small ranges (5.0 μg/m3 for the former and 9.2 μg/m3 for the latter) from 2005 through 2017. The change in the anomalies in this figure is quite different from that of Fig. 5, which reflects the uncertainty in health loss caused by changes in the population distribution and ozone concentration distribution. The annual mean values of the population-weighted average concentration in all the years are higher than those of the annual mean MDA8 ozone concentrations, showing that the ozone concentration in the densely populated regions was higher than that in the sparsely populated regions, which is consistent with findings in a


Fig. 7. Time series of populations exposed to ozone pollution (the 90th percentile of the MDA8 ozone concentration) at different concentrations for the period of
2005–2017.
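
The exposure shares behind Fig. 7 amount to summing the population of the grid cells whose concentration exceeds a threshold and dividing by the total population. The Python sketch below illustrates this for three hypothetical grid cells; the population counts, the concentrations and the function name are assumptions for illustration and are not taken from the LandScan data or the model output.

```python
def exposed_share(population, concentration, threshold):
    """Percentage of the population living in cells above the threshold."""
    total = sum(population)
    exposed = sum(p for p, c in zip(population, concentration) if c > threshold)
    return 100.0 * exposed / total


# Hypothetical grid cells: population counts and 90th-percentile MDA8 (ug/m3).
population = [600, 300, 100]
perc90 = [170.0, 150.0, 90.0]

share_above_160 = exposed_share(population, perc90, 160.0)
share_above_100 = exposed_share(population, perc90, 100.0)
```

Repeating the calculation for each year with that year's population grid would produce a time series of the kind shown in the figure.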

Fig. 8. (a) Annual and (b) warm-season population-weighted average MDA8 ozone anomalies from 2005 to 2017.
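
The population-weighted average in Eq. (6) can be written out directly. The short Python sketch below uses three hypothetical grid cells (the values and names are illustrative only) and compares the weighted mean with the unweighted mean over cells.

```python
def population_weighted_mean(population, concentration):
    """Eq. (6): sum(P_i * C_i) / sum(P_i) over grid cells i."""
    weighted_sum = sum(p * c for p, c in zip(population, concentration))
    return weighted_sum / sum(population)


# Hypothetical grid cells: the most populated cell also has the highest ozone level.
population = [600, 300, 100]
mda8 = [170.0, 150.0, 90.0]

c_pw = population_weighted_mean(population, mda8)
c_unweighted = sum(mda8) / len(mda8)
```

When concentrations are higher in densely populated cells, as in this toy case, the population-weighted mean exceeds the unweighted mean, which mirrors the relationship reported in Section 3.5.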


previous study (Wang et al. 2017b).
The warm-season population-weighted average MDA8 ozone concentration rose continuously between 2013 and 2017, indicating that the overall environmental health risks are increasing in warm seasons.

4. Discussion

In the Introduction, we raised two scientific questions on the issue of long-term ozone concentration estimation for the period for which the monitored ozone concentration is not available. To establish a stable relationship between pollutants and explanatory variables, this study took multiple parameters that are tightly associated with ozone formation and fate into consideration. The spatial and temporal terms were constructed to represent the effects of the spatiotemporal dimensions. Feature selection was performed to reduce overfitting and improve the generalization performance of the model. The XGBoost algorithm was combined with the target-oriented CV methods to explore the stable relationship between ozone concentration and covariates in the available multiyear data. A comprehensive external validation method composed of testing with regional multiyear data and testing with nationwide single-year data was introduced in this study to test the extrapolation capability of the models. The effects of different model settings on model performance were compared to select the final estimator.
We have demonstrated that our approach can achieve a high prediction accuracy when compared with historical in-situ measurements. Our model also has similar results to previous observations from publications in terms of monthly, seasonal or annual metrics, especially in low altitude areas (see details in Table S6). Besides, the result is comparable to other studies based on machine learning approaches, even across such a broad geographic area. Our model yielded a similar CV R2 and model testing R2 to a previous regional study conducted on Hainan Island from 2015 to 2017 with the XGBoost method (CV R2 = 0.59, RMSE = 24.14 μg/m3, model testing R2 = 0.54, RMSE = 25.96 μg/m3) (Li et al. 2020), even though our model was constructed on a national scale instead of a regional scale. When compared with another nationwide study based on the random forest algorithm conducted in 2015 (site-based daily CV R2 = 0.69, RMSE = 26 μg/m3, monthly CV R2 = 0.71, RMSE = 19 μg/m3) (Zhan et al. 2017), our model also showed comparable performance. Our model also outperforms many CTMs. Liu et al. simulated 2015 ozone in mainland China with Weather Research and Forecasting (WRF)-Community Multiscale Air Quality (CMAQ) models, with correlation coefficients (R) greater than 0.60 except in January (Liu et al. 2018). Lin et al. used WRF-CMAQ to derive 2014 ozone in China, with R greater than 0.5 in most areas (Lin et al. 2018). By comparison, the R value for the by-year CV results in our study is 0.78 (the square root of the by-year CV R2 of 0.61), which is much higher than those of the aforementioned studies.
To the best of our knowledge, this is one of the first studies to use a spatiotemporal machine learning model to reconstruct daily and monthly surface ozone concentrations at a high spatial resolution of 0.1° for the long-term period of 2005 to 2017, which can provide basic data for future relevant environmental health studies and policy-making on pollution control and prevention.
The feature importance scores show that the explanatory variables we collected and constructed based on knowledge from the literature did play a role in the performance (for details, see Figure S11). The orders of different variables were quite similar for both estimators, which showed a stable pattern in the impact these variables had on MDA8 ozone concentration estimation. The daily maximum temperature, total precipitable liquid water, and sunshine duration were the top 3 meteorological factors, which is quite similar to a previous study (Zhan et al. 2017). The rankings of the temporal terms, spatial terms C1H and SO4 surface mass concentration are 4, 7, and 3, respectively, showing the importance of spatiotemporal dimensions and aerosols in the model.
Another highlight of this study lies in the long-term spatiotemporal distribution of ozone across China, which revealed more detailed information on changes in ozone pollution. The pollution hotspots identified in our study, most of which were densely populated, were similar to those derived from CTMs (Liu et al. 2018) and machine learning models (Zhan et al. 2017). Spatially discontinuous patterns in the ozone concentration and trends may be partially explained by the complex chemical mechanism of ozone and local emissions. For example, the NOx titration effect caused by relatively higher NOx emissions in urban areas could consume O3, and the extinction effect of particulate matter with higher concentrations in urban areas reduces the generation of O3, as discussed in another study (Yue et al. 2017). As a result, there were non-negligible differences in the ozone concentration in a small spatial scope. The spatially discontinuous patterns in the ozone trends derived from our results are similar to those of other studies, as the OMI record from 2005 to 2017 showed a slight increase in East China and an increasing frequency of pollution episodes, particularly in the north (Shen et al. 2019). Another study (Li et al. 2017) found that severe O3 pollution has been looming in eastern China since the implementation of APPCAP, a finding that is also in agreement with our results. These findings may be partly explained by O3 sensitivity and changes in precursor emissions. Satellite retrievals revealed that O3 sensitivity varies dynamically depending on both time and location (Fu et al. 2012); for example, megacity clusters were volatile organic compound (VOC) limited in both warm and cold seasons, while other parts in Northeast Asia were VOC limited in January and NOx limited in July, and other parts in Southeast Asia were NOx limited in both January and July (Fu et al. 2012; Jin and Holloway 2015). Overall anthropogenic NOx emissions in China are estimated to have decreased by 21% during 2013–2017, whereas VOC emissions changed little (Li et al. 2019c). Decreasing NOx concentrations would increase the ozone concentration under the VOC-limited conditions thought to prevail in Northeast Asia in cold seasons and urban China, while decreasing the ozone concentrations under NOx-limited conditions (Li et al. 2019a), so that Southeast China witnessed a decreasing trend during 2013–2017.
The severe threat that ozone pollution has posed to population health in China found in this study is also in agreement with other studies. Since previous studies used other metrics for non-attainment, we recalculated the same metrics as those studies for comparison. Zhan et al. estimated that in 2015 about 58% of the population lived in areas with more than 100 nonattainment days when using the WHO standard (100 μg/m3), while the number estimated in our study was 66%. In terms of the population exposed to 160 μg/m3 or above ozone pollution for more than 30 days, the number estimated by Zhan et al. was 12%, while our number was 20%. Considering the uncertainty in the population distribution data used, this result is acceptable.
The major limitation of our study is the prediction uncertainty for the locations where monitoring data are not available. For example, the increasing trend in the southwest of China is of high uncertainty. The monitoring sites are sparsely distributed over a vast area, which means that there may not be enough data samples to capture the accurate association between ozone concentration and predictors there. Although we used the site-based CV scheme to test the models’ spatial extrapolation ability, the problem remains, as models can only learn from the given data. We assume that the relationship between the ozone concentration and other variables is still valid in areas where monitoring stations are not available. However, due to the constraints of the data sample distribution, we cannot prove this hypothesis perfectly. With further improvements in the distribution of monitoring stations in the future, the model can be further optimized.
Another limitation originated from the spatiotemporal terms used in this research; they were relatively simple, without considering the effects of the spatiotemporal adjacency relationship, which may limit further improvements in model performance.


5. Conclusions

In this work, we developed a machine learning model based on the XGBoost algorithm by using ground monitoring data combined with ozone retrievals, aerosol reanalysis, meteorological parameters, and land-use data. The model achieves high prediction accuracy compared with other studies, even across such a broad geographic area, with R2 values for the by-year, site-based and sample-based CV results up to 0.61, 0.64, and 0.78, respectively, at the day level. External validation with regional multiyear data and nationwide single-year data has shown the model to be robust and reliable for historical data prediction, with external model testing R2 values ranging from 0.60 to 0.87 at the month level in different years.
Using the final estimator, we obtained the nationwide ozone estimates from 2005 to 2017. The BTH, YRD, PRD, JHP, SCB, and NEP regions were identified as pollution hotspots according to the average number of days exceeding the standard and the average 90th percentile of the MDA8 ozone concentration. During the research period, the overall ozone levels fluctuated slightly, and their trends were not spatially continuous. There was a significant increasing trend in the ozone concentration in the BTH region of 1.37 (95% CI: 0.46, 2.29) μg/m3/year between 2013 and 2017. In 2017, 26.24% of the population lived in areas exceeding the Chinese grade II national air quality standard, which shows that ozone pollution has posed an obvious threat to population health in China. Our products will provide basic data for future relevant environmental health studies and policy-making for pollution control and prevention.

CRediT authorship contribution statement

Riyang Liu: Methodology, Software, Validation, Formal analysis, Data curation, Writing - original draft, Visualization. Zongwei Ma: Conceptualization, Methodology, Writing - review & editing, Funding acquisition. Yang Liu: Writing - review & editing. Yanchuan Shao: Investigation. Wei Zhao: Investigation. Jun Bi: Conceptualization, Resources, Supervision, Project administration, Funding acquisition.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41601546, 71921003, & 91644220). Thanks to Qingyue Open Environmental Data Center (https://data.epmap.org) for support on Environmental data processing.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.envint.2020.105823.

References

Behrens, T., Schmidt, K., Viscarra Rossel, R., Gries, P., Scholten, T., MacMillan, R., 2018. Spatial modelling with Euclidean distance fields and machine learning. Eur. J. Soil Sci. 69, 757–770.
Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M. mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions.
Bowman, K.W., 2013. Toward the next generation of air quality monitoring: Ozone. Atmos. Environ. 80, 571–583.
Burrows, W.R., Benjamin, M., Beauchamp, S., Lord, E.R., McCollor, D., Thomson, B., 1995. Cart decision-tree statistical-analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic Regions of Canada. J. Appl. Meteorol. 34, 1848–1862.
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM.
Chen, X., Zhong, B.Q., Huang, F.X., Wang, X.M., Sarkar, S., Jia, S.G., Deng, X.J., Chen, D.H., Shao, M., 2020. The role of natural factors in constraining long-term tropospheric ozone trends over Southern China. Atmos. Environ. 220.
Chen, Z.Y., Zhuang, Y., Xie, X.M., Chen, D.L., Cheng, N.L., Yang, L., Li, R.Y., 2019. Understanding long-term variations of meteorological influences on ground ozone concentrations in Beijing During 2006–2016. Environ. Pollut. 245, 29–37.
CMA, 2010. Quality Control Standard for Surface Meteorological Observations. Beijing: China Meteorological Administration.
Core, R., 2015. A language and environment for statistical computing 1, 12–21.
Ding, D., Xing, J., Wang, S., Chang, X., Hao, J., 2019. Impacts of emissions and meteorological changes on China’s ozone pollution in the warm seasons of 2013 and 2017. Frontiers of Environmental Science & Engineering 13, 76.
Feister, U., Balzer, K., 1991. Surface ozone and meteorological predictors on a subregional scale. Atmospheric Environment Part a-General Topics 25, 1781–1790.
Feng, Y., Zhang, W., Sun, D., Zhang, L., 2011. Ozone concentration forecast method based on genetic algorithm optimized back propagation neural networks and support vector machine data classification. Atmos. Environ. 45, 1979–1985.
Feng, Z.Z., De Marco, A., Anav, A., Gualtieri, M., Sicard, P., Tian, H.Q., Fornasier, F., Tao, F.L., Guo, A.H., Paoletti, E., 2019. Economic losses due to ozone impacts on human health, forest productivity and crop yield across China. Environ. Int. 131.
Fu, J.S., Dong, X., Gao, Y., Wong, D.C., Lam, Y.F., 2012. Sensitivity and linearity analysis of ozone in East Asia: The effects of domestic emission and intercontinental transport. J. Air Waste Manag. Assoc. 62, 1102–1114.
Fu, Y., Tai, A.P.K., 2015. Impact of climate and land cover changes on tropospheric ozone air quality and public health in East Asia between 1980 and 2010. Atmos Chem Phys 15, 10093–10106.
Gaudel, A., Cooper, O., Ancellet, G., Barret, B., Boynard, A., Burrows, J., Clerbaux, C., Coheur, P.-F., Cuesta, J., Cuevas Agulló, E., 2018. Tropospheric Ozone Assessment Report: Present-day distribution and trends of tropospheric ozone relevant to climate and global atmospheric chemistry model evaluation.
Gelaro, Ronald, McCarty, Will, Suárez, Max J., Todling, Ricardo, Molod, Andrea, Takacs, Lawrence, Randles, Cynthia A., Darmenov, Anton, Bosilovich, Michael G., Reichle, Rolf, Wargan, Krzysztof, Coy, Lawrence, Cullather, Richard, Draper, Clara, Akella, Santha, Buchard, Virginie, Conaty, Austin, da Silva, Arlindo M., Gu, Wei, Kim, Gi-Kong, Koster, Randal, Lucchesi, Robert, Merkova, Dagmar, Nielsen, Jon Eric, Partyka, Gary, Pawson, Steven, Putman, William, Rienecker, Michele, Schubert, Siegfried D., Sienkiewicz, Meta, Zhao, Bin, 2017. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Climate 30 (14), 5419–5454.
He, Q., Gu, Y., Zhang, M., 2020. Spatiotemporal trends of PM2.5 concentrations in central China from 2003 to 2018 based on MAIAC-derived high-resolution data. Environ. Int. 137, 105536.
Hu, X., Belle, J.H., Xia, M., Wildani, A., Waller, L., Strickland, M., Yang, L., 2017. Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach. Environ. Sci. Technol. 51, 6936.
Jin, X., Holloway, T., 2015. Spatial and temporal variability of ozone sensitivity over China observed from the Ozone Monitoring Instrument. Journal of Geophysical Research: Atmospheres 120, 7229–7246.
Li, G., Bei, N., Cao, J., Wu, J., Long, X., Feng, T., Dai, W., Liu, S., Zhang, Q., Tie, X., 2017. Widespread and persistent ozone pollution in eastern China during the non-winter season of 2015: observations and source attributions. Atmos Chem Phys 17, 2759–2774.
Li, J., Lu, K., Lv, W., Li, J., Zhong, L., Ou, Y., Chen, D., Huang, X., Zhang, Y., 2014. Fast increasing of surface ozone concentrations in Pearl River Delta characterized by a regional air quality monitoring network during 2006–2011. J. Environ. Sci. 26, 23–36.
Li, K., Jacob, D.J., Liao, H., Shen, L., Zhang, Q., Bates, K.H., 2019a. Anthropogenic drivers of 2013–2017 trends in summer surface ozone in China. Proc. Natl. Acad. Sci. 116, 422–427.
Li, K., Jacob, D.J., Liao, H., Zhu, J., Shah, V., Shen, L., Bates, K.H., Zhang, Q., Zhai, S., 2019b. A two-pollutant strategy for improving ozone and particulate air quality in China. Nat. Geosci. 1–5.
Li, M., Zhang, Q., Zheng, B., Tong, D., Lei, Y., Liu, F., Hong, C., Kang, S., Yan, L., Zhang, Y., Bo, Y., Su, H., Cheng, Y., He, K., 2019c. Persistent growth of anthropogenic NMVOC emissions in China during 1990–2017: dynamics, speciation, and ozone formation potentials. Atmos Chem Phys Discuss 2019, 1–29.
Li, R., Cui, L., Hongbo, F., Li, J., Zhao, Y., Chen, J., 2020. Satellite-based estimation of full-coverage ozone (O3) concentration and health effect assessment across Hainan Island. J. Cleaner Prod. 244, 118773.
Lin, Y., Jiang, F., Zhao, J., Zhu, G., He, X., Ma, X., Li, S., Sabel, C.E., Wang, H., 2018. Impacts of O3 on premature mortality and crop yield loss across China. Atmos. Environ. 194, 41–47.
Liu, H., Liu, S., Xue, B.R., Lv, Z.F., Meng, Z.H., Yang, X.F., Xue, T., Yu, Q., He, K.B., 2018. Ground-level ozone pollution and its health impacts in China. Atmos. Environ. 173, 223–230.
Lu, X., Hong, J., Zhang, L., Cooper, O.R., Schultz, M.G., Xu, X., Wang, T., Gao, M., Zhao, Y., Zhang, Y., 2018. Severe surface ozone pollution in China: A global perspective. Environ. Sci. Technol. Lett. 5, 487–494.
Ma, Z., Hu, X., Sayer, A.M., Levy, R., Zhang, Q., Xue, Y., Tong, S., Bi, J., Huang, L., Liu, Y., 2016. Satellite-Based Spatiotemporal Trends in PM2.5 Concentrations: China, 2004–2013. Environmental Health Perspectives 124, 184.
Ma, Z., Liu, R., Liu, Y., Bi, J., 2019. Effects of air pollution control policies on PM2.5 pollution improvement in China from 2005 to 2017: a satellite-based perspective. Atmos Chem Phys 19, 6861–6877.
Maji, K.J., Ye, W.-F., Arora, M., Nagendra, S.M.S., 2019. Ozone pollution in Chinese cities: Assessment of seasonal variation, health effects and economic burden. Environ. Pollut. 247, 792–801.
McCarty, W., Coy, L., Gelaro, R., Huang, A., Merkova, D., Smith, E.B., Sienkiewicz, M., Wargan, K., 2016. MERRA-2 Input Observations: Summary and Assessment.
MEE, 2018. Report on the State of the Ecology and Environment in China 2017. Beijing: Ministry of Ecology and Environment of China.
MEP, 2014. Report on the State of the Environment in China 2013. Beijing: Ministry of Environmental Protection of China.
MEP, AQSIQ, 2012. National Ambient Air Quality Standard of China. Ministry of Environmental Protection of China. Administration of Quality Supervision, Inspection and Quarantine.
Qian, D., Rowland, S., Koutrakis, P., Schwartz, J., 2016. A Hybrid Model for Spatially and Temporally Resolved Ozone Exposures in the Continental United States. J. Air Waste Manag. Assoc. 67.
Randles, C.A., Da Silva, A.M., Buchard, V., Colarco, P.R., Darmenov, A., Govindaraju, R., Smirnov, A., Holben, B., Ferrare, R., Hair, J., Shinozuka, Y., Flynn, C.J., 2017. The MERRA-2 Aerosol Reanalysis, 1980 - onward, Part I: System Description and Data Assimilation Evaluation. J. Clim. 30, 6823–6850.
Reid, C.E., Jerrett, M., Petersen, M.L., Pfister, G.G., Morefield, P.E., Tager, I.B., Raffuse, S.M., Balmes, J.R., 2015. Spatiotemporal Prediction of Fine Particulate Matter During the 2008 Northern California Wildfires Using Machine Learning. Environ. Sci. Technol. 49, 3887–3896.
Rose, A.N., McKee, J.J., Urban, M.L., Bright, E.A., 2018. LandScan 2017. Oak Ridge National Laboratory, Oak Ridge, TN.
Schratz, P., 2018. Tuning Hyperparameters.
Sharma, S., Sharma, P., Khare, M., 2017. Photo-chemical transport modelling of tropospheric ozone: A review. Atmos. Environ. 159, 34–54.
Shen, L., Jacob, D.J., Liu, X., Huang, G.Y., Li, K., Liao, H., Wang, T., 2019. An evaluation of the ability of the Ozone Monitoring Instrument (OMI) to observe boundary layer ozone pollution across China: application to 2005–2017 ozone trends. Atmos. Chem. Phys. 19, 6551–6560.
Shen, L., Mickley, L.J., 2017. Seasonal prediction of US summertime ozone using statistical analysis of large scale climate patterns. PNAS 114, 2491–2496.
Sun, L., Xue, L., Wang, T., Gao, J., Ding, A., Cooper, O.R., Lin, M., Xu, P., Wang, Z., Wang, X., Wen, L., Zhu, Y., Chen, T., Yang, L., Wang, Y., Chen, J., Wang, W., 2016. Significant increase of summertime ozone at Mount Tai in Central Eastern China. Atmos Chem Phys 16, 10637–10650.
relation to complex wind flow in Hong Kong. Atmos. Environ. 35, 3203–3215.
Wang, T., Xue, L.K., Brimblecombe, P., Lam, Y.F., Li, L., Zhang, L., 2017. Ozone pollution in China: A review of concentrations, meteorological influences, chemical precursors, and effects. Sci. Total Environ. 575, 1582–1596.
Wang, W.N., Cheng, T.H., Gu, X.F., Chen, H., Guo, H., Wang, Y., Bao, F.W., Shi, S.Y., Xu, B.R., Zuo, X., Meng, C., Zhang, X.C., 2017. Assessing Spatial and Temporal Patterns of Observed Ground-level Ozone in China. Sci. Rep. 7.
Wang, X., Shen, Z., Cao, J., Zhang, L., Liu, L., Li, J., Liu, S., Sun, Y., 2012. Characteristics of surface ozone at an urban site of Xi'an in Northwest China. J. Environ. Monit. 14, 116–126.
Wei, J., Li, Z., Cribb, M., Huang, W., Xue, W., Sun, L., Guo, J., Peng, Y., Li, J., Lyapustin, A., Liu, L., Wu, H., Song, Y., 2020. Improved 1 km resolution PM2.5 estimates across China using enhanced space–time extremely randomized trees. Atmos. Chem Phys 20, 3273–3289.
WHO, 2006. WHO Air quality guidelines for particulate matter, ozone, nitrogen dioxide, and sulfur dioxide. Global update 2005. Summary of risk assessment. World Health Organization.
Xiao, Q., Chang, H.H., Geng, G., Liu, Y., 2018. An ensemble machine-learning model to predict historical PM2.5 concentrations in China from satellite data. Environ Sci Technol 52, 13260–13269.
Xue, T., Zheng, Y., Tong, D., Zheng, B., Li, X., Zhu, T., Zhang, Q., 2019. Spatiotemporal continuous estimates of PM2.5 concentrations in China, 2000–2016: A machine learning method with inputs from satellites, chemical transport model, and ground observations. Environ. Int. 123, 345–357.
Yang, L.F., Luo, H.H., Yuan, Z.B., Zheng, J.Y., Huang, Z.J., Li, C., Lin, X.H., Louie, P.K.K., Chen, D.H., Bian, Y.H., 2019. Quantitative impacts of meteorology and precursor emission changes on the long-term trend of ambient ozone over the Pearl River Delta, China, and implications for ozone control strategy. Atmos. Chem. Phys. 19, 12901–12916.
Yin, P., Chen, R., Wang, L., Meng, X., Liu, C., Niu, Y., Lin, Z., Liu, Y., Liu, J., Qi, J., 2017.
Sun, L., Xue, L.K., Wang, Y.H., Li, L.L., Lin, J.T., Ni, R.J., Yan, Y.Y., Chen, L.L., Li, J., Ambient Ozone Pollution and Daily Mortality: A Nationwide Study in 272 Chinese
Zhang, Q.Z., Wang, W.X., 2019. Impacts of meteorology and emissions on summer- Cities. Environ. Health Perspect. 125, 117006.
time surface ozone increases over central eastern China between 2003 and 2015. Yue, X., Unger, N., Harper, K., Xia, X., Liao, H., Zhu, T., Xiao, J., Feng, Z., Li, J., 2017.
Atmos. Chem. Phys. 19, 1455–1469. Ozone and haze pollution weakens net primary productivity in China. Atmos. Chem.
Tu, J., Xia, Z.G., Wang, H.S., Li, W.Q., 2007. Temporal variations in surface ozone and its Phys. 17, 6073–6089.
precursors and meteorological effects at an urban site in China. Atmos. Res. 85, Zhan, Y., Luo, Y., Deng, X., Grieneisen, M.L., Zhang, M., Di, B., 2017. Spatiotemporal
310–337. prediction of daily ambient ozone levels across China using random forest for human
Wang, T., Wu, Y.Y., Cheung, T.F., Lam, K.S., 2001. A study of surface ozone and the exposure assessment. Environ. Pollut. 233, 464.
