Jaeog WCF Liaovandijk
ARTICLE INFO

Keywords: Tree cover; Vegetation height; Woody biomass; Australia; LiDAR; Landsat

ABSTRACT

Detailed spatial information on the presence and properties of woody vegetation serves many purposes, including carbon accounting, environmental reporting and land management. Here, we investigated whether machine learning can be used to combine multiple spatial observations and training data to estimate woody vegetation canopy cover fraction (‘cover’), vegetation height (‘height’) and woody above-ground biomass dry matter (‘biomass’) at 25-m resolution across the Australian continent, where possible on an annual basis. We trained a Random Forest algorithm on cover and height estimates derived from airborne LiDAR over 11 regions and inventory-based biomass estimates for many thousands of plots across Australia. As predictors, we used annual geomedian Landsat surface reflectance, ALOS/PALSAR L-band radar backscatter mosaics, spatial vegetation structure data derived primarily from ICESat/GLAS satellite altimetry, and spatial climate data. Cross-validation experiments were undertaken to optimize the selection of predictors and the configuration of the algorithm. The resulting estimation errors were 0.07 for cover, 3.4 m for height, and 80 t dry matter ha−1 for biomass. A large fraction (89–94 %) of the observed variance was explained in each case. Priorities for future research include validation of the LiDAR-derived cover training data and the use of new satellite vegetation height data from the GEDI mission. Annual cover mapping for 2000–2018 provided detailed insight into woody vegetation dynamics. Continentally, woody vegetation change was primarily driven by water availability and its effect on bushfire and mortality, particularly in the drier interior. Changes in woody vegetation made a substantial contribution to Australia’s total carbon emissions since 2000. Whether these ecosystems will recover biomass in future remains to be seen, given the persistent pressures of climate change and land use.
⁎
Corresponding author.
E-mail address: [email protected] (A.I.J.M. Van Dijk).
https://doi.org/10.1016/j.jag.2020.102209
Received 24 April 2020; Received in revised form 26 July 2020; Accepted 30 July 2020
0303-2434/ © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/BY/4.0/).
Z. Liao, et al. Int J Appl Earth Obs Geoinformation 93 (2020) 102209
[...] cover over Australia was to support the country’s National Carbon Accounting System (NCAS; Furby, 2002; Lehmann et al., 2013). The NCAS mapping system continues to operate and uses statistical methods applied to Landsat series observations to map the presence or absence of forest, as defined under the Kyoto protocol. Known disadvantages of the mapping product are its categorical nature, which can be challenged by Australia’s extensive low-cover dry woodlands, and the partially counterfactual character of Australia’s Kyoto definition of a forest as “a minimum area of land of 0.2 ha with tree crown cover (or equivalent stocking level) of more than 20 per cent with trees with the potential [sic] to reach a minimum height of 2 m at maturity in situ”. Similarly, the definition of forest used in Australia’s National Forest Inventory has been “An area, incorporating all living and non-living components, that is dominated by trees having usually a single stem and a mature or potentially mature stand height exceeding 2 m and with existing or potential crown cover of overstorey strata about equal to or greater than 20 %”, with an operational minimum area threshold of 0.5 ha (pers. comm. S. Read, ABARES). In contrast, operational mapping programmes such as the Queensland Government’s Statewide Landcover and Trees Study consider the actual rather than potential presence of woody vegetation, and use a foliage projected cover threshold of 10 % rather than a crown cover threshold. These differences in definitions have contributed to contrasting reports regarding trends in forest and woody vegetation more broadly.

Vegetation height and above-ground biomass have proven more challenging to sense remotely. LiDAR measurements are arguably the most reliable method to measure vegetation height, but suitable measurements at global scale were not available until the launch of the Geoscience Laser Altimeter System (GLAS) instrument aboard the ICESat mission launched in 2003. It proved capable of providing global measurements of canopy height (cf. Lefsky et al., 2005; Los et al., 2012; Simard et al., 2011), although the spacing between along-track measurements has limited the accuracy and resolution of vegetation height data.

A comprehensive overview of the development of large-scale forest biomass remote sensing techniques is provided by Lucas et al. (2012). Briefly, vegetation height data from altimetry have provided one approach to estimating biomass, along with active and passive microwave observations (e.g., Baccini et al., 2017; Liu et al., 2015; Saatchi et al., 2011). Lucas et al. (2010) were among the first to explore the use of backscatter by the active L-band ALOS PALSAR radar satellite instrument for biomass estimation and calibrated a conceptual model to field biomass measurements over the state of Queensland. They found that soil moisture had an important confounding influence on backscatter observations, which has since been mitigated by the development of annual PALSAR composites (Shimada et al., 2014). Scarth et al. (2019) developed an approach to integrate the GLAS, PALSAR and Landsat observations for forest structural mapping at 30 m resolution. Except for the low-resolution (∼50 km) passive microwave remote sensing approach of Liu et al. (2015), these remote sensing approaches to forest height and biomass mapping have the disadvantage of being derived from active sensors with a short mission life, meaning that they are not suitable for reliable repeated mapping. ESA’s twin Sentinel-1 radar imagers do provide an ongoing source of C-band backscatter observations, but given the short wavelength of these observations, the signal may be expected to be more sensitive to the canopy than to total forest biomass.

This study builds on the advances presented but aims to capitalise on some further recent innovations. Firstly, standardised processing of Landsat observations through Geoscience Australia’s Digital Earth Australia program has generated a series of annual surface reflectance composites from 1988 onwards (Roberts et al., 2017). So far, these have not been used for forest presence or property mapping. Secondly, a database of spatial vegetation height and fractional cover products over several regions spread across Australia has recently become available and provides a rich database for machine-learning approaches (Fisher et al., 2020; Van Dijk et al., 2018). Our objective was to make optimal use of this expanded range of satellite, airborne and field observations available to produce spatial estimates of woody vegetation cover, height and biomass for Australia at high (25-m) resolution. Where feasible, we produced such estimates annually.

2. Data and methods

2.1. Overall approach

Our overall approach was to use Random Forest machine learning algorithms to spatially estimate (1) woody vegetation canopy cover (‘cover’, WCF as a fraction), (2) vegetation height (‘height’, VH in m) and (3) woody above-ground biomass (‘biomass’, WAGB in t dry matter ha−1). The precise definition of these variables is a function of the training data and is described below. Our intent was to estimate annual time series of each variable. This is directly possible for cover, by using only the annual Landsat geomedians as predictors. However, as will be demonstrated, Landsat observations alone do not allow reliable estimation of height and biomass. We addressed this by developing reference estimates for these variables using additional inputs based on GLAS and PALSAR for the period 2007–2010 during which these observations were made.

2.2. Training data

We used cover and height data derived from airborne LiDAR as training data, as well as in situ biomass estimates from a large number of plots. The LiDAR-based vegetation cover and height data are available from the Terrestrial Ecosystem Research Network (TERN)1 for 11 areas shown in Fig. 1. Some characteristics of the data are listed in Table 1. The LiDAR point cloud data were acquired and processed for TERN following data and methods fully described in Van Dijk et al. (2018). Briefly, data for the ACT were collected at a nominal point density of at least 4 ppm across the region and 8 ppm over the urban area using a Trimble AX60 system containing a Riegl LMS-Q780 laser instrument operated by RPS Mapping, while the remaining sites were mapped at a nominal point density of 6–8 ppm using a Riegl Q560 operated by Airborne Research Australia.

1 All data can be visualised and downloaded via https://maps.tern.org.au

Table 1
Description of sites with LiDAR-based woody vegetation cover and height data.
Site name    Survey year    Area (km2)    Training sample    Land cover types

Woody cover fraction (WCF) was defined as the complement of the gap fraction of the canopy above 2 m height, including both woody and leaf canopy elements, calculated following Fisher et al. (2020). These authors compared field observations of canopy gap fraction above 2 m with various candidate metrics calculated from airborne LiDAR collected over 13 Australian sites, including some of those included here. They found empirically that error and bias were minimal when considering all vegetation returns originating from above 1.5 m height, expressed as a ratio over the total number of vegetation and non-vegetation returns. We applied this simple calculation to estimate WCF from the classified LiDAR points at 5-m horizontal grid resolution.

Vegetation height VH was calculated as the median height of the first return of all pulses at 1- or 2-m resolution (van Leeuwen and Nieuwenhuis, 2010). Both cover and height were resampled to 50-m resolution by calculating the median average to generate training data.

Spatially-continuous biomass training data were not available. Instead, we used the Biomass Plot Library, a database of in situ biomass inventories compiled and published by TERN.2 The database is a collation of stem inventory data collected by different government, university, and industry organisations. Allometric models were used to convert individual tree dimensions to biomass in tonnes of dry matter, which were subsequently aggregated for the full inventory plot. From the full database of 10,823 entries, we rejected those corresponding to plot sizes smaller than 0.03 ha (< 1 % of the database), surveyed before 2000 (17 %), and with an estimated absolute or relative uncertainty in field-based biomass exceeding 75 t/ha (4 %) or 25 % (5 %), respectively. Visual inspection indicated there were some sites where woody vegetation cover was partially or wholly lost between the survey year and the reference period for biomass estimation. Furthermore, several more sites were temporally stable but located on or near the edge of a transition between woody and non-woody vegetation. In some cases, this might have been due to modest (< 30 m) geolocation errors, whereas in other cases it appeared that the plot shape was very elongated (e.g., being part of a tree belt). To address these issues, we sampled a 3-by-3 neighbourhood around the corresponding pixel in the 25-m gridded WCF estimates derived here, for all years between the survey year and the end of the 2010 reference period. Calculating the spatial mean cover for the common period 2007–2010 between the ALOS/PALSAR and ICESat/GLAS missions as a reference value, we rejected those sites with a spatial standard deviation exceeding 10 % cover (4 %); differences between cover for the survey year and reference value exceeding 10 % cover (28 %); and differences between mean cover for the full ICESat/GLAS mission period (2003–2010) and the reference value exceeding 10 % (25 %). The resulting sample size was 5,547, i.e., about half (51 %) of the full database.

2 http://data.auscover.org.au/xwiki/bin/view/Product+pages/Biomass+Plot+Library

Initial training attempts produced overestimates of biomass in non-woody vegetation, most likely because the database did not include any such sites. Therefore, the database was augmented with a large sample of sites without any woody vegetation. We sampled these by combining the national forest presence mapping (Furby, 2002) and MODIS Vegetation Continuous Fields (Hansen et al., 2000) with the cover and height products derived as part of this study. From all locations where all products agreed woody cover was zero and vegetation height less than 2 m, a total of 11,835 samples were drawn, assigned zero biomass and used to augment the woody biomass data.

2.3. Predictors

[...] describe how the data was ortho-rectified, topographically corrected, and intensity balanced during mosaicking. We used both the HH and HV mosaics and calculated average values for the four years (2007–2010), after which they were resampled to 25 m in the Australian Albers projection.

3 https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm
4 http://data.auscover.org.au/xwiki/bin/view/Product+pages/ICESat+Vegetation+Structure

The structural vegetation classification for Australia by Scarth et al. (2019) was also used to help estimate height and biomass. The authors first applied image segmentation to the PALSAR and Landsat imagery to create self-similar clusters of pixels (or superpixels). ICESat/GLAS L2 Global Land Surface Altimetry Data for the period 2003–2009 were obtained from the National Snow and Ice Data Centre. These data were processed for differences in footprint size, laser output power, vegetation/ground reflectance, and terrain slope and then aggregated for each cluster into mean vertical profiles. From these, profile height percentiles were extracted (h25, h50, h75, h95 and h100). The metrics for clusters without data were estimated based on similarity criteria. The resulting mapping product is available from TERN4. Because PALSAR mosaics were used alongside Landsat in the segmentation and interpolation steps, the structural vegetation classification is not perfectly independent from the PALSAR backscatter data.
However, in practice the shared information content is minimal and in any event the machine learning method used is robust against correlation between predictors.

Initial testing made it clear that static spatial data sets of climate variables improved the estimation of forest biomass considerably (see Results). To this end, gridded data on long-term average air temperature, precipitation and radiation were obtained from TERN and from the Bureau of Meteorology (BoM). Specifically, gridded mean air temperature and precipitation at 0.05° based on interpolation of station measurements were obtained from BoM, and mean annual radiation based on a combination of solar radiation and illumination modelling on a 3-arcsecond (∼90 m) digital elevation model (Wilson and Gallant, 2000) was obtained from TERN.

The different predictor data sets varied in spatial resolution (25–90 m) and in some cases needed to be reprojected. In training and performance assessment this was allowed for by resampling the predictors and predictands to 50 m and 75 m squared, respectively. The outputs were produced at 25-m resolution, but with the caveat that the spatial accuracy may be somewhat less.

2.4. Methods

We trained Random Forests to predict cover, height and biomass using the spatial input fields. For cover, we used all bands of the Landsat geomedians as candidate predictors. For height, we used these same geomedians, as well as the ALOS/PALSAR mosaics and ICESat/GLAS vegetation height percentiles. For biomass, we used the average of cover predictions for 2007–2010 and the height predictions derived in the previous steps, as well as the Landsat geomedians, ALOS/PALSAR mosaics and gridded climate data.

When trained on LiDAR-derived cover and height, the predictors were resampled to the same 50-m grid as the resampled LiDAR products, after which a regularly-spaced sparse sample was taken to limit computational costs. The sample represented 1.3 % of the available data for the ACT (N = 6838) and close to 25 % (N∼2000) for the other, smaller grids. This produced a total sample size of N = 40,585.

Regression modelling requires that the data distribution is approximately Gaussian. This was the case for cover and height, but the biomass data were positively skewed (skewness 2.37). Taking the cube root made these data close to normally distributed (skewness 1.04).

2.5. Model

The ‘TreeBagger’ function in MATLAB was used. Experiments were carried out with two- to eight-fold cross-validation (and all integer values in between) to determine the most robust model configuration and selection of predictors, using the root mean square error (RMSE) as the selection criterion. The number of leaf nodes was varied from 5 to 50, and the ensemble size from 5 to 500.

To understand the contribution of each of the predictors to performance, mean importance was calculated for k out-of-bag predictor importance sets returned by the algorithm in k-fold cross-validation. We carried out predictor removal experiments to determine the optimal number and set of predictors, in which the algorithm was initially trained with all available predictors and successively retrained, each time removing the least important remaining predictor. The experiment was repeated three times and the set of predictors that consistently outperformed the others was selected.

Generalised empirical response functions between each predictor (x_i) and the predictand (y) were visualised by calculating the median and interquartile range of y for equal x_i intervals between the 2 % lowest and 2 % highest x_i values (see Fig. 2b for an example). An interquartile range that is narrow compared to the range of median values indicates a strong relationship between x_i and y. The same visualisation was made using predicted values (y_est) instead of y. For a reliable prediction, it would be expected that the median and interquartile patterns for y and y_est overlap.

2.6. Evaluation

Validation performance was evaluated using the withheld sample data from the k-fold trials. The RMSE, the coefficient of determination (R2), bias and model efficiency ME (Nash and Sutcliffe, 1970) were calculated as measures of performance. The value of ME is mathematically equal to R2 if there is no systematic bias in the predictions and lower if there is. The final predictions (i.e., not those used for validation) were made with an algorithm trained on the full training data set. For cover and height, we also evaluated performance for individual LiDAR capture areas using the full dataset collected.

We analysed the agreement and differences between the cover estimates and alternative vegetation cover products. These included the MODIS VCF product, Normalised Difference Vegetation Index (NDVI), and a Photosynthetic Vegetation cover fraction. MODIS VCF for the year 2015 was reprojected and resampled to 500-m resolution, and compared to WCF for the same year and at the same resolution.

The Normalised Difference Vegetation Index (NDVI) has frequently been used to estimate fractional canopy cover (FC) and the closely related fraction of absorbed photosynthetic radiation (FPAR, often assumed 0.95 times FC). To do so, NDVI is commonly linearly scaled between values of ca. 0.10–0.15 and ca. 0.80–0.85, assuming that a maximum FC of 0.95 is achieved for the latter. NDVI was calculated from the geomedian reflectances.

Guerschman et al. (2015) developed a method to estimate the respective projected fractions of photosynthetic vegetation (PV, i.e. living leaves), non-photosynthetic vegetation (wood and litter) and bare soil from Landsat reflectance. The respective fractions are produced operationally and available through Geoscience Australia. We composited the individual images to bi-monthly median values, and the lowest value (minPV) of the six intervals was compared to WCF. The expectation was that there would be a correlation between our estimated WCF and MODIS VCF, NDVI and minPV, but that NDVI and minPV would generally exceed WCF due to the contribution of short vegetation. The strongest correlation was expected with MODIS VCF, as it represents woody canopy cover only, similar to our definition of WCF. The weakest correlation was expected for NDVI. An intermediate correlation was expected between WCF and minPV, as short herbaceous vegetation tends to senesce at some stage during the year across most of Australia, although with some notable exceptions.

The LiDAR-derived and predicted cover were also compared to the categorical NCAS forest mapping product. Our interest was to determine whether a cover threshold could be found and applied to the LiDAR-observed and predicted cover data to closely match the categorical NCAS mapping, and to interpret any differences that occurred. The NCAS mapping product contains three classes (non-forest, sparse woodland and forest), intended to correspond to cover thresholds of 10 % and 20 %, respectively. Four of the LiDAR areas had sufficient non-forest and forest for a comparison. For the purposes of comparison we merged the sparse woodland class with the non-forest class. We used LiDAR cover to identify the cover threshold for each site that produced the highest rate of agreement, calculated as the fraction of pixels classified correctly. We subsequently applied these site-specific thresholds to the cover product. We also applied pre-determined thresholds of 10 % and 20 % for comparison.

Finally, continental predictions of cover, height and biomass were made, in the case of cover for the period 2000–2018, by distributed processing on high-performance computing infrastructure. These were analysed for consistency with known spatial and temporal patterns of woody vegetation cover, height and biomass.
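The threshold comparison with the categorical NCAS product described in Section 2.6 can be illustrated with a minimal sketch. It assumes cover values as fractions and a boolean forest flag per pixel (with sparse woodland already merged into non-forest); the function names, the candidate threshold grid and the toy data are hypothetical and not taken from the study.

```python
def agreement(cover, is_forest, threshold):
    """Fraction of pixels where (cover >= threshold) matches the forest flag."""
    hits = sum((c >= threshold) == f for c, f in zip(cover, is_forest))
    return hits / len(cover)


def best_threshold(cover, is_forest, candidates):
    """Return (threshold, agreement) with the highest agreement.

    Ties are resolved in favour of the first candidate in the list.
    """
    scored = ((t, agreement(cover, is_forest, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Toy data (hypothetical): LiDAR cover fractions and a categorical forest map.
cover = [0.02, 0.08, 0.15, 0.22, 0.35, 0.50, 0.64, 0.80]
is_forest = [False, False, False, True, True, True, True, True]

# Candidate thresholds from 0.00 to 1.00 in steps of 0.01 (an assumed grid).
candidates = [i / 100 for i in range(101)]

site_threshold, site_agreement = best_threshold(cover, is_forest, candidates)
fixed_10 = agreement(cover, is_forest, 0.10)
fixed_20 = agreement(cover, is_forest, 0.20)
```

The same `agreement` function can then be applied to predicted cover using the site-specific threshold found from the LiDAR cover, which mirrors the two-step comparison described above.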
Fig. 2. (a) Scatter plot of predicted vs. reference Woody Cover Fraction (WCF, as fraction) for validation coloured from low (red) to high kernel density (blue) (only
half of the data pairs are shown for visualisation purposes). (b) Plots showing the distribution of in situ (blueish colours) and predicted (reddish colours) cover as a
function of each predictor (reflectance as a fraction), shown in order of decreasing prediction importance. Shown are the median (line) and inter-quartile range
(band) for equal predictor intervals; purple colours indicate overlapping ranges. Also shown is the sample size for each interval (black line, no scale). (For interpretation
of the references to colour in the Figure, the reader is referred to the web version of this article).
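The binned summary shown in panel (b), the median and interquartile range of the predictand for equal predictor intervals between the 2 % lowest and 2 % highest predictor values, could be computed along the following lines. This is a sketch under stated assumptions: the function names, the number of bins, the percentile interpolation and the toy data are illustrative choices, not the authors' implementation.

```python
import statistics


def percentile(sorted_values, fraction):
    """Linearly interpolated percentile of a sorted list (fraction in 0..1)."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def response_function(x, y, n_bins=10, trim=0.02):
    """Median and interquartile range of y in equal-width bins of x.

    Bins span the range between the `trim` and `1 - trim` percentiles of x.
    Returns one tuple per bin: (bin centre, median, IQR, sample size).
    Assumes x is not constant, so that the bin width is non-zero.
    """
    xs = sorted(x)
    lo, hi = percentile(xs, trim), percentile(xs, 1 - trim)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        if lo <= xi <= hi:
            k = min(int((xi - lo) / width), n_bins - 1)
            bins[k].append(yi)
    rows = []
    for k, values in enumerate(bins):
        if len(values) >= 2:  # quartiles need at least two values
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
            rows.append((lo + (k + 0.5) * width, median, q3 - q1, len(values)))
    return rows


# Toy example (hypothetical): a perfectly linear predictor-predictand relation.
x = [float(i) for i in range(101)]
y = [2.0 * xi for xi in x]
rows = response_function(x, y, n_bins=4)
```

Running the same function on predicted instead of observed values gives the second set of bands, so the two can be overlaid to check whether they overlap.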
Table 2
Performance of the trained random forest ensemble in terms of root mean square error (RMSE), coefficient of determination (R2) and Nash-Sutcliffe model efficiency (ME). Also listed is the total sample size (N) and the predictors used in order of decreasing importance. WAGB* denotes performance for the estimates transformed back to t ha−1 and considering the in-situ woody biomass data only, rather than the augmented data set.

Variable              RMSE     R2      ME      N        Variable importance order
WCF (fraction)        0.0715   0.940   0.940   40,585   nir, blue, swir1, swir2, red, green
VH (m)                3.35     0.889   0.888   40,585   blue, nir, swir1, HV, swir2, green, red, HH, h95, h75, h50, h25, h100
WAGB (t ha−1)^(1/3)   0.667    0.927   0.927   16,681   Temp, VH, WCF, Rain, Rad
WAGB* (t ha−1)        79.8     0.519   0.494   5546     (as above)
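The performance measures reported in Table 2 can be written out in a few lines. The sketch below is illustrative only: it assumes that R2 is the squared Pearson correlation between observed and predicted values and that ME is the Nash-Sutcliffe efficiency; the function names and the toy data are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, pred):
    """Root mean square error of the predictions."""
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def bias(obs, pred):
    """Mean difference between predictions and observations."""
    return _mean([p - o for o, p in zip(obs, pred)])


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be squared Pearson r."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def model_efficiency(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / total variance of observations."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


# Toy example (hypothetical): predictions with a constant offset of +1.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [o + 1.0 for o in obs]
```

In this toy case the predictions are perfectly correlated with the observations, so R2 stays at 1, while the systematic offset pulls ME down to 0.5. This is the behaviour that makes a gap between R2 and ME a useful indicator of bias.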
Table 3 Table 4
Description of sites in terms of LiDAR-derived median and 95 % range of woody Description of sites in terms of median and 95 % range of vegetation height (VH
cover fraction (WCF) and measures of prediction performance: prediction bias, in m) and measures of prediction performance (acronyms as for Table 3).
root mean square error (RMSE), coefficient of determination (R2) and model
Site name median (95 % bias RMSE R2 ME
efficiency (ME). Note: all cover values are given in % to reduce the number of
range)
digits.
1) Australian Capital 5 (0−17) 0.4 2.7 0.68 0.67
Site name median (95 % bias RMSE R2 ME
Territory
range)
2) Alice Springs (NT) 3 (2−4) 0.1 0.5 0.20 0.11
1) Australian Capital 37 (0−71) 0.3 7.3 0.90 0.90 3) Credo (WA) 6 (3−8) −0.3 1.0 0.33 0.26
Territory 4) Litchfield (NT) 8 (4−12) −0.7 1.7 0.48 0.38
2) Alice Springs (NT) 36 (26−43) −1.1 3.5 0.56 0.42 5) Robson (Qld) 26 (0−32) −0.4 3.3 0.85 0.85
3) Credo (WA) 18 (6−33) 0.3 4.2 0.63 0.62 6) Rushworth (Vic) 12 (10−15) −0.1 1.6 0.12 0.01
4) Litchfield (NT) 24 (15−36) 0.2 4.4 0.41 0.40 7) Tumbarumba (NSW) 22 (13−32) 0.0 3.8 0.54 0.53
5) Robson (Qld) 92 (0−98) −0.8 8.2 0.91 0.91 8) Warra (Tas) 27 (9−38) 1.2 6.1 0.36 0.34
6) Rushworth (Vic) 43 (31−49) −0.3 3.4 0.70 0.70 9) Watts (Vic) 38 (19−55) −3.0 8.2 0.32 0.21
7) Tumbarumba (NSW) 52 (25−70) 1.4 7.1 0.62 0.61 10) Whroo (Vic) 16 (15−17) −0.7 1.3 0.06 −1.00
8) Warra (Tas) 91 (50−96) −1.7 9.0 0.34 0.32 11) ZigZag (Vic) 17 (12−25) −0.3 2.6 0.46 0.45
9) Watts (Vic) 86 (66−94) −3.6 9.1 0.12 −0.41
10) Whroo (Vic) 42 (36−47) −0.8 2.7 0.24 −0.05
11) ZigZag (Vic) 59 (44−82) 1.8 8.1 0.35 0.23 3.5. Woody above-ground biomass
The RMSE for biomass predictions was 80 t ha−1 and ME was 0.49
3.4. Vegetation height (Table 2). A comparison of predicted and field-measured data pairs
(Fig. 4a) suggests saturation at high biomass. However, this effect is
The RMSE for height predictions was 3.4 m and ME was 0.88 much less in the transformed data and therefore, can be attributed
(Table 2). A comparison of predicted and LiDAR-derived height data primarily to the non-Gaussian distribution of the biomass data. Indeed,
pairs (Fig. 3a) suggests some saturation for tall forests, in that the under- and overestimations at high biomass were similarly frequent
height of vegetation taller than ∼40 m was frequently underestimated. (Fig. 4a). As a logical consequence of the back-transformation, the es-
Generally, estimation errors increased with vegetation height, i.e., er- timation error increases with biomass.
rors were heteroscedastic and partly proportional to the estimate. The empirical relationships between biomass, cover and height
Empirical relationships between Landsat surface reflectance and were as expected, that is, biomass monotonically increased with both
GLAS height and VH were broadly similar to those with cover, but with variables (Fig. 4b). The relationship with mean temperature (Temp) was
greater scatter. The relationship between PALSAR radar backscatter and also as expected, with the highest biomass found at cool sites. The
height conformed to expectation, although the non-monotonic re- decrease in biomass with local mean radiation (Rad) may be because
lationship for HH polarisation backscatter was not. The relationship higher radiation broadly coincides with higher temperature in Aus-
between GLAS and LiDAR height was monotonic and near-linear as tralia. The increase in biomass with precipitation (Rain) and stabilisa-
expected. Predicted height most closely approached h75 (i.e. the height tion around 800 mm y−1 rainfall agreed with expectation, but the
below which 75 % of the GLAS returns originated) (Fig. 3b). subsequent decrease did not.
Performance statistics for individual sites showed a bias within 1 m
and an RMSE similar or less than the full dataset for most sites, except 3.6. Continental mapping
for the sites with the tallest forests (8 and 9, Table 4). In both cases, the
predictions could not fully reproduce the large height variations Continental predictions of cover, height and biomass for the period
(> 20 m) over short distances (see Supplementary Material). 2007–2010 show strong commonalities in spatial patterns (Fig. 5).
Notable differences are the more localised occurrence of tall (> 30 m)
Fig. 3. As Fig. 2 but for vegetation height (VH in m). Units are fraction for Landsat reflectances (red, green, blue, nir, swir1, swir2), dB for PALSAR (HV and HH) and
m for altimetry (h25, h50, h75, h95 and h100). (For interpretation of the references to colour in the Figure, the reader is referred to the web version of this article).
6
Z. Liao, et al. Int J Appl Earth Obs Geoinformation 93 (2020) 102209
Fig. 4. As Fig. 2 but for woody above-ground biomass (AGB in t/ha); dashed lines in (a) indicate the standard error of the estimate. Predictor-predictand relationships
are for mean annual rainfall (Rain in mm), VH (m), WCF (fraction), mean annual temperature (Tmean in °C) and mean daily radiation (Rad in MJ m−2).
forests when compared to the spatial pattern of high cover values. any topographic shadows that remain in the radiometrically-corrected
Biomass broadly follows cover and height patterns but is greater in the annual geomedian reflectances. We did not find evidence that this
southern half of the continent. caused errors in the cover estimates. We hypothesise that the Random
Forest algorithm structure is able to account for differences in illumi-
nation, which affects reflectance in all bands.
3.7. Comparison with alternative vegetation cover measures
The prediction of height required a combination of optical, radar
and LiDAR altimetry-derived information. Experiments using only a subset of these inputs produced degraded performance. Conversely, the performance of biomass predictions benefited from the removal of redundant variables. We hypothesise that this was due to correlations between cover, height, and the original observations from which they were derived. To test this, we calculated Spearman’s non-parametric correlation between each pair of predictors, using the values included in the biomass training set. These correlation values were used to construct a cluster dendrogram, showing the relationship between predictor variables (Fig. 7). We found that all altimetry-derived measures were highly correlated, as were the two PALSAR-derived predictors and all Landsat bands except NIR. WCF was not most strongly correlated with the Landsat reflectance it was derived from but was more closely related to the radar and altimetry-derived predictors. We attribute this to the ability of the WCF algorithm to retrieve canopy structure information from the combination of contrasting visible/SWIR vs. NIR responses to canopy cover (Fig. 2). Furthermore, VH was more strongly correlated with the radar than with the altimetry observations, which corresponds with the greater importance of the PALSAR rather than altimetry observations in predicting VH (Fig. 3). We assume this to be a result of the necessary extrapolation of relatively sparse ICESat/GLAS measurements, compared to the greater density of the PALSAR observations. Future work to include GEDI mission observations should provide new opportunities to test this assumption.

Our cover estimates showed reasonable agreement with MODIS VCF woody canopy cover fraction (Fig. 6). MODIS VCF values were within 0.1 units of our WCF estimates for 71 % of all grid cells with WCF > 10 % and within 0.2 units for 95 % of grid cells. MODIS VCF values were generally somewhat lower than WCF values across most of their range (Fig. 6).

A comparison between LiDAR-derived cover and NDVI for the 11 LiDAR sites confirms our expectation that NDVI and minPV values would exceed our cover estimates due to the short vegetation component included in the former two (Fig. 8a). Minimum bi-monthly PV values were typically lower than NDVI and closer to cover, but still higher for low cover and, less expected, lower for high cover values (Fig. 8b).

Four of the LiDAR sites had a sufficiently large sample of both forest and non-forest vegetation in the NCAS mapping product for comparison with cover (Table 5). Thresholds that maximise the agreement between the LiDAR-derived cover and NCAS mapping varied from 0.12 to 0.40. When applying the site-specific threshold optimised for NCAS to predicted cover, the latter still produced a better classification result for three out of four sites. If uniform thresholds of 10 % or 20 % cover were used, our cover predictions provided better classification results than NCAS in all cases.
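The predictor-redundancy check described in this section (pairwise Spearman correlation followed by a cluster dendrogram) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the distance measure (1 − |ρ|), the average-linkage method, the function name `cluster_predictors` and the synthetic predictor names are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cluster_predictors(X, names):
    """Return the Spearman correlation matrix and the dendrogram leaf order.

    X is an (n_samples, n_predictors) array with at least three columns;
    with exactly two columns spearmanr returns a scalar instead of a matrix.
    """
    rho, _ = spearmanr(X)
    rho = np.asarray(rho)
    # Assumption: distance = 1 - |rho|, so strongly (anti)correlated
    # predictors end up close together in the tree.
    dist = 1.0 - np.abs(rho)
    dist = (dist + dist.T) / 2.0   # remove tiny asymmetries from rounding
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    leaves = dendrogram(tree, labels=list(names), no_plot=True)["ivl"]
    return rho, leaves


# Synthetic stand-in data: two pairs of strongly related, hypothetical predictors.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([
    a,
    a + 0.1 * rng.normal(size=500),
    b,
    b + 0.1 * rng.normal(size=500),
])
names = ["alt_1", "alt_2", "radar_1", "radar_2"]
rho, leaves = cluster_predictors(X, names)
```

Predictors that appear next to each other in `leaves` and merge at a small distance carry largely overlapping information, which is the kind of redundancy the text suggests removing for the biomass model.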
4. Discussion
The six Landsat bands available could be used to predict the presence and cover fraction of woody vegetation with good accuracy. The information content of the NIR band was notably different from that in the other five bands, which all appeared to show rather similar inverse relationships to cover. In the exploratory stage of this research, we calculated and used various spectral indices proposed in previous studies to classify woody vegetation or forests. However, a Random Forests model using these predictors instead of, or in addition to, the original reflectance showed marginally worse validation performance than the configuration used here. This may be due to the larger number of predictor variables, which can degrade prediction performance if the information content between variables is strongly overlapping. A potential downside of the use of ‘raw’ reflectance is the potential effect of

The agreement between LiDAR- and satellite-derived cover estimates was generally robust with low bias, but some sources of differences could still be identified. These were found to originate from both data sources. The geomedian reflectance represented an annual average and was not necessarily representative of canopy cover at the time of LiDAR acquisition, though we did not find cases where this clearly affected the results. Artefacts were visible for mixed pixels containing both vegetation and water, where the lower reflectance of the water background appeared to produce potential overestimates of cover. Some evidence was found in the mapping shown in the Supplementary Material to suggest erroneous results for woody vegetation likely to be shorter than 2 m, such as new plantations and dense shrubland. In those cases, the LiDAR-derived cover would be zero, but if the spectral
Z. Liao, et al. Int J Appl Earth Obs Geoinformation 93 (2020) 102209
Fig. 8. Density scatter plots showing the relationship between LiDAR-derived woody-vegetation cover fraction (WCF) and (a) the Normalised Difference Vegetation
Index (NDVI) derived from the annual Landsat reflectance geomedians; (b) the minimum of bi-monthly average Photosynthetic Vegetation cover fraction derived
following Guerschman et al. (2015).
The main challenge for height estimation and the primary source of error were its indirect relationship with optical and radar observations and the low spatial density of GLAS altimetry observations. The predictive performance of Landsat reflectance appeared to be due mainly to the increasing optical darkness associated with taller forests. The predictive performance of L-band radar is assumed to be due to the increased biomass and presence of large reflectors in taller forests. Both are indirect relationships not directly attributable to vegetation height, and both appeared to saturate at heights of 20–30 m. By comparison, altimetry can provide a more direct estimate of vegetation height, but there the challenge was the extrapolation of sparse along-track measurements to pixels without any observations. Although the optical- and radar-based segmentation approach developed by Scarth et al. (2019) is an innovative solution to the sparse nature of the data, it cannot fully avoid uncertainties due to the extrapolation. Among the different ICESat/GLAS percentiles, LiDAR vegetation height most closely resembled h75, that is, the height below which 75 % of the GLAS returns originated. That VH did not correspond to the highest canopy elements (e.g., h95 or h100) may be because the height derived initially from the LiDAR represented the median of first returns. This likely resulted in a value closer to Lorey’s mean height rather than maximum height.

For biomass, the main challenge was the still relatively small number (N = 5546) of in situ estimates available for training, and the wide range of biomass for tree stands with identical height and canopy cover. This wide range could partly be explained by climate controls, and mean temperature in particular. The greater biomass of cooler forests is well known and can be attributed to lower turnover rates (Keith et al., 2009). However, after accounting for temperature and other climatological factors, a relatively large error (RMSE 80 t ha−1) remained. Apart from the limitations of the available predictors, these differences may partly arise from the training data itself. The database represented a collation of site observations from many different sources, determined by varying methods and allometric relationships, each with biases and errors of their own. Some of the differences between predicted and in situ observations could be attributed with confidence. Among the clearest discrepancies, we predicted a very small or zero biomass for some sites where relatively high biomass values (e.g., WAGB > 50 t ha−1) were reported. Visual inspection showed that many of these sites were tree plantings that were sufficiently large in total area to be included in the training data, but were in effect narrow plantings (e.g. tree belts), too narrow for the surrounding 75 × 75 m pixel region to be representative, despite our best efforts to exclude such sites. Other sites coincided with sparse riparian forests in dry regions. We hypothesise that the use of rainfall as a predictor may have led to an underestimation of the growth potential for these ecosystems, which receive additional water inputs. This illustrates the risk of using predictors that have an environmental but no direct physical connection to biomass. The considerable unexplained variation for high-biomass forests was probably partly related to the underestimation of height, but otherwise could not be attributed with confidence. We hypothesise that age, species composition and ecological history, including fire disturbance, may all be responsible for the unexplained variation.

Explicit consideration of disturbance history, e.g. derived from the cover estimates presented here, may help improve the estimation of height and biomass. More immediately, the recent Global Ecosystem Dynamics Investigation (GEDI) mission can be expected to shine a new light on the underlying reasons for the remaining unexplained variation
Table 5
Comparison of binary forest presence mapping performance between the national NCAS mapping product (Furby, 2002) and thresholding the cover predictions
developed in this study, evaluated against LiDAR-derived cover at four sites. Listed are the probability of correct classification (P) and the LiDAR-derived forest
percentage (ffor) when using that threshold. Also listed are the site-specific threshold of LiDAR-based cover (WCFlim) producing the highest PNCAS at each site.
                Site-specific WCFlim            WCFlim = 0.20           WCFlim = 0.10
Site            WCFlim  PNCAS   PWCF    ffor    PNCAS   PWCF    ffor    PNCAS   PWCF    ffor
1) ACT          0.36    0.82    0.86    49 %    0.65    0.85    66 %    0.71    0.81    75 %
3) Credo        0.12    0.36    0.34    80 %    0.17    0.66    37 %    0.00    0.11    90 %
4) Litchfield   0.26    0.38    0.50    41 %    0.17    0.29    81 %    0.15    0.14    100 %
5) Robson       0.40    0.95    0.97    84 %    0.90    0.95    87 %    0.89    0.92    88 %
mean            0.29    0.63    0.67            0.47    0.69            0.44    0.49
± st.dev.       ± 0.12  ± 0.3   ± 0.3           ± 0.37  ± 0.29          ± 0.43  ± 0.43
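The quantities in Table 5 can be illustrated with a short sketch. It assumes that P is simply the fraction of pixels on which a binary forest map agrees with LiDAR-derived cover thresholded at WCFlim, and that the site-specific WCFlim is found by a grid search over candidate thresholds. The function names, the use of `>=` for the threshold test, the candidate grid and the six-pixel arrays are all hypothetical; they are not taken from the study.

```python
import numpy as np


def p_correct(forest_map, lidar_cover, wcf_lim):
    """Fraction of pixels where a binary forest map matches LiDAR cover >= wcf_lim."""
    reference = np.asarray(lidar_cover) >= wcf_lim
    return float(np.mean(np.asarray(forest_map, dtype=bool) == reference))


def site_specific_threshold(forest_map, lidar_cover, candidates):
    """Candidate threshold that maximises agreement with the forest map (first one on ties)."""
    scores = [p_correct(forest_map, lidar_cover, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]


# Tiny synthetic example (made-up values, not data from the study).
lidar_cover = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55])
ncas_forest = np.array([False, False, False, True, True, True])
predicted_cover = np.array([0.08, 0.12, 0.33, 0.31, 0.50, 0.60])
candidates = np.arange(5, 95, 5) / 100.0  # 0.05, 0.10, ..., 0.90

# Threshold that best reconciles the existing forest map with LiDAR cover.
wcf_lim, p_ncas = site_specific_threshold(ncas_forest, lidar_cover, candidates)
# Apply the same threshold to the predicted cover and score it the same way.
p_wcf = p_correct(predicted_cover >= wcf_lim, lidar_cover, wcf_lim)
```

Scoring both maps against the same thresholded LiDAR reference is what makes the PNCAS and PWCF columns directly comparable within each threshold group.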
Fig. 9. Temporal change in total woody vegetation cover across Australia: (a) annual time series of total extent (in millions of ha, Mha) for different cover thresholds; (b) change in woody vegetation cover between 2000–2003 and 2015–2018 averaged at 25-km resolution; (c–e) false colour composites of temporal cover dynamics for three regions shown in (b); colours indicate low or absent mean cover for the periods 2015–2018 (red), 2000–2003 (blue) and the entire period (2000–2018), respectively (the colour scales were stretched between 0 and 50 % cover for visualisation purposes). Interpretation is as follows: grey shades indicate stable cover, red hues recent vegetation losses, yellow losses earlier in the period, blue recent gains, green losses followed by recovery, and purple initial gains followed by losses. (For interpretation of the references to colour in the Figure, the reader is referred to the web version of this article).
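The extent time series in panel (a) amounts to counting, for each year, the 25-m pixels whose cover meets a threshold and converting that count to area. A minimal sketch is given below, assuming cover is stored as a fraction between 0 and 1 with NaN for missing pixels and that every pixel covers exactly 25 m × 25 m; the function names and the toy arrays are hypothetical.

```python
import numpy as np

PIXEL_AREA_HA = 25 * 25 / 10_000  # one 25 m x 25 m pixel is 0.0625 ha


def extent_mha(cover, thresholds=(0.10, 0.20, 0.30, 0.50, 0.70)):
    """Area (Mha) with cover at or above each threshold, ignoring missing pixels."""
    cover = np.asarray(cover, dtype=float)
    valid = cover[np.isfinite(cover)]
    return {t: np.count_nonzero(valid >= t) * PIXEL_AREA_HA / 1e6 for t in thresholds}


def extent_series(cover_by_year, thresholds=(0.10, 0.20, 0.30, 0.50, 0.70)):
    """Apply extent_mha to each annual cover grid, in chronological order."""
    return {year: extent_mha(grid, thresholds) for year, grid in sorted(cover_by_year.items())}


# Toy 2 x 2 grids standing in for annual continental cover maps.
cover_by_year = {
    2000: np.array([[0.05, 0.15], [0.55, np.nan]]),
    2001: np.array([[0.05, 0.25], [0.75, 0.12]]),
}
series = extent_series(cover_by_year)
```

Because each threshold is evaluated on the same grid, the resulting curves are nested: the extent at a higher threshold can never exceed the extent at a lower one.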
in height and biomass, and provide opportunities to improve the estimates.

4.3. Trends in Australian woody vegetation cover and biomass

The cover time series was used to analyse temporal changes of woody vegetation cover on the Australian continent. For each year, we calculated the total extent of woody vegetation for different cover thresholds (10, 20, 30, 50 and 70 %) at 25-m resolution (Fig. 9).

For the period 2000–2018, the total area of woody vegetation with ≥ 50 % cover (including open and closed forests as defined by Australia’s National Forest Inventory) was 25.3 Mha or 3.3 % of the total land surface area (771 Mha). However, choosing a threshold of 10 % increased the area to 173.4 Mha (22.5 %). As much as 69 % of the land area appeared to have at least some woody vegetation at 25-m resolution (> 1 % cover), although we expect this number to be biased high due to estimation error at very small cover values. Temporal fluctuations occur regardless of definition (Fig. 9a). These changes are well understood and attributable to multi-annual cycles in water availability, with a gradual decline during the Millennium Drought (2001–2009), recovery due to the ‘Big Wet’ (2010–2011) and a subsequent decline due to a return of dry conditions (van Dijk et al., 2013). They are superimposed on a long-term declining trend (2000–2018) due to ongoing land clearing, but the long-term trend varies as a function of threshold. A large −39 % or 46 Mha decrease was calculated for sparse woodlands and shrublands (10–20 % cover), as well as a −22 % or 21 Mha decrease for woodland forest (20–50 % cover), contrasting with an 8 % or 2 Mha increase in open and closed forests (> 50 % cover; Fig. 9). Clearly, the definition of woody vegetation cover has significant consequences for calculated extents, trends and inferences derived from them.

At the national scale, woody cover gains during the 2000–2018 period dominate in many of the coastal regions, whereas losses dominate inland (Fig. 9b). Detailed colour composites illustrate the driving processes (Fig. 9c–e): the planting and harvesting cycles of plantation forestry (strong red and blue hues); the impact of bushfires at different times (patches in multiple colours with sharp boundaries); and the effects of land clearing and drought mortality in more sparse vegetation (light yellow and pink hues).

Estimating the changes in biomass associated with these processes would be valuable, e.g. for understanding the contribution of Australia’s ecosystems to the global carbon cycle. This requires assumptions, because here we were only able to estimate mean biomass for the reference period 2007–2010. An approximate estimate may be obtained by multiplying changes in each cover class (cf. Fig. 9a) with the average biomass for that class. This approach does not account for changes in non-woody (i.e., herbaceous) biomass, dead biomass, below-ground biomass, and soil organic matter. Furthermore, the assumption that biomass increases or disappears in proportion with canopy cover certainly will often not hold at small scales, though it may be justified
Table 6
Summary of the extent and above-ground biomass (WAGB) carbon for woody vegetation in different woody cover fraction (WCF) classes. Also listed are changes in
extent and total carbon between 2000 and 2018, and the mean biomass carbon density for each class. Carbon content was estimated as half of total dry matter.
WCF class all ≥70 % 50–70 % 30–50 % 20–30 % 10–20 % < 10 %
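The class-wise approximation behind this table (change in extent multiplied by the mean biomass of the class, with carbon taken as half of dry matter) can be written as a short calculation. The sketch below uses placeholder class labels and made-up numbers purely to show the arithmetic and units; none of the values are results from the study, and the function name is hypothetical.

```python
CARBON_FRACTION = 0.5  # carbon assumed to be half of total dry matter, as in the caption


def carbon_change_mt(area_change_mha, mean_biomass_t_per_ha):
    """Carbon change per cover class in Mt C.

    Mha x (t dry matter per ha) gives Mt dry matter; multiplying by the
    carbon fraction converts dry matter to carbon.
    """
    return {
        cls: area_change_mha[cls] * mean_biomass_t_per_ha[cls] * CARBON_FRACTION
        for cls in area_change_mha
    }


# Placeholder inputs (illustrative only).
area_change_mha = {"class_sparse": -2.0, "class_dense": 1.0}
mean_biomass_t_per_ha = {"class_sparse": 10.0, "class_dense": 200.0}

per_class = carbon_change_mt(area_change_mha, mean_biomass_t_per_ha)
total_mt_c = sum(per_class.values())
```

As the toy numbers show, a small gain in a dense class can outweigh a larger loss of area in a sparse class, which is why extent changes alone do not determine the sign of the carbon change.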
Government. High performance computing and data storage resources were made available by the National Computational Infrastructure. Z.L. was supported by a travel grant from the Chinese Science Council and the National Natural Science Foundation of China (Contract 41671361). We thank Drs Xingwen Quan, Marta Yebra and Xiangzhuo Liu for their assistance and companionship.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.jag.2020.102209.

References

Baccini, A., et al., 2017. Tropical forests are a net carbon source based on aboveground measurements of gain and loss. Science 358 (6360), 230–234.
Department of Environment and Energy, 2017. Australian National Greenhouse Accounts: National Inventory Report, 2017 I.
Fisher, A., Armston, J., Goodwin, N., Scarth, P., 2020. Modelling canopy gap probability, foliage projective cover and crown projective cover from airborne lidar metrics in Australian forests and woodlands. Remote Sens. Environ. 237, 111520.
Furby, S., 2002. Land Cover Change: Specification for Remote Sensing Analysis. Australian Greenhouse Office, Canberra.
Guerschman, J.P., et al., 2015. Assessing the effects of site heterogeneity and soil properties when unmixing photosynthetic vegetation, non-photosynthetic vegetation and bare soil fractions from Landsat and MODIS data. Remote Sens. Environ. 161, 12–26.
Hansen, M.C., DeFries, R.S., Townshend, J.R.G., Sohlberg, R., 2000. Global land cover classification at 1 km spatial resolution using a classification tree approach. Int. J. Remote Sens. 21 (6), 1331–1364.
Hansen, M.C., et al., 2013. High-resolution global maps of 21st-century forest cover change. Science 342 (6160), 850–853.
Keith, H., Mackey, B.G., Lindenmayer, D.B., 2009. Re-evaluation of forest biomass carbon stocks and lessons from the world’s most carbon-dense forests. Proc. Natl. Acad. Sci. U.S.A. 106 (28), 11635–11640.
Lefsky, M.A., et al., 2005. Estimates of forest canopy height and aboveground biomass using ICESat. Geophys. Res. Lett. 32 (22).
Lehmann, E.A., Wallace, J.F., Caccetta, P.A., Furby, S.L., Zdunic, K., 2013. Forest cover trends from time series Landsat data for the Australian continent. Int. J. Appl. Earth Obs. Geoinf. 21, 453–462.
Liu, Y.Y., et al., 2015. Recent reversal in loss of global terrestrial biomass. Nature Clim. Change 5 (5), 470–474.
Los, S., et al., 2012. Vegetation height products between 60° S and 60° N from ICESat GLAS data. Geosci. Model. Dev. 5, 413–432.
Lucas, R., et al., 2010. An evaluation of the ALOS PALSAR L-band backscatter–above ground biomass relationship Queensland, Australia: impacts of surface moisture condition and vegetation structure. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 3 (4), 576–593.
Lucas, R., et al., 2012. Global forest monitoring with synthetic aperture radar (SAR) data. Global For. Monit. Earth Obs. 1, 287.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I – a discussion of principles. J. Hydrol. 10 (3), 282–290.
Roberts, D., Mueller, N., Mcintyre, A., 2017. High-dimensional pixel composites from earth observation time series. IEEE Trans. Geosci. Remote. Sens. 55 (11), 6254–6264.
Saatchi, S.S., et al., 2011. Benchmark map of forest carbon stocks in tropical regions across three continents. Proc. Natl. Acad. Sci.
Scarth, P., Armston, J., Lucas, R., Bunting, P., 2019. A structural classification of Australian vegetation using ICESat/GLAS, ALOS PALSAR, and Landsat sensor data. Remote Sens. 11 (2), 147.
Shimada, M., et al., 2014. New global forest/non-forest maps from ALOS PALSAR data (2007–2010). Remote Sens. Environ. 155, 13–31.
Simard, M., Pinto, N., Fisher, J.B., Baccini, A., 2011. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. Biogeosci. 116 (G4).
van Dijk, A.I.J.M., et al., 2013. The millennium drought in southeast Australia (2001–2009): natural and human causes and implications for water resources, ecosystems, economy and society. Water Resour. Res. 49, 1–18.
van Dijk, A., Mount, R., Gibbons, P., Vardon, M., Canadell, P., 2014. Environmental reporting and accounting in Australia: progress, prospects and research priorities. Sci. Total Environ. 473–474 (0), 338–349.
Van Dijk, A.I.J.M., Paget, M., Suarez, L., Gale, M., 2018. TERN Airborne LiDAR and Hyperspectral Products Document. Canberra.
van Leeuwen, M., Nieuwenhuis, M., 2010. Retrieval of forest structural parameters using LiDAR remote sensing. Eur. J. For. Res. 129 (4), 749–770.
Wilson, J.P., Gallant, J.C., 2000. Secondary topographic attributes. Terrain anal. Principles appl. 87–131.