
Int J Appl Earth Obs Geoinformation 93 (2020) 102209

Contents lists available at ScienceDirect

Int J Appl Earth Obs Geoinformation


journal homepage: www.elsevier.com/locate/jag

Woody vegetation cover, height and biomass at 25-m resolution across Australia derived from multiple site, airborne and satellite observations
Zhanmang Liao a, Albert I.J.M. Van Dijk b,*, Binbin He a,d, Pablo Rozas Larraondo b, Peter F. Scarth c
a School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
b Fenner School of Environment & Society, Australian National University, Canberra, 2601 ACT, Australia
c Joint Remote Sensing Research Program, School of Earth and Environmental Sciences, University of Queensland, Brisbane, QLD, 4072, Australia
d Center for Information Geoscience, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China

ARTICLE INFO

Keywords: Tree cover; Vegetation height; Woody biomass; Australia; LiDAR; Landsat

ABSTRACT

Detailed spatial information on the presence and properties of woody vegetation serves many purposes, including carbon accounting, environmental reporting and land management. Here, we investigated whether machine learning can be used to combine multiple spatial observations and training data to estimate woody vegetation canopy cover fraction (‘cover’), vegetation height (‘height’) and woody above-ground biomass dry matter (‘biomass’) at 25-m resolution across the Australian continent, where possible on an annual basis. We trained a Random Forest algorithm on cover and height estimates derived from airborne LiDAR over 11 regions and inventory-based biomass estimates for many thousands of plots across Australia. As predictors, we used annual geomedian Landsat surface reflectance, ALOS/PALSAR L-band radar backscatter mosaics, spatial vegetation structure data derived primarily from ICESat/GLAS satellite altimetry, and spatial climate data. Cross-validation experiments were undertaken to optimize the selection of predictors and the configuration of the algorithm. The resulting estimation errors were 0.07 for cover, 3.4 m for height, and 80 t dry matter ha−1 for biomass. A large fraction (89–94 %) of the observed variance was explained in each case. Priorities for future research include validation of the LiDAR-derived cover training data and the use of new satellite vegetation height data from the GEDI mission. Annual cover mapping for 2000–2018 provided detailed insight into woody vegetation dynamics. Continentally, woody vegetation change was primarily driven by water availability and its effect on bushfire and mortality, particularly in the drier interior. Changes in woody vegetation made a substantial contribution to Australia’s total carbon emissions since 2000. Whether these ecosystems will recover biomass in future remains to be seen, given the persistent pressures of climate change and land use.

* Corresponding author. E-mail address: [email protected] (A.I.J.M. Van Dijk).
https://doi.org/10.1016/j.jag.2020.102209
Received 24 April 2020; Received in revised form 26 July 2020; Accepted 30 July 2020
0303-2434/ © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/BY/4.0/).

1. Introduction

Detailed spatial information on the presence and properties of woody vegetation serves various purposes, from land management and planning to nature conservation, fire risk management, and greenhouse gas emissions mitigation and reporting on both sources and sinks. Relevant properties include, in order of increasing complexity: the presence, canopy cover fraction, height, biomass, structure and floristic composition of the woody vegetation component. Ideally, this information is available as a regularly updated time series, allowing it to be used for monitoring, accounting and reporting purposes (van Dijk et al., 2014).

Satellite remote sensing has been shown to be a cost-effective method for deriving temporal information on forest presence and canopy density over large areas. At the global scale, Hansen et al. (2000) used MODIS optical observations to develop the ‘vegetation-continuous fields’ (MODIS VCF) product using a machine learning approach. The information was developed using a regression tree ensemble using 250-m MODIS reflectance that was trained on an aggregated 30-m mapping of projected canopy cover using Landsat; itself verified using very high-resolution imagery. Addressing the increase in spatial resolution necessary to monitor the ongoing degradation of global forest stocks, Hansen et al. (2013) developed a Landsat-based binary forest mapping methodology. These global data sets are a valuable resource for global change studies and as a benchmark. However, they have typically traded off resolution to achieve global coverage and do not always make optimum use of regional or national data to constrain estimates, at least in the Australian context. A number of studies have explored the use of single or multiple sensor satellite observations to assess woody vegetation over Australia. The first attempt to repeatedly map forest
cover over Australia was to support the country’s National Carbon Accounting System (NCAS; Furby, 2002; Lehmann et al., 2013). The NCAS mapping system continues to operate and uses statistical methods applied to Landsat series observations to map the presence or absence of forest, as defined under the Kyoto protocol. Known disadvantages of the mapping product are its categorical nature, which can be challenged by Australia’s extensive low-cover dry woodlands, and the partially counterfactual character of Australia’s Kyoto definition of a forest as “a minimum area of land of 0.2 ha with tree crown cover (or equivalent stocking level) of more than 20 per cent with trees with the potential [sic] to reach a minimum height of 2 m at maturity in situ”. Similarly, the definition of forest used in Australia’s National Forest Inventory has been “An area, incorporating all living and non-living components, that is dominated by trees having usually a single stem and a mature or potentially mature stand height exceeding 2 m and with existing or potential crown cover of overstorey strata about equal to or greater than 20 %”, with an operational minimum area threshold of 0.5 ha (pers. comm. S. Read, ABARES). In contrast, operational mapping programmes such as the Queensland Government’s Statewide Landcover and Trees Study consider the actual rather than potential presence of woody vegetation, and use a foliage projected cover threshold of 10 % rather than a crown cover threshold. These differences in definitions have contributed to contrasting reports regarding trends in forest and woody vegetation more broadly.

Vegetation height and above-ground biomass have proven more challenging to sense remotely. LiDAR measurements are arguably the most reliable method to measure vegetation height, but suitable measurements at global scale were not available until the launch of the Geoscience Laser Altimeter System (GLAS) instrument aboard the ICESat mission launched in 2003. It proved capable of providing global measurements of canopy height (cf. Lefsky et al., 2005; Los et al., 2012; Simard et al., 2011), although the spacing between along-track measurements has limited the accuracy and resolution of vegetation height data.

A comprehensive overview of the development of large-scale forest biomass remote sensing techniques is provided by Lucas et al. (2012). Briefly, vegetation height data from altimetry has provided one approach to estimating biomass, along with active and passive microwave observations (e.g., Baccini et al., 2017; Liu et al., 2015; Saatchi et al., 2011). Lucas et al. (2010) were among the first to explore the use of backscatter by the active L-band ALOS PALSAR radar satellite instrument for biomass estimation and calibrated a conceptual model to field biomass measurements over the state of Queensland. They found that soil moisture had an important confounding influence on backscatter observations, which has since been mitigated by the development of annual PALSAR composites (Shimada et al., 2014). Scarth et al. (2019) developed an approach to integrate the GLAS, PALSAR and Landsat observations for forest structural mapping at 30 m resolution. Except for the low-resolution (∼50 km) passive microwave remote sensing approach of Liu et al. (2015), these remote sensing approaches to forest height and biomass mapping have the disadvantage of being derived from active sensors with a short mission life, meaning that they are not suitable for reliable repeated mapping. ESA’s twin Sentinel-1 radar imagers do provide an ongoing source of C-band backscatter observations, but given the short wavelength of these observations, the signal may be expected to be more sensitive to the canopy than to total forest biomass.

This study builds on the advances presented but aims to capitalise on some further recent innovations. Firstly, standardised processing of Landsat observations through Geoscience Australia’s Digital Earth Australia program has generated a series of annual surface reflectance composites from 1988 onwards (Roberts et al., 2017). So far, these have not been used for forest presence or property mapping. Secondly, a database of spatial vegetation height and fractional cover products over several regions spread across Australia has recently become available and provides a rich database for machine-learning approaches (Fisher et al., 2020; Van Dijk et al., 2018). Our objective was to make optimal use of this expanded range of satellite, airborne and field observations available to produce spatial estimates of woody vegetation cover, height and biomass for Australia at high (25-m) resolution. Where feasible, we produced such estimates annually.

2. Data and methods

2.1. Overall approach

Our overall approach was to use Random Forest machine learning algorithms to spatially estimate (1) woody vegetation canopy cover (‘cover’, WCF as a fraction), (2) vegetation height (‘height’, VH in m) and (3) woody above-ground biomass (‘biomass’, WAGB in t dry matter ha−1). The precise definition of these variables is a function of the training data and is described below. Our intent was to estimate annual time series of each variable. This is directly possible for cover, by using only the annual Landsat geomedians as predictors. However, as will be demonstrated, Landsat observations alone do not allow reliable estimation of height and biomass. We addressed this by developing reference estimates for these variables using additional inputs based on GLAS and PALSAR for the period 2007–2010 during which these observations were made.

2.2. Training data

We used cover and height data derived from airborne LiDAR as training data, as well as in situ biomass estimates from a large number of plots. The LiDAR-based vegetation cover and height data are available from the Terrestrial Ecosystem Research Network (TERN)¹ for 11 areas shown in Fig. 1. Some characteristics of the data are listed in Table 1. The LiDAR point cloud data were acquired and processed for TERN following data and methods fully described in Van Dijk et al. (2018). Briefly, data for the ACT was collected at a nominal point density of at least 4 ppm across the region and 8 ppm over the urban area using a Trimble AX60 system containing a Riegl LMS-Q780 laser instrument operated by RPS Mapping, while the remaining sites were mapped at a nominal point density of 6–8 ppm using a Riegl Q560 operated by Airborne Research Australia.

Woody cover fraction (WCF) was defined as the complement of the gap fraction of the canopy above 2 m height, including both woody and leaf canopy elements, calculated following Fisher et al. (2020). These authors compared field observations of canopy gap fraction above 2 m with various candidate metrics calculated from airborne LiDAR collected over 13 Australian sites, including some of those included here. They found empirically that error and bias were minimal when considering all vegetation returns originating from above 1.5 m height, expressed as a ratio over the total number of vegetation and non-vegetation returns. We applied this simple calculation to estimate WCF from the classified LiDAR points at 5-m horizontal grid resolution.

Vegetation height VH was calculated as the median height of the first return of all pulses at 1- or 2-m resolution (van Leeuwen and Nieuwenhuis, 2010). Both cover and height were resampled to 50-m resolution by calculating the median average to generate training data.

¹ All data can be visualised and downloaded via https://maps.tern.org.au
² http://data.auscover.org.au/xwiki/bin/view/Product+pages/Biomass+Plot+Library

Spatially-continuous biomass training data were not available. Instead, we used the Biomass Plot Library, a database of in situ biomass inventories compiled and published by TERN.² The database is a collation of stem inventory data collected by different government, university, and industry organisations. Allometric models were used to convert individual tree dimensions to biomass in tonnes of dry matter, which were subsequently aggregated for the full inventory plot. From
the full database of 10,823 entries, we rejected those corresponding to plot sizes smaller than 0.03 ha (< 1 % of the database), surveyed before 2000 (17 %), and with an estimated absolute or relative uncertainty in field-based biomass exceeding 75 t/ha (4 %) or 25 % (5 %), respectively. Visual inspection indicated there were some sites where woody vegetation cover was partially or wholly lost between the survey year and the reference period for biomass estimation. Furthermore, several more sites were temporally stable but located on or near the edge of a transition between woody and non-woody vegetation. In some cases, this might have been due to modest (< 30 m) geolocation errors, whereas in other cases it appeared that the plot shape was very elongated (e.g., being part of a tree belt). To address these issues, we sampled a 3-by-3 neighbourhood around the corresponding pixel in the 25-m gridded WCF estimates derived here, for all years between the survey year and the end of the 2010 reference period. Calculating the spatial mean cover for the common period 2007–2010 between the ALOS/PALSAR and ICESat/GLAS missions as a reference value, we rejected those sites with a spatial standard deviation exceeding 10 % cover (4 %); differences between cover for the survey year and reference value exceeding 10 % cover (28 %); and differences between mean cover for the full ICESat/GLAS mission period (2003–2010) and the reference value exceeding 10 % (25 %). The resulting sample size was 5,547, i.e., about half (51 %) of the full database.

Initial training attempts produced overestimates of biomass in non-woody vegetation, most likely because the database did not include any such sites. Therefore, the database was augmented with a large sample of sites without any woody vegetation. We sampled these by combining the national forest presence mapping (Furby, 2002) and MODIS Vegetation Continuous Fields (Hansen et al., 2000) with the cover and height products derived as part of this study. From all locations where all products agreed woody cover was zero and vegetation height less than 2 m, a total of 11,835 samples were drawn, assigned zero biomass and used to augment the woody biomass data.

Fig. 1. Location of the airborne LiDAR data (red, numbered cf. Table 1) and of biomass inventory sites that were (blue) and were not (green) included in the training sample, as well as the augmented sample of zero-woody biomass sites (pink). (For interpretation of the references to colour in the Figure, the reader is referred to the web version of this article).

Table 1
Description of sites with LiDAR-based woody vegetation cover and height data.

Site name                         Survey year   Area (km²)   Training sample   Land cover types
1) Australian Capital Territory   2015          3227         17032             Natural, agricultural, residential
2) Alice Springs (NT)             2014          21           2040              Dry mulga woodland
3) Credo (WA)                     2012          22           2220              Dry open woodland
4) Litchfield (NT)                2013          22           2030              Open savannah forest
5) Robson (Qld)                   2012          28           5155              Tropical rainforest and pasture
6) Rushworth (Vic)                2012          21           2240              Temperate woodland
7) Tumbarumba (NSW)               2011          24           2699              Humid temperate forest
8) Warra (Tas)                    2015          22           2229              Humid temperate forest
9) Watts (Vic)                    2012          22           2230              Humid mountain ash forest
10) Whroo (Vic)                   2012          0.4          440               Temperate woodland
11) ZigZag (Vic)                  2012          22           2270              Temperate forest and open woodland

2.3. Predictors

A primary input to the mapping is an annual time series of Landsat surface reflectance values. Roberts et al. (2017) recently developed an approach to composite repeated Landsat observations over any chosen period into a single higher-dimension geometric median (or geomedian) of internally consistent surface reflectance values. The value of a pixel in a geomedian image is the statistical median of all observations for that pixel from a period of time but calculated in a way that respects the relationship between reflectance in different bands. The product is made available by Geoscience Australia as a 25-m annual geomedian product, based on atmospherically-corrected and cloud-masked Landsat 5, 7 and 8 observations transformed to Australian Albers projection.

Active L-band radar mosaics were used to estimate height and biomass. ALOS PALSAR L-band Fine Beam Dual (HH and HV) polarisation data mosaics are available from JAXA.³ Shimada et al. (2014) describe how the data was ortho-rectified, topographically corrected, and intensity balanced during mosaicking. We used both the HH and HV mosaics and calculated average values for the four years (2007–2010), after which they were resampled to 25-m in Australian Albers projection.

³ https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm
⁴ http://data.auscover.org.au/xwiki/bin/view/Product+pages/ICESat+Vegetation+Structure

The structural vegetation classification for Australia by Scarth et al. (2019) was also used to help estimate height and biomass. The authors first applied image segmentation to the PALSAR and Landsat imagery to create self-similar clusters of pixels (or superpixels). ICESat/GLAS L2 Global Land Surface Altimetry Data for the period 2003–2009 were obtained from the National Snow and Ice Data Centre. These data were processed for differences in footprint size, laser output power, vegetation/ground reflectance, and terrain slope and then aggregated for each cluster into mean vertical profiles. From these, profile height percentiles were extracted (h25, h50, h75, h95 and h100). The metrics for clusters without data were estimated based on similarity criteria. The resulting mapping product is available from TERN.⁴ Because PALSAR mosaics were used alongside Landsat in the segmentation and interpolation steps, the structural vegetation classification is not perfectly independent from the PALSAR backscatter data.
However, in practice the shared information content is minimal and in any event the machine learning method used is robust against correlation between predictors.

Initial testing made it clear that static spatial data sets of climate variables improved the estimation of forest biomass considerably (see Results). To this end, gridded data on long-term average air temperature, precipitation and radiation were obtained from TERN and from the Bureau of Meteorology (BoM). Specifically, gridded mean air temperature and precipitation at 0.05° based on interpolation of station measurements were obtained from BoM, and mean annual radiation based on a combination of solar radiation and illumination modelling on a 3-arcsecond (∼90 m) digital elevation model (Wilson and Gallant, 2000) was obtained from TERN.

The different predictor data sets varied in spatial resolution (25–90 m) and in some cases needed to be reprojected. In training and performance assessment this was allowed for by resampling the predictors and predictands to 50 m and 75 m squared, respectively. The outputs were produced at 25-m resolution, but with the caveat that the spatial accuracy may be somewhat less.

2.4. Methods

We trained Random Forests to predict cover, height and biomass using the spatial input fields. For cover, we used all bands of the Landsat geomedians as candidate predictors. For height, we used these same geomedians, as well as the ALOS/PALSAR mosaics and ICESat/GLAS vegetation height percentiles. For biomass, we used the average of cover predictions for 2007–2010 and the height predictions derived in the previous steps, as well as the Landsat geomedians, ALOS/PALSAR mosaics and gridded climate data.

When trained on LiDAR-derived cover and height, the predictors were resampled to the same 50-m grid as the resampled LiDAR products, after which a regularly-spaced sparse sample was taken to limit computational costs. The sample represented 1.3 % of the available data for the ACT (N = 6838) and close to 25 % (N∼2000) for the other, smaller grids. This produced a total sample size of N = 40,585.

Regression modelling requires that the data distribution is approximately Gaussian. This was the case for cover and height, but the biomass data were positively skewed (skewness 2.37). Taking the cube root made these data close to normally distributed (skewness 1.04).

2.5. Model

The ‘TreeBagger’ function in MATLAB was used. Experiments were carried out with two- to eight-fold cross-validation (and all integer values in between) to determine the most robust model configuration and selection of predictors, using the root mean square error (RMSE) as the selection criterion. The number of leaf nodes was varied from 5 to 50, and the ensemble size from 5 to 500.

To understand the contribution of each of the predictors to performance, mean importance was calculated for k out-of-bag predictor importance sets returned by the algorithm in k-fold cross-validation. We carried out predictor removal experiments to determine the optimal number and set of predictors, in which the algorithm was initially trained with all available predictors and successively retrained, each time removing the least important remaining predictor. The experiment was repeated three times and the set of predictors that consistently outperformed the others was selected.

Generalised empirical response functions between each predictor (x_i) and the predictand (y) were visualised by calculating the median and interquartile range of y for equal x_i intervals between the 2 % lowest and 2 % highest x_i values (see Fig. 2b for an example). An interquartile range that is narrow compared to the range of median values indicates a strong relationship between x_i and y. The same visualisation was made using predicted values (y_est) instead of y. For a reliable prediction, it would be expected that the median and interquartile patterns for y and y_est overlap.

2.6. Evaluation

Validation performance was evaluated using the withheld sample data from the k-fold trials. The RMSE, the coefficient of determination (R²), bias and model efficiency ME (Nash and Sutcliffe, 1970) were calculated as measures of performance. The value of ME is mathematically equal to R² if there is no systematic bias in the predictions and lower if there is. The final predictions (i.e., not those used for validation) were made with an algorithm trained on the full training data set. For cover and height, we also evaluated performance for individual LiDAR capture areas using the full dataset collected.

We analysed the agreement and differences between the cover estimates and alternative vegetation cover products. These included the MODIS VCF product, Normalised Difference Vegetation Index (NDVI), and a Photosynthetic Vegetation cover fraction. MODIS VCF for the year 2015 was reprojected and resampled to 500-m resolution, and compared to WCF for the same year and at the same resolution.

The Normalised Difference Vegetation Index (NDVI) has frequently been used to estimate fractional canopy cover (FC) and the closely related fraction of absorbed photosynthetic radiation (FPAR, often assumed 0.95 times FC). To do so, NDVI is commonly linearly scaled between values of ca. 0.10–0.15 and ca. 0.80–0.85, assuming that a maximum FC of 0.95 is achieved for the latter. NDVI was calculated from the geomedian reflectances.

Guerschman et al. (2015) developed a method to estimate the respective projected fractions of photosynthetic vegetation (PV, i.e. living leaves), non-photosynthetic vegetation (wood and litter) and bare soil from Landsat reflectance. The respective fractions are produced operationally and available through Geoscience Australia. We composited the individual images to bi-monthly median values, and the lowest value (minPV) of the six intervals was compared to WCF. The expectation was that there would be a correlation between our estimated WCF and MODIS VCF, NDVI and minPV, but that NDVI and minPV would generally exceed WCF due to the contribution of short vegetation. The strongest correlation was expected with MODIS VCF, as it represents woody canopy cover only, similar to our definition of WCF. The weakest correlation was expected for NDVI. An intermediate correlation was expected between WCF and minPV, as short herbaceous vegetation tends to senesce at some stage during the year across most of Australia, although with some notable exceptions.

The LiDAR-derived and predicted cover were also compared to the categorical NCAS forest mapping product. Our interest was to determine whether a cover threshold could be found and applied to the LiDAR-observed and predicted cover data to closely match the categorical NCAS mapping, and to interpret any differences that occurred. The NCAS mapping product contains three classes (non-forest, sparse woodland and forest), intended to correspond to cover thresholds of 10 % and 20 %, respectively. Four of the LiDAR areas had sufficient non-forest and forest for a comparison. For the purposes of comparison we merged the sparse woodland class with the non-forest class. We used LiDAR cover to identify the cover threshold for each site that produced the highest rate of agreement, calculated as the fraction of pixels classified correctly. We subsequently applied these site-specific thresholds to the cover product. We also applied pre-determined thresholds of 10 % and 20 % for comparison.

Finally, continental predictions of cover, height and biomass were made, in the case of cover for the period 2000–2018, by distributed processing on high-performance computing infrastructure. These were analysed for consistency with known spatial and temporal patterns of woody vegetation cover, height and biomass.
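The performance measures named in Section 2.6 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' MATLAB code: the function name `evaluation_metrics` is hypothetical, bias is taken as mean predicted minus mean observed, and R² is assumed to be the squared Pearson correlation, which the text does not state explicitly.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return RMSE, bias, R^2 and Nash-Sutcliffe efficiency (ME) for paired values."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residuals are predicted minus observed, so a positive bias means overestimation.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must both vary")

    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    return {
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,
        # Assumption: R^2 as the squared Pearson correlation coefficient.
        "r2": covariance ** 2 / (ss_obs * ss_pred),
        # Nash-Sutcliffe model efficiency (Nash and Sutcliffe, 1970).
        "me": 1.0 - ss_res / ss_obs,
    }


# Toy data (not from the study): predictions perfectly correlated with the
# observations but shifted upwards by a constant 0.1.
observed = [0.1, 0.3, 0.5, 0.7, 0.9]
shifted = [o + 0.1 for o in observed]
metrics = evaluation_metrics(observed, shifted)
print(metrics)
```

In this toy case the constant offset leaves R² at 1 but pulls ME below it, which is the behaviour the text describes for predictions with a systematic bias.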

Fig. 2. (a) Scatter plot of predicted vs. reference Woody Cover Fraction (WCF, as fraction) for validation coloured from low (red) to high kernel density (blue) (only
half of the data pairs are shown for visualisation purposes). (b) Plots showing the distribution of in situ (blueish colours) and predicted (reddish colours) cover as a
function of each predictor (reflectance as a fraction), shown in order of decreasing prediction importance. Shown are the median (line) and inter-quartile range
(band) for equal predictor intervals; purple colours indicate overlapping ranges. Also shown is the sample size for each interval (black line, no scale). (For interpretation
of the references to colour in the Figure, the reader is referred to the web version of this article).
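The binned summaries plotted in Fig. 2b (median and interquartile range of the predictand for equal predictor intervals, restricted to the range between the 2 % lowest and 2 % highest predictor values) could be computed roughly as follows. This is a sketch under stated assumptions: the helper names `percentile` and `binned_response`, the default of 10 intervals and the linear-interpolation percentile rule are illustrative choices, not details given in the paper.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def binned_response(x, y, n_bins=10, trim=2.0):
    """Median and interquartile range of y in equal-width intervals of x.

    Only x values between the `trim` and (100 - trim) percentiles are used.
    n_bins is a hypothetical default; the paper does not state the number used.
    """
    xs = sorted(x)
    x_lo, x_hi = percentile(xs, trim), percentile(xs, 100.0 - trim)
    if x_hi <= x_lo:
        raise ValueError("predictor has no spread inside the trimmed range")
    width = (x_hi - x_lo) / n_bins

    bins = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        if xi < x_lo or xi > x_hi:
            continue  # outside the trimmed predictor range
        idx = min(int((xi - x_lo) / width), n_bins - 1)
        bins[idx].append(yi)

    rows = []
    for i, vals in enumerate(bins):
        if not vals:
            continue  # skip empty intervals
        vals.sort()
        rows.append({
            "x_mid": x_lo + (i + 0.5) * width,
            "n": len(vals),
            "q25": percentile(vals, 25),
            "median": percentile(vals, 50),
            "q75": percentile(vals, 75),
        })
    return rows


# Toy data (not from the study): cover falls linearly with reflectance.
x = [i / 100 for i in range(101)]
y = [1 - v for v in x]
rows = binned_response(x, y, n_bins=5)
for row in rows:
    print(row)
```

Running the same function on predicted instead of reference values gives the second set of curves; overlapping medians and bands then indicate that the model reproduces the empirical relationship.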

3. Results

3.1. Model configuration

In all three cases, we found that calibration and validation performance converged and stabilised if an ensemble of 50 or more was used in combination with 20 leaves per node, regardless of the k-fold sample split chosen. For cover and height, there was no appreciable change in validation performance from two- to eight-fold cross-validation. We interpreted this to be because the training sample was sufficiently large for half of the sample to contain most of the variation present in the population. In the case of biomass, the eight-fold validation RMSE was 4 % smaller than the two-fold validation performance. We interpret this to indicate a risk that the eight-fold cross-validation model might have been overfitted to some degree, and selected four-fold cross-validation, which showed near-identical performance, to reduce that risk. In the remainder of this section, we refer to results obtained with a Random Forest ensemble of 100 with a minimum 30 leaves per node, applied in two-fold (for cover and height) or four-fold (for biomass) cross-validation.

3.2. Performance and predictors selected

The performance measures achieved are listed in Table 2. The R² and ME values were identical, which means that the model was free of any bias. For the biomass values transformed back to their original units, the ME value was still very close to R², suggesting that the transformation did not lead to a large bias.

The predictor removal experiments for cover and height showed that all candidate predictors made a meaningful contribution to validation performance, although the contribution of h25 and h100 to height predictions was marginal. For biomass, the complete set of 18 variables (6 geomedian reflectance, 5 LiDAR heights, 2 radar variables, 3 climate variables, and WCF and VH) did not provide the optimal results. RMSE slightly improved as variables were successively removed, achieving a 4.1 % improved RMSE when only five variables remained. Removing these remaining variables (Rad, Rain, WCF, VH, and finally Temp) did degrade performance. Therefore, the five variables were retained in the selected model.

3.3. Woody canopy cover fraction

The RMSE for predictions of cover was 0.072 and ME was 0.94 (Table 2). A comparison of predicted and LiDAR-derived cover data pairs (Fig. 2a) suggests a good predictive ability for most of the dynamic range, but there was evidence of slight underestimation over nearly closed canopy (e.g., WCF > 0.85). With the exception of near-infrared (NIR) reflectance, empirical relationships between reflectance and cover resembled a ramp function with low and high reflectance corresponding to high and low cover, respectively (Fig. 2b). A more complex and less well-defined relationship was found between NIR and cover. Interpreting these relationships, a combination of high NIR reflectance and low reflectance in the other bands leads to a high cover estimate, whereas alternative combinations produce a lower cover prediction. The predicted relationships and ranges corresponded well with the empirical relationships derived from the LiDAR observations (overlapping lines and shaded areas in Fig. 2b).

A comparison of performance statistics between sites produced a similarly small bias and RMSE as for the full dataset (see Supplementary Material for figures for each site). The worst performance was found for site 9. This could partially be attributed to the fact that the very high cover over this tall mountain ash forest could not be reproduced, and partly to the moderate spatial variance at this site (Table 3).

Table 2
Performance of the trained random forest ensemble in terms of root mean square error (RMSE), coefficient of determination (R²) and Nash-Sutcliffe model efficiency (ME). Also listed is the total sample size (N) and the predictors used in order of decreasing importance. WAGB* denotes performance for the estimates transformed back to t ha⁻¹ and considering the in-situ woody biomass data only, rather than the augmented data set.

Variable               RMSE     R²      ME      N        Variable importance order
WCF (fraction)         0.0715   0.940   0.940   40,585   nir, blue, swir1, swir2, red, green
VH (m)                 3.35     0.889   0.888   40,585   blue, nir, swir1, HV, swir2, green, red, HH, h95, h75, h50, h25, h100
WAGB (t ha⁻¹)^(1/3)    0.667    0.927   0.927   16,681   Temp, VH, WCF, Rain, Rad
WAGB* (t ha⁻¹)         79.8     0.519   0.494   5546     (as above)
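As a rough illustration of how the three measures in Table 2 relate to one another, the sketch below computes RMSE, R² and the Nash-Sutcliffe efficiency for a handful of made-up cover values. It assumes that R² is the squared Pearson correlation, which this section does not state explicitly, and all function names and numbers are hypothetical. Under that assumption, R² is unaffected by a constant offset in the predictions whereas ME is penalised by it, which is one way to read near-identical R² and ME values as an indication of low bias.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def model_efficiency(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical cover fractions, for illustration only.
observed = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
predicted = [0.12, 0.22, 0.43, 0.52, 0.73, 0.82]
# Same predictions with a constant negative offset (a biased model).
biased = [p - 0.10 for p in predicted]

for label, pred in (("unbiased", predicted), ("biased", biased)):
    print(label, round(rmse(observed, pred), 3),
          round(r_squared(observed, pred), 3),
          round(model_efficiency(observed, pred), 3))
```

The offset leaves R² unchanged but lowers ME, so a gap between the two columns of Table 2 (as for WAGB*) points to some bias or slope error in the back-transformed estimates.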

Z. Liao, et al. Int J Appl Earth Obs Geoinformation 93 (2020) 102209

Table 3
Description of sites in terms of LiDAR-derived median and 95 % range of woody cover fraction (WCF) and measures of prediction performance: prediction bias, root mean square error (RMSE), coefficient of determination (R²) and model efficiency (ME). Note: all cover values are given in % to reduce the number of digits.

Site name                         Median (95 % range)   Bias    RMSE   R²     ME
1) Australian Capital Territory   37 (0−71)              0.3    7.3    0.90    0.90
2) Alice Springs (NT)             36 (26−43)            −1.1    3.5    0.56    0.42
3) Credo (WA)                     18 (6−33)              0.3    4.2    0.63    0.62
4) Litchfield (NT)                24 (15−36)             0.2    4.4    0.41    0.40
5) Robson (Qld)                   92 (0−98)             −0.8    8.2    0.91    0.91
6) Rushworth (Vic)                43 (31−49)            −0.3    3.4    0.70    0.70
7) Tumbarumba (NSW)               52 (25−70)             1.4    7.1    0.62    0.61
8) Warra (Tas)                    91 (50−96)            −1.7    9.0    0.34    0.32
9) Watts (Vic)                    86 (66−94)            −3.6    9.1    0.12   −0.41
10) Whroo (Vic)                   42 (36−47)            −0.8    2.7    0.24   −0.05
11) ZigZag (Vic)                  59 (44−82)             1.8    8.1    0.35    0.23

Table 4
Description of sites in terms of median and 95 % range of vegetation height (VH in m) and measures of prediction performance (acronyms as for Table 3).

Site name                         Median (95 % range)   Bias    RMSE   R²     ME
1) Australian Capital Territory   5 (0−17)               0.4    2.7    0.68    0.67
2) Alice Springs (NT)             3 (2−4)                0.1    0.5    0.20    0.11
3) Credo (WA)                     6 (3−8)               −0.3    1.0    0.33    0.26
4) Litchfield (NT)                8 (4−12)              −0.7    1.7    0.48    0.38
5) Robson (Qld)                   26 (0−32)             −0.4    3.3    0.85    0.85
6) Rushworth (Vic)                12 (10−15)            −0.1    1.6    0.12    0.01
7) Tumbarumba (NSW)               22 (13−32)             0.0    3.8    0.54    0.53
8) Warra (Tas)                    27 (9−38)              1.2    6.1    0.36    0.34
9) Watts (Vic)                    38 (19−55)            −3.0    8.2    0.32    0.21
10) Whroo (Vic)                   16 (15−17)            −0.7    1.3    0.06   −1.00
11) ZigZag (Vic)                  17 (12−25)            −0.3    2.6    0.46    0.45

3.4. Vegetation height

The RMSE for height predictions was 3.4 m and ME was 0.88 (Table 2). A comparison of predicted and LiDAR-derived height data pairs (Fig. 3a) suggests some saturation for tall forests, in that the height of vegetation taller than ∼40 m was frequently underestimated. Generally, estimation errors increased with vegetation height, i.e., errors were heteroscedastic and partly proportional to the estimate.

Empirical relationships between Landsat surface reflectance and GLAS height and VH were broadly similar to those with cover, but with greater scatter. The relationship between PALSAR radar backscatter and height conformed to expectation, although the non-monotonic relationship for HH polarisation backscatter did not. The relationship between GLAS and LiDAR height was monotonic and near-linear as expected. Predicted height most closely approached h75 (i.e. the height below which 75 % of the GLAS returns originated) (Fig. 3b).

Performance statistics for individual sites showed a bias within 1 m and an RMSE similar to or less than that of the full dataset for most sites, except for the sites with the tallest forests (8 and 9, Table 4). In both cases, the predictions could not fully reproduce the large height variations (> 20 m) over short distances (see Supplementary Material).

Fig. 3. As Fig. 2 but for vegetation height (VH in m). Units are fraction for Landsat reflectances (red, green, blue, nir, swir1, swir2), dB for PALSAR (HV and HH) and m for altimetry (h25, h50, h75, h95 and h100). (For interpretation of the references to colour in the Figure, the reader is referred to the web version of this article).

3.5. Woody above-ground biomass

The RMSE for biomass predictions was 80 t ha⁻¹ and ME was 0.49 (Table 2). A comparison of predicted and field-measured data pairs (Fig. 4a) suggests saturation at high biomass. However, this effect is much smaller in the transformed data and can therefore be attributed primarily to the non-Gaussian distribution of the biomass data. Indeed, under- and overestimations at high biomass were similarly frequent (Fig. 4a). As a logical consequence of the back-transformation, the estimation error increases with biomass.

The empirical relationships between biomass, cover and height were as expected, that is, biomass monotonically increased with both variables (Fig. 4b). The relationship with mean temperature (Temp) was also as expected, with the highest biomass found at cool sites. The decrease in biomass with local mean radiation (Rad) may be because higher radiation broadly coincides with higher temperature in Australia. The increase in biomass with precipitation (Rain) and stabilisation around 800 mm y⁻¹ rainfall agreed with expectation, but the subsequent decrease did not.

Fig. 4. As Fig. 2 but for woody above-ground biomass (AGB in t/ha); dashed lines in (a) indicate the standard error of the estimate. Predictor-predictand relationships are for mean annual rainfall (Rain in mm), VH (m), WCF (fraction), mean annual temperature (Tmean in °C) and mean daily radiation (Rad in MJ m⁻²).

3.6. Continental mapping

Continental predictions of cover, height and biomass for the period 2007–2010 show strong commonalities in spatial patterns (Fig. 5). Notable differences are the more localised occurrence of tall (> 30 m)
forests when compared to the spatial pattern of high cover values. Biomass broadly follows cover and height patterns but is greater in the southern half of the continent.

3.7. Comparison with alternative vegetation cover measures

Our cover estimates showed reasonable agreement with MODIS VCF woody canopy cover fraction (Fig. 6). MODIS VCF values were within 0.1 units of our WCF estimates for 71 % of all grid cells with WCF > 10 % and within 0.2 units for 95 % of grid cells. MODIS VCF values were generally somewhat lower than WCF values across most of its range (Fig. 6).

A comparison between LiDAR-derived cover and NDVI for the 11 LiDAR sites confirms our expectation that NDVI and minPV values would exceed our cover estimates due to the short vegetation component included in the former two (Fig. 8a). Minimum bi-monthly PV values were typically lower than NDVI and closer to cover, but still higher for low cover and, less expected, lower for high cover values (Fig. 8b).

Four of the LiDAR sites had a sufficiently large sample of both forest and non-forest vegetation in the NCAS mapping product for comparison with cover (Table 5). Thresholds that maximise the agreement between the LiDAR-derived cover and NCAS mapping varied from 0.12 to 0.40. When applying the site-specific threshold optimised for NCAS to predicted cover, the latter still produced a better classification result for three out of four sites. If uniform thresholds of 10 % or 20 % cover were used, our cover predictions provided better classification results than NCAS in all cases.

4. Discussion

4.1. Basis for prediction

The six Landsat bands available could be used to predict the presence and cover fraction of woody vegetation with good accuracy. The information content of the NIR band was notably different from that in the other five bands, which all appeared to show rather similar inverse relationships to cover. In the exploratory stage of this research, we calculated and used various spectral indices proposed in previous studies to classify woody vegetation or forests. However, a Random Forests model using these predictors instead of, or in addition to, the original reflectance showed marginally worse validation performance than the configuration used here. This may be due to the larger number of predictor variables, which can degrade prediction performance if the information content between variables is strongly overlapping. A potential downside of the use of ‘raw’ reflectance is the potential effect of any topographic shadows that remain in the radiometrically-corrected annual geomedian reflectances. We did not find evidence that this caused errors in the cover estimates. We hypothesise that the Random Forest algorithm structure is able to account for differences in illumination, which affects reflectance in all bands.

The prediction of height required a combination of optical, radar and LiDAR altimetry-derived information. Experiments using only a subset of these inputs produced degraded performance. Conversely, the performance of biomass predictions benefited from the removal of redundant variables. We hypothesise that this was due to correlations between cover, height, and the original observations from which they were derived. To test this, we calculated Spearman’s non-parametric correlation between each combination of predictors, using the values included in the biomass training set. These correlation values were used to construct a cluster dendrogram, showing the relationship between predictor variables (Fig. 7). We found that all altimetry-derived measures were highly correlated, as were the two PALSAR-derived predictors and all Landsat bands except NIR. WCF was not most strongly correlated to the Landsat reflectance it was derived from but more closely related to radar and altimetry-derived predictors. We attribute this to the ability of the WCF algorithm to retrieve canopy structure information from the combination of contrasting visible/SWIR vs. NIR responses to canopy cover (Fig. 2). Furthermore, VH was more strongly correlated to the radar than to the altimetry observations, which corresponds with the greater importance of the PALSAR rather than altimetry observations in predicting VH (Fig. 3). We assume this to be a result of the necessary extrapolation of relatively sparse ICESat/GLAS measurements, compared to the greater density of the PALSAR observations. Future work to include GEDI mission observations should provide new opportunities to test this assumption.

4.2. Sources of error and uncertainty

The agreement between LiDAR- and satellite-derived cover estimates was generally robust with low bias, but some sources of differences could still be identified. These were found to originate from both data sources. The geomedian reflectance represented an annual average and was not necessarily representative for canopy cover at the time of LiDAR acquisition, though we did not find cases where this clearly affected the results. Artefacts were visible for mixed pixels containing both vegetation and water, where the lower reflectance of the water background appeared to produce potential overestimates of cover. Some evidence was found in the mapping shown in the Supplementary Material to suggest erroneous results for woody vegetation likely to be shorter than 2 m, such as new plantations and dense shrubland. In those cases, the LiDAR-derived cover would be zero, but if the spectral


Fig. 6. Relationship between woody-vegetation cover fraction (WCF) derived here and the MODIS Vegetation Continuous Fields (VCF) product, both for 2015 and resampled to 500-m across the continent. For each 0.02 WCF interval, the median (connected dots), interquartile range (solid lines) and 90 % range (dotted line) were calculated.

Fig. 7. Cluster dendrogram showing the correlation between variables in the full training dataset used for biomass prediction. WAGB* refers to the plot measurements, whereas WCF and VH refer to predicted values.
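The correlation analysis behind Fig. 7 can be sketched in a few lines. The example below computes Spearman's rank correlation between made-up predictor columns and converts it to a dissimilarity of 1 − |ρ|, from which a dendrogram could be built by hierarchical clustering. The dissimilarity measure, the predictor values and the function names are assumptions for illustration; the paper does not state which distance or linkage method was used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predictor samples (two altimetry heights and NIR reflectance).
predictors = {
    "h50": [2.0, 5.0, 9.0, 14.0, 20.0],
    "h95": [4.0, 9.0, 15.0, 22.0, 31.0],
    "nir": [0.30, 0.22, 0.35, 0.25, 0.28],
}

names = sorted(predictors)
# Assumed dissimilarity: strongly correlated predictors are "close".
distance = {
    (a, b): 1.0 - abs(spearman(predictors[a], predictors[b]))
    for a in names for b in names if a < b
}
closest_pair = min(distance, key=distance.get)
print(closest_pair, distance)
```

In this toy data the two altimetry percentiles are perfectly rank-correlated and therefore merge first, which mirrors the tight altimetry cluster described in Section 4.1.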

Landsat signature was very similar to that of a forest, a non-zero cover was predicted.

There will also have been error and bias in the LiDAR-derived cover estimates. Firstly, the identification of points in the full waveform requires a clear peak intensity. This is more likely to occur in open vegetation with distinct layering than in dense and uniform vegetation, and this introduces a degree of uncertainty into the cover estimates. Secondly, LiDAR pulse density and footprint varied between sites. The specified target pulse density varied from 4 to 8 ppm between sites and within the ACT site. Lower pulse densities can be achieved by higher flight altitudes, and assuming that the LiDAR beam divergence did not change between flights, the LiDAR footprint would have been larger and more diffuse for lower pulse densities. The achieved density of returns also varied widely, from less than 4 to more than 20 ppm, as a result of changes in flight speed and overlapping acquisitions from adjoining flight lines. The above factors lead to inhomogeneity between and within sites. Nonetheless, Fisher et al. (2020) used partly the same LiDAR data as we used, with the same inhomogeneities. Proposing the same calculation also applied here (i.e., the fraction of all LiDAR returns that originate from above 1.5 m height) they achieved an accuracy of ∼0.06 when compared to in-field measurements under the canopy. This accuracy is numerically similar to the agreement between LiDAR- and Landsat-derived cover found here.

Fig. 5. Continental maps of estimated (top) woody cover fraction (WCF), (centre) vegetation height (VH), and (bottom) woody above-ground biomass (WAGB), all representing the period 2007–2010.


Fig. 8. Density scatter plots showing the relationship between LiDAR-derived woody-vegetation cover fraction (WCF) and (a) the Normalised Difference Vegetation
Index (NDVI) derived from the annual Landsat reflectance geomedians; (b) the minimum of bi-monthly average Photosynthetic Vegetation cover fraction derived
following Guerschman et al. (2015).

The main challenge for height estimation and the primary source of error were its indirect relationship with optical and radar observations and the low spatial density of GLAS altimetry observations. The predictive performance of Landsat reflectance appeared to be due mainly to the increasing optical darkness associated with taller forests. The predictive performance of L-band radar is assumed to be due to the increased biomass and presence of large reflectors in taller forests. Both are indirect relationships not directly attributable to vegetation height, and both appeared to saturate at heights of 20–30 m. By comparison, altimetry can provide a more direct estimate of vegetation height, but there the challenge was the extrapolation of sparse along-track measurements to pixels without any observations. Although the optical- and radar-based segmentation approach developed by Scarth et al. (2019) is an innovative solution to the sparse nature of the data, it cannot fully avoid uncertainties due to the extrapolation. Among the different ICESat/GLAS percentiles, LiDAR vegetation height most closely resembled h75, that is, the height below which 75 % of the GLAS returns originated. That VH did not correspond to the highest canopy elements (e.g., h95 or h100) may be because the height derived initially from the LiDAR represented the median of first returns. This likely resulted in a value closer to Lorey’s mean height rather than maximum height.

For biomass, the main challenge was the still relatively small number (N = 5546) of in situ estimates available for training, and the wide range of biomass for tree stands with identical height and canopy cover. This wide range could partly be explained by climate controls, and mean temperature in particular. The greater biomass of cooler forests is well known and can be attributed to lower turnover rates (Keith et al., 2009). However, after accounting for temperature and other climatological factors, a relatively large error (RMSE 80 t ha⁻¹) remained. Apart from the limitations of the available predictors, these differences may partly arise from the training data itself. The database represented a collation of site observations from many different sources, determined by varying methods and allometric relationships, each with biases and errors of their own. Some of the differences between predicted and in situ observations could be attributed with confidence. Among the clearest discrepancies, we predicted a very small or zero biomass for some sites where relatively high biomass values (e.g., WAGB > 50 t ha⁻¹) were reported. Visual inspection showed that many of these were sites with tree plantings that were sufficiently large in total area to be included in the training data, but were in effect narrow plantings (e.g. tree belts) too narrow for the surrounding 75 × 75 m pixel region to be representative, despite our best efforts to exclude such sites. Other sites coincided with sparse riparian forests in dry regions. We hypothesise that the use of rainfall as a predictor may have led to an underestimation of the growth potential for these ecosystems, which receive additional water inputs. This illustrates the risk of using predictors that have an environmental but no direct physical connection to biomass. The considerable unexplained variation for high-biomass forests was probably partly related to the underestimation of height, but otherwise could not be attributed with confidence. We hypothesise that age, species composition and ecological history, including fire disturbance, may all be responsible for the unexplained variation.

Explicit consideration of disturbance history, e.g. derived from the cover estimates presented here, may help improve the estimation of height and biomass. More immediately, the recent Global Ecosystem Dynamics Investigation (GEDI) mission can be expected to shine a new light on the underlying reasons for the remaining unexplained variation

Table 5
Comparison of binary forest presence mapping performance between the national NCAS mapping product (Furby, 2002) and thresholding the cover predictions developed in this study, evaluated against LiDAR-derived cover at four sites. Listed are the probability of correct classification (P) and the LiDAR-derived forest percentage (f_for) when using that threshold. Also listed are the site-specific threshold of LiDAR-based cover (WCF_lim) producing the highest P_NCAS at each site.

                 Site-specific WCF_lim                WCF_lim = 0.20             WCF_lim = 0.10
Site             WCF_lim   P_NCAS   P_WCF   f_for     P_NCAS   P_WCF   f_for     P_NCAS   P_WCF   f_for
1) ACT           0.36      0.82     0.86    49 %      0.65     0.85    66 %      0.71     0.81    75 %
3) Credo         0.12      0.36     0.34    80 %      0.17     0.66    37 %      0.00     0.11    90 %
4) Litchfield    0.26      0.38     0.50    41 %      0.17     0.29    81 %      0.15     0.14    100 %
5) Robson        0.40      0.95     0.97    84 %      0.90     0.95    87 %      0.89     0.92    88 %
mean             0.29      0.63     0.67              0.47     0.69              0.44     0.49
± st.dev.        ± 0.12    ± 0.3    ± 0.3             ± 0.37   ± 0.29            ± 0.43   ± 0.43
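The comparison summarised in Table 5 can be illustrated with a small sketch: a threshold on LiDAR-derived cover is chosen to maximise agreement with a binary forest map, and the same threshold is then applied to predicted cover. The pixel values, the candidate threshold grid and the use of an inclusive (≥) threshold are assumptions made for illustration only; they are not taken from the study.

```python
def agreement(lidar_cover, forest_map, threshold):
    """Fraction of pixels where a binary forest map matches
    LiDAR cover thresholded at `threshold` (inclusive)."""
    matches = sum(
        (cover >= threshold) == bool(is_forest)
        for cover, is_forest in zip(lidar_cover, forest_map)
    )
    return matches / len(lidar_cover)


def best_threshold(lidar_cover, forest_map, candidates):
    """Candidate threshold with the highest agreement (first one on ties)."""
    return max(candidates, key=lambda t: agreement(lidar_cover, forest_map, t))


# Hypothetical pixel samples for one site.
lidar_cover = [0.02, 0.08, 0.15, 0.22, 0.31, 0.45, 0.60, 0.75]
ncas_forest = [0, 0, 0, 0, 1, 1, 1, 1]          # categorical map
predicted_cover = [0.05, 0.06, 0.18, 0.25, 0.28, 0.40, 0.66, 0.70]

candidates = [i / 100 for i in range(5, 80, 5)]  # 0.05, 0.10, ..., 0.75

# Site-specific threshold that favours the categorical map.
wcf_lim = best_threshold(lidar_cover, ncas_forest, candidates)
p_ncas = agreement(lidar_cover, ncas_forest, wcf_lim)

# Apply the same threshold to the continuous cover predictions.
predicted_forest = [cover >= wcf_lim for cover in predicted_cover]
p_wcf = agreement(lidar_cover, predicted_forest, wcf_lim)

print(wcf_lim, p_ncas, p_wcf)
```

Because the threshold is tuned to the categorical map, this setup favours that map by construction, which is why the site-specific columns of Table 5 are the more conservative test of the cover predictions.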


Fig. 9. Temporal change in total woody vegetation cover across Australia: (a) annual time series of total extent (in millions of ha, Mha) for different cover thresholds; (b) change in woody vegetation cover between 2000–2003 and 2015–2018 averaged at 25-km resolution; (c–e) false colour composites of temporal cover dynamics for three regions shown in (b); colours indicate low or absent mean cover for the periods 2015–2018 (red), 2000–2003 (blue) and the entire period (2000–2018), respectively. (The colour scales were stretched between 0 and 50 % cover for visualisation purposes.) Interpretation is as follows: grey shades indicate stable cover, red hues recent vegetation losses, yellow losses earlier in the period, blue recent gains, green losses followed by recovery, and purple initial gains followed by losses. (For interpretation of the references to colour in the Figure, the reader is referred to the web version of this article).
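The extent series in panel (a) amounts to counting, for each year, the pixels at or above a cover threshold and multiplying by the pixel area. The sketch below shows this for a toy set of 25-m pixels (0.0625 ha each); the cover values, the inclusive threshold and the function name are hypothetical and serve only to show how strongly the result depends on the threshold chosen.

```python
# A 25 m x 25 m pixel covers 625 m2, i.e. 0.0625 ha.
PIXEL_AREA_HA = 25 * 25 / 10_000


def extent_ha(cover, threshold):
    """Area (ha) of pixels whose cover fraction is at or above `threshold`."""
    return sum(1 for c in cover if c >= threshold) * PIXEL_AREA_HA


# Hypothetical cover fractions for the same eight pixels in two years.
cover_by_year = {
    2000: [0.05, 0.12, 0.18, 0.35, 0.55, 0.80, 0.15, 0.25],
    2018: [0.03, 0.08, 0.12, 0.30, 0.60, 0.82, 0.09, 0.22],
}
thresholds = [0.10, 0.20, 0.30, 0.50, 0.70]

extent = {
    year: {t: extent_ha(pixels, t) for t in thresholds}
    for year, pixels in cover_by_year.items()
}

for t in thresholds:
    change = extent[2018][t] - extent[2000][t]
    print(f">= {t:.0%} cover: {extent[2000][t]:.4f} ha -> "
          f"{extent[2018][t]:.4f} ha (change {change:+.4f} ha)")
```

In this toy case the extent at the 10 % threshold shrinks while the extent at the 50 % threshold is unchanged, a small-scale analogue of the threshold dependence discussed in Section 4.3.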

in height and biomass, and provide opportunities to improve the estimates.

4.3. Trends in Australian woody vegetation cover and biomass

The cover time series was used to analyse temporal changes of woody vegetation cover on the Australian continent. For each year, we calculated the total extent of woody vegetation for different cover thresholds (10, 20, 30, 50 and 70 %) at 25-m resolution (Fig. 9).

For the period 2000–2018, the total area of woody vegetation with ≥50 % cover (including open and closed forests in accordance with Australia’s National Forest Inventory) was 25.3 Mha or 3.3 % of the total land surface area (771 Mha). However, choosing a threshold of 10 % increased the area to 173.4 Mha (22.5 %). As much as 69 % of the land area appeared to have at least some woody vegetation at 25-m resolution (> 1 % cover), although we expect this number to be biased high due to estimation error at very small cover values. Temporal fluctuations occur regardless of definition (Fig. 9a). These changes are well-understood and attributable to multi-annual cycles in water availability, with a gradual decline during the Millennium Drought (2001–2009), recovery due to the ‘Big Wet’ (2010–2011) and a subsequent decline due to a return of dry conditions (van Dijk et al., 2013). They are superimposed on a long-term declining trend (2000–2018) due to ongoing land clearing, but the long-term trend varies as a function of threshold. A large decrease of 39 % or 46 Mha was calculated for sparse woodlands and shrublands (10–20 % cover), as well as a decrease of 22 % or 21 Mha for woodland forest (20–50 % cover), contrasting with an increase of 8 % or 2 Mha in open and closed forests (> 50 % cover; Fig. 9). Clearly, the definition of woody vegetation cover has significant consequences for calculated extents, trends and inferences derived from them.

At the national scale, woody cover gains during the 2000–2018 period dominate in many of the coastal regions, whereas losses dominate inland (Fig. 9b). Detailed colour composites illustrate the driving processes (Fig. 9c–e): the planting and harvesting cycles of plantation forestry (strong red and blue hues); the impact of bushfires at different times (patches in multiple colours with sharp boundaries) and the effects of land clearing and drought mortality in more sparse vegetation (light yellow and pink hues).

Estimating the changes in biomass associated with these processes would be valuable, e.g. for understanding the contribution of Australia’s ecosystems to the global carbon cycle. This requires assumptions, because here we were only able to estimate mean biomass for the reference period 2007–2010. An approximate estimate may be obtained by multiplying changes in each cover class (cf. Fig. 9a) with the average biomass for that class. This approach does not account for changes in non-woody (i.e., herbaceous) biomass, dead biomass, below-ground biomass, and soil organic matter. Furthermore, the assumption that biomass increases or disappears in proportion with canopy cover certainly will often not hold at small scales, though it may be justified


Table 6
Summary of the extent and above-ground biomass (WAGB) carbon for woody vegetation in different woody cover fraction (WCF) classes. Also listed are changes in extent and total carbon between 2000 and 2018, and the mean biomass carbon density for each class. Carbon content was estimated as half of total dry matter.

WCF class                                                all      ≥70 %   50–70 %   30–50 %   20–30 %   10–20 %   < 10 %
Extent in 2018 (10⁶ ha)                                  531      12      14        33        41        73        357
Extent change 2000–2018 (10⁶ ha)                         −72      1       1         −6        −15       −46       −7
Biomass carbon density (tC ha⁻¹)                         12       95      90        50        23        12        2
Biomass carbon in 2018 (10⁶ tC)                          6596     1107    1223      1669      964       892       742
Biomass carbon change 2000–2018 (10⁶ tC)                 −1,050   130     50        −310      −345      −560      −14
Percentage of total woody vegetation extent (2018, %)    100 %    2 %     3 %       6 %       8 %       14 %      67 %
Percentage of biomass carbon (2018, %)                   100 %    17 %    19 %      25 %      15 %      14 %      11 %
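The arithmetic that links Table 6 to the emission figures quoted in the text can be retraced as follows. The class values are copied from Table 6; the 18-year divisor and the C-to-CO2 mass ratio of 44/12 are assumptions about how the published averages were obtained, although they reproduce the quoted numbers. Variable names are illustrative.

```python
# Values copied from Table 6 (extent change in Mha, carbon density in tC/ha,
# carbon change in MtC), keyed by woody cover fraction class.
extent_change_mha = {"10-20": -46, "lt10": -7}
carbon_density_tc_ha = {"10-20": 12, "lt10": 2}
carbon_change_mtc = {"all": -1050, "10-20": -560, "lt10": -14}

# Extent change times class-mean carbon density approximates the carbon change
# (Mha x tC/ha = MtC); differences from Table 6 reflect rounding of the inputs.
approx_change_mtc = {
    cls: extent_change_mha[cls] * carbon_density_tc_ha[cls]
    for cls in extent_change_mha
}

# Mean annual loss over 2000-2018, assuming an 18-year span.
years = 2018 - 2000
mean_loss_mtc_per_year = -carbon_change_mtc["all"] / years

# Convert carbon mass to CO2 mass using the molar mass ratio 44/12.
co2_mt_per_year = mean_loss_mtc_per_year * 44 / 12

# Share of the total loss that came from classes with < 20 % cover.
sparse_share = (
    (carbon_change_mtc["10-20"] + carbon_change_mtc["lt10"])
    / carbon_change_mtc["all"]
)

print(approx_change_mtc)
print(round(mean_loss_mtc_per_year, 1), round(co2_mt_per_year),
      round(100 * sparse_share))
```

For the 10–20 % class the product of extent change and carbon density gives −552 MtC against the tabulated −560 MtC, which is consistent with the table being computed from unrounded inputs.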

for total continental values.

The resulting calculation suggests that Australia lost 1.1 GtC in biomass between 2000 and 2018, corresponding to an average of 58.3 MtC y⁻¹ or an equivalent emission of 214 Mt CO2-eq y⁻¹ (Table 6). This makes it a large, mostly additional, contribution to Australia’s reported total annual emissions, which varied from 530 to 610 Mt CO2-eq y⁻¹ for the same period according to official Australian government reporting (Department of Environment and Energy, 2017). Vegetation biomass contributions from land use, land-use change and forestry (LULUCF) included in the official accounts represent only a small, and in recent years, negative (i.e. net uptake) component of reportable emissions. This is not necessarily inconsistent with the large emission source we infer, as the official inclusion of LULUCF biomass losses is subject to narrow definitions. Specifically, LULUCF reporting only includes change due to human activities, and the definition of forest applied in official statistics excludes any woody vegetation with less than 20 % forest canopy cover, whereas we found that this category was associated with 55 % of total carbon losses (Table 6). Regardless of these accounting conventions, our results indicate that the combined effects of drought mortality, bushfire and clearing in Australia’s extensive areas of sparse woody vegetation have made a very large contribution to the country’s total emissions. Whether these ecosystems will recover in future remains to be seen, given the persistent pressures of climate change and land use.

5. Conclusions

Our objective was to test whether machine learning can be used to combine multiple sources of spatial predictor and training data to estimate cover, height and biomass at high resolution across the Australian continent, where possible on an annual basis. We draw the following conclusions:

[1] Airborne LiDAR provided a rich source of training data to estimate cover and height from satellite optical, active radar and altimetry observations. The resulting predictions were of relatively high accuracy, explaining 89–94 % of the total variance, with estimation errors of 0.072 for cover fraction and 3.4 m for height, respectively.

[2] The use of annual Landsat surface reflectance geomedians was successful, and points at a way to achieve large-area high-resolution cover mapping that avoids the data quality and volume issues associated with the use of a single scene.

[3] The scarcity of ICESat/GLAS satellite LiDAR observations limited the achievable accuracy of tree height estimation, and through this, the estimation of biomass. The recent GEDI mission would appear to provide a valuable opportunity to improve this aspect of the methodology.

[4] Tree biomass could not be reliably predicted from satellite observations alone, but the inclusion of climate variables, in particular, mean temperature, led to considerably better accuracy. The overall error (RMSE) was 80 t ha⁻¹, dominated by large unexplained variation for (tall) forests. While the plot biomass database provided a unique opportunity for the present study, the highly heterogeneous nature of these field-collected data also would have contributed to the unexplained variance.

[5] Preliminary interpretation of the cover time series demonstrated its suitability for detecting, and in some cases, attributing changes in woody cover. At the continental scale, the temporal dynamics of cover for the period 2000–2018 are primarily driven by changes in water availability and their effect on bush fires and mortality, particularly in sparse woody vegetation in Australia’s drier interior.

[6] The importance of sparse woody ecosystems in Australia makes the estimation of total forest extent and even trends in extent susceptible to the tree cover fraction threshold chosen to define forests. This goes some way to explaining the divergence in published estimates. Comparison with categorical (as opposed to our continuous) forest cover mapping for four regions suggested that the cover data produced here can reliably be used for categorical forest cover mapping following any defined cover threshold.

[7] Loss of woody vegetation has made a very large contribution to Australia’s total carbon emissions since 2000. Whether these ecosystems will recover biomass in future remains to be seen, given the persistent pressures of climate change and land use.

Author statement

Zhanmang Liao and Albert Van Dijk designed this study, carried out the analysis and wrote a first draft of the manuscript. Pablo Rozas Larraondo provided expert guidance in the use of the Random Forest algorithm, Peter Scarth provided expert guidance in the appropriate use of the in situ biomass data and radar and altimetry-based data and Binbin He provided support and mentoring to Zhanmang Liao; all three assisted in completing the manuscript.

CRediT authorship contribution statement

Zhanmang Liao: Conceptualization, Methodology, Software, Investigation, Data curation, Writing - review & editing. Albert I.J.M. Van Dijk: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft, Supervision. Binbin He: Writing - review & editing, Funding acquisition. Pablo Rozas Larraondo: Resources, Writing - review & editing. Peter F. Scarth: Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was made possible thanks to the collation of plot biomass data, the production of continental vegetation structure attributes, and the acquisition and processing of airborne LiDAR observations funded by the Terrestrial Ecosystem Research Network. Additional LiDAR data was acquired by the Australian Capital Territory


Government. High performance computing and data storage resources were made available by the National Computational Infrastructure. Z.L. was supported by a travel grant from the Chinese Science Council and the National Natural Science Foundation of China (Contract 41671361). We thank Drs Xingwen Quan, Marta Yebra and Xiangzhuo Liu for their assistance and companionship.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.jag.2020.102209.

References

Baccini, A., et al., 2017. Tropical forests are a net carbon source based on aboveground measurements of gain and loss. Science 358 (6360), 230–234.
using ICESat. Geophys. Res. Lett. 32 (22).
Lehmann, E.A., Wallace, J.F., Caccetta, P.A., Furby, S.L., Zdunic, K., 2013. Forest cover trends from time series Landsat data for the Australian continent. Int. J. Appl. Earth Obs. Geoinf. 21, 453–462.
Liu, Y.Y., et al., 2015. Recent reversal in loss of global terrestrial biomass. Nature Clim. Change 5 (5), 470–474.
Los, S., et al., 2012. Vegetation height products between 60° S and 60° N from ICESat GLAS data. Geosci. Model. Dev. 5, 413–432.
Lucas, R., et al., 2010. An evaluation of the ALOS PALSAR L-band backscatter–above ground biomass relationship Queensland, Australia: impacts of surface moisture condition and vegetation structure. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 3 (4), 576–593.
Lucas, R., et al., 2012. Global forest monitoring with synthetic aperture radar (SAR) data. Global For. Monit. Earth Obs. 1, 287.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I – a discussion of principles. J. Hydrol. 10 (3), 282–290.
Roberts, D., Mueller, N., Mcintyre, A., 2017. High-dimensional pixel composites from earth observation time series. IEEE Trans. Geosci. Remote. Sens. 55 (11), 6254–6264.
Saatchi, S.S., et al., 2011. Benchmark map of forest carbon stocks in tropical regions across three continents. Proc. Natl. Acad. Sci.
Department of Environment and Energy, 2017. Australian National Greenhouse Accounts: Scarth, P., Armston, J., Lucas, R., Bunting, P., 2019. A structural classification of aus-
National Inventory Report, 2017 I. tralian vegetation using ICESat/GLAS, ALOS PALSAR, and landsat sensor data.
Fisher, A., Armston, J., Goodwin, N., Scarth, P., 2020. Modelling canopy gap probability, Remote Sens. 11 (2), 147.
foliage projective cover and crown projective cover from airborne lidar metrics in Shimada, M., et al., 2014. New global forest/non-forest maps from ALOS PALSAR data
Australian forests and woodlands. Remote Sens. Environ. 237, 111520. (2007–2010). Remote Sens. Environ. 155, 13–31.
Furby, S., 2002. Land Cover Change: Specification for Remote Sensing Analysis. Simard, M., Pinto, N., Fisher, J.B., Baccini, A., 2011. Mapping forest canopy height
Australian Greenhouse Office, Canberra. globally with spaceborne lidar. J. Geophys. Res. Biogeosci. 116 (G4).
Guerschman, J.P., et al., 2015. Assessing the effects of site heterogeneity and soil prop- van Dijk, A.I.J.M., et al., 2013. The millennium drought in southeast Australia
erties when unmixing photosynthetic vegetation, non-photosynthetic vegetation and (2001–2009): natural and human causes and implications for water resources, eco-
bare soil fractions from Landsat and MODIS data. Remote Sens. Environ. 161, 12–26. systems, economy and society. Water Resour. Res. 49, 1–18.
Hansen, M.C., DeFries, R.S., Townshend, J.R.G., Sohlberg, R., 2000. Global land cover van Dijk, A., Mount, R., Gibbons, P., Vardon, M., Canadell, P., 2014. Environmental re-
classification at 1 km spatial resolution using a classification tree approach. Int. J. porting and accounting in Australia: progress, prospects and research priorities. Sci.
Remote Sens. 21 (6), 1331–1364. Total Environ. 473–474 (0), 338–349.
Hansen, M.C., et al., 2013. High-resolution global maps of 21st-century forest cover Van Dijk, A.I.J.M., Paget, M., Suarez, L., Gale, M., 2018. TERN Airborne LiDAR and
change. Science 342 (6160), 850–853. Hyperspectral Products Document. Canberra. .
Keith, H., Mackey, B.G., Lindenmayer, D.B., 2009. Re-evaluation of forest biomass carbon van Leeuwen, M., Nieuwenhuis, M., 2010. Retrieval of forest structural parameters using
stocks and lessons from the world’s most carbon-dense forests. Proc. Natl. Acad. Sci. LiDAR remote sensing. Eur. J. For. Res. 129 (4), 749–770.
U.S.A. 106 (28), 11635–11640. Wilson, J.P., Gallant, J.C., 2000. Secondary topographic attributes. Terrain anal.
Lefsky, M.A., et al., 2005. Estimates of forest canopy height and aboveground biomass Principles appl. 87–131.
