Use of PCA For Vegetation Mapping - Revisited v3

Methods used for a quantitative ecological survey of Botsalano Game Reserve, North West Province. Description of Principal Components Analysis (PCA).

Use of PCA for vegetation surveys - revisited

Jeff W Morris

Introduction
In 1967 the author completed a study of the vegetation of Ntshongweni, Natal, in partial
fulfilment of the requirements for an MSc degree at the University of Natal,
Pietermaritzburg (Morris 1967, 1968). This was arguably the first use of Principal
components analysis (PCA) in South Africa in a plant ecology context. After a hiatus of
50-odd years the author decided to use the technique for a citizen science project in
Botsalano Game Reserve, North West Province. A visitor guide, including history,
location, topography, hydrology, geology, vegetation and tourist facilities is in Morris
(2021).
In a nutshell, PCA is a useful statistical technique that has found application in fields
such as face recognition and image compression, and is a common technique for finding
patterns in data of high dimension (for example, Smith 2002).
A vegetation map of the Reserve was drawn by Bosch (2011) for management purposes.
He gathered 75 samples of the tree, shrub, grass and forb components and arrived at a
classification of five communities using the PHYTOTAB-PC package (Westfall 1992). The
author decided to repeat the exercise by re-sampling only the woody stratum. PCA and
GIS mapping software (QGIS), which was not available in 2011, would be used. The
salient question was, could a meaningful vegetation classification and map be produced
with this small sample and the PCA technique? The other aim was to draw a vegetation
map that would be useful to tourists visiting the Reserve.
The outcome was a new classification and map. This document aims to describe and
discuss the methods used after a brief reminiscence of the earlier study. Finally, the re-
classification is described.

PCA in 1967
A PCA program was written from scratch for the author in FORTRAN IV by the manager of the
Computer centre on the Durban campus of the University of Natal. There were no packages in
those days. One computer served both campuses: a stand-alone IBM 1130 with 8k words of
memory and no terminals or network (PCs were unheard of). Input was on punch cards, which
were fed into a reader, and the results were printed out. All of this was in one room with
the author standing by. Graphic representation of the results was limited to basic x-y
plots produced by a printer, not a plotter. These were re-drawn by hand for publication
purposes. Three-D modelling was by the use of wires mounted on a wooden plank with
plasticine balls of various sizes and colours representing points in three-dimensional
space (Fig. 1). This was state-of-the-art science.

Figure 1. Model of first three principal components from the Ntshongweni dataset.

Methods in 2021
After a brief reconnaissance of the Reserve, it was decided to sample the woody-plant
layer only as a basis for a new vegetation map with tourists in mind.
Initially, samples were placed arbitrarily within sites deemed representative of the five
communities (1.1, 1.2, 2.1, 2.2 and 3) mapped by Bosch (2011) (Fig. 2). Ecotone areas
and clearly-disturbed areas were avoided. Most samples were located along roads or
tracks, often at road crossings. This was to make it easier for follow-up studies to find
the same spot. The Reserve authorities were also not keen that the author venture alone
too far from the roads in case of attack by buffalo! One morning was spent off-road in
the company of an armed guard but this was then deemed unnecessary. The aim was to
obtain 10-15 samples in each Bosch community.
The woodland immediately behind Sentry hill bush camp was recognised as different
from any other in the Reserve. It was not separately mapped by Bosch, being a
relatively small area within the Reserve although it extends some distance onto the
adjacent farm. Seven samples were taken there. Four samples were specifically
collected in areas recognised as having affiliation with Kalahari sandveld rather than
Klerksdorp thornveld (terminology from Mucina & Rutherford 2006), also not separately
mapped by Bosch (2011).

Figure 2. Sites of samples overlain on Bosch (2011) vegetation map.


At each sample point a list was made of the presence of woody species. Starting at a
point, the observer moved in circles with increasing radius around the starting point until
no new species were found. Care was taken not to wander into adjacent Bosch
communities but to stay within the designated community. Only presence was recorded; no
quantitative measure such as abundance, dominance or cover was taken. The diameter of
the sample varied from approximately 20 m to 60 m. A total of 91 samples were collected
with 24 woody species occurring in more than two samples. Sample locations are shown
in Fig. 2, superimposed on a re-drawn Bosch map.
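
The step from field lists to a samples-by-species matrix can be sketched as follows. This is a minimal illustration in Python, not the author's procedure (the analysis itself was run in R). The sample IDs, the placeholder species names and the function name presence_matrix are all hypothetical. The only detail taken from the text is the rule that a species is kept when it occurs in more than two samples.

```python
# Hypothetical field lists: sample ID -> woody species recorded at that point.
# IDs and species names are invented placeholders, not survey data.
field_lists = {
    "S01": {"Species A", "Species B"},
    "S02": {"Species B"},
    "S03": {"Species A", "Species B", "Species C"},
    "S04": {"Species A", "Species C"},
}

def presence_matrix(lists, min_samples=1):
    """Return (sample_ids, species, rows), where rows[i][j] is 1 if species j
    was recorded in sample i and 0 otherwise. Species recorded in fewer than
    min_samples samples are dropped."""
    sample_ids = sorted(lists)
    counts = {}
    for found in lists.values():
        for sp in found:
            counts[sp] = counts.get(sp, 0) + 1
    species = sorted(sp for sp, n in counts.items() if n >= min_samples)
    rows = [[1 if sp in lists[s] else 0 for sp in species] for s in sample_ids]
    return sample_ids, species, rows

# "More than two samples" corresponds to a minimum of three occurrences.
sample_ids, species, rows = presence_matrix(field_lists, min_samples=3)
for sid, row in zip(sample_ids, rows):
    print(sid, row)
```

Recording only presence keeps the matrix binary, so every species carries the same weight in a sample regardless of how abundant it was.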
The raw data and preliminary manipulations have been deposited here:
https://www.scribd.com/document/513771453/Botsalano-PCA-Analysis-Data

Results
Species frequency within samples is shown in Fig. 3. Two species occurred in 81 and 79 of
the 91 samples, respectively. These were Ziziphus mucronata and Grewia flava. Eight of the
24 occurred in 50 or more of the 91 samples and are considered common throughout the
Reserve. There was a long tail of 13 less-common species occurring in fewer than 40
samples. The least-common species occurred in six samples.

Figure 3. Frequency of occurrence of each of the 24 species (vertical axis: number of occurrences).

Species diversity is expressed as the average number of different species recorded in
samples. The lowest diversity was in Umbrella thorn woodland, with an average of 5.5
species per sample, and the highest, 10.2, in Umbrella thorn savanna. Camel thorn
woodland, Blackthorn scrub and Bushwillow woodland had 8.1, 8.6 and 8.9 species per
sample, respectively.
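
Both summaries above are simple sums over the presence/absence matrix: species frequency is a column sum, and the diversity figure is the mean row sum within a community. The sketch below illustrates the arithmetic in Python. The data, the community labels "X" and "Y" and the function names are hypothetical, and it is not the author's own calculation.

```python
from collections import defaultdict

# Hypothetical presence/absence rows (1 = present), one row per sample,
# with an invented community label for each sample.
species = ["Species A", "Species B", "Species C"]
rows = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
]
community = ["X", "X", "Y", "Y"]

def species_frequency(species, rows):
    """Number of samples in which each species occurs (column sums)."""
    return {sp: sum(r[j] for r in rows) for j, sp in enumerate(species)}

def mean_richness(rows, labels):
    """Average number of species per sample within each community."""
    per_community = defaultdict(list)
    for row, label in zip(rows, labels):
        per_community[label].append(sum(row))
    return {label: sum(v) / len(v) for label, v in per_community.items()}

freq = species_frequency(species, rows)
richness = mean_richness(rows, community)
print(freq)
print(richness)
```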
The matrix of samples by species was processed with Principal components analysis (PCA)
using the prcomp() function and associated graphic plotting functions in R (R Core Team
2021). The many publications on using prcomp() were of great assistance as the author had
no prior knowledge of the R programming language. Another very useful publication was on
preparing the data matrix before starting analysis (STHDA 2021).
The plot of eigenvalues shows that the first and second principal axes contain a large
proportion of the variability (about 30%), with each of the remaining axes contributing
relatively little (Fig. 4). This is a typical result.

Figure 4. Eigenvalues for the first 10 components.
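
The percentage attached to each axis is its eigenvalue divided by the sum of all eigenvalues, which equals the trace of the covariance matrix. The sketch below shows that calculation in plain Python. It is an illustration under stated assumptions, not a reproduction of the analysis. The author used prcomp() in R, whereas this sketch extracts eigenvalues of the covariance matrix by power iteration with deflation. It assumes the columns are centred but not scaled, which the text does not confirm. The data and all function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) of the column-centred matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
            for a in range(p)]

def mat_vec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def top_eigenvalues(cov, k, iters=500):
    """Largest k eigenvalues of a symmetric matrix, largest first."""
    p = len(cov)
    work = [row[:] for row in cov]
    values = []
    for _component in range(k):
        # Deterministic, non-uniform start vector, normalised to unit length.
        v = [1.0 + 0.1 * j for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(work, v)))  # Rayleigh quotient
        values.append(lam)
        # Deflation: subtract the component just found before the next pass.
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(p)]
                for a in range(p)]
    return values

def proportion_of_variance(rows, k):
    """Share of total variance carried by each of the first k components."""
    cov = covariance_matrix(rows)
    total = sum(cov[j][j] for j in range(len(cov)))  # trace = total variance
    return [lam / total for lam in top_eigenvalues(cov, k)]

# Hypothetical presence/absence matrix: 5 samples by 3 species.
demo_rows = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]]
demo_props = proportion_of_variance(demo_rows, 2)
print([round(100 * x, 1) for x in demo_props])
```

Power iteration converges slowly when two eigenvalues are nearly equal, so for real data a library routine such as prcomp() is the more practical choice.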
Analysis of the samples with PCA confirmed the earlier field observation that the woodland
at Sentry hill differs from the rest of the Reserve. On the first component the seven
Sentry hill samples are clearly separated from the others (Fig. 5). The second component
separates the hill samples among themselves. There are two intermediate samples which
happen to be positioned downslope of the hill and contain elements of both the hill and
adjacent communities.

Figure 5. Plot of 1st and 2nd components.

In this analysis no clear explanation could be found for the pattern within the large group
of samples on the right-hand, lower end of the first and second components. It was
considered that the gross difference between the hill and the rest of the samples was
distorting the picture within the non-hill samples.
It was decided to perform another PCA
after excluding the seven Sentry hill
samples. After exclusion of the samples,
two species that occurred only in the
seven hill samples, and nowhere else,
were removed from the matrix, leaving
84 samples and 22 species for further
analysis.
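
The reduction from 91 samples and 24 species to 84 samples and 22 species follows from two filters: drop the chosen rows, then drop any species column left with no occurrences. A minimal Python sketch of that step follows. The sample IDs, species names and the function name exclude_samples are hypothetical.

```python
def exclude_samples(sample_ids, species, rows, drop_ids):
    """Remove the listed samples, then remove species that no longer occur
    in any remaining sample. Returns (sample_ids, species, rows)."""
    drop = set(drop_ids)
    kept = [(sid, row) for sid, row in zip(sample_ids, rows) if sid not in drop]
    kept_ids = [sid for sid, _ in kept]
    kept_rows = [row for _, row in kept]
    # A column of zeros carries no information, so it is dropped.
    keep_cols = [j for j in range(len(species)) if any(r[j] for r in kept_rows)]
    new_species = [species[j] for j in keep_cols]
    new_rows = [[r[j] for j in keep_cols] for r in kept_rows]
    return kept_ids, new_species, new_rows

# Hypothetical data: "H" samples stand in for the hill samples.
ids = ["H1", "H2", "S1", "S2"]
spp = ["Hill-only species", "Common species", "Other species"]
mat = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
ids2, spp2, mat2 = exclude_samples(ids, spp, mat, ["H1", "H2"])
print(ids2, spp2, mat2)
```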
Results of the second PCA are presented in Fig. 6. The first component (17% of the total
variation) separated rocky, higher-lying samples, mostly on Kanye formation rocks, from
those in low-lying dry stream valleys with deep clay soils or Kalahari sand. The second
component (10%) neatly separated the clay-soil samples from those on sand on the left-hand
side of the 1st component. The second component also separated the rocky upland samples
into two entities: Open savanna woodland and Closed savanna woodland. Clear gradients
exist between these four entities, as illustrated by the many samples around the centroid.
Only samples at the edges of the two axes can be considered representative of the
vegetation communities that have been identified. The large percentage of very common
species throughout the Reserve probably accounts for the central cluster.

Figure 6. Plot of 1st and 2nd components excluding Bushwillow woodland samples.

Figure 7. Plot of 1st and 3rd components excluding Bushwillow woodland.

Interpretation of the third component (9%) proved more difficult (Fig. 7). The 1st
component is the same as in Fig. 6, separating valley bottoms and low-lying areas from
rocky uplands. On the right-hand side of the 1st component, the extremes of the 3rd
component separate samples on dykes (top), in Closed savanna woodland, from samples in
very open parts of Open savanna woodland (bottom of graph). No interpretation could be
found for the scatter on the 3rd component on the left-hand side of the 1st component.
In addition to making two-dimensional plots of the principal components axes, a
three-dimensional model was also made with scatterplot3d in R (R Core Team 2021).
Interactive views from all angles can be projected in two dimensions as illustrated in
Fig. 8. This technique was not used for interpretation of the results in this
investigation.

Figure 8. Example 3D plot from the Botsalano dataset.

The new vegetation map and descriptions of the communities are given in Morris (2021).

Discussion and conclusions
Influences on the vegetation composition include past management practices like fire
management and overgrazing. Also, aspect and slope steepness affect the vegetation.
Virtually the entire area is underlain by Kanye formation rocks and soils are typically
very shallow. Dyke swarms also influence the vegetation.
In addition to providing the Reserve with a vegetation map for use by tourists, this was
an exercise to see whether a quantitative classification could be achieved based solely on
woody plants with a limited number of samples. The area is rich in grasses and forbs and
a more comprehensive assessment could have been achieved with their inclusion.
Sampling time would, however, have been increased enormously. The exercise is
considered successful and could be a template for studies of other similar reserves and
farms.
The author had a map by a previous researcher as a base by which samples could be
stratified. Distribution of samples without that base would have been much more
difficult. Either many more samples would be needed or more time would have to be
spent on preliminary investigations.
The strategy of concentrating sampling along roads and tracks, however, meant that
large expanses were not represented in samples (Fig. 2). This has led to mapping
decisions having to be made on the basis of site visits after completion of the analysis,
satellite photograph interpretation and reference to the previous map.

Acknowledgements
I thank Mike Panagos for discussions about the area and for providing unpublished
material. Dave Berger put me on the trail for R programming. Braam van Wyk was most
helpful with identifying plants.
The Reserve Manager and staff are thanked for permission to do fieldwork and for help
in many other ways.

References
Bosch, AD (2011) The Vegetation Management of the Botsalano Game Reserve in the
North West Province, South Africa. Unpublished Magister Technologiae: Nature
Conservation thesis, Faculty of Science, Tshwane University of Technology, Pretoria.
Morris, JW (1967) Descriptive and Quantitative Plant Ecology of Ntshongweni, Natal.
Unpublished MSc thesis, University of Natal, Pietermaritzburg.
Morris, JW (1968) An Ordination of the Vegetation of Ntshongweni, Natal. Bothalia -
African Biodiversity and Conservation 10(1) 89-120.
Morris, JW (2021) Botsalano Game Reserve. A visitor guide. Unpublished pdf report.
https://www.scribd.com/document/513783370/Botsalano-Visitor-Guide-v9
Mucina, L & Rutherford, MC (eds) (2006) The vegetation of South Africa, Lesotho and
Swaziland. Strelitzia 19. South African National Biodiversity Institute, Pretoria.
R Core Team (2021) R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Smith, LI (2002) A tutorial on Principal Components Analysis. Technical Report
OUCS-2002-12. Department of Computer Science, University of Otago, New Zealand.
STHDA (2021) Statistical tools for high-throughput data analysis.
http://www.sthda.com/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r
Westfall, RH (1992) Objectivity in stratification, sampling and classification of
vegetation. PhD thesis. University of Pretoria, Pretoria.

[email protected] 2021 07-01
