Catena 196 (2021) 104886

Contents lists available at ScienceDirect

Catena
journal homepage: www.elsevier.com/locate/catena

Ensemble learning-based classification models for slope stability analysis

Khanh Pham a,b, Dongku Kim c, Sangyeong Park c, Hangseok Choi c,⁎

a Department of Civil Engineering, International University, Ho Chi Minh City, Viet Nam
b Vietnam National University, Ho Chi Minh City, Viet Nam
c School of Civil, Environmental, and Architectural Engineering, Korea University, Seoul, Republic of Korea

ARTICLE INFO

Keywords: Ensemble classifier; Ensemble learning; Slope stability analysis; Machine learning

ABSTRACT

In this study, ensemble learning was applied to develop a classification model capable of accurately estimating slope stability. Two prominent ensemble techniques (parallel learning and sequential learning) were applied to implement the ensemble classifiers. Additionally, for comparison, eight versatile machine learning algorithms were utilized to formulate the single-learning classification models. These classification models were trained and evaluated on the well-established global database of slopes documented from 1930 to 2005. The performance of these classification models was measured by considering the F1 score, accuracy, receiver operating characteristic (ROC) curve and area under the ROC curve (AUC). Furthermore, K-fold cross-validation was employed to fairly assess the generalization capacity of these models. The obtained results demonstrated the advantage of ensemble classifiers over single-learning classification models. When ensemble learning was used instead of the single learning, the average F1 score, accuracy, and AUC of the models increased by 2.17%, 1.66%, and 6.27%, respectively. In particular, the ensemble classifiers with sequential learning exhibited better performance than those with parallel learning. The ensemble classifiers on the extreme gradient boosting (XGB-CM) framework clearly provided the best performance on the test set, with the highest F1 score, accuracy, and AUC of 0.914, 0.903, and 0.95, respectively. The excellent performance on the spatially well-distributed database along with its capacity to distribute computing indicates the significant potential applicability of the presented ensemble classifiers, particularly the XGB-CM, for landslide risk assessment and management on a global scale.

1. Introduction

Landslides are one of the most severe disasters that cause considerable damage to human lives and economies. Therefore, understanding the collapse mechanisms and accurately estimating the slope stabilities are crucial for landslide risk assessment and management. Owing to the uncertainty and nonlinear nature of geomaterials, it is difficult to reliably evaluate the safety of the slope using conventional physics-based models (e.g., slice method and numerical approach). Michalowski (1995) raised concerns on the accuracy of the slice methods, which have been routinely used in practice, because it is impossible to determine the static admissibility of stress fields or the kinematic admissibility of the collapse mechanism using these methods. The numerical approach, which incorporates computational techniques with the shear strength reduction algorithm (Dawson et al., 1999; Griffiths et al., 1999; Matsui and San, 1992), outperforms the slice methods by eliminating the assumptions on interslice forces and providing essential information for tracing progressive failure (Lechman and Griffiths, 2000). However, this approach requires a deep understanding of soil behavior, which can be ideally described by sophisticated constitutive laws. Moreover, obtaining the solution normally requires prior assumptions and simplifications, which subsequently govern the accuracy of the employed approach (Keaton, 2007).
The remarkable development of machine learning (ML) algorithms along with the extensive data accumulation in this field provides a valuable alternative to the physics-based models that learns and recognizes the failure patterns of slopes under different circumstances. Several ML-based models have been proposed to estimate the safety of slopes with a certain level of success (Kang et al., 2017; Lin et al., 2009; Manouchehrian et al., 2014; Samui, 2013). In addition, various

Abbreviations: ML, machine learning; KNN, K-nearest neighbor; SVM, support vector machine; SGD, stochastic gradient descent; GP, Gaussian process; QDA,
quadratic discriminant analysis; GNB, Gaussian naïve Bayes; DT, decision tree; XGB, extreme gradient boosting; ANN, artificial neural network; RF, random forest;
AB, adaptive boost; GB, gradient boosting; ROC, receiver operating characteristic; AUC, area under the curve; FP, false positive; FN, false negative; TP, true positive;
TN, true negative; TPR, true positive rate; FPR, false positive rate; STD, standard deviation; TNR, true negative rate

⁎ Corresponding author.
E-mail address: [email protected] (H. Choi).

https://doi.org/10.1016/j.catena.2020.104886
Received 13 January 2020; Received in revised form 27 August 2020; Accepted 28 August 2020
0341-8162/ © 2020 Elsevier B.V. All rights reserved.
K. Pham, et al. Catena 196 (2021) 104886

optimization algorithms (e.g., differential evolution, firefly algorithm, and particle swarm optimization) have been integrated with these ML models to improve their performance (Das et al., 2011; Hoang and Pham, 2016; Xue, 2017). However, the prediction accuracy of these ML-based models might be excessively sensitive to data properties (e.g., distribution of attributes and data volume) owing to the mathematical assumptions of the utilized ML algorithm (Lin et al., 2018; Qi and Tang, 2018a).
Ensemble learning, inspired by the law of large numbers, has been validated as one of the most efficient approaches for improving the performance of ML models (Hansen and Salamon, 1990; Kuncheva, 2004; Zhou et al., 2002). Fundamentally, ensemble learning enhances the performance of ML models by embracing the diversity of its base predictors to learn different perspectives of the database. This diverse set of predictors can be obtained either by employing different learning algorithms (i.e., heterogeneous ensemble learning) or by utilizing one single-learning algorithm trained with random subsets of the training set (i.e., homogeneous ensemble learning).
Although ensemble learning has demonstrated its overwhelming advantages in solving actual problems in various fields, its application is relatively limited in the context of slope stability analysis, as only heterogeneous ensemble learning has been applied to evaluate the safety of slopes (Qi et al., 2018; Qi and Tang, 2018b). Recently, Bragagnolo et al. (2020) utilized homogeneous ensemble learning to formulate a framework for mapping landslide susceptibility. Although these studies worked on different types of data, they supported the superiority of ensemble learning in significantly enhancing the accuracy of predictions. In addition to parallel learning (i.e., homogeneous and heterogeneous ensembles), ensemble learning possesses a variety of abilities that have not been recognized in this field.
By analyzing the performance of ensemble classifiers using various learning concepts (i.e., parallel learning and sequential learning), this study attempts to systematically examine the ensemble learning used in estimating the safety of the slope. Furthermore, single-learning classification models formulated on the framework of versatile ML algorithms were also presented for comparison. The classification models in the examination were trained and evaluated on a published database of 153 slope cases collected from field investigations at different locations worldwide. The K-fold cross-validation was also applied for a fair assessment.

2. Machine learning algorithms and ensemble learning

2.1. Machine learning algorithms

Complete details of the eight ML algorithms that are considered herein have been well presented in literature (Bishop, 2006; Rasmussen and Williams, 2006; Shalev-Shwartz and Ben-David, 2013). In this section, the advantages and disadvantages of each algorithm are summarized, along with the approaches for controlling their performance.
K-nearest neighbor (KNN) is a non-parametric algorithm that does not assume the underlying data distribution. The training phase of KNN is high-speed; however, the algorithm is computationally expensive at prediction time. The critical factors governing the performance of KNN include the number of nearest neighbors (k) of each query point and the leaf size (leaf_size) that affects the speed of construction and query (Pedregosa et al., 2011).
Support vector machine (SVM) is a non-parametric algorithm proposed by Cortes and Vapnik (1995). SVM usually performs well even with high-dimensional data. However, it requires significant amounts of computer resources and occasionally suffers from numerical instability when determining solutions to optimization problems. The generalization error of SVM can be controlled by the regularization parameters, C and γ.
Stochastic gradient descent (SGD) is an iterative method used to optimize a differentiable cost function. Although SGD is considerably fast and enables training with massive training sets, its final solution (i.e., the model parameters) is satisfactory, but not optimal (Géron, 2017). The SGD uses two crucial hyperparameters, the number of epochs (n_iter) and α, which regulates the learning rate, to control the overfitting issue experienced during the training phase.
Gaussian process (GP) is a generic supervised learning method used for binary classification. The main working principle involves placing a GP over the latent function beforehand, which is then squeezed through a link function to obtain a probabilistic classification (Rasmussen and Williams, 2006). Because the prediction is made in terms of Gaussian probability, the empirical confidence intervals can be computed and used to refit the prediction in a region of interest. However, by utilizing the information of entire samples, GP presumably loses efficiency in the case of high-dimensional spaces (Pedregosa et al., 2011).
Quadratic discriminant analysis (QDA) is a modification of the linear discriminant analysis; QDA eliminates the assumption of equal covariance matrices among the groups. This algorithm is desirable because it provides closed-form solutions that are generally suitable in practice. Furthermore, no hyperparameters are required to tune QDA.
Gaussian naïve Bayes (GNB) is formulated based on Bayes’ theorem along with the naïve assumption of conditional independence among the input features. Although its underlying assumption is rarely applicable in real data, GNB performs quite well in practice. Furthermore, GNB requires a relatively small amount of training data and is computationally fast. Moreover, GNB can alleviate severe problems related to the curse of dimensionality (Pedregosa et al., 2011).
Decision tree (DT) is a non-parametric model capable of fitting complex datasets. During the training phase, a simple decision rule is inferred from data features to formulate the model. DT is a white-box model, in which all the information regarding model behaviors and influential variables is available. Two vital elements of the white-box model include interpretable model features and the transparent learning process. DT does not require strict data preparation and is commonly used as the base predictor for ensemble learning (e.g., random forest, adaptive boosting, gradient boosting, and extreme gradient boosting). Nevertheless, DT is susceptible to overfitting and instability; generally, the learning phase terminates at a reasonably good solution, not the optimal one. In this study, two hyperparameters were employed to control the overfitting issue: max_depth to determine the maximum depth of the DT, and max_leaf_nodes to define the maximum number of leaf nodes.
Artificial neural network (ANN) combines multiple neurons connected in the form of a network. The advantage of ANN is its ability to learn nonlinear models. However, the need to determine an optimal architecture and to tune hyperparameters is a disadvantage of ANN. Furthermore, ANN is sensitive to input feature scaling.

2.2. Ensemble learning

Ensemble learning is a technique used to aggregate individual learning algorithms, known as base predictors, to yield a potentially superior predictor. The diversity among the base predictors plays a vital role in governing the final prediction accuracy (Kuncheva and Whitaker, 2003). Additionally, Minaei-Bidgoli et al. (2014) demonstrated the effect of the resampling method and its adaptation to the efficacy of clustering ensemble. Kuncheva (2004) comprehensively presented the algorithms to combine pattern classifiers. Fundamentally, ensemble learning can be sorted into two groups according to their learning concept: parallel ensemble and sequential ensemble.

2.2.1. Parallel ensemble

Parallel ensemble trains the base predictors in parallel to utilize the characteristics of independence between them. One of the advantages of parallel ensemble is the ability to utilize different CPU cores or machines to execute training and predicting simultaneously. The base predictors can be different learning algorithms (i.e., heterogeneous


ensemble) or a single-learning algorithm (i.e., homogeneous ensemble). Owing to their different mathematical backgrounds, heterogeneous ensemble exploits the diversity of base predictors to increase the probability that they make different types of errors. Consequently, the overall prediction accuracy can be improved. For the heterogeneous ensemble classifier, the eight ML algorithms, as mentioned earlier, were employed in this study; their hyperparameters were tuned via the grid search algorithm. Fig. 1 presents the flowchart of a heterogeneous ensemble classifier.

Fig. 1. Flowchart of heterogeneous ensemble classifier.

Besides, a set of diverse classifiers can be achieved using one learning algorithm for all base predictors; however, it must be trained with different random subsets of the training set. Sampling can be carried out either with replacement (i.e., bagging) or without replacement (i.e., pasting). Although these two techniques allow the training data to be sampled several times across the base predictors, only the bagging technique permits sampling data several times for the same predictor. Fig. 2 illustrates the flowchart of a homogeneous ensemble classifier. Random forest (RF) is a well-known example of a homogeneous ensemble, in which the DT is used as the base predictor along with bootstrap sampling. Despite its simplicity, RF is considered as one of the most powerful conventional ML algorithms used in practice. In addition to RF, this study adopted the bagging technique along with utilizing SVM, ANN, and KNN as the base predictors to formulate the homogeneous ensemble classifiers.
Final predictions of both the heterogeneous and homogeneous ensemble classifiers can be obtained by hard-voting or soft-voting. Hard-voting aggregates the predictions of each base predictor, and the class with majority votes is determined as the final decision. In soft-voting, the final prediction is the class with the highest probability averaged over the base predictors. Generally, soft-voting performs better than hard-voting because the former places more weight on highly reliable votes (Géron, 2017).

2.2.2. Sequential ensemble

The sequential ensemble, also known as boosting, trains the base predictors sequentially, with each newly added predictor attempting to correct its predecessor. The predictor focuses more on challenging cases to improve prediction accuracy. Fig. 3 illustrates the flowchart of the sequential ensemble.
One of the most popular sequential ensemble algorithms, originally proposed by Freund and Schapire (1997), is Adaptive Boost (AB), in which a new predictor attempts to correct its predecessor by adding more weight on the unfitted training instances. This study briefly summarizes the AB learning concept; details regarding the implementation can be found in Freund and Schapire (1997). In the initial state, each instance weight $w^{(j)}$ ($j = 1, \ldots, m$, where $m$ is the number of training instances) is set equal to $1/m$, and the subsequent steps of the training process are as follows:

while stop criteria = false:
    Train base predictor $i$ to obtain the $i$th prediction: $\hat{y}_i^{(j)}$
    Error rate: $r_i = \dfrac{\sum_{j=1,\ \hat{y}_i^{(j)} \neq y^{(j)}}^{m} w^{(j)}}{\sum_{j=1}^{m} w^{(j)}}$ ($y^{(j)}$: ground truth)
    Predictor weight: $\alpha_i = \eta \log \dfrac{1 - r_i}{r_i}$ ($\eta$: learning rate)
    Update instance weight: $w^{(j)} = \begin{cases} w^{(j)} & \text{if } \hat{y}_i^{(j)} = y^{(j)} \\ w^{(j)} e^{\alpha_i} & \text{if } \hat{y}_i^{(j)} \neq y^{(j)} \end{cases}$
    Normalize instance weight: $w^{(j)} = \dfrac{w^{(j)}}{\sum_{j=1}^{m} w^{(j)}}$
    Next base predictor: $i \mathrel{+}= 1$    (1)

where the stop criterion is met when the number of predictors $n_{predictor}$ is reached.
For the new input $x$, all base predictors of AB are executed to obtain the predictions along with their predictor weights. The class with the majority of weighted votes is determined as the final prediction $\hat{y}(x)$, as expressed in Eq. (2). In particular, the overfitting issue can be controlled by tuning $n_{predictor}$ or using the efficiently regularized base predictors.

$$\hat{y}(x) = \arg\max_{k} \sum_{i=1,\ \hat{y}_i(x) = k}^{n_{predictor}} \alpha_i \qquad (2)$$
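The update rules of Eq. (1) and the weighted vote of Eq. (2) can be sketched in plain Python. This is only a minimal illustration and not the implementation used in the study: the base predictor is assumed to be a one-split decision stump, the labels are assumed to be 0/1, and the function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`), the clipping of the error rate, and the toy data are hypothetical choices made for the example.

```python
import math

def fit_stump(X, y, w):
    """Search all one-split stumps and return the one with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for label_right in (0, 1):
                pred = [label_right if row[f] > t else 1 - label_right for row in X]
                err = sum(wj for wj, p, yj in zip(w, pred, y) if p != yj)
                if best is None or err < best[0]:
                    best = (err, f, t, label_right)
    return best[1:]  # (feature index, threshold, label predicted when value > threshold)

def stump_predict(stump, row):
    f, t, label_right = stump
    return label_right if row[f] > t else 1 - label_right

def adaboost_fit(X, y, n_predictor=10, eta=1.0):
    m = len(X)
    w = [1.0 / m] * m                         # initial instance weights w_j = 1/m
    ensemble = []
    for _ in range(n_predictor):              # stop criterion: n_predictor reached
        stump = fit_stump(X, y, w)
        pred = [stump_predict(stump, row) for row in X]
        # weighted error rate r_i of the current base predictor
        r = sum(wj for wj, p, yj in zip(w, pred, y) if p != yj) / sum(w)
        r = min(max(r, 1e-10), 1 - 1e-10)     # assumption: clip so the log stays finite
        alpha = eta * math.log((1 - r) / r)   # predictor weight alpha_i
        ensemble.append((alpha, stump))
        # increase the weights of misclassified instances, then normalize
        w = [wj * math.exp(alpha) if p != yj else wj for wj, p, yj in zip(w, pred, y)]
        total = sum(w)
        w = [wj / total for wj in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Eq. (2): the class with the largest sum of predictor weights wins."""
    votes = {0: 0.0, 1: 0.0}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(votes, key=votes.get)

# Hypothetical 1-D toy data that no single stump can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, 0, 0, 1, 1]
model = adaboost_fit(X, y, n_predictor=3)
predictions = [adaboost_predict(model, row) for row in X]
```

On this toy set the first stump misclassifies two of the six equally weighted points, so its error rate is 1/3 and its predictor weight is log 2; the reweighting then steers the next stumps toward those two points.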
Breiman (1997) recast AB in the statistical framework, which was
then developed into gradient boosting (GB) by Friedman (2001). Con­
ceptually, GB is the combination of AB and weighted minimization, in
which the residual errors made by the previous predictor are fed to the
new predictor. The objective of GB is to minimize the loss of a model by
sequentially adding new base predictors using a gradient descent-like
procedure. The difference between GB and AB is that GB freezes the
weights of the predecessor whenever the new predictor is added. The
three principal components of GB are loss function, weak learner, and
additive model. For the classification task, the logarithmic loss and DT
are broadly used. Mason et al. (2000) proposed a functional-gradient-
descent algorithm that aids the addition of new predictors in the di­
rection of reducing the residual loss of the model. The new predictors
are added until the predefined number is achieved or the loss in the
validation set does not improve on the next iterations. GB overcomes
the overfitting problem by employing more regularized base predictors
or adjusting the learning rate for updating the weights. Additionally,
random sampling can be used to reduce the variance, in which the
subsets of the training set are sampled randomly to train each base

Fig. 2. Flowchart of homogeneous ensemble classifier.


Fig. 3. Flowchart of sequential ensemble learning.

predictor.
The most significant drawback of the sequential ensemble is that it cannot be parallelized, which limits its scalability (Géron, 2017). However, Chen and Guestrin (2016) developed extreme gradient boosting (XGB), a scalable tree boosting system, on the framework of GB to increase computational speed. The advantages of XGB are evident in its ability to support distributed training and integrate with the cloud dataflow system. Distributed training on multiple machines is implemented using built-in interfaces to integrate them with distributed computing frameworks (e.g., Dask) to perform feature engineering or allocate base predictors. Furthermore, XGB utilizes more regularized models to settle the overfitting issue, due to which it performs better than GB (Chen and Guestrin, 2016). Consequently, XGB dominates most of the other learning algorithms in recent ML comparisons. In addition to AB and GB, this study applied XGB to formulate boosting classifiers.

3. Database

This study analyzed the database of 153 slope cases documented in published literature (Chen et al., 2011; Feng, 2000; Sakellariou and Ferentinou, 2005; Wang et al., 2005; Xu et al., 1999). The database contains information regarding the geological conditions, geometry, stability status, and location of the slope obtained from field investigations during the period from 1930 to 2005. Fig. 4 illustrates the locations of the slope cases considered in this study. It is observed that these slope locations were relatively well-distributed across different areas (e.g., Europe, Asia, and North America) around the world. The database of Sakellariou and Ferentinou (2005) consists of 46 slope cases, which were adjusted from the original database of Sah et al. (1994). The 53 rock slope cases investigated by Chen et al. (2011) were obtained from a mountainous area in Guizhou, a southeastern province of China. The other 26 slope cases explored and statistically summarized by Wang et al. (2005) represented typical and large-scale slopes with a high probability of failure, located in the Qing river basin region, China. The remaining slope cases obtained from the databases of Xu et al. (1999) and Feng (2000) were rock slopes.
Five factors defined the geological and geometry conditions of slopes, namely: unit weight γ (kN/m³), cohesion c (kPa), internal friction angle φ (rad), slope height H (m), and slope inclination angle β (rad). These five factors were then utilized as the input features of the classification models. The stability status of the slope cases was identified as either stable (S) or failure (F). Fig. 5 compares the numbers of failure and stable cases, indicating a slight skew toward the stable slopes.
Table 1 summarizes the statistical descriptions of the five factors in the examined database. According to statistics, the values of these

Fig. 4. Approximate locations of slope cases in consideration.


factors have broad ranges, thereby indicating that the database consists of diverse soil types and slope conditions. Fig. 6 illustrates the distribution of these factors in their ranges.

Fig. 5. Comparison of number of failures and stable cases in the examined database.

Table 1
Statistical descriptions of input features of database.

| Statistical descriptions | γ (kN/m³) | c (kPa) | φ (rad) | β (rad) | H (m) |
|---|---|---|---|---|---|
| Number | 153 | 153 | 153 | 153 | 153 |
| Mean | 22.60 | 34.73 | 0.50 | 0.60 | 97.26 |
| Standard Deviation | 3.90 | 43.19 | 0.16 | 0.18 | 115.46 |
| Min | 12 | 0 | 0 | 0.17 | 3.66 |
| Q1 (25%) | 20.41 | 11.97 | 0.43 | 0.49 | 30.50 |
| Median - Q2 (50%) | 22.40 | 29.30 | 0.53 | 0.61 | 50 |
| Q3 (75%) | 26.20 | 40.00 | 0.61 | 0.74 | 108 |
| Max | 28.44 | 300 | 0.79 | 1.03 | 511 |

Fig. 6. Distribution of five factors of database on their ranges.

The unit weight exhibited a possible bimodal distribution, ranging from 12 to 28.44 kN/m³, which was mostly centered in the range 20.41–26.20 kN/m³. The cohesion had a broad range (0–300 kPa), which skewed toward the smaller value ranging from 11.97 to 40 kPa. Only 25% of the slope cases had a cohesion higher than 40 kPa, with nine outliers exhibiting a cohesion greater than 100 kPa. The internal friction angle showed a nearly unimodal distribution in the range 0–0.79 rad, which was mainly centered in the range 0.43–0.61 rad. The slope inclination angle also showed an approximately normal distribution in the range 0.17–1.03 rad, which indicated that the database contains a variety of slope geometries (i.e., slight to steep inclination). The slope height had the broadest range among the factors (i.e., 3.66–511 m), representing slopes from relatively low to extremely high. The slope height, which was mainly centered around 30.5–108 m, skewed toward the lower value. Only 25% of the slope cases had a slope height greater than 108 m. In particular, 20 outliers with slope height greater than 239 m were detected.
These five factors have different scales that can significantly affect the performance of SVM and ANN. The standardization technique was applied to adjust the input features to the same scale before implementing these two models. Furthermore, except for the internal friction angle and the slope inclination angle, the histograms show tail-heavy or bimodal distributions, and outliers were also detected. These two properties of the database make it difficult for ML algorithms to learn the patterns.
Moreover, the highly correlated input features could degrade the performance of ML algorithms. Therefore, the Pearson correlation coefficient (P) was briefly applied to examine the correlation between each pair of input features. Fig. 7 illustrates the heatmap of the Pearson correlation coefficient.
According to the obtained results, the γ and φ pair (P = 0.57) is the most correlated among the pairs of input features examined. The correlation of this pair is consistent with the physical interpretation. However, the level of correlation of this pair can be considered moderate, so both features could still be adopted to classify the stability status of slopes.

Fig. 7. Heatmap of correlation of each pair of input features. Notice: X1: γ (kN/m³); X2: c (kPa); X3: φ (rad); X4: β (rad); and X5: H (m).

4. Methodology

4.1. Dataset partition

ML models conduct specific tasks in accordance with the patterns extracted from the given databases. The process of recognizing the regularity and pattern of the database is called the learning or training process. Once the learning phase is completed, the trained model can


properly execute a given task on previously unseen inputs, and this ability is known as generalization. This study used 80% of the database to train the classification models, and the remaining 20% to evaluate the generalization capacity.
Because the volume of the examined database is relatively small, with 153 slope cases alone, stratified sampling was applied to ensure the test set is representative. In other words, the database was divided into homogeneous subgroups, called strata, and the sampling process was performed on each stratum. Among the five input features, the slope height is considered to be the most sensitive to slope stability (Lin et al., 2018). Therefore, this study conducted stratified sampling for the test set according to the slope height categories. Although the slope height was experimentally categorized, sufficient instances were still ensured in each stratum to avoid sampling bias. Fig. 8 illustrates the histogram of slope height categories, and Table 2 summarizes the range of slope heights corresponding to each category. The obtained results show a relatively similar portion of the slope height categories between the test set and overall database.

Fig. 8. Histogram of slope height categories.

4.2. Performance measurement and K-fold cross-validation

This study used the confusion matrix to visualize the performance of classification models. Each row and column in the confusion matrix represents actual and predicted classes, respectively. Table 3 presents a layout of the confusion matrix.

Table 3
Layout of confusion matrix to visualize performance of a classifier.

| Actual \ Predicted | Failure (Negative) | Stable (Positive) |
|---|---|---|
| Failure (Negative) | True negative (TN) | False positive (FP) |
| Stable (Positive) | False negative (FN) | True positive (TP) |

where TN: number of failure slopes classified correctly as the failure class; FP: number of failure slopes classified incorrectly as the stable class; FN: number of stable slopes classified incorrectly as the failure class; TP: number of stable slopes classified correctly as the stable class.

False positive (FP) and false negative (FN) evidently represent different insights in Table 3. In the context of slope stability analysis, as well as risk assessment, significant attention should be paid to the FP because the cost of misclassifying negative samples (e.g., unstable slopes) could be more than that of omitting positive samples (e.g., stable slopes). However, a high number of positive samples detected incorrectly (i.e., high FN) could mislead the decision in the resource management or cost-benefit analysis. Because the databases in most practical applications contain noise, adjusting classification models can either increase the ratio of positive instances correctly detected (i.e., recall) or the accuracy of the positive prediction (i.e., precision). This phenomenon is the renowned precision-recall tradeoff situation. For convenience, the harmonic mean of precision and recall, known as the F1 score, was employed to measure the performance of classification models, as expressed in Eq. (3). Furthermore, other concise metrics including accuracy, receiver operating characteristic (ROC) curve, and area under the ROC (AUC) were also employed for efficient evaluation, as expressed in Eq. (3). The accuracy is determined as the ratio of instances that are correctly classified. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). A good classification model should have an AUC value close to 1.

$$F_1 = \frac{2}{\dfrac{1}{\text{precision}} + \dfrac{1}{\text{recall}}}, \qquad \text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \qquad (3)$$

For fair assessment, this study applied K-fold cross-validation (Stone, 1974) to examine the performance of classification models. The cross-validation splits the training set into k subsets called folds. Thereafter, the examined models were trained and evaluated k times. Each time, k−1 folds were picked for training, and the remaining fold was used to evaluate the classification model. The results of the K-fold cross-validation are expressed as an array containing k evaluation scores. In this study, considering the computational time, k was set as 10.

4.3. Hyperparameters tuning

As mentioned earlier, the ML algorithms considered in this study provide a set of hyperparameters to control the gap between the training and test errors. However, manually tuning the hyperparameters to determine the best set of hyperparameter values is tedious and time-consuming work. Instead, this study applied a grid search approach to automatically tune hyperparameters. All possible combinations of hyperparameters were generated from a predefined grid of hyperparameters, which, in turn, depends on each algorithm. Each combination of hyperparameters was then evaluated using the cross-validation, in which the F1 score was chosen to rate its performance. The grid search results in the best combination of hyperparameters for the algorithm that provides the highest F1 score. Table 4 summarizes the results of tuning the grid search hyperparameters for all the algorithms considered in this study.
Fig. 9 presents a concise flowchart of the procedure applied in this study, including the processing of data, implementation of classification models, and evaluation steps.
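The grid search of Section 4.3, scored by the F1 score under 10-fold cross-validation, can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the code used in the study: a small K-nearest-neighbor classifier stands in for the tuned algorithm, the grid over `n_neighbors` is hypothetical, and the two-feature data are synthetic.

```python
import random
from itertools import product

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean in Eq. (3)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # 0.0 when no true positives

def knn_predict(X_train, y_train, row, n_neighbors):
    """Majority vote among the n_neighbors closest training points (0/1 labels)."""
    dist = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, row)), label)
        for x, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dist[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_f1(X, y, n_neighbors, k=10):
    """Train on k-1 folds, score the held-out fold, and return the k F1 scores."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        y_pred = [knn_predict(X_tr, y_tr, X[i], n_neighbors) for i in fold]
        scores.append(f1_score([y[i] for i in fold], y_pred))
    return scores

def grid_search(X, y, grid, k=10):
    """Evaluate every combination in the grid and keep the highest mean F1."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        mean_f1 = sum(cross_val_f1(X, y, k=k, **params)) / k
        if best is None or mean_f1 > best[0]:
            best = (mean_f1, params)
    return best

# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(42)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(30)]
y = [0] * 30 + [1] * 30
grid = {"n_neighbors": [1, 3, 5, 7]}
best_f1, best_params = grid_search(X, y, grid, k=10)
```

Adding a second key to `grid` extends the search to all combinations of both hyperparameters, provided `cross_val_f1` accepts the corresponding keyword argument.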
Table 2
Slope height categories and their portion in test set and overall dataset.

| Category | Range (m) | Number | Overall | Stratified Sampling |
|---|---|---|---|---|
| 1 | 3.66–22 | 33 | 0.216 | 0.226 |
| 2 | 26–50 | 45 | 0.294 | 0.290 |
| 3 | 51–75 | 23 | 0.150 | 0.161 |
| 4 | 76.81–100 | 13 | 0.085 | 0.065 |
| 5 | 108–511 | 39 | 0.255 | 0.258 |

5. Results

5.1. F1 score and accuracy

Fig. 10 presents the F1 score and accuracy obtained from the K-fold cross-validation, including the evaluation of the training and test set. Table 5 summarizes the confusion matrix, F1 score and accuracy.
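As a worked illustration of how the F1 score and accuracy follow from a confusion matrix, the sketch below evaluates the metrics of Eq. (3) from four counts. The counts are hypothetical: they are chosen only to be consistent with the test-set figures reported for XGB-CM (F1 score of 0.914 and accuracy of 0.903 on 31 slopes), and the split of the three misclassified slopes between FP and FN is an assumption. The function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Metrics of Eq. (3) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # recall equals the true positive rate (TPR)
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),         # false positive rate
        "f1": 2 / (1 / precision + 1 / recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts: 28 of 31 test slopes correct; the FP/FN split is assumed.
metrics = classification_metrics(tp=16, tn=12, fp=2, fn=1)
f1_rounded = round(metrics["f1"], 3)
accuracy_rounded = round(metrics["accuracy"], 3)
```

With three misclassified slopes in total, the F1 score does not depend on how they are divided between FP and FN, whereas precision, recall, and FPR do.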
Table 4
Best combination of hyperparameters for each ML algorithm.
Machine learning algorithms   Hyperparameters
SVM                           kernel: linear, c = 1.1, γ = 0.001
KNN                           leaf_size = 2, n_neighbors = 11
SGD                           α = 0.1, max_iter = 1640
DT                            max_depth = 2, max_leaf_nodes = 5
NN                            α = 0.1, number of neurons in hidden layer: 2, early stopping: True
RF                            Bootstrap: True, n_estimators = 130, base_predictor: DT
AB                            n_estimators = 20, base_predictor: DT
GB                            n_estimators = 20, base_predictor: DT
XGB                           n_estimators = 150, base_predictor: DT
B-SVM                         n_estimators = 40, base_predictor: SVM
B-NN                          n_estimators = 39, base_predictor: NN
B-KNN                         n_estimators = 39, base_predictor: KNN

Fig. 9. Flowchart for procedure of data processing, ML model implementation, and evaluation.

A close observation of the results from the K-fold cross-validation revealed a permutation in
the rank order of the performance of the single-learning classification models based on the F1
score and accuracy. However, both metrics supported the superiority of SVM-CM compared to
other single-learning classification models, as shown in Fig. 10(a).
By adopting ensemble learning, the averaged values of the F1 score and accuracy of
single-learning classification models increased by 2.17% and 1.67%, respectively. According to
the significant increase in the F1 score and accuracy, ranging from 1.56 to 2.93% and 0.92 to
2.81%, respectively, it was evident that the boosting technique efficiently enhanced the
performance of the classification models compared to the other two techniques (i.e.,
homogeneous and heterogeneous ensemble). Among them, the XGB-CM provided the highest values
for both the F1 score and accuracy.
The evaluation results of the test set obtained from both metrics yielded the same rank order
of performance for the classification models, which was relatively different from the results
of the K-fold cross-validation. The GP-CM outperformed the other single-learning
classification models with its highest F1 score of 0.893 and accuracy (correctly classifying
26 out of 31 slopes) on the test set.
The correlation of the F1 score and accuracy between the ensemble classifiers and
single-learning classification models was similar to the observation from the K-fold
cross-validation mentioned earlier. The averaged F1 score and accuracy of ensemble classifiers
were approximately 5.58% and 5.82%, respectively, higher than those of the single-learning
classification models. Furthermore, with its highest F1 score of 0.914 and accuracy (correctly
classifying 28 out of 31 slopes) on the test set, the XGB-CM still outperformed the other
classification models considered in this study. It is noted that, compared to the examined
classification models, the GB-CM was the most susceptible to overfitting, as shown in
Fig. 10(b) and (d).

5.2. ROC-AUC

Fig. 11 illustrates the ROC-AUC for the performance of the classification models; a
significant difference was observed in the evaluation results. For convenience, the general
rule proposed by Hosmer and Lemeshow (2000), as shown in Table 6, was adopted to categorize
the performance of the classification models according to their AUCs.
Table 7 summarizes the AUC obtained from the K-fold cross-validation
($\mathrm{AUC}_{K\text{-fold}}$), along with the evaluation on the training and test set. In
the case of single-learning classification models, $\mathrm{AUC}_{K\text{-fold}}$ ranging from
0.846 to 0.956 covers two discrimination categories in Table 6 (i.e., excellent and
outstanding discrimination). In particular, among these single-learning classification models,
GP-CM provided the highest $\mathrm{AUC}^{\mathrm{GP}}_{K\text{-fold}}$ of 0.956, sequentially
followed by KNN-CM and SVM-CM. Moreover, according to the standard deviation (STD) of their
K-fold cross-validation results, the performance of the former model was relatively more
stable than that of the latter ones.
In the case of ensemble classifiers, all the performance belonged to the outstanding
discrimination category, with $\min(\mathrm{AUC}_{K\text{-fold}})$ equal to 0.93. The averaged
$\mathrm{AUC}_{K\text{-fold}}$ of the ensemble classifiers was 6.27% higher than that of the
single-learning classification models. Applying the bagging technique (e.g., B-KNN, B-ANN, and
B-SVM) improved the performance of these classification models, in terms of both the AUC,
which increased from 3.09 to 6.64%, and the stability expressed by the reduction in the value
of the STD.
The boosting technique efficiently enhanced the performance of the classification models
compared with that of the other two techniques. Among the ensemble classifiers, the highest
AUC and smallest STD were obtained in the case of B-KNN, followed by those of the XGB-CM, as
shown in Fig. 11(a) and Table 7.
The evaluation results of the test set were similar to those of the K-fold cross-validation.
Generally, the averaged $\mathrm{AUC}_{\mathrm{Test}}$ of the ensemble classifiers was 12.45%
higher than that of the single-learning classification models. The performance of
single-learning classification models crossed all the discrimination categories in Table 6.
The outstanding performance of GP-CM and KNN-CM was still superior to the remaining
single-learning classification models considered in this study. However, when using the test
set, the performance of SVM-CM was classified as acceptable discrimination. Additionally, in
the case of ensemble classifiers, compared to the remaining classification models, the XGB-CM
and heterogeneous ensemble provided the highest AUC of the test set, with
$\mathrm{AUC}_{\mathrm{Test}}$ equal to 0.95, as shown in Fig. 11(b).
The additional metric of specificity, also known as the true negative rate (TNR) and defined
in Eq. (4), was adopted to further analyze the performance of the classification models along
with TPR, which is known as sensitivity or recall. The specificity is defined as the ratio of
negative instances that are correctly detected (Géron, 2017).

$$\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad (4)$$

Fig. 12 presents the correlation of TNR and TPR corresponding to each classification model on
the test set. Most of the examined classification models belonged to zone 2, which tended to
classify slope cases as stable (positive class) more often than as failure (negative class),
particularly in the AB-CM. The classification models implemented with SVM, GNB, ANN, RF, and
heterogeneous ensemble (i.e., group III in


Fig. 12) showed relatively similar predictions for both the stable and failure slopes, which
were expressed by the relatively approximate values of TPR and TNR. XGB-CM evidently exhibited
outstanding performance in detecting the stable slopes, and a significantly high accuracy in
detecting the failure slopes, as shown in Fig. 12; XGB-CM detected 16 out of the 16 stable
slopes correctly, resulting in a TPR of 1, and also estimated 12 out of the 15 failure slopes
correctly, resulting in a TNR of 0.8.
The classification models implemented along with KNN, B-KNN, B-SVM, GP, and SGD classified
slopes as failures more often than as stable. In particular, the SGD-CM detected 14 out of 15
unstable slopes correctly; however, it could correctly classify only 8 out of 16 stable slopes.

Fig. 10. F1 score and accuracy of classification models: (a) F1 score from K-fold
cross-validation; (b) F1 score from evaluation on training and test set; (c) Accuracy from
K-fold cross-validation; and (d) Accuracy from evaluation on training and test set.

Table 5
General rule for classifying the discrimination according to the AUC.
AUC values          Discrimination categories
AUC = 0.5           No discrimination
0.7 ≤ AUC < 0.8     Acceptable
0.8 ≤ AUC < 0.9     Excellent
0.9 ≤ AUC           Outstanding

6. Discussion

The input requirements of the classification models presented in this study were simplified
into five fundamental parameters among crucial factors (Kavzoglu et al., 2014; Lee et al.,
2018) to evaluate the safety of the slope. The statistical descriptions summarized in Table 1
demonstrate the diversity of these features in representing the geological and geometry
conditions of slopes at different locations worldwide, as shown in Fig. 4. Consequently, the
models trained with such a database have high applicability for landslide risk assessment and
management on a global scale. Further improvement could be achieved in accordance with the
availability of databases to address additional factors determining the stable state of
slopes, such as rainfall (Matsushi et al., 2006; Pham et al., 2018) and earthquake (Alfaro et
al., 2012; Rodríguez et al., 1999).
The results of rigorous evaluation summarized in Tables 5 and 7

Fig. 11. ROC-AUC of classification models (a) ROC-AUC from K-fold cross-validation; (b) ROC-AUC from evaluation on training and test set.


Table 6
Result of performance measurements of classification models.
Classification   Confusion matrix   F1                                 Accuracy
models           on test set*       K-fold         Train    Test      K-fold         Train    Test
KNN              13   2             0.852 ± 0.13   0.876    0.839     0.869 ± 0.08   0.877    0.839
                  3  13
SVM              12   3             0.864 ± 0.08   0.880    0.812     0.869 ± 0.06   0.877    0.806
                  3  13
SGD              14   1             0.809 ± 0.11   0.667    0.640     0.853 ± 0.09   0.820    0.710
                  8   8
GP               13   2             0.849 ± 0.14   0.930    0.893     0.861 ± 0.1    0.926    0.839
                  3  13
QDA              11   4             0.852 ± 0.09   0.859    0.788     0.853 ± 0.07   0.852    0.774
                  3  13
GNB              12   3             0.852 ± 0.09   0.877    0.812     0.853 ± 0.07   0.869    0.806
                  3  13
DT               11   4             0.831 ± 0.14   0.885    0.788     0.852 ± 0.1    0.885    0.774
                  3  13
ANN              12   3             0.852 ± 0.1    0.868    0.821     0.844 ± 0.1    0.861    0.806
                  3  13
B-KNN            14   1             0.874 ± 0.1    0.924    0.867     0.877 ± 0.09   0.918    0.871
                  3  13
B-SVM            14   1             0.844 ± 0.13   0.876    0.867     0.86 ± 0.08    0.877    0.871
                  3  13
B-ANN            12   3             0.877 ± 0.11   0.962    0.848     0.877 ± 0.09   0.959    0.839
                  2  14
RF               12   3             0.844 ± 0.13   0.905    0.812     0.861 ± 0.09   0.902    0.806
                  3  13
AB               11   4             0.866 ± 0.09   0.948    0.857     0.869 ± 0.08   0.943    0.839
                  1  15
GB               11   4             0.854 ± 0.1    0.953    0.788     0.862 ± 0.08   0.951    0.774
                  3  13
XGB              12   3             0.883 ± 0.1    0.993    0.914     0.886 ± 0.08   0.992    0.903
                  0  16
Het. ensemble    12   3             0.866 ± 0.1    0.928    0.812     0.876 ± 0.1    0.926    0.806
                  3  13

*Notice: the layout of the confusion matrix is referred to Table 3.
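As a worked check, the test-set metrics can be recomputed from a confusion matrix. The sketch below is illustrative only: the helper `metrics_from_confusion` is a hypothetical name, and the matrix is assumed to be laid out as [[TN, FP], [FN, TP]] with stable slopes as the positive class. That layout is inferred from the counts quoted in the text (16 of 16 stable and 12 of 15 failure slopes detected by XGB-CM), not taken from Table 3 itself.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute accuracy, precision, TPR (recall), TNR (specificity), and F1.

    Assumes every denominator is non-zero, which holds for the example below.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate (sensitivity)
    specificity = tn / (tn + fp)       # true negative rate, Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "tpr": recall,
        "tnr": specificity,
        "f1": f1,
    }


# XGB row of the table, read under the assumed layout [[TN, FP], [FN, TP]].
xgb = metrics_from_confusion(tn=12, fp=3, fn=0, tp=16)
print({name: round(value, 3) for name, value in xgb.items()})
```

Under this reading the XGB row gives an F1 score of about 0.914, an accuracy of about 0.903, a TPR of 1, and a TNR of 0.8, which agrees with the values reported for XGB-CM.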

Table 7
ROC-AUC of the classification models.
Classification Model   ROC-AUC
                       K-fold         Train    Test
KNN                    0.935 ± 0.07   0.968    0.931
SVM                    0.906 ± 0.06   0.916    0.796
SGD                    0.846 ± 0.09   0.763    0.688
GP                     0.956 ± 0.05   0.988    0.933
QDA                    0.88 ± 0.07    0.892    0.817
GNB                    0.85 ± 0.06    0.890    0.775
DT                     0.887 ± 0.09   0.952    0.829
ANN                    0.873 ± 0.06   0.935    0.817
B-KNN                  0.973 ± 0.03   0.986    0.938
B-SVM                  0.934 ± 0.06   0.972    0.892
B-ANN                  0.931 ± 0.06   0.994    0.933
RF                     0.962 ± 0.04   0.985    0.904
AB                     0.952 ± 0.05   0.994    0.910
GB                     0.933 ± 0.06   0.994    0.929
XGB                    0.965 ± 0.04   0.999    0.950
Het. Ensemble          0.93 ± 0.08    0.987    0.950
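The averaged AUC comparison and the Hosmer and Lemeshow (2000) categories can be reproduced from the tabulated values. The sketch below is an assumption-laden illustration: the split into eight single-learning models and eight ensemble classifiers follows the list of algorithms named in the conclusion, the helper names are hypothetical, and the label "unclassified" for AUC values that the general rule does not cover is introduced here for convenience.

```python
from statistics import mean

# (K-fold mean AUC, test AUC) copied from the ROC-AUC table above.
SINGLE = {
    "KNN": (0.935, 0.931), "SVM": (0.906, 0.796), "SGD": (0.846, 0.688),
    "GP": (0.956, 0.933), "QDA": (0.88, 0.817), "GNB": (0.85, 0.775),
    "DT": (0.887, 0.829), "ANN": (0.873, 0.817),
}
ENSEMBLE = {
    "B-KNN": (0.973, 0.938), "B-SVM": (0.934, 0.892), "B-ANN": (0.931, 0.933),
    "RF": (0.962, 0.904), "AB": (0.952, 0.910), "GB": (0.933, 0.929),
    "XGB": (0.965, 0.950), "Het. ensemble": (0.93, 0.950),
}


def relative_gain_percent(ensemble_values, single_values):
    """Relative increase (in percent) of the ensemble mean over the single-model mean."""
    single_mean = mean(single_values)
    return 100 * (mean(ensemble_values) - single_mean) / single_mean


def discrimination_category(auc):
    """Map an AUC value to the general rule of Hosmer and Lemeshow (2000)."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    if auc == 0.5:
        return "no discrimination"
    return "unclassified"  # range not covered by the rule as tabulated


kfold_gain = relative_gain_percent(
    [k for k, _ in ENSEMBLE.values()], [k for k, _ in SINGLE.values()]
)
test_gain = relative_gain_percent(
    [t for _, t in ENSEMBLE.values()], [t for _, t in SINGLE.values()]
)
print(round(kfold_gain, 2), round(test_gain, 2))
print({name: discrimination_category(test) for name, (_, test) in SINGLE.items()})
```

With this grouping, the relative gains come out at about 6.27% for the K-fold AUC and 12.45% for the test AUC, matching the figures quoted in the text, and the test AUC of SVM (0.796) falls in the acceptable category.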

Fig. 12. Correlation between TPR and TNR of classification models on test set.
agree with previous studies regarding the high potential of ensemble learning in enhancing the
performance of ML. Furthermore, ensemble classifiers work efficiently with big data (e.g.,
landscapes at high resolution) owing to their ability to distribute training on multiple
machines or servers. In the context of landslide susceptibility mapping, although
high-resolution mapping can help improve estimation accuracy by providing additional details
regarding landslide features, most of the current ML-based approaches restrict mapping at a
reasonably high resolution owing to computational efficiency (Huang and Zhao, 2018; Wu et al.,
2014). Consequently, the ensemble learning models presented in this study could be promising
for landslide susceptibility mapping, particularly when dealing with large landscapes.
The systematic measurement of multiple metrics (i.e., F1 score, accuracy, ROC-AUC, TNR, and
TPR) demonstrated the reliability of ensemble classifiers in estimating the safety of the
slope with consistently high accuracy, particularly in the case of XGB-CM. However, owing to
the bias of the database toward positive samples, the trained ensemble classifiers considered
in this study tended to classify more slopes as stable than as unstable, as shown in Fig. 5.
This tendency may reduce


the reliability of such models in landslide assessments, which primarily focus on detecting
unstable slopes. However, this limitation could be eliminated by increasing the volume of data.

7. Conclusion

This study developed classification models using ensemble learning to estimate the stability
status of slopes. Two prominent ensemble techniques were employed to implement these ensemble
classifiers including the parallel ensemble with both homogeneous and heterogeneous ensembles
and the sequential ensemble. Additionally, for comparison, eight versatile learning algorithms
(i.e., KNN, SVM, GP, GNB, QDA, ANN, DT, and SGD) widely used in literature for slope stability
analyses were considered. The grid search algorithm was applied to tune the hyperparameters of
each learning algorithm. Furthermore, K-fold cross-validation was employed to fairly evaluate
the performance of the classification models.
The obtained results demonstrated the superiority of ensemble classifiers over the
single-learning classification models in evaluating the stability status of slopes. When
ensemble learning was applied to implement the classification model instead of the
single-learning algorithm, the averaged F1 score, accuracy, and AUC of the classification
models from the K-fold cross-validation increased by 2.17%, 1.66%, and 6.27%, respectively. In
particular, boosting learning significantly improved the performance of the classification
models. The highest F1 score, accuracy, and AUC of 0.914, 0.903, and 0.95, respectively, on
the test set were obtained by XGB-CM. Furthermore, XGB-CM detected 16 out of the 16 stable
slopes and 12 out of the 15 failure slopes correctly, thereby indicating its outstanding
performance. Consequently, ensemble learning, particularly XGB, is strongly suggested to
develop reliable models for estimating slope stability status as well as for further
applications in geotechnical fields.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by Science Research Program through the National Research
Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A2C2086647).

References

Alfaro, P., Delgado, J., García-Tortosa, F.J., Lenti, L., López, J.A., López-Casado, C., Martino, S., 2012. Widespread landslides induced by the Mw 5.1 earthquake of 11 May 2011 in Lorca, SE Spain. Eng. Geol. 137–138, 40–52. https://doi.org/10.1016/j.enggeo.2012.04.002.
Aurélien Géron, 2017. Hands-on Machine Learning with Scikit-Learn & Tensor Flow.
Bishop, C.M., 2006. Patterns Recognition and Machine Learning, Springer-Verlag, New York. https://doi.org/10.1016/B978-044452701-1.00059-4.
Bragagnolo, L., da Silva, R.V., Grzybowski, J.M.V., 2020. Artificial neural network ensembles applied to the mapping of landslide susceptibility. CATENA 184, 104240. https://doi.org/10.1016/J.CATENA.2019.104240.
Breiman, L., 1997. Arcing the edge. Statistics (Ber).
Chen, C., Xiao, Z., Zhang, G., 2011. Stability assessment model for epimetamorphic rock slopes based on adaptive neuro-fuzzy inference system. Electron. J. Geotech. Eng. 16 A, 93–107.
Chen, T., Guestrin, C., 2016. XGBoost: A scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, https://doi.org/10.1145/2939672.2939785.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20, 273–297. https://doi.org/10.1023/A:1022627411411.
Das, S.K., Biswal, R.K., Sivakugan, N., Das, B., 2011. Classification of slopes and prediction of factor of safety using differential evolution neural networks. Environ. Earth Sci. 64, 201–210. https://doi.org/10.1007/s12665-010-0839-1.
Dawson, E.M., Roth, W.H., Drescher, A., 1999. Slope stability analysis by strength reduction. Géotechnique 49, 835–840. https://doi.org/10.1680/geot.1999.49.6.835.
Feng, X.-T., 2000. Introduction of intelligent rock mechanics.
Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139. https://doi.org/10.1006/jcss.1997.1504.
Friedman, J.H., 2001. Greedy function approximation: A gradient boosting machine. Ann. Stat. https://doi.org/10.2307/2699986.
Griffiths, D.V., Lane, P.a., Hyatt, M., Way, C., Station, F., Handbook, M.M., Nakamura, A., Cai, F., Ugai, K., Lau, Y.C.C.K., Roth, W.H., Dawson, E.M., Drescher, A., He, B., Zhang, H., Matthews, C., Farook, Z., Stability, T., Cruikshank, K.M., 1999. Slope stability analysis by Finite elements. Geotechnique 49, 387–403. https://doi.org/10.1680/geot.1999.49.6.835.
Hansen, L.K., Salamon, P., 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/34.58871.
Hoang, N.D., Pham, A.D., 2016. Hybrid artificial intelligence approach based on metaheuristic and machine learning for slope stability assessment: A multinational data analysis. Expert Syst. Appl. 46, 60–68. https://doi.org/10.1016/j.eswa.2015.10.020.
Hosmer, D.W., Lemeshow, S., 2000. Applied logistic regression second edition. Appl. Logist. Regress. https://doi.org/10.1002/0471722146.
Huang, Y., Zhao, L., 2018. Review on landslide susceptibility mapping using support vector machines. Catena 165, 520–529. https://doi.org/10.1016/j.catena.2018.03.003.
Kang, F., Xu, B., Li, J., Zhao, S., 2017. Slope stability evaluation using Gaussian processes with various covariance functions. Appl. Soft Comput. J. 60, 387–396. https://doi.org/10.1016/j.asoc.2017.07.011.
Kavzoglu, T., Sahin, E.K., Colkesen, I., 2014. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11, 425–439. https://doi.org/10.1007/s10346-013-0391-7.
Keaton, J.R., 2007. Rock slope engineering. Environ. Eng. Geosci. https://doi.org/10.2113/gseegeosci.13.4.369.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons.
Kuncheva, L.I., Whitaker, C.J., 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. https://doi.org/10.1023/A:1022859003006.
Lechman, J.B., Griffiths, D.V., 2000. Analysis of the progression of failure of earth slopes by finite elements. Slope Stability 2000, 250–265.
Lee, J.H., Sameen, M.I., Pradhan, B., Park, H.J., 2018. Modeling landslide susceptibility in data-scarce environments using optimized data mining and statistical methods. Geomorphology 303, 284–298. https://doi.org/10.1016/j.geomorph.2017.12.007.
Lin, H.M., Chang, S.K., Wu, J.H., Juang, C.H., 2009. Neural network-based model for assessing failure potential of highway slopes in the Alishan, Taiwan Area: Pre- and post-earthquake investigation. Eng. Geol. 104, 280–289. https://doi.org/10.1016/j.enggeo.2008.11.007.
Lin, Y., Zhou, K., Li, J., 2018. Prediction of slope stability using four supervised learning methods. IEEE Access 6, 31169–31179. https://doi.org/10.1109/ACCESS.2018.2843787.
Manouchehrian, A., Gholamnejad, J., Sharifzadeh, M., 2014. Development of a model for analysis of slope stability for circular mode failure using genetic algorithm. Environ. Earth Sci. 71, 1267–1277. https://doi.org/10.1007/s12665-013-2531-8.
Mason, L., Baxter, J., Bartlett, P., Frean, M., 2000. Boosting algorithms as gradient descent. In: Advances in Neural Information Processing Systems.
Matsui, T., San, K.-C., 1992. Finite element slope stability analysis by shear strength reduction technique. Soils Found. 32, 59–70. https://doi.org/10.3208/sandf1972.32.59.
Matsushi, Y., Hattanji, T., Matsukura, Y., 2006. Mechanisms of shallow landslides on soil-mantled hillslopes with permeable and impermeable bedrocks in the Boso Peninsula, Japan. Geomorphology 76, 92–108. https://doi.org/10.1016/j.geomorph.2005.10.003.
Michalowski, R.L., 1995. Slope stability analysis: a kinematical approach. Géotechnique 45, 283–293. https://doi.org/10.1680/geot.1995.45.2.283.
Minaei-Bidgoli, B., Parvin, H., Alinejad-Rokny, H., Alizadeh, H., Punch, W.F., 2014. Effects of resampling method and adaptation on clustering ensemble efficacy. Artif. Intell. Rev. 41, 27–48. https://doi.org/10.1007/s10462-011-9295-x.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Pham, K., Kim, D., Choi, H.J., Lee, I.M., Choi, H., 2018. A numerical framework for infinite slope stability analysis under transient unsaturated seepage conditions. Eng. Geol. 243, 36–49. https://doi.org/10.1016/j.enggeo.2018.05.021.
Qi, C., Fourie, A., Ma, G., Tang, X., 2018. A hybrid method for improved stability prediction in construction projects: A case study of stope hangingwall stability. Appl. Soft Comput. J. 71, 649–658. https://doi.org/10.1016/j.asoc.2018.07.035.
Qi, C., Tang, X., 2018a. Slope stability prediction using integrated metaheuristic and machine learning approaches: A comparative study. Comput. Ind. Eng. 118, 112–122. https://doi.org/10.1016/j.cie.2018.02.028.
Qi, C., Tang, X., 2018b. A hybrid ensemble method for improved prediction of slope stability. Int. J. Numer. Anal. Methods Geomech. 42, 1823–1839. https://doi.org/10.1002/nag.2834.
Rasmussen, C.E., Williams, C.K.I., 2006. Gaussian process for machine learning. MIT press.
Rodriguez, C.E., Bommer, J.J., Chandler, R.J., 1999. Earthquake-induced landslides: 1980–1997. Soil Dyn. Earthq. Eng. 18, 325–346. https://doi.org/10.1016/S0267-7261(99)00012-3.
Sah, N.K., Sheorey, P.R., Upadhyaya, L.N., 1994. Maximum likelihood estimation of slope stability. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 31, 47–53. https://doi.org/10.1016/0148-9062(94)92314-0.


Sakellariou, M.G., Ferentinou, M.D., 2005. A study of slope stability prediction using neural networks. Geotech. Geol. Eng. 23, 419–445. https://doi.org/10.1007/s10706-004-8680-5.
Samui, P., 2013. Support vector classifier analysis of slope. Geomatics. Nat. Hazards Risk 4, 1–12. https://doi.org/10.1080/19475705.2012.684725.
Shalev-Shwartz, S., Ben-David, S., 2013. Understanding machine learning: From theory to algorithms, Understanding Machine Learning: From Theory to Algorithms. https://doi.org/10.1017/CBO9781107298019.
Stone, M., 1974. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B. Methodol. https://doi.org/10.2307/2984809.
Wang, H.B., Xu, W.Y., Xu, R.C., 2005. Slope stability evaluation using back propagation neural networks. Eng. Geol. 80, 302–315. https://doi.org/10.1016/j.enggeo.2005.06.005.
Wu, X., Ren, F., Niu, R., 2014. Landslide susceptibility assessment using object mapping units, decision tree, and support vector machine models in the Three Gorges of China. Environ. Earth Sci. 71, 4725–4738. https://doi.org/10.1007/s12665-013-2863-4.
Xu, W., Xie, S., Jean-Pascal, D., Nicolas, B., Imbert, P., 1999. Slope stability analysis and evaluation with probabilistic artificial neural network method. Site Investig. Sci. Technol. 3, 19–21.
Xue, X., 2017. Prediction of slope stability based on hybrid PSO and LSSVM. J. Comput. Civ. Eng. 31, 04016041. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000607.
Zhou, Z.H., Wu, J., Tang, W., 2002. Ensembling neural networks: Many could be better than all. Artif. Intell. 137, 239–263. https://doi.org/10.1016/S0004-3702(02)00190-X.
