
[1.2]

This study presents novel ensemble models based on the Dual Perturb and Combine for Tree-based (DPCT) approach to predict landslide susceptibility along the Ha Long – Van Don highway in Vietnam. Utilizing a dataset of 78 landslide locations and 14 conditional factors, the B-DPCT model achieved optimal evaluation results, suggesting its effectiveness for construction planning and mitigation efforts. The findings highlight the importance of advanced machine learning techniques in enhancing landslide prediction accuracy in mountainous terrains.

Uploaded by

dotuannghiax8

Geosciences Journal

Novel ensemble models based on Dual Perturb and Combine for Tree-based (DPCT)
for Landslide Susceptibility Mapping: a case in along Ha Long – Van Don Highway
--Manuscript Draft--

Manuscript Number: GEOJ-D-24-00125

Full Title: Novel ensemble models based on Dual Perturb and Combine for Tree-based (DPCT)
for Landslide Susceptibility Mapping: a case in along Ha Long – Van Don Highway

Short Title: Landslide Susceptibility Mapping based on DPCT (Ha Long – Van Don Highway)

Article Type: Article

Manuscript Classifications: 160: Engineering Geology; 440: Remote Sensing/GIS; 470: Soils

Corresponding Author: Tuan-Nghia Do, Ph.D.


Thuyloi University
Hanoi, VIET NAM

Corresponding Author's Institution: Thuyloi University

Corresponding Author E-Mail: [email protected]

First Author: Tran Van Phong, Msc.

Order of Authors: Tran Van Phong, Msc.

Tuan-Nghia Do, Ph.D.

Phan Trong Trinh, Ph.D.

Bui Nhi Thanh, Ph.D.

Vuong Hong Nhat, Msc.

Funding Information: National Foundation for Science and Technology Development (105.08-2020.25): Mr. Tuan-Nghia Do

Abstract: Areas along transportation routes constructed in mountainous terrain often harbor significant landslide hazards. Ensemble learning techniques have proven effective in improving landslide susceptibility prediction performance. In this study, novel ensemble models (Bagging (B), Cascade Generalization (CG), and Dagging (D)) based on the Dual Perturb and Combine for Tree-based (DPCT) approach were employed to predict landslide susceptibility along the Ha Long – Van Don highway. The dataset comprised 78 landslide locations (3263 points), non-landslide locations (1:1 ratio with landslide points), and 14 conditional factors, including topographic characteristics, geology, rainfall, and land use/land cover (LULC), which served as input parameters for the models (B-DPCT, CG-DPCT, D-DPCT, and DPCT). Evaluation criteria for model prediction outcomes included the area under the receiver operating characteristic curve (AUC), parameters derived from the confusion matrix, Kappa statistics, and root mean square error (RMSE). The landslide susceptibility maps predicted by the B-DPCT model exhibited the best evaluation results on the validation dataset (AUC = 0.948, accuracy (ACC) = 83.6%, Kappa statistic = 0.67, and RMSE = 0.37), suggesting that they can be recommended for construction planning and mitigation efforts along the Ha Long – Van Don highway to minimize landslide-induced damages.

Keywords: DPCTree; Ensemble Models; Machine Learning; LSM; HL-VD Highway

Suggested Reviewers: Abolfazl Jaafari, Ph.D.


Professor, Agricultural Research Education and Extension Organization
[email protected]
He has the same research field as mine.

Mahmoud Bayat, Ph.D.


Professor, Agricultural Research Education and Extension Organization
[email protected]
He has the same research field as mine.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation

Abhik Saha, Ph.D.


Professor, Indian Institute of Technology
[email protected]
He has the same research field as mine.

Additional Information:

Question: Does this manuscript belong to a special issue?
Response: No


Novel ensemble models based on dual perturb and combine for tree-based (DPCT) for landslide susceptibility mapping: a case in along Halong – Vandon expressway

Tran V. Phong1,2, Tuan-Nghia Do3*, Phan T. Trinh1,2, Bui N. Thanh2,4, Vuong H. Nhat5

1 Institute of Geological Sciences, Vietnam Academy of Science and Technology, 84 Chua Lang Street, Hanoi, Vietnam.
2 Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Hanoi, Vietnam.
3 Faculty of Civil Engineering, Thuyloi University, 175 Tay Son, Hanoi, Vietnam.
4 Institute of Marine Geology and Geophysics, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Hanoi, Vietnam.
5 Institute of Geography, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Hanoi, Vietnam.

Tran V. Phong: [email protected], researcher.
Tuan-Nghia Do*: [email protected], lecturer.
Phan T. Trinh: [email protected], researcher.
Bui N. Thanh: [email protected], researcher.
Vuong H. Nhat: [email protected], researcher.

*Corresponding author:
Tuan-Nghia Do
Lecturer, Faculty of Civil Engineering, Thuyloi University, room 416, A1 building, 175 Tay Son, Hanoi, Vietnam.
+84-943312614, [email protected], ORCID: 0000-0003-1196-0616

Running title:
Landslide Susceptibility Mapping based on DPCT (Ha Long – Van Don Highway)

Abstract: Areas along transportation routes constructed in mountainous terrain often harbor significant landslide hazards. Ensemble learning techniques have proven effective in improving landslide susceptibility prediction performance. In this study, novel ensemble models (Bagging (B), Cascade Generalization (CG), and Dagging (D)) based on the Dual Perturb and Combine for Tree-based (DPCT) approach were employed to predict landslide susceptibility along the Halong – Vandon expressway. The dataset comprised 78 landslide locations (3263 points), non-landslide locations (1:1 ratio with landslide points), and 14 conditional factors, including topographic characteristics, geology, rainfall, and land use/land cover (LULC), which served as input parameters for the models (B-DPCT, CG-DPCT, D-DPCT, and DPCT). Evaluation criteria for model prediction outcomes included the area under the receiver operating characteristic curve (AUC), parameters derived from the confusion matrix, Kappa statistics, and root mean square error (RMSE). The landslide susceptibility maps predicted by the B-DPCT model exhibited the best evaluation results on the validation dataset (AUC = 0.948, accuracy (ACC) = 83.6%, Kappa statistic = 0.67, and RMSE = 0.37), suggesting that they can be recommended for construction planning and mitigation efforts along the Halong – Vandon expressway to minimize landslide-induced damages.

Key words: DPCTree, ensemble models, machine learning, LSM, Halong-Vandon expressway

1. INTRODUCTION

With the progression of climate change and global warming, extreme weather phenomena are occurring with increased frequency and magnitude (D’Amato and Akdis, 2020; Ogunbode, Doran and Böhm, 2020; Zaini, Zandalinas, Fritschi and Mittler, 2021; Hoang-Cong et al., 2022; Ngo-Duc, 2023; Vonnisa and Marzuki, 2024). Consequently, natural disasters are also occurring more frequently and on a larger scale (AghaKouchak et al., 2020; Ward et al., 2020; Masson-Delmotte et al., 2022; Giang Linh, Dang Kinh and Bui Thanh, 2023; Farinós-Dasí et al., 2024; Tin et al., 2024). Globally, it is estimated that over the past 20 years the incidence of natural disasters has increased by approximately 75%, resulting in the loss of over 1 million lives, affecting the livelihoods of over 4 billion people, and causing economic damages close to 3 trillion USD (UNDRR, 2022). Among these disasters, landslides are one of the most impactful in socio-economic terms (Yadav et al., 2023). Therefore, landslide prediction research remains a pressing issue to support disaster prevention and damage mitigation efforts (Binh Thai et al., 2022; Pham et al., 2022; Dao Minh et al., 2023; Le Minh et al., 2023; Tong et al., 2023; Doan et al., 2024). Landslide susceptibility mapping is an effective tool in landslide forecasting, assisting governments in land management, urban planning, and settlement (Azarafza et al., 2021; Ado et al., 2022). There are two main approaches to landslide susceptibility mapping: (1) the qualitative approach and (2) the quantitative approach. Of the two, the quantitative approach has proven more effective in landslide prediction (Ibrahim et al., 2020; Shano, Raghuvanshi and Meten, 2020; Asadi et al., 2022). Machine learning methods are currently being researched and widely applied for landslide susceptibility mapping (Azarafza et al., 2021; Ado et al., 2022; Liu et al., 2023).

In machine learning models applied to landslide susceptibility mapping, ensemble models are often used to enhance the performance of a single model (Di Napoli et al., 2020; Saha et al., 2021; Arabameri et al., 2022; Pham et al., 2022). Commonly used ensemble techniques include Bagging (Gu et al., 2024; Zhang et al., 2024), Cascade Generalization (Hong, 2023b; Ali et al., 2024), Dagging (Bui et al., 2023; Le Minh et al., 2023; Tong et al., 2023), Decorate (Hong, 2023a; Le Minh et al., 2023), Multi Boost (Ajin et al., 2022; Bien et al., 2023), and Rotation Forest (Kalantar et al., 2020; Fang et al., 2021; Pham et al., 2022; Ali et al., 2024). Landslide prediction is a complex, multivariate, and multicriteria problem (Liu et al., 2023). Each study area has different characteristics and conditions leading to landslides (Costanzo and Irigaray, 2020; Ramos-Bernal et al., 2021). Therefore, there is no single best model for solving all landslide prediction problems (Pham et al., 2022). Hence, researching and understanding new machine learning models for landslide susceptibility mapping applications is necessary to find the optimal prediction model (Ali et al., 2024). Each machine learning model has its own theoretical foundation and techniques (Azevedo, Rocha and Pereira, 2024), so tuning the parameters of these models helps improve accuracy in landslide prediction (Yu, Wang and Pradhan, 2024). In particular, ensemble techniques prove effective in enhancing the prediction accuracy of the original single model (Tang et al., 2023; Zeng et al., 2023).

Expressways are vital links in a country's transportation system and socio-economic relations (Ngewie, 2024; Zhou et al., 2024). They are designed for vehicles to operate at high speeds, reducing travel time. Ensuring the smooth operation of expressways is therefore crucial for maintaining stability in socio-economic development. Landslides on expressways pose a threat, disrupting traffic flow, potentially causing harm to people and vehicles in transit, and damaging the road structure (Sun et al., 2023). During expressway construction, many mountainous areas are disturbed, leading to numerous new landslide masses emerging along the route, including large ones (Nguyen, Tien and Do, 2020; Pasang and Kubíček, 2020). Furthermore, inadequate consideration in designing landslide mitigation structures contributes to frequent landslides (Nguyen, Tien and Do, 2020; Van Tien et al., 2021). Predicting landslides along expressway corridors is therefore essential (Pasang and Kubíček, 2020; Sassa et al., 2020). It serves the purpose of prevention, minimizing the risks posed by landslides to vehicles, people, and the infrastructure of the expressway (Beigh and Bukhari, 2024; Sassa et al., 2020).

This study applied novel ensemble models based on the Dual Perturb and Combine for Tree-based (DPCT) technique to landslide susceptibility mapping in the Halong – Vandon expressway area, Quang Ninh province, Vietnam. The ensemble techniques (Bagging, CG, Dagging) helped improve the landslide prediction performance of the DPCT model. The resulting landslide susceptibility maps provide a scientific basis for management, urban planning, and the prevention and mitigation of landslide damages in the Halong – Vandon expressway area. Additionally, this research provides evidence for the applicability of ensemble models based on DPCT techniques to landslide susceptibility mapping in expressway areas.

2. STUDY AREA

Construction of the Halong - Vandon Expressway commenced in September 2015 and was completed by the end of 2018. The expressway spans 59 kilometers within Quang Ninh province in northeastern Vietnam. It has four lanes with a design speed of 100 km/h and is crucial in promoting the socio-economic development of the northern region of Vietnam. It connects renowned tourist destinations such as Halong Bay, Cat Ba Island, and Bai Tu Long Bay with the Vandon Island District (https://www.quangninh.gov.vn/). The study area chosen along the Halong - Vandon Expressway covers 180.55 km² (Fig. 1). This area exhibits diverse topography, primarily hills, mountains, and plains, with elevations ranging from 2.5 to 395 meters. The region experiences a prolonged rainy season from May to October each year, with an average annual rainfall of 2300 mm, an average temperature of approximately 23°C, and an average humidity of 84.6%. Winter months are often foggy (Technology, 2009). The geological composition of the area is diverse, predominantly comprising rock formations from the Hon Gai Formation, Tan Mai Formation, Binh Lieu Formation, Ha Coi Formation, Cat Ba Formation, and the Quaternary (Fig. 3c). Specifically, the Halong - Vandon Expressway predominantly traverses the Hon Gai Formation, which harbors numerous coal seams of industrial value (Thanh, 2011). After the Halong – Vandon expressway entered operation, landslides continued to occur along it, affecting the safety of vehicles passing through (Fig. 2) (Van Tien et al., 2021).

Figure 1 is about here.

Figure 2 is about here.

3. METHODOLOGY AND MATERIALS

3.1. Methodology

The methodology for establishing landslide susceptibility maps in the Halong - Vandon expressway area is presented in the flow chart of Fig. 3. This section presents the machine learning models, evaluation parameters, attribute selection methods, and data used in this research. The models in this paper were computed and built using Weka software version 3.8.6 (https://ml.cms.waikato.ac.nz/weka/). The hyperparameters of the models are presented in Table 1.

Figure 3 is about here.

Table 1 is about here.

3.2. Adopted Models

3.2.1. Dual perturb and combine for tree-based (DPCT)

The DPCT is a machine learning algorithm for classification and regression problems, first introduced by Geurts and Wehenkel (2005). This algorithm is a modified version of the traditional dual perturb and combine (DPC) technique, in which the combination of perturbed datasets and predictions of base models is performed analytically, without the need for multiple iterations of the training and prediction process (Geurts, 2001; Geurts and Wehenkel, 2005). By finding the optimal combination weights through analysis, DPCT can provide efficient solutions and extensions for combining learning with tree-based models (Geurts and Wehenkel, 2005; Khosravi et al., 2022). Below is the sequence of operational steps of DPCT:

Step 1: Data Perturbation: Generate perturbed versions of the original dataset using techniques such as bootstrapping, feature sampling, or instance sampling.

Step 2: Base Model Training: Train multiple base models, such as decision trees, random forests, or gradient boosting machines, on each perturbed dataset independently. These base models capture different aspects of the data due to the randomness introduced during perturbation.

Step 3: Analytical Combination: Instead of combining predictions using techniques like voting or averaging, DPCT analytically determines the optimal combination weights for the predictions of the tree-based models. This may involve solving optimization problems to minimize loss functions or directly computing closed-form expressions for the optimal weights.

Step 4: Final Prediction: Once the optimal combination weights are determined, the final prediction for a specific input sample is calculated as the weighted sum of the predictions from the base models.

Step 5: Evaluation and Tuning: Evaluate the performance of the ensemble model using appropriate evaluation metrics and adjust hyperparameters if necessary.
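To make the idea of an analytical combination more concrete, the sketch below illustrates one possible reading of it for the simplest tree, a single split. It assumes that the perturbation is Gaussian noise added to the input value at prediction time, in which case the average over all perturbed predictions has a closed form and can be compared with explicit sampling. The split on slope at 25 degrees, the noise level `sigma`, and all function names are hypothetical; this is an illustration only, not the Weka implementation used in this study.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stump_predict(x, threshold, left_value, right_value):
    """Hard output of a one-split tree: left leaf if x < threshold."""
    return left_value if x < threshold else right_value


def closed_form_dpc(x, threshold, left_value, right_value, sigma):
    """Expected stump output when x is perturbed by N(0, sigma^2) noise.

    P(x + eps < threshold) = Phi((threshold - x) / sigma), so no sampling is needed.
    """
    p_left = normal_cdf((threshold - x) / sigma)
    return p_left * left_value + (1.0 - p_left) * right_value


def monte_carlo_dpc(x, threshold, left_value, right_value, sigma, n=20000, seed=0):
    """Same quantity estimated by explicitly perturbing x n times and averaging."""
    rng = random.Random(seed)
    total = sum(
        stump_predict(x + rng.gauss(0.0, sigma), threshold, left_value, right_value)
        for _ in range(n)
    )
    return total / n


# Hypothetical split: slope < 25 degrees -> susceptibility 0.0, otherwise 1.0.
exact = closed_form_dpc(22.0, 25.0, 0.0, 1.0, sigma=5.0)
sampled = monte_carlo_dpc(22.0, 25.0, 0.0, 1.0, sigma=5.0)
print(f"closed form: {exact:.3f}, Monte Carlo: {sampled:.3f}")
```

The closed-form value turns the hard step at the threshold into a smooth transition, which is the practical effect of averaging over perturbations without running them one by one.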

3.2.2. Bagging

Bagging is a machine learning algorithm used to improve the performance of prediction models. The benefits of bagging include reducing overfitting and increasing the model's stability and accuracy, while also helping to minimize reliance on the training data (Breiman, 1996). Bagging, particularly when applied to weak models such as decision trees, can generate more robust models capable of aggregating loss patterns effectively. The bagging algorithm operates as follows:

Step 1: Bootstrap Sampling: Generate multiple subsets of data from the original training dataset through bootstrap sampling. Each subset is the same size as the original dataset but may contain repeated and missing samples.

Step 2: Model Training: Train a prediction model on each subset of data created in Step 1. Each model is trained on a different subset of data; thus, the models learn different aspects of the data.

Step 3: Prediction Aggregation: Combine the predictions from all the models trained in Step 2. In the case of classification problems, the voting method is commonly applied to select the final prediction.
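The three bagging steps above can be sketched in a few lines. The example below is a minimal illustration that assumes a one-feature threshold rule (a decision stump) as the base model and a small invented dataset of slope values with landslide labels; the function names and data are hypothetical and do not reproduce the Weka configuration of this study.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-feature threshold rule on (value, label) pairs by exhaustive search."""
    best = None
    for threshold in sorted({v for v, _ in rows}):
        for above in (0, 1):  # class predicted when value >= threshold
            correct = sum(
                1 for v, y in rows if (above if v >= threshold else 1 - above) == y
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above)
    _, threshold, above = best
    return lambda v: above if v >= threshold else 1 - above


def bagging_fit(rows, n_models=25, seed=0):
    """Steps 1 and 2: draw bootstrap samples and train one stump per sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # same size, with replacement
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, v):
    """Step 3: majority vote over the individual predictions."""
    votes = Counter(m(v) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical (slope in degrees, label) pairs: 1 = landslide, 0 = non-landslide.
rows = [(5, 0), (8, 0), (12, 0), (15, 0), (22, 1), (27, 1), (31, 1), (38, 1)]
models = bagging_fit(rows)
print(bagging_predict(models, 40), bagging_predict(models, 3))
```

Because each bootstrap sample omits some points and repeats others, the individual stumps choose slightly different thresholds, and the vote smooths out those differences.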

3.2.3. Cascade Generalization (CG)

CG is an ensemble model for classification problems based on the stacking algorithm. CG enhances the base model's performance by employing a sequential ensemble of classifiers, whereby new attributes are inserted into the original dataset at each step. These new attributes are derived from the class probabilities provided by the base model (Gama and Brazdil, 2000). This reduces bias in attribute evaluation, thereby improving the base model's performance. CG is currently one of the most widely used ensemble models in natural disaster assessment (Chen et al., 2019; Pham et al., 2019).
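The attribute-insertion step of CG can be illustrated with a small sketch. It assumes a deliberately simple base learner, a binned frequency estimate of the landslide probability for one factor with Laplace smoothing, and shows how its output is appended to each row as a new attribute for the next classifier in the sequence. The learner, the bin edge, and the toy rows are hypothetical choices made only for illustration.

```python
def fit_binned_probability(values, labels, edges):
    """Base learner: P(landslide | bin of one factor), with Laplace smoothing."""
    # One [non-landslide, landslide] counter per bin, initialised to 1 each.
    counts = [[1, 1] for _ in range(len(edges) + 1)]

    def bin_of(v):
        return sum(1 for e in edges if v >= e)

    for v, y in zip(values, labels):
        counts[bin_of(v)][y] += 1
    return lambda v: counts[bin_of(v)][1] / sum(counts[bin_of(v)])


def cascade_extend(rows, labels, factor_index, edges):
    """Append the base model's landslide probability to every row as a new attribute."""
    base = fit_binned_probability([r[factor_index] for r in rows], labels, edges)
    extended = [list(r) + [base(r[factor_index])] for r in rows]
    return extended, base


# Hypothetical rows: [slope in degrees, curvature]; labels: 1 = landslide.
rows = [[10, 0.2], [12, 0.1], [30, 0.4], [35, 0.3]]
labels = [0, 0, 1, 1]
extended, base = cascade_extend(rows, labels, factor_index=0, edges=[20])
for row in extended:
    print(row)
```

A second-level classifier would then be trained on `extended`, so it sees both the original factors and the base model's opinion.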

3.2.4. Dagging

Dagging is an ensemble model primarily used in classification tasks. Dagging helps improve the model's performance by dividing the original dataset into smaller subsets and combining predictions from sub-models (Ting and Witten, 1997). This helps the model avoid overfitting and enhances generalization. The dagging algorithm operates as follows:

Step 1: Decomposition: Firstly, the training dataset is decomposed into subsets using a specific decomposition method. Decomposition methods may include linear decomposition, scalar decomposition, or decomposition using a neural network.

Step 2: Aggregation: Prediction models are trained on the subsets obtained from the decomposition process. Each prediction model focuses on solving a specific part of the problem.

Step 3: Prediction Combination: Finally, sub-model predictions are combined to produce the final prediction. The voting method is used to combine predictions from the sub-models.
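A minimal sketch of the dagging steps follows. It assumes that the decomposition is a split into disjoint, stratified folds, which is how the Dagging meta-classifier is commonly described, and it uses a hypothetical base learner that places a threshold halfway between the two class means. The data and all names are invented for illustration.

```python
import random
from collections import Counter


def stratified_disjoint_folds(rows, n_folds, seed=0):
    """Split (value, label) rows into disjoint folds with similar class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    for members in by_class.values():
        rng.shuffle(members)
        for i, row in enumerate(members):
            folds[i % n_folds].append(row)  # deal rows round-robin per class
    return folds


def fit_midpoint_rule(rows):
    """Hypothetical base learner: threshold halfway between the two class means."""
    zeros = [v for v, y in rows if y == 0]
    ones = [v for v, y in rows if y == 1]
    m0, m1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
    threshold = (m0 + m1) / 2.0
    high_class = 1 if m1 >= m0 else 0
    return lambda v: high_class if v >= threshold else 1 - high_class


def dagging_fit(rows, n_folds=3, seed=0):
    """Train one base model per disjoint fold."""
    return [fit_midpoint_rule(f) for f in stratified_disjoint_folds(rows, n_folds, seed)]


def dagging_predict(models, v):
    """Combine the sub-model predictions by majority vote."""
    return Counter(m(v) for m in models).most_common(1)[0][0]


# Hypothetical (slope in degrees, label) pairs: 1 = landslide, 0 = non-landslide.
rows = [(4, 0), (7, 0), (9, 0), (12, 0), (14, 0), (16, 0),
        (21, 1), (24, 1), (28, 1), (30, 1), (33, 1), (37, 1)]
folds = stratified_disjoint_folds(rows, 3)
models = dagging_fit(rows)
print(dagging_predict(models, 40), dagging_predict(models, 2))
```

Unlike bagging, each training point is used by exactly one sub-model, so the sub-models are trained on smaller but non-overlapping portions of the data.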

3.3. Validation Parameters of Adopted Models

The models used for landslide susceptibility prediction are validated using evaluation metrics for classification problems (Wardhani et al., 2019; Pham et al., 2022; Bien et al., 2023; Le Minh et al., 2023), including AUC, Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity (SST), Specificity (SPF), Accuracy (ACC), Kappa index, and Root Mean Square Error (RMSE). Among these, AUC is a critical metric frequently used to evaluate the performance of classifiers (Chen and Chen, 2021). The AUC is determined by combining the SST and SPF values at each threshold of the predicted value. The value of AUC ranges from 0 to 1, with a higher AUC indicating better model performance (Chen and Chen, 2021; Pham et al., 2022; Bien et al., 2023; Le Minh et al., 2023). The PPV, NPV, SST, SPF, and ACC metrics are expressed as percentages and are calculated based on four parameters derived from the confusion matrix. These parameters consist of True Positive (TP) and False Positive (FP), which respectively denote correctly and incorrectly predicted landslide samples, and True Negative (TN) and False Negative (FN), which respectively denote correctly and incorrectly predicted non-landslide samples (Pham et al., 2022; Bien et al., 2023; Le Minh et al., 2023). Higher PPV, NPV, SST, SPF, and ACC values, along with lower RMSE, indicate greater model accuracy (Dao et al., 2020). The Kappa index is used as a statistical measure of agreement between predicted and actual values (Sterlacchini et al., 2011; Baeza, Lantada and Amorim, 2016). The Kappa value ranges from 0 to 1, with a value closer to 1 indicating greater model accuracy (Prakash et al., 2024). A model is considered to have high confidence accuracy with Kappa > 0.59 (Prakash et al., 2024).
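As an illustration of how AUC summarizes SST and SPF over all thresholds, the sketch below uses the rank-based form of the statistic: the fraction of (landslide, non-landslide) pairs in which the landslide sample receives the higher predicted score, with ties counted as one half. This is equivalent to the area under the ROC curve. The scores and labels are invented for the example.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # landslide ranked above non-landslide
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted susceptibility values and true labels (1 = landslide).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 3 of the 4 pairs are ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; sorting-based versions are preferable for large datasets.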

The formulas for calculating the metrics mentioned above are as follows (Le Minh et al., 2023):

PPV = TP/(TP + FP), (1)

NPV = TN/(TN + FN), (2)

SST = TP/(TP + FN), (3)

SPF = TN/(TN + FP), (4)

ACC = (TP + TN)/(TN + FN + TP + FP), (5)

$RMSE = \sqrt{\frac{\sum (x_i - \hat{x}_i)^2}{N - P}}$, (6)

where $x_i$ and $\hat{x}_i$ are the actual and predicted landslide susceptibility values, and P is the number of estimated parameters, including the constant. N is the total number of landslide samples.

$Kappa = \frac{P_0 - P_m}{1 - P_m}$, (7)

where $P_0$ is the relative observed agreement among raters and $P_m$ is the assumed probability of random agreement.
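Equations (1) to (7) can be checked with a short worked example. The sketch below computes the metrics from an invented confusion matrix. It assumes that the random agreement $P_m$ in Eq. (7) is estimated from the row and column totals of the confusion matrix, which is the usual convention for Cohen's Kappa, and it exposes the parameter count P of Eq. (6) as an optional argument that defaults to 0.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """PPV, NPV, SST, SPF, ACC in percent (Eqs. 1-5) and Kappa (Eq. 7)."""
    n = tp + fp + tn + fn
    p_0 = (tp + tn) / n  # observed agreement
    # Chance agreement from the marginal totals (assumed convention).
    p_m = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "PPV": 100.0 * tp / (tp + fp),
        "NPV": 100.0 * tn / (tn + fn),
        "SST": 100.0 * tp / (tp + fn),
        "SPF": 100.0 * tn / (tn + fp),
        "ACC": 100.0 * p_0,
        "Kappa": (p_0 - p_m) / (1.0 - p_m),
    }


def rmse(actual, predicted, n_params=0):
    """Root mean square error with an N - P denominator (Eq. 6)."""
    squared = sum((x - x_hat) ** 2 for x, x_hat in zip(actual, predicted))
    return math.sqrt(squared / (len(actual) - n_params))


# Hypothetical confusion matrix for 100 validation points.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"RMSE: {rmse([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4]):.3f}")
```

For this invented matrix the observed agreement is 0.85 and the chance agreement is 0.50, giving a Kappa of 0.70.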

3.4. Evaluation Attribute Methods

3.4.1. Correlation attribute evaluation (CAE)

CAE is based on the Pearson correlation coefficient, which measures the linear correlation between two continuous variables. It quantifies the strength and direction of the linear relationship between the variables (Nettleton, 2014). The value of the correlation coefficient lies between -1 and 1 (Nettleton, 2014). This study determines the correlation coefficient between two variables: the conditional factor and the landslide or non-landslide label of the training dataset (Lucchese, de Oliveira and Pedrollo, 2020). The correlation value is normalized to the range from 0 to 1 by taking its absolute value. Thus, a coefficient close to 1 indicates a strong linear correlation between the variables, meaning that the conditional factor significantly influences landslides. Conversely, a coefficient close to 0 indicates a weak linear relationship between the influencing parameter and landslides. The formula for calculating the correlation coefficient is presented below (Nettleton, 2014):

$R = \left| \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \right|$, (8)

where $x_i$ and $y_i$ are the values of the two variables, and $\bar{x}$ and $\bar{y}$ are the means of the two variables.
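A minimal sketch of the CAE score follows: the absolute Pearson correlation between one conditional factor and the 0/1 landslide label. The factor values and labels are invented for illustration.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equally long numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return abs(covariance / math.sqrt(spread_x * spread_y))


# Hypothetical factor values and landslide labels (1 = landslide, 0 = non-landslide).
factor = [1, 2, 3, 4]
label = [0, 0, 1, 1]
print(f"{abs_pearson(factor, label):.4f}")  # 2 / sqrt(5), about 0.8944
```

Taking the absolute value means that a factor which decreases as landslides become more frequent scores the same as one which increases, which matches the 0 to 1 normalization described above.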

3.4.2. Gain ratio attribute evaluation (GRAE)

The GRAE is a metric used for attribute evaluation in decision trees and other machine learning algorithms (Quinlan, 1986). It is used specifically in feature selection to determine the most informative attributes for classification tasks. Accordingly, the higher the GRAE value, the more influence the conditional factor has on landslides. If the GRAE value equals 0, the conditional factor is unrelated to landslides. The calculation proceeds as follows:

Step 1: Entropy: Entropy measures the impurity or randomness of the data. It is calculated from the distribution of class labels within a dataset. Higher entropy indicates more disorder.

Step 2: Information Gain: Information gain measures how much a given attribute contributes to reducing entropy in the dataset. When a dataset is split based on an attribute, information gain quantifies how much more ordered the resulting subsets are than the original dataset.

Step 3: Split Information: This component of the gain ratio accounts for the intrinsic randomness associated with the attribute. It is calculated from the distribution of values of the attribute. If an attribute has many distinct values, its split information is higher.

Step 4: Gain Ratio: The gain ratio combines information gain and split information. It is calculated by dividing the information gain by the split information. This normalization helps in selecting attributes that have a good balance between information gain and intrinsic randomness.
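The four steps above translate directly into code. The sketch below assumes a categorical (already classified) conditional factor and uses base-2 logarithms; the attribute values and labels are invented for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Step 1: Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Steps 2-4: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute_values, labels):
        groups.setdefault(a, []).append(y)
    # Entropy that remains after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical slope classes and landslide labels (1 = landslide).
labels = [1, 1, 0, 0]
informative = ["steep", "steep", "flat", "flat"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]               # carries no information on the label
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

The first attribute reaches the maximum ratio of 1 because it removes all label entropy, while the second scores 0 because each of its groups is as mixed as the full dataset.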

3.4.3. OneR method

OneR is used for attribute selection in machine learning and data mining (Holte, 1993). This method evaluates attributes based on their relevance to the target variable in a dataset (Le Minh et al., 2023). Accordingly, the higher the OneR value, the higher the conditioning variable ranks in its influence on landslides. The OneR algorithm typically works as follows:

Step 1: Selecting a Target Variable: The first step is to select a target variable, which is the variable to be predicted or classified. This could be a categorical or numerical variable, depending on the nature of the problem.

Step 2: Grouping Data by Each Attribute: Next, the algorithm examines each attribute in the dataset one at a time. For each attribute, the data is grouped by its values.

Step 3: Finding the Most Common Class: Within each group, OneR determines the target variable's most common class or outcome. This could be the most frequent category in the case of a categorical target variable, or the mean or median value in the case of a numerical target variable.

Step 4: Creating Rules: Based on the most common class found in each group, OneR creates simple rules or decision boundaries. These rules take the form: "If attribute A has value X, then predict class Y."

Step 5: Measuring Accuracy: Once rules are created for each attribute, the algorithm measures the accuracy of predictions using these rules.

Step 6: Selecting the Best Attribute: Finally, the algorithm selects the attribute that produces the most accurate predictions as the OneR model for that dataset.
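The OneR steps can be sketched as follows for categorical attributes and a binary landslide label. The example measures accuracy on the training data itself, which is a simplification; software implementations may discretize numeric attributes and use cross-validation instead. The attribute names and values are invented.

```python
from collections import Counter, defaultdict


def one_r_rule(values, labels):
    """Steps 2-5: build one rule per attribute value and measure its accuracy."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    # "If the attribute has value v, predict the most common class for v."
    rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    accuracy = sum(1 for v, y in zip(values, labels) if rule[v] == y) / len(labels)
    return rule, accuracy


def one_r(dataset, labels):
    """Step 6: score every attribute and return the name of the most accurate one."""
    scored = {name: one_r_rule(values, labels) for name, values in dataset.items()}
    best = max(scored, key=lambda name: scored[name][1])
    return best, scored


# Hypothetical categorical factors and landslide labels (1 = landslide).
dataset = {
    "slope_class": ["steep", "steep", "gentle", "gentle", "steep", "gentle"],
    "aspect": ["N", "S", "N", "S", "N", "S"],
}
labels = [1, 1, 0, 0, 1, 0]
best, scored = one_r(dataset, labels)
print(best, {name: round(acc, 3) for name, (_, acc) in scored.items()})
```

The per-attribute accuracies in `scored` are the values that would be used to rank the conditioning factors.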

3.5. Materials

3.5.1. Landslide inventories

The inventory data includes landslide and non-landslide locations (Yang et al., 2023). The role of these data is to provide the labels used to train the machine learning models (Tehrani, Santinelli and Herrera Herrera, 2021; Gu et al., 2024). This study collected all historical landslide data within the research area. The method for identifying landslide locations involved two main steps: (1) digitization on satellite imagery (Google Earth) and (2) verification through field surveys. In the research area, 78 landslide locations were identified and represented on the map as polygons. To convert the data into a format understandable by machine learning models, landslide data was assigned a value of 1 and non-landslide data a value of 0. The entire landslide area was then divided into points corresponding to a spatial resolution of 10 m/pixel. The total number of landslide points is 3263, divided into two sets: a training set consisting of 70% of the landslide polygons (2392 points) and a testing set consisting of 30% of the landslide polygons (871 points). Non-landslide data (3263 points) were sampled at a 1:1 ratio with landslide points. Non-landslide points were sampled in two main steps: (1) random sampling of points on map layers with slopes less than 5° and curvature values > -0.05 and < 0.05 (flat class), and (2) verification and adjustment based on field surveys.
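The partition described above can be sketched as follows, under the assumption that the 70/30 split is made at the level of whole polygons so that all points of one landslide fall in the same set, and that non-landslide points are drawn at random from a pool of candidate points that already satisfy the slope and curvature criteria. The polygon identifiers, point counts, and function names are hypothetical.

```python
import random


def split_by_polygon(point_polygon_ids, train_fraction=0.7, seed=0):
    """Assign whole polygons to training or testing; return point indices for each."""
    rng = random.Random(seed)
    polygons = sorted(set(point_polygon_ids))
    rng.shuffle(polygons)
    n_train = round(train_fraction * len(polygons))
    train_polygons = set(polygons[:n_train])
    train = [i for i, p in enumerate(point_polygon_ids) if p in train_polygons]
    test = [i for i, p in enumerate(point_polygon_ids) if p not in train_polygons]
    return train, test


def sample_non_landslide(candidate_points, n_landslide_points, seed=0):
    """Draw non-landslide points at a 1:1 ratio with the landslide points."""
    return random.Random(seed).sample(candidate_points, n_landslide_points)


# Hypothetical inventory: 78 polygons, each contributing between 1 and 5 points.
point_polygon_ids = [k for k in range(78) for _ in range(k % 5 + 1)]
train, test = split_by_polygon(point_polygon_ids)

# Hypothetical pool of candidate cells on flat, gently sloping terrain.
candidates = list(range(10_000))
non_landslide = sample_non_landslide(candidates, len(point_polygon_ids))
print(len(train), len(test), len(non_landslide))
```

Splitting by polygon rather than by point avoids placing neighboring pixels of the same landslide in both the training and the testing set, which is one reason the point counts need not match the 70/30 proportion exactly.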

15
334

335 3.5.2. Conditional factors

Landsliding is a process driven by the interaction of conditioning factors related to geological characteristics, topography, geomorphology, land cover, and rainfall (Le Minh et al., 2023). Among these, rainfall is often the triggering factor for landslides (Polemio and Petrucci, 2000; Crosta and Frattini, 2008), while the other factors contribute to the predisposition for such events (Zhang et al., 2019). We selected 14 conditioning parameters to build the landslide susceptibility maps for the Vandon - Halong Expressway area. These parameters and their sources are presented in Fig. 4 and Table 2. Their selection is based on a synthesis of expert judgment, the characteristic conditions of the study area, and statistical evaluation methods (Correlation Attribute Evaluation, Gain Ratio Attribute Evaluation, OneR). Elevation (m) characterizes the 'terrain potential': higher elevations indicate greater terrain potential and are more conducive to landslide occurrence (Bien et al., 2023; Le Minh et al., 2023). Weathering crust type indicates the degree of rock destruction and is related to the stability of the soil (Mai, 1996; Thanh et al., 2020; Phong et al., 2021). Geological and geotechnical engineering factors characterize the properties, types, and components of soil and rock, and are thus indirectly related to their physical properties (Ohlmacher, 2000; Chacón et al., 2006; Sitányiová et al., 2015). Hydrogeological characteristics indirectly indicate the water retention capacity of soil and rock, which is related to the pore-water pressure conditions within the soil (Tacher et al., 2005). LULC represents the vegetation cover of the land; typically, areas with dense forest cover have lower landslide probabilities (Rabby et al., 2022). Rainfall amount characterizes the landslide triggering factor: when rain falls, water infiltrates the soil, saturating it and breaking the original soil-rock bonds, so higher rainfall amounts favor landslide occurrence (Zhang et al., 2019). In this study, rainfall amount is calculated as the daily average, and the classes are divided using the natural breaks method. Fault density (km/km²) characterizes the degree of rock destruction by tectonic activity; areas with higher fault densities experience greater rock destruction, which facilitates landslides (Le Minh et al., 2023). Stream density (km/km²) indirectly indicates the water retention capacity of the soil and the drainage conditions of the terrain (Shirzadi et al., 2017). Typically, areas with higher stream densities are more favorable for landslides. Slope (degrees) is also important for landslide occurrence; generally, slopes ranging from 25° to 40° are favorable for landslides (Çellek, 2020). Aspect represents the characteristics of windward slopes and is indirectly related to the soil's moisture absorption from humid air streams (Seda, 2021). Curvature characterizes the terrain surface: flat terrain (values from -0.05 to 0.05) usually experiences fewer landslides, while concave (<-0.05) and convex (>0.05) terrains are more favorable for landslides (Phong et al., 2021). The Topographic Wetness Index (TWI) indirectly indicates the moisture retention conditions of the terrain and is related to the soil's water saturation; higher TWI values indicate greater moisture retention capacity in the soil, and vice versa (Conoscenti, Di Maggio and Rotigliano, 2008). Lastly, the Stream Power Index (SPI) characterizes the energy of the terrain; higher SPI values correspond to higher landslide probabilities (Yilmaz, 2009).
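The two hydrological indices described above can be illustrated with a short sketch. It assumes the commonly used definitions TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β the slope angle; the manuscript does not state which formulas or GIS tools were used, so the function name `twi_spi`, the `eps` guard, and the sample cells are illustrative assumptions only.

```python
import math

def twi_spi(catchment_area, slope_deg, eps=1e-6):
    """Return (TWI, SPI) for one cell.

    catchment_area: specific catchment area of the cell (assumed input, in m).
    slope_deg: slope angle in degrees.
    eps: small floor that keeps flat cells (tan(slope) = 0) from dividing by zero.
    """
    tan_beta = max(math.tan(math.radians(slope_deg)), eps)
    area = max(catchment_area, eps)
    twi = math.log(area / tan_beta)  # assumed definition: ln(a / tan(beta))
    spi = area * tan_beta            # assumed definition: a * tan(beta)
    return twi, spi

# Hypothetical cells: a gentle valley floor and a steep hillslope.
valley_twi, valley_spi = twi_spi(500.0, 2.0)
hillslope_twi, hillslope_spi = twi_spi(20.0, 35.0)
print(f"valley:    TWI = {valley_twi:.2f}, SPI = {valley_spi:.2f}")
print(f"hillslope: TWI = {hillslope_twi:.2f}, SPI = {hillslope_spi:.2f}")
```

Under these definitions, a gently sloping cell with a large contributing area receives a higher TWI than a steep cell with a small contributing area, which matches the moisture-retention interpretation given in the text.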

Table 2 is about here.

Figure 4 is about here.

4. RESULTS

4.1. Conditional Factor Importance

The importance evaluation of the landslide conditioning factors shows that each method produces its own ranking (Table 3). According to both the CAE and GRAE methods, Slope is the most influential factor, while the OneR method ranks it second. Elevation is the most influential factor according to the OneR method and is ranked third by the CAE and GRAE methods. TWI is ranked second by the CAE method but only fourth by the other two methods. Aspect is ranked fourth by the CAE method and fifth by the other methods. Curvature is ranked second and third by the GRAE and OneR methods, respectively, but only thirteenth by the CAE method. Likewise, rainfall is the fifth most important factor according to the CAE method but ranks ninth and twelfth according to the OneR and GRAE methods, respectively. Geology is ranked sixth by the GRAE and OneR methods but only eleventh by the CAE method. Fault density is ranked sixth by the CAE method, seventh by the OneR method, and tenth by the GRAE method. The remaining five conditioning factors, Weathering crust, Geotechnical Engineering, Hydrogeology, SPI, and LULC, are ranked as having low influence by all three evaluation methods. Overall, the different evaluation methods indicate that every factor has some level of influence on landslide occurrence. The most important factors are Slope, Elevation, Aspect, TWI, Rainfall, Curvature, and Geology.
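One simple way to compare the three rankings side by side is to average the ranks quoted in the paragraph above. The sketch below does this for the factors whose ranks are stated explicitly in the text; averaging ranks is an illustrative choice made here, not the aggregation procedure of the study, and the function name `mean_rank_order` is hypothetical.

```python
# Ranks quoted in the text, as (CAE, GRAE, OneR). Factors for which the text
# gives no explicit rank are omitted.
ranks = {
    "Slope": (1, 1, 2),
    "Elevation": (3, 3, 1),
    "TWI": (2, 4, 4),
    "Aspect": (4, 5, 5),
    "Curvature": (13, 2, 3),
    "Rainfall": (5, 12, 9),
    "Geology": (11, 6, 6),
    "Fault density": (6, 10, 7),
}

def mean_rank_order(rank_table):
    """Return (factor, mean rank) pairs sorted from most to least influential."""
    means = {name: sum(r) / len(r) for name, r in rank_table.items()}
    return sorted(means.items(), key=lambda item: item[1])

order = mean_rank_order(ranks)
for name, mean_rank in order:
    print(f"{name:14s} mean rank = {mean_rank:.2f}")
```

A mean rank hides disagreement between methods (Curvature, for example, is ranked 13th by CAE but 2nd by GRAE), so the per-method ranks in Table 3 remain the primary reference.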

Fig. 5 illustrates the distribution of landslide and non-landslide positions across the classes of each conditioning factor. The charts reveal which classes are most strongly associated with landslide and non-landslide occurrences. For the Slope factor, non-landslide positions are mainly concentrated in classes ranging from 0–5°, with the 0° class alone holding over 1700 samples. Landslide positions are more evenly distributed across classes ranging from 0–45°, with fewer occurrences in classes greater than 45°. For the Weathering crust factor, landslides and non-landslides are predominantly distributed across two classes: Ferosialit and Ferosialit–Sialferit. For the Geology factor, landslides are predominantly concentrated in the Hon Gai Formation (2200 samples), Ha Coi Formation (630 samples), and Quaternary (250 samples). Non-landslide positions are more evenly distributed across the predominant classes, particularly the Hon Gai Formation (1050 samples) and Quaternary (650 samples). For the Geotechnical Engineering factor, landslides and non-landslides are concentrated in class G2, with sample counts of approximately 2450 and 2000, respectively. Similarly, for the Hydrogeology factor, landslide and non-landslide positions are predominantly concentrated in the water-poor region class, with sample counts of 3000 and 2500, respectively. For LULC, landslides and non-landslides are predominantly distributed in the Forest class, with sample counts of approximately 2600 and 2000, respectively. For Elevation, landslides are primarily distributed at elevations ranging from 2.5 to 250 m, while non-landslides are distributed at lower elevations (< 50 m). For rainfall, landslides and non-landslides are distributed across all classes. Landslide positions are concentrated in classes with high rainfall (291–353 mm/day) and moderate rainfall (148–155 mm/day), with sample counts of approximately 1150 and 1000, respectively. Non-landslide positions are concentrated in classes with low rainfall (0–148 mm/day), with approximately 1400 samples. For the Fault density factor, landslides and non-landslides are distributed across classes ranging from 0–1 km/km², with the highest concentration in the 1 km/km² class. For Stream density, landslide positions are concentrated from 0–11 km/km², while non-landslide positions are predominantly concentrated from 5–12 km/km². For the Aspect factor, landslides are fairly evenly distributed across directions, with a notable concentration in the South (S) class (700 samples), whereas non-landslide positions are predominantly concentrated in the Flat class (approximately 1700 samples). For the Curvature factor, landslide positions are scattered across the value range, whereas non-landslide positions are predominantly concentrated within the range from -0.05 to 0.05. For the SPI factor, landslides and non-landslides are sporadically and unevenly distributed across classes. Finally, for the TWI factor, landslide positions are predominantly concentrated in the value range 0.32–8.0, while non-landslide positions are predominantly concentrated in the value range 10–21.

Table 3 is about here.

Figure 5 is about here.

4.2. Evaluation of Models

The reliability and accuracy of the models were evaluated using the following metrics: AUC, PPV (%), NPV (%), SST (%), SPF (%), ACC (%), Kappa, and RMSE. The numerical evaluation results are detailed in Fig. 6 and Table 4. The B-DPCT model yields the best results on the validation set, with AUC = 0.948, PPV = 97.73%, NPV = 67.85%, SST = 73.4%, SPF = 96.42%, ACC = 83.6%, Kappa = 0.67, and RMSE = 0.39 (Fig. 6b, Table 4).
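For readers unfamiliar with these metrics, the sketch below shows how PPV, NPV, SST (sensitivity), SPF (specificity), ACC, and Cohen's Kappa follow from a binary confusion matrix, and how RMSE follows from predicted probabilities. It uses the standard textbook definitions; the counts and the function names are hypothetical and are not taken from Table 4.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a landslide / non-landslide classifier."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "PPV": tp / (tp + fp),    # positive predictive value
        "NPV": tn / (tn + fn),    # negative predictive value
        "SST": tp / (tp + fn),    # sensitivity
        "SPF": tn / (tn + fp),    # specificity
        "ACC": acc,
        "Kappa": (acc - p_chance) / (1 - p_chance),
    }

def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

# Hypothetical counts, for illustration only.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these definitions, a model that rarely raises false alarms but misses some landslides shows a high PPV and SPF together with a lower NPV and SST, which is the pattern reported for the models in this study.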

Figure 6 is about here.

Table 4 is about here.

4.3. Landslide Susceptibility Maps

Using the full dataset for the entire study area, we produced four landslide susceptibility maps with the DPCT, B-DPCT, CG-DPCT, and D-DPCT models (Fig. 7). Each map is divided into five susceptibility classes (very low, low, moderate, high, and very high) using the natural breaks method (Fig. 7). The single DPCT model assigns a high area ratio to the very low (41.3%) and low (41.3%) classes, while the moderate, high, and very high classes together account for only 17.6% (Fig. 7a, 8a). The CG-DPCT model is similar, with most of the area in the very low and low classes (56.8%) and the remaining three classes accounting for 43.2% (Fig. 7c, 8a). Likewise, in the B-DPCT model the very low and low classes dominate (54.1%), with the other three classes accounting for 45.9% (Fig. 7b, 8a). In the D-DPCT model, the landslide susceptibility classes are more evenly distributed: very low (18.8%), low (14.7%), moderate (15.8%), high (14.1%), and very high (10.8%) (Fig. 7d, 8a).
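The area ratios reported above can be reproduced in principle by binning each cell's susceptibility index into the five classes and counting cells. The following is a minimal sketch under two assumptions: all raster cells have equal area, and the four class boundaries are already known (in practice they would come from the natural breaks classification in the GIS). The `breaks` values and cell indices below are hypothetical.

```python
from bisect import bisect_right

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def area_share_by_class(index_values, breaks):
    """Percentage of cells per class, given four ascending break values.

    A value equal to a break is assigned to the class above that break.
    """
    counts = [0] * len(CLASS_NAMES)
    for value in index_values:
        counts[bisect_right(breaks, value)] += 1
    total = len(index_values)
    return {name: 100.0 * c / total for name, c in zip(CLASS_NAMES, counts)}

# Hypothetical susceptibility indices and breaks, for illustration only.
cells = [0.05, 0.10, 0.18, 0.22, 0.35, 0.41, 0.58, 0.66, 0.83, 0.95]
shares = area_share_by_class(cells, breaks=[0.2, 0.4, 0.6, 0.8])
for name, pct in shares.items():
    print(f"{name:10s} {pct:5.1f}%")
```

Because every cell falls into exactly one class, the five percentages produced this way always sum to 100%.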

Figure 7 is about here.

Figure 8 is about here.

5. DISCUSSION

Recent studies have primarily focused on evaluating model performance on validation datasets (Bui et al., 2020; Dao et al., 2020; Thanh et al., 2020; Ghasemian et al., 2020; Phong et al., 2021; Nhu et al., 2022; Pham et al., 2022; Shahzad, Ding and Abbas, 2022; Bien et al., 2023; Le Minh et al., 2023). The base DPCT model achieves the following evaluation metrics: AUC = 0.919, PPV = 97.63%, NPV = 56.83%, SST = 71.60%, SPF = 95.56%, ACC = 78.34%, Kappa = 0.56, and RMSE = 0.40. The Bagging ensemble technique enhances the performance of the DPCT model, with the following evaluation metrics: AUC = 0.948 (an increase of 0.029), PPV = 97.73% (an increase of 0.1%), NPV = 67.85% (an increase of 11.02%), SST = 77.22% (an increase of 5.62%), SPF = 96.41% (an increase of 0.85%), ACC = 83.60% (an increase of 5.26%), Kappa = 0.67 (an increase of 0.11), and RMSE = 0.37 (a decrease of 0.03). Similarly, the Cascade Generalization technique improves the performance of the base DPCT model, with the following evaluation metrics: AUC = 0.920 (an increase of 0.001), PPV = 97.63% (unchanged), NPV = 59.82% (an increase of 2.99%), SST = 73.04% (an increase of 1.44%), SPF = 95.77% (an increase of 0.21%), ACC = 79.75% (an increase of 1.41%), Kappa = 0.59 (an increase of 0.03), and RMSE = 0.39 (a decrease of 0.01). Lastly, the Dagging technique also improves the overall performance of the base DPCT model, with the following evaluation metrics: AUC = 0.932 (an increase of 0.013), PPV = 97.43% (a decrease of 0.2%), NPV = 65.33% (an increase of 8.5%), SST = 75.80% (an increase of 4.2%), SPF = 95.79% (an increase of 0.23%), ACC = 82.25% (an increase of 3.91%), Kappa = 0.64 (an increase of 0.08), and RMSE = 0.37 (a decrease of 0.03). These results demonstrate that the Bagging, Cascade Generalization, and Dagging techniques can all enhance the performance of the DPCT model (Fig. 6, Table 4). In other landslide evaluation studies, these techniques have likewise improved the performance of a single model (Ali et al., 2024; Gu et al., 2024; Zhao et al., 2024), which indicates that ensemble learning techniques can enhance the performance of the original single model (Liu et al., 2024; Singh et al., 2024). In this study, the Bagging ensemble based on the DPCT model exhibited the best performance among the models compared.
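The changes relative to the base model are plain differences between each ensemble's metric and the corresponding DPCT metric. The short sketch below recomputes them from the values quoted in this paragraph; the dictionary layout and the function name `deltas` are illustrative choices, and percentage metrics are differenced in percentage points.

```python
# Metric values as quoted in the paragraph above.
base = {"AUC": 0.919, "PPV": 97.63, "NPV": 56.83, "SST": 71.60,
        "SPF": 95.56, "ACC": 78.34, "Kappa": 0.56, "RMSE": 0.40}

ensembles = {
    "B-DPCT":  {"AUC": 0.948, "PPV": 97.73, "NPV": 67.85, "SST": 77.22,
                "SPF": 96.41, "ACC": 83.60, "Kappa": 0.67, "RMSE": 0.37},
    "CG-DPCT": {"AUC": 0.920, "PPV": 97.63, "NPV": 59.82, "SST": 73.04,
                "SPF": 95.77, "ACC": 79.75, "Kappa": 0.59, "RMSE": 0.39},
    "D-DPCT":  {"AUC": 0.932, "PPV": 97.43, "NPV": 65.33, "SST": 75.80,
                "SPF": 95.79, "ACC": 82.25, "Kappa": 0.64, "RMSE": 0.37},
}

def deltas(model_metrics, reference=base):
    """Difference of each metric from the reference model, rounded to 3 decimals."""
    return {k: round(model_metrics[k] - reference[k], 3) for k in reference}

for name, metrics in ensembles.items():
    print(name, deltas(metrics))
```

For RMSE a negative difference is an improvement, whereas for all other metrics listed a positive difference is an improvement.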

Alongside the evaluation metrics presented above, analyzing the landslide susceptibility maps themselves helps determine which model is the most reasonable for establishing landslide susceptibility maps in the study area (Dao et al., 2020; Phong et al., 2021; Le Minh et al., 2023). Fig. 8 presents the results of this map-based analysis. In all models, the percentage of landslides and the frequency ratio (FR) of landslides tend to increase across landslide susceptibility classes from very low to very high, with landslide locations primarily concentrated in the high and very high classes (Fig. 8b, 8d). The percentage of non-landslides and the FR of non-landslides tend to decrease across landslide susceptibility classes from very low to very high, with non-landslide locations mainly concentrated in the very low and low classes (Fig. 8c, 8e). According to the frequency ratio evaluation method, the higher the landslide FR values in the high and very high susceptibility classes, the more reliable the landslide susceptibility map (Dao et al., 2020; Phong et al., 2021; Bien et al., 2023; Le Minh et al., 2023). Conversely, the more strongly the non-landslide FR is concentrated in the low and very low susceptibility classes, the more reliable the map. Accordingly, the Bagging ensemble based on the DPCT model yields the best results for the FR of landslides, with FR = 4.59 in the high class and FR = 10.03 in the very high class (Fig. 8d). The Dagging ensemble based on the DPCT model yields the best results for the FR of non-landslides, with FR = 3.38 in the very low class and FR = 1.44 in the low class (Fig. 8e).
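The frequency ratio used in this comparison is commonly defined, for each susceptibility class, as the share of landslide (or non-landslide) points falling in that class divided by the share of the study area occupied by that class. The sketch below implements this common definition; the point counts and class areas are hypothetical, the function name `frequency_ratio` is illustrative, and every class is assumed to have a positive area.

```python
def frequency_ratio(points_per_class, area_per_class):
    """FR per class = (share of points in class) / (share of area in class)."""
    total_points = sum(points_per_class.values())
    total_area = sum(area_per_class.values())
    fr = {}
    for name, area in area_per_class.items():
        point_share = points_per_class.get(name, 0) / total_points
        area_share = area / total_area  # assumes area > 0 for every class
        fr[name] = point_share / area_share
    return fr

# Hypothetical landslide counts and class areas (km^2), for illustration only.
landslides = {"very low": 1, "low": 2, "moderate": 7, "high": 20, "very high": 70}
area_km2 = {"very low": 40.0, "low": 25.0, "moderate": 15.0, "high": 12.0, "very high": 8.0}

fr = frequency_ratio(landslides, area_km2)
for name, value in fr.items():
    print(f"{name:10s} FR = {value:.2f}")
```

An FR above 1 means the class contains more points than its area share alone would suggest, so a reliable map shows landslide FR rising well above 1 in the high and very high classes.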

The analysis of the 14 conditioning factors with different attribute selection methods shows that each method yields a different evaluation of the importance of each factor (Table 3). This indicates that all the conditioning factors play specific roles in influencing landslides (Keshri, Sarkar and Chattoraj, 2023; Yu, Wang and Pradhan, 2024). Therefore, the more factors affecting landslides are selected, the more objective the evaluation and establishment of landslide susceptibility maps become. In this study, unlike other studies (Dao et al., 2020; Phong et al., 2021; Le Minh et al., 2023), we did not select factors such as distance to roads or traffic density, because the research area is itself centered on the Halong – Vandon expressway. Parameters related to roads would therefore not reflect landslide characteristics and would introduce noise into the datasets.

The NPV values of all models on the training and validation datasets are much lower than the PPV values (Table 4). This suggests that a more appropriate selection of non-landslide locations may help improve model performance (Yang et al., 2023; Gu et al., 2024). Subsequent studies should pay more attention to evaluating the selection of non-landslide locations (Yang et al., 2023; Gu et al., 2024; Huang et al., 2024). Additionally, we believe that the following enhancements can improve the performance of the models in future studies: 1) adjusting the structural parameters of the model and selecting the optimal model structure, 2) enhancing the detail and resolution of the conditioning factor maps, 3) ensuring consistency in the research level and scale of the data, and 4) weighting the predicted labels according to the scale of each landslide location.

Many issues related to improving model performance remain open, such as the selection of non-landslide locations, the choice of conditioning factors, the level of detail of the data, and the structure of the selected models. Nevertheless, this is the first study to successfully apply ensemble models to enhance the performance of DPCT models in establishing landslide susceptibility maps in expressway areas. We recommend applying the Bagging ensemble based on the DPCT model to establish landslide susceptibility maps under similar conditions. The landslide susceptibility map based on the B-DPCT model in the research area is recommended for use in management, construction planning, and disaster prevention and mitigation efforts. The classes of the landslide susceptibility map provide a scientific basis for managers to carry out these tasks.

6. CONCLUSIONS

This study highlights the potential of ensemble methods to improve the performance of landslide susceptibility prediction. It is the first time that ensemble methods based on Dual Perturb and Combine for Tree-based (DPCT) models have been applied to enhance landslide prediction performance in the Halong - Vandon Expressway area, Quang Ninh, Vietnam. In the evaluations and validations, the Bagging technique combined with DPCT yielded the best results and is recommended for landslide prediction under similar conditions. Additionally, the study demonstrates that the structure of the models can significantly influence the accuracy and effectiveness of landslide prediction. Among the 14 conditioning factors, different evaluation methods assign varying importance to each factor. The dominant conditioning factors are Slope, Elevation, Aspect, TWI, Rainfall, Curvature, and Geology.

The models identify approximately 10% of the study area as having the highest landslide susceptibility, mainly concentrated along the Halong - Vandon Expressway. The distribution of landslide susceptibility classes provides a scientific basis for the government to manage and plan future construction projects and to implement preventive measures that minimize landslide risks. Further research is necessary to improve the performance of landslide prediction. The performance of future landslide prediction models can be enhanced by modifying model structures, adjusting conditioning factors, and selecting suitable non-landslide locations. In particular, improving the level of detail of the data has a significant impact on landslide prediction performance.

ACKNOWLEDGMENTS

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 105.08-2020.25.

REFERENCES

Ado, M., Amitab, K., Maji, A.K., Jasińska, E., Gono, R., Leonowicz, Z. and Jasiński, M., 2022, Landslide susceptibility mapping using machine learning: a literature survey. Remote Sensing, 14, 3029. https://doi.org/10.3390/rs14133029

AghaKouchak, A., Chiang, F., Huning, L.S., Love, C.A., Mallakpour, I., Mazdiyasni, O., Moftakhari, H., Papalexiou, S.M., Ragno, E. and Sadegh, M., 2020, Climate extremes and compound hazards in a warming world. Annual Review of Earth and Planetary Sciences, 48, 519–548. https://doi.org/10.1146/annurev-earth-071719-055228

Ajin, R.S., Saha, S., Saha, A., Biju, A., Costache, R. and Kuriakose, S.L., 2022, Enhancing the accuracy of the REPTree by integrating the hybrid ensemble meta-classifiers for modelling the landslide susceptibility of Idukki district, south-western India. Journal of the Indian Society of Remote Sensing, 50, 2245–2265. https://doi.org/10.1007/s12524-022-01599-4

Ali, N., Chen, J., Fu, X., Ali, R., Hussain, M.A., Daud, H., Hussain, J. and Altalbe, A., 2024, Integrating machine learning ensembles for landslide susceptibility mapping in northern Pakistan. Remote Sensing, 16, 988. https://doi.org/10.3390/rs16060988

Arabameri, A., Chandra Pal, S., Rezaie, F., Chakrabortty, R., Saha, A., Blaschke, T., Di Napoli, M., Ghorbanzadeh, O. and Thi Ngo, P.T., 2022, Decision tree based ensemble machine learning approaches for landslide susceptibility mapping. Geocarto International, 37, 4594–4627. https://doi.org/10.1080/10106049.2021.1892210

Asadi, M., Goli Mokhtari, L., Shirzadi, A., Shahabi, H. and Bahrami, S., 2022, A comparison study on the quantitative statistical methods for spatial prediction of shallow landslides (case study: Yozidar-Degaga route in Kurdistan province, Iran). Environmental Earth Sciences, 81, 51. https://doi.org/10.1007/s12665-021-10152-4

Azarafza, M., Azarafza, M., Akgün, H., Atkinson, P.M. and Derakhshani, R., 2021, Deep learning-based landslide susceptibility mapping. Scientific Reports, 11, 24112. https://doi.org/10.1038/s41598-021-03585-1

Azevedo, B.F., Rocha, A.M.A.C. and Pereira, A.I., 2024, Hybrid approaches to optimization and machine learning methods: a systematic literature review. Machine Learning. https://doi.org/10.1007/s10994-023-06467-x

Baeza, C., Lantada, N. and Amorim, S., 2016, Statistical and spatial analysis of landslide susceptibility maps with different classification systems. Environmental Earth Sciences, 75, 1318. https://doi.org/10.1007/s12665-016-6124-1

Beigh, I.H. and Bukhari, S.K., 2024, Landslide susceptibility assessment using GIS-based multicriteria decision analysis (MCDA) along a part of national expressway-1, Kashmir-Himalayas, India. Applied Geomatics. https://doi.org/10.1007/s12518-024-00559-6

Bien, T.X., Iqbal, M., Jamal, A., Nguyen, D.D., Van Phong, T., Costache, R., Ho, L.S., Van Le, H., Nguyen, H.B.T., Prakash, I. and Pham, B.T., 2023, Integration of rotation forest and multiboost ensemble methods with forest by penalizing attributes for spatial prediction of landslide susceptible areas. Stochastic Environmental Research and Risk Assessment, 37, 4641–4660. https://doi.org/10.1007/s00477-023-02521-1

Binh Thai, P., Duc Nguyen, D., Bui Thi, Q.-A., Duc Nguyen, M., Tien Vu, T. and Prakash, I., 2022, Estimation of load-bearing capacity of bored piles using machine learning models. Vietnam Journal of Earth Sciences, 44, 470–480. https://doi.org/10.15625/2615-9783/17177

Breiman, L., 1996, Bagging predictors. Machine Learning, 24, 123–140. https://doi.org/10.1007/BF00058655

Bui, D.T., Tsangaratos, P., Nguyen, V.-T., Liem, N.V. and Trinh, P.T., 2020, Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. Catena, 188, 104426. https://doi.org/10.1016/j.catena.2019.104426

Bui, Q.D., Ha, H., Khuc, D.T., Nguyen, D.Q., von Meding, J., Nguyen, L.P. and Luu, C., 2023, Landslide susceptibility prediction mapping with advanced ensemble models: Son La province, Vietnam. Natural Hazards, 116, 2283–2309. https://doi.org/10.1007/s11069-022-05764-3

Çellek, S., 2020, Effect of the slope angle and its classification on landslide. Natural Hazards and Earth System Sciences, 2020, 1–23. https://doi.org/10.5194/nhess-2020-87

Chacón, J., Irigaray, C., Fernández, T. and El Hamdouni, R., 2006, Engineering geology maps: landslides and geographical information systems. Bulletin of Engineering Geology and the Environment, 65, 341–411. https://doi.org/10.1007/s10064-006-0064-z

Chen, W., Yan, X., Zhao, Z., Hong, H., Bui, D. and Pradhan, B., 2019, Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long county area (China). Bulletin of Engineering Geology and the Environment, 78, 247–266. https://doi.org/10.1007/s10064-018-1256-z

Chen, X. and Chen, W., 2021, GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena, 196, 104833. https://doi.org/10.1016/j.catena.2020.104833

Conoscenti, C., Di Maggio, C. and Rotigliano, E., 2008, GIS analysis to assess landslide susceptibility in a fluvial basin of NW Sicily (Italy). Geomorphology, 94, 325–339. https://doi.org/10.1016/j.geomorph.2006.10.039

Costanzo, D. and Irigaray, C., 2020, Comparing forward conditional analysis and forward logistic regression methods in a landslide susceptibility assessment: a case study in Sicily. Hydrology, 7, 37. https://doi.org/10.3390/hydrology7030037

Crosta, G.B. and Frattini, P., 2008, Rainfall-induced landslides and debris flows. Hydrological Processes, 22, 473–477. https://doi.org/10.1007/s10346-019-01336-y

D’Amato, G. and Akdis, C.A., 2020, Global warming, climate change, air pollution and allergies. Allergy, 75, 2158–2160. https://doi.org/10.1111/all.14527

Dao, D.V., Jaafari, A., Bayat, M., Mafi-Gholami, D., Qi, C., Moayedi, H., Phong, T.V., Ly, H.-B., Le, T.-T., Trinh, P.T., Luu, C., Quoc, N.K., Thanh, B.N. and Pham, B.T., 2020, A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. Catena, 188, 104451. https://doi.org/10.1016/j.catena.2019.104451

Dao Minh, D., Vu Cao, M., Hoang Hai, Y., Nguyen The, L. and Do Minh, D., 2023, Analysis of landslide kinematics integrating weather and geotechnical monitoring data at Tan Son slow moving landslide in Ha Giang province. Vietnam Journal of Earth Sciences, 45, 131–146. https://doi.org/10.15625/2615-9783/18204

Di Napoli, M., Carotenuto, F., Cevasco, A., Confuorto, P., Di Martire, D., Firpo, M., Pepe, G., Raso, E. and Calcaterra, D., 2020, Machine learning ensemble modelling as a tool to improve landslide susceptibility mapping reliability. Landslides, 17, 1897–1914. https://doi.org/10.1007/s10346-020-01392-9

Doan, V.L., Nguyen, B.-Q.-V., Nguyen, C.C. and Nguyen, C.T., 2024, Effect of time-variant rainfall on landslide susceptibility: a case study in Quang Ngai province, Vietnam. Vietnam Journal of Earth Sciences, 46, 203–221. https://doi.org/10.1515/geo-2022-0550

Fang, Z., Wang, Y., Duan, G. and Peng, L., 2021, Landslide susceptibility mapping using rotation forest ensemble technique with different decision trees in the Three Gorges reservoir area, China. Remote Sensing, 13, 238. https://doi.org/10.3390/rs13020238

Farinós-Dasí, J., Pinazo-Dallenbach, P., Peiró Sánchez-Manjavacas, E. and Rodríguez-Bernal, D.C., 2024, Disaster risk management, climate change adaptation and the role of spatial and urban planning: evidence from European case studies. Natural Hazards. https://doi.org/10.1007/s11069-024-06448-w

Gama, J. and Brazdil, P., 2000, Cascade generalization. Machine Learning, 41, 315–343. https://doi.org/10.1023/A:1007652114878

Geurts, P., 2001, Dual perturb and combine algorithm. Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics, Florida, 196–201.

Geurts, P. and Wehenkel, L., 2005, Closed-form dual perturb and combine for tree-based models. Proceedings of the 22nd International Conference on Machine Learning, New York, 233–240. https://doi.org/10.1145/1102351.1102381

Ghasemian, B., Asl, D.T., Pham, B.T., Avand, M., Nguyen, H.D. and Janizadeh, S., 2020, Shallow landslide susceptibility mapping: A comparison between classification and regression tree and reduced error pruning tree algorithms. Vietnam Journal of Earth Sciences, 42, 208–227. https://doi.org/10.15625/0866-7187/42/3/14952

Giang Linh, T., Dang Kinh, B. and Bui Thanh, Q., 2023, Coastline and shoreline change assessment in sandy coasts based on machine learning models and high-resolution satellite images. Vietnam Journal of Earth Sciences, 45, 251–270. https://doi.org/10.15625/2615-9783/18407

Gu, T., Duan, P., Wang, M., Li, J. and Zhang, Y., 2024, Effects of non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Scientific Reports, 14, 7201. https://doi.org/10.1038/s41598-024-57964-5

Hoang-Cong, H., Ngo-Duc, T., Nguyen-Thi, T., Trinh-Tuan, L., Jing Xiang, C., Tangang, F., Jerasorn, S. and Phan-Van, T., 2022, A high-resolution climate experiment over part of Vietnam and the Lower Mekong Basin: performance evaluation and projection for rainfall. Vietnam Journal of Earth Sciences, 44, 92–108. https://doi.org/10.15625/2615-9783/16942

Holte, R.C., 1993, Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–90. https://doi.org/10.1023/A:1022631118932

Hong, H., 2023a, Assessing landslide susceptibility based on hybrid multilayer perceptron with ensemble learning. Bulletin of Engineering Geology and the Environment, 82, 382. https://doi.org/10.1007/s10064-023-03409-8

Hong, H., 2023b, Assessing landslide susceptibility based on hybrid Best-first decision tree with ensemble learning model. Ecological Indicators, 147, 109968. https://doi.org/10.1016/j.ecolind.2023.109968

Huang, F., Xiong, H., Jiang, S.-H., Yao, C., Fan, X., Catani, F., Chang, Z., Zhou, X., Huang, J. and Liu, K., 2024, Modelling landslide susceptibility prediction: a review and construction of semi-supervised imbalanced theory. Earth-Science Reviews, 250, 104700. https://doi.org/10.1016/j.earscirev.2024.104700

Ibrahim, M.B., Harahap, I.S.H., Balogun, A.-L.B. and Usman, A., 2020, The use of geospatial data from GIS in the quantitative analysis of landslides. IOP Conference Series: Earth and Environmental Science, 540, 012048. https://doi.org/10.1088/1755-1315/540/1/012048

Kalantar, B., Ueda, N., Saeidi, V., Ahmadi, K., Halin, A.A. and Shabani, F., 2020, Landslide susceptibility mapping: machine and ensemble learning based on remote sensing big data. Remote Sensing, 12, 1737. https://doi.org/10.3390/rs12111737

Keshri, D., Sarkar, K. and Chattoraj, S.L., 2023, Landslide susceptibility mapping in parts of Aglar watershed, Lesser Himalaya based on frequency ratio method in GIS environment. Journal of Earth System Science, 133, 1. https://doi.org/10.1007/s12040-023-02204-z

Khosravi, K., Golkarian, A., Melesse, A.M. and Deo, R.C., 2022, Suspended sediment load modeling using advanced hybrid rotation forest based elastic network approach. Journal of Hydrology, 610. https://doi.org/10.1016/j.jhydrol.2022.127963

Le Minh, N., Truyen, P.T., Van Phong, T., Jaafari, A., Amiri, M., Van Duong, N., Van Bien, N., Duc, D.M., Prakash, I. and Pham, B.T., 2023, Ensemble models based on radial basis function network for landslide susceptibility mapping. Environmental Science and Pollution Research, 30, 99380–99398. https://doi.org/10.1007/s11356-023-29378-9

Liu, L.-L., Danish, A., Wang, X.-M. and Zhu, W.-Q., 2024, Ensemble stacking: a powerful tool for landslide susceptibility assessment – a case study in Anhua county, Hunan province, China. Geocarto International, 39, 2326005. https://doi.org/10.1080/10106049.2024.2326005

Liu, S., Wang, L., Zhang, W., He, Y. and Pijush, S., 2023, A comprehensive review of machine learning-based methods in landslide susceptibility mapping. Geological Journal, 58, 2283–2301. https://doi.org/10.1002/gj.4666

719 Lucchese, L.V., de Oliveira, G.G. and Pedrollo, O.C., 2020, Attribute selection using correlations

720 and principal components for artificial neural networks employment for landslide

721 susceptibility assessment. Environmental Monitoring and Assessment, 192, 129.

722 https://doi.org/10.1007/s10661-019-7968-0

723 Mai, N.T., 1996, Forecasting occurrence of landslide related to the tropical weathering crust by

724 statistical analysis. 情報地質, 7, 91–95.

725 Masson-Delmotte, V., Zhai, P., Pörtner, H.-O., Roberts, D., Skea, J. and Shukla, P.R., 2022, Global

726 Warming of 1.5°C: IPCC Special Report on Impacts of Global Warming of 1.5°C above

727 Pre-industrial Levels in Context of Strengthening Response to Climate Change, Sustainable

728 Development, and Efforts to Eradicate Poverty. Cambridge University Press

729 Nettleton, D., 2014, Chapter 6 - Selection of variables and factor derivation. In: Nettleton, D. (ed.),

730 Commercial Data Mining. Morgan Kaufmann, Boston, 79–104.

731 https://doi.org/10.1016/B978-0-12-416602-8.00006-6

732 Ngewie, D.T.L., 2024, The impacts of road transport infrastructure and the socio-economic

733 development in the Bamenda III municipality, Mezam division, north west region

734 Cameroon. International Journal of Business Diplomacy and Economy, 3, 39–51.

735 https://doi.org/10.51699/ijbde.v3i1.3322

736 Ngo-Duc, T., 2023, Rainfall extremes in northern Vietnam: a comprehensive analysis of patterns

737 and trends. Vietnam Journal of Earth Sciences, 45, 183-198. https://doi.org/10.15625/2615-

738 9783/18284

739 Nguyen, L.C., Tien, P.V. and Do, T.-N., 2020, Deep-seated rainfall-induced landslides on a new

740 expressway: a case study in Vietnam. Landslides, 17, 395–407.

741 https://doi.org/10.1007/s10346-019-01293-6

33
742 Nhu, V.-H., Bui, T.T., My, L.N., Vuong, H. and Duc, H.N., 2022, A new approach based on

743 integration of random subspace and C4.5 decision tree learning method for spatial prediction

744 of shallow landslides. Vietnam Journal of Earth Sciences, 44, 327–342.

745 https://doi.org/10.15625/2615-9783/16929

746 Ogunbode, C.A., Doran, R. and Böhm, G., 2020, Exposure to the IPCC special report on 1.5 °C

747 global warming is linked to perceived threat and increased concern about climate change.

748 Climatic Change, 158, 361–375. https://doi.org/10.1007/s10584-019-02609-0

749 Ohlmacher, G.C., 2000, The relationship between geology and landslide hazards of Atchison,

750 Kansas, and vicinity. Current Research in Earth Sciences, 244, 1–16.

751 https://doi.org/10.17161/cres.v0i244.11833

752 Pasang, S. and Kubíček, P., 2020, Landslide susceptibility mapping using statistical methods along

753 the Asian expressway, Bhutan. Geosciences, 10, 430.

754 https://doi.org/10.3390/geosciences10110430

755 Pham, B., Prakash, I., Chen, W., Ly, H.-B., Ho, L., Omidvar, E., Tran, V. and Bui, D., 2019, A

756 novel intelligence approach of a sequential minimal optimization-based support vector

757 machine for landslide susceptibility mapping. Sustainability, 11, 6323.

758 https://doi.org/10.3390/su11226323

759 Pham, B.T., Vu, V.D., Costache, R., Phong, T.V., Ngo, T.Q., Tran, T.-H., Nguyen, H.D., Amiri,

760 M., Tan, M.T., Trinh, P.T., Le, H.V. and Prakash, I., 2022, Landslide susceptibility mapping

761 using state-of-the-art machine learning ensembles. Geocarto International, 37, 5175–5200.

762 https://doi.org/10.1080/10106049.2021.1914746

763 Phong, T.V., Phan, T.T., Prakash, I., Singh, S.K., Shirzadi, A., Chapi, K., Ly, H.-B., Ho, L.S., Quoc,

764 N.K. and Pham, B.T., 2021, Landslide susceptibility modeling using different artificial

34
765 intelligence methods: a case study at Muong Lay district, Vietnam. Geocarto International,

766 36, 1685–1708. https://doi.org/10.1080/10106049.2019.1665715

767 Polemio, M. and Petrucci, O., 2000, Rainfall as a landslide triggering factor an overview of recent

768 international research. Thomas Telford Ltd.

769 Prakash, I., Nguyen, D.D., Tuan, N.T. and Phong, T.V., 2024, Landslide susceptibility zoning:

770 integrating multiple intelligent models with SHAP analysis. Journal of Science and

771 Transport Technology, 4, 23–41. https://doi.org/10.58845/jstt.utt.2024.en.4.1.23-41

772 Quinlan, J.R., 1986, Induction of decision trees. Machine Learning, 1, 81–106.

773 https://doi.org/10.1007/BF00116251

774 Rabby, Y.W., Li, Y., Abedin, J. and Sabrina, S., 2022, Impact of land use/land cover change on

775 landslide susceptibility in Rangamati municipality of Rangamati district, Bangladesh.

776 ISPRS International Journal of Geo-Information, 11, 89.

777 https://doi.org/10.3390/ijgi11020089

778 Ramos-Bernal, R.N., Vázquez-Jiménez, R., Cantú-Ramírez, C.A., Alarcón-Paredes, A., Alonso-

779 Silverio, G.A., G. Bruzón, A., Arrogante-Funes, F., Martín-González, F., Novillo, C.J. and

780 Arrogante-Funes, P., 2021, Evaluation of conditioning factors of slope instability and

781 continuous change maps in the generation of landslide inventory maps using machine

782 learning (ML) algorithms. Remote Sensing, 13, 4515. https://doi.org/10.3390/rs13224515

783 Saha, S., Roy, J., Pradhan, B. and Hembram, T.K., 2021, Hybrid ensemble machine learning

784 approaches for landslide susceptibility mapping using different sampling ratios at east

785 Sikkim Himalayan, India. Advances in Space Research, 68, 2819–2840.

786 https://doi.org/10.1016/j.asr.2021.05.018

35
787 Sassa, K., Mikoš, M., Sassa, S., Bobrowsky, P.T., Takara, K. and Dang, K., 2020, Understanding

788 and reducing landslide disaster risk. Springer Cham. https://doi.org/10.1007/978-3-030-

789 60196-6

790 Seda, C., 2021, The Effect of aspect on landslide and its relationship with other parameters. In:

791 Yuanzhi, Z. and Qiuming, C. (eds.), Landslides. IntechOpen, Rijeka, Chapter 2.

792 https://doi.org/10.5772/intechopen.99389

793 Shahzad, N., Ding, X. and Abbas, S., 2022, A comparative assessment of machine learning models

794 for landslide susceptibility mapping in the rugged terrain of northern Pakistan. Applied

795 Sciences, 12, 2280. https://doi.org/10.3390/app12052280

796 Shano, L., Raghuvanshi, T.K. and Meten, M., 2020, Landslide susceptibility evaluation and hazard

797 zonation techniques – a review. Geoenvironmental Disasters, 7, 18.

798 https://doi.org/10.1186/s40677-020-00152-0

799 Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., Shahabi, H. and Revhaug,

800 I., 2017, Shallow landslide susceptibility assessment using a novel hybrid intelligence

801 approach. Environmental Earth Sciences, 76, 60. https://doi.org/10.1007/s12665-016-6374-

802 y

803 Singh, K., Bhardwaj, V., Sharma, A. and Thakur, S., 2024, A comprehensive review on landslide

804 susceptibility zonation techniques. Quaestiones Geographicae, 43, 79–91.

805 https://doi.org/10.14746/quageo-2024-0005

806 Sitányiová, D., Vondráčková, T., Stopka, O., Myslivečková, M. and Muzik, J., 2015, GIS based

807 methodology for the geotechnical evaluation of landslide areas. Procedia Earth and

808 Planetary Science, 15, 389–394. https://doi.org/10.1016/j.proeps.2015.08.011

36
809 Sterlacchini, S., Ballabio, C., Blahut, J., Masetti, M. and Sorichetta, A., 2011, Spatial agreement of

810 predicted patterns in landslide susceptibility maps. Geomorphology, 125, 51–61.

811 https://doi.org/10.1016/j.geomorph.2010.09.004

812 Sun, D., Gu, Q., Wen, H., Xu, J., Zhang, Y., Shi, S., Xue, M. and Zhou, X., 2023, Assessment of

813 landslide susceptibility along mountain expressways based on different machine learning

814 algorithms and mapping units by hybrid factors screening and sample optimization.

815 Gondwana Research, 123, 89–106. https://doi.org/10.1016/j.gr.2022.07.013

816 Tacher, L., Bonnard, C., Laloui, L. and Parriaux, A., 2005, Modelling the behaviour of a large

817 landslide with respect to hydrogeological and geomechanical parameter heterogeneity.

818 Landslides, 2, 3–14. https://doi.org/10.1007/s10346-004-0038-9

819 Tang, H., Wang, C., An, S., Wang, Q. and Jiang, C., 2023, A novel heterogeneous ensemble

820 framework based on machine learning models for shallow landslide susceptibility mapping.

821 Remote Sensing, 15, 4159. https://doi.org/10.3390/rs15174159

822 Technology, V.I.f.B.S.a., 2009, Vietnam building code natural physical & climatic data for

823 construction.

824 Tehrani, F.S., Santinelli, G. and Herrera Herrera, M., 2021, Multi-regional landslide detection using

825 combined unsupervised and supervised machine learning. Geomatics, Natural Hazards and

826 Risk, 12, 1015–1038. https://doi.org/10.1080/19475705.2021.1912196

827 Thanh, D.Q., Nguyen, D.H., Prakash, I., Jaafari, A., Nguyen, V.T., Phong, T.V. and Pham, B.T.,

828 2020, GIS based frequency ratio method for landslide susceptibility mapping at Da Lat City,

829 Lam Dong province, Vietnam. Vietnam Journal of Earth Sciences, 42, 55–66.

830 https://doi.org/10.15625/0866-7187/42/1/14758

37
831 Thanh, T.-D., 2011, Stratigraphic units of Viet Nam (Second Edition - Revised and Updated).

832 Vietnam National University Publisher, Hanoi

833 Tin, D., Cheng, L., Le, D., Hata, R. and Ciottone, G., 2024, Natural disasters: a comprehensive

834 study using EMDAT database 1995–2022. Public Health, 226, 255–260.

835 https://doi.org/10.1016/j.puhe.2023.11.017

836 Ting, K.M. and Witten, I.H., Year, Stacking bagged and dagged models. Proceedings of the 14th

837 International Conference on Machine Learning, San Francisco, CA, 367–375.

838 Tong, Z.l., Guan, Q.t., Arabameri, A., Loche, M. and Scaringi, G., 2023, Application of novel

839 ensemble models to improve landslide susceptibility mapping reliability. Bulletin of

840 Engineering Geology and the Environment, 82, 309. https://doi.org/10.1007/s10064-023-

841 03328-8

842 UNDRR, 2022, United nations office for disaster risk reduction: annual report 2022, report, 7bis

843 Avenue de la Paix, CH1211 Geneva 2, Switzerland.

844 Van Tien, P., Luong, L.H., Nhat, L.M., Thanh, N.K. and Van Cuong, P., 2021, Landslides Along

845 Halong-Vandon Expressway in Quang Ninh Province, Vietnam. In: Guzzetti, F., Mihalić

846 Arbanas, S., Reichenbach, P., Sassa, K., Bobrowsky, P.T. and Takara, K. (eds.),

847 Understanding and Reducing Landslide Disaster Risk: Volume 2 From Mapping to Hazard

848 and Risk Zonation. Springer Cham, 133–139. https://doi.org/10.1007/978-3-030-60227-

849 7_14

850 Ward, P.J., Blauhut, V., Bloemendaal, N., Daniell, J.E., de Ruiter, M.C., Duncan, M.J., Emberson,

851 R., Jenkins, S.F., Kirschbaum, D., Kunz, M., Mohr, S., Muis, S., Riddell, G.A., Schäfer, A.,

852 Stanley, T., Veldkamp, T.I.E. and Winsemius, H.C., 2020, Review article: Natural hazard

38
853 risk assessments at the global scale. Natural Hazards Earth System Sciences, 20, 1069–1096.

854 https://doi.org/10.5194/nhess-20-1069-2020

855 Wardhani, N.W.S., Rochayani, M.Y., Iriany, A., Sulistyono, A.D. and Lestantyo, P., Year, Cross-

856 validation metrics for evaluating classification performance on imbalanced data. 2019

857 International Conference on Computer, Control, Informatics and its Applications (IC3INA),

858 23-24 Oct. 2019, 14–18.

859 Yadav, M., Pal, S.K., Singh, P.K. and Gupta, N., 2023, Landslide susceptibility zonation mapping

860 using frequency ratio, information value model, and logistic regression model: a case study

861 of Kohima district in Nagaland, India. In: Thambidurai, P. and Singh, T.N. (eds.),

862 Landslides: Detection, Prediction and Monitoring: Technological Developments. Springer

863 Cham, 333-363. https://doi.org/10.1007/978-3-031-23859-8_17

864 Yang, C., Liu, L.-L., Huang, F., Huang, L. and Wang, X.-M., 2023, Machine learning-based

865 landslide susceptibility assessment with optimized ratio of landslide to non-landslide

866 samples. Gondwana Research, 123, 198–216. https://doi.org/10.1016/j.gr.2022.05.012

867 Yilmaz, I., 2009, Landslide susceptibility mapping using frequency ratio, logistic regression,

868 artificial neural networks and their comparison: A case study from Kat landslides (Tokat—

869 Turkey). Computers & Geosciences, 35, 1125–1138.

870 https://doi.org/10.1016/j.cageo.2008.08.007

871 Yu, L., Wang, Y. and Pradhan, B., 2024, Enhancing landslide susceptibility mapping incorporating

872 landslide typology via stacking ensemble machine learning in Three Gorges reservoir,

873 China. Geoscience Frontiers, 15, 101802. https://doi.org/10.1016/j.gsf.2024.101802

39
874 Zaini, A.Z.A., Vonnisa, M. and Marzuki, M., 2024, Impact of different ENSO positions and Indian

875 Ocean Dipole events on Indonesian rainfall. Vietnam Journal of Earth Sciences, 46, 100–

876 119. https://doi.org/10.15625/2615-9783/19926

877 Zandalinas, S.I., Fritschi, F.B. and Mittler, R., 2021, Global warming, climate change, and

878 environmental pollution: recipe for a multifactorial stress combination disaster. Trends in

879 Plant Science, 26, 588–599. https://doi.org/10.1016/j.tplants.2021.02.011

880 Zeng, T., Wu, L., Peduto, D., Glade, T., Hayakawa, Y.S. and Yin, K., 2023, Ensemble learning

881 framework for landslide susceptibility mapping: Different basic classifier and ensemble

882 strategy. Geoscience Frontiers, 14, 101645. https://doi.org/10.1016/j.gsf.2023.101645

883 Zhang, K., Wang, S., Bao, H. and Zhao, X., 2019, Characteristics and influencing factors of rainfall-

884 induced landslide and debris flow hazards in Shaanxi province, China. Natural Hazards

885 Earth System Sciences, 19, 93–105. https://doi.org/10.5194/nhess-19-93-2019

886 Zhang, Q., Ning, Z., Ding, X., Wu, J., Wang, Z., Tsangaratos, P., Ilia, I., Wang, Y. and Chen, W.,

887 2024, Hybrid integration of bagging and decision tree algorithms for landslide susceptibility

888 mapping. Water, 16, 657. https://doi.org/10.3390/w16050657

889 Zhao, F., Miao, F., Wu, Y., Ke, C., Gong, S. and Ding, Y., 2024, Refined landslide susceptibility

890 mapping in township area using ensemble machine learning method under dataset

891 replenishment strategy. Gondwana Research, 131, 20–37.

892 https://doi.org/10.1016/j.gr.2024.02.011

893 Zhou, Z., Duan, J., Geng, S. and Li, R., 2024, The role of expressway construction in influencing

894 agricultural green total factor productivity in China: agricultural industry structure

895 transformation perspective. Frontiers in Sustainable Food Systems, 7.

896 https://doi.org/10.3389/fsufs.2023.1315201

40
LIST OF TABLES AND FIGURES

Table 1. Parameters of the models used to establish landslide susceptibility maps in the research area
Table 2. Sources of the conditional factors used in this study
Table 3. Ranking of conditional factors by attribute selection method
Table 4. Model performance under multiple criteria

Fig. 1. Study area along the Halong – Vandon Expressway for Landslide Susceptibility Mapping.
Fig. 2. Landslides that continue to occur despite technical reinforcement along the Halong – Vandon Expressway: a) at km 10, b) at km 19, c) at km 24, and d) at km 30 (photo source: Tuan-Nghia Do).
Fig. 3. Flow chart of the methodology for Landslide Susceptibility Mapping along the Halong – Vandon Expressway.
Fig. 4. Conditional factor maps for Landslide Susceptibility Mapping in the Halong – Vandon Expressway area.
Fig. 5. Zonal histograms of the conditional factor maps against the landslide and non-landslide inventories in the study area.
Fig. 6. AUC performance of the models: a) Training dataset, b) Validation dataset.
Fig. 7. Landslide susceptibility maps of the Halong – Vandon Expressway area: a) DPCT, b) B-DPCT, c) CG-DPCT, d) D-DPCT.
Fig. 8. Analysis of the landslide susceptibility maps: a) percentage of area in each landslide susceptibility class, b) percentage of the validation landslide dataset in each landslide susceptibility class, c) percentage of the validation non-landslide dataset in each landslide susceptibility class, d) frequency ratio of landslides in each landslide susceptibility class, and e) frequency ratio of non-landslides in each landslide susceptibility class.

TABLE

Table 1. Parameters of the models used to establish landslide susceptibility maps in the research area

| No | Hyperparameters | DPCT | Bagging | Cascade Generalization | Dagging |
|----|-----------------|------|---------|------------------------|---------|
| 1 | Lambda | 0.2 | - | - | - |
| 2 | Batch Size | 100 | 100 | 100 | 100 |
| 3 | Classifier | - | DPCT | DPCT | DPCT |
| 4 | Number of Decimal Places | - | 2 | 2 | 2 |
| 5 | Number of Execution Slots | - | 1 | 1 | - |
| 6 | Number of Iterations | - | 10 | - | - |
| 7 | Seed | - | 1 | 1 | 1 |
| 8 | Number of Folds | - | - | 20 | 2 |

Table 2. Sources of the conditional factors used in this study

| No | Variable | Scale | Source |
|----|----------|-------|--------|
| 1 | Elevation (m) | 10 m | DEM (generated from a 1:10,000 scale topographic map from the Department of Survey, Mapping, and Geographic Information Viet Nam) |
| 2 | Weathering crust | 10 m | Survey report on the natural project, code 105.08-2020.25 |
| 3 | Geology | 10 m | Geological map at scale 1:50,000 from General Department of Geology and Minerals of Viet Nam |
| 4 | Geotechnical Engineering | 10 m | Survey report on the natural project, code 105.08-2020.25 |
| 5 | Hydrogeology | 10 m | Survey report on the natural project, code 105.08-2020.25 |
| 6 | LULC | 10 m | Survey report on the natural project, code 105.08-2020.25 |
| 7 | Rainfall (mm/day) | 10 m | Viet Nam Meteorological and Hydrological Administration |
| 8 | Fault density (km/km²) | 10 m | Geological map |
| 9 | Stream density (km/km²) | 10 m | Generated from DEM |
| 10 | Slope (degree) | 10 m | Generated from DEM |
| 11 | Aspect | 10 m | Generated from DEM |
| 12 | Curvature | 10 m | Generated from DEM |
| 13 | TWI | 10 m | Generated from DEM |
| 14 | SPI | 10 m | Generated from DEM |

Table 3. Ranking of conditional factors by attribute selection method

| Rank | CAE: Factor | CAE: Value | Gain Ratio AE: Factor | Gain Ratio AE: Value | OneR: Factor | OneR: Value |
|------|-------------|------------|-----------------------|----------------------|--------------|-------------|
| 1 | Slope (degree) | 0.65 | Slope (degree) | 0.22 | Elevation (m) | 86.30 |
| 2 | TWI | 0.59 | Curvature | 0.15 | Slope (degree) | 85.56 |
| 3 | Elevation (m) | 0.50 | Elevation (m) | 0.14 | Curvature | 80.61 |
| 4 | Aspect | 0.45 | TWI | 0.13 | TWI | 78.50 |
| 5 | Rainfall (mm/day) | 0.28 | Aspect | 0.12 | Aspect | 77.01 |
| 6 | Fault density (km/km²) | 0.24 | Geology | 0.11 | Geology | 75.36 |
| 7 | Stream density | 0.22 | Geotechnical Engineering | 0.07 | Fault density (km/km²) | 66.31 |
| 8 | Weathering crust | 0.18 | SPI | 0.07 | SPI | 65.81 |
| 9 | Geotechnical Engineering | 0.14 | Stream density | 0.06 | Rainfall (mm/day) | 65.62 |
| 10 | Hydrogeology | 0.10 | Fault density (km/km²) | 0.04 | Geotechnical Engineering | 63.07 |
| 11 | Geology | 0.06 | Hydrogeology | 0.04 | Weathering crust | 60.58 |
| 12 | LULC | 0.02 | Rainfall (mm/day) | 0.04 | Stream density | 60.39 |
| 13 | Curvature | 0.01 | LULC | 0.03 | LULC | 59.29 |
| 14 | SPI | 0.01 | Weathering crust | 0.03 | Hydrogeology | 57.21 |

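Table 4. Model performance under multiple criteria

| No | Parameters | Training: DPCT | Training: B-DPCT | Training: CG-DPCT | Training: D-DPCT | Validation: DPCT | Validation: B-DPCT | Validation: CG-DPCT | Validation: D-DPCT |
|----|------------|------|------|------|------|------|------|------|------|
| 1 | TP | 2220 | 2221 | 2219 | 2198 | 948 | 949 | 948 | 946 |
| 2 | TN | 1820 | 2113 | 1872 | 2155 | 495 | 591 | 521 | 569 |
| 3 | FP | 16 | 15 | 17 | 38 | 23 | 22 | 23 | 25 |
| 4 | FN | 572 | 279 | 520 | 237 | 376 | 280 | 350 | 302 |
| 5 | PPV (%) | 99.28 | 99.33 | 99.24 | 98.30 | 97.63 | 97.73 | 97.63 | 97.43 |
| 6 | NPV (%) | 76.09 | 88.34 | 78.26 | 90.09 | 56.83 | 67.85 | 59.82 | 65.33 |
| 7 | SST (%) | 79.51 | 88.84 | 81.01 | 90.27 | 71.60 | 77.22 | 73.04 | 75.80 |
| 8 | SPF (%) | 99.13 | 99.30 | 99.10 | 98.27 | 95.56 | 96.41 | 95.77 | 95.79 |
| 9 | ACC (%) | 87.29 | 93.65 | 88.40 | 94.06 | 78.34 | 83.60 | 79.75 | 82.25 |
| 10 | Kappa | 0.75 | 0.87 | 0.77 | 0.88 | 0.56 | 0.67 | 0.59 | 0.64 |
| 11 | RMSE | 0.30 | 0.30 | 0.30 | 0.29 | 0.40 | 0.37 | 0.39 | 0.37 |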
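The count-based criteria in Table 4 can be recomputed from the four confusion-matrix counts. The sketch below is illustrative only and is not the authors' code: it assumes that SST and SPF denote sensitivity and specificity and that Kappa is Cohen's kappa, which is consistent with the tabulated values. The names `ConfusionCounts` and `classification_metrics` are hypothetical. RMSE is omitted because it depends on the predicted probabilities, which the table does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for one model/dataset column of Table 4."""
    tp: int  # landslide samples classified as landslide
    tn: int  # non-landslide samples classified as non-landslide
    fp: int  # non-landslide samples classified as landslide
    fn: int  # landslide samples classified as non-landslide


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV, NPV, SST, SPF, ACC (in percent) and Cohen's kappa."""
    n = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)
    ) / n**2
    return {
        "PPV": 100 * c.tp / (c.tp + c.fp),  # positive predictive value
        "NPV": 100 * c.tn / (c.tn + c.fn),  # negative predictive value
        "SST": 100 * c.tp / (c.tp + c.fn),  # sensitivity (assumed meaning)
        "SPF": 100 * c.tn / (c.tn + c.fp),  # specificity (assumed meaning)
        "ACC": 100 * accuracy,
        "Kappa": (accuracy - expected) / (1 - expected),
    }


# Counts taken from the DPCT training column of Table 4.
dpct_training = ConfusionCounts(tp=2220, tn=1820, fp=16, fn=572)
metrics = classification_metrics(dpct_training)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With the DPCT training counts, the printed values match the corresponding column of Table 4 (PPV 99.28, NPV 76.09, SST 79.51, SPF 99.13, ACC 87.29, Kappa 0.75), so the same function could be used to cross-check the other columns.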

FIGURE

Fig. 1. Study area along the Halong – Vandon Expressway for Landslide Susceptibility Mapping.

Fig. 2. Landslides that continue to occur despite technical reinforcement along the Halong – Vandon Expressway: a) at km 10, b) at km 19, c) at km 24, and d) at km 30 (photo source: Tuan-Nghia Do).

Fig. 3. Flow chart of the methodology for Landslide Susceptibility Mapping along the Halong – Vandon Expressway.

Fig. 4. Conditional factor maps for Landslide Susceptibility Mapping in the Halong – Vandon Expressway area.

Fig. 5. Zonal histograms of the conditional factor maps against the landslide and non-landslide inventories in the study area.

Fig. 6. AUC performance of the models: a) Training dataset, b) Validation dataset.

Fig. 7. Landslide susceptibility maps of the Halong – Vandon Expressway area: a) DPCT, b) B-DPCT, c) CG-DPCT, d) D-DPCT.

Fig. 8. Analysis of the landslide susceptibility maps: a) percentage of area in each landslide susceptibility class, b) percentage of the validation landslide dataset in each landslide susceptibility class, c) percentage of the validation non-landslide dataset in each landslide susceptibility class, d) frequency ratio of landslides in each landslide susceptibility class, and e) frequency ratio of non-landslides in each landslide susceptibility class.
