Novel ensemble models based on Dual Perturb and Combine for Tree-based (DPCT)
for Landslide Susceptibility Mapping: a case study along the Ha Long – Van Don Highway
Short Title: Landslide Susceptibility Mapping based on DPCT (Ha Long – Van Don Highway)
Manuscript Classifications: 160: Engineering Geology; 440: Remote Sensing/GIS; 470: Soils
Abstract: Areas along transportation routes constructed in mountainous terrain often harbor
significant landslide hazards. Ensemble learning techniques have proven effective in improving
landslide susceptibility prediction performance. In this study, novel ensemble models (Bagging (B),
Cascade Generalization (CG), and Dagging (D)) based on the Dual Perturb and Combine for
Tree-based (DPCT) approach were employed to predict landslide susceptibility along the
Ha Long – Van Don highway. The dataset comprised 78 landslide locations (3263 points),
non-landslide locations (1:1 ratio with landslide points), and 14 conditional factors covering
topographic characteristics, geology, rainfall, and land use/land cover (LULC); these were the
input parameters for the models (B-DPCT, CG-DPCT, D-DPCT, and DPCT). Evaluation criteria for
model prediction outcomes included the area under the receiver operating characteristic curve
(AUC), parameters derived from the confusion matrix, the Kappa statistic, and root mean square
error (RMSE). The landslide susceptibility map predicted by the B-DPCT model achieved the best
evaluation results on the validation dataset (AUC = 0.948, accuracy ACC = 83.6%, Kappa statistic
= 0.67, and RMSE = 0.37) and is therefore recommended for construction planning and mitigation
efforts along the Ha Long – Van Don highway to minimize landslide-induced damage.
Tran V. Phong1,2, Tuan-Nghia Do3*, Phan T. Trinh1,2, Bui N. Thanh2,4, Vuong H. Nhat5
1 Institute of Geological Sciences, Vietnam Academy of Science and Technology, 84 Chua Lang
*Corresponding author:
Tuan-Nghia Do
Lecturer, Faculty of Civil Engineering, Thuyloi University, room 416, A1 building, 175 Tay Son,
Hanoi, Vietnam.
Key words: DPCTree, ensemble models, machine learning, LSM, Halong-Vandon expressway
1. INTRODUCTION
With the progression of climate change and global warming, extreme weather phenomena are
occurring with increased frequency and magnitude (D’Amato and Akdis, 2020; Ogunbode, Doran and
Böhm, 2020; Zaini, Zandalinas, Fritschi and Mittler, 2021; Hoang-Cong et al., 2022; Ngo-Duc, 2023;
Vonnisa and Marzuki, 2024). Consequently, natural disasters are also occurring more frequently and
on a larger scale (AghaKouchak et al., 2020; Ward et al., 2020; Masson-Delmotte et al., 2022; Giang
Linh, Dang Kinh and Bui Thanh, 2023; Farinós-Dasí et al., 2024; Tin et al., 2024). Globally, it is
estimated that over the past 20 years the incidence of natural disasters has increased by
approximately 75%, resulting in the loss of over 1 million lives, affecting the livelihoods of over
4 billion people, and causing economic damage close to 3 trillion USD (UNDRR, 2022). Among these
disasters, landslides are one of the most impactful in socio-economic terms (Yadav et al., 2023).
Therefore, landslide prediction research remains a pressing issue to support disaster prevention
and damage mitigation efforts (Binh Thai et al., 2022; Pham et al., 2022; Dao Minh et al., 2023; Le
Minh et al., 2023; Tong et al., 2023; Doan et al., 2024).
in land management, urban planning, and settlement (Azarafza et al., 2021; Ado et al., 2022). There
are two main approaches to landslide susceptibility mapping: (1) the qualitative approach and (2)
the quantitative approach. Of the two, the quantitative approach has proven more effective in
landslide prediction (Ibrahim et al., 2020; Shano, Raghuvanshi and Meten, 2020; Asadi et al., 2022).
Machine learning methods are currently being researched and widely applied for landslide
susceptibility mapping (Azarafza et al., 2021; Ado et al., 2022; Liu et al., 2023).
In machine learning models applied to landslide susceptibility mapping, ensemble models are
often used to enhance the performance of a single model (Di Napoli et al., 2020; Saha et al., 2021;
Arabameri et al., 2022; Pham et al., 2022). Commonly used ensemble techniques include Bagging (Gu
et al., 2024; Zhang et al., 2024), Cascade Generalization (Hong, 2023b; Ali et al., 2024), Dagging
(Bui et al., 2023; Le Minh et al., 2023; Tong et al., 2023), Decorate (Hong, 2023a; Le Minh et al.,
2023), Multi Boost (Ajin et al., 2022; Bien et al., 2023), and Rotation Forest (Kalantar et al.,
2020; Fang et al., 2021; Pham et al., 2022; Ali et al., 2024). Landslide prediction is a complex,
multivariate, and multicriteria problem (Liu et al., 2023). Each study area has different
characteristics and conditions leading to landslides (Costanzo and Irigaray, 2020; Ramos-Bernal et
al., 2021). Therefore, no single model is best for all landslide prediction problems (Pham et al.,
2022). Hence, researching and understanding new machine learning models for landslide
susceptibility mapping is necessary to find the optimal prediction model (Ali et al., 2024). Each
machine learning model has its own theoretical foundation and techniques (Azevedo, Rocha and
Pereira, 2024), so tuning the parameters of these models helps improve accuracy in landslide
prediction (Yu, Wang and Pradhan, 2024). In particular, ensemble techniques prove effective in
enhancing the prediction accuracy of the original single model (Tang et al., 2023; Zeng
Expressways are vital links in a country's transportation system and socio-economic relations
(Ngewie, 2024; Zhou et al., 2024). They are designed for vehicles to operate at high speeds,
reducing travel time, so ensuring their smooth operation is crucial for maintaining stable
socio-economic development. Landslides on expressways disrupt traffic flow, can harm people and
vehicles in transit, and damage the road structure (Sun et al., 2023). During expressway
construction, many mountainous areas are disturbed, leading to numerous new landslide masses
along the route, including large ones (Nguyen, Tien and Do, 2020; Pasang and Kubíček, 2020).
Furthermore, inadequate consideration in designing landslide mitigation structures contributes to
frequent landslides (Nguyen, Tien and Do, 2020; Van Tien et al., 2021). Predicting landslides along
expressway corridors is therefore essential (Pasang and Kubíček, 2020; Sassa et al., 2020) for
prevention and for minimizing the risks that landslides pose to vehicles, people, and expressway
infrastructure (Beigh and Bukhari, 2024; Sassa et al., 2020).
This study applied novel ensemble models based on the Dual Perturb and Combine for Tree-based
(DPCT) technique to landslide susceptibility mapping in the Halong – Vandon expressway area,
Quang Ninh province, Vietnam. The ensemble techniques (Bagging, CG, Dagging) helped improve the
landslide prediction performance of the DPCT model. The resulting landslide susceptibility maps
provide a scientific basis for management, urban planning, and the prevention and mitigation of
landslide damage in the Halong – Vandon expressway area. Additionally, this research provides
evidence for the applicability of the ensemble models based on
Construction of the Halong - Vandon Expressway commenced in September 2015 and was completed
by the end of 2018. The expressway spans 59 kilometers within Quang Ninh province in northeastern
Vietnam, has four lanes with a design speed of 100 km/h, and is crucial in promoting the
socio-economic development of the northern region of Vietnam. It connects renowned tourist
destinations such as Halong Bay, Cat Ba Island, and Bai Tu Long Bay with the Vandon Island
District (https://www.quangninh.gov.vn/). The study area chosen along the Halong - Vandon
Expressway covers 180.55 km² (Fig. 1). This area exhibits diverse topography, primarily hills,
mountains, and plains, with elevations ranging from 2.5 to 395 meters. The region experiences a
prolonged rainy season from May to October each year, with an average annual rainfall of 2300 mm,
an average temperature of approximately 23°C, and an average humidity of 84.6%. Winter months are
often foggy (Technology, 2009). The geology of the area is diverse, predominantly comprising rocks
of the Hon Gai Formation, Tan Mai Formation, Binh Lieu Formation, Ha Coi Formation, Cat Ba
Formation, and the Quaternary (Fig. 3c). Specifically, the Halong - Vandon Expressway
predominantly traverses the Hon Gai Formation, which hosts numerous coal seams of industrial value
(Thanh, 2011). After the Halong – Vandon expressway opened to traffic, landslides continued to
occur here, affecting traffic safety for passing vehicles (Fig. 2) (Van Tien
The methodology for establishing landslide susceptibility maps in the Halong - Vandon
expressway area is presented in the flow chart of Fig. 3. This section presents the machine
learning models, evaluation parameters, attribute selection methods, and data used. The models in
this paper utilized Weka software version 3.8.6 for computation and modeling
1.
The DPCT is a machine learning algorithm for classification and regression problems, first
introduced by Geurts and Wehenkel (2005). This algorithm is a modified version of the traditional
dual perturb and combine (DPC) technique, in which the combination of perturbed datasets and
predictions of base models is performed analytically, without the need for multiple iterations of
the training and prediction process (Geurts, 2001; Geurts and Wehenkel, 2005). By finding the
optimal combination weights through analysis, DPC can provide efficient solutions and extensions
for combining learning with tree-based models (Geurts and Wehenkel, 2005; Khosravi
Step 1: Data Perturbation: Generate perturbed versions of the original dataset using techniques such
Step 2: Base Model Training: Train multiple base models, such as decision trees, random forests,
or gradient boosting machines, on each perturbed dataset independently. These base models capture
different aspects of the data due to the randomness introduced during perturbation.
Step 3: Analytical Combination: Instead of combining predictions using techniques like voting or
averaging, DPCT analytically determines the optimal combination weights for the predictions of
the tree-based models. This may involve solving optimization problems to minimize loss functions
Step 4: Final Prediction: Once the optimal combination weights are determined, the final prediction
for a specific input sample is calculated as the weighted sum of the predictions from the base
models.
Step 5: Evaluation and Tuning: Evaluate the performance of the ensemble model using appropriate
models. The benefits of bagging include reducing overfitting and increasing the model's stability
and accuracy, while also reducing reliance on the training data (Breiman, 1996). Bagging,
particularly when applied to weak models such as decision trees, can generate more robust models
capable of aggregating loss patterns effectively. The bagging algorithm operates as follows:
Step 1: Bootstrap Sampling: Generate multiple subsets of data from the original training dataset
through bootstrap sampling. Each subset is the same size as the original dataset but may contain
Step 2: Model Training: Train a prediction model on each subset of data created in Step 1. Each
model is trained on a different subset of data, so the models learn different aspects of the data.
Step 3: Prediction Aggregation: Combine the predictions from all the models trained in Step 2. In
the case of classification problems, the voting method is commonly applied to select the final
prediction.
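The three bagging steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Weka implementation used in this study: the base learner is a hypothetical one-feature threshold rule (`train_stump`) standing in for DPCT, and the toy slope values are invented for the example.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-feature threshold rule (a depth-1 tree) on (x, label) rows."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for label_above in (0, 1):  # class predicted when x >= threshold
            errors = sum(
                (label_above if x >= threshold else 1 - label_above) != y
                for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    _, threshold, label_above = best
    return lambda x: label_above if x >= threshold else 1 - label_above

def bagging(rows, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap sample of the same size, drawn with replacement
        sample = [rng.choice(rows) for _ in rows]
        # Step 2: train one base model per bootstrap sample
        models.append(train_stump(sample))

    def predict(x):
        # Step 3: majority vote over the base models
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict

# Invented toy data: slope angle in degrees -> 1 = landslide, 0 = non-landslide
toy = [(2, 0), (4, 0), (5, 0), (8, 0), (12, 0),
       (18, 1), (25, 1), (30, 1), (35, 1), (40, 1)]
model = bagging(toy)
print(model(3), model(33))
```

An odd number of base models is used so that a binary vote cannot tie.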
CG is an ensemble model for classification problems based on the stacking algorithm. CG
enhances the base model's performance by employing a sequential ensemble of classifiers, whereby
new attributes are inserted into the original dataset at each step. These new attributes are
derived from the class probabilities provided by the base model (Gama and Brazdil, 2000). This
reduces bias in attribute evaluation, thereby improving the base model's performance. CG is
currently one of the most widely used ensemble models in natural disaster assessment (Chen et al.,
2019; Pham et al., 2019).
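The central operation of CG, appending the base model's class probabilities to the dataset as new attributes, can be illustrated as follows. This is a hedged sketch: the level-0 learner here is a hypothetical frequency model with Laplace smoothing (`fit_frequency_model`), not DPCT, and the attribute names and rows are invented.

```python
def fit_frequency_model(rows, feature):
    """Level-0 learner: estimate P(landslide | value of one categorical attribute)."""
    counts = {}
    for attrs, label in rows:
        positives, total = counts.get(attrs[feature], (0, 0))
        counts[attrs[feature]] = (positives + label, total + 1)

    def proba(attrs):
        positives, total = counts.get(attrs[feature], (0, 0))
        p_landslide = (positives + 1) / (total + 2)  # Laplace smoothing
        return 1 - p_landslide, p_landslide

    return proba

def extend_with_probabilities(rows, proba):
    """Cascade step: append the level-0 class probabilities as new attributes."""
    extended = []
    for attrs, label in rows:
        p0, p1 = proba(attrs)
        new_attrs = {**attrs, "p_non_landslide": p0, "p_landslide": p1}
        extended.append((new_attrs, label))
    return extended

# Invented toy rows: (attributes, label) with 1 = landslide, 0 = non-landslide
rows = [
    ({"lulc": "forest", "slope": 30}, 1),
    ({"lulc": "forest", "slope": 4}, 0),
    ({"lulc": "bare", "slope": 28}, 1),
    ({"lulc": "bare", "slope": 35}, 1),
]
level0 = fit_frequency_model(rows, "lulc")
extended = extend_with_probabilities(rows, level0)
print(extended[2][0])
```

The next classifier in the cascade would then be trained on `extended`, which contains both the original attributes and the two probability attributes.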
Dagging is an ensemble model primarily used in classification tasks. Dagging helps improve
the model's performance by dividing the original dataset into smaller subsets and combining
predictions from sub-models (Ting and Witten, 1997). This helps the model avoid overfitting and
Step 1: Decomposition: First, the training dataset is decomposed into subsets using a specific
decomposition method. Decomposition methods may include linear decomposition, scalar
Step 2: Aggregation: Prediction models are trained on the subsets obtained from the decomposition
process. Each prediction model focuses on solving a specific part of the problem.
Step 3: Prediction Combination: Finally, sub-model predictions are combined to produce the final
prediction. The voting method is used to combine predictions from the sub-models.
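As an illustration of these steps, the sketch below assumes the decomposition is a stratified split into disjoint folds, which is how Dagging is commonly described; the decomposition actually used in this study is not recoverable from the text. The base learner is a hypothetical nearest-class-mean rule standing in for DPCT, and the toy data are invented.

```python
import random
from collections import Counter

def train_nearest_mean(rows):
    """Base learner: predict the class whose mean feature value is closest."""
    means = {}
    for label in {y for _, y in rows}:
        values = [x for x, y in rows if y == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def dagging(rows, n_folds=3, seed=0):
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    # Step 1: stratified, disjoint split; every row lands in exactly one fold
    for label in sorted({y for _, y in rows}):
        members = [row for row in rows if row[1] == label]
        rng.shuffle(members)
        for i, row in enumerate(members):
            folds[i % n_folds].append(row)
    # Step 2: train one base model per disjoint fold
    models = [train_nearest_mean(fold) for fold in folds]

    def predict(x):
        # Step 3: majority vote over the sub-models
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict, folds

# Invented toy data: slope angle in degrees -> 1 = landslide, 0 = non-landslide
toy = [(2, 0), (4, 0), (5, 0), (8, 0), (10, 0), (12, 0),
       (18, 1), (22, 1), (25, 1), (30, 1), (35, 1), (40, 1)]
predict, folds = dagging(toy)
print(predict(3), predict(33))
```

Unlike bagging, no row is reused across sub-models, so each sub-model sees only a fraction of the training data.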
The models used for landslide susceptibility prediction are validated using evaluation
metrics for classification problems (Wardhani et al., 2019; Pham et al., 2022; Bien et al., 2023; Le
Minh et al., 2023), including AUC, Positive Predictive Value (PPV), Negative Predictive Value
(NPV), Sensitivity (SST), Specificity (SPF), Accuracy (ACC), Kappa index, and Root Mean Square
Error (RMSE). Among these, AUC is a critical metric frequently used to evaluate the performance of
classifiers (Chen and Chen, 2021). The AUC is determined by combining SST and SPF values at
each prediction threshold. The value of AUC ranges from 0 to 1, with a higher AUC indicating
better model performance (Chen and Chen, 2021; Pham et al., 2022; Bien et al., 2023; Le Minh et
al., 2023). The PPV, NPV, SST, SPF, and ACC metrics are expressed as percentages and are
calculated from four quantities derived from the confusion matrix: True Positive (TP) and False
Positive (FP), which denote correctly and incorrectly predicted landslide samples, respectively,
and True Negative (TN) and False Negative (FN), which denote correctly and incorrectly predicted
non-landslide samples (Pham et al., 2022; Bien et al., 2023; Le Minh et al., 2023). Higher PPV,
NPV, SST, SPF, and ACC values, along with lower RMSE, indicate greater model accuracy (Dao et al.,
2020). The Kappa index is used as a statistical measure of agreement between predicted and actual
values (Sterlacchini et al., 2011; Baeza, Lantada and Amorim, 2016). The Kappa value ranges from 0
to 1, with a value closer to 1 indicating greater model accuracy (Prakash et al., 2024). A model is
considered to have high confidence accuracy with Kappa > 0.59
The formulas for calculating the metrics mentioned above are as follows (Le Minh et al., 2023):
\[ RMSE = \sqrt{\frac{\sum_{i} (x_i - \hat{x}_i)^2}{N - P}}, \tag{6} \]
where $x_i$ and $\hat{x}_i$ are the actual and predicted landslide susceptibility values, $P$ is
the number of estimated parameters, including the constant, and $N$ is the total number of
landslide samples.
\[ Kappa = \frac{P_0 - P_m}{1 - P_m} \tag{7} \]
where $P_0$ is the relative observed agreement among raters and $P_m$ is the assumed probability of
chance agreement.
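For reference, the metrics can be computed from the four confusion-matrix counts as in the sketch below. The PPV, NPV, SST, SPF, and ACC expressions are the standard textbook definitions and are assumed here to match the formulas of Le Minh et al. (2023); the chance-agreement term for Kappa is the usual Cohen form; the default `n_params=0` and all counts in the example are invented.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; percentages except Kappa."""
    total = tp + fp + tn + fn
    observed = (tp + tn) / total  # P0: observed agreement
    # Pm: agreement expected by chance from the row and column totals
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
        "SST": 100 * tp / (tp + fn),
        "SPF": 100 * tn / (tn + fp),
        "ACC": 100 * observed,
        "Kappa": (observed - chance) / (1 - chance),
    }

def rmse(actual, predicted, n_params=0):
    """RMSE with the N - P denominator of Eq. (6); n_params=0 is an assumption."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / (len(actual) - n_params))

# Invented counts for illustration only
metrics = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
error = rmse([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
print(metrics, error)
```

With balanced classes, as in the 1:1 sampling used here, the chance agreement is 0.5 whenever either the actual or the predicted classes are evenly split, so Kappa reduces to 2·ACC − 1 with ACC taken as a fraction.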
Correlation Attribute Evaluation (CAE) uses the Pearson correlation coefficient, which
measures the linear correlation between two continuous variables. It quantifies the strength and
direction of the linear relationship between the variables (Nettleton, 2014). The value of the
correlation coefficient lies between -1 and 1 (Nettleton, 2014). This study determines the
correlation coefficient between two variables: the conditional factor and the landslide or
non-landslide label of the training dataset (Lucchese, de Oliveira and Pedrollo, 2020). The
correlation value is normalized to the range from 0 to 1. Thus, a value close to 1 indicates a
strong positive correlation between the variables, meaning that the conditional factor
significantly influences landslides. Conversely, a value close to 0 indicates a weak linear
relationship between the conditional factor and landslides. The formula for calculating the
correlation coefficient is presented below (Nettleton, 2014):
\[ r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}} \]
where $x_i$ and $y_i$ are the values of the two variables, and $\bar{x}$ and $\bar{y}$ are the
means of the two variables.
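A minimal sketch of the Pearson coefficient between one conditional factor and the binary landslide label is given below. The slope values are invented, and the normalization to the range 0 to 1 mentioned in the text is not reproduced because its exact form is not stated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented toy data: slope angle in degrees and label (1 = landslide, 0 = non-landslide)
slope = [2, 5, 10, 20, 30, 40]
label = [0, 0, 0, 1, 1, 1]
r = pearson_r(slope, label)
print(round(r, 3))
```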
Gain Ratio Attribute Evaluation (GRAE) is a metric used in attribute evaluation within the
context of decision trees and other machine learning algorithms (Quinlan, 1986). It is used
specifically in feature selection to determine the most informative attributes for classification
tasks. Accordingly, the higher the GRAE value, the more influence the conditional factor has on
landslides. If the GRAE value equals 0, the conditional factor is unrelated to landslides. It is
computed as follows:
Step 1: Entropy: Entropy measures the impurity or randomness of the data. It is calculated based on
the distribution of class labels within a dataset. Higher entropy indicates more disorder.
Step 2: Information Gain: Information gain measures how much a given attribute contributes to
reducing entropy in the dataset. When a dataset is split based on an attribute, information gain
quantifies how much more ordered the resulting subsets are than the original dataset.
Step 3: Split Information: This component of the Gain Ratio considers the intrinsic randomness
associated with the attribute. It is calculated based on the distribution of values of the
attribute. If an attribute has many distinct values, its split information is higher.
Step 4: Gain Ratio: The gain ratio considers both information gain and split information. It is
calculated by dividing the information gain by the split information. This normalization helps in
selecting attributes that have a good balance between information gain and intrinsic randomness.
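The four steps can be written compactly for a categorical attribute, as in the sketch below. It assumes base-2 logarithms and already-discretized attribute values; the class names and toy data are invented for illustration.

```python
import math
from collections import Counter

def entropy(items):
    """Step 1: Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Step 2: information gain = entropy before the split minus weighted entropy after
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Step 3: split information = entropy of the attribute's own value distribution
    split_info = entropy(values)
    # Step 4: gain ratio (defined as 0 here when the attribute has a single value)
    return info_gain / split_info if split_info > 0 else 0.0

labels = [1, 1, 0, 0]  # 1 = landslide, 0 = non-landslide
informative = gain_ratio(["steep", "steep", "flat", "flat"], labels)
uninformative = gain_ratio(["a", "b", "a", "b"], labels)
print(informative, uninformative)
```

In this toy case the first attribute separates the classes perfectly and the second carries no information about them, which gives the two extremes of the score.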
OneR is used in attribute selection, particularly in machine learning and data mining (Holte,
1993). This method evaluates attributes based on their relevance to the target variable in a dataset
(Le Minh et al., 2023). Accordingly, the higher the OneR value, the higher the influence ranking
of the conditioning variable on landslides. The OneR algorithm typically works as follows:
Step 1: Selecting a Target Variable: The first step is to select a target variable, which is the variable
to be predicted or classified. This could be a categorical or numerical variable, depending
Step 2: Grouping Data by Each Attribute: Next, the algorithm examines each attribute in the dataset
one at a time. For each attribute, the data is grouped by its values.
Step 3: Finding the Most Common Class: Within each group, OneR determines the target
variable's most common class or outcome. This could be the most frequent category in the case of
a categorical target variable or the mean or median value in the case of a numerical target variable.
Step 4: Creating Rules: Based on the most common class found in each group, OneR creates
simple rules or decision boundaries. These rules say: "If attribute A has value X, then predict class
Y."
Step 5: Measuring Accuracy: Once rules are created for each attribute, the algorithm measures the
Step 6: Selecting the Best Attribute: Finally, the algorithm selects the attribute that produces the
most accurate predictions as the OneR model for that dataset.
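A simplified sketch of this ranking for categorical attributes and a binary target is shown below. It scores each attribute by the training accuracy of its one-attribute rule set; the Weka evaluator used in the study may differ in details such as cross-validation and discretization of numeric attributes. Attribute names and values are invented.

```python
from collections import Counter, defaultdict

def one_r_accuracy(values, labels):
    """Accuracy of the rule set 'for each attribute value, predict its majority class'."""
    by_value = defaultdict(Counter)
    for value, label in zip(values, labels):
        by_value[value][label] += 1  # Steps 2-3: group by value, count classes
    # Steps 4-5: each rule predicts the majority class; count the correct predictions
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)

def rank_attributes(table, labels):
    """Step 6: order attributes from most to least accurate one-attribute rule."""
    scores = {name: one_r_accuracy(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented toy table; labels: 1 = landslide, 0 = non-landslide
table = {
    "slope_class": ["low", "low", "high", "high", "high", "low"],
    "aspect": ["N", "S", "N", "S", "N", "S"],
}
labels = [0, 0, 1, 1, 1, 0]
ranking = rank_attributes(table, labels)
print(ranking)
```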
The inventory data includes landslide and non-landslide locations (Yang et al., 2023). These
data provide the labels used to train the machine learning models (Tehrani, Santinelli and Herrera
Herrera, 2021; Gu et al., 2024). This study collected all historical landslide data within the
research area. Landslide locations were identified in two main steps: (1) digitization on satellite
imagery (Google Earth) and (2) verification through field surveys. In the research area, 78
landslide locations were identified and represented on the map as polygons. To convert the data
into a format usable by machine learning models, landslide data was assigned a value of 1, and
non-landslide data was assigned a value of 0. The entire landslide area was then divided into
points corresponding to a spatial resolution of 10 m/pixel. The total number of computed landslide
points is 3263, which are divided into two sets: a training set consisting of 70% of the landslide
polygons (2392 points) and a testing set consisting of 30% of the landslide polygons (871 points).
Non-landslide data (3263 points) were sampled at a 1:1 ratio with landslide points. Non-landslide
points were sampled in two main steps: (1) random sampling of points on map layers with slopes
less than 5° and curvature values > -0.05 and < 0.05 (flat class), and (2) verification and
normalization based on field surveys.
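Because the split is described in terms of polygons rather than points, one way to realize it is to assign whole polygons to either the training or the testing set, so that points from the same landslide never appear in both. The sketch below assumes each point carries a hypothetical `polygon_id` field; the field name, the random seed, and the toy data are illustrative only and do not reproduce the study's actual split.

```python
import random

def split_by_polygon(points, train_fraction=0.7, seed=42):
    """Assign whole polygons to train or test so no polygon is split across sets."""
    polygon_ids = sorted({p["polygon_id"] for p in points})
    rng = random.Random(seed)
    rng.shuffle(polygon_ids)
    n_train = round(train_fraction * len(polygon_ids))
    train_ids = set(polygon_ids[:n_train])
    train = [p for p in points if p["polygon_id"] in train_ids]
    test = [p for p in points if p["polygon_id"] not in train_ids]
    return train, test

# Invented toy inventory: 10 polygons with 3 landslide points each (label 1)
points = [{"polygon_id": pid, "label": 1} for pid in range(10) for _ in range(3)]
train, test = split_by_polygon(points)
print(len(train), len(test))
```

Since polygons differ in size, a 70/30 split of polygons does not give exactly 70/30 of the points, which is consistent with the 2392 and 871 points reported above.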
A landslide is a process driven by the interaction of conditional factors related to
geological characteristics, topography, geomorphology, land cover, and rainfall (Le Minh et al.,
2023). Among these, rainfall is often the triggering factor for landslides (Polemio and Petrucci,
2000; Crosta and Frattini, 2008), with other factors contributing to the predisposition for such
events (Zhang et al.,
susceptibility maps in the Vandon - Halong Expressway area. These parameters and their sources
are presented in Fig. 4 and Table 2. The parameters were selected by synthesizing expert methods,
considering the characteristic conditions of the study area, and applying statistical evaluation
methods (Correlation Attribute Evaluation, Gain Ratio Attribute Evaluation, OneR). Elevation (m)
characterizes 'terrain potential': higher elevations indicate greater terrain potential and are
more conducive to landslide occurrence (Bien et al., 2023; Le Minh et al., 2023). Weathering crust
type indicates the degree of rock destruction and is related to the stability of the soil (Mai,
1996; Thanh et al., 2020; Phong et al., 2021). Geological and geotechnical engineering factors
characterize the properties, types, and components of soil and rocks and are indirectly related to
their physical properties (Ohlmacher, 2000; Chacón et al., 2006; Sitányiová et al., 2015).
Hydrogeological characteristics indirectly indicate the water retention capacity of soil and rocks,
which is related to the water pressure in voids within the soil (Tacher et al., 2005). LULC
represents the vegetation cover characteristics of the land; typically, areas with dense forest
cover have lower landslide probabilities (Rabby et al., 2022). Rainfall amount characterizes the
landslide activation factor. When rainfall occurs, water infiltrates the soil, saturating it and
breaking the original soil-rock bonds. Higher rainfall amounts favor landslide occurrence (Zhang et
al., 2019). This study calculates rainfall amount as the daily average, and the classes are divided
using the natural breaks statistical method. Fault density (km/km²) characterizes the degree of
rock destruction by tectonic activity; areas with higher fault densities experience more
significant rock destruction, facilitating landslides (Le Minh et al., 2023). Stream density
(km/km²) indirectly indicates the water retention capacity of soil and the drainage conditions on
the terrain (Shirzadi et al., 2017). Typically, areas with higher stream densities are more
favorable for landslides. Slope (degree) is also important for landslide occurrence (Çellek, 2020).
Generally, slopes ranging from 25° to 40° are favorable for landslides (Çellek, 2020). Aspect
represents the characteristics of windward slopes and is indirectly related to the soil's moisture
absorption from humid air streams (Seda, 2021). Curvature characterizes the surface terrain: flat
terrain (values from -0.05 to 0.05) usually experiences fewer landslides, while concave (<-0.05)
and convex (>0.05) terrains are more favorable for landslides (Phong et al., 2021). The Topographic
Wetness Index (TWI) indirectly indicates the moisture retention conditions of the terrain, related
to the soil's water saturation. Higher TWI values indicate greater moisture retention capacity in
the soil, and vice versa (Conoscenti, Di Maggio and Rotigliano, 2008). Lastly, the Stream Power
Index (SPI) characterizes the energy of the terrain. Higher SPI values correspond to higher landslide
4. RESULTS
4.1. Conditional Factor Importance
The evaluation of the importance of landslide conditioning factors shows that each method
produces its own ranking (Table 3). According to both the CAE and GRAE methods, Slope is the most
influential factor, while the OneR method ranks Slope second. Elevation is the most influential
factor according to the OneR method and is ranked third by the CAE and GRAE methods. TWI is ranked
second by the CAE method but only fourth by the other two methods. Aspect is ranked fourth by the
CAE method and fifth by the other methods. Curvature is ranked second and third by the GRAE and
OneR methods, respectively, whereas the CAE method ranks it thirteenth. Likewise, rainfall is the
fifth most important factor according to the CAE method but ranks ninth and twelfth according to
the OneR and GRAE methods, respectively. Geology is ranked sixth by the GRAE and OneR methods but
only eleventh by the CAE method. Fault density is ranked sixth by the CAE method, seventh by the
OneR method, and tenth by the GRAE method. The remaining five conditioning factors, Weathering
crust, Geotechnical Engineering, Hydrogeology, SPI, and LULC, are ranked as having low influence by
all three evaluation methods. Overall, the results obtained with the different methods indicate
that each factor has a certain level of influence on landslide causation. The most important
factors include
398 Fig. 5 illustrates the distribution of landslide and non-landslide positions across the classes
399 of each conditioning factor. The analysis results from the charts reveal which classes influence
400 landslide and non-landslide occurrences most. Accordingly, for the Slope factor, non-landslide
401 positions mainly concentrate in classes ranging from 0-50, particularly with class 00 having a
18
predominant sample count of over 1700. Landslide positions are more evenly distributed across
classes ranging from 0 – 450, with fewer occurrences in classes greater than 450. For the
Weathering crust factor, landslides and non-landslides are predominantly distributed across two
classes: Ferosialit and Ferosialit–Sialferit. For the Geology factor, landslides are predominantly
concentrated in the Hon Gai Formation (2200 samples), Ha Coi Formation (630 samples), and
Quaternary (250 samples). Conversely, non-landslide positions are more evenly distributed across
the predominant classes, particularly the Hon Gai Formation (1050 samples) and Quaternary
(650 samples). For the Geotechnical Engineering factor, landslides and non-landslides are
concentrated in class G2, with sample counts of approximately 2450 and 2000, respectively.
Similarly, for the Hydrogeology factor, landslide and non-landslide positions are predominantly
concentrated in the water-poor region class, with sample counts of 3000 and 2500, respectively. For
LULC, landslides and non-landslides are predominantly distributed in the Forest class, with sample
counts of approximately 2600 and 2000, respectively. For Elevation, landslides are primarily
distributed at elevations ranging from 2.5 to 250 m, while non-landslides are distributed at lower
elevations (< 50 m). For Rainfall, landslides and non-landslides are distributed across all
classes. Landslide positions are concentrated in classes with high rainfall (291 – 353 mm/day) and
moderate rainfall (148 – 155 mm/day), with sample counts of approximately 1150 and 1000,
respectively. Non-landslide positions are concentrated in classes with low rainfall
(0 – 148 mm/day), with approximately 1400 samples. For the Fault density factor, landslides and
non-landslides are distributed across classes ranging from 0 – 1 km/km², with the highest
concentration in the 1 km/km² class. For Stream density, landslide positions concentrate from
0 – 11 km/km², while non-landslide positions predominantly concentrate from 5 – 12 km/km². For the
Aspect factor, landslides are fairly evenly distributed across directions, with a notable
concentration in the South (S) class (700 samples). Non-landslide positions predominantly
concentrate in the Flat class, with approximately 1700 samples. For the Curvature factor, landslide
positions are scattered across the value range, whereas non-landslide positions predominantly
concentrate within the value range from -0.05 to 0.05. For the SPI factor, landslides and
non-landslides are sporadically and unevenly distributed across classes. Finally, for the TWI
factor, landslide positions predominantly concentrate in the value range from 0.32 – 8.0, while
non-landslide positions predominantly concentrate in the value range from 10 – 21.
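
The per-class sample counts described above can be tabulated with a simple zonal count. The following is a minimal sketch only: the function name, the class labels, and the toy samples are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter

def zonal_histogram(classes, labels):
    """Count samples per factor class, split by label (1 = landslide, 0 = non-landslide)."""
    counts = {0: Counter(), 1: Counter()}
    for cls, lab in zip(classes, labels):
        counts[lab][cls] += 1
    return counts

# Hypothetical toy samples: LULC class of each point and its landslide label.
lulc = ["Forest", "Forest", "Urban", "Forest", "Water", "Urban"]
label = [1, 1, 0, 0, 0, 1]

hist = zonal_histogram(lulc, label)
for cls in sorted(set(lulc)):
    # Print the class, the landslide count, and the non-landslide count.
    print(cls, hist[1][cls], hist[0][cls])
```
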

The reliability and accuracy of the models were evaluated using the following criteria: AUC,
PPV (%), NPV (%), SST (%), SPF (%), ACC (%), Kappa, and RMSE. The numerical evaluation results of
the models are detailed in Fig. 6 and Table 4. The B-DPCT model yields the best results on the
validation set, with AUC = 0.948, PPV = 97.73%, NPV = 67.85%, SST = 77.22%, SPF = 96.41%,
ACC = 83.6%, Kappa = 0.67, and RMSE = 0.37 (Fig. 6b, Table 4).
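
As a minimal sketch of how the evaluation criteria above can be derived, the snippet below assumes the standard confusion-matrix definitions of PPV, NPV, SST (sensitivity), SPF (specificity), ACC, and Cohen's Kappa, and computes RMSE between binary labels and predicted probabilities. The function names and the toy counts are hypothetical and are not taken from Table 4.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (assumed definitions), as fractions."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's Kappa.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "SST": tp / (tp + fn),
        "SPF": tn / (tn + fp),
        "ACC": acc,
        "Kappa": (acc - p_expected) / (1 - p_expected),
    }

def rmse(y_true, y_prob):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

# Hypothetical counts, for illustration only.
m = confusion_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in m.items():
    print(f"{name} = {value:.3f}")
print("RMSE =", rmse([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```
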

Using the datasets for the entire study area, we produced four landslide susceptibility maps, one
for each model: DPCT, B-DPCT, CG-DPCT, and D-DPCT (Fig. 7). Each map is divided into five
susceptibility classes (very low, low, moderate, high, and very high) using the natural breaks
method. For the single DPCT model, most of the area falls in the very low (41.3%) and low (41.3%)
classes, while the moderate, high, and very high classes together account for only 17.6%
(Fig. 7a, 8a). In the CG-DPCT model, the very low and low classes also cover most of the area
(56.8%), while the remaining three classes account for 43.2% (Fig. 7c, 8a). Likewise, in the
B-DPCT model, the very low and low classes cover 54.1% of the area, with the other three classes
accounting for 45.9% (Fig. 7b, 8a). In the D-DPCT model, the landslide susceptibility classes are
more evenly distributed: very low (18.8%), low (14.7%), moderate (15.8%), high (14.1%), and very high

5. DISCUSSION

Recent studies have primarily evaluated model performance on validation datasets (Bui et al.,
2020; Dao et al., 2020; Thanh et al., 2020; Ghasemian et al., 2020; Phong et al., 2021; Nhu et al.,
2022; Pham et al., 2022; Shahzad, Ding and Abbas, 2022; Bien et al., 2023; Le Minh et al., 2023).
The base DPCT model achieved AUC = 0.919, PPV = 97.63%, NPV = 56.83%, SST = 71.60%, SPF = 95.56%,
ACC = 78.34%, Kappa = 0.56, and RMSE = 0.40. In the comparisons that follow, the change relative
to the base model is given in parentheses, in percentage points for the metrics expressed in
percent. The Bagging ensemble technique enhances the performance of the DPCT model: AUC = 0.948
(+0.029), PPV = 97.73% (+0.10), NPV = 67.85% (+11.02), SST = 77.22% (+5.62), SPF = 96.41% (+0.85),
ACC = 83.60% (+5.26), Kappa = 0.67 (+0.11), and RMSE = 0.37 (-0.03). The Cascade Generalization
technique also improves on the base DPCT model, although by a smaller margin: AUC = 0.920 (+0.001),
PPV = 97.63% (unchanged), NPV = 59.82% (+2.99), SST = 73.04% (+1.44), SPF = 95.77% (+0.21),
ACC = 79.75% (+1.41), Kappa = 0.59 (+0.03), and RMSE = 0.39 (-0.01). Lastly, the Dagging technique
improves every metric of the base DPCT model except PPV: AUC = 0.932 (+0.013), PPV = 97.43%
(-0.20), NPV = 65.33% (+8.50), SST = 75.80% (+4.20), SPF = 95.79% (+0.23), ACC = 82.25% (+3.91),
Kappa = 0.64 (+0.08), and RMSE = 0.37 (-0.03). These results demonstrate that the Bagging, Cascade
Generalization, and Dagging techniques can all enhance the performance of the DPCT model (Fig. 6,
Table 4). Other landslide studies likewise report that these techniques improve the performance of
a single model (Ali et al., 2024; Gu et al., 2024; Zhao et al., 2024). This indicates that ensemble
learning techniques can enhance the performance of the original single model (Liu et al., 2024;
Singh et al., 2024). In this study, the Bagging ensemble based on the DPCT model exhibited the best
performance compared
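
The comparison of each ensemble with the base model is simple arithmetic on the reported metrics. As a hedged sketch, the snippet below recomputes each ensemble's change relative to the base DPCT model from the metric values quoted in this section; the dictionary layout and the function name `deltas` are illustrative assumptions, not part of the study's workflow.

```python
METRICS = ["AUC", "PPV", "NPV", "SST", "SPF", "ACC", "Kappa", "RMSE"]

# Metric values as quoted in the text for the base model and the three ensembles.
base = dict(zip(METRICS, [0.919, 97.63, 56.83, 71.60, 95.56, 78.34, 0.56, 0.40]))
models = {
    "B-DPCT": dict(zip(METRICS, [0.948, 97.73, 67.85, 77.22, 96.41, 83.60, 0.67, 0.37])),
    "CG-DPCT": dict(zip(METRICS, [0.920, 97.63, 59.82, 73.04, 95.77, 79.75, 0.59, 0.39])),
    "D-DPCT": dict(zip(METRICS, [0.932, 97.43, 65.33, 75.80, 95.79, 82.25, 0.64, 0.37])),
}

def deltas(model, reference):
    """Difference of each metric from the reference model, rounded to 3 decimals."""
    return {k: round(model[k] - reference[k], 3) for k in reference}

for name, values in models.items():
    print(name, deltas(values, base))
```

For the percentage metrics, the differences are in percentage points; for RMSE, a negative difference indicates an improvement.
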

Alongside the evaluation metrics presented above, analyzing the landslide susceptibility maps
themselves helps determine which model is the most reasonable for establishing landslide
susceptibility maps in the study area (Dao et al., 2020; Phong et al., 2021; Le Minh et al., 2023).
Fig. 8 presents the results of this map-based analysis. In all models, the percentage of landslides
and the frequency ratio (FR) of landslides tend to increase across landslide susceptibility classes
from very low to very high, with landslide locations primarily concentrated in the high and very
high classes (Fig. 8b, 8d). The percentage of non-landslides and the FR of non-landslides tend to
decrease across landslide susceptibility classes from very low to very high, with non-landslide
locations mainly concentrated in the very low and low classes (Fig. 8c, 8e). According to the
frequency ratio evaluation method, the higher the FR of landslides in the high and very high
susceptibility classes, the more reliable the landslide susceptibility map (Dao et al., 2020; Phong
et al., 2021; Bien et al., 2023; Le Minh et al., 2023). Likewise, the higher the FR of
non-landslides in the very low and low susceptibility classes, the more reliable the map.
Accordingly, the Bagging ensemble based on the DPCT model yields the best results for the FR of
landslides, with FR = 4.59 in the high class and FR = 10.03 in the very high class (Fig. 8d). The
Dagging ensemble based on the DPCT model yields the best results for the FR of non-landslides, with
FR = 3.38 in the very low class and FR = 1.44 in the low class (Fig. 8e).
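
The frequency ratio used in this comparison can be illustrated with a short sketch. It assumes the common definition, FR = (share of points falling in a class) / (share of the study area covered by that class); the function name, class areas, and point counts below are hypothetical and do not reproduce the values in Fig. 8.

```python
def frequency_ratio(points_per_class, area_per_class):
    """FR per class: share of points in the class divided by share of area in the class."""
    total_points = sum(points_per_class.values())
    total_area = sum(area_per_class.values())
    fr = {}
    for cls, area in area_per_class.items():
        share_points = points_per_class.get(cls, 0) / total_points
        share_area = area / total_area
        fr[cls] = share_points / share_area
    return fr

# Hypothetical class areas (km2) and landslide point counts, for illustration only.
area = {"very low": 40, "low": 30, "moderate": 15, "high": 10, "very high": 5}
landslides = {"very low": 2, "low": 8, "moderate": 15, "high": 25, "very high": 50}

fr = frequency_ratio(landslides, area)
for cls in area:
    print(f"{cls}: FR = {fr[cls]:.2f}")
```

An FR above 1 means the class contains proportionally more points than its share of the area, which is the pattern expected for landslides in the high and very high classes.
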

Comparing different attribute selection methods for the 14 landslide conditioning factors shows
that each method ranks the importance of the factors differently (Table 3). This indicates that
the conditioning factors all play specific roles in influencing landslides (Keshri, Sarkar and
Chattoraj, 2023; Yu, Wang and Pradhan, 2024). Therefore, the more landslide-related factors are
selected, the more objective the evaluation and establishment of landslide susceptibility maps
become. Unlike other studies (Dao et al., 2020; Phong et al., 2021; Le Minh et al., 2023), we did
not select factors such as distance to roads or traffic density, because the research area is
centered on the Halong – Vandon expressway. Road-related parameters would therefore not reflect
landslide characteristics and would introduce noise into the datasets.

The NPV results of all models on the training and validation datasets are much lower than the PPV
results (Table 4). This suggests that selecting more appropriate non-landslide locations may help
improve model performance (Yang et al., 2023; Gu et al., 2024). Subsequent studies should pay more
attention to how non-landslide locations are selected (Yang et al., 2023; Gu et al., 2024; Huang
et al., 2024). Additionally, we believe that the following enhancements can improve the performance
of the models in future studies: 1) adjusting the structural parameters of the model and selecting
the optimal model structure; 2) enhancing the detail and resolution of the conditioning factor
maps; 3) ensuring consistency in the research level and scale of the data; and 4) weighting the
predicted labels according to the scale of each landslide location.

Many issues related to improving model performance remain, such as selecting non-landslide
locations, choosing conditioning factors, the level of detail of the data, and the structure of the
selected models. However, this is the first study to successfully apply ensemble models to enhance
the performance of DPCT models in establishing landslide susceptibility maps in expressway areas.
We recommend applying the Bagging ensemble based on the DPCT model to establish landslide
susceptibility maps under similar conditions. The landslide susceptibility map based on the B-DPCT
model is recommended for use in management, construction planning, and disaster prevention and
mitigation efforts in the research area. The classes of the landslide susceptibility map provide a
solid scientific basis for managers to carry out these tasks.

6. CONCLUSIONS

This study highlights the potential of ensemble methods to improve the performance of landslide
susceptibility prediction. It is the first time ensemble methods based on Dual Perturb and Combine
for Tree-based (DPCT) have been applied to enhance landslide prediction performance in the Halong –
Vandon Expressway area, Quang Ninh, Vietnam. Through evaluation and validation, the Bagging
technique combined with DPCT yielded the best results and is recommended for landslide prediction
under similar conditions. Additionally, the study demonstrates that the structure of the models can
significantly influence the accuracy and effectiveness of landslide prediction. Among the 14
conditioning factors, different evaluation methods show varying impacts of each factor. The
dominant conditioning factors include Slope, Elevation, Aspect, TWI, Rainfall, Curvature, and
Geology.

The models identify approximately 10% of the study area as having the highest landslide
susceptibility, mainly concentrated along the Halong – Vandon Expressway. The distribution of
landslide susceptibility classes provides a scientific basis for the government to manage and plan
future construction projects and to implement preventive measures that minimize landslide risks.
Further research is necessary to improve the performance of landslide prediction. Future models can
be enhanced by modifying model structures, adjusting conditioning factors, and selecting suitable
non-landslide locations. In particular, improving the level of detail of the data is expected to
improve landslide prediction performance significantly.

ACKNOWLEDGMENTS

This research is funded by the Vietnam National Foundation for Science and Technology

REFERENCES

Ado, M., Amitab, K., Maji, A.K., Jasińska, E., Gono, R., Leonowicz, Z. and Jasiński, M., 2022,
    Landslide susceptibility mapping using machine learning: a literature survey. Remote
AghaKouchak, A., Chiang, F., Huning, L.S., Love, C.A., Mallakpour, I., Mazdiyasni, O.,
    Moftakhari, H., Papalexiou, S.M., Ragno, E. and Sadegh, M., 2020, Climate extremes and
    compound hazards in a warming world. Annual Review of Earth and Planetary Sciences,
Ajin, R.S., Saha, S., Saha, A., Biju, A., Costache, R. and Kuriakose, S.L., 2022, Enhancing the
    accuracy of the REPTree by integrating the hybrid ensemble meta-classifiers for modelling
    the landslide susceptibility of Idukki district, south-western India. Journal of the Indian
Ali, N., Chen, J., Fu, X., Ali, R., Hussain, M.A., Daud, H., Hussain, J. and Altalbe, A., 2024,
    Integrating machine learning ensembles for landslide susceptibility mapping in northern
Arabameri, A., Chandra Pal, S., Rezaie, F., Chakrabortty, R., Saha, A., Blaschke, T., Di Napoli, M.,
    Ghorbanzadeh, O. and Thi Ngo, P.T., 2022, Decision tree based ensemble machine learning
    approaches for landslide susceptibility mapping. Geocarto International, 37, 4594–4627.
    https://doi.org/10.1080/10106049.2021.1892210
Asadi, M., Goli Mokhtari, L., Shirzadi, A., Shahabi, H. and Bahrami, S., 2022, A comparison study
    on the quantitative statistical methods for spatial prediction of shallow landslides (case
    study: Yozidar-Degaga route in Kurdistan province, Iran). Environmental Earth Sciences,
Azarafza, M., Azarafza, M., Akgün, H., Atkinson, P.M. and Derakhshani, R., 2021, Deep learning-
    https://doi.org/10.1038/s41598-021-03585-1
Azevedo, B.F., Rocha, A.M.A.C. and Pereira, A.I., 2024, Hybrid approaches to optimization and
    https://doi.org/10.1007/s10994-023-06467-x
Baeza, C., Lantada, N. and Amorim, S., 2016, Statistical and spatial analysis of landslide
    susceptibility maps with different classification systems. Environmental Earth Sciences, 75,
Beigh, I.H. and Bukhari, S.K., 2024, Landslide susceptibility assessment using GIS-based
    multicriteria decision analysis (MCDA) along a part of national expressway-1, Kashmir-
Bien, T.X., Iqbal, M., Jamal, A., Nguyen, D.D., Van Phong, T., Costache, R., Ho, L.S., Van Le, H.,
    Nguyen, H.B.T., Prakash, I. and Pham, B.T., 2023, Integration of rotation forest and
    multiboost ensemble methods with forest by penalizing attributes for spatial prediction of
    landslide susceptible areas. Stochastic Environmental Research and Risk Assessment, 37,
Binh Thai, P., Duc Nguyen, D., Bui Thi, Q.-A., Duc Nguyen, M., Tien Vu, T. and Prakash, I., 2022,
    Estimation of load-bearing capacity of bored piles using machine learning models. Vietnam
Breiman, L., 1996, Bagging predictors. Machine Learning, 24, 123–140.
    https://doi.org/10.1007/BF00058655
Bui, D.T., Tsangaratos, P., Nguyen, V.-T., Liem, N.V. and Trinh, P.T., 2020, Comparing the
    prediction performance of a Deep Learning Neural Network model with conventional
    machine learning models in landslide susceptibility assessment. Catena, 188, 104426.
    https://doi.org/10.1016/j.catena.2019.104426
Bui, Q.D., Ha, H., Khuc, D.T., Nguyen, D.Q., von Meding, J., Nguyen, L.P. and Luu, C., 2023,
    Landslide susceptibility prediction mapping with advanced ensemble models: Son La
    05764-3
Çellek, S., 2020, Effect of the slope angle and its classification on landslide. Natural Hazards Earth
Chacón, J., Irigaray, C., Fernández, T. and El Hamdouni, R., 2006, Engineering geology maps:
    landslides and geographical information systems. Bulletin of Engineering Geology and the
Chen, W., Yan, X., Zhao, Z., Hong, H., Bui, D. and Pradhan, B., 2019, Spatial prediction of
    landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and
    RBFNetwork models for the Long county area (China). Bulletin of Engineering Geology
Chen, X. and Chen, W., 2021, GIS-based landslide susceptibility assessment using optimized
    https://doi.org/10.1016/j.catena.2020.104833
Conoscenti, C., Di Maggio, C. and Rotigliano, E., 2008, GIS analysis to assess landslide
    https://doi.org/10.1016/j.geomorph.2006.10.039
Costanzo, D. and Irigaray, C., 2020, Comparing forward conditional analysis and forward logistic
Crosta, G.B. and Frattini, P., 2008, Rainfall-induced landslides and debris flows. Hydrological
D’Amato, G. and Akdis, C.A., 2020, Global warming, climate change, air pollution and allergies.
Dao, D.V., Jaafari, A., Bayat, M., Mafi-Gholami, D., Qi, C., Moayedi, H., Phong, T.V., Ly, H.-B.,
    Le, T.-T., Trinh, P.T., Luu, C., Quoc, N.K., Thanh, B.N. and Pham, B.T., 2020, A spatially
    explicit deep learning neural network model for the prediction of landslide susceptibility.
Dao Minh, D., Vu Cao, M., Hoang Hai, Y., Nguyen The, L. and Do Minh, D., 2023, Analysis of
    landslide kinematics integrating weather and geotechnical monitoring data at Tan Son slow
    moving landslide in Ha Giang province. Vietnam Journal of Earth Sciences, 45, 131–146.
    https://doi.org/10.15625/2615-9783/18204
Di Napoli, M., Carotenuto, F., Cevasco, A., Confuorto, P., Di Martire, D., Firpo, M., Pepe, G.,
    Raso, E. and Calcaterra, D., 2020, Machine learning ensemble modelling as a tool to
    improve landslide susceptibility mapping reliability. Landslides, 17, 1897–1914.
    https://doi.org/10.1007/s10346-020-01392-9
Doan, V.L., Nguyen, B.-Q.-V., Nguyen, C.C. and Nguyen, C.T., 2024, Effect of time-variant
    rainfall on landslide susceptibility: a case study in Quang Ngai province, Vietnam. Vietnam
Fang, Z., Wang, Y., Duan, G. and Peng, L., 2021, Landslide susceptibility mapping using rotation
    forest ensemble technique with different decision trees in the Three Gorges reservoir area,
D.C., 2024, Disaster risk management, climate change adaptation and the role of spatial and
    urban planning: evidence from European case studies. Natural Hazards.
    https://doi.org/10.1007/s11069-024-06448-w
Gama, J. and Brazdil, P., 2000, Cascade generalization. Machine Learning, 41, 315–343.
    https://doi.org/10.1023/A:1007652114878
Geurts, P., 2001, Dual perturb and combine algorithm. Proceedings of the 8th International
Geurts, P. and Wehenkel, L., 2005, Closed-form dual perturb and combine for tree-based models.
    Proceedings of the 22nd International Conference on Machine Learning, New York, 233–
Ghasemian, B., Asl, D.T., Pham, B.T., Avand, M., Nguyen, H.D. and Janizadeh, S., 2020, Shallow
    landslide susceptibility mapping: A comparison between classification and regression tree
    and reduced error pruning tree algorithms. Vietnam Journal of Earth Sciences, 42, 208–227.
    https://doi.org/10.15625/0866-7187/42/3/14952
Giang Linh, T., Dang Kinh, B. and Bui Thanh, Q., 2023, Coastline and shoreline change assessment
    in sandy coasts based on machine learning models and high-resolution satellite images.
    9783/18407
Gu, T., Duan, P., Wang, M., Li, J. and Zhang, Y., 2024, Effects of non-landslide sampling strategies
    on machine learning models in landslide susceptibility mapping. Scientific Reports, 14,
Hoang-Cong, H., Ngo-Duc, T., Nguyen-Thi, T., Trinh-Tuan, L., Jing Xiang, C., Tangang, F.,
    Jerasorn, S. and Phan-Van, T., 2022, A high-resolution climate experiment over part of
    Vietnam and the Lower Mekong Basin: performance evaluation and projection for rainfall.
Holte, R.C., 1993, Very simple classification rules perform well on most commonly used datasets.
Hong, H., 2023a, Assessing landslide susceptibility based on hybrid multilayer perceptron with
    ensemble learning. Bulletin of Engineering Geology and the Environment, 82, 382.
    https://doi.org/10.1007/s10064-023-03409-8
Hong, H., 2023b, Assessing landslide susceptibility based on hybrid Best-first decision tree with
    https://doi.org/10.1016/j.ecolind.2023.109968
Huang, F., Xiong, H., Jiang, S.-H., Yao, C., Fan, X., Catani, F., Chang, Z., Zhou, X., Huang, J. and
    Liu, K., 2024, Modelling landslide susceptibility prediction: a review and construction of
    https://doi.org/10.1016/j.earscirev.2024.104700
Ibrahim, M.B., Harahap, I.S.H., Balogun, A.-L.B. and Usman, A., 2020, The use of geospatial data
    from GIS in the quantitative analysis of landslides. IOP Conference Series: Earth and
Kalantar, B., Ueda, N., Saeidi, V., Ahmadi, K., Halin, A.A. and Shabani, F., 2020, Landslide
    susceptibility mapping: machine and ensemble learning based on remote sensing big data.
Keshri, D., Sarkar, K. and Chattoraj, S.L., 2023, Landslide susceptibility mapping in parts of Aglar
    watershed, Lesser Himalaya based on frequency ratio method in GIS environment. Journal
Khosravi, K., Golkarian, A., Melesse, A.M. and Deo, R.C., 2022, Suspended sediment load
    modeling using advanced hybrid rotation forest based elastic network approach. Journal of
Le Minh, N., Truyen, P.T., Van Phong, T., Jaafari, A., Amiri, M., Van Duong, N., Van Bien, N.,
    Duc, D.M., Prakash, I. and Pham, B.T., 2023, Ensemble models based on radial basis
    function network for landslide susceptibility mapping. Environmental Science and Pollution
Liu, L.-L., Danish, A., Wang, X.-M. and Zhu, W.-Q., 2024, Ensemble stacking: a powerful tool for
    landslide susceptibility assessment – a case study in Anhua county, Hunan province, China.
Liu, S., Wang, L., Zhang, W., He, Y. and Pijush, S., 2023, A comprehensive review of machine
    learning-based methods in landslide susceptibility mapping. Geological Journal, 58, 2283–
Lucchese, L.V., de Oliveira, G.G. and Pedrollo, O.C., 2020, Attribute selection using correlations
    and principal components for artificial neural networks employment for landslide
    https://doi.org/10.1007/s10661-019-7968-0
Mai, N.T., 1996, Forecasting occurrence of landslide related to the tropical weathering crust by
Masson-Delmotte, V., Zhai, P., Pörtner, H.-O., Roberts, D., Skea, J. and Shukla, P.R., 2022, Global
    Warming of 1.5°C: IPCC Special Report on Impacts of Global Warming of 1.5°C above
Nettleton, D., 2014, Chapter 6 - Selection of variables and factor derivation. In: Nettleton, D. (ed.),
    https://doi.org/10.1016/B978-0-12-416602-8.00006-6
Ngewie, D.T.L., 2024, The impacts of road transport infrastructure and the socio-economic
    development in the Bamenda III municipality, Mezam division, north west region
    https://doi.org/10.51699/ijbde.v3i1.3322
Ngo-Duc, T., 2023, Rainfall extremes in northern Vietnam: a comprehensive analysis of patterns
    and trends. Vietnam Journal of Earth Sciences, 45, 183–198.
    https://doi.org/10.15625/2615-9783/18284
Nguyen, L.C., Tien, P.V. and Do, T.-N., 2020, Deep-seated rainfall-induced landslides on a new
    https://doi.org/10.1007/s10346-019-01293-6
Nhu, V.-H., Bui, T.T., My, L.N., Vuong, H. and Duc, H.N., 2022, A new approach based on
    integration of random subspace and C4.5 decision tree learning method for spatial prediction
    https://doi.org/10.15625/2615-9783/16929
Ogunbode, C.A., Doran, R. and Böhm, G., 2020, Exposure to the IPCC special report on 1.5 °C
    global warming is linked to perceived threat and increased concern about climate change.
Ohlmacher, G.C., 2000, The relationship between geology and landslide hazards of Atchison,
    Kansas, and vicinity. Current Research in Earth Sciences, 244, 1–16.
    https://doi.org/10.17161/cres.v0i244.11833
Pasang, S. and Kubíček, P., 2020, Landslide susceptibility mapping using statistical methods along
    https://doi.org/10.3390/geosciences10110430
Pham, B., Prakash, I., Chen, W., Ly, H.-B., Ho, L., Omidvar, E., Tran, V. and Bui, D., 2019, A
    https://doi.org/10.3390/su11226323
Pham, B.T., Vu, V.D., Costache, R., Phong, T.V., Ngo, T.Q., Tran, T.-H., Nguyen, H.D., Amiri,
    M., Tan, M.T., Trinh, P.T., Le, H.V. and Prakash, I., 2022, Landslide susceptibility mapping
    using state-of-the-art machine learning ensembles. Geocarto International, 37, 5175–5200.
    https://doi.org/10.1080/10106049.2021.1914746
Phong, T.V., Phan, T.T., Prakash, I., Singh, S.K., Shirzadi, A., Chapi, K., Ly, H.-B., Ho, L.S., Quoc,
    N.K. and Pham, B.T., 2021, Landslide susceptibility modeling using different artificial
    intelligence methods: a case study at Muong Lay district, Vietnam. Geocarto International,
Polemio, M. and Petrucci, O., 2000, Rainfall as a landslide triggering factor an overview of recent
Prakash, I., Nguyen, D.D., Tuan, N.T. and Phong, T.V., 2024, Landslide susceptibility zoning:
    integrating multiple intelligent models with SHAP analysis. Journal of Science and
Quinlan, J.R., 1986, Induction of decision trees. Machine Learning, 1, 81–106.
    https://doi.org/10.1007/BF00116251
Rabby, Y.W., Li, Y., Abedin, J. and Sabrina, S., 2022, Impact of land use/land cover change on
    https://doi.org/10.3390/ijgi11020089
Ramos-Bernal, R.N., Vázquez-Jiménez, R., Cantú-Ramírez, C.A., Alarcón-Paredes, A.,
    Alonso-Silverio, G.A., G. Bruzón, A., Arrogante-Funes, F., Martín-González, F., Novillo, C.J. and
    Arrogante-Funes, P., 2021, Evaluation of conditioning factors of slope instability and
    continuous change maps in the generation of landslide inventory maps using machine
Saha, S., Roy, J., Pradhan, B. and Hembram, T.K., 2021, Hybrid ensemble machine learning
    approaches for landslide susceptibility mapping using different sampling ratios at east
    https://doi.org/10.1016/j.asr.2021.05.018
Sassa, K., Mikoš, M., Sassa, S., Bobrowsky, P.T., Takara, K. and Dang, K., 2020, Understanding
    60196-6
Seda, C., 2021, The Effect of aspect on landslide and its relationship with other parameters. In:
    https://doi.org/10.5772/intechopen.99389
Shahzad, N., Ding, X. and Abbas, S., 2022, A comparative assessment of machine learning models
    for landslide susceptibility mapping in the rugged terrain of northern Pakistan. Applied
Shano, L., Raghuvanshi, T.K. and Meten, M., 2020, Landslide susceptibility evaluation and hazard
    https://doi.org/10.1186/s40677-020-00152-0
Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., Shahabi, H. and Revhaug,
    I., 2017, Shallow landslide susceptibility assessment using a novel hybrid intelligence
    y
Singh, K., Bhardwaj, V., Sharma, A. and Thakur, S., 2024, A comprehensive review on landslide
    https://doi.org/10.14746/quageo-2024-0005
Sitányiová, D., Vondráčková, T., Stopka, O., Myslivečková, M. and Muzik, J., 2015, GIS based
    methodology for the geotechnical evaluation of landslide areas. Procedia Earth and
Sterlacchini, S., Ballabio, C., Blahut, J., Masetti, M. and Sorichetta, A., 2011, Spatial agreement of
    https://doi.org/10.1016/j.geomorph.2010.09.004
Sun, D., Gu, Q., Wen, H., Xu, J., Zhang, Y., Shi, S., Xue, M. and Zhou, X., 2023, Assessment of
    landslide susceptibility along mountain expressways based on different machine learning
    algorithms and mapping units by hybrid factors screening and sample optimization.
Tacher, L., Bonnard, C., Laloui, L. and Parriaux, A., 2005, Modelling the behaviour of a large
Tang, H., Wang, C., An, S., Wang, Q. and Jiang, C., 2023, A novel heterogeneous ensemble
    framework based on machine learning models for shallow landslide susceptibility mapping.
Vietnam Institute for Building Science and Technology, 2009, Vietnam building code natural
    physical & climatic data for construction.
Tehrani, F.S., Santinelli, G. and Herrera Herrera, M., 2021, Multi-regional landslide detection using
    combined unsupervised and supervised machine learning. Geomatics, Natural Hazards and
Thanh, D.Q., Nguyen, D.H., Prakash, I., Jaafari, A., Nguyen, V.T., Phong, T.V. and Pham, B.T.,
    2020, GIS based frequency ratio method for landslide susceptibility mapping at Da Lat City,
    Lam Dong province, Vietnam. Vietnam Journal of Earth Sciences, 42, 55–66.
    https://doi.org/10.15625/0866-7187/42/1/14758
Thanh, T.-D., 2011, Stratigraphic units of Viet Nam (Second Edition - Revised and Updated).
Tin, D., Cheng, L., Le, D., Hata, R. and Ciottone, G., 2024, Natural disasters: a comprehensive
    study using EMDAT database 1995–2022. Public Health, 226, 255–260.
    https://doi.org/10.1016/j.puhe.2023.11.017
Ting, K.M. and Witten, I.H., 1997, Stacking bagged and dagged models. Proceedings of the 14th
Tong, Z.l., Guan, Q.t., Arabameri, A., Loche, M. and Scaringi, G., 2023, Application of novel
    03328-8
UNDRR, 2022, United Nations Office for Disaster Risk Reduction: annual report 2022, report, 7bis
Van Tien, P., Luong, L.H., Nhat, L.M., Thanh, N.K. and Van Cuong, P., 2021, Landslides Along
    Halong-Vandon Expressway in Quang Ninh Province, Vietnam. In: Guzzetti, F., Mihalić
    Arbanas, S., Reichenbach, P., Sassa, K., Bobrowsky, P.T. and Takara, K. (eds.),
    Understanding and Reducing Landslide Disaster Risk: Volume 2 From Mapping to Hazard
    7_14
Ward, P.J., Blauhut, V., Bloemendaal, N., Daniell, J.E., de Ruiter, M.C., Duncan, M.J., Emberson,
    R., Jenkins, S.F., Kirschbaum, D., Kunz, M., Mohr, S., Muis, S., Riddell, G.A., Schäfer, A.,
    Stanley, T., Veldkamp, T.I.E. and Winsemius, H.C., 2020, Review article: Natural hazard
    risk assessments at the global scale. Natural Hazards and Earth System Sciences, 20, 1069–1096.
    https://doi.org/10.5194/nhess-20-1069-2020
Wardhani, N.W.S., Rochayani, M.Y., Iriany, A., Sulistyono, A.D. and Lestantyo, P., 2019,
    Cross-validation metrics for evaluating classification performance on imbalanced data. 2019
    International Conference on Computer, Control, Informatics and its Applications (IC3INA),
Yadav, M., Pal, S.K., Singh, P.K. and Gupta, N., 2023, Landslide susceptibility zonation mapping
    using frequency ratio, information value model, and logistic regression model: a case study
    of Kohima district in Nagaland, India. In: Thambidurai, P. and Singh, T.N. (eds.),
Yang, C., Liu, L.-L., Huang, F., Huang, L. and Wang, X.-M., 2023, Machine learning-based
Yilmaz, I., 2009, Landslide susceptibility mapping using frequency ratio, logistic regression,
    artificial neural networks and their comparison: A case study from Kat landslides (Tokat—
    https://doi.org/10.1016/j.cageo.2008.08.007
Yu, L., Wang, Y. and Pradhan, B., 2024, Enhancing landslide susceptibility mapping incorporating
    landslide typology via stacking ensemble machine learning in Three Gorges reservoir,
874 Zaini, A.Z.A., Vonnisa, M. and Marzuki, M., 2024, Impact of different ENSO positions and Indian
875 Ocean Dipole events on Indonesian rainfall. Vietnam Journal of Earth Sciences, 46, 100–
877 Zandalinas, S.I., Fritschi, F.B. and Mittler, R., 2021, Global warming, climate change, and
878 environmental pollution: recipe for a multifactorial stress combination disaster. Trends in
880 Zeng, T., Wu, L., Peduto, D., Glade, T., Hayakawa, Y.S. and Yin, K., 2023, Ensemble learning
881 framework for landslide susceptibility mapping: Different basic classifier and ensemble
883 Zhang, K., Wang, S., Bao, H. and Zhao, X., 2019, Characteristics and influencing factors of rainfall-
884 induced landslide and debris flow hazards in Shaanxi province, China. Natural Hazards
886 Zhang, Q., Ning, Z., Ding, X., Wu, J., Wang, Z., Tsangaratos, P., Ilia, I., Wang, Y. and Chen, W.,
887 2024, Hybrid integration of bagging and decision tree algorithms for landslide susceptibility
889 Zhao, F., Miao, F., Wu, Y., Ke, C., Gong, S. and Ding, Y., 2024, Refined landslide susceptibility
890 mapping in township area using ensemble machine learning method under dataset
892 https://doi.org/10.1016/j.gr.2024.02.011
893 Zhou, Z., Duan, J., Geng, S. and Li, R., 2024, The role of expressway construction in influencing
894 agricultural green total factor productivity in China: agricultural industry structure
896 https://doi.org/10.3389/fsufs.2023.1315201
LIST OF TABLES AND FIGURES

Table 1. The parameters of the models used for establishing landslide susceptibility maps in the
Table 2. Sources of the conditional factors used in this study
Table 3. Ranking of conditional factors using attribute selection methods

Fig. 1. Study area of the Halong – Vandon Expressway for Landslide Susceptibility Mapping.
Fig. 2. Several landslides continue to occur despite technical reinforcement along the Halong – Vandon expressway: a) at km 10, b) at km 19, c) at km 24, and d) at km 30 (photo source: Tuan-Nghia Do).
Fig. 3. Flow chart of the methodology for Landslide Susceptibility Mapping along the Halong – Vandon Expressway.
Fig. 4. Conditional factor maps for Landslide Susceptibility Mapping in the Halong – Vandon
Fig. 5. Zonal histogram between conditional factor maps with Landslide and Non-landslide
Fig. 6. AUC performance of the models: a) Training dataset, b) Validation dataset.
Fig. 7. Landslide susceptibility maps in the Halong – Vandon Expressway area: a) DPCT, b) B-
Fig. 8. Analysis results of landslide susceptibility maps: a) Percentage of area in each landslide susceptibility class, b) Percentage of the validation landslide dataset in each landslide susceptibility class, c) Percentage of the validation non-landslide dataset in each landslide susceptibility class, d) Frequency ratio of landslides in each landslide susceptibility class, and e) Frequency ratio of non-landslides in each landslide susceptibility class.

TABLE

Table 1. The parameters of the models used for establishing landslide susceptibility maps in the

No  Hyperparameters        Models
                           DPCT   Bagging   Cascade Generalization   Dagging
1   Lambda                 0.2    -         -                        -
6   Number of Iterations   -      10        -                        -
7   Seed                   -      1         1                        1
8   Number of Folds        -      -         20                       2

Table 2. Sources of the conditional factors used in this study

9    Stream density (km/km²)   10 m   Generated from DEM
13   TWI                       10 m   Generated from DEM

Table 3. Ranking of conditional factors using attribute selection methods

     Methods
5    Rainfall (mm/day)          0.28   Aspect              0.12   Aspect              77.01
6    Fault density (km/km²)     0.24   Geology             0.11   Geology             75.36
9    Geotechnical Engineering   0.14   Stream density      0.06   Rainfall (mm/day)   65.62
11   Geology                    0.06   Hydrogeology        0.04   Weathering crust    60.58
12   LULC                       0.02   Rainfall (mm/day)   0.04   Stream density      60.39
13   Curvature                  0.01   LULC                0.03   LULC                59.29
14   SPI                        0.01   Weathering crust    0.03   Hydrogeology        57.21

Table 4. Model performance evaluated using multiple criteria

                Models
3   FP          16      15      17      38      23      22      23      25
5   PPV (%)     99.28   99.33   99.24   98.30   97.63   97.73   97.63   97.43
6   NPV (%)     76.09   88.34   78.26   90.09   56.83   67.85   59.82   65.33
7   SST (%)     79.51   88.84   81.01   90.27   71.60   77.22   73.04   75.80
8   SPF (%)     99.13   99.30   99.10   98.27   95.56   96.41   95.77   95.79
9   ACC (%)     87.29   93.65   88.40   94.06   78.34   83.60   79.75   82.25
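
For readers who wish to reproduce criteria of the kind listed in Table 4, the following is a minimal sketch that assumes the standard textbook definitions of PPV, NPV, sensitivity (SST), specificity (SPF), accuracy (ACC), and Cohen's Kappa computed from the four confusion-matrix counts. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study; the manuscript's own definitions should be checked before comparing values.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return confusion-matrix criteria as fractions (multiply by 100 for %).

    Assumes the standard definitions; the counts are true/false
    positives and negatives for the landslide class.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # Agreement expected by chance, from predicted and observed marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "SST": tp / (tp + fn),   # sensitivity
        "SPF": tn / (tn + fp),   # specificity
        "ACC": acc,              # accuracy
        "Kappa": (acc - p_e) / (1 - p_e),
    }


# Hypothetical counts for illustration only (not the values behind Table 4).
example = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because every criterion derives from the same four counts, reporting the counts alongside the percentages allows each entry to be cross-checked.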

FIGURE

Fig. 1. Study area of the Halong – Vandon Expressway for Landslide Susceptibility Mapping.

Fig. 2. Several landslides continue to occur despite technical reinforcement along the Halong – Vandon expressway: a) at km 10, b) at km 19, c) at km 24, and d) at km 30 (photo source: Tuan-Nghia Do).

Fig. 3. Flow chart of the methodology for Landslide Susceptibility Mapping along the Halong – Vandon Expressway.

Fig. 4. Conditional factor maps for Landslide Susceptibility Mapping in the Halong – Vandon

Fig. 5. Zonal histogram between conditional factor maps with Landslide and Non-landslide

Fig. 6. AUC performance of the models: a) Training dataset, b) Validation dataset.

Fig. 7. Landslide susceptibility maps in the Halong – Vandon Expressway area: a) DPCT, b) B-

Fig. 8. Analysis results of landslide susceptibility maps: a) Percentage of area in each landslide susceptibility class, b) Percentage of the validation landslide dataset in each landslide susceptibility class, c) Percentage of the validation non-landslide dataset in each landslide susceptibility class, d) Frequency ratio of landslides in each landslide susceptibility class, and e) Frequency ratio of non-landslides in each landslide susceptibility class.
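
The frequency ratios in panels d) and e) of Fig. 8 can be illustrated with a short sketch. It assumes the commonly used definition, in which the frequency ratio of a susceptibility class is the percentage of points (landslide or non-landslide) falling in that class divided by the percentage of the study area occupied by that class. The function name `frequency_ratio`, the five class labels, and all percentages below are hypothetical and are not values from the study.

```python
def frequency_ratio(area_pct, point_pct):
    """Frequency ratio per class: % of points in class / % of area in class.

    Both arguments map class label -> percentage. Classes with zero area
    yield NaN, since the ratio is undefined there.
    """
    ratios = {}
    for cls, area in area_pct.items():
        ratios[cls] = point_pct[cls] / area if area > 0 else float("nan")
    return ratios


# Hypothetical percentages for illustration only (each column sums to 100).
area_pct = {"very low": 40.0, "low": 25.0, "moderate": 15.0,
            "high": 12.0, "very high": 8.0}
landslide_pct = {"very low": 2.0, "low": 3.0, "moderate": 5.0,
                 "high": 20.0, "very high": 70.0}

fr = frequency_ratio(area_pct, landslide_pct)
for cls, value in fr.items():
    print(f"{cls}: {value:.2f}")
```

Under this definition, a ratio above 1 for landslides in the high and very high classes, together with a ratio below 1 in the low classes, indicates that the map concentrates observed landslides in a comparatively small share of the area.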