Safety Science 163 (2023) 106138

Contents lists available at ScienceDirect

Safety Science
journal homepage: www.elsevier.com/locate/safety

Application of machine learning technology for occupational accident severity prediction in the case of construction collapse accidents
Xixi Luo a, Xinchun Li a,*, Yang Miang Goh b, Xuefeng Song a, Quanlong Liu a
a School of Economics and Management, China University of Mining & Technology, Xuzhou 221116, China
b College of Design and Engineering, National University of Singapore, Singapore 117566, Singapore

ARTICLE INFO

Keywords:
Occupational safety and health (OSH)
Data preprocessing
Random forest (RF)
Severity prediction
Construction collapse accidents

ABSTRACT

Machine learning algorithms are capable of handling complex non-linear problems related to the prediction domain, but further exploration is required for automated, semi-supervised outcome prediction of occupational accidents employing unstructured textual data. It has been demonstrated that the injury severity can be predicted from the equipment, scenario and environmental attributes in the workplace, so this paper aims to enhance text data pre-processing and optimize machine learning algorithms to create an attribute factor-based occupational accident severity prediction framework, mapping characteristic attributes to accident severity categories (i.e., casualties and property damage). The reliability validation of the prediction framework and analysis of critical attribute components are performed using the collapse accidents data in construction engineering as a case study, which is the third most serious occupational problem. The findings indicate that the dataset obtained after addressing the class imbalance issue and improving the text segmentation procedure can be utilized as a training sample to accurately predict injury severity. The accuracy of the prediction model is evaluated in three simulated scenarios, and it can reach 82%, confirming the robust performance of the prediction model based on RF machine learning. Additionally, the outcomes of the measured ranking of feature importance enable the identification of critical attributes that can credibly explain the causal relationships resulting in injury severity findings, and provide managers with accident prevention strategies to minimize occupational injuries and losses.

1. Introduction

With the introduction of Industry 4.0 and the promotion of the smart factory concept, technologies such as big data, artificial intelligence (AI), and the Internet of Things (IoT) are widely used in the manufacturing industry, and they have increased production efficiency while also bringing about more uncertainties and material losses for occupational safety and health (OSH) (Song and Yang, 2021; Zorzenon et al., 2022). According to the statistics of the International Labor Organization (ILO), more than 2.78 million workers worldwide die from occupational accidents each year, and 374 million more experience non-fatal occupational accidents, with the construction industry accounting for roughly 60% of these fatalities (ILO et al., 2021). As a result, it is crucial to acknowledge the significance of OSH in the mining, transportation and chemical industries as well as to give the construction sector more consideration.

Learning from past accidents is crucial to prevent accident scenarios involving the same hazardous substances, so knowledge mining of OSH accident texts has proven to be an essential step for safety risk management in construction projects. However, the typical analysis of historical accident data based on questionnaires and expert interviews has flaws such as insufficient objectivity and weak interpretative capacity, and there is an urgent need to establish new multi-source data sources (Xu et al., 2021). Emerging professional technologies like computer vision, unmanned aerial vehicles (UAV), and virtual reality (VR) bring fundamental support for the gathering, processing, and storage of massive heterogeneous data, enabling the organization and application of structured data, unstructured data, real-time collection data, and other multiform data for internal decision-making, and guaranteeing the expansion of occupational production risk assessment and management strategies. For instance, safety hazards for construction workers can be identified by measuring gait variability factors with a wearable insole pressure system (Antwi-Afari et al., 2020). A wearable inertial measurement unit (WIMU) is utilized to collect data on unusual gait patterns during a worker’s cycle, combined with position detection data to identify slip, trip, and fall hazards (Yang and Ahn, 2019). Using portable

* Corresponding author at: University RD. 1, Xuzhou, Jiangsu Province 221116, China.
E-mail address: [email protected] (X. Li).

https://doi.org/10.1016/j.ssci.2023.106138
Received 1 August 2022; Received in revised form 23 December 2022; Accepted 5 March 2023
Available online 14 March 2023
0925-7535/© 2023 Published by Elsevier Ltd.
X. Luo et al. Safety Science 163 (2023) 106138

eye-tracking devices to collect scan data to identify psychological risk-seeking behaviors in the construction industry (Xu et al., 2019). Unstructured accident text data has emerged as a significant data source for risk management in the construction industry because, in addition to the widespread application of structured data, a significant amount of valuable information contained in unstructured data also provides an efficient channel for OSH knowledge mining.

There are many alternative areas for occupational accident research, and hazard monitoring and risk assessment have been among the popular research topics. In this research environment, several research findings related to the identification of occupational accident risk factors, simulation of hazard scenarios, prediction of the frequency and severity of injury events, and decision-rule-based prevention countermeasures are available (Sarkar and Maiti, 2020). Prediction of injury severity aims to improve project safety performance and suggest proactive corrective measures for noted near misses. Numerous studies have been done on the use of injury severity prediction in various domains, in recognition of the advantages of the research. For example, Lu (2022) applied data from the Global Integrated Shipping Information System (GISIS) to calculate the severity probability of maritime routes based on the random forest (RF) algorithm, and quantified the severity through expert scoring. Chebila (2020) applied the Major Accident Reporting System (MARS) to predict the consequences of accidents involving hazardous chemicals based on multiple machine learning algorithms. Baker et al. (2020) extracted construction methods and environmental conditions from accident reports based on handwritten logic rules and custom dictionaries, and predicted injury severity based on SVM and XGBoost algorithms. Because construction is one of the most lethal industries, OSH risk research in the construction industry has a significant role in reducing casualties and property damage (Goh et al., 2018), and collapse accidents are a particularly dangerous accident category in the construction industry. Mass fatalities and injuries are a common feature of such accidents, so the distribution of accident severity in the textual data is comparatively uniform and suitable as an experimental case to investigate the implied relationship between attribute factors and injury severity. Although predicting injury severity is essential for accident prevention, few studies have used textual data for risk assessment and model optimization for construction collapse accidents.

The primary purpose of this paper is to enhance the pre-processing of text data and optimize the risk assessment model in order to systematically establish a framework for predicting the severity of occupational accidents. The data preprocessing stage focuses on addressing the issues of word segmentation accuracy, category imbalance and multicollinearity in the text, and the risk assessment model optimization stage focuses on confirming the model’s validity and obtaining critical attribute factors based on the results of performance indicators in three scenarios. Therefore, this study contributes to the existing literature by:

(i) Phrase extraction techniques based on mutual information and information entropy are utilized to achieve automated and accurate text segmentation.
(ii) The data pre-processing procedure can process sample data more thoroughly by including oversampling and correlation coefficient measurement.
(iii) Simulating the prediction performance of accident severity under each of the three scenarios using the RF algorithm.
(iv) The critical attribute factors of occupational accidents are analyzed using the results of the feature importance ranking.

2. Literature review

2.1. Occupational safety and health (OSH)

The term “OSH” refers to technical and organizational steps taken to ensure the health and safety of employees while they are at work to reduce losses and damages brought on by their work activities (Adaku et al., 2021). Occupational safety and health concerns in the modern industrial process are currently receiving a lot of attention worldwide. For example, Beckert and Barros (2022) explored the novel occupational health risks of solid waste collection workers (SWCW) in Sao Paulo, Brazil, during the COVID-19 pandemic, showing that employees working in waste collection during the pandemic were exposed to increased vulnerability to infection. Guzman et al. (2022) used partial least squares structural equation modeling (PLS-SEM) to simulate OSH in the oil and gas industry (OGI) during the COVID-19 pandemic, integrating five domains (direct exposure hazards in the workplace, policies and procedures, self-safety culture perception, self-safety responsibility perception, preventive measures) for OSH assessment, which verified the positive effect of safety culture perception on advancing OSH. Liu et al. (2021) developed a novel OSH risk assessment model that combines picture fuzzy sets (PFS) and alternative queuing methods (AQM) to assess and rank occupational hazard risk factors, and verified the feasibility and superiority of the model through a real case of construction site excavation. In addition, studies on OSH issues for industries such as mining (Lu et al., 2022), textile (Mutlu and Altuntas, 2019) and fish farming (Thorvaldsen et al., 2020) can provide a better understanding of process risk factors and the resulting occupational injuries for effective interventions to prevent and improve employee safety and health.

2.2. Accident severity analysis based on text data

In occupational accident research, causal analysis of risk factors and risk assessment have always been hot topics, and with the development of artificial intelligence techniques and a deeper understanding of data in text form rich in valuable information, there has been a gradual rise in studies using quantitative methods to predict the probability and severity of occupational accidents. For example, Wang and Yang (2018) collected investigation reports of waterway accidents from the Maritime Safety Administration (MSA) of China, mined characteristic factors in text cases based on grounded theory, and then developed a Bayesian network (BN) risk assessment model to identify the dependencies between key risk factors and waterway accident severity. Poh (2018) used safety inspection records, accident cases and project-related data obtained from a large contractor as safety leading indicators based on five popular ML algorithms to predict the occurrence and severity of construction accidents, guided by the industry-recognized Cross-industry Standard Process for Data Mining (CRISP-DM) framework. Stemn and Krampah (2022) investigated the association between accident factors and different injury severity levels in the open pit mining industry using correspondence analysis based on the accident reports collected by the Inspection Department of Mining Commission (IDMC), which helped to gain a deeper understanding of the hidden and root causes of mine accidents. Tamascelli et al. (2022) mapped the extracted accident feature attributes to the corresponding accident severity categories based on the text reports from the Major Hazardous Accident Data Service database, and the constructed Wide&Deep model showed superior prediction performance.

2.3. Using text mining and machine learning techniques for construction accident research

The introduction of AI technology into the workflow can drive decision support systems based on a data-driven approach, which can not only fully capture the potential value contained in the data, but also reasonably promote structural transformation based on simulated complex scenarios, resulting in numerous innovative studies in the field of construction occupational accident analysis. In the content analysis and retrieval applications of unstructured accident textual information, Tixier et al. (2016) used natural language processing (NLP) technology to extract attribute information from unstructured injury reports and developed a system framework capable of automatically


processing textual injury reports, addressing the need for manual analysis of injury reports. Zou et al. (2017) used the vector space model and semantic query expansion to build a framework for a construction project risk case retrieval system to improve the efficiency and performance of retrieving similar risk cases. Parinaz et al. (2021) used NLP and four machine learning (ML) algorithms (Naive Bayes, Logistic Regression, Random Forest and Extreme Gradient Boosting) to develop an automatic identification system for construction contract reports, that not only retrieves the required contract text efficiently, but also quantitatively predicts the time and cost associated with report preparation, facilitating a smooth automated contract review and optimal reporting process for decision-makers. In the application of risk identification using accident texts, Zhang (2019) proposed a two-stage text mining method for construction accident cause classification, and evaluated the performance of five ML classifiers for accident cause classification based on the construction of a word embedding model as a specific corpus. Ma et al. (2021) fully exploits the potential information in heterogeneous data in construction projects and develops an integrated analysis framework (IAF) for identifying and assessing potential risk factors using ML methods and the concept of cascading effects. In the application of accident prediction using unstructured text, Hingorani et al. (2020) developed a target reliability level prediction model for structural components based on the type and use of buildings, and proposed to quantitatively distinguish the level of failure consequences according to the number of victims and the degree of damage, which can be applied to the potential risk analysis of specific building structures. Kang and Ryu (2019) used an RF model to predict occupational accident types, constructing an occupational accident prediction model using Korean construction accident and weather data, and deriving key risk factors for different accident types.

Based on the aforementioned literature analysis, it is evident that systematic research on the severity of occupational accidents has

Fig. 1. Framework flowchart.
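
The two-phase framework in Fig. 1 first turns accident text into model-ready records and then hands them to the classifier. A minimal sketch of that hand-off, assuming each report has already been segmented into attribute terms; the attribute names, the sample reports, and the helper names `encode` and `split` are hypothetical illustrations and are not taken from the paper:

```python
import random

# Hypothetical attribute terms; the paper extracts 49 of them from accident reports.
ATTRIBUTES = ["formwork support", "scaffold", "heavy rain", "no safety briefing"]
# Severity codes follow the paper: 0 = general, 1 = large, 2 = major accident.
SEVERITY = {"general": 0, "large": 1, "major": 2}


def encode(report_terms, severity):
    """Phase 1: turn one segmented report into a 0/1 attribute vector plus a label."""
    terms = set(report_terms)
    features = [1 if attribute in terms else 0 for attribute in ATTRIBUTES]
    return features, SEVERITY[severity]


def split(records, train_ratio=0.8, seed=0):
    """Shuffle and split records for phase 2 (the paper reports an 8:2 ratio)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Made-up reports, already reduced to the attribute terms they mention.
reports = [
    (["scaffold", "heavy rain"], "large"),
    (["formwork support"], "general"),
    (["formwork support", "no safety briefing"], "major"),
    (["scaffold"], "large"),
    (["heavy rain", "no safety briefing"], "general"),
]
records = [encode(terms, severity) for terms, severity in reports]
train, test = split(records)
```

The classifier itself is left out of the sketch; any model that accepts 0/1 feature vectors and integer labels could consume `train` and `test`.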


produced a significant number of successful research findings against the backdrop of ongoing attention to the governance of occupational safety and health issues, but relatively few studies have combined text data and machine learning algorithms to predict the severity of injuries in construction accidents. Therefore, the goal of this paper is to improve the performance of text mining and machine learning algorithms for estimating the severity of construction collapse accidents and efficiently identify critical accident attribute factors to support the accident prevention mechanism of hidden danger management and risk control in the construction industry.

3. Materials and methods

As shown in Fig. 1, the framework flowchart of this study contains two main phases in general: data preprocessing and data analysis. The main purpose of the first stage is to convert the original accident text data into a format suitable for computer learning analysis, and the main purpose of the second stage is to apply the RF algorithm to systematically construct an accident severity prediction model and evaluate the model performance to explore model optimization strategies.

3.1. Data preprocessing

3.1.1. Text segmentation

The first phase in text processing is text segmentation, which also serves as the foundational module for information extraction. In addition to assisting computers in understanding and processing natural language more effectively, reliable text segmentation results have a significant impact on the analysis and execution effectiveness of subsequent NLP tasks. The current implementation principles of textual word segmentation processing are mainly divided into knowledge-driven mechanical word segmentation and data-driven statistical word segmentation (Zhang et al., 2020). The mechanical word segmentation algorithm, also known as the string matching segmentation algorithm, matches the strings that need to be matched with the pre-made dictionaries in accordance with a certain strategy to improve the retrieval efficiency with the stored existing information. The statistical word segmentation algorithm uses the frequency of adjacent words along with contextual lexical meaning to label words. When the frequency of adjacent co-occurring combination words in the statistical corpus is high, it can be determined that the words together form phrases, and this method of word division based on the theory of word frequency statistics has better credibility, so the statistical word segmentation algorithm is becoming more and more popular. The basic and straightforward sequence labeling concept used in the design and execution of the statistical word segmentation algorithm based on the Hidden Markov Model (HMM) displays a significant reference value in lexical analysis research. In the choice of word segmentation tools, Jieba, SnowNLP, NLTK, HanLP, etc. are common word segmentation tools, among which Jieba, which supports three word segmentation modes (exact mode, full mode and search engine mode), not only is simple and accurate to use, but also can realize rich functions such as keyword extraction and word position query. In order to identify suitable words for this paper’s word segmentation, the Jieba word segmentation tool is first used to segment words using statistical word segmentation methods based on the Hidden Markov Model (HMM). Then, to avoid the incompleteness of domain-oriented dictionaries and the ineffectiveness of dictionary-based matching strategies (Yan et al., 2021), a combination of mutual information and information entropy is used to analyze word frequency co-occurrence and semantic expansion of candidate words to automatically identify more valuable and complete word separation results, improving the accuracy and efficiency of text information processing.

(1) Mutual information.

Mutual information is a useful information measure in information theory, which represents the correlation between two sets of events. It can be used to measure the degree of interdependence between words when applied to text segmentation processing (Qian et al., 2020). In text analysis, the mutual information quantification formula can be expressed as:

$$\mathrm{MI}(x, y) = \log\left(\frac{p(x, y)}{p(x) \times p(y)}\right) \tag{1}$$

where $p(x)$ represents the probability of the character $x$ appearing in the text, $p(y)$ represents the probability of the character $y$ appearing in the text, and $p(x, y)$ represents the probability of the characters $x$ and $y$ appearing together in the text. The value of $\mathrm{MI}(x, y)$ indicates the degree of correlation between $x$ and $y$. When $\mathrm{MI}(x, y) > 0$, it means that $x$ and $y$ are related, and the larger the MI value, the greater the degree of correlation. When $\mathrm{MI}(x, y) = 0$, it means that $x$ and $y$ are independent of each other.

(2) Information entropy.

Information entropy is used to describe the degree of information confusion, also known as the degree of uncertainty, and the higher the information entropy, the higher the degree of information uncertainty. The concept of information entropy can be applied to measure the amount of information that unknown events may contain, which plays an important role in data storage and transmission in the field of NLP (Ruan and Wan, 2018). In text processing, the values of the left and right information entropy of a string reflect the uncertainty of the character and its left and right adjacent characters. Greater uncertainty means that the adjacent characters contain more information and have a higher probability of word formation. The candidate words can be expanded by the left and right information entropy, and the word boundaries can be determined to form words with complete meaning. The left and right information entropy can be expressed as:

Left information entropy:

$$E_L = -\sum_{W_l \in S_l} p(W_l \mid W) \log p(W_l \mid W) \tag{2}$$

Right information entropy:

$$E_R = -\sum_{W_r \in S_r} p(W_r \mid W) \log p(W_r \mid W) \tag{3}$$

where $S_l$ represents the set of left adjacent words of character $W$, $S_r$ represents the set of right adjacent words of character $W$, $p(W_l \mid W)$ denotes the conditional probability that $W_l$ is a left adjacency of character $W$, and $p(W_r \mid W)$ denotes the conditional probability that $W_r$ is a right adjacency of character $W$.

The phrase extraction technology combines the statistics of mutual information and information entropy, starting from the tightness of the internal combination of the word strings and the boundary measurement of the left and right adjacent words, and finally presents the word strings with high co-occurrence frequency information content in a complete form, so as to improve the word formation accuracy to optimize the word segmentation process.

3.1.2. Data processing of accident attribute list

After word segmentation based on mutual information and information entropy for the text content of accident cause descriptions, the statistical target domain words are extracted by setting the mutual information and the left and right information entropy thresholds, and the accident attribute list can be obtained according to the word frequency ranking results. Then, the high-frequency words are contextualized to form a list of accident attribute terms to achieve the purpose of effectively identifying valuable information from unstructured data. The identified construction collapse accident attribute factors and accident severity are data-processed, among which 49 accident attribute factors belonging to disordered categorical variables are transformed into data


information using the "0-1" notation. The three accident severity categories belong to unordered categorical variables, with "0", "1" and "2" denoting the general, large and major accidents, respectively, and the final 264 data records are collected to provide data support for risk assessment using machine learning algorithms.

3.1.3. Class imbalance processing based on oversampling technology

The classes of the analyzed datasets are highly unbalanced, and the number of large accident cases in construction collapse accidents is respectively twice and 11 times higher than that of the general and major accident cases, and the class imbalance will adversely affect the learning process of underrepresented classes. To solve the presence of class imbalance, undersampling and oversampling methods are often used to process the samples. Since the undersampling methods usually remove some samples in the majority class, resulting in the loss of some information, they also affect the performance of a study with a small initial sample size. Therefore, the oversampling methods of expanding the original data set are often used to overcome the class imbalance. Synthetic Minority Oversampling Technique (SMOTE) is a representative method of random oversampling, which is not direct reuse of minority class samples, but generates a specified oversampling rate based on the majority class sample dataset. The K-nearest neighbor (KNN) algorithm is then used to synthesize new samples to oversample the minority class, thereby reducing the possibility of model overfitting and improving the generalization performance of machine learning techniques in applications (Pan et al., 2020). As an effective preprocessing technique for handling imbalanced data, SMOTE has been widely used in many fields such as medical cancer prediction, software defect prediction, and corporate bankruptcy prediction (Feng et al., 2021).

3.1.4. Analysis of multicollinearity features

Multicollinearity refers to the presence of precisely or highly correlated relationships among the explanatory variables in a linear regression model, which makes the estimation results of the model distorted or inaccurate, thus affecting the predictive ability of the model (Assaf and Tsionas, 2021). When there is collinearity in the selected attribute features, the independent variables provide overlapping information and the insignificant independent variables need to be removed to reduce the model estimation bias caused by multicollinearity. To measure the presence of multicollinearity in the model feature variables, the Spearman correlation coefficients can be calculated to perform hierarchical clustering on rank-order correlations, and if the resulting correlation coefficient exceeds a threshold, one feature attribute is retained from each cluster. The Spearman correlation coefficient, also known as the rank correlation coefficient, does not focus on the distribution of the original variables, but only performs a linear correlation analysis on the ranking values of the two variables, reflecting the direction of the change and the strength of the correlation. The Spearman correlation coefficient is expressed as:

$$\rho_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \tag{4}$$

where $d_i$ represents the difference between the rank values of the i-th data pair, and $n$ represents the number of data.

3.2. Modeling of construction collapse accident prediction

Among the traditional machine learning techniques, the RF algorithm has demonstrated significant improvements in prediction model performance and the capacity to reduce overfitting (Lourenco et al., 2021; Poh et al., 2018). Additionally, the RF algorithm’s importance measure of variables can calculate and rank the importance of high-dimensional data features to construct a nonlinear ensemble learning model to obtain feature importance scores and obtain the key feature factors that affect the accuracy of the model, leading to the selection of the RF-based algorithm to create a construction collapse accident prediction model. The basic idea of the RF algorithm, which is an ensemble learning technique with the decision tree as the base learner, is to select and train each single decision tree using random samples and random features through bagging, and then combine the models of several simple classifiers to improve the overall classification effect. As a result, the RF algorithm has the properties of high prediction accuracy, fast training speed, and strong model generalization and interpretability (Ali et al., 2022).

In this study, the RF risk assessment model was constructed mainly from the collected data of 264 construction collapse accident reports, and some features were randomly selected from the extracted attribute feature items to form a decision tree based on the principle of minimizing the Gini coefficient, and the final prediction result was determined by plurality vote over the outputs of the m decision trees. The research data were divided into training and test data in the ratio of 8:2, and this division result had a crucial impact on the robustness of the assessment model.

3.3. Model optimization and performance evaluation

It is known from the construction of the RF algorithm that some attribute features will be chosen at random in the construction of the decision tree, and the critical attribute features can be identified by using the feature evaluation method, so that the feature reduction method can be utilized to improve the prediction performance of the algorithm. Therefore, this subsection creates two contexts of crucial feature attribute prediction and hyperparameter tweaking based on the original prediction model to maximize the effectiveness. The RF model needs to generalize well in order to reduce the issue of model underfitting or overfitting; K-fold cross-validation is used so that the model is fitted evenly across the data distribution, which helps in assessing the consistency level of the results for different random splits of the data, thus improving the precision of the proposed models (Arteaga et al., 2020).

3.3.1. Optimizing prediction models using features importance

The basic idea of using RF for feature importance assessment is to evaluate the contribution of each feature in the decision-making process, and the Gini index or out-of-bag (OOB) error is chosen as the quantitative index value to measure the contribution. The OOB error is the internal estimation method to monitor the prediction error of OOB after constructing a decision tree using randomly selected training samples, and then the value of OOB is calculated again after randomly permuting the variable observation values. The average value of the difference between the two OOB errors after standardization in all decision trees is the variable importance measure (VIM), which is used to measure the importance of RF attribute features (Janitza et al., 2016). The VIM value of the feature variable $X_j$ is expressed as follows:

$$\mathrm{VIM}(X_j) = \frac{1}{N_{tree}} \sum_{i} \left(\mathrm{ErrOOB}_i^{j} - \mathrm{ErrOOB}_i\right) \tag{5}$$

where $N_{tree}$ represents the number of decision trees, $\mathrm{ErrOOB}_i$ represents the number of error samples of out-of-bag data $\mathrm{OOB}_i$ that the i-th decision tree responds to, and $\mathrm{ErrOOB}_i^{j}$ represents the number of out-of-bag error samples obtained by recalculating after random replacement of feature variable $X_j$. The principle of this measurement process is to introduce random noise interference to the attribute features and measure the change in out-of-bag accuracy. If the change is significant, it means that the feature item has a greater impact on the sample classification results, that is to say, it is more important.

3.3.2. Hyperparameter tuning of RF classifiers

Hyperparameter optimization in ML algorithms aims to find the combination of hyperparameters that makes the algorithm perform the best on the validation dataset. Compared with other ML algorithms, RF has more hyperparameters, and the hyperparameters that need to be


optimized are the number of decision trees (n_estimators), the maximum number of features (max_features), the maximum depth of the decision tree (max_depth), the minimum samples of leaf nodes (min_samples_leaf) and the minimum samples of splits (min_samples_split). The number of decision trees determines the number of CART decision trees created. The maximum number of features limits the number of features considered when finding the best split point. The maximum depth can roughly adjust the structure of the tree, and more leaf nodes mean more deviations in the submodel. The minimum samples of leaf nodes and the minimum samples of splits can adjust the structure of the tree in more detail: the lower the number of samples at the leaf nodes or the lower the number required for splitting, the worse the model stability (Bergstra and Bengio, 2012). In view of the complex and diverse hyperparameters, the automatic parameter tuning method of random search (RandomizedSearchCV) is used to adjust the hyperparameters that affect the overall performance of the model, so as to control the complexity of the model and the time cost of learning. Unlike the grid search method of GridSearchCV, random search finds near-optimal hyperparameter combinations with fewer iterations by randomly sampling in the parameter space.

3.3.3. Performance evaluation

For the construction collapse accident severity prediction model, the three parameters of precision, recall and F1-score are used as data indicators to evaluate the model performance. In addition, two generalized performance evaluation tools, the confusion matrix and the ROC curve, are combined to achieve a complete evaluation of the model learning ability.

(1) Data indicators.

The formulae for calculating the precision, recall and F1-score are expressed as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$

where TP (True Positive) is the number of positive samples predicted to be positive, FP (False Positive) is the number of negative samples predicted to be positive, and FN (False Negative) is the number of positive samples predicted to be negative.

In terms of indicator meaning, precision is a measure of how accurate the positive predictions are, and recall is a measure of how many of the actual positives the model can identify. Additionally, precision and recall frequently exhibit a negative association, with an increase in one indicator sometimes being followed by a disproportionate decrease in the other (Diez-Pastor et al., 2021). The F1-score, the harmonic mean of precision and recall, provides a more thorough evaluation of the classifier's performance, with higher F1 values suggesting better classification outcomes.

(2) Confusion matrix.

The confusion matrix usually presents the prediction categories of the model on a set of test data with known true categories in the form of a table or graph, which can effectively demonstrate the classifier's accuracy in a visual manner. The number of correct predictions made by the classification model is represented by the confusion matrix's diagonal elements, while the number of incorrect predictions is represented by the matrix's non-diagonal elements, so comparing the predicted values in the confusion matrix with the true values can reveal the distribution of incorrect predictions in different classes.

(3) Receiver Operating Characteristic (ROC) curve.

The ROC curve is an important measurement tool for predictive analysis. After calculating the true positive rate (TPR) and the false positive rate (FPR), the characteristic curve is formed with the FPR as the horizontal coordinate and the TPR as the vertical coordinate. This calculation method takes into account the classification ability of the classifier for positive and negative cases at the same time in the plotting process, eliminating the effect of unbalanced sample categories on the classifier, and the classification accuracy can be easily and intuitively observed from the figure.

3.4. Identify critical attribute factors

Based on the optimization process of applying importance features to the RF prediction model, a feature importance ranking of attribute factors of construction collapse accidents is created, indicating the relative relevance of each attribute factor. The results of the feature importance ranking of the attribute factors can be used as a valuable reference for improving the prediction model, as well as to provide a clear causal analysis of the attribute factors that lead to the severity of accidents, analyze the process of workplace safety accidents, and provide a theoretical foundation for creating an early warning system based on hidden danger management and risk classification.

4. Experiments and results

In this section, the construction collapse accident database is applied to validate the proposed theoretical framework and evaluate the performance of the prediction model. According to China's regulations on reporting, investigation and handling of safety production accidents, when an accident occurs, the industry competent department and supervisory department form an expert investigation team to conduct a detailed investigation and form an authoritative investigation and handling report, making it possible to analyze the safety production situation from textual data. Considering the representativeness and comprehensiveness, this study uses the investigation reports of construction engineering collapse accidents in China from 2013 to 2021 as the database for accident severity risk assessment.

The data in this study were mainly obtained from the Ministry of Emergency Management, the State Administration of Work Safety, the Safety Management Network and the official websites of the municipal government (MOHURD, 2021; SAOWS, 2021; SMN, 2021), using the Python language to crawl the construction collapse accident reports from 2013 to 2021. Since the collected accident texts had problems such as incomplete content records and overly simple descriptions, it was necessary to manually remove invalid case texts. In addition, the number of particularly significant accident cases was too small (only 2 cases were collected), so the particularly significant accident cases were not counted in this accident severity risk assessment process to ensure the objectivity and standardization of the selected text data. Referring to the China 2021 version of the production safety accident report and investigation regulations, the accident level can be classified into four severity levels based on the number of deaths, injuries and economic losses. Finally, a total of 264 investigation reports on construction collapse accidents were collected, including 97 (36.74%) general accidents, 153 (57.95%) large accidents, and 14 (5.3%) major accidents. The text report mainly includes information on four aspects: accident process description, accident cause, accident responsibility determination, and preventive and corrective measures. This study focuses on the scientific and rational extraction of the attribute factors list of construction collapse accidents from the accident cause module.
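As a minimal sketch of the indicators in Eqs. (6)-(8), the per-class precision, recall and F1-score can be derived from a multi-class confusion matrix in a one-vs-rest fashion. The function name `per_class_metrics` and the 3 × 3 matrix below are hypothetical illustrations and are not values from this study.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class precision, recall and F1 from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows: true, columns: predicted).
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted as k, true class differs
        fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0   # Eq. (6)
        recall = tp / (tp + fn) if (tp + fn) else 0.0      # Eq. (7)
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0  # Eq. (8)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Hypothetical confusion matrix for three severity classes (illustrative only).
cm = [
    [8, 2, 0],
    [3, 6, 1],
    [0, 1, 9],
]
metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
macro_f1 = sum(m["f1"] for m in metrics) / len(metrics)
```

The diagonal entries supply TP for each class, and the off-diagonal entries in the same column or row supply FP and FN respectively, which is why the confusion matrix and the data indicators carry the same information at different levels of detail.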


4.1. Data pre-processing results

4.1.1. Text segmentation results based on mutual information and information entropy

Data preprocessing is performed using the Jieba toolkit in Python3. First, preliminary text segmentation is performed on the accident cause descriptions in the collected text reports. Since Chinese text preprocessing is slightly different from English text, the process does not require stemming extraction, word form reduction and case normalization, but requires the removal of punctuation, numbers, spaces, and stop words. Then, phrase extraction techniques based on mutual information and information entropy are applied to capture more complete attribute feature factors. A comparison of the text segmentation results based on phrase extraction technology and the TextRank algorithm is shown in Table 1.

It can be seen from the final text segmentation effect that the phrase extraction technology based on mutual information and information entropy can not only obtain more complete and clear semantic words, but also break away from the constraints brought by loading professional domain dictionaries, so as to provide a boost for more comprehensive mining of potential information in accident reports.

Table 1
Comparison of two text segmentation effects.

| Method | Display of partial keywords |
| --- | --- |
| TextRank | Program, construction unit, design, organization, regulations, inspection, measures, implementation, cause, qualification, structure, responsibility, technical, supervision, violation, supervision, fulfillment, violation, quality, responsibility, violation, hidden danger, violation |
| Phrase extraction | Construction plan, technical disclosure, supervision and inspection, safety management personnel, inadequate safety management, construction permit, safety education and training, weak safety awareness, hidden danger investigation, inadequate supervision, illegal construction, illegal command, illegal operation, supervisory personnel performance, subcontracting unit management, inadequate safety supervision, safety management system, protective measures, quality safety, safety technical specifications |

4.1.2. List of accident attribute factors

A list of attribute terms for construction collapse accidents is produced after contextualizing the text segmentation result. In the process of presenting the attribute list, the list of attributes is integrated into a unified terminology form taking into account the inconsistent expressions of the same accident attribute features in various accident reports, which results in the phenomenon of different terms expressing the same semantic information. For example, "inadequate supervision" and "supervisory personnel performance" can be combined into "inadequate supervisory performance". Regarding the system engineering theory, accident causation mechanism and comprehensive related research (Khalid et al., 2021), the identified attribute manifestations of construction collapse accidents are mapped to human factors (HF), facility factors (FF), environmental factors (EF) and management factors (MF) to obtain a list of 49 sub-categories of attribute factors, as shown in Fig. 2. The attribute factors list extracted and contextualized from the text reports of construction collapse accidents can more comprehensively and rationally describe the specific contexts leading to accidents of different severity, and identify the critical attribute factors that influence the accident severity outcome.

4.1.3. Processing of class imbalance data and analysis of multicollinearity features

The collected textual information structure needs to be transformed into a data structure to meet the data format requirements of the machine learning methods. It is noteworthy that each accident record is not only caused by one attribute factor, but breaks through the bottom line of the system under the combined action of different factors, resulting in different degrees of damage. Therefore, it is necessary to use a binary format to process the sample data, that is, if the accident corresponds to a specific attribute factor, the attribute factor will be assigned as "1", otherwise, it will be assigned as "0". The 264 construction collapse investigation reports are transformed into a series of array records. Then, to overcome the loss of prediction accuracy caused by the class imbalance problem, the SMOTE algorithm is used to synthesize general and major accident data in the imbalanced dataset, and a data set containing 459 accident records is obtained.

The Spearman rank-order correlations are hierarchically clustered using the scikit-learn module in order to better understand the multicollinearity features among the attribute factors in the dataset. The findings indicate that there is no substantial multicollinearity among the features, and the maximum Spearman correlation value of 0.4372 does not exceed the allowable threshold. The heatmap of the correlation coefficient of the features is plotted in Fig. 3. Therefore, it is confirmed that the linear relationship between the subsystem attribute factors is not significant, and the existence of slight multicollinearity does not affect the model performance.

4.2. Severity outcome prediction model performance evaluation and optimization

This study evaluates the performance and optimizes the results of the RF accident severity prediction model from three scenarios: (a) model training and testing accuracy using all attribute factors, (b) model training and testing accuracy combining critical attribute factors identified by feature importance, and (c) hyperparameter tuning of the developed RF model to further improve the model accuracy. In addition, all experimental tests are conducted in Python 3.9, mainly calling the packages of scikit-learn 1.0.2, pandas 1.3.4, NumPy 1.20.3 and joblib 1.0.1 for text preprocessing, performance evaluation, feature importance ranking and model optimization. SciPy 1.7.1 is used for oversampling in imbalanced learning, and the seaborn 0.11.2 package for the visualization of experimental results. For model experiments using all attributes, the main parameters are set to n_estimators = 20, oob_score = True and bootstrap = True. When selecting critical attribute factors, a conditional setting of feature importance in the top 80% is added. To prevent overfitting, 3-fold cross-validation is performed, and the performance indicators under the three experimental scenarios are finally obtained as shown in Table 2.

It can be seen from the evaluation matrix of the experimental results that the prediction accuracy of the model using the critical attribute factors (0.77) is slightly lower than that of using all attribute factors (0.80), and the model after hyperparameter optimization has the highest prediction accuracy (0.82). Therefore, the constructed RF classification model has good performance for identifying the severity of construction collapse accidents. In addition, the random search results illustrate that when the number of decision trees is 120, the maximum depth is 54 and the minimum number of node divisible samples is 5, the prediction model for the severity of construction collapse accidents has the highest out-of-bag accuracy, and the trained model reaches the optimal state.

The prediction results under the three experimental scenarios evaluated using the confusion matrix and ROC curves are shown in Figs. 4 and 5. Analysis of the confusion matrix shows that (1) the RF algorithm has the best classification effect for the major accidents. Misclassifications of the major accidents are few in all three scenarios, and the general accidents are easily misclassified as the large accidents. (2) In the experimental scenario using all attribute factors, most of the misclassified large accidents are wrongly classified as general accidents, but most of the misclassified large accidents are classified as major accidents after using the critical attribute factors. (3) The layout of the comprehensive confusion matrix shows that the classification results of construction collapse accident severity are more concentrated on the diagonal line,


which indicates the rationality and validity of the prediction model. In addition, the area enclosed by the ROC curve and the coordinate axis under the three experimental scenarios once again proves that the prediction model after hyperparameter optimization has the highest accuracy (curve_3), the prediction model using all attribute factors has an average accuracy (curve_1), and the prediction model using critical attribute factors has the lowest accuracy (curve_2).

Fig. 2. List of attribute factors to construction collapse accident.

Table 2
Evaluation matrix under the three experimental scenarios.

| Experiment | Accuracy | Severity | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| a | 0.80 | 0 | 0.69 | 0.83 | 0.75 |
| a | 0.80 | 1 | 0.74 | 0.65 | 0.69 |
| a | 0.80 | 2 | 1.00 | 0.94 | 0.97 |
| a | 0.80 | Mean | 0.81 | 0.80 | 0.80 |
| b | 0.77 | 0 | 0.71 | 0.76 | 0.73 |
| b | 0.77 | 1 | 0.68 | 0.68 | 0.68 |
| b | 0.77 | 2 | 0.93 | 0.88 | 0.90 |
| b | 0.77 | Mean | 0.78 | 0.77 | 0.77 |
| c | 0.82 | 0 | 0.75 | 0.83 | 0.79 |
| c | 0.82 | 1 | 0.73 | 0.71 | 0.72 |
| c | 0.82 | 2 | 0.97 | 0.91 | 0.94 |
| c | 0.82 | Mean | 0.82 | 0.81 | 0.81 |

Fig. 3. Spearman correlation coefficient heat map of the features.
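The experimental setup of Section 4.2 can be sketched roughly as follows, assuming scikit-learn and NumPy are installed. The baseline settings (n_estimators = 20, oob_score = True, bootstrap = True, 3-fold cross-validation) follow the text; the synthetic binary matrix `X`, the labels `y`, the candidate ranges in `param_distributions` and `n_iter=10` are assumptions made for illustration and do not reproduce the study's data or search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

# Synthetic stand-in for the binary attribute matrix (NOT the paper's data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 12))
# Hypothetical three-level severity label driven by the first three attributes.
y = np.digitize(X[:, 0] + X[:, 1] + X[:, 2], bins=[1, 3])

# Scenario (a): baseline RF with the settings quoted in the text.
baseline = RandomForestClassifier(
    n_estimators=20, oob_score=True, bootstrap=True, random_state=0
)
baseline.fit(X, y)
baseline_oob = baseline.oob_score_                        # out-of-bag accuracy
baseline_cv = cross_val_score(baseline, X, y, cv=3).mean()  # 3-fold CV accuracy

# Scenario (c): random search over the five hyperparameters named in Section 3.3.2.
# The candidate values below are assumed, not taken from the paper.
param_distributions = {
    "n_estimators": list(range(20, 201, 20)),
    "max_features": ["sqrt", "log2", None],
    "max_depth": [None, 5, 10, 20, 50],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=0),
    param_distributions=param_distributions,
    n_iter=10,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
best_cv_accuracy = search.best_score_
```

Random search evaluates only `n_iter` sampled combinations rather than the full grid, which is the trade-off between search cost and coverage that the text attributes to RandomizedSearchCV.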


Fig. 4. Confusion matrix for accident severity prediction under three experimental scenarios.

4.3. Attribute importance evaluation and factor analysis

To measure the influence of each attribute factor on the prediction performance of the RF model, the feature variables in the out-of-bag sample are ranked in terms of VIM scores based on the average difference in OOB errors after noise interference, and the feature importance ranking results of the attribute factors are drawn as shown in Fig. 6. From the calculation results, the three elements that have the biggest effects on the severity of construction collapse accidents are Project management neglects safety (MF3), Failure to implement zonal safety supervision responsibility (MF2) and Worker without certificate (HF12). Previous literature has described the significant contribution of managerial attention and competence to on-site safety production (Gunderson and Gloeckner, 2011), and governmental supervision of construction safety has also been proved to be the best strategy to improve the safety production situation (Chen et al., 2021). The survey results of construction companies show that the most important factors affecting risk assessment are the work experience and educational background of the workers (Moshood et al., 2020). In addition, the findings of this study reconfirm that the primary contributing variables to accidents in the construction industry are employee skill levels and regulatory attention. However, neither the quality of equipment and material nor geological environmental factors have a substantial negative impact on the severity of construction collapse accidents. The possible explanation for this phenomenon is that building construction involves less intricate, risky underground work than subway construction and is less influenced by the external environment of hydrology, geology, and adjacent structures. As a result, collapse accidents involving building construction are less likely to be related to environmental factors, as well as equipment and material conditions (Zhou et al., 2022).

The severity of construction collapse accidents is found to be strongly correlated with construction plan design and review, operator competency development and education, contractor qualification review, and emergency planning and management, according to the results of the characteristic importance ranking of the attribute factors. Notably, as one of the innovative ways to integrate employee health and safety into the entire project life cycle upfront, Design for Safety (DfS) has been widely used as a key prevention strategy to reduce or eliminate occupational injuries in various geographical settings, and some scholars are gradually focusing on the DfS legislative framework, implementation


barriers, designer competencies, and stakeholders' knowledge, attitudes and practice (Goh et al., 2017; Ibrahim et al., 2021). However, few of the accident text reports collected involved the attributes of architects and engineers who designed the project functions and solutions, indicating that China, as one of the developing countries, has not given enough attention to the application of DfS to improve OSH performance and needs to provide more in-depth insights in the areas of DfS competency development and education, knowledge and attitudes of key stakeholders, and the tracing of accident design factors.
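The VIM ranking used in Section 4.3 rests on Eq. (5): for each tree, count the out-of-bag errors before and after randomly permuting one feature, then average the difference over trees. A minimal pure-Python sketch of that computation is given below. The toy predictor `first_feature_tree`, the hand-picked `oob_sets` and the 16-row binary dataset are hypothetical stand-ins for fitted CART trees and their real out-of-bag samples.

```python
import random
from typing import Callable, List, Sequence

Row = List[int]


def error_count(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> int:
    """Number of misclassified samples."""
    return sum(1 for row, label in zip(X, y) if predict(row) != label)


def vim(trees, oob_sets, X, y, j: int, rng: random.Random) -> float:
    """Eq. (5): mean over trees of (OOB errors with feature j permuted) - (OOB errors)."""
    total = 0
    for predict, oob in zip(trees, oob_sets):
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        base_errors = error_count(predict, X_oob, y_oob)
        # Introduce noise by shuffling column j among the OOB rows only.
        column = [row[j] for row in X_oob]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, column)]
        total += error_count(predict, X_perm, y_oob) - base_errors
    return total / len(trees)


def first_feature_tree(row: Row) -> int:
    """Hypothetical 'tree' that predicts the label from feature 0 only."""
    return row[0]


# Toy binary attribute records; the label equals feature 0 by construction.
X = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(16)]
y = [row[0] for row in X]
trees = [first_feature_tree, first_feature_tree]
oob_sets = [list(range(0, 8)), list(range(8, 16))]  # assumed OOB indices per tree

rng = random.Random(0)
scores = [vim(trees, oob_sets, X, y, j, rng) for j in range(3)]
ranking = sorted(range(3), key=lambda j: scores[j], reverse=True)
```

Features that the predictors never consult (here features 1 and 2) receive a score of exactly zero, since permuting them cannot change any prediction, while a feature the model relies on can only keep or increase the OOB error count.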

Fig. 5. ROC evaluation curves of the prediction model under three experimental scenarios.

Fig. 6. Feature importance ranking.

5. Discussion

The introduction of AI technology into the workflow enables the design of statistical models and the construction of decision support systems based on a data-driven approach that not only captures the full potential value contained in the data, but also drives structural transformation based on simulated complex scenarios. Applied to OSH management, this approach can improve workplace safety performance. In order to investigate the validity and reliability of the application of artificial intelligence in occupational injury research, this paper proposes the use of the RF machine learning algorithm to analyze the attribute factors obtained from unstructured accident texts to achieve effective prediction of the severity of


occupational accidents. The main goals of the current research are (i) to enhance the process of data preprocessing for text reports, not only by using phrase extraction techniques in place of loading domain dictionaries to enhance text segmentation, but also by utilizing SMOTE to address the issue of class imbalance and achieve the automated mining of valuable attribute factors. Furthermore, (ii) the RF algorithm is applied to the preprocessed attribute components for accident severity prediction, and feature importance measurements and hyperparameter tuning are used to optimize the accident severity prediction model. Finally, (iii) the critical attribute factors influencing occupational safety accidents are determined using the findings of the feature importance ranking. In summary, the present study is expected to hold good potential to contribute both in theoretical and practical aspects.

5.1. Theoretical contributions

The main goal of text-based historical accident information research is to extract potential information from text data, which is often combined with text mining and ML algorithms for automated information extraction and scientific decision support. However, the problems of inaccurate word segmentation and class imbalance in the process of processing unstructured text restrict the in-depth study of text analysis tasks, so the two statistical concepts of mutual information and information entropy are combined to improve the accuracy of text word segmentation. With oversampling technology used to solve the problem of class imbalance, the whole process of data preprocessing is applicable to other fields. In addition, the high performance shown by the prediction model indicates that occupational injury severity is not purely random, but rather follows some intrinsic rules that can be explored by ML methods to capture accident attribute features.

5.2. Practical contributions

One of the best ways to understand the nature of a hazard event occurrence in order to improve organizational or industrial safety performance is through safety risk analysis and assessment. In this study, a combination of systematic data preprocessing and ML algorithms can realize the prediction of the severity of construction collapse accidents, which can provide decision support to decision-makers for project risk level assessment and proactive prevention strategy research. It is worth noting that the prediction process in this study can be used as an important supplement and reference for risk assessment in the construction industry, but the prediction outcomes cannot be taken as precise forecasts. Additionally, as opposed to the earlier emphasis on human unsafe behavior and natural environmental factors, the critical accident attribute factors calculated by the RF algorithm should be paid more attention. The analysis results show that safety design and emergency planning significantly affect safety performance. Among them, the risk assessment in the design stage can be used as the source of hazard prevention, while the emergency plan can be used as a barrier to reduce accident losses and become an effective means to promote the improvement of the risk management capability of construction projects.

6. Conclusion

Narrative accident texts have been widely used to trace the causal factors of accidents and their process evolution mechanisms. However, manually analyzing and sharing massive amounts of text incident reports is a very time-consuming and error-prone process. In order to automate the textual data processing and optimize the risk assessment of occupational production processes, the primary purpose of this research is to enhance the data pre-processing procedure and optimize the RF-based risk assessment model to systematically predict the seriousness of occupational accidents. This paper proposes a framework for predicting the severity of occupational injury accidents by combining automated text preprocessing and the RF machine learning algorithm to build a mapping relationship between feature attributes and accident severity. As an experimental case, the study utilizes text data from the third-most serious construction collapse accident among occupational injuries to validate the performance of the prediction model and identify critical attribute elements impacting occupational safety based on the feature importance ranking results.

The experimental results show that (1) compared with the TextRank algorithm based on graph networks, the application of phrase extraction technology based on mutual information and information entropy to preprocess textual accident reports can automatically identify the segmentation results that contain more valuable and complete information, and lay the foundation for obtaining a term list containing 49 attribute factors of construction collapse accidents after contextualized processing of textual segmentation results. (2) The RF algorithm-based construction collapse accident severity prediction models performed well in all three scenarios, with 77%, 80% and 82% accuracy for the prediction models using some critical attribute features, using all attribute features and using hyperparameter optimization, respectively, which not only confirmed the applicability and accuracy of RF in occupational injury accident prediction, but also demonstrated that process optimization can effectively improve the recognition accuracy of the prediction model. (3) The feature importance measurement results indicate that the emphasis placed on safety by project managers is the most influential attribute factor on the severity of construction collapse accidents; in addition, government regulation, operator qualification, DfS and emergency response are also influential factors. There is an urgent need for project teams and governments to uphold the concept of collaborative governance, learn from the lessons of workplace accident cases, continuously update the safety management strategy, and strictly implement the safety production risk classification and control, so as to provide valuable references for decision-making to ensure OSH.

Finally, since the proposed prediction model framework based on systematic data preprocessing and the RF algorithm has been validated only in construction collapse accident text reports, its application in other fields needs further validation. To confirm the relevance and efficacy of the model of this research framework, further work can begin with workplace safety and productivity risk assessment in other industries or other accident types in the construction industry that do not include collapse accident types. Additionally, the list of accident attributes can be expanded by merging multi-sourced data to identify more accurate rules indicating the relationship between attribute factors and accident severity because different data sources include different valuable information.

CRediT authorship contribution statement

Xixi Luo: Writing – original draft. Xinchun Li: Writing – review & editing, Funding acquisition. Yang Miang Goh: Writing – review & editing. Xuefeng Song: Writing – review & editing. Quanlong Liu: Data curation.

Data availability

Data will be made available on request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Acknowledgement

This study is funded by the Fundamental Research Funds for the Central Universities (Grant no. 2020ZDPYSK02).

References

Adaku, E., Ankrah, N.A., Ndekugri, I.E., 2021. Design for occupational safety and health: a theoretical framework for organisational capability. Saf. Sci. 133, 105005.
Ali, N.F.M., Sadullah, A.F.M., Abdul, A.P., Razman, M.A.M., Musa, R.M., 2022. The identification of significant features towards travel mode choice and its prediction via optimised random forest classifier: an evaluation for active commuting behavior. J. Transp. Health 25, 101362.
Antwi-Afari, M.F., Li, H., Anwer, S., Yevu, S.K., Wu, Z.Z., 2020. Quantifying workers' gait patterns to identify safety hazards in construction using a wearable insole pressure system. Saf. Sci. 129, 104855.
Arteaga, C., Paz, A., Park, J.W., 2020. Injury severity on traffic crashes: a text mining with an interpretable machine-learning approach. Saf. Sci. 132, 104988.
Assaf, A.G., Tsionas, M., 2021. A bayesian solution to multicollinearity through unobserved common factors. Tour. Manag. 84, 104277.
Baker, H., Hallowell, M.R., Tixier, J.P., 2020. Ai-based prediction of independent construction safety outcomes from universal attributes. Autom. Constr. 118, 103146.
Beckert, A.N., Barros, V.G., 2022. Waste management, COVID-19 and occupational safety and health: challenges, insights and evidence. Sci. Total Environ. 831, 154862.
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13 (1), 281–305.
Chebila, M., 2020. Predicting the consequences of accidents involving dangerous substances using machine learning. Ecotoxicol. Environ. Saf. 208, 111470.
Chen, X., Xiang, L., Ren, Y., Cui, J., 2021. A study on the benefit distribution of multi-level safety supervision for construction projects based on long-term cooperation. J. Saf. Environ. 3, 1151–1157. https://doi.org/10.13637/j.issn.1009-6094.2020.0092.
Diez-Pastor, J.F., Veiga, F., Bustillo, A., 2021. High-accuracy classification of thread quality in tapping processes with ensembles of classifiers for imbalanced learning. Measurement 168 (15), 108328.
Feng, S., Keung, J., Yu, X., Xiao, Y., Zhang, M., 2021. Investigation on the stability of smote-based oversampling techniques in software defect prediction. Inf. Softw. Technol. 139 (6), 106662.
Goh, Y.M., Guo, B.H.W., Toh, Y.Z., 2017. Knowledge, attitude, and practice of design for safety: multiple stakeholders in the singapore construction industry. J. Constr. Eng. Manag. 143 (5), 4016131.
Lu, J., Su, W., Jiang, M.Z., Ji, Y., 2022. Severity prediction and risk assessment for non-traditional safety events in sea lanes based on a random forest approach. Ocean Coast. Manag. 225, 106202.
Lu, J.L., 2022. Mining safety and health in the philippines: occupational and environmental impacts. Safety and Health at Work 13 (Jan.), 142.
Ma, G., Wu, Z., Jia, J., Shang, S., 2021. Safety risk factors comprehensive analysis for construction project: combined cascading effect and machine learning approach. Saf. Sci. 143, 105410.
Ministry of Housing and Urban-Rural Development (MOHURD), 2021. <https://www.mohurd.gov.cn/ess/> (Accessed: Sep. 20th, 2021).
Moshood, T.D., Adeleke, A.Q., Nawanir, G., Mahmud, F., 2020. Ranking of human factors affecting contractors' risk attitudes in the malaysian construction industry. Soc. Sci. Human. Open 2, 1–17.
Mutlu, N.G., Altuntas, S., 2019. Risk analysis for occupational safety and health in the textile industry: Integration of FMEA, FTA, and BIFPET methods. Int. J. Ind. Ergon. 72 (Jul.), 222–240.
Pan, T.T., Zhao, J.H., Wu, W., Yang, J., 2020. Learning imbalanced datasets based on SMOTE and Gaussian distribution. Inf. Sci. 512, 1214–1233.
Parinaz, J., Hattab, M.A., Emad, M., 2021. Automated extraction and time-cost prediction of contractual reporting requirements in construction using natural language processing and simulation. Appl. Sci. 11 (13), 6188.
Poh, C.Q.X., Ubeynarayana, C.U., Goh, Y.M., 2018. Safety leading indicators for construction sites: a machine learning approach. Autom. Constr. 93 (Sep), 375–386.
Qian, W., Huang, J., Wang, Y., Shu, W., 2020. Mutual information-based label distribution feature selection for multi-label learning. Knowl.-Based Syst. 195 (5), 105684.
Ruan, X.C., Wan, D.S., 2018. An information entropy-based data preprocessing technique. Microelectron. Comput. 35 (2), 5.
Safety Management Network (SMN), 2021. <https://www.safehoo.com/Manage/> (Accessed: Sep. 22nd, 2021).
Sarkar, S., Maiti, J., 2020. Machine learning in occupational accident analysis: a review using science mapping approach with citation network analysis. Saf. Sci. 131, 104900.
Song, L., Yang, L., 2021. Governance innovation of occupational safety and health in China in the context of Industry 4.0. Ind. Safety Environ. Protect. 47 (9), 79–82. https://doi.org/10.3969/j.issn.1001-425X.2021.09.020.
State Administration of Work Safety (SAOWS), 2021. <https://www.mem.gov.cn/was5/web/> (Accessed: Sep. 20th, 2021).
Stemn, E., Krampah, F., 2022. Injury severity and influence factors in surface mines: a correspondence analysis. Saf. Sci. 145, 105495.
Tamascelli, N., Solini, R., Paltrinieri, N., Cozzani, V., 2022. Learning from major accidents: a machine learning approach. Comput. Chem. Eng. 162, 107786.
Thorvaldsen, T., Kongsvik, T., Holmen, I.M., Storkersen, K., Salomonsen, C., 2020.
Goh, Y.M., Ubeynarayana, C.U., Wong, K.L.X., Guo, B.H., 2018. Factors influencing Occupational health, safety and work environments in Norwegian fish farming -
unsafe behaviors: a supervised learning approach. Accid. Anal. Prev. 118, 77–85. employee perspective. Aquaculture 524 (15), 735238.
Gunderson, D.E., Gloeckner, D., 2011. Superintendent competencies and attributes Tixier, J.P., Hallowell, M.R., Rajagopalan, B., Bowman, D., 2016. Automated content
required for success: a national study comparing construction professionals’ analysis for construction safety: a natural language processing system to extract
opinions. Int. J. Constr. Educ. Res. 7 (4), 294–311. precursors and outcomes from unstructured injury reports. Autom. Constr. 62 (Feb.),
Guzman, J., Recoco, G.A., Pandi, A.W., Padrones, J.M., Ignacio, J.J., 2022. Evaluating 45–56.
workplace safety in the oil and gas industry during the COVID-19 pandemic using Wang, L., Yang, Z., 2018. Bayesian network modelling and analysis of accident severity
occupational health and safety Vulnerability Measure and partial least square in waterborne transportation: a case study in china. Reliab. Eng. Syst. Saf. 180
Structural Equation Modelling. Cleaner. Eng. Technol. 6 (Feb.), 100378. (DEC.), 277–289.
Hingorani, R., Tanner, P., Prieto, M., Lara, C., 2020. Consequence classes and associated Xu, N., Ma, L., Liu, Q., Wang, L., Deng, Y.L., 2021. An improved text mining approach to
models for predicting loss of life in collapse of building structures. Struct. Saf. 85, extract safety risk factors from construction accident reports. Saf. Sci. 138 (8),
101910. 105216.
Ibrahim, C.K.I.C., Belayutham, S., Mohammad, M.Z., 2021. Prevention through design Xu, Q.W., Chong, H.Y., Liao, P.C., 2019. Exploring eye-tracking searching strategies for
(ptd) education for future civil engineers in malaysia: current state, challenges, and construction hazard recognition in a laboratory scene. Saf. Sci. 120, 824–832.
way forward. J. Civil Eng. Educ. 147 (1), 05020007. Yan, X., Xiong, X., Cheng, X., 2021. HMM-BiMM: hidden markov model-based word
International Labor Organization (ILO), 2021, Cases of Fatal Occupational Injury by segmentation via improved bi-directional maximal matching algorithm. Comput.
Economic Activity. <https://www.ilo.org/shinyapps/bulkexplorer42/?lang=en&se Electr. Eng. 94, 107354.
gment=indicator&id=INJ_FATL_ECO_NB_A> (April 10, 2022). Yang, K., Ahn, C.R., 2019. Inferring workplace safety hazards from the spatial patterns of
Janitza, S., Tutz, G., Boulesteix, A.L., 2016. Random forest for ordinal responses: workers’ wearable data. Adv. Eng. Inf. 41, 100924.
prediction and variable selection. Comput. Stat. Data Anal. 96, 57–73. Zhang, F., 2019. A hybrid structured deep neural network with word2vec for
Kang, K., Ryu, H., 2019. Predicting types of occupational accidents at construction sites construction accident causes classification. Int. J. Constr. Manag. 4, 1–21.
in Korea using random forest model. Saf. Sci. 120, 226–236. Zhang, Y., Li, Y., Wang, R., Lu, J., Ma, X., Qiu, M., 2020. PSAC: proactive sequence-aware
Khalid, U., Sagoo, A., Benachir, M., 2021. Safety management system (SMS) framework content caching via deep learning at the network edge. IEEE Trans Netw Sci Eng 7
development – mitigating the critical safety factors affecting health and safety (4), 2145–2154.
performance in construction projects. Saf. Sci. 143, 105402. Zhou, Z.P., Goh, Y.M., Shi, Q.Q., Qi, H.N., Liu, S., 2022. Data-driven determination of
Liu, R., Liu, Z., Liu, H.C., Shi, H., 2021. An improved alternative queuing method for collapse accident patterns for the mitigation of safety risks at metro construction
occupational health and safety risk assessment and its application to construction sites. Tunn. Undergr. Space Technol. 127, 104616.
excavation. Autom. Constr. 126 (Jun.), 103672. Zorzenon, R., Lizarelli, F.L., Moura, D.B., 2022. What is the potential impact of industry
Lourenco, P., Godinho, S., Sousa, A., Goncalves, A.C., 2021. Estimating tree aboveground 4.0 on health and safety at work? Saf. Sci. 153, 105802.
biomass using multispectral satellite-based data in mediterranean agroforestry Zou, Y., Kiviniemi, A., Jones, S.W., 2017. Retrieving similar cases for construction
system using random forest algorithm. Rem. Sens. Appl.: Soc. Environ. 23, 100560. project risk management using natural language processing techniques. Autom.
Constr. 80, 66–76.
