https://www.emerald.com/insight/2046-6099.htm
Smart and Sustainable Built Environment
Unravelling incipient accidents: a machine learning prediction of incident risks in highway operations
Received 17 August 2024; Revised 15 September 2024; Accepted 18 September 2024
Loretta Bortey
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK
David J. Edwards
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK and
Faculty of Engineering and the Built Environment, University of Johannesburg,
Johannesburg, South Africa
Chris Roberts
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK and
Quantity Surveying Department, Nelson Mandela University – South Campus,
Port Elizabeth, South Africa, and
Iain Rillie
Highways England Company Limited Birmingham, Birmingham, UK and
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK
Abstract
Purpose – Safety research has focused on drivers, pedestrians and vehicles, with scarce attention given to highway
traffic officers (HTOs). This paper develops a robust prediction model which enables highway safety authorities to
predict exclusive incidents occurring on the highway such as incursions and environmental hazards, respond
effectively to diverse safety risk incident scenarios and aid in timely safety precautions to minimise HTO incidents.
Design/methodology/approach – Using data from a highway incident database, a supervised machine learning
method that employs three algorithms [namely Support Vector Machine (SVM), Random Forests (RF) and Naïve
Bayes (NB)] was applied, and their performances were comparatively analysed. Three data balancing algorithms were
also applied to handle the class imbalance challenge. A five-phase sequential method, which includes (1) data
collection, (2) data pre-processing, (3) model selection, (4) data balancing and (5) model evaluation, was implemented.
Findings – The findings indicate that SVM with a polynomial kernel combined with the Synthetic Minority
Over-sampling Technique (SMOTE) algorithm is the best model to predict the various incidents, and the
Random Under-sampling (RU) algorithm was the most inefficient in improving model accuracy. Weather/
visibility, age range and location were the most significant factors in predicting highway incidents.
Originality/value – This is the first study to develop a prediction model for HTOs and utilise an incident
database solely dedicated to HTOs to forecast various incident outcomes in highway operations. The prediction
model will provide evidence-based information to safety officers to train HTOs on impending risks predicted by
the model thereby equipping workers with resilient shocks such as awareness, anticipation and flexibility.
Keywords Sustainable development goal 8.8, Safety risk, Predictive modelling, Machine learning,
Government policy
Paper type Research paper
The authors wish to thank National Highways (a UK Government company reporting to the
Department for Transport) for funding and supporting this research.
© Emerald Publishing Limited, 2046-6099
DOI 10.1108/SASBE-08-2024-0316
1. Introduction
In the ever-evolving landscape of road and transportation operations, ensuring the safety
and well-being of highway workers is paramount (Eseonu et al., 2018). Given the complex
nature of operations conducted on the highway and the close proximity to live traffic,
highway workers are prone to encounter various health and safety (H&S) risks, which could
result in the occurrence of incidents, accidents or injuries (Strain et al., 2022). Highway traffic
incidents, e.g. incursions and the emission of hazardous substances (Rogovski et al., 2021),
necessitate prompt and precise reactions from all personnel responsible for highway safety.
Over the last decade in the UK, 12 road employees (including two highway traffic officers
(HTOs)) have died on the road network, according to National Highways (formerly
Highways England, a UK Government company serving the UK Government’s Department
for Transport), with three workers being run over by drivers (GOV.UK, 2017).
To augment safety, judicious identification of event and/or incident categories likely to
occur on the highway is critical for deploying suitable resources and optimising traffic flow
(Eseonu et al., 2018). It is imperative that corporate workplace H&S teams can recognise and
understand possible safety incident infractions that could be detrimental to workplace safety
before highway personnel are deployed to work zones (Goh and Soon, 2014). For HTOs,
traditional statistical methods are currently being used to better understand and mitigate
risks after incidents have occurred, making it a reactive approach to safety risk management
(Bortey et al., 2021). However, traditional statistical methods of incident classification are
incapable of handling the voluminous incident data generated and/or the complexities of
modern highway operations (George and Hautier, 2021). Ajayi et al. (2020) espoused that
machine learning (ML) provides new tools for overcoming issues that conventional statistical
methods are inadequate for. For example, ML algorithms can capture the nuances of the
multifaceted and uncertain nature of incidents (Zhang et al., 2019; Eslamirad et al., 2020). The
innate ability of ML to discern intricate patterns when processing large volumes of data is a
desirable quality that makes it preferable for the present study (Dai et al., 2019; Fernando
et al., 2024). Therefore, the research question for this present research is as follows: “Can ML
accurately predict the risk of incipient accidents occurring involving HTOs?”
Through rigorous experimentation and analysis of primary data, this paper develops a
robust predictive framework which will enable highway authorities to prioritise resources,
respond effectively to the diverse safety scenarios and enhance proof-driven safety
precautions to minimise HTO incidents. Such a framework will be particularly useful during
safety risk assessment meetings conducted prior to HTOs resuming operations on the
highway (Hegde and Rokseth, 2020). Safety officers can then educate and train HTOs on
impending risks predicted by the model, thereby equipping workers with resilient shocks
such as awareness of risks (Tong et al., 2015; Shirali et al., 2018), anticipation of the risks and
response and flexibility to make important safety decisions (Woods and Wreathall, 2003).
Three ML algorithms were explored, namely Random Forests (RF), Naïve Bayes (NB) and
Support Vector Machine (SVM) classifiers, by leveraging their inherent strengths to enhance
the validity, veracity and efficiency of incident classification (Tixier et al., 2016; Zhang et al.,
2019; Koc et al., 2022). Associated objectives are threefold, namely to (1) present a
comparative analysis of SVM kernels by evaluating how well they performed using metrics
such as precision, accuracy score, recall and F1-score; (2) apply original data to three ML
algorithms (i.e. SVM, RF and NB) and evaluate these based on a comparative analysis of the
overall accuracy score of each model and the precision, recall and F1-score of each class for
the target variable and (3) apply balancing algorithms to the original data and compare their
performance to that of the unbalanced (UB) data. This study departs from existing literature
which only focuses on specific incidents such as accidents, injuries or their severity (Cheng
et al., 2012; Chiang et al., 2018). Instead, it assesses the feasibility of employing different ML
models to predict several safety risk incidents likely to occur during highway operations.
This is the first study which utilises an incident database solely dedicated to highway
workers to forecast incident outcomes using myriad variables and focuses on predictive
models tailored for HTOs.
Table 1.
ML model categories covered: regression, classification, clustering, reinforcement and association.

SRIM area: Preventive measures
Objective: The goal is to find and apply measures to prevent safety occurrences. Hazard identification, safety protocols, safety training, equipment maintenance and engineering controls are some of the issues that may be covered. The purpose is to proactively reduce the likelihood of incidents occurring
References: Tixier et al. (2016), Zhang et al. (2019), Ajayi et al. (2020), Uma and Eswari (2022), Oyedele et al. (2021)
ML models: DNN, NB, RF, DT, MARS, KNN, SGTB and GLM

SRIM area: Incident investigation and analysis
Objective: This field delves into methods that can be applied for incident investigation and analysis. It aims to understand what went wrong and why; triggers, contributing factors and root causes of safety events are investigated. The findings can be used to prevent similar situations in the future
References: Mirabadi and Sharifian (2010), Goh and Chua (2013), Jocelyn et al. (2018), Hegde and Rokseth (2020), Weng et al. (2016)
ML models: LAD, DNN, CART, GRI and Apriori

SRIM area: Emergency response and management
Objective: This area explores emergency response strategies, processes and approaches. It involves the study of the effectiveness of reaction teams, evacuation plans, communication protocols and resource coordination during and after safety crises
References: Zagorecki et al. (2013), Lopez et al. (2018)
ML models: ANN, SVM, DT, KNN, K-means and RLM

SRIM area: Regulatory compliance and standards
Objective: This section investigates compliance with safety legislation, standards and best practices. It probes whether organisations and sectors follow predefined safety guidelines and assesses the efficiency of current regulations in decreasing safety risks
References: Rachman and Ratnayake (2019)
ML models: KNN, SVM, GBDT, RF and AB

SRIM area: Human factors and behavioural aspects
Objective: Human behaviour and variables such as training, motivation and communication frequently influence incidents pertaining to safety. This area therefore examines the impact of human factors in safety events and investigates methods to improve human behaviour and safety decision-making
References: Patel and Jha (2015), Goh et al. (2018)
ML models: SVM, KNN, RF, ANN, LR and NB

Source(s): Table by authors
regression and genetic algorithm analysis to predict occupational risk in the ship-building
industry. Similarly, classification and regression trees have been utilised in exploring
contributing factors to occupational risk in the Taiwan construction industry (Cheng et al.,
2012). Another limitation of past ML models utilised resides in the usage of perception type
data (often gathered from questionnaires or interviews), which may not necessarily reflect
reality (cf. Abbasianjahromi and Aghakarimi, 2023; Guillén Perales et al., 2024). Table 1
details research undertaken with ML models in the safety risk management areas, and
notably, classification is the most common approach to safety risk prediction, albeit most
studies combined different ML models.
3. Methodology
A positivist philosophical stance (Park et al., 2020) couched within an abductive approach
(Posillico, 2023) was adopted for this study. Within contemporary safety management
literature, such an approach has been extensively used to, for example, build an ML
predictive model based on national data for fatal accidents of construction workers (Choi
et al., 2020) and measure hand-arm vibration exposure risk in the UK utilities sector
(Edwards et al., 2020). Although it is through scientific means that positivist philosophies can
be verified, combining positivist philosophies with an abductive approach allows room for
elements of uncertainty which are presented as “most likely” or “best case scenario” (Sober,
2013). Such elements of uncertainty could be encountered in stochastic prediction models
developed in this present study, which justifies the adoption of abductive reasoning
(Kläs and Vollmer, 2018). Moreover, abduction has been extensively used in a plethora of ML
research (Kläs and Vollmer, 2018; Crowder et al., 2020), thus adding further validity to this
choice.
To predict incident risks, ML algorithms were developed using data from past incidents
and events recorded in incident reports contained within the UK’s highways accident report
database. The data consist of variables representing safety indicators (Bayramova et al.,
2023, 2024), which are combined and entered into ML algorithms to enable pre-emptive
prediction of incidents, thereby improving decision-making when a highway task or
operation is going to be executed.
A sequential explanatory mixed method, which employs both qualitative and
quantitative data (Roberts et al., 2021), was adopted for this study, with quantitative
data predominating in the analysis conducted. This sequential method comprises five
phases, namely, (1) data collection, (2) data pre-processing, (3) model selection, (4) data
balancing and (5) model evaluation (refer to Figure 1). The data collection phase identified 22
independent variables such as region, site and/or project, location, weather and/or visibility
and season. The dependent variable, “event type”, includes nine different classes,
namely, personal illness/injury (PI), undesirable circumstance/near miss (UN), security
(SC), environmental (EN), infrastructure (IF), facilities/site (FS), structural safety (SS), utility
strike (US) and incursion/impact protection vehicle (IPV) strike (IS) (refer to Table 2). The data were then
cleansed as part of the data pre-processing phase to improve data accuracy and quality.
Investigations carried out during the model selection phase were analysis of the correlation of
feature variables and analysis of feature importance using RF. These analyses were carried
out to ascertain which independent variables were most influential in the performance of the
classification model. The data balancing phase also evaluated and compared data balancing
algorithms and their impact on model training. The final model evaluation stage involved a
performance evaluation and comparison of the performance of the three ML models
employed.
Figure 1. The sequential method adopted
Table 2. Description of target variable classes

Target variable: Personal illness/injury (PI)
Description: Situation where an HTO experiences a medical issue or sustains an injury while on duty
Examples: Fell, slipped or tripped from height or the same level, attacked by an animal, collision, hit by a moving object/plant/vehicle or falling object

Target variable: Undesirable circumstance/near miss (UN)
Description: Events that have the potential to cause harm but are narrowly avoided or stopped before any adverse consequences occur
Examples: Security threat, technology failure, contact with hazard, slips, trips and falls without injuries

Target variable: Security (SC)
Description: Events that compromise the safety and well-being of highway traffic officers due to intentional or malicious actions
Examples: Intimidating behaviour, physical assault, racial abuse, verbal abuse or insult, object(s) thrown at road worker, vehicle driven at road worker

Target variable: Environmental (EN)
Description: Events that affect the natural surroundings or the external conditions in which HTOs operate
Examples: Disturbance of natural site/ecology, heritage/archaeology, land contamination, nuisance (noise, light, odour, vibration, dust, steam), spill, leak or uncontrolled discharge, weather

Target variable: Infrastructure (IF)
Description: Incidents related to the physical structures, technology and components of the transportation system
Examples: Failure or damage of technology, communication and signals, hard shoulder misuse, cone strike, live carriageway crossing

Target variable: Facilities/site (FS)
Description: Hazards associated with the specific location or site where highway traffic officers are stationed
Examples: Cable management, car park, cleaning, fire evacuation, grounds maintenance, temperature, housekeeping, pest control and waste management

Target variable: Structural safety (SS)
Description: Risks associated with the integrity of buildings, bridges, or other structures in the highway environment
Examples: Collision with superstructure, substructure, parapet or vehicle containment barrier, bridge – fire, flood, scour, bridge component – steel failure, corrosion, component loss, concrete deterioration/damage, post tensioning

Target variable: Utility strike (US)
Description: Incidents where underground utilities, such as gas, water, or electrical lines, are accidentally damaged or struck during highway activities
Examples: Utility/service strike: CCTV, electricity, gas, oil, drainage, telecom, other cables or pipelines, water

Target variable: Incursion/IPV strike (IS)
Description: Events where unauthorised vehicles enter restricted areas or where highway traffic officers’ vehicles are struck
Examples: Incursion: intentional (because of breakdown, breach of rolling roadblock, to seek benefit, to seek information); blue light incursion; incursion: unintentional (driver confused, follow-in incursion, result of accident); IPV strike

Source(s): Table by authors
Figure 2. The data pre-processing procedure
Figure 3. Flowchart of model building experiment
generalisation performance or the anticipated classification error for unknown test samples
is the fundamental training premise for SVMs (Bhavsar and Panchal, 2012).
In SVM, the number of features that exist in the dataset determines the dimensional space
available for plotting individual observations. For example, if there are F features in
the dataset, then each observation is plotted as a point in an F-dimensional space (Bhavsar
and Panchal, 2012). The hyperplane in SVM serves as a decision boundary,
which distinguishes the two classes in the feature space (Mun et al., 2017). A data point
located on either side of the hyperplane can be recognised as belonging to a distinct class (Ding et al., 2014).
The data points nearest to the hyperplane are referred to as the support vectors, which
influence the orientation and positioning of the hyperplane (Ding et al., 2014). If a straight
line can be used to categorise the data into two sets, then the task is termed a linear SVM
problem. According to the original formulation for SVM (Cortes and Vapnik, 1995), the
hyperplane can be expressed as follows:
$m^{T} x - c = 0$ (1)
However, this present research presents a multi-classification task whereby the target
variable has nine classes; hence, the use of a plane rather than a line is more appropriate
(Mun et al., 2017). SVMs were initially intended for binary classification; however, complex
distributed real-world data cannot be distinguished using traditional linear SVMs
(Tixier et al., 2016). SVM therefore uses the kernel technique to generalise it to a non-
linear hyperplane (Guo et al., 2021). The resultant algorithm is similar, but each scalar
product is changed to a non-linear kernel function (Cortes and Vapnik, 1995).
3.3.1.1 The kernel technique. The kernel algorithm is a mathematical technique that
enables SVM to classify an ensemble of initially one-dimensional data in a “two-dimensional”
manner (Guo et al., 2021). The four kernel functions adopted for this study, given a kernel
“W,” are:
(1) Polynomial kernel
Expressed as:
$W(v_i, v_j) = (\beta v_i^{T} v_j + 1)^{d}, \quad \beta > 0$ (3)
where $W(v_i, v_j)$ is the value of the kernel function for two input data points $v_i$ and $v_j$, $\beta$ is the
constant that determines the scaling of the dot product between $v_i$ and $v_j$ and $d$ is the degree of the
kernel function and an adjustable parameter which makes the kernel flexible.
(2) Linear kernel
Expressed as:
$W(v_i, v_j) = v_i^{T} v_j$ (4)
(3) Gaussian radial basis function (RBF) kernel
Expressed as:
$W(v_i, v_j) = \exp\left(-\beta \lVert v_i - v_j \rVert^{2}\right)$ (5)
where $\lVert v_i - v_j \rVert$ is the Euclidean distance between vectors $v_i$ and $v_j$ and $\beta$ is the positive
constant which controls the shape and reach of the RBF kernel function.
(4) Sigmoid kernel function
Expressed as:
$W(v_i, v_j) = \tanh\left(\beta v_i^{T} v_j + c\right)$ (6)
where $\beta$ is the adjustable parameter that establishes the weight or relevance of the dot
product of the input vectors $v_i$ and $v_j$.
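The four kernel functions listed above can be written out directly. The following is a minimal sketch in plain Python rather than the implementation used in the study; the function names, the default parameter values and the two example vectors are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    # Plain dot product of the two input vectors
    return dot(u, v)

def polynomial_kernel(u, v, beta=1.0, degree=3):
    # (beta * u.v + 1) raised to the kernel degree; beta and degree are tunable
    return (beta * dot(u, v) + 1.0) ** degree

def rbf_kernel(u, v, beta=0.5):
    # exp(-beta * squared Euclidean distance); beta must be positive
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-beta * squared_distance)

def sigmoid_kernel(u, v, beta=0.1, c=0.0):
    # tanh(beta * u.v + c)
    return math.tanh(beta * dot(u, v) + c)

# Hypothetical two-feature observations, used only to exercise the functions
v_i = [1.0, 2.0]
v_j = [3.0, 0.5]
for name, kernel in [("linear", linear_kernel), ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel), ("sigmoid", sigmoid_kernel)]:
    print(name, kernel(v_i, v_j))
```

Each function returns a similarity value for a pair of observations. An SVM library would evaluate the chosen kernel for pairs of training points instead of mapping the data to a higher-dimensional space explicitly.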
3.3.2 Random forest classifier. RF is an ML classification method that comprises an
ensemble of uncorrelated decision trees (forest of trees), which collectively determine how
new objects are classified (Paul et al., 2018). Each individual decision tree in the
random forest has a vote and can make a prediction on classes (Rigatti, 2017). The class with a
majority of votes becomes the model’s prediction.
For the nine-class, multi-class problem in this study, the equation for an RF model can be
represented as follows:
$C_i = \mathrm{Mode}\{f_1(x_i), f_2(x_i), \ldots, f_N(x_i)\}$ (7)
where $C_i$ is the predicted class for the i-th sample; N is the number of trees in the random
forest; $f_n(x_i)$ is the prediction of the n-th decision tree for the i-th sample and the Mode
function determines the predicted class that occurs most frequently across all
decision trees.
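The voting step can be illustrated in a few lines. This is a sketch of the Mode aggregation only, assuming the per-tree predictions are already available; the function name and the five example votes are hypothetical, and tree training is not shown.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted most often across the trees.

    If two labels tie, Counter.most_common keeps the one counted first.
    """
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]

# Hypothetical votes from a five-tree forest for a single incident record
votes = ["UN", "PI", "UN", "IS", "UN"]
predicted = majority_vote(votes)
print(predicted)
```

Library implementations may aggregate differently (for example, by averaging per-tree class probabilities), so this hard-vote version should be read as the simplest instance of the idea.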
3.3.3 Gaussian NB. NB classifiers are a family of simple probabilistic classifiers that apply
Bayes’ theorem with a strong (naïve) assumption of independence between the features (Li et al., 2022).
To train the model, the class (prior) probability and the conditional probabilities of
each feature are calculated for each class, under the NB hypothesis that the
features are conditionally independent given the class label (Farid et al., 2014). Using
Bayes’ theorem and the conditional probabilities computed during training, the likelihood
that a data point belongs to each of the nine classes is calculated, and the class with the highest
calculated probability becomes the predicted class.
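The training and prediction steps described above can be sketched as follows. This is a simplified illustration and not the study's implementation: it assumes numeric features, uses only two classes, and the toy data, function names and the small variance floor are assumptions introduced here.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero when a feature is constant
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (n / len(y), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            # Features are treated as independent, so log likelihoods add up
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical numeric features for two of the incident classes
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["PI", "PI", "PI", "UN", "UN", "UN"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]), predict_gaussian_nb(model, [5.1, 5.9]))
```

Working in log space keeps the product of many small probabilities numerically stable; the predicted class is unchanged because the logarithm is monotonic.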
Figure 4. Histogram showing the percentage of samples per class (unbalanced dataset)
the training sets while maintaining its statistical properties, so none of the statistical
properties were lost.
RO is the process of augmenting training data with multiple replications of some minority
class samples (Shelke et al., 2017). SMOTE is an over-sampling approach
that creates artificial samples for the minority class of a UB dataset (Chawla et al., 2002), while RU is a
non-heuristic approach that attempts to balance the class distribution by
removing instances at random from the majority class (Estabrooks and Japkowicz,
2001). Although SMOTE and RO are both over-sampling techniques, in contrast to RO, the
SMOTE algorithm oversamples the minority class by creating artificial cases as opposed
to oversampling with replacement (Fernández et al., 2018). The
SMOTE algorithm generates these synthetic instances in the feature space rather than the data space (Blagus
and Lusa, 2013).
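The difference between the three balancing strategies can be made concrete with a small sketch. It is not the code used in the study: the function names, the toy two-feature samples, the fixed random seed and the neighbour count k are assumptions, and the SMOTE-style function handles numeric features only.

```python
import random

def random_oversample(samples, target_size, rng):
    """RO: replicate existing minority samples (sampling with replacement)."""
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return samples + extra

def random_undersample(samples, target_size, rng):
    """RU: keep a random subset of the majority class."""
    return rng.sample(samples, target_size)

def smote_like(samples, target_size, rng, k=2):
    """SMOTE-style: create new points between a sample and a near neighbour."""
    synthetic = []
    while len(samples) + len(synthetic) < target_size:
        base = rng.choice(samples)
        # k nearest neighbours of the chosen sample by squared Euclidean distance
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment joining the two points
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return samples + synthetic

rng = random.Random(0)
minority = [[1.0, 1.0], [1.5, 1.2], [0.8, 1.4]]
majority = [[float(i), float(i + 1)] for i in range(10)]

balanced_ro = random_oversample(minority, len(majority), rng)
balanced_ru = random_undersample(majority, len(minority), rng)
balanced_sm = smote_like(minority, len(majority), rng)
print(len(balanced_ro), len(balanced_ru), len(balanced_sm))
```

RO only repeats points that already exist, RU discards majority information, and the SMOTE-style version adds new points that lie between existing minority samples in feature space.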
Recall: evaluates the predictive ability of the model for positive samples. It is the proportion
of actual positive cases that were correctly categorised as positive.
$\mathrm{recall} = \frac{TP}{TP + FN}$ (9)
Specificity: measures the proportion of actual negative instances correctly predicted as
negative.
$\mathrm{specificity} = \frac{TN}{TN + FP}$ (10)
F1-score: it is the harmonic mean of recall and precision. It offers a balanced evaluation of
precision and recall, which is particularly valuable when classes are imbalanced.
$\mathrm{F1\text{-}score} = \frac{2(\mathrm{precision} \times \mathrm{recall})}{\mathrm{precision} + \mathrm{recall}}$ (11)
Accuracy score: measures the proportion of accurate predictions (both TPs and TNs) out of all
instances.
$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (12)
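As a worked example, the metrics can be computed from the four confusion-matrix counts for a single class treated one-vs-rest. The sketch below uses the standard definitions, with precision taken as TP / (TP + FP) and accuracy as (TP + TN) divided by all four counts; the function name and the counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute per-class metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts for one class: 40 TP, 10 FP, 45 TN, 5 FN
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The guards against zero denominators matter for rare classes, where a model may never predict the class at all and precision would otherwise be undefined.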
4. Analysis results
The results obtained for the predictive modelling experiments performed indicated a
potential feasibility of applying ML models in accurately predicting incidents.
Figure 5. Feature importance analysis with RF
balanced dataset were created utilising three distinct resampling strategies to solve the
imbalanced dataset problem. The totals from all employed datasets and their distribution
among various incident type classes are shown in Table 4. Evidently, all the methods were
successful in balancing the data because they all introduced diversity by creating new
instances, thereby potentially improving the model’s generalisation ability (Sarkar
et al., 2020).
Table 4. Distribution of classes in target variable
Dataset             Algorithm  Total     Samples in each of the nine classes
Original dataset    N/A        38,509    34,491  1,188   933     545     389     312     265     203     183
Synthetic sampling  SMOTE      310,419   34,491  34,491  34,491  34,491  34,491  34,491  34,491  34,491  34,491
Under-sampling      RU         1,647     183     183     183     183     183     183     183     183     183
Oversampling        RO         310,419   34,491  34,491  34,491  34,491  34,491  34,491  34,491  34,491  34,491
Source(s): Table by authors
Table 5. Comparative analysis of SVM kernel performance
Kernel      Metric                  EN    FS    IS    IF    PI    SC    SS    UN    US
Polynomial  Precision               0.24  0.37  0.79  0.50  1.00  0.57  0.27  0.99  0.38
Polynomial  Recall                  0.09  0.42  0.74  0.69  1.00  0.15  0.04  1.00  0.19
Polynomial  F1-score                0.13  0.39  0.77  0.58  1.00  0.24  0.07  1.00  0.26
Polynomial  Overall accuracy score  0.96
RBF         Precision               0.33  0.34  0.83  0.44  1.00  0.00  0.00  0.98  0.29
RBF         Recall                  0.03  0.39  0.68  0.67  1.00  0.00  0.00  1.00  0.02
RBF         F1-score                0.05  0.37  0.74  0.53  1.00  0.00  0.00  0.99  0.04
RBF         Overall accuracy score  0.95
Linear      Precision               0.00  0.30  0.80  0.40  1.00  1.00  0.00  0.99  0.00
Linear      Recall                  0.00  0.28  0.71  0.72  1.00  0.01  0.00  1.00  0.00
Linear      F1-score                0.00  0.29  0.75  0.52  1.00  0.02  0.00  0.99  0.00
Linear      Overall accuracy score  0.95
Sigmoid     Precision               0.00  0.00  0.38  0.01  0.79  0.00  0.00  0.93  0.00
Sigmoid     Recall                  0.00  0.00  0.12  0.00  0.82  0.00  0.00  1.00  0.00
Sigmoid     F1-score                0.00  0.00  0.18  0.00  0.80  0.00  0.00  0.96  0.00
Sigmoid     Overall accuracy score  0.92
Source(s): Table by authors
with the imbalanced data are detailed in Table 6 and Figure 6 for visual comparison. The
different balancing algorithms were then applied separately to the data and then analysed
using the three different classification algorithms separately. Results for the precision of each
target class have been presented in Table 6, and the percentage accuracy score has been
presented for comparison in Figure 6. Table 6 shows the comparative results of all these
algorithms on the UB dataset and the three balanced datasets (SM, RO and RU). For the UB
dataset, it is observed that RF (97%) performs better than the others based on the
accuracy score metric, followed by SVM (96%) and NB (94%).
The high accuracy scores for the models indicate good performance.
Nevertheless, although accuracy score is mostly used to judge model performance, it can
be misleading when classes are imbalanced (Sarkar et al., 2020). The individual classes
are therefore evaluated based on the precision, recall and F1-score to ascertain how each class
performed in the model. The balancing algorithms are then applied to the dataset for further
analysis (refer to Figure 7).
It is observed that although the accuracy score decreased for SVM (82%), the individual
classes were classified much more accurately when the SM algorithm
is applied. A model which has a performance of 70% and above is considered a good
performing model (Zhang et al., 2019); therefore, an accuracy score of 82% is satisfactory.
Some classes still performed poorly with the NB and RF classifiers when the SM balancing
algorithm was applied.
5. Discussion
The number of features, attributes or variables used for input in ML is referred to as its
dimensionality (Jia et al., 2022). Therefore, dimensionality reduction is essentially the
technique of decreasing the number of variables in a dataset while retaining as much
variance as possible in the original dataset (Zebari et al., 2020). Several ML studies have
leveraged the proficiency of dimensionality reduction in handling multicollinearity, reducing
training time and preventing overfitting (Huang et al., 2019; Hasan and Abdulazeez, 2021).
Table 6. ... and balancing algorithms (per model, values in % for each target class EN, FS, IS, IF, PI, SC, SS, UN and US under UB, SM, RO and RU)
Figure 6. Accuracy score comparison for imbalanced data (accuracy score (%) by ML algorithm: NB, 94; RF, 97; SVM, 96)
Source(s): Figure by authors
Figure 7. Accuracy scores for the three models (SVM, RF and NB) with the unbalanced and balanced data (x-axis: balancing algorithms UB, SM, RO and RU; y-axis: accuracy score (%))
Source(s): Figure by authors
However, while the literature is replete with several techniques that can be employed in
dimensionality reduction, such as factor analysis (Huang et al., 2019; Sing et al., 2022) and
principal component analysis (Hasan and Abdulazeez, 2021), there is no specific technique
recommended for a particular ML task. For instance, Zhang et al. (2012) used Pearson
correlation to examine the relationship between categorical and continuous auxiliary
variables of soil organic matter. The variables selected for the model building include “type_
of_work”, “year”, “day_of_week”, “location”, “age range”, “weather/visibility”, “month”,
“season”, “time of day”, “site/project”, “part of body affected”, “experience in current role”,
“injury type”, “region” and “vehicles involved.” Hence, the dimensionality of the dataset was
reduced from 22 to 16. Given the novelty of this present research study, it is impractical to
compare these results against other similar studies conducted (cf. Tixier et al., 2016; Kidando
et al., 2021). That said, this work now provides a precedent for other follow-on studies to
challenge the findings reported herein and improve the inherent performance of future
models developed. Moreover, identification of the selected variables acts as a first point
towards developing practical solutions to reducing risk. It should be emphasised that a
model per se will not eliminate risks and accidents but information contained within provides
the first step towards developing a human-centric knowledge management organisation
within National Highways. As additional and new data are collected, further refinements can
be made to ensure that knowledge is distributed throughout the organisation for the
betterment of all within the workforce. Indeed, promoting a safe and secure working
environment for all workers is an important sustainable development goal (SDG).
To investigate the significance of the variables selected, the RF method was used to probe
which variables had the strongest GI (Nembrini et al., 2018). Notably, the variables selected were
confirmed as important variables by the RF method. Also, the variables found to be
significant in predicting safety incidents indicate that it is imprudent and insufficient to
rely solely on safety-related data when predicting safety incident types. Rather,
project-related data (such as the project location and site/project) and demographic data (such as age of
workers) could have a substantial impact on prediction outcomes.
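Assuming GI here refers to Gini-based importance, the quantity behind it is the decrease in Gini impurity achieved when a tree node is split on a variable. The sketch below shows that calculation for a single split; the function names and the toy labels are hypothetical, and a full RF importance score would accumulate such decreases over every node and tree in the forest.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children

# Hypothetical node holding two PI and two UN incidents
parent = ["PI", "PI", "UN", "UN"]
informative = impurity_decrease(parent, ["PI", "PI"], ["UN", "UN"])
uninformative = impurity_decrease(parent, ["PI", "UN"], ["PI", "UN"])
print(informative, uninformative)
```

A variable whose splits separate the classes cleanly yields large decreases and so ranks as important, whereas a variable whose splits leave the class mix unchanged contributes nothing.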
The challenge with imbalanced data is that when a model is trained on imbalanced data, it
learns that it can obtain high accuracy by constantly predicting the majority class,
irrespective of whether recognising the minority class is equally or more significant when
applied to a real-world scenario (Sarkar et al., 2020). The three balancing algorithms, namely
SMOTE, RU and RO were therefore applied to each of the three ML models used in the
experiment (SVM, NB and RF) separately. This resulted in having the same number of data
entries for each class. Hence, there was no majority and minority class, which could cause
bias in prediction.
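The scale of this effect can be checked with simple arithmetic on the class counts of the original dataset reported in Table 4. The sketch below considers a hypothetical baseline that always predicts the largest class; the variable names are illustrative and the class labels attached to each count are not needed for the calculation.

```python
# Class counts of the original (unbalanced) dataset, as listed in Table 4
class_counts = [34491, 1188, 933, 545, 389, 312, 265, 203, 183]
total = sum(class_counts)

# A model that always predicts the largest class is right for every majority sample
majority_accuracy = max(class_counts) / total

# Recall is 1 for the majority class and 0 for each of the other eight classes
per_class_recall = [1.0 if count == max(class_counts) else 0.0 for count in class_counts]
macro_recall = sum(per_class_recall) / len(per_class_recall)

print(f"total samples: {total}")
print(f"majority-class accuracy: {majority_accuracy:.3f}")
print(f"macro-averaged recall: {macro_recall:.3f}")
```

Such a baseline reaches roughly 90% accuracy while recognising none of the eight smaller classes, which is why per-class precision and recall are examined alongside the accuracy score.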
In the experiments to explore the performance of the ML classifiers when certain
balancing algorithms are applied, it is observed that, with the imbalanced dataset, for all
three algorithms, only the classes IS, PI and UN could be correctly classified, while the rest of
the classes performed poorly and failed to classify any of the instances correctly. This could
be attributed to these three classes being part of the top four classes with the majority of data.
Sarkar et al. (2020) explain how a model constantly exhibits bias towards majority classes
for imbalanced datasets, leading to poor predictions. This bias is evident in the results
presented by the original imbalanced dataset for all the three classes.
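The majority-class bias described here can be reproduced with a toy example. The sketch below assumes a hypothetical test set of 95 "IS" and 5 "EN" records and a degenerate classifier that always predicts "IS"; the function name `per_class_report` is illustrative.

```python
def per_class_report(y_true, y_pred):
    """Return overall accuracy and {class: (precision, recall)}."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        true_pos = sum(t == cls and p == cls for t, p in pairs)
        predicted = sum(p == cls for p in y_pred)
        actual = sum(t == cls for t in y_true)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        report[cls] = (precision, recall)
    return accuracy, report

# Hypothetical imbalanced test set: 95 majority-class and 5 minority-class records.
y_true = ["IS"] * 95 + ["EN"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["IS"] * 100

accuracy, report = per_class_report(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for cls, (precision, recall) in report.items():
    print(f"{cls}: precision = {precision:.2f}, recall = {recall:.2f}")
```

The overall accuracy is high even though no minority-class record is ever identified, which is why per-class precision and recall are the more informative measures for this kind of data.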
When the SMOTE algorithm was applied to all three classifiers, it was observed that the
SVM classifier achieved a precision of 70% or above for every class present. For a model to
be deemed useful, it must have a minimum precision between 70 and 80% (Juba and Le,
2019); hence, the SVM + SMOTE algorithm can be considered a useful model for
predicting incident types. However, it is worth noting that the accuracy of the
SVM + SMOTE algorithm was 82%, which is less than the accuracy score of the SVM
algorithm alone. The SVM classifier with the SMOTE balancing algorithm outperformed
SVM with the other balancing algorithms (i.e. RO and RU), with 72% being the lowest precision recorded
for a class (0). This result is a clear indication that accuracy score alone is not a good enough
metric to evaluate imbalanced classes (McNee et al., 2006). For RF + SMOTE, although the
accuracy score was 97% (the same as with the original data), some of the classes could not be
correctly classified. Some classes such as “EN” had precision as low as 38%, which is
considered very poor performance. NB + SMOTE did not show much promise either: the
accuracy score was 93%, which indicated good performance, but the precision and recall of
the individual classes were very poor (the lowest precision and recall being
0%). Therefore, in the SMOTE experiment, SVM is considered the best classifier for
incident types.
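The selection logic in this comparison can be made explicit with the figures quoted above. The sketch below is illustrative only: the dictionary layout, the function name `useful_models` and the use of the lower bound (0.70) of the cited 70 to 80% band as the cut-off are assumptions made for this example.

```python
# Accuracy and lowest per-class precision quoted in the text for the SMOTE experiment.
smote_results = {
    "SVM + SMOTE": {"accuracy": 0.82, "lowest_precision": 0.72},
    "RF + SMOTE": {"accuracy": 0.97, "lowest_precision": 0.38},
    "NB + SMOTE": {"accuracy": 0.93, "lowest_precision": 0.00},
}

# Assumed cut-off: lower bound of the 70-80% band attributed to Juba and Le (2019).
USEFUL_PRECISION = 0.70

def useful_models(results, threshold=USEFUL_PRECISION):
    """Keep models whose weakest class still reaches the precision threshold."""
    return [name for name, scores in results.items()
            if scores["lowest_precision"] >= threshold]

best_by_accuracy = max(smote_results, key=lambda name: smote_results[name]["accuracy"])
best_by_weakest_class = max(
    smote_results, key=lambda name: smote_results[name]["lowest_precision"]
)

print("Highest accuracy:", best_by_accuracy)
print("Best weakest-class precision:", best_by_weakest_class)
print("Meets usefulness threshold:", useful_models(smote_results))
```

Ranking by accuracy and ranking by weakest-class precision pick different models here, which is the practical reason for not evaluating imbalanced classes on accuracy alone.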
Premised upon the analysis of the experimental results, it can be concluded that
SVM + SMOTE is the best-performing model among those tested and should therefore be
considered a prime ML model for predicting various incident types in highway operations.
These findings demonstrate the model’s ability to generalise: for a given set of inputs, the
model could reasonably predict the dependent variables that identify events in highway
operations. The result of this model indicates the possibility of predicting incidents even with
imbalanced datasets. Zhang et al. (2015) performed a similar study using SVM to forecast
the profitability of a project with an imbalanced dataset, which resulted in an accuracy score
between 0.74 and 0.91. However, all predicted values were from the majority class,
a challenge that the proposed model in this study remedies. Augmenting SVM with
balancing algorithms such as SMOTE helps to ensure that the individual predicted values of a
class are a true reflection of the high accuracy score a model presents. Furthermore, this
study demonstrates that incidents are not random events; rather, there
are underlying trends or patterns, based on features found to be significant in predicting incident
outcomes, that could indicate an incident occurrence in time.
5.2 Proposed user interface for the model and practical implications
Given the complexity of the model developed and the large dataset involved, this study proposes
the future development of web-based graphical user interface (GUI) software for safety
officers working on highway projects. Such software will inspire knowledge-based decision-making
backed by evidence and encourage the emergence of a learning organisation. The
software will enable users to enter input variables relating to the project operation and then
send instructions to the ML model to forecast the type(s) of incident likely to occur during
that operation (refer to Figure 8). Hence, safety officers can prioritise H&S risk elements
based on their probability of occurrence. Consequently, proper consideration is given to these
risk variables while limiting occurrences in order to deliver a safer environment. Currently,
the process of HTO risk assessment relies heavily on the subjectivity of human judgement,
perception and lived experience when an incident is to be predicted (Hegde and
Rokseth, 2020). The model presents an efficient approach to discovering meaningful patterns
and trends that cause the various types of incidents, offering an objective alternative for
predicting and visualising incident scenarios.
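One way the proposed interface could turn model output into priorities is to rank incident types by predicted probability. The sketch below is hypothetical throughout: the function name `rank_incident_risks`, the probability values and the choice to show the top two classes are assumptions for illustration, not part of the proposed software.

```python
def rank_incident_risks(class_probabilities, top_n=3):
    """Sort incident classes from most to least likely and keep the first top_n."""
    ranked = sorted(class_probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Hypothetical probabilities returned by a classifier for one planned operation.
predicted = {"IS": 0.55, "PI": 0.25, "UN": 0.15, "EN": 0.05}

priorities = rank_incident_risks(predicted, top_n=2)
for incident_type, probability in priorities:
    print(f"{incident_type}: {probability:.0%}")
```

A ranked list of this kind gives safety officers a simple basis for deciding which risk elements to address first for a given operation.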
A key implication of this study is the opportunity it offers safety officers to train HTOs and
educate them on impending risks, provide needed resources (e.g. personal protective
equipment) and pre-emptively address those risks before they materialise. The model also
unearths important attributes or indicators that safety officers should consider when
implementing safety risk prevention strategies. The GUI also allows organisations to collect
new data, which would help them address the challenge of insufficient data. Data collected
can be analysed for deeper insights into why incidents occur, how they occur and what can be
done to prevent them. Furthermore, a major implication of this study is the
resilience that this model brings to organisations. Some indicators of resilience according to
Chen et al. (2018) include awareness, anticipation, management commitment, response and
reporting culture. This model makes all these measures attainable, i.e. predicting
incidents beforehand will allow workers to be aware of impending risks and anticipate the
incident, thereby providing enough time to respond with proactive measures
to pre-empt it. Entering data variables through the GUI instead of reporting to management
directly also promotes reporting culture, which will encourage employees to record
indicators without fear of being sanctioned. This is because the required data variables do
not have any personal identification indicators.
Figure 8. Proposed highway safety prediction model
6. Conclusion
HTOs face an omnipresent risk of being involved in incidents that could endanger
their lives and prevent them from returning home safely after work. For highway authorities
to establish an efficient and reliable solution for these incidents, the probability of their
occurrence must be predicted accurately as a first step towards implementing risk mitigation
measures. Increasingly, Industry 4.0 solutions are being employed
to identify, report and ultimately mitigate the risks posed (Newman et al.,
2021). This is because the traditional statistical methods currently used to mitigate risk by UK
highway authorities are largely reactive. ML
algorithms provide an effective technique for analysing complex data, identifying the
nuances of highly complicated patterns in the data and making accurate predictions based on
these patterns. This in turn promotes safe and secure working environments for all workers
as required by SDG 8.8, thus contributing significantly to societal impact.
Therefore, by comparatively analysing the performance of three ML algorithms and three
balancing algorithms, this study developed a novel model for classifying incident risks in
highway operations. This study has shown that year, weather, age, type of work, vehicles
and project location play an essential role in predicting incident types. In this study, 16
of the 22 variables presented were retained after correlation analysis was used
for dimensionality reduction. Ten of the 16 variables were found to have a high
contribution to predicting incident types. Among the SVM kernels tested, the polynomial
kernel had the best performance.
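Correlation-based dimensionality reduction of the kind summarised here can be sketched as follows. This is a generic illustration, not the study's procedure: the 0.9 threshold, the column `years_of_service` and all values are hypothetical, and the rule of keeping the first of each highly correlated pair is an assumption.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.9):
    """Keep columns in order, skipping any that is strongly correlated
    (absolute r at or above threshold) with a column already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept

# Hypothetical numeric variables for five records.
columns = {
    "age": [25, 32, 41, 50, 58],
    "years_of_service": [2, 9, 18, 27, 35],  # a linear function of age in this toy data
    "vehicles": [3, 1, 4, 1, 5],
}

retained = drop_correlated(columns)
print("Retained variables:", retained)
```

Here the redundant column is dropped because it adds no information beyond `age`, while the weakly correlated column is retained.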
The SVM classifier with the SMOTE algorithm demonstrated the best performance for each
class of the target variable compared to the other algorithms. The RF algorithm combined with
the RO algorithm also performed well but did not surpass SVM. The predictive
model developed in this study has shown sufficiently accurate results, which can reliably support
the prevention of safety incidents in highway operations. Therefore, practical implications can be
drawn from the results of this study. Namely, the predictive model can augment the efforts of
safety officers by establishing which incidents need more resources and attention at a
particular time for effective decision-making. Preventing incidents will also enable highway
operations to run within budget, encourage timeliness and prevent waste of resources. Above
all, avoiding such incidents could save lives. Knowledge acquired from modelling will also
embed learning within a knowledge management-enabled organisation that uses introspection
(of good and poor practices) to drive improvements forward (cf. Sarvari et al., 2024).
References
Abbasianjahromi, H. and Aghakarimi, M. (2023), “Safety performance prediction and modification
strategies for construction projects via machine learning techniques”, Engineering Construction
and Architectural Management, Vol. 30 No. 3, pp. 1146-1164, doi: 10.1108/ECAM-04-2021-0303.
Ajayi, A., Oyedele, L., Owolabi, H., Akinade, O., Bilal, M., Davila Delgado, J.M. and Akanbi, L. (2020),
“Deep learning models for health and safety risk prediction in power infrastructure projects”,
Risk Analysis, Vol. 40 No. 10, pp. 2019-2039, doi: 10.1111/risa.13425.
Alizadeh, S.S., Mortazavi, S.B. and Mehdi Sepehri, M. (2015), “Assessment of accident severity in the
construction industry using the Bayesian theorem”, International Journal of Occupational
Safety and Ergonomics, Vol. 21 No. 4, pp. 551-557, doi: 10.1080/10803548.2015.1095546.
Bayramova, A., Edwards, D.J., Roberts, C. and Rillie, I. (2023), “Enhanced safety in complex socio-technical
systems via safety-in-cohesion”, Safety Science, Vol. 164, 106176, doi: 10.1016/j.ssci.2023.106176.
Bayramova, A., Edwards, D.J., Roberts, C. and Rillie, I. (2024), “Unravelling the Gordian knot of
leading indicators”, Safety Science, Vol. 177, 106603, doi: 10.1016/j.ssci.2024.106603.
Bhavsar, H. and Panchal, M.H. (2012), “A review on support vector machine for data classification”,
International Journal of Advanced Research in Computer Engineering and Technology
(IJARCET), Vol. 1 No. 10, pp. 185-189, ISSN: 2278 – 1323.
Blagus, R. and Lusa, L. (2013), “SMOTE for high-dimensional class-imbalanced data”, BMC
Bioinformatics, Vol. 14, pp. 1-16, doi: 10.1186/1471-2105-14-106.
Bortey, L., Edwards, D.J., Shelbourn, M. and Rillie, I. (2021), “Development of a proof-of-concept risk
model for accident prevention on highways construction”, paper presented at the Quantity
Surveying Research Conference, 10 November Port Elizabeth, South Africa, Vol. 10, available at:
https://www.researchgate.net/publication/358444274_Conceptual_model_development_for_
safety_risk_management_-a_machine_learning_approach (accessed 10 September 2024).
Bortey, L., Edwards, D.J., Roberts, C. and Rillie, I. (2022), “A review of safety risk theories and models
and the development of a digital highway construction safety risk model”, Digital, Vol. 2,
pp. 206-223, doi: 10.3390/digital2020013.
Cattermole-Terzic, V. and Horberry, T. (2020), “Improving traffic incident management using team
cognitive work analysis”, Journal of Cognitive Engineering and Decision Making, Vol. 14 No. 2,
pp. 152-173, doi: 10.1177/1555343419882.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002), “SMOTE: synthetic minority
over-sampling technique”, Journal of Artificial Intelligence Research, Vol. 16, pp. 321-357, doi:
10.48550/arXiv.1106.1813.
Chen, F., Wang, J. and Deng, Y. (2015), “Road safety risk evaluation by means of improved entropy
TOPSIS–RSR”, Safety Science, Vol. 79, pp. 39-54, doi: 10.1016/j.ssci.2015.05.006.
Chen, Y., McCabe, B. and Hyatt, D. (2018), “A resilience safety climate model predicting construction
safety performance”, Safety Science, Vol. 109, pp. 434-445, doi: 10.1016/j.ssci.2018.07.003.
Cheng, C., Leu, S., Cheng, Y., Wu, T. and Lin, C. (2012), “Applying data mining techniques to explore
factors contributing to occupational injuries in Taiwan’s construction industry”, Accident
Analysis and Prevention, Vol. 48, pp. 214-222, doi: 10.1016/j.aap.2011.04.014.
Chiang, Y.H., Wong, F.K.W. and Liang, S. (2018), “Fatal construction accidents in Hong Kong”,
Journal of Construction Engineering and Management, Vol. 144 No. 3, 04017121, doi: 10.1061/
(ASCE)CO.1943-7862.0001433.
Choi, J., Gu, B., Chin, S. and Lee, J.S. (2020), “Machine learning predictive model based on national
data for fatal accidents of construction workers”, Automation in Construction, Vol. 110, 102974,
doi: 10.1016/j.autcon.2019.102974.
Cortes, C. and Vapnik, V. (1995), “Support-vector networks”, Machine Learning, Vol. 20 No. 3,
pp. 273-297, doi: 10.1007/BF00994018.
Crowder, J.A., Carbone, J. and Friess, S. (2020), “Abductive
artificial intelligence learning models”, in Artificial Psychology: Psychological Modelling and
Testing of AI Systems, pp. 51-63, doi: 10.1007/978-3-030-17081-3_5.
Dadashova, B., Arenas-Ramírez, B., Mira-McWilliams, J. and Aparicio-Izquierdo, F. (2016),
“Methodological development for selection of significant predictors explaining fatal road
accidents”, Accident Analysis and Prevention, Vol. 90, pp. 82-94, doi: 10.1016/j.aap.2016.02.003.
Dai, W.Z., Xu, Q., Yu, Y. and Zhou, Z.H. (2019), “Bridging machine learning and logical reasoning by
abductive learning”, in Wallach, H. et al. (Eds), Advances in Neural Information Processing
Systems, Curran Associates, available at: https://proceedings.neurips.cc/paper_files/paper/
2019/file/9c19a2aa1d84e04b0bd4bc888792bd1e-Paper.pdf
Darby, P., Murray, W. and Raeside, R. (2009), “Applying online fleet driver assessment to help
identify, target and reduce occupational road safety risks”, Safety Science, Vol. 47 No. 3,
pp. 436-442, doi: 10.1016/j.ssci.2008.05.004.
Dell’Acqua, G., Russo, F. and Biancardo, S.A. (2013), “Risk-type density diagrams by crash type on
two-lane rural roads”, Journal of Risk Research, Vol. 16 No. 10, pp. 1297-1314, doi: 10.1080/
13669877.2013.788547.
Ding, S., Hua, X. and Yu, J. (2014), “An overview on nonparallel hyperplane support vector machine
algorithms”, Neural Computing and Applications, Vol. 25 No. 5, pp. 975-982, doi: 10.1007/
s00521-013-1524-6.
Edwards, D.J., Rillie, I., Chileshe, N., Lai, J., Hosseini, M.R. and Thwala, W.D. (2020), “A field survey of
hand–arm vibration exposure in the UK utilities sector”, Engineering Construction and
Architectural Management, Vol. 27 No. 9, pp. 2179-2198, doi: 10.1108/ECAM-09-2019-0518.
Erdogan, S. (2009), “Explorative spatial analysis of traffic accident statistics and road mortality
among the provinces of Turkey”, Journal of Safety Research, Vol. 40 No. 5, pp. 341-351, doi: 10.
1016/j.jsr.2009.07.006.
Eseonu, C., Gambatese, J. and Nnaji, C. (2018), Reducing Highway Construction Fatalities through
Improved Adoption of Safety Technologies, The Centre for Construction Research and Training
(CPWR Small Study No.17-4-PS), available at: https://www.elcosh.org/record/document/4305/
d001575.pdf
Eslamirad, N., Malekpour Kolbadinejad, S., Mahdavinejad, M. and Mehranrad, M. (2020), “Thermal
comfort prediction by applying supervised machine learning in green sidewalks of Tehran”, Smart
and Sustainable Built Environment, Vol. 9 No. 4, pp. 361-374, doi: 10.1108/SASBE-03-2019-0028.
Estabrooks, A. and Japkowicz, N. (2001), “A mixture-of-experts framework for learning from
imbalanced data sets”, in Hoffmann, F., Hand, D.J., Adams, N., Fisher, D. and Guimaraes, G.
(Eds), Advances in Intelligent Data Analysis, Lecture Notes in Computer Science, Springer,
Berlin, Heidelberg, pp. 34-43, doi: 10.1007/3-540-44816-0_4.
Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M.A. and Strachan, R. (2014), “Hybrid decision tree
and Naïve Bayes classifiers for multi-class classification tasks”, Expert Systems with
Applications, Part 2, Vol. 41 No. 4, pp. 1937-1946, doi: 10.1016/j.eswa.2013.08.089.
Fernández, A., Garcia, S., Herrera, F. and Chawla, N.V. (2018), “SMOTE for learning from imbalanced
data: progress and challenges, marking the 15-year anniversary”, Journal of Artificial
Intelligence Research, Vol. 61, pp. 863-905, doi: 10.1613/jair.1.11192.
Fernando, A., Siriwardana, C., Law, D., Gunasekara, C., Zhang, K. and Gamage, K. (2024), “A scoping
review and analysis of green construction research: a machine learning aided approach”, Smart
and Sustainable Built Environment, Vol. ahead-of-print No. ahead-of-print, doi: 10.1108/SASBE-
08-2023-0201.
George, J. and Hautier, G. (2021), “Chemist versus machine: traditional knowledge versus machine
learning techniques”, Trends in Chemistry, Vol. 3 No. 2, pp. 86-95, doi: 10.1016/j.trechm.2020.
10.007.
GG 104 Requirements for safety risk assessment (2018), available at: https://www.
standardsforhighways.co.uk/tses/attachments/0338b395-7959-4e5b-9537-5d2bdd75f3b9?
inline=true (accessed 12 August 2023).
Goh, Y.M. and Chua, D. (2013), “Neural network analysis of construction safety management systems:
a case study in Singapore”, Construction Management and Economics, Vol. 31 No. 5,
pp. 460-470, doi: 10.1080/01446193.2013.797095.
Goh, Y.M. and Soon, W.T. (2014), Safety Management Lessons from Major Accident Inquiries,
Pearson Education Pte Limited, South Asia, ISBN: 9789814598699.
Goh, Y.M., Ubeynarayana, C.U., Wong, K.L.X. and Guo, B.H. (2018), “Factors influencing unsafe
behaviours: a supervised learning approach”, Accident Analysis and Prevention, Vol. 118,
pp. 77-85, doi: 10.1016/j.aap.2018.06.002.
Gov.uk (2017), “Highways England highlights dangers faced by road workers”, available at: https://
www.gov.uk/government/news/highways-england-highlights-dangers-faced-by-road-workers
(accessed 25 October 2023).
Guillén Perales, A., Liébana-Cabanillas, F., Sánchez-Fernández, J. and Herrera, L.J. (2024), “Assessing
university students’ perception of academic quality using machine learning”, Applied
Computing and Informatics, Vol. 20 Nos 1/2, pp. 20-34, doi: 10.1108/ACI-06-2020-0003.
Guo, Y., Zhang, Z. and Tang, F. (2021), “Feature selection with kernelized multi-class support vector
machine”, Pattern Recognition, Vol. 117, 107988, doi: 10.1016/j.patcog.2021.107988.
Hasan, B.M.S. and Abdulazeez, A.M. (2021), “A review of principal component analysis algorithm for
dimensionality reduction”, Journal of Soft Computing and Data Mining, Vol. 2 No. 1, pp. 20-30,
available at: https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/8032
Hegde, J. and Rokseth, B. (2020), “Applications of machine learning methods for engineering risk
assessment – a review”, Safety Science, Vol. 122, 104492, doi: 10.1016/j.ssci.2019.09.015.
Huang, X., Wu, L. and Ye, Y. (2019), “A review on dimensionality reduction techniques”, International
Journal of Pattern Recognition and Artificial Intelligence, Vol. 33 No. 10, 1950017, doi: 10.1142/
S0218001419500174.
Hughes, B.P., Newstead, S., Anund, A., Shu, C.C. and Falkmer, T. (2015), “A review of models relevant
to road safety”, Accident Analysis and Prevention, Vol. 74, pp. 250-270, doi: 10.1016/j.aap.2014.
06.003.
Jain, S. and Saha, A. (2022), “Rank-based univariate feature selection methods on machine learning
classifiers for code smell detection”, Evolutionary Intelligence, Vol. 15 No. 1, pp. 609-638, doi: 10.
1007/s12065-020-00536-z.
Jia, W., Sun, M., Lian, J. and Hou, S. (2022), “Feature dimensionality reduction: a review”, Complex and
Intelligent Systems, Vol. 8 No. 3, pp. 2663-2693, doi: 10.1007/s40747-021-00637-x.
Jocelyn, S., Ouali, M.S. and Chinniah, Y. (2018), “Estimation of probability of harm in safety of
machinery using an investigation systemic approach and Logical Analysis of Data”, Safety
Science, Vol. 105, pp. 32-45, doi: 10.1016/j.ssci.2018.01.018.
Juba, B. and Le, H.S. (2019), “Precision-recall versus accuracy and the role of large data sets”,
Proceedings of the AAAI Conference on Artificial Intelligence, 27 January- 1 February, Hawaii,
Vol. 33, pp. 4039-4048, doi: 10.1609/aaai.v33i01.33014039.
Kang, K. and Ryu, H. (2019), “Predicting types of occupational accidents at construction sites in Korea
using random forest model”, Safety Science, Vol. 120, pp. 226-236, doi: 10.1016/j.ssci.2019.
06.034.
Kidando, E., Kitali, A.E., Kutela, B., Ghorbanzadeh, M., Karaer, A., Koloushani, M., Moses, R.,
Ozguven, E.E. and Sando, T. (2021), “Prediction of vehicle occupants injury at signalized
intersections using real-time traffic and signal data”, Accident Analysis and Prevention,
Vol. 149, 105869, doi: 10.1177/03611981211047836.
Kläs, M. and Vollmer, A.M. (2018), “Uncertainty in machine learning applications: a practice-driven
classification of uncertainty”, Proceedings of Computer Safety, Reliability, and Security: SAFECOMP
2018 Workshops, ASSURE, DECSoS, SASSUR, STRIVE, and WAISE, Västerås, 18 September,
Vol. 37, pp. 431-438, doi: 10.1007/978-3-319-99229-7_36.
Koc, K., Ekmekcioğlu, Ö. and Gurgun, A.P. (2022), “Accident prediction in construction using hybrid
wavelet-machine learning”, Automation in Construction, Vol. 133, 103987, doi: 10.1016/j.autcon.
2021.103987.
Li, X., Wang, Y., Basu, S., Kumbier, K. and Yu, B. (2019), “A debiased MDI feature importance
measure for random forests”, Advances in Neural Information Processing Systems, Vol. 32,
pp. 8049-8059, available at: https://proceedings.neurips.cc/paper/2019/hash/702cafa3b
b4c9c86e4a3b6834b45aedd-Abstract.html (accessed 13 September 2024).
Li, L., Zhou, Z., Bai, N., Wang, T., Xue, K.H., Sun, H., He, Q., Cheng, W. and Miao, X. (2022), “Naive
Bayes classifier based on memristor nonlinear conductance”, Microelectronics Journal, Vol. 129,
105574, doi: 10.1016/j.mejo.2022.105574.
Lopez, C., Marti, J.R. and Sarkaria, S. (2018), “Distributed reinforcement learning in emergency
response simulation”, IEEE Access, Vol. 6, pp. 67261-67276, doi: 10.1109/ACCESS.2018.2878894.
Malakouti, S.M., Menhaj, M.B. and Suratgar, A.A. (2023), “The usage of 10-fold cross-validation and
grid search to enhance ML methods performance in solar farm power generation prediction”,
Cleaner Engineering and Technology, Vol. 15, 100664, doi: 10.1016/j.clet.2023.100664.
McNee, S.M., Riedl, J. and Konstan, J.A. (2006), “Being accurate is not enough: how accuracy metrics
have hurt recommender systems”, CHI’06 Extended Abstracts on Human factors in Computing
Systems, pp. 1097-1101, doi: 10.1145/1125451.1125659.
Meng, X., Zhang, K., Pang, K. and Xiang, X. (2020), “Characterization of spatio-temporal distribution
of vehicle emissions using web-based real-time traffic data”, Science of the Total Environment,
Vol. 709, 136227, doi: 10.1016/j.scitotenv.2019.136227.
Mirabadi, A. and Sharifian, S. (2010), “Application of association rules in Iranian Railways (RAI)
accident data analysis”, Safety Science, Vol. 48 No. 10, pp. 1427-1435, doi: 10.1016/j.ssci.2010.
06.006.
Moosavi, S., Samavatian, M.H., Parthasarathy, S., Teodorescu, R. and Ramnath, R. (2019), “Accident
risk prediction based on heterogeneous sparse data: new dataset and insights”, in Banaei-Kashani,
F., Trajcevski, G., Güting, R.H., Kulik, L. and Newsam, S. (Eds), Proceedings of the
27th ACM SIGSPATIAL International Conference on Advances in Geographic Information
Systems SIGSPATIAL ’19, Association for Computing Machinery, New York, NY, pp. 33-42,
doi: 10.1145/3347146.3359078.
Moosavi, S.M., Jablonka, K.M. and Smit, B. (2020), “The role of machine learning in the understanding
and design of materials”, Journal of the American Chemical Society, Vol. 142 No. 48,
pp. 20273-20287, doi: 10.1021/jacs.0c09105.
Mun, S., Park, S., Han, D.K. and Ko, H. (2017), “Generative adversarial network based acoustic scene
training set augmentation and selection using SVM hyper-plane”, Paper Presented at the
Detection and Classification of Acoustic Scenes and Events, 16 November, Munich, pp. 93-102,
available at: https://dcase.community/documents/challenge2017/technical_reports/
DCASE2017_Mun_213.pdf
Namian, M., Albert, A., Zuluaga, C.M. and Behm, M. (2016), “Role of safety training: impact on hazard
recognition and safety risk perception”, Journal of Construction Engineering and Management,
Vol. 142 No. 12, 04016073, doi: 10.1061/(ASCE)CO.1943-7862.0001198.
Nembrini, S., König, I.R. and Wright, M.N. (2018), “The revival of the Gini importance”,
Bioinformatics, Vol. 34 No. 21, pp. 3711-3718, doi: 10.1093/bioinformatics/bty373.
Newman, C., Edwards, D., Martek, I., Lai, J., Thwala, W.D. and Rillie, I. (2021), “Industry 4.0
deployment in the construction industry: a bibliometric literature review and UK-based case
study”, Smart and Sustainable Built Environment, Vol. 10 No. 4, pp. 557-580, doi: 10.1108/
SASBE-02-2020-0016.
Nnaji, C., Gambatese, J., Karakhan, A. and Eseonu, C. (2019), “Influential safety technology adoption
predictors in construction”, Engineering Construction and Architectural Management, Vol. 26
No. 11, pp. 2655-2681, doi: 10.1108/ecam-09-2018-0381.
Owens, N., Armstrong, A., Sullivan, P., Mitchell, C., Newton, D., Brewster, R. and Trego, T. (2010),
“Traffic incident management handbook (No.FHWA-HOP-10-013)”, available at: http://www.
ops.fhwa.dot.gov/eto_tim_pse/publications/timhandbook/tim_handbook.pdf (accessed 1
October 2023).
Oyedele, A.O., Ajayi, A.O. and Oyedele, L.O. (2021), “Machine learning predictions for lost time
injuries in power transmission and distribution projects”, Machine Learning with Applications,
Vol. 6, 100158, doi: 10.1016/j.mlwa.2021.100158.
Park, Y.S., Konge, L. and Artino, A.R. (2020), “The positivism paradigm of research”, Academic
Medicine, Vol. 95 No. 5, pp. 690-694, doi: 10.1097/ACM.0000000000003093.
Patel, D.A. and Jha, K.N. (2015), “Neural network approach for safety climate prediction”, Journal of
Management in Engineering, Vol. 31 No. 6, 05014027, doi: 10.1061/(ASCE)ME.1943-5479.
000034.
Patle, A. and Chouhan, D.S. (2013), “SVM kernel functions for classification”, in Patel, C.H., Deheri, G.,
Patel, S.H. and Mehta, S.M. (Eds), 2013 International Conference on Advances in Technology and
Engineering (ICATE), 23-25 January, Mumbai, IEEE, pp. 1-9, doi: 10.1109/ICATE20315.2013.
Paul, A., Mukherjee, D.P., Das, P., Gangopadhyay, A., Chintha, A.R. and Kundu, S. (2018), “Improved
random forest for classification”, IEEE Transactions on Image Processing, Vol. 27 No. 8,
pp. 4012-4024, doi: 10.1109/TIP.2018.2834830.
Posillico, J.J. (2023), “Development of an interpersonally grounded construction management
curriculum foundation model”, Doctoral thesis, Birmingham City University, available at:
https://www.open-access.bcu.ac.uk/14277/ (accessed 1 August 2024).
Rachman, A. and Ratnayake, R.M.C. (2019), “Machine learning approach for risk-based inspection
screening assessment”, Reliability Engineering and System Safety, Vol. 185, pp. 518-532, doi: 10.
1016/j.ress.2019.02.008.
Ren, Y., Zhang, L. and Suganthan, P.N. (2016), “Ensemble classification and regression-recent
developments, applications and future directions”, IEEE Computational Intelligence Magazine,
Vol. 11 No. 1, pp. 41-53, doi: 10.1109/MCI.2015.2471235.
Rigatti, S.J. (2017), “Random forest”, Journal of Insurance Medicine, Vol. 47 No. 1, pp. 31-39, doi: 10.
17849/insm-47-01-31-39.1.
Roberts, C., Edwards, D.J., Sing, M.C.P. and Aigbavboa, C. (2021), “Post-occupancy evaluation: process
delineation and implementation trends in the UK higher education sector”, Architectural Engineering
and Design Management, Vol. 19 No. 2, pp. 125-147, doi: 10.1080/17452007.2021.1956422.
Rogovski, P., Cadamuro, R.D., da Silva, R., de Souza, E.B., Bonatto, C., Viancelli, A., Michelon, W.,
Elmahdy, E.M., Treichel, H., Rodríguez-Lázaro, D. and Fongaro, G. (2021), “Uses of
bacteriophages as bacterial control tools and environmental safety indicators”, Frontiers in
Microbiology, Vol. 12, p. 3756, doi: 10.3389/fmicb.2021.793135.
Saranya, T., Sridevi, S., Deisy, C., Chung, T.D. and Khan, M.A. (2020), “Performance analysis of
machine learning algorithms in intrusion detection system: a review”, Procedia Computer
Science, Vol. 171, pp. 1251-1260, doi: 10.1016/j.procs.2020.04.133.
Sarkar, S., Pramanik, A., Maiti, J. and Reniers, G. (2020), “Predicting and analysing injury severity: a
machine learning-based approach using class-imbalanced proactive and reactive data”, Safety
Science, Vol. 125, 104616, doi: 10.1016/j.ssci.2020.104616.
Sarvari, H., Edwards, D.J., Rillie, I. and Posillico, J. (2024), “Building a safer future: analysis of studies
on safety I and safety II in the construction industry”, Safety Science, Vol. 178, doi: 10.1016/j.
ssci.2024.106621.
Shelke, M.S., Deshmukh, P.R. and Shandilya, V.K. (2017), “A review on imbalanced data handling
using undersampling and oversampling technique”, International Journal of Recent Trends in
Engineering Research, Vol. 3 No. 4, pp. 444-449, doi: 10.23883/IJRTER.2017.3168.0UWXM.
Shirali, G., Shekari, M. and Angali, K.A. (2018), “Assessing reliability and validity of an instrument
for measuring resilience safety culture in sociotechnical systems”, Safety and Health at Work,
Vol. 9 No. 3, pp. 296-307, doi: 10.1016/j.shaw.2017.07.010.
Sing, M.C.P., Edwards, D.J., Leung, A.W.T., Liu, H. and Roberts, C.J. (2022), “A theoretical framework
for classifying project complexity at the preconstruction stage using cluster analysis
techniques”, Engineering Construction and Architectural Management, Vol. 29 No. 9,
pp. 3754-3774, doi: 10.1108/ECAM-09-2020-0726.
Sober, E. (2013), Core Questions in Philosophy: A Text with Readings, Vol. 6, Pearson Education,
Boston, p. 28, ISBN: 9780205206698.
Son, C., Sasangohar, F., Peres, S.C., Neville, T.J. and Moon, J. (2019), “Orchestrating through
whirlwind: identified challenges and resilience factors of incident management teams during
Hurricane Harvey”, Proceedings of the Human Factors and Ergonomics Society Annual
Meeting, 23 November, Sage CA: Los Angeles, CA, SAGE Publications, Vol. 63 No. 1,
pp. 899-903, doi: 10.1177/1071181319631265.
St Denis, L.A., Short, K.C., McConnell, K., Cook, M.C., Mietkiewicz, N.P., Buckland, M. and Balch, J.K.
(2023), “All-hazards dataset mined from the US national incident management system 1999-
2020”, Scientific Data, Vol. 10 No. 1, p. 112, doi: 10.1038/s41597-023-01955-0.
Strain, T.J., Wilson, R.E. and Littleworth, R. (2022), “Role of traffic officers in transportation asset
monitoring”, paper presented at the Transport Research Board 2022, 9-13 January, Washington
DC, available at: https://www.trb.org/Main/Blurbs/182248.aspx (accessed 10
September 2024).
Sutton, R.S. and Barto, A.G. (2018), Reinforcement Learning: An Introduction, MIT Press, ISBN:
9780262352703, 0262352702.
Tixier, A.J.P., Hallowell, M.R., Rajagopalan, B. and Bowman, D. (2016), “Application of machine
learning to construction injury prediction”, Automation in Construction, Vol. 69, pp. 102-114,
doi: 10.1016/j.autcon.2016.05.016.
Tong, D.Y.K., Rasiah, D., Tong, X.F. and Lai, K.P. (2015), “Leadership empowerment behaviour on
safety officer and safety teamwork in manufacturing industry”, Safety Science, Vol. 72,
pp. 190-198, doi: 10.1016/j.ssci.2014.09.009.
Tsoukalas, V.D. and Fragiadakis, N.G. (2016), “Prediction of occupational risk in the shipbuilding
industry using multivariable linear regression and genetic algorithm analysis”, Safety Science,
Vol. 83, pp. 12-22, doi: 10.1016/j.ssci.2015.11.010.
Uma, S. and Eswari, R. (2022), “Accident prevention and safety assistance using IOT and machine
learning”, Journal of Reliable Intelligent Environments, Vol. 8 No. 2, pp. 79-103, doi: 10.1007/
s40860-021-00136-3.
Weng, J., Zhu, J.Z., Yan, X. and Liu, Z. (2016), “Investigation of work zone crash casualty patterns
using association rules”, Accident Analysis and Prevention, Vol. 92, pp. 43-52,
doi: 10.1016/j.aap.2016.03.017.
Woods, D. and Wreathall, J. (2003), Managing Risk Proactively: The Emergence of Resilience
Engineering, Ohio State University, Columbus, OH, available at: https://www.researchgate.net/
publication/228711828_Managing_Risk_Proactively_The_Emergence_of_Resilience_
Engineering (accessed 10 September 2024).
Xu, J., Zhang, Y. and Miao, D. (2020), “Three-way confusion matrix for classification: a measure
driven view”, Information Sciences, Vol. 507, pp. 772-794, doi: 10.1016/j.ins.2019.06.064.
Yacouby, R. and Axman, D. (2020), “Probabilistic extension of precision, recall, and f1 score for more
thorough evaluation of classification models”, in Eger, S., Gao, Y., Peyrard, M., Zhao, W. and
Hovy, E. (Eds), Proceedings of the First Workshop on Evaluation and Comparison of NLP
Systems, 13 November, pp. 79-91, doi: 10.18653/v1/2020.eval4nlp-1.9.
Yue, W., Li, C., Wang, S., Xue, N. and Wu, J. (2023), “Cooperative incident management in mixed
traffic of CAVs and human-driven vehicles”, IEEE Transactions on Intelligent Transportation
Systems, Vol. 24 No. 11, pp. 12462-12476, doi: 10.1109/TITS.2023.3289983.
Zagorecki, A.T., Johnson, D.E. and Ristvej, J. (2013), “Data mining and machine learning in the
context of disaster and crisis management”, International Journal of Emergency Management,
Vol. 9 No. 4, pp. 351-365, doi: 10.1504/IJEM.2013.059879.
Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D. and Saeed, J. (2020), “A comprehensive review of
dimensionality reduction techniques for feature selection and feature extraction”, Journal of
Applied Science and Technology Trends, Vol. 1 No. 2, pp. 56-70, doi: 10.38094/jastt1224.
Zhang, S., Huang, Y., Shen, C., Ye, H. and Du, Y. (2012), “Spatial prediction of soil organic matter
using terrain indices and categorical variables as auxiliary information”, Geoderma, Vol. 171,
pp. 35-43, doi: 10.1016/j.geoderma.2011.07.012.
Zhang, H., Li, Y. and Zhang, H. (2019), “Risk early warning safety model for sports events based on
back propagation neural network machine learning”, Safety Science, Vol. 118, pp. 332-336, doi:
10.1016/j.ssci.2019.05.011.
Zhang, L., Wu, X., Qin, Y., Skibniewski, M.J. and Liu, W. (2016), “Towards a fuzzy Bayesian
network-based approach for safety risk analysis of tunnel-induced pipeline damage”, Risk Analysis,
Vol. 36 No. 2, pp. 278-330, doi: 10.1111/risa.12448.
Zhang, H., Yang, F., Li, Y. and Li, H. (2015), “Predicting profitability of listed construction companies
based on principal component analysis and support vector machine—evidence from China”,
Automation in Construction, Vol. 53, pp. 22-28, doi: 10.1016/j.autcon.2015.03.001.
Further reading
Alawad, H., Kaewunruen, S. and An, M. (2019), “Learning from accidents: machine learning for safety
at railway stations”, IEEE Access, Vol. 8, pp. 633-648, doi: 10.1109/ACCESS.2019.2962072.
Alharahsheh, H.H. and Pius, A. (2020), “A review of key paradigms: positivism vs interpretivism”,
Global Academic Journal of Humanities and Social Sciences, Vol. 2 No. 3, pp. 39-43,
doi: 10.36348/gajhss.2020.v02i03.001.
Ali, M.U., Ahmed, S., Ferzund, J., Mehmood, A. and Rehman, A. (2017), “Using PCA and factor
analysis for dimensionality reduction of bio-informatics data”, International Journal of
Advanced Computer Science and Applications, Vol. 8 No. 5, pp. 1-12,
doi: 10.48550/arXiv.1707.07189.
Belgiu, M. and Drăguţ, L. (2016), “Random Forest in remote sensing: a review of applications and
future directions”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 114, pp. 24-31,
doi: 10.1016/j.isprsjprs.2016.01.011.
Borovikov, E. (2014), “An evaluation of support vector machines as a pattern recognition tool”, arXiv,
Vol. 1412, p. 4186, doi: 10.48550/arXiv.1412.4186.
Gogtay, N.J. and Thatte, U.M. (2017), “Principles of correlation analysis”, Journal of the Association of
Physicians of India, Vol. 65 No. 3, pp. 78-81, PMID: 28462548.
Hamdan, Y.B. and Sathesh, A. (2021), “Construction of statistical SVM based recognition model for
handwritten character recognition”, Journal of Information Technology and Digital World,
Vol. 3 No. 2, pp. 92-107, doi: 10.36548/jitdw.2021.2.003.
Han, S., Qubo, C. and Meng, H. (2012), “Parameter selection in SVM with RBF kernel function”,
Proceedings of World Automation Congress 2012, IEEE, Puerto Vallarta, Mexico, 24-28 June,
pp. 1-4, available at: https://www.semanticscholar.org/paper/Parameter-selection-in-SVM-with-
RBF-kernel-function-Han-Qubo/c9dd0a01310e85a4fb8882cbd5f4dd084e0899f1 (accessed 14
September 2024).
Huang, H., Chin, H.C. and Haque, M.M. (2008), “Severity of driver injury and vehicle damage in traffic
crashes at intersections: a Bayesian hierarchical analysis”, Accident Analysis and Prevention,
Vol. 40 No. 1, pp. 45-54, doi: 10.1016/j.aap.2007.04.002.
Jørgensen, K. (2011), “A tool for safety officers investigating ‘simple’ accidents”, Safety Science,
Vol. 49 No. 1, pp. 32-38, doi: 10.1016/j.ssci.2009.12.023.
Krstinić, D., Braović, M., Šerić, L. and Božić-Štulić, D. (2020), “Multi-label classifier performance
evaluation with confusion matrix”, Computer Science and Information Technology, Vol. 10,
pp. 1-14, doi: 10.5121/csit.2020.100801.
Mendeloff, J. and Staetsky, L. (2014), “Occupational fatality risks in the United States and the United
Kingdom”, American Journal of Industrial Medicine, Vol. 57 No. 1, pp. 4-14, doi: 10.1002/ajim.22258.
Mesbah, A. (2016), “Stochastic model predictive control: an overview and perspectives for future
research”, IEEE Control Systems Magazine, Vol. 36 No. 6, pp. 30-44,
doi: 10.1109/MCS.2016.2602087.
Mohammadi, M., Rashid, T.A., Karim, S.H.T., Aldalwie, A.H.M., Tho, Q.T., Bidaki, M., Rahmani, A.M.
and Hosseinzadeh, M. (2021), “A comprehensive survey and taxonomy of the SVM-based
intrusion detection systems”, Journal of Network and Computer Applications, Vol. 178, 102983,
doi: 10.1016/j.jnca.2021.102983.
Nargesian, F., Samulowitz, H., Khurana, U., Khalil, E.B. and Turaga, D.S. (2017), “Learning feature
engineering for classification”, Proceedings of the Twenty-Sixth International Joint Conference
on Artificial Intelligence (IJCAI-17), Melbourne, 25 August, Vol. 17, pp. 2529-2535,
doi: 10.24963/ijcai.2017/352.
Rashid, H.M., Ahmed, A., Wali, B. and Qureshi, N.A. (2019), “An analysis of highway work zone
safety practices in Pakistan”, International Journal of Injury Control and Safety Promotion,
Vol. 26 No. 1, pp. 37-44, doi: 10.1080/17457300.2018.1476383.
Roberts, C.J., Edwards, D.J., Hosseini, M.R., Mateo-Garcia, M. and Owusu-Manu, D. (2019), “Post
occupancy evaluation: a critical review of literature”, Engineering Construction and
Architectural Management, Vol. 26 No. 9, pp. 2084-2106, doi: 10.1108/ECAM-09-2018-0390.
Scetbon, M. and Harchaoui, Z. (2021), “A spectral analysis of dot-product kernels”, paper presented at
the 24th International Conference on Artificial Intelligence and Statistics, 13-15 April, PMLR,
available at: https://proceedings.mlr.press/v130/scetbon21b.html (accessed 12 July 2024).
Shi, Q. and Abdel-Aty, M. (2015), “Big Data applications in real-time traffic operation and safety
monitoring and improvement on urban expressways”, Transportation Research Part C:
Emerging Technologies, Big Data in Transportation and Traffic Engineering, Vol. 58,
pp. 380-394, doi: 10.1016/j.trc.2015.02.022.
Tharwat, A., Gaber, T., Ibrahim, A. and Hassanien, A.E. (2017), “Linear discriminant analysis: a
detailed tutorial”, AI Communications, Vol. 30 No. 2, pp. 169-190, doi: 10.3233/AIC-170729.
Vapnik, V. (1999), The Nature of Statistical Learning Theory, Springer Science & Business Media,
New York, ISBN: 9781475732641, 1475732643.
Wahbah, M., Mohandes, B., El-Fouly, T.H. and El Moursi, M.S. (2022), “Unbiased cross-validation
kernel density estimation for wind and PV probabilistic modelling”, Energy Conversion and
Management, Vol. 266, 115811, doi: 10.1016/j.enconman.2022.115811.
Yu, H., Yang, J. and Han, J. (2003), “Classifying large data sets using SVMs with hierarchical
clusters”, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, Washington, DC, 24 August, pp. 306-315,
doi: 10.1145/956750.956786.
Zhai, X., Krajcik, J. and Pellegrino, J.W. (2021), “On the validity of machine learning-based next
generation science assessments: a validity inferential network”, Journal of Science Education
and Technology, Vol. 30 No. 2, pp. 298-312, doi: 10.1007/s10956-020-09879-9.
Zhao, J., Fu, X. and Zhang, Y. (2016), “Research on risk assessment and safety management of
highway maintenance project”, Procedia Engineering, Vol. 137, pp. 434-441,
doi: 10.1016/j.proeng.2016.01.278.
Zheng, A. and Casari, A. (2018), Feature Engineering for Machine Learning: Principles and
Techniques for Data Scientists, O’Reilly Media, ISBN: 9781491953198, 1491953195.
Zhou, X., Lu, P., Zheng, Z., Tolliver, D. and Keramati, A. (2020), “Accident prediction accuracy assessment
for highway-rail grade crossings using random forest algorithm compared with decision tree”,
Reliability Engineering and System Safety, Vol. 200, 106931, doi: 10.1016/j.ress.2020.106931.
Zhou, H., Wang, X. and Zhu, R. (2022), “Feature selection based on mutual information with correlation
coefficient”, Applied Intelligence, Vol. 52 No. 5, pp. 5457-5474, doi: 10.1007/s10489-021-02524-x.
Corresponding author
David J. Edwards can be contacted at: [email protected]