
The current issue and full text archive of this journal is available on Emerald Insight at:

https://www.emerald.com/insight/2046-6099.htm

Smart and Sustainable Built Environment
Received 17 August 2024; Revised 15 September 2024; Accepted 18 September 2024

Unravelling incipient accidents: a machine learning prediction of incident risks in highway operations

Loretta Bortey
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK
David J. Edwards
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK and
Faculty of Engineering and the Built Environment, University of Johannesburg,
Johannesburg, South Africa
Chris Roberts
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK and
Quantity Surveying Department, Nelson Mandela University – South Campus,
Port Elizabeth, South Africa, and
Iain Rillie
Highways England Company Limited Birmingham, Birmingham, UK and
Infrastructure Futures Research Group, Birmingham City University,
Birmingham, UK

Abstract
Purpose – Safety research has focused on drivers, pedestrians and vehicles, with scarce attention given to highway
traffic officers (HTOs). This paper develops a robust prediction model which enables highway safety authorities to
predict exclusive incidents occurring on the highway such as incursions and environmental hazards, respond
effectively to diverse safety risk incident scenarios and aid in timely safety precautions to minimise HTO incidents.
Design/methodology/approach – Using data from a highway incident database, a supervised machine learning
method that employs three algorithms [namely Support Vector Machine (SVM), Random Forests (RF) and Naïve
Bayes (NB)] was applied, and their performances were comparatively analysed. Three data balancing algorithms were
also applied to handle the class imbalance challenge. A five-phase sequential method, which includes (1) data
collection, (2) data pre-processing, (3) model selection, (4) data balancing and (5) model evaluation, was implemented.
Findings – The findings indicate that SVM with a polynomial kernel combined with the Synthetic Minority
Over-sampling Technique (SMOTE) algorithm is the best model to predict the various incidents, and the
Random Under-sampling (RU) algorithm was the most inefficient in improving model accuracy. Weather/
visibility, age range and location were the most significant factors in predicting highway incidents.
Originality/value – This is the first study to develop a prediction model for HTOs and utilise an incident
database solely dedicated to HTOs to forecast various incident outcomes in highway operations. The prediction
model will provide evidence-based information to safety officers to train HTOs on impending risks predicted by
the model thereby equipping workers with resilient shocks such as awareness, anticipation and flexibility.
Keywords Sustainable development goal 8.8, Safety risk, Predictive modelling, Machine learning,
Government policy
Paper type Research paper
The authors wish to thank National Highways (a UK Government company reporting to the
Department for Transport) for funding and supporting this research.
© Emerald Publishing Limited, 2046-6099
DOI 10.1108/SASBE-08-2024-0316

1. Introduction
In the ever-evolving landscape of road and transportation operations, ensuring the safety
and well-being of highway workers is paramount (Eseonu et al., 2018). Given the complex
nature of operations conducted on the highway and the close proximity to live traffic,
highway workers are prone to encounter various health and safety (H&S) risks, which could
result in the occurrence of incidents, accidents or injuries (Strain et al., 2022). Highway traffic
incidents, e.g. incursions and the emission of hazardous substances (Rogovski et al., 2021),
necessitate prompt and precise reactions from all personnel responsible for highway safety.
Over the last decade in the UK, 12 road employees (including two highway traffic officers
(HTOs)) have died on the road network, according to National Highways (formerly
Highways England, a UK Government company serving the UK Government’s Department
for Transport), with three workers being run over by drivers (GOV.UK, 2017).
To augment safety, judicious identification of event and/or incident categories likely to
occur on the highway is critical for deploying suitable resources and optimising traffic flow
(Eseonu et al., 2018). It is imperative that corporate workplace H&S teams can recognise and
understand possible safety incident infractions that could be detrimental to workplace safety
before highway personnel are deployed to work zones (Goh and Soon, 2014). For HTOs,
traditional statistical methods are currently being used to better understand and mitigate
risks after they have occurred, making this a reactive approach to safety risk management
(Bortey et al., 2021). However, traditional statistical methods of incident classification are
incapable of handling the voluminous incident data generated and/or the complexities of
modern highway operations (George and Hautier, 2021). Ajayi et al. (2020) espoused that
machine learning (ML) provides new tools for overcoming issues that conventional statistical
methods are inadequate for. For example, ML algorithms can capture the nuances of the
multifaceted and uncertain nature of incidents (Zhang et al., 2019; Eslamirad et al., 2020). The
innate ability of ML to discern intricate patterns when processing large volumes of data is a
desirable quality that makes it preferable for the present study (Dai et al., 2019; Fernando
et al., 2024). Therefore, the research question for this present research is as follows: “Can ML
accurately predict the risk of incipient accidents occurring involving HTOs?”
Through rigorous experimentation and analysis of primary data, this paper develops a
robust predictive framework which will enable highway authorities to prioritise resources,
respond effectively to the diverse safety scenarios and enhance proof-driven safety
precautions to minimise HTO incidents. Such a framework will be particularly useful during
safety risk assessment meetings conducted prior to HTOs resuming operations on the
highway (Hegde and Rokseth, 2020). Safety officers can then educate and train HTOs on
impending risks predicted by the model, thereby equipping workers with resilient shocks
such as awareness of risks (Tong et al., 2015; Shirali et al., 2018), anticipation of the risks and
response and flexibility to make important safety decisions (Woods and Wreathall, 2003).
Three ML algorithms were explored, namely Random Forests (RF), Naïve Bayes (NB) and
Support Vector Machine (SVM) classifiers, by leveraging their inherent strengths to enhance
the validity, veracity and efficiency of incident classification (Tixier et al., 2016; Zhang et al.,
2019; Koc et al., 2022). Associated objectives are threefold, namely to (1) present a
comparative analysis of SVM kernels by evaluating how well they performed using metrics
such as precision, accuracy score, recall and F1-score; (2) apply original data to three ML
algorithms (i.e. SVM, RF and NB) and evaluate these based on a comparative analysis of the
overall accuracy score of each model and the precision, recall and F1-score of each class for
the target variable and (3) apply balancing algorithms to the original data and compare their
performance to that of the unbalanced (UB) data. This study departs from existing literature
which only focuses on specific incidents such as accidents, injuries or their severity (Cheng
et al., 2012; Chiang et al., 2018). Instead, it assesses the feasibility of employing different ML
models to predict several safety risk incidents likely to occur during highway operations.
This is the first study which utilises an incident database solely dedicated to highway
workers to forecast incident outcomes using myriad variables and focuses on predictive
models tailored for HTOs.

2. Incident management and prediction for HTOs

Incident management is a process that handles unplanned events and interruptions in
operations and seeks to restore operations back to a functional state (Cattermole-Terzic and
Horberry, 2020). Various studies have sought to provide an adequate response to safety
incidents in diverse fields. For example, Son et al. (2019) identified and addressed the
challenges in incident management during a hurricane, and St Denis et al. (2023) proposed a
system that tracks data from the United States (US) National Incident Management System to
address challenges in accessing incident data. One area where incident management has also
been optimally significant is traffic management (Yue et al., 2023; Cattermole-Terzic and
Horberry, 2020). Effective incident management is a critical component of road safety and
accident prevention (Owens et al., 2010). Managing incident risks enables a reduction of
secondary incident occurrences on the road and reduces the impact of resulting traffic
congestion and emissions (Hughes et al., 2015). Studies reveal that managing safety incidents
could be undertaken in diverse ways (Erdogan, 2009; Darby et al., 2009; Chen et al., 2015). For
instance, Chen et al. (2015) advanced a road safety model for performance evaluation and
decision-making for policymakers.
Studies into road safety have predominantly focused on drivers, pedestrians and the
general public – neglecting road workers or highway workers (Dell’Acqua et al., 2013;
Dadashova et al., 2016). Bortey et al. (2022) identified that a section of highway workers
mostly neglected in road safety research are HTOs. However, HTOs working within traffic
incident inner and outer cordons (i.e. physical barriers) are exposed to operational activities
executed both night and day, often in various atmospheric and climate conditions and
frequently in close proximity to high-speed traffic, creating a dangerous work environment
(Strain et al., 2022).

2.1 ML for H&S incident prediction

ML can combine both qualitative and quantitative data to build accurate prediction or
classification models (Moosavi et al., 2020; Ren et al., 2016). Table 1 shows different ML
methods that have been adopted in various safety risk management fields corresponding to
five ML techniques, namely regression, classification, clustering, association and
reinforcement learning (Koc et al., 2022). The association technique expresses the task as
connections between attributes (Saranya et al., 2020), whereas clustering describes the
domain problem through segmentation or division into coherent groupings (Saranya et al.,
2020). Reinforcement learning learns to solve multi-level problems by training models on
real-life scenarios in a trial-and-error method (Sutton and Barto, 2018).
Literature reveals that ML has been adopted in diverse safety risk management areas
(Tixier et al., 2016; Uma and Eswari, 2022) such as preventive measures (Zhang et al., 2019),
incident investigation and analysis (Jocelyn et al., 2018), emergency response and
management (Lopez et al., 2018), regulatory compliance and standards (Rachman and
Ratnayake, 2019) and human factors and behavioural aspects (Goh et al., 2018). Prediction of
H&S risk has gained significant interest, e.g. types of occupational accidents at construction
sites (Kang and Ryu, 2019), vehicle occupants’ injuries at signalised intersections (Kang and
Ryu, 2019) and accidents in construction (Koc et al., 2022). Notable studies include Moosavi
et al. (2019), who utilised ML in accident risk prediction, but a major limitation was the fact
that sparse data were used. Tsoukalas and Fragiadakis (2016) used multi-variable linear
regression and genetic algorithm analysis to predict occupational risk in the ship-building
industry. Similarly, classification and regression trees have been utilised in exploring
contributing factors to occupational risk in the Taiwan construction industry (Cheng et al.,
2012). Another limitation of past ML models utilised resides in the usage of perception type
data (often gathered from questionnaires or interviews), which may not necessarily reflect
reality (cf. Abbasianjahromi and Aghakarimi, 2023; Guillén Perales et al., 2024). Table 1
details research undertaken with ML models in the safety risk management areas, and
notably, classification is the most common approach to safety risk prediction, albeit most
studies combined different ML models.

Table 1. Application of ML models in safety risk incident management

| SRIM area | Objective | References | Regression | Classification | Clustering | Reinforcement | Association |
|---|---|---|---|---|---|---|---|
| Preventive measures | The goal is to find and apply measures to prevent safety occurrences. Hazard identification, safety protocols, safety training, equipment maintenance and engineering controls are some of the issues that may be covered. The purpose is to proactively reduce the likelihood of incidents occurring | Tixier et al. (2016), Zhang et al. (2019), Ajayi et al. (2020), Uma and Eswari (2022), Oyedele et al. (2021) | DNN, MARS, GLM | NB, RF, DT, KNN and SGTB | | | |
| Incident investigation and analysis | This field delves into methods that can be applied for incident investigations and analysis. To understand what went wrong and why, the triggers, contributing factors and root causes of safety events are investigated. The findings can be used to prevent similar situations in future | Mirabadi and Sharifian (2010), Goh and Chua (2013), Jocelyn et al. (2018), Hegde and Rokseth (2020), Weng et al. (2016) | LAD and DNN | CART | | | GRI; Apriori |
| Emergency response and management | This area explores emergency response strategies, processes and approaches. It involves the study of the effectiveness of reaction teams, evacuation plans, communication protocols and resource coordination during and after safety crises | Zagorecki et al. (2013), Lopez et al. (2018) | | ANN, SVM, DT and KNN | K-means | RLM | |
| Regulatory compliance and standards | This section investigates compliance with safety legislation, standards and best practices. It probes whether organisations and sectors follow predefined safety guidelines and assesses the efficiency of current regulations in decreasing safety risks | Rachman and Ratnayake (2019) | | KNN, SVM, GBDT, RF and AB | | | |
| Human factors and behavioural aspects | Human behaviour and variables such as training, motivation and communication frequently influence safety incidents. This area therefore examines the impact of human factors in safety events and investigates methods to improve human behaviour and safety decision-making | Patel and Jha (2015), Goh et al. (2018) | | SVM, KNN, RF, ANN, LR and NB | | | |

Source(s): Table by authors

2.2 Safety indicator data

For safety prediction models, input variables are used to anticipate the likelihood of an accident
or incident, and the model’s performance is largely dependent on the input variables
(Tsoukalas and Fragiadakis, 2016). Industries and organisations evaluate safety performance
using certain metrics or measurements known as safety indicators (Zhang et al., 2019). These
safety indicators could either be leading or lagging metrics based on what they are used to
measure and how they are used (Bayramova et al., 2023; Sarvari et al., 2024). Input variables
are, however, attributes used as inputs to the ML model (Chen et al., 2018). They are the
parameters that the model analyses to make predictions (Chen et al., 2018). ML algorithms for
safety risk prediction can therefore use safety indicators as a basis to choose input variables.
Various other studies emphasise the influence that safety indicators can have on the
selection of input variables (Chen et al., 2018; Zhang et al., 2019). For example, if a safety
culture survey suggests a low degree of safety communication, safety communication input
variables might be used for the safety prediction model. Furthermore, safety indicators can
be employed to validate the input variables used for the safety prediction model (Chen et al.,
2018; Zhang et al., 2019) – e.g. if a high degree of safety communication is indicated by the
safety culture survey, the input factors relating to safety communication may be verified for
their usefulness in predicting safety events (Chen et al., 2018). Bortey et al. (2022) reviewed
various other indicators that could be used as input variables for a highway prediction
model. These include environmental indicators, e.g. weather and road conditions (Rogovski
et al., 2021); social and/or demography indicators, e.g. age and income (Alizadeh et al., 2015);
institutional indicators, e.g. drivers’ education and policing practices (Namian et al., 2016);
economic indicators, e.g. road budgets and safety investments (Ajayi et al., 2020);
technological indicators, e.g. vehicle fleet quality and composition and road infrastructure
characteristics (Nnaji et al., 2019); spatial-temporal indicators, e.g. real-time traffic volumes,
congestion and emergency response (Meng et al., 2020), and incident-based indicators, e.g.
injury occurrence rate and potential severity level (Sarkar et al., 2020).

3. Methodology
A positivist philosophical stance (Park et al., 2020) couched within an abductive approach
(Posillico, 2023) was adopted for this study. Within contemporary safety management
literature, such an approach has been extensively used to, for example, build an ML
predictive model based on national data for fatal accidents of construction workers (Choi
et al., 2020) and measure hand-arm vibration exposure risk in the UK utilities sector
(Edwards et al., 2020). Although it is through scientific means that positivist philosophies can
be verified, combining positivist philosophies with an abductive approach allows room for
elements of uncertainty which are presented as “most likely” or “best case scenario” (Sober,
2013). Such elements of uncertainty could be encountered in stochastic prediction models
developed in this present study, which justifies the adoption of abductive reasoning
(Kläs and Vollmer, 2018). Moreover, abduction has been extensively used in a plethora of ML
research (Kläs and Vollmer, 2018; Crowder et al., 2020), thus adding further validity to this
choice.
To predict incident risks, ML algorithms were developed using data from past incidents
and events recorded in incident reports contained within the UK’s highways accident report
database. The data consist of variables representing safety indicators (Bayramova et al.,
2023, 2024), which are combined and entered into ML algorithms to enable pre-emptive
prediction of incidents, thereby improving decision-making when a highway task or
operation is going to be executed.
A sequential explanatory mixed method, which employs both qualitative and
quantitative data (Roberts et al., 2021), was adopted for this study, where quantitative
data predominated in the analysis conducted. This sequential method comprises five
phases, namely, (1) data collection, (2) data pre-processing, (3) model selection, (4) data
balancing and (5) model evaluation – refer to Figure 1. The data collection phase identified 22
independent variables such as region, site and/or project, location, weather and/or visibility
and season, and the dependent variable, which is “event type,” includes nine different classes,
namely, personal illness/injury (PI) type, undesirable circumstance/near miss (UN), security
(SC), environmental (EN), infrastructure (IF), facilities/site (FS), structural safety (SS), utility strike (US) and
incursion/impact protection vehicle (IPV) strike (IS) (refer to Table 2). The data were then
cleansed as part of the data pre-processing phase to improve data accuracy and quality.
Investigations carried out during the model selection phase were analysis of the correlation of
feature variables and analysis of feature importance using RF. These analyses were carried
out to ascertain which independent variables were most influential in the performance of the
classification model. The data balancing phase also evaluated and compared data balancing
algorithms and their impact on model training. The final model evaluation stage involved a
performance evaluation and comparison of the performance of the three ML models
employed.
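
The feature importance analysis mentioned above can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses synthetic data in place of the incident records; the feature names, sample size and forest settings are hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the incident data (assumption: 8 candidate features).
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Fit a random forest and read its impurity-based feature importances.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Rank features from most to least influential.
ranking = np.argsort(importances)[::-1]
ranked = [(feature_names[i], float(importances[i])) for i in ranking]
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A subset of the top-ranked features could then be passed to the classifiers; where to cut the ranking is a modelling choice that this sketch leaves open.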

Figure 1. The sequential method adopted
Table 2. Description of target variable classes

| Target variable | Description | Examples |
|---|---|---|
| Personal illness/injury (PI) | Situation where an HTO experiences a medical issue or sustains an injury while on duty | Fell, slipped or tripped from height or the same level, attacked by an animal, collision, hit by a moving object/plant/vehicle or falling object |
| Undesirable circumstance/near miss (UN) | Events that have the potential to cause harm but are narrowly avoided or stopped before any adverse consequences occur | Security threat, technology failure, contact with hazard, slips, trips and falls without injuries |
| Security (SC) | Events that compromise the safety and well-being of highway traffic officers due to intentional or malicious actions | Intimidating behaviour, physical assault, racial abuse, verbal abuse or insult, object(s) thrown at road worker, vehicle driven at road worker |
| Environmental (EN) | Events that affect the natural surroundings or the external conditions in which HTOs operate | Disturbance of natural site/ecology, heritage/archaeology, land contamination, nuisance (noise, light, odour, vibration, dust, steam), spill, leak or uncontrolled discharge, weather |
| Infrastructure (IF) | Incidents related to the physical structures, technology and components of the transportation system | Failure or damage of technology, communication and signals, hard shoulder misuse, cone strike, live carriageway crossing |
| Facilities/site (FS) | Hazards associated with the specific location or site where highway traffic officers are stationed | Cable management, car park, cleaning, fire evacuation, grounds maintenance, temperature, housekeeping, pest control and waste management |
| Structural safety (SS) | Risks associated with the integrity of buildings, bridges, or other structures in the highway environment | Collision with superstructure, substructure, parapet or vehicle containment barrier, bridge – fire, flood, scour, bridge component – steel failure, corrosion, component loss, concrete deterioration/damage, post tensioning |
| Utility strike (US) | Incidents where underground utilities, such as gas, water, or electrical lines, are accidentally damaged or struck during highway activities | Utility/service strike: CCTV, electricity, gas, oil, drainage, telecom, other cables or pipelines, water |
| Incursion/IPV strike (IS) | Events where unauthorised vehicles enter restricted areas or where highway traffic officers’ vehicles are struck | Incursion: intentional (because of breakdown, breach of rolling roadblock, to seek benefit, to seek information); blue light incursion; incursion: unintentional (driver confused, follow-in incursion, result of accident); IPV strike |

Source(s): Table by authors

3.1 Data collection

A historical dataset consisting of highway incident reports recorded and stored on a bespoke
incident and accident reporting database known as the Highway Accident Reporting Tool
(HART) (populated and owned by National Highways) provided both independent and
dependent variable data for modelling. Although HART contains other dataset variables
(such as claims made, cost of claims, etc.) they were considered irrelevant to this specific
research and hence were excluded from further analysis. In total, 72,811 incident cases
reported between 2017 and 2022 were selected from the database with nine different classes
of various incidents (event types) which is the target variable (i.e. PI, UN, SC, EN, IF, FS, SS,
US and IS). At the time of conducting the analysis, this timeframe included all available data,
and although the COVID-19 pandemic occurred during this period, its true impact upon
traffic flow on motorways (which are covered by National Highways) can only be determined
when a longer time series is made available (before and after the pandemic). The class
distribution for the target variable is imbalanced (i.e. classes are not equally represented in
the data), with UN being the most frequently occurring class.

3.2 Data pre-processing

Figure 2 depicts the process for the data pre-processing stage. First, the Python library
Pandas was used to upload the data from the cloud storage into Jupyter notebooks to allow
other libraries to access the data for its operations. Data cleaning techniques (Xu et al., 2020)
were then used to identify and amend errors (such as data duplication and data omissions)
present in the data. The first task was managing missing data. Columns and rows
containing less than 50% of the required data were dropped from the dataset. Missing
values in the remaining rows were filled in using information from rows with similar and
corresponding features.
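
The missing-data handling described above could be sketched in Pandas as follows. This is an illustration under stated assumptions rather than the authors' code: the column names and toy records are invented, the helper names `drop_sparse` and `fill_from_similar` are hypothetical, and "similar rows" is interpreted here as rows sharing the same value of a grouping column, with the most frequent value used as the fill.

```python
import numpy as np
import pandas as pd


def drop_sparse(df: pd.DataFrame, min_fraction: float = 0.5) -> pd.DataFrame:
    """Drop columns, then rows, with less than `min_fraction` of values present."""
    # Keep columns in which at least min_fraction of the rows are populated.
    col_thresh = int(np.ceil(min_fraction * len(df)))
    df = df.dropna(axis=1, thresh=col_thresh)
    # Keep rows in which at least min_fraction of the remaining columns are populated.
    row_thresh = int(np.ceil(min_fraction * df.shape[1]))
    return df.dropna(axis=0, thresh=row_thresh)


def fill_from_similar(df: pd.DataFrame, target: str, by: str) -> pd.DataFrame:
    """Fill missing `target` values with the most frequent value among rows sharing `by`."""
    def _mode_fill(values: pd.Series) -> pd.Series:
        modes = values.mode()
        return values.fillna(modes.iloc[0]) if not modes.empty else values

    out = df.copy()
    out[target] = out.groupby(by)[target].transform(_mode_fill)
    return out


# Toy records (hypothetical columns, not the HART schema).
raw = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South", None],
    "weather": ["clear", None, "clear", "rainy", "rainy", None],
    "notes": [None, None, None, None, "spill", None],
})

cleaned = fill_from_similar(drop_sparse(raw), target="weather", by="region")
print(cleaned)
```

In this toy run the sparse `notes` column and the empty last row are dropped, and the missing `weather` value is filled from the other records in the same region.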
Other features which would be pertinent in deriving insights from data exploration were
extracted premised upon extant variables in a feature engineering process. For example, the
variables, “day of week” and “month” of incident were extracted from the variable “time of
event”. The variable “project risk level” was also derived from the “actual risk severity”
variable. The “actual risk severity” column, which ranges from 1 to 25, is used to compute
and classify the project risk level into high (20–25), medium (10–19) and low (1–9) based on
National Highways GG104 requirement for safety risk assessment (GG 104, 2018). A subset
of the features considered to be most significant was adopted as inputs for the prediction
model. This was done using an RF feature selection method. This approach allows the ML
models to have a better prediction accuracy (Jain and Saha, 2022). After the dataset cleaning,
64,912 data entries constituted the final data set (refer to Table 3).
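
As an illustration of the feature engineering step described above, the sketch below derives the day of week, the month and a three-level project risk label from a timestamp and a 1 to 25 severity score. The column names and example rows are assumptions made for the example; only the band boundaries (low 1–9, medium 10–19, high 20–25) come from the text.

```python
import pandas as pd

# Toy incident records (hypothetical column names and values).
events = pd.DataFrame({
    "date_time_of_event": pd.to_datetime(
        ["2021-01-04 08:30", "2021-07-17 23:10", "2022-10-05 14:00"]
    ),
    "actual_severity": [4, 12, 25],
})

# Time-based features extracted from the event timestamp.
events["day_of_week"] = events["date_time_of_event"].dt.day_name()
events["month"] = events["date_time_of_event"].dt.month

# Risk bands: low (1-9), medium (10-19), high (20-25).
# pd.cut uses right-closed intervals here: (0, 9], (9, 19], (19, 25].
events["project_risk_level"] = pd.cut(
    events["actual_severity"],
    bins=[0, 9, 19, 25],
    labels=["low", "medium", "high"],
)

print(events[["day_of_week", "month", "project_risk_level"]])
```

Binning with `pd.cut` keeps the band boundaries in one place, so they can be adjusted if the risk assessment requirement changes.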

Figure 2. The data pre-processing procedure
Table 3. Description of variables used in modelling

| Indicators | Independent variable | Data type | Meaning | References |
|---|---|---|---|---|
| Environmental indicators | Published Record ID | Integer | ID number for data point | Zhang et al. (2016) |
| | Region | Categorical | The region where project is based | Zhang et al. (2012) |
| | Site/Project | Categorical | The site where project is based | Sarkar et al. (2020) |
| | Location | Categorical | The location of the project site | Tixier et al. (2016) |
| | Did this event occur on the SRN? | Categorical | Is the incident related to the strategic road network (SRN)? (Yes/No) | Tixier et al. (2016) |
| | Weather/Visibility | Categorical | The visibility at time of incident (rainy, stormy, clear, windy) | Ajayi et al. (2020) |
| | Season | Categorical | The season of the incident (winter, summer, spring, and autumn) | Rogovski et al. (2021) |
| Social/demography indicators | Experience in current role | Integer | The number of years worker has been working in that position | Sarkar et al. (2020) |
| | Age range | Integer | The age of the worker | Alizadeh et al. (2015) |
| Time-based indicators | Month | Integer | The month of incident | Meng et al. (2020) |
| | Date and time of event | Date time | The date and time incident occurred | Tsoukalas and Fragiadakis (2016) |
| | Year | Categorical | Year of incident | Chiang et al. (2018) |
| | Day_of_week | Categorical | The day of the week incident happened (Monday-Sunday) | |
| | Time_of_day | Categorical | The time of the day (morning, afternoon, evening, and night) | Namian et al. (2016) |
| Agent-based indicator | Vehicles involved? | Categorical | Whether a vehicle was a party to incident (Yes/No) | Kidando et al. (2021) |
| Incident-based indicators | “Injury occurrence” | Categorical | The likelihood of an injury occurring (True/False) | Cheng et al. (2012) |
| | “Part of Body Affected” | Categorical | The part of the body likely to be affected (head, hand, waist, leg etc) | Ajayi et al. (2020) |
| | “Project risk level” | Categorical | The likely severity of project risk (high, medium, low) | Meng et al. (2020) |
| | Actual severity rating | Integer | What the actual impact was (1–25) | Tixier et al. (2016) |
| | Potential severity rating | Integer | What the possible impact of incident could be (1–25) | |
| Dependent variable | “Event type” | Categorical | The kind of incident likely to occur (personal illness/injury, undesirable circumstance, security, environment and infrastructure) | Sarkar et al. (2020) |

Source(s): Table by authors
3.3 Model building
Python 3.0 (Anaconda) was the platform used in model building (Blagus and Lusa, 2013).
A method known as the 10-fold cross-validation (Malakouti et al., 2023) was also used to
validate the model by randomly dividing the data (selected features) into ten folds and
applied in ten experiments. In each experiment, the data are divided into training and test
datasets. The model was then trained, and the accuracy of each model was determined by
averaging the accuracy across the ten experiments. Due to the class imbalance of the target
variable, the best evaluation metrics used to assess how well the model performed were the
precision, recall, F1-score and the accuracy score of each class (Sarkar et al., 2020). The best
performing model was then determined after comparing the performance of each model.
Figure 3 depicts the model building process.
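
A minimal sketch of this validation procedure is given below, assuming scikit-learn is available. It is not the authors' pipeline: the data are synthetic and imbalanced by construction, and the hyperparameters (polynomial degree 3, 50 trees, Gaussian NB, feature scaling before the SVM) are illustrative choices. Each model is scored by mean accuracy over ten randomly assigned folds, and per-class precision, recall and F1-score are computed from the out-of-fold predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced three-class data standing in for the incident records.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)

# The three classifier families compared in the study (settings are assumptions).
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3)),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "NB": GaussianNB(),
}

# Ten folds with random assignment of records to folds.
folds = KFold(n_splits=10, shuffle=True, random_state=0)

mean_accuracy = {}
per_class = {}
for name, model in models.items():
    # Accuracy averaged over the ten train/test experiments.
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    mean_accuracy[name] = float(scores.mean())
    # Per-class metrics from the pooled out-of-fold predictions.
    predictions = cross_val_predict(model, X, y, cv=folds)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y, predictions, zero_division=0
    )
    per_class[name] = {"precision": precision, "recall": recall, "f1": f1}

for name in models:
    print(name, round(mean_accuracy[name], 3), per_class[name]["f1"].round(3))
```

With strongly imbalanced targets, `StratifiedKFold` is a common alternative that keeps class proportions similar across folds; plain `KFold` is used here only to mirror the random division described in the text.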
3.3.1 Support Vector Machine. SVM is the primary ML method used in building the
prediction model and represents a mathematical operational concept that was derived from
statistical learning theory (Bhavsar and Panchal, 2012). SVM is a learning algorithm for
pattern classification and regression. Finding the ideal linear hyperplane with acceptable
generalisation performance or the anticipated classification error for unknown test samples
is the fundamental training premise for SVMs (Bhavsar and Panchal, 2012).

Figure 3. Flowchart of model building experiment

In SVM, the number of features that exist in the dataset determines the dimensional space
available for plotting individual observations. For example, if there is F number of features in
the dataset, then each observation is plotted as a point in an F-dimensional space (Bhavsar
and Panchal, 2012). The presence of a hyperplane in SVM serves as a decision boundary,
which distinguishes the two classes in the feature plane (Mun et al., 2017). A data point
located either side of the hyperplane can be recognised as a distinct class (Ding et al., 2014).
The data points nearest to the hyperplane are referred to as the support vectors, which
influence the orientation and positioning of the hyperplane (Ding et al., 2014). If a straight
line can be used to categorise the data into two sets, then the task is termed a linear SVM
problem. According to the original formulation for SVM (Cortes and Vapnik, 1995), the
hyperplane can be expressed as follows:

$$\mathbf{m}^{T}\mathbf{x} + c = 0 \qquad (1)$$

where m is an F-dimensional vector and c is a scalar.

Given linear data, the hyperplane f(x) = 0 differentiating the given data can be
expressed as follows:

$$f(x) = \mathbf{m}^{T}\mathbf{x} + c = \sum_{j=1}^{s} m_j x_j + c = 0 \qquad (2)$$
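
To make the decision rule concrete, the short sketch below evaluates f(x) as a weighted sum plus offset and assigns a class from its sign. The weight vector, offset and test points are arbitrary illustrative values, and the function names are hypothetical.

```python
def decision_value(m, x, c):
    """Return f(x) = sum_j m_j * x_j + c for weights m, point x and offset c."""
    return sum(m_j * x_j for m_j, x_j in zip(m, x)) + c


def classify(m, x, c):
    """Assign +1 to points on or above the hyperplane f(x) = 0, otherwise -1."""
    return 1 if decision_value(m, x, c) >= 0 else -1


# Arbitrary two-feature example (values are assumptions for illustration).
m = [1.0, -2.0]
c = 0.5

print(decision_value(m, [3.0, 1.0], c), classify(m, [3.0, 1.0], c))
print(decision_value(m, [0.0, 1.0], c), classify(m, [0.0, 1.0], c))
```

Points with f(x) exactly equal to zero lie on the hyperplane itself; treating them as the positive class is a convention adopted only for this sketch.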

However, the present research involves a multi-class classification task in which the target
variable has nine classes; hence, the use of a plane rather than a line is more appropriate
(Mun et al., 2017). SVMs were initially intended for binary classification, and complex
real-world data often cannot be separated using traditional linear SVMs
(Tixier et al., 2016). SVM therefore uses the kernel technique to generalise the decision
boundary to a non-linear hyperplane (Guo et al., 2021). The resultant algorithm is similar, but each scalar
product is replaced by a non-linear kernel function (Cortes and Vapnik, 1995).
3.3.1.1 The kernel technique. The kernel algorithm is a mathematical technique that
enables SVM to classify an ensemble of initially one-dimensional data in a “two-dimensional”
manner (Guo et al., 2021). The four kernel functions adopted for this study, given a kernel
“W,” are:
(1) Polynomial kernel
Expressed as:

$$W(v_i, v_j) = (\beta v_i^{T} v_j + 1)^{d}, \quad \beta > 0 \qquad (3)$$

where $W(v_i, v_j)$ is the value of the kernel function for two input data points $v_i$ and $v_j$, $\beta$ is the
constant that determines the scaling of the dot product between $v_i$ and $v_j$, and $d$ is the degree of the
kernel function, an adjustable parameter which makes the kernel flexible.
(2) Linear kernel
Expressed as:

$$W(v_i, v_j) = v_i^{T} v_j \qquad (4)$$
(3) Gaussian radial basis function (RBF) kernel
Expressed as:

$$W(v_i, v_j) = \exp(-\beta \lVert v_i - v_j \rVert^{2}) \qquad (5)$$

where $\lVert v_i - v_j \rVert$ is the Euclidean distance between vectors $v_i$ and $v_j$ and $\beta$ is the positive
constant which controls the shape and reach of the RBF kernel function.
(4) Sigmoid kernel function
Expressed as:

$$W(v_i, v_j) = \tanh(\beta v_i^{T} v_j + c) \qquad (6)$$

where $\beta$ is the adjustable parameter that establishes the weight or relevance of the dot
product of the input vectors $v_i$ and $v_j$, and $c$ is a constant offset.
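The four kernel functions in Equations (3) to (6) can be sketched in a few lines of plain Python. The parameter names (beta, degree, c), their default values and the two sample vectors are assumptions made for illustration; the study does not report the values it used.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, beta=1.0, degree=3):
    """Equation (3): (beta * u.v + 1) ** degree, with beta > 0."""
    return (beta * dot(u, v) + 1.0) ** degree


def linear_kernel(u, v):
    """Equation (4): the plain dot product."""
    return dot(u, v)


def rbf_kernel(u, v, beta=0.5):
    """Equation (5): exp(-beta * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-beta * squared_distance)


def sigmoid_kernel(u, v, beta=0.1, c=0.0):
    """Equation (6): tanh(beta * u.v + c)."""
    return math.tanh(beta * dot(u, v) + c)


# Two hypothetical input points.
v_i = [1.0, 2.0]
v_j = [3.0, 0.0]
kernel_values = {
    "linear": linear_kernel(v_i, v_j),                       # 1*3 + 2*0 = 3
    "polynomial": polynomial_kernel(v_i, v_j, 1.0, 2),       # (3 + 1) ** 2 = 16
    "rbf": rbf_kernel(v_i, v_j),                             # exp(-0.5 * 8) = exp(-4)
    "sigmoid": sigmoid_kernel(v_i, v_j),                     # tanh(0.3)
}
```

Each function returns a similarity score for a pair of points; an SVM substitutes this score for the scalar product in the linear formulation.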
3.3.2 Random forest classifier. RF is an ML classification method that comprises an
ensemble of uncorrelated decision trees (forest of trees), which collectively determine how
new objects are classified (Paul et al., 2018). Each individual decision tree in the
random forest has a vote and predicts a class (Rigatti, 2017). The class with the
majority of votes becomes the model’s prediction.
For the nine-class problem in this study, the prediction of an RF model can be
represented as follows:

$$C_i = \mathrm{Mode}(f_1(x_i), f_2(x_i), \ldots, f_N(x_i)) \qquad (7)$$

where $C_i$ is the predicted class for the i-th sample; N is the number of trees in the random
forest; $f_n(x_i)$ is the class predicted by the n-th decision tree for the i-th sample; and the Mode
function returns the class that is predicted most frequently across all
decision trees.
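The voting rule in Equation (7) can be illustrated with a short sketch. The three "trees" below are hand-written rules standing in for trained decision trees, and the feature names and thresholds are hypothetical; only the class codes (UN, PI, IF) come from the study.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the modal class among the trees' votes for sample x.

    Ties are resolved in favour of the class that was voted for first.
    """
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained decision trees.
def tree_a(x):
    return "PI" if x["vehicles_involved"] > 1 else "UN"


def tree_b(x):
    return "PI" if x["poor_visibility"] else "UN"


def tree_c(x):
    return "IF" if x["vehicles_involved"] > 2 else "UN"


sample = {"vehicles_involved": 2, "poor_visibility": True}
predicted = forest_predict([tree_a, tree_b, tree_c], sample)  # votes: PI, PI, UN
```

In a real random forest each tree is trained on a bootstrap sample with a random subset of features, which is what keeps the trees weakly correlated.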
3.3.3 Gaussian NB. NB classifiers are a family of simple probabilistic classifiers that apply
Bayes’ theorem under a strong (naive) assumption of independence between the features (Li et al., 2022).
To train the model, the prior probability of each class and the conditional probabilities of
each feature given the class are calculated, taking into consideration the NB hypothesis that the
features are conditionally independent given the class label (Farid et al., 2014). Using
Bayes’ theorem and the conditional probabilities computed during training, the probability
that a data point belongs to each of the nine classes is calculated, and the class with the highest
calculated probability is the predicted class.
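The training and prediction steps described above can be sketched as follows for numeric features. This is an illustrative from-scratch version, not the implementation used in the study: the toy data, the class labels "A" and "B" and the small var_smoothing constant (added to avoid division by zero) are all assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate the class priors and per-feature Gaussian parameters for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((value - mean) ** 2 for value in col) / n + var_smoothing
            for col, mean in zip(columns, means)
        ]
        model[label] = {"prior": n / len(X), "means": means, "variances": variances}
    return model


def log_posterior(model, x):
    """Return the unnormalised log-probability of each class given x."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for value, mean, var in zip(x, params["means"], params["variances"]):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Return the class with the highest posterior score."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Toy data: two numeric features, two hypothetical classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y_train = ["A", "A", "A", "B", "B", "B"]
nb_model = fit_gaussian_nb(X_train, y_train)
label_near_a = predict(nb_model, [1.1, 2.1])
label_near_b = predict(nb_model, [4.1, 4.9])
```

Working in log-probabilities avoids numerical underflow when many features are multiplied together; categorical features would need to be encoded numerically before a Gaussian likelihood is meaningful.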

3.4 Data balancing

Figure 4 shows a significant disparity in the quantity of data distributed among the classes of
the target variable (event type), being heavily biased towards “undesired circumstances/near
miss”. For such a major difference in the amount of data allocated to each class, the challenge
of class imbalance arises (Sarkar et al., 2020). This type of data set is known as an imbalanced
dataset (Sarkar et al., 2020). To handle the issue of data imbalance in this study, three
balancing algorithms, namely Synthetic Minority Over-sampling Technique (SMOTE)
(Chawla et al., 2002), random undersampling (RU) (Estabrooks and Japkowicz, 2001) and
random over-sampling (RO) (Shelke et al., 2017) were applied to each of the three ML models
used in the experiment (SVM, NB and RF) separately. For data consistency and reliability when
applying balancing algorithms to an imbalanced data set, synthetic data were generated to augment
the training sets while maintaining their statistical properties.

Figure 4. Histogram showing the percentage of samples per class (unbalanced dataset)

RO is the process of augmenting training data with multiple replications of some minority
classes (Shelke et al., 2017). SMOTE is an over-sampling approach applied to an unbalanced (UB) dataset
that creates artificial samples for the minority class (Chawla et al., 2002), while RU is a
non-heuristic approach that attempts to balance class distributions by
removing instances at random from the majority class (Estabrooks and Japkowicz,
2001). Although SMOTE and RO are both over-sampling techniques, in contrast to RO, the
SMOTE algorithm oversamples the minority class by creating artificial cases as opposed
to oversampling with replacement (Fernández et al., 2018). The
SMOTE algorithm generates these synthetic instances in feature space rather than data space (Blagus
and Lusa, 2013).
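The difference between the three balancing strategies can be made concrete with a small sketch. The functions below are simplified illustrations written for this purpose: smote_like_oversample interpolates between a minority row and one of its k nearest same-class neighbours, which is the core idea of SMOTE but not the reference implementation, and the toy rows, the value of k and the random seed are assumptions.

```python
import random
from collections import defaultdict


def group_by_class(X, y):
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_oversample(X, y, rng):
    """RO: duplicate rows (sampling with replacement) up to the largest class size."""
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, rng):
    """RU: randomly discard rows until every class matches the smallest class."""
    groups = group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def smote_like_oversample(X, y, rng, k=2):
    """SMOTE-style: add synthetic rows on the segment between a row and a near neighbour."""
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        synthetic = []
        while len(rows) + len(synthetic) < target:
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            if not others:  # a single-row class cannot be interpolated, so copy it
                synthetic.append(list(base))
                continue
            # Keep the k nearest same-class neighbours by squared Euclidean distance.
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        X_out.extend(rows + synthetic)
        y_out.extend([label] * target)
    return X_out, y_out


# Toy imbalanced data: six majority rows and two minority rows.
X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.1], [0.2, 0.2],
         [5.0, 5.0], [5.5, 5.2]]
y_toy = ["UN"] * 6 + ["PI"] * 2
rng = random.Random(0)
X_ro, y_ro = random_oversample(X_toy, y_toy, rng)
X_ru, y_ru = random_undersample(X_toy, y_toy, rng)
X_sm, y_sm = smote_like_oversample(X_toy, y_toy, rng)
```

RO only repeats rows that already exist, RU throws information away, and the SMOTE-style routine produces new points that lie between existing minority rows.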

3.5 Performance evaluation

True positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs) are the four
types of classification outcomes in the frequently employed two-class confusion matrix
(Xu et al., 2020) for model evaluation. Five metrics, namely accuracy, sensitivity (recall), specificity,
precision and F1, were proposed based on the four outcomes (Yacouby and Axman, 2020).
These five metrics were employed in this study. The following equations define the
metrics used for performance evaluation in this research:
Precision: measures the proportion of instances predicted as positive that are actually positive.

$$\text{precision} = \frac{TP}{TP + FP} \qquad (8)$$

Recall: evaluates the predictive ability of the model for positive samples. It is the proportion
of actual positive cases that were correctly categorised as positive.

$$\text{recall} = \frac{TP}{TP + FN} \qquad (9)$$

Specificity: measures the proportion of actual negative instances correctly predicted as
negative.

$$\text{specificity} = \frac{TN}{TN + FP} \qquad (10)$$

F1-score: it is the harmonic mean of recall and precision. It offers a balanced evaluation of
precision and recall, which is particularly valuable when classes are imbalanced.

$$\text{F1-score} = \frac{2(\text{precision} \times \text{recall})}{\text{precision} + \text{recall}} \qquad (11)$$

Accuracy score: measures the proportion of accurate predictions (both TPs and TNs) out of all
instances.

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (12)$$
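For a multi-class problem, the four confusion-matrix outcomes are usually obtained one class at a time by treating that class as "positive" and every other class as "negative". The sketch below computes the five metrics in this one-vs-rest fashion; accuracy uses the standard denominator TP + TN + FP + FN. The toy labels are invented for illustration and the choice to return 0.0 when a denominator is zero is an assumption.

```python
def per_class_counts(y_true, y_pred, positive):
    """Count TP, FP, FN and TN when `positive` is treated as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_divide(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": safe_divide(tn, tn + fp),
        "f1": safe_divide(2 * precision * recall, precision + recall),
        "accuracy": safe_divide(tp + tn, tp + tn + fp + fn),
    }


# Toy labels using three of the study's class codes.
y_true = ["UN", "UN", "UN", "UN", "PI", "PI", "IF", "IF"]
y_pred = ["UN", "UN", "UN", "PI", "PI", "UN", "IF", "UN"]
pi_counts = per_class_counts(y_true, y_pred, "PI")  # (TP, FP, FN, TN) = (1, 1, 1, 5)
pi_metrics = classification_metrics(*pi_counts)
```

Here the accuracy for "PI" is 0.75 even though only half of the actual "PI" cases are found, which is why precision and recall are inspected per class.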

4. Analysis results
The results obtained for the predictive modelling experiments performed indicated a
potential feasibility of applying ML models in accurately predicting incidents.

4.1 Feature importance in prediction model

The RF classifier was used to determine the relevance of each attribute in the predictive
model. In a study to predict types of occupational accidents occurring on construction sites,
Kang and Ryu (2019) used the RF method to determine the most important features that
contribute to the prediction. RF has an embedded feature importance score and could be
adopted for classification and regression tasks (Li et al., 2019). Gini Importance (GI) is the
specific feature importance indicator that is utilised in this study (Nembrini et al., 2018). When
the data are split on a feature’s values within the decision trees of an RF, GI quantifies how
much each feature contributes to the decline in impurity (i.e. the degree of disorder in a
dataset) or the rise in purity (i.e. the degree of certainty in a dataset). The GI values sum
to one across all features; the higher a feature’s GI, the greater its significance. Based on the
trained model, feature importance analysis was performed (refer to Figure 5).
From Figure 5, it can be inferred that the top ten predictors of incidents in highway
operations are: weather/visibility (GI = 0.2886); year (GI = 0.1810); age range (GI = 0.1443);
location (GI = 0.0878); vehicles involved (GI = 0.0393); site/project (GI = 0.0391); injury
type (GI = 0.0324); region (GI = 0.0283); part of body affected (GI = 0.0266) and season
(GI = 0.0264).
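The idea behind GI can be shown on a single split. The sketch below computes the Gini impurity of a node, the impurity decrease achieved by a split, and the normalisation that makes the importances sum to one. The labels and the two candidate splits are invented; in a full RF the decreases are accumulated over every node and tree, weighted by the share of samples reaching each node, so this is a simplification.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def normalise(raw_importances):
    """Scale the raw decreases so that they sum to one."""
    total = sum(raw_importances.values())
    return {name: value / total for name, value in raw_importances.items()}


# Hypothetical node with six "UN" and two "PI" samples.
parent = ["UN"] * 6 + ["PI"] * 2

# Hypothetical split on feature A separates the classes perfectly.
decrease_a = impurity_decrease(parent, ["UN"] * 6, ["PI"] * 2)

# Hypothetical split on feature B leaves one child mixed.
decrease_b = impurity_decrease(parent, ["UN"] * 4, ["UN"] * 2 + ["PI"] * 2)

importances = normalise({"feature_a": decrease_a, "feature_b": decrease_b})
```

Feature A removes all of the parent's impurity (0.375) while feature B removes a third of that amount (0.125), so their normalised importances are 0.75 and 0.25.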

4.2 Class imbalance handling

The 70% of the selected sample chosen as training data comprised 38,509 data entries from
the initial dataset. Each class had the following number of data entries: “UN” (f = 34,491), “PI”
(f = 1,188), “IF” (f = 933), “FS” (f = 545), “IS” (f = 389), “EN” (f = 312), “SC” (f = 265), “SS”
(f = 203) and “US” (f = 183). The under-sampled, over-sampled and synthetically sampled
balanced datasets were created utilising three distinct resampling strategies to solve the
imbalanced dataset problem. The totals from all employed datasets and their distribution
among the incident type classes are shown in Table 4. Evidently, all the methods were
successful in balancing the data: SMOTE and RO raised every class to the size of the majority
class by adding instances, whereas RU reduced every class to the size of the smallest class,
thereby potentially improving the model’s generalisation ability (Sarkar
et al., 2020).

Figure 5. Feature importance analysis with RF
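The totals reported in Table 4 follow directly from the class frequencies listed above, as the short check below shows. The class counts are taken from the text; the variable names are illustrative.

```python
# Training-set class frequencies as reported in Section 4.2.
class_counts = {
    "UN": 34491, "PI": 1188, "IF": 933, "FS": 545, "IS": 389,
    "EN": 312, "SC": 265, "SS": 203, "US": 183,
}

n_classes = len(class_counts)
original_total = sum(class_counts.values())

# SMOTE and RO raise every class to the size of the largest class.
oversampled_total = max(class_counts.values()) * n_classes

# RU cuts every class down to the size of the smallest class.
undersampled_total = min(class_counts.values()) * n_classes

# Share of the original training rows that RU keeps.
retained_share_ru = undersampled_total / original_total
```

The over-sampled sets are roughly eight times larger than the original training data, whereas RU keeps only about 4% of it, which affects both training time and how much information each classifier sees.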

4.3 Performance evaluation of classifiers

Each model’s predictive accuracy on test data is examined in order to determine how
well it generalises. The accuracy score, recall, precision and F1-score are employed. These
metrics are frequently used to assess how well a model predicts the ground truth in a multi-
class performance evaluation (Yacouby and Axman, 2020).
4.3.1 Experiment 1: choosing a kernel for SVM. The SVM algorithm’s performance is
heavily influenced by the kernel that is used; however, there is currently no common
guideline regarding which kernel should be utilised (Patle and Chouhan, 2013). Table 5
presents the results for the first experiment session. Four individual kernel types (i.e.
polynomial kernels, RBF kernels, linear kernel and Sigmoid kernel) were examined and
applied separately to the SVM model. This was to compare which kernel had the best
performance on the original dataset. The experimental results presented in Table 5 show that
the polynomial kernel achieved a higher accuracy score than all the other kernels.
Furthermore, only the polynomial kernel was able to predict every class, albeit
the classification precision for some classes was extremely poor. The other kernels were unable to predict
some of the classes. Comparatively, therefore, the polynomial kernel is preferred for the SVM
model in this study.
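One way to summarise the per-class results in Table 5 with a single number is the macro-averaged F1-score, i.e. the unweighted mean of the nine class F1-scores. This summary is not reported in the study; the sketch below simply applies it to the F1 rows of Table 5 (class order EN, FS, IS, IF, PI, SC, SS, UN, US) to show that it points to the same kernel.

```python
# F1-scores per class copied from Table 5 (order: EN, FS, IS, IF, PI, SC, SS, UN, US).
table5_f1 = {
    "POLYNOMIAL": [0.13, 0.39, 0.77, 0.58, 1.00, 0.24, 0.07, 1.00, 0.26],
    "RBF": [0.05, 0.37, 0.74, 0.53, 1.00, 0.00, 0.00, 0.99, 0.04],
    "LINEAR": [0.00, 0.29, 0.75, 0.52, 1.00, 0.02, 0.00, 0.99, 0.00],
    "SIGMOID": [0.00, 0.00, 0.18, 0.00, 0.80, 0.00, 0.00, 0.96, 0.00],
}

# Macro-average: every class counts equally, regardless of its size.
macro_f1 = {kernel: sum(scores) / len(scores) for kernel, scores in table5_f1.items()}

# Number of classes for which a kernel achieved an F1-score of zero.
zero_f1_classes = {
    kernel: sum(1 for score in scores if score == 0.0)
    for kernel, scores in table5_f1.items()
}

best_kernel = max(macro_f1, key=macro_f1.get)
```

Because the macro-average gives the small classes the same weight as "UN", it separates the kernels far more clearly than the overall accuracy scores, which differ by only a few percentage points.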
4.3.2 Experiment 2: comparing the performance with unbalanced and balanced data. The
three algorithms are applied to the dataset independently in experiment two. Performance
metrics of each of the three classifiers are evaluated using the UB and balanced datasets.
First, the ML classifiers were applied to the original UB data. The results of the experiment
with the imbalanced data are detailed in Table 6 and Figure 6 for visual comparison. The
different balancing algorithms were then applied separately to the data, and the balanced data were analysed
using the three classification algorithms separately. Results for each
target class are presented in Table 6, and the percentage accuracy score is
presented for comparison in Figure 6. Table 6 shows the comparative results of all these
algorithms on the UB dataset and the three balanced datasets (SM, RO and RU). For the UB
dataset, it is observed that RF (97%) performs better than the others based on the
accuracy score metric, followed by SVM (96%) and NB (94%).

Table 4. Distribution of classes in target variable

| Dataset | Balancing algorithm | Total | UN | PI | IF | FS | IS | EN | SC | SS | US |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Original dataset | N/A | 38,509 | 34,491 | 1,188 | 933 | 545 | 389 | 312 | 265 | 203 | 183 |
| Synthetic sampling | SMOTE | 310,419 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 |
| Under-sampling | RU | 1,647 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 |
| Oversampling | RO | 310,419 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 | 34,491 |

Source(s): Table by authors

Table 5. Comparative analysis of SVM kernel performance

| Kernel | Metric | EN | FS | IS | IF | PI | SC | SS | UN | US |
|---|---|---|---|---|---|---|---|---|---|---|
| POLYNOMIAL | Precision | 0.24 | 0.37 | 0.79 | 0.50 | 1.00 | 0.57 | 0.27 | 0.99 | 0.38 |
| POLYNOMIAL | Recall | 0.09 | 0.42 | 0.74 | 0.69 | 1.00 | 0.15 | 0.04 | 1.00 | 0.19 |
| POLYNOMIAL | F1-score | 0.13 | 0.39 | 0.77 | 0.58 | 1.00 | 0.24 | 0.07 | 1.00 | 0.26 |
| RBF | Precision | 0.33 | 0.34 | 0.83 | 0.44 | 1.00 | 0.00 | 0.00 | 0.98 | 0.29 |
| RBF | Recall | 0.03 | 0.39 | 0.68 | 0.67 | 1.00 | 0.00 | 0.00 | 1.00 | 0.02 |
| RBF | F1-score | 0.05 | 0.37 | 0.74 | 0.53 | 1.00 | 0.00 | 0.00 | 0.99 | 0.04 |
| LINEAR | Precision | 0.00 | 0.30 | 0.80 | 0.40 | 1.00 | 1.00 | 0.00 | 0.99 | 0.00 |
| LINEAR | Recall | 0.00 | 0.28 | 0.71 | 0.72 | 1.00 | 0.01 | 0.00 | 1.00 | 0.00 |
| LINEAR | F1-score | 0.00 | 0.29 | 0.75 | 0.52 | 1.00 | 0.02 | 0.00 | 0.99 | 0.00 |
| SIGMOID | Precision | 0.00 | 0.00 | 0.38 | 0.01 | 0.79 | 0.00 | 0.00 | 0.93 | 0.00 |
| SIGMOID | Recall | 0.00 | 0.00 | 0.12 | 0.00 | 0.82 | 0.00 | 0.00 | 1.00 | 0.00 |
| SIGMOID | F1-score | 0.00 | 0.00 | 0.18 | 0.00 | 0.80 | 0.00 | 0.00 | 0.96 | 0.00 |

Overall accuracy score: polynomial 0.96; RBF 0.95; linear 0.95; sigmoid 0.92.
Source(s): Table by authors

The high accuracy scores for the models indicate good performance.
Nevertheless, although accuracy score is widely used to judge model performance, it can
be misleading when classes are imbalanced (Sarkar et al., 2020). The individual classes
are therefore evaluated based on the precision, recall and F1-score to ascertain how each class
performed in the model. The balancing algorithms are then applied to the dataset for further
analysis (refer to Figure 7).
It is observed that although the accuracy score decreased for SVM (82%), the individual
classes were classified much more accurately when the SM algorithm
is applied. A model which has a performance of 70% and above is considered a good
performing model (Zhang et al., 2019); therefore, an accuracy score of 82% is satisfactory.
Some classes still performed poorly with the NB and RF classifiers when the SM balancing
algorithm was applied.

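Table 6. Comparative analysis of three ML algorithms and balancing algorithms (all values in %)

| Model | Metric | Data | EN | FS | IS | IF | PI | SC | SS | UN | US |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | Precision | UB | 24 | 37 | 79 | 50 | 100 | 57 | 27 | 99 | 38 |
| SVM | Precision | SM | 71 | 71 | 94 | 73 | 100 | 83 | 78 | 80 | 83 |
| SVM | Precision | RO | 00 | 00 | 38 | 01 | 79 | 00 | 00 | 93 | 00 |
| SVM | Precision | RU | 07 | 09 | 47 | 17 | 100 | 07 | 06 | 99 | 06 |
| SVM | Recall | UB | 09 | 42 | 74 | 69 | 100 | 15 | 04 | 100 | 19 |
| SVM | Recall | SM | 66 | 68 | 95 | 58 | 100 | 75 | 86 | 97 | 91 |
| SVM | Recall | RO | 00 | 00 | 12 | 00 | 82 | 00 | 00 | 100 | 00 |
| SVM | Recall | RU | 14 | 27 | 82 | 27 | 100 | 26 | 28 | 83 | 52 |
| SVM | F1-score | UB | 13 | 39 | 77 | 58 | 100 | 24 | 07 | 100 | 26 |
| SVM | F1-score | SM | 69 | 69 | 94 | 65 | 100 | 79 | 82 | 88 | 87 |
| SVM | F1-score | RO | 00 | 00 | 18 | 00 | 80 | 00 | 00 | 96 | 00 |
| SVM | F1-score | RU | 09 | 13 | 60 | 20 | 100 | 11 | 10 | 90 | 10 |
| SVM | Accuracy score | UB | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 |
| SVM | Accuracy score | SM | 82 | 82 | 82 | 82 | 82 | 82 | 82 | 82 | 82 |
| SVM | Accuracy score | RO | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 |
| SVM | Accuracy score | RU | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 |
| RF | Precision | UB | 27 | 53 | 78 | 58 | 100 | 43 | 31 | 100 | 51 |
| RF | Precision | SM | 38 | 56 | 76 | 61 | 100 | 41 | 17 | 100 | 47 |
| RF | Precision | RO | 53 | 62 | 80 | 60 | 100 | 49 | 33 | 99 | 51 |
| RF | Precision | RU | 26 | 41 | 49 | 52 | 100 | 24 | 21 | 100 | 23 |
| RF | Recall | UB | 18 | 70 | 84 | 73 | 100 | 19 | 11 | 100 | 19 |
| RF | Recall | SM | 20 | 75 | 87 | 73 | 100 | 21 | 08 | 100 | 34 |
| RF | Recall | RO | 19 | 74 | 85 | 78 | 100 | 19 | 12 | 100 | 43 |
| RF | Recall | RU | 18 | 62 | 89 | 39 | 100 | 33 | 39 | 97 | 69 |
| RF | F1-score | UB | 22 | 60 | 81 | 64 | 100 | 27 | 16 | 100 | 28 |
| RF | F1-score | SM | 26 | 64 | 81 | 67 | 100 | 28 | 11 | 100 | 39 |
| RF | F1-score | RO | 28 | 67 | 83 | 68 | 100 | 27 | 18 | 100 | 47 |
| RF | F1-score | RU | 21 | 49 | 63 | 45 | 100 | 28 | 28 | 98 | 34 |
| RF | Accuracy score | UB | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 |
| RF | Accuracy score | SM | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 |
| RF | Accuracy score | RO | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 |
| RF | Accuracy score | RU | 93 | 93 | 93 | 93 | 100 | 93 | 93 | 93 | 93 |
| NB | Precision | UB | 15 | 00 | 53 | 00 | 100 | 57 | 12 | 100 | 44 |
| NB | Precision | SM | 14 | 00 | 47 | 00 | 100 | 59 | 12 | 100 | 38 |
| NB | Precision | RO | 16 | 00 | 46 | 00 | 100 | 53 | 11 | 100 | 44 |
| NB | Precision | RU | 23 | 24 | 57 | 67 | 100 | 50 | 15 | 100 | 22 |
| NB | Recall | UB | 25 | 00 | 88 | 00 | 100 | 07 | 84 | 99 | 11 |
| NB | Recall | SM | 19 | 00 | 88 | 00 | 100 | 09 | 90 | 99 | 08 |
| NB | Recall | RO | 20 | 00 | 66 | 00 | 100 | 07 | 87 | 99 | 11 |
| NB | Recall | RU | 07 | 58 | 86 | 03 | 100 | 09 | 70 | 99 | 02 |
| NB | F1-score | UB | 19 | 00 | 66 | 00 | 100 | 13 | 20 | 99 | 18 |
| NB | F1-score | SM | 16 | 00 | 61 | 00 | 100 | 15 | 21 | 99 | 13 |
| NB | F1-score | RO | 18 | 00 | 80 | 00 | 100 | 13 | 19 | 99 | 18 |
| NB | F1-score | RU | 11 | 34 | 69 | 05 | 100 | 16 | 25 | 100 | 04 |
| NB | Accuracy score | UB | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 |
| NB | Accuracy score | SM | 94 | 94 | 94 | 94 | 93 | 94 | 94 | 94 | 94 |
| NB | Accuracy score | RO | 93 | 93 | 93 | 93 | 94 | 93 | 93 | 93 | 93 |
| NB | Accuracy score | RU | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 |

Source(s): Table by authors

Figure 6. Accuracy score comparison for imbalanced data (accuracy score (%) by ML algorithm: SVM, 96; RF, 97; NB, 94)
Source(s): Figure by authors

Figure 7. Accuracy scores for the three models with the unbalanced and balanced data (accuracy score (%) by balancing algorithm: UB, SM, RO and RU; series: SVM, RF and NB)
Source(s): Figure by authors

5. Discussion
The number of features, attributes or variables used for input in ML is referred to as its
dimensionality (Jia et al., 2022). Therefore, dimensionality reduction is essentially the
technique of decreasing the number of variables in a dataset while retaining as much
variance as possible in the original dataset (Zebari et al., 2020). Several ML studies have
leveraged the proficiency of dimensionality reduction in handling multicollinearity, reducing
training time and preventing overfitting (Huang et al., 2019; Hasan and Abdulazeez, 2021).
However, while the literature is replete with several techniques that can be employed in
dimensionality reduction, such as factor analysis (Huang et al., 2019; Sing et al., 2022) and
principal component analysis (Hasan and Abdulazeez, 2021), there is no specific technique
recommended for a particular ML task. For instance, Zhang et al. (2012) used Pearson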
correlation to examine the relationship between categorical and continuous auxiliary
variables of soil organic matter. The variables selected for the model building include “type_
of_work”, “year”, “day_of_week”, “location”, “age range”, “weather/visibility”, “month”,
“season”, “time of day”, “site/project”, “part of body affected”, “experience in current role”,
“injury type”, “region” and “vehicles involved.” Hence, the dimensionality of the dataset was
reduced from 22 to 16. Given the novelty of this present research study, it is impractical to
compare these results against other similar studies conducted (cf. Tixier et al., 2016; Kidando
et al., 2021). That said, this work now provides a precedent for other follow-on studies to
challenge the findings reported herein and improve the performance of future
models developed. Moreover, identification of the selected variables acts as a first step
towards developing practical solutions to reducing risk. It should be emphasised that a
model per se will not eliminate risks and accidents but information contained within provides
the first step towards developing a human-centric knowledge management organisation
within National Highways. As additional and new data are collected, further refinements can
be made to ensure that knowledge is distributed throughout the organisation for the
betterment of all within the workforce. Indeed, promoting a safe and secure working
environment for all workers is an important sustainable development goal (SDG).
To investigate the significance of the variables selected, the RF method was used to probe
which variables had the strongest GI (Nembrini et al., 2018). Notably, the variables selected were
confirmed as important by the RF method. Also, the variables found to be
significant in predicting safety incidents indicate that it is imprudent and insufficient to
rely solely on safety-related data when predicting safety incident types. Rather,
project-related data (such as the project location and site/project) and demographic data (such as age of
workers) could have substantial impact on prediction outcomes.
The challenge with imbalanced data is that when a model is trained on imbalanced data, it
learns that it can obtain high accuracy by constantly predicting the majority class,
irrespective of whether recognising the minority class is equally or more significant when
applied to a real-world scenario (Sarkar et al., 2020). The three balancing algorithms, namely
SMOTE, RU and RO were therefore applied to each of the three ML models used in the
experiment (SVM, NB and RF) separately. This resulted in having the same number of data
entries for each class. Hence, there was no majority and minority class, which could cause
bias in prediction.
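The effect described above can be quantified with a trivial baseline that always predicts the majority class. The sketch below uses the training-set class frequencies from Section 4.2 and assumes, for illustration only, that the data being scored follow the same distribution; the function name is hypothetical.

```python
# Training-set class frequencies as reported in Section 4.2.
train_counts = {
    "UN": 34491, "PI": 1188, "IF": 933, "FS": 545, "IS": 389,
    "EN": 312, "SC": 265, "SS": 203, "US": 183,
}


def majority_baseline(counts):
    """Score a classifier that labels every sample with the most frequent class."""
    majority = max(counts, key=counts.get)
    total = sum(counts.values())
    accuracy = counts[majority] / total
    # Every majority sample is found; no sample of any other class is.
    recall = {label: (1.0 if label == majority else 0.0) for label in counts}
    macro_recall = sum(recall.values()) / len(recall)
    return majority, accuracy, recall, macro_recall


majority_label, baseline_accuracy, baseline_recall, baseline_macro_recall = (
    majority_baseline(train_counts)
)
```

Under this assumption the baseline reaches an accuracy of roughly 90% while detecting none of the eight minority incident types, so a high accuracy score on its own says little about how well rare incidents are recognised.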
In the experiments to explore the performance of the ML classifiers when certain
balancing algorithms are applied, it is observed that, with the imbalanced dataset, for all
three algorithms, only the classes IS, PI and UN were classified reliably, while the rest of
the classes performed poorly and, in several cases, failed to classify any of the instances correctly. This could
be attributed to these three classes being among the five largest classes in the data.
Sarkar et al. (2020) explain how a model constantly exhibits bias towards majority classes
for imbalanced datasets, leading to poor predictions. This bias is evident in the results
obtained on the original imbalanced dataset for all three classifiers.
When the SMOTE algorithm was applied to all three classifiers, it was observed that the
SVM classifier achieved a precision of 70% or above for every class. For a model to
be deemed useful, it must have a minimum precision between 70 and 80% (Juba and Le,
2019); hence, the SVM + SMOTE algorithm can be considered a useful model for
predicting incident types. However, it is worth noting that the accuracy of the
SVM + SMOTE algorithm was 82%, which is less than the accuracy score of the SVM
algorithm alone. The SVM classifier with the SMOTE balancing algorithm outperformed
the other balancing algorithms (i.e. RO and RU), with 72% being the lowest precision recorded
for a class (0). This result is a clear indication that accuracy score alone is not a good enough
metric to evaluate imbalanced classes (McNee et al., 2006). For RF + SMOTE, although the
accuracy score was 97% (same as with the original data), some of the classes could not be
correctly classified. Some classes such as “EN” had precision as low as 38%, which is
considered very poor performance. NB + SMOTE did not show much promise either. The
accuracy score was 93%, which indicated a good performance, but the precision and recall of
the individual classes showed very poor performance (the lowest precision and recall being
0%). Therefore, with the SMOTE experiment, SVM is considered the best classifier for
incident types.
Premised upon the analysis of the results from the experiment, it can be concluded that
SVM + SMOTE is the best-performing model among the models used and should therefore be
considered as a prime ML model for predicting various incident types in highway operations.
These findings demonstrate the model’s ability to generalise, i.e. the model could reasonably
predict the dependent variable that identifies events in highway operations for a given set
of inputs. The result of this model indicates the possibility of predicting incidents even with
imbalanced datasets. Zhang et al. (2015) performed a similar study using SVM to forecast
the profitability of a project with an imbalanced dataset, which resulted in an accuracy score
between 0.74 and 0.91. However, all predicted values were from the majority class, which
presented a challenge that the proposed model in this study remedies. Augmenting SVM with
balancing algorithms such as SMOTE helps to ensure that the individual predicted values of a
class are a true reflection of the high accuracy score a model presents. Furthermore, this
study demonstrates that incidents are not random events but rather that there
are underlying trends or patterns, based on features found to be significant in predicting incident
outcomes, that could indicate an incident occurrence in time.

5.1 Future work

Not all significant input variables were utilised because some were unavailable. Therefore,
it is recommended that future work be undertaken to explore other significant indicators
highlighted in literature such as economic, technological and spatial-temporal
indicators. Furthermore, this research only employs three ML models based on their
significance in literature. It is recommended that ensemble learning algorithms such as
AdaBoost, Gradient Boosting and XGBoost be adopted to leverage their ability to
combine the strengths of multiple ML models and compensate for the weaknesses of the
individual models used, leading to more accurate predictions. The research could also be
expanded to cover other different types of highway workers, e.g. tier one contractors and
subcontractors working on new road construction or maintenance and repair. Indeed, it would
be interesting and useful to observe differences and similarities between HTOs and other
workers. Such work could help tailor guidance given to all highway workers. Despite the
limitations of this research, the study provides a foundation for using ML models to develop a
customised safety risk incident prediction model for HTOs. This is the first study that uses
data solely sourced from a database dedicated to HTOs and considers factors peculiar to HTOs
in forecasting risk events. This study therefore forms a solid basis for future works, and the
findings can be adapted to various incidents that HTOs are exposed to in other industries.

5.2 Proposed user interface for the model and practical implications
Given the complexity of the modelling developed and the big data set involved, this study proposes
the future development of web-based graphical user interface (GUI) software for safety
officers working on highway projects. Such software will inspire knowledge-based
decision-making backed by evidence and encourage the emergence of a learning organisation. The
software will enable users to enter input variables concomitant to the project operation and then
send instructions to the ML model to forecast the type(s) of incident likely to occur during
that operation (refer to Figure 8). Hence, safety officers can prioritise H&S risk elements
based on their probability of occurrence. Consequently, proper consideration is given to these
risk variables while limiting occurrences in order to deliver a safer environment. Currently,
the process of HTO risk assessment relies heavily on the subjectivity of human judgement,
perception and lived experiences when an incident is to be predicted (Hegde and
Rokseth, 2020). The model presents an efficient approach to discovering meaningful patterns
and trends that cause the various types of incidents, offering an objective alternative for
predicting and visualising incident scenarios.
A key implication of this study is the opportunity it offers safety officers to train HTOs and
educate them on impending risks, provide needed resources (e.g. personal protective
equipment) and pre-emptively address impending risks before they occur. The model also
unearths important attributes or indicators that safety officers should consider when
implementing safety risk prevention strategies. The GUI also allows organisations to collect
new data, which would help them address the challenges of insufficient data. Data collected
can be analysed for deeper insights into why incidents occur, how they occur and what can be
done to prevent them from occurring. Furthermore, a major implication of this study is the
resilience that this model brings to organisations. Some indicators of resilience according to
Chen et al. (2018) include awareness, anticipation, management commitment, response and
reporting culture. This model enables all these measures to be attainable, i.e. predicting
incidents beforehand will allow workers to be aware of impending risks and anticipate the
incident, thereby providing enough time to respond to the incident with proactive measures
to pre-empt it. Entering data variables through the GUI instead of reporting to management
directly also promotes reporting culture, which will encourage employees to record
indicators without fear of being sanctioned. This is because the required data variables do
not have any personal identification indicators.

Figure 8. Proposed highway safety prediction model

6. Conclusion
HTOs face an omnipresent risk of being involved in several incidents that could endanger
their lives and prevent them from returning home safely after work. For highway authorities
to establish an efficient and reliable solution for these incidents, the probability of their
occurrence must be predicted accurately as a first step towards risk mitigation measures
being implemented. Increasingly, Industry 4.0 solutions are being employed
to identify, report and ultimately mitigate risks posed (Newman et al.,
2021). This is because the traditional statistical methods currently used to mitigate risk by UK
highway authorities are largely reactive, a shortcoming which needs to be tackled. ML
algorithms provide an effective technique for analysing complex data, identifying the
nuances of highly complicated patterns in the data and making accurate predictions based on
these patterns. This in turn promotes safe and secure working environments for all workers
as required by SDG 8.8, thus contributing significantly to societal impact.
Therefore, by comparatively analysing the performance of three ML algorithms and three
balancing algorithms, this study developed a novel model for classifying incident risks in
highway operations. This study has shown that year, weather, age, type of work, vehicles
and project location play an essential role in predicting incident types. In this study, 16
variables out of the 22 variables presented were retained after correlation analysis was used
for dimensionality reduction. Ten of the 16 variables were found to have a high
contribution to predicting incident types. The polynomial kernel performed best
among the kernels tested for the SVM classifier.
The SVM classifier with the SMOTE algorithm demonstrated the best performance for each
class of the target variable compared to the other algorithms. The RF algorithm combined with
the RO algorithm also performed well but was not better than SVM. The predictive
model developed in this study has shown sufficiently accurate results, which can be relied upon to help
prevent safety incidents in highway operations. Therefore, practical implications can be
accorded to the results of this study. Namely, the predictive model can augment the efforts of
safety officers by establishing which incidents need more resources and attention at a
particular time for effective decision-making. Preventing incidents will also enable highway
operations to run within budget, encourage timeliness and prevent waste of resources. Above
all, avoiding such incidents could save lives. Knowledge acquired from modelling will also
embed learning within a knowledge management-enabled organisation that uses introspection
(of good and poor practices) to drive improvements forward (cf. Savari et al., 2024).

References
Abbasianjahromi, H. and Aghakarimi, M. (2023), “Safety performance prediction and modification
strategies for construction projects via machine learning techniques”, Engineering Construction
and Architectural Management, Vol. 30 No. 3, pp. 1146-1164, doi: 10.1108/ECAM-04-2021-0303.
Ajayi, A., Oyedele, L., Owolabi, H., Akinade, O., Bilal, M., Davila Delgado, J.M. and Akanbi, L. (2020),
“Deep learning models for health and safety risk prediction in power infrastructure projects”,
Risk Analysis, Vol. 40 No. 10, pp. 2019-2039, doi: 10.1111/risa.13425.
Alizadeh, S.S., Mortazavi, S.B. and Mehdi Sepehri, M. (2015), “Assessment of accident severity in the
construction industry using the Bayesian theorem”, International Journal of Occupational
Safety and Ergonomics, Vol. 21 No. 4, pp. 551-557, doi: 10.1080/10803548.2015.1095546.
Bayramova, A., Edwards, D.J., Roberts, C. and Rillie, I. (2023), “Enhanced safety in complex socio-technical
systems via safety-in-cohesion”, Safety Science, Vol. 164, 106176, doi: 10.1016/j.ssci.2023.106176.
Bayramova, A., Edwards, D.J., Roberts, C. and Rillie, I. (2024), “Unravelling the Gordian knot of
leading indicators”, Safety Science, Vol. 177, 106603, doi: 10.1016/j.ssci.2024.106603.
Bhavsar, H. and Panchal, M.H. (2012), “A review on support vector machine for data classification”,
International Journal of Advanced Research in Computer Engineering and Technology
(IJARCET), Vol. 1 No. 10, pp. 185-189, ISSN: 2278-1323.
Blagus, R. and Lusa, L. (2013), “SMOTE for high-dimensional class-imbalanced data”, BMC
Bioinformatics, Vol. 14, pp. 1-16, doi: 10.1186/1471-2105-14-106.
Bortey, L., Edwards, D.J., Shelbourn, M. and Rillie, I. (2021), “Development of a proof-of-concept risk
model for accident prevention on highways construction”, paper presented at the Quantity
Surveying Research Conference, 10 November Port Elizabeth, South Africa, Vol. 10, available at:
https://www.researchgate.net/publication/358444274_Conceptual_model_development_for_
safety_risk_management_-a_machine_learning_approach (accessed 10 September 2024).
Bortey, L., Edwards, D.J., Roberts, C. and Rillie, I. (2022), “A review of safety risk theories and models
and the development of a digital highway construction safety risk model”, Digital, Vol. 2,
pp. 206-223, doi: 10.3390/digital2020013.
Cattermole-Terzic, V. and Horberry, T. (2020), “Improving traffic incident management using team
cognitive work analysis”, Journal of Cognitive Engineering and Decision Making, Vol. 14 No. 2,
pp. 152-173, doi: 10.1177/1555343419882.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002), “SMOTE: synthetic minority
over-sampling technique”, Journal of Artificial Intelligence Research, Vol. 16, pp. 321-357, doi:
10.48550/arXiv.1106.1813.
Chen, F., Wang, J. and Deng, Y. (2015), “Road safety risk evaluation by means of improved entropy
TOPSIS–RSR”, Safety Science, Vol. 79, pp. 39-54, doi: 10.1016/j.ssci.2015.05.006.
Chen, Y., McCabe, B. and Hyatt, D. (2018), “A resilience safety climate model predicting construction
safety performance”, Safety Science, Vol. 109, pp. 434-445, doi: 10.1016/j.ssci.2018.07.003.
Cheng, C., Leu, S., Cheng, Y., Wu, T. and Lin, C. (2012), “Applying data mining techniques to explore
factors contributing to occupational injuries in Taiwan’s construction industry”, Accident
Analysis and Prevention, Vol. 48, pp. 214-222, doi: 10.1016/j.aap.2011.04.014.
Chiang, Y.H., Wong, F.K.W. and Liang, S. (2018), “Fatal construction accidents in Hong Kong”,
Journal of Construction Engineering and Management, Vol. 144 No. 3, 04017121, doi: 10.1061/
(ASCE)CO.1943-7862.0001433.
Choi, J., Gu, B., Chin, S. and Lee, J.S. (2020), “Machine learning predictive model based on national
data for fatal accidents of construction workers”, Automation in Construction, Vol. 110, 102974,
doi: 10.1016/j.autcon.2019.102974.
Cortes, C. and Vapnik, V. (1995), “Support-vector networks”, Machine Learning, Vol. 20 No. 3,
pp. 273-297, doi: 10.1007/BF00994018.
Crowder, J.A., Carbone, J. and Friess, S. (2020), “Abductive
artificial intelligence learning models”, in Artificial Psychology: Psychological Modelling and
Testing of AI Systems, pp. 51-63, doi: 10.1007/978-3-030-17081-3_5.
Dadashova, B., Arenas-Ramírez, B., Mira-McWilliams, J. and Aparicio-Izquierdo, F. (2016),
“Methodological development for selection of significant predictors explaining fatal road
accidents”, Accident Analysis and Prevention, Vol. 90, pp. 82-94, doi: 10.1016/j.aap.2016.02.003.
Dai, W.Z., Zu, Q., Yu, Y. and Zhou, Z.H. (2019), “Bridging machine learning and logical reasoning by
abductive learning”, in Wallach, H. et al. (Eds), Advances in Neural Information Processing
Systems, Curran Associates, available at: https://proceedings.neurips.cc/paper_files/paper/
2019/file/9c19a2aa1d84e04b0bd4bc888792bd1e-Paper.pdf
Darby, P., Murray, W. and Raeside, R. (2009), “Applying online fleet driver assessment to help
identify, target and reduce occupational road safety risks”, Safety Science, Vol. 47 No. 3,
pp. 436-442, doi: 10.1016/j.ssci.2008.05.004.
Dell’Acqua, G., Russo, F. and Biancardo, S.A. (2013), “Risk-type density diagrams by crash type on
two-lane rural roads”, Journal of Risk Research, Vol. 16 No. 10, pp. 1297-1314, doi: 10.1080/
13669877.2013.788547.
Ding, S., Hua, X. and Yu, J. (2014), “An overview on nonparallel hyperplane support vector machine
algorithms”, Neural Computing and Applications, Vol. 25 No. 5, pp. 975-982, doi: 10.1007/
s00521-013-1524-6.
Edwards, D.J., Rillie, I., Chileshe, N., Lai, J., Hosseini, M.R. and Thwala, W.D. (2020), “A field survey of
hand–arm vibration exposure in the UK utilities sector”, Engineering Construction and
Architectural Management, Vol. 27 No. 9, pp. 2179-2198, doi: 10.1108/ECAM-09-2019-0518.
Erdogan, S. (2009), “Explorative spatial analysis of traffic accident statistics and road mortality
among the provinces of Turkey”, Journal of Safety Research, Vol. 40 No. 5, pp. 341-351, doi: 10.
1016/j.jsr.2009.07.006.
Eseonu, C., Gambatese, J. and Nnaji, C. (2018), Reducing Highway Construction Fatalities through
Improved Adoption of Safety Technologies, The Centre for Construction Research and Training
(CPWR Small Study No.17-4-PS), available at: https://www.elcosh.org/record/document/4305/
d001575.pdf
Eslamirad, N., Malekpour Kolbadinejad, S., Mahdavinejad, M. and Mehranrad, M. (2020), “Thermal
comfort prediction by applying supervised machine learning in green sidewalks of Tehran”, Smart
and Sustainable Built Environment, Vol. 9 No. 4, pp. 361-374, doi: 10.1108/SASBE-03-2019-0028.
Estabrooks, A. and Japkowicz, N. (2001), “A mixture-of-experts framework for learning from
imbalanced data sets”, in Hoffmann, F., Hand, D.J., Adams, N., Fisher, D. and Guimaraes, G.
(Eds), Advances in Intelligent Data Analysis, Lecture Notes in Computer Science, Springer,
Berlin, Heidelberg, pp. 34-43, doi: 10.1007/3-540-44816-0_4.
Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M.A. and Strachan, R. (2014), “Hybrid decision tree
and Naïve Bayes classifiers for multi-class classification tasks”, Expert Systems with
Applications, Part 2, Vol. 41 No. 4, pp. 1937-1946, doi: 10.1016/j.eswa.2013.08.089.
Fernández, A., Garcia, S., Herrera, F. and Chawla, N.V. (2018), “SMOTE for learning from imbalanced
data: progress and challenges, marking the 15-year anniversary”, Journal of Artificial
Intelligence Research, Vol. 61, pp. 863-905, doi: 10.1613/jair.1.11192.
Fernando, A., Siriwardana, C., Law, D., Gunasekara, C., Zhang, K. and Gamage, K. (2024), “A scoping
review and analysis of green construction research: a machine learning aided approach”, Smart
and Sustainable Built Environment, Vol. ahead-of-print No. ahead-of-print, doi: 10.1108/SASBE-
08-2023-0201.
George, J. and Hautier, G. (2021), “Chemist versus machine: traditional knowledge versus machine
learning techniques”, Trends in Chemistry, Vol. 3 No. 2, pp. 86-95, doi: 10.1016/j.trechm.2020.
10.007.
GG 104 Requirements for safety risk assessment (2018), available at: https://www.
standardsforhighways.co.uk/tses/attachments/0338b395-7959-4e5b-9537-5d2bdd75f3b9?
inline=true (accessed 12 August 2023).
Goh, Y.M. and Chua, D. (2013), “Neural network analysis of construction safety management systems:
a case study in Singapore”, Construction Management and Economics, Vol. 31 No. 5,
pp. 460-470, doi: 10.1080/01446193.2013.797095.
Goh, Y.M. and Soon, W.T. (2014), Safety Management Lessons from Major Accident Inquiries,
Pearson Education Pte Limited, South Asia, ISBN: 9789814598699.
Goh, Y.M., Ubeynarayana, C.U., Wong, K.L.X. and Guo, B.H. (2018), “Factors influencing unsafe
behaviours: a supervised learning approach”, Accident Analysis and Prevention, Vol. 118,
pp. 77-85, doi: 10.1016/j.aap.2018.06.002.
Gov.uk (2017), “Highways England highlights dangers faced by road workers”, available at: https://
www.gov.uk/government/news/highways-england-highlights-dangers-faced-by-road-workers
(accessed 25 October 2023).
Guillén Perales, A., Liébana-Cabanillas, F., Sánchez-Fernández, J. and Herrera, L.J. (2024), “Assessing
university students’ perception of academic quality using machine learning”, Applied
Computing and Informatics, Vol. 20 Nos 1/2, pp. 20-34, doi: 10.1108/ACI-06-2020-0003.
Guo, Y., Zhang, Z. and Tang, F. (2021), “Feature selection with kernelized multi-class support vector
machine”, Pattern Recognition, Vol. 117, 107988, doi: 10.1016/j.patcog.2021.107988.
Hasan, B.M.S. and Abdulazeez, A.M. (2021), “A review of principal component analysis algorithm for
dimensionality reduction”, Journal of Soft Computing and Data Mining, Vol. 2 No. 1, pp. 20-30,
available at: https://publisher.uthm.edu.my/ojs/index.php/jscdm/article/view/8032
Hegde, J. and Rokseth, B. (2020), “Applications of machine learning methods for engineering risk
assessment – a review”, Safety Science, Vol. 122, 104492, doi: 10.1016/j.ssci.2019.09.015.
Huang, X., Wu, L. and Ye, Y. (2019), “A review on dimensionality reduction techniques”, International
Journal of Pattern Recognition and Artificial Intelligence, Vol. 33 No. 10, 1950017, doi: 10.1142/
S0218001419500174.
Hughes, B.P., Newstead, S., Anund, A., Shu, C.C. and Falkmer, T. (2015), “A review of models relevant
to road safety”, Accident Analysis and Prevention, Vol. 74, pp. 250-270, doi: 10.1016/j.aap.2014.
06.003.
Jain, S. and Saha, A. (2022), “Rank-based univariate feature selection methods on machine learning
classifiers for code smell detection”, Evolutionary Intelligence, Vol. 15 No. 1, pp. 609-638, doi: 10.
1007/s12065-020-00536-z.
Jia, W., Sun, M., Lian, J. and Hou, S. (2022), “Feature dimensionality reduction: a review”, Complex and
Intelligent Systems, Vol. 8 No. 3, pp. 2663-2693, doi: 10.1007/s40747-021-00637-x.
Jocelyn, S., Ouali, M.S. and Chinniah, Y. (2018), “Estimation of probability of harm in safety of
machinery using an investigation systemic approach and Logical Analysis of Data”, Safety
Science, Vol. 105, pp. 32-45, doi: 10.1016/j.ssci.2018.01.018.
Juba, B. and Le, H.S. (2019), “Precision-recall versus accuracy and the role of large data sets”,
Proceedings of the AAAI Conference on Artificial Intelligence, 27 January-1 February, Hawaii,
Vol. 33, pp. 4039-4048, doi: 10.1609/aaai.v33i01.33014039.
Kang, K. and Ryu, H. (2019), “Predicting types of occupational accidents at construction sites in Korea
using random forest model”, Safety Science, Vol. 120, pp. 226-236, doi: 10.1016/j.ssci.2019.
06.034.
Kidando, E., Kitali, A.E., Kutela, B., Ghorbanzadeh, M., Karaer, A., Koloushani, M., Moses, R.,
Ozguven, E.E. and Sando, T. (2021), “Prediction of vehicle occupants injury at signalized
intersections using real-time traffic and signal data”, Accident Analysis and Prevention,
Vol. 149, 105869, doi: 10.1177/03611981211047836.
Kläs, M. and Vollmer, A.M. (2018), “Uncertainty in machine learning applications: a practice-driven
classification of uncertainty”, Proceedings of Computer Safety, Reliability, and Security: SAFECOMP
2018 Workshops, ASSURE, DECSoS, SASSUR, STRIVE, and WAISE, Västerås, 18 September,
Vol. 37, pp. 431-438, doi: 10.1007/978-3-319-99229-7_36.
Koc, K., Ekmekcioğlu, Ö. and Gurgun, A.P. (2022), “Accident prediction in construction using hybrid
wavelet-machine learning”, Automation in Construction, Vol. 133, 103987, doi: 10.1016/j.autcon.
2021.103987.
Li, X., Wang, Y., Basu, S., Kumbier, K. and Yu, B. (2019), “A debiased MDI feature importance
measure for random forests”, Advances in Neural Information Processing Systems, Vol. 32,
pp. 8049-8059, available at: https://proceedings.neurips.cc/paper/2019/hash/702cafa3b
b4c9c86e4a3b6834b45aedd-Abstract.html (accessed 13 September 2024).
Li, L., Zhou, Z., Bai, N., Wang, T., Xue, K.H., Sun, H., He, Q., Cheng, W. and Miao, X. (2022), “Naive
Bayes classifier based on memristor nonlinear conductance”, Microelectronics Journal, Vol. 129,
105574, doi: 10.1016/j.mejo.2022.105574.
Lopez, C., Marti, J.R. and Sarkaria, S. (2018), “Distributed reinforcement learning in emergency
response simulation”, IEEE Access, Vol. 6, pp. 67261-67276, doi: 10.1109/ACCESS.2018.2878894.
Malakouti, S.M., Menhaj, M.B. and Suratgar, A.A. (2023), “The usage of 10-fold cross-validation and
grid search to enhance ML methods performance in solar farm power generation prediction”,
Cleaner Engineering and Technology, Vol. 15, 100664, doi: 10.1016/j.clet.2023.100664.
McNee, S.M., Riedl, J. and Konstan, J.A. (2006), “Being accurate is not enough: how accuracy metrics
have hurt recommender systems”, CHI’06 Extended Abstracts on Human factors in Computing
Systems, pp. 1097-1101, doi: 10.1145/1125451.1125659.
Meng, X., Zhang, K., Pang, K. and Xiang, X. (2020), “Characterization of spatio-temporal distribution
of vehicle emissions using web-based real-time traffic data”, Science of the Total Environment,
Vol. 709, 136227, doi: 10.1016/j.scitotenv.2019.136227.
Mirabadi, A. and Sharifian, S. (2010), “Application of association rules in Iranian Railways (RAI)
accident data analysis”, Safety Science, Vol. 48 No. 10, pp. 1427-1435, doi: 10.1016/j.ssci.2010.
06.006.
Moosavi, S., Samavatian, M.H., Parthasarathy, S., Teodorescu, R. and Ramnath, R. (2019), “Accident
risk prediction based on heterogeneous sparse data: new dataset and insights”, in Banaei-
Kashani, F., Trajcevski, G., Güting, R.H., Kulik, L. and Newsam, S. (Eds), Proceedings of the
27th ACM SIGSPATIAL International Conference on Advances in Geographic Information
Systems SIGSPATIAL ’19, Association for Computing Machinery, New York, NY, pp. 33-42,
doi: 10.1145/3347146.3359078.
Moosavi, S.M., Jablonka, K.M. and Smit, B. (2020), “The role of machine learning in the understanding
and design of materials”, Journal of the American Chemical Society, Vol. 142 No. 48,
pp. 20273-20287, doi: 10.1021/jacs.0c09105.
Mun, S., Park, S., Han, D.K. and Ko, H. (2017), “Generative adversarial network based acoustic scene
training set augmentation and selection using SVM hyper-plane”, Paper Presented at the
Detection and Classification of Acoustic Scenes and Events, 16 November, Munich, pp. 93-102,
available at: https://dcase.community/documents/challenge2017/technical_reports/
DCASE2017_Mun_213.pdf
Namian, M., Albert, A., Zuluaga, C.M. and Behm, M. (2016), “Role of safety training: impact on hazard
recognition and safety risk perception”, Journal of Construction Engineering and Management,
Vol. 142 No. 12, 04016073, doi: 10.1061/(ASCE)CO.1943-7862.0001198.
Nembrini, S., König, I.R. and Wright, M.N. (2018), “The revival of the Gini importance”,
Bioinformatics, Vol. 34 No. 21, pp. 3711-3718, doi: 10.1093/bioinformatics/bty373.
Newman, C., Edwards, D., Martek, I., Lai, J., Thwala, W.D. and Rillie, I. (2021), “Industry 4.0
deployment in the construction industry: a bibliometric literature review and UK-based case
study”, Smart and Sustainable Built Environment, Vol. 10 No. 4, pp. 557-580, doi: 10.1108/
SASBE-02-2020-0016.
Nnaji, C., Gambatese, J., Karakhan, A. and Eseonu, C. (2019), “Influential safety technology adoption
predictors in construction”, Engineering Construction and Architectural Management, Vol. 26
No. 11, pp. 2655-2681, doi: 10.1108/ecam-09-2018-0381.
Owens, N., Armstrong, A., Sullivan, P., Mitchell, C., Newton, D., Brewster, R. and Trego, T. (2010),
“Traffic incident management handbook (No.FHWA-HOP-10-013)”, available at: http://www.
ops.fhwa.dot.gov/eto_tim_pse/publications/timhandbook/tim_handbook.pdf (accessed 1
October 2023).
Oyedele, A.O., Ajayi, A.O. and Oyedele, L.O. (2021), “Machine learning predictions for lost time
injuries in power transmission and distribution projects”, Machine Learning with Applications,
Vol. 6, 100158, doi: 10.1016/j.mlwa.2021.100158.
Park, Y.S., Konge, L. and Artino, A.R. (2020), “The positivism paradigm of research”, Academic
Medicine, Vol. 95 No. 5, pp. 690-694, doi: 10.1097/ACM.0000000000003093.
Patel, D.A. and Jha, K.N. (2015), “Neural network approach for safety climate prediction”, Journal of
Management in Engineering, Vol. 31 No. 6, 05014027, doi: 10.1061/(ASCE)ME.1943-5479.
000034.
Patle, A. and Chouhan, D.S. (2013), “SVM kernel functions for classification”, in Patel, C.H., Deheri, G.,
Patel, S.H. and Mehta, S.M. (Eds), 2013 International Conference on Advances in Technology and
Engineering (ICATE), 23-25 January, Mumbai, IEEE, pp. 1-9, doi: 10.1109/ICATE20315.2013.
Paul, A., Mukherjee, D.P., Das, P., Gangopadhyay, A., Chintha, A.R. and Kundu, S. (2018), “Improved
random forest for classification”, IEEE Transactions on Image Processing, Vol. 27 No. 8,
pp. 4012-4024, doi: 10.1109/TIP.2018.2834830.
Posillico, J.J. (2023), “Development of an interpersonally grounded construction management
curriculum foundation model”, Doctoral thesis, Birmingham City University, available at:
https://www.open-access.bcu.ac.uk/14277/ (accessed 1 August 2024).
Rachman, A. and Ratnayake, R.M.C. (2019), “Machine learning approach for risk-based inspection
screening assessment”, Reliability Engineering and System Safety, Vol. 185, pp. 518-532, doi: 10.
1016/j.ress.2019.02.008.
Ren, Y., Zhang, L. and Suganthan, P.N. (2016), “Ensemble classification and regression-recent
developments, applications and future directions”, IEEE Computational Intelligence Magazine,
Vol. 11 No. 1, pp. 41-53, doi: 10.1109/MCI.2015.2471235.
Rigatti, S.J. (2017), “Random forest”, Journal of Insurance Medicine, Vol. 47 No. 1, pp. 31-39, doi: 10.
17849/insm-47-01-31-39.1.
Roberts, C., Edwards, D.J., Sing, M.C.P. and Aigbavboa, C. (2021), “Post-occupancy evaluation: process
delineation and implementation trends in the UK higher education sector”, Architectural Engineering
and Design Management, Vol. 19 No. 2, pp. 125-147, doi: 10.1080/17452007.2021.1956422.
Rogovski, P., Cadamuro, R.D., da Silva, R., de Souza, E.B., Bonatto, C., Viancelli, A., Michelon, W.,
Elmahdy, E.M., Treichel, H., Rodríguez-Lázaro, D. and Fongaro, G. (2021), “Uses of
bacteriophages as bacterial control tools and environmental safety indicators”, Frontiers in
Microbiology, Vol. 12, p. 3756, doi: 10.3389/fmicb.2021.793135.
Saranya, T., Sridevi, S., Deisy, C., Chung, T.D. and Khan, M.A. (2020), “Performance analysis of
machine learning algorithms in intrusion detection system: a review”, Procedia Computer
Science, Vol. 171, pp. 1251-1260, doi: 10.1016/j.procs.2020.04.133.
Sarkar, S., Pramanik, A., Maiti, J. and Reniers, G. (2020), “Predicting and analysing injury severity: a
machine learning-based approach using class-imbalanced proactive and reactive data”, Safety
Science, Vol. 125, 104616, doi: 10.1016/j.ssci.2020.104616.
Sarvari, H., Edwards, D.J., Rillie, I. and Posillico, J. (2024), “Building a safer future: analysis of studies
on safety I and safety II in the construction industry”, Safety Science, Vol. 178, doi: 10.1016/j.
ssci.2024.106621.
Shelke, M.S., Deshmukh, P.R. and Shandilya, V.K. (2017), “A review on imbalanced data handling
using undersampling and oversampling technique”, International Journal of Recent Trends in
Engineering Research, Vol. 3 No. 4, pp. 444-449, doi: 10.23883/IJRTER.2017.3168.0UWXM.
Shirali, G., Shekari, M. and Angali, K.A. (2018), “Assessing reliability and validity of an instrument
for measuring resilience safety culture in sociotechnical systems”, Safety and Health at Work,
Vol. 9 No. 3, pp. 296-307, doi: 10.1016/j.shaw.2017.07.010.
Sing, M.C.P., Edwards, D.J., Leung, A.W.T., Liu, H. and Roberts, C.J. (2022), “A theoretical framework
for classifying project complexity at the preconstruction stage using cluster analysis
techniques”, Engineering Construction and Architectural Management, Vol. 29 No. 9,
pp. 3754-3774, doi: 10.1108/ECAM-09-2020-0726.
Sober, E. (2013), Core Questions in Philosophy: A Text with Readings, Vol. 6, Pearson Education,
Boston, p. 28, ISBN: 9780205206698.
Son, C., Sasangohar, F., Peres, S.C., Neville, T.J. and Moon, J. (2019), “Orchestrating through
whirlwind: identified challenges and resilience factors of incident management teams during
Hurricane Harvey”, Proceedings of the Human Factors and Ergonomics Society Annual
Meeting, 23 November, Sage CA: Los Angeles, CA, SAGE Publications, Vol. 63 No. 1,
pp. 899-903, doi: 10.1177/1071181319631265.
St Denis, L.A., Short, K.C., McConnell, K., Cook, M.C., Mietkiewicz, N.P., Buckland, M. and Balch, J.K.
(2023), “All-hazards dataset mined from the US national incident management system 1999-
2020”, Scientific Data, Vol. 10 No. 1, p. 112, doi: 10.1038/s41597-023-01955-0.
Strain, T.J., Wilson, R.E. and Littleworth, R. (2022), “Role of traffic officers in transportation asset
monitoring”, paper presented at the Transport Research Board 2022, 9-13 January, Washington
DC, available at: https://www.trb.org/Main/Blurbs/182248.aspx (accessed 10
September 2024).
Sutton, R.S. and Barto, A.G. (2018), Reinforcement Learning: An Introduction, MIT Press, ISBN:
9780262352703, 0262352702.
Tixier, A.J.P., Hallowell, M.R., Rajagopalan, B. and Bowman, D. (2016), “Application of machine
learning to construction injury prediction”, Automation in Construction, Vol. 69, pp. 102-114,
doi: 10.1016/j.autcon.2016.05.016.
Tong, D.Y.K., Rasiah, D., Tong, X.F. and Lai, K.P. (2015), “Leadership empowerment behaviour on
safety officer and safety teamwork in manufacturing industry”, Safety Science, Vol. 72,
pp. 190-198, doi: 10.1016/j.ssci.2014.09.009.
Tsoukalas, V.D. and Fragiadakis, N.G. (2016), “Prediction of occupational risk in the shipbuilding
industry using multivariable linear regression and genetic algorithm analysis”, Safety Science,
Vol. 83, pp. 12-22, doi: 10.1016/j.ssci.2015.11.010.
Uma, S. and Eswari, R. (2022), “Accident prevention and safety assistance using IOT and machine
learning”, Journal of Reliable Intelligent Environments, Vol. 8 No. 2, pp. 79-103, doi: 10.1007/
s40860-021-00136-3.
Weng, J., Zhu, J.Z., Yan, X. and Liu, Z. (2016), “Investigation of work zone crash casualty patterns
using association rules”, Accident Analysis and Prevention, Vol. 92, pp. 43-52, doi: 10.1016/j.aap.
2016.03.017.
Woods, D. and Wreathall, J. (2003), Managing Risk Proactively: The Emergence of Resilience
Engineering, Ohio University, Columbus, GA, available at: https://www.researchgate.net/
publication/228711828_Managing_Risk_Proactively_The_Emergence_of_Resilience_
Engineering (accessed 10 September 2024).
Xu, J., Zhang, Y. and Miao, D. (2020), “Three-way confusion matrix for classification: a measure
driven view”, Information Sciences, Vol. 507, pp. 772-794, doi: 10.1016/j.ins.2019.06.064.
Yacouby, R. and Axman, D. (2020), “Probabilistic extension of precision, recall, and f1 score for more
thorough evaluation of classification models”, in Eger, S., Gao, Y., Peyrard, M., Zhao, W. and
Hovy, E. (Eds), Proceedings of the First Workshop on Evaluation and Comparison of NLP
Systems, 13 November, pp. 79-91, doi: 10.18653/v1/2020.eval4nlp-1.9.
Yue, W., Li, C., Wang, S., Xue, N. and Wu, J. (2023), “Cooperative incident management in mixed
traffic of CAVs and human-driven vehicles”, IEEE Transactions on Intelligent Transportation
Systems, Vol. 24 No. 11, pp. 12462-12476, doi: 10.1109/TITS.2023.3289983.
Zagorecki, A.T., Johnson, D.E. and Ristvej, J. (2013), “Data mining and machine learning in the
context of disaster and crisis management”, International Journal of Emergency Management,
Vol. 9 No. 4, pp. 351-365, doi: 10.1504/IJEM.2013.059879.
Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D. and Saeed, J. (2020), “A comprehensive review of
dimensionality reduction techniques for feature selection and feature extraction”, Journal of
Applied Science and Technology Trends, Vol. 1 No. 2, pp. 56-70, doi: 10.38094/jastt1224.
Zhang, S., Huang, Y., Shen, C., Ye, H. and Du, Y. (2012), “Spatial prediction of soil organic matter
using terrain indices and categorical variables as auxiliary information”, Geoderma, Vol. 171,
pp. 35-43, doi: 10.1016/j.geoderma.2011.07.012.
Zhang, H., Li, Y. and Zhang, H. (2019), “Risk early warning safety model for sports events based on
back propagation neural network machine learning”, Safety Science, Vol. 118, pp. 332-336, doi:
10.1016/j.ssci.2019.05.011.
Zhang, L., Wu, X., Qin, Y., Skibniewski, M.J. and Liu, W. (2016), “Towards a fuzzy Bayesian network-
based approach for safety risk analysis of tunnel-induced pipeline damage”, Risk Analysis,
Vol. 36 No. 2, pp. 278-330, doi: 10.1111/risa.12448.
Zhang, H., Yang, F., Li, Y. and Li, H. (2015), “Predicting profitability of listed construction companies
based on principal component analysis and support vector machine—evidence from China”,
Automation in Construction, Vol. 53, pp. 22-28, doi: 10.1016/j.autcon.2015.03.001.

Further reading
Alawad, H., Kaewunruen, S. and An, M. (2019), “Learning from accidents: machine learning for safety
at railway stations”, IEEE Access, Vol. 8, pp. 633-648, doi: 10.1109/ACCESS.2019.2962072.
Alharahsheh, H.H. and Pius, A. (2020), “A review of key paradigms: positivism vs interpretivism”,
Global Academic Journal of Humanities and Social Sciences, Vol. 2 No. 3, pp. 39-43, doi: 10.
36348/gajhss.2020.v02i03.001.
Ali, M.U., Ahmed, S., Ferzund, J., Mehmood, A. and Rehman, A. (2017), “Using PCA and factor
analysis for dimensionality reduction of bio-informatics data”, International Journal of
Advanced Computer Science and Applications, Vol. 8 No. 5, pp. 1-12, doi: 10.48550/arXiv.
1707.07189.
Belgiu, M. and Drăguţ, L. (2016), “Random Forest in remote sensing: a review of applications and
future directions”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 114, pp. 24-31,
doi: 10.1016/j.isprsjprs.2016.01.011.
Borovikov, E. (2014), “An evaluation of support vector machines as a pattern recognition tool”, arXiv,
Vol. 1412, p. 4186, doi: 10.48550/arXiv.1412.4186.
Gogtay, N.J. and Thatte, U.M. (2017), “Principles of correlation analysis”, Journal of the Association of
Physicians of India, Vol. 65 No. 3, pp. 78-81, PMID: 28462548.
Hamdan, Y.B. and Sathesh, A. (2021), “Construction of statistical SVM based recognition model for
handwritten character recognition”, Journal of Information Technology and Digital World,
Vol. 3 No. 2, pp. 92-107, doi: 10.36548/jitdw.2021.2.003.
Han, S., Qubo, C. and Meng, H. (2012), “Parameter selection in SVM with RBF kernel function”,
Proceedings of World Automation Congress 2012, IEEE, Puerto Vallarta, Mexico, 24-28 June,
pp. 1-4, available at: https://www.semanticscholar.org/paper/Parameter-selection-in-SVM-with-
RBF-kernel-function-Han-Qubo/c9dd0a01310e85a4fb8882cbd5f4dd084e0899f1 (accessed 14
September 2024).
Huang, H., Chin, H.C. and Haque, M.M. (2008), “Severity of driver injury and vehicle damage in traffic
crashes at intersections: a Bayesian hierarchical analysis”, Accident Analysis and Prevention,
Vol. 40 No. 1, pp. 45-54, doi: 10.1016/j.aap.2007.04.002.
Jørgensen, K. (2011), “A tool for safety officers investigating ‘simple’ accidents”, Safety Science,
Vol. 49 No. 1, pp. 32-38, doi: 10.1016/j.ssci.2009.12.023.
Krstinić, D., Braović, M., Šerić, L. and Božić-Štulić, D. (2020), “Multi-label classifier performance
evaluation with confusion matrix”, Computer Science and Information Technology, Vol. 10,
pp. 1-14, doi: 10.5121/csit.2020.100801.
Mendeloff, J. and Staetsky, L. (2014), “Occupational fatality risks in the United States and the United
Kingdom”, American Journal of Industrial Medicine, Vol. 57 No. 1, pp. 4-14, doi: 10.1002/ajim.22258.
Mesbah, A. (2016), “Stochastic model predictive control: an overview and perspectives for future
research”, IEEE Control Systems Magazine, Vol. 36 No. 6, pp. 30-44, doi: 10.1109/MCS.2016.
2602087.
Mohammadi, M., Rashid, T.A., Karim, S.H.T., Aldalwie, A.H.M., Tho, Q.T., Bidaki, M., Rahmani, A.M.
and Hosseinzadeh, M. (2021), “A comprehensive survey and taxonomy of the SVM-based
intrusion detection systems”, Journal of Network and Computer Applications, Vol. 178, 102983,
doi: 10.1016/j.jnca.2021.102983.
Nargesian, F., Samulowitz, H., Khurana, U., Khalil, E.B. and Turaga, D.S. (2017), “Learning feature
engineering for classification”, Proceedings of the Twenty-Sixth International Joint Conference
on Artificial Intelligence (IJCAI-17), Melbourne, 25 August, Vol. 17, pp. 2529-2535, doi: 10.
24963/ijcai.2017/352.
Rashid, H.M., Ahmed, A., Wali, B. and Qureshi, N.A. (2019), “An analysis of highway work zone
safety practices in Pakistan”, International Journal of Injury Control and Safety Promotion,
Vol. 26 No. 1, pp. 37-44, doi: 10.1080/17457300.2018.1476383.
Roberts, C.J., Edwards, D.J., Hosseini, M.R., Mateo-Garcia, M. and Owusu-Manu, D. (2019), “Post
occupancy evaluation: a critical review of literature”, Engineering Construction and
Architectural Management, Vol. 26 No. 9, pp. 2084-2106, doi: 10.1108/ECAM-09-2018-0390.
Scetbon, M. and Harchaoui, Z. (2021), “A spectral analysis of dot-product kernels”, paper presented at
the 24th International Conference on Artificial Intelligence and Statistics, 13-15 April, PMLR,
available at: https://proceedings.mlr.press/v130/scetbon21b.html (accessed 12 July 2024).
Shi, Q. and Abdel-Aty, M. (2015), “Big Data applications in real-time traffic operation and safety
monitoring and improvement on urban expressways”, Transportation Research Part C:
Emerging Technologies, Big Data in Transportation and Traffic Engineering, Vol. 58,
pp. 380-394, doi: 10.1016/j.trc.2015.02.022.
Tharwat, A., Gaber, T., Ibrahim, A. and Hassanien, A.E. (2017), “Linear discriminant analysis: a
detailed tutorial”, AI Communications, Vol. 30 No. 2, pp. 169-190, doi: 10.3233/AIC-170729.
Vapnik, V. (1999), The Nature of Statistical Learning Theory, Springer Science & Business Media,
New York, ISBN: 9781475732641, 1475732643.
Wahbah, M., Mohandes, B., El-Fouly, T.H. and El Moursi, M.S. (2022), “Unbiased cross-validation
kernel density estimation for wind and PV probabilistic modelling”, Energy Conversion and
Management, Vol. 266, 115811, doi: 10.1016/j.enconman.2022.115811.
Yu, H., Yang, J. and Han, J. (2003), “Classifying large data sets using SVMs with hierarchical
clusters”, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, Washington, DC, 24 August, pp. 306-315, doi: 10.1145/956750.
956786.
Zhai, X., Krajcik, J. and Pellegrino, J.W. (2021), “On the validity of machine learning-based next
generation science assessments: a validity inferential network”, Journal of Science Education
and Technology, Vol. 30 No. 2, pp. 298-312, doi: 10.1007/s10956-020-09879-9.
Zhao, J., Fu, X. and Zhang, Y. (2016), “Research on risk assessment and safety management of
highway maintenance project”, Procedia Engineering, Vol. 137, pp. 434-441, doi: 10.1016/j.
proeng.2016.01.278.
Zheng, A. and Casari, A. (2018), Feature Engineering for Machine Learning: Principles and
Techniques for Data Scientists, O’Reilly Media, ISBN: 9781491953198, 1491953195.
Zhou, X., Lu, P., Zheng, Z., Tolliver, D. and Keramati, A. (2020), “Accident prediction accuracy assessment
for highway-rail grade crossings using random forest algorithm compared with decision tree”,
Reliability Engineering and System Safety, Vol. 200, 106931, doi: 10.1016/j.ress.2020.106931.
Zhou, H., Wang, X. and Zhu, R. (2022), “Feature selection based on mutual information with correlation
coefficient”, Applied Intelligence, Vol. 52 No. 5, pp. 5457-5474, doi: 10.1007/s10489-021-02524-x.

Corresponding author
David J. Edwards can be contacted at: [email protected]

