Bayesian Network-Based Knowledge Graph Inference For Highway Transportation Safety Risks
Research Article
Bayesian Network-Based Knowledge Graph Inference for Highway
Transportation Safety Risks
Received 16 October 2020; Revised 20 January 2021; Accepted 18 February 2021; Published 5 March 2021
Copyright © 2021 Luo Wenhui et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Accurate inference of knowledge about highway transportation safety risks is a crucial aspect of building a knowledge graph. Based on data related to highway transportation accidents, this study develops a Bayesian network model. The network nodes are first identified through expert scoring. The network structure is then constructed using prior expert knowledge and the K2 greedy search algorithm, and the network parameters are trained via the expectation-maximization (EM) algorithm. Finally, knowledge about highway transportation safety risks is inferred using the junction tree algorithm. To verify the validity of the proposed model, the conditional probabilities obtained during network parameter training are compared with the actual probabilities; the results accord with expert experience. Further, taking “a certain road traffic accident” as an example, the main “causal chain” is inferred to be improper emergency response-human failure-accident occurrence, where the probability of driver failure is 82% and the probability of accident occurrence is 68%. The inference results are consistent with the actual accident sequence, which suggests that the proposed knowledge inference method is effective.
innovativeness is in the introduction of new network threat factors and the calculation of their possible impacts on the entire system. Yang et al. [4] proposed an MLN-based method for joint sentiment analysis of sentences to address the inadequate utilization of contextual information with the existing knowledge inference methods, as well as the cross-domain connection problem of sentence information, in the context of knowledge graph inference. They found through experimentation that MLN-based knowledge inference achieved rather desirable results. Liu et al. [5] put forward an ensemble learning-based MLN model, as well as its learning algorithm, in response to the difficulty of MLN in inferring large-scale data, respecting the modification and improvement of MLN models. They conducted a knowledge extraction experiment in Google’s large-scale corpus using this method. As experiments proved, their method had higher precision and recall than the pipelining approach.

(2) Bayesian network-based inference models: Bayesian inference is a process of calculating posterior probability, based on conditional and prior probabilities. Relating to risk management, Lu et al. [6] proposed a Bayesian network-based model for flood risk inference aiming at the problem of flood risk misreporting. Here, expert scoring identified the Bayesian network, and parameter learning was performed with the Monte Carlo model. These helped in analyzing the changes in flood risk under one- and two-factor uncertainties. The Bayesian network built by them allowed bidirectional reasoning and probability distribution inference of arbitrary nodes. Experiments proved that their model was practically applicable to the risk assessment and control in reservoirs. Based on a qualitative weighted Bayesian network, Yin et al. [7] put forward a human factor inference model for coal mine accidents in order to address the analytical deficiency of human factors in coal mine accident analysis. The Bayesian network, based on typical accident cases, was constructed using the Human Factors Analysis and Classification System (HFACS) model, and the weight magnitudes between network nodes were obtained by the expert method. The mutual logical relationships between the human factors were reflected rather accurately by their model. With a view to combing the evolution process of such events, Xia et al. [8] proposed a dynamic Bayesian network-based unconventional scenario inference model for sudden disaster events. They constructed a dynamic Bayesian network with nodes such as the scenario condition, handling goals, handling measures, and intrinsic variables. The prior probabilities were designated for root nodes, while for variables without parent nodes, the expert method was employed to determine conditional probabilities. In the end, inference of accident scenario evolution was executed by introducing the “July 16 Oil Depot Explosion and Fire” that yielded correct inference results. Seyed Hassani et al. [9] developed a Bayesian network related to inference on knowledge graphs by taking into consideration some hidden or ignored information in complex social networks. They identified the nodes of the Bayesian network, such as comments, avatar information, ensemble photos, or interactive information, and using the collected data, trained its parameters. The effectiveness of their algorithm was tested on Facebook that found the model is highly accurate in finding information between users. Rajabi and Ataie-Ashtiani [10] proposed a fuzzy Bayesian inference method, in terms of model modification and improvement, to address the unavailability of parameter training data in conventional Bayesian inference. The method fused the information provided by experts with the Bayesian network model. They also developed an algorithm for solving the model. Computational results were compared with the Markov Chain Monte Carlo- (MCMC-) based algorithm that proved the effectiveness of their model and algorithm.

(3) Machine learning-based models: Xie et al. [11] proposed a model that targeted the knowledge completion issue with knowledge graphs based on bag-of-words and convolutional neural network, where the bag-of-words model was used for the vector representation of texts, and the convolutional neural network was responsible for classifying and inferring word relationships. They confirmed the validity of their model through experimentation. A novel convolution-based model was proposed by Annervaz et al. [12] that extracted relevant prior knowledge from the graphs via an attention mechanism. Experiments on public datasets demonstrated that their method was effective in enhancing the performance of deep learning models. Godin et al. [13] developed a ternary reward framework to cope with the incorrect answering problem in the existing reinforcement learning-based question answering systems that established a new evaluation criterion by setting different rewards for wrong answers and no answers, thereby achieving better evaluation of model effectiveness. All of the above (completion, fast retrieval, and answering problems related to knowledge graphs) can be understood as the scenario applications of knowledge inference.

(4) Hybrid models: Jiang et al. [14] developed a method for representing knowledge based on weighted knowledge graphs that was then combined with the probabilistic graphical model to establish a medical diagnosis knowledge network. A path sorting algorithm-based random walk model was proposed by Liu et al. [15] that performed knowledge inference on the semantics of sentences regarding the inverse relationship from object to subject. An exhaustive
7074, 2021, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2021/6624579 by Nat Prov Indonesia, Wiley Online Library on [04/07/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advances in Civil Engineering 3
search was avoided by introducing a random sampling mechanism, and the effectiveness of their knowledge inference model was experimentally verified.

The advantage of logical rule-based reasoning is an intuitive inference procedure that can reflect prior knowledge, and its disadvantage is the difficulty in obtaining rules, which leads to error accumulation. The advantage of deep learning-based inference models is a powerful reasoning capability, while their disadvantages are insufficient interpretability and data limitations. Predicting the probability of accident occurrence and estimating the “causal chain” are the foremost tasks of knowledge graph inference for highway transportation safety risks. Given the limited data collected in this study, we approach the task of inferring a knowledge graph about highway transportation safety risks from the Bayesian network perspective. Expert scoring identifies the nodes of the Bayesian network. The node parameters are learned via the EM algorithm, and knowledge about highway transportation safety risks is inferred with the junction tree algorithm.

The contributions of this paper include the following:
(1) A knowledge inference framework for road transportation risks is proposed, and annotated datasets are provided for the research field.
(2) During the identification of network nodes, a network architecture identification method based on expert scoring combined with the K2 algorithm is proposed, which ensures that the network architecture incorporates expert knowledge about road transportation risks.
(3) Based on the Bayesian network created in this paper, the probability distributions of accident occurrence are inferred under multiple factors, including the driver, vehicle, road, environment, and management.

The remainder of this paper is organized as follows: Section 2 puts forward a model framework for risk knowledge inference, Section 3 builds a Bayesian network-based knowledge inference model for road transportation risks, Section 4 infers knowledge about road transportation risks, and Section 5 concludes the study.

2. Creation of Risk Knowledge Inference Model

The Bayesian network may be summarized as a probabilistic inference network based on the Bayesian formula, where the nodes represent random variables and the directed edges between nodes represent internode causality. Each node has a probability distribution. A given Bayesian network G(S, P) consists of two parts, with S being a directed acyclic graph containing all nodes and P being a collection of conditional probability distribution tables. Construction of the Bayesian network for inferring highway transportation safety risk knowledge comprises four steps, namely, (i) the network node identification, (ii) the network architecture identification, (iii) the determination of node conditional probabilities, and (iv) the inference of safety risk knowledge for highway transportation. Overall, the modeling procedure incorporates the expert method and machine learning. Figure 1 presents the model framework proposed in this study.

2.1. Network Node Identification. At this stage, expert scoring identifies the network nodes. Prior expert knowledge is collected according to the five categories of “drivers, vehicles, roads, environment, and management,” thereby constructing a matrix for node information acquisition as shown in the following expression:

$E_f = (e_{ij})_{m \times n}$, (1)

where $e_{ij}$ ($0 \le e_{ij} \le 1$) denotes the confidence that the ith expert has regarding the causality between the jth risk and the fth category, i ∈ [1, n], j ∈ [1, m], f ∈ [1, 5]. Matrix $E_f$ is traversed, and if $e_{ij} \ge a$, the jth risk and the fth category are considered causally correlated.

2.2. Network Architecture Identification. Given the limited amount and quality of training data in this study, the network architecture identification is divided into two steps, namely, building the network architecture by the expert method and modifying it by data-based learning. For network architecture learning, greedy search and conditional restriction algorithms are generally used, of which the methods based on greedy space search are suited to small data sizes. The K2 algorithm improves on ordinary greedy search by deleting redundant edges. Hence, for network architecture learning, this study adopts the K2 algorithm, whose core idea is to find a network structure with a high score. The K2 scoring method is described in the following formula [16]:

$F(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$, (2)

where G represents the Bayesian network architecture, D stands for the training dataset, i denotes the ith node, n denotes the number of nodes, j denotes the jth parent node of the current node, $q_i$ is the number of parent nodes of the current node, k represents the kth value of the current node, $r_i$ represents the number of possible values of the current node, $N_{ijk}$ denotes the number of examples in the training dataset D that correspond to the kth value of the current node and the jth value of the parent node, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$.

For the K2 algorithm, its pseudocode is as in Algorithm 1 [16].

2.3. Network Parameter Learning. During network parameter learning, commonly used algorithms include maximum likelihood estimation (MLE), Bayesian estimation, and the expectation-maximization (EM) algorithm, of which the MLE algorithm is generally suitable for scenarios with large data size. When the sample size is small and
Figure 1: Flow chart of safety risk knowledge inference for highway transportation.
Input: training data D, node sequence ρ, positive integer u (u denotes the maximum number of parent nodes)
Output: G
for i = 1 to n do
  pa(V_i) = ∅
  F_old = f_CH((V_i, pa(V_i)), D)
  start = true
  while start = true and |pa(V_i)| < u do
    Z ← the node in pred(V_i) \ pa(V_i) that maximizes f_CH((V_i, pa(V_i) ∪ {Z}), D)
    F_new = f_CH((V_i, pa(V_i) ∪ {Z}), D)
    if F_new > F_old then
      F_old = F_new
      pa(V_i) = pa(V_i) ∪ {Z}
    else start = false
  end
end
where pred(V_i) represents the nodes before V_i in ρ and pa(V_i) is the collection of parent nodes of V_i.
ALGORITHM 1
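As a hedged illustration of Algorithm 1 and formula (2), the following minimal Python sketch scores candidate parent sets with the logarithm of the K2 score and adds parents greedily. The function names (`k2_log_score`, `k2_search`), the representation of the data as a list of dictionaries, and the `arity` mapping are assumptions made for this example; they are not taken from the original study, which does not publish its implementation. The product over j is taken over the parent value combinations that occur in the data, since combinations with no examples contribute a factor of 1.

```python
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log of the K2 score term for one node given a candidate parent set.

    data: list of dicts mapping variable name -> integer state (0 .. r-1).
    arity: dict mapping variable name -> number of states r_i (assumed input).
    """
    r = arity[child]
    counts = {}  # parent value combination -> [N_ij1, ..., N_ijr]
    for row in data:
        config = tuple(row[p] for p in parents)
        counts.setdefault(config, [0] * r)[row[child]] += 1
    score = 0.0
    for n_ijk in counts.values():
        # log[(r_i - 1)! / (N_ij + r_i - 1)!], using lgamma(x + 1) = log(x!)
        score += lgamma(r) - lgamma(sum(n_ijk) + r)
        # sum over k of log(N_ijk!)
        score += sum(lgamma(n + 1) for n in n_ijk)
    return score


def k2_search(data, order, arity, max_parents):
    """Greedy K2 parent selection for each node, following the given order."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        f_old = k2_log_score(data, v, parents[v], arity)
        improving = True
        while improving and len(parents[v]) < max_parents:
            candidates = [z for z in order[:i] if z not in parents[v]]
            if not candidates:
                break
            # Pick the predecessor whose addition gives the highest score.
            f_new, best = max(
                (k2_log_score(data, v, parents[v] + [z], arity), z)
                for z in candidates
            )
            if f_new > f_old:
                f_old = f_new
                parents[v].append(best)
            else:
                improving = False
    return parents
```

Working in log space avoids overflow of the factorials for realistic sample sizes. The node order plays the role of ρ in Algorithm 1, so expert knowledge can be injected by choosing that order before the search.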
the prior probability is hardly attainable, the use of the EM algorithm yields a good learning effect. Hence, this study adopts the EM algorithm to learn network parameters. Its algorithmic procedure can be divided into an E-step and an M-step (see Algorithm 2) [17].

2.4. Network Inference. Network inference is the ultimate objective of this study. All joint probability distributions for nodes are obtained through the network parameter learning stage. At the network inference stage, the probability distributions (results) for a set of query variables are computed given exact values for a set of evidence variables (causes). Among common algorithms for network inference, such as junction tree and variable elimination, the former is adopted herein, owing to its easy-to-understand and accurate inference. The junction tree algorithm is divided into four phases:
(1) In the initial phase, the built Bayesian network is moralized, and the parent nodes of each node are connected with undirected edges
Input: observation variable Y, hidden variable Z, joint distribution P(Y, Z | θ), conditional probability distribution P(Z | Y, θ).
Output: model parameter θ
(1) Assign initial values to the model parameter θ.
(2) E-step: with θ_i denoting the value of the model parameter after the ith iteration, compute the expectation function for the (i + 1)th iteration,
$Q(\theta, \theta_i) = \mathbb{E}_{Z \sim P(Z \mid Y, \theta_i)}[\log P(Y, Z \mid \theta)] = \sum_{Z} P(Z \mid Y, \theta_i) \log P(Y, Z \mid \theta)$
(3) M-step: find the θ that maximizes Q(θ, θ_i) and take it as the estimate θ_{i+1} for the (i + 1)th iteration.
(4) Repeat steps (2)-(3) until the model converges.
ALGORITHM 2
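The E-step and M-step of Algorithm 2 can be illustrated with a small, hedged Python sketch. It assumes a deliberately simple model that is not the network of this study: one hidden binary variable Z (comparable to a virtual node) with several observed binary children Y_1, ..., Y_d that are conditionally independent given Z. The function name `em_latent_binary`, the initialization, and the small smoothing constants are assumptions for illustration only.

```python
import numpy as np


def em_latent_binary(Y, n_iter=50, seed=0):
    """EM for one hidden binary parent Z with d observed binary children.

    Y: (n, d) array of 0/1 observations.
    Returns pi = P(Z = 1), theta[z, j] = P(Y_j = 1 | Z = z), and the
    observed-data log-likelihood recorded at every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    pi = 0.5
    theta = rng.uniform(0.3, 0.7, size=(2, d))  # arbitrary starting values
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior P(Z = 1 | Y, current parameters) for each sample.
        log_p1 = np.log(pi) + (
            Y * np.log(theta[1]) + (1 - Y) * np.log(1 - theta[1])
        ).sum(axis=1)
        log_p0 = np.log(1 - pi) + (
            Y * np.log(theta[0]) + (1 - Y) * np.log(1 - theta[0])
        ).sum(axis=1)
        log_norm = np.logaddexp(log_p0, log_p1)
        log_liks.append(float(log_norm.sum()))
        resp = np.exp(log_p1 - log_norm)
        # M-step: closed-form maximizer of Q, i.e. weighted relative
        # frequencies. The tiny constants only guard against log(0).
        pi = float(resp.mean())
        theta[1] = (resp @ Y + 1e-6) / (resp.sum() + 2e-6)
        theta[0] = ((1 - resp) @ Y + 1e-6) / ((1 - resp).sum() + 2e-6)
    return pi, theta, log_liks
```

Because each iteration maximizes Q(θ, θ_i), the recorded log-likelihood should not decrease beyond the negligible effect of the smoothing constants, which offers a simple convergence check. The labels of the hidden states are not identifiable, so state 0 and state 1 may swap between runs with different seeds.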
(2) The Bayesian network is converted into an undirected graph, where each arrow is replaced by an edge
(3) The graph is triangulated, and an edge is added to the variables in the same loop
(4) The triangulated graph is converted into a clustering tree, where each node represents factors in the variable subset

3. Bayesian Network-Based Knowledge Inference Model for Highway Transportation Safety Risks

The purpose of safety risk knowledge inference for road transportation is to achieve the “prior” control of unsafe risk factors. However, these risk factors have strong uncertainties, while Bayesian networks have provided preferable solutions to complex and uncertain problems [16]. Hence, following this idea, this paper proposes a Bayesian network-based knowledge inference model for road transportation risks, with which the inference problems of safety risk knowledge for road transportation are solved.

3.1. Network Node Identification. Since the focus of this research is on solving the problem of risk knowledge inference, our team has listed a total of 28 risk sources involving “drivers, vehicles, roads [18, 19], environment, and management” based on the descriptions of safety risks in accident reports over the years, taking specific risks as the network nodes. We developed a quantitative questionnaire and distributed it among ten experts within the field. When e_ij ≥ 0.7, a causality is considered to exist between the risk and the category. A risk has been chosen as an effective node if more than six experts confirmed its causality with the category. By employing expert scoring, the network nodes for road transportation risk knowledge have been constructed according to formula (1), as detailed in Table 1, yielding a total of 26 observable nodes and 5 virtual nodes.

Figure 2 displays the learned network architecture. Afterward, modification of the network structure was performed based on expert experience. Figure 3 illustrates the finalized network architecture.
As is clear from Figure 2, the network architecture presents a strong progressive causal relationship that is divided into three layers vertically. The relationship between nodes is clear, and there are no connecting lines between unrelated factors. The nodes are divided into two parts, namely, observable nodes and virtual nodes. The observable nodes are the nodes with actual data description, while the virtual nodes are the nodes added for the integrity of the network.
As is clear from Figure 3, the hierarchy of the network is not obvious, because through the learning of the algorithm, the implicit relationship between the nodes is further revealed, and the overall structure of the network satisfies the requirement of the cognition.

3.3. Network Parameter Learning. Learning of various network parameters was accomplished via the EM algorithm in GeNIe 2.0. The learning process can be summarized into five steps:
(1) Assignment of initial values for various root nodes.
(2) Establishment of correspondences between node names and IDs.
(3) Import of datasets.
(4) Matching of data with network nodes.
(5) Parameter learning. The learning results are listed in Table 2, where N implies normal and F implies failure.
Table 2 clearly shows that, under identical conditions, improper emergency response and traffic violations exhibit the highest posterior probabilities among the driver factors, followed by fatigue driving, distracted driving, and drunk driving, while inexperienced driving shows the lowest posterior probability. These results are in line with expert experience and perception.
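The expert-scoring rule of Section 3.1 (a confidence threshold of 0.7 and agreement of more than six of the ten experts) can be sketched in a few lines of Python. The function name `select_nodes`, the array layout (one row per expert, one column per candidate risk, for a single category), and the example scores are assumptions for illustration; the actual questionnaire data are not reproduced here.

```python
import numpy as np


def select_nodes(scores, threshold=0.7, min_experts=7):
    """Return the indices of risks accepted as network nodes.

    scores[i, j] is the confidence e_ij of expert i that risk j is causally
    related to the category under consideration (hypothetical layout).
    A risk is kept when at least `min_experts` experts score it at or above
    `threshold`; "more than six" experts corresponds to min_experts=7.
    """
    scores = np.asarray(scores, dtype=float)
    votes = (scores >= threshold).sum(axis=0)  # confirming experts per risk
    return np.flatnonzero(votes >= min_experts)


# Hypothetical scores from ten experts for three candidate risks.
example = np.column_stack([
    [0.9] * 10,              # confirmed by all ten experts
    [0.8] * 6 + [0.1] * 4,   # confirmed by only six experts
    [0.7] * 7 + [0.2] * 3,   # confirmed by seven experts
])
selected = select_nodes(example)  # risks 0 and 2 pass the rule
```

Applying the function once per category matrix E_f would give the effective nodes for each of the five categories.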
[Figures: Bayesian network architecture diagrams. Virtual nodes V, D, R, E, M with observable nodes V1-V5, D1-D6, R1-R8, E1-E4, M1-M3; legend: observable node, virtual node.]
impact analysis is carried out keeping in mind the following aspects:
(1) Effects of the driver, vehicle, road, environment, and management factors on the occurrence of accidents.
(2) Effects of drunk driving, fatigued driving, distracted driving, traffic violations, improper emergency response, and inexperienced driving on the driver risks.
(3) Effects of the braking system, steering system, light signal system, tires, truck overload, and unbalanced load on the vehicle risks.
(4) Effects of roadsides, bridges and tunnels, alignment, intersections, pavements, signs and markings, sight distance, and safety protection facilities on the road risks.
(5) Effects of traffic accidents, severe weather, natural disasters, and night lighting on the traffic environment risks.
(6) Effects of dynamic monitoring, regulations, and private contracting on the management risks.

4.1. Data Description and Model Evaluation. Our team collected 600 reports on road transportation accidents that occurred between 2012 and 2019 from safety management websites. These accident reports have been sorted into three categories. The risk factors of each accident are marked either as 0 or as 1, where 1 indicates that the assessed risk is an accident-causing factor, and 0 indicates that the assessed risk is not an accident-causing factor. The model evaluation criteria are as follows:

are depicted in Figure 4 under human, vehicle, road, environment, and management factors.
From Figure 4(a), it is clear that, among the causes of accidents, the accident probability is the highest for the driver factors, with an inferential probability of 0.899, followed by the management factors, with an inferential probability of 0.485. In contrast, road factors constitute the least probable factors, primarily because they are indirect causes in general. According to Figure 4(b), the probability of accidents resulting from the failure of drivers, vehicles, and management is quite high, while that caused by the failure of environmental and road factors is rather low.

4.2.2. Effects of Relevant Factors on the Driver Risks. Figure 5 illustrates the influences of drunk driving, fatigue driving, distracted driving, traffic violations, improper emergency response, and inexperienced driving on the driver risks.
From Figure 5, it is clear that, regarding driver factors, the failure probabilities attributed to improper emergency response and traffic violations are comparatively higher at 0.694 and 0.660, respectively. In comparison, the failure probability caused by inexperienced driving is lower. Among the causes of accidents, the occurrence frequencies are the highest for speeding, illegal lane changing, illegal overtaking, illegal parking, and illegal emergency lane parking, all of which can be classified as forms of traffic violation. The probability of an accident is generally high in case of improper emergency response. The probability of accidents caused by inexperienced driving is rather low because of the strict management of drivers by road transportation companies.
[Figure 4, panels (a) and (b): inferred probabilities by influence factor (driver, vehicle, road, environment, and management factors, individually and in combination).]
Figure 5: Effects of relevant driver factors. [Bars: drink and driving, fatigue driving, distracted driving, traffic violations, improper emergency response, unfamiliar driving; x-axis: influence factors; y-axis: reasoning probability value.]
Figure 6: Effects of relevant factors on the vehicle risks. [Bars: braking system failure, steering system failure, light signal system failure, tire failure, overload or unbalanced load; x-axis: influence factors.]
because accidents can often be avoided in this case as long as the drivers take appropriate emergency measures.

4.2.4. Effects of Relevant Factors on the Road Risks. Figure 7 details the influences of roadsides, bridges and tunnels, alignment, intersections, pavements, signs and markings, sight distance, and safety protection facilities on the road risks.
From Figure 7, it is clear that the failure probabilities attributable to pavement and alignment problems are the highest among all road factors, at 0.852 and 0.782, respectively. Pavement problems generally refer to slippery surfaces, potholes, and so forth, while alignment problems include sharp bends, steep slopes, and long descents. As these are rather common causes of accidents, the relevant probabilities are also high. In contrast, sight distance is the least likely cause among all road factors because such problems generally involve roadside obstructions or intersections, whose probabilities are rather low according to accident statistics.

4.2.5. Effects of Relevant Factors on Management Failures. Figure 8 presents how dynamic monitoring, regulations, and private contracting affect the management risks.
According to Figure 8, the probability of management problems caused by improper dynamic monitoring is the highest at 0.699. At present, information about fatigued driving, distracted driving, speeding, vehicle trajectories, and so forth can be tracked and forewarned by the dynamic
[Figure: inferred probabilities for the traffic environment factors (interference from traffic accidents, severe weather, natural disasters, lighting inadequacy); x-axis: influence factors.]
[Figure: inferred node states for the example accident]
Node                   State 0   State 1
Accident               32%       68%
Driver factor          18%       82%
Vehicle factor         42%       58%
Road factor            40%       60%
Environment factor     53%       47%
Management factor      36%       64%
Improper emergenc..    12%       88%
Traffic violations     12%       88%
Uneven road sections   67%       33%
Bridge and tunnel      37%       63%
attributable to vehicle fault, with a probability of 58%; besides, the probability of traffic environment problems resulting from severe weather was 47%; and the probability of management failure resulting from inadequate dynamic monitoring and private contracting was 68%. Ultimately, the probability of occurrence of the accident was 0.68, which is rather high. These descriptions conform to the actual accident process, which supports the feasibility of the proposed method.
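To make the inference step concrete, the following hedged Python sketch computes both a forward query (probability of an accident) and a backward query (probability of driver failure given an accident) on a toy three-node network. It uses brute-force enumeration of the joint distribution rather than the junction tree algorithm adopted in this study; on a network this small both give the same exact posteriors. The structure (driver failure D and vehicle failure V as parents of accident A) and every probability value below are hypothetical and chosen only for illustration; they are not the learned parameters of the paper.

```python
from itertools import product

# Hypothetical priors and conditional probability table (illustration only).
P_D = {1: 0.3, 0: 0.7}           # P(D): driver failure
P_V = {1: 0.1, 0: 0.9}           # P(V): vehicle failure
P_A1_GIVEN = {                   # P(A = 1 | D, V): accident occurs
    (0, 0): 0.05,
    (0, 1): 0.50,
    (1, 0): 0.60,
    (1, 1): 0.90,
}
NAMES = ("D", "V", "A")


def joint(d, v, a):
    """Joint probability P(D = d, V = v, A = a) from the chain rule."""
    p_a1 = P_A1_GIVEN[(d, v)]
    return P_D[d] * P_V[v] * (p_a1 if a == 1 else 1.0 - p_a1)


def query(target, evidence):
    """P(target = 1 | evidence) by summing the joint over all assignments."""
    numerator = 0.0
    denominator = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        # Skip assignments that contradict the observed evidence.
        if any(assignment[name] != val for name, val in evidence.items()):
            continue
        p = joint(assignment["D"], assignment["V"], assignment["A"])
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator


p_accident = query("A", {})                  # forward (predictive) query
p_driver_given_accident = query("D", {"A": 1})  # backward (diagnostic) query
```

Enumeration scales exponentially with the number of nodes, which is why exact methods such as the junction tree algorithm are preferred for networks of the size built in this study.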