Pedestrian and Vehicle Behaviour Prediction in Autonomous Vehicle System — a Review
Keywords: Deep learning; Autonomous vehicle; Pedestrian; Vehicles; Behaviour prediction

Abstract: Autonomous vehicles (AVs) have become a trending topic nowadays, since they have the potential to solve traffic problems, such as accidents and congestion. Although AV systems have greatly evolved, they still have limitations. For example, Google reported that their AVs have been involved in several collisions and near misses. While most of these collisions and near misses were caused by third parties, the AVs should be able to predict and avoid them. Events like this show that there is still room for improvement in the AV system. This paper aims to present a review of the state-of-the-art algorithms proposed to enable AV behaviour prediction systems to predict trajectories and intentions for pedestrians and vehicles. This will be achieved by using information from previous literature review papers, recent works, and results obtained using well-known datasets.
1. Introduction

Road traffic accidents and congestion have posed significant challenges for many countries today. Road traffic accidents claim the lives of 1.35 million people annually, and they are ranked the 8th leading cause of death worldwide (WHO, 2018). In addition, it has been reported that road traffic accidents are responsible for 20 to 50 million non-fatal casualties, and 95% of these accidents are caused by human error and imprudence. It was reported in the UK that, in 2020 and 2021, there were 92,055 and 119,850 road traffic casualties, respectively, and 1676 of these casualties led to death (GOVUK, 2020, 2021). Congestion has a significant negative impact on society, affecting the economy, environment, public health and safety (Afrin & Yodo, 2020; Levy et al., 2010). Enforced legislation, advanced driving assistance systems (ADAS), other methods of transportation and road improvements have been used to address these road traffic issues. However, it is predicted that the number of road users will double by 2050 and that these current measures will not be sufficient (COLONNA, 2018). AVs are a trending topic nowadays, and companies such as Waymo and Uber have already deployed several AVs on the roads to solve the aforementioned road traffic problems. Although AV systems have considerably evolved, they still have limitations, such as efficiently and safely navigating in complex scenarios. This could be achieved by avoiding congestion and by predicting, preventing, or mitigating any road traffic collisions. These are challenging tasks, since the AVs have to share the roads with human road users and, as reported by the World Health Organisation (WHO), most road traffic collisions are linked to human error and imprudence.

Another major AV limitation is gaining public confidence that they are safe to ride. Petrović et al. (2020) investigated 300 traffic collisions in California (US) between 2015 and 2017 that involved AVs. They found that most of the collisions were caused by conventional drivers, who were following the AVs too closely and violated the right-of-way, traffic signals, and traffic signs. Google published a paper reporting the performance of their Waymo driver between 2019 and 2020, to show transparency and make the public more comfortable and confident with AVs. In the report, the Waymo driver drove 6.1 million miles and was involved in 47 road traffic collisions and near-miss events; these include both actual and counterfactual simulated events (Schwall et al., 2020). Most of the reported collisions were induced by humans breaking one or more road traffic rules, such as violating the speed limit, driving on the wrong side of the road, not obeying the stop sign or the red traffic light signal, performing an inappropriate lane change or junction merging, not yielding the right of way to the Waymo driver, and not yielding to the slowing-down behaviour of the Waymo driver. Although some of these cited events could not be avoided by the AVs, for example, the conventional drivers hitting the rear of the AV while it was stationary or slowing down, there were instances where they could have been. For instance, accidents caused by a lane change manoeuvre, merging from a junction, or making a turning manoeuvre could have been avoided if the AV had been able to predict the trajectories and intentions of the conventional drivers accurately and over a longer horizon. This shows that there is still room for improvement in the AV system, mainly in the behaviour prediction
of other road users, since it enables the AVs to make a risk assessment of the situation in order to take appropriate action. The goal of this paper is to review the most relevant works that aimed to predict the trajectories and intentions of vehicles and pedestrians.

There are several literature reviews covering both traditional and Deep Learning (DL) techniques to predict the behaviour of vehicles, for example, Lefèvre et al. (2014), Leon and Gavrilescu (2019), Shirazi and Morris (2016), Sivaraman and Trivedi (2013) and Mozaffari et al. (2020). Sivaraman and Trivedi (2013) briefly reviewed the behaviour prediction of vehicles, but at that time this topic was fairly new and only traditional techniques were reviewed. Lefèvre et al. (2014) presented a survey and classified vehicle behaviour prediction algorithms into physics-based, manoeuvre-based, and interaction-aware algorithms. They concluded that a behaviour prediction algorithm needs to consider the interaction between vehicles as well as the scene context to have a longer prediction horizon. In addition, they reviewed the existing risk assessment methods for autonomous vehicles and concluded that a risk assessment module was highly dependent on the behaviour prediction algorithm. In this review, the authors only covered traditional techniques, since DL techniques for vehicle behaviour prediction were still emerging at the time. Shirazi and Morris (2016) reviewed techniques used to analyse vehicles', drivers', and pedestrians' behaviour at road intersections. Only traditional techniques were analysed; however, the focus was not on the prediction of vehicle behaviour. Leon and Gavrilescu (2019) reviewed methods used for vehicle tracking, behaviour prediction, and decision-making. Both traditional and DL techniques were covered. The authors concluded that DL techniques had better results since they are more robust, flexible and have better generalisation ability. Mozaffari et al. (2020) performed a systematic and comparative review of the different DL methods used to predict vehicle trajectories and intentions. They presented a more detailed taxonomy of the behaviour prediction algorithms compared to Lefèvre et al. (2014). They categorised the algorithms based on the type of input, the type of output, and the method of prediction. Although the review was extensive and very informative, the authors did not cover in detail what intention behaviour the works were trying to predict, for example, lane change, overtaking, or making a turn, and did not provide specific information on what datasets were used.

The following works have performed pedestrian behaviour prediction reviews: Chen, Ding, et al. (2020), Kong and Fu (2018), Ridel et al. (2018), Rudenko et al. (2020), Sharma et al. (2022) and Ahmed et al. (2019a). Kong and Fu (2018) presented traditional and DL techniques that were used to recognise and predict human action. Ahmed et al. (2019a) presented a survey on the detection and intention prediction of pedestrians and cyclists. A review on pedestrian behaviour was presented by Ridel et al. (2018), where they briefly described the traditional and DL techniques that were used. Chen, Li, et al. (2020) discussed the required architecture and the traditional and DL techniques to detect and predict pedestrian actions. Although these works reviewed DL techniques, only a limited number of works were considered. A detailed human trajectory prediction survey was done by Rudenko et al. (2020), where they reviewed a substantial amount of published works to propose a taxonomy, identify the available datasets and evaluation metrics, and the limitations of the current methods. However, the authors did not review methods used to predict pedestrian intentions. A comprehensive survey was done by Sharma et al. (2022) on pedestrian intention prediction for AV systems.

To the authors' knowledge, the work presented by Gulzar et al. (2021) is the only one that reviewed the behaviour prediction of both pedestrians and vehicles. The authors presented a novel taxonomy that unifies both pedestrian and vehicle behaviour prediction problems. However, the authors did not explore the evaluation metrics, datasets, features, and the results of the reviewed works.

Unlike the previously cited review works on both pedestrian and vehicle behaviour prediction, this paper:

• Presents a general behaviour prediction problem formulation.
• Presents the most used terminologies in the pedestrian and vehicle behaviour prediction domain.
• Reviews not only pedestrian or vehicle behaviour prediction algorithms, but both of them.
• Briefly presents the most important traditional techniques and focuses more on the DL techniques for pedestrian and vehicle prediction algorithms.
• Summarises the key information extracted from the reviewed studies on predicting pedestrian and vehicle behaviour in tables. These tables report the methods employed, the problem that the algorithms are trying to solve, the datasets used, and the results acquired.
• Reviews works that have performed behaviour prediction for heterogeneous traffic agents.
• Introduces a general framework for a behaviour prediction system, highlighting the system's dependence on the AV's hardware and the perception module, and its typical outputs. In addition, presents a risk assessment for a general behaviour prediction system.
• Identifies the requirements and challenges to design a pedestrian and vehicle behaviour prediction system for AVs.
• Discusses whether the current techniques have met the previously mentioned requirements, and suggests future works.

Some of the commonly used terminologies in the pedestrian and vehicle behaviour prediction literature are listed below (Mozaffari et al., 2020); a short windowing sketch follows the list.

• Object behaviour: means the object trajectories or intentions.
• Object trajectory: vectors with a sequence of data, typically comprised of tracking information, that describe the path an object has followed.
• Object intention: a course of actions that an object intends to perform to achieve its goal. In the vehicle domain, these courses of action are known as manoeuvres, such as turning, changing lanes, stopping, cut-in/cut-out, etc. In the pedestrian domain, these intentions are crossing/non-crossing, stopping, etc.
• Observation time horizon (OTH): the time over which an algorithm observes the past behaviours of an object to predict its future behaviour.
• Prediction time horizon (PTH): most of the reviewed works use prediction horizon to refer to the time over which an algorithm can predict an object's behaviour before it happens. However, in some works, the term 'prediction' is replaced with 'anticipation', and it is defined as the time that an algorithm can predict an object's behaviour before it begins. This paper adopts the term prediction and its first meaning.
• Ego Vehicle (EV): the vehicle that observes the other traffic agents using on-board sensors.
• Target object: the object that the EV is observing to predict its behaviour.
• Surrounding objects: the objects that may interact with and affect the behaviour of the target object.
• Multi-modal behaviour: means that an observed history of behaviours could lead to multiple potential future behaviours.
• Trajectory prediction: means to predict the future motion of an object given a time frame of its and/or surrounding objects' trajectories, contextual information, and interactions between the objects in the scene.
• Intention prediction: usually uses the same history information that trajectory prediction uses; however, the system aims to predict the future discrete action of the target object.
• Interaction: influences that one or more objects have on each other.
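To make the OTH/PTH terminology concrete, the following is a minimal sketch, assuming a track stored as a NumPy array of (x, y) positions sampled at 10 Hz; the array, horizon values, and frame rate are hypothetical:

```python
import numpy as np

# Hypothetical track: 60 frames of (x, y) positions sampled at 10 Hz.
track = np.random.rand(60, 2)

OTH = 30  # observation time horizon: 30 frames = 3.0 s at 10 Hz
PTH = 20  # prediction time horizon: 20 frames = 2.0 s at 10 Hz
t = 35    # index of the last observed frame

observation = track[t - OTH + 1 : t + 1]   # what the predictor sees
ground_truth = track[t + 1 : t + 1 + PTH]  # the future it must match

print(observation.shape, ground_truth.shape)  # (30, 2) (20, 2)
```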
Fig. 1. Object behaviour prediction full pipeline process. The detection and classification stage outputs the object position, size, type, bounding box, segmentation, and global and local context information. The object tracking stage outputs the ID for each detected object and its dynamics (e.g., speed). The output of the object behaviour prediction module can be the object's intention and its future trajectory.

et al. (2022), Piccoli et al. (2020), Rasouli et al. (2019, 2020), Vitas et al. (2020), Yang, Zhang, et al. (2022), Yao et al. (2021b), Zeng (2022), Zhang, Angeloudis, and Demiris (2022), and Xue et al. (2018), a general intention prediction problem formulation is as follows: a sequence of feature vectors $\{F_{t-OTH}, \ldots, F_t\}$, extracted from a given sequence of video frames $\{t-OTH, \ldots, t\}$ acquired from an image sensor, is used by a model to determine the probability of the target agent intention $I^{a}_{t+n} \in \{0, 1\}$, where $t$ is the time of the last observed frame and $n$ is the number of frames from the last observed frame to the final frame of the event, also known as the time-to-event (TTE). The intention prediction estimate can be described by the equation

$P\big(I^{a}_{t+n} \mid F_{t-OTH}, \ldots, F_{t}\big),$

i.e., the probability of the intention at frame $t+n$ given the features observed up to frame $t$.
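As a minimal illustration of this formulation, and not the architecture of any particular reviewed work, the sketch below assumes the per-frame feature vectors F have already been extracted and uses a GRU to map the observed window to an intention probability; all class names and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class IntentionClassifier(nn.Module):
    """Maps a feature sequence {F_(t-OTH), ..., F_t} to P(I_(t+n) = 1)."""
    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, OTH, feature_dim), one vector per observed frame
        _, h = self.encoder(features)
        return self.head(h[-1])  # (batch, 1): e.g. probability of "crossing"

model = IntentionClassifier(feature_dim=32)
probs = model(torch.randn(4, 16, 32))  # 4 tracks, an OTH of 16 frames
```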
Fig. 3. General interactions among traffic agents and their environments. Object 1 is the target object (blue circled), the blue arrow shows the direct interaction between the
target object and object 2; the orange arrows show the interaction between object 2 and objects 3, 9, 11, 12, and 13; the yellow arrow shows the interaction between object 17
and object 3.
et al., 2021; Mozaffari et al., 2020). These works will be discussed in the upcoming sections.

Before discussing the behaviour prediction of pedestrians and vehicles, it is important to understand their potential interactions. As depicted in Fig. 3, interactions among different traffic agents can cascade and get very challenging. For example, in order to predict the actions of object 1, it might be required to consider the actions of the following agents (see the sketch after Table 1):

• Object 2, since it can change direction and velocity.
• Object 3, since its action will affect the action of object 2.
• Object 17, since it will affect the action of object 3.
• Object 9, since it will affect the action of object 2.
• Object 13, since it can make a right turn, which will affect the action of object 2.
• Object 11 or 13, since they may break the law by not obeying the red traffic light.

Table 1
Motion, context and intention features that can be used to predict vehicle behaviour.

MOTION
Target Vehicle (TV): lateral/longitudinal position, velocity, acceleration, yaw, yaw rate, and relative speed.
TV-to-lane: lateral offset, and lateral speed.
TV-to-Surrounding Vehicle (SV): distance from surrounding vehicles.

CONTEXT
Road: lane marking, number of lanes, lane width, lane curvature, type of lines, entries, exits, left/right/forward arrows, crosswalks, traffic lights, traffic signs, type of road (urban, country, highway-motorway), bumps, road holes, road works, left/right-hand side traffic, and junctions.
Vehicle: indicators, brake lights, warning lights, type of the vehicle, and sirens' light status.
Other road agents: pedestrians, animals, cyclists, and trams.
Environment: sunny, snowing, rainy, foggy, and dark.

INTENTION
Braking, turning left/right, lane keeping, left/right lane change, speeding, normal driving, aggressive driving, abnormal driving, merging, exiting, cutting in/out, and yielding.
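To make the cascade concrete, the sketch below encodes the Fig. 3 example as a hypothetical influence graph and walks it backwards from the target object to collect every agent an interaction-aware predictor might need to consider:

```python
from collections import deque

# Hypothetical influence graph for the Fig. 3 example: an edge a -> b means
# "the action of agent a affects the action of agent b".
influences = {
    2: [1],                 # object 2 directly affects the target, object 1
    3: [2], 9: [2], 13: [2],
    17: [3],
    11: [2],                # e.g., running the red light in front of object 2
}

def relevant_agents(target: int) -> set:
    """Collect every agent whose action can cascade down to the target."""
    reverse: dict = {}
    for src, dsts in influences.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    seen, queue = set(), deque([target])
    while queue:  # breadth-first walk against the edge direction
        node = queue.popleft()
        for src in reverse.get(node, []):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen

print(relevant_agents(1))  # {2, 3, 9, 11, 13, 17}
```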
In the vehicle behaviour prediction domain, the literature often uses the terms prediction of driver/vehicle behaviour or prediction of target/surrounding vehicle behaviour. The former usually means to predict the behaviour of the ego vehicle using its internal data, such as the steering angle, brake pedal position, velocity, speed, indicator status, etc. (Berndt & Dietmayer, 2009; Girma et al., 2020; Raimundo & Favio, 2021; Xing et al., 2017). This approach is suitable for AV systems when considering vehicle-to-vehicle communication. The latter approach involves the ego vehicle using on-board sensors to gather information from the surrounding vehicles to predict their behaviour. In this review, only the latter approach is covered, as vehicle-to-vehicle communication is not yet available and AVs would still share roads with conventional human drivers.

Vehicle behaviour prediction is a crucial component of the AV behaviour prediction system, as it would enable the AV to perform risk assessment, plan future movements, and make appropriate decisions to avoid/mitigate the impact of collisions. Ideally, a vehicle behaviour prediction algorithm should be fast, cost-effective, accurate, generalise well in different traffic scenes, consider the interdependence between agents, and have a long prediction horizon. A long prediction horizon provides the AV with more time to make decisions and take appropriate actions. A typical vehicle behaviour prediction pipeline consists of multiple steps, starting with the detection of the target vehicle and the surrounding vehicles. This detection information is used to obtain tracking information. Subsequently, this tracking information is used as an observation feature to predict future trajectories. In order to enhance the quality and duration of predictions, context information of the traffic scene and the intention manoeuvres of other vehicles can be considered. Table 1 provides the type of motion, context, and intention information that has been and could be used by researchers to predict vehicle behaviour.

Although vehicles have some characteristics that simplify their behaviour prediction, such as constrained movement due to their inertial properties, having to obey traffic road rules, and navigating inside the
road boundaries, it is still a challenging task, since their behaviour is dependent on other vehicles' actions, traffic regulations, road geometry, and different driving environments (Lefèvre et al., 2014; Mozaffari et al., 2020). Moreover, vehicles have multi-modal behaviour, different types of vehicles might provide different motion information, and prediction can be affected if surrounding vehicles are occluded.

The two main sources of data used to predict the behaviour of vehicles are top-view and on-board sensors. Top-view data are captured from static sensors usually installed on tall buildings, while on-board data are captured from sensors installed on the EV. Top-view data have the advantage of providing more precise information, since the acquired data have better quality, the vehicles surrounding the TV are captured, and vehicles are not easily occluded. However, they only cover a specific and fixed portion of the traffic scene, limiting the ability of the algorithm to generalise to other traffic scenarios. Top-view sensors are typically used in two types of traffic environments: highways-motorways and complex traffic scenes, such as busy urban areas and junctions. Highway-motorway datasets can suffer from imbalanced samples, where there are more instances of constant velocity behaviour than of the specific manoeuvres of interest (Altché & de La Fortelle, 2017). On-board sensor data can capture different traffic scenarios; however, their quality can be affected by noise, surrounding vehicles can be occluded, and in order to detect all the vehicles surrounding the EV and the TV, more than one sensor might be required (e.g., front, rear, and side cameras) (Izquierdo et al., 2021). On-board sensor data are particularly advantageous for AV applications because the algorithms that use them could be directly integrated into AVs, which are already equipped with on-board sensors. Several sensors, such as cameras, radar, and LIDAR, could be used to acquire both top-view and on-board data (Izquierdo et al., 2019; SIMulation, 2007; Zhou et al., 2020; Zyner et al., 2019). However, this research mainly focuses on works that have used camera sensors. For more information about the available datasets for vehicle behaviour prediction, please refer to Izquierdo et al. (2021). Table 2 summarises the most relevant vehicle trajectory and intention prediction works from 2009 to 2022. From the table, the following is observed:

• Shift to Deep Learning and the NGSIM dataset: up to 2016, the majority of the works used traditional techniques and their OWN datasets; after 2016, most of the works adopted DL techniques and used the NGSIM dataset.
• Expanding information sources: vehicle behaviour prediction algorithms have evolved from using only motion information to incorporating additional sources, including manoeuvre, interaction, and driver-style information.
• Limited use of other datasets: while the NGSIM dataset gained popularity, other datasets such as Apollo, KITTI, LISA, INTERACTION, HighD, and PREVENTION were rarely used.
• Trajectory prediction dominance: the majority of the research efforts were to predict trajectories. It was not until 2020 that more research began to address the prediction and recognition of vehicle intentions.
• Focus on lane changing and turning manoeuvres: most research works focused on predicting the trajectories and intentions related to lane changing and turning manoeuvres. Other types of manoeuvres, such as reversing, braking, and U-turns, were seldom considered.
• Evaluation metrics: the most common evaluation metric for trajectory prediction was the Root Mean Square Error (RMSE), while for intention prediction, accuracy was the predominant evaluation metric.

The following two subsections discuss the algorithms used to predict vehicle behaviour. The first covers the algorithms used to predict trajectories, and the latter the algorithms used to detect and predict vehicle intention.

3.1. Trajectory prediction

As reported in Table 2, vehicle trajectory prediction has been achieved using one or more of the following approaches: physics-based, manoeuvre-based, or interaction-aware motion models (Lefèvre et al., 2014). Physics-based motion models were among the first approaches to be proposed, and they use the principles of physics to predict vehicle motions. This approach is computationally efficient, meets real-time requirements, and does not require the dataset to be human-labelled. However, it is less suitable for complex scenarios like busy urban scenes and junctions. This is because it does not take into account the TV intentions, the contextual information of the scene, or the interaction between the TV and the SVs. This lack of information limits the prediction horizon for the EV to less than 1 s (Lefèvre et al., 2014). In order to overcome the limitation of a short prediction horizon associated with the physics-based approach, manoeuvre-based approaches were introduced. In the manoeuvre-based approach, the EV uses the predicted intention of the TV to predict future trajectories. This increases both the trajectory prediction horizon and accuracy, as the predicted trajectory would match the predicted intention. However, if the predicted manoeuvre is incorrect, the whole predicted trajectory may also be inaccurate. The interaction-aware approach uses the trajectories and the intentions of both the TV and the SVs to predict the TV trajectory. This approach further extends the prediction horizon and improves the accuracy of the predicted trajectories. On the other hand, it comes with complexities in implementation, demands greater computational power, and raises questions about how to determine which vehicles should be considered as SVs; furthermore, not all SVs might be reliably detected by the EV.

The previously cited approaches have been implemented using either traditional or DL techniques. Traditional techniques encompass linear methods, like the KF and Switching Linear Dynamic Models, as well as non-linear methods, such as the EKF, UKF, Switching Non-Linear Dynamic Models, particle filters, Bayesian filtering, Monte Carlo simulation, Naive Bayes classifiers, Dynamic Bayesian Networks, HMMs, SVMs, case-based reasoning, random decision forests, Artificial Neural Networks (ANN), and Gaussian Process NNs (Biparva et al., 2021). Traditional techniques have the advantage of being fast at inference and not requiring an extensive dataset. However, they struggle to generalise well and have limited prediction horizons. Additionally, most traditional techniques do not inherently account for vehicle interactions and may require additional features. The DL techniques used in the literature were based on ANNs, Convolutional Neural Networks (CNN), Fully Connected Networks (FCN), Recurrent Neural Networks (RNN), Graph Convolutional Neural Networks (GCNN), Gated Recurrent Units (GRU), or Long Short-Term Memory (LSTM) (Biparva et al., 2021). The main advantage of DL techniques is their ability to implicitly extract the features required to predict vehicle behaviour. Some DL techniques even consider the interaction between vehicles by themselves, for instance, RNNs and GCNNs. Yet, DL techniques may not address the multi-modal behaviour of vehicles, as they tend to average the multiple possible modalities to minimise the regression error. They also require an extensive dataset to generalise well, take longer to train, may suffer from gradient vanishing, and may not provide accurate trajectory prediction for longer time horizons.

The following paragraphs (presented after Table 2) will discuss the most relevant DL algorithms used to predict vehicle trajectories.
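Before turning to those DL algorithms, a minimal sketch of the physics-based baseline they are typically compared against: constant-velocity extrapolation of the last observed state, with no manoeuvre, context, or interaction awareness. This is an illustration under simplified assumptions rather than the implementation of any reviewed work; in practice such a model is usually wrapped in a Kalman filter:

```python
import numpy as np

def constant_velocity_predict(track: np.ndarray, dt: float, horizon: int) -> np.ndarray:
    """Extrapolate future (x, y) positions from the last observed velocity.

    track: (T, 2) observed positions; dt: sampling period in seconds;
    horizon: number of future steps to predict.
    """
    velocity = (track[-1] - track[-2]) / dt          # last finite-difference velocity
    steps = np.arange(1, horizon + 1)[:, None] * dt  # (horizon, 1) future time offsets
    return track[-1] + steps * velocity              # (horizon, 2) predicted positions

# Example: a vehicle moving 1 m per 0.1 s step along x.
obs = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(constant_velocity_predict(obs, dt=0.1, horizon=3))
# [[3. 0.] [4. 0.] [5. 0.]]
```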
Table 2
Relevant works for vehicle trajectory and intention prediction.
Work Methods Algorithm objectives Dataset-results
PF+RBF Hermes et al. (2009) Trajectory prototype. Particle Filter (PF) to track and Predict future trajectories of the OWN
generate motion hypothesis. RBF to classify trajectories. ego and surrounding vehicles. See Table 3.
QRLCS to measure similarity between trajectories.
Evaluation: RMSE.
Lim et al. (2010) Extended Kalman Filter Estimate Position and Velocity. OWN
Evaluation: Mean Distance Error. Graphs.
Kasper et al. (2012) Bayesian Networks. Detection of lane change OWN
Occupancy Grid Map (OGM). manoeuvre. Accuracy: 83.8%.
Evaluation: FP, FN, and Accuracy.
Kumar et al. (2013) SVM. Predict lane change manoeuvres OWN
Bayesian Filter. of the EV. Recall: 1
Evaluation: Recall, Precision, and F1-score. Precision: 0.8
F1-score: 0.9
APT: 0.97 s
Yoon and Kum (2016) Target lane model to predict in which lane the target vehicle Predict lane change of NGSIM
will go. surrounding vehicles. Absolute error: 0.7 m.
3rd Order Linear System to model trajectory.
Auto encoder to cluster the available trajectories into 3
prototype trajectories.
Multi-layer Perceptron (MLP) network to predict the target
lane and the probability for each one of the prototype
trajectories.
OTH/PTH: (1 s, 2 s, 3 s, 4 s,5 s)/5 s.
Evaluation: Prediction time and absolute error of lateral
position.
Khosroshahi et al. (2016) Features: linear changes, angular changes, and angular Classify manoeuvre intention at KITTI
changes histogram. intersections. 2 classes: 85%.
Multi-layer LSTM. 3 classes: 75%.
Evaluation: Accuracy. 8 classes: 65%.
12 classes: 40%.
Dueholm et al. (2016) Detection: DMP + Feature Pyramid + HOG Predict future trajectories of the OWN
Tracking: MDP + TLD. surrounding vehicles. Recall: 92%
Trajectory: KF.
Evaluation: Recall.
Kim et al. (2017) LSMT-RNN. Predict the future position of the OWN
OGM. surrounding vehicle using OGM. MAE:1.51 for 2 s; 0.88 for 1 s;
Data-driven approach. and 0.59 for 0.5 s.
PTH: 0.5 s, 1 s, and 2 s.
Information: Position, the velocity of surrounding vehicles,
and velocity and yaw rate of ego vehicle.
Evaluation: Mean Absolute Error(MAE).
Lee, Kwon, et al. (2017) CNN. Predict lane change manoeuvre. OWN
Evaluation: Accuracy. Accuracy: 89.87%
DESIRE Observation, sample generation, and rank refinement. Predict the future position of the SDD KITTI
Lee, Choi, et al. (2017) CVAE + RNN (GRU) to predict multi-modal trajectories surrounding vehicles considering See Table 3.
considering latent variables. static and dynamic scene context
IOC (based on Reinforcement Learning) to rank and refine and interaction between agents.
the predicted trajectories.
Spatial Grid-Based Pooling Layer to extract interaction
feature.
SCF to combine agents’ interactions and scene context.
OTH/PTH: 2 s/4 s.
Evaluation: L2 distance error and miss rate.
Altché and de La Fortelle (2017) LSTM encoder–decoder. Predict the target vehicle’s future NGSIM
Evaluation: average RMSE. position by considering See Table 3.
surrounding vehicles.
Xing et al. (2017) Two LSTM networks, one to encode past trajectories and Predict vehicle trajectory using NGSIM
predict intention manoeuvre, the other to encode past past trajectories and predicted See Table 3.
trajectories, and the predicted manoeuvre to decode future manoeuvre intention.
trajectories.
Evaluation: lateral and longitudinal RMSE.
Park et al. (2018) LSTM encoder–decoder. Predict the future position of the OWN
OGM. target and the surrounding MAE (Grid): 1.27 for 2 s; 1.14
Beam search algorithm. vehicles. for 1.6 s; 0.99 for 1.2 s; 0.84
OTH/PTH: 3 s/2 s. for 0.8 s; and 0.64 for 0.4 s.
Evaluation: MAE.
Table 2 (continued).
Work Methods Algorithm objectives Dataset-results
M-LSTM Tracking history and Manoeuvres classification (Lane change, Trajectory prediction of NGSIM
Deo and Trivedi (2018b) brake, and normal driving) to allow multi-modal prediction. surrounding vehicles considering See Table 3.
LSTM encoder–decoder to encode tracked history motions the interaction between traffic
and to decode multi-modal future motions. agents.
OTH/PTH: 3 s/5 s.
Evaluation: RMSE.
C-VGMM+VIM HMM for manoeuvre recognition. Manoeuvre Intention (lane LISA-A
Deo et al. (2018) IMM + VGMM to predict trajectories. change, overtaking, cutting-in, MAE overtakes and cut-ins: 2.49
Markov Random Field for vehicle interaction. drift into ego lane) and for 5 s; 1.94 for 4 s; 1.39 for 3
PTH: 5 s Trajectory Prediction. s; 0.82 for 2 s; and 0.29 for 1 s.
Evaluation: Manoeuvre classification accuracy, mean and MAE stop-and-go: 2.17 for 5 s;
median error for the trajectory prediction. 1.65 for 4 s; 1.14 for 3 s; 0.64
for 2 s; 0.20 for 1 s.
Accuracy for overtakes and
cut-ins: 55.89%
Accuracy stop-and-go: 87.19%
Time: 6FPS.
CS-LSTM LSTM encoder–decoder to encode previous motion Predict future motions of NGSIM
Deo and Trivedi (2018a) information and to decode future motion. surrounding vehicles taking into See Table 3.
Convolutional Social Pooling to learn agent’s consideration motion, spatial Computation time: 0.29 s
interdependence motions. configuration, and (reported by Li et al. (2019b)).
Multi-modal prediction (6 classes: RLC, LLC, NLC, brake, and interdependence between agents.
normal).
OTH/PTH: 3 s / 5 s.
Evaluation: RMSE and Negative log-likelihood (NLL).
SA-LSTM Surrounding-Aware LSTM. Predict lane change manoeuvre NGSIM
Su et al. (2018) OTH: 6, 9, and 12 frames. and future trajectories. Avg. Accuracy: 86.19%.
Evaluation: Accuracy.
MATF Hybrid Model (LSTM + CNN) Trajectory prediction by NGSIM
Zhao et al. (2019) LSTM to encode past trajectories for multiple agents. considering social interaction and See Table 3.
CNN to encode context information. scene context.
MATF to fuse interaction, spatial structure, and context
information.
Conditional generative adversarial training to detect
uncertainty in predicting manoeuvres.
Environment: Highway-Motorway and pedestrian crowd
scenes.
OTH/PTH: 3 s / 5 s.
Evaluation: RMSE.
Benterki et al. (2019) Features: local position, velocity, acceleration, distance to Predict lane change manoeuvres NGSIM
lane markings, yaw angle and rate, lateral velocity, and of the surrounding vehicles. ANN Accuracy: 98.8%.
acceleration. Prediction: 2.4 s.
ANN and SVM. SVM Accuracy: 97.1%.
Evaluation: Recall, Accuracy, Precision, and F1-score. Prediction: 1.9 s.
ST-LSTM Spatio-temporal LSTM. Trajectory prediction by NGSIM I-80
Dai et al. (2019) Short-cut connections to avoid gradient vanishing. considering spatial and temporal See Table 3.
Weighted sum to integrate the outputs. information.
Consider the 6 vehicles around the target vehicle.
OTH/PTH: 3 s/6 s.
Evaluation: RMSE.
GRIP Fixed Graph Convolutional (10 blocks) Model to represent Predict surrounding vehicle NGSIM
Li et al. (2019b) interactions between agents. trajectories considering the See Table 3.
Single LSTM encoder–decoder to make trajectory predictions. interaction between them. Computation time: 0.05 s.
OTH/PTH: 3 s/5 s.
Hardware: 4.0 GHz i7, 32GB memory, and NVIDIA Titan XP.
Evaluation: RMSE.
GRIP++ Dynamic Graph Convolutional (3 blocks) Model to represent Predict surrounding vehicle ApolloScape
Li et al. (2019a) interactions between agents. trajectories considering the WSADE: 1.2588.
Three GRU-RNN encoder–decoder to make trajectory interaction between them. WSFDE: 2.3631.
predictions. NGSIM
OTH/PTH: 3 s/5 s. See Table 3.
Hardware: 4.0 GHz i7, 32GB memory, and NVIDIA Titan XP. Computation time: 0.02 s.
Evaluation: RMSE, WSADE, and WSFDE.
NLS-LSTM Local and non-local social pooling. Predict vehicle trajectory using HighD
Messaoud et al. (2019) LSTM encoder–decoder. local and non-local social pooling. See Table 3
Evaluation: RMSE. NGSIM
See Table 3
Table 2 (continued).
Work Methods Algorithm objectives Dataset-results
Benterki et al. (2020) Hybrid Model Manoeuvre classification and NGSIM
ANN to classify manoeuvres. trajectory prediction. See Table 3
LSTM to predict trajectories.
OTH: 3 s, 5 s, and 6 s.
PTH: 1 s, 3 s, and 5 s.
Evaluation: RMSE and classification accuracy.
Fernández-Llorca et al. (2020) Two stream CNN (Disjoint). Recognition and prediction of PREVENTION
Spatio-temporal Multiplier Networks (ST) (cross-stream lane change/keep manoeuvre Disjoint
connections). using stacked visual cues from Classification Accuracy:89.46%.
ResNet-50 to extract both temporal and contextual videos. Prediction Accuracy:91.02%.
information. ST
OTH/PTH: 2 s/(1–2 s). Classification Accuracy: 90.30%.
4 Sizes of RoI are used x1, x2, x3 and x4. Prediction Accuracy: 91.94%.
Dense optical flow to extract movement context.
Evaluation: Classification accuracy and Prediction Accuracy.
ARIMA-Bi-LSTM Off-line Bi-LSTM. Predict trajectories and turning NGSIM-LP
Zhang and Fu (2020) Online ARIMA + Bi-LSTM. manoeuvres at intersections. GS: lateral 0.032; long. 0.1093.
PTH: 5 s. TL: lateral 0.2719; long. 0.1592.
Evaluation: RMSE and Accuracy. TR: lateral 0.1168 long. 0.3954
Accuracy: 94.2% at 1 s, 93.5%
at 2 s, and 74.5% at 3 s.
Izquierdo et al. (2021) TSM to differ between target and surrounding vehicles. Detection and prediction of lane PREVENTION
TIM to extract motion pattern. change performed by surrounding Manoeuvre Detection:
Greyscale image to extract context information. vehicles. Present a baseline to Accuracy: 82.7%.
Compared various CNN models to detect and predict compare human performance Anticipation:2.28 s.
manoeuvres. against automated systems. Briefly Manoeuvre Prediction:
OTH: 1 s. compared the available datasets. Accuracy: 83.4%.
Evaluation: Accuracy, precision, recall, anticipation (s), and Prediction: 0.72 s.
AUC.
Biparva et al. (2021) 4 action recognition models were evaluated: Two-stream Recognition and prediction of PREVENTION
CNN, Two-stream Inflated 3D CNN, STM network, and lane change/keep event using Accuracy for STM: 91.91% for 2
SlowFast Network. stacked visual cues from videos. s; 86.51% for 1 s.
4 Sizes of RoI.
Dense optical flow to extract movement context.
OTH:PTH: 2 s/(1–2 s).
Evaluation: Accuracy (%).
ST-Conv-LSTM Spatial–temporal Convolutional LSTM. Predict lateral (lane change) and BDD100K
Huang et al. (2021) OTH/PTH: 2.4 s/1 s. longitudinal (holding, sharp Accuracy: 57.9%.
Evaluation: Accuracy. acceleration, deceleration, and
stopping) intention.
IPTM-LSTM Intention encoder–decoder LSTM. Use intention to predict trajectory NGSIM-LP
Zhang, Song, et al. (2021) Trajectory encoder–decoder LSTM. of travelling straight, turning Avg. Intention Accuracy:
IPTM. left/right and braking. 90.94%
Evaluation: Accuracy and RMSE. RMSE: See Table 3
INTERACTION
Avg. Intention Accuracy:
86.92%.
LSTM-GAN LSTM + Generative Confrontation Network. Predict vehicle turning intention. OWN
He et al. (2021) Evaluation: Accuracy. Accuracy: 90.9%.
Luan et al. (2022) Game theory model to predict the intention of the driver. Predict the trajectory of lane NGSIM
Recognise the vehicle behaviour using past vehicle state. change manoeuvres using driver Graphs.
Nash-optimisation function. style (aggressive or conservative)
Evaluation: Lateral position error, yaw rate error, and behaviour recognition.
probability error.
AI-TP Approach: Data-driven. Trajectory prediction. NGSIM
Zhang, Zhao, et al. (2022) Features: Past trajectories. See Table 3
Model(s): graph attention mechanism (AI-TP), ConvGRU,
Evaluation: MSE.
Altché and de La Fortelle (2017) and Kim et al. (2017) were, to the authors' knowledge, among the first to use LSTM-RNNs to predict the future trajectories of the surrounding vehicles by using their past trajectories as the input feature. Park et al. (2018) predicted future trajectories using an encoder–decoder LSTM. The encoder encodes the past trajectories of the surrounding vehicles, while the decoder decodes future trajectories in an Occupancy Grid Map (OGM). The authors also applied a beam search algorithm to reduce the error propagation caused by the greedy strategy that the decoder LSTM uses to maximise the output probabilities.
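A minimal sketch of the encoder–decoder pattern shared by these works is given below; it is illustrative rather than the exact architecture of Kim et al. (2017) or Park et al. (2018), the dimensions are hypothetical, and the OGM output and beam search are omitted:

```python
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    """Encode OTH past (x, y) steps, decode PTH future steps autoregressively."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(2, hidden, batch_first=True)
        self.decoder = nn.LSTM(2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, past: torch.Tensor, pth: int) -> torch.Tensor:
        _, state = self.encoder(past)        # summarise the past trajectory
        step = past[:, -1:, :]               # seed with the last observed position
        future = []
        for _ in range(pth):                 # greedy decoding; Park et al. apply
            dec, state = self.decoder(step, state)  # beam search here instead
            step = self.out(dec)
            future.append(step)
        return torch.cat(future, dim=1)      # (batch, PTH, 2)

model = Seq2SeqTrajectory()
pred = model(torch.randn(8, 30, 2), pth=25)  # 3 s observed -> 2.5 s predicted at 10 Hz
```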
Deo and Trivedi (2018b) presented a Manoeuvre-LSTM model that encodes the motion and interaction of the surrounding vehicles to assign probabilities to each manoeuvre. The assigned probabilities enable multi-modal trajectory predictions. At the time, the algorithm achieved better RMSE results than the state-of-the-art algorithms, but the RMSE values for long PTHs were still high. Although the algorithm considered the interaction between vehicles, it did not consider their inter-dependencies. In order to overcome this limitation, Deo and Trivedi (2018a) combined convolutional social pooling and an encoder–decoder LSTM to predict manoeuvres and future trajectories. The convolutional social pooling can learn the interaction and interdependence of the surrounding vehicles. The downside of the algorithm is that the social tensor of the convolutional social network was fixed to the defined spatial grid around the target vehicle, and it did not consider visual context information. The disadvantage of the last two algorithms is that the predicted trajectories are dependent on the manoeuvre classification performance. For example, Deo and Trivedi (2018a) compared their algorithm with and without considering manoeuvre intention
Table 3
Results for the most relevant vehicle trajectory prediction works.
Work Dataset Metrics Axis Obs. Hor. 1 s 2 s 3 s 4 s 5 s 6 s
CV NGSIM RMSE Both 3 s 0.73 1.78 3.13 4.78 6.68 -
Deo and Trivedi (2018a)
S-LSTM NGSIM RMSE Both 3 s 0.65 1.31 2.16 3.25 4.55 -
Alahi et al. (2016)
GAIL-GRU NGSIM RMSE Both 3 s 0.69 1.51 2.55 3.65 4.71 -
Kuefler et al. (2017)
C-VGMM+VIM NGSIM RMSE Both 3 s 0.66 1.56 2.75 4.24 5.99 -
Deo et al. (2018)
M-LSTM NGSIM RMSE Both 3 s 0.58 1.26 2.12 3.24 4.66 -
Deo and Trivedi (2018b)
CS-LSTM(M) NGSIM RMSE Both 3 s 0.62 1.29 2.13 3.20 4.52 -
Deo and Trivedi (2018a)
CS-LSTM NGSIM RMSE Both 3 s 0.61 1.27 2.09 3.10 4.37 -
Deo and Trivedi (2018a)
MATF GAN NGSIM RMSE Both 3 s 0.66 1.34 2.08 2.97 4.13 -
Zhao et al. (2019)
ST-LSTM-1350 NGSIM RMSE Both 3 s 0.56 1.19 1.93 2.78 3.76 4.84
Dai et al. (2019) avg.
GRIP NGSIM RMSE Both 3 s 0.37 0.86 1.45 2.21 3.16 -
Li et al. (2019b)
GRIP++ NGSIM RMSE Both 3 s 0.38 0.89 1.45 2.14 2.94 -
Li et al. (2019a)
AI-TP NGSIM RMSE Both 3 s 0.47 1.05 1.53 1.93 2.31 -
Zhang, Zhao, et al. (2022)
NLS-LSTM NGSIM/HighD RMSE Both 3 s 0.56/0.20 1.22/0.57 2.02/1.14 3.03/1.90 4.30/2.91 -
Messaoud et al. (2019)
OGM-LSTM NGSIM RMSE Lateral/Longi. – 0.56/3.05 1.24/6.70 - - - -
Kim et al. (2017)
Dual LSTM NGSIM RMSE Lateral/Longi. 5 s 0.15/0.47 0.26/1.39 0.38/2.57 0.45/4.04 0.49/5.77 -
Xing et al. (2017)
Altché and de La Fortelle (2017) NGSIM RMSE Lateral/Longi. – 0.11/0.71 0.25/1.98 0.33/3.75 0.40/5.96 0.47/9.00 -
ANN-LSTM NGSIM RMSE Lateral/Longi. 3 s 0.043/0.122 - 0.125/0.235 - 0.235/0.264 -
Benterki et al. (2020)
IPTM-LSTM NGSIM-LP RMSE Both 3 s 0.77 1.34 2.19 – – -
Zhang, Song, et al. (2021)
MATF GAN Massachusetts RMSE Both 3 s 0.75 1.4 2.0 2.7 – -
Zhao et al. (2019)
PF+RBF OWN RMSE Both – 0.7 1.4 5.0 – – -
Hermes et al. (2009)
CS-LSTM(M) NGSIM NLL Both 3 s 0.58 2.14 3.03 3.68 4.22 -
Deo and Trivedi (2018a)
C-VGMM+VIM LISA-A MAE Both 3 s 0.24 0.69 1.18 1.66 2.18 -
Deo et al. (2018)
DESIRE KITTI/SDD DE PE Both 2 s 0.28/1.29 0.67/2.35 1.22/3.47 2.06/5.33 – –
Lee, Choi, et al. (2017)
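The RMSE values in Table 3 are reported separately at each prediction horizon. A sketch of the usual computation, assuming hypothetical arrays of predicted and ground-truth positions sampled at a known frame rate:

```python
import numpy as np

def rmse_at_horizons(pred, gt, fps=5, horizons_s=(1, 2, 3, 4, 5)):
    """pred, gt: (N, T, 2) predicted / true positions for N samples over T steps.

    Returns the RMSE of the Euclidean position error at each horizon (seconds).
    """
    err2 = np.sum((pred - gt) ** 2, axis=-1)  # (N, T) squared distances
    return {h: float(np.sqrt(err2[:, h * fps - 1].mean())) for h in horizons_s}

pred = np.zeros((10, 25, 2))
gt = np.ones((10, 25, 2))
print(rmse_at_horizons(pred, gt))  # sqrt(2) ~ 1.414 at every horizon here
```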
it requires a substantial number of sample trajectories to determine the numerous possible motion patterns.

In contrast, the manoeuvre intention estimation methods use vehicle motion and road context features to classify the different types of manoeuvres, for instance, stopping/non-stopping, turning left/right, etc. Although this method is less complex than calculating the numerous trajectory probabilities, a large training dataset is required to make the system robust to the different road scenarios. Another limitation is that the manoeuvre classes may not be sufficient to cover the real complexity of vehicle intentions. For instance, the system may predict a braking manoeuvre, but the braking can be normal or harsh. A proposed solution is to sub-categorise the manoeuvres, for example, normal/harsh stopping and normal/sharp right/left turns; however, this adds complexity to the dataset labelling (Mozaffari et al., 2020).

Intention prediction algorithms can also use predicted trajectories and the interaction between vehicles to achieve better accuracy. Traditional methods used to predict the intention of vehicles are heuristics, Bayesian Networks, HMMs, and SVMs. Commonly used DL methods are RNNs, LSTMs, and action recognition models.

The following paragraphs will discuss the most relevant DL algorithms used to predict vehicle intention manoeuvres.

Khosroshahi et al. (2016) implemented a multi-layer LSTM network to classify manoeuvre intentions at complex intersections. They extracted samples representing manoeuvre intentions from the KITTI dataset to train and test the algorithm. The input features included linear and angular changes, as well as a histogram of angular changes of the vehicle trajectories. The authors performed experiments with different numbers of manoeuvre classes: 2 (straight or turning), 3 (straight, turning left/right), 8 and 12 classes. The algorithm performed well with 2 and 3 classes, but the accuracy significantly decreases with 8 and 12 classes.

Lee, Kwon, et al. (2017) transformed real-world images into a simplified version of a Bird's Eye View (BEV) and fed them into a CNN to predict lane change behaviour. Zhang and Fu (2020) used an offline Bi-LSTM and an online ARIMA model to fit observed
trajectories and predict future ones. The outputs of the offline Bi-LSTM and ARIMA were then fed into another Bi-LSTM to recognise turning behaviour as left-turn, right-turn, or going straight. The algorithm went through evaluation using the NGSIM Lankershim and Peachtree Street dataset, and was able to meet real-time requirements while achieving good recognition accuracy for PTHs of 1 s and 2 s. However, accuracy dropped when considering a PTH of 3 s, and it only considered turning left/right and going straight manoeuvres, whereas vehicles at intersections can perform more complex manoeuvres, as reported by Khosroshahi et al. (2016). In addition, the dataset used was acquired from top-view sensors, while AVs are equipped with on-board camera sensors. Benterki et al. (2019) compared two conventional methods to predict lane-change manoeuvres, ANN and SVM. They concluded that ANN and SVM have almost the same performance; however, ANN showed the best results.

Izquierdo et al. (2021) used CNN, action recognition, and prediction methods to recognise and predict lane-keeping/changing manoeuvres. Instead of using a sequence of images, they encoded context, interaction, and dynamic state information in a unique enriched image. The enriched image was created by extracting the red channel from a greyscale version of the original image, using a target selection method (TSM), and a temporal integration method (TIM). The authors also investigated human performance in recognising and predicting lane changes. Their findings indicated that humans can detect 83.9% of the lane change events with an average anticipation of 1.66 s before the manoeuvre is completed. Only 3 out of 72 users were able to predict the lane change events before they started, with an average prediction horizon of 1.08 s. On the other hand, their best algorithm, which considers the trade-off between accuracy and anticipation, achieved 86.4% accuracy with an average anticipation of 2.09 s when considering a TTE equal to 0. When the TTE was set to 1 s, their algorithm achieved an anticipation of 2.69 s, a prediction of 0.72 s, and an average accuracy of 83.4%.

Fernández-Llorca et al. (2020) and Biparva et al. (2021) recognised and predicted lane-keeping/changing manoeuvres using video action recognition approaches. Biparva et al. (2021) used four types of video action recognition approaches: Two-stream CNN, Two-stream Inflated 3D CNN, Spatio-temporal Multiplier Networks, and SlowFast Networks. All of the aforementioned networks used spatial and temporal information from a single image, a sequence of images, or a sequence of optical flow images for the recognition and prediction tasks. Moreover, four sizes of RoI were used, denoted as x1, x2, x3 and x4, to consider the interaction between agents and to extract contextual information around the target vehicle. The network with the best recognition performance was the SlowFast CNN, achieving an accuracy of 90.98% with an OTH of 2 s before the TTE. Meanwhile, the network with the best prediction performance was the spatio-temporal multiplier, achieving an accuracy of 91.94% with an OTH of 2 s. The limitations of the previously cited works are as follows: the distribution of the manoeuvre classes was imbalanced, with more lane-keeping samples than lane-changing ones; the time required to recognise and predict a single instance was not provided; and some of the algorithms, such as the SlowFast network, were not able to complete their training due to GPU memory limitations.

Furthermore, it was observed from the previous vehicle intention prediction works that the authors selected a fixed PTH to predict the vehicles' intentions. The drawback of using a fixed PTH is that manoeuvre samples may vary in length. For instance, a lane-change manoeuvre performed by an aggressive driver will be shorter than a lane-change manoeuvre performed by a normal driver.

4. Pedestrian behaviour prediction

At present, AV systems can effectively detect and track pedestrians; however, this alone is not enough to prevent potential collisions. In order to avoid a collision, AV systems must predict pedestrian behaviours. This section aims to provide a literature review of the challenges and techniques used over the years for addressing pedestrian behaviour prediction.

Pedestrian behaviour prediction has been applied to three main types of datasets: datasets recorded using drones, for example, ETH and UCY; datasets recorded from static cameras; and datasets recorded from car dash cameras, for example, Daimler, JAAD, PIE or KITTI. Datasets from car cameras are more appropriate to train models for AVs because they provide a more realistic representation. However, when the car is in motion, it may affect the position of the pedestrian bounding box, and pedestrians can be easily occluded. Car camera datasets can be categorised as either naturalistic or non-naturalistic, as discussed by Fang and López (2018). In non-naturalistic datasets, the pedestrian behaviours and intentions are performed by actors, whereas, in naturalistic datasets, behaviours and intentions are recorded from actual road traffic scenarios. Some of the features that have been used for predicting pedestrian behaviour are listed in Table 4.

Table 4
Features that have been used to predict pedestrian behaviour.

Bbox coordinates: position, speed, height and width.
Bbox cropped image: pedestrian appearance, local and surrounding context.
Full image: global context and some interaction between different traffic objects.
Body pose: displacement, action, skeleton, and landmarks.
Ego vehicle position/speed: interaction between pedestrian and ego vehicle. Pedestrian behaviour is affected by ego vehicle speed.

Pedestrian behaviour prediction has been heavily investigated in the past years, and it has many challenges. For instance, pedestrians are highly dynamic, they can move in many directions and change them very quickly, and they can be easily occluded by other objects. They can also become distracted by their own objects or external environments, their movements may be influenced by other traffic agents, and they can be difficult to detect in poor visibility conditions. As reported in Tables 5 and 7, researchers have proposed various methods and features to address these challenges over the years. From these tables, the following observations can be made:

• Until 2018, most of the works used traditional methods and their OWN datasets. Thereafter, most authors adopted DL techniques and used the ETH and UCY datasets for trajectory prediction, as well as the JAAD and PIE datasets for intention prediction.
• Pedestrian behaviour prediction algorithms have evolved from solely using motion information to using pedestrian appearance, body pose landmarks, local/global context, interactions between agents, and ego vehicle dynamics.
• Prior to 2018, the focus was predominantly on trajectory prediction; thereafter, substantial research efforts have been dedicated to predicting pedestrian intentions.
• Most of the intention prediction works aimed to predict the crossing intention.
• The most used evaluation metrics for intention prediction were accuracy, F1-score, precision, recall, Area Under the Curve (AUC), and Receiver Operating Characteristic Curve (ROC-AUC).
• The most used evaluation metrics for trajectory prediction were Average Displacement Error (ADE), Final Displacement Error (FDE), and MSE; a computation sketch is given below. Other metrics are Average Non-linear Displacement Error (ANDE), Mean Average Displacement (MAD), and Final Average Displacement (FAD).

The following subsections discuss some of the algorithms reported in Tables 5 and 7. The first subsection provides an in-depth exploration of trajectory prediction algorithms, while the subsequent subsection explores intention prediction algorithms.
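ADE and FDE, the dominant trajectory metrics in Tables 5 and 7, are straightforward to compute. A sketch under their usual definitions (average, and final-step, Euclidean displacement over the prediction horizon); array names are hypothetical:

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple:
    """pred, gt: (N, T, 2) predicted and ground-truth positions.

    ADE: displacement error averaged over all timesteps and samples.
    FDE: displacement error at the final predicted timestep.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, T) Euclidean errors
    return float(dist.mean()), float(dist[:, -1].mean())

ade, fde = ade_fde(np.zeros((4, 12, 2)), np.ones((4, 12, 2)))
print(ade, fde)  # both sqrt(2) ~ 1.414 in this toy case
```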
Table 5
Relevant works for pedestrian trajectory prediction.
Work Methods Dataset/Results
Schneider and Gavrila (2013) Approach: Dynamic. Daimler
Features: constant velocity, acceleration, turn. IMM has not shown
Models: Recursive Bayesian filters – Compared EKF and IMM significant performance over
filters. simpler models.
PTH: < 2 s.
Evaluation: MLPE.
Keller and Gavrila (2013) Approach: Dynamic. OWN (on-board)
Features: optical flow. GDPM and PHTM showed
Compared the performance between GDPMs, PHTM, KF and better accuracy, however, they
IMMKF. are more computationally
Provided human performance on classifying pedestrian expensive.
behaviour prediction. 10–50 cm Time Horizon 0.77
Evaluation: Mean Combined Longitudinal and Lateral RMSE. s.
Kooij et al. (2014) Approach: Dynamic + Context. OWN (on-board)
Features: Head orientation, distance between vehicle and Outperforms state-of-art
pedestrian, distance between pedestrian and curb. algorithm PHTM. Best result
Models: Dynamic Bayesian Filters (SLDS). of −0.33 was achieved in the
Evaluation: Predictive log likelihood. critical, vehicle-seen and
stopping scenario using the
full context information.
Social-LSTM Alahi et al. (2016) Approach: Data driven. ETH and UCY
Features: Past trajectories. ADE/FDE/AND:
Models: Social pooling layer, and LSTM. 0.27/0.61/0.15.
OTH/PTH: 8 (3.2 s)/12 (4.8 s) frames.
Evaluation: ADE, FDE, and AND.
Karasev et al. (2016) Approach: Dynamic + Context. OWN (on-board) for training
Features: pedestrian state (position, orientation, and speed), and KITTI for evaluation.
predicted goals, environment context (building, sidewalk, Displayed in a graph.
crosswalk, road and grass), dynamic environments such as
traffic lights, and assumed rational behaviour for the agent.
Models: Jump-Markov Process, and Rao-Blackwellized filter.
Evaluation: L2 error, and Average prediction error.
Rehder et al. (2018) Approach: Data driven + Goal-directed. OWN (on-board)
Features: visual cues, predicted pedestrian destinations, and Outperformed IMM. Results
trajectories. were not clear, but from graph
Models: RMDN, LSTM, topology network, and Markov Prediction accuracy 10(−1) for
Decision Process. 1.5 s. Destination plays an
Evaluation: Predicted probability distribution, Average important role when trying to
accuracy of predicted destination, and prediction accuracy predict pedestrian intention.
over time.
SR-LSTM Zhang et al. (2018) Approach: Data driven and social behaviour. ETH and UCY
Features: trajectories and current state of the neighbours. MAD: 0.45; FAD: 0.94.
Model(s): SR-LSTM and attention mechanism.
Evaluation: MAD, and FAD.
Social-GAN Approach: Data driven. ETH, UCY
Gupta et al. (2018) Features: Past trajectories. ADE: 0.39/0.58.
Model(s): GAN, Pooling Module, and LSTM. FDE: 0.78/1.18.
PTH: 8 and 12 time steps.
Evaluation: ADE and FDE.
Social attention Approach: Data driven. ETH and UCY
Vemula et al. (2018) Features: Past trajectories. ADE: 0.30 m.
Model(s): ST-Graph, LSTM, and Attention. FDE: 2.59 m.
OTH/PTH: 8 (3.2 s)/12 (4.8 s) time steps.
Evaluation: ADE and FDE.
SS-LSTM Approach: Data driven. ETH and UCY
Xue et al. (2018) Features: Past trajectories, neighbour feature (occupancy ADE: 0.070 pixels.
maps: grip, circle and log), and individual information. FDE: 0.133 pixels.
Model(s): CNN, and Hierarchical-LSTM.
OTH/PTH: 8/12 frames.
Evaluation: ADE and FDE.
CIDNN Approach: Data driven. GC/ETH/UCY/CUHK/Subway
Xu et al. (2018) Features: Past trajectories, and interactions. ADE:
Model(s): stacked-LSTM, and MLP. 0.012/0.09/0.12/0.008/0.016.
OTH/PTH: 5/5 frames. Inference: 0.43 ms
Hardware: Intel Xeon CPU E52643 4.40 and TITAN GPU.
Evaluation: ADE.
Table 5 (continued).
Work Methods Dataset/Results
LSTM-Bayesian Approach: Data driven. CityScapes(on-board)
Bhattacharyya et al. (2018) Features: Bbox coordinates past trajectories and ego MSE/NLL: 505/3.92.
vehicle odometry.
Model(s): Two stream architecture, Bayesian RNN
(LSTM), and CNN.
OTH/PTH: 0.5/1 s.
Evaluation: MSE in pixels and NLL.
DBN-SLDS Approach: Data driven. OWN (on-board,
Flohr et al. (2018) Features: context cues (VRU actions, and its static and non-naturalistic
dynamic environment). Graphs.
Model(s): DBN and SLDS.
TTE = [−15, 0]
PTH:1 s.
Evaluation: Prediction error.
MX-LSTM Approach: Data driven. UCY
Hasan et al. (2018) Features: Past trajectories, and head pose estimation. MAD/FAD: 0.49/1.12 m.
Model(s): tracklets, vislets, VFO social pooling, and Towncentre
LSTM. MAD/FAD: 1.15/2.30 m.
OTH/PTH: 8/12 frames.
Evaluation: MAD and FAD in metres.
Scene-LSTM Approach: Data driven. UCY and ETH
Manh and Alaghband (2018) Features: Past trajectories and scene divided into grid ADE/FDE/NDE: 0.7/0.7/0.9.
cells.
Model(s): Scene Data Filter, and Coupled-LSTM.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE, FDE and NDE.
SoPhie Approach: Data driven. ETH, UCY
Sadeghian et al. (2019) Features: Past trajectories, social interactions, and images ADE: 0.54 m.
of the scene. FDE: 1.15 m.
Model(s): CNN, LSTM, GAN, Social and physical SDD
attention mechanism. ADE: 16.24 pixels.
PTH: 12 future timesteps. FDE: 29.38 pixels.
Evaluation: ADE and FDE.
StarNet-DNN Approach: Data driven. ETH and UCY
Zhu et al. (2019) Features: Past trajectories. ADE/FDE: 0.30/0.57.
Model(s): StarNet DNN (Host and hub networks), and Inference: 0.073 s.
LSTM.
PTH: 8 frames.
Hardware: Tesla V100 GPU.
Evaluation: ADE and FDE.
PECNet Approach: Data driven and goal directed. ETH and UCY
Mangalam et al. (2020) Features: Past trajectories and estimated end point ADE/FDE: 0.29/0.48 m.
destination. SDD
Model(s): CVAE, attention mechanism, and social ADE/FDE: 9.96/15.88 p.
pooling.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE and FDE.
ST-GCNN Approach: Data driven. ETH and UCY
Mohamed et al. (2020) Features: Past trajectories and sequence of images. ADE/FDE: 0.44/0.75 m.
Model(s): GCN, and TXP-CNN.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE and FDE.
RSBG Approach: Data driven. ETH and UCY
Sun et al. (2020) Features: Past trajectories and local context. ADE/FDE: 0.48/0.99 m.
Model(s): GCN, CNN, and LSTM.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE and FDE.
LVTA Approach: Data driven. ETH and UCY
Xue et al. (2020) Features: Past trajectories and velocities. ADE/FDE: 0.46/0.92 m.
Model(s): attention mechanism, and LSTM.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE and FDE.
Holistic-LSTM Approach: Data driven. JAAD
Quan et al. (2021) Features: bbox past trajectories, crossing intention, MSE: 389.
pedestrian scale, depth estimation, and global scene PIE
dynamics (depth and optical flow). MSE: 167.
Model(s): ConvLSTM, modified LSTM with more inputs, S-KITTI
and attention mechanism. MSE: 525/1.5 s.
OTH/PTH: 0.5/1 s.
Evaluation: MSE, CMSE, and CFMSE of the bbox
coordinates.
Bi-TraP Approach: Data driven and Multi-modal goal estimation. JAAD
Yao et al. (2021a) Features: bbox past trajectories. ADE: 1206.
Model(s): CVAE, Gaussian distribution, GMM, and PIE
Bi-directional GRU. ADE: 511.
OTH/PTH (JAAD/PIE): 0.5/1.5 s. ETH-UCY
OTH/PTH (ETH/UCY): 3.2/4.8 s. ADE/FDE: 0.18/0.35.
Evaluation: ADE and FDE.
BA-PTP Approach: Data driven. PIE
Czech et al. (2022) Features: vehicle odometry, bbox, body, head MSE/CMSE/CFMSE:
orientation, and pose. 420/383/1513.
Model(s): attention mechanism and Bi-GRU. ECP-Intention
OTH/PTH (PIE): 0.5/1.5 s. MSE/CMSE/CFMSE:
OTH/PTH (ECP): 0.6/1.6 s. 768/680/1966
Evaluation: MSE, CMSE, and CFMSE.
SGNet Approach: Data driven, and goal directed. JAAD
Wang et al. (2022) Features: Past trajectories. MSE/CMSE/CFMSE:
Model(s): Stepwise goal estimator, attention mechanism, 1049/996/4076 p (1.5 s).
GRU, and CVAE. PIE
OTH/PTH (JAAD, PIE, HEV-I): 1.6/0.5,1.0,1.5 s. MSE/CMSE/CFMSE:
OTH/PTH (ETH & UCY): 3.2/4.8 s. 442/413/1761 p (1.5 s).
OTH/PTH (NuScenes): 2/6 s. ETH and UCY
Evaluation: MSE, CMSE, CFMSE, ADE and FDE. ADE/FDE: 0.35/0.83
Euclidean space.
NuScenes
ADE/FDE: 1.32/2.50.
PTPGC Approach: Data driven. ETH and UCY
Yang, Sun, et al. (2022) Features: Past trajectories, length of attributes, and ADE/FDE: 0.67/1.29.
number of pedestrians.
Model(s): Graph attention, convLSTM, and Temporal
CNN.
OTH/PTH: 3.2/4.8 s.
Evaluation: ADE and FDE.
4.1. Trajectory prediction

Both traditional and DL techniques have been used in order to predict pedestrian trajectories. Traditional techniques rely on hand-crafted functions, such as EKF, IMM, and social forces, to predict pedestrians' future trajectories. However, these functions have limitations in handling complex scenarios. To address this, several researchers adopted DL techniques such as: CNN, Generative Adversarial Network (GAN), GCNN, LSTM, GRU, CVAE, attention mechanism, and/or Multi-Layer Perceptron (MLP).

Although LSTM networks have many advantages, they struggle to learn dependencies between multiple correlated sequences. For this reason, Alahi et al. (2016) proposed a Social LSTM network to predict pedestrian trajectories. Social pooling layers were introduced to enable LSTM networks to share their hidden state. This enables the algorithm to learn interactions among pedestrians. Social-LSTM only considers motion features to model human interactions; however, Xu et al. (2018) argue that spatial position should also be considered. For this reason, they presented a model where MLP layers were used to encode location, and LSTM was used to encode motion for each neighbour. Both sets of encoded information were then used as input to a crowd interaction module to predict pedestrian displacement. In a different approach, Xue et al. (2020) used two LSTM layers to encode the pedestrian's location and velocity, along with a temporal attention mechanism to extract the most relevant features from the velocity and location inputs.

Humans are highly dynamic, which makes the task of predicting their trajectories more challenging. In response to this, Rehder et al. (2018) implemented a DNN that would first predict the future destinations of the pedestrians, and then predict their future trajectories. They used a CNN, an LSTM and a Mixture Density Network to predict potential destinations, and another CNN to plan and predict future trajectories based on these potential destinations. A CVAE was used by Mangalam et al. (2020) to predict future endpoints, which were subsequently used to predict multi-modal longer-term trajectories. They also presented novel self-attention-based social pooling layers that extract relevant features from the neighbours using non-local attention. Yao et al. (2021a) also proposed a goal-directed method, where they combine a CVAE and a bi-directional GRU to encode past trajectories and decode multi-modal future trajectories. Goal-directed models have the disadvantage that only one goal is estimated over a long-term prediction. For this reason, if a pedestrian changes direction, the estimated goal may be incorrect, consequently affecting the predicted trajectories. Wang et al. (2022) proposed a method where they model and estimate goals continuously by using RNNs.

While many studies relied on historical trajectories for predicting future ones, they often overlooked the current state of the pedestrian. In order to overcome this issue, Zhang et al. (2019) introduced a state refinement LSTM that considers both the current and previous state of the target pedestrian and the surrounding pedestrians. This state refinement module enables the network to incorporate interactions through a message-passing mechanism. It also uses a motion gate as an attention mechanism to focus on the most relevant features of the neighbours.

Previous research, when considering human-to-human interactions, would often take into account only nearby neighbours, even though more distant neighbours might also influence the behaviour of the target pedestrian. A GAN was presented by Gupta et al. (2018) that considers not only local neighbours but all neighbours in the scene. The GAN network comprises an LSTM generator to generate multiple potential trajectories, a pooling module to learn human-to-human interactions, and an LSTM discriminator to select acceptable trajectories from the generated ones. Similarly, Vemula et al. (2018) considered all the pedestrians in the scene using a spatio-temporal graph and LSTM. Additionally, they adopted an attention mechanism to learn the relevance of each agent, regardless of how far they are from each other. A star-like network was introduced by Zhu et al. (2019) to account for all agents in the scene. The network has a centralised hub network, which gathers motion information from all pedestrians in the scene, and a host network for each pedestrian. The host networks query the hub network for social information to predict trajectories. Graph attention
and convolutional LSTM were also proposed by Yang, Sun, et al. (2022)
to consider the surrounding neighbours.
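The grid-based social pooling step can be made concrete with a short sketch. The snippet below is a minimal illustration of pooling neighbours' hidden states onto a spatial grid in the spirit of Alahi et al. (2016); the grid size, cell width, and tensor shapes are illustrative assumptions rather than the authors' actual configuration.

```python
import torch

def social_pooling(positions, hidden, grid_size=4, cell=0.5):
    """Pool neighbours' LSTM hidden states onto a grid around each pedestrian.

    positions: (N, 2) tensor of current x/y coordinates in metres.
    hidden:    (N, H) tensor of current LSTM hidden states.
    Returns a (N, grid_size * grid_size * H) social tensor per pedestrian.
    """
    N, H = hidden.shape
    out = torch.zeros(N, grid_size, grid_size, H)
    half = grid_size * cell / 2
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            dx, dy = (positions[j] - positions[i]).tolist()
            if abs(dx) >= half or abs(dy) >= half:
                continue  # neighbour falls outside the pooling window
            gx = int((dx + half) // cell)
            gy = int((dy + half) // cell)
            out[i, gx, gy] += hidden[j]  # hidden states sharing a cell are summed
    return out.flatten(1)

pos = torch.rand(5, 2) * 4       # five pedestrians in a 4 m x 4 m area
h = torch.randn(5, 32)           # their current hidden states
social = social_pooling(pos, h)  # (5, 256) social tensors
```

In Social-LSTM this pooled tensor is embedded and concatenated with the pedestrian's own position embedding before the next recurrent step, which is how the hidden states are shared across agents.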
Xue et al. (2018) emphasised the importance of considering scene
layout when predicting pedestrian trajectories. As a result, they used
three different LSTMs to learn information about individuals, social
interactions, and scene layout. One LSTM used the trajectory of the
target pedestrian as its input, another used an occupancy map as
its input, and the final one used feature vectors extracted from the
original image by a CNN as its input. Likewise, Manh and Alaghband
(2018) took scene layout into account, where they used a two-level
grid structure of the original image and trajectory information as
inputs to a two-stream LSTM for predicting future trajectories. CNN,
LSTM, attention mechanism, and GAN were used by Sadeghian et al.
(2019) to predict trajectories using both past trajectories and scene
context as inputs. The CNN extracted scene-related features, the LSTM
extracted motion-related features, the attention mechanism extracted
both the physical and position relevant features, and the GAN generated
multiple trajectories and then selected the most suitable ones.
Mohamed et al. (2020) classified methods such as social pooling or the combination of hidden state features, used to model human interactions, as "aggregation methods". They claimed that these types of methods have limitations in accurately modelling human interactions because the aggregation occurs within the feature space and does not directly model physical interactions. Furthermore, some of these aggregation methods, such as pooling layers, may fail to capture important information. Given these considerations, the authors proposed a social spatio-temporal GCN (ST-GCN) to model interactions among pedestrians. The ST-GCN model's output is subsequently used as input for a time-extrapolator CNN to predict future trajectories.

The above works have not considered group-based interactions, which involve two or more individuals exhibiting similar movements, behaviours, or goals. A recursive social behaviour graph and GCN were implemented by Sun et al. (2020) to explore and learn group-based interactions. The authors also used a CNN and an LSTM to obtain an individual representation of each pedestrian in the scene. The individual representations, along with the learned group-based features, were combined and used by a decoder LSTM to predict future trajectories.

Bhattacharyya et al. (2018) claimed that they were the pioneers in using an on-board dataset to predict pedestrian behaviour. The authors used a two-stream LSTM architecture to encode bounding box coordinates, ego-vehicle odometry information, and feature vectors extracted from the original image by a CNN. Another work that used an on-board dataset is that of Czech et al. (2022), in which the authors used a multi-stream RNN to individually encode bounding box coordinates, head orientation, body orientation, pose skeleton, and past trajectories. The encoded information from each stream is fused through an attention mechanism and subsequently input to an RNN decoder to predict future bounding boxes. The drawback of the latter two algorithms is that they did not consider social interaction among the agents.

Hasan et al. (2018) argue that head orientation and movement are correlated. Consequently, they proposed a two-stream LSTM to encode both trajectory and head orientation information. The two encoded representations were then merged using a View Frustum social pooling layer. The disadvantage of this method is that it is only suitable for top-view and BEV datasets.

Usually, when a system adopts LSTM networks and requires the use of multiple types of inputs, these inputs are first combined before being fed to LSTM cells. This practice is required because LSTM cells are designed to accept only a single input sequence, which can constrain their ability to capture relevant information from various input sources. Quan et al. (2021) adapted the conventional LSTM cell to accept four additional input sequences: vehicle speed, pedestrian intention, correlation among frames, and bounding box location. The vehicle speed was estimated by using optical flow and depth information; the pedestrian intention was estimated using convLSTM; and the correlation among frames was derived from optical flow images.

Fig. 5. Pedestrian Trajectory Prediction Performance using the ETH and UCY datasets, with an OTH of 3.2 s, a PTH of 4.8 s, and Average Displacement Error (ADE) in metres (See Table 6).

Table 6 and Fig. 5 report the results for the most relevant studies in pedestrian trajectory prediction. It is not possible to directly compare all of them, since some of them have used different datasets, metrics, OTH, and PTH. However, when examining the results of the algorithms that used the same dataset, metrics, OTH, and PTH, the Bi-TraP (Yao et al., 2021a) algorithm outperformed the others, achieving ADE and FDE values of 0.18 m and 0.35 m, respectively.

4.2. Intention recognition and prediction

The difference between pedestrian intention recognition and prediction aligns with what was explained in Section 3. Recognition does not require anticipation, while prediction does. The main methods used to predict pedestrian intentions include CNN, GCNN, GRU, LSTM, attention mechanism, multi-tasking, and transformer networks.

CNN: Fang et al. (2017) and Fang and López (2018) used CNNs to extract human skeleton features and an SVM/RF classifier to predict if the pedestrian is crossing the road. Abdulrahim and Salam (2016) also used CNNs, along with depth information, to learn 3D human body landmarks, including additional information such as the pedestrian's shoulders, neck, and face. While CNNs can extract spatial features, their capability to capture temporal dependencies is limited. To overcome this limitation, Yang et al. (2021) implemented a 3D-CNN to extract spatio-temporal information. Additionally, Piccoli et al. (2020) proposed an alternative model called FuSSI-Net, designed to extract both spatial and temporal information. FuSSI-Net is a spatio-temporal DenseNet that takes a sequence of bounding boxes and skeleton features as inputs to predict crossing intention. Although these last two models can extract spatial and temporal information, they are limited to short-time-horizon prediction and become computationally expensive as the input sequence length increases.

LSTM: Rasouli et al. (2019) used an LSTM to encode local context, trajectories, and ego vehicle information. Subsequently, the encoded information was decoded to estimate the probability of a pedestrian crossing the road. Bouhsain et al. (2020) used bounding box coordinates and velocity features as inputs for a sequence-to-sequence LSTM, which was used to predict both the pedestrian intentions and the future position of the pedestrians' bounding boxes. In a different approach, Lian et al. (2022) introduced a stacked-LSTM model, where appearance, context, and dynamic features of the pedestrian were used to predict crossing intentions.
Table 6
Results for the most relevant pedestrian trajectory prediction works.
Work Dataset OTH PTH ADE FDE AND MAD FAD MSE
Social-LSTM Alahi et al. (2016) ETH & UCY 3.2 s 4.8 s 0.27 m 0.61 m 0.15 m – – –
Scene-LSTM Manh and Alaghband (2018) ETH & UCY 3.2 s 4.8 s 0.7 m 0.7 m 0.9 m – – –
Social-GAN Gupta et al. (2018) ETH & UCY 3.2 s 4.8 s 0.48 m 0.98 m – – – –
Social-attention Vemula et al. (2018) ETH & UCY 3.2 s 4.8 s 0.30 m 2.59 m – – – –
SoPhie Sadeghian et al. (2019) ETH & UCY 3.2 s 4.8 s 0.54 m 1.15 m – – – –
  SDD 3.2 s 4.8 s 16.24 pi 29.38 pi – – – –
StarNet-DNN Zhu et al. (2019) ETH & UCY 3.2 s 4.8 s 0.30 m 0.57 m – – – –
PECNet Mangalam et al. (2020) ETH & UCY 3.2 s 4.8 s 0.29 m 0.48 m – – – –
  SDD 3.2 s 4.8 s 9.96 pi 15.88 pi – – – –
ST-GCNN Mohamed et al. (2020) ETH & UCY 3.2 s 4.8 s 0.44 m 0.75 m – – – –
RSBG Sun et al. (2020) ETH & UCY 3.2 s 4.8 s 0.48 m 0.99 m – – – –
LVTA Xue et al. (2020) ETH & UCY 3.2 s 4.8 s 0.46 m 0.92 m – – – –
Bi-TraP Yao et al. (2021a) ETH & UCY 3.2 s 4.8 s 0.18 m 0.35 m – – – –
  JAAD 0.5 s 1.5 s 1206 – – – – –
  PIE 0.5 s 1.5 s 511 – – – – –
SGNet Wang et al. (2022) ETH & UCY 3.2 s 4.8 s 0.35 m 0.83 m – – – –
  JAAD 1.6 s 1.5 s – – – – – 1049
  PIE 1.6 s 1.5 s – – – – – 442
  NuScenes 2 s 6 s 1.32 2.50 – – – –
PTPGC Yang, Sun, et al. (2022) ETH & UCY 3.2 s 4.8 s 0.67 m 1.29 m – – – –
SS-LSTM Xue et al. (2018) ETH & UCY 3.2 s 4.8 s 0.070 npu 0.133 npu – – – –
SR-LSTM Zhang et al. (2018) ETH & UCY 3.2 s 4.8 s – – – 0.45 0.94 –
CIDNN Xu et al. (2018) ETH & UCY 4 s 4 s 0.11 – – – – –
MX-LSTM Hasan et al. (2018) UCY 3.2 s 4.8 s – – – 0.49 m 1.12 m –
  Towncentre 3.2 s 4.8 s – – – 1.15 m 2.30 m –
Holistic-LSTM Quan et al. (2021) JAAD 0.5 s 1 s – – – – – 389
  PIE 0.5 s 1 s – – – – – 167
  S-KITTI 0.5 s 1.5 s – – – – – 525
BA-PTP Czech et al. (2022) PIE 0.5 s 1.5 s – – – – – 420
  ECP 0.6 s 1.6 s – – – – – 768
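Most rows in Table 6 report ADE and FDE. As a reference, the snippet below is a minimal sketch of how these two displacement metrics are commonly computed; the synthetic trajectory and the 12-step/2.5 Hz setting (the usual ETH/UCY protocol) are illustrative assumptions.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (T, 2) predicted and ground-truth positions in metres.

    ADE averages the Euclidean error over all predicted time steps;
    FDE keeps only the error at the final predicted time step.
    """
    errors = np.linalg.norm(pred - gt, axis=1)  # per-step Euclidean distance
    return errors.mean(), errors[-1]

# 12 predicted steps at 2.5 Hz = a 4.8 s prediction horizon
gt = np.cumsum(np.full((12, 2), 0.4), axis=0)   # pedestrian walking diagonally
pred = gt + np.random.default_rng(0).normal(0.0, 0.2, gt.shape)
ade, fde = ade_fde(pred, gt)
print(f"ADE {ade:.2f} m, FDE {fde:.2f} m")
```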
LSTM networks have the ability to learn and memorise features over the long term, as they capture long-distance dependencies (Chung et al., 2014). Nevertheless, they have limitations in extracting spatial features, managing dependencies among the extracted features, exhibiting longer training times, and assigning uniform attention to all inputs, even though some inputs can be more relevant than others (Sharma et al., 2022). Ahmed et al. (2023) used a 2D pose estimator in conjunction with an LSTM to predict the crossing behaviour of the pedestrian.

GRU: GRUs serve as an alternative to LSTMs, as they also learn temporal information. Kotseruba et al. (2020) used pedestrian appearance features, which were extracted using a VGG network, and ego vehicle velocity information as inputs for a GRU network to predict pedestrian intentions. Rasouli et al. (2020) used pedestrian appearance, global context, body pose, bounding boxes, and ego-vehicle speed features as inputs to a stacked GRU network to predict pedestrian crossing behaviour. These features were gradually integrated into the GRU network, starting with pedestrian appearance, followed by global context, body pose, bounding boxes, and concluding with the ego vehicle speed. GRUs offer the advantage of requiring less memory and being faster than LSTMs. However, they tend to be less accurate when handling long input sequences (Chung et al., 2014).

GCN: A spatio-temporal GCN was presented by Zhang, Angeloudis, and Demiris (2022), where they used a sequence of skeleton features to predict crossing intentions. The skeleton joints were connected by nodes and edges to learn both spatial and temporal features. Cadena et al. (2022) used two GCNs, which took human body key points, local context, and ego speed information as inputs to predict crossing intentions. GCNs have the advantage of extracting interactions among the target pedestrian and its neighbours, considering both spatial and temporal dependencies (Sharma et al., 2022). In addition, GCNs can handle non-Euclidean data formats, such as scenarios where pedestrians are dispersed across a scene, which cannot be represented using a grid-like structure. However, they can only handle short-term sequences and do not perform well when applied to regression tasks.

Attention Mechanism: Lian et al. (2022) also used a self-attention mechanism to extract the most relevant information from the pedestrian appearance, the pedestrian's surroundings, and dynamic features. Rasouli et al. (2019) combined different attention mechanism layers at different locations of the network to investigate their impact on the model performance. Attention mechanism approaches enable networks like LSTM to focus more on the most relevant features, and less on redundant ones.
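To illustrate how such an attention layer typically sits on top of a recurrent encoder, the sketch below implements a generic additive temporal attention over GRU outputs. The feature and hidden sizes are assumptions for illustration and are not taken from any specific surveyed model.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Scores each encoder time step and returns their weighted sum."""

    def __init__(self, hidden=64):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, h):                          # h: (batch, T, hidden)
        w = torch.softmax(self.score(h), dim=1)    # (batch, T, 1) step weights
        return (w * h).sum(dim=1)                  # (batch, hidden) context

encoder = nn.GRU(input_size=4, hidden_size=64, batch_first=True)
attend = TemporalAttention(64)
bboxes = torch.randn(8, 15, 4)    # 15 observed bounding boxes per pedestrian
h, _ = encoder(bboxes)
context = attend(h)               # focused summary fed to a crossing classifier
```

The learned weights let the classifier lean on the most informative observation frames instead of treating all time steps uniformly, which addresses the uniform-attention limitation of plain LSTMs noted above.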
Table 7
Relevant works for pedestrian intention prediction.
Work Methods Problem Dataset/Results
Schneider and Gavrila (2013) Approach: Dynamic. Trajectory and intention Daimler
Features: prediction. IMM has not shown
Recursive Bayesian filters – Compared EKF and IMM filters significant performance gains over
(constant velocity/acceleration/turn). simpler models.
PTH: < 2 s.
Evaluation: MLPE.
Keller and Gavrila (2013) Approach: Dynamic. Trajectory and intention OWN (on-board)
Features: prediction. GDPM and PHTM showed
Compared the performance between GDPMs using optical better accuracy, however, they
flow information, PHTM, KF and IMMKF. are more computationally
Provided human performance on classifying pedestrian expensive.
behaviour prediction. 10–50 cm Time Horizon 0.77
Evaluation: Mean Combined Longitudinal and Lateral RMSE. s.
Bonnin et al. (2014) Approach: Dynamic + Context. Intention prediction (crossing). OWN (on-board) Inner-city
Features: distance and time to curb, distance and time to dataset, zebra dataset and
ego lane, distance and time to zebra crossing, distance and combination of both ICZ.
time to collision point, difference of time to collision point, Inner-city model: 31% TPR,
face, global and relative orientation. 0.0 FPR, PTH 0.72 s for the
Single Neural Network as classifier to learn the different zebra dataset. TPR 29%, PTH
features. 0.67 s for the inner-city
Inner-city and zebra model. dataset. TPR 31%, PTH 0.72 s
PTH: 1 s. for the ICZ dataset.
Evaluation: TPR and FPR. Zebra crossing model: 100%
TPR, 3.23 s PTH for the zebra
dataset. 86% TPR, 28% FPR
and 1.73 s PTH for the
inner-city dataset.
CMT model: 62% TPR, 2.59 s
PTH for the ICZ dataset.
Neogi et al. (2017) Approach: Dynamic + Context. Intention prediction. NTUC (OWN, on-board and
FLDCRF. actors)
Features: pedestrian position (distance to curb, and left or Average probability > 0.7
right side of the road), pedestrian–vehicle interaction, optical predicting 1.2 s before the
flow. action.
Evaluation: average probability, time to stop and time to
cross.
Minguez et al. (2018) Approach: Dynamic. Predict pedestrian actions. CMU-UAH
Balanced-GDPMs to reduce 3-D time relevant information Achieved MED of 41.24 mm
into low dimensional information and to assume future for TTE of 1 s, for starting
latent positions. activity; and MED of
Features: Skeleton motion analysis. 238.01 mm for TTE of 1 s for
Four models to predict start, stop, walk and stand actions. stopping activity.
HMM is used to select which model to use to predict future
pedestrian path and poses.
Evaluation: MED against TTE.
Fang et al. (2017) Approach: Data Driven. Intention prediction Daimler
Features: Skeleton. (crossing/not crossing). 0.8 predictability with
CNN for pose estimation. TTE=12 (750 ms).
Deep association for tracking.
Evaluation: Intention probability vs TTE.
CV Approach: Data Driven. Intention prediction See Table 8
Fang and López (2018) Features: Skeleton. (crossing/not crossing).
CNN for pose estimation.
Deep association for tracking.
Evaluation: Accuracy.
PIE (int) Approach: Data driven. Intention prediction (crossing). See Table 8
Rasouli et al. (2019) Features: bbox coord, image context, and image bbox.
RNN (LSTM).
Evaluation: Accuracy, and F1-score.
Bouhsain et al. (2020) Approach: Data Driven. Pedestrian intention and See Table 8.
Features: bboxes coordinates and velocities. pedestrian bbox predictions
PV-LSTM (crossing).
Multi-task sequence to sequence learning
Evaluation: ADE, FDE, Accuracy.
Liu et al. (2020) Approach: Context, Temporal, and Data driven. Intention prediction (crossing). Stanford-TIR
Features: A: 79.10%.
Graph Convolution and GRU to learn spatio-temporal JAAD
relationship. A: 79.28%.
Evaluation: Accuracy.
Abughalieh and Alawneh (2020) Approach: Data driven. Intention prediction (walking OWN (on-board)
Features: pedestrian body landmarks considering depth and crossing). A: 89%.
information.
CNN.
Evaluation: Accuracy.
FUSSI-net Approach: Data driven, target-agent context. Intention prediction (crossing). See Table 8
Piccoli et al. (2020) Features: Skeleton and bbox.
DenseNet.
Evaluation: Accuracy.
SFR-GRU Approach: Data driven. Intention prediction (crossing). See Table 8
Rasouli et al. (2020) Features: pose, 2D bbox, appearance, global context, and
ego speed.
Stacked-RNN (GRU).
Evaluation: Accuracy, Precision, recall, F1-score, and AUC.
C+B+S+Int Approach: Data driven. Intention prediction (crossing). See Table 8
Kotseruba et al. (2020) Features: surrounding, appearance, context, bbox, and ego Studied human performance.
vehicle speed.
single GRU.
PTH: 2 s.
Evaluation: Accuracy, AUC, F1, Precision, and recall.
Razali et al. (2021) Approach: Data driven and key body landmarks. Recognition and Intention JAAD
Features: PAF and PIF. prediction (crossing) in Recognition: −0 s: 81.7%; −1
Uses only one RGB image. real-time. s: 83.6%; −2 s: 83.5%; −3 s:
Multitask learning. 83%; −4 s: 82.7%.
CNN (ResNet). Prediction: −1 s: 42.6%; −2 s:
Evaluation: Precision for different prediction horizon. 46.1%; −3 s: 46.3%; −4 s:
46.0%.
FPS: 5.
Zhang, Abdel-Aty, et al. (2021) Approach: Data Driven. Intention prediction (crossing CCTV
Features: pose key-points. at red light). A: 92%: 1 s; 92%: 2 s; 88.9%:
Compared SVM, RF, GBM, and XGBoost models. 3 s; 92.5%: 4 s.
Evaluation: Accuracy.
PCIR Approach: Data driven, context, and behavioural. Intention detection (crossing). See Table 8
Yang et al. (2021) Features: pedestrians, ego vehicle, and environment.
3D-CNN.
Evaluation: AP.
Chen et al. (2021) Approach: Data driven. Intention prediction (crossing). See Table 8
Features: bbox, body pose, road objects.
Graph encoder, CNN, and LSTM.
PTH: 1.5 s.
Evaluation: Balanced Accuracy and F1 score.
I+A+F+R Yao et al. (2021b) Approach: Data driven, and multi-task. Intention and action See Table 8
ARN Attentive Relation Network. prediction (crossing). Inference: < 6 ms.
CNN, MLP, and GRU.
PTH: 1–2 s.
Features: bbox context and coordinates, relation, and visual.
Evaluation: Accuracy, F1-score, ROC-AUC, precision.
PCPA Approach: Data driven. Intention prediction (crossing). See Table 8
Kotseruba et al. (2021) Features: bbox, pose, local context, and ego vehicle speed.
3D CNN + single-RNN (GRU) + attention mechanism.
Evaluation: Accuracy, AUC, and F1.
Yang, Zhang, et al. (2022) Approach: Data driven. Intention prediction (crossing). See Table 8
Features: local and global context, bbox, pose-key-points.
Attention mechanism, 2D CNN, and RNN.
Evaluation: Accuracy, F1, and recall.
Graph+ Approach: Data driven. Intention Prediction (crossing). See Table 8
Cadena et al. (2022) Features: context, ego vehicle velocity, and key body Inference: 6 ms.
landmarks.
Graph Convolutional Network.
Evaluation: Accuracy.
ST-CrossingPose Approach: Data driven. Intention prediction (crossing). JAAD
Zhang, Angeloudis, and Demiris Features: skeleton-based. Recognition: 63%.
(2022) Spatio-Temporal GCN. See Table 8
Evaluation: Accuracy, AUC, F1-score, Precision, and Recall.
Achaji et al. (2022) Approach: Data Driven. Intention recognition and PIE A:91%.
Features: bbox. prediction (crossing). F1:0.83.
Transformer Networks. CP2A A:91%.
PTH: 1 s and 2 s. F1:0.91.
Test human ability for pedestrian action prediction.
Evaluation: Accuracy and F1-Score.
Scene-STGCN Approach: Data Driven. Intention recognition See Table 8
Naik et al. (2022) Features: (crossing).
Scene Spatio-Temporal GCN.
Evaluation: Accuracy, F1-score, AP, and ROC-AUC.
Zeng (2022) Approach: Data driven. Intention prediction (crossing). See Table 8
Features: body land-marks. Light-weight and inference
SqueezeNet and GRU. speed.
Hardware: AMD Ryzen 5 3600, G Force RTX 3070.
Evaluation: Accuracy and ROC-AUC.
CA-LSTM Approach: Data driven. context and dynamic. Intention Prediction (crossing). See Table 8
Lian et al. (2022) Features: appearance, velocity, and walking angle.
Attention LSTM.
Evaluation: Accuracy, F1-score, recall metrics.
Gazzeh and Douik (2022) Approach: Data driven. Intention recognition in See Table 8
Features: pedestrian localisation and environment contest real-time.
(lane lines).
ML and DL.
Evaluation: Accuracy.
Ma and Rong (2022) Approach: Data driven. Intention prediction (crossing). See Table 8
Features: pedestrian pose (skeleton), pedestrian to vehicle
distance, and ego vehicle information.
Multi-feature fusion.
Random forest classifier.
PTH: 0.6 s.
Evaluation: Accuracy and AUC.
Ahmed et al. (2023) Approach: Data driven. Intention prediction (crossing). JAAD and PIE
Features: Past trajectories, velocity, and 3D joint estimation. Accuracy: 89%/91%.
Model(s): Position and Velocity LSTM.
PTH: 0.4 s.
Evaluation: Accuracy.
Transformers: Even though attention mechanisms have the ability to focus on the most relevant features, it was reported by Achaji et al. (2022) that their effectiveness might be reduced when coupled with LSTM networks. For this reason, Achaji et al. (2022) proposed a framework based on three types of transformer networks: encoder-only, encoder-pooling, and encoder–decoder architectures. The proposed framework used only the pedestrian bounding box information as its input. The authors argued that their model outperformed other methods that used multiple input features. Transformer networks offer the advantage of parallel input processing, which accelerates the training stage. On the other hand, processing the input data in parallel restricts the model from taking advantage of the sequential nature of the input.

Multiple Methods: Many studies have used more than one method to predict pedestrian intention. Liu et al. (2020) used a GCN to generate a pedestrian-centring graph for each observation frame. These graphs connect the target pedestrian to its surroundings, allowing the algorithm to learn relations between the pedestrian and the scene. In addition, edges were introduced between the pedestrian nodes in each pedestrian-centring graph to allow the algorithm to learn temporal information. The resulting interconnected graphs were then fed into a GRU network to predict crossing intention. Chen et al. (2021) used a combination of methods, including a CNN to extract features from traffic objects and pedestrian appearance, a GCN to auto-encode the extracted features, another framework to extract the human skeleton, and an LSTM network to predict crossing intentions. CNN, ARN, MLP and GRU were used by Yao et al. (2021b) to predict crossing intentions. The CNN was used to extract global features, the ARN was used to extract relational features from detected traffic objects, the MLP was used for intention classification, and the GRU was used for intention prediction. One major difference of this work is that the network also takes the predicted intention output as input. Kotseruba et al. (2021) used a 3D-CNN, an RNN, and an attention mechanism. The 3D-CNN was used to encode local features from a sequence of cropped bounding boxes, and the RNN was used to encode the bounding-box coordinates, pose landmarks and the ego-vehicle speed. Finally, an attention mechanism was used to combine the most relevant features. Yang, Zhang, et al. (2022) used a 2D-CNN, a stacked-RNN, and an attention mechanism. A spatio-temporal GCN was used by Naik et al. (2022) to encode the input image, image class and location information tensors. The output of the spatio-temporal GCN was then fed into an LSTM network to generate long-term predictions. Zeng (2022) used SqueezeNet to extract visual features and a GRU to extract temporal dependencies. They also used a multi-tasking approach to predict both pedestrians' intentions and poses. One primary advantage of using multiple models is that each model can compensate for the limitations of the others. For example, CNN, GCN, and attention mechanisms can address the limitations of an LSTM network in extracting spatial information, handling non-Euclidean data, and prioritising relevant features, respectively.

Full-Pipeline: Gazzeh and Douik (2022) presented a full-pipeline model which includes detection, tracking, and crossing intention prediction. They used YOLOv4 for object detection, DeepSort for tracking, Canny edge detection for lane line detection, and a linear SVM for intention prediction. Another full-pipeline system was implemented by Piccoli et al. (2020), where they used YOLOv3 for detection, DeepSort for tracking, and a spatio-temporal DenseNet for intention prediction. YOLOv5, DeepSort, and an LSTM network with an attention mechanism were used by Lian et al. (2022) to detect, track, and predict pedestrian intention, respectively. A multi-task network was implemented by Razali et al. (2021) to recognise pose state and predict pedestrian intentions. ResNet was used to extract features, Part-Intensity-Fields (PIF) and Part-Association-Fields (PAF) were used to produce channels and pose joints, and a head network was used to predict pedestrian intentions.

Table 8 presents the results achieved by the most relevant pedestrian intention prediction works in the literature. Unfortunately, direct comparisons between these studies are not possible due to variations in problem formulations, OTH, TTE, datasets, and metrics. For example, the work that achieved the best accuracy was Zhang, Angeloudis, and Demiris (2022); however, the authors used their own dataset. The second best was Bouhsain et al. (2020), but they used an observation horizon and TTE of 0.6 s.
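As an illustration of the encoder-only variant, the sketch below encodes a bounding-box sequence with a transformer encoder and classifies crossing intention. It mirrors the idea of Achaji et al. (2022) in spirit only; the layer sizes, mean pooling, and sigmoid head are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class BBoxCrossingTransformer(nn.Module):
    """Encoder-only transformer over a pedestrian bounding-box sequence."""

    def __init__(self, d_model=64, heads=4, layers=3):
        super().__init__()
        self.embed = nn.Linear(4, d_model)        # (x, y, w, h) per frame
        block = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, 1)         # crossing / not crossing

    def forward(self, boxes):                     # boxes: (batch, T, 4)
        z = self.encoder(self.embed(boxes))       # every frame attends to all frames
        return torch.sigmoid(self.head(z.mean(dim=1)))

model = BBoxCrossingTransformer()
p_cross = model(torch.randn(2, 16, 4))           # 16 observed frames per track
```

Because all observed frames are processed in parallel rather than step by step, training is faster than with recurrent encoders, at the cost of the sequential inductive bias discussed above.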
5. Heterogeneous road agents

All the previously mentioned works primarily focused on predicting the behaviour of either pedestrians or vehicles. However, in a real-world traffic scenario, complex interactions occur among various types of agents, each with different dimensions and dynamics. Consequently, it is crucial to consider the interaction between heterogeneous agents. Several works have addressed the detection and behaviour prediction of heterogeneous agents.

For example, authors (Ma et al., 2019) introduced the TrafficPredict algorithm, which was developed to learn motion patterns and predict the trajectories of different types of traffic agents, including pedestrians, bicycles and cars. They adopted the 4D Graph network in conjunction with an RCNN LSTM to learn the movements and interactions of traffic agents. The authors used an OTH of 2 s to predict a horizon of 3 s. They achieved a state-of-the-art average displacement error of 0.085 and a final displacement error of 0.141. DeepTAgent is another heterogeneous system, presented by Chandra, Randhavane, et al. (2019), in which they used Mask R-CNN to detect objects, a CNN to extract tracking features, and a Heterogeneous Interaction Model (HTMI) that considered collision avoidance behaviour to predict the agents' position and velocity, and subsequently their trajectory and interactions. The authors (Chandra, Bhattacharya, et al., 2019) presented a hybrid network for predicting the trajectory of road agents and modelling their interactions. They used a CNN to capture local information, such as the agent's shape and position, and an LSTM network for trajectory prediction. In dense, diverse traffic situations, the algorithm demonstrated a notable performance improvement of 30% over state-of-the-art methods. However, it did not outperform the state-of-the-art algorithms in sparse and homogeneous traffic scenes. Li, Yang, et al. (2020) presented a framework called EvolveGraph. In this framework, they encoded an observation graph to infer an interaction graph, and subsequently decoded both the observation and interaction graphs to predict future trajectories. Zhang, Zhao, et al. (2022) implemented the Attention-based Interaction-aware Trajectory Prediction (AI-TP) model. This model used a Graph Attention Network (GAT) to represent interaction among heterogeneous traffic agents and used a Convolutional GRU (ConvGRU) to make predictions. A multi-agent trajectory prediction system was presented by Mo et al. (2022), where a three-channel framework was used to account for dynamics, interactions and road structure. Moreover, a novel Heterogeneous Edge-enhanced graph ATtention network (HEAT) was proposed to extract interaction features. Dynamic features were extracted from the agents' previous trajectories, and interaction patterns were represented through a directed edge-feature heterogeneous graph and extracted with the HEAT network. The road structure information was shared among all agents using a gate mechanism. Finally, all the information acquired from the previous process was combined to predict trajectories.

All the previously cited works have predicted the trajectories and interactions among the agents. However, they have not taken into consideration their intentions, such as crossing/not-crossing or braking/not-braking. Also, they have not incorporated the information provided by static road objects like traffic lights and road signs. Static road traffic objects play a crucial role in directing, informing, and controlling road users' behaviour. Furthermore, there is limited research on how to use detection and prediction information to identify potential and developing hazards.

The authors (Chen et al., 2018) proposed a multi-task learning model that combines both object detection and distance prediction to identify dangerous traffic road objects. They used an SSD CNN to detect cars, vans, and pedestrians. The input image was divided into a grid map with four vertical and three horizontal distances. Depending on the category of the target vehicle and its location, the network assigned a danger level using blue, green, yellow, and red bounding boxes, where blue and red represented the least and the most dangerous levels, respectively. However, predicting the target vehicle's velocity using a grid map limits the velocity resolution and might not give realistic measurements. Also, relying solely on the distance between the ego and the target vehicle is not enough. For example, an ego vehicle might maintain a safe distance from the target vehicle, but the target vehicle can suddenly brake and change its velocity. Therefore, it would be beneficial for the ego vehicle to predict and recognise instances when the target vehicle is braking or experiencing a sudden change in velocity.

Authors (Li, Wang, et al., 2020) considered themselves pioneers in combining object detection and intention recognition to assess the risks in complex traffic scenarios. Their objective was to detect both non-static objects, such as vehicles and pedestrians, and static objects, such as traffic lights, and then use the gained information to evaluate potential hazards ahead. In order to detect the objects, they used YOLOv4 and the BDD100K dataset and achieved an mAP of 52.7%. For recognising the pedestrian intention (crossing or not-crossing), they used a VGG-19 CNN and Part Affinity Fields, achieving an accuracy of 97.5%. To predict vehicle intentions, including braking and turning, they employed the EfficientNet CNN, achieving a recognition accuracy of 94%. Lastly, for recognising the traffic light state (red, green, or amber), they used the MobileNet CNN, achieving an accuracy of 97.75%. Nevertheless, using only the brake and turn signal lights to predict vehicle behaviour and assess danger is not sufficient, since braking behaviour can exhibit varying intensities. For example, normal braking, characterised by a gradual decrease in the vehicle's velocity, is typically regarded as a potential hazard. In contrast, harsh braking, involving a sudden and significant change in the vehicle's velocity, is seen as a developing hazard. Furthermore, there are situations where target vehicles abruptly change their direction without using their turn signal, which also poses a developing hazard. Therefore, the ego vehicle must be capable of detecting sudden changes in the vehicle's direction and velocity. Similarly, depending only on pedestrian crossing/not-crossing intentions limits the system's ability to make long-horizon predictions, as pedestrians can cross at different velocities and may suddenly change their goal destination.

6. Discussion

This paper has surveyed several works that investigate the behaviour prediction of pedestrians and vehicles. Based on the findings, this section presents a general framework diagram, outlines risk assessment, discusses challenges, examines techniques, outlines requirements, and suggests potential future directions for pedestrian and vehicle behaviour prediction systems.

6.1. General framework for a behaviour prediction system

A proposed general framework for a behaviour prediction system is depicted in Fig. 6. The camera sensor outputs RGB images which are used by the detection and image processing algorithms.

The detection algorithm is responsible for detecting both static and non-static road objects, including road lanes, vehicles, vulnerable road users, traffic lights, and road signs. The position information of the detected objects, represented by bounding boxes, is then used by a tracking algorithm to assign a unique ID to each object. This ID assignment enables the system to track the past trajectory of each detected object, which serves as input for subsequent processing.

The image processing algorithm uses the RGB images from the camera sensor as well as the past trajectories of the detected objects to generate optical flow, depth, appearance, and global and local context images. An example of how image processing uses past trajectories is the use of the bounding box information to crop the RGB image at the specific location of the detected object. This cropping operation provides local context information for further analysis and decision-making within the system.
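The cropping operation described above can be sketched in a few lines; the bounding-box format and the enlargement factor are illustrative assumptions.

```python
import numpy as np

def local_context(frame, bbox, scale=1.5):
    """Crop an enlarged region around a detection to obtain local context.

    frame: (H, W, 3) RGB image; bbox: (x1, y1, x2, y2) in pixels.
    The box is grown by `scale` so the crop includes nearby scene content.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    bw, bh = (x2 - x1) * scale, (y2 - y1) * scale
    xa, xb = max(int(cx - bw / 2), 0), min(int(cx + bw / 2), w)
    ya, yb = max(int(cy - bh / 2), 0), min(int(cy + bh / 2), h)
    return frame[ya:yb, xa:xb]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # placeholder camera image
crop = local_context(frame, (600, 300, 650, 420))  # context around a pedestrian
```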
Fig. 6. General behaviour prediction framework. The behaviour prediction module consists of an automated feature extractor (CNN, 3D-CNN, GCN, FCN, CVAE, GAN, etc.), an
embedding layer (FCN and ANN), and a time series algorithm (RNN, GRU, and LSTM). It is dependent on the perception module (Detection, tracking, image processing, interaction
representation, and feature engineering) which is dependent on the ego vehicle sensors (camera, GPS, and wheel encoder). Additionally, the outputs of the behaviour prediction
modules are sent to the planning module.
The interaction representation algorithm uses the past trajectories of the objects to calculate distances between the traffic agents, construct graph networks with vertices and edges, and generate grid maps that account for interactions between traffic agents.

The feature engineering algorithm uses the past trajectories of objects and internal sensor data from the AV (e.g., steering wheel angle, yaw rate, wheel encoder, etc.) to derive additional features, for example, using the differences between the objects' positions in consecutive frames to calculate their velocities.

The outputs of the perception module are then fed into the automated feature extractor and the embedding algorithms within the behaviour prediction module. Automated feature extractors are deep learning algorithms designed to generate feature vectors representing spatial properties of the inputs. Embedding uses a linear transformation to transform the inputs into a desired output feature size. The time series algorithm uses the combined feature vectors generated by the automated feature extractor and the embedding layer to learn temporal information, enabling it to predict various aspects of object behaviour, including future trajectories, future intentions, goals, and current intentions. Note that the predicted goals and recognised intentions can be used by the embedding layer and the time series algorithm as extra information for predicting future trajectories.

Finally, the outputs of the behaviour prediction module are used by the AV's planning module, which in turn uses this information to plan the actions of the AV to achieve its final goal.

6.2. Risk assessment for behaviour prediction system

Authors (Bhavsar et al., 2017) proposed a risk assessment for an AV. They mentioned that AV failures can arise from various aspects, including vehicular components such as hardware, software, mechanical systems, communication infrastructure, and interactions between the passenger and the AV Human Machine Interface system. Based on their findings, this paper presents a risk assessment specifically for an AV behaviour prediction system. This assessment identifies, analyses, and provides recommendations for mitigating and controlling the identified risks.

6.2.1. Risk identification

Based on the general framework for a behaviour prediction system depicted in Fig. 6, the following risks have been identified:

• Camera sensor failure: this includes hardware malfunctions, blocked field of view, and noise (electricity, heat, and illumination).
• Computing components failure: computer or GPU failure.
• Sensor failure: failure in the steering wheel, wheel encoder, GPS, and IMU sensors.
• Detection algorithm failure: missed detections, poor intersection over union, false-positive and false-negative classification.
• Tracking algorithm failure: missed tracking and incorrect association of objects between frames. For instance, an object might not be tracked in the next frame or objects might swap their IDs due to overlap.
• Image processing failure: incorrect optical flow and depth estimation.
• Interaction representation failure: noisy and incorrect distance calculation, as well as incorrect graph or grid representation of the object interactions.
• Feature engineering failure: redundant features, and noisy speed and acceleration estimates due to poor detection and tracking performance.
• Cybersecurity failure: remote hacking, vehicle spoofing, insider threat, and tampering with sensor data.

6.2.2. Risk analysis

The authors (Bhavsar et al., 2017) discussed several methods for analysing risks in automotive contexts, including situation-based analysis, ontology-based analysis, failure modes and effects analysis (FMEA), and fault tree analysis (FTA). From their investigation, they concluded that FTA is the most suitable method for conducting a risk assessment on AV features. For this reason, this paper also adopts FTA to perform a risk analysis on the behaviour prediction system. FTA methods have the following advantages: being event-orientated, enabling the diagnosis of the root cause of failures, facilitating an understanding of how subsystems can impact each other, having a straightforward and graphical nature for ease of comprehension, and aiding in decision-making regarding the control of identified risks. The proposed FTA is depicted in Fig. 7. A qualitative analysis of the proposed FTA reveals that the system is highly vulnerable, because the occurrence of any of the basic events (EVX) can lead to the failure of the behaviour prediction system. For instance, if the detection algorithm fails, it can cascade failures throughout the tracking algorithm, image processing, interaction representation, and feature engineering, ultimately resulting in the failure of the behaviour prediction system.

In order to quantitatively analyse the behaviour prediction system, it is required to know the probability of failure for each event (EVX), which depends on the hardware, software, and cybersecurity in use.
Fig. 7. Fault tree analysis for a Behaviour Prediction System. The circle shapes beneath the square shapes are the basic events that may lead to failures of the top events. The square shape after the TOP GATE is the top event, which represents the failure of the behaviour prediction system. The "OR" gates mean that if any one of their input events occurs, the output event is true.
A general mathematical model to calculate the overall system failure from an FTA diagram such as the one depicted in Fig. 7 is given by the following equation (Ruijters & Stoelinga, 2015; Xing & Amari, 2008):

$$Q_0(t) \le 1 - \prod_{j=1}^{k} \left[ 1 - \check{Q}_j(t) \right] \tag{3}$$

where $Q_0(t)$ is the top event (failure of the behaviour prediction system) and $\check{Q}_j(t)$ is the failure probability of a minimal cut-set. For instance, the probability that the TOP GATE in the proposed FTA diagram happens is given by

$$Q_0(t) \le 1 - [1 - P(GT1)][1 - P(GT2)][1 - P(GT3)] \tag{4}$$

where

$$P(GT1) = 1 - [1 - P(EV1)][1 - P(EV2)][1 - P(EV3)] \tag{5}$$

• … sensor. The disadvantage of this approach is that it is expensive and requires more space in the vehicle.
• For the general behaviour prediction system in question, it is observed that it relies on three types of information (RGB image, engineered features, and interactions) for predictions. Therefore, it is recommended to enable the system to function in a degraded mode by using one or two pieces of information if one of them fails.
• The detection and tracking algorithms are important for the system, as their outputs are used by the other algorithms. Thus, it is recommended to make use of sensor fusion, so that if one of the hardware components or the algorithms responsible for detecting and tracking the objects fails, the system can work in a degraded mode.
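A worked numeric example can clarify how the OR-gate bound of Eqs. (3)–(5) propagates up the tree. In the sketch below, both the basic-event probabilities and the grouping of events under each gate are purely illustrative assumptions; real values would come from the hardware, software, and cybersecurity analysis mentioned above.

```python
def gate_or(probs):
    """P(OR gate) = 1 - product(1 - P(input)), as in Eqs. (3)-(5)."""
    q = 1.0
    for p in probs:
        q *= 1.0 - p
    return 1.0 - q

# Illustrative basic-event failure probabilities only (not measured values).
gt1 = gate_or([1e-4, 2e-4, 5e-5])   # e.g. camera and computing events
gt2 = gate_or([3e-4, 1e-4])         # e.g. detection and tracking events
gt3 = gate_or([2e-4, 2e-4, 1e-4])   # e.g. processing and cybersecurity events
q0 = gate_or([gt1, gt2, gt3])       # top event: behaviour prediction failure
print(f"Upper bound on system failure probability: {q0:.6f}")
```

Even with individually small event probabilities, the OR structure means the top-event bound grows with every added basic event, which is the qualitative vulnerability noted for the proposed FTA.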
Table 8
Results for the most relevant pedestrian intention prediction works.
Work Dataset Obs. Hor. TTE Acc(%) AUC(%) F1(%) Rec.(%) Prec(%) ROC-AUC(%)
JAAD – Recog. 92.88 – – – – –
Gazzeh and Douik (2022)
Fang and López (2018) JAAD 0.5 s Next-Frame 88 – – – – –
STRR-Graph JAAD 0.5 s Next-Frame 76.98 – – – – –
Liu et al. (2020)
FUSSI-net JAAD 0.5 s Next-Frame 76.6 – – – – –
Piccoli et al. (2020)
PIEint PIE 0.5 s Next-Frame 79 – 87 – 90 73
Rasouli et al. (2019)
CA-LSTM JAAD 0.5 s Next-Frame 89.68 – 75.38 85.96 – –
Lian et al. (2022)
PV-LSTM JAAD 0.6 s 0.6 s 91.48 – – – – –
Bouhsain et al. (2020)
Ma and Rong (2022) BPI – 0.6 s 89.5 99.2 – – – –
SFR-GRU PIE 0.5 s 2 s 84.4 82.9 72.1 80 65.7 –
Rasouli et al. (2020)
C+B+S+Int PIE 0.5 s 2 s 83 85 81 85 79 –
Kotseruba et al. (2020)
PCIR JAAD – – 89.6 – – – – –
Yang et al. (2021)
Chen et al. (2021) PIE 0.5 s 1.5 s 79 – 78 – – –
I+A+F+R JAAD 0.5 s 1-2 s 87 92 70 – 66 –
Yao et al. (2021b) PIE – 84 88 90 – 96 –
Yang, Zhang, et al. (2022) JAAD 0.5 s 1-2 s 83 82 63 81 51 –
PIE 89 86 80 81 79 –
GRAPH+ JAAD 0.5 s 1–2 s 86 88 65 75 58 –
Cadena et al. (2022) PIE 89 90 81 79 83 –
Achaji et al. (2022) PIE 0.5 s 1–2 s 91 91 83 – – –
Scene-STGCN PIE 0.5 s 1–2 s 83 – 89 – 96 85
Naik et al. (2022)
PCPA JAAD 0.5 s 0.5-1 s 85 86 68 – – –
Kotseruba et al. (2021) PIE – 87 86 77 – – –
ST-CrossingPose OWN 0.5 s 1 s 92 84.9 83.7 81.8 85.9 –
Zhang, Angeloudis, and Demiris 2 s 92 84.1 79.7 79.7 81.3 –
(2022)
Zeng (2022) JAAD -s -s 84 – – – – 85
• Good Evaluation Metric Performance: an AV behaviour prediction system is a safety-critical system, therefore it must perform well in terms of evaluation metric performance to prevent traffic collisions. For example, if the system fails to predict that a pedestrian will cross the road, it could lead to a serious collision.
• Long Prediction Horizon (PTH): a system with a long PTH can plan and react well in advance, reducing the chances of collisions and improving overall safety.
• Fast Inference Time: given that an AV behaviour prediction system must operate in real time, it must have a low inference time and require low hardware resources.
• Low Cost: to make AVs accessible to a wide range of people, the behaviour prediction system should be cost-effective, ensuring that AVs are affordable for all social classes.
• Low Hardware Resource Requirement: efficient utilisation of hardware resources is important, as it allows the system to run on hardware with limited capacity.
• Robustness: the system should be robust and able to handle various scenarios and conditions on the road, ensuring reliable performance in different situations.
• Prediction of Various Non-Static Objects: the system should be capable of predicting the behaviour of different types of non-static objects on the road, including pedestrians, vehicles, animals, and cyclists, to ensure comprehensive safety.

Evaluation metrics, long prediction horizons, and robustness are interrelated. For instance, as the prediction horizon increases, the evaluation metric performance tends to decrease. In addition, as a system becomes more robust, its evaluation metric performance is expected to increase. The major challenges that limit behaviour prediction algorithms from meeting the previously mentioned requirements stem from the fact that an agent's behaviour depends on the other agents in the scene, the local and global context, and their final goal. Various approaches have been proposed to address these challenges:

• Social pooling layers (Alahi et al., 2016; Deo & Trivedi, 2018a), graph representation, GCN, self-attention-based social pooling (Mangalam et al., 2020), message-passing mechanisms (Zhang et al., 2019), occupancy maps (Kasper et al., 2012; Park et al., 2018; Xue et al., 2018), view frustum social pooling (Hasan et al., 2018), and star-like networks to model interactions between agents (Zhu et al., 2019).
• CNNs to extract agents' appearance, body pose, local context, and global context, and to classify intentions (Biparva et al., 2021; Chen et al., 2021; Fang et al., 2017; Fernández-Llorca et al., 2020; Izquierdo et al., 2021; Yang, Zhang, et al., 2022; Yao et al., 2021b; Zhao et al., 2019).
• Attention mechanisms and transformer networks to focus on the most relevant information (Achaji et al., 2022; Lian et al., 2022; Rasouli et al., 2019).
Table 9
Behaviour prediction research challenges.
Type of challenge Class Challenges
Target Agents* Pedestrian Highly dynamic, can move in many directions and change them very quickly, be easily occluded, be distracted by their own objects or external environments; their motion can be affected by other traffic agents, they might be under the influence of drugs or alcoholic drinks, and they are hard to see in poor visibility conditions.
 Vehicle Dependent on other vehicles' actions, traffic rules, road geometry, and different driving environments; vehicles have multi-modal behaviour, different types of vehicles have different motion properties, drivers might be under the influence of drugs or alcoholic drinks, and target vehicles might be occluded.
System Design* – To achieve a good evaluation metric performance, long PTH, real-time inference, low hardware resources, and robustness.
 Evaluation Works have used different types of datasets, evaluation metrics, observation and prediction horizons, and hardware setups. Therefore, works cannot be directly compared and the actual progress of pedestrian and vehicle behaviour prediction research cannot be measured.
Resources* Hardware Smaller-size GPUs that can process deep learning algorithms in real-time, sensors that enable the AV to perceive a 360-degree road view, and affordable hardware to enable all social classes to afford AVs.
 Data Several existing datasets are not publicly available and they are not standardised to enable cross-dataset evaluation and progressive training pipeline techniques.
Uncertainties* Hardware Failure Camera, GPS, IMU, steering wheel, and wheel encoder sensor failure.
 Cyber Attack Remote hacking, vehicle spoofing, insider threat, and tampering with sensor data.
 Software Failure Perception module (detection, tracking, image processing, interaction representation, and feature engineering) failure.
… et al., 2018), and star-like networks to model interactions between agents (Zhu et al., 2019).
• CNNs to extract agents' appearance, body pose, local context, and global context, and to classify intentions (Biparva et al., 2021; Chen et al., 2021; Fang et al., 2017; Fernández-Llorca et al., 2020; Izquierdo et al., 2021; Yang, Zhang, et al., 2022; Yao et al., 2021b; Zhao et al., 2019).
• Attention mechanisms and transformer networks to focus on the most relevant information (Achaji et al., 2022; Lian et al., 2022; Rasouli et al., 2019).
• 3D-CNNs and temporal-DenseNet to learn short-term temporal information (Biparva et al., 2021; Kotseruba et al., 2021; Piccoli et al., 2020; Yang et al., 2021).
• LSTMs and GRUs to learn long-term temporal information (Bouhsain et al., 2020; Chung et al., 2014; Kotseruba et al., 2020; Rasouli et al., 2019, 2020); a minimal encoder–decoder sketch is given after this list.
• A modified version of the LSTM cell that accepts more than one input sequence set (Quan et al., 2021).
• CVAEs to estimate the final goals of the agents and thereby extend the prediction time horizon (Lee, Choi, et al., 2017; Mangalam et al., 2020; Wang et al., 2022; Yao et al., 2021a).
• Heterogeneous agent behaviour prediction works, which enable the system to predict the behaviour of different types of non-static objects (Chandra, Bhattacharya, et al., 2019; Chandra, Randhavane, et al., 2019; Chen et al., 2018; Li, Wang, et al., 2020; Li, Yang, et al., 2020; Ma et al., 2019; Mo et al., 2022). However, these works have primarily focused on pedestrians, cyclists, and vehicles, leaving out other objects such as animals, disabled individuals, scooters, toys (balls), and skate riders.
• Combinations of two or more methods to compensate for their individual limitations (Chen et al., 2021; Kotseruba et al., 2021; Liu et al., 2020; Naik et al., 2022; Yang, Zhang, et al., 2022; Yao et al., 2021b; Zeng, 2022).
• Systems that can predict the behaviour of heterogeneous agents (Chandra, Bhattacharya, et al., 2019; Chandra, Randhavane, et al., 2019; Li, Wang, et al., 2020; Li, Yang, et al., 2020; Ma et al., 2019).
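To make the recurrent bullets above concrete, the following is a minimal, illustrative encoder–decoder sketch in the spirit of the LSTM-based trajectory predictors cited in this list; the architecture, tensor shapes, and hyperparameters are assumptions made for this sketch, not details of any reviewed model.

```python
# Minimal LSTM encoder-decoder for trajectory prediction (illustrative sketch).
# Shapes and hyperparameters are assumptions, not taken from any cited work.
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTMCell(input_size=2, hidden_size=hidden)
        self.head = nn.Linear(hidden, 2)      # predicts an (x, y) offset per step

    def forward(self, obs, pred_len=12):
        # obs: (batch, obs_len, 2) positions observed over the OTH
        _, (h, c) = self.encoder(obs)         # summarise the observed track
        h, c = h.squeeze(0), c.squeeze(0)
        last = obs[:, -1, :]                  # decode from the last observed point
        outputs = []
        for _ in range(pred_len):             # roll the decoder out over the PTH
            h, c = self.decoder(last, (h, c))
            last = last + self.head(h)        # residual step keeps the track continuous
            outputs.append(last)
        return torch.stack(outputs, dim=1)    # (batch, pred_len, 2)

model = Seq2SeqTrajectory()
pred = model(torch.randn(4, 8, 2))            # e.g., 8 observed frames -> 12 predicted
print(pred.shape)                             # torch.Size([4, 12, 2])
```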
Inference time, low cost, and low hardware resource requirements are also interrelated. For example, if a system consumes less memory and computational power, it needs cheaper hardware, making the overall system more cost-effective. Typically, when a system requires less memory, such as for processing image inputs, the system's overall inference time is expected to be shorter. However, there may be a trade-off between accuracy and inference time: using multiple-feature information can increase the system's accuracy but may lead to longer inference times compared to a system using a single type of feature. The following methods have been proposed to achieve low inference time, low cost, and low hardware resource requirements:

• GCNs, which represent interactions between agents effectively without relying on additional information such as original images, cropped images, or contextual information (Li et al., 2019a, 2019b); a sketch of this idea follows the list.
• Dual-LSTM, which allows the system to learn more information from past trajectories without requiring extra input features (Xin et al., 2018).
• Fusion of multiple input features (context, interaction, trajectories, and appearance) into an enriched image representation, rather than processing a sequence of images (Izquierdo et al., 2021).
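As a rough illustration of the first bullet, the sketch below builds a distance-thresholded interaction graph over agents and applies one graph-convolution step; the 10 m threshold, feature sizes, and random inputs are assumptions for illustration only, not parameters of GRIP or GRIP++.

```python
# Sketch of a distance-thresholded interaction graph and one graph-convolution
# step, in the spirit of GRIP-style models; threshold and sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0, 30, size=(5, 2))   # (x, y) of 5 agents in metres
features = rng.normal(size=(5, 4))            # per-agent motion features

# Adjacency: agents closer than 10 m interact; the zero self-distance keeps
# a self-loop so each agent retains its own history.
dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
A = (dist < 10.0).astype(float)

# Symmetric normalisation A_hat = D^{-1/2} A D^{-1/2}, as in standard GCNs.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

W = rng.normal(size=(4, 8))                   # learnable weights in a real model
H = np.maximum(A_hat @ features @ W, 0.0)     # one propagation step with ReLU
print(H.shape)                                # (5, 8) -- interaction-aware features
```

Because the graph is built from agent positions alone, no image cropping or context encoding is needed at inference, which is what keeps this family of methods cheap.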
6.5. Behaviour prediction system further work

Despite the techniques presented to meet the specified requirements, there is still work to be done from the authors' perspective. For example:

• Most of the works, both for pedestrians and vehicles, were implemented using either a top-view or BEV dataset, which may not be ideal for an AV system. Only in the past five years have researchers started implementing algorithms using on-board datasets such as PREVENTION, Apollo, JAAD, and PIE. Moreover, most of the works that used on-board datasets focused on implementing intention prediction algorithms, and most of the proposed algorithms cannot be directly compared.
• While some works have used the same datasets, evaluation metrics, observation time horizon, and prediction time horizon, these works were implemented on top-view and BEV datasets. For example, many vehicle trajectory prediction works have used the NGSIM dataset with an OTH of 3 s, a PTH of 5 s, and the MSE evaluation metric, while several pedestrian trajectory prediction algorithms adopted the ETH and UCY datasets with an OTH of 3.2 s, a PTH of 4.8 s, and the ADE and FDE evaluation metrics (the sketch after this list shows how these metrics are computed). If these datasets were ideal for AV systems, then the best vehicle trajectory prediction algorithms would be GRIP (Li et al., 2019b), GRIP++ (Li et al., 2019a), and AI-TP (Zhang, Zhao, et al., 2022), and the best pedestrian trajectory prediction algorithm would be BiTraP (Yao et al., 2021a).
• There is a lack of research on unusual behaviour exhibited by pedestrians and vehicles. For example, pedestrians might exhibit unusual behaviour when under the influence of toxic substances, involved in fights, or disoriented. Similarly, vehicles may display unusual behaviour when the driver is under the influence of toxic substances or distracted by personal belongings, or if the vehicle is an emergency vehicle, garbage truck, or road sweeper, is carrying an abnormal load, or is experiencing a mechanical malfunction.
• There is limited research on decreasing inference time, and more emphasis should be placed on addressing this demand.
• Standardising datasets would enable cross-dataset evaluation and the development of progressive training pipeline techniques.
• Introducing universal metrics would allow for direct comparisons of algorithm performance.
• When considering a full pipeline system (detection, tracking, and behaviour prediction), it is necessary to account for perception uncertainties due to sensor noise, fuzzy features, or unknown inputs (Liu et al., 2022). Since only a limited number of works have implemented a full pipeline system, more works considering the entire pipeline process are recommended to investigate the effect of possible noise; the sketch after this list also illustrates how such noise shifts the evaluation metrics.
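The sketch below, referenced in the bullets above, shows how ADE and FDE are typically computed and how simulated perception noise on the observed track propagates into both metrics. The constant-velocity baseline and the 0.2 m noise level are illustrative assumptions, not results from any reviewed work; the 8-frame/12-frame split follows the common ETH/UCY convention of 2.5 fps tracks.

```python
# ADE/FDE computation and a simple perception-noise probe (illustrative).
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean L2 distance over all future steps."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def fde(pred, gt):
    """Final Displacement Error: mean L2 distance at the last future step."""
    return np.linalg.norm(pred[:, -1] - gt[:, -1], axis=-1).mean()

def constant_velocity(obs, pred_len=12):
    """Baseline predictor: extrapolate the last observed velocity."""
    v = obs[:, -1] - obs[:, -2]
    steps = np.arange(1, pred_len + 1)[None, :, None]
    return obs[:, -1:, :] + steps * v[:, None, :]

rng = np.random.default_rng(0)
# ETH/UCY convention: 2.5 fps, so an OTH of 3.2 s is 8 frames and a PTH of
# 4.8 s is 12 frames.
tracks = np.cumsum(rng.normal(0.5, 0.05, size=(16, 20, 2)), axis=1)
obs, gt = tracks[:, :8], tracks[:, 8:]

pred = constant_velocity(obs)
print(f"clean  ADE {ade(pred, gt):.2f} m, FDE {fde(pred, gt):.2f} m")

# Simulate detection/tracking noise entering the pipeline ahead of the predictor.
noisy = obs + rng.normal(scale=0.2, size=obs.shape)
pred_noisy = constant_velocity(noisy)
print(f"noisy  ADE {ade(pred_noisy, gt):.2f} m, FDE {fde(pred_noisy, gt):.2f} m")
```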
Based on the literature review, the following suggestions are given to further improve and accelerate the development of the Autonomous Vehicle Behaviour Prediction System:

• Encourage more research works to adopt on-board view datasets for predicting both pedestrian and vehicle behaviour, including intentions and trajectories.
• Standardise existing datasets to enable cross-dataset evaluation and progressive training pipeline techniques.
• Choose or create a standard evaluation metric to enable direct comparison among algorithms.
• Develop datasets that contain instances of abnormal pedestrian and vehicle behaviours to enable research on the recognition and prediction of abnormal pedestrian and vehicle behaviour.
• Implement behaviour prediction algorithms on resource-constrained hardware, such as the Jetson Orin and Jetson Xavier GPUs, which are low-cost, small, lightweight, and consume little power; a simple latency harness is sketched after this list.
• Investigate more methods to select the target object and the objects that directly interact with the target object.
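As referenced in the hardware suggestion above, a latency harness of the following kind could be run on a Jetson-class board to check real-time feasibility; the model here is a hypothetical stand-in, not any reviewed predictor, and the warm-up/iteration counts are arbitrary choices.

```python
# Simple inference-latency harness (illustrative); the model is a stand-in.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 24)).eval()
x = torch.randn(1, 16)

with torch.no_grad():
    for _ in range(10):                 # warm-up iterations before timing
        model(x)
    t0 = time.perf_counter()
    n = 100
    for _ in range(n):
        model(x)
    latency_ms = (time.perf_counter() - t0) / n * 1e3
print(f"Mean inference latency: {latency_ms:.2f} ms")
```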
The general object detection problem serves as an example of the importance of having a large dataset and standard evaluation metrics. The field has achieved an acceptable level of maturity because researchers have access to publicly available large image benchmark datasets, such as ImageNet (Russakovsky et al., 2015) and COCO (Lin et al., 2014). These datasets enabled authors to directly compare their detection algorithm performance and to measure the advancement of object detection research.

7. Conclusion

AV systems must not only detect pedestrians and vehicles but also predict their behaviour to avoid or mitigate collisions. Therefore, the purpose of this literature review was to survey the most relevant pedestrian and vehicle behaviour prediction algorithms to identify the requirements for a behaviour prediction algorithm, the challenges associated with predicting pedestrian and vehicle behaviour, whether current techniques have met these requirements, and what steps are needed to enable AVs to predict pedestrian and vehicle behaviours. In conclusion, the review shows that:

• An AV behaviour prediction system must have good evaluation metric performance and a long prediction horizon, and it must be fast at inference, cost-effective, robust, require minimal hardware resources, and predict various types of non-static objects on the road.
• The main challenges in predicting the behaviour of traffic agents involve modelling their interactions, establishing relationships between the agents and the scene, and achieving a balance between good evaluation metric performance and low inference times.
• Current techniques do not fully meet these requirements for several reasons:
  – when predicting over long-term horizons, evaluation metric performance decreases significantly;
  – while top-view and BEV datasets are commonly used in the literature, few works have adopted on-board datasets, which are more suitable for AVs;
  – on-board datasets usually use only a single forward-facing camera, limiting the behaviour prediction system to agents ahead, whereas considering the agents around the ego vehicle using multiple cameras is essential (Zhang, 2021);
  – more investigation is required to develop models that can predict intention and trajectory simultaneously; although some authors (Li et al., 2019a, 2019b) claimed that their systems achieved real-time inference times, they used top-view cameras, whereas systems that use on-board sensors may require more processing time;
  – there are no works that consider abnormal behaviour exhibited by traffic agents.
• Most of the reviewed works have not considered the full pipeline behaviour prediction process, which consists of detection, classification, and tracking. More research should focus on the full pipeline process to assess the performance of each stage and its impact on the final prediction results.

Abbreviations

AV Autonomous Vehicle.
ADAS Advanced Driver Assistance System.
WHO World Health Organisation.
DL Deep Learning.
OTH Observation Time Horizon.
PTH Prediction Time Horizon.
EV Ego Vehicle.
TTE Time-To-Event.
KF Kalman Filter.
EKF Extended Kalman Filter.
HMM Hidden Markov Model.
SVM Support Vector Machine.
ANN Artificial Neural Network.
OGM Occupancy Grid Map.
CNN Convolutional Neural Network.
References
Chen, L., Ma, N., Wang, P., Li, J., Wang, P., Pang, G., & Shi, X. (2020). Survey of pedestrian action recognition techniques for autonomous driving. Tsinghua Science and Technology, 25(4), 458–470, Publisher: TUP.
Chen, T., Tian, R., & Ding, Z. (2021). Visual reasoning using graph convolutional networks for predicting pedestrian crossing intention. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 3103–3109).
Chen, Y., Zhao, D., Lv, L., & Zhang, Q. (2018). Multi-task learning for dangerous object detection in autonomous driving. Information Sciences, 432, 559–571, Publisher: Elsevier.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
COLONNA, M. (2018). Urbanisation worldwide. Knowledge for policy - European Commission, URL: https://ec.europa.eu/knowledge4policy/foresight/topic/continuing-urbanisation/urbanisation-worldwide_en.
Czech, P., Braun, M., Kreßel, U., & Yang, B. (2022). On-board pedestrian trajectory prediction using behavioral features. arXiv preprint arXiv:2210.11999.
Dai, S., Li, L., & Li, Z. (2019). Modeling vehicle interactions via modified LSTM models for trajectory prediction. IEEE Access, 7, 38287–38296, Publisher: IEEE.
Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., & Leal-Taixé, L. (2021). MOTChallenge: A benchmark for single-camera multiple target tracking. International Journal of Computer Vision, 129, 845–881, Publisher: Springer.
Deo, N., Rangesh, A., & Trivedi, M. M. (2018). How would surround vehicles move? A unified framework for maneuver classification and motion prediction. IEEE Transactions on Intelligent Vehicles, 3(2), 129–140, Publisher: IEEE.
Deo, N., & Trivedi, M. M. (2018a). Convolutional social pooling for vehicle trajectory prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 1468–1476).
Deo, N., & Trivedi, M. M. (2018b). Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In 2018 IEEE intelligent vehicles symposium (pp. 1179–1184). IEEE.
Dueholm, J. V., Kristoffersen, M. S., Satzoda, R. K., Moeslund, T. B., & Trivedi, M. M. (2016). Trajectories and maneuvers of surrounding vehicles with panoramic camera arrays. IEEE Transactions on Intelligent Vehicles, 1(2), 203–214, Publisher: IEEE.
Durrant-Whyte, H. (2001). A critical review of the state-of-the-art in autonomous land vehicle systems and technology. Albuquerque (NM) and Livermore (CA), USA: Sandia National Laboratories, 41, 242.
Fang, Z., & López, A. M. (2018). Is the pedestrian going to cross? Answering by 2D pose estimation. In 2018 IEEE intelligent vehicles symposium (pp. 1271–1276). IEEE.
Fang, Z., Vázquez, D., & López, A. M. (2017). On-board detection of pedestrian intentions. Sensors, 17(10), 2193, Publisher: MDPI.
Fernández-Llorca, D., Biparva, M., Izquierdo-Gonzalo, R., & Tsotsos, J. K. (2020). Two-stream networks for lane-change prediction of surrounding vehicles. In 2020 IEEE 23rd international conference on intelligent transportation systems (pp. 1–6). IEEE.
Flohr, F. F., Kooij, J. F. K., Pool, E. A. P., & Gavrila, D. M. G. (2018). Context-based path prediction for targets with switching dynamics.
Galvao, L. G., Abbod, M., Kalganova, T., Palade, V., & Huda, M. N. (2021). Pedestrian and vehicle detection in autonomous vehicle perception systems—A review. Sensors, 21(21), 7267, Publisher: MDPI.
Gazzeh, S., & Douik, A. (2022). Deep learning for pedestrian behavior understanding. In 2022 6th international conference on advanced technologies for signal and image processing (pp. 1–5). IEEE.
Girma, A., Amsalu, S., Workineh, A., Khan, M., & Homaifar, A. (2020). Deep learning with attention mechanism for predicting driver intention at intersection. In 2020 IEEE intelligent vehicles symposium (pp. 1183–1188). IEEE.
GOVUK, G. (2020). Reported road casualties Great Britain, annual report: 2020. GOV.UK, URL: https://www.gov.uk/government/statistics/reported-road-casualties-great-britain-annual-report-2020/reported-road-casualties-great-britain-annual-report-2020.
GOVUK, G. (2021). Reported road casualties in Great Britain, provisional estimates: year ending June 2021. GOV.UK, URL: https://www.gov.uk/government/statistics/reported-road-casualties-in-great-britain-provisional-estimates-year-ending-june-2021/reported-road-casualties-in-great-britain-provisional-estimates-year-ending-june-2021.
Gulzar, M., Muhammad, Y., & Muhammad, N. (2021). A survey on motion prediction of pedestrians and vehicles for autonomous driving. IEEE Access, 9, 137957–137969, Publisher: IEEE.
Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., & Alahi, A. (2018). Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2255–2264).
Hasan, I., Setti, F., Tsesmelis, T., Del Bue, A., Galasso, F., & Cristani, M. (2018). MX-LSTM: Mixing tracklets and vislets to jointly forecast trajectories and head poses. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6067–6076).
He, J.-H., Chen, Y.-L., Chen, X.-Z., & Chiang, H.-H. (2021). Vehicle turning intention prediction based on data-driven method with roadside radar and vision sensor. In 2021 IEEE international conference on consumer electronics-Taiwan (pp. 1–2). IEEE.
Hermes, C., Wohler, C., Schenk, K., & Kummert, F. (2009). Long-term vehicle motion prediction. In 2009 IEEE intelligent vehicles symposium (pp. 652–657). IEEE.
Huang, H., Zeng, Z., Yao, D., Pei, X., & Zhang, Y. (2021). Spatial–temporal ConvLSTM for vehicle driving intention prediction. Tsinghua Science and Technology, 27, 599–609.
Izquierdo, R., Quintanar, A., Lorenzo, J., García-Daza, I., Parra, I., Fernández-Llorca, D., & Sotelo, M. A. (2021). Vehicle lane change prediction on highways using efficient environment representation and deep learning. IEEE Access, 9, 119454–119465, Publisher: IEEE.
Izquierdo, R., Quintanar, A., Parra, I., Fernández-Llorca, D., & Sotelo, M. A. (2019). The PREVENTION dataset: A novel benchmark for prediction of vehicles intentions. In 2019 IEEE intelligent transportation systems conference (pp. 3114–3121). IEEE.
Karasev, V., Ayvaci, A., Heisele, B., & Soatto, S. (2016). Intent-aware long-term prediction of pedestrian motion. In 2016 IEEE international conference on robotics and automation (pp. 2543–2549). IEEE.
Kasper, D., Weidl, G., Dang, T., Breuel, G., Tamke, A., Wedel, A., & Rosenstiel, W. (2012). Object-oriented Bayesian networks for detection of lane change maneuvers. IEEE Intelligent Transportation Systems Magazine, 4(3), 19–31, Publisher: IEEE.
Keller, C. G., & Gavrila, D. M. (2013). Will the pedestrian cross? A study on pedestrian path prediction. IEEE Transactions on Intelligent Transportation Systems, 15(2), 494–506, Publisher: IEEE.
Khosroshahi, A., Ohn-Bar, E., & Trivedi, M. M. (2016). Surround vehicles trajectory analysis with recurrent neural networks. In 2016 IEEE 19th international conference on intelligent transportation systems (pp. 2267–2272). IEEE.
Kim, B., Kang, C. M., Kim, J., Lee, S. H., Chung, C. C., & Choi, J. W. (2017). Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network. In 2017 IEEE 20th international conference on intelligent transportation systems (pp. 399–404). IEEE.
Kong, Y., & Fu, Y. (2018). Human action recognition and prediction: A survey. arXiv preprint arXiv:1806.11230.
Kooij, J. F. P., Schneider, N., Flohr, F., & Gavrila, D. M. (2014). Context-based pedestrian path prediction. In European conference on computer vision (pp. 618–633). Springer.
Kotseruba, I., Rasouli, A., & Tsotsos, J. K. (2020). Do they want to cross? Understanding pedestrian intention for behavior prediction. In 2020 IEEE intelligent vehicles symposium (pp. 1688–1693). IEEE.
Kotseruba, I., Rasouli, A., & Tsotsos, J. K. (2021). Benchmark for evaluating pedestrian action prediction. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 1258–1268).
Kuefler, A., Morton, J., Wheeler, T., & Kochenderfer, M. (2017). Imitating driver behavior with generative adversarial networks. In 2017 IEEE intelligent vehicles symposium (pp. 204–211). IEEE.
Kumar, P., Perrollaz, M., Lefevre, S., & Laugier, C. (2013). Learning-based approach for online lane change intention prediction. In 2013 IEEE intelligent vehicles symposium (pp. 797–802). IEEE.
Lee, N., Choi, W., Vernaza, P., Choy, C. B., Torr, P. H., & Chandraker, M. (2017). DESIRE: Distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 336–345).
Lee, D., Kwon, Y. P., McMains, S., & Hedrick, J. K. (2017). Convolution neural network-based lane change intention prediction of surrounding vehicles for ACC. In 2017 IEEE 20th international conference on intelligent transportation systems (pp. 1–6). IEEE.
Lefèvre, S., Vasquez, D., & Laugier, C. (2014). A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH Journal, 1(1), 1–14, Publisher: SpringerOpen.
Leon, F., & Gavrilescu, M. (2019). A review of tracking, prediction and decision making methods for autonomous driving. arXiv preprint arXiv:1909.07707.
Levy, J. I., Buonocore, J. J., & Von Stackelberg, K. (2010). Evaluation of the public health impacts of traffic congestion: A health risk assessment. Environmental Health, 9(1), 1–12, Publisher: Springer.
Li, Y., Wang, H., Dang, L. M., Nguyen, T. N., Han, D., Lee, A., Jang, I., & Moon, H. (2020). A deep learning-based hybrid framework for object detection and recognition in autonomous driving. IEEE Access, 8, 194228–194239, Publisher: IEEE.
Li, J., Yang, F., Tomizuka, M., & Choi, C. (2020). EvolveGraph: Multi-agent trajectory prediction with dynamic relational reasoning. In Proceedings of the neural information processing systems.
Li, X., Ying, X., & Chuah, M. C. (2019a). GRIP++: Enhanced graph-based interaction-aware trajectory prediction for autonomous driving. arXiv preprint arXiv:1907.07792.
Li, X., Ying, X., & Chuah, M. C. (2019b). GRIP: Graph-based interaction-aware trajectory prediction. In 2019 IEEE intelligent transportation systems conference (pp. 3960–3966). IEEE.
Lian, J., Yu, F., Li, L., & Zhou, Y. (2022). Early intention prediction of pedestrians using contextual attention-based LSTM. Multimedia Tools and Applications, 1–17, Publisher: Springer.
Lim, Y.-C., Lee, M., Lee, C.-H., Kwon, S., & Lee, J.-h. (2010). Improvement of stereo vision-based position and velocity estimation and tracking using a stripe-based disparity estimation and inverse perspective map-based extended Kalman filter. Optics and Lasers in Engineering, 48(9), 859–868, Publisher: Elsevier.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Computer vision–ECCV 2014: 13th European conference, Zurich, Switzerland, September 6-12, 2014, proceedings, Part V 13 (pp. 740–755). Springer.
Liu, B., Adeli, E., Cao, Z., Lee, K.-H., Shenoi, A., Gaidon, A., & Niebles, J. C. (2020). Spatiotemporal relationship reasoning for pedestrian intent prediction. IEEE Robotics and Automation Letters, 5(2), 3485–3492, Publisher: IEEE.
Liu, J., Wang, H., Peng, L., Cao, Z., Yang, D., & Li, J. (2022). PNNUAD: Perception neural networks uncertainty aware decision-making for autonomous vehicle. IEEE Transactions on Intelligent Transportation Systems, 23(12), 24355–24368, Publisher: IEEE.
Luan, Z., Huang, Y., Zhao, W., Zou, S., & Xu, C. (2022). A comprehensive lateral motion prediction method of surrounding vehicles integrating driver intention prediction and vehicle behavior recognition. Proceedings of the Institution of Mechanical Engineers, Part D (Journal of Automobile Engineering), Article 09544070221078636, Publisher: SAGE Publications Sage UK: London, England.
Ma, J., & Rong, W. (2022). Pedestrian crossing intention prediction method based on multi-feature fusion. World Electric Vehicle Journal, 13(8), 158, Publisher: MDPI.
Ma, Y., Zhu, X., Zhang, S., Yang, R., Wang, W., & Manocha, D. (2019). TrafficPredict: Trajectory prediction for heterogeneous traffic-agents. In Proceedings of the AAAI conference on artificial intelligence, vol. 33 (pp. 6120–6127). Issue: 01.
Mangalam, K., Girase, H., Agarwal, S., Lee, K.-H., Adeli, E., Malik, J., & Gaidon, A. (2020). It is not the journey but the destination: Endpoint conditioned trajectory prediction. In European conference on computer vision (pp. 759–776). Springer.
Manh, H., & Alaghband, G. (2018). Scene-LSTM: A model for human trajectory prediction. arXiv preprint arXiv:1808.04018.
Messaoud, K., Yahiaoui, I., Verroust-Blondet, A., & Nashashibi, F. (2019). Non-local social pooling for vehicle trajectory prediction. In 2019 IEEE intelligent vehicles symposium (pp. 975–980). IEEE.
Minguez, R. Q., Alonso, I. P., Fernandez-Llorca, D., & Sotelo, M. A. (2018). Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition. IEEE Transactions on Intelligent Transportation Systems, 20(5), 1803–1814, Publisher: IEEE.
Mo, X., Huang, Z., Xing, Y., & Lv, C. (2022). Multi-agent trajectory prediction with heterogeneous edge-enhanced graph attention network. IEEE Transactions on Intelligent Transportation Systems, Publisher: IEEE.
Mohamed, A., Qian, K., Elhoseiny, M., & Claudel, C. (2020). Social-STGCNN: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14424–14432).
Mozaffari, S., Al-Jarrah, O. Y., Dianati, M., Jennings, P., & Mouzakitis, A. (2020). Deep learning-based vehicle behavior prediction for autonomous driving applications: A review. IEEE Transactions on Intelligent Transportation Systems, Publisher: IEEE.
Naik, A. Y., Bighashdel, A., Jancura, P., & Dubbelman, G. (2022). Scene spatio-temporal graph convolutional network for pedestrian intention estimation. In 2022 IEEE intelligent vehicles symposium (pp. 874–881). IEEE.
Neogi, S., Hoy, M., Chaoqun, W., & Dauwels, J. (2017). Context based pedestrian intention prediction using factored latent dynamic conditional random fields. In 2017 IEEE symposium series on computational intelligence (pp. 1–8). IEEE.
Park, S. H., Kim, B., Kang, C. M., Chung, C. C., & Choi, J. W. (2018). Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture. In 2018 IEEE intelligent vehicles symposium (pp. 1672–1678). IEEE.
Pendleton, S. D., Andersen, H., Du, X., Shen, X., Meghjani, M., Eng, Y. H., Rus, D., & Ang, M. H. (2017). Perception, planning, control, and coordination for autonomous vehicles. Machines, 5(1), 6, Publisher: Multidisciplinary Digital Publishing Institute.
Petrović, D., Mijailović, R., & Pešić, D. (2020). Traffic accidents with autonomous vehicles: Type of collisions, manoeuvres and errors of conventional vehicles' drivers. Transportation Research Procedia, 45, 161–168, Publisher: Elsevier.
Piccoli, F., Balakrishnan, R., Perez, M. J., Sachdeo, M., Nunez, C., Tang, M., Andreasson, K., Bjurek, K., Raj, R. D., & Davidsson, E. (2020). FUSSI-Net: Fusion of spatio-temporal skeletons for intention prediction network. In 2020 54th asilomar conference on signals, systems, and computers (pp. 68–72). IEEE.
Quan, R., Zhu, L., Wu, Y., & Yang, Y. (2021). Holistic LSTM for pedestrian trajectory prediction. IEEE Transactions on Image Processing, 30, 3229–3239, Publisher: IEEE.
Ragesh, N. K., & Rajesh, R. (2019). Pedestrian detection in automotive safety: Understanding state-of-the-art. IEEE Access, 7, 47864–47890, Publisher: IEEE.
Raimundo, V., & Favio, M. (2021). Driver intention prediction at roundabouts. In 2021 XIX workshop on information processing and control (pp. 1–5). IEEE.
Rasouli, A., Kotseruba, I., Kunic, T., & Tsotsos, J. K. (2019). PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6262–6271).
Rasouli, A., Kotseruba, I., & Tsotsos, J. K. (2020). Pedestrian action anticipation using contextual feature fusion in stacked RNNs. arXiv preprint arXiv:2005.06582.
Razali, H., Mordan, T., & Alahi, A. (2021). Pedestrian intention prediction: A convolutional bottom-up multi-task approach. Transportation Research Part C: Emerging Technologies, 130, Article 103259, Publisher: Elsevier.
Rehder, E., Wirth, F., Lauer, M., & Stiller, C. (2018). Pedestrian prediction by planning using deep neural networks. In 2018 IEEE international conference on robotics and automation (pp. 1–5). IEEE.
Ridel, D., Rehder, E., Lauer, M., Stiller, C., & Wolf, D. (2018). A literature review on the prediction of pedestrian behavior in urban scenarios. In 2018 21st international conference on intelligent transportation systems (pp. 3105–3112). IEEE.
Rudenko, A., Palmieri, L., Herman, M., Kitani, K. M., Gavrila, D. M., & Arras, K. O. (2020). Human motion trajectory prediction: A survey. International Journal of Robotics Research, 39(8), 895–935, Publisher: Sage Publications Sage UK: London, England.
Ruijters, E., & Stoelinga, M. (2015). Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review, 15, 29–62, Publisher: Elsevier.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., & Bernstein, M. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211–252, Publisher: Springer.
Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Rezatofighi, H., & Savarese, S. (2019). SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1349–1358).
Schneider, N., & Gavrila, D. M. (2013). Pedestrian path prediction with recursive Bayesian filters: A comparative study. In German conference on pattern recognition (pp. 174–183). Springer.
Schwall, M., Daniel, T., Victor, T., Favaro, F., & Hohnhold, H. (2020). Waymo public road safety performance data. arXiv preprint arXiv:2011.00038.
Sharma, N., Dhiman, C., & Indu, S. (2022). Pedestrian intention prediction for autonomous vehicles: A comprehensive survey. Neurocomputing, Publisher: Elsevier.
Shirazi, M. S., & Morris, B. T. (2016). Looking at intersections: A survey of intersection monitoring, behavior and safety analysis of recent studies. IEEE Transactions on Intelligent Transportation Systems, 18(1), 4–24, Publisher: IEEE.
Shobha, B. S., & Deepu, R. (2018). A review on video based vehicle detection, recognition and tracking. In 2018 3rd international conference on computational systems and information technology for sustainable solutions (pp. 183–186). IEEE.
Siegwart, R., Nourbakhsh, I. R., & Scaramuzza, D. (2011). Introduction to autonomous mobile robots. MIT Press.
SIMulation, G. (2007). US highway 101 dataset.
Sivaraman, S., & Trivedi, M. M. (2013). Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Transactions on Intelligent Transportation Systems, 14(4), 1773–1795, Publisher: IEEE.
Su, S., Muelling, K., Dolan, J., Palanisamy, P., & Mudalige, P. (2018). Learning vehicle surrounding-aware lane-changing behavior from observed trajectories. In 2018 IEEE intelligent vehicles symposium (pp. 1412–1417). IEEE.
Sun, J., Jiang, Q., & Lu, C. (2020). Recursive social behavior graph for trajectory prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 660–669).
Vemula, A., Muelling, K., & Oh, J. (2018). Social attention: Modeling attention in human crowds. In 2018 IEEE international conference on robotics and automation (pp. 4601–4607). IEEE.
Vitas, D., Tomic, M., & Burul, M. (2020). Traffic light detection in autonomous driving systems. IEEE Consumer Electronics Magazine, 9(4), 90–96, Publisher: IEEE.
Wang, C., Wang, Y., Xu, M., & Crandall, D. J. (2022). Stepwise goal-driven networks for trajectory prediction. IEEE Robotics and Automation Letters, 7(2), 2716–2723, Publisher: IEEE.
Waymo, W. (2020). Waymo safety report. Waymo, URL: https://waymo.com/safety/.
WHO, W. H. O. (2018). Global status report on road safety 2018: Summary: Technical report, World Health Organization.
Xin, L., Wang, P., Chan, C.-Y., Chen, J., Li, S. E., & Cheng, B. (2018). Intention-aware long horizon trajectory prediction of surrounding vehicles using dual LSTM networks. In 2018 21st international conference on intelligent transportation systems (pp. 1441–1446). IEEE.
Xing, L., & Amari, S. V. (2008). Fault tree analysis. In Handbook of performability engineering (pp. 595–620). Publisher: Springer.
Xing, Y., Lv, C., Huaji, W., Wang, H., & Cao, D. (2017). Recognizing driver braking intention with vehicle data using unsupervised learning methods: Technical report, SAE Technical Paper.
Xu, Y., Piao, Z., & Gao, S. (2018). Encoding crowd interaction with deep neural network for pedestrian trajectory prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5275–5284).
Xue, H., Huynh, D. Q., & Reynolds, M. (2018). SS-LSTM: A hierarchical LSTM model for pedestrian trajectory prediction. In 2018 IEEE winter conference on applications of computer vision (pp. 1186–1194). IEEE.
Xue, H., Huynh, D. Q., & Reynolds, M. (2020). A location-velocity-temporal attention LSTM model for pedestrian trajectory prediction. IEEE Access, 8, 44576–44589, Publisher: IEEE.
Yang, J., Sun, X., Wang, R. G., & Xue, L. X. (2022). PTPGC: Pedestrian trajectory prediction by graph attention network with ConvLSTM. Robotics and Autonomous Systems, 148, Article 103931, Publisher: Elsevier.
Yang, B., Zhan, W., Wang, P., Chan, C., Cai, Y., & Wang, N. (2021). Crossing or not? Context-based recognition of pedestrian crossing intention in the urban environment. IEEE Transactions on Intelligent Transportation Systems, 23(6), 5338–5349, Publisher: IEEE.
Yang, D., Zhang, H., Yurtsever, E., Redmill, K. A., & Ozguner, U. (2022). Predicting pedestrian crossing intention with feature fusion and spatio-temporal attention. IEEE Transactions on Intelligent Vehicles, 7(2), 221–230, Publisher: IEEE.
Yao, Y., Atkins, E., Johnson-Roberson, M., Vasudevan, R., & Du, X. (2021a). BiTraP: Bi-directional pedestrian trajectory prediction with multi-modal goal estimation. IEEE Robotics and Automation Letters, 6(2), 1463–1470, Publisher: IEEE.
Yao, Y., Atkins, E., Roberson, M. J., Vasudevan, R., & Du, X. (2021b). Coupling intent and action for pedestrian crossing behavior prediction. arXiv preprint arXiv:2105.04133.
Yoon, S., & Kum, D. (2016). The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles. In 2016 IEEE intelligent vehicles symposium (pp. 1307–1312). IEEE.
Zeng, Z. (2022). High efficiency pedestrian crossing prediction. arXiv preprint arXiv:2204.01862.
Zhang, J. (2021). Deep understanding Tesla FSD Part 1: HydraNet. Medium, URL: https://saneryee-studio.medium.com/deep-understanding-tesla-fsd-part-1-hydranet-1b46106d57.
Zhang, S., Abdel-Aty, M., Wu, Y., & Zheng, O. (2021). Pedestrian crossing intention prediction at red-light using pose estimation. IEEE Transactions on Intelligent Transportation Systems, 23(3), 2331–2339, Publisher: IEEE.
Zhang, X., Angeloudis, P., & Demiris, Y. (2022). ST CrossingPose: A spatial-temporal graph convolutional network for skeleton-based pedestrian crossing intention prediction. IEEE Transactions on Intelligent Transportation Systems, Publisher: IEEE.
Zhang, X., Cheng, L., Li, B., & Hu, H.-M. (2018). Too far to see? Not really!—Pedestrian detection with scale-aware localization policy. IEEE Transactions on Image Processing, 27(8), 3703–3715, Publisher: IEEE.
Zhang, H., & Fu, R. (2020). A hybrid approach for turning intention prediction based on time series forecasting and deep learning. Sensors, 20(17), 4887, Publisher: Multidisciplinary Digital Publishing Institute.
Zhang, P., Ouyang, W., Zhang, P., Xue, J., & Zheng, N. (2019). SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12085–12094).
Zhang, T., Song, W., Fu, M., Yang, Y., & Wang, M. (2021). Vehicle motion prediction at intersections based on the turning intention and prior trajectories model. IEEE/CAA Journal of Automatica Sinica, 8(10), 1657–1666, Publisher: IEEE.
Zhang, K., Zhao, L., Dong, C., Wu, L., & Zheng, L. (2022). AI-TP: Attention-based interaction-aware trajectory prediction for autonomous driving. IEEE Transactions on Intelligent Vehicles, Publisher: IEEE.
Zhao, T., Xu, Y., Monfort, M., Choi, W., Baker, C., Zhao, Y., Wang, Y., & Wu, Y. N. (2019). Multi-agent tensor fusion for contextual trajectory prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12126–12134).
Zhou, W., Berrio, J. S., De Alvis, C., Shan, M., Worrall, S., Ward, J., & Nebot, E. (2020). Developing and testing robust autonomy: The University of Sydney campus data set. IEEE Intelligent Transportation Systems Magazine, 12(4), 23–40, Publisher: IEEE.
Zhu, Y., Qian, D., Ren, D., & Xia, H. (2019). StarNet: Pedestrian trajectory prediction using deep neural network in star topology. In 2019 IEEE/RSJ international conference on intelligent robots and systems (pp. 8075–8080). IEEE.
Zyner, A., Worrall, S., & Nebot, E. M. (2019). ACFR five roundabouts dataset: Naturalistic driving at unsignalized intersections. IEEE Intelligent Transportation Systems Magazine, 11(4), 8–18, Publisher: IEEE.