0% found this document useful (0 votes)
16 views14 pages

Detecting Darting Out Pedestrians With Occlusion Aware Sensor Fusion of Radar and Stereo Camera

detection

Uploaded by

mehmet alp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views14 pages

Detecting Darting Out Pedestrians With Occlusion Aware Sensor Fusion of Radar and Stereo Camera

detection

Uploaded by

mehmet alp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO.

2, FEBRUARY 2023 1459

Detecting Darting Out Pedestrians With Occlusion


Aware Sensor Fusion of Radar and Stereo Camera
Andras Palffy , Member, IEEE, Julian F. P. Kooij , Member, IEEE, and Dariu M. Gavrila , Member, IEEE

Abstract—Early and accurate detection of crossing pedestrians is


crucial in automated driving in order to perform timely emergency
manoeuvres. However, this is a difficult task in urban scenarios
where pedestrians are often occluded (not visible) behind objects,
e.g., other parked vehicles. We propose an occlusion aware fusion
of stereo camera and radar sensors to address scenarios with
crossing pedestrians behind such parked vehicles. Our proposed
method adapts both the expected rate and properties of detections
in different areas according to the visibility of the sensors. In our
experiments on a real-world dataset, we show that the proposed
occlusion aware fusion of radar and stereo camera detects the
crossing pedestrians on average 0.26 seconds earlier than using
the camera alone, and 0.15 seconds earlier than fusing the sensors
without occlusion information. Our dataset containing 501 relevant
recordings of pedestrians behind vehicles will be publicly available
on our website for non-commercial, scientific use.
Index Terms—Advanced driver assistance systems, millimeter
wave radar, object detection, radar detection.

I. INTRODUCTION
BOUT 23% of the 1.35 million traffic fatalities world-wide
A involve pedestrians [1]. Automated driving has the poten-
tial to significantly reduce these traffic deaths, yet the sensor-
Fig. 1. Darting out scenario: a pedestrian steps out from behind a parked car
(blue) which blocks the line-of-sight of the ego-vehicle (white). We propose
based detection and tracking of pedestrians from a moving to detect such pedestrians with the fusion of stereo camera and radar in an
occlusion aware way, i.e., first building an occlusion model of the environment
vehicle remains challenging. Pedestrians have a wide variation and then expecting fewer and different detections (e.g. shorter visible parts of
in appearance, can quickly alter their course, and can step onto the pedestrian) from the occluded regions (O) than from the visible, unoccluded
the road at pretty much any location. ones (V ).
Intelligent vehicles can use multiple sensors to cope with this
task: cameras [2], [3], [4], radars [5], [6], [7] and LiDARs [8],
Pedestrian sensing is often complicated in urban scenarios by
[9]. Fusing different sensors, e.g., camera with radar [10]
occlusions, such as by parked vehicles. A substantial 26% of
or camera with LiDAR [11], can increase the reliability and
the accidents with crossing pedestrian analyzed in [12] involved
redundancy of such systems. In this paper, we consider the
some form of visual occlusion. In fact, this case is so important
fusion of a (stereo) camera with a radar. These are low-cost
that the consumer advocacy group Euro NCAP designates a spe-
sensors with complementary strengths that are well established
cial test scenario for it, titled “Running Child from Nearside from
in driver assistance context on the market. Cameras provide
Obstruction” [13]. This case of a pedestrian darting out [14] is
color/texture information at a fine horizontal and vertical
illustrated in Fig. 1. It is particularly dangerous because neither
resolution. Radar sensors provide accurate depth information,
a human driver nor the pedestrian have initially a clear, direct
can directly measure the radial velocities and are more robust
view of the other. Similarly, in an automated driving setting, a
to adverse weather and lighting conditions.
parked vehicle would block direct line-of-sight from the sensors
of the ego-vehicle to the pedestrian. However, the extent of this
Manuscript received 2 August 2022; revised 9 October 2022; accepted 25
October 2022. Date of publication 8 November 2022; date of current version blockage depends on the sensor’s type and on the size and shape
20 March 2023. This work was supported by Dutch Science Foundation NWO- of the occlusion.
TTW through SafeVRU Project under Grant 14667. (Corresponding author: A camera may see the upper body of a pedestrian behind a
Dariu M. Gavrila.)
The authors are with the Intelligent Vehicles Group, TU Delft, 2628 CD Delft, passenger car, while a person behind a larger vehicle, such as
The Netherlands (e-mail: [email protected]). a truck or a van may be invisible to the sensor. On the other
This article has supplementary material provided by the authors and color hand, commercially available 2+1D radars, which provide two
versions of one or more figures available at https://doi.org/10.1109/TIV.
2022.3220435. spatial dimensions (range and azimuth) and one dimension for
Digital Object Identifier 10.1109/TIV.2022.3220435 Doppler (radial velocity), are often able to detect the reflections
2379-8858 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1460 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

of a pedestrian even in complete occlusion due to multipath and fusing their results to detect partially occluded pedestri-
propagation [15], [16]. That is, the reflected radar signal may ans [19], [20], [21], [22]. More recently, researchers proposed
“bounce” off other parked cars or the ground beneath the oc- special loss functions [23], [24] or top-down approaches [25] to
cluding vehicle and reach the ego-vehicle’s sensor even if there jointly estimate the state of close-by pedestrians occluding each
is no direct line-of-sight. Such indirect reflections are weaker other, introduced hard negative mining to increase the occlusion
and occur less frequently than direct ones [15], but they could tolerance of networks [26], or proposed to explicitly collect more
still provide valuable information about a potentially darting out training data of partially occluded pedestrians [4] to address the
pedestrian. problem. However, none of these methods used a global scene
Since both camera and radar sensors are affected by occlu- model to describe the occlusions that may affect the number and
sions, their fusion preferably requires an occlusion model that attributes of detections.
describes how many detections to expect from each sensor in
the differently occluded areas of the scene (e.g., to expect fewer B. Radar Based Approaches
detections behind cars). In addition, the occlusion model could
Radars have been used to detect road users in a variety of
also provide information about the expected properties of such
ways, including clustering algorithms [27], [28], convolutional
detections, e.g., that the visible part of a partially occluded
neural networks [6], [29] or point cloud processing neural net-
pedestrian may be smaller than an unoccluded one. The stereo
works [30], [31]. A radar based multi-class classification system
camera is a suitable sensor to create this occlusion model because
(addressing pedestrians and pedestrian groups) was presented
it provides rich and dense textural and depth information that can
in [27]. [5] and [32] both aim to distinguish pedestrians from
help accurately detect and model the occluding vehicle itself.
vehicles based on features such as size and velocity profiles
In this paper, we present a Bayesian occlusion aware sensor
of the objects using radar. Some methods also used radars
fusion system designed to detect darting out pedestrians. We
to detect pedestrians in darting out or similar situations. [33]
show that incorporating an occlusion model into such a sensor
presented a tracking method using track-before-detection and
fusion system helps to detect darting out pedestrians earlier;
particle filtering. The system was also tested in scenes of the
thus precious time is gained to initiate emergency braking or
pedestrian entering and exiting an occluded region behind a
steering, if needed. While we consider the fusion of (stereo)
car. The radar was able to provide measurements even in the
camera and radar, the framework is suitable to integrate other
occlusion. However, the occlusion itself was not considered,
sensors, e.g., LiDAR.
and although they compared the performance to a camera based
The paper is structured as follows. In Section II we discuss
detection system, no fusion occurred. In [15], a binary classi-
previous work. Then, in Section III, we present our generic
fication system of pedestrians and static objects was presented
occlusion aware Bayesian multi-sensor fusion filter. Details of
that uses low-level radar data as input and extracts hand-crafted
how this filter was implemented with radar and stereo camera
features. The system was evaluated using darting out scenarios,
sensors, and applied to darting out scenarios are discussed in
but no sensor fusion was used, nor was occlusion investigated as
Section IV. Section V describes the dataset that was created and
a possible source of information. [16] exploited that radar signals
used for this work. In Section VI, we present our experiments
often “bounce” off large flat surfaces. They showed that it is pos-
and results, which are discussed in-depth in Section VII. Finally,
sible to detect moving road users outside the direct line-of-sight
Section VIII concludes the paper.
with reflected radar measurements by using building facades
or parked vehicles as relay walls. In [34], the authors explicitly
II. RELATED WORK
addressed the detection of fully occluded, darting out pedestrians
In this section, we first discuss camera-, radar-, and fusion with radar. They designed an experimental setup with a static
based methods for pedestrian detection, with a focus on darting radar sensor in an indoor area (behind a corner) and an outdoor
out scenarios and occluded pedestrians. Afterwards, we give an area (behind a van). Movement of the occluded pedestrian is
overview of some widely used methods for both object track- then classified by clustering into different behavior types, such
ing and for modeling the environment considering occlusions. as walking towards, walking out of it, and walking inside the
Finally, we review the available automotive datasets and their occluded region. None of these methods considered occlusion
usability for this research. as a source of information, and none of them compared or fused
camera and radar sensors to detect darting out pedestrians in
A. Camera Based Approaches realistic environments, i.e., from a moving ego-vehicle.
Cameras are often used for pedestrian detection as they
provide rich information while being relatively inexpensive. In C. Fusion Based Approaches
recent years, convolutional neural networks (CNNs) and deep Sensor fusion was extensively researched to provide more
learning methods [3], [4] dominate in this field. robust perception solutions either via model-based (mathemati-
The problem of occlusion is widely recognized, e.g., many cal, e.g., Kalman or particle filters, evidence modeling) [10],
benchmarks define separate metrics for different levels of occlu- [35], [36], or data-driven approaches (e.g., with neural net-
sion [4], [17]. For an overview of camera based methods that con- works) [37], [38]. In this subsection, we focus on fusion systems
sider occlusions, see [18]. Several approaches aimed to explicitly that use radar, with particular attention to whether and how
account for occlusions by learning a set of component detectors these systems address occluded pedestrians. A Kalman-filter

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1461

based pedestrian tracking system using camera and radar was bounding box. Using this depth, we back-projected each 2D
introduced in [10] for indoor, static applications. To deal with box to the 3D space to get a “2.5D” detection: a line segment
the frequent occlusion of the lower body, the authors trained in BEV with length corresponding to the width of the projected
their camera based detector to detect only the upper body of box. Areas behind these detections were considered occluded,
pedestrians, but they did not explicitly model occlusions. In [39], creating a binary map. While this solution resulted in fast pro-
LiDAR and radar were fused to detect pedestrians in a static ex- cessing time and contributed to earlier detection of darting out
perimental setup. First, a binary occlusion map of the scene was pedestrians in the experiments, it also had some drawbacks. By
created by detecting occluding objects with LiDAR. This map assigning a single distance to the entire occluding vehicle, parts
was then used to select which sensors to use for detection: both of the vehicle closer/farther than that distance are incorrectly
sensors for unoccluded regions, and purely radar for occluded considered “regular” unoccluded/occluded (but still walkable)
regions, exploiting its the multipath property. In [36], all three regions. However, a pedestrian cannot be physically present in
sensors were combined in a multi-class system for detecting either of these halves. Modeling occlusion with a bounding box
moving objects, including pedestrians, in an intelligent vehicle also has limitations in width and height, e.g., a pedestrian may
setup using an occupancy grid representation. The LiDAR was be more visible behind the shorter parts of a car than behind its
used as the main sensor to detect moving objects, while camera tallest point, but these two cases are treated identically.
and radar were mainly used for classification. The influence of An alternative camera based approach to creating a more
occlusions was not considered. None of the fused systems found accurate occlusion model that is still computationally efficient
were developed for use in intelligent vehicles to address darting may be to use stixels [49]. Stixels are rectangular column-wise
out scenarios, or considered occlusion as a source of information group of pixels based on disparity information with the goal
beyond helping sensor selection. of reducing the complexity of the stereo point cloud. Since
the original publication [49], researchers have integrated class
D. Tracking information [50] and later instance information into stixels [51].
The latter are referred to as Instance Stixels and could be a
Pedestrians are often tracked with Kalman Filters both in
well suited input for an occlusion model because they follow
camera based [40] and radar based [41] detection systems.
the shape of an occluding car (both in depth and width/height)
Kalman filters can only model linear motion. Situations with
and are still computationally efficient to compute and process.
possibly non-linear motion dynamics, e.g., a pedestrian who
In addition, the same Instance Stixels representation can also
may or may not stop at the road side, can be handled by using
serve as input to a pedestrian detection and tracking system by
an “extended” Kalman Filter, or by switching between multiple
providing the location and height of the pedestrian.
linear motion models with a switching dynamic system [40].
Another commonly used method for pedestrian tracking is the
particle filter [33], [42], [43], [44] which estimates the posterior F. Datasets
distribution over the state space using a set of weighted par- To study the detection of darting out pedestrians with the
ticles. Unlike Kalman Filters, a particle filter can handle non- fusion of camera and radar, a dataset is needed that 1) contains
linear motion dynamics, and can represent arbitrary, potentially measurements from both sensors and 2) contains hundreds of
multi-modal distributions. To satisfy our use case (detecting and the scenario under study. Several datasets have been published
tracking a pedestrian), a filter should not only track an object of to help the development and testing of autonomous vehicles,
interest (i.e., a pedestrian), but also report a probability that a e.g., the well-known KITTI [17] or the EuroCity dataset [4]. In
pedestrian is present in the scene. [42], [45] give solutions to recent years, the number of datasets containing radar data has
incorporate this existence probability into particle filters. increased with different goals such as ego-localization [52], [53],
object classification [54], or object detection [55]. At the time of
E. Environment Modeling writing, nuScenes [56], Zendar [57], Astyx [58], and View-of-
Modeling occluded areas in the environment is often done Delft [31] are the only publicly available automotive datasets
in bird’s-eye view (BEV). A common approach is to aggregate that include measurements from both a camera and a radar
range measurements from radar or LiDAR sensors into a 2D sensor (which provides Doppler data). However, a real-world
occupancy grid and then project “shadows” behind the extracted (i.e. not scripted or directed) dataset will always have relatively
objects [46], [47]. Creating an environment model with camera few darting out examples and thus, none of these datasets are
information can lead to a faster process (i.e., it does not need to suitable for our research.
be accumulated) and provides more information about the nature
of the occluding object (e.g., whether it is a car) due to the rich III. PROPOSED APPROACH
texture information. In [43], the goal was to explicitly model
only the occlusions caused by (parked) vehicles. To this end, 2D A. Overview and Contributions
detections in the image plane were fetched from the car, bus, The goal of this paper is to fuse radar and stereo camera (Fig.
truck, and van classes from the Single Shot Multibox Detector 2, blue and red dashed rectangles) by incorporating occlusion
(SSD) [48]. Depth (i.e., distance from the ego-vehicle) was information to detect darting out pedestrians. To this end, we
estimated by projecting the stereo point cloud into the camera propose a generic Bayesian filter to fuse these sensors in an
view and taking the median distance of the points inside each occlusion aware manner. This estimates not only the 2D position

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1462 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

Fig. 2. Overview of our pipeline. Sensor data is processed to get the pedestrian detections and the occlusion model (Preprocessing). Future states of the object in
the filter are predicted, and then their likelihood are updated with the detections considering the occlusion model (Filtering). The estimated existence probability
and state of the object are calculated, which information can be used in subsequent applications (Postprocessing). Blue/red dashed boxes mark camera/radar specific
steps that are described in Section IV.

and velocity of an object’s center on the ground plane (i.e. the size of the visible part of a pedestrian given its assumed
BEV), but also the probability that the object of interest (i.e., occlusion condition.
a pedestrian) is present in the scene. The used state space will The occlusion model can be retrieved from a single or from a
be discussed in details in Subsection III-B. combination of sensors, or from an independent source. In this
First, in a prediction step, we define the prior distribution of paper, it will be provided by the stereo camera, see Section IV.
the filter given previous measurements (Fig. 2, Predict step). The Finally, after the filtering, the object’s probability of existence
distribution of predicted positions and velocities is defined for and state (i.e., 2D BEV center location and velocity) can be
three cases: a new object entering the scene, an object leaving estimated and used in subsequent processing steps, e.g., in
the scene, and finally a tracked object remaining in the scene. predicting future positions or evaluating the dangerousness of
Please refer to Subsection III-C for details. the scene (Fig. 2, Postprocessing).
After the prediction step, the new detections are fetched from Our contributions are as follows.
each sensor (Fig. 2, Preprocessing) and incorporated in the 1) We propose a generic occlusion aware multi-sensor
update step (Fig. 2, Update step), which is discussed in details in Bayesian filter for object detection and tracking.
Subsection III-D. We assume conditional independence of the 2) We apply the proposed filter as a radar and stereo camera
sensors given the true state of the object, thus we can perform the based pedestrian detection and tracking system on chal-
update with their set of detections individually whenever they lenging darting out scenarios. We show that incorporating
arrive, even if the sensors operate asynchronously at different occlusion information and the radar sensor into our model
frame rates. We describe here the update in a generic way and helps detect darting out pedestrians earlier while keeping
define sensor specific details, e.g. measurement models, later in the number of false alarms low when the pedestrian stays
Section IV. behind the car.
When updating with any of the sensors, its K detections 3) We share our dataset1 containing more than 500 relevant
(as determined by its measurement model) are fed into the scenarios with camera, radar, LiDAR, and odometry data.
filter. This updates the likelihood of the hypotheses in two This work builds upon our previous conference publica-
ways. First, the likelihoods of measuring K detections with this tion [43], where we initially proposed an occlusion-aware
sensor are calculated. Since the number of detections depends Bayesian filter for darting out pedestrians based on stereo camera
on the position of the object, we can incorporate information and radar. This work features an improved sensor measurement
from an occlusion model here. That is, our system adjusts the model (incorporation of additional attributes besides location,
expected number of detections to the visibility of a position and see Subsections IV-B and IV-C). Among others, the occlusion
expects more/less detections at unoccluded/occluded locations, extent is now more accurately represented by a height profile
see Fig. 1. Second, we also consider the unique capabilities of derived from instance segmentation rather than by a bounding
the sensors. That is, we estimate the likelihood of the attribute of
the detection based on the estimated state of the object. Here we 1 The dataset will be made freely available at https://intelligent-vehicles.
could use, for example, the velocity measurement of a radar or org/datasets/ to academic and non-profit organizations for non-commercial,
the classification confidence of a camera. We can also evaluate scientific use.

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1463

box derived from an object detector (see Subsection II-E). In


terms of validation, this work features a significantly enlarged
dataset and added experimentation.

B. State Space and Notations


Now we discuss the mathematical formulation of our pro-
posed generic occlusion aware, multi-sensor Bayesian filter
without sensor related specifics. Let the space T consist of a
2D (lateral and longitudinal) position and velocity, and a binary
flag marking if the tracked object (e.g., a pedestrian) exists. Let
h be a state vector in T (vectors are written in boldface):
T : R × R × R × R × {0, 1}, (1) Fig. 3. Transition of states. Et = 0 denotes the lack of object, and Et = 1
denotes the presence of an object with the configuration of xt , vt at timestamp t.
h ∈ T , h = (x, v, E), (2)
where x = (x, y) and v = (vx , vy ) are the object’s 2D BEV po-
P (Et = 1|Et−1 = 1, ht−1 ) = ps (ht−1 ), (9)
sition and velocity vectors on the ground plane, and E represents
the existence probability. I.e., E = 1 means there is a pedestrian P (Et = 0|Et−1 = 1, ht−1 ) = 1 − ps (ht−1 ). (10)
in the scene and E = 0 represents its absence.
We define a Bayesian filter for detection and tracking which In case an object is present (Et = 1), the values of x and v are
estimates the posterior state distribution P (ht |Z1:t ) given all distributed as follows for entering and staying objects respec-
measurements Z1:t . The filter operates on-line, integrating new tively:
measurements into a posterior using Bayes’ theorem: P (xt , vt |Et = 1, Et−1 = 0, ht−1 ) = pe (xt , vt ), (11)
P (ht |Z1:t ) ∝ P (Zt |ht ) · P (ht |Z1:t−1 ), (3) P (xt , vt |Et = 1, Et−1 = 1, ht−1 ) = P (xt , vt |xt−1 , vt−1 ).
where Zt is the set of all sensor detections at current time t. For this last term, we use a constant velocity dynamic model
Here the prior distribution P (ht |Z1:t−1 ) for time t is obtained similar to [40], with a normally distributed acceleration noise
by applying a state transition probability on the previous poste- a ∼ N (0, Σa ):
rior, and integrating over the previous state ht−1 following the
Chapman-Kolmogorov equation: vt = vt−1 + aΔt, (12)

1
P (ht |Z1:t−1 ) = P (ht |ht−1 ) · P (ht−1 |Z1:t−1 )dht−1 . (4) xt = xt−1 + vt−1 Δt + aΔt2 . (13)
2
We are thus required to define the state transition distribution Through the introduction of the binary flag E, the full state
P (ht |ht−1 ) for the filter’s prediction step, and measurement transition can be regarded as a state machine, see Fig. 3.
likelihood function P (Zt |ht ) for the update step, which we
will derive in the following subsections. Note that the posterior D. Update Step
contains the expected existence probability of a pedestrian in the
Now we describe the likelihood P (Zt |ht ). We follow the
scene:
 common assumption of conditional independence for our sen-
P (Et |Z1:t ) = P (ht |Z1:t )dxt dvt . (5) sors, thus the single-sensor update step described here can be
applied independently to each. The sensor s returns K detections
at once: Z = {z 1 , . . . z K }. Each detection z k contains a 2D
C. Prediction Step BEV location and some additional attributes: z k = [zpos , zattr ].
The state transition distribution is factorized into two terms: To include occlusion awareness, our measurement model intro-
duces several auxiliary variables, with conditional dependencies
P (ht |ht−1 ) = P (Et |ht−1 ) · P (xt , vt |Et , ht−1 ). (6)
as shown in the graphical model of Fig. 4. These variables
The first term estimates the object presence flag E. A new object and their distributions will be introduced in the next para-
can appear with a probability of pn . Unlike pn , ps (ht−1 ), the graphs, where we first distinguish between the expected number
probability that an object stays in the scene depends on the of detections, which differentiates our occlusion aware from
previous state ht−1 , because the position of the object affects the naive approach, and then the likelihood term for a single
the probability that it will suddenly leave the region of interest. measurement z k .
Using these, we can determine the probability of E given the a) Detection rates: The total number of detections (K) is
previous state ht−1 for entering (new), not present and not the sum of foreground (K F ) and background (K B ) detections:
entering, staying, and leaving objects respectively: K = K F + K B . If we consider detections as conditionally in-
dependent events occurring during a fixed interval, it is natural to
P (Et = 1|Et−1 = 0, ht−1 ) = pn , (7)
model the number of foreground (true positive) and background
P (Et = 0|Et−1 = 0, ht−1 ) = 1 − pn , (8) (false positive) detections with two Poisson distributions. Let

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1464 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

  λB
P η k = 0|λB , λF = , (18)
λF + λB
where the binary flag η k denotes if the k th detection z k
comes from the tracked object, i.e., is a true positive de-
tection. Since every detection is conditioned on E and η k
latent
 variables, we have to define the likelihood function
P z k |E, x, v, η k for the following cases: (E = 1, η k = 1),
(E = 1, η k = 0), (E = 0, η k = 0), which stand for the true pos-
itive and for the false positive cases, with and without a present
pedestrian, respectively.
Fig. 4. Graphical model of the probabilistic dependencies in a single time slice Unlike [43], in which we only considered location of detec-
t with K detections Zt = {zt1 , . . . ztK }. ht is the state vector and λB , λF are
the expected detection rates. The binary flag ηtk denotes if the kth detection ztk
tions, here we compose this likelihood function with two parts:
comes from foreground or background. Discrete/real variables are shown with a spatial component (i.e., likelihood of detection’s location) and
square/circle nodes. Observed variables are shaded. an attribute component (likelihood of such a detection at that
location). We assume that true positive (foreground) detections
are spatially distributed around the object’s position x described
us denote the corresponding detection rates with λF (x, E) and by some distribution LF (zpos |x), and that false (background)
λB for the foreground and background detections respectively. detections are distributed as described by some distribution
The values of K B , K F follow Poisson distributions, K B ∼ LB (z k ). Similarly, we define the attribute likelihood functions,
P ois(λB ) and K F ∼ P ois(λF ), with scalar parameters λB AF (zattr |x, v) and AB (zattr ) for true and false detections, but
and λF . The total number of detections K is then also Poisson also conditioned on velocity. Then:
distributed:  
    P z k |E = 1, x, v, η k = 1 = LF (zpos |x) · AF (zattr |x, v),
P K|λB , λF = P ois λB + λF . (14)
 
P z k |η k = 0 = LB (zpos ) · AB (zattr ). (19)
We distinguish between our novel occlusion aware filter by the
way it determines the value of the foreground detection rate, as Finally, the complete likelihood of a single measurement is:
opposed to the naive approach. In the proposed Occlusion Aware    
Filter (OAF) approach, the number of foreground detections P z k |E, x, v, λB , λF = P z k |E, x, v, η k = 1
depends both on the object’s presence and location. A benefit of    
· P η k = 1|λB , λF + P z k |E, x, v, η k = 0
Poisson distributions is that we can incorporate the occlusion
 
information here with a spatially dependent rate parameter, · P η k = 0|λB , λF . (20)
i.e., more true detections are expected if the pedestrian is un-
occluded (i.e. visible) than if the pedestrian is occluded: Since all K detections are conditionally independent given x
 and E, and λB and λF are determined through position x and
λFunocc if x ∈ V, areas V plus O, the full measurement likelihood becomes:
λ =
F
(OAF) (15)
λFocc if x ∈ O,
  K
 
where λF P (Zt |ht ) = P K|λB , λF · P z k |ht , λB , λF . (21)
unocc , λocc indicate the expected detection rates in un-
F
k=1
occluded (V ), occluded (O) areas respectively, see Fig. 1. The
extent of these areas will be determined by the implementation-
IV. IMPLEMENTATION
specific environment occlusion model.
In contrast, a naive (i.e., not occlusion aware) filter assumes First, we describe how the Bayesian filter was implemented
that λF is constant, targeting the more typical unoccluded case: with a particle filter. Then, we discuss how the attribute likeli-
hood function was implemented for the two sensors. A summary
λF = λF
unocc . (naive approach) (16) of the model parameters is given in Table I.
Our occlusion aware filter behaves the same as a naive one in
unoccluded cases, but in occluded positions it adapts its expected A. Particle Filtering
rate λF . For inference, we use a particle filter to represent the posterior
b) Measurement likelihood: Derived from the properties of distribution in our model by a set of samples (i.e., particles). Un-
Poisson distributions, the number of false and true positive like, say, a multiple-model Kalman Filter, it is straight-forward
detections given K are distributed as Binomial distributions to include information about occlusion in the particle filter, i.e.
parametrized by the ratio of λB and λF . Thus, the probability of particles in occluded areas are treated differently than those in
a detection z k being foreground/background is (given K number unoccluded areas, and to represent uniform initial uncertainty
of detections): over the bounded occlusion region.
  λF Furthermore, such a system is easy to scale for the available
P η k = 1|λB , λF = , (17) hardware resources by changing the number of particles.
λF + λB

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1465

TABLE I Predicted variables are marked with ^ sign. First, we estimate


LIST OF MODEL PARAMETERS AND THEIR EXPERIMENTAL VALUE SETTINGS
the next weight of the negative particle as follows:
(0) wnp
P (Et = 0|Z1:t−1 ) = ŵt = , (26)
wnp + wp
where wp , wnp are the cumulative weights of present, and not
present predicted states using Eqs. (7)–(10):
Ns
(0)   (i)  (i)
wp = pn · wt−1 + ps h t wt−1 , (27)
i=1
Ns
(0)   (i)  (i)
wnp = (1 − pn ) · wt−1 + 1 − ps ht wt−1 . (28)
i=1

Afterwards, we sample Ns new positive particles, which are


either a mutation of an existing particle moved by the dynamic
model, or a completely new (entering) one, see Eq. (11). An
(i)
existing particle stays in the scene with probability ps (ht ), or
(i)
is replaced by a new one with probability of 1 − ps (ht ):
⎧  
To include the existence probability in the filter, we fol- ⎨hˆ (i) ∼ P h |h(i)
(i) t t t−1 if moved particle,
low [42]. Of N particles, the first one (index 0) will represent all ht−1 → (29)
hypotheses with non-present pedestrian, called the negative par- ⎩hˆ (i) ∼ p (h ) if new particle.
t e t
ticle. The remaining N − 1 = Ns particles (called the positive
ones) represent the cases of a present pedestrian: All weights of the predicted positive particles are then set
uniformly:
(0)
Et = 0 → wt , (22) (0)
  (i) 1 − ŵt
(i) (i)
ŵt = ∀ i = 1 . . . Ns . (30)
Et = 1 → ht , wt f or i = 1 . . . Ns . (23) Ns
3) Update Step: Particles are updated by new detections
(i) (i) (i) (i)
where ht = xt , vt , Et = 1 is the state of the ith particle, using the measurement likelihood Eq. (21):
(i) (i) 
wt is the weight assigned to it, and Et = 1 marks that these (i) 
wt ∝ ŵt · P Zt |hˆt .
(i) (i)
(31)
Ns particles represent hypotheses of a present pedestrian. Thus,
the estimated probability of a non-present/existing pedestrian Details of the attribute likelihood calculations are discussed later
given all detections is the normalized weight of the first parti- in Subsection IV-B and IV-C. After the update, all weights are
cle/summed weights of all the others, see Eq. (5): renormalized. To avoid sample degeneracy, we resample the
Ns
positive particles if the Effective Sample Size (ESS) drops below
(0) (i) a threshold [59].
P (Et = 0|Z1:t ) = wt , P (Et = 1|Z1:t ) = wt . (24)
i=1
B. Use of Stereo Camera Data
To obtain the estimated state of the pedestrian, we use the
weighted average of the particles along the hypothesis space: The camera sensor data is used for two purposes: 1) to update
our filter with camera based pedestrian detections and 2) to
 Ns
update our occlusion model, see Fig. 2, top. For both tasks,
(i) (i)
ht = xt , vt , E = wt · ht , (25)
we use the Instance Stixel representation [51]. Stixels [49] are
i=1
rectangular upright sticks in the 3D space, perpendicular to the
where xt = (xt , yt ) is the estimated position, vt = (vx,t , vy,t ) estimated ground plane. With the extension of [51], each stixel
is the estimated velocity vector of the pedestrian, and E is the has the following parameters: a 3D position of their bottom, a
estimate of the pedestrian being present, see Eq. (24). height, a class label (among others: car, bus, truck, person, sky)
1) Initialization: Particles’ positions are initialized uni- and an instance id. In this way, objects of interest (pedestrians
formly across the Region of Interest (ROI). Their velocity is and occluding vehicles) are represented by a loose set of stixels
drawn from normal distribution Wspeed ∼ N (p, Σw ) around connected by their class and instance information. Unlike the
slow walking pace p = 1 m/s and their orientation is drawn bounding box representation used in [43], these stixels better
from a uniform distribution Wdir between ±22.5°, where 0◦ describe the shape and extent of objects in both bird’s-eye and
is the orientation perpendicular to the movement of the ego- camera perspectives (e.g., varying visible height of cars) while
vehicle, pointing towards the road. keeping the processing load low. First we filter the stixels to keep
2) Prediction Step: The input of the prediction step are Ns only those from the relevant classes: pedestrian stixels as input
uniformly weighted particles representing the present pedes- for the particle filter and vehicle stixels (i.e. from car, truck, and
trian, and one particle representing the Et = 0 hypothesis. bus classes) to update the occlusion model.

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1466 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

1) Update of the Occlusion Model: The stixels of vehicles step, we filter the reflections based on their RCS and vr , i.e. we
that are close enough (i.e., at least one of their stixels is in ROI) remove targets with very weak reflections or too low velocities
are fitted with a bird’s-eye view 2D rectangle to model the po- to only keep ones that could potentially originate from a darting
sition and extent of the parked vehicles. The fitting is done with pedestrian. We also eliminate radar targets that are located in the
plausible minimum widths and lengths to avoid unrealistically rectangles fitted to the parked cars since a pedestrian cannot be
small car assumptions. We consider the projected region behind present there, but the high reflectivity of these cars could yield a
the farther end of these car models as occluded as shown on Figs. moving radar target in case of a faulty ego-motion compensation.
1 and 7. We also store the set of stixels for each car to calculate The remaining reflections are considered as detections for the
the height of the occlusion for later use, see below. filter: z = [zpos , zattr = zvel = vr ].
2) Update of the Filter: The pedestrian stixels are grouped The position zpos is then used in the spatial component and
by their instance id. Then, the average 2D BEV position of the modeled with a normal distribution analogous to the camera,
stixels and their largest height range in meters (i.e., the difference with standard deviation Σrx :
between the lowest and highest stixels ends) are computed to (i) (i)
create a pedestrian detection for the filter: z = [zpos , zattr = LF (zpos |xt ) = N (zpos |xt , Σrx ). (34)
zheight ]. The position zpos is then used in the spatial component, In addition to the spatial distribution, the radar also has an at-
which is modeled with a normal distribution, with standard (i) (i)
tribute likelihood component AF (zattr |xt , vt ). We consider
deviation Σcx : the likelihood of observing a radar reflection with the measured
(i) (i)
LF (zpos |xt ) = N (zpos |xt , Σcx ). (32) radial velocity zvel given the location and velocity of each
(i)
particles. Let us define los = xt − xradar as the line-of-sight
The height zheight is used to calculate the attribute likelihood
(i) (i)
vector pointing from the radar to the particle. We calculate
AF (zattr |xt , vt ). We consider the likelihood of observing (i) (i)
the expected radial velocity vr,t as the particle’s velocity vt ’s
(i)
a pedestrian with visible zheight at the location xt of each projection to this vector (i.e., its radial component):
particle, given the current occlusion model. First, we compute
(i) (i)
the expected observable height ht for each occluded particle (i) los · vt
vr,t = . (35)
by looking up the car stixel with the most similar angle to it, see los
Fig. 1. Then, the height of this stixel is scaled by the distance (i) (i)
of the particle to get how tall objects would be occluded by Finally, we model both AF (zattr |xt , vt ) and AB (zattr ) as
the stixel/parked car at the particle’s location. Afterwards, the zero mean normal distributions with standard deviations ΣF rv
(i)
expected observable height ht is the difference between the and ΣBrv :
occluded height and the expected height of a pedestrian mheight . dvel = zvel − vr,t ,
(i)
For example, behind a tall van we expect to see no part of a
(i) (i) (i)  
pedestrian (ht = 0), while at an unoccluded location the full AF (zattr |xt , vt ) = N dvel |0, ΣF rv ,
(i)  
height of the pedestrian should be visible ht = mheight .
(i) (i) AB (zattr ) = N dvel |0, ΣB rv . (36)
Finally, we model both AF (zattr |xt , vt ) and AB (zattr ) as
zero mean normal distributions with standard deviations ΣF ch V. DATASET
and ΣB ch :
Our dataset was captured by our prototype vehicle [60] in
(i)
dheight = zheight − ht , Delft, the Netherlands. We recorded the output of a Continental
    400 radar mounted behind the front bumper (2+1D: range,
(i) (i)
AF zattr |xt , vt = N dheight |0, ΣF
ch , azimuth, Doppler, ∼13 Hz, ∼100 m range, ∼120° field of view),
  an IDS stereo camera (1936 × 1216 px, ∼10 Hz, 35 cm baseline)
AB (zattr ) = N dheight |0, ΣB
ch . (33)
mounted on the windshield, a Velodyne HDL-64 LiDAR (64
layers, ∼10 Hz) scanner on the roof, and the ego-vehicle’s
C. Use of Radar Data odometry (Spatial Dual GNSS/INS/AHRS sensor and wheel
Radar data is solely used as an input to our pedestrian detec- odometry fused via an Unscented Kalman Filter, ∼30 Hz). All
tion filter. For an overview of radar specific steps, see Fig. 2, sensors were jointly calibrated following [61]. While the LiDAR
bottom. Our equipped radar outputs a sparse point cloud of re- data is not used in this paper, it will be made available for
flections called radar targets. Each point has two spatial dimen- future work.
sions, range r and azimuth α, and a third dimension referred to The dataset contains 501 recordings, each with a length
as Doppler, which is the radial velocity vrel of the target relative between 8–20 seconds. In each recording, the ego-vehicle ap-
to the ego-vehicle. First, we perform ego-motion compensation proaches or passes (at least) one parked vehicle with a pedestrian
for vrel . That is, by eliminating the motion of the sensor that behind it. All recordings were performed in a real environment,
comes from both the translational and rotational movement with driving speeds suitable for the environment (mean: 4.0 m/s,
of the ego-vehicle we get the compensated radial velocity, a std.: 0.57 m/s). The pedestrian either steps out from behind the
signed scalar value denoted by vr , describing the ego-motion parked vehicle (“darting” or “walking” sequences) or remains
compensated (i.e., absolute) radial velocity of the point. In a next there (“staying” sequences). Participants were instructed which

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1467

TABLE II
OVERVIEW OF THE COMPARED METHODS WITH WHETHER THEY USE RADAR,
TYPE OF CAMERA BASED METHOD (IS: INSTANCE STIXELS, SSD: SINGLE
SHOT DETECTOR), WHETHER THEY ARE OCCLUSION AWARE, AND WHETHER
THEY IMPLEMENT THE ATTRIBUTE LIKELIHOOD COMPONENTS

Fig. 5. Examples of darting out pedestrians from our dataset. information help to detect darting out pedestrians. For this pur-
pose, we compare the following methods: naive camera, naive
fusion, OAF camera, and OAF fusion, where “naive”/“OAF”
stands for naive/occlusion aware filtering. The naive camera and
naive fusion are methods that use only the camera/both sensors
to update the filter in a naive way, see Eq. (16). Similarly, OAF
camera and OAF fusion use only camera/both sensors to update,
but in an occlusion aware way, i.e., they are “occlusion aware
filters”, see Eq. (15). All four methods above use Instance Stixels
(IS) as camera based pedestrian detections to update the filter,
while OAF camera and OAF fusion also use Instance Stixels to
model the occlusions. To study the benefits of the improvements
introduced in this paper, we compare the methods above with
the fusion method from our previous publication [43]: OAFSSD
fusion. This is also an occlusion aware filter fusing both sensors,
similar to OAF fusion, but it uses the output of the Single Shot
Detector (SSD) instead of Instance Stixels (IS) as camera based
Fig. 6. Example of annotated frames on a walking out sequence. We marked
the first frames where (a) the pedestrian’s head, (b) the body center, (c) one of method. Further, in contrast to the other methods, OAFSSD fusion
the feet, and (d), full body is visible. does not use the attribute likelihood components introduced in
Subsection IV-B and IV-C, only the spatial component. Note
that unlike IS, SSD provides detections as bounding boxes, not
action to perform next, but were free to choose their walking
involving the height profile of the cars. Hence, the height related
speed during darting, or their activity (imitating e.g. phone call,
attribute likelihood component would not be possible to calcu-
bagging groceries, slight movement) during staying recordings.
late with SSD. An overview of the compared methods is given in
See Fig. 5 for examples of darting out pedestrians. Fifteen
Table II. Both the “Filtering” and the “Postprocessing” module
subjects with different heights participated in the experiment
from Fig. 2 (including the presented application example) run
(mean: 178 cm, standard deviation: 8.5 cm). In total, more than
at a processing speed of over 500 Hz for all methods with 1000
100 different parked vehicles were used as occlusion, ranging
particles in an optimized Python based implementation using the
from passenger cars (partial occlusion) to vans (full occlusion).
Robot Operating System (ROS) on a high-end system PC (64 GB
The resulting dataset contains 249 walking and staying 252
RAM, TITAN X (Pascal) GPU, Intel Xeon CPU E5-1560 CPU).
sequences. For each sequence, we manually annotated its type
This brings a negligible overhead compared to the camera based
(darting or staying), the pedestrian’s height, the occluding vehi-
detection modules (off-the-shelf implementation of SSD and IS,
cle’s type (car or van) and some environment conditions (e.g.,
including the occlusion model) running around 14 Hz, and the
harsh lighting, leaves on the ground, etc.). We have also marked
radar related preprocessing steps running at over 200 Hz.
the first timestamps where a) the head, b) the body center, c) one
Our framework has a set of parameters and distributions that
of the feet, and d) the entire body of the pedestrian is visible,
should be tuned to the characteristics (type, accuracy, noise, etc.)
see Fig. 6. This allows a temporal alignment of the sequences
of the user’s sensors. A brief overview of these can be found
and a better understanding of the visual occlusion in the case of
in Table I. In this research, the parameters were empirically
different occluding vehicles.
tuned on the distinct dataset used in [43] and during in-vehicle
experiments, and visually validated on the first few sequences
VI. EXPERIMENTS
of the new dataset. ROI was defined as a 4.5 m wide, 14 m
In our experiments, we investigate how the fusion of stereo long rectangle in front of the ego-vehicle. For the camera, we
camera and radar sensors, and the incorporation of occlusion use λF unocc = 1 because detection is reliable in this range in

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1468 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

Fig. 7. Camera views (top left), stixel images (bottom left) and top views (right) of the scene at consecutive timestamps using OAF fusion. Vehicle/pedestrian
stixels are shown with green/white colors (bottom left). Vehicle stixels are also shown as short green lines on the top view, representing the outlines of the detected
parked cars. Occlusion (greyish areas) is calculated as the “shadow” of these cars. Initially, the particles (blue to red, for small to high relative weights) have higher
density and weights in occluded regions (a) and converge on the pedestrian’s position after being detected first by the radar (magenta star, (b)) and then by the
camera sensor (yellow ‘x’, (c)). All views are cropped for easier visibility. Orange arrows connect corresponding objects in different views.

unoccluded regions. λF occ is set to 0.1 for occluded locations. Few


false positives occur in the ROI, so λB is set to 0.05. For radar,
we set λF unocc = 1.5 for unoccluded positions, since multiple
reflections are often received from the same pedestrian. In oc-
unocc = 0.3 as we still expect some reflections
clusions, we set λF
due to the multipath propagation. An average rate of λB = 0.1
is expected for radar, as false positives occur more often than
with camera due to e.g. incorrect ego-motion compensation.

A. Estimated Existence Probability in Dangerous Situations


In our first experiment, we ran the methods on all walking
sequences and recorded the reported existence probabilities as
in Eq. (5). The sequences were temporally aligned by marking
the first moment when the pedestrian’s body center was visible as Fig. 8. Estimated probabilities of a pedestrian being present, averaged over all
t = 0, see Fig. 6. Then, for each timestamp, and for each method, walking sequences, with standard deviation around the mean for fusion methods.
we calculate the mean estimated probability by averaging over t = 0 is the first moment when the pedestrian’s body center was visible. The
addition of radar results in earlier detection than using the camera alone.
all walking sequences as in [43], see Fig. 8. In general, the
inclusion of radar helps to detect the pedestrian earlier. I.e.,
any chosen threshold of probability is reached earlier by the
three fusion methods (naive fusion/OAF fusion/OAFSSD fusion) threshold later than OAF fusion by 0.06 seconds, but still earlier
using both sensors, than by the methods using only the camera. than naive fusion.
For example, on average, the threshold P (Et = 1|Z1:t ) = 0.5 We also examine the sequences individually, and calculate
is reached 0.26 seconds earlier by OAF fusion than by naive the time difference between the reported probabilities of naive
camera. When examining only smaller occluding vehicles (i.e., camera and OAF fusion to be over 0.5. A histogram of the gained
cars), this time gain increases to 0.30 s. In contrast, for sequences reaction times can be found in Fig. 9. In the large majority
with a van as an occlusion, the measured gain is only 0.12 s. (∼ 68%) of dangerous scenarios OAF fusion gains some addi-
The previously discussed threshold of P (Et = 1|Z1:t ) = 0.5 tional reaction time over naive camera.
is reached 0.15 seconds earlier by our proposed occlusion aware In Fig. 7, we show an example of a walking scene to demon-
fusion OAF fusion than by the naive method naive fusion. OAF strate how OAF fusion behaves when there has been no prior
fusion also reports higher probabilities at all times when the detection, and then when first the radar and then the camera has
pedestrian is occluded (t < 0). OAFSSD fusion reaches the same detected the pedestrian.
Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1469

and OAFSSD fusion) report higher probabilities of a darting


pedestrian earlier, see Fig. 10(a). When evaluating the non-
dangerous staying scenarios, all methods report a small, but
moderately increased probability of a darting pedestrian in the
moments before the occluding vehicle and the staying pedestrian
are passed, and significantly decreased probabilities after the
drive-by, see Fig. 10(b).

VII. DISCUSSION
The benefits of including radar in darting out pedestrian
detection has been shown in Subsection VI-A, where all fusion
methods reacted earlier than the methods using only the camera.
Such an earlier detection may mean additional reaction time in
case of a dangerous situation. One cause is that radars can often
detect pedestrians behind parked vehicles, as their reflected radar
Fig. 9. Histogram of the gained reaction times. Time difference is calculated
signal may be able to propagate under the occluding vehicle and
between the moments naive camera and OAF fusion reaches the threshold reach the sensor. Some of the gains could also be the result of
P (Et = 1|Z1:t ) = 0.5. For clarity, here we only show the sequences where cases when the camera was not able to detect the already visible
both methods reach the threshold within the time window of [−1 s, 0.5 s].
pedestrian (e.g. caused by harsh lighting), but the radar was. Our
radar is mounted on the front of the ego-vehicle, as is common
in the industry, see Fig. 1. This could provide a slightly better
B. Distinguishing Dangerous and Non-Dangerous Scenarios
viewing angle and also contribute to the earlier detections.
Similar to [2] we classify the scene into two classes: c = The reaction time gained was significantly greater for smaller
darting (there is a darting pedestrian, dangerous scenario) occluding vehicles such as cars than for vans (0.30 s vs 0.12 s
or c = non − darting (there is no pedestrian, or he/she is for threshold P (Et = 1|Z1:t ) = 0.5). This difference may have
not darting). To do this, we estimate the probability P (Et = been caused by the length of these vehicles. Vans tend to be
1|Z1:t ) of a present pedestrian of any kind (staying or dart- longer than cars, which can affect the propagation of the radar
ing) by Eq. (24). We also estimate whether the assumed-to- signal under the vehicle. This suggests that it may be beneficial
be present pedestrian darts out creating a dangerous scenario to also estimate the length of the parked vehicle and explicitly
P (c = darting|Z1:t , E = 1) based on the estimated state of the include it in the fusion pipeline (i.e., expect fewer reflections
pedestrian ht , see Eq. (25). The pedestrian is assumed to be behind longer vehicles).
darting if he/she is already on the road in front of the ego-vehicle: The benefits of occlusion awareness become clear when we
xt > dangerousP os (axis is perpendicular to the movement of compare the naive methods (naive camera, naive fusion) with
the ego-vehicle, increases towards the road), or he/she has a their occlusion aware pairs (OAF camera, OAF fusion). For
lateral velocity component large enough to assume he/she will example, OAF fusion reports a higher probability of a pedestrian
be on the road later: vx,t > dangerousSpeed. Similarly, we being present than naive fusion at all times when the pedestrian
assume that the pedestrian will not dart out if he/she is far enough is occluded (t < 0). The reason for this is twofold. First, OAF
from the road: xt < saf eP os, or their lateral velocity is close fusion is occlusion aware, and thus it “acknowledges” that parts
to zero/pointing away from the road: vx,t < saf eSpeed. Proba- of the scenes are occluded and cannot be properly observed,
bilities for values between these limits (dangerousP os > xt > leading to uncertainty. That is, the absence or low number of
saf eP os and dangerousSpeed > vx,t > saf eSpeed) are lin- detections from these areas is not considered hard evidence for
early interpolated. We evaluate the probability of darting based the absence of a pedestrian, unlike in naive methods, e.g. naive
on these two conditions (spatial and velocity) independently, and fusion. Instead, particles behind occlusions are weighted higher
then take the maximum of the two values for safety. Finally, the compared to the unoccluded particles to represent this uncer-
probability of a present, darting pedestrian is calculated by mul- tainty, resulting in higher a priori awareness to these locations
tiplying the two probabilities: P (Et = 1, c = darting|Z1:t ) = even before any detections occur, see Fig. 7, left. Similarly, this
P (Et = 1|Z1:t ) · P (c = darting|Z1:t , E = 1). In the staying elevated a priori awareness of an occlusion aware method is also
scenarios, the pedestrian’s body center was not always visible observable between OAF camera and naive camera for t < 0
during the recording as the pedestrian may have remained hidden moments. Such “caution” resembles the behavior of a human
completely. Hence, unlike for walking scenes, we marked the driver approaching highly occluded regions where pedestrians
last moment the occluding vehicle was still visible as t = 0, to might be. Second, detections originating from these occluded
represent the moment when the ego-vehicle passes the occlud- regions are valued more than in the naive methods, because the
ing, parked vehicle. For each timestamp, we average the proba- number of detections received better fits the expectations in Eq.
bility of a darting pedestrian P (Et = 1, c = darting|Z1:t ) for (17). As a result, the likelihoods are higher for the same detec-
walking and staying scenes separately, see Fig. 10. For the tions than when processed by a naive method, e.g. naive fusion,
walking cases, the fusion methods (naive fusion, OAF fusion see Eq. (21).

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1470 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

Fig. 10. Estimated probabilities of a darting pedestrian, averaged over all walking (a) and staying (b) sequences, with standard deviation around the mean for
fusion methods. t = 0 is the first moment when the pedestrian’s body center was visible for walking scenes, and the last moment when the occluding vehicle was
visible for staying scenes. For walking scenes (a), the addition of radar results in earlier detection than using the camera alone. For staying scenes (b), all methods
reports slightly increased, but still small probabilities of danger (i.e. darting) before the passing.

The occlusion aware fusion approach presented in this paper, increase can be explained by the way the particles are initialized
OAF fusion, responded earlier to darting out pedestrians than with a walking speed and an movement orientation pointing
its older version OAFSSD fusion from [43]. The reason for this, to the road, which intentionally introduces a bias towards the
we believe, is twofold. First, as described in Section II, the darting hypothesis. Such increase in uncertainty about whether
occlusion model used by OAFSSD fusion was often inaccurate. the sighted pedestrian will dart out is similar to the reaction of
A more accurate model of the occlusions (see Subsubsection a human driver, who, having noticed a pedestrian in a similar
IV-B1) helped to better evaluate measurements in this study. situation, would also slow down/be more cautious for safety
Although this occlusion model was created using stixels, this reasons. The occlusion aware fusion methods (OAF fusion and
improvement could also be achieved using other methods to OAFSSD fusion) show further increased caution due to perceived
obtain more accurate occlusion information. Second, this work occlusions in the scene, which increase a priori uncertainty
introduced the concept of attribute likelihood components. More by design. OAF fusion, however, shows lower estimated prob-
specifically, for camera based detections, even small patches abilities for all t < 0 timestamps than OAFSSD fusion, again
of detections were accepted as reasonable, valid measurements suggesting that the new occlusion model based on Instance
if they matched our occlusion model. This was also supported Stixels is superior to the one based on SSD, and follows the
by the decision to use instance segmentation as input instead shape of the occlusions more closely.
of standard object detection, since the former tends to provide All filters depend heavily on the quality of the inputs, espe-
more partial detections, which suits our use-case. In the case cially from the camera, where we expect “high-end” detections
of radar, the attribute component meant comparing expected (e.g. pedestrian instance stixels) from an off-the-shelf module,
and observed radial velocities. This filters out unrealistic radar even under occlusion. The quality of camera based detections
targets, which could originate from other road users or simply also affects the reliability of the filter over the occlusion model.
from noise. On the other hand, radar detections that matched Common errors arise from radar targets that are incorrectly re-
our prior expectations of the object’s motion were highly valued ported as moving by the radar due to poor ego-motion estimation,
and increased the probability earlier. It is noteworthy, however, and from camera based detections that mistake vertically shaped
that OAFSSD fusion still responded earlier than naive fusion, objects (e.g., trees) for pedestrians.
suggesting that even its simpler, SSD based occlusion model The proposed system can be further improved in several ways.
benefited more than the attribute likelihood components and For example, an additional use of the occlusion model would be
the use of stixels as camera based detection. This means that, to adjust the expected background noise for the radar. That is,
depending on the application and available resources, using instead of uniform distribution, it might be beneficial to increase
a simpler occlusion model (i.e., SSD instead of IS) could be the expected noise near parked vehicles with highly reflective
satisfactory with the benefit of reduced computational load. metallic chassis, and decrease it in uncluttered regions.
In our second experiment, we presented an example applica- Integrating additional sensors, (e.g., LiDAR) into our frame-
tion of our methods to distinguish dangerous and non-dangerous work is straightforward. In particular, replacing or supporting
situations. For scenes where the pedestrian remains behind the the 2+1D radar used in this paper with a 3+1D radar similar to
car (i.e. not in danger), the estimated probabilities somewhat that used in [31] could be interesting for three reasons. First, the
increase during the drive-by, but remain small. This observed elevation information and increased density of the radar point

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
PALFFY et al.: DETECTING DARTING OUT PEDESTRIANS WITH OCCLUSION AWARE SENSOR FUSION OF RADAR AND STEREO CAMERA 1471

cloud could be used in a more advanced pedestrian classification REFERENCES


step, as shown in [31]. Second, the elevation information could [1] World Health Organization, “Global status report on road safety,”
be further used in this particular use case by filtering the radar 2018. [Online]. Available: https://www.who.int/publications/i/item/
targets based on their elevation angle, leaving only those that 9789241565684
[2] C. G. Keller and D. M. Gavrila, “Will the pedestrian cross? A study on
are received from below the parked, occluding vehicle - as pedestrian path prediction,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 2,
these targets could be the result of multi-path propagation. This pp. 494–506, Apr. 2014.
step would help filter out false positive radar reflections that [3] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer
vision and deep learning techniques for pedestrian detection and tracking:
originate from the chassis of parked cars and not from occluded A survey,” Neurocomputing, vol. 300, pp. 17–33, 2018.
pedestrians. Third, in [31] the 3+1D radar has been shown to be [4] M. Braun, S. Krebs, F. Flohr, and D. M. Gavrila, “EuroCity persons:
capable of detecting both moving and parked vehicles. As such, A novel benchmark for person detection in traffic scenes,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1844–1861, Aug. 2019.
it could contribute directly to the occlusion model and reduce or [5] S. Heuel and H. Rohling, “Pedestrian recognition in automotive radar
even eliminate the need for the camera sensor. sensors,” in Proc. IEEE Int. Radar Symp., 2013, pp. 732–739.
To generalize the filter for other road users, one has to adjust [6] A. Palffy, J. Dong, J. F. P. Kooij, and D. M. Gavrila, “CNN based road user
detection using the 3D radar cube,” IEEE Robot. Automat. Lett., vol. 5,
the prior velocity and RCS values, e.g., faster and more reflec- no. 2, pp. 1263–1270, Apr. 2020.
tive targets should be expected from a cyclist. For the camera [7] O. Schumann, M. Hahn, J. Dickmann, and C. Wöhler, “Semantic segmen-
based detectors (IS, SSD), the expected class of object has to be tation on radar point clouds,” in Proc. IEEE Int. Conf. Inf. Fusion, 2018,
pp. 2179–2186.
changed. Multiple road users can also be tracked with the filter [8] K. Granström, S. Reuter, M. Fatemi, and L. Svensson, “Pedestrian track-
by modifying the state estimation step in Eq. (25) to expect ing using velodyne data — stochastic optimization for extended object
more than one peak in the particle distribution. Consideration tracking,” in Proc. IEEE Intell. Veh. Symp., 2017, pp. 39–46.
[9] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom,
of objects other than vehicles as occlusions, e.g., walls, is also “PointPillars: Fast encoders for object detection from point clouds,” in
possible, and the observed visible height should be treated as in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 12689–12697.
this study. However, the type of occlusion must be considered for [10] R. Streubel and B. Yang, “Fusion of stereo camera and MIMO-FMCW
radar for pedestrian tracking in indoor environments,” in Proc. IEEE Int.
radar, since multipath propagation is not possible if the occlusion Conf. Inf. Fusion, 2016, pp. 565–572.
has no space under it, such as walls. [11] J. Schlosser, C. K. Chow, and Z. Kira, “Fusing LIDAR and images for
Finally, we did state estimation in this research and showed pedestrian detection using convolutional neural networks,” in Proc. IEEE
Int. Conf. Robot. Automat., 2016, pp. 2198–2205.
quantitative benefits of both fusion and occlusion awareness. [12] B. Bartels and H. Liers, “Bewegungsverhalten von fußgängern im straßen-
However, extending the scope to trajectory prediction, the gained verkehr,” FAT-Schriftenreihe, vol. 268, no. 2, pp. 1–59, 2014 .
reaction times detecting/predicting dangerous situations could [13] European New Car Assessment Programme, “Test protocol - AEB VRU
systems,” 2020 [Online]. Available: https://cdn.euroncap.com/media/
be even greater. 58226/euro-ncap-aeb-vru-test-protocol-v303.pdf
[14] R. Sherony and C. Zhang, “Pedestrian and bicyclist crash scenarios in the
U.S.,” in Proc. IEEE Conf. Intell. Transp. Syst., 2015, pp. 1533–1538.
[15] A. Bartsch, F. Fitzek, and R. H. Rasshofer, “Pedestrian recognition using
VIII. CONCLUSIONS AND FUTURE WORK automotive radar sensors,” Adv. Radio Sci., vol. 10, pp. 45–55, 2012.
[16] N. Scheiner et al., “Seeing around street corners: Non-line-of-sight detec-
In this paper we proposed a generic occlusion aware multi- tion and tracking in-the-wild using doppler radar,” in Proc. IEEE Conf.
sensor Bayesian filter to detect occluded crossing pedestrians. Comput. Vis. Pattern Recognit., 2020, pp. 2065–2074.
To facilitate our and future research of these scenarios, we [17] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The
KITTI dataset,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1231–1237, 2013.
publish our dataset of more than 500 relevant scenarios with [18] C. Ning, L. Menglu, Y. Hao, S. Xueping, and L. Yunhong, “Survey of
stereo camera, radar, LiDAR, and odometry data. We applied the pedestrian detection with occlusion,” Complex Intell. Syst., vol. 7, no. 1,
proposed filter to camera and radar data using this dataset, and pp. 577–587, 2021.
[19] M. Enzweiler, A. Eigenstetter, B. Schiele, and D. M. Gavrila, “Multi-cue
provided techniques to account for the unique characteristics pedestrian classification with partial occlusion handling,” in Proc. IEEE
of these sensors. Our results show that both the inclusion of Conf. Comput. Vis. Pattern Recognit., 2010, pp. 990–997.
radar sensor and occlusion information is beneficial for this use [20] Y. Tian, P. Luo, X. Wang, and X. Tang, “Deep learning strong parts
for pedestrian detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2015,
case, as pedestrians are detected earlier in dangerous walking pp. 1904–1912.
scenarios. For example, the threshold of 0.5 for the estimated [21] C. Zhou and J. Yuan, “Learning to integrate occlusion-specific detectors
existence probability of a pedestrian in the scene is reached on for heavily occluded pedestrian detection,” in Proc. Asian Conf. Comput.
Vis., 2017, pp. 305–320.
average 0.26 seconds earlier by our occlusion aware fusion than [22] C. Zhou and J. Yuan, “Multi-label learning of part detectors for heavily
by a naive camera only detector, and 0.15 seconds earlier than occluded pedestrian detection,” in Proc. IEEE Int. Conf. Comput. Vis.,
by the method that fuses the two sensors in a naive way. 2017, pp. 3506–3515.
[23] X. Wang, T. Xiao, Y. Jiang, S. Shao, J. Sun, and C. Shen, “Repulsion
We also showed in an application example of our filter that loss: Detecting pedestrians in a crowd,” in Proc. IEEE Conf. Comput. Vis.
it can distinguish between dangerous and non-dangerous situa- Pattern Recognit., 2018, pp. 7774–7783.
tions, which is necessary to avoid false alarms. In this task, too, [24] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Occlusion-aware R-CNN:
Detecting pedestrians in a crowd,” in Proc. Eur. Conf. Comput. Vis., 2018,
the inclusion of the radar proved to be beneficial. pp. 657–674.
Future work may include a more precise expected distribution [25] M. Braun, F. B. Flohr, S. Krebs, U. Kreße, and D. M. Gavrila, “Simple
of background noise, improved scene classification by extend- pair pose - pairwise human pose estimation in dense urban traffic scenes,”
in Proc. IEEE Intell. Veh. Symp., 2021, pp. 1545–1552.
ing the scope for trajectory prediction, and the inclusion of [26] X. Wang, A. Shrivastava, and A. Gupta, “A-Fast-RCNN: Hard positive
further sensors, more particularly a 3+1D radar as discussed generation via adversary for object detection,” in Proc. IEEE Conf. Com-
Section VII. put. Vis. Pattern Recognit., 2017, pp. 3039–3048.

Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.
1472 IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, VOL. 8, NO. 2, FEBRUARY 2023

[27] O. Schumann, M. Hahn, J. Dickmann, and C. Wöhler, “Comparison of [53] D. Barnes, M. Gadd, P. Murcutt, P. Newman, and I. Posner, “The oxford
random forest and long short-term memory network performances in radar RobotCar dataset: A radar extension to the oxford RobotCar dataset,”
classification tasks using radar,” in Proc. Sensor Data Fusion: Trends, in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 6433–6438.
Solutions, Appl., 2017, pp. 1–6. [54] J. Bai, L. Zheng, S. Li, B. Tan, S. Chen, and L. Huang, “Radar transformer:
[28] R. Prophet et al., “Pedestrian classification with a 79 GHz automotive radar An object classification network based on 4D MMW imaging radar,”
sensor,” in Proc. IEEE Int. Radar Symp., 2018, pp. 1–6. Sensors, vol. 21, no. 11, 2021, Art. no. 3854.
[29] R. Pérez, F. Schubert, R. Rasshofer, and E. Biebl, “Single-frame vulner- [55] O. Schumann et al., “RadarScenes: A. real-world radar point cloud data set
able road users classification with a 77 GHz FMCW radar sensor and for automotive applications,” in Proc. IEEE Int. Conf. Inf. Fusion, 2021,
a convolutional neural network,” in Proc. IEEE Int. Radar Symp., 2018, pp. 1–8.
pp. 1–10. [56] H. Caesar et al., “nuScenes: A multimodal dataset for autonomous driving,”
[30] A. Danzer, T. Griebel, M. Bach, and K. Dietmayer, “2D car detection in in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 11618–
radar data with PointNets,” in Proc. IEEE Conf. Intell. Transp. Syst., 2019, 11628.
pp. 61–66. [57] M. Mostajabi, C. M. Wang, D. Ranjan, and G. Hsyu, “High resolution radar
[31] A. Palffy, E. Pool, S. Baratam, J. F. P. Kooij, and D. M. Gavrila, “Multi- dataset for semi-supervised learning of dynamic objects,” in Proc. IEEE
class road user detection with 3+ 1D radar in the view-of-delft dataset,” Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 450–457.
IEEE Robot. Automat. Lett., vol. 7, no. 2, pp. 4961–4968, Apr. 2022. [58] M. Meyer and G. Kuschk, “Automotive radar dataset for deep learning
[32] S. Heuel and H. Rohling, “Pedestrian classification in automotive radar based 3D object detection,” in Proc. IEEE Eur. Radar Conf., 2019, pp. 129–
systems,” in Proc. IEEE Int. Radar Symp., 2012, pp. 39–44. 132.
[33] M. Heuer, A. Al-Hamadi, A. Rain, and M. M. Meinecke, “Detection and [59] T. Li, S. Sun, T. P. Sattar, and J. M. Corchado, “Fight sample degeneracy
tracking approach using an automotive radar to increase active pedestrian and impoverishment in particle filters: A review of intelligent approaches,”
safety,” in Proc. IEEE Intell. Veh. Symp., 2014, pp. 890–893. Expert Syst. Appl., vol. 41, no. 8, pp. 3944–3954, 2014.
[34] S. Hayashi, K. Saho, D. Isobe, and M. Masugi, “Pedestrian detection in [60] L. Ferranti et al., “SafeVRU: A research platform for the interaction of
blind area and motion classification based on rush-out risk using micro- self-driving vehicles with vulnerable road users,” in Proc. IEEE Intell.
doppler radar,” Sensors, vol. 21, no. 10, 2021, Art. no. 3388. Veh. Symp., 2019, pp. 1660–1666.
[35] M. P. Muresan, I. Giosan, and S. Nedevschi, “Stabilization and valida- [61] J. Domhof, J. F. P. Kooij, and D. M. Gavrila, “A joint extrinsic calibration
tion of 3D object position using multimodal sensor fusion and semantic tool for radar, camera and LIDAR,” IEEE Trans. Intell. Veh., vol. 6, no. 3,
segmentation,” Sensors, vol. 20, no. 4, 2020, Art. no. 1110. pp. 571–582, Sep. 2021.
[36] R. O. Chavez-Garcia and O. Aycard, “Multiple sensor fusion and clas-
sification for moving object detection and tracking,” IEEE Trans. Intell.
Transp. Syst., vol. 17, no. 2, pp. 525–534, Feb. 2016.
[37] S. Chadwick, W. Maddern, and P. Newman, “Distant vehicle detection
using radar and vision,” in Proc. IEEE Int. Conf. Robot. Automat., 2019, Andras Palffy (Member, IEEE) received the M.Sc.
pp. 8311–8317. degree in computer science engineering from Paz-
[38] J. Nie, J. Yan, H. Yin, L. Ren, and Q. Meng, “A multimodality fusion many Peter Catholic University, Budapest, Hungary,
deep neural network and safety test strategy for intelligent vehicles,” IEEE in 2016, and the M.Sc. degree in digital signal and im-
Trans. Intell. Veh., vol. 6, no. 2, pp. 310–322, Jun. 2021. age processing from Cranfield University, Cranfield,
[39] S. K. Kwon, E. Hyun, J.-H. Lee, J.-H. Lee, and S. H. Son, “Detection U.K., in 2015. He is currently working toward the
scheme for a partially occluded pedestrian based on occluded depth in Ph.D. degree with Delft University of Technology,
lidar-radar sensor fusion,” Opt. Eng., vol. 56, no. 11, 2017, Art. no. 113112. Delft, Netherlands, focusing on radar based vulnera-
[40] J. F. P. Kooij, N. Schneider, F. Flohr, and D. M. Gavrila, “Context-based ble road user detection for autonomous driving. From
pedestrian path prediction,” in Proc. Eur. Conf. Comput. Vis., 2014, 2013 to 2017, he was with Startup Eutecus, develop-
pp. 618–633. ing computer vision algorithms for traffic monitoring
[41] A. Angelov, A. Robertson, R. Murray-Smith, and F. Fioranelli, “Practical and driver assistance applications.
classification of different moving targets using automotive radar and
deep neural networks,” IET Radar, Sonar Navigation, vol. 12, no. 10,
pp. 1082–1089, 2018.
[42] S. Munder, C. Schnörr, and D. M. Gavrila, “Pedestrian detection and
tracking using a mixture of view-based shape-texture models,” IEEE Trans. Julian F. P. Kooij (Member, IEEE) received the
Intell. Transp. Syst., vol. 9, no. 2, pp. 333–343, Jun. 2008. Ph.D. degree in artificial intelligence from the Uni-
[43] A. Palffy, J. F. P. Kooij, and D. M. Gavrila, “Occlusion aware sensor fusion versity of Amsterdam, Amsterdam, Netherlands, in
for early crossing pedestrian detection,” in Proc. IEEE Intell. Veh. Symp., 2015. In 2013, he was with Daimler AG worked on
2019, pp. 1768–1774. path prediction for vulnerable road users. In 2014,
[44] A. Almeida, J. Almeida, and R. Araújo, “Real-time tracking of moving he joined Computer Vision Lab, Delft University of
objects using particle filters,” in Proc. IEEE Int. Symp. Ind. Electron., 2005, Technology (TU Delft), Delft, Netherlands. Since
pp. 1327–1332. 2016, he has been with Intelligent Vehicles Group,
[45] Z. Radosavljević, D. Mušicki, B. Kovačević, W. C. Kim, and T. L. Song, part of the Cognitive Robotics Department, TU Delft,
“Integrated particle filter for target tracking in clutter,” IET Radar, Sonar where he is currently an Associate Professor. His
Navigation, vol. 9, no. 8, pp. 1063–1069, 2015. research interests include probabilistic models and
[46] S. Hoermann, P. Henzler, M. Bach, and K. Dietmayer, “Object detection machine learning techniques to infer and anticipate critical traffic situations
on dynamic occupancy grid maps using deep learning and automatic label from multi-modal sensor data.
generation,” in Proc. IEEE Intell. Veh. Symp., 2018, pp. 826–833.
[47] D. Nuss, T. Yuan, G. Krehl, M. Stübler, S. Reuter, and K. Dietmayer, “Fu-
sion of laser and radar sensor data with a sequential monte carlo bayesian
occupancy filter,” in Proc. IEEE Intell. Veh. Symp., 2015, pp. 1074–1081.
[48] W. Liu et al., “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Dariu M. Gavrila (Member, IEEE) received the
Comput. Vis., 2016, pp. 21–37. Ph.D. degree in computer science from the University
[49] H. Badino, U. Franke, and D. Pfeiffer, “The stixel world - A compact of Maryland, College Park, MD, USA, in 1996. From
medium level representation of the 3D-World,” in Lecture Notes in Com- 1997, he was with Daimler R&D, Ulm, Germany,
put. Sci., Berlin, Heidelberg, Germany: Springer, 2009, pp. 51–60. where he became a Distinguished Scientist. 2016,
[50] L. Schneider et al., “Semantic stixels: Depth is not enough,” in Proc. IEEE he moved to Delft University of Technology, Delft,
Intell. Veh. Symp., 2016, pp. 110–117. Netherlands, where he since Heads the Intelligent
[51] T. Hehn, J. Kooij, and D. Gavrila, “Fast and compact image segmentation Vehicles Group as a Full Professor. His research
using instance stixels,” IEEE Trans. Intell. Veh., vol. 7, no. 1, pp. 45–56, interests include sensor-based detection of humans
Mar. 2022. and analysis of behavior, recently in the context of the
[52] G. Kim, Y. S. Park, Y. Cho, J. Jeong, and A. Kim, “MulRan: Multimodal self-driving cars in urban traffic. He was the recipient
range dataset for urban place recognition,” in Proc. IEEE Int. Conf. Robot. of the Outstanding Application Award 2014 and the Outstanding Researcher
Automat., 2020, pp. 6246–6253. Award 2019, from the IEEE Intelligent Transportation Systems Society.
Authorized licensed use limited to: Eastern Michigan University. Downloaded on April 03,2024 at 20:12:38 UTC from IEEE Xplore. Restrictions apply.

You might also like