
Bächle et al., Autonomous Intelligent Systems (2024) 4:17
https://doi.org/10.1007/s43684-024-00074-w

SHORT PAPER   Open Access

Competing with autonomous model vehicles: a software stack for driving in smart city environments

Julius Bächle1†, Jakob Häringer1†, Noah Köhler1†, Kadir-Kaan Özer1†, Markus Enzweiler1* and Reiner Marchthaler1

Abstract
This article introduces an open-source software stack designed for autonomous 1:10 scale model vehicles. Initially
developed for the Bosch Future Mobility Challenge (BFMC) student competition, this versatile software stack is
applicable to a variety of autonomous driving competitions. The stack comprises perception, planning, and control
modules, each essential for precise and reliable scene understanding in complex environments such as a miniature
smart city in the context of BFMC. Given the limited computing power of model vehicles and the necessity for
low-latency real-time applications, the stack is implemented in C++, employs YOLO Version 5s for environmental
perception, and leverages the state-of-the-art Robot Operating System (ROS) for inter-process communication. We
believe that this article and the accompanying open-source software will be a valuable resource for future teams
participating in autonomous driving student competitions. Our work can serve as a foundational tool for novice
teams and a reference for more experienced participants. The code and data are publicly available on GitHub.
Keywords: Autonomous model vehicle, Software architecture, Embedded real-time systems, Bosch Future Mobility
Challenge, Autonomous driving

1 Introduction
Autonomous driving is one of the most significant challenges of our time, with the potential to revolutionize transportation, enhancing safety and efficiency on our roads. The development of autonomous vehicles is a complex task that requires the integration of various advanced technologies, including computer vision, machine learning, and robotics. One effective way to inspire young people to pursue careers in this field is by providing opportunities to learn and experiment with autonomous model vehicles, as depicted in Fig. 1. These models enable students to grasp the fundamental requirements of an autonomous system, such as perception, planning, and control. Furthermore, they provide the opportunity to dive deeper into topics such as computer vision, machine learning, and control theory, fostering independent research projects and the development of new algorithms and techniques.

This paper presents a comprehensive software stack for autonomous model vehicles, utilized during the Bosch Future Mobility Challenge (BFMC) 2023 [1]. The BFMC serves as a competitive platform for students to share knowledge, build connections, and showcase their work, thereby motivating them to enhance their skills and knowledge. However, for students new to the field, developing a deep understanding of the software stack used in high-level competitions can be challenging due to the scarcity of accessible prior work. This paper and the provided codebase aim to bridge this gap.

* Correspondence: [email protected]
1 Institute for Intelligent Systems, Faculty of Computer Science and Engineering, Esslingen University of Applied Sciences, Flandernstraße 101, Esslingen am Neckar, 73732, Germany
† Equal contributors

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit
to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The
images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise
in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Figure 1 1:10 scale model car including sensors, compute platforms, and actuators

The software stack is designed to perform a wide range of tasks while maintaining affordability by using a minimal number of sensors. It is implemented in C++, utilizes YOLO Version 5s [2] for environmental perception, and employs the state-of-the-art Robot Operating System (ROS) for inter-process communication. We provide a detailed overview of the hardware and software architecture, with a focus on the interaction between perception, behavior, and trajectory planning. The paper demonstrates how object and lane recognition approaches can be adapted to model vehicles. Additionally, we discuss the decision-making process in an autonomous vehicle and the methods for calculating actions and trajectories.

2 Architecture
In the pursuit of autonomous mobility for the BFMC 2023, each participating team developed a model vehicle with custom hardware and software architectures. The first section presents the hardware architecture, encompassing physical components such as embedded computing boards, various sensors, and critical actuators. The design prioritizes simplicity and cost-effectiveness to ensure the system functionality and accessibility. The second section details the software architecture, focusing on the efficient allocation of tasks to individual computing units to optimize resource utilization. We provide insight into the rationale behind our design decisions and highlight special features of our software architecture, offering a valuable reference point for future teams in autonomous driving student competitions.

2.1 Hardware architecture
All teams admitted to the BFMC 2023 received a 1:10 scale model car, which included a Raspberry Pi 4 Model B [3] and an STM32 Nucleo F401RE microcontroller [4], as shown in Fig. 2. To optimize performance while maintaining budget and simplicity, several components were customized. Special wheel speed sensors were installed to measure the traveled distance more accurately. Additionally, an Nvidia Jetson TX2 [5], a high-performance and power-efficient embedded computing device, was integrated to accelerate the vehicle's perception and data processing capabilities. The primary sensor is an Intel RealSense D435 camera [6], which features an RGB color sensor and two infrared cameras for stereo vision, providing depth information about the vehicle's surroundings.
Figure 2 illustrates the overall hardware architecture. The camera is directly connected to the TX2, enabling rapid processing of the video stream. Detected objects and lane markings are then transmitted to the Raspberry Pi via User Datagram Protocol (UDP). The Raspberry Pi processes data from the TX2, wheel speed sensors, and the Inertial Measurement Unit (IMU). After processing, actuator commands are sent to the Nucleo board, which controls the longitudinal movement using the motor and motor driver, and the lateral movement using the steering servo.

2.2 Software architecture
The software architecture for the vehicle is designed to distribute tasks across the available computing units, optimizing resource utilization and improving system responsiveness. The software stack is divided into three main blocks: perception, planning, and acting. Each block is assigned to a specific computing unit to facilitate efficient data management and minimize communication overhead. Figure 3 gives an overview of the architecture, which is explained in this section.

Perception The object detection and lane detection tasks are implemented on the graphics processing unit (GPU) to enhance processing efficiency. Lane detection employs a deterministic approach, while object detection utilizes a neural network on the GPU. Data exchange between the GPU and the main processing unit employs the User Datagram Protocol (UDP) due to its lightweight nature. The real-time messages include the class, bounding box, and distance of the detected objects. The lane detection message contains information about the curvature and the distance to the center of the lane. As false detections are filtered before transmission, data transfer is minimized, allowing the behavior planning to react almost immediately to all received messages. Detailed explanations are provided in Sect. 3.
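The wire format of these messages is defined in the accompanying repository; purely as an illustration, a minimal sketch of how such a detection message could be packed and sent over UDP with POSIX sockets is shown below. The field names, port, and address are assumptions for illustration, not the project's actual definitions.

```cpp
// Hypothetical perception message sent from the Jetson TX2 to the Raspberry Pi via UDP.
// Field names, port, and IP address are illustrative placeholders.
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

#pragma pack(push, 1)
struct DetectionMsg {
    uint8_t  class_id;      // e.g., 0 = stop sign, 1 = pedestrian, ... (assumed encoding)
    uint16_t x, y, w, h;    // bounding box in image coordinates [px]
    float    distance_m;    // average depth inside the box [m]
};
#pragma pack(pop)

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);            // lightweight UDP socket
    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5005);                          // assumed port
    inet_pton(AF_INET, "192.168.1.10", &dst.sin_addr);     // assumed Raspberry Pi address

    DetectionMsg msg{0, 320, 180, 40, 40, 1.2f};           // example detection
    sendto(sock, &msg, sizeof(msg), 0,
           reinterpret_cast<sockaddr*>(&dst), sizeof(dst));
    close(sock);
    return 0;
}
```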

Figure 2 Overview of the hardware architecture and interfaces

Figure 3 Overview of the software architecture



Planning The main computing unit, equipped with a Raspberry Pi, employs the Robot Operating System (ROS) Noetic [7] for robust data communication. Input data from the perception module, IMU, and wheel speed sensors are analyzed to formulate driving strategies. Target steering angles and speed signals are determined and transmitted to the Nucleo board via the Universal Asynchronous Receiver/Transmitter (UART) protocol. Detailed explanations are provided in Sect. 4.

Acting The Nucleo board receives control signals from the main computing unit and utilizes a PID controller to adjust the actual speed based on the target speed. The steering angle is set to a target value, constrained within defined boundary conditions. Details are given in Sect. 4.

3 Perception
This section examines the design of our perception system, covering camera setup, object detection, performance optimization on the Raspberry Pi, and lane detection algorithms, all tailored towards the BFMC environment. Key points include sensor choice and camera alignment, dataset creation, neural network selection, architecture and training, task parallelization, and efficiency enhancement methods. Lane detection is addressed through preprocessing steps and histogram-based techniques. This overview aims to provide a clear understanding of the systems behind the functioning of autonomous model vehicles, especially in competitive settings like the BFMC.

3.1 Camera setup
The Intel RealSense camera utilized in this system features an RGB sensor with resolutions up to 1920 × 1080 px [6]. To balance performance and processing speed, we operate the camera at a resolution of 960 × 540 pixels at a sampling rate of 30 Hz. The depth images are spatially aligned with the RGB color images. The RGB module is positioned on the left side of the camera, providing comprehensive capture on the left side. To ensure traffic signs on the right side are detected at shorter distances, the camera has been rotated accordingly.

3.2 Object detection
Dataset A test track was constructed to evaluate the software stack and capture images for training the object detection network (see Fig. 4). The test track was designed to closely follow the rules of the BFMC [1] and includes a roundabout, two intersections, and a parking lot. It is complemented by signs with 3D-printed poles and two pedestrian dummies. Although the test track is sufficient to analyze most scenarios, it is approximately four times smaller than the original competition track.
The dataset used to train the object recognition model consists of 4665 images captured while driving on the test track and during the competition. Additionally, 774 images from videos provided by Bosch were included. These images were taken from vehicles in previous competitions, using different cameras, resolutions, and aspect ratios. Despite these variations, incorporating these images improved perception in scenes that were difficult to recreate on our own test track, such as motorway exits.
Overall, the model detects 15 classes, including crosswalks, stop signs, and priority signs. Additionally, dynamic traffic participants (e.g., cars and pedestrians) and static obstacles are recognized. The model also identifies stop lines and parking spaces for junctions and parking situations.

Model selection Implementing an efficient and robust object detection system is paramount in the development of autonomous model vehicles for competitions such as the BFMC. One of the critical tasks in this domain is the identification of obstacles, paths, and other relevant environmental features. After considering various algorithms, YOLOv5s [2, 8], a variant of the YOLO family of object detection models, was selected for this purpose due to its strengths and suitability for the specific requirements of the 1:10 scale autonomous model vehicle.
YOLOv5s, as the second smallest and fastest model in the YOLOv5 series, offers a balance of speed and accuracy, making it suitable for real-time object detection in resource-constrained environments like model vehicles [9]. The model's architecture, building upon the advancements of its predecessors, incorporates several features that meet the high-performance demands of autonomous navigation while remaining computationally efficient [10]. It includes optimizations like anchor box adjustments and advanced training techniques [11], making it suitable for real-time object detection in autonomous vehicles. Its ability to detect objects of various sizes and under different conditions is crucial for the safety and reliability of autonomous driving systems [12].
In addition to these technical advantages, the widespread adoption and active development community surrounding the YOLO family of models provide resources for support and further enhancements. The availability of pre-trained models, extensive documentation, and a large user community contribute to the development process and facilitate the implementation of advanced features and improvements.

Methodology The training of YOLOv5s involved optimizing parameters such as the number of epochs, batch size, and the use of autobatching. The performance was evaluated based on four key metrics: box loss, object loss, class loss, and Mean Average Precision (MAP). These metrics collectively offer insights into the model's accuracy, reliability, and efficiency in detecting and classifying objects.
Box loss emphasizes the spatial accuracy of object detection, measuring the precision of the predicted bounding boxes against the ground-truth boxes. Object loss addresses the discernment between objects and non-objects, evaluating the model's ability to detect and distinguish objects from the background. Class loss measures the accuracy of categorizing detected objects into the correct categories.
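For reference, the mean average precision reported in the following uses the standard definition (a generic formulation, not a formula taken from this paper): the average precision of each class is the area under its precision-recall curve, and the MAP is the mean over all C classes (here C = 15). Whether it is evaluated at a single intersection-over-union threshold (e.g., 0.5) or averaged over several thresholds depends on the training configuration.

```latex
\mathrm{AP}_c = \int_0^1 p_c(r)\,\mathrm{d}r , \qquad
\mathrm{MAP} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{AP}_c ,
```

where p_c(r) denotes the precision of class c at recall r.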

Figure 4 Test track at the Institute for Intelligent Systems at Esslingen University of Applied Sciences

Table 1 Parameters, validation losses, and MAP of different training / validation runs
Epochs Batch-size Box loss Object loss Class loss MAP
75 64 0.0121 0.0040 0.0004 0.8653
75 32 0.0244 0.0064 0.0009 0.6657
100 32 0.0129 0.0039 0.0005 0.8638
100 autobatch 0.0156 0.0049 0.0008 0.8194
300 64 0.0123 0.0040 0.0005 0.8713
300 32 0.0123 0.0043 0.0005 0.8696
500 32 0.0113 0.0044 0.0009 0.8688

A well-trained model should ideally have low scores on all three types of losses, indicating high precision and accuracy in both detecting and classifying objects in images.

Training This section focuses on the different training parameters and the definition of various losses regarding the YOLOv5 object detection model. Table 1 provides a brief overview of the performance differences resulting from the parameter adjustments.
The analysis of various training configurations of YOLOv5s for the BFMC underscores the importance of carefully selecting training parameters. The configuration with 300 epochs and a batch size of 64 emerged as the most effective, striking an optimal balance between training duration and model performance.
This setup not only achieved the highest MAP but also maintained low loss values, making it the preferred choice for tasks requiring high precision in object detection, such as detecting small model cars. This insight can guide future training procedures in similar applications, emphasizing the need for a balanced approach to training deep learning models.

Filtering of misdetections In addition to missing known objects (false negatives), recognizing false objects (false positives) is also a significant problem in the perception of neural networks. These misdetections can lead to incorrect reactions of the model vehicle, so detections are filtered using prior scene knowledge before being forwarded to behavior planning. To analyze the effectiveness of the filters in more detail, the number of detections removed during a drive on the test track was recorded. The applied filters and the proportion of valid detections that passed our filters are shown in Fig. 5. The filter functions are applied sequentially from top to bottom.
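The individual filter stages are described in the following. Conceptually, the chain is a sequence of predicates applied to every detection of a frame; the sketch below illustrates this structure with assumed data fields and thresholds, not the stack's actual implementation.

```cpp
// Illustrative sequential filter chain for object detections (assumed types and thresholds).
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

struct Detection {
    int   classId;
    float confidence;
    float depthDistanceM;     // distance estimated from the depth image [m]
    float geometricDistanceM; // distance estimated from known camera geometry [m]
    float centerXNorm;        // horizontal box center in [0, 1] (0 = left image border)
    int   consecutiveFrames;  // number of consecutive frames this object was seen
};

// Hypothetical class grouping; in practice this would map the 15 BFMC classes.
static bool isSignOrTrafficLight(int classId) { return classId >= 2; }

static bool passesFilters(const Detection& d) {
    if (d.confidence < 0.5f) return false;               // confidence threshold (NMS assumed done)
    if (d.depthDistanceM <= 0.0f) return false;          // zero / invalid depth measurement
    if (d.consecutiveFrames < 3) return false;           // must be seen in several consecutive frames
    if (isSignOrTrafficLight(d.classId) && d.centerXNorm < 0.5f)
        return false;                                    // signs and lights expected on the right
    if (d.depthDistanceM > 3.0f) return false;           // ignore distant objects (class-dependent, 1-3 m)
    // Spatial plausibility: depth and camera-geometry estimates must roughly agree.
    return std::fabs(d.depthDistanceM - d.geometricDistanceM) < 0.5f;
}

std::vector<Detection> filterDetections(const std::vector<Detection>& in) {
    std::vector<Detection> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out), passesFilters);
    return out;
}
```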

Figure 5 Filtered and passed detections

Before the detections are counted, a confidence threshold and a non-maximum suppression (NMS) threshold are applied, as is typical for object detection networks.
In the next step, all detections with a measured distance of zero are filtered out. The distance for a bounding box detection is determined as the average distance in the area covered by the bounding box, estimated via the available depth image. False positives typically occur inconsistently over time, so only objects recognized in several consecutive frames are considered valid.
Given the context of the BFMC, where traffic lights and most traffic signs appear on the right-hand side, those recognized on the left-hand side are ignored. Additionally, detections are filtered using distance thresholds between one and three meters, allowing the system to ignore distant objects that are not yet relevant to the vehicle. This check for maximum distance, combined with the minimum confidence threshold, accounts for approximately 24% of the filtered detections and has been empirically estimated.
The final filter ensures that detected objects match the expected spatial relationship in the scene. For example, objects are assumed to be ground-based and traffic signs are assumed to be at a certain height above the ground. For each bounding box, we estimate the distance measured via the depth image and a second distance measure obtained from the known camera geometry, punishing deviations between both estimates.

3.3 Lane detection
The lane detection algorithm follows an engineered approach rather than relying on machine learning. It is divided into two main phases: preprocessing, which includes tasks like cropping and transforming the image, and detection, in which lane markings are identified using search boxes. The result of the lane detection algorithm is the curvature of the lane markings and the offset of the vehicle from the middle of the lane.

Preprocessing The first step of the lane detection algorithm's preprocessing routine is to crop the RGB input image, focusing on a pre-selected region of interest (ROI) containing the road area. This static cropping operation eliminates the need to dynamically ascertain a ROI for each frame, thus preventing any potential performance overhead. By reducing computational complexity in subsequent processing stages, this approach enhances the efficiency and accuracy of the algorithm by focusing on the relevant portion of the image.
The cropped section of the image is then converted into a bird's eye view (BEV) format. This conversion aids in the identification of lane markings by presenting a more intuitive representation of the road layout, allowing the algorithm to interpret the spatial connections more effectively between lane markings and the location of the vehicle. The transformation to BEV is accomplished by utilizing homographies to map points from one image plane to another. The homography parameters are established through corresponding points between the initial image and a BEV reference image, ensuring precise alignment and reconstruction of the road scene (see Fig. 6a and Fig. 6b).
To enhance computational efficiency and emphasize intensity-based lane detection, the BEV image is converted to grayscale. Color information is often redundant for lane detection, as intensity-based contrast between lane markings and the surrounding road surface suffices for effective differentiation. The linear transformation method is employed for grayscale conversion, preserving intensity data and removing extraneous color information by computing a weighted average of the red, green, and blue channels.
Further refining the representation of the road layout, the grayscale BEV image is binarized in a two-step process. First, Canny edge detection [13], a technique for effectively identifying image edges, is applied. Canny edge detection is particularly well-suited for intensity-based lane detection because it robustly extracts edges while minimizing noise and preserving essential features such as lane markings. The resulting edge map, which highlights the boundaries between lane markings and the surrounding road surface, serves as a critical input for subsequent lane detection stages.
Second, a mask is generated from the pre-Canny image to eliminate unnecessary detections and additional noise. This is done by applying a threshold that binarizes each pixel into either white or black. The mask is then superimposed on the post-Canny edge image to produce a clear representation of the road layout. The result is shown in Fig. 6c.
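As an illustration of the preprocessing chain described above, a hedged OpenCV sketch is given below; the crop region, homography points, and thresholds are placeholders rather than the values used in the stack.

```cpp
// Illustrative lane-detection preprocessing: crop -> BEV warp -> grayscale -> Canny -> mask.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat preprocessForLanes(const cv::Mat& rgb) {
    // 1. Static crop of the road region (placeholder ROI: bottom half of the frame).
    cv::Mat roi = rgb(cv::Rect(0, rgb.rows / 2, rgb.cols, rgb.rows / 2));

    // 2. Bird's eye view via a homography estimated once from corresponding points (placeholders).
    std::vector<cv::Point2f> srcPts = {
        {0.25f * roi.cols, 0.0f}, {0.75f * roi.cols, 0.0f},
        {static_cast<float>(roi.cols), static_cast<float>(roi.rows)},
        {0.0f, static_cast<float>(roi.rows)}};
    std::vector<cv::Point2f> dstPts = {{0, 0}, {400, 0}, {400, 400}, {0, 400}};
    cv::Mat H = cv::getPerspectiveTransform(srcPts, dstPts);
    cv::Mat bev;
    cv::warpPerspective(roi, bev, H, cv::Size(400, 400));

    // 3. Grayscale conversion (weighted average of the color channels).
    cv::Mat gray;
    cv::cvtColor(bev, gray, cv::COLOR_BGR2GRAY);

    // 4. Canny edge map (placeholder thresholds).
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);

    // 5. Intensity mask from the pre-Canny image, superimposed on the edge map.
    cv::Mat mask;
    cv::threshold(gray, mask, 180, 255, cv::THRESH_BINARY);
    cv::Mat binary;
    cv::bitwise_and(edges, mask, binary);
    return binary;   // binary BEV image used by the detection stage
}
```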

Figure 6 Preprocessing steps of the lane detection. a) cropped image, b) BEV representation, c) masked image

Figure 7 Successive steps for the determination of the vehicle’s offset from the center of the roadway and the roadway’s curvature

In the final preprocessing step, a histogram is constructed by counting the number of white pixels per column of the bottom 25 pixel rows of the grayscale image. This ROI corresponds to the road surface where the origins of lane markings are typically located from the vehicle's perspective. The histogram provides a statistical representation of the intensity distribution across the image columns, allowing the identification of prominent peaks that correspond to lane markings. This information serves as crucial input to the subsequent lane detection stages (see Fig. 7).
Together, these preprocessing steps prepare the input image for the lane detection algorithm, effectively transforming the raw camera data into a format suitable for robust and accurate lane detection, similar to the approach used in [14].

Detection During the second stage, the algorithm identifies the lane markings on the road. Utilizing the origins of the lane markings at the bottom of the image, identified through the preprocessing routine, the algorithm scans the binary BEV image from bottom to top in search of lane markings. This scanning process employs search boxes (see Fig. 7a) that start at the identified origins and serve two primary purposes. First, implementing a search box reduces computational requirements by limiting the scope of pixel analysis. Second, using a search box decreases the likelihood of misinterpreting a single pixel and minimizes the effect of pixel-wise errors in the search process.
A histogram is created for the area of each search box. This histogram offers a statistical representation of the intensity distribution of white pixels in the search box across each column, facilitating the identification of significant peaks that correspond to possible lane markings. After filtering out columns lacking sufficient white pixels, the average of the remaining column numbers is used to pinpoint the center coordinates of the next search box, as visualized in Fig. 7b.
After iterating through the complete image, the row and column values identified from the histogram peaks in each search box are preserved. To approximate the smooth trajectory of lane markings and eliminate outliers, we fit a quadratic parabola to the accumulated x-y pixel pairs using a least-squares fitting approach, similar to the method described in [15] and shown in Fig. 7c. Curve fitting provides a more reliable representation of lane boundaries by smoothing out fluctuations in the x-y-centers of the search boxes and capturing the underlying curvature of the lane markings.
The fitted parabola is then used to calculate the lane trajectory, providing necessary information about the vehicle's lane position and potential departures from the lane. This information is essential for enabling the vehicle to maintain its position within the lane and prevent lane departures, a critical safety feature for autonomous vehicles. The lane detection algorithm outputs the lane boundaries and their curvature, as well as the vehicle's offset from the center of the roadway.
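A condensed sketch of this detection stage is given below: sliding search boxes collect lane-pixel centers from column histograms, and a quadratic polynomial is fitted to them by least squares. Box size and pixel thresholds are placeholder values, not those used by the stack.

```cpp
// Illustrative search-box scan over the binary BEV image plus quadratic least-squares fit.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Fit x = a*y^2 + b*y + c to the collected (x, y) centers (requires at least 3 points).
cv::Vec3f fitParabola(const std::vector<cv::Point>& pts) {
    cv::Mat A(static_cast<int>(pts.size()), 3, CV_32F);
    cv::Mat b(static_cast<int>(pts.size()), 1, CV_32F);
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        float y = static_cast<float>(pts[i].y);
        A.at<float>(i, 0) = y * y;
        A.at<float>(i, 1) = y;
        A.at<float>(i, 2) = 1.0f;
        b.at<float>(i, 0) = static_cast<float>(pts[i].x);
    }
    cv::Mat coeffs;
    cv::solve(A, b, coeffs, cv::DECOMP_QR);   // least-squares solution
    return {coeffs.at<float>(0), coeffs.at<float>(1), coeffs.at<float>(2)};
}

// Scan bottom-up starting at the lane origin column found in the preprocessing histogram.
std::vector<cv::Point> scanLane(const cv::Mat& binaryBev, int xStart) {
    const int boxW = 60, boxH = 40, minPixels = 20;   // placeholder parameters
    std::vector<cv::Point> centers;
    int x = xStart;
    for (int yTop = binaryBev.rows - boxH; yTop >= 0; yTop -= boxH) {
        int bx = std::min(std::max(0, x - boxW / 2), binaryBev.cols - boxW);
        cv::Rect box(bx, yTop, boxW, boxH);
        cv::Mat boxImg = binaryBev(box);
        // Column histogram of white pixels inside the search box.
        cv::Mat colSum;
        cv::reduce(boxImg / 255, colSum, 0, cv::REDUCE_SUM, CV_32S);
        int total = 0, weighted = 0;
        for (int c = 0; c < colSum.cols; ++c) {
            int v = colSum.at<int>(0, c);
            total += v;
            weighted += v * c;
        }
        if (total >= minPixels) {
            x = box.x + weighted / total;                // center of the next search box
            centers.emplace_back(x, yTop + boxH / 2);
        }
    }
    return centers;
}
```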

3.4 Performance optimization
As discussed in Sect. 2, the TX2 runs the entire perception stack, fetching camera images, detecting lanes and objects, and communicating the results to the Raspberry Pi. While image retrieval and alignment must be carried out sequentially before processing, lane and object detection can be parallelized. With sequential execution, the entire perception module requires 64 ms from image retrieval to message transmission, as shown in Table 2. When lane and object detection are executed in parallel, only 51 ms are required. As lane detection is executed on the CPU and the object detection on the GPU of the TX2, the difference of 13 ms corresponds almost exactly to the execution time of the lane detection (14 ms).

Table 2 Perception runtime for sequential and parallel execution (average)
Module              Sequential [ms]   Parallel [ms]
Image Retrieval     13                15
Object Detection    36                36
Lane Detection      14                14
All                 64                51

In addition to parallelization, the inference engine can also be optimized to speed up perception. The effect of using an Open Neural Network Exchange (ONNX) neural network model and a TensorRT implementation [8] is investigated. When the ONNX neural network model obtained after training is integrated directly into C++ code using the OpenCV library, object detection requires 87 ms, and the perception module requires 100 ms (see Table 3). However, when the ONNX file is converted into an engine file using the TensorRT library, 55 ms are necessary for object recognition. Further reducing the internal accuracy of the model from 32-bit to 16-bit floating point numbers decreases the processing time to 36 ms, with no noticeable loss of accuracy. Overall, by improving the inference engine, the time required for perception was halved to 52 ms. Since the fastest inference engine was also used for the previous measurements, this number largely corresponds to the parallel execution time from Table 2, with a deviation of one millisecond due to measurement inaccuracies.

Table 3 Perception runtime using different inference engines (average)
Inference engine    Object detection [ms]   Perception module [ms]
ONNX                87                      100
TensorRT (FP32)     55                      72
TensorRT (FP16)     36                      52

4 Behavior and trajectory planning
This section focuses on behavior and trajectory planning. The crucial modules used in this architecture are the "Environment", "Actions", and "Command" modules. Starting with the "Environment" module, it integrates sensor inputs, refines data through post-processing, and sets precise vehicle states for informed decision-making. The transition to the "Actions" module includes mapping vehicle states to actions and orchestrating behavior and trajectory planning. Concluding this comprehensive exploration, the focus shifts to the "Command" module, which executes critical commands, controls the vehicle within predetermined limits, and facilitates seamless communication with the Nucleo board. Together, these three modules constitute the essential framework of behavior planning, enabling the vehicle to navigate, comprehend, and react to its surroundings.
Behavior and trajectory planning is performed at a frequency of 20 Hz, although this frequency could be increased as this part of the software stack involves relatively simple calculations with an overall runtime of approximately 2 ms. It is important to note that synchronization with the slowest component in the system is maintained to avoid bottlenecks; in this case the perception module.

4.1 Environment
The central pillar of the autonomous vehicle architecture is the "Environment" module, which combines the input from various sensors into a holistic real-time perception of the vehicle's surroundings to determine precise vehicle states based on processed sensor data. The following paragraphs discuss the integration and post-processing of multi-sensor inputs and the use of global map features to enhance the sensor data.

Integration and post-processing of multi-sensor inputs The "Environment" module fuses all sensor inputs to enable the autonomous vehicle to make informed decisions based on the current situation. This fusion includes data from the wheel speed sensor, the IMU for spatial orientation, and vehicle-to-everything communication which is available in the BFMC context [1] to obtain the status of traffic lights. Additionally, processed information from the perception module is integrated, including details on lane positions and detected objects.
A crucial aspect of navigating an environment is determining the ego pose of the vehicle. To determine the current ego pose, a dead-reckoning approach is used to dynamically update the x and y positions of the vehicle based on sensor data (distance traveled) from the wheel speed sensor and the current yaw angle of the IMU. The wheel speed sensor, with an accuracy of approximately 0.03 m/1 m, combined with the IMU, has shown sufficient precision for the tasks set within the requirements of the BFMC. The IMU, configured and used with the RTIMULib [16], which provides Kalman-filtered pose data, plays a central role in this process by significantly reducing the rotational drift and noise of the sensor values. The yaw angle exhibits minimal rotation, with a maximum difference of 0.11 degrees over a 10-minute measurement period, as indicated in Table 4, subtly impacting the vehicle's heading information. Additionally, the rotational drift for roll and pitch angles is eliminated, and noise in the acceleration values is minimized. However, it is worth noting that dead reckoning relies on the integration of incremental changes, so errors can accumulate over time. To counteract these errors, the global map can be used to relocate the vehicle based on known map features such as stop lines.
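The dead-reckoning update itself reduces to integrating the traveled distance along the current heading; a minimal sketch with assumed variable names (yaw in radians) is shown below.

```cpp
// Illustrative dead-reckoning pose update from wheel odometry and the filtered IMU yaw.
#include <cmath>

struct Pose2D {
    double x = 0.0;    // [m]
    double y = 0.0;    // [m]
    double yaw = 0.0;  // [rad]
};

// deltaDistance: distance traveled since the last update (wheel speed sensor) [m].
// yaw:           current absolute heading from the Kalman-filtered IMU data [rad].
void updatePose(Pose2D& pose, double deltaDistance, double yaw) {
    pose.yaw = yaw;
    pose.x += deltaDistance * std::cos(yaw);   // integrate incremental motion
    pose.y += deltaDistance * std::sin(yaw);
    // Errors accumulate over time; known map features (e.g., stop lines)
    // can be used to re-anchor the pose, as described above.
}
```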

Table 4 Static IMU measurements over a 10-minute period


Sensor value Start End Error
Roll [deg] 0.318 0.318 0.000
Pitch [deg] 1.210 1.210 0.000
Yaw [deg] 359.945 0.000 0.055
Acceleration (x) [m/s2 ] –0.216 –0.216 0.000
Acceleration (y) [m/s2 ] 0.029 0.049 0.020
Acceleration (z) [m/s2 ] 9.679 9.718 0.039

Figure 8 Section of the global map represented as a node graph based on [17]

Use of global map features The BFMC provides a detailed map of the track, as shown in Fig. 8, represented as a node graph in a GraphML file. Each node contains information about its global position, the lane type (dashed or solid), and its connections. For global route planning, an A* algorithm with a Euclidean distance heuristic is used, efficiently calculating the shortest route between two given map nodes. The output of the A* algorithm is a route containing node information such as ID, position, and a Boolean value indicating the lane type. The algorithm also computes the summed distances between nodes in the planned route to estimate the total length of the route.
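For illustration, a compact A* sketch over such a node graph with a Euclidean distance heuristic is shown below. The node structure is a simplified stand-in for the GraphML data, not the representation used in the stack; the accumulated cost of the goal node also gives the total route length mentioned above.

```cpp
// Illustrative A* route planning over a simplified map-node graph (Euclidean heuristic).
#include <algorithm>
#include <cmath>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

struct MapNode {
    double x, y;                 // global position [m]
    bool dashedLane;             // lane type flag from the map
    std::vector<int> neighbors;  // IDs of connected nodes
};

static double dist(const MapNode& a, const MapNode& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Returns the node IDs of the shortest route from start to goal (empty if unreachable).
std::vector<int> planRoute(const std::unordered_map<int, MapNode>& graph, int start, int goal) {
    using Entry = std::pair<double, int>;                               // (f = g + h, node id)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> open;
    std::unordered_map<int, double> g;                                  // cost so far (route length)
    std::unordered_map<int, int> parent;

    g[start] = 0.0;
    open.emplace(dist(graph.at(start), graph.at(goal)), start);

    while (!open.empty()) {
        int current = open.top().second;
        open.pop();
        if (current == goal) break;
        for (int nb : graph.at(current).neighbors) {
            double tentative = g[current] + dist(graph.at(current), graph.at(nb));
            if (!g.count(nb) || tentative < g[nb]) {
                g[nb] = tentative;
                parent[nb] = current;
                open.emplace(tentative + dist(graph.at(nb), graph.at(goal)), nb);  // Euclidean heuristic
            }
        }
    }
    if (!g.count(goal)) return {};
    std::vector<int> route;
    for (int n = goal; n != start; n = parent[n]) route.push_back(n);
    route.push_back(start);
    std::reverse(route.begin(), route.end());
    return route;   // g[goal] holds the total route length
}
```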

In addition, detected objects are subjected to further validation using the global map to ensure accuracy and reliability as follows. The position of detected objects is estimated based on the given distance to the object and the current pose of the vehicle. Subsequently, this calculated position is validated against defined ROIs, as shown in Fig. 8, which are tailored to the expected location of infrastructure elements obtained from the map. It should be noted that the ROI approach only applies to static objects, such as traffic signs, as dynamic objects, such as pedestrians, can appear anywhere on the map.

4.2 Actions
The "Actions" module serves as a fundamental component, mapping vehicle states to specific actions and performing behavior and trajectory planning. Each vehicle state triggers certain actions, such as navigating in a lane, stopping at an intersection, or parking in a free parking space. Within each action, behavior planning and trajectory planning are integrated. Behavior planning involves decision-making processes to determine the optimal course of action in the current situation. Trajectory planning focuses on calculating a path or trajectory that the vehicle should follow to perform the desired action.
For instance, in the crosswalk state/action, behavior planning involves assessing the environment, such as detecting the presence of pedestrians on the crosswalk, and determining appropriate responses. Concurrently, trajectory planning guides the vehicle through the area around the crosswalk, considering factors such as staying in the lane and adjusting speed to ensure a safe and controlled crossing.
A basic action is to navigate in a lane, which requires the calculation of the correct steering angle. Due to the relatively low vehicle speeds of 0.3 - 0.8 m/s, a simple controller approach is sufficient. The steering angle is calculated with an adaptive P-controller using the input from the lane detection module (see Sect. 3.3). Specifically, the steering angle is determined by multiplying the offset to the center of the lane, taken at a preview distance of approximately 0.3 m, by a P-value. Based on the estimated curve coefficient, the P-value is adjusted to enable tighter curves and stabilize the general steering behavior.
Finally, the "Actions" module ensures the seamless execution of the planned behavior by converting high-level actions into low-level control signals.

4.3 Command
The "Command" module is a central element in managing commands for the autonomous vehicle, overseeing the validation and control of key parameters such as steering and speed commands. This validation prevents extreme inputs, ensuring stability and control over the vehicle's behavior. Additionally, this module converts ROS messages into UART messages, facilitating efficient communication with the Nucleo board (see Fig. 2) and ensuring smooth integration into the software ecosystem.

5 Experimental evaluation
To evaluate the accuracy of our planning and controller, we compare the output of our trajectory control with the respective ground-truth trajectories in various driving scenarios. Here, the ground-truth trajectory is defined as the center of the lane in which the vehicle is driving. This evaluation is conducted within the simulator, since the exact position of our vehicle relative to the ground-truth is not obtainable in real-world driving conditions. The main metric used is the average displacement error (ADE). The ADE is sampled across the trajectories at regular intervals, providing a measure of the deviation between the planned and ground-truth paths. Specifically, at each sampling point along the trajectory, the Euclidean distance between the planned position and the corresponding ground-truth position is calculated. These positional errors are then averaged over the entire trajectory to yield the ADE.
We calculate the ADE for two different vehicle speeds in four different driving scenarios: (i) straight_line, (ii) 90_deg_turn, (iii) roundabout, (iv) rural_road, as illustrated in Fig. 9. For each scenario, three separate drives are performed and the individual ADEs are averaged. Results are given in Table 5.
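To make the metric concrete, a minimal sketch of the ADE computation described above is given below, assuming both trajectories are sampled at the same regular intervals; the data layout is illustrative.

```cpp
// Illustrative average displacement error (ADE) between driven and ground-truth trajectories.
#include <algorithm>
#include <cmath>
#include <vector>

struct Point2D { double x, y; };   // positions in meters

// Both trajectories are assumed to be sampled at the same points.
double averageDisplacementError(const std::vector<Point2D>& driven,
                                const std::vector<Point2D>& groundTruth) {
    const std::size_t n = std::min(driven.size(), groundTruth.size());
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::hypot(driven[i].x - groundTruth[i].x,
                          driven[i].y - groundTruth[i].y);   // Euclidean distance per sample
    }
    return sum / static_cast<double>(n);                     // mean over the trajectory
}
```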

Figure 9 Driving scenarios (i) straight_line, (ii) 90_deg_turn, (iii) roundabout, (iv) rural_road (left to right) used to evaluate planning and control
accuracy. Ground-truth trajectories are shown in blue. Actual vehicle trajectories at a speed of 0.8 m/s are shown in magenta

Table 5 ADE in [m] for the driving scenarios under consideration


Driving scenario Vehicle speed [m/s] Distance driven [m] Mean ADE [m]
straight_line 0.3 7.1 0.004
straight_line 0.8 7.1 0.004
90_deg_turn 0.3 3.0 0.018
90_deg_turn 0.8 3.1 0.023
roundabout 0.3 3.0 0.049
roundabout 0.8 3.1 0.064
rural_road 0.3 14.9 0.029
rural_road 0.8 15.0 0.029

Note that the ADE is consistently low in the straight line scenario for both speeds, with an average ADE of 0.004 meters, indicating high accuracy of our trajectory control in maintaining a straight path. In the 90_deg_turn scenario, the ADE increases slightly to 0.018 meters at 0.3 m/s and 0.023 meters at 0.8 m/s, suggesting that sharp turns introduce a slightly higher error margin, particularly at higher speeds. In the roundabout scenario, the ADE is higher, with values of 0.049 meters at 0.3 m/s and 0.064 meters at 0.8 m/s, highlighting the challenge of accurately navigating complex curved paths, especially at increased speeds. The rural road scenario, which involves varied and unpredictable path deviations, shows an average ADE of 0.029 meters for both speeds, indicating that our trajectory control performs reasonably well in more dynamic and less predictable situations.
A second evaluation focuses on computational runtime and overall system delays. Table 6 lists the algorithmic execution times for the main system modules. Given that our driving scenario is relatively simple from a behavior point of view, the computation times are dominated by the perception stack, particularly the neural network.

Table 6 Average compute times of system modules (without latency considerations)
Module                              Time per frame [ms]
Perception                          52
Behavior and Trajectory Planning    2
Other                               5
Overall                             59

Finally, the overall system latency, which is the time delay between an event in the scene and the initial response of our vehicle, has been estimated. This overall system latency does not only include the algorithmic execution times but also factors such as data acquisition time, inter-module communication time, network latency, and actuator response times. In our current configuration, the overall system delay is estimated to be approximately 190 ms on average, which does not impose a limit on our vehicle speeds since they are much lower than those of real-world vehicles.

6 Discussion and conclusion
This paper presented a comprehensive software stack designed for autonomous model vehicles, successfully used in the Bosch Future Mobility Challenge. It featured the implementation of a controller, advanced filters to minimize false detections, and the use of the YOLOv5s model alongside lane detection for accurate environmental perception. The coordinated approach to integrating perception, planning, and control demonstrated the system's efficiency and adaptability within the constraints of a model vehicle platform.
However, several limitations should be addressed in future work. Replacing hand-crafted filters to prevent false positives with more principled methods, such as temporal tracking or neural network uncertainty estimates, could improve detection reliability. Moving beyond the simple P-controller to advanced techniques like model predictive control would enhance trajectory tracking. Leveraging ROS capabilities for mapping, localization, and sensor fusion can boost reliability and enable more advanced autonomy features. Exploring more recent neural network models tailored for embedded devices could provide more accurate and efficient perception.
An important aspect for future exploration is the generalizability of the results. While the software stack is designed to be transferable to other competitions, empirical evidence or case studies demonstrating its successful application beyond the BFMC are currently limited. We expect to use this stack in further competitions and will gain more experience, which we plan to report in future work. In that regard, scalability is another area that needs exploration. The scalability of the software stack for larger, more complex environments or more sophisticated models is not addressed in this paper. There could be limitations when scaling up from miniature smart cities to larger or more dynamic settings.
The modular architecture offers a versatile platform for future enhancements. While it provides a solid foundation, continued research will be crucial to push model vehicle autonomy closer to that of their full-scale counterparts. Collaborative efforts between industry and academia, through challenges like the BFMC, provide an ideal testbed to rapidly advance this exciting field.

Acknowledgements
We would like to thank the Faculty of Computer Science and Engineering and the Institute for Intelligent Systems (IIS) at Esslingen University of Applied Sciences for providing us with resources for this project. They also provided us with financial support for the procurement of parts, and the trip to the student competition in Cluj, Romania. We would like to thank Robert Bosch GmbH for organizing the BFMC, the temporary loan of the model vehicle, the informative meetings with technical experts, and the accommodation during the competition. We acknowledge support by the state of Baden-Württemberg through bwHPC, which provided the necessary compute resources for model training.

Author contributions
JB, JH, NK and KO contributed equally to the design and conduct of the research and the writing of the manuscript. ME and RM supported and supervised the research and revised the manuscript. All authors read and approved the final manuscript.

Funding
No funding was received for conducting this study.

Data availability
The dataset used for training the object detection network is available at: https://universe.roboflow.com/team-driverles/bfmc-6btkg

Code availability
The source code of the presented software stack is available at: https://github.com/JakobHaeringer/BFMC_DriverlES_2023

Declarations

Competing interests
The authors declare that they have no competing interests.

Received: 21 April 2024   Revised: 5 July 2024   Accepted: 5 August 2024

References
1. Robert Bosch GmbH, Bosch Future Mobility Challenge (2023). [Online]. Available: https://boschfuturemobility.com/. Accessed 21 December 2023
2. G. Jocher, YOLOv5, Ultralytics Inc. (2020). [Online]. Available: https://docs.ultralytics.com/de/models/yolov5/. Accessed 21 December 2023
3. Raspberry Pi Ltd, Datasheet - Raspberry Pi 4 Model B (2019). [Online]. Available: https://datasheets.raspberrypi.com/rpi4/raspberry-pi-4-datasheet.pdf. Accessed 21 December 2023
4. STMicroelectronics, NUCLEO-F401RE (2023). [Online]. Available: https://www.st.com/resource/en/data_brief/nucleo-f401re.pdf. Accessed 21 December 2023
5. NVIDIA Corporation, Jetson TX2 Module (2023). [Online]. Available: https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-tx2/. Accessed 21 December 2023
6. Intel Corporation, Intel RealSense Depth Camera D435 (2023). [Online]. Available: https://www.intelrealsense.com/depth-camera-d435/. Accessed 21 December 2023
7. S. Loretz, ROS Noetic Ninjemys, Open Source Robotics Foundation Inc. (2020). [Online]. Available: https://wiki.ros.org/noetic. Accessed 04 January 2024
8. N. van der Meer, YOLOv5-TensorRT (2022). [Online]. Available: https://github.com/noahmr/yolov5-tensorrt. Accessed 02 December 2023
9. G. Jocher, YOLOv5, Ultralytics Inc. (2020). [Online]. Available: https://github.com/ultralytics/yolov5. Accessed 21 December 2023
10. A. Bochkovskiy, C.-Y. Wang, H.-Y.M. Liao, YOLOv4: Optimal Speed and Accuracy of Object Detection (2020). https://doi.org/10.48550/arXiv.2004.10934
11. R.-A. Bratulescu, R.-I. Vatasoiu, G. Sucic, S.-A. Mitroi, M.-C. Vochin, M.-A. Sachian, Object detection in autonomous vehicles, in 2022 25th International Symposium on Wireless Personal Multimedia Communications (WPMC), Herning, Denmark (2022). https://doi.org/10.1109/ICSET59111.2023.10295116

12. J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA (2016). https://doi.org/10.48550/arXiv.1506.02640
13. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679-698 (1986)
14. Y. Ding, Z. Xu, Y. Zhang, K. Su, Fast lane detection based on bird's eye view and improved random sample consensus algorithm. Multimed. Tools Appl. 76, 22979-22998 (2017)
15. J. Wang, F. Gu, C. Zhang, G. Zhang, Lane boundary detection based on parabola model, in The 2010 IEEE International Conference on Information and Automation, Harbin, China (2010). https://doi.org/10.1109/ICINFA.2010.5512219
16. RPi-Distro, RTIMULib (2015). [Online]. Available: https://github.com/RPi-Distro/RTIMULib. Accessed 19 December 2023
17. Robert Bosch GmbH, Bosch Future Mobility Challenge (2023). [Online]. Available: https://github.com/ECC-BFMC. Accessed 12 February 2024

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
