Competing With Autonomous Model Vehicles: A Software Stack

SHORT PAPER | Open Access
Abstract
This article introduces an open-source software stack designed for autonomous 1:10 scale model vehicles. Initially
developed for the Bosch Future Mobility Challenge (BFMC) student competition, this versatile software stack is
applicable to a variety of autonomous driving competitions. The stack comprises perception, planning, and control
modules, each essential for precise and reliable scene understanding in complex environments such as a miniature
smart city in the context of BFMC. Given the limited computing power of model vehicles and the necessity for
low-latency real-time applications, the stack is implemented in C++, employs YOLOv5s for environmental
perception, and leverages the state-of-the-art Robot Operating System (ROS) for inter-process communication. We
believe that this article and the accompanying open-source software will be a valuable resource for future teams
participating in autonomous driving student competitions. Our work can serve as a foundational tool for novice
teams and a reference for more experienced participants. The code and data are publicly available on GitHub.
Keywords: Autonomous model vehicle, Software architecture, Embedded real-time systems, Bosch Future Mobility
Challenge, Autonomous driving
Receiver/Transmitter (UART) protocol. Detailed explanations are provided in Sect. 4.

Acting The Nucleo board receives control signals from the main computing unit and utilizes a PID controller to adjust the actual speed based on the target speed. The steering angle is set to a target value, constrained within defined boundary conditions. Details are given in Sect. 4.
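To make the control step concrete, a minimal discrete PID update is sketched below; the gains, the steering limit, and all names are illustrative placeholders rather than the values used in the Nucleo firmware.

    #include <algorithm>

    // Minimal discrete PID step for longitudinal speed control. Gains, limits,
    // and names are illustrative placeholders, not the values used on the board.
    struct PidController {
        float kp = 1.0f, ki = 0.1f, kd = 0.01f; // placeholder gains
        float integral = 0.0f;                  // accumulated error
        float prevError = 0.0f;

        // target and actual speed in m/s, dt in seconds; returns the actuator command
        float step(float target, float actual, float dt) {
            float error = target - actual;
            integral += error * dt;
            float derivative = (error - prevError) / dt;
            prevError = error;
            return kp * error + ki * integral + kd * derivative;
        }
    };

    // The steering angle is simply clamped to the mechanical limits of the
    // vehicle (the limit below is a made-up placeholder value).
    float clampSteering(float angleDeg, float maxDeg = 22.5f) {
        return std::max(-maxDeg, std::min(angleDeg, maxDeg));
    }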
3 Perception
This section examines the design of our perception system, covering camera setup, object detection, performance optimization on the Raspberry Pi, and lane detection algorithms, all tailored towards the BFMC environment. Key points include sensor choice and camera alignment, dataset creation, neural network selection, architecture and training, task parallelization, and efficiency enhancement methods. Lane detection is addressed through preprocessing steps and histogram-based techniques. This overview aims to provide a clear understanding of the systems behind the functioning of autonomous model vehicles, especially in competitive settings like the BFMC.
3.1 Camera setup
The Intel RealSense camera utilized in this system features an RGB sensor with resolutions up to 1920 × 1080 px [6]. To balance performance and processing speed, we operate the camera at a resolution of 960 × 540 pixels at a sampling rate of 30 Hz. The depth images are spatially aligned with the RGB color images. The RGB module is positioned on the left side of the camera, providing comprehensive capture on the left side. To ensure traffic signs on the right side are detected at shorter distances, the camera has been rotated accordingly.
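For illustration, this camera configuration maps to a few librealsense2 calls; the sketch below assumes the standard SDK API and a typical depth-stream resolution (848 × 480, not stated in the text) and is not an excerpt from our code.

    #include <librealsense2/rs.hpp>

    int main() {
        rs2::pipeline pipe;
        rs2::config cfg;
        // RGB stream at the reduced 960 x 540 resolution, 30 Hz
        cfg.enable_stream(RS2_STREAM_COLOR, 960, 540, RS2_FORMAT_BGR8, 30);
        // depth stream (848 x 480 is an assumption)
        cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 30);
        pipe.start(cfg);

        // Align depth pixels to the color image, as described in Sect. 3.1
        rs2::align align_to_color(RS2_STREAM_COLOR);
        for (int i = 0; i < 300; ++i) {
            rs2::frameset frames = pipe.wait_for_frames();
            frames = align_to_color.process(frames);
            rs2::video_frame color = frames.get_color_frame();
            rs2::depth_frame depth = frames.get_depth_frame();
            // ... hand the aligned pair to lane and object detection ...
        }
    }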
3.2 Object detection
Dataset A test track was constructed to evaluate the software stack and capture images for training the object detection network (see Fig. 4). The test track was designed to closely follow the rules of the BFMC [1] and includes a roundabout, two intersections, and a parking lot. It is complemented by signs with 3D-printed poles and two pedestrian dummies. Although the test track is sufficient to analyze most scenarios, it is approximately four times smaller than the original competition track.
The dataset used to train the object recognition model consists of 4665 images captured while driving on the test track and during the competition. Additionally, 774 images from videos provided by Bosch were included. These images were taken from vehicles in previous competitions, using different cameras, resolutions, and aspect ratios. Despite these variations, incorporating these images improved perception in scenes that were difficult to recreate on our own test track, such as motorway exits.
Overall, the model detects 15 classes, including crosswalks, stop signs, and priority signs. Additionally, dynamic traffic participants (e.g., cars and pedestrians) and static obstacles are recognized. The model also identifies stop lines and parking spaces for junctions and parking situations.
Model selection Implementing an efficient and robust object detection system is paramount in the development of autonomous model vehicles for competitions such as the BFMC. One of the critical tasks in this domain is the identification of obstacles, paths, and other relevant environmental features. After considering various algorithms, YOLOv5s [2, 8], a variant of the YOLO family of object detection models, was selected for this purpose due to its strengths and suitability for the specific requirements of the 1:10 scale autonomous model vehicle.
YOLOv5s, as the second smallest and fastest model in the YOLOv5 series, offers a balance of speed and accuracy, making it suitable for real-time object detection in resource-constrained environments like model vehicles [9]. The model's architecture, building upon the advancements of its predecessors, incorporates several features that meet the high-performance demands of autonomous navigation while remaining computationally efficient [10]. It includes optimizations like anchor box adjustments and advanced training techniques [11], making it suitable for real-time object detection in autonomous vehicles. Its ability to detect objects of various sizes and under different conditions is crucial for the safety and reliability of autonomous driving systems [12].
In addition to these technical advantages, the widespread adoption and active development community surrounding the YOLO family of models provide resources for support and further enhancements. The availability of pre-trained models, extensive documentation, and a large user community contribute to the development process and facilitate the implementation of advanced features and improvements.
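To illustrate how a YOLOv5s network can be queried from C++, the sketch below pushes an ONNX export of the model through OpenCV's DNN module (loaded with cv::dnn::readNetFromONNX) and decodes the raw output rows. The vehicle itself uses a TensorRT-based wrapper [8], so this should be read as a generic example rather than our deployment code.

    #include <opencv2/dnn.hpp>
    #include <vector>

    struct Detection { cv::Rect box; int classId; float confidence; };

    // Simplified YOLOv5s inference via an ONNX export (illustrative only).
    // Boxes are returned in the 640 x 640 network input scale and still need
    // non-maximum suppression (e.g. cv::dnn::NMSBoxes) and rescaling to the
    // camera frame.
    std::vector<Detection> detect(cv::dnn::Net& net, const cv::Mat& bgr,
                                  float confThresh = 0.4f) {
        cv::Mat blob = cv::dnn::blobFromImage(bgr, 1.0 / 255.0, cv::Size(640, 640),
                                              cv::Scalar(), /*swapRB=*/true);
        net.setInput(blob);
        cv::Mat out = net.forward();           // 1 x 25200 x (5 + numClasses)
        out = out.reshape(1, out.size[1]);     // one row per candidate box

        std::vector<Detection> dets;
        for (int i = 0; i < out.rows; ++i) {
            const float* row = out.ptr<float>(i);
            float objectness = row[4];
            if (objectness < confThresh) continue;
            int classId = 0;
            float best = 0.0f;                 // pick the best class score
            for (int c = 5; c < out.cols; ++c)
                if (row[c] > best) { best = row[c]; classId = c - 5; }
            float cx = row[0], cy = row[1], w = row[2], h = row[3];
            dets.push_back({cv::Rect(int(cx - w / 2), int(cy - h / 2), int(w), int(h)),
                            classId, objectness * best});
        }
        return dets;
    }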
Methodology The training of YOLOv5s involved optimizing parameters such as the number of epochs, batch size, and the use of autobatching. The performance was evaluated based on four key metrics: box loss, object loss, class loss, and Mean Average Precision (MAP). These metrics collectively offer insights into the model's accuracy, reliability, and efficiency in detecting and classifying objects.
Box loss emphasizes the spatial accuracy of object detection, measuring the precision of the predicted bounding boxes against the ground-truth boxes. Object loss addresses the discernment between objects and non-objects, evaluating the model's ability to detect and distinguish objects from the background. Class loss measures the accuracy of categorizing detected objects into the correct categories. A well-trained model should ideally have low scores on all three types of losses, indicating high precision and accuracy in both detecting and classifying objects in images.
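The box loss builds on an intersection-over-union style overlap between the predicted and ground-truth boxes (YOLOv5 refines this with a CIoU formulation); the plain IoU below is only meant to convey the underlying idea.

    #include <algorithm>

    // Plain intersection-over-union between two axis-aligned boxes; the box
    // loss used during training is a refinement of this overlap measure.
    struct Box { float x1, y1, x2, y2; };

    float iou(const Box& a, const Box& b) {
        float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
        float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
        float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
        float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
        float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
        return inter / (areaA + areaB - inter + 1e-9f);
    }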
Figure 4 Test track at the Institute for Intelligent Systems at Esslingen University of Applied Sciences
Table 1 Parameters, validation losses, and MAP of different training / validation runs
Epochs Batch-size Box loss Object loss Class loss MAP
75 64 0.0121 0.0040 0.0004 0.8653
75 32 0.0244 0.0064 0.0009 0.6657
100 32 0.0129 0.0039 0.0005 0.8638
100 autobatch 0.0156 0.0049 0.0008 0.8194
300 64 0.0123 0.0040 0.0005 0.8713
300 32 0.0123 0.0043 0.0005 0.8696
500 32 0.0113 0.0044 0.0009 0.8688
Training This section focuses on the different training parameters and the definition of various losses regarding the YOLOv5 object detection model. Table 1 provides a brief overview of the performance differences resulting from the parameter adjustments.
The analysis of various training configurations of YOLOv5s for the BFMC underscores the importance of carefully selecting training parameters. The configuration with 300 epochs and a batch size of 64 emerged as the most effective, striking an optimal balance between training duration and model performance.
This setup not only achieved the highest MAP but also maintained low loss values, making it the preferred choice for tasks requiring high precision in object detection, such as detecting small model cars. This insight can guide future training procedures in similar applications, emphasizing the need for a balanced approach to training deep learning models.

Filtering of misdetections In addition to missing known objects (false negatives), recognizing false objects (false positives) is also a significant problem in the perception of neural networks. These misdetections can lead to incorrect reactions of the model vehicle, so detections are filtered using prior scene knowledge before being forwarded to behavior planning. To analyze the effectiveness of the filters in more detail, the number of detections removed during a drive on the test track was recorded. The applied filters and the proportion of valid detections that passed our filters are shown in Fig. 5. The filter functions are applied sequentially from top to bottom.
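Conceptually, the filter stage is a short chain of plausibility checks applied to every detection; the sketch below uses hypothetical fields and thresholds, since the concrete rules are those summarized in Fig. 5.

    #include <functional>
    #include <vector>

    // Hypothetical detection record; the real message carries class, distance,
    // position, and size information.
    struct RawDetection { int classId; float distanceM; float heightM; };

    using Filter = std::function<bool(const RawDetection&)>;

    // The filters run sequentially from top to bottom; the first failing check
    // rejects the detection before it reaches behavior planning.
    std::vector<RawDetection> applyFilters(const std::vector<RawDetection>& input,
                                           const std::vector<Filter>& filters) {
        std::vector<RawDetection> valid;
        for (const auto& det : input) {
            bool ok = true;
            for (const auto& f : filters)
                if (!f(det)) { ok = false; break; }
            if (ok) valid.push_back(det);
        }
        return valid;
    }

    // Example plausibility rules with made-up thresholds.
    const std::vector<Filter> kFilters = {
        [](const RawDetection& d) { return d.distanceM < 3.0f; }, // implausibly far away
        [](const RawDetection& d) { return d.heightM < 0.4f; },   // too large for 1:10 scale
    };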
Figure 6 Preprocessing steps of the lane detection: a) cropped image, b) BEV representation, c) masked image
Figure 7 Successive steps for the determination of the vehicle’s offset from the center of the roadway and the roadway’s curvature
cle’s perspective. The histogram provides a statistical rep- the center coordinates of the next search box, as visualized
resentation of the intensity distribution across the image in Fig. 7b.
columns, allowing the identification of prominent peaks After iterating through the complete image, the row and
that correspond to lane markings. This information serves column values identified from the histogram peaks in each
as crucial input to the subsequent lane detection stages search box are preserved. To approximate the smooth tra-
(see Fig. 7). jectory of lane markings and eliminate outliers, we fit a
Together, these preprocessing steps prepare the input quadratic parabola to the accumulated x-y pixel pairs us-
image for the lane detection algorithm, effectively trans- ing a least-squares fitting approach, similar to the method
forming the raw camera data into a format suitable for ro- described in [15] and shown in Fig. 7c. Curve fitting pro-
bust and accurate lane detection, similar to the approach vides a more reliable representation of lane boundaries by
used in [14]. smoothing out fluctuations in the x-y-centers of the search
boxes and capturing the underlying curvature of the lane
markings.
Detection During the second stage, the algorithm iden-
The fitted parabola is then used to calculate the lane tra-
tifies the lane markings on the road. Utilizing the origins
jectory, providing necessary information about the vehi-
of the lane markings at the bottom of the image, identified
cle’s lane position and potential departures from the lane.
through the preprocessing routine, the algorithm scans the
This information is essential for enabling the vehicle to
binary BEV image from bottom to top in search of lane
maintain its position within the lane and prevent lane de-
markings. This scanning process employs search boxes partures, a critical safety feature for autonomous vehicles.
(see Fig. 7a) that start at the identified origins and serve The lane detection algorithm outputs the lane boundaries
two primary purposes. First, implementing a search box and their curvature, as well as the vehicle’s offset from the
reduces computational requirements by limiting the scope center of the roadway.
of pixel analysis. Second, using a search box decreases the
likelihood of misinterpreting a single pixel and minimizes 3.4 Performance optimization
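The parabola fit can be condensed into a short least-squares routine; the sketch below (using OpenCV's cv::solve, with illustrative names) mirrors the description above rather than reproducing the repository code.

    #include <opencv2/core.hpp>
    #include <vector>

    // Fit x = a*y^2 + b*y + c to the (x, y) centers collected from the search
    // boxes in a least-squares sense; returns the coefficients (a, b, c).
    cv::Vec3f fitLaneParabola(const std::vector<cv::Point2f>& centers) {
        cv::Mat A(static_cast<int>(centers.size()), 3, CV_32F);
        cv::Mat b(static_cast<int>(centers.size()), 1, CV_32F);
        for (int i = 0; i < A.rows; ++i) {
            float y = centers[i].y;            // image row
            A.at<float>(i, 0) = y * y;
            A.at<float>(i, 1) = y;
            A.at<float>(i, 2) = 1.0f;
            b.at<float>(i, 0) = centers[i].x;  // column of the lane marking
        }
        cv::Mat coeffs;
        cv::solve(A, b, coeffs, cv::DECOMP_SVD);   // least-squares solution
        return {coeffs.at<float>(0), coeffs.at<float>(1), coeffs.at<float>(2)};
    }

Evaluating the fitted polynomial at a fixed look-ahead row then yields the lateral offset from the lane center, and its coefficients provide an estimate of the curvature.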
3.4 Performance optimization
As discussed in Sect. 2, the TX2 runs the entire perception stack, fetching camera images, detecting lanes and objects, and communicating the results to the Raspberry Pi. While image retrieval and alignment must be carried out sequentially before processing, lane and object detection can be parallelized (a minimal sketch of this dispatch follows below). With sequential execution, the entire perception module requires 64 ms from image retrieval to message transmission, as shown in Table 2. When lane and
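A minimal way to dispatch the two detectors concurrently is std::async; the types and function names below are placeholders for the actual lane detection and YOLOv5s inference entry points.

    #include <functional>
    #include <future>

    struct Frame { /* aligned RGB and depth data */ };
    struct LaneResult { /* lane boundaries, curvature, offset */ };
    struct ObjectResult { /* filtered detections */ };

    // Placeholder entry points; on the vehicle these would wrap the lane
    // detection pipeline and the YOLOv5s inference, respectively.
    LaneResult detectLanes(const Frame& f);
    ObjectResult detectObjects(const Frame& f);
    void publishResults(const LaneResult& l, const ObjectResult& o);

    // Dispatch both detectors concurrently once the aligned image pair is ready.
    void processFrame(const Frame& frame) {
        auto lanes   = std::async(std::launch::async, detectLanes, std::cref(frame));
        auto objects = std::async(std::launch::async, detectObjects, std::cref(frame));
        publishResults(lanes.get(), objects.get());   // join both futures
    }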
Figure 8 Section of the global map represented as a node graph based on [17]
Use of global map features The BFMC provides a detailed map of the track, as shown in Fig. 8, represented as a node graph in a GraphML file. Each node contains information about its global position, the lane type (dashed or solid), and its connections. For global route planning, an A∗ algorithm with a Euclidean distance heuristic is used, efficiently calculating the shortest route between two given map nodes. The output of the A∗ algorithm is a route containing node information such as ID, position, and a Boolean value indicating the lane type. The algorithm also computes the summed distances between nodes in the planned route to estimate the total length of the route.
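The global planner described above is a textbook A∗ search over the map graph; the sketch below uses a Euclidean heuristic and hypothetical node and graph containers rather than the exact structures parsed from the GraphML file, and it assumes the edge cost is the straight-line distance between connected nodes.

    #include <cmath>
    #include <functional>
    #include <queue>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct MapNode { int id; float x, y; std::vector<int> neighbors; };

    // Straight-line distance, used both as edge cost and as admissible heuristic.
    static float euclidean(const MapNode& a, const MapNode& b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    }

    // A* over the global map graph; returns the node IDs of the shortest route.
    std::vector<int> planRoute(const std::unordered_map<int, MapNode>& graph,
                               int startId, int goalId) {
        using Entry = std::pair<float, int>;   // (f-score, node id)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
        std::unordered_map<int, float> g;      // cost from start (route length so far)
        std::unordered_map<int, int> parent;

        g[startId] = 0.0f;
        open.push({euclidean(graph.at(startId), graph.at(goalId)), startId});

        while (!open.empty()) {
            int current = open.top().second;
            open.pop();
            if (current == goalId) break;
            for (int nb : graph.at(current).neighbors) {
                float tentative = g[current] + euclidean(graph.at(current), graph.at(nb));
                if (!g.count(nb) || tentative < g[nb]) {
                    g[nb] = tentative;
                    parent[nb] = current;
                    open.push({tentative + euclidean(graph.at(nb), graph.at(goalId)), nb});
                }
            }
        }

        std::vector<int> route;                // backtrack from goal to start
        if (startId != goalId && !parent.count(goalId)) return route;  // unreachable
        for (int n = goalId; ; n = parent[n]) {
            route.insert(route.begin(), n);
            if (n == startId) break;
        }
        return route;
    }

The accumulated cost of the goal node also yields the summed route length mentioned above.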
In addition, detected objects are subjected to further validation using the global map to ensure accuracy and reliability as follows. The position of detected objects is estimated based on the given distance to the object and the
Figure 9 Driving scenarios (i) straight_line, (ii) 90_deg_turn, (iii) roundabout, (iv) rural_road (left to right) used to evaluate planning and control
accuracy. Ground-truth trajectories are shown in blue. Actual vehicle trajectories at a speed of 0.8 m/s are shown in magenta
6 Discussion and conclusion
This paper presented a comprehensive software stack designed for autonomous model vehicles, successfully used in the Bosch Future Mobility Challenge. It featured the implementation of a controller, advanced filters to minimize false detections, and the use of the YOLOv5s model alongside lane detection for accurate environmental perception. The coordinated approach to integrating perception, planning, and control demonstrated the system's efficiency and adaptability within the constraints of a model vehicle platform.
However, several limitations should be addressed in future work. Replacing hand-crafted filters to prevent false positives with more principled methods, such as temporal tracking or neural network uncertainty estimates, could improve detection reliability. Moving beyond the simple P-controller to advanced techniques like model predictive control would enhance trajectory tracking. Leveraging ROS capabilities for mapping, localization, and sensor fusion can boost reliability and enable more advanced autonomy features. Exploring more recent neural network models tailored for embedded devices could provide more accurate and efficient perception.
An important aspect for future exploration is the generalizability of the results. While the software stack is designed to be transferable to other competitions, empirical evidence or case studies demonstrating its successful application beyond the BFMC are currently limited. We expect to use this stack in further competitions and will gain more experience, which we plan to report in future work.
In that regard, scalability is another area that needs exploration. The scalability of the software stack for larger, more complex environments or more sophisticated models is not addressed in this paper. There could be limitations when scaling up from miniature smart cities to larger or more dynamic settings.
The modular architecture offers a versatile platform for future enhancements. While it provides a solid foundation, continued research will be crucial to push model vehicle autonomy closer to that of their full-scale counterparts. Collaborative efforts between industry and academia,

Author contributions
JB, JH, NK and KO contributed equally to the design and conduct of the research and the writing of the manuscript. ME and RM supported and supervised the research and revised the manuscript. All authors read and approved the final manuscript.

Funding
No funding was received for conducting this study.

Data availability
The dataset used for training the object detection network is available at: https://universe.roboflow.com/team-driverles/bfmc-6btkg

Code availability
The source code of the presented software stack is available at: https://github.com/JakobHaeringer/BFMC_DriverlES_2023

Declarations

Competing interests
The authors declare that they have no competing interests.

Received: 21 April 2024   Revised: 5 July 2024   Accepted: 5 August 2024

References
1. Robert Bosch GmbH, Bosch Future Mobility Challenge (2023). [Online]. Available: https://boschfuturemobility.com/. Accessed 21 December 2023
2. G. Jocher, YOLOv5, Ultralytics Inc. (2020). [Online]. Available: https://docs.ultralytics.com/de/models/yolov5/. Accessed 21 December 2023
3. Raspberry Pi Ltd, Datasheet - Raspberry Pi 4 Model B (2019). [Online]. Available: https://datasheets.raspberrypi.com/rpi4/raspberry-pi-4-datasheet.pdf. Accessed 21 December 2023
4. STMicroelectronics, NUCLEO-F401RE (2023). [Online]. Available: https://www.st.com/resource/en/data_brief/nucleo-f401re.pdf. Accessed 21 December 2023
5. NVIDIA Corporation, Jetson TX2 Module (2023). [Online]. Available: https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-tx2/. Accessed 21 December 2023
6. Intel Corporation, Intel RealSense Depth Camera D435 (2023). [Online]. Available: https://www.intelrealsense.com/depth-camera-d435/. Accessed 21 December 2023
7. S. Loretz, ROS Noetic Ninjemys, Open Source Robotics Foundation Inc. (2020). [Online]. Available: https://wiki.ros.org/noetic. Accessed 04 January 2024
8. N. van der Meer, YOLOv5-TensorRT (2022). [Online]. Available: https://github.com/noahmr/yolov5-tensorrt. Accessed 02 December 2023
9. G. Jocher, YOLOv5, Ultralytics Inc. (2020). [Online]. Available: https://github.com/ultralytics/yolov5. Accessed 21 December 2023
10. A. Bochkovskiy, C.-Y. Wang, H.-Y.M. Liao, YOLOv4: optimal speed and accuracy of object detection (2020). https://doi.org/10.48550/arXiv.2004.10934
11. R.-A. Bratulescu, R.-I. Vatasoiu, G. Sucic, S.-A. Mitroi, M.-C. Vochin, M.-A. Sachian, Object detection in autonomous vehicles, in 2022 25th
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.