Automated Vehicle Parking Occupancy Detection in Real-Time
Abstract—Parking occupancy detection systems help to identify the available parking spaces and direct vehicles efficiently to unoccupied lots, reducing time and energy. This paper presents an approach for the design and development of an end-to-end automated vehicle parking occupancy detection system. The novelty of this study lies in the methodology followed for the object detection process, which uses the RetinaNet one-stage detector and a region-based convolutional neural network deep learning technique. The proposed software architecture consists of loosely coupled components that support scalability and reliability. The developed web-based and mobile-based client applications help users find parking spaces easily and efficiently. The existing solutions utilize dedicated sensors and depend on manual segmentation of surveillance footage to detect the state of parking spaces. The proposed approach eliminates these limitations while maintaining reasonable accuracy.

Keywords—Object detection, computer vision, cloud computing, microservice architecture

I. INTRODUCTION

With the growing emphasis on global sustainability, there is a need to utilize land space for vehicle parking efficiently. The demand for parking lots is further compounded by the fact that the available parking spaces in cities decrease with the growing demand for land. Thus, an automated system that can efficiently guide drivers to parking spaces is a timely need. Generally, an automated parking occupancy detection system (APODS) determines the state of a parking space, that is, whether it is occupied or free. The process should support reliable identification irrespective of external constraints such as vehicle type, lighting, or weather conditions. Moreover, the solution should be implemented with low installation and maintenance cost. However, most of the existing solutions depend on dedicated devices such as ultrasonic or magnetic sensors to identify the state of parking spaces, with additional installation and maintenance cost [1]–[3]. Other solutions [4]–[8] use surveillance footage from existing surveillance systems instead of dedicated sensors. However, they require manually segmenting the video stream into parking spaces, which is time-consuming.

Yusnita et al. [9] have addressed the issue of manual segmentation using markings in parking spaces. However, this approach required the camera to be mounted in a top-down position and the parking spaces to have specific markings. These existing issues motivated us to develop an APODS that provides reliable results in different conditions and at lower cost. Since surveillance cameras are usually available in parking spaces, they can be used instead of costly dedicated sensors.

This paper presents an approach to design and develop an APODS. Recent advancements in deep learning-based object detection techniques, such as RetinaNet and Faster R-CNN [10], [11], have led to systems that work in different weather conditions and identify objects with high accuracy. As a novel method, we consider two models to develop the proposed APODS: RetinaNet [10] and Faster R-CNN [11], a region-based technique that relies on the bounding box and the object label. These two models were the state-of-the-art single-stage and multi-stage object detectors, respectively, in terms of accuracy at the start of our research. The single-stage detector has faster inference and relatively lower computational cost, while a two-stage (region-proposal) detector trades them for relatively higher accuracy [12].

The approach for parking occupancy detection is chosen based on factors such as the deployment requirements, computational limitations, cost and response time. To satisfy the objective of supporting a commercially viable APODS that identifies and navigates to parking spaces, we have paid attention to the usability of the system. This study presents both a web application and a mobile application with a microservice architecture-based back end deployed with Kubernetes, which is a container orchestration service. This approach generalizes the APODS solution to different camera angles without depending on a specific marking. The performance is assessed using the PKLot dataset [13], which has 695,899 annotated parking spaces. The paper is structured as follows. Section II gives an overview of parking occupancy detection, the dataset and the object detection process. Section III, Section IV and Section V describe the design methodology, results and discussion, respectively. Finally, Section VI concludes the paper.

II. BACKGROUND

A. Parking Occupancy Detection

The parking occupancy detection problem is to identify whether a given parking space is currently occupied or not. This can be supported with visual and non-visual solutions. Visual solutions use camera footage, usually from the surveillance cameras that are common in parking lots. Non-visual methods use sensors other than cameras, such as magnetic field sensors, to identify the occupancy of a parking space [2].
Some approaches use visual sensors in conjunction with magnetic sensors [1]. This is due to the higher energy consumption of magnetic sensors compared with visual sensors. They use the visual sensor to enable the magnetic sensor, which then detects the occupancy of a parking lot. Another combined sensor approach is to use magnetic and distance sensors, which compensates for the small magnetic footprint of certain vehicles [3]. This technique has shown over 99% accuracy. However, the issue is the need for a dedicated sensor for each parking space, leading to additional installation and maintenance cost. Further, these sensors can be affected by conditions such as water and oil in parking lots.

Another common approach to detect parking occupancy is the use of surveillance cameras that are already installed in the parking spaces [14]. This removes the need for the additional sensors required by non-visual methods. However, it is challenging to use computer vision methods to obtain robust results with diverse weather and lighting conditions, and vehicle types. Further, a parked vehicle could obstruct the view of other parked vehicles and visual features.

Many existing solutions consider the location of the parking spaces as a constant factor in each image. They segment the image into parking spaces manually and classify these segments as occupied or not. An improved colour vector feature-based Support Vector Machine (SVM) approach is used by Wu et al. [15]. They classified the state of three neighbouring parking spots as a unit, thus reducing the effect of occlusion. Huang et al. [16] have presented a hierarchical Bayesian generation framework, which is more robust than the previous SVM-based solutions. Moreover, Yusnita et al. [9] have proposed a solution that does not depend on manual segmentation of the image into parking spots. However, it requires a top-down view of the parking space.

Amato et al. [8] have proposed a deep learning solution built using Convolutional Neural Network (CNN) classifiers and showed an accuracy of over 90% on the PKLot [13] dataset. Later, an end-to-end solution with a camera system and front-end application was proposed by Valipour et al. [6]. This solution reported an AUC over 0.99 for the PKLot dataset. Moreover, Ahmad et al. [17] used M-RCNN (Mask Region-based Convolutional Neural Networks) to match vehicle-like objects to predefined parking spaces in order to detect the availability of parking spaces. However, all these solutions were based on manual segmentation of the image into parking spots. Further, there is a lack of research that analyses the effectiveness of using object detection as a solution to detect available parking spaces with no prior manual input.

B. PKLot Dataset

This study used the PKLot [13] dataset for training and evaluation of the inference service. This dataset contains 12,417 images taken from 2 parking lots, one at the Federal University of Parana (UFPR) and the other at the Pontifical Catholic University of Parana (PUCPR), both located in Curitiba, Brazil. Among all the images there are 695,899 parking spaces labelled as occupied or not. 337,780 (48.54%) instances are marked as occupied while 358,119 (51.46%) instances are marked as unoccupied. Thus, the two classes are nearly balanced and the dataset does not bias the learning model towards either class. The camera views are labelled as PUCPR, UFPR04 and UFPR05, as shown in Fig. 1, left, middle, right, respectively. For a given lot, the same set of parking spaces is annotated in all the images.

C. Object Detection

Object detection is defined as detecting the instances of semantic objects in a given image. The main types of solutions can be named as classic object detectors, two-stage detectors and single-stage detectors [10]. Classic object detectors use the sliding window method, where a classifier is applied to a region selected by a sliding window. This classifier could be a CNN or a traditional machine learning method such as SVM.

Several methods have been used for two-stage object detection [10]. The first stage generates a sparse set of candidate proposals that should contain all the objects. The second stage classifies the proposals into the foreground and background classes. R-CNN [18] applies a CNN to classify the proposals, and a series of improvements followed. Faster R-CNN [11] is the latest of these techniques. The state-of-the-art two-stage methods outperform all other methods in standardized object detection tests such as Common Objects in Context (COCO) [19]. Pascal VOC is another standardized image dataset for object detection, with a different format for the bounding box. Speed is the main issue of two-stage detectors compared with single-stage detectors. They are comparatively slower than single-stage detectors in both inference and training. Usually, this time is spent on the region proposal stage, which is CPU bound in many cases [11]. In Faster R-CNN this issue was addressed by the creation of the Region Proposal Network (RPN). This allowed Faster R-CNN to be run entirely on a GPU or an accelerator for deep neural networks, making it faster and more accurate than the rest of the R-CNN family.

Single-stage detectors combine both region proposal and classification tasks into a single neural network that can be trained end-to-end. Though the performance of this method is inferior to two-stage methods in terms of accuracy, recent developments in new architectures such as YOLO [20] and RetinaNet [10] have made them competitive with two-stage detectors. A limitation of single-stage detectors compared with two-stage detectors was that many of the generated candidate proposals belong to the background. This class imbalance issue was addressed to an extent in RetinaNet using a new loss function called focal loss [10], which allowed RetinaNet to be comparable to, and in some cases beat, two-stage detectors.
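The effect of the focal loss on the background-heavy imbalance described above can be illustrated with a minimal sketch. It assumes the binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) from [10], with gamma = 2 and alpha = 0.25 used only as example settings. The function names are illustrative and are not taken from any implementation used in this study.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy for one prediction.

    p is the predicted foreground probability in (0, 1); y is 1 for
    foreground and 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma.

    gamma and alpha are example settings (assumptions for this sketch).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy background candidate (confidently rejected) contributes
    # almost nothing, while a hard foreground candidate keeps a large loss.
    for name, p, y in [("easy background", 0.01, 0), ("hard foreground", 0.10, 1)]:
        print(f"{name}: CE={cross_entropy(p, y):.5f}  FL={focal_loss(p, y):.7f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check for a sketch like this.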
Fig. 1. Sample images with annotations. (left) PUCPR, (middle) UFPR04, (right) UFPR05
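The class shares reported for the PKLot labels in Section II-B can be rechecked with a few lines of arithmetic. The sketch below only uses the counts stated in the text; the function name `class_shares` is illustrative.

```python
def class_shares(counts: dict) -> dict:
    """Return each class's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Label counts as stated for the PKLot dataset in Section II-B.
pklot_counts = {"occupied": 337_780, "unoccupied": 358_119}

if __name__ == "__main__":
    print(sum(pklot_counts.values()))  # 695899
    for label, share in class_shares(pklot_counts).items():
        print(f"{label}: {share:.2f}%")
```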
Authorized licensed use limited to: Qatar University. Downloaded on March 03,2024 at 08:13:57 UTC from IEEE Xplore. Restrictions apply.
Moratuwa Engineering Research Conference (MERCon) 2020
In summary, the two-stage method predicts the object locations by filtering the background and then classifies the objects. Hence, the R-CNN family supports a high accuracy level. However, it is computationally expensive for real-time object detection. On the other hand, single-stage methods, which are recent state-of-the-art object detection techniques using deep learning, support fast processing and are hence suitable for real-time object detection [12]. However, their accuracy is low compared to two-stage methods. Thus, the single-stage RetinaNet uses the focal loss, which rescales the loss function, to improve the accuracy [10].

In many existing studies, the objects to be detected belong to the foreground of the image. However, in the parking occupancy detection scenario, the unoccupied parking space instances are part of the background. For this study, a single-stage detector, RetinaNet [10], and a two-stage detector, Faster R-CNN [11], were selected for the inference service. These detectors have been used for similar tasks such as road marking detection [21] and road damage detection [22] in images. At the time of this study, they were the state-of-the-art single-stage and two-stage detectors with the highest average precision scores on the COCO dataset [19], which is typically used as the benchmark for object detection.

III. SYSTEM DESIGN AND IMPLEMENTATION

Initially, the surveillance camera sends the video stream to the video sampling service, which saves images to the persistent storage. Data such as the parking lot identifier are retrieved from the database. The inference function is triggered by each new write to the storage. It extracts attributes such as the lot ID and sends them along with the image ID to the REST endpoint of Cloud Run. Cloud Run then creates an instance that downloads the image, runs inference on it, finds the number of parking spaces and updates the database. When a client application needs the parking space information, it calls the client service endpoints, which retrieve the data from the shared database.

The advantages of a microservice architecture are scalability and reliability [23]. Each component functions as a parallel process in a separate and independent compute environment, so the video sampling and inference processes execute independently of each other. Thus, the failure of a single component has a minimal impact on the rest of the system. Additionally, each component can be restarted and modified without affecting the rest of the modules, as the system is designed with a microservice architecture and Docker containers running on Google Kubernetes Engine. This also allows the system to be updated in both hardware and software to meet the requirements. For instance, computational resources for the inference service can be upgraded without affecting any other component.

A. Video Sampling Service

The workflow of the video sampling service is shown in Fig. 3. When a parking lot provider is registered, they are given a URL to stream their surveillance footage and the necessary credentials to authenticate. For each video stream, a separate sampling node is created (if the pre-set maximum is not reached) using Google Kubernetes Engine. As a result, if a processing node fails, it does not affect the rest of the nodes. Thus, the system can be scaled automatically based on the number of video streams it receives.

The video sampler samples the video stream at a fixed rate and writes the sampled images to the shared persistent storage. The directory structure of the storage indicates the parking lot to which an image belongs, and the file name gives the time stamp. When the image is written, it triggers a Firebase function, which calls the inference service with the image and the parking lot identifier.

B. Inference Service

Fig. 4 shows the workflow of the inference service module. The inference service is hosted on Google Cloud Run. When this service receives an image, it creates a new processing instance (if the pre-set maximum is not reached), which is recycled after running inference on the image.
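The hand-off from a storage write to an occupancy update can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed implementation: the path layout `<lot_id>/<YYYYmmddTHHMMSS>.jpg`, the label names `occupied` and `unoccupied`, the `(label, score)` detection format, the 0.5 score threshold, and all function names are hypothetical. The detector itself is left out; its output is passed in as plain data.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SampledImage:
    lot_id: str
    captured_at: datetime


def parse_storage_path(path: str) -> SampledImage:
    """Recover the lot identifier and time stamp from a storage path.

    Assumed layout (hypothetical): the parent directory names the parking
    lot and the file stem is a time stamp such as 20200728T101500.
    """
    p = PurePosixPath(path)
    captured_at = datetime.strptime(p.stem, "%Y%m%dT%H%M%S")
    return SampledImage(lot_id=p.parent.name, captured_at=captured_at)


def count_spaces(detections, min_score: float = 0.5) -> dict:
    """Count detections per class, ignoring low-confidence ones.

    detections is an iterable of (label, score) pairs; labels other than
    the two assumed class names are skipped.
    """
    counts = {"occupied": 0, "unoccupied": 0}
    for label, score in detections:
        if score >= min_score and label in counts:
            counts[label] += 1
    return counts


def handle_new_image(path: str, detections) -> dict:
    """Build the record that would be written to the shared database."""
    image = parse_storage_path(path)
    record = {"lot_id": image.lot_id, "captured_at": image.captured_at.isoformat()}
    record.update(count_spaces(detections))
    return record


if __name__ == "__main__":
    example = [("occupied", 0.93), ("unoccupied", 0.81), ("unoccupied", 0.30)]
    print(handle_new_image("lots/lot-42/20200728T101500.jpg", example))
```

Keeping the path parsing and the counting as separate pure functions mirrors the separation between the sampling and inference components, and lets each step be tested without a storage bucket or a model.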
Fig. 5. Web application map view

Similarly, the mobile application shows the user’s location and the parking lots within proximity. The user can also drag a map marker to view parking lots in any desired area. Clicking on a location and choosing to show the path generates a path to the parking lot, as shown in Fig. 6. The directions to that location are displayed in the bottom right corner. The application also provides real-time notifications that send updates on nearby parking spaces to the user’s phone if the user chooses to enable the service. This is done through Firebase Cloud Messaging. Further, the ability to report observable issues within parking lots allows improvement of the system.

According to the results in Table I, RetinaNet outperforms Faster R-CNN for the parking occupancy detection task, and the shallower version of RetinaNet (R50) outperforms the deeper version (R101). It must be noted that these results could also be due to the relatively small size of the dataset compared to datasets such as COCO [19]. Since the COCO dataset has 80 labelled classes and the PKLot dataset has only 2 labelled classes, a better precision could be expected. However, the number of detections per image is higher in the PKLot dataset than in the COCO dataset. Both models generate a constant number of region proposals (RetinaNet 100K, Faster R-CNN 2K) as the first step in the detection process. These region proposals are sufficient to handle nearly 10 objects in
The current system was trained on partially annotated data. Thus, the performance can be improved by using a fully annotated dataset in future work. This should allow the training to be performed on one parking lot and testing to be performed on a different lot, which is more representative of the real-world scenario. The accuracy of the model can be improved by using a Region Proposal Network with constraints on bounding box sizes. The approach can be extended with soft sampling [26]. Further, the proposed system has not been tested at night-time due to the lack of a supportive dataset. However, the system could be adapted to operate at night by fine-tuning the trained model on night-time data.

TABLE III. COMPARISON WITH EXISTING APPROACHES

Method | Inbuilt Sensor For Each Space | Manual Segmentation | Signal Support APODS
Variation of the magnetic field of the vehicle [1][2][3] | Yes | Yes | No
Classify with manually segmented video stream [6][7][8][16] | No | Yes | No
Proposed solution | No | No | Yes

VI. CONCLUSION

This paper presented an end-to-end automated vehicle parking occupancy detection system that uses a surveillance stream. A novel approach is proposed to detect and classify the parking spaces using object detection techniques. The proposed solution uses both single-stage and two-stage detector techniques to preserve the accuracy and efficiency of the results. The design is based on a modular software architecture, and microservices are used for the implementation. Hence, it supports the scalability and resilience of the system and reduces unnecessary installation and maintenance costs. The system was implemented as a web and mobile application to automatically detect both the occupied and unoccupied parking spaces. The solution avoids the need for manual segmentation of the video stream as in existing solutions, which makes deployment flexible. Thus, the proposed approach can be extended towards a commercially viable automated parking occupancy detection system.

REFERENCES

[1] E. Sifuentes, O. Casas, and R. Pallas-Areny, "Wireless magnetic sensor node for vehicle detection with optical wake-up," IEEE Sensors Journal, vol. 11, no. 8, pp. 1669–1676, 2011.
[2] Z. Zhang, M. Tao, and H. Yuan, "A parking occupancy detection algorithm based on AMR sensor," IEEE Sensors Journal, vol. 15, no. 2, pp. 1261–1269, 2015.
[3] S. Ma, C. Xu, X. Bao, Y. Wang, and F. Li, "Reliable Wireless Vehicle Detection using Magnetic Sensor and Distance Sensor," Journal of Digital Content Technology and its Applications (JDCTA), vol. 8, pp. 112–121, 2014.
[4] G. Amato, F. Carrara, F. Falchi, C. Gennaro, C. Meghini, and C. Vairo, "Deep learning for decentralized parking lot occupancy detection," Expert Systems with Applications, vol. 72, pp. 327–334, 2017.
[5] W. Balzano and F. Vitale, "DiG-Park: A smart parking availability searching method using V2V/V2I and DGP-class problem," in Proc. 31st IEEE Int. Conf. on Advanced Information Networking and Applications Workshops, Taipei, Taiwan, 2017, pp. 698–703.
[6] S. Valipour, M. Siam, E. Stroulia, and M. Jagersand, "Parking-stall vacancy indicator system, based on deep convolutional neural networks," in Proc. IEEE 3rd World Forum on Internet of Things, Reston, VA, USA, 2017, pp. 655–660.
[7] M. Ahrnbom, K. Astrom, and M. Nilsson, "Fast classification of empty and occupied parking spaces using integral channel features," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 2016, pp. 9–15.
[8] G. Amato, F. Carrara, F. Falchi, C. Gennaro, and C. Vairo, "Car parking occupancy detection using smart camera networks and Deep Learning," in Proc. IEEE Symposium on Computers and Communications, Messina, Italy, 2016, pp. 1212–1217.
[9] R. Yusnita, N. Fariza, and B. Norazwinawati, "Intelligent Parking Space Detection System Based on Image Processing," Journal of Innovation, Management and Technology, vol. 3, no. 3, pp. 232–235, 2012.
[10] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 318–327, 2020.
[11] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C-Y. Fu, and B. C. Alexander, "SSD: Single Shot MultiBox Detector," in Computer Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Ed., Springer, Cham, 2016, LNCS, vol. 9905, pp. 21–37.
[13] P. R. De Almeida, L. S. Oliveira, A. S. Britto, E. J. Silva, and A. L. Koerich, "PKLot - A robust dataset for parking lot classification," Expert Systems with Applications, vol. 42, no. 11, pp. 4937–4949, 2015.
[14] T. Lin, H. Rivano, and F. Le Mouël, "A survey of smart parking solutions," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 12, pp. 3229–3253, 2017.
[15] Q. Wu, C. Huang, S. Y. Wang, W. C. Chiu, and T. Chen, "Robust parking space detection considering inter-space correlation," in Proc. IEEE Int. Conf. on Multimedia and Expo, Beijing, China, 2007, pp. 659–662.
[16] C. C. Huang and S. J. Wang, "A hierarchical bayesian generation framework for vacant parking space detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1770–1785, 2010.
[17] J. Ahmad, Z. Lewis, P. Duraisamy, and T. Mcdonald, "Parking lot monitoring using mrcnn," in Proc. 10th Conf. on Computing, Communication and Networking Technologies, Kanpur, India, 2019, pp. 1–4.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 580–587.
[19] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft coco: Common objects in context," in Computer Vision, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Ed., Springer, Cham, 2014, LNCS, vol. 8693, pp. 740–755.
[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 779–788.
[21] T. Hoang, P. Nguyen, N. Truong, Y. Lee, and K. Park, "Deep RetinaNet-Based Detection and Classification of Road Markings by Visible Light Camera Sensors," Sensors, vol. 19(2), no. 281, pp. 1–25, 2019.
[22] L. Ale, N. Zhang, and L. Li, "Road Damage Detection Using RetinaNet," in Proc. IEEE Int. Conf. on Big Data, Los Angeles, CA, USA, 2019, pp. 5197–5200.
[23] W. Hasselbring and G. Steinacker, "Microservice architectures for scalability, agility and reliability in e-commerce," in Proc. IEEE Int. Conf. on Software Architecture Workshops, Gothenburg, Sweden, 2017, pp. 243–246.
[24] Y. Wu, A. Kirillov, F. Massa, and W.-Y. Lo, "Detectron2," GitHub. https://github.com/facebookresearch/detectron2 (accessed July 05, 2019).
[25] G. Gamage, I. Sudasingha, I. Perera, and D. Meedeniya, "Reinstating Dlib Correlation Human Trackers Under Occlusions in Human Detection based Tracking," in Proc. 18th Int. Conf. on Advances in ICT for Emerging Regions, Colombo, Sri Lanka, 2018, pp. 92–98.
[26] Z. Wu, N. Bodla, B. Singh, M. Najibi, R. Chellappa, and L. S. Davis, "Soft sampling for robust object detection," arXiv preprint, arXiv:1806.06986, 2018, pp. 1–12.