
Road Image Classification: Leonid Dashko

This document summarizes approaches for classifying road images. The main approaches discussed are using Hough line transforms to detect road borders or markings, analyzing color information, detecting vanishing points, and using deep learning with convolutional neural networks (CNNs). CNNs have been widely and successfully used for image classification, including achieving high accuracy in classifying over 1 million images into 1000 classes. The document also lists several road image datasets that have been used for model training and evaluation.


Road image classification

Leonid Dashko
[email protected]
Institute of Computer Science, University of Tartu
Supervisor: Amnir Hadachi

April 24, 2018

Distributed System Seminar • Spring 2018

Abstract. This article summarizes the different approaches for road detection and recognition from a single image and shows the results of classifying images to detect whether an image contains a road or not.

Keywords: road classification, road image datasets, convolutional neural network, image classification.

I. Introduction

Searching for a valid path is one of the most important tasks in navigating an autonomous vehicle or robot, hence road detection is a trending and crucial topic nowadays. In addition to finding a valid space for navigation, other Intelligent Transportation System (ITS) tasks depend on this task, such as vehicle, pedestrian, or obstacle detection.

Researchers may also use LiDAR (Light Detection and Ranging) for the road detection task because it gives reliable information about the geometry of surrounding objects; however, these sensors are much more expensive than monocular cameras. On the other hand, a camera cannot provide reliable information about the distance to surrounding objects.

This paper summarizes the approaches for road detection and introduces the results of an experiment on road classification and road detection based on a single input image.

To be more specific, road image classification is the task that answers the question "Does the input image contain valid road path(s)?", while road detection is the task whose output shows which image pixels are road and which are non-road.

The main factors that make the road detection task challenging are the different types of roads, the presence or absence of line markings, illumination conditions, time of day, etc.

II. Background

Over the past decades, researchers have been trying to solve the road detection problem in a number of ways. Basically, a road in an image can be classified as structured (covered with asphalt and possibly having line markings, e.g. a road in an urban area) or unstructured (for example, a road in a rural area). The most commonly used approach for detecting a structured road in an image is to localize road borders or road markings. The main techniques for detecting roads are algorithms based on:

1) Finding the 2 most dominant lines (their angles are usually close to 30 and 120 degrees), which can form a trapezoid, by using the Hough Line Transform or other transformations. This approach does not work well on curvy roads or when the image contains other borders such as crop fields; however, the Hough line detector does not have a high computational cost, which is why it can be used for real-time detection.

2) Analyzing color information (histogram, color intensity) [1], [2]. This works consistently only for structured roads which have noticeable markings or borders.

3) Vanishing Point (VP) Detection. In paper [3], the researchers introduced the multi-task network VPGNet (Vanishing Point Guided Network) for lane and road marking detection and recognition, which is guided by a vanishing point, under different weather and day/night-time conditions. VPGNet performs four tasks: grid regression, object detection, multi-label classification, and vanishing point prediction. VPGNet was trained on 20,000 labeled images with 17 lane and road marking classes under four different scenarios: no rain, rain, heavy rain, and night. The results show that VPGNet achieves high accuracy and robustness under various conditions in real time (20 fps). Interestingly, during training the authors flipped the original images in order to double the size of the dataset. This also helped to simulate a left-sided environment, as the initial dataset contained only images from a right-sided environment. It is also worth mentioning that the VP lies between the horizon line (upper part of the image) and the road (bottom part of the image), and everything above the VP can be ignored because we are only interested in the road regions of the image.

4) Deep learning. Researchers mainly apply supervised machine learning to this problem, where the image (or a specific patch/region) and its labels are used as input to a model, and the output of the model is a multi-dimensional array (the same size as the image) that shows which pixels are road or non-road.

In computer vision, CNNs (Convolutional Neural Networks) are widely used for image classification, object detection, and image segmentation tasks. This type of neural network is also used in our experiment. An example of the structure of a CNN for a classification task can be seen in Figure 1.

Figure 1: Example of the structure of CNN (Convolutional Neural Network) for classification task

As evidence, paper [4] is considered one of the breakthroughs in computer vision of the last decade. In that article, the researchers used a large deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes. Their approach performed extremely well: they achieved top-1 and top-5 error rates of 37.5% and 17.0% respectively, using purely supervised learning.

The CNN architecture for classification tasks can be summarized as follows:

– It receives a vector of images as the input layer and passes it on to the hidden layers.
– The hidden layers include a set of mathematical operations for extracting low-level features, such as applying convolution filters, activation functions, the max-pooling operation, resizing the layer, and transforming to a fully-connected layer.
– Finally, the last operation in a CNN for a classification problem takes the fully-connected layer as input and flattens the nodes into a vector of size N (where N is the number of classes).

Datasets. Among existing road image datasets, these are the most widely used:

– ROMA [8]. 116 images of structured roads only, contains videos.
– KITTI [6]. 612 labelled images of structured roads only, contains videos. In addition, this dataset contains the stereo data from a LiDAR sensor.
– iRoads. 4,656 images of structured roads only, contains videos. This dataset has also been categorized into 7 classes: daylight, night, rainy day, rainy night, snowy, sun stroke, and images in a tunnel.
– CamVid [9]. 701 labelled images of structured roads only, with different weather variations, contains videos.
– LabelMeFacade [10]. 945 labelled images of structured roads only, contains videos.
– SUN2012. 426 images of different types of roads, does not have labels.
– Dataset provided by Amnir Hadachi (supervisor of this work). 1,780 snowy road images captured on the snowy roads of the Tartu neighbourhood, does not have labels.
– Dataset collected from the ImageNet online image catalog by the author [11]. 494 road images of different types, under different weather and day-time conditions, does not have labels.
– Dataset created from Google, Bing, Yahoo images by the author [11]. 5,390 road images of different types, under different weather and day-time conditions, does not have labels.

II.i. Overview of operations in CNN

Convolutional filters (Figure 2). In the first step, the convolutional filters try to detect low-level features, for example, edges and curves. Later, when we apply additional convolutional filters, they can detect more complex features, as they build upon the features already discovered. Usually, a CNN has more than one convolutional filter, which results in a series of convolutional layers. For every convolutional filter, we must specify Filter_size = Width x Height (this is how many pixels we want to supply to the filter).

Figure 2: Example of convolutional filters
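To make the idea of a convolutional filter concrete, the following minimal sketch slides one hand-written 3x3 filter over a toy image. The function name convolve2d, the toy image, and the vertical-edge kernel are illustrative assumptions and are not taken from the experiment; in a trained CNN the filter weights are learned rather than set by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (r, c).
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hypothetical filter with Filter_size = 3x3 that responds to vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The response is non-zero only where the 3x3 window straddles the dark-to-bright boundary, which is the sense in which a first-layer filter detects an edge.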


Stride and Padding for filters (Figure 3). There are several optional parameters which we can supply to a convolutional filter (a visualization can be found in [5]):

– Stride: the number of pixels by which the filter shifts at each step of the convolution (by default the shift is one unit at a time).
– Padding. In CNNs we move square filters around the image, but we cannot go all the way to the edges of the image if there is no padding, since part of the filter would be outside the image. That is why we can apply additional padding, which extends the margins of the input tensor and fills those margins with zero values.

Figure 3: Example of stride of size 1, which is the default (on the left), and zero-padding of size 2 (on the right)

Activation function (Figure 4). It is a common approach to apply an activation function after convolutional layers in order to transform the output values into the desired form. Overall, the layer with the activation function maps the resulting values according to that function. In the case of ReLU, the input values are passed through the function f(x) = max(0, x), which transforms negative values to zeros.

Figure 4: Main activation functions

Pooling (down sampling) layers. After the activation function, a pooling layer can be applied to its output. It takes a filter (normally of size 2x2) and a stride of the same length and picks the maximum value within the filter window. This serves two main purposes. The first is that the amount of parameters or weights in the following layers is reduced by 75%, thus lessening the computation cost. The second is that it helps to control overfitting.

Figure 5: Example of max-pooling operation (downsampling)

III. Implementation

To classify whether the input image contains a road or not, we used the CNN depicted in Figure 6, which has 2 Fully-Connected layers at the end. To build the network, we used Keras 2 (a library for deep learning) and Python 3.6.3.

Figure 6: Structure of proposed classification CNN

Some input parameters for our model:

– Input images were rescaled to 50x50 in RGB color format.
– Number of epochs: 25.
– Batch size: 25 (the number of images that are processed within 1 iteration).
– Initial learning rate: 0.001.
– Input images were normalized to the range [0, 1] instead of [0, 255].
– Data was split into training (80%) and testing (20%) datasets.
– Data augmentation was applied in order to prevent overfitting: rotation of input images by ±30 degrees, horizontal flip, zooming, and shearing transformations.
– Total number of trainable parameters: 3,628,072. The penultimate Fully-Connected (Dense) layer contains the largest number of trainable parameters (3,600,500) in comparison to the other layers.
– The "Adam" optimizer is applied.

IV. Results

The non-road images show different objects and environments surrounded by nature (1,000 non-road images were downloaded from Google). The model was trained and tested on 3 different datasets separately:

1) "KITTI" dataset [6] contains 612 labelled images of structured roads only.
- 596 photos are chosen for training (roads: 484 (81.21%), non-roads: 112 (18.79%)).
- 150 photos are chosen for testing (roads: 128 (85.33%), non-roads: 22 (14.67%)).
Results of classified images: 150 total images, correctly classified 147, wrongly classified 3. Accuracy: 98% (it seems that the model was overfitted).

Actual/Predicted   Road   Non-road
Road               126    2
Non-road           1      21

Table 1: Confusion matrix for model results on testing data on "KITTI" dataset.

2) "ImageNet" dataset. The images were parsed from the original repository [7] by using a scraper script, cleaned up (initially, there were 1,200 images), and classified manually into 3 groups: structured roads (319), unstructured roads (175), vehicle in scene (387). Only the structured and unstructured roads were used afterwards, while images with vehicle(s) in the scene were ignored.
- 502 photos are chosen for training (roads: 393 (78.29%), non-roads: 109 (21.71%)).
- 126 photos are chosen for testing (roads: 101 (80.16%), non-roads: 25 (19.84%)).
Results of classified images: 126 total images, correctly classified 113, wrongly classified 13. Accuracy: 89.7%. (Some results of the classification are shown in Figure 7.)

Actual/Predicted   Road   Non-road
Road               93     8
Non-road           5      20

Table 2: Confusion matrix for model results on testing data on "ImageNet" dataset

Figure 7: Results of classification on ImageNet dataset. Description of every row: True Positive non-roads; False Negative non-roads; True Positive roads; False Negative roads

Another test was conducted on this dataset where we tested only non-road images which were not included in the test/train datasets.
Results of classified images: 850 total images, correctly classified 764, wrongly classified 86. Accuracy: 89.9%. The confusion matrix is shown below:

Actual/Predicted   Road   Non-road
Road               0      0
Non-road           86     764

Table 3: Confusion matrix for another test on 850 non-road images which were not included in the training/testing data

V. Conclusion and Future work

The 98% accuracy on the KITTI dataset suggests that the model was overfitted. Our assumption is that this happened due to the lack of heterogeneity in the images, as they basically contain similar environments. On the ImageNet dataset, where the heterogeneity of the training images is higher, the accuracy reached 89.7%. Still, we think that the number of input images for training should be at least 2,000 to learn features better. On the other hand, the size of the training images was small (50x50), and it may also be increased to higher dimensions.

Finally, the future work will consist of building the neural network for pixel-wise classification.
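The accuracy values discussed in this paper can be re-derived from the confusion matrices in Tables 1 to 3. The sketch below does this with plain Python; the cell counts are copied from the tables, while the helper names and the per-class recall figure are additions for illustration and are not reported in the source.

```python
# Rows are the actual class, columns the predicted class, in the order [road, non-road].
KITTI = [[126, 2], [1, 21]]          # Table 1
IMAGENET = [[93, 8], [5, 20]]        # Table 2
NON_ROAD_ONLY = [[0, 0], [86, 764]]  # Table 3 (contains no road images)


def accuracy(matrix):
    """Share of all images that lie on the diagonal (correctly classified)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, cls):
    """Share of images of actual class `cls` that were predicted as `cls`.

    Returns None when the test set contains no image of that class.
    """
    row_total = sum(matrix[cls])
    return matrix[cls][cls] / row_total if row_total else None


for name, m in [("KITTI", KITTI), ("ImageNet", IMAGENET), ("non-road only", NON_ROAD_ONLY)]:
    print(name, "accuracy:", round(accuracy(m), 3),
          "road recall:", recall(m, 0), "non-road recall:", recall(m, 1))
```

Because roads make up roughly 80% of both test sets, overall accuracy is dominated by the road class; the per-class recall shows that on the ImageNet test set only 20 of the 25 non-road images are recognised.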

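The suggestion to enlarge the 50x50 input has a direct cost in trainable parameters, which the following sketch estimates. The layer configuration is an assumption: the text does not list the layers of Figure 6, but a LeNet-style stack (two 5x5 convolutions with 20 and 50 filters and size-preserving padding, each followed by 2x2 max-pooling, then Dense layers of 500 and 2 units) reproduces the reported totals of 3,628,072 and 3,600,500 exactly. All function names are hypothetical.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter of a square convolutional layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit of a fully-connected layer."""
    return (inputs + 1) * units


def lenet_like_params(side, channels=3, classes=2):
    """Trainable parameters of the ASSUMED architecture for a side x side input."""
    conv1 = conv_params(5, channels, 20)
    side //= 2                      # first 2x2 max-pooling halves width and height
    conv2 = conv_params(5, 20, 50)
    side //= 2                      # second 2x2 max-pooling
    flattened = side * side * 50    # feature-map values fed to the first Dense layer
    dense1 = dense_params(flattened, 500)
    dense2 = dense_params(500, classes)
    return {"conv1": conv1, "conv2": conv2, "dense1": dense1,
            "dense2": dense2, "total": conv1 + conv2 + dense1 + dense2}


for side in (50, 100):
    print(side, lenet_like_params(side))
```

Under this assumption the convolutional layers are unaffected by the input size, while the first Dense layer grows with the image area, so going from 50x50 to 100x100 raises the total from about 3.6 million to about 15.7 million parameters.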

References
[1] Z. Tian, C. Xu, X. Wang and Z. Yang, "Non-parametic model for robust road recognition", IEEE 10th International Conference on Signal Processing Proceedings, Beijing, 2010, pp. 869-872.

[2] J. M. A. Alvarez and A. M. Lopez, "Road Detection Based on Illuminant Invariance", IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 184-193, March 2011. doi: 10.1109/TITS.2010.2076349

[3] Seokju Lee, Junsik Kim, Jae Shin Yoon, Seunghak Shin, Oleksandr Bailo, Namil Kim, Tae-Hee Lee, Hyun Seok Hong, Seung-Hoon Han, In So Kweon, "VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition", IEEE International Conference on Computer Vision (ICCV 2017).

[4] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", Advances in Neural Information Processing Systems (NIPS), pages 1097-1105.

[5] "Visualization of optional mathematical operation that can be used with convolutional operation". Last accessed: 9/4/2018.

[6] "KITTI road image dataset". Last accessed: 9/4/2018.

[7] "ImageNet - image catalog". Last accessed: 9/4/2018.

[8] "ROMA road image dataset". Last accessed: 24/4/2018.

[9] "CamVid road image dataset". Last accessed: 24/4/2018.

[10] "LabelMeFacade road image dataset". Last accessed: 24/4/2018.

[11] "GitHub: Road image datasets created by author". Last accessed: 24/4/2018.
