3D Vehicle Detection Using a Laser Scanner and a Video Camera

Abstract: A new approach for vehicle detection is presented, which performs sensor fusion of a laser scanner and a video sensor. This combination provides enough information to handle the problem of the multiple views of a car. The laser scanner estimates the distance as well as the contour of observed objects. The contour information can be used to identify the discrete sides of rectangular objects in the laser scanner coordinate system. The transformation of the three-dimensional coordinates of the most visible side to the image coordinate system allows for a reconstruction of its original view. This transformation also compensates for size differences in the video image, which are caused by different distances to the video sensor. Afterwards, a pattern recognition algorithm can classify the object's sides based on contour and shape information. Since the number of object hypotheses is enormously reduced by the laser scanner, the system is applicable in real time. In addition, video-based vehicle detection and additional laser scanner features are fused in order to create a consistent vehicle environment description.

There are several challenges in the vehicle detection task. If only a single video sensor is used for object detection, there will be a large number of object hypotheses which must be evaluated. Vehicles must be expected at all possible positions and in all possible sizes in the video image if no additional knowledge is available. This search is usually performed with image pyramids (Fig. 1). The complete image-scanning procedure then needs a lot of processing time. There are approaches which reduce the number of object hypotheses by a flat-world assumption [1, 2]. This approach benefits from the correlation of object position and size (Fig. 2). Unfortunately, the flat-world assumption does not hold for all possible traffic scenarios.

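To illustrate the cost, the following is a minimal sketch of such an exhaustive pyramid scan; the window size, stride and scale factor are illustrative values, not taken from the paper. A classifier would have to be evaluated at every returned position:

```python
import cv2

def sliding_window_hypotheses(image, window=48, scale=0.8, stride=4):
    """Enumerate every sub-window of an image pyramid (exhaustive search).

    Without any prior knowledge, a classifier has to be evaluated at
    each of the returned (level, x, y) positions.
    """
    hypotheses = []
    level = 0
    while min(image.shape[:2]) >= window:
        h, w = image.shape[:2]
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                hypotheses.append((level, x, y))
        # Shrink the image for the next level, so the fixed-size window
        # covers a larger part of the scene (larger hypothesised vehicles).
        image = cv2.resize(image, None, fx=scale, fy=scale)
        level += 1
    return hypotheses
```

Even with a coarse stride, this yields on the order of tens of thousands of hypotheses per frame, which is what makes the laser scanner prior introduced below so valuable.
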
Usually, the orientation of observed vehicles is restricted only on highways. At intersections and in urban areas, vehicles can occur in all possible orientations to the video sensor. The vehicle's appearance changes with its orientation to the video sensor (Fig. 4). Usually, pattern recognition concentrates on contour and texture information. For this reason, the detection task becomes more complex. This fact is considered by Schneiderman and Kanade [7]. They trained a detector with samples of different viewpoints. Thus, a complex detector, which can represent all views, was necessary. Another possibility is to cover the different viewpoints with several detectors. Consequently, many detectors have to be evaluated.

Figure 4 Cars significantly change their appearance with their orientation to the video sensor
Therefore, the detection task becomes very difficult if pattern recognition based on texture and contour is used

3.2 Estimation of the horizontal orientation

The laser scanner measurements of an object provide information about the vehicle's horizontal position and orientation. A distance-based segmentation algorithm groups distance measurements which probably correspond to the same object. The contour of the object is then analysed. An algorithm fits two types of shapes into each segment. The L-shape is typical for rectangular objects such as cars with two observed sides. An I-shape can be caused by a rectangular object with one observed side or by objects such as walls or crash barriers. The algorithm selects the shape which fits best. If this shape does not fit well enough, an O-shape will be selected. This type describes objects of undefined shape (Fig. 7).

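A minimal sketch of this model selection, assuming total-least-squares line fitting; the helper names and the fit threshold are illustrative, and the robust details of the actual shape estimation are in [9]:

```python
import numpy as np

def line_fit_error(points):
    """RMS distance of 2D points to their total-least-squares line (via SVD)."""
    centered = points - points.mean(axis=0)
    # The smallest singular value captures the spread orthogonal to the line.
    return np.linalg.svd(centered, compute_uv=False)[-1] / np.sqrt(len(points))

def classify_segment(points, max_rms=0.15):
    """Assign an I-, L- or O-shape to one laser segment (points: Nx2 array).

    max_rms is an assumed fit threshold in metres, not a value from the paper.
    """
    candidates = [("I", line_fit_error(points))]
    if len(points) >= 6:
        # L-shape: try every possible corner index and fit one line per leg.
        candidates.append(("L", min(
            max(line_fit_error(points[:k]), line_fit_error(points[k:]))
            for k in range(3, len(points) - 2))))
    shape, error = min(candidates, key=lambda c: c[1])
    return shape if error < max_rms else "O"  # poor fit: undefined shape
```
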
Details about the shape estimation can be found in [9]. The assigned shape allows for an extraction of the object's orientation in the cases of L- and I-shapes. Objects with an O-shape are ignored by the following algorithms.

3.3 Object box fitting

The estimation of the visible object sides is based on an object box, which is fitted to the laser scanner measurements (Fig. 8).

The horizontal position is defined precisely by the assigned shape. The visible object sides are positioned directly at the visible surface of the box. The width and the length of the box are defined by the maximum length of a car, because the object shape usually only defines an object orientation in a range of 0–90°.

In addition to the horizontal position, the laser scanner also provides a vertical position for its measurements. As the laser scanner measures in only four horizontal layers, the vertical information is insufficient to determine the exact vertical object extent. Nevertheless, this information can be used to calculate an object box which certainly includes the object. The upper boundary of this box is the sum of the maximum object height and the smallest z-value of all laser scanner measurements. The lower boundary is the difference between the largest z-value and the maximum object height.

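In other words, the box is grown to the worst case in both vertical directions. A minimal sketch of these bounds; the assumed maximum object height of 2 m is an illustrative value, not taken from the paper:

```python
def vertical_box_bounds(z_values, max_object_height=2.0):
    """Worst-case vertical extent of an object seen in only four scan layers.

    z_values: z-coordinates (m) of all laser measurements of the object.
    max_object_height: assumed maximum height of a car (illustrative).
    """
    upper = min(z_values) + max_object_height  # object cannot reach higher
    lower = max(z_values) - max_object_height  # ... and cannot reach lower
    return lower, upper
```

Grown this way, the box is guaranteed to contain the object even though only four scan layers are measured.
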
3.4 Coordinate transformation

The 3D object box side with the most visible orientation to the camera is transformed to the image coordinate system. The transformed box side describes the position and the deformation of the corresponding object side in the image. For pattern recognition purposes, the box side is increased by 25% before it is transformed to the image coordinate system. An example of this transformation is shown in Fig. 9.

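A minimal sketch of this transformation, assuming a calibrated 3x4 projection matrix P that maps laser scanner coordinates to the image (the camera model follows standard projective geometry as in [8]; the 25% enlargement matches the paper, everything else is illustrative):

```python
import numpy as np

def project_box_side(corners_3d, P, margin=0.25):
    """Project the four 3D corners of the most visible box side into the image.

    corners_3d: 4x3 array of corners in the laser scanner coordinate system.
    P: 3x4 projection matrix (camera intrinsics times laser-to-camera pose).
    margin: the box side is enlarged by 25% for pattern recognition.
    """
    corners = np.asarray(corners_3d, dtype=float)
    center = corners.mean(axis=0)
    corners = center + (1.0 + margin) * (corners - center)  # grow the side
    homogeneous = np.hstack([corners, np.ones((4, 1))])
    uvw = homogeneous @ P.T
    return uvw[:, :2] / uvw[:, 2:3]  # pixel coordinates of the four corners
```
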
3.5 Rectification of the object sides

As the transformed box side describes the position and deformation of the original object side in the video image, a perspective warping can reconstruct the original view of the object side.

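A minimal sketch of this rectification using OpenCV, which the authors cite in [10]; the corner ordering and target patch size are assumed values:

```python
import cv2
import numpy as np

def rectify_object_side(image, image_corners, size=(64, 48)):
    """Warp a projected box side to a fronto-parallel patch of fixed size.

    image_corners: 4x2 array of corner pixels, ordered top-left, top-right,
    bottom-right, bottom-left (assumed). Mapping every candidate to the same
    patch size also removes the distance-dependent scale differences.
    """
    w, h = size
    target = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    H = cv2.getPerspectiveTransform(np.float32(image_corners), target)
    return cv2.warpPerspective(image, H, size)
```

At this fixed size, a standard contour- and texture-based classifier (e.g. a cascade as in [11]) can be applied directly.
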
Figure 12 Sample data for rear ends (left) and left sides (right)

Figure 14 The algorithm calculates an edge image and localises pairs of wheels by means of a Hough transform in an appropriate ROI of the original image
A human user only confirms or corrects the labelled data

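The caption describes a semi-automatic labelling aid. A hedged sketch of the idea, assuming OpenCV's circular Hough transform (all parameter values are illustrative, not from the paper):

```python
import cv2

def find_wheel_pair(roi_gray):
    """Locate a pair of wheels as circles in a grayscale ROI.

    HoughCircles computes a Canny edge image internally (param1 is the upper
    Canny threshold) and votes for circle centres on it. Returns the two
    strongest (x, y, radius) candidates, or None.
    """
    circles = cv2.HoughCircles(
        roi_gray, cv2.HOUGH_GRADIENT, dp=1,
        minDist=roi_gray.shape[1] // 4,   # wheels of one side are far apart
        param1=150, param2=30, minRadius=8, maxRadius=60,
    )
    if circles is None or circles.shape[1] < 2:
        return None
    return circles[0, :2]
```
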
Obviously, the pure laser-scanner-based classification already performs very well, but the feature-level fusion still yields a significant improvement. Several facts should be considered when comparing the results. First, the field of view of the video camera is much smaller than the laser scanner's field of view; thus, the classification can only be improved for some of the objects. Second, the laser scanner already performs a good classification of moving objects. Detailed analyses of the test sequences have shown that the achieved benefit lies primarily with non-moving cars, while the evaluated scenarios contain far more moving cars than non-moving cars. However, the small number of cases that are improved by the video camera is nevertheless important: non-moving objects that are poorly classified by the pure laser scanner approach can cause especially dangerous situations if they are located in front of the test vehicle. Fortunately, these objects are in the field of view of the video camera.

7 Conclusion

A new real-time vehicle detector was introduced. The system fuses laser scanner measurements with the images of a video sensor. It can handle the infinite number of different views of a car that are generated by different orientations to the observer. A three-dimensional object model is used, and a perspective transformation is applied to one of the object sides. In addition, scaling effects caused by different distances to the video sensor are compensated. This allows for the application of shape- and contour-based pattern recognition algorithms at a small number of sub-window positions and scales. The system performance was evaluated with labelled test data.

8 Acknowledgment

A previous version of this paper was presented at the ITS '07 European Congress held in Aalborg, Denmark, in June 2007.

9 References

[1] PONZA D., LOPEZ A., LUMBRERAS F., SERRAT J., GRAF T.: '3D vehicle sensor based on monocular vision'. Proc. IEEE Conf. Intelligent Transportation Systems, Vienna, Austria, 2005

[2] KHAMMARI A., NASHASHIBI F., ABRAMSON Y., LAURGEAU C.: 'Vehicle detection combining gradient analysis and AdaBoost classification'. Proc. IEEE 8th Int. Conf. Intelligent Transport Systems, Vienna, Austria, 2005

[3] KAEMPCHEN N., DIETMAYER K.: 'Fusion of laserscanner and video for advanced driver assistance systems'. Proc. 11th World Congress on Intelligent Transportation Systems, Nagoya, Japan, 2004

[4] MAEHLISCH M., SCHWEIGER R., RITTER W., DIETMAYER K.: 'Sensorfusion using spatio-temporal aligned video and lidar for improved vehicle detection'. Proc. 2006 IEEE Intelligent Vehicles Symposium, Tokyo, Japan, 2006

[5] KAEMPCHEN N.: 'Feature-level fusion of laser scanner and video data for advanced driver assistance systems'. PhD Thesis, Ulm University, 2007

[6] TAKIZAWA H., YAMADA K., ITO T.: 'Vehicles detection using sensor fusion'. Proc. 2004 IEEE Intelligent Vehicles Symposium, Parma, Italy, 2004

[7] SCHNEIDERMAN H., KANADE T.: 'A statistical model for 3D object detection applied to faces and cars'. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000

[8] FAUGERAS O.: 'Three-dimensional computer vision' (MIT Press, 2001)

[9] WENDER S., FUERSTENBERG K.CH., DIETMAYER K.C.J.: 'Object tracking and classification for intersection scenarios using a multilayer laser scanner'. Proc. 11th World Congress on Intelligent Transportation Systems, Nagoya, Japan, 2004

[10] Intel Research Labs: 'Open Source Computer Vision Library'. Available online at: http://www.intel.com/technology/computing/opencv/, accessed October 2007

[11] VIOLA P., JONES M.: 'Fast and robust classification using asymmetric AdaBoost and a detector cascade', in 'Advances in Neural Information Processing Systems 14' (MIT Press, 2002)

[12] LINDNER F., KRESSEL U., KAELBERER S.: 'Robust recognition of traffic signals'. Proc. 2004 IEEE Intelligent Vehicles Symposium, Parma, Italy, 2004

[13] MAEHLISCH M., OBERLAENDER M., LOEHLEIN O., GAVRILA D., RITTER W.: 'A multiple detector approach to low-resolution FIR pedestrian recognition'. Proc. 2005 IEEE Intelligent Vehicles Symposium, Las Vegas, USA, 2005

[14] KALLENBACH I., SCHWEIGER R., PALM G., LOEHLEIN O.: 'Multi-class object detection in vision systems using a hierarchy of cascaded classifiers'. Proc. 2006 IEEE Intelligent Vehicles Symposium, Tokyo, Japan, 2006

[15] WALCHSHAEUSL L., LINDL R., VOGEL K.: 'Detection of road users in fused sensor data streams for collision mitigation'. Proc. AMAA: Advanced Microsystems for Automotive Applications 2006, Berlin, Germany, 2006

[16] WENDER S., WEISS T., FUERSTENBERG K., DIETMAYER K.C.J.: 'Feature level fusion for object classification'. PReVENT ProFusion e-J., 2006, 1, pp. 31–36