SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
Figure 1: Our dataset provides dense annotations for each scan of all sequences from the KITTI Odometry Benchmark [19].
Here, we show multiple scans aggregated using pose information estimated by a SLAM approach.
Abstract

Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR.

In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360° field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.

1. Introduction

Semantic scene understanding is essential for many applications and an integral part of self-driving cars. In particular, the fine-grained understanding provided by semantic segmentation is necessary to distinguish drivable and non-drivable surfaces and to reason about functional properties, like parking areas and sidewalks. Currently, such understanding, represented in so-called high-definition maps, is mainly generated in advance using surveying vehicles. However, self-driving cars should also be able to drive in unmapped areas and adapt their behavior if there are changes in the environment.

Most self-driving cars currently use multiple different sensors to perceive the environment. Complementary sensor modalities enable them to cope with deficits or failures of particular sensors. Besides cameras, light detection and ranging (LiDAR) sensors are often used, as they provide precise distance measurements that are not affected by lighting.

Publicly available datasets and benchmarks are crucial for the empirical evaluation of research. They mainly fulfill three purposes: (i) they provide a basis to measure progress, since they allow results to be reproduced and compared, (ii) they uncover shortcomings of the current state of the art and therefore pave the way for novel approaches and research directions, and (iii) they make it possible to develop approaches without the need to first painstakingly collect and label data. While multiple
∗ indicates equal contribution
| Dataset | #scans¹ | #points² | #classes³ | Sensor | Annotation | Sequential |
|---|---|---|---|---|---|---|
| SemanticKITTI (Ours) | 23201 / 20351 | 4549 | 25 (28) | Velodyne HDL-64E | point-wise | ✓ |
| Oakland3d [36] | 17 | 1.6 | 5 (44) | SICK LMS | point-wise | ✗ |
| Freiburg [50, 6] | 77 | 1.1 | 4 (11) | SICK LMS | point-wise | ✗ |
| Wachtberg [6] | 5 | 0.4 | 5 (5) | Velodyne HDL-64E | point-wise | ✗ |
| Semantic3d [23] | 15 / 15 | 4009 | 8 (8) | Terrestrial Laser Scanner | point-wise | ✗ |
| Paris-Lille-3D [47] | 3 | 143 | 9 (50) | Velodyne HDL-32E | point-wise | ✗ |
| Zhang et al. [65] | 140 / 112 | 32 | 10 (10) | Velodyne HDL-64E | point-wise | ✗ |
| KITTI [19] | 7481 / 7518 | 1799 | 3 | Velodyne HDL-64E | bounding box | ✗ |

Table 1: Overview of other point cloud datasets with semantic annotations. Ours is by far the largest dataset with sequential information. ¹ Number of scans for train and test set. ² Number of points is given in millions. ³ Number of classes used for evaluation, with the number of annotated classes in brackets.
[Figure 3: bar chart of the number of labeled points per class (logarithmic axis, roughly 10^5 to over 10^8 points), with the classes grouped into the root categories ground, structure, vehicle, nature, human, and object.]
Figure 3: Label distribution. The number of labeled points per class and the root categories for the classes are shown. For
movable classes, we also show the number of points on non-moving (solid bars) and moving objects (hatched bars).
ent height. For three sequences, we had to manually add loop closure constraints to get correctly loop-closed trajectories, since this is essential to get consistent point clouds for annotation. The loop-closed poses allow us to load all overlapping point clouds for specific locations and visualize them together, as depicted in Figure 2.

We subdivide the sequence of point clouds into tiles of 100 m by 100 m. For each tile, we only load scans overlapping with the tile. This enables us to label all scans consistently even when we encounter temporally distant loop closures. To ensure consistency for scans overlapping with more than one tile, we show all points inside each tile and a small boundary overlapping with neighboring tiles. Thus, it is possible to continue labels from a neighboring tile.

Following best practices, we compiled a labeling instruction and provided instructional videos on how to label certain objects, such as cars and bicycles standing near a wall. Compared to image-based annotation, the annotation process with point clouds is more complex, since the annotator often needs to change the viewpoint. An annotator needs on average 4.5 hours per tile when labeling residential areas, which correspond to the most complex encountered scenery, and on average 1.5 hours for labeling a highway tile.

We explicitly did not use bounding boxes or other available annotations for the KITTI dataset, since we want to ensure that the labeling is consistent and the point-wise labels should only contain the object itself.

We provided regular feedback to the annotators to improve the quality and accuracy of the labels. Nevertheless, a single annotator also verified the labels in a second pass, i.e., corrected inconsistencies and added missing labels. In summary, the whole dataset comprises 518 tiles; over 1,400 hours of labeling effort have been invested, with an additional 10 to 60 minutes of verification and correction per tile, resulting in a total of over 1,700 hours.

3.2. Dataset Statistics

Figure 3 shows the distribution of the different classes, where we also included the root categories as labels on the x-axis. The ground classes, road, sidewalk, building, vegetation, and terrain are the most frequent classes. The class motorcyclist only occurs rarely, but still more than 100,000 points are annotated.

The unbalanced count of classes is common for datasets captured in natural environments, and some classes will always be under-represented, since they simply do not occur that often. Thus, an unbalanced class distribution is part of the problem that an approach has to master. Overall, the distribution and the relative differences between the classes are quite similar to other datasets, e.g., Cityscapes [10].

4. Evaluation of Semantic Segmentation

In this section, we provide the evaluation of several state-of-the-art methods for semantic segmentation of a single scan. We also provide experiments exploiting information provided by sequences of multiple scans.

4.1. Single Scan Experiments

Task and Metrics. In semantic segmentation of point clouds, we want to infer the label of each three-dimensional point. Therefore, the input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam, which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor.

To assess the labeling performance, we rely on the commonly applied mean Jaccard Index or mean intersection-over-union (mIoU) metric [15] over all classes, given by

\[ \frac{1}{C} \sum_{c=1}^{C} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}, \qquad (1) \]

where TP_c, FP_c, and FN_c correspond to the number of true positive, false positive, and false negative predictions for class c, and C is the number of classes.

As the classes other-structure and other-object have either only a few points or are otherwise too diverse with a high intra-class variation, we decided not to include these classes in the evaluation. Thus, we use 25 instead of 28 classes, ignoring outlier, other-structure, and other-object during training and inference.
Furthermore, we cannot expect to distinguish moving from non-moving objects with a single scan, since this Velodyne LiDAR cannot measure velocities like radars exploiting the Doppler effect. We therefore combine the moving classes with the corresponding non-moving classes, resulting in a total of 19 classes for training and evaluation.

State of the Art. Semantic segmentation or point-wise classification of point clouds is a long-standing topic [2], which was traditionally solved using a feature extractor, such as Spin Images [29], in combination with a traditional classifier, like support vector machines [1] or even semantic hashing [4]. Many approaches used Conditional Random Fields (CRFs) to enforce label consistency of neighboring points [56, 37, 36, 38, 62].

With the advent of deep learning approaches in image-based classification, the whole pipeline of feature extraction and classification has been replaced by end-to-end deep neural networks. Voxel-based methods, which transform the point cloud into a voxel grid and then apply convolutional neural networks (CNNs) with 3D convolutions for object classification [34] and semantic segmentation [26], were among the first investigated models, since they allowed exploiting architectures and insights known from images.

To overcome the limitations of the voxel-based representation, such as the exploding memory consumption when the resolution of the voxel grid increases, more recent approaches either upsample voxel predictions [53] using a CRF or use different representations, like more efficient spatial subdivisions [30, 44, 63, 59, 21], rendered 2D image views [7], graphs [31, 54], splats [51], or even directly the points [41, 40, 25, 22, 43, 28, 14].

Baseline approaches. We provide the results of six state-of-the-art architectures for the semantic segmentation of point clouds on our dataset: PointNet [40], PointNet++ [41], Tangent Convolutions [52], SPLATNet [51], Superpoint Graph [31], and SqueezeSeg (V1 and V2) [60, 61]. Furthermore, we investigate two extensions of SqueezeSeg: DarkNet21Seg and DarkNet53Seg.

PointNet [40] and PointNet++ [41] use the raw unordered point cloud data as input. The core of these approaches is max pooling to obtain an order-invariant operator that works surprisingly well for semantic segmentation of shapes and several other benchmarks. Due to this design, however, PointNet fails to capture the spatial relationships between the features. To alleviate this, PointNet++ [41] applies individual PointNets to local neighborhoods and uses a hierarchical approach to combine their outputs. This enables it to build complex hierarchical features that capture both local fine-grained and global contextual information.

Tangent Convolutions [52] also handle unstructured point clouds by applying convolutional neural networks directly on surfaces. This is achieved by assuming that the data is sampled from smooth surfaces and defining a tangent convolution as a convolution applied to the projection of the local surface at each point onto the tangent plane.

SPLATNet [51] takes an approach that is similar to the aforementioned voxelization methods and represents the point cloud in a high-dimensional sparse lattice. As with voxel-based methods, this scales poorly both in computation and in memory cost, and therefore the method exploits the sparsity of this representation by using bilateral convolutions [27], which only operate on occupied lattice parts.

Similarly to PointNet, Superpoint Graph [31] captures the local relationships by summarizing geometrically homogeneous groups of points into superpoints, which are later embedded by local PointNets. The result is a superpoint graph representation that is more compact and richer than the original point cloud and exploits contextual relationships between the superpoints.

SqueezeSeg [60, 61] also discretizes the point cloud in a way that makes it possible to apply 2D convolutions, exploiting the sensor geometry of a rotating LiDAR: all points of a single turn can be projected to an image by using a spherical projection (see the sketch below). A fully convolutional neural network is applied and the result is finally filtered with a CRF to smooth the predictions. Due to the promising results of SqueezeSeg and its fast training, we investigated how the labeling performance is affected by the number of model parameters. To this end, we used a different backbone based on the Darknet architecture [42] with 21 and 53 layers, and 25 and 50 million parameters, respectively. We furthermore eliminated the vertical downsampling used in the architecture.
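The spherical projection that such projection-based methods build on can be illustrated as follows. This is a simplified sketch rather than the exact SqueezeSeg preprocessing; the 64 × 2048 image size and the vertical field of view of roughly +3° to −25° are assumptions matching a 64-beam sensor such as the HDL-64E:

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project one scan (N x 4 array: x, y, z, remission) into an H x W range image."""
    xyz, rem = points[:, :3], points[:, 3]
    r = np.linalg.norm(xyz, axis=1)

    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])               # azimuth in [-pi, pi]
    pitch = np.arcsin(xyz[:, 2] / np.maximum(r, 1e-8))   # elevation angle

    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W                    # column from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * H       # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    img = np.zeros((H, W, 2), dtype=np.float32)          # channels: range, remission
    order = np.argsort(r)[::-1]                          # write far points first, closest wins
    img[v[order], u[order]] = np.stack([r, rem], axis=1)[order]
    return img
```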
We modified the available implementations such that the methods could be trained and evaluated on our large-scale dataset. Note that most of these approaches have so far only been evaluated on shape [8] or RGB-D indoor datasets [48]. However, some of the approaches [40, 41] could only be run with considerable downsampling to 50,000 points due to memory limitations.

Results and Discussion. Table 2 shows the results of our baseline experiments for the various approaches, using either directly the point cloud information [40, 41, 51, 52, 31] or a projection of the point cloud [60]. The results show that the current state of the art for point cloud semantic segmentation falls short for the size and complexity of our dataset. We believe that this is mainly caused by the limited capacity of the used architectures (see Table 3), because the number of parameters of these approaches is much lower than the number of parameters used in leading image-based semantic segmentation networks. As mentioned above, we added DarkNet21Seg and DarkNet53Seg to test this hypothesis, and the results show that this simple modification improves the accuracy from 29.5% for SqueezeSeg to 47.4% for DarkNet21Seg and to 49.9% for DarkNet53Seg.

| Approach | mIoU | road | sidewalk | parking | other-ground | building | car | truck | bicycle | motorcycle | other-vehicle | vegetation | trunk | terrain | person | bicyclist | motorcyclist | fence | pole | traffic sign |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet [40] | 14.6 | 61.6 | 35.7 | 15.8 | 1.4 | 41.4 | 46.3 | 0.1 | 1.3 | 0.3 | 0.8 | 31.0 | 4.6 | 17.6 | 0.2 | 0.2 | 0.0 | 12.9 | 2.4 | 3.7 |
| SPGraph [31] | 17.4 | 45.0 | 28.5 | 0.6 | 0.6 | 64.3 | 49.3 | 0.1 | 0.2 | 0.2 | 0.8 | 48.9 | 27.2 | 24.6 | 0.3 | 2.7 | 0.1 | 20.8 | 15.9 | 0.8 |
| SPLATNet [51] | 18.4 | 64.6 | 39.1 | 0.4 | 0.0 | 58.3 | 58.2 | 0.0 | 0.0 | 0.0 | 0.0 | 71.1 | 9.9 | 19.3 | 0.0 | 0.0 | 0.0 | 23.1 | 5.6 | 0.0 |
| PointNet++ [41] | 20.1 | 72.0 | 41.8 | 18.7 | 5.6 | 62.3 | 53.7 | 0.9 | 1.9 | 0.2 | 0.2 | 46.5 | 13.8 | 30.0 | 0.9 | 1.0 | 0.0 | 16.9 | 6.0 | 8.9 |
| SqueezeSeg [60] | 29.5 | 85.4 | 54.3 | 26.9 | 4.5 | 57.4 | 68.8 | 3.3 | 16.0 | 4.1 | 3.6 | 60.0 | 24.3 | 53.7 | 12.9 | 13.1 | 0.9 | 29.0 | 17.5 | 24.5 |
| SqueezeSegV2 [61] | 39.7 | 88.6 | 67.6 | 45.8 | 17.7 | 73.7 | 81.8 | 13.4 | 18.5 | 17.9 | 14.0 | 71.8 | 35.8 | 60.2 | 20.1 | 25.1 | 3.9 | 41.1 | 20.2 | 36.3 |
| TangentConv [52] | 40.9 | 83.9 | 63.9 | 33.4 | 15.4 | 83.4 | 90.8 | 15.2 | 2.7 | 16.5 | 12.1 | 79.5 | 49.3 | 58.1 | 23.0 | 28.4 | 8.1 | 49.0 | 35.8 | 28.5 |
| DarkNet21Seg | 47.4 | 91.4 | 74.0 | 57.0 | 26.4 | 81.9 | 85.4 | 18.6 | 26.2 | 26.5 | 15.6 | 77.6 | 48.4 | 63.6 | 31.8 | 33.6 | 4.0 | 52.3 | 36.0 | 50.0 |
| DarkNet53Seg | 49.9 | 91.8 | 74.6 | 64.8 | 27.9 | 84.1 | 86.4 | 25.5 | 24.5 | 32.7 | 22.6 | 78.3 | 50.1 | 64.0 | 36.2 | 33.6 | 4.7 | 55.0 | 38.9 | 52.2 |

Table 2: Single scan results (19 classes) for all baselines on sequences 11 to 21 (test set). All methods were trained on sequences 00 to 10, except for sequence 08, which is used as the validation set.
PointNet 3 4 0.5
PointNet++ 6 16 5.9
SPGraph 0.25 6 5.2
TangentConv 0.4 6 3.0
SPLATNet 0.8 8 1.0
SqueezeSeg 1 0.5 0.015
SqueezeSegV2 1 0.6 0.02
DarkNet21Seg 25 2 0.055
DarkNet53Seg 50 3 0.1

Table 3: Approach statistics.

Figure 4: IoU vs. distance to the sensor (mean IoU [%] plotted over the distance to the sensor [m], from 10 m to 50 m).
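The distance analysis summarized in Figure 4 amounts to restricting the evaluation of Eq. (1) to points within a given range interval. A minimal sketch of such a binned evaluation (not the benchmark tooling), assuming per-point coordinates, predictions, and ground-truth labels are available:

```python
import numpy as np

def miou_per_distance_bin(points, pred, gt, num_classes,
                          edges=(10, 15, 20, 25, 30, 35, 40, 45, 50)):
    """Return (upper bin edge, mIoU) pairs, evaluating only points inside each range bin."""
    r = np.linalg.norm(points[:, :3], axis=1)             # distance to the sensor
    results = []
    for lo, hi in zip((0,) + tuple(edges[:-1]), edges):
        m = (r >= lo) & (r < hi)
        conf = np.bincount(gt[m] * num_classes + pred[m],
                           minlength=num_classes ** 2).reshape(num_classes, num_classes)
        tp = np.diag(conf)
        denom = conf.sum(axis=0) + conf.sum(axis=1) - tp   # TP + FP + FN per class
        seen = denom > 0                                   # average only classes present in the bin
        results.append((hi, (tp[seen] / denom[seen]).mean() if seen.any() else float("nan")))
    return results
```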
Another reason is that the point clouds generated by LiDAR are relatively sparse, especially as the distance to the sensor increases. This is partially addressed by SqueezeSeg, which exploits the way the rotating scanner captures the data to generate a dense range image, where each pixel corresponds roughly to a point in the scan.

These effects are further analyzed in Figure 4, where the mIoU is plotted w.r.t. the distance to the sensor. It shows that the results of all approaches get worse with increasing distance. This further confirms our hypothesis that sparsity is the main reason for worse results at large distances. However, the results also show that some methods, like SPGraph, are less affected by the distance-dependent sparsity, so combining the strengths of both paradigms might be a promising direction for future research.

Especially classes with few examples, like motorcyclists and trucks, seem to be more difficult for all approaches. But classes with only a small number of points in a single point cloud, like bicycles and poles, are also hard classes.

Finally, the best performing approach (DarkNet53Seg) with 49.9% mIoU is still far from achieving results that are on par with image-based approaches, e.g., 80% on the Cityscapes benchmark [10].

4.2. Multiple Scan Experiments

Task and Metrics. In this task, we allow methods to exploit information from a sequence of multiple past scans to improve the segmentation of the current scan. We furthermore want the methods to distinguish moving and non-moving classes, i.e., all 25 classes must be predicted, since this information should be visible in the temporal information of multiple past scans. The evaluation metric for this task is the same as in the single scan case, i.e., we evaluate the mean IoU of the current scan no matter how many past scans were used to compute the results.

Baselines. We exploit the sequential information by combining 5 scans into a single, large point cloud, i.e., the current scan at timestamp t and the 4 scans before at timestamps t−1, ..., t−4 (see the aggregation sketch below). We evaluate DarkNet53Seg and TangentConv, since these approaches can deal with a larger number of points without downsampling of the point clouds and could still be trained in a reasonable amount of time.
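One way to build such an aggregated input is to transform the past scans into the coordinate frame of the current scan using the loop-closed poses provided with the dataset. The following is a minimal sketch under that assumption; function and variable names are illustrative:

```python
import numpy as np

def aggregate_scans(scans, poses, t, k=4):
    """Merge scan t with its k predecessors, expressed in the frame of scan t.

    scans: list of (N_i x 4) arrays holding x, y, z, remission
    poses: list of 4 x 4 homogeneous matrices mapping scan i into the world frame
    """
    world_to_t = np.linalg.inv(poses[t])                        # world -> frame of scan t
    merged = []
    for i in range(max(0, t - k), t + 1):
        pts = scans[i]
        hom = np.hstack([pts[:, :3], np.ones((len(pts), 1))])   # homogeneous coordinates
        local = (world_to_t @ poses[i] @ hom.T).T[:, :3]        # into the frame of scan t
        merged.append(np.hstack([local, pts[:, 3:4]]))          # keep the remission channel
    return np.vstack(merged)
```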
Results and Discussion. Table 4 shows the per-class results for the movable classes and the mean IoU (mIoU) over all classes. For each method, the upper row shows the IoU for non-moving objects and the lower row the IoU for moving objects. The performance of the remaining static classes is similar to the single scan results, and we refer to the supplement for a table containing all classes.

| Approach | mIoU | car | truck | other-vehicle | person | bicyclist | motorcyclist |
|---|---|---|---|---|---|---|---|
| TangentConv [52] non-moving | 34.1 | 84.9 | 21.1 | 18.5 | 1.6 | 0.0 | 0.0 |
| TangentConv [52] moving | | 40.3 | 42.2 | 30.1 | 6.4 | 1.1 | 1.9 |
| DarkNet53Seg non-moving | 41.6 | 84.1 | 20.0 | 20.7 | 7.5 | 0.0 | 0.0 |
| DarkNet53Seg moving | | 61.5 | 37.8 | 28.9 | 15.2 | 14.1 | 0.2 |

Table 4: IoU results using a sequence of multiple past scans (in %). For each approach, the upper row corresponds to the IoU of the non-moving classes and the lower row to the moving classes.

The general trend that the projective methods perform better than the point-based methods is still apparent, which can also be attributed to the larger number of parameters, as in the single scan case. Both approaches show difficulties in separating moving and non-moving objects, which might be caused by our design decision to aggregate multiple scans into a single large point cloud. The results show that especially bicyclist and motorcyclist never get correctly assigned the non-moving class, which is most likely a consequence of the generally sparser object point clouds.

We expect that new approaches could explicitly exploit the sequential information by using multiple input streams to the architecture or even recurrent neural networks to account for the temporal information, which again might open a new line of research.

5. Evaluation of Semantic Scene Completion

After leveraging a sequence of past scans for semantic point cloud segmentation, we now show a scenario that makes use of future scans. Due to its sequential nature, our dataset provides the unique opportunity to be extended for the task of 3D semantic scene completion. Note that this is the first real-world outdoor benchmark for this task. Existing point cloud datasets cannot be used to address this task, as they do not allow for aggregating labeled point clouds that are sufficiently dense in both space and time.

In semantic scene completion, one fundamental problem is to obtain ground truth labels for real-world datasets. In the case of NYUv2 [48], CAD models were fit into the scene [45] using an RGB-D image captured by a Kinect sensor. New approaches often resort to proving their effectiveness on the larger, but synthetic, SUNCG dataset [49]. However, a dataset combining the scale of a synthetic dataset with the usage of real-world data is still missing.

In the case of our proposed dataset, the car carrying the LiDAR moves past 3D objects in the scene and thereby records their backsides, which are hidden in the initial scan due to self-occlusion. This is exactly the information needed for semantic scene completion, as it contains the full 3D geometry of all objects, while their semantics are provided by our dense annotations.

Dataset Generation. By superimposing an exhaustive number of future laser scans in a predefined region in front of the car, we can generate pairs of inputs and targets that correspond to the task of semantic scene completion. As proposed by Song et al. [49], our dataset for the scene completion task is a voxelized representation of the 3D scene. We select a volume of 51.2 m ahead of the car, 25.6 m to every side, and 6.4 m in height with a voxel resolution of 0.2 m, which results in a volume of 256 × 256 × 32 voxels to predict. We assign a single label to every voxel based on the majority vote over all labeled points inside a voxel. Voxels that do not contain any points are labeled as empty.

To compute which voxels belong to the occluded space, we check for every pose of the car which voxels are visible to the sensor by tracing a ray. Some of the voxels, e.g., those inside objects or behind walls, are never visible, so we ignore them during training and evaluation.

Overall, we extracted 19,130 pairs of input and target voxel grids for training, 815 for validation, and 3,992 for testing. For the test set, we only provide the unlabeled input voxel grid and withhold the target voxel grids. Figure 5 shows an example of an input and target pair.

Task and Metrics. In semantic scene completion, we are interested in predicting the complete scene inside a certain volume from a single initial scan. More specifically, we use as input a voxel grid, where each voxel is marked as empty or occupied, depending on whether or not it contains a laser measurement. For semantic scene completion, one needs to predict whether a voxel is occupied and its semantic label in the completed scene.

For evaluation, we follow the evaluation protocol of Song et al. [49] and compute the IoU for the task of scene completion, which only classifies a voxel as being occupied or empty, i.e., ignoring the semantic label, as well as the mIoU of Eq. (1) for the task of semantic scene completion over the same 19 classes that were used for the single scan semantic segmentation task (see Section 4).
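Generating such a target grid boils down to voxelizing the aggregated, labeled points into the 256 × 256 × 32 volume described above and taking a per-voxel majority vote. The following is a simplified sketch, not the benchmark generation pipeline; the occlusion handling by ray tracing is omitted, and the vertical offset of the volume is an assumption, since it is not specified in the text:

```python
import numpy as np

RES = 0.2                                    # voxel resolution in meters
DIMS = np.array([256, 256, 32])              # 51.2 m ahead, +/- 25.6 m sideways, 6.4 m height
ORIGIN = np.array([0.0, -25.6, -2.0])        # volume origin; the z offset is an assumption

def voxelize_labels(points, labels, num_classes, empty_label=0):
    """Assign every voxel the majority label of its points; voxels without points stay empty."""
    idx = np.floor((points[:, :3] - ORIGIN) / RES).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < DIMS), axis=1)
    idx, labels = idx[inside], labels[inside]

    flat = (idx[:, 0] * DIMS[1] + idx[:, 1]) * DIMS[2] + idx[:, 2]
    counts = np.zeros((int(DIMS.prod()), num_classes), dtype=np.int32)  # dense for clarity
    np.add.at(counts, (flat, labels), 1)                                # per-voxel label histogram

    grid = np.full(int(DIMS.prod()), empty_label, dtype=np.int32)
    occupied = counts.sum(axis=1) > 0
    grid[occupied] = counts[occupied].argmax(axis=1)
    return grid.reshape(tuple(DIMS))
```

With grids like this, the completion IoU compares occupancy (grid != empty_label) between prediction and target, while the semantic mIoU applies Eq. (1) to the voxel labels.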
Figure 5: Left: Visualization of the incomplete input for the semantic scene completion benchmark. Note that we show the
labels only for better visualization, but the real input is a single raw voxel grid without any labels. Right: Corresponding
target output representing the completed and fully labeled 3D scene.
State of the Art. Early approaches addressed the task of scene completion either without predicting semantics [16], thereby not providing a holistic understanding of the scene, or by trying to fit a fixed number of mesh models to the scene geometry [20], which limits the expressiveness of the approach.

Song et al. [49] were the first to address the task of semantic scene completion in an end-to-end fashion. Their work spawned a lot of interest in the field, yielding models that combine the usage of color and depth information [33, 18], address the problem of sparse 3D feature maps by introducing submanifold convolutions [64], or increase the output resolution by deploying a multi-stage coarse-to-fine training scheme [12]. Other works experimented with new encoder-decoder CNN architectures as well as with improving the loss term by adding adversarial loss components [58].

| Approach | Completion (IoU) | Semantic Scene Completion (mIoU) |
|---|---|---|
| SSCNet [49] | 29.83 | 9.53 |
| TS3D [18] | 29.81 | 9.54 |
| TS3D [18] + DarkNet53Seg | 24.99 | 10.19 |
| TS3D [18] + DarkNet53Seg + SATNet | 50.60 | 17.70 |

Table 5: Semantic scene completion baselines.

Baseline Approaches. We report the results of four semantic scene completion approaches. In the first approach, we apply SSCNet [49] without the flipped TSDF as input feature. This has minimal impact on the performance, but significantly speeds up the training due to faster preprocessing [18]. Then we use the Two Stream (TS3D) approach [18], which makes use of the additional information from the RGB image corresponding to the input laser scan. To this end, the RGB image is first processed by a 2D semantic segmentation network, DeepLab v2 (ResNet-101) [9] trained on Cityscapes, to generate a semantic segmentation. The depth information from the single laser scan and the labels inferred from the RGB image are combined in an early fusion. Furthermore, we modify the TS3D approach in two steps: first, by directly using labels from the best LiDAR-based semantic segmentation approach (DarkNet53Seg), and second, by exchanging the 3D-CNN backbone for SATNet [33].

Results and Discussion. Table 5 shows the results of each of the baselines, whereas results for individual classes are reported in the supplement. The TS3D network, incorporating 2D semantic segmentation of the RGB image, performs similarly to SSCNet, which only uses depth information. However, using the best semantic segmentation working directly on the point cloud slightly outperforms SSCNet on semantic scene completion (TS3D + DarkNet53Seg). Note that the first three approaches are based on SSCNet's 3D-CNN architecture, which performs a 4-fold downsampling in the forward pass and thus renders them incapable of dealing with details of the scene. In our final approach, we exchange the SSCNet backbone of TS3D + DarkNet53Seg for SATNet [33], which is capable of dealing with the desired output resolution. Due to memory limitations, we use random cropping during training. During inference, we divide each volume into six equal parts, perform scene completion on them individually, and subsequently fuse them. This approach performs much better than the SSCNet-based approaches.

Apart from dealing with the target resolution, a challenge for current models is the sparsity of the laser input signal in the far field, as can be seen from Figure 5. To obtain a higher-resolution input signal in the far field, approaches would have to exploit more efficiently the information from the high-resolution RGB images provided along with each laser scan.

6. Conclusion and Outlook

In this work, we have presented a large-scale dataset showing unprecedented scale in point-wise annotation of point cloud sequences. We provide a range of different baseline experiments for three tasks: (i) semantic segmentation using a single scan, (ii) semantic segmentation using multiple scans, and (iii) semantic scene completion.

In future work, we plan to also provide instance-level annotation over the whole sequence, i.e., we want to distinguish different objects in a scan, but also identify the same object over time. This will enable the investigation of temporal instance segmentation over sequences. However, we also see potential for other new tasks based on our labeling effort, such as the evaluation of semantic SLAM.

Acknowledgments. We thank all students that helped with annotating the data. The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under FOR 1505 Mapping on Demand, BE 5996/1-1, GA 1927/2-2, and under Germany's Excellence Strategy, EXC-2070 – 390732324 (PhenoRob).
References

[1] Anuraag Agrawal, Atsushi Nakazawa, and Haruo Takemura. MMM-classification of 3D Range Data. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2009. 5
[2] Dragomir Anguelov, Ben Taskar, Vassil Chatalbashev, Daphne Koller, Dinkar Gupta, Geremy Heitz, and Andrew Ng. Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 169–176, 2005. 5
[3] Iro Armeni, Alexander Sax, Amir R. Zamir, and Silvio Savarese. Joint 2D-3D-Semantic Data for Indoor Scene Understanding. arXiv preprint, 2017. 2
[4] Jens Behley, Kristian Kersting, Dirk Schulz, Volker Steinhage, and Armin B. Cremers. Learning to Hash Logistic Regression for Fast 3D Scan Point Classification. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 5960–5965, 2010. 5
[5] Jens Behley and Cyrill Stachniss. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments. In Proc. of Robotics: Science and Systems (RSS), 2018. 3
[6] Jens Behley, Volker Steinhage, and Armin B. Cremers. Performance of Histogram Descriptors for the Classification of 3D Laser Range Data in Urban Environments. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2012. 2, 3
[7] Alexandre Boulch, Joris Guerry, Bertrand Le Saux, and Nicolas Audebert. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Computers & Graphics, 2017. 5
[8] Angel X. Chang, Thomas Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University and Princeton University and Toyota Technological Institute at Chicago, 2015. 2, 5
[9] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 40(4):834–848, 2018. 8
[10] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016. 2, 3, 4, 6
[11] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 2
[12] Angela Dai, Daniel Ritchie, Martin Bokeloh, Scott Reed, Jürgen Sturm, and Matthias Nießner. ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 2, 8
[13] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009. 2
[14] Francis Engelmann, Theodora Kontogianni, Jonas Schult, and Bastian Leibe. Know What Your Neighbors Do: 3D Semantic Segmentation of Point Clouds. arXiv preprint, 2018. 5
[15] Mark Everingham, S.M. Ali Eslami, Luc van Gool, Christopher K.I. Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision (IJCV), 111(1):98–136, 2015. 4
[16] Michael Firman, Oisin Mac Aodha, Simon Julier, and Gabriel J. Brostow. Structured Prediction of Unobserved Voxels From a Single Depth Image. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 5431–5440, 2016. 7
[17] Adrien Gaidon, Qiao Wang, Yohann Cabon, and Eleonora Vig. Virtual Worlds as Proxy for Multi-Object Tracking Analysis. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016. 3
[18] Martin Garbade, Yueh-Tung Chen, J. Sawatzky, and Juergen Gall. Two Stream 3D Semantic Scene Completion. In Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019. 8
[19] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 3354–3361, 2012. 1, 2, 3
[20] Andreas Geiger and Chaohui Wang. Joint 3D Object and Layout Inference from a single RGB-D Image. In Proc. of the German Conf. on Pattern Recognition (GCPR), pages 183–195, 2015. 7
[21] Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 5
[22] Fabian Groh, Patrick Wieschollek, and Hendrik Lensch. Flex-Convolution (Million-Scale Pointcloud Learning Beyond Grid-Worlds). In Proc. of the Asian Conf. on Computer Vision (ACCV), December 2018. 5
[23] Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, and Marc Pollefeys. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, volume IV-1-W1, pages 91–98, 2017. 2
[24] Binh-Son Hua, Quang-Hieu Pham, Duc Thanh Nguyen, Minh-Khoi Tran, Lap-Fai Yu, and Sai-Kit Yeung. SceneNN: A Scene Meshes Dataset with aNNotations. In Proc. of the Intl. Conf. on 3D Vision (3DV), 2016. 2
[25] Binh-Son Hua, Minh-Khoi Tran, and Sai-Kit Yeung. Pointwise Convolutional Neural Networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 5
[26] Jing Huang and Suya You. Point Cloud Labeling using 3D Convolutional Neural Network. In Proc. of the Intl. Conf. on Pattern Recognition (ICPR), 2016. 5
[27] Varun Jampani, Martin Kiefel, and Peter V. Gehler. Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016. 5
[28] Mingyang Jiang, Yiran Wu, and Cewu Lu. PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv preprint, 2018. 5
[29] Andrew E. Johnson and Martial Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 21(5):433–449, 1999. 5
[30] Roman Klokov and Victor Lempitsky. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In Proc. of the IEEE Intl. Conf. on Computer Vision (ICCV), 2017. 5
[31] Loic Landrieu and Martin Simonovsky. Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 6
[32] Wenbin Li, Sajad Saeedi, John McCormac, Ronald Clark, Dimos Tzoumanikas, Qing Ye, Yuzhong Huang, Rui Tang, and Stefan Leutenegger. InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset. In Proc. of the British Machine Vision Conference (BMVC), 2018. 2
[33] Shice Liu, Yu Hu, Yiming Zeng, Qiankun Tang, Beibei Jin, Yinhe Han, and Xiaowei Li. See and Think: Disentangling Semantic Scene Completion. In Proc. of the Conf. on Neural Information Processing Systems (NeurIPS), pages 261–272, 2018. 8
[34] Daniel Maturana and Sebastian Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2015. 5
[35] John McCormac, Ankur Handa, Stefan Leutenegger, and Andrew J. Davison. SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? In Proc. of the IEEE Intl. Conf. on Computer Vision (ICCV), 2017. 2
[36] Daniel Munoz, J. Andrew Bagnell, Nicolas Vandapel, and Martial Hebert. Contextual Classification with Functional Max-Margin Markov Networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009. 2, 5
[37] Daniel Munoz, Nicholas Vandapel, and Martial Hebert. Directional Associative Markov Network for 3-D Point Cloud Classification. In Proc. of the International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), pages 63–70, 2008. 5
[38] Daniel Munoz, Nicholas Vandapel, and Martial Hebert. Onboard Contextual Classification of 3-D Point Clouds with Learned High-order Markov Random Fields. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2009. 5
[39] Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In Proc. of the IEEE Intl. Conf. on Computer Vision (ICCV), 2017. 2, 3
[40] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 5, 6
[41] Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proc. of the Conf. on Neural Information Processing Systems (NeurIPS), 2017. 5, 6
[42] Joseph Redmon and Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv preprint, 2018. 5
[43] Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, and Federico Tombari. Fully-Convolutional Point Networks for Large-Scale Point Clouds. In Proc. of the European Conf. on Computer Vision (ECCV), 2018. 5
[44] Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. OctNet: Learning Deep 3D Representations at High Resolutions. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 5
[45] Jason Rock, Tanmay Gupta, Justin Thorsen, JunYoung Gwak, Daeyun Shin, and Derek Hoiem. Completing 3D Object Shape from One Depth Image. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015. 7
[46] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio Lopez. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016. 2
[47] Xavier Roynard, Jean-Emmanuel Deschaud, and Francois Goulette. Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification. Intl. Journal of Robotics Research (IJRR), 37(6):545–557, 2018. 2, 3
[48] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. In Proc. of the European Conf. on Computer Vision (ECCV), 2012. 2, 5, 7
[49] Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, and Thomas Funkhouser. Semantic Scene Completion from a Single Depth Image. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 7, 8
[50] Bastian Steder, Giorgio Grisetti, and Wolfram Burgard. Robust Place Recognition for 3D Range Data based on Point Features. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2010. 2
[51] Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz. SPLATNet: Sparse Lattice Networks for Point Cloud Processing. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 6
[52] Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, and Qian-Yi Zhou. Tangent Convolutions for Dense Prediction in 3D. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 6, 7
[53] Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, and Silvio Savarese. SEGCloud: Semantic Segmentation of 3D Point Clouds. In Proc. of the Intl. Conf. on 3D Vision (3DV), 2017. 5
[54] Gusi Te, Wei Hu, Zongming Guo, and Amin Zheng. RGCNN: Regularized Graph CNN for Point Cloud Segmentation. arXiv preprint, 2018. 5
[55] A. Torralba and A. Efros. Unbiased Look at Dataset Bias. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011. 2, 3
[56] Rudolph Triebel, Kristian Kersting, and Wolfram Burgard. Robust 3D Scan Point Classification using Associative Markov Networks. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), pages 2603–2608, 2006. 5
[57] Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, and Raquel Urtasun. Deep Parametric Continuous Convolutional Neural Networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 3
[58] Yida Wang, David Joseph Tan, Nassir Navab, and Federico Tombari. Adversarial Semantic Scene Completion from a Single Depth Image. In Proc. of the Intl. Conf. on 3D Vision (3DV), pages 426–434, 2018. 8
[59] Zongji Wang and Feng Lu. VoxSegNet: Volumetric CNNs for Semantic Part Segmentation of 3D Shapes. arXiv preprint, 2018. 5
[60] Bichen Wu, Alvin Wan, Xiangyu Yue, and Kurt Keutzer. SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2018. 5, 6
[61] Bichen Wu, Xuanyu Zhou, Sicheng Zhao, Xiangyu Yue, and Kurt Keutzer. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2019. 5, 6
[62] Xuehan Xiong, Daniel Munoz, J. Andrew Bagnell, and Martial Hebert. 3-D Scene Analysis via Sequenced Predictions over Points and Regions. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), pages 2609–2616, 2011. 5
[63] Wei Zeng and Theo Gevers. 3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local and Global Contextual Cues. arXiv preprint, 2017. 5
[64] Jiahui Zhang, Hao Zhao, Anbang Yao, Yurong Chen, Li Zhang, and Hongen Liao. Efficient Semantic Scene Completion Network with Spatial Group Convolution. In Proc. of the European Conf. on Computer Vision (ECCV), pages 733–749, 2018. 8
[65] Richard Zhang, Stefan A. Candra, Kai Vetter, and Avideh Zakhor. Sensor Fusion for Semantic Segmentation of Urban Scenes. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2015. 2