Preprint · December 2020

DOI: 10.13140/RG.2.2.21621.81122


All content following this page was uploaded by Xinshuo Weng on 15 December 2020.


All-In-One Drive: A Large-Scale Comprehensive Perception Dataset
with High-Density Long-Range Point Clouds

Xinshuo Weng, Yunze Man, Dazhi Cheng, Jinhyung Park, Matthew O’Toole, Kris Kitani
Robotics Institute, Carnegie Mellon University
{xinshuow, yman, dazhic, jinhyun1, motoole2, kkitani}@cs.cmu.edu

Abstract

Developing datasets that cover comprehensive sensors, annotations and full data distribution is important for innovating robust multi-sensor multi-task perception systems. Though many datasets have been released, they target different use-cases such as 3D segmentation (SemanticKITTI), radar sensing (nuScenes), and large-scale training (Waymo). As a result, we are still in need of a dataset that forms a union of various strengths of existing datasets. To address this challenge, we present the AIODrive dataset, a synthetic large-scale dataset that provides comprehensive sensors, annotations and environmental variations. Specifically, we provide (1) eight sensor modalities (RGB, Stereo, Depth, LiDAR, SPAD-LiDAR, Radar, IMU, GPS), (2) annotations for all mainstream perception tasks (e.g., detection, tracking, trajectory prediction, segmentation, depth estimation), and (3) rare driving scenarios such as adverse weather and lighting, crowded scenes, high-speed driving, violation of traffic rules, and accidents. In addition to comprehensive data, long-range perception is also important to perception systems as early detection of faraway objects can help prevent collision in high-speed driving scenarios. However, due to the sparsity and limited range of point cloud data in prior datasets, developing and evaluating long-range perception algorithms is challenging. To address the issue, we provide high-density long-range point clouds for LiDAR and SPAD-LiDAR sensors, with about 10× higher density and 10× larger sensing range than Velodyne-64.

[Figure 1 diagram. Labels: AIODrive; High-density long-range point clouds; Point cloud segmentation; SemanticKITTI; Violation of traffic rules; Image segmentation; Cityscape; High-speed driving; A*3D; Accidents; SPAD-LiDAR; Rich maps; Argoverse; Large-scale 3D detection and tracking data; Waymo; Adverse weather; Radar sensing; nuScenes]
Figure 1: Our proposed dataset forms a union of various strengths of current datasets, including comprehensive environmental variations, sensors and annotations.

1. Introduction

The present surge towards building autonomous vehicles has undoubtedly advanced computer vision research by generating large diverse datasets acquired from hundreds of hours of data, thousands of hours of manual annotation, and billions of dollars towards the development of a customized sensing platform – the autonomous vehicle. As a result of these investments, large driving datasets [53, 12, 40, 6, 21, 72, 73, 42] have been released to the research community. It is important to note that while these datasets helped to advance perception systems, each dataset has different focuses as shown in Figure 1. For example, Waymo [53] dataset provides large-scale data for training 3D object detection and tracking algorithms but does not support other perception tasks such as point cloud segmentation. Likewise, Argoverse [13] dataset provides map annotation for improving perception algorithms but cannot be used for algorithms requiring Radar data. To innovate perception systems that require diverse sensor modalities or methods that integrate multiple perception tasks, existing datasets might not be applicable. Also, merging a few existing datasets together is non-trivial because sensor configurations are significantly different across datasets.

As a community, we are in need of a dataset that forms a union of various strengths of existing datasets to innovate multi-sensor multi-task perception systems. However, building a real-world dataset that combines the strengths of multiple datasets is challenging as it requires significant resources. As mentioned above, the creation of a real-world dataset requires (1) a large amount of diverse data to capture the full data distribution, (2) large amounts of annotation across various perception tasks, and most importantly (3) a physical and expensive platform for recording such data.

One solution that we propose in this work is the use of a simulator, Carla [16], to generate a comprehensive perception dataset, which we call All-In-One Drive (AIODrive)
dataset. Synthetic data generation is able to meet the challenges of creating a comprehensive perception dataset because: (1) a large amount of diverse data can be generated in simulation as the Carla simulator can change the density of traffic, velocity of agents, generate violations of traffic rules and change weather and lighting; (2) large amounts of annotation for a multitude of tasks can be generated by combining and post-processing Carla outputs. For example, we can project 2D semantic annotation to 3D given the depth image, resulting in 3D semantic annotation for point clouds. Then, combining with 3D bounding box annotation, 3D semantic annotation can be converted to 3D instance and panoptic segmentation; (3) a 'physical' yet affordable sensing platform can be constructed in simulation to change sensor configuration and even create sensors that are not yet available in public datasets (early prototype is available in industry), e.g., long-range high-density LiDAR and SPAD-LiDAR as shown in Figure 2. These powerful sensors can help advance research in long-range perception.

[Figure 2 panels: Velodyne-64 Point Cloud (Bird's Eye View); Our High-Density Long-Range Point Cloud]
Figure 2: (Top) Velodyne-64 [1] point cloud with 100k points and a range of 120m. (Bottom) Point cloud from our LiDAR sensor has 1M points and a range of 1km, which can be used to innovate long-range perception systems.

To summarize, our AIODrive dataset provides:
(1) Eight sensor modalities: 5 high resolution RGB cameras (1 stereo pair); 5 depth cameras, 1000 meter range LiDAR at multiple levels of density (up to 1M points), 1000 meter range SPAD-LiDAR, Radar, IMU, and GPS. Four of the sensors have 360° horizontal coverage (camera, LiDAR, SPAD-LiDAR, Radar);
(2) Annotations for mainstream perception tasks: 2D/3D semantic, instance and panoptic segmentation, fine-grained object categories, 2D/3D bounding boxes, object trajectories, velocity and acceleration;
(3) Diverse environmental variations: adverse weather and lighting, crowded scenes, people running, high-speed driving, violations of traffic rules, and car accidents.

Though synthetic data generation can be used to create a comprehensive dataset, one might argue that the domain gap between synthetic and real data is a weakness. In our defense, we believe that the usefulness of synthetic datasets is firmly predicated on a body of prior work [47, 38, 45, 22] that has shown, when synthetic data is used correctly, it can be used to enhance perception performance on real data. For example, [38] showed that using synthetic data for augmentation can improve performance for depth prediction on real NYU [50] and SUN RGB-D [51] datasets. [45] showed that using synthetic data created from Unity with free annotation of semantic segmentation can improve segmentation performance on real-world datasets such as KITTI [17], CamVid [10], LabelMe [46], CBCL [7]. Also, [47] showed that augmenting with LiDAR point clouds generated from Carla simulator can improve bird's eye view 2D detection performance on the real-world KITTI dataset. [22] showed that using GTA-V [44] to synthesize LiDAR point clouds for pre-training 3D object detectors can improve average precision by 5% on the KITTI dataset. Similar to the success of prior synthetic datasets, we believe that the usefulness of our dataset is also undoubted, as validated by our experiments on real datasets. Again, we emphasize that the role of our dataset is not to replace real datasets. Instead, it can be used in concert with real data, such as using our data to pre-train detectors to improve performance on real data or using our rare driving data as out-of-distribution test data.

The broader impact of our AIODrive dataset is its comprehensive nature allowing for development and evaluation of multi-sensor multi-task perception systems that are sometimes not possible with existing datasets. The AIODrive dataset includes a super-set of sensors, annotations and environmental variations needed to develop novel perception systems. Also, our AIODrive dataset is the largest driving dataset with over 250k frames with 26M 2D/3D labeled object instances, roughly double the size of Waymo dataset in terms of labeled instances. We will release our dataset to the community for free use and open a series of challenges, such as long-range 3D object detection based on our long-range high-density point cloud data.

2. Related Work

Perception Dataset. Sensors, environmental variations and annotations are key aspects of perception datasets for autonomous driving [57]. In terms of the annotation, KITTI [17] provides 2D/3D box trajectory labels on images and LiDAR data, enabling object detection, tracking and forecasting. To enable image segmentation research, Cityscape [14], Mapillary [39], Apolloscape [56], SYNTHIA [45] datasets are proposed, each having an increased number of annotated frames. For 3D segmentation, SemanticKITTI [6] released point-wise semantic labels on point clouds. As map information such as drivable area is useful to perception, Argoverse [13] provides rich map annotations to innovate novel perception algorithms requiring map data.

In addition to annotations, perception datasets also need diverse environmental variations to capture the rare driving situations. As prior datasets usually have a small number (<10) of agents per frame without complex interactions between agents, H3D [40] was released, with an average of 37 agents per frame to include data in highly-crowded scenarios with complex agent-agent interactions. To deal with adverse weather and lighting conditions, recent datasets such as CADC [42], nuScenes [12], A*3D [41], Waymo [53] collected data under rainy, snowy, foggy, dusky and night conditions. As prior datasets usually acquired data at a low driving speed (e.g., average 16 km/h in nuScenes), A*3D dataset [41] was proposed to collect data at a much higher speed (e.g., 40-70 km/h), in order to include more high-speed driving data that is common in the real-world.

Regarding the sensing modalities, nuScenes [12] collected the first dataset with Radar data, in addition to standard RGB camera, LiDAR, IMU, and GPS sensors. As earlier datasets collected data in the frontal direction only, ignoring objects to the sides or rear that are also important to decision-making in driving, Argoverse [13], Audi [18], and nuScenes [12] equip their vehicles with multiple LiDAR and camera sensors for 360° data capturing.

In comparison to existing datasets with a subset of sensors, annotations and environmental variations, we provide a super-set of sensors, annotations and environmental variations. Also, beyond standard LiDAR such as Velodyne-64 [1] used in prior datasets for data collection, we provide LiDAR sensors with 10× larger sensing range and 4 levels of point densities, with the highest level having 10× higher resolution (point density) than Velodyne-64. Importantly, the design of our long-range LiDAR sensors is not imaginary but based on active developments in new LiDAR sensors such as AlphaPrime [4], Ouster [5] and Panasonic [3], which are developed with higher-resolution and longer-range (e.g., 300m) depth sensing. In addition to providing LiDAR sensors, also referred to as APD-LiDAR (avalanche photodiodes), our dataset also provides SPAD-LiDAR (single photon avalanche diode) sensor which records photon counts over space and time. This type of SPAD-LiDAR sensor, although available in industry [2, 11], is not found in public perception datasets for research purpose.

Synthetic Data Generation. Though many existing simulators (e.g., Sim4CV [37], Nvidia Drive [8]) can be used for synthetic data generation, most of these simulators are not open-source (not easy to make modifications) and free-to-use license is not available (i.e., derivative products are not allowed). For the open-sourced simulators, AirSim [48] and Carla [16] are popular due to detailed documentation and diverse sensors. However, AirSim does not allow low-level control over every agent in the way that Carla allows, though AirSim has advantages in aerial data capture. In addition to simulators, commercial video games such as GTA-V [44] can also be used for synthetic data generation but they do not allow low-level control of scene elements. Accordingly, we have selected to use Carla for data generation as it affords the most flexibility and customization.

Long-Range Perception. Increasing the maximum sensing range of perception systems is important for safety in high-speed driving scenarios. However, LiDAR used in existing datasets has limited range, e.g., 120m in KITTI [17], 70m in nuScenes [12], 75m in Waymo [53]. Assuming perfect detection accuracy and zero algorithmic latency, a car moving at a speed of 120km/h will have at most 3.6 seconds to respond to a detected obstacle with a 120m-range LiDAR. Naturally, enabling perception at a longer-range is preferred for increased safety. To the best of our knowledge, [73] is the only work exploring a scenario with up to 300m of depth sensing using three high-resolution RGB cameras. In contrast, our work uses a simulator to collect long-range high-density point clouds. We believe that our data can help aid in the development of long-range perception algorithms before data from real-world long-range sensors become widely available to the research community.

3. The AIODrive Dataset

For each scene in our dataset, we choose one of eight cities from Carla assets and sample locations covering the entire city to generate agents. For each agent (vehicles, people), we set a random and faraway target destination to generate diverse trajectories. We randomly customize the behavior (e.g., maximum speed, how often to ignore red light, how often to cross the road) for each agent to increase the diversity of the data. Once the environment is set up, we randomly select a vehicle as our ego-vehicle and equip our full sensor suite to this vehicle for data recording. For agents who have approached their destinations, we provide them another faraway destination so that there is no dummy agent in our environment. We collected 250 such scenes in our dataset, each containing 1000 frames with a full set of annotations. As shown in Table 1, our dataset has the largest number of annotated frames compared to other datasets.

3.1. Comprehensive Sensor Suite

To increase robustness to sensor failure, multi-sensor perception approaches [29, 43, 65, 59, 69, 30, 25] are often more favorable than single-sensor approaches [49, 36, 60, 71, 62]. To innovate multi-sensor approaches, it is crucial that datasets can provide comprehensive sensing modalities. To that end, we provide common sensors such as RGB, Depth, Stereo camera, LiDAR, IMU and GPS, as well as the Radar and SPAD-LiDAR sensors, which are often not available in prior work as shown in Table 1 (except for nuScenes providing the Radar data). To the best of our knowledge, we are
Table 1: Comparison of size and sensor modalities. Our dataset has the most comprehensive sensors while being the largest.
Dataset # of cities # of hours # of sequences # of annotated images Stereo Depth LiDAR Radar SPAD-LiDAR IMU/GPS All 360°
KITTI [17] 1 1.5 22 15k ✓ ✓ ✓ ✓
Cityscape [14] 27 2.5 0 5k ✓ ✓
Mapillary Vistas [39] 30 - - 25k
ApolloScape [21, 56] 4 - - 140k ✓ ✓ ✓
SYNTHIA [45] 1 2.2 4 200k ✓ ✓
H3D [40] 4 0.8 160 27k ✓ ✓
SemanticKITTI [6] 1 1.2 22 43k ✓
DrivingStereo [52] - 5 42 180k ✓ ✓ ✓ ✓
Argoverse [13] 2 0.6 113 22k ✓ ✓ ✓ ✓
EuroCity [9] 31 0.4 - 47k
CADC [42] 1 0.6 75 7k ✓ ✓
Audi [18] 3 0.3 3 12k ✓ ✓ ✓ ✓ ✓
nuScenes [12] 2 5.5 1k 40k ✓ ✓ ✓
A*3D [41] 1 55 - 39k ✓ ✓
Waymo Open [53] 3 6.4 1150 230k ✓
Ours (AIODrive) 8 6.9 250 250k ✓ ✓ ✓ ✓ ✓ ✓ ✓
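
As a quick consistency check on the AIODrive row of Table 1, the scene count and scene length stated in Section 3 (250 scenes of 1000 frames) together with the 10Hz capture rate stated in Section 3.1 reproduce the reported frame count and duration. The sketch below is only illustrative arithmetic; the function name is hypothetical and not part of any released toolkit.

```python
def dataset_size(num_scenes: int, frames_per_scene: int, rate_hz: float):
    """Return (total frames, total recording hours) for fixed-length scenes."""
    total_frames = num_scenes * frames_per_scene
    total_hours = total_frames / rate_hz / 3600.0  # frames -> seconds -> hours
    return total_frames, total_hours


# Values taken from the text: 250 scenes, 1000 frames each, 10Hz capture.
frames, hours = dataset_size(250, 1000, 10.0)
print(frames, round(hours, 1))  # 250000 6.9
```

The result matches the "250k" annotated images and "6.9" hours listed for AIODrive in Table 1.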

Table 2: Sensor description.
Sensor | Brief Description
5× RGB Camera | 10Hz frequency, two face forward stereo camera, the others are for left, right and back directions, each with a FoV of 120°, 1920 × 720
5× Depth Camera | same as the above RGB cameras
3× LiDAR | 64/800/1200 channels, 100k/600k/1M points per frame, 360° horizontal FoV, −90° to 90° vertical FoV, 10Hz frequency, ≤1000m range
1× SPAD-LiDAR | same as the above LiDAR
1× Radar | 10Hz frequency, 360° horizontal FoV, 100k points per second, ≤1000m range
1× IMU/GPS | 10Hz frequency

[Figure 3 diagram. Labels: Camera (Right); Camera (Front Right); Camera (Back); LiDAR, Radar, IMU/GPS (Top); Camera (Front Left); Camera (Left). Legend: Downward from ground; Upward from ground; X-axis; Y-axis; Z-axis]
Figure 3: Sensor layout and coordinate systems.
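
The axis conventions summarized in Figure 3 (camera: x right, y down, z front; LiDAR/Radar and IMU/GPS: x front, y left, z up) imply a fixed axis permutation between the two frames. The following is a minimal sketch of that permutation only. It assumes a forward-facing camera that shares its origin with the LiDAR, which is a simplification introduced here; the actual per-camera extrinsics (mounting position and yaw of the left, right and back cameras) are not given in this section, and the names `LIDAR_TO_CAMERA`, `lidar_to_camera` and `camera_to_lidar` are hypothetical.

```python
import numpy as np

# Each row expresses one camera axis in LiDAR axes (assumed shared origin):
#   camera x (right)   = -LiDAR y (left)
#   camera y (down)    = -LiDAR z (up)
#   camera z (forward) = +LiDAR x (front)
LIDAR_TO_CAMERA = np.array([[0.0, -1.0, 0.0],
                            [0.0, 0.0, -1.0],
                            [1.0, 0.0, 0.0]])


def lidar_to_camera(points_lidar: np.ndarray) -> np.ndarray:
    """Re-express (N, 3) LiDAR-frame points in the camera axis convention."""
    return points_lidar @ LIDAR_TO_CAMERA.T


def camera_to_lidar(points_camera: np.ndarray) -> np.ndarray:
    """Inverse mapping; the inverse of a rotation matrix is its transpose."""
    return points_camera @ LIDAR_TO_CAMERA


# A point 10 m ahead, 2 m to the left and 1 m above the sensor origin
# becomes x = -2 (left of center), y = -1 (above center), z = 10 (depth).
print(lidar_to_camera(np.array([[10.0, 2.0, 1.0]])))
```

The matrix has determinant +1, so both frames remain right-handed, consistent with the right-hand rule stated in the text.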

the first to provide the SPAD-LiDAR data in public perception datasets. Also, our camera, LiDAR, Radar and SPAD sensors all have 360° horizontal field of view (FoV).

Sensor Specifications. We show sensor descriptions in Table 2. Our sensor suite contains five (four for 360° sensing and one for stereo) RGB and five depth cameras, as well as three LiDAR, one Radar, one SPAD-LiDAR and IMU/GPS sensors. All sensors are synchronized and we use the same capturing frequency of 10Hz for all sensors.

Sensor Layout and Coordinate System. We follow KITTI and use the right-hand rule for coordinate systems. Specifically, for camera coordinate, we use x axis for the right, y axis pointing downward and z axis for the front direction. For LiDAR/Radar and IMU/GPS coordinate, we use x axis for the front, y axis for the left and z axis pointing upward. We summarize sensor layout and coordinate systems in Figure 3. To avoid transforming the coordinate between LiDAR, Radar, IMU and GPS sensors, we place these sensors at the same location (on top of the ego-vehicle) for convenience, which is possible in the simulator.

High-Density Long-Range Point Cloud. To ensure safety in high-speed driving scenarios, long-range perception [73] is critical. To innovate long-range perception systems, we as a community need public datasets that collect data using longer-range LiDAR sensors than standard 120m-range Velodyne-64 [1]. In anticipation of new high-density long-range LiDAR sensors such as AlphaPrime [4], OS2 [5] and Panasonic [3], we simulate LiDAR sensors with similar specifications to help aid in the development of long-range perception systems. Specifically, we provide three LiDAR sensors, each with a resolution (density) of 100k, 600k, 1M points per frame. Each point in the cloud is a tuple of (x, y, z, r), where (x, y, z) is the 3D location. Also, r is the simulated reflectance (also called intensity) value, which depends on many factors such as the sensor's attenuation factor, distance of the point, and color of the reflection surface. The first LiDAR with 100k points and a range of 120m is to mimic the Velodyne-64, and the other two high-density long-range LiDARs are provided to innovate long-range perception systems. All LiDARs are spinning and collecting point clouds via ray-casting. To increase the realism of the LiDAR point clouds, two augmentation mechanisms are used: (1) we randomly drop a small portion of points based on their intensity values, i.e., the lower the intensity is, the higher probability to be dropped; (2) we randomly perturb a small portion of points along the direction of the laser ray, creating noisy distance measurements.

In addition to high-density point clouds from LiDAR, we also provide point clouds obtained from depth images. Specifically, we project depth images from five cameras to 3D space to obtain five point clouds, and then fuse them to obtain a full-surround point cloud with 4M points and 1km sensing range (see supplementary for details). We refer to this point cloud obtained from depth images as the depth point cloud. We show a comparison of the Velodyne-64 and depth point cloud in Figure 4. For a car at 130 meters, depth point cloud can capture a decent number of points
Table 3: Comparison of annotation availability. We provide the most diverse annotations for all mainstream perception tasks.
Dataset # of 2D bounding boxes # of 3D bounding boxes Trajectory Image seg. Point cloud seg. Motion dynamics Fine-grained object class Map
KITTI [17] 80k 80k ✓
Cityscape [14] 65k - ✓
Mapillary Vistas [39] 200k - ✓
ApolloScape [21, 56] 2.5M 70k ✓ ✓
SYNTHIA [45] - - ✓
H3D [40] - 1M ✓
SemanticKITTI [6] - - ✓
DrivingStereo [52] - -
Argoverse [13] - 993k ✓ ✓
EuroCity [9] 238k -
CADC [42] - 344k
Audi [18] - 42k ✓ ✓
nuScenes [12] - 1.4M ✓ ✓
A*3D [41] - 230k
Waymo Open [53] 9.9M 12M ✓
Ours (AIODrive) 26M 26M ✓ ✓ ✓ ✓ ✓ ✓
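
The 3D boxes counted in Table 3 are parameterized as (x, y, z, l, w, h, θ), as described under Bounding Box Trajectories in Section 3.2. As an illustration of how such a box can be consumed, the sketch below expands it into its eight corner points. It assumes, beyond what the text states, that (x, y, z) is the geometric center of the box, that l, w and h extend along the box's forward, lateral and vertical axes, and that θ is a rotation about the upward z axis of the LiDAR coordinate. The function name `box3d_corners` is hypothetical.

```python
import numpy as np


def box3d_corners(x, y, z, l, w, h, theta):
    """Return the (8, 3) corner points of a 3D box.

    Assumed convention (not confirmed by the paper): center at (x, y, z),
    length along the heading direction, heading theta measured about +z.
    """
    # Signed half-extents of the corners in the box's own frame.
    dx = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * (l / 2.0)
    dy = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * (w / 2.0)
    dz = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * (h / 2.0)
    c, s = np.cos(theta), np.sin(theta)
    # Rotate the offsets about the vertical axis, then translate to the center.
    xs = x + c * dx - s * dy
    ys = y + s * dx + c * dy
    zs = z + dz
    return np.stack([xs, ys, zs], axis=1)


# A 4 m x 2 m x 1.5 m box, 20 m ahead of the sensor, rotated by 90 degrees.
corners = box3d_corners(20.0, 0.0, 0.0, 4.0, 2.0, 1.5, np.pi / 2)
print(corners.min(axis=0), corners.max(axis=0))
```

With a 90 degree heading, the length of the box lies along the y axis, so its extent is 2 m in x and 4 m in y.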

[Figure 4 panels: Velodyne-64 point cloud (left); Our dense depth point cloud (right). Each panel marks a car at ~130m and a car at ~80m]
Figure 4: Comparison of point cloud density between Velodyne-64 (left) and depth point cloud (right). Clearly, depth point cloud with higher density provides larger potential for detecting objects at a large distance.
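
The depth point cloud compared in Figure 4 is obtained by projecting depth images to 3D space. The paper defers the details to its supplementary, so the following is only a minimal sketch of a standard pinhole back-projection for a single camera, not the authors' implementation. It assumes that the 120° FoV in Table 2 is the horizontal field of view, that pixels are square with the principal point at the image center, and that each depth value is the distance along the optical axis. The function name `backproject_depth` and its parameters are hypothetical.

```python
import numpy as np


def backproject_depth(depth: np.ndarray,
                      hfov_deg: float = 120.0,
                      max_range: float = 1000.0) -> np.ndarray:
    """Convert an (H, W) depth image in meters to (N, 3) camera-frame points.

    Output axes follow the camera convention in the text:
    x right, y down, z forward.
    """
    h, w = depth.shape
    # Focal length in pixels from the assumed horizontal field of view.
    f = w / (2.0 * np.tan(np.radians(hfov_deg) / 2.0))
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Keep only returns with a positive depth inside the sensing range.
    valid = (points[:, 2] > 0) & (points[:, 2] <= max_range)
    return points[valid]


# One 1920 x 720 depth image with a constant depth of 50 m.
cloud = backproject_depth(np.full((720, 1920), 50.0))
print(cloud.shape)
```

Under these assumptions a 1920-pixel-wide image with a 120° horizontal FoV has a focal length of about 554 pixels. Fusing the five per-camera clouds into one full-surround cloud would additionally require each camera's pose on the vehicle, which this sketch does not cover.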
while Velodyne-64 does not capture any point. This shows that depth point clouds have higher potential when used as inputs to perception algorithms to detect faraway objects.

SPAD-LiDAR is useful in tasks such as depth sensing [33], non-line-of-sight imaging [34, 20]. In anticipation of next generation SPAD-LiDAR (e.g., ON Semiconductor [2], Leica SPL100 [11]), we simulate SPAD-LiDAR to mimic the configurations of new SPAD-LiDAR sensors that are actively being developed in industry. In comparison to LiDAR (or APD-LiDAR) which requires hundreds of photons received in a short period to trigger an avalanche (i.e., a valid return point), SPAD is designed to measure every single photon. Working at a Mega-hertz sampling frequency, SPAD-LiDAR can generate a spatial-temporal sampling of the scene to capture fine-grained time of flight information, i.e., a 3D tensor of photon counts with dimensions equal to azimuth×elevation×time. Instead of working with the raw 3D tensor format, one can also sample point cloud returns from our SPAD-LiDAR, which has about 1M points with a sensing range of 1k meters. For detailed SPAD-LiDAR simulation process, please refer to our supplementary. Again, we emphasize that our dataset is the first providing SPAD-LiDAR which is not found in public datasets. Please refer to supp. for other sensors such as Radar and depth camera.

3.2. Diverse Annotations

The annotation availability to various perception tasks is important to perception datasets. As shown in Table 3, we provide the most comprehensive annotations, which includes 2D-3D bounding box trajectories, image and point cloud based segmentation, motion dynamics, fine-grained object class as well as the rich map annotation.

Bounding Box Trajectories. To support 2D-3D object detection [66, 27, 58] and re-identification [28], 2D-3D multi-object tracking [24, 54, 62, 61], trajectory forecasting [64, 67], we provide 2D-3D box annotations and object identities as shown in Figure 5. Following KITTI [17] convention, we use (x1, y1, x2, y2) to represent a 2D box, where (x1, y1) and (x2, y2) denote coordinates of the top left and bottom right corner points. A truncation and occlusion measurement is also provided. For 3D bounding box, we use the representation of (x, y, z, l, w, h, θ), where (x, y, z) denotes the object center, (l, w, h) denotes the size of the 3D box and θ is the heading orientation.

Figure 5: 2D-3D Bounding Box Trajectory Annotation. For each agent, we provide both the 2D (left) and 3D (right) tight box annotation, along with a unique identity across videos. We denote different object identities with a different color.

2D-3D Segmentation. In addition to box-level perception, pixel-level scene understanding is also useful [68], which requires pixel-level annotation. To innovate pixel-level perception algorithms, we provide 2D-3D semantic, instance and panoptic segmentation labels as shown in Figure 6. The 2D segmentation labels are defined for each pixel in the image while the 3D segmentation provides point-wise labels on the point cloud. We provide segmentation labels on objects such as vehicle, pedestrian, vegetation, building, road, sidewalk, wall, traffic sign, pole and fence. Our segmentation labels can support a range of tasks such as image segmentation, video object segmentation, point cloud segmentation, multi-object tracking and segmentation (MOTS) [55] and multi-object panoptic tracking (MOPT) [23].

Figure 6: 2D-3D Segmentation Annotation. We provide both the 2D image (top) and 3D point cloud (bottom) segmentation annotation. From left to right, we show semantic, instance and panoptic segmentation respectively.

Other labels. In addition to above mainstream annotations, we also provide other annotations: (1) motion data for all agents including linear velocity, acceleration, and angular velocity. Our motion data can be useful to ego-motion estimation, velocity estimation, tracking; (2) Fine-grained object class labels such as vehicle model class of Audi A2, Toyota Prius and Tesla Model 3; (3) Vehicle control signals such as throttle, steer, brake, and reverse; (4) City map and road structure, which is useful to localization, odometry and trajectory forecasting. Also, our large-scale dataset with point clouds and depth images can be used for point cloud forecasting [63] and depth estimation [35]. See supplementary for details of other annotations.

3.3. High Environmental Variations

To train perception systems robust to rare driving scenarios such as adverse weather, violation of traffic rule, car accidents, it is important to first include a large number of these rare driving data in the dataset. However, collecting a large number of such data is difficult in the real world because they are rare to happen and can be dangerous or at a high cost, especially for car accidents. To collect such rare driving data without causing any danger, we leverage the simulator to intentionally generate rare data and increase our environmental variations. We compare the environmental variations between datasets in Table 4. Though recent datasets mostly have weather/lighting conditions, some are limited by having too few agents. Also, existing datasets often have cars driving at a low speed and barely have data of violation of traffic rules, let alone car accidents. Instead, our dataset contains all these rare driving scenarios and has the highest environmental variations.

Table 4: Comparison of environmental variations. We provide the most variations with many rare driving scenarios.
Dataset Adv. wea./light. Crowded High-speed Vio. of rule Acci.
KITTI [17]
Cityscape [14]
Mapillary Vistas [39] ✓
ApolloScape [21, 56] ✓
SYNTHIA [45] ✓
H3D [40] ✓
SemanticKITTI [6]
DrivingStereo [52] ✓
Argoverse [13] ✓
EuroCity [9] ✓
CADC [42] ✓ ✓
Audi [18] ✓
nuScenes [12] ✓ ✓
A*3D [41] ✓ ✓
Waymo Open [53] ✓ ✓
Ours (AIODrive) ✓ ✓ ✓ ✓ ✓

Crowded Scenes. Driving in crowded scenes is challenging as interactions between agents are complex and collision might happen. To address the challenge, datasets with highly crowded scenes are needed in order to train perception systems robust to the scenarios. To that end, we collect many scenes with a high agent density. On average, we have 104 agents per frame within the sensing range of our sensors. We show the comparison of agents per frame and total labeled instances between datasets in Figure 7 (a). Note that some datasets such as KITTI and Cityscape have a significantly lower number of labeled instances because they only label objects in front. Though existing datasets such as
Table 5: Quantitative results of 2D and 3D object detection baselines on the test split of our AIODrive dataset.
Method | Input Data | Output Modalities | Car (Easy / Moderate / Hard) | Pedestrian (Easy / Moderate / Hard) | Cyclist (Easy / Moderate / Hard)
FPN [31] | RGB from 5 cameras | 2D | 89.45 / 78.66 / 69.51 | 92.88 / 87.28 / 75.50 | 94.15 / 90.80 / 72.10
PointRCNN [49] | Depth point cloud | 3D | 78.13 / 77.99 / 73.63 | 58.73 / 53.71 / 44.74 | 59.03 / 53.85 / 49.36
PointPillars [26] | Depth point cloud | 3D | 80.86 / 77.39 / 69.77 | 55.37 / 47.79 / 40.94 | 60.72 / 50.20 / 46.35
SECOND [70] | Depth point cloud | 3D | 81.35 / 79.38 / 70.57 | 62.32 / 59.23 / 54.34 | 61.45 / 58.49 / 52.86

[Figure 7 panels: (a) High Crowdness, agents per frame and total labeled instances (k) for Waymo, Audi, Cityscape, KITTI, A*3D, EuroCity, CADC, Argoverse, H3D, nuScenes and Ours; (b) Driving Speed Distribution, percentage vs. driving speed (km/h) for AIODrive and KITTI; (c) People Speed Distribution, percentage vs. pedestrian speed (km/h) for AIODrive and KITTI]
Figure 7: Data Statistics: (a) We compare agent density in terms of agents per frame and total labeled agents, which shows that our dataset has more labeled instances; (b)(c) We compare the speed of the ego-vehicle and pedestrians, showing that our data is collected at the speed closer to our normal daily driving, and we have more jogging and running pedestrians.
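
The driving speeds in Figure 7 (b) can be related to the sensing ranges quoted under Long-Range Perception in Section 2, where a 120m-range LiDAR leaves at most 3.6 seconds to respond at 120km/h. The sketch below reproduces that arithmetic for the ranges mentioned in the text, under the same idealized assumptions the paper states (perfect detection, zero algorithmic latency) plus two made explicit here: a stationary obstacle and constant ego speed. The function name `time_to_reach` is hypothetical.

```python
def time_to_reach(range_m: float, speed_kmh: float) -> float:
    """Seconds for a vehicle at constant speed to cover range_m meters."""
    speed_ms = speed_kmh / 3.6  # km/h -> m/s
    return range_m / speed_ms


# Sensing ranges quoted in the text: nuScenes, Waymo, KITTI, and the 1km AIODrive LiDAR.
for range_m in (70, 75, 120, 1000):
    print(range_m, round(time_to_reach(range_m, 120.0), 2))
# 70 2.1
# 75 2.25
# 120 3.6
# 1000 30.0
```

At 120km/h, extending the range from 120m to 1km raises the available response time from 3.6 seconds to 30 seconds under these assumptions.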
H3D, nuScenes, Waymo and Argoverse also have crowded scenes, about 30 to 50 agents per frame, our dataset is twice as crowded and has a much higher number of labeled objects.

High-Speed Driving. Existing datasets often collect data at a low driving speed (e.g., nuScenes at 16 km/h on average), which is significantly different from normal daily driving speed, i.e., 30 to 60 km/h on local roads and 80 to 120 km/h on highways. To bridge the gap and mimic daily driving, we collect data by driving our ego-vehicle at a much higher speed, as shown in Figure 7 (b). Specifically, our driving speed ranges from 0 to 130 km/h.

Other Variations. Besides the above variations, we also provide many other kinds of rare driving data, such as adverse weather and lighting (e.g., rainy, foggy and night), car accidents, vehicles that run red lights, exceed the speed limit and change lanes aggressively, and children and adults jogging and running. Though these cases happen in the real world, they barely exist in existing datasets. To build robust perception systems, it is important to include these rare scenarios in the dataset. As an example, we show the pedestrian speed in Figure 7 (c), which contains jogging and running people. See supplementary for details of other variations.

4. Experiments
To enable comparison with future work, we benchmark several baselines for 2D and 3D object detection on our dataset. For the evaluation protocol and data split, please refer to the supplementary. Our code and training data will be released so that AIODrive can be used to benchmark future methods, while a test set will remain private for fair comparisons.

4.1. 2D Object Detection Evaluation
We use FPN [31] with a ResNet50 [19] backbone as the baseline, where the backbone is pre-trained on ImageNet [15] and COCO [32]. We then fine-tune the baseline on AIODrive. The results are shown in the 1st row of Table 5. We can see that FPN’s performance is reasonable but lower than its performance on KITTI, e.g., 93.53/89.35/79.35 for car in the easy/moderate/hard level. We believe this is because: (1) our evaluation requires detection at a larger range (more difficult) than KITTI, e.g., our ‘hard’ level requires detection of objects up to 120 meters while the KITTI ‘hard’ level requires detection up to 70 meters; (2) AIODrive has a much higher object density than KITTI. As a result, more objects are occluded in the images and are hard to detect. Given the challenges of long-range detection and detection in crowded scenes, we hope that future work will be encouraged to push performance higher on our dataset.

4.2. 3D Object Detection Evaluation
Baselines. We use LiDAR-based 3D object detection methods such as PointRCNN [49], PointPillars [26] and SECOND [70] as baselines. See supp. for implementation details.

Results on AIODrive with Depth Point Clouds. To reach the best performance we can, we first use our densest point cloud (i.e., depth point cloud) as input to the baselines. As our point clouds have a longer range than prior datasets such as KITTI, we change the input point cloud range of the detectors from 0-70m in the frontal direction, as used in KITTI, to 120m in all directions, to enable perception at a larger range.

We summarize the results in Table 5. We can see that all 3D detection baselines achieve reasonable performance on our AIODrive dataset. Also, performance tends to decrease significantly from the ‘easy’ to the ‘moderate’ to the ‘hard’ level, where the required detection range is increasing (see supp. for detailed evaluation protocol). Again, this shows that detection at a longer range is harder than detection of nearby objects. We hope that our high-density long-range point clouds can be used to encourage future research towards improving long-range 3D object detection.

Effect of Point Cloud Density. In the above experiment, we have benchmarked the 3D detection performance of several baselines using our depth point clouds. To show the effect of point cloud density, we now evaluate the same detector using point clouds with different density levels. We emphasize that this experiment is unique to our dataset, as only we provide (LiDAR and depth) point clouds with different density levels, e.g., 100k, 600k, 1M, 4M points per frame. Also, we adapt PointRCNN and show the first 3D detection baseline that works with SPAD-LiDAR point cloud inputs.

We summarize the results in Table 6. We can see that using (LiDAR and depth) point clouds with a higher density as input generally achieves higher performance, especially in the ‘hard’ level, which includes faraway objects up to 120m. This suggests that high-density long-range point clouds could be helpful for improving 3D detection at a longer range. Also, for LiDAR and depth point clouds with different densities, we found that the differences in performance in the ‘easy’ level are not significant (except for pedestrians). This shows that, for cars and cyclists, the main performance bottleneck of 3D detection at nearby range (up to 40 meters in the ‘easy’ level) may not be point cloud density but other factors such as model capacity. In contrast, detection of nearby pedestrians can be significantly improved using point clouds with a higher density.

Table 6: Quantitative results of 3D detection using point clouds with different densities in our AIODrive dataset. Each cell lists Easy / Moderate / Hard; p.c. denotes point cloud.
Method          Point Density (# of points)         Car                    Pedestrian             Cyclist
PointRCNN [49]  100,000 (Velodyne-64 LiDAR p.c.)    74.98 / 72.73 / 53.85  45.31 / 37.37 / 34.66  56.95 / 50.70 / 42.96
PointRCNN [49]  600,000 (Long-range LiDAR p.c.)     76.74 / 75.17 / 69.76  56.39 / 50.14 / 40.38  58.71 / 52.37 / 46.83
PointRCNN [49]  1,000,000 (Long-range LiDAR p.c.)   77.71 / 77.26 / 71.17  58.16 / 51.92 / 43.81  59.64 / 52.61 / 47.73
PointRCNN [49]  4,000,000 (Depth p.c.)              78.13 / 77.99 / 73.63  58.73 / 53.71 / 44.74  59.03 / 53.85 / 49.36
PointRCNN [49]  1,000,000 (SPAD-LiDAR p.c.)         77.83 / 71.41 / 63.30  59.88 / 53.43 / 44.79  61.10 / 55.69 / 48.80

We also note that we observed a different performance pattern when using SPAD-LiDAR (the last row in Table 6), which tends to achieve higher performance for pedestrians and cyclists (small objects) and lower performance for cars (large objects). We hypothesize that the higher performance for small objects may be due to the larger fill factor of the SPAD-LiDAR compared to APD-LiDAR (see supp. for details). However, it is not fully clear why performance drops for cars. We hypothesize that it is because our method of using SPAD-LiDAR by merging multiple point cloud returns (see supp. for details) does not fully exploit the multi-echo information in the raw 3D tensor data. Future work is needed to fully leverage the SPAD-LiDAR data for 3D detection.

Results on Real-World KITTI Data. Last but not least, we investigate whether using our dataset can improve performance on real data. To that end, we augment the KITTI training data with data from our dataset to train PointRCNN [49]. This data augmentation is achieved by combining data from the two datasets equally (same number of frames) in every training batch. When we have more frames in total from AIODrive than from KITTI, we randomly sample frames from AIODrive and still maintain an equal number of frames from the two datasets in every batch.

We follow the KITTI evaluation on the test set and summarize the results in Table 7. We can see that PointRCNN trained with only KITTI data (the 2nd row) achieves similar performance for car as reported in [49]. Also, PointRCNN trained with only synthetic AIODrive data (the 1st row) achieves lower performance on KITTI than when trained with the KITTI data. This suggests that a domain gap exists between the two datasets. Importantly, when we augment the training data by combining data from the two datasets (the 3rd and 4th rows), we observe clear performance improvements. This shows that our synthetic data can be used in concert with real data to improve performance on real data. Moreover, performance is generally higher when more augmented frames (e.g., 250k vs. 10k frames) are used. Training on KITTI together with all data from AIODrive gives the best result in every category except the ‘hard’ cyclist level (52.89 vs. 52.93).

Table 7: 3D detection results on the real-world KITTI dataset when training is augmented with our AIODrive dataset. Each cell lists Easy / Moderate / Hard.
Method          Training Data                  Car                    Pedestrian             Cyclist
PointRCNN [49]  250k frames AIODrive           65.32 / 46.21 / 39.38  24.57 / 19.04 / 18.32  40.93 / 30.41 / 26.68
PointRCNN [49]  KITTI                          85.02 / 75.16 / 68.14  46.53 / 38.76 / 33.96  73.40 / 56.73 / 51.87
PointRCNN [49]  KITTI + 10k frames AIODrive    87.24 / 76.83 / 70.53  46.97 / 40.78 / 36.03  74.19 / 59.31 / 52.93
PointRCNN [49]  KITTI + 250k frames AIODrive   88.10 / 77.03 / 72.41  51.03 / 42.18 / 37.26  78.01 / 60.14 / 52.89

5. Conclusion
We proposed a dataset with the most diverse annotations, environmental variations and sensors. Our dataset can support all mainstream perception tasks and foster innovation in multi-task multi-sensor perception systems. Also, we confirmed that our high-density long-range point clouds can be used to improve long-range perception. To enable public comparison and encourage future research in long-range perception, our full dataset and accompanying code will be released.
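As an illustration of the balanced batching described under "Results on Real-World KITTI Data", the following is a minimal sketch and not the authors' implementation. It assumes frames are referenced by integer indices and that every batch takes exactly half of its frames from each dataset; the function name `balanced_batches`, the `batch_size` and `seed` parameters, and the dataset labels are hypothetical choices made only for this example. When AIODrive has more frames than KITTI, the sketch randomly subsamples AIODrive so that both sources contribute the same number of frames.

```python
import random


def balanced_batches(num_kitti, num_aiodrive, batch_size, seed=0):
    """Yield batches of (dataset_name, frame_index) pairs that contain an
    equal number of KITTI and AIODrive frames.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the paper or its code release.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even to split it equally")
    half = batch_size // 2
    rng = random.Random(seed)

    # Visit every KITTI frame once per epoch, in random order.
    kitti_ids = list(range(num_kitti))
    rng.shuffle(kitti_ids)

    # Randomly subsample AIODrive (without replacement) down to the
    # KITTI size when AIODrive is the larger dataset.
    aio_ids = rng.sample(range(num_aiodrive), k=min(num_kitti, num_aiodrive))

    # Use the same number of frames from both sources and drop any
    # incomplete final batch so every batch stays exactly balanced.
    n = min(len(kitti_ids), len(aio_ids))
    for start in range(0, n - half + 1, half):
        batch = [("kitti", i) for i in kitti_ids[start:start + half]]
        batch += [("aiodrive", i) for i in aio_ids[start:start + half]]
        yield batch


if __name__ == "__main__":
    for batch in balanced_batches(num_kitti=8, num_aiodrive=100, batch_size=4):
        print(batch)
```

Drawing a fresh random AIODrive subset per epoch (for example by changing `seed`) would let training eventually see more of the larger dataset while keeping each batch balanced; whether the original experiments resampled per epoch is not stated in the text.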

References
[1] High Definition Real-Time 3D Lidar. https://velodynelidar.com/products/hdl-64e/. 2, 3, 4
[2] ON Semiconductor to Demonstrate Long-range and In-Vehicle Automotive Imaging and Detection Technology. https://www.onsemi.com/PowerSolutions/newsItem.do?article=4444. 3, 5
[3] Panasonic Develops Long-Range TOF Image Sensor. https://news.panasonic.com/global/press/data/2018/06/en180619-3/en180619-3.html. 3, 4
[4] The Alpha Prime Delivers Unrivaled Combination of Field-of-View, Range, and Image Clarity. https://velodynelidar.com/products/alphaprime/. 3, 4
[5] The OS2 Delivers Long-Range, High-Resolution 3D Sensing. https://ouster.com/products/os2-lidar-sensor/. 3, 4
[6] Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Juergen Gall. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. ICCV, 2019. 1, 2, 4, 5, 6
[7] S. Bileschi. CBCL Streetscenes Challenge Framework, 2007. 2
[8] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to End Learning for Self-Driving Cars. arXiv:1604.07316, 2016. 3
[9] Markus Braun, Sebastian Krebs, Fabian Flohr, and Dariu M. Gavrila. The EuroCity Persons Dataset: A Novel Benchmark for Object Detection. TPAMI, 2019. 4, 5, 6
[10] Gabriel J. Brostow, Julien Fauqueur, and Roberto Cipolla. Semantic Object Classes in Video: A High-Definition Ground Truth Database. Pattern Recognition Letters, 2009. 2
[11] Rebecca Brown, Preston Hartzell, and Craig Glennie. Evaluation of SPL100 Single Photon Lidar Data. Remote Sensing, 2020. 3, 5
[12] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, and Qiang Xu. nuScenes: A Multimodal Dataset for Autonomous Driving. CVPR, 2020. 1, 3, 4, 5, 6
[13] Ming-fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, B Sławomir, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3D Tracking and Forecasting with Rich Maps. CVPR, 2019. 1, 3, 4, 5, 6
[14] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR, 2016. 2, 4, 5, 6
[15] Jia Deng, Wei Dong, Richard Socher, Li-jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. CVPR, 2009. 7
[16] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An Open Urban Driving Simulator. CoRL, 2017. 1, 3
[17] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are We Ready for Autonomous Driving? the KITTI Vision Benchmark Suite. CVPR, 2012. 2, 3, 4, 5, 6
[18] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, Tiffany Fernandez, Martin Jänicke, Sudesh Mirashi, Chiragkumar Savani, Martin Sturm, Oleksandr Vorobiov, Martin Oelker, Sebastian Garreis, and Peter Schuberth. A2D2: Audi Autonomous Driving Dataset. arXiv:2004.06320, 2020. 3, 4, 5, 6
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. CVPR, 2016. 7
[20] Felix Heide, Matthew O’Toole, Kai Zang, David B Lindell, Steven Diamond, and Gordon Wetzstein. Non-Line-of-Sight Imaging with Partial Occluders and Surface Normals. ACM Transactions on Graphics, 2019. 5
[21] Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The ApolloScape Dataset for Autonomous Driving. CVPRW, 2018. 1, 4, 5, 6
[22] Braden Hurl, Krzysztof Czarnecki, and Steven Waslander. Precise Synthetic Image and LiDAR (PreSIL) Dataset for Autonomous Vehicle Perception. IV, 2019. 2
[23] Juana Valeria Hurtado, Rohit Mohan, and Abhinav Valada. MOPT: Multi-Object Panoptic Tracking. arXiv:2004.08189, 2020. 6
[24] Hiroaki Ishioka, Xinshuo Weng, Yunze Man, and Kris Kitani. Single Camera Worker Detection, Tracking and Action Recognition in Construction Site. ISARC, 2020. 5
[25] Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven Waslander. Joint 3D Proposal Generation and Object Detection from View Aggregation. IROS, 2018. 3
[26] Alex H Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast Encoders for Object Detection from Point Clouds. CVPR, 2019. 7
[27] Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, and Takeo Kanade. Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator. arXiv:1612.05234, 2016. 5
[28] Yu-Jhe Li, Xinshuo Weng, and Kris Kitani. Learning Shape Representations for Person Re-Identification under Clothing Change. WACV, 2021. 5
[29] Ming Liang, Bin Yang, Yun Chen, Rui Hu, and Raquel Urtasun. Multi-Task Multi-Sensor Fusion for 3D Object Detection. CVPR, 2019. 3
[30] Ming Liang, Bin Yang, Shenlong Wang, and Raquel Urtasun. Deep Continuous Fusion for Multi-Sensor 3D Object Detection. ECCV, 2018. 3
[31] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection. CVPR, 2017. 7
[32] Tsung Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. ECCV, 2014. 7
[33] David B Lindell, Matthew O’Toole, and Gordon Wetzstein. Single-Photon 3D Imaging with Deep Sensor Fusion. ACM Transactions on Graphics, 2018. 5
[34] David B Lindell, Gordon Wetzstein, and Matthew O’Toole. Wave-Based Non-Line-of-Sight Imaging Using Fast FK Migration. ACM Transactions on Graphics, 2019. 5
[35] Reza Mahjourian, Martin Wicke, and Anelia Angelova. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints. CVPR, 2018. 6
[36] Aashi Manglik, Xinshuo Weng, Eshed Ohn-bar, and Kris M Kitani. Forecasting Time-to-Collision from Monocular Video: Feasibility, Dataset, and Challenges. IROS, 2019. 3
[37] M Matthias, Casser Jean, Lahoud Neil, and C V Mar. Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications. IJCV, 2018. 3
[38] Maxim Maximov, Kevin Galim, and Laura Leal-Taixé. Focus on defocus: bridging the synthetic to real domain gap for depth estimation. CVPR, 2020. 2
[39] Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. ICCV, 2017. 2, 4, 5, 6
[40] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes. ICRA, 2019. 1, 3, 4, 5, 6
[41] Quang-hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3D Dataset: Towards Autonomous Driving in Challenging Environments. ICRA, 2020. 3, 4, 5, 6
[42] Matthew Pitropov, Danson Garcia, Jason Rebello, Michael Smart, Carlos Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian Adverse Driving Conditions Dataset. arXiv:2001.10117, 2020. 1, 3, 4, 5, 6
[43] Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. Frustum PointNets for 3D Object Detection from RGB-D Data. CVPR, 2018. 3
[44] Stephan R. Richter, Zeeshan Hayder, and Vladlen Koltun. Playing for Benchmarks. ICCV, 2017. 2, 3
[45] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio Lopez. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. CVPR, 2016. 2, 4, 5, 6
[46] Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. IJCV, 2008. 2
[47] Ahmad El Sallab, Ibrahim Sobh, Mohamed Zahran, and Mohamed Shawky. Unsupervised Neural Sensor Models for Synthetic LiDAR Data Augmentation. NeurIPSW, 2019. 2
[48] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. Field and Service Robotics, 2017. 3
[49] Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. CVPR, 2019. 3, 7, 8
[50] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. ECCV, 2012. 2
[51] Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite. CVPR, 2015. 2
[52] Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, and Bolei Zhou. DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios. CVPR, 2019. 4, 5, 6
[53] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. CVPR, 2020. 1, 3, 4, 5, 6
[54] Xi Sun, Xinshuo Weng, and Kris Kitani. When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous. IROS, 2020. 5
[55] Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe. MOTS: Multi-Object Tracking and Segmentation. CVPR, 2019. 6
[56] Peng Wang, Xinyu Huang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape Open Dataset for Autonomous Driving and its Application. TPAMI, 2019. 2, 4, 5, 6
[57] Sen Wang, Daoyuan Jia, and Xinshuo Weng. Deep Reinforcement Learning for Autonomous Driving. arXiv:1811.11329, 2018. 2
[58] Yongxin Wang, Kris Kitani, and Xinshuo Weng. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. arXiv:2006.13164, 2020. 5
[59] Zhixin Wang and Kui Jia. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. IROS, 2019. 3
[60] Xinshuo Weng and Kris Kitani. Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud. ICCVW, 2019. 3
[61] Xinshuo Weng and Kris Kitani. AutoSelect: Automatic and Dynamic Detection Selection for 3D Multi-Object Tracking. arXiv:2012.05894, 2020. 5
[62] Xinshuo Weng, Jianren Wang, David Held, and Kris Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS, 2020. 3, 5
[63] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nick Rhinehart. 4D Forecasting: Sequential Forecasting of 100,000 Points. ECCVW, 2020. 6
[64] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nick Rhinehart. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL, 2020. 5
[65] Xinshuo Weng, Yongxin Wang, Yunze Man, and Kris Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR, 2020. 3
[66] Xinshuo Weng, Shangxuan Wu, Fares Beainy, and Kris Kitani. Rotational Rectification Network: Enabling Pedestrian Detection for Mobile Vision. WACV, 2018. 5
[67] Xinshuo Weng, Ye Yuan, and Kris Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv:2003.07847, 2020. 5
[68] Shangxuan Wu and Xinshuo Weng. Image Labeling with Markov Random Fields and Conditional Random Fields. arXiv:1811.11323, 2018. 5
[69] Chen Xiaozhi, Ma Huimin, Wan Ji, Li Bo, and Xia Tian. Multi-View 3D Object Detection Network for Autonomous Driving. CVPR, 2017. 3
[70] Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely Embedded Convolutional Detection. Sensors, 2018. 7
[71] Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, and Jiaya Jia. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. ICCV, 2019. 3
[72] Senthil Yogamani, Ciaran Hughes, Jonathan Horgan, Ganesh Sistu, Padraig Varley, Derek O’Dea, Michal Uricar, Stefan Milz, Martin Simon, Karl Amende, Christian Witt, Hazem Rashed, Sumanth Chennupati, Sanjaya Nayak, Saquib Mansoor, Xavier Perroton, and Patrick Perez. WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving. ICCV, 2019. 1
[73] Kai Zhang, Jiaxin Xie, Noah Snavely, and Qifeng Chen. Depth Sensing Beyond LiDAR Range. CVPR, 2020. 1, 3, 4
