27. [Engel2014]LSD-SLAM (2/3)
Tracking
Only pixels with a high intensity gradient are used for pose estimation (semi-dense)
Using their depth, the keyframe's pixels are projected into the current frame, and the pose is estimated by minimizing the photometric difference (direct method; see the sketch below)
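A minimal sketch of this direct, semi-dense alignment (not the LSD-SLAM implementation): it assumes a pinhole intrinsic matrix K, grayscale float images, and per-pixel inverse depth for the keyframe; the function names, the gradient threshold, and the nearest-neighbour lookup (real systems interpolate bilinearly) are all illustrative simplifications.

```python
import numpy as np
from scipy.optimize import least_squares

def se3_exp(xi):
    """Map a twist xi = (v, w) in R^6 to a 4x4 rigid-body transform."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-8:
        R, V = np.eye(3) + K, np.eye(3)  # first-order approximation
    else:
        K2 = K @ K
        R = np.eye(3) + np.sin(theta) / theta * K + (1 - np.cos(theta)) / theta**2 * K2
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K2)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def photometric_residuals(xi, kf_img, cur_img, K, pix, inv_depth):
    """r_i = I_kf(p_i) - I_cur(warp(p_i)) over the semi-dense pixel set."""
    T = se3_exp(xi)
    u, v = pix[:, 0], pix[:, 1]
    z = 1.0 / inv_depth
    # Back-project keyframe pixels to 3-D, transform, and re-project.
    X = np.stack([(u - K[0, 2]) / K[0, 0] * z,
                  (v - K[1, 2]) / K[1, 1] * z,
                  z, np.ones_like(z)], axis=1)
    Xc = (T @ X.T).T
    uc = K[0, 0] * Xc[:, 0] / Xc[:, 2] + K[0, 2]
    vc = K[1, 1] * Xc[:, 1] / Xc[:, 2] + K[1, 2]
    # Nearest-neighbour lookup for brevity only.
    h, w = cur_img.shape
    ui = np.clip(np.rint(uc).astype(int), 0, w - 1)
    vi = np.clip(np.rint(vc).astype(int), 0, h - 1)
    return kf_img[v, u] - cur_img[vi, ui]

# Semi-dense selection: keep only high-gradient pixels, then optimize the
# 6-DoF pose with a robust loss.
# gy, gx = np.gradient(kf_img)
# mask = np.hypot(gx, gy) > 5.0
# pix = np.argwhere(mask)[:, ::-1]            # (u, v) coordinates
# fit = least_squares(photometric_residuals, np.zeros(6), loss='huber',
#                     args=(kf_img, cur_img, K, pix, inv_depth[mask]))
```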
Depth Map Estimation
A new keyframe is created when the pose change exceeds a threshold
The new keyframe's depth is initialized by projecting the previous keyframe's depth into it
The depth is then refined by baseline stereo between tracked frames and the keyframe* (per-pixel update sketched below)
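The per-pixel refinement can be viewed as one-dimensional Gaussian fusion of the keyframe's current inverse-depth hypothesis with each stereo observation. A minimal sketch under that reading (names are illustrative; LSD-SLAM additionally propagates and regularizes the variance):

```python
def fuse_inverse_depth(d_prior, var_prior, d_obs, var_obs):
    """Fuse the inverse-depth hypothesis N(d_prior, var_prior) with a
    small-baseline stereo observation N(d_obs, var_obs) (product of Gaussians)."""
    var_post = var_prior * var_obs / (var_prior + var_obs)
    d_post = (d_prior * var_obs + d_obs * var_prior) / (var_prior + var_obs)
    return d_post, var_post

# Example: a confident prior pulls a noisy stereo observation toward itself.
# fuse_inverse_depth(0.50, 0.01, 0.60, 0.04) -> (0.52, 0.008)
```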
Map Optimization
When a keyframe is created, nearby keyframes and visually similar keyframes are retrieved, and each is tested for a loop closure
If a loop exists, the relative pose of the two keyframes is computed from their pixels and depths and propagated along the loop for optimization (graph optimization; see the sketch below)
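A minimal sketch of the graph-optimization step, using planar (x, y, θ) poses for brevity rather than the similarity transforms LSD-SLAM actually optimizes; angle wrap-around handling is omitted and all names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pa, pb):
    """Pose of b expressed in the frame of a, for p = (x, y, theta)."""
    c, s = np.cos(pa[2]), np.sin(pa[2])
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy, pb[2] - pa[2]])

def graph_residuals(flat, edges):
    """Stack (predicted - measured) relative poses over all graph edges."""
    poses = flat.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[j]) - z for i, j, z in edges]
    res.append(poses[0])  # gauge fixing: pin the first keyframe at the origin
    return np.concatenate(res)

# Odometry edges chain consecutive keyframes; the loop-closure edge (3 -> 0)
# contradicts the drifted chain, and the optimizer spreads the correction.
edges = [(0, 1, np.array([1.0, 0.0, 0.0])),
         (1, 2, np.array([1.0, 0.0, 0.0])),
         (2, 3, np.array([1.0, 0.0, 0.0])),
         (3, 0, np.array([-3.2, 0.1, 0.0]))]   # loop-closure measurement
sol = least_squares(graph_residuals, np.zeros(4 * 3), args=(edges,))
print(sol.x.reshape(-1, 3))
```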
*J. Engel, J. Sturm, and D. Cremers. Semi-dense visual odometry for a monocular camera. In IEEE International Conference on Computer Vision (ICCV), December 2013.
28. [Engel2014]LSD-SLAM (3/3)
[9] Engel, J., Sturm, J., Cremers, D.: Semi-dense visual odometry for a monocular camera. In: Intl. Conf. on Computer Vision (ICCV) (2013)
[15] Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Intl. Symp. on Mixed and Augmented Reality (ISMAR) (2007)
[14] Kerl, C., Sturm, J., Cremers, D.: Dense visual SLAM for RGB-D cameras. In: Intl. Conf. on Intelligent Robot Systems (IROS) (2013)
[7] Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., Burgard, W.: An evaluation of the RGB-D SLAM system. In: Intl. Conf. on Robotics and Automation (ICRA) (2012)
TUM RGB-D benchmark (trajectory RMSE in cm)
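For reference, that metric is the root of the mean squared positional error after aligning the estimated trajectory to ground truth. A minimal sketch (not the benchmark's official evaluation tool; full rotational alignment via Horn's method is reduced to centroid alignment here):

```python
import numpy as np

def ate_rmse_cm(est_xyz, gt_xyz):
    """Absolute trajectory RMSE in cm for two time-aligned Nx3 trajectories
    given in metres; rotational alignment is omitted for brevity."""
    est = est_xyz - est_xyz.mean(axis=0)
    gt = gt_xyz - gt_xyz.mean(axis=0)
    return 100.0 * np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1)))
```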
36. [Tateno2017]CNN-SLAM (2/3)
Camera Pose Estimation
The pose is estimated by projecting the current frame's pixels onto the previous keyframe and minimizing the photometric difference (direct method, as sketched for LSD-SLAM above)
As in LSD-SLAM, only regions with a high intensity gradient are used
Depth predicted by a CNN is used for the projection
(In LSD-SLAM, depth is instead estimated by stereo between keyframes)
CNN Depth Prediction & Semantic Segmentation
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., & Navab, N. (2016). Deeper Depth Prediction with Fully Convolutional Residual Networks. IEEE International Conference on 3D Vision.
Depth is predicted for each keyframe (see the sketch below)
As in LSD-SLAM, the depth is refined using baseline stereo
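A minimal sketch of the per-keyframe prediction step; the network is a stand-in for the FCRN-style model of Laina et al., and the `model` object, its output shape, and the preprocessing are assumptions, not CNN-SLAM's actual interface.

```python
import numpy as np
import torch

def predict_keyframe_depth(model, kf_rgb):
    """Predict a dense depth map for a new keyframe with a single-image CNN.
    kf_rgb: HxWx3 uint8 image; model: assumed to return a (1, 1, H, W) tensor."""
    x = torch.from_numpy(kf_rgb).permute(2, 0, 1).unsqueeze(0).float() / 255.0
    with torch.no_grad():
        depth = model(x)
    return depth.squeeze(0).squeeze(0).numpy()

# The prediction seeds the keyframe's depth map; per-pixel baseline-stereo
# updates against tracked frames then refine it, as in LSD-SLAM's filtering.
```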
37. [Tateno2017]CNN-SLAM (3/3)
Trajectory and depth accuracy evaluated on the ICL-NUIM and TUM datasets
Runs in real time on the following hardware:
• Intel Xeon CPU at 2.4GHz with 16GB of RAM
• Nvidia Quadro K5200 GPU with 8GB of VRAM
41. References (Camera-based Visual SLAM)
[Klein2007] Klein, G., & Murray, D. (2007). Parallel Tracking and Mapping for Small AR Workspaces. In IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR).
[Newcombe2011] Newcombe, R. A., Lovegrove, S. J., & Davison, A. J. (2011). DTAM: Dense Tracking and Mapping in Real-Time. In International Conference on Computer Vision.
[Engel2014] Engel, J., Schops, T., & Cremers, D. (2014). LSD-SLAM: Large-Scale Direct Monocular SLAM. In European Conference on Computer Vision.
[Mur-Artal2015] Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, 31(5), 1147–1163.
[Mur-Artal2016] Mur-Artal, R., & Tardos, J. D. (2016). ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. arXiv preprint (October).
[Tateno2017] Tateno, K., Tombari, F., Laina, I., & Navab, N. (2017). CNN-SLAM: Real-time Dense Monocular SLAM with Learned Depth Prediction. In IEEE Conference on Computer Vision and Pattern Recognition.
[Zhou2018] Zhou, H., & Ummenhofer, B. (2018). DeepTAM: Deep Tracking and Mapping. In European Conference on Computer Vision.
58. [Whelan2016]ElasticFusion (4/4)
Localization evaluated on the TUM RGB-D dataset
Mapping evaluated on the ICL-NUIM dataset
Relationship between surfel count and processing speed, measured on:
Intel Core i7-4930K CPU at 3.4GHz, 32GB of RAM
Nvidia GeForce GTX 780 Ti GPU with 3GB of memory
62. [Dai2017]BundleFusion (4/4)
Comparison on indoor data captured with a Structure Sensor
Mapping evaluated on the ICL-NUIM dataset
Localization evaluated on the TUM RGB-D dataset
Performance evaluation on:
Core i7 3.4GHz CPU (32GB RAM)
NVIDIA GeForce GTX Titan X (for reconstruction)
NVIDIA GTX Titan Black (for search / global pose optimization)
63. References (RGB-D SLAM)
[Newcombe2011] Newcombe, R. A., Davison, A. J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., … Fitzgibbon, A. (2011). KinectFusion: Real-time Dense Surface Mapping and Tracking. IEEE International Symposium on Mixed and Augmented Reality.
[Kerl2013] Kerl, C., Sturm, J., & Cremers, D. (2013). Dense Visual SLAM for RGB-D Cameras. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[Whelan2016] Whelan, T., Salas-Moreno, R. F., Glocker, B., Davison, A. J., & Leutenegger, S. (2016). ElasticFusion: Real-Time Dense SLAM and Light Source Estimation. The International Journal of Robotics Research.
[Dai2017] Dai, A., Niessner, M., Zollhofer, M., Izadi, S., & Theobalt, C. (2017). BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration. ACM Transactions on Graphics (TOG).
81. References (Visual-Inertial SLAM)
[Leutenegger2015] Leutenegger, S., Furgale, P., Rabaud, V., Chli, M., Konolige, K., & Siegwart, R. (2015). Keyframe-Based Visual-Inertial SLAM Using Nonlinear Optimization. The International Journal of Robotics Research.
[Qin2018] Qin, T., Li, P., & Shen, S. (2018). VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Transactions on Robotics, 34(4), 1004–1020.
[Bloesch2017] Bloesch, M., Burri, M., Omari, S., Hutter, M., & Siegwart, R. (2017). IEKF-based Visual-Inertial Odometry using Direct Photometric Feedback. The International Journal of Robotics Research, 36(10), 1053–1072.
[Schneider2017] Schneider, T., Dymczyk, M., Fehr, M., Egger, K., Lynen, S., Gilitschenski, I., & Siegwart, R. (2017). maplab: An Open Framework for Research in Visual-Inertial Mapping and Localization. IEEE Robotics and Automation Letters, 3, 1418–1425.