Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...Masaya Kaneko
Slides from a paper reading group presentation introducing "Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM [ICCV19]", which proposes SfMLearner + KF selection.
11. 10
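1. Camera Calibration
• This step matters a great deal for direct methods
– In indirect methods, feature extractors and descriptors are robust to photometric variation, so most of this step can be ignored
– Calibration is modeled in two parts: Geometric Calibration and Photometric Calibration
• Geometric Calibration
– The well-known pinhole camera model
– Maps a 3D point $(x, y, z) \in \mathbb{R}^3$ to an image point $(u_d, v_d) \in \Omega$ (the projection function, written $\Pi_c : \mathbb{R}^3 \to \Omega$)
(1) [1]
[1] J. Engel, V. Usenko, D. Cremers. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. In arXiv:1607.02555, 2016.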
12. 11
1. Camera Calibration
– Here, the distorted image point $(u_d, v_d)$ is converted to the undistorted image point $(u_u, v_u)$
– To lift this point to 3D, the following transformation is applied (the back-projection function, written $\Pi_c^{-1} : \mathbb{R} \times \Omega \to \mathbb{R}^3$)
(2,3) [1]
– The calibration uses the PTAM [2] implementation with a checkerboard to estimate $[f_x, f_y, c_x, c_y, \omega]$
[1] J. Engel, V. Usenko, D. Cremers. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. In arXiv:1607.02555, 2016.
[2] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In International Symposium on Mixed and Augmented Reality (ISMAR), 2007.
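As a rough illustration of the projection and back-projection functions on this slide, here is a minimal Python sketch. It assumes the FOV (arctangent) distortion model, which is what the parameter set [fx, fy, cx, cy, ω] suggests; the slide equations themselves are not reproduced in this transcript, so the exact form, the function names `project` / `back_project`, the use of inverse depth as the scalar input, and the numeric camera values are all assumptions made for illustration.

```python
import math

def project(point, fx, fy, cx, cy, omega):
    """Map a 3D point to a distorted pixel (assumed FOV distortion model)."""
    x, y, z = point
    xn, yn = x / z, y / z                  # normalized pinhole coordinates
    r_u = math.hypot(xn, yn)               # undistorted radius
    t = 2.0 * math.tan(omega / 2.0)
    if r_u < 1e-12:
        scale = t / omega                  # limit of atan(r_u * t) / (omega * r_u)
    else:
        scale = math.atan(r_u * t) / (omega * r_u)
    return fx * scale * xn + cx, fy * scale * yn + cy

def back_project(pixel, inv_depth, fx, fy, cx, cy, omega):
    """Map a distorted pixel plus inverse depth back to a 3D point."""
    u, v = pixel
    xd, yd = (u - cx) / fx, (v - cy) / fy
    r_d = math.hypot(xd, yd)               # distorted radius
    t = 2.0 * math.tan(omega / 2.0)
    if r_d < 1e-12:
        scale = omega / t                  # limit of tan(r_d * omega) / (r_d * t)
    else:
        scale = math.tan(r_d * omega) / (r_d * t)
    z = 1.0 / inv_depth
    return scale * xd * z, scale * yd * z, z

# Hypothetical camera parameters, chosen only for this example.
cam = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0, omega=0.9)
p = (0.3, -0.2, 2.0)
uv = project(p, **cam)
p_back = back_project(uv, 1.0 / p[2], **cam)
```

Back-projecting a projected point with its true inverse depth should recover the original 3D point, which is a convenient sanity check for any estimated calibration.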
20. 19
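3. Window-Based Optimization
• Definition of the Jacobian $J_k$
– In the Gauss-Newton method, it gives the direction in which $\boldsymbol{x}$ is moved (descending the gradient)
– The Jacobian is split according to $\delta_{geo} = (\mathbf{T}_i, \mathbf{T}_j, d, \boldsymbol{c})$ and $\delta_{photo} = (a_i, a_j, b_i, b_j)$
– This split allows the following two approximations
• Stability ensured by First Estimate Jacobians [4]?
– $J_{geo}$ and $J_{photo}$ are smooth with respect to $\boldsymbol{x}$
• $J_{geo}$ is the same over all of $\mathcal{N}_p$, so it is computed only for the central pixel (reducing computation)
(12)
(13)
[4] G. P. Huang, A. I. Mourikis, and S. I. Roumeliotis. A first-estimates Jacobian EKF for improving SLAM consistency. In International Symposium on Experimental Robotics, 2008.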
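To make the role of the Jacobian in a Gauss-Newton update concrete, here is a minimal Python sketch on a toy two-parameter problem. It is not the windowed optimization described on this slide: the model `exp(a) * x + b`, the function name `gauss_newton_affine`, and the data are hypothetical, chosen only because the parameters loosely resemble the affine brightness terms (a, b) in $\delta_{photo}$. Each iteration accumulates $H = J^T J$ and $g = J^T r$ and solves $H\,\delta = -g$.

```python
import math

def gauss_newton_affine(xs, ys, a=0.0, b=0.0, iterations=10):
    """Fit y = exp(a) * x + b by Gauss-Newton (toy model, assumed for illustration)."""
    for _ in range(iterations):
        h_aa = h_ab = h_bb = g_a = g_b = 0.0
        s = math.exp(a)
        for x, y in zip(xs, ys):
            r = s * x + b - y              # residual for this sample
            j_a, j_b = s * x, 1.0          # Jacobian row: d r / d a, d r / d b
            h_aa += j_a * j_a              # accumulate H = J^T J
            h_ab += j_a * j_b
            h_bb += j_b * j_b
            g_a += j_a * r                 # accumulate g = J^T r
            g_b += j_b * r
        det = h_aa * h_bb - h_ab * h_ab
        # Solve the 2x2 system H * delta = -g in closed form.
        da = -(h_bb * g_a - h_ab * g_b) / det
        db = -(h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
    return a, b

# Synthetic, noise-free data generated from a = 0.5, b = 3.0.
xs = [10.0, 50.0, 100.0, 150.0, 200.0]
ys = [math.exp(0.5) * x + 3.0 for x in xs]
a_est, b_est = gauss_newton_affine(xs, ys)
```

In a real system the Jacobian has many more columns (poses, inverse depths, intrinsics), which is why approximations such as evaluating $J_{geo}$ only at the central pixel reduce the cost.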
58. 57
References
• J. Engel, V. Koltun, D. Cremers. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
- The paper presented in these slides
• J. Engel, V. Usenko, D. Cremers. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. In arXiv:1607.02555, 2016.
- Details of Photometric Calibration (cited in these slides as [1])
• E. Eade. Gauss-Newton / Levenberg-Marquardt optimization. 2013.
- Explanatory notes on the Gauss-Newton method (cited in these slides as [5])
• J.-L. Blanco. A tutorial on SE(3) transformation parameterizations and on-manifold optimization. University of Malaga, 2010.
- Explanatory notes on Lie algebras in computer vision (cited in these slides as [3])
• T. Okatani, et al. バンドルアジャストメント (Bundle adjustment). IPSJ SIG Technical Reports, Computer Vision and Image Media (CVIM), 2009, 2009.37: 1-16.
- Introductory material on optimization in BA (cited in these slides as [6])
• S. Baker, I. Matthews. Lucas-Kanade 20 Years On: A Unifying Framework. International Journal of Computer Vision, 2004, 56.3: 221-255.
- Explanatory material on the Lucas-Kanade method used for optimization in direct SLAM (the parts on the Gauss-Newton and Levenberg-Marquardt methods were helpful)